Since the count of fundbasecore is the same as the number of distinct fund names, we know that the fundbasecore table is clean. In the first query I am finding duplicate rows and choosing the row that has the greater last investment date. I then match this table back to fundbasenound but choose all the rows from fundbasecore for which there is no corresponding fund in fundups based on fund name and date of last investment. This allows the funds with the earlier date of last investment to be chosen.
===Cleaning Firmbase===
The primary key for firms will be firm name. First I got rid of all undisclosed firms. I also filtered out two firms that have identical firm names and founding dates. The reason for this is because I use founding dates to filter out duplicate firm names. If there are two rows that have the same firm name and founding date, they will not be filtered out by the third query below. Thus, I chose to filter those out completely.
DROP TABLE firmbasenound;
CREATE TABLE firmbasenound AS
SELECT DISTINCT * FROM firmbase WHERE firmname NOT LIKE '%Undisclosed Firm%' AND firmname NOT LIKE '%Amundi%' AND firmname NOT LIKE '%Schroder Adveq Management%';
--15452
SELECT COUNT(*) FROM(SELECT DISTINCT firmname FROM firmbasenound)a;
--15437
Since these counts are not equal we will have to clean the table further. We will use the same method from before.
DROP TABLE firmdups;
CREATE TABLE firmdups AS SELECT
firmname, max(foundingdate) as foundingdate FROM firmbasenound GROUP BY firmname HAVING COUNT(*)>1;
--15
DROP TABLE firmbasecore;
CREATE TABLE firmbasecore AS
SELECT A.* FROM firmbasenound AS A LEFT JOIN firmdups AS B ON A.firmname=B.firmname AND A.foundingdate=B.foundingdate WHERE B.firmname IS NULL AND B.foundingdate IS NULL;
--15437
Since the count of firmbasecore and the DISTINCT query are the same, the firm table is now clean.
==Instructions on Matching PortCos to Issuers and M&As From Ed==