Changes

484 bytes added, 10:15, 25 July 2018
As you can see, I still have duplicates in both the MAClean and IPOClean files. I ran an aggregate function to get rid of these duplicates:
 
There are two companies named Masspower in the MACleanNoDups file. One is written in all caps and will thus not be caught by the aggregate function. I will select only the companies whose primary key occurs exactly once and join this back to MAClean. I will then select the needed info from MANoDups.
DROP TABLE MACleanNoDups;
CREATE TABLE MACleanNoDups AS
SELECT A.*, effectivedate, transactionamt, enterpriseval, acquirorstatus
FROM MAClean AS A
JOIN (
    SELECT targetname, targetstate, announceddate, COUNT(*)
    FROM MAClean
    GROUP BY targetname, targetstate, announceddate
    HAVING COUNT(*)=1
) AS B
ON A.targetname=B.targetname AND A.targetstate=B.targetstate AND A.announceddate=B.announceddate
LEFT JOIN MANoDups AS C
ON A.targetname=C.targetname AND A.targetstate=C.targetstate AND A.announceddate=C.announceddate;
--7171
SELECT COUNT(*) FROM(SELECT DISTINCT coname, statecode, datefirstinv FROM MACleanNoDups)a;
--7171
 
Thus the portco primary key is unique in the table. We will use this later.
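The filter-and-check pattern above can be tried on toy data. The following is a minimal sketch, assuming Python's built-in sqlite3 module rather than the project's database; the table names deals and deals_nodups and all rows are hypothetical. It keeps only keys that occur exactly once, confirms the key is unique afterwards, and then groups on LOWER(targetname) to surface names that differ only by case, which a plain GROUP BY treats as distinct.

```python
import sqlite3

# Hypothetical toy data; only the column names mirror the MAClean key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deals (targetname TEXT, targetstate TEXT, announceddate TEXT, coname TEXT);
INSERT INTO deals VALUES
  ('Acme',      'TX', '2005-01-10', 'Acme'),
  ('Acme',      'TX', '2005-01-10', 'Acme Holdings'),  -- repeated key: both rows dropped
  ('Masspower', 'MA', '2006-03-15', 'Masspower'),
  ('MASSPOWER', 'MA', '2006-03-15', 'Masspower'),      -- differs only by case
  ('Zeta',      'CA', '2007-07-01', 'Zeta');

-- Keep only rows whose key occurs exactly once, then join back for all columns.
CREATE TABLE deals_nodups AS
SELECT A.*
FROM deals AS A
JOIN (
    SELECT targetname, targetstate, announceddate
    FROM deals
    GROUP BY targetname, targetstate, announceddate
    HAVING COUNT(*) = 1
) AS B
  ON A.targetname = B.targetname
 AND A.targetstate = B.targetstate
 AND A.announceddate = B.announceddate;
""")

kept = [r[0] for r in conn.execute(
    "SELECT targetname FROM deals_nodups ORDER BY targetname")]

# Uniqueness check: the row count must equal the number of distinct keys.
total = conn.execute("SELECT COUNT(*) FROM deals_nodups").fetchone()[0]
distinct_keys = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT targetname, targetstate, announceddate FROM deals_nodups
    ) AS a
""").fetchone()[0]

# Case-insensitive grouping exposes names that differ only in capitalisation.
case_dups = conn.execute("""
    SELECT LOWER(targetname), COUNT(*)
    FROM deals_nodups
    GROUP BY LOWER(targetname), targetstate, announceddate
    HAVING COUNT(*) > 1
""").fetchall()

print(kept, total, distinct_keys, case_dups)
```

In this sketch both Acme rows are removed because their key repeats, while the two Masspower spellings survive the exact-match filter and only show up in the case-insensitive check.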
Now do the same for the IPOs.
 
DROP TABLE IPOCleanNoDups;
CREATE TABLE IPOCleanNoDups AS
SELECT A.*, principalamt, proceedsamt, naicode as naics, zipcode, status, foundeddate
FROM IPOClean AS A
JOIN (
    SELECT issuername, issuerstate, issuedate, COUNT(*)
    FROM IPOClean
    GROUP BY issuername, issuerstate, issuedate
    HAVING COUNT(*)=1
) AS B
ON A.issuername=B.issuername AND A.issuerstate=B.issuerstate AND A.issuedate=B.issuedate
LEFT JOIN IPONoDups AS C
ON A.issuername=C.issuer AND A.issuerstate=C.statecode AND A.issuedate=C.issuedate;
--2136

SELECT COUNT(*) FROM(SELECT DISTINCT coname, statecode, datefirstinv FROM IPOCleanNoDups)a;
--2136
Now the duplicates are out of the MAClean and IPOClean data and we can start to construct the ExitKeysClean table.
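The LEFT JOIN steps above attach extra columns from a second table whose key columns may be named differently. As a minimal sketch, assuming Python's built-in sqlite3 module and hypothetical tables ipo_keys and ipo_extra with made-up rows, the example below shows that such a join leaves the row count unchanged as long as the right-hand table has at most one row per key, and fills in NULL where no match exists.

```python
import sqlite3

# Hypothetical toy tables; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ipo_keys (issuername TEXT, issuerstate TEXT, issuedate TEXT);
INSERT INTO ipo_keys VALUES
  ('Alpha', 'NY', '2001-05-01'),
  ('Beta',  'CA', '2003-09-12'),
  ('Gamma', 'TX', '2004-02-20');

-- The extra columns live in a table whose key columns are named differently.
CREATE TABLE ipo_extra (issuer TEXT, statecode TEXT, issuedate TEXT, proceedsamt REAL);
INSERT INTO ipo_extra VALUES
  ('Alpha', 'NY', '2001-05-01', 120.0),
  ('Beta',  'CA', '2003-09-12', 45.5);   -- deliberately no row for Gamma
""")

# LEFT JOIN keeps every left-hand row; unmatched rows get NULL (None in Python).
rows = conn.execute("""
    SELECT A.issuername, C.proceedsamt
    FROM ipo_keys AS A
    LEFT JOIN ipo_extra AS C
      ON A.issuername = C.issuer
     AND A.issuerstate = C.statecode
     AND A.issuedate = C.issuedate
    ORDER BY A.issuername
""").fetchall()

left_count = conn.execute("SELECT COUNT(*) FROM ipo_keys").fetchone()[0]
print(rows, left_count)
```

Comparing the joined row count with the left-hand count, as the matching counts in the comments above do, is a quick way to confirm that the join did not add rows.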