The third error is in an area code where 1-8 is written. This hyphen causes errors. Interestingly, the line number given by PuTTY was correct, and I found it in my text file and deleted it manually.
These were the only errors I encountered while loading this table.
DROP TABLE mas;
CREATE TABLE mas (
announceddate date,
effectivedate date,
targetname varchar(255),
targetstate varchar(100),
acquirorname varchar(255),
acquirorstate varchar(100),
transactionamt money,
enterpriseval varchar(255),
acquirorstatus varchar(150)
);
\COPY mas FROM 'MAUSTargetComp100pc1985-July2018-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--119432
==Instructions on Matching PortCos to Issuers and M&As From Ed==
FROM MAClean
GROUP BY targetname, targetstate, announceddate;
--7189
There are two companies that have the name Masspower in the MACleanNoDups file. One is written in all caps and will thus not be caught by the aggregate function. You will have to find it manually and delete it in order to ensure that your joining of MAs to companybasecore will not add in extra rows.
DELETE FROM MaCleanNoDups WHERE coname='Masspower' AND announceddate='2006-03-15';
--7188
Now do the same for the IPOs.
DROP TABLE IPOCleanNoDups;
--2141
Now the duplicates are out of the MAClean and IPOClean data and we can start to construct the PortCoExit ExitKeysClean table. ==Creating ExitKeysClean===
First I looked for the PortCos that were in both the MAs and the IPOs. I did this using:
FROM IPOClean AS A
JOIN MAClean AS B ON A.coname=B.coname AND A.statecode=B.statecode AND A.datefirstinv=B.datefirstinv;
--92
I then pulled out the IPOs that were only IPOs and MAs that were only MAs.
DROP TABLE ExitKeysCleanIPONoExit;
CREATE TABLE ExitKeysCleanIPONoExit AS
SELECT A.*, B.targetname, B.targetstate, B.announceddate
FROM IPOCleanNoDups AS A LEFT JOIN MACleanNoDups AS B ON A.coname=B.Coname AND A.statecode=B.statecode AND A.datefirstinv=B.datefirstinv WHERE B.statecode IS NULL AND
B.coname IS NULL AND B.datefirstinv IS NULL;
--2049
DROP TABLE ExitKeysCleanMANoExit;
CREATE TABLE ExitKeysCleanMANoExit AS
SELECT A.*, B.issuername, B.issuerstate, B.issuedate
FROM MACleanNoDups AS A LEFT JOIN IPOCleanNoDups AS B ON A.coname=B.Coname AND A.statecode=B.statecode AND A.datefirstinv=B.datefirstinv WHERE B.statecode IS NULL AND
B.coname IS NULL AND B.datefirstinv IS NULL;
--7096
Since 2141-92=2019 and 7188-92=1096, we know that the duplicate companies were extracted successfully.
I then created a column that identifies whether a company underwent an MA or an IPO. A 0 indicates an MA and a 1 indicates an IPO.
DROP TABLE ExitKeysCleanMA;
CREATE TABLE ExitKeysCleanMA AS
SELECT *,
CASE
WHEN issuername IS NULL AND issuerstate IS NULL AND issuedate IS NULL THEN 0 ELSE NULL END AS MAvsIPO
FROM exitkeyscleanmanoexit;
--7096
DROP TABLE ExitKeysCleanIPO;
CREATE TABLE ExitKeysCleanIPO AS
SELECT *,
CASE
WHEN targetname IS NULL AND targetstate IS NULL AND announceddate IS NULL THEN 1 ELSE NULL END AS MAvsIPO
FROM exitkeyscleaniponoexit as A;
--2049
I then wrote a query to check which date was lower and used that to indicate whether I chose the company to have undergone an MA or an IPO in the column MSvsIPO(I chose based on which process came first). A 0 in the column represented an MA being chosen and a 1 represented an IPO being chosen.
FROM IPOMAForReview;
--92
Then out of this table I extracted the MAs and IPOs using the the created MAvsIPO flag:
DROP TABLE MASelected;
CREATE TABLE MASelected AS
SELECT A.coname, A.statecode, A.datefirstinv, B.targetname, B.targetstate, B.announceddate, B.issuername, B.issuerstate, B.issuedate, B.mavsipo
FROM IPOMASelected AS B LEFT JOIN companybasecore AS A ON A.coname=B.coname AND A.statecode=B.statecode AND A.datefirstinv=B.datefirstinv WHERE mavsipo=0;
--25
DROP TABLE IPOSelected;
CREATE TABLE IPOSelected AS
SELECT A.coname, A.statecode, A.datefirstinv, B.targetname, B.targetstate, B.announceddate, B.issuername, B.issuerstate, B.issuedate, B.mavsipo
FROM IPOMASelected AS B LEFT JOIN companybasecore AS A ON A.coname=B.coname AND A.statecode=B.statecode AND A.datefirstinv=B.datefirstinv WHERE mavsipo=1;
--67
I then put together all of the IPOs that I selected into one table and all of the MAs I selected into another table. I did this using UNION statements. I did this because I didn't want to duplicate any IPOs and UNION acts as a SELECT DISTINCT statement. Thus if the number of the two tables added together equaled the final count of the table, I did not have any duplicate rows.
DROP TABLE SelectedIPOSAll;
CREATE TABLE SelectedIPOSALL AS
SELECT A.coname, A.statecode, A.datefirstinv, A.targetname, A.targetstate, A.announceddate, A.issuername, A.issuerstate, A.issuedate, A.mavsipo FROM
ExitKeysCleanIPO AS A
UNION
SELECT IPOSelected.* FROM
IPOSelected;
--2116
--Makes sense because 2049+67=2116
DROP TABLE SelectedMASAll;
CREATE TABLE SelectedMASALL AS
SELECT A.coname, A.statecode, A.datefirstinv, A.targetname, A.targetstate, A.announceddate, A.issuername, A.issuerstate, A.issuedate, A.mavsipo FROM
ExitKeysCleanMA AS A
UNION
SELECT MASelected.* FROM
MASelected;
--7121
--Makes sense because 25+7096=7121
I then checked both of these files to make sure that their primary keys were still distinct and thus valid.
SELECT COUNT(*) FROM(SELECT DISTINCT targetname, targetstate, announceddate FROM SelectedMasAll)a;
--7121
SELECT COUNT(*) FROM(SELECT targetname, targetstate, announceddate FROM SelectedMasAll)a;
--7121
SELECT COUNT(*) FROM(SELECT issuername, issuerstate, issuedate FROM SelectedIPOMA)a;
--2116
SELECT COUNT(*) FROM(SELECT DISTINCT issuername, issuerstate, issuedate FROM SelectedIPOMA)a;
--2116
I combined the two tables and checked the result to make sure it Unioned correctly.
DROP TABLE SelectedIPOMA;
CREATE TABLE SelectedIPOMA AS
SELECT A.*
FROM SelectedMASALL AS A
UNION
SELECT B.*
FROM SelectedIPOSALL AS B;
--9237
SELECT COUNT(*) FROM SelectedIPOMA WHERE mavsipo=0;
--7121
SELECT COUNT(*) FROM SelectedIPOMA WHERE mavsipo=1;
--2116
SELECT COUNT(*) FROM SelectedIPOMA WHERE targetname IS NOT NULL;
--7188
SELECT COUNT(*) FROM SelectedIPOMA WHERE issuername IS NOT NULL;
--2141
Everything seems to check out, so we can move on to joining the SelectedIPOMA table to the companybasecore table to create the ExitKeysClean table.
DROP TABLE ExitKeysClean;
CREATE TABLE ExitKeysClean AS
SELECT A.coname, A.statecode, A.datefirstinv, B.targetname, B.targetstate, B.announceddate, B.issuername, B.issuerstate, B.issuedate, B.mavsipo
FROM companybasecore AS A
LEFT JOIN SelectedIPOMA AS B ON A.coname=B.coname AND A.statecode=B.statecode AND A.datefirstinv=B.datefirstinv;
--48000
Since 48000 is the same number of rows in the companybasecore, we know that the join was successful and no rows were added that we don't want.
==Create the PortCoExit Table==
From consulting with Ed and the VC Database Rebuild wiki, I decided to make the PortCoExit table with an mavsipo, an exitdate, an exited, and an exitvalue column. DROP TABLE PortCoExit;
CREATE TABLE PortCoExit AS
SELECT A.coname, A.statecode, A.datefirstinv,
CASE WHEN a.mavsipo=0 THEN 0::int WHEN a.mavsipo=1 THEN 1::int ELSE NULL::int END AS mavsipo,
CASE WHEN a.mavsipo=0 THEN B.announceddate WHEN a.mavsipo=1 THEN C.issuedate ELSE NULL::date END AS exitdate,
CASE WHEN a.mavsipo=0 OR a.mavsipo=1 THEN 1::int ELSE 0::int END AS exited,
CASE WHEN a.mavsipo=0 THEN B.transactionamt WHEN a.mavsipo=1 THEN C.proceedsamt ELSE NULL::money END AS exitvalue
FROM ExitKeysClean AS A
LEFT JOIN MANoDups AS B ON A.targetname=B.targetname AND A.targetstate=B.targetstate AND A.announceddate=B.announceddate
LEFT JOIN IPONoDups AS C ON A.issuername=C.issuer AND A.issuerstate=C.statecode AND A.issuedate=C.issuedate;
--48000
The issue that I am currently having is that the exitvalue column for MAs is always Null, and there are some weird numbers in the table. I will continue to look for these. I have been continuously checking for duplicates and validity of the primary keys, so I do not believe these problems to be due to duplicates