The third error is in an area code where 1-8 is written. This hyphen causes errors. Interestingly, the line number given by PuTTY was correct, and I found it in my text file and deleted it manually.
These were the only errors I encountered while loading this table.
==Instructions on Matching PortCos to Issuers and M&As From Ed==
Get portco keys
DROP TABLE portcokeys;
CREATE TABLE portcokey AS
SELECT coname, statecode, datefirst
FROM portcocore;
--CHECK COUNT IS SAME AS portcocore OR THESE KEYS ARE VALID AND FIX THAT FIRST
Get distinct coname and put it in a file
\COPY (SELECT DISTINCT coname FROM portcokeys) TO 'DistinctConame.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
Match that to itself
Move DistinctConame.txt to E:\McNair\Software\Scripts\Matcher\Input
Open powershell and change directory to E:\McNair\Software\Scripts\Matcher
Run the matcher in mode2:
perl Matcher.pl -file1="DistinctConame.txt" -file2="DistinctConame.txt" -mode=2
Pick up the output file from E:\McNair\Software\Scripts\Matcher\Output (it is probably called DistinctConame.txt-DistinctConame.txt.matched) and move it to your Z drive directory
Load the matches into the dbase
DROP TABLE PortcoStd;
CREATE TABLE PortcoStd (
conamestd varchar(255),
coname varchar(255),
norm varchar(100),
x1 varchar(255),
x2 varchar(255)
);
\COPY CohortCoStd FROM 'DistinctConame.txt-DistinctConame.txt.matched' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--YOUR COUNT
Join the Conamestd back to the portcokeys table to create your matching table
DROP TABLE portcokeysstd;
CREATE TABLE portcokeysstd AS
SELECT B.conamestd, A.*
FROM portcokey AS A
JOIN PortcoStd AS B ON A.coname=B.coname
--CHECK COUNT IS SAME AS portcokey OR YOU LOST SOME NAMES OR INFLATED THE DATA
Put that in a file for matching (conamestd is in first column by construction)
\COPY portcokeysstd TO 'PortCoMatchInput.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--YOUR COUNT
Now prepare to repeat that process for M&A's and IPOs:
*For M&As your keys (for now) will be targetname, statecode, dateannounced
*For IPOs your keys (for now) will be issuername, statecode, issuedate
*FIRST CLEAN EACH DATASET. The easiest way to remove duplicates (if you have lots of them) is to use an aggregate query:
DROP TABLE IPOCoreNoDups;
CREATE TABLE IPOCoreNoDups as
SELECT issuername, statecode, issuedate, max(var1) as var1, avg(var2) as var2, ...
FROM IPOCore GROUP BY issuername, statecode, issuedate ORDER BY issuername, statecode, issuedate;
Note that you need all vars to be inside aggregates and that you should choose the aggregate function sensibly by looking at the data. Generally use MAX for amounts and MIN for dates. You can also use MAX or MIN on text strings.
And now build the same stacks as before but to create Issuerkeystd and TargetKeystd (or whatever you call them). Make sure that issuerstd (and targetnamestd) is in the first column.
Now match Portcokeystd to Issuerkeystd, and match Portcokeystd to Targetkeystd
*Move the files into the input director as before
*Run the matcher script but WITHOUT mode 2:
perl Matcher.pl -file1="PortCoMatchInput.txt" -file2="IssuerMatchInput.txt"
perl Matcher.pl -file1="PortCoMatchInput.txt" -file2="TargetMatchInput.txt"
Open each of these files in excel and mark good matches with 1s and bad matches with 0s by adding columns to compare dates, states, etc, and filtering.
When you are done:
*Build a new sheet of just good matches.
*Save the excel files
*Copy each of your match sheets to a text file
*CREATE TABLE to reflect the data you are going to load (include std names and keys)
*\COPY the data (using the exact copy command above but changing the table and file names) into the table
*Celebrate!
*Next we'll deal with any firms that have an IPO and an M&A and decide which we'll keep
*And then we'll join in the chosen IPO and M&A data and move on!