6/29 -- I filtered out accelerator and investment matches whose data was identical to the terms of joining given on the accelerator's website. I then matched this data against the cohort list for companies without cohort years. That turned up only 5 companies, so this approach will not get us the data we want. After calling Ed, I matched the list of company names from our data against itself, and the list of company names from Crunchbase against itself. The two output files have not been cleaned; they are in McNair\Software\Database Scripts\Crunchbase2 and have -MATCHED at the end of their file names.
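For reference, a rough sketch of what matching a name list against itself amounts to. This is not the matcher itself: the suffix list, `normalize`, and `self_match` are all hypothetical names made up for illustration, and the real normalization rules are probably different.

```python
import re
from collections import defaultdict

# Hypothetical suffix list; the real matcher's normalization rules may differ.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}

def normalize(name):
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def self_match(names):
    """Group distinct raw names by normalized key; keep keys with 2+ names."""
    groups = defaultdict(list)
    for name in names:
        key = normalize(name)
        if name not in groups[key]:
            groups[key].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}

if __name__ == "__main__":
    sample = ["Acme, Inc.", "ACME Inc", "Beta LLC", "Gamma"]
    print(self_match(sample))
```

Any key that comes back with more than one raw name is a "multiple match" in this sketch, i.e. a candidate duplicate that still needs a manual look.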
---------------------------------------------------------------------------------------------
7/9 -- Went through the output of the matcher to make sense of the results from last week. After talking to Dylan and Connor, we decided to go through all of the matches from our data that were flagged as multiple matches. In a file called 'company name self matches', orange highlights mark a minor normalization difference, red highlights mark a most likely duplicate, and yellow highlights mark entries that seem to be duplicates but where it wasn't obvious to me.
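A possible way to pre-sort flagged pairs before the manual pass, sketched under heavy assumptions. The highlighting was done by hand; the `triage` function, the `strip_format` helper, and the 0.9 / 0.75 thresholds below are hypothetical and only approximate the three colors. It uses `difflib.SequenceMatcher` from the Python standard library for the similarity score.

```python
import re
from difflib import SequenceMatcher

def strip_format(name):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def triage(a, b, high=0.9, low=0.75):
    """Suggest a highlight color for a pair of names (thresholds are guesses)."""
    x, y = strip_format(a), strip_format(b)
    if x == y:
        return "orange"  # differs only in case, spacing, or punctuation
    score = SequenceMatcher(None, x, y).ratio()
    if score >= high:
        return "red"     # very similar: most likely a duplicate
    if score >= low:
        return "yellow"  # somewhat similar: needs a closer look
    return "none"

if __name__ == "__main__":
    print(triage("Tech Stars", "techstars"))
    print(triage("Dropbox", "Dropboxx"))
```

The output is only a suggestion for where to look first; the final call on each pair would still be made by hand.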