Both of these projects (and as a corollary, this project) are dependent on the [[Demo Day Page Parser]], [[Industry Classifier]], and the [[Whois Parser]].
==7/9/18 Update: Most Recent Work==
Here's a project update on the work that has been done since coming to McNair.
The most recent sheet of cohort company info is the '''Cohorts Final''' sheet of '''The File to Rule Them All.xlsx'''.
Working with [[Maxine Tao]], we have matched companies to their respective pages and information in Crunchbase (via UUID). We ensured unique matches by doing a 1-1-1-1 match between our data and Crunchbase (using the Matcher). This gave us additional information on 8092 companies. The following new information (on top of what we already had) is included in the sheet:
*short_description
*long_description
*status (was merged with costatus)
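
As a rough illustration of the uniqueness idea behind the matching step above, here is a minimal Python sketch. It is not the Matcher itself: the function names, the name normalization, and the input shapes are all assumptions made for this example. It keeps a company only when its normalized name occurs exactly once in our list and exactly once in the Crunchbase list.

```python
from collections import Counter


def normalize(name):
    """Lowercase and collapse whitespace (hypothetical normalization rule)."""
    return " ".join(name.lower().split())


def one_to_one_matches(our_names, crunchbase_rows):
    """Map normalized company name -> Crunchbase UUID for unique matches only.

    our_names: iterable of company names from our sheet (assumed input shape).
    crunchbase_rows: iterable of (name, uuid) pairs (assumed input shape).
    """
    ours = Counter(normalize(n) for n in our_names)
    cb_rows = [(normalize(n), uuid) for n, uuid in crunchbase_rows]
    theirs = Counter(key for key, _ in cb_rows)
    # Keep a name only if it appears exactly once on each side.
    return {
        key: uuid
        for key, uuid in cb_rows
        if ours[key] == 1 and theirs[key] == 1
    }


our_companies = ["Acme Inc", "Beta  Labs", "Gamma", "Gamma"]
crunchbase = [
    ("acme inc", "u1"),
    ("Beta Labs", "u2"),
    ("beta labs", "u3"),
    ("Gamma", "u4"),
    ("Delta", "u5"),
]
matches = one_to_one_matches(our_companies, crunchbase)
```

Names that are ambiguous on either side are dropped rather than guessed, so they can be reviewed by hand.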
===The Equity Variables: COMPLETE===
[[Maxine Tao]] and I have added six new variables to the '''Accelerators Final''' sheet. Those variables are:
*The average % of equity among accelerators who take equity is 6.49% (rough estimate; do not use for anything official). We got this number by looking only at accelerators who take equity, coding any reported range as its midpoint (e.g. 4%-10% equity would be coded as 7% equity), and taking the mean.
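
The averaging rule described above can be sketched in Python as follows. This is a minimal sketch under assumptions: the function names and the raw string format (values such as "6%" or "4%-10%", with blank or 0% meaning no equity) are hypothetical and are not taken from the actual sheet.

```python
import re
from statistics import mean


def equity_midpoint(raw):
    """Return the equity percentage as a float, or None if no equity is taken.

    A range such as "4%-10%" is coded as its midpoint (7.0).
    """
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", raw or "")]
    if not numbers:
        return None
    # A single value is its own midpoint; a range uses its two endpoints.
    value = mean(numbers[:2])
    return value if value > 0 else None


def average_equity(raw_values):
    """Mean equity over accelerators that take equity; None if there are none."""
    coded = [equity_midpoint(v) for v in raw_values]
    takers = [v for v in coded if v is not None]
    return mean(takers) if takers else None


# Made-up sample values for illustration only (not data from the sheet).
sample = ["4%-10%", "6%", "", "5%"]
sample_average = average_equity(sample)
```

Accelerators with no reported equity are excluded from the denominator, which is why the result describes only accelerators that take equity.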
===Matching Accelerators to UUIDs via Crunchbase: COMPLETE===
We've also added UUIDs for 163 of our 166 accelerators. The UUIDs can be found in Column AE of the '''Accelerators Final''' sheet.
This is the master file and should never be modified unless we find that a UUID has changed. ALL OTHER SHEETS with UUIDs are linked to this sheet, so any change made here will be reflected elsewhere.
More information can be found on the [[Crunchbase Data]] page.
===Linking Accelerators to Founders/LinkedIn Crawling: COMPLETE===
[[Grace Tan]] got the [[LinkedIn Crawler (Python)]] to work, which means we currently have the following information about accelerator founders: