1,865 bytes removed ,  17:27, 19 June 2017
Old notes are at [[Old Venture Capital Data Work]]
All pulls and processing scripts are in: E:\McNair\Projects\{{Retrieving US VC Database

Each of the following has a .rpt, .ssh, and .txt file, and has the following constraints in its search:
*Venture related deals
*Round date in 1/1/1980 to 06/15/2017

The portfolio company (inc. round) based pulls also have:
*Portfolio company nation = United States

The datasets retrieved from SDC Platinum (on June 15-16th 2017) are:
*Portfolio companies (USVC1980-present.ssh) - attributes to be extracted from round-level information
*Portfolio company descriptions - just the portco name and the long description. Custom processed.
*Round-on-one-line (USVCRound1980-present) - processed using RoundOnOneLine.pl
*Funds (USVCFund1980-present)
*Firms (VCFirms1980-present) - includes branch offices, so attributes must be extracted
*IPOs
*M&A

All files were processed with NormalizeFixedWidth.pl (after footers were removed) unless otherwise indicated. Some files required minor post-processing to load into PostgreSQL. Issues included:
*Firm level data didn't normalize correctly - had to adjust headers
*Stray quotation mark in address line
*Area code had a 1- in it
*Some line counts were off by one or two
*The "Firm Capital under Mgmt" column header for VCFirms has a {0mil} which breaks the normalizer. Delete this part of the column title before running the normalizer, and make sure to put in the proper number of spaces. }
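
The header fix for the {0mil} text could be scripted rather than done by hand. The following is a minimal sketch, not the project's actual procedure. It assumes that "the proper number of spaces" means replacing the token with the same number of spaces, so that every later column keeps its character offset. The function name and the sample header are hypothetical.

```python
def blank_out_token(header_line: str, token: str = "{0mil}") -> str:
    """Replace `token` with spaces of equal length.

    Assumption: the fixed-width normalizer relies on character offsets,
    so the header must keep its original length after the token is removed.
    """
    return header_line.replace(token, " " * len(token))


# Hypothetical header fragment; the real VCFirms header will differ.
sample = "Firm Name     Firm Capital under Mgmt {0mil}  Firm City"
fixed = blank_out_token(sample)
print(fixed)
```

If the normalizer instead expects the column to shrink, the replacement string would need to change accordingly.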
Additional datafiles (in E:\McNair\Projects\VC Database):
*GeocodedVCData.txt - 43,724 records, tab-delimited, with company names but with "none" for some geocoordinates.
==Loading the data into SQL==
 
The SQL script and load data are in:
Z:\VentureCapitalData\SDCVCData
 
The load script is broken into separate SQL scripts:
*LoadFirms.sql
*LoadFunds.sql
*LoadRound.sql
*LoadRoundbase.sql
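
The four load scripts listed above could be run in sequence with psql's -f option. This is only a sketch: the database name is a placeholder, and the run order simply follows the order in which the scripts are listed here, which may not be the required order.

```python
import subprocess

# Script names as listed on this page; the order is an assumption.
LOAD_SCRIPTS = ["LoadFirms.sql", "LoadFunds.sql", "LoadRound.sql", "LoadRoundbase.sql"]


def build_load_commands(dbname, scripts=LOAD_SCRIPTS):
    """Return one psql command (as an argument list) per load script."""
    return [["psql", "-d", dbname, "-f", script] for script in scripts]


def run_load(dbname, script_dir):
    """Run each script from script_dir, stopping if psql exits non-zero."""
    for cmd in build_load_commands(dbname):
        subprocess.run(cmd, cwd=script_dir, check=True)
```

For example, run_load("vcdb", r"Z:\VentureCapitalData\SDCVCData") would run the scripts from the directory given above, where "vcdb" is a hypothetical database name.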
 
Issues:
*VCFirms line 7139: the text 'L"opera' has a stray quotation mark, which prevents the copy into the psql table. Remove the stray quotation mark manually.
*VCFirms line 12461: the text '1-8' has a hyphen, which causes an import error when copying into the psql table. Remove the hyphen manually.
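
The two manual fixes above could also be applied with a small script, which makes them repeatable if the file is pulled again. The sketch below assumes that the line numbers are 1-indexed and count from the first line of the file, and that the file can be read as latin-1. The function names and file paths are hypothetical. The script raises an error rather than guessing if the expected text is not on the stated line.

```python
def patch_line(lines, line_no, old, new):
    """Replace the first `old` with `new` on the 1-indexed line `line_no`."""
    idx = line_no - 1
    if old not in lines[idx]:
        raise ValueError(f"{old!r} not found on line {line_no}")
    lines[idx] = lines[idx].replace(old, new, 1)
    return lines


def apply_fixes(path_in, path_out, fixes, encoding="latin-1"):
    """Read path_in, apply (line_no, old, new) fixes, write to path_out."""
    # newline="" keeps the original line endings untouched.
    with open(path_in, encoding=encoding, newline="") as f:
        lines = f.readlines()
    for line_no, old, new in fixes:
        patch_line(lines, line_no, old, new)
    with open(path_out, "w", encoding=encoding, newline="") as f:
        f.writelines(lines)


# Fixes described on this page: drop the stray quote and the hyphen.
VCFIRMS_FIXES = [
    (7139, 'L"opera', "Lopera"),
    (12461, "1-8", "18"),
]
```

Writing to a separate output file keeps the original pull intact for comparison.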
==Processing the base tables==
