Difference between revisions of "VC Database Rebuild"

From edegan.com
Jump to navigation Jump to search
Line 46: Line 46:
 
  CREATE TABLE companybase AS
 
  CREATE TABLE companybase AS
 
  SELECT DISTINCT  
 
  SELECT DISTINCT  
  coname,updateddate,foundingdate,datelastinv,datefirstinv,investedk,city,description,msa,msacode,nationcode,statecode,addr1,ad
+
  coname,updateddate,foundingdate,datelastinv,datefirstinv,investedk,city,description,msa,msacode,nationcode,statecode,addr1,addr2,indclass,indsubgroup3,indminor,url,zip  
dr2,indclass,indsubgroup3,indminor,url,zip  
 
 
  FROM roundbase
 
  FROM roundbase
 
  ORDER BY coname;
 
  ORDER BY coname;
Line 56: Line 55:
 
  FROM roundbase
 
  FROM roundbase
 
  ORDER BY coname;
 
  ORDER BY coname;
 +
 +
==Creating Stage Flags Table==
 +
Stage flags will be used to later on to determine if a company received seed, early or later stage financing. The growthflag is '1' if either the seed, early or later flags is '1'. The exclude flag is used to exclude all companies that received financing for activities we are not interested in and thus should be excluded from our dataset. Entries like 'Open Market Purchase', 'PIPE', etc are the things that the exclude flag filters out. It is built off the round table.
 +
DROP TABLE stageflags;
 +
CREATE TABLE stageflags AS
 +
SELECT coname, statecode, datefirstinv, rounddate, stage3,
 +
CASE
 +
  WHEN stage3 = 'Seed' THEN 1::int
 +
  ELSE 0::int
 +
END AS seedflag,
 +
CASE
 +
  WHEN stage3 = 'Early Stage' THEN 1::int
 +
  ELSE 0::int
 +
  END AS earlyflag,
 +
CASE
 +
  WHEN stage3 = 'Later Stage' THEN 1::int
 +
  ELSE 0::int
 +
  END AS laterflag,
 +
CASE
 +
  WHEN stage3 = 'Seed' OR stage3 = 'Later Stage' OR stage3 = 'Early Stage' THEN 1::int
 +
  ELSE 0::int
 +
  END AS growthflag,
 +
CASE
 +
  WHEN stage3 = 'Acq. for Expansion' OR stage3 = 'Acquisition' OR stage3 = 'Bridge Loan' OR stage3 = 'Expansion' OR stage3 = 'Pending Acq' OR stage3 = 'Recap or Turnaround' OR stage3 = 'Mezzanine' THEN 1::int
 +
  ELSE 0::int
 +
  END AS transactionflag,
 +
CASE
 +
  WHEN stage3 = 'LBO' OR stage3 = 'MBO' OR stage3 = 'Open Market Purchase' OR stage3 = 'PIPE' OR stage3 = 'Secondary Buyout'
 +
OR stage3 = 'Other' OR stage3 = 'VC Partnership' OR stage3 = 'Secondary Purchase' THEN 1::int
 +
  ELSE 0::int
 +
  END AS excludeflag
 +
FROM round;

Revision as of 12:23, 17 July 2017

Plan

Rebuild roundbase, round, geo, ipos, mas from SDC data. Create companybase from roundbase Create round from roundbase. Build stageflags from round.

Clean companybase by putting flags for Undisclosed Company, US location. Check if key (coname, statecode, datefirstinv) is valid. Remove duplicates manually/update command from roundbase. Check if round key is valid. Remove duplicates.

Build statelookup tables and roundlookup tables.

Clean firmbase tables. Clean ipo tables. Clean mas table.

Run matcher on ipos, companybase. Matcher on mas, companybase. Fix duplicate matches.

Join ipos and companybase. Check if count is valid. Fix match as required. Pull ipo key into companybase and companybase key into ipo table first. Then join.

Join mas and companybase. Check if count is valid. Fix match as required. Pull mas key into companybase and companybase key into mas table first. Then join.

Join ipocompanybase with macompanybase to get a table of portcos, ipos and mas.

Calculate exit date based on ipo, ma, datelastinv + 5 years.

Pull in sel flag into companybase and build dead or alive flag.

Match geodata to companybase. Pull geokey into companybase table. Lookup addresses to get geo data as required using geo.py.

Clean fundbase and check valid key (fundname, statecode, firstinvdate)

Clean firmbase and check valid key (firmname, foundingdate)

Loading starting data into database

Database is named vcdb2. It is located in /bulk/VentureCapitalData/SDCVCData. Launch with psql vcdb2. Load the following tables by running the commands below. Make sure the sql scripts and data txt files are all located in the folder. Check that the line numbers copied into your new tables match the line numbers in the Load files.

\i LoadFunds.sql
\i LoadIPOs.sql
\i LoadRoundbase.sql
\i LoadFirms.sql
\i LoadGeoData.sql
\i LoadLongDescription.sql
\i LoadRound.sql

Creating Base Tables

Create the base tables, companybase and round, by running the following scripts. These are the initial tables you will need to clean and join in order to get the master tables.

DROP TABLE companybase;
CREATE TABLE companybase AS
SELECT DISTINCT 
coname,updateddate,foundingdate,datelastinv,datefirstinv,investedk,city,description,msa,msacode,nationcode,statecode,addr1,addr2,indclass,indsubgroup3,indminor,url,zip 
FROM roundbase
ORDER BY coname;
DROP TABLE round;
CREATE TABLE round AS
SELECT DISTINCT coname,statecode,datefirstinv,rounddate,stage1,stage3,rndamtdisck,rndamtestk,roundnum,numinvestors
FROM roundbase
ORDER BY coname;

Creating Stage Flags Table

Stage flags will be used to later on to determine if a company received seed, early or later stage financing. The growthflag is '1' if either the seed, early or later flags is '1'. The exclude flag is used to exclude all companies that received financing for activities we are not interested in and thus should be excluded from our dataset. Entries like 'Open Market Purchase', 'PIPE', etc are the things that the exclude flag filters out. It is built off the round table.

DROP TABLE stageflags;
CREATE TABLE stageflags AS
SELECT coname, statecode, datefirstinv, rounddate, stage3,
CASE
 WHEN stage3 = 'Seed' THEN 1::int
 ELSE 0::int
END AS seedflag,
CASE
 WHEN stage3 = 'Early Stage' THEN 1::int
 ELSE 0::int
 END AS earlyflag,
CASE
 WHEN stage3 = 'Later Stage' THEN 1::int
 ELSE 0::int
 END AS laterflag,
CASE
 WHEN stage3 = 'Seed' OR stage3 = 'Later Stage' OR stage3 = 'Early Stage' THEN 1::int
 ELSE 0::int
 END AS growthflag,
CASE
 WHEN stage3 = 'Acq. for Expansion' OR stage3 = 'Acquisition' OR stage3 = 'Bridge Loan' OR stage3 = 'Expansion' OR stage3 = 'Pending Acq' OR stage3 = 'Recap or Turnaround' OR stage3 = 'Mezzanine' THEN 1::int
 ELSE 0::int
 END AS transactionflag,
CASE
 WHEN stage3 = 'LBO' OR stage3 = 'MBO' OR stage3 = 'Open Market Purchase' OR stage3 = 'PIPE' OR stage3 = 'Secondary Buyout' 
OR stage3 = 'Other' OR stage3 = 'VC Partnership' OR stage3 = 'Secondary Purchase' THEN 1::int
 ELSE 0::int
 END AS excludeflag
FROM round;