Difference between revisions of "VCDB23"

From edegan.com
Latest revision as of 18:37, 29 December 2023

VCDB23 is the 2023 iteration of my venture capital database. The previous build was VCDB20, and this one follows the same basic design. This build was partial: not all location data was included. It was superseded by VCDB24.

Processing Steps

  1. Copy over the rpt, ssh, and pl files to E:\projects\vcdb23\SDC, then bulk edit the ssh files.
    1. Change 12/31/2020 (and one 07/20/2020) to 12/31/2022, and vcdb20 to vcdb23.
  2. Run the ssh files against SDC Platinum. Note that SDC Platinum's service will be withdrawn on 31 December 2023.
  3. Run the SDC Normalizer script (one of the pl files) on each output file.
    1. Fix the header row in USFirms1980.txt before normalizing (the Capital Under Management column name is too long)
    2. Remove double quotes from USFund1980-normal.txt, USFundExecs1980-normal.txt, USPortCo1980-normal.txt, USFirmBranchOffices1980.txt
    3. The private and public M&A file sets must each be combined into a single file (two files in total) after they have been normalized. Then replace \tnp\t and \tnm\t with \t\t in each.
    4. For RoundOnOneLine, remove the footer, run NormalizeFixedWidth.pl first, then RoundOnOneLine.pl, and then fix the header.
    5. PortCoLongDescription must be pre-processed from the command line and then post-processed in Excel (see VCDB20H1 and Vcdb4#Long_Description). However, I didn't load it for this run.
  4. Create a new database on mother (createdb vcdb23) and set up a directory for the input files: E:\projects\vcdb23
  5. Copy over and edit Load.sql. Run it section-by-section.
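
The text clean-up sub-steps above (the ssh bulk edit in step 1, and the quote removal and \tnp\t / \tnm\t replacement in step 3) could be scripted rather than done by hand. The following is a minimal Python sketch, not the method actually used for this build. The function names are hypothetical, the UTF-8 default encoding is an assumption about the SDC output, and it reads step 1.1 as meaning that both old dates become 12/31/2022.

```python
import re

# Step 1.1: substitutions for the ssh files (assumes both old dates map to 12/31/2022).
SSH_SUBSTITUTIONS = {
    "12/31/2020": "12/31/2022",
    "07/20/2020": "12/31/2022",
    "vcdb20": "vcdb23",
}


def update_ssh_text(text):
    """Apply the date and database-name substitutions to the contents of one ssh file."""
    for old, new in SSH_SUBSTITUTIONS.items():
        text = text.replace(old, new)
    return text


def strip_double_quotes(text):
    """Step 3.2: remove every double-quote character from a normalized file."""
    return text.replace('"', "")


def blank_np_nm(text):
    """Step 3.3: turn tab-delimited 'np' and 'nm' fields into empty fields.

    The lookahead leaves the trailing tab unconsumed, so two adjacent
    markers that share a tab are both blanked in a single pass.
    """
    return re.sub(r"\t(?:np|nm)(?=\t)", "\t", text)


def rewrite_file(path, transform, encoding="utf-8"):
    """Rewrite a file in place through `transform`.

    newline="" keeps the original line endings untouched.
    The encoding default is an assumption; adjust it to match the SDC output.
    """
    with open(path, "r", encoding=encoding, newline="") as fh:
        text = fh.read()
    with open(path, "w", encoding=encoding, newline="") as fh:
        fh.write(transform(text))
```

For example, rewrite_file(r"E:\projects\vcdb23\SDC\USFund1980-normal.txt", strip_double_quotes) would apply the quote removal to one of the files named in step 3.2. As in the manual replacement, blank_np_nm only touches markers that have a tab on both sides, so a marker in the first or last column of a line is left alone.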