Difference between revisions of "VCDB23"
Line 1: | Line 1: | ||
− | [[VCDB23]] is the 2023 iteration of my venture capital database. The last previous build was [[ | + | [[VCDB23]] is the 2023 iteration of my venture capital database. The last previous build was [[VCDB20]], and this follows the same basic design. |
== Processing Steps == | == Processing Steps == | ||
Line 11: | Line 11: | ||
## The private and public M&A file sets have to be separately combined into 2 files after they've been normalized. Then replace \tnp\t and \tnm\t with \t\t in each. | ## The private and public M&A file sets have to be separately combined into 2 files after they've been normalized. Then replace \tnp\t and \tnm\t with \t\t in each. | ||
## For RoundOnOneLine, remove the footer, run NormalizeFixedWidth.pl first, then RoundOnOneLine.pl, and then fix the header. | ## For RoundOnOneLine, remove the footer, run NormalizeFixedWidth.pl first, then RoundOnOneLine.pl, and then fix the header. | ||
− | ## PortCoLongDescription must be pre-processed from the command line and then post-processed in excel (see VCDB20H1 and Vcdb4#Long_Description). | + | ## PortCoLongDescription must be pre-processed from the command line and then post-processed in excel (see VCDB20H1 and Vcdb4#Long_Description). However, I didn't load it for this run. |
# Create a new database on mother (createdb vcdb23) and setup a directory for the input files: E:\projects\vcdb23 | # Create a new database on mother (createdb vcdb23) and setup a directory for the input files: E:\projects\vcdb23 | ||
# Copy over and edit Load.sql. Run it section-by-section. | # Copy over and edit Load.sql. Run it section-by-section. |
Revision as of 10:25, 10 January 2023
VCDB23 is the 2023 iteration of my venture capital database. The previous build was VCDB20, and this build follows the same basic design.
Processing Steps
- Copy the rpt, ssh, and pl files over to E:\projects\vcdb23\SDC, then bulk edit the ssh files there.
  - Change 12/31/2020 (and one 07/20/2020) to 12/31/2022 and vcdb20 to vcdb23
- Run the ssh files against SDC Platinum. Note that SDC Platinum's service will be withdrawn on 31 December 2023.
- Run the SDC Normalizer script (one of the pl files) on each output
  - Fix the header row in USFirms1980.txt before normalizing (the Capital Under Management column name is too long).
  - Remove double quotes from USFund1980-normal.txt, USFundExecs1980-normal.txt, USPortCo1980-normal.txt, and USFirmBranchOffices1980.txt.
  - The private and public M&A file sets have to be combined separately, into two files, after they have been normalized. Then replace \tnp\t and \tnm\t with \t\t in each.
  - For RoundOnOneLine, remove the footer, run NormalizeFixedWidth.pl first, then RoundOnOneLine.pl, and then fix the header.
  - PortCoLongDescription must be pre-processed from the command line and then post-processed in Excel (see VCDB20H1 and Vcdb4#Long_Description). However, I didn't load it for this run.
- Create a new database on mother (createdb vcdb23) and set up a directory for the input files: E:\projects\vcdb23
- Copy over and edit Load.sql. Run it section-by-section.
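
The text clean-up steps above (bulk editing the ssh files, removing double quotes, and blanking the np/nm placeholders) could be scripted rather than done by hand. The following is only a minimal sketch in Python, not the method actually used for this build. The function names, the rewrite_file helper, and the latin-1 encoding default are all assumptions made for illustration; only the literal strings being replaced come from the steps above.

```python
import re

# Literal substitutions for the ssh session files, taken from the steps above.
SESSION_EDITS = {
    "12/31/2020": "12/31/2022",
    "07/20/2020": "12/31/2022",
    "vcdb20": "vcdb23",
}


def update_session_text(text):
    """Apply the date and database-name edits to the text of one ssh file."""
    for old, new in SESSION_EDITS.items():
        text = text.replace(old, new)
    return text


def strip_double_quotes(text):
    """Remove every double-quote character from a normalized file's text."""
    return text.replace('"', "")


def blank_placeholders(text):
    """Blank out tab-delimited fields whose whole value is 'np' or 'nm'.

    The lookahead leaves the closing tab unconsumed, so two placeholder
    fields in a row are both blanked. A plain replace of '\\tnp\\t' would
    skip the second one because the shared tab is already used up.
    """
    return re.sub(r"\t(?:np|nm)(?=\t)", "\t", text)


def rewrite_file(path, transform, encoding="latin-1"):
    """Hypothetical helper: rewrite a file in place through `transform`.

    The encoding is an assumption; newline="" keeps the original line
    endings untouched.
    """
    with open(path, "r", encoding=encoding, newline="") as fh:
        text = fh.read()
    with open(path, "w", encoding=encoding, newline="") as fh:
        fh.write(transform(text))
```

For example, rewrite_file("USFund1980-normal.txt", strip_double_quotes) would strip the quotes from one normalized file, and rewrite_file applied with blank_placeholders to each combined M&A file would handle the np/nm step. Note that blank_placeholders only touches fields that sit between two tabs, exactly as the \tnp\t pattern does, so a placeholder in the first or last column of a row is left alone.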