To run the linear probability model (LPM), we need to build a new dataset. This was partially done in the Stata do-file explained above, but doing it in SQL gives us more flexibility in choosing the synthetic matches.
The end result is a table that lists all matches that could have occurred in every possible market, including the real one.

First, we need to define exactly what a market is. Here, a market consists of all matches that occurred in a given year within a given industry sector, where the sector is defined by an industry code. The size and type of a market therefore hinge on which industry code is used.

Three categories, each more granular than the last, define a startup's industry. The broadest is the industry Class, with three categories: 'Information Technology', 'Medical/Health/Life Science' and 'Non-High Tech'. Next is the Minor group, with categories such as Communications and Media, Computer Hardware, Biotech, or Consumer Related. The finest is the Subgroup, which is very specific, for example Wireless Communication Services or Medical Imaging. An industry code is then a 4-digit number in which the first digit identifies the industry Class, the second the Minor group, and the last two the Subgroup. We aggregate Subgroups with fewer than 20 observations (i.e., startups) into an 'Other' category to create 'code20', and build an analogous 'code100' for Subgroups with fewer than 100 observations.

We want to create a table that lists, for each unique portco, all the firms in its eligible market, i.e., firms that were active in the year the portco received its first investment from its real matched VC and that invested in a portco with the same code100 (or code20) in that year.
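As a rough illustration of the code structure and the aggregation step, here is a minimal Python sketch. It is not the project's SQL. The function names, the sample codes, and the choice of a single global 'Other' label (rather than one 'Other' per Minor group) are assumptions made for the example.

```python
from collections import Counter


def split_code(code):
    """Split a 4-digit industry code into (class, minor group, subgroup) digits."""
    s = str(code)
    if len(s) != 4 or not s.isdigit():
        raise ValueError(f"expected a 4-digit code, got {code!r}")
    return s[0], s[1], s[2:]


def aggregate_codes(startup_codes, threshold, other="Other"):
    """Keep a code if at least `threshold` startups carry it, else relabel it.

    Assumption: all small Subgroups are pooled into one global `other` label.
    """
    counts = Counter(startup_codes)
    return [c if counts[c] >= threshold else other for c in startup_codes]


# Hypothetical data: one industry code per startup.
codes = ["1234"] * 3 + ["1250"] * 1 + ["2101"] * 2

# With threshold=20 this would play the role of 'code20'; a tiny threshold
# is used here only so the toy data shows both outcomes.
code_small = aggregate_codes(codes, threshold=2)
print(split_code("1234"))
print(code_small)
```

Calling `aggregate_codes(codes, threshold=100)` would give the analogue of 'code100' under the same assumptions.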
After that, we can simply append (union) the real match table and calculate the variables from the original dataset on this new table.
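The construction described above can be sketched with an in-memory SQLite database driven from Python. This is only an illustration under stated assumptions, not the project's actual script: the table name `real_matches`, its columns, the `matched` indicator, and the sample rows are all hypothetical, and each portco is assumed to have exactly one real match (its first investment).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE real_matches (portco TEXT, firm TEXT, year INTEGER, code TEXT);
INSERT INTO real_matches VALUES
  ('P1', 'VC_A', 2000, '1234'),
  ('P2', 'VC_B', 2000, '1234'),
  ('P3', 'VC_C', 2000, '2101'),
  ('P4', 'VC_A', 2001, '1234');
""")

# Real matches get matched = 1. Synthetic matches pair each portco with every
# other firm that invested in the same (year, code) market and get matched = 0.
rows = conn.execute("""
SELECT r.portco, r.firm, r.year, r.code, 1 AS matched
FROM real_matches AS r
UNION ALL
SELECT DISTINCT p.portco, f.firm, p.year, p.code, 0 AS matched
FROM real_matches AS p
JOIN real_matches AS f
  ON f.year = p.year AND f.code = p.code
WHERE f.firm <> p.firm
ORDER BY portco, matched DESC, firm
""").fetchall()

for row in rows:
    print(row)
```

The `matched` column is what a linear probability model would use as its dependent variable. Markets with a single active firm (P3 and P4 in the toy data) contribute only their real match.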
The final SQL code that builds this table is called 'CreatingLPM_withoutsyn.sql' when using code100, and 'CreatingLPM_withoutsyn_code20.sql' when using code20.

===Histograms===
The code 'Histograms.sql' exports two tables, called 'DistribCode100.sql' and 'DistribCode20.sql', to Z:\VentureCapitalData\SDCVCData\vcdb2. I then import them into Excel and create histograms to characterize the distribution of market size. The Excel file is in E:\McNair\Projects\MatchingEntrepsToVC\Stata\Tex
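For readers who want to check the distribution without Excel, the counting step might look like the following Python sketch. It assumes that market size means the number of distinct firms active in a (year, code) market, which may differ from the definition used in 'Histograms.sql'. The function name and sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def market_size_distribution(matches):
    """Return (size per market, number of markets of each size).

    `matches` is an iterable of (portco, firm, year, code) real matches.
    Assumption: market size = distinct firms in a (year, code) market.
    """
    firms_by_market = defaultdict(set)
    for _portco, firm, year, code in matches:
        firms_by_market[(year, code)].add(firm)
    sizes = {market: len(firms) for market, firms in firms_by_market.items()}
    return sizes, Counter(sizes.values())


# Hypothetical real matches.
sample = [
    ("P1", "VC_A", 2000, "1234"),
    ("P2", "VC_B", 2000, "1234"),
    ("P3", "VC_C", 2000, "2101"),
    ("P4", "VC_A", 2001, "1234"),
]

sizes, distribution = market_size_distribution(sample)
for size in sorted(distribution):
    print(size, distribution[size])
```

The (size, count) pairs printed at the end are the kind of table that can be plotted directly as a histogram.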