Dropping the entire market is surely too extreme. We should drop only the offending portco, and drop the market only if its number of real matches falls below our threshold (e.g., 5).
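A minimal sketch of the proposed rule, not the project's actual code. It assumes real matches are available as (market, portco) pairs and that the offending portcos are already known; the function name, the input layout, and the threshold constant are hypothetical.

```python
from collections import defaultdict

# Threshold from the note above ("e.g., 5"); the final value is an assumption.
MIN_REAL_MATCHES = 5


def prune_markets(matches, offending, min_real=MIN_REAL_MATCHES):
    """Drop offending portcos, then drop markets left with too few real matches.

    matches:   iterable of (market, portco) pairs, one pair per real match
    offending: set of portco names to remove
    Returns a dict mapping each surviving market to its remaining portcos.
    """
    kept = defaultdict(list)
    for market, portco in matches:
        if portco not in offending:
            kept[market].append(portco)
    # A market survives only if it still has at least min_real real matches.
    return {m: p for m, p in kept.items() if len(p) >= min_real}
```

With this rule a market of six real matches survives the loss of one portco, while a market of five does not.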
==== Review of Changes ====
...HERE
==== pccitydollarsrankm1 ====
There are several possible explanations for why this variable has so many missing values.
There do seem to be missing placenames: 4,855 of 69,882 PortCoSuper records don't join to PlaceYearRanking on placename and state (ignoring year), and 4,561 of these have valid zips. However, only 263 had growth VC and just 82 had non-null positive invested amounts, so this isn't the issue.
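The join check above can be sketched as an anti-join on (placename, state). This is an illustration only: the record layout (dicts with `placename`, `state`, and `zip` keys) and the function name are assumptions, not the actual PortCoSuper or PlaceYearRanking schema.

```python
def unmatched_places(portcos, rankings):
    """Return portco records whose (placename, state) has no ranking row.

    Year is ignored: a place counts as matched if it appears in any year.
    """
    known = {(r["placename"], r["state"]) for r in rankings}
    return [p for p in portcos if (p["placename"], p["state"]) not in known]


def count_with_zip(records):
    """Count records carrying a non-empty zip (hypothetical validity test)."""
    return sum(1 for r in records if r.get("zip"))
```

Further filters, such as growth VC or positive invested amounts, would be applied to the returned list in the same way.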
==== pcnumperson / pcexp ====
pcnumperson suffers from a number of endogeneity issues, including:
ELSE 0::int END AS matchinstagebroad,
=== V1 Changes to date ===
Code is in E:\projects\unobservedcomplementarities\BuildDataset.sql
Changes:
*Changed MatchHighestRandom to MatchMaster. It is MatchMostNumerous (i.e., pick the firm with max(numportcos) for each portco from RLMaster) with a random tie break. It contains a lot of variables pertaining to the portco, firm, round, and match!
*MatchKeys is coname, statecode, datefirstinv, firmname, as well as minroundin, year, code, code20, code100. It replaces RealMatchesCode.
*Replaced SynRealSetc20 with SynthKeys_Code20.
*Replaced AllRealMatchKeysC20Code with ComboKeys_Code20, and renamed the realmatch variable to isreal.
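The MatchMostNumerous rule behind MatchMaster (pick the firm with the largest numportcos for each portco, breaking ties at random) could look roughly like the sketch below. It is not the code in BuildDataset.sql; the input layout of (portco, firm, numportcos) tuples, the function name, and the seeded generator are assumptions made for illustration.

```python
import random


def match_most_numerous(candidates, seed=0):
    """Pick one firm per portco: highest numportcos, random tie break.

    candidates: iterable of (portco, firm, numportcos) tuples
    seed:       seeds the tie break so that reruns give the same matches
    Returns a dict mapping portco -> chosen firm.
    """
    rng = random.Random(seed)
    best = {}  # portco -> (highest numportcos seen, firms tied at that value)
    for portco, firm, n in candidates:
        top_n, firms = best.get(portco, (None, []))
        if top_n is None or n > top_n:
            best[portco] = (n, [firm])
        elif n == top_n:
            firms.append(firm)
    return {portco: rng.choice(firms) for portco, (_, firms) in best.items()}
```

Seeding the tie break is a design choice: it keeps the matched set stable across rebuilds, which matters if downstream tables are keyed on the chosen firm.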