Marcos Ki Hyung Lee (Work Log)
Summer 2018
2018-06-25:
Studied the SQL code that creates the variables synsumprevsameindu100 synsumprevsameindu20 synsumprevsameindu synsumprevsamesector synnumprevportcos syntotsameindu100 syntotsameindu20 syntotsameindu syntotsamesector (and also the non-synthetic ones).
Found some problems with the non-synthetic ones: there is double counting when a VC met with more than one portco on the same day AND the portcos had the same industry code. The code subtracts 1 from the sum of dummies that indicate the same industry code, but in these instances it should be subtracting more.
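A minimal sketch of the double counting, in Python rather than the project's SQL. The toy data, the function names, and the assumption that the join keeps rows dated on or before the meeting date are all hypothetical; the actual SQL may differ.

```python
from datetime import date

# Hypothetical toy data for one VC: (portco, meeting date, industry code).
meetings = [
    ("A", date(2001, 1, 10), 100),
    ("B", date(2003, 5, 2), 100),
    ("C", date(2003, 5, 2), 100),  # same day and same code as B
]

def count_minus_one(rows, target):
    """Assumed form of the current logic: sum same-code rows dated on or
    before the target's date, then subtract 1 for the target itself."""
    _, t_date, t_code = target
    same = sum(1 for _, d, c in rows if d <= t_date and c == t_code)
    return same - 1

def count_strictly_before(rows, target):
    """One possible alternative: only count same-code rows dated strictly
    before the target's date, so no subtraction is needed."""
    _, t_date, t_code = target
    return sum(1 for _, d, c in rows if d < t_date and c == t_code)

naive_b = count_minus_one(meetings, meetings[1])         # counts A, B, C, minus 1
strict_b = count_strictly_before(meetings, meetings[1])  # counts A only
```

Under these assumptions, portco B gets a previous-same-industry count of 2 from the subtract-one logic, although only A was met earlier.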
Also, the synthetic counterparts have weird values. The historical ones (prior to meeting the portco) are mostly 0 or -1, while the all-time ones have lots of missing values. Initially I thought it was an error in the code, but after thinking about it, I think it is a feature of the randomization. To correct the negative numbers, I think we should not subtract 1 from the sum of dummies. We did that to account for the repeated portcos that showed up in the blowout table, but these repetitions no longer happen, since we are joining a table of synthetic matches with real matches.
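A minimal sketch of why the subtraction can produce -1 for synthetic matches, in Python rather than the project's SQL. The data and names are hypothetical. The key assumption is that a synthetic portco does not appear in the VC's real investment history, so there is no self-match to remove.

```python
# Hypothetical real history for one VC: (portco, year, industry code).
real_history = [("A", 2001, 200), ("B", 2003, 300)]

# Hypothetical synthetic match: a portco the VC never actually invested in.
synthetic_match = ("S", 2004, 100)

def sum_prev_same_code(history, match, subtract_one):
    """Sum the same-code dummies over the real history up to the match year,
    optionally subtracting 1 as the non-synthetic code does."""
    _, year, code = match
    total = sum(1 for _, y, c in history if y <= year and c == code)
    return total - 1 if subtract_one else total

with_subtraction = sum_prev_same_code(real_history, synthetic_match, True)
without_subtraction = sum_prev_same_code(real_history, synthetic_match, False)
```

With no same-code portco in the real history, the sum of dummies is 0, so subtracting 1 gives -1, while dropping the subtraction gives 0.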
2018-06-22:
Plans for today: try to fix dataset.
Found more errors in the matched dataset. The synthetic-firm variables seem to be wrong, as there are negative numbers and lots of missing values.
Also, variables from matched firms, like number of people, doctors, etc., and city name, are missing seemingly at random.
Egan walked me through the SQL code that generates the matched dataset. We made a more precise count of coinvestors. Before, we were double counting funds. Now, if a PortCo had only one VC fund investment, numcoinvestor == 0.
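A minimal sketch of a coinvestor count that satisfies the property above, in Python rather than the project's SQL. It assumes the double counting came from the same fund appearing on several rows for one PortCo (for example, once per round); the data and names are hypothetical.

```python
# Hypothetical rows: (portco, fund). F1 appears twice for P1.
fund_investments = [
    ("P1", "F1"),
    ("P1", "F1"),
    ("P1", "F2"),
    ("P2", "F3"),
]

def num_coinvestors(rows):
    """Count distinct funds per portco, minus 1, so that a portco with a
    single investing fund has zero coinvestors."""
    funds_by_portco = {}
    for portco, fund in rows:
        funds_by_portco.setdefault(portco, set()).add(fund)
    return {p: len(funds) - 1 for p, funds in funds_by_portco.items()}

coinvestor_counts = num_coinvestors(fund_investments)
```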
Looking into the synthetic variables problem, the main issue is with the variables synsumprevsameindu100 synsumprevsameindu20 synsumprevsameindu synsumprevsamesector synnumprevportcos syntotsameindu100 syntotsameindu20 syntotsameindu syntotsamesector.
They basically count the number of PortCos the VC invested in that had the same industry code, before meeting the current matched PortCo (synsum*) and over all time (syntot*). So they should be integers with tot >= sum. However, for the synthetic firms, the sum ones are mostly -1, and the tot ones are missing.
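The expected properties (non-negative values, none missing, tot >= sum) can be turned into a quick sanity check. This is only a sketch in Python with hypothetical values and names; the same check could be written in Stata or SQL against the real variables.

```python
def violates_invariants(sum_prev, tot):
    """Return True when a (sum, tot) pair is missing, negative,
    or has tot smaller than sum."""
    if sum_prev is None or tot is None:
        return True
    return sum_prev < 0 or tot < sum_prev

# Hypothetical (synsum*, syntot*) pairs; None stands for a missing value.
pairs = [(2, 5), (-1, None), (0, 0), (3, 1)]
flagged = [p for p in pairs if violates_invariants(*p)]
```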
Looking at the code that generates these synthetic variables, there seems to be a problem when joining and subtracting one from the sum of dummies where, for example, A.code100 = B.code100. Can't figure out how to correct it yet.
2018-06-21:
Plans for today: get a full understanding of dataset and variables, start making some summary statistics.
Inspected the matched dataset and found inconsistencies in the investment amounts of VCs in PortCos. Talked to Egan about this; we will check it carefully in the source SQL code tomorrow.
Made summary statistics of firm variables. There do not seem to be any inconsistencies there.
2018-06-20: Created folder at "E:\McNair\Projects\MatchingEntrepsToVC\Stata\", imported files into Stata, and made master dofile.