Again, the data is just a single synthetic for each real. In this analysis, Marcos also clusters the standard errors at the year level but does not use any fixed effects.
The labels in the PDF are somewhat misleading. The margins command reports only the underlying covariates, not the interactions (unless you specifically generate the interaction variables). If Marcos had run just the underlying variables without the interactions, he would have produced markedly different margins. The margins in table 6, column 1 of the PDF come from the following:
PDF -> source
Note that Stata uses ## to report the main effect of each variable as well as the interaction, so c.hqdist##c.hqdist reports both hqdist and hqdist^2, while c.serials##c.numprevportco reports serials, numprevportco, and serials*numprevportco. Duplicated variables are omitted: c.serials##c.numprevportco and c.patentsprevc##c.numprevportco both produce numprevportco, so it is reported only once.
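As a rough sketch of the ## expansion described above, the following assumes realmatch as the outcome, a variable named year for clustering, and regress as the estimator. None of these is confirmed as what Marcos actually ran.

```stata
* Sketch only: realmatch (outcome), year (cluster variable), and the use of
* regress are assumptions, not the specification behind the PDF.
* c.x##c.x expands to x and x^2; c.a##c.b expands to a, b, and a*b.
regress realmatch c.hqdist##c.hqdist c.serials##c.numprevportco ///
    c.patentsprevc##c.numprevportco, vce(cluster year)

* margins differentiates through the interaction terms, so it reports one
* average marginal effect per underlying covariate, not one per term.
margins, dydx(hqdist serials numprevportco patentsprevc)
```

Running the same regression on the four underlying variables alone, with no ## terms, would give margins equal to the plain coefficients, which is why the two approaches can differ markedly.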
We don't get the same lasso results as Marcos:
Variable           MarcosLasso  NewLasso
----------------------------------------
hqdist             yes          yes
sumprevsameindu20  yes          yes
serials            yes          no
numprevportcos     yes          no
firmtenure         yes          yes
patentsprevc       no           no
But Marcos's spec isn't very grounded. He clusters standard errors at the year level but uses no fixed effects. We want to know what goes on inside markets, implying market-level fixed effects. He believes that "Since non-match specific variables are not used in the structural model, we have to interact VC or Startup specific variables." I'm not sure that this is correct. He goes on to say that "Therefore, the main specification is one which every match-specific variable has a quadratic interaction, and startup and VC variables are interacted with each other. Also, we exclude industry code from the model because it is a discrete variable, and we transform VC founding year to VC tenure, which subtracts the former with year of match."
Industry certainly won't matter with market fixed effects. Marcos also used numprevportco as if it were purely a VC variable, when it is closer to a match-specific variable.
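One way market-level fixed effects could be added is sketched below. The market identifier market, the outcome realmatch, the choice of areg, and clustering on market are all assumptions for illustration, not a specification that was actually run.

```stata
* Sketch only: market is a hypothetical market identifier, realmatch is the
* assumed outcome, and clustering on market is one possible choice.
* absorb(market) adds a fixed effect for each market without reporting them.
areg realmatch c.hqdist##c.hqdist c.serials##c.numprevportco, ///
    absorb(market) vce(cluster market)

* Average marginal effects of the underlying covariates, within market.
margins, dydx(hqdist serials numprevportco)
```

Any variable that is constant within a market is absorbed by the fixed effect and drops out of a specification like this.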
I tried Marcos's approach using all of the possible variables (old and new), but always and only using firmtenurel as the VC interaction variable (firmportcosl is used to pick the real from the list of potential reals, and firmapportione~ml is correlated with firmportcosl). I also used only pccityoverallr~1l as the PortCo interaction variable, as it is the only PortCo variable that remains significant.
The result was:
. margins, dydx(*) post
Average marginal effects                        Number of obs     =    381,882
Model VCE    : Robust
Expression   : Linear prediction, predict()
dy/dx w.r.t. : pccityoverallrankm1l firmtenurel firmportcosl matchprevindu20l matchbodistl
               matchinstagenarrow matchcity matchstate
--------------------------------------------------------------------------------------
                     |            Delta-method
                     |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
pccityoverallrankm1l |   .0055706   .0002035    27.38   0.000     .0051718    .0059694
firmtenurel          |   .0059353   .0005165    11.49   0.000     .0049229    .0069477
firmportcosl         |   .0052155   .0004399    11.86   0.000     .0043532    .0060777
matchprevindu20l     |  -.0536725   .0007413   -72.41   0.000    -.0551254   -.0522196
matchbodistl         |  -.0106516   .0003413   -31.21   0.000    -.0113205   -.0099826
matchinstagenarrow   |   .0057086   .0007494     7.62   0.000     .0042398    .0071774
matchcity            |   .0684326   .0041129    16.64   0.000     .0603715    .0764937
matchstate           |   .0436431   .0015343    28.45   0.000      .040636    .0466503
--------------------------------------------------------------------------------------
Finally, collapse the dataset by summing realmatch and produce a histogram and some analysis.
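The collapse step might look like the following. The notes do not say which variable defines the groups, so vcid is a hypothetical grouping identifier.

```stata
* Sketch only: vcid is a hypothetical identifier; substitute whichever
* variable the dataset should be collapsed by.
preserve
* Sum realmatch within each group, leaving one row per vcid.
collapse (sum) realmatch, by(vcid)
* Distribution of the number of real matches per group.
histogram realmatch, discrete frequency
summarize realmatch, detail
restore
```

The preserve and restore pair keeps the match-level data in memory for later steps.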
===Notes from Conference Call===