Revision as of 17:34, 22 January 2018
Academic Paper | |
---|---|
Title | Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists |
Author | Ed Egan, Jeremy Fox, David Hsu |
RAs | Meghana Gaur |
Status | In development |
© edegan.com, 2016 |
Reference Papers
Jeremy's paper with David Hsu and Chenyu Yang is here: Unobserved Heterogeneity in Matching Games with an Application to Venture Capital.
Abstract: Agents in two-sided matching games vary in characteristics that are unobservable in typical data on matching markets. We investigate the identification of the distribution of unobserved characteristics using data on who matches with whom. In full generality, we consider many-to-many matching and matching with trades. The distribution of match-specific unobservables cannot be fully recovered without information on unmatched agents, but the distribution of a combination of unobservables, which we call unobserved complementarities, can be identified. Using data on unmatched agents restores identification. We estimate the contribution of observables and unobservable complementarities to match production in venture capital investments in biotechnology and medical firms.
Fox Hsu Yang (2015) - Unobserverd Heterogeneity in Matching Games with an Application to Venture Capital provides some notes.
Matlab Code
Abhijit Brahme (Work Log) contains his notes on working with the Matlab code. There is a separate page here: Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists Matlab Code.
Data specification
The data spec sent to Jeremy is in:
Z:\Projects\MatchingAcceleratorsToVCs
Data foundations
The database is vcdb2
The foundational tables were built using:
Z:\VentureCapitalData\SDCVCData\vcdb2\ProcessData2.sql
The documentation, which is a little messy, is on the VC Database Rebuild page.
Our SQL script, which builds on top of the above database (still in vcdb2), is in:
E:\McNair\Projects\MatchingEntrepsToVC\DataWork
Dataset build
Decisions
Decisions we need to make:
- Will we need synthetic matches? If so, what do we do for outcomes? We can still do dyadic and left/right pair variables.
- Granularity of industry: To start, let's use minor industry group (see below). We could use a much finer-grained industry definition and aggregate back up later to balance out the counts somewhat.
- Matching to a fund or a firm: For now, we will work with funds. Deals are sometimes transferred across funds within a firm (e.g., from Kleiner fund IV to Kleiner fund V), but this is probably comparatively rare (check!).
- Dealing with the right-censoring problem: We can likely address this with indicator variables to condition on, but may want to restrict estimation to dyads that don't have this issue. For now we will take portfolio companies that received their last investment before 2007, to allow funds a full 10 years to clear their portfolios.
- Inadequate coverage in early years: VentureXpert's coverage is notably inferior prior to 1982. We should start with portcos that received their first investment in 1985 or later.
- Determination of lead VC - see below
- How to collapse VC rounds (date, amount, etc.): We will use only seed, early, and later stage (SEL) investments and insist on the presence of a seed or early round for inclusion. We can then have date of first investment, investment duration (to date of last investment), and total investment.
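The round-collapsing rule in the last item can be sketched as follows. This is only a minimal illustration in Python, not the project's SQL: the record layout (`portco_id`, `date`, `stage`, `amount`), the stage labels, and the function name `collapse_rounds` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Assumed stage labels; the real values in vcdb2 may differ.
SEL_STAGES = {"Seed", "Early", "Later"}
SEED_OR_EARLY = {"Seed", "Early"}


def collapse_rounds(rounds):
    """Collapse round-level records into one record per portco.

    `rounds` is an iterable of dicts with the hypothetical keys
    portco_id, date, stage, amount.
    """
    by_portco = defaultdict(list)
    for r in rounds:
        if r["stage"] in SEL_STAGES:  # drop non-SEL (transactional) rounds
            by_portco[r["portco_id"]].append(r)

    collapsed = {}
    for portco_id, rs in by_portco.items():
        # Insist on at least one seed or early round for inclusion.
        if not any(r["stage"] in SEED_OR_EARLY for r in rs):
            continue
        first = min(r["date"] for r in rs)
        last = max(r["date"] for r in rs)
        collapsed[portco_id] = {
            "date_first": first,
            "date_last": last,
            "duration_years": (last - first).days / 365.25,
            "total_invested": sum(r["amount"] for r in rs),
            "num_rounds": len(rs),
        }
    return collapsed


# Usage with made-up records: portco 2 has no seed/early round and is dropped.
example = [
    {"portco_id": 1, "date": date(1990, 1, 1), "stage": "Seed", "amount": 1.0},
    {"portco_id": 1, "date": date(1992, 1, 1), "stage": "Later", "amount": 4.0},
    {"portco_id": 2, "date": date(1991, 1, 1), "stage": "Later", "amount": 2.0},
]
result = collapse_rounds(example)
```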
Objective dataset description
Unit of observation - a startup-fund match.
Constraints:
- PortCo name disclosed
- PortCo date of first investment >= 1/1/1985
- PortCo date of last investment <= 2007 to allow 10 yrs for the funds
- PortCo received at least one round of Seed or Early stage investment
- Matched VC is not undisclosed
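As a rough sketch, the constraints above could be applied to each candidate startup-fund match with a single predicate. The field names (`portco_name`, `date_first`, `date_last`, `has_seed_or_early`, `vc_disclosed`) are hypothetical, and reading "<= 2007" as "calendar year of last investment is 2007 or earlier" is an assumption.

```python
from datetime import date

FIRST_INVESTMENT_CUTOFF = date(1985, 1, 1)
LAST_INVESTMENT_YEAR = 2007  # assumption: interpreted as year <= 2007


def meets_constraints(match):
    """Return True if a startup-fund match satisfies the dataset constraints.

    `match` is a dict with hypothetical keys; it is not the vcdb2 schema.
    """
    return bool(
        match["portco_name"]                                  # name disclosed
        and match["date_first"] >= FIRST_INVESTMENT_CUTOFF    # first inv. >= 1/1/1985
        and match["date_last"].year <= LAST_INVESTMENT_YEAR   # last inv. <= 2007
        and match["has_seed_or_early"]                        # >= 1 seed/early round
        and match["vc_disclosed"]                             # VC is not undisclosed
    )
```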
Variables:
Startup:
- PortCo ID
- PortCo Name
- Longitude, latitude
- State of inc., industry, year of founding, year of first investment, year of last investment
- SEL $invested, SEL num rounds, transactional VC indicator and $inv, investment duration SEL (yrs)
- Exit indicator, exit value, exit type indicator
- alive2016 indicator, last round pre-2012 indicator
- total MOOMI (Money Out Over Money In)
Fund:
- fund ID
- fund name
- Number of funds investing (SEL)
- As averages (?) and for lead:
- Fund IPO count, fund M&A count, fund investment count (calc at end), fund IPO rate, fund M&A rate, fund exit count, fund exit rate, fund IPO $, fund M&A $, fund exit $, fund fraction of MOOMI.
- Total invested by lead, number of rounds participation by lead, stage of participation of lead, location of lead, last investment pre-2012 indicator, lead fund type indicator (corp, priv, gov, etc.), lead fund size, lead fund vintage year.
Dyadic variables:
- Distance between lead and portco
- Industry preference match between lead and portco
- Maybe stage-match between lead and portco (doesn't make a lot of sense when collapsing rounds)
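The distance variable can be computed from the longitude and latitude fields with the haversine formula for great-circle distance. The sketch below assumes coordinates in decimal degrees and uses a mean Earth radius of 6371.0088 km; the function name is illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

The haversine form is numerically stable for the short distances that are common between a lead VC and a nearby portco.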
Identifying lead VCs
Possible methods:
- Best performing participant (on exit count/value or fractional MOOMI) with tie-breaker
- Closest participant (using great circle distance)
- Most frequent participant with tie-breaker
- Participant with greatest investment with tie-breaker
- Participant in earliest round that stayed in for longest with tie-breaker
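As one illustration, the "greatest investment with tie-breaker" method might look like the sketch below. The tie-breakers used here (earliest first participation, then lowest fund ID) are assumptions, since the page does not specify them, and the keys `fund_id`, `total_invested`, and `first_round_date` are hypothetical.

```python
from datetime import date


def pick_lead(participants):
    """Return the fund_id of the lead among a portco's participating funds.

    Rule: largest total investment; ties broken by earliest first round,
    then by lowest fund_id (both tie-breakers are assumptions).
    """
    best = min(
        participants,
        key=lambda p: (-p["total_invested"], p["first_round_date"], p["fund_id"]),
    )
    return best["fund_id"]


# Usage with made-up funds: 101 and 102 tie on amount, 102 invested earlier.
funds = [
    {"fund_id": 101, "total_invested": 5.0, "first_round_date": date(1996, 3, 1)},
    {"fund_id": 102, "total_invested": 5.0, "first_round_date": date(1995, 7, 1)},
    {"fund_id": 103, "total_invested": 2.0, "first_round_date": date(1994, 1, 1)},
]
lead = pick_lead(funds)
```

The other candidate methods fit the same pattern by swapping the first element of the sort key (for example, distance or exit count).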
Minor Industry
Across all time and without regard to SEL vs. transaction, here's the minor industry list and counts:
indminorgroup | count |
---|---|
Industrial/Energy | 2871 |
Internet Specific | 8794 |
Biotechnology | 2592 |
Semiconductors/Other Elect. | 2402 |
Other Products | 4891 |
Computer Hardware | 2061 |
Computer Software and Services | 10550 |
Communications and Media | 3271 |
Medical/Health | 4373 |
Consumer Related | 3161 |
Literature from David
Literature to "validate" our sample. I think you probably know the papers I reference below (let me know if you need any of them; some for which I am a coauthor you can get from my website).
- VCs are more likely to match with geographically proximate startups (Lerner on corporate governance, Sorenson on geography)
- Startups prefer to match with VCs with domain experience within their startup sector (Morten Sorensen), possibly also prefer to match by stage of VC specialization relative to their own stage of development (not sure which paper if any documents that)
- Startup patents signal VCs (Hsu/Ziedonis in SMJ)
- VCs prefer serial founders, or at least may interact differently with founders based on their prior founding experience (Hsu 2007 in Research Policy)
- If we have access to more individual data: VCs prefer to invest in founders with similar demographic characteristics relative to their own characteristics (Gompers et al within the past few years in JFE, Bengtsson and Hsu in JBV within the last few years).
Work Done in Late November by Dylan & Ed
SBIR data was taken from McNair\Projects\SBIR\Data\Aggregate SBIR\SBIR.txt. Note: this file needed to be opened in Excel to be readable, and took a very long time to open due to its large size. The SBIR firm names were put through a pivot table to eliminate exact repeat entries, and then exported to a text file, NSBIR.txt. NSBIR.txt was then matched against itself using The Matcher in mode 2 with the following options:
"-file1="NSBIR.txt" -file2="NSBIR.txt" -mode=2"
Output then placed in:
McNair\Projects\MatchingEntrepsToVC\Matching\Output
The original pre-matched, cleaned NSBIR.txt file is moved to:
McNair\Projects\MatchingEntrepsToVC\Matching\Input.
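Because opening SBIR.txt in Excel was slow, the exact-duplicate removal step could in principle be done with a short script instead of a pivot table. The sketch below is an alternative, not what was actually run: it assumes one firm name per input line, and stripping surrounding whitespace before comparing is an added assumption.

```python
def distinct_names(lines):
    """Return names with exact repeats removed, preserving first-seen order.

    Blank lines are skipped. Comparison is exact (case-sensitive) after
    stripping surrounding whitespace, which is an assumption of this sketch.
    """
    seen = set()
    result = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


# Usage with made-up names; a file object can be passed in the same way.
names = distinct_names(["Acme Inc\n", "Acme Inc\n", "  Beta LLC \n", "\n"])
```

Near-duplicates such as differing capitalization are deliberately left alone here, since that normalization is what the matching step is for.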
There is a SQL file to extract VC portcos (SEL backed only), with key info from vcdb2, and distinct assignee names from allpatentsprocessed, here:
E:\McNair\Projects\MatchingEntrepsToVC\Matching
There are three input files:
- distinctNSBIR.txt - made by pivot tabling SBIR.txt from the SBIR aggregation project
- distinctassignees.txt - extracted as distinct from allpatentsprocessed
- vcbackedselcokeys.txt - extracted with key info from vcdb2. It needs pivot tabling to get unique names.
These .txt files were made distinct, and then matched against themselves for normalization. The normalized files still need to be matched against each other. They are located in:
McNair\Projects\MatchingEntrepsToVC\Matching\Normalized
These normalized files were then matched against each other, yielding approximately 12,000 matches. The results are located in:
McNair\Projects\MatchingEntrepsToVC\Matching\Normalized & Matched