Hubs

The Hubs Research Project is a full-length academic paper analyzing the effectiveness of "hubs", a component of the entrepreneurship ecosystem, in the advancement and growth of entrepreneurial success in a metropolitan area.

This research will primarily focus on large and mid-sized Metropolitan Statistical Areas (MSAs), as that is where the majority of venture capital funding is located.

The data set includes all United States Venture Capital transactions (moneytree) from the twenty-five year period of 1990 through 2015. Data has been accumulated at the portfolio company, fund, and round level. It will be analyzed at the MSA level, in terms of the number of companies funded, the number of funds active, and the flow of investment in a given MSA.

The data set has now been uploaded to the database server, named Hubs. There are 4 tables:

Rounds: Rounddate, coname, state, roundno, stage1, etc.
CombinedRounds: Coname, rounddate, discamount, fundname
Companies: LastInv, FirstInv, coname, MSA, MSACode, Address, state, datefounded, totalknownfunding, industry(major)
Funds: fundname, closingdate, lastinv, firstinv, msa, msacode, avinv, nocoinv, totalknowninv, address

Used variables:

Companies: Coname, MSACode, Industry, state
MSALookupTable: MSACode, MSASuper
IndustryLookupTable: IndustryMajor, InduCode
-> 
CompanyInfo: Coname, MSASuper, InduCode, state (complete)

Funds: fundname, msacode, state
MSALookupTable: MSACode, MSASuper 
-> 
FundInfo: fundname, msacode, state (complete)

Rounds: coname, rounddate, stagecode, roundno
CombinedRounds: coname, rounddate, discamount, fundname
->
RoundInfoSuper: coname, rounddate, nofunds, discamount   
->
RoundInfo: Coname, roundyear, fundname, estamount (complete)
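
The last two steps above (CombinedRounds -> RoundInfoSuper -> RoundInfo) can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the toy rows are invented, and the rule that estamount is the disclosed round amount split equally across participating funds is an assumption that the notes do not state.

```python
import pandas as pd

# Toy stand-in for CombinedRounds (one row per company-round-fund); values are invented.
combined_rounds = pd.DataFrame({
    "coname": ["Acme", "Acme", "Beta"],
    "rounddate": pd.to_datetime(["2001-03-01", "2001-03-01", "2002-07-15"]),
    "discamount": [6.0, 6.0, 4.0],
    "fundname": ["Fund A", "Fund B", "Fund A"],
})


def build_round_info(combined: pd.DataFrame) -> pd.DataFrame:
    """Derive RoundInfo-style rows (coname, roundyear, fundname, estamount)."""
    out = combined.copy()
    # nofunds: number of distinct funds participating in each company-round.
    out["nofunds"] = out.groupby(["coname", "rounddate"])["fundname"].transform("nunique")
    # ASSUMPTION: discamount is the round total, split equally across its funds.
    out["estamount"] = out["discamount"] / out["nofunds"]
    out["roundyear"] = out["rounddate"].dt.year
    return out[["coname", "roundyear", "fundname", "estamount"]]


round_info = build_round_info(combined_rounds)
```

If discamount turns out to be recorded per fund rather than per round, the division step should be dropped.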

Then take:

RoundInfo: Coname, roundyear, fundname, estamount
CompanyInfo: Coname, MSASuper, InduCode, state
FundInfo: fundname, msacode, state
->
SuperRoundInfo: Coname, CoMSASuper, CoInduCode, CoState, FundName, FundMSASuper, FundState, RoundYear, RoundEstAmount
->
MSAPortCos: Count(CoName) As NoPortCosFunded, CoMSASuper, RoundYear
...
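
The joins that produce SuperRoundInfo and MSAPortCos might look like the pandas sketch below. All rows are invented toy data. Two assumptions are made for illustration: FundInfo is taken to already carry an MSASuper column (the notes list msacode), and companies are counted distinctly, because SuperRoundInfo holds one row per company-fund pair and a plain Count(CoName) would count a company once per investor.

```python
import pandas as pd

# Toy rows; all values are invented for illustration.
round_info = pd.DataFrame({
    "Coname": ["Acme", "Acme", "Beta"],
    "roundyear": [2001, 2001, 2001],
    "fundname": ["Fund A", "Fund B", "Fund A"],
    "estamount": [3.0, 3.0, 4.0],
})
company_info = pd.DataFrame({
    "Coname": ["Acme", "Beta"],
    "MSASuper": [10, 10],
    "InduCode": [1, 2],
    "state": ["CA", "CA"],
})
# ASSUMPTION: FundInfo already carries MSASuper (the notes list msacode).
fund_info = pd.DataFrame({
    "fundname": ["Fund A", "Fund B"],
    "MSASuper": [10, 20],
    "state": ["CA", "NY"],
})

# Rename columns so company and fund attributes stay distinguishable after the join.
rounds = round_info.rename(columns={
    "roundyear": "RoundYear", "fundname": "FundName", "estamount": "RoundEstAmount"})
co = company_info.rename(columns={
    "MSASuper": "CoMSASuper", "InduCode": "CoInduCode", "state": "CoState"})
fu = fund_info.rename(columns={
    "fundname": "FundName", "MSASuper": "FundMSASuper", "state": "FundState"})

# SuperRoundInfo: one row per company-fund-year with both sides' attributes.
super_round_info = rounds.merge(co, on="Coname", how="inner").merge(fu, on="FundName", how="inner")

# MSAPortCos: distinct portfolio companies funded per company MSA and year.
msa_port_cos = (
    super_round_info.groupby(["CoMSASuper", "RoundYear"])["Coname"]
    .nunique()
    .reset_index(name="NoPortCosFunded")
)
```

Inner joins silently drop rounds whose company or fund has no match, so comparing row counts before and after each merge is a cheap check on the name matching.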

Notes on Creation of Primary Data Set

Raw tables

companies (last investment, first investment, company name, MSA, MSA code, address, state, date founded, known funding, industry)
funds (fund closing date, last investment, first investment, fund name, address, MSA, MSA code, Average investment, number companies invested (NoCos), known investment)
rounds (round date, company name, state, round number, stage 1, stage 2, stage 3)
combined rounds (company name, round date, disclosed amount, investor)
msalist (changes MSAs to CMSAs, i.e. combined MSAs)
industry list (changes 6 industry categories to 4: ICT, Life Sciences, Semiconductors, Other)

Process

cleaned tables to eliminate duplicates and undisclosed entries
changed all original tables to include CMSA and industry codes (companyinfo3, fundinfocleanfinal, roundinfoclean)
matched funds to avoid any issues with name variants (e.g. Fund ABC L.P./Fund ABC LP/Fund ABC)
matched roundinfoclean investors to fundinfocleanfinal investors (roundinfo.txt >> cleanfundfinal.txt)
joined rounds and companies on coname
bridged years (1990-2016), stage, and CMSA
populated data with count of companies (deal flow) and estimated amount ($)
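
The fund-name matching step (Fund ABC L.P./Fund ABC LP/Fund ABC) could be approached by reducing each name to a normalized key before joining. The sketch below is one possible way to do this, not the procedure actually used; the function name and the list of legal-form suffixes are illustrative assumptions.

```python
import re

# Legal-form suffixes to drop; this list is an illustrative assumption.
_SUFFIX = re.compile(r"[\s,]+(l\.?\s?p\.?|l\.?l\.?c\.?|inc\.?|ltd\.?)$", re.IGNORECASE)


def normalize_fund_name(name: str) -> str:
    """Collapse spelling variants of a fund name to one matching key."""
    # Trim and collapse runs of whitespace to single spaces.
    key = re.sub(r"\s+", " ", name.strip())
    # Remove one trailing legal-form suffix such as "L.P." or "LLC".
    key = _SUFFIX.sub("", key)
    # Drop leftover trailing punctuation and compare case-insensitively.
    return key.rstrip(" ,.").lower()


variants = ["Fund ABC L.P.", "Fund ABC LP", "Fund ABC"]
keys = {normalize_fund_name(v) for v in variants}
```

All three variants map to the same key, so both tables can be joined on the key rather than on the raw name. Distinct funds that differ only by suffix would also be merged, so a manual review of collapsed names is still advisable.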

Hub Candidates Data Set

The Hub candidates data set is a list of potential hubs found in MSAs throughout the country. Researchers are currently pulling qualitative and quantitative information from the candidates' websites in an attempt to categorize what can be identified as a hub. This data set is difficult to assemble: little to no quantitative information is available for this category of institution, and collection depends on what information is publicly accessible on the internet.

Characteristics/Variables

Year Founded
Square footage
LinkedIn self-identifiers (how the organization classifies itself on its LinkedIn profile)
Activeness on Twitter (binary)
Member Directory available online (binary)
Number of conference rooms
Price ($/month) for Flex desk
Offers Reserved desk (binary)
Offers office space for rent (binary)
Offers community membership, not for coworking but for community events, etc. (binary)
Number of events offered per month (estimate)
Offers code academy
Mission Statement/Vision (for qualitative or key-word analysis)

These characteristics/variables will be used to determine whether a candidate is or is not likely to be a Hub.

As of March 10th 2016, the list contains 125 Hub candidates.

Supplementary Data Sets

Patent data: to be pulled from USPTO or SDC Platinum.

- Unable to find on the internet; must be pulled from the larger dataset

Number of STEM Graduate Students (NSF) and University R&D Spending (NSF): Grad Students found for the year 2015, no data going back historically; R&D found for the past 10 years

- Universities categorized by MSA; can be used for all university-based projects

Per Capita Income and Employment Data (US Census Bureau): complete for most recent census, unable to find data going back historically

Firm Births (BDS): data set found for 1990 to present, currently being cleaned up for use

Resources

Hochberg and Fehder (2015), located in Dropbox
- Use this paper as a guideline on how to conduct the analysis
US Census Bureau data on employment by MSA: http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_14_5YR_B23027&prodType=table
USPTO utility patents by MSA: http://www.uspto.gov/web/offices/ac/ido/oeip/taf/cls_cbsa/allcbsa_gd.htm
MSA level trends: http://www.metrotrends.org/data.cf

To Do

We need to find and clean up data sets at the MSA level

Patent data (USPTO)
Number of STEM Graduate Students (NSF)
- in progress
University R&D Spending (NSF)
Per Capita Income (US Census)
- complete (Employment and Income_MSA.xls)
Employment (US Census)
- complete (Employment and Income_MSA.xls)
Firm births (BDS)
SELECT MSAs!!! Possible method: choose CMSAs by the number of distinct companies funded
- >100 = 38
- >75 = 45
- >50 = 52
- >25 = 80
- Total 238
- greater than 100 will give us 52 CMSAs to work with
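
The threshold counts above could be reproduced with a short pandas helper such as the one below. It is a sketch only: the function name is hypothetical, the toy rows are invented, and it assumes a CompanyInfo-style table with Coname and MSASuper columns.

```python
import pandas as pd


def cmsas_above(company_info: pd.DataFrame, thresholds=(100, 75, 50, 25)) -> dict:
    """Count CMSAs whose number of distinct funded companies exceeds each threshold."""
    # Distinct funded companies per CMSA.
    per_cmsa = company_info.groupby("MSASuper")["Coname"].nunique()
    # For each threshold, how many CMSAs lie strictly above it.
    return {t: int((per_cmsa > t).sum()) for t in thresholds}


# Toy data: CMSA 1 has 3 companies, CMSA 2 has 1 (invented values).
toy = pd.DataFrame({"MSASuper": [1, 1, 1, 2], "Coname": ["a", "b", "c", "d"]})
counts = cmsas_above(toy, thresholds=(2, 0))
```

Running the helper on the full company table with the default thresholds gives one count per cutoff, which makes it easy to re-check the figures whenever the underlying data is cleaned again.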

Data Cleaning

Cleaning tasks:

Remove PortCos named Undisclosed, etc.
Remove Funds named Unknown, etc.
Basic Data cleaning:
- Enormous outliers on funds invested
- Check dates
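
A minimal pandas sketch of these cleaning tasks follows. The column names mirror CombinedRounds, but the placeholder list, the 1990-2015 window default, and the 99th-percentile outlier cutoff are assumptions chosen for illustration; the notes do not fix any of them.

```python
import pandas as pd

# ASSUMPTION: exact placeholder names to drop; extend to cover the "etc." cases.
PLACEHOLDER_NAMES = {"undisclosed", "unknown"}


def clean_rounds(df: pd.DataFrame, first_year: int = 1990, last_year: int = 2015) -> pd.DataFrame:
    """Drop placeholder names and out-of-window dates, and flag extreme amounts."""
    out = df.copy()
    # Drop rows whose company or fund name is a placeholder.
    for col in ("coname", "fundname"):
        is_placeholder = out[col].str.strip().str.lower().isin(PLACEHOLDER_NAMES)
        out = out[~is_placeholder]
    # Keep only rounds dated inside the study window (bounds inclusive).
    years = out["rounddate"].dt.year
    out = out[years.between(first_year, last_year)]
    # Flag rather than drop extreme amounts; the 99th percentile is an assumed cutoff.
    cutoff = out["discamount"].quantile(0.99)
    return out.assign(outlier=out["discamount"] > cutoff)


# Toy rows; all values are invented.
raw = pd.DataFrame({
    "coname": ["Acme", "Undisclosed", "Beta", "Gamma"],
    "fundname": ["Fund A", "Fund A", "Unknown", "Fund B"],
    "rounddate": pd.to_datetime(["2001-01-01", "2001-01-01", "2002-01-01", "1985-01-01"]),
    "discamount": [5.0, 1.0, 2.0, 3.0],
})
cleaned = clean_rounds(raw)
```

Flagging outliers instead of deleting them keeps the decision reversible while the amounts are still being checked.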

Lookup tables:

SuperMSAs
Industry
Stages

The Target Dataset

We will need to process the following variables:

SuperMSA - combine San Francisco and San Jose, New York and Newark?, NC Research Triangle, others?

Example dataset:

MSA    Year   SeedVCInv   SeedEarlyVCInv   LaterVCInv   NoDeals   FundsInvested   DistinctInvestors   ....
------------------------------------------------------------------------------------------------------------
1234   2001   1000000     20000000         30000000     4         7               7

Note that the unit of observation is MSA-Year.

Variables to be computed at the MSA level:

HubActive (binary)
NoHubsActive (Count)
HubSqFt
Other Hub Vars (build list!!!)
SeedVCInv
SeedEarlyVCInv
NoDeals (done by local VCs?)
- NoDealsNear
- NoDealsFar
NoPortCosFunded
FundsInv (in an MSA)
- FundsInvFromNear (within MSA?)
- FundsInvFromFar (outside MSA?)
DistinctInvestors
- DistinctInvestorsNear (within MSA?)
- DistinctInvestorsFar (outside MSA?)
PatentCount
NoSTEMGrads
FirmBirths (BDS data)
UniRandDSpend
PerCapitaIncome
Employment

We need to:

Check that funds invested means dollars invested
Categorize near and far! Is it within MSA vs. not, within adjacent MSAs, etc.?
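
To make the near/far question concrete, the sketch below aggregates SuperRoundInfo-style rows to the MSA-Year level under one candidate definition. It assumes that "near" means the fund and the company share a SuperMSA and that FundsInv is measured in dollars; both points are still open in the notes, and the toy rows are invented.

```python
import pandas as pd


def msa_year_panel(sri: pd.DataFrame) -> pd.DataFrame:
    """Aggregate SuperRoundInfo-style rows to one row per MSA-Year."""
    df = sri.copy()
    # ASSUMPTION: "near" means the fund and the company share a SuperMSA.
    df["Near"] = df["FundMSASuper"] == df["CoMSASuper"]
    # Split each estimated amount into a near part and a far part.
    df["AmtNear"] = df["RoundEstAmount"].where(df["Near"], 0.0)
    df["AmtFar"] = df["RoundEstAmount"].where(~df["Near"], 0.0)
    panel = df.groupby(["CoMSASuper", "RoundYear"]).agg(
        NoPortCosFunded=("Coname", "nunique"),
        DistinctInvestors=("FundName", "nunique"),
        FundsInv=("RoundEstAmount", "sum"),  # ASSUMPTION: dollars invested
        FundsInvFromNear=("AmtNear", "sum"),
        FundsInvFromFar=("AmtFar", "sum"),
    )
    return panel.reset_index()


# Toy rows; all values are invented.
toy = pd.DataFrame({
    "Coname": ["Acme", "Acme", "Beta"],
    "CoMSASuper": [10, 10, 10],
    "FundName": ["Fund A", "Fund B", "Fund A"],
    "FundMSASuper": [10, 20, 10],
    "RoundYear": [2001, 2001, 2001],
    "RoundEstAmount": [3.0, 3.0, 4.0],
})
panel = msa_year_panel(toy)
```

Switching to an adjacency-based definition would only change how the Near column is computed; the aggregation itself stays the same.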

There may be a second dataset that has Hub-Industry-Year (where industry is semiconductor/non-semiconductor?).
