Difference between revisions of "The Impact of Entrepreneurship Hubs on Urban Venture Capital Investment"
Revision as of 11:18, 1 July 2016
The Impact of Entrepreneurship Hubs on Urban Venture Capital Investment
Copyright © 2016 edegan.com. All Rights Reserved.
Abstract
The Hubs Research Project is a full-length academic paper analyzing how effectively "hubs", a component of the entrepreneurship ecosystem, advance entrepreneurial success and growth in a metropolitan area.
This research focuses primarily on large and mid-sized Metropolitan Statistical Areas (MSAs), as that is where the great majority of Venture Capital funding is located.
Data
Venture Capital Transactions Data Set
The main goal of the data set is to aggregate company-, fund-, and round-level data so that it can be analyzed at a combined MSA and year level. The data set is composed of two major parts: a granular company/fund/round table and an aggregated CMSA-Year table. The data includes all United States Venture Capital transactions (moneytree) from the twenty-five-year period of 1990 through 2015.
The Hubs data set, from SDC Platinum, has been constructed on the server:
Data files are in 128.42.44.181/bulk/Hubs
All files are in 128.42.44.182/bulk/Projects/Hubs
psql Hubs2
Please note that this dataset is currently being constructed and has not been completely uploaded yet.
General Procedure - Granular Table
- Start with separate raw datasets for Companies, Funds, and Rounds
- Add Data to Each Individual dataset (e.g. add MSA code)
- Clean and standardize names (e.g. company or fund name) for each dataset
- Join the Datasets (here we need to exclude undisclosed companies)
General Procedure - CMSA-Year Aggregated Table
- Create a consistent CMSA-Year table to be used later
- Extract the relevant data from the granular tables
- Join the parsed out data with the CMSA-Year Table
- Join these Tables
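The reason for building a consistent CMSA-Year table first can be illustrated with a minimal sketch. It assumes the table is the cross product of all distinct CMSAs and years, so that a CMSA-year with no activity still appears with a zero rather than dropping out of a join. The function names and sample values below are hypothetical and are not taken from the project's own scripts.

```python
from itertools import product


def build_cmsa_year_skeleton(cmsas, years):
    """Return one (cmsa, year) key for every combination, sorted."""
    return sorted(product(set(cmsas), set(years)))


def left_join_counts(skeleton, counts, default=0):
    """Attach a count to every skeleton key; keys with no data get `default`."""
    return {key: counts.get(key, default) for key in skeleton}


# Hypothetical sample values, for illustration only.
skeleton = build_cmsa_year_skeleton(["A", "B"], [1990, 1991])
deal_counts = {("A", 1990): 3, ("B", 1991): 1}
panel = left_join_counts(skeleton, deal_counts)
```

Here `panel` has four entries even though only two CMSA-years have deals, which keeps the aggregated table balanced across CMSAs and years.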
Tables and Specific Procedure Used
Raw data tables
- Funds: fund name, first investment date, last investment date, fund closing date, address, known investment, average investment, number of companies invested, MSA, MSA code.
- Rounds: round date, company name, state, round number, stage 1, stage 2, stage 3
- Combined Rounds: company name, round date, disclosed amount, investor
- Companies: company name, first investment, last investment, MSA, MSA code, address, state, date founded, known funding, industry
- MSA List: MSA, MSA code, CMSA, CMSA code
- Industry List: maps 6 industry categories to 4: ICT, Life Sciences, Semiconductors, Other
Granular Table (Fund-Round-Company)
The final table here contains all venture capital transactions by disclosed funds and portfolio companies, together with their CMSAs. To get the table, we processed the raw data sets in the following steps:
- Clean Company data
  - Import the raw companies data
  - Add variable 'CMSA' from the MSA List data set, and update variable 'industry' by joining the Industry List data set
  - Remove duplicates and remove undisclosed companies
- Clean Fund data
  - Import the raw funds data
  - Add variable 'CMSA'
  - Remove duplicates and remove undisclosed funds
  - Match fund names against themselves using The Matcher (Tool) to get the standard fund names
- Clean Round data
  - Import the raw rounds and combined rounds data
  - Add variables 'number of investment', 'estimated investment' and 'year'
  - Remove duplicates and remove undisclosed funds
- Combine Companies and Rounds
  - Combine the cleaned companies and rounds data tables on company name
  - Add variables 'round number' and 'stage'
  - Remove duplicates
- Combine Funds and rounds-companies
  - Match fund names in the rounds data table with the standard fund names using The Matcher (Tool), to standardize fund names in the rounds data table
  - Join the standard fund names to the rounds-companies table
  - Join the cleaned funds table to the rounds-companies table on standard fund names
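The clean-then-join pattern in the steps above can be sketched as follows. This is only an illustration: the name normalization is a simple stand-in and is not how The Matcher works, the `"undisclosed"` prefix test is an assumption about how undisclosed records are labeled, and all field names and sample rows are hypothetical.

```python
import re


def standardize_name(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def clean_companies(rows):
    """Drop undisclosed companies and duplicates, keyed by standardized name."""
    cleaned = {}
    for row in rows:
        key = standardize_name(row["company_name"])
        # Assumption: undisclosed records are labeled "Undisclosed ...".
        if not key or key.startswith("undisclosed"):
            continue
        cleaned.setdefault(key, row)  # keep the first occurrence only
    return cleaned


def join_rounds_to_companies(rounds, companies):
    """Inner join of rounds to cleaned companies on standardized name."""
    joined = []
    for rnd in rounds:
        company = companies.get(standardize_name(rnd["company_name"]))
        if company is not None:
            joined.append({**rnd, "cmsa": company["cmsa"]})
    return joined


# Hypothetical sample rows, for illustration only.
companies_raw = [
    {"company_name": "Acme, Inc.", "cmsa": "A"},
    {"company_name": "ACME Inc", "cmsa": "A"},
    {"company_name": "Undisclosed Company", "cmsa": "B"},
]
rounds_raw = [
    {"company_name": "Acme Inc.", "round_year": 1999, "amount": 2.5},
    {"company_name": "Undisclosed Company", "round_year": 2000, "amount": 1.0},
]
companies = clean_companies(companies_raw)
rounds_companies = join_rounds_to_companies(rounds_raw, companies)
```

Because the join is an inner join on the cleaned company names, rounds for undisclosed companies fall out of the combined table automatically.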
CMSA-Year Aggregated Table
The final table contains, for each CMSA and year, the number of companies and the amount of investment, categorized by distance (near-far) and stage.
We processed data as follows:
- Create the CMSA-Year Table
  - Create single-variable tables: distinct CMSA, year, stage, founding year of fund, and founding year of company
  - Create the cross-product tables: CMSA-year, CMSA-year-fund year founded and CMSA-year-company year founded
- Draw data from the cleaned companies, funds and rounds tables
  - Create a table with 'CMSA', 'number of companies' and 'year Founded' from the cleaned companies table and join it to CMSA-year founded
  - Create a table with 'Company CMSA', 'round year', 'disclosed amount' from the rounds-companies combined table, and add stage binary variables. Join it to CMSA-year-company year founded
  - Create a table with 'CMSA', 'fund year', 'number of investors' from the cleaned funds table and join it to CMSA-year-fund year founded
- Create the near-far and stages table
  - Add fund data to rounds-companies
  - Create near-far and stages binary variables
  - Count investment and deals by CMSA and year, categorized by near-far and stages
- Combine all tables by CMSA and round-year
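The near-far counting step can be sketched as below. The page does not define "near", so this sketch assumes a deal is near when the fund and the portfolio company share a CMSA; that definition, the field names, and the sample deals are all hypothetical.

```python
from collections import defaultdict


def aggregate_near_far(deals):
    """Count deals and sum amounts by (company CMSA, round year, near flag)."""
    totals = defaultdict(lambda: {"deals": 0, "amount": 0.0})
    for deal in deals:
        # Assumption: near = 1 when fund and company are in the same CMSA.
        near = int(deal["fund_cmsa"] == deal["company_cmsa"])
        key = (deal["company_cmsa"], deal["round_year"], near)
        totals[key]["deals"] += 1
        totals[key]["amount"] += deal["amount"]
    return dict(totals)


# Hypothetical sample deals, for illustration only.
deals = [
    {"company_cmsa": "A", "fund_cmsa": "A", "round_year": 2000, "amount": 1.0},
    {"company_cmsa": "A", "fund_cmsa": "B", "round_year": 2000, "amount": 2.0},
    {"company_cmsa": "A", "fund_cmsa": "A", "round_year": 2000, "amount": 0.5},
]
summary = aggregate_near_far(deals)
```

The same grouping key could be extended with a stage flag to produce the stage breakdown.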
Supplementary Data Sets
Supplementary data sets are cleaned and joined back to the CMSA-Year table on CMSA and year.
- Number of STEM graduate students, by university and year (2005 to 2014).
E:\McNair\Projects\Hubs\STEM grads for upload v2.xls
- University R&D spending, by university and year (2004 to 2014).
E:\McNair\Projects\Hubs\NSF spending for upload.xls
- Income per capita, by MSA and year (2000 to 2012)
E:\McNair\Projects\Hubs\Income per capita upload.xls
- Wages and salaries, by MSA and year (2000 to 2012)
E:\McNair\Projects\Hubs\Wage for upload v2.xls
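Joining an MSA-level supplementary series back to the CMSA-Year table might look like the sketch below. It assumes the MSA List is used to map each MSA code to a CMSA and that values are summed within a CMSA and year. Summing suits totals such as wages, while a per-capita series would need a weighted average instead. The codes, field names, and values are hypothetical.

```python
from collections import defaultdict


def roll_up_to_cmsa(msa_rows, msa_to_cmsa):
    """Sum an MSA-level value to (CMSA, year); skip MSAs with no CMSA mapping."""
    totals = defaultdict(float)
    for row in msa_rows:
        cmsa = msa_to_cmsa.get(row["msa_code"])
        if cmsa is None:
            continue
        totals[(cmsa, row["year"])] += row["value"]
    return dict(totals)


def attach(panel, supplement, column):
    """Left join: add `column` to each panel row, None where there is no match."""
    return [
        {**row, column: supplement.get((row["cmsa"], row["year"]))}
        for row in panel
    ]


# Hypothetical sample values, for illustration only.
msa_to_cmsa = {"m1": "A", "m2": "A"}
wages = [
    {"msa_code": "m1", "year": 2001, "value": 10.0},
    {"msa_code": "m2", "year": 2001, "value": 5.0},
    {"msa_code": "m9", "year": 2001, "value": 7.0},  # unmapped MSA, skipped
]
panel = [{"cmsa": "A", "year": 2001}, {"cmsa": "A", "year": 2002}]
result = attach(panel, roll_up_to_cmsa(wages, msa_to_cmsa), "wages")
```

The left join keeps every CMSA-year row, so years outside a supplementary series' coverage show up as missing values rather than disappearing.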