Difference between revisions of "Hubs"
(92 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | {{McNair | + | {{Project |
+ | |Has project output=Data | ||
+ | |Has sponsor=McNair Center | ||
|Has title=Hubs | |Has title=Hubs | ||
|Has owner=Hira Farooqi, | |Has owner=Hira Farooqi, | ||
|Has keywords=Data | |Has keywords=Data | ||
|Has project status=Active | |Has project status=Active | ||
+ | |Does subsume=Hubs Analysis 2017, | ||
}} | }} | ||
− | |||
− | + | '''Important Notice: The last update to the hubs data was done manually by Ed and is in E:\projects\MeasuringHGHTEcosystems\HubsData-RevisedSimplified.xlsx''' | |
− | |||
− | |||
− | |||
− | |||
− | |||
+ | The Hubs Research Project is a full-length academic paper analyzing the effectiveness of "hubs", a component of the entrepreneurship ecosystem, in the advancement and growth of entrepreneurial success in a metropolitan area. It focuses on cities in the United States as the primary unit of analysis. | ||
− | + | This page contains information about data used for this research project, including data sources, location of data on RDP and details on data processing. | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | Information on initial data work done prior to Summer 2017 can be found at [[Hubs Summer 2016]]. | |
− | + | '''Note on joining:''' The city-state-year ID from VC data is used as the master ID for joining datasets. Each table (e.g. income, nih, nsf, sbir, compustat) is first joined with the VC data on city-state-year ID and then the resulting tables are all joined together in the final table. | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
===Data by zip code=== | ===Data by zip code=== | ||
Line 285: | Line 44: | ||
The Census claims to provide city to MSA here: https://www.census.gov/geo/maps-data/data/ua_rel_download.html | The Census claims to provide city to MSA here: https://www.census.gov/geo/maps-data/data/ua_rel_download.html | ||
However, there is only CBSA! | However, there is only CBSA! | ||
+ | |||
+ | This might do it: https://www2.census.gov/geo/pdfs/maps-data/data/rel/explanation_ua_cbsa_rel_10.pdf | ||
+ | We can maybe track city to principal city to MSA | ||
==COMPUSTAT Data== | ==COMPUSTAT Data== | ||
+ | The data set includes information on publicly traded firms in the US. It was obtained from the Wharton Research Data Services (https://wrds-web.wharton.upenn.edu/wrds/index.cfm?). | ||
− | Data is in: | + | |
+ | Raw Data is in: | ||
E:\McNair\Projects\Hubs\Summer 2017 | E:\McNair\Projects\Hubs\Summer 2017 | ||
Z:\Hubs\2017 | Z:\Hubs\2017 | ||
− | |||
− | |||
− | |||
− | |||
The source file is RandDExpenditures.txt. It contains: | The source file is RandDExpenditures.txt. It contains: | ||
− | *Date from 1980-2017 (July) | + | *Date from 1980-2017 (July). |
*427799 records | *427799 records | ||
*Fields include: | *Fields include: | ||
**R&D Expenditure | **R&D Expenditure | ||
**Address (inc. city, zip, state) | **Address (inc. city, zip, state) | ||
+ | **Revenue of firms | ||
+ | |||
+ | Database is '''cities''' | ||
+ | |||
+ | SQL script is: COMPUSTAT.sql | ||
Output file is COMPUSTATSummary.txt. It contains: | Output file is COMPUSTATSummary.txt. It contains: | ||
Line 307: | Line 72: | ||
*1979-2016 | *1979-2016 | ||
*4440 cities | *4440 cities | ||
+ | |||
+ | It is located in | ||
+ | Z:\Hubs\2017\Output_Files | ||
==NSF Data== | ==NSF Data== | ||
Line 317: | Line 85: | ||
SQL script is: nsf_2017.sql | SQL script is: nsf_2017.sql | ||
− | The source | + | The source files are: nsf2017.txt, copied from table '''nsf''', and nsf_institution copied from table '''nsf_grants_institution''' from the biotech db. |
− | + | They contain: | |
*Award ID | *Award ID | ||
*Award Institution | *Award Institution | ||
Line 325: | Line 93: | ||
*Institution city | *Institution city | ||
*Award Value | *Award Value | ||
+ | *Organization state code | ||
From 1900 - 2017 | From 1900 - 2017 | ||
Output file is nsfSummary.txt. It contains: | Output file is nsfSummary.txt. It contains: | ||
− | *Variables: City, year, | + | *Variables: City, State code year, nsf_nogrants, nsf_valuegrant |
*1900-2017 | *1900-2017 | ||
− | + | ===Joined NSF table=== | |
− | + | The joined nsf table with the VC table is found in db '''cities'''. The table is named '''merged_nsf'''. | |
+ | All the values of nogrants and valuegrant with missing values for years 1990-2017 are set equal to 0. | ||
+ | The sql script is in | ||
+ | Z:\HUbs\2017\sql scripts | ||
==NIH Data== | ==NIH Data== | ||
Line 346: | Line 118: | ||
*nih_2013_2015 | *nih_2013_2015 | ||
located in E:\McNair\Projects\Federal Grant Data\NIH | located in E:\McNair\Projects\Federal Grant Data\NIH | ||
+ | |||
+ | |||
+ | |||
+ | The script that cleans NIH data and generates the summary table is titled '''nihSummary'''. It is located here: | ||
+ | |||
+ | Z:\Hubs\2017\sql scripts | ||
+ | |||
+ | This table includes | ||
+ | *year | ||
+ | *city | ||
+ | *state | ||
+ | *country | ||
+ | *nogrants (number of grants) | ||
+ | *valuegrant | ||
+ | *city_state | ||
*Date from 1986-2015 | *Date from 1986-2015 | ||
+ | |||
+ | ===Joined NIH table=== | ||
+ | The joined NIH table with the VC table is found in db '''cities'''. The table is named '''merged_nih'''. | ||
+ | All the values of nih_valuegrant and nih_nogrants with missing values for years 1986-2015 are set equal to 0. | ||
+ | The sql script is in | ||
+ | Z:\HUbs\2017\sql scripts | ||
==Clinical Trials Data== | ==Clinical Trials Data== | ||
Line 363: | Line 156: | ||
*Date from 1999-2017 | *Date from 1999-2017 | ||
+ | |||
+ | ===Joined clinical trials table=== | ||
+ | |||
+ | The file which contains the number of trials in each city and year is located in: | ||
+ | Z:\Hubs\2017 | ||
+ | |||
+ | The file is in: | ||
+ | Z:\Hubs\2017\clean data | ||
+ | The name of the file is: | ||
+ | ctrialsSummary.txt | ||
+ | |||
+ | It contains: | ||
+ | *city | ||
+ | *year | ||
+ | *city_state_year | ||
+ | *noctrials - number of trials | ||
+ | |||
+ | The ctrials is joined with VC table. | ||
+ | The joined SQL script is: '''new_ctrials.sql''' and it is located in | ||
+ | Z:\Hubs\2017\sql scripts | ||
+ | |||
+ | The name of the joined table is '''new_merged_ctrials'''. | ||
+ | |||
+ | It contains: | ||
+ | *city | ||
+ | *state | ||
+ | *city_state_id | ||
+ | *city_state_year | ||
+ | *year | ||
+ | *noctrials | ||
+ | *seedamtm | ||
+ | *earlyamtm | ||
+ | *lateramtm | ||
+ | *selamtm | ||
+ | *numseeds | ||
+ | *numearly | ||
+ | *numlater | ||
+ | *numsel | ||
+ | |||
+ | All the values of noctrials with missing values for years 1999-2017 are set equal to 0. | ||
+ | |||
+ | ==Population Data== | ||
+ | Data is in: | ||
+ | Z:\Hubs | ||
+ | E:\McNair\Projects\Hubs\Summer 2017 | ||
+ | |||
+ | Database is '''cities''' | ||
+ | |||
+ | SQL script is: '''population.sql''' | ||
+ | The source files are: | ||
+ | *pop2000_2009.xlsx | ||
+ | *pop2010_2016.xlsx | ||
+ | |||
+ | They contain: | ||
+ | *State | ||
+ | *City name | ||
+ | *Year | ||
+ | *Population Estimates | ||
+ | |||
+ | Date from 2000-2016 | ||
+ | |||
+ | ===Joined population table=== | ||
+ | |||
+ | Data is in: | ||
+ | Z:\Hubs\2017\clean data | ||
+ | The file names are | ||
+ | 1_population.txt - contains data on population estimates from 2000-2009 | ||
+ | 2_population.txt - contains data on population estimates from 2010-2016 | ||
+ | |||
+ | |||
+ | Database is '''cities''' | ||
+ | SQL script is: '''new_population.sql''', located in | ||
+ | Z:\Hubs\2017\sql scripts | ||
+ | |||
+ | The population table is joined on VC table. The table is called '''new_merged_population'''. | ||
+ | |||
+ | They contain: | ||
+ | *City | ||
+ | *State | ||
+ | *city_state_id to uniquely identify each city | ||
+ | *city_state_year to uniquely identify each city in each year | ||
+ | *Population estimates | ||
+ | *Year | ||
+ | *Code from the state code and Fips code | ||
+ | *State full name | ||
+ | |||
+ | ==Income Data== | ||
+ | |||
+ | Raw data was obtained from Census data, American Communities Survey. | ||
+ | |||
+ | Raw Data is in: | ||
+ | E:\McNair\Projects\Hubs\Summer 2017\MSA Income_raw.zip | ||
+ | |||
+ | |||
+ | Date from 2005-2015 | ||
+ | |||
+ | The master list with MSAs and principal cities is titled '''list2.xls'''. It is located at: | ||
+ | Z:\Hubs\2017 | ||
+ | |||
+ | This master list includes: | ||
+ | *MSA code | ||
+ | *MSA name | ||
+ | *Principal City | ||
+ | *State | ||
+ | *Place code (city code) | ||
+ | *State Code | ||
+ | |||
+ | This master list was edited to associate each principal city with a unique state. E.g. if New York is the principal city located in New York-New Jersey MSA, it was associated with state NY-NJ. So '''list''' was edited to put New York with NY. | ||
+ | |||
+ | |||
+ | Cleaned Income data files are in | ||
+ | Z:\Hubs\2017\merging_on_ID | ||
+ | |||
+ | They contain: | ||
+ | *MSA code | ||
+ | *MSA | ||
+ | *Year | ||
+ | *Total Household Income | ||
+ | |||
+ | The MSA-City-State look up file is titled '''msa_city_state_wcode.txt'''. It is located in | ||
+ | Z:\Hubs\2017\merging_on_ID | ||
+ | |||
+ | The SQL file that merges income data from ACS (by MSA - Year) with the MSA-City file is titled '''income.sql'''. It is located here: | ||
+ | Z:\Hubs\2017\sql scripts | ||
+ | |||
+ | |||
+ | The final income table is in db '''cities''' titled '''merged_income'''. | ||
+ | |||
+ | It includes: | ||
+ | *MSA | ||
+ | *City | ||
+ | *State | ||
+ | *Year | ||
+ | *Total Household Income | ||
+ | |||
+ | The table includes 8780 observations | ||
+ | |||
+ | ===Joined income table=== | ||
+ | |||
+ | Data is in: | ||
+ | Z:\Hubs\clean data | ||
+ | The file names are: | ||
+ | INC_05.txt - INC_15.txt | ||
+ | |||
+ | |||
+ | Database is '''cities''' | ||
+ | SQL script is: merged_income.sql | ||
+ | |||
+ | |||
+ | They contain: | ||
+ | *City | ||
+ | *State | ||
+ | *city_state_id to uniquely identify each city | ||
+ | *Income | ||
+ | *Year | ||
+ | *Code from the state code and Fips code | ||
+ | |||
+ | ==Employment Data== | ||
+ | |||
+ | Data on employment was obtained from American Communities Survey, US Census Bureau. | ||
+ | |||
+ | Raw Data is in: | ||
+ | E:\McNair\Projects\Hubs\Summer 2017\Employment Data by MSA | ||
+ | Cleaned files are in | ||
+ | Z:\Hubs\2017\clean data | ||
+ | |||
+ | They contain: | ||
+ | *MSA code | ||
+ | *MSA | ||
+ | *Year | ||
+ | *Employment rate of individuals 16 years or older | ||
+ | *Unemployment rate of individuals 16 years or older | ||
+ | |||
+ | Date from 2005-2015 | ||
+ | |||
+ | The SQL file that merges employment data from ACS (by MSA - Year) with the MSA-City file is titled '''Employment.sql'''. | ||
+ | The file is located in: | ||
+ | Z:\Hubs\2017 | ||
+ | |||
+ | The final table is in db '''cities''' titled '''merged_employment'''. | ||
+ | |||
+ | It includes: | ||
+ | *MSA | ||
+ | *City | ||
+ | *Year | ||
+ | *Employment rate | ||
+ | *Unemployment rate | ||
+ | |||
+ | ===Joined employment table=== | ||
+ | |||
+ | Data is in: | ||
+ | Z:\Hubs\clean data | ||
+ | |||
+ | The file names are: | ||
+ | EMP_05.txt - EMP_15.txt | ||
+ | |||
+ | Database is '''cities''' | ||
+ | SQL script is: '''new_employment.sql''' and it is located in | ||
+ | Z:\Hubs\2017\sql scripts | ||
+ | |||
+ | The final table which is joined on VC is in db cities titled '''new_merged_employment'''. | ||
+ | |||
+ | They contain: | ||
+ | *City | ||
+ | *State | ||
+ | *Code from the state code and Fips code | ||
+ | *city_state_id to uniquely identify each city | ||
+ | *city_state_year to uniquely identify each city in each year | ||
+ | *Employment rates of individuals of 16 years or older | ||
+ | *Unemployment rates of individuals of 16 years or older | ||
+ | *Year | ||
+ | |||
+ | ==Schooling Data== | ||
+ | |||
+ | Data on schooling was obtained from American Communities Survey, US Census Bureau. | ||
+ | |||
+ | Raw Data is in: | ||
+ | E:\McNair\Projects\Hubs\Summer 2017\School Enrollment Data by MSA | ||
+ | Cleaned files are in | ||
+ | Z:\Hubs\2017\clean data | ||
+ | |||
+ | They contain: | ||
+ | *MSA code | ||
+ | *MSA | ||
+ | *Year | ||
+ | *Total number of population 3 years and over enrolled in school | ||
+ | *Percent of population 3 years and over enrolled in public school | ||
+ | *Percent of population 3 years and over enrolled in private school | ||
+ | |||
+ | Date from 2005-2015 | ||
+ | |||
+ | The SQL file that merges schooling data from ACS (by MSA - Year) with the MSA-City file is titled '''schooling.sql'''. | ||
+ | The file is located in: | ||
+ | Z:\Hubs\2017 | ||
+ | |||
+ | The final table is in db '''cities''' titled '''merged_schooling'''. | ||
+ | |||
+ | It includes: | ||
+ | *MSA | ||
+ | *City | ||
+ | *Year | ||
+ | *Total | ||
+ | *Percent_public_schooling | ||
+ | *Percent_private_schooling | ||
+ | |||
+ | ===Joined schooling table=== | ||
+ | |||
+ | Data is in: | ||
+ | Z:\Hubs\clean data | ||
+ | The file names are: | ||
+ | SCH_05.txt - SCH_15.txt | ||
+ | |||
+ | |||
+ | Database is '''cities''' | ||
+ | SQL script which joins this table with VC table is: '''new_merged_schooling.sql''' | ||
+ | The final table is in db '''cities''' titled '''new_merged_schooling'''. | ||
+ | |||
+ | It contains: | ||
+ | *City | ||
+ | *State | ||
+ | *city_state_id to uniquely identify each city | ||
+ | *city_state_year to uniquely identify each city in each year | ||
+ | *Total number of school enrollment | ||
+ | *Percentage enrolled in public schools | ||
+ | *Percentage enrolled in private schools | ||
+ | *Year | ||
+ | *Code from the state code and Fips code | ||
+ | |||
+ | ==VC Data== | ||
+ | |||
+ | |||
+ | |||
+ | Raw Data is in: | ||
+ | Z:\VentureCapitalData\SDCVCData\vcdb2 | ||
+ | The file name is roundleveloutput2.txt | ||
+ | |||
+ | It contains: | ||
+ | *city | ||
+ | *state | ||
+ | *year | ||
+ | *seedamtm - seed, amount in millions | ||
+ | *earlyamtm - early, amount in millions | ||
+ | *lateramtm - late, amount in millions | ||
+ | *selamtm - seed early late, amount in millions | ||
+ | *numseeds - number of seeds | ||
+ | *numearly | ||
+ | *numlater | ||
+ | *numsel | ||
+ | *numdeals | ||
+ | *numalive | ||
+ | |||
+ | |||
+ | Date from 1948-2017 | ||
+ | |||
+ | |||
+ | The table is in db '''cities''' titled '''new_vc'''. | ||
+ | |||
+ | It includes: | ||
+ | *city | ||
+ | *state | ||
+ | *city_state_id | ||
+ | *city_state_year | ||
+ | *seedamtm | ||
+ | *earlyamtm | ||
+ | *lateramtm | ||
+ | *selamtm | ||
+ | *numseeds | ||
+ | *numearly | ||
+ | *numlater | ||
+ | *numsel | ||
+ | *numdeals | ||
+ | *numalive | ||
+ | *year | ||
+ | |||
+ | ==Final Joined Data set == | ||
+ | |||
+ | The final data set is in file '''final.txt''' and is located here: | ||
+ | Z:\Hubs\2017 | ||
+ | |||
+ | It includes: | ||
+ | *city | ||
+ | *state | ||
+ | *city_state_year - (ID that data is merged on) | ||
+ | *year | ||
+ | *seedamtm - Seed Amount | ||
+ | *earlyamtm - Early Investment Amount | ||
+ | *lateramtm - Late Investment Amount | ||
+ | *selamtm - Seed early or late amount | ||
+ | *numseeds - Number of seed investments | ||
+ | *numearly - Number of early investments | ||
+ | *numlater - Number of late investments | ||
+ | *numsel | ||
+ | *numdeals - Number of deals (first contracts) | ||
+ | *numalive - Number of start ups alive | ||
+ | *income - Income per capita in each city-year | ||
+ | *sbir_nogrants - Number of SBIR grants | ||
+ | *sbir_valuegrant - Value of SBIR grants | ||
+ | *emp - Employment stats of each city-year | ||
+ | *unemp - Rate of unemployment | ||
+ | *popestimate - Population estimate of each city-year | ||
+ | *private - Enrollment in private schools | ||
+ | *public - Enrollment in public schools | ||
+ | *total - | ||
+ | *numfirms - Number of publicly traded firms | ||
+ | *randd - R&D expenditure of publicly traded firms | ||
+ | *revenue - Revenue of PTF | ||
+ | *totalassets | ||
+ | *nsf_nogrants - Number of NSF grants | ||
+ | *valuegrant - Value of NSF grants | ||
+ | *nih_nogrants - Number of NIH grants | ||
+ | *nih_valuegrant - Value of NIH grants | ||
+ | *noctrials - NUmber of clinical trials | ||
+ | |||
+ | == Defining Hubs == | ||
+ | '''Summer 2016''' - Last year a master list of 125 "potential" hubs was used. A scorecard was developed which filtered these 125 candidate hubs to determine which of these should be included in the study sample. This method resulted in a sample size of ~ 30. The master list and the final hubs list is titled '''Hubs Data v2_'16'''. It is located here: | ||
+ | Z:\Hubs\2017\hubs_data | ||
+ | |||
+ | '''Summer 2017''' - In order to obtain a more statistically significant sample of hubs, we developed 5 criteria which produce a more relaxed definition of hubs than last year. These include | ||
+ | |||
+ | *Availability of co-working space | ||
+ | *Coding classes or tech events | ||
+ | *Some focus on the tech sector (this is important as our dependent variable is VC funding) | ||
+ | *Presence of an accelerator | ||
+ | *Availability of mentorship for members. | ||
+ | |||
+ | We will review the 125 candidate hubs and select those which satisfy a subset or all of these characteristics. | ||
+ | |||
+ | |||
+ | |||
+ | [[category:Internal]] |
Latest revision as of 12:41, 21 September 2020
Hubs | |
---|---|
Project Information | |
Has title | Hubs |
Has owner | Hira Farooqi |
Has start date | |
Has deadline date | |
Has keywords | Data |
Has project status | Active |
Does subsume | Hubs Analysis 2017 |
Has sponsor | McNair Center |
Has project output | Data |
Copyright © 2019 edegan.com. All Rights Reserved. |
Important Notice: The last update to the hubs data was done manually by Ed and is in E:\projects\MeasuringHGHTEcosystems\HubsData-RevisedSimplified.xlsx
The Hubs Research Project is a full-length academic paper analyzing the effectiveness of "hubs", a component of the entrepreneurship ecosystem, in the advancement and growth of entrepreneurial success in a metropolitan area. It focuses on cities in the United States as the primary unit of analysis.
This page contains information about data used for this research project, including data sources, location of data on RDP and details on data processing.
Information on initial data work done prior to Summer 2017 can be found at Hubs Summer 2016.
Note on joining: The city-state-year ID from VC data is used as the master ID for joining datasets. Each table (e.g. income, nih, nsf, sbir, compustat) is first joined with the VC data on city-state-year ID and then the resulting tables are all joined together in the final table.
Contents
Data by zip code
- Population data, 2000-2016 - US Census Bureau (E:\McNair\Hubs\summer 2017)
https://www2.census.gov/programs-surveys/popest/datasets/
- Income data, 1998-2014 - The Internal Revenue Service (E:\McNair\Hubs\summer 2017)
https://www.irs.gov/uac/about-irs
- DCI index, to assess the economic well-being of communities
http://eig.org/dci/interactive-maps/u-s-zip-codes
- R&D Expenses, 1980-2016 - Wharton Research Data Services (E:\McNair\Hubs\summer 2017)
- Zipcode look-up table obtained from https://www.unitedstateszipcodes.org/zip-code-database/. It's available in (E:\McNair\Hubs\summer 2017).
Data by MSA
We have principle cities of MSAs from the census: https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html
We might be able to go City -> FIPS place code -> MSA?
Cities and their FIPS codes (which don't perfectly correspond) are available from https://www.census.gov/geo/reference/codes/place.html
The Census claims to provide city to MSA here: https://www.census.gov/geo/maps-data/data/ua_rel_download.html However, there is only CBSA!
This might do it: https://www2.census.gov/geo/pdfs/maps-data/data/rel/explanation_ua_cbsa_rel_10.pdf We can maybe track city to principal city to MSA
COMPUSTAT Data
The data set includes information on publicly traded firms in the US. It was obtained from the Wharton Research Data Services (https://wrds-web.wharton.upenn.edu/wrds/index.cfm?).
Raw Data is in:
E:\McNair\Projects\Hubs\Summer 2017 Z:\Hubs\2017
The source file is RandDExpenditures.txt. It contains:
- Date from 1980-2017 (July).
- 427799 records
- Fields include:
- R&D Expenditure
- Address (inc. city, zip, state)
- Revenue of firms
Database is cities
SQL script is: COMPUSTAT.sql
Output file is COMPUSTATSummary.txt. It contains:
- Variables: City, year, No.public firms, sum R&D, sum Sales, sum total assets
- 1979-2016
- 4440 cities
It is located in
Z:\Hubs\2017\Output_Files
NSF Data
Data is in:
E:\McNair\Projects\Hubs\Summer 2017 Z:\Hubs\2017
Database is cities
SQL script is: nsf_2017.sql
The source files are: nsf2017.txt, copied from table nsf, and nsf_institution copied from table nsf_grants_institution from the biotech db.
They contain:
- Award ID
- Award Institution
- Award Effective date
- Institution city
- Award Value
- Organization state code
From 1900 - 2017
Output file is nsfSummary.txt. It contains:
- Variables: City, State code year, nsf_nogrants, nsf_valuegrant
- 1900-2017
Joined NSF table
The joined nsf table with the VC table is found in db cities. The table is named merged_nsf. All the values of nogrants and valuegrant with missing values for years 1990-2017 are set equal to 0. The sql script is in
Z:\HUbs\2017\sql scripts
NIH Data
Data is in:
Z:\Hubs E:\McNair\Projects\Hubs\Summer 2017
Database is cities SQL script is: nih2017.sql The source files are:
- nih_1986_2001.csv
- nih_2002_2012.txt
- nih_2013_2015
located in E:\McNair\Projects\Federal Grant Data\NIH
The script that cleans NIH data and generates the summary table is titled nihSummary. It is located here:
Z:\Hubs\2017\sql scripts
This table includes
- year
- city
- state
- country
- nogrants (number of grants)
- valuegrant
- city_state
- Date from 1986-2015
Joined NIH table
The joined NIH table with the VC table is found in db cities. The table is named merged_nih. All the values of nih_valuegrant and nih_nogrants with missing values for years 1986-2015 are set equal to 0. The sql script is in
Z:\HUbs\2017\sql scripts
Clinical Trials Data
Data is in:
Z:\Hubs E:\McNair\Projects\Hubs\Summer 2017
Database is cities SQL script is: ctrials.sql The source file is:
- medclinical.txt
located in Z:\Hubs\2017
- Date from 1999-2017
Joined clinical trials table
The file which contains the number of trials in each city and year is located in:
Z:\Hubs\2017
The file is in:
Z:\Hubs\2017\clean data
The name of the file is:
ctrialsSummary.txt
It contains:
- city
- year
- city_state_year
- noctrials - number of trials
The ctrials is joined with VC table. The joined SQL script is: new_ctrials.sql and it is located in
Z:\Hubs\2017\sql scripts
The name of the joined table is new_merged_ctrials.
It contains:
- city
- state
- city_state_id
- city_state_year
- year
- noctrials
- seedamtm
- earlyamtm
- lateramtm
- selamtm
- numseeds
- numearly
- numlater
- numsel
All the values of noctrials with missing values for years 1999-2017 are set equal to 0.
Population Data
Data is in:
Z:\Hubs E:\McNair\Projects\Hubs\Summer 2017
Database is cities
SQL script is: population.sql The source files are:
- pop2000_2009.xlsx
- pop2010_2016.xlsx
They contain:
- State
- City name
- Year
- Population Estimates
Date from 2000-2016
Joined population table
Data is in:
Z:\Hubs\2017\clean data
The file names are
1_population.txt - contains data on population estimates from 2000-2009 2_population.txt - contains data on population estimates from 2010-2016
Database is cities
SQL script is: new_population.sql, located in
Z:\Hubs\2017\sql scripts
The population table is joined on VC table. The table is called new_merged_population.
They contain:
- City
- State
- city_state_id to uniquely identify each city
- city_state_year to uniquely identify each city in each year
- Population estimates
- Year
- Code from the state code and Fips code
- State full name
Income Data
Raw data was obtained from Census data, American Communities Survey.
Raw Data is in:
E:\McNair\Projects\Hubs\Summer 2017\MSA Income_raw.zip
Date from 2005-2015
The master list with MSAs and principal cities is titled list2.xls. It is located at:
Z:\Hubs\2017
This master list includes:
- MSA code
- MSA name
- Principal City
- State
- Place code (city code)
- State Code
This master list was edited to associate each principal city with a unique state. E.g. if New York is the principal city located in New York-New Jersey MSA, it was associated with state NY-NJ. So list was edited to put New York with NY.
Cleaned Income data files are in
Z:\Hubs\2017\merging_on_ID
They contain:
- MSA code
- MSA
- Year
- Total Household Income
The MSA-City-State look up file is titled msa_city_state_wcode.txt. It is located in
Z:\Hubs\2017\merging_on_ID
The SQL file that merges income data from ACS (by MSA - Year) with the MSA-City file is titled income.sql. It is located here:
Z:\Hubs\2017\sql scripts
The final income table is in db cities titled merged_income.
It includes:
- MSA
- City
- State
- Year
- Total Household Income
The table includes 8780 observations
Joined income table
Data is in:
Z:\Hubs\clean data
The file names are:
INC_05.txt - INC_15.txt
Database is cities SQL script is: merged_income.sql
They contain:
- City
- State
- city_state_id to uniquely identify each city
- Income
- Year
- Code from the state code and Fips code
Employment Data
Data on employment was obtained from American Communities Survey, US Census Bureau.
Raw Data is in:
E:\McNair\Projects\Hubs\Summer 2017\Employment Data by MSA
Cleaned files are in
Z:\Hubs\2017\clean data
They contain:
- MSA code
- MSA
- Year
- Employment rate of individuals 16 years or older
- Unemployment rate of individuals 16 years or older
Date from 2005-2015
The SQL file that merges employment data from ACS (by MSA - Year) with the MSA-City file is titled Employment.sql. The file is located in:
Z:\Hubs\2017
The final table is in db cities titled merged_employment.
It includes:
- MSA
- City
- Year
- Employment rate
- Unemployment rate
Joined employment table
Data is in:
Z:\Hubs\clean data
The file names are:
EMP_05.txt - EMP_15.txt
Database is cities SQL script is: new_employment.sql and it is located in Z:\Hubs\2017\sql scripts
The final table which is joined on VC is in db cities titled new_merged_employment.
They contain:
- City
- State
- Code from the state code and Fips code
- city_state_id to uniquely identify each city
- city_state_year to uniquely identify each city in each year
- Employment rates of individuals of 16 years or older
- Unemployment rates of individuals of 16 years or older
- Year
Schooling Data
Data on schooling was obtained from American Communities Survey, US Census Bureau.
Raw Data is in:
E:\McNair\Projects\Hubs\Summer 2017\School Enrollment Data by MSA
Cleaned files are in
Z:\Hubs\2017\clean data
They contain:
- MSA code
- MSA
- Year
- Total number of population 3 years and over enrolled in school
- Percent of population 3 years and over enrolled in public school
- Percent of population 3 years and over enrolled in private school
Date from 2005-2015
The SQL file that merges schooling data from ACS (by MSA - Year) with the MSA-City file is titled schooling.sql. The file is located in:
Z:\Hubs\2017
The final table is in db cities titled merged_schooling.
It includes:
- MSA
- City
- Year
- Total
- Percent_public_schooling
- Percent_private_schooling
Joined schooling table
Data is in:
Z:\Hubs\clean data
The file names are:
SCH_05.txt - SCH_15.txt
Database is cities SQL script which joins this table with VC table is: new_merged_schooling.sql The final table is in db cities titled new_merged_schooling.
It contains:
- City
- State
- city_state_id to uniquely identify each city
- city_state_year to uniquely identify each city in each year
- Total number of school enrollment
- Percentage enrolled in public schools
- Percentage enrolled in private schools
- Year
- Code from the state code and Fips code
VC Data
Raw Data is in:
Z:\VentureCapitalData\SDCVCData\vcdb2 The file name is roundleveloutput2.txt
It contains:
- city
- state
- year
- seedamtm - seed, amount in millions
- earlyamtm - early, amount in millions
- lateramtm - late, amount in millions
- selamtm - seed early late, amount in millions
- numseeds - number of seeds
- numearly
- numlater
- numsel
- numdeals
- numalive
Date from 1948-2017
The table is in db cities titled new_vc.
It includes:
- city
- state
- city_state_id
- city_state_year
- seedamtm
- earlyamtm
- lateramtm
- selamtm
- numseeds
- numearly
- numlater
- numsel
- numdeals
- numalive
- year
Final Joined Data set
The final data set is in file final.txt and is located here:
Z:\Hubs\2017
It includes:
- city
- state
- city_state_year - (ID that data is merged on)
- year
- seedamtm - Seed Amount
- earlyamtm - Early Investment Amount
- lateramtm - Late Investment Amount
- selamtm - Seed early or late amount
- numseeds - Number of seed investments
- numearly - Number of early investments
- numlater - Number of late investments
- numsel
- numdeals - Number of deals (first contracts)
- numalive - Number of start ups alive
- income - Income per capita in each city-year
- sbir_nogrants - Number of SBIR grants
- sbir_valuegrant - Value of SBIR grants
- emp - Employment stats of each city-year
- unemp - Rate of unemployment
- popestimate - Population estimate of each city-year
- private - Enrollment in private schools
- public - Enrollment in public schools
- total -
- numfirms - Number of publicly traded firms
- randd - R&D expenditure of publicly traded firms
- revenue - Revenue of PTF
- totalassets
- nsf_nogrants - Number of NSF grants
- valuegrant - Value of NSF grants
- nih_nogrants - Number of NIH grants
- nih_valuegrant - Value of NIH grants
- noctrials - NUmber of clinical trials
Defining Hubs
Summer 2016 - Last year a master list of 125 "potential" hubs was used. A scorecard was developed which filtered these 125 candidate hubs to determine which of these should be included in the study sample. This method resulted in a sample size of ~ 30. The master list and the final hubs list is titled Hubs Data v2_'16. It is located here:
Z:\Hubs\2017\hubs_data
Summer 2017 - In order to obtain a more statistically significant sample of hubs, we developed 5 criteria which produce a more relaxed definition of hubs than last year. These include
- Availability of co-working space
- Coding classes or tech events
- Some focus on the tech sector (this is important as our dependent variable is VC funding)
- Presence of an accelerator
- Availability of mentorship for members.
We will review the 125 candidate hubs and select those which satisfy a subset or all of these characteristics.