Difference between revisions of "Ranking US Cities by Venture Capital"
(15 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | {{McNair | + | {{Project |
+ | |Has project output= | ||
+ | |Has sponsor=McNair Center | ||
|Has title=Ranking US Cities by Venture Capital | |Has title=Ranking US Cities by Venture Capital | ||
|Has owner=Ed Egan, Anne Dayton, Diana Carranza, | |Has owner=Ed Egan, Anne Dayton, Diana Carranza, | ||
Line 9: | Line 11: | ||
==Project Description== | ==Project Description== | ||
− | This project was initially undertaken in the summer of 2017 and resulted in a full report on 2016. | + | This project was initially undertaken in the summer of 2017 and resulted in a full report on 2016. See: [[US Startup City Ranking]]. |
− | The data was then updated for 2017 and 2018 (Q1 and Q2), which resulted in a ranking spreadsheet and document, but not a full report. | + | The data was then updated for 2017 and 2018 (Q1 and Q2), which resulted in a ranking spreadsheet and document, but not a full report in the first rebuild using [[Restoring vcdb3|vcdb3]]. It was then updated again to include the first half of 2019, using [[Vcdb4]]. The third update includes everything up to the end of 2020 and uses [[VCDB20]]. |
− | == | + | ==Third Rebuild== |
− | The data was completely rebuilt based on geocoded places (i.e., place names from Tiger) and was restricted to rounds of growth VC from the outset. In this way, it can be joined the [[American Community Survey]], and it uses real, rather than self-reported cities (lots of people claim to be located in San Francisco while actually being in Alameda or South San Fran, etc.). Likewise, companies on the Emeryville side of the Emeryville-Berkeley border often claim to be in Berkeley. | + | The third rebuild uses [[VCDB20]] and covers up until the end of 2020. It combines geocoding with city information when geocoding is unavailable. The code is in: |
+ | E:\projects\vcdb20\Ranking.sql | ||
+ | |||
+ | The outputs, including the xlsx file that puts everything together, are in: | ||
+ | E:\projects\ranking | ||
+ | |||
+ | The build implements the following decisions: | ||
+ | *'''Placenames''' are from geocoding where possible and '''city''' names where not. | ||
+ | *The '''ACS''' and '''Tigergeog''' tables are joined in using '''geoid''', which is determined from '''placenames''' (where possible). | ||
+ | *A startup received growth VC if it one of its round stages was seed, early or later. | ||
+ | *The ranking uses only growth VC (i.e., '''growthflag'''==1 on '''round''') and '''rounddate''' < '2021-01-01'. | ||
+ | *Places must have non-zero VC in at least one year between 1980 and 2020, inclusive. | ||
+ | *'''numalive''' calculates dead as exit==1 or date>=(datelastinv+5 years), and alive as date>=datefirstinv and dead==0. | ||
+ | *'''newdeal''' is a first investment (i.e., irrespective of stage and one per startup) | ||
+ | *Undisclosed amounts are treated as zeros. Amounts are in millions (unless otherwise stated). | ||
+ | |||
+ | ===Datasets=== | ||
+ | |||
+ | *PlaceYearRankingFull.txt (covers all places and years 1980-2020) | ||
+ | *PlaceYearRanking200.txt (top 200 places for years 1980-2020) | ||
+ | *StateYearRanking.txt (50 states + DC and PR for years 1980-2020) | ||
+ | *PlaceYearRanking2020.txt (all places for 2020) | ||
+ | *PlaceYearRanking200-2020.txt (top 200 places for 2020) | ||
+ | *StateYearRanking2020.txt (50 states + DC and PR for 2020) | ||
+ | |||
+ | ===Artifacts and Facts=== | ||
+ | |||
+ | The main artifacts are: | ||
+ | *The Top 100 (and 200) for 2020 | ||
+ | *A graph of total US growth VC investment 1980-2020 | ||
+ | |||
+ | Other artifacts/facts: | ||
+ | *Fraction of data that is geocodable or has a valid placename | ||
+ | *Fraction of data with disclosed amounts | ||
+ | *Turnover of the Top 10, 20, 50, 100 | ||
+ | *Correlations between the three component measures (top 200?) | ||
+ | *Cumulative Percentage of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time | ||
+ | *Average Growth rate of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time | ||
+ | *Focus on select cities: Rankings over time for Houston, St. Louis, Cincinnati, Boulder, Waltham, Palo Alto | ||
+ | |||
+ | ==Second (Complete) Rebuild== | ||
+ | |||
+ | This build used [[Vcdb4]], see [[Vcdb4#Ranking]]. | ||
+ | |||
+ | The data was completely rebuilt based on geocoded places (i.e., place names from Tiger) and was restricted to rounds of growth VC from the outset. In this way, it can be joined the [[American Community Survey (ACS) Data]], and it uses real, rather than self-reported cities (lots of people claim to be located in San Francisco while actually being in Alameda or South San Fran, etc.). Likewise, companies on the Emeryville side of the Emeryville-Berkeley border often claim to be in Berkeley. | ||
==First code rebuild== | ==First code rebuild== | ||
Line 41: | Line 87: | ||
A code section was added to E:\project\vcdb3\Ranking.sql to explore city growth rates. | A code section was added to E:\project\vcdb3\Ranking.sql to explore city growth rates. | ||
− | The main problem with estimating the average of <math>\frac{dollars_t - dollars_{t-1} }{dollars_{t-1}}< | + | The main problem with estimating the average of <math>\frac{dollars_t - dollars_{t-1} }{dollars_{t-1}}</math>, where dollars is growth VC dollars invested in a city-year, is a truncation issue. <math>dollars_{t-1}</math> is often zero, and it is impossible for the resulting value to be below -100% while it can be above +100%. |
I experimented with various ways to limit this issue, including considering: | I experimented with various ways to limit this issue, including considering: | ||
*Only cities that have had more than $10m in a given year | *Only cities that have had more than $10m in a given year | ||
*City-year pairs with more than $10m | *City-year pairs with more than $10m | ||
− | *The top 100, 50 or 20 cities in 2017, for years from 1990 to 2017 or 2000 to 2017. | + | *The top 100, 50, or 20 cities in 2017, for years from 1990 to 2017 or 2000 to 2017. |
*Only cities that had $10m or more for every year from 2000 to 2017 | *Only cities that had $10m or more for every year from 2000 to 2017 | ||
− | The growth rate for cities that had $10m or more for every year from 2000 to 2017 was 0.334 per year, implying that such "established" ecosystems double their VC every three years. There are 47 such cities. The list of average growth rates by city for these cities is below. The average growth rate increases to 39.17% for the 61 cities which have $10m or more for at least all bar one years, and to 51.43% for the 72 cities that have it | + | The growth rate for cities that had $10m or more for every year from 2000 to 2017 was 0.334 per year, implying that such "established" ecosystems double their VC every three years. There are 47 such cities. The list of average growth rates by city for these cities is below. The average growth rate increases to 39.17% for the 61 cities which have $10m or more for at least all bar one years, and to 51.43% for the 72 cities that have it for all bar two years. |
city | avg | city | avg | ||
Line 102: | Line 148: | ||
(47 rows) | (47 rows) | ||
− | Note that if we exclude 2000 (i.e. consider 2001 to 2017 inclusive), the overall average growth rate drops to 0.2812, and some cities, like Houston, have materially lower growth rates. It is worth noting that Houston had one big up year in 2013, where growth VC investment levels where 2.168 times the previous year, but then suffered a about a 50% mean reversion in the following years. Without 2013, Houston's average 2001 to 2017 growth rate is | + | Note that if we exclude 2000 (i.e. consider 2001 to 2017 inclusive), the overall average growth rate drops to 0.2812, and some cities, like Houston, have materially lower growth rates. It is worth noting that Houston had one big up year in 2013, where growth VC investment levels where 2.168 times the previous year, but then suffered a about a 50% mean reversion in the following years. Without 2013, Houston's average 2001 to 2017 growth rate is -0.0197349989630022. |
city | state | avg | city | state | avg |
Latest revision as of 12:22, 7 March 2021
Ranking US Cities by Venture Capital | |
---|---|
Project Information | |
Has title | Ranking US Cities by Venture Capital |
Has owner | Ed Egan, Anne Dayton, Diana Carranza |
Has start date | Summer 2017 |
Has deadline date | |
Has project status | Active |
Does subsume | Top Cities for VC Backed Companies |
Subsumed by: | Houston's Venture Capital Issue Brief |
Has sponsor | McNair Center |
Copyright © 2019 edegan.com. All Rights Reserved. |
Contents
Project Description
This project was initially undertaken in the summer of 2017 and resulted in a full report on 2016. See: US Startup City Ranking.
The data was then updated for 2017 and 2018 (Q1 and Q2), which resulted in a ranking spreadsheet and document, but not a full report in the first rebuild using vcdb3. It was then updated again to include the first half of 2019, using Vcdb4. The third update includes everything up to the end of 2020 and uses VCDB20.
Third Rebuild
The third rebuild uses VCDB20 and covers up until the end of 2020. It combines geocoding with city information when geocoding is unavailable. The code is in:
E:\projects\vcdb20\Ranking.sql
The outputs, including the xlsx file that puts everything together, are in:
E:\projects\ranking
The build implements the following decisions:
- Placenames are from geocoding where possible and city names where not.
- The ACS and Tigergeog tables are joined in using geoid, which is determined from placenames (where possible).
- A startup received growth VC if it one of its round stages was seed, early or later.
- The ranking uses only growth VC (i.e., growthflag==1 on round) and rounddate < '2021-01-01'.
- Places must have non-zero VC in at least one year between 1980 and 2020, inclusive.
- numalive calculates dead as exit==1 or date>=(datelastinv+5 years), and alive as date>=datefirstinv and dead==0.
- newdeal is a first investment (i.e., irrespective of stage and one per startup)
- Undisclosed amounts are treated as zeros. Amounts are in millions (unless otherwise stated).
Datasets
- PlaceYearRankingFull.txt (covers all places and years 1980-2020)
- PlaceYearRanking200.txt (top 200 places for years 1980-2020)
- StateYearRanking.txt (50 states + DC and PR for years 1980-2020)
- PlaceYearRanking2020.txt (all places for 2020)
- PlaceYearRanking200-2020.txt (top 200 places for 2020)
- StateYearRanking2020.txt (50 states + DC and PR for 2020)
Artifacts and Facts
The main artifacts are:
- The Top 100 (and 200) for 2020
- A graph of total US growth VC investment 1980-2020
Other artifacts/facts:
- Fraction of data that is geocodable or has a valid placename
- Fraction of data with disclosed amounts
- Turnover of the Top 10, 20, 50, 100
- Correlations between the three component measures (top 200?)
- Cumulative Percentage of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
- Average Growth rate of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
- Focus on select cities: Rankings over time for Houston, St. Louis, Cincinnati, Boulder, Waltham, Palo Alto
Second (Complete) Rebuild
This build used Vcdb4, see Vcdb4#Ranking.
The data was completely rebuilt based on geocoded places (i.e., place names from Tiger) and was restricted to rounds of growth VC from the outset. In this way, it can be joined the American Community Survey (ACS) Data, and it uses real, rather than self-reported cities (lots of people claim to be located in San Francisco while actually being in Alameda or South San Fran, etc.). Likewise, companies on the Emeryville side of the Emeryville-Berkeley border often claim to be in Berkeley.
First code rebuild
The constraint that a city had to have raised $100m in SEL VC over the last 34 years was too onerous for some of the very new ecosystems and was removed from the last build.
The build file is:
E:\project\vcdb3\Ranking.sql
This file replaces both RoundRanking.sql and RoundRankingState.sql, and fixes some minor issues with the latter having overwritten tables from the former.
The output files are:
- Ranking2016WMove.txt
- Ranking2017WMove.txt
- Ranking2018Q12WMove.txt
- Ranking2016WMoveState.txt
- Ranking2017WMoveState.txt
- Ranking2018Q12WMoveState.txt
- USSelamnt.txt
They are produced in Z:/vcdb3
City Growth Rates
A code section was added to E:\project\vcdb3\Ranking.sql to explore city growth rates.
The main problem with estimating the average of [math]\frac{dollars_t - dollars_{t-1} }{dollars_{t-1}}[/math], where dollars is growth VC dollars invested in a city-year, is a truncation issue. [math]dollars_{t-1}[/math] is often zero, and it is impossible for the resulting value to be below -100% while it can be above +100%.
I experimented with various ways to limit this issue, including considering:
- Only cities that have had more than $10m in a given year
- City-year pairs with more than $10m
- The top 100, 50, or 20 cities in 2017, for years from 1990 to 2017 or 2000 to 2017.
- Only cities that had $10m or more for every year from 2000 to 2017
The growth rate for cities that had $10m or more for every year from 2000 to 2017 was 0.334 per year, implying that such "established" ecosystems double their VC every three years. There are 47 such cities. The list of average growth rates by city for these cities is below. The average growth rate increases to 39.17% for the 61 cities which have $10m or more for at least all bar one years, and to 51.43% for the 72 cities that have it for all bar two years.
city | avg ---------------------+-------------------- Malvern | 0.993156571636778 La Jolla | 0.610594418156654 Woburn | 0.603879780755441 Scottsdale | 0.575267305240787 Hayward | 0.541730108455613 Carlsbad | 0.523716703358772 Pleasanton | 0.509316345882909 Denver | 0.501050394443719 Burlingame | 0.500020365810236 Watertown | 0.472356466716033 Bedford | 0.463504586546262 Minneapolis | 0.458034053534397 Menlo Park | 0.456066638196533 North Waltham | 0.443317946821974 Palo Alto | 0.425111728982036 Emeryville | 0.421690271365118 Boston | 0.412678555600625 Seattle | 0.381774873244813 Los Gatos | 0.374521134293509 Lexington | 0.369969174443311 Los Angeles | 0.365024808593637 Atlanta | 0.351405016195985 South San Francisco | 0.321407553118583 Durham | 0.298104095936328 Morrisville | 0.297076835403358 Pittsburgh | 0.285423923204106 Boulder | 0.284022977435934 Rockville | 0.254953079027088 Campbell | 0.252975908859599 Waltham | 0.235784276400949 San Francisco | 0.231574232838588 Cambridge | 0.226097618461997 Burlington | 0.225559858389355 Chicago | 0.222048532754601 New York | 0.222000093754721 Irvine | 0.208556113551383 Houston | 0.206415333165726 Fremont | 0.204771682823134 San Mateo | 0.196399732252421 Cupertino | 0.18040546805283 Mountain View | 0.159432352723845 San Jose | 0.152962850233141 Redwood City | 0.0888134263293376 San Diego | 0.0711881806017128 Santa Clara | 0.0566618949392164 Sunnyvale | 0.0318692877947854 Austin | 0.0254452013443905 (47 rows)
Note that if we exclude 2000 (i.e. consider 2001 to 2017 inclusive), the overall average growth rate drops to 0.2812, and some cities, like Houston, have materially lower growth rates. It is worth noting that Houston had one big up year in 2013, where growth VC investment levels where 2.168 times the previous year, but then suffered a about a 50% mean reversion in the following years. Without 2013, Houston's average 2001 to 2017 growth rate is -0.0197349989630022.
city | state | avg ---------------------+-------+---------------------- La Jolla | CA | 0.680031123241241 Woburn | MA | 0.64163859928389 Malvern | PA | 0.548552704731425 Menlo Park | CA | 0.504383164761797 Bedford | MA | 0.504211517175831 Hayward | CA | 0.49942185834955 Carlsbad | CA | 0.488399093032109 North Waltham | MA | 0.479226357065314 Minneapolis | MN | 0.473356376707567 Scottsdale | AZ | 0.434025239485977 Watertown | MA | 0.409908014646241 Emeryville | CA | 0.403333541766609 Pleasanton | CA | 0.402025316290345 Burlingame | CA | 0.401579348905957 Seattle | WA | 0.398842474741046 Boston | MA | 0.378228024530469 Morrisville | NC | 0.354372777597948 Palo Alto | CA | 0.350685408163029 Atlanta | GA | 0.336685642867238 Denver | CO | 0.332644416223719 Los Angeles | CA | 0.304585103537747 Boulder | CO | 0.28880663996075 South San Francisco | CA | 0.248595987210001 Campbell | CA | 0.245514436831657 Rockville | MD | 0.24315823017573 Los Gatos | CA | 0.233216007276288 Pittsburgh | PA | 0.225499709366802 Burlington | MA | 0.223634772577002 San Francisco | CA | 0.219993519832261 Lexington | MA | 0.205000361475032 New York | NY | 0.203868079937506 Chicago | IL | 0.203682874006923 Cupertino | CA | 0.191836717363744 San Mateo | CA | 0.187723172713732 Mountain View | CA | 0.166926239441008 Irvine | CA | 0.16644409093795 Fremont | CA | 0.150247623895727 Cambridge | MA | 0.140990253943638 Houston | TX | 0.109011554442698 San Diego | CA | 0.067832637853121 Redwood City | CA | 0.0666242814798974 San Jose | CA | 0.0471224685794607 Durham | NC | 0.0331127017373814 Waltham | MA | 0.0258917169868607 Austin | TX | 0.00294040411546173 Santa Clara | CA | -0.00296623469124349 Sunnyvale | CA | -0.00667850429419462 (47 rows)
2017 and 2018 Q1/2 updates
The final files are in
E:\mcnair\Projects\VentureXpert Database
They include:
- US and TX SEL Amount - Done.xlsx
- Master Tables Sheet.docx
- Top 15 States (Based on All Time SELAmnt) SEL vs Year - Done.xlsx
- TX Cities Graphs and Data - Done.xlsx
- Top 100 Cities 2017 - Done.xlsx
- All States 2018 (Q1 and Q2) - Done.xlsx
- All States 2017 - Done.xlsx
- Top 100 Cities 2018 (Q1 And 2) - Done.xlsx
The database is vcdb3 and the SQL scripts are in
E:\mcnair\Projects\VentureXpert Database\vcdb3\LoadingScripts
The database build is documented in VentureXpert Data#Ranking Tables and Graphs
2016 Report
Project Location
E:\McNair\Projects\Ecosystem\Ranking
The report is available from that directory as well https://www.bakerinstitute.org/research/top-100-us-startup-cities-2016/.
Main SQL file location
Ranking.sql
This file works on the database cities.
However, the cities database takes vc data from vcdb2. See VC Database Rebuild for information on the build.
The output from vcdb2 is roundleveloutput2, which is exported as roundleveloutput2.txt
roundleveloutput2.txt is imported into cities as roundleveloutput2, and then left joined with populationtablecorrected.
Last Population SQL file location
Z:\Hubs\2017\sql scripts
Z:\Hubs\2017\clean data\Population
One issue is that cities without populations in populationtablecorrected but in roundleveloutput2 get dropped later. For example, roundleveloutput2 contains Brooklyn, NY but populationtablecorrected doesn't (its New York count is therefore suspect too).
populationtablecorrected is loaded from files 'population1.txt' and 'population2.txt'. Dups are removed with a max.
Their construction notes are on Hubs#Population_Data