Ranking US Cities by Venture Capital

From edegan.com
Jump to navigation Jump to search

Project Description

This project was initially undertaken in the summer of 2017 and resulted in a full report on 2016. See: US Startup City Ranking.

The data was then updated for 2017 and 2018 (Q1 and Q2), which resulted in a ranking spreadsheet and document, but not a full report in the first rebuild using vcdb3. It was then updated again to include the first half of 2019, using Vcdb4. The third update includes everything up to the end of 2020 and uses VCDB20.

Third Rebuild

The third rebuild uses VCDB20 and covers up until the end of 2020. It combines geocoding with city information when geocoding is unavailable. The code is in:

E:\projects\vcdb20\Ranking.sql

The outputs, including the xlsx file that puts everything together, are in:

E:\projects\ranking

The build implements the following decisions:

  • Placenames are from geocoding where possible and city names where not.
  • The ACS and Tigergeog tables are joined in using geoid, which is determined from placenames (where possible).
  • A startup received growth VC if it one of its round stages was seed, early or later.
  • The ranking uses only growth VC (i.e., growthflag==1 on round) and rounddate < '2021-01-01'.
  • Places must have non-zero VC in at least one year between 1980 and 2020, inclusive.
  • numalive calculates dead as exit==1 or date>=(datelastinv+5 years), and alive as date>=datefirstinv and dead==0.
  • newdeal is a first investment (i.e., irrespective of stage and one per startup)
  • Undisclosed amounts are treated as zeros. Amounts are in millions (unless otherwise stated).

Datasets

  • PlaceYearRankingFull.txt (covers all places and years 1980-2020)
  • PlaceYearRanking200.txt (top 200 places for years 1980-2020)
  • StateYearRanking.txt (50 states + DC and PR for years 1980-2020)
  • PlaceYearRanking2020.txt (all places for 2020)
  • PlaceYearRanking200-2020.txt (top 200 places for 2020)
  • StateYearRanking2020.txt (50 states + DC and PR for 2020)

Artifacts and Facts

The main artifacts are:

  • The Top 100 (and 200) for 2020
  • A graph of total US growth VC investment 1980-2020

Other artifacts/facts:

  • Fraction of data that is geocodable or has a valid placename
  • Fraction of data with disclosed amounts
  • Turnover of the Top 10, 20, 50, 100
  • Correlations between the three component measures (top 200?)
  • Cumulative Percentage of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
  • Average Growth rate of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
  • Focus on select cities: Rankings over time for Houston, St. Louis, Cincinnati, Boulder, Waltham, Palo Alto

Second (Complete) Rebuild

This build used Vcdb4, see Vcdb4#Ranking.

The data was completely rebuilt based on geocoded places (i.e., place names from Tiger) and was restricted to rounds of growth VC from the outset. In this way, it can be joined the American Community Survey (ACS) Data, and it uses real, rather than self-reported cities (lots of people claim to be located in San Francisco while actually being in Alameda or South San Fran, etc.). Likewise, companies on the Emeryville side of the Emeryville-Berkeley border often claim to be in Berkeley.

First code rebuild

The constraint that a city had to have raised $100m in SEL VC over the last 34 years was too onerous for some of the very new ecosystems and was removed from the last build.

The build file is:

E:\project\vcdb3\Ranking.sql

This file replaces both RoundRanking.sql and RoundRankingState.sql, and fixes some minor issues with the latter having overwritten tables from the former.

The output files are:

  • Ranking2016WMove.txt
  • Ranking2017WMove.txt
  • Ranking2018Q12WMove.txt
  • Ranking2016WMoveState.txt
  • Ranking2017WMoveState.txt
  • Ranking2018Q12WMoveState.txt
  • USSelamnt.txt

They are produced in Z:/vcdb3

City Growth Rates

A code section was added to E:\project\vcdb3\Ranking.sql to explore city growth rates.

The main problem with estimating the average of [math]\frac{dollars_t - dollars_{t-1} }{dollars_{t-1}}[/math], where dollars is growth VC dollars invested in a city-year, is a truncation issue. [math]dollars_{t-1}[/math] is often zero, and it is impossible for the resulting value to be below -100% while it can be above +100%.

I experimented with various ways to limit this issue, including considering:

  • Only cities that have had more than $10m in a given year
  • City-year pairs with more than $10m
  • The top 100, 50, or 20 cities in 2017, for years from 1990 to 2017 or 2000 to 2017.
  • Only cities that had $10m or more for every year from 2000 to 2017

The growth rate for cities that had $10m or more for every year from 2000 to 2017 was 0.334 per year, implying that such "established" ecosystems double their VC every three years. There are 47 such cities. The list of average growth rates by city for these cities is below. The average growth rate increases to 39.17% for the 61 cities which have $10m or more for at least all bar one years, and to 51.43% for the 72 cities that have it for all bar two years.

        city         |        avg
---------------------+--------------------
 Malvern             |  0.993156571636778
 La Jolla            |  0.610594418156654
 Woburn              |  0.603879780755441
 Scottsdale          |  0.575267305240787
 Hayward             |  0.541730108455613
 Carlsbad            |  0.523716703358772
 Pleasanton          |  0.509316345882909
 Denver              |  0.501050394443719
 Burlingame          |  0.500020365810236
 Watertown           |  0.472356466716033
 Bedford             |  0.463504586546262
 Minneapolis         |  0.458034053534397
 Menlo Park          |  0.456066638196533
 North Waltham       |  0.443317946821974
 Palo Alto           |  0.425111728982036
 Emeryville          |  0.421690271365118
 Boston              |  0.412678555600625
 Seattle             |  0.381774873244813
 Los Gatos           |  0.374521134293509
 Lexington           |  0.369969174443311
 Los Angeles         |  0.365024808593637
 Atlanta             |  0.351405016195985
 South San Francisco |  0.321407553118583
 Durham              |  0.298104095936328
 Morrisville         |  0.297076835403358
 Pittsburgh          |  0.285423923204106
 Boulder             |  0.284022977435934
 Rockville           |  0.254953079027088
 Campbell            |  0.252975908859599
 Waltham             |  0.235784276400949
 San Francisco       |  0.231574232838588
 Cambridge           |  0.226097618461997
 Burlington          |  0.225559858389355
 Chicago             |  0.222048532754601
 New York            |  0.222000093754721
 Irvine              |  0.208556113551383
 Houston             |  0.206415333165726
 Fremont             |  0.204771682823134
 San Mateo           |  0.196399732252421
 Cupertino           |   0.18040546805283
 Mountain View       |  0.159432352723845
 San Jose            |  0.152962850233141
 Redwood City        | 0.0888134263293376
 San Diego           | 0.0711881806017128
 Santa Clara         | 0.0566618949392164
 Sunnyvale           | 0.0318692877947854
 Austin              | 0.0254452013443905
(47 rows) 

Note that if we exclude 2000 (i.e. consider 2001 to 2017 inclusive), the overall average growth rate drops to 0.2812, and some cities, like Houston, have materially lower growth rates. It is worth noting that Houston had one big up year in 2013, where growth VC investment levels where 2.168 times the previous year, but then suffered a about a 50% mean reversion in the following years. Without 2013, Houston's average 2001 to 2017 growth rate is -0.0197349989630022.

        city         | state |         avg
---------------------+-------+----------------------
 La Jolla            | CA    |    0.680031123241241
 Woburn              | MA    |     0.64163859928389
 Malvern             | PA    |    0.548552704731425
 Menlo Park          | CA    |    0.504383164761797
 Bedford             | MA    |    0.504211517175831
 Hayward             | CA    |     0.49942185834955
 Carlsbad            | CA    |    0.488399093032109
 North Waltham       | MA    |    0.479226357065314
 Minneapolis         | MN    |    0.473356376707567
 Scottsdale          | AZ    |    0.434025239485977
 Watertown           | MA    |    0.409908014646241
 Emeryville          | CA    |    0.403333541766609
 Pleasanton          | CA    |    0.402025316290345
 Burlingame          | CA    |    0.401579348905957
 Seattle             | WA    |    0.398842474741046
 Boston              | MA    |    0.378228024530469
 Morrisville         | NC    |    0.354372777597948
 Palo Alto           | CA    |    0.350685408163029
 Atlanta             | GA    |    0.336685642867238
 Denver              | CO    |    0.332644416223719
 Los Angeles         | CA    |    0.304585103537747
 Boulder             | CO    |     0.28880663996075
 South San Francisco | CA    |    0.248595987210001
 Campbell            | CA    |    0.245514436831657
 Rockville           | MD    |     0.24315823017573
 Los Gatos           | CA    |    0.233216007276288
 Pittsburgh          | PA    |    0.225499709366802
 Burlington          | MA    |    0.223634772577002
 San Francisco       | CA    |    0.219993519832261
 Lexington           | MA    |    0.205000361475032
 New York            | NY    |    0.203868079937506
 Chicago             | IL    |    0.203682874006923
 Cupertino           | CA    |    0.191836717363744
 San Mateo           | CA    |    0.187723172713732
 Mountain View       | CA    |    0.166926239441008
 Irvine              | CA    |     0.16644409093795
 Fremont             | CA    |    0.150247623895727
 Cambridge           | MA    |    0.140990253943638
 Houston             | TX    |    0.109011554442698
 San Diego           | CA    |    0.067832637853121
 Redwood City        | CA    |   0.0666242814798974
 San Jose            | CA    |   0.0471224685794607
 Durham              | NC    |   0.0331127017373814
 Waltham             | MA    |   0.0258917169868607
 Austin              | TX    |  0.00294040411546173
 Santa Clara         | CA    | -0.00296623469124349
 Sunnyvale           | CA    | -0.00667850429419462
(47 rows)

2017 and 2018 Q1/2 updates

The final files are in

E:\mcnair\Projects\VentureXpert Database

They include:

  • US and TX SEL Amount - Done.xlsx
  • Master Tables Sheet.docx
  • Top 15 States (Based on All Time SELAmnt) SEL vs Year - Done.xlsx
  • TX Cities Graphs and Data - Done.xlsx
  • Top 100 Cities 2017 - Done.xlsx
  • All States 2018 (Q1 and Q2) - Done.xlsx
  • All States 2017 - Done.xlsx
  • Top 100 Cities 2018 (Q1 And 2) - Done.xlsx

The database is vcdb3 and the SQL scripts are in

E:\mcnair\Projects\VentureXpert Database\vcdb3\LoadingScripts

The database build is documented in VentureXpert Data#Ranking Tables and Graphs

2016 Report

Project Location

E:\McNair\Projects\Ecosystem\Ranking

The report is available from that directory as well https://www.bakerinstitute.org/research/top-100-us-startup-cities-2016/.

Main SQL file location

Ranking.sql

This file works on the database cities.

However, the cities database takes vc data from vcdb2. See VC Database Rebuild for information on the build.

The output from vcdb2 is roundleveloutput2, which is exported as roundleveloutput2.txt

roundleveloutput2.txt is imported into cities as roundleveloutput2, and then left joined with populationtablecorrected.

Last Population SQL file location

Z:\Hubs\2017\sql scripts
Z:\Hubs\2017\clean data\Population

One issue is that cities without populations in populationtablecorrected but in roundleveloutput2 get dropped later. For example, roundleveloutput2 contains Brooklyn, NY but populationtablecorrected doesn't (its New York count is therefore suspect too).

populationtablecorrected is loaded from files 'population1.txt' and 'population2.txt'. Dups are removed with a max.

Their construction notes are on Hubs#Population_Data