Urban Start-up Agglomeration and Venture Capital Investment

From edegan.com
Revision as of 19:32, 16 September 2017 by Ed
Academic Paper
Title Urban Start-up Agglomeration
Author Ed Egan
RAs Peter Jalbert, Jake Silberman, Christy Warden
Status In development
© edegan.com, 2016


Summary

Agglomeration is generally thought to be one of the most important determinants of growth in urban entrepreneurship ecosystems. However, there is essentially no empirical evidence to support this. This paper uses geocoding to introduce a novel measure of agglomeration: the smallest total area of a set of circles that together cover all startup offices, subject to each circle containing at least N startups. Using GIS data on cities, the paper controls for the density and socio-demographics of an area in order to isolate the effect of agglomeration itself.
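To make the measure concrete, here is a minimal Python sketch for a handful of points. It is not the project's algorithm. It assumes projected planar coordinates (not raw latitude/longitude), reads "at least N startups in each circle" as a partition of the startups into groups of size at least N, and sums the circle areas (so overlaps are counted twice). It finds the optimum by exhaustive search over all partitions, with a brute-force minimum enclosing circle for each group, so it only scales to toy inputs. All names are hypothetical.

```python
import itertools
import math
from typing import Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, float]          # projected (x, y) coordinates, e.g. km (assumption)
Circle = Tuple[float, float, float]  # (center_x, center_y, radius)
EPS = 1e-9


def _covers(c: Circle, pts: Sequence[Point]) -> bool:
    """True if circle c contains every point, with a small numerical tolerance."""
    cx, cy, r = c
    return all(math.hypot(x - cx, y - cy) <= r + EPS for x, y in pts)


def _circumcircle(a: Point, b: Point, p: Point) -> Optional[Circle]:
    """Circle through three points, or None if they are collinear."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    d = 2 * (ax * (by - py) + bx * (py - ay) + px * (ay - by))
    if abs(d) < EPS:
        return None
    a2, b2, p2 = ax * ax + ay * ay, bx * bx + by * by, px * px + py * py
    ux = (a2 * (by - py) + b2 * (py - ay) + p2 * (ay - by)) / d
    uy = (a2 * (px - bx) + b2 * (ax - px) + p2 * (bx - ax)) / d
    return (ux, uy, math.hypot(ax - ux, ay - uy))


def min_enclosing_circle(pts: Sequence[Point]) -> Circle:
    """Brute force: the optimal circle is fixed by 2 points (as a diameter) or by 3."""
    if len(pts) == 1:
        return (pts[0][0], pts[0][1], 0.0)
    candidates: List[Circle] = []
    for a, b in itertools.combinations(pts, 2):
        candidates.append(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2,
                           math.hypot(a[0] - b[0], a[1] - b[1]) / 2))
    for a, b, p in itertools.combinations(pts, 3):
        c = _circumcircle(a, b, p)
        if c is not None:
            candidates.append(c)
    # Smallest candidate that still contains every point in the group.
    return min((c for c in candidates if _covers(c, pts)), key=lambda c: c[2])


def _partitions(items: List[Point]) -> Iterator[List[List[Point]]]:
    """Yield every set partition of items (Bell-number many, so keep inputs tiny)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in _partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller


def agglomeration_area(pts: List[Point], n: int) -> float:
    """Smallest total circle area covering all points, each circle holding >= n points."""
    if len(pts) < n:
        raise ValueError("need at least n points")
    best = math.inf
    for part in _partitions(pts):
        if all(len(block) >= n for block in part):
            area = sum(math.pi * min_enclosing_circle(block)[2] ** 2 for block in part)
            best = min(best, area)
    return best
```

For example, with two tight pairs of startups 100 km apart, `agglomeration_area([(0.0, 0.0), (0.0, 2.0), (100.0, 0.0), (100.0, 2.0)], 2)` picks one unit-radius circle per pair (total area 2π). With n = 1 every startup gets its own zero-radius circle and the measure collapses to 0, which is why the minimum-count constraint matters.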

Description

Clusters of economic activity play a significant role in firms' performance and growth. An important driver of this growth is knowledge spillovers between firms, which include, among other things, the flow of information and ideas between firms; these can be especially important for the growth of startups and other small businesses. This project focuses on the effects of agglomeration on the performance and growth of startup firms. It introduces a novel measure of agglomeration that can be used to test the effects of clustering empirically. The measure is the smallest total circle area that covers all of the startups in the sample such that there are at least n firms in each circle. The project is based on creating an algorithm that gives an unbiased measure to be used in the empirical analysis. The regression we are interested in takes the following form:

[Figure: regression equation]

The dependent variable is a measure of firm growth, such as investment one period ahead (t+1) or the growth rate of investment. The explanatory variables include the number of startup firms, m, the agglomeration measure, A, and a vector of other controls affecting firm growth at time t. Because the agglomeration measure A (the circle area) is endogenous, an instrumental variable is needed to obtain consistent estimates of the effect of interest. The proposed instrument is the presence of a river or road between the points representing the geographic locations of the venture-capital-backed firms. The identifying assumption is that the instrument affects agglomeration without directly affecting growth, which makes it a good candidate for a valid instrument. The next tasks are to determine the additional control variables to include in the regression, the years to include in the analysis, and a method for constructing an unbiased measure of agglomeration.
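Because the regression image is not reproduced here, the following is only a generic two-stage least squares sketch consistent with the variables described above. The city index c, the instrument indicator Z_c (a river or road separating startup locations), and the linear functional form are assumptions, not the paper's specification.

```latex
\begin{aligned}
\text{First stage:}\quad  A_{ct} &= \pi_0 + \pi_1 Z_{c} + \pi_2 m_{ct} + X_{ct}'\delta + u_{ct},\\
\text{Second stage:}\quad Y_{c,t+1} &= \beta_0 + \beta_1 m_{ct} + \beta_2 \hat{A}_{ct} + X_{ct}'\gamma + \varepsilon_{ct}.
\end{aligned}
```

In this sketch the instrument is valid if it is relevant (π1 ≠ 0) and satisfies the exclusion restriction Cov(Z_c, ε_ct) = 0, which is the argument made in the paragraph above.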

Data

Ed's additional data work is in:

Z:\VentureCapitalData\SDCVCData\vcdb2\ProcessingCoLevelSimple.sql

The key table for circle processing is CoLevelBlowout. Restricting it to cities with more than 10 active VC-backed firms at some point in the data produces CoLevelForCircles.
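The restriction can be sketched in Python as below. This assumes the blowout has already been aggregated to city-year counts of active firms; the row layout and function names are hypothetical, and the actual logic lives in ProcessingCoLevelSimple.sql. Note the strict inequality, and that all years of a qualifying city are kept, not just the years above the threshold.

```python
from typing import List, Set, Tuple

Row = Tuple[str, int, int]  # (city, year, active VC-backed firms): hypothetical layout


def cities_ever_above(rows: List[Row], threshold: int = 10) -> Set[str]:
    """Cities whose active-firm count exceeds threshold in at least one year."""
    return {city for city, _year, active in rows if active > threshold}


def restrict_to_cities(rows: List[Row], keep: Set[str]) -> List[Row]:
    """Keep every year of data for the qualifying cities."""
    return [r for r in rows if r[0] in keep]
```

For example, a city that peaks at exactly 10 active firms is dropped, while one that reaches 12 in any year keeps its full history.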

We need to:

  1. Winsorize CoLevelBlowout
  2. Make the Bay Area (over time) data
  3. Plot the Bay Area data (with colors per Bay Area city) for 1985 to present
  4. Combine the plots to make an animated gif
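For step 1, a minimal Python sketch of winsorizing one numeric column. Which CoLevelBlowout columns to winsorize and the 1st/99th percentile cutoffs are assumptions; percentiles are computed by linear interpolation between order statistics (the same rule as NumPy's default). Function names are hypothetical.

```python
from typing import List, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) of already-sorted values, interpolating linearly."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values: Sequence[float], lower_pct: float = 1.0,
              upper_pct: float = 99.0) -> List[float]:
    """Replace values below/above the given percentiles with those percentile values."""
    s = sorted(values)
    lo_cut = percentile(s, lower_pct)
    hi_cut = percentile(s, upper_pct)
    return [min(max(v, lo_cut), hi_cut) for v in values]
```

Unlike trimming, winsorizing keeps every observation and only pulls extreme values in to the cutoff, so the number of startups per city is unchanged.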
Data sources:

  • SDC VentureXpert
  • GIS City Data
  • Data on NSF, NIH, population, income, clinical trials, employment, schooling, R&D expenditures and revenue of firms can be found in Hubs.
  • Data on the number of new VC-backed firms in each city and year is in:
Z:\Hubs\2017\clean data
The name of the file is firm_nr.txt.

Database: cities. SQL script: nr_firms.sql
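The city-year counts can be sketched in Python as follows. Defining a "new" firm by the year of its first VC round, and using the city recorded on that round, are assumptions; nr_firms.sql may use a different rule. The row layout and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Round = Tuple[str, str, int]  # (firm_id, city, round_year): hypothetical layout


def new_firms_per_city_year(rounds: List[Round]) -> Dict[Tuple[str, int], int]:
    """Count firms by the city and year of their first VC round."""
    first: Dict[str, Tuple[str, int]] = {}
    for firm_id, city, year in rounds:
        # Keep only the earliest round seen for each firm.
        if firm_id not in first or year < first[firm_id][1]:
            first[firm_id] = (city, year)
    return dict(Counter(first.values()))
```

Collapsing to one row per firm first avoids counting a firm once per funding round.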

Raw data is in:

Z:\VentureCapitalData\SDCVCData\vcdb2
The file is colevelsimple.txt

To check for outliers, I compute the average coordinates of each city and take the difference between each firm's coordinates and its city's average coordinates. The script for the average city coordinates is in

Z:\Hubs\2017\sql scripts and the file name is newcolevel.sql.

The differences are computed in Excel. The file containing the differences is in

Z:\Hubs\2017 and the file name is new_colevel.txt.
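The same outlier check can be sketched in Python instead of SQL plus Excel. The row layout, function names, and the 0.5-degree flagging threshold are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Firm = Tuple[str, str, float, float]  # (firm_id, city, latitude, longitude): hypothetical layout
Deviation = Tuple[str, str, float, float, bool]


def city_centroids(firms: List[Firm]) -> Dict[str, Tuple[float, float]]:
    """Average latitude/longitude of the firms in each city."""
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for _firm, city, lat, lon in firms:
        acc = sums[city]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {city: (a[0] / a[2], a[1] / a[2]) for city, a in sums.items()}


def coordinate_deviations(firms: List[Firm], max_abs_deg: float = 0.5) -> List[Deviation]:
    """Each firm's offset from its city centroid, flagged if either offset exceeds max_abs_deg."""
    centroids = city_centroids(firms)
    out: List[Deviation] = []
    for firm_id, city, lat, lon in firms:
        clat, clon = centroids[city]
        dlat, dlon = lat - clat, lon - clon
        out.append((firm_id, city, dlat, dlon,
                    abs(dlat) > max_abs_deg or abs(dlon) > max_abs_deg))
    return out
```

One design caveat: a single badly geocoded firm also pulls the city mean toward itself, so a median centroid would be a more robust reference point.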


  • Data on the circle area in each city and year is in:
Z:\Hubs\2017\clean data
The name of the file is circles.txt. (It contains only 106 observations)

Database: cities. SQL script: circles.sql

The script for joining the two tables on the VC table is in:

Z:\Hubs\2017\sql scripts
 The name of the file is new_firm_nr_circles.sql
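Because circles.txt has only 106 observations, how the firm counts and circle areas are joined matters. The page does not say whether new_firm_nr_circles.sql uses a left or inner join; the Python sketch below assumes a left join on (city, year), which keeps city-years without a computed circle and marks their area as missing. Names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (city, year)


def left_join(counts: Dict[Key, int],
              circles: Dict[Key, float]) -> List[Tuple[str, int, int, Optional[float]]]:
    """Keep every firm-count row; area is None when no circle was computed."""
    return [(city, year, n, circles.get((city, year)))
            for (city, year), n in sorted(counts.items())]
```

An inner join would instead silently drop every city-year outside the 106 circle observations.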
  • We use the cities with more than 10 active VC-backed firms. Data on these cities and their numbers of active firms is in:
E:\McNair\Projects\Hubs\Summer 2017
The file is CitiesWithGT10Active.txt

The script for joining the final data with this file is located in

Z:\Hubs\2017\sql scripts
The file name is final_joined_kerda.sql.

The final data is in

Z:\Hubs\2017\clean data
The file name is new_final_kerda.txt.

Also:

  • Accelerators data is in
Z:\Hubs\2017\clean data
The file name is accelerators.txt
The table is accelerators

The accelerators data joined with the VC table is in the joined_accelerators table. The script is in

Z:\Hubs\2017\sql scripts
The file name is join_accelerators.sql

The do file is in

Z:\Hubs\2017\kerda
The name is agglomeartion_kerda.do

It includes the graphs, tables, and preliminary fixed-effects (FE) regressions for VC funding amount and growth rate. It also predicts hazard rates and matches on the predicted hazard rate to create synthetic control and treatment groups. What remains is to add 2 lagged and 3 forward observations for the cities that have a match, and to remove overlapping observations: years in which a city receives a treatment but at the same time serves as a control.
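The remaining step can be sketched in Python as below, although the actual work is in the Stata do file. It assumes matches are stored as (treated city, control city, treatment year) and reads "overlapping observations" as control city-years that are themselves treatment events; dropping a whole control window would be a stricter alternative. Names and layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Match = Tuple[str, str, int]     # (treated_city, control_city, event_year): hypothetical
Obs = Tuple[int, str, int, int, str]  # (match_id, city, year, event_time, role)
LAGS, LEADS = 2, 3


def event_window_rows(matches: List[Match], panel_years: Dict[str, Set[int]]) -> List[Obs]:
    """Stack t-2..t+3 rows for each matched pair, tagged with event time and role.

    Control rows in a year when that control city is itself treated are dropped.
    """
    treated = {(t_city, yr) for t_city, _c_city, yr in matches}
    rows: List[Obs] = []
    for match_id, (t_city, c_city, yr) in enumerate(matches):
        for k in range(-LAGS, LEADS + 1):
            year = yr + k
            if year in panel_years.get(t_city, set()):
                rows.append((match_id, t_city, year, k, "treated"))
            if year in panel_years.get(c_city, set()) and (c_city, year) not in treated:
                rows.append((match_id, c_city, year, k, "control"))
    return rows
```

Keeping the match id on every row lets the stacked data be analysed with match-by-event-time fixed effects.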


Unbiased measure

The number of startups affects the total area of the circles through some unknown function. The task is to find an unbiased measure of the area that is not affected by the number of startups, given the size of the area and the distribution of the startups within it.
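One possible approach, sketched in Python under stated assumptions: compare the observed statistic with its expected value when the same number of startups is scattered uniformly at random over a normalized region (here the unit square, so observed coordinates would first need rescaling to the city's region). The ratio observed/expected is then less mechanically driven by the number of startups. The uniform null and all names are hypothetical. Bounding-box area is used only as a cheap stand-in statistic, because its null expectation for m points, ((m-1)/(m+1))^2, is known in closed form and makes the simulation easy to check.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Stat = Callable[[List[Point]], float]


def null_expected(stat: Stat, m: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo mean of stat over m points placed uniformly in the unit square."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(m)]
        total += stat(pts)
    return total / trials


def normalized(stat: Stat, pts: List[Point], trials: int = 2000, seed: int = 0) -> float:
    """Observed statistic relative to its null expectation for the same number of points."""
    return stat(pts) / null_expected(stat, len(pts), trials, seed)


def bbox_area(pts: List[Point]) -> float:
    """Area of the axis-aligned bounding box (stand-in statistic)."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))
```

Any function that maps a list of points to an area, such as one computing the circle measure for a fixed n, could be passed as stat; the cost is many expensive evaluations per benchmark.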

For the unbiased calculation of a measure in a different context see: http://users.nber.org/~edegan/w/images/d/d0/Hall_(2005)_-_A_Note_On_The_Bias_In_Herfindahl_Type_Measures_Based_On_Count_Data.pdf
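For orientation, the count-data correction for a Herfindahl-type index can be derived directly. If N items fall independently into categories with true shares s_i, the plug-in index H = sum((n_i/N)^2) has expectation H_true + (1 - H_true)/N. Therefore (N*H - 1)/(N - 1), equivalently sum(n_i(n_i - 1)) / (N(N - 1)), is unbiased. A minimal Python sketch assuming multinomial sampling; function names are hypothetical.

```python
from typing import Sequence


def herfindahl(counts: Sequence[int]) -> float:
    """Plug-in Herfindahl index from category counts."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def herfindahl_unbiased(counts: Sequence[int]) -> float:
    """Bias-corrected index: sum n_i(n_i-1) / (N(N-1)), equal to (N*H - 1)/(N - 1)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))
```

With four items in four different categories the plug-in index is 0.25 while the corrected value is 0: the plug-in value reflects small-sample noise rather than concentration, which is the same kind of count-driven bias the circle-area measure needs to remove.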

GIS Resources