Difference between revisions of "Urban Start-up Agglomeration and Venture Capital Investment"
GIS Resources
- https://www.census.gov/geo/maps-data/data/tiger-line.html
- https://www.census.gov/geo/maps-data/data/tiger.html
- http://postgis.net/features/
- https://en.wikipedia.org/wiki/GIS_file_formats
Revision as of 19:36, 16 September 2017
Academic Paper | |
---|---|
Title | Urban Start-up Agglomeration |
Author | Ed Egan |
RAs | Peter Jalbert, Jake Silberman, Christy Warden |
Status | In development |
© edegan.com, 2016 |
Summary
Agglomeration is generally thought to be one of the most important determinants of growth for urban entrepreneurship ecosystems. However, there is essentially no empirical evidence to support this. This paper takes advantage of geocoding and introduces a novel measure of agglomeration: the smallest total circle area that covers all startup offices, subject to having at least N startups in each circle. Using GIS data on cities, this paper controls for the density and socio-demographics of an area to isolate the effect of agglomeration itself.
Description
Clusters of economic activity play a significant role in firms' performance and growth. An important driver of growth is knowledge spillover between firms, including the flow of information and ideas, which can be decisive for the growth of startups and small businesses. This project focuses on the effects of agglomeration on the performance and growth of startup firms. It introduces a novel measure of agglomeration that can be used to test the effects of clustering empirically. This measure is the smallest total circle area that covers all of the startups in the sample such that there are at least n firms in each circle. The project is based on an algorithm that gives an unbiased measure to be used in the empirical analysis. The regression we are interested in takes the following form:
The dependent variable is a measure of firm growth, such as investment forwarded one period or growth in investment. The regressors are the number of startup firms, m, the agglomeration measure, A, and a vector of other control variables affecting the growth of firms at time t. Because the circle area, our measure of agglomeration A, is endogenous, an instrumental variable is needed to get consistent estimates of the effects we are interested in. The proposed instrument is the presence of a river or road between the points representing the geographical locations of the venture capital backed firms. The instrument affects agglomeration without having a direct impact on growth, which makes it a good candidate for a valid instrument. The next tasks are to determine the additional control variables to include in the regression, the years to include in the analysis, and the methods for finding an unbiased measure of agglomeration.
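A specification consistent with this description can be sketched as follows. Only m, A and t come from the text. The city index i, the outcome symbol y, the control vector X, the instrument Z, the coefficients and the error terms are notation assumed here for illustration, and the one-period lead on the outcome is just one of the two outcome options mentioned above.

```latex
\begin{align}
y_{i,t+1} &= \beta_0 + \beta_1 m_{it} + \beta_2 A_{it} + X_{it}'\gamma + \varepsilon_{it} \\
A_{it}    &= \pi_0 + \pi_1 Z_{it} + \pi_2 m_{it} + X_{it}'\delta + u_{it}
\end{align}
```

The second line is the first stage of a two-stage least squares estimate, with Z standing for the river or road instrument.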
Data
Making the circle input data
Ed's additional datawork is in
Z:\VentureCapitalData\SDCVCData\vcdb2\ProcessingCoLevelSimple.sql
The key table for circle processing is CoLevelBlowout, which is restricted to cities with more than 10 active VC-backed firms at some point in the data to make CoLevelForCircles.
We need to:
- Winsorize CoLevelBlowout
- Compute the circles!
- Make the Bay Area (over time) data
- Plot the Bay Area data (with colors per Bay Area city) for 1985 to present
- Combine the plots to make an animated gif
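The winsorizing step can be sketched in Python as below. The page does not say which column of CoLevelBlowout is winsorized or at which cutoffs, so the input list and the 1%/99% defaults are assumptions, and `winsorize` is a hypothetical helper rather than the project's script.

```python
import math


def winsorize(values, lower_pct=0.01, upper_pct=0.99):
    """Clamp values below/above the chosen percentiles to the cutoff values.

    Cutoffs use a nearest-rank rule on the sorted data; the default
    percentiles are illustrative assumptions, not project settings.
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    # Index of the lower cutoff (rounded down) and upper cutoff (rounded up).
    low_cut = ordered[math.floor(lower_pct * (n - 1))]
    high_cut = ordered[math.ceil(upper_pct * (n - 1))]
    # Values are pulled in to the cutoffs, not dropped.
    return [min(max(v, low_cut), high_cut) for v in values]
```

Winsorizing keeps every observation and only caps the extremes, so the number of firms per city and year is unchanged.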
Main Sources
The primary sources of data for this project are:
- SDC VentureXpert - from VC Database Rebuild, the key table is
- GIS City Data
- Data on NSF, NIH, population, income, clinical trials, employment, schooling, R&D expenditures and revenue of firms can be found in Hubs.
VC data
Data on the number of new VC-backed firms in each city and year is in:
Z:\Hubs\2017\clean data
The name of the file is firm_nr.txt.
Database is cities.
SQL script is: nr_firms.sql
Raw data is in:
Z:\VentureCapitalData\SDCVCData\vcdb2
The file is colevelsimple.txt
To check for outliers, I compute the average coordinates for each city and take the difference between each firm's coordinates and its city average. The script for the average city coordinates is in
Z:\Hubs\2017\sql scripts
and the file name is newcolevel.sql.
The differences are taken in Excel. The file containing the differences is in
Z:\Hubs\2017
and the file name is new_colevel.txt.
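The outlier check described above (average coordinates per city, then each firm's difference from that average) can be sketched in Python. The tuple layout `(city, lat, lon)`, the function names and the `max_abs_diff` threshold are assumptions for illustration. The project itself does this step in SQL and Excel.

```python
from collections import defaultdict


def city_centroids(firms):
    """Return {city: (mean_lat, mean_lon)} for rows of (city, lat, lon)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for city, lat, lon in firms:
        acc = sums[city]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items()}


def coordinate_differences(firms):
    """Return (city, lat - mean_lat, lon - mean_lon) for every firm row."""
    centroids = city_centroids(firms)
    return [
        (city, lat - centroids[city][0], lon - centroids[city][1])
        for city, lat, lon in firms
    ]


def flag_outliers(firms, max_abs_diff):
    """Keep rows whose latitude or longitude difference exceeds the threshold."""
    return [
        row
        for row in coordinate_differences(firms)
        if abs(row[1]) > max_abs_diff or abs(row[2]) > max_abs_diff
    ]
```

Because a single badly geocoded firm also shifts its city's average, a median-based center may be a more robust choice than the mean used here.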
- Data on the circle area in each city and year is in:
Z:\Hubs\2017\clean data
The name of the file is circles.txt. (It contains only 106 observations.)
Database is cities.
SQL script is: circles.sql
The script for joining the two tables on the VC table is in:
Z:\Hubs\2017\sql scripts
The name of the file is new_firm_nr_circles.sql
- We use the cities with greater than 10 active VC backed firms. Data on the cities and number of active firms is in:
E:\McNair\Projects\Hubs\Summer 2017
The file is CitiesWithGT10Active.txt
The script for joining the final data with this file is located in
Z:\Hubs\2017\sql scripts
The file name is final_joined_kerda.sql.
The final data is in
Z:\Hubs\2017\clean data
The file name is new_final_kerda.txt.
Accelerator data
Accelerator data is in
Z:\Hubs\2017\clean data
The file name is accelerators.txt
The table is accelerators
The accelerator data joined with the VC table is in the joined_accelerators table. The script is in
Z:\Hubs\2017\sql scripts
The file name is join_accelerators.sql
The do file is in
Z:\Hubs\2017\kerda
The name is agglomeartion_kerda.do
It includes the graphs, tables and the preliminary FE regressions with VC funding amount and growth rate. It also predicts the hazard rates and matches on the hazard rate in order to create synthetic control and treatment groups. What is left to do is to add 2 lagged and 3 forward observations for the cities that have a match, and to remove the overlapping observations for years that receive a treatment but at the same time serve as a control.
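The remaining step (2 lagged and 3 forward observations per match, with overlaps removed) might look like the Python sketch below. It is one reading of the overlap rule, not the do file's logic: a city-year that falls in both a treated window and a control window is dropped from the control set. The `(treated_city, control_city, treat_year)` layout and the function names are assumptions.

```python
def event_window(treat_year, lags=2, leads=3):
    """Years from `lags` before to `leads` after the treatment year, inclusive."""
    return list(range(treat_year - lags, treat_year + leads + 1))


def build_event_panel(matches, lags=2, leads=3):
    """Expand matched pairs into treated and control (city, year) sets.

    matches: iterable of (treated_city, control_city, treat_year).
    Control observations that coincide with any treated observation
    are removed (assumed interpretation of "overlapping").
    """
    treated, control = set(), set()
    for treated_city, control_city, treat_year in matches:
        for year in event_window(treat_year, lags, leads):
            treated.add((treated_city, year))
            control.add((control_city, year))
    # Drop city-years that serve as a control while also being treated.
    control -= treated
    return treated, control
```

With this rule a city can still act as a control in years outside its own treatment window.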
See also
Also:
- Enclosing Circle Algorithm
- Normalizer
- Geocode.py
Unbiased measure
The number of startups affects the total area of the circles according to some function. The task is to find an unbiased measure of the area, one that is not affected by the number of startups, given the size and their distribution.
For the unbiased calculation of a measure in a different context see: http://users.nber.org/~edegan/w/images/d/d0/Hall_(2005)_-_A_Note_On_The_Bias_In_Herfindahl_Type_Measures_Based_On_Count_Data.pdf
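In the Herfindahl setting of the linked note, the bias from small counts can be removed in closed form. If the N observed counts are treated as multinomial draws with true shares p_i and true index H, the observed index h satisfies E[h] = H + (1 - H)/N, so (N·h - 1)/(N - 1) is unbiased for H. The Python sketch below illustrates that correction. The function names are hypothetical, and the analogous correction for circle area remains the open task.

```python
def herfindahl(counts):
    """Observed Herfindahl index: sum of squared shares of the counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return sum((c / total) ** 2 for c in counts)


def herfindahl_bias_corrected(counts):
    """Bias-corrected index (N*h - 1) / (N - 1) under multinomial sampling.

    N is the total count. The correction is undefined for N < 2.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    return (n * herfindahl(counts) - 1) / (n - 1)
```

For example, four observations spread one per category give an observed index of 0.25 but a corrected index of 0, because with so few draws the observed concentration is entirely a small-sample artifact.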