== Data Processing Steps ==
[[File:AgglomerationProcess_v2.png|right|thumb|320px|Data Processing Steps]] The script [[File:Agglomeration_CBSA.sql.pdf|File:Agglomeration_CBSA.sql.pdf]] provides the processing steps within the PostgreSQL database. We first load the startup data, add in the longitudes and latitudes, and combine them with the CBSA boundaries. Startups in our data are keyed by a triple (coname, statecode, datefirstinv), as two different companies can have the same name in different states, or within the same state at two different times.
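The role of the composite key can be illustrated with a short Python sketch. The records below are made up for illustration, and the helper name startup_key is hypothetical; only the three key fields (coname, statecode, datefirstinv) come from the description above.

```python
from datetime import date

# Hypothetical records: the company name, states, dates and coordinates are
# invented purely to illustrate the key. Only the field names come from the text.
records = [
    {"coname": "Acme Labs", "statecode": "TX", "datefirstinv": date(2001, 3, 15),
     "lat": 29.76, "lon": -95.37},
    {"coname": "Acme Labs", "statecode": "CA", "datefirstinv": date(2001, 3, 15),
     "lat": 37.77, "lon": -122.42},
    {"coname": "Acme Labs", "statecode": "TX", "datefirstinv": date(2012, 6, 1),
     "lat": 30.27, "lon": -97.74},
]


def startup_key(rec):
    """Build the (coname, statecode, datefirstinv) triple that identifies a startup."""
    return (rec["coname"], rec["statecode"], rec["datefirstinv"])


# Index the records by the triple, refusing silently overwritten duplicates.
startups = {}
for rec in records:
    key = startup_key(rec)
    if key in startups:
        raise ValueError(f"duplicate startup key: {key}")
    startups[key] = rec

# The company name alone does not distinguish the three startups.
names = {rec["coname"] for rec in records}
```

Keying on the name alone would collapse the three hypothetical records into one; the triple keeps them distinct.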
A Python script, HCA.py, consumes data on each startup and its location for each MSA-year. It performs the HCA and returns a file with layer and cluster numbers for each startup and MSA-year.
A full copy of the amended hierarchy.py is available from https://www.edegan.com/wiki/Delineating_Spatial_Agglomerations.
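As a rough illustration of what "layer and cluster numbers" could look like, the sketch below runs a plain agglomerative clustering on a handful of points for a single MSA-year. It is not HCA.py or the amended hierarchy.py. Several things here are assumptions made only to keep the example short: the function name hca_layers, the use of single linkage, planar Euclidean distance on the coordinates, and the convention that layer k is the partition into k clusters.

```python
import math


def hca_layers(points):
    """Return {layer: [cluster number per point]}, where layer k has k clusters.

    points: list of (x, y) coordinates for one MSA-year.
    Assumptions (not taken from HCA.py): single linkage, planar Euclidean distance.
    """
    n = len(points)
    clusters = [[i] for i in range(n)]  # every point starts as its own cluster

    def label(current):
        # Number clusters 1..k in order of their smallest member index.
        labels = [0] * n
        for number, members in enumerate(sorted(current, key=min), start=1):
            for i in members:
                labels[i] = number
        return labels

    layers = {n: label(clusters)}
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a and record the resulting layer.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        layers[len(clusters)] = label(clusters)
    return layers


# Toy coordinates (invented): two tight groups that are far apart.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
layers = hca_layers(points)
```

With these toy points, layer 2 places the first three points in one cluster and the last two in another. Writing one row per startup, layer and cluster number would give a file of the general shape described above.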
The results of the HCA.py script are loaded back into the database, which produces a dataset for analysis in Stata. The script AgglomerationMaxR2.do loads this dataset and performs the HCA-Regressions. The results are passed to a Python script, Cubic.py, which selects the appropriate number of agglomerations for each MSA. The results from both AgglomerationMaxR2.do and Cubic.py are then loaded back into the database, which produces a final dataset and a set of tables providing data for the maps. The analysis of the final dataset uses the Stata script AgglomerationAnalysis.do, and the maps are made using custom queries in QGIS.
== Code ==