== Data Processing Steps ==
[[File:AgglomerationProcess_v2.png|right|thumb|320px|Data Processing Steps]] The script [[File:Agglomeration_CBSA.sql.pdf|Agglomeration_CBSA.sql]] provides the processing steps within the PostgreSQL database. We first load the startup data, add in the longitudes and latitudes, and combine them with the CBSA boundaries. Startups in our data our keyed by a triple (coname, statecode, datefirstinv) as two different companies can have the same names in different states, or within the same state at two different times.
A python script, HCA.py, consumes data on each startup and its location for each MSA-year. It performs the HCA and returns a file with layer and cluster numbers for each startup and MSA-year.