I also wanted to fix the confusion between CSAs (Combined Statistical Areas)[https://en.wikipedia.org/wiki/Combined_statistical_area] and CMSAs (Consolidated Metropolitan Statistical Areas)[https://www2.census.gov/geo/pdfs/reference/GARM/Ch13GARM.pdf]. CMSA redirects to CSA on Wikipedia. However, it is not clear that the two are the same thing. OMB is the originator of both terms[https://www.census.gov/programs-surveys/metro-micro/about/masrp.html].
====The Elbow Method====
An attempt at a paragraph justifying the 'heuristic' method:
:Our heuristic method provides an objective technique for picking the layer that identifies and maps a city-year's clusters. It uses two measures. First, we are interested in clusters rather than lines or points, so we measure the percentage of locations that are in clusters. Second, we want to view each city-year through the same lens. As layer indices are not comparable across city-years, we use the HCA's 'fraction complete' to measure a layer's lens.
:As an HCA progresses towards completion, it takes locations out of clusters at an increasing and then a decreasing rate. Accordingly, a city-year plot, with the fraction complete on the x-axis and the percentage of locations in clusters on the y-axis, gives an S-curve. The inflection point of this curve marks a conceptual transition between refining clusters and dismantling them.
'''Note:''' For each layer i of the HCA, from i=1 to i=I, we define the HCA's fraction complete as (i-1)/(I-1). The fraction complete is then zero for the first layer, when all locations are in a single hull, and one for the last layer, when the HCA has decomposed every cluster into separate locations.
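As a minimal sketch of the definition in the note, the fraction complete can be computed per layer as follows. The function name <code>fraction_complete</code> and the five-layer example are illustrative assumptions, not part of the project's code.

```python
def fraction_complete(i, num_layers):
    """Return (i - 1) / (I - 1) for layer i (1-indexed) of an HCA with I layers."""
    if num_layers < 2:
        # With a single layer the denominator (I - 1) would be zero.
        raise ValueError("an HCA needs at least two layers")
    if not 1 <= i <= num_layers:
        raise ValueError("layer index must lie between 1 and num_layers")
    return (i - 1) / (num_layers - 1)


# Hypothetical HCA with five layers: the first layer maps to 0 and the last to 1.
fractions = [fraction_complete(i, 5) for i in range(1, 6)]
print(fractions)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Because the measure is normalized by the number of layers, two city-years with different layer counts can be compared at the same fraction complete.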
From version 8:
:My ‘elbow method’ fits a cubic function to the relationship between the percentage unclustered and the percentage of locations that are in hulls, and then determines the inflection point of the fitted curve. The inflection point identifies the layer beyond which further unclustering moves locations out of hulls at a decreasing rate.
:As a rough guide, divisive clustering of geographic data, such as the locations of startups in cities, moves through three stages. In the first stage, the algorithm identifies outliers as points, lines, or small-population hulls. Highest-first layers with low hull counts occur in this stage. In the second stage, the algorithm breaks apart core areas until it achieves the maximum number of hulls. Then, in the third stage, these hulls are refined, providing a progressively tighter lens on the core groupings, and dismantled, until all that remains are points. The elbow method identifies layers in this third stage, at the tipping point between refining hulls and dismantling them.
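The cubic fit and inflection point described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual implementation: the function names, the plain least-squares solver, the synthetic S-curve data, and the rule of picking the layer whose x value is nearest the inflection point are all hypothetical choices. For a cubic y = a·x³ + b·x² + c·x + d, the second derivative 6a·x + 2b is zero at x = -b/(3a), which is the inflection point.

```python
def fit_cubic(xs, ys):
    """Least-squares fit of y = a*x^3 + b*x^2 + c*x + d; returns [a, b, c, d]."""
    rows = [[x ** 3, x ** 2, x, 1.0] for x in xs]
    n = 4
    # Normal equations: (V^T V) coef = V^T y, where V has columns x^3, x^2, x, 1.
    A = [[sum(r[j] * r[k] for r in rows) for k in range(n)] for j in range(n)]
    rhs = [sum(r[j] * y for r, y in zip(rows, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: need at least four distinct x values")
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(A[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = (rhs[r] - tail) / A[r][r]
    return coef


def inflection_point(xs, ys):
    """x value where the fitted cubic's second derivative is zero."""
    a, b, _, _ = fit_cubic(xs, ys)
    if a == 0:
        raise ValueError("fitted curve has no cubic term, so no inflection point")
    return -b / (3 * a)


def elbow_layer(xs, ys):
    """1-indexed layer whose x value is closest to the inflection point (assumed rule)."""
    x_star = inflection_point(xs, ys)
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_star))
    return nearest + 1


# Synthetic city-year with 11 layers (made-up data, not real results).
# x: fraction complete; y: percentage of locations in hulls, falling along an S-curve.
num_layers = 11
xs = [(i - 1) / (num_layers - 1) for i in range(1, num_layers + 1)]
ys = [100 * (1 - 3 * x ** 2 + 2 * x ** 3) for x in xs]

print(round(inflection_point(xs, ys), 6))
print(elbow_layer(xs, ys))
```

On real data the fitted inflection point can fall outside the range of observed x values, in which case the nearest-layer rule would select the first or last layer and the result should be checked by eye.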
====Guzman and Stern====