#We could do the elbow method on a per city-year basis. The number of statistical clusters is equal to the number of layers, so we'd index over layers and select a layer for each city-year. It's worth piloting this on a single city-year, say Tulsa, 2003; the code would be reusable for a bigger sample. Estimate: 3hrs.
#I've rechecked the code and I now think it is computationally feasible. What I was trying to do before was find the average distance between every pair of coordinates, which is an order of complexity more than we need: calculating even between-group variance is O(n) rather than O(n^2) (within-group and total variance are simpler still), and we have ~20 million statistical clusters spread over ~200k layers. A sketch of the O(n) decomposition and the elbow pick follows this list. Estimate, given (1) above: 2hrs.
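The following is a minimal sketch of items (1) and (2), assuming each city-year's points sit in a pandas DataFrame with hypothetical columns <code>layer</code> (candidate clustering), <code>cluster_id</code>, and planar coordinates <code>x</code>/<code>y</code>; the column names, the <code>load_city_year</code> helper, and the second-difference elbow rule are illustrative assumptions, not our actual code.

<syntaxhighlight lang="python">
import numpy as np
import pandas as pd

def within_group_ss(points: pd.DataFrame) -> float:
    """Within-group sum of squares in O(n): squared distance of each
    point to its cluster centroid, summed over both coordinates."""
    centroids = points.groupby("cluster_id")[["x", "y"]].transform("mean")
    return float(((points[["x", "y"]] - centroids) ** 2).sum().sum())

def between_group_ss(points: pd.DataFrame) -> float:
    """Between-group sum of squares in O(n): size-weighted squared
    distance of each cluster centroid from the grand centroid."""
    grand = points[["x", "y"]].mean()
    groups = points.groupby("cluster_id")[["x", "y"]]
    return float(((groups.mean() - grand) ** 2)
                 .mul(groups.size(), axis=0).sum().sum())

def elbow_layer(city_year: pd.DataFrame):
    """Pick the layer where the within-group SS curve bends hardest.
    Needs at least three layers for the second difference to exist."""
    layers = sorted(city_year["layer"].unique())
    wss = np.array([within_group_ss(city_year[city_year["layer"] == k])
                    for k in layers])
    bend = wss[:-2] - 2 * wss[1:-1] + wss[2:]  # discrete curvature proxy
    return layers[int(np.argmax(bend)) + 1]

# Hypothetical usage on the Tulsa 2003 pilot:
# tulsa = load_city_year("Tulsa", 2003)
# best_layer = elbow_layer(tulsa)
</syntaxhighlight>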
=====Harder than it looks=====
In our context, we have vectors <math>(a,b)</math> of locations rather than a scalar for each point. This changes the math[https://online.stat.psu.edu/stat505/lesson/8/8.2], as well as the ultimate 'estimator'[https://online.stat.psu.edu/stat505/lesson/8/8.3] that we might use.
Specifically, the within-group and between-group sums of squares become sums-of-squares-and-cross-products (SSCP) matrices, and a test statistic such as Wilks' lambda is needed to collapse them back to a single number.
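Written out, with clusters indexed by <math>k</math>, points <math>\mathbf{x}_{ki}=(a_{ki},b_{ki})</math>, cluster means <math>\bar{\mathbf{x}}_k</math>, and grand mean <math>\bar{\mathbf{x}}</math>, the multivariate decomposition behind the first link is:

:<math>\mathbf{T} = \mathbf{W} + \mathbf{B},\qquad \mathbf{W} = \sum_{k}\sum_{i \in k}(\mathbf{x}_{ki}-\bar{\mathbf{x}}_k)(\mathbf{x}_{ki}-\bar{\mathbf{x}}_k)^{\top},\qquad \mathbf{B} = \sum_{k} n_k(\bar{\mathbf{x}}_k-\bar{\mathbf{x}})(\bar{\mathbf{x}}_k-\bar{\mathbf{x}})^{\top}</math>

and one of the test statistics covered by the second link, Wilks' lambda, reduces the pair to a scalar:

:<math>\Lambda = \frac{\det \mathbf{W}}{\det(\mathbf{W}+\mathbf{B})},</math>

with small <math>\Lambda</math> indicating that between-group structure dominates.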
====The Heuristic Method Justification====