We can't find a decent reference, let alone a seminal one, for using the elbow method to select the number of clusters. Our problem is also a special case, because it uses geographic coordinates. We could therefore implement a method based on scalar distance and describe it, along with its relationship to other measures, in the appendix. This might add value to the paper.
One final thought: we could weight the distance between each location and the mean(s) by the fraction of startups at that location.
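To make the weighting idea concrete, here is a minimal sketch of a startup-weighted scalar distance measure. It rests on several assumptions that are not settled in these notes: distance is great-circle (haversine) distance in kilometres, each location is compared against its nearest cluster mean, and the names <code>haversine_km</code>, <code>weighted_mean_distance</code>, <code>locations</code>, <code>counts</code>, and <code>centers</code> are hypothetical. An elbow curve would come from evaluating this measure for each candidate number of clusters.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def weighted_mean_distance(locations, counts, centers):
    """Mean distance from each location to its nearest center,
    weighted by that location's fraction of all startups."""
    total = sum(counts)
    return sum(
        (n / total) * min(haversine_km(loc, c) for c in centers)
        for loc, n in zip(locations, counts)
    )
```

With one center per location the measure is zero, so the curve falls as the number of clusters grows. Locations with few startups contribute little, which is the intended effect of the weighting.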
====The Heuristic Method Justification====