Difference between revisions of "Hierarchical Clustering"

McNair Project
Hierarchical Clustering
Project Information
Project Title	Hierarchical Clustering
Owner	Kyran Adams, Oliver Chang
Start Date	12/1/2017
Deadline
Keywords	Cluster, Clustering, Circles, Pain in the ass, Agglomeration
Primary Billing
Notes
Has project status	Active
	Copyright © 2016 edegan.com. All Rights Reserved.

Revision as of 14:20, 29 May 2019

Summary

The code is in

E:\projects\hca

The python3 file is main.py

The code uses the AgglomerativeClustering from sklearn.cluster, which doesn't have GPU support.
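As a rough illustration of the scikit-learn call, here is a minimal sketch. The sample coordinates are made up, and the default Ward linkage on raw latitude/longitude values is an assumption; main.py may use different settings.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical (lat, lon) points: two near Houston, two near Boston.
points = np.array([
    [29.76, -95.37],
    [29.72, -95.40],
    [42.36, -71.06],
    [42.37, -71.11],
])

# Ward linkage is scikit-learn's default; main.py may be configured differently.
model = AgglomerativeClustering(n_clusters=2)

# fit_predict returns one integer cluster id per input point.
labels = model.fit_predict(points)
```

The numbering of the cluster ids is arbitrary, so only the grouping of points should be relied on.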

If this is being run on a new build box, install the dependencies first:

pip install statistics
pip install gmplot

The input is a tab-delimited text (tdt) file named CoLevelForCircles.txt with 7 columns:

city state year lat lon coname datefirstinv
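A file in this layout could be read with the standard csv module, as in the sketch below. The function name parse_colevel and the has_header flag are hypothetical, and whether the real file begins with a header row is an assumption.

```python
import csv

# The seven input columns, in the order listed above.
COLUMNS = ["city", "state", "year", "lat", "lon", "coname", "datefirstinv"]


def parse_colevel(lines, has_header=True):
    """Parse tab-delimited lines into dicts keyed by the seven column names."""
    reader = csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    rows = []
    for row in reader:
        # Convert the numeric fields; the rest stay as strings.
        row["year"] = int(row["year"])
        row["lat"] = float(row["lat"])
        row["lon"] = float(row["lon"])
        rows.append(row)
    return rows
```

To use it on the real file, open CoLevelForCircles.txt with newline="" and pass the file object as lines.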

The output is a tdt file named Results.tsv with 8 columns:

(city, state, year) layer cluster ('lat','long','coname','datefirstinv')

Documentation

There's useful reference material here: https://stackabuse.com/hierarchical-clustering-with-python-and-scikit-learn/

Note that it should be possible to use TensorFlow's KMeansClustering to achieve a similar result, although k-means is a different algorithm and will not necessarily produce the same clusters.

Old Code Notes

This code takes a CoLevel master file, clusters points using k (number of clusters) in the range [1, num points / 5), and creates a file output.tsv.

The file output.tsv has columns place, statecode, year, layer, cluster, lat, long, coname, datefirstinv. Layer is k, and cluster is the id of the cluster that the point belongs to.
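The layering described above can be sketched as a loop over k. This is not the original code: the function name cluster_layers is hypothetical, the clustering call is scikit-learn's AgglomerativeClustering with default settings, and the upper bound is taken as the half-open interval [1, n / 5), which the original may round differently.

```python
import math

import numpy as np
from sklearn.cluster import AgglomerativeClustering


def cluster_layers(points):
    """Return (layer, point_index, cluster_id) tuples for each k in [1, n / 5)."""
    n = len(points)
    results = []
    # For integer k, k < n / 5 is equivalent to k < ceil(n / 5).
    for k in range(1, math.ceil(n / 5)):
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(points)
        for index, cluster_id in enumerate(labels):
            results.append((k, index, int(cluster_id)))
    return results


# Hypothetical data: 8 points near (0, 0) and 7 points near (100, 100).
group_a = np.column_stack([np.arange(8) * 0.1, np.arange(8) * 0.1])
group_b = np.column_stack([100 + np.arange(7) * 0.1, 100 + np.arange(7) * 0.1])
sample_points = np.vstack([group_a, group_b])

layers = cluster_layers(sample_points)
```

Each tuple corresponds to one output row once the point index is joined back to its place, year, and company fields. Refitting for every k is simple but repeats work; it is shown here only for clarity.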

The original version by Kyran and Oliver is in:

E:\McNair\Projects\FastCircles\src

You can run this program with:

python3 main.py
