Latest revision as of 12:47, 21 September 2020
Hierarchical Clustering | |
---|---|
Project Information | |
Has title | Hierarchical Clustering |
Has owner | Kyran Adams, Oliver Chang |
Has start date | 12/1/2017 |
Has deadline date | |
Has keywords | Cluster, Clustering, Circles, Pain in the ass, Agglomeration |
Has project status | Active |
Has sponsor | McNair Center |
Has project output | Tool |

Copyright © 2019 edegan.com. All Rights Reserved.
Summary
The code is in:
E:\projects\hca
The main Python 3 script is main.py.
The code uses the AgglomerativeClustering class from sklearn.cluster, which does not have GPU support.
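A minimal sketch of the scikit-learn call, assuming points are clustered on raw (lat, lon) values with the default Ward linkage. Whether main.py scales or projects the coordinates first is not stated on this page, and the sample points are hypothetical. Everything here runs on the CPU.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical (lat, lon) points: two tight groups far apart.
# Assumption: raw degrees are treated as Euclidean coordinates.
points = np.array([
    [29.76, -95.37],
    [29.74, -95.40],
    [29.75, -95.38],
    [30.27, -97.74],
    [30.28, -97.75],
])

# Ward linkage is the scikit-learn default; it is spelled out here for clarity.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(points)  # one integer cluster id per point
print(labels)
```

The cluster ids themselves (0 or 1) are arbitrary; only which points share an id is meaningful.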
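If this is being run on a new build box, first install the dependencies:
pip install statistics
pip install gmplot
(statistics has been part of the Python 3 standard library since 3.4, so the first command is normally unnecessary.)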
The input is a tab-delimited text file named CoLevelForCircles.txt with 7 columns:
city state year lat lon coname datefirstinv
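A minimal sketch of reading this file with Python's csv module, assuming a header row with exactly these column names and tab delimiters. Grouping the rows by (city, state, year) is an assumption suggested by the (city, state, year) key in the output format, not something this page states; the read_colevel helper and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the CoLevelForCircles.txt layout (header row assumed).
SAMPLE = (
    "city\tstate\tyear\tlat\tlon\tconame\tdatefirstinv\n"
    "Houston\tTX\t2010\t29.76\t-95.37\tAcme Inc\t2009-05-01\n"
    "Houston\tTX\t2010\t29.74\t-95.40\tBeta LLC\t2010-02-15\n"
    "Austin\tTX\t2010\t30.27\t-97.74\tGamma Co\t2008-11-30\n"
)

def read_colevel(f):
    """Group input rows by (city, state, year), converting lat/lon to floats."""
    groups = defaultdict(list)
    for row in csv.DictReader(f, delimiter="\t"):
        key = (row["city"], row["state"], int(row["year"]))
        groups[key].append({
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            "coname": row["coname"],
            "datefirstinv": row["datefirstinv"],
        })
    return groups

groups = read_colevel(io.StringIO(SAMPLE))
for key, pts in groups.items():
    print(key, len(pts))
```

For the real file, `with open("CoLevelForCircles.txt", newline="") as f: groups = read_colevel(f)` should work, assuming the file's encoding matches the platform default.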
The output is a tab-delimited file named Results.tsv with the following columns:
(city, state, year) layer cluster ('lat','long','coname','datefirstinv')
Documentation
There is useful reference material on hierarchical clustering with Python and scikit-learn here: https://stackabuse.com/hierarchical-clustering-with-python-and-scikit-learn/
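Note that it should be possible to use TensorFlow's KMeansClustering to produce comparable results, although k-means partitions for different values of k are not nested the way agglomerative ones are.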
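TensorFlow's tf.contrib module, which held KMeansClustering, was removed in TensorFlow 2.x, so this sketch swaps in scikit-learn's KMeans purely to show what a single k-means layer looks like on hypothetical points. It runs on the CPU, so it illustrates the substitution rather than any GPU speed-up.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (lat, lon) points: two tight groups far apart.
points = np.array([
    [29.76, -95.37],
    [29.74, -95.40],
    [29.75, -95.38],
    [30.27, -97.74],
    [30.28, -97.75],
])

# n_init is set explicitly because its default has changed across scikit-learn
# versions; random_state makes the run reproducible.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(points)
print(labels)
```

Each k would need its own independent fit, which is why layers produced this way need not nest inside one another.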
Old Code Notes
This code takes a CoLevel master file, clusters the points for each k (number of clusters) in the range [1, num points / 5), and writes the results to output.tsv.
The file output.tsv has the columns place, statecode, year, layer, cluster, lat, long, coname, datefirstinv. The layer column holds k, and cluster is the ID of the cluster that the point belongs to.
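The loop described above can be sketched as follows. This is not the original main.py: it assumes each (place, statecode, year) group is clustered separately, that points are dicts with lat, lon, coname and datefirstinv keys, and that AgglomerativeClustering is refitted with default settings for every k. cluster_layers and the sample data are hypothetical.

```python
import csv
import io
import math

import numpy as np
from sklearn.cluster import AgglomerativeClustering

COLUMNS = ["place", "statecode", "year", "layer", "cluster",
           "lat", "long", "coname", "datefirstinv"]

def cluster_layers(place, statecode, year, points):
    """Yield one output row per point per layer k, for integer k in [1, n/5)."""
    n = len(points)
    coords = np.array([[p["lat"], p["lon"]] for p in points])
    # math.ceil keeps every integer k < n/5 when n is not a multiple of 5
    # (n // 5 would drop k = 2 for n = 12). Groups with n <= 5 yield no rows.
    for k in range(1, math.ceil(n / 5)):
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(coords)
        for p, label in zip(points, labels):
            yield [place, statecode, year, k, int(label),
                   p["lat"], p["lon"], p["coname"], p["datefirstinv"]]

# Hypothetical group of 12 points: 12 / 5 = 2.4, so layers k = 1 and k = 2.
points = [{"lat": 29.7 + 0.01 * i, "lon": -95.4, "coname": f"Firm{i}",
           "datefirstinv": "2010-01-01"} for i in range(12)]

out = io.StringIO()  # stands in for open("output.tsv", "w", newline="")
writer = csv.writer(out, delimiter="\t")
writer.writerow(COLUMNS)
writer.writerows(cluster_layers("Houston", "TX", 2010, points))
print(len(out.getvalue().splitlines()))
```

Refitting for every k recomputes the linkage tree each time; for large groups, scikit-learn's memory parameter combined with compute_full_tree=True can cache the tree between fits.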
The original version by Kyran and Oliver is in:
E:\McNair\Projects\FastCircles\src
You can run this program with:
python3 main.py