Difference between revisions of "Parallel Enclosing Circle Algorithm"
Line 3: | Line 3: | ||
|Has owner=Oliver Chang, | |Has owner=Oliver Chang, | ||
|Has start date=July 31, 2017 | |Has start date=July 31, 2017 | ||
− | |Has deadline= | + | |Has deadline=October 4, 2017 |
− | |Has project status= | + | |Has project status=Complete |
|Is dependent on=Enclosing Circle Algorithm, | |Is dependent on=Enclosing Circle Algorithm, | ||
}} | }} | ||
+ | A thin-wrapper around [[Enclosing_Circle_Algorithm_(Rework)|the enclosing circle algorithm]] which allows for instance-level parallelization. | ||
+ | This project consists of the python files in <code>E:\McNair\Projects\OliverLovesCircles\src\python</code>. | ||
− | + | Parallelization is implemented via Python2's [https://docs.python.org/2/library/subprocess.html#subprocess.Popen <code>subprocess.open()</code>] which is non-blocking and available in the standard library. | |
Line 20: | Line 22: | ||
== Parameters == | == Parameters == | ||
− | * <code> | + | * in <code>circles.py</code>: |
− | * <code>n</code>: the minimum number of data points that must be included in a circle | + | ** <code>ITERATIONS</code>: the number of iterations to attempt for each <code>k</code> to find minimum for that <code>k</code> |
+ | ** <code>MIN_POINTS_PER_CIRCLE</code> (AKA <code>n</code>): the minimum number of data points that must be included in a circle | ||
+ | * in <code>vc_circles.py</code> | ||
+ | ** <code>NUMBER_INSTANCES</code>: number of parallel instances to run; assume no data-races between instances | ||
+ | ** <code>SWEEP_CYCLE_SECONDS</code>: amount of time before removing completed jobs from the current job and adding new jobs if any files are left to process | ||
+ | ** <code>TIMEOUT_MINUTES</code>: maximum running time of a parallel instance of the algorithm | ||
+ | ** <code>SPLIT_THRESHOLD</code>: if a dataset has more than this threshold of data points, it will be split via k-means | ||
− | == | + | == Example Usage == |
− | + | <nowiki> | |
+ | $ python vc_circles.py --infile E:/McNair/Projects/OliverLovesCircles/CoLevelForCirclesNotRunGTE200.txt</nowiki> | ||
+ | where <code>CoLevelForCirclesNotRunGTE200.txt</code> is a tab-separated values file with the columns | ||
+ | <code>placestate, place, statecode, year, latitude, longitude, coname, datefirstinv, placens, geoid, city</code> | ||
− | + | This command will populate (and overwrite) any files in <code>data/</code>. The format of the filenames in this directory are <code>{city}{sep}{state}{sep}{year}{sep}{num}.tsv</code> where <code>num</code> is a 0-indexed integer of a split city/state/year <code>infile</code> that has greater than <code>SPLIT_THRESHOLD</code> | |
− | |||
== Related Pages == | == Related Pages == |
Revision as of 11:47, 6 October 2017
Parallel Enclosing Circle Algorithm | |
---|---|
Project Information | |
Project Title | Parallel Enclosing Circle Algorithm |
Owner | Oliver Chang |
Start Date | July 31, 2017 |
Deadline | October 4, 2017 |
Primary Billing | |
Notes | |
Has project status | Complete |
Is dependent on | Enclosing Circle Algorithm |
Copyright © 2016 edegan.com. All Rights Reserved. |
A thin-wrapper around the enclosing circle algorithm which allows for instance-level parallelization.
This project consists of the python files in E:\McNair\Projects\OliverLovesCircles\src\python
.
Parallelization is implemented via Python2's subprocess.open()
which is non-blocking and available in the standard library.
The Problem
Note that this is not the classical enclosing circle algorithm.
Rather, we seek to minimize the sum of enclosing circles containing at least n
points.
Thus, multiple circles are allowed and inclusion in multiple circles is possible.
Parameters
- in
circles.py
:ITERATIONS
: the number of iterations to attempt for eachk
to find minimum for thatk
MIN_POINTS_PER_CIRCLE
(AKAn
): the minimum number of data points that must be included in a circle
- in
vc_circles.py
NUMBER_INSTANCES
: number of parallel instances to run; assume no data-races between instancesSWEEP_CYCLE_SECONDS
: amount of time before removing completed jobs from the current job and adding new jobs if any files are left to processTIMEOUT_MINUTES
: maximum running time of a parallel instance of the algorithmSPLIT_THRESHOLD
: if a dataset has more than this threshold of data points, it will be split via k-means
Example Usage
$ python vc_circles.py --infile E:/McNair/Projects/OliverLovesCircles/CoLevelForCirclesNotRunGTE200.txt
where CoLevelForCirclesNotRunGTE200.txt
is a tab-separated values file with the columns
placestate, place, statecode, year, latitude, longitude, coname, datefirstinv, placens, geoid, city
This command will populate (and overwrite) any files in data/
. The format of the filenames in this directory are {city}{sep}{state}{sep}{year}{sep}{num}.tsv
where num
is a 0-indexed integer of a split city/state/year infile
that has greater than SPLIT_THRESHOLD