Start Up Address Finder Algorithm (Tool)

From edegan.com
Revision as of 12:53, 22 March 2017 by Ed (talk | contribs)


McNair Project
Project Information
Project Title: Start Up Address Finder Algorithm (Tool)
Owner: Jake Floyd
Start Date: Summer 2016
Deadline:
Keywords: Tool
Primary Billing:
Notes:
Has project status: Complete
Copyright © 2016 edegan.com. All Rights Reserved.


Description

Notes: The Start Up Address Finder Algorithm aims to provide street-level addresses for the startups contained in the Crunchbase database.

Input: Crunchbase company list (which includes a large amount of information about each company, including some funding and founding information)

Output: A street-level address for every company in the list.

Algorithm

To be filled once project is completed

Development Notes

7/7: Project development Notes: Beginning

The company list was downloaded from Crunchbase.
The list provides key fields for each company, including company name, URL, country, state, region, and city.
The data was analyzed to determine the total number of companies and the percentage of records in which the country, state, region, or city field was missing. The results are recorded in the image below.
Count 7-7-16.png
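The missing-field count described above can be sketched in Python. This is only an illustration: the notes do not say which tool produced the counts, and the column names (country, state, region, city) and the sample rows below are assumptions, not the actual Crunchbase export.

```python
import csv
import io

# Hypothetical column names; the real Crunchbase export may label them differently.
LOCATION_FIELDS = ["country", "state", "region", "city"]


def missing_field_percentages(rows, fields=LOCATION_FIELDS):
    """Return (total row count, {field: percent of rows where the field is blank})."""
    rows = list(rows)
    total = len(rows)
    percentages = {}
    for field in fields:
        blank = sum(1 for row in rows if not (row.get(field) or "").strip())
        percentages[field] = 100.0 * blank / total if total else 0.0
    return total, percentages


# Made-up sample data standing in for the downloaded company list.
SAMPLE_CSV = """name,url,country,state,region,city
Alpha,http://alpha.example,USA,WA,Seattle,Seattle
Beta,http://beta.example,USA,,Seattle,
Gamma,,,,,
Delta,http://delta.example,USA,WA,,Bellevue
"""

sample_total, sample_missing = missing_field_percentages(
    csv.DictReader(io.StringIO(SAMPLE_CSV))
)
```

For the real list, the same function could be passed a csv.DictReader opened on the downloaded file, provided the column names are adjusted to match it.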
Following this, the Seattle region was chosen as a test region.
The Seattle region contained 1070 companies.
Each company was assigned a random number using the Excel RANDBETWEEN() function.
The companies were then sorted by this number and assigned a new sequential number, based on their new order, for identification purposes.
The CORREL function was then applied to find the correlation between the random number and the assigned number (expected to be close to 1, since the assigned number follows the sort order of the random number).
The same function was then used to determine the correlation between these numbers and both funding total and funding date. The results are displayed below:
Correl 7-7-16.png
These values made us confident that the order of the list was randomized
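The randomization check above was done in Excel. A rough Python equivalent might look like the sketch below. The company records, the funding_total field, the key range, and the seeds are all made up for illustration; pearson() computes the Pearson correlation coefficient, which is what CORREL returns.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def randomize_order(companies, seed=None):
    """Attach a random key to each company, sort by it, and number the result."""
    rng = random.Random(seed)
    keyed = [(rng.randint(1, 1_000_000), company) for company in companies]
    keyed.sort(key=lambda pair: pair[0])
    # Each entry is (random key, new order number, company record).
    return [(key, order, company)
            for order, (key, company) in enumerate(keyed, start=1)]


# Made-up records: 1070 companies, each with an arbitrary funding total.
data_rng = random.Random(0)
companies = [
    {"name": f"company_{i}", "funding_total": data_rng.uniform(0, 5e7)}
    for i in range(1070)
]

ordered = randomize_order(companies, seed=1)
keys = [key for key, _, _ in ordered]
orders = [order for _, order, _ in ordered]
funding = [company["funding_total"] for _, _, company in ordered]

key_vs_order = pearson(keys, orders)         # expected to be close to 1
order_vs_funding = pearson(orders, funding)  # expected to be close to 0
```

A correlation near 1 between key and order only confirms that the sort was applied; it is the near-zero correlation between the new order and the funding fields that suggests the order carries no information about the companies.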
Following this, each company name was entered into Google to determine whether a street-level address could be found.
A short test showed that Crunchbase itself contained a significant number of street-level addresses, so a larger sample was checked to see whether this held in general.
A total of 160 companies were tested:
140 had addresses on Crunchbase.
Of the 20 without an address on Crunchbase, 4 had an address on both LinkedIn and Bloomberg, 1 had an address only on LinkedIn, and 3 had an address only on Bloomberg.
This meant that 8 of the 20 companies could be found using LinkedIn and Bloomberg.
The remaining 12 companies were checked to see whether they could be found using a WHOIS search.
Of these 12, 11 had a URL.
Of these 11, 3 addresses were retrievable with a WHOIS search, while 8 were not.
Of the 8 that could not be retrieved, 4 were protected by godaddy.com and could possibly be retrieved from there, and 1 had a .io URL that could not be found.
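The counts above imply a cumulative coverage rate as each source is tried in turn. The short Python sketch below works through that arithmetic. The counts come from the notes; the variable names, and the assumption that the sources are consulted in the order Crunchbase, then LinkedIn/Bloomberg, then WHOIS, are for illustration only.

```python
# Counts taken from the notes above.
tested = 160
on_crunchbase = 140
linkedin_and_bloomberg = 4
linkedin_only = 1
bloomberg_only = 3
whois_retrieved = 3

not_on_crunchbase = tested - on_crunchbase                               # 20
via_profiles = linkedin_and_bloomberg + linkedin_only + bloomberg_only   # 8
remaining = not_on_crunchbase - via_profiles                             # 12


def cumulative_coverage(counts, total):
    """Percent of the sample with an address after each source is tried in turn."""
    found = 0
    coverage = []
    for count in counts:
        found += count
        coverage.append(100.0 * found / total)
    return coverage


coverage = cumulative_coverage(
    [on_crunchbase, via_profiles, whois_retrieved], tested
)
# coverage == [87.5, 92.5, 94.375]

# Companies still without an address: 8 failed WHOIS lookups plus 1 with no URL.
unresolved = tested - (on_crunchbase + via_profiles + whois_retrieved)   # 9
```

On this sample, Crunchbase alone covers 87.5% of companies, and the two fallback sources raise coverage to about 94%, leaving 9 of the 160 companies unresolved.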

7/?: Project development Notes (cont'd)

To be filled for next section