Training data: there are 2,600 pages in total, and we are a little more than halfway through (~1,500-1,600 done).
==Finding Company URLs==
In this file (sheet: 'Most Recent Merged Data'):
E:\McNair\Projects\Accelerators\Summer 2018\Merged W Crunchbase Data as of July 17.xlx
We filter for the companies (~4,500) that did not receive VC funding, are not in Crunchbase, and do not have URLs.
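As a rough illustration of the filter described above, here is a minimal Python sketch. The field names (`received_vc`, `in_crunchbase`, `url`) and the sample rows are hypothetical placeholders, not the actual column names or data in the sheet.

```python
def needs_url(row):
    """Return True for a company with no VC funding, no Crunchbase record, and no URL.

    The keys used here are assumed names, not the real sheet columns.
    """
    has_url = bool((row.get("url") or "").strip())
    return not row.get("received_vc") and not row.get("in_crunchbase") and not has_url


# Made-up rows standing in for the merged sheet.
companies = [
    {"name": "Acme Labs", "received_vc": False, "in_crunchbase": False, "url": ""},
    {"name": "Beta Corp", "received_vc": True, "in_crunchbase": True, "url": "http://beta.example"},
    {"name": "Gamma Inc", "received_vc": False, "in_crunchbase": False, "url": None},
]

# Names of the companies that still need a URL search.
to_search = [c["name"] for c in companies if needs_url(c)]
```

Blank and missing URL cells are both treated as "no URL", which is likely how a spreadsheet export would represent them.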
Using a Google crawler and a URL-matching script, we will try to find as many of the missing URLs as possible.
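One possible shape for the URL-matching step is sketched below. This is not the project's actual script: it assumes the crawler returns a list of full candidate URLs (with scheme) per company, and it picks the candidate whose domain is most similar to the normalized company name. The function names, the suffix list, and the 0.8 threshold are all illustrative assumptions.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "company"}


def normalize(name):
    """Lowercase the name, drop punctuation and legal suffixes, and join the words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "".join(w for w in words if w not in SUFFIXES)


def domain_label(url):
    """Return the first label of the host, ignoring a leading 'www.'.

    Other subdomains are not handled: 'blog.acme.com' yields 'blog'.
    """
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


def best_url(company, candidates, threshold=0.8):
    """Return the candidate URL whose domain best matches the company name,
    or None if no candidate reaches the similarity threshold."""
    target = normalize(company)
    best, best_score = None, 0.0
    for url in candidates:
        score = SequenceMatcher(None, target, domain_label(url)).ratio()
        if score > best_score:
            best, best_score = url, score
    return best if best_score >= threshold else None


results = [
    "https://www.linkedin.com/company/acme-labs",
    "https://www.acmelabs.com/about",
]
match = best_url("Acme Labs, Inc.", results)
```

Returning `None` below the threshold keeps weak matches (directory or social media pages) out of the results, at the cost of leaving some companies for manual review.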