Latest revision as of 12:41, 21 September 2020
Incubator Seed Data Coverage
Project Information | |
---|---|
Has title | Incubator Seed Data Coverage |
Has owner | Ed Egan |
Has start date | |
Has deadline date | |
Has project status | Active |
Subsumed by: | Incubator Seed Data, Incubators in Five Ecosystems |
Has sponsor | Kauffman Incubator Project |
Has project output | Data |
Copyright © 2019 edegan.com. All Rights Reserved.
Overview
The purpose of this project is to test the coverage and accuracy of the Incubator Seed Data using the hand-collected data on Incubators in Five Ecosystems as a benchmark.
Specifically, this project addresses point 6 of the Kauffman Incubator Project's Expected Outcomes by June 2019:
6. The seed data will have at least a 70% baseline accuracy and coverage of incubators compared to results from hand-collected data on 5 ecosystems, as measured by the data analysis.
Data
The five ecosystem incubators are:
City | State | Incubator Name |
---|---|---|
Washington | DC | Inclusive Innovation Incubator (In3) |
Washington | DC | AU Entrepreneurship Incubator |
Washington | DC | Global Development Incubator |
Washington | DC | Halcyon Incubator |
Washington | DC | The Hatchery |
Burlington | VT | Vermont Center for Emerging Technologies (VCET) |
Austin | TX | Austin Technology Incubator |
Austin | TX | IncubatorCTX |
Austin | TX | Economic Growth Business Incubator |
Austin | TX | ACC Bioscience Incubator |
Austin | TX | Bunker Labs |
Austin | TX | Galvanize |
St. Paul | MN | University Enterprise Laboratories |
Minneapolis | MN | Discovery Launchpad at UMN |
St. Paul | MN | Lunar Startups |
We test against the following datasets (stored as tables in the incubators database and also available as tab-delimited text files):
- Incubators -- 2137 records, combining the records in CIAIncubators and USIncubators
- CIAIncubators -- 1603 records, combining incubators identified in Crunchbase, INBIA, and AngelList
- USIncubators -- 707 records, combining state and regional incubator lists found as a part of the US Incubators project
- Data from the Google Crawler run against the five ecosystems
Process
Load FiveEcosystemIncubators.txt into the incubators database, then run the matcher:
perl Matcher.pl -mode=2 -file1="FiveEcosystemIncubators.txt" -file2="Incubators.txt"
Note that Incubators.txt contains substantial name variation for the same firm, so standard name-based matching does not work. For example, In3 appears under two different names:
Inclusive Innovation Incubator Inclusive Innovation Incubator DC in3dc.com Inclusive Innovation Incubator (In3) - D.C's first co-working, training, & incubator space intentional about diversity & inclusion. Washington 2301-D Georgia Ave, NW Crunchbase
Inclusive Innovation Incubator (In3) Inclusive Innovation Incubator (In3) DC www.in3d.com Inclusive Innovation Incubator (In3) is the District's first community space focused on inclusion innovation and incubation. The incubator is committed to creating a collaborative environment where under-resourced members have access to the space and services needed to build or grow a successful business. Washington DC AngelList,USIncubators
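The two records above differ only by the parenthetical "(In3)", which is enough to defeat an exact string comparison. One common remedy is to normalize names before comparing them. The sketch below is hypothetical and is not the logic of Matcher.pl: it assumes that dropping parenthetical text, stripping punctuation, and ignoring case are acceptable normalizations.

```python
import re

def normalize_name(name: str) -> str:
    """Hypothetical normalizer: drop parenthetical text such as "(In3)",
    replace punctuation with spaces, collapse whitespace, and lowercase."""
    name = re.sub(r"\([^)]*\)", " ", name)  # remove "(...)" acronyms and notes
    name = re.sub(r"[^\w\s]", " ", name)    # punctuation becomes whitespace
    return " ".join(name.lower().split())   # collapse runs of whitespace

def names_match(a: str, b: str) -> bool:
    """Treat two names as the same firm if their normalized forms are equal."""
    return normalize_name(a) == normalize_name(b)

print(names_match("Inclusive Innovation Incubator",
                  "Inclusive Innovation Incubator (In3)"))  # True
```

A rule this simple would still fail on, say, a bare acronym ("VCET") compared with the full name, which is one reason the manual review below remains necessary.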
Of the 15 incubator names, 11 were in our incubators table (irrespective of location), and of these, y had name variation(s).
Fortunately, the count is small, so we can conduct a manual review. For the Google Crawler, we count a hit only if the incubator's own website appears in the results, rather than a news article or other page that references the incubator.
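The hit rule for the Google Crawler can be made mechanical by comparing hostnames. The sketch below assumes we already know each incubator's own website (for In3, in3dc.com, as in the record above) and that the crawler yields a list of result URLs. The function names and the news URL are hypothetical, and this is not a description of how the Google Crawler itself works.

```python
from typing import Iterable
from urllib.parse import urlparse

def host_of(url: str) -> str:
    """Lowercase hostname of a URL or bare domain, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url  # urlparse only finds the host when a scheme is present
    host = (urlparse(url).hostname or "").lower()
    return host[len("www."):] if host.startswith("www.") else host

def is_crawler_hit(incubator_site: str, result_urls: Iterable[str]) -> bool:
    """True only if one of the results is on the incubator's own site.

    Pages on other hosts (news articles, directories, and so on) that merely
    mention the incubator do not count as hits.
    """
    target = host_of(incubator_site)
    return any(host_of(url) == target for url in result_urls)

# Hypothetical crawler output for In3: a news story and the incubator's own page.
results = [
    "https://www.example-news.com/2019/in3-opens-new-space",
    "https://in3dc.com/about",
]
print(is_crawler_hit("in3dc.com", results))      # True
print(is_crawler_hit("in3dc.com", results[:1]))  # False
```

Exact host equality is deliberately strict: a location-specific subdomain would not count, so the comparison may need loosening for chains with per-office sites.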
Results
In the table below, 1 indicates that the incubator was present in the source and 0 that it was absent. The last column, labelled Any, records whether the incubator was in our complete seed data, which comprises the incubators table and the Google Crawler results.
Name | Location | Crunchbase | INBIA | AngelList | US Incubators | Google Crawler | Any |
---|---|---|---|---|---|---|---|
Inclusive Innovation Incubator (In3) | Washington, DC | 1 | 0 | 1 | 1 | 1 | 1 |
AU Entrepreneurship Incubator | Washington, DC | 0 | 0 | 0 | 0 | 0 | 0 |
Global Development Incubator | Washington, DC | 0 | 0 | 0 | 1 | 0 | 1 |
Halcyon Incubator | Washington, DC | 0 | 0 | 0 | 1 | 1 | 1 |
The Hatchery | Washington, DC | 0 | 0 | 0 | 1 | 0 | 1 |
Vermont Center for Emerging Technologies (VCET) | Burlington, VT | 1 | 1 | 0 | 0 | 1 | 1 |
Austin Technology Incubator | Austin, TX | 1 | 0 | 1 | 0 | 1 | 1 |
IncubatorCTX | Austin, TX | 0 | 0 | 0 | 0 | 0 | 0 |
Economic Growth Business Incubator | Austin, TX | 0 | 0 | 0 | 0 | 1 | 1 |
ACC Bioscience Incubator | Austin, TX | 0 | 0 | 1 | 0 | 1 | 1 |
Bunker Labs | Austin, TX | 0 | 0 | 0 | 0 | 1 | 1 |
Galvanize | Austin, TX | 0 | 0 | 0 | 0 | 0 | 0 |
University Enterprise Laboratories | St. Paul, MN | 0 | 0 | 0 | 1 | 1 | 1 |
Discovery Launchpad at UMN | Minneapolis, MN | 0 | 0 | 0 | 0 | 0 | 0 |
Lunar Startups | St. Paul, MN | 0 | 0 | 0 | 1 | 0 | 1 |
Total | | 3 (20%) | 1 (7%) | 3 (20%) | 6 (40%) | 8 (53%) | 11 (73%) |
Notes:
- For The Hatchery, the New York location is also in AngelList.
- For Bunker Labs, the Austin location is missing from the incubators table, but the Seattle location is in US Incubators and the Chicago location is in AngelList; the Google Crawler also found an Arlington, VA location.
- For Galvanize, the Seattle location is in US Incubators and the San Francisco and New York City locations are both in AngelList.
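The earlier count of 11 incubator names found in the incubators table irrespective of location can be rechecked from the results table and these notes. The sketch below is our own reconstruction, not part of the matching process. It uses only the four sources that make up the incubators table (Crunchbase, INBIA, AngelList and US Incubators, leaving out the Google Crawler), counts the incubators found at their listed location, and then adds Bunker Labs and Galvanize, which the notes place in the table only at other locations.

```python
# Flags transcribed from the results table for the four sources in the
# incubators table: (Crunchbase, INBIA, AngelList, US Incubators).
# The Google Crawler column is deliberately excluded.
INCUBATORS_TABLE_FLAGS = {
    "Inclusive Innovation Incubator (In3)": (1, 0, 1, 1),
    "AU Entrepreneurship Incubator": (0, 0, 0, 0),
    "Global Development Incubator": (0, 0, 0, 1),
    "Halcyon Incubator": (0, 0, 0, 1),
    "The Hatchery": (0, 0, 0, 1),
    "Vermont Center for Emerging Technologies (VCET)": (1, 1, 0, 0),
    "Austin Technology Incubator": (1, 0, 1, 0),
    "IncubatorCTX": (0, 0, 0, 0),
    "Economic Growth Business Incubator": (0, 0, 0, 0),
    "ACC Bioscience Incubator": (0, 0, 1, 0),
    "Bunker Labs": (0, 0, 0, 0),
    "Galvanize": (0, 0, 0, 0),
    "University Enterprise Laboratories": (0, 0, 0, 1),
    "Discovery Launchpad at UMN": (0, 0, 0, 0),
    "Lunar Startups": (0, 0, 0, 1),
}

# Per the notes, these appear in the incubators table only at other locations.
# (The Hatchery's New York entry is also noted, but its DC location is already found.)
FOUND_ELSEWHERE = {"Bunker Labs", "Galvanize"}

at_listed_location = {name for name, flags in INCUBATORS_TABLE_FLAGS.items() if any(flags)}
irrespective_of_location = at_listed_location | FOUND_ELSEWHERE

print(len(at_listed_location))        # 9
print(len(irrespective_of_location))  # 11
```

Nine incubators found at their listed location plus the two found elsewhere give the 11 reported above.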
Overall, 11 of the 15 incubators (73%) in the hand-collected data are present in the seed data, above the 70% coverage baseline in point 6.
Three of the four absent incubators have a university affiliation: AU Entrepreneurship Incubator is based at American University in DC, IncubatorCTX is based at Concordia University in Northwest Austin, and Discovery Launchpad at UMN is at the University of Minnesota. This suggests that we would need another source of data to capture academic incubators. The fourth missing incubator is the Austin branch of a chain (Galvanize). The chain does not describe itself as an incubator but appears to meet the criteria for one, so we could attempt to assemble data on incubator chains and their offices separately.
The data also suggests that the different sources capture different incubators. There is little apparent correlation between the sources, and the Google Crawler is the only one to capture more than half of the incubators.
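These coverage figures can be recomputed directly from the results table. The sketch below transcribes the table's 0/1 flags (the variable names are our own), recomputes the Total row and the Any column, compares overall coverage with the 70% baseline from point 6, and lists the incubators that exactly one source captures, which is one concrete way to see that the sources capture different incubators.

```python
# Results table rows: (Crunchbase, INBIA, AngelList, US Incubators, Google Crawler)
SOURCES = ("Crunchbase", "INBIA", "AngelList", "US Incubators", "Google Crawler")
RESULTS = {
    "Inclusive Innovation Incubator (In3)": (1, 0, 1, 1, 1),
    "AU Entrepreneurship Incubator": (0, 0, 0, 0, 0),
    "Global Development Incubator": (0, 0, 0, 1, 0),
    "Halcyon Incubator": (0, 0, 0, 1, 1),
    "The Hatchery": (0, 0, 0, 1, 0),
    "Vermont Center for Emerging Technologies (VCET)": (1, 1, 0, 0, 1),
    "Austin Technology Incubator": (1, 0, 1, 0, 1),
    "IncubatorCTX": (0, 0, 0, 0, 0),
    "Economic Growth Business Incubator": (0, 0, 0, 0, 1),
    "ACC Bioscience Incubator": (0, 0, 1, 0, 1),
    "Bunker Labs": (0, 0, 0, 0, 1),
    "Galvanize": (0, 0, 0, 0, 0),
    "University Enterprise Laboratories": (0, 0, 0, 1, 1),
    "Discovery Launchpad at UMN": (0, 0, 0, 0, 0),
    "Lunar Startups": (0, 0, 0, 1, 0),
}
TARGET = 0.70  # baseline coverage from point 6 of the Expected Outcomes

n = len(RESULTS)

# Per-source hits: reproduces the Total row of the table.
coverage = {src: sum(flags[i] for flags in RESULTS.values()) for i, src in enumerate(SOURCES)}
for src, hits in coverage.items():
    print(f"{src}: {hits} ({hits / n:.0%})")

# Any column: present in at least one source.
any_hits = sum(1 for flags in RESULTS.values() if any(flags))
print(f"Any: {any_hits} ({any_hits / n:.0%}), meets target: {any_hits / n >= TARGET}")

# Incubators captured by exactly one source, grouped by that source.
unique = {src: [] for src in SOURCES}
for name, flags in RESULTS.items():
    if sum(flags) == 1:
        unique[SOURCES[flags.index(1)]].append(name)
for src, names in unique.items():
    if names:
        print(f"Only in {src}: {', '.join(names)}")
```

On the table as given, only US Incubators (three incubators) and the Google Crawler (two) capture incubators that no other source has, so dropping either would lower the Any total, whereas Crunchbase, INBIA and AngelList add no incubator that the other sources miss.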