Incubator Seed Data
Requirement: Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.
Incubator Seed Data | |
---|---|
Project Information | |
Has title | Incubator Seed Data |
Has owner | Anne Freeman |
Has start date | |
Has deadline date | |
Has project status | Active |
Is dependent on | Crunchbase Database, INBIA, Google Crawler |
Subsumed by | Ecosystem Organization Classifier |
Copyright © 2019 edegan.com. All Rights Reserved. |
Status: We have identified at least 4 primary data sources. Crunchbase is our largest structured source for incubators, and we have a license for Crunchbase Pro. Our other two structured sources are AngelList and INBIA. Given the paucity of strong sources, we decided to use a custom Google Crawler as an additional source. We are also creating a new VentureXpert Database using data drawn from SDC Platinum, so that we have a source of information on venture-capital-backed startup firms.
Goal
We will evaluate data sources based on the number of incubators they cover and the type of information they supply on these incubators. We will also record whether each source collects information on other types of entrepreneurship organizations. Ideally, these sources would provide some or all of the variables identified as most important for identifying incubators (Formulate_baseline_attributes). However, no single source is likely to contain all of the baseline attributes, so a source that provides links to a large number of incubators, or in-depth descriptions, could still be viable.
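The evaluation criteria above can be made concrete as a small scoring record per source. This is a minimal sketch only: the attribute names in `BASELINE_ATTRIBUTES`, the thresholds, and the `SourceProfile` class are hypothetical illustrations, not the project's actual baseline attributes or code. The Gaebler figures (360 results, URL and name only) come from the evaluation table on this page.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the baseline attributes; the real list is on
# the Formulate_baseline_attributes page and may differ.
BASELINE_ATTRIBUTES = frozenset({"name", "url", "description", "address", "founding_year"})


@dataclass
class SourceProfile:
    name: str
    incubator_count: int
    attributes: set = field(default_factory=set)
    # Recorded for the evaluation, but not used in the viability score.
    covers_other_org_types: bool = False

    def coverage(self) -> float:
        """Share of the baseline attributes this source supplies."""
        return len(self.attributes & BASELINE_ATTRIBUTES) / len(BASELINE_ATTRIBUTES)

    def is_viable(self, min_links: int = 100, min_coverage: float = 0.6) -> bool:
        """Viable if it covers enough attributes, links to many incubators,
        or supplies descriptions. Thresholds are illustrative only."""
        many_links = "url" in self.attributes and self.incubator_count >= min_links
        return (self.coverage() >= min_coverage
                or many_links
                or "description" in self.attributes)


gaebler = SourceProfile("Gaebler", 360, {"name", "url"})
tiny = SourceProfile("Hypothetical small list", 12, {"name"})
print(gaebler.coverage(), gaebler.is_viable())  # 0.4 True
print(tiny.coverage(), tiny.is_viable())        # 0.2 False
```

Gaebler scores low on attribute coverage but still counts as viable because it links to many incubators, which mirrors the reasoning in the Goal paragraph.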
Chosen Sources
Our four primary data sources are:
- Crunchbase
- INBIA
- AngelList
- Google Crawler
We also draw on the following supplementary sources:
- Yi Ma's work assembling US Incubators, state-by-state
- ClusterMapping
- Wharton entrepreneurship club
- Gaebler
The Google Crawler was added instead of another structured source because, with the exceptions of Crunchbase and AngelList, the structured sources are all small.
In addition, we will use the sources evaluated below, along with the VentureXpert Database as a primary reference seed source (to check whether client companies received venture capital).
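Checking whether client companies received venture capital comes down to matching company names across two lists. The sketch below assumes exact matching after simple normalization (lowercasing, dropping punctuation and trailing legal suffixes). The suffix list, the helper names, and the example companies are all hypothetical. Real VentureXpert matching would likely also need fuzzy matching and location checks.

```python
import re

# Hypothetical list of trailing legal suffixes to ignore when matching.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd", "lp"}


def normalize_name(name: str) -> str:
    """Lowercase, keep alphanumeric tokens, and drop trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def vc_backed_clients(clients, vc_backed_names):
    """Return the client companies whose normalized name is in the VC-backed set."""
    backed = {normalize_name(n) for n in vc_backed_names}
    return [c for c in clients if normalize_name(c) in backed]


clients = ["Acme Robotics, Inc.", "Beta Labs LLC", "Gamma Co"]
vc_names = ["ACME ROBOTICS INC", "Delta Corp"]
print(vc_backed_clients(clients, vc_names))  # ['Acme Robotics, Inc.']
```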
Evaluation of Main Sources
Source | Directions | How many? | Data | Benefits | Limitations |
---|---|---|---|---|---|
Whartoneclub Incubators | | 21 | | Links to the home page of each incubator | May not be able to get specific information from the home page. Limited list of incubators. Some organizations listed may not fall under our definition of an incubator (e.g., Y Combinator). |
InterNational Business Incubation Association (or see our INBIA page) | | 415 | | The database contains information on many economic development institutions and would provide a large quantity of data | Challenging for a web crawler, as each link leads to another page within the INBIA site, and a link on that page leads to the organization's home page. Not all of the institutions listed are incubators. Of the first ten links, there were 4 incubators, 2 educational programs, 1 broken link, and 3 other economic development programs. |
Clustermapping | Opened source link | 292 | | Provides a long list of entrepreneurship organizations | Data is often missing from each organization's individual page, including the URL of its website. The description is often not detailed enough to categorize the organization without visiting its website. Different types of entrepreneurship organizations are mixed together. Of the first 10 links, three were accelerators, six were missing links (two of which described themselves as incubators), and one was another type of support organization. |
The MBA Is Dead | | 186 results | | Can search by region or by category of company | Appears to include far more accelerators than incubators. Of the first 10 unique company links, 1 was a broken link, 7 were accelerators, and 2 could possibly be incubators. |
AngelList | | 1,444 results | | Can use the keyword "incubator" to filter data | Results include hybrids of incubators and accelerators |
Gaebler | | 360 results | URL, incubator name | Well-organized list of incubators by state | Only provides the URL and incubator name; contains bad links |
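The 10-link spot checks in the table can be scaled up to rough estimates of how many incubators each full listing contains. This is a minimal sketch, assuming the first ten links are representative (ten links is a very small sample, so the results are rough bounds at best). `SampleCheck` is a hypothetical helper. The counts come from the table: "possible" covers links that might be incubators, such as Clustermapping's two self-described incubators with missing links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleCheck:
    source: str
    listed: int         # "How many?" column
    sampled: int        # links inspected by hand
    confirmed: int      # sampled links judged to be incubators
    possible: int = 0   # sampled links that might be incubators

    def estimate(self) -> tuple[int, int]:
        """Naive (low, high) extrapolation of incubators in the full listing."""
        low = round(self.listed * self.confirmed / self.sampled)
        high = round(self.listed * (self.confirmed + self.possible) / self.sampled)
        return low, high


CHECKS = [
    SampleCheck("INBIA", 415, 10, confirmed=4),
    SampleCheck("Clustermapping", 292, 10, confirmed=0, possible=2),
    SampleCheck("The MBA Is Dead", 186, 10, confirmed=0, possible=2),
]

for check in CHECKS:
    low, high = check.estimate()
    print(f"{check.source}: roughly {low}-{high} incubators of {check.listed} listed")
```

Under these assumptions INBIA would hold about 166 incubators, while Clustermapping and The MBA Is Dead could hold anywhere from none to a few dozen.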
- The Gaebler incubator list is in E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\Gaebler\Results.txt, and the script that retrieves the results, Gaebler.py, is in the same directory.
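Because Gaebler only supplies an incubator name and URL, retrieval amounts to collecting anchor text and links from a page. The sketch below is hypothetical and is not the contents of Gaebler.py. It assumes incubators appear as absolute `http(s)` links and that relative links are site navigation. Absolute links back to Gaebler's own pages would still need to be filtered out by host. The tab-separated output is also an assumption, since the layout of Results.txt is not documented here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (anchor text, href) pairs for absolute http(s) links."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Relative links are assumed to be site navigation, not incubators.
            if href and href.startswith(("http://", "https://")):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((name, self._href))
            self._href = None


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    "<h3>Texas</h3><ul>"
    '<li><a href="https://example-incubator.org/">Example\n  Incubator</a></li>'
    '<li><a href="/about">About</a></li></ul>'
)
for name, url in extract_links(sample):
    print(f"{name}\t{url}")  # one tab-separated line per incubator
```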
These main sources were found with Google searches that included:
- "incubator database"
- "us business incubators database"
Accelerator Data Sources that are Potentially Viable
Source | Directions | How many? | Data | Benefits | Limitations |
---|---|---|---|---|---|
Accelerator Info | | 464 | Each link on the parent list leads to the organization's home page URL | Lots of programs | Mix of incubators and accelerators. Some of the university-supported programs may not qualify as either an incubator or an accelerator. Of the first 10 links, 3 were bad links, 3 were potential incubators, and 4 were accelerators. |
Galidata | Filter by Region: North America | 164 | | Reliable links directly to the home pages of organizations; can search within regions | Mix of incubators and accelerators. The narrowest region filter is North America. Of the first 10 organizations in the US, 6 were accelerators and 4 could potentially be incubators. |
Crunchbase Database | See the Crunchbase Database project page for more information. | | | | |
S-B-Z | Open, copy and paste into Excel, then clean up | 143 | Contains Name, URL, Description, Industry, Type, City, State | Saved in E:\projects\Kauffman Incubator Project as Excel and txt files | Mostly accelerators! |
The S-B-Z data is in E:\projects\Kauffman Incubator Project\S-B-Z.txt and contains a classifiable description field.
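Because S-B-Z includes a description field, a first-pass classification can be done with keywords. This is a minimal sketch under several assumptions. The file is assumed to be tab-delimited, with a header row whose column names match the fields listed in the table (`Name`, `Description`, and so on). The keyword lists are illustrative guesses, and keyword matching is far cruder than the project's classifier. With the real file, you would pass `open(r"E:\projects\Kauffman Incubator Project\S-B-Z.txt", newline="")` (encoding unverified) instead of the in-memory sample.

```python
import csv
import io

# Illustrative keyword lists; not taken from the project.
INCUBATOR_TERMS = ("incubator", "incubation")
ACCELERATOR_TERMS = ("accelerator", "cohort", "demo day")


def classify(description: str) -> str:
    """Label a description by which keyword families it mentions."""
    text = description.lower()
    inc = any(term in text for term in INCUBATOR_TERMS)
    acc = any(term in text for term in ACCELERATOR_TERMS)
    if inc and acc:
        return "hybrid"
    if inc:
        return "incubator"
    if acc:
        return "accelerator"
    return "unknown"


def classify_rows(handle):
    """Read a tab-delimited file with a header row; return (Name, label) pairs."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["Name"], classify(row["Description"])) for row in reader]


sample = (
    "Name\tURL\tDescription\n"
    "Alpha\thttps://a.example\tA 12-week accelerator with a demo day\n"
    "Beta\thttps://b.example\tA university business incubator\n"
)
print(classify_rows(io.StringIO(sample)))
```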
Region Specific Incubator Sources
See US Incubators, which extends the notes in this section with data collection.
Many state and local governments publish information on incubators and accelerators that operate within their jurisdiction. These are not comprehensive sources for all incubators in the US, but they could be useful for cross-referencing against a larger database.
The National Business Incubation Association (NBIA) maintains a list of U.S. incubation associations. We went through this list and evaluated each association as a potential data source. These sites generally list the incubators within that specific state that work in collaboration with the NBIA. They could be useful for cross-referencing data pulled from the main NBIA database, since some of the incubators listed on the state-specific websites are not in the main database. They could also help in cross-referencing data pulled from the other main databases, as these sites have reliable links, are filtered to include only incubators, and have a relatively consistent format.
Source | Directions | How many? | Region | Data | Benefits | Limitations |
---|---|---|---|---|---|---|
Alabama Business Incubation Network | Opened source link and counted incubators listed on the home page | 12 | Alabama | Incubator Name, Brief Description, and a link to the home page | Reliable links that are filtered to include only incubators | only contains information on incubators in Alabama that are associated with NBIA |
Florida Business Incubation Association | Opened source link and then opened links for each of the four regions in Florida | 66 | Florida | source link contains 4 links to the regions in Florida, each region contains incubator name, address, and a link to the home page | Provides reliable links. Filtered to include only information on incubators | May be challenging for a web crawler to navigate because it is separated by region. Only provides information about Florida incubators. |
Louisiana Business Incubation Association | Opened source link, copied the incubator data into a text editor, and counted how many times the word "E-Mail" appeared | 28 | Louisiana | | data is filtered to include only incubators, links are reliable | only incubators in the state of Louisiana, limited dataset |
Maryland Business Incubation Association | Opened source link and counted the number of incubators listed on the page | 35 | Maryland | Main site contains incubator name, short description, and a link to another page within the main site, which contains a link to the incubator home page | Reliable links, filtered to include only incubators | Challenging for a web crawler to navigate, as the main page contains internal links that then lead to the incubators' home pages. Limited dataset with only incubators in Maryland. |
Massachusetts Association of Business Incubators | Open source link and count number of incubators listed on the page | 20 | Massachusetts | incubator name, short description, and link to incubator home page | reliable links, only data on incubators | limited dataset |
Boston Startup Guide | Scrolled down to the section labeled "Startup incubators in Boston" | 10 | Boston | | reliable links | relatively unformatted data that would be challenging to use, limited in scope |
Michigan Business Innovation Association | Open source link and count number of incubators listed in the column next to the map | 15 | Michigan | incubator name, address, link to location on map, and link to incubator home page | reliable links, only data on incubators | limited dataset |
NH Tech Alliance | Open source link and count organizations listed under "NHBIN Member Locations" | 8 | New Hampshire | incubator name, town within NH, brief description, and link to home page | reliable links, only data on incubators | limited dataset, website is not well structured |
NC Business Incubation Association | Open source link, click on each county and count the number of business incubators | 32 | North Carolina | Incubator name, address, program directors, and link | only data on incubators | limited dataset, hard to navigate site with web crawler, some of the incubators do not have links |
Oklahoma Business Incubator Association | Open source link and count the number of incubators | 29 | Oklahoma | Incubator name and link to it | reliable links, only data on incubators | limited dataset |
Incubators/Accelerators In DC | Open source link and count the number of incubators, excluding co-working spaces | 15 | DC | Incubator name, link to it, and brief description | reliable links, helpful description | limited dataset, mix of incubators and other organizations |
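Several sources above, including INBIA, the Maryland association, and the NC association, need a two-hop crawl: a listing page links to internal profile pages, and each profile page links to the incubator's home page. The sketch below shows that pattern. It is an illustration under stated assumptions, not the project's crawler. `profile_prefix` is a hypothetical path prefix that would have to be checked for each site. Taking the first external link on a profile page is a heuristic that may pick up social-media links. The demo uses fake in-memory pages. `fetch_url` shows how `urllib.request.urlopen` could be used instead but is not called here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def absolute_links(html, base_url):
    """Return every link on the page, resolved against base_url."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs]


def fetch_url(url):
    """Live fetcher (not used in the offline demo below)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def resolve_homepages(listing_url, fetch, profile_prefix):
    """Listing page -> internal profile pages -> first external http(s) link."""
    site = urlparse(listing_url).netloc
    homepages, seen = {}, set()
    for profile in absolute_links(fetch(listing_url), listing_url):
        parts = urlparse(profile)
        # Only follow internal links that look like profile pages.
        if parts.netloc != site or not parts.path.startswith(profile_prefix):
            continue
        if profile in seen:
            continue
        seen.add(profile)
        for link in absolute_links(fetch(profile), profile):
            target = urlparse(link)
            if target.scheme in ("http", "https") and target.netloc != site:
                homepages[profile] = link
                break
    return homepages


# Offline demo: fake pages standing in for an association's site.
PAGES = {
    "https://assoc.example/members": (
        '<a href="/members/alpha">Alpha</a><a href="/about">About</a>'
    ),
    "https://assoc.example/members/alpha": (
        '<a href="/members">Back</a>'
        '<a href="https://alpha-incubator.example/">Website</a>'
    ),
}
result = resolve_homepages("https://assoc.example/members", PAGES.__getitem__, "/members/")
print(result)
```

Passing the fetch function in as a parameter keeps the crawl logic testable offline. Against a live site, `fetch_url` (or a version with retries and rate limiting) would take its place.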
Accelerator Data Sources that are not viable
- Reason: does not include information on incubators
- Learn More: Previous Research
- Reason: data is cluttered/messy, does not provide links to incubator websites and doesn't include enough information for evaluation without incubator url
- Learn More: Previous Research
- Reason: does not include information on incubators
- Learn More: Previous Research
- Reason: this website is no longer active, the link will not work
- Learn More: Previous Research
- Reason: does not include information on incubators
- Learn More: https://www.brookings.edu/research/accelerating-growth-startup-accelerator-programs-in-the-united-states/
- Reason: does not include information on incubators
- Learn More: Previous Research
Other Sources Not Yet Explored
We found the following sources in the process of other work:
Assembling the data
The data is assembled in the incubators database from the following national sources, all copied to E:\projects\Kauffman Incubator Project\Incubator Data Assembly:
- 456 in CrunchbaseIncubators.txt, see Crunchbase_Database#Incubators_in_Crunchbase
- 415 in INBIA_data.txt, see INBIA#Retrieve_Data_from_URLs_Generated
- 1474 (self-declared as incubators but actually many different things) in angelList_companyInfo.txt, see AngelList_Database#Parsing_Saved_AngelList_Pages
- 292 in ClusterMapping.txt
- 21 in Wharton.txt
- 361 in Gaebler.txt
Note that the AngelList data also has angelList_employees.txt and angelList_portfolio.txt as associated files.
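Because the same incubator can appear in several of these files, the assembled records need to be matched across sources. The sketch below matches on a normalized homepage URL (host without `www.`, path without a trailing slash). It is only an illustration of one plausible approach. The actual assembly happens in the incubators database, and the `(source, name, url)` record layout and example rows are hypothetical, not the files' real columns. Lowercasing the path is a simplification that could merge distinct pages on case-sensitive servers.

```python
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    """Reduce a homepage URL to a comparable key: host (minus www.) + path."""
    url = url.strip().lower()
    if "://" not in url:
        url = "http://" + url  # bare domains parse correctly only with a scheme
    parts = urlparse(url)
    host = parts.netloc.removeprefix("www.")
    return host + parts.path.rstrip("/")


def merge_sources(records):
    """records: iterable of (source, name, url). Returns URL key -> set of sources."""
    merged = {}
    for source, _name, url in records:
        if not url:
            continue  # rows without a URL need name-based matching instead
        merged.setdefault(normalize_url(url), set()).add(source)
    return merged


records = [
    ("Crunchbase", "Alpha Labs", "https://www.alphalabs.example/"),
    ("INBIA", "Alpha Labs Inc", "http://alphalabs.example"),
    ("Gaebler", "Beta Hub", "betahub.example/"),
    ("AngelList", "Gamma", ""),
]
for key, sources in merge_sources(records).items():
    print(key, sorted(sources))
```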