Difference between revisions of "Kauffman Incubator Project"

From edegan.com
 
(23 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Welcome!
+
{{Infobox
 +
| bodystyle  = width: 25em; background-color: #ffffff; border: 1px solid #a63c07;
 +
| abovestyle  = background:#0d776e; font-size: 125%; color:#ffffff
 +
| above      = {{PAGENAME}}
 +
| image      = [[File:EMKF_Stacked_RGB_brighter_Blue_crop.png|200px|frameless|center]]
 +
| headerstyle = background:#0d776e; color:#ffffff
 +
| header1    = Project Information
 +
| labelstyle  = width: 50%;
 +
| label2      = Principal Investigator
 +
| data2      = [[Ed Egan]]
 +
| label3      = Academic Institution
 +
| data3      = Georgetown University
 +
| label4      = Grant Cohort
 +
| data4      = UMM
 +
| label5      = Duration
 +
| data5      = 2018-2019 Academic Year
 +
| belowstyle  = background:#d6d6d6; font-size: 0.7em;
 +
| below      = Copyright © 2019 edegan.com. All Rights Reserved.
 +
}}
  
==Project Introduction==
+
The [[Kauffman Incubator Project]] advanced the capacity of researchers and policymakers to measure the characteristics and performance of entrepreneurship ecosystem institutions. To this end, we created a set of tools to automate the identification and classification of ecosystem organizations and the extraction of data on startup firms from their websites. The grant was originally submitted in 2017 at Rice University but was drawn at Georgetown University from 2018 to 2019.
  
Our project will advance the capacity of researchers and policymakers to measure the characteristics and performance of entrepreneurship ecosystem institutions. We will create a set of tools to automate the identification and classification of ecosystem organizations, and the extraction of data from their websites. Specifically, our goal is to develop a system to 1) classify entrepreneurship ecosystem organizations, including high-growth technology incubators, startups, and venture capitalists based on a short textual description; 2) identify the client listing page on an incubator's website; and 3) automate the extraction of information about startups from an incubator's client listing page. At present, we envision using neural networks in these tools, and we expect that the third element will require new computer science.
+
==Administration==
  
==Project Expected Outputs==
+
===Goals===
 +
 
 +
Specifically, our goals are to develop a system to:
 +
# [[Ecosystem Organization Classifier|Classify entrepreneurship ecosystem organizations]], including high-growth technology incubators, startups, and venture capitalists based on a short textual description;
 +
# [[Listing Page Classifier|Identify the client listing page]] on an incubator's website;
 +
# [[Listing Page Extractor|Automate the extraction of information]] about startups from an incubator's client listing page;
 +
# Make this system available to the research community as opensource software.
 +
 
 +
We are using modern machine learning techniques in our tools, and we expect that the third element will require new computer science.
 +
 
 +
===Expected Outputs===
  
 
'''By March 2019'''
 
'''By March 2019'''
# Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.  
+
# Determine at least [[Incubator Seed Data|4 primary data sources]], or secure licenses to extract ‘seed data’ from these sources, as measured by program records.  
# Have a working prototype of an automated classifier to distinguish between incubators and other entities described in seed data, as measured by program records.
+
# Have a working prototype of an [[Listing Page Classifier|automated classifier]] to distinguish between incubators and other entities described in seed data, as measured by program records.
# Collect data in at least 5 ecosystems, as measured by availability of a dataset.
+
# Collect data in at least [[Incubators in Five Ecosystems|5 ecosystems]], as measured by availability of a dataset.
# Develop a protocol for the tool to extract client company identity information from incubator websites, as measured by program records.  
+
# Develop a [[LP Extractor Protocol|protocol for the tool]] to extract client company identity information from incubator websites, as measured by program records.  
  
 
'''By June 2019'''
 
'''By June 2019'''
  
# Have a working prototype of a tool to identify client company listings from incubator websites, as measured by program records.  
+
# Have a working prototype of a tool to [[Listing Page Classifier|identify client company listings]] from incubator websites, as measured by program records.  
 
# Upload the collected data to GitHub, Dataverse, or other publicly accessible web platform for use by a set of academics, as measured by program records.
 
# Upload the collected data to GitHub, Dataverse, or other publicly accessible web platform for use by a set of academics, as measured by program records.
#  Produce a summary on the open development process for the prototype as measured by program materials
+
#  Produce a summary on the [[Open Development Process]] for the prototype as measured by program materials.
  
==Project Expected Outcomes==
+
===Expected Outcomes===
  
 
'''By June 2019'''
 
'''By June 2019'''
 
# At least 3 improvements to the measurement system will be made as a result of collaboration with other researchers, as measured by feedback from collaborators.  
 
# At least 3 improvements to the measurement system will be made as a result of collaboration with other researchers, as measured by feedback from collaborators.  
# At least 15 researchers external to the UMM cohort will have contacted the grantee for additional information and/or collaboration, as measured by correspondence.  
+
# At least 15 researchers external to the UMM cohort will have contacted the project for additional information and/or collaboration, as measured by correspondence.  
 
# At least 5 outside users will have used the open source tool, as measured by records of use and postings of the data.
 
# At least 5 outside users will have used the open source tool, as measured by records of use and postings of the data.
 
# There will be at least 50 views of the online documentation of the development process, as measured by web analytics.
 
# There will be at least 50 views of the online documentation of the development process, as measured by web analytics.
 
# The data posted on the public forum will have at least 50 views/downloads of the uploaded data, as measured by web analytics.
 
# The data posted on the public forum will have at least 50 views/downloads of the uploaded data, as measured by web analytics.
#  The seed data will have at least an 70% baseline accuracy and coverage of incubators compared to results from hand collected data on 5 ecosystems, as measured by the data analysis.
+
#  The [[Incubator Seed Data|seed data]] will have at least a [[Incubator Seed Data Coverage|70% baseline accuracy and coverage of incubators]] compared to results from [[Incubators_in_Five_Ecosystems#Incubators|hand collected data on 5 ecosystems]], as measured by the data analysis.
 +
 
 +
==Fulfillment==
 +
 
 +
===Main Project Tree===
 +
 
 +
The list below shows the main project tree with the project assignments.
 +
 
 +
#[[Ecosystem Organization Classifier]] -- {{#show: Ecosystem Organization Classifier | ?Has owner}} ({{#show: Ecosystem Organization Classifier | ?Has project status}})
 +
##[[Defining Incubators]] -- {{#show: Defining Incubators | ?Has owner}} ({{#show: Defining Incubators | ?Has project status}})
 +
###[[Formulate baseline attributes]] -- {{#show: Formulate baseline attributes | ?Has owner}} ({{#show: Formulate baseline attributes | ?Has project status}})
 +
##[[Incubator Seed Data]] -- {{#show: Incubator Seed Data | ?Has owner}} ({{#show: Incubator Seed Data | ?Has project status}})
 +
###[[Crunchbase Database]] -- {{#show: Crunchbase Database | ?Has owner}} ({{#show: Crunchbase Database | ?Has project status}})
 +
###[[INBIA]] -- {{#show: INBIA | ?Has owner}} ({{#show: INBIA | ?Has project status}})
 +
###[[Google Crawler]] -- {{#show: Google Crawler | ?Has owner}} ({{#show: Google Crawler | ?Has project status}})
 +
###[[AngelList Database]] -- {{#show: AngelList Database | ?Has owner}} ({{#show: AngelList Database | ?Has project status}})
 +
###[[VentureXpert Database]] -- {{#show: VentureXpert Database | ?Has owner}} ({{#show: VentureXpert Database | ?Has project status}})
 +
####[[vcdb4]] -- {{#show: vcdb4 | ?Has owner}} ({{#show: vcdb4 | ?Has project status}})
 +
###[[US Incubators]] -- {{#show: US Incubators | ?Has owner}} ({{#show: US Incubators | ?Has project status}})
 +
###[[Incubator Seed Data Coverage]] -- {{#show: Incubator Seed Data Coverage | ?Has owner}} ({{#show: Incubator Seed Data Coverage | ?Has project status}})
 +
##[[Incubators in Five Ecosystems]] -- {{#show: Incubators in Five Ecosystems | ?Has owner}} ({{#show: Incubators in Five Ecosystems | ?Has project status}})
 +
###[[Ecosystem: Austin or Houston]] -- {{#show: Ecosystem: Austin or Houston | ?Has owner}} ({{#show: Ecosystem: Austin or Houston | ?Has project status}})
 +
###[[Ecosystem: Burlington VT]] -- {{#show: Ecosystem: Burlington VT | ?Has owner}} ({{#show: Ecosystem: Burlington VT | ?Has project status}})
 +
###[[Ecosystem: Denver CO]]  -- {{#show: Ecosystem: Denver CO | ?Has owner}} ({{#show: Ecosystem: Denver CO | ?Has project status}})
 +
###[[Ecosystem: Washington DC]] -- {{#show: Ecosystem: Washington DC | ?Has owner}} ({{#show: Ecosystem: Washington DC | ?Has project status}})
 +
###[[Ecosystem: Twin Cities MN]] -- {{#show: Ecosystem: Twin Cities MN | ?Has owner}} ({{#show: Ecosystem: Twin Cities MN | ?Has project status}})
 +
###[[Incubator Seed Data Coverage]] -- {{#show: Incubator Seed Data Coverage | ?Has owner}} ({{#show: Incubator Seed Data Coverage | ?Has project status}})
 +
#[[Listing Page Classifier]] -- {{#show: Listing Page Classifier | ?Has owner}} ({{#show: Listing Page Classifier | ?Has project status}})
 +
#[[Listing Page Extractor]]
 +
##[[Domain Specific Language Research]] -- {{#show: Domain Specific Language Research | ?Has owner}} ({{#show: Domain Specific Language Research | ?Has project status}})
 +
##[[Listing Page Plugin Spec]] -- {{#show: Listing Page Plugin Spec | ?Has owner}} ({{#show: Listing Page Plugin Spec | ?Has project status}})
 +
##[[LP Extractor Protocol]] -- {{#show: LP Extractor Protocol | ?Has owner}} ({{#show: LP Extractor Protocol | ?Has project status}})
 +
 
 +
===List of All Projects===
 +
 
 +
Information on each project under the Kauffman Incubator Project may be found in the table below. All projects are included in [[:Category:Project]] and use [[Template:Project]]. To create or edit a project, please use [[Form: Project]] and set the Has sponsor property to Kauffman Incubator Project:
 +
<nowiki>[[Has sponsor::Kauffman Incubator Project]]</nowiki>.
 +
 
 +
{{#ask: 
 +
  [[Category:Project]]
 +
  [[Has project status::Active]]
 +
  [[Has sponsor::Kauffman Incubator Project]]
 +
  | format=count
 +
  | intro=<strong>Data summary: There are </strong>
 +
  | outro=<strong> active projects found.</strong>
 +
}}
 +
 
 +
{{#ask:
 +
  [[Category:Project]]
 +
  [[Has project status::Active]]
 +
  [[Has sponsor::Kauffman Incubator Project]]
 +
  |mainlabel=Project
 +
  |?Has owner=Owner
 +
  |?Does subsume=Subsumes
 +
  |format=table
 +
}}
 +
 
 +
Researchers may also wish to review related precursor projects done at the [[McNair Projects|McNair Center]].

Latest revision as of 11:39, 20 September 2020

Kauffman Incubator Project
Project Information
Principal Investigator: Ed Egan
Academic Institution: Georgetown University
Grant Cohort: UMM
Duration: 2018-2019 Academic Year
Copyright © 2019 edegan.com. All Rights Reserved.

The Kauffman Incubator Project advanced the capacity of researchers and policymakers to measure the characteristics and performance of entrepreneurship ecosystem institutions. To this end, we created a set of tools to automate the identification and classification of ecosystem organizations and the extraction of data on startup firms from their websites. The grant was originally submitted in 2017 at Rice University but was drawn at Georgetown University from 2018 to 2019.

Administration

Goals

Specifically, our goals are to develop a system to:

  1. Classify entrepreneurship ecosystem organizations, including high-growth technology incubators, startups, and venture capitalists, based on a short textual description;
  2. Identify the client listing page on an incubator's website;
  3. Automate the extraction of information about startups from an incubator's client listing page;
  4. Make this system available to the research community as open-source software.

We are using modern machine learning techniques in our tools, and we expect that the third element will require new computer science.
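
To make the first goal concrete, the following is a minimal sketch of classifying an organization from a short textual description. It deliberately swaps in simple keyword counting for the machine learning techniques mentioned above, and the keyword lists, labels, and function names are all hypothetical illustrations rather than anything taken from the project's tools.

```python
import re
from collections import Counter

# Hypothetical keyword lists chosen for illustration only; the project's
# actual classifier and its training data are not described on this page.
KEYWORDS = {
    "incubator": {"incubator", "accelerator", "cohort", "mentorship", "coworking"},
    "venture capitalist": {"venture", "fund", "invests", "portfolio", "capital"},
    "startup": {"startup", "product", "platform", "customers", "app"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(description):
    """Return the label whose keywords occur most often, or 'unknown'."""
    counts = Counter(tokenize(description))
    scores = {
        label: sum(counts[word] for word in words)
        for label, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    print(classify("A startup accelerator offering mentorship to each cohort"))
```

A learned model would replace the hand-written keyword lists with weights estimated from labeled seed data; the input (a short description) and output (an organization type) would stay the same.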

Expected Outputs

By March 2019

  1. Determine at least 4 primary data sources, or secure licenses to extract ‘seed data’ from these sources, as measured by program records.
  2. Have a working prototype of an automated classifier to distinguish between incubators and other entities described in seed data, as measured by program records.
  3. Collect data in at least 5 ecosystems, as measured by availability of a dataset.
  4. Develop a protocol for the tool to extract client company identity information from incubator websites, as measured by program records.

By June 2019

  1. Have a working prototype of a tool to identify client company listings from incubator websites, as measured by program records.
  2. Upload the collected data to GitHub, Dataverse, or other publicly accessible web platform for use by a set of academics, as measured by program records.
  3. Produce a summary on the Open Development Process for the prototype, as measured by program materials.

Expected Outcomes

By June 2019

  1. At least 3 improvements to the measurement system will be made as a result of collaboration with other researchers, as measured by feedback from collaborators.
  2. At least 15 researchers external to the UMM cohort will have contacted the project for additional information and/or collaboration, as measured by correspondence.
  3. At least 5 outside users will have used the open-source tool, as measured by records of use and postings of the data.
  4. There will be at least 50 views of the online documentation of the development process, as measured by web analytics.
  5. The data posted on the public forum will have at least 50 views/downloads, as measured by web analytics.
  6. The seed data will have at least a 70% baseline accuracy and coverage of incubators compared to results from hand-collected data on 5 ecosystems, as measured by the data analysis.
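
One way the 70% threshold in the last outcome could be checked is sketched below. This page does not define "accuracy" or "coverage", so the sketch assumes that coverage is the share of hand-collected incubators that also appear in the seed data, and that accuracy is the share of seed records confirmed by the hand-collected list. The function names and the exact-match-after-normalization rule are also assumptions.

```python
def normalize(name):
    """Lowercase a name and collapse repeated whitespace before matching."""
    return " ".join(name.lower().split())


def coverage_and_accuracy(seed_names, hand_names):
    """Compare seed data to a hand-collected list for one ecosystem.

    Assumed definitions (not taken from the project):
      coverage = matched names / hand-collected names
      accuracy = matched names / seed names
    """
    seed = {normalize(n) for n in seed_names}
    hand = {normalize(n) for n in hand_names}
    matched = seed & hand
    coverage = len(matched) / len(hand) if hand else 0.0
    accuracy = len(matched) / len(seed) if seed else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    # Made-up names used purely to exercise the function.
    seed = ["Alpha Labs", "beta  hub", "Gamma Works", "Delta Space"]
    hand = ["alpha labs", "Beta Hub", "Epsilon Forge"]
    cov, acc = coverage_and_accuracy(seed, hand)
    print(f"coverage={cov:.2f} accuracy={acc:.2f} meets 70%: {min(cov, acc) >= 0.70}")
```

In practice, organization names rarely match exactly across sources, so a real comparison would likely need fuzzy matching or matching on website domains.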

Fulfillment

Main Project Tree

The list below shows the main project tree with the project assignments.

  1. Ecosystem Organization Classifier -- Anne Freeman, Libby Bassini (Active)
    1. Defining Incubators -- Anne Freeman, Libby Bassini (Active)
      1. Formulate baseline attributes -- Anne Freeman (Active)
    2. Incubator Seed Data -- Anne Freeman (Active)
      1. Crunchbase Database -- Hiep Nguyen (Active)
      2. INBIA -- Anne Freeman (Active)
      3. Google Crawler -- Anne Freeman (Active)
      4. AngelList Database -- Anne Freeman (Active)
      5. VentureXpert Database -- Khai Nguyen, Vineet Anne (Active)
        1. vcdb4 -- Ed Egan (Active)
      6. US Incubators -- Yi Ma (Active)
      7. Incubator Seed Data Coverage -- Ed Egan (Active)
    3. Incubators in Five Ecosystems -- (Active)
      1. Ecosystem: Austin or Houston -- Yi Ma (Active)
      2. Ecosystem: Burlington VT -- Nancy Yu (Active)
      3. Ecosystem: Denver CO -- Khai Nguyen (Active)
      4. Ecosystem: Washington DC -- Libby Bassini (Active)
      5. Ecosystem: Twin Cities MN -- Vineet Anne (Active)
      6. Incubator Seed Data Coverage -- Ed Egan (Active)
  2. Listing Page Classifier -- Nancy Yu (Active)
  3. Listing Page Extractor
    1. Domain Specific Language Research -- Lasya Rajan (Active)
    2. Listing Page Plugin Spec -- Rex Bone (Active)
    3. LP Extractor Protocol -- Lasya Rajan (Active)

List of All Projects

Information on each project under the Kauffman Incubator Project may be found in the table below. All projects are included in Category:Project and use Template:Project. To create or edit a project, please use Form: Project and set the Has sponsor property to Kauffman Incubator Project:

[[Has sponsor::Kauffman Incubator Project]].

Data summary: There are 25 active projects found.

Project | Owner | Subsumes
--- | --- | ---
AngelList Database | Anne Freeman |
Crunchbase Database | Hiep Nguyen |
DSL Encoding | Hiep Nguyen |
DSL generator | Hiep Nguyen |
Defining Incubators | Anne Freeman, Libby Bassini |
Domain Specific Language Research | Lasya Rajan |
Ecosystem Organization Classifier | Anne Freeman, Libby Bassini | Defining Incubators, Incubator Seed Data, Incubators in Five Ecosystems
Ecosystem: Austin TX | Yi Ma |
Ecosystem: Burlington VT | Nancy Yu |
Ecosystem: Denver CO | Khai Nguyen |
Ecosystem: Twin Cities MN | Vineet Anne |
Ecosystem: Washington DC | Libby Bassini |
Incubator Classifier - Concept Diagram | Hiep Nguyen |
Incubator Classifier - Formulate baseline attributes | Anne Freeman |
Incubator Seed Data | Anne Freeman | Incubator Seed Data Coverage
Incubator Seed Data Coverage | Ed Egan |
Incubators in Five Ecosystems | | Ecosystem: Denver CO, Ecosystem: Washington DC, Ecosystem: Burlington VT, Ecosystem: Twin Cities MN, Ecosystem: Austin TX, Incubator Seed Data Coverage
LP Extractor Protocol | Lasya Rajan |
Listing Page Classifier | Nancy Yu |
Listing Page Extractor | | LP Extractor Protocol
Listing Page Plugin Spec | Rex Bone |
Open Development Process | Ed Egan |
US Incubators | Yi Ma |
Vcdb4 | Ed Egan |
VentureXpert Database | Khai Nguyen, Vineet Anne |
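
A listing like the one above can also be requested programmatically. The sketch below composes a Semantic MediaWiki query string from the properties named on this page (Has sponsor, Has project status, Has owner, Does subsume) and wraps it in a URL for the `ask` API module. The API base URL is a placeholder and the helper names are hypothetical; whether this wiki exposes its API publicly is not stated here.

```python
from urllib.parse import urlencode


def build_ask_query(conditions, printouts):
    """Join conditions as [[...]] and printouts as |?... in ask-query syntax."""
    cond = "".join(f"[[{c}]]" for c in conditions)
    outs = "".join(f"|?{p}" for p in printouts)
    return cond + outs


def build_api_url(api_base, query):
    """Build a URL for the 'ask' API module; api_base is an assumed placeholder."""
    params = {"action": "ask", "query": query, "format": "json"}
    return api_base + "?" + urlencode(params)


if __name__ == "__main__":
    query = build_ask_query(
        [
            "Category:Project",
            "Has project status::Active",
            "Has sponsor::Kauffman Incubator Project",
        ],
        ["Has owner", "Does subsume"],
    )
    # Placeholder host; substitute the wiki's real api.php location.
    print(build_api_url("https://wiki.example.org/api.php", query))
```

The sketch only builds the request; it does not send it, so it makes no assumption about the shape of the response.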

Researchers may also wish to review related precursor projects done at the McNair Center.