===As of 05/21/2018, the Google Sheet workbook has been downloaded to the E drive. The workbook, now an Excel file, is saved at E:\McNair\Projects\Accelerators\Summer 2018\Accelerator Master Variable List.xlsx. This is now the master file.===
Google Master Sheet: https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=0
*Cross-reference the sheet with data from Peter's old accelerator consolidation files ("accelerator_data_noflag" and "accelerator_data" in "All Relevant Files") and fill in missing data.
A 0 means we don't have founder data for that accelerator.
Specs: a tab-delimited text file with the following fields:
Accelerator, First Name, Last Name, LinkedInURL (if possible)
Including the LinkedInURL helps ensure accuracy, but the file will work without it.
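A minimal sketch of writing and reading this file with Python's standard csv module. It assumes the file carries a header row using the field names above (the spec does not say whether a header is expected), and it leaves LinkedInURL blank when it is unknown.

```python
import csv
import io

# Column order from the spec; LinkedInURL may be left blank.
FIELDS = ["Accelerator", "First Name", "Last Name", "LinkedInURL"]

def write_founders(rows, out):
    """Write founder rows as tab-delimited text with a header line (header is an assumption)."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing fields (usually LinkedInURL) are written as empty strings.
        writer.writerow({field: row.get(field, "") for field in FIELDS})

def read_founders(text):
    """Parse the tab-delimited text back into a list of dicts keyed by FIELDS."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

if __name__ == "__main__":
    buf = io.StringIO()
    write_founders(
        [{"Accelerator": "Example Accelerator", "First Name": "Jane", "Last Name": "Doe"}],
        buf,
    )
    print(buf.getvalue())
```

The accelerator and founder names in the usage lines are placeholders.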
*Shrey: Find "demo day" keywords so that we can search "AcceleratorName Year Keyword" and get back potential demo day pages.
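Once the keywords exist, the search strings can be generated mechanically. This is a minimal sketch: the keyword list below is a hypothetical placeholder because the real list is still to be compiled, and actually submitting the queries to a search engine is out of scope.

```python
from itertools import product

# Hypothetical placeholder keywords; the actual "demo day" keyword list is still to be found.
DEMO_DAY_KEYWORDS = ["demo day", "investor day", "showcase"]

def demo_day_queries(accelerators, years, keywords=DEMO_DAY_KEYWORDS):
    """Return one 'AcceleratorName Year Keyword' search string per combination."""
    return [f"{name} {year} {kw}" for name, year, kw in product(accelerators, years, keywords)]

if __name__ == "__main__":
    for query in demo_day_queries(["Techstars Boston"], [2016, 2017]):
        print(query)
```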
==Accelerator Type project==
The file to edit is called "Accelerator type list" and is located in the folder E:\McNair\Projects\Accelerators\Spring 2018\Grouping project of ListOfAccs. More systematic information and instructions are in "Instructions for Accelerator type project" in the same folder.
NOTE: until we get through all 270 accelerators, we will just categorize each accelerator into one of the following three categories as quickly as possible, with short notes in the "other info" column. Once this is done, we will go back through the ones that aren't categorized and add notes to the "other info" column.
Type list:
*Private
*Corporate
*Academic
Note: if an accelerator is DEAD, note it here.
Other info:
*nonprofit? (y/n)
*Subtype abbreviations:
**S: for if a social entrepreneurship initiative
**I: for if an incubator
**A: for an angel group
**F: for foreign
**C: for in coworking space/hub/etc
**V: for if part of venture fund
**G: for if government funded/partnered
**T: for international
Note: subtypes (from individual text files in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data) were only found for 23 of the 270 accelerators. These accelerators were initially intended to be removed from the master list. Remaining subtypes are currently being added.
Other info:
International offices, founders, industries, org type, program duration, or other interesting, easily accessed variables. This additional information is especially important for accelerators that have no subtype abbreviation listed.
===Steps to research an accelerator===
1. Copy and paste the URL listed in the Accelerator type list file into Google. If the website is insufficient, try googling:
*the name of the accelerator
*the name of the accelerator + "crunchbase"
*the name of the accelerator + "nonprofit"
The above searches sometimes lead to other helpful databases or news articles.
2. Note:
1) Academic/Corporate/Private
2) For profit/nonprofit. Sometimes this isn't stated directly but can be inferred from their description of, for example, their investment process. If they don't address it at all, it's probably for profit.
3) Subtype (S, I, A, F, C, V, G, T)
4) Additional, easily accessed info. Item 4 is especially important if there's no subtype.
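The four research steps map onto one row of the type list. A minimal sketch, assuming each row is held as a Python record with the hypothetical field names below (the spreadsheet's actual column names may differ), that flags rows needing attention before they are entered. The subtype codes are the ones listed under "Other info".

```python
from dataclasses import dataclass

TYPES = {"Academic", "Corporate", "Private"}
# Subtype abbreviations from the "Other info" list.
SUBTYPES = {
    "S": "social entrepreneurship initiative",
    "I": "incubator",
    "A": "angel group",
    "F": "foreign",
    "C": "in coworking space/hub",
    "V": "part of venture fund",
    "G": "government funded/partnered",
    "T": "international",
}

@dataclass
class AcceleratorRecord:
    name: str
    acc_type: str          # step 2.1: Academic/Corporate/Private
    nonprofit: bool        # step 2.2
    subtypes: str = ""     # step 2.3, e.g. "S, I" (separators are ignored)
    other_info: str = ""   # step 2.4

    def problems(self):
        """Return a list of issues to fix before the row is entered."""
        issues = []
        if self.acc_type not in TYPES:
            issues.append(f"unknown type: {self.acc_type!r}")
        # Keep only letters, so "S, I" and "SI" are treated the same way.
        codes = [c for c in self.subtypes.upper() if c.isalpha()]
        bad = [c for c in codes if c not in SUBTYPES]
        if bad:
            issues.append(f"unknown subtype codes: {''.join(bad)}")
        # Item 4 matters most when there is no subtype.
        if not codes and not self.other_info.strip():
            issues.append("no subtype, so other info is required")
        return issues
```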
All 270 need to be done by the end of the semester.
The type list file is saved as "Accelerator type list" in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs.
ListofAccs, from which the Accelerator type list was drawn, should have no matches with any of the flagged accelerators in E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data. However, there are 23 matches, so all subtypes must be searched for and entered manually. Nonprofit status for some accelerators was recorded in a file called "whether nonprofit..." in E:\McNair\Projects\Accelerators\Spring 2017\Grouping project of ListOfAccs. Accelerators with no nonprofit information in that file need to have it entered manually.
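Checking ListofAccs against the flagged accelerators comes down to a normalized name comparison. A minimal sketch, assuming both lists have already been loaded as plain lists of names; the normalization rules (lowercase, punctuation replaced by spaces) are an assumption, not the original procedure.

```python
import re

def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())

def overlap(list_of_accs, flagged):
    """Return names from list_of_accs whose normalized form also appears in flagged."""
    flagged_norm = {normalize(n) for n in flagged}
    return [n for n in list_of_accs if normalize(n) in flagged_norm]
```

Run against the real lists, the length of the result is the number of matches.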
=Funded By Accelerators=
Reference the like-named portion in [[Crunchbase Data#Funded by Accelerators|Crunchbase Data]]
=End of Semester Report=
=End of Semester Notes=
*We have compiled a very long list of accelerators from many different databases. For the past couple of weeks, everyone in the center has been going through this list 20 at a time, classifying each entry as an accelerator or not, and then gathering data on each accelerator using the process outlined above. This process went very smoothly, and we have gone through about 80% of the list. We are still missing information on the last hundred or so names. All of the collected data is located on the RDP, in the "Accelerators" folder under "Data", or on the [https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=1132417337 "Accelerator Master Variable List" Google sheet].
*We have listed all of the startups from accelerators that break out cohorts on their websites on the [https://docs.google.com/spreadsheets/d/1ikuxYwp9JIRrjz4qQcbdwTpbHOne-q2PterYTjzofjw/edit?ts=5aa2f1f9#gid=1132417337 "Accelerator Master Variable List" Google sheet]. The "Cohort List (new)" sheet contains the following information: accelerator name, year, cohort name, company name, description, founders, category/sector, and location.
*Next steps include going through the demo day pages that have been downloaded and, where possible, writing notes on the different page types (see [[Demo Day Page Google Classifier]]).
=Data Collection Notes=
Company Name Date Company Date Company Company Company City Company Street Address, Line 1 Company Street Address, Line 2 Total Known Company Industry Sub-Group 3 Company Industry Major Group Round Company Stage Level 3 Round Amt, Round Amt,
==3 files==
==Link to Crunchbase API application==
*https://about.crunchbase.com/forms/research-access-apply/ (no longer works)
*https://data.crunchbase.com/v3/docs/using-the-api (has new instructions for applying)
#Copied the "Seed Accelerators" table into TextPad, where the data split into one entry per line. This returned 235 results.
#Clicking on an accelerator's name links to a page listing all of its associated startups, up to the 6/2016 cohort.
*Overall, the data are very extensive for the accelerators included on the list, but cross-referencing with other sources shows that seed-db is missing many newer accelerators; the list is not all-inclusive.
*Also includes regional breakdowns for accelerator groups. For example, rather than just "Techstars", the group is broken into Austin, Berlin, Boston, Boulder, etc.
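Because seed-db lists regional programs separately, it can help to roll them up to the parent group when comparing against lists that only carry "Techstars". A minimal sketch, assuming regional entries are named "<Parent> <City>" (an assumption about seed-db's naming, not confirmed here) and that parent group names are maintained by hand:

```python
from collections import defaultdict

# Hand-maintained list of parent groups (illustrative; only Techstars is named above).
PARENT_GROUPS = ["Techstars"]

def group_by_parent(program_names, parents=PARENT_GROUPS):
    """Map each parent group to its regional programs.

    Names that don't start with a known parent are kept as their own group.
    """
    groups = defaultdict(list)
    for name in program_names:
        parent = next((p for p in parents if name == p or name.startswith(p + " ")), name)
        groups[parent].append(name)
    return dict(groups)
```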
This script takes a physical address and converts it into latitude and longitude coordinates. It should be used in conjunction with the Enclosing Circle program to find concentrations of accelerators.
E:\McNair\Software\CodeBase\EnclosingCircle.py
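This page does not describe how EnclosingCircle.py measures concentration, so the following is a swapped-in, deliberately naive illustration, not that program's algorithm. Given geocoded (latitude, longitude) pairs, it finds the accelerator location whose surrounding circle must be smallest to contain k accelerators, using great-circle (haversine) distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def densest_center(points, k):
    """Return (center, radius_km) for the input point whose circle holding k points
    (itself included) has the smallest radius. Requires 1 <= k <= len(points)."""
    best = None
    for c in points:
        dists = sorted(haversine_km(c, p) for p in points)
        radius = dists[k - 1]  # distance to the k-th closest point (index 0 is c itself)
        if best is None or radius < best[1]:
            best = (c, radius)
    return best
```

This checks every point against every other, which is fine for a few hundred accelerators but grows quadratically.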
=Kauffman Foundation Incubator Proposal Information=
==Institutions==
Summary: F6S, Crunchbase, seed-db
Tools: Matcher - used to match lists of potential accelerators with our current list to identify duplicates/new matches (E:\McNair\Projects\Accelerators)
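The Matcher's implementation isn't described here. As a hedged sketch of the same idea, the following uses the standard-library difflib to split a candidate list into likely duplicates and new names. The normalization rules and the 0.9 similarity cutoff are illustrative assumptions to tune against known matches.

```python
import difflib
import re

def normalize(name):
    """Lowercase and replace punctuation with spaces, e.g. 'AngelPad, Inc.' -> 'angelpad inc'."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", name.lower()).split())

def match_lists(candidates, current, cutoff=0.9):
    """Split candidates into (duplicates, new).

    A duplicate is a candidate whose normalized name is at least `cutoff` similar
    (difflib ratio) to a name already on the current list; duplicates are returned
    as (candidate, matched current name) pairs.
    """
    current_norm = {normalize(n): n for n in current}
    duplicates, new = [], []
    for name in candidates:
        hit = difflib.get_close_matches(normalize(name), list(current_norm), n=1, cutoff=cutoff)
        if hit:
            duplicates.append((name, current_norm[hit[0]]))
        else:
            new.append(name)
    return duplicates, new
```

Fuzzy matches near the cutoff are worth a manual check before a candidate is discarded as a duplicate.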
===F6S===
F6S WebCrawler and F6S Parser - E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs
===Crunchbase===
We have the Crunchbase 2013 Snapshot, which provided lots of new data on accelerators and incubators, but we would like to use the Crunchbase API to get a current database snapshot that we could use to cross-reference companies and add newly formed accelerators and incubators.
===AngelList===
===seed-db===
Obtained through www.seed-db.com/accelerators
===Global Accelerator Network (GAN)===
GAN Parser- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\scrapeaccel.py
GAN Data- E:\McNair\Projects\Accelerators\Web Scraping for Accelerators\GAN Accelerator Data
*Contains: Company Name, # of Companies Range, % of Companies Funded, Funding Raised by Companies, Employee Range, Exit Funding, Exit Date, Total Company Funding Raised, # of Mentors Range, % Equity, Location, Minimum Seed Capital Investment
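Several GAN fields are ranges (# of Companies Range, Employee Range, # of Mentors Range). A minimal sketch for turning them into numbers, assuming they are stored as strings such as "11-25" or "100+". That format is an assumption; the actual layout of the GAN Accelerator Data files is not documented here.

```python
import re

def parse_range(text):
    """Turn a range string like '11-25' or '1,000+' into (low, high).

    high is None for open-ended ranges; returns None if the text isn't a recognized range.
    """
    cleaned = text.replace(",", "").strip()
    m = re.fullmatch(r"(\d+)\s*-\s*(\d+)", cleaned)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = re.fullmatch(r"(\d+)\s*\+", cleaned)
    if m:
        return int(m.group(1)), None
    return None
```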
==Cohorts==
*Cohorts obtained manually
*All cohort .txt files are saved under "E:\McNair\Projects\Accelerators\Data"
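To work with the cohort files together, they can be loaded into one list. A minimal sketch, assuming each .txt file is tab-delimited with a header row. The column names are whatever the files contain; the layout is an assumption, not documented here.

```python
import csv
from pathlib import Path

def load_cohorts(folder):
    """Read every .txt file in `folder` as a tab-delimited table with a header row.

    Returns all rows as dicts, each tagged with the name of the file it came from.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

For example, `load_cohorts(r"E:\McNair\Projects\Accelerators\Data")`. The encoding may need changing if the files were saved from TextPad in another encoding.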
Summary: Whois Parser, Geocode, Tools to determine industry, etc
===Whois Parser===
*Retrieves and parses Whois information. Specifically, it takes a file with a column of domain names and populates the corresponding columns with information from the WhoIs API.
*Often used to obtain locations.
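The Whois Parser itself calls a Whois API. As an offline illustration of the parsing half only, here is a minimal sketch that reads a raw Whois text record made of "Key: Value" lines and pulls out registrant location fields. The field names follow the common ICANN-style layout and are an assumption; registrars and APIs vary.

```python
def parse_whois(raw):
    """Parse 'Key: Value' lines of a raw Whois record into a dict (first value wins)."""
    record = {}
    for line in raw.splitlines():
        # Split on the first colon only, so values like URLs keep their own colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            record.setdefault(key.strip(), value.strip())
    return record

# Assumed ICANN-style field names; adjust to the records actually returned.
LOCATION_KEYS = ("Registrant City", "Registrant State/Province", "Registrant Country")

def registrant_location(raw):
    """Return (city, state/province, country), using '' for any missing field."""
    record = parse_whois(raw)
    return tuple(record.get(k, "") for k in LOCATION_KEYS)
```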
===Geocode===
Input: Company Address
Output: Latitude and longitude coordinates
*Used to obtain the locations of different accelerators and cohort companies.
===SDC Platinum Pull===
Used to obtain funding information and to match funded companies with accelerator cohort companies.
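A minimal sketch of the matching step. It assumes the SDC pull has been reduced to hypothetical (company name, round amount) pairs and that cohort companies are a list of names; units follow whatever the pull reports. Matching is on an exact normalized name, so differences such as a trailing "Inc." would still need additional cleaning.

```python
import re
from collections import defaultdict

def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", name.lower()).split())

def funding_by_cohort_company(cohort_companies, funding_rounds):
    """Sum round amounts per cohort company.

    funding_rounds is a list of (company_name, round_amount) pairs; rounds for
    companies not on the cohort list are ignored.
    """
    wanted = {normalize(c): c for c in cohort_companies}
    totals = defaultdict(float)
    for company, amount in funding_rounds:
        key = normalize(company)
        if key in wanted:
            totals[wanted[key]] += amount
    return dict(totals)
```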
===Desired Information/Variables===
*Key People (founders, lead entrepreneurs, strategists, etc.)
*Total number of launched companies
*A FAQ for application details, accelerator vision, and
*Funds raised per company (average)
*Features offered by accelerator (perks, space, tools, etc)
==Desired Tools/Information==
===Automating the Process of Obtaining Cohorts===
*Automating this process would save a lot of time and really progress the project.
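One possible starting point for automation is to parse each accelerator's portfolio page for company names. A minimal sketch using the standard-library html.parser. It assumes names sit in elements with a known class, and the default "company-name" class is hypothetical, since each site uses its own markup. Fetching the page (for example with urllib.request) is left out.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class CohortPageParser(HTMLParser):
    """Collect the text of every element whose class list contains target_class.

    "company-name" is a hypothetical default; each accelerator site differs.
    """

    def __init__(self, target_class="company-name"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.names = []     # collected company names, in page order
        self._buf = []      # text fragments of the element currently being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a nested tag inside a matching element
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse whitespace so "Acme\n  Robotics" becomes "Acme Robotics".
            name = " ".join("".join(self._buf).split())
            if name:
                self.names.append(name)

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)
```

Usage: `p = CohortPageParser("your-class"); p.feed(html); p.close()`, then read `p.names`. The class name has to be found per site by inspecting its HTML.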
===Obtaining More Details on Accelerators===
*Having the kind of thorough information on industry, companies, funding, location, exits, mentors, leadership, that we got for the GAN companies would be fantastic.