Revision as of 11:34, 24 April 2017

Hi Veeral,

Intro

Welcome to the project. The documents are here: E:\Mcnair\Projects\Accelerators

SQL documents are here: E:\Mcnair\Projects\Accelerators\SQL_Data

Database Drive is here: Z:\Bulk\Accelerators

Important docs

The SDC pull that includes all of the round data since 1999: E:\Mcnair\Projects\Accelerators\VC_Data_Repeated_Down.txt or E:\Mcnair\Projects\Accelerators\"VC Data.xlsx"

The Cohorts of accelerators (under the Updated tab on the bottom): E:\Mcnair\Projects\Accelerators\"Clean Cohort Data.xlsx"

The Crunchbase Snapshots of organizations: E:\Mcnair\Projects\Accelerators\"Crunchbase Snapshot"\organizations.csv

To-do list

Filter out actual accelerators from the Crunchbase organizations data

Possibly by running accelerator_keywords.py
Possibly by using string searching in organizations.csv
Watch out for venture capital companies (the organizations file has many of these, and we'll probably pick up a lot of them in our "accelerator" filtered list)
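A minimal sketch of the string-searching option, assuming the filter is a case-insensitive keyword match over a few text columns. The keyword lists, the helper names, and the column names passed in are illustrative assumptions, not taken from accelerator_keywords.py or from the actual header of organizations.csv:

```python
# Hypothetical keyword lists; adjust after looking at the real data.
ACCELERATOR_KEYWORDS = ("accelerator",)
VC_KEYWORDS = ("venture capital",)


def looks_like_accelerator(text, include=ACCELERATOR_KEYWORDS, exclude=VC_KEYWORDS):
    """Return True if text mentions an include keyword and no exclude keyword."""
    lowered = text.lower()
    if any(word in lowered for word in exclude):
        return False
    return any(word in lowered for word in include)


def filter_accelerators(rows, text_columns):
    """Keep the rows (dicts) whose chosen text columns look like an accelerator.

    text_columns is an assumption: pass whichever columns of
    organizations.csv actually hold the name and description.
    """
    kept = []
    for row in rows:
        # Missing or empty cells are treated as empty strings.
        text = " ".join(row.get(col) or "" for col in text_columns)
        if looks_like_accelerator(text):
            kept.append(row)
    return kept
```

The rows can come from csv.DictReader opened on organizations.csv. Dropping every row that mentions venture capital is a blunt rule and may also discard real accelerators that describe themselves that way, so the excluded rows are worth a manual look.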

Match this list against the current list of accelerators

We have our own copy of the matcher in the Accelerators folder on the E drive. Try mode 1 and mode 2 for different results; mode 2 might be more helpful.

Find cohort data for all of the accelerators (if they're not accelerators, take them off the list)

Don't worry about this stuff

Rank on VC

Getting a VC percentage for each Accelerator

Also categorize

Age
Nonprofit or not
Location

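RegEx code for repeating data down in the round data from SDC. The first expression is the search pattern and the second is the replacement; it copies the first ten tab-separated columns of a row onto the following row when those columns are blank there: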

\n([^\t]+\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t)(.*)\n\t\t\t\t\t\t\t\t\t\t

\n\1\2\n\1
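The same fill-down can be sketched in Python, line by line, which avoids having to rerun the search and replace until no blank rows remain. This is an equivalent written under assumptions rather than the procedure actually used on the project: the function name is hypothetical, and the count of ten key columns is read off the regex above.

```python
KEY_COLUMNS = 10  # the regex captures ten tab-terminated fields


def fill_down(lines, key_columns=KEY_COLUMNS):
    """Copy the leading key columns down onto rows where they are all blank.

    lines is an iterable of tab-separated rows without trailing newlines.
    """
    filled = []
    last_key = None  # key columns of the most recent fully keyed row
    for line in lines:
        fields = line.split("\t")
        head = fields[:key_columns]
        rest = fields[key_columns:]
        has_key_block = len(fields) > key_columns
        if has_key_block and not any(head) and last_key is not None:
            # Blank key columns: reuse the key from the row above.
            fields = last_key + rest
        elif has_key_block and head[0]:
            # First column is filled in: remember this row's key columns.
            last_key = head
        filled.append("\t".join(fields))
    return filled
```

Rows with fewer than eleven fields are passed through unchanged, which mirrors the regex only matching rows that contain all ten tabs.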

=if(isnumber(search("blah",B2))=TRUE,1,0) where "blah" is the substring (what you're searching for) and B2 is the string (what you're searching in). The result is 1 if the substring is present and 0 if it isn't.

=sum(A1:C1) This just sums the cells from A1 to C1.

Difference between revisions of "Talk:Accelerator Seed List (Data)"
