Revision as of 10:48, 24 April 2017

Hi Veeral,

Intro

Welcome to the project. The documents are here: E:\Mcnair\Projects\Accelerators

SQL documents are here: E:\Mcnair\Projects\Accelerators\SQL_Data

Database Drive is here: Z:\Bulk\Accelerators

Important docs

The SDC pull that includes all of the round data since 1999: E:\Mcnair\Projects\Accelerators\VC_Data_Repeated_Down.txt or E:\Mcnair\Projects\Accelerators\"VC Data.xlsx"

The accelerator cohorts (under the Updated tab at the bottom): E:\Mcnair\Projects\Accelerators\"Clean Cohort Data.xlsx"

The Crunchbase Snapshots of organizations: E:\Mcnair\Projects\Accelerators\"Crunchbase Snapshot"\organizations.csv

To-do list

Filter out actual accelerators from the Crunchbase organizations data

- Possibly by running accelerator_keywords.py
- Possibly by using string searching in organizations.csv
- Watch out for venture capital companies (the organizations file has many of these, and we'll probably pick up a lot of them in our "accelerator" filtered list)
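
Here is a rough sketch of the string-searching option, in case it helps as a starting point. It is not what accelerator_keywords.py does (that script isn't reproduced here). The keyword lists and the column names `name` and `short_description` are assumptions, so check them against the real header row in organizations.csv.

```python
import csv
import io

# Hypothetical keyword lists; tune them against the real data.
ACCELERATOR_KEYWORDS = ("accelerator", "incubator", "demo day")
VC_KEYWORDS = ("venture capital", "private equity")


def looks_like_accelerator(text):
    """Return True if text has an accelerator keyword and no VC keyword."""
    lowered = text.lower()
    has_accel = any(k in lowered for k in ACCELERATOR_KEYWORDS)
    has_vc = any(k in lowered for k in VC_KEYWORDS)
    return has_accel and not has_vc


def filter_accelerators(rows, name_col="name", desc_col="short_description"):
    """Yield rows whose name or description passes the keyword test.

    The column names are assumptions, not the confirmed Crunchbase headers.
    """
    for row in rows:
        text = f"{row.get(name_col, '')} {row.get(desc_col, '')}"
        if looks_like_accelerator(text):
            yield row


# Tiny in-memory sample standing in for organizations.csv.
SAMPLE = """name,short_description
Alpha Accelerator,A 12-week startup accelerator with a demo day
Beta Ventures,Venture capital firm that also runs an accelerator
Gamma Foods,Organic snack company
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
matches = [row["name"] for row in filter_accelerators(sample_rows)]
print(matches)
```

Dropping every row that mentions venture capital will also lose some real accelerators, so it may be safer to write those rows to a separate file and review them by hand.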

Match this list against the current list of accelerators

- We have our own copy of the matcher in the Accelerators folder on the E drive (try mode 1 and mode 2 for different results; mode 2 might be more helpful)
- This will tell you whether it was part of the old list or not (and therefore whether we need to get data for it or not)

Find cohort data for all of the new accelerators (the ones not previously on the list; if any of them turn out not to be accelerators, take them off the list)

- We used regex for this
- Once you find the cohort data, put it into the updated cohort data list Excel file

Match the cohort data against the round data from SDC

- Make sure to get both the accelerator name and the cohort company name in the first document
- In the second document (to match against the first) put the list of all companies funded in rounds (from SDC)
- In summary: File1 = Accelerator Cohorts and File2 = SDC data
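
To make the File1/File2 setup concrete, here is a minimal sketch of the idea. It does not reproduce the matcher or its modes; it only does exact matching on normalized names, and the suffix list and sample names are made up for illustration.

```python
import re

# Hypothetical list of legal suffixes to drop before comparing names.
SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd"}


def normalize(name):
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def match_cohorts(cohorts, sdc_companies):
    """Match File1 against File2 on normalized company names.

    cohorts: list of (accelerator, company) pairs (File1).
    sdc_companies: list of company names from the round data (File2).
    Returns (accelerator, company, sdc_name_or_None) triples.
    """
    sdc_index = {}
    for sdc_name in sdc_companies:
        # Keep the first SDC spelling seen for each normalized name.
        sdc_index.setdefault(normalize(sdc_name), sdc_name)
    return [
        (accel, company, sdc_index.get(normalize(company)))
        for accel, company in cohorts
    ]


# Made-up sample data.
file1 = [
    ("Alpha Accelerator", "Widgets, Inc."),
    ("Alpha Accelerator", "Gizmo Labs"),
]
file2 = ["WIDGETS INC", "Other Co"]

results = match_cohorts(file1, file2)
for row in results:
    print(row)
```

Keeping the accelerator name on every File1 row is what lets you group the matches by accelerator later.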

Upload the match file into the psql database, then follow the code in accelerators.sql

- Write new code for your newly uploaded tables and documents; you should be able to follow what we've already done to get a similar percentVC table
- The new table should look like the previous percent VC table, PercentVc4
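
For orientation only, here is a small sketch of the kind of number the percentVC step is after. It assumes the table holds, for each accelerator, the share of its cohort companies that show up in the SDC round data. That is an assumption about PercentVc4, not a description of it, and the actual work is done by the SQL in accelerators.sql, which is not reproduced here.

```python
from collections import defaultdict


def percent_vc(match_rows):
    """Return {accelerator: percent of cohort companies matched in SDC}.

    match_rows: iterable of (accelerator, company, sdc_name_or_None),
    where None means the company was not found in the round data.
    """
    totals = defaultdict(int)
    funded = defaultdict(int)
    for accel, _company, sdc_name in match_rows:
        totals[accel] += 1
        if sdc_name is not None:
            funded[accel] += 1
    return {a: 100.0 * funded[a] / totals[a] for a in totals}


# Made-up match rows for illustration.
sample_matches = [
    ("Alpha Accelerator", "Widgets, Inc.", "WIDGETS INC"),
    ("Alpha Accelerator", "Gizmo Labs", None),
    ("Beta Labs", "Sprocket Co", None),
]

shares = percent_vc(sample_matches)
print(shares)
```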

Don't worry about this stuff

Rank on VC

Getting a VC percentage for each Accelerator

Also categorize by

- Age
- Nonprofit or not
- Location

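Regex for repeating data down in the round data from SDC (the first pattern is the search, the second is the replacement). Each pass only fills the row directly below an already-filled row, so you may need to run Replace All more than once until nothing matches: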

\n([^\t]+\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t)(.*)\n\t\t\t\t\t\t\t\t\t\t

\n\1\2\n\1
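
If the text-editor find and replace gets tedious, the same fill-down can be sketched in Python. This is an alternative to the regex, not the method we used. It assumes the file is tab-separated with ten key columns, as in the pattern above, and it is slightly looser than the regex in that it copies from the last row whose key columns are not all blank.

```python
def fill_down(lines, key_cols=10):
    """Copy the first key_cols tab-separated fields from the previous
    filled row into any row where those fields are all empty."""
    filled = []
    previous_key = None
    for line in lines:
        fields = line.split("\t")
        key = fields[:key_cols]
        is_continuation = len(fields) > key_cols and not any(key)
        if previous_key is not None and is_continuation:
            # Blank key columns: repeat the key from the row above.
            fields[:key_cols] = previous_key
        else:
            # A normal row: remember its key for the rows that follow.
            previous_key = key
        filled.append("\t".join(fields))
    return filled


# Two key columns instead of ten, to keep the example readable.
sample = ["A\tB\tround1", "\t\tround2", "\t\tround3", "C\tD\tround4"]
result = fill_down(sample, key_cols=2)
for row in result:
    print(row)
```

Unlike the regex, this fills every continuation row in a single pass.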

Excel formula to check for a substring: =if(isnumber(search("blah",B2))=TRUE,1,0) where blah is the substring (what you're searching for), B2 is the string (what you're searching in), 1 means the substring is present, and 0 means it isn't.

=sum(A1:C1) sums the cells from A1 to C1.


Difference between revisions of "Talk:Accelerator Seed List (Data)"
