Difference between revisions of "Crunchbase Data"

McNair Project
Crunchbase Data
Project Information
Project Title	Crunchbase Data
Owner	Adrian Smart
Start Date	June 2017
Deadline
Keywords	Data, Tool, Crunchbase, VC, Angel
Primary Billing
Notes
Has project status	Complete

Revision as of 17:39, 7 November 2017

Files and dbases

The dbase is:

crunchbasebulk

The files are in:

E:\McNair\Software\Database Scripts\Crunchbase
Z:\Crunchbase\CrunchbaseData

To do

Download the data
Extract top 5 lines
Build table specs
Import the data

Issue importing events.csv: the line count does not match the import count. There are also typos in the data for start_time and end_time, which caused import errors with the time datatype, so both columns were switched to varchar.

Issue importing organizations.csv: the line count is 515496, but only 515491 rows were imported. The last 4 lines in the file contain Chinese characters, which might affect the import.

Downloading the data

Our user key is: 662e263576fe3e4ea5991edfbcfb9883

The download script is written in Perl. It is called downloadScript.pl and is located in E:\McNair\Software\Database Scripts\Crunchbase. You can execute it by typing "perl downloadScript.pl" in a terminal.

Where, what, etc.

Importing the data

To import the data, make sure that all 22 Crunchbase csv files are on the db server in /bulk/crunchbase/crunchbaseData. Also make sure that the load_crunchbase.sql script is in this directory. From this directory, run "psql crunchbasebulk" to connect to the db, then run the command "\i load_crunchbase.sql" to execute the script. This will load the contents of the 22 csv files into the db. Check that the number of rows copied into each table matches the actual number of lines in the corresponding csv file. The expected line counts are included in the comments of the load_crunchbase.sql script. See the import issues noted under To do for unexpected results.

Type \dt to get a list of the tables in the crunchbasebulk db. Type \q to quit psql. Once you are back at the shell prompt, you can type 'wc -l acquisitions.csv' to get the line count of that file.
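One possible reason for a mismatch between 'wc -l' and the import count is a quoted csv field that contains a line break: it spans several physical lines but is a single record. The sketch below compares the two counts. It is only an illustration of that check, not a confirmed diagnosis of the events.csv or organizations.csv discrepancies; the function name and the sample text are made up for the example.

```python
import csv
import io


def count_lines_and_records(text):
    """Return (physical_lines, csv_records) for a block of csv text.

    physical_lines counts newline characters, which is what 'wc -l' reports.
    csv_records counts parsed rows (header included), so a quoted field
    containing a line break still counts as one record.
    """
    physical_lines = text.count("\n")
    reader = csv.reader(io.StringIO(text, newline=""))
    csv_records = sum(1 for _ in reader)
    return physical_lines, csv_records


# Made-up sample: the second record has a line break inside a quoted field.
sample = 'name,description\nacme,"line one\nline two"\nbeta,plain\n'

lines, records = count_lines_and_records(sample)
print("lines:", lines, "records:", records)
```

To try this on a real file, read it with open(path, newline="", encoding="utf-8") and pass the contents to the function. The utf-8 encoding is an assumption about the csv files.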

News is a relationship that can be queried for an organization.

An organization can be looked up by its uuid:

https://api.crunchbase.com/v/3/organizations?user_key=662e263576fe3e4ea5991edfbcfb9883&uuid=1e4f199c-363b-451b-a164-f94571075ee5

To get the news for an organization, add news after its permalink. The permalink is airbnb in this example:

https://api.crunchbase.com/v/3/organizations/airbnb/news?user_key=662e263576fe3e4ea5991edfbcfb9883

The same pattern applies to people: use the people endpoint, then the permalink.
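The URL patterns above can be sketched as a small helper. This is a minimal illustration that only builds the URL strings and makes no request; the helper name build_url, its parameters, and the placeholder key YOUR_USER_KEY are assumptions made for the example.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.crunchbase.com/v/3"


def build_url(endpoint, user_key, permalink=None, relationship=None, **params):
    """Build a URL of the form BASE_URL/endpoint[/permalink[/relationship]]?query.

    endpoint     -- "organizations" or "people"
    permalink    -- e.g. "airbnb" (optional)
    relationship -- e.g. "news" (optional, needs a permalink)
    params       -- extra query parameters such as uuid
    """
    parts = [BASE_URL, endpoint]
    if permalink:
        parts.append(permalink)
        if relationship:
            parts.append(relationship)
    query = urlencode({"user_key": user_key, **params})
    return "/".join(parts) + "?" + query


# Placeholder key for illustration; substitute the real user key.
key = "YOUR_USER_KEY"

# Lookup by uuid, then news for the airbnb permalink.
print(build_url("organizations", key, uuid="1e4f199c-363b-451b-a164-f94571075ee5"))
print(build_url("organizations", key, permalink="airbnb", relationship="news"))
```

For people, the call would be build_url("people", key, permalink=...) with the person's permalink.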

Accelerator Founders Data

The Crunchbase API can be used to readily access founders data. A request of the form https://api.crunchbase.com/v/3/organizations/company_name/?user_key=662e263576fe3e4ea5991edfbcfb9883, with company_name standing for the company in question, returns a JSON object with a lot of data.

One field in the JSON object is Founders, which contains the profiles of the founders of the given company. Not all of the accelerators have a Crunchbase page, but it is a good start.

The script for querying the API can be found at:

E:\McNair\Projects\Accelerators\crunchbase_founders.py
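As a rough illustration of pulling founder names out of such a JSON object, here is a minimal sketch. It is not the contents of crunchbase_founders.py. The nesting used here (data, relationships, founders, items, properties) is an assumption about the response layout and should be checked against a real response; the function name and the sample data are made up.

```python
def extract_founders(response, company):
    """Return (company, first_name, last_name) rows from a parsed response.

    Assumed layout (not verified against the live API):
    response["data"]["relationships"]["founders"]["items"][i]["properties"]
    """
    items = (
        response.get("data", {})
        .get("relationships", {})
        .get("founders", {})
        .get("items", [])
    )
    rows = []
    for item in items:
        props = item.get("properties", {})
        rows.append((company, props.get("first_name"), props.get("last_name")))
    return rows


# Made-up response fragment in the assumed layout.
sample_response = {
    "data": {
        "relationships": {
            "founders": {
                "items": [
                    {"properties": {"first_name": "Ada", "last_name": "Example"}},
                    {"properties": {"first_name": "Bo", "last_name": "Sample"}},
                ]
            }
        }
    }
}

for row in extract_founders(sample_response, "example-accelerator"):
    print(row)

# A company with no founders entry yields an empty list.
print(extract_founders({"data": {}}, "no-page"))
```

Returning an empty list when the founders entry is missing fits the case where an accelerator has no Crunchbase results.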

I queried the API for all the accelerators we have listed in the following file:

E:\McNair\Projects\Accelerators\Fall 2017\ListofAccs.txt

I retrieved a list of founders for the accelerators that returned results from Crunchbase. Of the 269 accelerators we have on record, 136 turned up results for founders, yielding 312 founders in total. The list of founders and their respective companies can be found at:

E:\McNair\Projects\Accelerators\founder_names.txt

A table has also been added to the crunchbasebulk database called founders. The table contains 3 columns: Company, first_name, last_name. Another table, called founders_linkedin, contains 4 columns: Company, first_name, last_name, and linkedin_url.

The file has been exported to the McNair DB Server.

The next step is to match each founder name with a LinkedIn profile. These profiles can be accessed using our LinkedIn Crawler to gather more information about each founder. The results of matching the founders with their LinkedIn profiles can be found at:

E:\McNair\Projects\Accelerators\founders_linkedin.txt
