Difference between revisions of "Crunchbase Data"
Revision as of 10:33, 19 June 2017
Crunchbase Data | |
---|---|
Project Information | |
Project Title | Crunchbase Data |
Owner | Adrian Smart |
Start Date | June 2017 |
Deadline | |
Keywords | Data, Tool, Crunchbase, VC, Angel |
Primary Billing | |
Notes | |
Has project status | Active |
Copyright © 2016 edegan.com. All Rights Reserved. |
Files and dbases
The dbase is:
crunchbasebulk
The files are in:
E:\McNair\Software\Database Scripts\Crunchbase
Z:\Crunchbase\CrunchbaseData
To do
- Download the data
- Extract top 5 lines
- Build table specs
- Import the data
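The "extract top 5 lines" step can be sketched in Python as below. This is only an illustration, not the script used for the project: the function names head_rows and preview_directory are hypothetical, and the sketch assumes the files are UTF-8 encoded, comma-separated CSVs.

```python
import csv
from itertools import islice
from pathlib import Path


def head_rows(csv_path, n=5, encoding="utf-8"):
    """Return the first n parsed rows (header row included) of a CSV file."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(csv_path, newline="", encoding=encoding) as handle:
        return list(islice(csv.reader(handle), n))


def preview_directory(directory, n=5):
    """Map each *.csv file name in a directory to its first n rows."""
    return {
        path.name: head_rows(path, n)
        for path in sorted(Path(directory).glob("*.csv"))
    }
```

Looking at the header row and a few data rows of each file is usually enough to choose column names and datatypes for the table specs.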
Issue importing events.csv: the line count does not match the import count. The data also contains typos in start_time and end_time, which caused import errors with the time datatype, so these columns were switched to varchar.
Issue importing organizations.csv: the line count is 515496, but the import yielded 515491. The last 4 lines in the file are Chinese characters, which might affect the import.
Downloading the data
Our user key is: 662e263576fe3e4ea5991edfbcfb9883
The download script is written in Perl. It is called downloadScript.pl and is located in E:\McNair\Software\Database Scripts\Crunchbase. To run it, type "perl downloadScript.pl" in a terminal.
Where, what, etc.
Importing the data
To import the data, make sure that all 22 Crunchbase csv files are on the db server in /bulk/crunchbase/crunchbaseData, and that the load_crunchbase.sql script is in the same directory. From this directory, run psql crunchbasebulk to connect to the db, then run the command "\i load_crunchbase.sql" to execute the script. This loads the contents of the 22 csv files into the db. Check that the number of lines copied into each table matches the actual number of lines in the corresponding csv file. The line counts are included in the comments of the load_crunchbase.sql script. See the issues noted under To do for unexpected results.
Type \dt to get a list of the tables in the crunchbasebulk db. Type \q to quit psql. Outside of psql, at the shell prompt, you can type 'wc -l acquisitions.csv' to get a line count of that file.
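When a line count and an import count disagree, it can help to compare physical lines with parsed CSV records. The sketch below is a minimal illustration, assuming the files are UTF-8 encoded and use standard double-quote quoting; the function names are hypothetical. A quoted field that contains a line break spans several physical lines but is a single record, so this is one possible (not confirmed) source of a mismatch.

```python
import csv


def count_physical_lines(csv_path):
    """Count newline characters, which is the quantity `wc -l` reports."""
    with open(csv_path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        return sum(chunk.count(b"\n") for chunk in iter(lambda: handle.read(1 << 20), b""))


def count_csv_records(csv_path, encoding="utf-8"):
    """Count parsed CSV records; a quoted multi-line field counts as one record."""
    with open(csv_path, newline="", encoding=encoding) as handle:
        return sum(1 for _ in csv.reader(handle))
```

Both counts include the header row if the file has one. If count_csv_records is smaller than count_physical_lines, the difference is the number of line breaks embedded in quoted fields.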
News is queried as a relationship: it is appended to the URL after the permalink.
In this example the permalink is airbnb:
https://api.crunchbase.com/v/3/organizations/airbnb/news?user_key=662e263576fe3e4ea5991edfbcfb9883
The same applies to people: use the people endpoint, followed by the permalink.
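The URL pattern above can be assembled programmatically. The following is a minimal sketch that only builds the URL string and makes no request; the function name relationship_url is hypothetical, the base URL is taken from the example above, and YOUR_USER_KEY and some-person are placeholders rather than real values.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the example request above.
API_BASE = "https://api.crunchbase.com/v/3"


def relationship_url(endpoint, permalink, relationship, user_key):
    """Build <base>/<endpoint>/<permalink>/<relationship>?user_key=<key>."""
    # Percent-encode each path segment so unusual permalinks stay valid.
    path = "/".join(quote(part, safe="") for part in (endpoint, permalink, relationship))
    return f"{API_BASE}/{path}?{urlencode({'user_key': user_key})}"


# Organization news, as in the airbnb example.
org_news = relationship_url("organizations", "airbnb", "news", "YOUR_USER_KEY")

# Same pattern for the people endpoint; "some-person" is a made-up permalink.
person_news = relationship_url("people", "some-person", "news", "YOUR_USER_KEY")
```

Keeping the endpoint, permalink and relationship as separate arguments makes it straightforward to loop over a list of permalinks.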