Revision as of 16:09, 21 March 2019
Crunchbase Database | |
---|---|
Project Information | |
Has title | Crunchbase Database |
Has owner | Hiep Nguyen |
Has start date | 2019/03/13 |
Has deadline date | 2019/03/22 |
Has project status | Active |
Dependent(s) | Ecosystem Organization Classifier, Incubator Seed Data |

Copyright © 2019 edegan.com. All Rights Reserved.
Files and Dbase
Files are in:
- E:\projects\crunchbase3
- Z:\crunchbase3
The database is crunchbase3.
The old project page is Crunchbase Data. File locations listed there as Z:/bulk/ should now be read as Z:/bulk/mcnair/. For example, there is an old load script at /bulk/mcnair/crunchbase/crunchbaseData/load_crunchbase.sql.
Crunchbase Pro
https://www.crunchbase.com/login
Login details:
- mcnair@rice.edu (get the password from Ed)
Getting and cleaning data
The URL for the CSV export API call is https://api.crunchbase.com/v3.1/csv_export/csv_export.tar.gz?user_key=[API KEY GOES HERE]
The premium API key is stored in E:\projects\crunchbase3
The bash script that downloads and extracts the data (1.9 GB) is E:\projects\crunchbase3\get_data.sh
Alternatively, the data can be downloaded and extracted directly from the Windows command prompt with the following two commands:
curl -O "https://api.crunchbase.com/v3.1/csv_export/csv_export.tar.gz?user_key=[API key goes here]"
tar -xvf csv_export.tar.gz_user_key=[API key goes here]
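For a platform-independent alternative, the same download and extraction can be sketched in Python with only the standard library. This is a minimal sketch, not the contents of get_data.sh: it assumes the export URL above still accepts the key as the user_key query parameter, and the key-file name in the usage comment (api_key.txt) is hypothetical.

```python
from __future__ import annotations

import tarfile
import urllib.parse
import urllib.request
from pathlib import Path

# Export endpoint as given on this page; the key goes in the user_key parameter.
EXPORT_URL = "https://api.crunchbase.com/v3.1/csv_export/csv_export.tar.gz"


def build_export_url(api_key: str) -> str:
    """Return the export URL with the user_key query parameter filled in."""
    return EXPORT_URL + "?" + urllib.parse.urlencode({"user_key": api_key})


def download_export(api_key: str, archive: Path) -> Path:
    """Download the export tarball (about 1.9 GB) to the given path."""
    urllib.request.urlretrieve(build_export_url(api_key), archive)
    return archive


def extract_export(archive: Path, dest: Path) -> list[str]:
    """Extract the tarball into dest and return the archive member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:  # "r:*" detects gzip compression
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, ".." and other unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


# Example usage (api_key.txt is a hypothetical file name):
#   key = Path(r"E:\projects\crunchbase3\api_key.txt").read_text().strip()
#   extract_export(download_export(key, Path("csv_export.tar.gz")), Path("."))
```

Unlike the curl command, this names the local archive explicitly, so the extraction step does not depend on how curl turns the query string into a file name.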
Current CSV files from the Crunchbase data
data\acquisitions.csv
data\category_groups.csv
data\degrees.csv
data\events.csv
data\event_appearances.csv
data\funding_rounds.csv
data\funds.csv
data\investments.csv
data\investment_partners.csv
data\investors.csv
data\ipos.csv
data\jobs.csv
data\organizations.csv
data\organization_descriptions.csv
data\org_parents.csv
data\people.csv
data\people_descriptions.csv
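A quick sanity check after extraction is to compare the data directory against the list above. The sketch below is hypothetical (check_export is not an existing project script); it reports which of the 17 expected files are missing and which extra CSV files appeared, which matters when the export changes between years.

```python
from __future__ import annotations

from pathlib import Path

# File names as listed on this page for the current export.
EXPECTED_CSVS = [
    "acquisitions.csv", "category_groups.csv", "degrees.csv", "events.csv",
    "event_appearances.csv", "funding_rounds.csv", "funds.csv",
    "investments.csv", "investment_partners.csv", "investors.csv", "ipos.csv",
    "jobs.csv", "organizations.csv", "organization_descriptions.csv",
    "org_parents.csv", "people.csv", "people_descriptions.csv",
]


def check_export(data_dir: Path) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) CSV file names in data_dir."""
    present = {p.name for p in data_dir.glob("*.csv")}
    missing = [name for name in EXPECTED_CSVS if name not in present]
    unexpected = sorted(present - set(EXPECTED_CSVS))
    return missing, unexpected


# Example: missing, unexpected = check_export(Path("data"))
```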
Last year's SQL script, get_data.sql, has been copied to the current crunchbase3 directory. However, the two databases now differ substantially, so the script needs adjustments. To keep track of the data types in each CSV file that is copied into the SQL tables, a Python script, get_type.py, is included in E:\projects\crunchbase3. It prints the first 5 rows of the data frame for every CSV file in the current directory.
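get_type.py itself is not reproduced here. As a rough stand-in, the following standard-library sketch prints, for every CSV file in a directory, a coarse guessed type for each column followed by the first 5 rows. The function names, the UTF-8 encoding, and the integer/float/text guess are assumptions for illustration; the real script works with data frames (presumably pandas), whose type inference is more detailed.

```python
import csv
import itertools
from pathlib import Path


def guess_type(values: list) -> str:
    """Guess a coarse type for one column from sample values ("" = missing)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def preview_csv(path: Path, n: int = 5):
    """Return the header, the first n rows, and a guessed type per column."""
    # utf-8-sig also tolerates a byte-order mark; the export encoding is assumed.
    with path.open(newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(itertools.islice(reader, n))
    types = [
        guess_type([row[i] if i < len(row) else "" for row in rows])
        for i in range(len(header))
    ]
    return header, rows, types


def preview_directory(directory: Path, n: int = 5) -> None:
    """Print column types and the first n rows of every CSV in directory."""
    for path in sorted(directory.glob("*.csv")):
        header, rows, types = preview_csv(path, n)
        print(f"== {path.name}")
        for column, column_type in zip(header, types):
            print(f"   {column}: {column_type}")
        for row in rows:
            print("   ", row)
```

Because types are guessed from only five rows, a column that looks numeric here may still contain text further down, so the guesses are a starting point for get_data.sql rather than a final answer.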
Hiep will continue fixing the get_data.sql script on 03/22/2019.
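When adjusting get_data.sql, the table definitions can be generated rather than typed by hand. The hypothetical helper below assumes the crunchbase3 database is PostgreSQL loaded through psql's \copy, and takes coarse per-column types (integer, float, text) however they were determined; the type mapping and function names are illustrative choices, not part of the existing script.

```python
from __future__ import annotations

# Assumed mapping from coarse column types to PostgreSQL types. BIGINT rather
# than INTEGER because counts and amounts can exceed 32 bits.
PG_TYPES = {"integer": "BIGINT", "float": "NUMERIC", "text": "TEXT", "unknown": "TEXT"}


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_statements(table: str, header: list[str], types: list[str], csv_path: str) -> str:
    """Return a CREATE TABLE statement and a psql \\copy line for one CSV file."""
    columns = ",\n".join(
        f"    {quote_ident(col)} {PG_TYPES.get(typ, 'TEXT')}"
        for col, typ in zip(header, types)
    )
    create = f"CREATE TABLE {quote_ident(table)} (\n{columns}\n);"
    # \copy reads the file on the client side, so the path is local to psql.
    copy = f"\\copy {quote_ident(table)} FROM '{csv_path}' WITH (FORMAT csv, HEADER true)"
    return create + "\n" + copy


# Example:
#   print(load_statements("ipos", ["uuid", "price"], ["text", "float"], "data/ipos.csv"))
```

Unknown or unrecognized types fall back to TEXT, which always loads; columns can be narrowed afterwards once the full data has been checked.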