Difference between revisions of "Crunchbase Database"

Project
Crunchbase Database
Project Information
Has title	Crunchbase Database
Has owner	Hiep Nguyen
Has start date	2019/03/13
Has deadline date	2019/03/22
Has project status	Active
Dependent(s):	Ecosystem Organization Classifier, Incubator Seed Data
	Copyright © 2019 edegan.com. All Rights Reserved.

Revision as of 17:09, 21 March 2019

Files and Dbase

Files are in:

E:\projects\crunchbase3
Z:\crunchbase3

The database is crunchbase3

The old project page is Crunchbase Data. File locations listed as Z:/bulk/ should now be Z:/bulk/mcnair/. For example, an old load script is at /bulk/mcnair/crunchbase/crunchbaseData/load_crunchbase.sql

Crunchbase Pro

https://www.crunchbase.com/login

Login details:

mcnair@rice.edu (get the password from Ed)

Getting and cleaning data

The URL for the API call is https://api.crunchbase.com/v3.1/csv_export/csv_export.tar.gz?user_key=[API KEY GOES HERE]

The premium API key is stored in E:\projects\crunchbase3

The bash script that downloads and extracts the data (1.9 GB) is at E:\projects\crunchbase3\get_data.sh

Alternatively, download and extract the data directly from the Windows command prompt with the following commands:

curl -O https://api.crunchbase.com/v3.1/csv_export/csv_export.tar.gz?user_key=[API key goes here]

tar -xvf csv_export.tar.gz_user_key=[API key goes here]
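The download and extraction steps above can also be sketched in Python, which may help where curl or tar is unavailable. This is a minimal sketch under stated assumptions, not the contents of get_data.sh: the function names, the local archive name, and the output directory are hypothetical, and only the endpoint and the user_key parameter come from this page.

```python
import shutil
import tarfile
import urllib.parse
import urllib.request
from pathlib import Path

# Endpoint taken from this page; the API key is passed as the user_key parameter.
EXPORT_URL = "https://api.crunchbase.com/v3.1/csv_export/csv_export.tar.gz"


def build_export_url(api_key: str) -> str:
    """Return the export URL with the API key attached as user_key."""
    return EXPORT_URL + "?" + urllib.parse.urlencode({"user_key": api_key})


def download_export(api_key: str, archive: Path) -> Path:
    """Download the export archive to `archive` (hypothetical helper)."""
    with urllib.request.urlopen(build_export_url(api_key)) as response:
        with open(archive, "wb") as out:
            # Copy in chunks so the large archive is never held in memory.
            shutil.copyfileobj(response, out)
    return archive


def extract_export(archive: Path, dest: Path) -> list[str]:
    """Extract the archive into `dest` and return the sorted CSV member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:
        names = [m.name for m in tar.getmembers() if m.name.endswith(".csv")]
        tar.extractall(dest)
    return sorted(names)


# Example use (not run here because it needs a valid key and network access):
#   archive = download_export("YOUR_API_KEY", Path("csv_export.tar.gz"))
#   print(extract_export(archive, Path("data")))
```

Saving to a fixed name such as csv_export.tar.gz keeps the API key out of the local file name, unlike the file name produced by curl -O.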

Current csv files from crunchbase data

data\acquisitions.csv
data\category_groups.csv
data\degrees.csv
data\events.csv
data\event_appearances.csv
data\funding_rounds.csv
data\funds.csv
data\investments.csv
data\investment_partners.csv
data\investors.csv
data\ipos.csv
data\jobs.csv
data\organizations.csv
data\organization_descriptions.csv
data\org_parents.csv
data\people.csv
data\people_descriptions.csv
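
Since later load scripts depend on these files, it can be useful to confirm that an extraction produced all of them. The check below is a minimal sketch: the file names are copied from the list above, while the function name and the idea of running such a check are assumptions, not part of the existing project scripts.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "acquisitions.csv",
    "category_groups.csv",
    "degrees.csv",
    "events.csv",
    "event_appearances.csv",
    "funding_rounds.csv",
    "funds.csv",
    "investments.csv",
    "investment_partners.csv",
    "investors.csv",
    "ipos.csv",
    "jobs.csv",
    "organizations.csv",
    "organization_descriptions.csv",
    "org_parents.csv",
    "people.csv",
    "people_descriptions.csv",
]


def missing_files(data_dir: Path) -> list[str]:
    """Return the expected CSV files that are not present in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


# Example use: print(missing_files(Path("data")))
```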

Last year's SQL script, get_data.sql, has been copied to the current Crunchbase3 directory. However, the two databases are now very different, so the script needs adjustments. To keep track of the data types in each CSV file that is copied into the SQL tables, a script get_type.py is included in E:\projects\crunchbase3. This Python script prints the first 5 rows of the data frame for every CSV file in the current directory.
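
The kind of preview described above can be sketched as follows. This is not the actual get_type.py: it uses only the standard library rather than data frames, and the function names, the coarse type labels, and the YYYY-MM-DD date format are assumptions for illustration. Types are guessed from the first rows only, so they should be treated as hints when writing the SQL table definitions.

```python
import csv
from datetime import datetime
from itertools import islice
from pathlib import Path


def guess_type(values):
    """Guess a coarse column type from sample strings (empty strings are ignored)."""
    values = [v for v in values if v != ""]
    if not values:
        return "unknown"
    parsers = (
        ("integer", int),
        ("numeric", float),
        # Assumed date format; adjust if the export uses another one.
        ("date", lambda v: datetime.strptime(v, "%Y-%m-%d")),
    )
    for label, parse in parsers:
        try:
            for v in values:
                parse(v)
            return label
        except ValueError:
            continue
    return "text"


def preview_csv(path, n_rows=5):
    """Return (header, first n_rows rows, guessed type per column) for one CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    types = {}
    for i, name in enumerate(header):
        types[name] = guess_type([row[i] for row in rows if i < len(row)])
    return header, rows, types


def preview_directory(directory="."):
    """Print the preview for every CSV file in the directory."""
    for path in sorted(Path(directory).glob("*.csv")):
        header, rows, types = preview_csv(path)
        print(path.name)
        print("  columns:", types)
        for row in rows:
            print("  ", row)


# Example use: preview_directory("data")
```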

Hiep will continue fixing the get_data.sql script on 03/22/2019.
