Crunchbase 2013 Snapshot
Revision as of 13:42, 10 March 2017 by Mringheanu
Retrieval
The data was retrieved by Shrey and Matthew - STATE HOW AND FROM WHERE
Content
The snapshot contained two .tar.gz files, which were extracted into 181/crunchbase using the command:
tar -zxvf file.tar.gz
The CSV files (organizations.csv and people.csv) were copied, for easier access, to:
E:\McNair\Projects\Accelerators\Crunchbase Snapshot
The files (sizes in bytes) and their contents are:
crunchbase_2013_snapshot_mysql.tar.gz
- license.txt 526
- cb_objects.sql 338955612
- cb_offices.sql 14850092
- cb_people.sql 13253952
- cb_ipos.sql 178397
- cb_milestones.sql 10498840
- cb_funds.sql 385010
- cb_relationships.sql 48655529
- cb_degrees.sql 13829471
- cb_investments.sql 6185134
- cb_acquisitions.sql 2309393
- cb_funding_rounds.sql 14681705
odm.csv.tar.gz
- organizations.csv 212013301
- 459916 records with the following fields:
- crunchbase_uuid
- type
- primary_role
- name
- crunchbase_url
- homepage_domain
- homepage_url
- profile_image_url
- facebook_url
- twitter_url
- linkedin_url
- stock_symbol
- location_city
- location_region
- location_country_code
- short_description
- people.csv 188924229
- 521634 records with the following fields:
- crunchbase_uuid
- type
- first_name
- last_name
- crunchbase_url
- profile_image_url
- facebook_url
- twitter_url
- linkedin_url
- location_city
- location_region
- location_country_code
- title
- organization
- organization_crunchbase_url
- crunchbase_license.txt 487
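The field lists and record counts above could be re-checked with a short script. The following is a minimal sketch, not necessarily how the counts were originally produced. The function name summarize_csv and the in-memory sample rows are hypothetical and used only for illustration.

```python
import csv
import io


def summarize_csv(handle):
    """Return (field_names, record_count) for an open CSV text stream."""
    reader = csv.reader(handle)
    header = next(reader, [])          # first row holds the field names
    count = sum(1 for _ in reader)     # remaining rows are records
    return header, count


# Hypothetical three-column sample standing in for organizations.csv.
sample = io.StringIO(
    "crunchbase_uuid,type,name\n"
    "u1,organization,Acme\n"
    'u2,organization,"Foo, Inc."\n'
)
fields, n = summarize_csv(sample)
print(fields, n)
```

For the real files, an open handle such as open(path, newline="", encoding="utf-8") would be passed instead; the UTF-8 encoding is an assumption. Using csv.reader rather than counting lines keeps quoted fields that contain commas or line breaks from inflating the record count.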
Changing MySQL to PostgreSQL
The SQL files were generated in MySQL. We need to convert them to PostgreSQL. See: https://en.wikibooks.org/wiki/Converting_MySQL_to_PostgreSQL and http://stackoverflow.com/questions/1942586/comparison-of-database-column-types-in-mysql-postgresql-and-sqlite-cross-map
The key changes are:
MYSQL          POSTGRESQL
-----          ----------
LOCK           -- comment out (not needed), but: LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]
UNLOCK         -- comment out
decimal(x,y)   real (might work as is)
datetime       timestamp
KEY            -- comment out (not needed), but: FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ]
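The key changes listed here could be applied to a dump file line by line. The following is a rough sketch under several assumptions: the function names convert_line and convert_dump, the regular expressions, and the sample table are all hypothetical, and the code covers only the listed rules (LOCK, UNLOCK, decimal, datetime, KEY). Anything else in a MySQL dump, including a comma left dangling before a commented-out KEY line, would still need to be fixed by hand.

```python
import re

# Patterns for the rules listed above (illustrative, not exhaustive).
LOCK_RE = re.compile(r"^\s*(LOCK|UNLOCK)\s+TABLES\b", re.IGNORECASE)
KEY_RE = re.compile(r"^\s*KEY\s", re.IGNORECASE)   # does not match PRIMARY KEY
DECIMAL_RE = re.compile(r"\bdecimal\(\s*\d+\s*,\s*\d+\s*\)", re.IGNORECASE)
DATETIME_RE = re.compile(r"\bdatetime\b", re.IGNORECASE)


def convert_line(line):
    """Apply the listed MySQL-to-PostgreSQL changes to one line of SQL."""
    # LOCK, UNLOCK and KEY lines are commented out rather than translated.
    if LOCK_RE.match(line) or KEY_RE.match(line):
        return "-- " + line
    # Leave data rows untouched so values containing these words survive.
    if line.lstrip().upper().startswith("INSERT"):
        return line
    line = DECIMAL_RE.sub("real", line)
    return DATETIME_RE.sub("timestamp", line)


def convert_dump(text):
    """Convert a whole dump given as a single string."""
    return "\n".join(convert_line(l) for l in text.splitlines())


# Hypothetical table definition, used only to exercise the rules.
sample = """LOCK TABLES example WRITE;
CREATE TABLE example (
  id bigint NOT NULL,
  amount decimal(15,2) DEFAULT NULL,
  created_at datetime DEFAULT NULL,
  PRIMARY KEY (id),
  KEY idx_amount (amount)
);
UNLOCK TABLES;"""
print(convert_dump(sample))
```

Type names are rewritten only outside INSERT statements, on the assumption that each INSERT begins its own line, so that text values in the data are not altered.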