Crunchbase 2013 Snapshot

From edegan.com

Revision as of 16:43, 9 March 2017

Retrieval

The data was retrieved by Shrey and Matthew - STATE HOW AND FROM WHERE

Content

The snapshot contained two .tar.gz files, which were extracted into 181/crunchbase using the command:
tar -zxvf file.tar.gz
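The same extraction can also be scripted with Python's standard tarfile module. That approach additionally reports each file's size in bytes, which can be compared with the sizes listed on this page. This is a minimal sketch: extract_and_list is a hypothetical helper name, and the target directory 181/crunchbase is taken from the text above.

```python
import tarfile
from pathlib import Path


def extract_and_list(archive: str, dest: str) -> dict:
    """Extract a .tar.gz archive into dest, like `tar -zxvf archive -C dest`.

    Hypothetical helper for illustration. Returns {member name: size in bytes}
    for every regular file in the archive.
    """
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        sizes = {m.name: m.size for m in tar.getmembers() if m.isfile()}
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse members that would land outside dest.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sizes
```

Calling it once for crunchbase_2013_snapshot_mysql.tar.gz and once for odm.csv.tar.gz, with "181/crunchbase" as the destination, mirrors the tar command. The filter="data" option is only used where it exists (Python 3.12+ and some earlier security releases).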

The files (sizes in bytes) and their contents are:

crunchbase_2013_snapshot_mysql.tar.gz

  • license.txt 526
  • cb_objects.sql 338955612
  • cb_offices.sql 14850092
  • cb_people.sql 13253952
  • cb_ipos.sql 178397
  • cb_milestones.sql 10498840
  • cb_funds.sql 385010
  • cb_relationships.sql 48655529
  • cb_degrees.sql 13829471
  • cb_investments.sql 6185134
  • cb_acquisitions.sql 2309393
  • cb_funding_rounds.sql 14681705

odm.csv.tar.gz

  • organizations.csv 212013301
    • 459916 records with the following fields:
      • crunchbase_uuid
      • type
      • primary_role
      • name
      • crunchbase_url
      • homepage_domain
      • homepage_url
      • profile_image_url
      • facebook_url
      • twitter_url
      • linkedin_url
      • stock_symbol
      • location_city
      • location_region
      • location_country_code
      • short_description
  • people.csv 188924229
    • 521634 records with the following fields:
      • crunchbase_uuid
      • type
      • first_name
      • last_name
      • crunchbase_url
      • profile_image_url
      • facebook_url
      • twitter_url
      • linkedin_url
      • location_city
      • location_region
      • location_country_code
      • title
      • organization
      • organization_crunchbase_url
  • crunchbase_license.txt 487
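The record counts above can be checked after extraction. The sketch below counts rows with Python's csv module instead of counting lines (e.g. with wc -l), because a quoted text field such as short_description may span several lines. It assumes that each CSV file starts with a header row and is UTF-8 encoded. count_records is a hypothetical helper name.

```python
import csv


def count_records(path: str):
    """Return (header fields, number of data rows) for a CSV file.

    Hypothetical helper. Assumes the first row is a header and the file is
    UTF-8 encoded. csv.reader is used instead of counting lines because a
    quoted field may contain embedded newlines.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        count = sum(1 for _ in reader)
    return header, count


# Expected record counts as listed on this page.
EXPECTED_COUNTS = {"organizations.csv": 459916, "people.csv": 521634}
```

For example, `header, n = count_records("181/crunchbase/organizations.csv")` returns the header, which can be compared with the field list above, and n, which can be compared with `EXPECTED_COUNTS["organizations.csv"]`. The exact path depends on how the archive unpacks.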

Changing MySQL to PostgreSQL

The SQL files were generated in MySQL. We need to convert them to PostgreSQL. See: https://en.wikibooks.org/wiki/Converting_MySQL_to_PostgreSQL and http://stackoverflow.com/questions/1942586/comparison-of-database-column-types-in-mysql-postgresql-and-sqlite-cross-map

The key changes are:

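MySQL          PostgreSQL
-----          ----------
LOCK           --comment out; not needed for a one-off load (PostgreSQL syntax, if ever needed: LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ])
UNLOCK         --comment out; PostgreSQL has no UNLOCK (locks are released at the end of the transaction)
decimal(x,y)   decimal(x,y) works as is; real would lose precision
datetime       timestamp
KEY            --comment out; in MySQL, KEY defines a plain index, not a foreign key (if the index is wanted: CREATE INDEX name ON table ( column [, ...] ))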
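The rules in the table can be applied mechanically. This Python sketch uses convert_dump, a hypothetical helper and not part of any existing tool. It comments out LOCK TABLES and UNLOCK TABLES lines and plain KEY index lines, and it rewrites datetime column types to timestamp. decimal(x,y) is left unchanged because PostgreSQL accepts it directly. The sketch assumes mysqldump-style layout, with one column or key definition per line inside each CREATE TABLE block. Other MySQL-specific syntax in the dumps is not handled.

```python
import re

LOCK_RE = re.compile(r"^\s*(UN)?LOCK\s+TABLES\b", re.IGNORECASE)
# Plain KEY lines only; PRIMARY KEY and UNIQUE KEY do not start with "KEY".
KEY_RE = re.compile(r"^\s*KEY\s", re.IGNORECASE)
DATETIME_RE = re.compile(r"\bdatetime\b", re.IGNORECASE)


def convert_dump(lines):
    """Apply the MySQL -> PostgreSQL rules from the table (hypothetical helper).

    Assumes one column or key definition per line inside CREATE TABLE blocks.
    Column rules are applied only inside those blocks, so INSERT data that
    happens to contain the word "datetime" is not rewritten.
    """
    out = []
    in_create = False
    for line in lines:
        line = line.rstrip("\n")
        stripped = line.lstrip()
        if LOCK_RE.match(line):
            out.append("-- " + line)
            continue
        if stripped.upper().startswith("CREATE TABLE"):
            in_create = True
        elif in_create and stripped.startswith(")"):
            in_create = False
            # Commenting out a trailing KEY line can leave a dangling comma
            # before ")", so strip it from the last uncommented line.
            for i in range(len(out) - 1, -1, -1):
                if not out[i].startswith("-- "):
                    out[i] = out[i].rstrip().rstrip(",")
                    break
        elif in_create:
            if KEY_RE.match(line):
                out.append("-- " + line)
                continue
            line = DATETIME_RE.sub("timestamp", line)
        out.append(line)
    return out
```

To convert a file, pass an open file object (e.g. for cb_ipos.sql) to convert_dump and write `"\n".join(result) + "\n"` to a new file. The function keeps the whole result in memory, which is workable but heavy for the largest dump (cb_objects.sql, about 339 MB).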