Crunchbase 2013 Snapshot

From edegan.com
Revision as of 14:42, 9 March 2017 by Ed

Retrieval

The data was retrieved by Shrey and Matthew - STATE HOW AND FROM WHERE

Content

The snapshot contained two .tar.gz files, which were extracted into 181/crunchbase using the command

tar -zxvf file.tar.gz
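The same extraction can be scripted in Python with the standard-library tarfile module. This is a minimal sketch: extract_snapshot is a hypothetical helper, and the archive names and the 181/crunchbase destination in the usage line after it are taken from this page rather than from any existing script.

```python
import tarfile
from pathlib import Path


def extract_snapshot(archive: str, dest: str) -> list[str]:
    """Extract a gzip-compressed tar archive into dest (hypothetical helper).

    Roughly equivalent to `tar -zxvf archive -C dest`; returns the member
    names, which is what tar's -v flag prints.
    """
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases: reject members that would be written
            # outside dest, as well as device files.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names
```

For example, extract_snapshot("odm.csv.tar.gz", "181/crunchbase") would unpack the CSV archive and return its member names. The "data" filter is used when available because these archives come from an outside source.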

The files and their contents (with sizes in bytes) are:

crunchbase_2013_snapshot_mysql.tar.gz

  • license.txt 526
  • cb_objects.sql 338955612
  • cb_offices.sql 14850092
  • cb_people.sql 13253952
  • cb_ipos.sql 178397
  • cb_milestones.sql 10498840
  • cb_funds.sql 385010
  • cb_relationships.sql 48655529
  • cb_degrees.sql 13829471
  • cb_investments.sql 6185134
  • cb_acquisitions.sql 2309393
  • cb_funding_rounds.sql 14681705
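Because the sizes of the extracted files are recorded here, a simple integrity check is to compare the on-disk sizes against them. A minimal sketch, assuming the archive unpacks directly into 181/crunchbase without a subdirectory (adjust the path if it does not); EXPECTED_SIZES copies the figures listed above, and check_sizes is a hypothetical helper.

```python
from pathlib import Path

# Sizes in bytes, as listed for crunchbase_2013_snapshot_mysql.tar.gz.
EXPECTED_SIZES = {
    "license.txt": 526,
    "cb_objects.sql": 338955612,
    "cb_offices.sql": 14850092,
    "cb_people.sql": 13253952,
    "cb_ipos.sql": 178397,
    "cb_milestones.sql": 10498840,
    "cb_funds.sql": 385010,
    "cb_relationships.sql": 48655529,
    "cb_degrees.sql": 13829471,
    "cb_investments.sql": 6185134,
    "cb_acquisitions.sql": 2309393,
    "cb_funding_rounds.sql": 14681705,
}


def check_sizes(directory: str, expected: dict[str, int]) -> dict[str, str]:
    """Return {filename: problem} for files that are missing or differ in size."""
    problems = {}
    for name, size in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            problems[name] = "missing"
        elif path.stat().st_size != size:
            problems[name] = f"expected {size} bytes, found {path.stat().st_size}"
    return problems
```

An empty result from check_sizes("181/crunchbase", EXPECTED_SIZES) means every listed file is present with the recorded size.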

The SQL files were generated in MySQL. We need to convert them to PostgreSQL. See: https://en.wikibooks.org/wiki/Converting_MySQL_to_PostgreSQL
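Some of the conversion can be done mechanically. The following is an illustrative sketch under stated assumptions, not a complete converter: it turns backtick-quoted identifiers into double-quoted ones, rewrites MySQL backslash escapes inside '...' literals for PostgreSQL standard strings (assuming standard_conforming_strings is on, the default since PostgreSQL 9.1), and drops comment lines, /*! ... */ conditional statements, LOCK/UNLOCK TABLES, and trailing ENGINE=... table options. Column types such as int(11), AUTO_INCREMENT, and inline KEY definitions are left untouched and still need separate handling. mysql_to_postgres is a hypothetical name.

```python
import re

# Backslash escapes inside MySQL '...' literals that this sketch decodes.
# Any other backslash sequence is copied through unchanged (an assumption;
# review such cases by hand).
_ESCAPES = {"'": "'", '"': '"', "\\": "\\", "n": "\n", "r": "\r", "t": "\t"}

# mysqldump lines with no direct PostgreSQL equivalent, dropped here.
_SKIP_PREFIXES = ("--", "/*!", "LOCK TABLES", "UNLOCK TABLES")

# Closing line of CREATE TABLE carrying MySQL-only table options.
_TABLE_OPTIONS = re.compile(r"^\)\s*ENGINE=.*;\s*$")


def mysql_to_postgres(sql: str) -> str:
    """Rewrite a subset of mysqldump syntax into PostgreSQL syntax (sketch)."""
    out = []
    in_string = False  # are we inside a '...' literal?
    for line in sql.splitlines(keepends=True):
        if not in_string and line.lstrip().startswith(_SKIP_PREFIXES):
            continue
        if not in_string and _TABLE_OPTIONS.match(line.strip()):
            out.append(");\n")
            continue
        i = 0
        while i < len(line):
            ch = line[i]
            if in_string:
                if ch == "\\" and i + 1 < len(line):
                    decoded = _ESCAPES.get(line[i + 1])
                    if decoded is None:
                        out.append(line[i : i + 2])
                    else:
                        # A quote inside a standard string is doubled.
                        out.append("''" if decoded == "'" else decoded)
                    i += 2
                    continue
                if ch == "'":
                    if line[i + 1 : i + 2] == "'":  # already-doubled quote
                        out.append("''")
                        i += 2
                        continue
                    in_string = False
                out.append(ch)
            elif ch == "'":
                in_string = True
                out.append(ch)
            elif ch == "`":
                out.append('"')  # identifier quoting
            else:
                out.append(ch)
            i += 1
    return "".join(out)
```

For example, assuming UTF-8 dumps: Path("cb_ipos_pg.sql").write_text(mysql_to_postgres(Path("cb_ipos.sql").read_text(encoding="utf-8")), encoding="utf-8"), with Path imported from pathlib. The string state is tracked character by character so that backticks or apostrophes inside data values are not mistaken for identifier quotes.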

odm.csv.tar.gz

  • organizations.csv 212013301
    • 459916 records with the following fields:
      • crunchbase_uuid
      • type
      • primary_role
      • name
      • crunchbase_url
      • homepage_domain
      • homepage_url
      • profile_image_url
      • facebook_url
      • twitter_url
      • linkedin_url
      • stock_symbol
      • location_city
      • location_region
      • location_country_code
      • short_description
  • people.csv 188924229
    • 521634 records with the following fields:
      • crunchbase_uuid
      • type
      • first_name
      • last_name
      • crunchbase_url
      • profile_image_url
      • facebook_url
      • twitter_url
      • linkedin_url
      • location_city
      • location_region
      • location_country_code
      • title
      • organization
      • organization_crunchbase_url
  • crunchbase_license.txt 487
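A quick sanity check on the extracted CSVs is to confirm that each file's header matches the field lists above and that the record counts agree with the figures given. This sketch assumes comma-delimited UTF-8 files whose first row holds exactly these field names, which the page does not state; summarize_csv is a hypothetical helper. It uses the csv module rather than counting lines because text fields such as short_description may contain quoted newlines.

```python
import csv

# Field lists copied from the descriptions of organizations.csv and people.csv.
ORGANIZATION_FIELDS = [
    "crunchbase_uuid", "type", "primary_role", "name", "crunchbase_url",
    "homepage_domain", "homepage_url", "profile_image_url", "facebook_url",
    "twitter_url", "linkedin_url", "stock_symbol", "location_city",
    "location_region", "location_country_code", "short_description",
]
PEOPLE_FIELDS = [
    "crunchbase_uuid", "type", "first_name", "last_name", "crunchbase_url",
    "profile_image_url", "facebook_url", "twitter_url", "linkedin_url",
    "location_city", "location_region", "location_country_code", "title",
    "organization", "organization_crunchbase_url",
]


def summarize_csv(path: str, expected_fields: list[str]) -> int:
    """Check the header row of a CSV file and return its record count.

    Assumes the first row holds the field names (hypothetical helper).
    """
    # newline="" lets the csv module handle newlines embedded in quoted fields;
    # utf-8-sig also accepts files that start with a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        if header != expected_fields:
            raise ValueError(f"{path}: unexpected header {header}")
        # Count records, not physical lines.
        return sum(1 for _ in reader)
```

If the assumptions hold, summarize_csv("181/crunchbase/organizations.csv", ORGANIZATION_FIELDS) should return 459916, and the same call on people.csv with PEOPLE_FIELDS should return 521634.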