Crunchbase 2013 Snapshot
Latest revision as of 15:59, 17 April 2017
Username: mcnair@rice.edu
password: amount
Original Email
Thank you for submitting a request for Research Access to Crunchbase through our API. We have reviewed your request, and granted you Basic Access. You can now access Crunchbase data in the following ways.
Check out the Open Data Map
Explore the 2013 Snapshot
Visit our website for instructions on accessing Crunchbase data. To access the REST API, you'll need your user key:
6d382e4bbdaa297138f32a588b139f53
With Basic Access, API use is limited to the Open Data Map and 2013 Snapshot. Access to the full API and latest funding round data requires a license. To learn more check out our offerings.
Basic Membership
- Cannot seem to filter results past the first 50 companies.
- Provides only very basic information, such as company name, location, industry classification, website, and "Crunchbase ranking".
Retrieval
The data was retrieved by Shrey and Matthew, who applied for the API service through the Crunchbase website. The data took about a month to arrive because Crunchbase was slow to respond. Eventually, Crunchbase granted us Basic Access.
Content
The snapshot contained two .tar.gz files, which were extracted into 181/crunchbase using the command:
tar -zxvf file.tar.gz
The CSV files (organizations.csv and people.csv) were copied to the following folder for access:
E:\McNair\Projects\Accelerators\Crunchbase Snapshot
The files (size in bytes) and their contents are:
crunchbase_2013_snapshot_mysql.tar.gz
- license.txt 526
- cb_objects.sql 338955612
- cb_offices.sql 14850092
- cb_people.sql 13253952
- cb_ipos.sql 178397
- cb_milestones.sql 10498840
- cb_funds.sql 385010
- cb_relationships.sql 48655529
- cb_degrees.sql 13829471
- cb_investments.sql 6185134
- cb_acquisitions.sql 2309393
- cb_funding_rounds.sql 14681705
odm.csv.tar.gz
- organizations.csv 212013301
- 459916 records with the following fields:
- crunchbase_uuid
- type
- primary_role
- name
- crunchbase_url
- homepage_domain
- homepage_url
- profile_image_url
- facebook_url
- twitter_url
- linkedin_url
- stock_symbol
- location_city
- location_region
- location_country_code
- short_description
- people.csv 188924229
- 521634 records with the following fields:
- crunchbase_uuid
- type
- first_name
- last_name
- crunchbase_url
- profile_image_url
- facebook_url
- twitter_url
- linkedin_url
- location_city
- location_region
- location_country_code
- title
- organization
- organization_crunchbase_url
- crunchbase_license.txt 487
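As a quick sanity check on the listing above, the following is a minimal sketch that counts the records in a CSV file and reports any expected fields missing from its header. The function name `summarize_csv` and the in-memory sample row are assumptions made for illustration; they are not part of the snapshot.

```python
import csv
import io

# Field names as listed above for organizations.csv.
ORGANIZATION_FIELDS = [
    "crunchbase_uuid", "type", "primary_role", "name", "crunchbase_url",
    "homepage_domain", "homepage_url", "profile_image_url", "facebook_url",
    "twitter_url", "linkedin_url", "stock_symbol", "location_city",
    "location_region", "location_country_code", "short_description",
]


def summarize_csv(handle, expected_fields):
    """Return (record_count, missing_fields) for an open CSV file handle."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [f for f in expected_fields if f not in header]
    count = sum(1 for _ in reader)  # rows after the header line
    return count, missing


# Tiny in-memory stand-in for the real file (made-up row, illustration only).
sample = io.StringIO(
    ",".join(ORGANIZATION_FIELDS) + "\n"
    + ",".join(["x"] * len(ORGANIZATION_FIELDS)) + "\n"
)
print(summarize_csv(sample, ORGANIZATION_FIELDS))  # (1, [])
```

For the real file, the handle would come from something like `open("organizations.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption, not something the snapshot documents.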
Changing MySQL to PostgreSQL
The SQL files were generated in MySQL. We need to convert them to PostgreSQL. See: https://en.wikibooks.org/wiki/Converting_MySQL_to_PostgreSQL and http://stackoverflow.com/questions/1942586/comparison-of-database-column-types-in-mysql-postgresql-and-sqlite-cross-map
The key changes are:
MYSQL          POSTGRESQL
-----          ----------
LOCK           -- comment out, as it is not needed; the PostgreSQL form is LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]
UNLOCK         -- comment out
decimal(x,y)   real (might work as is)
datetime       timestamp
KEY            -- comment out, as it is not needed; the PostgreSQL form is FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ]
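The substitutions in the table could be applied line by line with a small script. The following is a rough sketch under several assumptions: the function name `mysql_line_to_postgres` is hypothetical, the dump is assumed to put each LOCK, UNLOCK, and KEY clause on its own line, and INSERT lines are left untouched so that data values are not rewritten. It does not handle everything a real conversion needs. In particular, MySQL backtick quoting is left as is, and commenting out a KEY line can leave a dangling comma on the preceding line that has to be fixed by hand.

```python
import re


def mysql_line_to_postgres(line):
    """Apply the table's substitutions to one line of a MySQL dump (sketch only)."""
    stripped = line.strip()
    upper = stripped.upper()
    # LOCK ... and UNLOCK ... statements are not needed: comment them out.
    if upper.startswith(("LOCK ", "UNLOCK ")):
        return "-- " + stripped
    # Plain KEY (index) definitions are commented out; PRIMARY KEY lines do not
    # start with "KEY " and are therefore kept unchanged.
    if upper.startswith("KEY "):
        return "-- " + stripped
    # Leave data rows alone so the word "datetime" inside values is preserved.
    if upper.startswith("INSERT"):
        return line
    # datetime -> timestamp (whole word, case-insensitive).
    return re.sub(r"\bdatetime\b", "timestamp", line, flags=re.IGNORECASE)


# Hypothetical dump lines, for illustration only.
for example in [
    "LOCK TABLES `cb_ipos` WRITE;",
    "  `created_at` datetime DEFAULT NULL,",
    "  KEY `idx_example` (`created_at`),",
]:
    print(mysql_line_to_postgres(example))
```

The decimal(x,y) row is deliberately not handled, since the table notes that the type might work as is.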
Documentation and File Locations
The Crunchbase information was split into two files:
- The "organizations" Excel file contains the fields: crunchbase_uuid, type, primary_role, name, crunchbase_url, homepage_domain, homepage_url, profile_image_url, facebook_url, twitter_url, linkedin_url, stock_symbol, location_city, location_region, location_country_code, short_description
- Located in E:\McNair\Projects\Accelerators\Crunchbase Snapshot
- The "people" Excel file contains the fields: crunchbase_uuid, type, first_name, last_name, crunchbase_url, profile_image_url, facebook_url, twitter_url, linkedin_url, location_city, location_region, location_country_code, title, organization, organization_crunchbase_url
- Located in E:\McNair\Projects\Accelerators\Crunchbase Snapshot
Obtaining Accelerators from the "Organizations" file
- Created new columns in the data labeled "match: blah", where blah is the word being searched for in the descriptions.
- Added the formula "=if(isnumber(search("blah",B2))=TRUE,1,0)" to each new column, where blah is the substring (what you're searching for) and B2 is the string (what you're searching in). The result is 1 if the substring is present and 0 if it is not.
- Added the formula "=sum(A1:C1)", which sums the cells from A1 to C1, to count the matches in each row.
- Compiled a list of potential accelerators based on the number of matches (the sum).
- The file "PotentialAccelerators" lists the accelerators we were considering based on their match number; it is located in E:\McNair\Projects\Accelerators\Crunchbase Snapshot
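The same match counting can be sketched outside of Excel. The example below mirrors the two formulas: a 1/0 flag per keyword, then the sum of the flags. The function names, the keywords, and the sample description are all hypothetical, since the actual search words are not recorded on this page. Like Excel's SEARCH, the comparison ignores case, but it does not reproduce SEARCH's wildcard characters.

```python
def match_flags(description, keywords):
    """Return {keyword: 1 or 0}, like =IF(ISNUMBER(SEARCH(keyword, cell)),1,0)."""
    text = (description or "").lower()  # treat empty cells as empty text
    return {kw: int(kw.lower() in text) for kw in keywords}


def match_count(description, keywords):
    """Sum of the match flags, like the =SUM(...) column."""
    return sum(match_flags(description, keywords).values())


# Hypothetical keywords and description, for illustration only.
keywords = ["accelerator", "cohort", "mentor"]
description = "A seed Accelerator that runs a three-month cohort program."
print(match_flags(description, keywords))  # {'accelerator': 1, 'cohort': 1, 'mentor': 0}
print(match_count(description, keywords))  # 2
```

Rows could then be sorted by `match_count` to produce a candidate list comparable to "PotentialAccelerators".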
Appending Crunchbase Accelerators Information to Old Accelerator Data
Matthew created a file called "Crunchbase Potential Accelerators", which repeats the names of all the accelerators in the Crunchbase folder and notes, for each one, whether it is already included in our data or still needs to be added before the semester ends.
- Adding an accelerator consists of adding the accelerator text file with basic information, the cohort HTML file, and, most importantly, the cohort text file, which is needed to calculate the VC raise rate.
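The "already included or still to add" note could also be produced programmatically. The following is a minimal sketch, assuming both lists are available as plain lists of names and that a match means equality after lowercasing and collapsing whitespace. The function names and the example names are hypothetical, and this is not necessarily how the notes in the file were made.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def split_by_presence(candidates, existing):
    """Return (already_included, to_add), preserving the order of candidates."""
    known = {normalize(n) for n in existing}
    already = [n for n in candidates if normalize(n) in known]
    to_add = [n for n in candidates if normalize(n) not in known]
    return already, to_add


# Hypothetical names, for illustration only.
crunchbase_candidates = ["Example Accelerator A", "Example  Accelerator B"]
existing_accelerators = ["example accelerator b"]
print(split_by_presence(crunchbase_candidates, existing_accelerators))
```

Exact matching after normalization will miss variants such as abbreviations, so the output would still need a manual pass.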