Difference between revisions of "Matching VentureOne (Data)"
Revision as of 14:57, 5 July 2016
Matching VentureOne (Data)
Copyright © 2016 edegan.com. All Rights Reserved.
Overview
In this matching process, we join patent data to VentureOne companies and count the number of patents affiliated with each company.
We first get the standard company names for VentureOne companies from the source VentureOne data set. Then we standardize the names of the companies that have patents from our patent database. Based on the common standard company names, we join patent information to VentureOne companies.
Raw Data
Original data set of VentureOne companies can be found at: E:\McNair\Projects\Venture One Data\Venture Data 1.xlsx
- All variables: EntityName, Employees, City, State, Zip, AreaCode, Business Status, IndustryGroup, etc.
- Variables used for matching: EntityName
Original patent data is in our database: 128.42.44.181/bulk/allpatentsprocessed
Final Matched Tables
- Summary table displaying number of patents owned, minimum grant year, maximum grant year and average grant year for each company (including the ones that own no patents). It can be found at:
E:\McNair\Projects\Venture One Data\venturepatentreallyfinal.txt
- A table containing all patent information for the companies that have patents. It can be found at:
E:\McNair\Projects\Venture One Data\venturepatentfullyjoined.txt
Detailed Data Processing
- Get the VentureOne data ready
- Source file for VentureOne data
E:\McNair\Projects\Venture One Data\Venture Data 1.xlsx
Original data source
- Clean it up
E:\McNair\Software\Scripts\Matcher\Input\Venture Data 1.txt
Extraneous symbols and words removed
- Match it against itself to get standardized entity names
E:\McNair\Projects\Venture One Data\Cleaned and Matched Data.xlsx
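The clean-up step (removing extraneous symbols and words from entity names) could look roughly like the Python sketch below. This is only an illustration, not the Matcher script itself: the function name clean_name and the NOISE_WORDS list are hypothetical, and the actual list of removed words is not given on this page.

```python
import re

# Hypothetical list of words to drop; the real list used for the
# VentureOne clean-up is not documented on this page.
NOISE_WORDS = {"inc", "corp", "corporation", "llc", "ltd", "co", "company"}


def clean_name(name: str) -> str:
    """Lowercase a name, strip symbols, and drop common corporate words."""
    lowered = name.lower()
    # Replace every run of characters that is not a letter, digit, or
    # space with a single space.
    alnum_only = re.sub(r"[^a-z0-9 ]+", " ", lowered)
    # Split on whitespace (this also collapses repeated spaces) and
    # keep only the words that are not in the noise list.
    kept = [word for word in alnum_only.split() if word not in NOISE_WORDS]
    return " ".join(kept)
```

For example, clean_name("Acme, Inc.") and clean_name("ACME Corp") both reduce to the same key, which is what makes the later matching on standardized names possible.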
- Get the patent data ready
- Draw the distinct assignees
Z:\allpatentsprocessed\DistinctAssignees2.txt
- Match them against themselves to get standardized org names for patent data
Z:\allpatentsprocessed\DistinctAssignees2matched.txt
- Match standardized org names of patent data to standardized entity names of venture data
Z:\allpatentsprocessed\Venture Patent Matched.txt
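Once both sides have standardized names, the match between patent assignees and venture entities reduces to an exact lookup on the shared standard name. A minimal Python sketch follows, assuming each data set has already been reduced to a mapping from raw name to standard name. The function name and the tuple layout are hypothetical and do not describe the actual Matcher output format.

```python
from collections import defaultdict


def match_on_standard_name(venture_names, assignee_names):
    """Pair venture entities with assignees that share a standard name.

    Both arguments map a raw name to its standardized name.
    Returns sorted (raw venture name, standard name, raw assignee) tuples.
    """
    # Index the raw assignee names by their standardized form.
    by_standard = defaultdict(list)
    for raw_assignee, standard in assignee_names.items():
        by_standard[standard].append(raw_assignee)

    matches = []
    for raw_venture, standard in venture_names.items():
        # Ventures without any matching assignee simply produce no rows.
        for raw_assignee in by_standard.get(standard, []):
            matches.append((raw_venture, standard, raw_assignee))
    return sorted(matches)
```

One venture company can match several raw assignee spellings, which is why the result is a list of pairs rather than a one-to-one mapping.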
- Join patent data to venture data to get patent information of each venture-backed company
- Join patent data to assignee data, creating firstjoin_cleaned, which matches assignees to patent numbers.
- Join firstjoin_cleaned data to matchassignee data, creating secondjoin_cleaned, which matches standard org names to patent numbers.
- Join secondjoin_cleaned data to venturepatentmatched data, creating fourthjoin_cleaned, which matches standard venture company names to patent numbers.
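The chain of joins above can be sketched as follows, using an in-memory SQLite database from Python as a stand-in, since this page does not state which database engine or schema is used. The table names come from the steps above. All column names (patent_number, grant_year, assignee_name, std_org_name, venture_name) and the toy rows are assumptions for illustration only.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with toy rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patent (patent_number INTEGER, grant_year INTEGER);
        CREATE TABLE assignee (patent_number INTEGER, assignee_name TEXT);
        CREATE TABLE matchassignee (assignee_name TEXT, std_org_name TEXT);
        CREATE TABLE venturepatentmatched (std_org_name TEXT, venture_name TEXT);
    """)
    conn.executemany("INSERT INTO patent VALUES (?, ?)",
                     [(1, 2001), (2, 2005), (3, 2010)])
    conn.executemany("INSERT INTO assignee VALUES (?, ?)",
                     [(1, "ACME INC"), (2, "Acme, Inc."), (3, "Other LLC")])
    conn.executemany("INSERT INTO matchassignee VALUES (?, ?)",
                     [("ACME INC", "acme"), ("Acme, Inc.", "acme"),
                      ("Other LLC", "other")])
    conn.executemany("INSERT INTO venturepatentmatched VALUES (?, ?)",
                     [("acme", "Acme")])
    return conn


def build_joins(conn):
    """Run the three joins in order, materializing each as a table."""
    conn.executescript("""
        -- Assignees matched to patent numbers.
        CREATE TABLE firstjoin_cleaned AS
            SELECT p.patent_number, p.grant_year, a.assignee_name
            FROM patent p
            JOIN assignee a ON a.patent_number = p.patent_number;

        -- Standard org names matched to patent numbers.
        CREATE TABLE secondjoin_cleaned AS
            SELECT f.patent_number, f.grant_year, m.std_org_name
            FROM firstjoin_cleaned f
            JOIN matchassignee m ON m.assignee_name = f.assignee_name;

        -- Standard venture company names matched to patent numbers.
        CREATE TABLE fourthjoin_cleaned AS
            SELECT s.patent_number, s.grant_year, v.venture_name
            FROM secondjoin_cleaned s
            JOIN venturepatentmatched v ON v.std_org_name = s.std_org_name;
    """)
```

Because these are inner joins, patents whose assignee has no venture match (patent 3 in the toy data) drop out of fourthjoin_cleaned.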
- Final summary tables
- Summary table displaying number of patents owned, minimum grant year, maximum grant year and average grant year for each company
E:\McNair\Projects\Venture One Data\venturepatentreallyfinal.txt
- A table of all patent information for each company that has patents
E:\McNair\Projects\Venture One Data\venturepatentfullyjoined.txt
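A summary that also lists companies with no patents needs a left join from the company list to the joined patent rows, followed by a group-by. The sketch below shows one way to compute the four statistics, again using SQLite from Python as a stand-in. The venture table and all column names are hypothetical, and this is not the query from the project script.

```python
import sqlite3

# LEFT JOIN keeps every venture company; COUNT(j.patent_number) counts
# only non-NULL patent numbers, so companies without patents get 0 and
# NULL (None in Python) for the year statistics.
SUMMARY_SQL = """
    SELECT v.venture_name,
           COUNT(j.patent_number) AS num_patents,
           MIN(j.grant_year)      AS min_grant_year,
           MAX(j.grant_year)      AS max_grant_year,
           AVG(j.grant_year)      AS avg_grant_year
    FROM venture v
    LEFT JOIN fourthjoin_cleaned j ON j.venture_name = v.venture_name
    GROUP BY v.venture_name
    ORDER BY v.venture_name
"""


def summarize(conn: sqlite3.Connection):
    """Return one summary row per venture company."""
    return conn.execute(SUMMARY_SQL).fetchall()
```

Counting a column from the right-hand table, rather than COUNT(*), is what keeps the patent count at zero for companies that have no matching rows.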
- Notes
- All data is in the allpatentsprocessed database. Access it by logging on to researcher@McNair DBServ:/bulk/allpatentsprocessed
- A script with the detailed processing procedure can be found at
E:\McNair\Projects\Venture One Data\patent data script.txt