Difference between revisions of "Matching VentureOne (Data)"

From edegan.com

Revision as of 11:40, 16 June 2016


McNair Project
Matching VentureOne (Data)
Copyright © 2016 edegan.com. All Rights Reserved.


Data Processing

  • Get the VentureOne data ready
  1. Source file for VentureOne data (the original data source): E:\McNair\Projects\Venture One Data\Venture Data 1.xlsx
  2. Clean it up by removing extraneous symbols and words: E:\McNair\Software\Scripts\Matcher\Input\Venture Data 1.txt
  3. Match it against itself to get standardized entity names: E:\McNair\Projects\Venture One Data\Cleaned and Matched Data.xlsx
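
The clean-up and self-match steps above can be sketched in Python. This is only an illustration, not the Matcher script itself: the noise-word list, the function names, and the "first name seen wins" rule are all assumptions, since the page does not describe the script's actual rules.

```python
import re
from collections import defaultdict

# Hypothetical noise words; the real Matcher script's list is not given on this page.
NOISE_WORDS = {"inc", "corp", "corporation", "llc", "ltd", "co", "company"}


def clean_name(raw):
    """Lowercase the name, turn symbols into spaces, and drop noise words."""
    text = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    words = [w for w in text.split() if w not in NOISE_WORDS]
    return " ".join(words)


def standardize(names):
    """Match a list against itself: names with the same cleaned key
    share one standardized name (assumed here to be the first one seen)."""
    groups = defaultdict(list)
    for name in names:
        groups[clean_name(name)].append(name)
    return {name: members[0] for members in groups.values() for name in members}
```

For example, `standardize(["Acme, Inc.", "ACME Corp"])` maps both spellings to `"Acme, Inc."` because both clean to the key `acme`.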


  • Get the patent data ready
  1. Draw the distinct assignees: Z:\allpatentsprocessed\DistinctAssignees2.txt
  2. Match them against themselves to get standardized org names for the patent data: Z:\allpatentsprocessed\DistinctAssignees2matched.txt
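
Drawing the distinct assignees amounts to de-duplicating the assignee names. A minimal sketch follows, assuming one assignee name per input line; the function name is hypothetical and the actual layout of the patent files is not described on this page.

```python
def distinct_assignees(lines):
    """Return unique, non-empty assignee names in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

The function accepts any iterable of strings, so an open file handle can be passed directly.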


  • Match standardized org names of the patent data to standardized entity names of the venture data
    Z:\allpatentsprocessed\Venture Patent Matched.txt
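
A cross-match between the two standardized lists might look like the sketch below. It assumes an exact match on a case- and whitespace-insensitive key; the Matcher script may use different or looser rules, and `match_names` is a hypothetical name.

```python
def match_names(patent_orgs, venture_entities):
    """Return (patent_org, venture_entity) pairs whose normalized keys are equal."""

    def key(name):
        # Case-insensitive, with runs of whitespace collapsed to one space.
        return " ".join(name.lower().split())

    by_key = {}
    for entity in venture_entities:
        # Keep the first venture entity seen for each key.
        by_key.setdefault(key(entity), entity)

    return [(org, by_key[key(org)]) for org in patent_orgs if key(org) in by_key]
```

Patent org names with no counterpart in the venture list are simply left out of the result.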


  • Join patent data to venture data to get patent information for each venture-backed company
  1. Join patent data to assignee data, creating firstjoin_cleaned
  2. Join firstjoin_cleaned data to matchassignee data, creating secondjoin_cleaned
  3. Join secondjoin_cleaned data to venturepatentmatched data, creating fourthjoin_cleaned
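
The three joins above can be sketched with Python's built-in `sqlite3` module. The table names come from this page, but every column name (`patent_number`, `grant_year`, `assignee_name`, `std_org_name`, `venture_name`) and the sample rows are assumptions made for illustration; the database engine used in the project is not stated here.

```python
import sqlite3


def build_joins(conn):
    """Create the three intermediate tables with inner joins."""
    conn.executescript("""
        CREATE TABLE firstjoin_cleaned AS
            SELECT p.patent_number, p.grant_year, a.assignee_name
            FROM patent p
            JOIN assignee a ON a.patent_number = p.patent_number;

        CREATE TABLE secondjoin_cleaned AS
            SELECT f.patent_number, f.grant_year, f.assignee_name, m.std_org_name
            FROM firstjoin_cleaned f
            JOIN matchassignee m ON m.assignee_name = f.assignee_name;

        CREATE TABLE fourthjoin_cleaned AS
            SELECT s.patent_number, s.grant_year, s.std_org_name, v.venture_name
            FROM secondjoin_cleaned s
            JOIN venturepatentmatched v ON v.std_org_name = s.std_org_name;
    """)


def demo():
    """Run the joins on a tiny in-memory data set (all values invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patent (patent_number INTEGER, grant_year INTEGER);
        CREATE TABLE assignee (patent_number INTEGER, assignee_name TEXT);
        CREATE TABLE matchassignee (assignee_name TEXT, std_org_name TEXT);
        CREATE TABLE venturepatentmatched (std_org_name TEXT, venture_name TEXT);
        INSERT INTO patent VALUES (1, 2001), (2, 2005), (3, 2010);
        INSERT INTO assignee VALUES (1, 'Acme Inc'), (2, 'ACME Corp'), (3, 'Other Co');
        INSERT INTO matchassignee VALUES
            ('Acme Inc', 'acme'), ('ACME Corp', 'acme'), ('Other Co', 'other');
        INSERT INTO venturepatentmatched VALUES ('acme', 'Acme');
    """)
    build_joins(conn)
    rows = conn.execute(
        "SELECT venture_name, patent_number, grant_year "
        "FROM fourthjoin_cleaned ORDER BY patent_number"
    ).fetchall()
    conn.close()
    return rows
```

Inner joins are used throughout, so patents whose assignee has no venture match drop out of `fourthjoin_cleaned`.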


  • Final summary Tables
  1. Summary table showing the number of patents and the minimum, maximum, and average grant year for each company: E:\McNair\Projects\Venture One Data\venturepatentreallyfinal.txt
  2. A table of all patent information for each company that has at least one patent: E:\McNair\Projects\Venture One Data\venturepatentfullyjoined.txt
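
The per-company summary described in item 1 could be computed as in the sketch below. It assumes the joined data is available as `(company, grant_year)` pairs; the function name and input shape are hypothetical.

```python
from collections import defaultdict


def summarize(rows):
    """Map each company to (patent count, min year, max year, average year)."""
    years = defaultdict(list)
    for company, year in rows:
        years[company].append(year)
    return {
        company: (len(ys), min(ys), max(ys), sum(ys) / len(ys))
        for company, ys in years.items()
    }
```

Companies with no patents never appear in the input pairs, so they are absent from the summary as well.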


  • Notes