Difference between revisions of "Matching VentureOne (Data)"

From edegan.com
Jump to navigation Jump to search
 
(41 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{McNair Projects
+
{{Project
|Project Title=Matching VentureOne (Data)
+
|Has project output=
|Topic Area=Patents and Innovation
+
|Has title=Matching VentureOne (Data)
|Owner=Ariel Sun, Rosemarie Ziedonis
+
|Has owner=Ariel Sun, Rosemarie Ziedonis
|Start Term=Summer 2016
+
|Has start date=Summer 2016
|Status=Active
+
|Due Date=NA
|Deliverable=Other
+
|Has sponsor=McNair Center
|Primary Billing= AccMcNair01
+
|Has notes=
 +
|Is dependent on=
 +
|Depends upon it=
 +
|Has project status=Complete
 
}}
 
}}
 +
=Updated=
 +
'''New Requirements'''
 +
*re-run the match using *both* name-related fields in the startups_cl.dta file:  “name” and “name_prev”.
 +
#the latter field pulls in patents applied for under a former name of the same company
  
 +
*in the output file, please include…
 +
#include the field “entityid” that corresponds to each startup (this step is critical; else, we can’t link patents filed under alternative names of the company to the same firm); in startups_cl.dta
 +
#all assignee-related fields in your patent data (e.g., assignee name, and any original and current uspto assignee codes listed for the patent); merge in from your patent files
 +
 +
'''Output'''
 +
 +
Text files are in:<code>E:\McNair\Projects\Venture One Data\</code>
 +
#<code>summarytablefinal</code>: summary on number of patents and grant year for all companies
 +
#<code>ullyjoinedtable</code>: all patent and assignee information for entities that have patents (combining 3 and 4)
 +
#<code>fullyjoinednow</code>: patent information under current name of the company
 +
#<code>fullyjoinedprev</code>: patent information under previous name of the company
 +
 +
A new version of sql script can be cound at :<code>E:\McNair\Projects\Venture One Data\sql script.txt</code>
 +
 +
'''notes'''
 +
  In summarytablefinal table: (for all entities)
 +
  Variables:
 +
  Entity Name|Standard Orgname|Number of patent|
 +
  Previous Name|Previous Standard Orgname|Previous Number of Patent|
 +
  Total number of Patent|
 +
  Orinigal ID | Revised ID|
 +
  min grant year|max grant year|avg grant year|
 +
  ***One company have the exactly same name for entity name and previous name(Z-KAT) and there is double counting of patent. So the total number of patent
 +
  should be 11 instead of 22.
 +
 +
  In fullyjoinedtable: (for entities that have patent)
 +
  Including all patent and assignee variables
 +
  33 variables in total
 +
  variables start with 'asg' are assignee information, e.g. asgtype =  assignee type
 +
  The rest are patent information.
 +
 +
=Old=
 
==Overview==
 
==Overview==
 
In this matching process, we will join patent data to VentureOne companies and count the number of patents that affiliated to each company.  
 
In this matching process, we will join patent data to VentureOne companies and count the number of patents that affiliated to each company.  
 
We first get the standard company names for VentureOne companies from the source VentureOne data set. Then we standardize the names of the companies that have patents from our patent database. Based on the common standard company names, we join patent information to VentureOne companies.
 
  
 
===Raw Data===
 
===Raw Data===
Line 18: Line 55:
 
*All Variables: EntityName,Employees, City, State, Zip, AreaCode, Business Status, IndustryGroup...etc
 
*All Variables: EntityName,Employees, City, State, Zip, AreaCode, Business Status, IndustryGroup...etc
 
*Variables used for matching: EntityName
 
*Variables used for matching: EntityName
 +
 +
Original patent data is in our database: <code>128.42.44.181/bulk/allpatent</code>
 +
 +
===Procedure===
 +
We first get the standard company names for VentureOne companies from the source VentureOne data set. Then we standardize the names of the companies that have patents from our patent database. Based on the common standard company names, we join patent information to VentureOne companies.
  
 
===Final Matched Tables===
 
===Final Matched Tables===
#Summary table displaying number of patents owned, minimum grant year, maximum grant year and average grant year for each company (including the ones that own no patents).It can be found at:<code>E:\McNair\Projects\Venture One Data\venturepatentreallyfinal.txt</code>
+
#Summary table displaying number of patents owned, minimum grant year, maximum grant year and average grant year for each company (including the ones that own no patents). It can be found at:<code>E:\McNair\Projects\Venture One Data\venturesummary.txt</code>
#A table contains all patent information for the companies that have patents and can be found at <code>E:\McNair\Projects\Venture One Data\venturepatentfullyjoined.txt</code>
+
#A table contains all patent information for the companies that have patents and can be found at <code>E:\McNair\Projects\Venture One Data\venturefullyjoined.txt</code>
 
 
  
 +
===Desired Variables===
  
 +
Below is the list of variables that were in the STATA file we were given:
 +
Contains data from C:\Users\ArielSun\Downloads\allpats_3sectors_06jun13.dta
 +
  obs:        19,409                         
 +
  vars:            36                          11 Jun 2016 17:31
 +
  size:    10,655,541                          (_dta has notes)
 +
----------------------------------------------------------------------------------------------------------------------
 +
              storage  display    value
 +
variable name  type    format    label      variable label
 +
----------------------------------------------------------------------------------------------------------------------
 +
id_vone        double  %9.0g                VentureOne id
 +
name            str39  %39s                  startup name
 +
patent          str9    %9s                  patno in string
 +
apn            str6    %6s                  pat application number
 +
nmi            str40  %40s                  inventor name
 +
ttl            str244  %40s                  invention title
 +
nma            str65  %65s                  original assignee
 +
ocd            str15  %15s                  main us patent class
 +
icd            str15  %15s                  main intl patent class
 +
apd            float  %td                  application date
 +
gdateold        float  %td                  Grant date
 +
fnd_year        float  %8.0g                startup founding year
 +
last_yr        float  %9.0g              * OLD last_yr, 2006; see notes
 +
source          byte    %8.0g                1 if 2012 delphion searches; else from 2004/5 search
 +
pdate          float  %td                  priority date, delphion; may pre-date application date if provisional apps
 +
utility        float  %9.0g              * 1 if utility patent as initially awarded; 0 if other (reissued, reexamed, design
 +
state_country  str3    %9s                  state/country of first inventor listed
 +
asscode        float  %9.0g                assignee code; basic.dta
 +
ayear          int    %9.0g                application year
 +
amonth          byte    %9.0g                application month
 +
atype          str1    %9s                * initial assignee type; see notes
 +
class          str3    %9s                  3 digit us pat class
 +
subclass        str6    %9s                  patent subclass
 +
gdate          int    %d                    grant, or issuance, date
 +
industry        str15  %15s                  semi, software, or med devices
 +
state_hq        str2    %9s                  firm hq location; vone
 +
status06        str4    %9s                * status of firm known in 2006; rhs truncation varies by sector
 +
exitdate        str8    %9s                  exit date, if known
 +
exityr          str4    %9s                  exit year, if known
 +
status08        str6    %9s                * status of firm in 2008, see notes
 +
last_yr08      int    %8.0g              * exityr if ipo/acq, else 2008
 +
dcohort        float  %9.0g                1 if founding yr during 1987-99
 +
lastyr08_minu~r float  %9.0g               
 +
dsearch_assign  float  %9.0g                1 if searches of pat assignment data need to be conducted; carlosn confirm?
 +
carlos_chk      float  %9.0g                carlos: pls confirm assignment data = compiled for these pats
 +
entityid        long    %12.0g                unique startup id as of 2008, vone
 +
                                            * indicated variables have notes
 +
----------------------------------------------------------------------------------------------------------------------
  
 
==Detailed Data Processing==
 
==Detailed Data Processing==
Line 51: Line 140:
 
#All data in <code>allpatentsprocessed database</code>. Access it by logging on to <code>researcher@McNair DBServ:/bulk/allpatentsprocessed</code>
 
#All data in <code>allpatentsprocessed database</code>. Access it by logging on to <code>researcher@McNair DBServ:/bulk/allpatentsprocessed</code>
 
#A script of detailed processing procedure can be found at <code>E:\McNair\Projects\Venture One Data\patent data script.txt</code>
 
#A script of detailed processing procedure can be found at <code>E:\McNair\Projects\Venture One Data\patent data script.txt</code>
 +
 +
==The matched data==
 +
 +
We are giving back two files:
 +
*One is at the patent level and contains information on 38,497 patents held by the 1,557 of the 3,357 companies.
 +
*The other file is at the company level and aggregate patent information for the 3,357 companies.
  
 
<includeonly>
 
<includeonly>
 
[[Category: McNair Projects]]
 
[[Category: McNair Projects]]
</includeonly><!-- flush -->
+
</includeonly><!-- flush flush --><!-- flush flush --><!-- flush flush --><!-- flush -->

Latest revision as of 12:34, 21 September 2020


Project
Matching VentureOne (Data)
Project logo 02.png
Project Information
Has title Matching VentureOne (Data)
Has owner Ariel Sun, Rosemarie Ziedonis
Has start date Summer 2016
Has deadline date
Has project status Complete
Has sponsor McNair Center
Copyright © 2019 edegan.com. All Rights Reserved.

Updated

New Requirements

  • re-run the match using *both* name-related fields in the startups_cl.dta file: “name” and “name_prev”.
  1. the latter field pulls in patents applied for under a former name of the same company
  • in the output file, please include…
  1. include the field “entityid” that corresponds to each startup (this step is critical; else, we can’t link patents filed under alternative names of the company to the same firm); in startups_cl.dta
  2. all assignee-related fields in your patent data (e.g., assignee name, and any original and current uspto assignee codes listed for the patent); merge in from your patent files

Output

Text files are in:E:\McNair\Projects\Venture One Data\

  1. summarytablefinal: summary on number of patents and grant year for all companies
  2. ullyjoinedtable: all patent and assignee information for entities that have patents (combining 3 and 4)
  3. fullyjoinednow: patent information under current name of the company
  4. fullyjoinedprev: patent information under previous name of the company

A new version of sql script can be cound at :E:\McNair\Projects\Venture One Data\sql script.txt

notes

 In summarytablefinal table: (for all entities)
 Variables:
 Entity Name|Standard Orgname|Number of patent|
 Previous Name|Previous Standard Orgname|Previous Number of Patent|
 Total number of Patent|
 Orinigal ID | Revised ID|
 min grant year|max grant year|avg grant year|
 ***One company have the exactly same name for entity name and previous name(Z-KAT) and there is double counting of patent. So the total number of patent
 should be 11 instead of 22.
 In fullyjoinedtable: (for entities that have patent)
 Including all patent and assignee variables
 33 variables in total
 variables start with 'asg' are assignee information, e.g. asgtype =  assignee type
 The rest are patent information.

Old

Overview

In this matching process, we will join patent data to VentureOne companies and count the number of patents that affiliated to each company.

Raw Data

Original data set of VentureOne companies can be found at: E:\McNair\Projects\Venture One Data\Venture Data 1.xlsx

  • All Variables: EntityName,Employees, City, State, Zip, AreaCode, Business Status, IndustryGroup...etc
  • Variables used for matching: EntityName

Original patent data is in our database: 128.42.44.181/bulk/allpatent

Procedure

We first get the standard company names for VentureOne companies from the source VentureOne data set. Then we standardize the names of the companies that have patents from our patent database. Based on the common standard company names, we join patent information to VentureOne companies.

Final Matched Tables

  1. Summary table displaying number of patents owned, minimum grant year, maximum grant year and average grant year for each company (including the ones that own no patents). It can be found at:E:\McNair\Projects\Venture One Data\venturesummary.txt
  2. A table contains all patent information for the companies that have patents and can be found at E:\McNair\Projects\Venture One Data\venturefullyjoined.txt

Desired Variables

Below is the list of variables that were in the STATA file we were given:

Contains data from C:\Users\ArielSun\Downloads\allpats_3sectors_06jun13.dta
  obs:        19,409                          
 vars:            36                          11 Jun 2016 17:31
 size:    10,655,541                          (_dta has notes)
----------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id_vone         double  %9.0g                 VentureOne id
name            str39   %39s                  startup name
patent          str9    %9s                   patno in string
apn             str6    %6s                   pat application number
nmi             str40   %40s                  inventor name
ttl             str244  %40s                  invention title
nma             str65   %65s                  original assignee
ocd             str15   %15s                  main us patent class
icd             str15   %15s                  main intl patent class
apd             float   %td                   application date
gdateold        float   %td                   Grant date
fnd_year        float   %8.0g                 startup founding year
last_yr         float   %9.0g               * OLD last_yr, 2006; see notes
source          byte    %8.0g                 1 if 2012 delphion searches; else from 2004/5 search
pdate           float   %td                   priority date, delphion; may pre-date application date if provisional apps
utility         float   %9.0g               * 1 if utility patent as initially awarded; 0 if other (reissued, reexamed, design
state_country   str3    %9s                   state/country of first inventor listed
asscode         float   %9.0g                 assignee code; basic.dta
ayear           int     %9.0g                 application year
amonth          byte    %9.0g                 application month
atype           str1    %9s                 * initial assignee type; see notes
class           str3    %9s                   3 digit us pat class
subclass        str6    %9s                   patent subclass
gdate           int     %d                    grant, or issuance, date
industry        str15   %15s                  semi, software, or med devices
state_hq        str2    %9s                   firm hq location; vone
status06        str4    %9s                 * status of firm known in 2006; rhs truncation varies by sector
exitdate        str8    %9s                   exit date, if known
exityr          str4    %9s                   exit year, if known
status08        str6    %9s                 * status of firm in 2008, see notes
last_yr08       int     %8.0g               * exityr if ipo/acq, else 2008
dcohort         float   %9.0g                 1 if founding yr during 1987-99
lastyr08_minu~r float   %9.0g                 
dsearch_assign  float   %9.0g                 1 if searches of pat assignment data need to be conducted; carlosn confirm?
carlos_chk      float   %9.0g                 carlos: pls confirm assignment data = compiled for these pats
entityid        long    %12.0g                unique startup id as of 2008, vone
                                            * indicated variables have notes
----------------------------------------------------------------------------------------------------------------------

Detailed Data Processing

  • Get the VentureOne data ready
  1. Source file for VentureOne data E:\McNair\Projects\Venture One Data\Venture Data 1.xlsx Original data source
  2. Clean it up E:\McNair\Software\Scripts\Matcher\Input\Venture Data 1.txt extraneous symbols and words removed
  3. Match it against itself to get standardized entity names E:\McNair\Projects\Venture One Data\Cleaned and Matched Data.xlsx
  • Get the patent data ready
  1. Draw the distinct assignees Z:\allpatentsprocessed\DistinctAssignees2.txt
  2. Match them against themselves to get standardized org names for patent data Z:\allpatentsprocessed\DistinctAssignees2matched.txt
  • Match standardized org names of patent data to standardized entity names of venture data
Z:\allpatentsprocessed\Venture Patent Matched.txt
  • Join patent data to venture data to get patent information of each venture-backed company
  1. Join patent data to assignee data, creating firstjoin_cleaned which matches assignees to patent numbers.
  2. Join firstjoin_cleaned data to matchassignee data, creating secondjoin_cleaned which matches standard org names to patent numbers
  3. Join secondjoin_cleaned data to venturepatentmatched data, creating fourthjoin_cleaned which matches standard venture company names to patent numbers
  • Final summary tables
  1. Summary table displaying number of patents owned, minimum grant year, maximum grant year and average grant year for each company E:\McNair\Projects\Venture One Data\venturepatentreallyfinal.txt
  2. A table of all patent information for each company that has patents E:\McNair\Projects\Venture One Data\venturepatentfullyjoined.txt
  • Notes
  1. All data in allpatentsprocessed database. Access it by logging on to researcher@McNair DBServ:/bulk/allpatentsprocessed
  2. A script of detailed processing procedure can be found at E:\McNair\Projects\Venture One Data\patent data script.txt

The matched data

We are giving back two files:

  • One is at the patent level and contains information on 38,497 patents held by the 1,557 of the 3,357 companies.
  • The other file is at the company level and aggregate patent information for the 3,357 companies.