Difference between revisions of "Patent Data Specifications"

From edegan.com
Jump to navigation Jump to search
 
(39 intermediate revisions by one other user not shown)
Line 1: Line 1:
 
Return to [[Patent Data Wiki Page]].  
 
Return to [[Patent Data Wiki Page]].  
 
==Background==
 
 
<section begin="pds" />
 
<section begin="pds" />
The patent database contains the merged dataset from the USPTO bulk data and Harvard Dataverse. This page explains the structure of the database and gives a complete list of tables. Each table has its own page which defines the table variables and their importance. Additionally, any known issues with a specific table is recorded on the table page.
+
The patent data has been separated into multiple databases based on data source or subject matter. Each database consists of several tables for which the known issues have been recorded For each database, the known issues with any table in the database have been recorded. The Patent Database contains the merged datasets from the USPTO bulk data and Harvard Dataverse using SQL. Specifics on how the datasets were merged are given in [[Patent Data Processing - SQL Steps]]. The Patent Database focuses on patents, patent litigation, patent maintenance, patent assignment, and other details on patent owners. The USPTO Assignees Database (version 2) focuses on patent assignments, a transaction between one or more patent owners with one or more parties where ownership or interest in one or more patents is assigned or shared. The database consists of historical assignment data provided by the USPTO in XML files. Specifics on how the database are given on the [[USPTO Assignees Data]] Page. <section end="pds" />
<section end="pds" />
 
 
 
==Tables==
 
 
 
[[Assignee Table | assignee]]  
 
 
 
[[Citation Table | citation]]
 
 
 
[[Fee Table | fee]]
 
 
 
[[Fee Status Table | feestatus]]
 
 
 
[[HistPatent Table | histpatent]]
 
 
 
[[LexJudge | lexjudge]]
 
 
 
[[Match Orgname Table | matchorgname]]
 
 
 
[[MSA List Table | msalist]]
 
 
 
[[MSA Wage Table | msawage]]
 
 
 
[[Patent Table | patent]]
 
 
 
[[PTO Assignee Table | PTOassignee]]
 
 
 
[[PTO Assignment | PTOassignment]]
 
 
 
[[PTOassignor]]
 
 
 
[[PTOproperty]]
 
 
 
[[PTOpatentfile]]
 
 
 
[[stdorgname]]
 
 
 
==Data Issues=====Citations Table===
 
The table has two columns, 'citingpatentnumber' and 'citedpatentnumber'. There are rows with 'citedpatentnumber' greater than 10000000. For instance:
 
 
 
  citingpatentnumber | citedpatentnumber
 
--------------------+-------------------
 
            9226901 |      102005013726
 
            9226905 |        101332187
 
            9226905 |        2006528175
 
            9226905 |        2011513304
 
            9226905 |    1020090061010
 
            9226905 |    1020110049808
 
            9226905 |        2010126349
 
            9226909 |        101340916
 
            9226909 |          10128910
 
            9226914 |        102318827
 
            9226914 |        2014109862
 
            9226915 |          10111049
 
            9226918 |        2008005345
 
            9226918 |        2008005345
 
            9226918 |        2008077092
 
            9226918 |        2008077092
 
            9226918 |        2008070268
 
            9226918 |        2008128126
 
            9226921 |        2008129994
 
            9226922 |        2012000595
 
            9226922 |        2012058127
 
            9226923 |        2010135524
 
            9226930 |        102002040
 
            9226930 |          19507522
 
            9226930 |          50106981
 
            9226930 |          53082783
 
            9226930 |          57052334
 
            9226930 |          10029979
 
            9226930 |          10045750
 
            9226931 |        2008203212
 
            9226931 |        2010227111
 
            9226931 |        2012068515
 
            9226931 |        2013106565
 
            9226932 |        103458880
 
            9226933 |        2006257751
 
            9226933 |        101366734
 
            9226933 |        101396048
 
            9226933 |        101926831
 
            9226933 |          20202562
 
            9226933 |      202005009120
 
            9226933 |          61063618
 
            9226937 |          11510473
 
            9226937 |        2001506579
 
            9226937 |        2001519791
 
            9226937 |        2003119127
 
            9226937 |        2003517831
 
            9226937 |        2005099761
 
            9226937 |        2006076681
 
            9226937 |        2006078941
 
            9226937 |        2006079021
 
            9226937 |        2006094209
 
            9226937 |        2006094210
 
            9226937 |        2006094233
 
            9226937 |        2006094235
 
 
 
 
 
allpatent_clone=# SELECT  COUNT(*) FROM citations WHERE citedpatentnumber>10000000;
 
  count 
 
---------
 
1411140
 
(1 row)
 
 
 
 
 
  allpatent_clone=# SELECT  COUNT(*) FROM citations WHERE citedpatentnumber IS NULL;
 
  count 
 
----------
 
  23516667
 
(1 row)
 
 
 
allpatent_clone=# SELECT  COUNT(*) FROM citations;
 
  count 
 
----------
 
  97680838
 
(1 row)
 
 
 
====Possible Solution====
 
Jul 8, 2016: The blank citedpatentnumbers were created due to inconsistency between the original type and the type in the citation table (string to integer). The blank entries mostly correspond to publication number, non-U.S. patent number, and non-standardized patent number. The next step would be to recreate the table accounting for these issues.  The U.S. publication number could be matched to the publications numbers in the histpatent table and be replaced by the corresponding patent numbers.
 
 
 
The foreign cited patents will be moved to a separate table.
 
 
 
===Assignees Table===
 
'Country' is missing.
 
 
 
allpatent_clone=# SELECT COUNT(*) FROM assignees WHERE country=' ';
 
  count 
 
---------
 
2361543
 
(1 row)
 
 
 
 
 
allpatent_clone=# SELECT COUNT(*) FROM assignees WHERE country='unknown';
 
count
 
-------
 
  3918
 
(1 row)
 
 
 
allpatent_clone=# SELECT COUNT(*) FROM assignees WHERE country IN ('unknown',' ') AND state=' ';
 
  count 
 
---------
 
1851353
 
(1 row)
 
 
 
 
 
UPDATE: The source of the problem seems to be the Harvard Dataverse.
 
 
 
===Assignees & assigneesUSU Tables===
 
 
 
No information provided about the assignee. No entries for orgname or first and last names.
 
  
allpatent_clone=# SELECT COUNT(patentnumber) FROM assignees WHERE (orgname='' OR orgname IS NULL) AND (firstname='' OR firstname IS NULL) AND (lastname='' OR lastname IS NULL);
+
==Databases==
count 
 
--------
 
344794
 
(1 row)
 
  
AssigneesUSU table was made with the following code:
+
[[Patent]]
  
SELECT orgname, patentnumber, country, firstname, lastname, state
+
[[USPTOAssigneesData]]
INTO assigneesUSU -- assignees in US,unknown, and blank entries
 
FROM assignees
 
WHERE country IN ('US', '', 'unknown') OR (state IS NOT NULL AND state!='')
 
ORDER BY orgname;
 
  
allpatent_clone=# SELECT COUNT(patentnumber) FROM assigneesUSU WHERE (orgname='' OR orgname IS NULL) AND (firstname='' OR firstname IS NULL) AND (lastname='' OR lastname IS NULL)
+
==References==
count 
+
<ref name=IEEE > [https://www.ieee.org/documents/ieee_why_inventors_reference.pdf] 'Why do Inventors Reference Papers and Patents in their Patent Applications?', IEEE (2010). </ref>
--------
+
[[Category: Internal]]
344793
+
[[Internal Classification: Legacy| ]]
(1 row)
 

Latest revision as of 16:51, 2 September 2016

Return to Patent Data Wiki Page.

The patent data has been separated into multiple databases based on data source or subject matter. Each database consists of several tables for which the known issues have been recorded For each database, the known issues with any table in the database have been recorded. The Patent Database contains the merged datasets from the USPTO bulk data and Harvard Dataverse using SQL. Specifics on how the datasets were merged are given in Patent Data Processing - SQL Steps. The Patent Database focuses on patents, patent litigation, patent maintenance, patent assignment, and other details on patent owners. The USPTO Assignees Database (version 2) focuses on patent assignments, a transaction between one or more patent owners with one or more parties where ownership or interest in one or more patents is assigned or shared. The database consists of historical assignment data provided by the USPTO in XML files. Specifics on how the database are given on the USPTO Assignees Data Page.

Databases

Patent

USPTOAssigneesData

References

[1]

  1. [1] 'Why do Inventors Reference Papers and Patents in their Patent Applications?', IEEE (2010).