|
|
(39 intermediate revisions by one other user not shown) |
Line 1: |
Line 1: |
| Return to [[Patent Data Wiki Page]]. | | Return to [[Patent Data Wiki Page]]. |
− |
| |
− | ==Background==
| |
| <section begin="pds" /> | | <section begin="pds" /> |
− | The patent database contains the merged dataset from the USPTO bulk data and Harvard Dataverse. This page explains the structure of the database and gives a complete list of tables. Each table has its own page which defines the table variables and their importance. Additionally, any known issues with a specific table is recorded on the table page. | + | The patent data has been separated into multiple databases based on data source or subject matter. Each database consists of several tables for which the known issues have been recorded For each database, the known issues with any table in the database have been recorded. The Patent Database contains the merged datasets from the USPTO bulk data and Harvard Dataverse using SQL. Specifics on how the datasets were merged are given in [[Patent Data Processing - SQL Steps]]. The Patent Database focuses on patents, patent litigation, patent maintenance, patent assignment, and other details on patent owners. The USPTO Assignees Database (version 2) focuses on patent assignments, a transaction between one or more patent owners with one or more parties where ownership or interest in one or more patents is assigned or shared. The database consists of historical assignment data provided by the USPTO in XML files. Specifics on how the database are given on the [[USPTO Assignees Data]] Page. <section end="pds" /> |
− | <section end="pds" />
| |
− | | |
− | ==Tables==
| |
− | | |
− | [[Assignee Table | assignee]] | |
− | | |
− | [[Citation Table | citation]]
| |
− | | |
− | [[Fee Table | fee]]
| |
− | | |
− | [[Fee Status Table | feestatus]]
| |
− | | |
− | [[HistPatent Table | histpatent]]
| |
− | | |
− | [[LexJudge | lexjudge]]
| |
− | | |
− | [[Match Orgname Table | matchorgname]]
| |
− | | |
− | [[MSA List Table | msalist]]
| |
− | | |
− | [[MSA Wage Table | msawage]]
| |
− | | |
− | [[Patent Table | patent]]
| |
− | | |
− | [[PTO Assignee Table | PTOassignee]]
| |
− | | |
− | [[PTO Assignment | PTOassignment]]
| |
− | | |
− | [[PTOassignor]]
| |
− | | |
− | [[PTOproperty]]
| |
− | | |
− | [[PTOpatentfile]]
| |
− | | |
− | [[stdorgname]]
| |
− | | |
− | ==Data Issues=====Citations Table===
| |
− | The table has two columns, 'citingpatentnumber' and 'citedpatentnumber'. There are rows with 'citedpatentnumber' greater than 10000000. For instance:
| |
− | | |
− | citingpatentnumber | citedpatentnumber
| |
− | --------------------+-------------------
| |
− | 9226901 | 102005013726
| |
− | 9226905 | 101332187
| |
− | 9226905 | 2006528175
| |
− | 9226905 | 2011513304
| |
− | 9226905 | 1020090061010
| |
− | 9226905 | 1020110049808
| |
− | 9226905 | 2010126349
| |
− | 9226909 | 101340916
| |
− | 9226909 | 10128910
| |
− | 9226914 | 102318827
| |
− | 9226914 | 2014109862
| |
− | 9226915 | 10111049
| |
− | 9226918 | 2008005345
| |
− | 9226918 | 2008005345
| |
− | 9226918 | 2008077092
| |
− | 9226918 | 2008077092
| |
− | 9226918 | 2008070268
| |
− | 9226918 | 2008128126
| |
− | 9226921 | 2008129994
| |
− | 9226922 | 2012000595
| |
− | 9226922 | 2012058127
| |
− | 9226923 | 2010135524
| |
− | 9226930 | 102002040
| |
− | 9226930 | 19507522
| |
− | 9226930 | 50106981
| |
− | 9226930 | 53082783
| |
− | 9226930 | 57052334
| |
− | 9226930 | 10029979
| |
− | 9226930 | 10045750
| |
− | 9226931 | 2008203212
| |
− | 9226931 | 2010227111
| |
− | 9226931 | 2012068515
| |
− | 9226931 | 2013106565
| |
− | 9226932 | 103458880
| |
− | 9226933 | 2006257751
| |
− | 9226933 | 101366734
| |
− | 9226933 | 101396048
| |
− | 9226933 | 101926831
| |
− | 9226933 | 20202562
| |
− | 9226933 | 202005009120
| |
− | 9226933 | 61063618
| |
− | 9226937 | 11510473
| |
− | 9226937 | 2001506579
| |
− | 9226937 | 2001519791
| |
− | 9226937 | 2003119127
| |
− | 9226937 | 2003517831
| |
− | 9226937 | 2005099761
| |
− | 9226937 | 2006076681
| |
− | 9226937 | 2006078941
| |
− | 9226937 | 2006079021
| |
− | 9226937 | 2006094209
| |
− | 9226937 | 2006094210
| |
− | 9226937 | 2006094233
| |
− | 9226937 | 2006094235
| |
− | | |
− | | |
− | allpatent_clone=# SELECT COUNT(*) FROM citations WHERE citedpatentnumber>10000000;
| |
− | count
| |
− | ---------
| |
− | 1411140
| |
− | (1 row)
| |
− | | |
− | | |
− | allpatent_clone=# SELECT COUNT(*) FROM citations WHERE citedpatentnumber IS NULL;
| |
− | count
| |
− | ----------
| |
− | 23516667
| |
− | (1 row)
| |
− | | |
− | allpatent_clone=# SELECT COUNT(*) FROM citations;
| |
− | count
| |
− | ----------
| |
− | 97680838
| |
− | (1 row)
| |
− | | |
− | ====Possible Solution====
| |
− | Jul 8, 2016: The blank citedpatentnumbers were created due to inconsistency between the original type and the type in the citation table (string to integer). The blank entries mostly correspond to publication number, non-U.S. patent number, and non-standardized patent number. The next step would be to recreate the table accounting for these issues. The U.S. publication number could be matched to the publications numbers in the histpatent table and be replaced by the corresponding patent numbers.
| |
− | | |
− | The foreign cited patents will be moved to a separate table.
| |
− | | |
− | ===Assignees Table===
| |
− | 'Country' is missing.
| |
− | | |
− | allpatent_clone=# SELECT COUNT(*) FROM assignees WHERE country=' ';
| |
− | count
| |
− | ---------
| |
− | 2361543
| |
− | (1 row)
| |
− | | |
− | | |
− | allpatent_clone=# SELECT COUNT(*) FROM assignees WHERE country='unknown';
| |
− | count
| |
− | -------
| |
− | 3918
| |
− | (1 row)
| |
− | | |
− | allpatent_clone=# SELECT COUNT(*) FROM assignees WHERE country IN ('unknown',' ') AND state=' ';
| |
− | count
| |
− | ---------
| |
− | 1851353
| |
− | (1 row)
| |
− | | |
− | | |
− | UPDATE: The source of the problem seems to be the Harvard Dataverse.
| |
− | | |
− | ===Assignees & assigneesUSU Tables=== | |
− | | |
− | No information provided about the assignee. No entries for orgname or first and last names.
| |
| | | |
− | allpatent_clone=# SELECT COUNT(patentnumber) FROM assignees WHERE (orgname='' OR orgname IS NULL) AND (firstname='' OR firstname IS NULL) AND (lastname='' OR lastname IS NULL);
| + | ==Databases== |
− | count
| |
− | --------
| |
− | 344794
| |
− | (1 row)
| |
| | | |
− | AssigneesUSU table was made with the following code:
| + | [[Patent]] |
| | | |
− | SELECT orgname, patentnumber, country, firstname, lastname, state
| + | [[USPTOAssigneesData]] |
− | INTO assigneesUSU -- assignees in US,unknown, and blank entries
| |
− | FROM assignees
| |
− | WHERE country IN ('US', '', 'unknown') OR (state IS NOT NULL AND state!='')
| |
− | ORDER BY orgname;
| |
| | | |
− | allpatent_clone=# SELECT COUNT(patentnumber) FROM assigneesUSU WHERE (orgname='' OR orgname IS NULL) AND (firstname='' OR firstname IS NULL) AND (lastname='' OR lastname IS NULL);
| + | ==References== |
− | count
| + | <ref name=IEEE > [https://www.ieee.org/documents/ieee_why_inventors_reference.pdf] 'Why do Inventors Reference Papers and Patents in their Patent Applications?', IEEE (2010). </ref> |
− | --------
| + | [[Category: Internal]] |
− | 344793
| + | [[Internal Classification: Legacy| ]] |
− | (1 row)
| |