Difference between revisions of "Data Model (Deprecated)"
Latest revision as of 08:55, 24 May 2017
See Patent for current implementation
The USPTO and Harvard Dataverse data has been combined into one database spanning from 1901 to 2016. The image below summarizes the data model the database was constructed to follow.
XML Schema
Tags we are using:
- CPC Classification: https://en.wikipedia.org/wiki/Cooperative_Patent_Classification
- IPC - these are good and we just need the main classification
- USPC - stored as a single unsplit number, so the class/subclass boundary is ambiguous: is 22431 224/31 or 22/431, etc.?
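To make the USPC ambiguity concrete, the sketch below lists every way an unsplit digit string could be divided into class/subclass. The function name `candidate_uspc_splits` is hypothetical and not part of the database code; it only illustrates why the raw value cannot be split without more information.

```python
from __future__ import annotations


def candidate_uspc_splits(code: str) -> list[str]:
    """Return every possible class/subclass split of an unsplit USPC string."""
    # Each position between two characters is a possible class/subclass boundary.
    return [f"{code[:i]}/{code[i:]}" for i in range(1, len(code))]


splits = candidate_uspc_splits("22431")
# splits == ["2/2431", "22/431", "224/31", "2243/1"]
```

A five-digit value already yields four candidates, so the correct split would have to come from another source.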
Tags we aren't using:
- Kind codes: http://www.uspto.gov/learning-and-resources/support-centers/electronic-business-center/kind-codes-included-uspto-patent
- Series codes: http://www.uspto.gov/web/offices/ac/ido/oeip/taf/filingyr.htm
Fields of Interest
In order to satisfy the data model, the following fields were of particular interest when extracting the data from the XML files and placing them in tables.
- patent number
- kind: http://www.uspto.gov/patents-application-process/patent-search/authority-files/uspto-kind-codes
- grantdate
- type
- applicationnumber
- filingdate
For priority, if there is more than 1, we want sequence 01
- prioritydate
- prioritycountry (should use ISO country codes - may need a lookup table)
- prioritypatentnumber
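The priority rule above can be sketched as follows. This is a minimal illustration, assuming each priority claim has already been read into a dict with a `sequence` key alongside the three fields listed; the `COUNTRY_TO_ISO` table and the sample values are made up for the example and would need to be replaced by a real lookup table.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical lookup table (assumption): country names to ISO 3166-1 alpha-2 codes.
COUNTRY_TO_ISO = {"Japan": "JP", "Germany": "DE", "United Kingdom": "GB"}


def pick_priority(claims: list[dict]) -> Optional[dict]:
    """Return the single claim, or the one with sequence 01 when there are several."""
    if not claims:
        return None
    if len(claims) == 1:
        chosen = claims[0]
    else:
        chosen = next((c for c in claims if c["sequence"] == "01"), None)
        if chosen is None:
            return None
    # Map the country to its ISO code when known; otherwise keep the raw value.
    country = chosen["prioritycountry"]
    return {**chosen, "prioritycountry": COUNTRY_TO_ISO.get(country, country)}


# Illustrative data only, not taken from the database.
claims = [
    {"sequence": "02", "prioritydate": "2001-02-03",
     "prioritycountry": "Germany", "prioritypatentnumber": "X2"},
    {"sequence": "01", "prioritydate": "2000-01-02",
     "prioritycountry": "Japan", "prioritypatentnumber": "X1"},
]
first = pick_priority(claims)
```

Leaving unknown countries unchanged, rather than raising an error, makes gaps in the lookup table easy to find afterwards.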
Classification IPC
Classification CPC - we only need the main one
CPC is a classification scheme set up by the USPTO and the European Patent Office (EPO). The first classification codes rolled out on November 9, 2012.[1] Full implementation of the CPC classification system occurred in January 2015, at the same time as version 4.5 of the USPTO patent bulk data.[2]
- Section, Class, Subclass
- Main Group, Subgroup
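As an illustration of the five CPC components listed above, the sketch below splits a CPC symbol written in the common text form (for example `H04L 9/32`) into its parts. It assumes that text form as input; the bulk XML may already provide the components in separate elements, in which case no parsing is needed. `CpcSymbol` and `parse_cpc` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class CpcSymbol(NamedTuple):
    section: str     # one letter, A-H or Y
    cpc_class: str   # two digits
    subclass: str    # one letter
    main_group: str  # one to four digits
    subgroup: str    # two or more digits


# Assumed text layout: section, class, subclass, optional space, main group "/" subgroup.
_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,})$")


def parse_cpc(symbol: str) -> Optional[CpcSymbol]:
    """Split a CPC symbol into its five components, or return None if it does not match."""
    match = _CPC_PATTERN.match(symbol.strip())
    return CpcSymbol(*match.groups()) if match else None


parsed = parse_cpc("H04L 9/32")
# parsed == CpcSymbol("H", "04", "L", "9", "32")
```

The components are kept as strings so that leading zeros, as in class `04`, are preserved.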
Classification National: Note that the one below comes out to 2/2.11 (http://www.google.com/patents/US8925112#classifications)
- Country
- Class
Title of the patent
Number of Claims
Primary examiner:
- FirstName, LastName, Department
PCT/Regional Patent Number:
Patent Citations (we need all of them):
- CitingPatentNumber (from the patent)
- CitingPatentCountry (from the patent)
- CitedPatentNumber
- CitedPatentCountry
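One possible shape for the citation rows, with one row per citing/cited pair, is sketched below. The `Citation` class and `citation_rows` helper are hypothetical, and the cited numbers in the usage lines are placeholders rather than real citations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    citing_patent_number: str
    citing_patent_country: str
    cited_patent_number: str
    cited_patent_country: str


def citation_rows(citing_number: str, citing_country: str,
                  cited: list[tuple[str, str]]) -> list[Citation]:
    """Build one row per cited (number, country) pair, keeping all of them."""
    return [Citation(citing_number, citing_country, number, country)
            for number, country in cited]


# Placeholder cited patents, for illustration only.
rows = citation_rows("8925112", "US", [("0000001", "US"), ("0000002", "EP")])
```

Repeating the citing number and country on every row means each row can be used on its own without a lookup back to the patent.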
For non-patent references, we are just going to count them:
- NoNonPatRefs
Inventors
- PatentNumber (and country) to build a key
- We need a standard name and address object for each inventor
Assignees
- PatentNumber (and country) to build a key
- We need a "standard" name and address object for each assignee
For further information on Assignee data from the USPTO, see USPTO Assignees Data.
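A "standard" name and address object keyed by patent number and country might look like the sketch below. The `Party` class, the `standardize` rule (collapse whitespace, uppercase), and the key format are all assumptions made for illustration; the actual standardization rules are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """An inventor or assignee attached to one patent."""
    patent_number: str
    patent_country: str
    name: str
    address: str

    @property
    def patent_key(self) -> str:
        # Assumed key format: country code followed by the patent number.
        return f"{self.patent_country}{self.patent_number}"


def standardize(text: str) -> str:
    """Collapse runs of whitespace and uppercase the text (assumed rule)."""
    return " ".join(text.split()).upper()


def make_party(patent_number: str, patent_country: str,
               name: str, address: str) -> Party:
    return Party(patent_number, patent_country.upper(),
                 standardize(name), standardize(address))


# Made-up name and address, for illustration only.
party = make_party("8925112", "us", "  Acme   Corp ", "1 Main St,  Houston")
# party.patent_key == "US8925112"
```

Using the same object for inventors and assignees keeps the key logic in one place.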
Fields with Potential
- Abstract
- Claims (other than their count)