[https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/15705 Harvard Dataverse]
==New Notes==
The source files have moved from:
*https://www.google.com/googlebooks/uspto-patents-grants-text.html (no longer maintained)
to:
*https://bulkdata.uspto.gov/ (includes 2016 data)
The historic data is the same at both locations.
Each file contains, in order, sorted by document ID:
#Design patents (we will discard)
#Plant patents (we will discard)
#Reissues (we probably want them)
#Utility patents (we want them)
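The keep/discard rule in the list above could be sketched as follows. This is a minimal sketch, not the project's actual code. It assumes that document numbers in the files carry the usual USPTO prefixes (<code>D</code> for design, <code>PP</code> for plant, <code>RE</code> for reissue, digits only for utility). The names <code>patent_kind</code>, <code>keep</code> and <code>WANTED</code> are hypothetical, and the sample numbers in the usage note are made up for illustration.

```python
# Hypothetical helper: guess the patent type from the document number prefix.
# Assumption: D = design, PP = plant, RE = reissue, digits only = utility.
def patent_kind(doc_number: str) -> str:
    number = doc_number.strip().upper()
    if number.startswith("PP"):
        return "plant"
    if number.startswith("RE"):
        return "reissue"
    if number.startswith("D"):
        return "design"
    if number.isdigit():
        return "utility"
    # Anything else (unrecognised prefix) is left for manual inspection.
    return "other"


# Types we intend to keep, per the list above.
WANTED = {"utility", "reissue"}


def keep(doc_number: str) -> bool:
    """Return True if the document should be retained."""
    return patent_kind(doc_number) in WANTED
```

For example, <code>keep("RE045678")</code> and <code>keep("09231234")</code> would both be true, while <code>keep("D0745678")</code> and <code>keep("PP026123")</code> would be false.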
The classifications in the XML file are:
*IPC - these are good; we need only the main classification
*CPC - as for IPC
*USPC - a single numeric string with no separator between class and subclass, so it is ambiguous: is 22431 224/31 or 22/431, etc.?
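To make the USPC ambiguity concrete, the sketch below lists every class/subclass reading of an unsplit code. It is only an illustration: the function name <code>candidate_splits</code> is hypothetical, and the limit of three digits for the class part is an assumption, not something confirmed from the XML.

```python
# Hypothetical helper: list the possible class/subclass readings of an
# unsplit USPC string. Assumption: the class part is at most 3 digits long.
def candidate_splits(code: str, max_class_len: int = 3) -> list[str]:
    code = code.strip()
    splits = []
    for i in range(1, max_class_len + 1):
        # The subclass part must not be empty.
        if i >= len(code):
            break
        splits.append(f"{code[:i]}/{code[i:]}")
    return splits


print(candidate_splits("22431"))
```

Under that assumption, 22431 has three readings (2/2431, 22/431 and 224/31), so the string alone does not determine the classification.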