Go to https://bulkdata.uspto.gov/ to download bulk data from the USPTO.
To see a description of what each file in the USPTO bulk data contains, go to the bulk drive and navigate to McNair/Projects/Redesigning Patent Database/2017BulkDataProductDescriptions. This gives an overview, but does not explain how the XML files are structured; that structure is defined in the DTDs.
For assignment data, we pull from https://bulkdata.uspto.gov/data2/patent/assignment/. A DTD (Document Type Definition) describes all the elements that can appear in a USPTO assignment XML file. The one for the assignment data can be found in the bulk drive under McNair/Project/Redesigning Patent Database/USPTO Assignment DTD
I'm currently looking at the DTDs for USPTO patent data (DTDs exist only for 2005 and up, and they have to be opened with Microsoft Visual Studio) to ascertain whether there are any fields we should be pulling from the bulk data but currently are not. I am using the following link to figure out how to read a DTD: http://www.ldodds.com/delta/dtd_guide.html. For patents before 2005 there is no DTD, but there is a very long PDF that appears to detail the format of pre-2005 patent files. It has been saved under McNair/Projects/Redesigning Patent Database/
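One way to check for fields we are not yet pulling is to list every element a DTD declares and compare that list against the fields we already extract. The sketch below is only a rough illustration: it assumes the DTD can be read as plain text, it uses a simple regular expression rather than a real DTD parser, and the names `declared_elements`, `unused_elements`, `DTD_SAMPLE`, and `fields_pulled` are hypothetical. The sample declarations other than `abstract` are made up for the example.

```python
import re

# Matches declarations such as: <!ELEMENT name (content model)>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+(.*?)>", re.DOTALL)


def declared_elements(dtd_text):
    """Map each element name declared in the DTD text to its content model."""
    return {
        name: " ".join(model.split())  # collapse runs of whitespace
        for name, model in ELEMENT_DECL.findall(dtd_text)
    }


def unused_elements(dtd_text, fields_pulled):
    """Return declared element names that are not in fields_pulled, sorted."""
    return sorted(set(declared_elements(dtd_text)) - set(fields_pulled))


# Illustrative input only; a real DTD would be read from a file.
DTD_SAMPLE = """
<!--A concise summary of the disclosure.-->
<!ELEMENT abstract (doc-page+ | (abst-problem , abst-solution) | p+)>
<!ELEMENT invention-title (#PCDATA)>
<!ELEMENT city (#PCDATA)>
"""

# Hypothetical set of fields already extracted from the bulk data.
fields_pulled = {"invention-title", "city"}

missing = unused_elements(DTD_SAMPLE, fields_pulled)
print(missing)
```

A regular expression like this ignores entities and attribute lists, so it is only good for a first pass over element names.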
For example:

'''Pre - 2005'''

According to the documentation, the inventor's name and city are required to be listed, and the street, state, and country of the inventor may also be listed. Paragraphs of the abstract may also be listed under the logical group "Abstract" in a field called "abstract".

'''2005 - present'''

The USPTO patent data from 2005 onward seems to include multiple paragraphs for the abstract under an element called "abstract". I've included the line from the DTD below:
<!--A concise summary of the disclosure.-->
<!ELEMENT abstract (doc-page+ | (abst-problem , abst-solution) | p+)>
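The declaration above says an abstract holds exactly one of three things: one or more `doc-page` elements, an `abst-problem` followed by an `abst-solution`, or one or more `p` paragraphs. A minimal sketch of pulling the abstract text out of a patent XML document follows. It is not our actual extraction code: the function name `abstract_paragraphs` and the sample document are made up for illustration, and it assumes `doc-page` elements carry no inline text, so they are skipped.

```python
import xml.etree.ElementTree as ET

# Children of <abstract> that are expected to hold text.
TEXT_TAGS = ("p", "abst-problem", "abst-solution")


def abstract_paragraphs(xml_text):
    """Return the abstract as a list of text chunks, or [] if there is none."""
    root = ET.fromstring(xml_text)
    # The abstract may be the root itself or nested anywhere below it.
    abstract = root if root.tag == "abstract" else root.find(".//abstract")
    if abstract is None:
        return []
    chunks = []
    for child in abstract:
        if child.tag in TEXT_TAGS:
            # itertext() also collects text inside inline markup such as <b>.
            text = "".join(child.itertext())
            chunks.append(" ".join(text.split()))  # normalize whitespace
    return chunks


# Made-up sample document, not real USPTO data.
SAMPLE = """
<us-patent-grant>
  <abstract id="abstract">
    <p id="p-0001" num="0000">A widget with a <b>hinged</b> lid.</p>
    <p id="p-0002" num="0000">The lid locks when closed.</p>
  </abstract>
</us-patent-grant>
"""

paragraphs = abstract_paragraphs(SAMPLE)
print(paragraphs)
```

Returning a list rather than one joined string keeps the paragraph boundaries, which matters if we later want to store each paragraph separately.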
I will keep looking through the DTDs to figure out how far back we have abstracts. An abstract is required for all patents that are not design patents.
== Test Plan ==