There has been a lot of work on storing patent information in databases, including methods for cleaning the data and decisions about which data to include. Some of it is obsolete and some of it is incorrect. Generally, the newer pages will be the most relevant, but it can be helpful to see what was done in the past, especially since some methodology (such as data cleaning) hasn't changed much.
==Joe's Work==
Work (likely finished): Identified XPaths within the example XML files for utility, reissue, plant, and design patents, schema versions 4.0-4.5, from E:\McNair\Projects\SimplerPatentData\data\examples\granted. Only the granted folder was processed. Some XPaths were initially also saved as a text file in E:\McNair\Projects\SimplerPatentData\data\examples\Patent Schema Reconciliation. XPaths were identified for the following nodes:
* strings section:
** PATENT_TYPE
** TITLE
** PCT_DOCUMENT_NUMBER
** PATENT_COUNTRY
** PATENT_NUMBER
** PATENT_KIND
** PATENT_GRANT_DATE
** APPLICATION_NUMBER
** APPLICATION_FILING_DATE
** PRIORITY_CLAIMS_DATE
** PRIORITY_CLAIMS_COUNTRY
** PRIORITY_CLAIMS_PATENT_NUMBER
** IPCR_SUBCLASS
** IPCR_MAIN_GROUP
** IPCR_SUB_GROUP
** CPC_SUBCLASS
** CPC_MAIN_GROUP
** CPC_SUB_GROUP
** CLASSIFICATION_NATIONAL_COUNTRY
** CLASSIFICATION_NATIONAL_CLASS
** PRIMARY_EXAMINER_FIRST_NAME
** PRIMARY_EXAMINER_LAST_NAME
** PRIMARY_EXAMINER_DEPARTMENT
* numbers section:
** NUMBER_OF_CLAIMS
* applicants section:
** SEQUENCE
** LAST_NAME
** FIRST_NAME
** ORG_NAME
** CITY
** COUNTRY
** STATE
** ADDRESS
** POSTCODE
* citations section:
** CITATION_DESCRIPTION
** CITATION NUMBER
** NPL CITATION NUMBER
** COUNTRY
** CITATIONS DOC NUMBER
** CITATIONS KIND
** CITATIONS NAME
** CITATIONS DATE
** SEQUENCE
** LAST_NAME
** FIRST_NAME
** CITY
** COUNTRY
** STATE
** ADDRESS
** LAST_NAME
** FIRST_NAME
** ORG_NAME
** CITY
** COUNTRY
** STATE
** ADDRESS
* lawyers section:
** SEQUENCE
** FIRST_NAME
** ORG_NAME
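This page lists only the node names, not the XPaths themselves. The following minimal Python sketch, using the standard-library `xml.etree.ElementTree`, illustrates how a node-to-XPath mapping like this can drive extraction. It assumes element names from the USPTO us-patent-grant 4.x DTD as we understand them (for example `us-bibliographic-data-grant/publication-reference/document-id/doc-number`). These are not necessarily the paths Joe recorded, and the sample document is made up. The fallback between `us-parties/us-applicants/us-applicant` and `parties/applicants/applicant` is an assumption about how applicant elements differ across schema versions, and mapping PATENT_TYPE to the `appl-type` attribute is likewise an assumption. Only some of the nodes listed above are mapped.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like a (trimmed) USPTO us-patent-grant 4.x document.
SAMPLE = """<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference><document-id>
      <country>US</country><doc-number>09999999</doc-number>
      <kind>B2</kind><date>20100202</date>
    </document-id></publication-reference>
    <application-reference appl-type="utility"><document-id>
      <country>US</country><doc-number>11123456</doc-number><date>20060315</date>
    </document-id></application-reference>
    <invention-title id="d0e53">Widget assembly</invention-title>
    <number-of-claims>12</number-of-claims>
    <us-parties><us-applicants>
      <us-applicant sequence="001"><addressbook>
        <last-name>Doe</last-name><first-name>Jane</first-name>
        <address><city>Houston</city><state>TX</state><country>US</country></address>
      </addressbook></us-applicant>
    </us-applicants></us-parties>
    <examiners><primary-examiner>
      <last-name>Smith</last-name><first-name>John</first-name>
      <department>2816</department>
    </primary-examiner></examiners>
  </us-bibliographic-data-grant>
</us-patent-grant>"""

BIB = "us-bibliographic-data-grant"
PUB = f"{BIB}/publication-reference/document-id"
APP = f"{BIB}/application-reference/document-id"
EXAM = f"{BIB}/examiners/primary-examiner"

# Node name -> candidate paths (assumed element names), tried in order.
SCALAR_FIELDS = {
    "TITLE": [f"{BIB}/invention-title"],
    "PATENT_COUNTRY": [f"{PUB}/country"],
    "PATENT_NUMBER": [f"{PUB}/doc-number"],
    "PATENT_KIND": [f"{PUB}/kind"],
    "PATENT_GRANT_DATE": [f"{PUB}/date"],
    "APPLICATION_NUMBER": [f"{APP}/doc-number"],
    "APPLICATION_FILING_DATE": [f"{APP}/date"],
    "NUMBER_OF_CLAIMS": [f"{BIB}/number-of-claims"],
    "PRIMARY_EXAMINER_LAST_NAME": [f"{EXAM}/last-name"],
    "PRIMARY_EXAMINER_FIRST_NAME": [f"{EXAM}/first-name"],
    "PRIMARY_EXAMINER_DEPARTMENT": [f"{EXAM}/department"],
}

# Assumed: some versions wrap applicants in us-parties, others in parties.
APPLICANT_PATHS = [
    f"{BIB}/us-parties/us-applicants/us-applicant",
    f"{BIB}/parties/applicants/applicant",
]
APPLICANT_FIELDS = {
    "LAST_NAME": "addressbook/last-name",
    "FIRST_NAME": "addressbook/first-name",
    "ORG_NAME": "addressbook/orgname",
    "CITY": "addressbook/address/city",
    "STATE": "addressbook/address/state",
    "COUNTRY": "addressbook/address/country",
}


def first_text(node, paths):
    """Return the stripped text of the first path that matches, else None."""
    for path in paths:
        found = node.find(path)
        if found is not None and found.text:
            return found.text.strip()
    return None


def extract(root):
    """Build one flat record plus a list of applicant rows for a single patent."""
    record = {name: first_text(root, paths) for name, paths in SCALAR_FIELDS.items()}
    # ElementTree paths cannot select attributes, so read appl-type directly.
    # Assumption: PATENT_TYPE corresponds to application-reference/@appl-type.
    app_ref = root.find(f"{BIB}/application-reference")
    record["PATENT_TYPE"] = app_ref.get("appl-type") if app_ref is not None else None
    record["applicants"] = []
    for path in APPLICANT_PATHS:
        nodes = root.findall(path)
        if not nodes:
            continue  # this version uses a different wrapper; try the next path
        for node in nodes:
            row = {"SEQUENCE": node.get("sequence")}
            row.update({k: first_text(node, [p]) for k, p in APPLICANT_FIELDS.items()})
            record["applicants"].append(row)
        break
    return record


record = extract(ET.fromstring(SAMPLE))
print(record["PATENT_NUMBER"], record["PATENT_TYPE"], record["applicants"][0]["LAST_NAME"])
# prints: 09999999 utility Doe
```

Keeping the paths as data rather than hard-coding them in the parsing logic means the remaining nodes, or a version-specific alternative path, can be added by editing the tables. ElementTree supports only a subset of XPath (attribute steps such as `/@appl-type` are not allowed in paths), which is why the sketch reads PATENT_TYPE with `.get()`.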
==Shelby's Work==