Patent Schema Reconciliation
Latest revision as of 17:56, 14 November 2017
Example files
E:\McNair\Projects\SimplerPatentData\data\examples
There are two sets:
- Granted
- Applications
Applications contains only utility and some plant patents, whereas granted contains design, plant, reissue, and utility patents (i.e., all four types of patents). Both applications and granted have multiple versions (e.g., v4.5, v4.4, v4.3).
The Task
For both sets (starting with granted), all types, and all versions, we need to identify the xpath (or APS equivalent, see below) for each node.
A node is something like:
- patent number (it shows up as document_id)
- filing number (it also shows up as a document_id in another place)
- grant date
- kind
- type
- applicationnumber
- filingdate
Some nodes are lists of other nodes; for example, the assignees node contains multiple assignment records.
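As a rough illustration of what identifying the xpath for a node means, the sketch below pulls a few of the nodes listed above out of a small, made-up grant document using Python's standard xml.etree.ElementTree. The sample XML, the element names, and the names SAMPLE_GRANT, NODE_XPATHS, and extract_nodes are assumptions for illustration only; the real paths must be checked against the example files for each version and patent type.

```python
import xml.etree.ElementTree as ET

# Made-up sample document. The element names are assumptions and are not
# verified against the example files in data\examples.
SAMPLE_GRANT = """<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference>
      <document-id>
        <country>US</country>
        <doc-number>0000001</doc-number>
        <kind>B2</kind>
        <date>20170101</date>
      </document-id>
    </publication-reference>
    <application-reference appl-type="utility">
      <document-id>
        <country>US</country>
        <doc-number>00000002</doc-number>
        <date>20150101</date>
      </document-id>
    </application-reference>
  </us-bibliographic-data-grant>
</us-patent-grant>"""

BIB = "./us-bibliographic-data-grant"

# Hypothetical node -> (xpath, attribute) table. When attribute is None the
# element text is read; otherwise the named attribute is read.
NODE_XPATHS = {
    "patent number": (BIB + "/publication-reference/document-id/doc-number", None),
    "kind": (BIB + "/publication-reference/document-id/kind", None),
    "grant date": (BIB + "/publication-reference/document-id/date", None),
    "filing number": (BIB + "/application-reference/document-id/doc-number", None),
    "filing date": (BIB + "/application-reference/document-id/date", None),
    "type": (BIB + "/application-reference", "appl-type"),
}


def extract_nodes(xml_text, node_xpaths):
    """Return {node name: value or None} for each xpath in node_xpaths."""
    root = ET.fromstring(xml_text)
    values = {}
    for name, (path, attribute) in node_xpaths.items():
        element = root.find(path)
        if element is None:
            values[name] = None  # xpath does not exist in this document
        elif attribute is None:
            values[name] = element.text
        else:
            values[name] = element.get(attribute)
    return values
```

Note that the patent number and the filing number both resolve to a doc-number under a document-id, which is why the full path from the root is needed to tell them apart.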
Task Notes
Details from Joe Reilly Work Logs (log page)
Added XPaths for the following nodes to both Equivalent XPath and APS Queries and Patent Schema Reconciliation text.txt (in E:\McNair\Projects\SimplerPatentData\data\examples\Patent Schema Reconciliation). In Equivalent XPath and APS Queries, examples are also listed:
Granted
strings
PATENT_TYPE
TITLE
PCT_DOCUMENT_NUMBER
PATENT_COUNTRY
PATENT_NUMBER
PATENT_KIND
PATENT_GRANT_DATE
APPLICATION_NUMBER
APPLICATION_FILING_DATE
PRIORITY_CLAIMS_DATE
PRIORITY_CLAIMS_COUNTRY
PRIORITY_CLAIMS_PATENT_NUMBER
IPCR_SUBCLASS
IPCR_MAIN_GROUP
IPCR_SUB_GROUP
CPC_SUBCLASS
CPC_MAIN_GROUP
CPC_SUB_GROUP
CLASSIFICATION_NATIONAL_COUNTRY
CLASSIFICATION_NATIONAL_CLASS
PRIMARY_EXAMINER_FIRST_NAME
PRIMARY_EXAMINER_LAST_NAME
PRIMARY_EXAMINER_DEPARTMENT
numbers
NUMBER_OF_CLAIMS
applicants
SEQUENCE
LAST_NAME
FIRST_NAME
ORG_NAME
CITY
COUNTRY
STATE
ADDRESS
POSTCODE
citations
CITATION_DESCRIPTION
CITATION NUMBER
NPL CITATION NUMBER
COUNTRY
CITATIONS DOC NUMBER
CITATIONS KIND
CITATIONS NAME
CITATIONS DATE
SEQUENCE
LAST_NAME
FIRST_NAME
CITY
COUNTRY
STATE
ADDRESS
LAST_NAME
FIRST_NAME
ORG_NAME
CITY
COUNTRY
STATE
ADDRESS
lawyers
SEQUENCE
FIRST_NAME
ORG_NAME
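Because the same field can sit at different paths in different schema versions, one possible way to record the reconciliation is a list of candidate xpaths per field, tried in order. The sketch below assumes this layout; CANDIDATE_XPATHS, first_match, and all element names in it are hypothetical placeholders rather than paths taken from the example files.

```python
import xml.etree.ElementTree as ET

# Hypothetical candidates per field, one per schema layout. The paths are
# placeholders and must be replaced by the xpaths found in the real files.
CANDIDATE_XPATHS = {
    "PATENT_NUMBER": [
        "./us-bibliographic-data-grant/publication-reference/document-id/doc-number",
        "./bibliographic-data/document-id/doc-number",
    ],
    "TITLE": [
        "./us-bibliographic-data-grant/invention-title",
        "./bibliographic-data/title",
    ],
}


def first_match(root, candidates):
    """Return (xpath, text) for the first candidate that has text, else (None, None)."""
    for path in candidates:
        element = root.find(path)
        if element is not None and element.text:
            return path, element.text.strip()
    return None, None


def reconcile(xml_text, candidate_xpaths):
    """Map each field to the (xpath, value) pair that matched in this document."""
    root = ET.fromstring(xml_text)
    return {
        field: first_match(root, candidates)
        for field, candidates in candidate_xpaths.items()
    }
```

Running reconcile over one example file per version and type would show which candidate matched where, and fields that come back as (None, None) still need an xpath for that version.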
Useful links
The Equivalent_XPath_and_APS_Queries#Query_Equivalences page has example XPath statements
The Reproducible_Patent_Data#Schema_Reconciliation page shows which schemas are associated with which year
The Patent_Data_Extraction_Scripts_(Tool)#Utility_patent_grants_fields page has examples of nodes and where to find them for utility patents (XML version 4.4, I think).