{{Project
|Has project output=Tool
|Has sponsor=McNair Center
|Has title=Patent Data Extraction Scripts (Tool)
|Has owner=Marcela Interiano,
|Has project status=Subsume
|Has keywords=Tool
}}
 
===Patent applications===
 
Note that our application data appears to be ONLY utility patents, except for a few plant patents.
 
At the top level, in spec 4.0 (and presumably others) there are:
<us-patent-application lang="EN" dtd-version="v4.0 2004-12-02" file="US20050000001A1-20050106.XML"
status="PARALLEL-RUN" id="us-patent-application" country="US" date-produced="20041222" date-publ="20050106">
<us-bibliographic-data-application lang="EN" country="US">
...
</us-bibliographic-data-application>
<abstract id="abstract">
</abstract>
<drawings id="DRAWINGS">
</drawings>
<description id="description">
<?summary-of-invention description="Summary of Invention" end="lead"?>
<?summary-of-invention description="Summary of Invention" end="tail"?>
<?brief-description-of-drawings description="Brief Description of Drawings" end="lead"?>
<?brief-description-of-drawings description="Brief Description of Drawings" end="tail"?>
<?detailed-description description="Detailed Description" end="lead"?>
<?detailed-description description="Detailed Description" end="tail"?>
</description>
<claims id="claims">
</claims>
</us-patent-application>
 
We are currently processing only:
<us-bibliographic-data-application lang="EN" country="US">
...
</us-bibliographic-data-application>
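
As a minimal sketch of pulling out only the bibliographic block, assuming each application has already been isolated as a single well-formed XML document, Python's standard <code>xml.etree.ElementTree</code> could be used as follows. The sample document and the function name <code>bibliographic_data</code> are illustrative assumptions, not part of the existing scripts.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed application document (illustration only).
SAMPLE = """<us-patent-application lang="EN" country="US">
<us-bibliographic-data-application lang="EN" country="US">
<invention-title id="t1">Example title</invention-title>
</us-bibliographic-data-application>
<abstract id="abstract"></abstract>
<claims id="claims"></claims>
</us-patent-application>"""


def bibliographic_data(xml_text):
    """Return the <us-bibliographic-data-application> element, or None if absent."""
    root = ET.fromstring(xml_text)
    return root.find("us-bibliographic-data-application")


biblio = bibliographic_data(SAMPLE)
print(biblio.tag, [child.tag for child in biblio])
```

Everything outside the returned element (abstract, drawings, description, claims) is simply ignored.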
 
===Utility patent grants fields===
The XML files for patent data are available at
*https://bulkdata.uspto.gov/
*http://patents.reedtech.com/patent-products.php
 
Patent data up to year 2015 can also be obtained from https://www.google.com/googlebooks/uspto-patents.html. This repository is no longer updated.
 
Each XML file contains, in order, sorted by document ID:
#Design patents
#Plant patents
#Reissues
#Utility patents
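
Because one bulk file holds many grants back to back, it has to be split before an XML parser can read it. The sketch below assumes that every document starts with its own <code><?xml ...?></code> declaration and that the grant type can be guessed from the document-number prefix (<code>D</code> for design, <code>PP</code> for plant, <code>RE</code> for reissue). Both are assumptions to check against the real files; the function names and the second file name in the sample are hypothetical.

```python
import re


def split_documents(bulk_text):
    """Split a concatenated bulk file into one string per XML document."""
    # Each document is assumed to begin with its own XML declaration.
    starts = [m.start() for m in re.finditer(r"<\?xml\s", bulk_text)]
    ends = starts[1:] + [len(bulk_text)]
    return [bulk_text[a:b].strip() for a, b in zip(starts, ends)]


def grant_type(doc_number):
    """Guess the grant type from the document-number prefix (assumed convention)."""
    if doc_number.startswith("PP"):
        return "plant"
    if doc_number.startswith("RE"):
        return "reissue"
    if doc_number.startswith("D"):
        return "design"
    return "utility"


# Two made-up, empty documents glued together as in a bulk file.
bulk = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<us-patent-grant file="USD0774273-20161220.XML"/>\n'
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<us-patent-grant file="US00000000-20161220.XML"/>\n'
)
docs = split_documents(bulk)
print(len(docs), grant_type("D0774273"))
```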
 
====Overview====
 
DESIGN Patents:
 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v45-2014-04-03.dtd" [ ]>
<us-patent-grant lang="EN" dtd-version="v4.5 2014-04-03" file="USD0774273-20161220.XML"
status="PRODUCTION" id="us-patent-grant" country="US" date-produced="20161205" date-publ="20161220">
<us-bibliographic-data-grant>
</us-bibliographic-data-grant>
<drawings id="DRAWINGS">
</drawings>
<description id="description">
<?brief-description-of-drawings description="Brief Description of Drawings" end="lead"?>
<description-of-drawings>
</description-of-drawings>
<?brief-description-of-drawings description="Brief Description of Drawings" end="tail"?>
</description>
<us-claim-statement>CLAIM</us-claim-statement>
<claims id="claims">
</claims>
</us-patent-grant>
 
====Patent====
<onlyinclude>
*patent number
*kind: http://www.uspto.gov/patents-application-process/patent-search/authority-files/uspto-kind-codes
*grantdate
</onlyinclude>
For version 4.5:
<publication-reference>
<document-id>
...
</document-id>
</publication-reference>
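
A possible way to read the three fields above, assuming the <code>document-id</code> carries <code>doc-number</code>, <code>kind</code> and <code>date</code> children. The kind code in the sample is a placeholder and <code>publication_fields</code> is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; number and date follow the design-patent header shown
# earlier on this page, the kind code is a placeholder.
SAMPLE = """<us-bibliographic-data-grant>
<publication-reference>
<document-id>
<country>US</country>
<doc-number>D0774273</doc-number>
<kind>S1</kind>
<date>20161220</date>
</document-id>
</publication-reference>
</us-bibliographic-data-grant>"""


def publication_fields(biblio):
    """Return patent number, kind and grant date, or None if the block is missing."""
    doc = biblio.find("publication-reference/document-id")
    if doc is None:
        return None
    return {
        "patentnumber": doc.findtext("doc-number"),
        "kind": doc.findtext("kind"),
        "grantdate": doc.findtext("date"),  # kept as a yyyymmdd string
    }


fields = publication_fields(ET.fromstring(SAMPLE))
print(fields)
```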
<onlyinclude>
*type
*applicationnumber
*filingdate
</onlyinclude>
<application-reference appl-type="utility">
<document-id>
...
</document-id>
</application-reference>
<onlyinclude>
If there is more than one priority claim, we want the one with sequence 01:
*prioritydate
*prioritycountry (should use ISO country codes - may need a lookup table)
*prioritypatentnumber
</onlyinclude>
*'''find 4.3 file with priority claim'''
</priority-claim>
</priority-claims>
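
The "sequence 01" rule could be sketched as below. This assumes each <code>priority-claim</code> carries a <code>sequence</code> attribute and <code>country</code>, <code>doc-number</code> and <code>date</code> children. The sample values are made up, and <code>first_priority</code> is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Made-up sample with two claims, deliberately listed out of order.
SAMPLE = """<us-bibliographic-data-grant>
<priority-claims>
<priority-claim sequence="02" kind="national">
<country>DE</country><doc-number>0000002</doc-number><date>20010202</date>
</priority-claim>
<priority-claim sequence="01" kind="national">
<country>GB</country><doc-number>0000001</doc-number><date>20010101</date>
</priority-claim>
</priority-claims>
</us-bibliographic-data-grant>"""


def first_priority(biblio):
    """Return the fields of the priority claim with sequence 01, or None."""
    for claim in biblio.findall("priority-claims/priority-claim"):
        if claim.get("sequence") == "01":
            return {
                "prioritydate": claim.findtext("date"),
                "prioritycountry": claim.findtext("country"),
                "prioritypatentnumber": claim.findtext("doc-number"),
            }
    return None  # no priority claim at all


priority = first_priority(ET.fromstring(SAMPLE))
print(priority)
```

Selecting on the attribute rather than taking the first child avoids relying on the claims being listed in order.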
<onlyinclude>Classification IPC </onlyinclude>- we only need the first one: http://www.wipo.int/export/sites/www/classifications/ipc/en/guide/guide_ipc.pdf
*Section, Class, SubClass - Together these concord to US subclass: http://www.uspto.gov/web/patents/classification/international/ipc/ipc8/ipc_concordance/ipcsel.htm#a
*MainGroup, SubGroup
...
</classifications-ipcr>
<onlyinclude>
Classification CPC - we only need the main one
*Section, Class, Subclass
*Main Group, Subgroup
</onlyinclude>
*'''v 4.2, 4.3 and 4.4 do not have this'''
 
<classifications-cpc>
<main-cpc>
</main-cpc>
</classifications-cpc>
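
A sketch of reading the main CPC fields, assuming <code>main-cpc</code> wraps a <code>classification-cpc</code> element with <code>section</code>, <code>class</code>, <code>subclass</code>, <code>main-group</code> and <code>subgroup</code> children. The sample values are made up. The function returns <code>None</code> when the block is missing, which is the expected case for the versions that do not have it.

```python
import xml.etree.ElementTree as ET

CPC_FIELDS = ("section", "class", "subclass", "main-group", "subgroup")

# Made-up sample values (illustration only).
SAMPLE = """<us-bibliographic-data-grant>
<classifications-cpc>
<main-cpc>
<classification-cpc>
<section>A</section><class>41</class><subclass>D</subclass>
<main-group>13</main-group><subgroup>02</subgroup>
</classification-cpc>
</main-cpc>
</classifications-cpc>
</us-bibliographic-data-grant>"""


def main_cpc(biblio):
    """Return the main CPC fields as a dict, or None if there is no CPC block."""
    node = biblio.find("classifications-cpc/main-cpc/classification-cpc")
    if node is None:
        return None
    return {name: node.findtext(name) for name in CPC_FIELDS}


cpc = main_cpc(ET.fromstring(SAMPLE))
print(cpc)
```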
<onlyinclude>
Classification National: Note that the one below comes out to 2/2.11 (http://www.google.com/patents/US8925112#classifications)
*Country
*Class
</onlyinclude>
'''THIS IS NOT UNIQUE. What classifications are we searching for?'''
<classification-national>
<main-classification>2 211</main-classification>
</classification-national>
<onlyinclude>Title of the patent</onlyinclude>:
<invention-title id="d2e61">Aircrew ensembles</invention-title>
<onlyinclude>Number of Claims</onlyinclude>:
<number-of-claims>12</number-of-claims>
<onlyinclude>
Primary examiner:
*FirstName, LastName, Department
</onlyinclude>
<examiners>
<primary-examiner>
...
</examiners>
<onlyinclude>
PCT/Regional Patent Number:
</onlyinclude>
*PCTNumber (just the doc number - if it starts with PCT set a flag)
*'''not in all v 4.5'''
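
The PCT flag could be derived as in the sketch below. The element path <code>pct-or-regional-filing-data/document-id/doc-number</code> is an assumption about where the number lives, the sample number is made up, and <code>pct_number</code> is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sample (illustration only).
SAMPLE = """<us-bibliographic-data-grant>
<pct-or-regional-filing-data>
<document-id>
<country>WO</country>
<doc-number>PCT/XX0000/000000</doc-number>
</document-id>
</pct-or-regional-filing-data>
</us-bibliographic-data-grant>"""


def pct_number(biblio):
    """Return (doc_number, is_pct); (None, False) when the block is absent."""
    number = biblio.findtext("pct-or-regional-filing-data/document-id/doc-number")
    if number is None:
        return None, False
    number = number.strip()
    # Flag the record when the doc number itself starts with "PCT".
    return number, number.upper().startswith("PCT")


number, is_pct = pct_number(ET.fromstring(SAMPLE))
print(number, is_pct)
```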
====Citations====
<onlyinclude>
Patent Citations (we need all of them):
*CitingPatentNumber (from the patent)
*CitingPatentCountry (from the patent)
</onlyinclude>
<publication-reference>
<document-id>
</document-id>
</publication-reference>
<onlyinclude>
*CitedPatentNumber
*CitedPatentCountry
</onlyinclude>
*'''V 4.2 does not have <us-references-cited>'''
...
</us-references-cited>
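
One way to build the citing/cited pairs is sketched below. It assumes each cited patent sits in a <code>patcit</code> element with a <code>document-id</code> child, and it searches for <code>patcit</code> anywhere under the bibliographic block so that the lookup does not depend on the name of the wrapper element, which differs between versions. All sample numbers are made up and <code>citation_rows</code> is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Made-up sample (illustration only).
SAMPLE = """<us-bibliographic-data-grant>
<publication-reference><document-id>
<country>US</country><doc-number>00000001</doc-number>
</document-id></publication-reference>
<us-references-cited>
<us-citation><patcit num="00001"><document-id>
<country>US</country><doc-number>00000002</doc-number>
</document-id></patcit></us-citation>
<us-citation><patcit num="00002"><document-id>
<country>EP</country><doc-number>00000003</doc-number>
</document-id></patcit></us-citation>
</us-references-cited>
</us-bibliographic-data-grant>"""


def citation_rows(biblio):
    """Return one (citing number, citing country, cited number, cited country) tuple per patent citation."""
    citing_number = biblio.findtext("publication-reference/document-id/doc-number")
    citing_country = biblio.findtext("publication-reference/document-id/country")
    rows = []
    for doc in biblio.findall(".//patcit/document-id"):
        rows.append((citing_number, citing_country,
                     doc.findtext("doc-number"), doc.findtext("country")))
    return rows


rows = citation_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```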
<onlyinclude>
For non-patent references, we are just going to count them:
*NoNonPatRefs
</onlyinclude>
<us-references-cited>
...
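
Counting the non-patent references could look like the sketch below, assuming each one is stored as an <code>nplcit</code> element. The sample is made up and <code>count_non_patent_refs</code> is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Made-up sample: one patent citation and two non-patent citations.
SAMPLE = """<us-bibliographic-data-grant>
<us-references-cited>
<us-citation><patcit num="00001"><document-id>
<country>US</country><doc-number>00000002</doc-number>
</document-id></patcit></us-citation>
<us-citation><nplcit num="00002"><othercit>Placeholder article</othercit></nplcit></us-citation>
<us-citation><nplcit num="00003"><othercit>Placeholder manual</othercit></nplcit></us-citation>
</us-references-cited>
</us-bibliographic-data-grant>"""


def count_non_patent_refs(biblio):
    """Return NoNonPatRefs: the number of <nplcit> elements under the block."""
    return len(biblio.findall(".//nplcit"))


no_non_pat_refs = count_non_patent_refs(ET.fromstring(SAMPLE))
print(no_non_pat_refs)
```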
*'''For v 4.3, 4.4, 4.5'''
<onlyinclude>
*PatentNumber (and country) to build a key
*We need a "standard" name and address object for each inventor</onlyinclude> 
<us-parties>
<us-applicants>
...
</us-parties>
<onlyinclude>
====Assignees====
*PatentNumber (and country) to build a key
*We need a "standard" name and address object for each assignee
</onlyinclude>
<assignees>
<assignee>
...
</assignee>
</assignees>
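
A "standard" name and address object might be sketched as a small dataclass. The field set, the <code>Party</code> and <code>assignees</code> names, the key format and the sample values are all assumptions for illustration. The element paths assume each assignee has an <code>addressbook</code> with either an <code>orgname</code> or a personal name, plus an <code>address</code>.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    patent_key: str  # e.g. country + patent number (hypothetical key format)
    name: Optional[str]
    city: Optional[str]
    state: Optional[str]
    country: Optional[str]


# Made-up sample (illustration only).
SAMPLE = """<us-bibliographic-data-grant>
<assignees>
<assignee>
<addressbook>
<orgname>Example Corp</orgname>
<address><city>Houston</city><state>TX</state><country>US</country></address>
</addressbook>
</assignee>
</assignees>
</us-bibliographic-data-grant>"""


def assignees(biblio, patent_key):
    """Return one Party per assignee found under the bibliographic block."""
    parties = []
    for book in biblio.findall("assignees/assignee/addressbook"):
        name = book.findtext("orgname")
        if name is None:
            # Fall back to a personal name when there is no organisation name.
            first = book.findtext("first-name") or ""
            last = book.findtext("last-name") or ""
            name = (first + " " + last).strip() or None
        parties.append(Party(patent_key, name,
                             book.findtext("address/city"),
                             book.findtext("address/state"),
                             book.findtext("address/country")))
    return parties


parties = assignees(ET.fromstring(SAMPLE), "US00000001")
print(parties)
```

The same object could be reused for inventors by pointing the lookup at the inventor elements instead.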
<onlyinclude>
For further information on Assignee data from the USPTO, see [[USPTO Assignees Data]].
====Other things we might want====
*Abstract
*Claims (other than their count)
</onlyinclude>
====Things we don't need====
I have also downloaded all of them onto the database server; they can be found with:
cd /bulk/patent
 
[[Category:Patent]]
