Difference between revisions of "Patent Data"

From edegan.com

Revision as of 18:30, 21 March 2016

The Patent Data page is for instructions on how to get the USPTO patent data, how to use the database, and for the documentation of our database.

ER diagram

Patent Data.png

Downloading the files

The files (in xml format) for granted patent data can be obtained at granted patent

The files for patent application data can be obtained at patent applications

The files for maintenance fees data can be obtained at maintenance

Scripts are available to perform a bulk download of all the above files:

Script to download patent application data from 2001-2004

Script to download patent application data from 2005-2015

Script to download patent grant data from 1976-2000

Script to download patent grant data from 2001-2004

Script to download patent grant data from 2005-2015

To use the scripts, save them as shell scripts, then either run one with sh

$ sh Applications_download_2001-2004.sh

or first make the script executable and then run it directly

$ chmod a+x Applications_download_2001-2004.sh
$ ./Applications_download_2001-2004.sh

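Several hundred .zip files of roughly 100 MB each will be downloaded, so the process may take a while. When all the files are downloaded, unzip them all. The wildcard must be quoted so that unzip expands it itself; an unquoted *.zip is expanded by the shell, and unzip then treats every name after the first as a member to extract from the first archive:

$ unzip '*.zip'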

XML Schema Notes

Tags we are using:

Tags we aren't using:

Parsing and Processing the XML files

The ParserSpliter.pl script first splits a large Patent Data XML file into smaller XML files, one per patent. It then parses and processes each of those files.

Some of the files are malformed and are moved to a ./failed_files directory. Oddly, adding a character anywhere in such a file makes the script able to process it.
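
The splitting step can be illustrated in Python. This is only a sketch, not the logic of ParserSpliter.pl: it assumes that each patent in a bulk file begins with its own `<?xml` declaration, and the function name `split_concatenated_xml` is hypothetical.

```python
def split_concatenated_xml(text):
    """Split text holding several XML documents back to back.

    Assumption: every document starts with its own "<?xml" declaration.
    Returns one string per document, in file order.
    """
    marker = "<?xml"
    docs = []
    start = text.find(marker)
    while start != -1:
        # The next declaration (if any) marks the end of the current document.
        nxt = text.find(marker, start + len(marker))
        end = nxt if nxt != -1 else len(text)
        doc = text[start:end].strip()
        if doc:
            docs.append(doc)
        start = nxt
    return docs


# Tiny stand-in for a bulk file; real files hold full patent records.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n<a>1</a>\n'
    '<?xml version="1.0" encoding="UTF-8"?>\n<a>2</a>\n'
)
docs = split_concatenated_xml(SAMPLE)
print(len(docs))
```

Each returned string can then be written to its own file or parsed directly, and a document that fails to parse can be set aside in the same way the script uses ./failed_files.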

To use this script, you need XML::Simple, Try::Tiny and Switch installed.

Open up CPAN shell:

$ perl -e shell -MCPAN

Install:

cpan[0]> install XML::Simple
cpan[1]> install Try::Tiny
cpan[2]> install Switch

Once the packages have been installed, run the script as in the following example:

$ perl PatentParser.pl -file=ipa150319_small.xml

Other Resources

Documentation for the XML files

See Also

tool to convert dtd to xsd

Harvard Dataverse

New Notes

The source files have transitioned from here:

To:

The historic data is the same on both sides.

Each file contains, in order, sorted by document ID:

  1. Design patents (we will discard)
  2. Plant patents (we will discard)
  3. Reissues (we probably want them)
  4. Utility patents (we want them)

The classifications in the XML file are:

  • IPC - these are good and we just need the main classification
  • CPC - as above
  • USPC - a single numeric string with no separator between class and subclass, so it is ambiguous: is 22431 224/31 or 22/431?

Fields of Interest

We only care about Utility patents (and maybe Reissue patents too)

Utility patent grants fields

Patent

<document-id>
<country>US</country>
<doc-number>08925112</doc-number>
<kind>B2</kind>
<date>20150106</date>
</document-id>
  • type
  • applicationnumber
  • filingdate
<application-reference appl-type="utility">
<document-id>
<country>US</country>
<doc-number>13824291</doc-number>
<date>20110929</date>
</document-id>
</application-reference>
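
A minimal Python sketch of pulling the three fields above out of the application reference. The `<doc>` wrapper element and the function name `application_fields` are assumptions made for illustration; only the `application-reference` fragment comes from the sample above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The <doc> wrapper is hypothetical; the inner fragment is the sample above.
SAMPLE = """<doc>
<application-reference appl-type="utility">
<document-id>
<country>US</country>
<doc-number>13824291</doc-number>
<date>20110929</date>
</document-id>
</application-reference>
</doc>"""


def application_fields(root):
    """Return type, applicationnumber and filingdate, or None if absent."""
    ref = root.find(".//application-reference")
    if ref is None:
        return None
    raw_date = ref.findtext("document-id/date")
    return {
        "type": ref.get("appl-type"),
        "applicationnumber": ref.findtext("document-id/doc-number"),
        # Dates in the sample are YYYYMMDD strings.
        "filingdate": datetime.strptime(raw_date, "%Y%m%d").date() if raw_date else None,
    }


fields = application_fields(ET.fromstring(SAMPLE))
print(fields)
```

Checking `fields["type"] == "utility"` is one way to keep only utility patents; the attribute values used for the other patent types are not shown in the sample and would need to be confirmed against the data.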

If there is more than one priority claim, we want the one with sequence 01.

  • prioritydate
  • prioritycountry (should use ISO country codes - may need a lookup table)
  • prioritypatentnumber
<priority-claims>
<priority-claim sequence="01" kind="national">
<country>GB</country>
<doc-number>1016384.8</doc-number>
<date>20100930</date>
</priority-claim>
</priority-claims>
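
Selecting the sequence 01 claim could look like the following Python sketch. The second claim in the sample data is invented purely to show that selection is by sequence and not by position, and the function name `first_priority` is hypothetical. No ISO country lookup is attempted here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<priority-claims>
<!-- hypothetical extra claim, not from the source data -->
<priority-claim sequence="02" kind="national">
<country>GB</country>
<doc-number>0000000.0</doc-number>
<date>20101001</date>
</priority-claim>
<priority-claim sequence="01" kind="national">
<country>GB</country>
<doc-number>1016384.8</doc-number>
<date>20100930</date>
</priority-claim>
</priority-claims>"""


def first_priority(claims):
    """Return the fields of the claim whose sequence is 01, or None."""
    for claim in claims.findall("priority-claim"):
        # Strip leading zeros so "01" and "1" are treated alike.
        if claim.get("sequence", "").lstrip("0") == "1":
            return {
                "prioritydate": claim.findtext("date"),
                "prioritycountry": claim.findtext("country"),
                "prioritypatentnumber": claim.findtext("doc-number"),
            }
    return None


priority = first_priority(ET.fromstring(SAMPLE))
print(priority)
```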

Classification IPC - we only need the first one: http://www.wipo.int/export/sites/www/classifications/ipc/en/guide/guide_ipc.pdf

<classifications-ipcr>
<classification-ipcr>
<ipc-version-indicator>
<date>20060101</date>
</ipc-version-indicator>
<classification-level>A</classification-level>
<section>B</section>
<class>64</class>
<subclass>G</subclass>
<main-group>6</main-group>
<subgroup>00</subgroup>
<symbol-position>F</symbol-position>
<classification-value>I</classification-value>
...
</classification-ipcr>
...
</classifications-ipcr>
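
As an illustration, the first IPC entry can be assembled into the conventional symbol form (section, class and subclass, then main group and subgroup separated by a slash). This Python sketch uses the sample fields above; the function name `first_ipc_symbol` is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<classifications-ipcr>
<classification-ipcr>
<ipc-version-indicator>
<date>20060101</date>
</ipc-version-indicator>
<classification-level>A</classification-level>
<section>B</section>
<class>64</class>
<subclass>G</subclass>
<main-group>6</main-group>
<subgroup>00</subgroup>
<symbol-position>F</symbol-position>
<classification-value>I</classification-value>
</classification-ipcr>
</classifications-ipcr>"""


def first_ipc_symbol(ipcr_root):
    """Build a symbol such as 'B64G 6/00' from the first classification-ipcr."""
    first = ipcr_root.find("classification-ipcr")
    if first is None:
        return None

    def part(tag):
        # Missing elements become empty strings rather than raising.
        return (first.findtext(tag) or "").strip()

    return (
        f"{part('section')}{part('class')}{part('subclass')} "
        f"{part('main-group')}/{part('subgroup')}"
    )


symbol = first_ipc_symbol(ET.fromstring(SAMPLE))
print(symbol)  # B64G 6/00
```

Because `find` returns the first matching child in document order, this takes the first listed classification, which is the one the note above asks for.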

Classification CPC - we only need the main one

  • Section, Class, Subclass
  • Main Group, Subgroup
<main-cpc>
<classification-cpc>
<cpc-version-indicator>
<date>20130101</date>
</cpc-version-indicator>
<section>B</section>
<class>64</class>
<subclass>D</subclass>
<main-group>10</main-group>
<subgroup>00</subgroup>
<symbol-position>F</symbol-position>
<classification-value>I</classification-value>
...
</classification-cpc>
</main-cpc>

Classification National: Note that the one below comes out to 2/2.11 (http://www.google.com/patents/US8925112#classifications)

  • Country
  • Class

<classification-national>
<country>US</country>
<main-classification>2 211</main-classification>
</classification-national>

Title of the patent:

<invention-title id="d2e61">Aircrew ensembles</invention-title>

Number of Claims:

<number-of-claims>12</number-of-claims>

Primary examiner:

  • FirstName, LastName, Department
<examiners>
<primary-examiner>
<last-name>Patel</last-name>
<first-name>Tejash</first-name>
<department>3765</department>
</primary-examiner>
...
</examiners>

PCT/Regional Patent Number:

  • PCTNumber (just the doc number - if it starts with PCT set a flag)
<pct-or-regional-filing-data>
<document-id>
<country>WO</country>
<doc-number>PCT/EP2011/067014</doc-number>
<kind>00</kind>
<date>20110929</date>
</document-id>
...
</pct-or-regional-filing-data>
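
The PCT flag described above can be sketched in Python as follows. The function name `pct_number` and the output keys are hypothetical; the rule itself (flag the number if it starts with PCT) is the one stated above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<pct-or-regional-filing-data>
<document-id>
<country>WO</country>
<doc-number>PCT/EP2011/067014</doc-number>
<kind>00</kind>
<date>20110929</date>
</document-id>
</pct-or-regional-filing-data>"""


def pct_number(filing_data):
    """Return the doc number plus a flag saying whether it starts with PCT."""
    number = (filing_data.findtext("document-id/doc-number") or "").strip()
    return {
        "pctnumber": number or None,
        "is_pct": number.upper().startswith("PCT"),
    }


pct = pct_number(ET.fromstring(SAMPLE))
print(pct)
```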

Citations

Patent Citations (we need all of them):

  • CitingPatentNumber (from the patent)
  • CitingPatentCountry (from the patent)
 <document-id>
 <country>US</country>
 <doc-number>08925112</doc-number>
 <kind>B2</kind>
 <date>20150106</date>
 </document-id>
  • CitedPatentNumber
  • CitedPatentCountry
<us-references-cited>
<us-citation>
<patcit num="00001">
<document-id>
<country>US</country>
<doc-number>1105569</doc-number>
<kind>A</kind>
<name>Lacrotte</name>
<date>19140700</date>
</document-id>
</patcit>
<category>cited by examiner</category>
<classification-national>
<country>US</country>
<main-classification>2 214</main-classification>
</classification-national>
</us-citation>
...
</us-references-cited>
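
A possible way to turn the citation block into one row per cited patent, sketched in Python. The function name `patent_citations` is hypothetical, and the citing patent's number and country are passed in by the caller, since they come from the patent's own document-id rather than from the citation block.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<us-references-cited>
<us-citation>
<patcit num="00001">
<document-id>
<country>US</country>
<doc-number>1105569</doc-number>
<kind>A</kind>
<name>Lacrotte</name>
<date>19140700</date>
</document-id>
</patcit>
<category>cited by examiner</category>
</us-citation>
<us-citation>
<nplcit num="00020">
<othercit>European Search Report ...</othercit>
</nplcit>
<category>cited by applicant</category>
</us-citation>
</us-references-cited>"""


def patent_citations(citing_number, citing_country, refs):
    """Return one dict per patent citation; non-patent citations are skipped."""
    rows = []
    for doc in refs.findall("us-citation/patcit/document-id"):
        rows.append({
            "CitingPatentNumber": citing_number,
            "CitingPatentCountry": citing_country,
            "CitedPatentNumber": doc.findtext("doc-number"),
            "CitedPatentCountry": doc.findtext("country"),
        })
    return rows


rows = patent_citations("08925112", "US", ET.fromstring(SAMPLE))
print(rows)
```

Note that the cited date in the sample, 19140700, has a day of 00, so it would not parse as a calendar date without special handling; this sketch does not read the date at all.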

For non-patent references, we are just going to count them:

<us-references-cited>
...
<us-citation>
<nplcit num="00020">
<othercit>
European Search Report dated Jan. 20, 2011 as received in European Patent Application No. GB1016384.8.
</othercit>
</nplcit>
<category>cited by applicant</category>
</us-citation>
</us-references-cited> 
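
Counting the non-patent references might be done as in this short Python sketch, which assumes every non-patent citation appears as an `nplcit` element directly under a `us-citation`, as in the sample above. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<us-references-cited>
<us-citation>
<patcit num="00001">
<document-id>
<country>US</country>
<doc-number>1105569</doc-number>
</document-id>
</patcit>
<category>cited by examiner</category>
</us-citation>
<us-citation>
<nplcit num="00020">
<othercit>
European Search Report dated Jan. 20, 2011 as received in European Patent Application No. GB1016384.8.
</othercit>
</nplcit>
<category>cited by applicant</category>
</us-citation>
</us-references-cited>"""


def count_non_patent_references(refs):
    """Count us-citation entries that carry an nplcit child."""
    return len(refs.findall("us-citation/nplcit"))


npl_count = count_non_patent_references(ET.fromstring(SAMPLE))
print(npl_count)  # 1
```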
 

Inventors

  • PatentNumber (and country) to build a key
  • We need a "standard" name and address object for each inventor
<us-parties>
<us-applicants>...</us-applicants>
<inventors>
<inventor sequence="001" designation="us-only">
<addressbook>
<last-name>Oliver</last-name>
<first-name>Paul</first-name>
<address>
<city>Rhyl</city>
<country>GB</country>
</address>
</addressbook>
</inventor>
...
</inventors>
...
</us-parties>
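
One way to picture the "standard" name and address object is a small Python dataclass keyed by patent number and country. The class name `Party`, its field names and the function `inventors` are all assumptions for illustration, not the database's actual schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

SAMPLE = """<us-parties>
<inventors>
<inventor sequence="001" designation="us-only">
<addressbook>
<last-name>Oliver</last-name>
<first-name>Paul</first-name>
<address>
<city>Rhyl</city>
<country>GB</country>
</address>
</addressbook>
</inventor>
</inventors>
</us-parties>"""


@dataclass
class Party:
    """Hypothetical name-and-address record tied to one patent."""
    patent_number: str
    patent_country: str
    last_name: Optional[str]
    first_name: Optional[str]
    city: Optional[str]
    country: Optional[str]


def inventors(patent_number, patent_country, parties):
    """Return one Party per inventor addressbook under us-parties."""
    found = []
    for book in parties.findall("inventors/inventor/addressbook"):
        found.append(Party(
            patent_number=patent_number,
            patent_country=patent_country,
            last_name=book.findtext("last-name"),
            first_name=book.findtext("first-name"),
            city=book.findtext("address/city"),
            country=book.findtext("address/country"),
        ))
    return found


people = inventors("08925112", "US", ET.fromstring(SAMPLE))
print(people)
```

Missing elements simply come back as None, so the same record shape could in principle hold an assignee, with an organisation name added as a further optional field.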

Assignees

  • PatentNumber (and country) to build a key
  • We need a "standard" name and address object for each assignee
<assignees>
<assignee>
<addressbook>
<orgname>Survitec Group Limited</orgname>
<role>03</role>
<address>
<city>Merseyside</city>
<country>GB</country>
</address>
</addressbook>
</assignee>
</assignees>


Other things we might want

  • Abstract
  • Claims (other than their count)

Things we don't need

General:

Classification related:

  • Level - This appears to be either core or advanced. Not sure it matters.
  • SymbolPosition, ClassificationValue - we likely don't need them
  • Classification status and data source - no idea what these do