Difference between revisions of "Harvard Dataverse"

From edegan.com
Jump to navigation Jump to search
imported>Leo
imported>Ed
Line 1: Line 1:
 
This page records how to load/use the Harvard Dataverse.
 
This page records how to load/use the Harvard Dataverse.
 +
 +
==Getting the data==
  
 
Where we get the Harvard Dataverse data:
 
Where we get the Harvard Dataverse data:
Line 6: Line 8:
 
For more information about the patent data:
 
For more information about the patent data:
 
[[Patent Data]]
 
[[Patent Data]]
 +
 +
==Accessing the dbase==
  
 
On the database server, the entire Harvard Dataverse for Patent Data is downloaded in directory  
 
On the database server, the entire Harvard Dataverse for Patent Data is downloaded in directory  
Line 78: Line 82:
 
  /bulk/Software/Database\ Scripts/droptables.sql
 
  /bulk/Software/Database\ Scripts/droptables.sql
  
 +
==Creating the tables==
  
 
The following is the schema for the 10 tables in this database:
 
The following is the schema for the 10 tables in this database:
  
 
  CREATE TABLE patents (
 
  CREATE TABLE patents (
     Patent varchar,
+
     Patent varchar, #to int
     Kind varchar,
+
     Kind varchar,   #only types NULL, A1, B1, B2, E1 (check E1 is numeric)
 
     Claims integer,  
 
     Claims integer,  
     AppType varchar,
+
     AppType varchar, #to int (http://www.uspto.gov/web/offices/ac/ido/oeip/taf/filingyr.htm) 29 is design
     AppNum varchar,  
+
     AppNum varchar, #to int
 
     GDate date,       
 
     GDate date,       
 
     GYear integer,     
 
     GYear integer,     
     AppDate varchar,    
+
     AppDate varchar, #date
     AppYear varchar
+
     AppYear varchar #to int
 
  );
 
  );
  
 
  CREATE TABLE classes (
 
  CREATE TABLE classes (
     Patent varchar,
+
     Patent varchar, #to int
 
     Prim integer,
 
     Prim integer,
 
     Class varchar,
 
     Class varchar,
Line 101: Line 106:
  
 
  CREATE TABLE assignees (
 
  CREATE TABLE assignees (
     Patent varchar,
+
     Patent varchar, #to int
     AsgType varchar,
+
     AsgType varchar, #to int
 
     Assignee varchar,
 
     Assignee varchar,
 
     City varchar,
 
     City varchar,

Revision as of 13:22, 5 April 2016

This page records how to load/use the Harvard Dataverse.

Getting the data

Where we get the Harvard Dataverse data: Harvard Dataverse

For more information about the patent data: Patent Data

Accessing the dbase

On the database server, the entire Harvard Dataverse for Patent Data is downloaded in directory

 /bulk/patent

The files have been processed to be used for the create table scripts I have created to load the Harvard Dataverse database into our own psql database.

These preprocessed files are available under

/bulk/patent/preprocessed

The database I have created for the Patent Data from 1975-2010 is called patent data. After logging in to researcher@128.42.44.181, you can get access to the database by

psql patentdata

You can see a list of the tables inside the patent database by

patentdata=# \dt
           List of relations
Schema |   Name    | Type  |   Owner    
--------+-----------+-------+------------
public | assignees | table | researcher
public | citations | table | researcher
public | classes   | table | researcher
public | inventors | table | researcher
public | invpat    | table | researcher
public | lawyers   | table | researcher
public | patdesc   | table | researcher
public | patents   | table | researcher
public | scirefs   | table | researcher
public | usreldocs | table | researcher
(10 rows)

You can select from the database, for example, by

patentdata=# SELECT * FROM CLASSES WHERE patent='03930270';
 patent  | prim | class | subclass 
----------+------+-------+----------
03930270 |    1 | 360   | 130.24
03930270 |    0 | 360   | 84
03930270 |    0 | 360   | 271.8
03930270 |    0 | G9B   | 15.08
(4 rows)

Or, do a join:

patentdata=# SELECT * FROM inventors AS i, classes AS c WHERE i.Patent=c.Patent AND i.Patent='03930270';
 patent  | firstname | lastname | street |   city    | state | country | zipcode | nationality | invseq |  patent  | prim | class | subclass 
----------+-----------+----------+--------+-----------+-------+---------+---------+-------------+--------+----------+------+-------+----------
03930270 | YOSHINORI | YAMAZOE  |        | OSAKA     |       | JP      |         |             |      3 | 03930270 |    1 | 360   | 130.24
03930270 | YOSHINORI | YAMAZOE  |        | OSAKA     |       | JP      |         |             |      3 | 03930270 |    0 | 360   | 84
03930270 | YOSHINORI | YAMAZOE  |        | OSAKA     |       | JP      |         |             |      3 | 03930270 |    0 | 360   | 271.8
03930270 | YOSHINORI | YAMAZOE  |        | OSAKA     |       | JP      |         |             |      3 | 03930270 |    0 | G9B   | 15.08
03930270 | CHIHIRO   | NAKAMURA |        | MORIGUCHI |       | JP      |         |             |      4 | 03930270 |    1 | 360   | 130.24
03930270 | CHIHIRO   | NAKAMURA |        | MORIGUCHI |       | JP      |         |             |      4 | 03930270 |    0 | 360   | 84
03930270 | CHIHIRO   | NAKAMURA |        | MORIGUCHI |       | JP      |         |             |      4 | 03930270 |    0 | 360   | 271.8
03930270 | CHIHIRO   | NAKAMURA |        | MORIGUCHI |       | JP      |         |             |      4 | 03930270 |    0 | G9B   | 15.08
03930270 | SATOSHI   | KIKUYA   |        | KADOMA    |       | JP      |         |             |      2 | 03930270 |    1 | 360   | 130.24
03930270 | SATOSHI   | KIKUYA   |        | KADOMA    |       | JP      |         |             |      2 | 03930270 |    0 | 360   | 84
03930270 | SATOSHI   | KIKUYA   |        | KADOMA    |       | JP      |         |             |      2 | 03930270 |    0 | 360   | 271.8
03930270 | SATOSHI   | KIKUYA   |        | KADOMA    |       | JP      |         |             |      2 | 03930270 |    0 | G9B   | 15.08
03930270 | SHIZUHIKO | TANIGAWA |        | MORIGUCHI |       | JP      |         |             |      1 | 03930270 |    1 | 360   | 130.24
03930270 | SHIZUHIKO | TANIGAWA |        | MORIGUCHI |       | JP      |         |             |      1 | 03930270 |    0 | 360   | 84
03930270 | SHIZUHIKO | TANIGAWA |        | MORIGUCHI |       | JP      |         |             |      1 | 03930270 |    0 | 360   | 271.8
03930270 | SHIZUHIKO | TANIGAWA |        | MORIGUCHI |       | JP      |         |             |      1 | 03930270 |    0 | G9B   | 15.08
(16 rows)

The schema I used for this database is documented at the bottom of this page.

And this is also saved as a .sql script under

/bulk/Software/Database\ Scripts/createtables.sql

A corresponding Drop table scripts is also available at

/bulk/Software/Database\ Scripts/droptables.sql

Creating the tables

The following is the schema for the 10 tables in this database:

CREATE TABLE patents (
    Patent varchar, #to int
    Kind varchar,   #only types NULL, A1, B1, B2, E1 (check E1 is numeric)
    Claims integer, 
    AppType varchar, #to int (http://www.uspto.gov/web/offices/ac/ido/oeip/taf/filingyr.htm) 29 is design
    AppNum varchar,  #to int
    GDate date,      
    GYear integer,     
    AppDate varchar, #date
    AppYear varchar  #to int
);
CREATE TABLE classes (
    Patent varchar,  #to int
    Prim integer,
    Class varchar,
    Subclass varchar
);
CREATE TABLE assignees (
    Patent varchar,  #to int
    AsgType varchar, #to int
    Assignee varchar,
    City varchar,
    State varchar,
    Country varchar,
    Nationality varchar,
    Residence varchar,
    AsgSeq integer
);
CREATE TABLE citations (
    Patent varchar,
    Cit_Date varchar,
    Cit_Name varchar,
    Cit_Kind varchar,
    Cit_Country varchar,
    Citation varchar,
    Category varchar,
    CitSeq integer
);
CREATE TABLE inventors (
    Patent varchar,
    Firstname varchar,
    Lastname varchar,
    Street varchar,
    City varchar,
    State varchar,
    Country varchar,
    Zipcode integer,
    Nationality varchar,
    InvSeq integer
);
CREATE TABLE patdesc (
    Patent varchar,
    Abstract varchar,
    Title varchar
);
CREATE TABLE lawyers (
    Patent varchar,
    Firstname varchar,
    Lastname varchar,
    LawCountry varchar,
    OrgName varchar,
    LawSeq integer
);
CREATE TABLE scirefs (
    Patent varchar,
    Descrip varchar,
    CitSeq integer
);
CREATE TABLE usreldocs (
    Patent varchar,
    DocType varchar,
    OrderSeq integer,
    Country varchar,
    RelPatent varchar,
    Kind varchar,
    RelDate varchar,
    Status varchar
);
CREATE TABLE invpat (
    Firstname varchar,
    Lastname varchar,
    Street varchar,
    City varchar,
    State varchar,
    Country varchar,
    Zipcode integer,
    Lat double precision,
    Lon double precision,
    InvSeq integer,
    Patent varchar,
    GYear integer,
    AppYear varchar,
    AppDate varchar,
    Assignee varchar,
    AsgNum varchar,
    Class varchar,
    InvNum varchar,
    Low varchar,
    Up varchar
);