Redesign Assignment and Patent Database

McNair Project
Redesign Assignment and Patent Database
Project Information
Project Title	Redesign Assignment and Patent Database
Owner	Shelby Bice
Start Date	9/2017
Deadline
Keywords	patent
Primary Billing
Notes
Has project status
Is dependent on	Reproducible Patent Data
	Copyright © 2016 edegan.com. All Rights Reserved.

This is an extension of the work I did last semester under "Redesigning Patent Database". Instead of simply reconfiguring the existing database, this project encompasses and full redesign and creation of a new Patent database and a new Assignment database that will be joined together.

Reference for definitions of fields in the two databases: https://www.oecd.org/sti/sci-tech/37569498.pdf

Paper on how to extract paper from XML files, parse it, and put it into a database: https://funginstitute.berkeley.edu/wp-content/uploads/2014/06/patentprocessor.pdf

Particularly useful, because it provides solutions for cleaning location and firm data

ER Diagram for Assignment and Patent Databases

Assignment Database Structure

After reading through the paper "The USPTO Patent Assignment Dataset: Descriptions and Analysis"(https://www.uspto.gov/sites/default/files/documents/USPTO_Patents_Assignment_Dataset_WP.pdf) and taking into account are needs for the Assignment database (the ability to connect the two databases together, the ability to trace conveyance of a patent over time) I decided on the following structure for the Assignment database.

The foreign key that connects every table in the database is rf_id, which stands for reel frame id. It is unique for each entry in the main table, assignment and is included in every table.

Conveyance is the only table that cannot be fully constructed as data is inserted into the database. Some of the fields depend on other tables being constructed, so it may be either partially populated while the other tables are being populated or is the last table to be populated.

Full list of tables: assignee, assignor, document_info, assignment, conveyance

ASSIGNEE

Purpose: store all the assignees from all the assignments

Each entry represents: the recipient of a patent in an entry in the ASSIGNMENT table

Primary key: rf_id, unless it is possible to have multiple assignees, then rf_id and assignee name

All columns (variables) in the table:

rf_id (bigint) reel frame number
assignee_name (varchar(255)) name of the assignee
address_line_1 (varchar(255)) first line of address of assignee
address_line_2 (varchar(255)) second line of address of assignee
city (varchar(255)) city of assignee
state (varchar(255)) state of assignee
country var(char(255)) country of assignee
postal_code (varchar(255)) post code of the assignee

ASSIGNOR

Purpose: store all the assignors from all the assignments

Each entry represents: the one assigning/granting the license for the entry in the ASSIGNMENT table

Primary key: rf_id and assignor_name

All columns (variables):

rf_id (bigint) reel frame number
assignor_name (varchar(255)) name of the assignor

CONVEYANCE

Purpose: represent the conveyance of a particular patent overtime

Each entry represents: an individual transaction for a patent (not unlike assignment, except this table has information about the assignment in relation to other assignments for the patent

Primary key: rf_id

All columns (variables): Two possibilities, depending on whether we want data to be repeated or not.

Not-repeated (with exception of primary/foreign key):

rf_id (bigint) reel frame number
next_transaction (bigint) rf_id of next transaction
num_conveyance (int) number in order of conveyance by date
acknowledgment_date (date) date that the assignor acknowledged the transaction
execution_date (date) date that the transaction
conveyance_type (varchar(255)) the USPTO paper determined a couple of different conveyance types and also outlined how to determine based on the conveyance text in the xml file which conveyance type is relevant to the transaction. The conveyance types are:
- assignment
- merger
- change of name
- government interest
- agreement
- security agreement
- release

Repeated, also include:

assignee_name (varchar(255)) name of the assignee
assignor_name (varchar(255)) name of the assignor
patent_number (varchar(255)) identifying number for the patent

DOCUMENT_INFO

Purpose: store extra information relevant to the patents represented in ASSIGNMENT

Each entry represents: information about the documentation and the patent for a particular entry in ASSIGNMENT

Primary key: rf_id

All columns (variables) when created:

rf_id (bigint) reel frame number
app_num (varchar(255)) application document USPTO number
app_date (date) date that the application was filed
app_country (varhar(255)) country in which it was filed
pgpub_num (varchar(255)) pre-grant publication document USPTO number
pgpub_date (date) date that pre-grant publication was released
pgpub_country (varchar(255)) country in which the patent is published before being granted a patent number
patent_number (varchar(255)) granted patent document USPTO numbe
grant_date (date) date that the patent is officially published
grant_country (varchar(255)) country in which the patent is published after being granted a patent number
invention_title (varchar(255)) title of the invention
language (varchar(255)) languageof invention title – could be potentially useful and interesting to investigate
reel_num (bigint) number of the reel the assignment was stored on
frame_num (bigint) number of the frame on the reel the assignment was stored on

ASSIGNMENT

Purpose: represent assignment transactions – central table for the database

Each entry represents: an individual patent assignment transaction

Primary key: rf_id

All columns (variables):

rf_id (bigint) reel fram number
assignment_id (bigint) US patent assignment id
correspondent_name (varchar(255)) name of correspondent for the transaction, could be a lawyer
correspondent_address1 (varchar(255)) first line of address for correspondent
correspondent_address2 (varchar(255)) second line of address for correspondent
correspondent_address3 (varchar(255)) third line of address for correspondent
correspondent_address4 (varchar(255)) fourth line of address for correspondent
record_date (date) date recorded with USPTO)
last_update_date (date) date that information for this assignment was last update with the USPTO
page_cnt (bigint) page count of assignment record)

More extensive notes exist under :E/McNair/Projects/Redesigning Patent Database/New Patent Database Project/Notes on USPTO Assignment Data Paper

Patent Database Structure

Most of this design is based off what I investigated last semester and therefore is subject to change, since the design for the Assignment database is somewhat new.

The foreign key that connects every table is patent_number, which uniquely identifies every patent in the PATENT table and is included in every table.

PATENT

Purpose: represent patents – central table for the database

Each entry represents: an individual patent

Primary key: patent_number

All columns (variables):

patent_number (varchar(255)) unique identifier for the patent from the UPSTO office, can contain letters hence varchar
grantdate (date) date that patent was officially granted
prioritydate (date) date used to established novelty of an invention in regards to other inventions (source: http://www.bios.net/daisy/patentlens/2343.html)
prioritycountry (varchar(255)) country where patent is first filed
prioritypatentnumber (varchar(255)) placeholder patent number if the patent is published before it is fully approved
cpcsubgroup (varchar(255)) cooperative patent classification subgroup
pctpatentnumber (varchar(255)) international patent number (according to Patent Cooperation Treaty)
appnum(varchar(255)) application number
grantyear (int) year the patent was granted
appdate (date) date that the application was filed
appyear (int) year that the application was filed
nber (varchar(255)) classification according to the National Bureau of Economic Research (NBER) Patent Citations Data File
uspc (varchar(255)) United States patent classification
uspc_sub (varchar(255)) United States patent classification subgroup

ASSIGNEE

Purpose: Represent the person the patent was currently assigned to at publication according to the patent information (the information in the Assignment database is more complete)

Each entry represents: the assignee for a patent in PATENT

Primary key: patent_number, first_name, last_name (in case it is possible for there to be multiple assignees for an individual patent)

All columns (variables):

patent_number (varchar(255)) unique identifier for the patent from the UPSTO office, can contain letters hence varchar
first_name (varchar(255)) first name of assignee
last name (varchar(255)) last name of assignee
address (varchar(255)) address of assignee
postcode (varchar(255)) postcode of assignee
orgname (varchar(255)) name of the organization owning the patent (if applicable - organization may go by multiple orgnames)
city (varchar(255)) city of the assignee
country (varchar(255)) country of the assignee
state (varchar(255)) state of assignee
residence (varchar(255)) mailing address of assignee if they receive mail at a different address than the one listed above
size_of_firm (varchar(255)) size of the organization, large, small, or micro - relates to how much the fee they pay for the patent is

CITATION

Purpose: Represent all the patents in PATENT that another patent in PATENT cites

Each entry represents: one citation that a patent in PATENT makes

Primary key: patent_number AND cited_patent_number (because a patent might cite multiple patents, so both variables are needed for the primary key)

All columns (variables):

patent_number (varchar(255)) unique identifier for the patent from the UPSTO office, can contain letters hence varchar
cited_patent_number (varchar(255)) unique identifier from the UPSTO office for the patent being cited

FEE

Purpose: Represent information about the fees paid for an individual patent

Each entry represents: information about the fees paid on the patent

Primary key: patent_number

All columns:

patent_number (varchar(255)) unique identifier for the patent from the UPSTO office, can contain letters hence varchar
four (boolean) payment of maintenance fee 4th year
eight (boolean) payment of maintenance fee 8th year
twelve (boolean) payment of maintenance fee 12th year
fee_date (date) date that fee is paid on
fee_code (varchar(255)) code for type of fee to pay

CLAIMS

Purpose: Represent information about the claims for an individual patent

Each entry represents: information about a claim that a patent makes

Primary key: patent_number and claim_id

All columns:

patent_number (varchar(255)) unique identifier for the patent from the UPSTO office, can contain letters hence varchar
claim_id (varchar(255)) id number for the claim
language (varchar(255)) language of claim
claim_type (varchar(255)) type of claim
status (varchar(255)) status of the claim
claim_text (text) the text of the claim

HISTPATENT

Purpose: Represent historical data about the patent pertaining to dates and publication

Each entry represents: sequence of historical data about an individual patent in PATENT

patent_number
pubno
pubdate
display date,
distype
exp_dt
pta

Connecting Patent database and Assignment database

The answer to connecting the Patent database to the Assignment database lies somewhere in using the information in the table DOCUMENT_INFO to connect to a patent_id from the Patent database PATENT table for each assignment in the Assignment database ASSIGNMENT table. On further investigation into "The USPTO Patent Assignment Dataset: Descriptions and Analysis, the field they called "grant_doc_num" (grant_num above) in DOCUMENT_INFO is the patent number (the description of DOCUMENT_INFO above has been altered to reflect this.)

The paper mentions that there will errors in patent number stored in DOCUMENT_INFO under grant_doc_num so a separate table called DOCUMENT_INFO_ADMIN was constructed to determine how prevalent errors were. They queried the patent number from for the appno_doc_num from administrative data. The "appno_doc_num" is the Application Document USPTO number from DOCUMENT_INFO. On further investigation into the paper, it became clear that "administrative data" refers to internal USPTO data that was available to the authors of the paper, but that we might not have. However, in 99% of cases the grant_doc_num and the queried patent number based on the appno_doc_num were the same, so we can probably rely on grant_doc.

Therefore, since every table in the Patent database will have the patent number stored, to connect any table in Assignment to the Patent database, it would first be joined with DOCUMENT_INFO on rf_id and then joined with the appropriate table in Patent on patent number.

Also in the paper, they mentioned the Assignments on the Web for Patents (AOTW-P), a searchable database of individual USPTO assignment records keyed on reel-frame identification, patent number, and assignor or assignee name (https://assignment.uspto.gov/patent/index.html#/patent/search). Obviously it would not be possible to individually use this tool to query all the patent numbers, but if it would be possible to write a script to somehow query each patent number using the rf_id and parse the response, this could potentially be useful to check the patent numbers, but might not be any more accurate than what will already be in DOCUMENT_INFO.

Redesign Assignment and Patent Database

Contents

ER Diagram for Assignment and Patent Databases

Assignment Database Structure

ASSIGNEE

ASSIGNOR

CONVEYANCE

DOCUMENT_INFO

ASSIGNMENT

Patent Database Structure

PATENT

ASSIGNEE

CITATION

FEE

CLAIMS

HISTPATENT

Connecting Patent database and Assignment database

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Sites

Sections

Organizations

Help

Tools