Redesign Assignment and Patent Database
This is an extension of the work I did last semester under "Redesigning Patent Database". Instead of simply reconfiguring the existing database, this project encompasses and full redesign and creation of a new Patent database and a new Assignment database that will be joined together.
Redesign Assignment and Patent Database | |
---|---|
Project Information | |
Project Title | Redesign Assignment and Patent Database |
Owner | Shelby Bice |
Start Date | 9/2017 |
Deadline | |
Keywords | patent |
Primary Billing | |
Notes | |
Has project status | |
Is dependent on | Reproducible Patent Data |
Copyright © 2016 edegan.com. All Rights Reserved. |
Assignment Database Structure
After reading through the paper "The USPTO Patent Assignment Dataset: Descriptions and Analysis"(https://www.uspto.gov/sites/default/files/documents/USPTO_Patents_Assignment_Dataset_WP.pdf) and taking into account are needs for the Assignment database (the ability to connect the two databases together, the ability to trace conveyance of a patent over time) I decided on the following structure for the Assignment database.
The foreign key that connects every table in the database is rf_id, which stands for reel frame id. It is unique for each entry in the main table, assignment and is included in every table.
Conveyance is the only table that cannot be fully constructed as data is inserted into the database. Some of the fields depend on other tables being constructed, so it may be either partially populated while the other tables are being populated or is the last table to be populated.
Full list of tables: assignee, assignor, document_info, assignment, conveyance
ASSIGNEE
Purpose: store all the assignees from all the assignments
Each entry represents: the recipient of a patent in an entry in the ASSIGNMENT table
Primary key: rf_id, unless it is possible to have multiple assignees, then rf_id and assignee name
All columns (variables) in the table: rf_id, name, assignee, address_line_1, address_line_2, city, state, country, postal_code
ASSIGNOR
Purpose: store all the assignors from all the assignments
Each entry represents: the one assigning/granting the license for the entry in the ASSIGNMENT table
Primary key: rf_id and assignor_name
All columns (variables): rf_id, assignor_name
CONVEYANCE
Purpose: represent the conveyance of a particular patent overtime
Each entry represents: an individual transaction for a patent (not unlike assignment, except this table has information about the assignment in relation to other assignments for the patent
Primary key: rf_id
All columns (variables): Two possibilities, depending on whether we want data to be repeated or not.
Not-repeated (with exception of primary/foreign key): rf_id, next_transaction (also rf_id), num_conveyance (number in order of conveyance by date), acknowledgment_date (in ASSIGNOR for the assignment data paper but would be moved here. Could originally be stored in ASSIGNOR until CONVEYANCE is generated), execution_date (same deal as acknowledgment date, should probably be moved here; if there are multiple execution dates, we will store the last one) conveyance_type
Repeated, also include: assignee, assignor, patent_id
DOCUMENT_INFO
Purpose: store extra information relevant to the patents represented in ASSIGNMENT
Each entry represents: information about the documentation and the patent for a particular entry in ASSIGNMENT
Primary key: rf_id
All columns (variables) when created: rf_id, app_num (application document USPTO number), app_date, app_country, pgpub_num (pre-grant publication document USPTO number), pgpub_date, pgpub_country, grant_num (granted patent document USPTO number, also know as the patent number), grant_date, grant_country, invention_title, language (of invention title – could be potentially useful and interesting to investigate), reel_num, frame_num (reel_num and frame_num were in ASSIGNMENT for the uspto data paper, but I think they would be more useful here)
ASSIGNMENT
Purpose: represent assignment transactions – central table for the database
Each entry represents: an individual patent assignment transaction
Primary key: rf_id
All columns (variables): rf_id, assignment_id (US patent assignment id), correspondent_name, correspondent_address1, correspondent_address_2, correspondent_address3, correspondent_address4, record_date (date recorded with USPTO), last_update_date, page_cnt (page count of assignment record)
More extensive notes exist under :E/McNair/Projects/Redesigning Patent Database/New Patent Database Project/Notes on USPTO Assignment Data Paper
Patent Database Structure
Most of this design is based off what I investigated last semester and therefore is subject to change, since the design for the Assignment database is somewhat new.
The foreign key that connects every table is patent_number, which uniquely identifies every patent in the PATENT table and is included in every table.
PATENT
Purpose: represent patents – central table for the database
Each entry represents: an individual patent
Primary key: patent_number
All columns (variables): patent_number, grantdate, prioritydate, prioritycountry, prioritypatentnumber, cpcsubgroup, pctpatentnumber, claims, appnum, grantyear, appdate, appyear, nber, uspc, uspc_sub
Connecting Patent database and Assignment database
The answer to connecting the Patent database to the Assignment database lies somewhere in using the information in the table DOCUMENT_INFO to connect to a patent_id from the Patent database PATENT table for each assignment in the Assignment database ASSIGNMENT table. On further investigation into "The USPTO Patent Assignment Dataset: Descriptions and Analysis, the field they called "grant_doc_num" (grant_num above) in DOCUMENT_INFO is the patent number (the description of DOCUMENT_INFO above has been altered to reflect this.)
The paper mentions that there will errors in patent number stored in DOCUMENT_INFO under grant_doc_num so a separate table called DOCUMENT_INFO_ADMIN was constructed to determine how prevalent errors were. They queried the patent number from for the appno_doc_num from administrative data. The "appno_doc_num" is the Application Document USPTO number from DOCUMENT_INFO. On further investigation into the paper, it became clear that "administrative data" refers to internal USPTO data that was available to the authors of the paper, but that we might not have. However, in 99% of cases the grant_doc_num and the queried patent number based on the appno_doc_num were the same, so we can probably rely on grant_doc.
Therefore, since every table in the Patent database will have the patent number stored, to connect any table in Assignment to the Patent database, it would first be joined with DOCUMENT_INFO on rf_id and then joined with the appropriate table in Patent on patent number.
Also in the paper, they mentioned the Assignments on the Web for Patents (AOTW-P), a searchable database of individual USPTO assignment records keyed on reel-frame identification, patent number, and assignor or assignee name (https://assignment.uspto.gov/patent/index.html#/patent/search). Obviously it would not be possible to individually use this tool to query all the patent numbers, but if it would be possible to write a script to somehow query each patent number using the rf_id and parse the response, this could potentially be useful to check the patent numbers, but might not be any more accurate than what will already be in DOCUMENT_INFO.