Difference between revisions of "USPTO Patent Assignment Dataset"

Project
USPTO Patent Assignment Dataset
Project Information
Has title	USPTO Patent Assignment Dataset
Has owner	Ed Egan
Has start date
Has deadline date
Has keywords	Data
Has project status	Active
Has sponsor	McNair Center
Has project output	Data
	Copyright © 2019 edegan.com. All Rights Reserved.

Latest revision as of 13:41, 21 September 2020

This project describes the build-out and basic use of the USPTO Patent Assignment Dataset.

The data, scripts, etc. are in:

E:\McNair\Projects\USPTO Patent Assignment Dataset

The data is described in a USPTO Economic Working Paper by Marco, Myers, Graham and others: https://www.uspto.gov/sites/default/files/documents/USPTO_Patents_Assignment_Dataset_WP.pdf

Pre-load checks

The data is large, and there isn't enough space for it on the main database server, whose PostgreSQL volume is already 94% full with only 15G available:

df -h
/dev/nvme1n1p2  235G  208G   15G  94% /var/postgresql
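The same pre-load check can be wrapped in a reusable function. This is a minimal sketch that assumes a POSIX-compatible `df`. The function name `has_free_space` and the required-space threshold are hypothetical. The page doesn't state the unpacked size of the dataset, so choose the threshold yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding MOUNT has
# at least REQUIRED_GB GiB available (same 1024-based units as df -h).
# Usage: has_free_space /var/postgresql 100
has_free_space() {
  local mount="$1" required_gb="$2"
  local avail_kb avail_gb
  # -P forces one POSIX-format line per filesystem (no wrapping) and
  # -k reports sizes in 1K blocks. Column 4 is the space available.
  avail_kb=$(df -Pk "$mount" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || return 1   # df failed or printed nothing
  avail_gb=$(( avail_kb / 1024 / 1024 ))   # rounded down to whole GiB
  echo "$mount: ${avail_gb} GiB available, ${required_gb} GiB required"
  [ "$avail_gb" -ge "$required_gb" ]
}
```

For example, `has_free_space /var/postgresql 100 || echo "Load on the RDP instead"` would have failed on the server shown above, which had 15G free.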

Note: To check database space usage on the database server, see the "Size, Backup & Restore" section of Posgres_Server_Configuration.

The PostgreSQL database on the RDP, however, currently has more than 300 GB free and is on a solid-state drive, so its performance should be acceptable.

Getting the data

The data is available pre-processed (see the working paper) from https://bulkdata.uspto.gov/#addt. Specifically, download csv.zip (1,284,462,233 bytes, dated 2017-03-28 15:47) from https://bulkdata.uspto.gov/data/patent/assignment/economics/2016/
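The download step can be scripted. This is a minimal sketch that assumes `curl` and `unzip` are installed. It also assumes the file's full URL is the directory URL above with `csv.zip` appended, and the bulk-data site layout may have changed since. The helper names are hypothetical. Comparing the byte count against the listed size catches truncated downloads.

```shell
#!/usr/bin/env bash
# Assumed full URL: the directory from the bulk-data listing + csv.zip.
PAD_URL="https://bulkdata.uspto.gov/data/patent/assignment/economics/2016/csv.zip"
PAD_BYTES=1284462233   # size shown in the bulk-data listing

# Hypothetical helper: succeed only if FILE exists and is EXPECTED bytes long.
has_expected_size() {
  local file="$1" expected="$2" actual
  [ -f "$file" ] || return 1
  # $(( )) strips the padding that some wc implementations add.
  actual=$(( $(wc -c < "$file") ))
  [ "$actual" -eq "$expected" ]
}

# Hypothetical helper: download csv.zip into DEST, verify its size, unpack it.
fetch_pad_csvs() {
  local dest="${1:-.}"
  mkdir -p "$dest" || return 1
  # -f: fail on HTTP errors, -L: follow redirects, -o: output file.
  curl -fL -o "$dest/csv.zip" "$PAD_URL" || return 1
  if ! has_expected_size "$dest/csv.zip" "$PAD_BYTES"; then
    echo "csv.zip is not $PAD_BYTES bytes; the download may be incomplete" >&2
    return 1
  fi
  unzip -o "$dest/csv.zip" -d "$dest"
}
```

Calling `fetch_pad_csvs ./pad` (a hypothetical target directory) downloads, checks, and unpacks the CSVs in one step.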

The load script is:

LoadUSPTOPAD.sql

To convert the data to UTF-8 or ASCII, move it to the database server and then:

Check its encoding using:

file -i file
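`file -i` guesses the charset, and on a large file it may only examine the beginning. A complementary check counts the bytes outside 7-bit ASCII across the whole file. A count of zero means the file is already plain ASCII and needs no conversion. This is a minimal sketch using standard `tr` and `wc`, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many bytes in FILE fall outside 7-bit ASCII.
count_non_ascii() {
  # LC_ALL=C makes tr work on raw bytes. Delete every byte in the ASCII
  # range (octal 000-177) and count what is left.
  LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: report the non-ASCII byte count of every CSV in DIR.
report_csv_encodings() {
  local dir="${1:-.}" f n
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue   # the glob matched nothing
    n=$(count_non_ascii "$f")
    echo "$f: $n non-ASCII bytes"
  done
}
```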

Convert it to UTF-8 using the following command (the //TRANSLIT suffix approximates characters that can't be represented directly in the target encoding):

iconv -f oldformat -t UTF-8//TRANSLIT file -o outfile

- The -s and -c options make iconv skip characters it can't convert and move on (-c omits them from the output; -s suppresses the warnings):

iconv -sc -f oldformat -t UTF-8//TRANSLIT file -o outfile

Bash scripts that convert all of the CSVs are in Z:\USPTO_assigneesdata. Make the one you need executable and then run it:

chmod +x encoding.sh
./encoding.sh
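The contents of encoding.sh aren't reproduced on this page. The following is a minimal sketch of what such a batch script might look like. It assumes Windows-1252 input, ASCII output, and GNU (glibc) `iconv`. The function name and output-directory layout are hypothetical. The function writes converted copies to a separate directory so the originals are never overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical batch conversion in the spirit of encoding.sh: convert every
# CSV in SRC from Windows-1252 to ASCII, writing the results to OUT.
convert_csvs_to_ascii() {
  local src="${1:-.}" out="${2:-./ascii}" f
  mkdir -p "$out" || return 1
  for f in "$src"/*.csv; do
    [ -e "$f" ] || continue   # no CSVs found
    # //TRANSLIT approximates characters ASCII lacks (e.g. accented letters),
    # -c drops anything still unconvertible, and -s silences the warnings.
    # glibc iconv can exit non-zero after dropping characters, so warn
    # instead of aborting, and inspect that file by hand afterwards.
    if ! iconv -sc -f WINDOWS-1252 -t ASCII//TRANSLIT "$f" -o "$out/$(basename "$f")"; then
      echo "warning: $f had characters that could not be converted" >&2
    fi
  done
}
```

For example, `convert_csvs_to_ascii ./csv ./csv_ascii` converts every CSV in one pass. The transliteration result for a given character depends on the current locale, so spot-check the output.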

Note that the final source encoding was Windows-1252 (Win1252) and the final target encoding was ASCII.
All but three of the files had to be fixed manually to remove errors. The final files are in E:\McNair\Projects\USPTO Patent Assignment Dataset
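The following sketch helps locate the rows that need manual fixing. It prints the line numbers of any line that contains a byte other than printable ASCII or whitespace, such as a leftover Win1252 byte. It runs `grep` in the C locale so that each byte is matched individually. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line numbers in FILE that contain any byte
# other than printable ASCII or whitespace.
find_bad_lines() {
  # LC_ALL=C makes the character classes byte-based ASCII classes.
  # -a treats the file as text even if it looks binary; -n adds line numbers.
  LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1" | cut -d: -f1
}
```

Each reported line can then be viewed with, for example, `sed -n '42p' file` before it is edited by hand.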
