USPTO Patent Litigation Data
USPTO Patent Litigation Data | |
---|---|
Project Information | |
Has title | USPTO Patent Litigation Data |
Has owner | Ed Egan |
Has start date | May 2017 |
Has deadline date | |
Has keywords | Patent, USPTO, Litigation, Data |
Has project status | Tabled |
Dependent(s): | Patent Design Main Page |
Has sponsor | McNair Center |
Has project output | Data |
Copyright © 2019 edegan.com. All Rights Reserved. |
Getting the data
The data is available from https://bulkdata.uspto.gov/data2/patent/litigation/2015/, which is linked directly from https://bulkdata.uspto.gov/
The data comes in .dta (Stata) and .csv formats. We took the .csv files:
    attorneys.csv.zip    29751808   2016-12-29 07:44
    cases.csv.zip        3085347    2016-12-29 07:45
    documents.csv.zip    244399591  2016-12-29 07:46
    names.csv.zip        7256777    2016-12-29 07:48
    pacer_cases.csv.zip  2453937    2016-12-29 07:48
    csv.zip              286947372  2016-12-29 07:45
csv.zip appears to contain the five other files!
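One way to confirm this is to list the archive's members and compare them against the five file names. The sketch below is only illustrative: it assumes the members are stored as plain .csv files (the internal layout of csv.zip was not recorded here), and `missing_members` is a hypothetical helper name. The demonstration runs on a small in-memory archive, not on the real download.

```python
import io
import zipfile

# File names taken from the directory listing above (assumed member names).
EXPECTED = {"attorneys.csv", "cases.csv", "documents.csv",
            "names.csv", "pacer_cases.csv"}


def missing_members(archive, expected=EXPECTED):
    """Return the expected file names that are NOT present in the archive.

    `archive` may be a path or a binary file-like object. Member paths are
    reduced to their base name because the folder layout inside the zip
    is unknown.
    """
    with zipfile.ZipFile(archive) as zf:
        present = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return sorted(set(expected) - present)


# Self-contained demonstration on an in-memory archive (not the real csv.zip).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    for name in ("attorneys.csv", "cases.csv", "names.csv"):
        zf.writestr(name, "placeholder\n")
demo.seek(0)
print(missing_members(demo))  # ['documents.csv', 'pacer_cases.csv']
```

Against the real file, passing the path to csv.zip in the Raw Data folder should return an empty list if the archive really does contain all five files.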
The files are in E:\McNair\PatentData\Litigation\Raw Data
Quick Review
File Tops (5 lines)
names.csv
    case_row_id,case_number,party_row_count,party_type,name_row_count,name
    1,0:79-cv-06704-JCP,1,Plaintiff,1,Burroghs Wellcome Co.
    1,0:79-cv-06704-JCP,2,Defendant,2,Generix Drug Corp.
    3,0:83-cv-06860-JAG,3,Plaintiff,3,Kenneth R. Cornwall
    3,0:83-cv-06860-JAG,4,Defendant,4,"U. S. COnstruction Manufacturing, Inc."
    ...
attorneys.csv
    case_row_id,case_number,party_row_count,party_type,attorney_row_count,name,contactinfo,position
    14,0:92-cv-00398-MJP,40,Plaintiff ,1,"Joel Wyman Collins , Jr","Collins and Lacy; PO Box 12487; Columbia, SC 29211; 803-256-2660; Fax: 803-771-4484; Email: jcollins@collinsandlacy.com",LEAD ATTORNEY; ATTORNEY TO BE NOTICED
    14,0:92-cv-00398-MJP,41,Plaintiff ,2,"Joel Wyman Collins , Jr",(See above for address),LEAD ATTORNEY; ATTORNEY TO BE NOTICED
    ...
cases.csv
    case_row_id,case_number,pacer_id,case_name,court_name,assigned_to,referred_to,case_cause,jurisdictional_basis,demand,jury_demand,lead_case,related_case,settlement,date_filed,date_closed,date_last_filed
    54973,01-Jan-1970,223949,"ASTRAZENECA AB et al v. SANDOZ, INC.",UNITED STATES DISTRICT COURT DISTRICT OF NEW JERSEY,Judge Joel A. Pisano,Magistrate Judge Tonianne J. Bongiovanni,35:271 Patent Infringement,Federal Question,,,,,,2009-01-14,2011-06-02,
    427,0:00-cv-00019,1338,Banner Engineering v. Harris Instrument,UNITED STATES DISTRICT COURT DISTRICT OF MINNESOTA,,,,,,,,,,2000-01-04,2000-03-09,2000-03-02
    428,0:00-cv-00058,1377,"Advanced UroScience, et al v. Inamed Corporation, et al",UNITED STATES DISTRICT COURT DISTRICT OF MINNESOTA,,,,,,,,,,2000-01-11,2000-11-30,2001-02-28
    429,0:00-cv-00172-DWF-AJB,,Farnam Companies Inc v. Miller Manufacturing,U.S. District of Minnesota (DMN),Judge Donovan W. Frank,Chief Mag. Judge Arthur J. Boylan,35:271 Patent Infringement,Federal Question,,,0:98-cv-00040-DWF-AJB,"Dist of AZ, 99-01804",,,,
    ...
documents.csv
    case_row_id,case_number,doc_count,attachment,date_filed,long_description,doc_number,short_description,upload_date
    1,0:79-cv-06704-JCP,1,,2000-08-03,"COPY OF PAPER DOCKET SHEET (kw, Deputy Clerk) (Entered: 08/03/2000)",37,,
    1,0:79-cv-06704-JCP,2,,1982-05-31,"CASE CLOSED. Case and Motions no longer referred to Magistrate. (kw, Deputy Clerk) (Entered: 08/03/2000)",,,
    3,0:83-cv-06860-JAG,1,,2004-02-13,COPY OF PAPER DOCKET SHEET (Former Deputy Clerk) (Entered: 02/13/2004),123,,
    3,0:83-cv-06860-JAG,2,,1992-03-01,Case closed (Former Deputy Clerk) (Entered: 03/05/1992),,,
    ...
pacer_cases.csv
    case_name,court_code,court_name,date_closed,case_number,pacer_id,date_filed
    "Davis v. Favelle Favco Cranes, et al",txsd,Texas Southern District Court,08/13/2001,1:2000-cv-00003,3,2000-01-03
    Monsanto v. Sierks et al,ned,Nebraska District Court,07/10/2002,4:2002-cv-00105,4,2002-03-04
    "Armament Sys & Proc v. Coast Cutlery Co Inc, et al",wied,Wisconsin Eastern District Court,03/03/2008,1:2000-cv-01273,4,2000-09-20
    Tektronix Inc. v. Integraph Corporation,ord,Oregon District Court,09/04/1998,3:1998-cv-00599,7,1998-07-09
    ...
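Previews like the ones above can be produced without loading a whole file, which matters for documents.csv. The following is a minimal sketch, not the procedure actually used for this page: `file_top` is a hypothetical helper, and the demonstration parses two lines copied from the names.csv preview rather than opening the real file. Using the csv module instead of splitting on commas keeps quoted fields such as company names intact.

```python
import csv
import io
from itertools import islice


def file_top(handle, n=5):
    """Return the first n parsed rows (header included) of an open CSV handle."""
    return list(islice(csv.reader(handle), n))


# Two lines copied from the names.csv preview above.
sample = io.StringIO(
    'case_row_id,case_number,party_row_count,party_type,name_row_count,name\n'
    '3,0:83-cv-06860-JAG,4,Defendant,4,"U. S. COnstruction Manufacturing, Inc."\n'
)
rows = file_top(sample)
print(len(rows[1]))  # 6
print(rows[1][-1])   # U. S. COnstruction Manufacturing, Inc.
```

For a file on disk, the handle would come from something like `open(path, newline="")`. The files' character encoding was not checked here, so an explicit `encoding=` argument may be needed.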
File specs
pacer_cases.csv: 74,954 records
    case_name varchar(255),
    court_code varchar(10), --txsd, ned
    court_name varchar(255),
    date_closed date, --mm/dd/yyyy
    case_number varchar(100), --e.g., 1:2000-cv-00003
    pacer_id int, --appears to be int
    date_filed date --yyyy-mm-dd
documents.csv: 5,186,345 records
    case_row_id int,
    case_number varchar(100), --e.g. 0:79-cv-06704-JCP
    doc_count int,
    attachment varchar(255), --NULL when seen
    date_filed date, --yyyy-mm-dd
    long_description text,
    doc_number int,
    short_description varchar(255), --NULL when seen
    upload_date date --NULL when seen
cases.csv: 74,630 records
    case_row_id int,
    case_number varchar(100),
    pacer_id int,
    case_name text,
    court_name text,
    assigned_to varchar(255), --Often NULL. Example: Judge Joel A. Pisano
    referred_to varchar(255), --Often NULL. Example: Magistrate Judge Tonianne J. Bongiovanni
    case_cause varchar(255), --Often NULL. Examples: 35:271 Patent Infringement, 35:145 Patent Infringement, etc.
    jurisdictional_basis varchar(255), --Often NULL. Example: Federal Question
    demand varchar(100), --appears to be one of NULL, plaintiff, defendant, both
    jury_demand varchar(100),
    lead_case varchar(100), --appears to be case_number
    related_case text, --appears to be a mix of things in a semicolon-separated list
    settlement text, --appears always NULL
    date_filed date, --yyyy-mm-dd
    date_closed date, --yyyy-mm-dd
    date_last_filed date --yyyy-mm-dd
names.csv: 561,019 records
    case_row_id int,
    case_number varchar(100),
    party_row_count int,
    party_type varchar(20), --Plaintiff or Defendant
    name_row_count int,
    name varchar(255)
attorneys.csv: 1,223,419 records
    case_row_id int,
    case_number varchar(100),
    party_row_count int,
    party_type varchar(20), --Plaintiff or Defendant
    attorney_row_count int,
    name varchar(255),
    contactinfo varchar(255), --semicolon-separated value
    position varchar(255) --semicolon-separated list e.g., LEAD ATTORNEY; ATTORNEY TO BE NOTICED
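Two details in these specs need handling before the files can be loaded into typed columns: dates arrive in two formats (mm/dd/yyyy in pacer_cases.date_closed, yyyy-mm-dd elsewhere), and some fields hold semicolon-separated lists. The sketch below shows one possible normalisation. `parse_date` and `split_list` are hypothetical helper names, and treating empty or unrecognised values as NULL is an assumption, not something the data documentation specifies.

```python
from datetime import datetime

# The two date layouts noted in the specs above.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Parse a date in either observed format; return None if empty or unrecognised."""
    text = (text or "").strip()
    if not text:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None  # assumption: unparseable values are treated like NULL


def split_list(text):
    """Split a semicolon-separated field into trimmed, non-empty parts."""
    return [part.strip() for part in (text or "").split(";") if part.strip()]


print(parse_date("08/13/2001"))  # 2001-08-13
print(parse_date("2000-01-03"))  # 2000-01-03
print(split_list("LEAD ATTORNEY; ATTORNEY TO BE NOTICED"))
# ['LEAD ATTORNEY', 'ATTORNEY TO BE NOTICED']
```

Returning None instead of raising keeps a bulk load from stopping on odd values, at the cost of hiding them, so counting the None results per column would be a sensible check.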
Obvious Issues
There are no codified patent numbers or outcomes.
Some patent numbers can be found in documents.long_description, but this field seems to contain only the docket headers. Most patent numbers will likely be in the documents themselves, which we don't have and would have to OCR.
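A first pass at pulling patent numbers out of documents.long_description could use a regular expression. The sketch below is an assumption about how such numbers might be written ("Patent No. 5,123,456" and close variants). The pattern has not been validated against the data, the example docket strings are invented for illustration, and `extract_patent_numbers` is a hypothetical name.

```python
import re

# Matches "Patent No. 5,123,456", "Pat. No. 5123456", "Patent Nos. ..." (first number only).
PATENT_RE = re.compile(
    r"\bPat(?:ent|\.)?\s*Nos?\.?\s*(\d{1,2},\d{3},\d{3}|\d{7,8})",
    re.IGNORECASE,
)


def extract_patent_numbers(description):
    """Return the patent numbers found in a docket description, digits only."""
    return [m.group(1).replace(",", "") for m in PATENT_RE.finditer(description or "")]


# Invented example strings, not taken from documents.csv.
print(extract_patent_numbers(
    "COMPLAINT for infringement of U.S. Patent No. 5,123,456 (Entered: 01/02/2000)"
))  # ['5123456']
print(extract_patent_numbers("CASE CLOSED."))  # []
```

This only captures the number immediately after "No."; lists such as "Nos. X and Y", shorthand like "the '456 patent", and design or reissue numbers would need additional patterns.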
We might be able to piece together outcomes from documents.long_description, but this would be very hard. Clearly, this is one of the ways Lex Machina adds value.
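To gauge how hard this would be, a crude starting point is to flag docket entries whose text mentions outcome-related terms and then review the hits by hand. The sketch below is a rough heuristic, not a classifier: the keyword list is an assumption that has not been checked against the data, and `outcome_hints` is a hypothetical name. The first example string is a docket entry from the documents.csv preview above.

```python
import re

# Hypothetical keyword patterns; the label names are illustrative only.
OUTCOME_PATTERNS = {
    "closed": re.compile(r"\bcase\s+closed\b", re.IGNORECASE),
    "dismissal": re.compile(r"\bdismiss(?:al|ed|ing)?\b", re.IGNORECASE),
    "judgment": re.compile(r"\bjudge?ment\b", re.IGNORECASE),
    "settlement": re.compile(r"\bsettle(?:d|ment)?\b", re.IGNORECASE),
}


def outcome_hints(description):
    """Return the sorted labels whose pattern occurs in the docket description."""
    text = description or ""
    return sorted(label for label, pat in OUTCOME_PATTERNS.items() if pat.search(text))


print(outcome_hints(
    "CASE CLOSED. Case and Motions no longer referred to Magistrate. "
    "(kw, Deputy Clerk) (Entered: 08/03/2000)"
))  # ['closed']
```

A hit only shows that a term appears in an entry. It says nothing about which party prevailed or whether the entry reflects the final disposition, which is why reconstructing outcomes remains the hard part.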