USPTO Patent Litigation Data

From edegan.com
Revision as of 10:28, 31 October 2017 by MichelleH (talk | contribs)


McNair Project
USPTO Patent Litigation Data
Project Information
Project Title USPTO Patent Litigation Data
Owner Ed Egan
Start Date May 2017
Deadline
Keywords Patent, USPTO, Litigation, Data
Primary Billing
Notes
Has project status Tabled
Copyright © 2016 edegan.com. All Rights Reserved.


Getting the data

The data is available from https://bulkdata.uspto.gov/data2/patent/litigation/2015/, which is linked directly from https://bulkdata.uspto.gov/

The data comes in .dta (Stata) and .csv formats. We took the .csv files:

File	Size (bytes)	Last modified
attorneys.csv.zip	29751808	2016-12-29 07:44
cases.csv.zip	3085347	2016-12-29 07:45
documents.csv.zip	244399591	2016-12-29 07:46
names.csv.zip	7256777	2016-12-29 07:48
pacer_cases.csv.zip	2453937	2016-12-29 07:48
csv.zip	286947372	2016-12-29 07:45

csv.zip appears to bundle the other five files.
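Python's standard zipfile module offers a quick way to confirm what csv.zip holds before unpacking it. This is a minimal sketch. The archive path is a placeholder, and we have not checked the exact member names inside csv.zip.

```python
import zipfile


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each file in a zip archive."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def extract_archive(path, dest):
    """Extract every member of the archive into dest and return the member names."""
    with zipfile.ZipFile(path) as zf:
        zf.extractall(dest)  # creates dest if it does not already exist
        return zf.namelist()
```

Calling list_archive on the downloaded csv.zip returns one (name, size) pair per member. Those sizes can be compared against the individual .zip downloads listed above.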

The files are in E:\McNair\PatentData\Litigation\Raw Data

Quick Review

File Tops (first 5 lines of each file)

names.csv

case_row_id,case_number,party_row_count,party_type,name_row_count,name
1,0:79-cv-06704-JCP,1,Plaintiff,1,Burroghs Wellcome Co.
1,0:79-cv-06704-JCP,2,Defendant,2,Generix Drug Corp.
3,0:83-cv-06860-JAG,3,Plaintiff,3,Kenneth R. Cornwall
3,0:83-cv-06860-JAG,4,Defendant,4,"U. S. COnstruction Manufacturing, Inc."
...

attorneys.csv

case_row_id,case_number,party_row_count,party_type,attorney_row_count,name,contactinfo,position
14,0:92-cv-00398-MJP,40,Plaintiff ,1,"Joel Wyman Collins , Jr","Collins and Lacy; PO Box 12487; Columbia, SC 29211; 803-256-2660; Fax: 
803-771-4484; Email: jcollins@collinsandlacy.com",LEAD ATTORNEY; ATTORNEY TO BE NOTICED
14,0:92-cv-00398-MJP,41,Plaintiff ,2,"Joel Wyman Collins , Jr",(See above for address),LEAD ATTORNEY; ATTORNEY TO BE NOTICED
...
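The first attorneys.csv record above spans two physical lines, because the quoted contactinfo field contains a line break after "Fax:". Line-based tools such as head, wc -l, or a naive split on newlines will therefore miscount or mangle records. A real CSV parser keeps the field intact. Below is a minimal sketch using Python's csv module. It assumes the files are UTF-8, which we have not confirmed.

```python
import csv


def read_rows(path, limit=None):
    """Yield rows as dicts keyed by the header; quoted fields may span physical lines."""
    # newline="" lets the csv module handle line breaks inside quoted fields itself
    with open(path, newline="", encoding="utf-8") as f:  # encoding is an assumption
        for i, row in enumerate(csv.DictReader(f)):
            if limit is not None and i >= limit:
                break
            yield row


def count_records(path):
    """Count logical CSV records (excluding the header), not physical lines."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)
```

Counting with count_records rather than wc -l is probably the safer way to reproduce the record counts given under File specs.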

cases.csv

case_row_id,case_number,pacer_id,case_name,court_name,assigned_to,referred_to,case_cause,jurisdictional_basis,demand,jury_demand,lead_case,related_case,settlement,date_filed,date_closed,date_last_filed
54973,01-Jan-1970,223949,"ASTRAZENECA AB et al v. SANDOZ, INC.",UNITED STATES DISTRICT COURT DISTRICT OF NEW JERSEY,Judge Joel A. Pisano,Magistrate Judge Tonianne J. Bongiovanni,35:271 Patent Infringement,Federal Question,,,,,,2009-01-14,2011-06-02,
427,0:00-cv-00019,1338,Banner Engineering v. Harris Instrument,UNITED STATES DISTRICT COURT DISTRICT OF MINNESOTA,,,,,,,,,,2000-01-04,2000-03-09,2000-03-02
428,0:00-cv-00058,1377,"Advanced UroScience, et al v. Inamed Corporation, et al",UNITED STATES DISTRICT COURT DISTRICT OF MINNESOTA,,,,,,,,,,2000-01-11,2000-11-30,2001-02-28
429,0:00-cv-00172-DWF-AJB,,Farnam Companies Inc v. Miller Manufacturing,U.S. District of Minnesota (DMN),Judge Donovan W. Frank,Chief Mag. Judge Arthur J. Boylan,35:271 Patent Infringement,Federal Question,,,0:98-cv-00040-DWF-AJB,"Dist of AZ, 99-01804",,,,
...

documents.csv

case_row_id,case_number,doc_count,attachment,date_filed,long_description,doc_number,short_description,upload_date
1,0:79-cv-06704-JCP,1,,2000-08-03,"COPY OF PAPER DOCKET SHEET (kw, Deputy Clerk) (Entered: 08/03/2000)",37,,
1,0:79-cv-06704-JCP,2,,1982-05-31,"CASE CLOSED. Case and Motions no longer referred to Magistrate. (kw, Deputy Clerk) (Entered: 08/03/2000)",,,
3,0:83-cv-06860-JAG,1,,2004-02-13,COPY OF PAPER DOCKET SHEET (Former Deputy Clerk) (Entered: 02/13/2004),123,,
3,0:83-cv-06860-JAG,2,,1992-03-01,Case closed (Former Deputy Clerk) (Entered: 03/05/1992),,,
...

pacer_cases.csv

case_name,court_code,court_name,date_closed,case_number,pacer_id,date_filed
"Davis v. Favelle Favco Cranes, et al",txsd,Texas Southern District Court,08/13/2001,1:2000-cv-00003,3,2000-01-03
Monsanto v. Sierks et al,ned,Nebraska District Court,07/10/2002,4:2002-cv-00105,4,2002-03-04
"Armament Sys & Proc v. Coast Cutlery Co Inc, et al",wied,Wisconsin Eastern District Court,03/03/2008,1:2000-cv-01273,4,2000-09-20
Tektronix Inc. v. Integraph Corporation,ord,Oregon District Court,09/04/1998,3:1998-cv-00599,7,1998-07-09
...
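The pacer_cases.csv sample mixes date formats: date_closed is mm/dd/yyyy, while date_filed is yyyy-mm-dd. A small parser can try both formats. This sketch assumes no other formats occur in that file, and it raises on anything else so that unexpected values surface instead of being silently dropped.

```python
from datetime import datetime


def parse_pacer_date(value):
    """Parse a date written as yyyy-mm-dd or mm/dd/yyyy; return None for blanks."""
    value = (value or "").strip()
    if not value:
        return None
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date: {value!r}")
```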

File specs

pacer_cases.csv: 74,954 records
 case_name  varchar(255),
 court_code  varchar(10), --e.g., txsd, ned
 court_name  varchar(255),
 date_closed  date, --mm/dd/yyyy
 case_number  varchar(100), --e.g., 1:2000-cv-00003
 pacer_id   int, --appears to be int
 date_filed  date --yyyy-mm-dd
documents.csv: 5,186,345 records
 case_row_id  int,
 case_number  varchar(100), --e.g. 0:79-cv-06704-JCP
 doc_count  int,
 attachment  varchar(255), --NULL in every row inspected
 date_filed  date, --yyyy-mm-dd
 long_description  text,
 doc_number  int,
 short_description  varchar(255), --NULL in every row inspected
 upload_date  date --NULL in every row inspected
cases.csv: 74,630 records
 case_row_id  int,
 case_number  varchar(100), 
 pacer_id  int,
 case_name  text,
 court_name  text,
 assigned_to  varchar(255), --Often NULL. Examples Judge Joel A. Pisano
 referred_to  varchar(255), --Often NULL. Examples Magistrate Judge Tonianne J. Bongiovanni
 case_cause  varchar(255), --Often NULL. Examples: 35:271 Patent Infringement, 35:145 Patent Infringement, etc.
 jurisdictional_basis  varchar(255), --Often NULL. Examples: Federal Question
 demand  varchar(100), --appears to be one of NULL, plaintiff, defendant, both
 jury_demand  varchar(100),
 lead_case  varchar(100), --appears to be case_number
 related_case  text, --appears to be a mix of values in a semicolon-separated list
 settlement  text, --appears always NULL
 date_filed  date, --yyyy-mm-dd
 date_closed  date, --yyyy-mm-dd
 date_last_filed  date --yyyy-mm-dd
names.csv: 561,019 records
 case_row_id  int,
 case_number  varchar(100),
 party_row_count  int, 
 party_type  varchar(20), --Plaintiff or Defendant
 name_row_count  int,
 name  varchar(255)
attorneys.csv: 1,223,419 records
 case_row_id  int,
 case_number  varchar(100),
 party_row_count  int,
 party_type   varchar(20),  --Plaintiff or Defendant
 attorney_row_count  int,
 name  varchar(255),
 contactinfo  varchar(255), --semicolon-separated list
 position  varchar(255) --semicolon-separated list, e.g., LEAD ATTORNEY; ATTORNEY TO BE NOTICED
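Several columns pack multiple values into one semicolon-separated string: attorneys.contactinfo, attorneys.position, and cases.related_case. The sample rows also show padding, such as "Plaintiff " in party_type. Below is a minimal normalizing sketch for attorneys.csv rows. It assumes rows arrive as dicts keyed by the header names (e.g. from csv.DictReader), and the helper names are our own.

```python
def split_semicolon_list(value):
    """Split a semicolon-separated field into items with whitespace collapsed."""
    if value is None:
        return []
    # " ".join(part.split()) trims the ends and folds internal runs (incl. newlines)
    items = (" ".join(part.split()) for part in value.split(";"))
    return [item for item in items if item]


def normalize_attorney_row(row):
    """Return a copy of an attorneys.csv row with list fields split and padding removed."""
    out = dict(row)
    out["party_type"] = (row.get("party_type") or "").strip()
    out["position"] = split_semicolon_list(row.get("position"))
    out["contactinfo"] = split_semicolon_list(row.get("contactinfo"))
    return out
```

Collapsing internal whitespace also folds the embedded line break in contactinfo into a single space. Splitting cases.related_case the same way is probably a reasonable first pass, though its items appear to be heterogeneous.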

Obvious Issues

There are no codified patent numbers or case outcomes.

Some patent numbers can be found in documents.long_description, but this field appears to hold only the docket entry text. Most patent numbers are likely to be in the documents themselves, which we don't have and would have to OCR.
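As a first pass at the patent numbers that do appear in long_description, a heuristic regex can pull comma-grouped numbers like 5,123,456 from entries that mention a patent. This is a rough sketch built on our own assumptions about how docket entries write numbers. It will miss ungrouped numbers (5123456) and the common "'456 patent" shorthand, and it may still catch non-patent figures.

```python
import re

# Comma-grouped numbers such as 5,123,456 or 10,123,456; the lookarounds reject
# pieces of longer digit runs and amounts prefixed with "$".
NUMBER_RE = re.compile(r"(?<![\d,$])(\d{1,2},\d{3},\d{3})(?![\d,])")
# Only look at entries that mention "pat", "patent", or "patents" as a word.
MENTION_RE = re.compile(r"\bpat(?:ent)?s?\b", re.IGNORECASE)


def candidate_patent_numbers(description):
    """Return digit-only candidate patent numbers from one docket entry."""
    if not description or not MENTION_RE.search(description):
        return []
    return [m.replace(",", "") for m in NUMBER_RE.findall(description)]
```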

We might be able to piece together outcomes from documents.long_description, but this will be very hard. Clearly, this is one of the ways Lex Machina adds value.
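A keyword tagger over a case's docket entries is an easy way to see how far simple rules get us. The labels and patterns below are our own illustrative guesses, not a validated coding scheme. The second test case shows the core problem: a motion to dismiss gets flagged as a dismissal whether or not it was granted.

```python
import re

# Hypothetical keyword rules; labels and patterns are illustrative, not validated.
OUTCOME_RULES = [
    ("settlement", re.compile(r"\bsettle(?:d|ment)?\b", re.IGNORECASE)),
    ("dismissal", re.compile(r"\bdismiss(?:al|ed)?\b", re.IGNORECASE)),
    ("consent_judgment", re.compile(r"\bconsent\s+judgment\b", re.IGNORECASE)),
    ("verdict", re.compile(r"\bverdict\b", re.IGNORECASE)),
]


def outcome_flags(descriptions):
    """Return the set of labels whose pattern appears in any of a case's docket entries."""
    flags = set()
    for text in descriptions:
        if not text:
            continue  # skip NULL/empty long_description values
        for label, pattern in OUTCOME_RULES:
            if pattern.search(text):
                flags.add(label)
    return flags
```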