Difference between revisions of "PTLR Webcrawler"

From edegan.com

Revision as of 15:24, 28 September 2017

PTLR Codification

Christy

Monday: 3-5

Tuesday: 9-10:30, 4-5:45

Thursday: 2:15-3:45

Steps

Search on Google

Complete; run the query from the command line to get results.

Download BibTex

Complete

Download PDFs

Incomplete, struggling to find links.

Christy's LOG

09/27

Created FindKeyTerms.py in Software/Google_Scholar_Crawler, which takes in a text file and returns counts of the key terms from the codification page. The SIS, DHCI and OP terms are already included, and I am working on adding the others.
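As a rough illustration of the kind of counting described above (not the actual contents of FindKeyTerms.py), a minimal sketch might look like the following. The function names and the example terms are assumptions made up for this sketch; the real term lists are on the codification page, and the terms placed under SIS, DHCI and OP here are placeholders only.

```python
import re
from collections import Counter

# PLACEHOLDER terms only. The real SIS, DHCI and OP term lists live on the
# codification page; the grouping below is arbitrary and for illustration.
KEY_TERMS = {
    "SIS": ["patent thicket"],
    "DHCI": ["patent pool"],
    "OP": ["litigation"],
}


def count_key_terms(text, key_terms=KEY_TERMS):
    """Return a Counter mapping each term to its case-insensitive whole-word count."""
    lowered = text.lower()
    counts = Counter()
    for terms in key_terms.values():
        for term in terms:
            # \b keeps "patent pool" from matching inside "patent pooling".
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[term] = len(re.findall(pattern, lowered))
    return counts


def count_file(path):
    """Hypothetical helper: read one converted .txt file and count its terms."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return count_key_terms(handle.read())
```

Whole-word matching means plurals such as "patent thickets" are not counted unless they are listed as their own term, so the term lists would need to include any variants that should be picked up.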


09/28

Thought that the PDF-to-text converter wasn't working, but realized that it does work, just sloooowly (70 papers converted overnight). Should be fine since we are still developing the rest of the code and we only need to convert them to txt once.
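Since each paper only needs converting once, one way to keep reruns cheap is to skip any PDF that already has a .txt next to it. This is a sketch under assumptions: the log does not say which converter is used, so `convert` below is a hypothetical callable standing in for it, and the function and directory names are made up.

```python
from pathlib import Path


def convert_missing(pdf_dir, txt_dir, convert):
    """Convert each PDF that has no .txt yet; return the newly written paths.

    `convert` is a hypothetical callable (PDF path -> extracted text) that
    stands in for whatever converter the project actually uses.
    """
    pdf_dir, txt_dir = Path(pdf_dir), Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        txt_path = txt_dir / (pdf_path.stem + ".txt")
        if txt_path.exists():
            continue  # already converted on an earlier run
        txt_path.write_text(convert(pdf_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

With this shape, an interrupted overnight run can simply be restarted and will pick up where it left off.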

Continued loading the PTLR codification terms into the word-finding code and got most of the way through (there are so many ahhh but I'm learning ways to do this more quickly). Once they're all loaded, I will create some example files of the kind of output this program will produce for Lauren to review, and then start:

1) Seeking definitions of patent thicket (I think I'll start by pulling any sentence in which "patent thicket" occurs, as well as the sentence before and after).

2) Classifying papers based on the matrix of term appearances that the current program builds.
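For step 1, the idea of pulling each sentence that mentions "patent thicket" together with its neighbours could be sketched roughly as below. The function name, the `window` parameter, and the regex-based sentence splitter are assumptions made for illustration; nothing here is taken from the project's code.

```python
import re


def sentences_with_context(text, phrase="patent thicket", window=1):
    """Return each sentence containing `phrase`, joined with `window`
    sentences before and after it."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    # Abbreviations such as "et al." will be split incorrectly.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    needle = phrase.lower()
    passages = []
    for i, sentence in enumerate(sentences):
        if needle in sentence.lower():
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            passages.append(" ".join(sentences[start:end]))
    return passages
```

If two matching sentences sit next to each other, their passages will overlap; that seems acceptable for a first pass whose output is reviewed by hand.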

Lauren's LOG

09/27

Took a random sample from "Candidate Papers by LB" and am reading each paper, extracting the definitions, and coding the definitions by hand. This is expected to be a control group against which the computer-coded papers will be tested for accuracy in the future. The random sample contains the following publications:

Entezarkheir (2016) - Patent Ownership Fragmentation and Market Value An Empirical Analysis.pdf

Herrera (2014) - Not Purely Wasteful Exploring a Potential Benefit to Weak Patents.pdf

Kumari et al. (2017) - Managing Intellectual Property in Collaborative Way to Meet the Agricultural Challenges in India.pdf

Pauly (2015) - The Role of Intellectual Property in Collaborative Research Crossing the 'Valley of Death' by Turning Discovery into Health.pdf

Lampe Moser (2013) - Patent Pools and Innovation in Substitute Technologies - Evidence From the 19th-Century Sewing Machine Industry.pdf

Phuc (2014) - Firm's Strategic Responses in Standardization.pdf

Reisinger Tarantino (2016) - Patent Pools in Vertically Related Markets.pdf

Miller Tabarrok (2014) - Ill-Conceived, Even If Competently Administered - Software Patents, Litigation, and Innovation--A Comment on Graham and Vishnubhakat.pdf

Llanes Poblete (2014) - Ex Ante Agreements in Standard Setting and Patent-Pool Formation.pdf

Utku (2014) - The Near Certainty of Patent Assertion Entity Victory in Portfolio Patent Litigation.pdf

Trappey et al. (2016) - Computer Supported Comparative Analysis of Technology Portfolio for LTE-A Patent Pools.pdf

Delcamp Leiponen (2015) - Patent Acquisition Services - A Market Solution to a Legal Problem or Nuclear Warfare.pdf

Allison Lemley Schwartz (2015) - Our Divided Patent System.pdf

Cremers Schliessler (2014) - Patent Litigation Settlement in Germany - Why Parties Settle During Trial.pdf
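The sample-then-compare idea above could later be sketched along these lines. The log does not say how the sample was drawn or how accuracy will be measured, so the seeded sampling, the simple agreement rate, and both function names are assumptions chosen only for illustration.

```python
import random


def draw_sample(papers, k, seed=0):
    """Draw k papers without replacement; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(list(papers), k)


def agreement_rate(hand_codes, computer_codes):
    """Fraction of papers coded by both methods where the two codes match.

    Both arguments map a paper name to its code. Returns None when no paper
    appears in both mappings.
    """
    shared = [paper for paper in hand_codes if paper in computer_codes]
    if not shared:
        return None
    matches = sum(hand_codes[paper] == computer_codes[paper] for paper in shared)
    return matches / len(shared)
```

A plain agreement rate is only one possible measure; it treats every disagreement the same and does not correct for matches that happen by chance.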