PTLR Webcrawler

Revision as of 15:24, 28 September 2017

Created file FindKeyTerms.py in Software/Google_Scholar_Crawler, which takes in a text file and returns counts of the key terms from the codification page.

The SIS, DHCI, and OP terms are already included, and I am working on adding the others.
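As a rough illustration of the kind of counting described above, here is a minimal sketch. It is not the actual contents of FindKeyTerms.py: the term list, the function names, and the choice of whole-word, case-insensitive matching are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical sample terms for illustration only; the real list
# comes from the codification page and is not reproduced here.
KEY_TERMS = ["patent thicket", "cross-licensing", "patent pool"]


def count_key_terms(text, terms=KEY_TERMS):
    """Return a Counter mapping each term to its occurrence count in text."""
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        # Word boundaries give exact whole-phrase matches, so
        # "patent thicket" does not match "patent thickets".
        # Plural forms would need their own entries (an assumption here).
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


def count_key_terms_in_file(path, terms=KEY_TERMS):
    """Read a converted .txt paper and count the key terms in it."""
    with open(path, encoding="utf-8") as handle:
        return count_key_terms(handle.read(), terms)
```

Calling `count_key_terms_in_file` on each converted paper would give one row of counts per paper, which is roughly the shape a term-by-paper matrix needs.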

09/28

Thought that the PDF-to-text converter wasn't working, but realized that it does work, just very slowly (70 papers converted overnight). This should be fine, since we are still developing the rest of the code and only need to convert the papers to txt once.

Continued to load PTLR codification terms into the word-finding code and got most of the way through (there are so many, but I'm learning ways to do this more quickly). Once they're all loaded, I will create some example files of the kind of output this program will produce for Lauren to review, and then start:

1) Seeking definitions of patent thicket (I think I'll start by pulling every sentence in which "patent thicket" occurs, along with the sentence before and the sentence after).

2) Classifying papers based on the matrix of term appearances that the current program builds.
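The sentence-pulling idea in step 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the `window` parameter, and the naive punctuation-based sentence splitter are all hypothetical, and text converted from PDF will likely need a more careful splitter (abbreviations such as "et al." would break this one).

```python
import re


def sentences_with_context(text, phrase="patent thicket", window=1):
    """Return each sentence containing phrase, joined with its neighbors.

    window is the number of sentences kept on each side of a match.
    """
    # Naive splitter (an assumption): break after ., ! or ? plus whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    hits = []
    for i, sentence in enumerate(sentences):
        if phrase.lower() in sentence.lower():
            # Clamp the slice so matches near either end still work.
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            hits.append(" ".join(sentences[start:end]))
    return hits
```

Matches in adjacent sentences produce overlapping excerpts; merging them is left out to keep the sketch short.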

=Lauren's LOG=

ChristyW