Scholar Crawler Main Program
Depends upon it: [[PTLR Webcrawler]]

=Overview=
This code is located at E:/McNair/Software/Google_Scholar_Crawler/mainProgram.py. It ties several other pieces of code together into a single program for the patent thicket project. The program takes a search term and a number of pages as input. It searches Google Scholar for that term, downloads as many papers as it can from the results, converts them to text, and searches the text for key terms and a definition of patent thicket. Each piece of code can also be used individually for other applications.
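
The flow described above can be sketched as a single function that chains the stages. This is only an illustrative outline, not the contents of mainProgram.py: the name `run_pipeline` and every stage callable passed to it are hypothetical, and the stubs in the usage example do no network or file I/O.

```python
from typing import Callable, Dict, List


def run_pipeline(
    search_term: str,
    num_pages: int,
    setup_dirs: Callable[[str], None],
    crawl: Callable[[str, int], List[str]],
    download: Callable[[List[str]], List[str]],
    to_text: Callable[[List[str]], List[str]],
    find_terms: Callable[[List[str]], Dict[str, List[str]]],
) -> Dict[str, List[str]]:
    """Run the five stages in order and return the key-term results."""
    setup_dirs(search_term)                # Stage 1: result directories
    links = crawl(search_term, num_pages)  # Stage 2: Google Scholar search
    pdfs = download(links)                 # Stage 3: fetch the papers
    texts = to_text(pdfs)                  # Stage 4: PDF to text
    return find_terms(texts)               # Stage 5: key terms search


if __name__ == "__main__":
    # Stub stages (hypothetical), purely to show how the pieces connect.
    result = run_pipeline(
        "patent thicket",
        2,
        setup_dirs=lambda term: None,
        crawl=lambda term, pages: [f"link{i}" for i in range(pages)],
        download=lambda links: [link + ".pdf" for link in links],
        to_text=lambda pdfs: [p.replace(".pdf", ".txt") for p in pdfs],
        find_terms=lambda texts: {"patent thicket": texts},
    )
    print(result)
```

Passing each stage in as a callable mirrors the note that every piece can also be used individually.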

=Stage 1=
Sets up a series of directories to hold the results.
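
A minimal sketch of this kind of directory setup, using the standard-library `pathlib`. The function name `make_result_dirs` and the subdirectory names are assumptions made for illustration; the page does not say which directories the program creates.

```python
from pathlib import Path
from typing import Dict, Iterable


def make_result_dirs(
    base: Path,
    names: Iterable[str] = ("pdfs", "txts", "results"),  # hypothetical names
) -> Dict[str, Path]:
    """Create one subdirectory per name under base and return their paths."""
    dirs = {}
    for name in names:
        path = Path(base) / name
        # parents=True creates missing ancestors; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        dirs[name] = path
    return dirs
```

Because of `exist_ok=True`, calling the function again for the same base directory does not raise an error.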

=Stage 2=
See [[Google Scholar Crawler]], under the scholarcrawl.py heading.

=Stage 3=
[[PDF Downloader]]

=Stage 4=
[[PDF to Text Converter]]

=Stage 5=
[[Key Terms Search]]
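
As a rough illustration of what a key terms search over converted text might look like, the sketch below counts case-insensitive, whole-word occurrences of each term. The function name `count_key_terms` and the matching rule are assumptions; the actual logic lives on the [[Key Terms Search]] page and may differ.

```python
import re
from typing import Dict, Iterable


def count_key_terms(text: str, terms: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive occurrences of each term in text."""
    counts = {}
    for term in terms:
        # re.escape treats the term literally; \b restricts matches to word boundaries.
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts
```

Note that with word-boundary matching a plural such as "patent thickets" would not count toward "patent thicket"; dropping the trailing `\b` would relax that.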
Latest revision as of 12:47, 21 September 2020
Scholar Crawler Main Program | |
---|---|
Project Information | |
Has title | Scholar Crawler Main Program |
Has owner | Christy Warden |
Has start date | 10/23/2017 |
Has deadline date | |
Has keywords | Google Scholar, python |
Has project status | Active |
Has sponsor | McNair Center |
Has project output | Tool |
Copyright © 2019 edegan.com. All Rights Reserved. |