[[Grace Tan]] [[Work Logs]] [[Grace Tan (Work Log)|(log page)]]
 
2018-07-19: Started converting the PDFs to txt files. I found pdf_to_txt_bulk_PTLR.py in E:/McNair/Software/Google_Scholar_Crawler, copied it, and moved my data to E:/McNair/Software/Patent_Thicket. When I first tried to run the program, it failed on importing pdfminer because I had originally tried running it in Z, so I installed pdfminer.six, and the import no longer complained under python3.6. It then gave a different error about a missing package, which Wei said we shouldn't touch, so I moved everything back to E. The program now runs but cannot convert any of the PDFs. I have no idea how to fix this.
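A minimal sketch of a bulk PDF-to-txt loop that records why each file fails, which may help narrow down the "cannot convert any PDFs" problem. This is not the contents of pdf_to_txt_bulk_PTLR.py, which is not shown here. It assumes pdfminer.six is installed and uses its `extract_text` function. The names `convert_folder` and `pdfminer_extract`, and the folder names in the usage note, are hypothetical.

```python
from pathlib import Path


def pdfminer_extract(pdf_path):
    """Extract text from one PDF using pdfminer.six."""
    # Imported lazily so a missing or broken install shows up as a
    # per-file error message instead of crashing the whole run.
    from pdfminer.high_level import extract_text  # pip install pdfminer.six
    return extract_text(str(pdf_path))


def convert_folder(pdf_dir, txt_dir, extract=pdfminer_extract):
    """Convert every *.pdf in pdf_dir to a .txt file in txt_dir.

    Returns a dict mapping each failed file name to the reason it failed.
    """
    pdf_dir = Path(pdf_dir)
    txt_dir = Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)

    failures = {}
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        try:
            text = extract(pdf_path)
        except Exception as exc:  # keep going; remember what went wrong
            failures[pdf_path.name] = f"{type(exc).__name__}: {exc}"
            continue
        if not text.strip():
            # Assumption: an empty result often means a scanned, image-only PDF.
            failures[pdf_path.name] = "no text extracted"
            continue
        out_path = txt_dir / (pdf_path.stem + ".txt")
        out_path.write_text(text, encoding="utf-8")
    return failures
```

Calling `convert_folder("pdfs", "txt")` and printing the returned dict would show, per file, whether the failure is an import problem, a parsing error, or an empty extraction.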
2018-07-18: Finished running the rest of the 100 pages. It took quite a long time because Google Scholar was catching me after 2-5 pages rather than 5-10. It helped to switch between the different wifi networks (Rice Visitor, Rice Owls, eduroam). Altogether this resulted in 958 BibTeX files and 613 PDFs from 1000 entries. There might be more entries, but I'm not sure where to find them. I saved the data and code onto the RDP by connecting to it from the Selenium box.