Changes

Grace Tan (Work Log)

Revision as of 16:44, 18 July 2018

[[Grace Tan]] [[Work Logs]] [[Grace Tan (Work Log)|(log page)]]

2018-07-18: Finished running the rest of the 100 pages. It took quite a long time because Google Scholar was catching me after 2-5 pages rather than 5-10. It helped to switch between the different wifi networks (Rice Visitor, Rice Owls, eduroam). Altogether this resulted in 958 bibtex files and 613 pdfs from 1000 entries. There might be more entries but I'm not sure where to find them. I saved the data and code onto the rdp by connecting to it from the selenium box.

2018-07-17: Ran Google Scholar crawler. When Google Scholar blocks me with a 403 error code, I exit the program and rerun it at the page that it last looked at by clicking on the correct page number before crawling. I finished running through 68/100 pages of Google Scholar.

2018-07-16: Ran through 10 pages of Google Scholar first thing without a problem. Tried running through all 100 pages but kept on getting caught. Helped Augi with discrepancies in data and will try the Google Scholar crawler again tomorrow.
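The restart-and-resume routine described for the blocked runs (exit on a 403, then rerun from the page last looked at) could be automated with a small checkpoint file. This is only a sketch, not the crawler's actual code: `Blocked`, `fetch_page`, `crawl`, and the JSON checkpoint format are all hypothetical names introduced here, and the real page fetching (browser automation, clicking the page number) is left to whatever `fetch_page` does.

```python
import json
from pathlib import Path


class Blocked(Exception):
    """Hypothetical signal that the site answered with a 403 / block page."""


def load_checkpoint(path):
    """Return the next page number to crawl, or 1 if no checkpoint exists yet."""
    p = Path(path)
    if not p.exists():
        return 1
    return json.loads(p.read_text())["next_page"]


def save_checkpoint(path, next_page):
    """Record the next page to crawl so a later run can resume there."""
    Path(path).write_text(json.dumps({"next_page": next_page}))


def crawl(fetch_page, checkpoint_path, total_pages=100):
    """Crawl from the checkpointed page until done or blocked.

    fetch_page(page) is a caller-supplied function (an assumption here) that
    processes one results page and raises Blocked when the site blocks it.
    Returns the number of the last page that was completed.
    """
    page = load_checkpoint(checkpoint_path)
    while page <= total_pages:
        try:
            fetch_page(page)
        except Blocked:
            break  # stop here; the checkpoint still points at this page
        page += 1
        save_checkpoint(checkpoint_path, page)
    return page - 1
```

Rerunning `crawl` with the same checkpoint path after a block (for example, after switching networks) would pick up at the page that failed rather than at page 1.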

2018-07-13: Fixed the problem where pdf urls were not saving to the txt file. Created another txt file to save urls that are not pdfs. Didn't run into a single reCAPTCHA all morning. Towards the end, it started catching me at the 7th query and forced the program to restart. For some reason, selecting the CSS element triggered Google Scholar to find me. I changed the CSS element tag for the "next" button to the path and I was able to get through the 4th page. It is still not able to click on the actual link, but I'm not sure if that's supposed to do anything.
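Saving pdf urls to one txt file and the remaining urls to another could look roughly like the sketch below. The log does not say how a url is recognized as a pdf, so the check used here (the url path ends in `.pdf`) is an assumption, and `is_pdf_url`, `save_urls`, and the file arguments are hypothetical names.

```python
from urllib.parse import urlparse


def is_pdf_url(url):
    """Assumed heuristic: a url is a pdf if its path ends in '.pdf'."""
    return urlparse(url).path.lower().endswith(".pdf")


def save_urls(urls, pdf_file, other_file):
    """Append pdf urls to pdf_file and all other urls to other_file.

    Returns a (pdf_count, other_count) tuple for a quick sanity check.
    """
    pdf_urls = [u for u in urls if is_pdf_url(u)]
    other_urls = [u for u in urls if not is_pdf_url(u)]
    # Append mode keeps urls from earlier runs, e.g. after a forced restart.
    with open(pdf_file, "a", encoding="utf-8") as f:
        for u in pdf_urls:
            f.write(u + "\n")
    with open(other_file, "a", encoding="utf-8") as f:
        for u in other_urls:
            f.write(u + "\n")
    return len(pdf_urls), len(other_urls)
```

A url that serves a pdf without a `.pdf` suffix would land in the second file under this heuristic, so that file doubles as a list of links to check by hand.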
