NEVER touch the TrainingHTML folder, datareader.py, or classifier.txt. These are used internally for training.
==Amazon Mechanical Turk==
The folder CrawledHTMLFull contains a file called FinalResultWithURL. It was created manually by combining the file crawled_demoday_page_list.txt in the parent folder with the file predicted.txt, so that each prediction is paired with the actual URL of its website.
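
The combination described above was done by hand, but it could also be scripted. The following is only a minimal sketch, not the method actually used for this project. It assumes that crawled_demoday_page_list.txt holds one URL per line, that predicted.txt holds one prediction per line in the same order, and that a tab-separated output is acceptable. The real file formats are not documented here, and the function names are hypothetical.

```python
from pathlib import Path


def combine_predictions(url_lines, prediction_lines):
    """Pair each URL with its prediction, matching them by line order.

    Assumption: line N of the URL list corresponds to line N of the
    predictions. Blank lines are ignored in both inputs.
    """
    urls = [line.strip() for line in url_lines if line.strip()]
    predictions = [line.strip() for line in prediction_lines if line.strip()]
    if len(urls) != len(predictions):
        raise ValueError(
            f"{len(urls)} URLs but {len(predictions)} predictions; "
            "the two files do not line up"
        )
    # One tab-separated "URL<TAB>prediction" string per website.
    return [f"{url}\t{pred}" for url, pred in zip(urls, predictions)]


def write_combined(url_path, prediction_path, output_path):
    """Read both input files and write the combined lines to output_path."""
    url_lines = Path(url_path).read_text(encoding="utf-8").splitlines()
    prediction_lines = Path(prediction_path).read_text(encoding="utf-8").splitlines()
    combined = combine_predictions(url_lines, prediction_lines)
    Path(output_path).write_text("\n".join(combined) + "\n", encoding="utf-8")
```

Under these assumptions, calling write_combined with the paths of the two input files and the desired output path would produce a file of the same shape as FinalResultWithURL. Raising an error when the line counts differ avoids silently pairing a prediction with the wrong URL.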
Since MTurk makes it hard to display the downloaded HTML, it is much faster to copy the URL into the question box instead.
However, there is a disadvantage to this: websites are ever changing, so a URL may stop working or point to different content in the future. Downloaded HTML, on the other hand, stays the same: it does not require an internet connection to render, so its content is static.
To create the MTurk task for this project, follow the tutorial in [[Mechanical Turk (Tool)]]. For testing and development purposes, use https://requestersandbox.mturk.com/
Test account:
email: mcboatfaceboaty670@gmail.com
password: sameastheoneforemail2018
For this project, the fields asked of the user are:
''Connor, add the criteria here''
Layout:
''Connor, add the screenshot here''
==Advanced User Guide: An in-depth look into the project and the various settings==