Listing Page Classifier Progress

Revision as of 15:18, 4 April 2019

This page records the progress on the Listing Page Classifier project.

3/28/2019

Assigned Tasks:

Suggested Approaches:

4/1/2019

Site map:

Some internal links do not include the home_page URL, e.g. /careers
Updated urlcrawler.py. It still has trouble identifying internal links that do not start with "/"; will work on this part tomorrow
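
One possible way to handle both kinds of relative link (with and without a leading "/") is to resolve every href against the home page before classifying it. The sketch below is not the code in urlcrawler.py; the function name `resolve_internal` and the example URLs are hypothetical, and it assumes "internal" means "same host as the home page", so `www.example.com` and `example.com` would count as different sites.

```python
from urllib.parse import urljoin, urlparse


def resolve_internal(home_page, href):
    """Return href as an absolute URL if it is internal, otherwise None.

    Assumption: a link is internal when, after being resolved against
    home_page, it uses http(s) and has the same host as home_page.
    """
    # urljoin handles "/careers", "about/team", and full URLs alike.
    absolute = urljoin(home_page, href)
    parsed = urlparse(absolute)
    if parsed.scheme not in ("http", "https"):
        return None  # e.g. mailto: or javascript: links
    if parsed.netloc != urlparse(home_page).netloc:
        return None  # different host, so treated as external
    return absolute


if __name__ == "__main__":
    home = "https://example.com/"
    for link in ["/careers", "about/team", "https://other.org/jobs"]:
        print(link, "->", resolve_internal(home, link))
```

Resolving first and comparing hosts afterwards avoids special-casing links by their first character.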

4/2/2019

Site map:

Solved the second bullet point from yesterday (identifying internal links that do not start with "/")
Recursively collecting internal links from a page causes an HTTPError on some websites; a depth constraint should be set up. Will work on this tomorrow
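
A depth constraint could look roughly like the sketch below. It swaps the recursion for a breadth-first queue, which makes the depth of each page explicit; this is an illustration, not the project's actual crawler. The names `crawl`, `get_links`, and `max_depth` are hypothetical, and `get_links` stands in for whatever function fetches a page and returns its internal links.

```python
from collections import deque
from urllib.error import HTTPError, URLError


def crawl(start_url, get_links, max_depth=2):
    """Collect URLs reachable from start_url within max_depth link hops.

    get_links is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns an iterable of internal links on that page.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: keep the URL, do not expand it
        try:
            links = get_links(url)
        except (HTTPError, URLError):
            continue  # skip pages that fail to load instead of aborting
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen


if __name__ == "__main__":
    # Toy link graph standing in for real pages.
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}
    print(sorted(crawl("a", lambda u: graph.get(u, []), max_depth=2)))
```

The `seen` set also prevents revisiting pages that link back to each other, which is another common source of runaway recursion.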

4/3/2019

Site map:

4/4/2019

Site map (DONE):

Test run a couple of sites to see if there are edge cases that I missed
Implement the code: try to output the result in a txt file
Will work on screenshot generator next week
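
Writing the site map to a txt file could be as simple as one URL per line. The sketch below assumes the crawler produces an iterable of URL strings; the function name `write_site_map` and the file name `site_map.txt` are hypothetical.

```python
from pathlib import Path


def write_site_map(urls, out_path):
    """Write the unique URLs to out_path, one per line, sorted.

    Returns the number of URLs written.
    """
    unique_sorted = sorted(set(urls))
    text = "".join(url + "\n" for url in unique_sorted)
    Path(out_path).write_text(text, encoding="utf-8")
    return len(unique_sorted)


if __name__ == "__main__":
    found = ["https://example.com/careers", "https://example.com/"]
    count = write_site_map(found, "site_map.txt")
    print(f"wrote {count} URLs")
```

Sorting and de-duplicating before writing keeps the output stable between runs, which makes it easier to compare results across test sites.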