Listing Page Classifier
Revision as of 13:51, 30 March 2019
Listing Page Classifier | |
---|---|
Project Information | |
Has title | Listing Page Classifier |
Has owner | Nancy Yu |
Has start date | |
Has deadline date | |
Has project status | Active |

Copyright © 2019 edegan.com. All Rights Reserved.
Main Tasks
- Build a site map generator that outputs every internal link of the input websites
- Build a generator that captures screenshots of individual web pages
- Build a CNN classifier using Python and TensorFlow
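As a possible starting point for the screenshot generator, here is a minimal sketch. It assumes a Selenium-style browser driver that exposes `get(url)` and `save_screenshot(filename)`, both of which Selenium's `WebDriver` provides. The names `capture_screenshots` and `screenshot_filename` and the hash-based file naming are hypothetical illustrations, not part of the existing project code.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Protocol


class ScreenshotDriver(Protocol):
    """Minimal driver interface assumed here; Selenium's WebDriver has both methods."""

    def get(self, url: str) -> None: ...

    def save_screenshot(self, filename: str) -> bool: ...


def screenshot_filename(url: str) -> str:
    # Hash the URL so characters such as '/', '?' and ':' never reach the file system.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16]
    return f"{digest}.png"


def capture_screenshots(
    driver: ScreenshotDriver, urls: Iterable[str], out_dir: str
) -> dict[str, str]:
    """Load each URL, save a PNG screenshot, and return a url -> file path mapping."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved: dict[str, str] = {}
    for url in urls:
        path = out / screenshot_filename(url)
        driver.get(url)
        # save_screenshot returns False if the file could not be written.
        if driver.save_screenshot(str(path)):
            saved[url] = str(path)
    return saved
```

With Selenium installed, a `webdriver.Chrome()` instance could be passed as `driver` and closed with `driver.quit()` afterwards. Because `save_screenshot` captures the visible viewport, fixing the window size (for example with `driver.set_window_size(1280, 1024)`) may help keep screenshots comparable as classifier input. Hashing the URL also sidesteps characters that are not allowed in Windows file names.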
Approaches (IN PROGRESS)
- URL Crawler
E:\projects\listing page identifier\urlcrawler.py
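The contents of `urlcrawler.py` are not reproduced on this page. For illustration only, a breadth-first site map generator could look like the standard-library sketch below. It assumes that "internal" means an http/https link with exactly the same host (`netloc`) as the page it appears on, and it takes a caller-supplied `fetch(url)` function (hypothetical) that returns the page HTML or `None`, so the crawl logic can be tested without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url: str, html: str) -> set[str]:
    """Return absolute, fragment-free links that stay on page_url's host."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    links = set()
    for href in parser.hrefs:
        # Resolve relative links, then drop '#fragment' parts.
        absolute, _ = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Skips mailto:, javascript:, and links to other hosts.
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.add(absolute)
    return links


def build_site_map(
    start_url: str,
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 500,
) -> list[str]:
    """Breadth-first crawl; return visited internal URLs (at most max_pages)."""
    start, _ = urldefrag(start_url)
    seen = {start}
    queue = deque([start])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        html = fetch(url)
        if html is None:
            continue  # fetch failed or page was not HTML
        for link in sorted(internal_links(url, html)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

A real `fetch` could wrap `urllib.request.urlopen` and return `None` on errors or non-HTML responses, while `max_pages` caps the crawl on large sites. Under the same-host assumption, subdomains such as `www.` are treated as external, and `https://example.com` and `https://example.com/` count as different URLs.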
Image Processing
This method would likely rely on a convolutional neural network (CNN) to classify HTML elements present in web page screenshots. It could be implemented with a VGG16 or ResNet architecture, using batch normalization to help improve accuracy in this context.
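To make the batch-normalization step concrete, the sketch below implements the training-mode forward pass in plain Python. Each feature is normalized with the mini-batch mean and variance, x_hat = (x - mean) / sqrt(var + eps), and then scaled and shifted as y = gamma * x_hat + beta, where gamma and beta are learned. For brevity it uses a flat (batch, features) input. In a convolutional layer the statistics are computed per channel across the batch and spatial positions, and at inference time running averages replace the batch statistics. The function name `batch_norm_forward` is hypothetical.

```python
import math
from typing import Sequence


def batch_norm_forward(
    batch: Sequence[Sequence[float]],
    gamma: Sequence[float],
    beta: Sequence[float],
    eps: float = 1e-5,
) -> list[list[float]]:
    """Training-mode batch normalization over a (batch_size, num_features) input.

    Each feature column is normalized with the mean and biased variance of the
    current mini-batch, then scaled by gamma[j] and shifted by beta[j].
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        inv_std = 1.0 / math.sqrt(var + eps)  # eps avoids division by zero
        for i, x in enumerate(column):
            out[i][j] = gamma[j] * (x - mean) * inv_std + beta[j]
    return out
```

In TensorFlow this computation is provided by `tf.keras.layers.BatchNormalization`. Of the two architectures mentioned above, `tf.keras.applications.ResNet50` already contains batch-normalization layers, whereas the stock `tf.keras.applications.VGG16` does not, so adding them would only be a change for the VGG16 option.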