Difference between revisions of "Listing Page Classifier"

From edegan.com

Revision as of 09:52, 31 March 2019


Project
Listing Page Classifier
Project Information
Has title Listing Page Classifier
Has owner Nancy Yu
Has start date
Has deadline date
Has project status Active
Copyright © 2019 edegan.com. All Rights Reserved.


Main Tasks

  1. Build a site map generator: for each input website, output every internal link
  2. Build a tool that captures a screenshot of each individual web page
  3. Build a CNN classifier using Python and TensorFlow

Approaches (IN PROGRESS)

  1. URL Crawler
E:\projects\listing page identifier\urlcrawler.py
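
The core step of such a crawler, collecting the internal links of one page, might look roughly like the sketch below. This is not the content of urlcrawler.py; the names LinkCollector and internal_links are hypothetical, and "internal" is assumed here to mean "same host as the base URL". Only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(base_url, html):
    """Return the sorted, de-duplicated internal links found in html.

    Assumption: a link is internal when it is http(s) and shares the
    host of base_url. Fragments (#...) are dropped before comparing.
    """
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links against the page URL, then strip "#fragment".
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == base_host:
            found.add(absolute)
    return sorted(found)
```

Fetching pages and repeating this step for each newly found link (for example, breadth-first with a visited set) is left out of the sketch. A site map would be the union of the results over all visited pages.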

Image Processing

This method would likely rely on a convolutional neural network (CNN) to classify web page screenshots based on the HTML elements present in them. One possible implementation is a VGG16 or ResNet architecture combined with batch normalization, which may improve accuracy in this context.
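
A minimal sketch of the VGG16 option using TensorFlow's Keras API is shown here. Several details are assumptions that the notes above do not specify: a 224x224 RGB input, a binary "listing page" vs. "not a listing page" output, randomly initialized weights, and batch normalization applied only in the classification head. The name build_classifier is hypothetical.

```python
# Assumed input size; the project notes do not specify one.
INPUT_SHAPE = (224, 224, 3)  # height, width, RGB channels


def build_classifier(input_shape=INPUT_SHAPE):
    """Build a VGG16-based binary classifier with a batch-normalized head."""
    # Imported here so this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    # VGG16 convolutional base without its dense top layers.
    # weights=None gives random initialization (no pretrained download).
    base = tf.keras.applications.VGG16(
        include_top=False, weights=None, input_shape=input_shape
    )
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        # Single sigmoid unit: assumed listing vs. non-listing output.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Swapping in tf.keras.applications.ResNet50 for the base would be the analogous ResNet variant. ResNet already contains batch normalization layers internally, so the extra layers in the head matter mostly for the VGG16 case.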