The objective of this project is to determine which web page on an incubator's website contains the client company listing.
The project will ultimately use data (incubator names and URLs) identified using the [[Ecosystem Organization Classifier]] (perhaps in conjunction with an additional website finder tool, if the [[Incubator Seed Data]] source does not contain URLs). Initially, however, we are using accelerator websites taken from the master file of the [[U.S. Seed Accelerators]] project.
We are building three tools: a site map generator, a web page screenshot tool, and an image classifier. Then, given an incubator URL, we will find every web page on the website, generate a standardized-size screenshot of each, code which page is the client listing page, and use the images and the coding to train our classifier. We currently plan to build the classifier using a convolutional neural network (CNN), as these are particularly effective at handling image classification.
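As a rough illustration of the first tool, the following is a minimal sketch of a site map generator, not the project's actual implementation. It assumes that pages are plain HTML, that "the website" means every page on the same host as the starting URL, and that a breadth-first crawl capped at a fixed number of pages is acceptable. The names <code>LinkExtractor</code>, <code>fetch_html</code>, <code>build_site_map</code>, and <code>max_pages</code> are hypothetical and chosen only for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Default fetcher. Assumption: pages are plain HTML served over HTTP(S)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def build_site_map(start_url, fetch=fetch_html, max_pages=200):
    """Breadth-first crawl; returns same-host URLs in discovery order."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" suffixes.
            link = urldefrag(urljoin(url, href)).url
            # Keep only unseen links on the same host (assumed site boundary).
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Each URL returned by <code>build_site_map</code> would then be passed to the screenshot tool. The <code>fetch</code> parameter is pluggable so that the crawl logic can be exercised against canned HTML without network access. Sites that build their navigation with JavaScript would need a browser-based fetcher instead, which this sketch does not cover.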
==Current Work==