Listing Page Extractor
The objective of this project is to build a tool that automatically extracts the listing of client companies from an incubator's website. The first step of the project is to develop the LP Extractor Protocol.
Listing Page Extractor | |
---|---|
Project Information | |
Has title | Listing Page Extractor |
Has start date | |
Has deadline date | |
Has project status | Active |
Does subsume | LP Extractor Protocol |
Has sponsor | Kauffman Incubator Project |
Has project output | Tool |
Copyright © 2019 edegan.com. All Rights Reserved. |
LP Extractor Protocol
The LP Extractor Protocol currently envisages three steps: marking data locations on webpages, converting the webpages into a simplified Domain Specific Language (DSL), and encoding the DSL into a matrix. The markings of data locations would be encoded into a companion matrix. Both matrices would then be fed into a neural network, which is trained to produce the markings given the DSL. To date, we have conducted a literature review that found papers describing similar "paired input" networks, and we are in the process of refining our understanding of the pre-existing code and work related to each step.
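To make the paired-matrix idea concrete, the following is a minimal sketch of how a DSL token sequence and its markings might be encoded. Everything specific here is an assumption for illustration: the protocol's actual DSL is not yet defined, so the vocabulary (`VOCAB`), the function names (`encode_dsl`, `encode_markings`, `build_training_pair`), the one-hot encoding, and the fixed padding length are all hypothetical choices rather than the project's design.

```python
from typing import List, Sequence, Tuple

# Hypothetical DSL vocabulary (an assumption; the real DSL is not specified).
VOCAB = ["<pad>", "open_block", "close_block", "link", "text", "image"]
TOKEN_INDEX = {token: i for i, token in enumerate(VOCAB)}

Matrix = List[List[int]]


def encode_dsl(tokens: Sequence[str], max_len: int) -> Matrix:
    """One-hot encode a DSL token sequence into a max_len x len(VOCAB) matrix."""
    if len(tokens) > max_len:
        raise ValueError("token sequence is longer than max_len")
    # Pad short pages so every page yields a matrix of the same shape.
    padded = list(tokens) + ["<pad>"] * (max_len - len(tokens))
    matrix = []
    for token in padded:
        row = [0] * len(VOCAB)
        row[TOKEN_INDEX[token]] = 1  # raises KeyError for tokens outside VOCAB
        matrix.append(row)
    return matrix


def encode_markings(marked: Sequence[int], n_tokens: int, max_len: int) -> Matrix:
    """Companion matrix (max_len x 1): 1 where a token is a marked data location."""
    column = [[0] for _ in range(max_len)]
    for pos in marked:
        if not 0 <= pos < n_tokens:
            raise IndexError(f"marked position {pos} is outside the token sequence")
        column[pos][0] = 1
    return column


def build_training_pair(
    tokens: Sequence[str], marked: Sequence[int], max_len: int
) -> Tuple[Matrix, Matrix]:
    """Return (input matrix, target matrix) for one marked-up page."""
    return encode_dsl(tokens, max_len), encode_markings(marked, len(tokens), max_len)


# A toy listing page with two entries; the "text" tokens hold the company names.
page_tokens = [
    "open_block", "link", "text", "close_block",
    "open_block", "link", "text", "close_block",
]
company_name_positions = [2, 6]

dsl_matrix, marking_matrix = build_training_pair(
    page_tokens, company_name_positions, max_len=10
)
```

In this sketch the network would receive `dsl_matrix` as its input and be trained to reproduce `marking_matrix`, which matches the goal of producing the markings given the DSL. A per-token binary column is only one possible form for the companion matrix; richer markings (for example, one column per field type) would fit the same scheme.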