Difference between revisions of "Listing Page Extractor"

From edegan.com

Latest revision as of 12:47, 21 September 2020


Project
Listing Page Extractor
Project Information
Has title: Listing Page Extractor
Has start date:
Has deadline date:
Has project status: Active
Does subsume: LP Extractor Protocol
Has sponsor: Kauffman Incubator Project
Has project output: Tool
Copyright © 2019 edegan.com. All Rights Reserved.


The objective of this project is to build a tool that automatically extracts the list of client companies from an incubator's website. The first step of the project is to develop the LP Extractor Protocol.

LP Extractor Protocol

The LP Extractor Protocol currently envisages marking data locations on webpages, converting webpages into a simplified Domain Specific Language (DSL), and then encoding the DSL into a matrix. The markings of data locations will be encoded into a companion matrix. Both matrices will then be fed into a neural network, which is trained to produce the markings given the DSL. To date, we have conducted a literature review that found papers describing similar "paired input" networks, and we are in the process of refining our understanding of the pre-existing code and work related to each step.
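The pairing of the DSL matrix with its companion matrix can be illustrated with a minimal Python sketch. It assumes a hypothetical DSL in which every tag becomes a `<tag>` or `</tag>` token and every non-empty text run becomes a `TEXT` token. The names `DSLTokenizer`, `to_dsl`, `encode_dsl` and `encode_markings`, as well as the sample page and company names, are illustrative assumptions and not the project's code. Markings are simulated here by matching a set of known company names, whereas the protocol envisages marking data locations on the pages themselves. The neural network step is not shown.

```python
from html.parser import HTMLParser


class DSLTokenizer(HTMLParser):
    """Flatten HTML into a hypothetical token-level DSL (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tokens = []  # DSL tokens, e.g. "<li>", "TEXT", "</li>"
        self.texts = []   # original text behind each token (None for tags)

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored to keep the hypothetical DSL simple.
        self.tokens.append(f"<{tag}>")
        self.texts.append(None)

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")
        self.texts.append(None)

    def handle_data(self, data):
        # Whitespace-only runs between tags are dropped.
        text = data.strip()
        if text:
            self.tokens.append("TEXT")
            self.texts.append(text)


def to_dsl(html):
    """Convert an HTML page into parallel lists of DSL tokens and texts."""
    parser = DSLTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens, parser.texts


def encode_dsl(tokens, vocab):
    """Input matrix: one one-hot row per DSL token over the vocabulary."""
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for tok in tokens:
        row = [0] * len(vocab)
        row[index[tok]] = 1
        matrix.append(row)
    return matrix


def encode_markings(texts, marked):
    """Companion matrix: one row per token, [1] if it is a marked data location."""
    return [[1 if text in marked else 0] for text in texts]


# Hypothetical listing page and markings.
page = """<ul>
  <li>Acme Robotics</li>
  <li>Beta Bio</li>
</ul>
<p>Contact us</p>"""

tokens, texts = to_dsl(page)
vocab = sorted(set(tokens))
X = encode_dsl(tokens, vocab)                              # 11 x 7 input matrix
Y = encode_markings(texts, {"Acme Robotics", "Beta Bio"})  # 11 x 1 companion matrix
```

Because both matrices have one row per DSL token, row i of `X` (the input) lines up with row i of `Y` (the target), which is the kind of pairing a network trained to produce the markings from the DSL would consume. One-hot rows are only the simplest possible encoding; the protocol does not yet specify which encoding will be used.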