Demo Day Page Parser
Revision as of 16:20, 15 November 2017 by Peterjalbert
Project Information | |
---|---|
Project Title | Demo Day Page Parser |
Owner | Peter Jalbert |
Start Date | |
Deadline | |
Primary Billing | |
Notes | |
Has project status | Active |
Copyright © 2016 edegan.com. All Rights Reserved.
Project Specs
The goal of this project is to combine data mining (web crawling with Selenium) and machine learning to find good candidate web pages for accelerator Demo Days. More information on the project is on the Accelerator Data page.
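This page does not say which machine-learning model ranks the crawled pages. One illustrative option is a multinomial naive Bayes classifier trained on hand-labeled page text. It could separate likely Demo Day pages from everything else. This is an assumption and not the project's documented method. The class name, function names, and labels below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesPageClassifier:
    """Hypothetical multinomial naive Bayes with Laplace (add-one) smoothing.

    Labels such as "demo_day" / "other" are illustrative, not from the project.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of words seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        """Log prior plus smoothed log likelihood of known tokens under class c."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.totals[c] + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, text):
        """Return the class with the highest log score for the text."""
        return max(self.classes, key=lambda c: self.log_score(text, c))
```

In a real setup, the training examples would be pages saved by the crawler and converted to text. scikit-learn's `MultinomialNB` is a standard, better-tested alternative to this hand-rolled version.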
Code Location
The code directory for this project is:
E:\McNair\Software\Accelerators
The Selenium-based crawler is in the file below. It runs a Google search on accelerator names combined with keywords, then saves the resulting URLs and HTML pages for later use:
DemoDayCrawler.py
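The crawler's internals are not documented on this page. The helpers below are a minimal sketch of the same idea. It assumes a Selenium WebDriver, or any object exposing `get()` and `page_source`. It builds one Google query per accelerator/keyword pair, saves each loaded page's HTML to disk, and logs the URL. The keyword list, function names, and file-naming scheme are hypothetical and are not taken from DemoDayCrawler.py.

```python
import os
import re
from urllib.parse import quote_plus

# Hypothetical search keywords; the real crawler's terms are not documented here.
KEYWORDS = ("demo day",)


def build_queries(accelerators, keywords=KEYWORDS):
    """Combine each accelerator name with each keyword into a query string."""
    return [f"{name} {kw}" for name in accelerators for kw in keywords]


def google_search_url(query):
    """Return a Google search URL with the query URL-encoded in the q parameter."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def safe_filename(text, max_len=150):
    """Turn arbitrary text (e.g. a URL) into a filesystem-safe file name stem."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")[:max_len]


def save_page(driver, url, out_dir):
    """Load url with a Selenium-style driver and write its HTML into out_dir.

    `driver` only needs a .get(url) method and a .page_source attribute,
    as selenium.webdriver.Firefox() provides.
    """
    driver.get(url)
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, safe_filename(url) + ".html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(driver.page_source)
    return path


def record_url(log_path, url, html_path):
    """Append a tab-separated (url, saved HTML file) line to a log file."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"{url}\t{html_path}\n")
```

With the selenium package installed, usage would look like this. Create `driver = webdriver.Firefox()`. Then, for each query, call `save_page(driver, google_search_url(q), out_dir)` followed by `record_url(...)`. Finish with `driver.quit()`. Google may throttle automated queries, so pausing between requests is prudent.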
The script below converts HTML to plain text. It reads HTML files from one directory and writes them as TXT files to another directory:
htmlToText.py
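This page does not show how htmlToText.py extracts text. The sketch below performs the same directory-to-directory conversion using only the standard library's `html.parser`. That parser is swapped in for illustration, and the original script may use a different library. It skips `<script>` and `<style>` content and writes one `.txt` file per `.html` file. The function and class names are hypothetical.

```python
import os
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def convert_directory(in_dir, out_dir):
    """Write a .txt file to out_dir for every .html/.htm file in in_dir; return the count."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for name in sorted(os.listdir(in_dir)):
        if not name.lower().endswith((".html", ".htm")):
            continue
        with open(os.path.join(in_dir, name), encoding="utf-8", errors="ignore") as f:
            text = html_to_text(f.read())
        stem = os.path.splitext(name)[0]
        with open(os.path.join(out_dir, stem + ".txt"), "w", encoding="utf-8") as f:
            f.write(text)
        count += 1
    return count
```

Reading with `errors="ignore"` keeps a single badly encoded page from stopping a long batch, at the cost of silently dropping undecodable bytes.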