A script to convert HTML to TXT can be found below. It reads HTML files from the DemoDayHTML directory and writes the extracted text to the DemoDayTxt directory:
htmlToText.py
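The contents of htmlToText.py are not reproduced on this page. As a rough illustration of the conversion step only, here is a minimal sketch using the Python standard library. The function names (html_to_text, convert_directory), the *.html file pattern, and the UTF-8 encoding are assumptions made for this example; the actual script may work differently.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def convert_directory(src_dir="DemoDayHTML", dst_dir="DemoDayTxt"):
    """Write one .txt file into dst_dir for every .html file in src_dir.

    Assumption: the input files end in .html and are readable as UTF-8.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for html_file in sorted(src.glob("*.html")):
        html = html_file.read_text(encoding="utf-8", errors="replace")
        out_file = dst / (html_file.stem + ".txt")
        out_file.write_text(html_to_text(html), encoding="utf-8")
        written.append(out_file)
    return written
```

Calling convert_directory() with no arguments would use the two directory names mentioned above, relative to the current working directory.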
A script to match keywords (accelerator and cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the keywords listed in CohortAndAcceleratorsFullList.txt and the text files in DemoDayTxt, and creates a file recording the number of matches of each keyword in each text file.
The script can be found at:
KeyTerms.py
The keyword matches text file can be found at:
DemoDayTxt\KeyTermFile\KeyTerms.txt
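The keyword matching step described above could be sketched roughly as follows. This is not the code in KeyTerms.py. It assumes one keyword per line in the keyword file, case-insensitive substring counting, and a tab-separated output of file name, keyword, and count. The function names and the output layout are hypothetical.

```python
from pathlib import Path


def load_keywords(path):
    """Read keywords from a text file, assuming one keyword per line."""
    lines = Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
    return [line.strip() for line in lines if line.strip()]


def count_matches(keywords, text):
    """Count case-insensitive, non-overlapping occurrences of each keyword."""
    lowered = text.lower()
    return {kw: lowered.count(kw.lower()) for kw in keywords}


def write_match_table(
    keyword_file="CohortAndAcceleratorsFullList.txt",
    txt_dir="DemoDayTxt",
    out_file=None,
):
    """Write one 'file<TAB>keyword<TAB>count' row per file and keyword."""
    txt_path = Path(txt_dir)
    if out_file is None:
        # Mirrors the output location named above.
        out_file = txt_path / "KeyTermFile" / "KeyTerms.txt"
    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)

    keywords = load_keywords(keyword_file)
    rows = []
    # glob("*.txt") is not recursive, so the output file in the
    # KeyTermFile subdirectory is never read back in as input.
    for txt_file in sorted(txt_path.glob("*.txt")):
        text = txt_file.read_text(encoding="utf-8", errors="replace")
        counts = count_matches(keywords, text)
        for kw in keywords:
            rows.append(f"{txt_file.name}\t{kw}\t{counts[kw]}")
    out_file.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return rows
```

Plain substring counting also matches a keyword that appears inside a longer word, so short accelerator or cohort names may be overcounted. A word-boundary regular expression would be one way to tighten this.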