The following file lists the names of the results that passed keyword matching:
DemoDayHitsFull.txt
==Faulty Results==
The first pass through the data revealed articles with thousands of keyword hits. This seemed highly suspicious, so we investigated the cause.
The following script in the same directory analyzes the keyword matches to determine the words with the highest number of hits.
DemoDayAnalysis.py
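As a rough illustration of this kind of analysis (not the actual contents of DemoDayAnalysis.py), the sketch below tallies hits per keyword and reports the most frequent ones. It assumes, hypothetically, that each match is available as an (article, keyword) pair. The function name top_keywords and the sample data are invented for illustration only.

```python
from collections import Counter

def top_keywords(matches, n=10):
    """Return the n keywords with the most hits, most frequent first.

    `matches` is an iterable of (article, keyword) pairs. This input
    shape is an assumption, not the real format of the hits file.
    """
    counts = Counter(keyword for _article, keyword in matches)
    return counts.most_common(n)

# Hypothetical sample data, not taken from the real results.
sample = [
    ("article1", "the"),
    ("article1", "the"),
    ("article2", "the"),
    ("article2", "Y Combinator"),
]
print(top_keywords(sample, n=2))  # [('the', 3), ('Y Combinator', 1)]
```

A keyword whose count dwarfs all the others, such as "the" in the sample, is a likely false positive rather than a genuine mention.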
The investigation found that many companies are named after common English words. Here are some of the companies causing issues, each listed with its accelerator first:
L-Spark, the
Matter, This.
Fledge, HERE
StartupBootCamp, We...
LightBank Start, Zero
Entrepreneurs Roundtable Accelerator, SELECT
Y Combinator, Her
Y Combinator, Final
AngelCube, class
Matter, common
L-Spark, Company
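One possible way to drop such names from the keyword list is sketched below. It assumes the keywords are held as a plain list of strings and that exclusion is by exact, case-sensitive match. Both are assumptions, and the names AMBIGUOUS_NAMES and filter_keywords are hypothetical rather than taken from the project scripts.

```python
# Company names from the list above that collide with common English words.
AMBIGUOUS_NAMES = {
    "the", "This.", "HERE", "We...", "Zero", "SELECT",
    "Her", "Final", "class", "common", "Company",
}

def filter_keywords(keywords, excluded=AMBIGUOUS_NAMES):
    """Return keywords with excluded names removed, preserving order.

    Matching is exact and case-sensitive (an assumption), so "Her" is
    dropped but "her" would be kept.
    """
    return [k for k in keywords if k not in excluded]

# Hypothetical keyword list mixing ambiguous and unambiguous names.
keywords = ["ExampleCo", "Her", "Final", "AnotherStartup"]
print(filter_keywords(keywords))  # ['ExampleCo', 'AnotherStartup']
```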
After removing these companies from consideration as keywords,