The results of the matching performed by STEP2_findcorrecturl.py are in 'ACTUAL_finalurls.txt'.
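The selection behind this file (a match-score threshold, then keeping the best-scoring URL for each company) can be sketched roughly as follows. This is only an illustration, not the actual code in STEP2_findcorrecturl.py or STEP3_clean.py: the function name, the (company, url, score) row layout, and the sample rows are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row layout: (company name, candidate URL, match score).
Row = Tuple[str, str, float]


def best_url_per_company(rows: Iterable[Row],
                         threshold: float = 0.9) -> Dict[str, Tuple[str, float]]:
    """Keep, for each company, the highest-scoring URL whose score exceeds threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for company, url, score in rows:
        if score <= threshold:
            # Discard matches that are not strictly above the threshold.
            continue
        if company not in best or score > best[company][1]:
            best[company] = (url, score)
    return best


# Made-up sample data, for illustration only.
rows = [
    ("Acme", "http://acme.example.com", 0.95),
    ("Acme", "http://acme-fans.example.org", 0.91),
    ("Globex", "http://globex.example.com", 0.42),
]

print(best_url_per_company(rows))
```

Lowering `threshold` lets more candidates through, and the per-company maximum then decides which URL is kept.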
Note that in the end, I decided to keep only URLs with a match score greater than 0.9, by setting this restriction in STEP2_findcorrecturl.py. I then manually removed any duplicate or inaccurate results. If you want, you can set the threshold lower in STEP2 and use STEP3_clean.py to find the URL with the highest score for each company.

====Using Python files====

'''To use STEP1_crawl.py'''
==An Overview==