Changes

U.S. Seed Accelerators (view source)

Revision as of 09:40, 30 July 2018

135 bytes added, 09:40, 30 July 2018

'''To use STEP3_clean.py''':

Note: this step is optional. Whether to use it depends on the accuracy level you need and what kind of data you crawled earlier. I chose not to use it and instead set a more restrictive threshold in STEP2.

1. Change file f to be the output file from STEP2. Before running, delete any lines that say "no match"; also, when you run STEP2, you must write the ratio score to the text file. Change g to be the desired name of the output file for this step.

Your output should be a text file containing each company name and the URL that had the highest assigned score in STEP2. If more than one URL shares the highest score, the script should take the first one.
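The selection rule described above can be sketched roughly as follows. This is not the actual STEP3_clean.py; it is a minimal illustration that assumes the STEP2 output is tab-separated with three fields per line (company name, URL, ratio score). The function names best_urls and clean, and that file layout, are hypothetical.

```python
import csv


def best_urls(rows):
    """Return {company: (url, score)}, keeping the highest score per company.

    Assumes each row is (company, url, score); this layout is a guess,
    not the documented STEP2 output format.
    """
    best = {}
    for company, url, score in rows:
        score = float(score)
        # Strict '>' means a later URL with an equal score never
        # replaces the first one seen, so ties go to the first URL.
        if company not in best or score > best[company][1]:
            best[company] = (url, score)
    return best


def clean(in_path, out_path):
    """Read the STEP2 output (file f) and write company/URL pairs (file g)."""
    with open(in_path, newline="", encoding="utf-8") as f:
        # Skip lines that do not have exactly three fields.
        rows = [r for r in csv.reader(f, delimiter="\t") if len(r) == 3]
    best = best_urls(rows)
    with open(out_path, "w", newline="", encoding="utf-8") as g:
        writer = csv.writer(g, delimiter="\t")
        for company, (url, _score) in best.items():
            writer.writerow([company, url])
```

If your STEP2 output uses a different separator or column order, adjust the delimiter and the tuple unpacking to match.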

Maxine.tao

