==Hand Collecting Data==
During the initial test run, the number of good pages was 359. This data was then coded by hand by fellow interns.
The file for hand-coding is in: /bulk/McNair/Projects/Accelerator Demo Day/Test Run/CrawledDemoDayHTMLFull/'''FinalResultWithURL'''

For the sake of collaboration, the team copied this information to a Google Sheet, accessible here: https://docs.google.com/spreadsheets/d/16Suyp364lMkmUuUmK2dy_9MeSoS1X4DfFl3dYYDGPT4/edit?usp=sharing

We split the process into four parts. Each intern will do the following:
# Go to the given URL.
# Record whether the page is good data (column F); this can later be used by [[Minh Le]] to refine/fine-tune training data.
# Record whether the page is announcing a cohort or recapping/explaining a demo day (column G). This variable will be used to decide whether we should subtract weeks from the given date (e.g. if the page is recapping a demo day, the cohort went through the accelerator over the past ~12 weeks, and we should subtract weeks accordingly).
# Record the date, month, year, and the companies listed for that given accelerator.
# Note any other information here, such as a cohort's special name.

Once this process is finished, we will keep only the rows with a 1 in column F, and [[Connor Rothschild]] and [[Maxine Tao]] will work to populate empty cells in The File to Rule Them All with that data.
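The column F filter and the column G date adjustment described above can be sketched as follows. This is only an illustration, not the team's actual script: the field names (<code>good_data</code>, <code>page_type</code>, <code>day</code>, <code>month</code>, <code>year</code>), the <code>"recap"</code>/<code>"announce"</code> labels, and the helper <code>estimated_cohort_date</code> are all hypothetical, and the 12-week offset is taken from the rough "~12 weeks" estimate above.

```python
from datetime import date, timedelta

# Assumption: a typical accelerator program lasts about 12 weeks
# (the "~12 weeks" figure from the hand-coding instructions).
ACCELERATOR_WEEKS = 12


def estimated_cohort_date(row):
    """Return an estimated cohort date for one hand-coded row.

    `row` is a dict with hypothetical keys:
      good_data -- 1 if the page is good data (column F), else 0
      page_type -- "announce" or "recap" (column G; labels are assumed)
      day, month, year -- the date recorded from the page
    Returns None for rows that are not marked as good data.
    """
    if row["good_data"] != 1:
        return None  # filtered out: only the 1s in column F are kept
    page_date = date(row["year"], row["month"], row["day"])
    if row["page_type"] == "recap":
        # A demo day recap is published at the end of the program,
        # so step back by the assumed program length.
        return page_date - timedelta(weeks=ACCELERATOR_WEEKS)
    return page_date


# Made-up rows, for illustration only.
rows = [
    {"good_data": 1, "page_type": "recap", "day": 15, "month": 6, "year": 2018},
    {"good_data": 1, "page_type": "announce", "day": 1, "month": 2, "year": 2018},
    {"good_data": 0, "page_type": "recap", "day": 9, "month": 9, "year": 2017},
]

# Keep only good rows, mirroring the column F filter.
dates = [d for d in (estimated_cohort_date(r) for r in rows) if d is not None]
for d in dates:
    print(d.isoformat())
```

Keeping the offset in a single constant makes it easy to change if the ~12 week estimate turns out to be wrong for a particular accelerator.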
==Advanced User Guide: An in-depth look into the project and the various settings==