#Class Label: Cohort indicator (1 = cohort page, 0 = not a cohort page)
====Data Preprocessing (IN PROGRESS)====
* This part aims to automate assigning the corresponding cohort indicator to each internal URL generated by the Site Map Tool. The results are split into two text files: train.txt and test.txt.
The Python file is saved in:
E:\projects\listing page identifier\generate_dataset.py
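The split into train.txt and test.txt could look roughly like the sketch below. It is not the contents of generate_dataset.py. The function names, the 80/20 ratio, the fixed seed, and the tab-separated "url, label" line format are all assumptions made for illustration.

```python
import random
from pathlib import Path


def split_labeled_urls(labeled_urls, test_fraction=0.2, seed=0):
    """Shuffle (url, label) pairs and split them into train and test lists.

    test_fraction and seed are illustrative defaults, not project settings.
    """
    rows = list(labeled_urls)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def write_split(rows, path):
    """Write one 'url<TAB>label' line per row (assumed file format)."""
    lines = [f"{url}\t{label}" for url, label in rows]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical input: internal URLs paired with a cohort indicator (1 or 0).
    labeled = [
        ("http://example.com/cohort-2019", 1),
        ("http://example.com/about", 0),
        ("http://example.com/cohort-2020", 1),
        ("http://example.com/contact", 0),
        ("http://example.com/blog", 0),
    ]
    train, test = split_labeled_urls(labeled)
    write_split(train, "train.txt")
    write_split(test, "test.txt")
```

Seeding the shuffle keeps the two files stable between runs, which makes later results easier to compare.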
* Images are also split into two folders: train and test
** Each folder is further divided into two subfolders: cohort and not_cohort
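The folder layout described above can be sketched as follows. This is a minimal illustration, not the project's actual code. The helper name place_image, the root folder name, and the mapping from indicator values to subfolder names are assumptions.

```python
import shutil
from pathlib import Path

# Assumed mapping from the cohort indicator to the subfolder name.
LABEL_DIRS = {1: "cohort", 0: "not_cohort"}


def place_image(image_path, split, label, root):
    """Copy one image to <root>/<split>/<cohort|not_cohort>/ and return the new path."""
    if split not in ("train", "test"):
        raise ValueError(f"unknown split: {split!r}")
    target_dir = Path(root) / split / LABEL_DIRS[label]
    target_dir.mkdir(parents=True, exist_ok=True)  # create folders on first use
    return Path(shutil.copy2(image_path, target_dir))
```

For example, `place_image("page_001.png", "train", 1, "images")` would copy a hypothetical screenshot to images/train/cohort/page_001.png.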