E:\projects\listing page identifier\generate_dataset.py
'''''Generate and Separate Label Image Data:''''' Feed <code>train.txt</code> and <code>text.txt</code> into the Screenshot Tool to get our image data.
* Images are split into two folders: train and test
* Images are also separated into corresponding sub folders, cohort and not_cohort, within the train folder and the test folder

This process also auto-generates the class label and index in the name of each image file (see example below)

[[File:autoName.png|450px]]

* The leading 0 or 1 indicates whether it is a cohort webpage or not
* The second number, after the first '_', represents the index (row number) in <code>train.txt</code> or <code>text.txt</code>
* These two numbers will become helpful during modeling
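
The naming scheme above can be read back out of a file name during modeling. The following is a minimal sketch, not the project's actual code: the helper name <code>parse_image_name</code> and the sample file names are hypothetical, and it assumes only that the name starts with the 0/1 label, followed by '_' and the row index, as described above.

```python
from pathlib import Path


def parse_image_name(filename):
    """Return (label, index) parsed from an image file name.

    Assumes the name starts with '<label>_<index>', where label is 0 or 1
    and index is the row number in the source .txt file. Anything after a
    second '_' is ignored. This helper is hypothetical, for illustration.
    """
    stem = Path(filename).stem          # drop the extension, e.g. '.png'
    parts = stem.split("_")
    if len(parts) < 2:
        raise ValueError(f"unexpected image name: {filename!r}")
    label = int(parts[0])               # leading class label
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label}")
    index = int(parts[1])               # row number in the .txt file
    return label, index


# Hypothetical file names following the scheme described above
for name in ["1_42.png", "0_7.png"]:
    label, index = parse_image_name(name)
    print(name, "-> label:", label, "index:", index)
```

Keeping the label and row index in the file name means each screenshot can be traced back to its URL row without a separate lookup table.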
====CNN Model====
Python file saved in:
E:\projects\listing page identifier\cnn.py