** 100 of the 145 accelerators (about 70%) will be used to train the model; the remaining 45 accelerators (about 30%) will be held out as test data.
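The 100/45 split can be sketched as follows. This is a minimal sketch that assumes a simple random split at the accelerator level, so all pages from one accelerator end up in the same set. The function name, the fixed seed and the accelerator identifiers are hypothetical placeholders, not part of the project. Note that 100/145 is about 69%, which is the "about 70%" above.

```python
import random


def split_accelerators(accelerators, n_train=100, seed=42):
    """Randomly split accelerators into a training set and a test set.

    Hypothetical helper: assumes a plain random split at the accelerator
    level, so every page of one accelerator stays in the same set.
    """
    if n_train > len(accelerators):
        raise ValueError("n_train exceeds the number of accelerators")
    shuffled = list(accelerators)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the real 145 accelerators.
accelerators = [f"accelerator_{i:03d}" for i in range(145)]
train, test = split_accelerators(accelerators)
print(len(train), len(test))  # 100 45
print(f"{len(train) / len(accelerators):.1%} train")  # 69.0% train
```

Splitting by accelerator rather than by individual page keeps pages from the same site out of both sets at once, which would otherwise inflate the test accuracy.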
*Inputs to the CNN model:
#Screenshot of the web page (image data generated by the Screenshot Tool described above)
#Cohort indicator (categorical data: 1 = cohort page, 0 = not a cohort page)
'''Note:''' Because every page carries a cohort indicator, the dataset is labeled, and the indicator can serve as the target for supervised training. This may help narrow down the choice of packages for building the CNN model.
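One labeled example (a screenshot plus its cohort indicator) could be represented as below. This is a minimal sketch under assumptions: the `PageSample` record, the `class_counts` helper and the file paths are hypothetical and only illustrate the pairing of image data with a 0/1 label. Counting labels per class is a quick way to check for class imbalance before training.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSample:
    """Hypothetical record: one web-page screenshot and its cohort indicator."""
    screenshot_path: str  # image produced by the Screenshot Tool (path is illustrative)
    cohort: int           # 1 = cohort page, 0 = not a cohort page

    def __post_init__(self):
        # Reject anything other than the two valid categories.
        if self.cohort not in (0, 1):
            raise ValueError(f"cohort must be 0 or 1, got {self.cohort!r}")


def class_counts(samples):
    """Count examples per label, useful for spotting class imbalance."""
    return Counter(s.cohort for s in samples)


# Hypothetical samples for illustration only.
samples = [
    PageSample("screenshots/acc_001_home.png", 0),
    PageSample("screenshots/acc_001_portfolio.png", 1),
    PageSample("screenshots/acc_002_about.png", 0),
]
print(class_counts(samples))  # Counter({0: 2, 1: 1})
```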
====Data Preprocessing (IN PROGRESS)====
This step automates combining the output of the Site Map Tool and the Screenshot Tool with the cohort indicators. The resulting dataset will be fed into the CNN model.
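The combining step could look roughly like the sketch below. It assumes, hypothetically, that the Site Map Tool yields (accelerator, URL) pairs, that the Screenshot Tool's output can be read into a URL-to-file-path mapping, and that cohort indicators are available as a URL-to-0/1 mapping. The real tools' output formats may differ, and `build_dataset`, `write_dataset` and the example URLs are illustrative names only. Pages missing a screenshot or a label are skipped and reported rather than silently dropped.

```python
import csv
import io

FIELDS = ["accelerator", "url", "screenshot", "cohort"]


def build_dataset(sitemap_rows, screenshots, cohort_labels):
    """Join Site Map Tool output with screenshots and cohort indicators.

    Hypothetical input formats:
      sitemap_rows:  iterable of (accelerator, url) pairs
      screenshots:   dict mapping url -> screenshot file path
      cohort_labels: dict mapping url -> 1 (cohort page) or 0 (not a cohort page)
    Returns the joined rows and the list of URLs that were skipped.
    """
    rows, skipped = [], []
    for accelerator, url in sitemap_rows:
        if url not in screenshots or url not in cohort_labels:
            skipped.append(url)  # missing screenshot or label
            continue
        rows.append({
            "accelerator": accelerator,
            "url": url,
            "screenshot": screenshots[url],
            "cohort": cohort_labels[url],
        })
    return rows, skipped


def write_dataset(rows, out):
    """Write the joined rows as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative inputs only.
sitemap_rows = [
    ("acc_001", "https://example.com/"),
    ("acc_001", "https://example.com/portfolio"),
    ("acc_001", "https://example.com/blog"),
]
screenshots = {
    "https://example.com/": "shots/acc_001_0.png",
    "https://example.com/portfolio": "shots/acc_001_1.png",
}
cohort_labels = {"https://example.com/": 0, "https://example.com/portfolio": 1}

rows, skipped = build_dataset(sitemap_rows, screenshots, cohort_labels)
buf = io.StringIO()
write_dataset(rows, buf)
print(buf.getvalue())
print("skipped:", skipped)  # skipped: ['https://example.com/blog']
```

Writing to a file-like object keeps the sketch testable; in practice `out` would be a file opened with `open(path, "w", newline="")`.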