This graph shows the number of training examples versus the accuracy.
2018-03-28: Switched from TensorFlow to a scikit-learn random forest, because it lets me see which features carry a lot of weight and which might be hurting the model. One observation is that certain years strongly influence the model... Maybe I should generalize this into a single feature for the occurrence of any year. I also discovered that using only hand-picked features improved accuracy by 10% compared with using all of the word counts. After that, the only other feature I can think of is the number of images on the page or in the center of the page, because there are often images with all of the cohort companies' logos. Tomorrow I am also going to work on hyperparameter tuning and on increasing the amount of data we have.