7/16 - Figured out which file is capable of rewriting the Classifier.pkl file and how all the code and test files fit together. I built a small training and test data set to work with and got IndustryClassifierCOPY.py to run on my data. The code has no comments at all, and I had to fix many index and key issues in parts of it. With 10 industry categories and 970 training data points, I think the accuracy rate is around 30%. I tried to run the code on a bigger training data set, hoping the accuracy rate would improve, but it returned error messages.
7/17 - Tried to run test data through FinalIndustryClassifier.py, but it doesn't work, even though the same test data file works with IndustryClassifierCOPY.py. The Crunchbase descriptions are longer than the old ones from VentureXpert, and I think the accuracy rate may improve if I give the model more data and use these longer descriptions. Wei and I talked through the details of sklearn and the code together.