Changes

617 bytes added, 15:47, 3 August 2018
E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\BuildTestData.sql
Since this dataset has more classifications than the venture capital data used previously, and different ones, we need to rebuild the coding system for the classifier.
==MLP Classifier==
The version I am currently editing is:
E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\IndustryClassifierCONDENSED-USETHIS.py
bigtest2018.txt
This file modifies Classifier.pkl, which stores the components of the model. Eventually, we should be able to run this through FinalIndustryClassifier.py.
 
The Crunchbase data in my training set has almost 40 labels, and I could not get this model's accuracy above 30%. However, if you assign only 3 labels, the accuracy goes up to 50%.
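To put these two accuracy figures on a common scale, it can help to compare each one with what uniform random guessing would achieve for the same number of labels. The sketch below is illustrative only: the helper names are hypothetical and not part of the project code, it assumes the labels are roughly balanced (the real class distribution is not recorded here), and it rounds "almost 40" to 40.

```python
def chance_accuracy(num_labels):
    """Expected accuracy of uniform random guessing over balanced labels."""
    if num_labels < 1:
        raise ValueError("num_labels must be at least 1")
    return 1.0 / num_labels


def lift_over_chance(accuracy, num_labels):
    """How many times better than uniform guessing a reported accuracy is."""
    return accuracy / chance_accuracy(num_labels)


# Accuracy figures reported above for the MLP classifier
# (40 is an approximation of "almost 40 labels").
for num_labels, accuracy in [(40, 0.30), (3, 0.50)]:
    print(num_labels, "labels:",
          "chance = {:.1%},".format(chance_accuracy(num_labels)),
          "lift = {:.1f}x".format(lift_over_chance(accuracy, num_labels)))
```

Under the balanced-label assumption, 30% on 40 labels is about 12 times chance, while 50% on 3 labels is only about 1.5 times chance, so the higher raw accuracy with fewer labels does not by itself mean the model learned more. If the labels are imbalanced, the share of the most common label would be a stricter baseline.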
 
==LSTM Model==
See the old page at [[Deep Text Classifier]]. I updated the preprocessing file to run on Python 3.
 
I tried updating this code to run on the new data from Crunchbase. Files used are located in:
E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\Yang's Code
 
Run the preprocessing file first, then the classification file. I could not figure out why this model's accuracy was only 10% with 40 labels and around 30% with 5-8 labels. Its accuracy should be higher than the MLP classifier's.
=New Notes=