==Industry Classifier== - by Yang Zhang
Goal: For each company, we want to classify its industry based on its description.
Approach:
Step 1: Encode the text description into numerical values.
Step 2: Build a deep neural network to learn to classify.
For step 1, a very naive way is to use the "bag of words" representation. The obvious drawbacks are that it ignores the correlations between words and their relative order. So, instead, we use the "word2vec" method (https://en.wikipedia.org/wiki/Word2vec), where, in short, each word is mapped to a vector that represents how likely other words are to appear around this center word.
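To make the difference concrete, here is a minimal sketch in plain Python. It assumes the skip-gram flavor of word2vec (the note does not say which variant was used), and the two descriptions, the `bag_of_words` helper, and the `skipgram_pairs` helper are hypothetical, made up for illustration. It shows that bag of words cannot tell two reordered descriptions apart, while the (center, context) pairs that word2vec learns from do change with word order.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def skipgram_pairs(text, window=2):
    """List (center, context) pairs within `window` words of each other.

    These pairs are the raw training signal for skip-gram word2vec:
    the model learns vectors that predict the context word from the center word.
    """
    tokens = text.lower().split()
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical company descriptions (not from the real data set).
a = "bank lends money to software company"
b = "software company lends money to bank"
vocabulary = sorted(set(a.split()) | set(b.split()))

# Same words, different order: the bag-of-words vectors are identical.
same_bag = bag_of_words(a, vocabulary) == bag_of_words(b, vocabulary)

# The context pairs differ, so word2vec sees different neighborhoods.
pairs_a = skipgram_pairs(a, window=1)
pairs_b = skipgram_pairs(b, window=1)
```

Here `same_bag` is true, while `("bank", "lends")` appears only in `pairs_a`. In practice the vectors themselves would come from a trained word2vec model rather than from these raw pairs.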
For step 2, we have tried 1D/2D convolutional NN (Neural Network) and LSTM RNN (Recurrent Neural Network). All the models achieve 90+% training accuracy and around 60% testing accuracy. Note that this task is hard even for humans, and the baseline of random guessing is around 10%, so 60% is acceptable. Tuning the parameters doesn't help much, meaning we might have reached the models' maximum capability.
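As a rough illustration of what the 1D convolutional model does with the encoded description, the sketch below runs the forward pass of a single filter over a sequence of word vectors, followed by max-over-time pooling. It is not the actual architecture used here: the vector size, filter width, weights, and the function names `conv1d` and `global_max_pool` are all assumptions chosen to keep the example small. A real model would learn many such filters and feed the pooled features into a softmax layer over the industry classes.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Slide one filter over a sequence of word vectors (no padding, stride 1).

    sequence: list of word vectors, each a list of d floats
    kernel:   list of k weight vectors, each a list of d floats
    Returns one ReLU activation per window of k consecutive words.
    """
    k = len(kernel)
    outputs = []
    for start in range(len(sequence) - k + 1):
        total = bias
        for offset in range(k):
            word_vec = sequence[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(word_vec, weights))
        outputs.append(max(0.0, total))  # ReLU non-linearity
    return outputs


def global_max_pool(activations):
    """Keep the strongest response of the filter anywhere in the description."""
    return max(activations)


# Toy 2-dimensional "word vectors" for a 4-word description (made-up numbers).
description = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One filter spanning 2 consecutive words (made-up weights).
kernel = [[1.0, -1.0], [0.5, 0.5]]

activations = conv1d(description, kernel)  # one value per 2-word window
feature = global_max_pool(activations)     # single feature for the classifier
```

With these toy numbers the filter responds only to the first two-word window, so `activations` is `[1.5, 0.0, 0.0]` and the pooled `feature` is `1.5`. Because the filter looks at consecutive words, it can pick up on local word order, which the bag-of-words encoding cannot.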
Next steps:
Try longer descriptions and see whether more information gives us better accuracy.