python preprocessing.py
* Step 5: give your pickle file a more descriptive name
By default, the pickle file has the same name as the original ".txt" file. However, you will very likely use the same text inputs to predict different labels, so give each pickle file a name that identifies the label. For example, if the input file is "longdescriptions.txt", rename "longdescriptions.pkl" to "longdescriptions_indu.pkl".
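The renaming can also be scripted. The following is only a minimal sketch: the helper names <code>add_label_suffix</code> and <code>rename_pickle</code> are hypothetical and are not part of preprocessing.py; they simply insert the label name before the ".pkl" extension.

```python
from pathlib import Path


def add_label_suffix(pickle_path, label):
    """Return a new path with "_<label>" inserted before the extension.

    Hypothetical helper for illustration; not part of preprocessing.py.
    """
    path = Path(pickle_path)
    return path.with_name(f"{path.stem}_{label}{path.suffix}")


def rename_pickle(pickle_path, label):
    """Rename the pickle file on disk and return the new path."""
    target = add_label_suffix(pickle_path, label)
    Path(pickle_path).rename(target)
    return target


if __name__ == "__main__":
    # Only computes the new name; nothing is touched on disk here.
    print(add_label_suffix("longdescriptions.pkl", "indu"))
```

Calling <code>rename_pickle("longdescriptions.pkl", "indu")</code> would then rename the file in place, assuming it exists in the current directory.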
'''Model Training/Prediction (classification_MMM_LLL.py)''': this is where the deep neural network lives. "MMM" stands for the model; currently the available models are "1DConvolution", "2DConvolution" and "LSTM". "LLL" stands for the name of the label. Note that the same model can be applied to the same text inputs to predict different labels. For example, "classification_LSTM_indu.py" is an LSTM model that predicts the industry from the descriptions, and "classification_LSTM_ipo.py" is an LSTM model that predicts the IPO status from the same descriptions, so name your files carefully. Whatever the model, this Python file always loads a pickle file generated in the previous step and trains the neural network on it. At the end, the trained network predicts on your test examples (examples it did not see during training) and prints the accuracy. To run this part:
python classification_LSTM.py
Note that data preprocessing usually only needs to be done once. The saved pickle file is a serialized, machine-friendly copy of the preprocessed data that loads very quickly.
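The "preprocess once, load many times" idea can be sketched as follows. This is an illustration under assumptions, not the code used in the classification scripts: <code>load_or_build</code> and <code>build_fn</code> are hypothetical names, and the actual contents of the pickle file depend on preprocessing.py.

```python
import pickle
from pathlib import Path


def load_or_build(pickle_path, build_fn):
    """Load preprocessed data from a pickle, building it only if missing.

    `build_fn` is a hypothetical stand-in for the expensive preprocessing
    step; it is called at most once per pickle file.
    """
    path = Path(pickle_path)
    if path.exists():
        # Fast path: deserialize the data saved by an earlier run.
        with path.open("rb") as f:
            return pickle.load(f)
    # Slow path: preprocess, then save the result for later runs.
    data = build_fn()
    with path.open("wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

With this pattern, the first run pays the preprocessing cost and every later training run only reads the pickle file.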
==Data Preprocessing==