Difference between revisions of "Industry Classifier"
Peterjalbert (talk | contribs) |
Revision as of 12:08, 13 February 2017
Industry Classifier | |
---|---|
Project Information | |
Project Title | |
Start Date | |
Deadline | |
Primary Billing | |
Notes | |
Has project status | |
Copyright © 2016 edegan.com. All Rights Reserved. |
Possible Tools
Python Tools
SciKit Learn SVM
http://scikit-learn.org/stable/modules/svm.html#svm
Its training time complexity is between O(n^2) and O(n^3), where n is the number of training samples. It seems easy to use. This is not a neural net; it is a support vector machine.
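As a rough illustration of how little code the SVM needs, here is a minimal sketch using scikit-learn's SVC. The synthetic dataset from make_classification is only a stand-in, since the project's real feature matrix is not shown on this page, and the kernel and C values are untuned defaults rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 200 samples, 10 features, 3 classes.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# RBF-kernel support vector classifier with default-style settings.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Because training cost grows at least quadratically with the number of samples, it may be worth timing a fit on a subset of the data before training on the full set.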
SciKit Learn Neural Net
http://scikit-learn.org/stable/modules/neural_networks_supervised.html
This is a neural net that uses backpropagation.
Its time complexity is listed as follows. Suppose there are n training samples, m features, k hidden layers each containing h neurons (for simplicity), and o output neurons. The time complexity of backpropagation is O(n * m * h^k * o * i), where i is the number of iterations. Since backpropagation has a high time complexity, it is advisable to start with a smaller number of hidden neurons and few hidden layers for training.
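To make the advice about starting small concrete, the sketch below evaluates the quoted cost expression and then trains a deliberately small network with scikit-learn's MLPClassifier. The helper backprop_cost is a hypothetical name introduced here, the data is synthetic, and the layer size of 8 is an arbitrary starting point rather than a tuned value.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier


def backprop_cost(n, m, h, k, o, i):
    """Evaluate the n * m * h^k * o * i expression (an order-of-growth figure, not a timing)."""
    return n * m * h ** k * o * i


# With h = 8, going from k = 1 to k = 3 hidden layers multiplies the estimate by 8^2 = 64.
small = backprop_cost(n=200, m=10, h=8, k=1, o=3, i=500)
large = backprop_cost(n=200, m=10, h=8, k=3, o=3, i=500)
print("Cost ratio (k=3 vs k=1):", large // small)

# Synthetic stand-in data (assumption): 200 samples, 10 features, 3 classes.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# One hidden layer (k = 1) of 8 neurons (h = 8); max_iter caps the iteration count i.
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
net.fit(X, y)
print("Iterations run:", net.n_iter_)
print("Training accuracy:", net.score(X, y))
```

If the small network underfits, hidden_layer_sizes can be grown one step at a time, for example to (16,) or (8, 8), while watching the training time.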
SK Neural Network Package
This is a separate package from the ones listed above. It requires a separate installation. Documentation is provided at:
https://scikit-neuralnetwork.readthedocs.io/en/latest/index.html
We ran into deprecation warnings, and the program would not execute because g++ was missing.
R Tools
R has a package called "neuralnet", available from CRAN.
An example is given at:
https://www.packtpub.com/books/content/training-and-visualizing-neural-network-r
Scripts
Scripts and data for this project are located in:
E:\McNair\Projects\Accelerators\Code+Final_Data\ChristyCode
Industry Classifier
This is a neural net built in Python that trains on industry designation data from the SDC Platinum database. It predicts the industry allocation of a given company. The file is located in the directory listed above.
Addresses.txt
This text file contains the investment info, name, address, city, and state of each portfolio company.
Descriptions.txt
This text file contains the company name, short description, major industry, and minor industry of each portfolio company.
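One possible way to turn the description text into numeric features for a classifier is sketched below. Everything about the file layout here is an assumption: the tab delimiter, the column order, and the sample rows (including the company and industry names) are invented for illustration and are not taken from the real Descriptions.txt. TF-IDF is also just one common choice of text representation, not necessarily what the project's script uses.

```python
import csv
import io

from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up rows standing in for Descriptions.txt (assumed tab-separated:
# company, short description, major industry, minor industry).
SAMPLE = (
    "Acme Bio\tDevelops cancer diagnostics\tMedical/Health\tBiotechnology\n"
    "ByteWorks\tBuilds cloud storage software\tInformation Technology\tSoftware\n"
    "GeneCo\tDevelops gene therapy drugs\tMedical/Health\tBiotechnology\n"
)


def load_descriptions(handle):
    """Return (description texts, major industry labels) from a tab-separated file handle."""
    texts, labels = [], []
    for _company, description, major, _minor in csv.reader(handle, delimiter="\t"):
        texts.append(description)
        labels.append(major)
    return texts, labels


texts, labels = load_descriptions(io.StringIO(SAMPLE))

# Convert each description into a sparse TF-IDF vector; rows align with labels.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(texts)
print(features.shape, labels)
```

To try this on the real file, the io.StringIO(SAMPLE) argument would be replaced by an open file handle, after checking the actual delimiter and column order.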
Statistics
Statistical methods for analyzing results from a neural network.
[https://en.wikipedia.org/wiki/Precision_and_recall Precision and Recall]
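As one example of such a method, precision and recall for a single class can be computed directly from the true and predicted labels. The sketch below uses plain Python; the function name precision_recall and the toy labels "tech" and "bio" are made up for illustration.

```python
def precision_recall(y_true, y_pred, positive):
    """Compute precision and recall for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    # Guard against division by zero when the class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (made up): 2 true positives, 1 false positive, 1 false negative for "tech".
y_true = ["tech", "tech", "bio", "bio", "tech"]
y_pred = ["tech", "bio", "bio", "tech", "tech"]

precision, recall = precision_recall(y_true, y_pred, positive="tech")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

For a multi-class industry classifier, the same function can be called once per industry, treating that industry as the positive class each time.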