Difference between revisions of "Accelerator Demo Day"
Revision as of 10:52, 20 July 2018
Accelerator Demo Day | |
---|---|
Project Information | |
Project Title | Accelerator Demo Day |
Owner | Minh Le |
Start Date | 06/18/2018 |
Deadline | |
Primary Billing | |
Notes | |
Has project status | Active |
Subsumes: | Demo Day Page Parser, Demo Day Page Google Classifier |
Project
This project uses Selenium and machine learning to find good candidate web pages and to classify whether a web page is a demo day page containing a list of cohort companies. It currently uses scikit-learn's random forest model and a bag-of-words approach.
Code Location
The source code and relevant files for the project can be found here:
E:\McNair\Projects\Accelerator Demo Day\
Development Notes
The Crawler Functionality
To be updated
The Classifier
Input (Features)
The input (features) is currently the frequency of each of X_NUMBER hand-selected words in each document. This is the naive bag-of-words approach.
Idea: Create a matrix in which the first column is the file BiBTex and the following columns are the words; the value at (file, word) is the frequency of that word in the file. Then split the matrix into an array of row vectors and feed each vector into the RNN.
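The matrix idea above could be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual code: the vocabulary, the file identifiers, the sample texts, and the helper names (`tokenize`, `build_frequency_matrix`, `to_row_vectors`) are all hypothetical, and "frequency" is taken to mean a raw count.

```python
import re
from collections import Counter

# Hypothetical hand-selected vocabulary; the real word list is not given in these notes.
VOCABULARY = ["demo", "day", "cohort", "startup", "accelerator"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_frequency_matrix(documents, vocabulary=VOCABULARY):
    """Return one row per document: [file id, count of word 0, count of word 1, ...]."""
    matrix = []
    for file_id, text in documents.items():
        counts = Counter(tokenize(text))
        # Counter returns 0 for words that never appear in the document.
        matrix.append([file_id] + [counts[word] for word in vocabulary])
    return matrix


def to_row_vectors(matrix):
    """Drop the identifier column, leaving one numeric feature vector per document."""
    return [row[1:] for row in matrix]


# Made-up documents, for illustration only.
documents = {
    "page_a": "Demo Day 2018: meet the cohort. Every startup in the cohort will pitch.",
    "page_b": "Contact us about our accelerator program.",
}

matrix = build_frequency_matrix(documents)
vectors = to_row_vectors(matrix)
```

Keeping the identifier in the first column makes it possible to trace a prediction back to its file, while `to_row_vectors` yields the purely numeric rows that a model would consume.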
This does not seem to give very high accuracy with our LSTM RNN, so I will consider a word2vec approach.
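To show how the input would change under a word-embedding approach, here is a minimal sketch. It does not train word2vec; it assumes a lookup table of vectors already exists. The table `EMBEDDINGS`, its 3-dimensional values, and the function `document_to_sequence` are hypothetical placeholders. Instead of one frequency vector per document, each document becomes a sequence of per-word vectors, which is the kind of input a recurrent model reads one step at a time.

```python
# Hypothetical 3-dimensional embeddings; real word2vec vectors would be
# learned from a corpus and would typically have far more dimensions.
EMBEDDINGS = {
    "demo": [0.9, 0.1, 0.0],
    "day": [0.8, 0.2, 0.1],
    "cohort": [0.1, 0.9, 0.3],
}

# Assumption: words missing from the table map to a zero vector.
UNKNOWN = [0.0, 0.0, 0.0]


def document_to_sequence(text, embeddings=EMBEDDINGS):
    """Map each whitespace-separated word to its vector, preserving word order."""
    tokens = text.lower().split()
    return [embeddings.get(token, UNKNOWN) for token in tokens]


# One vector per word, in the order the words appear.
sequence = document_to_sequence("Demo day cohort list")
```

Unlike the bag-of-words row, this representation keeps word order, at the cost of a variable-length input per document.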
Reading resources
http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf