Demo Day Page Google Classifier

Revision as of 16:08, 5 April 2018

This is a TensorFlow project that classifies webpages according to whether they are demo day pages containing a list of cohort companies. It currently uses scikit-learn's random forest model. The classifier takes two inputs:

A: The number of times each word in words.txt occurs in a webpage, as calculated by web_demo_features.py in the same directory. The classifier also takes the number of occurrences of years from 1900 to 2099, the counts of month names grouped into seasons, the number of simple links (links of the form www.abc.com or www.abc.org), and how many of those links are attached to images.
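A rough sketch of the features described in A, using only Python's standard library. It is not web_demo_features.py itself: the season grouping, the regular expression for "simple" links, the tokenizer, and the names `extract_features` and `LinkCollector` are all assumptions made for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# ASSUMPTION: northern-hemisphere season grouping; the wiki does not list it.
SEASONS = {
    "winter": ("december", "january", "february"),
    "spring": ("march", "april", "may"),
    "summer": ("june", "july", "august"),
    "fall": ("september", "october", "november"),
}

# Four-digit years 1900-2099 that stand alone as tokens.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

# ASSUMPTION: a "simple" link is an optional scheme, then www.<name>.com or
# www.<name>.org, with nothing after it except an optional trailing slash.
SIMPLE_LINK_RE = re.compile(
    r"^(?:https?://)?www\.[a-z0-9-]+\.(?:com|org)/?$", re.IGNORECASE
)


class LinkCollector(HTMLParser):
    """Hypothetical parser: gathers text between tags, anchor hrefs, and
    whether each anchor wraps an <img>."""

    def __init__(self):
        super().__init__()
        self.text = []     # raw text chunks between tags
        self.links = []    # [href, has_image] per <a> tag
        self._open = None  # index of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.append([dict(attrs).get("href") or "", False])
            self._open = len(self.links) - 1
        elif tag == "img" and self._open is not None:
            self.links[self._open][1] = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._open = None

    def handle_data(self, data):
        self.text.append(data)


def extract_features(html, vocabulary):
    """Return a dict of counts for one page (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text).lower()
    counts = Counter(re.findall(r"[a-z0-9']+", text))

    # Occurrences of each word from words.txt
    features = {f"word:{w}": counts[w.lower()] for w in vocabulary}
    # Years 1900-2099
    features["years"] = len(YEAR_RE.findall(text))
    # Month names, summed per season
    for season, months in SEASONS.items():
        features[f"season:{season}"] = sum(counts[m] for m in months)
    # Simple links, and how many of them wrap an image
    simple = [has_img for href, has_img in parser.links
              if SIMPLE_LINK_RE.match(href.strip())]
    features["simple_links"] = len(simple)
    features["simple_image_links"] = sum(simple)
    return features
```

The vocabulary would typically come from reading words.txt and splitting it into words. Counting month names this simply also counts "may" used as a verb; the actual script may handle that differently.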

B: A set of webpages hand-classified according to whether they contain a list of cohort companies. This is stored in classification.txt, a TSV equivalent of Demo Day URLS.xlsx. Keep in mind that this file must be UTF-8 encoded. In TextPad, you can convert a file to UTF-8 by choosing Save As and changing the encoding at the bottom of the dialog.
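A minimal sketch for reading classification.txt, assuming the URL is in the first column and a yes/no or 1/0 label is in the second (the actual column layout of Demo Day URLS.xlsx is not documented here). Opening the file as UTF-8 makes a mis-encoded file fail loudly with `UnicodeDecodeError`. The hypothetical `convert_to_utf8` helper is a scripted alternative to re-saving in TextPad; it assumes the original encoding was Windows-1252 (`cp1252`), which you should change if that is not the case.

```python
import csv

POSITIVE = {"1", "yes", "y", "true"}
NEGATIVE = {"0", "no", "n", "false"}


def load_labels(path):
    """Read classification.txt as a UTF-8 TSV into {url: bool}.

    ASSUMPTION: column 1 is the URL and column 2 the hand label. Rows whose
    label is not a recognized yes/no value (e.g. a header row) are skipped.
    A file that is not valid UTF-8 raises UnicodeDecodeError.
    """
    labels = {}
    # utf-8-sig accepts plain UTF-8 and also strips a BOM if an editor added one.
    with open(path, encoding="utf-8-sig", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 2:
                continue
            url, label = row[0].strip(), row[1].strip().lower()
            if label in POSITIVE:
                labels[url] = True
            elif label in NEGATIVE:
                labels[url] = False
    return labels


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Hypothetical helper: re-save a text file as UTF-8.

    ASSUMPTION: the source encoding is Windows-1252; change it if not.
    """
    with open(src, encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

The resulting dictionary can be joined with per-page feature counts to build the training rows for the random forest.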

Revision by Kyranstar
