Changes

Kyran Adams (Work Log)

Revision as of 16:53, 19 April 2018

153 bytes added, 16:53, 19 April 2018

→‎Spring 2018

[[Kyran Adams]] [[Work Logs]] [[Kyran Adams (Work Log)|(log page)]]

2018-04-16: Still working through the switch to auto-generated features. Generating them takes forever, so I reduced the vocabulary to about 3000 words. This is a lot faster and should still be accurate, because the most frequent words are informative ones like "demo" and "accelerator". I also switched text extraction from Beautiful Soup to [https://github.com/aaronsw/html2text html2text]. I might also try [https://nlp.stanford.edu/IR-book/html/htmledition/sublinear-tf-scaling-1.html sublinear tf scaling] (a parameter of the tf model).
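A minimal standard-library sketch of the two ideas in this entry, capping the vocabulary and sublinear tf scaling. It is not the code used in the project: the function names, the tokenizer, and the choice to keep the most frequent words are all assumptions made for illustration. In scikit-learn the same effect should be available through the <code>max_features</code> and <code>sublinear_tf</code> parameters of <code>TfidfVectorizer</code>.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(documents, max_words=3000):
    """Keep only the max_words most frequent tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(max_words)]


def sublinear_tf_vector(text, vocabulary):
    """Weight each vocabulary word as 1 + ln(tf) when tf > 0, else 0."""
    tf = Counter(tokenize(text))
    return [1.0 + math.log(tf[word]) if tf[word] > 0 else 0.0 for word in vocabulary]


# Hypothetical page texts, only for illustration.
pages = ["demo day demo day demo", "accelerator demo", "about us"]
vocabulary = build_vocabulary(pages, max_words=2)
features = [sublinear_tf_vector(page, vocabulary) for page in pages]
```

With sublinear scaling, a word that appears three times gets a weight of 1 + ln 3 rather than 3, so a page that repeats "demo" many times does not dominate the feature vector.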

2018-04-16: I think I'm going to transition from hand-picked feature words to automatically generated features. [http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html This webpage] has a good example. I could also use n-grams instead of only unigrams. I might also consider using an SVM instead of a random forest, or a combination of the two.
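To make the n-gram idea concrete, here is a small standard-library sketch that counts unigrams and bigrams together. The function names and the tokenizer are hypothetical; in scikit-learn the equivalent setting should be the <code>ngram_range</code> parameter of <code>CountVectorizer</code> or <code>TfidfVectorizer</code>.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(text, n_min=1, n_max=2):
    """Count every n-gram with n_min <= n <= n_max, joining tokens with a space."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical page text, only for illustration.
counts = ngram_counts("Demo day is demo day")
```

Bigrams such as "demo day" keep a little word order that unigram counts throw away, at the cost of a much larger feature set.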

Kyranstar

