Difference between revisions of "Ecosystem Organization Classifier"
Revision as of 13:51, 30 March 2019
Ecosystem Organization Classifier | |
---|---|
Project Information | |
Has title | Ecosystem Organization Classifier |
Has start date | |
Has deadline date | |
Has project status | Active |
Is dependent on | Crunchbase Database, VentureXpert Database |
Does subsume | Defining Incubators, Incubator Seed Data, Incubators in Five Ecosystems |
Copyright © 2019 edegan.com. All Rights Reserved. |
Introduction
The purpose of this project is to build a classifier that takes the description of an ecosystem organization (e.g., a startup, a venture capitalist, or an incubator) and either identifies the organization's type or distinguishes incubators from non-incubators.
Text Processing
There are two possible methods for processing the text of target HTML pages. The first is a "Bag of Words" approach, which uses Term Frequency–Inverse Document Frequency (TF-IDF) to do basic natural language processing and select words or phrases that have discriminant capabilities. The second is a Word2Vec approach, which uses shallow two-layer neural networks to reduce descriptions to a vector with high discriminant potential. (See "Memo for Evan" in E:\mcnair\Projects\Incubators for further detail.)
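As a rough illustration of the "Bag of Words" idea, the following is a minimal sketch of TF-IDF weighting written with the Python standard library only. It is not the project's actual code: the function names `tokenize` and `tfidf`, the regular-expression tokenizer, and the three sample descriptions are all hypothetical, and the weighting uses the plain textbook form (relative term frequency times the log of inverse document frequency), which may differ from whatever variant the project adopts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / tokens in doc) * log(N / number of docs containing term)
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical organization descriptions, invented for illustration only.
descriptions = [
    "We are an incubator offering mentorship and office space to startups",
    "We are a venture capital firm investing in early stage startups",
    "We are a startup building software for logistics",
]

weights = tfidf(descriptions)
```

In this sketch, a word that appears in every description (such as "we") receives a weight of zero, while a word confined to one description (such as "incubator") receives a higher weight than one shared by two descriptions (such as "startups"). That is the sense in which TF-IDF surfaces terms with discriminant capability; a real pipeline would also need stop-word handling and a classifier trained on the resulting weights.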
Related Projects
Subsumed Projects: Defining Incubators, Incubator Seed Data, Incubators in Five Ecosystems
This project is dependent on: Crunchbase Database, VentureXpert Database