Extracting Features from Surnames
Revision as of 18:39, 10 July 2009
*This page is part of a series on [[Classifying Names by Culture]]
Extracting features from surnames entails encoding the frequency of [http://en.wikipedia.org/wiki/Ngram n-grams] together with other features, such as string length. Recall that 1-grams, also called unigrams, are single letters or characters; 2-grams are called bigrams or digraphs; and 3-grams are called trigrams. In some applications, entire words, sentences, or other tokens serve as the grams.
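
As a rough illustration of the idea, the following Python sketch counts character unigrams, bigrams, and trigrams in a surname and records the string length alongside them. The function names <code>ngrams</code> and <code>extract_features</code>, the lowercasing step, and the <code>__length__</code> key are assumptions made for this example, not part of the original method.

```python
from collections import Counter


def ngrams(name, n):
    """Return the contiguous character n-grams of `name`, in order."""
    return [name[i:i + n] for i in range(len(name) - n + 1)]


def extract_features(surname, max_n=3):
    """Count all 1- to max_n-grams in a surname and record its length.

    Lowercasing and the "__length__" key are illustrative choices.
    """
    name = surname.lower()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(name, n))
    # Store the string length under a key that cannot collide with a
    # short n-gram of letters.
    features["__length__"] = len(name)
    return features


if __name__ == "__main__":
    for key, count in sorted(extract_features("Lee").items()):
        print(key, count)
```

Returning a <code>Counter</code> keeps the representation sparse: only n-grams that actually occur in the name are stored, which may be convenient when the features are later mapped to a fixed vector.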