Changes

Geocoding Inventor Locations (view source)

Revision as of 21:11, 24 August 2009


Longest Common Subsequence (LCS) is a widely used fuzzy matching technique. The [http://en.wikipedia.org/wiki/Longest_common_subsequence Longest Common Subsequence page on Wikipedia] provides detailed background. LCS over an arbitrary number of sequences is an NP-hard problem, and even the pairwise case takes time proportional to the product of the two string lengths, so comparing every source string against every reference string in two datasets is extremely processor intensive. To avoid long run-times, LCS matching is done on only a small subset of strings that have met the NGram criteria detailed below.
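The pairwise LCS computation can be sketched with the standard dynamic-programming recurrence. This is a minimal illustration, not the code used for the geocoding work; the function name <code>lcs_length</code> and the sample strings are assumptions made for the example.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b.

    Hypothetical helper for illustration. Runs in O(len(a) * len(b)) time
    and keeps only two rows of the dynamic-programming table.
    """
    prev = [0] * (len(b) + 1)          # LCS lengths for the previous row of a
    for ch_a in a:
        curr = [0]                     # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of dropping one character.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # A misspelled place name still shares a long subsequence with the reference.
    print(lcs_length("PITTSBURG", "PITTSBURGH"))
```

Because the cost grows with the product of the string lengths, running this for every source and reference pair scales poorly, which is why a cheaper pre-filter is applied first.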

NGrams are character-based token strings. Source and reference strings are transformed to include only characters from one of the following numbered sets:

# ABCDEFGHIJKLMNOPQRSTUVWXYZ (i.e. the uppercase Latin alphabet)

# 0123456789 (i.e. the decimal digits)
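The transformation above can be sketched as a character filter followed by a split into fixed-length character tokens. This is only an illustration under stated assumptions: the names <code>CHARSETS</code>, <code>filter_to_set</code> and <code>char_ngrams</code> are hypothetical, converting input to uppercase before filtering is an assumption, and the token length <code>n</code> is a free parameter because the text does not give one here.

```python
import string

# The two character sets listed above, keyed by their list number.
CHARSETS = {
    1: set(string.ascii_uppercase),  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
    2: set(string.digits),           # 0123456789
}


def filter_to_set(text: str, set_number: int) -> str:
    """Keep only the characters of text that belong to the chosen set.

    Assumption: the input is uppercased first so that lowercase letters
    are retained under set 1 rather than dropped.
    """
    allowed = CHARSETS[set_number]
    return "".join(ch for ch in text.upper() if ch in allowed)


def char_ngrams(token: str, n: int) -> list[str]:
    """Split a filtered string into overlapping character n-grams."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


if __name__ == "__main__":
    letters = filter_to_set("123 Main St.", 1)
    print(letters, char_ngrams(letters, 3))
    print(filter_to_set("123 Main St.", 2))
```

Filtering first removes punctuation and spacing differences, so two spellings of the same location are more likely to produce identical tokens.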

Anonymous user

imported>Ed
