Geocoding Inventor Locations
Revision as of 18:38, 22 August 2009
===NGram and LCS Matching===
Longest Common Subsequence (LCS) is a widely used fuzzy matching technique. The [http://en.wikipedia.org/wiki/Longest_common_subsequence Longest Common Subsequence page on Wikipedia] provides detailed background. However, LCS matching of two datasets is extremely processor intensive: comparing a single pair of strings takes time proportional to the product of their lengths, every source string may need to be compared against every reference string, and the general LCS problem for an arbitrary number of sequences is NP-hard. To avoid long run times, LCS matching is performed only on the small subset of strings that meet the NGram criteria detailed below.
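As an illustration of the cost of a single comparison, the following is a minimal sketch of pairwise LCS in Python using the standard dynamic-programming approach. It is not the code used for this project; the function names <code>lcs_length</code> and <code>lcs_similarity</code>, and the choice to normalise by the longer string, are assumptions made for the example.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b.

    Runs in O(len(a) * len(b)) time and keeps only two rows of the
    dynamic-programming table in memory.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the best subsequence so far.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry forward the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical score: LCS length divided by the longer string's length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0
```

For example, <code>lcs_length("SAN FRANCISCO", "SAN FRANSISCO")</code> returns 12, because the two 13-character strings differ in a single position. Since the inner loop runs once for every pair of characters, applying this to every source and reference pair quickly becomes expensive, which is why a cheaper filter is applied first.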
NGrams are letter token strings. Source and reference strings are transformed to include only characters from one of the following numbered sets: