Geocoding Inventor Locations
Revision as of 01:59, 21 January 2010
By default, all output files are written to the Results directory. Which files are produced depends on the options selected, but the main results file is always written (with or without unmatched addresses) and includes fuzzy matches unless the <tt>-e</tt> option is used to force exact matching only. The main results file contains the following fields:
*COUNTRY - From the source entry
*STR - From the source entry
*CTY - From the source entry
*EXP_CITY - From the source entry
*EXP_ADM - From the source entry
*EXP_POSTCODE - From the source entry
*CTY_STR - A compound entry, delimited by #, used as an internal key. It is the software's best estimate of an address structure.
*EXP_STR - A compound entry, delimited by #, made from the exception data in a similar way to CTY_STR
*PRS_POSTCODE - The software's best estimate of the postcode, if any
*MATCH_TYPE - The match type that was used to make the match
*PLACE - The name of the most precise location
*UNI - The GNS unique name identifier (UNI) of the most precise location
*LAT - The latitude of the most precise location
*LONG - The longitude of the most precise location
*FC - The GNS feature classification (FC) code of the most precise location
The most precise location is the finest-grained result, that is, the match corresponding to the lowest-level FC code. With the default setting of FC=A,P,L, preference is given to L, then P, then A. The following fields are then repeated for each FC code searched, prefixed by that FC code (if no match was found for an FC code, its entries are blank):
*NAME
*UNI
*LAT
*LONG
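The selection rule and the per-FC column block described above can be sketched as follows. This is a minimal Python sketch, not the tool's own code: the `location_columns` function, the dictionary layout of a match, and the `_` separator between the FC prefix and the field name are all hypothetical. Only the preference of L, then P, then A for the default FC=A,P,L comes from the description above.

```python
# Hypothetical sketch: preference order is finest first, as described for FC=A,P,L.
DEFAULT_SEARCHED = ("A", "P", "L")
DEFAULT_PREFERENCE = ("L", "P", "A")
FIELDS = ("NAME", "UNI", "LAT", "LONG")


def location_columns(matches, searched=DEFAULT_SEARCHED, preference=DEFAULT_PREFERENCE):
    """Build the location part of one results row.

    `matches` maps an FC code to a dict with NAME, UNI, LAT and LONG keys,
    or to None when that FC code produced no match (hypothetical layout).
    """
    # The most precise location is the first FC code, finest first, that matched.
    best_fc = next((fc for fc in preference if matches.get(fc)), None)
    best = matches[best_fc] if best_fc else None
    row = {
        "PLACE": best["NAME"] if best else "",
        "UNI": best["UNI"] if best else "",
        "LAT": best["LAT"] if best else "",
        "LONG": best["LONG"] if best else "",
        "FC": best_fc or "",
    }
    # Repeat NAME, UNI, LAT, LONG for every FC code searched, prefixed by the code.
    for fc in searched:
        m = matches.get(fc)
        for field in FIELDS:
            # Assumption: prefix and field name joined with "_"; blank when no match.
            row[f"{fc}_{field}"] = m[field] if m else ""
    return row
```

For example, if only the A and P searches return matches, the P match becomes PLACE and FC is "P", while all the `L_` columns stay blank.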
The fuzzy match file(s), if requested with <tt>-wf</tt>, have the same format as the main results file (they are written by the same method). The report file is a copy of the output to the terminal and can be enabled with the <tt>-r</tt> option. The human choice file (enabled with <tt>-human</tt>) has its own format, as follows:
*SOURCENAME - The word, token or string from the source entry that is being considered as relevant for a match
*REFNAME - The name of a place in the GNS file
*COUNTRY - From the source entry
*STR - From the source entry
*CTY - From the source entry
*EXP_CITY - From the source entry
*EXP_ADM - From the source entry
*EXP_POSTCODE - From the source entry
*REFTOTAL - The total number of grams in REFNAME
*SOURCETOTAL - The total number of grams in SOURCENAME
*REFPC - The percentage of the REFNAME grams that appear in the SOURCENAME gram set
*SOURCEPC - The percentage of the SOURCENAME grams that appear in the REFNAME gram set
*LEFTGRAMS - The number of the REFNAME grams that appear in the SOURCENAME gram set
*RIGHTGRAMS - The number of the SOURCENAME grams that appear in the REFNAME gram set
*LCSSCORE - The size of the longest common subsequence in characters
*SOURCELENGTH - The length of SOURCENAME
*REFLENGTH - The length of REFNAME
*MAXLENGTH - The maximum of the lengths of SOURCENAME and REFNAME
*LCSPC - The LCSSCORE divided by the MAXLENGTH
*FIRSTLETTERBINDS - Whether the fuzzy matching algorithm required the same first letter in SOURCENAME and REFNAME
*GRAMALPHABET - The gram alphabet used by the matching algorithm
*GRAMLENGTH - The length of the n-grams used
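The gram and LCS columns above can be reproduced for a single SOURCENAME/REFNAME pair roughly as follows. This is a minimal Python sketch under stated assumptions, not the tool's implementation. The functions `grams`, `lcs_length` and `human_choice_metrics`, and the default alphabet and gram length, are hypothetical. The sketch also assumes that names are lowercased, that characters outside the gram alphabet are dropped before grams are cut, and that repeated grams are counted individually. Under that last assumption LEFTGRAMS and RIGHTGRAMS can differ. FIRSTLETTERBINDS is not modelled.

```python
import string

# Hypothetical defaults; the tool's GRAMALPHABET and GRAMLENGTH are configurable.
DEFAULT_ALPHABET = string.ascii_lowercase + string.digits
DEFAULT_GRAMLENGTH = 2


def grams(name, n, alphabet=DEFAULT_ALPHABET):
    """Return the n-grams of `name` in order, keeping repeats.

    Assumption: lowercase the name and drop characters outside the alphabet.
    """
    cleaned = "".join(c for c in name.lower() if c in alphabet)
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def lcs_length(a, b):
    """Longest common subsequence length via dynamic programming, one row at a time."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def human_choice_metrics(sourcename, refname,
                         gramlength=DEFAULT_GRAMLENGTH, alphabet=DEFAULT_ALPHABET):
    ref_grams = grams(refname, gramlength, alphabet)
    src_grams = grams(sourcename, gramlength, alphabet)
    ref_set, src_set = set(ref_grams), set(src_grams)
    # REFNAME grams found in the SOURCENAME gram set, and vice versa.
    leftgrams = sum(g in src_set for g in ref_grams)
    rightgrams = sum(g in ref_set for g in src_grams)
    lcs = lcs_length(sourcename.lower(), refname.lower())
    maxlength = max(len(sourcename), len(refname))
    return {
        "REFTOTAL": len(ref_grams),
        "SOURCETOTAL": len(src_grams),
        "REFPC": 100.0 * leftgrams / len(ref_grams) if ref_grams else 0.0,
        "SOURCEPC": 100.0 * rightgrams / len(src_grams) if src_grams else 0.0,
        "LEFTGRAMS": leftgrams,
        "RIGHTGRAMS": rightgrams,
        "LCSSCORE": lcs,
        "SOURCELENGTH": len(sourcename),
        "REFLENGTH": len(refname),
        "MAXLENGTH": maxlength,
        # LCSPC is LCSSCORE divided by MAXLENGTH, as described above.
        "LCSPC": lcs / maxlength if maxlength else 0.0,
    }
```

For example, `human_choice_metrics("banana", "anna")` gives LEFTGRAMS of 2 out of a REFTOTAL of 3 and RIGHTGRAMS of 4 out of a SOURCETOTAL of 5, because the repeated grams "an" and "na" in "banana" are each counted. Its LCSSCORE is 4 ("anna").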