MatchLocations.pl also retrieves a list of all ISO3166 codes included in the data (from the MatchPatent.pm module) and in any specified override file, and calls Match::GNS.pm to load them. An override file can be specified with the <tt>-over</tt> option. Override files are tab-delimited and have the format:
 ListedISO3166 1stPreference 2ndPreference 3rdPreference ...
The ISO3166 code listed in the source data is then overridden, and the alternatives are searched for matches in order of preference. The search is terminated when a match is found or the override set is exhausted.
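The override lookup described above can be sketched as follows. This is a minimal illustration in Python, not the code used by MatchLocations.pl (which is Perl). The function names <tt>parse_overrides</tt> and <tt>resolve</tt>, the <tt>lookup</tt> callback, and the sample codes are all hypothetical. The fallback to the listed code when no override row exists is also an assumption.

```python
def parse_overrides(text):
    """Map each listed ISO3166 code to its ordered list of alternatives.

    Each row is assumed to be: ListedISO3166 <TAB> 1stPreference <TAB> ...
    """
    overrides = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        if len(fields) < 2:
            continue  # skip blank rows and rows with no alternatives
        overrides[fields[0]] = fields[1:]
    return overrides


def resolve(listed_code, place, overrides, lookup):
    """Search the alternatives in order of preference and stop at the first match.

    `lookup(code, place)` is a hypothetical callback returning a match or None.
    Assumption: a code with no override row is searched under its own code.
    """
    for code in overrides.get(listed_code, [listed_code]):
        match = lookup(code, place)
        if match is not None:
            return code, match
    return None  # override set exhausted without a match


# Illustrative data only; these rows are not taken from the project.
overrides = parse_overrides("HK\tHK\tCN\n\nSU\tRU\tUA\tBY\n")
gazetteer = {("CN", "Shenzhen"): "hit-cn", ("UA", "Kiev"): "hit-ua"}
lookup = lambda code, place: gazetteer.get((code, place))
result = resolve("HK", "Shenzhen", overrides, lookup)
```

Returning the code that produced the match alongside the match itself makes it visible which preference was actually used.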
Geocoding Inventor Locations
Revision as of 23:53, 20 January 2010
produces a simple help output.
==The Source Files==
Per country source files are extracted from the NBER patent data. The format of the source file(s) is as follows (XX is an ISO3166 code):
XX.txt - Tab delimited plain text with no (intentional) string quotation.
Column(s): <tt>country</tt> <tt>str</tt> <tt>cty</tt> <tt>adm</tt> <tt>city</tt> <tt>postcode</tt> <tt>str</tt>
The column order is not important. <tt>country</tt>, <tt>str</tt>, and <tt>cty</tt> cannot all be null. <tt>adm</tt>, <tt>city</tt>, and <tt>postcode</tt> are optional 'exception' fields that are processed with priority. They provide hand corrections and other specifically generated information.
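A reader for this format might look like the sketch below. It is written in Python for illustration and is not the Match::Patent.pm implementation. It assumes the file begins with a header row naming the columns (which is what would make the column order unimportant). The names <tt>read_source</tt> and <tt>exception_fields</tt> and the sample rows are hypothetical.

```python
import csv
import io

REQUIRED_ANY = ("country", "str", "cty")    # cannot all be null
EXCEPTION_FIELDS = ("adm", "city", "postcode")  # optional, processed with priority


def read_source(text):
    """Read tab-delimited, unquoted text and keep rows that satisfy the null rule.

    Assumption: the first row is a header naming the columns.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        # Normalise missing values to empty strings; ignore surplus fields.
        row = {k: (v or "").strip() for k, v in row.items() if k is not None}
        if not any(row.get(col) for col in REQUIRED_ANY):
            continue  # country, str and cty are all null: reject the row
        records.append(row)
    return records


def exception_fields(row):
    """Return only the populated 'exception' fields of a record."""
    return {f: row[f] for f in EXCEPTION_FIELDS if row.get(f)}


# Illustrative data only; columns deliberately appear in a shuffled order.
sample = ("cty\tcountry\tstr\tcity\tadm\tpostcode\n"
          "Palo Alto\tUS\t\t\tCA\t\n"
          "\t\t\t\t\t\n")
records = read_source(sample)
```

Here <tt>csv.QUOTE_NONE</tt> reflects the statement that the files carry no intentional string quotation, so quote characters are treated as ordinary text.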
The Perl module Match::Patent.pm loads and provides an interface to this source data. The source code is the primary module documentation. The Match::PostalCodes.pm Perl module provides a method to extract [[Postal Codes]] from the addresses for a large number of ISO3166 codes, and implements 'standard' postal code identification for all other jurisdictions.
==Reference Data==
This project uses [[ISO3166]] two-character country codes to name source and reference files. GNS does not use ISO3166 country codes, so users will need to translate accordingly (see the [[GEOnet Names Server | GNS page]] for details). A full bundle of correctly named GNS files is also available.
The Perl module Match::GNS.pm loads, indexes and provides an interface to key variables from this data. The source code is the primary module documentation. The load() method takes an ISO3166 code, and the index methods and most other methods take one of two specific GNS FC codes (e.g. "P" for populated place, "L" for locality, and "A" for administrative area). Which GNS FC codes are used is specified in the @Letters global variable of MatchLocations.pl and inherited by all other modules.
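The idea of indexing reference records by FC code can be sketched as below. This is a Python illustration only and does not reproduce Match::GNS.pm. The record keys <tt>FC</tt>, <tt>FULL_NAME</tt> and <tt>LAT</tt>, the lower-casing of names, and the <tt>LETTERS</tt> tuple (standing in for the @Letters global) are assumptions made for the example.

```python
from collections import defaultdict

LETTERS = ("P", "A")  # stand-in for the @Letters global; values are illustrative


def build_index(records, letters=LETTERS):
    """Build one name index per selected FC code.

    Records whose FC code is not in `letters` are ignored.
    Assumption: each record is a dict with 'FC' and 'FULL_NAME' keys.
    """
    index = {fc: defaultdict(list) for fc in letters}
    for rec in records:
        fc = rec["FC"]
        if fc in index:
            index[fc][rec["FULL_NAME"].lower()].append(rec)
    return index


# Illustrative records only.
records = [
    {"FC": "P", "FULL_NAME": "Berlin", "LAT": 52.52},
    {"FC": "A", "FULL_NAME": "Berlin", "LAT": 52.5},
    {"FC": "L", "FULL_NAME": "Spreewald", "LAT": 51.9},
]
index = build_index(records)
```

Keeping a separate index per FC code lets a lookup ask for, say, a populated place named "berlin" without also returning the administrative area of the same name.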
==The Matching Process==
The matching process is carried out by the [http://www.edegan.com/repository/MatchPatentLocations.pl MatchLocations.pl] script and its dependent modules (detailed above). The script has a standard POD-based command line interface. The <tt>-co</tt> option specifies the ISO3166 country code to be matched. If the override option is used, then the <tt>-co</tt> option can be used to specify the source file. When the override option is set to 1, rather than to the filename containing the overrides, the countries in the source files are used to determine which GNS lookups to perform; otherwise the <tt>-co</tt> option specifies the GNS reference set.
Glossary of terms: