Difference between revisions of "Geocoding Inventor Locations"
Revision as of 19:55, 24 July 2009
- This page is part of a series under the NBER Patent Data Project
This page details the various matching techniques used to geocode inventor locations in the NBER patent data. Geocoding inventor locations entails matching the inventor addresses provided in the patent data to known locations throughout the world and recording their latitude and longitude.
The reference data for the locations (which provides the latitudes and longitudes) is taken from the (U.S.) National Geospatial-Intelligence Agency's GEOnet Names Server (GNS), which covers the world excluding the U.S. and Antarctica.
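As a rough illustration of what geocoding means here, the sketch below looks up a place name in a small in-memory reference table and returns its coordinates. The function names, the normalisation rule and the sample entries are assumptions made for illustration (the coordinates are approximate); this is not the project's matching script.

```python
from typing import Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def normalise(name: str) -> str:
    """Lower-case a place name and collapse whitespace (an assumed, simple rule)."""
    return " ".join(name.lower().split())


def geocode(place: str, reference: Dict[str, Coordinates]) -> Optional[Coordinates]:
    """Return coordinates for an exact match on the normalised name, or None."""
    return reference.get(normalise(place))


# Hypothetical reference entries; coordinates are approximate.
reference = {
    normalise("Cambridge"): (52.2, 0.12),
    normalise("Oxford"): (51.75, -1.26),
}

print(geocode("  CAMBRIDGE ", reference))  # matched after normalisation
print(geocode("Atlantis", reference))      # no match, so None
```

An exact lookup like this only covers the easy cases; addresses that are misspelled or abbreviated would need a looser comparison.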
Details of the GEOnet Names Server (GNS)
- Place names are recorded in Romanized form
- Country (and province/territory) names, as well as their assigned codes, are recorded using FIPS Standard #10. Users should note that there are differences between the Federal Information Processing Standard (FIPS) country names and the UN Country Names.
- The GNS output format contains various custom codes. Of particular interest are:
  - LAT and LONG - Latitude and Longitude (Decimal - also available as DMS)
  - NT - Name Type (only A, P and L are of use in matching to addresses)
    - A = Administrative region type feature
    - P = Populated place type feature
    - V = Vegetation type feature
    - L = Locality or area type feature
    - U = Undersea type feature
    - R = Streets, highways, roads, or railroad type feature
    - T = Hypsographic type feature
    - H = Hydrographic type feature
    - S = Spot type feature
  - DC - Designation Code (DC provides a refinement of NT; details are in GNS-DesignationCodes.txt)
  - SHORT_FORM - a Short Form of the name that is commonly used
  - FULL_NAME - the Long Form of the name
  - FULL_NAME_ND - the Long Form of the name without diacritics
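The field list above suggests a simple first pass over a reference file: keep only the rows whose name type is A, P or L, and retain the diacritic-free name with its decimal coordinates. The sketch below assumes a tab-delimited file with a header row that uses the field names listed above; the actual layout of a GNS download may differ, and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# Name types said above to be useful for matching addresses.
USEFUL_NAME_TYPES = {"A", "P", "L"}


def useful_places(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Yield name and decimal coordinates for rows with a useful name type."""
    for row in rows:
        if row.get("NT") not in USEFUL_NAME_TYPES:
            continue  # skip rivers, roads, undersea features, and so on
        yield {
            "name": row["FULL_NAME_ND"],
            "lat": float(row["LAT"]),
            "long": float(row["LONG"]),
            "nt": row["NT"],
        }


# Hypothetical sample in the assumed tab-delimited layout (approximate values).
SAMPLE = (
    "LAT\tLONG\tNT\tFULL_NAME_ND\n"
    "52.2\t0.12\tP\tCambridge\n"
    "51.5\t-0.1\tH\tRiver Thames\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
places = list(useful_places(reader))
print(places)  # only the populated place remains
```

To read a real file instead of the sample, the `io.StringIO(SAMPLE)` argument would be replaced with an open file handle, after checking the delimiter and header names against the downloaded data.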
The Source And Reference Files
Source files are currently extracted from the NBER patent data on a per-country basis. The original format of the source files was:
postcode WKU CTY city county
Where WKU is the patent number and CTY is an address field. It appears that postcode, city and county are derived fields, extracted from CTY by an algorithm that uses comma separation. As these fields are error-prone, they were discarded (and regenerated in the matching script).
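A minimal sketch of reading one source line and keeping only WKU and CTY is given below. It assumes the five fields are tab-delimited, which the page does not state, and the sample line, patent number and helper names are hypothetical. The comma split only produces raw tokens from CTY; it is not the regeneration logic of the matching script.

```python
from typing import List, NamedTuple


class SourceRecord(NamedTuple):
    wku: str          # patent number
    cty: str          # raw address field
    parts: List[str]  # comma-separated tokens taken from CTY


def parse_source_line(line: str, delimiter: str = "\t") -> SourceRecord:
    """Keep WKU and CTY from a 'postcode WKU CTY city county' line.

    The derived postcode, city and county fields are ignored.
    The tab delimiter is an assumption about the file layout.
    """
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    wku, cty = fields[1].strip(), fields[2].strip()
    # Split the address on commas and drop empty tokens.
    parts = [p.strip() for p in cty.split(",") if p.strip()]
    return SourceRecord(wku=wku, cty=cty, parts=parts)


# Hypothetical line, not taken from the real source file.
line = "CB2 1TN\t1234567\tCambridge, Cambridgeshire\tCambridge\tCambridgeshire\n"
record = parse_source_line(line)
print(record.wku, record.parts)
```

Keeping the raw CTY string alongside the tokens makes it possible to try other splitting rules later, since addresses are not guaranteed to use commas consistently.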
Countries that are being processed include:
- The UK (Source file: UK-PatentInventorLocations.txt, Reference file: GNS-UK.txt)