Changes
Jump to navigation
Jump to search
Geocoding Inventor Locations (view source)
Revision as of 19:03, 22 August 2009
, 19:03, 22 August 2009→Postal Codes
*United Kingdom ([http://en.wikipedia.org/wiki/UK_postcodes Sourced from Wikipedia]): A9 9AA, A99 9AA, A9A 9AA, AA9 9AA, AA99 9AA, AA9A 9AA. Simple Regex: <tt>([A-Z]{1,2}[0-9]{1,2}[A-Z]{0,1}\s[0-9][A-Z]{2,2})</tt>
*France ([http://en.wikipedia.org/wiki/Postal_codes_in_France Sourced from Wikipedia]): NNNMM or NNMMM where NN and NNN are numerics indicating the préfectures and sous-préfectures, respectively, and MMM are other numerics. However the following formats also appear frequently in the patent data: F NNNNN, F-NNNNN, F - NNNNN, F- NN NNN, FNNNNN, F-NN NNN, F-NN, FR - NNNNN, FR NNNNN, FR-NNNNN, (NNNNN), (NN), - NNNNN, -NNNNN-, -NN, NN, NN - N-NNNNN, NNNN, NN NNN, NN.NNN, NNN/N, NN., F. NNNNN, F.NNNNN, "FRNN,NNN". French postal codes most often, but not exclusively, occur at the start of the address string. If there are fewer than 5 digits, trailing zeros should be added. **Simple Regex: <tt>\(?F?R?\.?\s?\d?-?\s?\d{2,3}\.?\s?\/?\d{0,3}-?\)?</tt>*Belgium ([http://en.wikipedia.org/wiki/List_of_postal_codes_in_Belgium Sourced from Wikipedia]): NNNN where N is a numeric. Belgian postcodes are usually placed before the city, and the number of trailing zeros indicates the size of the city. However, the following formats also appear frequently in the patent data: NN NNNN, NNN, NNNN, NNNNN, NNNN-, NNN B-NNNN, NNN B-NNN, NN-NNNN, B-NNNN, NN - B NNNN, NN, N - B - NNNN, "NN, B. NNNN", B - NNNN, B -NNNN, B NNNN, B- NNNN, B--NNNN, B-NNNN, B-NNNN-, BNNNN, BNNNNN, BE - NNNN, BE-NNNN, BF-NNNN. **Simple Regex: <tt>\d{0,3},?\s?-{0,2}\s?B?[EF]?\.?\s?-{0,2}\s?\d{1,5}-? </tt>
*Germany ([http://en.wikipedia.org/wiki/List_of_postal_codes_in_Germany Sourced from Wikipedia]): Currently (post 1993) German postcal codes consist of five digits: NNMMM where NN indicates the broad area and MMM indicates the sub-area. Prior to 1993 postal codes had four digits NNNN and between 1989 and 1993, O-NNNN (for East, Ost, Germany) and W-NNNN (for West Germany) was used. However, the following formats also appear frequently in the patent data: (NNNN), (D-NNNN), -NNNN, 0-NNNN, 0 - NNNN, 0-NNN, 0NNNN, N CityName NN, NN CityName NN, NNN CityName NN, NNNN CityName NN, NNNN CityName N, NNNN CityName N/BRD, NNNN CityName N/HB, 1-DNNNN, "10,NNNN", BRD-NNNN, N, NN, NNN, NNNN, NNNNN, NN.NNNNN, NN-NNNNN, d-NNNN, NN - NNNNN, NN NNN, D-N, D-NN, D-NNN, D-NNNN, D-NNNNN, D N CityName NN, D NN, D NNNN, D NNNNN, D- NNNN, D- NNNNN, D--NNNNN, D-0-NNNN, D-0NNNN, D-N-NNNNN, D.NNNN, D.NNNNN, D.-NNNNN, D0NNNN, DNNN NN. DNN, DNNN, DNNNN, DNNNNN, DE - NNNN, DE 0NNNN, DE NNNNN, DE-0NNNNN, DE-0-NNNN, DE-NNNN, DE-NNNNN, O-NNNN, W-NNNN, W-NNNN CityName NN, W-NNNNN CityName NN, WNNNN. Where CityName is Berlin, Hamburg, Dusseldorf, Seevetal, etc., and DM, DS and DW appear instead of DE sometimes. Unfortunately this list is not exhaustive. Readers should note that there is a frequent transcription error of O (Ooh) as 0 (Zero).
**Simple Regex (Doesn't catch everything): <tt>\(?(DE|D|1|W|)\.?-?[O0]?(BRD|1|)\s?-?\s?\d{2,5}\)?</tt>
*Spain ([http://en.wikipedia.org/wiki/List_of_postal_codes_in_Spain Sourced from Wikipedia]): Post 1976 Spanish postcodes are five digits of the format NNMMM, where NN indicates the province (01-52) or a reserved code (e.g. 80 for P.O. boxes). In the patent data Spansish postcodes are comparatively well behaved, with the following standard variants appearing: NNNNN, NNN NN, NNNN, NN NNNNN, NN- NNNNN, -NNNNN, NNNNN-, "NN, NNNNN", NNN, NN, N NNNNN-, NN-NN, NN-NN NNNNN, NNNNN-IBI, E-NNNNN, E-NNNN, E - NNNNN, E--NNNNN, ES-NNNNN.
**Simple Regex: <tt>(E|ES|)\d{0,2},?\s?-{0,2}\s?\d{2,5}-?(IBI|)</tt>
*Switzerland ([http://en.wikipedia.org/wiki/Postal_codes_in_Switzerland_and_Liechtenstein Sourced from Wikipedia]): Swiss (and Lictenstein) postcodes are heirarchical four-digit numbers of the form District+Area+Route+PONumber, where districts are numbered West to East (would you expect less from the Swiss?). In the patent data Swiss postcodes are compartvely immaculately behaved with the following formats appearing: NNNN, NNNN-, CH-NNNN, CH - NNNN, CH NNNN, CHNNN, CH- NNNN, CHNNN. Though the "H" may sometimes be lowercase.
**Simple Regex: <tt>(CH|Ch|)\s?-?\s?\d{3,4}-?</tt>
The Match::PostalCodes.pm perl module provides a method to extract a postcode from a text string for a given ISO3166 code.