:'''3. Postcode (U.S.)'''
::U.S. postcodes (ZIP+4 codes) follow the pattern [five digits]-[four digits]. Patent records with U.S. addresses can therefore be identified by searching for postcodes with the regular expression "(^|\s)\d{5}-\d{4}($|\s)".
::An example query is saved in E:/McNair/Projects/PatentAddress/RxPostcode.sql.
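::A minimal Python sketch of the postcode search follows. It is not the query stored in RxPostcode.sql; the function name has_us_postcode and the sample strings are illustrative assumptions. The regular expression is the one given above, used unchanged with Python's re module.

```python
import re

# Postcode pattern from this page: five digits, a hyphen, four digits,
# preceded by whitespace or the start of the string and followed by
# whitespace or the end of the string.
US_POSTCODE = re.compile(r"(^|\s)\d{5}-\d{4}($|\s)")


def has_us_postcode(address: str) -> bool:
    """Return True if the address contains a ZIP+4 style postcode."""
    return US_POSTCODE.search(address) is not None


if __name__ == "__main__":
    samples = [
        "HOUSTON, TEXAS 77256-6571",    # U.S. address
        "1-6, UCHISAIWAI-CHO 1-CHOME",  # Japanese address, no match
        "MINATO-KU, TOKYO 10585-8518",  # non-U.S. string that still matches
    ]
    for s in samples:
        print(has_us_postcode(s), s)
```

::The third sample is taken from a Tokyo record on this page: a five-plus-four digit string alone does not prove a U.S. address, so a match is best treated as a candidate rather than a final classification.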
:'''4. State (U.S.)'''
::Addresses follow a few patterns that can be used to extract city information. Because the state and postcode always appear together, separated by a space, state information can also be extracted with regular expressions.
::*'(^|\s)CITY NAME[,] STATE POSTCODE'
::For example (address | extracted state):
:::HOUSTON, TEXAS 77256-6571 | TEXAS
:::BROOKINGS, SOUTH DAKOTA 57006-0128 | SOUTH DAKOTA
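::The first pattern can be sketched in Python as below. This is an illustration under stated assumptions, not the project's extraction code: the name extract_city_state is hypothetical, and the pattern assumes upper-case text and a comma between city and state, which is what lets a multi-word city be told apart from a multi-word state such as SOUTH DAKOTA.

```python
import re

# Sketch of the '(^|\s)CITY NAME[,] STATE POSTCODE' pattern with a full
# state name. Assumptions: upper-case text, and a comma after the city.
FULL_STATE = re.compile(
    r"(?:^|\s)"
    r"(?P<city>[A-Z][A-Z .'-]*?),\s+"     # city name, ended by a comma
    r"(?P<state>[A-Z][A-Z ]*[A-Z])\s+"    # state name, may contain spaces
    r"(?P<postcode>\d{5}-\d{4})(?:$|\s)"  # ZIP+4 postcode
)


def extract_city_state(address):
    """Return (city, state, postcode), or None if the pattern is absent."""
    m = FULL_STATE.search(address)
    if m is None:
        return None
    return m.group("city"), m.group("state"), m.group("postcode")


if __name__ == "__main__":
    for address in ["HOUSTON, TEXAS 77256-6571",
                    "BROOKINGS, SOUTH DAKOTA 57006-0128"]:
        print(extract_city_state(address))
```

::Because the state group only checks the shape of the text, it also accepts strings such as 'MINATO-KU, TOKYO 10585-8518'; checking the state against a list of valid state names would tighten it. Street words directly before the city (e.g. SAND HILL ROAD) are also captured as part of the city.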
::*'CITY NAME STATE (abbreviation) POSTCODE'
::For example:
:::NEW YORK NY 10022-3201
:::WAUKEGAN IL 60085-2195
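::A similar Python sketch for the abbreviation pattern, assuming the abbreviation is exactly two upper-case letters and the comma after the city is optional (it is absent in 'NEW YORK NY 10022-3201'). The name extract_city_abbr is hypothetical.

```python
import re

# Sketch of the 'CITY NAME STATE (abbreviation) POSTCODE' pattern.
# Assumptions: upper-case text, a two-letter state abbreviation, and an
# optional comma after the city.
ABBR_STATE = re.compile(
    r"(?:^|\s)"
    r"(?P<city>[A-Z][A-Z .'-]*?),?\s+"    # city name, optional comma
    r"(?P<state>[A-Z]{2})\s+"             # two-letter state abbreviation
    r"(?P<postcode>\d{5}-\d{4})(?:$|\s)"  # ZIP+4 postcode
)


def extract_city_abbr(address):
    """Return (city, state, postcode) for the abbreviation pattern, else None."""
    m = ABBR_STATE.search(address)
    if m is None:
        return None
    return m.group("city"), m.group("state"), m.group("postcode")


if __name__ == "__main__":
    for address in ["NEW YORK NY 10022-3201", "WAUKEGAN IL 60085-2195"]:
        print(extract_city_abbr(address))
```

::The lazy city group grows word by word until a two-letter token followed by a postcode is found, which is how 'NEW YORK' is kept whole instead of being split at 'NEW'.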
::The extracted state records are stored in the table ptoassigneend_missus_final.
::Known issues:
::* The city field needs to be standardized. For example, 'GRAND CAYMAN, CAYMAN ISLAND' and 'GRAND CAYMAN' refer to the same city.
::* For some records, the state and country fields do not match: addrline2 often holds a U.S. street address, state, and postcode, while the city and country fields point to Tokyo, Japan. For example:
{| class="wikitable"
! addrline2 !! city !! country
|-
| 2882 SAND HILL ROAD MENLO PARK, CALIFORNIA 94025-7022 || TOKYO || JAPAN
|-
| 2801 CENTERVILLE ROAD, P.O. BOX 15439 WILMINGTON, DE 19850-5439 || TOKYO || JAPAN
|-
| 1-6, UCHISAIWAI-CHO 1-CHOME || CHIYODA-KU, TOKYO || JAPAN
|-
| MENLO PARK, CA 94025-7022 || TOKYO || JAPAN
|-
| || MINATO-KU, TOKYO 10585-8518 || JAPAN
|-
| 1225 NORTH HIGHWAY 169, MINNEAPOLIS, MINNESOTA 55441-5058 || TOKYO || JAPAN
|-
| 2882 SAND HILL ROAD MENLO PARK, CA 94025-7022 || TOKYO || JAPAN
|-
| 3001 ORCHARD PARKWAY SAN JOSE, CALIFORNIA 95134-2088 || TOKYO 107 || JAPAN
|-
| 3001 ORCHARD PARKWAY SAN JOSE, CA 95134-2088 || MINATO-KU, TOKYO 107 || JAPAN
|-
| 3001 ORCHARD PARKWAY SAN JOSE, CALIFORNIA 95134-2088 || TOKYO 107 || JAPAN
|-
| 3001 ORCHARD PARKWAY SAN JOSE, CA 95134-2088 || MINATO-KU, TOKYO 107 || JAPAN
|}
::* States appear both as full names (e.g. CALIFORNIA) and as two-letter abbreviations (e.g. CA), so the two forms need to be unified.
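::The last two issues could be handled with a small normalization step, sketched below in Python. The dictionary covers only the states that appear on this page (a full table would list all 50), the assumption that U.S. records carry the country value 'UNITED STATES' is ours, and both helper names are hypothetical.

```python
# Partial mapping of full state names to USPS abbreviations, limited to
# the states that appear on this page.
STATE_ABBR = {
    "CALIFORNIA": "CA",
    "DELAWARE": "DE",
    "ILLINOIS": "IL",
    "MINNESOTA": "MN",
    "NEW YORK": "NY",
    "SOUTH DAKOTA": "SD",
    "TEXAS": "TX",
}
VALID_ABBR = set(STATE_ABBR.values())


def normalize_state(state):
    """Map a full state name or an abbreviation to the abbreviation.

    Returns None when the value is not a known U.S. state.
    """
    s = " ".join(state.upper().split())  # collapse repeated whitespace
    if s in VALID_ABBR:
        return s
    return STATE_ABBR.get(s)


def is_mismatched(state, country):
    """Flag a record whose address has a U.S. state but a non-U.S. country.

    Assumption: U.S. records use the country value 'UNITED STATES'.
    """
    return (normalize_state(state) is not None
            and country.strip().upper() != "UNITED STATES")


if __name__ == "__main__":
    rows = [("CALIFORNIA", "JAPAN"), ("DE", "JAPAN"), ("TEXAS", "UNITED STATES")]
    for state, country in rows:
        print(state, normalize_state(state), is_mismatched(state, country))
```

::A Tokyo value such as 'TOKYO' normalizes to None and is therefore not flagged, so only rows that really carry a U.S. state are reported as mismatches.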