=== Clean Address: more patterns ===
====Clean postcode & state====
Postcode should always be at the end of addrline1, addrlin2 or city. Exclude P.O. BOX #.
For example, the The SQL code to extract postcode and state from addrline1 is: SELECT addrline1, SUBSTRING(addrline1, '\d{5}') postcode_candid, REPLACE(REPLACE(SUBSTRING(addrline1, '[,]\s{1,}\w{1,}\s{0,}\w{0,}\d{5}'),'\d{5}',''),',','') state_candid
FROM ptoassigneend_us_temp2
WHERE addrline1 ~* '(^|\s)\d{5}$' AND NOT (addrline1 ~* 'BO') AND NOT (addrline1 ~* 'P[.]O') OR addrline2 ~* '(^|\s)\d{5}$' AND NOT (addrline2 ~* 'BO') AND NOT (addrline2 ~* 'P[.]O') OR city ~* '(^|\s)\d{5}$' AND NOT (city ~* 'BO') AND NOT (city ~* 'P[.]O'); # SELECT 6680
Examples:
addrline1 | substring 181 METRO DRIVE SUITE 520 SAN JOSE, CA 95110 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| 95110 ----------- 98625 | , CA 9511098625 11065 ROSELLE STREET SAN DIEGO1650 WEST BIG BEAVER ROAD TROY, CALIFORNIA 92121 MI 48084 | 11065 48084 1114 AVE NY NY 10036 | 10036 GLENDALE, CALIFORNIA 92121CA 91204 | 91204 400 MARGARET STREET46 BAKER ST., NOPROVIDENCE, R. 20 PLATTSBURGH, NEW YORK 12901 | 12901 I. 02905 |02905
====Clean city & state====