First, filter out records with city that is a city, zip that is a zip, state that is a state. Note: The consistency between city and state or city and postcode is not checked in this section.
*=====zip that is a zip=====
Match the pattern 5-4 or 5 digits.
*=====state that is a state=====
Select distinct state records with
The output shows that all the records not null or not space are valid state names.
*=====city that is a city=====
The feature Select distinct city is messy. In this way, we first clean feature city before identifying clean datarecords in ptoassigneend_us_extracted and store them in ptoasigneend_us_citylist (775).
The following methods can be used to identify clean city.
Select distinct city records and store them in table citylist (30971). Select distinct city records in ptoassigneend_us_extracted and store them in citylist2 (775). The following patterns can be used to clean feature city. * Feature city is in citylist2ptoassigneend_us_citylist.
SQL Code:
WHERE city IN (
SELECT citylist
FROM citylist2ptoassigneend_us_citylist); -- SELECT 2574811 !!!(992450 LEFT)