ptoassigneend_us_candid2 (postcode is clean) | 184123
Union ptoassigneend_us_identify0 through ptoassigneend_us_identify4 to generate ptoassigneend_us_identify_subtotal (3195769 records), which has clean city, state and postcode. This table contains 89.5% of all the records in ptoassigneend_allus. The remaining 10.5% are left in ptoassigneend_us_temp5.
Table "public.ptoassigneend_us_identify_subtotal"
postcode_cleaned | text |
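The union step above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 with toy rows, not the original PostgreSQL statement. The column names other than postcode_cleaned (id, city_cleaned, state_cleaned) are assumptions, and UNION ALL is used on the assumption that the five identify tables do not overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for ptoassigneend_us_identify0..4. Column names other than
# postcode_cleaned are assumptions made for this sketch.
parts = [f"ptoassigneend_us_identify{i}" for i in range(5)]
for i, name in enumerate(parts):
    cur.execute(
        f"CREATE TABLE {name} (id INTEGER, city_cleaned TEXT, "
        "state_cleaned TEXT, postcode_cleaned TEXT)"
    )
    cur.execute(
        f"INSERT INTO {name} VALUES (?, ?, ?, ?)",
        (i, "AUSTIN", "TX", "78701"),
    )

# UNION ALL keeps every row; it assumes the five tables are disjoint.
union_sql = " UNION ALL ".join(f"SELECT * FROM {name}" for name in parts)
cur.execute(f"CREATE TABLE ptoassigneend_us_identify_subtotal AS {union_sql}")

subtotal = cur.execute(
    "SELECT COUNT(*) FROM ptoassigneend_us_identify_subtotal"
).fetchone()[0]
print(subtotal)
```

If the identify tables could share rows, plain UNION would deduplicate them, but the subtotal would then no longer equal the sum of the five table counts.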
* ptoassigneend_us_candid1 is a subset of ptoassigneend_us_temp5. It contains clean city and state info, but the postcode is missing. 6.7% of the data is left in ptoassigneend_us_temp6.
* ptoassigneend_us_candid2 is also a subset of ptoassigneend_us_temp5. It contains clean postcode info, but city and state are not identified. 5.0% of the data is left in ptoassigneend_us_temp7. I randomly checked city_extracted in ptoassigneend_us_candid2, and it is actually quite clean. However, since these cities don't exist in ptoassigneend_us_citylist2, the city records may not be accurate (for example, "Oklahama City"), and we have no reliable way to identify the clean records. Maybe we can restrict the length of city_extracted to pick out the clean cities.
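The length restriction suggested for city_extracted could look roughly like the sketch below. The function name looks_like_city, the bounds of 3 and 20 characters, and the sample values are all hypothetical and not taken from the data.

```python
def looks_like_city(city_extracted, min_len=3, max_len=20):
    """Length-only heuristic; the bounds are illustrative guesses, not tuned values."""
    if city_extracted is None:
        return False
    return min_len <= len(city_extracted.strip()) <= max_len


# Made-up sample values for city_extracted.
samples = ["OKLAHAMA CITY", "SUITE 400 1200 MAIN STREET BUILDING C", "NY", None]
kept = [s for s in samples if looks_like_city(s)]
print(kept)
```

A length filter like this only drops values that are too short or too long to be a city name. A misspelling such as "Oklahama City" still passes, so it would not by itself make the city records accurate.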
Note:
About 60 records are missing. For example, the number of records in ptoassigneend_us_temp plus the number of records in ptoassigneend_us_identify0 does not equal the number of records in ptoassigneend_allus.
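One way to track down the missing records is an anti-join of the source table against the two derived tables. The sketch below uses Python's sqlite3 with toy data, and it assumes a unique key column named id, which may not match the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy tables; the key column "id" is an assumption for this sketch.
for name in ("ptoassigneend_allus", "ptoassigneend_us_temp",
             "ptoassigneend_us_identify0"):
    cur.execute(f"CREATE TABLE {name} (id INTEGER)")

cur.executemany("INSERT INTO ptoassigneend_allus VALUES (?)",
                [(i,) for i in range(1, 7)])
cur.executemany("INSERT INTO ptoassigneend_us_identify0 VALUES (?)",
                [(1,), (2,), (3,)])
cur.executemany("INSERT INTO ptoassigneend_us_temp VALUES (?)",
                [(4,), (5,)])  # id 6 is deliberately left out

# Rows of the source table that appear in neither derived table.
missing_ids = [row[0] for row in cur.execute(
    """
    SELECT a.id
    FROM ptoassigneend_allus AS a
    WHERE NOT EXISTS (SELECT 1 FROM ptoassigneend_us_identify0 AS i WHERE i.id = a.id)
      AND NOT EXISTS (SELECT 1 FROM ptoassigneend_us_temp AS t WHERE t.id = a.id)
    ORDER BY a.id
    """
)]
print(missing_ids)
```

The same NOT EXISTS query should carry over to PostgreSQL unchanged, provided the real tables share a key that identifies a record.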