===Summary===
Table Name | Records # ----------------------------------------------------|------------- ptoassigneend_allus | 3572605 ----------------------------------------------------|------------- ptoassigneend_us_identify0 | 5343 ptoassigneend_us_temp | 3567261 ptoassigneend_us_identify1 | 2511356 ptoassigneend_us_temp2 | 1055874 ptoassigneend_us_identify2 | 14508 ptoassigneend_us_temp3 | 1041366 ptoassigneend_us_identify3 | 664524 ptoassigneend_us_temp4 | 376835 ptoassigneend_us_identify4 | 38 ptoassigneend_us_temp5 | 376797 ptoassigneend_us_identify_subtotal | 3195769 ----------------------------------------------------|-------------
ptoassigneend_us_candid1 (city and state is clean) | 136958
ptoassigneend_us_candid2 (postcode is clean) | 184123
Union ptoassigneend_us_identify(0-4) to get ptoassigneend_us_identify_subtotal (3195769). 10.5% left in ptoassigneend_us_temp5.
ptoassigneend_us_candid1 is a subset of ptoassigneend_us_temp5. It doesn't contain clean postcode info, but contains clean city and state info. 6.7% data left in ptoassigneend_us_temp6.
ptoassigneend_us_candid2 is also a subset of ptoassigneend_us_temp5. It contains clean postcode info. 5.0% data left in ptoassigneend_us_temp7. I randomly checked the city_extracted in ptoassigneend_us_candid2, and it is quite clean actually. But these cities don't exist in ptoassigneend_us_citylist2, so we have no idea how to identify clean records.
Note:
About 60 records are missing. For example, the # of records in ptoassigneend_us_temp + # of records in ptoassigneend_us_identify0 != # ptoassigneend_allus.