=====Postcode=====
Reminder: 'postcode_city' holds the postcodes extracted from 'city', 'postcode_addr1' those extracted from 'addrline1', and 'postcode_addr2' those extracted from 'addrline2'.
The extracted values 'postcode_city', 'postcode_addr1' and 'postcode_addr2' are consistent with one another.
Examples:
37831-8243 | 37831-8243
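The extraction described in the reminder above could look roughly like the sketch below. It is not the project's actual code: the function name extract_postcode, the regular expression, and the choice to keep the last match in a field (on the assumption that the postcode sits at the end of an address line) are all illustrative assumptions.

```python
import re

# Assumed pattern: a 5-digit ZIP code with an optional 4-digit extension (ZIP+4).
# The lookarounds keep longer digit runs (e.g. a 6-digit P.O. box number)
# from being mistaken for a ZIP code.
ZIP_PATTERN = re.compile(r"(?<!\d)\d{5}(?:-\d{4})?(?!\d)")


def extract_postcode(text):
    """Return the last ZIP or ZIP+4 found in text, or None if there is none.

    Hypothetical helper: taking the last match is an assumption, based on
    postcodes usually appearing at the end of an address line.
    """
    if not text:
        return None
    matches = ZIP_PATTERN.findall(text)
    return matches[-1] if matches else None
```

Applied to 'P.O. BOX 674412, HOUSTON, TX 77267-4412', this returns '77267-4412'; the 6-digit box number is skipped because it is not a standalone 5-digit run.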
The issue is the inconsistency between 'postcode' and the extracted values ('postcode_city', 'postcode_addr1' and 'postcode_addr2').
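One way to flag such inconsistent records is sketched below. The notes do not say how inconsistency was defined, so comparing only the first five digits (and ignoring the ZIP+4 extension) is an assumption, as are the helper names zip5 and is_inconsistent.

```python
def zip5(postcode):
    """Return the first five digits of a ZIP or ZIP+4 string, or None."""
    if not postcode:
        return None
    head = postcode.strip()[:5]
    return head if len(head) == 5 and head.isdigit() else None


def is_inconsistent(postcode, extracted):
    """True when both values are present and their 5-digit ZIP codes differ.

    Assumption: a missing value on either side is not counted as a conflict.
    """
    a, b = zip5(postcode), zip5(extracted)
    return a is not None and b is not None and a != b
```

Under this definition, '83716' against '83707-0006' is flagged, while two identical values such as '37831-8243' are not.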
* Inconsistency between 'postcode' and 'postcode_addr1'
'postcode_addr1' beats 'postcode' because 'addrline1' is the more detailed field. For example:
addrline1 | postcode_addr1 | postcode_new
P.O. BOX 6 / 83707-0006 | 83707-0006 | 83716
P.O. BOX 6 / 83707-0006 | 83707-0006 | 83716
* Inconsistency between 'postcode' and 'postcode_addr2'
'postcode_addr2' beats 'postcode' because 'addrline2' is the more detailed field.
Examples:
addrline2 | postcode_addr2 | postcode_new
P.O. BOX 6 / 83707-0006 | 83707-0006 | 83716-9632
P.O. BOX 674412, HOUSTON, TX 77267-4412 | 77267-4412 | 77002
Besides, I randomly picked some records and googled their address and postcode. The results support 'postcode_addr2'.
Examples:
Providence, RI 02903
* Inconsistency between 'postcode' and 'postcode_city'
'postcode_city' beats 'postcode'.
city | state | postcode_city | postcode
E:/McNair/Projects/PatentAddress/Functions.sql
For records whose 'addrline1', 'addrline2' and 'city' contain no postcode information, just clean the feature 'postcode' and use the result as 'postcode_cleaned'.
All the cleaned postcodes for U.S. patents are stored in ptoassigneend_us_cleaned (see feature postcode_cleaned).
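The precedence rules in this section can be summarised in a short sketch. It is illustrative only and not the code in Functions.sql: the function names are hypothetical, the cleaning step is assumed to be simple whitespace trimming (the notes do not say what cleaning involves), and the order in which the three extracted values are tried is arbitrary, since they are reported to be consistent with one another.

```python
def clean_postcode(raw):
    """Hypothetical cleaning step: trim whitespace, treat empty values as missing."""
    if raw is None:
        return None
    cleaned = raw.strip()
    return cleaned or None


def choose_postcode(postcode, postcode_addr1=None, postcode_addr2=None,
                    postcode_city=None):
    """Pick the value to store as postcode_cleaned.

    Any postcode extracted from the address fields beats the raw 'postcode';
    the raw value is cleaned and used only when nothing was extracted.
    """
    for candidate in (postcode_addr1, postcode_addr2, postcode_city):
        if candidate:
            return candidate
    return clean_postcode(postcode)
```

For a record with postcode '77002' and postcode_addr2 '77267-4412', this picks '77267-4412'; a record with no extracted postcode falls back to its own cleaned 'postcode'.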