IO files are on the dbase server in:
Z:/PatentAddress
====To do====
In no particular order:
*Remove city, state, zip, country from addrline1 & addrline2 to get clean addrlines.
*Maybe concatenate addrline1 and addrline2 to make a single addrline
*Identify clean data (e.g. a city field that holds only a city, a zip that is only a zip, a state that is only a state)
**By pattern, by length, or by match against a reference list
*Try some more patterns, perhaps with a slightly higher false positive rate, on the remaining uncleaned data
**Iterate!
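A minimal sketch of the "pattern, length, match to list" checks from the list above, assuming US-style data. The names ZIP_RE, KNOWN_STATES, KNOWN_CITIES and classify_row are hypothetical, and the reference lists are deliberately truncated placeholders, not the project's real lists.

```python
import re

# Assumption: US ZIP or ZIP+4; foreign postcodes would need other patterns.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")

# Hypothetical, truncated reference lists (placeholders only).
KNOWN_STATES = {"CA", "MA", "NY", "TX", "WA"}
KNOWN_CITIES = {"BOSTON", "HOUSTON", "NEW YORK", "SEATTLE"}


def is_clean_zip(value):
    """Pattern check: the whole field must be a ZIP or ZIP+4."""
    return ZIP_RE.fullmatch(value.strip()) is not None


def is_clean_state(value):
    """Length check plus list match on the two-letter code."""
    code = value.strip().upper()
    return len(code) == 2 and code in KNOWN_STATES


def is_clean_city(value):
    """List match: the whole field must be a known city name."""
    return value.strip().upper() in KNOWN_CITIES


def classify_row(row):
    """Return a per-field clean/unclean flag for one record (a dict)."""
    return {
        "city": is_clean_city(row.get("city") or ""),
        "state": is_clean_state(row.get("state") or ""),
        "postcode": is_clean_zip(row.get("postcode") or ""),
    }
```

Rows where every flag is true can be set aside as clean, and looser patterns can then be tried on what remains.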
====Introduction====
Currently done:
*Five features (addrline1, addrline2, city, state, postcode) in the table contain address information.
*Features addrline1, addrline2 and city are not cleaned. Each can contain suite, street, city, state and postcode information, or any combination of these.
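As an illustration of how mixed content might be pulled out of an uncleaned address line, here is a rough sketch that handles only one layout: a comma-separated line ending in "city, ST zip". TAIL_RE and split_addrline are hypothetical names, and the pattern is an assumption about one common US format, not a description of what the data actually looks like.

```python
import re

# Assumed layout: "<street part>, <city>, <ST> <zip>" with comma separators.
TAIL_RE = re.compile(
    r"^(?P<street>.+?),\s*(?P<city>[^,]+?),\s*"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)$"
)


def split_addrline(line):
    """Split a trailing city/state/zip off an address line, if present.

    Lines that do not fit the assumed layout are returned unchanged,
    with the other fields set to None, so they can be retried later
    with a different pattern.
    """
    text = line.strip()
    m = TAIL_RE.match(text)
    if m is None:
        return {"addrline": text, "city": None, "state": None, "postcode": None}
    return {
        "addrline": m.group("street").strip(),
        "city": m.group("city").strip(),
        "state": m.group("state"),
        "postcode": m.group("zip"),
    }
```

For example, split_addrline("Suite 200, 6100 Main St, Houston, TX 77005") keeps "Suite 200, 6100 Main St" as the address line and moves the rest into separate fields, while a line such as "6100 Main St" passes through untouched.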