03/10/2017 - Extracted U.S. address information from the ptoassigneend table. The extracted records are stored in the new table 'ptoassigneend_missus'. For details, see [[Patent Data Restructure]].
03/12-13/2017 - Applied similar methods to extract address information from Japanese patents. The results are stored in 'ptoassigneend_missjapan'. Matched the post code patterns for the 200 distinct countries that exist in the patent table.
03/14/2017 - As mentioned above, three kinds of information can be extracted from the address columns: city, country, and post code (plus state for the U.S.). The extracted post code is quite accurate for almost all countries, and so is the country (and the state for the U.S.). The extracted city, however, is unreliable, because it is often confused with street names. One way to increase accuracy is to list all possible cities in each country and then match the address columns against those lists, which is time-consuming.
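The two ideas in the 03/14 note, matching post codes by a per-country pattern and matching cities against a known list, can be sketched roughly as follows. This is only an illustration and not the code used for the tables above: the names `POSTCODE_PATTERNS`, `CITIES`, `extract_postcode` and `extract_city` are hypothetical, the two patterns and the tiny city lists are placeholders, and the sample addresses are made up.

```python
import re

# Hypothetical per-country post code patterns (illustrative, not the project's own).
POSTCODE_PATTERNS = {
    "US": re.compile(r"\b\d{5}(?:-\d{4})?\b"),  # ZIP or ZIP+4
    "JP": re.compile(r"\b\d{3}-\d{4}\b"),       # 7-digit code written as NNN-NNNN
}

# Hypothetical, tiny city lists; a real list would hold every city per country.
CITIES = {
    "US": ["New York", "York", "Houston"],
    "JP": ["Tokyo", "Osaka"],
}


def extract_postcode(address, country):
    """Return the last substring matching the country's pattern, or None."""
    pattern = POSTCODE_PATTERNS.get(country)
    if pattern is None:
        return None
    # Assumption: when several candidates match (e.g. a 5-digit street
    # number), the post code is the one closest to the end of the address.
    matches = [m.group(0) for m in pattern.finditer(address)]
    return matches[-1] if matches else None


def extract_city(address, country):
    """Return the first known city found in the address, or None."""
    # Try longer names first so that "New York" wins over "York".
    for city in sorted(CITIES.get(country, []), key=len, reverse=True):
        if re.search(r"\b" + re.escape(city) + r"\b", address, re.IGNORECASE):
            return city
    return None


if __name__ == "__main__":
    us = "350 YORK AVENUE, NEW YORK, NY 10001"
    jp = "2-3-1 MARUNOUCHI, CHIYODA-KU, TOKYO 100-0005"
    print(extract_city(us, "US"), extract_postcode(us, "US"))
    print(extract_city(jp, "JP"), extract_postcode(jp, "JP"))
```

Even with a city list, a street named after a city (for example "York Avenue" in an address whose city is missing from the list) can still produce a false match, so the result would probably need to be combined with the position of the match within the address.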