Difference between revisions of "Sonia Zhang (Work Log)"

From edegan.com

Revision as of 16:53, 27 March 2017


Sonia Zhang Work Logs (log page)

Summer Work

02/23/2017 - Set up the user page and the work log page. Got an overview of the patent data.

02/27/2017 - Started working on the issues listed on Patent Data Issues.

02/28/2017 - Cleaned assigneeinfo, msalist, etc.

03/01/2017 - Had a meeting to discuss problems in the patent data.

03/02/2017 - Cleaned some of the 'name' and 'city' records in ptoassigneend2. Created the ptoassigneend_country table to store country information. Worked out some methods for filling in empty 'city' and 'country' values.

03/06/2017 - Updated the ptoassigneend table. Filled some of the missing 'country' values with 'UNITED STATES' based on the 'state' information.
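The fill step above can be sketched roughly as follows. This is only an illustration, not the query actually run against ptoassigneend: the function name `fill_country`, the dict-based records, and the short `US_STATES` list are all assumptions made for the example.

```python
# Illustrative subset of state codes; a real run would need the full list.
US_STATES = {"CA", "NY", "TX"}


def fill_country(record):
    """Return a copy of record with an empty 'country' set to
    'UNITED STATES' when 'state' holds a known U.S. state code."""
    filled = dict(record)
    country = (filled.get("country") or "").strip()
    state = (filled.get("state") or "").strip().upper()
    if not country and state in US_STATES:
        filled["country"] = "UNITED STATES"
    return filled


# Hypothetical records, not taken from the patent data.
rows = [
    {"name": "A", "state": "tx", "country": ""},
    {"name": "B", "state": "", "country": None},
    {"name": "C", "state": "CA", "country": "CANADA"},
]
cleaned = [fill_country(r) for r in rows]
```

Records that already have a country are left untouched, so a conflicting 'state' value never overwrites existing data.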

03/10/2017 - Extracted U.S. address information from the ptoassigneend table. The extracted records are stored in the new table 'ptoassigneend_missus'. See Patent Data Restructure for details.

03/13/2017 - Applied similar methods to extract address information from Japanese patents. The results are stored in 'ptoassigneend_missjapan'. Matched postcode patterns to the 200 distinct countries that exist in the patent table.

03/14/2017 - Focused on the country and post code information. Extracted country and post code information from addrline1 and addrline2 columns for patents from Japan and South Korea.

Cleaned the country names (not finished yet).

The problem that still needs to be solved is that the postcode in some countries follows the same pattern as the street code. For example, the pattern [three digits-three digits] extracts records from South Korea as well as some records from Germany.
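The ambiguity can be reproduced with a small sketch. The regular expression below is one plausible reading of the [three digits-three digits] pattern, and both sample addresses are made up for illustration; neither comes from the patent tables.

```python
import re

# Assumed form of the [three digits-three digits] pattern.
POSTCODE_3_3 = re.compile(r"\b\d{3}-\d{3}\b")


def find_candidates(addrline):
    """Return every substring of addrline that looks like ddd-ddd."""
    return POSTCODE_3_3.findall(addrline)


# Hypothetical address lines.
samples = {
    "korea": "Gangnam-gu, Seoul 135-080",
    "germany": "Hauptstrasse 100-102, 80331 Munchen",
}

for country, addr in samples.items():
    print(country, find_candidates(addr))
```

Both lines produce a match, although only the first is a postcode; the second is a street number range. The pattern alone therefore cannot decide the country, and some additional signal (for example an existing country or city value) would be needed to separate the two cases.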

03/15/2017 - Cleaned the country column.

03/16/2017 - 03/27/2017 - Restructured the addrline1, addrline2 and city features. See Patent Data Restructure.