Difference between revisions of "Patent Assignment Data Restructure"
Revision as of 16:03, 2 March 2017
Restructuring the current patent dataset first requires rigorous cleaning of the data. The primary areas for improvement are:
- 1. Clean the ptoassignment table so that its keys are unique.
- 2. Clean the ptoproperties table to remove non-utility patents. The patent numbers currently include:
  - 7-digit patent numbers
  - application numbers
  - unknown numbers that cannot be matched to patent numbers in the patent table, for example:
    - 20090108066
    - 20100007288
    - 20100110022
  - Design and Reissue patents ('%D%' or '%RE%')
  - alphanumeric character strings
- 3. Restructure the address information in the ptoassignee table to extract meaningful information
- 4. Verify that the cleaned patent documentids correspond to patent numbers or application numbers in the patent table
- 5. Restructure the address information in the ptoassignment table
- 6. Transform the structure of the dataset
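
The categories listed under item 2 can be told apart with a simple format check before any matching against the patent table. The following is a minimal sketch, not the project's actual cleaning script. The function name `classify_documentid`, the category labels, and the 8-digit length used for application numbers are all assumptions made for illustration. The design and reissue test mirrors the '%D%' and '%RE%' patterns above, so it flags any identifier containing those letters.

```python
import re


def classify_documentid(documentid: str) -> str:
    """Assign a raw documentid string to one of the categories in item 2.

    Hypothetical helper: the labels and length rules are illustrative only.
    """
    value = documentid.strip().upper()

    if re.fullmatch(r"[0-9]{7}", value):
        return "patent_number"

    # Assumption: application numbers are 8 digits long.
    if re.fullmatch(r"[0-9]{8}", value):
        return "application_number"

    # Any other all-digit string still has to be checked against the
    # patent table; by format alone it is unmatched.
    if re.fullmatch(r"[0-9]+", value):
        return "unmatched_number"

    # Same effect as LIKE '%D%' OR LIKE '%RE%': a substring test.
    if "D" in value or "RE" in value:
        return "design_or_reissue"

    return "alphanumeric"


if __name__ == "__main__":
    for raw in ["1234567", "12345678", "20090108066", "D0456789", "RE12345", "PP12345"]:
        print(raw, classify_documentid(raw))
```

Only identifiers labelled `patent_number` or `application_number` would go on to the verification in item 4. A format check cannot confirm that a number actually exists in the patent table, so it supplements that step rather than replacing it.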