Patent Assignment Data Restructure
Revision as of 15:55, 2 March 2017
Before the current patent dataset can be restructured, the data requires rigorous cleaning. The primary areas for improvement are:
- 1. Clean the ptoassignment table so that its keys are unique.
- 2. Clean ptoproperties to remove non-utility patents. The patent number field currently holds a mix of patent numbers, application numbers, and values that we have not matched yet:
  - 7-digit patent numbers
  - application numbers
  - unknown numbers that cannot be matched to patent numbers in the patent table, for example:
    - 20090108066
    - 20100007288
    - 20100110022
- 3. Clean ptoassignee to extract the address components and standardize them.
- 4. Check that all patent numbers are accounted for in ptoassignee_currentusa.
- 5. Clean up the correspondence addresses.
- 6. Transform the structure of the dataset.
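
As a starting point for step 2, the sketch below sorts raw identifier strings into the three groups listed above. It is a minimal illustration, not the project's actual procedure: the function names, the `known_patents` set (standing in for the patent table), and the assumption that application numbers are exactly 8 digits are all hypothetical and would need to be checked against the real data.

```python
import re
from collections import Counter

# Assumed formats (not confirmed by the source data):
#   utility patent number -> exactly 7 digits
#   application number    -> exactly 8 digits
SEVEN_DIGITS = re.compile(r"\d{7}")
EIGHT_DIGITS = re.compile(r"\d{8}")


def classify_number(raw, known_patents):
    """Return 'patent', 'application', or 'unknown' for one identifier string."""
    value = raw.strip()
    if SEVEN_DIGITS.fullmatch(value):
        # A 7-digit value counts as a patent only if the patent table contains it.
        return "patent" if value in known_patents else "unknown"
    if EIGHT_DIGITS.fullmatch(value):
        return "application"
    # Everything else, including the 11-digit examples above, stays unmatched.
    return "unknown"


def summarize(numbers, known_patents):
    """Count how many identifiers fall into each category."""
    return Counter(classify_number(n, known_patents) for n in numbers)
```

Running `summarize` over the patent number column gives a quick count per category, which helps size the "unknown" group before deciding how to handle it.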