Before the current patent dataset can be restructured, the data requires rigorous cleaning. The primary areas for improvement are:
:1. Clean the ptoassignment table so that its keys are unique.
:2. Clean ptoproperties to remove non-utility patents. The patent number field currently contains a mix of patent numbers, application numbers, and values that we have not matched yet:
::* 7-digit patent numbers
::* application numbers
::* unknown numbers that cannot be matched to patent numbers in the patent table, for example:
:::20090108066
:::20100007288
:::20100110022
:3. Clean ptoassignee and extract the address components.
:4. Check that all patent numbers are accounted for in ptoassignee_currentusa.
:5. Clean up the correspondence addresses.
:6. Transform the structure of the dataset.
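
As a rough illustration of the triage described in item 2, the following Python sketch sorts raw number strings into the three groups listed there. It is only a sketch under stated assumptions: the function name <code>classify_number</code> is hypothetical, and treating any 8-digit value as an application number is an assumption that should be checked against the actual ptoproperties data before use.

```python
import re
from collections import Counter


def classify_number(raw: str) -> str:
    """Classify a raw number string from the properties data.

    Hypothetical helper. The length rules are assumptions:
    7 digits is treated as a patent number, 8 digits as an
    application number, and anything else as unknown.
    """
    value = raw.strip()
    if re.fullmatch(r"\d{7}", value):
        return "patent"
    if re.fullmatch(r"\d{8}", value):
        return "application"  # assumption: 8-digit values are application numbers
    return "unknown"


# The 11-digit values are the unmatched examples listed in item 2;
# the 7- and 8-digit values are made-up placeholders for illustration.
samples = ["1234567", "12345678", "20090108066", "20100007288", "20100110022"]
counts = Counter(classify_number(s) for s in samples)
print(dict(counts))
```

Rows classified as unknown can then be set aside for manual review rather than silently dropped, so that nothing is lost before the cause of the mismatch is understood.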