====Cleaning up patent data and inserting it into the existing patent data table====
=====For the table called "patents=====
INSERT statement: INSERT INTO patents_merged ( SELECT patents.patentnumber, patents.grantdate, patents.prioritydate, patents.prioritycountry, patents.prioritypatentnumber, patents.cpcsubgroup, patents.pctpatentnumber, patents.numberofclaims, patents.applicationnumber, NULL, patents.filingdate, NULL, NULL, NULL, NULL FROM patents ); Equivalent COPY statements: First copy data from RDP to a CSV file: \COPY (SELECT patents.patentnumber, patents.grantdate, patents.prioritydate, patents.prioritycountry, patents.prioritypatentnumber, patents.cpcsubgroup, patents.pctpatentnumber, patents.numberofclaims, patents.applicationnumber, NULL, patents.filingdate, NULL, NULL, NULL, NULL FROM patents) TO '/tmp/merged_patents_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; --COPY 1646225 Then copy CSV file into table: \COPY patents_merged FROM '/tmp/merged_patents_export.txt' DELIMITER AS E'\t' HEADER NULL AS CSV; -- RESULT : COPY 1646225 where "patents" is the name of the table containing the data from the RDP update and "patents_merged" is the name of the table we want to insert the new data into. Table that we want to insert in patent is called "patent" in the datbase "patent". So you would replace "patents_merge" with "patent" . The data in columns nber, uspc, and uspc_sub is taken from historicalpatentdata. Here is the insert statement belowSQL code to do that after the new data from the RDP has been inserted into the table "patent". UPDATE patent p SET nber = hp.nber, uspc = hp.uspc, uspc_sub = hp.uspc FROM historicalpatentdata hp WHERE hp.patentnumber = CAST(p.patent AS varchar); Missing information (represented by NULL):gyear (int)appear (int) Schema of patent data when loaded into RDP (based on existin documentation - have not checked schema on RDP postgres server): patentnumber | int | not null kind | varchar | grantdate | date | type | varchar | applicationnumber | varchar | filingdate | date | prioritydate | date | prioritycountry | varchar | prioritypatentnumber | varchar | ussubclass | varchar | maingroup | varchar | subgroup | varchar | cpcsubclass | varchar | cpcmaingroup | varchar | cpcsubgroup | varchar | classificationnationalcountry | varchar | classificationnationalclass | varchar | title | varchar | numberofclaims | int | primaryexaminerfirstname | varchar | primaryexaminerlastname | varchar | primaryexaminerdepartment | varchar | pctpatentnumber | varchar | filename | varchar | Schema of patent table in patent (called "patent"): patent | integer | not null | plain | | gdate | date | | plain | | prioritydate | date | | plain | | prioritycountry | character varying | | extended | | prioritypatentnumber | character varying | | extended | | cpcsubgroup | character varying | | extended | | pctpatentnumber | character varying | | extended | | claims | integer | | plain | | appnum | integer | | plain | | gyear | integer | | plain | | appdate | date | | plain | | appyear | integer | | plain | | nber | integer | | plain | | uspc | character varying | | extended | | uspc_sub | character varying | | extended | | Indexes: "patents_pkey" PRIMARY KEY, btree (patent) "patent_idx" UNIQUE, btree (patent)
===For the USPTO Assignment Data===