::*By now, we only focus on cleaning American patents.
:'''2. Tables'''::*ptoassigneend_allus :::This table contains all the U.S. patents extracted from ptoassigneend table. The rule to extract this table is 'country = 'UNITED STATES' ' or any of the features contains U.S. postcode. :::The SQL code is: CREATE TABLE ptoassigneend_allus AS SELECT * FROM ptoassigneend WHERE city ~* '.*\d{5}[-]\d{4}.*' OR addrline1 ~* '.*\d{5}[-]\d{4}.*' OR addrline2 ~* '.*\d{5}[-]\d{4}.*' OR postcode ~* '.*\d{5}[-]\d{4}.*' OR country ~* 'UNITED STATES'; SELECT 3572682 Table "public.ptoassigneend_allus" Column | Type | Modifiers reelno | integer | frameno | integer | name | character varying(500) | addrline1 | character varying(500) | addrline2 | character varying(500) | city | character varying(500) | state | character varying(500) | country | character varying(500) | postcode | character varying(80) | :::The ptoassigneend_allus table may miss some U.S. patents. ::*ptoassigneend_missus_final :::State and postcode information are extracted from addrline1, addrline2 and city columns and are stored in ptoassigneend_missus_final table. See section 3 and 4. :::This table is a subset of ptoassigneend_allus table. Table "public.ptoassigneend_missus_final" Column | Type | Modifiers reelno | integer | frameno | integer | name | character varying(500) | addrline1 | character varying(500) | addrline2 | character varying(500) | city | character varying(500) | state | character varying(500) | country | character varying(500) | postcode | character varying(80) | postcode_city | text | postcode_addr1 | text | postcode_addr2 | text | state_city | text | state_addr1 | text | state_addr2 | text | :::postcode_city is the postcode extracted from 'city'. postcode_addr1 is the postcode extracted from 'addrline1'. postcode_addr2 is the postcode extracted from 'addrline2'. :::state_city is the state name extracted from 'city'. state_addr1 is the state name extracted from 'addrline1'. state_addr2 is the state name extracted from 'addrline2'. ::*ptoassigneend_city :::City information are extracted from addrline1, addrline2 and city columns and are stored in ptoassigneend_city table. See section 5. :::This table is a subset of ptoassigneend_allus table. :'''3. Postcode(U.S.)'''
::U.S. post code follows the pattern [five digits - four digits]. In this way, U.S. patents can be extracted by searching for post code with regular expression "(^|\s)\d{5}-\d{4}($|\s)"
E:/McNair/Projects/PatentAddress/RxPostcode.sql
:'''43. State (U.S.)'''
:: There are some patterns that can be used to extract state information.
E:/McNair/Projects/PatentAddress/RxPostcode.sql
:'''54. City (U.S.)'''
:: There are some patterns that can be used to extract city information.
E:/McNair/Projects/PatentAddress/CityPatterns.sql
:'''5. Tables'''
::*ptoassigneend_allus
:::This table contains all the U.S. patents extracted from ptoassigneend table. The rule to extract this table is 'country = 'UNITED STATES' ' or any of the features contains U.S. postcode.
:::The SQL code is:
CREATE TABLE ptoassigneend_allus AS
SELECT * FROM ptoassigneend
WHERE city ~* '.*\d{5}[-]\d{4}.*' OR addrline1 ~* '.*\d{5}[-]\d{4}.*' OR
addrline2 ~* '.*\d{5}[-]\d{4}.*' OR postcode ~* '.*\d{5}[-]\d{4}.*' OR
country ~* 'UNITED STATES';
SELECT 3572682
Table "public.ptoassigneend_allus"
Column | Type | Modifiers
reelno | integer |
frameno | integer |
name | character varying(500) |
addrline1 | character varying(500) |
addrline2 | character varying(500) |
city | character varying(500) |
state | character varying(500) |
country | character varying(500) |
postcode | character varying(80) |
:::The ptoassigneend_allus table may miss some U.S. patents.
::*ptoassigneend_missus_final
:::State and postcode information are extracted from addrline1, addrline2 and city columns and are stored in ptoassigneend_missus_final table. See section 3 and 4.
:::This table is a subset of ptoassigneend_allus table.
Table "public.ptoassigneend_missus_final"
Column | Type | Modifiers
reelno | integer |
frameno | integer |
name | character varying(500) |
addrline1 | character varying(500) |
addrline2 | character varying(500) |
city | character varying(500) |
state | character varying(500) |
country | character varying(500) |
postcode | character varying(80) |
postcode_city | text |
postcode_addr1 | text |
postcode_addr2 | text |
state_city | text |
state_addr1 | text |
state_addr2 | text |
:::postcode_city is the postcode extracted from 'city'. postcode_addr1 is the postcode extracted from 'addrline1'. postcode_addr2 is the postcode extracted from 'addrline2'.
:::state_city is the state name extracted from 'city'. state_addr1 is the state name extracted from 'addrline1'. state_addr2 is the state name extracted from 'addrline2'.
::*ptoassigneend_city
:::City information are extracted from addrline1, addrline2 and city columns and are stored in ptoassigneend_city table. See section 5.
:::This table is a subset of ptoassigneend_allus table.
:'''6. Issues'''