Simplified code is in:
E:/McNair/Projects/PatentAddress/Functions.sql
=====5. Output (Tables)=====
*'''ptoassigneend_allus'''
This table contains all the U.S. patents extracted from ptoassigneend table. The rule to generate this table is 'country = 'UNITED STATES' ' or any of the features contains U.S. postcode.
Noise exists because of postcode errors.
The ptoassigneend_allus table may miss some U.S. patents which lose postcode records.
SQL code:
CREATE TABLE ptoassigneend_allus AS
SELECT * FROM ptoassigneend
WHERE city ~* '.*\d{5}[-]\d{4}.*' OR addrline1 ~* '.*\d{5}[-]\d{4}.*' OR
addrline2 ~* '.*\d{5}[-]\d{4}.*' OR postcode ~* '.*\d{5}[-]\d{4}.*' OR
country ~* 'UNITED STATES';
SELECT 3572682
Table "public.ptoassigneend_allus"
Column | Type | Modifiers
reelno | integer |
frameno | integer |
name | character varying(500) |
addrline1 | character varying(500) |
addrline2 | character varying(500) |
city | character varying(500) |
state | character varying(500) |
country | character varying(500) |
postcode | character varying(80) |
*'''ptoassigneend_us_extracted'''
Contain all the original features as well as city, state and postcode info extracted from features addrline1, addrline2 and city. See Section 2, 3, 4 for extraction process.
Table "public.ptoassigneend_us_extracted"
Column | Type | Modifiers
reelno | integer |
frameno | integer |
name | character varying(500) |
addrline1 | character varying(500) |
addrline2 | character varying(500) |
city | character varying(500) |
state | character varying(500) |
country | character varying(500) |
postcode | character varying(80) |
postqcode_city | text |
postcode_addr1 | text |
postcode_addr2 | text |
state_city | text |
state_addr1 | text |
state_addr2 | text |
city_addr1 | text |
city_addr2 | text |
city_city | text |
postcode_city is the postcode extracted from 'city'; postcode_addr1 is the postcode extracted from 'addrline1'; postcode_addr2 is the postcode extracted from 'addrline2'.
state_city is the state name extracted from 'city'; state_addr1 is the state name extracted from 'addrline1'; state_addr2 is the state name extracted from 'addrline2'.
city_city is the city name extracted from 'city'; city_addr1 is the city name extracted from 'addrline1'; city_addr2 is the city name extracted from 'addrline2'.
====Master Table====