Tiger Geocoder
This page serves as documentation for using the Tiger Geocoder on Postgres SQL, as part of the PostGIS extension. The following wiki pages may also be of use to you:
Tiger Geocoder | |
---|---|
Project Information | |
Project Title | Tiger Geocoder |
Owner | Peter Jalbert |
Start Date | Fall 2017 |
Deadline | |
Keywords | Tiger, Geocoder, Database |
Primary Billing | |
Notes | |
Has project status | |
Copyright © 2016 edegan.com. All Rights Reserved. |
PostGIS Installation Database Server Documentation
The official documentation for using and installing the Tiger Geocoder can be found in the following.
General Instructions Installation Instructions Geocoder Documentation
Location
The data is currently loaded into a psql database called geocoder. The tables contain the geocoding information, and there is a test table called "coffeeshops" that contains addresses of Houston coffeeshops according to yelp. To access the database, first login to the McNair DB Sever. Then,
psql geocoder
Installation
Install and Nation Data
I began by adding the extension listed above. First, enter into Postgres by using the psql command. Then:
--Add Extensions to database CREATE EXTENSION postgis; CREATE EXTENSION fuzzystrmatch; CREATE EXTENSION postgis_tiger_geocoder; CREATE EXTENSION address_standardizer;
You can test that the installation worked by running the following query:
SELECT na.address, na.streetname,na.streettypeabbrev, na.zip FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;
This should return the following:
address | streetname | streettypeabbrev | zip ---------+------------+------------------+------- 1 | Devonshire | Pl | 02109
Next, a new profile needs to be created by using the following command.
INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, loader, environ_set_command, county_process_command) SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep, loader, environ_set_command, county_process_command FROM tiger.loader_platform WHERE os = 'sh';
The installation instructions also provide the following note:
As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.
If you would like this feature, you can enable it by using the following command. This should be done before loading the script.
UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';
The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. If so, the declare_sect looks like the following:
export PGHOST=localhost + export PGUSER=postgres + export PGPASSWORD=yourpasswordhere + export PGDATABASE=geocoder + PSQL=${PGBIN}/psql + SHP2PGSQL=shp2pgsql + cd ${staging_fold} + TMPDIR="${staging_fold}/temp/" + UNZIPTOOL=unzip + WGETTOOL="/usr/bin/wget" + export PGBIN=/usr/lib/postgresql/9.6/bin + export PGPORT=5432 + export PGHOST=localhost + export PGUSER=postgres + export PGPASSWORD=yourpasswordhere + export PGDATABASE=geocoder + PSQL=${PGBIN}/psql + SHP2PGSQL=shp2pgsql + cd ${staging_fold}
Another option is to edit the sh file before running the script. We will do this option until further notice. Simply use your favorite command line editor to change the fields to their correct values. The downloaded script is located in the following directory:
/gisdata
There needs to be a directory called "temp" in the gisdata directory. To make the script, use the following from the command line:
psql -c "SELECT Loader_Generate_Nation_Script('test')" -d databasename -tA > /gisdata/nation_script_load.sh
This will create a script in the /gisdata directory. Change to that directory. If you did not edit the paths in the declare_sect table in psql, then you will need to edit this file to contain the correct paths.
Change directories:
cd /gisdata
Edit the script using your favorite command line text editor. Specifically, edit the following fields.
PGUSER=postgres PGPASSWORD=(Ask Anne for this password)!
Everything else remains the same.
Run the script by using: sh nation_script_load.sh
Now, there is a barebones table in the database that will hold the information for the nation. Next is to download data on each state.
State Data
The state scripts are generated in much the same way that the nation script was generated. Use the following command, substituting MA for your desired state abbreviation, and substituting a unique filename at the end.
psql -c "SELECT Loader_Generate_Script(ARRAY['MA'], 'test')" -d geocoder -tA > /gisdata/ma_load.sh
CURRENT PROGRESS: The following states have been downloaded into the geocoder database.
AL, AK, AZ, MA
Geocode Function
The official arguments for the function are the following:
setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);
The arguments of interest are address, where you simply submit a string, and max_results, which restricts the number of geocoding attempts per address. The geocoder makes multiple guesses to the location of an address, and returns the best guesses in order. If you want multiple guesses to a specific address, then specify max_results to be more than 1.
Single Address
An example query for a single address is:
SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, (addy).address As stno, (addy).streetname As street, (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip FROM geocode('75 State Street, Boston MA 02109', 1) As g;
rating -- This is an integer that determines the confidence in the resulting geocode. The closer to 0, the more confident the guess. ST_X(g.geomout) -- This retrieves the longitude coordinate of the point. ST_Y(g.geomout) -- This retrieves the latitude corodinate of the point. addy -- In general, addy is a normalized address resulting from the input address. (addy).address -- The number of the address (Ex: "75" Blabla rd.) (addy).streetname -- The name of the street (Ex: 75 "Blabla" rd.) (addy).streettypeabbrev -- The abbreviation of the street (Ex: 75 blabla "rd") (addy).location -- The city location of the address. (addy).stateabbrev -- The state abbreviation. (addy).zip -- The zipcode of the address.
The output of the query above would be:
rating | lon | lat | stno | street | styp | city | st | zip --------+-------------------+----------------+------+--------+------+--------+----+------- 0 | -71.0557505845646 | 42.35897920691 | 75 | State | St | Boston | MA | 02109