Tiger Geocoder

From edegan.com
Revision as of 16:13, 15 November 2017 by Peterjalbert (talk | contribs)
Jump to navigation Jump to search



This page serves as documentation for using the Tiger Geocoder on Postgres SQL, as part of the PostGIS extension. The following wiki pages may also be of use to you:

McNair Project
Tiger Geocoder
Tiger.jpg
Project Information
Project Title Tiger Geocoder
Owner Peter Jalbert
Start Date Fall 2017
Deadline
Keywords Tiger, Geocoder, Database
Primary Billing
Notes
Has project status Active
Copyright © 2016 edegan.com. All Rights Reserved.

PostGIS Installation Database Server Documentation

The official documentation for using and installing the Tiger Geocoder can be found in the following.

General Instructions Installation Instructions Geocoder Documentation

Location

The data is currently loaded into a psql database called geocoder. The tables contain the geocoding information, and there is a test table called "coffeeshops" that contains addresses of Houston coffeeshops according to yelp. To access the database, first login to the McNair DB Sever. Then,

psql geocoder


Installation

Install and Nation Data

I began by adding the extension listed above. First, enter into Postgres by using the psql command. Then:

--Add Extensions to database
CREATE EXTENSION postgis;
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION postgis_tiger_geocoder;
CREATE EXTENSION address_standardizer;

You can test that the installation worked by running the following query:

SELECT na.address, na.streetname,na.streettypeabbrev, na.zip
	FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;

This should return the following:

 address | streetname | streettypeabbrev |  zip
---------+------------+------------------+-------
	   1 | Devonshire | Pl               | 02109

Next, a new profile needs to be created by using the following command.

INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, 
		   loader, environ_set_command, county_process_command)
SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep,
	   loader, environ_set_command, county_process_command
  FROM tiger.loader_platform
  WHERE os = 'sh';

The installation instructions also provide the following note:

As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.

If you would like this feature, you can enable it by using the following command. This should be done before loading the script.

UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';

The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. If so, the declare_sect looks like the following:

export PGHOST=localhost                       +
export PGUSER=postgres                        +
export PGPASSWORD=yourpasswordhere            +
export PGDATABASE=geocoder                    +
PSQL=${PGBIN}/psql                            +
SHP2PGSQL=shp2pgsql                           +
cd ${staging_fold}                            +

TMPDIR="${staging_fold}/temp/"                +
UNZIPTOOL=unzip                               +
WGETTOOL="/usr/bin/wget"                      +
export PGBIN=/usr/lib/postgresql/9.6/bin      +
export PGPORT=5432                            +
export PGHOST=localhost                       +
export PGUSER=postgres                        +
export PGPASSWORD=yourpasswordhere            +
export PGDATABASE=geocoder                    +
PSQL=${PGBIN}/psql                            +
SHP2PGSQL=shp2pgsql                           +
cd ${staging_fold}

Another option is to edit the sh file before running the script. We will do this option until further notice. Simply use your favorite command line editor to change the fields to their correct values. The downloaded script is located in the following directory:

/gisdata

There needs to be a directory called "temp" in the gisdata directory. To make the script, use the following from the command line:

psql -c "SELECT Loader_Generate_Nation_Script('test')" -d databasename -tA > /gisdata/nation_script_load.sh

This will create a script in the /gisdata directory. Change to that directory. If you did not edit the paths in the declare_sect table in psql, then you will need to edit this file to contain the correct paths.

Change directories:

cd /gisdata

Edit the script using your favorite command line text editor. Specifically, edit the following fields.

PGUSER=postgres
PGPASSWORD=(Ask Anne for this password)!

Everything else remains the same.

Run the script by using: 
sh nation_script_load.sh

Now, there is a barebones table in the database that will hold the information for the nation. Next is to download data on each state.

State Data

The state scripts are generated in much the same way that the nation script was generated. Use the following command, substituting MA for your desired state abbreviation, and substituting a unique filename at the end.

psql -c "SELECT Loader_Generate_Script(ARRAY['MA'], 'test')" -d geocoder -tA > /gisdata/ma_load.sh

CURRENT PROGRESS: The following states have been downloaded into the geocoder database.

AL, AK, AZ, AR, CA, CO, CT, DE, FL, MA

Current Errors

The state scripts stopped working on 11/1/2017 while they were working on 10/31/2017. Now, when a retrieval script is run, it draws the error

HTTP request sent, awaiting response... 403 Forbidden

Possible thoughts:

Maybe our IP has been blacklisted for downloading data from a government website quickly? If so, PostGIS should really choose a different installation method. The current one is dumb.

Maybe the nation downloader script never worked properly. Not sure how to check if it is correct or not; seems right.

This is the only online forum I could find with others who have faced a similar issue.

Geocode Function

The official arguments for the function are the following:

setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);

The arguments of interest are address, where you simply submit a string, and max_results, which restricts the number of geocoding attempts per address. The geocoder makes multiple guesses to the location of an address, and returns the best guesses in order. If you want multiple guesses to a specific address, then specify max_results to be more than 1.

Single Address

An example query for a single address is:

SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,
    (addy).address As stno, (addy).streetname As street,
    (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip
    FROM geocode('75 State Street, Boston MA 02109', 1) As g;

rating -- This is an integer that determines the confidence in the resulting geocode. The closer to 0, the more confident the guess. ST_X(g.geomout) -- This retrieves the longitude coordinate of the point. ST_Y(g.geomout) -- This retrieves the latitude corodinate of the point. addy -- In general, addy is a normalized address resulting from the input address. (addy).address -- The number of the address (Ex: "75" Blabla rd.) (addy).streetname -- The name of the street (Ex: 75 "Blabla" rd.) (addy).streettypeabbrev -- The abbreviation of the street (Ex: 75 blabla "rd") (addy).location -- The city location of the address. (addy).stateabbrev -- The state abbreviation. (addy).zip -- The zipcode of the address.

The output of the query above would be:

 rating |        lon        |      lat       | stno | street | styp |  city  | st |  zip
--------+-------------------+----------------+------+--------+------+--------+----+-------
      0 | -71.0557505845646 | 42.35897920691 |   75 | State  | St   | Boston | MA | 02109

Current Errors

Currently, we are getting a 403 Forbidden Error when trying to download state data. We are in contact with the US Census Bureau. Their contact information can be found here.

The email exchanges are recorded below.



I am a student researcher at the McNair Center for Entrepreneurship and Innovation at Rice University, and I am in the process of installing a Postgres Extension that relies on the TIGER data.

When I began installing the extension, things were working fine. Now however, I am getting a 403 Forbidden error when the script tries to download the TIGER files. Do you have any idea why this might be happening?

The extension I'm trying to install is below:

http://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension

When I run any of the scripts that require data from TIGER, I am receiving the following error:


--2017-11-06 14:15:29-- http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_06065_featnames.zip Resolving www2.census.gov (www2.census.gov)... 104.84.241.90, 2600:1404:a:382::208c, 2600:1404:a:39c::208c Connecting to www2.census.gov (www2.census.gov)|104.84.241.90|:80... connected. HTTP request sent, awaiting response... 403 Forbidden 2017-11-06 14:15:29 ERROR 403: Forbidden.



Thank you for reaching out to us. The postgis docs is not from the Geography Division, so I cannot comment on that. A couple things that come to mind.

We just released our new 2017 Shapefiles, so it's possible the scripts may be written for a previous version of our Shapefiles. You may need to clean out your cookies, restart your browser, and then attempt to reinstall. Were you able to download our Shapefiles successfully?


The Shapefiles are part of the files being blocked by the 403 Forbidden error.

The script is using a wget protocol to bulk download the data, so there were no cookies involved. Also, the download had been working in previous days; it was only recently that the same scripts stopped working. I am worried that our IP address somehow ended up on a blacklist for the TIGER data. Is there a blacklist for addresses that access the TIGER data?

Our IP address is 128.42.44.181.


I have forwarded your question on to our IT folks. Since I work on the subject matter side, I am unable to answer your questions. Once I hear back from them, I will forward their response to you. Hopefully they will provide to you what you need in order to download our Shapefiles. My apologies for your inconvenience.