Difference between revisions of "Working with PostgreSQL"

From edegan.com
Jump to navigation Jump to search
Line 1: Line 1:
 
[[Category: McNair Admin]]
 
[[Category: McNair Admin]]
  
==Configuring a copy of Postgres on Windows==
 
  
Add PostgreSQL to the path if it isn't already:
 
Control Panel->System->Advanced->Environmental Variables
 
Add: C:\Program Files\PostgreSQL\9.0\bin
 
  
or from a cmd window (though sometimes this doesn't stick):
+
==PostgreSQL at the McNair Center==
SET PATH=%PATH%;C:\Program Files\PostgreSQL\9.0\bin;
 
  
To use PLPerl on windows, you will need to be careful to mix and match the right versions of PostgreSQL and Perl.  
+
We have a dedicated Postgres server available for use by McNair interns, affiliates, and researchers. It is available by SSH from inside the Rice network, including through the RDP.  
  
Perl version 5.10.1 (a 64bit build is available from [http://www.activestate.com/activeperl/downloads ActiveState]) works with PostgreSQL 9.0.1 (64bit build 1500). To see your PostgreSQL version type:
+
===Connecting to the dbase server===
  
psql -c "select version();" template1
+
To connect through the RDP get a copy of PuTTY (put PuTTY.exe on your desktop) from:
 +
E:/McNair/Installs
  
To see your perl version type:
+
Connect via SSH to:
 +
reseacher@128.42.44.181
 +
  or
 +
researcher@dbase.mcnaircenter.org
  
  perl -v
+
All of the data files (tab-delimited text) that need to be loaded in and out of a dbase for your project should be stored in:
 +
  Z:/Bulk/YourDbase
  
With these two versions together you should be able to add plperl to template1 (which all new dbs will inherit) with the command:
+
Note: To make your life easy, map the database's bulk drive on your RDP account.  [[Help:Access_RDP_Sever#Connecting_the_Database_Server|Follow the instructions]] to do this. We refer to the database server's bulk drive as either 181/bulk (where 181 is the last segment of its address) or as Z:, as this is the drive letter most commonly mapped to.
  
createlang plperl template1
+
==Working with psql==
  
There is a [http://www.postgresql.org/docs/7.3/static/reference-client.html list of commands/client applications], with links to documentation, which is useful.
+
After you have ssh'd onto the server change directory to your data directory:
 +
cd /bulk/yourdir
  
You will almost surely want to 'performance tune' your postgresql database, as the default settings are near useless. In particular edit postgresql.conf (which is in the data directory of your install) to change:
+
Then connect a database:
 +
psql DBName
  
shared_buffers = 512MB #Use about 10-15% of available RAM, to a max of 512Mb on windows
+
Note: to use a local copy of psql (if you have it installed locally), connect using the username researcher and DBname:
effective_cache_size = 3GB #Use half to 3/4 of available RAM, depending on your pref
+
  psql -h 128.42.44.181 -U researcher dbname
work_mem = 128MB # This is the memory allocated to each query sort
 
  maintenance_work_mem = 256MB #This is for vacuum, and a max of 256 is recommended.
 
  
Note that with ''work_mem'' this is the allocation to each sort. Each query you run may do many sorts and you may have many users, so this can explode quickly. 128Mb is an aggresive setting that assumes only a single user. See the [http://wiki.postgresql.org/wiki/Performance_Optimization various documentation resources], especially the [http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server official performance optimization page] for more information.
+
You MUST store all of your SQL commands in a file named yourfilename.sql that is stored in:
 +
E:/McNair/Projects/YourProject/
  
Restart the server (on windows use the 'services' control panel) for many changes to take effect:
+
There are NO EXCEPTIONS to this. All of your code must go into a .sql file. Even exploratory code. You can copy out of there line by line to run code.
  
pg_ctl restart
+
===Useful PostgreSQL commands===
  
Create a user using pgAdmin or the createuser command:
+
Useful commands are:
 
+
  \q            Quits
  createuser ed_egan
+
  \l            List all dbases available to your username
 
+
\i basics.sql  Run script basics.sql
And then create a database again using pgAdmin or the createdb command:
+
\dt            List tables
 
+
\d tablename  Shows the schema of the table
  createdb -O ed_egan DBName
+
  \r            Reset the query buffer
 
+
  Ctrl-c        Abort the current query
==Postgres.Haas==
+
q              Go back to the prompt when viewing a dataset
 
+
\COPY          Psql's version of copy (See below)
We have a new dedicated Postgres server available for use by PhD Students and Faculty only. Access to the server is available from within Haas or over the VPN.
 
 
 
To get an account see Ed Egan (in F533). Ed can either set you up with SSH access so that you can create an account on the database server and as many databases as you like, or he can create an account on the database server, and a blank database owner solely by your account, for you.
 
 
 
Once you have a username and password you and a blank database, can use ''psql'' on your local computer (available from [http://psql.sourceforge.net/]) to connect to the database server as detailed below. You will probably find that this gives you everything you need - you can load data, perform queries, and produce output datasets for analysis (for example in STATA).  
 
 
 
Without an SSH account you may not be able to use the backup and restore command-line tools. We are currently experimenting with ways to overcome this if necessary. Please ask for more information. If you are comfortable using SSH then all advanced functionality will be available, including the ability to mount your R drive using the commands:
 
 
 
  mkdir /mnt/username
 
  mount -t cifs //bear/username/ /mnt/ed -o user=username
 
 
 
Currently we are running Postgres version 8.1 on this server. An upgrade is being considered, but the core functionality is more than adequate for our purposes. One useful function that is missing is '''unnest''', which was introduced in version 8.4. However, this can be created manually with [http://wiki.postgresql.org/wiki/Array_Unnest details here].
 
  
==Dumping and Restoring a Database==
+
===Dumping and Restoring a Database===
  
 
This can be done in pgAdmin in Windows, but the commands (needed on Linux with SSH access) are:
 
This can be done in pgAdmin in Windows, but the commands (needed on Linux with SSH access) are:
Line 72: Line 61:
 
  pg_restore -d DBName db.backup
 
  pg_restore -d DBName db.backup
  
==Working with psql==
+
==SQL Commands==
 
 
To connect your psql client to a db type (for localhost don't specify the host):
 
psql -h host DBName
 
  
Therefore to connect to Postgres.Haas you will type:
+
===CREATE, DROP, \COPY===
psql -h postgres.haas.berkeley.edu DBName
 
 
 
The 'new' postgres server is currently on 128.32.252.201. Connect using your username and DBname (firstname_data by default):
 
psql -h 128.32.252.201 -U username dbname
 
 
 
 
 
Useful commands are:
 
\q            Quits
 
\i basics.sql  Run script basics.sql
 
\dt            List tables
 
\COPY         Psql's version of copy
 
\r            Reset the query buffer
 
Ctrl-c        Abort the current query
 
 
 
==SQL Commands==
 
  
 
There is a list of [http://www.postgresql.org/docs/7.3/static/sql-commands.html SQL commands] that may help.
 
There is a list of [http://www.postgresql.org/docs/7.3/static/sql-commands.html SQL commands] that may help.
Line 103: Line 74:
 
  DROP TABLE tablename;
 
  DROP TABLE tablename;
 
   
 
   
 +
CREATE TABLE tablename (
 +
  field1 varchar(100),
 +
  field2 int,
 +
  field3 date,
 +
  field4 real
 +
);
 +
 +
Functions can be written in Perl, Python and other languages. See below for more information.
 +
 
  CREATE FUNCTION getreal (text) RETURNS real AS $$
 
  CREATE FUNCTION getreal (text) RETURNS real AS $$
 
     if ($_[0]=~/^\d{1,}\.\d{0,}$/) { return $_[0]; }
 
     if ($_[0]=~/^\d{1,}\.\d{0,}$/) { return $_[0]; }
Line 110: Line 90:
 
  DROP Function correctyear(int,int);
 
  DROP Function correctyear(int,int);
  
Populate data with COPY, INSERT and UPDATE:
+
'''Do not do any of the following:'''
 +
 
 +
Populate data with COPY, INSERT and UPDATE:
 
   
 
   
INSERT INTO tablename VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
+
  INSERT INTO tablename VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
 
   
 
   
COPY tablename FROM '/home/user/weather.txt'; --http://www.postgresql.org/docs/8.4/interactive/sql-copy.html
+
  COPY tablename FROM '/home/user/weather.txt'; --http://www.postgresql.org/docs/8.4/interactive/sql-copy.html
 
   
 
   
 
   UPDATE tablename SET kind = 'Dramatic' WHERE kind = 'Drama';
 
   UPDATE tablename SET kind = 'Dramatic' WHERE kind = 'Drama';
Line 121: Line 103:
 
       (SELECT last_name, first_name FROM salesmen
 
       (SELECT last_name, first_name FROM salesmen
 
       WHERE salesmen.id = accounts.sales_id);
 
       WHERE salesmen.id = accounts.sales_id);
 +
 +
'''Instead, always build stack of tables using:'''
 +
 +
CREATE TABLE tablename AS
 +
  SELECT * FROM tablename WHERE ...;
 +
 +
DROP TABLE tablename;
 +
 +
Always load/unload data using the PostgreSQL specific copy function below. Always load tab-delimited data that is UTF-8 encoded, with PC or UNIX line endings, and that has a header row. NEVER DEVIATE FROM THIS.
 +
 +
Load using:
 +
\COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 +
 +
Unload (copy to txt file) using:
 +
\COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 +
 +
===SELECT===
  
 
Retrieve results with SELECT:
 
Retrieve results with SELECT:
Line 151: Line 150:
 
  SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'tablename';
 
  SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'tablename';
 
   
 
   
 +
===ALTER===
 +
 +
DON'T DO THIS. CREATE A NEW TABLE INSTEAD!
 +
 
Change a table with ALTER:  
 
Change a table with ALTER:  
 
  ALTER TABLE tablename ADD COLUMN colname real;
 
  ALTER TABLE tablename ADD COLUMN colname real;
Line 156: Line 159:
 
  ALTER TABLE tablename RENAME COLUMN product_no TO product_number;
 
  ALTER TABLE tablename RENAME COLUMN product_no TO product_number;
 
   
 
   
 +
===EXPLAIN===
 +
 
Find out how a query will be executed with EXPLAIN (a Postgre command):
 
Find out how a query will be executed with EXPLAIN (a Postgre command):
  
Line 179: Line 184:
 
     return b
 
     return b
 
   $$ LANGUAGE plpythonu;
 
   $$ LANGUAGE plpythonu;
 +
 +
==Configuring a copy of Postgres on Windows==
 +
 +
If you'd like to set up a copy of PostgreSQL on your windows laptop or desktop, the following instructions may be helpful.
 +
 +
===Adding PostgresSQL to the PATH===
 +
 +
Add PostgreSQL to the path if it isn't already:
 +
Control Panel->System->Advanced->Environmental Variables
 +
Add: C:\Program Files\PostgreSQL\9.0\bin
 +
 +
or from a cmd window (though sometimes this doesn't stick):
 +
SET PATH=%PATH%;C:\Program Files\PostgreSQL\9.0\bin;
 +
 +
===PostgreSQL and Perl===
 +
 +
To use PLPerl on windows, you will need to be careful to mix and match the right versions of PostgreSQL and Perl.
 +
 +
Perl version 5.10.1 (a 64bit build is available from [http://www.activestate.com/activeperl/downloads ActiveState]) works with PostgreSQL 9.0.1 (64bit build 1500). To see your PostgreSQL version type:
 +
 +
psql -c "select version();" template1
 +
 +
To see your perl version type:
 +
 +
perl -v
 +
 +
With these two versions together you should be able to add plperl to template1 (which all new dbs will inherit) with the command:
 +
 +
createlang plperl template1
 +
 +
There is a [http://www.postgresql.org/docs/7.3/static/reference-client.html list of commands/client applications], with links to documentation, which is useful.
 +
 +
===Basic Performance Tuning===
 +
 +
You will almost surely want to 'performance tune' your postgresql database, as the default settings are near useless. In particular edit postgresql.conf (which is in the data directory of your install) to change:
 +
 +
shared_buffers = 512MB #Use about 10-15% of available RAM, to a max of 512Mb on windows
 +
effective_cache_size = 3GB #Use half to 3/4 of available RAM, depending on your pref
 +
work_mem = 128MB # This is the memory allocated to each query sort
 +
maintenance_work_mem = 256MB #This is for vacuum, and a max of 256 is recommended.
 +
 +
Note that with ''work_mem'' this is the allocation to each sort. Each query you run may do many sorts and you may have many users, so this can explode quickly. 128Mb is an aggresive setting that assumes only a single user. See the [http://wiki.postgresql.org/wiki/Performance_Optimization various documentation resources], especially the [http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server official performance optimization page] for more information.
 +
 +
Restart the server (on windows use the 'services' control panel) for many changes to take effect:
 +
 +
pg_ctl restart
 +
 +
===Creating Users and Dbases===
 +
 +
Create a user using pgAdmin or the createuser command:
 +
 +
createuser ed_egan
 +
 +
And then create a database again using pgAdmin or the createdb command:
 +
 +
createdb -O ed_egan DBName
  
 
[[admin_classification::IT Build| ]]
 
[[admin_classification::IT Build| ]]

Revision as of 13:39, 20 March 2017



PostgreSQL at the McNair Center

We have a dedicated Postgres server available for use by McNair interns, affiliates, and researchers. It is available by SSH from inside the Rice network, including through the RDP.

Connecting to the dbase server

To connect through the RDP get a copy of PuTTY (put PuTTY.exe on your desktop) from:

E:/McNair/Installs

Connect via SSH to:

reseacher@128.42.44.181
 or
researcher@dbase.mcnaircenter.org

All of the data files (tab-delimited text) that need to be loaded in and out of a dbase for your project should be stored in:

Z:/Bulk/YourDbase

Note: To make your life easy, map the database's bulk drive on your RDP account. Follow the instructions to do this. We refer to the database server's bulk drive as either 181/bulk (where 181 is the last segment of its address) or as Z:, as this is the drive letter most commonly mapped to.

Working with psql

After you have ssh'd onto the server change directory to your data directory:

cd /bulk/yourdir

Then connect a database:

psql DBName

Note: to use a local copy of psql (if you have it installed locally), connect using the username researcher and DBname:

psql -h 128.42.44.181 -U researcher dbname

You MUST store all of your SQL commands in a file named yourfilename.sql that is stored in:

E:/McNair/Projects/YourProject/

There are NO EXCEPTIONS to this. All of your code must go into a .sql file. Even exploratory code. You can copy out of there line by line to run code.

Useful PostgreSQL commands

Useful commands are:

\q             Quits
\l             List all dbases available to your username
\i basics.sql  Run script basics.sql
\dt            List tables
\d tablename   Shows the schema of the table
\r             Reset the query buffer
Ctrl-c         Abort the current query
q              Go back to the prompt when viewing a dataset
\COPY          Psql's version of copy (See below)

Dumping and Restoring a Database

This can be done in pgAdmin in Windows, but the commands (needed on Linux with SSH access) are:

To dump a db:

pg_dump mydb > db.backup

To reload this database:

pg_restore -d DBName db.backup

SQL Commands

CREATE, DROP, \COPY

There is a list of SQL commands that may help.

Make/delete tables and functions with CREATE and DROP:

CREATE TABLE tablename AS
  SELECT * FROM tablename WHERE date_prod >= '2002-01-01';

DROP TABLE tablename;

CREATE TABLE tablename (
 field1 varchar(100),
 field2 int,
 field3 date,
 field4 real
);

Functions can be written in Perl, Python and other languages. See below for more information.

CREATE FUNCTION getreal (text) RETURNS real AS $$
    if ($_[0]=~/^\d{1,}\.\d{0,}$/) { return $_[0]; }
    return undef;
$$ LANGUAGE plperl;
 
DROP Function correctyear(int,int);

Do not do any of the following:

Populate data with COPY, INSERT and UPDATE:

 INSERT INTO tablename VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');

 COPY tablename FROM '/home/user/weather.txt'; --http://www.postgresql.org/docs/8.4/interactive/sql-copy.html

 UPDATE tablename SET kind = 'Dramatic' WHERE kind = 'Drama';
 
 UPDATE accounts SET (contact_last_name, contact_first_name) =
     (SELECT last_name, first_name FROM salesmen
     WHERE salesmen.id = accounts.sales_id);

Instead, always build stack of tables using:

CREATE TABLE tablename AS
  SELECT * FROM tablename WHERE ...;

DROP TABLE tablename;

Always load/unload data using the PostgreSQL specific copy function below. Always load tab-delimited data that is UTF-8 encoded, with PC or UNIX line endings, and that has a header row. NEVER DEVIATE FROM THIS.

Load using:
\COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS  CSV
Unload (copy to txt file) using:
\COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS  CSV

SELECT

Retrieve results with SELECT:

SELECT x.a y.a FROM x,y WHERE x.a=y.a

SELECT * FROM x LEFT OUTER JOIN y ON x.a=y.a;

SELECT * FROM
	(SELECT ) AS x
LEFT OUTER JOIN 
	(SELECT ) AS y
ON x.a=y.a;

SELECT COUNT (*) FROM ( ) AS Temp;

SELECT COUNT (*) FROM Acq;

SELECT x, min(y) FROM tablename GROUP BY x;

SELECT * FROM tablename LIMIT 10;

SELECT 
	CASE WHEN a > b THEN 1
	ELSE  2 
	END As Colname,
FROM tablename;

Get information on a table:

SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'tablename';

ALTER

DON'T DO THIS. CREATE A NEW TABLE INSTEAD!

Change a table with ALTER:

ALTER TABLE tablename ADD COLUMN colname real;

ALTER TABLE tablename RENAME COLUMN product_no TO product_number;

EXPLAIN

Find out how a query will be executed with EXPLAIN (a Postgre command):

EXPLAIN ANALYZE SELECT * FROM x;

Python Functions

Installing python in a database - if Python is not already installed

Source: https://www.postgresql.org/docs/9.4/static/plpython-funcs.html

To get into database via terminal:

 1) ssh researcher@128.42.44.181
 2) cd \folder_name
 3) psql database_name

Creating Functions:

CREATE FUNCTION pymax (a integer, b integer)
   RETURNS integer
AS $$
   if a > b:
     return a
   return b
 $$ LANGUAGE plpythonu;

Configuring a copy of Postgres on Windows

If you'd like to set up a copy of PostgreSQL on your windows laptop or desktop, the following instructions may be helpful.

Adding PostgresSQL to the PATH

Add PostgreSQL to the path if it isn't already:

Control Panel->System->Advanced->Environmental Variables
Add: C:\Program Files\PostgreSQL\9.0\bin

or from a cmd window (though sometimes this doesn't stick):

SET PATH=%PATH%;C:\Program Files\PostgreSQL\9.0\bin;

PostgreSQL and Perl

To use PLPerl on windows, you will need to be careful to mix and match the right versions of PostgreSQL and Perl.

Perl version 5.10.1 (a 64bit build is available from ActiveState) works with PostgreSQL 9.0.1 (64bit build 1500). To see your PostgreSQL version type:

psql -c "select version();" template1

To see your perl version type:

perl -v

With these two versions together you should be able to add plperl to template1 (which all new dbs will inherit) with the command:

createlang plperl template1

There is a list of commands/client applications, with links to documentation, which is useful.

Basic Performance Tuning

You will almost surely want to 'performance tune' your postgresql database, as the default settings are near useless. In particular edit postgresql.conf (which is in the data directory of your install) to change:

shared_buffers = 512MB	#Use about 10-15% of available RAM, to a max of 512Mb on windows
effective_cache_size = 3GB #Use half to 3/4 of available RAM, depending on your pref
work_mem = 128MB # This is the memory allocated to each query sort
maintenance_work_mem = 256MB #This is for vacuum, and a max of 256 is recommended.

Note that with work_mem this is the allocation to each sort. Each query you run may do many sorts and you may have many users, so this can explode quickly. 128Mb is an aggresive setting that assumes only a single user. See the various documentation resources, especially the official performance optimization page for more information.

Restart the server (on windows use the 'services' control panel) for many changes to take effect:

pg_ctl restart

Creating Users and Dbases

Create a user using pgAdmin or the createuser command:

createuser ed_egan

And then create a database again using pgAdmin or the createdb command:

createdb -O ed_egan DBName