Working with PostgreSQL
PostgreSQL on edegan.com
We have a dedicated Postgres server available for use by interns, affiliates, and researchers. It is available by SSH directly and through the RDP.
Connecting to the dbase server
To connect through the RDP get a copy of PuTTY (put PuTTY.exe on your desktop) from:
E:/installs
The preferred way to connect is from the RDP where you can stay inside the private network, and have gigabit connection speeds. Connect to:
researcher@192.168.2.92
It is possible to connect from other machines over the Internet. You shouldn't do this unless you have to but in this case connect via SSH to:
reseacher@199.188.177.215 or researcher@ssh.edegan.com
All of the data files (tab-delimited text) that need to be loaded in and out of a dbase for your project should be stored in:
/bulk/YourDbase
Note: To make your life easy, map the database's bulk drive on your RDP account. Follow the instructions to do this. We refer to the database server's bulk drive as either dbase/bulk or as Z:, as this is the drive letter most commonly mapped to.
Working with psql
After you have ssh'd onto the server change directory to your data directory:
cd /bulk/yourdir
Then connect a database:
psql DBName
Note: to use a local copy of psql (if you have it installed locally), connect using the username researcher and DBname:
psql -h dbase.edegan.com -U researcher dbname
You MUST store all of your SQL commands in a file named yourfilename.sql that is stored in:
E:/projects/YourProject/
There are NO EXCEPTIONS to this. All of your code must go into a .sql file. Even exploratory code. You can copy out of there line by line to run code.
Useful PostgreSQL commands
Useful commands are:
\q Quits \l List all dbases available to your username \i basics.sql Run script basics.sql \dt List tables \d tablename Shows the schema of the table \r Reset the query buffer Ctrl-c Abort the current query q Go back to the prompt when viewing a dataset \COPY Psql's version of copy (See below)
Dumping and Restoring a Database
This can be done in pgAdmin in Windows, but the commands (needed on Linux with SSH access) are:
To dump a db:
pg_dump mydb > db.backup
To reload this database:
pg_restore -d DBName db.backup
We typically use compression (Postgres' format custom), so the best command is:
pg_dump -Fc dbase > dbase_fc.dump pg_restore -Fc -d DBName db.backup
All backups are stored in /bulk/backups. You can drop a database after it has been dumped with:
dropdb dbase
To selectively restore a single table use the -t tablename option. See https://www.postgresql.org/docs/9.2/static/app-pgrestore.html for the other options.
SQL Commands
CREATE, DROP, \COPY
There is a list of SQL commands that may help.
Make/delete tables and functions with CREATE and DROP:
CREATE TABLE tablename AS SELECT * FROM tablename WHERE date_prod >= '2002-01-01'; DROP TABLE tablename; CREATE TABLE tablename ( field1 varchar(100), field2 int, field3 date, field4 real );
Functions can be written in Perl, Python and other languages. See below for more information.
CREATE FUNCTION getreal (text) RETURNS real AS $$ if ($_[0]=~/^\d{1,}\.\d{0,}$/) { return $_[0]; } return undef; $$ LANGUAGE plperl; DROP Function correctyear(int,int);
Do not do any of the following:
Populate data with COPY, INSERT and UPDATE: INSERT INTO tablename VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27'); COPY tablename FROM '/home/user/weather.txt'; --http://www.postgresql.org/docs/8.4/interactive/sql-copy.html UPDATE tablename SET kind = 'Dramatic' WHERE kind = 'Drama'; UPDATE accounts SET (contact_last_name, contact_first_name) = (SELECT last_name, first_name FROM salesmen WHERE salesmen.id = accounts.sales_id);
Instead, always build stack of tables using:
CREATE TABLE tablename AS SELECT * FROM tablename WHERE ...; DROP TABLE tablename;
Always load/unload data using the PostgreSQL specific copy function below. Always load tab-delimited data that is UTF-8 encoded, with PC or UNIX line endings, and that has a header row. NEVER DEVIATE FROM THIS.
Load using: \COPY tablename FROM 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
Unload (copy to txt file) using: \COPY tablename TO 'filename.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
SELECT
Retrieve results with SELECT:
SELECT x.a y.a FROM x,y WHERE x.a=y.a SELECT * FROM x LEFT OUTER JOIN y ON x.a=y.a; SELECT * FROM (SELECT ) AS x LEFT OUTER JOIN (SELECT ) AS y ON x.a=y.a; SELECT COUNT (*) FROM ( ) AS Temp; SELECT COUNT (*) FROM Acq; SELECT x, min(y) FROM tablename GROUP BY x; SELECT * FROM tablename LIMIT 10; SELECT CASE WHEN a > b THEN 1 ELSE 2 END As Colname, FROM tablename;
Get information on a table:
SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'tablename';
ALTER
DON'T DO THIS. CREATE A NEW TABLE INSTEAD!
Change a table with ALTER:
ALTER TABLE tablename ADD COLUMN colname real; ALTER TABLE tablename RENAME COLUMN product_no TO product_number;
EXPLAIN
Find out how a query will be executed with EXPLAIN (a Postgre command):
EXPLAIN ANALYZE SELECT * FROM x;
CREATE OR DROP INDEX
If the dbase is large or you just need things to run faster, add an index to your key fields.
CREATE UNIQUE INDEX title_idx ON films (title); CREATE INDEX title_idx ON films (title); DROP INDEX title_idx;
See https://www.postgresql.org/docs/9.5/static/sql-createindex.html for more options
SEQUENCES
If you want to create a sequence:
CREATE SEQUENCE serial START 101;
To use the sequence call:
nextval('serial');
Perl Functions
NOTE: Perl and Python Functions only work the the dbase server, not the RDP (where perl has a dependency error in plperl.dll and python has unknown issues).
PLPerl was installed into Template1 (and hence all new databases) when the server was first set up.
An example perl function is:
CREATE OR REPLACE FUNCTION getint (text) RETURNS int AS $$ if ($_[0]) { my $var=$_[0]; if ($var=~/^\d\d\d\d\d\d\d+$/) { return 1; } return undef; } return undef; $$ LANGUAGE plperl;
Python Functions
Installing python in a database - if Python is not already installed
Source: https://www.postgresql.org/docs/9.4/static/plpython-funcs.html
To get into database via terminal:
1) ssh researcher@ssh.edegan.com 2) cd \folder_name 3) psql database_name
Creating Functions:
CREATE FUNCTION pymax (a integer, b integer) RETURNS integer AS $$ if a > b: return a return b $$ LANGUAGE plpythonu;
Configuring a copy of Postgres on Windows
If you'd like to set up a copy of PostgreSQL on your windows laptop or desktop, the following instructions may be helpful.
Adding PostgresSQL to the PATH
Add PostgreSQL to the path if it isn't already:
Control Panel->System->Advanced->Environmental Variables Add: C:\Program Files\PostgreSQL\9.0\bin
or from a cmd window (though sometimes this doesn't stick):
SET PATH=%PATH%;C:\Program Files\PostgreSQL\9.0\bin;
PostgreSQL and Perl
To use PLPerl on windows, you will need to be careful to mix and match the right versions of PostgreSQL and Perl.
Perl version 5.10.1 (a 64bit build is available from ActiveState) works with PostgreSQL 9.0.1 (64bit build 1500). To see your PostgreSQL version type:
psql -c "select version();" template1
To see your perl version type:
perl -v
With these two versions together you should be able to add plperl to template1 (which all new dbs will inherit) with the command:
createlang plperl template1
There is a list of commands/client applications, with links to documentation, which is useful.
Basic Performance Tuning
Note that the dbase server at ssh.edegan.com does not use the settings below. Its configuration is much more aggressive.
You will almost surely want to 'performance tune' your postgresql database, as the default settings are near useless. In particular edit postgresql.conf (which is in the data directory of your install) to change:
shared_buffers = 512MB #Use about 10-15% of available RAM, to a max of 512Mb on windows effective_cache_size = 3GB #Use half to 3/4 of available RAM, depending on your pref work_mem = 128MB # This is the memory allocated to each query sort maintenance_work_mem = 256MB #This is for vacuum, and a max of 256 is recommended.
Note that with work_mem this is the allocation to each sort. Each query you run may do many sorts and you may have many users, so this can explode quickly. 128Mb is an aggresive setting that assumes only a single user. See the various documentation resources, especially the official performance optimization page for more information.
Restart the server (on windows use the 'services' control panel) for many changes to take effect:
pg_ctl restart
Creating Users and Dbases
Create a user using pgAdmin or the createuser command:
createuser username
And then create a database again using pgAdmin or the createdb command:
createdb -O username DBName