PostgreSQL Collation - part 2

by Szymon LipiƄski

Due to many questions about the PostgreSQL collation that were asked after my last short article about that, I decided to write some more about the current collation situation in PostgreSQL.

What The Collation Is

The collation setting is responsible for correct working functions such as

If we have text written in Chineese or Polish and we’ve got the Japaneese collation, we can be sure that sorting texts won’t look like we want it.

Types of Language Settings

Generally there are two types of languge settings in PostgreSQL. One can be set for each connection and the other only during the initdb and can’t be changed.

How to Set Collation For The Cluster

There is only one way for setting the cluster collation - during the initdb. Initdb is a program for creating the initial files, and the directory structure for the database cluster.

Once the collation is set, you can’t change it.

For setting the collation, you can initialize the cluster with the command:

initdb --locale=fr_CA

Of course you can use the initdb without any option, then the language settings will be taken from the current environment variables.

How to Set Language Settings for Client

For client connection there can be set language settings such as:

Setting the above variables won’t change anything for the current database or the whole cluster. You can’t change or improve that way how texts are sorted.

The above options can be changed for the whole database using:

ALTER DATABASE mydb SET LC_NUMERIC TO 'my value';

How to Check The Variables

For checking all the above variables you can connect to you database and write a query such as:

show LC_COLLATE;

A possible output is:

# show LC_COLLATE;  
lc_collate  
-------------  
en_US.UTF-8  
(1 row)

Other options can be checked the same way.

How To Reinitdb

That’s not so easy. There is no way to, so called, reinitdb. You can initdb and nothing more. To change those two language settings that can’t be changed normally you can do:

  1. Just make dump of all databases
  2. Stop the server.
  3. Delete the whole PostgreSQL data directory.
  4. Run initdb once again but with other language settings.
  5. Load the dump that has been made in the first point.

What next?

Fortunately in the next version 8.4 there will be some progress in playing with the languge settings. I wrote that before in PostgreSQL Collation.