PostgreSQL Collation - part 2
Due to many questions about the PostgreSQL collation that were asked after my last short article about that, I decided to write some more about the current collation situation in PostgreSQL.
What The Collation Is
The collation setting is responsible for correct working functions such as
- converting letter to upper
- converting letter to lower
- sorting texts
- ability to use indexes in
If we have text written in Chineese or Polish and we’ve got the Japaneese collation, we can be sure that sorting texts won’t look like we want it.
Types of Language Settings
Generally there are two types of languge settings in PostgreSQL. One can be set for each connection and the other only during the initdb and can’t be changed.
How to Set Collation For The Cluster
There is only one way for setting the cluster collation - during the initdb. Initdb is a program for creating the initial files, and the directory structure for the database cluster.
Once the collation is set, you can’t change it.
For setting the collation, you can initialize the cluster with the command:
Of course you can use the initdb without any option, then the language settings will be taken from the current environment variables.
How to Set Language Settings for Client
For client connection there can be set language settings such as:
LC_MESSAGES– language of messages
LC_MONETARY– formatting of currency amounts
LC_NUMERIC– formatting of numbers
LC_TIME– formatting of dates and times
Setting the above variables won’t change anything for the current database or the whole cluster. You can’t change or improve that way how texts are sorted.
The above options can be changed for the whole database using:
ALTER DATABASE mydb SET LC_NUMERIC TO 'my value';
How to Check The Variables
For checking all the above variables you can connect to you database and write a query such as:
A possible output is:
# show LC_COLLATE; lc_collate ------------- en_US.UTF-8 (1 row)
Other options can be checked the same way.
How To Reinitdb
That’s not so easy. There is no way to, so called, reinitdb. You can initdb and nothing more. To change those two language settings that can’t be changed normally you can do:
- Just make dump of all databases
- Stop the server.
- Delete the whole PostgreSQL data directory.
- Run initdb once again but with other language settings.
- Load the dump that has been made in the first point.
Fortunately in the next version 8.4 there will be some progress in playing with the languge settings. I wrote that before in PostgreSQL Collation.