PostgreSQL Locale & Encoding FAQ

Check out these frequently asked questions to learn how to create and manage your PostgreSQL locale and encoding.

What is Locale and character encoding?

A locale is an explicit model of the subset of a user's environment that depends on language and cultural conventions. This includes date and time formatting conventions, monetary conventions, decimal formatting conventions, and collation (sort) order.

An encoding, or character set, defines the mapping between human-readable characters and their binary representations. ASCII is the oldest and most well known character set - but has limited support for non-English characters. UTF-8 is one of the most versatile character sets and has become the default choice these days.

Check out these links for more information on locales and PostgreSQL locale support.

When should I change my locale and character encoding?

If you are in doubt, you probably shouldn't change these.

In general, you will not need to change your locale and character encoding. Almost all modern PostgreSQL clients will pick up your system's locale and behave accordingly - so if your system language is set to French, psql (and other clients) will automatically display messages in French and use French formatting conventions. They will also pick up a charset the client machine can parse.

However, here are a few examples when changing the locale and character set becomes necessary:

  1. You are working with custom/legacy clients that require a particular encoding (change client encoding).
  2. You wish to use a locale other than your client system's default (change client locale).
  3. Indexing does not improve the performance as expected.

How does locale affect indexing?

Indexes are ordered data sets - when PostgreSQL looks up an index, it has certain assumptions about where to find data. In other words, indexes work because they are stored in a sorted order. This collation (sort order) is defined by the locale setting.

So, if locale is set to English-only, PostgreSQL knows any key starting with 'b' will be found after 'a'. But, it doesn't know whether 'ಅ' comes before or after 'a'. As a result, if your data has characters that your locale doesn't know how to collate, your indexes may be inefficient.

❗️

Changing the encoding and locale of an existing database is not a simple task. You should be absolutely sure this is the problem before attempting to change the locale of your existing database.

What is the encoding and locale for a ScaleGrid PostgreSQL deployment?

ScaleGrid PostgreSQL deployments use UTF-8 as the default encoding on both client and server side.

The template1 database is UTF-8 encoded, and uses the en_US.UTF-8 locale. By default any databases you create will also inherit this encoding. However, you can use any locale and encoding of your choice for creating a database - please look at the FAQs for how to create database custom locale and encoding for further details.

Can I change the encoding for a ScaleGrid PostgreSQL deployment?

Easily change the client-side encoding when connecting to a ScaleGrid PostgreSQL deployment in the same way you change client-side encoding for any PostgreSQL deployment - there is no special requirements on ScaleGrid side. Please take a look at this troubleshooting FAQ if your client-side encoding or locale setting is not being respected.

At this point, we do not support changing the default client-side encoding property for the PostgreSQL deployment. In general a client should request the encoding it needs or default to the server-side encoding for the database, rather than the cluster default. However, if you do need to change this value, reach out to [email protected] and we will do our best to help you!

If you wish to change the encoding or locale on the server side, please take a look at this topic.

How can I create a database with custom server-side encoding and locale?

📘

PgBouncer enabled?

If you have PgBouncer enabled on your cluster, please see this topic instead.

If you created a database from the ScaleGrid UI, it will use “en_US.UTF-8” collation and ctype, along with UTF-8 encoding.

If you would like to use a custom value for any of these, please connect to your PostgreSQL instance and run the following command:

CREATE DATABASE <database-name> WITH TEMPLATE = template0 ENCODING = '<desired-encoding>' LC_COLLATE = '<desired-locale>' LC_CTYPE = '<desired-locale>' OWNER=”<desired-owner, role should already exist>”;

For example, the following command creates a database named ‘latinBritain’ using the English/Great Britain locale and Latin-9/ISO-8859-15 encoding, and sets the owner as the currently logged-in user:

CREATE DATABASE latinBritain WITH TEMPLATE = template0 ENCODING = 'LATIN9' LC_COLLATE = 'en_GB.iso885915' LC_CTYPE = 'en_GB.iso885915';

Please be aware that the locale and encoding you choose must be compatible with each other. Most locales have exactly one encoding they will work with, and vice versa. On a Linux system you can use the output of the locale command to find out the encoding and ctype supported by your particular locale.

Note - If you have PgBouncer enabled on your deployment, this new database will not be accessible through PgBouncer. Since all operations are routed via PgBouncer, this database will also be inaccessible to ScaleGrid processes and will cause failures in monitoring and other workflows. If you have PgBouncer enabled, please see here for additional steps you must take.

❗️

We do not support changing the encoding for the template1 or the Postgres admin databases at this point. Dropping or altering either of these databases will cause your cluster to go into an unmanaged state.

Creating a database with custom server-side encoding when PgBouncer is enabled.

🚧

When PgBouncer is enabled, you must have SSH access to your machines in order to create databases with custom encoding/locale.

If you are on a BYOC Plan, you can find details on how to access your machines via SSH on the ‘Machines’ Tab. If you are on Dedicated Hosting Plan, please contact [email protected] before proceeding any further.

Once you have SSH access, login to the master of your PostgreSQL deployment. If you have a standalone deployment, that machine is the master, otherwise refer to the ‘Machines’ Tab to know which machine is currently the master. From the shell of the master machine, please execute the following command to connect to your PostgreSQL instance:

psql -W -U <adminUserName> -p 5432 -h 127.0.0.1 -d postgres

Enter your password when prompted. You can find the admin user-name and password on the ScaleGrid UI. Once you are connected directly to Postgres, bypassing PgBouncer, execute the command to create a new database:

CREATE DATABASE <database-name> WITH TEMPLATE = template0 ENCODING = '<desired-encoding>' LC_COLLATE = '<desired-locale>' LC_CTYPE = '<desired-locale>' OWNER=”<desired-owner, role should already exist>”;

Once you have created the database, connect to it using the psql connect-command:

\c <database-name>

Now you must grant the PgBouncer authentication user access to this database, and create the PgBouncer auth query. This is required in order for connections through PgBouncer to go through.

-- Grant connect permissions to the PgBouncer authentication user
GRANT CONNECT ON DATABASE <database-name> TO pgbouncer;
--
-- Create the ScaleGrid Schema to hold the authentication function
CREATE SCHEMA scalegrid_pg;
--
-- GRANT pgbouncer auth user access to the schema
GRANT USAGE ON SCHEMA scalegrid_pg TO pgbouncer;
--
-- Create the function allowing PgBouncer to authenticate users
CREATE OR REPLACE FUNCTION scalegrid_pg.user_lookup(in i_username text, out uname text, out phash text)
RETURNS record AS $$
BEGIN
IF  session_user = 'pgbouncer' THEN
    SELECT usename, passwd FROM pg_catalog.pg_shadow
    WHERE usename = i_username INTO uname, phash;
    RETURN;
ELSE
    RAISE EXCEPTION 'Not an authorised user';
    END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- Reovke execute permissions for the user lookup function from all entities
REVOKE ALL ON FUNCTION scalegrid_pg.user_lookup(text) FROM PUBLIC

-- Grant execute only to the pgBouncer user
GRANT EXECUTE ON FUNCTION scalegrid_pg.user_lookup(text) TO pgbouncer

And now your new database is ready to use!

I want to change the encoding and/or locale of my existing database

This is not supported. Once a database has been created, it is no longer possible to modify its encoding or locale without risking data corruption, and as such, PostgreSQL disallows this operation.

We would recommend using client-side encoding and locale settings to get the desired results instead. If that does not resolve your problem, executing the following steps from a machine where pg_dump and psql should serve the purpose:

  1. Dump the existing data

You can use the pg_dump utility in order to dump data out of your existing database. To do this, use the following command:

pg_dump <your-connection-string> <your-database-name> > your_database.sql

You can find the PSQL connection string on your ScaleGrid Console - your connection string is the entire string except the psql command.

  1. Drop the existing database

You can do this using the DROP DATABASE command - however, if you have enough space on your machine, it is strongly recommended to rename the original and drop it only after verifying restore in to the new one was successful.

To rename a database, execute the command:

ALTER DATABASE your_database RENAME TO your_database_backup;

To drop a database, execute the command:

DROP DATABASE <your-database-name>
  1. Create a new database with your desired encoding.
  2. Restore the data in to this newly created database.

To restore the database, use the following command:

PGCLIENTENCODING=<YOUR_OLD_ENCODING> psql <your-connection-string> -f your_database.sql <your_database-name>

You can find the PSQL connection string on your ScaleGrid Console.

Once restore goes through you should verify data has been correctly imported and indexes are working as expected. Only when you are sure there is no data corruption, you should remove the old database as well as the dump file.

I want to import my database - but it has custom locale/encoding settings

🚧

If you have PgBouncer enabled on your ScaleGrid cluster, please verify you have SSH access to the underlying machines of your PostgreSQL deployment before proceeding.

When you perform an import from the ScaleGrid UI, the destination database is always created using the UTF-8 encoding and en_US.UTF-8 locale. If you need to import a database with custom encodings and locale, please execute the following steps from a machine where pg_dump and pg_restore are installed.

Before you start, please perform the following steps:

  1. Ensure pg_dump and pg_restore are installed and part of the system path on the machine you will execute import commands on. If you have PgBouncer enabled, you must execute these commands from the master of your ScaleGrid PostgreSQL deployment.

  2. Get your connection string from the ScaleGrid console. This is displayed when you click on the cluster - for an SSL-enabled cluster it will be of the form:

psql -W -U sgpostgres -p 5432 -h <hostname>.devservers.scalegrid.io -d "dbname=postgres sslmode=verify-ca sslrootcert=<Path to ca.pem file>".

The part after psql is referred to as “connection string” in the rest of the commands.

  1. If you have PgBouncer enabled, make the following modifications to your connection string:
  • Replace 6432 with 5432
  • Replace .devservers.scalegrid.io with 127.0.0.1.

First you need to dump your database from source, along with the command to create said database - this will ensure all your encoding and location settings are specified.

PGPASSWORD=<your-source-password> pg_dump -C -Fd --quote-all-identifiers --verbose --exclude-schema=scalegrid_pg --lock-wait-timeout=480000 --no-unlogged-table-data --serializable-deferrable  -d <your-database-name> -h <source-host> -p <source-port> -U <source-user> -f <dump-location>

The dump-location must not already exist (though any parent folders must exist already). Once the dump completes, make sure you do not have any database of the same name on the ScaleGrid cluster. If you do, please drop that database from the ScaleGrid UI, otherwise, the restore might fail.

Once you have dropped any database with conflicting names, please execute the
restore command as given below:

PGPASSWORD=<your-scalegrid-admin-password> pg_restore -c --if-exists -C --verbose --no-tablespaces <your-scalegrid-connection-string> <dump-location>

If you do not have PgBouncer enabled, your import is now over and you can connect to your database.

If you do have PgBouncer enabled, your newly imported database will not be accessible via PgBouncer (and as such becomes inaccessible to all ScaleGrid processes). You must login using the command psql <your connection string> and execute the following commands:

Don’t forget to change your database name below!

-- Connect to your newly imported database
\c <your-imported-database-name>
-- Grant connect permissions to the PgBouncer authentication user
GRANT CONNECT ON DATABASE <database-name> TO pgbouncer;
--
-- Create the ScaleGrid Schema to hold the authentication function
CREATE SCHEMA scalegrid_pg;
--
-- GRANT pgbouncer auth user access to the schema
GRANT USAGE ON SCHEMA scalegrid_pg TO pgbouncer;
--
-- Create the function allowing PgBouncer to authenticate users
CREATE OR REPLACE FUNCTION scalegrid_pg.user_lookup(in i_username text, out uname text, out phash text)
RETURNS record AS $$
BEGIN
IF  session_user = 'pgbouncer' THEN
    SELECT usename, passwd FROM pg_catalog.pg_shadow
    WHERE usename = i_username INTO uname, phash;
    RETURN;
ELSE
    RAISE EXCEPTION 'Not an authorised user';
    END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

Will scale/upgrade/restore/clone operations preserve my encoding and locale settings?

Yes. Once you have created a database with custom encoding following the steps listed here, any operation you trigger from the ScaleGrid UI will respect these settings!

My locale/encoding settings are not working!

There are typically a few reasons for this:

  1. psql, JDBC and most other official PostgreSQL clients default to the client-machine’s locale rather than the database’s server-side locale settings. Please refer to the client’s documentation on how to override this default.

  2. If after overriding the default you still do not see messages in your desired language, check the NLS Status Page for the PostgreSQL Project - your language might not have translations available yet.

  3. If your language does have translation available, it may indicate a problem with the locale support in the underlying machine. Please reach out to [email protected] and we can help you investigate further.

I created a database with custom encoding/locale and now I am not able to connect to it!

Do you have PgBouncer enabled? If you do, please follow the steps here. If you still face issues, do contact us at [email protected].