Import a PostgreSQL Database Using Command Line Tools
Once you've signed up for a fully managed PostgreSQL hosting plan, you can easily import your existing databases to your PostgreSQL deployment in ScaleGrid using the pg_dump command line utility.
pg_dump is a database backup program that performs logical backups, producing a set of SQL statements that can be executed to reproduce the original database object definitions and table data. It dumps one PostgreSQL database for backup or transfer to another SQL server.
When the destination is also a PostgreSQL cluster, we can also take advantage of more efficient dump formats that pg_dump supports. Using this approach we can dump and restore tables in parallel, making the operation much faster. However since the output is not plain-SQL, we need the pg_restore utility to load the data on to a ScaleGrid cluster. In this document we have detailed the steps to do this.
Prerequisites
Before you start import, you must have the pg_dump and pg_restore command line utilities installed on your system. You can refer to the PostgreSQL Wiki for installation instructions or use the EnterpriseDB installer.
To perform the import you will need:
-
Your source server name, port, and authentication.
-
Your ScaleGrid PostgreSQL Connection String - you can follow these instructions to obtain your connection string.
-
The machine on which you will execute the dump and restore commands must have connectivity to both the source and destination clusters. Refer to these instructions to configure the firewall on your ScaleGrid cluster.
PostgreSQL Versions
If the PostgreSQL version on your source is higher than the PostgreSQL version on your target ScaleGrid cluster, import via pg_restore will not work. If you are not using any feature that is incompatible with the PostgreSQL version on your target cluster, you can attempt an import using psql.
In this tutorial, we're going to show you how to import into an existing, newly created PostgreSQL deployment at ScaleGrid, using both pg_restore (more efficient, but source must be a PostgreSQL cluster with version <= that of the target) and psql (logical backup, can be against any source that supports SQL).
Import Roles
If you have admin access on the PostgreSQL cluster, you can use the pg_dumpall command line utility to import all the roles.
- Run pg_dumpall against the source cluster, to create a dump of your usernames and passwords:
PGPASSWORD=<your-source-password> pg_dumpall --roles-only -h <source-host> -p <source-port> -U <source-user> -f <dump-location>
The dump-location must be a filename which does not exist. You can specify a relative or absolute path, in which case all the parent directories must already exist.
- Once the dump completes, you can use the following command to restore all the roles in to your ScaleGrid target cluster
PGPASSWORD=<your-scalegrid-admin-password> psql <your-scalegrid-connection-string> -f <dump-location>
- Once the roles are created successfully, you are ready to go ahead and import the data, along with permissions.
Order of import operations
You must import or create the roles before you import databases. Otherwise the permissions will not be set correctly when you run pg_restore.
Importing without admin permissions
If you do not have admin permissions on the source cluster, we recommend you manually create the roles through the ScaleGrid UI, or PostgreSQL CLI, before attempting a data import. If you skip this step, you will see some non-fatal errors during the restore, and all the newly created data will be owned by the ScaleGrid admin user - sgpostgres.
Import Using pg_dump & pg_restore
Before you start, ensure pg_dump and pg_restore are installed and part of the system path on the machine you will execute import commands on.
Preserving Locale and Encoding Settings
If you follow the instructions here, the destination database will be created using the UTF-8 encoding and en_US.UTF-8 locale. If you need to import a database with custom encodings and locale, please use the import instructions on our Locale FAQ.
Repeat the following steps for each database you wish to import:
- Dump your existing database using the command:
PGPASSWORD=<your-source-password> pg_dump -Fd --quote-all-identifiers --verbose --exclude-schema=scalegrid_pg --lock-wait-timeout=480000 --no-unlogged-table-data --serializable-deferrable --jobs=1 -d <your-database-name> -h <source-host> -p <source-port> -U <source-user> -f <dump-location>
Notes:
-
You can change the value of the
--jobs
parameter to control how many parallel connections are opened to your database. -
The dump-location must not already exist (though any parent folders must exist already).
-
The
scalegrid_pg
schema contains functions and objects that are necessary for ScaleGrid to administer your PostgreSQL Cluster. To avoid naming conflicts, we do not support importing this schema.
Importing in to the 'postgres' database
If you wish to import in to the 'postgres' database, please skip the next step. Do NOT drop the 'postgres' database - ScaleGrid can no longer manage your cluster if you do this.
- Once the dump completes, create a new database on the ScaleGrid cluster using the following commands. Please note if a database already exists, we recommend dropping it first and starting afresh.
PGPASSWORD=<your-scalegrid-admin-password> psql <your-scalegrid-connection-string> -c 'DROP DATABASE IF EXISTS <dbname>';
PGPASSWORD=<your-scalegrid-admin-password> psql <your-scalegrid-connection-string> -c 'CREATE DATABASE <dbname>';
Using the --create options for pg_dump/pg_restore
For your ScaleGrid-hosted PostgreSQL cluster to work correctly, all new databases must be created using template1, rather than template0. The --create option uses template0, hence we recommend you avoid using it. If you still wish to use it, please follow the instructions in this document for PgBouncer enabled clusters.
- Once you have created the database you want to import into, execute the
restore command as given below:
PGPASSWORD=<your-scalegrid-admin-password> pg_restore --verbose --no-tablespaces <your-scalegrid-connection-string-WITH-database-name-changed-to-your-newly-created-database> <dump-location>
Your import is now over and you can connect to your database.
Import Using SQL-Based Logical Backups
Importing using SQL-based backups is useful when the source cluster is not a PostgreSQL server, or the source version is higher than the target version. If you do choose to use this method, please pay careful attention to the error messages displayed during the restore process to determined if the import was successful.
We also strongly recommend doing this only on dev/test clusters, and running data sanity checks on it, before working with a production cluster.
The steps to do a SQL-based logical backup are:
- Generate a SQL-based dump on the source cluster, using appropriate tools. If the source cluster is PostgreSQL, pg_dump can be used with the following command:
PGPASSWORD=<your-source-password> pg_dump -Fp --quote-all-identifiers --verbose --exclude-schema=scalegrid_pg --lock-wait-timeout=480000 --no-unlogged-table-data --serializable-deferrable --jobs=1 -d <your-database-name> -h <source-host> -p <source-port> -U <source-user> -f <dump-location>
If the source and target has different PostgreSQL versions, you must use the pgdump utility from the _higher version.
- Once the dump completes, create a new database on the ScaleGrid cluster using the following commands. Please note if a database already exists, we recommend dropping it first and starting afresh.
PGPASSWORD=<your-scalegrid-admin-password> psql <your-scalegrid-connection-string> -c 'DROP DATABASE IF EXISTS <dbname>';
PGPASSWORD=<your-scalegrid-admin-password> psql <your-scalegrid-connection-string> -c 'CREATE DATABASE <dbname>';
- Now you can use psql to run the SQL queries to recreate the data on source.
PGPASSWORD=<your-scalegrid-admin-password> psql <your-scalegrid-connection-string-WITH-database-name-changed-to-your-newly-created-database> -f <dump-location>
Updated about 2 years ago