Bulk loading data into PGD clusters
Suggest editsBulk loading data into PGD clusters
This guidance is specifically for environments where there's no direct access to the PGD nodes, only PGD Proxy endpoints, such as Cloud Service's distributed high availability deployments of PGD.
Without using care, bulk loading data into a PGD cluster can cause a lot of replication load on a cluster. With that in mind, this content describes a process to mitigate that replication load.
Provision or prepare a PGD cluster
You must provision a PGD cluster, either manually, using TPA, or on Cloud Service. This will be the target database for the migration. Ensure that you provision it with sufficient storage capacity to hold the migrated data.
We recommend that, when provisioning or, if needed, after provisioning, you set the following Postgres GUC variables.
GUC variable | Setting |
---|---|
maintenance_work_mem | 1GB |
wal_sender_timeout | 60min |
wal_receiver_timeout | 60min |
max_wal_size | Set to either: • A multiple (2 or 3) of your largest table or • More than one third of the capacity of your dedicated WAL disk (if configured) |
Make note of the target's proxy hostname (target-proxy) and port (target-port). You also need a user (target-user) and password (target-password) for the target cluster.
The following instructions give examples for a cluster named ab-cluster
with an ab-group
subgroup and three nodes: ab-node-1
, ab-node-2
, and ab-node3
. The cluster is accessed through a host named ab-proxy
(the target-proxy).
On Cloud Service, a cluster is configured, by default, with an edb_admin
user (the target-user) that can be used for the bulk upload.
The target-password for the target-user is available from the Cloud Service dashboard for the cluster.
A database named bdrdb
(the target-dbname) was also created.
Identify your data source
You need the source hostname (source-host), port (source-port), database name (source-dbname), user, and password for your source database.
Also, you currently need a list of tables in the database that you want to migrate to the target database.
Prepare a bastion server
Create a virtual machine with your preferred operating system in the cloud to orchestrate your bulk loading.
- Use your EDB account.
- Obtain your EDB repository token from the EDB Repos 2.0 page.
- Set environment variables.
- Set the
EDB_SUBSCRIPTION_TOKEN
environment variable to the repository token.
- Set the
- Configure the repositories.
- Run the automated installer to install the repositories.
- Install the required software.
- Install and configure:
- psql
- PGD CLI
- Migration Toolkit
- LiveCompare
- Install and configure:
Use your EDB account
Go to the EDB Repos 2.0 page and log in with your EDB account. Make a note of the repository token that you will use to configure the repositories on the bastion server.
Set environment variables
Set the EDB_SUBSCRIPTION_TOKEN
environment variable to the repository token you obtained from the EDB Repos 2.0 page.
Configure repositories
The required software is available from the EDB repositories. You need to install the EDB repositories on your bastion server.
- Red Hat
- Ubuntu/Debian
Install the required software
Once the repositories are configured, you can install the required software.
Installing psql
and pg_dump
/pg_restore
/pg_dumpall
The psql command is the interactive terminal for working with PostgreSQL. It's a client application and can be installed on any operating system. Packaged with psql are pg_dump and pg_restore, command-line utilities for dumping and restoring PostgreSQL databases.
- Ubuntu
- Red Hat
To simplify logging in to the databases, create a .pgpass file for both your source and target servers:
Create the file in your home directory and change its permissions to read/write only for the owner.
Ensure that your passwords are appropriately escaped in the .pgpass
file. If an entry needs to contain : or \, escape this character with \.
Installing PGD CLI
PGD CLI is a command-line interface for managing and monitoring PGD clusters. It's a Go application and can be installed on any operating system.
- Ubuntu
- Red Hat
Create a configuration file for the PGD CLI:
For the example ab-cluster
: