PostgreSQL/BackupAndRecovery

PostgreSQL
BackupAndRecovery

Overview

Creating backups is an essential task for every database administrator. If the hardware crashes or any form of data corruption occurs, the DBA must ensure that a database can be restored with minimal data loss. PostgreSQL offers multiple strategies to support the DBA in achieving this goal.

In principle, backup technology can be divided into two classes: cold backups and hot backups. A cold backup is a backup taken when no database file is open. In the case of PostgreSQL this means that the instance must be stopped during the complete time interval of taking the backup. A hot backup is a backup taken during normal working hours. Clients can perform read and write actions in parallel to this form of backup creation.

PostgreSQL supports different types of backups:

Cold backups are called File System Level Backup.
There are two types of hot backups

SQL Dump produces SQL commands, e.g.: INSERT, which can re-create the database.
Continuous Archiving and Point-in-Time Recovery (PITR) uses the combination of a special backup plus all data-changes since then.

File System Level Backup

A cold backup is a backup taken when the PostgreSQL instance is not running. It consists of all files of all databases of a cluster.

There is only one way to create a consistent and therefore useful cold backup: the PostgreSQL instance must be stopped, e.g. by issuing the pg_ctl stop command. This will disconnect all clients from all databases of the cluster, shut down the instance, and close all files. After that, the backup can be taken by using one of the usual operating system copy-utilities (cp, tar, dd, rsync, etc.) to create a copy of all files at a secure location, e.g. at disks on a different server. Especially the following files must be copied:

All files under the directory node where the cluster resides. The environment variable $PGDATA points to this directory and resolves to something like .../postgres/14/data. Use echo $PGDATA on the command-line, or show data_directory; in psql to find the directory.
All configuration files. They may be in $PGDATA, but can also be located elsewhere. The main configuration files are: postgresql.conf, pg_hba.conf, and pg_ident.conf. Their locations can be found by running the following commands from the psql utility:

    show config_file;
    show hba_file;
    show ident_file;

All tablespace files. These files are located elsewhere on the file-system. Their locations can be found by looking at the symlinks in the $PGDATA/pg_tblspc directory:

    cd $PGDATA/pg_tblspc
    ls -lt

Caution One may try to backup only special parts of a cluster, eg. a huge file that represents a table, or one of the tablespaces - or the opposite: everything except the huge file. Even if the instance is shut down during the generation of such a partial copy, copies of this kind are useless. The recovery of a cold backup needs really all data files and meta-information files of the cluster to re-create the cluster.

Caution It is strongly recommended to verify every backup/recovery strategy on a test system to verify their reliability before implementing them on a production server. In particular, it's necessary to test the recovery steps!

► Advantages

A cold backup is easy to generate and restore.

► Disadvantages

A continuous 7x24 operation mode of any of the databases in the cluster is not possible.
It is not possible to backup smaller parts of a cluster like a single database or table.
Partial restores are not possible. Restores must include all cluster files.
After a crash, any data changes that occur after the most recent cold backup get lost. Only the data in the backup will be restored.

► How to Recover

Stop the instance.
Backup the original files of the crashed cluster. They may be useful for forensic actions.
Delete all original files of the crashed cluster.
Copy the files of the cold backup to their original places.
Start the instance. It should start in the normal way, without any special message.

Hot Backup

In contrast to cold backups, hot backups are taken while the instance is running and applications may change data during the backup is taken. Hot backups are sometimes called online backups. PostgreSQL supports two very different kinds of hot backups: First, a pure SQL-based version, and second, a product-specific version. They are explained in the next two chapters.

SQL Dump (or: Logical Backup)

A logical backup is one of the two forms of a hot backup. It consists of data and/or metadata within the cluster, a single database, or some parts of a database. They are created by the utilities pg_dump or pg_dumpall.

The instance must run for those utilities to operate. Even though they run in parallel with other clients - possibly over a longer period of time -, they create an exact copy of the data as of the moment of their start time. For example, if an application changes some data during this period, the backup takes the old value whereas all other applications operate on the new value. This is possible because of PostgreSQL's MVCC (Multi-version concurrency control) implementation which allows the existence of multiple versions of a row at the same time.

pg_dump

pg_dump works at the database level and can backup the complete database as well as some of its parts such as individual tables. It is able to dump data, schema definitions, or both. The parameters --data-only and --schema-only select the intended part.

pg_dump supports two output formats: plain (readable plain-text format) and custom (a binary format). The format type is chosen by the parameter --format. The plain-text format contains SQL commands like CREATE and INSERT. Files created in this format may be used by psql to restore the backed-up data. The custom format is sometimes called the archive format. To restore files created in this format you must use pg_restore.

The following diagram visualizes the cooperation of pg_dump, psql and pg_restore.

Some Examples:

$ # dump complete database 'finance' in plain-text format to a file
$ pg_dump --dbname=finance --username=boss --format=plain --file=finance.sql
$
$ # restore database content (to a different or an empty database)
$ psql --dbname=finance_x --username=boss <finance.sql
$
$
$
$ # dump table 'person' of database 'finance' in binary format to a file
$ pg_dump --dbname=finance --username=boss --table=person  \
          --format=custom --file=finance_person.archive
$
$ restore table 'person' from binary file
$ pg_restore --dbname=finance_x --username=boss        \
             --format=custom <finance_person.archive
$

pg_dumpall

The pg_dumpall utility works at the cluster level and calls pg_dump internally to dump each database of the cluster. Additionally, it dumps cluster level objects ('globals') like user/roles and their rights. If it is started without detailed parameters, it dumps the complete content of the cluster: all data and metadata of all databases plus all cluster level objects. The parameter --globals-only can be used to restrict its behavior to dump cluster objects only. pg_dumpall output is in plain-text format.

► Advantages

Continuous 7x24 operation mode is possible.
Small parts of the cluster or database may be backup-ed or restored.
When you use the text format, you can switch from one PostgreSQL version to another or from one hardware platform to another.

► Disadvantages

The text format uses much space, but it compresses well.

► How to Recover
As shown in the above diagram, the recovery process depends on the format of the dump. Text files are in standard SQL syntax. To recreate data from such files you must use psql. Files with the custom format have a PostgreSQL-specific binary structure and can only be used by the utility pg_restore.

Continuous Archiving and Point-in-Time Recovery (PITR)

This is the second form of hot backups. Such backups consist of two parts. The first one is the so-called base backup, which consists of a copy of all files of a cluster (similar to File System Level Backup). The second one consists of all data-changes since the start of the backup command. Such data-changes keep occurring with further online activities (during and after the backup generation), are stored in WAL files, and must be continuously saved ('archived') in the same way as the first part.

To understand the purpose and the technique of such backups, it's helpful to know PostgreSQL's recover-from-crash strategy. At all times and independent from any backup/recovery action, PostgreSQL maintains Write Ahead Log (WAL) files - primarily for crash-safety purposes. Such WAL files contain log records, which reflect all changes made to the data and the schema. Prior to transfers of changes to data files, log records are stored in (sequentially written) WAL files. In the case of a system crash, those log records are used to recover the cluster to a consistent state during the restart of the instance. The recovery process searches the timestamp of the last checkpoint, which is stored in the WAL files, and replays all subsequent log records in chronological order against the cluster. Through that action, the cluster gets recovered to a consistent state and contains all changes up to the last COMMIT.

When recovering from a backup, the overall strategy is similar to the recover-from-crash strategy: remove the files of the crashed cluster, restore them from the base backup, inform the recovery process (which is an integral part of the instance) how to access the archived WAL files via an operating system command, and restart the instance. The recovery part of the instance replays all log records from the archived WAL files against the (restored) database files and transfers the cluster to a consistent state. Thereafter the cluster contains all changes up to the last COMMIT before the crash.

To implement this backup strategy, three actions must be taken:

Define all necessary parameters in postgres.conf.
Generate a base backup with the utility pg_basebackup.
Archive all arising WAL files.

If a recovery becomes necessary, you have to delete all files in the cluster, recreate the old state of the cluster by copying the backup to its original location, create a special file (recovery.signal or recovery.conf, see below: step 3) with some recovery-information (especially to what location WAL files have been archived) and restart the instance. The instance will recreate the cluster according to its parameters in postgres.conf and recovery.conf to a consistent state including all data changes up to the last COMMIT.

► Advantages

Continuous 7x24 operation mode is possible.
Recover with minimal data loss.
The generated WAL files can be used for additional features like replication.

► Disadvantages

Base backups work only on the cluster level, not on any finer granularity like database or table.
If your database is very busy and clients change a lot of data, many WAL files may arise.

How to Take the Backup

Step 1
You have to define some parameters in postgres.conf so that WAL files contain enough data, archiving of WAL files is activated, and a copy command is defined to transfer WAL files to a fail-safe location.

# collect enough information in WAL files
wal_level = 'replica'
# activate ARCHIVE mode so that WAL files will be archived by the instance
archive_mode = on
# supply a system command to transfer WAL files to a failsafe location (cp, scp, rsync, ...)
# %p represents the pathname including filename. %f represents the filename only.
archive_command = 'scp %p dba@archive_server:/postgres/wal_archive/%f'

After the parameters are defined, you must restart the cluster: pg_ctl restart. The cluster will continuously generate WAL files in its subdirectory pg_wal (pg_xlog in Postgres version 9.x and older) in concordance with data changes in the database. When it has filled a WAL file and must switch to the next one, it will copy the old one to the defined archive location.

Step 2
You must create the so-called base backup with the utility bg_basebackup.

$ # take a copy (base backup) of the files of the cluster with the pg_basebackup utility
$ pg_basebackup --pgdata=/safe_drive/backup/
$

Step 3
That's all. All other activities are taken by the instance, especially the continuous copy of completely filled WAL files to the archive location.

How to Recover

To perform a recovery the original base backup is copied back and the instance is configured to perform recovery during its start.

Stop the instance - if it is still running.
Create a copy of the crashed cluster - if you have enough disc space. Maybe, you will need it at a later stage.
Delete all files of the crashed cluster.
Recreate the cluster files from the base backup.
Create a special file in $PGDATA:

PostgreSQL prior to version 12: Create a file recovery.conf in $PGDATA. It must contain a command similar to: restore_command = 'scp dba@archive_server:/postgres/wal_archive/%f %p'. This copy command is the reverse of the command in postgres.conf, which saved the WAL files to the archive location.
PostgreSQL since version 12: Create an empty file recovery.signal in $PGDATA. Add a command similar to: restore_command = 'scp dba@archive_server:/postgres/wal_archive/%f %p' within postgres.conf. This copy command is the reverse of the command in postgres.conf, which saved the WAL files to the archive location.

Start the instance. During startup, the instance will copy and process all WAL files found in the archive location.

The fact, that recovery.signal respective recovery.conf exists, signals the instance to perform a recovery. After a successful recovery, this file is renamed.

If you want to recover to some previous point in time prior to the occurrence of the crash (but behind the creation of the backup), you can do so by specifying this point in time. In this case, the recovery process will stop before processing all archived WAL files. This feature is the origin of the term Point-In-Time-Recovery.

In summary the two crucial commands for recovery (in recovery.conf resp. postgres.conf) may look like this:

restore_command      = 'scp dba@archive_server:/postgres/wal_archive/%f %p'
recovery_target_time = '2021-01-31 06:00:00 CET'

Additional Tools

There is an open-source project Barman, which simplifies backup and recovery steps. If you have to manage a lot of servers and instances and it becomes complicated to configure and remember all the details about the server landscape, Barman stores the configuration details and automates processes.

External link

PostgreSQL documentation: Backup

Configuration

PostgreSQL
BackupAndRecovery

Architecture