To promote a consistent use and understanding of important terms we list and define them here. For some terms we include short annotations to give a first introduction to the subject.

Database Cluster

edit

Overview

edit


 


Server (or Node)

edit

A server is some (real or virtual) hardware where PostgreSQL is installed. In this document, the word instance is a different concept from server. See the definition of instance later in this document.

Cluster of Nodes

edit

A set of nodes, which interchange information via replication.

Installation

edit

After you have downloaded and installed PostgreSQL, you have a set of programs, scripts, configuration- and other files on a server. This set is called the 'Installation'. It includes all instance programs as well as some client programs like psql.

Server Database

edit

The term server database is often used in the context of client/server connections to refer to an instance or a single database.

Cluster (or 'Database Cluster')

edit

A cluster is a storage area (directory, subdirectories and files) in the file system, where a collection of databases plus meta-information resides. Within the database cluster there are also the definitions of global objects like users and their rights. They are known across the entire database cluster. (Access rights for an user may be limited to individual objects like a certain table or a certain schema. In that case, the user will not have this access rights to the other objects of the cluster.)

Within a database cluster there are at least three databases: 'template0', 'template1', 'postgres' and possibly more.

  • 'template0': A template database, which may be used by the command CREATE DATABASE (template0 should never be modified)
  • 'template1': A template database, which may be used by the command CREATE DATABASE (template1 may be modified by DBA)
  • 'postgres': An empty database, mainly for maintenance purposes

Most PostgreSQL installations use only one database cluster. Its name is 'main'. But you can create more clusters on the same PostgreSQL installation, see tools initdb further down.

Instance (or 'Database Server Instance' or 'Database Server' or 'Backend')

edit

An instance is a group of processes (on a UNIX server) or one service (on a Windows server) plus shared memory, which controls and manages exactly one cluster. Using IP terminology one can say that one instance occupies one IP/port combination, eg. the combination http://localhost:5432. It is possible that on a different port of the same server another instance is running. The processes (in a UNIX server), which build an instance, are called: postmaster (creates one 'postgres'-process per client-connection), logger, checkpointer, background writer, WAL writer, autovacuum launcher, archiver, stats collector. The role of each process is explained in the chapter architecture.

If you have many clusters on your server, you can run many instances at the same machine - one per cluster.

Hint: Other publications sometimes use the term server to refer to an instance. As the term server is widely used to refer to real or virtual hardware, we do not use server as a synonym for instance.

Database

edit

A database is a storage area in the file system, where a collection of objects is stored in files. The objects consist of data, metadata (table definitions, data types, constraints, views, ...) and other data like indices. Those objects are stored in the default database 'postgres' or in a newly created database.

The storage area for one database is organized as one subdirectory tree within the storage area of the database cluster. Thus a database cluster may contain multiple databases.

In a newly created database cluster (see below: initdb) there is an empty database with the name 'postgres'. In most cases this database stays empty and application data is stored in separate databases like 'finance' or 'engineering'. Nevertheless 'postgres' should not be dropped because some tools try to store temporary data within this database.

Schema

edit

A schema is a namespace within a database: it contains named objects (tables, data types, functions, and operators) whose names can duplicate those of other objects existing in other schemas of this database. Every database contains the default schema 'public' and may contain more schemas. All objects of one schema must reside within the same database. Objects of different schemas within the same database may have the same name.

There is another special schema in each database. The schema 'pg_catalog' contains all system tables, built-in data types, functions, and operators. See also 'Search Path' below.

Search Path (or 'Schema Search Path')

edit

A Search Path is a list of schema names. If applications use unqualified object names (e.g.: 'employee_table' for a table name), the search path is used to locate this object in the given sequence of schemas. The schema 'pg_catalog' is always the first part of the search path although it is not explicitly listed in the search path. This behaviour ensures that PostgreSQL finds the system objects.

initdb (OS command)

edit

Despite of its name the utility initdb creates a new cluster, which contains the 3 databases 'template0', 'template1' and 'postgres'.

createdb (OS command)

edit

The utility createdb creates a new database within the actual cluster.

CREATE DATABASE (SQL command)

edit

The SQL command CREATE DATABASE creates a new database within the actual cluster.

Directory Structure

edit

A cluster and its databases consists of files, which hold data, actual status information, modification information and a lot more. Those files are organized in a fixed way under one directory node.

 


Consistent Writes

edit

Shared Buffers

edit

Shared bufferes are RAM pages, which mirror pages of data files on disc. They exist due to performance reasons. The term shared results from the fact that a lot of processes read and write to that area.

'Dirty' Page

edit

Pages in the shared buffers mirror pages of data files on disc. When clients request changes of data, those pages get changed without - provisionally - a change of the related pages on disc. Until the background writer writes those modified pages to disc, they are called 'dirty' pages.

Checkpoint

edit

A checkpoint is a special point in time where it is guaranteed that the database files are in a consistent state. At checkpoint time all change records are flushed to the WAL file, all dirty data pages (in shared buffers) are flushed to disc, and at last a special checkpoint record is written to the WAL file.

The instance's checkpointer process automatically triggers checkpoints on a regular basis. Additionally they can be forced by issuing the command CHECKPOINT in a client program. For the database system it takes a lot of time to perform a checkpoint - because of the physical writes to disc.

WAL File

edit

WAL files contain the changes which are applied to the data by modifying commands like INSERT, UPDATE, DELETE or CREATE TABLE ... . This is redundant information as it is also recorded in the data files (for better performance at a later time). According to the configuration of the instance, there may be more information within WAL files. WAL files reside in the pg_wal directory (which was named pg_xlog before version 10), have a binary format and a fixed size of 16MB. When they are no longer needed, they get recycled by renaming and reusing their already allocated space.

A single information unit within a WAL file is called a log record.

Hint: In the PostgreSQL documentation and in related documents there are a lot of other, similar terms which refer to what we denote as WAL file in this Wikibook: segment, WAL segment, logfile (don't mix it with the term logfile, see below), WAL log file, ... .

Logfile

edit

The instance logs and reports warning and error messages about special situations in readable text files. These logfiles can reside at any place in the directory structure of the server and are not part of the cluster.

Hint: The term 'logfile' does not relate to the other terms of this subchapter. It is mentioned here because the term sometimes is used as a synonym for what we call WAL file - see above.

Log Record

edit

A log record is a single information unit within a WAL file.

Segment

edit

The term segment is sometimes used as a synonym for WAL file.

MVCC

edit

Multiversion Concurrency Control (MVCC) is a common database technique to accomplish two goals: First, it allows the management of parallel running transactions on a logical level and second, it ensures high performance for concurrent read and write actions. It is implemented as follows: Whenever some values of an existing row change, PostgreSQL writes a new version of this row to the database without deleting the old one. In such situations the database contains multiple versions of the row. In addition to their regular data the rows contain transaction IDs which allows to decide, which other transactions will see the new or the old row. Hence other transactions sees only those values (of other transactions), which are committed.

Outdated old rows are deleted at a later time by the utility vacuumdb respectively the SQL command vacuum.

Backup and Recovery

edit

The term cold as an addition to the backup method name indicates that with this method the instance must be stopped to create a useful backup. In contrast, the addition 'hot' denotes methods where the instance MUST run (and hence changes to the data may occur during backup actions).

A cold backup is a consistent copy of all files of the cluster with OS tools like cp or tar. During the creation of a cold backup the instance must not run - otherwise the backup is useless. Hence you need a period of time in which applications do not use any database of the cluster - a continuous 7×24 operation mode is not possible. And secondly: the cold backup works only on the cluster level, not on any finer granularity like database or table.

Hint: A cold backup is sometimes called an "offline backup".

A logical backup is a consistent copy of the data within a database or some of its parts. It is created with the utility pg_dump. Although pg_dump may run in parallel to applications (the instance must be up), it creates a consistent snapshot as of the time of its start.

pg_dump supports two output formats. The first one is a text format containing SQL commands like CREATE and INSERT. Files created in this format may be used by psql to restore the backed-up data. The second format is a binary format and is called the 'archive format'. Files with this format can be used to restore its data with the tool pg_restore.

As mentioned, pg_dump works at the database level or smaller parts of databases like tables. If you want to refer to the cluster level, you must use pg_dumpall. Please notice, that important objects like users/roles and their rights are always defined at cluster level.

Such backups consist of two parts. The first one is the so-called base backup, which is a copy of all files of a cluster. The second one consists of all data-changes since the start of the backup command. They are stored in WAL files.

Such backups work only at the cluster level, not on any finer granularity like database or table.

When you use the technique of 'Continuous Archiving' (Base Backup) and archive all occurring WAL files, you can restore the database as it was at an arbitrary time. To do so, you must restore the base backup and replay the WAL files against it up to a defined timestamp.

Archiving

edit

Archiving is the process of copying WAL files to a failsafe location. When you plan to use PITR you must ensure that the sequence of WAL files is saved for a longer period. To support the process of copying WAL files at the right moment (when they are completely filled and a switch to the next WAL file has taken place), PostgreSQL runs the archiving process which is part of the instance. This process copies WAL files to a configurable destination.

Recovering

edit

Recovering is the process of playing WAL files against a physical backup. One of the involved steps is the copy of the WAL files from the failsafe archive location to its original location in '/pg_xlog'. The aim of recovery is to bring the cluster into a consistent state at a defined timestamp.

Archive Recovery Mode

edit

When recovering takes place, the instance is in archive recovery mode.

Restartpoint

edit

A restart point is an action similar to a checkpoint. Restart points are only performed when the instance is in archive recovery mode or in standby mode.

Timeline

edit

After a successful recovery PostgreSQL transfers the cluster into a new timeline to avoid problems, which may occur when PITR is reset and WAL files reapplied (e.g.: to a different timestamp). Timeline names are sequential numbers: 1, 2, 3, ... .

Replication

edit

Replication is a technique to send data, which was written within a master server, to one or more standby servers or even another master server.

Master Server

edit

The master server is an instance on a server which sends data to other instances in addition to its local processing of data.

Standby Server

edit

The standby server is an instance on a server which receives information from a master server about changes of its data.

Warm Standby Server

edit

A warm standby server is a running instance, which is in standby_mode (recovery.conf file). It continuously reads and processes incoming WAL files (in the case of log-shipping) or log records (in the case of streaming replication). It does not accept client connections.

Hot Standby Server

edit

A hot standby server is a warm standby server with the additional flag hot_standby in postgres.conf. It accepts client connections and read-only queries.

Synchronous Replication

edit

Replication is called synchronous, when the standby server processes the received data immediately, sends a confirmation record to the master server and the master server delays its COMMIT action until he has received the confirmation of the standby server.

Asynchronous Replication

edit

Replication is called asynchronous, when the master server sends data to the standby server and does not expect any feedback about this action.

Streaming Replication

edit

The term is used when log entries are transfered from master server to standby server over a TCP connection - in addition to their transfer to the local WAL file. Streaming replication is asynchronous by default but can also be synchronous.

Log-Shipping Replication

edit

Log shipping is the process of transfering WAL files from a master server to a standby server. Log shipping is an asynchronous operation.