PostgreSQL/Architecture
PostgreSQL implements a client-server architecture. Each Client process
connects to one Backend process
at the server site.
Clients do not have direct access to database files and the data stored in them. Instead, they send requests to the Server
and receive the requested data from there. The server launches a single process for each client connection. Such a Backend process handles the client's requests by acting on the Shared Memory
. This leads to other activities (file access, WAL, vacuum, ...) of the Instance
. The Instance is a group of server-side processes acting on the common Shared Memory. PostgreSQL does not use threading.
At startup time, the Instance is launched by a single process denoted the Postmaster
. It loads configuration files, allocates Shared Memory, and starts the other collaborating processes of the Instance: Background Writer
, Checkpointer
, WAL Writer
, WAL Archiver
, Autovacuum
, Statistics Collector
, Logger
, and more. Subsequently, the Postmaster listens to its configured system port. In response to new client connection attempts he launches new Backend processes and delegates authentication, communication, and the handling of all further requests to them. The next figure visualizes the main aspects of RAM, processes, files, and their collaboration.
Client requests like SELECT or UPDATE usually lead to the necessity to read or write data. This is carried out by the client's Backend process. Such I/O-activities are not done directly on the disks. Instead, they are done in a cache in the Shared Memory that mirrors the file pages. Accesses to such caches are much faster than accesses to disk. Read accesses affect only the cache whereas write accesses are accomplished by writing to a log, the so-called write-ahead-log or WAL.
Shared Memory is limited in size and it can become necessary to evict pages. As long as the content of such pages hasn't changed, this is not a problem. But they may be modified. Modified pages are called dirty pages (or dirty buffers) and before they can be evicted they must be written back to disk. The Background Writer processes and the Checkpointer takes care of that. They ensure that the cache is - after a short time delay - in sync with files. The synchronization from RAM to disk consists of two steps.
First, whenever the content of a page changes, a WAL record
is created containing the delta-information (difference between the old and new content) and stored in another area of Shared Memory. During a COMMIT
or earlier the WAL Writer process reads them and appends them to the end of the current WAL file
. Such sequential writes are faster than writes to random positions of heap and index files. All WAL records created from one dirty page must be transferred to disk before the dirty page itself can be transferred to disk in the second step.
Second, the Background Writer process transfers dirty buffers from Shared Memory to files. Because I/O activities can block other processes, it starts periodically and acts only for a short period. Doing so, its extensive - and expensive - I/O activities are spread over time, avoiding debilitating I/O peaks. The Checkpointer process also transfers dirty buffers to file.
The Checkpointer process creates a Checkpoint
by writing and flushing all older dirty buffers, all older WAL records, and finally a special Checkpoint record to disk. Therefore a Checkpoint is a point in the sequence of transactions at which it is guaranteed that the heap and index files have been updated with all dirty pages before that Checkpoint.
WAL files contain the changes made to the data. Such 'delta information' is used in the case of a system crash for recovery (database backup + WAL files --> database immediately before the crash). Hence, WAL files shall be duplicated and preserved at a safe place until the next database backup is taken. This is the duty of the WAL Archiver process. He can be configured to run a script that copies WAL files, as soon as they are full and a switch to the next one takes place. Of course, such copies should be done to a separate disk or server for security reasons. But it's also a good idea to store the original WAL files on a different disk than heap and index files. Such a separation boosts performance. It can be done using a symbolic link pointing from the original WAL directory to a directory at a different disk.
The Autovacuum process marks old versions of records in the heap and index files that are no longer used by any transaction as 'finally deleted'. Hence it releases the space occupied by them for reuse. The need for such a process results from the MVCC architecture.
The Statistics Collector collects counters about accesses to SQL objects like tables, rows, indexes, pages, ... and stores them in system tables.
The Logger writes text lines about more or less serious events that may happen during database accesses, e.g., wrong password, no permission, long-running queries, etc. to a sequential file.