In nearly all cases, PostgreSQL databases must support many clients, which want to add or change data, at the same time. This makes it necessary to protect concurrently running requests from each other - preferably without blocking them. Situations may occur where two clients want to change the same row at the same time or that one client wants to revoke (rollback) his changes while another client may still have tried to read the newest version.

Imagine an Online shop offering the last copy of an article. Two clients display the article at their user interface. After a while, but at the same time, both clients decide to put the article into their shopping cart or even to buy it. Both have seen the article, but only one can be allowed to buy it. The database must enforce an order of the requests, permit the write access to one of them, block the other from writing, and inform the blocked client that the data has been changed by a different process and shall be re-read.

PostgreSQL implements a sophisticated technique to handle concurrent accesses that avoids locking: Multiversion Concurrency Control (MVCC). Instead of locking a row, the MVCC technique creates a new version of that row when a data change takes place. "The main advantage of using the MVCC ... rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of ... Serializable Snapshot Isolation (SSI)". [1]

The implementation of MVCC is based on transaction IDs (XID). Every transaction in a cluster gets a unique sequential number as its ID. Every INSERT, UPDATE, or DELETE command stores the XID in xmin or xmax within the affected rows. xmin, xmax, and some more are system columns contained in every row. Both are not visible with the usual SELECT * FROM ... command. But you can read them with commands like SELECT xmin, xmax, * FROM ... . The column xmin contains the XID of the transaction which has created this version of the row and xmax contains the XID of the transaction which has deleted this version, or zero if the version is not deleted.

So, what's going on in detail when write accesses take place? The following graphic shows details concerning xmin, xmax, and the regular application data.

An INSERT command creates the very first version of a row. Besides its application data 'x', this version contains the ID of the creating transaction 123 in xmin and 0 in xmax. xmin indicates that the version exists since transaction 123 and the value 0 in xmaxindicates that it is currently not deleted.

Somewhat later, transaction 135 executes an UPDATE of this row by changing the application data from 'x' to 'y'. According to the MVCC principles, the data in the old version of the row is not changed. The value 'x' remains as it was before. Only xmax changes to 135. Now, this version is treated as valid exclusively for transactions with XIDs from 123 to 134. In addition to preserve the data in the old version, the UPDATE creates a new version of the complete row with its XID in xmin, 0 in xmax, and 'y' in the application data (plus all other application data from the old version). This new row version is visible to all future transactions. (Internally, an UPDATE command acts as a DELETE command followed by an INSERT command.)

All subsequent UPDATE commands behave in the same way as the first one: they put their XID in xmax of the current version, create a new version with their XID in xmin and 0 in xmax.

Finally, a row may be deleted by a DELETE command. Even in this case, all versions of the row including the newest one remain in the database - nothing is thrown away. Only xmax of the last version is set to the XID of the DELETE transaction, which indicates that it is only visible to transactions with older XIDs - in this example from 142 to 820.

In summary, the MVCC technology creates more and more versions of the same row in the table's heap file and leaves them there, even after a DELETE command. Only the youngest version is relevant for all future transactions. But the system must also preserve some of the older ones for a short time because they could still be requested by transactions that had started before the deleting transaction and hence have a smaller XID. Over time, also the older ones goes out of scope for ALL transactions and therefore become ultimately unnecessary. Nevertheless, they do exist physically on the disk and occupy space. They are called dead rows and are part of the so-called bloat.

Please keep in mind:

  • xmin and xmax indicate the range in which row versions are visible for transactions. This range doesn't imply any direct temporal meaning. The sequence of XIDs reflects only the sequence of transactions' begin events.
  • Internally, an UPDATE command acts in the same way as a DELETE command followed by an INSERT command.
  • Nothing is removed - with the consequence that the database occupies more and more disk space. It is obvious that this behavior has to be corrected in some way. The next chapter explains how vacuum and autovacuum fulfill this task.

So far this is only a raw description of the principles of MVCC. The implementation considers more problems, e.g.:

  • Changes may be revoked by a ROLLBACK command.
  • After some time the sequence of XIDs may start from zero (wrap-around). In this case xmax can be smaller than xmin.

Note

edit

XIDs are sequences (with a reserved value to handle wrap-around in pre-9.4 PostgreSQL versions). PostgreSQL knows some configuration parameters concerning transactions and their XIDs with names like xxx_age, e.g.: vacuum_freeze_min_age. For such parameters, the 'age' doesn't specify a period of time but represents a certain number of transactions, e.g., 100 millions.

References

edit