PostgreSQL/Visibility


Some exemplary Problems

edit

It's obvious that every transaction 'sees' all data changes, it has been carried out during its lifetime, without problems. But there are situations where more than one process wants to read or write the same data during an overlapping time interval of their transactions or even at the same point in time, which is possible on servers with multiple CPUs or a disk array. In such cases, different types of conflicts and suspicious effects may occur.

Applications may or may not accept the effects resulting from such competing situations. They can choose different levels of isolation against the activities of other transactions depending on their needs. The level defines which effects they are willing to accept and which not. Higher levels mean that fewer effects can occur but the database system must work harder and that the overall throughput decreases.

Here are some examples with two transactions TA and TB. Both don't perform a COMMIT if not explicitly noted.

  • TA reads the row with id = 1. TB reads the same row. TA increases column X by 1. TB increases the same column by 1. What will be the result? There is the danger of a 'Lost update'.
  • TA changes a value of the row with id = 1. What shall TB see if it reads the same row? TA may perform a ROLLBACK. (Uncommitted read)
  • TA reads the row with id = 1. TB reads the same row, changes a value and performs a COMMIT. TA reads the row again. In comparison to its first read, it will see a different value. (Non-repeatable read)
  • TA reads all rows with status = 'ok'. TB inserts an additional row with status = 'ok' and performs a COMMIT. TA reads all rows with status = 'ok' again and receives a different number of rows. (Phantom read)
  • TA reads and changes the row with id = 1. TB reads and changes the row with id = 2. TB wants to read and change the row with id = 1. Because TA has not yet committed its changes, TB must wait for TA. TA wants to read and change the row with id = 2. Because TB has not yet committed its changes, TA must wait for TB. (Deadlock)

PostgreSQL's Solutions

edit

The SQL standard describes the 3 effects (or problematic situations) 'Uncommitted read', 'Non-repeatable read', and 'Phantom read' and defines 4 levels of isolation between transactions: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. Every level is stricter than its predecessor and prevents more effects, which means e.g. that a 'Non-repeatable read' is possible in level READ COMMITTED but not in REPEATABLE READ or SERIALIZABLE.

PostgreSQL implements those levels. But, as a consequence of its MVCC model, it implements some aspects a little stricter than they are demanded by the standard. If a transaction requests the level READ UNCOMMITTED, PostgreSQL handles it always as a READ COMMITTED, which leads to the overall behavior that all uncommitted changes are invisible to all other transactions at any level - only committed changes can be seen by other transactions.

Examples

edit

The following examples act on a table t1 with the two columns id and col and a single row.

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id INTEGER, col INTEGER);
INSERT INTO t1 VALUES (1, 100);
SELECT * FROM t1;
id | col 
----+-----
  1 | 100
(1 row)

Uncommitted read

edit

The example shows that PostgreSQL solely shows committed rows to other transactions.

Transaction A Transaction B
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- you can shorten the two commands into one:
-- BEGIN TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
UPDATE t1 SET col=101 WHERE id=1;
SELECT col FROM t1 WHERE id=1;
-- 101
-- 'READ UNCOMMITTED' acts equal to 'READ COMMITTED'
-- other transactions solely sees committed rows!
BEGIN TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT col FROM t1 WHERE id=1;
-- 100 (the committed one!)
COMMIT;
SELECT col FROM t1 WHERE id=1;
-- 101
SELECT col FROM t1 WHERE id=1;
-- 101 (again: the committed one!)
COMMIT; -- no real effect
SELECT col FROM t1 WHERE id=1;
-- 101

Lost update

edit

The example shows that PostgreSQL prevents 'lost update' in the lowest level of isolation - as well as in all other levels. (The table t1 contains its original values.)

Transaction A Transaction B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT col FROM t1 WHERE id=1;
-- 100
UPDATE t1 SET col=col+1 WHERE id=1;
SELECT col FROM t1 WHERE id=1;
-- 101
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT col FROM t1 WHERE id=1;
-- 100
UPDATE t1 SET col=col+1 WHERE id=1;
-- UPDATE is queued and must wait for the
-- COMMIT of transaction A
.
.
COMMIT;
-- the above UPDATE executes after (!) the COMMIT
-- of transaction A
SELECT col FROM t1 WHERE id=1;
-- 102

Both UPDATE statements are executed, nothing gets lost.

Please note that transaction B is an example for a 'non-repeatable read' (see below) because the isolation level is '(UN)COMMITTED READ'. First, it reads the value '100' with its SELECT command. Next, it reads '101' with its UPDATE command - after COMMIT of transaction A - and increases it to '102'. If the isolation level would be 'REPEATABLE READ', transaction B would receive the error message 'could not serialize access due to concurrent update' as PostgreSQL's reaction to the UPDATE request.

Non-repeatable read

edit

The example shows a non-repeatable read. (The table t1 contains its original values.)

Transaction A Transaction B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT col FROM t1 WHERE id=1;
-- 100
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE t1 SET col=101 WHERE id=1;
SELECT col FROM t1 WHERE id=1;
-- 101
COMMIT;
SELECT col FROM t1 WHERE id=1;
-- 101 (same transaction, but different value)
-- ' ISOLATION LEVEL REPEATABLE READ' or 
-- 'SERIALIZATION' will avoid such an effect

Phantom read

edit

The example shows a phantom read. (The table t1 contains its original values.)

Transaction A Transaction B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT col FROM t1 WHERE id>0;
-- 1 row: 100
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO t1 VALUES (2, 200);
COMMIT;
SELECT col FROM t1 WHERE id>0;
-- 2 rows: 100 and 200
SELECT col FROM t1 WHERE id>0;
-- 2 rows: 100 and 200
-- (same transaction, same query, but different rows)
-- ' ISOLATION LEVEL SERIALIZABLE'
-- will avoid such an effect

Dead lock

edit

The example shows a dead lock. (The table t1 contains two rows.)

DELETE FROM t1;
INSERT INTO t1 VALUES (1, 100);
INSERT INTO t1 VALUES (2, 200);
Transaction A Transaction B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE t1 SET col=col+1 WHERE id=1;
SELECT col FROM t1 WHERE id=1;
-- 101
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE t1 SET col=col+1 WHERE id=2;
SELECT col FROM t1 WHERE id=2;
-- 201

UPDATE t1 SET col=col+1 WHERE id=1;
.
.
-- must wait for COMMIT/ROLLBACK of transaction A
UPDATE t1 SET col=col+1 WHERE id=2;
-- must wait for COMMIT/ROLLBACK of transaction B.
--
-- PostgreSQL detects the deadlock and performs a
-- ROLLBACK to overcome the circular situation.
-- message: "ERROR:  deadlock detected ..."
-- processing goes on with a 'success message'
SELECT col FROM t1 WHERE id>0;
-- 101
-- 201
-- no UPDATEs from transaction A. They were
-- ROLLBACK-ed by PostgreSQL.