PostgreSQL/Visibility

Some Example Problems

Every transaction sees all data changes that it has carried out itself during its lifetime. Problems arise when more than one process wants to read or write the same data while their transactions overlap in time, or even at the same instant, which is possible on servers with multiple CPUs or a disk array. In such cases, different types of conflicts and anomalies may occur.

Applications may or may not accept the effects resulting from such competing situations. Depending on their needs, they can choose different levels of isolation against the activities of other transactions. The level defines which effects they are willing to accept and which they are not. Higher levels mean that fewer effects can occur, but the database system must work harder and overall throughput decreases.

Here are some examples with two transactions T_A and T_B. Neither performs a COMMIT unless explicitly noted.

T_A reads the row with id = 1. T_B reads the same row. T_A increases column X by 1. T_B increases the same column by 1. What will be the result? There is the danger of a 'Lost update'.

T_A changes a value of the row with id = 1. What shall T_B see if it reads the same row? T_A may perform a ROLLBACK. (Uncommitted read)

T_A reads the row with id = 1. T_B reads the same row, changes a value and performs a COMMIT. T_A reads the row again. In comparison to its first read, it will see a different value. (Non-repeatable read)

T_A reads all rows with status = 'ok'. T_B inserts an additional row with status = 'ok' and performs a COMMIT. T_A reads all rows with status = 'ok' again and receives a different number of rows. (Phantom read)

T_A reads and changes the row with id = 1. T_B reads and changes the row with id = 2. T_B wants to read and change the row with id = 1. Because T_A has not yet committed its changes, T_B must wait for T_A. T_A wants to read and change the row with id = 2. Because T_B has not yet committed its changes, T_A must wait for T_B. (Deadlock)

PostgreSQL's Solutions

The SQL standard describes three effects (or problematic situations), 'Uncommitted read' (often called 'dirty read'), 'Non-repeatable read', and 'Phantom read', and defines four levels of isolation between transactions: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. Every level is stricter than its predecessor and prevents more effects. For example, a 'Non-repeatable read' is possible at level READ COMMITTED but not at REPEATABLE READ or SERIALIZABLE.

PostgreSQL implements those levels. But, as a consequence of its MVCC model, it implements some aspects more strictly than the standard demands. If a transaction requests the level READ UNCOMMITTED, PostgreSQL always handles it as READ COMMITTED. As a result, uncommitted changes are invisible to all other transactions at every level: only committed changes can be seen by other transactions. Likewise, PostgreSQL's REPEATABLE READ level also prevents phantom reads, which the standard would permit at that level.
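To check which level a session is working with, PostgreSQL offers the run-time settings transaction_isolation and default_transaction_isolation. The following is a minimal sketch, assuming an unmodified server configuration in which the default level is READ COMMITTED.

```sql
-- Default level for new transactions in this session
-- (read committed, unless the configuration was changed).
SHOW default_transaction_isolation;

-- Request a level for a single transaction and inspect it.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SHOW transaction_isolation;
COMMIT;

-- Change the default for all later transactions of this session.
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;
```

The level of a running transaction can only be set before its first query or data-modifying statement, which is why the examples in this chapter specify it directly in the BEGIN command.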

Examples

The following examples act on a table t1 with the two columns id and col and a single row.

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id INTEGER, col INTEGER);
INSERT INTO t1 VALUES (1, 100);
SELECT * FROM t1;
id | col 
----+-----
  1 | 100
(1 row)

Uncommitted read

The example shows that PostgreSQL shows only committed rows to other transactions.

Transaction A	Transaction B
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- you can shorten the two commands into one:
-- BEGIN TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
UPDATE t1 SET col=101 WHERE id=1;
SELECT col FROM t1 WHERE id=1;
-- 101
	-- 'READ UNCOMMITTED' acts like 'READ COMMITTED':
	-- other transactions see only committed rows!
	BEGIN TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
	SELECT col FROM t1 WHERE id=1;
	-- 100 (the committed one!)
COMMIT;
SELECT col FROM t1 WHERE id=1;
-- 101
	SELECT col FROM t1 WHERE id=1;
	-- 101 (again: the committed one!)
	COMMIT; -- no real effect
	SELECT col FROM t1 WHERE id=1;
	-- 101
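The same rule protects readers when the writing transaction is abandoned instead of committed. The following sketch covers the ROLLBACK case, assuming t1 again holds its original single row (1, 100). The value 999 and the session labels in the comments are chosen only for illustration.

```sql
-- Session A
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE t1 SET col=999 WHERE id=1;
SELECT col FROM t1 WHERE id=1;   -- 999 (A sees its own change)

-- Session B, while A is still open
SELECT col FROM t1 WHERE id=1;   -- 100 (A's change is uncommitted)

-- Session A
ROLLBACK;

-- Session B
SELECT col FROM t1 WHERE id=1;   -- 100 (the change never became visible)
```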

Lost update

The example shows that PostgreSQL prevents a 'lost update' even at the lowest level of isolation, and likewise at all other levels. (The table t1 contains its original values.)

Transaction A	Transaction B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT col FROM t1 WHERE id=1;
-- 100
UPDATE t1 SET col=col+1 WHERE id=1;
SELECT col FROM t1 WHERE id=1;
-- 101
	BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
	SELECT col FROM t1 WHERE id=1;
	-- 100
	UPDATE t1 SET col=col+1 WHERE id=1;
	-- UPDATE is queued and must wait for the
	-- COMMIT of transaction A
	.
	.
COMMIT;
	-- the above UPDATE executes after (!) the COMMIT
	-- of transaction A
	SELECT col FROM t1 WHERE id=1;
	-- 102

Both UPDATE statements are executed, nothing gets lost.

Please note that transaction B is an example of a 'non-repeatable read' (see below) because its isolation level is READ COMMITTED (or READ UNCOMMITTED, which PostgreSQL treats identically). First, it reads the value 100 with its SELECT command. Next, after the COMMIT of transaction A, its UPDATE command reads 101 and increases it to 102. If the isolation level were REPEATABLE READ, transaction B would receive the error message 'could not serialize access due to concurrent update' as PostgreSQL's reaction to the UPDATE request.
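The REPEATABLE READ case mentioned above can be sketched as follows, assuming t1 holds its original row (1, 100). The session labels in the comments are only for orientation.

```sql
-- Session A
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
UPDATE t1 SET col=col+1 WHERE id=1;

-- Session B, while A is still open
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT col FROM t1 WHERE id=1;   -- 100
UPDATE t1 SET col=col+1 WHERE id=1;
-- blocks until A finishes

-- Session A
COMMIT;

-- Session B
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;
```

After such an error the application is expected to repeat the whole transaction. The retry starts with a fresh snapshot and therefore sees the value committed by transaction A.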

Non-repeatable read

The example shows a non-repeatable read. (The table t1 contains its original values.)

Transaction A	Transaction B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT col FROM t1 WHERE id=1;
-- 100
	BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
	UPDATE t1 SET col=101 WHERE id=1;
	SELECT col FROM t1 WHERE id=1;
	-- 101
	COMMIT;
SELECT col FROM t1 WHERE id=1;
-- 101 (same transaction, but different value)
-- 'ISOLATION LEVEL REPEATABLE READ' or
-- 'SERIALIZABLE' will avoid such an effect
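For comparison, here is a sketch of the same sequence with transaction A at level REPEATABLE READ, assuming t1 holds its original row (1, 100). Transaction A keeps working with the snapshot taken at its first statement.

```sql
-- Session A
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT col FROM t1 WHERE id=1;   -- 100

-- Session B
BEGIN;
UPDATE t1 SET col=101 WHERE id=1;
COMMIT;

-- Session A
SELECT col FROM t1 WHERE id=1;   -- 100 (snapshot from A's first statement)
COMMIT;
SELECT col FROM t1 WHERE id=1;   -- 101 (new transaction, new snapshot)
```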

Phantom read

The example shows a phantom read. (The table t1 contains its original values.)

Transaction A	Transaction B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT col FROM t1 WHERE id>0;
-- 1 row: 100
	BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
	INSERT INTO t1 VALUES (2, 200);
	COMMIT;
	SELECT col FROM t1 WHERE id>0;
	-- 2 rows: 100 and 200
SELECT col FROM t1 WHERE id>0;
-- 2 rows: 100 and 200
-- (same transaction, same query, but different rows)
-- 'ISOLATION LEVEL SERIALIZABLE'
-- will avoid such an effect
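In PostgreSQL the snapshot of a REPEATABLE READ transaction already hides rows inserted by others, so this level is sufficient to avoid the phantom shown here, although the standard only requires this at SERIALIZABLE. A sketch, assuming t1 holds only its original row (1, 100):

```sql
-- Session A
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT col FROM t1 WHERE id>0;   -- 1 row: 100

-- Session B
INSERT INTO t1 VALUES (2, 200);  -- autocommit

-- Session A
SELECT col FROM t1 WHERE id>0;   -- still 1 row: 100
COMMIT;
SELECT col FROM t1 WHERE id>0;   -- 2 rows: 100 and 200
```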

Deadlock

The example shows a deadlock. (The table t1 contains two rows.)

DELETE FROM t1;
INSERT INTO t1 VALUES (1, 100);
INSERT INTO t1 VALUES (2, 200);

Transaction A	Transaction B
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE t1 SET col=col+1 WHERE id=1;
SELECT col FROM t1 WHERE id=1;
-- 101
	BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
	UPDATE t1 SET col=col+1 WHERE id=2;
	SELECT col FROM t1 WHERE id=2;
	-- 201
	UPDATE t1 SET col=col+1 WHERE id=1;
	.
	.
	-- must wait for COMMIT/ROLLBACK of transaction A
UPDATE t1 SET col=col+1 WHERE id=2;
-- must wait for COMMIT/ROLLBACK of transaction B.
--
-- PostgreSQL detects the deadlock and performs a
-- ROLLBACK to overcome the circular situation.
-- message: "ERROR: deadlock detected ..."
	-- processing goes on with a 'success message'
	SELECT col FROM t1 WHERE id>0;
	-- 101
	-- 201
	-- no UPDATEs from transaction A. They were
	-- ROLLBACK-ed by PostgreSQL.
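A common way to avoid this circular wait is to let all transactions touch the rows in the same order. The following sketch assumes that t1 has been reset to the two rows (1, 100) and (2, 200). Both sessions update id = 1 first, so transaction B simply waits instead of deadlocking.

```sql
-- Session A
BEGIN;
UPDATE t1 SET col=col+1 WHERE id=1;

-- Session B
BEGIN;
UPDATE t1 SET col=col+1 WHERE id=1;
-- waits for A, but holds no lock that A needs

-- Session A
UPDATE t1 SET col=col+1 WHERE id=2;
COMMIT;

-- Session B: the waiting UPDATE now proceeds
UPDATE t1 SET col=col+1 WHERE id=2;
COMMIT;

SELECT id, col FROM t1 ORDER BY id;
-- 1 | 102
-- 2 | 202
```

Both transactions complete and all four increments are applied. The price is that transaction B is blocked until transaction A has finished.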
