Structured Query Language/Transactions

Structured Query Language
Transactions

A transaction is a grouping of one or more SQL statements - that contains statements that write to the database, such as INSERT, UPDATE or DELETE, and also the SELECT command can be part of a transaction. All writing statements must be part of a transaction. The purpose of transactions is to guarantee that the database changes only from one consistent state to another consistent state. It combines all intermediate states into one change-event. This holds true also in critical situations such as parallel processing, disc crash, power failure, ... . Transactions ensure the database integrity.

To do so they support four basic properties, which all in all are called the ACID paradigm.

Atomic	All SQL statements of the transaction take place or none.
Consistent	The sum of all data changes of a transaction transforms the database from one consistent state to another consistent state.
Isolated	The isolation level defines, which parts of uncommited transactions are visible to other sessions.
Durable	The database retains committed changes even if the system crashes afterward.

Transaction Boundaries

As every SQL statement which writes to the database is part of a transaction, the DBMS silently starts a transaction for every of them if there is no in-complete transaction started by an earlier statement. Alternatively, the application/session can start a transaction explicitly by the command START TRANSACTION.

All subsequent SQL commands are part of this transaction. The transaction remains until it is confirmed or rejected. The confirmation takes place with the command COMMIT, the rejection with the command ROLLBACK. Before the COMMIT or ROLLBACK command is submitted, the DBMS stores the results of every writing statement into an intermediate area where it is not visible to other sessions (see: Isolation Levels). Simultaneously with the COMMIT command, all changes of this transaction ends up in the common database, are visible to every other session, and the transaction terminates. If the COMMIT fails for any reason, it happens the same as when the session submits a ROLLBACK command: all changes of this transaction are discarded, and the transaction terminates. Optionally, a session can revert its complete writing actions, which are part of the active transaction, by submitting the single command ROLLBACK.

An Example:

-- Begin the transaction with an explicit command (In general not necessary. Not supported by Oracle.)
START TRANSACTION;
-- Insert some rows
INSERT ... ;
-- Modify those rows or some other rows
UPDATE ... ;
-- Delete some rows
DELETE ... ;
-- If the COMMIT succeeds, the results of the above 3 commands have been transferred to the 'common' 
-- database and thus 'published' to all other sessions.
COMMIT;
--
-- 
START TRANSACTION;
INSERT ... ;
UPDATE ... ;
DELETE ... ;
-- Discard INSERT, UPDATE and DELETE
ROLLBACK;

Savepoints

As transactions can include many statements, it is likely that runtime errors or logical errors arise. In some of such cases, applications want to rollback only parts of the actual transaction and commit the rest or resume the processing a second time. To do so, it is possible to define internal transaction boundaries, which reflects all processing from the start of the transaction up to this point in time. Such intermediate boundaries are called savepoints. COMMIT and ROLLBACK statements terminate the complete transaction, including its savepoints.

-- Begin the transaction with an explicit command
START TRANSACTION;
--
INSERT ... ;
-- Define a savepoint
SAVEPOINT step_1;
--
UPDATE ... ;
-- Discard only the UPDATE. The INSERT remains.
ROLLBACK TO SAVEPOINT step_1;
-- try again (or do any other action)
UPDATE ... ;
-- confirm INSERT and the second UPDATE
COMMIT;

During the lifetime of a transaction, a savepoint can be released if it's no longer needed. (It gets implicitly released at the end of the transaction.)

-- ...
-- ...
RELEASE SAVEPOINT <savepoint_name>;
-- This has no effect on the results of the previous INSERT, UPDATE or DELETE commands. It only eliminates the
-- possibility to ROLLBACK TO SAVEPOINT <savepoint_name>.

Atomicity

A transaction guarantees that the results of all of its statements are handled on a logical level as one single operation. All writing statements have a temporary nature until the COMMIT command terminates successfully.

This behavior helps to ensure the logical integrity of business logic. E.g.: If one wants to transfer some amount of money from one account to another, at least two rows of the database must be modified. The first modification decreases the amount in one row, and the second one increases it on a different row. If there is a disc crash or power failure between these two write-operations, the application has a problem. But the atomicity property of transactions guarantees that none of the write-operations reaches the database (in the case of any failure or a ROLLBACK) or all of them reach the database (in the case of a successful COMMIT).

There is more detailed information about the atomicity property at Wikipedia.

Consistency

Transactions guarantee that the database is in a consistent state after they terminate. This consistency occurs at different levels:

The data and all derived index entries are synchronized. In most cases, data and index entries are stored in different areas within the database. Nevertheless, after the end of a transaction, both areas are updated (or none).
Table constraints and column constraints may be violated during a transaction (by use of the DEFERRABLE keyword) but not after its termination.
There may be Primary and Foreign Keys. During a transaction, the rules for Foreign Keys may be violated (by use of the DEFERRABLE keyword) but not after its termination.
The logical integrity of the database is not guaranteed! If, in the above example of a bank account, the application forgets to update the second row, problems will arise.

Isolation

In most situations, there are a lot of sessions working simultaneously on the DBMS. They compete for their resources, especially for the data. As long as the data is not modified, this is no problem. The DBMS can deliver the data to all of them.

But if multiple sessions try to modify data at the same point in time, conflicts are inescapable. Here is the timeline of an example with two sessions working on a flight reservation system. Session S1 reads the number of free seats for a flight: 1 free seat. S2 reads the number of free seats for the same flight: 1 free seat. S1 reserves the last seat. S2 reserves the last seat.

The central result of the analysis of such conflicts is that all of them are avoidable, if all transactions (concerning the same data) run sequentially: one after the other. But it's obvious that such a behavior is less efficient. The overall performance is increased if the DBMS does as much work as possible in parallel. The SQL standard offers a systematic of such conflicts and the command SET TRANSACTION ... to resolve them with the aim to allow parallel operations as much as possible.

Classification of Isolation Problems

The standard identifies three problematic situations:

P1 (Dirty read): "SQL-transaction T1 modifies a row. SQL-transaction T2 then reads that row before T1 performs a COMMIT. If T1 then performs a ROLLBACK, T2 will have read a row that was never committed, and that may thus be considered to have never existed." ^[1]
P2 (Non-repeatable read): "SQL-transaction T1 reads a row. SQL-transaction T2 then modifies or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted." ^[1] Non-repeatable reads concern single rows.
P3 (Phantom): "SQL-transaction T1 reads the set of rows N that satisfy some search condition. SQL transaction T2 then executes SQL-statements that generate one or more rows that satisfy the search condition used by SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same search condition, it obtains a different collection of rows." ^[1] Phantoms concern result sets.

Avoidance of Isolation Problems

Depending on the requirements and access strategy of an application, some of the above problems may be tolerable - others not. The standard offers the SET TRANSACTION ... command to define, which are allowed to occur within a transaction and which not. The SET TRANSACTION ... command must be the first statement within a transaction.

-- define (un)tolerable conflict situations (Oracle does not support all of them)
SET TRANSACTION ISOLATION LEVEL [READ UNCOMMITTED |
                                 READ COMMITTED   |
                                 REPEATABLE READ  |
                                 SERIALIZABLE];

The following table shows which problems may occur within each level.

Isolation level	Dirty reads	Non-repeatable reads	Phantoms
Read Uncommitted	may occur	may occur	may occur
Read Committed	-	may occur	may occur
Repeatable Read	-	-	may occur
Serializable	-	-	-

At Wikipedia there are more detailed informatiton and examples about isolation levels and concurrency control.

Durability

Transactions guarantee that every confirmed write-operation will survive (almost) every subsequent disaster. To do so, in most cases, the DBMS writes the changes not only to the database but additionally to logfiles, which shall reside on different devices. So it is possible - after a disc crash - to restore all changes from a database backup plus these logfiles.

There is more detailed information about the durability property at Wikipedia.

Autocommit

Some DBMS offers - outside of the standard - an AUTOCOMMIT feature. If it is activated, the feature submits automatically a COMMIT command after every writing statement with the consequence that you cannot ROLLBACK a logical unit-of-work consisting of a lot of SQL statements. Furthermore, the use of the SAVEPOINT feature is not possible.

In much cases, the feature is activated by default.

References

↑ ^a ^b ^c "ISO/IEC 9075-2:2011: Information technology -- Database languages -- SQL -- Part 2: Foundation (SQL/Foundation)".

NULLs and the Three Valued Logic

Structured Query Language
Transactions

Retrieve Top N Rows per Group

[ISO/IEC-1] "ISO/IEC 9075-2:2011: Information technology -- Database languages -- SQL -- Part 2: Foundation (SQL/Foundation)".

[1]