Relational Database Design/Transactionality

Transactionality

edit

ACID

edit

ACID is an acronym for the four most desirable basic characteristics of a transaction system: transactions should be Atomic, Consistent, Isolated and Durable.

  • Atomic transactions are such that the transaction is either entirely completed or makes no change to the database; even if an error or a hardware fault occurs mid-transaction the database will not be left with a half-completed transaction.
  • Consistent transactions ensure that the database is left in a consistent state after the transaction is complete, meaning that any integrity constraints (unique keys, foreign keys, CHECK constraints etc.) must be satisfied or the transaction will be rejected.
  • Isolated transactions are invisible to other users of the database while they are being processed
  • Durable transactions guarantee that they will not be rolled back after the caller has committed them

Early RDBMSes couldn't always guarantee all four of these requirements with their transactions, but modern counterparts can usually provide ACID transactions even in the event of power or hardware failure.

Precedence Relationships (Serialization)

edit

The basic operations are read, and write. Then there is commit or abort. A conflict occurs if at least one operation is a write onto a shared object. A conflict is measured as a pair of operations performed by two different transactions. Hence the possible combinations are RR, RW, WR, WW.

Conflicts occur with RW, because the first R is an unrepeatable read , as the W from another transaction has overwritten the previous value read in R ;

In WR, there may be an abort of the first W after the R, leading to a cascading abort , because the value written by W has been aborted and R is using an invalid value from W.

In WW, the first write is a lost update e.g. W1W2R1 R1 lost the update because it reads W2

Locking

edit

Can achieve a serializable, recoverable schedule (an interleaved schedule of two transactions, with the same result of the two transactions one after the other, and which an abort of one transaction will not invalidate the commit of another schedule).

2PL - 2 phase locking has a growing phase, where locks can be acquired; when a lock is released, all locks must be released, and new ones cannot be acquired. This ensures that one transactions can complete modifications using reads acquired inside a lock in one phase, without letting the phase of another transaction doing a conflicting RW, WR, or WW.

Locks provide exclusive access; this combined with non-preemptibility (locks can't be released except by the locking transaction/process), the ability to hold more than one lock, and the inability to detect a chain of locks forming a cycle, leads to deadlock.

Deadlock may be dealt with if an ordering to lock holders e.g. age, applies, to prevent a cycle occurring.

Timestamped Based concurrency control

edit

Lock based concurrency control is sometimes called pessimistic whereas timestamp based concurrency control is termed optimistic i.e. a pessimist foresees Murphy's Law applying so will do everything possible to prevent an undesirable event. Locking generally means at least one writer-reader or writer-writer conflict is actively prevented by obtaining an appropriate level lock, and unifying a transactions multiple updates or reads by having a growing phase and a release phase , where once one lock is released, no other locks can be acquired.

The optimist operates generally like the tiny headed cat in a particular cartoon show and through sheer audacity and good fortune, manages to get transactions done most of the time without having to bother checking before hand. However, being a legitimate computer algorithm, the optimistic timestamp protocol requires checking of data read and write timestamps at the time of commit, against transaction timestamps, requiring no write timestamps and sometimes even no read timestamp on the acquired data items, occurring after the transaction timestamp (assuming something like a monotonic system-wide clock). Thus a protected optimist can end up doing a lot of work, only to be starved by repeated aborts, much like Sysiphus and his rock.

Multiversion concurrency control

edit

Multiversion Concurrency Control (MVCC) is a concurrency method that extends row-level locking to allow data to be read without locking the rows. Under MVCC, writing to data doesn't overwrite the old values but merely creates a new copy, tagged with the ID of the transaction that wrote to it. This allows earlier read-only transactions to carry on reading the old values, which continue to form a consistent snapshot of the data before it was overwritten. Thus the earlier read transaction doesn't need to hold a lock on the data.