DBMS - Concurrency Control

A transaction is a collection of instructions. To maintain the integrity of a database, all transactions must obey the ACID properties. ACID is an acronym for the following:

Atomicity - This property states that a transaction must be treated as an atomic unit: either all of its operations are executed or none of them are.
Consistency - The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. (Consistency is also the C of the CAP theorem, although the CAP notion is a distinct concept.)
Isolation - No transaction should affect the execution of any other transaction.
Durability - The database should be durable enough to hold all its latest updates even if the system fails or restarts.

In this article I will focus on database concurrency. Note that concurrency is not the same thing as parallelism; the distinction is illustrated below:

A system is called concurrent if multiple execution flows run with the potential to share resources.
Example: two threads competing for one I/O port.
("Two queues accessing one ATM machine.")
A system is called parallel if it splits a problem into multiple similar chunks that are processed simultaneously.
Example: parsing a big file by running two processes, each on one half of the file.
("Two queues and two ATM machines.")

Concurrency allows as many users as possible to use a database at one time while preventing data conflicts from occurring. Assume you have an SQL statement, e.g., UPDATE table SET a = a + 1 WHERE ..., which would correspond to the following code:

read a
a = a + 1
write a

Assume that two clients A and B execute this simultaneously. The following could happen (time flows from top to bottom):

   A               B
read a    
                read a
a = a + 1
write a
                a = a + 1
                write a

What happens? a is incremented only once, although it should have been incremented twice. This is a classic concurrency conflict: a lost update.
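To make this concrete, here is a minimal T-SQL sketch of the same race, assuming a hypothetical table Counters(id, a). Two sessions running the read-modify-write block concurrently can each read the same value and lose one increment; the single-statement UPDATE at the end is atomic and avoids the problem:

-- Hypothetical table used for illustration throughout this article.
CREATE TABLE Counters (id INT PRIMARY KEY, a INT NOT NULL);
INSERT INTO Counters (id, a) VALUES (1, 0);

-- Vulnerable pattern: read, compute, write back as separate steps.
DECLARE @a INT;
SELECT @a = a FROM Counters WHERE id = 1;   -- read a
SET @a = @a + 1;                            -- a = a + 1
UPDATE Counters SET a = @a WHERE id = 1;    -- write a

-- Safe pattern: one atomic statement; concurrent increments serialize
-- on the exclusive lock taken by the UPDATE itself.
UPDATE Counters SET a = a + 1 WHERE id = 1;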

Transactions executing at the same time need to produce meaningful and consistent results. Therefore, control of data concurrency and data consistency is vital in a multiuser database.

Data concurrency means that many users can access data at the same time.
Data consistency means that each user sees a consistent view of the data, including visible changes made by the user's own transactions and transactions of other users.

To describe consistent transaction behavior when transactions run at the same time, database researchers have defined a transaction isolation model called serializability. The serializable mode of transaction behavior tries to ensure that transactions run in such a way that they appear to be executed one at a time, or serially, rather than concurrently.
While this degree of isolation between transactions is generally desirable, running many applications in this mode can seriously compromise application throughput. Complete isolation of concurrently running transactions could mean that one transaction cannot perform an insert into a table being queried by another transaction. In short, real-world considerations usually require a compromise between perfect transaction isolation and performance.

The three preventable phenomena, together with two closely related update anomalies, are described below (a short demonstration of a dirty read follows the list):

Dirty reads: A transaction reads data that has been written by another transaction that has not been committed yet. If the writing transaction later aborts, that value disappears from the database and should never have been read; the reading transaction ends with incorrect results.

Nonrepeatable (fuzzy) reads: A transaction rereads data it has previously read and finds that another committed transaction has modified or deleted the data.

Phantom reads (or phantoms): A transaction re-runs a query returning a set of rows that satisfies a search condition and finds that another committed transaction has inserted additional rows that satisfy the condition.

Two related anomalies are:

The lost update problem: A second transaction writes a second value of a data item on top of a first value written by a first concurrent transaction, and the first value is lost to other concurrently running transactions that need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results.

The incorrect summary problem: While one transaction computes a summary over the values of all the instances of a repeated data item, a second transaction updates some of those instances. The resulting summary does not reflect a correct result for any precedence order between the two transactions (as if one were executed before the other), but rather some random result that depends on the timing of the updates and on whether particular update results were included in the summary.
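As an illustration, here is a hedged T-SQL sketch of a dirty read, reusing the hypothetical Counters table from earlier. Session B reads under READ UNCOMMITTED while session A still has an uncommitted change in flight; when A rolls back, B has read a value that never logically existed:

-- Session A: modify the row but do not commit yet.
BEGIN TRANSACTION;
UPDATE Counters SET a = 999 WHERE id = 1;
-- ...session A pauses here...

-- Session B: sees the uncommitted value 999 (a dirty read).
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT a FROM Counters WHERE id = 1;

-- Session A: rolls back; the 999 that session B read never existed.
ROLLBACK TRANSACTION;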

The three main categories of concurrency control mechanisms are described below (an application-level sketch of the optimistic approach follows the list):

Optimistic - Delay the checking of whether a transaction meets the isolation and other integrity rules (e.g., serializability and recoverability) until it ends, without blocking any of its read or write operations ("...and be optimistic about the rules being met..."), and then abort the transaction if the desired rules would be violated by its commit. An aborted transaction is immediately restarted and re-executed, which incurs an obvious overhead (versus executing it to the end only once). If not too many transactions are aborted, being optimistic is usually a good strategy.

Pessimistic - Block an operation of a transaction if it may cause a violation of the rules, until the possibility of a violation disappears. Blocking operations typically reduces performance.

Semi-optimistic - Block operations in some situations, if they may cause a violation of some rules, and do not block in other situations, while delaying rule checking (if needed) to the transaction's end, as in the optimistic approach.
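The engine's internal scheduler aside, applications often implement the optimistic idea themselves with a version column. A minimal sketch, assuming a hypothetical Products table with a rowversion column: read the version together with the data, hold no locks while working, and make the write conditional on the version being unchanged:

-- Hypothetical table with a rowversion column for optimistic checks.
CREATE TABLE Products (
    id    INT PRIMARY KEY,
    price MONEY NOT NULL,
    rv    ROWVERSION
);
INSERT INTO Products (id, price) VALUES (1, 100);

-- Read the data and remember the row version; no locks are held afterwards.
DECLARE @rv BINARY(8), @price MONEY;
SELECT @rv = rv, @price = price FROM Products WHERE id = 1;

-- ...compute the new price in the application, optimistically...

-- Write only if nobody changed the row in the meantime.
UPDATE Products
SET price = @price * 1.10
WHERE id = 1 AND rv = @rv;

IF @@ROWCOUNT = 0
    PRINT 'Conflict detected: the row changed underneath us; retry.';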

Many methods for concurrency control exist.

1. Locking (e.g., two-phase locking, 2PL) - Controlling access to data by locks assigned to the data. A transaction's access to a data item (database object) locked by another transaction may be blocked (depending on the lock type and the access operation type) until the lock is released (see the locking sketch after this list).

2. Serialization graph checking (also called serializability or conflict graph checking) - Checking for cycles in the schedule's precedence graph and breaking them by aborts.

3. Timestamp ordering (TO) - Assigning timestamps to transactions, and controlling or checking access to data by timestamp order.

4. Commitment ordering (or commit ordering, CO) - Controlling or checking the chronological order of transactions' commit events so that it is compatible with their precedence order.
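For the locking family, SQL Server lets you make the pessimistic pattern explicit with lock hints. A minimal sketch against the hypothetical Counters table: UPDLOCK acquires an update lock at read time, so a concurrent session running the same block waits instead of reading a soon-to-be-stale value, which prevents the lost update shown earlier:

BEGIN TRANSACTION;

-- Take an update lock while reading; a second session running this
-- same block will wait here instead of reading a stale value.
DECLARE @a INT;
SELECT @a = a
FROM Counters WITH (UPDLOCK, HOLDLOCK)
WHERE id = 1;

UPDATE Counters SET a = @a + 1 WHERE id = 1;

COMMIT TRANSACTION;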

SQL Server: Choosing a transaction isolation level doesn't affect the locks that are acquired to protect data modifications. A transaction always gets an exclusive lock on any data it modifies and holds that lock until the transaction completes, regardless of the isolation level set for that transaction.

A lower isolation level increases the ability of many users to access data at the same time, but increases the number of concurrency effects, such as dirty reads or lost updates, that users might encounter. Conversely, a higher isolation level reduces the types of concurrency effects that users might encounter, but requires more system resources and increases the chances that one transaction will block another. Choosing the appropriate isolation level depends on balancing the data integrity requirements of the application against the overhead of each isolation level. The highest isolation level, serializable, guarantees that a transaction will retrieve exactly the same data every time it repeats a read operation, but it does this by performing a level of locking that is likely to impact other users in multi-user systems. The lowest isolation level, read uncommitted, can retrieve data that has been modified but not committed by other transactions. All concurrency side effects can happen in read uncommitted, but there's no read locking or versioning, so overhead is minimized.

The following table shows the concurrency side effects allowed by the different isolation levels.

Isolation level      Dirty read   Nonrepeatable read   Phantom
Read uncommitted     Yes          Yes                  Yes
Read committed       No           Yes                  Yes
Repeatable read      No           No                   Yes
Snapshot             No           No                   No
Serializable         No           No                   No
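To verify which level a given session is actually running under, SQL Server exposes it through DBCC USEROPTIONS and through the sys.dm_exec_sessions DMV:

-- Lists the current session's SET options, including the isolation level.
DBCC USEROPTIONS;

-- The same information from the sessions DMV; the column is an integer:
-- 1 = read uncommitted, 2 = read committed, 3 = repeatable read,
-- 4 = serializable, 5 = snapshot.
SELECT transaction_isolation_level
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;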

Transactions must be run at an isolation level of at least repeatable read to prevent lost updates that can occur when two transactions each retrieve the same row, and then later update the row based on the originally retrieved values. If the two transactions update rows using a single UPDATE statement and don't base the update on the previously retrieved values, lost updates can't occur at the default isolation level of read committed.
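A hedged sketch of that retrieve-then-update pattern under repeatable read, again using the hypothetical Counters table. The shared lock taken by the SELECT is held until commit, so a concurrent transaction running the same sequence blocks or becomes a deadlock victim and is rolled back for retry; either way, no update is silently lost:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;

-- The shared lock from this read is held until COMMIT, so another
-- transaction cannot change the row between our read and our write.
DECLARE @a INT;
SELECT @a = a FROM Counters WHERE id = 1;

UPDATE Counters SET a = @a + 1 WHERE id = 1;

COMMIT TRANSACTION;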

In Azure SQL Database, the default database-wide setting enables read committed snapshot isolation (RCSI) by having both the READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION database options set to ON. You cannot change the database's default isolation level, but you can control the isolation level explicitly on a connection. To do so, use any one of the following statements in Azure SQL Database before you BEGIN TRANSACTION:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
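For example, to run a transaction under snapshot isolation on a connection (again using the hypothetical Counters table):

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;
    -- All reads see a consistent snapshot as of the transaction's start.
    SELECT a FROM Counters WHERE id = 1;
    -- If another transaction changed this row after our snapshot began,
    -- this UPDATE fails with an update conflict (error 3960) and the
    -- transaction must be retried.
    UPDATE Counters SET a = a + 1 WHERE id = 1;
COMMIT TRANSACTION;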
