A transaction is a collection of instructions. To maintain the integrity of a database, all transactions must obey the ACID properties. ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. In this article I will focus on database concurrency; note that consistency is a distinct concept, so the properties are worth spelling out first:
Atomicity − a transaction must be treated as an atomic unit: either all of its operations are executed or none of them are.
Consistency − the database must remain in a consistent state after any transaction; no transaction should have an adverse effect on the data residing in the database. (The C in the CAP theorem refers to a related but distinct notion of consistency.)
Durability − the database should be durable enough to retain all its latest updates even if the system fails or restarts.
Isolation − concurrently executing transactions must not interfere with one another; the intermediate state of one transaction is invisible to the others.
A program is concurrent if multiple execution flows run with the potential to share resources. Example: two threads competing for an I/O port − "two queues accessing one ATM machine."
Parallelism means splitting a problem into multiple similar chunks. Example: parsing a big file by running two processes, one on each half of the file − "two queues and two ATM machines."
Concurrency allows as many users
as possible to use a database at one time while preventing data conflicts from
occurring. Assume
you have an SQL statement, e.g., UPDATE table SET a = a + 1 WHERE ...,
which would correspond to the following code:
read a
a = a + 1
write a
Assume that two clients A and B
execute this simultaneously. The following could happen (time flows from top to
bottom):
A                B
read a
                 read a
a = a + 1
write a
                 a = a + 1
                 write a
What happens? a is
incremented only once, although it should have been incremented twice. This is
a classical concurrency conflict.
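The interleaving above can be scripted deterministically in plain Python (a toy simulation of the two clients, not a real database):

```python
# Lost-update sketch: both clients read a before either writes,
# so one increment overwrites the other.
a = 0

read_by_a = a          # A: read a   (sees 0)
read_by_b = a          # B: read a   (sees 0)
a = read_by_a + 1      # A: write a  (a = 1)
a = read_by_b + 1      # B: write a  (a = 1 -- A's update is lost)

print(a)               # prints 1, although two increments ran
```

With the reads forced to happen before either write, the final value is 1 instead of 2, exactly the conflict described above.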
Transactions executing at the
same time need to produce meaningful and consistent results. Therefore, control
of data concurrency and data consistency is vital in a multiuser database.
Data concurrency means that many users can
access data at the same time.
Data consistency means that each user sees
a consistent view of the data, including visible changes made by the user's own
transactions and transactions of other users.
To describe consistent
transaction behavior when transactions run at the same time, database
researchers have defined a transaction isolation
model called serializability. The serializable mode of
transaction behavior tries to ensure that transactions run in such a way that
they appear to be executed one at a time, or serially, rather than
concurrently.
While this degree of isolation
between transactions is generally desirable, running many applications in this
mode can seriously compromise application throughput. Complete isolation of
concurrently running transactions could mean that one transaction cannot
perform an insert into a table being queried by another transaction. In short,
real-world considerations usually require a compromise between perfect
transaction isolation and performance.
The three preventable phenomena
are:
Dirty reads: A transaction reads data written by another transaction that has not yet committed. If the writing transaction later aborts, that value disappears from the database and should never have been read by anyone; the reading transaction ends with incorrect results.
Nonrepeatable (fuzzy) reads: A transaction rereads data it has previously read and finds
that another committed transaction has modified or deleted the data.
The lost update problem: A
second transaction writes a second value of a data-item (datum) on top of a
first value written by a first concurrent transaction, and the first value is
lost to other transactions running concurrently which need, by their
precedence, to read the first value. The transactions that have read the wrong
value end with incorrect results.
Phantom reads (or phantoms): A transaction re-runs a query returning a set of rows that
satisfies a search condition and finds that another committed transaction has
inserted additional rows that satisfy the condition.
The incorrect summary problem: While one transaction computes a summary (for example, a sum) over all instances of a repeated data item, a second transaction updates some of those instances. The resulting summary does not correspond to either serial order of the two transactions; it is an arbitrary mixture that depends on the timing of the updates and on which updated values the summary happened to include.
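A minimal sketch of the dirty-read phenomenon, using a toy in-memory key-value "database" (the class and names are illustrative, not any real library) in which uncommitted writes are visible to other readers:

```python
# Toy store where readers can see an in-flight transaction's writes --
# the precondition for dirty reads.
class ToyDB:
    def __init__(self):
        self.committed = {"balance": 100}
        self.uncommitted = {}              # writes of an in-flight transaction

    def write(self, key, value):           # transaction writes, not yet committed
        self.uncommitted[key] = value

    def read(self, key):                   # readers see uncommitted data (dirty!)
        return self.uncommitted.get(key, self.committed[key])

    def rollback(self):                    # abort: uncommitted writes vanish
        self.uncommitted.clear()

db = ToyDB()
db.write("balance", 500)                   # T1 writes but has not committed
dirty = db.read("balance")                 # T2 reads T1's uncommitted 500
db.rollback()                              # T1 aborts; the 500 never existed
after = db.read("balance")                 # back to 100 -- T2 used phantom data
```

T2's decision was based on a value (500) that, after the abort, never officially existed in the database.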
The three main categories of concurrency control mechanisms are:
Optimistic − delay checking whether a transaction meets the isolation and other integrity rules (e.g., serializability and recoverability) until it ends, without blocking any of its read or write operations ("...be optimistic about the rules being met..."), and abort the transaction at commit time if the rules would otherwise be violated. An aborted transaction is immediately restarted and re-executed, which incurs an obvious overhead compared with executing it to the end only once. If not too many transactions are aborted, being optimistic is usually a good strategy.
Pessimistic − block an operation of a transaction if it may cause a violation of the rules, until the possibility of violation disappears. Blocking operations typically reduces performance.
Semi-optimistic − block operations in some situations if they may violate some rules, and do not block in other situations, delaying the remaining rule checks to the transaction's end, as in the optimistic approach.
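A common way to implement the optimistic approach is a version check at commit time: read the row's version, do the work without blocking, and abort if the version changed underneath you. A minimal sketch (the `Row` class and `commit` function are illustrative, not a specific library's API):

```python
# Optimistic concurrency control via a version number.
class Row:
    def __init__(self, value):
        self.value = value
        self.version = 0

def commit(row, read_version, new_value):
    """Apply the update only if nobody changed the row since we read it."""
    if row.version != read_version:
        return False                        # validation failed: abort, retry
    row.value = new_value
    row.version += 1
    return True

row = Row(10)
v = row.version                             # T reads value 10 at version 0
assert commit(row, row.version, 99)         # another transaction commits first
ok = commit(row, v, 10 + 1)                 # T's commit: version check fails
retry = commit(row, row.version, row.value + 1)  # T retries on a fresh read
```

T's first commit attempt fails (`ok` is `False`) because the version moved from 0 to 1; the retry against the fresh value succeeds without any lock ever being held.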
Many methods for concurrency control exist:
1. Locking (e.g., two-phase locking, 2PL) − controlling access to data by locks assigned to the data. A transaction's access to a data item (database object) locked by another transaction may be blocked (depending on lock type and access operation type) until the lock is released.
2. Serialization graph checking (also called serializability or conflict checking) − checking for cycles in the schedule's precedence graph and breaking them by aborts.
3. Timestamp ordering (TO) − assigning timestamps to transactions, and controlling or checking access to data by timestamp order.
4. Commitment ordering (or commit ordering; CO) − controlling or checking the chronological order of transactions' commit events so that it is compatible with their precedence order.
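The locking approach can be sketched in a few lines of plain Python: acquire every lock the transaction needs up front (growing phase) and release them only when the work is done (shrinking phase). This is a toy, single-process illustration, not a database engine; the account names and fixed lock-acquisition order are assumptions of the sketch:

```python
# Strict two-phase locking sketch: all locks acquired before any work,
# all released only after the transaction's work is complete.
import threading

accounts = {"A": 100, "B": 50}
locks = {name: threading.Lock() for name in accounts}

def transfer(src, dst, amount):
    # Growing phase: acquire all needed locks, in a fixed (sorted) order
    # so that two concurrent transfers cannot deadlock.
    for name in sorted((src, dst)):
        locks[name].acquire()
    try:
        accounts[src] -= amount
        accounts[dst] += amount
    finally:
        # Shrinking phase: release only after all updates are done.
        for name in sorted((src, dst)):
            locks[name].release()

t1 = threading.Thread(target=transfer, args=("A", "B", 30))
t2 = threading.Thread(target=transfer, args=("B", "A", 10))
t1.start(); t2.start(); t1.join(); t2.join()
```

Whichever thread wins the race, each transfer runs against the other's completed (or not-yet-started) state, so the total across accounts is always conserved.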
SQL Server: Choosing a transaction
isolation level doesn't affect the locks that are acquired to protect data
modifications. A transaction always gets an exclusive lock on any data it
modifies and holds that lock until the transaction completes, regardless of the
isolation level set for that transaction.
A lower isolation level increases the
ability of many users to access data at the same time, but increases the number
of concurrency effects, such as dirty reads or lost updates, that users might
encounter. Conversely, a higher isolation level reduces the types of
concurrency effects that users might encounter, but requires more system
resources and increases the chances that one transaction will block another.
Choosing the appropriate isolation level depends on balancing the data
integrity requirements of the application against the overhead of each
isolation level. The highest isolation level, serializable, guarantees that a
transaction will retrieve exactly the same data every time it repeats a read
operation, but it does this by performing a level of locking that is likely to
impact other users in multi-user systems. The lowest isolation level, read
uncommitted, can retrieve data that has been modified but not committed by
other transactions. All concurrency side effects can happen in read
uncommitted, but there's no read locking or versioning, so overhead is
minimized.
The following table shows the
concurrency side effects allowed by the different isolation levels.
Isolation Level  | Dirty Read | Non-Repeatable Read | Phantom
Read uncommitted | Yes        | Yes                 | Yes
Read committed   | No         | Yes                 | Yes
Repeatable read  | No         | No                  | Yes
Snapshot         | No         | No                  | No
Serializable     | No         | No                  | No
Transactions must be run at an
isolation level of at least repeatable read to prevent lost updates that can
occur when two transactions each retrieve the same row, and then later update
the row based on the originally retrieved values. If the two transactions
update rows using a single UPDATE statement and don't base the update on the
previously retrieved values, lost updates can't occur at the default isolation
level of read committed.
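The contrast between the two update styles can be demonstrated with SQLite from Python's standard library (here the "two clients" interleaving is scripted on one connection; sqlite3 is just a convenient SQL engine, not SQL Server):

```python
# Read-modify-write in application code vs. a single-statement UPDATE.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER)")
con.execute("INSERT INTO t VALUES (0)")

# Lost update: both clients read the same value before either writes back.
seen_by_1 = con.execute("SELECT a FROM t").fetchone()[0]   # client 1 reads 0
seen_by_2 = con.execute("SELECT a FROM t").fetchone()[0]   # client 2 reads 0
con.execute("UPDATE t SET a = ?", (seen_by_1 + 1,))        # client 1 writes 1
con.execute("UPDATE t SET a = ?", (seen_by_2 + 1,))        # client 2 writes 1
lost = con.execute("SELECT a FROM t").fetchone()[0]        # 1, not 2

# Single-statement UPDATE: the read and the write happen inside one
# statement, so neither increment can be based on a stale value.
con.execute("UPDATE t SET a = 0")
con.execute("UPDATE t SET a = a + 1")                      # client 1
con.execute("UPDATE t SET a = a + 1")                      # client 2
safe = con.execute("SELECT a FROM t").fetchone()[0]        # 2
```

The read-modify-write version loses one increment; the `SET a = a + 1` version counts both, which is why the single-statement form is safe at read committed.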
By default, Azure SQL Database enables read committed snapshot isolation (RCSI) by setting both the READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION database options to ON. You cannot change the database's default isolation level, but you can control the isolation level explicitly on a connection by issuing one of the following statements before you BEGIN TRANSACTION:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED