Data Base Management Systems
LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING (AUTONOMOUS) Accredited by NAAC & NBA (Under Tier - I) ISO 9001:2015 Certified Institution Approved by AICTE, New Delhi. and Affiliated to JNTUK, Kakinada L.B. REDDY NAGAR, MYLAVARAM, KRISHNA DIST., A.P.-521 230. |
UNIT IV
Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to B's account. This very simple and small transaction involves several low-level tasks.
A’s Account
Open_Account (A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account (A)
B’s Account
Open_Account (B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account (B)
ACID Properties
A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.
Atomicity (A)
This property ensures that either all the operations of a transaction reflect in database or none.
Example
Suppose Account A has a balance of Rs.400 & B has Rs.700. Account A is transferring Rs.100 to Account B. This is a transaction that has two operations
a) Debiting Rs.100 from A’s balance
b) Creating Rs.100 to B’s balance.
Let’s say first operation passed successfully while second failed, in this case A’s balance would be Rs. 300 while B would be having Rs. 700 instead of Rs. 800.
This is unacceptable in a banking system. Either the transaction should fail without executing any of the operation or it should process both the operations. The Atomicity property ensures that.
Consistency (C)
Integrity Constraints should be maintained. So, that the database is consistent before and after the transaction. To preserve the consistency of database, the execution of transaction should take place in isolation (that means no other transaction should run concurrently when there is a transaction already running).
Example
Account A is having a balance of 400 and it is transferring 100 to account B & C both. We have two transactions here. Let’s say these transactions run concurrently and both the transactions read 400 balance, in that case the final balance of A would be 300 instead of 200 .
This is wrong. If the transaction were to run in isolation, then the second transaction would have read the correct balance 300 (before debiting 100 ) once the first transaction went successful.
Isolation
Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj , it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.
Durability (D)
Once a transaction completes successfully, the changes it has made into the database should be permanent even if there is a system failure. The recovery management component of database systems ensures the durability of transaction.
Transaction States
Any transaction at any point of time must be in one of the following states:
Active: The initial state; the transaction stays in this state while it is executing.
Partially committed: After the final statement has been executed.
Failed: After the discovery that normal execution can no longer proceed.
Aborted: After the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction
Committed: After successful completion.
A transaction that completes its execution successfully is said to be committed. A committed transaction that has performed updates transforms the database into a new consistent state, which must persist even if there is a system failure.
Once a transaction has committed, we cannot undo its effects by aborting it. The only way to undo the effects of a committed transaction is to execute a compensating transaction.
An aborted transaction has two options
• It can be restarted, but only if the transaction was aborted as a result of some hardware or software error that was not created through the internal logic of the transaction. A restarted transaction is considered to be a new transaction.
• It can be killed, It usually does so because of some internal logical error that can be corrected only by rewriting the application program, or because the input was bad, or because the desired data were not found in the database.
Serializability
Schedule
A chronological execution sequence of a transaction is called a schedule. A schedule can have many transactions in it and each comprising of number of instructions/tasks.
In general, there are two types of schedules
T1 and T2 be two transactions that transfer funds from one account to another. Transaction T1 transfers $50 from account A to account B. It is defined as
T1: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Transaction T2 transfers 10 percent of the balance from account A to account B. It is defined as
T2: read(A);
temp := A * 0.1;
A := A − temp;
write(A);
read(B);
B := B + temp;
write(B).
Suppose the current values of accounts A and B are $1000 and $2000
Schedule2 Concurrent schedule
Serial Schedules
It is a schedule in which transactions are aligned in such a way that one transaction is executed first. When the first transaction completes its cycle, then the next transaction is executed. Transactions are ordered one after the other. This type of schedule is called a serial schedule, as transactions are executed in a serial manner.
Equivalence Schedules
An equivalence schedule can be of the following types −
Result Equivalence:
If two schedules produce the same result after execution, they are said to be result equivalent. They may yield the same result for some value and different results for another set of values. That's why this equivalence is not generally considered significant.
View Equivalence
Two schedules would be view equivalence if the transactions in both the schedules perform similar actions in a similar manner.
For example −
Conflict Equivalence
Two Operations would be conflicting if they have the following properties –
Example:
Conflicting operations pair (R1(A), W2(A)) because they belong to two different transactions on same data item A and one of them is write operation.
Similarly,
(W1(A), W2(A)) and (W1(A), R2(A)) pairs are also conflicting.
S: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)
On the other hand, (R1(A), W2(B)) pair is non-conflicting because they operate on different data item.
Similarly, ((W1(A), W2(B)) pair is non-conflicting.
Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if and only if
Check whether the following schedules are conflict equivalent or not.
S1: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)
S2:R1(X) W1(X) W1(Y) R1(Y) R2(X) W2(X)
Let us consider S1. In this the conflict operations are
Now, consider S2. In this the conflict operations are
Note − View equivalent schedules are view serializable and conflict equivalent schedules are conflict serializable. All conflict serializable schedules are view serializable too.
Serializability
A schedule is said to be serializable schedule if it is equivalent to the any one of the serial schedule.
Conflict Serializable Schedules
A schedule S is said to be conflict serializable schedule if it is conflict equivalent to the any one of the serial schedule.
Method-1 for checking the conflict Serializability: -
Step-1: Generate all possible serial schedules for the given schedule. If a schedule contains n transactions, then possible number of serial schedules are n!
Step-2: Now compare each serial schedule with given schedule for conflict equivalence. If any serial schedule is conflict equivalent with the given schedule, then it is conflict serializable schedule otherwise it is not a conflict serializable schedule.
Example: S: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)
Step-1: Generate all possible serial schedules for S. Here S contains 2 transactions, so the possible number of serial schedules are 2(2!). Those are
<T1 T2> (S12)
<T2 T1> (S21)
Step-2: Now compare S with <T1 T2 > for conflict equivalence.
Now S: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)
S12: R1(X)W1(X) R1(Y)R2(X)W2(X)
In S the conflicting operations are
R1(X) W2(X),
R2(X) W1(X),
W1(X)W2(X)
In S12 the conflicting operations are
R1(X) W2(X),
W1(X) R2(X)
W1(X)W2(X)
Here S, S12 are not following the same order for conflicting operation pairs. S
Now compare S with <T2 T1> for conflict equivalence.
S: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)
S21: R2(X)W2(X)R1(X)W1(X) R1(Y) W1(Y)
In S the conflicting operations are
R1(X) W2(X),
R2(X) W1(X),
W1(X)W2(X)
In S21 the conflicting operations are
R2(X) W1(X),
W2(X)W1(X),
W2(X)R1(X)
Here S, S21 are not following the same order for conflicting operation pairs.
So, the given schedule S is not a conflict serializable schedule.
S: R1(A)W1(A)R2(A)W2(A)R1(B)W1(B)R2(B)W2(B)
View Serializable Schedules
A schedule S is said to be view Serializable schedule if it is view equivalent to the any one of the serial schedule.
Method for checking the View Serializability:
Step-1: Generate all possible serial schedules for the given schedule. If a schedule contains n transactions, then possible number of serial schedules are n!
Step-2: Now compare each serial schedule with given schedule for view equivalence. If any serial schedule is view equivalent with the given schedule, then it is view Serializable schedule otherwise it is not a view Serializable schedule.
Example:
Check whether the schedule is view Serializable or not?
S: R2 (B); R2 (A); R1 (A); R3 (A); W1 (B); W2 (B); W3 (B);
Solution: With 3 transactions, total number of schedules possible = 3! = 6
<T1 T2 T3>
<T1 T3 T2>
<T2 T3 T1>
<T2 T1 T3>
<T3 T1 T2>
<T3 T2 T1>
Step 1: Initial Read
A: T2
B: T2
Remaining Schedules:
< T2 T1 T3>
<T2 T3 T1 >
Step 2: Write Read Sequence (WR):
No need to check here.
Step 3: Final Updation on data items:
A: -
B: T3
Since the final Updation on B is made by T3, so the transaction T3 must execute after transactions T1 and T2.
Now, removing those schedules in which T3 is not executing at last.
Hence, view equivalent serial schedule is:
T2 → T1 → T3
Hence, given schedule is view Serializable schedule.
S:R1(A)W2(A)R3(A)W1(A)W3(A)
Recoverability
Failure Classification
There are various types of failure that may occur in a system, each of which needs to be dealt with in a different manner. The failures are classified as follows
Transaction failure
There are two types of errors that may cause a transaction to fail:
Logical error
The transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not found, overflow, or resource limit exceeded.
System error
The system has entered an undesirable state (for example, deadlock), as a result of which a transaction cannot continue with its normal execution. The transaction, however, can be executed later.
System crash
There is a hardware malfunction, or a bug in the database software or the operating system, that causes the loss of the content of volatile storage and brings transaction processing to a halt. The content of non-volatile storage remains intact and is not corrupted.
Disk failure
To determine how the system should recover from failures, we need to identify the failure modes of those devices used for storing data. Next, we must consider how these failure modes affect the contents of the database.
Storage Structure
The storage structure can be divided into two categories
Volatile storage
Examples
main memory ,RAM and cache memory are examples of volatile storage.
Non-volatile storage
Examples:
Harddisks, magnetic tapes, flash memory, and non-volatile (battery backed up) ROM.
Recovery and Atomicity
When a DBMS recovers from a crash, it should maintain the following
There are two types of techniques, which can help a DBMS in recovering as well as maintaining the atomicity of a transaction
Log-based Recovery
Log maintains the records of actions performed by a transaction. It is important that the logs are written prior to the actual modification and stored on a stable storage media, which is failsafe.
Log-based recovery works as follows
The log file is kept on a stable storage media.
<Tn, Start>
<Tn, X, V1, V2>
It reads Tn has changed the value of X, from V1 to V2.
<Tn, commit>
The database can be modified using two approaches
Deferred database modification
In this all the writes are deferred to after partially commit. In this we have two cases.
If all the operations in schedule are successful, then all the writes are deferred to partial commit otherwise no writes are deferred to partial commit.
Example:
(Start,T0)
(Write,T0,a,9)
Crash (Nothing is to be done because no commit found)
(commit,T0)
(start, T1)
(write,T1,a,7)
Crash (T0 committed so redo T0)
(commit,T1)
Crash (T0, T1 are committed so redo T0,T1)
Immediate Database Modification:
Example:
(Start, T0)
(Write, T0, A, 3, 9)
Crash (Undo T0 so A=3)
(Commit, T0)
(Start, T1)
(Write, T1, B, 2, 5)
(Write, T1, a, 9, 7)
Crash (undo – T1 and Redo T0 so B=2, A=9)
(Commit, T1)
Crash (Redo T0, T1 so B=5, A=7)
Checkpoint
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the redo-list and their previous logs are removed and then redone before saving their logs.
Example:
(Start, T1)
(Write, T1, B, 2, 3)
(Start, T2)
(Commit, T1)
(Write, T2, C, 5, 7)
(Commit, T2)
(Checkpoint, {T2})
(Start, t3)
(Write, T3, A, 1, 9)
(Commit, T3)
(Start, T4)
(Write, T4, C, 7, 2)
In the above example Undo list is – T4
Redo list is – T2,T3
First it performs Undo list in the order i.e., T4 and then it performs Redo T2,T3.
Deadlock Handling
Deadlock is a state of a database system having two or more transactions, when each transaction is waiting for a data item that is being locked by some other transaction.
A deadlock can be indicated by a cycle in the wait-for-graph. This is a directed graph in which the vertices denote transactions and the edges denote waits for data items.
Example
Transaction T1 is waiting for data item X which is locked by T3. T3 is waiting for Y which is locked by T2 and T2 is waiting for Z which is locked by T1. Hence, a waiting cycle is formed, and none of the transactions can proceed executing.
T1
T3
T2
X
Z
Y
There are three classical approaches for deadlock handling, namely
All of the three approaches can be incorporated in both a centralized and a distributed database system.
Deadlock Prevention
There are two algorithms for this purpose, namely wait-die and wound-wait.
Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already locked by T2. The algorithms are as follows
Wait-Die
The wait–die scheme is a non preemptive technique. When transaction Ti requests
a data item currently held by Tj , Ti is allowed to wait only if it has a timestamp smaller than that of Tj (that is, Ti is older than Tj ). Otherwise, Ti is rolled back (dies).
Wound-Wait
The wound–wait scheme is a preemptive technique. It is a counterpart to the wait–die scheme. When transaction Ti requests a data item currently held by Tj , Ti is allowed to wait only if it has a timestamp larger than that of Tj (that is, Ti is younger than Tj ). Otherwise, Tj is rolled back (Tj is wounded by Ti).
Deadlock Avoidance
Deadlock Detection and Removal
Some of the methods used for victim selection are
Concurrency Control
In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories
Lock based protocol
This protocol requires that all the data items must be accessed in a mutually exclusive manner, i.e. when one transaction is executing then no other transaction should interrupt the same object.
To implement Lock based protocol we need 2 types of locks.
Shared Lock: When we take this lock we can just read the item but cannot write.
Exclusive Lock: In this type of lock we can write as well as read the data item.
Below table will give clear idea about what we can do and cannot while having shared or exclusive lock.
In general we have 2 types of locking protocols. Those are
Simple locking protocol
In this simply we have only two operations.
1. Lock the item before access that item.
Read operation (Lock-S (A))
Write operation & Read operation (Lock- X (A))
2. Unlock the item after access
Unlock-S (A)
Unlock-X (A)
Problem 1: Inconsistency in the database
So above we have shown one interleaving and we could understand the data flow by below example.
Output:
Let say in the start of the schedule A=100 and B=200 so in total we should have 300 always. But as we see that after Till Display (B+A) we have A=50 and B=200 to total=250 but it should be 300 to make consistency.
Problem 2: Deadlock Problem
Let’s say T2 is waiting for T1 to unlock (A) and T1 is waiting for T2 to unlock (B)
Two-Phase Locking protocol
A transaction is said to follow the Two-Phase Locking protocol if Locking and Unlocking can be done in two phases.
LOCK POINT: The Point at which the growing phase ends, i.e., when a transaction takes the final lock it needs to carry on its work
2 PL suffers from other problems such as cascading rollback and rollback.
Cascading Rollback Problem:
Failure of one Transaction leads to failure of its sub sequent Transactions. Here we could see cascading rollback by the following example
So if transaction T1 rollbacks then transaction T2 and T3 has to rollback.
Deadlock Problem Example
In 2 PL we have seen:
Serializability: It is guaranteed to happen.
Cascading Rollback: It is possible which is bad.
Deadlock: It is possible.
Strict Two-phase locking (Strict-2PL)
The first phase of Strict-2PL is like 2PL.
Strict 2-PL ensures that our schedule is:
Rigorous 2-PL
Rigorous 2-PL ensures that our schedule is:
The difference between
Strict 2-PL and Rigorous 2-PL is that Rigorous is more restrictive, it requires both Exclusive and Shared locks to be held until after the Transaction commits and this is what makes the implementation of Rigorous 2-PL easier.
Conservative 2-PL
The Venn Diagram below shows the classification of schedules that are rigorous strict and conservative.
Example 1:
Now we will see what is the above schedule following the properties discussed above
2 PL: There is growing and shrinking phase.
Strict 2 PL: There is Lock-x (B) and it is unlocked before commit so no strict 2 PL.
Rigorous 2 PL: If it is not strict 2 PL then it can’t be Rigorous.
Conservative 2 PL: If it is not strict 2 PL then it can’t be conservative.
Example 2:
2 PL: There is growing and shrinking phase so it is 2 PL.
Strict 2 PL: Exclusive locks are unlocked after commit. So yes it is.
Rigorous: We have unlocked all the locks after commit so it is rigorous.
Conservative: We have not taken all the locks at first then start the transaction so no
conservative.
Example 3:
2 PL: There is growing and shrinking phase so it is 2 PL.
Strict 2 PL: Exclusive locks are unlocked after commit. so it is strict.
Rigorous: We have unlocked all the locks after commit so it is rigorous.
Conservative: We have taken all the locks at first then start the transaction so it is conservative.
Time Stamp Protocol
Basic Timestamp ordering protocol works as follows:
Check the following condition whenever a transaction Ti issues a Read (X) operation:
Check the following condition whenever a transaction Ti issues a Write(X) operation:
Where,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
Advantages and Disadvantages of TO protocol:
Thomas Write Rule
Timestamp Ordering Protocol states that if Ri(X) and Wj(X) are conflicting operations then Ri (X) is processed before Wj(X) if and only if TS(Ti) < TS(Tj). Whenever a schedule does not follow a serializability order according to the Timestamp, a user generally rejects it and rollback the Transaction.
Thomas Write Rule
Validation Based Protocol (Optimistic Concurrency)
validation based protocol executes transaction in three phases:
Read phase
In this phase, the transaction T is read and executed. It is used to read the value of various data items and stores them in temporary local variables. It can perform all the write operations on temporary variables without an update to the actual database.
Validation phase
In this phase, the temporary variable value will be validated against the actual data to see if it violates the serializability.
Write phase
If the validation of the transaction is validated, then the temporary results are written to the database or system otherwise the transaction is rolled back.
Here each phase has the following different timestamps:
Start(Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its validation phase.
Finish(Ti): It contains the time when Ti finishes its write phase.
Multiple Granularity
Granularity: It is the size of data item allowed to lock.
Example
Hence, the levels of the tree starting from the top level are as follows:
�
There are three additional lock modes with multiple granularity:
Intention Mode Lock
Intention-shared (IS): It contains explicit locking at a lower level of the tree but only with shared locks.
Intention-Exclusive (IX): It contains explicit locking at a lower level with exclusive or shared locks.
Shared & Intention-Exclusive (SIX): In this lock, the node is locked in shared mode, and some node is locked in exclusive mode by the same transaction.
Compatibility Matrix
It uses the intention lock modes to ensure serializability. It requires that if a transaction attempts to lock a node, then that node must follow these protocols:
In multiple-granularity, the locks are acquired in top-down order, and locks must be released in bottom-up order.
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES)
ARIES algorithm follows three principles
write-ahead logging (WAL)
Repeating history during Redo
On restart after a crash, ARIES retraces the actions of a database before the crash and brings the system back to the exact state that it was in before the crash.
Logging changes during Undo
Changes made to the database while undoing transactions are logged to ensure such an action isn't repeated in the event of repeated restarts
Log sequence number
Dirty page table
The ARIES recovery procedure consists of three main steps:
Analysis
REDO
UNDO