Concurrent Execution of Database
Keywords: concurrent execution of transaction
Question 1: a) What is meant by concurrent execution of database in a multi-user system? Discuss why concurrency control is needed giving a suitable example. How the concept of Time stamp based and validation is based Protocols used in concurrency control.
Ans:- Multiple transactions are allowed to run concurrently in the system. Concurrent execution of database is meant by execution of database in parallel. I.e. each transaction must behave in isolation. This means that the concurrent execution does not result an inconsistent state. Ensuring consistency in spite of concurrent execution of transactions requires is very complex
Advantages are:
- increased processor and disk utilization, leading to better transaction throughput
- reduced average response time for transactions: short transactions need not wait behind long ones.
4 E.g. one transaction can be using the CPU while another is reading from or writing to the disk
Timestamp Protocol: Timestamp is a unique identifier to identify a transaction. Timestamp can be considered to be as the transaction start time. It determines the concurrent execution such that the timestamp determines the serializability order. Time stamp holds two timestamp values:
- W-timestamp() is the largest time-stamp of any transaction that executed write() successfully.
- R-timestamp () is the largest time-stamp of any transaction that executed read() successfully.
Suppose that transaction Ti issues write (Q).
If TS (Ti) < R-timestamp (Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
If TS (Ti) < W-timestamp (Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.
Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).The timestamp-ordering protocol guarantees serializability since all the arcs in the precedence graph are of the form. Timestamp protocol ensures freedom from deadlock as no transaction ever waits.
Validation Protocol: The optimistic concurrency control techniques also known as validation or certification techniques, no checking is done while the transaction is executing. Several methods use the validation technique.
Execution of transaction Ti is done in three phases.
- Read and execution phase: Transaction Ti writes only to temporary local variables
- Validation phase: Transaction Ti performs a “validation test” to determine if local variables can be written without violating serializability.
- Write phase: If Ti is validated, the updates are applied to the database; otherwise, Ti is rolled back.
The three phases of concurrently executing transactions can be interleaved, but each transaction must go through the three phases in that order.
Each transaction Ti has 3 timestamps
- Start(Ti) : the time when Ti started its execution
- Validation(Ti): the time when Ti entered its validation phase
- Finish(Ti) : the time when Ti finished its write phase
- This protocol is useful and gives greater degree of concurrency if probability of conflicts is low. That is because the serializability order is not pre-decided and relatively less transactions will have to be rolled back.
Question 2: Why does deadlock occur in concurrent execution of transaction where locking scheme is followed? How can you detect deadlock in database system? How is the problem of deadlock resolved? Explain with the help of an example.
Ans: Although locks prevent serious data inconsistencies but their use may lead to a major problem. The schedule may create deadlocks. A database deadlock is caused when two transactions wait for each other to unlock data. Deadlocks can be managed by using deadlock detection and prevention techniques. A deadlock is a condition that occurs when two transactions wait for each other to unlock data. Deadlocks occur when two transactions T1 and T2 exist in the following mode:
- T1 locks data item X and it needs to access Y while Y is locked by T2
- T2 locks data item Y and it needs to access X while X is locked by T1
If T1 has not found unlocked data item Y and T2 needs Y, T2 cannot begin, if T2 has not unlocked data item X and T1 needs X, T1 cannot continue. Consequently T1 and T2 wait indefinitely, each waiting for the other to unlock the required data item.
Three basic techniques exist to control deadlocks:-
- Deadlock prevention
- Deadlock detection
- Deadlock avoidance
Deadlock prevention:-
A transaction requesting a new lock is aborted if there is a possibility that a deadlock can occur. If the transaction is aborted, all the changes made by this transaction are rolled back, and all locks obtained by the transaction are released. The transaction is then rescheduled for execution. Deadlocks prevention works because it avoids the condition that leads to deadlock. In deadlock prevention, timed out schemes are used.
Deadlock detection:-
The DBMS periodically test the database for deadlocks. If a deadlock is found one of the transactions is aborted and the other transaction continues. The aborted transaction will be rescheduled for execution later on. The system uses wait- for graph. The system is in deadlock state if and only if the wait-for graph has a cycle.
Deadlock avoidance:-
The transaction must obtain all the locks it needs before it can be executed. This technique avoids rollback of conflicting transaction by requiring that locks be obtained in succession. The serial lock assignment required in deadlock avoidance increases transaction response.
Example:-
Deadlock checking occurs when a transaction has waited 30 seconds; no lock wait timeouts occur.
Question 3: What is two phase locking? Describe with help of an example. Will two phase locking result in serialisable schedule? Will two phase locking result in deadlock? Justify your answer with the help of an example
Ans: Two phase locking defines how transactions acquire and relinquish locks. Two phase locking gurantees serializability but it does not prevent deadlocks. The two phases are:
- Growing phase in which a transaction acquires all the required locks without unlocking any data.
- Shrinking phase in which a transaction releases all locks and cannot obtain any new lock.
The two-phase locking protocol is governed by the following rules:
- Two transactions cannot have conflicting locks.
- No unlock operation can precede a lock operation in the same transaction.
- No data are affected until all locks are obtained that is until the transaction is in its locked point.
Yes, two phase locking will result in deadlock because when mutual blocking between transactions occurs then it results deadlock. When deadlock occur, the execution of these transaction cannot be completed. So deadlock need to be resolved for completion of these transactions.
Example:-
In this example, the transaction acquires all the locks it needs until it reaches its locked point. When the locked point is reached is reached, the data are modified to confirm to the transaction requirements. Finally, the transaction is completed as it releases all of the lock it acquired in the first phase.
Question 4: When do we use Log based recovery technique? Explain the write ahead log strategy for recovery in a centralized DBMS, with the help of an example.
Ans: It is mostly used to structure for recoding database modifications. In a log based recovery a a log file is maintained for recovery purpose. Log file is a sequence of log records. There are two techniques for log based recovery.
- Deferred database modification: it ensures transaction atomicity by recording all database modifications in the log, but deferring the execution of all write operations of transactions until it is partially committed.
- Immediate database modification: it allows database modification to be output to the database while the transaction is still in the active state.
The Write Ahead log is the basic rule that ensures that a record of every change to the database is available while attempting to recover from a crash. That is if a transaction makes a change and is committed, then the no force approach means that some of the changes may have not been written to the disk at the time of crash. Without any record of changes there is no way to ensure that the changes of a committed transaction survive crashes. For this before writing a page to the disk every update log record that describes a change to the page must be in a stable storage
Question 5: Assume that the Railway reservation system is implemented using an RDBMS. What are the concurrency control measures one has to take, in order to avoid concurrency related problems in the above system? How can the deadlock be avoided in this system? How can we implement recovery techniques in such a system?
Ans: When several transactions execute concurrently in the database, however there is a chance, the isolation property may no longer be preserved. To ensure that it is, the system must control the interaction among the concurrent transactions; this control is achieved through one of a variety of mechanisms called concurrency-control schemes.
TYPES OF CONCURRENCY CONTROL SCHEME
- Lock-Based Protocols
- Timestamp-Based Protocols
- Validation-Based Protocols
TO AVOID DEADLOCK:
- To ensure that no cyclic waits can occur by ordering the requests for locks.
- Deadlock recovery, and performs transaction rollback instead of waiting for a lock, whenever the wait could potentially result in a deadlock.
DEADLOCK RECOVERY TECHNIQUE-
- To maintain information about the current allocation of data items to transactions, as well as any outstanding data item requests.
- Provide an algorithm that uses this information to determine whether the system has entered a deadlock state.
- Recover from the deadlock when the detection algorithm determines that deadlock exists.
Question 6: How can we use the concept of shadow paging in database recovery of a real time application?
Ans:
- It is an alternative to log-based recovery techniques.
- It may require fewer disk accesses but it is hard to extend paging to allow multiple concurrent transactions.
- The idea is to maintain two page tables during the life of a transaction: the current page table and the shadow page table.
- When the transaction starts current and page tables are identical.
- The shadow page is never changed during the life of the transaction.
- The current page is updated with each write operation.
- Each table entry points to a page on the disk.
- When the transaction is committed the shadow page entry becomes a copy of the current page table entry and the disk block with the old data is released.
- If the shadow is stored in nonvolatile memory and a system crash occurs, then the shadow page table is copied to the current page table.
- This guarantees that the shadow page table will point to the database pages corresponding to the state of the database prior to any transaction that was active at the time of the crash, making aborts automatic.
It is used as an improvement to the shadow copy technique. The key idea of shadow paging is to maintain two page tables during the life of a transaction. Shadow paging helps in better recovery from crashes and also it does not require any undo/redo technique. The disadvantage of shadow paging is that it changes the disk address of the current page table and also copies the actual data block from the RAM to the hard disk output operation.
There are drawbacks to the shadow-page technique:
- Commit overhead.
- Data fragmentation.
- Garbage collection
The commit of a single transaction using shadow paging requires multiple blocks to be output — the current page table, the actual data and the disk address of the current page table. Log-based schemes need to output only the log records.
Shadow paging causes database pages to change locations.
Each time that a transaction commits, the database pages containing the old version of data changed by the transactions must become inaccessible. Such pages are considered to be garbage since they are not part of the free space and do not contain any usable information. Periodically it is necessary to find all of the garbage pages and add them to the list of free pages. This process is called garbage collection and imposes additional overhead and complexity on the system.