Concurrency Control In Distributed Database Systems Information Technology Essay

Concurrency control is a very important issue in distributed database system design. This is because concurrency allows many transactions to be executing simultaneously such that collection of manipulated data item is left in a consistent state. Database concurrency control permits users to access a database in a multiprogrammed fashion while preserving the illusion that user is executing alone on a dedicated system. Besides, it produces the same effect and has the same output on the database as some serial execution of the same transaction.

Concurrency control in distributed system is achieved by a program which is called scheduler. Scheduler help to order the operations of transaction in such a way that the resulting logs is serializable. There have two type of the concurrency control that are locking approach and non-locking approach. In term of locking approach, two-phase lock is widely used and purpose for centralized or distributed database system. Before distributed database systems accessing some part of database, it must adopt a locking mechanism such as each transaction has to obtain a lock. After that part is locked by other transaction, the access request will be block and the transaction who making reauest has to wait. Two-phase locking will cause the deadlock problem. This locking induces high communication cost because of the deadlock problem.

Whereas in term of non-locking approach which is divided into two type which are Timestamp ordering (TO) and Serialization Graph Testing (SGT). Timestamp ordering utilizes unique transaction timestamp for determining the serialization order. It divided into three different modes that are basic, conservative and optimistic modes. For Serialization Graph Testing, it has been most attractive log class until 1987 because it is known as the largest known class of serializable logs. There are some limitations that can found in those two approaches. However, there are some solution that can used to solve the problem and will be discuss more detail in the below.

2. Type and the issue of concurrency control

2.1 Two -phase locking

Two -phase locking is a widely used concurrency control technique to synchronize accesses to shared item. Each data item has a lock associated with it. The scheduler will first examine the associated lock before a transaction (T1) may access a data item. If there is no transaction holds the lock then the scheduler will obtains the lock on behalf of T1. When another transaction T2 hold the lock, the T1 has to wait until T2 to gives up the lock. The scheduler will give the T1 the lock after the T2 release the lock. Scheduler must ensure that there is only one transaction can hold the lock at a time and only one transaction can access data item at a time. They are two types of locks associate with data item that is read locks and write locks. Figure 1 below show that the transaction T1 and T2 follow the two-phase locking protocol.

Figure 1: Transaction T1 and T2 with 2-phase locking protocol

However, there is an important and unfortunate property of two-phase locking schedulers is that they are subject to deadlocks. For instances, the classic deadlock situation in which neither of two processes can proceed that is one must release a resource and the other one needs to proceed. Besides that, deadlock also arises when the transaction try to strengthen read locks to write locks. To ensure that there is no transaction is blocked forever, the scheduler needs a strategy for detecting the deadlock.

In addition, two-phase locking is suffer from sensible delays due to a node needs to send message to all nodes and must receive acknowledgement from all nodes to discover a deadlock. The messages will exchange between nodes to get decision of transaction’s commit. The whole system will stop since not discover the deadlock immediately and a natural of distributed system make deadlock treatment very difficult. Moreover, the message that due to commit and deadlock treatment make high traffic on network which is makes need of special kind of networking.

Time

Figure 2: A spatial schedules of T1 and T2 in state of deadlock

T1 wait for T2 to release read lock on y2

T3 wait for T1 to release read lock on x1

T2 wait for T3 to release read lock on z3

Figure 3: A wait-for graph for the spatial schedule of Figure 2

There are two type of graph that show as above which is a spatial schedule of T1 and T2 in the state of deadlock. Another graph show that the wait for graph for the spatial schedule of the figure 2 above. Each node of the graph represent the transaction, the edges represent the “waiting for” relationship.

Furthermore, concurrency control also deals with starvation. The starvation occurs when a particular transaction consistently waits or restarted and never gets a chance to proceed further. In a deadlock resolution it is possible the same transaction may consistently be selected as victim and rolled-back. This type of limitation is inherent in all priority based scheduling mechanism. The wound-wait scheme a younger transaction may always be aborted by a long running older transaction which may create starvation.

Besides that, the two phases in locking method also may cause to dirty read and cascading abort. The problem of dirty read is arises when assume T1 and T2 are executed interleave. For this condition, the T1 will get the exclusive locks for record. Supposedly, T1 release locks immediately after doing updates while T2 acquires that locks and does its update. If the T1 fail before it commits due to certain of reason, then it will return value to original value. T2 will continue to its execution with uncommitted data and this will cause an error on the system result. On the other hand, the problem of cascading abort arises when if a transaction aborts. There may have some other transaction already used data from an object that the aborted transaction modified it and unlocked it. If this happen, any such transaction will also have to be aborts.

2.2 Timestamp ordering technique

Timestamp is a monotonically increasing variable indicating the age of an operation or a transaction. The larger of the timestamp values indicate that more recent event or operation. Timestamp ordering technique concurrency mechanism were considered suitable for distributed database systems since transaction to be rolled back can be determined locally at each site. It involves a unique transaction timestamps in place of conventional locks. This technique is based on the idea that an operation is allowed to proceed only if all the conflicting operations of older transaction have already been processed. For instance, when a transaction accesses an item the system will check whether the transaction is older than the last one which is accesses the item. If this is the case that transaction proceeds, otherwise ordering is violated and the transaction is aborted. Besides that, the serializability of the transaction is preserved and requires knowledge about which transaction are younger than the others.

In the implementation of a distributed timing system, each site in the distributed system contains a local clock or a counter. This clock assumed to tick at least once between any two events. This event within the site is totally ordered. If the total ordering of event at different sites, it need to assigned a unique number to each site and the number is concatenated as least significant bits to the current value of local block. Moreover, each message contains the information about the local time of their site of origin at which the message is sent. There are many concurrency control method on timestamp ordering technique. One of the method is basic timestamp ordering (BTO) which is for each data item x, the largest timestamp of any write operation on data item x and the largest timestamp of any read operation on data item x which are denoted by R_TS (x) and W_TS (x) respectively.

The basic timestamp ordering technique is easy to distribute and the transaction to be aborted will immediately be recognized when the operation are being scheduled. This type of method will not deal with deadlock problem because the locks are not used and operations are blocked. Hence, an atomic commitment mechanism is necessary to provide the reliability.

3. Improvement of issue concurrency control

3. 1Deadlock resolution

The preceding implementation of two-phase locking makes the transaction to wait for unavailable locks. When the waiting is uncontrolled will cause a deadlock. The situation of deadlock can be characterized by wait-for graph, that graph indicates which transaction is waiting for which other transaction. There are three type of general technique are available for deadlock resolution that are deadlock detection, deadlock prevention and timeout strategies.

3.1.1 Deadlock Detection

Deadlock detection is a transaction wait for each other in an uncontrolled manner and is only aborted if a deadlock occurs. It can be detected by explicitly constructing a wait-for graph and searching it for cycles. Victim which is one transaction on the cycle will abort when cycle is found and thereby breaking the deadlock. There are a few victim selection criteria will be consider. First is the Current Blocker, this current blocker will pick the transaction that blocked the most recently. Secondly, Random Blocker which is a process of picks a transaction at random from the participants in the deadlock cycle. Third is the Min Locks that is to pick a transaction that is holding the fewest locks. Fourth is the Youngest which is picked the transaction with the most recent initial startup time. Lastly is the Min Work and responsible to pick the transaction that has consumed the least amount of physical resources (CPU + I/O time) since it first began running.

In order to minimize the cost of restarting the victim, usually the victim is based on the amount of resource that use by each of transaction on the cycle. Each of the two-phase locking’s scheduler can be easily construct the waits-for graph based on the waits-for relationships local to that scheduler. But this is not efficient to characterize all deadlocks in the distributed system. To increase the efficiency, more “global” wait-for graphs have to combine with local waits-for graph. For the centralized two-phase locking will not have this type of problem since there only consist of one scheduler.

In order to construct global waits-for graph, there consist of two techniques which are hierarchical and centralized deadlock detection. For the hierarchical deadlock detection, the database sites are organized into hierarchy with deadlock detector to each node of the hierarchy. Deadlock divided into many sites. Deadlock involve a single site are detected at that. Whereas deadlock involve more than two sites of the same region detected by the regional deadlock detector.

On the other hand, one site is designated the deadlock detector in the centralized deadlock detection. Every few minutes, each scheduler has to send its local wait-for graph to the deadlock detector. Then the deadlock detector combines the local graph into a system wide waits-for graph by constructing the union of the local graph. Although both of the technique mention above differ in detail but it involve periodic transmission of local waits-for information to one or more deadlock detector sites.

3.1.2Deadlock Prevention

Deadlock prevention is a “cautious” scheme in which transaction is restarted when the system is afraid that deadlock to be occurring. In the process of deadlock prevention, the scheduler will test the requesting transaction which is name (T1) and the transaction that currently owns by the lock (T2) when a lock request is denied. When T1 and T2 pass the test, T1 is permitted to wait for T2 as usual. Otherwise, one of the two is aborted. There are a few prevention algorithms that are Wound-Wait, Wait-Die, Immediate-Restart and Running Priority. In the Wait-Die prevention algorithm, if a lock request from transaction T1 leads to a conflict with another transaction T2, then T1 started time earlier than T2 and block the T1 otherwise will restarted T1. The deadlock prevention will know as nonpreemptive if T1 is restarted. By this technique deadlock are impossible since there is only one transaction can be blocked by a younger transaction.

In the Wound-Wait algorithm, if T1 started running before T2 then restarted T2 otherwise blocked T1. This type of algorithm is known as preemptive which is older transaction run though the system by killing any one that they conflict with and continues waiting onlt for older conflicting transaction. The scheduler must ensure that T1 wait for T2 so that deadlock cannot occur. There is a better approach can be assign priorities to transaction and to test priorities to decide whether T1 can wait for T2. If T1 has the lower priority than T2 then T1 will wait for T2. When they have same priorities, T1 cannot wait for T2 or vice versa. This test will prevents the deadlock to occur since every edge in the waits-for graph T1 has a lower priority than T2. In addition, the Immediate-Restart algorithm also will overcome the deadlock by simply restart T1 since there is no transaction is ever blocked.

Besides that, preordering of resources is a type of deadlock avoidance technique that used to avoid restarts altogether. It requires predeclaration of locks which mean that each transaction obtains all its locks before execution. The priority of transaction is the number of the highest number lock on it and the data item are numbered and each transaction request locks one at a time in numeric order. Although this techniques can avoid the deadlock occur but it forces locks to be obtained sequentially which is tends to increase response time.

3.1.3 Timeout strategic

In the timeout strategic, a transaction whose lock request cannot be granted is simply placed in the blocked queue. When the wait time exceeds some threshold value then the transaction will restarted. Timeout will restart transaction that involves in deadlocks in the detection strategies whereas in prevention strategic it may also restarted some transaction that are not involved in any cycle.

3.2 Strict 2 Phase Locking (S2 PL)

In order to prevent cascading problem and dirty read, Strict 2 Phase Locking mechanism is apply. The executed transaction holds all its lock to the every end until it is committed or aborts. For the dirty read, the executed transaction does not release any of exclusive locks until it commits or aborts. S2PL has important role in the two phase locking problems. In first time transaction it need a lock on a data item, acquires it and all locks of a transaction released together when the transaction terminates and has a few resource wasting. Figure 4 below show the S2PL mechanism.

Figure 4: S2PL mechanism

3.3 Ordering by Serialization Number (OSN) method

There exists a new method for concurrency control in distributed system which increases the level of concurrent execution of transaction and known as ordering by serialization number (OSN). The deadlock is prevented by this method since it works in the certifier mode and uses time interval technique in conjunction. In order to provide serializability, it combines with time interval technique with short term locks. Scheduler is distributed and the standard transaction execution policy is assumed. The read and write operations are issued continuously during transaction execution. The write operations are performed on the private copies of the data that issue by the transaction. The transaction is certified and its operation will applied to database when a validation test is passed by the transaction at the end of transaction. Otherwise, the transaction will restart if the test validation is not passed.

To find the serialization number for a transaction, the largest serialization number of certified transaction will read item x. The largest serialization number of certified transaction which have written item x will record along the data item x which is known as RSN(x) and WSN(x). There consist of four types of short term locks in this OSN method that are R-lock, W-lock, Certified Read lock, Certified Write lock. The R-lock and W-lock normally used when read or write the data item. It responsible to protect the data item forms two conflicting operations of concurrent transaction. Whereas Certified Read lock (CR-lock) and Certified Write lock (CW-lock) are used during the validation test while a transaction is searching for a valid time interval. These lock are held until either the transaction’s operation are applied to the database or the transaction is aborted.

The deadlock of short term locks is prevented by a few steps. Firstly, R-lock or W-lock is obtained at any order. Since transaction do not require any other lock before releasing a particular R-lock or W-lock deadlock does not appear. Secondly, employing the preordering for deadlock avoidance can prevent the deadlock of the CR-lock and CW-lock respectively. Although this method requires the access set of transaction to known in advance but this disadvantage eliminated in the OSN method. This is because CR-lock and CW-lock requires by the time of certification. Besides that, if a transaction requires both CR-lock and CW-lock on data item x the system will grants them both when both of these locks are requires at once.

4. Conclusion

In this study, the performance of concurrency control protocol on top of a comprehensively modeled high-speed network is evaluated. It found that there exist of some issue within the concurrency protocol such as deadlock and starvation in the two-phase locking. The weaknesses that occur in another concurrency protocol that is timestamp ordering.

By the way, there are also discussing about the effective way to make improvement of the concurrency control by using three types of deadlock resolution to solving the deadlock problem. The deadlock resolution included deadlock detection, deadlock prevention and timeout strategic. Furthermore, a precise definition of deadlock in term of existence interval of an edge is given and these help provide a better understanding of deadlock detection is distributed system. Besides, this paper also discuss the complete specification of a secure version of two phase locking protocol. The interaction between the protocol and the standard failure recovery procedures were discuss and modification to the commit and restart procedures were proposed.

5. Reference

Abdou R. Ali and Hany M. Harb,” Two Phase Locking Concurrency Control in Distributed Database with N-Tier architecture”, 2004.

S. M. Wu and D. L. Pan, “FUTURE TREND ON CONCURRENCY CONTROL IN DISTRIBUTED DATABASES SYSTEM DESIGN “, 1988.

Sang H. Son and Rasikan David, “Design and Analysis of A Secure Two-Phase Locking Protocol”, 1994

Shapour Joudi Begdillo, Fariborz Mahmoudi and Mehdi Asadi, “Improving Strict 2 Phase Locking (S2PL) in Transactions Concurrency Control”, 2007

PHILIP A. BERNSTEIN AND NATHAN GOODMAN, “Concurrency Control in Distributed Database Systems”, June 1981

RAKESH AGRAWAL, MICHAEL J. CAREY, MEMBER, IEEE, AND LAWRENCE W. McVOY, ” The Performance of Alternative Strategies for Dealing with Deadlocks in Database Management Systems”, 1987

UGUR HALICI AND ASUMAN DOGAC,MEMBER,IEEE, “Concurrency Control in Distributed Databases Through Time Intervals and Short-Term Locks”,1989

Order Now