CSS 430 Book Notes Chapter 8 Deadlocks

8.1 System Model

When a waiting thread can never change state because a resource it requested is held by other waiting threads, the situation is called a deadlock.

Info

deadlock: The state in which two processes or threads are stuck waiting for an event that can only be caused by one of the processes or threads.

A system consists of a finite number of resources to be distributed among a number of competing threads.

The resources may be partitioned into several types, each consisting of some number of identical instances. For instance, if a system has four CPUs, then the resource type CPU has four instances.

If a thread requests an instance of a resource type, the allocation of any instance of the type should satisfy the request. If it does not, then the instances are not identical, and the resource type classes have not been defined properly.
A thread must request a resource before using it and must release the resource after using it. A thread may request as many resources as it requires to carry out its designated task. Obviously, the number of resources requested may not exceed the total number of resources available in the system. In other words, a thread cannot request two network interfaces if the system has only one.

Under the normal mode of operation, a thread may utilize a resource in only the following sequence:

  1. Request. The thread requests the resource. If the request cannot be granted immediately (for example, if a mutex lock is currently held by another thread), then the requesting thread must wait until it can acquire the resource.
  2. Use. The thread can operate on the resource (for example, if the resource is a mutex lock, the thread can access its critical section).
  3. Release. The thread releases the resource.
    Developers of multithreaded applications must remain aware of the possibility of deadlocks. The locking tools presented in the chapter Synchronization Tools are designed to avoid race conditions. However, in using these tools, developers must pay careful attention to how locks are acquired and released. Otherwise, deadlock can occur, as described next.

8.2 Deadlock in multithreaded applications

Here we illustrate an example of deadlock using Pthreads.

Two mutex locks are initialized:

pthread_mutex_t first_mutex;
pthread_mutex_t second_mutex;
 
pthread_mutex_init(&first_mutex,NULL);
pthread_mutex_init(&second_mutex,NULL);

Next, two threads, thread_one and thread_two, are created, and both have access to both mutex locks. thread_one and thread_two run in the functions do_work_one() and do_work_two(), respectively.

/* thread_one runs in this function */
void *do_work_one(void *param)
{
   pthread_mutex_lock(&first_mutex);
   pthread_mutex_lock(&second_mutex);
   /**
     * Do some work
     */
   pthread_mutex_unlock(&second_mutex);
   pthread_mutex_unlock(&first_mutex);
 
   pthread_exit(0);
}
 
/* thread_two runs in this function */
void *do_work_two(void *param)
{
   pthread_mutex_lock(&second_mutex);
   pthread_mutex_lock(&first_mutex);
   /**
    * Do some work
    */
   pthread_mutex_unlock(&first_mutex);
   pthread_mutex_unlock(&second_mutex);
 
   pthread_exit(0);
}

Here, thread_one attempts to acquire first_mutex and then second_mutex, while thread_two attempts to acquire second_mutex and then first_mutex. If thread_one acquires first_mutex while thread_two acquires second_mutex, deadlock occurs: each thread waits forever for the lock the other holds.

Livelock

Livelock is similar to deadlock in that it prevents two or more threads from proceeding, but it occurs when a thread continuously attempts an action that fails. (Think of two people trying to pass each other in a hallway who keep stepping to the same side.)

/* thread_one runs in this function */
void *do_work_one(void *param)
{
   int done = 0;
 
   while (!done) {
      pthread_mutex_lock(&first_mutex);
      if (pthread_mutex_trylock(&second_mutex) == 0) { /* 0 means the lock was acquired */
         /**
          * Do some work
          */
         pthread_mutex_unlock(&second_mutex);
         pthread_mutex_unlock(&first_mutex);
         done = 1;
     }
     else
         pthread_mutex_unlock(&first_mutex);
   }
   pthread_exit(0);
}
 
/* thread_two runs in this function */
void *do_work_two(void *param)
{
   int done = 0;
 
   while (!done) {
     pthread_mutex_lock(&second_mutex);
     if (pthread_mutex_trylock(&first_mutex) == 0) { /* 0 means the lock was acquired */
         /**
          * Do some work
          */
         pthread_mutex_unlock(&first_mutex);
         pthread_mutex_unlock(&second_mutex);
         done = 1;
     }
     else
         pthread_mutex_unlock(&second_mutex);
   }
 
   pthread_exit(0);
}

8.3 Deadlock Characterization

Necessary Conditions

A deadlock situation can arise if the following four conditions hold simultaneously in a system:

  1. Mutual Exclusion: At least one resource must be held in a nonsharable mode; that is, only one thread at a time can use the resource. If another thread requests that resource, the requesting thread must be delayed until the resource has been released.
  2. Hold and Wait: A thread must be holding at least one resource and waiting to acquire additional resources that are currently being held by other threads.
  3. No preemption: Resources cannot be preempted. IE, a resource can be released only voluntarily by the thread holding it after the thread completes its task
  4. Circular Wait: A set {T0, T1, ..., TN} of waiting threads must exist such that T0 is waiting for a resource held by T1, T1 is waiting for a resource held by T2, ..., TN−1 is waiting for a resource held by TN, and TN is waiting for a resource held by T0.

In order for deadlock to occur, all four of the above conditions must hold simultaneously.

Resource Allocation Graph

Deadlocks can be described more precisely in terms of a directed graph called a system resource-allocation graph.

This graph consists of a set of vertices V and a set of edges E. The set of vertices V is partitioned into two different types of nodes: T = {T1, T2, ..., TN}, the set consisting of all the active threads in the system, and R = {R1, R2, ..., RM}, the set consisting of all resource types in the system.

A directed edge from thread Ti to resource type Rj is denoted by Ti → Rj; it signifies that thread Ti has requested an instance of resource type Rj and is currently waiting for that resource. A directed edge from resource type Rj to thread Ti is denoted by Rj → Ti; it signifies that an instance of resource type Rj has been allocated to thread Ti. A directed edge Ti → Rj is called a request edge, and a directed edge Rj → Ti is called an assignment edge.

Generally, we represent each thread Ti as a circle and each resource type Rj as a rectangle.

Since resource type Rj may have more than one instance, we represent each instance as a dot within the rectangle. A request edge points only to the rectangle, whereas an assignment edge points from one of the dots (instances) within it.
Pasted image 20230213192632.png

  1. thread_one acquires first_mutex.
  2. thread_two acquires second_mutex.
  3. thread_one wants second_mutex, which could be freed up by thread_two.
  4. thread_two wants first_mutex, and as it is still holding second_mutex which thread_one is waiting for, there is a cycle in the graph: first_mutex → T1 → second_mutex → T2 → first_mutex.

When thread Ti requests an instance of resource type Rj, a request edge is inserted in the resource allocation graph. When this request can be fulfilled, the request edge is instantaneously transformed to an assignment edge.
Once a thread finishes with a resource, the assignment edge is deleted.

Example

The sets T, R, and E:
- T = {T1, T2, T3}
- R = {R1, R2, R3, R4}
- E = {T1 → R1, T2 → R3, R1 → T2, R2 → T2, R2 → T1, R3 → T3}

Resource Instances:
- One instance of resource type R1
- Two instances of resource type R2
- One instance of resource type R3
- Three instances of resource type R4

Thread States:
- Thread T1 is holding an instance of resource type R2 and is waiting for an instance of R1.
- Thread T2 is holding an instance of R1 and an instance of R2, and is waiting for an instance of R3.
- Thread T3 is holding an instance of R3.

If the graph contains no cycles, then no thread in the system is deadlocked. If the graph does contain a cycle, then a deadlock may exist.

If each resource type has exactly one instance, then a cycle implies that a deadlock has occurred. If the cycle involves only a set of resource types, each of which has only a single instance, then a deadlock has occurred. Each thread involved in the cycle is deadlocked. In this case, a cycle in the graph is both a necessary and a sufficient condition for the existence of deadlock.

If we take the above graph and add an edge T3 → R2 (that is, T3 requests an instance of R2), the graph then contains two cycles:

  1. T1 → R1 → T2 → R3 → T3 → R2 → T1
  2. T2 → R3 → T3 → R2 → T2

Threads T1, T2, and T3 are deadlocked. T2 is waiting for R3, which is held by T3. T3 is waiting for T1 or T2 to release an instance of R2. Additionally, T1 is waiting for T2 to release R1.
Pasted image 20230213194400.png

In the following graph, however, there is no deadlock: although there is a cycle, T4 may release its instance of R2, which can then be allocated to T3, breaking the cycle.

Methods for handling deadlocks

We can deal with deadlocks in one of three ways:

  1. We can ignore the problem altogether and pretend that deadlocks never occur in the system.
  2. We can use a protocol to prevent or avoid deadlocks, ensuring that the system will never enter a deadlocked state.
  3. We can allow the system to enter a deadlocked state, detect it, and recover.
    Most operating systems (including Windows and Linux) use option 1.

Deadlock prevention provides a set of methods to ensure that at least one of the necessary conditions (Section Necessary conditions) cannot hold. These methods prevent deadlocks by constraining how requests for resources can be made.

Deadlock avoidance requires that the operating system be given additional information in advance concerning which resources a thread will request and use during its lifetime. With this additional knowledge, the operating system can decide for each request whether or not the thread should wait.

8.5 Deadlock Prevention

If we ensure that at least one of the four necessary conditions for deadlock (mutual exclusion, hold and wait, no preemption, and circular wait) cannot hold, we can prevent deadlock from occurring.

Mutual exclusion

In general, we cannot prevent deadlocks by denying the mutual-exclusion condition, because some resources are intrinsically nonsharable. For example, a mutex lock cannot be simultaneously shared by several threads.

Hold and Wait

To ensure that the hold-and-wait condition never occurs in the system, we must guarantee that, whenever a thread requests a resource, it does not hold any other resources.
One way we can do this is to require that each thread request and be allocated all its resources before it begins execution; however, this is difficult to implement in practice.

Another method allows a thread to request resources only when it has none: a thread may request some resources and use them, but it must release all of them before requesting more.
Both these protocols have two main disadvantages. First, resource utilization may be low, since resources may be allocated but unused for a long period. For example, a thread may be allocated a mutex lock for its entire execution, yet only require it for a short duration. Second, starvation is possible. A thread that needs several popular resources may have to wait indefinitely, because at least one of the resources that it needs is always allocated to some other thread.

No preemption

If a thread is holding some resources and requests another resource that cannot be immediately allocated to it (that is, the thread must wait), then all resources the thread is currently holding are preempted. In other words, these resources are implicitly released. The preempted resources are added to the list of resources for which the thread is waiting. The thread will be restarted only when it can regain its old resources, as well as the new ones that it is requesting.
Alternatively, if a thread requests some resources, we first check whether they are available. If they are, we allocate them. If they are not, we check whether they are allocated to some other thread that is waiting for additional resources. If so, we preempt the desired resources from the waiting thread and allocate them to the requesting thread. If the resources are neither available nor held by a waiting thread, the requesting thread must wait. While it is waiting, some of its resources may be preempted, but only if another thread requests them. A thread can be restarted only when it is allocated the new resources it is requesting and recovers any resources that were preempted while it was waiting.

Circular Wait

Consider the set R = {R1, R2, ..., RM} of resource types. We assign each resource type a unique integer, which allows us to compare two resources and determine whether one precedes another in our ordering. Formally, we define a one-to-one function F: R → N, where N is the set of natural numbers.

We can accomplish this scheme in an application program by developing an ordering among all synchronization objects in the system. For example, the lock ordering in the Pthread program shown below could be:

F(first_mutex) = 1;
F(second_mutex) = 5;

We can now use the following protocol to prevent deadlock: each thread can request resources only in an increasing order of enumeration. That is, a thread can initially request an instance of a resource type, say Ri; after that, the thread can request an instance of Rj if and only if F(Rj) > F(Ri).

Using the function defined above, a thread which wants to use both first_mutex and second_mutex at the same time must first request first_mutex and then second_mutex.
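Applied to the Pthreads example from earlier, the protocol simply means that every thread acquires the two locks in the same F order. A minimal sketch (the single shared do_work() function is illustrative, not from the text):

```c
#include <pthread.h>

pthread_mutex_t first_mutex  = PTHREAD_MUTEX_INITIALIZER;  /* lower F value  */
pthread_mutex_t second_mutex = PTHREAD_MUTEX_INITIALIZER;  /* higher F value */

/* Every thread runs this function, so the locks are always acquired
 * in increasing F order: first_mutex, then second_mutex. A circular
 * wait between the two locks is therefore impossible. */
void *do_work(void *param)
{
   pthread_mutex_lock(&first_mutex);
   pthread_mutex_lock(&second_mutex);
   /* Do some work while holding both locks */
   pthread_mutex_unlock(&second_mutex);
   pthread_mutex_unlock(&first_mutex);
   pthread_exit(0);
}
```

Contrast this with do_work_one()/do_work_two() above, where the two threads disagreed on the acquisition order.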

Alternatively, we can require that a thread requesting an instance of Rj must have released any resources Ri such that F(Ri) ≥ F(Rj). Note also that if several instances of the same resource type are needed, a single request for all of them must be issued.

If these two protocols are used, then the circular-wait condition cannot hold. We can demonstrate this by supposing that a circular wait exists. Let the set of threads involved in the circular wait be {T0, T1, ..., TN}, where Ti is waiting for a resource Ri that is held by thread Ti+1 (modulo arithmetic is used on the indexes, so TN is waiting for resource RN held by T0). Then, since thread Ti+1 is holding resource Ri while requesting resource Ri+1, we must have F(Ri) < F(Ri+1) for all i. But this condition means that F(R0) < F(R1) < ... < F(RN) < F(R0). Since F(R0) cannot be strictly less than itself, we have a contradiction, so a circular wait cannot hold.

It is also important to note that imposing a lock ordering does not guarantee deadlock prevention if locks can be acquired dynamically. For example, assume we have a function that transfers funds between two accounts. To prevent a race condition, each account has an associated mutex lock that is obtained from a get_lock() function such as that shown below. Deadlock is possible if two threads simultaneously invoke the transaction() function with the accounts transposed; that is, one thread might invoke transaction(checking_account, savings_account, 25) while another thread invokes transaction(savings_account, checking_account, 50).

Deadlock with lock ordering
void transaction(Account from, Account to, double amount)
{
  mutex lock1, lock2;
  lock1 = get_lock(from);
  lock2 = get_lock(to);
 
  acquire(lock1);
    acquire(lock2);
 
      withdraw(from, amount);
      deposit(to, amount);
 
    release(lock2);
  release(lock1);
}
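A common way to repair this (an illustrative fix, not from the text) is to derive the lock order from the locks themselves, for example by comparing their addresses, so that both transposed calls acquire the two account locks in the same order:

```c
#include <pthread.h>
#include <stdint.h>

typedef struct {
   pthread_mutex_t lock;
   double balance;
} Account;

/* Always lock the account whose mutex has the lower address first.
 * transaction(a, b, x) and transaction(b, a, y) then agree on the
 * acquisition order, so they cannot deadlock against each other. */
void transaction(Account *from, Account *to, double amount)
{
   pthread_mutex_t *lock1 = &from->lock;
   pthread_mutex_t *lock2 = &to->lock;
   if ((uintptr_t)lock1 > (uintptr_t)lock2) {   /* order by address */
      pthread_mutex_t *tmp = lock1;
      lock1 = lock2;
      lock2 = tmp;
   }
   pthread_mutex_lock(lock1);
   pthread_mutex_lock(lock2);

   from->balance -= amount;
   to->balance   += amount;

   pthread_mutex_unlock(lock2);
   pthread_mutex_unlock(lock1);
}
```

The address comparison restores a total order over dynamically chosen locks, which is exactly what the static F ordering could not provide here.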

8.6 Deadlock Avoidance

An alternative method for avoiding deadlocks is to require additional information about how resources are to be requested. For example, in a system with resources R1 and R2, the system might need to know that thread P will request first R1 and then R2 before releasing both resources, whereas thread Q will request R2 and then R1.

With this knowledge of the complete sequence of requests and releases for each thread, the system can decide for each request whether or not the thread should wait in order to avoid a possible future deadlock. Each request requires that in making this decision the system consider the resources currently available, the resources currently allocated to each thread, and the future requests and releases of each thread.

The simplest and most useful model requires that each thread declare the maximum number of resources of each type that it may need. Given this, it's possible to create an algorithm that ensures that the system will never enter a deadlocked state.

Deadlock-avoidance algorithms dynamically examine the resource-allocation state to ensure that a circular-wait condition can never exist. The resource-allocation state is defined by the number of available and allocated resources and the maximum demands of the threads.

Safe State

A state is safe if the system can allocate resources to each thread (up to its maximum) in some order and still avoid a deadlock.

A system is in a safe state only if there exists a safe sequence. A sequence of threads <T1, T2, ..., TN> is a safe sequence for the current allocation state if, for each Ti, the resource requests that Ti can still make can be satisfied by the currently available resources plus the resources held by all Tj with j < i. If the resources that Ti needs are not immediately available, then Ti can wait until all Tj have finished. When they have finished, Ti can obtain all of its needed resources, complete its task, and terminate. When Ti terminates, Ti+1 can obtain its needed resources, and so on.

An unsafe state may lead to a deadlock, but doesn't necessarily do so.

Safe, Unsafe, and Deadlocked State Spaces

Pasted image 20230214145832.png

For example, let's say we have 12 resources and three threads: T0, T1, and T2. Thread T0 requires a maximum of 10 resources, T1 a maximum of 4, and T2 a maximum of 9. Suppose that at time t0, thread T0 is holding 5 resources, T1 is holding 2, and T2 is holding 2, leaving 3 free resources (12 − (5 + 2 + 2) = 3).

     Maximum Needs   Current Needs
T0   10              5
T1   4               2
T2   9               2

At time t0, the system is in a safe state: the sequence <T1, T0, T2> satisfies the safety condition. Thread T1 can be allocated its remaining 2 resources and complete its task, at which point there will be 5 available resources. Thread T0 can then get its necessary 5 resources, complete its task, and return all 10, giving us 10 available resources. Finally, T2 can acquire its needed resources, complete its task, and return them, giving the system all 12 resources back.

A system can go from a safe state to an unsafe one very easily. For instance, if at time t1 thread T2 requests and is allocated one more resource, the system is no longer safe: after T1 completes and returns its resources, only 4 will be available, which is not enough for T0 (which needs 5 more) or T2 (which now needs 6 more). Both would have to wait, resulting in deadlock. The mistake was granting the additional resource to T2; instead, we should have made T2 wait until one of the other threads finished and released its resources.

Resource-Allocation-Graph Algorithm

If we have a resource-allocation system with only one instance of each resource type, we can use a variant of the resource-allocation graph for deadlock avoidance. In addition to the request and assignment edges already described, we introduce a new type of edge, called a claim edge. A claim edge Ti → Rj indicates that thread Ti may request resource Rj at some time in the future. This edge resembles a request edge in direction but is represented in the graph by a dashed line. When thread Ti requests resource Rj, the claim edge Ti → Rj is converted to a request edge. Similarly, when resource Rj is released by Ti, the assignment edge Rj → Ti is reconverted to a claim edge Ti → Rj.

The resources must be claimed a priori in the system; that is, before thread Ti starts executing, all its claim edges must already appear in the resource-allocation graph. We can relax this condition by allowing a claim edge Ti → Rj to be added to the graph only if all the edges associated with thread Ti are claim edges.

Suppose that thread Ti requests resource Rj. The request can be granted only if converting the request edge Ti → Rj to an assignment edge Rj → Ti does not result in the formation of a cycle in the resource-allocation graph. We check for safety by using a cycle-detection algorithm; such an algorithm requires on the order of n² operations, where n is the number of threads in the system.

If no cycle exists, the allocation of the resource will leave the system in a safe state. If a cycle is found, the allocation will put the system in an unsafe state; in that case, thread Ti will have to wait for its request to be satisfied.

Example:

Pasted image 20230214152919.png
Suppose that T2 requests R2. Although R2 is currently free, we cannot allocate it to T2, since this action would create a cycle in the graph. A cycle indicates that the system is in an unsafe state: if T1 then requests R2 and T2 requests R1, a deadlock will occur.

An unsafe state in the resource-allocation graph

Pasted image 20230214153408.png

Banker's Algorithm

The resource-allocation-graph algorithm is not applicable to a resource-allocation system with multiple instances of each resource type. The deadlock-avoidance algorithm that we describe next is applicable to such a system but is less efficient than the resource-allocation graph scheme. This algorithm is commonly known as the banker's algorithm.
When a new thread enters the system, it must declare the maximum number of instances of each resource type that it may need. This number may not exceed the total number of resources in the system. When a user requests a set of resources, the system must determine whether the allocation of these resources will leave the system in a safe state. If it will, the resources are allocated; otherwise, the thread must wait until some other thread releases enough resources.

To implement the banker's algorithm, we need several data structures that encode the state of the resource-allocation system. In the following, n is the number of threads in the system and m is the number of resource types:

- Available: A vector of length m indicating the number of available instances of each resource type. If Available[j] == k, then k instances of resource type Rj are available.
- Max: An n × m matrix defining the maximum demand of each thread. If Max[i][j] == k, then thread Ti may request at most k instances of resource type Rj.
- Allocation: An n × m matrix defining the number of instances of each resource type currently allocated to each thread. If Allocation[i][j] == k, then thread Ti is currently allocated k instances of Rj.
- Need: An n × m matrix indicating the remaining resource need of each thread. If Need[i][j] == k, then thread Ti may need k more instances of Rj. Note that Need[i][j] = Max[i][j] − Allocation[i][j].

To simplify the presentation of the banker's algorithm, we establish some notation. Let X and Y be vectors of length n. We say that X ≤ Y if and only if X[i] ≤ Y[i] for all i = 1, 2, ..., n. For example, if X = (1, 7, 3, 2) and Y = (0, 3, 2, 1), then Y ≤ X. In addition, Y < X if Y ≤ X and Y ≠ X.
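The element-wise ≤ comparison can be written as a small helper; a sketch (the function name and int-vector representation are illustrative):

```c
#include <stdbool.h>

/* X <= Y if and only if X[i] <= Y[i] for every component i. */
bool vec_leq(const int *x, const int *y, int n)
{
   for (int i = 0; i < n; i++)
      if (x[i] > y[i])
         return false;
   return true;
}
```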

We can treat each row in the matrices Allocation and Need as a vector and refer to them as Allocationi and Needi. The vector Allocationi specifies the resources currently allocated to thread Ti; the vector Needi specifies the additional resources that thread Ti may still request to complete its task.

Safety Algorithm

We can now present the algorithm for finding out whether or not a system is in a safe state. This algorithm can be described as follows:

  1. Let Work and Finish be vectors of length m and n, respectively. Initialize Work = Available and Finish[i] = false for i = 0, 1, …, n− 1.

  2. Find an index i such that both

    1. Finish[i] == false

    2. Needi ≤ Work
      If no such i exists, go to step 4.

  3. Work = Work + Allocationi
     Finish[i] = true
     Go to step 2.

  4. If Finish[i] == true for all i, then the system is in a safe state.

This algorithm may require on the order of m × n² operations to determine whether a state is safe.
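The four steps above can be sketched directly in C (the NUM_THREADS/NUM_RESOURCES sizes and array-based state representation are assumptions for illustration):

```c
#include <stdbool.h>

#define NUM_THREADS 5
#define NUM_RESOURCES 3

/* Returns true if the state described by available, allocation, and
 * need is safe, i.e. some ordering lets every thread finish. */
bool is_safe(int available[NUM_RESOURCES],
             int allocation[NUM_THREADS][NUM_RESOURCES],
             int need[NUM_THREADS][NUM_RESOURCES])
{
   int work[NUM_RESOURCES];
   bool finish[NUM_THREADS] = { false };

   /* Step 1: Work = Available, Finish[i] = false */
   for (int j = 0; j < NUM_RESOURCES; j++)
      work[j] = available[j];

   for (;;) {
      /* Step 2: find an unfinished thread with Need_i <= Work */
      int i;
      for (i = 0; i < NUM_THREADS; i++) {
         if (finish[i])
            continue;
         int j;
         for (j = 0; j < NUM_RESOURCES; j++)
            if (need[i][j] > work[j])
               break;
         if (j == NUM_RESOURCES)
            break;                      /* candidate found */
      }
      if (i == NUM_THREADS)
         break;                         /* no such i: go to step 4 */

      /* Step 3: pretend T_i runs to completion, reclaim its resources */
      for (int j = 0; j < NUM_RESOURCES; j++)
         work[j] += allocation[i][j];
      finish[i] = true;
   }

   /* Step 4: the state is safe iff every thread could finish */
   for (int i = 0; i < NUM_THREADS; i++)
      if (!finish[i])
         return false;
   return true;
}
```

The outer loop realizes steps 2 and 3: repeatedly find a thread whose remaining need fits in Work, pretend it runs to completion, and reclaim its allocation.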

Resource Request Algorithm

Next, we describe the algorithm for determining whether requests can be safely granted. Let Requesti be the request vector for thread Ti. If Requesti[j] == k, then thread Ti wants k instances of resource type Rj. When a request for resources is made by thread Ti, the following actions are taken:

  1. If Requesti ≤ Needi, go to step 2. Otherwise, raise an error condition, since the thread has exceeded its maximum claim.
  2. If Requesti ≤ Available, go to step 3. Otherwise, Ti must wait, since the resources are not available.
  3. Have the system pretend to have allocated the requested resources to thread Ti by modifying the state as follows:

Available = Available − Requesti
Allocationi = Allocationi + Requesti
Needi = Needi − Requesti

If the resulting resource-allocation state is safe, the transaction is completed, and thread Ti is allocated its resources. However, if the new state is unsafe, then Ti must wait for Requesti, and the old resource-allocation state is restored.
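Step 3's tentative allocation, and the rollback used when the resulting state is unsafe, are just vector arithmetic on the requesting thread's state; a sketch (the M size and function names are illustrative):

```c
#define M 3   /* number of resource types (illustrative) */

/* Step 3: tentatively allocate request[] to one thread by updating
 * Available, Allocation_i, and Need_i in place. */
void pretend_allocate(int available[M], int allocation[M], int need[M],
                      const int request[M])
{
   for (int j = 0; j < M; j++) {
      available[j]  -= request[j];
      allocation[j] += request[j];
      need[j]       -= request[j];
   }
}

/* Restore the old state if the new one turns out to be unsafe. */
void undo_allocate(int available[M], int allocation[M], int need[M],
                   const int request[M])
{
   for (int j = 0; j < M; j++) {
      available[j]  += request[j];
      allocation[j] -= request[j];
      need[j]       += request[j];
   }
}
```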

Example:

To illustrate the use of the banker's algorithm, consider a system with five threads, T0 through T4, and three resource types, A,B, and C. Resource type A has ten instances, resource type B has five instances, and resource type C has seven instances. The following snapshot represents the state of the system.

     Allocation   Max     Available
     A B C        A B C   A B C
T0   0 1 0        7 5 3   3 3 2
T1   2 0 0        3 2 2
T2   3 0 2        9 0 2
T3   2 1 1        2 2 2
T4   0 0 2        4 3 3

The content of the matrix Need is defined to be Max − Allocation and is as follows:

     Need
     A B C
T0   7 4 3
T1   1 2 2
T2   6 0 0
T3   0 1 1
T4   4 3 1

We claim that the system is currently in a safe state. Indeed, the sequence <T1, T3, T4, T2, T0> satisfies the safety criteria. Suppose now that thread T1 requests one additional instance of resource type A and two instances of resource type C, so Request1 = (1, 0, 2). To decide whether this request can immediately be granted, we first check that Request1 ≤ Available, that is, that (1, 0, 2) ≤ (3, 3, 2), which is true. We then pretend that this request has been fulfilled, and we arrive at the following new state:

     Allocation   Need    Available
     A B C        A B C   A B C
T0   0 1 0        7 4 3   2 3 0
T1   3 0 2        0 2 0
T2   3 0 2        6 0 0
T3   2 1 1        0 1 1
T4   0 0 2        4 3 1

We must determine whether this new system state is safe. To do so, we execute our safety algorithm and find that the sequence <T1, T3, T4, T2, T0> satisfies the safety requirement. Hence, we can immediately grant the request of thread T1.

However, we can see that when the system is in this state, a request for (3, 3, 0) by T4 cannot be granted, since the resources are not available. Furthermore, a request for (0, 2, 0) by T0 cannot be granted even though the resources are available, since the resulting state would be unsafe.

8.7 Deadlock Detection

If a system does not employ either a deadlock-prevention or a deadlock-avoidance algorithm, then a deadlock situation may occur. In this environment, the operating system may provide: an algorithm that examines the state of the system to determine whether a deadlock has occurred, and an algorithm to recover from the deadlock.

Single instance of each resource type

If all resources have only a single instance, we can define a deadlock detection algorithm that uses a variant of the resource allocation graph called a wait-for graph. We obtain this graph from the resource allocation graph by removing the resource nodes and collapsing the appropriate edges.

An edge from Ti to Tj in a wait-for graph implies that thread Ti is waiting for thread Tj to release a resource that Ti needs. An edge Ti → Tj exists in a wait-for graph if and only if the corresponding resource-allocation graph contains two edges, Ti → Rq and Rq → Tj, for some resource Rq.

Resource allocation graph(a) and corresponding wait-for graph(b)

Pasted image 20230215204544.png

A deadlock exists in the system if and only if the wait-for graph contains a cycle. To detect deadlocks, the system needs to maintain the wait-for graph and periodically invoke an algorithm that searches for a cycle in the graph. This is an O(n²) operation, where n is the number of vertices in the graph.
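The cycle search over the wait-for graph can be sketched as a depth-first traversal (the adjacency-matrix representation and the size N are assumptions, not from the text):

```c
#include <stdbool.h>

#define N 4   /* number of threads (illustrative) */

/* wait_for[i][j] is true if thread i waits for thread j.
 * A back edge to a node on the current DFS stack is a cycle. */
static bool dfs(bool wait_for[N][N], int u, bool visited[N], bool on_stack[N])
{
   visited[u] = on_stack[u] = true;
   for (int v = 0; v < N; v++) {
      if (!wait_for[u][v])
         continue;
      if (on_stack[v])
         return true;                   /* back edge: cycle found */
      if (!visited[v] && dfs(wait_for, v, visited, on_stack))
         return true;
   }
   on_stack[u] = false;
   return false;
}

/* True iff the wait-for graph contains a cycle, i.e. a deadlock. */
bool has_deadlock(bool wait_for[N][N])
{
   bool visited[N] = { false }, on_stack[N] = { false };
   for (int u = 0; u < N; u++)
      if (!visited[u] && dfs(wait_for, u, visited, on_stack))
         return true;
   return false;
}
```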

Several instances of a resource type

The wait-for graph scheme is not applicable to a resource-allocation system with multiple instances of each resource type. However, we can use an algorithm similar to the banker's algorithm to handle this case.

  1. Let Work and Finish be vectors of length m and n, respectively. Initialize Work = Available. For i = 0, 1, ..., n − 1, if Allocationi ≠ 0, then Finish[i] = false; otherwise, Finish[i] = true.
  2. Find an index i such that both
    • Finish[i] == false
    • Requesti ≤ Work
      If no such i exists, go to step 4.
  3. Work = Work + Allocationi
     Finish[i] = true
     Go to step 2.
  4. If Finish[i] == false for some i, 0 ≤ i < n, then the system is in a deadlocked state. Moreover, if Finish[i] == false, then thread Ti is deadlocked.

This algorithm requires on the order of m × n² operations to detect whether the system is in a deadlocked state.
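The detection algorithm can be sketched much like the safety algorithm, substituting Request for Need and seeding Finish from Allocation (the NT/NR sizes and the deadlocked[] output array are illustrative):

```c
#include <stdbool.h>

#define NT 5   /* number of threads (illustrative) */
#define NR 3   /* number of resource types (illustrative) */

/* Marks deadlocked[i] = true for each thread that cannot finish.
 * Returns true if any thread is deadlocked. */
bool detect_deadlock(int available[NR],
                     int allocation[NT][NR],
                     int request[NT][NR],
                     bool deadlocked[NT])
{
   int work[NR];
   bool finish[NT];

   for (int j = 0; j < NR; j++)          /* step 1: Work = Available */
      work[j] = available[j];

   for (int i = 0; i < NT; i++) {        /* threads holding nothing finish */
      bool holds = false;
      for (int j = 0; j < NR; j++)
         if (allocation[i][j] != 0)
            holds = true;
      finish[i] = !holds;
   }

   bool progress = true;
   while (progress) {                    /* steps 2-3 */
      progress = false;
      for (int i = 0; i < NT; i++) {
         if (finish[i])
            continue;
         int j;
         for (j = 0; j < NR; j++)
            if (request[i][j] > work[j])
               break;
         if (j < NR)
            continue;                    /* Request_i exceeds Work */
         for (j = 0; j < NR; j++)        /* reclaim T_i's allocation */
            work[j] += allocation[i][j];
         finish[i] = true;
         progress = true;
      }
   }

   bool any = false;                     /* step 4 */
   for (int i = 0; i < NT; i++) {
      deadlocked[i] = !finish[i];
      if (deadlocked[i])
         any = true;
   }
   return any;
}
```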

Recovery From Deadlock

There are two options for breaking a deadlock. One is simply to abort one or more threads to break the circular wait. The other is to preempt some resources from one or more of the deadlocked threads.

Process and thread termination

Elimination by aborting a process or thread can use one of two methods: abort all deadlocked processes, or abort one process at a time until the deadlock cycle is eliminated. In both methods, the system reclaims all resources allocated to the terminated processes.

Aborting a process may have repercussions. If the process was, say, updating a file, it may leave the file in an incorrect state. Additionally, if a process was holding a mutex, the system must restore the status of the mutex as being available.

If aborting one process at a time, the system must determine which process or processes should be terminated. Ideally, the system should abort the process whose termination will incur the minimum cost; unfortunately, many factors play into what constitutes minimum cost, including:

  1. What the priority of the process is
  2. How long the process has computed and how much longer the process will compute before completing its designated task
  3. How many and what types of resources the process has used (for example, whether the resources are simple to preempt)
  4. How many more resources the process needs in order to complete
  5. How many processes will need to be terminated

Resource Preemption

To eliminate deadlocks using resource preemption, we successively preempt some resources from processes and give these resources to other processes until the deadlock cycle is broken.

If preemption is required to deal with deadlocks, then three issues need to be addressed:

  1. Selecting a victim. Which resources and which processes are to be preempted? As in process termination, we must determine the order of preemption to minimize cost. Cost factors may include such parameters as the number of resources a deadlocked process is holding and the amount of time the process has thus far consumed.
  2. Rollback.
    If we preempt a resource from a process, what should be done with that process? Clearly, it cannot continue with its normal execution; it is missing some needed resource. We must roll back the process to some safe state and restart it from that state.
    Since, in general, it is difficult to determine what a safe state is, the simplest solution is a total rollback: abort the process and then restart it. Although it is more effective to roll back the process only as far as necessary to break the deadlock, this method requires the system to keep more information about the state of all running processes.
  3. Starvation.
    How do we ensure that starvation will not occur? That is, how can we guarantee that resources will not always be preempted from the same process?
    In a system where victim selection is based primarily on cost factors, it may happen that the same process is always picked as a victim. As a result, this process never completes its designated task, a starvation situation any practical system must address. Clearly, we must ensure that a process can be picked as a victim only a (small) finite number of times. The most common solution is to include the number of rollbacks in the cost factor.