1 of 82

CSE 451

Operating Systems

L12 - Deadlocks

Slides by: Tom Anderson

Baris Kasikci

2 of 82

Dining Lawyers

Each lawyer needs two chopsticks to eat. Each grabs the chopstick on their right first.

3 of 82

Necessary Conditions for Deadlock

  • Limited access to resources
    • If infinite resources, no deadlock!
  • No preemption
    • If resources are virtual, can break deadlock
  • Multiple independent requests
    • “wait while holding”
  • Circular chain of requests

4 of 82

Question

  • Does Dining Lawyers meet the necessary conditions for deadlock?
    • Limited access to resources
    • No preemption
    • Multiple independent requests (wait while holding)
    • Circular chain of requests

  • Can we modify Dining Lawyers to prevent deadlock?

5 of 82

Preventing Deadlock

  • Exploit or limit program behavior
    • Keep the program from doing anything that might lead to deadlock
  • Predict the future
    • If we know what the program will do, we can tell whether granting a resource might lead to deadlock
  • Detect and recover
    • If we can rollback a thread, we can fix a deadlock once it occurs

6 of 82

Exploit or Limit Behavior

  • Provide enough resources
    • How many chopsticks are enough?
  • Eliminate wait while holding
    • Release lock when calling out of module
    • Telephone circuit setup gives busy signal
  • Eliminate circular waiting
    • Lock ordering: always acquire locks in a fixed order
    • Example: move file from one directory to another (sketched below)
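
To make the lock-ordering rule concrete, here is a minimal sketch in C assuming POSIX threads; the `struct dir` type and `move_file` function are invented for illustration, not taken from any particular kernel.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical directory object protected by a lock. */
struct dir {
    pthread_mutex_t lock;
    /* ... directory contents ... */
};

/* Move a file between two directories (assumes src != dst). To rule
 * out circular waiting, every thread acquires the two directory locks
 * in the same global order (here: by address), regardless of which is
 * source and which is destination. */
void move_file(struct dir *src, struct dir *dst)
{
    struct dir *first  = ((uintptr_t)src < (uintptr_t)dst) ? src : dst;
    struct dir *second = (first == src) ? dst : src;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);

    /* ... unlink the file from src, link it into dst ... */

    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

Two threads moving files in opposite directions between the same pair of directories can no longer deadlock: both contend for the same "first" lock.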

7 of 82

Question

  • How can we use resource ordering to eliminate deadlock in dining lawyers?

8 of 82

Example

Step   Thread 1              Thread 2
 1     Acquire A
 2                           Acquire B
 3     Acquire C
 4                           Wait for A
 5     (maybe) Wait for B

How can we make sure to avoid deadlock?

9 of 82

Deadlock Dynamics

  • Safe state:
    • For any possible sequence of future resource requests, it is possible to eventually grant all requests

    • May require waiting even when resources are available!
  • Unsafe state:
    • Some sequence of resource requests can result in deadlock
  • Doomed state:
    • All possible computations lead to deadlock

10 of 82

Possible System States

11 of 82

Question

  • What are the doomed states for Dining Lawyers?

  • What are the unsafe states?

  • What are the safe states?

12 of 82

Communal Dining Lawyers

  • n chopsticks in middle of table
  • n lawyers, each can take one chopstick at a time
  • What are the safe states?
  • What are the unsafe states?
  • What are the doomed states?

13 of 82

Communal Mutant Dining Lawyers

  • N chopsticks in the middle of the table
  • N lawyers, each takes one chopstick at a time
  • Lawyers need k chopsticks to eat, k > 1

  • What are the safe states?
  • What are the unsafe states?
  • What are the doomed states?

14 of 82

Communal Mutant Absent-Minded Dining Lawyers

  • N chopsticks in the middle of the table
  • N lawyers, each takes one chopstick at a time
  • Lawyers need k chopsticks to eat, k > 1
    • Sometimes a lawyer is talking on his/her cellphone
    • Need k+1 chopsticks in that case

  • What are the safe states?
  • What are the unsafe states?
  • What are the doomed states?

15 of 82

Predict the Future

  • Banker’s algorithm
    • State maximum resource needs in advance
    • Allocate resources dynamically when a resource is needed; wait if granting the request would lead to deadlock

    • Request can be granted if some sequential ordering of threads is deadlock free

16 of 82

Banker’s Algorithm

  • Grant request iff result is a safe state
  • Sum of maximum resource needs of current threads can be greater than the total resources
    • Provided there is some way for all the threads to finish without getting into deadlock
  • Example: proceed iff
    • (total resources - # allocated) >= max remaining that might be needed by this thread in order to finish
    • Guarantees this thread can finish
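
A sketch of this simplified admission check in C (the single-resource "proceed iff" rule from the slide, not the full multi-resource Banker's algorithm); the struct and names are illustrative.

```c
/* Per-thread bookkeeping: what it holds and the maximum it declared. */
struct thread_res {
    int allocated;   /* resources this thread currently holds */
    int max_need;    /* maximum it stated it might ever need  */
};

/* Grant a request of `amount` units iff, even after granting it,
 * enough remains for this thread to reach its declared maximum --
 * which guarantees this thread can run to completion and release
 * everything it holds. */
int safe_to_grant(int total, int total_allocated,
                  const struct thread_res *t, int amount)
{
    int available = total - total_allocated;
    int max_remaining = t->max_need - t->allocated;

    if (amount > max_remaining)
        return 0;                 /* exceeds declared maximum */
    return available >= max_remaining;
}
```

If the check fails, the requesting thread waits rather than being denied outright.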

17 of 82

CPU Scheduling

18 of 82

Main Points

  • Scheduling policy: what to do next, when there are multiple threads ready to run
    • Or multiple packets to send, or web requests to serve, or …
  • Definitions
    • response time, throughput, predictability
  • Uniprocessor policies
    • FIFO, round robin, optimal
    • multilevel feedback as approximation of optimal
  • Multiprocessor policies
    • Affinity scheduling, gang scheduling
  • Queueing theory
    • Can you predict/improve a system’s response time?

19 of 82

Example

  • You manage a web site that suddenly becomes wildly popular. Performance starts to degrade. Do you:
    • Buy more hardware?
    • Implement a different scheduling policy?
    • Turn away some users? Which ones?
  • How much worse will performance get if the web site becomes even more popular?

20 of 82

Definitions

  • Task/Job
    • User request: e.g., mouse click, web request, shell command, …
  • Latency/response time
    • How long does a task take to complete?
    • Tail latency: how much is the worst-case response time inflated relative to the typical case?
  • Throughput
    • How many tasks can be done per unit of time?
  • Overhead
    • How much extra work is done by the scheduler?
  • Fairness
    • Do multiple users share resource evenly?
  • Strategy-proof
    • Can a user manipulate the system to gain better performance?
  • Predictability
    • How consistent is a user’s performance over time?

21 of 82

More Definitions

  • Workload
    • Set of tasks for system to perform
  • Preemptive scheduler
    • If we can take resources away from a running task
  • Work-conserving
    • Resource is used whenever there is a task to run (keep the resource busy)
    • For non-preemptive schedulers, work-conserving is not always better
  • Scheduling algorithm
    • takes a workload as input
    • decides which tasks to do first
    • Performance metric (throughput, latency) as output
    • We consider only preemptive, work-conserving schedulers

22 of 82

First In First Out (FIFO)

  • Schedule tasks in the order they arrive
    • Continue running them until they complete or give up the processor
  • Example: memcached
    • Facebook cache of friend lists, …

  • On what workloads is FIFO particularly bad?

23 of 82

Shortest Job First (SJF)

  • Always run the task with the shortest remaining amount of work to do
    • Often called Shortest Remaining Time First (SRTF)

  • Suppose we have five tasks arrive one right after each other, but the first one is much longer than the others
    • Which completes first in FIFO? Next?
    • Which completes first in SJF? Next?
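
To make the comparison concrete, here is a toy calculation in C; the task lengths (one 10-unit task, then four 1-unit tasks, all arriving essentially at once) are invented for illustration.

```c
#include <stdio.h>

int main(void)
{
    /* Assumed workload: one long task (10 units) then four short
     * ones (1 unit each), arriving back to back at time ~0. */
    double len[]   = {10, 1, 1, 1, 1};    /* FIFO runs in this order */
    double order[] = {1, 1, 1, 1, 10};    /* SJF/SRTF runs shorts first */
    int n = 5;

    double t = 0, fifo_total = 0, sjf_total = 0;
    for (int i = 0; i < n; i++) { t += len[i];   fifo_total += t; }
    t = 0;
    for (int i = 0; i < n; i++) { t += order[i]; sjf_total  += t; }

    /* FIFO completions: 10, 11, 12, 13, 14 -> average 12.0
     * SJF  completions:  1,  2,  3,  4, 14 -> average  4.8 */
    printf("FIFO avg response: %.1f\n", fifo_total / n);
    printf("SJF  avg response: %.1f\n", sjf_total / n);
    return 0;
}
```

The long task finishes at time 14 either way; SJF just stops making the short tasks wait behind it.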

24 of 82

FIFO vs. SJF

25 of 82

Question

  • Claim: SJF is optimal for average response time. Is that true?

  • Does SJF have any downsides?

26 of 82

Question

  • Is FIFO ever optimal (for average response time)?
  • Pessimal?

27 of 82

Tail Latency

  • What if we are optimizing for tail latency and not average latency?
    • Ex: MapReduce needs to wait for the slowest task
  • Many cloud systems provide service level agreements (SLA) to applications
    • Average response time, throughput, …
    • Tail behavior: 99% (or 99.9%) latency, downtime, …
  • Starvation of some jobs not an option

28 of 82

Round Robin

  • Each task gets resource for a fixed period of time (time quantum)
    • If task doesn’t complete, it goes back in line
  • Need to pick a time quantum
    • What if time quantum is too long?
      • Infinite?
    • What if time quantum is too short?
      • One instruction?
        • Thrashing

29 of 82

Round Robin

30 of 82

Round Robin vs. FIFO

  • Assume time slices have zero cost (they don’t). Is Round Robin always better than FIFO?

    • Average response time?
    • Tail latency?

31 of 82

Round Robin vs. FIFO

32 of 82

Round Robin == Fairness?

  • Is Round Robin fair?
  • What is fair?
    • Equal share of the CPU?
    • What if some other tasks don’t need their full share? How do we allocate the remainder?

33 of 82

Max-Min Fairness

  • Applies to repeating tasks
    • Ex: web server splits its time between compute and disk/network I/O
    • Ex: end hosts transferring packets over a network
  • Maximize the minimum allocation given to a task
    • If any task needs less than their equal share, schedule the smallest first
      • Will get their entire allocation
    • Split the remaining time among remaining tasks using max-min (recursion)
    • If all remaining tasks need at least equal share, split evenly
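
One way to see the recursion is as a loop. This sketch computes a max-min allocation for a single round; the demands, capacity, and the bound of 16 tasks are made-up assumptions.

```c
#include <stdio.h>

/* Max-min allocation of `capacity` among n tasks (n <= 16 assumed).
 * Caller zero-initializes alloc[]. */
void max_min(double *demand, double *alloc, int n, double capacity)
{
    int satisfied[16] = {0};
    int left = n;
    while (left > 0 && capacity > 1e-9) {
        double share = capacity / left;
        int progress = 0;
        for (int i = 0; i < n; i++) {
            /* Tasks needing less than the equal split keep only what
             * they need; the leftover is re-split next iteration. */
            if (!satisfied[i] && demand[i] - alloc[i] <= share) {
                capacity -= demand[i] - alloc[i];
                alloc[i] = demand[i];
                satisfied[i] = 1;
                left--; progress = 1;
            }
        }
        if (!progress) {   /* everyone needs at least the equal split */
            for (int i = 0; i < n; i++)
                if (!satisfied[i]) alloc[i] += share;
            break;
        }
    }
}

int main(void)
{
    double demand[] = {1, 2, 5, 10};   /* assumed demands  */
    double alloc[4] = {0};
    max_min(demand, alloc, 4, 10);     /* assumed capacity */
    for (int i = 0; i < 4; i++)
        printf("task %d gets %.2f\n", i, alloc[i]);
    return 0;
}
```

For demands {1, 2, 5, 10} and capacity 10 this yields {1, 2, 3.5, 3.5}: the two small tasks get everything they asked for, and the remainder is split evenly between the rest.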

34 of 82

Fair Queueing: Implementing max-min

  • For each repeating task, track how much of the resource they have consumed (during some time window)
    • E.g., total CPU time (or disk or network)
  • Schedule the task with the least accumulated resource usage
    • Deficit round robin: allow tasks to exceed minimum by one time quantum
      • Approximates fair queueing, plus or minus one time quantum
  • What if some tasks are more important than others?
    • Weighted fair queueing: assign tasks weights that sum to 1
    • Schedule task with least accumulated resource usage/weight
    • Ex: 5 units/.5 vs. 5 units/.2
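
A sketch of the pick rule in C, using the slide's numbers; the struct fields and task names are invented.

```c
#include <stdio.h>

struct task {
    const char *name;
    double usage;    /* resource consumed so far (e.g., CPU units) */
    double weight;   /* assigned share weight                      */
};

/* Schedule the task with the least usage/weight ("virtual time"). */
struct task *pick_next(struct task *tasks, int n)
{
    struct task *best = &tasks[0];
    for (int i = 1; i < n; i++)
        if (tasks[i].usage / tasks[i].weight <
            best->usage / best->weight)
            best = &tasks[i];
    return best;
}

int main(void)
{
    /* The slide's example: 5 units at weight .5 vs. 5 units at .2. */
    struct task tasks[] = {
        {"A", 5, 0.5},   /* virtual time 10 */
        {"B", 5, 0.2},   /* virtual time 25 */
    };
    printf("next: %s\n", pick_next(tasks, 2)->name);   /* A runs next */
    return 0;
}
```

Task B has consumed more relative to its share (5/.2 = 25 vs. 5/.5 = 10), so A runs next.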

35 of 82

Example: Workload with I/O and Compute Tasks (Round Robin)

[Timeline figure: one I/O-bound task and two CPU-bound tasks under round robin; the I/O-bound task repeatedly issues an I/O request, blocks until the I/O completes, and rejoins the ready queue.]
36 of 82

Example: Workload with I/O and Compute Tasks (Weighted Fair Queueing)

37 of 82

Round Robin vs. FIFO

  • What happens with a mixture of short and long tasks?
    • Each task gets the same share of the CPU (+/- a quantum)
    • Short tasks aren’t stuck behind long tasks; acts like SJF
    • Long tasks get an equal share of the CPU

38 of 82

Round Robin vs. FIFO

39 of 82

Round Robin with I/O and Compute Tasks

40 of 82

Multi-level Feedback Queue (MFQ)

  • Goals:
    • Responsiveness for latency sensitive tasks
    • Low overhead
    • Starvation freedom
    • Some tasks are high/low priority
    • Fairness (among equal priority tasks)
  • Not perfect at any of them!

41 of 82

MFQ

  • Set of Round Robin queues
    • Each queue has a separate priority
  • High priority queues have short time slices
    • Low priority queues have long time slices
  • Scheduler picks first thread in highest priority queue
  • Tasks start in highest priority queue
    • If time slice expires, task drops one level
  • If low priority tasks are being starved, bump them up a level
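
A toy rendering of these rules in C; the level count, quanta, and task names are invented, and within-level round robin and the starvation boost are omitted for brevity.

```c
#include <stdio.h>

#define NLEVELS  3
#define MAXTASKS 8

struct task {
    const char *name;
    int level;       /* current queue, 0 = highest priority */
    int remaining;   /* work left, in ticks                 */
};

/* Time slice grows as priority drops: 1, 2, 4 ticks. */
static const int quantum[NLEVELS] = {1, 2, 4};

/* One scheduling decision: run the first runnable task at the
 * highest-priority level for one quantum; demote it if the quantum
 * expires before it finishes. */
void mfq_step(struct task *tasks, int n)
{
    for (int lvl = 0; lvl < NLEVELS; lvl++) {
        for (int i = 0; i < n; i++) {
            struct task *t = &tasks[i];
            if (t->remaining > 0 && t->level == lvl) {
                int q = quantum[lvl];
                int ran = t->remaining < q ? t->remaining : q;
                t->remaining -= ran;
                printf("%s ran %d tick(s) at level %d\n",
                       t->name, ran, lvl);
                if (t->remaining > 0 && lvl < NLEVELS - 1)
                    t->level++;   /* used its full slice: demote */
                return;
            }
        }
    }
}

int main(void)
{
    struct task tasks[] = {
        {"interactive", 0, 2},   /* short, latency-sensitive */
        {"batch",       0, 9},   /* long-running             */
    };
    for (int step = 0; step < 8; step++)
        mfq_step(tasks, 2);
    return 0;
}
```

The short task finishes while still near the top; the long task sinks to the bottom level, where it gets long, infrequent slices.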

42 of 82

MFQ

43 of 82

Example: Workload with I/O and Compute Tasks (Multilevel Feedback Queues)

44 of 82

Example: Multilevel Feedback Queues and Starvation

45 of 82

Fair Queueing: Max-Min Fairness

  • Applies to repeating tasks
    • Ex: web server splits its time between compute and disk/network I/O
    • Ex: end hosts transferring packets over a network
  • Maximize the minimum allocation given to a task
    • Keep track of CPU usage over time (including time spent waiting for I/O)
    • Schedule the task with the smallest accumulated CPU time, +/- time quantum
  • Result:
    • If any task needs less than its equal share, it will get its entire allocation
    • Any extra time is split among the remaining tasks using max-min (recursion)
    • If all remaining tasks are compute bound, split evenly

46 of 82

Example: Workload with I/O and Compute Tasks (Fair Queueing)

47 of 82

Weighted Fair Queueing

  • What if some tasks are more important than others?
  • With priority scheduling, higher priority task is always scheduled
    • Lower priority tasks can wait forever (starve) if high priority task is large
  • Weighted fair queueing: assign tasks weights that sum to 1
    • Schedule task with least accumulated CPU usage/weight
    • Every task gets *some* portion of the CPU

48 of 82

Uniprocessor Summary (1)

  • FIFO is simple and minimizes overhead.
  • If tasks are variable in size, then FIFO can have very poor average response time.
  • If tasks are equal in size, FIFO is optimal in terms of average response time.
  • Considering only the processor, SJF is optimal in terms of average response time.
  • SJF is pessimal in terms of variance in response time.

49 of 82

Uniprocessor Summary (2)

  • If tasks are variable in size, Round Robin approximates SJF.
  • If tasks are equal in size, Round Robin will have very poor average response time.
  • Tasks that intermix processor and I/O benefit from SJF and can do poorly under Round Robin, especially with a long quantum
  • MFQ attempts to prioritize latency sensitive tasks
    • MFQ approximates SJF => High variance for long jobs
    • Under high load, MFQ approximates round robin (everyone at highest priority)
  • Fair queueing improves response time for I/O-bound tasks by equalizing CPU usage over a long period
    • Strategy proof: everyone gets equal share of CPU if they need it

50 of 82

Virtual Machines and Two Level Scheduling

  • Suppose the guest kernel running in virtual machine A waits for I/O
    • Nothing else to run => go into idle loop
  • Suppose another guest kernel running in virtual machine B has work to do
  • What should the host kernel do?
    • Leave the CPU with the virtual machine A?
    • Steal the CPU and give it to virtual machine B?
  • How does the host kernel know that the guest OS has nothing to do?
    • Rewrite guest OS idle loop to trap into host kernel

51 of 82

Multiprocessor Scheduling

  • What would happen if we used multilevel feedback or fair queueing on a multiprocessor?
  • Contention for scheduler spinlock
  • Cache slowdown:
    • Task data from last time it ran is still in its old cache => will it run there?
    • ready list (or ptable) data structure pings from one CPU to another
  • Parallel programs (with multiple tasks per process)
    • Won’t be run efficiently
    • Won’t be run efficiently in a virtual machine

52 of 82

Per-Processor Affinity Scheduling

  • Each processor has its own ready list
    • Protected by a per-processor spinlock
    • Put each thread back on the ready list of the processor where it most recently ran
    • Ex: when I/O completes, or on Condition->signal
  • Idle processors can steal work from other processors
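
A compilable sketch of the idea in C, with invented structures (ring-buffer ready lists; lock initialization and blocking when a list is full are omitted).

```c
#include <pthread.h>
#include <stddef.h>

#define NCPU   4
#define NSLOTS 64

struct task;                         /* opaque */

struct cpu_sched {
    pthread_spinlock_t lock;         /* per-processor: little contention */
    struct task *ready[NSLOTS];      /* this CPU's ready list (a ring)   */
    unsigned head, tail;
};

static struct cpu_sched cpus[NCPU]; /* locks initialized elsewhere with
                                       pthread_spin_init()               */

/* Wake a thread onto the CPU where it last ran, for cache affinity. */
void make_ready(struct task *t, int last_cpu)
{
    struct cpu_sched *c = &cpus[last_cpu];
    pthread_spin_lock(&c->lock);
    c->ready[c->tail++ % NSLOTS] = t;
    pthread_spin_unlock(&c->lock);
}

/* An idle CPU checks its own list first, then steals from the rest. */
struct task *next_task(int me)
{
    for (int i = 0; i < NCPU; i++) {
        struct cpu_sched *c = &cpus[(me + i) % NCPU];
        struct task *t = NULL;
        pthread_spin_lock(&c->lock);
        if (c->head != c->tail)
            t = c->ready[c->head++ % NSLOTS];
        pthread_spin_unlock(&c->lock);
        if (t)
            return t;
    }
    return NULL;                     /* truly idle */
}
```

In the common case a CPU touches only its own lock and its own cache-warm tasks; stealing happens only when it would otherwise idle.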

53 of 82

Per-Processor Multi-level Feedback with Affinity Scheduling

54 of 82

Bulk Synchronous Model of Parallelism

  • Loop at each processor:
    • Compute on local data (in parallel)
    • Barrier
    • Send (selected) data to other processors (in parallel)
    • Barrier
  • Examples:
    • Machine learning training and inference
    • MapReduce
    • Fluid flow over a wing
  • Most parallel algorithms can be recast in BSP
    • Within a small constant factor in performance
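
The loop maps directly onto POSIX barriers. Here is a minimal runnable skeleton; the worker count, round count, and the printf placeholders standing in for real compute/exchange phases are arbitrary.

```c
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define NROUNDS  3

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    long id = (long)arg;
    for (int round = 0; round < NROUNDS; round++) {
        /* 1. Compute on local data (in parallel). */
        printf("worker %ld: compute, round %d\n", id, round);
        pthread_barrier_wait(&barrier);
        /* 2. Send (selected) data to other workers (in parallel). */
        printf("worker %ld: exchange, round %d\n", id, round);
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NWORKERS];
    pthread_barrier_init(&barrier, NULL, NWORKERS);
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```

Each barrier makes every worker wait for the slowest one, which is why time-slicing one thread of a BSP program (next slides) stalls the whole gang.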

55 of 82

Bulk Synchronous in Action

56 of 82

Scheduling Parallel Programs

  • What happens if one thread gets time-sliced while other threads from the same program are still running?
    • Assuming program uses locks and condition variables, it will still be correct
    • What about performance?

57 of 82

Scheduling Parallel Programs

Oblivious: each CPU time-slices its ready list independently

58 of 82

Gang Scheduling

59 of 82

Parallel Program Speedup

60 of 82

Virtual Machines and Two Level Scheduling

  • Suppose the guest kernel running in virtual machine A is running a parallel program with limited parallelism
    • May be using all of its CPUs, just inefficiently
  • Suppose another guest kernel running in virtual machine B is running a highly parallel program
  • What should the host kernel do?
  • How does the host kernel know that the guest OS in A is inefficient?

61 of 82

Space Sharing

Scheduler activations: the kernel tells each application its processor assignment, and notifies it every time the assignment changes

62 of 82

Queueing Theory

  • Can we predict what will happen to user performance:
    • If a service becomes more popular?
    • If we buy more hardware?
    • If we change the implementation to provide more features?

63 of 82

Queueing Model

Assumption: average performance in a stable system, where the arrival rate (λ) matches the departure rate (μ)

64 of 82

Definitions

  • Queueing delay (W): wait time
    • Number of tasks queued (Q)
  • Service time (S): time to service the request
  • Response time (R) = queueing delay + service time
  • Utilization (U): fraction of time the server is busy
    • Service time * arrival rate (λ)
  • Throughput (X): rate of task completions
    • If no overload, throughput = arrival rate

65 of 82

Little’s Law

N = X * R

N: number of tasks in the system

Applies to any stable system – where arrivals match departures

  • Independent of scheduling discipline and burstiness

66 of 82

Question

Suppose a system has throughput (X) = 100 tasks/s, average response time (R) = 50 ms/task

  • How many tasks are in the system on average?
    • Hint: Little’s Law N = X * R

67 of 82

Question

Suppose a system has throughput (X) = 100 tasks/s, average response time (R) = 50 ms/task

  • If the server takes 5 ms/task, what is its utilization? (N = X * R)

68 of 82

Question

Suppose a system has throughput (X) = 100 tasks/s, average response time (R) = 50 ms/task

  • What is the average wait time?
  • What is the average number of queued tasks?

69 of 82

Question

  • From example:

X = 100 task/sec

R = 50 ms/task

S = 5 ms/task

W = 45 ms/task

Q = 4.5 tasks

  • What gives? W = 45 ms while S * Q = 22.5 ms
    • Hint: what if S = 10ms? S = 1ms?
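
One way to sanity-check these numbers is to apply Little's Law twice, once to the whole system and once to the queue alone; all values below come from the example above.

```c
#include <stdio.h>

int main(void)
{
    double X = 100.0;   /* throughput, tasks/s */
    double R = 0.050;   /* response time, s    */
    double S = 0.005;   /* service time, s     */

    double N = X * R;   /* tasks in system: 5                        */
    double U = X * S;   /* server utilization: 0.5                   */
    double W = R - S;   /* queueing delay: 45 ms                     */
    double Q = X * W;   /* queued tasks (Little's Law on queue): 4.5 */

    printf("N=%.1f  U=%.2f  W=%.0f ms  Q=%.1f\n", N, U, W * 1e3, Q);
    return 0;
}
```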

70 of 82

Queueing

  • What is the best case scenario for minimizing queueing delay?
    • Keeping arrival rate, service time constant
  • What is the worst case scenario?

71 of 82

Queueing: Best Case

72 of 82

Response Time: Best vs. Worst Case

73 of 82

Queueing: Average Case?

  • What is average?
    • Gaussian: Arrivals are spread out, around a mean value
    • Exponential: arrivals are memoryless
    • Heavy-tailed: arrivals are bursty

  • Can have randomness in both arrivals and service times

74 of 82

Exponential Distribution

75 of 82

Exponential Distribution

Permits closed form solution to state probabilities, as a function of arrival rate and service rate

76 of 82

Response Time vs. Utilization (Exponential)

77 of 82

Question

  • Exponential arrivals: R = S/(1-U)
  • If system is 20% utilized, and load increases by 5%, how much does response time increase?
  • If system is 90% utilized, and load increases by 5%, how much does response time increase?
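
Reading "load increases by 5%" as utilization rising by 5% of its value, a quick computation with R = S/(1-U) (S set to 1 so R is in units of service times):

```c
#include <stdio.h>

int main(void)
{
    double S = 1.0;
    double cases[][2] = {
        {0.20, 0.20 * 1.05},   /* 20% utilized, +5% load */
        {0.90, 0.90 * 1.05},   /* 90% utilized, +5% load */
    };
    for (int i = 0; i < 2; i++) {
        double r0 = S / (1 - cases[i][0]);   /* response time before */
        double r1 = S / (1 - cases[i][1]);   /* response time after  */
        printf("U %.2f -> %.3f: R %.2fS -> %.2fS (%.0f%% worse)\n",
               cases[i][0], cases[i][1], r0, r1, (r1 / r0 - 1) * 100);
    }
    return 0;
}
```

At 20% utilization the response time barely moves (about 1%); at 90% it nearly doubles (about 82%), which is why heavily loaded systems feel so fragile.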

78 of 82

Variance in Response Time

  • Exponential arrivals
    • Variance in R = S/(1-U)^2

  • What if less bursty than exponential?

  • What if more bursty than exponential?

79 of 82

What if Multiple Resources?

  • Assuming exponential arrival, service times
  • Response time R = sum over all resources i of: S_i / (1 – U_i)

  • Implication
    • If you fix one bottleneck, the next highest utilized resource will limit performance
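
A small worked example of the formula; the service times and utilizations below are made up to show the bottleneck effect.

```c
#include <stdio.h>

int main(void)
{
    /* Assumed numbers for three resources: CPU, disk, network. */
    double S[] = {0.005, 0.002, 0.001};   /* service times, s */
    double U[] = {0.90,  0.50,  0.20};    /* utilizations     */

    double R = 0;
    for (int i = 0; i < 3; i++)
        R += S[i] / (1 - U[i]);

    /* CPU term: 50 ms; disk: 4 ms; net: 1.25 ms. The CPU dominates;
     * once that bottleneck is fixed, the disk term is the next limit. */
    printf("R = %.1f ms\n", R * 1000);
    return 0;
}
```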

80 of 82

Overload Management

  • What if arrivals occur faster than the service can handle them?
    • If we do nothing, response time will become infinite
  • Turn users away?
    • Which ones? Average response time is best if we turn away the users with the highest remaining service demand (shortest job first!)
    • Example: Highway congestion
  • Degrade service?
    • Compute result with fewer resources
    • Example: CNN static front page on 9/11

81 of 82

Highway Congestion (measured)

82 of 82

Why Do Metro Buses Cluster?

Suppose two Metro buses start 10 minutes apart. Why might they arrive at the same time?