1 of 82

CSE 451

Operating Systems

L12 - Deadlocks

Slides by: Tom Anderson

Baris Kasikci

2 of 82

Dining Lawyers

Each lawyer needs two chopsticks to eat. Each grabs the chopstick on their right first.

3 of 82

Necessary Conditions for Deadlock

  • Limited access to resources
    • If infinite resources, no deadlock!
  • No preemption
    • If resources are virtual, can break deadlock
  • Multiple independent requests
    • “wait while holding”
  • Circular chain of requests

4 of 82

Question

  • Does Dining Lawyers meet the necessary conditions for deadlock?
    • Limited access to resources
    • No preemption
    • Multiple independent requests (wait while holding)
    • Circular chain of requests

  • Can we modify Dining Lawyers to prevent deadlock?

5 of 82

Preventing Deadlock

  • Exploit or limit program behavior
    • Keep the program from doing anything that might lead to deadlock
  • Predict the future
    • If we know what the program will do, we can tell whether granting a resource might lead to deadlock
  • Detect and recover
    • If we can rollback a thread, we can fix a deadlock once it occurs

6 of 82

Exploit or Limit Behavior

  • Provide enough resources
    • How many chopsticks are enough?
  • Eliminate wait while holding
    • Release lock when calling out of module
    • Telephone circuit setup gives busy signal
  • Eliminate circular waiting
    • Lock ordering: always acquire locks in a fixed order
    • Example: move file from one directory to another (sketched below)
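
To make the lock-ordering rule concrete, here is a minimal sketch in C assuming POSIX threads; the `struct dir` type and `move_file` function are invented for illustration, not taken from any particular kernel.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical directory object protected by a lock. */
struct dir {
    pthread_mutex_t lock;
    /* ... directory contents ... */
};

/* Move a file between two directories (assumes src != dst). To rule
 * out circular waiting, every thread acquires the two directory locks
 * in the same global order (here: by address), regardless of which is
 * source and which is destination. */
void move_file(struct dir *src, struct dir *dst)
{
    struct dir *first  = ((uintptr_t)src < (uintptr_t)dst) ? src : dst;
    struct dir *second = (first == src) ? dst : src;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);

    /* ... unlink the file from src, link it into dst ... */

    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

Two threads moving files in opposite directions between the same pair of directories can no longer deadlock: both contend for the same "first" lock.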

7 of 82

Question

  • How can we use resource ordering to eliminate deadlock in dining lawyers?

8 of 82

Example

Step   Thread 1              Thread 2
 1     Acquire A
 2                           Acquire B
 3     Acquire C
 4                           Wait for A
 5     (maybe) Wait for B

How can we make sure to avoid deadlock?

9 of 82

Deadlock Dynamics

  • Safe state:
    • For any possible sequence of future resource requests, it is possible to eventually grant all requests

    • May require waiting even when resources are available!
  • Unsafe state:
    • Some sequence of resource requests can result in deadlock
  • Doomed state:
    • All possible computations lead to deadlock

10 of 82

Possible System States

11 of 82

Question

  • What are the doomed states for Dining Lawyers?

  • What are the unsafe states?

  • What are the safe states?

12 of 82

Communal Dining Lawyers

  • n chopsticks in middle of table
  • n lawyers, each can take one chopstick at a time
  • What are the safe states?
  • What are the unsafe states?
  • What are the doomed states?

13 of 82

Communal Mutant Dining Lawyers

  • N chopsticks in the middle of the table
  • N lawyers, each takes one chopstick at a time
  • Lawyers need k chopsticks to eat, k > 1

  • What are the safe states?
  • What are the unsafe states?
  • What are the doomed states?

14 of 82

Communal Mutant Absent-Minded Dining Lawyers

  • N chopsticks in the middle of the table
  • N lawyers, each takes one chopstick at a time
  • Lawyers need k chopsticks to eat, k > 1
    • Sometimes a lawyer is talking on his/her cellphone
    • Need k+1 chopsticks in that case

  • What are the safe states?
  • What are the unsafe states?
  • What are the doomed states?

15 of 82

Predict the Future

  • Banker’s algorithm
    • State maximum resource needs in advance
    • Allocate resources dynamically when a resource is needed; wait if granting the request would lead to deadlock

    • Request can be granted if some sequential ordering of threads is deadlock free

16 of 82

Banker’s Algorithm

  • Grant request iff result is a safe state
  • Sum of maximum resource needs of current threads can be greater than the total resources
    • Provided there is some way for all the threads to finish without getting into deadlock
  • Example: proceed iff
    • (total resources - # allocated) >= max remaining that might be needed by this thread in order to finish
    • Guarantees this thread can finish
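
A sketch of this simplified admission check in C (the single-resource "proceed iff" rule from the slide, not the full multi-resource Banker's algorithm); the struct and names are illustrative.

```c
/* Per-thread bookkeeping: what it holds and the maximum it declared. */
struct thread_res {
    int allocated;   /* resources this thread currently holds */
    int max_need;    /* maximum it stated it might ever need  */
};

/* Grant a request of `amount` units iff, even after granting it,
 * enough remains for this thread to reach its declared maximum --
 * which guarantees this thread can run to completion and release
 * everything it holds. */
int safe_to_grant(int total, int total_allocated,
                  const struct thread_res *t, int amount)
{
    int available = total - total_allocated;
    int max_remaining = t->max_need - t->allocated;

    if (amount > max_remaining)
        return 0;                 /* exceeds declared maximum */
    return available >= max_remaining;
}
```

If the check fails, the requesting thread waits rather than being denied outright.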

17 of 82

CPU Scheduling

18 of 82

Main Points

  • Scheduling policy: what to do next, when there are multiple threads ready to run
    • Or multiple packets to send, or web requests to serve, or …
  • Definitions
    • response time, throughput, predictability
  • Uniprocessor policies
    • FIFO, round robin, optimal
    • multilevel feedback as approximation of optimal
  • Multiprocessor policies
    • Affinity scheduling, gang scheduling
  • Queueing theory
    • Can you predict/improve a system’s response time?

19 of 82

Example

  • You manage a web site that suddenly becomes wildly popular. Performance starts to degrade. Do you:
    • Buy more hardware?
    • Implement a different scheduling policy?
    • Turn away some users? Which ones?
  • How much worse will performance get if the web site becomes even more popular?

20 of 82

Definitions

  • Task/Job
    • User request: e.g., mouse click, web request, shell command, …
  • Latency/response time
    • How long does a task take to complete?
    • Tail latency: how much is the worst-case response time inflated relative to the typical case?
  • Throughput
    • How many tasks can be done per unit of time?
  • Overhead
    • How much extra work is done by the scheduler?
  • Fairness
    • Do multiple users share resource evenly?
  • Strategy-proof
    • Can a user manipulate the system to gain better performance?
  • Predictability
    • How consistent is a user’s performance over time?

21 of 82

More Definitions

  • Workload
    • Set of tasks for system to perform
  • Preemptive scheduler
    • If we can take resources away from a running task
  • Work-conserving
    • Resource is used whenever there is a task to run (keep the resource busy)
    • For non-preemptive schedulers, work-conserving is not always better
  • Scheduling algorithm
    • takes a workload as input
    • decides which tasks to do first
    • Performance metric (throughput, latency) as output
    • We consider only preemptive, work-conserving schedulers

22 of 82

First In First Out (FIFO)

  • Schedule tasks in the order they arrive
    • Continue running them until they complete or give up the processor
  • Example: memcached
    • Facebook cache of friend lists, …

  • On what workloads is FIFO particularly bad?

23 of 82

Shortest Job First (SJF)

  • Always run the task with the shortest remaining amount of work to do
    • Often called Shortest Remaining Time First (SRTF)

  • Suppose we have five tasks arrive one right after each other, but the first one is much longer than the others
    • Which completes first in FIFO? Next?
    • Which completes first in SJF? Next?
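
To make the comparison concrete, here is a toy calculation in C; the task lengths (one 10-unit task, then four 1-unit tasks, all arriving essentially at once) are invented for illustration.

```c
#include <stdio.h>

int main(void)
{
    /* Assumed workload: one long task (10 units) then four short
     * ones (1 unit each), arriving back to back at time ~0. */
    double len[]   = {10, 1, 1, 1, 1};    /* FIFO runs in this order */
    double order[] = {1, 1, 1, 1, 10};    /* SJF/SRTF runs shorts first */
    int n = 5;

    double t = 0, fifo_total = 0, sjf_total = 0;
    for (int i = 0; i < n; i++) { t += len[i];   fifo_total += t; }
    t = 0;
    for (int i = 0; i < n; i++) { t += order[i]; sjf_total  += t; }

    /* FIFO completions: 10, 11, 12, 13, 14 -> average 12.0
     * SJF  completions:  1,  2,  3,  4, 14 -> average  4.8 */
    printf("FIFO avg response: %.1f\n", fifo_total / n);
    printf("SJF  avg response: %.1f\n", sjf_total / n);
    return 0;
}
```

The long task finishes at time 14 either way; SJF just stops making the short tasks wait behind it.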

24 of 82

FIFO vs. SJF

25 of 82

Question

  • Claim: SJF is optimal for average response time. Is that true?

  • Does SJF have any downsides?

26 of 82

Question

  • Is FIFO ever optimal (for average response time)?
  • Pessimal?

27 of 82

Tail Latency

  • What if we are optimizing for tail latency and not average latency?
    • Ex: MapReduce needs to wait for the slowest task
  • Many cloud systems provide service level agreements (SLA) to applications
    • Average response time, throughput, …
    • Tail behavior: 99% (or 99.9%) latency, downtime, …
  • Starvation of some jobs not an option

28 of 82

Round Robin

  • Each task gets resource for a fixed period of time (time quantum)
    • If task doesn’t complete, it goes back in line
  • Need to pick a time quantum
    • What if time quantum is too long?
      • Infinite?
    • What if time quantum is too short?
      • One instruction?
        • Thrashing

29 of 82

Round Robin

30 of 82

Round Robin vs. FIFO

  • Assume time slices have zero cost (they don’t). Is Round Robin always better than FIFO?

    • Average response time?
    • Tail latency?

31 of 82

Round Robin vs. FIFO

32 of 82

Round Robin == Fairness?

  • Is Round Robin fair?
  • What is fair?
    • Equal share of the CPU?
    • What if some other tasks don’t need their full share? How do we allocate the remainder?

33 of 82

Max-Min Fairness

  • Applies to repeating tasks
    • Ex: web server splits its time between compute and disk/network I/O
    • Ex: end hosts transferring packets over a network
  • Maximize the minimum allocation given to a task
    • If any task needs less than their equal share, schedule the smallest first
      • Will get their entire allocation
    • Split the remaining time among remaining tasks using max-min (recursion)
    • If all remaining tasks need at least equal share, split evenly
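
One way to see the recursion is as a loop. This sketch computes a max-min allocation for a single round; the demands, capacity, and the bound of 16 tasks are made-up assumptions.

```c
#include <stdio.h>

/* Max-min allocation of `capacity` among n tasks (n <= 16 assumed).
 * Caller zero-initializes alloc[]. */
void max_min(double *demand, double *alloc, int n, double capacity)
{
    int satisfied[16] = {0};
    int left = n;
    while (left > 0 && capacity > 1e-9) {
        double share = capacity / left;
        int progress = 0;
        for (int i = 0; i < n; i++) {
            /* Tasks needing less than the equal split keep only what
             * they need; the leftover is re-split next iteration. */
            if (!satisfied[i] && demand[i] - alloc[i] <= share) {
                capacity -= demand[i] - alloc[i];
                alloc[i] = demand[i];
                satisfied[i] = 1;
                left--; progress = 1;
            }
        }
        if (!progress) {   /* everyone needs at least the equal split */
            for (int i = 0; i < n; i++)
                if (!satisfied[i]) alloc[i] += share;
            break;
        }
    }
}

int main(void)
{
    double demand[] = {1, 2, 5, 10};   /* assumed demands  */
    double alloc[4] = {0};
    max_min(demand, alloc, 4, 10);     /* assumed capacity */
    for (int i = 0; i < 4; i++)
        printf("task %d gets %.2f\n", i, alloc[i]);
    return 0;
}
```

For demands {1, 2, 5, 10} and capacity 10 this yields {1, 2, 3.5, 3.5}: the two small tasks get everything they asked for, and the remainder is split evenly between the rest.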

34 of 82

Fair Queueing: Implementing max-min

  • For each repeating task, track how much of the resource they have consumed (during some time window)
    • E.g., total CPU time (or disk or network)
  • Schedule the task with the least accumulated resource usage
    • Deficit round robin: allow tasks to exceed minimum by one time quantum
      • Approximates fair queueing, plus or minus one time quantum
  • What if some tasks are more important than others?
    • Weighted fair queueing: assign tasks weights that sum to 1
    • Schedule task with least accumulated resource usage/weight
    • Ex: 5 units/.5 vs. 5 units/.2
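
A sketch of the pick rule in C, using the slide's numbers; the struct fields and task names are invented.

```c
#include <stdio.h>

struct task {
    const char *name;
    double usage;    /* resource consumed so far (e.g., CPU units) */
    double weight;   /* assigned share weight                      */
};

/* Schedule the task with the least usage/weight ("virtual time"). */
struct task *pick_next(struct task *tasks, int n)
{
    struct task *best = &tasks[0];
    for (int i = 1; i < n; i++)
        if (tasks[i].usage / tasks[i].weight <
            best->usage / best->weight)
            best = &tasks[i];
    return best;
}

int main(void)
{
    /* The slide's example: 5 units at weight .5 vs. 5 units at .2. */
    struct task tasks[] = {
        {"A", 5, 0.5},   /* virtual time 10 */
        {"B", 5, 0.2},   /* virtual time 25 */
    };
    printf("next: %s\n", pick_next(tasks, 2)->name);   /* A runs next */
    return 0;
}
```

Task B has consumed more relative to its share (5/.2 = 25 vs. 5/.5 = 10), so A runs next.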

35 of 82

Example: Workload with I/O and Compute Tasks (Round Robin)

[Timeline figure: one I/O-bound task and two CPU-bound tasks under round robin; the I/O-bound task repeatedly issues an I/O request, blocks until the I/O completes, and rejoins the ready queue.]
36 of 82

Example: Workload with I/O and Compute Tasks (Weighted Fair Queueing)

37 of 82

Round Robin vs. FIFO

  • What happens with a mixture of short and long tasks?
    • Each task gets the same share of the CPU (+/- a quantum)
    • Short tasks aren’t stuck behind long tasks; acts like SJF
    • Long tasks get an equal share of the CPU

38 of 82

Round Robin vs. FIFO

39 of 82

Round Robin with I/O and Compute Tasks

40 of 82

Multi-level Feedback Queue (MFQ)

  • Goals:
    • Responsiveness for latency sensitive tasks
    • Low overhead
    • Starvation freedom
    • Some tasks are high/low priority
    • Fairness (among equal priority tasks)
  • Not perfect at any of them!

41 of 82

MFQ

  • Set of Round Robin queues
    • Each queue has a separate priority
  • High priority queues have short time slices
    • Low priority queues have long time slices
  • Scheduler picks first thread in highest priority queue
  • Tasks start in highest priority queue
    • If time slice expires, task drops one level
  • If low priority tasks are being starved, bump them up a level
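
A toy rendering of these rules in C; the level count, quanta, and task names are invented, and within-level round robin and the starvation boost are omitted for brevity.

```c
#include <stdio.h>

#define NLEVELS  3
#define MAXTASKS 8

struct task {
    const char *name;
    int level;       /* current queue, 0 = highest priority */
    int remaining;   /* work left, in ticks                 */
};

/* Time slice grows as priority drops: 1, 2, 4 ticks. */
static const int quantum[NLEVELS] = {1, 2, 4};

/* One scheduling decision: run the first runnable task at the
 * highest-priority level for one quantum; demote it if the quantum
 * expires before it finishes. */
void mfq_step(struct task *tasks, int n)
{
    for (int lvl = 0; lvl < NLEVELS; lvl++) {
        for (int i = 0; i < n; i++) {
            struct task *t = &tasks[i];
            if (t->remaining > 0 && t->level == lvl) {
                int q = quantum[lvl];
                int ran = t->remaining < q ? t->remaining : q;
                t->remaining -= ran;
                printf("%s ran %d tick(s) at level %d\n",
                       t->name, ran, lvl);
                if (t->remaining > 0 && lvl < NLEVELS - 1)
                    t->level++;   /* used its full slice: demote */
                return;
            }
        }
    }
}

int main(void)
{
    struct task tasks[] = {
        {"interactive", 0, 2},   /* short, latency-sensitive */
        {"batch",       0, 9},   /* long-running             */
    };
    for (int step = 0; step < 8; step++)
        mfq_step(tasks, 2);
    return 0;
}
```

The short task finishes while still near the top; the long task sinks to the bottom level, where it gets long, infrequent slices.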

42 of 82

MFQ

43 of 82

Example: Workload with I/O and Compute Tasks (Multilevel Feedback Queues)

44 of 82

Example: Multilevel Feedback Queues and Starvation

45 of 82

Fair Queueing: Max-Min Fairness

  • Applies to repeating tasks
    • Ex: web server splits its time between compute and disk/network I/O
    • Ex: end hosts transferring packets over a network
  • Maximize the minimum allocation given to a task
    • Keep track of CPU usage over time (including time spent waiting for I/O)
    • Schedule the task with the smallest accumulated CPU time, +/- time quantum
  • Result:
    • If any task needs less than its equal share, it will get its entire allocation
    • Any extra time is split among the remaining tasks using max-min (recursion)
    • If all remaining tasks are compute bound, split evenly

46 of 82

Example: Workload with I/O and Compute Tasks (Fair Queueing)

47 of 82

Weighted Fair Queueing

  • What if some tasks are more important than others?
  • With priority scheduling, higher priority task is always scheduled
    • Lower priority tasks can wait forever (starve) if high priority task is large
  • Weighted fair queueing: assign tasks weights that sum to 1
    • Schedule task with least accumulated CPU usage/weight
    • Every task gets *some* portion of the CPU

48 of 82

Uniprocessor Summary (1)

  • FIFO is simple and minimizes overhead.
  • If tasks are variable in size, then FIFO can have very poor average response time.
  • If tasks are equal in size, FIFO is optimal in terms of average response time.
  • Considering only the processor, SJF is optimal in terms of average response time.
  • SJF is pessimal in terms of variance in response time.

49 of 82

Uniprocessor Summary (2)

  • If tasks are variable in size, Round Robin approximates SJF.
  • If tasks are equal in size, Round Robin will have very poor average response time.
  • Tasks that intermix processor and I/O benefit from SJF and can do poorly under Round Robin, especially with a long quantum
  • MFQ attempts to prioritize latency sensitive tasks
    • MFQ approximates SJF => High variance for long jobs
    • Under high load, MFQ approximates round robin (everyone at highest priority)
  • Fair queueing improves response time for I/O-bound tasks by equalizing CPU usage over a long period
    • Strategy proof: everyone gets equal share of CPU if they need it

50 of 82

Virtual Machines and Two Level Scheduling

  • Suppose the guest kernel running in virtual machine A waits for I/O
    • Nothing else to run => go into idle loop
  • Suppose another guest kernel running in virtual machine B has work to do
  • What should the host kernel do?
    • Leave the CPU with the virtual machine A?
    • Steal the CPU and give it to virtual machine B?
  • How does the host kernel know that the guest OS has nothing to do?
    • Rewrite guest OS idle loop to trap into host kernel

51 of 82

Multiprocessor Scheduling

  • What would happen if we used multilevel feedback or fair queueing on a multiprocessor?
  • Contention for scheduler spinlock
  • Cache slowdown:
    • Task data from last time it ran is still in its old cache => will it run there?
    • ready list (or ptable) data structure pings from one CPU to another
  • Parallel programs (with multiple tasks per process)
    • Won’t be run efficiently
    • Won’t be run efficiently in a virtual machine

52 of 82

Per-Processor Affinity Scheduling

  • Each processor has its own ready list
    • Protected by a per-processor spinlock
    • Put each thread back on the ready list of the processor where it most recently ran
    • Ex: when I/O completes, or on Condition->signal
  • Idle processors can steal work from other processors
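
A compilable sketch of the idea in C, with invented structures (ring-buffer ready lists; lock initialization and blocking when a list is full are omitted).

```c
#include <pthread.h>
#include <stddef.h>

#define NCPU   4
#define NSLOTS 64

struct task;                         /* opaque */

struct cpu_sched {
    pthread_spinlock_t lock;         /* per-processor: little contention */
    struct task *ready[NSLOTS];      /* this CPU's ready list (a ring)   */
    unsigned head, tail;
};

static struct cpu_sched cpus[NCPU]; /* locks initialized elsewhere with
                                       pthread_spin_init()               */

/* Wake a thread onto the CPU where it last ran, for cache affinity. */
void make_ready(struct task *t, int last_cpu)
{
    struct cpu_sched *c = &cpus[last_cpu];
    pthread_spin_lock(&c->lock);
    c->ready[c->tail++ % NSLOTS] = t;
    pthread_spin_unlock(&c->lock);
}

/* An idle CPU checks its own list first, then steals from the rest. */
struct task *next_task(int me)
{
    for (int i = 0; i < NCPU; i++) {
        struct cpu_sched *c = &cpus[(me + i) % NCPU];
        struct task *t = NULL;
        pthread_spin_lock(&c->lock);
        if (c->head != c->tail)
            t = c->ready[c->head++ % NSLOTS];
        pthread_spin_unlock(&c->lock);
        if (t)
            return t;
    }
    return NULL;                     /* truly idle */
}
```

In the common case a CPU touches only its own lock and its own cache-warm tasks; stealing happens only when it would otherwise idle.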

53 of 82

Per-Processor Multi-level Feedback with Affinity Scheduling

54 of 82

Bulk Synchronous Model of Parallelism

  • Loop at each processor:
    • Compute on local data (in parallel)
    • Barrier
    • Send (selected) data to other processors (in parallel)
    • Barrier
  • Examples:
    • Machine learning training and inference
    • MapReduce
    • Fluid flow over a wing
  • Most parallel algorithms can be recast in BSP
    • Within a small constant factor in performance
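
The loop maps directly onto POSIX barriers. Here is a minimal runnable skeleton; the worker count, round count, and the printf placeholders standing in for real compute/exchange phases are arbitrary.

```c
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define NROUNDS  3

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    long id = (long)arg;
    for (int round = 0; round < NROUNDS; round++) {
        /* 1. Compute on local data (in parallel). */
        printf("worker %ld: compute, round %d\n", id, round);
        pthread_barrier_wait(&barrier);
        /* 2. Send (selected) data to other workers (in parallel). */
        printf("worker %ld: exchange, round %d\n", id, round);
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NWORKERS];
    pthread_barrier_init(&barrier, NULL, NWORKERS);
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```

Each barrier makes every worker wait for the slowest one, which is why time-slicing one thread of a BSP program (next slides) stalls the whole gang.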

55 of 82

Bulk Synchronous in Action

56 of 82

Scheduling Parallel Programs

  • What happens if one thread gets time-sliced while other threads from the same program are still running?
    • Assuming program uses locks and condition variables, it will still be correct
    • What about performance?

57 of 82

Scheduling Parallel Programs

Oblivious: each CPU time-slices its ready list independently

58 of 82

Gang Scheduling

59 of 82

Parallel Program Speedup

60 of 82

Virtual Machines and Two Level Scheduling

  • Suppose the guest kernel running in virtual machine A is running a parallel program with limited parallelism
    • May be using all of its CPUs, just inefficiently
  • Suppose another guest kernel running in virtual machine B is running a highly parallel program
  • What should the host kernel do?
  • How does the host kernel know that the guest OS in A is inefficient?

61 of 82

Space Sharing

Scheduler activations: the kernel tells each application its processor assignment, and notifies it every time the assignment changes

62 of 82

Queueing Theory

  • Can we predict what will happen to user performance:
    • If a service becomes more popular?
    • If we buy more hardware?
    • If we change the implementation to provide more features?

63 of 82

Queueing Model

Assumption: average performance in a stable system, where the arrival rate (λ) matches the departure rate (μ)

64 of 82

Definitions

  • Queueing delay (W): wait time
    • Number of tasks queued (Q)
  • Service time (S): time to service the request
  • Response time (R) = queueing delay + service time
  • Utilization (U): fraction of time the server is busy
    • Service time * arrival rate (λ)
  • Throughput (X): rate of task completions
    • If no overload, throughput = arrival rate

65 of 82

Little’s Law

N = X * R

N: number of tasks in the system

Applies to any stable system – where arrivals match departures

  • Independent of scheduling discipline and burstiness

66 of 82

Question

Suppose a system has throughput (X) = 100 tasks/s, average response time (R) = 50 ms/task

  • How many tasks are in the system on average?
    • Hint: Little’s Law N = X * R

67 of 82

Question

Suppose a system has throughput (X) = 100 tasks/s, average response time (R) = 50 ms/task

  • If the server takes 5 ms/task, what is its utilization? (N = X * R)

68 of 82

Question

Suppose a system has throughput (X) = 100 tasks/s, average response time (R) = 50 ms/task

  • What is the average wait time?
  • What is the average number of queued tasks?

69 of 82

Question

  • From example:

X = 100 task/sec

R = 50 ms/task

S = 5 ms/task

W = 45 ms/task

Q = 4.5 tasks

  • What gives? W = 45 ms while S * Q = 22.5 ms
    • Hint: what if S = 10ms? S = 1ms?
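
One way to sanity-check these numbers is to apply Little's Law twice, once to the whole system and once to the queue alone; all values below come from the example above.

```c
#include <stdio.h>

int main(void)
{
    double X = 100.0;   /* throughput, tasks/s */
    double R = 0.050;   /* response time, s    */
    double S = 0.005;   /* service time, s     */

    double N = X * R;   /* tasks in system: 5                        */
    double U = X * S;   /* server utilization: 0.5                   */
    double W = R - S;   /* queueing delay: 45 ms                     */
    double Q = X * W;   /* queued tasks (Little's Law on queue): 4.5 */

    printf("N=%.1f  U=%.2f  W=%.0f ms  Q=%.1f\n", N, U, W * 1e3, Q);
    return 0;
}
```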

70 of 82

Queueing

  • What is the best case scenario for minimizing queueing delay?
    • Keeping arrival rate, service time constant
  • What is the worst case scenario?

71 of 82

Queueing: Best Case

72 of 82

Response Time: Best vs. Worst Case

73 of 82

Queueing: Average Case?

  • What is average?
    • Gaussian: Arrivals are spread out, around a mean value
    • Exponential: arrivals are memoryless
    • Heavy-tailed: arrivals are bursty

  • Can have randomness in both arrivals and service times

74 of 82

Exponential Distribution

75 of 82

Exponential Distribution

Permits closed form solution to state probabilities, as a function of arrival rate and service rate

76 of 82

Response Time vs. Utilization (Exponential)

77 of 82

Question

  • Exponential arrivals: R = S/(1-U)
  • If system is 20% utilized, and load increases by 5%, how much does response time increase?
  • If system is 90% utilized, and load increases by 5%, how much does response time increase?
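
Reading "load increases by 5%" as utilization rising by 5% of its value, a quick computation with R = S/(1-U) (S set to 1 so R is in units of service times):

```c
#include <stdio.h>

int main(void)
{
    double S = 1.0;
    double cases[][2] = {
        {0.20, 0.20 * 1.05},   /* 20% utilized, +5% load */
        {0.90, 0.90 * 1.05},   /* 90% utilized, +5% load */
    };
    for (int i = 0; i < 2; i++) {
        double r0 = S / (1 - cases[i][0]);   /* response time before */
        double r1 = S / (1 - cases[i][1]);   /* response time after  */
        printf("U %.2f -> %.3f: R %.2fS -> %.2fS (%.0f%% worse)\n",
               cases[i][0], cases[i][1], r0, r1, (r1 / r0 - 1) * 100);
    }
    return 0;
}
```

At 20% utilization the response time barely moves (about 1%); at 90% it nearly doubles (about 82%), which is why heavily loaded systems feel so fragile.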

78 of 82

Variance in Response Time

  • Exponential arrivals
    • Variance in R = S/(1-U)^2

  • What if less bursty than exponential?

  • What if more bursty than exponential?

79 of 82

What if Multiple Resources?

  • Assuming exponential arrival, service times
  • Response time R = sum over all resources i of: S_i / (1 – U_i)

  • Implication
    • If you fix one bottleneck, the next highest utilized resource will limit performance
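
A small worked example of the formula; the service times and utilizations below are made up to show the bottleneck effect.

```c
#include <stdio.h>

int main(void)
{
    /* Assumed numbers for three resources: CPU, disk, network. */
    double S[] = {0.005, 0.002, 0.001};   /* service times, s */
    double U[] = {0.90,  0.50,  0.20};    /* utilizations     */

    double R = 0;
    for (int i = 0; i < 3; i++)
        R += S[i] / (1 - U[i]);

    /* CPU term: 50 ms; disk: 4 ms; net: 1.25 ms. The CPU dominates;
     * once that bottleneck is fixed, the disk term is the next limit. */
    printf("R = %.1f ms\n", R * 1000);
    return 0;
}
```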

80 of 82

Overload Management

  • What if arrivals occur faster than the service can handle them?
    • If we do nothing, response time will become infinite
  • Turn users away?
    • Which ones? Average response time is best if we turn away the users with the highest remaining service demand (shortest job first!)
    • Example: Highway congestion
  • Degrade service?
    • Compute result with fewer resources
    • Example: CNN static front page on 9/11

81 of 82

Highway Congestion (measured)

82 of 82

Why Do Metro Buses Cluster?

Suppose two Metro buses start 10 minutes apart. Why might they arrive at the same time?