1 of 38

CSE 451

Operating Systems

L8 - Synchronization

Slides by: Tom Anderson

Baris Kasikci

2 of 38

Midterm Feedback

50%+ participation -> 1 extra day on the pset

75%+ participation -> 2 extra days on the pset

3 of 38

xk swtch (swtch.S)

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

4 of 38

xk swtch (swtch.S): Switch from A -> B

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

Step 1: Save old thread’s preserved state (A)

5 of 38

xk swtch (swtch.S): Switch from A -> B

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

Step 2: Record A’s execution point

mov %rsp, (%rdi):

  • source: the current stack pointer
  • destination: a memory field inside the TCB

6 of 38

xk swtch (swtch.S): Switch from A -> B

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

Step 3: Switch to the new thread’s (B) stack

7 of 38

xk swtch (swtch.S): Switch from A -> B

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

Step 4: Restore B’s state and resume it

8 of 38

A Subtlety

  • Thread_create puts new thread on ready list
  • When it first runs, some thread calls swtch
    • Saves old thread state to stack
    • Restores new thread state from stack

=> Set up the newly created thread so that swtch will "resume" at the start of the thread
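One way to picture this setup is the following C sketch. The struct layout mirrors swtch's pushes/pops; names like setup_new_thread and example_stub are illustrative, not xk's actual code:

```c
#include <stdint.h>

// Hypothetical saved-context layout matching swtch: callee-saved
// registers in pop order, then the return address consumed by ret.
struct context {
    uint64_t r15, r14, r13, r12, r11, rbx, rbp;
    uint64_t rip;   // popped by swtch's ret instruction
};

// Stand-in for the startup stub the new thread should "return" into.
static void example_stub(void) { /* call F, then thread_exit */ }

// Carve a fake context out of the top of the new thread's stack so that
// when swtch pops registers and executes ret, control lands in the stub.
struct context *setup_new_thread(char *stack_top, void (*stub)(void)) {
    struct context *ctx =
        (struct context *)(stack_top - sizeof(struct context));
    *ctx = (struct context){0};      // callee-saved regs start as zero
    ctx->rip = (uint64_t)stub;       // "return address" = stub entry
    return ctx;                      // this pointer goes in the TCB
}
```

Storing this pointer as the thread's saved stack pointer makes the very first swtch into the thread look exactly like resuming it.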

9 of 38

Stack Progression

10 of 38

Why is there a stub at all?

  • F is an ordinary function that might return normally
    • Nowhere sensible to go in that case!
  • With the stub, once F returns, the stub calls thread_exit to clean up
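A sketch of what such a stub might look like in C (hedged: the function names and the thread_exit stand-in are illustrative, not xk's exact code):

```c
typedef void (*thread_fn)(void *);

static int g_result;   // written by the example body below

// Stand-in for the real thread_exit, which would mark the thread
// finished and switch away, never returning.
static void thread_exit(void) { }

// Startup stub: a newly created thread "returns" out of swtch into
// this function, which runs F and then cleans up if F ever returns.
static void thread_stub(thread_fn f, void *arg) {
    f(arg);            // run the thread's body; it may return normally
    thread_exit();     // now there is somewhere sensible to go
}

// Example thread body, used only for demonstration.
static void example_body(void *arg) { g_result = *(int *)arg; }
```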

11 of 38

Why is there a stub at all?

  • When the scheduler switches to B:
    • swtch loads this thread’s saved stack pointer into %rsp
    • pops registers, executes ret.
  • ret uses the saved return address on this stack, which points to the stub.

12 of 38

What Happens After Switch?

  • The stub is running and calls F
    • The stub's return PC is on the stack
    • Arguments are in registers or on the stack

13 of 38

After F calls yield

  • Thread B calls yield in F
    • yield calls swtch
    • swtch pushes the callee-saved registers
  • What happens when thread B resumes?
    • %rsp is restored to the saved top of the stack
    • swtch pops the saved registers
    • Control returns to yield and then to F
  • When F returns
    • The stub calls thread_exit to clean up

14 of 38

Timer Interrupt -> swtch


18 of 38

Two Threads Call Yield

19 of 38

Synchronization

20 of 38

Motivation

  • When threads concurrently read/write shared memory, program behavior is very difficult to reason about
    • Two threads write to the same variable; what is the result?
    • The thread schedule is non-deterministic -> behavior changes run to run
  • The C++ language standard says that program behavior with concurrent reads/writes is undefined (if you have a data race)
    • Literally, you can't know what code the compiler will generate
  • Hardware behavior with concurrent reads/writes is architecture-specific
    • And hard to understand/explain to programmers
    • The compiler is much less effective if we try to assign semantics to programs with data races

21 of 38

Question: Can this panic?

Thread 1:

  p = someComputation();
  pInitialized = true;

Thread 2:

  while (!pInitialized)
    ;
  q = someFunction(p);
  if (q != someFunction(p))
    panic

22 of 38

Definitions

Race condition: output of a concurrent program depends on the order of operations between threads

Data race: when a thread can read or modify a memory location while another thread may be modifying it (2 threads, at least 1 is a write, no intervening synchronization)

Memory Fence: hardware instruction to enforce ordering

  • All instructions before fence complete before any started after fence

Mutual exclusion: only one thread does a particular thing at a time

  • Critical section: piece of code that only one thread can execute at once

Lock: prevent someone from doing something

  • Lock before entering critical section, before accessing shared data
  • Unlock when leaving, after done accessing shared data
  • Wait if locked (all synchronization involves waiting!)
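The memory-fence definition above can be made concrete with C11 atomics (an illustrative sketch, not from the slides; produce/consume are made-up names):

```c
#include <stdatomic.h>
#include <stdbool.h>

int payload;                  // ordinary shared data
atomic_bool ready = false;    // flag that publishes the payload

// Producer: finish writing the payload before the flag becomes visible.
void produce(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);  // writes above complete first
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

// Consumer: spin on the flag, then read the payload.
int consume(void) {
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;                                       // wait for the flag
    atomic_thread_fence(memory_order_acquire);  // reads below start after
    return payload;
}
```

Without the fences, the compiler or hardware could reorder the payload write past the flag write, which is exactly the pInitialized bug from the earlier question slide.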

23 of 38

Data Races in C++

C++ has undefined semantics when there is a data race (since 2006)

The compiler optimizes code assuming there are no data races

  • If variables can spontaneously change, most compiler optimizations become impossible
  • https://www.hboehm.info/c++mm/why_undef.html

Processor architectures have non-intuitive memory semantics

Write buffering: next instruction executes while write is being completed

24 of 38

Too Much Milk Example

Time    Person A                        Person B
12:30   Look in fridge. Out of milk.
12:35   Leave for store.
12:40   Arrive at store.                Look in fridge. Out of milk.
12:45   Buy milk.                       Leave for store.
12:50   Arrive home, put milk away.     Arrive at store.
12:55                                   Buy milk.
1:00                                    Arrive home, put milk away. Oh no!

25 of 38

Too Much Milk, #1

  • Correctness property
    • Someone buys if needed (liveness)
    • At most one person buys (safety)
  • Try #1: leave a note

  if (!note)
    if (!milk) {
      leave note
      buy milk
      remove note
    }

26 of 38

Too Much Milk, #2

Thread A:

  leave note A
  if (!note B) {
    if (!milk)
      buy milk
  }
  remove note A

Thread B:

  leave note B
  if (!note A) {
    if (!milk)
      buy milk
  }
  remove note B

27 of 38

Too Much Milk, #3

Thread A:

  leave note A
  while (note B)   // X
    do nothing;
  if (!milk)
    buy milk;
  remove note A

Thread B:

  leave note B
  if (!note A) {   // Y
    if (!milk)
      buy milk
  }
  remove note B

At X and Y, we can guarantee that either:

  1. It is safe for me to buy, or
  2. The other will buy, so it is OK to quit

28 of 38

Lessons

  • Solution is complicated
    • “obvious” code often has bugs
  • Modern compilers/architectures reorder instructions
    • Making reasoning even more difficult
  • Generalizing to many threads/processors
    • Even more complex: see Peterson’s algorithm
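For reference, a sketch of Peterson's two-thread algorithm, written with C11 sequentially consistent atomics so the compiler/hardware reordering the slides warn about cannot break it (the plain-variable version is a data race and does not work on modern machines):

```c
#include <stdatomic.h>
#include <stdbool.h>

// Peterson's algorithm for two threads with ids 0 and 1.
atomic_bool flag[2];   // flag[i]: thread i wants to enter
atomic_int turn;       // whose turn it is to wait

void peterson_lock(int self) {
    int other = 1 - self;
    atomic_store(&flag[self], true);   // I want to enter
    atomic_store(&turn, other);        // but you go first
    // Spin while the other thread wants in and has priority.
    while (atomic_load(&flag[other]) && atomic_load(&turn) == other)
        ;
}

void peterson_unlock(int self) {
    atomic_store(&flag[self], false);
}
```

Generalizing this to N threads (e.g., the filter or bakery algorithms) is where the real complexity shows up, which is why hardware atomic instructions are used instead.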

30 of 38

Locks

Lock::acquire: wait until the lock is free, then take it

Lock::release: release the lock, waking up anyone waiting for it

  1. At most one lock holder at a time (safety)
  2. If no one holding, acquire gets lock (progress)
  3. If all lock holders finish and no higher priority waiters, waiter eventually gets lock (progress)
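For intuition, acquire/release can be sketched as a spinlock over a C11 atomic_flag (an illustrative user-level sketch; real kernel locks also block waiters or disable interrupts):

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag held;   // clear = free, set = busy
} spinlock_t;

void lock_acquire(spinlock_t *l) {
    // test_and_set atomically takes the lock; spin while already held.
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

void lock_release(spinlock_t *l) {
    // Only the holder should call this (see the rules below).
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The acquire/release memory orders ensure that everything done inside the critical section is visible to the next thread that acquires the lock, satisfying the safety property above.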

31 of 38

Question: Why only Acquire/Release?

Suppose we add a method to a lock that asks whether the lock is free, and it returns true. Is the lock:

  • Free?
  • Busy?
  • Don't know?

32 of 38

Too Much Milk, #4

Locks allow concurrent code to be much simpler:

  lock.acquire();
  if (!milk)
    buy milk
  lock.release();
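The same pattern rendered in C with POSIX threads (a sketch; buy_milk and roommate are stand-in names, not from the slides):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

pthread_mutex_t milk_lock = PTHREAD_MUTEX_INITIALIZER;
bool milk = false;

// Stand-in for going to the store.
static void buy_milk(void) { milk = true; }

// Any number of threads can run this concurrently; at most one buys,
// because the check and the purchase happen inside one critical section.
void *roommate(void *arg) {
    (void)arg;
    pthread_mutex_lock(&milk_lock);
    if (!milk)
        buy_milk();
    pthread_mutex_unlock(&milk_lock);
    return NULL;
}
```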

33 of 38

Lock Example: Malloc/Free

char *malloc(n) {
  heaplock.acquire();
  p = allocate memory;
  heaplock.release();
  return p;
}

void free(char *p) {
  heaplock.acquire();
  put p back on free list;
  heaplock.release();
}

34 of 38

Rules for Using Locks

  • Lock is initially free
  • Always acquire before accessing shared data structure
    • Beginning of procedure!
  • Always release after finishing with shared data
    • End of procedure!
    • Only the lock holder can release
    • DO NOT throw lock for someone else to release
  • Never access shared data without lock
    • Danger!

35 of 38

Double Checked Locking

if (p == NULL) {
  lock.acquire();
  if (p == NULL) {
    p = newP();
  }
  lock.release();
}
use p->field1

newP() {
  tmp = malloc(sizeof(p));
  tmp->field1 = …
  tmp->field2 = …
  return tmp;
}
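As written, the unlocked first read of p is a data race with the store inside the locked region, so the reader can see the pointer before the fields. A sketch of one repaired version using C11 acquire/release atomics plus a pthread mutex (names and field values are illustrative, not the slide's code):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct P { int field1, field2; };

static _Atomic(struct P *) p;    // shared pointer, starts NULL
static pthread_mutex_t p_lock = PTHREAD_MUTEX_INITIALIZER;

static struct P *newP(void) {
    struct P *tmp = malloc(sizeof *tmp);
    tmp->field1 = 1;             // illustrative values
    tmp->field2 = 2;
    return tmp;
}

struct P *get_p(void) {
    // Acquire load: if we see the pointer, we also see the fields.
    struct P *local = atomic_load_explicit(&p, memory_order_acquire);
    if (local == NULL) {
        pthread_mutex_lock(&p_lock);
        local = atomic_load_explicit(&p, memory_order_relaxed);
        if (local == NULL) {
            local = newP();
            // Release store: fields are published before the pointer.
            atomic_store_explicit(&p, local, memory_order_release);
        }
        pthread_mutex_unlock(&p_lock);
    }
    return local;
}
```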

36 of 38

Single Checked Locking

lock.acquire();
if (p == NULL) {
  p = newP();
}
lock.release();
use p->field1

newP() {
  tmp = malloc(sizeof(p));
  tmp->field1 = …
  tmp->field2 = …
  return tmp;
}

37 of 38

Example: Bounded Buffer

tryget() {
  lock.acquire();
  item = NULL;
  if (front < tail) {
    item = buf[front % MAX];
    front++;
  }
  lock.release();
  return item;
}

tryput(item) {
  lock.acquire();
  success = FALSE;
  if ((tail - front) < MAX) {
    buf[tail % MAX] = item;
    tail++;
    success = TRUE;
  }
  lock.release();
  return success;
}

Initially: front = tail = 0; lock = FREE; MAX is the buffer capacity
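The same buffer rendered as runnable C, with a pthread mutex standing in for the slide's lock; since the items here are plain ints, tryget returns a bool plus an out-parameter rather than NULL, and MAX = 8 is an assumed capacity:

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX 8                          // assumed buffer capacity

static int buf[MAX];
static unsigned front = 0, tail = 0;   // only ever increase; index mod MAX
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

// Remove the oldest item if the buffer is non-empty.
bool tryget(int *item) {
    bool ok = false;
    pthread_mutex_lock(&lock);
    if (front < tail) {                // non-empty
        *item = buf[front % MAX];
        front++;
        ok = true;
    }
    pthread_mutex_unlock(&lock);
    return ok;
}

// Append an item if there is room.
bool tryput(int item) {
    bool success = false;
    pthread_mutex_lock(&lock);
    if (tail - front < MAX) {          // not full
        buf[tail % MAX] = item;
        tail++;
        success = true;
    }
    pthread_mutex_unlock(&lock);
    return success;
}
```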

38 of 38

Question

  • If tryget returns NULL, do we know the buffer is empty?
  • If we poll tryget in a loop, what happens to a thread calling tryput?