1 of 38

CSE 451

Operating Systems

L8 - Synchronization

Slides by: Tom Anderson

Baris Kasikci

2 of 38

Midterm Feedback

50%+ participation -> 1 extra day on the pset

75%+ participation -> 2 extra days on the pset

3 of 38

xk swtch (swtch.S)

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

4 of 38

xk swtch (swtch.S): Switch from A -> B

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

Step 1: Save old thread’s preserved state (A)

5 of 38

xk swtch (swtch.S): Switch from A -> B

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

Step 2: Record A’s execution point

mov %rsp, (%rdi):

  • source: the current stack pointer
  • destination: a memory field inside the TCB

6 of 38

xk swtch (swtch.S): Switch from A -> B

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

Step 3: Switch to the new thread’s (B) stack

7 of 38

xk swtch (swtch.S): Switch from A -> B

swtch:
  // callee saved registers already saved
  // ptr to old PCB is in rdi
  push %rbp
  push %rbx
  push %r11
  push %r12
  push %r13
  push %r14
  push %r15
  mov %rsp, (%rdi)
  // ptr to new PCB is in rsi
  mov %rsi, %rsp
  pop %r15
  pop %r14
  pop %r13
  pop %r12
  pop %r11
  pop %rbx
  pop %rbp
  ret

Step 4: Restore B’s state and resume it

8 of 38

A Subtlety

  • Thread_create puts new thread on ready list
  • When it first runs, some thread calls swtch
    • Saves old thread state to stack
    • Restores new thread state from stack

=> Set up the newly created thread so that swtch will "resume" at the start of the thread
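One way to picture this setup is the following C sketch. The struct layout mirrors swtch's pushes/pops; names like setup_new_thread and example_stub are illustrative, not xk's actual code:

```c
#include <stdint.h>

// Hypothetical saved-context layout matching swtch: callee-saved
// registers in pop order, then the return address consumed by ret.
struct context {
    uint64_t r15, r14, r13, r12, r11, rbx, rbp;
    uint64_t rip;   // popped by swtch's ret instruction
};

// Stand-in for the startup stub the new thread should "return" into.
static void example_stub(void) { /* call F, then thread_exit */ }

// Carve a fake context out of the top of the new thread's stack so that
// when swtch pops registers and executes ret, control lands in the stub.
struct context *setup_new_thread(char *stack_top, void (*stub)(void)) {
    struct context *ctx =
        (struct context *)(stack_top - sizeof(struct context));
    *ctx = (struct context){0};      // callee-saved regs start as zero
    ctx->rip = (uint64_t)stub;       // "return address" = stub entry
    return ctx;                      // this pointer goes in the TCB
}
```

Storing this pointer as the thread's saved stack pointer makes the very first swtch into the thread look exactly like resuming it.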

9 of 38

Stack Progression

10 of 38

Why is there a stub at all?

  • F is an ordinary function that might return normally
    • Nowhere sensible to go in that case!
  • With the stub, once F returns, the stub calls thread_exit to clean up
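A sketch of what such a stub might look like in C (hedged: the function names and the thread_exit stand-in are illustrative, not xk's exact code):

```c
typedef void (*thread_fn)(void *);

static int g_result;   // written by the example body below

// Stand-in for the real thread_exit, which would mark the thread
// finished and switch away, never returning.
static void thread_exit(void) { }

// Startup stub: a newly created thread "returns" out of swtch into
// this function, which runs F and then cleans up if F ever returns.
static void thread_stub(thread_fn f, void *arg) {
    f(arg);            // run the thread's body; it may return normally
    thread_exit();     // now there is somewhere sensible to go
}

// Example thread body, used only for demonstration.
static void example_body(void *arg) { g_result = *(int *)arg; }
```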

11 of 38

Why is there a stub at all?

  • When the scheduler switches to B:
    • swtch loads this thread’s saved stack pointer into %rsp
    • pops registers, executes ret.
  • ret uses the saved return address on this stack, which points to the stub.

12 of 38

What Happens After Switch?

  • The stub is running and calls F
    • The stub's return PC is on the stack
    • Arguments are in registers or on the stack

13 of 38

After F calls yield

  • Thread B calls yield in F
    • yield calls swtch
    • swtch pushes the callee-saved registers
  • What happens when thread B resumes?
    • %rsp is restored to the saved top of the stack
    • swtch pops the saved registers
    • Control returns to yield and then to F
  • When F returns
    • The stub calls thread_exit to clean up

14 of 38

Timer Interrupt -> swtch


18 of 38

Two Threads Call Yield

19 of 38

Synchronization

20 of 38

Motivation

  • When threads concurrently read/write shared memory, program behavior is very difficult to reason about
    • Two threads write to the same variable; what is the result?
    • The thread schedule is non-deterministic -> behavior changes run to run
  • The C++ language standard says that program behavior with concurrent reads/writes is undefined (if you have a data race)
    • Literally, you can't know what code the compiler will generate
  • Hardware behavior with concurrent reads/writes is architecture-specific
    • And hard to understand/explain to programmers
    • The compiler is much less effective if we try to assign semantics to programs with data races

21 of 38

Question: Can this panic?

Thread 1:

  p = someComputation();
  pInitialized = true;

Thread 2:

  while (!pInitialized)
    ;
  q = someFunction(p);
  if (q != someFunction(p))
    panic

22 of 38

Definitions

Race condition: output of a concurrent program depends on the order of operations between threads

Data race: when a thread can read or modify a memory location while another thread may be modifying it (2 threads, at least 1 is a write, no intervening synchronization)

Memory Fence: hardware instruction to enforce ordering

  • All instructions before fence complete before any started after fence

Mutual exclusion: only one thread does a particular thing at a time

  • Critical section: piece of code that only one thread can execute at once

Lock: prevent someone from doing something

  • Lock before entering critical section, before accessing shared data
  • Unlock when leaving, after done accessing shared data
  • Wait if locked (all synchronization involves waiting!)
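The memory-fence definition above can be made concrete with C11 atomics (an illustrative sketch, not from the slides; produce/consume are made-up names):

```c
#include <stdatomic.h>
#include <stdbool.h>

int payload;                  // ordinary shared data
atomic_bool ready = false;    // flag that publishes the payload

// Producer: finish writing the payload before the flag becomes visible.
void produce(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);  // writes above complete first
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

// Consumer: spin on the flag, then read the payload.
int consume(void) {
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;                                       // wait for the flag
    atomic_thread_fence(memory_order_acquire);  // reads below start after
    return payload;
}
```

Without the fences, the compiler or hardware could reorder the payload write past the flag write, which is exactly the pInitialized bug from the earlier question slide.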

23 of 38

Data Races in C++

C++ has undefined semantics when there is a data race (since 2006)

The compiler optimizes code assuming there are no data races

  • If variables can spontaneously change, most compiler optimizations become impossible
  • https://www.hboehm.info/c++mm/why_undef.html

Processor architectures have non-intuitive memory semantics

Write buffering: next instruction executes while write is being completed

24 of 38

Too Much Milk Example

Time    Person A                        Person B
12:30   Look in fridge. Out of milk.
12:35   Leave for store.
12:40   Arrive at store.                Look in fridge. Out of milk.
12:45   Buy milk.                       Leave for store.
12:50   Arrive home, put milk away.     Arrive at store.
12:55                                   Buy milk.
1:00                                    Arrive home, put milk away. Oh no!

25 of 38

Too Much Milk, #1

  • Correctness property
    • Someone buys if needed (liveness)
    • At most one person buys (safety)
  • Try #1: leave a note

  if (!note)
    if (!milk) {
      leave note
      buy milk
      remove note
    }

26 of 38

Too Much Milk, #2

Thread A:

  leave note A
  if (!note B) {
    if (!milk)
      buy milk
  }
  remove note A

Thread B:

  leave note B
  if (!note A) {
    if (!milk)
      buy milk
  }
  remove note B

27 of 38

Too Much Milk, #3

Thread A:

  leave note A
  while (note B)   // X
    do nothing;
  if (!milk)
    buy milk;
  remove note A

Thread B:

  leave note B
  if (!note A) {   // Y
    if (!milk)
      buy milk
  }
  remove note B

At X and Y, we can guarantee that either:

  1. It is safe for me to buy, or
  2. The other will buy, so it is OK to quit

28 of 38

Lessons

  • Solution is complicated
    • “obvious” code often has bugs
  • Modern compilers/architectures reorder instructions
    • Making reasoning even more difficult
  • Generalizing to many threads/processors
    • Even more complex: see Peterson’s algorithm
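For reference, a sketch of Peterson's two-thread algorithm, written with C11 sequentially consistent atomics so the compiler/hardware reordering the slides warn about cannot break it (the plain-variable version is a data race and does not work on modern machines):

```c
#include <stdatomic.h>
#include <stdbool.h>

// Peterson's algorithm for two threads with ids 0 and 1.
atomic_bool flag[2];   // flag[i]: thread i wants to enter
atomic_int turn;       // whose turn it is to wait

void peterson_lock(int self) {
    int other = 1 - self;
    atomic_store(&flag[self], true);   // I want to enter
    atomic_store(&turn, other);        // but you go first
    // Spin while the other thread wants in and has priority.
    while (atomic_load(&flag[other]) && atomic_load(&turn) == other)
        ;
}

void peterson_unlock(int self) {
    atomic_store(&flag[self], false);
}
```

Generalizing this to N threads (e.g., the filter or bakery algorithms) is where the real complexity shows up, which is why hardware atomic instructions are used instead.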

30 of 38

Locks

Lock::acquire: wait until the lock is free, then take it

Lock::release: release the lock, waking up anyone waiting for it

  1. At most one lock holder at a time (safety)
  2. If no one holding, acquire gets lock (progress)
  3. If all lock holders finish and no higher priority waiters, waiter eventually gets lock (progress)
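For intuition, acquire/release can be sketched as a spinlock over a C11 atomic_flag (an illustrative user-level sketch; real kernel locks also block waiters or disable interrupts):

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag held;   // clear = free, set = busy
} spinlock_t;

void lock_acquire(spinlock_t *l) {
    // test_and_set atomically takes the lock; spin while already held.
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

void lock_release(spinlock_t *l) {
    // Only the holder should call this (see the rules below).
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The acquire/release memory orders ensure that everything done inside the critical section is visible to the next thread that acquires the lock, satisfying the safety property above.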

31 of 38

Question: Why only Acquire/Release?

Suppose we add a method to a lock that asks whether the lock is free, and it returns true. Is the lock:

  • Free?
  • Busy?
  • Don't know?

32 of 38

Too Much Milk, #4

Locks allow concurrent code to be much simpler:

  lock.acquire();
  if (!milk)
    buy milk
  lock.release();
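The same pattern rendered in C with POSIX threads (a sketch; buy_milk and roommate are stand-in names, not from the slides):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

pthread_mutex_t milk_lock = PTHREAD_MUTEX_INITIALIZER;
bool milk = false;

// Stand-in for going to the store.
static void buy_milk(void) { milk = true; }

// Any number of threads can run this concurrently; at most one buys,
// because the check and the purchase happen inside one critical section.
void *roommate(void *arg) {
    (void)arg;
    pthread_mutex_lock(&milk_lock);
    if (!milk)
        buy_milk();
    pthread_mutex_unlock(&milk_lock);
    return NULL;
}
```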

33 of 38

Lock Example: Malloc/Free

char *malloc(n) {
  heaplock.acquire();
  p = allocate memory;
  heaplock.release();
  return p;
}

void free(char *p) {
  heaplock.acquire();
  put p back on free list;
  heaplock.release();
}

34 of 38

Rules for Using Locks

  • Lock is initially free
  • Always acquire before accessing shared data structure
    • Beginning of procedure!
  • Always release after finishing with shared data
    • End of procedure!
    • Only the lock holder can release
    • DO NOT throw lock for someone else to release
  • Never access shared data without lock
    • Danger!

35 of 38

Double Checked Locking

if (p == NULL) {
  lock.acquire();
  if (p == NULL) {
    p = newP();
  }
  lock.release();
}
use p->field1

newP() {
  tmp = malloc(sizeof(p));
  tmp->field1 = …
  tmp->field2 = …
  return tmp;
}
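As written, the unlocked first read of p is a data race with the store inside the locked region, so the reader can see the pointer before the fields. A sketch of one repaired version using C11 acquire/release atomics plus a pthread mutex (names and field values are illustrative, not the slide's code):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct P { int field1, field2; };

static _Atomic(struct P *) p;    // shared pointer, starts NULL
static pthread_mutex_t p_lock = PTHREAD_MUTEX_INITIALIZER;

static struct P *newP(void) {
    struct P *tmp = malloc(sizeof *tmp);
    tmp->field1 = 1;             // illustrative values
    tmp->field2 = 2;
    return tmp;
}

struct P *get_p(void) {
    // Acquire load: if we see the pointer, we also see the fields.
    struct P *local = atomic_load_explicit(&p, memory_order_acquire);
    if (local == NULL) {
        pthread_mutex_lock(&p_lock);
        local = atomic_load_explicit(&p, memory_order_relaxed);
        if (local == NULL) {
            local = newP();
            // Release store: fields are published before the pointer.
            atomic_store_explicit(&p, local, memory_order_release);
        }
        pthread_mutex_unlock(&p_lock);
    }
    return local;
}
```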

36 of 38

Single Checked Locking

lock.acquire();
if (p == NULL) {
  p = newP();
}
lock.release();
use p->field1

newP() {
  tmp = malloc(sizeof(p));
  tmp->field1 = …
  tmp->field2 = …
  return tmp;
}

37 of 38

Example: Bounded Buffer

tryget() {
  lock.acquire();
  item = NULL;
  if (front < tail) {
    item = buf[front % MAX];
    front++;
  }
  lock.release();
  return item;
}

tryput(item) {
  lock.acquire();
  success = FALSE;
  if ((tail - front) < MAX) {
    buf[tail % MAX] = item;
    tail++;
    success = TRUE;
  }
  lock.release();
  return success;
}

Initially: front = tail = 0; lock = FREE; MAX is the buffer capacity
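The same buffer rendered as runnable C, with a pthread mutex standing in for the slide's lock; since the items here are plain ints, tryget returns a bool plus an out-parameter rather than NULL, and MAX = 8 is an assumed capacity:

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX 8                          // assumed buffer capacity

static int buf[MAX];
static unsigned front = 0, tail = 0;   // only ever increase; index mod MAX
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

// Remove the oldest item if the buffer is non-empty.
bool tryget(int *item) {
    bool ok = false;
    pthread_mutex_lock(&lock);
    if (front < tail) {                // non-empty
        *item = buf[front % MAX];
        front++;
        ok = true;
    }
    pthread_mutex_unlock(&lock);
    return ok;
}

// Append an item if there is room.
bool tryput(int item) {
    bool success = false;
    pthread_mutex_lock(&lock);
    if (tail - front < MAX) {          // not full
        buf[tail % MAX] = item;
        tail++;
        success = true;
    }
    pthread_mutex_unlock(&lock);
    return success;
}
```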

38 of 38

Question

  • If tryget returns NULL, do we know the buffer is empty?
  • If we poll tryget in a loop, what happens to a thread calling tryput?