1 of 31

Parallelism & Concurrency

CSE 332 (Section 9)

Original slides by: Louis Maliyam, Monty Nitschke

2 of 31

Announcements

  • EX11-14 due TODAY (12/01 11:59 PM)
    • No lates accepted.
  • Project 3 due next Tuesday (12/06 11:59 PM)
    • You got this! 💪

3 of 31

Problem 0

Parallel Prefix Sum

Recap

4 of 31

(Q0) Parallel Prefix Sum

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: (blank)

Note:

- FL is “FromLeft”

- p: non-leaf node

  • leaves[i].sum =
  • p.sum =
  • p.right.FL =
  • p.left.FL =
  • output[i] =

5 of 31

(Q0) Parallel Prefix Sum

[Tree diagram: a balanced binary tree over ranges (0,8); (0,4), (4,8); (0,2), (2,4), (4,6), (6,8); and leaves (0,1) through (7,8). Every node's Sum and FL fields are blank, to be filled in.]

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: (blank)

Note:

- FL is “FromLeft”

- p: non-leaf node

Note: say cutoff is 1

  • leaves[i].sum =
  • p.sum =
  • p.right.FL =
  • p.left.FL =
  • output[i] =

Divide problem into parallel pieces

6 of 31

(Q0) Parallel Prefix Sum - 1st Pass

  • leaves[i].sum = input[i]
  • p.sum =
  • p.right.FL =
  • p.left.FL =
  • output[i] =

[Tree diagram: internal Sum and all FL fields still blank; leaf sums filled from the input: (0,1)=8, (1,2)=9, (2,3)=6, (3,4)=3, (4,5)=2, (5,6)=5, (6,7)=7, (7,8)=4.]

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: (blank)

Note:

- FL is “FromLeft”

- p: non-leaf node

Find sum at the cutoff

Note: say cutoff is 1

7 of 31

(Q0) Parallel Prefix Sum - 1st Pass

  • leaves[i].sum = input[i]
  • p.sum = p.left.sum + p.right.sum
  • p.right.FL =
  • p.left.FL =
  • output[i] =

[Tree diagram: leaf sums filled as on the previous slide; internal Sum and all FL fields still blank.]

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: (blank)

Note:

- FL is “FromLeft”

- p: non-leaf node

Find sum going up and build the tree by combining parallel pieces

Note: say cutoff is 1

[The up pass fills the internal sums: (0,2)=17, (2,4)=9, (4,6)=7, (6,8)=11; (0,4)=26, (4,8)=18; root (0,8)=44. FL fields are still blank.]

8 of 31

(Q0) Parallel Prefix Sum - 2nd Pass

  • leaves[i].sum = input[i]
  • p.sum = p.left.sum + p.right.sum
  • p.right.FL = p.FL + p.left.sum
  • p.left.FL = p.FL
  • output[i] =

[Tree after the 1st pass (all Sum fields filled, FL fields blank):]

Node (Range)  Sum
(0,8)         44
(0,4)         26
(4,8)         18
(0,2)         17
(2,4)          9
(4,6)          7
(6,8)         11
(0,1)          8
(1,2)          9
(2,3)          6
(3,4)          3
(4,5)          2
(5,6)          5
(6,7)          7
(7,8)          4

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: (blank)

Note:

- FL is “FromLeft”

- p: non-leaf node

Fill out fromLeft going down the tree

Note: fromLeft is the sum of everything LEFT of the node’s range

[Tree after the 2nd pass:]

Node (Range)  Sum  FL
(0,8)         44    0
(0,4)         26    0
(4,8)         18   26
(0,2)         17    0
(2,4)          9   17
(4,6)          7   26
(6,8)         11   33
(0,1)          8    0
(1,2)          9    8
(2,3)          6   17
(3,4)          3   23
(4,5)          2   26
(5,6)          5   28
(6,7)          7   33
(7,8)          4   40

9 of 31

(Q0) Parallel Prefix Sum - 2nd Pass

[Tree diagram: the fully filled tree from the previous slide (all Sum and FL values).]

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

  • leaves[i].sum = input[i]
  • p.sum = p.left.sum + p.right.sum
  • p.right.FL = p.FL + p.left.sum
  • p.left.FL = p.FL
  • output[i] = leaves[i].FL + input[i]

Note:

- FL is “FromLeft”

- p: non-leaf node

Finally, fill out output array at the cutoff.

Each leaf node now has the fromLeft value it needs to compute its output.

Output: [8, 17, 23, 26, 28, 33, 40, 44]
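The five rules on these slides can be sketched as a small Java program. This is an illustrative sequential simulation of the two passes with a cutoff of 1; the real algorithm would fork the recursive calls in parallel, and the class and field names here are not from Project 3.

```java
public class PrefixSum {
    static class Node {
        int lo, hi, sum, fromLeft;
        Node left, right;
        Node(int lo, int hi) { this.lo = lo; this.hi = hi; }
    }

    // 1st pass (up): build the tree and compute sums bottom-up.
    static Node buildSums(int[] input, int lo, int hi) {
        Node n = new Node(lo, hi);
        if (hi - lo == 1) {
            n.sum = input[lo];                     // leaves[i].sum = input[i]
        } else {
            int mid = (lo + hi) / 2;
            n.left = buildSums(input, lo, mid);
            n.right = buildSums(input, mid, hi);
            n.sum = n.left.sum + n.right.sum;      // p.sum = p.left.sum + p.right.sum
        }
        return n;
    }

    // 2nd pass (down): push fromLeft down and fill the output at the leaves.
    static void fillOutput(Node n, int fromLeft, int[] input, int[] output) {
        if (n.hi - n.lo == 1) {
            n.fromLeft = fromLeft;
            output[n.lo] = fromLeft + input[n.lo]; // output[i] = leaves[i].FL + input[i]
        } else {
            fillOutput(n.left, fromLeft, input, output);               // p.left.FL = p.FL
            fillOutput(n.right, fromLeft + n.left.sum, input, output); // p.right.FL = p.FL + p.left.sum
        }
    }

    static int[] prefixSum(int[] input) {
        Node root = buildSums(input, 0, input.length);
        int[] output = new int[input.length];
        fillOutput(root, 0, input, output);        // root.FL = 0
        return output;
    }
}
```

On the slides' input [8, 9, 6, 3, 2, 5, 7, 4], this produces the output array shown above.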

10 of 31

Problem 1

Parallel Prefix FindMin

Your Turn!

11 of 31

(Q1) Parallel Prefix FindMin

[Tree diagram: a balanced binary tree over ranges (0,8); (0,4), (4,8); (0,2), (2,4), (4,6), (6,8); and leaves (0,1) through (7,8). Every node's Min and FL fields are blank, to be filled in.]

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: (blank)

Note:

- FL is “FromLeft”

- p: non-leaf node

  • leaves[i].min =
  • p.min =
  • p.right.FL =
  • p.left.FL =
  • output[i] =

12 of 31

(Q1) Parallel Prefix FindMin

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: (blank)

  • leaves[i].min = input[i]
  • p.min = min(p.left.min, p.right.min)
  • p.right.FL = min(p.FL, p.left.min)
  • p.left.FL = p.FL
  • output[i] = min(leaves[i].FL, input[i])

[Tree diagram: the same balanced binary tree over ranges (0,8) down to the leaves, with all Min and FL fields blank.]

Note:

- FL is “FromLeft”

- p: non-leaf node

13 of 31

(Q1) Parallel Prefix FindMin - 1st Pass

[Tree after the up pass (Min filled, FL blank):]

Node (Range)  Min
(0,8)          2
(0,4)          3
(4,8)          2
(0,2)          8
(2,4)          3
(4,6)          2
(6,8)          4
(0,1)          8
(1,2)          9
(2,3)          6
(3,4)          3
(4,5)          2
(5,6)          5
(6,7)          7
(7,8)          4

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: (blank)

Note:

- FL is “FromLeft”

- p: non-leaf node

  • leaves[i].min = input[i]
  • p.min = min(p.left.min, p.right.min)
  • p.right.FL = min(p.FL, p.left.min)
  • p.left.FL = p.FL
  • output[i] = min(leaves[i].FL, input[i])

14 of 31

(Q1) Parallel Prefix FindMin - 2nd Pass

[Tree after the 2nd pass:]

Node (Range)  Min  FL
(0,8)          2   none
(0,4)          3   none
(4,8)          2   3
(0,2)          8   none
(2,4)          3   8
(4,6)          2   3
(6,8)          4   2
(0,1)          8   none
(1,2)          9   8
(2,3)          6   8
(3,4)          3   6
(4,5)          2   3
(5,6)          5   2
(6,7)          7   2
(7,8)          4   2

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: [8, 8, 6, 3, 2, 2, 2, 2]

Note:

- FL is “FromLeft”

- p: non-leaf node

  • leaves[i].min = input[i]
  • p.min = min(p.left.min, p.right.min)
  • p.right.FL = min(p.FL, p.left.min)
  • p.left.FL = p.FL
  • output[i] = min(leaves[i].FL, input[i])
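Only the combining operation changes from the prefix-sum version: every addition becomes a min, and the root's FL of "none" becomes the identity for min. Below is an illustrative sequential sketch of the two passes (cutoff 1; names are not from the course projects), with Integer.MAX_VALUE standing in for "none":

```java
public class PrefixMin {
    static class Node {
        int lo, hi, min;
        Node left, right;
        Node(int lo, int hi) { this.lo = lo; this.hi = hi; }
    }

    // 1st pass (up): compute minimums bottom-up.
    static Node buildMins(int[] input, int lo, int hi) {
        Node n = new Node(lo, hi);
        if (hi - lo == 1) {
            n.min = input[lo];                             // leaves[i].min = input[i]
        } else {
            int mid = (lo + hi) / 2;
            n.left = buildMins(input, lo, mid);
            n.right = buildMins(input, mid, hi);
            n.min = Math.min(n.left.min, n.right.min);     // p.min = min(left, right)
        }
        return n;
    }

    // 2nd pass (down): push fromLeft down, fill output at the leaves.
    static void fillOutput(Node n, int fromLeft, int[] input, int[] output) {
        if (n.hi - n.lo == 1) {
            output[n.lo] = Math.min(fromLeft, input[n.lo]);            // min(FL, input[i])
        } else {
            fillOutput(n.left, fromLeft, input, output);               // p.left.FL = p.FL
            fillOutput(n.right, Math.min(fromLeft, n.left.min), input, output);
        }
    }

    static int[] prefixMin(int[] input) {
        Node root = buildMins(input, 0, input.length);
        int[] out = new int[input.length];
        fillOutput(root, Integer.MAX_VALUE, input, out);   // MAX_VALUE plays the role of "none"
        return out;
    }
}
```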

15 of 31

(Q1) Parallel Prefix FindMin - Breakout 1

[Tree diagram: a balanced binary tree over ranges (0,8) down to the leaves, with all Min and FL fields blank, to be filled in.]

Input:  [8, 9, 6, 3, 2, 5, 7, 4]

Output: (blank)

Note:

- FL is “FromLeft”

- p: non-leaf node

  • leaves[i].min = input[i]
  • p.min = min(p.left.min, p.right.min)
  • p.right.FL = min(p.FL, p.left.min)
  • p.left.FL = p.FL
  • output[i] = min(leaves[i].FL, input[i])

16 of 31

Problem 2

Work it Out [the Span]

17 of 31

(Q2) Work it Out [the Span]

(a) Define work and span

(b) How do we calculate work and span?

(c) Does adding more processors affect the work or span?

18 of 31

(Q2) Work it Out [the Span]


(a) Define work and span

Work: the total running time of the program on a single processor.

Span: the running time of the program with an unlimited number of processors.

(b) How do we calculate work and span?

Work: sum the run-times of all nodes in the fork/join DAG.

Span: find the longest dependence chain (the longest path through the DAG).

(c) Does adding more processors affect the work or span?

Neither: work and span are each defined for a fixed number of processors (1 for work, infinity for span), so adding more processors affects neither.
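As a concrete example of these definitions, here is a standard fork/join array sum (an illustrative sketch, not code from the course projects): its work is O(n), since the DAG's nodes together touch every element once, and its span is O(log n), the longest dependence chain from the root down to a leaf and back up through the joins.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelSum extends RecursiveTask<Long> {
    static final int CUTOFF = 2;  // illustrative sequential cutoff
    final int[] arr;
    final int lo, hi;

    ParallelSum(int[] arr, int lo, int hi) { this.arr = arr; this.lo = lo; this.hi = hi; }

    @Override
    protected Long compute() {
        if (hi - lo <= CUTOFF) {          // base case: sum sequentially
            long s = 0;
            for (int i = lo; i < hi; i++) s += arr[i];
            return s;
        }
        int mid = (lo + hi) / 2;
        ParallelSum left = new ParallelSum(arr, lo, mid);
        ParallelSum right = new ParallelSum(arr, mid, hi);
        left.fork();                      // run the left half in another thread
        long r = right.compute();         // compute the right half in this thread
        return r + left.join();           // the join is part of the dependence chain
    }

    static long sum(int[] arr) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(arr, 0, arr.length));
    }
}
```

Adding processors speeds up the actual wall-clock time (up to the span), but the work and span of this code are fixed properties of its DAG.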

19 of 31

Problem 2.5

Task Graph Scheduling

20 of 31

(Q2.5) Task Graph Scheduling

[Task graph diagram. Task weights: T1 = 1, T2 = 2, T3 = 2, T4 = 2, T5 = 4, T6 = 1, T7 = 1. Dependency edges are shown in the original figure.]

(a) What is the total work?

(b) What is the span?

(c) What is the minimum amount of time for a schedule of the tasks on two processors? Show the schedule.

(d) What is the maximum amount of time for a schedule of the tasks on two processors where there is no unnecessary idle time (i.e., no processor sits idle while a task is available)? Show the schedule.

(a) Work: 1 + 2 + 2 + 2 + 4 + 1 + 1 = 13

(b) Span: 1 + 2 + 4 = 7

(c) Minimum time: 7

Processor 1: T1, T2, T5 (1 + 2 + 4 = 7)
Processor 2: T3, T4, T6, T7 (2 + 2 + 1 + 1 = 6)

(d) Maximum time with no unnecessary idling: 10

Processor 1: T1, T3, T6, T2, T5 (1 + 2 + 1 + 2 + 4 = 10)
Processor 2: T4, T7

(Multiple correct schedules achieve the same times.)

21 of 31

Concurrency

22 of 31

Concurrency Review

  • Race Condition: when the result of your program depends on how the threads are scheduled/interleaved
    • Data Race: A type of race condition that can happen in 2 ways:
      • 1. Two threads trying to write to the same variable at the same time → write-write
      • 2. Thread1 trying to write to a variable while thread2 tries to read from it → write-read
      • Note: read-reads don’t cause a data race, since neither access modifies the variable!
    • Bad Interleaving: A type of race condition where the scheduling of threads exposes a bad, unintended intermediate state
      • Example: thread1 is partway through a multi-step operation when thread2 runs and acts on the unintended intermediate state
  • Deadlock: a cycle of multiple threads waiting on each other
    • Thread1 is waiting on a resource held by thread2
    • Thread2 is waiting on a resource held by thread1
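The lost-update pattern behind many of these bugs fits in a few lines. This is an illustrative sketch (the Counter class is not from the problems below): count++ is really read-modify-write, so two unsynchronized threads can interleave between the read and the write and lose increments; guarding every access with one lock restores the invariant.

```java
public class Counter {
    private int count = 0;
    private final Object lock = new Object();

    void unsafeIncrement() { count++; }       // racy: read-modify-write is not atomic

    void safeIncrement() {
        synchronized (lock) { count++; }      // only one thread at a time
    }

    int get() { synchronized (lock) { return count; } }

    // Run two threads that each increment perThread times, then report the total.
    static int run(boolean safe, int perThread) {
        Counter c = new Counter();
        Runnable body = () -> {
            for (int i = 0; i < perThread; i++) {
                if (safe) c.safeIncrement(); else c.unsafeIncrement();
            }
        };
        Thread t1 = new Thread(body), t2 = new Thread(body);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return c.get();
    }
}
```

run(true, n) always returns 2n; run(false, n) may return less, depending on how the threads are scheduled.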

23 of 31

Problem 3

User Profile

Discuss with neighbors

24 of 31

(Q3a) User Profile

You are designing a new social-networking site to take over the world. To handle all the volume you expect, you want to support multiple threads with a fine-grained locking strategy in which each user’s profile is protected by a different lock. At the core of your system is this simple class definition:

(a) The constructor has a concurrency error. What is it and how would you fix it? A short English answer is enough - no code or details required.

There is a data race on id_counter!!

Example issue: two accounts could get the same id if they are created simultaneously by different threads. Or even stranger things could happen.

Fix: You could synchronize on a lock for id_counter.

Key takeaway: if at least one of the accesses to a shared variable is a write, LOCK around the accesses to prevent a data race!

Note: the synchronized keyword on a method locks the this object; a synchronized block locks whatever object you pass it.
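The slides don't reproduce the problem's code, so the snippet below is only a guess at its shape: the point is that the read-then-increment of the shared static counter must happen under one lock (a static lock object, since the counter is static).

```java
public class UserProfile {
    private static int id_counter = 0;             // shared across all threads
    private static final Object ID_LOCK = new Object();
    final int id;

    UserProfile() {
        synchronized (ID_LOCK) {                   // read + increment is now one atomic step
            id = id_counter++;
        }
    }
}
```

With the lock, no two threads can read the same counter value, so every profile gets a unique id.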

25 of 31

(Q3b) User Profile

You are designing a new social-networking site to take over the world. To handle all the volume you expect, you want to support multiple threads with a fine-grained locking strategy in which each user’s profile is protected by a different lock. At the core of your system is this simple class definition:

(b) The makeFriends method has a concurrency error. What is it and how would you fix it? A short English answer is enough - no code or details required.

Key takeaway: to prevent deadlock, make sure all threads acquire the locks in a consistent order.

There is a potential deadlock!!

Example issue: say there are two UserProfile objects obj1 and obj2. One thread calls obj1.makeFriends(obj2) while another thread calls obj2.makeFriends(obj1). Both threads perform line 13 and then wait forever (deadlock) at line 14.

Fix: acquire locks in a consistent order based on the id fields, which are unique.
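A sketch of the consistent-ordering fix (the problem's code isn't reproduced here, so field names like other are illustrative): compare the unique id fields and always acquire the smaller-id profile's lock first.

```java
public class UserProfile {
    final int id;                 // unique per profile
    UserProfile other;            // stand-in for the real friend list

    UserProfile(int id) { this.id = id; }

    void makeFriends(UserProfile p) {
        // Always lock the profile with the smaller id first, so no two
        // threads can each hold one lock while waiting for the other.
        UserProfile first = (this.id < p.id) ? this : p;
        UserProfile second = (this.id < p.id) ? p : this;
        synchronized (first) {
            synchronized (second) {
                this.other = p;
                p.other = this;
            }
        }
    }
}
```

Now obj1.makeFriends(obj2) and obj2.makeFriends(obj1) acquire the same two locks in the same order, so no cycle of waiting can form.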

26 of 31

Problem 4

Bubble Tea

Discuss with neighbors

27 of 31

(Q4a) Bubble Tea

The BubbleTea class manages a bubble tea order assembled by multiple workers. Multiple threads could be accessing the same BubbleTea object. Assume the Stack objects ARE THREAD-SAFE, have enough space, and operations on them will not throw an exception.

(a) Does the BubbleTea class have:

  • a race condition
  • a data race
  • potential for deadlock
  • none of these

Assuming the Stack is thread-safe, a race condition still exists. Specifically, the code can produce a bad interleaving.

Example issue: two threads call addLiquid() at the same time, say they pass the hasCapacity() test with a value of 7 for drink.size(). Then both threads would be free to push onto the drink stack, exceeding maxDrinkAmount. This is a bad interleaving because it violates the expected behavior of the class.

Key takeaway: A “thread-safe” Stack (drink) prevents data race on itself because it can’t be modified from two threads at the same time. However, bad interleavings may still happen; a lock is needed to protect the invariants of the class.
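A sketch of the fix (the problem's code isn't shown in these slides, so the class shape below is a guess): hold one lock across both the capacity check and the push so they form a single atomic step.

```java
import java.util.Stack;

public class BubbleTea {
    static final int maxDrinkAmount = 8;           // illustrative capacity
    private final Stack<String> drink = new Stack<>();

    boolean hasCapacity() { return drink.size() < maxDrinkAmount; }

    boolean addLiquid(String liquid) {
        synchronized (this) {                      // check-then-push is now atomic
            if (!hasCapacity()) return false;
            drink.push(liquid);
            return true;
        }
    }

    int size() { synchronized (this) { return drink.size(); } }
}
```

Even if many threads call addLiquid at once, at most maxDrinkAmount pushes can succeed, preserving the class invariant.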

28 of 31

(Q4b) Bubble Tea

The BubbleTea class manages a bubble tea order assembled by multiple workers. Multiple threads could be accessing the same BubbleTea object. Assume the Stack objects ARE THREAD-SAFE, have enough space, and operations on them will not throw an exception.

(b) Suppose we made the addTopping method synchronized, and changed nothing else in the code. Does this modified BubbleTea class above have:

  • a race condition
  • a data race
  • potential for deadlock
  • none of these

This does NOT fix the problem. Making addTopping() synchronized still allows for the same pattern of execution in addLiquid() as described earlier.

However, this change does reduce the effective concurrency in the code, so it actually makes things slightly worse.

29 of 31

Problem 5

Phone Monitor

Discuss with neighbors

30 of 31

(Q5a) Phone Monitor

The PhoneMonitor class tries to help manage how much you use your cell phone each day. Multiple threads can access the same PhoneMonitor object. Remember that synchronized gives you reentrancy.

(a) Does the PhoneMonitor class have:

  • a race condition
  • a data race
  • potential for deadlock
  • none of these

There is a data race on phoneOn. A data race is by definition a type of race condition.

Example issue: Thread 1 (not needing to hold any locks) could be at line 11 reading phoneOn, while Thread 2 is at line 27 (holding both of the locks) writing phoneOn.

Key takeaway: a data race is a type of race condition. If at least one of the accesses to a shared variable is a write, LOCK around the accesses to prevent a data race!

31 of 31

(Q5b) Phone Monitor

(b) Suppose we made the checkLimits method public, and changed nothing else in the code. Does this modified PhoneMonitor class have:

  • a race condition
  • a data race
  • potential for deadlock
  • none of these

Key takeaway: if two client-accessible methods acquire the same locks in different orders, there is potential for deadlock.

Same data race on phoneOn still exists (and thus so does the race condition). Now, there’s also potential for deadlock.

Example issue: Thread 1 calls accessPhone(), stays at line 13 holding the accessesLock lock, trying to get the minutesLock lock. Thread 2 now calls checkLimits(), stays at line 24, holding the minutesLock lock and trying to get the accessesLock lock.
