Parallelism & Concurrency
CSE 332 (Section 9)
Original slides by: Louis Maliyam • Monty Nitschke
Announcements
Problem 0
Parallel Prefix Sum
Recap
(Q0) Parallel Prefix Sum
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
(Q0) Parallel Prefix Sum
Range: 0, 8
Sum:
FL:
Range: 0, 4
Sum:
FL:
Range: 4, 8
Sum:
FL:
Range: 0, 2
Sum:
FL:
Range: 2, 4
Sum:
FL:
Range: 4, 6
Sum:
FL:
Range: 6, 8
Sum:
FL:
Range: 0, 1
Sum:
FL:
Range: 1, 2
Sum:
FL:
Range: 2, 3
Sum:
FL:
Range: 3, 4
Sum:
FL:
Range: 4, 5
Sum:
FL:
Range: 5, 6
Sum:
FL:
Range: 6, 7
Sum:
FL:
Range: 7, 8
Sum:
FL:
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
Note: say cutoff is 1
Divide problem into parallel pieces
(Q0) Parallel Prefix Sum - 1st Pass
Range: 0, 8
Sum:
FL:
Range: 0, 4
Sum:
FL:
Range: 4, 8
Sum:
FL:
Range: 0, 2
Sum:
FL:
Range: 2, 4
Sum:
FL:
Range: 4, 6
Sum:
FL:
Range: 6, 8
Sum:
FL:
Range: 0, 1
Sum: 8
FL:
Range: 1, 2
Sum: 9
FL:
Range: 2, 3
Sum: 6
FL:
Range: 3, 4
Sum: 3
FL:
Range: 4, 5
Sum: 2
FL:
Range: 5, 6
Sum: 5
FL:
Range: 6, 7
Sum: 7
FL:
Range: 7, 8
Sum: 4
FL:
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
Find sum at the cutoff
Note: say cutoff is 1
(Q0) Parallel Prefix Sum - 1st Pass
Range: 0, 8
Sum:
FL:
Range: 0, 4
Sum:
FL:
Range: 4, 8
Sum:
FL:
Range: 0, 2
Sum:
FL:
Range: 2, 4
Sum:
FL:
Range: 4, 6
Sum:
FL:
Range: 6, 8
Sum:
FL:
Range: 0, 1
Sum: 8
FL:
Range: 1, 2
Sum: 9
FL:
Range: 2, 3
Sum: 6
FL:
Range: 3, 4
Sum: 3
FL:
Range: 4, 5
Sum: 2
FL:
Range: 5, 6
Sum: 5
FL:
Range: 6, 7
Sum: 7
FL:
Range: 7, 8
Sum: 4
FL:
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
Find sum going up and build the tree by combining parallel pieces
Note: say cutoff is 1
Range: 0, 2
Sum: 17
FL:
Range: 2, 4
Sum: 9
FL:
Range: 4, 6
Sum: 7
FL:
Range: 6, 8
Sum: 11
FL:
Range: 4, 8
Sum: 18
FL:
Range: 0, 4
Sum: 26
FL:
Range: 0, 8
Sum: 44
FL:
(Q0) Parallel Prefix Sum - 2nd Pass
Range: 0, 8
Sum: 44
FL:
Range: 0, 4
Sum: 26
FL:
Range: 4, 8
Sum: 18
FL:
Range: 0, 2
Sum: 17
FL:
Range: 2, 4
Sum: 9
FL:
Range: 4, 6
Sum: 7
FL:
Range: 6, 8
Sum: 11
FL:
Range: 0, 1
Sum: 8
FL:
Range: 1, 2
Sum: 9
FL:
Range: 2, 3
Sum: 6
FL:
Range: 3, 4
Sum: 3
FL:
Range: 4, 5
Sum: 2
FL:
Range: 5, 6
Sum: 5
FL:
Range: 6, 7
Sum: 7
FL:
Range: 7, 8
Sum: 4
FL:
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
Fill out fromLeft going down the tree
Note: fromLeft is the sum of everything LEFT of the node’s range
Range: 0, 8
Sum: 44
FL: 0
Range: 0, 4
Sum: 26
FL: 0
Range: 4, 8
Sum: 18
FL: 26
Range: 0, 2
Sum: 17
FL: 0
Range: 2, 4
Sum: 9
FL: 17
Range: 4, 6
Sum: 7
FL: 26
Range: 6, 8
Sum: 11
FL: 33
Range: 0, 1
Sum: 8
FL: 0
Range: 1, 2
Sum: 9
FL: 8
Range: 2, 3
Sum: 6
FL: 17
Range: 3, 4
Sum: 3
FL: 23
Range: 4, 5
Sum: 2
FL: 26
Range: 5, 6
Sum: 5
FL: 28
Range: 6, 7
Sum: 7
FL: 33
Range: 7, 8
Sum: 4
FL: 40
(Q0) Parallel Prefix Sum - 2nd Pass
Range: 0, 8
Sum: 44
FL: 0
Range: 0, 4
Sum: 26
FL: 0
Range: 4, 8
Sum: 18
FL: 26
Range: 0, 2
Sum: 17
FL: 0
Range: 2, 4
Sum: 9
FL: 17
Range: 4, 6
Sum: 7
FL: 26
Range: 6, 8
Sum: 11
FL: 33
Range: 0, 1
Sum: 8
FL: 0
Range: 1, 2
Sum: 9
FL: 8
Range: 2, 3
Sum: 6
FL: 17
Range: 3, 4
Sum: 3
FL: 23
Range: 4, 5
Sum: 2
FL: 26
Range: 5, 6
Sum: 5
FL: 28
Range: 6, 7
Sum: 7
FL: 33
Range: 7, 8
Sum: 4
FL: 40
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
Finally, fill out output array at the cutoff.
Each leaf node should have fromLeft that it needs to compute the output
8 | | | | | | | |
8 | 17 | | | | | | |
8 | 17 | 23 | | | | | |
8 | 17 | 23 | 26 | | | | |
8 | 17 | 23 | 26 | 28 | | | |
8 | 17 | 23 | 26 | 28 | 33 | | |
8 | 17 | 23 | 26 | 28 | 33 | 40 | |
8 | 17 | 23 | 26 | 28 | 33 | 40 | 44 |
Problem 1
Parallel Prefix FindMin
Your Turn!
(Q2) Parallel Prefix FindMin
Range: 0, 8
Min:
FL:
Range: 0, 4
Min:
FL:
Range: 4, 8
Min:
FL:
Range: 0, 2
Min:
FL:
Range: 2, 4
Min:
FL:
Range: 4, 6
Min:
FL:
Range: 6, 8
Min:
FL:
Range: 0, 1
Min:
FL:
Range: 1, 2
Min:
FL:
Range: 2, 3
Min:
FL:
Range: 3, 4
Min:
FL:
Range: 4, 5
Min:
FL:
Range: 5, 6
Min:
FL:
Range: 6, 7
Min:
FL:
Range: 7, 8
Min:
FL:
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
(Q2) Parallel Prefix FindMin
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Range: 0, 4
Min:
FL:
Range: 4, 8
Min:
FL:
Range: 0, 2
Min:
FL:
Range: 2, 4
Min:
FL:
Range: 4, 6
Min:
FL:
Range: 6, 8
Min:
FL:
Range: 0, 1
Min:
FL:
Range: 1, 2
Min:
FL:
Range: 2, 3
Min:
FL:
Range: 3, 4
Min:
FL:
Range: 4, 5
Min:
FL:
Range: 5, 6
Min:
FL:
Range: 6, 7
Min:
FL:
Range: 7, 8
Min:
FL:
Note:
- FL is “FromLeft”
- p: non-leaf node
Range: 0, 8
Min:
FL:
(Q2) Parallel Prefix FindMin - 1st Pass
Range: 0, 8
Min: 2
FL:
Range: 0, 4
Min: 3
FL:
Range: 4, 8
Min: 2
FL:
Range: 0, 2
Min: 8
FL:
Range: 2, 4
Min: 3
FL:
Range: 4, 6
Min: 2
FL:
Range: 6, 8
Min: 4
FL:
Range: 0, 1
Min: 8
FL:
Range: 1, 2
Min: 9
FL:
Range: 2, 3
Min: 6
FL:
Range: 3, 4
Min: 3
FL:
Range: 4, 5
Min: 2
FL:
Range: 5, 6
Min: 5
FL:
Range: 6, 7
Min: 7
FL:
Range: 7, 8
Min: 4
FL:
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
(Q2) Parallel Prefix FindMin - 2nd Pass
Range: 0, 8
Min: 2
FL: none
Range: 0, 4
Min: 3
FL: none
Range: 4, 8
Min: 2
FL: 3
Range: 0, 2
Min: 8
FL: none
Range: 2, 4
Min: 3
FL: 8
Range: 4, 6
Min: 2
FL: 3
Range: 6, 8
Min: 4
FL: 2
Range: 0, 1
Min: 8
FL: none
Range: 1, 2
Min: 9
FL: 8
Range: 2, 3
Min: 6
FL: 8
Range: 3, 4
Min: 3
FL: 6
Range: 4, 5
Min: 2
FL: 3
Range: 5, 6
Min: 5
FL: 2
Range: 6, 7
Min: 7
FL: 2
Range: 7, 8
Min: 4
FL: 2
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
8 | 8 | 6 | 3 | 2 | 2 | 2 | 2 |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
(Q1) Parallel Prefix FindMin - Breakout 1
Range: 0, 8
Min:
FL:
Range: 0, 4
Min:
FL:
Range: 4, 8
Min:
FL:
Range: 0, 2
Min:
FL:
Range: 2, 4
Min:
FL:
Range: 4, 6
Min:
FL:
Range: 6, 8
Min:
FL:
Range: 0, 1
Min:
FL:
Range: 1, 2
Min:
FL:
Range: 2, 3
Min:
FL:
Range: 3, 4
Min:
FL:
Range: 4, 5
Min:
FL:
Range: 5, 6
Min:
FL:
Range: 6, 7
Min:
FL:
Range: 7, 8
Min:
FL:
8 | 9 | 6 | 3 | 2 | 5 | 7 | 4 |
| | | | | | | |
Input
Output
Note:
- FL is “FromLeft”
- p: non-leaf node
Problem 2
Work it Out [the Span]
(Q2) Work it Out [the Span]
(a) Define work and span
(b) How do we calculate work and span?
(c) Does adding more processors affect the work or span?
(Q2) Work it Out [the Span]
(a) Define work and span
Work: how long the running time of a program would be with just one processor
Span: the running time with an infinite number of processors
(b) How do we calculate work and span?
(c) Does adding more processors affect the work or span?
(a) Define work and span
Work: how long the running time of a program would be with just one processor
Span: the running time with an infinite number of processors
(b) How do we calculate work and span?
Work: sum all the work done by each processor
Span: calculate the longest dependence chain (the longest ‘branch’ in the parallel ‘tree’)
(c) Does adding more processors affect the work or span?
(a) Define work and span
Work: how long the running time of a program would be with just one processor
Span: the running time with an infinite number of processors
(b) How do we calculate work and span?
Work: sum of run-time of all nodes in the fork/join DAG
Span: calculate the longest dependence chain (the longest ‘branch’ in the DAG)
(c) Does adding more processors affect the work or span?
Neither - both work and span are defined by a fixed number of processors (1 for work and infinity for span) so adding more processors won’t affect them
Problem 2.5
Task Graph Scheduling
(Q2.5) Task Graph Scheduling
T1
1
T3
2
T7
1
T6
1
T5
4
T4
2
T2
2
(a) What is the total work?
(b) What is the span?
(c) What is the minimum amount of time for a schedule of the tasks on two processors? Show the schedule.
(d) What is the maximum amount of time for a schedule of the tasks on two processors where there is no unnecessary idle time (ie. there is a task available and a processor is not working)? Show the schedule.
1 + 2 + 2 + 2 + 4 + 1 + 1 = 13
1 + 2 + 4 = 7
Time: 1 2 3 4 5 6 7
Processor 1: T1 T2—-T5—------- Total: 7
Processor 2: T3—-T4—--T6T7
Time: 1 2 3 4 5 6 7 8 9 10
Processor 1: T1 T3—-T6T2—--T5—-------- Total: 10
Processor 2: T4—-T7
(multiple correct schedules with the same time)
Concurrency
Concurrency Review
Problem 3
User Profile
Discuss with neighbors
(Q3a) User Profile
You are designing a new social-networking site to take over the world. To handle all the volume you expect, you want to support multiple threads with a fine-grained locking strategy in which each users profile is protected with a different lock. At the core of your system is this simple class definition:
(a) The constructor has a concurrency error. What is it and how would you fix it? A short English answer is enough - no code or details required.
There is a data race on id_counter!!
Example issue: two accounts could get the same id if they are created simultaneously by different threads. Or even stranger things could happen.
Fix: You could synchronize on a lock for id_counter.
Key takeaway: if at least one of the accesses to variable is for writing, LOCK the variable to prevent data race!
Note: synchronized keyword on a method is used to lock this object. Else, it’s used to lock a specific object.
(Q3b) User Profile
You are designing a new social-networking site to take over the world. To handle all the volume you expect, you want to support multiple threads with a fine-grained locking strategy in which each users profile is protected with a different lock. At the core of your system is this simple class definition:
(b) The makeFriends method has a concurrency error. What is it and how would you fix it? A short English answer is enough no code or details required.
Key takeaway: to prevent deadlock, make sure all threads acquire the locks in a consistent order.
There is a potential deadlock!!
Example issue: say there are two UserProfile objects obj1 and obj2. One thread calls obj1.makeFriends(obj2) when another thread calls obj2.makeFriends(obj1). They both perform line 13 and wait forever (deadlock) at line 14.
Fix: acquire locks in a consistent order based on the id fields, which are unique.
Problem 4
Bubble Tea
Discuss with neighbors
(Q4a) Bubble Tea
The BubbleTea class manages a bubble tea order assembled by multiple workers. Multiple threads could be accessing the same BubbleTea object. Assume the Stack objects ARE THREAD-SAFE, have enough space, and operations on them will not throw an exception.
(a) Does the BubbleTea class have:
a race condition potential for deadlock
a data race none of these
Assuming stack is thread-safe, a race condition still exists. Specifically, the code could lead to a bad interleaving.
Example issue: two threads call addLiquid() at the same time, say they pass the hasCapacity() test with a value of 7 for drink.size(). Then both threads would be free to push onto the drink stack, exceeding maxDrinkAmount. This is a bad interleaving because it violates the expected behavior of the class.
Key takeaway: A “thread-safe” Stack (drink) prevents data race on itself because it can’t be modified from two threads at the same time. However, bad interleavings may still happen; a lock is needed to protect the invariants of the class.
(Q4b) Bubble Tea
The BubbleTea class manages a bubble tea order assembled by multiple workers. Multiple threads could be accessing the same BubbleTea object. Assume the Stack objects ARE THREAD-SAFE, have enough space, and operations on them will not throw an exception.
(b) Suppose we made the addTopping method synchronized, and changed nothing else in the code. Does this modified BubbleTea class above have:
a race condition potential for deadlock
a data race none of these
This does NOT fix the problem. Making addTopping() synchronized still allows for the same pattern of execution in addLiquid() as described earlier.
However, this change does reduce the effective concurrency in the code, so it actually makes things slightly worse.
Problem 5
Phone Monitor
Discuss with neighbors
(Q5a) Phone Monitor
The PhoneMonitor class tries to help manage how much you use your cell phone each day. Multiple threads can access the same PhoneMonitor object. Remember that synchronized gives you reentrancy.
(a) Does the PhoneMonitor class have:
a race condition potential for deadlock
a data race none of these
There is a data race on phoneOn. A data race is by definition a type of race condition.
Example issue: Thread 1 (not needing to hold any locks) could be at line 11 reading phoneOn, while Thread 2 is at line 27 (holding both of the locks) writing phoneOn.
Key takeaway: a data race is considered a type of race condition. If at least one of the accesses to variable is for writing, LOCK the variable to prevent data race!
(Q5b) Phone Monitor
(b) Suppose we made the checkLimits method public, and changed nothing else in the code. Does this modified PhoneMonitor class have:
a race condition potential for deadlock
a data race none of these
Key takeaway: If locks are acquired in different orders between two methods that clients have access to, it will lead to a potential deadlock.
Same data race on phoneOn still exists (and thus so does the race condition). Now, there’s also potential for deadlock.
Example issue: Thread 1 calls accessPhone(), stays at line 13 holding the accessesLock lock, trying to get the minutesLock lock. Thread 2 now calls checkLimits(), stays at line 24, holding the minutesLock lock and trying to get the accessesLock lock.
The PhoneMonitor class tries to help manage how much you use your cell phone each day. Multiple threads can access the same PhoneMonitor object. Remember that synchronized gives you reentrancy.