Parallel and Distributed Computing
April 15, 2024
Data 101/Info 258, Spring 2024 @ UC Berkeley
Aditya Parameswaran https://data101.org/sp24
LECTURE 22
From Concurrency to Distributed Computing
Concept of concurrent transactions:
(we have just done this)
Concept of parallel/distributed computing:
(we are now going to do this)
Focus on relational setting, but also applies to semi-structured data!
What are some Parallel DBMSs/Parallel Computation Architectures?
Shared Memory
Multiple cores accessing shared memory and disk
(This is what you get when you reserve a “beefy machine”)
Shared Nothing
Multiple commodity nodes each with their own memory and disk
Shared Disk
Worker nodes have their own memory and may have local disk as a cache
Each may be “responsible” for a piece of data from shared disk.
Parallel Data Processing Considerations
Data Partitioning
Parallel Data Processing
Hardware edge cases
We’ll only get to discuss the first two items (partitioning, processing). But know that physical restrictions are the most important in daily operation!
Partitioning Strategies
Partitioning Strategies
Partitioned Operators and Pipelining
Partitioned Joins and Aggregations
MapReduce
MapReduce’s Legacy and Spark
[Extra] Parallel DB: Sort-Merge, Sort
[Extra] Volcano Framework for Parallelism
Partitioning Approaches
How we partition our data depends on the data model itself.
Horizontal partitioning: split a relation's rows (tuples) across nodes.
Vertical partitioning: split a relation's columns (attributes) across nodes.
We focus mostly on horizontal partitioning, as that is the more generalizable strategy.
Three (Horizontal) Partitioning Approaches
🤔
A. Range-based partitioning
B. Hash-based partitioning
C. Round-robin partitioning
Which partitioning approach matches each diagram?
[Diagram: three partitioning layouts, numbered 1-3, each spreading tuples across five nodes labeled A...E, F...J, K...N, O...S, T...Z]
Three (Horizontal) Partitioning Approaches
C. Round-robin partitioning
Assign tuple i to processor i % n
B. Hash-based partitioning
Pick a field. Pick the processor for a tuple by hashing on this field.
A. Range-based partitioning
Pick processor j for a tuple if its value for some field falls within range j.
[Diagram: the three five-node partition layouts (1-3), matched to the strategies above]
Three (Horizontal) Partitioning Approaches
Also: What do we partition on? Think of partitioning as coarse-grained indexing.
[Diagram: the three five-node partition layouts, repeated]
Each approach has tradeoffs! We'll discuss one: skew.
Hash-based partitioning
Pick a field. Pick the processor for a tuple by hashing on this field.
Range-based partitioning
Pick processor j for a tuple if its value for some field falls within range j.
Round-robin partitioning
Assign tuple i to processor i % n
Skew
Skew is a common problem when dealing with parallel processing across many machines.
Reason 1: Uneven data distribution
Reason 2: Uneven access patterns
Three (Horizontal) Partitioning Approaches and Skew
Hash-based partitioning
Pick a field. Pick the processor for a tuple by hashing on this field.
Range-based partitioning
Pick processor j for a tuple if its value for some field falls within range j.
Round-robin partitioning
Assign tuple i to processor i % n
[Diagram: the three five-node partition layouts, repeated]
Range-based: fairly susceptible to skew, if some partitioning-attribute ranges are very popular (from either standpoint: data or access)
Hash-based: somewhat susceptible to skew, if some partitioning-attribute values are very popular (from either standpoint: data or access)
Round-robin: not susceptible to skew, since work is equally divided; but there is more work involved, since all partitions need to be consulted
Partitioned operations
Data Partitioning
Parallel Data Processing
Hardware edge cases
Two types of parallel operations we discuss:
Partitioned Operators and Pipelining
Partitioning Strategies
Partitioned Operators and Pipelining
Partitioned Joins and Aggregations
MapReduce
MapReduce’s Legacy and Spark
[Extra] Parallel DB: Sort-Merge, Sort
[Extra] Volcano Framework for Parallelism
Partitioned Operators, Manager-Worker, and Operation Pipelining
Many parallel operations use computing nodes in two ways:
Partitioned operators used to speed up a given DB op, where workers all perform the same task in parallel, as designated by a manager.
Quinn, Parallel Programming, 2008.
Partitioned operators can also be pipelined.
Operator Speedup with Partitioning and Pipelining
Scans
Selection/Projection
Grouped Aggregation, Joins, Sorting
How do Partitioned Operators Work? Selection
Point Selection (i.e., Selection by Equality) SELECT * … WHERE Name = 'Shana';
Range Selection (i.e., Selection by Range) SELECT * … WHERE Name LIKE 'S%';
[Diagram: the five-node A...Z layout under Range, Hash, and Round-Robin partitioning]
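As a rough sketch of how these two selections fan out across partitions (building on the illustrative routing functions sketched earlier; not the lecture's code):

# Point selection (WHERE Name = 'Shana'):
#   range/hash partitioning -> exactly one partition can hold a match;
#   round-robin             -> every partition must be consulted.
# Range selection (WHERE Name LIKE 'S%'):
#   range partitioning -> only partitions whose range overlaps 'S...';
#   hash/round-robin   -> every partition, since placement scatters
#                         lexicographically adjacent names.
def point_selection_partitions(scheme, name, n=5):
    if scheme == "range":
        return [route_range({"Name": name}, "Name")]
    if scheme == "hash":
        return [route_hash({"Name": name}, "Name", n)]
    return list(range(n))  # round-robin: ask everyone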
How do Partitioned Operators Work? Updates and Insertions
Updates UPDATE … SET … WHERE Name = 'Shana';
Insertion
[Diagram: the five-node A...Z layout under Range, Hash, and Round-Robin partitioning]
From now on, assume the worst case scenario.
Given the conditions on which we partition, it's sometimes easier to assume that all partitions need to be consulted on any selection/insertion/update.
…unless indexes exist!
If we have indexes, then we look at indexes first, then access data partitions.
Partitioned Joins and Aggregations
Partitioning Strategies
Partitioned Operators and Pipelining
Partitioned Joins and Aggregations
MapReduce
MapReduce's Legacy and Spark
[Extra] Parallel DB: Sort-Merge, Sort
[Extra] Volcano Framework for Parallelism
How do Partitioned Operators Work? Joins
Natural join R and S on A
Neither R nor S is partitioned on A
Q: How would we do this join?
Repartition R on A; repartition S on A.
Then each node individually does a hash join, a merge join, a nested loop join, etc.
[Diagram: partitions R1...Rn and S1...Sn are shuffled into new partitions R'1...R'n and S'1...S'n on A; node i then joins R'i with S'i]
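A minimal Python sketch of this repartition-then-join plan (illustrative; the n nodes are simulated with a loop):

from collections import defaultdict

def repartition(tuples, key, n):
    # Shuffle step: route every tuple by hashing its join attribute A,
    # so matching R- and S-tuples land on the same node.
    parts = defaultdict(list)
    for t in tuples:
        parts[hash(t[key]) % n].append(t)
    return parts

def partitioned_join(R, S, key, n):
    Rp, Sp = repartition(R, key, n), repartition(S, key, n)
    out = []
    for node in range(n):            # each iteration = one node's local work
        build = defaultdict(list)    # local hash join: build on R'_node
        for r in Rp[node]:
            build[r[key]].append(r)
        for s in Sp[node]:           # probe with S'_node
            for r in build.get(s[key], []):
                out.append({**r, **s})
    return out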
Parallel Joins with Complex Join Conditions
When the join conditions become a little more complex, we can't simply apply repartitioning as is.
Q: What would we do for this?
Parallel Join: Fragment-Replicate Join
Every fragment in R is joined with every fragment in S
Extreme case: only one fragment of R (when it is small)
[Diagram: fragments R1...Rn are each joined against a replicated fragment of S, producing RS11, RS21, ..., RSn1]
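A sketch of the asymmetric extreme (S small enough to replicate in full); theta is any join predicate, so this also covers the complex join conditions above:

def fragment_replicate_join(R_fragments, S, theta):
    # Replicate all of S to every node; node i joins its fragment Ri
    # against the full copy of S. No repartitioning on a join key is
    # needed, so arbitrary (non-equality) conditions work.
    out = []
    for Ri in R_fragments:           # one iteration per node
        for r in Ri:
            for s in S:
                if theta(r, s):
                    out.append((r, s))
    return out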
How do Partitioned Operators Work? Aggregation
Example: SELECT A, SUM(B) FROM R GROUP BY A
Easy to do if GROUP BY is on partitioning attribute (A)
Else:
Q: Any room for optimization?
A: Can avoid communicating all of the tuples
[Diagram: partitions R1...Rn each feed a local aggregation operator Agg1...Aggn]
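One way to avoid communicating all of the tuples: each node pre-aggregates locally, and only the per-group partial sums are shuffled and merged. A minimal sketch for SELECT A, SUM(B) (illustrative, not the lecture's code):

from collections import Counter

def local_partial_sums(partition):
    # Runs on each node: sum B per A-value over the local partition only.
    sums = Counter()
    for t in partition:
        sums[t["A"]] += t["B"]
    return sums

def parallel_sum_group_by(partitions):
    # Only the small (A, partial_sum) maps cross the network; the final
    # merge just adds partial sums for matching A-values.
    total = Counter()
    for p in partitions:
        total.update(local_partial_sums(p))  # Counter addition per key
    return dict(total)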
MapReduce
Partitioning Strategies
Partitioned Operators and Pipelining
Partitioned Joins and Aggregations
MapReduce
MapReduce’s Legacy and Spark
[Extra] Parallel DB: Sort-Merge, Sort
[Extra] Volcano Framework for Parallelism
Parallel Data Processing Considerations
Data Partitioning
Parallel Data Processing
Hardware edge cases
We’ll only get to discuss the first two items (partitioning, processing). But know that physical restrictions are the most important in daily operation!
✅
next up
MapReduce
MapReduce is a parallel programming model that was introduced in 2004 at Google.
On the surface, MapReduce will seem entirely different from every paradigm you've seen before. But it's not!
But it turns out that MapReduce, like partitioning ops, leverages parallel operations and manager-worker nodes.
[Google Research link]
The programmer specifies two methods:
Map(k, v) → <k’, v’>*
Reduce(k’, <v’>*) → <k’, v’’>*
Fundamental idea: Many parallel data processing algorithms (on both relational and semi-/un-structured data) can be captured using these two primitives: map and reduce.
Map: item-by-item processing
Read a lot of data and extract something of value
Reduce: collect items corresponding to the same key, and process them together
(shuffle: implicit in-between step to group by keys such that reduce works properly)
MapReduce Example 1: Word Counts
Suppose that a crawl of the entire world wide web is stored across N machines.
We want to count the # of times each word appears on the world wide web.
The shuffle (the implicit in-between step) groups the map output by key so that reduce works properly, e.g.:
<word1, 1>, <word1, 1>, …, <word1, 1>, <word2, 1>, …
MapReduce Example 1: Word Counts
[Diagram: word-count data flow over a sample document beginning "The crew of the space shuttle Endeavor recently returned to Earth as ambassadors, harbingers of a new era of space exploration..."]
Map (programmer-provided): read input and produce a set of (key, value) pairs, e.g., (The, 1), (crew, 1), (of, 1), (the, 1), (space, 1), (shuttle, 1), (Endeavor, 1), (recently, 1), ... Each document is processed on a separate worker; reads are purely sequential.
Group by key (system-provided): collect all pairs with the same key, reorganizing by key and redistributing to workers, e.g., (crew, 1), (crew, 1), (space, 1), (the, 1), (the, 1), (the, 1), ...
Reduce (programmer-provided): collect all values belonging to a key and output, e.g., (crew, 2), (space, 1), (the, 3), (shuttle, 1), (recently, 1), ... Each key's value list is processed on a separate worker.
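A minimal Python rendering of word count in this style (the shuffle is simulated with a dictionary; this is a sketch of the model, not Google's actual API):

from collections import defaultdict

def map_fn(doc_id, text):
    # Map(k, v) -> <k', v'>*: emit (word, 1) for every word seen.
    for word in text.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce(k', <v'>*) -> <k', v''>*: sum the 1s for each word.
    yield (word, sum(counts))

def word_count(documents):
    groups = defaultdict(list)          # shuffle: group pairs by key
    for doc_id, text in documents.items():
        for word, one in map_fn(doc_id, text):
            groups[word].append(one)
    return dict(kv for w, vs in groups.items() for kv in reduce_fn(w, vs))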
[at home] MapReduce Example 2: Partitioned Parallel Join
Input relations: R(A, B) and S(B, C), joined on B
Map:
Input: R (or S)
Output: <B: (R, A)> (or <B: (S, C)>)
(Shuffle: all pairs with the same B value end up on the same worker)
Reduce:
Input <B: (R, A1)…(R, An), (S, C1)…(S, Cm)>
Output: cross product of Ai and Cj
(A1, B, C1), (A1, B, C2), …, (An, B, Cm)
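The same recipe as a Python sketch (assuming R holds (A, B) pairs and S holds (B, C) pairs; the "R"/"S" tags mirror the <B: (R, A)> notation above):

def map_join(relation, tup):
    # Tag each tuple with its relation so reduce can tell the sides apart.
    if relation == "R":
        a, b = tup
        yield (b, ("R", a))
    else:
        b, c = tup
        yield (b, ("S", c))

def reduce_join(b, tagged):
    # Cross product of all R-side A values and S-side C values for this B.
    As = [v for side, v in tagged if side == "R"]
    Cs = [v for side, v in tagged if side == "S"]
    for a in As:
        for c in Cs:
            yield (a, b, c)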
MapReduce’s Legacy and Spark
Partitioning Strategies
Partitioned Operators and Pipelining
Partitioned Joins and Aggregations
MapReduce
MapReduce’s Legacy and Spark
[Extra] Parallel DB: Sort-Merge, Sort
[Extra] Volcano Framework for Parallelism
MapReduce beyond Map and Reduce
Other than the Map and Reduce functions defined by the programmer, the system handles everything else:
Is MapReduce Still Used?
There are several downsides of MapReduce:
we've seen a glimpse of how hard it is to think in this paradigm!
While convenient for arbitrary large-scale parallel data processing on arbitrary data…
…MapReduce isn't actually all that efficient compared to parallel relational databases!
At this point, MapReduce is largely overshadowed by parallel databases that address several of these limitations.
However, note that various MapReduce implementations still exist today:
Spark: Somewhere In Between
Spark (2014) introduced the notion of resilient distributed datasets (RDDs).
Matei Zaharia, Associate Professor, UC Berkeley
Originally introduced as a way to bridge the gap between MapReduce and parallel DBs
Ultimately ended up much faster than MapReduce…
Now: More like a parallel database with additional machine-learning-oriented bells and whistles.
Beyond: Streaming Data Applications
These all only perform batch jobs: A single query on a bunch of data with one result.
Modern applications need much more, e.g., streaming jobs, or live results as data is updated.
[Extra] Parallel DB: Sort-Merge, Sort
Partitioning Strategies
Partitioned Operators and Pipelining
Partitioned Joins and Aggregations
MapReduce
MapReduce’s Legacy and Spark
[Extra] Parallel DB: Sort-Merge, Sort
[Extra] Volcano Framework for Parallelism
Parallel Two-Pass Sort-Merge
Parallel Sort Approach 2: Partitioned Sort
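The bodies of these two extra slides (figures) didn't survive the transcript. As a rough sketch of the second approach under the usual assumption (range-partition first, so locally sorted runs concatenate into a global order):

def partitioned_sort(data, uppers):
    # 1. Range-partition: everything on node j sorts before node j+1.
    parts = [[] for _ in range(len(uppers) + 1)]
    for x in data:
        j = 0
        while j < len(uppers) and x > uppers[j]:
            j += 1
        parts[j].append(x)
    # 2. Each node sorts its partition locally (in parallel, in reality).
    for p in parts:
        p.sort()
    # 3. Concatenation alone yields the global sort order; no merge needed.
    return [x for p in parts for x in p]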
[Extra] Volcano Framework for Parallelism
Partitioning Strategies
Partitioned Operators and Pipelining
Partitioned Joins and Aggregations
MapReduce
MapReduce's Legacy and Spark
[Extra] Parallel DB: Sort-Merge, Sort
[Extra] Volcano Framework for Parallelism
Volcano Framework: A Helpful Way to Think about Parallelism
The Volcano Framework (90s) by Graefe
Exchange operators:
Destination operators can read data from multiple "streams" via exchange operators, which encapsulate the parallelism so that the other operators need not be aware of it.
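A toy rendering of the exchange idea (illustrative; real Volcano wraps this in the open/next/close iterator interface): the consumer iterates over one stream and never sees how many producers feed it.

import heapq

class Exchange:
    # Merges multiple upstream "streams" into one iterator, hiding the
    # parallelism from the destination operator entirely.
    def __init__(self, streams):
        self.streams = streams

    def __iter__(self):
        # Order-preserving merge for sorted inputs; an unordered union
        # would suffice for unsorted ones.
        return heapq.merge(*self.streams)

# Downstream code is identical whether there is one stream or many:
# for tup in Exchange([node1_out, node2_out]): process(tup)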
Volcano Framework Examples
Partitioned parallel join
Asymmetric fragment and replicate join
Partitioned parallel aggregation
MapReduce in terms of other parallel operations
MapReduce is akin to what we’ve been discussing so far: