Relational Algebra
1
Data 101, Fall 2025 @ UC Berkeley
Aditya Parameswaran
Announcements
2
Recall: Colloquial terminology
Table/Relation: Collection of tuples w/a predefined collection of attributes.
Schema: The structure, format, or scaffolding defining a relation.
Database schema: The structure, format, �or scaffolding defining a collection of relations, �e.g., in a database.
3
We will now define these terms more rigorously with Relational Algebra.
name | artist | album | peak |
22 | Taylor Swift | Red | 20 |
Blinding Lights | The Weeknd | After Hours | 1 |
360 | Charli XCX | Brat | 1 |
Shelter | Porter Robinson | Shelter (single) | 16 |
… | … | … | … |
Table “songs”
Column | Type
----------|---------
name | text
artist | text
album | text
peak | integer
Relation Schema
songs(name TEXT, artist TEXT, � album TEXT, peak INTEGER, …)
Relation “instance”
4
The Relational Data Model
Projection, Selection, Renaming
Unary Operator Practice
Binary Operator: Cartesian Product
Set Operations
Theta Join (and Equi Join)
Natural Join
From RA to Data Systems
From RA to Data Systems: Sets vs. Bags
Outline
Relational Algebra
Algebra (formulaic operations on variables) is core to several disciplines:
Relational Algebra (RA) is the theory of operations that help us transform relations.
5
RA is the core of query optimization, which we will cover later in the semester.
Relational Algebra is the foundation of the relational data model.�A grounding in RA allows us:
A Bit of History
RA introduced in a seminal paper by Edgar F. Codd in 1970.
Became the basis for the relational database revolution and many subsequent data systems.
6
Edgar F. Codd
The formal Relational Data Model
Relation: A set of tuples with a predefined set of attributes.
Schema: The structure, format, or scaffolding defining a relation.
Instance: A relation with values “filled in”, a specific instantiation
7
A Relation = Schema + Instance.
Set are Unordered: Equivalent Representations
Relations are sets of tuples w/ sets of attribs.
We can reorder both and not change the relation:
8
name | artist | album | peak |
22 | Taylor Swift | Red | 20 |
Blinding Lights | The Weeknd | After Hours | 1 |
360 | Charli XCX | Brat | 1 |
Shelter | Porter Robinson | Shelter (single) | 16 |
… | … | … | … |
name | peak | artist | album |
Blinding Lights | 1 | The Weeknd | After Hours |
360 | Charli XCX | Brat | 1 |
Shelter | 16 | Porter Robinson | Shelter (single) |
22 | 20 | Taylor Swift | Red |
… | … | … | … |
=
Relational Algebra Notation
9
Songs (name String, artist � String, album String,� peak Integer, …)
relation name
relation schema
relation name
relation schema
10
The Relational Data Model
Projection, Selection, Renaming
Unary Operator Practice
Binary Operator: Cartesian Product
Set Operations
Theta Join (and Equi Join)
Natural Join
From RA to Data Systems
From RA to Data Systems: Sets vs. Bags
Outline
Relational Algebra Primitive Operations
There are six primitive RA operations, from which all others can be built:
11
Relational Algebra Primitive Operations
Let’s first focus on unary operators:
There are six primitive RA operations, from which all others can be built:
12
As we cover these next three operators, try brainstorming the corresponding SQL keywords/queries!
Projection: Cross Out Columns
Operation: Keep only columns A1, …, An�
Input: Output:�
Notes:
13
Example:
Column | Type
-----------------+--------
title_id | text
primary_title | text
Titles
Column | Type
-----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
same tuples, but with attributes removed
Selection: Cross out rows
Operation: Filter out rows, i.e., keep only rows that satisfy some condition C�
Input: Output: �
Notes:
14
Example:
Column | Type -----------+-------- � person_id | text
name | text
born | bigint
died | bigint
People
Column | Type -----------+-------- � person_id | text
name | text
born | bigint
died | bigint
rows where born > 1980
Renaming: Rename attributes and/or relation name
Operation: Change not the data, but the metadata (relation name, attribute names)�
Input: Output: �
15
Notes:
equivalently
Example:
Persons
Column | Type -----------+-------- � person_id | text
name | text
birth | bigint
death | bigint
People
Column | Type -----------+-------- � person_id | text
name | text
born | bigint
died | bigint
equivalently
Relational Algebra so far
Basic unary operations:
Relational Algebra (RA) is the theory of operations that transform relations.
16
Change metadata (schema)?
✅
❌�
✅
17
The Relational Data Model
Projection, Selection, Renaming
Unary Operator Practice
Binary Operator: Cartesian Product
Set Operations
Theta Join (and Equi Join)
Natural Join
From RA to Data Systems
From RA to Data Systems: Sets vs. Bags
Outline
SFW as Relational Algebra
A SELECT FROM WHERE (SFW) query is the most standard SQL query to extract records from a relation.
18
🤔
A. WHERE: Projection, SELECT: Projection, AS: Renaming
B. WHERE: Projection, SELECT: Selection, AS: Renaming
C. WHERE: Selection, SELECT: Projection, AS: Renaming
D. WHERE: Selection, SELECT: Selection, AS: Renaming
E. Something else
WHERE, SELECT, AS - Which RA operators do these intuitively map to?
SELECT Bi1 AS A1,
Bi2 AS A2,
…
BiN AS AN
FROM R
WHERE C;
SFW as Relational Algebra
SQL “Order of execution”:
A SELECT FROM WHERE (SFW) query is the most standard SQL query to extract records from a relation.
19
SELECT Bi1 AS A1,
Bi2 AS A2,
…
BiN AS AN
FROM R
WHERE C;
Relational Algebra steps:
For the Relation R,
1. Select tuples that �match condition C.
2. Project to include�attributes Bi1, Bi2, …, Bin .
3. Rename attributes �Bi1, Bi2, …, Bin to A1, A2, …, An.
Caution! Terminology
SQL “Order of execution”:
A SELECT FROM WHERE (SFW) query is the most standard SQL query to extract records from a relation.
20
SELECT Bi1 AS A1,
Bi2 AS A2,
…
BiN AS AN
FROM R
WHERE C;
Relational Algebra steps:
For the Relation R,
1. Select tuples that �match condition C.
2. Project to include�attributes Bi1, Bi2, …, Bin .
3. Rename attributes �Bi1, Bi2, …, Bin to A1, A2, …, An.
⚠️ SQL SELECT is RA projection! ⚠️
This is a decades-old mismatch of terminology!!
Practice: Composing relational algebra operations
How would you write a relational algebra expression that represents the following?
21
title_id | primary_title | premiered
------------+-----------------------------------------+-----------
tt0178116 | Zhenatyy kholostyak | 1982
tt1681370 | The Algerian | 2014
tt3501632 | Thor: Ragnarok | 2017
tt9535620 | Vis à vis: Beyond the Veil | 1998
tt3582172 | Luanshi Jiaren | 1941
Titles
Column | Type
-----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
(input relation schema)
(output relation instance)
Return the title_id, primary_title, premiered of all movies (i.e., where type is 'movie').
Practice: Composing relational algebra operations
How would you write a relational algebra expression that represents the following?
22
title_id | primary_title | premiered
------------+-----------------------------------------+-----------
tt0178116 | Zhenatyy kholostyak | 1982
tt1681370 | The Algerian | 2014
tt3501632 | Thor: Ragnarok | 2017
tt9535620 | Vis à vis: Beyond the Veil | 1998
tt3582172 | Luanshi Jiaren | 1941
Titles
Column | Type
-----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
(input relation schema)
(output relation instance)
Return the title_id, primary_title, premiered of all movies (i.e., where type is 'movie').
1. selection
2. projection
Practice: Composing relational algebra operations
How would you write a relational algebra expression that represents the following?
23
Titles
Column | Type
-----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
(input relation schema)
1. selection
2. projection
Pi_{title_id, primary_title, premiered} (
Sigma_{type = movie} (
Pi_{title_id, primary_title, premiered, type}
(Titles)))
24
25
The Relational Data Model
Projection, Selection, Renaming
Unary Operator Practice
Binary Operator: Cartesian Product
Set Operations
Theta Join (and Equi Join)
Natural Join
From RA to Data Systems
From RA to Data Systems: Sets vs. Bags
Outline
Cross Product (i.e., Cartesian Product, Product)
Operation: Associate each tuple on LHS with RHS�
Inputs: Output: �
Notes:
26
Example: Cross Product
27
0
Apricot
2
Cally
4
Eugene
1
Boots
ragdoll
persian
2
4
bengal
5
1
persian
S
T
id
name
breed
id
S.id
name
breed
T.id
0
Apricot
2
Cally
4
Eugene
2
ragdoll
1
persian
0
Apricot
4
Eugene
1
Boots
1
persian
1
persian
2
ragdoll
2
ragdoll
1
Boots
2
Cally
1
persian
2
ragdoll
4
bengal
0
Apricot
2
Cally
1
Boots
0
Apricot
2
Cally
4
Eugene
1
Boots
4
bengal
4
bengal
5
persian
5
persian
5
persian
5
persian
bengal
4
Eugene
4
continued…
By itself, a cross product is not very meaningful, as it combines records that represent different individuals.
Cross products are composed with other operations to create a richer expression of RA.
28
The Relational Data Model
Projection, Selection, Renaming
Unary Operator Practice
Binary Operator: Cartesian Product
Set Operations
Theta Join (and Equi Join)
Natural Join
From RA to Data Systems
From RA to Data Systems: Sets vs. Bags
Outline
Relational Algebra Primitive Operations
There are six primitive RA operations, from which all others can be built:
29
Fundamental binary operators in set algebra!
Set Operations: Union and Difference
Inputs: Output:
Notes:
Union:
Operation:��All rows in R1 or R2.
If relations are sets, then the union and difference operators work �just like in set algebra.
30
Difference:
Operation:��All rows in R1 and not in R2.
Summary of Relational Algebra
There are six primitive relational algebra (RA) operations, from which all others can be built:
31
Intersection:
Operation:��All rows that are in� both R1 and R2
All other operations can be derived from these primitives.
Exercise for home: How can you derive intersection from a composition of union and difference operations?
🤔🏡
Summary
So Far
32
The Relational Data Model
Projection, Selection, Renaming
Unary Operator Practice
Binary Operator: Cartesian Product
Set Operations
Theta Join (and Equi Join)
Natural Join
From RA to Data Systems
From RA to Data Systems: Sets vs. Bags
Outline
Summary of Relational Algebra
There are six primitive relational algebra (RA) operations, from which all others can be built:
33
All other operations can be derived from these primitives.
Joins are therefore also a composition of primitive operations!
We’ll derive two new joins today: theta join and natural join. Try out the others yourself!
Summary
So Far
Theta Join (and Equi Join)
Operation: Join two relations such that rows satisfy the condition Θ
More formally: A cross product of R1 and R2, followed by a selection on Θ
Inputs: Output:
34
Notes:
Theta Join (specifically, Equi Join) SQL Example
35
SELECT *�FROM Crew, People�WHERE � Crew.person_id = � People.person_id;
or equivalently
SELECT *�FROM Crew� INNER JOIN People� ON Crew.person_id = � People.person_id;
Crew
Column | Type -----------+-------- � title_id | text
person_id | text
category | text
job | text
People
Column | Type -----------+--------
person_id | text
name | text
born | bigint
died | bigint
Column | Type -----------------+--------
title_id | text
Crew.person_id | text
category | text
job | text
People.person_id| text
name | text
born | bigint
died | bigint
(using RA notation for attributes)
Theta Join is SQL inner join!
Composing Theta Join Using RA Primitives
Operation: Join two relations such that rows satisfy the condition Θ.
With primitives: A cross product of R1 and R2, followed by a selection on Θ.
Theta Join is an explicit composition of primitive RA operations:
36
🤔
Which of the following expressions are equivalent to this statement?
G.
H.
I.
J. Something else
Composing Theta Join Using RA Primitives
Operation: Join two relations such that rows satisfy the condition Θ.
With primitives: A cross product of R1 and R2, followed by a selection on Θ.
Theta Join is an explicit composition of primitive RA operations. From before:
37
This composition is the formal definition of theta join.
38
The Relational Data Model
Projection, Selection, Renaming
Unary Operator Practice
Binary Operator: Cartesian Product
Set Operations
Theta Join (and Equi Join)
Natural Join
From RA to Data Systems
From RA to Data Systems: Sets vs. Bags
Outline
A new join “naturally” arises!
This is a very common operation!
39
Crew
Column | Type -----------+-------- � title_id | text
person_id | text
category | text
job | text
People
Column | Type -----------+--------
person_id | text
name | text
born | bigint
died | bigint
equi join on person_id
Column | Type -----------------+--------
title_id | text
Crew.person_id | text
category | text
job | text
People.person_id| text
name | text
born | bigint
died | bigint
project/�rename
Column | Type -----------------+--------
title_id | text
person_id | text
category | text
job | text � name | text
born | bigint
died | bigint
A new join “naturally” arises!
This is a very common operation! Often times:
40
Enter the natural join!
Crew
Column | Type -----------+-------- � title_id | text
person_id | text
category | text
job | text
People
Column | Type -----------+--------
person_id | text
name | text
born | bigint
died | bigint
equi join on person_id
Column | Type -----------------+--------
title_id | text
Crew.person_id | text
category | text
job | text
People.person_id| text
name | text
born | bigint
died | bigint
project/�rename
Column | Type -----------------+--------
title_id | text
person_id | text
category | text
job | text � name | text
born | bigint
died | bigint
Natural Join
Operation: Join two relations such that rows are matched on shared� attributes, with no duplicate attribute renaming.
�Inputs: Output:
41
with duplicate attributes removed�from schema
Example: Natural Join in SQL
42
Crew
Column | Type -----------+-------- � title_id | text
person_id | text
category | text
job | text
People
Column | Type -----------+--------
person_id | text
name | text
born | bigint
died | bigint
natural join
Column | Type -----------------+--------
title_id | text
person_id | text
category | text
job | text � name | text
born | bigint
died | bigint
SELECT� title_id,� Crew.person_id AS person_id,� category,� job,� name,� born,� died FROM Crew� INNER JOIN People� ON Crew.person_id � = People.person_id;
SELECT *�FROM Crew� NATURAL JOIN People;
🥹
equivalently
Composing Natural Join
43
Crew
Column | Type -----------+-------- � title_id | text
person_id | text
category | text
job | text
People
Column | Type -----------+--------
person_id | text
name | text
born | bigint
died | bigint
natural join
Column | Type -----------------+--------
title_id | text
person_id | text
category | text
job | text � name | text
born | bigint
died | bigint
1.Cross Product
2. Match tuples
3. Rename same attributes
4. Drop duplicate attributes
Like Theta Join, Natural Join is an explicit composition of RA primitives:
Exercise for Home
Suppose we have the following relations:
44
What is an equivalent operation for the following natural joins?
🤔
Exercise for Home
Suppose we have the following relations:
Natural Join as RA primitives:
45
What is an equivalent operation for the following natural joins?
C. intersection
R and S have same schema, so any rows that don’t exactly match will be dropped!
Exercise for Home
2.
Suppose we have the following relations:
Natural Join as RA primitives:
46
What is an equivalent operation for the following natural joins?
No shared attributes!
D. cross product
Summary of Relational Algebra
There are six primitive relational algebra (RA) operations, from which all others can be built:
47
All other operations can be derived from these primitives:
Summary
So Far
48
The Relational Data Model
Projection, Selection, Renaming
Unary Operator Practice
Binary Operator: Cartesian Product
Set Operations
Theta Join (and Equi Join)
Natural Join
From RA to Data Systems
From RA to Data Systems: Sets vs. Bags
Outline
Briefly, Three Types of Data Systems
Let’s ground the utility of RA by using it to understand real systems.
(We’ll cover these systems in more depth later). For now, don’t need to know details or API/syntax—just that they exist.
49
Dataframes
Relational Databases
Spark
Briefly, Three Types of Data Systems
Let’s ground the utility of RA by using it to understand real systems.
(We’ll cover these systems in more depth later). For now, don’t need to know details or API/syntax—just that they exist.
50
Dataframes
Relational Databases
Abstraction/API to represent, clean, and structure your data
Spark
Briefly, Three Types of Data Systems
Let’s ground the utility of RA by using it to understand real systems.
(We’ll cover these systems in more depth later). For now, don’t need to know details or API/syntax—just that they exist.
51
Dataframes
Relational Databases
Abstraction/API to represent, clean, and structure your data
Traditional, time-tested solutions to managing data at scale
Spark
Briefly, Three Types of Data Systems
Let’s ground the utility of RA by using it to understand real systems.
(We’ll cover these systems in more depth later). For now, don’t need to know details or API/syntax—just that they exist.
52
Dataframes
Relational Databases
Spark
Abstraction/API to represent, clean, and structure your data
Traditional, time-tested solutions to managing data at scale
Distributed, parallel computing framework
Expressing Projection across Data Systems
53
Dataframes
Relational Databases
Spark
Titles[['title_id', 'primary_title']]
or equivalently
Titles.drop(columns=['type','original_title',…,'genres'])
SELECT title_id, primary_title
FROM titles;
Titles.select('title_id', 'primary_title')
Expressing Selection across Data Systems
54
Dataframes
Relational Databases
Spark
People[People[‘born’] > 1980]
SELECT *
FROM people
WHERE born > 1980;
People.filter("born > 1980")
or equivalently
People.filter(People.born > 1980)
or equivalently
People.where("born > 1980")
Expressing Renaming across Data Systems (attributes only for now)
55
Dataframes
Relational Databases
Spark
People.rename(columns={'born': 'birth', 'died': 'death'})
SELECT person_id, name, born AS birth, died AS death
FROM people;
People
.withColumnRenamed('born','birth')
.withColumnRenamed('died','death')
Renaming relations: In Spark/pandas, use assignment (=) operator. In SQL, use DML (more next time).
Expressing Cross Product across Data Systems
56
Dataframes
Relational Databases
Spark
SELECT *
FROM titles, crew;
Titles.crossJoin(Crew)
Crew
Column | Type -----------+-------- � title_id | text
person_id | text
category | text
job | text
Titles
Column | Type
-----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
Titles.merge(Crew, how=”cross”)
Expressing Union across Data Systems
57
Dataframes
Relational Databases
Spark
pd.concat([Title2019, Title2023],axis=0)
(SELECT * FROM titles_2019)�UNION�(SELECT * FROM titles_2023);
Titles2019.union(Titles2023)
Expressing Theta Join Across Data Systems
58
Dataframes
Relational Databases
Spark
Not supported, but instead hybrid theta join + natural join with pd.merge:
Titles.merge(Crew, on='title_id')
SELECT *�FROM Crew� INNER JOIN People� ON Crew.person_id = People.person_id;
Crew.join(People, [Crew.person_id == People.person_id])
Expressing Natural Join Across Data Systems
59
Dataframes
pd.merge(Titles, Crew)
Relational Databases
Spark
SELECT *
FROM Crew
NATURAL JOIN People;
(No convenient way; use theta join w/ other operators)
60
The Relational Data Model
Projection, Selection, Renaming
Unary Operator Practice
Binary Operator: Cartesian Product
Set Operations
Theta Join (and Equi Join)
Natural Join
From RA to Data Systems
From RA to Data Systems: Sets vs. Bags
Outline
What was the point?
Relational algebra forms the basics of data processing
→ understanding RA lets you quantify the capabilities of data systems.
Small caveat:
61
Operations on Bags
Briefly, bags are sets that allow duplicate elements. Why use bags?
62
| Bag operation | Perf vs. Sets | Comments |
Selection | Preserve # of occurrences | Roughly as fast | |
Projection | " | Faster than sets | Common operation (SQL SELECT). Want this to be fast! No checking of dupes |
Cartesian Product | " | Roughly as fast | Every copy “matches” with your copy. Includes join |
Union | Add # of occurrences | Faster than sets | No checking of dupes |
Difference | Subtract # of occurrences | Roughly as fast | Both require checking dupes |
Renaming | (unchanged) | – | |
SQL: Bag/Set operations with subqueries
Bag (i.e., multiset) operations for union, difference, and intersection, respectively:
Enforce set behavior instead
63
Example: Find members of Crew that are not in People:
(SELECT person_id FROM Crew)
EXCEPT ALL
(SELECT person_id FROM People)
A subquery is a parenthesized SQL query. More next time!
Summary of Relational Algebra
There are six primitive relational algebra (RA) operations, from which all others can be built:
64
All other operations can be derived from these primitives:
While RA is defined on sets, many data systems actually implement RA with bags. It’s faster and supports duplicates!
Summary
So Far