Introduction to Data Management�DSC 100
Lecture 4 Part 1:
Relational Algebra
DSC 100
1
Relational Algebra
DSC 100
2
Relational Algebra
DSC 100
3
Basics
DSC 100
4
Sets v.s. Bags
Relational Algebra has two flavors:
DB systems implement bag semantics (Why?)
DSC 100
5
Relational Algebra Operators
DSC 100
6
RA
Extended RA
All operators take in 1 or 2 relations as inputs �and return another relation
Union and Difference
DSC 100
7
What do they mean over bags ?
R1 ∪ R2
R1 – R2
Only make sense if R1, R2 have the same schema
What about Intersection ?
DSC 100
8
R1 ∩ R2 = R1 – (R1 – R2)
R1 ∩ R2 = R1 ⨝ R2
Selection
DSC 100
9
σc(R)
DSC 100
10
σSalary > 40000 (Employee)
SSN | Name | Salary |
1234545 | John | 20000 |
5423341 | Smith | 60000 |
4352342 | Fred | 50000 |
SSN | Name | Salary |
5423341 | Smith | 60000 |
4352342 | Fred | 50000 |
Employee
Projection
DSC 100
11
π A1,…,An (R)
Different semantics over sets or bags! Why?
DSC 100
12
π Name,Salary (Employee)
SSN | Name | Salary |
1234545 | John | 20000 |
5423341 | John | 60000 |
4352342 | John | 20000 |
Name | Salary |
John | 20000 |
John | 60000 |
John | 20000 |
Employee
Name | Salary |
John | 20000 |
John | 60000 |
Bag semantics
Set semantics
Which is more efficient?
Composing RA Operators
DSC 100
13
no | name | zip | disease |
1 | p1 | 98125 | flu |
2 | p2 | 98125 | heart |
3 | p3 | 98120 | lung |
4 | p4 | 98120 | heart |
Patient
σdisease=‘heart’(Patient)
no | name | zip | disease |
2 | p2 | 98125 | heart |
4 | p4 | 98120 | heart |
zip | disease |
98125 | flu |
98125 | heart |
98120 | lung |
98120 | heart |
πzip,disease(Patient)
πzip,disease(σdisease=‘heart’(Patient))
zip | disease |
98125 | heart |
98120 | heart |
Cartesian Product
DSC 100
14
R1 × R2
Cross-Product Example
DSC 100
15
Name | SSN |
John | 999999999 |
Tony | 777777777 |
Employee
EmpSSN | DepName |
999999999 | Emily |
777777777 | Joe |
Dependent
Employee × Dependent
Name | SSN | EmpSSN | DepName |
John | 999999999 | 999999999 | Emily |
John | 999999999 | 777777777 | Joe |
Tony | 777777777 | 999999999 | Emily |
Tony | 777777777 | 777777777 | Joe |
Renaming
DSC 100
16
ρB1,…,Bn (R)
Natural Join
DSC 100
17
R1 ⨝ R2
Natural Join Example
DSC 100
18
A | B |
X | Y |
X | Z |
Y | Z |
Z | V |
B | C |
Z | U |
V | W |
Z | V |
A | B | C |
X | Z | U |
X | Z | V |
Y | Z | U |
Y | Z | V |
Z | V | W |
R
S
R ⨝ S =
ΠABC(σR.B=S.B(R × S))
Natural Join Example 2
DSC 100
19
age | zip | disease |
54 | 98125 | heart |
20 | 98120 | flu |
AnonPatient P
Voters V
P V
name | age | zip |
Alice | 54 | 98125 |
Bob | 20 | 98120 |
age | zip | disease | name |
54 | 98125 | heart | Alice |
20 | 98120 | flu | Bob |
Natural Join
DSC 100
20
Introduction to Data Management�DSC 100
Lecture 4 Part 2:
Relational Algebra
DSC 100
21
Theta Join
DSC 100
22
R1 ⨝θ R2 = σθ (R1× R2)
P ⨝ P.zip = V.zip and P.age >= V.age -1 and P.age <= V.age +1 V
AnonPatient (age, zip, disease)�Voters (name, age, zip)
Equijoin
DSC 100
23
R1 ⨝θ R2 = σθ (R1 × R2)
Equijoin Example
DSC 100
24
age | zip | disease |
54 | 98125 | heart |
20 | 98120 | flu |
AnonPatient P
Voters V
P P.age=V.age V
name | age | zip |
p1 | 54 | 98125 |
p2 | 20 | 98120 |
P.age | P.zip | P.disease | V.name | V.age | V.zip |
54 | 98125 | heart | p1 | 54 | 98125 |
20 | 98120 | flu | p2 | 20 | 98120 |
Natural Join Example
DSC 100
25
age | zip | disease |
54 | 98125 | heart |
20 | 98120 | flu |
AnonPatient P
Voters V
P V
name | age | zip |
p1 | 54 | 98125 |
p2 | 20 | 98120 |
age | zip | disease | name | V.age | V.zip |
54 | 98125 | heart | p1 | 54 | 98125 |
20 | 98120 | flu | p2 | 20 | 98120 |
Join Summary
DSC 100
26
So Which Join Is It ?
When we write R ⨝ S we usually mean an equijoin, but we often omit the equality predicate when it is clear from the context
DSC 100
27
More Joins
DSC 100
28
Outer Join Example
DSC 100
29
age | zip | disease |
54 | 98125 | heart |
20 | 98120 | flu |
33 | 98120 | lung |
AnonPatient P
P ⋊ J
P.age | P.zip | P.disease | J.job | J.age | J.zip |
54 | 98125 | heart | lawyer | 54 | 98125 |
20 | 98120 | flu | cashier | 20 | 98120 |
33 | 98120 | lung | null | null | null |
AnnonJob J
job | age | zip |
lawyer | 54 | 98125 |
cashier | 20 | 98120 |
Some Examples
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,qty,price)
Name of supplier of parts with size greater than 10
πsname(Supplier ⨝ (Supply ⨝ (σpsize>10 (Part)))
Name of supplier of red parts or parts with size greater than 10
πsname(Supplier ⨝ (Supply ⨝ (σ psize>10 (Part) ∪ σpcolor=‘red’ (Part) ) ) )
πsname(Supplier ⨝ (Supply ⨝ (σ psize>10 ∨ pcolor=‘red’ (Part) ) ) )
Can be represented as trees as well
DSC 100
30
Some Examples
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,qty,price)
Name of supplier of parts with size greater than 10
Project[sname](Supplier Join[sno=sno] � (Supply Join[pno=pno] (Select[psize>10](Part))))
Name of supplier of red parts or parts with size greater than 10
Project[sname](Supplier Join[sno=sno]
(Supply Join[pno=pno]
((Select[psize>10](Part)) Union � (Select[pcolor=‘red’](Part)))
Project[sname](Supplier Join[sno=sno] (Supply Join[pno=pno] � (Select[psize>10 OR pcolor=‘red’](Part))))
Can be represented as trees as well
31
Representing RA Queries as Trees
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,qty,price)
32
Part
Supply
σpsize>10
πsname
Answer
Supplier
SELECT z.sname
FROM Part x, Supply y, Supplier z
WHERE x.psize > 10� and x.pno = y.pno� and y.sno = z.sno
πsname(Supplier ⨝ (Supply ⨝ (σpsize>10 (Part)))
Relational Algebra Operators
DSC 100
33
RA
Extended RA
All operators take in 1 or 2 relations as inputs �and return another relation
Extended RA: Operators on Bags
DSC 100
34
Using Extended RA Operators
DSC 100
35
SELECT city, sum(quantity)
FROM sales
GROUP BY city
HAVING count(*) > 100
T1, T2 = temporary tables
sales(product, city, quantity)
γ city, sum(quantity)→q, count(*) → c
σ c > 100
Π city, q
Answer
T1(city,q,c)
T2(city,q,c)
Typical Plan for a Query (1/2)
DSC 100
36
R
S
join condition
σselection condition
πfields
join condition
…
SELECT-PROJECT-JOIN
Query
Answer
SELECT fields
FROM R, S, …
WHERE condition
Typical Plan for a Query (1/2)
DSC 100
37
πfields
ɣfields, sum/count/min/max(fields)
σhaving condition
σwhere condition
join condition
…
…
SELECT fields
FROM R, S, …
WHERE condition
GROUP BY fields
HAVING condition
How about Subqueries?
DSC 100
38
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’ � and not exists
(SELECT *
FROM Supply P� WHERE P.sno = Q.sno� and P.price > 100)
How about Subqueries?
DSC 100
39
SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’ � and not exists
(SELECT *
FROM Supply P� WHERE P.sno = Q.sno� and P.price > 100)
Correlation !
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
How about Subqueries?
DSC 100
40
SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’ � and not exists
(SELECT *
FROM Supply P� WHERE P.sno = Q.sno� and P.price > 100)
De-Correlation
SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’� and Q.sno not in
(SELECT P.sno
FROM Supply P� WHERE P.price > 100)
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
How about Subqueries?
DSC 100
41
SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’� and Q.sno not in
(SELECT P.sno
FROM Supply P� WHERE P.price > 100)
(SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’)
EXCEPT� (SELECT P.sno
FROM Supply P� WHERE P.price > 100)
EXCEPT = set difference
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
Un-nesting
How about Subqueries?
DSC 100
42
(SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’)
EXCEPT� (SELECT P.sno
FROM Supply P� WHERE P.price > 100)
Supply
σsstate=‘WA’
Supplier
σPrice > 100
−
Finally…
πsno
πsno
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
Summary of RA and SQL
DSC 100
43
RDBMS translate SQL 🡪 RA, then optimize RA
Summary of RA and SQL
DSC 100
44