1 of 59

Relational Algebra

Prof. Sharad Mehrotra

University of California, Irvine

2 of 59

What is Relational Algebra

  • A simple algebraic operator based language for writing queries over relational databases.
    • Very simple to learn, though can be hard to master ☺
    • Part of the original relational model
  • DBMSs do not support relational algebra directly, instead they support SQL.

  • So why learn relational algebra and not SQL directly?
    • The theoretical underpinning of SQL is RA. To gain deeper insight into SQL and to write SQL queries more accurately, it is important to understand RA.
    • SQL is transformed into RA before optimizing and executing it.
    • To understand how optimizers work and how they rewrite queries you must know RA and the properties (such as associativity, distributivity, etc. ) of RA.
    • DBMSs implement RA operators to implement SQL. If you do not know RA, how DBMSs work will remain a mystery!!

2

3 of 59

Relational Algebra: in 1 slide!

  • Unary operators:
    • Selection: choose rows from a relation.
    • Projection: choose columns from a relation.
    • Renaming: rename a relation and its attributes
    • Combining basic operators to form expressions
  • Binary Operators:
    • Union, Intersection, Difference:
      • Relations must have the same schema
    • Cartesian Product and Join: construct a new relation from several relations

A relational Algebra query is a composition of the above operators

3

3

4 of 59

Sample Schema

Patients(name, age) -- patients who currently have some disease

RecoveredPatients(name, age) -- patients who have recovered from the disease they had. They may still have other diseases in which case they will also be part of the Patients table as well.

Diagnosis(pname, disease, diagnosis_date) -- the diagnosis made about patients on a given date.

ToDiagnose(disease, test) -- tests used to diagnose the disease

Outcomes(pname, test, result, test_date) -- the result of the test when conducted on the patient pid on the given date.

4

5 of 59

Sample Tables

5

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

name

age

'Mary'

45

'John'

60

'Al'

33

'Barbara'

30

'Alex'

27

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Jane'

'COVID'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

'Bob'

'Mono'

2021-02-01

'John'

'COVID'

2021-02-10

'Al'

'Strep'

2021-02-16

'Mary'

'Strep'

2021-02-16

'Mehdi'

'Meningitis'

2021-02-10

disease

test

'Flu'

'A'

'COVID'

'B'

'Mono'

'C'

'Strep'

'D'

'Mono'

'E'

'COVID'

'F'

'Strep'

'G'

'Meningitis'

'H'

Patients

RecoveredPatients

Diagnosis

ToDiagnose

select[name= mary]

6 of 59

6

pname

test

result

test_date

'Mary'

'A'

'true'

2021-01-01

'Mary'

'B'

'false'

2021-01-01

'Mary'

'D'

'true'

2021-01-01

'Jane'

'B'

'true'

2021-01-01

'Jane'

'C'

'false'

2021-01-01

'Jane'

'F'

'false'

2021-01-01

'Mehdi'

'E'

'true'

2021-01-02

'Mehdi'

'E'

'false'

2021-01-02

'Mehdi'

'F'

'true'

2021-01-02

'Alex'

'G'

'true'

2020-01-03

'Bob'

'C'

'true'

2021-02-01

'Al'

'G'

'true'

2020-02-16

'Barbara'

'G'

'false'

2020-02-10

'Alex'

'A'

'true'

2020-01-03

'Mary'

'A'

'true'

2020-01-01

Outcomes

7 of 59

Selection

σ c (R): return tuples in R that satisfy condition C.

7

Patients.name

Patients.age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

Patients.name

Patients.age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

Patients

σage ≥30 ∧ age < 50(Patients)

Patients.name

Patients.age

'Mary'

45

'Mehdi'

33

'Alex'

30

σage ≥ 30(Patients)

8 of 59

Projection

ΠA1,…,Ak(R): pick columns of attributes A1,…,Ak of R.

8

πpname,disease (Diagnosis)

pname

disease

'Mary'

'Flu'

'Jane'

'COVID'

'Mehdi'

'Mono'

'Alex'

'Strep'

'Bob'

'Mono'

'John'

'COVID'

'Al'

'Strep'

'Mary'

'Strep'

'Mehdi'

'Meningitis'

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Jane'

'COVID'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

'Bob'

'Mono'

2021-02-01

'John'

'COVID'

2021-02-10

'Al'

'Strep'

2021-02-16

'Mary'

'Strep'

2021-02-16

'Mehdi'

'Meningitis'

2021-02-10

Diagnosis

pname

'Mary'

'Jane'

'Mehdi'

'Alex'

'Bob'

'John'

'Al'

πpname (Diagnosis)

Notice: Duplicates eliminated in answer

9 of 59

Union ( ∪)

9

Union

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

name

age

'Mary'

45

'John'

60

'Al'

33

'Barbara'

30

'Alex'

27

Patients

RecoveredPatients

Query in Relax: Patients ∪ RecoveredPatients

Patients.name

Patients.age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

'John'

60

'Al'

33

'Barbara'

30

'Alex'

27

10 of 59

Set operations must be over relations of the same type.

10

Error: at line 1: schemas are not unifiable: types are different or size is different: [RecoveredPatients.name : string, RecoveredPatients.age : number] and [Diagnosis.pname : string, Diagnosis.disease : string, Diagnosis.diagnosis_date : date]

name

age

'Mary'

45

'John'

60

'Al'

33

'Barbara'

30

'Alex'

27

Diagnosis

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Jane'

'COVID'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

'Bob'

'Mono'

2021-02-01

'John'

'COVID'

2021-02-10

'Al'

'Strep'

2021-02-16

'Mary'

'Strep'

2021-02-16

'Mehdi'

'Meningitis'

2021-02-10

RecoveredPatients

Diagnosis ∪ RecoveredPatients

11 of 59

Intersection

11

Patients.name

Patients.age

'Mary'

45

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

name

age

'Mary'

45

'John'

60

'Al'

33

'Barbara'

30

'Alex'

27

Patients

RecoveredPatients

Query in Relax: Patients ∩RecoveredPatients

12 of 59

Set Difference

12

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

name

age

'Mary'

45

'John'

60

'Al'

33

'Barbara'

30

'Alex'

27

Patients

RecoveredPatients

Query in Relax: Patients - RecoveredPatients

Patients.name

Patients.age

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

13 of 59

Cartesian Product

R × S: pair each tuple r in

R with each tuple s in S.

13

‘Mary'

'Flu'

'Mary'

'COVID'

'Mary'

'Mono'

'Mary'

'Strep'

'Mary'

'Meningitis'

'Jane'

'Flu'

'Jane'

'COVID'

'Jane'

'Mono'

'Jane'

'Strep'

'Jane'

'Meningitis'

'Mehdi'

'Flu'

'Mehdi'

'COVID'

'Mehdi'

'Mono'

'Mehdi'

'Strep'

'Mehdi'

'Meningitis'

'Alex'

'Flu'

'Alex'

'COVID'

'Alex'

'Mono'

'Alex'

'Strep'

'Alex'

'Meningitis'

'Bob'

'Flu'

'Bob'

'COVID'

'Bob'

'Mono'

'Bob'

'Strep'

'Bob'

'Meningitis'

π name ( Patients ) ⨯ π disease ( ToDiagnose )

14 of 59

Join

R S = σ c (R × S)

14

14

C

  • Join condition C is of the form:

<cond_1> AND <cond_2> AND … AND <cond_k>

Each cond_i is of the form A op B, where:

    • A is an attribute of R, B is an attribute of S
    • op is a comparison operator: =, <, >, ≥, ≤, or .
  • Different types:
    • Theta-join
    • Equi-join
    • Natural join

15 of 59

Theta-Join

(Diagnosis)

⨝diagnosis_date>test_date

(Outcomes)

15

15

Diagnosis × Outcomes

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'John'

'COVID'

2021-02-10

Diagnosis

pname

test

result

test_date

'Mary'

'A'

'true'

2021-01-01

'Mehdi'

'E'

'true'

2021-01-02

'Barbara'

'G'

'false'

2021-02-10

Outcomes

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

Outcomes.pname

Outcomes.test

Outcomes.result

Outcomes.test_date

'Mehdi'

'Mono'

2021-01-02

'Mary'

'A'

'true'

2021-01-01

'John'

'COVID'

2021-02-10

'Mary'

'A'

'true'

2021-01-01

'John'

'COVID'

2021-02-10

'Mehdi'

'E'

'true'

2021-01-02

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

Outcomes.pname

Outcomes.test

Outcomes.result

Outcomes.test_date

'Mary'

'Flu'

2021-01-01

'Mary'

'A'

'true'

2021-01-01

'Mary'

'Flu'

2021-01-01

'Mehdi'

'E'

'true'

2021-01-02

'Mary'

'Flu'

2021-01-01

'Barbara'

'G'

'false'

2021-02-10

'Mehdi'

'Mono'

2021-01-02

'Mary'

'A'

'true'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Mehdi'

'E'

'true'

2021-01-02

'Mehdi'

'Mono'

2021-01-02

'Barbara'

'G'

'false'

2021-02-10

'John'

'COVID'

2021-02-10

'Mary'

'A'

'true'

2021-01-01

'John'

'COVID'

2021-02-10

'Mehdi'

'E'

'true'

2021-01-02

'John'

'COVID'

2021-02-10

'Barbara'

'G'

'false'

2021-02-10

16 of 59

Theta-Join

(Diagnosis)

⨝diagnosis_date>=test_date AND Diagnosis.pname ≠ Outcomes.pname

(Outcomes)

16

16

Diagnosis⨯Outcomes

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'John'

'COVID'

2021-02-10

Diagnosis

pname

test

result

test_date

'Mary'

'A'

'true'

2021-01-01

'Mehdi'

'E'

'true'

2021-01-02

'Barbara'

'G'

'false'

2021-02-10

Outcomes

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

Outcomes.pname

Outcomes.test

Outcomes.result

Outcomes.test_date

'Mehdi'

'Mono'

2021-01-02

'Mary'

'A'

'true'

2021-01-01

'John'

'COVID'

2021-02-10

'Mary'

'A'

'true'

2021-01-01

'John'

'COVID'

2021-02-10

'Mehdi'

'E'

'true'

2021-01-02

'John'

'COVID'

2021-02-10

'Barbara'

'G'

'false'

2021-02-10

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

Outcomes.pname

Outcomes.test

Outcomes.result

Outcomes.test_date

'Mary'

'Flu'

2021-01-01

'Mary'

'A'

'true'

2021-01-01

'Mary'

'Flu'

2021-01-01

'Mehdi'

'E'

'true'

2021-01-02

'Mary'

'Flu'

2021-01-01

'Barbara'

'G'

'false'

2021-02-10

'Mehdi'

'Mono'

2021-01-02

'Mary'

'A'

'true'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Mehdi'

'E'

'true'

2021-01-02

'Mehdi'

'Mono'

2021-01-02

'Barbara'

'G'

'false'

2021-02-10

'John'

'COVID'

2021-02-10

'Mary'

'A'

'true'

2021-01-01

'John'

'COVID'

2021-02-10

'Mehdi'

'E'

'true'

2021-01-02

'John'

'COVID'

2021-02-10

'Barbara'

'G'

'false'

2021-02-10

17 of 59

Equi-Join

  • Special kind of theta-join: Join condition C only uses the equality operator.

17

17

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'John'

'COVID'

2021-02-10

Diagnosis

pname

test

result

test_date

'Mary'

'A'

'true'

2021-01-01

'Mehdi'

'E'

'true'

2021-01-02

'Barbara'

'G'

'false'

2021-02-10

Outcomes

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

Outcomes.pname

Outcomes.test

Outcomes.result

Outcomes.test_date

'Mary'

'Flu'

2021-01-01

'Mary'

'A'

'true'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Mehdi'

'E'

'true'

2021-01-02

(Diagnosis) ⨝Diagnosis.pname=Outcomes.pname (Outcomes)

18 of 59

Natural-Join

  • Relations R and S. Let L be the union of their attributes.
  • Let A1,…,Ak be their common attributes.

18

18

(Diagnosis) ⨝ (Outcomes)

R S = Π L (R S)

R.A1=S.A1 AND … AND R.Ak=S.Ak

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'John'

'COVID'

2021-02-10

Diagnosis

pname

test

result

test_date

'Mary'

'A'

'true'

2021-01-01

'Mehdi'

'E'

'true'

2021-01-02

'Barbara'

'G'

'false'

2021-02-10

Outcomes

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

Outcomes.test

Outcomes.result

Outcomes.test_date

'Mary'

'Flu'

2021-01-01

'A'

'true'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'E'

'true'

2021-01-02

19 of 59

Renaming ρ

  • Motivation: disambiguate attribute names. E.g., in R × R, how to differentiate the attributes from the two instances?

19

19

ρS(B1,…,Bn)(R)�

A relation identical to R, with new attributes B1,…,Bn.

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

Patients (name, age)

name1

age1

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

Patients1 (name1, age1)

ρ Patients1 (ρ name1←name, age1←age (Patients))

20 of 59

Combining Operations

  • Construct general expressions using basic operations.
  • Schema of each operation:
    • ∪, ∩, - : same as the schema of the two relations
    • Selection σ : same as the relation’s schema
    • Projection Π: attributes in the projection
    • Cartesian product × : attributes in two relations, use prefix to avoid confusion
    • Theta Join : same as ×
    • Natural Join : union of relations’ attributes, merge common attributes
    • Renaming: new renamed attributes

20

20

C

21 of 59

Query 0: Who has what disease

Query: Diagnosis

21

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

'Mary'

'Flu'

2021-01-01

'Jane'

'COVID'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

'Bob'

'Mono'

2021-02-01

'John'

'COVID'

2021-02-10

'Al'

'Strep'

2021-02-16

'Mary'

'Strep'

2021-02-16

'Mehdi'

'Meningitis'

2021-02-10

22 of 59

Query 1: Which Patients are younger compared to Mary

22

22

Patients.name

Patients.age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

Patients1.name1

'Mehdi'

'Alex'

'Bob'

23 of 59

Query 1: Which Patients are younger compared to Mary

23

23

ρPatients1(Patients) ⨝Patients1.age < Patients.age σname = 'Mary' (Patients)

Diagonsis

Patient

Disease

Mary

Strep

ToDiagnose

Disease

Test

Strep

A

Patient

name

age

Mary

45

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

24 of 59

Query 1: Which Patients are younger compared to Mary

24

24

ρPatients1(Patients) ⨝Patients1.age < Patients.age σname = 'Mary' (Patients)

project_p1.name (select[name= mary] Patients join[patient.age:> p1.age rename[p1](patient))

project (select[name= mary] (Patients join[patient.age:> p1.age rename[p1](patient)))

25 of 59

Query 2: Who has a disease they have been tested for?

25

Diagonsis

Patient

Disease

Mary

Strep

ToDiagnose

Disease

Test

Strep

A

Outcome

Patient

Test

Outc

Mary

A

T

Mary has been tested for A

A is a test for Strep

Mary has Strep

26 of 59

Query 2: Who has a disease they have been tested for?

πpname (Outcomes ⨝ToDiagnose⨝Diagnosis)

26

'Mary'

'Jane'

'Mehdi'

'Alex'

'Bob'

'Al'

27 of 59

Query 2: Who has a disease they have been tested for?

27

Several relational algebra expressions are equivalent.

πpname ((Diagnosis ⨝ ToDiagnose) ⨝ Outcomes)

Mary'

'Jane'

'Mehdi'

'Alex'

'Bob'

'Al'

28 of 59

Query 2: Who has a disease they have been tested for?

28

πpname ((πdisease,pname Diagnosis ⨝ ToDiagnose) ⨝ πpname, test Outcomes)

Several relational algebra expressions are equivalent.

Mary'

'Jane'

'Mehdi'

'Alex'

'Bob'

'Al'

29 of 59

Query 3: Who has a disease they have been tested Positive for?

29

29

πpname((Diagnosis ⨝ ToDiagnose) ⨝ σ result='true' (Outcomes))

Diagonsis

Patient

Disease

Mary

Strep

ToDiagnose

Disease

Test

Strep

A

Outcome

Patient

Test

Outc

Mary

A

T

Mary has been tested for A

A is a test for Strep

Mary has Strep

outcome is true

30 of 59

Query 3: Who has a disease they have been tested Positive for?

30

πpname((Diagnosis ⨝ ToDiagnose) ⨝ σ result='true' (Outcomes))

'Mary'

'Jane'

'Mehdi'

'Alex'

'Bob'

'Al'

31 of 59

Query 3’’ : Who has a disease they have been tested Negative for?

31

πpname ((Diagnosis ⨝ ToDiagnose) ⨝ σresult='false' (Outcomes))

Jane'

'Mehdi'

32 of 59

Query 4: Who has a disease they have tested both positively & negatively for?

32

(πpname ((Diagnosis ⨝ ToDiagnose) ⨝ σresult='true' Outcomes)) ∩ (πpname ((Diagnosis ⨝ ToDiagnose) ⨝ σresult='false' Outcomes))

Jane'

'Mehdi'

Why will simply taking intersection of the previous two queries result in wrong results?

33 of 59

Query 4: Who has a disease they have tested both positively & negatively for?

33

(πpname ((Diagnosis ⨝ ToDiagnose) ⨝ σresult='true' Outcomes)) ∩ (πpname ((Diagnosis ⨝ ToDiagnose) ⨝ σresult='false' Outcomes))

Jane'

'Mehdi'

Why will simply taking intersection of the previous two queries result in wrong results?

A person could be in the answer if he/she tested positively for one disease and negatively for another!

34 of 59

Query 4: Who has a disease they have tested both positively & negatively for? - Take 2!

34

Jane'

'Mehdi'

pos = πpname,disease ((Diagnosis ⨝ ToDiagnose) ⨝ σresult='true' Outcomes)

neg = (πpname,disease ((Diagnosis ⨝ ToDiagnose) ⨝ σresult='false' Outcomes))

πpname (pos ∩ neg)

Turns out the results did not change between the wrong & right version of the query. But they could have. Play around with the database by inserting perhaps more people into diagnosis until the answers change

35 of 59

Query 5: Who tested both positively and negatively for a disease, whether or not they have it…..

35

35

ToDiagnose

Disease

Test

Strep

Sterp

A

B

Outcome

Patient

Test

Outc

Mary

A

T

Mary has been tested for strep positive outcome

A is a test for Strep

Mary tested for Strep negative outcome

B is a test for strep

Notice: Query ONLY needs Outcome and not Diagnosis Table!!

Outcome

Patient

Test

Outc

Mary

B

F

ToDiagnose

Disease

Test

Strep

Sterp

A

B

intersection

36 of 59

Query 5: Who tested both positively and negatively for a disease, whether or not they have it…..

36

pos = π pname,disease (ToDiagnose ⨝ σ result='true' Outcomes)

neg = π pname,disease (ToDiagnose ⨝ σ result='false' Outcomes)

πpname (pos ∩ neg)

Notice: Query ONLY needs Outcome and not Diagnosis Table!!

'Jane'

'Mehdi'

37 of 59

Query 6: Who tested both positively and negatively for the same test….

We need to join Outcomes table with itself and check for the same person having the same test once with result true and another time with false.

37

Outcome

Patient

Test

Outc

Mary

A

T

Outcome

Patient

Test

Outc

Mary

A

F

38 of 59

Query 6: Who tested both positively and negatively for the same test….

38

π Outcomes.pname ((Outcomes) ⨝Outcomes.pname = Outcomes1.pname AND Outcomes.test = Outcomes1.test AND Outcomes.result ≠ Outcomes1.result (ρ Outcomes1 (Outcomes)))

'Mehdi'

39 of 59

Query 7: What testable disease does no one have?

testable disease does no one have?

39

HINT HINT……think set difference…

Let us find

  1. names of all testable diseases → project diseases from ToDiagnose
  2. Find names of diseases people have.. Project diseases from Diagnosis

Then subtract b) from a)

40 of 59

Query 7: What testable disease does no one have?

40

πdisease(ToDiagnose) - πdisease(Diagnosis)

41 of 59

Query 8: What disease more than one person have?

41

Diagnosis

Patient

disease

date

Mary

flu

….

Diagnosis

Patient

disease

date

John

flu

….

Same disease

Different names

42 of 59

Query 8: What disease more than one person have?

42

πDiagnosis.disease ((Diagnosis) ⨝Diagnosis.pname≠Diagnosis2.pname AND Diagnosis.disease=Diagnosis2.disease (ρDiagnosis2 Diagnosis))

Diagnosis.disease

'COVID'

'Mono'

'Strep'

43 of 59

Query 9: What disease does everyone have?

Difficult to write query to answer directly! So think opposite….

Can we find diseases that not everyone has?

Let P be the set of people, and D the set of diseases……

Then P x D is the set of ALL possible people disease pairs.

Now we subtract from this set the table Diagnosis

- the result is a set of tuples, where each tuple represents a name of a person and the disease the person does NOT have.

NOTE: any disease that everyone has will NOT be in the table...

So if we subtract this table from the set of diseases, the result will be the set of disease EVERYONE has….

OK… we are now ready to express the above in relational algebra...

43

43

44 of 59

Query 9: What disease does everyone have?

AllDiseases = πdisease Diagnosis

AllPatients = πpname Diagnosis

DiseasesNotEveryoneHas= πdisease ((AllPatients x AllDiseases) - (πpname,disease Diagnosis))

DiseaseEveryonehas = AllDiseases - DiseasesNotEveryoneHas

DiseaseEveryonehas

44

Diagnosis

Diagnosis.disease

'Mono'

Result

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

'Mary'

'Flu'

2021-01-01

'Jane'

'COVID'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

'Bob'

'Mono'

2021-02-01

'Mary'

'Strep'

2021-02-16

'Mehdi'

'Meningitis'

2021-02-10

'Mary'

'Mono'

2021-01-01

'Jane'

'Mono'

2021-01-01

'Alex'

'Mono'

2021-01-03

45 of 59

Division Operator

  •  

45

 

 

46 of 59

Division

46

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Jane'

'COVID'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

'Bob'

'Mono'

2021-02-01

'Mary'

'Mono'

2021-01-01

'Jane'

'Mono'

2021-01-01

'Alex'

'Mono'

2021-01-03

Diagnosis

(πdisease, pname Diagnosis) ÷ (πpname Diagnosis)

Diagnosis.disease

'Mono'

Result

47 of 59

  • Motivation: “join” can lose information
  • E.g.: join of Patients and Diagnosis on name=pname loses info about Bob, Mehdi and Alex, since they do not join with other tuples
    • Called “dangling tuples

47

47

Outer Joins

  • Outer join: natural join, but use NULL values to fill in dangling tuples
  • Three types: “left”, “right”, or “full”

Patients

name

age

'Mary'

45

'Bob'

27

Diagnosis

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

48 of 59

Right Outer Join

48

48

Patients

Diagnosis

Pad null value for right dangling tuples

Patients × Diagnosis

name

age

'Mary'

45

'Bob'

27

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

Right Outer Join: Patients Patients.name=Diagnosis.pname Diagnosis

Patients.name

Patients.age

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

'Mary'

45

'Mary'

'Flu'

2021-01-01

'Mary'

45

'Mehdi'

'Mono'

2021-01-02

'Mary'

45

'Alex'

'Strep'

2021-01-03

'Bob'

27

'Mary'

'Flu'

2021-01-01

'Bob'

27

'Mehdi'

'Mono'

2021-01-02

'Bob'

27

'Alex'

'Strep'

2021-01-03

Patients.name

Patients.age

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

'Mary'

45

'Mary'

'Flu'

2021-01-01

NULL

NULL

'Mehdi'

'Mono'

2021-01-02

NULL

NULL

'Alex'

'Strep'

2021-01-03

49 of 59

Left Outer Join

49

49

Patients

Diagnosis

Pad null value for left dangling tuples

Patients × Diagnosis

name

age

'Mary'

45

'Bob'

27

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

Patients.name

Patients.age

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

'Mary'

45

'Mary'

'Flu'

2021-01-01

'Bob'

27

NULL

NULL

NULL

Left Outer Join: Patients Patients.name=Diagnosis.pname Diagnosis

Patients.name

Patients.age

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

'Mary'

45

'Mary'

'Flu'

2021-01-01

'Mary'

45

'Mehdi'

'Mono'

2021-01-02

'Mary'

45

'Alex'

'Strep'

2021-01-03

'Bob'

27

'Mary'

'Flu'

2021-01-01

'Bob'

27

'Mehdi'

'Mono'

2021-01-02

'Bob'

27

'Alex'

'Strep'

2021-01-03

50 of 59

Full Outer Join

50

50

Patients

Diagnosis

Pad null values for both left and right dangling tuples

Patients × Diagnosis

name

age

'Mary'

45

'Bob'

27

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

Full Outer Join: Patients Patients.name=Diagnosis.pname Diagnosis

Patients.name

Patients.age

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

'Mary'

45

'Mary'

'Flu'

2021-01-01

'Mary'

45

'Mehdi'

'Mono'

2021-01-02

'Mary'

45

'Alex'

'Strep'

2021-01-03

'Bob'

27

'Mary'

'Flu'

2021-01-01

'Bob'

27

'Mehdi'

'Mono'

2021-01-02

'Bob'

27

'Alex'

'Strep'

2021-01-03

'Alex'

'Strep'

2021-01-03

Patients.name

Patients.age

Diagnosis.pname

Diagnosis.disease

Diagnosis.diagnosis_date

'Mary'

45

'Mary'

'Flu'

2021-01-01

'Bob'

27

NULL

NULL

NULL

NULL

NULL

'Mehdi'

'Mono'

2021-01-02

NULL

NULL

'Alex'

'Strep'

2021-01-03

Might not work on Relax

51 of 59

Bag Semantics

  • Set semantics: no duplicates
  • Bag semantics: duplicates possible, no order
    • SQL uses the bag semantics: a table could have duplicated tuples
    • Reason: efficient processing.
      • E.g., while doing a projection, no need to remove duplicates

51

51

A bag.

name

age

'Mary'

45

'Jane'

60

'Alex'

30

'Alex'

30

'Bob'

27

Patients

52 of 59

Bag Operations

  • Bag Union: Combine elements from two relations
    • E.g., {1,3,7} ∪{3,5,6}={1,3,3,5,6,7}

  • Bag intersection: take the minimum of the number of occurrences in each bag
    • E.g.: {1,1,2,2,3} ∩ {1,2,2,2,3,4}={1,2,2,3}

  • Bag difference: proper-subtract the number of occurrences in the two bags
    • E.g.: {1,1,1,2,2,3}-{1,2,3,4}={1,1,2}

52

52

53 of 59

Laws for Bags

  • Some laws for sets still hold for bags
    • E.g., union and intersection are still commutative and associative
    • R∩S = S∩R, (R∩S) ∩ T = R∩(S∩T)

  • Some laws for sets might not hold for bags:
    • R∩(S∪T) = (R∩S)∪ (R∩T) is true for sets
    • But it is not true for bags. E.g.:
      • R=S=T={1}
      • R∩(S∪T)={1}
      • (R∩S)∪ (R∩T)={1,1}

53

53

54 of 59

Extended Relational Algebra

  • By default: the relational algebra is using the set semantics

  • But when necessary, we can use the bag semantics (need to be specified explicitly)

  • More operations needed:
    • Duplicate-elimination operator δ
    • Sorting operator τ
    • Grouping and aggregation operator γ

54

54

55 of 59

Duplicate elimination δ

  • δ(R) = relation with one copy of each tuple that appears one or more times in R

55

55

name

age

'Mary'

45

'Jane'

60

'Alex'

30

'Alex'

30

'Bob'

27

Patients

name

age

'Mary'

45

'Jane'

60

'Alex'

30

'Bob'

27

δ (Patients)

56 of 59

Sorting τ

  • τL(R) = sort the records in R according to attributes on list L

56

56

τname (Patients)

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

Patients

name

age

'Alex'

30

'Bob'

27

'Jane'

60

'Mary'

45

'Mehdi'

33

57 of 59

Aggregation γ

57

57

γ SUM(age)→sum_age (Patients)

γ AVG(age)→avg_age (Patients)

γ COUNT(*)→count (Patients)

γ MAX(age)→max_age (Patients)

γ MIN(age)→min_age (Patients)

  • γθ(A)(R) : do aggregation θ on attribute A
  • θ can be COUNT, SUM, AVG, MAX, or MIN.

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

Patients

5

195

39

60

27

58 of 59

Grouping and aggregation γ

  • Group R according to attributes on list L γL,θ(A) (R)
  • Within each group, do aggregation θ(A)

58

58

πdisease,avg_age (γdisease; AVG(age)→avg_age (Patients ⨝name=pname Diagnosis))

Group tuples according to “disease”

Then do the aggregation within each group.

name

age

'Mary'

45

'Jane'

60

'Mehdi'

33

'Alex'

30

'Bob'

27

pname

disease

diagnosis_date

'Mary'

'Flu'

2021-01-01

'Jane'

'COVID'

2021-01-01

'Mehdi'

'Mono'

2021-01-02

'Alex'

'Strep'

2021-01-03

'Bob'

'Mono'

2021-02-01

'John'

'COVID'

2021-02-10

'Al'

'Strep'

2021-02-16

'Mary'

'Strep'

2021-02-16

'Mehdi'

'Meningitis'

2021-02-10

Patients

Diagnosis

name

age

disease

diagnosis_date

'Mary'

45

'Flu'

2021-01-01

'Alex'

'Mary'

30

45

'Strep'

'Strep'

2021-01-03

2021-02-16

'Jane'

'John'

60

60

'COVID'

'COVID'

2021-01-01

2021-02-10

'Bob'

'Mehdi'

27

33

'Mono'

'Mono'

2021-02-01

2021-01-02

'Mehdi'

33

'Meningitis'

2021-02-10

Diagnosis.disease

avg_age

'Flu'

45

'Strep'

37.5

'COVID'

60

'Mono'

30

'Meningitis'

33

59 of 59

Limitation of Relational Algebra

  • Some queries cannot be represented

  • Example, recursive queries:
    • Table R(Parent,Child)
    • How to find all the ancestors of “Tom”?
    • Impossible to write this query in relational algebra

  • More expressive languages needed:
    • E.g., Datalog

59

59