1 of 29

CSE 344: Section 4

Relational Algebra, Subqueries (again)

Oct 17th, 2024

2 of 29

Announcements

  • Homework 3:
    • Due at 11:00 pm on Wednesday, October 23
    • Autograder might run slower or even time out near deadlines - submit early
    • If you suddenly aren’t able to login to your query editor or run queries, your credits might have run out. Post on Ed so we can figure out why and fix it.
  • Midterm
    • Scheduled for 9:30-10:20 am on Friday, October 25 (during lecture)
    • Past exams can be found on the website
  • Questions?

3 of 29

Relational Algebra

4 of 29

RA Operators

Standard:

⋂ - Intersect

⋃ - Union

- Difference

σ - Select

π - Project

⍴ - Rename

Extended:

δ - Duplicate Elim.

ɣ - Group/Agg.

τ - Sorting

Joins:

⨝ - Nat. Join

⟕ - L.O. Join

⟖ - R.O. Join

⟗ - F.O. Join

✕- Cross Product

4

5 of 29

Ɣ Notation

Grouping and aggregation on group:

ɣattr_1, …, attr_k, count/sum/max/min(attr) -> alias

Aggregation on the entire table:

ɣcount/sum/max/min(attr) -> alias

5

6 of 29

Format

  • Follows FJWGHOS structure

6

7 of 29

Query Plans (Example SQL -> RA)

Select-Join-Project structure

Make this SQL query into RA (remember FWGHOS):

SELECT R.b, T.c, max(T.a) AS T_max

FROM Table_R AS R, Table_T AS T

WHERE R.b = T.b

GROUP BY R.b, T.c

HAVING max(T.a) > 99

7

8 of 29

Query Plans (Example SQL -> RA)

Select-Join-Project structure

Make this SQL query into RA (remember FJWGHOS):

SELECT R.b, T.c, max(T.a) AS T_max

FROM Table_R AS R, Table_T AS T

WHERE R.b = T.b

GROUP BY R.b, T.c

HAVING max(T.a) > 99

πR.b, T.c, T_max

|

σT_max > 99

|

ɣR.b, T.c, max(T.a)->T_max

|

R.b=T.b

/ \

R T

8

9 of 29

Nested Queries Review!

10 of 29

Nested Queries

  • Queries inside other queries
    • Usually simplifies or factors out part of the outer query
    • Use case is similar to helper methods
  • Can go in multiple places!
    • SELECT
    • FROM
    • WHERE/HAVING
    • ... basically anywhere we use a table

11 of 29

Nested Queries

  • Avoid when possible
  • Danger of making simple queries slow and complicated
  • Just because you can do it, doesn’t mean you should

11

12 of 29

Subquery in SELECT

12

SELECT DISTINCT C.cname, (SELECT COUNT(*)

FROM Product P

WHERE P.cid = C.cid)

FROM Company C;

13 of 29

Subquery in SELECT

Unnest using JOIN and GROUP BY

13

SELECT C.cname, count(P.cid)

FROM Company C LEFT OUTER JOIN

Product P ON C.cid = P.cid

GROUP BY C.cname;

14 of 29

Nested Queries

  • If the SQL subquery returns exactly one value it can be used where we use a field
    • “one value” = one tuple with one attribute
  • Otherwise, a SQL subquery can be thought of as an "extra table"
    • Can name the table, its columns, etc

14

15 of 29

Subquery in FROM

15

SELECT X.pname

FROM (SELECT *

FROM Product

WHERE price > 20) AS X

WHERE X.price < 500;

More readable: WITH <name> AS (<subquery>)

16 of 29

Subquery in FROM

16

WITH price_more_20 AS (

SELECT *

FROM Product

WHERE price > 20

)

SELECT X.pname

FROM price_more_20 AS X

WHERE X.price < 500;

17 of 29

Subquery in FROM

Unnest using WHERE

17

SELECT X.pname

FROM Product AS X

WHERE X.price < 500 AND X.price > 20;

18 of 29

Subquery in WHERE (example 1)

18

SELECT DISTINCT C.cname

FROM Company C

WHERE EXISTS (SELECT *

FROM Product P

WHERE C.cid = P.cid AND P.price < 200);

19 of 29

Subquery in WHERE (example 1)

19

SELECT DISTINCT C.cname

FROM Company C, Product P

WHERE C.cid = P.cid AND P.price < 200;

20 of 29

Subquery in WHERE (example 2)

20

SELECT S1.major, S1.num_of_students, S1.school_name

FROM Schools S1

WHERE S1.num_of_students >= ALL (SELECT S2.num_of_students

FROM Schools S2

WHERE S1.school_name =

S2.school_name);

21 of 29

Subquery in WHERE (example 2)

21

SELECT S1.major, S1.num_of_students, S2.school_name

FROM Schools S1, Schools S2

WHERE S1.school_name = S2.school_name

GROUP BY S1.major, S1.num_of_students, S2.school_name

HAVING S1.num_of_students = MAX(S2.num_of_students);

22 of 29

Subquery in WHERE Syntax

  • SELECT ……… WHERE EXISTS (<sub>);
  • SELECT ……… WHERE NOT EXISTS (<sub>);
  • SELECT ……… WHERE attribute IN (<sub>);
  • SELECT ……… WHERE attribute NOT IN (<sub>);
  • SELECT ……… WHERE attribute > ANY (<sub>);
  • SELECT ……… WHERE attribute > ALL (<sub>);

22

23 of 29

Witnessing (i.e. argmax)

Find the student(s) who is taking the most classes.

Student(stu_id, id_num)

Enrolled(id_num, class)

SELECT S.stu_id

FROM Student S, Enrolled E

WHERE S.id_num = E.id_num

GROUP BY S.stu_id

HAVING count(E.class) >= ALL(

SELECT count(E1.class)

FROM Enrolled E1

GROUP BY E1.id_num);

23

johndoe

973

maryjane

712

alsmith

899

973

CSE 311

973

CSE 344

712

CSE 311

899

CSE 351

24 of 29

Witnessing (i.e. argmax) (another way)

24

WITH class_counts AS (SELECT COUNT(E1.class) AS cnt

FROM Enrolled E1

GROUP BY E1.id_num),

max_counts AS (SELECT MAX(cnt) AS max FROM class_counts)

SELECT S.stu_id

FROM Students S, Enrolled E, max_counts Emax

WHERE S.id_num = E.id_num

GROUP BY S.stu_id, Emax.max

HAVING COUNT(E.class) = Emax.max;

25 of 29

To Nest or Not to Nest

  • Not an exact science
  • Figuring out what is actually wanted will help you find simpler solutions (best way is to practice)
  • Trigger words to use sub-querying
    • Every, All (universal quantifiers)
    • No, None, Never (negation)
    • Only

25

26 of 29

Monotonic Query

Query that does not lose any output tuples with the addition of new tuples in the database.

Theorem/helpful hint: If a query is a SELECT-FROM-WHERE query without nested subqueries or aggregates then it is monotone.

Consequence: If a query is not monotonic, then we cannot write it as a SELECT-FROM-WHERE query without nested subqueries or aggregates.

27 of 29

(Non-)monotonic Queries

  • “Can we take back outputs by looking at more data?”
  • Is this a monotonic query?

SELECT count(*)

FROM T1

GROUP BY T1.attr

27

28 of 29

(Non-)monotonic Queries

  • “Can we take back outputs by looking at more data?”
  • Is this a monotonic query?

SELECT count(*)

FROM T1

GROUP BY T1.attr

No! This query does not satisfy set containment.

Ex:

Current output: {(6), (23), (10)}

After more data: {(6), (23), (11)}

{(6), (23), (10)} ⊄ {(6), (23), (11)}

28

29 of 29

Worksheet