1 of 30

CSE 414: Section 3

(Un)Nesting Queries

October 13th, 2022

1

2 of 30

Administrivia

  • HW2 due October 17th @ 11pm
  • HW1 grades out soon
  • Using Microsoft Azure for HW3
    • Next week we will do an Azure setup demo in section

2

3 of 30

Where we started

FWS

(From, Where, Select)

3

4 of 30

And now...

FWGHOSTM

(From, Where, Group By, Having, Order By, Select)

4

5 of 30

Group By

  • Powerful tool to handle “categories”
    • Groups rows with the same value of an attribute into a “bucket” (think dividing into categories)
  • Careful when selecting
    • Only SELECT attributes in GROUP BY or aggregates
    • SQLite will guess (arbitrarily pick a value)¯\_(ツ)_/¯
    • SQL Server will throw an error ง •̀_•́)ง

5

6 of 30

Aggregates

  • Computes summary values for a set of tuples.

COUNT(attribute) - counts the number of tuples

SUM(attribute) - sums the value of the attribute among all tuples in set

MIN/MAX(attribute) - min/max value of the attribute among all tuples in the set

AVG(attribute) - avg value of the attribute among all tuples in the set

...

6

7 of 30

Group By - Examples

Do these queries work?

Enrolled(stu_id, course_num)

SELECT stu_id, course_num

FROM Enrolled

GROUP BY stu_id

SELECT stu_id, count(course_num)

FROM Enrolled

GROUP BY stu_id

7

johndoe

311

johndoe

344

maryjane

311

maryjane

351

maryjane

369

8 of 30

Group By - Examples

Do these queries work?

Enrolled(stu_id, course_num)

SELECT stu_id, course_num

FROM Enrolled

GROUP BY stu_id

SELECT stu_id, count(course_num)

FROM Enrolled

GROUP BY stu_id

8

johndoe

?

maryjane

?

9 of 30

Group By - Examples

Do these queries work?

Enrolled(stu_id, course_num)

SELECT stu_id, course_num

FROM Enrolled

GROUP BY stu_id

SELECT stu_id, count(course_num)

FROM Enrolled

GROUP BY stu_id

9

johndoe

2

maryjane

3

johndoe

311

johndoe

344

maryjane

311

maryjane

351

maryjane

369

10 of 30

Grouping

GROUP BY [attribute], …, [attribute_n]

HAVING [predicate] - operates on groups, chooses to keep or remove the entire group

10

11 of 30

WHY SUBQUERIES?

11

12 of 30

RECAP: THE WITNESSING PROBLEM

12

13 of 30

Difference?

13

14 of 30

Nested Queries

  • Avoid when possible
  • Danger of making simple queries slow and complicated
  • Just because you can do it, doesn’t mean you should

14

15 of 30

Subquery in SELECT

SELECT DISTINCT C.cname, (SELECT count(*)

FROM Product P

WHERE P.cid=C.cid)

FROM Company C

  • Intuition: generate a value associated to tuple

15

16 of 30

Subquery in SELECT

Unnest using JOIN and GROUP BY

SELECT C.cname, count(P.cid)

FROM Company C LEFT OUTER JOIN

Product P ON C.cid = P.cid

GROUP BY C.cname;

16

17 of 30

Subquery in FROM

SELECT X.pname � FROM (SELECT *

FROM Product

WHERE price > 20) AS X

WHERE X.price < 500

More readable: WITH <name> AS (<subquery>)

17

18 of 30

Subquery in FROM

Unnest using WHERE

SELECT X.pname � FROM Product AS X

WHERE X.price < 500 AND X.price > 20;

18

19 of 30

Subquery in WHERE

SELECT DISTINCT C.cname

FROM Company C

WHERE EXISTS (SELECT *� FROM Product P� WHERE C.cid = P.cid AND P.price < 200)

19

20 of 30

Subquery in WHERE

SELECT DISTINCT C.cname

FROM Company C, Product P

WHERE C.cid = P.cid AND P.price < 200

20

21 of 30

Subquery in WHERE Syntax

  • SELECT ……… WHERE EXISTS (<sub>);
  • SELECT ……… WHERE NOT EXISTS (<sub>);
  • SELECT ……… WHERE attribute IN (<sub>);
  • SELECT ……… WHERE attribute NOT IN (<sub>);
  • SELECT ……… WHERE attribute > ANY (<sub>);
  • SELECT ……… WHERE attribute > ALL (<sub>);

21

22 of 30

To Nest or Not to Nest

  • Not an exact science
  • Figuring out what is actually wanted will help you find simpler solutions (best way is to practice)
  • Trigger words to use sub-querying
    • Every, All (universal quantifiers)
    • No, None, Never (negation)
    • Only

22

23 of 30

Monotonicity

23

24 of 30

Monotonicity

Definition: A query Q is monotone if whenever we add tuples to one or more input tables, the answer to the query will not lose any existing output tuples

Theorem: If Q is a SELECT-FROM-WHERE query that does not have subqueries or aggregates, then it is monotone

24

25 of 30

(Non-)monotonic Queries

  • “Can we take back outputs by looking at more data?”
  • Is this a monotonic query?

SELECT count(*)

FROM T1

GROUP BY T1.attr

25

26 of 30

(Non-)monotonic Queries

  • “Can we take back outputs by looking at more data?”
  • Is this a monotonic query?

SELECT count(*)

FROM T1

GROUP BY T1.attr

No! This query does not satisfy set containment.

Ex:

Current output: {(6), (23), (10)}

After more data: {(6), (23), (11)}

{(6), (23), (10)} ⊄ {(6), (23), (11)}

26

27 of 30

(Non-)monotonic Queries

  • “Can we take back outputs by looking at more data?”
  • Is this a monotonic query?

SELECT count(*)

FROM T1

GROUP BY T1.attr

No! This query does not satisfy set containment.

Ex:

Current output: {(6), (23), (10)}

After more data: {(6), (23), (11)}

{(6), (23), (10)} ⊄ {(6), (23), (11)}

27

Aggregates are generally sensitive to new tuples

28 of 30

Consequences of Monotonicity

Theorem: If a query is monotonic, it implies that a nested query can actually be unnested.

28

29 of 30

(AGAIN) Witnessing (i.e. argmax)

Find the student who is taking the most classes.

Student(stu_id, id_num)

Enrolled(id_num, class)

SELECT S.stu_id

FROM Student S, Enrolled E

WHERE S.id_num = E.id_num

GROUP BY S.stu_id

HAVING count(E.class) >= ALL(

SELECT count(E1.class)

FROM Enrolled E1

GROUP BY E1.id_num);

29

johndoe

973

maryjane

712

alsmith

899

973

CSE 311

973

CSE 344

712

CSE 311

899

CSE 351

30 of 30

Witnessing (i.e. argmax)

Find the student who is taking the most classes.

Student(stu_id, id_num)

Enrolled(id_num, class)

SELECT S.stu_id

FROM Student S, Enrolled E

WHERE S.id_num = E.id_num

GROUP BY S.stu_id

HAVING COUNT(E.class) = (

SELECT MAX(C) FROM (

SELECT COUNT(E1.class) AS C

FROM Enrolled E1

GROUP BY E1.id_num));

30

johndoe

973

maryjane

712

alsmith

899

973

CSE 311

973

CSE 344

712

CSE 311

899

CSE 351

Alternative to “ALL”