1 of 25

CSE 414: Section 3

Azure Setup and (Un)Nesting

January 23rd, 2020

1

2 of 25

Administrivia

  • HW 3 due January 28th 2020
  • Start early and come to us with azure problems!

2

3 of 25

Group By

  • Powerful tool to handle “categories”
    • Treat rows with a same attribute as a category
  • Careful when selecting
    • Only select attributes appeared in GROUP BY or aggregates
    • SQLite will guess (arbitrarily pick a value)¯\_(ツ)_/¯
    • SQL Server will throw an error ง •̀_•́)ง

3

4 of 25

Group By - Examples

Do these queries work?

Enrolled(stu_id, course_num)

SELECT stu_id, course_num

FROM Enrolled

GROUP BY stu_id

SELECT stu_id, count(course_num)

FROM Enrolled

GROUP BY stu_id

4

johndoe

311

johndoe

344

maryjane

311

maryjane

351

maryjane

369

5 of 25

Group By - Examples

Do these queries work?

Enrolled(stu_id, course_num)

SELECT stu_id, course_num

FROM Enrolled

GROUP BY stu_id

SELECT stu_id, count(course_num)

FROM Enrolled

GROUP BY stu_id

5

johndoe

?

maryjane

?

6 of 25

Group By - Examples

Do these queries work?

Enrolled(stu_id, course_num)

SELECT stu_id, course_num

FROM Enrolled

GROUP BY stu_id

SELECT stu_id, count(course_num)

FROM Enrolled

GROUP BY stu_id

6

johndoe

2

maryjane

3

johndoe

311

johndoe

344

maryjane

311

maryjane

351

maryjane

369

7 of 25

Azure Setup

7

8 of 25

Setup a New Database

  • Follow the steps on the hw3.md
    • Not perfect, but it guides you through the major steps
  • For importing the data check this piazza post, we had to create a new blob so some connection information is different
    • The hw3.md is updated on the upstream with the same information, use `git pull upstream master` to get the updated version
    • Make sure to run the `CREATE EXTERNAL …` command before running any `bulk import…` statements

8

9 of 25

Connecting to the Database

A few options:

  • Azure SQL Query Editor (won’t work, has a max timeout of 5 minutes per query and it doesn’t appear to be possible to increase it -- let us know if you find a way)
  • JetBrains DataGrip (download) follow instructions here to connect
    • If you sign up with a ‘.edu’ e-mail on JetBrains website, you can get a year long subscription
  • VSCode MS SQL Extension (checkout this piazza post or this article)

9

10 of 25

Start Early

  • You have until the 28th to do the homework…
  • Lots of time… but LOTS OF WORK
  • Harder queries, more complicated setup, more frustration
  • Can sometimes take awhile to get your Azure account setup, let us know if you don’t have one ASAP
    • The night before HW is due is much too late

10

11 of 25

(Un)nesting Queries

11

12 of 25

Witnessing (i.e. argmax)

Find the student who is taking the most classes.

Student(stu_id, id_num)

Enrolled(id_num, class)

SELECT S.stu_id

FROM Student S, Enrolled E

WHERE S.id_num = E.id_num

GROUP BY S.stu_id

HAVING COUNT(E.class) >= ALL(

SELECT COUNT(E1.class)

FROM Enrolled E1

GROUP BY E1.id_num);

12

johndoe

973

maryjane

712

alsmith

899

973

CSE 311

973

CSE 344

712

CSE 311

899

CSE 351

13 of 25

Witnessing (i.e. argmax)

Find the student who is taking the most classes.

Student(stu_id, id_num)

Enrolled(id_num, class)

SELECT S.stu_id

FROM Student S, Enrolled E

WHERE S.id_num = E.id_num

GROUP BY S.stu_id

HAVING COUNT(E.class) = (

SELECT MAX(C) FROM (

SELECT COUNT(E1.class) AS C

FROM Enrolled E1

GROUP BY E1.id_num));

13

johndoe

973

maryjane

712

alsmith

899

973

CSE 311

973

CSE 344

712

CSE 311

899

CSE 351

Alternative to “ALL”

14 of 25

Nested Queries

  • Avoid when possible
  • Danger of making simple queries slow and complicated
  • Just because you can do it, doesn’t mean you should

14

15 of 25

Subquery in SELECT

SELECT DISTINCT C.cname, (SELECT count(*)

FROM Product P

WHERE P.cid=C.cid)

FROM Company C

15

16 of 25

Subquery in SELECT

Unnest using JOIN and GROUP BY

SELECT C.cname, count(P.cid)

FROM Company C LEFT OUTER JOIN

Product P ON C.cid = P.cid

GROUP BY C.cname;

16

17 of 25

Subquery in FROM

SELECT X.pname � FROM (SELECT *

FROM Product

WHERE price > 20) AS X

WHERE X.price < 500

More readable: WITH <alias> AS (<subquery>)

17

18 of 25

Subquery in FROM

Unnest using WHERE

SELECT X.pname � FROM Product AS X

WHERE X.price < 500 AND X.price > 20;

18

19 of 25

Subquery in WHERE

SELECT DISTINCT C.cname

FROM Company C

WHERE EXISTS (SELECT *� FROM Product P� WHERE C.cid = P.cid AND P.price < 200)

19

20 of 25

Subquery in WHERE

SELECT DISTINCT C.cname

FROM Company C, Product P

WHERE C.cid = P.cid AND P.price < 200

20

21 of 25

Subquery in WHERE Syntax

  • SELECT ……… WHERE EXISTS (<sub>);
  • SELECT ……… WHERE NOT EXISTS (<sub>);
  • SELECT ……… WHERE attribute IN (<sub>);
  • SELECT ……… WHERE attribute NOT IN (<sub>);
  • SELECT ……… WHERE attribute > ANY (<sub>);
  • SELECT ……… WHERE attribute > ALL (<sub>);

21

22 of 25

To Nest or Not to Nest

  • Not an exact science
  • Figuring out what is actually wanted will help you find simpler solutions (best way is to practice)
  • Trigger words to use sub-querying
    • Every, All (universal quantifiers)
    • No, None, Never (negation)
    • Only

22

23 of 25

Monotonicity

Definition: A query Q is monotone if whenever we add tuples to one or more input tables, the answer to the query will not lose any output tuples

Theorem: If Q is a SELECT-FROM-WHERE query that does not have subqueries,and no aggregates, then it is monotone

23

24 of 25

(Non-)monotonic Queries

  • “Can we take back outputs by looking at more data?”
  • Is this a monotonic query?

SELECT count(*)

FROM T1

GROUP BY T1.attr

24

25 of 25

(Non-)monotonic Queries

  • “Can we take back outputs by looking at more data?”
  • Is this a monotonic query?

SELECT count(*)

FROM T1

GROUP BY T1.attr

No! This query does not satisfy set containment.

Ex:

Current output: {(6), (23), (10)}

After more data: {(6), (23), (11)}

{(6), (23), (10)} ⊄ {(6), (23), (11)}

25