1 of 32

SQL - Data Query Language - II

Mark Fontenot, PhD

Northeastern University

1

2 of 32

Joins

2

3 of 32

Level 0: Simple Joins

Use JOIN keyword to combine data from two tables.

Must provide ON or USING

(or you just get the cartesian product)

3

SELECT <list of columns>

FROM <table name> [JOIN <table name>

ON <condition>]

[WHERE <condition list>]

[ORDER BY <column-order list>];

General Form

4 of 32

Level 0: Simple Joins

- Provide a list of all product names with the name of the category each belongs to.

4

SELECT ProductName, CategoryName

FROM Products JOIN Categories

ON Products.CategoryID = Categories.CategoryID;

Example:

the JOIN Condition

without it, you get Cartesian product

5 of 32

Level 1: Natural Joins

NATURAL JOIN will join on attributes with the same names.

- adds an implicit ON clause.

5

SELECT <list of columns>

FROM <table name> NATURAL JOIN <table name>

[WHERE <condition list>]

[ORDER BY <column-order list>];

General Form

6 of 32

Level 1: Natural Joins

- Provide a list of all product names with the name of the category each belongs to.

6

SELECT ProductName, CategoryName

FROM Products NATURAL JOIN Categories

Example:

If no matching attribute names, you get the Cartesian product.

7 of 32

Aside - the soft Join

FROM clause provides comma-separated list of tables

WHERE clause provides a predicate for each join.

This style mixes the filtering of rows of tables with the join condition… can get a little messy.

Stick with the syntax on the previous slides.

7

SELECT ProductName, CategoryName

FROM Products, Categories

WHERE Products.CategoryID = Categories.CategoryID;

Example:

8 of 32

Your Turn

1. Which cities in either Ireland or Poland did Michael Suyama (an employee) ship items to customers?

8

9 of 32

Your Turn

2a. Which products have suppliers in Germany?

2b. Which, if any, of those products from Germany are on backorder?

9

10 of 32

Your Turn

3. Provide a list of all suppliers that Northwind purchases products in the Confections category from.

10

11 of 32

Outer Join

The joins we’ve seen so far (also sometimes called Inner Joins) retain only rows from the Cartesian product where the join condition is true.

An Outer Join includes everything from the inner join, but also includes one or both of the following:

  • all rows from the join’s left operand dataset that doesn’t have a match (left outer join)
  • all rows from the join’s right operand dataset that doesn’t have a match (right outer join)

11

12 of 32

Left Outer Join - Intuition

12

sID

Name

112

Mark

234

Sam

643

Rohit

sID

cID

Grade

112

cs3200

A

112

cs2500

B-

643

cs1800

A-

643

ds2000

A-

Student

Grades

Student LEFT OUTER JOIN Grades

ON Student.sID = Grades.sID

Student.sID

Name

Grades.sID

cID

Grade

112

Mark

112

cs3200

A

112

Mark

112

cs2500

B-

643

Rohit

643

cs1800

A-

643

Rohit

643

ds2000

A-

234

Sam

null

null

null

13 of 32

Full Outer Join - Intuition

13

p

q

1

ball

2

bat

3

buzz

r

s

2

cat

3

dog

4

fish

A

B

A FULL OUTER JOIN B

ON A.p = B.r

p

q

r

s

1

ball

Null

Null

2

bat

2

cat

3

buzz

3

dog

Null

Null

4

fish

14 of 32

Thinking Through Joins

15 of 32

Thinking Through Joins

Northwind has been having some problems with Speedy Express shipping company… they don’t seem to be too speedy. Provide a list of all orders shipped via Speedy Express and include the order id, customer name, order date, and shipped date for all orders in 2022.

15

Maybe you think to yourself… “Where do I start?”

16 of 32

Thinking Through Joins

Northwind has been having some problems with Speedy Express shipping company… they don’t seem to be too speedy. Provide a list of all orders shipped via Speedy Express and include the order id, customer name, order date, and shipped date for all orders in 2022.

16

1. Think about what columns are needed in the final result.

17 of 32

Thinking Through Joins

Northwind has been having some problems with Speedy Express shipping company… they don’t seem to be too speedy. Provide a list of all orders shipped via Speedy Express and include the order id, customer name, order date, and shipped date for all orders in 2022.

17

2. What tables are those attributes in?

- Minimally, your FROM clause will need to include those tables.

- Are there other tables needed that link the above?

Tables:�- Shippers

- Orders

- Customers

18 of 32

Thinking Through Joins

18

3. What join conditions do I need for the tables?

- Can I natural join all of them?

- If no, what attributes link the different pairs of tables?

- Since Shippers and Orders do not have a common attribute name, we can’t use a Natural Join.

- Must use a JOIN … ON

19 of 32

Thinking Through Joins

Northwind has been having some problems with Speedy Express shipping company… they don’t seem to be too speedy. Provide a list of all orders shipped via Speedy Express and include the order id, customer name, order date, and shipped date for all orders in 2022.

19

4. What, if any, filters are needed for the rows?

20 of 32

Thinking Through Joins – Putting it all Together

20

1. Think about what columns are needed in the final result.

2. What tables are those attributes in?

3. What join conditions do I need for the tables?

4. What, if any, filters are needed for the rows?

SELECT o.orderid, c.CompanyName, o.OrderDate, o.ShippedDate

FROM Shippers s JOIN Orders o ON s.shipperid = o.shipvia

JOIN Customers c ON o.customerid = c.customerid

WHERE s.CompanyName = ‘Speedy Express’

AND o.OrderDate >= ‘2022-01-01’

21 of 32

What’s Wrong With the Following?

Northwind has been having some problems with Speedy Express shipping company… they don’t seem to be too speedy. Provide a list of all orders shipped via Speedy Express and include the order id, customer name, order date, and shipped date for all orders in 2022.

21

SELECT o.orderid,

c.CompanyName,

o.OrderDate,

o.ShippedDate

FROM shippers s

JOIN orders o ON s.shipperid = o.shipvia

NATURAL JOIN customers c

WHERE s.companyName = 'Speedy Express'

AND o.orderdate >= '2022-01-01';

22 of 32

GROUP BY and HAVING Clauses

23 of 32

Applying Aggregates to Groups

- The GROUP BY clause allows you to define subgroups for the application of an aggregate function.

- The GROUP BY list should be a subset of the SELECT attributes.

23

SELECT [DISTINCT] <attribute list>

FROM <table list>

[WHERE <condition list>]

[GROUP BY <attribute list>]

[ORDER BY <attribute-order list>];

24 of 32

Group By -

Intuition

Problem: Provide the average GPA for all students in each year of their education.

24

sID

Name

Year

GPA

121

Rohit

2

3.8

134

Jack

1

3.7

231

Sam

1

3.6

114

Sal

2

3.5

153

Alex

1

3.5

25 of 32

Group By -

Intuition

Problem: Provide the average GPA for all students in each year of their education.

GROUP BY Year

25

sID

Name

Year

GPA

121

Rohit

2

3.8

134

Jack

1

3.7

231

Sam

1

3.6

114

Sal

2

3.5

153

Alex

1

3.5

sID

Name

Year

GPA

153

Alex

1

3.5

134

Jack

1

3.7

231

Sam

1

3.6

114

Sal

2

3.5

121

Rohit

2

3.8

Average these vals

Average these vals…

26 of 32

Group By -

Intuition

Problem: Provide the average GPA for all students in each year of their education.

SELECT year, AVG(GPA) as avg_gpa

FROM students

GROUP BY year;

26

sID

Name

Year

GPA

121

Rohit

2

3.8

134

Jack

1

3.7

231

Sam

1

3.6

114

Sal

2

3.5

153

Alex

1

3.5

27 of 32

Sample Group By

Determine the average price of products from each of our suppliers.

27

28 of 32

You can apply multiple agg functions to a group in 1 stmt.

For each category of products, give the name of the product category, min product unit price, max product unit price, and average product unit price.

28

SELECT A, MAX(B), MIN(B)

FROM table

GROUP BY A;

SELECT categoryName,

min(unitPrice) AS minPrice,

max(unitPrice) AS maxPrice,

avg(unitPrice) AS avgPrice

FROM Categories C JOIN Products P

ON C.categoryID = P.categoryID

GROUP BY categoryName;

29 of 32

HAVING Clause

- used in conjunction with GROUP BY

- a set of boolean expressions that are evaluated for each group

- If true, group is returned. If false, group is discarded.

29

Provide a list of all regions of the world where Northwind has more than 10 customers.

SELECT region,

count(customerid) AS numCustomers

FROM Customers

GROUP BY region

HAVING count(customerid) >= 10;

30 of 32

HAVING

30

SELECT <list of columns>

FROM <table name> [JOIN <table name> � ON <join condition> [...]]

[WHERE <condition list>]

[GROUP BY <attribute list>]

[HAVING <group conditions>]

[ORDER BY <column [ASC|DESC]>,[...]]

General�Form

31 of 32

Challenge Problem

Provide a list of all customers whose average order total is below the average for all companies in their same region.

31

32 of 32

Review Questions

  1. What is SQL?
  2. What is DML? DDL? DQL?
  3. How do you return all columns from a table?
  4. How do you return a subset of columns from a table?
  5. How do you return a subset of rows from a table?
  6. How do you return only unique rows from a query?
  7. How do you combine data from two tables?
  8. How do you perform basic aggregation on data from a table?
  9. How do you perform perform basic aggregation on data from a table based on category or group?
  10. How do you limit the results of basic grouped data aggregation?
  11. How do you dictate the order in which rows will be returned from a query?