1 of 56

CSE 344: Section 4

ER diagrams, BCNF decomposition

Oct 19th, 2023

2 of 56

Announcements

  • Homework 4:
    • Due at 10:00 pm on Tuesday, October 24th
    • Focus on ER diagrams and BCNF

3 of 56

Database Design

4 of 56

E/R diagram Basics

is a

Subclass

Attribute

Entity set

Relationship

Weak Entity

5 of 56

Subclass

CREATE TABLE Product (

pname INT PRIMARY KEY,

price INT)

CREATE TABLE Toy (

pname INT PRIMARY KEY REFERENCES Product,

age INT)

CREATE TABLE Candy (

pname INT PRIMARY KEY REFERENCES Product,

isChoc TEXT)

6 of 56

Weak Entity Set

7 of 56

Relationships

  • One-to-one
  • Many-to-one
  • Many-to-many

8 of 56

Integrity Constraints in ER Diagrams

Product

price

name

category

makes

makes

makes

Keys

Single Value Constraints (many-one) [possibly none]

Referential Integrity Constraints (many-one) [exactly one]

Other Constraints

>2

9 of 56

Examples of Relationships

  • one-one: ssn - UW student id
  • one-many: ssn - phone#
  • many-many: store - product
  • is-a: PC - computer and Mac - computer
  • has-a: city - country
    • What country does the city of Cambridge belong to?

Question: "at most one" vs. "exactly one"?

10 of 56

Worksheet Exercise 1:

Entity Relationship Diagrams

11 of 56

Worksheet Exercise 1: Entity Relationship Diagrams

Odegaard Library is in need of a new database, and they have asked you to help design it! Here are some of the requirements for what information needs to be stored in this database:

  • Each book has a unique ID, a title, an author, a genre, and a number of pages
  • Readers who visit the library have a unique email address, a first name, a last name, and an age
  • Readers can “check out” multiple books from the library at a time, and one book can be checked out multiple times. We should keep track of the day that each book was checked out
  • To make it easier to recommend books to readers, we should assign a recommended age for each genre
  1. Design an ER diagram for this new library database.

12 of 56

Worksheet Exercise 1: Entity Relationship Diagrams

  • Each book has a unique ID, a title, an author, a genre, and a number of pages

book

num_of_pages

id

title

author

13 of 56

Worksheet Exercise 1: Entity Relationship Diagrams

book

num_of_pages

id

title

author

  • Each book has a unique ID, a title, an author, a genre, and a number of pages
  • To make it easier to recommend books to readers, we should assign a recommended age for each genre

14 of 56

Worksheet Exercise 1: Entity Relationship Diagrams

book

num_of_pages

id

title

author

has_genre

genre

name

recommended_age

  • Each book has a unique ID, a title, an author, a genre, and a number of pages
  • To make it easier to recommend books to readers, we should assign a recommended age for each genre

15 of 56

Worksheet Exercise 1: Entity Relationship Diagrams

reader

email

first_name

last_name

age

  • Readers who visit the library have a unique email address, a first name, a last name, and an age
  • Readers can “check out” multiple books from the library at a time, and one book can be checked out multiple times. We should keep track of the day that each book was checked out

16 of 56

Worksheet Exercise 1: Entity Relationship Diagrams

  • Readers who visit the library have a unique email address, a first name, a last name, and an age
  • Readers can “check out” multiple books from the library at a time, and one book can be checked out multiple times. We should keep track of the day that each book was checked out

reader

email

first_name

last_name

age

book

num_of_pages

id

title

author

has_genre

genre

name

recommended_age

check_out

day_check_out

17 of 56

Worksheet Exercise 1: Entity Relationship Diagrams

2) Convert the ER diagram to a series of CREATE TABLE statements. Include primary key and foreign key statements.

reader

email

first_name

last_name

age

book

num_of_pages

id

title

author

has_genre

genre

name

recommended_age

check_out

day_check_out

18 of 56

Worksheet Exercise 1: Entity Relationship Diagrams

CREATE TABLE Checkouts (

day_check_out VARCHAR(10),

book_id INT REFERENCES Books,

reader_email VARCHAR(50) REFERENCES Readers,

PRIMARY KEY (book_id, reader_email, day_check_out)

);

CREATE TABLE Readers (

email VARCHAR(50) PRIMARY KEY,

first_name VARCHAR(20),

last_name VARCHAR(20),

age INT

);

CREATE TABLE Books (

id INT PRIMARY KEY,

title VARCHAR(100),

author VARCHAR(50),

genre VARCHAR(20) REFERENCES Genres,

num_pages INT

);

CREATE TABLE Genres (

name VARCHAR(20) PRIMARY KEY,

recommended_age INT

);

2) Convert the ER diagram to a series of CREATE TABLE statements. Include primary key and foreign key statements.

19 of 56

Functional Dependencies

20 of 56

Functional Dependency

A → B

Indicates that attribute A determines attribute B (B is dependent on A).

Some examples:

  • student_id → graduation_year
  • origin_city → origin_state, (population)
  • flight_id → carrier_id, origin_city, dest_city, price, …

21 of 56

Transitive Dependency

If A → B and B → C

then A → C

{A}+ = Closure of attribute A: Set of all attributes that can be determined by A.

{A}+ = {ABC}

22 of 56

Closure

23 of 56

Closure Definition

Goal:

  • “If we are given the attribute/set of attributes {A1, …, Am}, what attributes will we know from using those attributes and their functional dependencies.”

 

Closure

24 of 56

Closure Algorithm

Algorithm Steps:

1. Find some attribute(s) C to add to right side

2. Add them

3. Look back at the FDs to find more C

 

25 of 56

Closure Example

  • Suppose we have the functional dependencies ID → Name and Name → Job, what is the closure of ID?

  • Step 1:

To determine the closure of ID we start with trivial FD: ID → ID because if we are given ID, then we know ID

    • So we add ID to X = {ID}

26 of 56

Closure Example

  • Suppose we have the functional dependencies ID → Name and Name → Job, what is the closure of ID?

  • Step 2: (X from Step 1: X = {ID})
    • ID → Name
    • Since we already know ID (X currently contains ID), we can use this functional dependency to get Name
    • So we add Name to X = {ID, Name}

27 of 56

Closure Example

  • Suppose we have the functional dependencies ID → Name and Name → Job, what is the closure of ID?

  • Step 3: (X from Step 2: X = {ID, Name})
    • Name → Job
    • Since we already know Name (X currently contains Name), we can use this functional dependency to get Job
    • So we add Job to X = {ID, Name, Job}

28 of 56

Closure Example

  • Suppose we have the functional dependencies ID → Name and Name → Job, what is the closure of ID?

  • Step 4: (X from Step 3: X = {ID, Name, Job})
    • Since we have run out of functional dependencies, X will no longer change so the algorithm is done!

Formally, the closure of ID is written as {ID}+ = {ID, Name, Job}

29 of 56

Closures (quicker)

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • If we have ID → Name and Name → C, then ID → Job
  • So really, ID → Name, Job
  • Add ID to the RHS so ID → ID, Name, Job
  • Formally, the closure of ID is written as {ID}+ = {ID, Name, Job}

30 of 56

Keys

We call the attribute(s) that determines all other attributes in a schema to be a superkey.

Ex. for R(A, B, C), the set {A, B, C} determines all attributes in R

If no subset of a superkey is also a superkey, then that superkey is a minimal key or key for short. If a relation has multiple keys, each key is a candidate key

Ex. if AB → ABC and C → ABC, then {AB} and {C} are candidate keys for R

Super Key

Minimal Key

31 of 56

Boyce-Codd Normal Form (BCNF)

32 of 56

Motivation of BCNF

Recall the three anomalies:

  • Redundancy anomaly
  • Update anomaly
  • Deletion anomaly

If a relation is in BCNF, then it does not have these anomalies

If it is not in BCNF, then it does have these anomalies

33 of 56

BCNF Algorithm

 

 

34 of 56

Example: BCNF Decomposition

Restaurant(id,name,rating,popularity,rec)

  1. id -> name, rating
  2. rating -> popularity
  3. popularity -> rec?
  • Example: We know that popularity determines rec (ex: Okay = N, Respectable = N, Poppin = Y)
    • What if we want to update the rec so that a popularity = ‘Respectable’ has rec= ‘Y’
      • UPDATE Restaurant

SET rec? = ‘Y’

WHERE popularity = ‘Respectable’;

  1. How many tuples in the Restaurant table do we need to update? How can we reduce this number?

id

name

rating

popularity

rec?

1

Mee Sum Pastry

3

Okay

N

2

Café on the Ave

4

Respectable

N

3

Guanaco’s Tacos

4

Respectable

N

4

Aladdin Gyro-Cery

5

Poppin

Y

Restaurant

35 of 56

Example: BCNF Decomposition

Restaurant(id,name,rating,popularity,rec)

  • id -> name, rating
  • rating -> popularity
  • popularity -> rec?

rating+ = rating, popularity, rec?

id

name

rating

popularity

rec?

1

Mee Sum Pastry

3

Okay

N

2

Café on the Ave

4

Respectable

N

3

Guanaco’s Tacos

4

Respectable

N

4

Aladdin Gyro-Cery

5

Poppin

Y

Restaurant

36 of 56

Example: BCNF Decomposition

id

name

rating

popularity

rec?

1

Mee Sum Pastry

3

Okay

N

2

Café on the Ave

4

Respectable

N

3

Guanaco’s Tacos

4

Respectable

N

4

Aladdin Gyro-Cery

5

Poppin

Y

Restaurant(id,name,rating,popularity,rec)

  1. id -> name, rating
  2. rating -> popularity
  3. popularity -> rec?

Restaurant

rating+ = rating, popularity, rec

R1(rating,popularity,rec?)

R2(rating,id,name)

rating

popularity

rec?

3

Okay

N

4

Respectable

N

5

Poppin

Y

id

name

rating

1

Mee Sum Pastry

3

2

Café on the Ave

4

3

Guanaco’s Tacos

4

4

Aladdin Gyro-Cery

5

R2

R1

37 of 56

Example: BCNF Decomposition

id

name

rating

popularity

rec?

1

Mee Sum Pastry

3

Okay

N

2

Café on the Ave

4

Respectable

N

3

Guanaco’s Tacos

4

Respectable

N

4

Aladdin Gyro-Cery

5

Poppin

Y

Restaurant(id,name,rating,popularity,rec)

  1. id -> name, rating
  2. rating -> popularity
  3. popularity -> rec?

Restaurant

rating+ = rating, popularity, rec

R1(rating,popularity,rec?)

R2(rating,id,name)

popularity+ = popularity, rec?

id

name

rating

1

Mee Sum Pastry

3

2

Café on the Ave

4

3

Guanaco’s Tacos

4

4

Aladdin Gyro-Cery

5

R2

38 of 56

Example: BCNF Decomposition

id

name

rating

popularity

rec?

1

Mee Sum Pastry

3

Okay

N

2

Café on the Ave

4

Respectable

N

3

Guanaco’s Tacos

4

Respectable

N

4

Aladdin Gyro-Cery

5

Poppin

Y

Restaurant(id,name,rating,popularity,rec)

  1. id -> name, rating
  2. rating -> popularity
  3. popularity -> rec?

Restaurant

rating+ = rating, popularity, rec

R1(rating,popularity,rec?)

R2(rating,id,name)

popularity+ = popularity, rec?

id

name

rating

1

Mee Sum Pastry

3

2

Café on the Ave

4

3

Guanaco’s Tacos

4

4

Aladdin Gyro-Cery

5

R2

R4(popularity,rating)

R3(popularity,rec?)

39 of 56

Example: BCNF Decomposition

id

name

rating

popularity

rec?

1

Mee Sum Pastry

3

Okay

N

2

Café on the Ave

4

Respectable

N

3

Guanaco’s Tacos

4

Respectable

N

4

Aladdin Gyro-Cery

5

Poppin

Y

Restaurant(id,name,rating,popularity,rec)

  1. id -> name, rating
  2. rating -> popularity
  3. popularity -> rec?

Restaurant

rating+ = rating, popularity, rec

R1(rating,popularity,rec?)

R2(rating,id,name)

popularity+ = popularity, rec?

id

name

rating

1

Mee Sum Pastry

3

2

Café on the Ave

4

3

Guanaco’s Tacos

4

4

Aladdin Gyro-Cery

5

R2

R4(popularity,rating)

R3(popularity,rec?)

popularity

rec?

Okay

N

Respectable

N

Poppin

Y

R3

rating

popularity

3

Okay

4

Respectable

5

Poppin

R4

40 of 56

Example: BCNF Decomposition

id

name

rating

popularity

rec?

1

Mee Sum Pastry

3

Okay

N

2

Café on the Ave

4

Respectable

N

3

Guanaco’s Tacos

4

Respectable

N

4

Aladdin Gyro-Cery

5

Poppin

Y

Restaurant(id,name,rating,popularity,rec)

  • id -> name, rating
  • rating -> popularity
  • popularity -> rec?

Restaurant

rating+ = rating, popularity, rec

R1(rating,popularity,rec?)

R2(rating,id,name)

popularity+ = popularity, rec?

id

name

rating

1

Mee Sum Pastry

3

2

Café on the Ave

4

3

Guanaco’s Tacos

4

4

Aladdin Gyro-Cery

5

R2

R4(popularity,rating)

R3(popularity,rec?)

popularity

rec?

Okay

N

Respectable

N

Poppin

Y

R3

rating

popularity

3

Okay

4

Respectable

5

Poppin

R4

R2, R3, R4 is the normalized database schema. No more anomalies!

41 of 56

Example: BCNF Decomposition

Now when we update the rec? so that a popularity = ‘Respectable’ has rec?= ‘Y’

UPDATE R3 SET rec? = ‘Y’ WHERE popularity = ‘Respectable’;

  • Now we only update 1 tuple in the R3 table so we have reduced redundancy, and therefore avoid redundancy anomalies, update anomalies and deletion anomalies

id

name

rating

1

Mee Sum Pastry

3

2

Café on the Ave

4

3

Guanaco’s Tacos

4

4

Aladdin Gyro-Cery

5

R2

R3

R4

popularity

rec?

Okay

N

Respectable

N

Poppin

Y

rating

popularity

3

Okay

4

Respectable

5

Poppin

42 of 56

Worksheet

43 of 56

Nested Queries Review!

44 of 56

Nested Queries

  • Avoid when possible
  • Danger of making simple queries slow and complicated
  • Just because you can do it, doesn’t mean you should

44

45 of 56

Subquery in SELECT

45

SELECT DISTINCT C.cname, (SELECT COUNT(*)

FROM Product P

WHERE P.cid = C.cid)

FROM Company C;

46 of 56

Subquery in SELECT

Unnest using JOIN and GROUP BY

46

SELECT C.cname, count(P.cid)

FROM Company C LEFT OUTER JOIN

Product P ON C.cid = P.cid

GROUP BY C.cname;

47 of 56

Nested Queries

  • If the SQL subquery returns exactly one value it can be used where we use a field
    • “one value” = one tuple with one attribute
  • Otherwise, a SQL subquery can be thought of as an "extra table"
    • Can name the table, its columns, etc

47

48 of 56

Subquery in FROM

48

SELECT X.pname

FROM (SELECT *

FROM Product

WHERE price > 20) AS X

WHERE X.price < 500;

More readable: WITH <name> AS (<subquery>)

49 of 56

Subquery in FROM

49

WITH price_more_20 AS (

SELECT *

FROM Product

WHERE price > 20

)

SELECT X.pname

FROM price_more_20 AS X

WHERE X.price < 500;

50 of 56

Subquery in FROM

Unnest using WHERE

50

SELECT X.pname

FROM Product AS X

WHERE X.price < 500 AND X.price > 20;

51 of 56

Subquery in WHERE

51

SELECT DISTINCT C.cname

FROM Company C

WHERE EXISTS (SELECT *

FROM Product P

WHERE C.cid = P.cid AND P.price < 200);

52 of 56

Subquery in WHERE

52

SELECT DISTINCT C.cname

FROM Company C, Product P

WHERE C.cid = P.cid AND P.price < 200;

53 of 56

Subquery in WHERE Syntax

  • SELECT ……… WHERE EXISTS (<sub>);
  • SELECT ……… WHERE NOT EXISTS (<sub>);
  • SELECT ……… WHERE attribute IN (<sub>);
  • SELECT ……… WHERE attribute NOT IN (<sub>);
  • SELECT ……… WHERE attribute > ANY (<sub>);
  • SELECT ……… WHERE attribute > ALL (<sub>);

53

54 of 56

Witnessing (i.e. argmax)

Find the student(s) who is taking the most classes.

Student(stu_id, id_num)

Enrolled(id_num, class)

SELECT S.stu_id

FROM Student S, Enrolled E

WHERE S.id_num = E.id_num

GROUP BY S.stu_id

HAVING count(E.class) >= ALL(

SELECT count(E1.class)

FROM Enrolled E1

GROUP BY E1.id_num);

54

johndoe

973

maryjane

712

alsmith

899

973

CSE 311

973

CSE 344

712

CSE 311

899

CSE 351

55 of 56

Witnessing (i.e. argmax) (another way)

55

WITH class_counts AS (SELECT COUNT(E1.class) AS cnt

FROM Enrolled E1

GROUP BY E1.id_num),

max_counts AS (SELECT MAX(cnt) AS max FROM class_counts)

SELECT S.stu_id

FROM Students S, Enrolled E, max_counts Emax

WHERE S.id_num = E.id_num

GROUP BY S.stu_id, Emax.max

HAVING COUNT(E.class) = Emax.max;

56 of 56

To Nest or Not to Nest

  • Not an exact science
  • Figuring out what is actually wanted will help you find simpler solutions (best way is to practice)
  • Trigger words to use sub-querying
    • Every, All (universal quantifiers)
    • No, None, Never (negation)
    • Only

56