CSE 344: Section 4
ER diagrams, BCNF decomposition
Oct 19th, 2023
Announcements
Database Design
E/R diagram Basics
is a
Subclass
Attribute
Entity set
Relationship
Weak Entity
Subclass
CREATE TABLE Product (
pname INT PRIMARY KEY,
price INT)
CREATE TABLE Toy (
pname INT PRIMARY KEY REFERENCES Product,
age INT)
CREATE TABLE Candy (
pname INT PRIMARY KEY REFERENCES Product,
isChoc TEXT)
Weak Entity Set
Relationships
Integrity Constraints in ER Diagrams
Product
price
name
category
makes
makes
makes
Keys
Single Value Constraints (many-one) [possibly none]
Referential Integrity Constraints (many-one) [exactly one]
Other Constraints
>2
Examples of Relationships
Question: "at most one" vs. "exactly one"?
Worksheet Exercise 1:
Entity Relationship Diagrams
Worksheet Exercise 1: Entity Relationship Diagrams
Odegaard Library is in need of a new database, and they have asked you to help design it! Here are some of the requirements for what information needs to be stored in this database:
Worksheet Exercise 1: Entity Relationship Diagrams
book
num_of_pages
id
title
author
Worksheet Exercise 1: Entity Relationship Diagrams
book
num_of_pages
id
title
author
Worksheet Exercise 1: Entity Relationship Diagrams
book
num_of_pages
id
title
author
has_genre
genre
name
recommended_age
Worksheet Exercise 1: Entity Relationship Diagrams
reader
first_name
last_name
age
Worksheet Exercise 1: Entity Relationship Diagrams
reader
first_name
last_name
age
book
num_of_pages
id
title
author
has_genre
genre
name
recommended_age
check_out
day_check_out
Worksheet Exercise 1: Entity Relationship Diagrams
2) Convert the ER diagram to a series of CREATE TABLE statements. Include primary key and foreign key statements.
reader
first_name
last_name
age
book
num_of_pages
id
title
author
has_genre
genre
name
recommended_age
check_out
day_check_out
Worksheet Exercise 1: Entity Relationship Diagrams
CREATE TABLE Checkouts (
day_check_out VARCHAR(10),
book_id INT REFERENCES Books,
reader_email VARCHAR(50) REFERENCES Readers,
PRIMARY KEY (book_id, reader_email, day_check_out)
);
CREATE TABLE Readers (
email VARCHAR(50) PRIMARY KEY,
first_name VARCHAR(20),
last_name VARCHAR(20),
age INT
);
CREATE TABLE Books (
id INT PRIMARY KEY,
title VARCHAR(100),
author VARCHAR(50),
genre VARCHAR(20) REFERENCES Genres,
num_pages INT
);
CREATE TABLE Genres (
name VARCHAR(20) PRIMARY KEY,
recommended_age INT
);
2) Convert the ER diagram to a series of CREATE TABLE statements. Include primary key and foreign key statements.
Functional Dependencies
Functional Dependency
A → B
Indicates that attribute A determines attribute B (B is dependent on A).
Some examples:
Transitive Dependency
If A → B and B → C
then A → C
{A}+ = Closure of attribute A: Set of all attributes that can be determined by A.
{A}+ = {ABC}
Closure
Closure Definition
Goal:
Closure
Closure Algorithm
Algorithm Steps:
1. Find some attribute(s) C to add to right side
2. Add them
3. Look back at the FDs to find more C
Closure Example
To determine the closure of ID we start with trivial FD: ID → ID because if we are given ID, then we know ID
Closure Example
Closure Example
Closure Example
Formally, the closure of ID is written as {ID}+ = {ID, Name, Job}
Closures (quicker)
Goal:
Observation:
Keys
We call the attribute(s) that determines all other attributes in a schema to be a superkey.
Ex. for R(A, B, C), the set {A, B, C} determines all attributes in R
If no subset of a superkey is also a superkey, then that superkey is a minimal key or key for short. If a relation has multiple keys, each key is a candidate key
Ex. if AB → ABC and C → ABC, then {AB} and {C} are candidate keys for R
Super Key
Minimal Key
Boyce-Codd Normal Form (BCNF)
Motivation of BCNF
Recall the three anomalies:
If a relation is in BCNF, then it does not have these anomalies
If it is not in BCNF, then it does have these anomalies
BCNF Algorithm
Example: BCNF Decomposition
Restaurant(id,name,rating,popularity,rec)
SET rec? = ‘Y’
WHERE popularity = ‘Respectable’;
id | name | rating | popularity | rec? |
1 | Mee Sum Pastry | 3 | Okay | N |
2 | Café on the Ave | 4 | Respectable | N |
3 | Guanaco’s Tacos | 4 | Respectable | N |
4 | Aladdin Gyro-Cery | 5 | Poppin | Y |
Restaurant
Example: BCNF Decomposition
Restaurant(id,name,rating,popularity,rec)
rating+ = rating, popularity, rec?
id | name | rating | popularity | rec? |
1 | Mee Sum Pastry | 3 | Okay | N |
2 | Café on the Ave | 4 | Respectable | N |
3 | Guanaco’s Tacos | 4 | Respectable | N |
4 | Aladdin Gyro-Cery | 5 | Poppin | Y |
Restaurant
Example: BCNF Decomposition
id | name | rating | popularity | rec? |
1 | Mee Sum Pastry | 3 | Okay | N |
2 | Café on the Ave | 4 | Respectable | N |
3 | Guanaco’s Tacos | 4 | Respectable | N |
4 | Aladdin Gyro-Cery | 5 | Poppin | Y |
Restaurant(id,name,rating,popularity,rec)
Restaurant
rating+ = rating, popularity, rec
R1(rating,popularity,rec?)
R2(rating,id,name)
rating | popularity | rec? |
3 | Okay | N |
4 | Respectable | N |
5 | Poppin | Y |
id | name | rating |
1 | Mee Sum Pastry | 3 |
2 | Café on the Ave | 4 |
3 | Guanaco’s Tacos | 4 |
4 | Aladdin Gyro-Cery | 5 |
R2
R1
Example: BCNF Decomposition
id | name | rating | popularity | rec? |
1 | Mee Sum Pastry | 3 | Okay | N |
2 | Café on the Ave | 4 | Respectable | N |
3 | Guanaco’s Tacos | 4 | Respectable | N |
4 | Aladdin Gyro-Cery | 5 | Poppin | Y |
Restaurant(id,name,rating,popularity,rec)
Restaurant
rating+ = rating, popularity, rec
R1(rating,popularity,rec?)
R2(rating,id,name)
popularity+ = popularity, rec?
id | name | rating |
1 | Mee Sum Pastry | 3 |
2 | Café on the Ave | 4 |
3 | Guanaco’s Tacos | 4 |
4 | Aladdin Gyro-Cery | 5 |
R2
Example: BCNF Decomposition
id | name | rating | popularity | rec? |
1 | Mee Sum Pastry | 3 | Okay | N |
2 | Café on the Ave | 4 | Respectable | N |
3 | Guanaco’s Tacos | 4 | Respectable | N |
4 | Aladdin Gyro-Cery | 5 | Poppin | Y |
Restaurant(id,name,rating,popularity,rec)
Restaurant
rating+ = rating, popularity, rec
R1(rating,popularity,rec?)
R2(rating,id,name)
popularity+ = popularity, rec?
id | name | rating |
1 | Mee Sum Pastry | 3 |
2 | Café on the Ave | 4 |
3 | Guanaco’s Tacos | 4 |
4 | Aladdin Gyro-Cery | 5 |
R2
R4(popularity,rating)
R3(popularity,rec?)
Example: BCNF Decomposition
id | name | rating | popularity | rec? |
1 | Mee Sum Pastry | 3 | Okay | N |
2 | Café on the Ave | 4 | Respectable | N |
3 | Guanaco’s Tacos | 4 | Respectable | N |
4 | Aladdin Gyro-Cery | 5 | Poppin | Y |
Restaurant(id,name,rating,popularity,rec)
Restaurant
rating+ = rating, popularity, rec
R1(rating,popularity,rec?)
R2(rating,id,name)
popularity+ = popularity, rec?
id | name | rating |
1 | Mee Sum Pastry | 3 |
2 | Café on the Ave | 4 |
3 | Guanaco’s Tacos | 4 |
4 | Aladdin Gyro-Cery | 5 |
R2
R4(popularity,rating)
R3(popularity,rec?)
popularity | rec? |
Okay | N |
Respectable | N |
Poppin | Y |
R3
rating | popularity |
3 | Okay |
4 | Respectable |
5 | Poppin |
R4
Example: BCNF Decomposition
id | name | rating | popularity | rec? |
1 | Mee Sum Pastry | 3 | Okay | N |
2 | Café on the Ave | 4 | Respectable | N |
3 | Guanaco’s Tacos | 4 | Respectable | N |
4 | Aladdin Gyro-Cery | 5 | Poppin | Y |
Restaurant(id,name,rating,popularity,rec)
Restaurant
rating+ = rating, popularity, rec
R1(rating,popularity,rec?)
R2(rating,id,name)
popularity+ = popularity, rec?
id | name | rating |
1 | Mee Sum Pastry | 3 |
2 | Café on the Ave | 4 |
3 | Guanaco’s Tacos | 4 |
4 | Aladdin Gyro-Cery | 5 |
R2
R4(popularity,rating)
R3(popularity,rec?)
popularity | rec? |
Okay | N |
Respectable | N |
Poppin | Y |
R3
rating | popularity |
3 | Okay |
4 | Respectable |
5 | Poppin |
R4
R2, R3, R4 is the normalized database schema. No more anomalies!
Example: BCNF Decomposition
Now when we update the rec? so that a popularity = ‘Respectable’ has rec?= ‘Y’
UPDATE R3 SET rec? = ‘Y’ WHERE popularity = ‘Respectable’;
id | name | rating |
1 | Mee Sum Pastry | 3 |
2 | Café on the Ave | 4 |
3 | Guanaco’s Tacos | 4 |
4 | Aladdin Gyro-Cery | 5 |
R2
R3
R4
popularity | rec? |
Okay | N |
Respectable | N |
Poppin | Y |
rating | popularity |
3 | Okay |
4 | Respectable |
5 | Poppin |
Worksheet
Nested Queries Review!
Nested Queries
44
Subquery in SELECT
45
SELECT DISTINCT C.cname, (SELECT COUNT(*)
FROM Product P
WHERE P.cid = C.cid)
FROM Company C;
Subquery in SELECT
Unnest using JOIN and GROUP BY
46
SELECT C.cname, count(P.cid)
FROM Company C LEFT OUTER JOIN
Product P ON C.cid = P.cid
GROUP BY C.cname;
Nested Queries
47
Subquery in FROM
48
SELECT X.pname
FROM (SELECT *
FROM Product
WHERE price > 20) AS X
WHERE X.price < 500;
More readable: WITH <name> AS (<subquery>)
Subquery in FROM
49
WITH price_more_20 AS (
SELECT *
FROM Product
WHERE price > 20
)
SELECT X.pname
FROM price_more_20 AS X
WHERE X.price < 500;
Subquery in FROM
Unnest using WHERE
50
SELECT X.pname
FROM Product AS X
WHERE X.price < 500 AND X.price > 20;
Subquery in WHERE
51
SELECT DISTINCT C.cname
FROM Company C
WHERE EXISTS (SELECT *
FROM Product P
WHERE C.cid = P.cid AND P.price < 200);
Subquery in WHERE
52
SELECT DISTINCT C.cname
FROM Company C, Product P
WHERE C.cid = P.cid AND P.price < 200;
Subquery in WHERE Syntax
53
Witnessing (i.e. argmax)
Find the student(s) who is taking the most classes.
Student(stu_id, id_num)
Enrolled(id_num, class)
SELECT S.stu_id
FROM Student S, Enrolled E
WHERE S.id_num = E.id_num
GROUP BY S.stu_id
HAVING count(E.class) >= ALL(
SELECT count(E1.class)
FROM Enrolled E1
GROUP BY E1.id_num);
54
johndoe | 973 |
maryjane | 712 |
alsmith | 899 |
973 | CSE 311 |
973 | CSE 344 |
712 | CSE 311 |
899 | CSE 351 |
Witnessing (i.e. argmax) (another way)
55
WITH class_counts AS (SELECT COUNT(E1.class) AS cnt
FROM Enrolled E1
GROUP BY E1.id_num),
max_counts AS (SELECT MAX(cnt) AS max FROM class_counts)
SELECT S.stu_id
FROM Students S, Enrolled E, max_counts Emax
WHERE S.id_num = E.id_num
GROUP BY S.stu_id, Emax.max
HAVING COUNT(E.class) = Emax.max;
To Nest or Not to Nest
56