CSE 344: Section 5
Database Design Theory
Database Design
2
Database Design
Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.
Consider what data needs to be stored and the interrelationship of the data.
Arbitrary Data
Database
There are no good or bad designs, just tradeoffs.
Data Relationship Discovery
How do we find relationships between attributes?
Flights dataset:
Did you know that every canceled flight has an actual_time of 0?
Data Relationship Discovery
Domain approach:
Before looking at the data, think about the domain
Research the domain and realize that �canceled flights have a 0 flight time!
Data Relationship Discovery
Data mining approach:
Don’t worry about the domain, find patterns in the data
Look at the data and notice that every time canceled = 1, then actual_time = 0.
SELECT DISTINCT actual_time
FROM Flights
WHERE canceled = 1;
Data Relationship Discovery
How do we find relationships between attributes?
Domain Approach
E/R Diagrams
Data Approach
Functional Dependencies (FDs)
Boyce-Codd Normal Form (BCNF)
Domain Approach
ER Diagrams
Visual graph of entities and relationships
Product
Company
Person
employs
buys
makes
price
name
ceo
name
address
address
name
id
Relationships
“at most one”
“exactly one”
is a
“subclass”
no constraint
other constraint
E/R Example
Create an ER diagram containing the following:
E/R Example – Sample Solution
11
Country (name, region)
People (id, name, gender, Country.name)
Faculty (People.id, salary)
Student (People.id, major)
Course (courseID, dept)
Took (Student.People.id, Course.courseID)
�
Exercise 1: Entity Relationship Diagrams
Odegaard Library is in need of a new database, and they have asked you to help design it! Here are some of the requirements for what information needs to be stored in this database:
Worksheet Exercise 1: Entity Relationship Diagrams
book
num_of_pages
id
title
author
Worksheet Exercise 1: Entity Relationship Diagrams
book
num_of_pages
id
title
author
Worksheet Exercise 1: Entity Relationship Diagrams
book
num_of_pages
id
title
author
has_genre
genre
name
recommended_age
Worksheet Exercise 1: Entity Relationship Diagrams
reader
first_name
last_name
age
Worksheet Exercise 1: Entity Relationship Diagrams
reader
first_name
last_name
age
book
num_of_pages
id
title
author
has_genre
genre
name
recommended_age
check_out
day_check_out
Worksheet Exercise 1: Entity Relationship Diagrams
2) Convert the ER diagram to a series of CREATE TABLE statements. Include primary key and foreign key statements.
reader
first_name
last_name
age
book
num_of_pages
id
title
author
has_genre
genre
name
recommended_age
check_out
day_check_out
Worksheet Exercise 1: Entity Relationship Diagrams
CREATE TABLE Checkouts (
day_check_out VARCHAR(10),
book_id INT REFERENCES Books,
reader_email VARCHAR(50) REFERENCES Readers,
PRIMARY KEY (book_id, reader_email, day_check_out)
);
CREATE TABLE Readers (
email VARCHAR(50) PRIMARY KEY,
first_name VARCHAR(20),
last_name VARCHAR(20),
age INT
);
CREATE TABLE Books (
id INT PRIMARY KEY,
title VARCHAR(100),
author VARCHAR(50),
genre VARCHAR(20) REFERENCES Genres,
num_pages INT
);
CREATE TABLE Genres (
name VARCHAR(20) PRIMARY KEY,
recommended_age INT
);
2) Convert the ER diagram to a series of CREATE TABLE statements. Include primary key and foreign key statements.
Data Approach
Functional Dependencies
A functional dependency (FD) for relation R is a formula of the form A → B
where A and B are sets or individual attributes of R.
We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.
DEPENDENCY != CAUSATION
*in other words: A determines B, or A is kinda sorta a key for B
Example
Student(studentID, name, dateOfBirth, phoneNumber)
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
There are two tuples that agree on studentID but don’t agree on phoneNumber!
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Trivial FDs
Axiom of Reflexivity
A trivial FD is one where the RHS is a subset of the LHS
That is, every attribute on the RHS is also on the LHS
The two FDs below are trivial:
{name} → {name}
{studentID,name} → {name}
Inference Rules
Closure
Goal:
Observation:
Closures (quicker)
Goal:
Observation:
Example
Given studentID → name, dateOfBirth:
{studentID}+ = {studentID, name, dateOfBirth}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Keys
We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.
If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Superkeys and Minimal Keys
R(A, B, C, D)
A -> B
B -> C, D
Question: is {A, B} a superkey, minimal key, neither, or both?
Question: is {B} a superkey, minimal key, neither, or both?
Question: is {A} a superkey, minimal key, neither or both?
superkey
neither
both
A superkey determines all other attributes in a schema
A minimal key is the smallest set of attributes that determines all other attributes in a schema
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Superkeys and Minimal Keys
R(A, B, C, D)
A -> B
C -> D
Question: is {A, C} a superkey, minimal key, or both?
Question: is {A, B, C, D} a superkey, minimal key, or both?
Question: is {A, C, D} a superkey, minimal key, or both?
both
superkey
superkey
A superkey determines all other attributes in a schema
A minimal key is the smallest set of attributes that determines all other attributes in a schema
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Boyce-Codd Normal Form (BCNF)
A relation R is in BCNF if every set of attributes in R is either a superkey or its closure is the same set (trivial FD).
*that is, either {a}+ -> a OR {a}+ -> {all cols}, where a is any col in the table
Boyce-Codd Normal Form (BCNF)
Not BCNF! SSN is not a trivial FD or a superkey
BCNF Decomposition
Algorithm
C = the set of all attributes in the relation R
Look for an attribute (or a set of attributes) that meets the following conditions (non-BCNF flag!):
If such attribute exists, it is a violation of the BCNF condition; decompose R:
Else, the table is in BCNF
BCNF
Example
Relation: R (A, B, C, D, E)
FDs:
Goal: Decompose R into BCNF
BCNF
Example
R (B, C, A)
R (B, C, D)
R (A, E)
R (A, B, C, D)
R (A, B, C, D, E)
A+ = {A, E}
BCNF compliant
BCNF compliant
BCNF compliant
{B, C}+ = {B, C, A}
BCNF
Example
Notice that {A}+ = {A,E}, which violates the BCNF condition
{B,C}+ = {B,C,A} violates the BCNF condition
BCNF Final Tips
Worksheet