CSE 344: Section 5
Database Design Theory
February 1st, 2024
Announcements
Database Design
3
Database Design
Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.
Consider what data needs to be stored and the interrelationship of the data.
Arbitrary Data
Database
There are no good or bad designs, just tradeoffs.
Data Relationship Discovery
How do we find relationships between attributes?
Flights dataset:
Did you know that every canceled flight has an actual_time of 0?
Data Relationship Discovery
Domain approach:
Before looking at the data, think about the domain
Research the domain and realize that �canceled flights have a 0 flight time!
Data Relationship Discovery
Data mining approach:
Don’t worry about the domain, find patterns in the data
Look at the data and notice that every time canceled = 1, then actual_time = 0.
SELECT DISTINCT actual_time
FROM Flights
WHERE canceled = 1;
Data Relationship Discovery
How do we find relationships between attributes?
Domain Approach
E/R Diagrams
Data Approach
Functional Dependencies (FDs)
Boyce-Codd Normal Form (BCNF)
Domain Approach
ER Diagrams
Visual graph of entities and relationships
Product
Company
Person
employs
buys
makes
price
name
ceo
name
address
address
name
id
Relationships
“at most one”
“exactly one”
is a
“subclass”
no constraint
other constraint
E/R Example
Create an ER diagram containing the following:
E/R Example – Sample Solution
12
Country (name, region)
People (id, name, gender, Country.name)
Faculty (People.id, salary)
Student (People.id, major)
Course (courseID, dept)
Took (Student.People.id, Course.courseID)
�
Data Approach
Functional Dependencies
A functional dependency (FD) for relation R is a formula of the form A → B
where A and B are sets or individual attributes of R.
We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.
DEPENDENCY != CAUSATION
*in other words: A determines B, or A is kinda sorta a key for B
Example
Student(studentID, name, dateOfBirth, phoneNumber)
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
There are two tuples that agree on studentID but don’t agree on phoneNumber!
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Trivial FDs
Axiom of Reflexivity
A trivial FD is one where the RHS is a subset of the LHS
That is, every attribute on the RHS is also on the LHS
The two FDs below are trivial:
{name} → {name}
{studentID,name} → {name}
Inference Rules
Closure
Goal:
Observation:
Closures (quicker)
Goal:
Observation:
Example
Given studentID → name, dateOfBirth:
{studentID}+ = {studentID, name, dateOfBirth}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Keys
We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.
If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Superkeys and Minimal Keys
R(A, B, C, D)
A -> B
B -> C, D
Question: is {A, B} a superkey, minimal key, neither, or both?
Question: is {B} a superkey, minimal key, neither, or both?
Question: is {A} a superkey, minimal key, neither or both?
superkey
neither
both
A superkey determines all other attributes in a schema
A minimal key is the smallest set of attributes that determines all other attributes in a schema
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Superkeys and Minimal Keys
R(A, B, C, D)
A -> B
C -> D
Question: is {A, C} a superkey, minimal key, or both?
Question: is {A, B, C, D} a superkey, minimal key, or both?
Question: is {A, C, D} a superkey, minimal key, or both?
both
superkey
superkey
A superkey determines all other attributes in a schema
A minimal key is the smallest set of attributes that determines all other attributes in a schema
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Boyce-Codd Normal Form (BCNF)
A relation R is in BCNF if every set of attributes in R is either a superkey or its closure is the same set (trivial FD).
*that is, either {a}+ -> a OR {a}+ -> {all cols}, where a is any col in the table
Boyce-Codd Normal Form (BCNF)
Not BCNF! SSN is not a trivial FD or a superkey
BCNF Decomposition
Algorithm
C = the set of all attributes in the relation R
Look for an attribute (or a set of attributes) that meets the following conditions (non-BCNF flag!):
If such attribute exists, it is a violation of the BCNF condition; decompose R:
Else, the table is in BCNF
BCNF
Example
Relation: R (A, B, C, D, E)
FDs:
Goal: Decompose R into BCNF
BCNF
Example
R (B, C, A)
R (B, C, D)
R (A, E)
R (A, B, C, D)
R (A, B, C, D, E)
A+ = {A, E}
BCNF compliant
BCNF compliant
BCNF compliant
{B, C}+ = {B, C, A}
BCNF
Example
Notice that {A}+ = {A,E}, which violates the BCNF condition
{B,C}+ = {B,C,A} violates the BCNF condition
Worksheet