CSE 414: Section 5
Database Design Theory
Feb 3, 2020
Snigdha
Aaditya
Announcements
Agenda
Database Design
4
October 22, 2020
Database Design
Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.
Consider what data needs to be stored and the interrelationship of the data.
Arbitrary Data
Database
There are no good or bad designs, just tradeoffs.
Design Section
Data Relationship Discovery
How do we find relationships between attributes?
5
October 22, 2020
Domain Approach
E/R Diagrams
Data Approach
Functional Dependencies
Boyce-Codd Normal Form
(BCNF)
Design Section
Building Blocks
6
October 22, 2020
Entity (nouns)
E.g. Company, Ocean, Employee
Attributes (characteristics of entities)
E.g. Name, Temperature, Salary
Relations among entities (verbs)
E.g. Employs, Borders, Oversees
Design Section
Entity Sets
7
October 22, 2020
Design Section
Relationships
�
8
October 22, 2020
“at most one”
“exactly one”
is a
“subclass”
no constraint
other constraint
Design Section
ER Diagrams
9
October 22, 2020
Product
Company
Person
employs
buys
makes
price
name
ceo
name
address
address
name
id
Design Section
Functional Dependencies
A functional dependency (FD) for relation R is a formula of the form A → B
where A and B are sets or individual attributes of R.
We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.
*in other words: A determines B, or A is kinda sorta a key for B
Example
Student(studentID, name, dateOfBirth, phoneNumber)
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
There are two tuples that agree on studentID but don’t agree on phoneNumber!
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Trivial FDs
A trivial FD is one where the RHS is a subset of the LHS
That is, every attribute on the RHS is also on the LHS
The two FDs below are trivial:
{name} → {name}
{studentID,name} → {name}
Note that every LHS will always determine itself, but they are not always explicitly written for the sake of convenience.
Inference Rules
Closure
Goal:
Observation:
Closures (quicker)
Goal:
Observation:
Example
Given studentID → name, dateOfBirth:
{studentID}+ = {studentID, name, dateOfBirth}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Keys
We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.
If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Superkeys and Minimal Keys
R(A, B, C, D)
A -> B
B -> C, D
Question: is {A, B} a superkey, minimal key, neither, or both?
Question: is {B} a superkey, minimal key, neither, or both?
Question: is {A} a superkey, minimal key, neither or both?
superkey
neither
both
A superkey determines all other attributes in a schema
A minimal key is the smallest set of attributes that determines all other attributes in a schema
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Superkeys and Minimal Keys
R(A, B, C, D)
A -> B
C -> D
Question: is {A, C} a superkey, minimal key, or both?
Question: is {A, B, C, D} a superkey, minimal key, or both?
Question: is {A, C, D} a superkey, minimal key, or both?
both
superkey
superkey
A superkey determines all other attributes in a schema
A minimal key is the smallest set of attributes that determines all other attributes in a schema
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Boyce-Codd Normal Form (BCNF)
A relation R is in BCNF if every set of attributes in R is either a superkey or its closure is the same set.
*that is, either {a}+ -> a OR {a}+ -> {all cols}, where a is any col in the table
BCNF Decomposition Algorithm
X +
BCNF Decomposition Algorithm
C = the set of all attributes in the relation R
Look for an attribute (or a set of attributes) that meets the following conditions (non-BCNF flag!):
If such attribute exists, it is a violation of the BCNF condition; decompose R:
Else, the table is in BCNF
Example
Relation: R (A, B, C, D, E)
FDs:
Goal: Decompose R into BCNF
Example
R (A, B, C, D, E)
R (A, E)
R (A, B, C, D)
R (B, C, A)
R (B, C, D)
A+ = {A, E}
BCNF compliant
BCNF compliant
BCNF compliant
{B, C}+ = {B, C, A}
Solution
Notice that {A}+ = {A,E}, which violates the BCNF condition
Notice there is no E in R2 so we don't need to consider FD DE → B