1 of 42

CSE 344: Section 5

Database Design Theory

2 of 42

Database Design

2

Database Design

Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.

Consider what data needs to be stored and the interrelationship of the data.

Arbitrary Data

Database

There are no good or bad designs, just tradeoffs.

3 of 42

Data Relationship Discovery

How do we find relationships between attributes?

Flights dataset:

Did you know that every canceled flight has an actual_time of 0?

  1. Domain approach
  2. Data mining approach

4 of 42

Data Relationship Discovery

Domain approach:

Before looking at the data, think about the domain

Research the domain and realize that �canceled flights have a 0 flight time!

5 of 42

Data Relationship Discovery

Data mining approach:

Don’t worry about the domain, find patterns in the data

Look at the data and notice that every time canceled = 1, then actual_time = 0.

SELECT DISTINCT actual_time

FROM Flights

WHERE canceled = 1;

6 of 42

Data Relationship Discovery

How do we find relationships between attributes?

Domain Approach

E/R Diagrams

Data Approach

Functional Dependencies (FDs)

Boyce-Codd Normal Form (BCNF)

7 of 42

Domain Approach

8 of 42

ER Diagrams

Visual graph of entities and relationships

Product

Company

Person

employs

buys

makes

price

name

ceo

name

address

address

name

id

9 of 42

Relationships

  • one-one: ssn - UW student id
  • one-many: ssn - phone #
  • many-many: store - product
  • is-a: computer - PC and computer - Mac
  • has-a: country - city
    • What country does the city of Cambridge belong to?

“at most one”

“exactly one”

 

is a

“subclass”

no constraint

other constraint

10 of 42

E/R Example

Create an ER diagram containing the following:

  • The id, name, gender, country of birth, �and country region (continent) of birth for students and faculty
  • The major of students
  • The salary of faculty
  • The courses that each student takes
  • The department each course is offered in

11 of 42

E/R Example – Sample Solution

11

Country (name, region)

People (id, name, gender, Country.name)

Faculty (People.id, salary)

Student (People.id, major)

Course (courseID, dept)

Took (Student.People.id, Course.courseID)

12 of 42

Exercise 1: Entity Relationship Diagrams

Odegaard Library is in need of a new database, and they have asked you to help design it! Here are some of the requirements for what information needs to be stored in this database:

  • Each book has a unique ID, a title, an author, a genre, and a number of pages
  • Readers who visit the library have a unique email address, a first name, a last name, and an age
  • Readers can “check out” multiple books from the library at a time, and one book can be checked out multiple times. We should keep track of the day that each book was checked out
  • To make it easier to recommend books to readers, we should assign a recommended age for each genre
  • Assume only one book fits in one genre
  1. Design an ER diagram for this new library database.

13 of 42

Worksheet Exercise 1: Entity Relationship Diagrams

  • Each book has a unique ID, a title, an author, a genre, and a number of pages

book

num_of_pages

id

title

author

14 of 42

Worksheet Exercise 1: Entity Relationship Diagrams

book

num_of_pages

id

title

author

  • Each book has a unique ID, a title, an author, a genre, and a number of pages
  • To make it easier to recommend books to readers, we should assign a recommended age for each genre

15 of 42

Worksheet Exercise 1: Entity Relationship Diagrams

book

num_of_pages

id

title

author

has_genre

genre

name

recommended_age

  • Each book has a unique ID, a title, an author, a genre, and a number of pages
  • To make it easier to recommend books to readers, we should assign a recommended age for each genre

16 of 42

Worksheet Exercise 1: Entity Relationship Diagrams

reader

email

first_name

last_name

age

  • Readers who visit the library have a unique email address, a first name, a last name, and an age
  • Readers can “check out” multiple books from the library at a time, and one book can be checked out multiple times. We should keep track of the day that each book was checked out

17 of 42

Worksheet Exercise 1: Entity Relationship Diagrams

  • Readers who visit the library have a unique email address, a first name, a last name, and an age
  • Readers can “check out” multiple books from the library at a time, and one book can be checked out multiple times. We should keep track of the day that each book was checked out

reader

email

first_name

last_name

age

book

num_of_pages

id

title

author

has_genre

genre

name

recommended_age

check_out

day_check_out

18 of 42

Worksheet Exercise 1: Entity Relationship Diagrams

2) Convert the ER diagram to a series of CREATE TABLE statements. Include primary key and foreign key statements.

reader

email

first_name

last_name

age

book

num_of_pages

id

title

author

has_genre

genre

name

recommended_age

check_out

day_check_out

19 of 42

Worksheet Exercise 1: Entity Relationship Diagrams

CREATE TABLE Checkouts (

day_check_out VARCHAR(10),

book_id INT REFERENCES Books,

reader_email VARCHAR(50) REFERENCES Readers,

PRIMARY KEY (book_id, reader_email, day_check_out)

);

CREATE TABLE Readers (

email VARCHAR(50) PRIMARY KEY,

first_name VARCHAR(20),

last_name VARCHAR(20),

age INT

);

CREATE TABLE Books (

id INT PRIMARY KEY,

title VARCHAR(100),

author VARCHAR(50),

genre VARCHAR(20) REFERENCES Genres,

num_pages INT

);

CREATE TABLE Genres (

name VARCHAR(20) PRIMARY KEY,

recommended_age INT

);

2) Convert the ER diagram to a series of CREATE TABLE statements. Include primary key and foreign key statements.

20 of 42

Data Approach

21 of 42

Functional Dependencies

A functional dependency (FD) for relation R is a formula of the form A → B

where A and B are sets or individual attributes of R.

We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.

DEPENDENCY != CAUSATION

*in other words: A determines B, or A is kinda sorta a key for B

22 of 42

Example

Student(studentID, name, dateOfBirth, phoneNumber)

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

23 of 42

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

24 of 42

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

25 of 42

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

26 of 42

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

There are two tuples that agree on studentID but don’t agree on phoneNumber!

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

27 of 42

Trivial FDs

Axiom of Reflexivity

A trivial FD is one where the RHS is a subset of the LHS

That is, every attribute on the RHS is also on the LHS

The two FDs below are trivial:

{name} → {name}

{studentID,name} → {name}

28 of 42

Inference Rules

  • If b is a subset of a, then ab (reflexivity or trivial FD)
    • then a → a (if a and b are subsets of each other, aka a = b!)
  • If a → b then a, c → b, c (augmentation)
    • then a, c → b
    • but not a → b, c
    • “Adding on the left is chill. Adding on the right though 🚩🚩🚩🚩🚩🚩”
  • If a → b & b → c then a → c (transitivity)
    • If abc and c → d then a → bd (pseudo transitivity) or even a -> bcd

29 of 42

Closure

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • Suppose we have the attributes A → B and B → C
    • To determine the closure of A we start with trivial FD: A → A and X = {A}
    • Since we have A → B, we add B to X = {A, B}
    • B is in X and B → C, so we can add C to X = {A, B, C}
    • Formally, the closure of A is written as {A}+ = {A, B, C}

30 of 42

Closures (quicker)

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • Suppose we have the attributes A → B and B → C
    • By transitivity, A → C
    • So really, A → B, C
    • Add A to the RHS so A → A, B, C
    • Formally, the closure of A is written as {A}+ = {A, B, C}

31 of 42

Example

Given studentID → name, dateOfBirth:

{studentID}+ = {studentID, name, dateOfBirth}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

32 of 42

Keys

We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.

If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

33 of 42

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

B -> C, D

Question: is {A, B} a superkey, minimal key, neither, or both?

Question: is {B} a superkey, minimal key, neither, or both?

Question: is {A} a superkey, minimal key, neither or both?

superkey

neither

both

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

34 of 42

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

C -> D

Question: is {A, C} a superkey, minimal key, or both?

Question: is {A, B, C, D} a superkey, minimal key, or both?

Question: is {A, C, D} a superkey, minimal key, or both?

both

superkey

superkey

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

35 of 42

Boyce-Codd Normal Form (BCNF)

A relation R is in BCNF if every set of attributes in R is either a superkey or its closure is the same set (trivial FD).

*that is, either {a}+ -> a OR {a}+ -> {all cols}, where a is any col in the table

  • BCNF reduces redundancy at the expense of creating additional tables.
  • BCNF may have several correct decompositions.
  • Some functional dependencies may be lost.

36 of 42

Boyce-Codd Normal Form (BCNF)

Not BCNF! SSN is not a trivial FD or a superkey

37 of 42

BCNF Decomposition

Algorithm

C = the set of all attributes in the relation R

Look for an attribute (or a set of attributes) that meets the following conditions (non-BCNF flag!):

  • X+ != X
  • X+ != C

If such attribute exists, it is a violation of the BCNF condition; decompose R:

  • R1 = X+
  • R2 = (C - X+) union (X)
  • Normalize each table (recursive call)

Else, the table is in BCNF

38 of 42

BCNF

Example

Relation: R (A, B, C, D, E)

FDs:

  • A → E
  • BC → A
  • DE → B

Goal: Decompose R into BCNF

39 of 42

BCNF

Example

R (B, C, A)

R (B, C, D)

R (A, E)

R (A, B, C, D)

R (A, B, C, D, E)

A+ = {A, E}

BCNF compliant

BCNF compliant

BCNF compliant

{B, C}+ = {B, C, A}

40 of 42

BCNF

Example

Notice that {A}+ = {A,E}, which violates the BCNF condition

  • We split R to R1(A,E) and R2(A,B,C,D)
  • R1 satisfies BCNF
  • R2 does not satisfy BCNF because: {B,C}+ = {B,C,A}

{B,C}+ = {B,C,A} violates the BCNF condition

  • *Notice there is no E in R2 so we don't need to consider DE → B
  • Split R2 to: R21(B,C,A) and R22(B,C,D)
  • R21 and R21 satisfy BCNF now

41 of 42

BCNF Final Tips

  • Split on closures, not FDs
  • Depending on the order in which you split on closures, you may end up with different tables - there are multiple possible solutions!
  • The PKs for your tables will depend on the FDs still preserved

42 of 42

Worksheet