1 of 34

CSE 344: Section 5

Database Design Theory

February 1st, 2024

2 of 34

Announcements

  • Homework 3 due tomorrow 2/2 @ 11pm
    • Late due date 2/4 @ 11pm
  • Homework 4 release soon

3 of 34

Database Design

3

Database Design

Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.

Consider what data needs to be stored and the interrelationship of the data.

Arbitrary Data

Database

There are no good or bad designs, just tradeoffs.

4 of 34

Data Relationship Discovery

How do we find relationships between attributes?

Flights dataset:

Did you know that every canceled flight has an actual_time of 0?

  • Domain approach
  • Data mining approach

5 of 34

Data Relationship Discovery

Domain approach:

Before looking at the data, think about the domain

Research the domain and realize that �canceled flights have a 0 flight time!

6 of 34

Data Relationship Discovery

Data mining approach:

Don’t worry about the domain, find patterns in the data

Look at the data and notice that every time canceled = 1, then actual_time = 0.

SELECT DISTINCT actual_time

FROM Flights

WHERE canceled = 1;

7 of 34

Data Relationship Discovery

How do we find relationships between attributes?

Domain Approach

E/R Diagrams

Data Approach

Functional Dependencies (FDs)

Boyce-Codd Normal Form (BCNF)

8 of 34

Domain Approach

9 of 34

ER Diagrams

Visual graph of entities and relationships

Product

Company

Person

employs

buys

makes

price

name

ceo

name

address

address

name

id

10 of 34

Relationships

  • one-one: ssn - UW student id
  • one-many: ssn - phone #
  • many-many: store - product
  • is-a: computer - PC and computer - Mac
  • has-a: country - city
    • What country does the city of Cambridge belong to?

“at most one”

“exactly one”

 

is a

“subclass”

no constraint

other constraint

11 of 34

E/R Example

Create an ER diagram containing the following:

  • The id, name, gender, country of birth, �and country region (continent) of birth for students and faculty
  • The major of students
  • The salary of faculty
  • The courses that each student takes
  • The department each course is offered in

12 of 34

E/R Example – Sample Solution

12

Country (name, region)

People (id, name, gender, Country.name)

Faculty (People.id, salary)

Student (People.id, major)

Course (courseID, dept)

Took (Student.People.id, Course.courseID)

13 of 34

Data Approach

14 of 34

Functional Dependencies

A functional dependency (FD) for relation R is a formula of the form A → B

where A and B are sets or individual attributes of R.

We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.

DEPENDENCY != CAUSATION

*in other words: A determines B, or A is kinda sorta a key for B

15 of 34

Example

Student(studentID, name, dateOfBirth, phoneNumber)

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

16 of 34

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

17 of 34

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

18 of 34

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

19 of 34

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

There are two tuples that agree on studentID but don’t agree on phoneNumber!

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

20 of 34

Trivial FDs

Axiom of Reflexivity

A trivial FD is one where the RHS is a subset of the LHS

That is, every attribute on the RHS is also on the LHS

The two FDs below are trivial:

{name} → {name}

{studentID,name} → {name}

21 of 34

Inference Rules

  • If b is a subset of a, then ab (reflexivity or trivial FD)
    • then a → a (if a and b are subsets of each other, aka a = b!)
  • If a → b then a, c → b, c (augmentation)
    • then a, c → b
    • but not a → b, c
    • “Adding on the left is chill. Adding on the right though 🚩🚩🚩🚩🚩🚩”
  • If a → b & b → c then a → c (transitivity)
    • If abc and c → d then a → bd (pseudo transitivity) or even a -> bcd

22 of 34

Closure

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • Suppose we have the attributes A → B and B → C
    • To determine the closure of A we start with trivial FD: A → A and X = {A}
    • Since we have A → B, we add B to X = {A, B}
    • B is in X and B → C, so we can add C to X = {A, B, C}
    • Formally, the closure of A is written as {A}+ = {A, B, C}

23 of 34

Closures (quicker)

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • Suppose we have the attributes A → B and B → C
    • By transitivity, A → C
    • So really, A → B, C
    • Add A to the RHS so A → A, B, C
    • Formally, the closure of A is written as {A}+ = {A, B, C}

24 of 34

Example

Given studentID → name, dateOfBirth:

{studentID}+ = {studentID, name, dateOfBirth}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

25 of 34

Keys

We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.

If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

26 of 34

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

B -> C, D

Question: is {A, B} a superkey, minimal key, neither, or both?

Question: is {B} a superkey, minimal key, neither, or both?

Question: is {A} a superkey, minimal key, neither or both?

superkey

neither

both

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

27 of 34

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

C -> D

Question: is {A, C} a superkey, minimal key, or both?

Question: is {A, B, C, D} a superkey, minimal key, or both?

Question: is {A, C, D} a superkey, minimal key, or both?

both

superkey

superkey

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

28 of 34

Boyce-Codd Normal Form (BCNF)

A relation R is in BCNF if every set of attributes in R is either a superkey or its closure is the same set (trivial FD).

*that is, either {a}+ -> a OR {a}+ -> {all cols}, where a is any col in the table

  • BCNF reduces redundancy at the expense of creating additional tables.
  • BCNF may have several correct decompositions.
  • Some functional dependencies may be lost.

29 of 34

Boyce-Codd Normal Form (BCNF)

Not BCNF! SSN is not a trivial FD or a superkey

30 of 34

BCNF Decomposition

Algorithm

C = the set of all attributes in the relation R

Look for an attribute (or a set of attributes) that meets the following conditions (non-BCNF flag!):

  • X+ != X
  • X+ != C

If such attribute exists, it is a violation of the BCNF condition; decompose R:

  • R1 = X+
  • R2 = (C - X+) union (X)
  • Normalize each table (recursive call)

Else, the table is in BCNF

31 of 34

BCNF

Example

Relation: R (A, B, C, D, E)

FDs:

  • A → E
  • BC → A
  • DE → B

Goal: Decompose R into BCNF

32 of 34

BCNF

Example

R (B, C, A)

R (B, C, D)

R (A, E)

R (A, B, C, D)

R (A, B, C, D, E)

A+ = {A, E}

BCNF compliant

BCNF compliant

BCNF compliant

{B, C}+ = {B, C, A}

33 of 34

BCNF

Example

Notice that {A}+ = {A,E}, which violates the BCNF condition

  • We split R to R1(A,E) and R2(A,B,C,D)
  • R1 satisfies BCNF
  • R2 does not satisfy BCNF because: {B,C}+ = {B,C,A}

{B,C}+ = {B,C,A} violates the BCNF condition

  • *Notice there is no E in R2 so we don't need to consider DE → B
  • Split R2 to: R21(B,C,A) and R22(B,C,D)
  • R21 and R21 satisfy BCNF now

34 of 34

Worksheet