1 of 29

CSE 414: Section 5

Database Design Theory

Feb 3, 2020

Snigdha

Aaditya

2 of 29

Announcements

  • HW3 due on Saturday at 11pm
  • Come to OH or post on Ed if you need help!

3 of 29

Agenda

  1. ER Diagram Review
  2. Functional Dependencies
  3. Closures
  4. Superkeys and Minimal Keys
  5. BCNF

4 of 29

Database Design

4

October 22, 2020

Database Design

Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.

Consider what data needs to be stored and the interrelationship of the data.

Arbitrary Data

Database

There are no good or bad designs, just tradeoffs.

Design Section

5 of 29

Data Relationship Discovery

How do we find relationships between attributes?

5

October 22, 2020

Domain Approach

E/R Diagrams

Data Approach

Functional Dependencies

Boyce-Codd Normal Form

(BCNF)

Design Section

6 of 29

Building Blocks

6

October 22, 2020

Entity (nouns)

E.g. Company, Ocean, Employee

Attributes (characteristics of entities)

E.g. Name, Temperature, Salary

Relations among entities (verbs)

E.g. Employs, Borders, Oversees

Design Section

7 of 29

Entity Sets

7

October 22, 2020

Design Section

8 of 29

Relationships

  • one-one: ssn - UW student id
  • one-many: ssn - phone#
  • many-many: store - product
  • is-a: computer - PC and computer - Mac
  • has-a: country - city
    • What country does the city of Cambridge belong to?

8

October 22, 2020

 

“at most one”

“exactly one”

is a

“subclass”

no constraint

other constraint

Design Section

9 of 29

ER Diagrams

  • Visual graph of entities and relationships

9

October 22, 2020

Product

Company

Person

employs

buys

makes

price

name

ceo

name

address

address

name

id

Design Section

10 of 29

Functional Dependencies

A functional dependency (FD) for relation R is a formula of the form A → B

where A and B are sets or individual attributes of R.

We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.

*in other words: A determines B, or A is kinda sorta a key for B

11 of 29

Example

Student(studentID, name, dateOfBirth, phoneNumber)

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

12 of 29

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

13 of 29

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

14 of 29

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

15 of 29

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

There are two tuples that agree on studentID but don’t agree on phoneNumber!

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

16 of 29

Trivial FDs

A trivial FD is one where the RHS is a subset of the LHS

That is, every attribute on the RHS is also on the LHS

The two FDs below are trivial:

{name} → {name}

{studentID,name} → {name}

Note that every LHS will always determine itself, but they are not always explicitly written for the sake of convenience.

17 of 29

Inference Rules

  • If b is a subset of a, then ab (reflexivity or trivial FD)
    • then a → a (if a and b are subsets of each other, aka a = b!)

  • If a → b then a, c → b, c (augmentation)
    • then a, c → b
    • but not a → b, c
    • Adding on the left is chill. Adding on the right though 🚩🚩🚩🚩🚩🚩

  • If a → b & b → c then a → c (transitivity)
    • Another example: if abc and cd then a → bd (pseudo transitivity) or even a -> bcd

18 of 29

Closure

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • Suppose we have the dependencies A → B and B → C
    • To determine the closure of A we start with trivial FD: A → A and X = {A}
    • Since we have A → B, we add B to X = {A, B}
    • With this we can know that A → B, C because B → C, so we can add C to X = {A, B, C}
    • Formally, the closure of A is written as {A}+ = {A, B, C}

19 of 29

Closures (quicker)

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • If we have A → B and B → C, then A → C
  • So really, A → B, C
  • Add A to the RHS so A → A, B, C
  • Formally, the closure of A is written as {A}+ = {A, B, C}

20 of 29

Example

Given studentID → name, dateOfBirth:

{studentID}+ = {studentID, name, dateOfBirth}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

21 of 29

Keys

We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.

If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

22 of 29

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

B -> C, D

Question: is {A, B} a superkey, minimal key, neither, or both?

Question: is {B} a superkey, minimal key, neither, or both?

Question: is {A} a superkey, minimal key, neither or both?

superkey

neither

both

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

23 of 29

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

C -> D

Question: is {A, C} a superkey, minimal key, or both?

Question: is {A, B, C, D} a superkey, minimal key, or both?

Question: is {A, C, D} a superkey, minimal key, or both?

both

superkey

superkey

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

24 of 29

Boyce-Codd Normal Form (BCNF)

A relation R is in BCNF if every set of attributes in R is either a superkey or its closure is the same set.

*that is, either {a}+ -> a OR {a}+ -> {all cols}, where a is any col in the table

  • BCNF reduces redundancy at the expense of creating additional tables. For example, repetition of student information in the above examples.
  • BCNF may have several correct decompositions.

25 of 29

BCNF Decomposition Algorithm

X +

26 of 29

BCNF Decomposition Algorithm

C = the set of all attributes in the relation R

Look for an attribute (or a set of attributes) that meets the following conditions (non-BCNF flag!):

  • X+ != X
  • X+ != C

If such attribute exists, it is a violation of the BCNF condition; decompose R:

  • R1 = X+
  • R2 = (C - X+) union (X)
  • Normalize each table (recursive call)

Else, the table is in BCNF

27 of 29

Example

Relation: R (A, B, C, D, E)

FDs:

  • A → E
  • BC → A
  • DE → B

Goal: Decompose R into BCNF

28 of 29

Example

R (A, B, C, D, E)

R (A, E)

R (A, B, C, D)

R (B, C, A)

R (B, C, D)

A+ = {A, E}

BCNF compliant

BCNF compliant

BCNF compliant

{B, C}+ = {B, C, A}

29 of 29

Solution

Notice that {A}+ = {A,E}, which violates the BCNF condition

  • We split R to R1(A,E) and R2(A,B,C,D)
  • R1 satisfies BCNF
  • R2 does not satisfy BCNF because: {B,C}+ = {B,C,A}

Notice there is no E in R2 so we don't need to consider FD DE → B

  • Split R2 to: R21(B,C,A) and R22(B,C,D)
  • R21 and R21 satisfy BCNF now