1 of 27

CSE 414: Section 5

Database Design Theory

10/27/22

2 of 27

Announcements

  • Homework 3 late submissions due 10/28
  • Homework 2 grades released!
    • Go over them
    • Regrade requests are open
    • Contact us if you have any doubts

3 of 27

Database Design

3

Database Design

Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.

Consider what data needs to be stored and the interrelationship of the data.

Arbitrary Data

Database

There are no good or bad designs, just tradeoffs.

Design Section

4 of 27

Data Relationship Discovery

How do we find relationships between attributes?

Flights dataset:

Did you know that every canceled flight�has an actual_time of 0?

  1. Domain approach
  2. Data mining approach

4

Design Section

5 of 27

Data Relationship Discovery

Domain approach:

Before looking at the data, �think about the domain

Research the domain and realize that �canceled flights have a 0 flight time!

5

Design Section

6 of 27

Data Relationship Discovery

Data mining approach:

Don’t worry about the domain, �just find patterns in the data

Look at the data and notice that every time canceled = 1, then actual_time = 0.

SELECT DISTINCT actual_time

FROM Flights

WHERE canceled = 1;

The only actual_time value is 0

6

Design Section

7 of 27

Data Relationship Discovery

How do we find relationships between attributes?

7

Domain Approach

E/R Diagrams

Data Approach

Functional Dependencies (FDs)

Boyce-Codd Normal Form (BCNF)

Design Section

8 of 27

ER Diagrams

  • Visual graph of entities and relationships

8

Product

Company

Person

employs

buys

makes

price

name

ceo

name

address

address

name

id

Design Section

9 of 27

Relationships

  • one-one: ssn - UW student id (foreign key)
  • one-many: ssn - phone # (foreign key)
  • many-many: store - product (create a new table)
  • is-a: computer - PC and computer - Mac
  • has-a: country - city
    • What country does the city of Cambridge belong to?

9

 

“at most one”

“exactly one”

is a

“subclass”

no constraint

other constraint

Design Section

10 of 27

E/R Demo

Create an ER diagram containing the following:

  • The id, name, gender, country of birth, �and country region (continent) of birth for students and faculty
  • The major of students
  • The salary of faculty
  • The courses that each student takes
  • The department each course is offered in

10

Design Section

11 of 27

E/R Demo – Sample Solution

11

The id, name, gender, country of birth, and country region of birth for students and faculty

The major of students

The salary of faculty

The courses that each student takes

The department each course is offered in

Design Section

12 of 27

E/R Demo – Sample Solution

12

The id, name, gender, country of birth, and country region of birth for students and faculty

The major of students

The salary of faclty

The courses that each student takes

The department each course is offered in

Country (name, region)

People (id, country_name, gender, p_name)

Faculty (People.id, salary)

Student (People.id, major)

Course (courseID, dept)

Took (Student.People.id, Course.courseID)

Design Section

13 of 27

Functional Dependencies

A functional dependency (FD) for relation R is a formula of the form A → B

where A and B are sets or individual attributes of R.

We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.

DEPENDENCY != CAUSATION

*in other words: A determines B, or A is kinda sorta a key for B

14 of 27

Example

Student(studentID, name, dateOfBirth, phoneNumber)

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

15 of 27

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

16 of 27

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

17 of 27

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

18 of 27

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

There are two tuples that agree on studentID but don’t agree on phoneNumber!

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

19 of 27

Trivial FDs

Axiom of Reflexivity

A trivial FD is one where the RHS is a subset of the LHS

That is, every attribute on the RHS is also on the LHS

The two FDs below are trivial:

{name} → {name}

{studentID,name} → {name}

20 of 27

Inference Rules

  • If b is a subset of a, then ab (reflexivity or trivial FD)
    • then a → a (if a and b are subsets of each other, aka a = b!)

  • If a → b then a, c → b, c (augmentation)
    • then a, c → b
    • but not a → b, c
    • “Adding on the left is chill. Adding on the right though 🚩🚩🚩🚩🚩🚩”

  • If a → b & b → c then a → c (transitivity)
    • If abc and c → d then a → bd (pseudo transitivity) or even a -> bcd

21 of 27

Closure

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • Suppose we have the attributes A → B and B → C
    • To determine the closure of A we start with trivial FD: A → A and X = {A}
    • Since we have A → B, we add B to X = {A, B}
    • We also have {A, B, C}
    • With this we can know that A → A, B, and B → C, so we can add C to X =
    • Formally, the closure of A is written as {A}+ = {A, B, C}

22 of 27

Closures (quicker)

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • If we have A → B and B → C, then A → C
  • So really, A → B, C
  • Add A to the RHS so A → A, B, C
  • Formally, the closure of A is written as {A}+ = {A, B, C}

23 of 27

Example

Given studentID → name, dateOfBirth:

{studentID}+ = {studentID, name, dateOfBirth}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

24 of 27

Keys

We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.

If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

25 of 27

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

B -> C, D

Question: is {A, B} a superkey, minimal key, neither, or both?

Question: is {B} a superkey, minimal key, neither, or both?

Question: is {A} a superkey, minimal key, neither or both?

superkey

neither

both

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

26 of 27

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

C -> D

Question: is {A, C} a superkey, minimal key, or both?

Question: is {A, B, C, D} a superkey, minimal key, or both?

Question: is {A, C, D} a superkey, minimal key, or both?

both

superkey

superkey

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

27 of 27

customer