1 of 38

CSE 414: Section 4

RA + Database Design Theory

4/20/23

2 of 38

Announcements

  • Homework 3 due 4/27 (next Thursday)
  • Azure setup demo is pinned on Ed

3 of 38

Relational Algebra

4 of 38

RA Operators

Standard:

⋂ - Intersect

⋃ - Union

- Difference

σ - Select

π - Project

⍴ - Rename

Extended:

δ - Duplicate Elim.

ɣ - Group/Agg.

τ - Sorting

Joins:

⨝ - Nat. Join

⟕ - L.O. Join

⟖ - R.O. Join

⟗ - F.O. Join

✕- Cross Product

4

5 of 38

RA Operators

Standard:

⋂ - Intersect

⋃ - Union

- Difference

σ - Select

π - Project

⍴ - Rename

Extended:

δ - Duplicate Elim.

ɣ - Group/Agg.

τ - Sorting

Joins:

⨝ - Nat. Join

⟕ - L.O. Join

⟖ - R.O. Join

⟗ - F.O. Join

✕- Cross Product

5

6 of 38

Ɣ Notation

Grouping and aggregation on group:

ɣattr_1, …, attr_k, count/sum/max/min(attr) -> alias

Aggregation on the entire table:

ɣcount/sum/max/min(attr) -> alias

6

7 of 38

Format

  • Follows FWGHOS Structure

7

8 of 38

Query Plans (Example SQL -> RA)

Select-Join-Project structure

Make this SQL query into RA (remember FWGHOS):

SELECT R.b, T.c, max(T.a) AS T_max

FROM Table_R AS R, Table_T AS T

WHERE R.b = T.b

GROUP BY R.b, T.c

HAVING max(T.a) > 99

8

9 of 38

Query Plans (Example SQL -> RA)

Select-Join-Project structure

Make this SQL query into RA (remember FWGHOS):

SELECT R.b, T.c, max(T.a) AS T_max

FROM Table_R AS R, Table_T AS T

WHERE R.b = T.b

GROUP BY R.b, T.c

HAVING max(T.a) > 99

πR.b, T.c, T_maxT_max>99R.b, T.c, max(T.a)->T_max(R R.b=T.b T)))

9

10 of 38

Difference Operator

SELECT DISTINCT R.a

FROM Table_R AS R

WHERE NOT EXISTS (

SELECT *

FROM Table_S AS S

WHERE S.b = R.a

AND S.c < 15

);

10

  • We need to correctly exclude rows if they exist in the subquery
  • We cannot use σ (select) to compare rows
  • Solution is to use the (difference) operator!

11 of 38

Difference Operator

SELECT DISTINCT R.a

FROM Table_R AS R

WHERE NOT EXISTS (

SELECT *

FROM Table_S AS S

WHERE S.b = R.a

AND S.c < 15);

11

πR2.a

12 of 38

Difference Operator

SELECT DISTINCT R.a

FROM Table_R AS R

WHERE NOT EXISTS (

SELECT *

FROM Table_S AS S

WHERE S.b = R.a

AND S.c < 15);

Equivalent SQL:

SELECT DISTINCT * FROM Table_R R2

EXCEPT

SELECT R1.a FROM Table_R R1, Table_S S

WHERE S.c < 15 AND R1.a = S.b;

12

πR2.a

13 of 38

Database Design

14 of 38

Database Design

14

Database Design

Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.

Consider what data needs to be stored and the interrelationship of the data.

Arbitrary Data

Database

There are no good or bad designs, just tradeoffs.

Design Section

15 of 38

Data Relationship Discovery

How do we find relationships between attributes?

Flights dataset:

Did you know that every canceled flight�has an actual_time of 0?

  1. Domain approach
  2. Data mining approach

15

Design Section

16 of 38

Data Relationship Discovery

Domain approach:

Before looking at the data, �think about the domain

Research the domain and realize that �canceled flights have a 0 flight time!

16

Design Section

17 of 38

Data Relationship Discovery

Data mining approach:

Don’t worry about the domain, �just find patterns in the data

Look at the data and notice that every time canceled = 1, then actual_time = 0.

SELECT DISTINCT actual_time

FROM Flights

WHERE canceled = 1;

The only actual_time value is 0

17

Design Section

18 of 38

Data Relationship Discovery

How do we find relationships between attributes?

18

Domain Approach

E/R Diagrams

Data Approach

Functional Dependencies (FDs)

Boyce-Codd Normal Form (BCNF)

Design Section

19 of 38

ER Diagrams

  • Visual graph of entities and relationships

19

Product

Company

Person

employs

buys

makes

price

name

ceo

name

address

address

name

id

Design Section

20 of 38

Relationships

  • one-one: ssn - UW student id
  • one-many: ssn - phone #
  • many-many: store - product
  • is-a: computer - PC and computer - Mac
  • has-a: country - city
    • What country does the city of Cambridge belong to?

20

 

“at most one”

“exactly one”

is a

“subclass”

no constraint

other constraint

Design Section

21 of 38

E/R Demo

Create an ER diagram containing the following:

  • The id, name, gender, country of birth, �and country region (continent) of birth for students and faculty
  • The major of students
  • The salary of faculty
  • The courses that each student takes
  • The department each course is offered in

21

Design Section

22 of 38

E/R Demo – Sample Solution

22

The id, name, gender, country of birth, and country region of birth for students and faculty

The major of students

The salary of faculty

The courses that each student takes

The department each course is offered in

Design Section

23 of 38

E/R Demo – Sample Solution

23

The id, name, gender, country of birth, and country region of birth for students and faculty

The major of students

The salary of faculty

The courses that each student takes

The department each course is offered in

Country (name, region)

People (id, country_name, gender, p_name)

Faculty (People.id, salary)

Student (People.id, major)

Course (courseID, dept)

Took (Student.People.id, Course.courseID)

Design Section

24 of 38

Functional Dependencies

A functional dependency (FD) for relation R is a formula of the form A → B

where A and B are sets or individual attributes of R.

We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.

DEPENDENCY != CAUSATION

*in other words: A determines B, or A is kinda sorta a key for B

25 of 38

Example

Student(studentID, name, dateOfBirth, phoneNumber)

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

26 of 38

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

27 of 38

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

28 of 38

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

29 of 38

Example

Do the following FDs hold?

{studentID} → {name}

{studentID} → {name, dateOfBirth}

{studentID} → {phoneNumber}

There are two tuples that agree on studentID but don’t agree on phoneNumber!

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

30 of 38

Trivial FDs

Axiom of Reflexivity

A trivial FD is one where the RHS is a subset of the LHS

That is, every attribute on the RHS is also on the LHS

The two FDs below are trivial:

{name} → {name}

{studentID,name} → {name}

31 of 38

Inference Rules

  • If b is a subset of a, then ab (reflexivity or trivial FD)
    • then a → a (if a and b are subsets of each other, aka a = b!)

  • If a → b then a, c → b, c (augmentation)
    • then a, c → b
    • but not a → b, c
    • “Adding on the left is chill. Adding on the right though 🚩🚩🚩🚩🚩🚩”

  • If a → b & b → c then a → c (transitivity)
    • If abc and c → d then a → bd (pseudo transitivity) or even a -> bcd

32 of 38

Closure

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • Suppose we have the attributes A → B and B → C
    • To determine the closure of A we start with trivial FD: A → A and X = {A}
    • Since we have A → B, we add B to X = {A, B}
    • We also have {A, B, C}
    • With this we can know that A → A, B, and B → C, so we can add C to X =
    • Formally, the closure of A is written as {A}+ = {A, B, C}

33 of 38

Closures (quicker)

Goal:

  • We want everything that an attribute/set of attributes determines

Observation:

  • If we have A → B and B → C, then A → C
  • So really, A → B, C
  • Add A to the RHS so A → A, B, C
  • Formally, the closure of A is written as {A}+ = {A, B, C}

34 of 38

Example

Given studentID → name, dateOfBirth:

{studentID}+ = {studentID, name, dateOfBirth}

studentID

name

dateOfBirth

phoneNumber

1234

Allison

01-09-1999

253-876-1028

1235

Ben

03-10-2000

206-999-1923

1236

Chris

05-23-1999

206-127-1010

1234

Allison

01-09-1999

253-307-1678

35 of 38

Keys

We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.

If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

36 of 38

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

B -> C, D

Question: is {A, B} a superkey, minimal key, neither, or both?

Question: is {B} a superkey, minimal key, neither, or both?

Question: is {A} a superkey, minimal key, neither or both?

superkey

neither

both

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

37 of 38

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

C -> D

Question: is {A, C} a superkey, minimal key, or both?

Question: is {A, B, C, D} a superkey, minimal key, or both?

Question: is {A, C, D} a superkey, minimal key, or both?

both

superkey

superkey

A superkey determines all other attributes in a schema

A minimal key is the smallest set of attributes that determines all other attributes in a schema

By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.

38 of 38

Worksheet Time!