1 of 31

Lecture:�Design Theory – Part 1

2 of 31

3 of 31

Example Enrollment table - “v0”

~375

cs145

students

~300

cs245

students

Problems

Repeats?

Room/time change?

Deletes?

Properties

Class -> Room/time

Room -> Lat, Lng

(more compact)

4 of 31

Example Enrollment table - “v1”

375

cs145

students

300

cs245

students

5 of 31

Design Theory

  • Design theory: how to represent your data to avoid anomalies.
        • Note: different than concurrency anomalies.

  • Simple algorithms for “best practices”

6 of 31

Data Anomalies & Constraints

7 of 31

Constraints Prevent (some) �Anomalies in the Data

Student

Course

Room

Mary

CS145

B01

Joe

CS145

B01

Sam

CS145

B01

..

..

..

If every course is in only one room, contains redundant information!

A poorly designed database causes anomalies:

8 of 31

Constraints Prevent (some) �Anomalies in the Data

Student

Course

Room

Mary

CS145

B01

Joe

CS145

C12

Sam

CS145

B01

..

..

..

If we update the room number for one tuple, we get inconsistent data = an update anomaly

A poorly designed DB can have anomalies:

9 of 31

Constraints Prevent (some) �Anomalies in the Data

Student

Course

Room

..

..

..

If everyone drops the class, we lose what room the class is in! = a delete anomaly

A poorly designed database causes anomalies:

10 of 31

Constraints Prevent (some) �Anomalies in the Data

Student

Course

Room

Mary

CS145

B01

Joe

CS145

B01

Sam

CS145

B01

..

..

..

Similarly, we can’t reserve a room without students = an insert anomaly

A poorly designed database causes anomalies:

CS229

C12

11 of 31

Constraints Prevent (some) �Anomalies in the Data

Student

Course

Mary

CS145

Joe

CS145

Sam

CS145

..

..

Course

Room

CS145

B01

CS229

C12

What are “good” decompositions?

Is this form better?

  • Redundancy?
  • Update anomaly?
  • Delete anomaly?
  • Insert anomaly?

12 of 31

Functional Dependencies

13 of 31

A Picture Of FDs

Definition:

Given attribute sets A={A1,…,Am} and B = {B1,…Bn} in R,

The functional dependency A→ B on R holds if for any ti,tj in R:

if ti[A1] = tj[A1] AND ti[A2]=tj[A2] AND … AND ti[Am] = tj[Am]

then ti[B1] = tj[B1] AND ti[B2]=tj[B2] AND … AND ti[Bn] = tj[Bn]

A1

Am

B1

Bn

ti

tj

If ti,tj agree here..

…they also agree here!

A->B means that

“whenever two tuples agree on A then they agree on B.”

14 of 31

FDs for Relational Schema Design

High-level idea: why do we care about FDs?

    • Start with some relational schema

    • Find functional dependencies (FDs)

    • Use these to design a better schema

One which minimizes the possibility of anomalies

15 of 31

Functional Dependencies as Constraints

Student

Course

Room

Mary

CS145

B01

Joe

CS145

B01

Sam

CS145

B01

..

..

..

Note: The FD {Course} -> {Room} holds on this table instance

However, cannot prove that the FD {Course} -> {Room} holds on all instances. That is, FDs are for an instance and not for schema

16 of 31

Functional Dependencies as Constraints

Student

Course

Room

Mary

CS145

B01

Joe

CS145

B01

Sam

CS145

B01

..

..

..

Note that:

  • You can check if an FD is violated by examining a single instance;
  • However, you cannot prove that an FD is part of the schema by examining a single instance.
    • This would require checking every valid instance

17 of 31

More Examples

An FD is a constraint which holds, or does not hold on an instance:

EmpID

Name

Phone

Position

E0045

Smith

1234

Clerk

E3542

Mike

9876

Salesrep

E1111

Smith

9876

Salesrep

E9999

Mary

1234

Lawyer

18 of 31

More Examples

{Position} → {Phone}

EmpID

Name

Phone

Position

E0045

Smith

1234

Clerk

E3542

Mike

9876 ←

Salesrep

E1111

Smith

9876 ←

Salesrep

E9999

Mary

1234

Lawyer

19 of 31

More Examples

EmpID

Name

Phone

Position

E0045

Smith

1234 →

Clerk

E3542

Mike

9876

Salesrep

E1111

Smith

9876

Salesrep

E9999

Mary

1234 →

Lawyer

but not {Phone} → {Position}

20 of 31

Example

(mega)

Enrollment table - “v0”

~375

cs145

students

~300

cs245

students

Problems

Repeats?

Room/time change?

Deletes?

FDs

Class -> Room,Time

Room -> Lat, Lng

(more compact)

21 of 31

Table Decomposition

R1 = the projection of R on A1, ..., An, B1, ..., Bm

R(A1,...,An,B1,...,Bm,C1,...,Cp)

R1(A1,...,An,B1,...,Bm)

R2(A1,...,An,C1,...,Cp)

R2 = the projection of R on A1, ..., An, C1, ..., Cp

22 of 31

Conceptual Design

For a “mega” table

      • Search for “bad” dependencies

      • If any, keep decomposing the table into sub-tables until no more bad dependencies

      • When done, the database schema is normalized

Note: there are several “good” (normal) forms…

23 of 31

Finding Functional Dependencies

1. {Name} → {Color}

2. {Category} → {Department}

3. {Color, Category} → {Price}

Name

Color

Category

Dep

Price

Gizmo

Green

Gadget

Toys

49

Widget

Black

Gadget

Toys

59

Gizmo

Green

Whatsit

Garden

99

Which / how many other FDs do?!?

Provided FDs:

Products

Given the provided FDs,

{Name, Category} → {Price} must also hold on any instance

Example:

24 of 31

Inferring FDs

Given a set of FDs, F = {f1,…fn}, does an FD g hold?

(Similar idea to Logic inferencing. E.g, A implies B, B implies C, so A implies C)

Armstrong’s Rules.

  1. Split/Combine
  2. Reduction
  3. Transitivity

25 of 31

1. Split/Combine

A1

Am

B1

Bn

A1, …, Am → B1,…,Bn

… is equivalent to the following n FDs…

A1,…,Am → Bi for i=1,…,n

26 of 31

1. Split/Combine

A1

Am

B1

Bn

A1, …, Am → B1,…,Bn

… is equivalent to …

And vice-versa, A1,…,Am → Bi for i=1,…,n

27 of 31

2. Reduction/Trivial

A1

Am

If B is subset of A (=A1,…,Am ) for any j=1,…,m

A → B

28 of 31

3. Transitive Closure

A1

Am

B1

Bn

C1

Ck

A1, …, Am → B1,…,Bn and

B1,…,Bn → C1,…,Ck

29 of 31

3. Transitive Closure

A1

Am

B1

Bn

C1

Ck

A1, …, Am → B1,…,Bn and

B1,…,Bn → C1,…,Ck

implies

A1,…,Am → C1,…,Ck

30 of 31

Finding Functional Dependencies

1. {Name} → {Color}

2. {Category} → {Department}

3. {Color, Category} → {Price}

Name

Color

Category

Dep

Price

Gizmo

Green

Gadget

Toys

49

Widget

Black

Gadget

Toys

59

Gizmo

Green

Whatsit

Garden

99

Which / how many other FDs hold?

Provided FDs:

Products

Example:

31 of 31

Finding Functional Dependencies

1. {Name} → {Color}

2. {Category} → {Dept.}

3. {Color, Category} → {Price}

What’s an algorithmic way to do this?

Provided FDs:

Example:

Inferred FD

Rule used

4. {Name, Category} -> {Name}

5. {Name, Category} -> {Color}

6. {Name, Category} -> {Category}

7. {Name, Category} -> {Color, Category}

8. {Name, Category} -> {Price}

Trivial

Transitive (4 -> 1)

Trivial

Transitive (7 -> 3)

Split/Combine (5 + 6)