1 of 18

Schema Normalization

From FD’s to Keys to 3NF (or higher)

2 of 18

Review: Functional Dependency

Given the attributes: A1 A2 … An → B1 B2 … Bn

  • ” means “determines”
  • If two rows have the same values A1 A2 … An
  • Then they have the same values B1 B2 … Bn

Usually the right side is a single attribute

Attributes are optionally separated by commas

FD’s are useful to identify keys in the database!

3 of 18

Review: Keys

  • Superkey: a set of attributes that can uniquely define a tuple
  • Candidate key: a minimal superkey
    • if you remove any attributes, it’s no longer a superkey
  • Primary key: a candidate key selected to have the role of “primary” key
  • Example:

CAR( VIN, Make, Model, Year, State, LicPlateNum, OwnerID )

Some superkeys: {VIN, Make, Model, Year}, {VIN, State, OwnerID},

{VIN, State, LicPlateNum, OwnerID}, etc.

Candidate keys: {VIN}, {State, LicPlateNum}

Primary key: {VIN}

4 of 18

Review: Normal Forms

Principles for improving database design:

  • First Normal Form (1NF): Every cell of a table contains exactly one value, and each row is uniquely identifiable.
  • Second Normal Form (2NF): All non-key columns depend on the whole primary key (not just part of the key).
  • Third Normal Form (3NF): All non-key columns depend on the key, the whole key, and nothing but the key (no transitive dependencies).

See Example 3.10.9 from the textbook

5 of 18

Example 1

  • Given the relation:

R(A, B, C, D, E)

  • And the set of functional dependencies:

{CD → AB, B → E}

  • Find all candidate keys
  • Identify the highest normal form that R satisfies
  • If not in 3NF, decompose until reaching 3NF

6 of 18

Example 1

  • Given the relation:

R(A, B, C, D, E)

  • And the set of functional dependencies:

{CD → AB, B → E}

  • Find all candidate keys - CD
  • Identify the highest normal form that R satisfies - 2NF
  • If not in 3NF, decompose until reaching 3NF

7 of 18

Example 1

  • B -> E is a transitive FD
  • Decompose into:

R1(C, D, A, B)

R2(B, E)

8 of 18

Finding Keys

Look at the left and right sides of the functional dependencies

  • Only on the left = part of every key
  • Only on the right = not part of any key
  • On both sides = consider all subsets

For example: AB → C, C→ B, C→ D

  • Must have A, and will either have B, C, or BC
  • Solution: AB and AC are both keys

Left

Both

Right

A

BC

D

9 of 18

Example 2

  • Given the relation:

R(A, B, C, D, E, F, G, H)

  • And the set of functional dependencies:

{DE → C, E → A, D → BF, F → GH}

  • Find all candidate keys
  • Identify the highest normal form that R satisfies
  • If not in 3NF, decompose until reaching 3NF

10 of 18

Example 2

  • Given the relation:

R(A, B, C, D, E, F, G, H)

  • And the set of functional dependencies:

{DE → C, E → A, D → BF, F → GH}

  • Find all candidate keys - DE
  • Identify the highest normal form that R satisfies - 1NF
  • If not in 3NF, decompose until reaching 3NF

11 of 18

Example 2

  • Two partial key dependencies: E -> A and D -> BF
  • Decompose to 2NF:

R1(D, E, C), R2(E, A), R3(D, B, F, G, H)

  • Transitive dependency F -> GH still exists
  • Decompose to 3NF:

R1(D, E, C), R2(E, A)

R3(D, B, F), R4(F, G, H)

12 of 18

Boyce-Codd Normal Form (BCNF) and higher

  • Formally, a table is in Third Normal Form (3NF) if, whenever a non-key column A depends on column B, then B is unique.
  • A table is in Boyce-Codd Normal Form (BCNF) if, whenever�any column A depends on column B, then B is unique.
  • 4NF and 5NF are usually not used
  • Many other normal forms have been proposed

See Example 3.11.4 from the textbook

Article: Difference Between 3NF and BCNF

13 of 18

Normalization and denormalization

  • Good conceptual models usually yield good relational models
  • Relational models are checked for redundancies and usually normalized to 3NF/BCNF
  • Sometimes, relational database designers purposely allow redundancydenormalization to increase performance by reducing joins
    • In fact, the non-relational (NoSQL) document database model specifically trades redundancy (and all its issues) for read performance

14 of 18

Example 3

  • Given the set of relations:

{R1(A, B, C, D, E, F, G, H, K), R2(A, X, Y, Z)}

  • And the set of functional dependencies:

{AB → CD, E → GH, A → F,D → K, A → XY, Y → Z}

  • Find all candidate keys
  • Identify the highest normal form that R satisfies
  • If not in 3NF, decompose until reaching 3NF

15 of 18

Example 3

  • Given the set of relations:

{R1(A, B, C, D, E, F, G, H, K), R2(A, X, Y, Z)}

  • And the set of functional dependencies:

{AB → CD, E → GH, A → F,D → K, A → XY, Y → Z}

  • Find all candidate keys: ABE
  • Identify the highest normal form satisfied:�1NF with R1 in 1NF and R2 in 2NF
  • If not in 3NF, decompose until reaching 3NF

16 of 18

Example 3

  • Remove partial key dependencies - decompose to 2NF
  • Remove transitive dependencies - decompose to 3NF

R1 (A, B, C, D, K) - 2NF

R12 (E, G, H) - 3NF

R13 (A, F) - 3NF

R2 (A, X, Y, Z) - 2NF

R1 (A, B, C, D) - 3NF

R12 (E, G, H) - 3NF

R13 (A, F) - 3NF

R14 (D, K) - 3NF

R2 (A, X, Y) - 3NF

R21 (Y, Z) - 3NF

17 of 18

Example 3

  • Note that relations R13 and R2 have the same key and combine
  • Final set of relations:

R1 (A, B, C, D) - 3NF

R12 (E, G, H) - 3NF

R14 (D, K) - 3NF

R2 (A, X, Y, F) - 3NF

R21 (Y, Z) - 3NF

18 of 18

Synthesis Algorithm for 3NF

Shortcut (not always optimal)

  1. Identify a minimal set of dependencies
  2. For each functional dependency X → A
    1. Create a table with attributes from X and A
    2. Drop tables that are subsets of other tables
  3. If none of the tables is a superkey:
    • Create a table whose schema is a key