1 of 36

Normalization

Introduction to Database

By 

Michael Kona

2 of 36

Normalization

  • It is the process of determining how much redundancy exists in a table. 
  • The goals of normalization are to:
    • Be able to characterize the level of redundancy in a relational schema
    • Provide mechanisms for transforming schemas in order to remove redundancy

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

3 of 36

Normalization

  • Normalization theory draws heavily on the theory of functional dependencies.
  • Normalization theory defines six normal forms (NF).
    • First Normal Form (1NF)
    • Second Normal Form (2NF)
    • Third Normal Form (3NF)
    • Boyce Codd Normal Form (BCNF)
    • Fourth Normal Form (4NF)
    • Fifth Normal Form (5NF)
    • Sixth Normal Form (6NF)

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

4 of 36

First Normal Form (1NF)

  • only single values are permitted at the intersection of each row and column(in a cell); hence, there are no repeating groups.
  • To normalize a relation that contains a repeating group, remove the repeating group and form two new relations.
  • The PK of the new relation is a combination of following for unique identification.
    • the PK of the original relation 
    • an attribute from the newly created relation 

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

5 of 36

Process for 1NF

  • Example:-
    • Student_Grade_Report (StudentNo, StudentName, Major, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)   - Initial Table
  • First normal form
    • An attribute (column) of a table cannot hold multiple values. It should hold only atomic values
  • For a single student there can be multiple courses.
  • Example:-
    • Student (StudentNo, StudentName, Major)   - Table1
    • StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)   -  Table2

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

6 of 36

Example 

The First normal form simply says that each cell of a table should contain exactly one value.

In the first row, we are storing 2 courses against Prof. George. This isn’t the optimal way

If we want to edit some information related to CS101, we do not have to touch the data corresponding to CS154. Also, observe that each row stores unique information. 

7 of 36

Second Normal Form (2NF)

  • For the second normal form, the relation must first be in 1NF
  • The relation is automatically in 2NF if, and only if, the PK comprises a single attribute.
  • If the relation has a composite PK, then each non-key attribute must be fully dependent on the entire PK and not on a subset of the PK (i.e., there must be no partial dependency or augmentation).

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

8 of 36

Example for 2NF

  • Following Tables from 1NF
    • Student (StudentNo, StudentName, Major)
    • StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)
  • Except 'Grade' remaining all attributes has Partial dependency on PK (studentno & courseno)
    • Student (StudentNo, StudentName, Major)
    • CourseGrade (StudentNo, CourseNo, Grade)
    • CourseInstructor (CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation)

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

9 of 36

Example

Source url:-https://codeandwork.github.io/courses/cs/database_normalization.html​

10 of 36

Third Normal Form (3NF)

To be in third normal form, the relation must be in second normal form.

Also all transitive dependencies must be removed;

a non-key attribute may not be functionally dependent on another non-key attribute.

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

11 of 36

Process for 3NF

  • Example – Tables before 3NF
    • Student (StudentNo, StudentName, Major)
    • CourseGrade (StudentNo, CourseNo, Grade)
    • CourseInstructor (CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation)
  • Transitive dependency - 'InstructorNo' has dependency with 'CourseNo' and 'InstructorName' has dependency with 'InstructorNo'.
  • Eliminate all dependent attributes in transitive relationship(s) from each of the tables that have a transitive relationship.
  • Create new table(s) with removed dependency.
  • Check new table(s) as well as table(s) modified to make sure that each table has a determinant and that no table contains inappropriate dependencies.
  • Example - Tables after transformation
    • Student (StudentNo, StudentName, Major)
    • CourseGrade (StudentNo, CourseNo, Grade)
    • Course (CourseNo, CourseName, InstructorNo)
    • Instructor (InstructorNo, InstructorName, InstructorLocation)

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

Transitive Dependency

Course No --> Instructor No --> Instructor Name

12 of 36

Example

Source url:-https://codeandwork.github.io/courses/cs/database_normalization.html​

13 of 36

Boyce-Codd Normal Form (BCNF)

  • When a table has more than one candidate key, anomalies may exist even though the relation is in 3NF. 
  • Boyce-Codd normal form is a special case of 3NF. 

  • A relation is in BCNF if, and only if, every determinant is a candidate key.

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

14 of 36

BCNF Example

Student_id

Major

Advisor

111

Physics

Smith

111

Music

Chan

320

Math

Dobbs

671

Physics

White

803

Physics

Smith

Consider the following table (St_Maj_Adv).

The semantic rules (business rules applied to the database) for this table are:

  1. Each Student may major in several subjects.
  2. For each Major, a given Student has only one Advisor.
  3. Each Major has several Advisors.
  4. Each Advisor advises only one Major.
  5. Each Advisor advises several Students in one Major.

The functional dependencies for this table are listed below.

 The first one is a candidate key; the second is not.

  1. Student_id, Major ——> Advisor
  2. Advisor ——> Major

Anomalies for this table include:

  1. Delete – student deletes advisor info
  2. Insert – a new advisor needs a student
  3. Update – inconsistencies

Note: No single attribute is a candidate key.​

PK can be Student_id, Major or Student_id, Advisor​

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

15 of 36

BCNF Example – contd.

  • To reduce the St_Maj_Adv relation to BCNF, you create two new tables:
  • St_Adv (Student_id, Advisor)
  • Adv_Maj (Advisor, Major)

Student_id

Advisor

111

Smith

111

Chan

320

Dobbs

671

White

803

Smith

Advisor

Major

Smith

Physics

Chan

Music

Dobbs

Math

White

Physics

St_Adv table

Adv_Maj table

https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/

16 of 36

Fourth Normal Form (4NF)

  • Rules for 4NF
    • Meet all the requirements of the third normal form. 
    • A relation is in 4NF if it has no multi-valued dependencies.

https://www.studytonight.com/dbms/fourth-normal-form.php

17 of 36

Multi Valued Dependency (MVD)

  • A table is said to have multi-valued dependency, if the following conditions are true,
    • For a dependency A → B, if for a single value of A, multiple value of B exists, then the table may have multi-valued dependency.
    • Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
    • And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C should be independent of each other.
  • If all these conditions are true for any relation(table), it is said to have multi-valued dependency.

https://www.studytonight.com/dbms/fourth-normal-form.php

18 of 36

4NF - Example

https://en.wikipedia.org/wiki/Fourth_normal_form

pizza varieties offered by a restaurant are not affected by delivery area 

19 of 36

Lossless Join Decomposition

Relation Decomposition

  • Decomposition:
    • The process of decomposing the universal relation schema R into a set of relation schemas D = {R1,R2, …, Rm} that will become the relational database schema by using the functional dependencies.  

  • Attribute preservation condition:
    • Each attribute in R will appear in at least one relation schema Ri in the decomposition so that no attributes are “lost”.
  • Another goal of decomposition is to have each individual relation Ri in the decomposition D be in BCNF or 3NF. 

20 of 36

Lossless Join Decomposition

  • Lossless (Non-additive) Join Property of a Decomposition: 
    • Definition:
      • Lossless join property: a decomposition D = {R1, R2, ..., Rm} of R has the lossless (nonadditive) join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:  
  • * (p R1(r), ..., pRm(r)) = r
    • Note: The word loss in lossless refers to loss of information, not to loss of tuples. In fact, for “loss of information” a  better term is “addition of spurious information

21 of 36

Lossless Join Decomposition

  • Algorithm 11.1: Testing for Lossless Join Property 
  • Input: A universal relation R, a decomposition D = {R1, R2, ..., Rm} of R, and a set F of functional dependencies. 
  • 1. Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R.
  • 2. Set S(i,j):=bij for all matrix entries. (* each bij is a distinct symbol associated with indices (i,j) *).
  • 3. For each row i representing relation schema Ri 
  •   {for each column j representing attribute Aj 
  •       {if (relation Ri includes attribute Aj) then set S(i,j):= aj;};};
  • (* each aj is a distinct symbol associated with index (j) *)

22 of 36

Lossless Join Decomposition

  • 4. Repeat the following loop until a complete loop execution results in no changes to S 

  {for each functional dependency X gY in F 

  {for all rows in S which have the same symbols in the columns corresponding to attributes in X

         {make the symbols in each column that correspond to an attribute in Y be the same in all these rows as follows:

  If any of the rows has an “a” symbol for the column, set the other rows to that same “a” symbol in the column.

  If no “a” symbol exists for the attribute in any of the rows, choose one of the “b” symbols that appear in one of the rows for the attribute and set the other rows to that same “b” symbol in the column ;};

  };

  };

  • 5. If a row is made up entirely of “a” symbols, then the decomposition has the lossless join property; otherwise it does not.

23 of 36

Lossless Join Decomposition

24 of 36

Join Dependency and 5NF

JD - Definition: 

  • A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R.
  • The constraint states that every legal state r of R should have a non-additive join decomposition into R1, R2, ..., Rn; that is, for every such r we have

                                        * (pR1(r), pR2(r), ..., pRn(r)) = r

  • A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R

25 of 36

Fifth normal form

  • 5NF is also known as project-join normal form (PJ/NF)
  •  It is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.

26 of 36

Fifth normal form

  • table is said to be in the 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys.
  • A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.

Source:-https://en.wikipedia.org/wiki/Fifth_normal_form

27 of 36

Example - 5NF

  • PK -  composite of all three columns. 
  • table is in 4NF
  • Problem:-
    • Suppose that Jack Schneider starts selling Robusto's products Breadboxes and Vacuum Cleaners. 
    • We would have to add two new entries one for each product type 
      • <Jack Schneider, Robusto, Breadboxes>
      • <Jack Schneider, Robusto, Vacuum Cleaners>

Source:-https://en.wikipedia.org/wiki/Fifth_normal_form

28 of 36

Example - 5NF

  •  Suppose that Jack Schneider starts selling Robusto's products Breadboxes and Vacuum Cleaners.

  • we need to add only a single entry (<Jack Schneider, Robusto>) in Brands By Traveling Salesman.

Source:-https://en.wikipedia.org/wiki/Fifth_normal_form

Note:- A table T is in fifth normal form (5NF) or Project-Join Normal Form (PJNF) if it cannot have a lossless decomposition into any number of smaller tables. 

29 of 36

  • Fifth normal form (5NF)
  • A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.
  • 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
  • 5NF is also known as Project-join normal form (PJ/NF).

30 of 36

Example

SUBJECT

LECTURER

SEMESTER

Computer

Anshika

Semester 1

Computer

John

Semester 1

Math

John

Semester 1

Math

Akash

Semester 2

Chemistry

Praveen

Semester 1

In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2. 

In this case, combination of all these fields required to identify a valid data.

Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave Lecturer and Subject as NULL. 

But all three columns together acts as a primary key, so we can't leave other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

31 of 36

p3

SEMESTER

SUBJECT

Semester 1

Computer

Semester 1

Math

Semester 1

Chemistry

Semester 2

Math

SUBJECT

LECTURER

Computer

Anshika

Computer

John

Math

John

Math

Akash

Chemistry

Praveen

SEMSTER

LECTURER

Semester 1

Anshika

Semester 1

John

Semester 1

John

Semester 2

Akash

Semester 1

Praveen

p2

p1

32 of 36

Inclusion Dependency

  • Inclusion dependency would happen if projecting R on its key attributes yields a relation that is contained in the relation obtained by projecting S on its key attributes.

33 of 36

Inclusion Dependency

  • Definition
    • Suppose Ri(Ax ..... Am) and Rj(Bx ..... Bp) are two relation schemes (not necessarily distinct) in a database scheme D = {R1 ..... Rn}. 
    • If X is a sequence of k distinct members of Ax ..... Am, and 
    • if Y is a sequence of k distinct members of Bx ..... Bp, 
    • then we say that the Inclusion Dependency (IND) Ri[X]⊆Rj[Y] holds in a database d if whenever ri and rj are relations in d, it must be the case that
      • ri[X]⊆rj[Y]. 

"Logical Database Design with Inclusion Dependencies" by Tok Wang LING and Cheng Hian GOH Department of Information Systems & Computer Science, National University of Singapore. 

34 of 36

axiomatization for INDs

  • IND1 (reflexivity): R[X]⊆R[X], if X is a sequence of distinct attributes of R. 

  • IND2 (projection & permutation): if R[A1 ..... Am] ⊆ S[B1 ..... Bm], then R[Ai1..... Aim] ⊆ S[Bi1 ..... Bim], for each sequence i1 ..... ik of distinct integers from { 1 ..... m }. 

  • IND3 (transitivity): if R[X]⊆S[Y] and S[Y]⊆T[Z], then R[X]⊆T[Z]. 

35 of 36

36 of 36