1 of 38

Normalization

2 of 38

The Evils of Redundancy

  • Redundancy is at the root of several problems associated with relational schemas:
    • redundant storage, insert/delete/update anomalies
  • Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.
  • Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
  • Decomposition should be used judiciously:
    • Is there a reason to decompose a relation?
    • What problems (if any) does the decomposition solve?

3 of 38

Functional Dependency

  • It is a constraint that specifies the relationship between two sets of attributes where one set can accurately determine the value of other sets.
  • It is denoted by an arrow "→".
  • The functional dependency of X on Y is represented by X → Y, where X is a set of attributes that is capable of determining the value of Y.
  • The attribute set X is called determinant and Y is called the dependent.

4 of 38

Let us understand functional dependency with example:

From the above , some valid functional dependencies are

empno → {ename, d_name, d_building}

empno → {dname}

d_name → {d_building}

Here are some invalid functional dependencies:

ename → {d_name}

d_building → {d_name}

5 of 38

Advantages

  • It plays a vital role to find the difference between good and bad database design.

  • They are used to mathematically express relations among database entities.

  • It helps to maintain the quality of data in the database.

  • It avoids data redundancy. Therefore, same data do not repeat at multiple locations in that database.

6 of 38

Armstrong’s axioms�

  • These are a complete set of inference rules, which are used to infer all the functional dependencies on a relational database.

  • There are 3 rules:
    • Reflexivity: If Y ⊆ X, then XY.
    • Augmentation: If XY, then XZYZ for any Z
    • Transitivity: If X Y and YZ, then XZ.

  • Couple of additional rules :
    • Union: If X Y and X Z, then X YZ
    • Decomposition: If XYZ, then XY and XZ

7 of 38

Types of FDs

1. Trivial FD

  • If dependent is subset to determinant, then such type of fd is called trivial fd.
  • For example,

{empno, ename} ename

empno empno

2. Non – trivial FD

  • If dependent is not a subset to determinant, then such type of fd is called non-trivial fd.
  • For example,

empno ename.

{empno, ename} age

8 of 38

Types of FDs (cont.)

3. Transitive FD

  • If dependent is indirectly dependent on determinant, then such type of fd is called transitive dependency.
  • For example,

empno d_name and d_name d_building then empno d_building .

4. Multi – valued FD

  • If the entities of the dependent set are not dependent on each other. i.e., If X → {Y, Z} and there exists no functional dependency between Y and Z, then it is called a multi - valued fd.
  • For example,

empno {ename, deptno}

is a multi - valued fd since the dependents ename and deptno are not dependent on each other.

9 of 38

Closure of an Attribute Set

  • The set of all those attributes which can be functionally determined from an attribute set is called as a closure of that attribute set.
  • Closure of attribute set {X} is denoted as {X}+.
  • It is useful in finding the candidate key of a given relation with its functional dependency.
  • Algorithm

Closure = X;

Repeat until there is no change

{

If there is an FD U V in F such that

U ⊆ closure

Then set closure = closure U V

}

10 of 38

Example

  • Let R (A, B, C, D) be a relation with the following functional dependencies:

F { AB, B C, CD}.

Now, let us find the closure of some attributes and attribute sets-

Closure of attribute A-

A+ = { A }

= { A , B , C } ( Using A B )

= { A , B , C } ( Using B C )

= { A , B , C , D } ( Using C D )

Thus,

A+ = { A , B , C , D } 

Since A can determine all the attributes of the given relation therefore A is a candidate key.

11 of 38

Exercises

  1. Suppose, a relational schema R (A, B, C, D, E) and set of functional dependencies:

F { ABC,

CDE,

BD,

EA }

Compute CD+, E+.

2. Suppose, a relational schema R (A, B, C, D, E, F) and set of functional dependencies:

F { ABC,

BC AD,

D E,

CF B }

Compute BCF+, CD+,D+?

12 of 38

Normal forms

  • Normalization is the process of minimizing redundancy from a relation or set of relations.
  • It is a process of decomposing large complex tables into simpler tables.
  • Normal forms are used to eliminate or reduce redundancy in database tables.
  • Levels of Normalization: There are six normal forms. They are:
    1. First Normal Form (1NF)
    2. Second Normal Form (2NF)
    3. Third Normal Form (3NF)
    4. BCNF (3.5 NF)
    5. Fourth Normal Form (4NF)
    6. Fifth Normal Form (5NF)

13 of 38

What is redundancy & Why should we reduce it?

  • Repetition of data increases the size of the database.

  • Other issues like
    • Insertion problems
    • Deletion problems
    • Updation problems

14 of 38

15 of 38

Insertion anomaly

16 of 38

Deletion anomaly

17 of 38

Deletion anomaly

18 of 38

Updation Anomaly

19 of 38

Updation anomaly

20 of 38

Updation anomaly

21 of 38

Now the million dollar question is “How Normalization will solve all these problems?”

?

22 of 38

23 of 38

24 of 38

25 of 38

26 of 38

27 of 38

First Normal Form

  • For a table to be in the First Normal Form, it should follow the following 4 rules:
    1. It should only have single(atomic) valued attributes/columns.
    2. Values stored in a column should be of the same domain
    3. All the columns in a table should have unique names.
    4. The order in which data is stored, does not matter.

Example

28 of 38

Second Normal Form

A relation is said to be in 2NF if and only if:

    • It is in first normal form.
    • There are no partial dependencies should exist in a relation.

What is partial dependency?

  • A partial dependency is a dependency where a portion of the key attribute determines non-key attribute(s).

29 of 38

Consider the following relation <student-course> to understand 2NF:

Pd: sid🡪sname, cid🡪cname

30 of 38

  • To remove partial dependency and make the relation comply with 2NF, it needs to be decomposed into two smaller relations

R1(sid, sname, cid), R2(cid, cname).

31 of 38

Third Normal form

  • A relation is said to be in 3NF if and only if :
    • It is in 2NF
    • There is no transitive functional dependencies should exist in a relation.
  • Let us take relation Course to better understand 3NF:

32 of 38

The following dependencies exist:

    • cid {c_name, c_fee, marks}
    • marks grade

Td: cid grade

The table is in 1NF and in 2NF. But there is a transitive dependency between marks and grade, this violates the 3NF.

To get the third normal form (3NF), we have to put the grade in a separate table together with the clearing number to identify it.

33 of 38

34 of 38

BCNF (3.5 Normal Form)

  • A relation is said to be in BCNF if and only if:
    • It is in 3NF.
    • Every determinant should be a candidate key.
  • Example: Suppose there is a company where employees work in more than one department. They store the data like this:
  • Let us see an example − SportsClub

  • FDs:
  • package🡪 ground

35 of 38

Now the above tables are in BCNF.

Candidate key for <Package> table are Package and Ground

Candidate key for <TomorrowBookings> table are {Ground, Begin_Time} and {Ground, End_Time}

36 of 38

Fourth Normal Form

  • A relation is said to be in 4NF if and only if
    • It is in BCNF
    • There must be no multi- valued dependency.
  • To understand it clearly, consider a relation with Subject, Lecturer and Books.

  • Key = {subject, lecturer, books}.

37 of 38

  • The table satisfies the rule of BCNF. But the table still have data redundancy due to multi valued dependencies:
  • subject → lecturer
  • subject → books
  • lecturer and books are also independent to each other
  • Therefore, to satisfy 4NF, it needs to be decomposed into R1 (subject, lecturer) and R2 (subject, books)
  • See how data redundancy is removed by decomposing it into 4NF:

38 of 38