1 of 68

Module:5�Relational–Database Design

Ms. Khushbu Tikhe

Assistant Professor

Department of Electronics Engineering

SLRTCE, Mira Road

2 of 68

Pitfalls in Relational-Database designs

Relational database design requires that we find a “good” collection of relation schemas.

but A bad design may lead to

Repetition of Information.
Inability to represent certain information.

Design Goals:

Avoid redundant data
Ensure that relationships among attributes are represented
Facilitate the checking of updates for violation of database integrity constraints.

3 of 68

Pitfalls in Relational-Database designs

In general, the goal of a relational-database design is to generate a set of relation schemas that allows us to store information without unnecessary redundancy, yet also allows us to retrieve information easily. One approach is to design schemas that are in an appropriate normal form.
Among the undesirable properties that a bad design may have are.

1. Repetition of information

2. Inability to represent certain information

Example: The information concerning loans is now kept in one singlerelation, lending, which is defined over the relation schema.

Lending-schema=(branch-name, branch-city, assets, customer- name, loan-number, amount)

5 of 68

Problems:

1. add a new loan The loan is made by the Perryridge branch to Adams in the amount of $1500. Let the loan-number be L-31.

We add the tuple (Perryridge, Horseneck, 1700000,Adams, L-31, 1500)

ⅰ Repeating information wastes space.

ⅱ Repeating information complicates updating the database.

2. we cannot represent directly the information concerning a branch.(branch-name, branch-city,assets) unless there exists at least one loan at the branch.

The problem is that tuples in the lending relation require values for loan-number, amount and customer-name.

Solution: introduce null values to handle updates through views.

6 of 68

Among the undesirable properties that a bad design may have

3. Dependency of various attributes of relation:

Dependency specifies how to interpret attribute values in a tuple of relation are related with each other.
if database design is done carefully, followed by a systematic mapping into relation, most of the dependencies will have been accounted for the resulting design should have a simple structure.
ex: EMP(Emp_ID, Name, Salary)
Salary is dependent on Emp_ID, as each employee have specific salary.

4. Loss of information:

If a customer happens to have several loans from different branches, we cannot tell which loan belongs to which branch.
Although we have more tuples in branch-customer customer-loan, we actually have less information.
We are no longer able, in general, to represent in the database information about which customers are borrowers form which branch.

7 of 68

Design Guideliness for Relational Schema

Introduction:

Each relation schema consists of number of attributes, and the relational database schema consist of a number of relation schemas.

15 of 68

Note:

Spurious Tuples :�Spurious Tuples are those rows in a table, which occur as a result of joining two tables in wrong manner. They are extra tuples (rows) which might not be required.

16 of 68

Relational Decomposition

When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is required.
In a database, it breaks the table into multiple tables.
If the relation has no proper decomposition, then it may lead to problems like loss of information.
Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and redundancy.

17 of 68

Types of Decomposition

Lossless join Decomposition
Dependency Preservation
No repetition of information

a. Lossless join Decomposition

If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.
The lossless decomposition guarantees that the join of relations will result in the same relation as it was decomposed.
The relation is said to be lossless decomposition if natural joins of all the decomposition give the original relation.

18 of 68

Example

EMPLOYEE_DEPARTMENT table:

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

19 of 68

EMPLOYEE table:

DEPARTMENT table

20 of 68

Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look like:

Employee ⋈ Department

Hence, the decomposition is Lossless join decomposition.

21 of 68

b. Dependency Preservation

It is an important constraint of the database.
In the dependency preservation, at least one decomposed table must satisfy every dependency.
If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a part of R1 or R2 or must be derivable from the combination of functional dependencies of R1 and R2.
For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The relational R is decomposed into R1(ABC) and R2(AD) which is dependency preserving because FD A->BC is a part of relation R1(ABC).

c. No repetation of information

Decomposition that have done should not suffer from any repetition of information problem.
It is desirable not to have any redundancy in database.

22 of 68

Normalization Process

Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization divides the larger table into the smaller table and links them using relationship.
The normal form is used to reduce redundancy from the database table.
Definition: Normalization is the process of designing a consistent database by minimizing redundancy and ensuring data intigrity through decomposition which is lossless.

23 of 68

Functional Dependencies

The functional dependency is a relationship that exists between two attributes. It typically exists between the primary key and non-key attribute within a table.

X → Y

The left side of FD is known as a determinant, the right side of the production is known as a dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because if we know the Emp_Id, we can tell that employee name associated with it.
Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

24 of 68

Example:�Consider an employee table:

26 of 68

Types of Functional Dependencies

Full Functional Dependency
Partial Functional Dependency
Multivalued dependency:
Trivial functional dependency:
Transitive dependency:

27 of 68

Full Functional Dependency

EXAMPLE:

28 of 68

Partial Functional Dependency

29 of 68

Salary is Partial Functionally Dependent on attribute pair Employee_ID and Project_ID.

30 of 68

Multivalued dependency

Multivalued dependency occurs in the situation where there are multiple independent multivalued attributes in a single table.
A multivalued dependency is a complete constraint between two sets of attributes in a relation.

In this example, maf_year and color are independent of each other but dependent on car_model. In this example, these two columns are said to be multivalue dependent on car_model.

This dependence can be represented like this:

car_model -> maf_year

car_model-> colour

31 of 68

Trivial Functional dependency

The Trivial dependency is a set of attributes which are called a trivial if the set of attributes are included in that attribute.
So, X -> Y is a trivial functional dependency if Y is a subset of X.
For example:

Consider this table with two columns Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id is a subset of {Emp_id,Emp_name}.

32 of 68

Non trivial functional dependency

Nontrivial dependency occurs when A->B holds true where B is not a subset of A. In a relationship, if attribute B is not a subset of attribute A, then it is considered as a non-trivial dependency.

Example:

(Company} -> {CEO} (if we know the Company, we knows the CEO name)

But CEO is not a subset of Company, and hence it's non-trivial functional dependency.

33 of 68

Transitive dependency

A transitive is a type of functional dependency which happens when it is indirectly formed by two functional dependencies.
if one attribute of relation is functionally dependent on other dependent attribute then such a dependency is called as transitive dependency.
Example:

{Company} -> {CEO} (if we know the compay, we know its CEO's name)
{CEO } -> {Age} If we know the CEO, we know the Age
Therefore according to the rule of transitive dependency:
{ Company} -> {Age} should hold, that makes sense because if we know the company name, we can know his age.

34 of 68

FD Properties(Armstrong's Axioms/Closures of FD)

Armstrong axioms are a complete set of inference rules or axioms, introduced and developed by William W. Armstrong in 1974. The inference rules are sound which is used to test logical inferences of functional dependencies.
The Axioms are a set of rules, that when applied to a specific set, generates a closure of functional dependencies.
Armstrong's Axioms has two different set of rules,
Axioms or primary rules
Axiom of Reflexivity (Subset Property)
Axiom of Augmentation (Append Property)
Axiom of Transitivity

2. Additional rules or Secondary rules

Union
Composition
Decomposition
Pseudo Transitivity

35 of 68

1. Axioms or primary rules

Let suppose T (k) with the set of attributes k be a relation scheme. Subsequently, we will represent subsets of k as A, B, C. The standard notation in database theory for the set of attributes is AB rather than A∪B.

Axiom of Reflexivity:

If a set of attributes is P and its subset is Q, then P holds Q. If Q ⊆ P, then P → Q. This property is called as Trivial functional dependency. Where P holds Q (P → Q) denote P functionally decides Q.

Axiom of Augmentation:

If P holds Q (P → Q) and R is a set of attributes, then PR holds QR (PR → QR). It means that a change in attributes in dependencies does not create a change in basic dependencies. If P → Q, then PR → QR for any R.

Axiom of Transitivity:

If P holds Q (P → Q) and Q holds R (Q → R), then P hold R (P → R). Where P holds R (P → R) denote P functionally decides R, same with P holds Q and Q holds R.

36 of 68

2. Additional rules or secondary rules

These rules can be derived from the above axioms.

Union:

If P holds Q (P → Q) and P holds R (P → R), then P → QR. If X → Y and X → Z, then X → YZ.

Composition:

If P holds Q (P → Q) and A holds B (A → B), then PA → QB.
proof,

P → Q (Given)
A → B (Given)
PA → QA (Augmentation of 1 and A)
PA → Q (Decomposition of 3)
PA → PB (Augmentation of 2 and P)
PA → B (Decomposition of 5)
PA → QB (Union 4 and 6)

37 of 68

2. Additional rules or secondary rules

Decomposition:

This rule is contrary of union rule. If P → QR, then P holds Q (P → Q) and P holds R (P → R). If X → YZ, then X → Y and X → Z.
proof,

P → QR (Given)
QR → Q (Reflexivity)
P → Q (Transitivity of 1 and 2)

Pseudo Transitivity:

If P → RQ and Q → S, then P → RS.
proof,

P → RQ (Given)
Q → S (Given)
RQ → RS (Augmentation of 2 and R)
P → RS (Transitivity of 1 and 3)

38 of 68

Keys and Attributes in Keys

39 of 68

Super Keys:- Example

41 of 68

Candidates Key :- Example

45 of 68

First Normal Form(1NF)

Normalization is the process of minimizing redundancy from a relation or set of relations.
Redundancy in relation may cause insertion, deletion and updation anomalies. So, it helps to minimize the redundancy in relations.
Normal forms are used to eliminate or reduce redundancy in database tables.

1. First Normal Form –

If a relation contain composite or multi-valued attribute, it violates first normal form or a relation is in first normal form if it does not contain any composite or multi-valued attribute.
A relation is in first normal form if every attribute in that relation is singled valued attribute.
Defination: it defines that all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.

47 of 68

Minimize Domain Redundancy:

The first formal form will solve the group redundancy occure in domain value as it allow only single value from the domain of that attribute.
So 1NF will solve all problems related to Domain Redundancy.

48 of 68

Second Normal Form(2NF)

To be in second normal form, a relation must be in first normal form and relation must not contain any partial dependency.
A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key of the table.
Partial Dependency – If the proper subset of candidate key determines non-prime attribute, it is called partial dependency.

49 of 68

Example:

{Note that, there are many courses having the same course fee. }

Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence,
COURSE_FEE would be a non-prime attribute, as it does not belong to the one only candidate key {STUD_NO, COURSE_NO} ;
But, COURSE_NO -> COURSE_FEE , i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of the candidate key, which is a partial dependency and so this relation is not in 2NF.

50 of 68

To convert the above relation to 2NF,
we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE

NOTE: 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are 100 students taking C1 course, we dont need to store its Fee as 1000 for all the 100 records, instead once we can store it in the second table as the course fee for C1 is 1000.

51 of 68

Example 2 –

Consider following functional dependencies in relation R (A, B , C, D )

AB -> C [A and B together determine C]
BC -> D [B and C together determine D]

In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any proper subset of AB doesn’t determine any non-prime attribute.

Minimizing Tuple Redundancy:

2NF will avoid same tuples to be repeated in a table as it forces all non key attributes must be full functionally depends on primary key of a relation.
2NF will creat new table for each partial key with all its dependent attributes.

52 of 68

Third Normal Form(3NF)

A relation is in third normal form, if there is no transitive dependency for non-prime attributes as well as it is in second normal form.
A relation is in 3NF if at least one of the following condition holds in every non-trivial function dependency X –> Y
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).
Transitive dependency – If A->B and B->C are two FDs then A->C is called transitive dependency.

Example:

53 of 68

We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows −

Minimizing Group Redundancy:

3NF will avoid repeating groups in same table as it forces all non-prime attributes must be non-transitively depends on key of a relation.
3NF will create a new table for each transitive attritube and its dependent attributes.

54 of 68

Boyce-Codd Normal Form

BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomalies not dealt with by 3NF as originally defined.
Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms.

BCNF states that −

For any non-trivial functional dependency, X → A, X must be a super-key.
In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation ZipCodes. So,

Stu_ID → Stu_Name, Zip

and

Zip → City
Which confirms that both the relations are in BCNF.

55 of 68

Example

As you can see, we have also added some sample data to the table.
In the table above:
One student can enrol for multiple subjects. For example, student with student_id 101, has opted for subjects - Java & C++
For each subject, a professor is assigned to the student.
And, there can be multiple professors teaching one subject like we have for Java.

56 of 68

Well, in the table above student_id, subject together form the primary key, because using student_id and subject, we can find all the columns of the table.
One more important point to note here is, one professor teaches only one subject, but one subject may have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends on the professor name.
This table satisfies the 1st Normal form because all the values are atomic, column names are unique and all the values stored in a particular column are of same domain.
This table also satisfies the 2nd Normal Form as their is no Partial Dependency.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.

57 of 68

Why this table is not in BCNF?
In the table above, student_id, subject form primary key, which means subject column is a prime attribute.

But, there is one more dependency, professor → subject.

And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.

How to satisfy BCNF?
To make this relation(table) satisfy BCNF, we will decompose this table into two tables, student table and professor table.

Below we have the structure for both the tables.

59 of 68

Example2

Employee table.

61 of 68

Employee Table

Department table

62 of 68

Emp_Dept(Employee_ID, Department_ID)

EMP Dept Table

63 of 68

Key Points –

BCNF is free from redundancy.
If a relation is in BCNF, then 3NF is also also satisfied.
If all attributes of relation are prime attribute, then the relation is always in 3NF.
A relation in a Relational Database is always and at least in 1NF form.
Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.
If a Relation has only singleton candidate keys( i.e. every candidate key consists of only 1 attribute), then the Relation is always in 2NF( because no Partial functional dependency possible).
Sometimes going for BCNF form may not preserve functional dependency. In that case go for BCNF only if the lost FD(s) is not required, else normalize till 3NF only.
There are many more Normal forms that exist after BCNF, like 4NF and more. But in real world database systems it’s generally not required to go beyond BCNF.

64 of 68

What is Multi-valued Dependency?

A table is said to have multi-valued dependency, if the following conditions are true,
For a dependency A → B, if for a single value of A, multiple value of B exists, then the table may have multi-valued dependency.
Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C should be independent of each other.
If all these conditions are true for any relation(table), it is said to have multi-valued dependency.

65 of 68

Example

As you can see in the table above, student with s_id 1 has opted for two courses, Science and Maths, and has two hobbies, Cricket and Hockey.
You must be thinking what problem this can lead to, right?
Well the two records for student with s_id 1, will give rise to two more records, as shown below, because for one student, two hobbies exists, hence along with both the courses, these hobbies should be specified.

66 of 68

And, in the table above, there is no relationship between the columns course and hobby. They are independent of each other.

So there is multi-value dependency, which leads to un-necessary repetition of data and other anomalies as well.

67 of 68

Dependency Diagram

A dependency diagram, shown in Figure, illustrates the various dependencies that might exist in a non-normalized table. A non-normalized table is one that has data redundancy in it.

The following dependencies are identified in this table:
ProjectNo and EmpNo, combined, are the PK.
Partial Dependencies:

ProjectNo —> ProjName
EmpNo —> EmpName, DeptNo,
ProjectNo, EmpNo —> HrsWork

Transitive Dependency:

DeptNo —> DeptName

68 of 68

References

https://slideplayer.com/slide/8068305/

https://www.javatpoint.com/dbms-first-normal-form
https://beginnersbook.com/2015/05/normalization-in-dbms/
https://www.tutorialspoint.com/dbms/database_normalization.htm#:~:text=Functional%20dependency%20(FD)%20is%20a,%2C%20...%2C%20Bn.
https://www.guru99.com/dbms-functional-dependency.html
https://www.geeksforgeeks.org/normal-forms-in-dbms/
https://www.studytonight.com/dbms/fourth-normal-form.php

1 of 68

2 of 68

3 of 68

4 of 68

5 of 68

6 of 68

7 of 68

8 of 68

9 of 68

10 of 68

11 of 68

12 of 68

13 of 68

14 of 68

15 of 68

16 of 68

17 of 68

18 of 68

19 of 68

20 of 68

21 of 68

22 of 68

23 of 68

24 of 68

25 of 68

26 of 68

27 of 68

28 of 68

29 of 68

30 of 68

31 of 68

32 of 68

33 of 68

34 of 68

35 of 68

36 of 68

37 of 68

38 of 68

39 of 68

40 of 68

41 of 68

42 of 68

43 of 68

44 of 68

45 of 68

46 of 68

47 of 68

48 of 68

49 of 68

50 of 68

51 of 68

52 of 68

53 of 68

54 of 68

55 of 68

56 of 68

57 of 68

58 of 68

59 of 68

60 of 68

61 of 68

62 of 68

63 of 68

64 of 68

65 of 68

66 of 68

67 of 68

68 of 68