2 of 102

INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW

A database is a collection of data, typically describing the activities of one or more related organizations. For example, a university database might contain information about the following:

• Entities such as students, faculty, courses, and classrooms.

• Relationships between entities, such as students' enrollment in courses, faculty teaching courses, and the use of rooms for courses.

A database management system, or DBMS, is software designed to assist in maintaining and utilizing large collections of data. The need for such systems, as well as their use, is growing rapidly. The alternative to using a DBMS is to store the data in files and write application-specific code to manage it.

3 of 102

INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW

In order to facilitate these functions, DBMS has the following key components:

Software.

DBMS is primarily a software system that can be considered as a management console or an interface to interact with and manage databases. The interfacing also spreads across real-world physical systems that contribute data to the backend databases. The OS, networking software, and the hardware infrastructure is involved in creating, accessing, managing, and processing the databases.

Data.

DBMS contains operational data, access to database records and metadata as a resource to perform the necessary functionality. The data may include files with such as index files, administrative information, and data dictionaries used to represent data flows, ownership, structure, and relationships to other records or objects.

Procedures.

While not a part of the DBMS software, procedures can be considered as instructions on using DBMS. The documented guidelines assist users in designing, modifying, managing, and processing databases.

Database languages.

These are components of the DBMS used to access, modify, store, and retrieve data items from databases; specify database schema; control user access; and perform other associated database management operations. Types of DBMS languages include Data Definition Language (DDL), Data Manipulation Language (DML), Database Access Language (DAL) and Data Control Language (DCL).

4 of 102

INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW

In order to facilitate these functions, DBMS has the following key components:

• Query processor.

As a fundamental component of the DBMS, the query processor acts as an intermediary between users and the DBMS data engine in order to communicate query requests. When users enter an instruction in SQL language, the command is executed from the high-level language instruction to a low-level language that the underlying machine can understand and process to perform the appropriate DBMS functionality. In addition to instruction parsing and translation, the query processor also optimizes queries to ensure fast processing and accurate results.

• Runtime database manager.

A centralized management component of DBMS that handles functionality associated with runtime data, which is commonly used for context-based database access. This component checks for user authorization to request the query; processes the approved queries; devises an optimal strategy for query execution; supports concurrency so that multiple users can simultaneously work on same databases; and ensures integrity of data recorded into the databases.

5 of 102

INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW

In order to facilitate these functions, DBMS has the following key components:

•Database manager.

Unlike the runtime database manager that handles queries and data at runtime, the database manager performs DBMS functionality associated with the data within databases. Database manager allows a set of commands to perform different DBMS operations that include creating, deleting, backup, restoring, cloning, and other database maintenance tasks. The database manager may also be used to update the database with patches from vendors.

•Database engine.

This is the core software component within the DBMS solution that performs the core functions associated with data storage and retrieval. A database engine is also accessible via APIs that allow users or apps to create, read, write, and delete records in databases.

•Reporting.

The report generator extracts useful information from DBMS files and displays it in structured format based on defined specifications. This information may be used for further analysis, decision making, or business intelligence.

6 of 102

INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW

Why Use DBMS?

• To develop software applications In less time.

• Data independence and efficient use of data.

• For uniform data administration.

• For data integrity and security.

• For concurrent access to data, and data recovery from crashes.

• To use user-friendly declarative query language.

Database Applications

• Airlines: reservations, schedules, etc.

• Telecom: calls made customer details, network usage, etc.

• Universities: registration, results, grades, etc.

• Sales: products, purchases, customers, etc.

• Banking: all transactions etc.

7 of 102

INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW

Components of DBMS

•

8 of 102

Database Architecture

9 of 102

Database Architecture

The architecture of a database system is greatly influenced by the underlying computer system on which the database system runs.
Database systems can be centralized, or client-server, where one server machine executes work on behalf of multiple client machines.
Database systems can also be designed to exploit parallel computer architectures. Distributed databases span multiple geographically separated machines
Most users of a database system today are not present at the site of the database system, but connect to it through a network. We can therefore differentiate between client machines, on which remote database users work, and server machines, on which the database system runs.

10 of 102

Database Architecture

Database applications are usually partitioned into two or three parts, as in Figure .
In a two-tier architecture, the application resides at the client machine, where it invokes database system functionality at the server machine through query language statements. Application program interface standards like ODBC and JDBC are used for interaction between the client and the server.
In contrast, in a three-tier architecture, the client machine acts as merely a front end and does not contain any direct database calls. Instead, the client end communicates with an application server, usually through a forms interface.
The application server in turn communicates with a database system to access data.

11 of 102

Data Models in Database Technologies

1. Introduction to Data Models
2. Types of Data Models
3. Importance of Data Modeling

12 of 102

Data Models�

A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.
It provides a structured way to represent and organize data, ensuring consistency and integrity in database systems.

13 of 102

Types of Data Model Structures�

Relational Model:�The relational model shows the logical structure of a database, including the relationships and constraints that determine how data can be stored and accessed.
Entity-Relationship Model:�The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. It is widely used in database design.
Semi-structured Data Model:�The semi-structured data model permits the specification of data where individual data items of the same type may have different sets of attributes, in contrast to traditional data models.

14 of 102

Benefits of Data ModeLling�

Data Organization:�Data modelling allows for the identification of possible relationships between different pieces of information, determining the types of queries that can be run on the database.
Standardization:�It provides a standardized method for defining and formatting database contents, ensuring consistency and integrity.
Communication:�Data models create a visual representation of an information system, facilitating effective communication of data structures and business rules.

15 of 102

Process of Data Modeling�

Conceptual Modeling:�Involves identifying the highest-level relationships between different entities.
Logical Modeling:�Focuses on defining the structure of the data elements and the relationships between them.
Physical Modeling:�Identifies the table structures that will be built in the database, including all tables, columns, primary keys, and foreign keys.

16 of 102

Types of Data Models�

17 of 102

Relational Data Model�

The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations.

18 of 102

Entity-Relationship Data Model�

Represents the relationships between different entities in a database, focusing on the logical structure and the connections between entities.
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity-relationship model is widely used in database design.

19 of 102

Semi-structured Data Model�

It permits the specification of data where individual data items of the same type may have different sets of attributes. This contrasts the above data models, where every data item of a particular type must have the same set of attributes. JSON and XML are widely used semi-structured data representations.

20 of 102

Object-Based Data Model�

Utilizes object-oriented concepts like Java and C++ to represent data, allowing for complex data relationships and encapsulation of data and behaviour.

21 of 102

Importance of Data Modeling�

22 of 102

Organizing Data Effectively�

Data Models are used to Organize Data Effectively:�Data modeling allows for the effective organization of data, ensuring that information is structured in a way that supports efficient storage, retrieval, and manipulation.
Example:�By using data models, organizations can ensure that their databases are structured to support their specific business needs and requirements.

Facilitating Communication

Data Models effectively Facilitate Communication:�Data models provide a visual representation of an information system, facilitating effective communication of data structures and business rules.
Example:�By using data models, teams can communicate more effectively about the structure and organization of the data, leading to better collaboration and understanding.

Importance of Data Modeling�

23 of 102

Enabling Data Analysis�

Data Models Enable Data Analysis:�Data modeling allows for the identification of possible relationships between different pieces of information, determining what type of queries can be run on the database.
Example:�By using data models, analysts can understand the relationships between different data points and structures, enabling more effective data analysis and decision-making.

Ensuring Data Integrity

Data Models Help in Ensuring Data Integrity:�Data models provide a structured way to represent and organize data, ensuring consistency and integrity.

Example:�By using data models, organizations can ensure that their databases maintain data integrity, preventing errors and inconsistencies in the data.

Importance of Data Modeling�

24 of 102

Database Schema and Schema Diagrams

25 of 102

Introduction to Database Schema

Definition:�A database schema is the logical design of the database.
Database schema vs. database instance:�Database schema is the logical design of the database.�Database instance is a snapshot of the data in the database at a given instant in time.

26 of 102

Relation Instance and Relation Schema

Relation Instance:�Corresponds to the programming-language notion of a variable.
Relation Schema:�Corresponds to the programming-language notion of type definition.
The value of a given variable may change with time; similarly the contents of a relation instance may change with time as the relation is updated. In contrast, the schema of a relation does not generally change.

27 of 102

Relating Tuples

Using common attributes in relation schemas is one way of relating tuples of distinct relations

Suppose we wish to find the information about all the instructors who work in the Watson building:

section (course_id, sec_id, semester, year, building, room_number, time_slot_id)
teaches (ID, course_id, sec_id, semester, year)

28 of 102

Other relation examples

student (ID, name, dept_name, tot cred)
advisor (s_id, i_id)
takes (ID, course_id, sec_id, semester, year, grade)
classroom (building, room_number, capacity)
time slot (time_slot_id, day, start_time, end_time)

29 of 102

Schema Diagrams

A database schema, along with primary key and foreign-key constraints, can be depicted by schema diagrams.
Each relation appears as a box, with the relation name at the top in blue and the attributes listed inside the box.

30 of 102

Schema Diagram for the University Database

31 of 102

Attributes

Primary-key attributes are shown underlined.
Foreign-key constraints appear as arrows from the foreign-key attributes of the referencing relation to the primary key of the referenced relation.
We use a two-headed arrow, instead of a single-headed arrow, to indicate a referential integrity constraint that is not a foreign-key constraints.
Many database systems provide design tools with a graphical user interface for creating schema diagrams.

32 of 102

Relational Query Languages

A query language is a language in which a user requests information from the database.
These languages are usually on a level higher than that of a standard programming language.
Query languages can be categorized as either procedural or nonprocedural.
In a procedural language, the user instructs the system to perform a sequence of operations on the database to compute the desired result.
In a nonprocedural language, the user describes the desired information without giving a specific procedure for obtaining that information.
There are a number of “pure” query languages: The relational algebra is procedural, whereas the tuple relational calculus and domain relational calculus are nonprocedural. These query languages are terse and formal, lacking the “syntactic sugar” of commercial languages, but they illustrate the fundamental techniques for extracting data from the database

33 of 102

INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW

Advantages of DBMS

A DBMS manages data and has many benefits. These are:

• Data independence: Application programs should be as free or independent as possible from details of data representation and storage. DBMS can supply an abstract view of the data for insulating application code from such facts.

• Efficient data access: DBMS utilizes a mixture of sophisticated concepts and techniques for storing and retrieving data competently. This feature becomes important in cases where the data is stored on external storage devices.

• Data integrity and security: If data is accessed through the DBMS, the DBMS can enforce integrity constraints on the data.

• Data administration: When several users share the data, integrating the administration of data can offer significant improvements. Experienced professionals understand the nature of the data being managed and can be responsible for organizing the data representation to reduce redundancy and make the data to retrieve efficiently.

34 of 102

FILE SYSTEMS VS DBMS

What is File System?

A File Management system is a DBMS that allows access to single files or tables at a time. In a File System, data is directly stored in set of files. It contains flat files that have no relation to other files (when only one table is stored in single file, then this file is known as flat file).

In a typical file processing system, each and every subsystem of the information system will have its own set of files.

ii. The typical file-oriented system is supported by a conventional operating system. Permanent records are stored in various files and a number of different application programs are written to extract records from and add records to the appropriate files.

To understand the need for a DBMS, let us consider a motivating scenario:

A company has a large collection (say, 500 GB3 ) of data on employees, departments, products, sales, and so on. This data is accessed concurrently by several employees. Questions about the data must be answered quickly, changes made to the data by different users must be applied consistently, and access to certain parts of the data (e.g., salaries) must be restricted.

35 of 102

FILE SYSTEMS VS DBMS

Drawbacks of File system

• Data redundancy: Data redundancy refers to the duplication of data, lets say we are managing the data of a college where a student is enrolled for two courses, the same student details in such case will be stored twice, which will take more storage than needed. Data redundancy often leads to higher storage costs and poor access time.

• Data inconsistency: Data redundancy leads to data inconsistency, lets take the same example that we have taken above, a student is enrolled for two courses and we have student address stored twice, now lets say student requests to change his address, if the address is changed at one place and not on all the records then this can lead to data inconsistency.

• Data Isolation: Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.

• Dependency on application programs: Changing files would lead to change in application programs.

• Atomicity issues: Atomicity of a transaction refers to “All or nothing”, which means either all the operations in a transaction executes or none.

For example: Lets say Steve transfers 100$ to Negan’s account. This transaction consists multiple operations such as debit 100$ from Steve’s account, credit 100$ to Negan’s account. Like any other device, a computer system can fail lets say it fails after first operation then in that case Steve’s account would have been debited by 100$ but the amount was not credited to Negan’s account, in such case the rollback of operation should occur to maintain the atomicity of transaction. It is difficult to achieve atomicity in file processing systems.

• Data Security: Data should be secured from unauthorised access, for example a student in a college should not be able to see the payroll details of the teachers, such kind of security constraints are difficult to apply in file processing systems.

36 of 102

FILE SYSTEMS VS DBMS

File Management System	Database Management System
File System is a general, easy-to-use system to store general files which require less security and constraints.	Database management system is used when security constraints are high.
Data Redundancy is more in file management system.	Data Redundancy is less in database management system.
Data Inconsistency is more in file system.	Data Inconsistency is less in database management system.
Centralisation is hard to get when it comes to File Management System.	Centralisation is achieved in Database Management System.
User locates the physical address of the files to access data in File Management System.	In Database Management System, user is unaware of physical address where data is stored.
Security is low in File Management System.	Security is high in Database Management System.
File Management System stores unstructured data as isolated data files/entities.	Database Management System stores structured data which have well defined constraints and interrelation.

37 of 102

Database Design and ER Diagrams -Entities, Attributes, and Entity Sets

Database Design is a collection of processes that facilitate the designing, development, implementation and maintenance of enterprise data management systems.

The main objectives of database designing are to produce logical and physical designs models of the proposed database system.
The logical model concentrates on the data requirements and the data to be stored independent of physical considerations. It does not concern itself with how the data will be stored or where it will be stored physically.
The physical data design model involves translating the logical design of the database onto physical media using hardware resources and software systems such as database management systems (DBMS).

38 of 102

The database design process can be divided into six steps.

The ER model is most relevant to the first three steps.

1. Requirements Analysis:

The very first step in designing a database application is to understand what data is to be stored in the database, what applications must be built on top of it, and what operations are most frequent and subject to performance requirements. In other words,

we must find out what the users want from the database.

2. Conceptual Database Design:

The information gathered in the requirements analysis step is used to develop a high-level description of the data to be stored in the database, along with the constraints known to hold over

this data. This step is often carried out using the ER model

3. Logical Database Design: We must choose a DBMS to implement our database design, and convert the conceptual database design into a database schema in the data model of the chosen DBMS. We will consider only relational DBMSs, and therefore, the task in the logical design step is to convert an ER schema into a relational database schema.

39 of 102

ENTITIES, ATTRIBUTES, AND ENTITY SETS

An entity is an object in the real world that is distinguishable from other objects.
An entity set is a group of similar entities and these entities can have attributes.
In terms of DBMS, an entity is a table or attribute of a table in database, so by showing relationship among tables and their attributes, ER diagram shows the complete logical structure of a database

In a database management system (DBMS), an attribute is a piece of data that describes an entity.

For example, in a STUDENT database, the attributes might be Student id, Age, name, address, and phone number

40 of 102

ENTITIES, ATTRIBUTES, AND ENTITY SETS

An entity set is a group of distinct entities that have the same properties.
For example, a customer entity might have the name, address, and phone number attributes.
A customer entity set would then be a group of customers with those same attributes.
Entity sets are used to organize data in a database
An entity set is represented by a rectangle
An attribute is represented by an oval.
Each attribute in the primary key is underlined.

41 of 102

ENTITIES, ATTRIBUTES, AND ENTITY SETS

Symbol	Shape Name	Symbol Description
Entities
	Entity	An entity is represented by a rectangle which contains the entity’s name.
	Weak Entity	An entity that cannot be uniquely identified by its attributes alone. The existence of a weak entity is dependent upon another entity called the owner entity. The weak entity’s identifier is a combination of the identifier of the owner entity and the partial key of the weak entity.

42 of 102

ENTITIES, ATTRIBUTES, AND ENTITY SETS

Strong relationship

Weak relationship

43 of 102

ER diagram

An ER diagram shows the relationship among entity sets.
An entity set is a group of similar entities and these entities can have attributes. In terms of DBMS, an entity is a table or attribute of a table in database, so by showing relationship among tables and their attributes, ER diagram shows the complete logical structure of a database.

A simple ER Diagram:

In the following diagram we have two entities Student and College and their relationship. The relationship between Student and College is many to one as a college can have many students however a student cannot study in multiple colleges at the same time. Student entity has attributes such as Stu_Id,Stu_Name&Stu_Addr and College entity has attributes such as Col_ID&Col_Name.

Rectangle: Represents Entity sets.�Ellipses: Attributes�Diamonds: Relationship Set�Lines: They link attributes to Entity Sets and Entity sets to Relationship Set�Double Ellipses: Multivalued Attributes�Dashed Ellipses: Derived Attributes�Double Rectangles: Weak Entity Sets�Double Lines: Total participation of an entity in a relationship set

44 of 102

ER diagram

In ER diagram, attributes associated with an entity set may be of the following types-

45 of 102

ER diagram

Simple Attributes-
Simple attributes are those attributes which can not be divided further.

Composite Attributes-

Composite attributes are those attributes which are composed of many other simple attributes.

Here, the attributes “Name” and “Address” are composite attributes as they are composed of many other simple attributes.

46 of 102

ER diagram

Single Valued Attributes- Single valued attributes are those attributes which can take only one value for a given entity from an entity set.
Example-Here, all the attributes are single valued attributes as they can take only one specific value for each entity.

Multi Valued Attributes-

Multi valued attributes are those attributes which can take more than one value for a given entity from an entity set.

Example-Here, the attributes “Mob_no” and “Email_id” are multi valued attributes as they can take more than one values for a given entity.

47 of 102

ER diagram

5. Derived Attributes- Derived attributes are those attributes which can be derived from other attribute(s).
Example-Here, the attribute “Age” is a derived attribute as it can be derived from the attribute “DOB”.

Key Attributes-

Key attributes are those attributes which can identify an entity uniquely in an entity set.

Example-Here, the attribute “Roll_no” is a key attribute as it can identify any student uniquely.

48 of 102

ER diagram

Relationship
A relationship is represented by diamond shape in ER diagram; it shows the relationship among entities. There are four types of relationships:
1. One to One

2. One to Many

3.Many to One-For example – many students can study in a single college but a student cannot study in many colleges at the same time.

4. Many to Many

49 of 102

Describing and storing data in a DBMS

Introduction:
• A data model is a collection of high-level data description constructs that hide many low-level storage details
• A semantic data model is a more abstract, high-level data model that makes it easier for a user to come up with a good initial description of the data in an enterprise.
• A database design in terms of a semantic model serves as a useful starting point and is subsequently translated into a database design in terms of the data model the DBMS actually supports.
• A widely used semantic data model called the entity-relationship (ER) model allows us to pictorially denote entities and the relationships among them

50 of 102

Describing and storing data in a DBMS

The Relational Model
• The central data description construct in this model is relation, which can be thought of as a set of records.
• A description of data in terms of a data model is called a schema.
• The schema for a relation specifies its name, the name of each field or attribute or column.
• Example: student information in a university database my be stored in a relation with the following schema (with 5 fields):

Students(sid: string, name: string, login: string, age: integer, gpa: real)

SID	NAME	ZOGIN	AGE	GPA
53666	Jones	jones@cs	18	3.4
53688	Smith	smith@ee	18	3.2
53650	Smith	smith@math	19	3.8
53831	Madayan	madayan@music	11	1.8
53832	Guldu	guldu@music	12	2.0

51 of 102

Describing and storing data in a DBMS

• Each row in the Students relation is a record that describes a student.
Every row follows the schema of the Student relation and schema can therefore be regarded as a template for describing a student.
• Other notable models: hierarchical model, network model, object-oriented model, and the object-relational model.

SID	NAME	ZOGIN	AGE	GPA
53666	Jones	jones@cs	18	3.4
53688	Smith	smith@ee	18	3.2
53650	Smith	smith@math	19	3.8
53831	Madayan	madayan@music	11	1.8
53832	Guldu	guldu@music	12	2.0

52 of 102

Describing and storing data in a DBMS

• Data abstraction -: Data abstraction is one of the fundamental characteristic of any database, which helps in making data more accurate and easy to use. Abstraction refers to the act of essential feathers without including background details. Data abstraction refers to the act of representing data without giving details that how data are stored or maintained. There are different levels of abstraction

53 of 102

Describing and storing data in a DBMS

• Physical Level -: It is called the internal level. This level is the lowest level of abstraction which specified storage detail that how the data are store in disk. This layer describe manner in which records are stored either as the collection of pages or as the collection of records.
• Logical Level -: This is the conceptual view .It is next higher level of abstraction what data are stored in the database and what relationship exists among those data. There is only one conceptual schema per database. This schema also contains the method of deriving the object on the conceptual view from the internal views. This level of abstraction is used by database administration.
• View Level-: View level is the highest level of abstraction which describes different views of the entire database. These views are designed according to the requirement of user who want to access only part of database. In database has many several views, according to the demand of individual user or groups of the user. The data in these views are not exactly store in dbms.

54 of 102

Relationships And Relationship Sets

• Relationship
An association among two or more entities. Example: the relationship between a customer and a Sage 2-weight flyrod purchased at the fly shop
Relationship set

A set of relationships of the same type

ssn

Name

lot

Employees

55 of 102

RELATIONSHIPS AND RELATIONSHIP SETS

A relationship is an association among two or more entities. For example, we may have the relationship that Attishoo works in the pharmacy department.

ssn

Name

lot

Employees

56 of 102

Keys

We must have a way to specify how entities within a given entity set are distinguished.
Conceptually, individual entities are distinct; from a database perspective, however, the difference among them must be expressed in terms of their attributes.
Therefore, the values of the attribute values of an entity must be such that they can uniquely identify the entity. In other words, no two entities in an entity set are allowed to have exactly the same value for all attributes.
A key allows us to identify a set of attributes that suffice to distinguish entities from each other. Keys also help uniquely identify relationships, and thus distinguish relationships from each other.
key constraints : key constraints can refer to different concepts, particularly in the realms of databases and cryptography:

57 of 102

Keys

Why we need a Key?
In real world applications, number of tables required for storing the data is huge, and the different tables are related to each other as well.
Also, tables store a lot of data in them. Tables generally extends to thousands of records stored in them, unsorted and unorganised.
Now to fetch any particular record from such dataset, you will have to apply some conditions, but what if there is duplicate data present and every time you try to fetch some data by applying certain condition, you get the wrong data. How many trials before you get the right data?
To avoid all this, Keys are defined to easily identify any row of data in a table.

58 of 102

Keys

SQL PRIMARY KEY Constraint
The PRIMARY KEY constraint uniquely identifies each record in a table.
Primary keys must contain UNIQUE values, and cannot contain NULL values.
A table can have only ONE primary key; and in the table, this primary key can consist of single or multiple columns (fields).
PRIMARY KEY in SQL
PRIMARY KEY in SQL is a column (or group of columns) that uniquely identifies the records in that table. A primary key must contain unique values and can not have any NULL value.
There can only be one primary key in a table, but that primary key can consist of one or more columns. When there are two or more columns in the primary key it is called a composite key.
A primary key automatically has a UNIQUE constraint defined on it, and it ensures that there are no duplicate or NULL values in that column.

59 of 102

Keys

SQL PRIMARY KEY Constraint
There can only be one primary key in a table, but that primary key can consist of one or more columns. When there are two or more columns in the primary key it is called a composite key.
In the above picture we have a Score table which stores the marks scored by a student in a particular subject. In this table student_id and subject_id together will form the primary key; hence it is a composite key.

60 of 102

Keys

SQL PRIMARY KEY Constraint
CREATE TABLE Persons (�PersonID int NOT NULL PRIMARY KEY,�LastName varchar(255) NOT NULL,�FirstName varchar(255),�Age int�);

61 of 102

Unique Constraint

Ensures that values in a column (or a set of columns) are unique across rows.
A unique constraint is a rule applied to a database column or a set of columns to ensure that all values in the column(s) are distinct from one another.
This means that no two rows in a table can have the same value in the column(s) where the unique constraint is applied.

62 of 102

Unique Constraint

1.Purpose: To enforce data integrity by preventing duplicate values in a specified column or columns. This is useful for fields where duplicate entries would be invalid or problematic, such as user IDs or email addresses.
2.Syntax: The exact syntax for creating a unique constraint can vary slightly depending on the database management system (DBMS) you’re using. Here are examples for a few common systems:

- SQL (Standard):

CREATE TABLE employees (

id INT PRIMARY KEY,

email VARCHAR(255) UNIQUE

);

63 of 102

Unique Constraint

1.Purpose: To enforce data integrity by preventing duplicate values in a specified column or columns. This is useful for fields where duplicate entries would be invalid or problematic, such as user IDs or email addresses.
2.Syntax: The exact syntax for creating a unique constraint can vary slightly depending on the database management system (DBMS) you’re using. Here are examples for a few common systems:

- SQL (Standard):

CREATE TABLE Persons (� ID int NOT NULL UNIQUE,� LastName varchar(255) NOT NULL,� FirstName varchar(255),� Age int�);

64 of 102

Foreign Key Constraint

Ensures referential integrity by enforcing relationships between tables.
A foreign key constraint is a rule used in relational databases to maintain referential integrity between two tables.
It ensures that a value in one table (the child table) matches a value in another table (the parent table) and that relationships between tables are consistent.

65 of 102

Normalization

Normalization is the process of organizing data in a database. It includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

66 of 102

Functional Dependency in DBMS

Functional dependencies are used to mathematically express relations among database entities
Functional Dependency is the relationship between attributes(characteristics) of a table related to each other. The functional dependency of A on B is represented by

A → B,

where A and B are the attributes of the relation.

Example:

67 of 102

Functional Dependency in DBMS

What is Functional Dependency in DBMS?
Relational database is a collection of data stored in rows and columns. Columns represent the characteristic of data while each row in a table represents a set of related data, and every row in the table has the same structure. The row is sometimes referred to as a tuple in DBMS.
Have a look at the Employee table below. It contains attributes as column values, namely

Employee_Id

Employee_Name

Employee_Department

Salary

Employee_Id	Employee_Name	Employee_Department	Salary
1	Ryan	Mechanical	$5000
2	Justin	Biotechnology	$5000
3	Andrew	Computer Science	$8000
4	Felix	Human Resource	$10000

Employee Table

68 of 102

Functional Dependency in DBMS

Now that we are clear with the jargon related to functional dependency, let's discuss what functional dependency is.
Functional Dependency in DBMS, as the name suggests it is the relationship between attributes(characteristics) of a table related to each other.
A relation consisting of functional dependencies always follows a set of rules called RAT rules. They are proposed by William Armstrong in 1974.
It helps in maintaining the quality of data in the database, and the core concepts behind database normalization are based on functional dependencies.

Employee_Id	Employee_Name	Employee_Department	Salary
1	Ryan	Mechanical	$5000
2	Justin	Biotechnology	$5000
3	Andrew	Computer Science	$8000
4	Felix	Human Resource	$10000

Employee Table

69 of 102

Functional Dependency in DBMS

Types of Functional Dependencies in DBMS
Trivial functional dependency
Non-Trivial functional dependency
Multivalued functional dependency
Transitive functional dependency

Employee Table

70 of 102

Functional Dependency in DBMS

Types of Functional Dependencies in DBMS
Trivial functional dependency
Non-Trivial functional dependency
Multivalued functional dependency
Transitive functional dependency

Employee Table

71 of 102

Functional Dependency in DBMS

Trivial Functional Dependency in DBMS
In Trivial functional dependency, a dependent is always a subset of the determinant. In other words, a functional dependency is called trivial if the attributes on the right side are the subset of the attributes on the left side of the functional dependency.
X → Y is called a trivial functional dependency if Y is the subset of X.
For example, consider the Employee table below.
Here, { Employee_Id, Name } → { Name } is a Trivial functional dependency, since the dependent Name is the subset of determinant { Employee_Id, Name }.
{ Employee_Id } → { Employee_Id }, { Name } → { Name } and { Age } → { Age } are also Trivial.

Employee Table

Employee_Id	Name	Age
1	Zayn	24
2	Phobe	34
3	Hikki	26
4	David	29

72 of 102

Functional Dependency in DBMS

Multivalued Functional Dependency in DBMS
In Multivalued functional dependency, attributes in the dependent set are not dependent on each other.
For example, X → { Y, Z }, if there exists is no functional dependency between Y and Z, then it is called as Multivalued functional dependency.
For example, consider the Employee table below.

Employee Table

Employee_Id	Name	Age
1	Zayn	24
2	Phobe	34
3	Hikki	26
4	David	29
4	Phobe	24

73 of 102

Functional Dependency in DBMS

Functional dependency in DBMS is an important concept that describes the relationship between attributes (columns) in a table. It shows that the value of one attribute determines the other. In this article, we will learn about functional dependencies and their types.

What is Functional Dependency in DBMS?
A dependency function (FD) is a database constraint that determines the relationship of one attribute to another in a database management system (DBMS). Functional dependencies help maintain the quality of data in the database. Functional dependence is a relationship that exists between two attributes. It usually exists between the primary key and non-prime attributes in the table.

Example: X -> Y

In this case, the left side of the arrow is the determinant and the right of the arrow is dependent. X will be the primary attribute and Y will be a non-prime attribute of the table. It shows that the attributes of column X uniquely identify the attribute of column Y to satisfy this functional dependency.

74 of 102

Functional Dependency in DBMS

75 of 102

Functional Dependency in DBMS

A real-world database has a large number of schemas and an even larger number of attributes. The number of tuples can be in the millions or higher.
Discovering repetition would be costly. There is an even more fundamental problem with this approach
In our example, how would we know that in our university organization, each department (identified by its department name) must reside in a single building and must have a single budget amount? Is the fact that the budget amount for the Comp. Sci. department appears three times with the same budget amount just a coincidence?
We cannot answer these questions without going back to the enterprise itself and understanding its rules. In particular, we would need to discover that the university requires that every department (identified by its department name) must have only one building and one budget value.

76 of 102

Functional Dependency in DBMS

Therefore, we need to allow the database designer to specify rules such as “each specific value for dept name corresponds to at most one budget” even in cases where dept name is not the primary key for the schema in question.
In other words, we need to write a rule that says “if there were a schema (dept name, budget), then dept name is able to serve as the primary key.”
This rule is specified as a functional dependency

dept name → budget

77 of 102

Normalization

Normalization is the process of organizing data in a database. It includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
A large database defined as a single relation may result in data duplication. This repetition of data may result in:
Making relations very large.
It isn't easy to maintain and update data as it would involve searching many records in relation.
Wastage and poor utilization of disk space and resources.
The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the relations with redundant data into smaller, simpler, and well-structured relations that are satisfy desirable properties.

Normalization is a process of decomposing the relations into relations with fewer attributes.

78 of 102

Normalization

What is Normalization?

Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Normalization divides the larger table into smaller and links them using relationships.
The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?

The main reason for normalizing the relations is removing these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that helps to guide you in creating a good database structure.

79 of 102

Normalization

80 of 102

Normalization

Normal Form	Description
1NF	A relation is in 1NF if it contains an atomic value.
2NF	A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the primary key.
3NF	A relation will be in 3NF if it is in 2NF and no transition dependency exists.
BCNF	A stronger definition of 3NF is known as Boyce Codd's normal form.
4NF	A relation will be in 4NF if it is in Boyce Codd's normal form and has no multi-valued dependency.
5NF	A relation is in 5NF. If it is in 4NF and does not contain any join dependency, joining should be lossless.

81 of 102

Normalization First Normal Form (1NF)

A relation will be 1NF if it contains an atomic value.
It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attribute.
First normal form disallows the multi-valued attribute, composite attribute, and their combinations
Example 1 – Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE.
Its decomposition into 1NF has been shown in table 2.

82 of 102

Normalization First Normal Form (1NF)

Example 2

83 of 102

Normalization Second Normal Form (2NF)

A relation is in 2NF if it is in 1NF and any non-prime attribute (attributes which are not part of any candidate key) is not partially dependent on any proper subset of any candidate key of the table.
In other words, we can say that, every non-prime attribute must be fully dependent on each candidate key.
A functional dependency X->Y (where X and Y are set of attributes) is said to be in partial dependency, if Y can be determined by any proper subset of X.
However, in 2NF it is possible for a prime attribute to be partially dependent on any candidate key, but every non-prime attribute must be fully dependent(or not partially dependent) on each candidate key of the table.

84 of 102

Normalization Second Normal Form (2NF)

For a table to be in the Second Normal Form,
It should be in the First Normal form.
And, it should not have Partial Dependency.

What is Partial Dependency?

When a table has a primary key that is made up of two or more columns, then all the columns(not included in the primary key) in that table should depend on the entire primary key and not on a part of it. If any column(which is not in the primary key) depends on a part of the primary key then we say we have Partial dependency in the table.
If we have two tables Students and Subjects, to store student information and information related to subjects.
Students table Subjects table

student_id	student_name	branch
1	Akon	CSE
2	Bkon	Mechanical

subject_id	subject_name
1	C Language
2	DSA
3	Operating System

85 of 102

Normalization Second Normal Form (2NF)

For a table to be in the Second Normal Form,
It should be in the First Normal form.
And, it should not have Partial Dependency.

Students table Subjects table

student_id	student_name	branch
1	Akon	CSE
2	Bkon	Mechanical

subject_id	subject_name
1	C Language
2	DSA
3	Operating System

student_id	subject_id	marks	teacher_name
1	1	70	Miss. C
1	2	82	Mr. D
2	1	65	Mr. Op

And we have another table Score to store the marks scored by students in any subject like this,

86 of 102

Normalization Second Normal Form (2NF)

Now in the above table, the primary key is student_id + subject_id, because both these information are required to select any row of data.
But in the Score table, we have a column teacher_name, which depends on the subject information or just the subject_id, so we should not keep that information in the Score table.
The column teacher_name should be in the Subjects table. And then the entire system will be Normalized as per the Second Normal Form.

Updated Subject table:

subject_id	subject_name	teacher_name
1	C Language	Miss. C
2	DSA	Mr. D
3	Operating System	Mr. Op

Updated Score table:

student_id	subject_id	marks
1	1	70
1	2	82
2	1	65

87 of 102

Normalization 3. Third Normal Form (3NF)

A table is said to be in the Third Normal Form when,
It satisfies the First Normal Form and the Second Normal form.
And, it doesn't have Transitive Dependency.

What is Transitive Dependency?

In a table we have some column that acts as the primary key and other columns depends on this column. But what if a column that is not the primary key depends on another column that is also not a primary key or part of it? Then we have Transitive dependency in our table.

Let's take an example. We had the Score table in the Second Normal Form above. If we have to store some extra information in it, like,

exam_type
total_marks
To store the type of exam and the total marks in the exam so that we can later calculate the percentage of marks scored by each student.

88 of 102

Normalization 3. Third Normal Form (3NF)

A table is said to be in the Third Normal Form when,
It satisfies the First Normal Form and the Second Normal form.
And, it doesn't have Transitive Dependency.

What is Transitive Dependency?

In a table we have some column that acts as the primary key and other columns depends on this column. But what if a column that is not the primary key depends on another column that is also not a primary key or part of it? Then we have Transitive dependency in our table.

Let's take an example. We had the Score table in the Second Normal Form above. If we have to store some extra information in it, like,

exam_type
total_marks
To store the type of exam and the total marks in the exam so that we can later calculate the percentage of marks scored by each student. The Score table will look like this,

student_id	subject_id	marks	exam_type	total_marks
1	1	70	Theory	100
1	2	82	Theory	100
2	1	42	Practical	50

89 of 102

Normalization 3. Third Normal Form (3NF)

In the table above, the column exam_type depends on both student_id and subject_id, because,

a student can be in the CSE branch or the Mechanical branch,
and based on that they may have different exam types for different subjects.
The CSE students may have both Practical and Theory for Compiler Design,
whereas Mechanical branch students may only have Theory exams for Compiler Design.

But the column total_marks just depends on the exam_type column. And the exam_type column is not a part of the primary key. Because the primary key is student_id + subject_id, hence we have a Transitive dependency here.
How to Transitive Dependency?
You can create a separate table for ExamType and use it in the Score table.
New Exam Table

exam_type_id	exam_type	total_marks	duration
1	Practical	50	45
2	Theory	100	180
3	Workshop	150	300

student_id	subject_id	marks	exam_type	total_marks
1	1	70	Theory	100
1	2	82	Theory	100
2	1	42	Practical	50

90 of 102

Normalization Boyce-Codd Normal Form (BCNF)

Boyce and Codd Normal Form is a higher version of the Third Normal Form.
This form deals with a certain type of anomaly that is not handled by 3NF.
A 3NF table that does not have multiple overlapping candidate keys is said to be in BCNF.
For a table to be in BCNF, the following conditions must be satisfied:

R must be in the 3rd Normal Form
and, for each functional dependency ( X → Y ), X should be a Super Key.

Rules for BCNF

For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:
It should be in the Third Normal Form.
And, for any dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means, that for a dependency A → B, A cannot be a non-prime attribute, if B is a prime attribute.

91 of 102

Normalization Boyce-Codd Normal Form (BCNF)

Time for an Example
Below we have a college enrolment table with columns student_id, subject and professor.

student_id	subject	professor
101	Java	P.Java
101	C++	P.Cpp
102	Java	P.Java2
103	C#	P.Chash
104	Java	P.Java

92 of 102

Normalization Boyce-Codd Normal Form (BCNF)

In the table above:
One student can enrol for multiple subjects. For example, student with student_id 101, has opted for subjects - Java & C++
For each subject, a professor is assigned to the student.
And, there can be multiple professors teaching one subject like we have for Java.
What do you think should be the Primary Key?
Well, in the table above student_id, subject together form the primary key, because using student_id and subject, we can find all the columns of the table.
One more important point to note here is, one professor teaches only one subject, but one subject may have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends on the professor name.
This table satisfies the 1st Normal form because all the values are atomic, column names are unique and all the values stored in a particular column are of same domain.
This table also satisfies the 2nd Normal Form as their is no Partial Dependency.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.

student_id	subject	professor
101	Java	P.Java
101	C++	P.Cpp
102	Java	P.Java2
103	C#	P.Chash
104	Java	P.Java

93 of 102

Normalization Boyce-Codd Normal Form (BCNF)

Why this table is not in BCNF?

In the table above, student_id, subject form primary key, which means subject column is a prime attribute.
But, there is one more dependency, professor → subject.
And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.

How to satisfy BCNF?

To make this relation(table) satisfy BCNF, we will decompose this table into two tables, student table and professor table.
Below we have the structure for both the tables.
Students table

student_id	subject	professor
101	Java	P.Java
101	C++	P.Cpp
102	Java	P.Java2
103	C#	P.Chash
104	Java	P.Java

student_id	p_id
101	1
101	2
and so on...

And, Professor Table

p_id	professor	subject
1	P.Java	Java
2	P.Cpp	C++
and so on...

94 of 102

Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)

In database management systems, normalization is an essential process to ensure that data is organized efficiently and effectively. Multivalued dependency (MVD) is a concept that helps to identify and eliminate data redundancy and anomalies, and Fourth Normal Form (4NF) is a normalization form that addresses the challenges associated with multivalued dependency.
Multivalued Dependency (MVD)
Multivalued dependency (MVD) is a type of dependency that exists when a table contains more than one multivalued attribute and changes to one attribute can affect another attribute. In other words, MVD occurs when a table has a non-trivial relationship between attributes that are not part of the same composite key.
Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a third attribute.
A multivalued dependency consists of at least two attributes that are dependent on a third attribute that’s why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company that produces two colors(white and black) of each model every year.

95 of 102

Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)

Multivalued Dependency (MVD)

Multivalued dependency (MVD) is a type of dependency that exists when a table contains more than one multivalued attribute and changes to one attribute can affect another attribute. In other words, MVD occurs when a table has a non-trivial relationship between attributes that are not part of the same composite key.
Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a third attribute.
A multivalued dependency consists of at least two attributes that are dependent on a third attribute that’s why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company that produces two colors(white and black) of each model every year.

NIKE_MODEL	MANUF_YEAR	COLOR
M2011	2008	White
M2001	2008	Black
M3001	2013	White
M3001	2013	Black
M4006	2017	White
M4006	2017	Black

96 of 102

Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)

Multivalued Dependency (MVD)

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.
In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The representation of these dependencies is shown below:
BIKE_MODEL -> -> MANUF_YEAR
BIKE_MODEL -> -> COLOR
This can be read as “BIKE_MODEL multidetermined MANUF_YEAR” and “BIKE_MODEL multidetermined COLOR”.

NIKE_MODEL	MANUF_YEAR	COLOR
M2011	2008	White
M2001	2008	Black
M3001	2013	White
M3001	2013	Black
M4006	2017	White
M4006	2017	Black

97 of 102

Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)

Fourth Normal Form (4NF) is a level of database normalization that requires a relation to be in BCNF and have no non-trivial multivalued dependencies other than the candidate key, to eliminate redundant data and maintain data consistency. If a table violates this standard, it needs to be split into two tables to achieve 4NF.
For a relation R to be in 4NF, it must meet two conditions − It should be in Boyce-Codd Normal Form (BCNF). It should not have any non-trivial multivalued dependencies.
Example:
To remove the multivalued dependency (MVD) in the “Students” table example, we can create two new tables, one for “Courses” and another for “Textbooks,” and establish a relationship between them using foreign keys.
Here’s how we can create the tables:
Table 1: Students

Student ID	Course ID
1	1
1	2
2	3
2	4
2	5
3	6

Table 2: Courses

Table 3: Textbooks

Course ID	Course Name
1	Math
2	Science
3	Art
4	History

Textbook ID	Textbook Name	Course ID
1	Algebra	1
2	Calculus	2
3	Biology	2
4	Chemistry	2
5	Art History	3
6	American History	4

98 of 102

Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)

So, we removed the multivalued dependency by splitting the “Course” and “Textbook” columns into separate tables.
We have also added a new “Course ID” column to the “Students” table. It has a foreign key that references the “Course ID” column in the “Courses” table. Similarly, the “Textbooks” table also has a “Course ID” column that serves as a foreign key referencing the “Course ID” column in the “Courses” table.
Hence, we have achieved the fourth normal form (4NF) for the “Students” table. It has done after by removing the multivalued dependency and creating separate tables. The Resultant schema eliminates data redundancy and improves data integrity, making it easier to manage and query the database.
Table 1: Students

Student ID	Course ID
1	1
1	2
2	3
2	4
2	5
3	6

Table 2: Courses

Table 3: Textbooks

Course ID	Course Name
1	Math
2	Science
3	Art
4	History

Textbook ID	Textbook Name	Course ID
1	Algebra	1
2	Calculus	2
3	Biology	2
4	Chemistry	2
5	Art History	3
6	American History	4

99 of 102

Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)

Examples of Achieving Fourth Normal Form

movie

location

Movie	Location	Genre
PIE	USA	STRATEGY
DOOM	UK	THRILLER
DUNE	UK	ACTION
INTERSTELLAR	USA	ACTION
DHAMAAL	INDIA	COMEDY

MOVIE	LOCATION
PIE	USA
DOOM	UK
DUNE	UK
INTERSTELLAR	USA
DHAMAAL	INDIA

MOVIE	LOCATION
PIE	USA
DOOM	UK
DUNE	UK
INTERSTELLAR	USA
DHAMAAL	INDIA

Now, the table has achieved its 4th Normal Form, by splitting the tables and converting it into its violation free table.

100 of 102

Normalization Fifth normal form (5NF)

A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.
5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
5NF is also known as Project-join normal form (PJ/NF).
n the above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2. In this case, combination of all these fields required to identify a valid data.
Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave Lecturer and Subject as NULL. But all three columns together acts as a primary key, so we can't leave other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

SUBJECT	LECTURER	SEMESTER
Computer	Anshika	Semester 1
Computer	John	Semester 1
Math	John	Semester 1
Math	Akash	Semester 2
Chemistry	Praveen	Semester 1

Example

101 of 102

Normalization Fifth normal form (5NF)

mple

SEMESTER	SUBJECT
Semester 1	Computer
Semester 1	Math
Semester 1	Chemistry
Semester 2	Math

SUBJECT	LECTURER
Computer	Anshika
Computer	John
Math	John
Math	Akash
Chemistry	Praveen

SEMSTER	LECTURER
Semester 1	Anshika
Semester 1	John
Semester 1	John
Semester 2	Akash
Semester 1	Praveen

102 of 102

Normalization Fifth normal form (5NF)

E example of Join Dependency- Suppose we have the following table R:

We can break, or decompose the above table into three tables, this would mean that the table is not in 5NF!
The three decomposed tables would be:

R2: The table with columns E_Name and Product.

E_Name	Company	Product
Rohan	Comp1	Jeans
Harpreet	Comp2	Jacket
Anant	Comp3	TShirt

E_Name	Company
Rohan	Comp1
Harpreet	Comp2
Anant	Comp3

R1: The table with columns E_Name and Company.

R3: The table with columns Company and Product.

E_Name	Product
Rohan	Jeans
Harpreet	Jacket
Anant	TShirt

Company	Product
Comp1	Jeans
Comp2	Jacket
Comp3	TShirt