PCA24C02J �ADVANCED DATABASE TECHNOLOGY�UNIT-1
INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW
A database is a collection of data, typically describing the activities of one or more related organizations. For example, a university database might contain information about the following:
• Entities such as students, faculty, courses, and classrooms.
• Relationships between entities, such as students' enrollment in courses, faculty teaching courses, and the use of rooms for courses.
A database management system, or DBMS, is software designed to assist in maintaining and utilizing large collections of data. The need for such systems, as well as their use, is growing rapidly. The alternative to using a DBMS is to store the data in files and write application-specific code to manage it.
INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW
In order to facilitate these functions, DBMS has the following key components:
Software.
DBMS is primarily a software system that can be considered as a management console or an interface to interact with and manage databases. The interfacing also spreads across real-world physical systems that contribute data to the backend databases. The OS, networking software, and the hardware infrastructure is involved in creating, accessing, managing, and processing the databases.
Data.
DBMS contains operational data, access to database records and metadata as a resource to perform the necessary functionality. The data may include files with such as index files, administrative information, and data dictionaries used to represent data flows, ownership, structure, and relationships to other records or objects.
Procedures.
While not a part of the DBMS software, procedures can be considered as instructions on using DBMS. The documented guidelines assist users in designing, modifying, managing, and processing databases.
Database languages.
These are components of the DBMS used to access, modify, store, and retrieve data items from databases; specify database schema; control user access; and perform other associated database management operations. Types of DBMS languages include Data Definition Language (DDL), Data Manipulation Language (DML), Database Access Language (DAL) and Data Control Language (DCL).
INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW
In order to facilitate these functions, DBMS has the following key components:
• Query processor.
As a fundamental component of the DBMS, the query processor acts as an intermediary between users and the DBMS data engine in order to communicate query requests. When users enter an instruction in SQL language, the command is executed from the high-level language instruction to a low-level language that the underlying machine can understand and process to perform the appropriate DBMS functionality. In addition to instruction parsing and translation, the query processor also optimizes queries to ensure fast processing and accurate results.
• Runtime database manager.
A centralized management component of DBMS that handles functionality associated with runtime data, which is commonly used for context-based database access. This component checks for user authorization to request the query; processes the approved queries; devises an optimal strategy for query execution; supports concurrency so that multiple users can simultaneously work on same databases; and ensures integrity of data recorded into the databases.
INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW
In order to facilitate these functions, DBMS has the following key components:
•Database manager.
Unlike the runtime database manager that handles queries and data at runtime, the database manager performs DBMS functionality associated with the data within databases. Database manager allows a set of commands to perform different DBMS operations that include creating, deleting, backup, restoring, cloning, and other database maintenance tasks. The database manager may also be used to update the database with patches from vendors.
•Database engine.
This is the core software component within the DBMS solution that performs the core functions associated with data storage and retrieval. A database engine is also accessible via APIs that allow users or apps to create, read, write, and delete records in databases.
•Reporting.
The report generator extracts useful information from DBMS files and displays it in structured format based on defined specifications. This information may be used for further analysis, decision making, or business intelligence.
INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW
Why Use DBMS?
• To develop software applications In less time.
• Data independence and efficient use of data.
• For uniform data administration.
• For data integrity and security.
• For concurrent access to data, and data recovery from crashes.
• To use user-friendly declarative query language.
Database Applications
• Airlines: reservations, schedules, etc.
• Telecom: calls made customer details, network usage, etc.
• Universities: registration, results, grades, etc.
• Sales: products, purchases, customers, etc.
• Banking: all transactions etc.
INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW
Components of DBMS
•
Database Architecture
Database Architecture
Database Architecture
Data Models in Database Technologies
Data Models�
Types of Data Model Structures�
Benefits of Data ModeLling�
Process of Data Modeling�
Types of Data Models�
Relational Data Model�
The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations.
Entity-Relationship Data Model�
Semi-structured Data Model�
It permits the specification of data where individual data items of the same type may have different sets of attributes. This contrasts the above data models, where every data item of a particular type must have the same set of attributes. JSON and XML are widely used semi-structured data representations.
Object-Based Data Model�
Utilizes object-oriented concepts like Java and C++ to represent data, allowing for complex data relationships and encapsulation of data and behaviour.
Importance of Data Modeling�
Organizing Data Effectively�
Facilitating Communication
Importance of Data Modeling�
Enabling Data Analysis�
Ensuring Data Integrity
Data Models Help in Ensuring Data Integrity:�Data models provide a structured way to represent and organize data, ensuring consistency and integrity.
Example:�By using data models, organizations can ensure that their databases maintain data integrity, preventing errors and inconsistencies in the data.
Importance of Data Modeling�
Database Schema and Schema Diagrams
Introduction to Database Schema
Relation Instance and Relation Schema
Relating Tuples
Suppose we wish to find the information about all the instructors who work in the Watson building:
Other relation examples
Schema Diagrams
Schema Diagram for the University Database
Attributes
Relational Query Languages
INTRODUCTION OF DATABASE SYSTEMS -OVERVIEW
Advantages of DBMS
A DBMS manages data and has many benefits. These are:
• Data independence: Application programs should be as free or independent as possible from details of data representation and storage. DBMS can supply an abstract view of the data for insulating application code from such facts.
• Efficient data access: DBMS utilizes a mixture of sophisticated concepts and techniques for storing and retrieving data competently. This feature becomes important in cases where the data is stored on external storage devices.
• Data integrity and security: If data is accessed through the DBMS, the DBMS can enforce integrity constraints on the data.
• Data administration: When several users share the data, integrating the administration of data can offer significant improvements. Experienced professionals understand the nature of the data being managed and can be responsible for organizing the data representation to reduce redundancy and make the data to retrieve efficiently.
FILE SYSTEMS VS DBMS
What is File System?
A File Management system is a DBMS that allows access to single files or tables at a time. In a File System, data is directly stored in set of files. It contains flat files that have no relation to other files (when only one table is stored in single file, then this file is known as flat file).
In a typical file processing system, each and every subsystem of the information system will have its own set of files.
ii. The typical file-oriented system is supported by a conventional operating system. Permanent records are stored in various files and a number of different application programs are written to extract records from and add records to the appropriate files.
To understand the need for a DBMS, let us consider a motivating scenario:
A company has a large collection (say, 500 GB3 ) of data on employees, departments, products, sales, and so on. This data is accessed concurrently by several employees. Questions about the data must be answered quickly, changes made to the data by different users must be applied consistently, and access to certain parts of the data (e.g., salaries) must be restricted.
FILE SYSTEMS VS DBMS
Drawbacks of File system
• Data redundancy: Data redundancy refers to the duplication of data, lets say we are managing the data of a college where a student is enrolled for two courses, the same student details in such case will be stored twice, which will take more storage than needed. Data redundancy often leads to higher storage costs and poor access time.
• Data inconsistency: Data redundancy leads to data inconsistency, lets take the same example that we have taken above, a student is enrolled for two courses and we have student address stored twice, now lets say student requests to change his address, if the address is changed at one place and not on all the records then this can lead to data inconsistency.
• Data Isolation: Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.
• Dependency on application programs: Changing files would lead to change in application programs.
• Atomicity issues: Atomicity of a transaction refers to “All or nothing”, which means either all the operations in a transaction executes or none.
For example: Lets say Steve transfers 100$ to Negan’s account. This transaction consists multiple operations such as debit 100$ from Steve’s account, credit 100$ to Negan’s account. Like any other device, a computer system can fail lets say it fails after first operation then in that case Steve’s account would have been debited by 100$ but the amount was not credited to Negan’s account, in such case the rollback of operation should occur to maintain the atomicity of transaction. It is difficult to achieve atomicity in file processing systems.
• Data Security: Data should be secured from unauthorised access, for example a student in a college should not be able to see the payroll details of the teachers, such kind of security constraints are difficult to apply in file processing systems.
FILE SYSTEMS VS DBMS
File Management System | Database Management System |
File System is a general, easy-to-use system to store general files which require less security and constraints. | Database management system is used when security constraints are high. |
Data Redundancy is more in file management system. | Data Redundancy is less in database management system. |
Data Inconsistency is more in file system. | Data Inconsistency is less in database management system. |
Centralisation is hard to get when it comes to File Management System. | Centralisation is achieved in Database Management System. |
User locates the physical address of the files to access data in File Management System. | In Database Management System, user is unaware of physical address where data is stored. |
Security is low in File Management System. | Security is high in Database Management System. |
File Management System stores unstructured data as isolated data files/entities. | Database Management System stores structured data which have well defined constraints and interrelation. |
Database Design and ER Diagrams -Entities, Attributes, and Entity Sets
Database Design is a collection of processes that facilitate the designing, development, implementation and maintenance of enterprise data management systems.
The database design process can be divided into six steps.
The ER model is most relevant to the first three steps.
1. Requirements Analysis:
The very first step in designing a database application is to understand what data is to be stored in the database, what applications must be built on top of it, and what operations are most frequent and subject to performance requirements. In other words,
we must find out what the users want from the database.
2. Conceptual Database Design:
The information gathered in the requirements analysis step is used to develop a high-level description of the data to be stored in the database, along with the constraints known to hold over
this data. This step is often carried out using the ER model
3. Logical Database Design: We must choose a DBMS to implement our database design, and convert the conceptual database design into a database schema in the data model of the chosen DBMS. We will consider only relational DBMSs, and therefore, the task in the logical design step is to convert an ER schema into a relational database schema.
ENTITIES, ATTRIBUTES, AND ENTITY SETS
In a database management system (DBMS), an attribute is a piece of data that describes an entity.
For example, in a STUDENT database, the attributes might be Student id, Age, name, address, and phone number
ENTITIES, ATTRIBUTES, AND ENTITY SETS
ENTITIES, ATTRIBUTES, AND ENTITY SETS
Symbol | Shape Name | Symbol Description |
Entities | ||
| Entity | An entity is represented by a rectangle which contains the entity’s name. |
| Weak Entity | An entity that cannot be uniquely identified by its attributes alone. The existence of a weak entity is dependent upon another entity called the owner entity. The weak entity’s identifier is a combination of the identifier of the owner entity and the partial key of the weak entity. |
ENTITIES, ATTRIBUTES, AND ENTITY SETS
Strong relationship
Weak relationship
ER diagram
A simple ER Diagram:
Rectangle: Represents Entity sets.�Ellipses: Attributes�Diamonds: Relationship Set�Lines: They link attributes to Entity Sets and Entity sets to Relationship Set�Double Ellipses: Multivalued Attributes�Dashed Ellipses: Derived Attributes�Double Rectangles: Weak Entity Sets�Double Lines: Total participation of an entity in a relationship set
ER diagram
ER diagram
Composite Attributes-
Composite attributes are those attributes which are composed of many other simple attributes.
Here, the attributes “Name” and “Address” are composite attributes as they are composed of many other simple attributes.
ER diagram
Multi Valued Attributes-
Multi valued attributes are those attributes which can take more than one value for a given entity from an entity set.
Example-Here, the attributes “Mob_no” and “Email_id” are multi valued attributes as they can take more than one values for a given entity.
.
ER diagram
Key Attributes-
Key attributes are those attributes which can identify an entity uniquely in an entity set.
Example-Here, the attribute “Roll_no” is a key attribute as it can identify any student uniquely.
.
.
ER diagram
Describing and storing data in a DBMS
Describing and storing data in a DBMS
SID | NAME | ZOGIN | AGE | GPA |
53666 | Jones | jones@cs | 18 | 3.4 |
53688 | Smith | smith@ee | 18 | 3.2 |
53650 | Smith | smith@math | 19 | 3.8 |
53831 | Madayan | madayan@music | 11 | 1.8 |
53832 | Guldu | guldu@music | 12 | 2.0 |
Describing and storing data in a DBMS
SID | NAME | ZOGIN | AGE | GPA |
53666 | Jones | jones@cs | 18 | 3.4 |
53688 | Smith | smith@ee | 18 | 3.2 |
53650 | Smith | smith@math | 19 | 3.8 |
53831 | Madayan | madayan@music | 11 | 1.8 |
53832 | Guldu | guldu@music | 12 | 2.0 |
Describing and storing data in a DBMS
Describing and storing data in a DBMS
Relationships And Relationship Sets
A set of relationships of the same type
ssn
Name
lot
Employees
RELATIONSHIPS AND RELATIONSHIP SETS
A relationship is an association among two or more entities. For example, we may have the relationship that Attishoo works in the pharmacy department.
ssn
Name
lot
Employees
Keys
Keys
Keys
Keys
Keys
Unique Constraint
Unique Constraint
- SQL (Standard):
CREATE TABLE employees (
id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE
);
Unique Constraint
- SQL (Standard):
CREATE TABLE Persons (� ID int NOT NULL UNIQUE,� LastName varchar(255) NOT NULL,� FirstName varchar(255),� Age int�);
Foreign Key Constraint
Normalization
Functional Dependency in DBMS
A → B,
where A and B are the attributes of the relation.
Functional Dependency in DBMS
Employee_Id
Employee_Name
Employee_Department
Salary
Employee_Id | Employee_Name | Employee_Department | Salary |
1 | Ryan | Mechanical | $5000 |
2 | Justin | Biotechnology | $5000 |
3 | Andrew | Computer Science | $8000 |
4 | Felix | Human Resource | $10000 |
Employee Table
Functional Dependency in DBMS
Employee_Id | Employee_Name | Employee_Department | Salary |
1 | Ryan | Mechanical | $5000 |
2 | Justin | Biotechnology | $5000 |
3 | Andrew | Computer Science | $8000 |
4 | Felix | Human Resource | $10000 |
Employee Table
Functional Dependency in DBMS
Employee Table
Functional Dependency in DBMS
Employee Table
Functional Dependency in DBMS
Employee Table
Employee_Id | Name | Age |
1 | Zayn | 24 |
2 | Phobe | 34 |
3 | Hikki | 26 |
4 | David | 29 |
Functional Dependency in DBMS
Employee Table
Employee_Id | Name | Age |
1 | Zayn | 24 |
2 | Phobe | 34 |
3 | Hikki | 26 |
4 | David | 29 |
4 | Phobe | 24 |
Functional Dependency in DBMS
Functional Dependency in DBMS
Functional Dependency in DBMS
Functional Dependency in DBMS
dept name → budget
Normalization
Normalization is a process of decomposing the relations into relations with fewer attributes.
Normalization
What is Normalization?
Why do we need Normalization?
Normalization
Normalization
Normal Form | Description |
A relation is in 1NF if it contains an atomic value. | |
A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the primary key. | |
A relation will be in 3NF if it is in 2NF and no transition dependency exists. | |
BCNF | A stronger definition of 3NF is known as Boyce Codd's normal form. |
A relation will be in 4NF if it is in Boyce Codd's normal form and has no multi-valued dependency. | |
A relation is in 5NF. If it is in 4NF and does not contain any join dependency, joining should be lossless. |
Normalization First Normal Form (1NF)
Normalization First Normal Form (1NF)
Normalization Second Normal Form (2NF)
Normalization Second Normal Form (2NF)
What is Partial Dependency?
student_id | student_name | branch |
1 | Akon | CSE |
2 | Bkon | Mechanical |
subject_id | subject_name |
1 | C Language |
2 | DSA |
3 | Operating System |
Normalization Second Normal Form (2NF)
student_id | student_name | branch |
1 | Akon | CSE |
2 | Bkon | Mechanical |
subject_id | subject_name |
1 | C Language |
2 | DSA |
3 | Operating System |
student_id | subject_id | marks | teacher_name |
1 | 1 | 70 | Miss. C |
1 | 2 | 82 | Mr. D |
2 | 1 | 65 | Mr. Op |
And we have another table Score to store the marks scored by students in any subject like this,
Normalization Second Normal Form (2NF)
Updated Subject table:
subject_id | subject_name | teacher_name |
1 | C Language | Miss. C |
2 | DSA | Mr. D |
3 | Operating System | Mr. Op |
Updated Score table:
student_id | subject_id | marks |
1 | 1 | 70 |
1 | 2 | 82 |
2 | 1 | 65 |
Normalization 3. Third Normal Form (3NF)
What is Transitive Dependency?
Let's take an example. We had the Score table in the Second Normal Form above. If we have to store some extra information in it, like,
Normalization 3. Third Normal Form (3NF)
What is Transitive Dependency?
Let's take an example. We had the Score table in the Second Normal Form above. If we have to store some extra information in it, like,
student_id | subject_id | marks | exam_type | total_marks |
1 | 1 | 70 | Theory | 100 |
1 | 2 | 82 | Theory | 100 |
2 | 1 | 42 | Practical | 50 |
Normalization 3. Third Normal Form (3NF)
exam_type_id | exam_type | total_marks | duration |
1 | Practical | 50 | 45 |
2 | Theory | 100 | 180 |
3 | Workshop | 150 | 300 |
student_id | subject_id | marks | exam_type | total_marks |
1 | 1 | 70 | Theory | 100 |
1 | 2 | 82 | Theory | 100 |
2 | 1 | 42 | Practical | 50 |
Normalization Boyce-Codd Normal Form (BCNF)
Rules for BCNF
Normalization Boyce-Codd Normal Form (BCNF)
student_id | subject | professor |
101 | Java | P.Java |
101 | C++ | P.Cpp |
102 | Java | P.Java2 |
103 | C# | P.Chash |
104 | Java | P.Java |
Normalization Boyce-Codd Normal Form (BCNF)
student_id | subject | professor |
101 | Java | P.Java |
101 | C++ | P.Cpp |
102 | Java | P.Java2 |
103 | C# | P.Chash |
104 | Java | P.Java |
Normalization Boyce-Codd Normal Form (BCNF)
Why this table is not in BCNF?
How to satisfy BCNF?
student_id | subject | professor |
101 | Java | P.Java |
101 | C++ | P.Cpp |
102 | Java | P.Java2 |
103 | C# | P.Chash |
104 | Java | P.Java |
student_id | p_id |
101 | 1 |
101 | 2 |
and so on... | |
And, Professor Table
p_id | professor | subject |
1 | P.Java | Java |
2 | P.Cpp | C++ |
and so on... | ||
Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)
Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)
Multivalued Dependency (MVD)
NIKE_MODEL | MANUF_YEAR | COLOR |
M2011 | 2008 | White |
M2001 | 2008 | Black |
M3001 | 2013 | White |
M3001 | 2013 | Black |
M4006 | 2017 | White |
M4006 | 2017 | Black |
Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)
Multivalued Dependency (MVD)
NIKE_MODEL | MANUF_YEAR | COLOR |
M2011 | 2008 | White |
M2001 | 2008 | Black |
M3001 | 2013 | White |
M3001 | 2013 | Black |
M4006 | 2017 | White |
M4006 | 2017 | Black |
Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)
Student ID | Course ID |
1 | 1 |
1 | 2 |
2 | 3 |
2 | 4 |
2 | 5 |
3 | 6 |
Table 2: Courses
Table 3: Textbooks
Course ID | Course Name |
1 | Math |
2 | Science |
3 | Art |
4 | History |
Textbook ID | Textbook Name | Course ID |
1 | Algebra | 1 |
2 | Calculus | 2 |
3 | Biology | 2 |
4 | Chemistry | 2 |
5 | Art History | 3 |
6 | American History | 4 |
Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)
Student ID | Course ID |
1 | 1 |
1 | 2 |
2 | 3 |
2 | 4 |
2 | 5 |
3 | 6 |
Table 2: Courses
Table 3: Textbooks
Course ID | Course Name |
1 | Math |
2 | Science |
3 | Art |
4 | History |
Textbook ID | Textbook Name | Course ID |
1 | Algebra | 1 |
2 | Calculus | 2 |
3 | Biology | 2 |
4 | Chemistry | 2 |
5 | Art History | 3 |
6 | American History | 4 |
Normalization MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)
movie
movie
location
Movie | Location | Genre |
PIE | USA | STRATEGY |
DOOM | UK | THRILLER |
DUNE | UK | ACTION |
INTERSTELLAR | USA | ACTION |
DHAMAAL | INDIA | COMEDY |
MOVIE | LOCATION |
PIE | USA |
DOOM | UK |
DUNE | UK |
INTERSTELLAR | USA |
DHAMAAL | INDIA |
MOVIE | LOCATION |
PIE | USA |
DOOM | UK |
DUNE | UK |
INTERSTELLAR | USA |
DHAMAAL | INDIA |
Now, the table has achieved its 4th Normal Form, by splitting the tables and converting it into its violation free table.
Normalization Fifth normal form (5NF)
SUBJECT | LECTURER | SEMESTER |
Computer | Anshika | Semester 1 |
Computer | John | Semester 1 |
Math | John | Semester 1 |
Math | Akash | Semester 2 |
Chemistry | Praveen | Semester 1 |
Example
Normalization Fifth normal form (5NF)
mple
SEMESTER | SUBJECT |
Semester 1 | Computer |
Semester 1 | Math |
Semester 1 | Chemistry |
Semester 2 | Math |
SUBJECT | LECTURER |
Computer | Anshika |
Computer | John |
Math | John |
Math | Akash |
Chemistry | Praveen |
SEMSTER | LECTURER |
Semester 1 | Anshika |
Semester 1 | John |
Semester 1 | John |
Semester 2 | Akash |
Semester 1 | Praveen |
P1
Normalization Fifth normal form (5NF)
R2: The table with columns E_Name and Product.
E_Name | Company | Product |
Rohan | Comp1 | Jeans |
Harpreet | Comp2 | Jacket |
Anant | Comp3 | TShirt |
E_Name | Company |
Rohan | Comp1 |
Harpreet | Comp2 |
Anant | Comp3 |
R1: The table with columns E_Name and Company.
R3: The table with columns Company and Product.
E_Name | Product |
Rohan | Jeans |
Harpreet | Jacket |
Anant | TShirt |
Company | Product |
Comp1 | Jeans |
Comp2 | Jacket |
Comp3 | TShirt |