1 of 41

Lecture - 01

Introduction

Friday, December 6, 2024

2 of 41

DDBMS : Course Introduction

  • Course credit hours (C.Hs):
    • 03 C.Hs (42 lectures):
      • Theoretical work
      • Hands on Exercises

3 of 41

DDBMS : Course Introduction

  • Teaching plan:
    • 02 tests:
      • Each after 07 lectures

    • Assignments + Exercises + Presentations

4 of 41

DDBMS : Course Introduction

  • Lecture slides/announcements/results can be found at course home page:

http://saleem.quest.edu.pk/

5 of 41

DDBMS : Course Outline

  • Introduction (today's lecture)
    • Motivation, DBMS, DDBMS
  • DDBMS Architecture
    • Basics of Distributed Database Architecture
      • Architecture, Schema, Views
    • Alternatives in Distributed Database Systems
      1. Centralized DDBMS
      2. Client-server DBMS
      3. Peer-to-peer DDBMS
      4. Multidatabase Systems

6 of 41

DDBMS : Course Outline

  • DDBMS Design
    • Design Approaches
    • Design Issues
    • Fragmentation
    • Allocation
  • Query Processing
    • Relational Algebra and Relational Calculus
    • Centralized and Distributed Query Processing
    • Query Optimization

7 of 41

DDBMS : Course Outline

  • Transaction Management and Processing
    • Fundamentals of Transaction Processing
    • Centralized and Distributed Transaction Processing
    • Concurrency Control
    • Reliability
    • Transaction Protocols

8 of 41

DDBMS : Course Introduction

  • Recommended Textbooks:
    • Principles of Distributed Database Systems, 3rd ed.
          • M. Tamer Özsu and Patrick Valduriez
    • Principles of Transaction Processing, 2nd ed.
          • Philip A. Bernstein and Eric Newcomer
    • The State of the Art in Distributed Query Processing, ACM Computing Surveys, Vol. 32, No. 4, 2000, pp. 422-469.
          • D. Kossmann
    • Online Literature/Tutorials etc.

9 of 41

Lecture Outline

  • Introduction
    • Motivation
    • Database and database management system
    • Distributed database system
    • Distributed database system promises/characteristics

10 of 41

Traditional File Processing

[Figure: programs 1 to 3, each with its own data description (1 to 3), each accessing its own file (File 1 to File 3)]

11 of 41

Database Management System

[Figure: application programs 1 to 3 (with data semantics) access a shared database through the DBMS, which handles data description, manipulation, and control]

12 of 41

Database vs. File-based Approach

The database approach is preferred over the traditional file-based approach for the following key reasons:

  • Self-describing nature of a database system
  • Insulation between program and data, and data abstraction
  • Support of multiple views of data
  • Sharing of data and multiuser transaction processing

13 of 41

List the names of students who took the section of the ‘Database’ course offered in fall 2008 and their grades in that section

14 of 41

Distributed Database System

[Figure: Database Technology (integration) + Computer Networks (distribution) → Distributed Database Systems (integration)]

15 of 41

Distributed Database System (DDBS)

  • Distributed Database (DDB):
    • A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
  • Distributed Database Management System (DDBMS):

    • A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

  • Distributed Database System (DDBS):

    • Distributed database system (DDBS) = DB + DBMS + Communication

16 of 41

What Is Not a Distributed Database System

[Figure: Sites 1 to 5 connected by a communication network]

  • A database system that resides at only one node of a computer network is a centralized database on a network node, not a distributed database.

17 of 41

Distributed DBMS Environment

[Figure: Sites 1 to 5 connected by a communication network]

18 of 41

Applications

  • Reservation systems
  • Hotel chains
  • Electronic fund transfers and electronic trading
  • Corporate MIS
  • Manufacturing - especially multi-plant manufacturing
  • Military command and control
  • Any organization which has a decentralized organization structure

19 of 41

Distributed DBMS Promises

  1. Transparent management of distributed, fragmented, and replicated data

  2. Improved reliability/availability through distributed transactions

  3. Improved performance

  4. Easier and more economical system expansion

20 of 41

Transparency

  • Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues.

21 of 41

Example

EMP
ENO   ENAME       TITLE
E1    J. Doe      Elect. Eng.
E2    M. Smith    Syst. Anal.
E3    A. Lee      Mech. Eng.
E4    J. Miller   Programmer
E5    B. Casey    Syst. Anal.
E6    L. Chu      Elect. Eng.
E7    R. Davis    Mech. Eng.
E8    J. Jones    Syst. Anal.

ASG
ENO   PNO   RESP         DUR
E1    P1    Manager      12
E2    P1    Analyst      24
E2    P2    Analyst      6
E3    P3    Consultant   10
E3    P4    Engineer     48
E4    P2    Programmer   18
E5    P2    Manager      24
E6    P4    Manager      48
E7    P3    Engineer     36
E7    P5    Engineer     23
E8    P3    Manager      40

PROJ
PNO   PNAME               BUDGET
P1    Instrumentation     150000
P2    Database Develop.   135000
P3    CAD/CAM             250000
P4    Maintenance         310000

PAY
TITLE         SAL
Elect. Eng.   40000
Syst. Anal.   34000
Mech. Eng.    27000
Programmer    24000

SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE
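The query on this slide can be tried out on a single machine before thinking about distribution. The sketch below is only an illustration: it loads the EMP, ASG, and PAY rows from the slide into an in-memory SQLite database (the choice of SQLite and the column types are assumptions, not part of the lecture) and runs the query unchanged.

```python
import sqlite3

# In-memory SQLite database; the column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER);
CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, SAL INTEGER);
""")

conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
    ("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal."),
])
conn.executemany("INSERT INTO ASG VALUES (?, ?, ?, ?)", [
    ("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6), ("E3", "P3", "Consultant", 10),
    ("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18),
    ("E5", "P2", "Manager", 24), ("E6", "P4", "Manager", 48),
    ("E7", "P3", "Engineer", 36), ("E7", "P5", "Engineer", 23),
    ("E8", "P3", "Manager", 40),
])
conn.executemany("INSERT INTO PAY VALUES (?, ?)", [
    ("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
    ("Mech. Eng.", 27000), ("Programmer", 24000),
])

# The query from the slide: one result row per assignment with DUR > 12.
rows = conn.execute("""
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
""").fetchall()

for ename, sal in sorted(rows):
    print(ename, sal)
```

An employee with several qualifying assignments appears once per assignment, because the query does not use DISTINCT.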

22 of 41

Transparent Access

[Figure: five sites (Boston, Montreal, Paris, New York, Tokyo) connected by a communication network; the query below appears at the Tokyo site]

  • Paris: Paris projects, Paris employees, Paris assignments, Boston employees
  • Montreal: Montreal projects, Paris projects, New York projects with budget > 200000, Montreal employees, Montreal assignments
  • Boston: Boston projects, Boston employees, Boston assignments
  • New York: Boston projects, New York employees, New York projects, New York assignments

SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

23 of 41

Transparency

  • Transparency involves:
    1. Data independence
      • Logical
      • Physical
    2. Network (distribution) transparency
    3. Replication transparency
    4. Fragmentation transparency
      • horizontal fragmentation: selection
      • vertical fragmentation: projection
      • hybrid

24 of 41

Transparency

  • Data independence (Data transparency):
    • It refers to the immunity of user applications to changes in the definition and organization of data, and vice versa.
    • Logical data independence:
      • refers to the immunity of user applications to changes in the logical structure of data.
      • An application should keep running when additional attributes are added to a relation
    • Physical data independence:
      • deals with hiding the details of the storage structure from user applications
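Both kinds of data independence can be illustrated on a single node. The sketch below is a minimal illustration, assuming SQLite; the PHONE attribute and the index name are hypothetical. The application function names the columns it needs, so it returns the same answer after an attribute is added (a logical change) and after an index is created (a physical change).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT)")
conn.execute("INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.')")


def employee_names(db):
    """Application code: it names the columns it needs explicitly."""
    return [name for (name,) in db.execute("SELECT ENAME FROM EMP ORDER BY ENO")]


before = employee_names(conn)

# Logical change: a new attribute is added (PHONE is a hypothetical column).
conn.execute("ALTER TABLE EMP ADD COLUMN PHONE TEXT")

# Physical change: a new access path is added (the index name is hypothetical).
conn.execute("CREATE INDEX emp_title_idx ON EMP (TITLE)")

after = employee_names(conn)
print(before == after)
```

An application that relied on `SELECT *` and on column positions would be more exposed to the logical change, which is one reason to name columns explicitly.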

25 of 41

Transparency

  • Network transparency:
    • User does not need to know the operational details of the network:
      • Service transparency
      • Location transparency

26 of 41

Transparency

  • Replication:
    • It refers to keeping multiple copies of the same data at different sites. It helps to improve the performance, reliability, and availability of the system across the network.

  • Replication transparency:
    • The user does not need to know about the existence of copies, their management, or their location.
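A toy model can make replication transparency concrete. The sketch below is an assumption-laden illustration, not how any particular DDBMS works: the class name `ReplicatedItem` and the read-one/write-all policy are choices made for this example. The caller only uses `read()` and `write()` and never says which copy to use.

```python
class ReplicatedItem:
    """One logical data item stored as identical copies at several sites."""

    def __init__(self, sites):
        self.copies = {site: None for site in sites}
        self.unreachable = set()  # sites simulated as failed

    def write(self, value):
        # Write-all: every copy is updated so that the copies stay identical.
        if self.unreachable:
            raise RuntimeError("write-all needs every site to be reachable")
        for site in self.copies:
            self.copies[site] = value

    def read(self):
        # Read-one: any reachable copy can answer the request.
        for site, value in self.copies.items():
            if site not in self.unreachable:
                return value
        raise RuntimeError("no reachable copy")


salary = ReplicatedItem(["Paris", "Boston", "Montreal"])
salary.write(40000)

salary.unreachable.add("Paris")  # simulate a failure of the Paris site
value = salary.read()            # still answered, from another copy
print(value)
```

The example also shows the trade-off behind the policy: reads survive a site failure, while writes under write-all do not.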

27 of 41

Transparency

  • Fragmentation:
    • It refers to dividing database relations into smaller fragments and treating each fragment as a separate database object (i.e., another relation). It helps performance, availability, and reliability of the system.
    • Fragmentation can reduce the negative effects of replication: each replica is a fragment rather than the whole relation, so less space is needed and fewer data items must be managed

  • Fragmentation types:
      • horizontal fragmentation: selection
      • vertical fragmentation: projection
      • hybrid

28 of 41

Fragmentation Transparency - Horizontal

29 of 41

Fragmentation Transparency - Horizontal
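Horizontal fragmentation can be sketched with selections and a union. In the illustration below, the fragment names PROJ1 and PROJ2, the split at a budget of 200000, and the view name PROJ_GLOBAL are assumptions made for this example; in a real system each fragment would live at a different site. The view plays the role of the transparency layer: a query against it does not mention the fragments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJ (PNO TEXT PRIMARY KEY, PNAME TEXT, BUDGET INTEGER);
INSERT INTO PROJ VALUES
    ('P1', 'Instrumentation', 150000),
    ('P2', 'Database Develop.', 135000),
    ('P3', 'CAD/CAM', 250000),
    ('P4', 'Maintenance', 310000);

-- Horizontal fragments: each one is a selection on PROJ.
CREATE TABLE PROJ1 AS SELECT * FROM PROJ WHERE BUDGET < 200000;
CREATE TABLE PROJ2 AS SELECT * FROM PROJ WHERE BUDGET >= 200000;

-- Reconstruction: the union of the fragments hides the split.
CREATE VIEW PROJ_GLOBAL AS
    SELECT * FROM PROJ1
    UNION ALL
    SELECT * FROM PROJ2;
""")

original_rows = sorted(conn.execute("SELECT * FROM PROJ"))
rebuilt_rows = sorted(conn.execute("SELECT * FROM PROJ_GLOBAL"))
print(original_rows == rebuilt_rows)
```

Because the two selection predicates are disjoint and together cover every row, the union returns each project exactly once.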

30 of 41

Fragmentation Transparency - Vertical

31 of 41

Fragmentation Transparency - Vertical
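Vertical fragmentation can be sketched with projections and a join. In the illustration below, the fragment names PROJ_BUDGET and PROJ_NAME and the view name PROJ_GLOBAL are assumptions made for this example. Each fragment keeps the key PNO so that the original relation can be rebuilt by joining on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJ (PNO TEXT PRIMARY KEY, PNAME TEXT, BUDGET INTEGER);
INSERT INTO PROJ VALUES
    ('P1', 'Instrumentation', 150000),
    ('P2', 'Database Develop.', 135000),
    ('P3', 'CAD/CAM', 250000),
    ('P4', 'Maintenance', 310000);

-- Vertical fragments: each one is a projection that keeps the key PNO.
CREATE TABLE PROJ_BUDGET AS SELECT PNO, BUDGET FROM PROJ;
CREATE TABLE PROJ_NAME AS SELECT PNO, PNAME FROM PROJ;

-- Reconstruction: joining the fragments on the key hides the split.
CREATE VIEW PROJ_GLOBAL AS
    SELECT n.PNO, n.PNAME, b.BUDGET
    FROM PROJ_NAME AS n
    JOIN PROJ_BUDGET AS b ON n.PNO = b.PNO;
""")

original_proj = sorted(conn.execute("SELECT PNO, PNAME, BUDGET FROM PROJ"))
rebuilt_proj = sorted(conn.execute("SELECT PNO, PNAME, BUDGET FROM PROJ_GLOBAL"))
print(original_proj == rebuilt_proj)
```

Without the key in every fragment, the join could not match the pieces of a row back together.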

32 of 41

Layers of Transparency

33 of 41

Who Should Provide Transparency?

  • Application
    • Applications or application modules are implemented in a distributed fashion
    • Communication and data exchange via standard protocols (RPC, CORBA, HTTP, ...)
  • Operating system
    • Realizes network transparency, e.g., on system level (NFS) or protocol level
  • Database system
    • Transparent access to data at remote database instances
    • Requires splitting queries, transaction control, replication

35 of 41

Reliability Through Distributed Transactions

  • Reliability:
    • Allows correct operation even in case of failures
    • Is achieved by data copies (replicas) on remote sites
    • Correct operation is achieved by transactions, each of which transforms one consistent database state into another consistent database state
      • Example: Increasing the salaries of all employees in distributed environment by 10%.
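The salary example can be sketched as an all-or-nothing update over several sites. The code below is a simplified illustration, not a real commit protocol: each "site" is a separate in-memory SQLite database, the split of PAY across Paris and Boston is hypothetical, and the CHECK limit of 50000 is an assumption added only to provoke a failure. The raise is committed at every site or rolled back at every site.

```python
import sqlite3


def make_site(rows):
    """Create one 'site' holding part of PAY (the SAL limit is an assumption)."""
    site = sqlite3.connect(":memory:")
    site.execute(
        "CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, "
        "SAL INTEGER CHECK (SAL <= 50000))"
    )
    site.executemany("INSERT INTO PAY VALUES (?, ?)", rows)
    site.commit()
    return site


def raise_salaries(sites, percent):
    """Apply the raise at every site, or at none of them."""
    try:
        for site in sites:
            site.execute("UPDATE PAY SET SAL = SAL * (100 + ?) / 100", (percent,))
    except sqlite3.Error:
        for site in sites:
            site.rollback()  # undo the uncommitted work at every site
        return False
    for site in sites:
        site.commit()
    return True


paris = make_site([("Programmer", 24000), ("Mech. Eng.", 27000)])
boston = make_site([("Elect. Eng.", 40000), ("Syst. Anal.", 34000)])

ok = raise_salaries([paris, boston], 10)      # succeeds at both sites
failed = raise_salaries([paris, boston], 30)  # violates the limit in Boston

print(ok, failed)
print(paris.execute("SELECT SAL FROM PAY WHERE TITLE = 'Programmer'").fetchone()[0])
```

The final commit loop is itself not atomic: a crash between two commits would leave the sites inconsistent. Closing that gap is the job of a commit protocol such as two-phase commit.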

37 of 41

Improved Performance

  • Improved performance:
    • Is achieved using fragmentation and parallelism
  • Improved performance using fragmentation:
    • Fragmenting the conceptual database so that data is stored close to its points of use, which reduces transfer costs and delays
  • Improved performance using parallelism:
    • Inter-query parallelism:
      • execution of multiple queries at the same time
    • Intra-query parallelism:
      • parallel execution of sub-queries at different sites accessing a different part of the distributed database
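Intra-query parallelism can be sketched with a thread pool standing in for the sites. In the illustration below, the allocation of ASG rows (ENO, PNO, DUR) to Paris, Boston, and Montreal is hypothetical, and threads on one machine only imitate separate sites. The same sub-query (DUR > 12) runs on every fragment at the same time, and the partial results are merged afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical allocation of ASG rows (ENO, PNO, DUR) to three sites.
FRAGMENTS = {
    "Paris": [("E1", "P1", 12), ("E2", "P1", 24)],
    "Boston": [("E2", "P2", 6), ("E4", "P2", 18)],
    "Montreal": [("E3", "P3", 10), ("E7", "P3", 36)],
}


def local_subquery(site):
    """Sub-query evaluated at one site: keep the rows with DUR > 12."""
    return [row for row in FRAGMENTS[site] if row[2] > 12]


# One worker per site: the sub-queries run concurrently.
with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
    partial_results = list(pool.map(local_subquery, FRAGMENTS))

# Merge step: combine the per-site answers into the final result.
result = sorted(row for part in partial_results for row in part)
print(result)
```

Inter-query parallelism would instead submit different queries to the pool, each running independently of the others.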

39 of 41

Easier System Expansion

  • Necessity of increasing database size and/or decreasing query execution time

  • Expansion by adding additional storage and processing power to the network

  • A system of smaller computers is often cheaper than a single big machine with the equivalent power

40 of 41

Challenges/Issues

  • Distributed database design
    • Fragmentation, replication, and distribution

  • Distributed query processing
    • Executing a query over the network in the most cost-effective way

  • Distributed concurrency control
    • Synchronizing access such that integrity is maintained

  • Reliability of distributed DBMS
    • Ensure consistency, detect failures, and recover from failures

  • Heterogeneous databases
    • Translation between database systems (data model, data language)

41 of 41

Relationship Among Challenges/Issues

[Figure: relationships among Directory Management, Reliability, Deadlock Management, Query Processing, Concurrency Control, and Distribution Design]