1 of 38

Distributed Databases

Mrs.Bhagyashri Rahul Joshi


Difference between Centralized Database and Distributed Database

  • Centralized Database :

  • A centralized database is basically a type of database that is stored, located as well as maintained at a single location only.
  • This type of database is modified and managed from that location itself.
  • This location is usually a central computer or database system.
  • The centralized location is accessed over a network connection (LAN, WAN, etc.).
  • This centralized database is mainly used by institutions or organizations.


Advantages –

  • Since all data is stored at a single location only thus it is easier to access and co-ordinate data.
  • The centralized database has very minimal data redundancy since all data is stored at a single place.
  • It is generally cheaper than a distributed database.

Disadvantages –

  • Data traffic is heavier, since all requests go to the single central location.
  • If a system failure occurs at the central site, the entire data becomes unavailable, and it may be lost if it is not backed up.


Distributed Database: A distributed database is a type of database that consists of multiple databases connected to each other and spread across different physical locations. The data stored at each physical location can be managed independently of the other locations. The databases at different physical locations communicate over a computer network.

Advantages –

  • This database can be easily expanded, since data is already spread across different physical locations.
  • The distributed database can be accessed from different networks.
  • This database is more secure in comparison to a centralized database.

Disadvantages –

  • This database is costly, and it is difficult to maintain because of its complexity.
  • It is difficult to provide a uniform view to the user, since the data is spread across different physical locations.


| Centralized database | Distributed database |
|---|---|
| Stored, located and maintained at a single location only. | Consists of multiple databases that are connected with each other and spread across different physical locations. |
| Data access time with multiple users is higher. | Data access time with multiple users is lower. |
| Management, modification and backup are easier, as all data is present at the same location. | Management, modification and backup are very difficult, as data is spread across different physical locations. |
| Provides a uniform and complete view to the user. | Difficult to provide a uniform view to the user, since data is spread across different locations. |
| Has more data consistency than a distributed database. | May hold replicated data, so data consistency is lower. |
| Users cannot access the database if a database failure occurs. | If one database fails, users can still access the other databases. |
| Less costly. | Very expensive. |


Functions Of a DDBMS

  • Application interface: To allow interaction with the end user or application program.
  • Validation: To analyze data requests.
  • Security: To provide data privacy at both local and remote databases.
  • Formatting: To prepare the data for presentation to the end user.
  • I/O interface: To read or write data from or to permanent local storage.
  • DB administration: To allow the database administrator to maintain the database.
  • Concurrency control: To manage simultaneous access to data and keep data consistent across sites.
  • Transaction management: To ensure that data moves from one consistent state to another.
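To make a few of these functions concrete, here is a minimal Python sketch of one read request passing through validation, security, the I/O interface and formatting. It is not the interface of any real DDBMS: the site names, the `PERMISSIONS` table, the sample rows and every function name are hypothetical.

```python
# Hypothetical sketch: a read request passes through several DDBMS functions.
# All names, permissions and data here are illustrative assumptions.

SITES = {
    "site_a": [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}],
    "site_b": [{"id": 3, "name": "Meera"}],
}
PERMISSIONS = {"clerk": {"site_a"}, "admin": {"site_a", "site_b"}}


def validate(request):
    """Validation: reject requests that are missing required fields."""
    if "user" not in request or "sites" not in request:
        raise ValueError("malformed request")
    return request


def check_security(request):
    """Security: a user may only read the sites they are allowed to see."""
    allowed = PERMISSIONS.get(request["user"], set())
    denied = set(request["sites"]) - allowed
    if denied:
        raise PermissionError(f"no access to {sorted(denied)}")
    return request


def fetch(request):
    """I/O interface: read rows from each requested (local or remote) site."""
    rows = []
    for site in request["sites"]:
        rows.extend(SITES[site])
    return rows


def format_result(rows):
    """Formatting: prepare rows for presentation to the end user."""
    return [f"{r['id']}: {r['name']}" for r in sorted(rows, key=lambda r: r["id"])]


def handle(request):
    return format_result(fetch(check_security(validate(request))))


print(handle({"user": "admin", "sites": ["site_b", "site_a"]}))
```

Chaining the steps as plain function calls keeps the order visible; a real DDBMS would also decompose, optimize and route the query, which this sketch leaves out.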


Homogeneous distributed databases system:

  • A homogeneous distributed database system is a network of two or more databases (with the same type of DBMS software) which can be stored on one or more machines.
  • So, in this system data can be accessed and modified simultaneously on several databases in the network. Homogeneous distributed systems are easy to manage.

Example: Consider three departments that all use Oracle-9i as their DBMS. If changes are made in one department, they would also be reflected in the other departments.


Heterogeneous distributed databases system:

A heterogeneous distributed database system is a network of two or more databases with different types of DBMS software, which can be stored on one or more machines.

In this system, data can be accessed across the databases in the network with the help of generic connectivity (ODBC and JDBC).

Example: Different DBMS products can access each other's data using ODBC and JDBC.


Replication

  • Data replication is the process of storing data at more than one site or node. It is useful in improving the availability of data.
  • It is simply copying data from a database on one server to another server so that all users can share the same data without any inconsistency. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.

  • Data replication includes duplicating transactions on an ongoing basis, so that the replica is kept in a consistently updated state and synchronized with the source. In this way, replication differs from a non-replicated design, in which a particular relation resides at only one location.
  • There can be full replication, in which the whole database is stored at every site. There can also be partial replication, in which some frequently used fragments of the database are replicated and others are not.
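The difference between full and partial replication can be sketched as a placement map from sites to the fragments they store. This is a minimal Python illustration under stated assumptions: the site names, the fragment names and the round-robin rule for placing non-replicated fragments are all hypothetical.

```python
# Hypothetical sketch of full vs partial replication of fragments over sites.
SITES = ["mumbai", "pune", "nagpur"]
FRAGMENTS = ["customers", "orders", "archive"]


def full_replication(fragments, sites):
    """Every site stores a copy of every fragment."""
    return {site: set(fragments) for site in sites}


def partial_replication(fragments, sites, frequently_used):
    """Frequently used fragments go to every site; the others are stored once."""
    placement = {site: set(frequently_used) for site in sites}
    rare = [f for f in fragments if f not in frequently_used]
    for i, fragment in enumerate(rare):
        # Assumed round-robin rule: exactly one copy of each rarely used fragment.
        placement[sites[i % len(sites)]].add(fragment)
    return placement


def copies(placement, fragment):
    """Count how many sites hold a copy of the fragment."""
    return sum(fragment in stored for stored in placement.values())


full = full_replication(FRAGMENTS, SITES)
partial = partial_replication(FRAGMENTS, SITES, frequently_used={"customers"})
print(copies(full, "archive"), copies(partial, "archive"))  # 3 1
```

Counting copies per fragment makes the trade-off visible: more copies improve availability but multiply storage and update cost.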


Snapshot Replication –

  • Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data.
  • The entire snapshot is generated and sent to users.
  • Snapshot replication is generally used when data changes are infrequent. It is a bit slower than transactional replication because each run moves the whole data set, rather than only the changed records, from one end to the other.
  • Snapshot replication is a good way to perform initial synchronization between the publisher and the subscriber.
  • Data on one database server

[Figure: Snapshot Replication. A Publisher holding rows AAA, BBB, CCC sends the same snapshot to two Subscribers.]
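A minimal Python sketch of snapshot replication, assuming in-memory SQLite databases stand in for the publisher and the two subscribers in the figure; the `items` table and the helper names are hypothetical. Each subscriber's copy is replaced wholesale by a snapshot, and later publisher changes are not seen until the next snapshot.

```python
import sqlite3


def make_db(rows=()):
    """Create an in-memory database with a hypothetical 'items' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    return conn


def take_snapshot(db):
    """Read the whole table exactly as it is at this moment."""
    return db.execute("SELECT id, value FROM items ORDER BY id").fetchall()


def apply_snapshot(subscriber, snapshot):
    """Replace the subscriber's copy entirely with the snapshot."""
    with subscriber:  # commits the delete and inserts as one transaction
        subscriber.execute("DELETE FROM items")
        subscriber.executemany("INSERT INTO items VALUES (?, ?)", snapshot)


publisher = make_db([(1, "AAA"), (2, "BBB"), (3, "CCC")])
subscribers = [make_db(), make_db()]

snapshot = take_snapshot(publisher)
for sub in subscribers:
    apply_snapshot(sub, snapshot)

# A later change at the publisher is NOT seen until the next snapshot is applied.
with publisher:
    publisher.execute("UPDATE items SET value = 'BBB2' WHERE id = 2")
print(take_snapshot(subscribers[0]))  # [(1, 'AAA'), (2, 'BBB'), (3, 'CCC')]
```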


Merge Replication – 

  • Data from two or more databases is combined into a single database. Merge replication is the most complex type of replication because it allows both the publisher and the subscriber to independently make changes to the database.
  • Merge replication is typically used in server-to-client environments.
  • It allows changes to be sent from one publisher to multiple subscribers.
  • An example use of merge replication is in retail chains such as Big Bazaar.

[Figure: Merge Replication. Two Publishers and six Subscribers, each holding rows AAA, BBB, CCC.]
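Because both publisher and subscribers can change data independently, merging needs a rule for conflicting changes. The Python sketch below assumes a simple last-writer-wins rule based on a per-row timestamp; this rule, the store names and the sample rows are hypothetical, and real merge-replication products provide their own conflict-resolution policies.

```python
# Each replica maps key -> (value, timestamp of last change).
# Assumed conflict rule: the change with the later timestamp wins
# (on a tie, the replica listed first keeps its value).


def merge(*replicas):
    merged = {}
    for replica in replicas:
        for key, (value, ts) in replica.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged


def synchronize(replicas):
    """Merge all replicas, then give every replica the merged result."""
    merged = merge(*replicas)
    for replica in replicas:
        replica.clear()
        replica.update(merged)
    return merged


publisher = {"AAA": ("price=10", 1), "BBB": ("price=20", 1)}
store_1 = dict(publisher)
store_2 = dict(publisher)

# Independent changes made while the stores were disconnected.
store_1["AAA"] = ("price=12", 2)   # store 1 changes AAA
store_2["AAA"] = ("price=11", 3)   # store 2 changes AAA later, so it wins
store_2["CCC"] = ("price=30", 3)   # store 2 adds a new item

synchronize([publisher, store_1, store_2])
print(publisher)
# {'AAA': ('price=11', 3), 'BBB': ('price=20', 1), 'CCC': ('price=30', 3)}
```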


Transactional Replication

  • Transactional replication replicates each transaction from a publisher to a subscriber for the article/table being published.
  • Initially, transactional replication takes a snapshot of the publisher database and applies it to the subscriber to synchronize the data.
  • A Log Reader Agent reads transactions from the transaction log and writes them to the distribution database, from which they are sent to the subscriber database.

[Figure: Transactional Replication. Changed rows at the Publisher (marked *: BBB, DDD, EEE) are propagated to the Subscriber, which also holds AAA and CCC.]
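The flow described above (initial snapshot, then log reader, distribution database and subscriber) can be modelled in a few lines of Python. This is a hypothetical sketch, not how any product implements its agents: plain dicts stand in for the databases, a list for the transaction log, and a `deque` for the distribution database.

```python
from collections import deque

# Hypothetical model: committed changes are appended to the publisher's log;
# a log reader copies new entries into a distribution queue; a distribution
# step applies them, in commit order, to the subscriber.

publisher = {}
transaction_log = []          # the publisher's transaction log
distribution_db = deque()     # stands in for the distribution database
subscriber = {}
log_position = 0              # how far the log reader has read


def commit(key, value):
    publisher[key] = value
    transaction_log.append(("upsert", key, value))


def log_reader():
    """Move log entries not yet read into the distribution database."""
    global log_position
    distribution_db.extend(transaction_log[log_position:])
    log_position = len(transaction_log)


def distribute():
    """Apply queued transactions to the subscriber in order."""
    while distribution_db:
        op, key, value = distribution_db.popleft()
        if op == "upsert":
            subscriber[key] = value


# Initial snapshot synchronizes the subscriber.
for k in ("AAA", "BBB", "CCC"):
    commit(k, k)
subscriber.update(publisher)
log_position = len(transaction_log)   # the snapshot already covers these entries

# Later transactions flow through the log, as in the figure.
commit("BBB", "BBB*")
commit("DDD", "DDD*")
commit("EEE", "EEE*")
log_reader()
distribute()
print(subscriber)
```

Only the three new transactions travel through the queue, which is why transactional replication moves less data per run than a full snapshot.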


Advantages of Data Replication

  • Increased reliability and availability − If any site fails, the database system continues to work, since a copy is available at another site(s).
  • Reduction in network load − Since local copies of data are available, query processing can be done with reduced network usage, particularly during prime hours. Data updating can be done at non-prime hours.
  • Quicker response − Availability of local copies of data ensures quick query processing and consequently quick response time.
  • Simpler transactions − Transactions require fewer joins of tables located at different sites and minimal coordination across the network. Thus, they become simpler in nature.


Disadvantages of Data Replication

  • Increased Storage Requirements − Maintaining multiple copies of data is associated with increased storage costs. The storage space required is in multiples of the storage required for a centralized system.
  • Increased Cost and Complexity of Data Updating − Each time a data item is updated, the update needs to be reflected in all the copies of the data at the different sites. This requires complex synchronization techniques and protocols.
  • Undesirable Application – Database coupling − If complex update mechanisms are not used, removing data inconsistency requires complex co-ordination at application level. This results in undesirable application – database coupling.
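Both the availability advantage and the update cost show up in the simple read-one/write-all policy sketched below in Python. The `Site` class and helper names are hypothetical, and refusing every write while any site is down is just one strict variant of the policy, chosen to keep the example short.

```python
# Sketch of read-one/write-all: writes must reach every copy,
# while reads need only one available copy.


class Site:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


def write_all(sites, key, value):
    """Updating: every copy must be changed, so cost grows with the replicas."""
    down = [s.name for s in sites if not s.up]
    if down:
        raise RuntimeError(f"cannot update, sites down: {down}")
    for s in sites:
        s.data[key] = value
    return len(sites)  # number of copies written


def read_any(sites, key):
    """Availability: any available copy can answer the query."""
    for s in sites:
        if s.up:
            return s.data[key]
    raise RuntimeError("no copy available")


sites = [Site("A"), Site("B"), Site("C")]
print(write_all(sites, "balance", 100))  # 3
sites[0].up = False                      # site A fails
print(read_any(sites, "balance"))        # 100, served by site B
```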


Data fragmentation

  • In storage and memory management, data fragmentation occurs when a collection of data in memory is broken up into many pieces that are not close together. It is typically the result of attempting to insert a large object into storage that has already suffered external fragmentation.
  • In a distributed database, data fragmentation instead refers to a technique used to break up objects (relations) into smaller pieces.
  • In designing a distributed database, you must decide which portion of the database is to be stored where. One technique used is to break up the database into logical units called fragments. Fragmentation information is stored in a distributed data catalogue, which the processing computer uses to process a user's request.
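A distributed data catalogue can be pictured as a map from each fragment to the site that stores it and the rule that defines its rows. The Python sketch below is hypothetical: the fragment names, site names and the predicate-based rules are illustrative assumptions, not the catalogue format of any particular DDBMS.

```python
# Hypothetical catalogue: fragment name -> (site that stores it, rule for its rows).
CATALOGUE = {
    "STUDENT_CS": ("site_cs", lambda row: row["course"] == "Computer Science"),
    "STUDENT_ME": ("site_me", lambda row: row["course"] == "Mechanical"),
}


def locate(course):
    """Consult the catalogue to find which sites hold rows for a course."""
    probe = {"course": course}
    return [site for site, rule in CATALOGUE.values() if rule(probe)]


print(locate("Computer Science"))  # ['site_cs']
print(locate("Civil"))             # []
```

With this lookup, the processing computer sends a request only to the sites whose fragments can contain matching rows.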


Types Of Fragmentation

  • Horizontal Fragmentation: This type of fragmentation refers to the division of a relation into fragments of rows. Each fragment is stored at a different computer or node, and each fragment contains unique rows. Each horizontal fragment may have a different number of rows, but every fragment must have the same attributes.
  • Vertical Fragmentation: This type of fragmentation refers to the division of a relation into fragments that each comprise a collection of attributes. Each vertical fragment must have the same number of rows, but holds a different set of attributes; every fragment includes the primary key.
  • Mixed Fragmentation: This type of fragmentation is a two-step process. For example, horizontal fragmentation is done first to obtain the necessary rows, then vertical fragmentation is done to divide the attributes among the rows.


Horizontal Fragmentation

  • Horizontal fragmentation groups the tuples of a table according to the values of one or more fields. Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal fragment must have all columns of the original base table.
  • For example, in the student schema, if the details of all students of the Computer Science course need to be maintained at the School of Computer Science, then the designer will horizontally fragment the database as follows −

CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE COURSE = 'Computer Science';
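The statement above can be tried with Python's built-in `sqlite3` module. This sketch assumes a cut-down STUDENT table with made-up rows, and adds a second hypothetical fragment, `OTHER_STD`, so that reconstructiveness can be checked: the union of the horizontal fragments gives back the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, COURSE TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "Asha", "Computer Science"), (2, "Ravi", "Mechanical"), (3, "Meera", "Computer Science")],
)

# Horizontal fragments: all columns, disjoint sets of rows.
conn.execute("CREATE TABLE COMP_STD AS SELECT * FROM STUDENT WHERE COURSE = 'Computer Science'")
conn.execute("CREATE TABLE OTHER_STD AS SELECT * FROM STUDENT WHERE COURSE <> 'Computer Science'")

# Reconstruction: the union of the fragments gives back the original table.
rebuilt = conn.execute(
    "SELECT * FROM COMP_STD UNION ALL SELECT * FROM OTHER_STD ORDER BY Regd_No"
).fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
print(rebuilt == original)  # True
```

Note that rows with a NULL `COURSE` would fall into neither fragment here; a real design would need a predicate set that covers every row.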


Vertical Fragmentation

In vertical fragmentation, the fields or columns of a table are grouped into fragments.

In order to maintain reconstructiveness, each fragment should contain the primary key field(s) of the table.

Vertical fragmentation can be used to enforce privacy of data.

For example, consider a University database that keeps records of all registered students in a STUDENT table with the following schema:

STUDENT(Regd_No, Name, Course, Address, Semester, Fees, Marks)

Now, the fees details are maintained in the accounts section.

In this case, the designer will fragment the database as follows −

CREATE TABLE STD_FEES AS

SELECT Regd_No, Fees FROM STUDENT;
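Here is the same fragmentation run in SQLite through Python's `sqlite3` module, using the STUDENT schema above with two made-up rows. The second fragment, `STD_INFO`, is a hypothetical name for the remaining columns; because both fragments keep `Regd_No`, joining them on that key reconstructs the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT, "
    "Address TEXT, Semester INTEGER, Fees INTEGER, Marks INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, "Asha", "Computer Science", "Pune", 3, 50000, 82),
     (2, "Ravi", "Mechanical", "Nashik", 5, 45000, 74)],
)

# Vertical fragments: every fragment keeps the primary key Regd_No.
conn.execute("CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT")
conn.execute(
    "CREATE TABLE STD_INFO AS SELECT Regd_No, Name, Course, Address, Semester, Marks FROM STUDENT"
)

# Reconstruction: join the fragments on the primary key.
rebuilt = conn.execute(
    "SELECT i.Regd_No, i.Name, i.Course, i.Address, i.Semester, f.Fees, i.Marks "
    "FROM STD_INFO AS i JOIN STD_FEES AS f ON i.Regd_No = f.Regd_No ORDER BY i.Regd_No"
).fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
print(rebuilt == original)  # True
```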


Hybrid Fragmentation

  • In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used. This is the most flexible fragmentation technique since it generates fragments with minimal extraneous information. However, reconstruction of the original table is often an expensive task.
  • Hybrid fragmentation can be done in two alternative ways −
  • At first, generate a set of horizontal fragments; then generate vertical fragments from one or more of the horizontal fragments.
  • At first, generate a set of vertical fragments; then generate horizontal fragments from one or more of the vertical fragments.
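Both orders can be sketched in plain Python on a table held as a list of dicts. The rows and the helper names `horizontal` and `vertical` are hypothetical; the sketch also rebuilds one horizontal fragment from its vertical pieces to show why each piece must keep the key.

```python
# Hypothetical STUDENT rows; the column names follow the schema used above.
STUDENT = [
    {"Regd_No": 1, "Name": "Asha", "Course": "Computer Science", "Fees": 50000},
    {"Regd_No": 2, "Name": "Ravi", "Course": "Mechanical", "Fees": 45000},
    {"Regd_No": 3, "Name": "Meera", "Course": "Computer Science", "Fees": 48000},
]


def horizontal(rows, predicate):
    """Keep the rows that satisfy the predicate (all columns)."""
    return [row for row in rows if predicate(row)]


def vertical(rows, columns, key="Regd_No"):
    """Keep only the given columns, always including the key."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


# Way 1: horizontal fragments first, then vertical fragments of one of them.
cs_rows = horizontal(STUDENT, lambda r: r["Course"] == "Computer Science")
cs_fees = vertical(cs_rows, ["Fees"])
cs_info = vertical(cs_rows, ["Name", "Course"])

# Reconstruct the horizontal fragment by joining its vertical pieces on the key.
fees_by_key = {r["Regd_No"]: r for r in cs_fees}
rebuilt = [{**r, **fees_by_key[r["Regd_No"]]} for r in cs_info]
print(rebuilt == cs_rows)  # True

# Way 2: a vertical fragment first, then a horizontal fragment of it.
fees = vertical(STUDENT, ["Fees"])
high_fees = horizontal(fees, lambda r: r["Fees"] > 45000)
print(high_fees)  # [{'Regd_No': 1, 'Fees': 50000}, {'Regd_No': 3, 'Fees': 48000}]
```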


Mixed Fragmentation

  • This is the combination of horizontal and vertical fragmentation. Horizontal fragmentation produces a subset of the rows to be distributed over the DB, and vertical fragmentation produces a subset of the columns of the table.

  • This type of fragmentation can be done in either order; there is no fixed order. It is based solely on the user's requirements, but it should satisfy the fragmentation conditions.


Advantages of Fragmentation

  • Easy usage of data: It keeps the most frequently accessed set of data near the user, so these data can be accessed easily as and when required.
  • Efficiency: It increases the efficiency of queries by reducing the table to a smaller subset and making it available with less network access time.
  • Security: Only valid and useful records are available to the actual user. The DB near the user does not hold any unwanted data; it contains only the information that is necessary for them.
  • Parallelism: Fragmentation allows users to access the same table at the same time from different locations. Users at each location access the fragment of the table in the DB at their location, seeing the data that is meant for them. If they all accessed the table at one location, they would have to wait for locks to perform their transactions.
  • Reliability: It increases the reliability of fetching data. If users at different locations all access a single DB, the network load is heavy, which makes it harder to guarantee that the correct records are fetched and returned. Accessing the fragment of data in the nearest DB reduces the risk of data loss and of incorrect results.
  • Balanced storage: Data can be distributed evenly among the databases in the DDB.


What is Client Server Architecture?

  • Definition – Client-server architecture is also called the “Client/Server Network” or “Network Computing Model”, because in this architecture all services and requests are spread over the network. It functions like a distributed computing system, because all components perform their tasks independently of each other.
  • Client-server architecture is a shared computer network architecture in which several clients (remote systems) send requests to, and obtain services from, a centralized server machine (host system). The client machine provides a user-friendly interface that helps users request services from the server computer and then displays the output on the client system.


1-Tier Architecture

  • In the 1-tier architecture, all client/server configuration settings, the user interface environment, data logic, and business logic exist on the same system. Such services are reliable, but they are difficult to manage, because each installation keeps its own data and the entire work is replicated on every machine. This architecture still contains the different layers.

For example – the Presentation, Business, and Data Access layers are provided by a single software package, and all data is saved on the local machine. Applications that manage all three tiers on one machine, such as an MP3 player or MS Office, are examples of 1-tier architecture applications.


2-Tier Architecture

  • 2-tier architecture provides a simple client/server environment in which the user interface is stored on the client system and the database is saved on the server machine. Business logic and data logic reside either on the client or on the server, and they must be maintained there. When data logic and business logic are placed on the client terminal, it is known as “fat client thin server architecture”. If business logic and data logic are controlled at the server machine, it is known as “thin client fat server architecture”.

In this architecture, the client and server machines communicate directly: when a client sends input to the server terminal, there is no intermediate layer in between. So, it delivers output quickly and avoids confusion between requests from different clients. For example, an online ticket reservation program can use a 2-tier architecture.


Benefits Are

  • Easy to design all applications
  • Maximum user satisfaction
  • Implementation of a homogeneous environment
  • Good performance with a small number of users

Limitations Are

  • Performance degrades as the number of users, and therefore connections, grows
  • Less security
  • All clients are totally dependent on the vendor's database
  • Less portability, meaning this architecture is totally dependent on the particular database


3-Tier Architecture

  • In the 3-tier architecture, middleware is needed: when the client machine sends a request to the server machine, the request is first received by the middle layer and then passed on to the server. Likewise, the server's response is first received by the middle layer and then passed on to the client machine. All data logic and business logic are stored in the middleware. The use of middleware improves flexibility and delivers good performance.


3-tier architecture is divided into 3 layers: the presentation layer (Client Tier), the application layer (Business Tier), and the database layer (Data Tier). The client machine handles the presentation layer, an application server controls the application layer, and the server machine takes care of the database layer.
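The separation of the three layers can be sketched in-process in Python, with each tier as its own class and SQLite standing in for the database server. Everything here (the seat-booking scenario, the `seats` table, class and function names) is hypothetical; in a real deployment each tier would run on its own machine and talk over the network.

```python
import sqlite3


class DataTier:
    """Data tier: owns the database and answers simple data requests."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, booked INTEGER)")
        self.conn.executemany("INSERT INTO seats VALUES (?, 0)", [("A1",), ("A2",)])

    def is_booked(self, seat):
        row = self.conn.execute("SELECT booked FROM seats WHERE seat = ?", (seat,)).fetchone()
        return None if row is None else bool(row[0])

    def mark_booked(self, seat):
        with self.conn:
            self.conn.execute("UPDATE seats SET booked = 1 WHERE seat = ?", (seat,))


class ApplicationTier:
    """Application (business) tier: the booking rules live here, between client and data."""

    def __init__(self, data):
        self.data = data

    def book(self, seat):
        status = self.data.is_booked(seat)
        if status is None:
            return "no such seat"
        if status:
            return "already booked"
        self.data.mark_booked(seat)
        return "booked"


def client(app, seat):
    """Presentation (client) tier: only talks to the application tier."""
    return f"Seat {seat}: {app.book(seat)}"


app = ApplicationTier(DataTier())
print(client(app, "A1"))  # Seat A1: booked
print(client(app, "A1"))  # Seat A1: already booked
```

The client never sees the `seats` table, which mirrors the "hide database structure" benefit; in a 2-tier design the client would call the data layer directly.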

Benefits Are

  • Better data integrity
  • Improved security compared with 2-tier architecture
  • Database structure is hidden from the client

Limitation is:

Communication between client and server is more complex, because middleware is also involved.


Examples of Client Server Architecture

  • There are four examples of client server architecture, each explained below –
  • Web Servers – A web server is a high-performance computer system that can host multiple websites. Web server software such as Apache or Microsoft IIS is installed on it to provide access to the hosted websites over the internet, and these servers are linked to the internet through high-speed connections that deliver high data transmission rates.
  • Mail Servers – Email servers help to send and receive emails. Software running on the mail server allows the administrator to create and manage email accounts for any domain hosted on the server. Mail servers use protocols such as SMTP, IMAP, and POP3 for sending and receiving email. SMTP is used to send messages and handles outgoing email requests. IMAP and POP3 are used to receive messages and handle incoming mail.


  • File Servers – A file server is a dedicated system that allows users to access files. It works as a centralized file storage location and can be accessed by several terminal systems.
  • DNS – DNS stands for “Domain Name System”; DNS servers hold a large database that links hostnames with their public IP addresses.


Components of Client Server Architecture

  • Client-server architecture contains three components: workstations, servers, and networking devices, which are connected with each other.
  • Workstation – A workstation is also known as a “Client Computer”. Different operating systems can be installed on workstations, such as Windows 2000, Windows XP, Windows Vista, Windows 7, and Windows 10. These workstation operating systems are cheaper than server operating systems.
  • Server – A server is a high-performance computer system with faster memory, more hard drive space, and faster processors, because it stores data and services the many requests coming from workstations. A server can play different roles, such as mail server, database server, file server, and domain controller, at the same time.
  • Network Devices – Network devices connect workstations and servers with each other. Each network device has its own function: a hub connects a server to multiple workstations, a repeater regenerates the signal so data can be carried farther from one device to another, and bridges help to isolate network segments.
