Data Base Management Systems

LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING (AUTONOMOUS) Accredited by NAAC & NBA (Under Tier - I) ISO 9001:2015 Certified Institution Approved by AICTE, New Delhi. and Affiliated to JNTUK, Kakinada L.B. REDDY NAGAR, MYLAVARAM, KRISHNA DIST., A.P.-521 230.

UNIT V: Interfacing And Interacting With NoSQL

Introduction to NoSQL

NoSQL, originally meaning "non-SQL" or "non-relational", refers to a class of databases that store and retrieve data using models other than the tables of a relational database.

NoSQL databases are used in real-time web applications and big data systems, and their use is increasing over time. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages.

The data structures used by NoSQL databases differ from those used by default in relational databases, which makes some operations faster in NoSQL.

In a relational database you must create the table, define its schema, and set the data types of its fields before you can insert any data. Most NoSQL databases do not require this up-front definition, so you can insert and update data on the go.
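The contrast above can be sketched in a few lines of Python. This is only an illustration: the relational side uses the standard-library `sqlite3` module, and the schemaless side is a plain Python list standing in for a document collection. The table name `students` and the sample records are assumptions made for the example.

```python
import sqlite3

# Relational side: the table and its column types must exist before any insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO students (id, name) VALUES (?, ?)", (1, "Asha"))

# A field that the schema does not define is rejected by the database.
try:
    conn.execute(
        "INSERT INTO students (id, name, email) VALUES (?, ?, ?)",
        (2, "Ravi", "ravi@example.com"),
    )
    relational_accepted_new_field = True
except sqlite3.OperationalError:
    relational_accepted_new_field = False

# Schemaless side (hypothetical stand-in for a NoSQL collection):
# each record carries its own structure, so a new field needs no prior definition.
students = []
students.append({"id": 1, "name": "Asha"})
students.append({"id": 2, "name": "Ravi", "email": "ravi@example.com"})
```

In the relational case the new `email` field would first need an `ALTER TABLE`; in the schemaless case the second record simply has one more key than the first.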

There are situations where a relational database is preferable to NoSQL. However, when you are dealing with a huge amount of data, a NoSQL database is often the better choice.

Differences between SQL (RDBMS) and NoSQL

Type

SQL databases are primarily called relational databases (RDBMS).

NoSQL databases are primarily called non-relational or distributed databases.

Language

SQL databases define and manipulate data using Structured Query Language (SQL). SQL is one of the most versatile and widely used options available, which makes it a safe choice, especially for complex queries.

SQL requires you to use predefined schemas to determine the structure of your data before you work with it. All of your data must also follow the same structure.

NoSQL databases have dynamic schemas for unstructured data. Data can be stored in many ways: document-oriented, column-oriented, graph-based, or organized as a key-value store. This flexibility means that documents can be created without first defining their structure.

Scalability

In almost all situations SQL databases are vertically scalable. This means that you can handle more load on a single server by increasing resources such as RAM, CPU, or SSD.

NoSQL databases are horizontally scalable. This means that you handle more traffic by sharding, that is, by adding more servers to your NoSQL database.
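As a rough illustration of sharding, the sketch below routes each key to one of several servers by hashing the key. The shards are plain in-memory dictionaries, and the names `shard_for`, `put`, and `get` are hypothetical helpers for this example, not the API of any particular database.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Three in-memory dictionaries stand in for three database servers.
shards = [dict() for _ in range(3)]

def put(key: str, value) -> None:
    """Store the value on the shard that owns the key."""
    shards[shard_for(key, len(shards))][key] = value

def get(key: str):
    """Read the value back from the shard that owns the key."""
    return shards[shard_for(key, len(shards))].get(key)

# Each record lands on exactly one shard, so the load is spread across servers.
for i in range(100):
    put(f"user:{i}", {"id": i})
```

Simple modulo hashing like this moves most keys when the number of shards changes, which is why production systems commonly use schemes such as consistent hashing instead.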

Structure

SQL databases are table-based.

NoSQL databases are either key-value pairs, document-based, graph databases or wide-column stores.

Property followed

SQL databases follow ACID properties (Atomicity, Consistency, Isolation and Durability).

NoSQL databases follow Brewer's CAP theorem (Consistency, Availability, and Partition tolerance).

Support

Great support is available for all SQL databases from their vendors. There are also many independent consultants who can help you with very large-scale SQL database deployments.

NoSQL databases still rely largely on community support, and only limited outside expertise is available for setting up and deploying large-scale NoSQL systems.

Examples

SQL databases: PostgreSQL, MySQL, Oracle and Microsoft SQL Server.

NoSQL databases: Redis, RavenDB, Cassandra, MongoDB, BigTable, HBase, Neo4j, and CouchDB.

The CAP Theorem

The CAP theorem makes system designers aware of the trade-offs involved in designing networked shared-data systems.

It is important to understand the CAP theorem because it forms the basis for choosing a NoSQL database that matches the requirements.

CAP theorem states that in networked shared-data systems or distributed systems, we can only achieve at most two out of three guarantees for a database: Consistency, Availability and Partition Tolerance.

Consistency

Consistency means that the nodes will have the same copies of a replicated data item visible for various transactions. In other words, it is a guarantee that every node in a distributed cluster returns the same, most recent, successful write.

Availability

Availability means that each read or write request for a data item will either be processed successfully or will receive a message that the operation cannot be completed.

Partition tolerance

Partition tolerance means that the system can continue operating if the network connecting the nodes has a fault that results in two or more partitions, where the nodes in each partition can only communicate with each other.

The CAP theorem categorizes systems into three categories:

CP (Consistent and Partition Tolerant) database

A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved.

AP (Available and Partition Tolerant) database

An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available, but those at the wrong end of a partition might return an older version of the data than others. When the partition is resolved, AP databases typically resynchronize the nodes to repair all inconsistencies in the system.

CA (Consistent and Available) database

A CA database delivers consistency and availability in the absence of any network partition. Single-node database servers are often categorized as CA systems because they do not need to deal with partition tolerance.

In any networked shared-data system or distributed system, partition tolerance is a must. Network partitions and dropped messages are a fact of life and must be handled appropriately.
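The difference between CP and AP behavior during a partition can be illustrated with a toy two-replica cluster. This is a simplified sketch and not how any real database implements replication: the `Replica` and `Cluster` classes, the version counter, and the "newest version wins" repair rule are all assumptions made for the example, and the CP mode here simply rejects writes while partitioned.

```python
class Replica:
    """One node holding a copy of a single replicated value."""
    def __init__(self):
        self.value = None
        self.version = 0

class Cluster:
    """Toy two-node cluster that behaves as either CP or AP under a partition."""
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica: Replica, value) -> None:
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # CP: give up availability rather than let the copies diverge.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # AP: stay available and accept the write on this side only.
            replica.value, replica.version = value, replica.version + 1
            return
        # No partition: apply the write to both replicas.
        version = max(replica.version, other.version) + 1
        for node in (replica, other):
            node.value, node.version = value, version

    def read(self, replica: Replica):
        return replica.value

    def heal(self) -> None:
        """End the partition and copy the newest version to both replicas."""
        self.partitioned = False
        newest = max((self.a, self.b), key=lambda node: node.version)
        for node in (self.a, self.b):
            node.value, node.version = newest.value, newest.version

# AP: the write succeeds during the partition, but the other side reads stale data.
ap = Cluster("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap.write(ap.a, "v2")
stale_read = ap.read(ap.b)
ap.heal()
repaired_read = ap.read(ap.b)

# CP: the same write is refused while the partition lasts.
cp = Cluster("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
try:
    cp.write(cp.a, "v2")
    cp_write_accepted = True
except RuntimeError:
    cp_write_accepted = False
```

In the AP run, `stale_read` is the old value and `repaired_read` is the new one after resynchronization; in the CP run the write is rejected, so both replicas keep agreeing on `"v1"`.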

There are four main types of NoSQL databases:

Document databases
Key-value stores
Column-oriented databases
Graph databases

Key-value stores

Data is stored in key/value pairs. These stores are designed to handle large amounts of data and heavy load.

Key-value databases store data as a hash table in which each key is unique and the value can be JSON, a BLOB (Binary Large Object), a string, and so on.

Redis, Dynamo, and Riak are examples of key-value store databases.
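The hash-table model described above can be sketched with a Python dictionary. The `KeyValueStore` class and its `put`, `get`, and `delete` methods are hypothetical names for this illustration and do not correspond to the client API of Redis, Dynamo, or Riak.

```python
import json

class KeyValueStore:
    """Minimal in-memory key-value store backed by a hash table (a dict)."""
    def __init__(self):
        self._data = {}

    def put(self, key: str, value) -> None:
        # Keys are unique: writing an existing key replaces its value.
        self._data[key] = value

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        """Remove a key; return True if it existed."""
        return self._data.pop(key, None) is not None

store = KeyValueStore()
# The store treats values as opaque, so they can be of different kinds.
store.put("user:1", json.dumps({"name": "Asha", "dept": "CSE"}))  # JSON text
store.put("avatar:1", b"\x89PNG\r\n")                             # binary data (BLOB)
store.put("motd", "Welcome")                                      # plain string

user = json.loads(store.get("user:1"))
```

Lookups go only through the key; the store itself knows nothing about the fields inside a value, which is what keeps this model simple and fast.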

Document Databases

A document database stores data in JSON, BSON, or XML documents (not Word documents or Google Docs, of course). In a document database, documents can be nested. Particular elements can be indexed for faster querying.
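Nesting and indexing can be illustrated with plain Python dictionaries. This is a sketch only: the sample documents, the dotted-path convention, and the helper names `get_path` and `build_index` are assumptions for the example rather than features of a specific document database.

```python
from collections import defaultdict

# Documents are keyed by an id; each may have a different shape and nested parts.
documents = {
    1: {"name": "Asha", "address": {"city": "Mylavaram", "pin": "521230"},
        "courses": ["DBMS", "OS"]},
    2: {"name": "Ravi", "address": {"city": "Vijayawada"}},
    3: {"name": "Meena", "address": {"city": "Mylavaram"}},
}

def get_path(doc: dict, path: str):
    """Follow a dotted path such as 'address.city' into a nested document."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def build_index(docs: dict, path: str) -> dict:
    """Map each value found at the path to the ids of the documents holding it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        value = get_path(doc, path)
        if value is not None:
            index[value].add(doc_id)
    return index

# With the index, a query by city avoids scanning every document.
city_index = build_index(documents, "address.city")
mylavaram_ids = sorted(city_index["Mylavaram"])
```

Here `mylavaram_ids` holds the ids of the two documents whose nested `address.city` element is `"Mylavaram"`.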

Amazon SimpleDB, CouchDB, and MongoDB are examples of document databases.

Graph-Based

A graph database stores entities as well as the relationships among those entities. Each entity is stored as a node, and each relationship is stored as an edge between nodes. Every node and edge has a unique identifier.

A graph database is a database that is based on graph theory. It consists of a set of objects, each of which is either a node or an edge.

Nodes represent entities or instances such as people, businesses, accounts, or any other item to be tracked.
Edges, also termed relationships, are the lines that connect nodes to other nodes and represent the relationship between them. Edges are the key concept in graph databases.
Properties are pieces of information associated with nodes.

Graph databases are mostly used for social networks, logistics, and spatial data.

Neo4j, InfiniteGraph, and OrientDB are examples of graph databases.
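The node, edge, and property concepts above can be sketched with two dictionaries. The identifiers, the relationship names `FOLLOWS` and `WORKS_AT`, and the helper functions are hypothetical and chosen only for this illustration; real graph databases expose their own query languages.

```python
# Nodes: unique id -> properties of the entity.
nodes = {
    "n1": {"label": "Person", "name": "Asha"},
    "n2": {"label": "Person", "name": "Ravi"},
    "n3": {"label": "Person", "name": "Meena"},
    "n4": {"label": "Company", "name": "Acme"},
}

# Edges: unique id -> (source node, relationship type, target node).
edges = {
    "e1": ("n1", "FOLLOWS", "n2"),
    "e2": ("n2", "FOLLOWS", "n3"),
    "e3": ("n1", "WORKS_AT", "n4"),
}

def neighbors(node_id: str, relation: str) -> list:
    """Nodes reached from node_id by one edge of the given relationship type."""
    return [dst for (src, rel, dst) in edges.values()
            if src == node_id and rel == relation]

def reachable(start: str, relation: str) -> set:
    """All nodes reachable from start by following edges of one type."""
    seen, frontier = set(), [start]
    while frontier:
        current = frontier.pop()
        for nxt in neighbors(current, relation):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

direct = neighbors("n1", "FOLLOWS")
network = reachable("n1", "FOLLOWS")
names = sorted(nodes[n]["name"] for n in network)
```

Following `FOLLOWS` edges from Asha reaches Ravi directly and Meena through Ravi, which is the kind of relationship traversal that social-network queries rely on.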

Column-based

Column-oriented databases work on columns and are based on the BigTable paper by Google. Every column is treated separately. The values of a single column are stored contiguously.

HBase, Cassandra, and Hypertable are examples of column-based databases.
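The idea that the values of one column are stored together can be illustrated by converting a row layout into a column layout. This sketch shows only that contiguous-column idea; it does not model the storage format of HBase, Cassandra, or Hypertable, and the sample data is made up for the example.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Asha", "marks": 82},
    {"id": 2, "name": "Ravi", "marks": 74},
    {"id": 3, "name": "Meena", "marks": 91},
]

# Column layout: all values of one column sit next to each other.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column reads only that column's list,
# without touching the other fields of every record.
average_marks = sum(columns["marks"]) / len(columns["marks"])
```

With the column layout, computing `average_marks` touches a single list, which is why analytical queries over a few columns tend to suit this organization.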
