1 of 19

Big Data and NoSQL

CREDITS: YU-SAN LIN, PENN STATE

(and A. Almaganbetov, A. Aldabergenov)

2 of 19

What is NoSQL?

  • Not Only SQL
  • Non-relational database management systems
  • Developed in response to the need for distributed data stores that can hold very large volumes of data (big data); a NoSQL system typically:
    • may not require a fixed schema
    • avoids join operations
    • scales horizontally
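Scaling horizontally means spreading records across many servers (shards) instead of moving to one bigger machine. The sketch below is minimal and assumes simple hash-based partitioning, with in-memory dicts standing in for servers. The class name `ShardedStore` is hypothetical. Real systems often use consistent hashing or range partitioning, which move less data when nodes are added.

```python
import hashlib

class ShardedStore:
    """Hypothetical key-value store that spreads keys across N shards."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one independent server.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key: str) -> int:
        # Use a stable hash (not Python's per-run randomized hash()) so the
        # same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self._shard_for(key)][key] = value

    def get(self, key: str, default=None):
        # A lookup touches exactly one shard; no cross-server join is needed.
        return self.shards[self._shard_for(key)].get(key, default)


store = ShardedStore(num_shards=3)
for i in range(9):
    store.put(f"user:{i}", {"id": i})

print(store.get("user:4"))             # {'id': 4}
print([len(s) for s in store.shards])  # how many keys landed on each shard
```

To add capacity, you add shards rather than upgrading one machine. With plain modulo hashing, however, changing `num_shards` remaps most keys, which is why consistent hashing is a common refinement.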

3 of 19

NoSQL

  • NoSQL data modeling often starts from application-specific queries
    • Relational modeling: “What answers do I have?”
    • NoSQL data modeling: “What questions do I have?”
  • NoSQL data modeling often requires a deeper understanding of data structures and algorithms than relational database modeling does. Why?
  • Some NoSQL systems are specifically made for hierarchical or graph-like data modeling and processing
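The two modeling questions can be contrasted on one small case. A relational design normalizes the data and answers each query with a join. A query-first NoSQL design stores the data already shaped for a known access pattern. The sketch below uses plain Python and assumes a hypothetical "show an order with its items" query. The table names, field names, and data are illustrative only.

```python
# Relational style: normalized tables; the query is answered with a join.
orders = [{"order_id": 1, "customer": "Ana"}]
order_items = [
    {"order_id": 1, "product": "pen", "qty": 2},
    {"order_id": 1, "product": "notebook", "qty": 1},
]

def order_with_items_relational(order_id):
    order = next(o for o in orders if o["order_id"] == order_id)
    # The "join": scan the items table for rows with a matching foreign key.
    items = [
        {"product": i["product"], "qty": i["qty"]}
        for i in order_items
        if i["order_id"] == order_id
    ]
    return {**order, "items": items}

# Query-first style: one record per order, already shaped for the question
# "show me order X with its items" and keyed for a direct lookup.
orders_by_id = {
    1: {
        "order_id": 1,
        "customer": "Ana",
        "items": [
            {"product": "pen", "qty": 2},
            {"product": "notebook", "qty": 1},
        ],
    }
}

def order_with_items_nosql(order_id):
    return orders_by_id[order_id]  # single lookup, no join

print(order_with_items_relational(1) == order_with_items_nosql(1))  # True
```

The query-first record is cheap to read, but it comes with a cost. If customer details are copied into every order, a change to those details must be written to every copy.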

4 of 19

Big Data

Data that contains greater variety, arriving in increasing volumes, and with more velocity.

https://www.oracle.com/big-data/what-is-big-data/

https://www.statista.com/statistics/871513/worldwide-data-created/

5 of 19

ACID

  • Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all.
  • Consistency preservation: A correct execution of the transaction must take the database from one consistent state to another.
  • Isolation: A transaction should not make its updates visible to other transactions until it is committed; this property, when enforced strictly, solves the temporary update problem and makes cascading rollbacks of transactions unnecessary.
  • Durability or permanency: Once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failure.

Elmasri textbook
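A transfer between two accounts shows atomicity clearly: either both updates happen or neither does. The sketch below is minimal and uses Python's built-in `sqlite3` module, a relational engine chosen only for illustration. The `accounts` table and the simulated crash are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(amount, fail_midway=False):
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'",
                (amount,),
            )
    except RuntimeError:
        pass  # the partial debit from alice has already been rolled back

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

transfer(30, fail_midway=True)
print(balances())  # {'alice': 100, 'bob': 50}  (nothing applied)
transfer(30)
print(balances())  # {'alice': 70, 'bob': 80}   (both applied)
```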

6 of 19

BASE

  • Basically Available: The system guarantees that data is available: every request receives a response. However, that response may be a 'failure' to obtain the requested data, or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account.
  • Soft state: The state of the system may change over time even without new input, because updates are still propagating under 'eventual consistency'; the state of the system is therefore always 'soft.'
  • Eventual consistency: The system will eventually become consistent once it stops receiving input. Data propagates to every place it should sooner or later, but the system keeps accepting input and does not check the consistency of every transaction before moving on to the next one.

http://www.dataversity.net/acid-vs-base-the-shifting-ph-of-database-transaction-processing/
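The three BASE properties can be simulated with one replica that accepts writes immediately and others that catch up later. The sketch below is minimal and assumes a hypothetical in-memory `EventuallyConsistentStore` class. Its explicit `propagate()` step stands in for background replication. Real systems use mechanisms such as anti-entropy or read repair for this.

```python
from collections import deque

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

class EventuallyConsistentStore:
    """Hypothetical store: writes land on one replica now, the rest later."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica(f"r{i}") for i in range(n_replicas)]
        self.pending = deque()  # replication log not yet applied everywhere

    def write(self, key, value):
        # Accept the write right away on the first replica (basically available)...
        self.replicas[0].data[key] = value
        # ...and queue it for the others (soft state until propagated).
        for r in self.replicas[1:]:
            self.pending.append((r, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def propagate(self, steps=None):
        # Apply queued updates; steps=None drains the whole queue.
        while self.pending and (steps is None or steps > 0):
            replica, key, value = self.pending.popleft()
            replica.data[key] = value
            if steps is not None:
                steps -= 1

    def is_consistent(self):
        first = self.replicas[0].data
        return all(r.data == first for r in self.replicas)

store = EventuallyConsistentStore()
store.write("balance", 100)
print(store.read("balance", 0), store.read("balance", 2))  # 100 None (stale read)
store.propagate(steps=1)      # only r1 has caught up
print(store.is_consistent())  # False
store.propagate()             # no new input, so the replicas converge
print(store.is_consistent())  # True
```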

7 of 19

Relational vs. NoSQL

  • Relational
    • Structured query language (SQL)
    • Structured and organized data
    • Data and its relationships are stored in separate tables
    • Data Manipulation Language, Data Definition Language
    • ACID transaction properties
  • NoSQL
    • No standard declarative query language
    • No predefined schema
    • Key-value pair stores, column stores, document stores, graph databases
    • Eventual consistency (BASE)
    • Unstructured and unpredictable data
    • Prioritizes high performance, high availability, and scalability

8 of 19

https://www.flickr.com/photos/bpanulla/5310748684

9 of 19

CAP Theorem

  • Consistency: Data in the database remains consistent after the execution of an operation, e.g., after an update operation, all clients see the same data
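  • Availability: Every request to a working node receives a response; the system stays up with no downtime
  • Partition Tolerance: System continues to function even if the communication between the distributed servers is unreliable
  • CAP Theorem says no distributed system can guarantee all 3 at once: when a network partition occurs, the system must give up either consistency or availability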
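The trade-off only matters once a partition actually happens. The sketch below is minimal and assumes two hypothetical replicas, with a `partitioned` flag standing in for a broken network link. In "CP" mode, the system refuses a write it cannot replicate, which gives up availability. In "AP" mode, it accepts the write locally and the other side keeps serving stale data, which gives up consistency. Reads are left unrestricted to keep the sketch short.

```python
class TwoNodeSystem:
    """Hypothetical two-replica system used to illustrate the CAP choice."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.nodes = [{}, {}]     # two replicas of the same data
        self.partitioned = False  # True = the nodes cannot reach each other

    def write(self, node, key, value):
        if self.partitioned and self.mode == "CP":
            # Cannot reach the other replica, so refuse rather than diverge.
            raise RuntimeError("unavailable: cannot replicate during partition")
        self.nodes[node][key] = value
        if not self.partitioned:
            self.nodes[1 - node][key] = value  # synchronous replication

    def read(self, node, key):
        return self.nodes[node].get(key)

for mode in ("CP", "AP"):
    system = TwoNodeSystem(mode)
    system.write(0, "x", 1)
    system.partitioned = True
    try:
        system.write(0, "x", 2)
        outcome = f"write accepted; node 1 still reads x={system.read(1, 'x')}"
    except RuntimeError as err:
        outcome = str(err)
    print(mode, "->", outcome)
# CP -> unavailable: cannot replicate during partition
# AP -> write accepted; node 1 still reads x=1
```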

10 of 19

https://alperenbayramoglu2.medium.com/understanding-the-cap-theorem-8e06886c12ac

11 of 19

Pros and cons of NoSQL

  • Pros
    • High scalability
    • Distributed computing
    • Lower cost
    • Schema flexibility, semi-structured data
    • Uncomplicated relationships
  • Cons
    • Limited query capabilities (but improving rapidly)
    • Limited or no support for multi-record ACID transactions
    • Large number of types and vendors, with little standardization

12 of 19

Main types of NoSQL database systems

  • Key-value pair
  • Column
  • Document
  • Graph

13 of 19

14 of 19

What to choose?

15 of 19

Key-Value Pair Databases

  • Most basic type of NoSQL database
  • Designed to handle a huge amount of data: Big Data
  • Each key is unique; the value can be any type of data
  • Suitable for: shopping cart contents, settings
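A key-value store's interface is essentially put, get, and delete by key, and the store never looks inside the value. The sketch below is a minimal in-memory version. It assumes a hypothetical `KeyValueStore` class and key names like `cart:<user_id>`. Real products add persistence, replication, and expiry on top of this idea.

```python
import json

class KeyValueStore:
    """Hypothetical in-memory key-value store: values are opaque blobs."""

    def __init__(self):
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value) -> None:
        # Serialize so the store treats the value as an opaque blob.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key: str, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

kv = KeyValueStore()
kv.put("cart:42", {"pen": 2, "notebook": 1})            # shopping cart contents
kv.put("settings:42", {"theme": "dark", "lang": "en"})  # user settings
print(kv.get("cart:42"))      # {'pen': 2, 'notebook': 1}
kv.delete("cart:42")          # e.g. after checkout
print(kv.get("cart:42", {}))  # {}
```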

16 of 19

Column Databases

  • Stores and processes data by column rather than by row
  • Column values are stored contiguously in column-specific files
  • All values in a column file have the same type, which makes them well suited to compression
  • High performance on aggregation queries
  • Suitable for: customer relationship management (CRM), library card catalogs
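Plain Python lists are enough to contrast row and column layouts and to show why columns aggregate quickly and compress well. The sketch below is minimal and assumes hypothetical customer records. It uses run-length encoding as the example compression scheme, although real column stores combine several encodings.

```python
from itertools import groupby

# Row layout: each record stored together (typical of relational engines).
rows = [
    {"id": 1, "city": "Paris", "spend": 120},
    {"id": 2, "city": "Paris", "spend": 80},
    {"id": 3, "city": "Rome", "spend": 200},
    {"id": 4, "city": "Rome", "spend": 50},
]

# Column layout: each column stored contiguously, as if in its own file.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# An aggregation reads only the one column it needs, not whole records.
total_spend = sum(columns["spend"])

def run_length_encode(values):
    # Same-typed, often repeated values compress well as (value, count) pairs.
    return [(v, len(list(g))) for v, g in groupby(values)]

def run_length_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

encoded_city = run_length_encode(columns["city"])
print(total_spend)   # 450
print(encoded_city)  # [('Paris', 2), ('Rome', 2)]
```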

17 of 19

Graph Databases

  • Stores data in a graph
  • Represents entities and their relationships directly, so connected data can be traversed without joins
  • A collection of nodes and edges
  • Each node and each edge has a unique identifier
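The sketch below is a minimal property graph in plain Python. Nodes and edges each get a unique identifier, and queries follow edges rather than joining tables. The `Graph` class, the `FOLLOWS` label, and the data are hypothetical. A graph database would expose the same idea through its own query language and would index adjacency instead of scanning every edge.

```python
from collections import deque
from itertools import count

class Graph:
    """Hypothetical property graph with unique node and edge identifiers."""

    def __init__(self):
        self._ids = count(1)  # one counter, so every id is unique
        self.nodes = {}       # node_id -> properties
        self.edges = {}       # edge_id -> (source_id, label, target_id)

    def add_node(self, **props):
        node_id = next(self._ids)
        self.nodes[node_id] = props
        return node_id

    def add_edge(self, source, label, target):
        edge_id = next(self._ids)
        self.edges[edge_id] = (source, label, target)
        return edge_id

    def neighbors(self, node_id, label):
        # Scans all edges for simplicity; real systems index adjacency.
        return [t for (s, l, t) in self.edges.values() if s == node_id and l == label]

    def reachable(self, start, label, max_hops):
        """Breadth-first traversal: nodes within max_hops edges of start."""
        seen, frontier = {start: 0}, deque([start])
        while frontier:
            current = frontier.popleft()
            if seen[current] == max_hops:
                continue
            for nxt in self.neighbors(current, label):
                if nxt not in seen:
                    seen[nxt] = seen[current] + 1
                    frontier.append(nxt)
        return {n for n, hops in seen.items() if hops > 0}

g = Graph()
ana, ben, cai, dan = (g.add_node(name=n) for n in ("Ana", "Ben", "Cai", "Dan"))
g.add_edge(ana, "FOLLOWS", ben)
g.add_edge(ben, "FOLLOWS", cai)
g.add_edge(cai, "FOLLOWS", dan)

# Who is within two hops of Ana? (friends and friends-of-friends)
print(sorted(g.nodes[n]["name"] for n in g.reachable(ana, "FOLLOWS", max_hops=2)))
# ['Ben', 'Cai']
```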

18 of 19

Document Databases

  • Collection of documents
  • Data is stored inside documents
  • A document is a key-value collection
  • Documents typically do not have a fixed schema, so they are flexible and easy to change
  • Documents can contain key-value pairs, arrays, or nested documents
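The sketch below is a minimal document collection in plain Python. Each document is a nested key-value structure, two documents in the same collection need not share fields, and a query matches on a field path that may be nested. The `find` and `get_path` helpers and their dotted-path syntax are hypothetical. They are loosely modeled on the dot notation that document databases such as MongoDB use for nested fields.

```python
# A "collection" of documents; the two documents have different fields.
users = [
    {
        "_id": 1,
        "name": "Ana",
        "address": {"city": "Paris", "zip": "75001"},  # nested document
        "tags": ["admin", "beta"],                     # array
    },
    {
        "_id": 2,
        "name": "Ben",
        "address": {"city": "Rome"},
        "phone": "+39 555 0100",                       # field only Ben has
    },
]

def get_path(doc, dotted_path):
    """Follow a dotted path like 'address.city' into nested documents."""
    value = doc
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # a missing field simply does not match
        value = value[part]
    return value

def find(collection, query):
    """Return documents whose fields equal every value in the query dict."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == want for path, want in query.items())
    ]

print([d["name"] for d in find(users, {"address.city": "Rome"})])  # ['Ben']
```

No schema migration is needed to add a field: assigning `users[0]["last_login"] = ...` changes only that one document.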

19 of 19