Introduction to Data Science

By

S.V.V.D.Jagadeesh

Sr. Assistant Professor

Dept of Artificial Intelligence & Data Science

LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING

  • Session Outcomes
  • BASE Principles

S.V.V.D.Jagadeesh

Tuesday, March 11, 2025

Previously Discussed Topics

LBRCE

IDS

At the end of this session, students will be able to:

  • Understand the different types of NoSQL databases (Understand - L2)

Session Outcomes

  • Relational databases generally strive toward normalization: making sure every piece of data is stored only once.
  • Normalization marks their structural setup.
  • If, for instance, you want to store data about a person and their hobbies, you can do so with two tables: one about the person and one about their hobbies.
  • An additional table is necessary to link hobbies to persons because of their many-to-many relationship: a person can have multiple hobbies and a hobby can have many persons practicing it.
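
The person/hobby layout described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names, column names, and sample people are assumptions made up for the example, not part of the original slides.

```python
import sqlite3

# In-memory database; every table, column, and person name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE hobby  (hobby_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Link table: one row per (person, hobby) pair resolves the many-to-many relation.
    CREATE TABLE person_hobby (
        person_id INTEGER REFERENCES person(person_id),
        hobby_id  INTEGER REFERENCES hobby(hobby_id),
        PRIMARY KEY (person_id, hobby_id)
    );
""")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO hobby VALUES (?, ?)", [(1, "archery"), (2, "chess")])
conn.executemany("INSERT INTO person_hobby VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])


def hobbies_of(person_name):
    """Return the hobbies of one person, sorted by hobby name."""
    rows = conn.execute(
        """SELECT h.name FROM person p
           JOIN person_hobby ph ON ph.person_id = p.person_id
           JOIN hobby h ON h.hobby_id = ph.hobby_id
           WHERE p.name = ? ORDER BY h.name""",
        (person_name,),
    ).fetchall()
    return [name for (name,) in rows]


def people_with(hobby_name):
    """Return the people practicing one hobby, sorted by person name."""
    rows = conn.execute(
        """SELECT p.name FROM hobby h
           JOIN person_hobby ph ON ph.hobby_id = h.hobby_id
           JOIN person p ON p.person_id = ph.person_id
           WHERE h.name = ? ORDER BY p.name""",
        (hobby_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Each hobby name and each person name is stored exactly once; only the small link table grows as people take up more hobbies.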

Need for different NoSQL Types

  • Traditional relational databases are row-oriented, with each row having a row id and each field within the row stored together in a table.
  • Let’s say, for example’s sake, that no extra data about hobbies is stored and you have only a single table to describe people.

Column Oriented Databases

  • Every time you look up something in a row-oriented database without a suitable index, every row is scanned, regardless of which columns you require.
  • Let’s say you only want a list of birthdays in September.
  • The database will scan the table from top to bottom and left to right.
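
A rough way to picture this full scan is a list of records in Python, where each record keeps all of its fields together. The field names and sample data are assumptions for illustration, and the counter only mimics the work a real storage engine would do.

```python
# Hypothetical "people" table stored row by row: each record holds all its fields.
rows = [
    {"row_id": 1, "name": "Asha", "birthday": "1990-09-14", "hobby": "archery"},
    {"row_id": 2, "name": "Ravi", "birthday": "1988-03-02", "hobby": "chess"},
    {"row_id": 3, "name": "Meena", "birthday": "1995-09-30", "hobby": "archery"},
]


def september_birthdays_row_scan(table):
    """Scan every row and return (matching birthdays, number of fields touched)."""
    fields_read = 0
    matches = []
    for record in table:                 # top to bottom
        fields_read += len(record)       # the whole record is read, left to right
        if record["birthday"][5:7] == "09":
            matches.append(record["birthday"])
    return matches, fields_read


row_matches, row_fields_read = september_birthdays_row_scan(rows)
```

Only the birthday field is needed, yet all twelve fields of the three records are touched.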

Column Oriented Databases

  • Indexing the data on certain columns can significantly improve lookup speed, but indexing every column brings extra overhead and the database is still scanning all the columns.
  • Column databases store each column separately, allowing for quicker scans when only a small number of columns is involved
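
As a minimal sketch of column-wise storage, the same kind of table can be held as one list per column. The column names and values below are assumptions for illustration; real column stores add compression and on-disk layouts that this toy version ignores.

```python
# Hypothetical "people" table stored column by column: one list per column,
# with list position i standing in for row number i.
columns = {
    "name": ["Asha", "Ravi", "Meena"],
    "birthday": ["1990-09-14", "1988-03-02", "1995-09-30"],
    "hobby": ["archery", "chess", "archery"],
}


def september_birthdays_column_scan(store):
    """Read only the birthday column and return (matches, number of values touched)."""
    birthdays = store["birthday"]        # the other columns are never read
    matches = [day for day in birthdays if day[5:7] == "09"]
    return matches, len(birthdays)


col_matches, col_values_read = september_birthdays_column_scan(columns)
```

The query touches three values, one per row, because the name and hobby columns are stored elsewhere and are simply skipped.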

Column Oriented Databases

  • This layout looks very similar to a row-oriented database with an index on every column.
  • A database index is a data structure that allows for quick lookups on data at the cost of storage space and additional writes (index update).
  • An index maps the row number to the data, whereas a column database maps the data to the row numbers; in that way counting becomes quicker, so it’s easy to see how many people like archery, for instance.
  • Storing the columns separately also allows for optimized compression because there’s only one data type per column.
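
The "data to row numbers" mapping can be illustrated with a small inversion step in Python. The row numbers and hobbies are made-up sample data, and `invert` is a hypothetical helper written for this sketch, not a function of any particular database.

```python
from collections import defaultdict

# Row-oriented view: row number -> hobbies of that person (sample data).
hobbies_by_row = {1: ["archery", "chess"], 2: ["chess"], 3: ["archery"]}


def invert(row_to_values):
    """Turn a row -> values mapping into a value -> row numbers mapping."""
    value_to_rows = defaultdict(list)
    for row_id, values in row_to_values.items():
        for value in values:
            value_to_rows[value].append(row_id)
    return dict(value_to_rows)


# Column-oriented view: each distinct value lists the rows that contain it.
hobby_column = invert(hobbies_by_row)

# Counting how many people like archery is a single lookup plus len().
archery_count = len(hobby_column["archery"])
```

Once the data is keyed by value, counting is a matter of measuring one list instead of visiting every row.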

Column Oriented Databases

  • The column-oriented database shines when performing analytics and reporting: summing values and counting entries.
  • A row-oriented database is often the operational database of choice for actual transactions (such as sales).
  • Overnight batch jobs bring the column-oriented database up to date, supporting lightning-speed lookups and aggregations using MapReduce algorithms for reports.
  • Examples of column-family stores are Apache HBase, Apache Cassandra (originally developed at Facebook), Hypertable, and the grandfather of wide-column stores, Google Bigtable.

Column Oriented Databases

  • Key-value stores are the least complex of the NoSQL databases.
  • They are, as the name suggests, a collection of key-value pairs, and this simplicity makes them the most scalable of the NoSQL database types, capable of storing huge amounts of data.

KEY VALUE Stores

  • The value in a key-value store can be anything: a string, a number, but also an entire new set of key-value pairs encapsulated in an object.
  • Examples of key-value stores are Redis, Voldemort, Riak, and Amazon’s Dynamo.
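
The idea can be sketched as a thin wrapper around a Python dictionary. `KeyValueStore` and its keys are hypothetical names for this illustration; it does not reflect the API of Redis, Riak, or any other real product.

```python
class KeyValueStore:
    """Toy in-memory key-value store: the store never inspects the value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


store = KeyValueStore()
store.put("greeting", "hello")                      # a string
store.put("visits", 42)                             # a number
store.put("user:1", {"name": "Asha", "hobbies": ["archery", "chess"]})  # nested pairs
```

Because the store only looks things up by key, anything inside a value (such as the hobbies of `user:1`) has to be interpreted by the application after the value is fetched.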

KEY VALUE Stores

  • Document stores are one step up in complexity from key-value stores: a document store does assume a certain document structure that can be specified with a schema.
  • Document stores appear the most natural among the NoSQL database types because they’re designed to store everyday documents as is, and they allow for complex querying and calculations on this often already aggregated form of data.
  • The way things are stored in a relational database makes sense from a normalization point of view: everything should be stored only once and connected via foreign keys.

Document Stores

  • Document stores care little about normalization as long as the data is in a structure that makes sense.
  • A relational data model doesn’t always fit well with certain business cases. Newspapers or magazines, for example, contain articles.
  • To store these in a relational database, you need to chop them up first: the article text goes in one table, the author and all the information about the author in another, and comments on the article when published on a website go in yet another.
  • A newspaper article can also be stored as a single entity; this lowers the cognitive burden of working with the data for those used to seeing articles all the time.
  • Examples of document stores are MongoDB and CouchDB.
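
To make the newspaper example concrete, here is one way an article could look as a single document, using a plain Python dictionary and the standard json module. All field names and contents are assumptions for illustration; this is not the storage format of MongoDB or CouchDB.

```python
import json

# One article kept as a single nested document: text, author, and comments together.
article = {
    "title": "Local archery club wins regional cup",
    "text": "The club took first place on Sunday.",
    "author": {"name": "Asha", "email": "asha@example.com"},
    "comments": [
        {"user": "Ravi", "text": "Congratulations!"},
        {"user": "Meena", "text": "Well deserved."},
    ],
}

# Documents are commonly exchanged as JSON; the round trip keeps the structure.
serialized = json.dumps(article)
restored = json.loads(serialized)


def comment_count(document):
    """Count comments without joining any other table."""
    return len(document.get("comments", []))
```

In a normalized relational design the same data would be spread over article, author, and comment tables and reassembled with joins on every read.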

Document Stores

  • The last big NoSQL database type is the most complex one, geared toward storing relations between entities in an efficient manner.
  • When the data is highly interconnected, such as for social networks, scientific paper citations, or capital asset clusters, graph databases are the answer.
  • Graph or network data has two main components:

■ Node: The entities themselves. In a social network these could be people.

■ Edge: The relationship between two entities. In a diagram it is drawn as a line, and it can have its own properties. An edge can also have a direction, for example when an arrow indicates who is whose boss.
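
Nodes and directed edges with properties can be sketched with plain Python structures. The people, the `BOSS_OF` relation name, and the `since` property are assumptions made up for this example; a real graph database such as Neo4j has its own data model and query language.

```python
# Nodes: the entities, each with its own properties (sample data).
nodes = {
    "asha": {"type": "person", "name": "Asha"},
    "ravi": {"type": "person", "name": "Ravi"},
    "meena": {"type": "person", "name": "Meena"},
}

# Edges: directed relationships from source to target, with their own properties.
edges = [
    {"source": "asha", "target": "ravi", "relation": "BOSS_OF", "since": 2021},
    {"source": "ravi", "target": "meena", "relation": "BOSS_OF", "since": 2023},
]


def direct_reports(boss):
    """Follow BOSS_OF edges in their direction: who reports to this person?"""
    return [e["target"] for e in edges
            if e["relation"] == "BOSS_OF" and e["source"] == boss]


def bosses_of(person):
    """Follow BOSS_OF edges against their direction: who is this person's boss?"""
    return [e["source"] for e in edges
            if e["relation"] == "BOSS_OF" and e["target"] == person]
```

The direction matters: Asha is Ravi's boss, so the edge yields a result for `direct_reports("asha")` but not for `direct_reports("ravi")` when looking for Asha.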

Graph Databases

  • Graphs can become incredibly complex given enough relation and entity types.
  • Graph databases like Neo4j also claim to uphold ACID, whereas document stores and key-value stores adhere to BASE.

Graph Databases

Ranking of Different Database Types

  • Session Outcomes
  • Need for different NoSQL Types
  • Column Oriented Databases
  • KEY-VALUE stores
  • Document Stores
  • Graph Databases
  • Ranking of different Database Types

Summary
