1 of 21

How to store data? It depends.

Juraj Sottnik

CONFIDENTIAL

2 of 21

What is the nature of data?

  • structured
  • unstructured
  • consistent
  • inconsistent

3 of 21

How big is your data?

  • static dataset
  • stream of data
  • retention

4 of 21

What kind of data (business problem)?

  • relational
  • transactional
  • connections between data
  • CAP (consistency, availability, partition tolerance)
  • operational vs analytic
  • inserts vs reads
  • utility vs strategic

5 of 21

When RDBMS

  • no need for a flexible data model
  • structured data
  • need for ACID
  • no need for constant uptime
  • data can fit into memory

6 of 21

When NoSQL

  • need a more flexible data model
  • need constant uptime
  • need to scale easily
  • business problem doesn't need features provided by SQL

7 of 21

NoSQL - Key value store

  • idea of key and value
  • access to the database is via a primary key
  • value can be simple type or serialized document (watch out for get-update conflicts)
  • can be used as secondary index
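
The get-update conflict mentioned above arises when two clients read the same serialized document, modify it, and write it back: the later write silently overwrites the earlier one. A minimal sketch of one common remedy, a versioned compare-and-set, using a plain in-memory dictionary. The `VersionedStore` class and its method names are hypothetical and do not correspond to any particular product's API.

```python
import json


class VersionedStore:
    """Toy in-memory key-value store; every value carries a version number."""

    def __init__(self):
        self._data = {}  # key -> (version, serialized value)

    def get(self, key):
        """Return (version, value); a missing key reads as (0, None)."""
        return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        """Write only if nobody has written since expected_version was read."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False  # conflict: the caller must re-read and retry
        self._data[key] = (current_version + 1, value)
        return True


store = VersionedStore()
store.compare_and_set("user:1", 0, json.dumps({"name": "Ann", "visits": 1}))

# Two clients read the same version of the document.
version_a, raw_a = store.get("user:1")
version_b, raw_b = store.get("user:1")

doc_a = json.loads(raw_a)
doc_a["visits"] += 1
first_write = store.compare_and_set("user:1", version_a, json.dumps(doc_a))

doc_b = json.loads(raw_b)
doc_b["name"] = "Anna"
second_write = store.compare_and_set("user:1", version_b, json.dumps(doc_b))

print(first_write, second_write)
```

The second client's write is rejected instead of discarding the first client's update; it would re-read the document and apply its change again.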

8 of 21

Memcached

Simple key and value. The server does not care what the value looks like.

set key1 value1

get key1

Simple key, numeric value.

set key2 0

incr key2 1

9 of 21

Redis

Simple key, hash value type.

HSET key3 field1 "Hello"

HGET key3 field1

Simple key, list value type.

RPUSH key4 "hello"

RPUSH key4 "world"

LRANGE key4 0 -1

10 of 21

Redis

Simple key, set value type.

SADD key5 "Hello"

SMEMBERS key5

11 of 21

NoSQL - Document oriented store

  • idea of documents as independent units
  • hierarchical tree data structures
  • version the schema of your documents; updating all existing documents is not easy
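
One way to apply schema versioning is to upgrade documents lazily when they are read, so the whole collection never has to be rewritten at once. A minimal sketch, assuming a hypothetical `schema_version` field and a made-up change from a single `name` field to `first_name` and `last_name`:

```python
CURRENT_SCHEMA_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical change: v1 stored a single 'name', v2 splits it in two."""
    first, _, last = doc["name"].partition(" ")
    upgraded = {k: v for k, v in doc.items() if k != "name"}
    upgraded.update(first_name=first, last_name=last, schema_version=2)
    return upgraded


# Maps a schema version to the function that lifts a document one version up.
UPGRADES = {1: upgrade_v1_to_v2}


def load(doc):
    """Upgrade a document step by step until it reaches the current version."""
    # Documents written before versioning was introduced count as version 1.
    while doc.get("schema_version", 1) < CURRENT_SCHEMA_VERSION:
        doc = UPGRADES[doc.get("schema_version", 1)](doc)
    return doc


old_doc = {"_id": 1, "name": "Ada Lovelace"}
new_doc = {"_id": 2, "first_name": "Alan", "last_name": "Turing", "schema_version": 2}

print(load(old_doc))
print(load(new_doc))
```

Application code then only ever sees the current shape, and old documents can be written back in the new shape whenever they are next saved.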

12 of 21

MongoDB

  • unique _id field that acts as a primary key
  • you can configure schema validation
  • sharding: choose the shard key carefully, because changing it later is expensive (before MongoDB 5.0, resharding required a dump and restore of the data)

13 of 21

MongoDB

Query based on nested properties

{ item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" }

db.inventory.find(
  {
    "size.h": { $lt: 15 },
    "size.uom": "in",
    status: "D"
  }
)
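
The dotted keys in the query above address fields inside the embedded `size` document. A minimal Python sketch of that idea over in-memory dictionaries: it handles only equality and `$lt`, and it is an illustration, not how MongoDB evaluates queries. The `resolve` and `matches` helpers and the second inventory record are assumptions added for the example.

```python
def resolve(doc, dotted_path):
    """Follow a dotted path such as 'size.h' through nested dictionaries."""
    value = doc
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(doc, query):
    """Support only plain equality and the $lt operator, for illustration."""
    for path, condition in query.items():
        actual = resolve(doc, path)
        if isinstance(condition, dict):
            limit = condition["$lt"]
            if actual is None or not actual < limit:
                return False
        elif actual != condition:
            return False
    return True


inventory = [
    {"item": "journal", "qty": 25, "size": {"h": 14, "w": 21, "uom": "cm"}, "status": "A"},
    # Hypothetical second record so that the query has something to find.
    {"item": "paper", "qty": 100, "size": {"h": 8.5, "w": 11, "uom": "in"}, "status": "D"},
]
query = {"size.h": {"$lt": 15}, "size.uom": "in", "status": "D"}

found = [doc["item"] for doc in inventory if matches(doc, query)]
print(found)
```

Note that the journal document does not match this query: its height is below 15, but its unit is "cm" and its status is "A".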

14 of 21

MongoDB

Aggregation Pipeline

db.transactions.aggregate([
  {
    $match: {
      transactionDate: {
        $gte: ISODate("2017-01-01T00:00:00.000Z"),
        // exclusive upper bound on 1 February, so all of January is covered
        $lt: ISODate("2017-02-01T00:00:00.000Z")
      }
    }
  }, {
    $group: {
      _id: null,
      total: {
        $sum: "$amount"
      }
    }
  }
])
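
The pipeline above filters transactions by date and then sums their `amount`. A rough Python equivalent over an in-memory list, assuming the intent is to total all of January 2017 (hence the exclusive upper bound on 1 February); the sample records are made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical sample data; field names follow the pipeline above.
transactions = [
    {"transactionDate": datetime(2016, 12, 31, 23, 0, tzinfo=timezone.utc), "amount": 5},
    {"transactionDate": datetime(2017, 1, 1, 0, 0, tzinfo=timezone.utc), "amount": 10},
    {"transactionDate": datetime(2017, 1, 31, 23, 59, 59, 500000, tzinfo=timezone.utc), "amount": 20},
    {"transactionDate": datetime(2017, 2, 1, 0, 0, tzinfo=timezone.utc), "amount": 40},
]

start = datetime(2017, 1, 1, tzinfo=timezone.utc)
end = datetime(2017, 2, 1, tzinfo=timezone.utc)  # exclusive upper bound

# $match stage: keep only the transactions inside the date range.
matched = [t for t in transactions if start <= t["transactionDate"] < end]

# $group stage with _id: null: fold all matched documents into one total.
result = {"_id": None, "total": sum(t["amount"] for t in matched)}

print(result)
```

The third record falls in the last second of January; an upper bound of 23:59:59.000 would leave it out, which is why a half-open range is the safer way to express "the whole month".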

15 of 21

NoSQL - Column family storage

  • idea of a key-value pair where the value is a set of columns
  • a column consists of a name, a value, and a timestamp
  • partition key
  • compound primary key
  • very efficient at data compression and/or partitioning
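
The data model in the bullets above can be pictured as nested maps: the partition key selects a partition, the rest of the compound primary key orders rows inside it, and each column keeps the timestamp of its last write. A minimal in-memory sketch; the table layout, the `write` and `read_partition` helpers, and the sensor data are all hypothetical, and conflict resolution is reduced to "newest timestamp wins".

```python
from collections import defaultdict

# partition key -> clustering key -> column name -> (value, write timestamp)
table = defaultdict(lambda: defaultdict(dict))


def write(partition_key, clustering_key, column, value, timestamp):
    """Store a column; on conflict the write with the newest timestamp wins."""
    row = table[partition_key][clustering_key]
    if column not in row or row[column][1] < timestamp:
        row[column] = (value, timestamp)


def read_partition(partition_key):
    """Return the rows of one partition, ordered by clustering key."""
    rows = table[partition_key]
    return [
        (ck, {name: value for name, (value, _) in rows[ck].items()})
        for ck in sorted(rows)
    ]


# Compound primary key: (sensor_id, reading_time); sensor_id is the partition key.
write("sensor-1", "2017-01-01T10:05", "temperature", 21.5, timestamp=2)
write("sensor-1", "2017-01-01T10:00", "temperature", 20.0, timestamp=1)
write("sensor-1", "2017-01-01T10:00", "temperature", 19.0, timestamp=0)  # stale write
write("sensor-2", "2017-01-01T10:00", "humidity", 40, timestamp=3)

print(read_partition("sensor-1"))
```

Reading one partition returns its rows already ordered by the second part of the key, which is what makes range queries within a partition cheap in this kind of store.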

16 of 21

Apache Cassandra

  • scalable
  • fault tolerant
  • tunable consistency
  • distributed atomic counters
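
Consistency in Cassandra is chosen per operation through consistency levels. A small sketch of the usual rule of thumb: a read is guaranteed to see the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. The function names are hypothetical and this is arithmetic only, not driver code.

```python
def quorum(replication_factor):
    """Size of a majority of replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every acknowledged write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
print(quorum(rf))
print(read_sees_latest_write(rf, quorum(rf), quorum(rf)))  # QUORUM writes and reads
print(read_sees_latest_write(rf, 1, 1))                    # ONE writes and reads
```

With a replication factor of 3, quorum reads and writes overlap on at least one replica, while reading and writing at level ONE trades that guarantee for lower latency and higher availability.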

17 of 21

NoSQL - Graph database

  • database with an explicit graph structure
  • optimized for connections between objects
  • optimized for traversing connected data

18 of 21

neo4j

  • nodes, edges (relationships), attributes (properties)
  • nodes and edges can have attributes; nodes carry labels and edges carry a type

MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)
WHERE movie.title STARTS WITH "T"
RETURN movie.title AS title, collect(actor.name) AS cast
ORDER BY title ASC LIMIT 10;
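
To make the query above easier to read, here is a rough Python equivalent over a small in-memory list of ACTED_IN relationships. The sample pairs are illustrative data, and a graph database would traverse stored relationships rather than scan a list.

```python
from collections import defaultdict

# Sample ACTED_IN relationships as (actor name, movie title) pairs.
acted_in = [
    ("Keanu Reeves", "The Matrix"),
    ("Carrie-Anne Moss", "The Matrix"),
    ("Tom Hanks", "The Green Mile"),
    ("Tom Hanks", "Cast Away"),
]

# MATCH ... WHERE movie.title STARTS WITH "T", then collect(actor.name) per title.
cast_by_title = defaultdict(list)
for actor, title in acted_in:
    if title.startswith("T"):
        cast_by_title[title].append(actor)

# ORDER BY title ASC LIMIT 10
rows = sorted(cast_by_title.items())[:10]

for title, cast in rows:
    print(title, cast)
```

Each output row pairs one movie title with the list of its actors, which is what `collect(actor.name)` produces when the other returned column is the title.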

19 of 21

Mixed approach

JSON support in Postgres

'[{"a":"foo"},{"b":"bar"},{"c":"baz"}]'::json->2

{"c":"baz"}

'{"a":[1,2,3],"b":[4,5,6]}'::json#>>'{a,2}'

3

You can upsert JSON documents with INSERT ... ON CONFLICT and jsonb_set.
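
For readers less familiar with the two operators shown above, a small Python sketch of what they compute: `->` with an integer picks an array element by zero-based index, and `#>>` follows a path and returns the result as text. The `json_path_text` helper is hypothetical and only mimics the behaviour for these examples.

```python
import json


def json_path_text(document, path):
    """Follow a path of keys and array indexes and return the result as text."""
    value = document
    for step in path:
        value = value[int(step)] if isinstance(value, list) else value[step]
    return value if isinstance(value, str) else json.dumps(value)


array_doc = json.loads('[{"a":"foo"},{"b":"bar"},{"c":"baz"}]')
object_doc = json.loads('{"a":[1,2,3],"b":[4,5,6]}')

third_element = array_doc[2]                          # like ::json->2
nested_text = json_path_text(object_doc, ["a", "2"])  # like ::json#>>'{a,2}'

print(third_element)
print(nested_text)
```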

20 of 21

Dummy Web Application

User sessions - Redis

Product Catalog - MongoDB

User events - MongoDB, Cassandra

Recommendations - neo4j

Analytics - MongoDB, Cassandra

Orders and payments - RDBMS

21 of 21

Summary

  • different problems need different solutions
  • polyglot approach can solve our problems but adds complexity
  • prototype and discover
  • do not forget to think about retention
  • NoSQL can be introduced in existing applications
  • Use RDBMS when it makes sense
