1 of 29

CSE 414: Section 9

NoSQL, SQL++

May 27th, 2021

2 of 29

Announcements

Welcome to Week 9 - we’re almost there!

  • HW6 due this Monday i.e. May 31st
  • HW4 grades released
    • Make sure you check feedback
    • If you find any grading errors - submit a regrade request:)
  • Quiz 3 opens coming Tuesday i.e. June 1st

3 of 29

Data Models

  • Key-value stores
    • Opaque value
    • e.g., Memcached, Redis
  • Document stores
    • “key-object”
    • e.g., SimpleDB, CouchDB, MongoDB, AsterixDB
  • Column-oriented
    • “column groups”
    • e.g., HBase, Cassandra, ClickHouse
  • Graph
    • E.g. Neo4j

4 of 29

Structured Data

5 of 29

Unstructured Data

6 of 29

Semi-structured Data

7 of 29

JSON

8 of 29

Why Semi-Structured?

Real-world data is often semistructured, with a flexible schema

Of course, we can convert this into a relational model, but the overhead costs of doing so may be too high

Hence, we use a system to query over Semi-Structured data directly

9 of 29

JSON and Semi-Structured Data

What kind of files are used for such data? JSON, XML, Protobuf (also an IDL)

Familiar - as your HTTP request/response

  • Good for data exchange
  • Maps to OOP paradigm

Also - as a database file

  • Flexible tree-structured model
  • Query langs: XQuery, XPath, etc.

10 of 29

JSON

JavaScript Object Notation

  • It is easy for humans to read and write. It is easy for machines to parse and generate. (Win-Win)
  • Based on familiar syntax derived from JavaScript, C, Java

JSON is built on two structures:

  • A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
  • An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

11 of 29

JSON Takeaway

Think of JSON as an array of ordered name value pairs, or an array of objects in the form of a tree..

.adm files in your homework- stand for Asterix Data Model, very similar format to JSON files

12 of 29

AsterixDB

  • A semi-structured NoSQL style data model (ADM)
  • Extends JSON with object database ideas
  • A DBMS used to query over semi-structured data

Know the following:

  • DDL: type (open vs. closed), data types (e.g. multiset). Creating an index.
  • DML:Joins, Nesting / Unnesting.
  • (Asterix stores data as flattened tables behind the scenes)

13 of 29

SQL++

The query language we use in AsterixDB to query semi-structured data

Strikingly similar to SQL!

(Because SQL is the industry

standard)

14 of 29

What is SQL++?

Just like SQL but parsed for processing JSON data

SQL++ has keywords to handle collections of data (i.e. non-flat data)

15 of 29

Motivation for SQL++

Why SQL++? Why not some other query language?

People are used to/like specifying data through SQL syntax

SQL-like language enforces idea of physical data independence

16 of 29

In essence

If this felt like a lot of terms bombarded together:

We are going to write SQL++ queries in AsterixDB to query over semi-structured JSON data

17 of 29

SQL++ Features

18 of 29

Output with Structure

If we wanted the name and a list of orders:

SELECT P.name, P.orders

FROM Person P;

Output:

{“name”: “Dan”, “orders”: [ { “date”: 1997,

“product”: “Furby”} ]}

{“name”: “Alvin”, “orders”: [ {date”: 2000,

“product”: “Furby”},

{date”: 2012,

“product”: “Magic8”} ]}

19 of 29

Unnesting

If we wanted the name and each product separately:

SELECT P.name, O.product as product

FROM Person P, P.orders O;

{“name”: “Dan”, “product”: “Furby”}

{“name”: “Alvin”, “product”: “Furby”}

{“name”: “Alvin”, “product”: “Magic8”}

20 of 29

Nesting (with subquery)

If we wanted the name and all products as a list:

SELECT P.name, (SELECT O.product FROM P.order O)� AS products

FROM Person P;

{“name”: “Dan”, “products”: [“Furby”] }

{“name”: “Alvin”, “products”: [“Furby”, “Magic8”]}

21 of 29

Nesting (with LET subquery)

If we wanted the name and all products as a list:

SELECT P.name, products

FROM Person P

LET products = (SELECT O.product FROM P.order O);

{“name”: “Dan”, “products”: [“Furby”] }

{“name”: “Alvin”, “products”: [“Furby”, “Magic8”]}

22 of 29

Compare

If we wanted the name and �all products as a list:

SELECT P.name, products

FROM Person P

LET products = (SELECT O.product

FROM P.order O);

{“name”: “Dan”, “product”: [“Furby”] }

{“name”: “Alvin”, “product”: [“Furby”, “Magic8”]}

If we wanted the name and �each product separately:

SELECT P.name, O.product as product

FROM Person P, P.orders O;

{“name”: “Dan”, “product”: “Furby”}

{“name”: “Alvin”, “product”: “Furby”}

{“name”: “Alvin”, “product”: “Magic8”}

23 of 29

Casing

SELECT y.name, z.`-continent` as continent

FROM geo.world x, x.mondial.country y,

case when is_array(y.encompassed)

then y.encompassed

else [y.encompassed] end z

ORDER BY y.name;

“Find the encompassing continents for each country”

24 of 29

HW SQL++ Example

SELECT y.`-car_code` as code,

y.name as name

FROM geo.world x, x.mondial.country y

ORDER BY y.name;

25 of 29

y)

26 of 29

27 of 29

Useful Keywords/Syntax for HW

is_array( … ) -----> checks if value is an array

split(s, d) -----> splits string s on delimiter d

[ … ] -----> explicitly construct array

(CASE WHEN … THEN … ELSE … END) -----> combine with “is_array(…)

MISSING -----> reserved keyword like “NULL

` … ` -----> backtick needed for accessing keys with names containing “-

28 of 29

NoSQL

  • No clear definition :\
    • Non-relational
    • + scalability, + availability, + flexibility
    • - consistency, - OLAP performance
    • Open source implementations
  • Motivation
    • The need to scale
    • Lots of web apps mostly OLTP queries
      • Read/write intensive
      • but fewer joins & aggregates

29 of 29

SQL vs NoSQL

Structure

Language

Scaling

Convergence