1 of 29

CSE 414: Section 9

NoSQL, SQL++

May 27th, 2021

2 of 29

Announcements

Welcome to Week 9 - we’re almost there!

HW6 due this Monday i.e. May 31st
HW4 grades released

Make sure you check feedback
If you find any grading errors - submit a regrade request:)

Quiz 3 opens coming Tuesday i.e. June 1st

3 of 29

Data Models

Key-value stores

Opaque value
e.g., Memcached, Redis

Document stores

“key-object”
e.g., SimpleDB, CouchDB, MongoDB, AsterixDB

Column-oriented

“column groups”
e.g., HBase, Cassandra, ClickHouse

Graph

E.g. Neo4j

4 of 29

Structured Data

5 of 29

Unstructured Data

6 of 29

Semi-structured Data

7 of 29

JSON

8 of 29

Why Semi-Structured?

Real-world data is often semistructured, with a flexible schema

Of course, we can convert this into a relational model, but the overhead costs of doing so may be too high

Hence, we use a system to query over Semi-Structured data directly

9 of 29

JSON and Semi-Structured Data

What kind of files are used for such data? JSON, XML, Protobuf (also an IDL)

Familiar - as your HTTP request/response

Good for data exchange
Maps to OOP paradigm

Also - as a database file

Flexible tree-structured model
Query langs: XQuery, XPath, etc.

10 of 29

JSON

JavaScript Object Notation

It is easy for humans to read and write. It is easy for machines to parse and generate. (Win-Win)
Based on familiar syntax derived from JavaScript, C, Java

JSON is built on two structures:

A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

11 of 29

JSON Takeaway

Think of JSON as an array of ordered name value pairs, or an array of objects in the form of a tree..

.adm files in your homework- stand for Asterix Data Model, very similar format to JSON files

12 of 29

AsterixDB

A semi-structured NoSQL style data model (ADM)
Extends JSON with object database ideas
A DBMS used to query over semi-structured data

Know the following:

DDL: type (open vs. closed), data types (e.g. multiset). Creating an index.
DML:Joins, Nesting / Unnesting.
(Asterix stores data as flattened tables behind the scenes)

13 of 29

SQL++

The query language we use in AsterixDB to query semi-structured data

Strikingly similar to SQL!

(Because SQL is the industry

standard)

14 of 29

What is SQL++?

Just like SQL but parsed for processing JSON data

SQL++ has keywords to handle collections of data (i.e. non-flat data)

15 of 29

Motivation for SQL++

Why SQL++? Why not some other query language?

People are used to/like specifying data through SQL syntax

SQL-like language enforces idea of physical data independence

16 of 29

In essence

If this felt like a lot of terms bombarded together:

We are going to write SQL++ queries in AsterixDB to query over semi-structured JSON data

17 of 29

SQL++ Features

18 of 29

Output with Structure

If we wanted the name and a list of orders:

SELECT P.name, P.orders

FROM Person P;

Output:

{“name”: “Dan”, “orders”: [ { “date”: 1997,

“product”: “Furby”} ]}

{“name”: “Alvin”, “orders”: [ {date”: 2000,

“product”: “Furby”},

{date”: 2012,

“product”: “Magic8”} ]}

19 of 29

Unnesting

If we wanted the name and each product separately:

SELECT P.name, O.product as product

FROM Person P, P.orders O;

{“name”: “Dan”, “product”: “Furby”}

{“name”: “Alvin”, “product”: “Furby”}

{“name”: “Alvin”, “product”: “Magic8”}

20 of 29

Nesting (with subquery)

If we wanted the name and all products as a list:

SELECT P.name, (SELECT O.product FROM P.order O)� AS products

FROM Person P;

{“name”: “Dan”, “products”: [“Furby”] }

{“name”: “Alvin”, “products”: [“Furby”, “Magic8”]}

21 of 29

Nesting (with LET subquery)

If we wanted the name and all products as a list:

SELECT P.name, products

FROM Person P

LET products = (SELECT O.product FROM P.order O);

{“name”: “Dan”, “products”: [“Furby”] }

{“name”: “Alvin”, “products”: [“Furby”, “Magic8”]}

22 of 29

Compare

If we wanted the name and �all products as a list:

SELECT P.name, products

FROM Person P

LET products = (SELECT O.product

FROM P.order O);

{“name”: “Dan”, “product”: [“Furby”] }

{“name”: “Alvin”, “product”: [“Furby”, “Magic8”]}

If we wanted the name and �each product separately:

SELECT P.name, O.product as product

FROM Person P, P.orders O;

{“name”: “Dan”, “product”: “Furby”}

{“name”: “Alvin”, “product”: “Furby”}

{“name”: “Alvin”, “product”: “Magic8”}

23 of 29

Casing

SELECT y.name, z.`-continent` as continent

FROM geo.world x, x.mondial.country y,

case when is_array(y.encompassed)

then y.encompassed

else [y.encompassed] end z

ORDER BY y.name;

“Find the encompassing continents for each country”

24 of 29

HW SQL++ Example

SELECT y.`-car_code` as code,

y.name as name

FROM geo.world x, x.mondial.country y

ORDER BY y.name;

25 of 29

y)

26 of 29

27 of 29

Useful Keywords/Syntax for HW

is_array( … ) -----> checks if value is an array

split(s, d) -----> splits string s on delimiter d

[ … ] -----> explicitly construct array

(CASE WHEN … THEN … ELSE … END) -----> combine with “is_array(…)”

MISSING -----> reserved keyword like “NULL”

` … ` -----> backtick needed for accessing keys with names containing “-”

28 of 29

NoSQL

No clear definition :\

Non-relational
+ scalability, + availability, + flexibility
- consistency, - OLAP performance
Open source implementations

Motivation

The need to scale
Lots of web apps mostly OLTP queries

Read/write intensive
but fewer joins & aggregates

SQL vs. NoSQL Databases: What’s the Difference?

29 of 29

SQL vs NoSQL

Structure

Language

Scaling

Convergence