CSE 414: Section 9
NoSQL, SQL++
May 27th, 2021
Announcements
Welcome to Week 9 - we’re almost there!
Data Models
Structured Data
Unstructured Data
Semi-structured Data
JSON
Why Semi-Structured?
Real-world data is often semistructured, with a flexible schema
Of course, we can convert this into a relational model, but the overhead costs of doing so may be too high
Hence, we use a system to query over Semi-Structured data directly
JSON and Semi-Structured Data
What kind of files are used for such data? JSON, XML, Protobuf (also an IDL)
Familiar - as your HTTP request/response
Also - as a database file
JSON
JavaScript Object Notation
JSON is built on two structures:
JSON Takeaway
Think of JSON as an array of ordered name value pairs, or an array of objects in the form of a tree..
.adm files in your homework- stand for Asterix Data Model, very similar format to JSON files
AsterixDB
Know the following:
SQL++
The query language we use in AsterixDB to query semi-structured data
Strikingly similar to SQL!
(Because SQL is the industry
standard)
What is SQL++?
Just like SQL but parsed for processing JSON data
SQL++ has keywords to handle collections of data (i.e. non-flat data)
Motivation for SQL++
Why SQL++? Why not some other query language?
People are used to/like specifying data through SQL syntax
SQL-like language enforces idea of physical data independence
In essence
If this felt like a lot of terms bombarded together:
We are going to write SQL++ queries in AsterixDB to query over semi-structured JSON data
SQL++ Features
Output with Structure
If we wanted the name and a list of orders:
SELECT P.name, P.orders
FROM Person P;
Output:
{“name”: “Dan”, “orders”: [ { “date”: 1997,
“product”: “Furby”} ]}
{“name”: “Alvin”, “orders”: [ {date”: 2000,
“product”: “Furby”},
{date”: 2012,
“product”: “Magic8”} ]}
Unnesting
If we wanted the name and each product separately:
SELECT P.name, O.product as product
FROM Person P, P.orders O;
{“name”: “Dan”, “product”: “Furby”}
{“name”: “Alvin”, “product”: “Furby”}
{“name”: “Alvin”, “product”: “Magic8”}
Nesting (with subquery)
If we wanted the name and all products as a list:
SELECT P.name, (SELECT O.product FROM P.order O)� AS products
FROM Person P;
{“name”: “Dan”, “products”: [“Furby”] }
{“name”: “Alvin”, “products”: [“Furby”, “Magic8”]}
Nesting (with LET subquery)
If we wanted the name and all products as a list:
SELECT P.name, products
FROM Person P
LET products = (SELECT O.product FROM P.order O);
{“name”: “Dan”, “products”: [“Furby”] }
{“name”: “Alvin”, “products”: [“Furby”, “Magic8”]}
Compare
If we wanted the name and �all products as a list:
SELECT P.name, products
FROM Person P
LET products = (SELECT O.product
FROM P.order O);
{“name”: “Dan”, “product”: [“Furby”] }
{“name”: “Alvin”, “product”: [“Furby”, “Magic8”]}
If we wanted the name and �each product separately:
SELECT P.name, O.product as product
FROM Person P, P.orders O;
{“name”: “Dan”, “product”: “Furby”}
{“name”: “Alvin”, “product”: “Furby”}
{“name”: “Alvin”, “product”: “Magic8”}
Casing
SELECT y.name, z.`-continent` as continent
FROM geo.world x, x.mondial.country y,
case when is_array(y.encompassed)
then y.encompassed
else [y.encompassed] end z
ORDER BY y.name;
“Find the encompassing continents for each country”
HW SQL++ Example
SELECT y.`-car_code` as code,
y.name as name
FROM geo.world x, x.mondial.country y
ORDER BY y.name;
y)
Useful Keywords/Syntax for HW
is_array( … ) -----> checks if value is an array
split(s, d) -----> splits string s on delimiter d
[ … ] -----> explicitly construct array
(CASE WHEN … THEN … ELSE … END) -----> combine with “is_array(…)”
MISSING -----> reserved keyword like “NULL”
` … ` -----> backtick needed for accessing keys with names containing “-”
NoSQL
SQL vs NoSQL
Structure
Language
Scaling
Convergence