Data Engineering��OLAP and Summarization
Raw Data
Transactions
Sensors
Log Files
Experiments
Data Warehouse
Data Integration
Extract
Load
Transform
Source of Truth
Governed
Secure
Audited�Managed
“ELT”
Raw Data
Transactions
Sensors
Log Files
Experiments
Use-Case-Specific
Fit for purpose
Self-Service
Data Lake
Data Warehouse
Extract
Load
Transform
Transform
Transform
ETLT+?
Source of Truth
Governed
Secure
Audited�Managed
Data Preparation /
Integration
Data Warehousing & OLAP
Typical Architectures
Region SW
Region NW
Region MW
Region NE
Region S
Regional OLTP
(live or transactional) support
Typical Architectures
Region SW
Region NW
Region MW
Region NE
Centralized
Data Warehouse
Region S
OLAP support
Regional OLTP
(live or transactional) support
Typical Architectures
Region SW
Region NW
Region MW
Region NE
Centralized
Data Warehouse
Region S
ETL
Periodically done
OLAP support
Regional OLTP
(live or transactional) support
Typical Architectures
Region SW
Region NW
Region MW
Region NE
Centralized
Data Warehouse
Region S
ETL
Periodically done
OLAP support
Regional OLTP
(live or transactional) support
Typical Architectures
Region SW
Region NW
Region MW
Region NE
Centralized
Data Warehouse
Region S
ETL
Regional OLTP
(live or transactional) support
Periodically done
OLAP support
Schemas in Data Warehouses
10
Centralized
Data Warehouse
Star Schema
11
Store
Iteminfo
Customer
Sales
Foreign Key
Foreign Key
Foreign Key
Star to Snowflake
12
Store
Customer
Sales
Manf
Iteminfo
Foreign Key
OLAP Queries
SELECT category, country, COUNT(number)
FROM Sales NATURAL JOIN Iteminfo NATURAL JOIN Store
GROUP BY category, country
13
Store
Iteminfo
Customer
Sales
Dates in OLAP
14
Store
Iteminfo
Customer
Sales
Star Schema
SELECT category, country, month, COUNT(number)
FROM Sales NATURAL JOIN Iteminfo NATURAL JOIN Store NATURAL JOIN Dateinfo
GROUP BY category, country, month
SELECT category, country, datepart(‘month’, date) as month, COUNT(number)
FROM Sales NATURAL JOIN Iteminfo NATURAL JOIN Store
GROUP BY category, country, month
15
Store
Iteminfo
Customer
Sales
Date
Introducing Data Cubes
16
Item | Color | Size | Number |
Jacket | Blue | Small | 1 |
Jacket | Red | Medium | 1 |
Jeans | Black | Large | 2 |
… | … | … | … |
Vanilla GROUP BY across all groups
17
Item | Color | Size | Number |
Jacket | Blue | Small | 23 |
Jacket | Blue | Medium | 17 |
Jacket | Blue | Large | 34 |
Jacket | Red | Small | 18 |
… | … | … | … |
Jeans | Blue | Small | 14 |
… | … | … | 13 |
Say I was interested in only the color and item…
18
| Blue | Red | … | Total |
Jacket | 23 | 45 | … | 234 |
Jeans | 24 | 28 | … | 462 |
… | … | … | … | … |
Total | 89 | 132 | … | 2384 |
Say I was interested in only the color and item…
19
| Blue | Red | … | Total |
Jacket | 23 | 45 | … | 234 |
Jeans | 24 | 28 | … | 462 |
… | … | … | … | … |
Total | 89 | 132 | … | 2384 |
Another way to view this…
20
| Blue | Red | … | Total |
Jacket | 23 | 45 | … | 234 |
Jeans | 24 | 28 | … | 462 |
… | … | … | … | … |
Total | 89 | 132 | … | 2384 |
Another way to view this…
21
Another way to view this…
22
Hierarchies for Setting the Partitioning Granularity
23
Datetime
Hour
Full Date
Day of week
Month
Specific Month
Specific Quarter
Year
Quarter
City
State
Country
Hierarchies for Setting the Partitioning Granularity
24
Datetime
Hour
Full Date
Day of week
Month
Specific Month
Specific Quarter
Year
Quarter
City
State
Country
Moves in the Hierarchy and Corresponding Cube
25
OLAP in SQL: CUBE
SELECT item, color, SUM(number)
FROM Sales
GROUP BY CUBE (item, color)
26
Item | Color | Number |
Jacket | Blue | 23 |
… | … | … |
Jacket | Green | 34 |
Jeans | Blue | 28 |
… | … | … |
Jeans | Green | 17 |
Jacket | NULL | 185 |
Jeans | NULL | 200 |
… | … | … |
NULL | Blue | 94 |
NULL | Red | 74 |
… | … | … |
NULL | NULL | 984 |
ROLLUP is an alternative to CUBE
ROLLUP targets a smaller # of combinations
SELECT item, color, size, SUM(number)
FROM Sales
GROUP BY ROLLUP (item), ROLLUP (color, size)
27
Picking the right CUBE/ROLLUP query
28
How does this relate to visualizations?
29
So, why did we learn OLAP?
30
Extra Slides on Visualization
Why Visualizations?
32
Types of Data: A Visualization-Oriented Perspective
33
Airlines, Genre
Film ratings, Batteries
Year, Location
Sales, Profit,
Temperature
Types of Data: A Visualization-Oriented Perspective
34
Hot, cold
Good, OK, Bad
Temperature
Score
Grade
A Very Quick Primer on Visualization Types
35
Bar Charts
36
Q: What SQL query
generates a multiple
bar chart?
When are bar charts appropriate?
37
Line Charts
38
When are line charts appropriate?
39
Scatterplots
40
When are scatterplots appropriate?
41
Choropleths
42
When are choropleths appropriate?
43
What type of visualization would you use?
44
Visualization Tools
45
What do Visual Analytics tools support?
46
Composing Attributes
47
Table Algebra: Basic Operands
48
Concatenation Operator (+)
49
Cross operator (X)
50
And then what?
51
Takeaways
52