JavaScript isn't enabled in your browser, so this file can't be opened. Enable and reload.

1 of 28

IoT Analytics: From MongoDB to Redshift

Ilya Drabenia, Tech Lead�

2 of 28

Agenda

Initial Solution
Initial Solution Enhancements
Stack Alternatives
AWS Redshift
Result Solution
Conclusion

3 of 28

IoT Project

4 of 28

Initial Solution: Design

5 of 28

Enhancements: Async Generation

6 of 28

Enhancements

Pre-Aggregation

Clustering Indexes

7 of 28

Stack Alternatives: Columnar vs Row Storage

8 of 28

Stack Alternatives: Sorting vs B-Tree

9 of 28

Stack Alternatives: Sorting vs B-Tree

10 of 28

Stack Alternatives: Flat vs Star Schema

1. Too difficult to update names and attributes of dimensions�2. Simple to load / unload data

11 of 28

Stack Alternatives: Flat vs Star Schema

1. Allow update names and attributes of dimensions in simple way�2. More difficult to load / unload data

12 of 28

Stack Alternatives

Analyzed Alternatives

Redshift
Greenplum
Vertica
ClickHouse
Apache Hadoop

13 of 28

AWS Redshift: Architecture

Distribution Style:

EVEN
KEY
ALL

SLA 99.9%

14 of 28

AWS Redshift: Compression

Original data value	Original size (bytes)	Compressed value (token)	Compressed size (bytes)
Blue	4	{2,Blue}	5
Blue	4	{2,Blue}	0
Green	5	{3,Green}	6
Green	5		0
Green	5		0
Totals	23		11

analyze compression <TABLE>

15 of 28

Result Solution: Proposed Solution

16 of 28

Result Solution: Data Ingestion Pipeline

17 of 28

Result Solution: Single Node Benchmark

Half of billion rows process max in 10 minutes�

18 of 28

Result Solution: Two-Nodes Benchmark

2 billion rows process max in 5 minutes�

19 of 28

Result Solution: Two-Nodes Benchmark

Caching

20 of 28

Result Solution: Two-Nodes Benchmark

Partial Caching

21 of 28

Result Solution: Two-Nodes Benchmark

Disk

22 of 28

Conclusion: Redshift Schema Tips

Use KEY distribution carefully, it may trigger data skew�

23 of 28

Conclusion: Redshift Schema Tips

No Secondary Indexes, each query must use sorting key�

24 of 28

Conclusion

MongoDB is not well suited for analytics

25 of 28

Conclusion

MongoDB is not well suited for analytics
Fix performance with pre-aggregation and covering indexes

26 of 28

Conclusion

MongoDB is not well suited for analytics
Fix performance with pre-aggregation and covering indexes
Columnar storage, sorting keys and compressions

27 of 28

Conclusion

MongoDB is not well suited for analytics
Fix performance with pre-aggregation and covering indexes
Columnar storage, sorting keys and compressions
Move to Redshift benefits

28 of 28

Conclusion

Q&A?