1 of 28

IoT Analytics: From MongoDB to Redshift

Ilya Drabenia, Tech Lead�

2 of 28

Agenda

  • Initial Solution
  • Initial Solution Enhancements
  • Stack Alternatives
  • AWS Redshift
  • Result Solution
  • Conclusion

3 of 28

IoT Project

4 of 28

Initial Solution: Design

5 of 28

Enhancements: Async Generation

6 of 28

Enhancements

  • Pre-Aggregation

  • Clustering Indexes

7 of 28

Stack Alternatives: Columnar vs Row Storage

8 of 28

Stack Alternatives: Sorting vs B-Tree

9 of 28

Stack Alternatives: Sorting vs B-Tree

10 of 28

Stack Alternatives: Flat vs Star Schema

1. Too difficult to update names and attributes of dimensions�2. Simple to load / unload data

11 of 28

Stack Alternatives: Flat vs Star Schema

1. Allow update names and attributes of dimensions in simple way�2. More difficult to load / unload data

12 of 28

Stack Alternatives

Analyzed Alternatives

  • Redshift
  • Greenplum
  • Vertica
  • ClickHouse
  • Apache Hadoop

13 of 28

AWS Redshift: Architecture

Distribution Style:

  • EVEN
  • KEY
  • ALL

SLA 99.9%

14 of 28

AWS Redshift: Compression

Original data value

Original size (bytes)

Compressed value (token)

Compressed size (bytes)

Blue

4

{2,Blue}

5

Blue

4

0

Green

5

{3,Green}

6

Green

5

0

Green

5

0

Totals

23

11

analyze compression <TABLE>

15 of 28

Result Solution: Proposed Solution

16 of 28

Result Solution: Data Ingestion Pipeline

17 of 28

Result Solution: Single Node Benchmark

Half of billion rows process max in 10 minutes�

18 of 28

Result Solution: Two-Nodes Benchmark

2 billion rows process max in 5 minutes�

19 of 28

Result Solution: Two-Nodes Benchmark

Caching

20 of 28

Result Solution: Two-Nodes Benchmark

Partial Caching

21 of 28

Result Solution: Two-Nodes Benchmark

Disk

22 of 28

Conclusion: Redshift Schema Tips

Use KEY distribution carefully, it may trigger data skew�

23 of 28

Conclusion: Redshift Schema Tips

No Secondary Indexes, each query must use sorting key�

24 of 28

Conclusion

  • MongoDB is not well suited for analytics

25 of 28

Conclusion

  • MongoDB is not well suited for analytics
  • Fix performance with pre-aggregation and covering indexes

26 of 28

Conclusion

  • MongoDB is not well suited for analytics
  • Fix performance with pre-aggregation and covering indexes
  • Columnar storage, sorting keys and compressions

27 of 28

Conclusion

  • MongoDB is not well suited for analytics
  • Fix performance with pre-aggregation and covering indexes
  • Columnar storage, sorting keys and compressions
  • Move to Redshift benefits

28 of 28

Conclusion

Q&A?