IoT Analytics: From MongoDB to Redshift
Ilya Drabenia, Tech Lead
Agenda
IoT Project
Initial Solution: Design
Enhancements: Async Generation
Enhancements
Stack Alternatives: Columnar vs Row Storage
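A minimal Python sketch of why columnar storage suits analytics. It assumes a toy IoT table whose column names (device_id, timestamp, temperature) are hypothetical. Averaging one column in a row store touches every value of every row, while a column store touches only that column. The `values_read` counter stands in for disk I/O. Real engines read whole blocks, not single values.

```python
# Hypothetical IoT readings: (device_id, timestamp, temperature)
rows = [
    ("dev-1", 1000, 21.5),
    ("dev-2", 1001, 22.0),
    ("dev-1", 1002, 21.7),
    ("dev-3", 1003, 23.1),
]

def avg_temperature_row_store(records):
    """Row storage: each record is stored together, so the whole row is read."""
    values_read = 0
    total = 0.0
    for record in records:
        values_read += len(record)  # every column of the row comes along
        total += record[2]
    return total / len(records), values_read

# Columnar storage: each column is stored separately.
columns = {
    "device_id": [r[0] for r in rows],
    "timestamp": [r[1] for r in rows],
    "temperature": [r[2] for r in rows],
}

def avg_temperature_column_store(cols):
    """Columnar storage: only the queried column is read."""
    temps = cols["temperature"]
    return sum(temps) / len(temps), len(temps)

row_avg, row_read = avg_temperature_row_store(rows)
col_avg, col_read = avg_temperature_column_store(columns)
print(row_avg, row_read)
print(col_avg, col_read)
```

Both layouts return the same average. The gap in values read grows with the number of columns, which is why wide IoT tables favor columnar engines.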
Stack Alternatives: Sorting vs B-Tree
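Sorted storage can answer range predicates with a binary search instead of a separately maintained B-tree. Here is a minimal Python sketch that assumes a hypothetical timestamp sort key held in memory:

```python
import bisect

# Hypothetical timestamp column, stored in sort-key order.
timestamps = [1000, 1005, 1010, 1020, 1030, 1045, 1060, 1075]

def range_positions(sorted_column, low, high):
    """Return [start, end) positions of values with low <= value <= high."""
    start = bisect.bisect_left(sorted_column, low)
    end = bisect.bisect_right(sorted_column, high)
    return start, end

start, end = range_positions(timestamps, 1010, 1045)
print(timestamps[start:end])
```

The trade-off, roughly: a B-tree absorbs inserts in any order, while a sorted layout must be re-sorted when rows arrive out of order. In Redshift, VACUUM does this re-sorting.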
Stack Alternatives: Flat vs Star Schema
1. Difficult to update dimension names and attributes: every row that repeats them must be rewritten
2. Simple to load / unload data
Stack Alternatives: Flat vs Star Schema
1. Dimension names and attributes are easy to update (one row in the dimension table)
2. More difficult to load / unload data (joins are needed)
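The trade-off between the two schema styles can be shown as a hypothetical Python sketch. Table and column names are illustrative and not from the project. Renaming a device rewrites every matching fact row in the flat table but only one dimension row in the star schema. Getting flat rows back out of the star schema requires a join.

```python
# Flat (denormalized) table: the device name is copied into every fact row.
flat_facts = [
    {"device_id": 1, "device_name": "Boiler A", "temperature": 71.0},
    {"device_id": 1, "device_name": "Boiler A", "temperature": 72.5},
    {"device_id": 2, "device_name": "Pump B", "temperature": 40.1},
]

# Star schema: facts reference a dimension table by key.
device_dim = {1: {"device_name": "Boiler A"}, 2: {"device_name": "Pump B"}}
star_facts = [
    {"device_id": 1, "temperature": 71.0},
    {"device_id": 1, "temperature": 72.5},
    {"device_id": 2, "temperature": 40.1},
]

def rename_device_flat(facts, device_id, new_name):
    """Every fact row of the device must be rewritten; returns rows touched."""
    touched = 0
    for row in facts:
        if row["device_id"] == device_id:
            row["device_name"] = new_name
            touched += 1
    return touched

def rename_device_star(dim, device_id, new_name):
    """Only the single dimension row changes; returns rows touched."""
    dim[device_id]["device_name"] = new_name
    return 1

def denormalize(facts, dim):
    """Unloading flat rows from the star schema needs a join on device_id."""
    return [{**row, **dim[row["device_id"]]} for row in facts]

flat_touched = rename_device_flat(flat_facts, 1, "Boiler A-1")
star_touched = rename_device_star(device_dim, 1, "Boiler A-1")
```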
Stack Alternatives
Analyzed Alternatives
AWS Redshift: Architecture
Distribution Style:
SLA 99.9%
AWS Redshift: Compression
Original data value | Original size (bytes) | Compressed value (token) | Compressed size (bytes) |
Blue | 4 | {2,Blue} | 5 |
Blue | 4 | | 0 |
Green | 5 | {3,Green} | 6 |
Green | 5 | | 0 |
Green | 5 | | 0 |
Totals | 23 | | 11 |
Per-column encoding recommendations: ANALYZE COMPRESSION <TABLE>
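The compression table above follows run-length encoding: each run of repeated values becomes one {count, value} token. The Python sketch below reproduces its byte counts. It assumes 1 byte per character and a 1-byte run count, and does not model Redshift's actual on-disk format.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (count, value) tokens."""
    return [(len(list(group)), value) for value, group in groupby(values)]

def encoded_size(tokens, count_bytes=1):
    """Bytes needed, assuming 1 byte per character plus a 1-byte run count."""
    return sum(count_bytes + len(value) for _, value in tokens)

column = ["Blue", "Blue", "Green", "Green", "Green"]
tokens = run_length_encode(column)
original_size = sum(len(value) for value in column)
print(tokens, original_size, encoded_size(tokens))
```

The saving depends on run length. On a sort-key column, where equal values sit next to each other, runs are long and the encoding pays off most.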
Resulting Solution: Proposed Design
Resulting Solution: Data Ingestion Pipeline
Resulting Solution: Single-Node Benchmark
Half a billion rows processed in at most 10 minutes
Resulting Solution: Two-Node Benchmark
2 billion rows processed in at most 5 minutes
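Here is a back-of-the-envelope check of the two benchmark figures. It assumes "at most" is an upper bound on wall-clock time, so the results are lower bounds on throughput. The slides do not say whether both runs used the same queries, so the ratio is only indicative.

```python
def min_throughput(rows, max_minutes):
    """Lower bound on rows per second if `rows` finish within `max_minutes`."""
    return rows / (max_minutes * 60)

single_node = min_throughput(500_000_000, 10)  # about 833,333 rows/s
two_nodes = min_throughput(2_000_000_000, 5)   # about 6,666,667 rows/s
print(round(single_node), round(two_nodes), two_nodes / single_node)
```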
Resulting Solution: Two-Node Benchmark
Caching
Resulting Solution: Two-Node Benchmark
Partial Caching
Resulting Solution: Two-Node Benchmark
Disk
Conclusion: Redshift Schema Tips
Use KEY distribution carefully: it can cause data skew
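KEY distribution can cause skew because all rows with the same key hash to the same slice. This minimal Python sketch assumes a hypothetical 4-slice cluster and one device that emits 70% of the readings. `zlib.crc32` stands in for Redshift's own hash function, which is not modeled. EVEN (round-robin) placement spreads the same rows uniformly.

```python
import zlib
from collections import Counter

NUM_SLICES = 4  # hypothetical cluster size

def key_slice(key):
    """KEY-style placement: equal keys always land on the same slice.
    crc32 is only a stand-in for Redshift's real hash function."""
    return zlib.crc32(str(key).encode()) % NUM_SLICES

def even_slice(row_number):
    """EVEN-style placement: round-robin, independent of column values."""
    return row_number % NUM_SLICES

# Hypothetical skewed data: one chatty device produces most readings.
device_ids = ["dev-1"] * 70 + ["dev-2"] * 10 + ["dev-3"] * 10 + ["dev-4"] * 10

key_counts = Counter(key_slice(d) for d in device_ids)
even_counts = Counter(even_slice(i) for i in range(len(device_ids)))

def skew(counts):
    """Rows on the busiest slice divided by the per-slice average."""
    average = sum(counts.values()) / NUM_SLICES
    return max(counts.values()) / average

print(key_counts, skew(key_counts))
print(even_counts, skew(even_counts))
```

Under KEY placement, the slice holding "dev-1" does at least 2.8 times the average work. A query cannot finish before its slowest slice does.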
Conclusion: Redshift Schema Tips
No secondary indexes: for good performance, each query must filter on the sort key
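Without secondary indexes, Redshift relies on per-block min/max metadata (zone maps) to skip blocks. This pruning works well only on the column the data is sorted by. The simplified Python sketch below uses a hypothetical 16-row table with 4-row blocks; real blocks are 1 MB.

```python
# Hypothetical table sorted by timestamp: (timestamp, device_id) rows in blocks of 4.
BLOCK_SIZE = 4
TIMESTAMP, DEVICE_ID = 0, 1
rows = [(ts, f"dev-{ts % 3}") for ts in range(1000, 1016)]
blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]

# Zone maps: min/max of each column per block, kept as metadata beside the data.
zone_maps = {
    column: [(min(r[column] for r in b), max(r[column] for r in b)) for b in blocks]
    for column in (TIMESTAMP, DEVICE_ID)
}

def scan(column, low, high):
    """Filter low <= value <= high, reading only blocks whose zone map overlaps."""
    matches, blocks_read = [], 0
    for block, (block_min, block_max) in zip(blocks, zone_maps[column]):
        if block_max < low or block_min > high:
            continue  # pruned using metadata alone
        blocks_read += 1
        matches.extend(r for r in block if low <= r[column] <= high)
    return matches, blocks_read

by_sort_key, sort_key_blocks = scan(TIMESTAMP, 1004, 1007)
by_other_column, other_blocks = scan(DEVICE_ID, "dev-1", "dev-1")
```

The filter on the sort key reads one block. The same kind of filter on device_id reads all four, because every block's min/max range covers every device.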
Conclusion
Q&A