A Survey of Cloud Database Systems
C. Mohan
Distinguished Visiting Professor (Tsinghua University, Beijing, China)
Member, Board of Governors (Digital University Kerala, India)
Retired IBM Fellow (Silicon Valley, USA) & Former Shaw Visiting Professor (National Univ of Singapore)
https://bit.ly/CMoDUK
Indian Institute of Science (IISc), Bangalore, India, 5 January 2022
@seemohan
1
C. Mohan, IISc BLR, 2022-01-05
Acks + More Info
Relational Database Management Systems (RDBMSs)
Traditional DBMS Architecture
Storage Hierarchy
Different Workloads
Online Transaction Processing (OLTP)
Online Analytical Processing (OLAP)
Data Warehouse Architecture
Modern Data Architecture
HTAP Benefits
Tradeoffs in the New Age
Source: http://www.w3resource.com/mongodb/nosql.php
“Split-Brain” Syndrome
Modern Application Requirements
HTAP Systems: Design Challenges
Traditional Parallel Database Architectures
Source: DB2 for z/OS: Data Sharing in a Nutshell, IBM Redbook SG24-7322-00, October 2006
Traditional Distributed Databases
Traditional Database Replication
Distributed SQL Databases Comparison
Source: http://bit.ly/3t3rBDa Yugabyte, 1/2021
Distributed NoSQL Databases Comparison
Source: http://bit.ly/3t3rBDa Yugabyte, 1/2021
Cloud Deployment Models
Source: https://bit.ly/31KGP7j, 3/2017
Cloud Deployment Models
Source: https://bit.ly/3rRGjPz, 2/2021
Cloud Service Models
Source: https://bit.ly/31KGP7j, 3/2017
Cloud Service Models
Source: https://bit.ly/3rRGjPz, 2/2021
Cloud Data Center Architecture
Source: https://crmtrilogix.com/Cloud-Blog/IaaS-and-PaaS/Cloud-Infrastructure--Data-Center-Architecture/160
Cloud Fault Domains, Zones, and Regions
Seattle 2018 DB Meeting (http://bit.ly/DBseat)
Material from Working Group Report on Cloud Data Services
Chairs: Sailesh Krishnamurthy and Fatma Ozcan
What Users Want
Some Cloud DBMSs
Traditional vs Cloud DB Architectures
Diagram labels: Compute (SQL, Transactions, Caching, Logging); Local Storage; Attached Storage; Network Storage
Compute and storage decoupled for scalability, availability, durability
State Separation from Compute
Aurora Serverless
Diagram labels: Application; Request Routers; Warm Pool of Instances; Scalable DB Capacity; Database Storage
Aurora Architecture
Amazon Aurora Design Philosophy
Aurora Compute
Diagram labels: Customer Application; Head Node (SQL, Transactions, Caching, Logging); Customer VPC
Aurora: Offload Redo to Storage
The Log is the Database
Diagram labels: Database Tier (SQL, Transactions, Caching); Storage Tier; AZ 1, AZ 2, AZ 3; Amazon S3
Network I/O in MySQL vs Amazon Aurora
Aurora Reduces Aggregate I/O Burden
30 minute SysBench write-only workload, 100GB dataset
RDS MySQL Multi-AZ 30K PIOPS: total trx 780K; I/Os per trx 7.4 (ex-mirroring)
Aurora: total trx 27,378K; I/Os per trx 0.95 (inc 6x amp)
Aurora vs RDS MySQL: 35X MORE transactions, 7.7X LESS I/Os per transaction
6X more log writes, but 9X less network traffic
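The two headline ratios follow directly from the four measured figures on this slide. A quick arithmetic check (variable names are illustrative and not from the slide; 7.4 / 0.95 is about 7.79, which the slide quotes as 7.7X):

```python
# Figures quoted on the slide (30 minute SysBench write-only run, 100GB dataset).
mysql_total_trx_k = 780        # RDS MySQL Multi-AZ, thousands of transactions
aurora_total_trx_k = 27_378    # Aurora, thousands of transactions
mysql_ios_per_trx = 7.4        # excludes mirroring
aurora_ios_per_trx = 0.95      # includes 6x write amplification

# How many times more transactions Aurora completed in the same window.
throughput_ratio = aurora_total_trx_k / mysql_total_trx_k
# How many times fewer I/Os Aurora issued per transaction.
io_ratio = mysql_ios_per_trx / aurora_ios_per_trx

print(f"Transactions: {throughput_ratio:.1f}x more")
print(f"I/Os per transaction: {io_ratio:.2f}x fewer")
```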
Aurora Storage Node
Availability & Software Upgrades
Distributed Commits in Cloud Databases
No atomic flush spanning multiple storage nodes
Commits over distributed storage use 2PC/Paxos to establish global consistency
Network chatter leads to stalls and jitter in the write path
Diagram labels: Compute (SQL, Transactions, Caching, Logging); Local Storage; Network Storage
Aurora: Asynchronous Commit Processing
Storage nodes
  Establish compact consistency points that:
    Increase monotonically
    Are continuously returned to the database
  Do not vote on accepting a write
  Execute idempotent operations on local state
Database nodes
  Handle locking, transactions, deadlocks, constraints etc.
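The division of labour on this slide can be sketched roughly as follows. This is a toy model, not Aurora's implementation: the class names, the dense integer LSNs, and the 4-of-6 write quorum are assumptions made for illustration. Storage nodes apply records idempotently and report a consistency point that only moves forward; the database node alone decides when a write is durable.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node: applies redo records and reports progress."""
    records: dict = field(default_factory=dict)   # lsn -> payload
    consistency_point: int = 0                    # highest gap-free LSN held

    def receive(self, lsn: int, payload: str) -> int:
        # Idempotent: re-delivering the same record leaves state unchanged.
        self.records.setdefault(lsn, payload)
        # Advance over any contiguous run of LSNs (assumes dense integer LSNs).
        while self.consistency_point + 1 in self.records:
            self.consistency_point += 1
        # The node never rejects a write; it only reports how far it has got.
        return self.consistency_point


class DatabaseNode:
    """Hypothetical database node: decides when a write is durable."""

    def __init__(self, nodes, write_quorum):
        self.nodes = nodes
        self.write_quorum = write_quorum
        self.acked = [0] * len(nodes)   # latest consistency point per node

    def write(self, lsn, payload, reachable):
        # Send the record to whichever nodes are reachable; no voting round.
        for i in reachable:
            self.acked[i] = self.nodes[i].receive(lsn, payload)

    def durable_lsn(self):
        # An LSN is durable once write_quorum nodes report a point >= it.
        return sorted(self.acked, reverse=True)[self.write_quorum - 1]


nodes = [StorageNode() for _ in range(6)]
db = DatabaseNode(nodes, write_quorum=4)        # 4 of 6 is an assumption
db.write(1, "redo-1", reachable=range(6))
db.write(2, "redo-2", reachable=range(3))       # only 3 of 6 nodes reached
durable_after_partial = db.durable_lsn()        # LSN 2 is not yet durable
db.write(2, "redo-2", reachable=range(2, 6))    # retry; node 2 sees a duplicate
durable_after_retry = db.durable_lsn()
print(durable_after_partial, durable_after_retry)
```

The retry deliberately re-sends LSN 2 to a node that already has it, to show that an idempotent apply makes redelivery harmless.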
Backward Chaining of Redo Log Records
Each redo log record includes backlink LSNs:
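The figure for this slide is not reproduced in the text. As a rough illustration of why backlinks are useful, the sketch below assumes (this is not stated on the slide) that each record carries a single backlink holding the LSN of the record immediately before it. A receiver can then follow the chain forward and stop at the first missing record. The function name and the sample LSNs are hypothetical.

```python
def complete_lsn(records, start_lsn=0):
    """Return the highest LSN reachable from start_lsn without a gap.

    records maps lsn -> backlink (the LSN of the preceding record).
    """
    # Index records by the LSN they point back to, so we can walk forward.
    next_by_backlink = {back: lsn for lsn, back in records.items()}
    complete = start_lsn
    while complete in next_by_backlink:
        complete = next_by_backlink[complete]
    return complete


# Records 10, 20, 30 and 50 arrived; 40 (the predecessor of 50) is missing.
received = {10: 0, 20: 10, 30: 20, 50: 40}
print(complete_lsn(received))
```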
Aurora: Crash Recovery
DB opens all storage nodes and reaches read quorum for all protection groups (PGs)
Locally re-computes PGCLs and VCL
VCL is the highest point where all records have met quorum
Everything past VCL is truncated at crash recovery
No redo or undo processing is required before the database is opened for processing
Diagram: log records with gaps at crash, with the Volume Complete LSN (VCL) marked. Immediately after crash recovery, the ragged edge of the log that did not meet quorum has been truncated.
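The VCL rule and the truncation step can be sketched in a few lines. This is a simplified model under stated assumptions: it ignores protection groups and PGCLs, treats the log as one sequence, and uses a quorum of 4 and made-up LSNs purely for illustration.

```python
def volume_complete_lsn(ack_counts, quorum):
    """Highest LSN such that it and every earlier record met quorum."""
    vcl = 0
    for lsn in sorted(ack_counts):
        if ack_counts[lsn] < quorum:
            break          # first gap: nothing past this point is complete
        vcl = lsn
    return vcl


def truncate_after(ack_counts, vcl):
    """Drop the ragged edge: every record with an LSN above the VCL."""
    return {lsn: n for lsn, n in ack_counts.items() if lsn <= vcl}


# lsn -> number of storage nodes holding the record at crash time.
at_crash = {101: 6, 102: 5, 103: 4, 104: 2, 105: 4, 106: 1}
vcl = volume_complete_lsn(at_crash, quorum=4)
after_recovery = truncate_after(at_crash, vcl)
print(vcl, sorted(after_recovery))
```

Record 105 met quorum on its own but sits beyond the gap at 104, so it is truncated along with the rest of the ragged edge.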
Aurora Parallel Query
Database node pushes down predicates to the storage nodes and aggregates the results
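The push-down-then-aggregate pattern can be illustrated with a small sketch. The function names, row layout, and sample data are hypothetical, and the loop over partitions runs sequentially here where a real system would scan storage nodes in parallel.

```python
def storage_node_scan(rows, predicate):
    """Runs on a storage node: filter locally, return a partial aggregate."""
    matching = [r["amount"] for r in rows if predicate(r)]
    return {"count": len(matching), "total": sum(matching)}


def database_node_query(partitions, predicate):
    """Runs on the database node: push the predicate down, merge partials."""
    partials = [storage_node_scan(rows, predicate) for rows in partitions]
    return {
        "count": sum(p["count"] for p in partials),
        "total": sum(p["total"] for p in partials),
    }


# One list of rows per storage node.
partitions = [
    [{"region": "EU", "amount": 10}, {"region": "US", "amount": 7}],
    [{"region": "EU", "amount": 5}],
    [{"region": "US", "amount": 3}, {"region": "EU", "amount": 1}],
]
result = database_node_query(partitions, lambda r: r["region"] == "EU")
print(result)
```

Only the small partial aggregates cross the network; the non-matching rows never leave the storage nodes.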
Aurora Multi-Master
Microsoft Azure: SQL DB HADR Architecture
Microsoft Socrates (aka SQL DB Hyperscale)
POLARIS: Distributed SQL in Azure Synapse
Converging warehouses and data lakes in Azure Synapse
Google Spanner
Google F1
CockroachDB (CRDB)
Snowflake
Source: https://bit.ly/3eNUXQd
FoundationDB
Alibaba Cloud Database Systems
Alibaba POLARDB Architecture
Source: https://bit.ly/3sZAkZE
POLARDB-X Architecture
IoT-Cloud Architecture
Source: P. Pierleoni et al. IEEE Access, 2019