CSCD01 Engineering Large Software Systems
Mitigating Architecture Risks
Cho Yin Yong / Aleksander Bodurri
Scaling Databases
Leaderless Replication
Reads and writes can be sent to any replica.
Changes are propagated through read repair: stale replicas are updated during a read.
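A minimal sketch of read repair, assuming each value carries a version number and a read contacts every replica. The Replica class and read_with_repair function are hypothetical names for illustration; real systems typically read from a quorum rather than from all replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica: maps key -> (version, value). Hypothetical, in-memory only."""
    store: dict = field(default_factory=dict)

    def get(self, key):
        # Version 0 means "this replica has never seen the key".
        return self.store.get(key, (0, None))

    def put(self, key, version, value):
        # Accept the write only if it is newer than what we already hold.
        if version > self.get(key)[0]:
            self.store[key] = (version, value)


def read_with_repair(replicas, key):
    """Read from every replica, return the newest value, and repair stale copies."""
    responses = [replica.get(key) for replica in replicas]
    newest_version, newest_value = max(responses, key=lambda r: r[0])
    for replica, (version, _) in zip(replicas, responses):
        if version < newest_version:
            replica.put(key, newest_version, newest_value)
    return newest_value


replicas = [Replica(), Replica(), Replica()]
for replica in replicas:
    replica.put("grade", 1, "B+")
# A later write reaches only two of the three replicas.
replicas[0].put("grade", 2, "A")
replicas[1].put("grade", 2, "A")

value = read_with_repair(replicas, "grade")
print(value, replicas[2].get("grade"))
```

The third replica missed the second write and is only brought up to date when some client reads the key, which is one reason this kind of replication can be problematic for rarely read data.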
Replication
Could be problematic!
Scaling a database
Partitioning / Sharding
Splitting a system's data across multiple nodes. Each node contains only part of the full dataset.
Partition by (primary) key range / index
For example, a primary key could be set as the student id.
In the case of Tinder’s “geo-sharding”, the key is the longitude and latitude.
Problems?
Partition 1: (000 - 999xxxxx)
Partition 2: (1000xxxxxx - 1005xxxxxx)
Partition 3: (1006xxxxxx - 1011xxxxxx)
Problem
Over time, some nodes become inactive because students graduate.
Need to avoid skew: requests should be spread evenly across database shards.
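A rough sketch of key-range partitioning and the skew it can produce. The boundary values are assumptions loosely modelled on the student ID ranges above, and partition_for is a hypothetical helper, not part of any database API.

```python
import bisect
from collections import Counter

# Exclusive upper bound of each partition's key range (assumed values).
UPPER_BOUNDS = [1_000_000_000, 1_006_000_000, 1_012_000_000]


def partition_for(student_id: int) -> int:
    """Return the 1-based partition whose key range contains student_id."""
    index = bisect.bisect_right(UPPER_BOUNDS, student_id)
    if index == len(UPPER_BOUNDS):
        raise ValueError(f"{student_id} is outside every partition range")
    return index + 1


# Mostly current students: their IDs cluster in the newest range.
active_ids = [1_009_000_000 + i for i in range(5)] + [1_003_000_000]
load = Counter(partition_for(student_id) for student_id in active_ids)
print(load)
```

With these made-up requests, Partition 3 serves five of the six lookups and Partition 1 serves none, which is the skew the slide warns about.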
Partition by hash
Hash algorithm to balance between partitions.
Partition 0
Partition 1
Partition 2
Hash algorithm
Easiest hash algorithm: x mod N
id = 8
3 partitions:
8 -> Partition 2 (8 mod 3 = 2)
4 partitions:
8 -> Partition 0 (8 mod 4 = 0)
Problems
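Range queries can no longer be served by a single partition:
SELECT * FROM Students WHERE StudentID BETWEEN 1002000000 AND 1005000000
Changing the number of partitions means we need to rehash all the keys!
X mod 2
X mod 4
X mod 5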
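To make both problems concrete, here is a small sketch using x mod N as the hash. The key set (0 to 1199) is arbitrary and chosen only so the fractions come out evenly; fraction_moved is a hypothetical helper.

```python
def partition_for(key: int, n: int) -> int:
    """Simplest hash partitioning: key mod number of partitions."""
    return key % n


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose partition changes when going from old_n to new_n partitions."""
    moved = sum(1 for k in keys if partition_for(k, old_n) != partition_for(k, new_n))
    return moved / len(keys)


keys = range(1200)
moved_3_to_4 = fraction_moved(keys, 3, 4)
moved_4_to_5 = fraction_moved(keys, 4, 5)
print(moved_3_to_4, moved_4_to_5)

# Ten consecutive IDs, as a range query would request them.
touched = {partition_for(k, 3) for k in range(100, 110)}
print(touched)
```

Going from 3 to 4 partitions relocates three quarters of these keys, and going from 4 to 5 relocates four fifths. The ten consecutive IDs land on all three partitions, so a range query has to ask every one of them.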
Partition by consistent hashing (Karger et al. 1997)
Instead of hashing directly to a discrete partition number, hash onto an intermediate structure: a circle, from 0 degrees to 360 degrees (2π radians).
Karger, David, et al. "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web." Proceedings of the twenty-ninth annual ACM symposium on Theory of computing. 1997.
Partition by consistent hash
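Place the partitions on the circle as well. Assign each hashed item to the closest previous partition on the circle.
[Figure: hash circle showing partitions 1, 2, and 3, with positions marked at 0, 50, 90, 135, 185, 250, and 280 degrees]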
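The lookup rule can be sketched as follows. The partition positions (50, 185, and 280 degrees) are assumptions chosen for illustration, since the figure does not say which marked angles are partitions. HashRing and angle_of are hypothetical names, and SHA-256 is just one possible hash.

```python
import bisect
import hashlib


def angle_of(name: str) -> float:
    """Hash a string to a position on the circle, in degrees within [0, 360)."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 360_000 / 1000


class HashRing:
    def __init__(self, positions: dict[str, float]):
        # Sorted (angle, partition) pairs, so lookups can use binary search.
        self._ring = sorted((angle, name) for name, angle in positions.items())
        self._angles = [angle for angle, _ in self._ring]

    def owner_at(self, angle: float) -> str:
        """Partition at the closest previous position on the circle."""
        index = bisect.bisect_right(self._angles, angle) - 1
        # index == -1 means "before the first partition": wrap to the last one.
        return self._ring[index][1]

    def owner_of(self, key: str) -> str:
        return self.owner_at(angle_of(key))


ring = HashRing({"P1": 50.0, "P2": 185.0, "P3": 280.0})
for angle in (0, 90, 135, 250):
    print(angle, ring.owner_at(angle))
print(ring.owner_of("student:1002000000"))
```

Under these assumed positions, items at 90 and 135 degrees belong to P1, the item at 250 degrees belongs to P2, and the item at 0 degrees wraps around to P3.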
Introducing new partitions
Only some of the keys need to be remapped, not all of them.
[Figure: hash circle with partitions 1, 2, and 3 and a newly added partition 4]
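A small sketch of why adding a partition is cheap, assuming the "closest previous partition" rule and made-up positions. For simplicity it places one key at every whole degree instead of hashing real keys.

```python
import bisect


def owner_at(angle: float, positions: dict[str, float]) -> str:
    """Partition at the closest previous position on the circle (wraps around)."""
    ring = sorted((a, name) for name, a in positions.items())
    angles = [a for a, _ in ring]
    return ring[bisect.bisect_right(angles, angle) - 1][1]


before = {"P1": 50.0, "P2": 185.0, "P3": 280.0}  # assumed positions
after = {**before, "P4": 120.0}                   # new partition joins at 120 degrees

key_angles = [float(degree) for degree in range(360)]
moved = [a for a in key_angles if owner_at(a, before) != owner_at(a, after)]
print(len(moved), len(moved) / len(key_angles))
```

Only the 65 keys between 120 and 185 degrees change owner, and all of them move from P1 to P4. Every other key stays where it was.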
Virtual Nodes
[Figure: hash circle on which each of partitions 1, 2, 3, and 4 appears at three positions]
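Virtual nodes place each partition at several positions on the circle so that load tends to spread more evenly. The sketch below is one way to try this out. The "partition#i" naming scheme, the key names, and the use of SHA-256 are all assumptions for illustration.

```python
import bisect
import hashlib
from collections import Counter


def position(name: str) -> int:
    """Hash a string to a point on the ring (a 64-bit integer instead of degrees)."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(partitions, vnodes: int):
    """Place `vnodes` points per partition on the ring, sorted by position."""
    pairs = sorted((position(f"{p}#{i}"), p) for p in partitions for i in range(vnodes))
    return [pos for pos, _ in pairs], [p for _, p in pairs]


def owner_of(key: str, ring) -> str:
    """Partition of the closest previous point on the ring (wraps around)."""
    points, owners = ring
    return owners[bisect.bisect_right(points, position(key)) - 1]


def load(ring, n_keys: int = 10_000) -> Counter:
    """Count how many of n_keys synthetic keys each partition receives."""
    return Counter(owner_of(f"key-{k}", ring) for k in range(n_keys))


partitions = ["P1", "P2", "P3", "P4"]
single = build_ring(partitions, vnodes=1)
virtual = build_ring(partitions, vnodes=3)   # three positions each, as in the figure
print("1 position per partition: ", load(single))
print("3 positions per partition:", load(virtual))
```

Raising vnodes further (for example to 100) usually brings the per-partition counts closer together, at the cost of a larger ring to store and search.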
Availability Risk
Tangential:
Reliability Risks
Reliability
“The probability of a product performing its intended function under stated conditions without failure for a given period of time.”
https://asq.org/quality-resources/reliability
“Priority Message Queues”
Always take the highest priority task.
Any risk?
Backend
Frontend
Report Backend
Priority Message queue
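One risk can be demonstrated with a sketch: if urgent messages arrive as fast as the consumer can process them, a normal-priority message is never taken. The queue class and message names below are hypothetical.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1  # lower number = higher priority


class PriorityMessageQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def publish(self, priority: int, message: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def take(self) -> str:
        """Always take the highest priority message."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityMessageQueue()
queue.publish(NORMAL, "monthly-report")

processed = []
for tick in range(5):
    # High-priority work arrives as fast as the consumer can handle it.
    queue.publish(HIGH, f"urgent-{tick}")
    processed.append(queue.take())

print(processed, len(queue))
```

After five rounds every urgent message has been handled, while "monthly-report" is still waiting in the queue.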
Ambulance Pattern
Prevents starvation of normal priority messages, but still allows prioritized messages to be processed expeditiously.
Backend
Report Backend
Normal Message Queue
Prioritized Message Queue
Backend
Normal Report Backend
Normal Message Queue
Prioritized Message Queue
Prioritized Report Backend
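A minimal simulation of the two-queue, two-backend variant, assuming each backend only consumes from its own queue and handles one message per round. The backend and message names are made up for illustration.

```python
from collections import deque

normal_queue = deque(["monthly-report", "weekly-report"])
prioritized_queue = deque()


def run_round(tick: int):
    """One scheduling round: each backend takes at most one message from its own queue."""
    prioritized_queue.append(f"urgent-{tick}")  # urgent work keeps arriving
    done = []
    if prioritized_queue:
        done.append(("prioritized-backend", prioritized_queue.popleft()))
    if normal_queue:
        done.append(("normal-backend", normal_queue.popleft()))
    return done


log = [entry for tick in range(3) for entry in run_round(tick)]
for backend, message in log:
    print(backend, message)
```

Urgent messages are still handled in the round they arrive, and both normal reports are processed even though urgent work never stops.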
Security Risks
Example: Security in Architecture Scenario
Endpoints:
Well protected by security policies - patients can only use the first route, but not the second route.
API Layer
DB
Access to PHI should be highly secure
During the consensus phase, participants find that PHI leakage should be avoided at all costs - flagged as high risk and medium likelihood.
I will present 3 scenarios
Let’s do an in class discussion on these 3 scenarios.
Option 1: Separated APIs for each persona
Patient Portal API
DB
Clinician API
Option 2: Backend for frontend
A tightly coupled Patient Portal API designed specifically for the Patient Portal UI.
Patients can only use Patient Portal APIs - they will never be able to use Clinician APIs directly.
Patient Portal API
DB
Clinician API
POST /fitbit
POST /Observation
Option 3: Separate DB for best security
Patient Portal will only be able to store and read data from its own database. Clinician APIs will have access to both.
Patient Portal API
DB
Clinician API
DB
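A rough sketch of the idea behind Option 3, using in-memory dicts in place of real databases. The class names, method names, and sample data are hypothetical. The point is only that the Patient Portal API is never handed a reference to the clinical database.

```python
class PatientPortalAPI:
    def __init__(self, portal_db: dict):
        self._db = portal_db  # the only database this API is ever given

    def save_reading(self, patient_id: str, reading: dict) -> None:
        self._db.setdefault(patient_id, []).append(reading)

    def readings(self, patient_id: str) -> list:
        return list(self._db.get(patient_id, []))


class ClinicianAPI:
    def __init__(self, portal_db: dict, clinical_db: dict):
        # Clinician APIs have access to both databases.
        self._portal_db = portal_db
        self._clinical_db = clinical_db

    def chart(self, patient_id: str) -> dict:
        return {
            "phi": self._clinical_db.get(patient_id),
            "patient_reported": list(self._portal_db.get(patient_id, [])),
        }


portal_db = {}
clinical_db = {"p1": {"diagnosis": "hypertension"}}  # made-up PHI record

portal = PatientPortalAPI(portal_db)
clinician = ClinicianAPI(portal_db, clinical_db)

portal.save_reading("p1", {"steps": 8000})
chart = clinician.chart("p1")
print(chart)
```

Even a bug in the Patient Portal API cannot leak PHI from the clinical database here, because that code path has no connection to it. The trade-off is the extra cost of operating and reconciling two databases.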
Evaluating Risks is like Evaluating Trade-offs
The problem is either self-identified or identified collaboratively through risk assessment.