1 of 29

CSCD01 Engineering Large Software Systems

Mitigating Architecture Risks

Cho Yin Yong / Aleksander Bodurri

2 of 29

Scaling Databases

3 of 29

Leaderless Replication

Reads and writes can be sent to any replica.

Changes are propagated through read repair: updating stale replicas during a read.
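The read-repair idea above can be sketched in a few lines. This is a minimal illustration, not a real client library: the `Replica` class, the per-key version counters, and last-write-wins conflict resolution are all assumptions made for the example.

```python
# Hedged sketch of read repair in a leaderless store. The replica
# layout, version counters, and client API are illustrative assumptions.

class Replica:
    def __init__(self):
        self.store = {}  # key -> (version, value)

    def get(self, key):
        return self.store.get(key, (0, None))

    def put(self, key, version, value):
        # Accept only newer versions (last-write-wins by version).
        if version > self.store.get(key, (0, None))[0]:
            self.store[key] = (version, value)

def read_with_repair(replicas, key):
    """Read from every replica, return the newest value, and push
    that value back to any replica holding a stale copy."""
    responses = [(r, *r.get(key)) for r in replicas]
    _, newest_version, newest_value = max(responses, key=lambda t: t[1])
    for replica, version, _ in responses:
        if version < newest_version:  # stale replica found
            replica.put(key, newest_version, newest_value)  # repair it
    return newest_value

replicas = [Replica() for _ in range(3)]
replicas[0].put("user:1", 2, "alice-v2")  # newest write landed here
replicas[1].put("user:1", 1, "alice-v1")  # stale copy
value = read_with_repair(replicas, "user:1")
# After the read, every replica holds version 2.
```

Note how the read itself is what converges the replicas: until some client reads `user:1`, replica 3 never sees the value at all.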

4 of 29

Replication

  • Aims to increase availability
  • Consistency is hard to maintain
  • Given Y replicas storing X GB of data, you need X * Y GB of storage
  • Etc.

Could be problematic!

5 of 29

Scaling a database

  • Replication
  • Partitioning

6 of 29

Partitioning / Sharding

Splitting a system's data across multiple nodes. Each node contains only part of the full dataset.

  • Partition by key range
  • Partition by hash

7 of 29

Partition by (primary) key range / index

For example, the partitioning key could be the primary key, such as the student ID.

In the case of Tinder's "geo-sharding", the key is longitude and latitude.

Problems?

Partition 1 (000 - 999xxxxx)

Partition 2 (1000xxxxxx - 1005xxxxxx)

Partition 3 (1006xxxxxx - 1011xxxxxx)
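A key-range lookup can be implemented with a sorted list of range lower bounds and a binary search. The concrete boundary values below are assumptions loosely based on the slide's three partitions.

```python
# Sketch of key-range partitioning for student IDs. The boundary
# values are illustrative assumptions, not a real deployment.
import bisect

# Lower bound of each partition's ID range, in sorted order.
lower_bounds = [0, 1000000000, 1006000000]

def partition_for(student_id):
    # bisect_right counts how many lower bounds are <= the id;
    # subtracting 1 gives the 0-based partition index.
    return bisect.bisect_right(lower_bounds, student_id) - 1

print(partition_for(999999999))   # -> 0 (Partition 1)
print(partition_for(1003000000))  # -> 1 (Partition 2)
print(partition_for(1010000000))  # -> 2 (Partition 3)
```

Range queries are cheap under this scheme: all IDs in a contiguous range live on at most a few adjacent partitions.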

8 of 29

Problem

Over time, some nodes become mostly inactive because the students in their ID ranges graduate.

Need to avoid skew: requests should be spread evenly between database shards.

9 of 29

Partition by hash

A hash algorithm balances keys between partitions.

[Diagram: keys pass through the hash algorithm and land in Partition 0, 1, or 2.]

Easiest hash algorithm: x mod N

10 of 29

Partition by hash


id = 8

3 partitions: 8 -> Partition 2 (8 mod 3 = 2)

4 partitions: 8 -> Partition 0 (8 mod 4 = 0)

11 of 29

Problems

  1. Range queries become inefficient, for example:

SELECT * FROM Students WHERE StudentID BETWEEN 1002000000 AND 1005000000

  2. What if N changes? With the easiest hash algorithm, x mod N, nearly all keys need to be rehashed! (Compare x mod 2, x mod 4, and x mod 5: most keys land in a different partition each time.)
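The cost of changing N can be measured directly: count how many keys change partition when the modulus changes. The sample key-space size here is an arbitrary assumption for illustration.

```python
# How many keys move when N changes under x mod N?
# The key-space size (10,000) is an arbitrary choice for the demo.
keys = range(10000)

def moved(old_n, new_n):
    # A key moves whenever its partition differs under the new modulus.
    return sum(1 for k in keys if k % old_n != k % new_n)

print(moved(3, 4) / len(keys))  # -> 0.7498 (about 75% of keys move)
```

Going from 3 to 4 partitions remaps roughly three quarters of all keys, which is exactly the problem consistent hashing is designed to avoid.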

12 of 29

Partition by consistent hashing (Karger et al. 1997)

Instead of hashing keys directly to discrete replica numbers, hash them onto an intermediate structure: a circle, from 0 to 360 degrees (0 to 2π radians).

Karger, David, et al. "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web." Proceedings of the twenty-ninth annual ACM symposium on Theory of computing. 1997.

13 of 29

Partition by consistent hash

Place the partitions on the circle as well. Assign each hashed item to the closest preceding partition on the circle.

[Diagram: partitions 1, 2, and 3 placed on the ring; items hashed to angles 0, 50, 90, 135, 185, 250, and 280 are each assigned to the closest preceding partition.]
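A minimal ring over 0-360 degrees can be written with a sorted list of partition angles. The partition positions and item angles below are illustrative assumptions, not the slide's exact figure; each item goes to the closest partition at or before its angle, wrapping around the circle.

```python
# Minimal consistent-hashing ring over 0-360 degrees.
# Partition positions here are illustrative assumptions.
import bisect

class Ring:
    def __init__(self):
        self.angles = []  # sorted partition angles
        self.owners = {}  # angle -> partition name

    def add_partition(self, name, angle):
        bisect.insort(self.angles, angle)
        self.owners[angle] = name

    def lookup(self, item_angle):
        # Closest preceding partition: largest angle <= item_angle,
        # wrapping to the highest angle when none precedes the item.
        i = bisect.bisect_right(self.angles, item_angle) - 1
        return self.owners[self.angles[i]]  # i == -1 wraps via negative indexing

ring = Ring()
ring.add_partition("P1", 30)
ring.add_partition("P2", 150)
ring.add_partition("P3", 270)

print(ring.lookup(90))   # -> P1  (30 <= 90 < 150)
print(ring.lookup(200))  # -> P2
print(ring.lookup(10))   # -> P3  (wraps around past 0 degrees)
```

The wrap-around case (an item before the first partition) falls back to the last partition on the ring, which is what makes the structure a circle rather than a line.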

14 of 29

Introducing new partitions

No need to remap all keys: only the keys that fall between the new partition and its neighbour move.

[Diagram: partition 4 added to the ring between existing partitions; only the arc it takes over is remapped.]

15 of 29

Virtual Nodes

Each physical partition is placed at several positions on the ring ("virtual nodes"). This spreads keys more evenly across partitions and smooths rebalancing when partitions join or leave.

[Diagram: partitions 1-4 each appear at multiple positions around the ring.]
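The virtual-node idea extends the ring by hashing each partition under several labels. The hash choice (MD5), the label format, and the replica count below are assumptions for illustration only.

```python
# Sketch of virtual nodes: each physical partition is hashed to
# several ring positions so keys spread more evenly. The hash
# function and vnode count are illustrative assumptions.
import bisect
import hashlib

def ring_position(label):
    # Map any label deterministically to a 0-359 degree position.
    digest = hashlib.md5(label.encode()).hexdigest()
    return int(digest, 16) % 360

def build_ring(partitions, vnodes_per_partition=4):
    # Each partition appears under several labels, e.g. "P1#0".."P1#3".
    return sorted(
        (ring_position(f"{p}#{i}"), p)
        for p in partitions
        for i in range(vnodes_per_partition)
    )

def lookup(ring, key):
    angles = [a for a, _ in ring]
    i = bisect.bisect_right(angles, ring_position(key)) - 1
    return ring[i][1]  # wraps via Python's negative indexing

ring = build_ring(["P1", "P2", "P3"])
owner = lookup(ring, "student:1004321987")
print(owner)  # one of P1, P2, P3
```

With 4 virtual nodes each, 3 physical partitions occupy 12 ring positions, so the arcs each partition owns are smaller and more uniform, and a new partition takes load from all of its neighbours rather than just one.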

16 of 29

Availability Risk

Tangential:

  • Consistency Risk
  • Complexity Risk

17 of 29

Reliability Risks

18 of 29

Reliability

“The probability of a product performing its intended function under stated conditions without failure for a given period of time.”

https://asq.org/quality-resources/reliability

19 of 29

“Priority Message Queues”

Always take the highest-priority task first.

Any risk?

[Diagram: Frontend, Backend, and Report Backend connected through a single priority message queue.]

20 of 29

Ambulance Pattern

Prevents starvation of normal-priority messages, while still allowing prioritized messages to be processed expeditiously.

[Diagrams: (a) a Backend and Report Backend consuming both a Normal Message Queue and a Prioritized Message Queue; (b) a dedicated Normal Report Backend and Prioritized Report Backend, each consuming its own queue.]
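The ambulance pattern above can be sketched with two queues and a dedicated consumer for each, so priority traffic gets its own lane and normal messages keep flowing. The queue names, message format, and shutdown-by-sentinel scheme are assumptions for the demo.

```python
# Sketch of the ambulance pattern: one dedicated worker per queue,
# so normal-priority reports are never starved by prioritized ones.
# Names and the sentinel-based shutdown are illustrative assumptions.
import queue
import threading

normal_q = queue.Queue()
priority_q = queue.Queue()
processed = []
lock = threading.Lock()

def worker(q, label):
    while True:
        msg = q.get()
        if msg is None:  # sentinel: shut this worker down
            break
        with lock:
            processed.append((label, msg))  # "handle" the message

# One dedicated consumer per queue: the "ambulance lane" for
# priority traffic never blocks the normal lane, and vice versa.
threads = [
    threading.Thread(target=worker, args=(normal_q, "normal")),
    threading.Thread(target=worker, args=(priority_q, "priority")),
]
for t in threads:
    t.start()

for i in range(3):
    normal_q.put(f"report-{i}")
priority_q.put("urgent-report")

normal_q.put(None)
priority_q.put(None)
for t in threads:
    t.join()

print(processed)  # both lanes made progress; no starvation
```

Contrast this with a single priority queue: there, a steady stream of urgent messages would delay normal reports indefinitely, while here each lane has guaranteed throughput.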

21 of 29

Security Risks

22 of 29

Example: Security in Architecture Scenario

Endpoints:

  • /api/patients/1/fitbit: API used by patients to add Fitbit data to their record
  • /api/patients/1: API used by clinicians to obtain a summary for a patient

Well protected by security policies: patients can only use the first route, but not the second route.

[Diagram: clients call a single API Layer, which reads and writes the DB.]

23 of 29

Access to PHI should be highly secure

During the consensus phase, participants find that PHI leakage must be avoided at all costs: it is flagged as high impact with medium likelihood.

24 of 29

I will present 3 scenarios

Let’s have an in-class discussion on these 3 scenarios.

25 of 29

Option 1: Separated APIs for each persona

[Diagram: Patient Portal API and Clinician API, each talking to the shared DB.]

26 of 29

Option 2: Backend for frontend

A Patient Portal API tightly coupled to, and designed specifically for, the Patient Portal UI.

Patients can only use Patient Portal APIs; they will never be able to call Clinician APIs directly.

[Diagram: POST /fitbit goes to the Patient Portal API; the Clinician API exposes POST /Observation; both APIs use the shared DB.]
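The backend-for-frontend gate can be sketched as a route allowlist plus a translation step. The route names follow the slide; the forwarding logic, response shapes, and function names are assumptions made for the illustration.

```python
# Hedged sketch of the backend-for-frontend idea: the Patient Portal
# API exposes only patient-facing routes and translates them into
# internal calls; patients can never reach clinician routes directly.
# Forwarding logic and response shapes are illustrative assumptions.

PATIENT_ROUTES = {"POST /fitbit"}  # everything a patient may call

def clinician_api(route, body):
    # Internal API: reachable only by trusted callers such as the
    # portal backend, never by patient clients directly.
    return {"route": route, "stored": body}

def patient_portal_api(route, body):
    if route not in PATIENT_ROUTES:
        return {"error": 403}  # clinician routes are simply unreachable
    # Translate the portal call into the internal representation.
    return clinician_api("POST /Observation", {"fitbit": body})

print(patient_portal_api("POST /fitbit", {"steps": 12000}))
print(patient_portal_api("GET /api/patients/1", None))  # -> {'error': 403}
```

The security property comes from the topology, not from per-request checks: the patient-facing surface only contains patient routes, so forgetting a policy rule cannot expose a clinician endpoint.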

27 of 29

Option 3: Separate DB for best security

The Patient Portal can only store and read data in its own database. Clinician APIs have access to both databases.

[Diagram: Patient Portal API with its own DB; Clinician API with access to both DBs.]

28 of 29

29 of 29

Evaluating Risks is like Evaluating Trade-offs

Either the problem is self-identified, or it is identified collaboratively through risk assessment.