
RobustScaler: QoS-Aware Autoscaling for Complex Workloads

Huajie Qian, Qingsong Wen, Liang Sun, Jing Gu, Qiulin Niu, Zhimin Tang

Alibaba Group


Overview

  • Background: Autoscaling in cloud services
    • Dynamics of scaling-per-query and challenges

  • The RobustScaler Framework
    • Query arrival modeling
    • Scaling decision optimization

  • Experiments


Background: Autoscaling

  • Autoscaling: Automatically add or delete computing resources to match the demand in cloud applications (Qu et al. ’18, Lorido-Botran et al. ’14, Al-Dhuraibi et al. ’17, …)

    • Horizontal scaling (add/delete instances or VMs) vs vertical scaling (up/downgrade CPU, RAM, network, etc.)

    • Provisioning delay: because new resources take time to become ready, adjust the resource amount before the workload changes (proactive scaling)

[Figure: a load balancer routes incoming queries (QPS) to a pool of instances; instances are added or removed as the load changes.]

Background: Scaling-Per-Query

  • We focus on scaling-per-query scenario:
    • Each query is processed by a new instance which is terminated after processing the query (not reused/shared by other queries)
    • Common in on-demand cloud services, e.g., container registry (Wang et al. ATC ’21, Anwar et al. FAST ’18), CI/CD system (Hakli UCC ’18 Companion), function compute (Mahmoudi & Khazaei ’20, Mohan et al. HotCloud ’19, Lin & Glikson ’19), etc.
  • Reactive autoscaling leads to cold starts & long waiting times
  • Solution: Proactively start instances before query arrivals to reduce cold starts & waiting (e.g., pool of warm instances)

[Figure: timeline of instances serving queries. With reactive scaling, a new instance starts only after its query arrives, so the query waits through the cold start; with proactive scaling, a warm instance is already running when the query arrives.]

Background: Scaling-Per-Query Dynamics

QoS metrics:

  • Response time (RT) = (time processing ends) - (time of arrival)

  • Hitting probability (HP): probability of a query being hit, i.e., a warm instance is available upon arrival

Cost metrics:

  • Idle time (of an instance) = (time it starts processing a query) - (time it finishes startup)

[Figure: timeline of a pre-started instance and an arriving query. If no warm instance is ready at arrival (not hit), the idle time is 0 but the query waits; starting the instance earlier lowers RT at the price of more idle time, starting it later does the opposite.]
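
To make these definitions concrete, a small sketch (timestamps and names are illustrative, not from the paper) deriving the per-query quantities from the event times of a query and the instance pre-started for it:

def per_query_metrics(arrival, instance_ready, processing_time):
    # Processing starts once both the query has arrived and the instance has finished startup.
    start_processing = max(arrival, instance_ready)
    end_processing = start_processing + processing_time
    hit = instance_ready <= arrival                     # warm instance available at arrival
    response_time = end_processing - arrival            # RT = end of processing - time of arrival
    idle_time = max(0.0, arrival - instance_ready)      # instance idles while waiting for the query
    wait_time = max(0.0, instance_ready - arrival)      # query waits for startup (cold start)
    return {"hit": hit, "RT": response_time, "idle": idle_time, "wait": wait_time}

# Starting the instance earlier lowers RT but raises idle time; starting it later does the opposite.
print(per_query_metrics(arrival=10.0, instance_ready=8.0, processing_time=3.0))
print(per_query_metrics(arrival=10.0, instance_ready=12.0, processing_time=3.0))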


Related Works & Challenges

  • Related works:
    • Autoscaling systems for various cloud services: database (P-store by Taft et al. SIGMOD ’18, Jindal et al. SoCC ’19, …), microservices (Yu et al. ’20, Abdullah et al. ’20, Bauer et al. ICDCS ’19, …), stream processing (Turbine by Mei et al. ICDE ’20, Wang et al. SIGMOD ’19, …), web applications (Jiang et al. CCGrid ’13, Aslanpour et al. ’17, …), serverless computing (Mohan et al. HotCloud ’19, Lin & Glikson ’19, Wang et al. ATC ’18, …), etc.
    • Technical tools for autoscaling: control theory (Ullah et al. ’18), reinforcement learning (Lolos et al. ICDE ’17, Gari et al. ’21, …), queuing theory (Moreno-Vozmediano et al. ’19, Gandhi et al. ’18, …), rule-based methods (Taherizadeh & Stankovski ’19, …), etc.
    • Workload modeling and forecasting: Classical time series models (Calheiros et al. ’15, Kumar & Mazumdar ICIT ’16, …), neural networks (Kumar et al. ’20, Janardhanan & Barrett ICITST ’17, …), Poisson/Markovian process (Calzarossa et al. ’16, Pitchumani et al. MSST ’15, Pacheco-Sanchez et al. CLOUD ’11, …), hierarchical bundling models (Juan et al. PAKDD ’14), etc.

  • Challenges
    • Complex and varying periodicity patterns are common but hard to capture
    • Robustness to data contamination, e.g., anomalies, noises, missing data
    • Uncertainty of query arrival times (on top of traffic size) plays a crucial role in scaling-per-query

Proposed remedies: nonhomogeneous Poisson process (NHPP) arrival modeling with periodicity regularization, and stochastically constrained optimization for the scaling decisions.

RobustScaler Framework

Pipeline: historical QPS data → periodicity detection → historical query arrival modeling → query arrival prediction → scaling plan → scaling decisions, with the desired service/cost level fed into the scaling plan.

  • Periodicity detection: detect periodic patterns of different scales (RobustPeriod, Wen et al. SIGMOD ’21)

  • Historical query arrival modeling: build non-homogeneous Poisson process (NHPP) models for query arrivals, integrating the periodicity information

  • Query arrival prediction: decompose the NHPP intensity into season, trend, and noise components (Fast RobustSTL, Wen et al. KDD ’20), predict each component using standard time series models, then combine (see the sketch below)

  • Scaling plan: solve a stochastically constrained optimization (SCO) for the optimal scaling decisions under the given service/cost level
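
To illustrate the decompose-forecast-combine idea in the query arrival prediction step, here is a minimal numpy sketch; the moving-average/periodic-mean decomposition and the naive per-component forecasts are crude stand-ins for Fast RobustSTL and the time series models actually used, and all names are illustrative:

import numpy as np

def decompose(y, period):
    # Crude season/trend/noise split; an illustrative stand-in for Fast RobustSTL.
    n = len(y)
    trend = np.convolve(y, np.ones(period) / period, mode="same")            # moving-average trend
    detrended = y - trend
    season = np.array([detrended[i::period].mean() for i in range(period)])  # per-phase mean
    noise = y - trend - np.tile(season, n // period + 1)[:n]
    return trend, season, noise

def forecast_qps(y, period, horizon):
    y = np.asarray(y, dtype=float)
    trend, season, noise = decompose(y, period)
    # Naive per-component forecasts, then recombine: hold the recent trend level,
    # repeat the seasonal profile, and carry the average noise level.
    trend_fc = np.full(horizon, trend[-period:].mean())
    season_fc = np.array([season[(len(y) + h) % period] for h in range(horizon)])
    return trend_fc + season_fc + np.full(horizon, noise.mean())

# Toy usage: two weeks of hourly QPS with a daily cycle; forecast the next 12 hours.
hours = np.arange(24 * 14)
qps = 50 + 20 * np.sin(2 * np.pi * hours / 24) + np.random.default_rng(0).normal(0, 2, hours.size)
print(np.round(forecast_qps(qps, period=24, horizon=12), 1))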

Arrival Modeling as NHPP

Estimate the NHPP intensity by penalized maximum likelihood: the objective combines the log likelihood of the observed arrivals with a smoothness penalty and a periodicity penalty on the intensity. [Equations omitted in this export; figure: fitted intensity illustrating the effect of the periodicity penalty.]
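
For concreteness, one plausible discretized instance of such an objective (my notation and binning assumptions, not necessarily the paper's exact formulation): with observed counts c_i in bins of width Δ, a piecewise-constant intensity λ_i, and a detected period of T bins,

\min_{\lambda \ge 0}\;
  -\sum_{i=1}^{n} \bigl( c_i \log(\lambda_i \Delta) - \lambda_i \Delta \bigr)
  \;+\; \alpha \sum_{i=2}^{n} (\lambda_i - \lambda_{i-1})^2
  \;+\; \beta \sum_{i=T+1}^{n} (\lambda_i - \lambda_{i-T})^2 ,

where the first term is the negative log likelihood, the second the smoothness penalty, the third the periodicity penalty, and α, β ≥ 0 set the tradeoff.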

 

 

Arrival Modeling as NHPP: ADMM Solution

Form the augmented Lagrangian and iterate the ADMM block updates:

  • update the intensity (no closed-form solution; use a quadratic approximation)

  • update the smoothness-penalty block

  • update the periodicity-penalty block

  • update the dual variables
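
As a sketch of how such an alternating scheme can be implemented for the illustrative discretized objective above (my numpy construction, not code from the paper):

import numpy as np

def fit_nhpp_admm(counts, delta, period, alpha=1.0, beta=1.0, rho=1.0, n_iter=200):
    # Penalized NHPP fit via ADMM, splitting the objective as
    #   f(lam) + alpha*||z||^2 + beta*||w||^2  s.t.  D lam = z,  P lam = w,
    # where f(lam) = sum(lam*delta - counts*log(lam*delta)), D takes first
    # differences and P takes differences at lag `period`.
    c = np.asarray(counts, dtype=float)
    n = len(c)
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]                   # first-difference operator
    P = np.eye(n, k=period)[:-period] - np.eye(n)[:-period]    # periodic-difference operator
    lam = np.maximum(c / delta, 1e-3)                           # start from the empirical rate
    z, w = D @ lam, P @ lam
    u, v = np.zeros(len(z)), np.zeros(len(w))
    for _ in range(n_iter):
        # Intensity update: no closed form because of the log term, so take a
        # quadratic (second-order Taylor) approximation of f around the current lam.
        grad = delta - c / lam
        hess = c / lam**2 + 1e-8
        A = np.diag(hess) + rho * (D.T @ D + P.T @ P)
        b = hess * lam - grad + rho * (D.T @ (z - u) + P.T @ (w - v))
        lam = np.maximum(np.linalg.solve(A, b), 1e-6)
        # Smoothness and periodicity blocks: closed-form ridge shrinkage.
        z = rho * (D @ lam + u) / (2 * alpha + rho)
        w = rho * (P @ lam + v) / (2 * beta + rho)
        # Dual updates (scaled form).
        u += D @ lam - z
        v += P @ lam - w
    return lam

# Toy usage: counts drawn from a noisy daily pattern, hourly bins, period 24.
rng = np.random.default_rng(0)
true_rate = 5 + 3 * np.sin(2 * np.pi * np.arange(240) / 24)
lam_hat = fit_nhpp_admm(rng.poisson(true_rate), delta=1.0, period=24, alpha=5.0, beta=5.0)
print(np.round(lam_hat[:24], 2))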

 

Experiment: Benefit of Periodicity Penalty

[Figure: estimated intensity vs. ground truth on synthetic data. Blue: ground truth; orange: without the periodicity penalty (wavy, especially at the valleys); red: with the periodicity penalty (accurate).]

The Constrained Optimization Model

[Formulation of the stochastically constrained optimization and its solution; equations omitted in this export. The slides highlight two structural properties exploited by the solution: separability and monotonicity.]
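
Since the equations are not reproduced here, the following is a purely illustrative guess at what a stochastically constrained program for scaling-per-query can look like, in my notation: split the horizon into intervals t = 1, ..., T, let the decision n_t be the number of instances pre-started for interval t, and impose a hitting-probability target p under the predicted arrival process:

\min_{n_1, \dots, n_T \in \mathbb{Z}_{\ge 0}} \; \sum_{t=1}^{T} \mathbb{E}\bigl[\text{idle time of the } n_t \text{ pre-started instances}\bigr]
\quad \text{s.t.} \quad \mathbb{P}\bigl(\text{a query arriving in interval } t \text{ is hit}\bigr) \ge p, \qquad t = 1, \dots, T .

Under a form like this, the objective and constraints decompose across intervals (separable) and the hitting probability can only improve as n_t grows (monotonicity), so each interval could be solved independently by taking the smallest n_t that meets the target.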

 

 


Experiments

  • Baseline autoscaling methods (a sketch of the baseline policies follows the data table below):
    • Backup Pool (BP): a warm instance pool of fixed size B is maintained throughout; whenever an instance is consumed, it is immediately replaced with a new one (B = 0 is purely reactive)
    • Adaptive Backup Pool (AdapBP): adaptive version of BP with B = coeff × QPS for some fixed coefficient coeff
    • RobustScaler (RobustScaler-HP, RobustScaler-RT, RobustScaler-cost): the scaling decision is updated every second

  • Data: three real-world traces

    Trace                                                                #queries    duration
    Alibaba container registry (CRS) trace                               21,059      28 days
    Alibaba cluster 2018 trace (https://github.com/alibaba/clusterdata)  503,850     5 days
    Google cluster 2019 trace (https://github.com/google/cluster-data)   20,254      24 hours
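
A minimal sketch of the two baseline policies as described above; the function names and the way QPS is measured are illustrative assumptions:

def bp_replenishment(current_warm, B):
    # Backup Pool (BP): keep a warm pool of fixed size B; whenever an instance is
    # consumed, refill immediately. B = 0 degenerates to purely reactive scaling.
    return max(B - current_warm, 0)                 # instances to start right now

def adapbp_target_size(recent_qps, coeff):
    # Adaptive Backup Pool (AdapBP): pool size proportional to the observed QPS.
    return max(int(round(coeff * recent_qps)), 0)

# Example: BP refills a pool of size 5 after two instances were consumed;
# AdapBP sizes the pool for a measured 12 QPS with coeff = 0.5.
print(bp_replenishment(current_warm=3, B=5))            # -> 2
print(adapbp_target_size(recent_qps=12.0, coeff=0.5))   # -> 6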


QoS-cost Pareto Plots on Alibaba trace

Under the same relative cost, higher hitting rate or lower response time is better!

RobustScaler > AdapBP > BP


QoS-cost Pareto Plots on Google trace

Under the same relative cost, higher hitting rate or lower response time is better!

RobustScaler > BP > AdapBP

QoS-cost Pareto Plots on CRS trace

[Figures omitted in this export.]


QoS-cost Pareto Plots under Perturbation

Perturbation: queries within certain time intervals are deleted, or their volume is multiplied by c, for c = 1, 2, 4, 6

RobustScaler-HP surpasses AdapBP as c increases: RobustScaler is more robust against data contamination.


Robustness: Missing Data and Anomalies

  • Robustness test against anomalies and missing data
  • Quantiles of response time before/after missing data injection (almost identical!)


Scalability

  • By modules:
    • Periodicity detection & query arrival prediction: RobustPeriod & Fast RobustSTL are known to be scalable
    • Historical query arrival modeling: 7 seconds for a four-day QPS series, 100 seconds for a three-week QPS series
    • Scaling plan: runtime grows linearly and remains efficient under high QPS



Accuracy of Metric Control

  • Small errors between the nominal metric targets used in RobustScaler and actual metrics on the CRS trace!

[Figure: nominal vs. actual metrics for RobustScaler-RT, RobustScaler-cost, and RobustScaler-HP on the CRS trace.]


Conclusion

  • Propose RobustScaler for an optimal QoS-cost tradeoff in scaling-per-query
    • NHPP arrival modeling with periodicity regularization to explicitly capture complex and varying periodicity patterns in workloads
    • Stochastically constrained optimization to deal with the impact of uncertainty in query arrivals

  • Better QoS than heuristic methods under the same cost

  • More robust against missing data, anomalies, and perturbations

  • Highly scalable under heavy workloads, with accurate metric control