
RobustScaler: QoS-Aware Autoscaling for Complex Workloads

Huajie Qian, Qingsong Wen, Liang Sun, Jing Gu, Qiulin Niu, Zhimin Tang

Alibaba Group


Overview

  • Background: Autoscaling in cloud services
    • Dynamics of scaling-per-query and challenges

  • The RobustScaler Framework
    • Query arrival modeling
    • Scaling decision optimization

  • Experiments


Background: Autoscaling

  • Autoscaling: Automatically add or delete computing resources to match the demand in cloud applications (Qu et al. ’18, Lorido-Botran et al. ’14, Al-Dhuraibi et al. ’17, …)

    • Horizontal scaling (add/delete instances or VMs) vs vertical scaling (up/downgrade CPU, RAM, network, etc.)

    • Provisioning delay: because new resources take time to become ready, adjust the resource amount before the workload changes (proactive scaling)

[Figure: a load balancer routes incoming queries (QPS) to a pool of instances; instances are added or removed as the load changes.]

Background: Scaling-Per-Query

  • We focus on scaling-per-query scenario:
    • Each query is processed by a new instance which is terminated after processing the query (not reused/shared by other queries)
    • Common in on-demand cloud services, e.g., container registry (Wang et al. ATC ’21, Anwar et al. FAST ’18), CI/CD system (Hakli UCC ’18 Companion), function compute (Mahmoudi & Khazaei ’20, Mohan et al. HotCloud ’19, Lin & Glikson ’19), etc.
  • Reactive autoscaling leads to cold starts & long waiting times
  • Solution: Proactively start instances before query arrivals to reduce cold starts & waiting (e.g., pool of warm instances)

[Figure: timeline of instances serving queries. With reactive scaling, a new instance starts only after its query arrives, so the query waits through the cold start; with proactive scaling, a warm instance is already running when the query arrives.]

Background: Scaling-Per-Query Dynamics

QoS metrics:

  • Response time (RT) = (time processing ends) - (time of arrival)

  • Hitting probability (HP): probability of a query being hit, i.e., a warm instance is available upon arrival

Cost metrics:

  • Idle time (of an instance) = (time it starts processing a query) - (time it finishes startup)

[Figure: timeline of a pre-started instance and an arriving query. If no warm instance is ready at arrival (not hit), the idle time is 0 but the query waits; starting the instance earlier lowers RT at the price of more idle time, starting it later does the opposite.]
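
To make these definitions concrete, a small sketch (timestamps and names are illustrative, not from the paper) deriving the per-query quantities from the event times of a query and the instance pre-started for it:

def per_query_metrics(arrival, instance_ready, processing_time):
    # Processing starts once both the query has arrived and the instance has finished startup.
    start_processing = max(arrival, instance_ready)
    end_processing = start_processing + processing_time
    hit = instance_ready <= arrival                     # warm instance available at arrival
    response_time = end_processing - arrival            # RT = end of processing - time of arrival
    idle_time = max(0.0, arrival - instance_ready)      # instance idles while waiting for the query
    wait_time = max(0.0, instance_ready - arrival)      # query waits for startup (cold start)
    return {"hit": hit, "RT": response_time, "idle": idle_time, "wait": wait_time}

# Starting the instance earlier lowers RT but raises idle time; starting it later does the opposite.
print(per_query_metrics(arrival=10.0, instance_ready=8.0, processing_time=3.0))
print(per_query_metrics(arrival=10.0, instance_ready=12.0, processing_time=3.0))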


Related Works & Challenges

  • Related works:
    • Autoscaling systems for various cloud services: database (P-store by Taft et al. SIGMOD ’18, Jindal et al. SoCC ’19, …), microservices (Yu et al. ’20, Abdullah et al. ’20, Bauer et al. ICDCS ’19, …), stream processing (Turbine by Mei et al. ICDE ’20, Wang et al. SIGMOD ’19, …), web applications (Jiang et al. CCGrid ’13, Aslanpour et al. ’17, …), serverless computing (Mohan et al. HotCloud ’19, Lin & Glikson ’19, Wang et al. ATC ’18, …), etc.
    • Technical tools for autoscaling: control theory (Ullah et al. ’18), reinforcement learning (Lolos et al. ICDE ’17, Gari et al. ’21, …), queuing theory (Moreno-Vozmediano et al. ’19, Gandhi et al. ’18, …), rule-based methods (Taherizadeh & Stankovski ’19, …), etc.
    • Workload modeling and forecasting: Classical time series models (Calheiros et al. ’15, Kumar & Mazumdar ICIT ’16, …), neural networks (Kumar et al. ’20, Janardhanan & Barrett ICITST ’17, …), Poisson/Markovian process (Calzarossa et al. ’16, Pitchumani et al. MSST ’15, Pacheco-Sanchez et al. CLOUD ’11, …), hierarchical bundling models (Juan et al. PAKDD ’14), etc.

  • Challenges
    • Complex and varying periodicity patterns are common but hard to capture
    • Robustness to data contamination, e.g., anomalies, noises, missing data
    • Uncertainty of query arrival times (on top of traffic size) plays a crucial role in scaling-per-query

Proposed remedies: nonhomogeneous Poisson process (NHPP) arrival modeling with periodicity regularization, and stochastically constrained optimization for the scaling decisions.

RobustScaler Framework

Pipeline: historical QPS data → periodicity detection → historical query arrival modeling → query arrival prediction → scaling plan → scaling decisions, with the desired service/cost level fed into the scaling plan.

  • Periodicity detection: detect periodic patterns of different scales (RobustPeriod, Wen et al. SIGMOD ’21)

  • Historical query arrival modeling: build non-homogeneous Poisson process (NHPP) models for query arrivals, integrating the periodicity information

  • Query arrival prediction: decompose the NHPP intensity into season, trend, and noise components (Fast RobustSTL, Wen et al. KDD ’20), predict each component using standard time series models, then combine (see the sketch below)

  • Scaling plan: solve a stochastically constrained optimization (SCO) for the optimal scaling decisions under the given service/cost level
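
To illustrate the decompose-forecast-combine idea in the query arrival prediction step, here is a minimal numpy sketch; the moving-average/periodic-mean decomposition and the naive per-component forecasts are crude stand-ins for Fast RobustSTL and the time series models actually used, and all names are illustrative:

import numpy as np

def decompose(y, period):
    # Crude season/trend/noise split; an illustrative stand-in for Fast RobustSTL.
    n = len(y)
    trend = np.convolve(y, np.ones(period) / period, mode="same")            # moving-average trend
    detrended = y - trend
    season = np.array([detrended[i::period].mean() for i in range(period)])  # per-phase mean
    noise = y - trend - np.tile(season, n // period + 1)[:n]
    return trend, season, noise

def forecast_qps(y, period, horizon):
    y = np.asarray(y, dtype=float)
    trend, season, noise = decompose(y, period)
    # Naive per-component forecasts, then recombine: hold the recent trend level,
    # repeat the seasonal profile, and carry the average noise level.
    trend_fc = np.full(horizon, trend[-period:].mean())
    season_fc = np.array([season[(len(y) + h) % period] for h in range(horizon)])
    return trend_fc + season_fc + np.full(horizon, noise.mean())

# Toy usage: two weeks of hourly QPS with a daily cycle; forecast the next 12 hours.
hours = np.arange(24 * 14)
qps = 50 + 20 * np.sin(2 * np.pi * hours / 24) + np.random.default_rng(0).normal(0, 2, hours.size)
print(np.round(forecast_qps(qps, period=24, horizon=12), 1))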

Arrival Modeling as NHPP

Estimate the NHPP intensity by penalized maximum likelihood: the objective combines the log likelihood of the observed arrivals with a smoothness penalty and a periodicity penalty on the intensity. [Equations omitted in this export; figure: fitted intensity illustrating the effect of the periodicity penalty.]
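
For concreteness, one plausible discretized instance of such an objective (my notation and binning assumptions, not necessarily the paper's exact formulation): with observed counts c_i in bins of width Δ, a piecewise-constant intensity λ_i, and a detected period of T bins,

\min_{\lambda \ge 0}\;
  -\sum_{i=1}^{n} \bigl( c_i \log(\lambda_i \Delta) - \lambda_i \Delta \bigr)
  \;+\; \alpha \sum_{i=2}^{n} (\lambda_i - \lambda_{i-1})^2
  \;+\; \beta \sum_{i=T+1}^{n} (\lambda_i - \lambda_{i-T})^2 ,

where the first term is the negative log likelihood, the second the smoothness penalty, the third the periodicity penalty, and α, β ≥ 0 set the tradeoff.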

 

 

Arrival Modeling as NHPP: ADMM Solution

Form the augmented Lagrangian and iterate the ADMM block updates:

  • update the intensity (no closed-form solution; use a quadratic approximation)

  • update the smoothness-penalty block

  • update the periodicity-penalty block

  • update the dual variables
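
As a sketch of how such an alternating scheme can be implemented for the illustrative discretized objective above (my numpy construction, not code from the paper):

import numpy as np

def fit_nhpp_admm(counts, delta, period, alpha=1.0, beta=1.0, rho=1.0, n_iter=200):
    # Penalized NHPP fit via ADMM, splitting the objective as
    #   f(lam) + alpha*||z||^2 + beta*||w||^2  s.t.  D lam = z,  P lam = w,
    # where f(lam) = sum(lam*delta - counts*log(lam*delta)), D takes first
    # differences and P takes differences at lag `period`.
    c = np.asarray(counts, dtype=float)
    n = len(c)
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]                   # first-difference operator
    P = np.eye(n, k=period)[:-period] - np.eye(n)[:-period]    # periodic-difference operator
    lam = np.maximum(c / delta, 1e-3)                           # start from the empirical rate
    z, w = D @ lam, P @ lam
    u, v = np.zeros(len(z)), np.zeros(len(w))
    for _ in range(n_iter):
        # Intensity update: no closed form because of the log term, so take a
        # quadratic (second-order Taylor) approximation of f around the current lam.
        grad = delta - c / lam
        hess = c / lam**2 + 1e-8
        A = np.diag(hess) + rho * (D.T @ D + P.T @ P)
        b = hess * lam - grad + rho * (D.T @ (z - u) + P.T @ (w - v))
        lam = np.maximum(np.linalg.solve(A, b), 1e-6)
        # Smoothness and periodicity blocks: closed-form ridge shrinkage.
        z = rho * (D @ lam + u) / (2 * alpha + rho)
        w = rho * (P @ lam + v) / (2 * beta + rho)
        # Dual updates (scaled form).
        u += D @ lam - z
        v += P @ lam - w
    return lam

# Toy usage: counts drawn from a noisy daily pattern, hourly bins, period 24.
rng = np.random.default_rng(0)
true_rate = 5 + 3 * np.sin(2 * np.pi * np.arange(240) / 24)
lam_hat = fit_nhpp_admm(rng.poisson(true_rate), delta=1.0, period=24, alpha=5.0, beta=5.0)
print(np.round(lam_hat[:24], 2))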

 

Experiment: Benefit of Periodicity Penalty

[Figure: estimated intensity vs. ground truth on synthetic data. Blue: ground truth; orange: without the periodicity penalty (wavy, especially at the valleys); red: with the periodicity penalty (accurate).]

The Constrained Optimization Model

[Formulation of the stochastically constrained optimization and its solution; equations omitted in this export. The slides highlight two structural properties exploited by the solution: separability and monotonicity.]
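
Since the equations are not reproduced here, the following is a purely illustrative guess at what a stochastically constrained program for scaling-per-query can look like, in my notation: split the horizon into intervals t = 1, ..., T, let the decision n_t be the number of instances pre-started for interval t, and impose a hitting-probability target p under the predicted arrival process:

\min_{n_1, \dots, n_T \in \mathbb{Z}_{\ge 0}} \; \sum_{t=1}^{T} \mathbb{E}\bigl[\text{idle time of the } n_t \text{ pre-started instances}\bigr]
\quad \text{s.t.} \quad \mathbb{P}\bigl(\text{a query arriving in interval } t \text{ is hit}\bigr) \ge p, \qquad t = 1, \dots, T .

Under a form like this, the objective and constraints decompose across intervals (separable) and the hitting probability can only improve as n_t grows (monotonicity), so each interval could be solved independently by taking the smallest n_t that meets the target.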

 

 


Experiments

  • Baseline autoscaling methods (a sketch of the baseline policies follows the data table below):
    • Backup Pool (BP): a warm instance pool of fixed size B is maintained throughout; whenever an instance is consumed, it is immediately replaced with a new one (B = 0 is purely reactive)
    • Adaptive Backup Pool (AdapBP): adaptive version of BP with B = coeff × QPS for some fixed coefficient coeff
    • RobustScaler (RobustScaler-HP, RobustScaler-RT, RobustScaler-cost): the scaling decision is updated every second

  • Data: three real-world traces

    Trace                                                                #queries    duration
    Alibaba container registry (CRS) trace                               21,059      28 days
    Alibaba cluster 2018 trace (https://github.com/alibaba/clusterdata)  503,850     5 days
    Google cluster 2019 trace (https://github.com/google/cluster-data)   20,254      24 hours
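
A minimal sketch of the two baseline policies as described above; the function names and the way QPS is measured are illustrative assumptions:

def bp_replenishment(current_warm, B):
    # Backup Pool (BP): keep a warm pool of fixed size B; whenever an instance is
    # consumed, refill immediately. B = 0 degenerates to purely reactive scaling.
    return max(B - current_warm, 0)                 # instances to start right now

def adapbp_target_size(recent_qps, coeff):
    # Adaptive Backup Pool (AdapBP): pool size proportional to the observed QPS.
    return max(int(round(coeff * recent_qps)), 0)

# Example: BP refills a pool of size 5 after two instances were consumed;
# AdapBP sizes the pool for a measured 12 QPS with coeff = 0.5.
print(bp_replenishment(current_warm=3, B=5))            # -> 2
print(adapbp_target_size(recent_qps=12.0, coeff=0.5))   # -> 6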


QoS-cost Pareto Plots on Alibaba trace

Under the same relative cost, higher hitting rate or lower response time is better!

RobustScaler > AdapBP > BP


QoS-cost Pareto Plots on Google trace

Under the same relative cost, higher hitting rate or lower response time is better!

RobustScaler > BP > AdapBP

QoS-cost Pareto Plots on CRS trace

[Figures omitted in this export.]


QoS-cost Pareto Plots under Perturbation

Perturbation: queries within certain time intervals are deleted, or their volume is multiplied by c, for c = 1, 2, 4, 6

RobustScaler-HP surpasses AdapBP as c increases: RobustScaler is more robust against data contamination.


Robustness: Missing Data and Anomalies

  • Robustness test against anomalies and missing data
  • Quantiles of response time before/after missing data injection (almost identical!)


Scalability

  • By modules:
    • Periodicity detection & query arrival prediction: RobustPeriod & Fast RobustSTL are known to be scalable
    • Historical query arrival modeling: 7 seconds for a four-day QPS series, 100 seconds for a three-week QPS series
    • Scaling plan: runtime grows linearly and remains efficient under high QPS



Accuracy of Metric Control

  • Small errors between the nominal metric targets used in RobustScaler and actual metrics on the CRS trace!

[Figure: nominal vs. actual metrics for RobustScaler-RT, RobustScaler-cost, and RobustScaler-HP on the CRS trace.]


Conclusion

  • Propose RobustScaler for an optimal QoS-cost tradeoff in scaling-per-query
    • NHPP arrival modeling with periodicity regularization to explicitly capture complex and varying periodicity patterns in workloads
    • Stochastically constrained optimization to deal with the impact of uncertainty in query arrivals

  • Better QoS than heuristic methods under the same cost

  • More robust against missing data, anomalies, and perturbations

  • Highly scalable under heavy workloads, with accurate metric control