1 of 42

Spot Virtual Machine Eviction Prediction in Microsoft Cloud

Fangkai Yang, Bowen Pang, Jue Zhang, Bo Qiao, Lu Wang

Microsoft Research

Microsoft 365

Microsoft Azure

2 of 42


Billing policies in public cloud

On-demand Instance

  • Pay for compute capacity by the hour or second
  • E.g., V100 GPU instance: $3.06 / hour (p3.2xlarge)

Reserved Instance

  • Discounts are available (up to 60%)
  • Requires a commitment of 1 or 3 years

Spot Instance

  • Significant discount (up to 90%)
  • May be interrupted, or not fulfilled, due to a lack of capacity

AWS: Spot Instance

Azure: Spot Virtual Machines

Google Cloud: Spot Virtual Machines

What is a Spot Instance in Cloud Computing?
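To get a rough sense of scale, the savings can be sketched from the prices above. This is a hypothetical calculation: the reserved and spot figures use the "up to" discounts, so real bills vary by region, instance type, and demand.

```python
# Hypothetical monthly cost for a p3.2xlarge-class GPU instance,
# using the slide's on-demand rate and the maximum advertised discounts.
on_demand_hourly = 3.06
reserved_hourly = on_demand_hourly * (1 - 0.60)  # up to 60% off
spot_hourly = on_demand_hourly * (1 - 0.90)      # up to 90% off

hours = 24 * 30  # one month of continuous use
for name, rate in [("on-demand", on_demand_hourly),
                   ("reserved", reserved_hourly),
                   ("spot", spot_hourly)]:
    print(f"{name:9s} ${rate * hours:8.2f} / month")
# -> on-demand $2203.20, reserved $881.28, spot $220.32
```

The spot price is what makes the eviction risk worth modeling: a 90% discount is attractive only if interruptions can be anticipated.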

2023-03-13 DDPS Seminar, Presenter Kyunghwan Kim

Distributed Data Processing System Lab, KOOKMIN UNIVERSITY

3 of 42


Computing resources (capacity) in a public cloud datacenter

Used

resources

Unused

resources

How Do Spot Instances Work?


Fulfill!

4 of 42


Computing resources (capacity) in a public cloud datacenter

How Do Spot Instances Work?


Used

resources

Unused

resources

Interrupt!

5 of 42


Spot Instance Interruption Prediction


User

Cloud Vendor


Spot VM eviction predictions, available per region in the Azure Portal when deploying new VMs, help the cloud vendor optimize capacity utilization planning and allocation management.

For the user, the eviction prediction informs deployment plans that increase the survivability of Spot VMs and reduce the possibility of interruptions.

6 of 42


The difficulty of predicting spot instance interruptions


Complex allocation policies and large-scale data centers make interruptions difficult to predict.

7 of 42


Spot Instance Interruption Prediction

Cluster Level & Node Level Prediction

8 of 42


Cluster Level Prediction


9 of 42


Computing resources (capacity) in a public cloud datacenter

Cluster Level Prediction


[Figure: cluster-level prediction over a time horizon T; example per-cluster values: 60% → 78%, 50% → 18%, 30% → 60%, 10% → 98%]

10 of 42


Problem of Cluster Level Prediction


[Figure: the Azure VM Allocator places on-demand and Spot VMs of instance types A–D across Nodes 1–3 of Cluster A and Cluster B; the clusters show similar cluster capacity utilization, yet Cluster A Node 3 is interrupted while Cluster B Node 3 is not]

11 of 42


Node Level Prediction


Overview of the node-level spatial-temporal prediction framework


12 of 42


Spatial-Temporal Transformer Framework


13 of 42


Spatial-Temporal Transformer Framework


14 of 42


Transformer?


BERT: Bidirectional Encoder Representations from Transformers

GPT: Generative Pre-Training

15 of 42


Transformer Mechanism


[Figure: the Transformer architecture. Inputs pass through an Input Embedding plus Positional Encoding into the encoder (Multi-Head Attention, Add & Norm, Feed Forward, Add & Norm); outputs (shifted right) pass through an embedding plus Positional Encoding into the decoder (two Multi-Head Attention blocks with Add & Norm, then Feed Forward); a final Linear and Softmax produce the output probabilities]

16 of 42


Attention Mechanism


[Figure: the same Transformer diagram, annotated with the Encoder stack on the left and the Decoder stack on the right]

17 of 42


Attention Mechanism



1. Input Embedding

Each token of the input sentence 안녕 경환아 잘 지내? ("Hi Kyunghwan, how are you?") is mapped to a dense vector by the Input Embedding, e.g.:

안녕 ("hi") = [0.1, 0.65, 0.29]
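A minimal sketch of this lookup step. The vocabulary indices and all vector values other than the slide's [0.1, 0.65, 0.29] are made up for illustration:

```python
import numpy as np

# Toy input embedding: each token maps to one row of a learned table.
vocab = {"안녕": 0, "경환아": 1, "잘": 2, "지내?": 3}
embedding_table = np.array([
    [0.10, 0.65, 0.29],  # 안녕 ("hi"), the slide's example vector
    [0.42, 0.08, 0.77],  # remaining rows are arbitrary stand-ins
    [0.91, 0.33, 0.05],
    [0.26, 0.54, 0.61],
])

def embed(tokens):
    """Look up one embedding row per token."""
    return embedding_table[[vocab[t] for t in tokens]]

x = embed(["안녕", "경환아", "잘", "지내?"])  # shape (4, 3)
```

In a real model the table entries are learned parameters, not fixed numbers.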

18 of 42


Attention Mechanism



2. Positional Encoding

[Figure: for each time step (1, 2, 3, 4), a positional encoding vector is added to the token embedding to form the positional input embeddings]
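The standard sinusoidal encoding from the original Transformer paper can be sketched as follows; d_model = 8 is just an illustrative size:

```python
import numpy as np

# Sinusoidal positional encoding:
#   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
#   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]       # time steps 0..seq_len-1
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])   # odd dimensions: cosine
    return pe

pe = positional_encoding(seq_len=4, d_model=8)
# positional input embeddings = token embeddings + pe
```

Because each position gets a distinct, deterministic pattern, the attention layers can tell token order apart even though they are otherwise permutation-invariant.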

19 of 42


Attention Mechanism



3-4. Encoder Layer

[Figure: the positional input embeddings pass through Multi-Head Attention, Add & Norm, Feed Forward, and Add & Norm to produce the encoder representation]

20 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: inside a self-attention head, the token embeddings (안녕, 경환아, 지내?) are projected by Linear layers into Q, K, and V; then MatMul, Scale, Softmax, and MatMul follow, with the head outputs concatenated and passed through a final Linear layer]

21 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: each token embedding is multiplied by three separate Linear projections to produce its query, key, and value vectors]

22 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the attention scores are computed as the matrix product (MatMul) of the queries and keys]

23 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the raw attention energies form a token-by-token score matrix over the input tokens (안녕, 경환아, 지내?), with entries such as 98, 27, 10, 12]

24 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the scores are divided by sqrt(d_k), the square root of the key dimension, giving the scaled scores]

25 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: SoftMax turns each row of scaled scores over the tokens (안녕, 경환아, 지내?) into attention weights that sum to 1, e.g. 0.7, 0.1, 0.1, 0.1]

26 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the attention weights are multiplied (MatMul) with the value matrix to produce the attention output]
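The steps in the diagram (MatMul, Scale, Softmax, MatMul) can be put together in a few lines. The random Q, K, V matrices here are stand-ins for the projected token embeddings:

```python
import numpy as np

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # energies, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights                     # output, attention weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = attention(Q, K, V)
# each row of w sums to 1; out mixes the value rows by those weights
```

Each output row is a weighted average of the value vectors, with the weights given by how well that token's query matches every key.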

27 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: with multiple heads, each head has its own query, key, and value projections]

28 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: N = 2 self-attention heads run in parallel, each with its own query, key, and value projections]
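A sketch of N = 2 heads with per-head Q/K/V projections, concatenation, and the final Linear layer. All weight matrices here are random stand-ins for learned parameters:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Multi-head attention: each head projects the input into its own
# query/key/value space, attends, and the head outputs are
# concatenated and mixed by a final Linear layer (Wo).
def multi_head_attention(x, Wq, Wk, Wv, Wo):
    n_heads, _, d_head = Wq.shape
    heads = []
    for h in range(n_heads):
        Q, K, V = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        heads.append(softmax(Q @ K.T / np.sqrt(d_head)) @ V)
    return np.concatenate(heads, axis=-1) @ Wo  # concat -> Linear

rng = np.random.default_rng(0)
n_heads, d_model = 2, 8
d_head = d_model // n_heads
Wq, Wk, Wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
Wo = rng.normal(size=(n_heads * d_head, d_model))
x = rng.normal(size=(4, d_model))  # 4 tokens
y = multi_head_attention(x, Wq, Wk, Wv, Wo)  # shape (4, 8)
```

Splitting d_model across heads keeps the total cost comparable to a single head while letting each head attend to different relationships.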

29 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the outputs of the N = 2 heads are concatenated and passed through a final Linear layer]

30 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the positional input embeddings enter multi-headed attention, which produces the output vectors]

31 of 42


Attention Mechanism


4. Residual Connection, Layer Normalization & Pointwise Feed Forward

[Figure: the Transformer diagram, with the multi-headed attention output feeding the Add & Norm and Feed Forward blocks]

32 of 42


Attention Mechanism


4. Residual Connection, Layer Normalization & Pointwise Feed Forward

[Figure: inside the encoder block, the residual sum passes through LayerNorm, then a pointwise feed-forward network (Linear, ReLU, Linear), then another residual add and LayerNorm]
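The remaining pieces of one encoder layer can be sketched as below, in a deliberately minimal form: residual add, LayerNorm without its learned gain/bias, and the pointwise Linear, ReLU, Linear feed-forward. `attn` stands for the multi-head attention sub-layer:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance
    # (learned scale/shift parameters omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_layer(x, attn, W1, b1, W2, b2):
    x = layer_norm(x + attn(x))                  # residual + Add & Norm
    ff = np.maximum(0.0, x @ W1 + b1) @ W2 + b2  # Linear -> ReLU -> Linear
    return layer_norm(x + ff)                    # residual + Add & Norm

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
x = rng.normal(size=(4, d_model))
# identity used as an attention stub just to exercise the layer
y = encoder_layer(x, lambda z: z, W1, b1, W2, b2)
```

The residual connections let gradients flow past each sub-layer, which is what makes stacking the layer N times trainable.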

33 of 42


Attention Mechanism


4. Residual Connection, Layer Normalization & Pointwise Feed Forward

[Figure: the Transformer encoder layer is stacked N times]

34 of 42


Node Level Prediction


Overview of the node-level spatial-temporal prediction framework


35 of 42


Node Level Prediction


36 of 42


Experimental Evaluation

Setup, Baselines & Results

37 of 42


Experimental setup


Experimental system: Intel(R) Xeon(R) CPU E5-2690 @ 2.6 GHz, 112 GB memory.

Training data, collected every 1 hour over 2 weeks from 12,000 nodes in 20 clusters:

  • On-demand and Spot VM counts
  • Node capacity (cores & memory) and node capacity utilization
  • Evicted cores/memory/rate/count of Spot VMs from the previous 3 hours and the next 3 hours

38 of 42


Baseline


Linear Regression (LR)

Support Vector Regression (SVR)

Random Forest (RF)

Gradient Boosting Decision Tree (GBDT)

Long Short-term Memory (LSTM)


39 of 42


Experimental Results


40 of 42


Experimental Results


41 of 42


Conclusion

42 of 42


Conclusion


This paper investigates Spot VM eviction prediction methods at both the node level and the cluster level.

  • Cluster-level prediction baselines generally perform better than their node-level implementations.
  • Longer-horizon predictions have lower accuracy for the cluster-level baselines and our model, which is not observed at the node level.
  • Clusters with medium core utilization are the bottleneck case for our prediction model.
