1 of 42

Spot Virtual Machine Eviction Prediction in Microsoft Cloud

Fangkai Yang, Bowen Pang, Jue Zhang, Bo Qiao, Lu Wang

Microsoft Research

Microsoft 365

Microsoft Azure

2 of 42


Billing policies in public cloud

On-demand Instance

  • Pay for compute capacity by the hour or second
  • E.g., V100 GPU instance: $3.06 / hour (p3.2xlarge)

Reserved Instance

  • Discounts are available (up to 60%)
  • Requires a commitment of 1 or 3 years

Spot Instance

  • Significant discount (up to 90%)
  • May be interrupted, or not fulfilled, due to a lack of capacity

AWS: Spot Instance

Azure: Spot Virtual Machines

Google Cloud: Spot Virtual Machines

What is a Spot Instance in Cloud Computing?
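To get a rough sense of scale, the savings can be sketched from the prices above. This is a hypothetical calculation: the reserved and spot figures use the "up to" discounts, so real bills vary by region, instance type, and demand.

```python
# Hypothetical monthly cost for a p3.2xlarge-class GPU instance,
# using the slide's on-demand rate and the maximum advertised discounts.
on_demand_hourly = 3.06
reserved_hourly = on_demand_hourly * (1 - 0.60)  # up to 60% off
spot_hourly = on_demand_hourly * (1 - 0.90)      # up to 90% off

hours = 24 * 30  # one month of continuous use
for name, rate in [("on-demand", on_demand_hourly),
                   ("reserved", reserved_hourly),
                   ("spot", spot_hourly)]:
    print(f"{name:9s} ${rate * hours:8.2f} / month")
# -> on-demand $2203.20, reserved $881.28, spot $220.32
```

The spot price is what makes the eviction risk worth modeling: a 90% discount is attractive only if interruptions can be anticipated.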

2023-03-13 DDPS Seminar, Presenter Kyunghwan Kim

Distributed Data Processing System Lab, KOOKMIN UNIVERSITY

3 of 42


Computing resources (capacity) in a public cloud datacenter

Used

resources

Unused

resources

How Do Spot Instances Work?


Fulfill!

4 of 42


Computing resources (capacity) in a public cloud datacenter

How Do Spot Instances Work?


Used

resources

Unused

resources

Interrupt!

5 of 42


Spot Instance Interruption Prediction


User

Cloud Vendor


Spot VM eviction predictions, available per region in the Azure Portal when deploying new VMs, help the cloud vendor optimize capacity utilization planning and allocation management.

For the user, the eviction prediction informs deployment plans that increase the survivability of Spot VMs and reduce the possibility of interruptions.

6 of 42


The difficulty of predicting spot instance interruptions


Complex allocation policies and large-scale data centers make interruptions difficult to predict.

7 of 42


Spot Instance Interruption Prediction

Cluster Level & Node Level Prediction

8 of 42


Cluster Level Prediction


9 of 42


Computing resources (capacity) in a public cloud datacenter

Cluster Level Prediction


[Figure: cluster-level prediction over a time horizon T; example per-cluster values: 60% → 78%, 50% → 18%, 30% → 60%, 10% → 98%]

10 of 42


Problem of Cluster Level Prediction


[Figure: the Azure VM Allocator places on-demand and Spot VMs of instance types A–D across Nodes 1–3 of Cluster A and Cluster B; the clusters show similar cluster capacity utilization, yet Cluster A Node 3 is interrupted while Cluster B Node 3 is not]

11 of 42


Node Level Prediction


Overview of the node-level spatial-temporal prediction framework


12 of 42


Spatial-Temporal Transformer Framework


13 of 42


Spatial-Temporal Transformer Framework


14 of 42


Transformer?


BERT: Bidirectional Encoder Representations from Transformers

GPT: Generative Pre-Training

15 of 42


Transformer Mechanism


[Figure: the Transformer architecture. Inputs pass through an Input Embedding plus Positional Encoding into the encoder (Multi-Head Attention, Add & Norm, Feed Forward, Add & Norm); outputs (shifted right) pass through an embedding plus Positional Encoding into the decoder (two Multi-Head Attention blocks with Add & Norm, then Feed Forward); a final Linear and Softmax produce the output probabilities]

16 of 42


Attention Mechanism


[Figure: the same Transformer diagram, annotated with the Encoder stack on the left and the Decoder stack on the right]

17 of 42


Attention Mechanism



1. Input Embedding

Each token of the input sentence 안녕 경환아 잘 지내? ("Hi Kyunghwan, how are you?") is mapped to a dense vector by the Input Embedding, e.g.:

안녕 ("hi") = [0.1, 0.65, 0.29]
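A minimal sketch of this lookup step. The vocabulary indices and all vector values other than the slide's [0.1, 0.65, 0.29] are made up for illustration:

```python
import numpy as np

# Toy input embedding: each token maps to one row of a learned table.
vocab = {"안녕": 0, "경환아": 1, "잘": 2, "지내?": 3}
embedding_table = np.array([
    [0.10, 0.65, 0.29],  # 안녕 ("hi"), the slide's example vector
    [0.42, 0.08, 0.77],  # remaining rows are arbitrary stand-ins
    [0.91, 0.33, 0.05],
    [0.26, 0.54, 0.61],
])

def embed(tokens):
    """Look up one embedding row per token."""
    return embedding_table[[vocab[t] for t in tokens]]

x = embed(["안녕", "경환아", "잘", "지내?"])  # shape (4, 3)
```

In a real model the table entries are learned parameters, not fixed numbers.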

18 of 42


Attention Mechanism



2. Positional Encoding

[Figure: for each time step (1, 2, 3, 4), a positional encoding vector is added to the token embedding to form the positional input embeddings]
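The standard sinusoidal encoding from the original Transformer paper can be sketched as follows; d_model = 8 is just an illustrative size:

```python
import numpy as np

# Sinusoidal positional encoding:
#   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
#   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]       # time steps 0..seq_len-1
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])   # odd dimensions: cosine
    return pe

pe = positional_encoding(seq_len=4, d_model=8)
# positional input embeddings = token embeddings + pe
```

Because each position gets a distinct, deterministic pattern, the attention layers can tell token order apart even though they are otherwise permutation-invariant.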

19 of 42


Attention Mechanism



3-4. Encoder Layer

[Figure: the positional input embeddings pass through Multi-Head Attention, Add & Norm, Feed Forward, and Add & Norm to produce the encoder representation]

20 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: inside a self-attention head, the token embeddings (안녕, 경환아, 지내?) are projected by Linear layers into Q, K, and V; then MatMul, Scale, Softmax, and MatMul follow, with the head outputs concatenated and passed through a final Linear layer]

21 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: each token embedding is multiplied by three separate Linear projections to produce its query, key, and value vectors]

22 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the attention scores are computed as the matrix product (MatMul) of the queries and keys]

23 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the raw attention energies form a token-by-token score matrix over the input tokens (안녕, 경환아, 지내?), with entries such as 98, 27, 10, 12]

24 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the scores are divided by sqrt(d_k), the square root of the key dimension, giving the scaled scores]

25 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: SoftMax turns each row of scaled scores over the tokens (안녕, 경환아, 지내?) into attention weights that sum to 1, e.g. 0.7, 0.1, 0.1, 0.1]

26 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the attention weights are multiplied (MatMul) with the value matrix to produce the attention output]
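The steps in the diagram (MatMul, Scale, Softmax, MatMul) can be put together in a few lines. The random Q, K, V matrices here are stand-ins for the projected token embeddings:

```python
import numpy as np

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # energies, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights                     # output, attention weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = attention(Q, K, V)
# each row of w sums to 1; out mixes the value rows by those weights
```

Each output row is a weighted average of the value vectors, with the weights given by how well that token's query matches every key.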

27 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: with multiple heads, each head has its own query, key, and value projections]

28 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: N = 2 self-attention heads run in parallel, each with its own query, key, and value projections]
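A sketch of N = 2 heads with per-head Q/K/V projections, concatenation, and the final Linear layer. All weight matrices here are random stand-ins for learned parameters:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Multi-head attention: each head projects the input into its own
# query/key/value space, attends, and the head outputs are
# concatenated and mixed by a final Linear layer (Wo).
def multi_head_attention(x, Wq, Wk, Wv, Wo):
    n_heads, _, d_head = Wq.shape
    heads = []
    for h in range(n_heads):
        Q, K, V = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        heads.append(softmax(Q @ K.T / np.sqrt(d_head)) @ V)
    return np.concatenate(heads, axis=-1) @ Wo  # concat -> Linear

rng = np.random.default_rng(0)
n_heads, d_model = 2, 8
d_head = d_model // n_heads
Wq, Wk, Wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
Wo = rng.normal(size=(n_heads * d_head, d_model))
x = rng.normal(size=(4, d_model))  # 4 tokens
y = multi_head_attention(x, Wq, Wk, Wv, Wo)  # shape (4, 8)
```

Splitting d_model across heads keeps the total cost comparable to a single head while letting each head attend to different relationships.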

29 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the outputs of the N = 2 heads are concatenated and passed through a final Linear layer]

30 of 42


Attention Mechanism


3. Multi-headed Attention

[Figure: the positional input embeddings enter multi-headed attention, which produces the output vectors]

31 of 42


Attention Mechanism


4. Residual Connection, Layer Normalization & Pointwise Feed Forward

[Figure: the Transformer diagram, with the multi-headed attention output feeding the Add & Norm and Feed Forward blocks]

32 of 42


Attention Mechanism


4. Residual Connection, Layer Normalization & Pointwise Feed Forward

[Figure: inside the encoder block, the residual sum passes through LayerNorm, then a pointwise feed-forward network (Linear, ReLU, Linear), then another residual add and LayerNorm]
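The remaining pieces of one encoder layer can be sketched as below, in a deliberately minimal form: residual add, LayerNorm without its learned gain/bias, and the pointwise Linear, ReLU, Linear feed-forward. `attn` stands for the multi-head attention sub-layer:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance
    # (learned scale/shift parameters omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_layer(x, attn, W1, b1, W2, b2):
    x = layer_norm(x + attn(x))                  # residual + Add & Norm
    ff = np.maximum(0.0, x @ W1 + b1) @ W2 + b2  # Linear -> ReLU -> Linear
    return layer_norm(x + ff)                    # residual + Add & Norm

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
x = rng.normal(size=(4, d_model))
# identity used as an attention stub just to exercise the layer
y = encoder_layer(x, lambda z: z, W1, b1, W2, b2)
```

The residual connections let gradients flow past each sub-layer, which is what makes stacking the layer N times trainable.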

33 of 42


Attention Mechanism


4. Residual Connection, Layer Normalization & Pointwise Feed Forward

[Figure: the Transformer encoder layer is stacked N times]

34 of 42


Node Level Prediction


Overview of the node-level spatial-temporal prediction framework


35 of 42


Node Level Prediction


36 of 42


Experimental Evaluation

Setup, Baselines & Results

37 of 42


Experimental setup


Experimental system: Intel(R) Xeon(R) CPU E5-2690 @ 2.6 GHz, 112 GB memory.

Training data, collected every 1 hour over 2 weeks from 12,000 nodes in 20 clusters:

  • On-demand and Spot VM counts
  • Node capacity (cores & memory) and node capacity utilization
  • Evicted cores/memory/rate/count of Spot VMs from the previous 3 hours and the next 3 hours

38 of 42


Baseline


Linear Regression (LR)

Support Vector Regression (SVR)

Random Forest (RF)

Gradient Boosting Decision Tree (GBDT)

Long Short-term Memory (LSTM)


39 of 42


Experimental Results


40 of 42


Experimental Results


41 of 42


Conclusion

42 of 42


Conclusion


This paper investigates Spot VM eviction prediction methods at both the node level and the cluster level.

  • Cluster-level prediction baselines generally perform better than their node-level implementations.
  • Longer-horizon predictions have lower accuracy for the cluster-level baselines and our model, which is not observed at the node level.
  • Clusters with medium core utilization are the bottleneck case for our prediction model.
