1 of 100

Advances in Microarchitectural Attacks

Yuval Yarom

2 of 100

Flush+Reload on Square-and-Multiply

40ns

3 of 100

Clock-based countermeasures

3

4 of 100

Clock-based countermeasures

  • Clock resolution in browsers (2019):
    • Chrome 0.1ms
    • Safari 1ms
    • Firefox 2ms
    • TOR 100ms
  • Apple M1 42 ns
    • Below the Nyquist rate

4

5 of 100

Reducing Timer Resolution

5

Measure this

With this

6 of 100

Reducing Timer Resolution

6

7 of 100

Alternative clocks

7

8 of 100

Low resolution clocks

  • There are many low resolution clocks
    • Browser messages, queues, etc. [SMGM17]
    • Network [SAO+21]
  • Techniques for using them:
    • Edge Thresholding [KS16,SMGM17]
    • Amplification (later)
    • Cache occupancy

8

[SMGM17] M. Schwarz et al. “Fantastic timers and where to find them …”, FC 2017

[SAO+21] A. Shusterman et al. “Prime+Probe 1, JavaScript 0 …”, USENIX Sec 2021

[KS16] D. Kohlbrenner and H. Shacham, “Trusted browsers for uncertain times”, USENIX Sec 2016

9 of 100

Problem: Reduced timer resolution

9

40 ns

25 μs

10 of 100

Problem: Reduced timer resolution

10

Detection

  • LLC latency: 12ns
  • DRAM latency: 50ns
  • Solutions:
    • Repetition
    • Occupancy
    • Amplification

Rate

  • At most one probe per clock

  • Can we do cache attacks at higher than clock resolution?

11 of 100

Logical State of Cache

  • Associate a logical value with memory addresses
    • TRUE – address is cached
    • FALSE – address is not cached
  • Flushing sets a value to FALSE
  • Accessing memory sets a value to TRUE (may also set another to FALSE)
  • Measuring access time observes value (and set to TRUE)
  • What else?

11

F F F T F F F F T F F F F

Processor

Memory

Cache

F

T

12 of 100

Conditional access

  • What is the cache state of *out after execution?

  • TRUE if *in != 0.

  • What if *in == 0?

12

Make sure *in is 0

if (*in == 0)

return;

a = *out

13 of 100

Speculative execution

  • Evaluation of branch conditions can take time

  • The CPU predicts future execution
    • Correct prediction – win
    • Incorrect prediction – rollback
    • Microarchitectural state remains

  • Translate branch prediction state to cache state

13

if (*in == 0)

return;

a = *out

May be executed even if *in == 0

Make sure *in is 0

Train branch to mispredict

14 of 100

Conditional Speculative �Execution

  • Speculation terminates when condition is resolved

  • *in in cache
    • Condition resolves fast
    • *out is not accessed

  • *in not in cache
    • Condition resolution delayed
    • *out is accessed

14

if (*in == 0)

return;

a = *out

*in

*out

TRUE

FALSE

FALSE

TRUE

out 🡨 NOT(in)

Make sure *in is 0

Train branch to mispredict

15 of 100

Other functions

if (*in1 + *in2 == 0)

return;

a = *out

*in1

*in2

*out

FALSE

FALSE

TRUE

FALSE

TRUE

TRUE

TRUE

FALSE

TRUE

TRUE

TRUE

FALSE

if (*in1 == 0)

return;

if (*in2 == 0)

return;

a = *out

*in1

*in2

*out

FALSE

FALSE

TRUE

FALSE

TRUE

FALSE

TRUE

FALSE

FALSE

TRUE

TRUE

FALSE

out 🡨 NOR(in1, in2)

out 🡨 NAND(in1, in2)

16 of 100

Multiple outputs

  • Needed because gates destroy their inputs

  • Limited by number of line fill buffers
    • Can handle up to 12 inputs and outputs

16

if (*in == 0)

return;

a = *out1 + *out2

17 of 100

Minority Report

17

if (*in1 + *in2 == 0)

return;

if (*in2 + *in3 == 0)

return;

if (*in1 + *in3 == 0)

return;

a = *out

*in1

*in2

*in3

*out

FALSE

FALSE

FALSE

TRUE

FALSE

FALSE

TRUE

TRUE

FALSE

TRUE

FALSE

TRUE

FALSE

TRUE

TRUE

FALSE

TRUE

FALSE

FALSE

TRUE

TRUE

FALSE

TRUE

FALSE

TRUE

TRUE

FALSE

FALSE

TRUE

TRUE

TRUE

FALSE

18 of 100

Circuits

  • 4-bit ALU
    • 1258 gates, 84-95% accuracy

  • SHA-1
    • One round: 2208 gates, 95% accuracy (67% with prefetcher)
    • Full (two blocks, with repetitions) 95% accuracy

  • Game of Life
    • 7807 gates 73% accuracy for one �generation, 25% for 20

18

19 of 100

Amplification

  • A NOT gate with a large fan-out amplifies the signal by a factor of 8
    • Two layers – 64
    • Three layers – 512
    • Four layers – 4096

  • Amplify to a resolution of �0.1 second

19

if (*in == 0)

return;

a = *out1 + *out2 +

*out3 + *out4 +

*out5 + *out6 +

*out7 + *out8;

20 of 100

High Resolution Prime+Probe

  • Probe is basically a NAND gate

  • Do multiple probes of the same�cache set. Store results.

  • Amplify later
    • Decouples probing from time measurements

  • Attack square-and-multiply ElGamal with a 0.1ms clock

20

21 of 100

Crypto and Speculative Execution

21

22 of 100

Background

22

23 of 100

Background

23

24 of 100

Motivating Example

key = ...�state = message�for (i from 0 to 8) {� bit = 1 << i� if (key & bit) { state ^= bit }}ciphertext = state

24

25 of 100

Motivating Example

key = ...�state = message�for (i from 0 to 8) {� bit = 1 << i� if (key & bit) { state ^= bit }}ciphertext = state

25

Applies a one-time-pad one bit a time

26 of 100

Motivating Example

key = ...�state = message�for (i from 0 to 8) {� bit = 1 << i� if (key & bit) { state ^= bit }}ciphertext = state

26

Applies a one-time-pad one bit a time

Branches on key material

27 of 100

Background

27

Influences

28 of 100

Background

28

Influences

Infer

29 of 100

Constant-Time Programming

  • Prevents Secret-Dependent
    • Memory Accesses
    • Control Flow
    • Instruction Timings

29

30 of 100

Motivating Example

key = ...�state = message�for (i from 0 to 8) {� bit = 1 << i� state ^= (key & bit)}ciphertext = state

30

No longer uses a branch

31 of 100

Motivating Example

secret key = ...�secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = state

31

32 of 100

Motivating Example

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� if (key & bit) { state ^= bit }}public ciphertext = state

32

33 of 100

Motivating Example

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� if (key & bit) { state ^= bit }}public ciphertext = state

33

34 of 100

Motivating Example

secret key = ...�secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = state

34

35 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

35

Don’t care

Constant Time

Outside World

36 of 100

Declassification

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = state

36

37 of 100

Declassification

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = state

37

38 of 100

Declassification

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = state

38

39 of 100

Declassification

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = declassify(state)

39

40 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

40

Constant Time

Outside World

41 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

41

Constant Time

Outside World

declassify

42 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

42

Constant Time

Outside World

declassify

43 of 100

43

44 of 100

Spectre Variant 1

if (index < array.length) {

public value = array[index]

leak(value)

}

44

45 of 100

Spectre Variant 1

if (index < array.length) {

public value = array[index]

leak(value)

}

45

Body executed even if condition is false

46 of 100

Spectre Variant 1

if (index < array.length) {

public value = array[index]

leak(value)

}

46

Could load a secret

Body executed even if condition is false

47 of 100

Spectre Variant 1

if (index < array.length) {

public value = array[index]

leak(value)

}

47

Could load a secret

Can leak

Body executed even if condition is false

48 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

48

48

declassify

Constant Time

Outside World

49 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

49

49

declassify

Constant Time

Outside World

50 of 100

Speculative Load Hardening

  • Compiler instruments program
  • Detects misspeculation
  • Poisons loaded values under misspeculation

50

51 of 100

Speculative Load Hardening

if (index < array.length) {

public value = array[index]

leak(value)

}

51

52 of 100

Speculative Load Hardening

mask = 0

if (index < array.length) {

mask = index < array.length ? mask : -1

public value = array[index] | mask

leak(value)

}

52

53 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

53

53

declassify

SLH

Constant Time

Outside World

54 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

54

54

declassify

SLH

Constant Time

Outside World

55 of 100

Speculative Declassification

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = declassify(state)

55

56 of 100

Speculative Declassification

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = declassify(state)

56

Skipped

57 of 100

Speculative Declassification

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = declassify(state)

57

Skipped

Unencrypted

58 of 100

Speculative Declassification

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = declassify(state)

58

Skipped

Unencrypted

59 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

59

Speculative declassify

SLH

🕑

Constant Time

Outside World

60 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

60

Speculative declassify

SLH

🕑

Constant Time

Outside World

61 of 100

Real-World Example

secret state = plaintext

state = aes_round(state, *key++)� ...�state = aes_round(state, *key++)if (rounds == 12) {...

}�state = aes_last_round(state, *key++)

public ciphertext = declassify(state)

61

62 of 100

Real-World Example

secret state = plaintext

state = aes_round(state, *key++)� ...�state = aes_round(state, *key++)if (rounds == 12) {...

}state = aes_last_round(state, *key++)

public ciphertext = declassify(state)

62

Skipped

63 of 100

Real-World Example

secret state = plaintext

state = aes_round(state, *key++)� ...�state = aes_round(state, *key++)if (rounds == 12) {...

}�state = aes_last_round(state, *key++)

public ciphertext = declassify(state)

63

Skipped

Incorrectly encrypted

64 of 100

Real-World Example

secret state = plaintext

state = aes_round(state, *key++)� ...�state = aes_round(state, *key++)if (rounds == 12) {...

}state = aes_last_round(state, *key++)

public ciphertext = declassify(state)

64

Skipped

Incorrectly encrypted

65 of 100

Countermeasure

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = declassify(state)

65

66 of 100

Countermeasure

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = declassify(state)

66

Insert speculation fence

67 of 100

Countermeasure

secret key = ...secret state = message�for (i from 0 to 8) {public bit = 1 << i� state ^= (key & bit)}public ciphertext = declassify(state)

67

Insert speculation fence

Earlier instructions

must execute correctly

68 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

68

SLH

Constant Time

Outside World

Fenced declassify

69 of 100

Flow of Values

Public-Typed Values

Secret-Typed Values

69

SLH

Constant Time

Outside World

Fenced declassify

70 of 100

Performance

Impact

70

71 of 100

Selective SLH

public array[]

�if (index < array.length) {

...

public value = array[index]

...

leak(value)

}

71

72 of 100

Selective SLH

public array[]

mask = 0�if (index < array.length) {

mask = index < array.length ? mask : -1

...

public value = array[index] | mask

...

leak(value)

}

72

73 of 100

Selective SLH

secret array[]

mask = 0�if (index < array.length) {

mask = index < array.length ? mask : -1

...

secret value = array[index] | mask

...

leak(value)

}

73

74 of 100

Selective SLH

secret array[]

mask = 0�if (index < array.length) {

mask = index < array.length ? mask : -1

...

secret value = array[index] | mask

...

leak(value)

}

74

75 of 100

Selective SLH

secret array[]

mask = 0�if (index < array.length) {

mask = index < array.length ? mask : -1

...

secret value = array[index] | mask

...

leak(value)

}

75

Out of bounds access

76 of 100

Selective SLH

secret array[]

mask = 0�if (index < array.length) {

mask = index < array.length ? mask : -1

...

secret value = array[index] | mask

...

leak(value)

}

76

Out of bounds access

77 of 100

Selective SLH

public array[]

�if (index < array.length) {

...

public value = array[index]

...

leak(value)

}

77

78 of 100

Selective SLH

public array[]

mask = 0�if (index < array.length) {

mask = index < array.length ? mask : -1

...

public value = array[index] | mask

...

leak(value)

}

78

What if (public) array never leaks?

79 of 100

Selective SLH

secret array[]

�if (index < array.length) {

...

secret value = array[index]

...

leak(value)

}

79

Tagging public values that do not leak as secret reduces overhead!!!

80 of 100

Selective SLH

80

Algorithm

SLH Masking Removed

ChaCha20 (1024B)

~80%

Donna-c64

~95%

81 of 100

What if we do not have secret types?

81

82 of 100

Exorcising Spectres

  • No secret types 🡪 we trust programmers
  • Programmers do not think about speculative execution
  • Can compilers preserve programmer’s intent?
    • No new leak in speculative execution
  • Results:
    • Speculative barriers --- good
    • SLH --- not that good

82

Patrignani and Guarnieri, Exorcising Spectres with Secure Compilers, CCS 2021

83 of 100

Spectre - Recap

if (index < array.length) {

value = array[index]

leak(value)

}�secret = recover()

83

Enter speculative execution

Obtain a secret

Leak secret

Recover secret

84 of 100

Speculative�Barriers

if (index < array.length) {

lfence

value = array[index]

leak(value)

}�secret = recover()

84

Enter speculative execution

Obtain a secret

Leak secret

Recover secret

85 of 100

SLH - Recap

if (index < array.length) {

mask = index < array.length ? mask : -1

value = array[index] | mask

leak(value)

}�secret = recover()

85

Enter speculative execution

Obtain a secret

Leak secret

Recover secret

86 of 100

Speculative secrets

  • Load from memory
  • Type confusion
    • Declassification
    • Register reuse
    • Dynamic type checks

86

87 of 100

Type confusion leaks

value = …

isPublic = …

if (isPublic) {

leak(value)

}�secret = recover()

87

Enter speculative execution

Obtain a secret

Leak secret

Recover secret

88 of 100

A concrete example

if (isPublic) {

x = array[value]

}

88

89 of 100

A concrete example – with SLH

if (isPublic) {

mask = isPublic ? mask : -1

x = array[value] | mask

}

89

Still leaks

90 of 100

A concrete example – with SSLH [PG21]

if (isPublic) {

mask = isPublic ? mask : -1

x = *((array + value) | mask)

}

90

*(array+value)�is the same as�array[value]

Supported in LLVM SLH

91 of 100

More SSLH

if (isPublic) {

mask = isPublic ? mask : -1

if (value | mask) {…}

}

91

92 of 100

More SSLH

if (isPublic) {

mask = isPublic ? mask : -1

if (value | mask) {…}

}

92

93 of 100

Observation

  • Under misspeculation every value is potentially secret
  • Cannot prevent access to secret
  • Prevent leaks w/ constant-time programming

93

LLVM SLH

SSLH

USLH

Memory Access

Control Flow

Instruction Timing

94 of 100

Observation

  • Under misspeculation every value is potentially secret
  • Cannot prevent access to secret
  • Prevent leaks w/ constant-time programming

94

LLVM SLH

SSLH

USLH

Memory Access

Control Flow

Instruction Timing

95 of 100

Leaking instruction timing

95

if (isPublic) {

value = sqrt(value)

value = value*value

value = sqrt(value)

value = value*value

}

On i7-10710U:

Sqrtsd + mulsd take:

  • 65536: 5 cycles
  • 2.34e-308: 7 cycles

96 of 100

Leaking speculative instruction timing

96

if (isPublic) {

value = sqrt(value)

value = value*value

value = sqrt(value)

value = value*value

}

On i7-10710U:

Sqrtsd + mulsd take:

  • 65536: 5 cycles
  • 2.34e-308: 7 cycles

Misspeculation time is independent of misspeculated code.

97 of 100

Leaking speculative instruction timing

97

if (isPublic) {

value = sqrt(value)

value = value*value

value = sqrt(value)

value = value*value

load(fixedAddress)

}

On i7-10710U:

Sqrtsd + mulsd take:

  • 65536: 5 cycles
  • 2.34e-308: 7 cycles

Misspeculation time is independent of misspeculated code.

No dependency?

Load “waits” for a reservation station. Release time depends on prior instructions.

Race between branch resolution and load time.

With 45 pairs, load execution probability is:

  • 65536: 98%
  • 2.34e-308: 5%

98 of 100

Ultimate SLH

  • Implement SSLH
  • Add defence for instruction timing

98

LLVM SLH

SSLH

USLH

Memory Access

Control Flow

Instruction Timing

99 of 100

Countermeasure

99

if (isPublic) {

mask = isPublic ? mask : -1

value = sqrt(value | mask)

value = value*value

value = sqrt(value)

value = value*value

load(fixedAddress)

}

100 of 100

Performance

100

Time Cost Normalisation