1 of 22

The ghost is the machine:�Weird machines in transient execution

Ping-Lun Wang, Fraser Brown, Riad S. Wahby

WOOT, 05/25 2023

2 of 22

What are weird machines?

  • Definition: models a system’s unintentional behavior, usually triggered from adversarial input
  • Use weird machines as computation primitives
  • Perform program obfuscation or secret computations
    • The MOV instruction[1]
    • Page fault handler [2]
    • Time (TSX WM)[3]

2

[1] S. Dolan, “mov is Turing-complete” (2013)

[2] J. Bangert et al., “The page-fault weird machine: Lessons in instruction-less computation” (WOOT 2013)

[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)

3 of 22

TSX weird machines that compute with time[3]

3

[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)

TSX Weird Gate

Input I1 in cache?

Input I2 in cache?

Fetch output Out into cache?

Cache

I1

I2

  • Cache hit

→ short latency

1

  • Cache miss

→ long latency

0

= 1

= 0

4 of 22

TSX weird machines that compute with time[3]

4

Cache

I1

I2

Out

I1

I2

Out

Cache

I1

I2

Out

I1

I2

Out

[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)

TSX Weird Gate

Input I1 in cache?

Input I2 in cache?

Fetch output Out into cache?

AND

OR

5 of 22

TSX weird machines that compute with time[3]

  • Compute on cache states (μArch)
  • Not visible in architectural states!
  • Gates: Assign, AND, OR, XOR

5

[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)

TSX Weird Gate

Input I1 in cache?

Input I2 in cache?

Fetch output Out into cache?

6 of 22

TSX weird machines that compute with time[3]

Limitations

  • TSX instructions are easy to detect
    • Vulnerable to static analysis!
  • Limited to some Intel CPUs
    • AMD and ARM CPUs are not supported!

6

[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)

TSX Weird Gate

Input I1 in cache?

Input I2 in cache?

Fetch output Out into cache?

7 of 22

Our work: generalize TSX WM to Transient WM

  • TSX → any transient execution primitive!
    • Prevent static analysis
    • Support AMD, ARM, and Intel CPUs
    • 95~99% accuracy on all CPUs
  • New NOT gate design
    • Idea: cache state controls the length of transient execution
    • Leads to a ≈7X faster XOR gate

7

Transient weird machine (WM)

Transient execution primitives

TSX

Meltdown

Spectre

Side channel

Cache

Weird gates

Assign

AND

OR

XOR

NOT

8 of 22

What is transient execution?

8

clflush(X);

TSX {� AX /= 0;� X[0] = 0;

}

Cache

PC →

(Waiting…)

(Speculative)

X

DIV by 0!!

▲ Transient execution (TSX)

(Modified!!)

Transient execution begins!

Transient execution ends!

9 of 22

Generalize TSX WM to other primitives

9

clflush(X);

TSX {� AX /= 0;� X[0] = 0;

}

▲ Transient execution (TSX)

INIT:�    SIGFPE {�        goto EXIT;�    }��clflush(X);�AX /= 0;�X[0] = 0;

EXIT:

▲ Transient execution (Meltdown)

10 of 22

Assign: input latency controls output variable

10

INIT:�    I[0] = 0;�    SIGFPE {�        goto EXIT;�    }��clflush(Y);�AX /= 0;�Out[I[0]] = 0;�EXIT:�    ...

I is in cache

I is NOT in cache

Cache

I

Out

Cache

I

Out

DIV by 0!!

Fetch I[0]

Fetch Out[0]

Assign gate

Transient exec.

Assign

DIV by 0!!

Transient exec.

Assign

Fetch I[0]

Fetch Out[0]

✔️

= 0

= 0

11 of 22

From Assign to OR & AND gates

11

Out[I1[0]] = 0;

Out[I2[0]] = 0;

Out[I2[I1[0]]] = 0;

OR gate

AND gate

INIT:�    I[0] = 0;�    SIGFPE {�        goto EXIT;�    }��clflush(Y);�AX /= 0;�Out[I[0]] = 0;�EXIT:�    ...

Assign gate

12 of 22

How to construct a NOT gate?

12

DIV by 0!!

Transient exec.

I in cache → fast

Assign

✔️

Assign

I not in cache → slow

AX /= 0;�Out[I[0]] = 0;

Assign gate

Time

How to fetch output when input is slow?

→ Fetch output when input is not in cache

Too slow to fetch output…

13 of 22

How to construct a NOT gate?

13

I not in cache → slow

DIV by 0!!

NOT

Fetch output

I in cache → fast

DIV by 0!!

✔️

Transient exec.

Transient exec.

AX /= I[0];�Out[tmp[0]] = 0;

NOT gate

Time

Input

Insight: transient execution time is not constant!

Slow input →

Slow transient exec.

14 of 22

NOT: input latency controls transient execution time

14

INIT:�    I[0] = O[0] = 0;�    SIGFPE {�        goto EXIT;�    }��clflush(O);�clflush(tmp);�AX /= I[0];�Out[tmp[0]] = 0;�EXIT:�    ...

I is in cache

I is NOT in cache

NOT gate

Cache

I

tmp

Out

Cache

I

tmp

Out

Transient exec.

NOT

Transient exec.

NOT

Fetch I[0]

Fetch tmp[0]

Fetch Out[0]

DIV by 0!!

Division

Fetch Out[0]

Fetch tmp[0]

Fetch I[0]

Division

DIV by 0!!

✔️

15 of 22

Measuring accuracy and runtime

  • Test on 3 CPUs
    • AMD EPYC 7R13
    • ARM Graviton 1
    • Intel Xeon E7-8800
  • Implementation
    • C/C++ and asm
    • 1,379 LoC
    • Available on GitHub*

15

Weird Gates

1M operations

Repeat 1,000 times

Median accuracy & runtime

16 of 22

Accuracy and runtime

Accuracy

16

Gate

AMD

Intel

ARM

AND

99.96%

99.99%

98.89%

OR

100.00%

99.99%

99.46%

Assign

99.99%

99.99%

99.42%

NOT

99.51%

99.99%

99.29%

XOR

94.36%

95.58%

97.97%

MUX

98.17%

99.93%

97.64%

Gate

AMD

Intel

ARM

AND

1.801

2.628

32.521

OR

1.756

2.551

32.506

Assign

1.751

2.637

32.378

NOT

1.779

2.741

32.573

XOR

7.403

15.648

130.898

MUX

5.870

11.975

131.150

Runtime (s/Mops)

17 of 22

Application: program obfuscation

  • Obfuscate computations and control flows
  • Future work: compiler to construct circuits of weird gates

17

if (input ^ password) {

    fail();

} else {

    success();

}

XOR

MUX

input

password

success

fail

Function pointer

Obfuscate with Transient WM

18 of 22

Takeaways

  • TSX WMs + Meltdown & Spectre → Transient WMs
  • Novel NOT gate design
  • Obfuscate malware and attack exploits

18

Thank you for listening!

  • Github: https://github.com/joeywang4/Transient-Weird-Machine
  • Authors: Ping-Lun Wang, Fraser Brown, Riad S. Wahby
  • Contact: pinglunw@andrew.cmu.edu

19 of 22

What is transient execution?

19

INIT:�    SIGFPE {�        goto EXIT;�    }��clflush(X);�AX /= 0;�X[0] = 0;

EXIT:

Cache

PC →

(Waiting…)

(Speculative)

X

DIV by 0!!

▲ Transient execution (Meltdown)

(Modified!!)

Transient execution begins!

Transient execution ends!

20 of 22

Can Meltdown & Spectre perform computations?

20

raise_exception();�// the line below is never reachedaccess(probe_array[data * 4096]);

if (x < array1_size)�    y = array2[array1[x] * 4096];

Spectre

Meltdown

Can we hide computations here?

Not visible in architectural states!

  • Encryption or decryption
  • Malicious code

21 of 22

Microarchitectural weird machines

21

Computing with time: Microarchitectural weird machines (ASPLOS’21)

✔️ Can compute AND, OR, XOR and other operations

✔️ Compute with microarchitectural states

TSX instructions are easy to detect

Only support some Intel CPUs

Q: Can we achieve the same computations without TSX?

A: Yes! Any transient execution primitives are possible!

Q: Can we add support to other CPUs?

A: Yes! AMD, ARM, and Intel CPUs are all supported!

22 of 22

TSX gates: the Assign gate

22

INIT:�    X[0] = 0;��clflush(Y);�TSX {�    tmp /= 0;�    Y[X[0]] = 0;�}

DIV by 0!

PC

Exit TSX

Pipeline

X is NOT in cache

X is in cache

Y[X[0]] = 0;

Y[0] = 0;

X[0] = 0

Y[X[0]] = 0;

X[0] = ???

Fetch Y[0] in cache!

Y[0] is not fetched…

Assign gate