The ghost is the machine:�Weird machines in transient execution
Ping-Lun Wang, Fraser Brown, Riad S. Wahby
WOOT, 05/25 2023
What are weird machines?
2
[1] S. Dolan, “mov is Turing-complete” (2013)
[2] J. Bangert et al., “The page-fault weird machine: Lessons in instruction-less computation” (WOOT 2013)
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
TSX weird machines that compute with time[3]
3
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
TSX Weird Gate
Input I1 in cache?
Input I2 in cache?
Fetch output Out into cache?
Cache
I1
I2
→ short latency
→ 1
→ long latency
→ 0
= 1
= 0
TSX weird machines that compute with time[3]
4
Cache
I1
I2
Out
I1
I2
Out
Cache
I1
I2
Out
I1
I2
Out
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
TSX Weird Gate
Input I1 in cache?
Input I2 in cache?
Fetch output Out into cache?
AND
OR
TSX weird machines that compute with time[3]
5
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
TSX Weird Gate
Input I1 in cache?
Input I2 in cache?
Fetch output Out into cache?
TSX weird machines that compute with time[3]
Limitations
6
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
TSX Weird Gate
Input I1 in cache?
Input I2 in cache?
Fetch output Out into cache?
Our work: generalize TSX WM to Transient WM
7
Transient weird machine (WM)
Transient execution primitives
TSX
Meltdown
Spectre
Side channel
Cache
Weird gates
Assign
AND
OR
XOR
NOT
①
②
③
④
What is transient execution?
8
clflush(X);
TSX {� AX /= 0;� X[0] = 0;
}
Cache
PC →
(Waiting…)
(Speculative)
X
DIV by 0!!
▲ Transient execution (TSX)
(Modified!!)
Transient execution begins!
Transient execution ends!
Generalize TSX WM to other primitives
9
clflush(X);
TSX {� AX /= 0;� X[0] = 0;
}
▲ Transient execution (TSX)
INIT:� SIGFPE {� goto EXIT;� }��clflush(X);�AX /= 0;�X[0] = 0;
EXIT:
…
▲ Transient execution (Meltdown)
Assign: input latency controls output variable
10
INIT:� I[0] = 0;� SIGFPE {� goto EXIT;� }��clflush(Y);�AX /= 0;�Out[I[0]] = 0;�EXIT:� ...
① I is in cache
② I is NOT in cache
Cache
I
Out
Cache
I
Out
DIV by 0!!
Fetch I[0]
Fetch Out[0]
Assign gate
Transient exec.
Assign
DIV by 0!!
Transient exec.
Assign
Fetch I[0]
Fetch Out[0]
❌
✔️
= 0
= 0
From Assign to OR & AND gates
11
Out[I1[0]] = 0;
Out[I2[0]] = 0;
Out[I2[I1[0]]] = 0;
OR gate
AND gate
INIT:� I[0] = 0;� SIGFPE {� goto EXIT;� }��clflush(Y);�AX /= 0;�Out[I[0]] = 0;�EXIT:� ...
Assign gate
How to construct a NOT gate?
12
DIV by 0!!
Transient exec.
I in cache → fast
Assign
✔️
Assign
I not in cache → slow
❌
AX /= 0;�Out[I[0]] = 0;
▲ Assign gate
Time
How to fetch output when input is slow?
→ Fetch output when input is not in cache
Too slow to fetch output…
How to construct a NOT gate?
13
I not in cache → slow
DIV by 0!!
NOT
Fetch output
I in cache → fast
DIV by 0!!
❌
✔️
Transient exec.
Transient exec.
AX /= I[0];�Out[tmp[0]] = 0;
▲ NOT gate
Time
Input
↓
Insight: transient execution time is not constant!
Slow input →
Slow transient exec.
NOT: input latency controls transient execution time
14
INIT:� I[0] = O[0] = 0;� SIGFPE {� goto EXIT;� }��clflush(O);�clflush(tmp);�AX /= I[0];�Out[tmp[0]] = 0;�EXIT:� ...
① I is in cache
② I is NOT in cache
NOT gate
Cache
I
tmp
Out
Cache
I
tmp
Out
Transient exec.
NOT
Transient exec.
NOT
Fetch I[0]
Fetch tmp[0]
Fetch Out[0]
DIV by 0!!
Division
Fetch Out[0]
Fetch tmp[0]
Fetch I[0]
Division
DIV by 0!!
✔️
❌
Measuring accuracy and runtime
15
Weird Gates
1M operations
Repeat 1,000 times
Median accuracy & runtime
Accuracy and runtime
Accuracy
16
Gate | AMD | Intel | ARM |
AND | 99.96% | 99.99% | 98.89% |
OR | 100.00% | 99.99% | 99.46% |
Assign | 99.99% | 99.99% | 99.42% |
NOT | 99.51% | 99.99% | 99.29% |
XOR | 94.36% | 95.58% | 97.97% |
MUX | 98.17% | 99.93% | 97.64% |
Gate | AMD | Intel | ARM |
AND | 1.801 | 2.628 | 32.521 |
OR | 1.756 | 2.551 | 32.506 |
Assign | 1.751 | 2.637 | 32.378 |
NOT | 1.779 | 2.741 | 32.573 |
XOR | 7.403 | 15.648 | 130.898 |
MUX | 5.870 | 11.975 | 131.150 |
Runtime (s/Mops)
Application: program obfuscation
17
if (input ^ password) {
fail();
} else {
success();
}
XOR
MUX
input
password
success
fail
Function pointer
Obfuscate with Transient WM
Takeaways
18
Thank you for listening!
What is transient execution?
19
INIT:� SIGFPE {� goto EXIT;� }��clflush(X);�AX /= 0;�X[0] = 0;
EXIT:
…
Cache
PC →
(Waiting…)
(Speculative)
X
DIV by 0!!
▲ Transient execution (Meltdown)
(Modified!!)
Transient execution begins!
Transient execution ends!
Can Meltdown & Spectre perform computations?
20
raise_exception();�// the line below is never reached�access(probe_array[data * 4096]);
if (x < array1_size)� y = array2[array1[x] * 4096];
Spectre
Meltdown
Can we hide computations here?
Not visible in architectural states!
Microarchitectural weird machines
21
Computing with time: Microarchitectural weird machines (ASPLOS’21)
✔️ Can compute AND, OR, XOR and other operations
✔️ Compute with microarchitectural states
❌ TSX instructions are easy to detect
❌ Only support some Intel CPUs
Q: Can we achieve the same computations without TSX?
A: Yes! Any transient execution primitives are possible!
Q: Can we add support to other CPUs?
A: Yes! AMD, ARM, and Intel CPUs are all supported!
TSX gates: the Assign gate
22
INIT:� X[0] = 0;��clflush(Y);�TSX {� tmp /= 0;� Y[X[0]] = 0;�}
…
DIV by 0!
PC
Exit TSX
Pipeline
X is NOT in cache
X is in cache
Y[X[0]] = 0;
Y[0] = 0;
X[0] = 0
Y[X[0]] = 0;
X[0] = ???
Fetch Y[0] in cache!
Y[0] is not fetched…
▲ Assign gate