Bringing ISA semantics to Lean and Lean-MLIR
Léo Stefanesco
Leaning in – March 2025
What’s an Instruction Set Architecture?
The contract between CPU designers and programmers, that is
2
| ARMv9 | RISC-V |
Registers | ~1400 | ~150 |
Instructions | ~2600 | ~300 |
Note: we currently do not deal with the memory model
ISA specification
Historically, ISA are defined in a huge PDF with English text
3
with pseudocode!
Formal ISA specifications
The Arm Architecture Specification Language (ASL) was evolved from pseudocode by Alastair Reid, who is now doing the same at Intel.
4
Specifications are validated against internal conformance test suites.
The other language is Sail, developed at the University of Cambridge. Used for the RISC-V spec.
What can we do with ISAs and Lean?
Program verification: e.g. LNSym at AWS verifies cryptographic code in ARM assembly.
zkVM verification: e.g. Jolt is a zero knowledge virtual machine executing RISC-V instructions.
Compiler backend verification: correctness of compilers rests on the ISA spec, e.g. Lean-MLIR
5
MLIR – Multi-Level Intermediate Representation
6
C/C++
X86
ARM
RISC-V
C/C++
LLVM-IR
X86
ARM
RISC-V
C/C++
LLVM-IR
X86
Swift
SIL-IR
Rust
HIR
MIR
ARM
RISC-V
from alex zinenko
Lean-MLIR: verified rewrites
7
def src := [llvm ()| {
^bb0(%y : i1, %x : i1):
%r = llvm.add %x, %y : i1
llvm.return %r : i1
}]
def tgt := [llvm ()| {
^bb0(%y : i1, %x : i1):
%r = llvm.xor %x, %y : i1
llvm.return %r : i1
}]
theorem is_correct : src ⊑ tgt := by
unfold src, tgt
-- simplify away the semantics framework
simp_alive_peephole
-- apply expression-level theorem
apply add_xor_math_proof
theorem add_xor_math_proof:
∀ (y x : BitVec 1),
x + y = x ^^^ y := by
bv_decide
Also proved: some classic optimisations such as CSE or DCE, and dialect lowerings.
Instruction selection in MLIR
8
target-agnostic
generic
accelerator
riscv
cf
riscv
snitch
snitch
memref
stream
riscv
func
linalg
arith
func
scf
riscv
riscv
scf
target-specific
from Sasha Lopoukhine
9
Semantics through interpretation
10
Scattered definitions
Bitvector patterns
with some dependent types!
11
12
Shallow embedding of Sail into Lean
Design principles:
13
Coercions
14
In Sail, type-level numbers are integers:
Sail can compile away features
15
Interpreting effects
In the generic Lean support file:
In the generated file:
16
First milestone: full RISC-V model
Generates ~80k lines of Lean definitions, which takes around 5mn to typecheck (40% of that time is generating BEq instance for ast).
There is still one while loop which is miscompiled.
We pass 186 / 226 of the main Sail test suite.
17
What’s next
We hope you will find it useful
18
Lean-MLIR: semantics of dialects
19
inductive Ty
| int
instance : Goedel Ty where
toType
| .int => BitVec 32
inductive Op : Type
| add : Op
| const : (val : ℤ) → Op
instance : OpSignature Op Ty where
signature
| .const _ => ⟨[], [], .int⟩
| .add => ⟨[.int, .int], [], .int⟩
instance : OpDenote Op Ty where
denote
| .const n, _, _ => BitVec.ofInt 32 n
| .add, [(a : BitVec 32), (b : BitVec 32)]ₕ, _ => a + b
Examples of dialects: LLVM, FHE (Homomorphic Encryption), SCF, CIRCT dialects (for hardware)