1 of 19

Bring­ing ISA se­man­tics to Lean and Lean-MLIR

Léo Stefanesco

Leaning in – March 2025

2 of 19

What’s an Instruction Set Architecture?

The contract between CPU designers and programmers, that is

  • Semantics for machine code (= bitstrings)
  • Forward compatible over decades ⇝ non-determinism

2

ARMv9

RISC-V

Registers

~1400

~150

Instructions

~2600

~300

Note: we currently do not deal with the memory model

3 of 19

ISA specification

Historically, ISA are defined in a huge PDF with English text

3

with pseudocode!

4 of 19

Formal ISA specifications

The Arm Architecture Specification Language (ASL) was evolved from pseudocode by Alastair Reid, who is now doing the same at Intel.

4

Specifications are validated against internal conformance test suites.

The other language is Sail, developed at the University of Cambridge. Used for the RISC-V spec.

5 of 19

What can we do with ISAs and Lean?

Program verification: e.g. LNSym at AWS verifies cryptographic code in ARM assembly.

zkVM verification: e.g. Jolt is a zero knowledge virtual machine executing RISC-V instructions.

Compiler backend verification: correctness of compilers rests on the ISA spec, e.g. Lean-MLIR

5

6 of 19

MLIR – Multi-Level Intermediate Representation

6

C/C++

X86

ARM

RISC-V

C/C++

LLVM-IR

X86

ARM

RISC-V

C/C++

LLVM-IR

X86

Swift

SIL-IR

Rust

HIR

MIR

ARM

RISC-V

from alex zinenko

7 of 19

Lean-MLIR: verified rewrites

7

def src := [llvm ()| {

^bb0(%y : i1, %x : i1):

%r = llvm.add %x, %y : i1

llvm.return %r : i1

}]

def tgt := [llvm ()| {

^bb0(%y : i1, %x : i1):

%r = llvm.xor %x, %y : i1

llvm.return %r : i1

}]

theorem is_correct : src ⊑ tgt := by

unfold src, tgt

-- simplify away the semantics framework

simp_alive_peephole

-- apply expression-level theorem

apply add_xor_math_proof

theorem add_xor_math_proof:

∀ (y x : BitVec 1),

x + y = x ^^^ y := by

bv_decide

Also proved: some classic optimisations such as CSE or DCE, and dialect lowerings.

8 of 19

Instruction selection in MLIR

8

target-agnostic

generic

accelerator

riscv

cf

riscv

snitch

snitch

memref

stream

riscv

func

linalg

arith

func

scf

riscv

riscv

scf

target-specific

from Sasha Lopoukhine

9 of 19

9

10 of 19

Semantics through interpretation

10

Scattered definitions

Bitvector patterns

11 of 19

with some dependent types!

11

12 of 19

12

13 of 19

Shallow embedding of Sail into Lean

Design principles:

  1. Do not reimplement the Sail typechecker in Lean
  2. Get quickly to MVP
  3. Try to generate beautiful Lean code

  • Work directly upstream
  • Write tests

13

14 of 19

Coercions

14

In Sail, type-level numbers are integers:

15 of 19

Sail can compile away features

15

16 of 19

Interpreting effects

In the generic Lean support file:

In the generated file:

16

17 of 19

First milestone: full RISC-V model

Generates ~80k lines of Lean definitions, which takes around 5mn to typecheck (40% of that time is generating BEq instance for ast).

There is still one while loop which is miscompiled.

We pass 186 / 226 of the main Sail test suite.

17

18 of 19

What’s next

We hope you will find it useful

  • Finish the RISC-V model, handle Arm v9
  • Thorough testing by executing RISC-V binaries
  • Explore techniques to simplify model for user code: much of the complexity of models is for system code
  • Connect these models with handwritten specs and compilers.
  • As an authoritative semantics of assembly
  • As a testbed for automated reasoning techniques (eg SMT queries)

18

19 of 19

Lean-MLIR: semantics of dialects

19

inductive Ty

| int

instance : Goedel Ty where

toType

| .int => BitVec 32

inductive Op : Type

| add : Op

| const : (val : ℤ) → Op

instance : OpSignature Op Ty where

signature

| .const _ => ⟨[], [], .int⟩

| .add => ⟨[.int, .int], [], .int⟩

instance : OpDenote Op Ty where

denote

| .const n, _, _ => BitVec.ofInt 32 n

| .add, [(a : BitVec 32), (b : BitVec 32)]ₕ, _ => a + b

Examples of dialects: LLVM, FHE (Homomorphic Encryption), SCF, CIRCT dialects (for hardware)