1 of 34

a Hardware-Optimized SNARK

Ulvetanna | Jim Posen, CTO

2 of 34

Hardware-Software Co-design Examples

NAND Flash SSDs

Flash memory has a limited number of write-erase cycles, forcing storage firmware and operating systems to implement wear-leveling

TPUs for Machine Learning quantize floating point values to 8-bits because of the improved efficiency of 8-bit integer multiplication

Source: https://towardsdatascience.com/

Source: https://www.kingston.com/

3 of 34

Profiling Keccak-256

Keccak is the #1 bottleneck for ZK protocols scaling Ethereum
Scroll’s halo2 uses KZG elliptic curve commitment scheme (256-bit prime)
Plonky3 uses FRI commitment scheme (32-bit prime)
Benchmarks are on a GCP c3-highcpu-22 (22 vCPU, 11 core)

	# Keccak-f	Proof Size (KiB)	Prove time (s)	Verify time (ms)
Scroll	3493 (464 KiB)	39.5	261	49.4
Plonky3 (Keccak)	4096 (544 KiB)	1,790	16.2	262
Plonky3 (Poseidon2)	4096 (544 KiB)	1,790	20.0	351

4 of 34

Profiling Keccak-256

Four key steps account for majority of computation in both systems

Generating the witness
Committing the witness

Reed–Solomon encoding (NTT) + merkleization
Elliptic curve multiscalar multiplication (MSM)

Vanishing argument (lots of arithmetic circuit evaluations)
Opening proof (crypto magic)

5 of 34

Profiling Keccak-256

Plonky3 w/ Poseidon2

Plonky3 / Keccak

6 of 34

Why are towers of binary fields the right foundation for fast verifiable computing?

7 of 34

Why are towers of binary fields the right foundation for fast verifiable computing?

Efficient Arithmetic
Efficient Arithmetization

8 of 34

Efficient Arithmetic

9 of 34

Finite Field Arithmetic

Prime Fields 𝔽_p

Representation is not canonical

Addition is integer addition with a conditional subtraction

Multiplication is integer multiplication with a reduction method

Barrett reduction
Montgomery reduction
Special reduction (Mersenne-31, Goldilocks-64)

Binary Fields 𝔽₂k

Representation is canonical

Addition is bitwise XOR

Multiplication is carryless multiplication with a reduction method

Special (AES)
Montgomery reduction (POLYVAL)
Recursive reduction (Tower)

10 of 34

Prime vs Binary Field Multiplication

Carryless multiplication reduces propagation delay and enables higher clock speed

Source: “Exploring the Design Space of Prime Field vs. Binary Field ECC-Hardware Implementations”

11 of 34

Prime vs Binary Field Squaring

Squaring is much cheaper in binary fields because (X + Y)² = X² + Y²

Source: “Exploring the Design Space of Prime Field vs. Binary Field ECC-Hardware Implementations”

12 of 34

Binary Fields in Cryptography

Advanced Encryption Standard (AES) is based on 𝔽₂8

Galois Message Authentication Code (GMAC) is based on 𝔽₂128

Binary field elliptic curves are well-studied and very efficient in hardware

QR codes use Reed–Solomon encoding over 𝔽₂8

Original FRI and zk-STARK protocols

Grøstl SHA-3 finalist hash function is based on 𝔽₂8

13 of 34

Binary Tower Fields

0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0

𝔽₂128

14 of 34

Binary Tower Fields

0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0

𝔽₂128

𝔽₂32 𝔽₂32 𝔽₂32 𝔽₂32

15 of 34

Binary Tower Fields

0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0

𝔽₂128

𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8

𝔽₂32 𝔽₂32 𝔽₂32 𝔽₂32

16 of 34

Binary Tower Fields

0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0

128 x 𝔽₂

𝔽₂128

𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8 𝔽₂8

𝔽₂32 𝔽₂32 𝔽₂32 𝔽₂32

17 of 34

Binary Tower Fields

Multiplication	Subfield Mult (m-bit subfield)	Squaring	Inversion
O(n^1.58)	O(n m^0.58)	O(n log n)	O(n^1.58)

18 of 34

CPU Benchmarks: Multiplications

We measure the time to multiply cryptographically large extension field elements, at least 128 bits. Benchmarks run on GCP C3 instance, Intel Sapphire Rapids.

Field	Throughput (10⁶ mul / s)
Goldilocks^2 (plonky2)	216.07
BabyBear^4 (plonky3)	52.87
Binary Tower 128b (binius)	69.87
Binary POLYVAL (binius / Rust-Crypto)	613.2
BN254 Scalar Field (ark-bn254)	47.36

19 of 34

FPGA Estimates: Multiplication

Our experiments measure logic utilization (silicon area) on Xilinx Ultrascale+ device family, which is implemented in 16nm TSMC process node, and associated propagation delays, which are the key indicators for achieving maximum operating frequency.

	Area (Logic Gates)	Delay (ns)	Clock (MHz)	Area x Delay
Mersenne 31b	12,306	6.863	146	84,460
BN254 Scalar	2,538,066	3.753	266 (*49 cycles)	9,525,000
Binary Tower 16b	1,582	1.993	502	3,152
Binary Tower 32b	4,894	3.289	304	16,100
Binary Tower 128b	58,096	3.400	294	197,500

20 of 34

Vision Hash

Vision is an arithmetization-optimized hash function over binary fields

S-box uses both squaring and inversion

“Design of Symmetric-Key Primitives for Advanced Cryptographic Protocols”

by Aly, Ashur, Ben-Sasson, Dhooghe, and Szepieniec

21 of 34

Vision Mark-32

Vision Mark-32 is an optimized instance of Vision using the 32-bit tower field

S-box optimized with

fast recursive inversion in the Wiedemann tower field
arithmetic verification uses fast squaring
linearized polynomial evaluation as an 𝔽₂-affine transformation

MDS matrix optimized with

8-bit tower field coefficients and subfield multiplication
novel polynomial basis additive NTT techniques

Joint effort with 3MI Labs (Tomer Ashur & Mairon Mahzoun)

22 of 34

Efficient Arithmetization

23 of 34

Plonky3 Keccak Arithmetization

Plonky3 uses AIR arithmetization
Each Keccak-f permutation is 24 rows
One row has 2,328 1-bit columns and 404 16-bit columns
Trace has 1,048 non-linear constraints

24 of 34

Plonky3 Keccak Arithmetization

Plonky3 uses AIR arithmetization
Each Keccak-f permutation is 24 rows
One row has 2,328 1-bit columns and 404 16-bit columns
Trace has 1,048 non-linear constraints

In order to commit 1-bit or 16-bit columns, STARKs must pad them to full field elements. This is up to a 32x loss in efficiency!

25 of 34

Embedding Overhead

Can we commit small values faster than large values?

Using elliptic curve commitment schemes: yes, with caveats
With FRI commitments: not below 32 bits

26 of 34

Embedding Overhead

Can we commit small values faster than large values?

Using elliptic curve commitment schemes: yes, with caveats
With FRI commitments: not below 32 bits
With Binius: yes, using block-level encoding!

The Binius polynomial commitment scheme maximizes the information density of the committed data.

27 of 34

Binius: SNARKs over Towers of Binary Fields

Binius combines

Arithmetization over towers of binary fields
Multilinear poly IOP based on HyperPlonk
Generalized Brakedown + FRI commitment that eliminates embedding overhead

28 of 34

Binius: SNARKs over Towers of Binary Fields

Binius combines

Arithmetization over towers of binary fields
Multilinear poly IOP based on HyperPlonk
Generalized Brakedown + FRI commitment that eliminates embedding overhead

…also,

A new multilinear shift argument
Adaptation of Lasso lookup argument

29 of 34

Eliminating Embedding Overhead

Instead of embedding single coefficients into an extension field, we embed chunks of coefficients into the tower algebra.
Reed–Solomon encode with the tower algebra as the alphabet.
Generalize Brakedown and BaseFold’s multilinear FRI PCS to operate over the tower algebra.

30 of 34

Binius Keccak Arithmetization

Each Keccak-f permutation is 24 rows over 𝔽₂64 / 1,536 rows over 𝔽₂
One row has 60 committed columns and 80 virtual columns
Trace has 50 non-linear constraints and 10 linear constraints

31 of 34

Binius Keccak Arithmetization

In[25]

C[5]

C’[5]

D[5]

B[25]

Out[25]

64b

32 of 34

Profiling Keccak-256

Benchmarks run on GCP c3-highcpu-22 (22 vCPU, 11 core)

	# Keccak-f	Proof Size (KiB)	Prove time (s)	Verify time (ms)
halo2	3493 (464 KiB)	39.5	261	49.41
Plonky3 (Keccak)	4096 (544 KiB)	1,790	16.2	262
Plonky3 (Poseidon2)	4096 (544 KiB)	1,790	20.0	351
Binius	4096 (544 KiB)	4,737	8.10	73.5

33 of 34

Profiling: Keccak-f[1600]

Binius w/ Grøstl

Plonky3 w/ Keccak

34 of 34

Stay tuned!

@UlvetannaHQ

https://www.ulvetanna.io/

https://gitlab.com/UlvetannaOSS/binius