a Hardware-Optimized SNARK
Ulvetanna | Jim Posen, CTO
Hardware-Software Co-design Examples
NAND Flash SSDs
Flash memory has a limited number of write-erase cycles, forcing storage firmware and operating systems to implement wear-leveling
TPUs for Machine Learning quantize floating point values to 8-bits because of the improved efficiency of 8-bit integer multiplication
Source: https://towardsdatascience.com/
Source: https://www.kingston.com/
Profiling Keccak-256
| # Keccak-f | Proof Size (KiB) | Prove time (s) | Verify time (ms) |
Scroll | 3493 (464 KiB) | 39.5 | 261 | 49.4 |
Plonky3 (Keccak) | 4096 (544 KiB) | 1,790 | 16.2 | 262 |
Plonky3 (Poseidon2) | 4096 (544 KiB) | 1,790 | 20.0 | 351 |
Profiling Keccak-256
Four key steps account for majority of computation in both systems
Profiling Keccak-256
Plonky3 w/ Poseidon2
Plonky3 / Keccak
Why are towers of binary fields the right foundation for fast verifiable computing?
Why are towers of binary fields the right foundation for fast verifiable computing?
Finite Field Arithmetic
Prime Fields 𝔽p
Representation is not canonical
Addition is integer addition with a conditional subtraction
Multiplication is integer multiplication with a reduction method
Binary Fields 𝔽2k
Representation is canonical
Addition is bitwise XOR
Multiplication is carryless multiplication with a reduction method
Prime vs Binary Field Multiplication
Carryless multiplication reduces propagation delay and enables higher clock speed
Source: “Exploring the Design Space of Prime Field vs. Binary Field ECC-Hardware Implementations”
Prime vs Binary Field Squaring
Squaring is much cheaper in binary fields because (X + Y)2 = X2 + Y2
Source: “Exploring the Design Space of Prime Field vs. Binary Field ECC-Hardware Implementations”
Binary Fields in Cryptography
Advanced Encryption Standard (AES) is based on 𝔽28
Galois Message Authentication Code (GMAC) is based on 𝔽2128
Binary field elliptic curves are well-studied and very efficient in hardware
QR codes use Reed–Solomon encoding over 𝔽28
Original FRI and zk-STARK protocols
Grøstl SHA-3 finalist hash function is based on 𝔽28
Binary Tower Fields
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
𝔽2128
Binary Tower Fields
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
𝔽2128
𝔽232 𝔽232 𝔽232 𝔽232
Binary Tower Fields
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
𝔽2128
𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28
𝔽232 𝔽232 𝔽232 𝔽232
Binary Tower Fields
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 ………… 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 0
128 x 𝔽2
𝔽2128
𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28 𝔽28
𝔽232 𝔽232 𝔽232 𝔽232
Binary Tower Fields
Multiplication | Subfield Mult (m-bit subfield) | Squaring | Inversion |
O(n1.58) | O(n m0.58) | O(n log n) | O(n1.58) |
CPU Benchmarks: Multiplications
We measure the time to multiply cryptographically large extension field elements, at least 128 bits. Benchmarks run on GCP C3 instance, Intel Sapphire Rapids.
Field | Throughput (106 mul / s) |
Goldilocks^2 (plonky2) | 216.07 |
BabyBear^4 (plonky3) | 52.87 |
Binary Tower 128b (binius) | 69.87 |
Binary POLYVAL (binius / Rust-Crypto) | 613.2 |
BN254 Scalar Field (ark-bn254) | 47.36 |
FPGA Estimates: Multiplication
Our experiments measure logic utilization (silicon area) on Xilinx Ultrascale+ device family, which is implemented in 16nm TSMC process node, and associated propagation delays, which are the key indicators for achieving maximum operating frequency.
| Area (Logic Gates) | Delay (ns) | Clock (MHz) | Area x Delay |
Mersenne 31b | 12,306 | 6.863 | 146 | 84,460 |
BN254 Scalar | 2,538,066 | 3.753 | 266 (*49 cycles) | 9,525,000 |
Binary Tower 16b | 1,582 | 1.993 | 502 | 3,152 |
Binary Tower 32b | 4,894 | 3.289 | 304 | 16,100 |
Binary Tower 128b | 58,096 | 3.400 | 294 | 197,500 |
Vision Hash
Vision is an arithmetization-optimized hash function over binary fields
S-box uses both squaring and inversion
“Design of Symmetric-Key Primitives for Advanced Cryptographic Protocols”
by Aly, Ashur, Ben-Sasson, Dhooghe, and Szepieniec
Vision Mark-32
Vision Mark-32 is an optimized instance of Vision using the 32-bit tower field
Joint effort with 3MI Labs (Tomer Ashur & Mairon Mahzoun)
Efficient Arithmetization
Plonky3 Keccak Arithmetization
Plonky3 Keccak Arithmetization
In order to commit 1-bit or 16-bit columns, STARKs must pad them to full field elements. This is up to a 32x loss in efficiency!
Embedding Overhead
Can we commit small values faster than large values?
Embedding Overhead
Can we commit small values faster than large values?
The Binius polynomial commitment scheme maximizes the information density of the committed data.
Binius: SNARKs over Towers of Binary Fields
Binius combines
Binius: SNARKs over Towers of Binary Fields
Binius combines
…also,
Eliminating Embedding Overhead
Binius Keccak Arithmetization
Binius Keccak Arithmetization
In[25]
C[5]
C’[5]
D[5]
B[25]
Out[25]
64b
64b
Profiling Keccak-256
Benchmarks run on GCP c3-highcpu-22 (22 vCPU, 11 core)
| # Keccak-f | Proof Size (KiB) | Prove time (s) | Verify time (ms) |
halo2 | 3493 (464 KiB) | 39.5 | 261 | 49.41 |
Plonky3 (Keccak) | 4096 (544 KiB) | 1,790 | 16.2 | 262 |
Plonky3 (Poseidon2) | 4096 (544 KiB) | 1,790 | 20.0 | 351 |
Binius | 4096 (544 KiB) | 4,737 | 8.10 | 73.5 |
Profiling: Keccak-f[1600]
Binius w/ Grøstl
Plonky3 w/ Keccak
Stay tuned!