Binary Tower Efficiency
Ulvetanna
Case Study: Polygon zkEVM
Polynomial Commitments
We measure the time to commit a polynomial with 228 coefficients (in seconds).
FRI and Binius are run with Reed-Solomon rate ½. FRI is batched for efficiency.
All benchmarks are on an 13th Gen Intel(R) Core(TM) i9-13900H, single-threaded.
| 1-bit elements | 8-bit elements | 32-bit elements | 64-bit elements |
Hyrax BN254 (lasso) | 85.02 | 118.6 | 286.9 | 605.5 |
FRI GL64, Poseidon (plonky2) | 98.718 | 98.718 | 98.817 | 98.817 |
FRI GL64, Keccak (plonky2) | 29.36 | 29.36 | 29.36 | 29.36 |
Binius | 0.5860 | 5.173 | 29.36 | 58.62 |
CPU Benchmarks: Hashing Throughput
Hash | Throughput (MiB / s) |
SHA-256 | 1,722 |
Poseidon Goldilocks 64b (plonky2) | 53.14 |
Poseidon2 BabyBear 31b (risc0) | 32.57 |
Grøstl (Binary Tower 8b) (binius) | 433.6 |
CPU Benchmarks: NTT
NTT times are normalized a 64-bit field size (ie. normalized for the bytes of data encoded).
| 216 | 220 | 224 |
Goldilocks (plonky2) | 1.082 | 18.88 | 366.7 |
Baby Bear (risc0) | 3.476 | 68.19 | 1,297 |
Binary Tower (binius) | 2.453 | 39.01 | 747.9 |
CPU Benchmarks: Multiplications
We measure the time to multiply cryptographically large extension field elements, at least 128 bits.
This is the computational bottleneck of quotient polynomial computation and multivariate sumcheck.
CPU Benchmarks: Multiplications
* For binius we show the timing with an pre-release optimization, called mixed monomial-tower basis.
Field | Time / mul (ns) |
Goldilocks^2 (plonky2) | 3.5231 |
BabyBear^4 (risc0) | 55.194 |
Binary Tower 128b (binius*) | 31.308* / 71.012 |
Binary POLYVAL (binius / Rust-Crypto) | 4.8002 |
BN254 Scalar Field (ark-bn254) | 58.746 |
FPGA Multiplication Operations
| Area (Logic Gates) | Delay (ns) | Clock (MHz) | Area x Delay |
Mersenne 31b | 12,306 | 6.863 | 146 | 84,460 |
Binary Tower 32b | 4,894 | 3.289 | 304 | 16,096 |
Binary Tower 16b | 1,582 | 1.993 | 502 | 3,152 |
Our experiments measure logic utilization (silicon area) on Xilinx Ultrascale+ device family, which is implemented in 16nm TSMC process node, and associated propagation delays, which are the key indicators for achieving maximum operating frequency.
A key fact is that binary multiplication is more efficient and has lower propagation delay because addition and multiplication are carryless.
FPGA NTT Operations
| Gate Count (M) | NTT/s | Efficiency |
Goldilocks 64b | 12.16 | 61,035 | 5,019 |
BabyBear 31b | 13.66 | 122,070 | 8,936 |
Mersenne^2 64b | 10.85 | 61,035 | 5,625 |
Binary Tower 16b | 1.65 | 244,141 | 147,964 |
Binary Tower 32b | 3.54 | 122,070 | 34,483 |
FPGA Hash Operations
We measure the rate of the sponge function for Poseidon, or the size of the compression inputs for Grøstl.
We define Efficiency as the ratio of the rate with the gate count.
| Gate Count (M) | Rate (bits) | Efficiency |
Poseidon | 66.14 | 512 | 7.741 |
Poseidon2 | 42.28 | 512 | 12.11 |
Grøstl | 1.76 | 256 | 145.5 |