Weikeng Chen
Application-specific proof systems
In this talk
1. Application-specific curves
2. Application-specific gates
3. Application-specific acceleration
In this talk
1. Application-specific curves
2. Application-specific gates
3. Application-specific acceleration
Application-specific curves
Zerocash over secp256k1 keys
(new)
Yeti
verifiable time-lock encryption
(Timofey Yaluhin)
Zexe
functional privacy
(Bowe et al.)
Lokum
recursive proof for Starknet
(Mikerah Quintyne-Collins)
Zorro
Zerocash over ed25519 keys
(Sun et al.)
Yafa
zkBridge with EIP-1962
(Akeela et al.)
Canaan
Lokum: pairing-friendly curves over Stark field
https://github.com/hashcloak/Lokum
sufficient 2-arity for FFT
Lokum is a pairing-friendly curve:
avoid nonnative field arithmetics
Zorro: non-pairing-friendly curve over ed25519
https://github.com/FindoraNetwork/noah/
no suitable FFT space at all
Zorro is a non-pairing-friendly curve:
Use Zorro to build Zerocash on ed25519 keys
Bulletproofs over Zorro
TurboPlonk proof over BLS12-381
Prove the knowledge of the private key
Prove the relation between nullifiers and commitments
See ePrint 2022/1079
“The inspection model for zero-knowledge proofs and efficient Zerocash with secp256k1 keys”
Delegated Schnorr
Canaan: pairing-friendly curve over secp256k1
Bulletproofs is expensive for prover and verifier
no suitable FFT space at all
Bulletproofs work (shown in ePrint 2022/1079)
Plonk without FFT for small circuits
Interpolation
Inverse FFT
Lagrange interpolation
linear to the # gates, not the next available FFT space size (which is too large)
Multiplication
Hadamard
over coset FFT
Karatsuba
Toom-Cook
Vanishing poly
Permutation
cyclic
acyclic
higher computation complexity, useful for small circuits
Interpolation: Inverse FFT vs Lagrange
Evaluations
0
0
0
0
0
0
0
0
0
0
0
13441
3000
Inverse FFT
Coefficients
very likely a degree-13440 polynomial
Interpolation: FFT vs Lagrange
Evaluations
?
?
?
?
?
?
?
?
?
?
?
13441
3000
Lagrange interpolation
Coefficients
definitely a degree-2999 polynomial
0
0
0
0
0
0
0
0
0
0
0
Multiplication: Hadamard over coset FFT vs Karatsuba Toom-Cook
Coefficients
Coset evaluations
Coset FFT
Hadamard (entry-wise multiplication)
Multiplication: Hadamard over coset FFT vs Karatsuba Toom-Cook
Coefficients
0
0
0
0
0
0
0
0
0
0
0
Coefficients
0
0
0
0
0
0
0
0
0
0
0
Polynomial multiplication
0
0
0
0
0
0
0
0
Coset evaluations
Evaluations
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Permutation: cyclic vs acyclic
Permutation polynomial evaluations
(cyclic)
1
1
1
1
1
1
1
1
1
1
1
Permutation polynomial evaluations
(acyclic)
?
?
?
?
?
?
?
?
?
?
?
Plonk without FFT for small circuits
Groth16
Bulletproofs
Previously
Gemini
expensive for prover and verifier
MSM overhead linear to the density of R1CS matrices
New
developer-friendly, MSM overhead linear to the number of R1CS constraints
Plonk
customized gates
Application-specific curves
Pairing-friendly
(Cocks-Pinch)
Non-pairing-friendly
(complex multiplication)
Bulletproofs, Spartan, Brakedown, Orion, Gemini, Libra,
and the like
Groth16, Plonk,
and the like
In this talk
1. Application-specific curves
2. Application-specific gates
3. Application-specific acceleration
Proof systems are increasingly customized
[BCTV14]
Groth16
Marlin
Plonk
TurboPlonk
UltraPlonk
Kimchi
Dusk Plonk
Plonky
Plonky2
Noah
Aztec
Zcash Halo2
Scroll Halo2
Plonky3
……
……
……
Customized gates
Example of complex gates: Rescue
Example of multi-line gates: range query
can overlay different gates
Customized gates
Example of complex, multi-line gates: Anemoi
reduce proving costs by about 89% compared with our prior implementation of Rescue
Deriving Anemoi customized gates
MSM and customized gates
Witness
Sigma
Quotient
Opening
Remark: (1) witness sparsity, (2) FFT space restriction,
(3) number of gates and equations
In this talk
1. Application-specific curves
2. Application-specific gates
3. Application-specific acceleration
Abstraction for hardware acceleration
GPU
or ASIC
Host
MSM
FFT
excessive host-to-device communication
Case study: Supranational Sppark
Nvidia A40 on Coreweave, MSM over BLS12-377
| | | | | |
2x comm. | 0.232s | 0.619s | 1.731s | 6.320s | 25.659s |
With comm. | 0.177s | 0.401s | 1.106s | 3.692s | 12.924s |
Without comm. | 0.073s | 0.090s | 0.195s | 0.531s | 1.909s |
Nvidia A40 on Coreweave, FFT over BLS12-377’s scalar field
| | | | | |
2x comm. | 0.039s | 0.091s | 0.321s | 1.323s | 4.565s |
With comm. | 0.029s | 0.058s | 0.205s | 0.760s | 2.647s |
Without comm. | 0.015s | 0.018s | 0.033s | 0.102s | 0.373s |
Case study: Supranational Sppark
“Copy time for the bases to the device will not be counted (fixed base). Copy time will be counted for the 1st of the 4 sets of scalars will be counted, but not thereafter.”
--- ZPrize, Accelerating MSM Operations on GPU/FPGA
Not a surprise
“In summary, our answer to the titular question is:
yes, but only after addressing the memory bottleneck!”
Abstraction for hardware acceleration
GPU
or ASIC
Host
MSMLoadBase
FFTLoadTwiddles
GPU
or ASIC
MSMSendScalar
FFTSendData
GPU
or ASIC
GPU
or ASIC
DoMSM
DoInPlaceFFT
Problem: Redundancy in witness
Actual witness
Public keys, private keys, Merkle tree paths, randomness, amounts, asset types
Witness after reductions
Intermediate results of SNARK-friendly hash functions, bits for range checks of the amounts, intermediate results of the calculation of the public keys from private keys, and intermediate results for asset mixing proofs, …
Illustration
Actual witness
Witness polynomials
Sigma polynomial
Quotient polynomial
Opening polynomial
Illustration
Actual circuit description
Selectors polynomials
Permutation polynomials
Precomputation
Abstraction for hardware acceleration
GPU
or ASIC
Host
Application-specific
code
GPU
or ASIC
GPU
or ASIC
GPU
or ASIC
Program the proof system directly in the GPU and ASIC
Example: Aleo mining
Compressed representation
Thanks
We have discussed the state-of-the-arts (March 2023) of
application-specific curves, gates, acceleration
bit.ly/delendum
Delendum has a weekly reading group
Wednesday, March 8, 10am PST
“Customized gates for the Anemoi hash function”