1 of 35

Weikeng Chen

Application-specific proof systems

2 of 35

In this talk

1. Application-specific curves

2. Application-specific gates

3. Application-specific acceleration

3 of 35

In this talk

1. Application-specific curves

2. Application-specific gates

3. Application-specific acceleration

4 of 35

Application-specific curves

Zerocash over secp256k1 keys

(new)

Yeti

verifiable time-lock encryption

(Timofey Yaluhin)

Zexe

functional privacy

(Bowe et al.)

Lokum

recursive proof for Starknet

(Mikerah Quintyne-Collins)

Zorro

Zerocash over ed25519 keys

(Sun et al.)

Yafa

zkBridge with EIP-1962

(Akeela et al.)

Canaan

5 of 35

Lokum: pairing-friendly curves over Stark field

https://github.com/hashcloak/Lokum

 

 

sufficient 2-arity for FFT

Lokum is a pairing-friendly curve:

  • generated from the Cocks-Pinch method
  • base field modulus is much larger
  • used to recursively verify Stark proofs

avoid nonnative field arithmetics

6 of 35

Zorro: non-pairing-friendly curve over ed25519

https://github.com/FindoraNetwork/noah/

 

 

no suitable FFT space at all

Zorro is a non-pairing-friendly curve:

  • generated from complex multiplication (using Riad S. Wahby’s script)
  • used to build Zerocash with ed25519 keys via Bulletproofs

 

7 of 35

Use Zorro to build Zerocash on ed25519 keys

Bulletproofs over Zorro

TurboPlonk proof over BLS12-381

Prove the knowledge of the private key

Prove the relation between nullifiers and commitments

See ePrint 2022/1079

“The inspection model for zero-knowledge proofs and efficient Zerocash with secp256k1 keys”

Delegated Schnorr

8 of 35

Canaan: pairing-friendly curve over secp256k1

 

 

Bulletproofs is expensive for prover and verifier

no suitable FFT space at all

 

Bulletproofs work (shown in ePrint 2022/1079)

9 of 35

Plonk without FFT for small circuits

Interpolation

Inverse FFT

Lagrange interpolation

linear to the # gates, not the next available FFT space size (which is too large)

Multiplication

Hadamard

over coset FFT

Karatsuba

Toom-Cook

Vanishing poly

 

 

Permutation

cyclic

acyclic

higher computation complexity, useful for small circuits

10 of 35

Interpolation: Inverse FFT vs Lagrange

Evaluations

0

0

0

0

0

0

0

0

0

0

0

13441

3000

Inverse FFT

Coefficients

very likely a degree-13440 polynomial

11 of 35

Interpolation: FFT vs Lagrange

Evaluations

?

?

?

?

?

?

?

?

?

?

?

13441

3000

Lagrange interpolation

Coefficients

definitely a degree-2999 polynomial

0

0

0

0

0

0

0

0

0

0

0

12 of 35

Multiplication: Hadamard over coset FFT vs Karatsuba Toom-Cook

Coefficients

Coset evaluations

Coset FFT

Hadamard (entry-wise multiplication)

13 of 35

Multiplication: Hadamard over coset FFT vs Karatsuba Toom-Cook

Coefficients

0

0

0

0

0

0

0

0

0

0

0

Coefficients

0

0

0

0

0

0

0

0

0

0

0

Polynomial multiplication

0

0

0

0

0

0

0

0

14 of 35

 

Coset evaluations

 

 

15 of 35

 

Evaluations

 

 

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

16 of 35

Permutation: cyclic vs acyclic

Permutation polynomial evaluations

(cyclic)

1

1

1

1

1

1

1

1

1

1

1

Permutation polynomial evaluations

(acyclic)

?

?

?

?

?

?

?

?

?

?

?

 

17 of 35

Plonk without FFT for small circuits

Groth16

Bulletproofs

Previously

Gemini

expensive for prover and verifier

MSM overhead linear to the density of R1CS matrices

New

developer-friendly, MSM overhead linear to the number of R1CS constraints

Plonk

customized gates

18 of 35

Application-specific curves

Pairing-friendly

(Cocks-Pinch)

Non-pairing-friendly

(complex multiplication)

Bulletproofs, Spartan, Brakedown, Orion, Gemini, Libra,

and the like

Groth16, Plonk,

and the like

19 of 35

In this talk

1. Application-specific curves

2. Application-specific gates

3. Application-specific acceleration

20 of 35

Proof systems are increasingly customized

[BCTV14]

Groth16

Marlin

Plonk

TurboPlonk

UltraPlonk

Kimchi

Dusk Plonk

Plonky

Plonky2

Noah

Aztec

Zcash Halo2

Scroll Halo2

Plonky3

……

……

……

21 of 35

Customized gates

Example of complex gates: Rescue

 

Example of multi-line gates: range query

 

 

 

can overlay different gates

22 of 35

Customized gates

Example of complex, multi-line gates: Anemoi

 

 

 

 

reduce proving costs by about 89% compared with our prior implementation of Rescue

23 of 35

Deriving Anemoi customized gates

 

24 of 35

MSM and customized gates

 

Witness

 

Sigma

 

Quotient

 

Opening

 

Remark: (1) witness sparsity, (2) FFT space restriction,

(3) number of gates and equations

25 of 35

In this talk

1. Application-specific curves

2. Application-specific gates

3. Application-specific acceleration

26 of 35

Abstraction for hardware acceleration

GPU

or ASIC

Host

MSM

FFT

excessive host-to-device communication

27 of 35

Case study: Supranational Sppark

Nvidia A40 on Coreweave, MSM over BLS12-377

2x comm.

0.232s

0.619s

1.731s

6.320s

25.659s

With comm.

0.177s

0.401s

1.106s

3.692s

12.924s

Without comm.

0.073s

0.090s

0.195s

0.531s

1.909s

Nvidia A40 on Coreweave, FFT over BLS12-377’s scalar field

2x comm.

0.039s

0.091s

0.321s

1.323s

4.565s

With comm.

0.029s

0.058s

0.205s

0.760s

2.647s

Without comm.

0.015s

0.018s

0.033s

0.102s

0.373s

28 of 35

Case study: Supranational Sppark

“Copy time for the bases to the device will not be counted (fixed base). Copy time will be counted for the 1st of the 4 sets of scalars will be counted, but not thereafter.”

--- ZPrize, Accelerating MSM Operations on GPU/FPGA

  • A100, A40, 4090 support up to PCI-e standard 4.0.
  • H100 supports up to PCI-e standard 5.0.
  • NVLink (not present in 4090) is necessary to achieve good performance.

29 of 35

Not a surprise

“In summary, our answer to the titular question is:

yes, but only after addressing the memory bottleneck!”

30 of 35

Abstraction for hardware acceleration

GPU

or ASIC

Host

MSMLoadBase

FFTLoadTwiddles

GPU

or ASIC

MSMSendScalar

FFTSendData

GPU

or ASIC

GPU

or ASIC

DoMSM

DoInPlaceFFT

31 of 35

Problem: Redundancy in witness

Actual witness

Public keys, private keys, Merkle tree paths, randomness, amounts, asset types

Witness after reductions

Intermediate results of SNARK-friendly hash functions, bits for range checks of the amounts, intermediate results of the calculation of the public keys from private keys, and intermediate results for asset mixing proofs, …

32 of 35

Illustration

Actual witness

Witness polynomials

Sigma polynomial

Quotient polynomial

Opening polynomial

33 of 35

Illustration

Actual circuit description

Selectors polynomials

Permutation polynomials

Precomputation

34 of 35

Abstraction for hardware acceleration

GPU

or ASIC

Host

Application-specific

code

GPU

or ASIC

GPU

or ASIC

GPU

or ASIC

Program the proof system directly in the GPU and ASIC

Example: Aleo mining

Compressed representation

35 of 35

Thanks

We have discussed the state-of-the-arts (March 2023) of

application-specific curves, gates, acceleration

bit.ly/delendum

Delendum has a weekly reading group

Wednesday, March 8, 10am PST

“Customized gates for the Anemoi hash function”