Presenter	Yifan Ning

3 of 31

Background - Secure Inference

Private Input

(the client)

Private Trained Model

(the server / service provider)

4 of 31

Background - CNN

Linear Layers:

Convolution Layers
Fully Connected Layers

Non-Linear Layers:

Activation
Max-Pooling

5 of 31

Gazelle - Threat Model

Semi-honest

Adhere to protocol
Attempt to learn other party’s sensitive data

client

server

6 of 31

Gazelle - Security Guarantees

Cannot fully hide the network architecture

Hides:

Client input
Network weights
Filter and stride sizes of convolutions

Doesn’t hide:

Number of layers
Size of each layer
Client’s Input size

7 of 31

Gazelle - Inefficiencies from previous work

Previous work uses a single scheme for both linear and non-linear layers

2PC (Garbled Circuit / Secret Sharing)

Efficient Computation
Expensive Communication cost

Homomorphic Encryption

Cheap Communication
Large Computation on server

8 of 31

Gazelle - Inefficiencies from previous work

Garbled Circuit / Secret Sharing

Better when the generated circuit is small / linear in the input size
Good fit: Non-Linear Layers

Homomorphic Encryption

Better when multiplicative depth is small
Or when the circuit is large (e.g., quadratic in the input size)
Good fit: Linear Layers

9 of 31

Gazelle - Key Ideas

Switch (customized) Schemes for Linear / Non-Linear Layers!

10 of 31

Gazelle - Key Ideas

For linear layers: Packed Additively Homomorphic Encryption (PAHE)

For non-linear layers: Garbled Circuit

Use Secret Sharing to patch different layers together and switch accordingly
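
A minimal sketch of additive secret sharing, the glue that lets the protocol switch between schemes. The modulus P, the function names, and the convention that the two shares sum to the secret are assumptions for illustration; the slides' own masking convention may differ by a sign. The last lines show why linear layers are easy on shares: a public scaling applied to each share separately gives shares of the scaled value.

```python
import secrets

P = 2**61 - 1  # hypothetical modulus, chosen only for this example

def share(x, p=P):
    """Split x into two additive shares that sum to x modulo p."""
    r = secrets.randbelow(p)          # uniformly random mask
    client_share = (x - r) % p        # looks random on its own
    server_share = r
    return client_share, server_share

def reconstruct(client_share, server_share, p=P):
    """Recombine the two shares."""
    return (client_share + server_share) % p

c, s = share(42)
print(reconstruct(c, s))              # 42

# Linearity: scaling each share by a public weight scales the secret.
w = 3
print(reconstruct(w * c % P, w * s % P))  # 126
```

Neither share alone reveals x, since each is uniformly distributed modulo P.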

11 of 31

Gazelle - Protocol Overview

[Diagram: Client and Server alternate between a Conv/FC Layer (PAHE) and a RELU Layer (GC).]

Conv/FC Layer (PAHE): PAHE Enc, PAHE Eval (Kernel), PAHE Dec
RELU Layer (GC): Send Labels, Eval GC
Transitions: GC to PAHE, PAHE to GC

Values shown in the diagram:

[ C_x ] = [x + r], s_x = r
[ y ] = [ C_y ] + [ S_y ]
{ s_x }, { s_y }

Notation:

a: plaintext
[a]: ciphertext
{a}: GC label

12 of 31

Gazelle - PAHE Abstraction

Pack plaintext values into a vector of “slots”; the ciphertext holds the contents plus some noise for security.
Supported Operations:

SIMDAdd: SIMD homomorphic addition
SIMDScMult: SIMD homomorphic scalar multiplication (between a ciphertext and a plaintext)
Perm (Automorphism): permutation of the plaintext slots, mostly just rotations
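
To make the abstraction concrete, here is a plaintext stand-in for the three operations. Nothing is encrypted and there is no noise; the function names and the choice of a left rotation are assumptions for illustration, not a real library API.

```python
# Plaintext stand-ins for the three PAHE operations (no encryption, no noise).

def simd_add(ct_a, ct_b):
    """Slot-wise addition of two packed vectors."""
    return [a + b for a, b in zip(ct_a, ct_b)]

def simd_sc_mult(ct, pt):
    """Slot-wise product of a packed 'ciphertext' with a plaintext vector."""
    return [c * p for c, p in zip(ct, pt)]

def perm_rotate(ct, k):
    """Cyclic left rotation of the slots by k positions."""
    k %= len(ct)
    return ct[k:] + ct[:k]

v = [1, 2, 3, 4]
print(simd_add(v, v))                   # [2, 4, 6, 8]
print(simd_sc_mult(v, [10, 0, 10, 0]))  # [10, 0, 30, 0]
print(perm_rotate(v, 1))                # [2, 3, 4, 1]
```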

13 of 31

Gazelle - PAHE Techniques - FC Layer

Fast Homomorphic Matrix Multiplication, the naive way

Setup:

Plaintext weight matrix W, shape n₀ * n_i
Input vector v, length n_i
Goal: evaluate W*v
Each ciphertext has n slots, holding a plaintext vector (plus some noise for security)

14 of 31

Gazelle - PAHE Techniques - FC Layer

Fast Homomorphic Matrix Multiplication, the naive way

Process

For each row w_i of W, pad it to length n
SIMDScMult: [w_i * v] gives the component-wise product vector; we need the sum of its entries
Rotate by half the length and add; repeat with halved shifts, log n times in total
Repeat for all n₀ rows

Cost: n₀ SIMDScMult + n₀ log n rotations + n₀ log n SIMD additions
Shortcomings:

Produces n₀ ciphertexts, each holding just one component of the result
Leads to communication quadratic in the input size
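
The naive process can be sketched on plain Python lists standing in for ciphertexts. This is an illustration under assumptions: no encryption is performed, the helper names are made up, and n is taken to be a power of two so that rotate-and-sum folds cleanly.

```python
def rotate(ct, k):
    """Cyclic left rotation of the slots by k positions."""
    k %= len(ct)
    return ct[k:] + ct[:k]

def naive_matvec(W, v, n):
    """One output 'ciphertext' per row of W; n is the slot count."""
    assert (n & (n - 1)) == 0 and n >= len(v), "n must be a power of two"
    padded_v = v + [0] * (n - len(v))
    outputs = []
    for row in W:
        padded_row = row + [0] * (n - len(row))
        # SIMDScMult: slot-wise product of input and weight row
        prod = [a * b for a, b in zip(padded_v, padded_row)]
        # Rotate-and-sum: log2(n) rotations and additions
        shift = n // 2
        while shift >= 1:
            prod = [a + b for a, b in zip(prod, rotate(prod, shift))]
            shift //= 2
        outputs.append(prod)  # every slot now holds the dot product
    return outputs

W = [[1, 2, 3], [4, 5, 6]]
out = naive_matvec(W, [1, 1, 1], n=4)
print([ct[0] for ct in out])  # [6, 15]
print(len(out))               # 2 ciphertexts for 2 output components
```

The shortcoming is visible in the return value: each output component occupies a whole ciphertext.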

15 of 31

Gazelle - PAHE Techniques - FC Layer

Fast Homomorphic Matrix Multiplication, input packing

A small trick: when n_i << n, rather than wasting a lot of space on padding, pack n/n_i rows into one plaintext vector and n/n_i copies of the input vector into one ciphertext vector
Perform rotations block by block
The number of rows to process drops from n₀ to n₀ * n_i / n
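
A sketch of input packing on plain lists, assuming n_i is a power of two that divides n (both assumptions made only to keep the example short; names are hypothetical). Rotations use shifts smaller than n_i, so the first slot of each block ends up holding that block's dot product.

```python
def rotate(ct, k):
    """Cyclic left rotation of the slots by k positions."""
    k %= len(ct)
    return ct[k:] + ct[:k]

def packed_naive_matvec(W, v, n):
    """Pack n // n_i rows per plaintext vector; return (ciphertexts, n_i)."""
    n_i = len(v)
    assert n % n_i == 0 and (n_i & (n_i - 1)) == 0
    rows_per_ct = n // n_i
    packed_v = v * rows_per_ct            # n / n_i copies of the input
    outputs = []
    for start in range(0, len(W), rows_per_ct):
        group = W[start:start + rows_per_ct]
        packed_rows = [w for row in group for w in row]
        packed_rows += [0] * (n - len(packed_rows))
        prod = [a * b for a, b in zip(packed_v, packed_rows)]
        shift = n_i // 2                  # fold within each block only
        while shift >= 1:
            prod = [a + b for a, b in zip(prod, rotate(prod, shift))]
            shift //= 2
        outputs.append(prod)
    return outputs, n_i

W = [[1, 2], [3, 4], [5, 6], [7, 8]]
cts, n_i = packed_naive_matvec(W, [1, 1], n=4)
print(len(cts))                               # 2, i.e. n₀ * n_i / n
print([ct[b * n_i] for ct in cts for b in range(4 // n_i)])  # [3, 7, 11, 15]
```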

16 of 31

Gazelle - PAHE Techniques - FC Layer

Fast Homomorphic Matrix Multiplication, the diagonal way

Rationale: Arrange elements so that, after SIMDScMult, values that need to be added together never sit in the same ciphertext, saving rotations
Process

Encode the main diagonal as a vector and SIMDScMult it with the input; encode every other (wrapped) diagonal as a vector and SIMDScMult it with a correspondingly rotated input

Cost: n_i SIMDScMult + (n_i-1) rotations + (n_i-1) SIMD additions. The output is a single ciphertext holding the entire output vector in packed form.
Shortcoming:

Noise grows somewhat compared to the naive method, because rotation happens before SIMDScMult, but it remains acceptable.
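
A sketch of the diagonal method for a square matrix on plain lists (no encryption; the function names and the left-rotation convention are assumptions). Diagonal k holds W[i][(i + k) mod n], and it is multiplied with the input rotated by k, so the partial products for output i always land in slot i.

```python
def rotate(ct, k):
    """Cyclic left rotation of the slots by k positions."""
    k %= len(ct)
    return ct[k:] + ct[:k]

def diagonal_matvec(W, v):
    """Square W (n x n): n slot-wise products, n - 1 rotations, n - 1 additions."""
    n = len(v)
    assert all(len(row) == n for row in W) and len(W) == n
    acc = [0] * n
    for k in range(n):
        # k-th wrapped diagonal of W, encoded as a plaintext vector
        diag = [W[i][(i + k) % n] for i in range(n)]
        rotated = rotate(v, k)                 # rotation happens before SIMDScMult
        prod = [d * x for d, x in zip(diag, rotated)]
        acc = [a + p for a, p in zip(acc, prod)]
    return acc                                 # one packed output vector

print(diagonal_matvec([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```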

17 of 31

Gazelle - PAHE Techniques - FC Layer

Fast Homomorphic Matrix Multiplication, the hybrid way

In practice, the weight matrix W of shape n₀ * n_i is usually rectangular, with n_i >> n₀
For the diagonal method, the number of rotations scales with n_i, which is undesirable
Combine both: pack the weights along extended diagonals into plaintext vectors
Now only n₀ input vector rotations are needed before scalar multiplication
Benchmarks show it almost always outperforms the plain naive / diagonal methods
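
A sketch of the hybrid idea on plain lists, under simplifying assumptions: the slot count equals n_i, n₀ divides n_i, and n_i / n₀ is a power of two. The names and the exact indexing of the extended diagonals are illustrative choices. Extended diagonal k holds W[j mod n₀][(j + k) mod n_i]; after n₀ products, a short rotate-and-sum folds the n_i slots down to n₀ outputs.

```python
def rotate(ct, k):
    """Cyclic left rotation of the slots by k positions."""
    k %= len(ct)
    return ct[k:] + ct[:k]

def hybrid_matvec(W, v):
    """Rectangular W (n_o x n_i) with n_o dividing n_i."""
    n_o, n_i = len(W), len(v)
    ratio = n_i // n_o
    assert n_i % n_o == 0 and (ratio & (ratio - 1)) == 0
    acc = [0] * n_i
    for k in range(n_o):                       # only n_o input rotations
        diag = [W[j % n_o][(j + k) % n_i] for j in range(n_i)]
        prod = [d * x for d, x in zip(diag, rotate(v, k))]
        acc = [a + p for a, p in zip(acc, prod)]
    shift = n_i // 2                           # log2(n_i / n_o) folding steps
    while shift >= n_o:
        acc = [a + b for a, b in zip(acc, rotate(acc, shift))]
        shift //= 2
    return acc[:n_o]                           # packed output in the first n_o slots

W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
print(hybrid_matvec(W, [1, 0, 2, 1]))  # [11, 27]
```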

18 of 31

Gazelle - PAHE Techniques - Conv Layer

Techniques for convolution layers are similar to those for FC layers, with some adaptations specific to the convolution operation

19 of 31

Gazelle - Evaluation

10~20x runtime speedup

10~100x less bandwidth

21 of 31

Delphi - Intro

22 of 31

Delphi - Intro

achieves semi-honest simulation-based security

supports arbitrary CNNs

Efficiency:

improves bandwidth (9x) and inference latency (22x)
can utilize GPU/TPU for linear layers
evaluated on realistic workloads (CIFAR-100, ResNet-32)

23 of 31

Delphi - Motivation

24 of 31

Delphi - Major Techniques

25 of 31

Delphi - Major Techniques

It would be great if we could replace some RELUs with quadratic activations, which are cheap in 2PC

But using quadratic activations affects accuracy
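
Why a quadratic is cheap in 2PC: squaring needs only additions and multiplications on shares, with no comparison circuit. The sketch below squares an additively shared value using a Beaver triple, a standard technique assumed here for illustration; the slides do not say which multiplication protocol is used. The modulus P, the trusted dealer that hands out the triple, and all names are hypothetical.

```python
import secrets

P = 2**61 - 1  # hypothetical modulus for this example

def share(x):
    """Additive shares of x modulo P."""
    r = secrets.randbelow(P)
    return (x - r) % P, r

def beaver_square(x0, x1):
    """Return shares of x*x given shares (x0, x1) of x."""
    # Preprocessing by an assumed trusted dealer: shares of a and of a*a.
    a = secrets.randbelow(P)
    a0, a1 = share(a)
    c0, c1 = share(a * a % P)
    # Both parties open d = x - a; d is uniformly random, so it hides x.
    d = ((x0 - a0) + (x1 - a1)) % P
    # x^2 = d^2 + 2*d*a + a^2, computed share-wise with local operations only.
    z0 = (d * d + 2 * d * a0 + c0) % P
    z1 = (2 * d * a1 + c1) % P
    return z0, z1

x0, x1 = share(7)
z0, z1 = beaver_square(x0, x1)
print((z0 + z1) % P)  # 49
```

A RELU, by contrast, needs a sign test on the shared value, which is what makes it the expensive part.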

26 of 31

Delphi - Major Techniques

Use Neural Architecture Search (NAS) to figure out how many RELUs to replace while maintaining a given accuracy threshold!

27 of 31

Delphi - Architecture

28 of 31

Delphi - Evaluation

29 of 31

Delphi - Evaluation

30 of 31

Clarifying Questions

How does model training work in this setting? Does it just proceed in the non-private way?

This might be out of scope of this paper, but are arithmetic circuits or boolean circuits better for matrix multiplication? I would assume that arithmetic circuits are better for multiplication. If my assumption is correct, why do the authors use boolean circuits for their kernels and non-linear layers?

31 of 31

Discussion Questions

Could this system be used for non-linear networks? (e.g., l1 = f1(x), l2 = f2(l1), l3 = f3(l1,l2))

The authors only consider semi-honest corruptions. Can their protocol be compiled into a malicious-secure protocol without incurring too many zero knowledge proofs (the GMW compiler would require one proof per message)?

Why do the authors use a boolean circuit when evaluating the non-linear layer instead of using the proposed methods from SecureML (MPC-friendly activation functions + arithmetic circuits)? What are the trade-offs here?

The paper mainly mentions the model with one client and one server. However, in many modern use cases, many users can share the same ML model by contributing their input data securely. How well does Gazelle apply to that setting?