1 of 17

DAD: Data-free Adversarial Defense at Test Time

Gaurav Kumar Nayak*, Ruchit Rawal*, Anirban Chakraborty

Department of Computational and Data Sciences

Indian Institute of Science, Bangalore, India



2 of 17

Adversarial Vulnerability


Deep Neural Networks are highly susceptible to ‘adversarial perturbations’

[Figure: illustration of adversarial vulnerability on a trained (non-robust) model. Legend: C.I. = Clean Image, A.I. = Adversarial Image.]
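To make this concrete (an illustrative sketch, not part of the original slides), a minimal FGSM-style example in PyTorch shows how such a perturbation can be crafted; `model`, `image`, and `label` are assumed placeholders:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, eps=8 / 255):
    """Craft an adversarial image (A.I.) from a clean image (C.I.) via FGSM.

    A perturbation of magnitude eps in the direction of the sign of the loss
    gradient is often enough to flip a non-robust model's prediction.
    image: (N, C, H, W) tensor in [0, 1]; label: (N,) tensor of class indices.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adv = image + eps * image.grad.sign()  # A.I. = C.I. + eps * sign(grad)
    return adv.clamp(0.0, 1.0).detach()    # keep pixels in the valid range
```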

3 of 17

Existing Approaches


  • Data privacy and security

e.g. Patients’ data, biometric data

  • Data is property → Proprietary data

e.g. Google’s JFT-300M dataset

[Figure: existing data-dependent defenses, which assume access to clean training data. Legend: C.T.D. = Clean Training Data, A.I. = Adversarial Image; trained model (non-robust), with frozen and trainable components.]


4 of 17

Desired Objective


How do we make pretrained models robust against adversarial attacks in the absence of the original training data or their statistics?

Potential Solutions

  • Generate pseudo-data from the pretrained model (using methods such as ZSKD [1], DeepInversion [2], DeGAN [3]; a simplified sketch of the core idea follows the references below)

  • Use the generated data as a substitute for the unavailable training data

Drawbacks

  • The pseudo-data generation process is computationally expensive

  • Retraining the model on the generated data using adversarial defense techniques adds further computational overhead

[1] G. K. Nayak, K. R. Mopuri, V. Shaj, V. B. Radhakrishnan, and A. Chakraborty, “Zero-shot knowledge distillation in deep networks,” in ICML, 2019.

[2] H. Yin, P. Molchanov, J. M. Alvarez, Z. Li, A. Mallya, D. Hoiem, N. K. Jha, and J. Kautz, “Dreaming to distill: Data-free knowledge transfer via DeepInversion,” in CVPR, 2020.

[3] S. Addepalli, G. K. Nayak, A. Chakraborty, and R. V. Babu, “DeGAN: Data-enriching GAN for retrieving representative samples from a trained classifier,” in AAAI, 2020.
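As an illustration of the shared core idea behind these methods (a minimal sketch only; the input shape and hyperparameters are assumptions, and the cited methods add further priors such as BatchNorm-statistic matching or a generator network), pseudo-data can be synthesized by optimizing random noise until the frozen classifier assigns it a chosen label:

```python
import torch
import torch.nn.functional as F

def synthesize_pseudo_sample(model, target_class, steps=500, lr=0.05):
    """Optimize random noise until the frozen pretrained classifier labels it
    as target_class -- the core idea behind data-free sample generation.

    The per-sample inner optimization loop is exactly why this generation
    process is computationally expensive.
    """
    model.eval()
    x = torch.randn(1, 3, 32, 32, requires_grad=True)  # assumed CIFAR-like shape
    target = torch.tensor([target_class])
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), target)  # push logits toward target_class
        loss.backward()
        opt.step()
    return x.detach()
```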


5 of 17

Proposed Approach


Test-time adversarial detection and subsequent correction performed on the input space (the data) instead of the model (sketched below).

[Figure: proposed test-time pipeline around the frozen pretrained (non-robust) model. Legend: C.I. = Clean Image, A.I. = Adversarial Image, LFC = Low-Frequency Component.]
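A minimal control-flow sketch of this pipeline (illustrative only; `detector` and `corrector` are hypothetical stand-ins for the detection and correction modules described on the following slides):

```python
import torch

def dad_inference(model, x, detector, corrector, radius):
    """Test-time defense: detect, correct the input (not the model), classify.

    detector(x) -> bool flags adversarial inputs; corrector(x, radius) returns
    the low-frequency component (LFC) of x. Both are placeholders here.
    """
    if detector(x):               # input flagged as adversarial (A.I.)
        x = corrector(x, radius)  # discard the corrupted high frequencies
    with torch.no_grad():         # the pretrained, non-robust model stays frozen
        return model(x).argmax(dim=1)
```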

6 of 17

Detection Module


7 of 17

Motivation for Correction Module


8 of 17

Correction Module


At a particular radius r, we compute:

  • Normalized discriminability score

  • Corrected adversarial sample (the input reconstructed from only the frequency components within radius r)

  • Normalized adversarial contamination score

Optimal radius (r*): the maximum radius at which the normalized discriminability score exceeds the normalized adversarial contamination score (see the sketch below).
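Since the equations did not survive extraction, here is a minimal NumPy sketch of the underlying procedure, low-pass filtering in the Fourier domain plus a search for the largest admissible radius; the circular mask, the fallback radius, and the `discriminability` / `contamination` score functions are assumptions standing in for the paper's exact definitions:

```python
import numpy as np

def low_frequency_component(image, radius):
    """Zero all Fourier coefficients farther than `radius` from the spectrum
    centre, then reconstruct the image (its low-frequency component)."""
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2  # circular LFC mask
    img = image.astype(np.float64)
    if img.ndim == 2:
        img = img[..., None]
    recon = np.empty_like(img)
    for c in range(img.shape[-1]):  # filter each channel independently
        spec = np.fft.fftshift(np.fft.fft2(img[..., c]))
        recon[..., c] = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    return recon.squeeze()

def select_radius(image, radii, discriminability, contamination):
    """Return the maximum radius at which the normalized discriminability
    score still exceeds the normalized adversarial-contamination score.
    Falls back to the smallest candidate radius if none qualifies (assumption).
    """
    best = min(radii)
    for r in sorted(radii):
        if discriminability(image, r) > contamination(image, r):
            best = r
    return best
```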


11 of 17

Performance of Proposed Detection Module


  • We achieve a very high “Clean Sample Detection Acc.” (≈90–99%), which allows us to preserve the model's accuracy on clean samples.

  • The trend is consistent across a broad range of model architectures, datasets, and adversarial attacks.


12 of 17

Performance of Proposed Correction Module


  • The performance of the non-robust model on adversarially perturbed data without and with our correction module is denoted by A.A. (After Attack) and A.C. (After Correction), respectively.


  • Most notably, we achieve a performance gain of ≈35–40% against the state-of-the-art AutoAttack across different architectures on multiple datasets. A similar trend is observed across the other attacks as well.

13 of 17

Effectiveness of Proposed Radius Selection


  • We ablate against a “Random Baseline” (R.B.), wherein the radius is chosen randomly (within our specified range) for each sample.

  • The random baseline performs significantly worse than our proposed approach, indicating the usefulness of selecting the radius appropriately.


14 of 17

Performance of Combined Detection and Correction


15 of 17

Comparison with Data Dependent Approaches

15

  • DAD achieves decent adversarial accuracy while maintaining high clean accuracy, entirely at test time.


16 of 17

Conclusion


  • We propose, for the first time, a complete test-time detection and correction approach for adversarial robustness in the absence of training data.

  • Our adversarial detection framework is based on source-free Unsupervised Domain Adaptation (UDA).

  • Our correction framework, inspired by human cognition, analyzes the input data in the Fourier domain and discards the adversarially corrupted high-frequency regions.

  • We achieve a significant improvement in adversarial accuracy, even against the state-of-the-art AutoAttack, without compromising much on clean accuracy.

  • Additional benefits:

    • Any state-of-the-art classifier-based adversarial detector can be easily adapted to our source-free UDA-based adversarial detection framework.

    • Any data-dependent detection approach can benefit from our correction module at test time to correct adversarial samples after successfully detecting them.


17 of 17


Thanks!

ACKNOWLEDGEMENT

This work is supported by a Start-up Research Grant (SRG) from SERB, DST, India (Project file number: SRG/2019/001938).