1 of 17

DAD: Data-free Adversarial Defense at Test Time

Gaurav Kumar Nayak*, Ruchit Rawal*, Anirban Chakraborty

Department of Computational and Data Sciences

Indian Institute of Science, Bangalore, India



2 of 17

Adversarial Vulnerability


Deep Neural Networks are highly susceptible to ‘adversarial perturbations’

[Figure: illustration of adversarial vulnerability on a trained (non-robust) model. Legend: C.I. = Clean Image, A.I. = Adversarial Image.]
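To make this concrete (an illustrative sketch, not part of the original slides), a minimal FGSM-style example in PyTorch shows how such a perturbation can be crafted; `model`, `image`, and `label` are assumed placeholders:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, eps=8 / 255):
    """Craft an adversarial image (A.I.) from a clean image (C.I.) via FGSM.

    A perturbation of magnitude eps in the direction of the sign of the loss
    gradient is often enough to flip a non-robust model's prediction.
    image: (N, C, H, W) tensor in [0, 1]; label: (N,) tensor of class indices.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adv = image + eps * image.grad.sign()  # A.I. = C.I. + eps * sign(grad)
    return adv.clamp(0.0, 1.0).detach()    # keep pixels in the valid range
```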

3 of 17

Existing Approaches


  • Data privacy and security

e.g. Patients’ data, biometric data

  • Data is property → Proprietary data

e.g. Google’s JFT-300M dataset

[Figure: existing data-dependent defenses, which assume access to clean training data. Legend: C.T.D. = Clean Training Data, A.I. = Adversarial Image; trained model (non-robust), with frozen and trainable components.]


4 of 17

Desired Objective


How do we make pretrained models robust against adversarial attacks in the absence of the original training data or their statistics?

Potential Solutions

  • Generate pseudo-data from the pretrained model (using methods such as ZSKD [1], DeepInversion [2], DeGAN [3]; a simplified sketch of the core idea follows the references below)

  • Use the generated data as a substitute for the unavailable training data

Drawbacks

  • The pseudo-data generation process is computationally expensive

  • Retraining the model on the generated data using adversarial defense techniques adds further computational overhead

[1] G. K. Nayak, K. R. Mopuri, V. Shaj, V. B. Radhakrishnan, and A. Chakraborty, “Zero-shot knowledge distillation in deep networks,” in ICML, 2019.

[2] H. Yin, P. Molchanov, J. M. Alvarez, Z. Li, A. Mallya, D. Hoiem, N. K. Jha, and J. Kautz, “Dreaming to distill: Data-free knowledge transfer via DeepInversion,” in CVPR, 2020.

[3] S. Addepalli, G. K. Nayak, A. Chakraborty, and R. V. Babu, “DeGAN: Data-enriching GAN for retrieving representative samples from a trained classifier,” in AAAI, 2020.
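As an illustration of the shared core idea behind these methods (a minimal sketch only; the input shape and hyperparameters are assumptions, and the cited methods add further priors such as BatchNorm-statistic matching or a generator network), pseudo-data can be synthesized by optimizing random noise until the frozen classifier assigns it a chosen label:

```python
import torch
import torch.nn.functional as F

def synthesize_pseudo_sample(model, target_class, steps=500, lr=0.05):
    """Optimize random noise until the frozen pretrained classifier labels it
    as target_class -- the core idea behind data-free sample generation.

    The per-sample inner optimization loop is exactly why this generation
    process is computationally expensive.
    """
    model.eval()
    x = torch.randn(1, 3, 32, 32, requires_grad=True)  # assumed CIFAR-like shape
    target = torch.tensor([target_class])
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), target)  # push logits toward target_class
        loss.backward()
        opt.step()
    return x.detach()
```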


5 of 17

Proposed Approach


Test-time adversarial detection and subsequent correction performed on the input space (the data) instead of the model (sketched below).

[Figure: proposed test-time pipeline around the frozen pretrained (non-robust) model. Legend: C.I. = Clean Image, A.I. = Adversarial Image, LFC = Low-Frequency Component.]
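A minimal control-flow sketch of this pipeline (illustrative only; `detector` and `corrector` are hypothetical stand-ins for the detection and correction modules described on the following slides):

```python
import torch

def dad_inference(model, x, detector, corrector, radius):
    """Test-time defense: detect, correct the input (not the model), classify.

    detector(x) -> bool flags adversarial inputs; corrector(x, radius) returns
    the low-frequency component (LFC) of x. Both are placeholders here.
    """
    if detector(x):               # input flagged as adversarial (A.I.)
        x = corrector(x, radius)  # discard the corrupted high frequencies
    with torch.no_grad():         # the pretrained, non-robust model stays frozen
        return model(x).argmax(dim=1)
```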

6 of 17

Detection Module


7 of 17

Motivation for Correction Module


8 of 17

Correction Module


At a particular radius r, we compute:

  • Normalized discriminability score

  • Corrected adversarial sample (the input reconstructed from only the frequency components within radius r)

  • Normalized adversarial contamination score

Optimal radius (r*): the maximum radius at which the normalized discriminability score exceeds the normalized adversarial contamination score (see the sketch below).
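Since the equations did not survive extraction, here is a minimal NumPy sketch of the underlying procedure, low-pass filtering in the Fourier domain plus a search for the largest admissible radius; the circular mask, the fallback radius, and the `discriminability` / `contamination` score functions are assumptions standing in for the paper's exact definitions:

```python
import numpy as np

def low_frequency_component(image, radius):
    """Zero all Fourier coefficients farther than `radius` from the spectrum
    centre, then reconstruct the image (its low-frequency component)."""
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2  # circular LFC mask
    img = image.astype(np.float64)
    if img.ndim == 2:
        img = img[..., None]
    recon = np.empty_like(img)
    for c in range(img.shape[-1]):  # filter each channel independently
        spec = np.fft.fftshift(np.fft.fft2(img[..., c]))
        recon[..., c] = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    return recon.squeeze()

def select_radius(image, radii, discriminability, contamination):
    """Return the maximum radius at which the normalized discriminability
    score still exceeds the normalized adversarial-contamination score.
    Falls back to the smallest candidate radius if none qualifies (assumption).
    """
    best = min(radii)
    for r in sorted(radii):
        if discriminability(image, r) > contamination(image, r):
            best = r
    return best
```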


11 of 17

Performance of Proposed Detection Module


  • We achieve a very high “Clean Sample Detection Acc.” (≈90–99%), which allows us to preserve the model's accuracy on clean samples.

  • The trend is consistent across a broad range of model architectures, datasets, and adversarial attacks.


12 of 17

Performance of Proposed Correction Module


  • The performance of the non-robust model on adversarially perturbed data without and with our correction module is denoted by A.A. (After Attack) and A.C. (After Correction), respectively.


  • Most notably, we achieve a performance gain of ≈35–40% against the state-of-the-art AutoAttack across different architectures on multiple datasets. A similar trend is observed across the other attacks as well.

13 of 17

Effectiveness of Proposed Radius Selection


  • We ablate against a “Random Baseline” (R.B.), wherein the radius is chosen randomly (within our specified range) for each sample.

  • The random baseline performs significantly worse than our proposed approach, indicating the usefulness of selecting the radius appropriately.


14 of 17

Performance of Combined Detection and Correction


15 of 17

Comparison with Data Dependent Approaches

15

  • DAD achieves decent adversarial accuracy while maintaining high clean accuracy, entirely at test time.


16 of 17

Conclusion


  • We propose, for the first time, a complete test-time detection and correction approach for adversarial robustness in the absence of training data.

  • Our adversarial detection framework is based on source-free Unsupervised Domain Adaptation (UDA).

  • Our correction framework, inspired by human cognition, analyzes the input data in the Fourier domain and discards the adversarially corrupted high-frequency regions.

  • We achieve a significant improvement in adversarial accuracy, even against the state-of-the-art AutoAttack, without compromising much on clean accuracy.

  • Additional benefits:

    • Any state-of-the-art classifier-based adversarial detector can be easily adapted to our source-free UDA-based adversarial detection framework.

    • Any data-dependent detection approach can benefit from our correction module at test time to correct adversarial samples after successfully detecting them.


17 of 17


Thanks!

ACKNOWLEDGEMENT

This work is supported by a Start-up Research Grant (SRG) from SERB, DST, India (Project file number: SRG/2019/001938).