DAD: Data-free Adversarial Defense at Test Time
Gaurav Kumar Nayak*, Ruchit Rawal*, Anirban Chakraborty
Department of Computational and Data Sciences
Indian Institute of Science, Bangalore, India
Adversarial Vulnerability
Deep Neural Networks are highly susceptible to ‘adversarial perturbations’
[Figure: a trained (non-robust) model classifies the clean image (C.I.) correctly but misclassifies its adversarial counterpart (A.I.)]
Existing Approaches
[Figure: existing defenses adversarially retrain the non-robust model using adversarial images (A.I.) generated from the clean training data (C.T.D.); such data is often inaccessible due to privacy concerns (e.g., patients' data, biometric data) or proprietary scale (e.g., Google's JFT-300M dataset)]
Desired Objective
How can pretrained models be made robust against adversarial attacks in the absence of the original training data or their statistics?
Potential Solutions and Drawbacks
[Table: candidate solutions, e.g., synthesizing surrogate training data from the pretrained model (using methods such as ZSKD [1], DeepInversion [2], or DeGAN [3]), set against their respective drawbacks]
[1] G. K. Nayak, K. R. Mopuri, V. Shaj, V. B. Radhakrishnan, and A. Chakraborty, “Zero-shot knowledge distillation in deep networks,” in ICML, 2019.
[2] H. Yin, P. Molchanov, J. M. Alvarez, Z. Li, A. Mallya, D. Hoiem, N. K. Jha, and J. Kautz, “Dreaming to distill: Data-free knowledge transfer via DeepInversion,” in CVPR, 2020.
[3] S. Addepalli, G. K. Nayak, A. Chakraborty, and R. V. Babu, “DeGAN: Data-enriching GAN for retrieving representative samples from a trained classifier,” in AAAI, 2020.
Proposed Approach
Test-time adversarial detection and subsequent correction applied in the input space (data) rather than to the model.
[Figure: overview — each test input (clean image C.I. or adversarial image A.I.) passes through the detection module; samples flagged as adversarial are corrected via their low-frequency component (LFC) before being classified by the frozen, non-robust pretrained model]
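A minimal sketch of this test-time flow, with hypothetical `is_adversarial` and `correct` callables standing in for the detection and correction modules (the names and interfaces are ours, not from the paper):

```python
def defend_at_test_time(model, x, is_adversarial, correct):
    """Data-free defense at test time: the frozen, non-robust pretrained
    model is never retrained; only the incoming sample may be transformed.

    is_adversarial: detection module (flags adversarial inputs)
    correct:        correction module (e.g., low-frequency filtering)
    """
    if is_adversarial(x):   # detect on the input itself
        x = correct(x)      # correct in input space, not model space
    return model(x)         # classify with the unchanged pretrained model
```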
Detection Module
Motivation for Correction Module
Correction Module
At a particular radius $r$:
Normalized discriminability score: $\hat{d}_r$
Corrected adversarial sample: $\hat{x}_{adv} = \mathrm{LFC}_r(x_{adv})$, the low-frequency component of the adversarial input obtained by suppressing all spatial frequencies beyond radius $r$
Normalized adversarial contamination score: $\hat{c}_r$
Optimal radius ($r^*$): the maximum radius at which $\hat{d}_r > \hat{c}_r$, i.e., $r^* = \max\{r : \hat{d}_r > \hat{c}_r\}$
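A sketch of these definitions, assuming (consistent with the LFC legend above) that the low-frequency component is computed with a circular low-pass mask in the Fourier domain; the NumPy-based helpers and the fallback to the smallest candidate radius are our assumptions, not the paper's reference implementation:

```python
import numpy as np

def low_frequency_component(x, r):
    """LFC_r(x): zero out every spatial frequency farther than r from the
    centre of the shifted spectrum. x: (H, W) or (H, W, C) image array."""
    H, W = x.shape[:2]
    yy, xx = np.ogrid[:H, :W]
    mask = (yy - H / 2) ** 2 + (xx - W / 2) ** 2 <= r ** 2   # circular low-pass mask
    if x.ndim == 3:
        mask = mask[..., None]                               # broadcast over channels
    spectrum = np.fft.fftshift(np.fft.fft2(x, axes=(0, 1)), axes=(0, 1))
    lfc = np.fft.ifft2(np.fft.ifftshift(spectrum * mask, axes=(0, 1)), axes=(0, 1))
    return np.real(lfc)

def optimal_radius(radii, d_hat, c_hat):
    """r* = max{r : d_hat_r > c_hat_r}. d_hat / c_hat hold the normalized
    discriminability and contamination scores per candidate radius.
    Falling back to the smallest radius is our choice for the edge case."""
    valid = [r for r, d, c in zip(radii, d_hat, c_hat) if d > c]
    return max(valid) if valid else min(radii)
```

The corrected sample at the selected radius is then `low_frequency_component(x_adv, optimal_radius(radii, d_hat, c_hat))`.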
Performance of Proposed Detection Module
Performance of Proposed Correction Module
Effectiveness of Proposed Radius Selection
Performance of Combined Detection and Correction
Comparison with Data-Dependent Approaches
Conclusion
Thanks!
Project Website
ACKNOWLEDGEMENT
This work is supported by a Start-up Research Grant (SRG) from SERB, DST, India (Project file number: SRG/2019/001938).