Deep Learning in the Absence of Training Data
By: Gaurav Kumar Nayak
Advisor: Dr. Anirban Chakraborty
Department of Computational and Data Sciences
Indian Institute of Science, Bangalore, India
ML vs. DL
Another key difference: the amount of training data and computational power required.
Data-Hungry Deep Models
References:
Taigman, Y., Yang, M., Ranzato, M. A., & Wolf, L. (2014). DeepFace: Closing the gap to human-level performance in face verification. In CVPR (pp. 1701-1708).
Levine, S., Pastor, P., Krizhevsky, A., & Quillen, D. (2016). Learning hand-eye coordination for robotic grasping with large-scale data collection. In International Symposium on Experimental Robotics (pp. 173-184). Springer.
Johnson, J., Karpathy, A., & Fei-Fei, L. (2016). DenseCap: Fully convolutional localization networks for dense captioning. In CVPR (pp. 4565-4574).
Data is Critical
Deep models' performance is strongly correlated with the amount of training data.
Absence of Training Data (Privacy Concerns)
Absence of Training Data (Proprietary Data)
References:
Sun, C., Shrivastava, A., Singh, S., & Gupta, A. (2017). Revisiting unreasonable effectiveness of data in deep learning era. In ICCV (pp. 843-852).
https://www.slideshare.net/ExtractConf/andrew-ng-chief-scientist-at-baidu
Concerns on Deployment of Trained Models
Given: a pretrained model and no training data.
Lightweight model → Knowledge Distillation
Robust to adversarial attacks → ?
Knowledge Distillation (KD)
[Figure from https://towardsdatascience.com]
Useful for model compression.
Knowledge Distillation (KD)
A teacher T and a student S share a labeled dataset {X, Y}: the student is optimized with a distillation loss against the teacher's softened outputs plus a cross-entropy loss on the ground-truth labels.
Hinton et al., Distilling the Knowledge in a Neural Network, arXiv:1503.02531, 2015.
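To make the objective concrete, below is a minimal PyTorch sketch of this Hinton-style KD loss. The temperature T, weighting alpha, and the function name are illustrative assumptions, not values taken from the slides.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD: soft distillation term plus hard cross-entropy term."""
    # Soften both output distributions with temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between softened teacher and student outputs,
    # scaled by T^2 to keep gradient magnitudes comparable.
    distill = F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T
    # Standard cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```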
Requirement
KD relies on labeled training data.
Is it a big problem?
The reliance on labeled data is a serious limitation whenever that data cannot be accessed or shared.
Can we do Knowledge Distillation without (access to) training data (Zero-Shot)?
Zero-Shot Knowledge Distillation in Deep Networks
G. K. Nayak, K. R. Mopuri, V. Shaj, R. V. Babu, and A. Chakraborty
Data-Free KD
The setup is the same as standard KD (teacher T, student S, distillation and cross-entropy losses), except that the training dataset {X, Y} is no longer available.
Can samples be synthesized from the trained teacher model instead?
Class Impressions: Patterns from the Model Parameters
A class impression for a chosen class c (e.g., c = Dog) is synthesized by optimizing a random input so that the teacher T's pre-softmax output (logit) for class c is maximized.
K. R. Mopuri et al., Ask, Acquire and Attack: Data-free UAP Generation using Class Impressions, ECCV 2018.
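A minimal sketch of this synthesis, assuming a CIFAR-like input shape; the step count, learning rate, and function name are illustrative:

```python
import torch

def class_impression(teacher, c, shape=(1, 3, 32, 32), steps=1500, lr=0.1):
    """Gradient ascent on the input to maximize the pre-softmax logit of class c."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    teacher.eval()
    for _ in range(steps):
        opt.zero_grad()
        logits = teacher(x)       # pre-softmax outputs
        loss = -logits[0, c]      # ascend on the target-class logit
        loss.backward()
        opt.step()
    return x.detach()
```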
Training on CIs: Limitations
Need: an improved modelling of the teacher's output space.
Dirichlet Distribution (on the 2D simplex)
Dirichlet Modelling of the Output Space
Softmax vectors are sampled from class-wise Dirichlet distributions whose concentration parameters are derived from a class similarity matrix C, where C(i, j) measures the similarity between w_i and w_j, and w_k denotes the weights learned by the teacher's softmax classifier for class k.
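A sketch of computing this class similarity matrix from the teacher's final-layer weights; the min-max rescaling and the clamping epsilon are illustrative details:

```python
import torch

def class_similarity_matrix(W):
    """W has shape (num_classes, feat_dim); row k holds w_k, the
    softmax-classifier weights for class k."""
    Wn = W / W.norm(dim=1, keepdim=True)     # normalize each w_k
    C = Wn @ Wn.t()                          # cosine similarity between classes
    C = (C - C.min()) / (C.max() - C.min())  # rescale to [0, 1]
    return C.clamp_min(1e-4)                 # keep Dirichlet concentrations positive

# Sampling a soft target for class k (beta is a scaling hyperparameter):
# y_target = torch.distributions.Dirichlet(beta * C[k]).sample()
```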
Data Impressions (DI)
Soft targets sampled from these Dirichlet distributions are then matched by optimizing inputs against the frozen teacher T, yielding one data impression per sampled target (e.g., for classes Car, Cat, Horse, Truck).
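A minimal sketch of extracting one data impression for a sampled soft target; hyperparameters and names are illustrative:

```python
import torch
import torch.nn.functional as F

def data_impression(teacher, y_target, shape=(1, 3, 32, 32), steps=1500, lr=0.01):
    """Optimize an input so the teacher's softmax output matches y_target."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    teacher.eval()
    for _ in range(steps):
        opt.zero_grad()
        log_probs = F.log_softmax(teacher(x), dim=1)
        loss = -(y_target * log_probs).sum()  # cross-entropy with the soft target
        loss.backward()
        opt.step()
    return x.detach()
```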
Distillation with DIs
The student S is trained on the data impressions using the distillation loss against the teacher T's outputs; since the DIs carry no ground-truth labels, the cross-entropy term is dropped.
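A compact sketch of this distillation loop over a batch loader of data impressions; the epoch count, temperature, and learning rate are illustrative:

```python
import torch
import torch.nn.functional as F

def distill_on_dis(teacher, student, di_loader, epochs=100, T=20.0, lr=0.01):
    """Train the student with only the soft-target (distillation) loss."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x in di_loader:                  # batches of data impressions
            with torch.no_grad():
                t_soft = F.softmax(teacher(x) / T, dim=1)
            s_log = F.log_softmax(student(x) / T, dim=1)
            loss = F.kl_div(s_log, t_soft, reduction="batchmean") * T * T
            opt.zero_grad()
            loss.backward()
            opt.step()
```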
Results: Visualizations
Setup: MNIST & F-MNIST use Teacher LeNet and Student LeNet-Half; CIFAR-10 uses Teacher AlexNet and Student AlexNet-Half.
[Figure: Data Impressions vs. original training data on MNIST, CIFAR-10 and F-MNIST]
[Figure: CI vs. DI: Class Impressions vs. Data Impressions (ours) on MNIST, CIFAR-10 and F-MNIST]
Results: Comparison

MNIST
| Model | Performance (%) |
| Teacher – CE | 99.34 |
| Student – CE | 98.92 |
| Student – KD (Hinton et al., 2015), 60K original samples | 99.25 |
| Kimura et al., 2018, 200 original samples | 86.70 |
| Lopes et al., 2017 (uses metadata) | 92.47 |
| ZSKD (Ours), 24,000 DIs and no original data | 98.77 |

F-MNIST
| Model | Performance (%) |
| Teacher – CE | 90.84 |
| Student – CE | 89.43 |
| Student – KD (Hinton et al., 2015), 50K original samples | 89.66 |
| Kimura et al., 2018, 200 original samples | 72.50 |
| ZSKD (Ours), 48,000 DIs and no original data | 79.62 |

CIFAR-10
| Model | Performance (%) |
| Teacher – CE | 83.03 |
| Student – CE | 80.04 |
| Student – KD (Hinton et al., 2015), 50K original samples | 80.08 |
| ZSKD (Ours), 40,000 DIs and no original data | 69.56 |
Summary
For more details:
G. K. Nayak, K. R. Mopuri, V. Shaj, R. Venkatesh Babu, A. Chakraborty, "Zero-Shot Knowledge Distillation in Deep Networks", ICML, 2019.
ZSKD code: https://github.com/vcl-iisc/ZSKD
Extraction of Data Impressions (DI)
Can Data Impressions be used across different computer vision applications?
Does robustness transfer to DI-distilled models?
Can Data Impressions act as a surrogate for the original training samples?
Mining Data Impressions from Deep Models as Substitute for the Unavailable Training Data
G. K. Nayak, K. R. Mopuri, S. Jain, and A. Chakraborty
Need for Proxy Data
Data Impressions as Proxy Data
Agnostic to downstream applications
Testing the Effectiveness of Data Impressions
Verifying the effectiveness of the extracted Data Impressions (DIs):
Generic Nature of Data Impressions
Popular applications (beyond KD) where data may not be accessible:
- Independently tackled in the literature, with data generation tied to the task at hand (application dependent).
- Generation of Data Impressions is application independent and architecture independent.
- Showing the utility of DIs on diverse applications establishes DIs as reliable surrogates.
Data-free Knowledge Distillation
Investigating the robustness of DI-distilled models.
Source-free Unsupervised Domain Adaptation
Comparison with source-dependent approaches.
Comparison with source-free domain adaptation methods.
Continual Learning in the Absence of Old-Class Data
Data-free Universal Adversarial Perturbations
Mopuri, K. R., Uppala, P. K., & Babu, R. V. (2018). Ask, Acquire, and Attack: Data-free UAP Generation using Class Impressions. In ECCV.
Can DIs be used for Data-free UAPs?
Class Impressions (CIs) are a special case of Data Impressions (DIs); DIs are more generic than CIs.
Data-free Universal Adversarial Perturbations (Ours)
UAPs crafted from CIFAR-10 Data Impressions.
UAPs crafted from Data Impressions achieve better fooling rates, outperforming those crafted from Class Impressions by at least 4.05%.
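A simplified sketch of crafting a UAP from data impressions: the perturbation is optimized to push the model's predictions away from its own clean predictions, under an L-infinity budget. This fooling objective and all hyperparameters are assumptions for illustration, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def craft_uap(model, di_loader, eps=10 / 255, epochs=20, lr=0.005):
    """Optimize a single universal perturbation v with ||v||_inf <= eps."""
    v = torch.zeros(1, 3, 32, 32, requires_grad=True)
    opt = torch.optim.Adam([v], lr=lr)
    model.eval()
    for _ in range(epochs):
        for x in di_loader:                       # batches of data impressions
            with torch.no_grad():
                clean_pred = model(x).argmax(dim=1)
            # Maximize the loss w.r.t. the model's own clean predictions.
            loss = -F.cross_entropy(model(x + v), clean_pred)
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                v.clamp_(-eps, eps)               # enforce the perturbation budget
    return v.detach()
```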
Summary
Recall
Nayak, G. K., Mopuri, K. R., Shaj, V., Babu, R. V., & Chakraborty, A. (2019). Zero-Shot Knowledge Distillation in Deep Networks. In ICML.
First work to demonstrate data-free KD.
Adversarial Belief Matching
[Figure from Micaelli et al.]
DAFL: GAN-based Generation
Making KD Scalable using GANs and Proxy Data
Addepalli, S., Nayak, G. K., Chakraborty, A., & Radhakrishnan, V. B. (2020). DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier. In AAAI.
Existing Approaches for Data-free KD
Broad ways (both start from the trained teacher model):
- Direct composition of synthetic data.
- Learning the training distribution via a GAN that can seed proxy samples.
Existing works:
- ZSKD [Nayak et al., ICML, 2019], DFKD [Lopes et al., NIPS Workshop, 2017]
- ZSKT [Micaelli et al., NeurIPS, 2019]
- DAFL [Chen et al., ICCV, 2019]
- DeGAN [Addepalli et al., AAAI, 2020]
Drawbacks:
- Several iterations of backpropagation.
- Complicated optimization requiring careful balancing of multiple losses.
In short, the existing works suffer from heavy computational overhead.
Observation
This motivates investigating arbitrary transfer sets for data-free KD.
Objective
Practical Importance
Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation
G. K. Nayak, K. R. Mopuri, and A. Chakraborty
Proposed Method: Motivation
DNNs often partition an arbitrary input domain into disproportionate classification regions (illustrated here with a CIFAR-10 teacher).
Proposed Method: Illustration
Arbitrary Transfer Sets: Unbalanced vs. Balanced
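A sketch of building a "target class-balanced" transfer set from arbitrary data: the teacher pseudo-labels each sample, and at most a fixed number of samples is kept per predicted class so that no teacher class is underrepresented. Function and variable names are illustrative:

```python
import torch
from collections import defaultdict

def balanced_transfer_set(teacher, arbitrary_loader, per_class):
    """Keep at most `per_class` samples per teacher-predicted class."""
    buckets = defaultdict(list)
    teacher.eval()
    with torch.no_grad():
        for x, _ in arbitrary_loader:        # labels of arbitrary data are ignored
            preds = teacher(x).argmax(dim=1)
            for xi, pi in zip(x, preds):
                k = int(pi)
                if len(buckets[k]) < per_class:
                    buckets[k].append(xi)
    return torch.stack([xi for b in buckets.values() for xi in b])
```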
Augmentation helps the Underrepresented classes
Comparison with state-of-the-art
Explicit Removal of Overlapping Classes (CIFAR-10)
DeGAN uses 45,000 CIFAR-100 samples, whereas we effectively utilize only 18,818 additional samples on top of SVHN.
Generality of the Proposed Strategy
Teacher model trained on: Binary-MNIST, Binary-FMNIST.
Summary
"Arbitrary" transfer sets:
For more details:
G. K. Nayak, K. R. Mopuri, A. Chakraborty, "Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation", WACV, 2021.
Video: https://www.youtube.com/watch?v=7qiLHdr1iLk
Conclusion
Three different approaches to perform knowledge distillation in the absence of training data:
- ZSKD method (no training samples; uses neither training data nor publicly available data): the first to solve the problem, but has scalability issues.
- DeGAN method (no training samples): scalable, but requires complicated GAN training.
- "Target class-balanced" transfer set method: computationally efficient, with competitive KD performance.
Concerns on Deployment of Trained Models (Revisited)
Given: a pretrained model and no training data.
Lightweight model → ZSKD, DeGAN, arbitrary transfer sets (target class-balanced).
Robust to adversarial attacks → ?
DAD: Data-free Adversarial Defense at Test Time
Gaurav Kumar Nayak, Ruchit Rawal, Anirban Chakraborty
Accepted at WACV 2022
Adversarial Vulnerability
Deep neural networks are highly susceptible to adversarial perturbations.
Legend: C.I. = clean image; A.I. = adversarial image; the trained model is non-robust.
Existing Approaches
Existing defenses assume access to clean training data, which may be private (e.g., patients' data, biometric data) or proprietary (e.g., Google's JFT-300M dataset).
Legend: C.T.D. = clean training data; A.I. = adversarial image; the trained model is non-robust; modules are marked as frozen or trainable.
Desired Objective
How can we make pretrained models robust against adversarial attacks in the absence of the original training data or its statistics?
Potential solutions (e.g., reconstructing proxy data using methods such as ZSKD [1], DeepInversion [2], DeGAN [3]) come with drawbacks.
[1] G. K. Nayak, K. R. Mopuri, V. Shaj, V. B. Radhakrishnan, and A. Chakraborty, "Zero-shot knowledge distillation in deep networks," in ICML, 2019.
[2] H. Yin, P. Molchanov, J. M. Alvarez, Z. Li, A. Mallya, D. Hoiem, N. K. Jha, and J. Kautz, "Dreaming to distill: Data-free knowledge transfer via DeepInversion," in CVPR, 2020.
[3] S. Addepalli, G. K. Nayak, A. Chakraborty, and R. V. Babu, "DeGAN: Data-Enriching GAN for retrieving representative samples from a trained classifier," in AAAI, 2020.
Proposed Approach
Test-time adversarial detection and subsequent correction on the input space (data) instead of the model.
Legend: C.I. = clean image; A.I. = adversarial image; the pretrained model is non-robust; LFC = low-frequency component.
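Since the correction operates on the input's low-frequency component, here is a sketch of extracting the LFC of an image batch by masking its FFT spectrum outside a given radius; this is a simplified stand-in for the frequency filtering used in DAD:

```python
import torch

def low_frequency_component(x, radius):
    """LFC of a batch x (N, C, H, W): zero out FFT coefficients
    farther than `radius` from the spectrum center."""
    _, _, H, W = x.shape
    fx = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    # Circular mask centered on the DC component.
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    mask = ((yy - H // 2) ** 2 + (xx - W // 2) ** 2 <= radius ** 2).to(x.dtype)
    fx = fx * mask
    return torch.fft.ifft2(torch.fft.ifftshift(fx, dim=(-2, -1))).real
```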
Detection Module
Motivation for Correction Module
Correction Module
At a particular radius, two quantities are computed from the low-frequency component of the input: a normalized discriminability score and a normalized adversarial contamination score. The corrected adversarial sample is the LFC of the input at the selected radius.
Optimal radius: the maximum radius at which the normalized discriminability score exceeds the normalized adversarial contamination score.
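A sketch of this radius selection, where `discriminability` and `contamination` are assumed callables returning the two normalized scores at a given radius (their exact definitions are in the paper and are not reproduced here):

```python
def select_optimal_radius(x, radii, discriminability, contamination):
    """Return the maximum radius at which discriminability still exceeds
    adversarial contamination; falls back to the smallest radius otherwise."""
    best = min(radii)
    for r in sorted(radii):
        if discriminability(x, r) > contamination(x, r):
            best = r
    return best

# Correction then replaces x by its LFC at the chosen radius:
# x_corrected = low_frequency_component(x, r_star)
```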
Performance of Proposed Detection Module
Performance of Proposed Correction Module
Effectiveness of Proposed Radius Selection
Performance of Combined Detection and Correction
Comparison with Data-Dependent Approaches
Conclusion
Project Website: https://sites.google.com/view/dad-wacv22
Thank you!
For any queries:
Email: gauravnayak@iisc.ac.in
LinkedIn: https://www.linkedin.com/in/gaurav-nayak-6227ba53/
Webpage: https://sites.google.com/view/gauravnayak/