Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability
Suraj Srinivas¹ & François Fleuret²
¹Idiap Research Institute & EPFL, ²University of Geneva
Saliency Maps for Model Interpretability
Deep Neural Network → Saliency Algorithm → Highlight important regions
Input-gradient Saliency
Simonyan et al., Deep inside convolutional networks: Visualising image classification models and saliency maps, 2013
[Figure: input x → neural network → saliency map S]
Input-gradient saliency computes the gradient of the network output with respect to the input: $S(x) = \nabla_x f(x)$.
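A minimal PyTorch-style sketch of this computation (all names here are placeholders, not from the slides):

```python
import torch

def input_gradient_saliency(model, x, target_class):
    """Saliency map S(x) = d f_y(x) / dx for the target logit f_y
    (Simonyan et al., 2013)."""
    x = x.detach().clone().requires_grad_(True)  # track gradients w.r.t. the input
    logits = model(x)                            # shape: (batch, num_classes)
    logits[:, target_class].sum().backward()     # populates x.grad
    return x.grad.detach()                       # same shape as the input
```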
Why are gradients highly structured anyway?
Gradient Structure is Arbitrary
Pre-softmax (logit) gradients can be arbitrary, even if the model generalizes perfectly! This also holds for post-softmax gradients (see paper for details).
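In brief (a reconstruction of the argument): adding any smooth function $g(x)$ to every logit leaves the softmax output, and hence every prediction, unchanged, while shifting every logit-gradient by $\nabla_x g(x)$:

$$\tilde{f}_y(x) = f_y(x) + g(x) \;\Longrightarrow\; \mathrm{softmax}(\tilde{f}(x)) = \mathrm{softmax}(f(x)), \qquad \nabla_x \tilde{f}_y(x) = \nabla_x f_y(x) + \nabla_x g(x).$$

Since $g$ is unconstrained, the logit-gradients can be made arbitrary without changing the classifier's behaviour.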
Gradient Structure is Arbitrary
[Figure: input image; logit-gradients of a standard model; logit-gradients of a model with "fooled" gradients]
Logit-gradients need not encode relevant information, yet in practice they do. Why?
Generative Models hidden within Discriminative Models
Implicit Density Models within Discriminative Models
Grathwohl et al., Your Classifier is Secretly an Energy-based Model and You Should Treat it Like One, ICLR 2020
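The construction, following Grathwohl et al.: the logits $f_y(x)$ of a classifier implicitly define a class-conditional density

$$p_\theta(x \mid y) = \frac{\exp(f_y(x))}{Z_y(\theta)}, \qquad Z_y(\theta) = \int \exp(f_y(x))\, dx,$$

and since $Z_y(\theta)$ does not depend on $x$, the logit-gradient is exactly the score of this implicit model: $\nabla_x f_y(x) = \nabla_x \log p_\theta(x \mid y)$.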
Hypothesis
Hypothesis: the structure of logit-gradients is due to their alignment with the gradients of the ground-truth log data density.
A concrete test: increasing gradient alignment must improve gradient interpretability, and decreasing it must deteriorate interpretability.
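One way to make "alignment" precise (notation ours, not from the slides) is the expected cosine similarity between the implicit model's score and the data score,

$$\mathrm{align}(\theta) = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\cos\angle\big(\nabla_x f_y(x),\; \nabla_x \log p_{\text{data}}(x \mid y)\big)\right],$$

and the test is whether gradient interpretability rises and falls with this quantity.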
Training Energy-based Models
Energy-based Generative Models
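For reference, the standard energy-based formulation:

$$p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)}, \qquad Z(\theta) = \int \exp(-E_\theta(x))\, dx.$$

The normalizer $Z(\theta)$ is intractable, but the score $\nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)$ does not involve it, which is what makes score-based training attractive.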
Score-Matching
Hyvärinen, Estimation of non-normalized statistical models by score matching, Journal of Machine Learning Research, 6(Apr):695–709, 2005
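Hyvärinen's objective matches the model score to the data score,

$$J(\theta) = \tfrac{1}{2}\, \mathbb{E}_{p_{\text{data}}}\!\left[ \big\lVert \nabla_x \log p_\theta(x) - \nabla_x \log p_{\text{data}}(x) \big\rVert^2 \right],$$

which, after integration by parts, equals (up to an additive constant)

$$\mathbb{E}_{p_{\text{data}}}\!\left[ \operatorname{tr}\!\big(\nabla_x^2 \log p_\theta(x)\big) + \tfrac{1}{2} \big\lVert \nabla_x \log p_\theta(x) \big\rVert^2 \right].$$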
Does not require access to the data score $\nabla_x \log p_{\text{data}}(x)$
Alignment of gradients is a generative modelling principle!
Regularized Score-Matching
Efficient estimation of the Hessian trace: Hutchinson's trick for the trace, with Hessian-vector products approximated by a first-order Taylor series (finite differences).
Regularization of the estimated Hessian trace during training.
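A minimal PyTorch-style sketch of both tricks (the name `score_fn`, assumed to return $\nabla_x \log p_\theta(x)$ for a batch of inputs, is a placeholder):

```python
import torch

def hessian_trace_estimate(score_fn, x, eps=1e-3, n_samples=1):
    """Estimate tr(H) per example, where H is the Jacobian of score_fn at x,
    i.e. the Hessian of log p_theta(x).

    Hutchinson's trick:  tr(H) = E_v[v^T H v]  with Rademacher probes v.
    Taylor series:       H v ~= (score_fn(x + eps*v) - score_fn(x)) / eps,
                         a first-order finite-difference Hessian-vector product.
    """
    g = score_fn(x)
    estimates = []
    for _ in range(n_samples):
        v = torch.randint_like(x, 2) * 2.0 - 1.0           # random +/-1 probe
        hvp = (score_fn(x + eps * v) - g) / eps            # approximate H v
        estimates.append((v * hvp).flatten(1).sum(dim=1))  # v^T H v per example
    return torch.stack(estimates).mean(dim=0)              # average over probes
```

The resulting trace estimate can be plugged into the score-matching objective and penalized as a regularizer.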
Interpretability vs Generative Modelling
| Interpretability | Generative Modelling |
| --- | --- |
| Logit-gradients | Gradient of log p(x) |
| "Deep dream" visualization by activation maximization | MCMC sampling by Langevin dynamics |
| Pixel perturbation test | Density ratio test |
Simonyan et al., Deep inside convolutional networks: Visualising image classification models and saliency maps, 2013
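Concretely, the middle row's correspondence: Langevin dynamics performs noisy gradient ascent on the implicit log-density,

$$x_{t+1} = x_t + \tfrac{\alpha}{2}\, \nabla_x \log p_\theta(x_t \mid y) + \sqrt{\alpha}\, \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, I),$$

while activation maximization is the noise-free version: gradient ascent on the logit $f_y(x)$, which by the identity above is the same vector field.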
Experiments
We compare generative capabilities and gradient interpretability across different models
Effect on Generative Modelling
Shmelkov et al., How good is my GAN?, ECCV 2018
Effect on Gradient Interpretability
This confirms our hypothesis that the implicit density modelling influences gradient interpretability.
Conclusion
Logit-gradients are structured because discriminative models hide an implicit density model of the data; strengthening that density model (e.g., with score-matching regularization) improves both generative quality and gradient interpretability, while weakening it degrades both.