Interpreting Machine Learning Models: State-of-the-Art, Challenges, Opportunities
Hima Lakkaraju
Schedule for Today
Motivation
3
Machine Learning is EVERYWHERE!!
[ Weller 2017 ]
Is Model Understanding Needed Everywhere?
4
[ Weller 2017 ]
When and Why Model Understanding?
5
[ Weller 2017, Lipton 2017, Doshi-Velez and Kim 2016 ]
When and Why Model Understanding?
ML is increasingly being employed in complex high-stakes settings
6
When and Why Model Understanding?
9
Model understanding becomes critical when:
Example: Why Model Understanding?
10
Input → Predictive Model → Prediction = Siberian Husky
Model Understanding
This model is relying on incorrect features to make this prediction!! Let me fix the model
Model understanding facilitates debugging.
Example: Why Model Understanding?
11
Defendant Details → Predictive Model → Prediction = Risky to Release
Model Understanding
Race
Crimes
Gender
This prediction is biased. Race and gender are being used to make the prediction!!
Model understanding facilitates bias detection.
[ Larson et. al. 2016 ]
Example: Why Model Understanding?
12
Loan Applicant Details → Predictive Model → Prediction = Denied Loan
Model Understanding
Increase salary by 50K + pay credit card bills on time for next 3 months to get a loan
Loan Applicant
I have some means for recourse. Let me go and work on my promotion and pay my bills on time.
Model understanding helps provide recourse to individuals who are adversely affected by model predictions.
Example: Why Model Understanding?
13
Patient Data → Predictive Model
Model Understanding
This model is using irrelevant features when predicting on female subpopulation. I should not trust its predictions for that group.
Patient records (e.g., 25, Female, Cold; 32, Male, No symptoms; 31, Male, Cough; …) and the model's predictions (Healthy / Sick / …).
The model's learned rules:
If gender = female: if ID_num > 200, then sick
If gender = male: if cold = true and cough = true, then sick
Model understanding helps assess if and when to trust model predictions when making decisions.
Example: Why Model Understanding?
14
Patient Data → Predictive Model
Model Understanding
This model is using irrelevant features when predicting on female subpopulation. This cannot be approved!
Same patient records, model predictions, and learned rules as in the previous example.
Model understanding allows us to vet models to determine if they are suitable for deployment in real world.
Summary: Why Model Understanding?
15
Debugging
Bias Detection
Recourse
If and when to trust model predictions
Vet models to assess suitability for deployment
Utility
End users (e.g., loan applicants)
Decision makers (e.g., doctors, judges)
Regulatory agencies (e.g., FDA, European commission)
Researchers and engineers
Stakeholders
Achieving Model Understanding
Take 1: Build inherently interpretable predictive models
16
[ Letham and Rudin 2015; Lakkaraju et. al. 2016 ]
Achieving Model Understanding
Take 2: Explain pre-built models in a post-hoc manner
17
Explainer
Inherently Interpretable Models vs.
Post hoc Explanations
In certain settings, accuracy–interpretability trade-offs may exist.
18
Example
[ Cireşan et. al. 2012, Caruana et. al. 2006, Frosst et. al. 2017, Stewart 2020 ]
Inherently Interpretable Models vs.
Post hoc Explanations
19
Complex models might achieve higher accuracy.
One can also build interpretable + accurate models.
Inherently Interpretable Models vs.
Post hoc Explanations
Sometimes, you don’t have enough data to build your model from scratch.
And, all you have is a (proprietary) black box!
20
Inherently Interpretable Models vs.
Post hoc Explanations
21
If you can build an interpretable model which is also adequately accurate for your setting, DO IT!
Otherwise, post hoc explanations come to the rescue!
Agenda
22
Agenda
23
Inherently Interpretable Models
Inherently Interpretable Models
Bayesian Rule Lists
[Letham et. al. 2016]
Bayesian Rule Lists
27
Bayesian Rule Lists: Generative Model
28
The candidate rules are drawn from a set of pre-mined antecedents.
Model parameters are inferred using the Metropolis–Hastings algorithm, a Markov Chain Monte Carlo (MCMC) sampling method.
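To make the structure concrete, here is a minimal sketch of how a learned rule list issues a prediction; the antecedents, their ordering, and the per-rule probabilities are hypothetical placeholders, not the output of Letham et al.'s implementation.

```python
# Minimal sketch: prediction with a learned Bayesian rule list.
# Antecedents, ordering, and probabilities below are hypothetical placeholders.

def predict_proba(x):
    """Return P(y = 1 | x) by walking an ordered list of pre-mined antecedents."""
    if x["age"] < 25 and x["priors"] > 2:      # antecedent 1
        return 0.85
    elif x["priors"] > 5:                      # antecedent 2
        return 0.78
    elif x["age"] > 50:                        # antecedent 3
        return 0.10
    else:                                      # default rule
        return 0.30

print(predict_proba({"age": 22, "priors": 3}))  # -> 0.85
```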
Pre-mined Antecedents
29
Interpretable Decision Sets
[Lakkaraju et. al. 2016]
Interpretable Decision Sets: Desiderata
IDS: Objective Function
32–37
IDS: Optimization Procedure
38
Inherently Interpretable Models
Risk Scores: Motivation
40
Risk Scores: Examples
41
[Ustun and Rudin, 2016]
Objective function to learn risk scores
42
The above objective turns out to be a mixed-integer program, and is optimized using a cutting-plane method combined with a branch-and-bound technique.
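For reference, the learning problem takes roughly the following form (a paraphrase of Ustun & Rudin's formulation, not their exact notation): choose small integer coefficients λ that trade off logistic loss against sparsity,

$$\min_{\lambda \in \mathcal{L}} \;\; \frac{1}{n}\sum_{i=1}^{n} \log\!\Big(1 + \exp\big(-y_i\,\lambda^{\top}x_i\big)\Big) \;+\; C_0\,\lVert \lambda \rVert_0, \qquad \mathcal{L} \subset \mathbb{Z}^{d+1} \ \text{(small integers)}.$$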
Inherently Interpretable Models
Generalized Additive Models (GAMs)
44
[Lou et. al., 2012; Caruana et. al., 2015]
Formulation and Characteristics of GAMs
45
g is a link function, e.g., the identity function in the case of regression, or the logit log(p / (1 − p)) in the case of classification;
each fi is a shape function
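Putting these pieces together (with an intercept β0), a GAM has the standard form

$$g\big(\mathbb{E}[y]\big) \;=\; \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p),$$

and GA²Ms additionally include pairwise interaction terms $f_{ij}(x_i, x_j)$.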
GAMs and GA2Ms
GAMs and GA2Ms
47
Inherently Interpretable Models
Prototype Selection for Interpretable Classification
[Bien et. al., 2012]
Prototype Selection for Interpretable Classification
Prototype Layers in Deep Learning Models
[Li et. al. 2017, Chen et. al. 2019]
Inherently Interpretable Models
Attention Layers in Deep Learning Models
[Bahdanau et. al. 2016; Xu et. al. 2015]
Diagram: translating "I am Bob" → "Je suis Bob". Input → Encoder (hidden states h1, h2, h3) → a single Context Vector C → Decoder (states s1, s2, s3) → Outputs.
Attention Layers in Deep Learning Models
[Bahdanau et. al. 2016; Xu et. al. 2015]
Diagram: the same encoder–decoder, but with attention each decoder step gets its own context vector (c1, c2, c3), computed as a weighted combination of the encoder hidden states h1, h2, h3.
Attention Layers in Deep Learning Models
Inherently Interpretable Models
Agenda
61
What is an Explanation?
62
What is an Explanation?
Definition: Interpretable description of the model behavior
63
Classifier
User
Explanation
Faithful
Understandable
What is an Explanation?
Definition: Interpretable description of the model behavior
64
Summarize with a program/rule/tree
Classifier
User
Send all the model parameters θ?
Send many example predictions?
Select most important features/points
Describe how to flip the model prediction
...
[ Lipton 2016 ]
Local Explanations vs. Global Explanations
65
Explain individual predictions
Explain complete behavior of the model
Help unearth biases in the local neighborhood of a given instance
Help shed light on big picture biases affecting larger subgroups
Help vet if individual predictions are being made for the right reasons
Help vet if the model, at a high level, is suitable for deployment
Approaches for Post hoc Explainability
Local Explanations
66
Global Explanations
Approaches for Post hoc Explainability
Local Explanations
67
Global Explanations
LIME: Local Interpretable Model-Agnostic Explanations
68–72
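A minimal sketch of the LIME idea for tabular data: perturb the instance, query the black box, weight samples by proximity, and fit a regularized linear surrogate. It uses scikit-learn and a hypothetical `predict_proba` callable; it illustrates the procedure, not the official `lime` package.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_tabular(x, predict_proba, n_samples=5000, kernel_width=0.75):
    """Explain predict_proba at x with a locally weighted linear surrogate."""
    d = len(x)
    # 1. Perturb the instance with Gaussian noise (assumes standardized features).
    X_pert = x + np.random.normal(0, 1, size=(n_samples, d))
    # 2. Query the black box on the perturbed points.
    y_pert = predict_proba(X_pert)                      # shape (n_samples,)
    # 3. Weight samples by proximity to x (exponential kernel).
    dist = np.linalg.norm(X_pert - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # 4. Fit a weighted, regularized linear model as the local surrogate.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(X_pert, y_pert, sample_weight=weights)
    return surrogate.coef_                              # local feature importances

# Usage with a hypothetical black box:
# importances = lime_tabular(x_test, lambda X: model.predict_proba(X)[:, 1])
```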
Predict Wolf vs Husky
73
Only 1 mistake!
Predict Wolf vs Husky
74
We’ve built a great snow detector…
SHAP: Shapley Values as Importance
Marginal contribution of each feature towards the prediction,
averaged over all possible permutations.
Attributes the prediction to each of the features.
75
Illustration: with a coalition O of features that includes xi, P(y) = 0.9; removing xi (coalition O \ xi), P(y) = 0.8; so the marginal contribution of xi is M(xi, O) = 0.9 − 0.8 = 0.1.
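A minimal Monte-Carlo sketch of this definition: sample random feature orderings instead of enumerating all permutations, and fill in "absent" features from a background instance. `predict` and `background` are hypothetical inputs, not part of any official SHAP API.

```python
import numpy as np

def shapley_value(predict, x, background, feature, n_perm=200, rng=None):
    """Estimate the Shapley value of one feature for instance x.

    Marginal contribution of `feature`, averaged over random feature orderings;
    features "absent" from a coalition take their values from `background`.
    """
    rng = np.random.default_rng(rng)
    d = len(x)
    total = 0.0
    for _ in range(n_perm):
        order = rng.permutation(d)
        pos = np.where(order == feature)[0][0]
        with_f = background.copy()
        with_f[order[: pos + 1]] = x[order[: pos + 1]]   # coalition including the feature
        without_f = background.copy()
        without_f[order[:pos]] = x[order[:pos]]          # same coalition without it
        total += predict(with_f) - predict(without_f)    # marginal contribution
    return total / n_perm
```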
Approaches for Post hoc Explainability
Local Explanations
76
Global Explanations
Anchors
Salary Prediction
Approaches for Post hoc Explainability
Local Explanations
79
Global Explanations
Saliency Map Overview
80–82
Input → Model → Prediction: Junco Bird
What parts of the input are most relevant for the model's prediction: 'Junco Bird'?
Modern DNN Setting
83
Input → Model → Predictions (Junco Bird); the explanation targets the class-specific logit.
Input-Gradient
84–86
Input → Model → Prediction: Junco Bird
Input-Gradient = ∂(class-specific logit) / ∂(input): it has the same dimension as the input and can be visualized as a heatmap.
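A minimal PyTorch sketch of the input-gradient, assuming a hypothetical `model` that maps a batched input to class logits.

```python
import torch

def input_gradient(model, x, class_idx):
    """Gradient of the class-specific logit w.r.t. the input (same shape as x)."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x.unsqueeze(0))          # add batch dimension
    logits[0, class_idx].backward()         # backprop from the target logit
    return x.grad.detach()                  # visualize, e.g., |grad| as a heatmap

# saliency = input_gradient(model, image_tensor, class_idx=junco_class)
```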
Challenges
SmoothGrad
87–88
Input → Model → Prediction: Junco Bird
SmoothGrad: average the input-gradients of several 'noisy' copies of the input (Gaussian noise added to the input).
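A minimal PyTorch sketch of SmoothGrad under the same assumptions (hypothetical `model` returning logits); the noise level `sigma` should be set relative to the input's value range.

```python
import torch

def smoothgrad(model, x, class_idx, n_samples=25, sigma=0.15):
    """Average input-gradients over Gaussian-noised copies of the input."""
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        logits = model(noisy.unsqueeze(0))
        logits[0, class_idx].backward()
        grads += noisy.grad.detach()
    return grads / n_samples
```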
Integrated Gradients
89–90
Input → Model → Prediction: Junco Bird
Integrated Gradients: path integral ('sum') of gradients interpolated between a baseline input and the actual input.
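A minimal PyTorch sketch of the Riemann-sum approximation to Integrated Gradients; `baseline` is typically an all-zeros (black) image, and `model` is again a hypothetical logit-producing classifier.

```python
import torch

def integrated_gradients(model, x, baseline, class_idx, steps=50):
    """Riemann-sum approximation of the path integral of gradients along the
    straight line from `baseline` to `x`."""
    total = torch.zeros_like(x)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        logits = model(point.unsqueeze(0))
        logits[0, class_idx].backward()
        total += point.grad.detach()
    return (x - baseline) * total / steps   # scale by the input difference
```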
Gradient-Input
91–92
Input → Model → Prediction: Junco Bird
Gradient-Input: element-wise product of the input-gradient (gradient of the class logit) and the input.
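And the corresponding one-step sketch for Gradient ⊙ Input (same hypothetical `model`):

```python
import torch

def gradient_times_input(model, x, class_idx):
    """Element-wise product of the input-gradient and the input itself."""
    x = x.clone().detach().requires_grad_(True)
    model(x.unsqueeze(0))[0, class_idx].backward()
    return (x.grad * x).detach()
```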
Approaches for Post hoc Explainability
Local Explanations
93
Global Explanations
Prototypes/Example Based Post hoc Explanations
94
Use examples (synthetic or natural) to explain individual predictions
Training Point Ranking via Influence Functions
Which training data points have the most ‘influence’ on the test loss?
95
Input
Model
Predictions
Junco Bird
Training Point Ranking via Influence Functions
Which training data points have the most ‘influence’ on the test loss?
96
Input
Model
Predictions
Junco Bird
Training Point Ranking via Influence Functions
Influence Function: classic tool used in robust statistics for assessing the effect of a sample on regression parameters (Cook & Weisberg, 1980).
Instead of refitting the model for every data point, Cook's distance provides an analytical alternative.
97
Training Point Ranking via Influence Functions
Koh & Liang (2017) extend the 'Cook's distance' insight to the modern machine learning setting.
98
Training sample point
Training Point Ranking via Influence Functions
Koh & Liang (2017) extend the 'Cook's distance' insight to the modern machine learning setting.
99
Training sample point
ERM Solution
UpWeighted ERM Solution
Training Point Ranking via Influence Functions
Koh & Liang (2017) extend the 'Cook's distance' insight to the modern machine learning setting.
100
Training sample point
ERM Solution
UpWeighted ERM Solution
Influence of Training Point on Parameters
Influence of Training Point on Test-Input’s loss
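The resulting closed-form expression (as derived by Koh & Liang, up to sign and scaling conventions), where $\hat\theta$ is the ERM solution, $H_{\hat\theta}$ is the Hessian of the average training loss at $\hat\theta$, and $L$ is the loss:

$$\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) \;=\; -\,\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}\, H_{\hat{\theta}}^{-1}\, \nabla_{\theta} L(z, \hat{\theta})$$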
Training Point Ranking via Influence Functions
Applications:
101
[ Koh & Liang 2017 ]
Challenges and Other Approaches
Influence function Challenges:
102
Challenges and Other Approaches
Influence function Challenges:
Alternatives:
103
Activation Maximization
These approaches identify examples, synthetic or natural, that strongly activate a function (neuron) of interest.
104
Activation Maximization
These approaches identify examples, synthetic or natural, that strongly activate a function (neuron) of interest.
Implementation Flavors:
105
Feature Visualization
106
Approaches for Post hoc Explainability
Local Explanations
107
Global Explanations
Counterfactual Explanations
108
What features need to be changed and by how much to flip a model’s prediction?
[Goyal et. al., 2019]
Counterfactual Explanations
As ML models are increasingly deployed to make high-stakes decisions (e.g., on loan applications), it becomes important to provide recourse to affected individuals.
109
Counterfactual Explanations
What features need to be changed, and by how much, to flip a model's prediction (i.e., to reverse an unfavorable outcome)?
Counterfactual Explanations
110
Predictive
Model
Deny Loan
Loan Application
Recourse: Increase your salary by 5K & pay your credit card bills on time for next 3 months
f(x)
Applicant
Counterfactual Generation Algorithm
Recourse
Generating Counterfactual Explanations: Intuition
111
Proposed solutions differ on:
how they search for candidate counterfactuals;
what they assume about the underlying predictive model.
Take 1: Minimum Distance Counterfactuals
112
Distance Metric
Predictive Model
Desired Outcome
Original Instance
Counterfactual
Choice of distance metric dictates what kinds of counterfactuals are chosen.
Wachter et al. use a normalized Manhattan distance.
Take 1: Minimum Distance Counterfactuals
113
Wachter et al. solve a differentiable, unconstrained version of the objective using the Adam optimization algorithm with random restarts.
This method requires access to gradients of the underlying predictive model.
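Paraphrasing Wachter et al.'s formulation, the counterfactual $x'$ for instance $x$ with desired outcome $y'$ solves

$$\arg\min_{x'} \;\max_{\lambda}\;\; \lambda\,\big(f(x') - y'\big)^{2} \;+\; d(x, x'),$$

where $d$ is the (MAD-normalized) Manhattan distance and $\lambda$ is increased until the prediction constraint $f(x') \approx y'$ is satisfied.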
Take 1: Minimum Distance Counterfactuals
114
Not feasible to act upon these features!
Take 2: Feasible and Least Cost Counterfactuals
115
Take 2: Feasible and Least Cost Counterfactuals
116
Take 2: Feasible and Least Cost Counterfactuals
117
Question: What if we have a black box or a non-linear classifier?
Answer: generate a local linear approximation of the model (e.g., using LIME) and then apply Ustun et al.'s framework.
Take 2: Feasible and Least Cost Counterfactuals
118
Changing one feature without affecting another might not be possible!
Take 3: Causally Feasible Counterfactuals
119
After 1 year
Recourse: reduce current debt from $3,250 to $1,000.
My current debt has come down to $1,000. Please give me the loan.
Loan Applicant
f(x)
Your age has increased by 1 year and the recourse is no longer valid! Sorry!
Important to account for feature interactions when generating counterfactuals!
But how?!
Loan Applicant
Predictive Model
Take 3: Causally Feasible Counterfactuals
120
The search is restricted to the set of causally feasible counterfactuals permitted by a given Structural Causal Model (SCM).
Question: What if we don’t have access to the structural causal model?
Counterfactuals on Data Manifold
121
[ Verma et. al., 2020, Pawelczyk et. al., 2020]
Approaches for Post hoc Explainability
Local Explanations
122
Global Explanations
Global Explanations
123
Local vs. Global Explanations
124
Explain individual predictions
Help unearth biases in the local neighborhood of a given instance
Help vet if individual predictions are being made for the right reasons
Explain complete behavior of the model
Help shed light on big picture biases affecting larger subgroups
Help vet if the model, at a high level, is suitable for deployment
Approaches for Post hoc Explainability
Local Explanations
125
Global Explanations
Global Explanation as a Collection of Local Explanations
How to generate a global explanation of a (black box) model?
126
What local explanation technique to use?
How to choose the subset of k local explanations?
Global Explanations from Local Feature Importances: SP-LIME
127
LIME explains a single prediction
local behavior for a single instance
Can’t examine all explanations
Instead pick k explanations to show to the user
Diverse
Should not be redundant in their descriptions
Representative
Should summarize the model’s global behavior
Single explanation
SP-LIME uses submodular optimization and greedily picks k explanations
Model Agnostic
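A minimal sketch of the submodular-pick idea: greedily select the k local explanations whose important features jointly cover as much global (importance-weighted) feature mass as possible. `W` is a hypothetical matrix of per-instance feature importances, not the output of the official implementation.

```python
import numpy as np

def submodular_pick(W, k):
    """Greedily pick k rows (explanations) that maximize weighted feature coverage.

    W[i, j] = |importance of feature j in local explanation i|.
    """
    global_importance = np.sqrt(np.abs(W).sum(axis=0))   # importance of each feature
    chosen = []
    covered = np.zeros(W.shape[1], dtype=bool)
    for _ in range(k):
        best, best_gain = None, -1.0
        for i in range(W.shape[0]):
            if i in chosen:
                continue
            new_cover = covered | (np.abs(W[i]) > 0)
            gain = global_importance[new_cover].sum() - global_importance[covered].sum()
            if gain > best_gain:                          # coverage gain of adding explanation i
                best, best_gain = i, gain
        chosen.append(best)
        covered |= np.abs(W[best]) > 0
    return chosen
```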
Global Explanations from Local Rule Sets: SP-Anchor
128
Approaches for Post hoc Explainability
Local Explanations
129
Global Explanations
Representation Based Approaches
130
Representation Based Explanations
131
[Kim et. al., 2018]
Zebra
(0.97)
How important is the notion of “stripes” for this prediction?
Representation Based Explanations: TCAV
132
Examples of the concept “stripes”
Random examples
Train a linear classifier to separate the activations of the concept examples from those of the random examples.
The vector orthogonal to the decision boundary pointing towards the “stripes” class quantifies the concept “stripes”
Compute directional derivatives along this vector to determine the importance of the concept "stripes" for any given prediction.
Quantitative Testing with Concept Activation Vectors (TCAV)
TCAV measures the sensitivity of a model's predictions to a user-provided concept using the model's internal representations.
133
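A minimal sketch of the CAV and TCAV-score computation, assuming we already have layer activations for concept and random examples and per-example gradients of the class logit with respect to that layer (all hypothetical inputs).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Linear classifier separating concept vs. random activations; the CAV is
    the vector normal to its decision boundary, pointing toward the concept."""
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return cav / np.linalg.norm(cav)

def tcav_score(logit_grads_wrt_layer, cav):
    """Fraction of class examples whose directional derivative along the CAV is positive."""
    sensitivities = logit_grads_wrt_layer @ cav
    return float((sensitivities > 0).mean())
```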
Approaches for Post hoc Explainability
Local Explanations
134
Global Explanations
Model Distillation for Generating Global Explanations
135
Diagram: Data (v1, v2, …; v11, v12, …) → Predictive Model f(x) → Model Predictions (Label 1, Label 1, …, Label 2, …) → Explainer: a simpler, interpretable model optimized to mimic the model's predictions.
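A minimal distillation sketch: query a hypothetical black-box `predict` callable on the data and fit a shallow decision tree to mimic its labels, reporting fidelity (agreement with the black box) rather than accuracy on ground truth.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def distill(black_box_predict, X, max_depth=3):
    """Fit an interpretable surrogate to mimic the black box's predictions."""
    y_hat = black_box_predict(X)                       # black-box labels, not ground truth
    surrogate = DecisionTreeClassifier(max_depth=max_depth).fit(X, y_hat)
    fidelity = (surrogate.predict(X) == y_hat).mean()  # how well the surrogate mimics it
    return surrogate, fidelity

# surrogate, fidelity = distill(model.predict, X_train)   # hypothetical black box
# print(export_text(surrogate))
```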
Generalized Additive Models as Global Explanations
136
Diagram: Data → Black Box Model → Model Predictions → Explainer (a GAM fit to the black box's predictions).
[Tan et. al., 2019]
Model Agnostic
Generalized Additive Models as Global Explanations: Shape Functions for Predicting Bike Demand
137
[Tan et. al., 2019]
Generalized Additive Models as Global Explanations: Shape Functions for Predicting Bike Demand
How does bike demand vary as a function of temperature?
138
[Tan et. al., 2019]
Generalized Additive Models as Global Explanations
Generalized Additive Model (GAM) :
139
[Tan et. al., 2019]
$$\hat{y} \;=\; \beta_0 \;+\; \sum_i f_i(x_i) \;+\; \sum_{i \neq j} f_{ij}(x_i, x_j)$$
(shape functions of individual features + higher-order feature interaction terms)
Fit this model to the predictions of the black box to obtain the shape functions.
Decision Trees as Global Explanations
140
Diagram: Data → Black Box Model → Model Predictions → Explainer (a decision tree fit to the black box's predictions).
[ Bastani et. al., 2019 ]
Model Agnostic
Customizable Decision Sets as Global Explanations
141
Diagram: Data → Black Box Model → Model Predictions → Explainer (a decision set fit to the black box's predictions).
Model Agnostic
Customizable Decision Sets as Global Explanations
142
Subgroup Descriptor
Decision Logic
Customizable Decision Sets as Global Explanations
143
Explain how the model behaves across patient subgroups with different values of smoking and exercise
Customizable Decision Sets as Global Explanations:
Desiderata & Optimization Problem
144
Fidelity: describe model behavior accurately → minimize the number of instances for which the explanation's label ≠ the model's prediction.
Unambiguity: no contradicting explanations → minimize the number of duplicate rules applicable to each instance.
Simplicity: users should be able to look at the explanation and reason about model behavior → minimize the number of conditions in rules; constraints on the number of rules & subgroups.
Customizability: users should be able to understand model behavior across various subgroups of interest → outer rules only comprise features of user interest (candidate set restricted).
Customizable Decision Sets as Global Explanations
145
Approaches for Post hoc Explainability
Local Explanations
146
Global Explanations
Counterfactual Explanations
147
Diagram: denied loan applications → Predictive Model f(x) → Counterfactual Generation Algorithm → Recourses.
Decision maker (or regulatory authority): How do the recourses permitted by the model vary across racial and gender subgroups? Are there any biases against certain demographics?
Customizable Global Summaries of Counterfactuals
148
Diagram: denied loan applications → Predictive Model f(x) → Algorithm for generating global summaries of counterfactuals.
How do recourses permitted by the model vary across various racial & gender subgroups?
Are there any biases against certain demographics?
Customizable Global Summaries of Counterfactuals
149
Omg! This model is biased. It requires certain demographics to "act upon" a lot more features than others.
Subgroup Descriptor
Recourse Rules
Customizable Global Summaries of Counterfactuals:
Desiderata & Optimization Problem
150
Recourse Correctness: prescribed recourses should obtain the desired outcomes → minimize the number of applicants for whom the prescribed recourse does not lead to the desired outcome.
Recourse Coverage: (almost all) applicants should be provided with recourses → minimize the number of applicants for whom no recourse exists (i.e., who satisfy no rule).
Minimal Recourse Costs: acting upon a prescribed recourse should not be impractical or terribly expensive → minimize total feature costs as well as the magnitude of changes in feature values.
Interpretability of Summaries: summaries should be readily understandable to stakeholders (e.g., decision makers / regulatory authorities) → constraints on the number of rules, the number of conditions per rule, and the number of subgroups.
Customizability: stakeholders should be able to understand model behavior across various subgroups of interest → outer rules only comprise features of stakeholder interest (candidate set restricted).
Customizable Global Summaries of Counterfactuals
151
Breakout Groups
Agenda
153
Evaluating Model Interpretations/Explanations
Evaluating Interpretability
155
Evaluating Interpretability
Evaluating Inherently Interpretable Models
Evaluating Bayesian Rule Lists
[Letham et. al. 2016]
Evaluating Interpretable Decision Sets
[Lakkaraju et. al. 2016]
Evaluating Interpretability of Bayesian Rule Lists and Interpretable Decision Sets
160
Interface for Objective Questions
161
Interface for Descriptive Questions
162
User Study Results
163
| Task | Metric | Our Approach (IDS) | Bayesian Decision Lists |
| --- | --- | --- | --- |
| Descriptive | Human Accuracy | 0.81 | 0.17 |
| Descriptive | Avg. Time Spent (secs.) | 113.4 | 396.86 |
| Descriptive | Avg. # of Words | 31.11 | 120.57 |
| Objective | Human Accuracy | 0.97 | 0.82 |
| Objective | Avg. Time Spent (secs.) | 28.18 | 36.34 |
Objective Questions: 17% more accurate, 22% faster;
Descriptive Questions: 74% fewer words, 71% faster.
Evaluating Prototype and Attention Layers
[Jain and Wallace, 2019]
Is attention a faithful explanation? No!!
Evaluating Post hoc Explanations
[Agarwal et. al., 2022]
Evaluating Faithfulness of Post hoc Explanations – Ground Truth
166
Evaluating Faithfulness of Post hoc Explanations – Ground Truth
167
Spearman rank correlation coefficient computed over features of interest
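A minimal sketch of this check using SciPy: compare the explanation's feature-importance ranking against the known ground-truth importances (both hypothetical arrays here).

```python
from scipy.stats import spearmanr

def faithfulness(explanation_importances, ground_truth_importances):
    """Rank agreement between explanation and ground-truth feature importances."""
    rho, _ = spearmanr(explanation_importances, ground_truth_importances)
    return rho

# e.g., faithfulness([0.4, 0.1, 0.5], [0.35, 0.05, 0.6]) -> 1.0 (same ranking)
```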
Evaluating Faithfulness of Post hoc Explanations – Explanations as Models
168
Evaluating Faithfulness of Post hoc Explanations
How important are selected features?
170–180
Deletion: remove the most important pixels (as ranked by the explanation) and track how the prediction probability drops as the % of pixels deleted grows.
Insertion: add the most important pixels back and track how the prediction probability rises as the % of pixels inserted grows.
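A minimal sketch of the deletion variant (insertion is symmetric): rank pixels by attributed importance, remove them in order, and record how the model's prediction probability degrades. `model_proba`, the fill value, and the array shapes are assumptions, not a specific library API.

```python
import numpy as np

def deletion_curve(model_proba, image, saliency, class_idx, steps=20, fill=0.0):
    """Prediction probability vs. fraction of top-salient pixels deleted."""
    order = np.argsort(-np.abs(saliency).ravel())        # most important pixels first
    fractions = np.linspace(0, 1, steps + 1)
    probs = []
    for frac in fractions:
        x = image.copy().ravel()
        x[order[: int(frac * len(order))]] = fill        # delete the top-k pixels
        probs.append(model_proba(x.reshape(image.shape))[class_idx])
    return fractions, np.array(probs)                    # a faithful map drops fast
```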
Evaluating Stability of Post hoc Explanations
[Alvarez-Melis, 2018; Agarwal et. al., 2022]
Local Lipschitz constant of a post hoc explanation E at input x_i (maximized over inputs x_j in a small neighborhood of x_i):
$$\hat{L}(x_i) \;=\; \max_{x_j \in B_\epsilon(x_i)} \frac{\lVert E(x_i) - E(x_j) \rVert_2}{\lVert x_i - x_j \rVert_2}$$
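A minimal Monte-Carlo sketch of estimating this quantity; `explain` is a hypothetical callable (e.g., a wrapper around LIME or a gradient method) that maps an input to an importance vector.

```python
import numpy as np

def local_lipschitz(explain, x, eps=0.1, n_samples=50, rng=None):
    """Monte-Carlo estimate of the local Lipschitz constant of an explanation
    function around input x (larger = less stable)."""
    rng = np.random.default_rng(rng)
    e_x = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        x_j = x + rng.uniform(-eps, eps, size=x.shape)   # sample within the eps-ball
        ratio = np.linalg.norm(explain(x_j) - e_x) / np.linalg.norm(x_j - x)
        worst = max(worst, ratio)
    return worst
```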
Evaluating Stability of Post hoc Explanations
[Agarwal et. al., 2022]
Evaluating Fairness of Post hoc Explanations
[Dai et. al., 2022]
Evaluating Interpretability of Post hoc Explanations
184
Predicting Behavior (“Simulation”)
185
Data → Classifier → Predictions & Explanations → shown to the user.
On new data, the user guesses what the classifier would do; compare the user's guesses against the classifier's actual predictions (accuracy).
Predicting Behavior (“Simulation”)
186
Human-AI Collaboration
187
Human-AI Collaboration
188
https://machineintheloop.com/deception/
Can we improve the accuracy of decisions using feature attribution-based explanations?
189
Lakkaraju et. al., 2022
Can we improve the accuracy of decisions using feature attribution-based explanations?
190
Figure: decision accuracies of 78.32%, 93.11%, and 82.02% across conditions (model accuracy: 88.92%); model output: At Risk (0.91); important features shown: ESR, Family Risk, Chronic Health Conditions.
Can we improve the accuracy of decisions using feature attribution-based explanations?
191
Figure: decision accuracies of 78.32%, 82.02%, and 93.11% across conditions (model accuracy: 88.92%); model output: At Risk (0.91); important features shown: Appointment time, Appointment day, Zip code, Doctor ID > 150.
Challenges of Evaluating Interpretable Models/Post hoc Explanation Methods
192
Open Source Tools for Quantitative Evaluation
Agenda
194
Empirically Analyzing Interpretations/Explanations
Limitations: Faithfulness
196–198
Gradient ⊙ Input, Guided Backprop, Guided GradCAM
Model parameter randomization test: do these explanations change when the model's parameters are randomized? No!!
Limitations: Faithfulness
199
Randomizing class labels of instances also didn’t impact explanations!
Limitations: Stability
200
Are post-hoc explanations unstable w.r.t. small, non-adversarial perturbations of the input, the model, or the hyperparameters?
Local Lipschitz constant (x: input; E: explanation function, e.g., LIME, SHAP, or a gradient method):
$$\hat{L}(x_i) \;=\; \max_{x_j \in B_\epsilon(x_i)} \frac{\lVert E(x_i) - E(x_j) \rVert_2}{\lVert x_i - x_j \rVert_2}$$
Limitations: Stability
201
Estimates over 100 test inputs for an MNIST model.
Are post-hoc explanations unstable wrt small non-adversarial input perturbation?
Limitations: Stability – Problem is Worse!
202
[Slack et. al., 2020]
Many = 250 perturbations; Few = 25 perturbations;
When you repeatedly run LIME on the same instance, you get different explanations (blue region)
Problem with having too few perturbations? If so, what is the optimal number of perturbations?
203
Post-hoc Explanations are Fragile
204–207
Post-hoc explanations can be easily manipulated.
Adversarial Attacks on Explanations
208–210
Adversarial attack of Ghorbani et al. 2018: minimally modify the input with a small perturbation, without changing the model prediction.
Scaffolding attack used to hide classifier dependence on gender.
Adversarial Classifiers to fool LIME & SHAP
Vulnerabilities of LIME/SHAP: Intuition
211
Several perturbed data points are out of distribution (OOD)!
Vulnerabilities of LIME/SHAP: Intuition
212
Adversaries can exploit this and build a classifier that is biased on in-sample data points and unbiased on OOD samples!
Building Adversarial Classifiers
213
Building Adversarial Classifiers
214
Building Adversarial Classifiers
215
216
Limitations: Stability
Post-hoc explanations can be unstable to small, non-adversarial perturbations of the input.
217
Limitations: Stability
Post-hoc explanations can be unstable to small, non-adversarial perturbations of the input.
Local Lipschitz constant (x: input; E: explanation function, e.g., LIME, SHAP, or a gradient method):
$$\hat{L}(x_i) \;=\; \max_{x_j \in B_\epsilon(x_i)} \frac{\lVert E(x_i) - E(x_j) \rVert_2}{\lVert x_i - x_j \rVert_2}$$
218
Limitations: Stability
Estimates over 100 test inputs for an MNIST model.
219
Sensitivity to Hyperparameters
Explanations can be highly sensitive to hyperparameters such as random seed, number of perturbations, patch size, etc.
220
Utility: High-fidelity explanations can mislead
221
In a bail adjudication task, misleading high-fidelity explanations increase end users' (domain experts') trust.
True classifier relies on race.
High-fidelity 'misleading' explanation.
Utility: Post hoc Explanations Instill Over-Trust
222
[Kaur et. al., 2020; Bucinca et. al., 2020]
Responses from Data Scientists Using Explainability Tools
(GAM and SHAP)
223
[Kaur et. al., 2020]
224
Utility: Explanations for Debugging
In a housing price prediction task, Amazon Mechanical Turk workers are unable to use linear model coefficients to diagnose model mistakes.
225
In a dog breeds classification task, users familiar with machine learning rely on labels, instead of saliency maps, for diagnosing model errors.
Utility: Explanations for Debugging
226
In a dog breeds classification task, users familiar with machine learning rely on labels, instead of saliency maps, for diagnosing model errors.
Utility: Explanations for Debugging
227
Conflicting Evidence on Utility of Explanations
228
Conflicting Evidence on Utility of Explanations
Utility: Disagreement Problem in XAI
229
Krishna and Han et. al., 2022
Practitioner Inputs on Explanation Disagreement
230
How do Practitioners Resolve Disagreements?
231
How do Practitioners Resolve Disagreements?
232
233
Empirical Analysis: Summary
Theoretically Analyzing Interpretable Models
Theoretical Analysis of Tabular LIME w.r.t. Linear Models
235
[Garreau et. al., 2020]
Unification and Robustness of LIME and SmoothGrad
236
[Agarwal et. al., 2020]
Function Approximation Perspective to Characterizing Post hoc Explanation Methods
237
[Han et. al., 2022]
Function Approximation Perspective to Characterizing Post hoc Explanation Methods
238
[Han et. al., 2022]
Function Approximation Perspective to Characterizing Post hoc Explanation Methods
239
Agenda
240
Future of Model Understanding
241
Methods for More Reliable Post hoc Explanations
Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods
Model Understanding Beyond Classification
Intersections with Model Privacy
Intersections with Model Fairness
Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations
Intersections with Model Robustness
Characterizing Similarities and Differences Between Various Methods
New Interfaces, Tools, Benchmarks for Model Understanding
Methods for More Reliable Post hoc Explanations
242
Challenges with LIME: Stability
243
Alvarez-Melis, 2018
Challenges with LIME: Consistency
244
Slack et. al., 2020
Many = 250 perturbations; Few = 25 perturbations;
When you repeatedly run LIME on the same instance, you get different explanations (blue region)
Challenges with LIME: Consistency
245
Problem with having too few perturbations?
What is the optimal number of perturbations?
Can we just use a very large number of perturbations?
Challenges with LIME: Scalability
246
Generating reliable explanations using LIME can be computationally expensive!
Explanations with Guarantees: BayesLIME and BayesSHAP
247
BayesLIME and BayesSHAP
248
Generative model relating the feature importances, the perturbations, the weighting function, and the black box predictions, with priors on the feature importances and their uncertainty.
BayesLIME and BayesSHAP: Inference
249
These are the same equations used in LIME & SHAP!
Estimating the Required Number of Perturbations
250
Estimate the required number of perturbations for a user-specified uncertainty level.
Improving Efficiency: Focused Sampling
251
This approach allows us to construct explanations with user-defined levels of confidence in an efficient manner!
Other Questions
252
Future of Model Understanding
253
Theoretical Analysis of the Behavior of Explanations/Models
254
Future of Model Understanding
255
Empirical Analysis of Correctness/Utility
256
Future of Model Understanding
257
Characterizing Similarities and Differences
258
Future of Model Understanding
259
Model Understanding Beyond Classification
260
[Coppens et. al., 2019, Amir et. al. 2018]
[Ying et. al., 2019]
Future of Model Understanding
261
Intersections with Model Robustness
262
Future of Model Understanding
263
Intersections with Model Fairness
264
Future of Model Understanding
265
Intersections with Differential Privacy
266
[Harder et. al., 2020; Patel et. al. 2020]
Future of Model Understanding
267
New Interfaces, Tools, Benchmarks for Model Understanding
268
[Lakkaraju et. al., 2022, Slack et. al., 2022]
Some Parting Thoughts...
269
Thank You!
270