1 of 23

Why should I trust you?

Explaining the Predictions of Any Classifier

2 of 23

Trusting a model

Widespread adoption of machine learning requires trusting models on two levels:

  • Trusting a prediction - when using the model to act on individual predictions
  • Trusting a model - when deciding whether to deploy it

3 of 23

LIME

Local

Interpretable

Model-Agnostic

Explanations

4 of 23

Explanations make a model’s output easier to trust and act on

5 of 23

Why not just the test set?

  • Test data may not be representative of real-world data - accuracy is often overestimated
  • Data leakage can inflate validation metrics
  • The validation metric may not reflect the model’s actual goal - it is hard to evaluate a model without prior knowledge

6 of 23

Two models that give the same output for a specific input may do so for very different reasons.

Given explanations, it is easy to spot the difference and pick the better model (using prior knowledge).

7 of 23

Intuition

8 of 23

General framework

9 of 23

To use LIME for a specific model, we need to specify (these choices combine into the objective below):

  • Class of potentially interpretable models G (linear/logistic regression, decision trees, …)
  • Measure of model complexity Ω(g) (number of non-zero weights, depth of the tree, …)
  • The interpretable representation (superpixels, presence of a specific word, …)
  • Proximity measure π_x (how perturbed samples are weighted when fitting g)
  • Loss function (often imposed by the choice of G)
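
The resulting objective from the paper, written here in LaTeX (f is the black-box model being explained, x the instance):

    \xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)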

10 of 23

Concrete example - sparse linear explanations
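
A minimal Python sketch of this recipe. It simplifies the paper's procedure: the binary perturbation scheme, the exponential kernel, and the Ridge-plus-top-K selection stand in for the exact K-LASSO step, and predict_fn is any black-box scoring function (all names here are ours, not the paper's code).

    import numpy as np
    from sklearn.linear_model import Ridge

    def explain_instance(x, predict_fn, num_samples=5000, K=10, kernel_width=0.75):
        # Interpretable representation: binary vector saying which features of x are kept.
        d = len(x)
        z_prime = np.random.randint(0, 2, size=(num_samples, d))
        z_prime[0] = 1                            # keep the unperturbed instance itself
        z = z_prime * np.asarray(x)               # perturbed samples in the original feature space
        # Proximity measure: exponential kernel on the fraction of removed features (our choice).
        removed = 1.0 - z_prime.mean(axis=1)
        weights = np.exp(-(removed ** 2) / kernel_width ** 2)
        y = predict_fn(z)                         # black-box scores f(z), one per perturbed sample
        # Fit a weighted linear model g and keep only the K largest weights (sparsity).
        g = Ridge(alpha=1.0).fit(z_prime, y, sample_weight=weights)
        top_k = np.argsort(np.abs(g.coef_))[-K:][::-1]
        return [(int(j), float(g.coef_[j])) for j in top_k]

For a scikit-learn classifier clf this could be called as explain_instance(x_row, lambda Z: clf.predict_proba(Z)[:, 1]).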

11 of 23

Explaining Inception prediction
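
The slide shows the superpixels supporting Inception's predicted class. As an illustration only (not the slides' code), the open-source lime package produces a similar explanation; image and predict_fn below are stand-ins for a real photo and a wrapped Inception model.

    import numpy as np
    from lime import lime_image

    image = np.random.rand(224, 224, 3)            # stand-in for a real photo
    def predict_fn(batch):                         # stand-in for Inception: images -> class probabilities
        return np.tile([0.7, 0.2, 0.1], (len(batch), 1))

    explainer = lime_image.LimeImageExplainer()
    explanation = explainer.explain_instance(image, predict_fn, top_labels=1, num_samples=1000)
    # Superpixels that argue for the top predicted class:
    img, mask = explanation.get_image_and_mask(
        explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True)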

12 of 23

Explaining model globally

13 of 23

Explaining model globally
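
For a global view, the paper picks a small budget of instances whose explanations jointly cover the most important features (the submodular pick step). A rough sketch of that greedy selection, assuming W is an instances × features matrix of explanation weights:

    import numpy as np

    def pick_instances(W, budget):
        # Global importance of each feature: sqrt of its total explanation weight.
        importance = np.sqrt(np.abs(W).sum(axis=0))
        chosen, covered = [], np.zeros(W.shape[1], dtype=bool)
        for _ in range(budget):
            # Greedily add the instance whose explanation covers the most uncovered importance.
            gains = [importance[~covered & (np.abs(W[i]) > 0)].sum() for i in range(W.shape[0])]
            best = int(np.argmax(gains))
            chosen.append(best)
            covered |= np.abs(W[best]) > 0
        return chosen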

14 of 23

15 of 23

Experiments

16 of 23

Are explanations faithful to the model?

Each model (sparse logistic regression and a decision tree) is trained to use at most 10 features.

We test how many of these gold features are recovered by each of the following methods (a small recall sketch follows the list):

  • Random
  • Parzen
  • Greedy
  • LIME
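
A small sketch of the recovery measurement, assuming gold_features are the at-most-10 features the sparse model actually uses and explained_features are those returned by one of the methods above:

    def feature_recall(gold_features, explained_features):
        # Fraction of the model's true features that the explanation recovers.
        gold = set(gold_features)
        return len(gold & set(explained_features)) / len(gold)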

17 of 23

Should I trust this prediction?

25% of the features are randomly marked as untrustworthy. A prediction is considered truly untrustworthy if it changes when the untrustworthy features are removed; a simulated user distrusts a prediction whose explanation relies on untrustworthy features. We measure the precision and recall of finding untrustworthy predictions (a sketch follows).
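
A minimal sketch of this simulated user, assuming predict re-queries the black box, remove zeroes out the given features, and explain returns the features an explanation relies on (all hypothetical helpers):

    def evaluate_trust(instances, predict, explain, remove, untrustworthy):
        # Oracle: a prediction is truly untrustworthy if removing the marked features changes it.
        # Simulated user: distrusts a prediction whose explanation uses any marked feature.
        tp = fp = fn = 0
        for x in instances:
            truly_untrustworthy = predict(x) != predict(remove(x, untrustworthy))
            user_distrusts = bool(set(explain(x)) & set(untrustworthy))
            tp += truly_untrustworthy and user_distrusts
            fp += user_distrusts and not truly_untrustworthy
            fn += truly_untrustworthy and not user_distrusts
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall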

18 of 23

Can I trust this model?

We add 10 “noisy” features and train a pair of random forests with:

  • Similar validation accuracy
  • Significantly different test accuracy

We test whether a user can identify the better classifier based on explanations of validation-set predictions. The simulated user picks the model with fewer untrustworthy predictions.

19 of 23

MTurk experiments

20 of 23

Can users select the best classifier?

Users (with no knowledge of machine learning) need to decide between two models:

  • SVM trained on the original 20 newsgroups dataset (test accuracy 94%)
  • SVM trained on a “cleaned” dataset (test accuracy 88.6%)

The second model is the better one, even though this is not evident from its test score.

21 of 23

Can non-experts improve a classifier?

Users (who are unfamiliar with feature engineering) identify which words in the explanations are unimportant and should be removed; new classifiers are then trained without those words.

22 of 23

Do explanations lead to insights?

We train a classifier to distinguish wolves from huskies. In the training set, every wolf image had snow in the background, while husky images did not. Users were asked three questions:

  • Do they trust this algorithm to work well in the real world?
  • Why?
  • What features do they think the algorithm is using?

23 of 23

tinyurl.com/WhyShouldITrustYou