1 of 20

Understanding Black-box Predictions via Influence Functions

Alex Adam, Keiran Paster, Jenny (Jingyi) Liu

4/1/2021

CSC2541

Paper by: Pang Wei Koh, Percy Liang

2 of 20

Outline

  1. Intro to Influence Functions
  2. Use Cases / Experimental Results
  3. Colab Notebook

3 of 20

Introduction to Influence Functions

4 of 20

Influence of a training input

Training set: $z_1, \ldots, z_n$, where $z_i = (x_i, y_i)$

Loss on a single point: $L(z, \theta)$, full loss: $\frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$

Empirical risk minimizer: $\hat{\theta} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$

5 of 20

Influence of a training input

Training set: $z_1, \ldots, z_n$, where $z_i = (x_i, y_i)$

Loss on a single point: $L(z, \theta)$, full loss: $\frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$

Empirical risk minimizer: $\hat{\theta} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$

How do our model’s predictions change if we remove a training example $z$?
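
As a point of reference, here is a minimal sketch (sklearn-based, our own helper names, not the paper's code) of the exact leave-one-out answer that influence functions are designed to approximate without retraining:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Answer "what happens to the loss at a test point if we drop training example i?"
# the slow, exact way: retrain without example i and compare.
def loo_effect_on_test_loss(X_train, y_train, x_test, y_test, i, C=1.0):
    def fit_and_test_loss(X, y):
        model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
        p = model.predict_proba(x_test.reshape(1, -1))[0]
        return -np.log(p[list(model.classes_).index(y_test)])  # cross-entropy at z_test

    full = fit_and_test_loss(X_train, y_train)
    without_i = fit_and_test_loss(np.delete(X_train, i, axis=0), np.delete(y_train, i))
    return without_i - full  # > 0 means example i was "helpful" for this test point
```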

6 of 20

Influence of a training input

Key idea: more generally, upweighting a point $z$ by $\epsilon$ yields $\hat{\theta}_{\epsilon,z} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$. Removing $z$ corresponds to upweighting it by $\epsilon = -\frac{1}{n}$.

7 of 20

Influence of a training input

Key idea: more generally, upweighting a point $z$ by $\epsilon$ yields: $\hat{\theta}_{\epsilon,z} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$

Effect on the parameters: $\mathcal{I}_{\text{up,params}}(z) \equiv \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0}$

8 of 20

Influence of a training input

Key idea: more generally, upweighting a point $z$ by $\epsilon$ yields: $\hat{\theta}_{\epsilon,z} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$

Effect on the parameters: $\mathcal{I}_{\text{up,params}}(z) \equiv \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z, \hat{\theta})$, where $H_{\hat{\theta}} \equiv \frac{1}{n}\sum_{i=1}^{n} \nabla^2_\theta L(z_i, \hat{\theta})$
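
A quick sketch of where this closed form comes from (following the paper's derivation): the upweighted minimizer satisfies a perturbed first-order optimality condition; a first-order Taylor expansion around $\hat{\theta}$, together with $\nabla R(\hat{\theta}) = 0$, gives the Hessian-inverse form:

```latex
0 = \nabla R(\hat{\theta}_{\epsilon,z}) + \epsilon\,\nabla_\theta L(z, \hat{\theta}_{\epsilon,z}),
\qquad R(\theta) \equiv \tfrac{1}{n}\textstyle\sum_{i=1}^{n} L(z_i, \theta)

0 \approx \underbrace{\nabla R(\hat{\theta})}_{=\,0} + \epsilon\,\nabla_\theta L(z, \hat{\theta})
  + \nabla^2 R(\hat{\theta})\,\big(\hat{\theta}_{\epsilon,z} - \hat{\theta}\big)
\;\Longrightarrow\;
\hat{\theta}_{\epsilon,z} - \hat{\theta} \approx -\,H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z, \hat{\theta})\,\epsilon
```

Differentiating with respect to $\epsilon$ at $\epsilon = 0$ gives $\mathcal{I}_{\text{up,params}}(z)$ above.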

9 of 20

Influence of a training input

Key idea: more generally, upweighting a point $z$ by $\epsilon$ yields: $\hat{\theta}_{\epsilon,z} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$

Effect on the parameters: $\mathcal{I}_{\text{up,params}}(z) \equiv \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z, \hat{\theta})$, where $H_{\hat{\theta}} \equiv \frac{1}{n}\sum_{i=1}^{n} \nabla^2_\theta L(z_i, \hat{\theta})$

Effect on the loss for a single test example: $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) \equiv \left.\frac{d\,L(z_{\text{test}}, \hat{\theta}_{\epsilon,z})}{d\epsilon}\right|_{\epsilon=0} = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z, \hat{\theta})$
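
A minimal sketch of this formula for a model small enough that the Hessian fits in memory (binary logistic regression; helper names are ours, not the authors' code):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grad_single(theta, x, y):
    # Gradient of the cross-entropy loss L(z, theta) at one point, y in {0, 1}.
    return (sigmoid(x @ theta) - y) * x

def hessian_full(theta, X, y, damping=1e-2):
    # H = (1/n) sum_i p_i (1 - p_i) x_i x_i^T, plus a small damping term
    # (the paper adds damping when the Hessian is not positive definite).
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X / len(y) + damping * np.eye(X.shape[1])

def influence_up_loss(theta_hat, X, y, z_train, z_test, damping=1e-2):
    # I_up,loss(z, z_test) = -grad L(z_test)^T  H^{-1}  grad L(z).
    # Removing z changes the test loss by roughly -(1/n) * I_up,loss(z, z_test).
    H = hessian_full(theta_hat, X, y, damping)
    g_test = grad_single(theta_hat, *z_test)
    g_train = grad_single(theta_hat, *z_train)
    return -g_test @ np.linalg.solve(H, g_train)
```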

10 of 20

Perturbing a training input

Replace a training point $z = (x, y)$ with a perturbed version $z_\delta \equiv (x + \delta, y)$. Want to find: $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^\top \equiv \left.\nabla_\delta L(z_{\text{test}}, \hat{\theta}_{z_\delta, -z})^\top\right|_{\delta=0} = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top H_{\hat{\theta}}^{-1}\,\nabla_x \nabla_\theta L(z, \hat{\theta})$

11 of 20

Efficiently calculating influence

Precompute for each test example: $s_{\text{test}} \equiv H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z_{\text{test}}, \hat{\theta})$, so that $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) = -s_{\text{test}} \cdot \nabla_\theta L(z, \hat{\theta})$ for every training point $z$

Naive computation: forming and inverting $H_{\hat{\theta}}$ takes $O(np^2 + p^3)$ for $n$ training points and $p$ parameters

  • Conjugate gradients transform the matrix inversion into an optimization problem: $s_{\text{test}} = \arg\min_{t}\left\{\tfrac{1}{2} t^\top H_{\hat{\theta}}\, t - v^\top t\right\}$ with $v = \nabla_\theta L(z_{\text{test}}, \hat{\theta})$
  • Hessian-vector products speed up each conjugate-gradient iteration, so $H_{\hat{\theta}}$ never has to be formed explicitly
  • Can also use stochastic estimation to avoid going through all training points for every CG iteration (see the sketch below)
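
A minimal sketch of the HVP + conjugate-gradient route (PyTorch autograd; the helper names are ours, and `full_loss` / `params` are assumed to come from an already-trained model):

```python
import torch

def make_hvp_fn(full_loss, params):
    # full_loss: empirical risk built with a live autograd graph; params: list of tensors.
    grads = torch.autograd.grad(full_loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    def hvp(vec):
        # Pearlmutter trick: H v = grad of (grad(loss) . v), without ever forming H.
        gv = flat_grad @ vec
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        return torch.cat([h.reshape(-1) for h in hv]).detach()
    return hvp

def conjugate_gradient(hvp_fn, b, iters=100, damping=1e-2, tol=1e-10):
    # Solve (H + damping * I) x = b; x approximates s_test when b = grad L(z_test, theta_hat).
    x = torch.zeros_like(b)
    r = b.clone()          # residual (x starts at 0)
    p = r.clone()
    rs = r @ r
    for _ in range(iters):
        Ap = hvp_fn(p) + damping * p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Here `b` would be the flattened $\nabla_\theta L(z_{\text{test}}, \hat{\theta})$; for the stochastic variant, `full_loss` would be rebuilt on a fresh minibatch for each Hessian-vector product instead of the whole training set.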

12 of 20

Influence functions vs leave-one-out retraining

Logistic regression on MNIST

Non-convergent, non-convex setting

13 of 20

Influence functions vs leave-one-out retraining

Smooth approximations to hinge loss
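
For reference, the paper makes the SVM's hinge loss twice-differentiable by replacing it with a temperature-$t$ softened version that recovers the hinge as $t \to 0$:

```latex
\text{Hinge}(s) = \max(0,\, 1 - s),
\qquad
\text{SmoothHinge}(s, t) = t \log\!\left(1 + \exp\!\left(\frac{1 - s}{t}\right)\right)
\;\xrightarrow{\;t \to 0\;}\; \text{Hinge}(s)
```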

14 of 20

Use Cases of Influence Functions

15 of 20

Understanding Model Behavior

  • We want to find which training points are “responsible” for a given prediction.
  • We can use $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ to approximate the effect of a particular training image.
  • Questions:
    • How does the influence function correlate with raw pixel distance? (see the sketch after this list)
    • How does this correlation vary with different model types? Does this reveal anything about how the models make decisions?
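
A minimal sketch of that correlation check (assumed inputs, not the authors' code): given precomputed influence values $\mathcal{I}_{\text{up,loss}}(z_i, z_{\text{test}})$ for every training image and the raw images themselves, measure how strongly the influence ranking agrees with pixel-space similarity.

```python
import numpy as np
from scipy.stats import spearmanr

def influence_vs_pixel_distance(influences, train_images, test_image):
    # Negative Euclidean distance so "more similar" and "more helpful" point the same way.
    sims = -np.linalg.norm(train_images.reshape(len(train_images), -1)
                           - test_image.reshape(1, -1), axis=1)
    rho, _ = spearmanr(influences, sims)
    return rho  # high rho: influential points are mostly nearest neighbours in pixel space
```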

16 of 20

Understanding Model Behavior

  • The authors compare an SVM with an Inception v3 network.
  • In the SVM, influence values are more correlated with pixel distance.
  • For the deep network, images of the same types of fish and even images of similar looking dogs (top right) are influential.

17 of 20

Adversarial Training Examples

  • Recall that $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})$ tells us approximately what effect changing $x$ will have on the loss at $z_{\text{test}}$.
  • We can set the perturbation in the direction of $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})$ to maximally increase the test loss at $z_{\text{test}}$.
  • In practice, we iterate $x \leftarrow \Pi\big(x + \alpha\,\text{sign}(\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}}))\big)$, retraining the model after each step ($\Pi$ projects back onto valid images; see the sketch below).
  • Measuring the magnitude of $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})$ can also quantify how vulnerable a model is to training-set attacks.
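
A minimal sketch of one attack step (PyTorch; `model`, `loss_fn`, and a precomputed, detached `s_test` from the CG sketch earlier are assumptions, and pixels are assumed to live in $[0, 1]$):

```python
import torch

def attack_step(model, loss_fn, x, y, s_test, alpha=0.02):
    # One iteration of the training-input attack: move x in the direction the influence
    # approximation says will most increase the loss at z_test, then clip to valid pixels.
    params = [p for p in model.parameters() if p.requires_grad]
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)                      # L(z, theta_hat) at the current x
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    # I_pert,loss direction w.r.t. x: grad_x of ( -s_test . grad_theta L(z, theta_hat) ).
    influence = -(s_test @ flat)
    direction = torch.autograd.grad(influence, x)[0]
    x_new = (x + alpha * direction.sign()).clamp(0.0, 1.0)   # project back onto [0, 1] images
    return x_new.detach()
```

After each step the model would be retrained on the modified training set, as the slide describes.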

18 of 20

Adversarial Training Examples

  • The authors found that, for 57% of test images, it was possible to flip the model’s prediction by perturbing just one training image.
  • They could also change the predictions on multiple test images at once by modifying a single training image.

19 of 20

Fixing Mislabeled Examples

  • Often datasets are noisy and training examples may be mislabeled.
  • Human experts may be able to recognize mislabeled examples, but reviewing everything by hand is impractical when the dataset is large.
  • Using influence functions, human experts can prioritize their attention on the points with the highest influence on the model’s decisions.
  • Since we don’t have access to the test set, we measure the self-influence $\mathcal{I}_{\text{up,loss}}(z_i, z_i)$, i.e., the influence of each training point on its own training loss (see the sketch below).
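
A minimal sketch of this ranking, reusing the hypothetical `grad_single` / `hessian_full` helpers from the logistic-regression example earlier (not the authors' code):

```python
import numpy as np

def self_influence_ranking(theta_hat, X, y, damping=1e-2):
    # Self-influence I_up,loss(z_i, z_i) = -grad L(z_i)^T H^{-1} grad L(z_i) is <= 0 for a
    # PSD Hessian; points with the largest magnitude depend most heavily on their own
    # presence, which is a useful flag for mislabeled or otherwise atypical examples.
    H_inv = np.linalg.inv(hessian_full(theta_hat, X, y, damping))  # fine for small p; use CG/HVPs otherwise
    grads = np.stack([grad_single(theta_hat, X[i], y[i]) for i in range(len(y))])
    self_influence = -np.einsum("ip,pq,iq->i", grads, H_inv, grads)
    return np.argsort(np.abs(self_influence))[::-1]  # indices to hand to a human reviewer first
```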

20 of 20

Fixing Mislabeled Examples

  • The authors flipped the labels of 10% of the training data, then simulated a human inspecting a fixed fraction of the training points and correcting any flipped labels found.
  • Prioritizing which points to inspect by self-influence outperformed both randomly selecting points and prioritizing points with the highest training loss.