1 of 20

Understanding Black-box Predictions via Influence Functions

Alex Adam, Keiran Paster, Jenny (Jingyi) Liu

4/1/2021

CSC2541

Paper by: Pang Wei Koh, Percy Liang

2 of 20

Outline

  1. Intro to Influence Functions
  2. Use Cases / Experimental Results
  3. Colab Notebook

3 of 20

Introduction to Influence Functions

4 of 20

Influence of a training input

Training set: $z_1, \ldots, z_n$, where $z_i = (x_i, y_i)$

Loss on a single point: $L(z, \theta)$, full loss: $\frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$

Empirical risk minimizer: $\hat{\theta} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$

5 of 20

Influence of a training input

Training set: $z_1, \ldots, z_n$, where $z_i = (x_i, y_i)$

Loss on a single point: $L(z, \theta)$, full loss: $\frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$

Empirical risk minimizer: $\hat{\theta} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$

How do our model’s predictions change if we remove a training example $z$?
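
As a point of reference, here is a minimal sketch (sklearn-based, our own helper names, not the paper's code) of the exact leave-one-out answer that influence functions are designed to approximate without retraining:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Answer "what happens to the loss at a test point if we drop training example i?"
# the slow, exact way: retrain without example i and compare.
def loo_effect_on_test_loss(X_train, y_train, x_test, y_test, i, C=1.0):
    def fit_and_test_loss(X, y):
        model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
        p = model.predict_proba(x_test.reshape(1, -1))[0]
        return -np.log(p[list(model.classes_).index(y_test)])  # cross-entropy at z_test

    full = fit_and_test_loss(X_train, y_train)
    without_i = fit_and_test_loss(np.delete(X_train, i, axis=0), np.delete(y_train, i))
    return without_i - full  # > 0 means example i was "helpful" for this test point
```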

6 of 20

Influence of a training input

Key idea: more generally, upweighting a point $z$ by $\epsilon$ yields $\hat{\theta}_{\epsilon,z} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$. Removing $z$ corresponds to upweighting it by $\epsilon = -\frac{1}{n}$.

7 of 20

Influence of a training input

Key idea: more generally, upweighting a point $z$ by $\epsilon$ yields: $\hat{\theta}_{\epsilon,z} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$

Effect on the parameters: $\mathcal{I}_{\text{up,params}}(z) \equiv \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0}$

8 of 20

Influence of a training input

Key idea: more generally, upweighting a point $z$ by $\epsilon$ yields: $\hat{\theta}_{\epsilon,z} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$

Effect on the parameters: $\mathcal{I}_{\text{up,params}}(z) \equiv \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z, \hat{\theta})$, where $H_{\hat{\theta}} \equiv \frac{1}{n}\sum_{i=1}^{n} \nabla^2_\theta L(z_i, \hat{\theta})$
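
A quick sketch of where this closed form comes from (following the paper's derivation): the upweighted minimizer satisfies a perturbed first-order optimality condition; a first-order Taylor expansion around $\hat{\theta}$, together with $\nabla R(\hat{\theta}) = 0$, gives the Hessian-inverse form:

```latex
0 = \nabla R(\hat{\theta}_{\epsilon,z}) + \epsilon\,\nabla_\theta L(z, \hat{\theta}_{\epsilon,z}),
\qquad R(\theta) \equiv \tfrac{1}{n}\textstyle\sum_{i=1}^{n} L(z_i, \theta)

0 \approx \underbrace{\nabla R(\hat{\theta})}_{=\,0} + \epsilon\,\nabla_\theta L(z, \hat{\theta})
  + \nabla^2 R(\hat{\theta})\,\big(\hat{\theta}_{\epsilon,z} - \hat{\theta}\big)
\;\Longrightarrow\;
\hat{\theta}_{\epsilon,z} - \hat{\theta} \approx -\,H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z, \hat{\theta})\,\epsilon
```

Differentiating with respect to $\epsilon$ at $\epsilon = 0$ gives $\mathcal{I}_{\text{up,params}}(z)$ above.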

9 of 20

Influence of a training input

Key idea: more generally, upweighting a point $z$ by $\epsilon$ yields: $\hat{\theta}_{\epsilon,z} \equiv \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$

Effect on the parameters: $\mathcal{I}_{\text{up,params}}(z) \equiv \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z, \hat{\theta})$, where $H_{\hat{\theta}} \equiv \frac{1}{n}\sum_{i=1}^{n} \nabla^2_\theta L(z_i, \hat{\theta})$

Effect on the loss for a single test example: $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) \equiv \left.\frac{d\,L(z_{\text{test}}, \hat{\theta}_{\epsilon,z})}{d\epsilon}\right|_{\epsilon=0} = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z, \hat{\theta})$
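
A minimal sketch of this formula for a model small enough that the Hessian fits in memory (binary logistic regression; helper names are ours, not the authors' code):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grad_single(theta, x, y):
    # Gradient of the cross-entropy loss L(z, theta) at one point, y in {0, 1}.
    return (sigmoid(x @ theta) - y) * x

def hessian_full(theta, X, y, damping=1e-2):
    # H = (1/n) sum_i p_i (1 - p_i) x_i x_i^T, plus a small damping term
    # (the paper adds damping when the Hessian is not positive definite).
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X / len(y) + damping * np.eye(X.shape[1])

def influence_up_loss(theta_hat, X, y, z_train, z_test, damping=1e-2):
    # I_up,loss(z, z_test) = -grad L(z_test)^T  H^{-1}  grad L(z).
    # Removing z changes the test loss by roughly -(1/n) * I_up,loss(z, z_test).
    H = hessian_full(theta_hat, X, y, damping)
    g_test = grad_single(theta_hat, *z_test)
    g_train = grad_single(theta_hat, *z_train)
    return -g_test @ np.linalg.solve(H, g_train)
```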

10 of 20

Perturbing a training input

Replace a training point $z = (x, y)$ with a perturbed version $z_\delta \equiv (x + \delta, y)$. Want to find: $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^\top \equiv \left.\nabla_\delta L(z_{\text{test}}, \hat{\theta}_{z_\delta, -z})^\top\right|_{\delta=0} = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top H_{\hat{\theta}}^{-1}\,\nabla_x \nabla_\theta L(z, \hat{\theta})$

11 of 20

Efficiently calculating influence

Precompute for each test example: $s_{\text{test}} \equiv H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z_{\text{test}}, \hat{\theta})$, so that $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) = -s_{\text{test}} \cdot \nabla_\theta L(z, \hat{\theta})$ for every training point $z$

Naive computation: forming and inverting $H_{\hat{\theta}}$ takes $O(np^2 + p^3)$ for $n$ training points and $p$ parameters

  • Conjugate gradients transform the matrix inversion into an optimization problem: $s_{\text{test}} = \arg\min_{t}\left\{\tfrac{1}{2} t^\top H_{\hat{\theta}}\, t - v^\top t\right\}$ with $v = \nabla_\theta L(z_{\text{test}}, \hat{\theta})$
  • Hessian-vector products speed up each conjugate-gradient iteration, so $H_{\hat{\theta}}$ never has to be formed explicitly
  • Can also use stochastic estimation to avoid going through all training points for every CG iteration (see the sketch below)
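
A minimal sketch of the HVP + conjugate-gradient route (PyTorch autograd; the helper names are ours, and `full_loss` / `params` are assumed to come from an already-trained model):

```python
import torch

def make_hvp_fn(full_loss, params):
    # full_loss: empirical risk built with a live autograd graph; params: list of tensors.
    grads = torch.autograd.grad(full_loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    def hvp(vec):
        # Pearlmutter trick: H v = grad of (grad(loss) . v), without ever forming H.
        gv = flat_grad @ vec
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        return torch.cat([h.reshape(-1) for h in hv]).detach()
    return hvp

def conjugate_gradient(hvp_fn, b, iters=100, damping=1e-2, tol=1e-10):
    # Solve (H + damping * I) x = b; x approximates s_test when b = grad L(z_test, theta_hat).
    x = torch.zeros_like(b)
    r = b.clone()          # residual (x starts at 0)
    p = r.clone()
    rs = r @ r
    for _ in range(iters):
        Ap = hvp_fn(p) + damping * p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Here `b` would be the flattened $\nabla_\theta L(z_{\text{test}}, \hat{\theta})$; for the stochastic variant, `full_loss` would be rebuilt on a fresh minibatch for each Hessian-vector product instead of the whole training set.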

12 of 20

Influence functions vs leave-one-out retraining

Logistic regression on MNIST

Non-convergent, non-convex setting

13 of 20

Influence functions vs leave-one-out retraining

Smooth approximations to hinge loss
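
For reference, the paper makes the SVM's hinge loss twice-differentiable by replacing it with a temperature-$t$ softened version that recovers the hinge as $t \to 0$:

```latex
\text{Hinge}(s) = \max(0,\, 1 - s),
\qquad
\text{SmoothHinge}(s, t) = t \log\!\left(1 + \exp\!\left(\frac{1 - s}{t}\right)\right)
\;\xrightarrow{\;t \to 0\;}\; \text{Hinge}(s)
```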

14 of 20

Use Cases of Influence Functions

15 of 20

Understanding Model Behavior

  • We want to find which training points are “responsible” for a given prediction.
  • We can use $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ to approximate the effect of a particular training image.
  • Questions:
    • How does the influence function correlate with raw pixel distance? (see the sketch after this list)
    • How does this correlation vary with different model types? Does this reveal anything about how the models make decisions?
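
A minimal sketch of that correlation check (assumed inputs, not the authors' code): given precomputed influence values $\mathcal{I}_{\text{up,loss}}(z_i, z_{\text{test}})$ for every training image and the raw images themselves, measure how strongly the influence ranking agrees with pixel-space similarity.

```python
import numpy as np
from scipy.stats import spearmanr

def influence_vs_pixel_distance(influences, train_images, test_image):
    # Negative Euclidean distance so "more similar" and "more helpful" point the same way.
    sims = -np.linalg.norm(train_images.reshape(len(train_images), -1)
                           - test_image.reshape(1, -1), axis=1)
    rho, _ = spearmanr(influences, sims)
    return rho  # high rho: influential points are mostly nearest neighbours in pixel space
```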

16 of 20

Understanding Model Behavior

  • The authors compare an SVM with an Inception v3 network.
  • In the SVM, influence values are more correlated with pixel distance.
  • For the deep network, images of the same types of fish and even images of similar looking dogs (top right) are influential.

17 of 20

Adversarial Training Examples

  • Recall that $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})$ tells us approximately what effect changing $x$ will have on the loss at $z_{\text{test}}$.
  • We can set the perturbation in the direction of $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})$ to maximally increase the test loss at $z_{\text{test}}$.
  • In practice, we iterate $x \leftarrow \Pi\big(x + \alpha\,\text{sign}(\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}}))\big)$, retraining the model after each step ($\Pi$ projects back onto valid images; see the sketch below).
  • Measuring the magnitude of $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})$ can also quantify how vulnerable a model is to training-set attacks.
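
A minimal sketch of one attack step (PyTorch; `model`, `loss_fn`, and a precomputed, detached `s_test` from the CG sketch earlier are assumptions, and pixels are assumed to live in $[0, 1]$):

```python
import torch

def attack_step(model, loss_fn, x, y, s_test, alpha=0.02):
    # One iteration of the training-input attack: move x in the direction the influence
    # approximation says will most increase the loss at z_test, then clip to valid pixels.
    params = [p for p in model.parameters() if p.requires_grad]
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)                      # L(z, theta_hat) at the current x
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    # I_pert,loss direction w.r.t. x: grad_x of ( -s_test . grad_theta L(z, theta_hat) ).
    influence = -(s_test @ flat)
    direction = torch.autograd.grad(influence, x)[0]
    x_new = (x + alpha * direction.sign()).clamp(0.0, 1.0)   # project back onto [0, 1] images
    return x_new.detach()
```

After each step the model would be retrained on the modified training set, as the slide describes.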

18 of 20

Adversarial Training Examples

  • The authors found that, for 57% of test images, it was possible to flip the model’s prediction by perturbing just one training image.
  • They could also change the predictions on multiple test images at once by modifying a single training image.

19 of 20

Fixing Mislabeled Examples

  • Often datasets are noisy and training examples may be mislabeled.
  • Human experts may be able to recognize mislabeled examples, but reviewing everything by hand is impractical when the dataset is large.
  • Using influence functions, human experts can prioritize their attention on the points with the highest influence on the model’s decisions.
  • Since we don’t have access to the test set, we measure the self-influence $\mathcal{I}_{\text{up,loss}}(z_i, z_i)$, i.e., the influence of each training point on its own training loss (see the sketch below).
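
A minimal sketch of this ranking, reusing the hypothetical `grad_single` / `hessian_full` helpers from the logistic-regression example earlier (not the authors' code):

```python
import numpy as np

def self_influence_ranking(theta_hat, X, y, damping=1e-2):
    # Self-influence I_up,loss(z_i, z_i) = -grad L(z_i)^T H^{-1} grad L(z_i) is <= 0 for a
    # PSD Hessian; points with the largest magnitude depend most heavily on their own
    # presence, which is a useful flag for mislabeled or otherwise atypical examples.
    H_inv = np.linalg.inv(hessian_full(theta_hat, X, y, damping))  # fine for small p; use CG/HVPs otherwise
    grads = np.stack([grad_single(theta_hat, X[i], y[i]) for i in range(len(y))])
    self_influence = -np.einsum("ip,pq,iq->i", grads, H_inv, grads)
    return np.argsort(np.abs(self_influence))[::-1]  # indices to hand to a human reviewer first
```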

20 of 20

Fixing Mislabeled Examples

  • The authors flipped the labels of 10% of the training data, then simulated a human inspecting a fixed fraction of the training points and correcting any flipped labels found.
  • Prioritizing which points to inspect by self-influence outperformed both randomly selecting points and prioritizing points with the highest training loss.