Neuron Shapley:

Discovering Responsible Neurons

Amirata Ghorbani and James Zou

NeurIPS 2020

CSG@UEF Reading Club

Xuechen Liu

2022.03.18

Background: Interpretability of DNN

As with a decision tree in ASR, we answer a series of questions to move along different branches, as in the figure on the right.
Approximation is, loosely speaking, one approach toward better interpretability and expressiveness of the model.
Here, however, the focus is on sensitivity analysis: measuring how much each component contributes in a DNN.

Shapley Value

Lloyd Shapley, 2012, photo credit: NY Times

Neuron Shapley

Goal: Find the contribution of each neuron/filter to the behavior of the network
The properties of Shapley values carry over to neural network filters:

Zero contribution: A neuron whose removal has no effect on performance is assigned a value of 0.
Symmetric elements: Two neurons receive equal contributions if they are exchangeable under every possible setting.
Additivity in performance metric: When the overall performance of the model is a sum of values computed on different test points, each neuron's contribution is the sum of its contributions on those points.
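
The three properties above can be checked numerically on a tiny example. The sketch below is not from the paper: `exact_shapley`, `point_1`, `point_2`, and the three-neuron "network" are hypothetical names invented for illustration. It assumes performance is a set function over the neurons that stay active, and it enumerates all subnetworks, which is only feasible for a handful of neurons.

```python
from itertools import combinations
from math import comb


def exact_shapley(players, value):
    """Exact Shapley value of every player for a set function `value`."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # k = |S|, size of the subnetwork without i
            for subset in combinations(others, k):
                s = frozenset(subset)
                marginal = value(s | {i}) - value(s)
                # Normalize by the number of subnetworks of the same size.
                total += marginal / comb(n - 1, k)
        phi[i] = total / n
    return phi


# Hypothetical toy setup: neurons "a" and "b" are exchangeable, "c" is dead.
def point_1(active):
    """Test point 1 is classified correctly if either a or b is active."""
    return 1.0 if ("a" in active or "b" in active) else 0.0


def point_2(active):
    """Test point 2 is classified correctly only if both a and b are active."""
    return 1.0 if ("a" in active and "b" in active) else 0.0


def overall(active):
    """Overall performance is the sum over the two test points."""
    return point_1(active) + point_2(active)


neurons = ["a", "b", "c"]
phi_1 = exact_shapley(neurons, point_1)
phi_2 = exact_shapley(neurons, point_2)
phi_all = exact_shapley(neurons, overall)

print(phi_all)  # c gets zero, a and b get equal values
print({p: phi_1[p] + phi_2[p] for p in neurons})  # matches phi_all (additivity)
```

In this toy case the dead neuron `c` gets zero, the exchangeable neurons `a` and `b` get the same value, and the values for the summed metric equal the sum of the per-point values.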

“The marginal contribution to the performance of every subnetwork S of the original model (normalized by the number of subnetworks with the same cardinality |S|)”
The equation is essentially the same as the original Shapley definition.
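
For reference, the standard Shapley definition that the quote describes can be written as follows. The notation here is an assumption chosen for this note and may differ from the paper's: N is the set of n neurons/filters, V(S) is the performance of the subnetwork that keeps only the neurons in S, and \phi_i is the value assigned to neuron i.

```latex
\phi_i(V) \;=\; \frac{1}{n} \sum_{S \subseteq N \setminus \{i\}}
  \frac{V(S \cup \{i\}) - V(S)}{\binom{n-1}{|S|}}
```

The binomial coefficient counts the subnetworks of the same cardinality |S| that do not contain i, which is the normalization mentioned in the quote.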

Optimizations

Computing Shapley values exactly is very expensive
Instead of estimating everybody's contribution precisely, the paper concentrates on the top contributors
Early truncation: Only compute Shapley values for the most related elements
Two sampling methods:

Monte Carlo estimation: Estimate the value as an expectation over random samples rather than enumerating every subnetwork one by one
Adaptive sampling via multi-armed bandit (MAB): We keep sampling only a subset of neurons, chosen according to their confidence bounds
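
The Monte Carlo and truncation ideas can be sketched on a toy problem. This is a simplified illustration, not the paper's TMAB-Shapley algorithm: the bandit-based adaptive sampling is left out, and the function name `truncated_mc_shapley`, the removal-order formulation, the threshold, and the toy performance function are all assumptions made for this example.

```python
import random


def truncated_mc_shapley(players, value, num_permutations=2000,
                         truncation_threshold=0.0, seed=0):
    """Estimate Shapley values by sampling random permutations.

    Each permutation is walked backwards: start from the full network and
    remove one neuron at a time. Once performance drops to the threshold,
    the remaining marginal contributions are treated as zero (truncation).
    """
    rng = random.Random(seed)
    players = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = players[:]
        rng.shuffle(order)
        active = set(players)
        current = value(frozenset(active))
        for p in reversed(order):
            if current <= truncation_threshold:
                break  # assumption: nothing left to attribute
            active.remove(p)
            reduced = value(frozenset(active))
            totals[p] += current - reduced  # marginal contribution of p
            current = reduced
    return {p: totals[p] / num_permutations for p in players}


# Hypothetical toy performance: a and b help alone and more together, c is dead.
def toy_performance(active):
    score = 0.0
    if "a" in active:
        score += 0.5
    if "b" in active:
        score += 0.5
    if "a" in active and "b" in active:
        score += 0.2
    return score


estimates = truncated_mc_shapley(["a", "b", "c"], toy_performance)
print(estimates)  # a and b should both be close to 0.6, c is 0
```

For this toy function the exact Shapley values are 0.6 for `a` and `b` and 0 for `c`, so the estimates can be compared against a known answer. On a real network each call to `value` is a forward pass over the evaluation set, which is why skipping the tail of each permutation matters.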

Experiments

78.1% reported accuracy
Shapley value computation is performed on the 17216 filters before the logit layer

Experiments

TMAB-Shapley is much faster, which confirms the efficiency of the proposed MAB sampling
It seems that only a small number of filters/neurons are critical for performance
There are also a number of filters that are vulnerable to adversarial samples; removing them brings accuracy back to a fair level (though the model is still not usable)
Shapley values can also be used to detect the "culprit" unfair filters, improving gender detection accuracy

My Takeaways

Neuron Shapley is a flexible framework that can be applied to any type of model.
Further characterization methods such as SHAP can also be applied to neural networks, and some work in this direction already exists.
But it does not answer all the questions that Shapley values raise. I do have a question about how exactly it deals with feature correlation, for example.
This paper also gives an interesting perspective on interpretability: interpretability research should ultimately be beneficial for model fine-tuning and repair.