Some new directions for Explainable AI
An illustrated tutorial
Céline Hudelot, Wassila Ouerdane,
Thomas Fel and Antonin Poché
PFIA 2024, La Rochelle
Overview
2
1. Introduction
2. Attribution Methods
3. Evaluation and Human-centered perspective
4. Concept-based XAI
1. Context: the era of promising high-performing models
3
Driving in Paris (source: Valeo)
Helping mathematical sciences (Nature, 2023)
Finding new materials: A-Lab, Berkeley
Aided diagnostics (Nature, 2016)
1. Context: the era of black-box models
4
See also:
‘Towards A Rigorous Science of Interpretable Machine Learning’, Doshi-Velez & Kim.
‘The Mythos of Model Interpretability’, Lipton
Tesla on Autopilot slams into truck
1. Context: the era of black-box models
5
1. Why do we need Explainability?
6
source: Freepik
[1] Bryce Goodman et al., European Union regulations on algorithmic decision-making and a "right to explanation".
[2] Finale Doshi-Velez et al., Accountability of AI under the law: The role of explanation.
[3] Gabriel Cadamuro et al., Debugging machine learning models.
[4] Alfredo Vellido et al., Making machine learning models interpretable.
1. Why do we need Explainability?
7
source: Freepik
1. Why do we need Explainability?
8
Ribeiro et al.: "Why Should I Trust You?"
Ribeiro et al.: "Why Should I Trust You?"
1. Why do we need Explainability?
9
Ribeiro et al.: "Why Should I Trust You?"
1. XAI: eXplainable Artificial Intelligence
10
Source: Gunning, D., Vorm, E., Wang, J.Y. and Turek, M. (2021), DARPA's explainable AI (XAI) program: A retrospective. Applied AI Letters, 2: e61.
1. The key components of Explainability
11
1. Scope of the explanation
12
Feature Viz,
Concept Activation Vector
Explanation “by design”
...
Feature Attribution
Feature Inversion
...
Nearest Neighbours
Influence Function
Prototypes
...
1. Application time
13
1. Necessary information
14
1. Format of the explanations
15
Formats: Attributions, Concept-based, Feature visualization, Example-based, Model surrogate.
Figure sources: Fel et al. (CVPR 2023); Captum tutorial; Augustin et al. (NeurIPS 2023); Xplique.
1. Target of the explanation
16
Explanations need to be tailored to users, depending on their knowledge and the context:
Example of a fifty-year-old man with lung cancer.
1. Target of the explanation
17
End users
Regulatory entities
Data scientists
Domain experts
1. Recall
18
Intrinsically explainable models
Post-hoc methods
Model-agnostic
Model-specific
Ribeiro et al., 2016
1.
19
antonin.poche (at) irt-saintexupery.com
Overview
20
1. Introduction
2. Attribution Methods
2. Attributions: definition
21
2. Attributions: main approaches
22
Perturbation-based: black-box access (only inputs and outputs).
Back-propagation-based: white-box access (weights and/or gradients).
Principle: assign an importance to each input variable for a given prediction.
2. Attributions: gradient-based methods
23
2. Attributions: gradient-based methods
24
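To make the principle concrete, here is a minimal, hedged sketch of two gradient-based attributions for a PyTorch image classifier; the function names, the channel-wise max pooling and the 32-step path approximation are illustrative choices, not necessarily the exact variants shown on the slide.

```python
import torch

def gradient_saliency(model, x, target_class):
    """Vanilla gradients: how much each pixel locally changes the class score."""
    x = x.clone().requires_grad_(True)             # x: (1, C, H, W)
    model(x)[0, target_class].backward()
    return x.grad.abs().max(dim=1).values          # (1, H, W) saliency map

def integrated_gradients(model, x, target_class, baseline=None, steps=32):
    """Integrated Gradients: average the gradients along a path baseline -> x."""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        model(point)[0, target_class].backward()
        total += point.grad
    return (x - baseline) * total / steps          # attribution, same shape as x
```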
2. Attributions: perturbation-based methods
25
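A minimal sketch of the simplest perturbation-based attribution (patch occlusion), assuming an image classifier; the 16-pixel patch size and the zero baseline are arbitrary assumptions.

```python
import torch

@torch.no_grad()
def occlusion_map(model, x, target_class, patch=16, baseline=0.0):
    """Score drop when a patch is replaced by the baseline = importance of that patch."""
    ref = model(x)[0, target_class].item()          # x: (1, C, H, W)
    _, _, h, w = x.shape
    heatmap = torch.zeros(h // patch, w // patch)
    for i in range(0, h - h % patch, patch):
        for j in range(0, w - w % patch, patch):
            masked = x.clone()
            masked[..., i:i + patch, j:j + patch] = baseline
            heatmap[i // patch, j // patch] = ref - model(masked)[0, target_class].item()
    return heatmap                                   # coarse (h//patch, w//patch) attribution
```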
2. Attributions: CAM family
26
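A hedged Grad-CAM-style sketch of the CAM idea: weight the feature maps of a convolutional layer by their pooled gradients, then keep the positive part; `conv_layer` is assumed to be the last convolutional block of the model.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, conv_layer, x, target_class):
    """Grad-CAM-style map: feature maps weighted by their average gradients, then ReLU."""
    store = {}
    h_fwd = conv_layer.register_forward_hook(lambda m, i, o: store.update(act=o))
    h_bwd = conv_layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0]))
    model(x)[0, target_class].backward()             # x: (1, C, H, W)
    h_fwd.remove(); h_bwd.remove()
    weights = store["grad"].mean(dim=(2, 3), keepdim=True)    # one weight per feature map
    cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True)).detach()
    return F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
```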
2. Attributions: qualitative results
Gradient-based
Perturbation-based
27
2. Attributions: class-specific
28
Selvaraju et al. (ICCV 2019)
2. Notebook time !
29
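For the notebook, libraries such as Captum and Xplique (both referenced earlier) package these attribution methods; a minimal Captum sketch, assuming a recent torchvision and a placeholder input rather than a real image.

```python
import torch
from captum.attr import IntegratedGradients, Occlusion
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2").eval()   # any trained classifier would do
x = torch.rand(1, 3, 224, 224)                     # placeholder image batch
target = int(model(x).argmax(dim=1))

ig_map = IntegratedGradients(model).attribute(x, target=target, n_steps=32)
occ_map = Occlusion(model).attribute(
    x, target=target, sliding_window_shapes=(3, 16, 16), strides=(3, 8, 8)
)
print(ig_map.shape, occ_map.shape)                 # both match the input shape
```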
2. Which one is "the best"?
30
Overview
31
1. Introduction
2. Attribution Methods
3. Evaluation and Human-centered perspective
3. Sanity check: a first problem
32
3. Confirmation bias and over-interpretation
33
Just because an explanation makes sense to humans doesn't mean it reflects the evidence the model actually used for its prediction.
3. Discriminative problem, The Phantom Menace
34
Rudin, Cynthia. “Please Stop Explaining Black Box Models for High Stakes Decisions.” Nature Machine Intelligence
3. The adversarial strikes back: attributions can be manipulated
35
Anders et al. (2020), Fairwashing Explanations with Off-Manifold Detergent
Ghorbani et al. (2017), Interpretation of Neural Networks is Fragile
Zhang et al. (2018), Interpretable Deep Learning under Fire
3. Fidelity metrics, A New Hope
36
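One common family of fidelity tests removes the most important features first and tracks the class score (Deletion/Insertion-style protocols); a minimal sketch, where the zero baseline and the number of steps are arbitrary assumptions.

```python
import torch

@torch.no_grad()
def deletion_curve(model, x, target_class, attribution, steps=20, baseline=0.0):
    """Remove the most important pixels first; a faithful attribution makes the score drop fast."""
    order = attribution.flatten().argsort(descending=True)   # most important pixels first
    x_work = x.clone().flatten(start_dim=2)                  # (1, C, H*W)
    scores = [model(x)[0, target_class].item()]
    chunk = max(1, len(order) // steps)
    for k in range(0, len(order), chunk):
        x_work[..., order[k:k + chunk]] = baseline           # erase those pixels in every channel
        scores.append(model(x_work.view_as(x))[0, target_class].item())
    return scores   # the area under this curve is the deletion score (lower = more faithful)
```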
3. Application time
37
3. Other fidelity metrics…
38
µFidelity (Bhatt et al., 2020)
Infidelity (Yeh et al., 2019)
AOPC (Samek et al., 2015)
3. Stability metrics
39
Stability (Alvarez-Melis et al., 2018)
Average Stability (Bhatt et al., 2020)
Open question: stability vs. fidelity?
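A minimal sketch of a stability-style measure, in the spirit of the metrics above: how much the explanation moves, relative to the input, under small random perturbations; `explain_fn`, the noise level and the number of samples are assumptions.

```python
import torch

def explanation_stability(explain_fn, x, target_class, n=8, eps=0.01):
    """Worst-case relative change of the explanation over a small random neighbourhood."""
    base = explain_fn(x, target_class)
    ratios = []
    for _ in range(n):
        x_pert = x + eps * torch.randn_like(x)
        e_pert = explain_fn(x_pert, target_class)
        ratios.append((e_pert - base).norm() / (x_pert - x).norm())
    return torch.stack(ratios).max()   # lower = more stable explanation
```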
3. Fidelity metrics: back to the baselines…
40
The choice of baseline has an impact.
Picard et al. (2024); Sturmfels et al. (2020)
3. Fidelity metrics: are we moving in the right direction?
41
3. Attribution methods: Mixed results
42
Making attribution methods (1) more faithful or (2) less visually complex does not appear to be a promising avenue.
Colin, Fel et al., 2021
3. Turning the Page on Attribution Methods?
43
See also:
“Our results suggest that explanations engender human trust, even for incorrect predictions, yet are not distinct enough for users to distinguish between correct and incorrect predictions.”
Kim et al.
3. Understanding why Attribution methods are not useful*
*in a complex scenario
44
Hypothesis 1: Need new models that are more aligned with human decision-making.
Hypothesis 2: Need novel methods that go beyond the “where” information conveyed by attribution methods.
3. A big step for XAI?
45
This is…
a shovel
Overview
46
1. Introduction
2. Attribution Methods
3. Evaluation and Human-centered perspective
4. Concept-based XAI
4. Concept-based XAI: a growing field
Have a look at this nice survey!
47
Poeta et al., 2023, Concept-based Explainable Artificial Intelligence: A Survey
2018 CAV & TCAV
2019 ProtoPNet, ACE
2020 CBM, ProtoTree
2021 ICE, ICB
2022 CRAFT, CAR
2023 Cockatiel, Holistic, Mech. Inter.
2024 Anthropic & Deep Mind
4. Types of concepts
48
Symbolic concepts
Human pre-defined attributes (e.g. shape, color)
Prototype concepts
Representative examples of peculiar traits of the training samples
Unsupervised concept bases
Concepts embedded in the latent space that match human intuition
Textual concepts
Textual descriptions of the main classes
Rabbits are small, furry mammals with long ears, short fluffy tails, and strong, large hind legs.
"FUR"
"LONG EARS"
4. Types of explanations
49
Example diagram: the concept LONG EARS associated with the class HARE.
Class-concept relations
Relation between a concept and an output class of the model
Node-concept association
Explicit association of a concept with hidden nodes of the network.
Concept-visualization
Visualization of a learnt concept in terms of the input features
4. Types of methods
50
|  | Ante-hoc: the model is trained to reason from concepts | Post-hoc: concepts are identified within the trained model |
| Supervised: requires labelled concepts | e.g. Concept Bottleneck Models [1] | e.g. CAV, TCAV [3] |
| Unsupervised: annotation-free | e.g. ProtoPNet, ProtoTree [2] | e.g. ACE [4], CRAFT [5] |
[1] Koh et al., Concept Bottleneck Models. ICML 2020
[2] Chen et al., This Looks Like That: Deep Learning for Interpretable Image Recognition. NeurIPS 2019
[3] Kim et al., Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). ICML 2018
[4] Ghorbani et al., Towards Automatic Concept-based Explanations. NeurIPS 2019
[5] Fel et al., CRAFT: Concept Recursive Activation FacTorization for Explainability. CVPR 2023
Supervised ante-hoc methods
4. C-XAI
51
Concept bottleneck models
4. C-XAI: Supervised ante-hoc methods
52
Koh et al. (ICML 2020)
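A minimal sketch of the concept-bottleneck idea, assuming a feature-extracting `backbone` and placeholder dimensions; training would add a concept loss (e.g. binary cross-entropy on concept labels) to the usual classification loss.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Sketch of a concept bottleneck: input -> predicted concepts -> label."""
    def __init__(self, backbone, feat_dim, n_concepts, n_classes):
        super().__init__()
        self.backbone = backbone
        self.concept_head = nn.Linear(feat_dim, n_concepts)   # supervised with concept labels
        self.label_head = nn.Linear(n_concepts, n_classes)    # reasons only from concepts

    def forward(self, x):
        concepts = torch.sigmoid(self.concept_head(self.backbone(x)))
        return self.label_head(concepts), concepts            # class logits + interpretable bottleneck
```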
Unsupervised ante-hoc methods
4. C-XAI
53
4. C-XAI: Unsupervised ante-hoc methods
Prototypical Part Network (ProtoPNet)
54
Chen et al. (spotlight NeurIPS 2019)
Supervised post-hoc methods
4. C-XAI
55
4. Concept Activation Vector (CAV) & TCAV
56
Kim et al. (ICML 2018)
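A hedged sketch of CAV and the TCAV score: a CAV is the normal of a linear classifier separating concept activations from random activations, and TCAV counts how often the class score increases along that direction; the array shapes are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(acts_concept, acts_random):
    """CAV = normal of a linear classifier separating concept vs. random activations."""
    X = np.concatenate([acts_concept, acts_random])           # (n, d) layer activations
    y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
    cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return cav / np.linalg.norm(cav)

def tcav_score(grads_class, cav):
    """Fraction of class examples whose class score increases along the concept direction."""
    directional = grads_class @ cav                            # grads: d(class score)/d(activation), (n, d)
    return float((directional > 0).mean())
```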
Unsupervised post-hoc methods
4. C-XAI
57
4. Unsupervised post-hoc C-XAI framework
58
Diagram: input (n samples, k dimensions: tokens, patches, …) → layer L activations (d neurons) → predictions (p outputs).
Slide credit: C. Claye
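A minimal sketch of the first step of this framework, collecting the activation matrix A of a chosen layer for a set of samples; the hook-based implementation and the flattening of spatial/token positions are assumptions.

```python
import torch

@torch.no_grad()
def layer_activations(model, layer, inputs):
    """Collect the activation matrix A of `layer` for a batch of inputs,
    one row per activation vector (spatial/token positions are flattened)."""
    store = {}
    handle = layer.register_forward_hook(lambda m, i, o: store.update(a=o))
    model(inputs)
    handle.remove()
    a = store["a"]                                    # e.g. (n, d, h, w) or (n, tokens, d)
    if a.dim() == 4:                                  # conv features: move channels last
        a = a.permute(0, 2, 3, 1)
    return a.reshape(-1, a.shape[-1])                 # (n * positions, d) matrix to factorize
```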
4. Unsupervised post-hoc C-XAI framework
59
Why can we find such concepts?
Deep neural networks encode concepts inside their representations: early layers capture low-level concepts (morphology, local relations), while deeper layers capture abstract concepts (semantics, long-distance relations).
Olah et al. (Distill, 2017); Belinkov et al. (Computational Linguistics, 2020)
"Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction" (LeCun et al., Nature, 2015)
Slide credit: C. Claye
4. Module 1: choice of Layer
60
1 – Which layer to extract concepts from?
Slide credit: C. Claye
4. Module 1: choice of Layer
61
1 – Concepts should have a semantic meaning for the user.
2 – The user must be able to understand how the model uses concepts.
Early layers encode low-level concepts (morphology, local relations); after numerous non-linear operations, deeper layers encode abstract concepts (semantics, long-distance relations).
1 – Which layer to extract concepts from?
Slide credit: C. Claye
4. Module 2: choice of the samples
62
1 – Which layer to extract concepts from?
2 – From which samples to extract concepts?
Slide credit: C. Claye
4. Module 2: choice of the samples
63
2 – From which samples to extract concepts?
Class-wise? Representativity? Quantity? Granularity?
Example of granularity with image and text:
Full caption: "Photo of a rabbit with a fuzzy yellow background. In front there is grass with a vibrant green color. The rabbit is fixing us with an alert gaze. Its ears are directed toward us as if the photographer took the picture just after making some noise."
Sentence: "Its ears are directed toward us"
Word: "ears"
4. Module 3: concept extraction method
64
1 – Which layer to extract concepts from?
2 – From which samples to extract concepts?
3 – How to extract concepts?
Slide credit: C. Claye
4. Module 3: concept extraction method
65
Concept property: Vielhaben et al. (TMLR, 2023)
Diagram: individual neurons (a1, a2, a3) taken directly as concepts (c1, c2, c3).
3 – How to extract concepts?
Slide credit: C. Claye
4. Module 3: Polysemanticity / Superposition problem
66
The representation basis we seek is very likely distributed: concepts correspond to directions rather than individual neurons.
Elhage et al. 2022; Olah et al. 2020; Arora et al. 2018; Cheung et al. 2019; Fel et al. 2022
4. Module 3: concept extraction method
67
Concept property: Vielhaben et al. (TMLR, 2023)
Diagram: neurons as concepts vs. orthogonal directions (rotation of the neuron basis).
Methods: PCA, SVD.
3 – How to extract concepts?
Slide credit: C. Claye
4. Module 3: concept extraction method
68
Concept property: Vielhaben et al. (TMLR, 2023)
Diagram: neurons as concepts; orthogonal directions (rotation: PCA, SVD); arbitrary directions (non-orthogonality: NMF, e.g. Jourdan et al. 2023, Fel et al. 2023).
3 – How to extract concepts?
4. CRAFT: an NMF-based concept method
Extract, Estimate, Visualize
69
4. CRAFT: an NMF-based concept method
Extract, Estimate, Visualize
70
https://serre-lab.github.io/Lens/
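A minimal sketch of the "Extract" step in an NMF-based method such as CRAFT, using scikit-learn; the number of concepts, the initialization and the iteration budget are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

def extract_concepts_nmf(activations, n_concepts=10):
    """Factorize non-negative activations A ≈ U W: rows of W are concept directions,
    U gives each sample's (or crop's) concept coefficients."""
    A = np.maximum(activations, 0.0)                  # post-ReLU activations are non-negative
    nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)
    U = nmf.fit_transform(A)                          # (n, n_concepts) concept presence per sample
    W = nmf.components_                               # (n_concepts, d) concept bank
    return U, W
```

The "Estimate" and "Visualize" steps then score each concept's importance (CRAFT uses Sobol indices) and display the crops with the highest coefficients in U.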
4. Module 3: concept extraction method
71
Concept property: Vielhaben et al. (TMLR, 2023)
Diagram: neurons as concepts; orthogonal directions (rotation: PCA, SVD); arbitrary directions (non-orthogonality: NMF, e.g. Jourdan et al. 2023, Fel et al. 2023); multi-dimensional bases (auto-encoders, e.g. Zhao et al. 2023).
3 – How to extract concepts?
4. Module 3: concept extraction method
72
Concept property: Vielhaben et al. (TMLR, 2023)
Diagram: neurons as concepts; orthogonal directions (rotation: PCA, SVD); arbitrary directions (non-orthogonality: NMF, e.g. Jourdan et al. 2023, Fel et al. 2023); multi-dimensional bases (auto-encoders, e.g. Zhao et al. 2023); non-linear regions.
3 – How to extract concepts?
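For the auto-encoder row above, a minimal sparse-autoencoder sketch over activation vectors; the overcomplete dictionary size and the L1 weight are illustrative assumptions, not the setup of any specific paper.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete dictionary of concepts: activations ≈ decoder(sparse codes)."""
    def __init__(self, d_neurons, d_concepts):
        super().__init__()
        self.encoder = nn.Linear(d_neurons, d_concepts)
        self.decoder = nn.Linear(d_concepts, d_neurons, bias=False)

    def forward(self, a):
        codes = torch.relu(self.encoder(a))            # sparse, non-negative concept codes
        return self.decoder(codes), codes

def sae_loss(model, a, l1=1e-3):
    """Reconstruction error plus a sparsity penalty on the concept codes."""
    recon, codes = model(a)
    return ((recon - a) ** 2).mean() + l1 * codes.abs().mean()
```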
4. Module 4: Semantic meaning of the concepts
73
1 – Which layer to extract concepts from?
2 – From which samples to extract concepts?
3 – How to extract concepts?
4 – What is the semantic meaning of the concepts?
4. Module 4: Concept communication
74
Three ways to communicate a concept:
Most aligned patches
Feature visualization
Labelling (e.g. "orange peel", "sliced orange")
Fel et al., MACO, NeurIPS 2023
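A minimal sketch of the "most aligned patches" view, assuming a factorization like the NMF sketch earlier: rank the crops by their coefficient for the chosen concept.

```python
import numpy as np

def most_aligned_patches(U, crops, concept_id, top_k=5):
    """Return the input crops whose concept coefficient (column of U) is highest.
    U is the (n_crops, n_concepts) coefficient matrix, crops the matching image patches."""
    order = np.argsort(U[:, concept_id])[::-1][:top_k]
    return [crops[i] for i in order]
```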
4. Module 4: Feature Visualization
75
Olah et al. 2017-2020
4. Module 4: Feature Visualization
76
Examples on VGG16, ResNet50 and ViT (Fel et al., 2023).
Find an image that maximizes an objective, e.g. a neuron, a channel, a direction (concept)…
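A naive activation-maximization sketch of this objective; real feature visualization (e.g. MACO) adds parametrizations and regularizers on top. `model_to_layer` (the network truncated at the chosen layer) and the optimization settings are assumptions.

```python
import torch

def visualize_concept(model_to_layer, direction, steps=256, lr=0.05, size=224):
    """Optimize an image so the layer's spatially averaged activation aligns
    with `direction` (a d-dimensional concept vector)."""
    img = torch.randn(1, 3, size, size, requires_grad=True)
    opt = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        acts = model_to_layer(img)                       # expected shape (1, d, h, w)
        objective = (acts.mean(dim=(2, 3)) @ direction).sum()
        (-objective).backward()                          # gradient ascent on the objective
        opt.step()
    return img.detach()
```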
4. Module 5: Concept importance
77
1 – Which layer to extract concepts from?
2 – From which samples to extract concepts?
3 – How to extract concepts?
4 – What is the semantic meaning of the concepts?
5 – How does the model use the concepts?
4. Module 5: Concept importance
78
1 – "Local" explanation
2 – "Global" explanation
"If concept 1 and concept 0 but not concept 3 → class 1"
5 – How does the model use the concepts?
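A minimal sketch of one simple way to estimate concept importance, by ablating each concept in the factorized activations and measuring the drop in the class score; `head_fn` (the rest of the model, from layer activations to class scores) and the U, W factors are assumptions, and CRAFT itself uses Sobol indices rather than plain ablation.

```python
import numpy as np

def concept_importance(head_fn, U, W, class_id):
    """Importance of concept c = average drop of the class score when concept c is
    removed from the reconstructed activations A ≈ U @ W."""
    base = head_fn(U @ W)[:, class_id]                 # scores with every concept present
    importances = []
    for c in range(U.shape[1]):
        U_off = U.copy()
        U_off[:, c] = 0.0                              # switch concept c off
        drop = base - head_fn(U_off @ W)[:, class_id]
        importances.append(float(drop.mean()))
    return np.array(importances)                       # one importance score per concept
```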
4. A Modular framework: summary
79
1 – Which layer to extract concepts from?
2 – From which samples to extract concepts?
3 – How to extract concepts?
4 – What is the semantic meaning of the concepts?
5 – How does the model use the concepts?
4. Application time
80
Conclusions
81
Céline Hudelot, Wassila Ouerdane,
Thomas Fel and Antonin Poché
PFIA 2024, La Rochelle
Thank you for your attention!
Any questions?
82
Discussions
83