1 of 84

Some new directions for Explainable AI

An illustrated tutorial

Céline Hudelot, Wassila Ouerdane,

Thomas Fel and Antonin Poché

PFIA 2024, La Rochelle

2 of 84

Overview

2

1. Introduction

2. Attribution Methods

3. Evaluation and Human-centered perspective

4. Concept-based XAI

3 of 84

1. Context: the era of promising high-performing models

3

Driving in Paris (source : Valeo)

Helping mathematical sciences (Nature, 2023)

Finding new materials : A-lab Berkeley

Aided-Diagnostics (Nature, 2016)

4 of 84

1. Context: the era of black-box models

4

See also:

‘Towards A Rigorous Science of Interpretable Machine Learning’, Doshi-Velez & Kim.

‘The Mythos of Model Interpretability’, Lipton

Tesla on Autopilot slams into a truck

5 of 84

1. Context: the era of black-box models

5

6 of 84

1. Why do we need Explainability ?

  • Build trust in the model predictions

6

source: Freepik

[1] Bryce Goodman et al. European Union regulations on algorithmic decision-making and a "right to explanation".

[2] Finale Doshi-Velez et al. Accountability of AI under the law: The role of explanation.

[3] Gabriel Cadamuro et al. Debugging machine learning models.

[4] Alfredo Vellido et al. Making machine learning models interpretable.

7 of 84

1. Why do we need Explainability ?

  • Build trust in the model predictions
  • Satisfy regulatory requirements and certification processes (e.g. the EU AI Act, 2024)

7

source: Freepik


8 of 84

1. Why do we need Explainability ?

  • Build trust in the model predictions
  • Satisfy regulatory requirements and certification processes (e.g. the EU AI Act, 2024)
  • Reveal biases or other unintended effects learned by a model

8

Ribeiro et al.: "Why Should I Trust You?"


9 of 84

1. Why do we need Explainability ?

  • Build trust in the model predictions
  • Satisfy regulatory requirements and certification processes (e.g. the EU AI Act, 2024)
  • Reveal biases or other unintended effects learned by a model
  • Understand models in order to intervene on them

9


Ribeiro et al.: "Why Should I Trust You?"

10 of 84

1. XAI: eXplainable Artificial Intelligence

10

Source: Gunning, D., Vorm, E., Wang, J.Y. and Turek, M. (2021), DARPA's explainable AI (XAI) program: A retrospective. Applied AI Letters, 2: e61.

11 of 84

1. The key components of Explainability

11

12 of 84

1. Scope of the explanation

12

Feature Viz, Concept Activation Vectors, explanation “by design”, ...

Feature Attribution, Feature Inversion, ...

Nearest Neighbours, Influence Functions, Prototypes, ...

13 of 84

1. Application time

13

14 of 84

1. Necessary information

14

15 of 84

1. Format of the explanations

15

Attributions, Concept-based, Feature viz, Example-based, Model surrogate

(examples from Fel et al., CVPR 2023; Captum tutorial; Augustin et al., NeurIPS 2023; Xplique)

16 of 84

1. Target of the explanation

16

Explanations need to be tailored to the user, depending on their knowledge and on the context:

Example: a fifty-year-old man with lung cancer.

17 of 84

1. Target of the explanation

17

End users

Regulatory entities

Data scientist

Domain expert

18 of 84

1. Recall

For a more detailed introduction, see last year's tutorial

18

Intrinsically explainable models

Post-hoc methods

Model-agnostic

Model-specific

Ribeiro et al., 2016

19 of 84

1.

19

antonin.poche (at) irt-saintexupery.com

20 of 84

Overview

20

1. Introduction

2. Attribution Methods

21 of 84

2. Attributions: definition

21

22 of 84

2. Attributions: main approaches

22

Principle: assign an importance to each input variable for a given prediction.

Perturbation-based: probe the model through its inputs and outputs (black-box access).

Back-propagation-based: use the weights and/or gradients (white-box access).
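To make the two families concrete, here is a minimal sketch, assuming a differentiable PyTorch classifier `model` and an input batch `x` of shape (1, C, H, W) (both names are placeholders, not from the tutorial): a plain gradient saliency map (back-propagation-based) and a simple occlusion map (perturbation-based).

import torch

def gradient_saliency(model, x, target_class):
    # Back-propagation-based: importance = |d score / d input|.
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return x.grad.abs().sum(dim=1)  # aggregate over channels -> (1, H, W)

def occlusion_map(model, x, target_class, patch=16, baseline=0.0):
    # Perturbation-based: importance = drop of the score when a patch is masked.
    _, _, h, w = x.shape
    heatmap = torch.zeros((h + patch - 1) // patch, (w + patch - 1) // patch)
    with torch.no_grad():
        ref = model(x)[0, target_class]
        for i in range(0, h, patch):
            for j in range(0, w, patch):
                x_pert = x.clone()
                x_pert[:, :, i:i + patch, j:j + patch] = baseline
                heatmap[i // patch, j // patch] = ref - model(x_pert)[0, target_class]
    return heatmap

The occlusion variant only needs input/output access, while the saliency variant needs the gradients, which is exactly the white-box requirement above.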

23 of 84

2. Attributions: gradient-based methods

23

24 of 84

2. Attributions: gradient-based methods

24

25 of 84

2. Attributions: perturbation-based methods

25

26 of 84

2. Attributions: CAM family

26
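As a rough illustration of the idea behind the CAM family, here is a minimal Grad-CAM-style sketch. It assumes the feature maps of the chosen convolutional layer were captured during the forward pass (e.g. with a forward hook) and kept in the autograd graph; `feature_maps` and `class_score` are placeholder names.

import torch
import torch.nn.functional as F

def grad_cam(feature_maps, class_score):
    # feature_maps: (1, K, h, w) activations of the chosen conv layer, still in the graph.
    # class_score: scalar logit of the target class.
    grads = torch.autograd.grad(class_score, feature_maps, retain_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)     # global average pooling of the gradients
    cam = F.relu((weights * feature_maps).sum(dim=1))  # weighted sum of the maps, keep positive evidence
    return cam / (cam.max() + 1e-8)                    # coarse heatmap, to be upsampled to the input size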

27 of 84

2. Attributions: qualitative results

Gradient-based

Perturbation-based

27

28 of 84

2. Attributions: class-specific

28

Selvaraju et al. (ICCV 2019)

29 of 84

2. Notebook time !

29


30 of 84

2. Which one is “the best” ?

30

31 of 84

Overview

31

1. Introduction

2. Attribution Methods

3. Evaluation and Human-centered perspective

32 of 84

3. Sanity check: a first problem

32

33 of 84

3. Confirmation bias and over-interpretation

33

Just because an explanation makes sense to humans doesn't mean it reflects the evidence behind the model's prediction.

34 of 84

3. The discriminative problem: The Phantom Menace

34

Rudin, Cynthia. “Please Stop Explaining Black Box Models for High Stakes Decisions.” Nature Machine Intelligence

35 of 84

3. Adversarial attacks strike back: attributions can be manipulated

35

Fairwashing Explanations with Off-Manifold Detergent, Anders et al. (2020)

Interpretation of Neural Networks is Fragile, Ghorbani et al. (2017)

Interpretable Deep Learning under Fire, Zhang et al. (2018)

36 of 84

3. Fidelity metrics, A New Hope

36

37 of 84

3. Application time

37


38 of 84

3. Other fidelity metrics…

38

µFidelity (Bhatt et al., 2020)

Infidelity (Yeh et al., 2019)

AOPC (Samek et al., 2015)
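These fidelity metrics share the same recipe: remove (or keep) the features ranked most important by the attribution and track how the model's score degrades. A minimal Deletion/AOPC-style sketch, assuming for simplicity a flat input `x` of shape (1, d) and an attribution vector `phi` of shape (d,) (all names are placeholders):

import torch

def deletion_curve(model, x, phi, target_class, steps=20, baseline=0.0):
    # Mask features from most to least important and record the target-class score;
    # a faithful attribution makes this curve drop quickly.
    order = phi.argsort(descending=True)
    chunk = max(1, x.shape[1] // steps)
    x_pert, scores = x.clone(), []
    with torch.no_grad():
        for k in range(0, x.shape[1], chunk):
            x_pert[0, order[k:k + chunk]] = baseline
            scores.append(model(x_pert)[0, target_class].item())
    return scores  # the area under (or above) this curve gives the fidelity score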

39 of 84

3. Stability metrics

39

Stability (Alvarez-Melis et al., 2018)

Avg Stability (Bhatt et al., 2020)

Open question: stability vs fidelity?
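Stability metrics ask the complementary question: do nearly identical inputs get nearly identical explanations? A minimal sketch in the spirit of the local Lipschitz estimate of Alvarez-Melis et al. (the sampling scheme, radius and number of samples are assumptions):

import torch

def local_stability(explainer, x, n_samples=16, radius=0.05):
    # explainer: any function mapping an input to an attribution tensor (placeholder).
    # Returns the worst observed ratio between the change of the explanation and
    # the change of the input (larger = less stable).
    phi = explainer(x)
    worst = 0.0
    for _ in range(n_samples):
        x_noisy = x + radius * torch.randn_like(x)
        phi_noisy = explainer(x_noisy)
        ratio = (phi - phi_noisy).norm() / ((x - x_noisy).norm() + 1e-8)
        worst = max(worst, ratio.item())
    return worst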

40 of 84

3. Fidelity metrics: back to the baselines…

40

The baseline has an impact.

Picard et al. (2024); Sturmfels et al. (2020)

41 of 84

3. Fidelity metrics: are we moving in the right direction ?

41

42 of 84

3. Attribution methods: Mixed results

42

Making attribution methods (1) more faithful or (2) less visually complex does not appear to be a promising avenue.

Colin, Fel et al. (2021)

43 of 84

3. Turning the Page on Attribution Methods ?

43

See also:

  • ‘The effectiveness of feature attribution methods and its correlation with automatic evaluation scores’, Nguyen et al.
  • ‘HIVE: Evaluating the Human Interpretability of Visual Explanations’, Kim et al.
  • ‘Evaluating explainable AI: Which algorithmic explanations help users predict model behavior?’, Hase et al.
  • ‘Do Users Benefit From Interpretable Vision? A User Study, Baseline, And Dataset’, Sixt et al.

“Our results suggest that explanations engender human trust, even for incorrect predictions, yet are not distinct enough for users to distinguish between correct and incorrect predictions.”

Kim et al.

44 of 84

3. Understanding why Attribution methods are not useful*

*in a complex scenario

44

Hypothesis 1: Need new models that are more aligned with human decision-making.

Hypothesis 2: Need novel methods that go beyond the “where” information conveyed by attribution methods.

45 of 84

3. A big step for XAI ?

45

This is…

a shovel


46 of 84

Overview

46

1. Introduction

2. Attribution Methods

3. Evaluation and Human-centered perspective

4. Concept-based XAI

47 of 84

4. Concept-based XAI: a growing field

Have a look at the nice survey!

47

2018: CAV & TCAV

2019: ProtoPNet, ACE

2020: CBM, ProtoTree

2021: ICE, ICB

2022: CRAFT, CAR

2023: Cockatiel, Holistic, Mech. Interp.

2024: Anthropic & DeepMind

48 of 84

4. Types of concepts

48

Symbolic concepts

Human pre-defined attributes (e.g. shape, color)

Prototype concepts

Representative examples capturing characteristic traits of the training samples

Unsupervised concept bases

Concepts embedded in the latent space that match human intuition

Textual concepts

Textual descriptions of the main classes

Rabbits are small, furry mammals with long ears, short fluffy tails, and strong, large hind legs.

« FUR »

« LONG EARS »

49 of 84

4. Types of explanations

49

LONG EARS 

HARE

LONG EARS 

Class-concept relations

Relation among a concept and an output class of a model

Node-concept association

Explicit association of a concept with hidden nodes of the network.

Concept-visualization

Visualization of a learnt concept in terms of the input features

50 of 84

4. Types of methods

50

Ante-hoc

The model is trained to reason from concepts

Post-hoc

Concepts are identified within the trained model

Supervised

Requires labelled concepts

e.g. Concept bottleneck models [1]

e.g. CAV, TCAV [3]

Unsupervised

Annotation free

e.g. ProtoPNet, ProtoTree [2]

e.g. ACE [4], CRAFT [5]

[1] Koh et al., Concept Bottleneck Models. ICML 2020

[2] Chen et al., This Looks Like That: Deep Learning for Interpretable Image Recognition. NeurIPS 2019

[3] Kim et al., Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). ICML 2018

[4] Ghorbani et al., Towards Automatic Concept-based Explanations. NeurIPS 2019

[5] Fel et al., CRAFT: Concept Recursive Activation FacTorization for Explainability. CVPR 2023

51 of 84

Supervised ante-hoc methods

4. C-XAI

51

52 of 84

Concept bottleneck models

4. C-XAI: Supervised ante-hoc methods

52

Koh et al. (ICML 2020)

  • Training dataset $\left\{ (x^{i}, y^{i}, c^{i}) \right\}_{i=1}^{n}$ with $c$ a vector of $k$ concepts

  • Model of the form $f(g(x))$, where $g$ maps an input to the concept space and $f$ maps concepts to the final prediction.

  • Various learning schemes (a minimal sketch of the joint scheme is given below):
    • Independent
    • Sequential
    • Joint
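A minimal sketch of a concept bottleneck with the joint scheme (the architecture, the sigmoid on concept logits and the loss weight `lambda_c` are illustrative assumptions, not the paper's exact setup):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptBottleneckModel(nn.Module):
    def __init__(self, backbone, n_concepts, n_classes):
        super().__init__()
        self.g = backbone                          # g: x -> k concept logits (assumed output size)
        self.f = nn.Linear(n_concepts, n_classes)  # f: concepts -> prediction

    def forward(self, x):
        c_logits = self.g(x)
        y_logits = self.f(torch.sigmoid(c_logits))
        return y_logits, c_logits

def joint_loss(y_logits, y, c_logits, c, lambda_c=1.0):
    # Joint scheme: supervise the task and the concepts at the same time
    # (c is a float tensor of 0/1 concept annotations).
    task = F.cross_entropy(y_logits, y)
    concepts = F.binary_cross_entropy_with_logits(c_logits, c)
    return task + lambda_c * concepts

In the independent and sequential schemes, g and f are instead trained separately (f on the ground-truth concepts, or on the concepts predicted by a frozen g).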

53 of 84

Unsupervised ante-hoc methods

4. C-XAI

53

54 of 84

4. C-XAI: Unsupervised ante-hoc methods

Prototypical Part Network (ProtoPNet)

54

Chen et al. (spotlight NeurIPS 2019)
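The core mechanism of ProtoPNet can be sketched as a similarity layer between patch embeddings and learnt prototypes; the sketch below shows only the similarity computation, using the log-activation of the paper, with shapes and names (`patch_embeddings`, `prototypes`) as assumptions:

import torch

def prototype_similarities(patch_embeddings, prototypes, eps=1e-4):
    # patch_embeddings: (n_patches, d) embeddings of the input's patches.
    # prototypes: (n_prototypes, d) learnt prototypical parts.
    d2 = torch.cdist(prototypes, patch_embeddings) ** 2  # squared L2 distances (p, n_patches)
    d_min = d2.min(dim=1).values                          # closest patch for each prototype
    return torch.log((d_min + 1) / (d_min + eps))         # high value = "this looks like that"

These similarity scores then feed a final linear layer that produces the class logits.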

55 of 84

Supervised post-hoc methods

4. C-XAI

55

56 of 84

4. Concept Activation Vector (CAV) & TCAV

56

Kim et al. (ICML 2018)
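CAV fits a linear separator in activation space between examples of a concept and random counter-examples; the vector normal to that boundary is the concept direction, and TCAV measures how often the class score is positively sensitive to it. A minimal sketch with scikit-learn (the array names are assumptions):

import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(acts_concept, acts_random):
    # acts_*: (n, d) activations at the chosen layer for concept / random images.
    X = np.concatenate([acts_concept, acts_random])
    y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)          # the CAV: direction of the concept in activation space

def tcav_score(grads_class, cav):
    # grads_class: (n, d) gradients of the class score w.r.t. the same layer,
    # computed on examples of the class of interest.
    return float(np.mean(grads_class @ cav > 0))  # fraction of positive directional derivatives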

57 of 84

Unsupervised post-hoc methods

4. C-XAI

57

58 of 84

4. Unsupervised post-hoc C-XAI framework

58

Input: n samples, k dimensions (tokens, ...) → Layer L activations: d neurons → Predictions: p outputs

Slide credit: C. Claye

59 of 84

4. Unsupervised post-hoc C-XAI framework

59

Why can we find such concepts? Deep neural networks encode concepts inside their representations, from low-level concepts (morphology, local relations) to abstract concepts (semantics, long-distance relations).

Olah et al. (Distill, 2017); Belinkov et al. (Computational Linguistics, 2020)

“Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.”

LeCun et al. (Nature, 2015)

Slide credit : C. Claye

60 of 84

4. Module 1: choice of Layer

60

Which layer to extract

concepts from ?

1

Slide credit : C. Claye

61 of 84

4. Module 1: choice of Layer

61

1 – Concepts should have a semantic meaning for the user

2 – The user must be able to understand how the model uses concepts

Numerous non-linear operations

Low-level concepts

Morphology, local relations

Abstract concepts

Semantics, long-distance relations

Which layer to extract

concepts from ?

1

Slide credit : C. Claye

62 of 84

4. Module 2: choice of the samples

62

Which layer to extract

concepts from ?

1

From which samples

to extract concepts ?

2

Slide credit : C. Claye

63 of 84

4. Module 2: choice of the samples

63

From which samples

to extract concepts ?

2

Class-wise?

Representativity?

Quantity?

Granularity?

Examples of granularity with image and text

Photo of a rabbit with a fuzzy yellow background. In front there is grass with a vibrant green color. The rabbit is fixing us with an alert gaze. Its ears are directed toward us, as if the photographer took the picture just after making some noise.

Its ears are directed toward us

ears

64 of 84

4. Module 3: concept extraction method

64

Which layer to extract

concepts from ?

1

From which samples

to extract concepts ?

2

3

How to extract concepts ?

Slide credit : C. Claye

65 of 84

4. Module 3: concept extraction method

65

Concept property (Vielhaben et al., TMLR 2023): Neurons — concepts c1, c2, c3 aligned with individual neurons a1, a2, a3.

3

How to extract concepts ?

Slide credit : C. Claye

66 of 84

4. Module 3: Polysemanticity / Superposition problem

66

There is a high probability that the representation basis we seek is distributed: “directions rather than individual neurons.”

Elhage et al. 2022; Olah et al. 2020; Arora et al. 2018; Cheung et al. 2019; Fel et al. 2022

67 of 84

4. Module 3: concept extraction method

67

Concept property (Vielhaben et al., TMLR 2023): Neurons → Orthogonal directions (rotation). Methods: PCA, SVD.

3

How to extract

concepts ?

Slide credit : C. Claye

68 of 84

4. Module 3: concept extraction method

68

Concept property (Vielhaben et al., TMLR 2023): Neurons → Orthogonal directions (rotation) → Arbitrary directions (non-orthogonality). Methods: PCA, SVD; NMF (e.g. Jourdan et al., 2023; Fel et al., 2023).

3

How to extract

concepts ?

69 of 84

4. CRAFT: NMF based concept method

Extract, Estimate, Visualize

69
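The “Extract” step can be approximated with an off-the-shelf NMF: factor the (non-negative, e.g. post-ReLU) activation matrix into concept coefficients U and a concept bank W. A minimal sketch with scikit-learn (the number of concepts and the variable names are assumptions; CRAFT itself adds recursivity, crop sampling and importance estimation on top):

import numpy as np
from sklearn.decomposition import NMF

def extract_concepts(activations, n_concepts=10):
    # activations: (n_patches, d) non-negative activations of image crops at the chosen layer.
    nmf = NMF(n_components=n_concepts, init="nndsvd", max_iter=500)
    U = nmf.fit_transform(activations)  # (n_patches, n_concepts): how much each crop expresses each concept
    W = nmf.components_                 # (n_concepts, d): concept directions in activation space
    return U, W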

70 of 84

4. CRAFT: NMF based concept method

Extract, Estimate, Visualize

70

https://serre-lab.github.io/Lens/

71 of 84

4. Module 3: concept extraction method

71

Concept property (Vielhaben et al., TMLR 2023): Neurons → Orthogonal directions (rotation) → Arbitrary directions (non-orthogonality) → Bases (multi-dimensionality). Methods: PCA, SVD; NMF (e.g. Jourdan et al., 2023; Fel et al., 2023); Auto-encoders (e.g. Zhao et al., 2023).

3

How to extract

concepts ?

72 of 84

4. Module 3: concept extraction method

72

Concept property (Vielhaben et al., TMLR 2023): Neurons → Orthogonal directions (rotation) → Arbitrary directions (non-orthogonality) → Bases (multi-dimensionality) → Regions (non-linearity). Methods: PCA, SVD; NMF (e.g. Jourdan et al., 2023; Fel et al., 2023); Auto-encoders (e.g. Zhao et al., 2023).

3

How to extract

concepts ?

73 of 84

4. Module 4: Semantic meaning of the concepts

73

Which layer to extract

concepts from ?

1

From which samples

to extract concepts ?

2

3

How to extract

concepts ?

4

What is the semantic meaning of the concepts ?

74 of 84

4. Module 4: Concepts communication

74

Most aligned patches

Feature visualization

Labelling: “Orange peel”, “Sliced orange”

Fel et al., MACO (NeurIPS 2023)

75 of 84

4. Module 4: Feature Visualization

75

Olah et al. 2017-2020

76 of 84

4. Module 4: Feature Visualization

76

VGG16

ResNet50

ViT

Fel et al. (2023)

Find an image that maximizes an objective, e.g. a neuron, a channel, a direction (concept)…
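A minimal gradient-ascent sketch of this objective, without the transformations and frequency priors that real feature visualization (including MACO) relies on; `model_up_to_layer` (returning a flat activation vector) and `direction` are placeholder assumptions:

import torch

def visualize_direction(model_up_to_layer, direction, steps=256, lr=0.05):
    # Optimize an image so that the layer's activation aligns with the given direction (concept).
    x = torch.rand(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        acts = model_up_to_layer(x)         # (1, d) activations at the chosen layer
        loss = -(acts * direction).sum()    # maximize the projection on the direction
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            x.clamp_(0, 1)                  # keep a valid image
    return x.detach()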

77 of 84

4. Module 5: Concept importance

77

Which layer to extract concepts from ?

1

From which samples

to extract concepts ?

2

3

How to extract

concepts ?

4

What is the semantic meaning of the concepts ?

How does the model use the concepts?

5

78 of 84

4. Module 5: Concept importance

78

1 - “Local” explanation

2 - “Global” explanation

“If concept 1 and concept 0 but not concept 3 → class 1”

How does the model use the concepts?

5
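One simple way to quantify how the model uses a concept is ablation: reconstruct the activations from the concept decomposition, zero out one concept, and measure the drop of the class score. A minimal sketch reusing the U, W factors from the extraction step and a `head` mapping layer activations to logits (all names are assumptions; other choices, e.g. gradient- or Sobol-based importance, are possible):

import numpy as np

def concept_importance(head, U, W, target_class):
    # head: function mapping reconstructed activations (n, d) to logits (n, n_classes).
    # U: (n, k) concept coefficients, W: (k, d) concept bank.
    base = head(U @ W)[:, target_class]
    importances = []
    for c in range(U.shape[1]):
        U_abl = U.copy()
        U_abl[:, c] = 0.0                               # ablate concept c
        drop = base - head(U_abl @ W)[:, target_class]
        importances.append(drop.mean())                 # average score drop = importance of concept c
    return np.array(importances)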

79 of 84

4. A Modular framework: summary

79

3

How to extract

concepts ?

4

What is the semantic meaning of the concepts ?

How does the model use the concepts?

5

Which layer to extract

concepts from ?

1

From which samples

to extract concepts ?

2

80 of 84

4. Application time

80


81 of 84

Conclusions

81

Céline Hudelot, Wassila Ouerdane,

Thomas Fel and Antonin Poché

PFIA 2024, La Rochelle

82 of 84

Thank you for your attention!

Any questions?

82


Céline Hudelot, Wassila Ouerdane,

Thomas Fel and Antonin Poché

PFIA 2024, La Rochelle

83 of 84

Discussions

83

Céline Hudelot, Wassila Ouerdane,

Thomas Fel and Antonin Poché

PFIA 2024, La Rochelle

84 of 84

To be kept informed, register at https://mygdr.hosted.lip6.fr/register

for the GDR RADIA - GT EXPLICON

84