1 of 153

Self-Explaining for Intuitive Interaction with AI

Ana Marasović

Allen Institute for AI (AI2) · AllenNLP · University of Washington

2 of 153

2

AI technology has become an integral part of most people’s daily lives

3 of 153

3

+

text + labels

neural network

+

(not) spam

4 of 153

4

+

(not) sick

+

text + labels

neural network

(not) sick

AI Developer

Domain Experts (Doctors)

People Affected by AI

(Patients)

5 of 153

5

Increasingly harder to opt out

  • Faster diagnosis
  • Better treatment
  • Less burnout & stress
  • Faster diagnosis
  • Better treatment
  • Hurt their patients
  • Bad performance review
  • Getting fired
  • Lawsuits
  • Delayed care
  • Wrong treatment
  • Death

Doctors

Patients

Promised Benefits

Risks

6 of 153

Challenge: How to maximize the benefits of AI systems while preventing and minimizing risks?

Approach: Build AI systems that are capable of maintaining contracts, where the contracts are created to enable people to have appropriate confidence in the AI’s development and its applications

6

7 of 153

An AI model is trustworthy to a given contract if it is capable of maintaining the contract.


7

[Jacovi, Marasović, Miller, Goldberg; FAccT 2021]

8 of 153

An AI model is trustworthy to a given contract if it is capable of maintaining the contract.

If a human perceives that an AI model is trustworthy to a contract, and accepts vulnerability to the AI’s actions, then the human trusts the AI contractually. Otherwise, the human distrusts the AI contractually.

8

[Jacovi, Marasović, Miller, Goldberg; FAccT 2021]

9 of 153

An AI model is trustworthy to a given contract if it is capable of maintaining the contract.

If a human perceives that an AI model is trustworthy to a contract, and accepts vulnerability to the AI’s actions, then the human trusts the AI contractually. Otherwise, the human distrusts the AI contractually.

Trust does not exist if the human does not perceive risk.

9

[Jacovi, Marasović, Miller, Goldberg; FAccT 2021]

10 of 153

An AI model is trustworthy to a given contract if it is capable of maintaining the contract.

If a human perceives that an AI model is trustworthy to a contract, and accepts vulnerability to the AI’s actions, then the human trusts the AI contractually. Otherwise, the human distrusts the AI contractually.

Trust does not exist if the human does not perceive risk.

A human’s contractual trust in AI is warranted if it is caused by the AI’s trustworthiness. Otherwise, the human’s trust in AI is unwarranted.

10

[Jacovi, Marasović, Miller, Goldberg; FAccT 2021]

11 of 153

11

My work: Build AI Trustworthy to These Contracts

12 of 153

12

Robustness

[Hoyle, Marasović, Smith; Findings of ACL 2021]
[Gururangan, Marasović, et al.; ACL 2020] Honorable Mention for Best Paper
[Dasigi, Liu, Marasović, et al.; EMNLP 2019]
[Marasović et al.; EMNLP 2017]
[Marasović and Frank; Repl4NLP 2016]

[Marasović et al.; LILT 2016]

My work: Build AI Trustworthy to These Contracts

13 of 153

13

Robustness

Supporting Users’ Agency

[Marasović*, Beltagy*, et al.; Under Review]
[Wiegreffe, Marasović, Smith; EMNLP 2021]
[Ross, Marasović, Peters; Findings of ACL 2021]
[Sun and Marasović; Findings of ACL 2021]
[Jacovi, Marasović, et al.; FAccT 2021]
[Marasović et al.; Findings of EMNLP 2020]

[Hoyle, Marasović, Smith; Findings of ACL 2021]
[Gururangan, Marasović, et al.; ACL 2020] Honorable Mention for Best Paper
[Dasigi, Liu, Marasović, et al.; EMNLP 2019]
[Marasović et al.; EMNLP 2017]
[Marasović and Frank; Repl4NLP 2016]

[Marasović et al.; LILT 2016]

My work: Build AI Trustworthy to These Contracts

14 of 153

14

Robustness

Quality & Integrity of Data

Supporting Users’ Agency

[Marasović*, Beltagy*, et al.; Under Review]
[Wiegreffe, Marasović, Smith; EMNLP 2021]
[Ross, Marasović, Peters; Findings of ACL 2021]
[Sun and Marasović; Findings of ACL 2021]
[Jacovi, Marasović, et al.; FAccT 2021]
[Marasović et al.; Findings of EMNLP 2020]

[Wiegreffe* and Marasović*; NeurIPS 2021]
[Dodge, Sap, Marasović, et al.; EMNLP 2021]
[Ning, …, Marasović, Nie; EMNLP Demo 2020]

[Hoyle, Marasović, Smith; Findings of ACL 2021]
[Gururangan, Marasović, et al.; ACL 2020] Honorable Mention for Best Paper
[Dasigi, Liu, Marasović, et al.; EMNLP 2019]
[Marasović et al.; EMNLP 2017]
[Marasović and Frank; Repl4NLP 2016]

[Marasović et al.; LILT 2016]

My work: Build AI Trustworthy to These Contracts

15 of 153

15

Robustness

Quality & Integrity of Data

Supporting Users’ Agency

[Marasović*, Beltagy*, et al.; Under Review]
[Wiegreffe, Marasović, Smith; EMNLP 2021]
[Ross, Marasović, Peters; Findings of ACL 2021]
[Sun and Marasović; Findings of ACL 2021]
[Jacovi, Marasović, et al.; FAccT 2021]
[Marasović et al.; Findings of EMNLP 2020]

[Wiegreffe* and Marasović*; NeurIPS 2021]
[Dodge, Sap, Marasović, et al.; EMNLP 2021]
[Ning, …, Marasović, Nie; EMNLP Demo 2020]

[Hoyle, Marasović, Smith; Findings of ACL 2021]
[Gururangan, Marasović, et al.; ACL 2020] Honorable Mention for Best Paper
[Dasigi, Liu, Marasović, et al.; EMNLP 2019]
[Marasović et al.; EMNLP 2017]
[Marasović and Frank; Repl4NLP 2016]

[Marasović et al.; LILT 2016]

[Marasović and Frank; NAACL 2018]

[Zopf, …, Marasović, …, Frank; SNAMS 2018]

Green AI

My work: Build AI Trustworthy to These Contracts

16 of 153

16

Robustness

Quality & Integrity of Data

Supporting Users’ Agency

[Marasović*, Beltagy*, et al.; Under Review]
[Wiegreffe, Marasović, Smith; EMNLP 2021]
[Ross, Marasović, Peters; Findings of ACL 2021]
[Sun and Marasović; Findings of ACL 2021]
[Jacovi, Marasović, et al.; FAccT 2021]
[Marasović et al.; Findings of EMNLP 2020]

[Wiegreffe* and Marasović*; NeurIPS 2021]
[Dodge, Sap, Marasović, et al.; EMNLP 2021]
[Ning, …, Marasović, Nie; EMNLP Demo 2020]

[Hoyle, Marasović, Smith; Findings of ACL 2021]
[Gururangan, Marasović, et al.; ACL 2020] Honorable Mention for Best Paper
[Dasigi, Liu, Marasović, et al.; EMNLP 2019]
[Marasović et al.; EMNLP 2017]
[Marasović and Frank; Repl4NLP 2016]

[Marasović et al.; LILT 2016]

[Marasović and Frank; NAACL 2018]

[Zopf, …, Marasović, …, Frank; SNAMS 2018]

Green AI

My work: Build AI Trustworthy to These Contracts

17 of 153

17

Extend models to self-explain: predict & elaborate on the prediction

Help them create a mental model about how to interact with AI

Intuitive explanations motivated by frameworks of explainability in social sciences

Supporting Users’ Agency

18 of 153

Local explanations: Justifications of models’ individual predictions

A dominant ML/NLP perspective on local explanations:

  • Causal attribution: given a set of factors (usually, input words/pixels), select all factors that cause the model’s decision

18

19 of 153

Insights from Social Science

Explanations are selected (in a biased manner) because:

  1. Reducing cognitive load: causal chains are often too large to comprehend
  2. Explainee cares only about a small number of causes (relevant to the context)

19

[Miller, 2019]

20 of 153

20

+ documents from the Web

misleading

21 of 153

21

+ documents from the Web

misleading

Why misleading?
Am I missing out on this? Is the flagging wrong (again)?

22 of 153

Self-explaining with free-text explanations: given in plain language, immediately provide the gist of why the input is labeled as it is

22

Misleading because not every American over 65 can get these cards since they are not provided by Medicare, the federal health insurance program for senior citizens. They are offered as a benefit to some customers by private insurance companies that sell Medicare Advantage plans. The cards are available in limited geographic areas. Only the chronically ill qualify to use the cards for items such as food and produce.

+ documents from the Web

💡

23 of 153

Insights from Social Science

Explanations are selected (in a biased manner) because:

  • Reducing cognitive load: causal chains are often too large to comprehend
  • Explainee cares only about a small number of causes (relevant to the context)

23

[Miller, 2019]

24 of 153

Insights from Social Science

24

Explanations are contrastive = responses to:

“Why P rather than Q?”

“What changes to the input would hypothetically change the answer from P to Q?”

where P is an observed event (fact), and Q an imagined, counterfactual event that did not occur (foil)

[Miller, 2019]

25 of 153

25

Why is my post misleading?
How can I change it to make it clear/correct?

misleading

+ documents from the Web

26 of 153

Contrastive explanations: explain how to minimally modify the input to change the prediction to something else

26

Why is my post misleading?
How can I change it to make it clear/correct?

I AM SO HAPPY I JUST LEARNED THIS!

As an American over 65, I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!

misleading

27 of 153

Contrastive explanations: explain how to minimally modify the input to change the prediction to something else

27

I AM SO HAPPY I JUST LEARNED THIS!

As an American over 65, I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!

I AM SO HAPPY I JUST LEARNED THIS!

As an American over 65 someone who has private health insurance with the Medicare Advantage plan, lives in X, and is chronically ill, I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!

misleading

correct

💡

28 of 153

28

People assign human-like traits to AI models (anthropomorphic bias)

⇒ People expect explanations of models’ behavior to follow the same conceptual framework used to explain human behavior

⇒ No users’ agency otherwise

Why?

How to?

“Understanding how people define, generate, select, evaluate, and present explanations seems almost essential”

[Miller, 2019]

29 of 153

Free-text explanations: given in plain language, immediately provide the gist of why the input is labeled as it is

Contrastive explanations: explain how to minimally modify the input to change the prediction to something else

29

Extend models to self-explain: predict & elaborate on the prediction

Intuitive explanations motivated by frameworks of explainability in social sciences

Help them create a mental model about how to interact with AI

Supporting Users’ Agency

30 of 153

30

Contrastive explanations: explain how to minimally modify the input to change the prediction to something else

How to generate contrastive explanations for standard NLP tasks such as sentiment classification, document classification, or multiple-choice question answering?

Free-text explanations: given in plain language, immediately provide the gist of why the input is labeled as it is

How to generate free-text explanations for visual reasoning tasks, e.g., for answering questions about images that require commonsense understanding?

31 of 153

Explaining

Visual Reasoning

31

32 of 153

Marasović et al. (2020)

Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs

32

33 of 153

Answering “why” by highlighting

Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story that thinks it can use various explosions to make it interesting, "the specialist" is about as exciting as an episode of "dragnet," and about as well acted. Even some attempts at film noir mood are destroyed by a sappy script, stupid and unlikable characters, and just plain nothingness. Who knew a big explosion could be so boring and anti-climactic?

Label: negative sentiment

33

[Zaidan et al., 2007]

[Lei et al., 2016]

34 of 153

Answering “why” by highlighting

34

[Adebayo et al., 2018]

35 of 153

35

[Zellers et al., 2019]

Question: What is going to happen next?

Answer: [person2] holding the photo will tell [person4] how cute their children are.

Free-text explanation: It looks like [person4] is showing the photo to [person2], and they will want to be polite.

Answering “why” by highlighting…

…doesn’t work when the reason is not explicitly stated in the input

36 of 153

36

Free-text explanation:

  • [person4] is showing the photo to [person2]
  • [person2] will want to be polite

We cannot highlight this in the input!

Answering “why” by highlighting…

…doesn’t work when the reason is not explicitly stated in the input

[Zellers et al., 2019]

37 of 153

Question: Where is a frisbee in play likely to be?

Answer choices: outside, park, roof, tree, air

Free-text explanation: A frisbee is a concave plastic disc designed for skimming through the air as an outdoor game so while in play it is most likely to be in the air.

37

Answering “why” by highlighting…

…doesn’t work when the reason is not explicitly stated in the input

[Aggarwal et al., 2021]

38 of 153

How to generate free-text explanations?

Step 1:

Find some human-written explanations

Step 2:

Finetune a pretrained transformer-based generation model (GPT-2)

38

[Wiegreffe* and Marasović*, NeurIPS 2021]

39 of 153

Pretrain-Finetune Paradigm

39

pretrain model

finetune model

text

text + labels

Option 1: mask & infill a word/span
Option 2: generate next word

standard supervised

training

40 of 153

Pretrain-Finetune Paradigm

40

pretrain model

finetune model

text + labels

text

41 of 153

Pretrain-Finetune Paradigm

41

pretrain model

finetune model

text + labels

text

42 of 153

How to generate free-text explanations?

Step 1:

Find some human-written explanations

Step 2:

Finetune pretrained transformer-based generation models (T5, GPT-2/Neo)

42

[Wiegreffe* and Marasović*, NeurIPS 2021]

43 of 153

Transformer

43

44 of 153

How to generate free-text explanations?

Question: Where is a frisbee in play likely to be?

Answer choices: outside, park, roof, tree, air

Free-text explanation: A frisbee is a concave plastic disc designed for skimming through the air as an outdoor game so while in play it is most likely to be in the air.

44

[Aggarwal et al., 2021]

45 of 153

Generating

Explanations

45

question: where is a frisbee in play likely to be? choice: outside choice: park choice: roof choice: tree choice: air

[Marasović et al., Findings of EMNLP 2020]

46 of 153

Generating

Explanations

46

question: where is a frisbee in play likely to be? choice: outside choice: park choice: roof choice: tree choice: air

Air because a frisbee is a concave plastic disc designed for skimming through the air as an outdoor game so while in play it is most likely to be in the air.

[Marasović et al., Findings of EMNLP 2020]
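To make the setup concrete, here is a minimal, text-only sketch of this self-rationalizing format (my illustration, not the released code): GPT-2 is prompted with the "question: … choice: …" sequence and, after finetuning on human-written explanations, continues it with "<answer> because <explanation>". The full model additionally conditions on visual features, which are omitted here.

```python
# Minimal sketch (illustrative, not the paper's released code): a text-only GPT-2
# continues the "question: ... choice: ..." prompt; after finetuning, the
# continuation is read as "<answer> because <free-text explanation>".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("question: where is a frisbee in play likely to be? "
          "choice: outside choice: park choice: roof choice: tree choice: air ")

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9,
                         pad_token_id=tokenizer.eos_token_id)

# Decode only the generated continuation, e.g. "air because a frisbee is ..."
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```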

47 of 153

47

question : where

[Marasović et al., Findings of EMNLP 2020]

48 of 153

48

question : where

???

???

???

???

[Marasović et al., Findings of EMNLP 2020]

49 of 153

Key challenge: image representation beyond explicit content

49

[person4]

photo

Question: What is going to happen next?

Answer: [person2] holding the photo will tell [person4] how cute their children are.

Free-text explanation: It looks like [person4] is showing the photo to [person2], and they will want to be polite.

showing

[person2] will want to be polite.

Raw features

Relations (semantics)

Inferences (pragmatics)

[Marasović et al., Findings of EMNLP 2020]

50 of 153

50

object detection

grounded situation recognition

visual commonsense graph

[person2] will want to be polite.

Raw features

Relations (semantics)

Inferences (pragmatics)

[Ren et al., 2015]

[Pratt et al., 2020]

[Park et al., 2020]

[person4]

showing

photo

51 of 153

51

question : where

???

???

???

???

object detection

grounded situation recognition

visual commonsense graph

[Marasović et al., Findings of EMNLP 2020]

52 of 153

Back to Basics: Object Detection

52

[Ren et al., 2015]

Output, for each detected object:

  1. bounding box coordinates
  2. class (e.g., cup)
  3. vector representation

53 of 153

Uniform fusion: Prepend object labels to text

53

cup question :

image-related features

text-related features

Pro: very simple

Con: prone to propagation of errors from external vision models

[Marasović et al., Findings of EMNLP 2020]
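A minimal sketch of uniform fusion (illustrative only; the exact separators and ordering may differ from the paper): the image is reduced to detected object labels, which are simply prepended to the textual input.

```python
# Illustrative only: object labels from an external detector are prepended as plain
# text to the question, then the fused string is fed to GPT-2 like any other text.
detected_objects = ["cup", "person", "photo"]          # hypothetical detector output
question = "question: what is going to happen next?"

fused_input = " ".join(detected_objects) + " " + question
print(fused_input)
# "cup person photo question: what is going to happen next?"
# Simple, but any recognition error made by the detector is propagated downstream.
```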

54 of 153

Hybrid fusion: Prepend object vectors to text embeddings

54

[Marasović et al., Findings of EMNLP 2020]

55 of 153

Hybrid fusion: Prepend object vectors to text embeddings

55

Pro: less error-prone

Con: image and text embeddings come from different vector spaces

box feature vector ➞ project

box’s coordinates ➞ project

sum

question :

text-related features

[Marasović et al., Findings of EMNLP 2020]
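A minimal PyTorch sketch of hybrid fusion (dimensions and projection details are illustrative assumptions, not the paper’s exact configuration): box feature vectors and box coordinates are each projected into the language model’s embedding space, summed, and prepended to the token embeddings.

```python
# Illustrative dimensions: 2048-d detector features, 768-d GPT-2 embeddings.
import torch
import torch.nn as nn

hidden = 768
feat_proj = nn.Linear(2048, hidden)   # project the box feature vector
box_proj = nn.Linear(4, hidden)       # project normalized box coordinates (x1, y1, x2, y2)

box_features = torch.randn(5, 2048)   # 5 detected objects
box_coords = torch.rand(5, 4)

visual_embeds = feat_proj(box_features) + box_proj(box_coords)   # sum of the two projections
text_embeds = torch.randn(1, 12, hidden)                         # token embeddings of "question: ..."

# Prepend the projected visual vectors to the text embeddings; the result can be fed
# to the transformer through an inputs_embeds-style interface.
fused = torch.cat([visual_embeds.unsqueeze(0), text_embeds], dim=1)   # (1, 5 + 12, hidden)
```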

56 of 153

Tasks & Datasets

56

[Marasović et al., Findings of EMNLP 2020]

57 of 153

Tasks & Datasets

Visual Question Answering

  • VQA-E [Li et al., 2018]
  • Gold explanations automatically extracted
  • The easiest dataset

57

[Marasović et al., Findings of EMNLP 2020]

Question: Where is the dog laying?

Answer: sidewalk

Free-text explanation: The white dog lays next to the bicycle on the sidewalk.

58 of 153

Tasks & Datasets

Visual Commonsense Reasoning

  • VCR [Zellers et al., 2019]
  • Manually collected explanations
  • Originally not introduced in a generation setting
  • The most difficult dataset

58

[Marasović et al., Findings of EMNLP 2020]

Question: What is going to happen next?

Answer: [person2] holding the photo will tell [person4] how cute their children are.

Free-text explanation: It looks like [person4] is showing the photo to [person2], and they will want to be polite.

59 of 153

Tasks & Datasets

Visual-Textual Entailment

  • E-SNLI-VE [Do et al., 2018]
  • Neutral instances are noisy ⇒ we use only contradiction and entailment instances
  • Explanations are re-purposed from another dataset

59

[Marasović et al., Findings of EMNLP 2020]

Hypothesis: A man is sitting down in a rocking chair.

Label: contradiction

Free-text explanation: The man is not sitting on a rocking chair, he is sitting in front of a building.

60 of 153

Baseline

Research Questions:

Given the correct answer in the input, do the proposed visual features help GPT-2 generate explanations that…

  1. …support a given answer or entailment label better?
  2. …are less likely to mention content that is irrelevant to a given image?

Baseline: GPT-2 without any information about the image

60

[Marasović et al., Findings of EMNLP 2020]

61 of 153

Evaluation Metric

3 crowdworkers answer:

Does the explanation support a given answer in the context of the image?

  • Instance plausibility = #“yes” / 3

61
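A tiny sketch of how this plausibility score is computed (toy data, my own illustration): each instance gets the fraction of "yes" votes out of three, and the reported number is the average over instances.

```python
# Toy illustration of the plausibility metric (not real annotation data).
votes = [["yes", "yes", "no"], ["yes", "no", "no"], ["yes", "yes", "yes"]]

instance_plausibility = [v.count("yes") / len(v) for v in votes]   # per-instance score
plausibility = 100 * sum(instance_plausibility) / len(instance_plausibility)
print(round(plausibility, 2))   # 66.67 on this toy example
```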

62 of 153

62

Results: Explanation Plausibility

                   VCR                         VQA-X                           E-SNLI-VE (contradiction)       E-SNLI-VE (entailment)
Best Features      visual commonsense graphs   grounded situation recognition  grounded situation recognition  grounded situation recognition
Best Fusion Type   uniform                     hybrid                          hybrid / uniform                uniform
RVT (Ours)         60.93                       63.33                           60.96 / 59.16                   38.12
Baseline           53.14                       47.20                           46.85                           46.75
Human              87.16                       66.53                           80.78                           76.98

  • GPT-2 (Baseline) benefits from some form of visual adaptation

63 of 153

63

Results: Explanation Plausibility

                   VCR                         VQA-X                           E-SNLI-VE (contradiction)       E-SNLI-VE (entailment)
Best Features      visual commonsense graphs   grounded situation recognition  grounded situation recognition  grounded situation recognition
Best Fusion Type   uniform                     hybrid                          hybrid / uniform                uniform
RVT (Ours)         60.93                       63.33                           60.96 / 59.16                   38.12
Baseline           53.14                       47.20                           46.85                           46.75
Human              87.16                       66.53                           80.78                           76.98

  • Best-performing models are still behind human-written free-text explanations

64 of 153

Free-text explanations: given in plain language, immediately provide the gist of why the input is labeled as it is

Contrastive explanations: explain how to minimally modify the input to change the prediction to something else

64

Extend models to self-explain: predict & elaborate on the prediction

Intuitive explanations motivated by frameworks of explainability in social sciences

Help them create a mental model about how to interact with AI

Supporting Users’ Agency

65 of 153

Free-text explanations: given in plain language, immediately provide the gist of why the input is labeled as it is

Contrastive explanations: explain how to minimally modify the input to change the prediction to something else

65

Extend models to self-explain: predict & elaborate on the prediction

Intuitive explanations motivated by frameworks of explainability in social sciences

Help them create a mental model about how to interact with AI

Supporting Users’ Agency

66 of 153

Contrastive explanations: explain how to minimally modify the input to change the prediction to something else

66

I AM SO HAPPY I JUST LEARNED THIS!

As an American over 65, I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!

I AM SO HAPPY I JUST LEARNED THIS!

As an American over 65 someone who has private health insurance with the Medicare Advantage plan, lives in X, and is chronically ill, I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!

misleading

correct

💡

67 of 153

Contrastive Explanations via Contrastive Editing

67

68 of 153

Contrastive Explanations via Contrastive Editing

68

Context:

...Dear Ann, I hope that you and your children will be here in two weeks. My husband and I will go to meet you at the train station. Our town is small...

MiCE-Edited Context:

...Dear Ann, I hope that you and your children will be here in two weeks. My husband and I will go to meet you at the train station your home on foot. Our town house is small...

Question:

Ann and her children are going to Linda’s home ____.

(a) by bus (b) by car (c) on foot (d) by train

Why “by train” (d) and not “on foot” (c)?
How to change the answer from “by train” (d) to “on foot” (c)?

Example from the RACE dataset [Lai et al., 2017]
[Ross, Marasović, Peters; Findings of ACL 2021]

69 of 153

Deeper Into Contrastive Editing

69

70 of 153

Ross, Marasović, Peters (2021)

Explaining NLP Models via Minimal Contrastive Editing (MiCE)

70

71 of 153

Goal:

Explain a Predictor model by automatically finding a minimal edit to the input that causes the model output to change to the contrast case

A very high-level idea of 🐭 (sketched below):

  • Use an Editor model to keep masking input words & filling in the masked positions until an edit that changes the label predicted by the Predictor is found
  • Simultaneously, minimize the masking percentage, i.e., the edit size

71

[Ross, Marasović, Peters, Findings of ACL 2021]
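A schematic sketch of this loop (my paraphrase of the procedure detailed on the next slides, not the released MiCE implementation). Here `predictor`, `editor_infill`, and `mask_top_tokens` are stand-ins for the real components, and the fixed grid of masking fractions stands in for the binary search over the masking percentage.

```python
# Schematic sketch of the contrastive editing loop (illustrative stand-ins, not MiCE's code).
import difflib

def edit_size(original, edited):
    # rough proxy for edit size: 1 - word-level similarity between the two texts
    return 1 - difflib.SequenceMatcher(None, original.split(), edited.split()).ratio()

def find_contrastive_edit(text, contrast_label, predictor, editor_infill, mask_top_tokens,
                          mask_fracs=(0.07, 0.14, 0.275, 0.41), num_samples=15):
    best = None
    for frac in mask_fracs:                           # try smaller edits first
        masked = mask_top_tokens(text, frac)          # mask frac of the most important tokens
        for _ in range(num_samples):                  # sample candidate infills from the editor
            candidate = editor_infill(contrast_label, masked)
            if predictor(candidate) == contrast_label:            # the edit flips the prediction
                if best is None or edit_size(text, candidate) < edit_size(text, best):
                    best = candidate
    return best                                       # None if no contrastive edit was found
```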

72 of 153

72

input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...

[Ross, Marasović, Peters, Findings of ACL 2021]

73 of 153

73

label: positive input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...

the contrast label (foil)

[Ross, Marasović, Peters, Findings of ACL 2021]

74 of 153

74

label: positive input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...

the contrast label (foil)

label: positive input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...

mask n% of input tokens

[Ross, Marasović, Peters, Findings of ACL 2021]

75 of 153

75

label: positive input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...

the contrast label (foil)

label: positive input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...

  1. label: positive input: Sylvester Stallone has made some good films in his lifetime, but this has got to be one of the worst. A totally novel story...
  2. label: positive input: Sylvester Stallone has made some great films in his lifetime, but this has got to be one of the greatest of all time. A totally boring story...

...

  15. label: positive input: Sylvester Stallone has made some wonderful films in his lifetime, but this has got to be one of the greatest. A totally tedious story...

sample 15 spans at each masked position

mask n% of input tokens

76 of 153

76

label: positive input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...

the contrast label (foil)

label: positive input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...

  1. label: positive input: Sylvester Stallone has made some good films in his lifetime, but this has got to be one of the worst. A totally novel story...
  2. label: positive input: Sylvester Stallone has made some great films in his lifetime, but this has got to be one of the greatest of all time. A totally boring story...

...

  15. label: positive input: Sylvester Stallone has made some wonderful films in his lifetime, but this has got to be one of the greatest. A totally tedious story...

get the logit of the contrast label

sample 15 spans at each masked position

mask n% of input tokens

77 of 153

77

4*15=60 samples

  1. Prepend the contrast label to the input
  2. Mask n% of the input tokens
  3. Sample 15 spans at masked positions

4 different values of n to minimize the edit

[Ross, Marasović, Peters, Findings of ACL 2021]

78 of 153

78

4*15=60 samples

rank 60 samples w.r.t. the logit of the contrast label

[Ross, Marasović, Peters, Findings of ACL 2021]

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample 15 spans at masked positions

4 different values of n to minimize the edit

79 of 153

79

keep top-3 samples

beam

[Ross, Marasović, Peters, Findings of ACL 2021]

4*15=60 samples

rank 60 samples w.r.t. the logit of the contrast label

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample 15 spans at masked positions

4 different values of n to minimize the edit

80 of 153

80

if a contrastive edit is found

[Ross, Marasović, Peters, Findings of ACL 2021]

keep top-3 samples

beam

4*15=60 samples

rank 60 samples w.r.t. the logit of the contrast label

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample 15 spans at masked positions

4 different values of n to minimize the edit

🛑

81 of 153

81

keep top-3 samples

beam

4*15=60 samples

rank 60 samples w.r.t. the logit of the contrast label

Repeat for every instance in the beam at most 2 more rounds

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample 15 spans at masked positions

4 different values of n to minimize the edit

82 of 153

82

Can a pretrained model without any additional tweaks fill in the spans?

We find it’s important to prepare the editor by finetuning it to infill masked spans given masked text and a target end-task label

(standard masking) Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...

(targeted masking) label: negative input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
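A sketch of how one such editor finetuning example could be constructed (the T5 sentinel serialization is my assumption; MiCE uses a T5 editor, but the exact format may differ). The point from this slide is that the end-task label is prepended to the masked text, so the editor learns label-targeted infilling.

```python
# Illustrative construction of a "targeted masking" editor finetuning example.
original = ("Sylvester Stallone has made some crap films in his lifetime, "
            "but this has got to be one of the worst. A totally dull story...")
label = "negative"

# Input: the end-task label + the text with important spans masked (T5-style sentinels assumed).
editor_input = (f"label: {label} input: Sylvester Stallone has made some <extra_id_0> films "
                "in his lifetime, but this has got to be one of the <extra_id_1>. "
                "A totally <extra_id_2> story...")
# Target: the original content of the masked spans.
editor_target = "<extra_id_0> crap <extra_id_1> worst <extra_id_2> dull <extra_id_3>"

# Finetuning a seq2seq editor on (editor_input, editor_target) pairs teaches it to
# infill spans that are consistent with the requested label.
```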

83 of 153

83

MiCE is a two-stage approach to generating contrastive edits

  • Stage 1: Prepare an editor
  • Stage 2: Make edits guided by the gradients & logits of the model we’re explaining

[Ross, Marasović, Peters, Findings of ACL 2021]

84 of 153

Tasks & Datasets

  1. Binary sentiment classification on the IMDB dataset [Maas et al., 2011]
  2. A six-class version of NewsGroups topic classification [Lang, 1995]
  3. Multiple-choice question answering on the RACE dataset [Lai et al., 2017]

84

85 of 153

Results – Flip Rate

85

1.0 when we find a contrastive edit for all instances

[Ross, Marasović, Peters, Findings of ACL 2021]

86 of 153

Results – Edit Minimality

86

The minimum number of deletions, insertions, or substitutions required to transform the original into the edited instance

Lower is better; we change on average 18.5–33.5% of the input tokens

The size of the IMDB edits is similar to human edits*

* Compared to IMDB edits in [Gardner et al., 2020]
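A small sketch of the minimality measure described above: word-level edit distance between the original and edited instance, normalized by the original length (the paper’s exact normalization may differ slightly).

```python
# Word-level Levenshtein distance, normalized by the original length (illustrative).
def levenshtein(a, b):
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            # new dp[j] = min(deletion, insertion, substitution/match)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def minimality(original, edited):
    o, e = original.split(), edited.split()
    return levenshtein(o, e) / len(o)

print(minimality("a totally dull story", "a totally novel story"))  # 0.25
```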

87 of 153

Results – Edit Fluency

87

1.0 when the LM loss does not change pre- and post-editing

[Ross, Marasović, Peters, Findings of ACL 2021]
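A sketch of a fluency check in this spirit (the model choice and exact loss are assumptions, not the paper’s setup): compare a language model’s loss on the edited text to its loss on the original; a ratio near 1.0 means the edit did not make the text noticeably less natural.

```python
# Illustrative fluency check: ratio of LM losses on edited vs. original text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def lm_loss(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return lm(ids, labels=ids).loss.item()

original = "A totally dull story that thinks it can use various explosions to make it interesting."
edited = "A totally novel story that thinks it can use various explosions to make it interesting."
print(lm_loss(edited) / lm_loss(original))   # ~1.0 means the edit stays fluent
```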

88 of 153

How Can MiCE Edits Be Used?

MiCE’s edits can offer hypotheses about model “bugs”

88

[Ross, Marasović, Peters, Findings of ACL 2021]

An interesting pairing of stories, this little flick manages to bring together seemingly different characters and story lines all in the backdrop of WWII and succeeds in tying them together without losing the audience. I was impressed by the depth portrayed by the different characters and also by how much I really felt I understood them and their motivations, even though the time spent on the development of each character was very limited. The outstanding acting abilities of the individuals involved with this picture are easily noted. A fun, stylized movie with a slew of comic moments and a bunch more head shaking events. 7/10

Original prediction: positive

89 of 153

How Can MiCE Edits Be Used?

MiCE’s edits can offer hypotheses about model “bugs”

89

[Ross, Marasović, Peters, Findings of ACL 2021]

An interesting pairing of stories, this little flick manages to bring together seemingly different characters and story lines all in the backdrop of WWII and succeeds in tying them together without losing the audience. I was impressed by the depth portrayed by the different characters and also by how much I really felt I understood them and their motivations, even though the time spent on the development of each character was very limited. The outstanding acting abilities of the individuals involved with this picture are easily noted. A fun, stylized movie with a slew of comic moments and a bunch more head shaking events. 7/10 4/10

MiCE’s edit contrast prediction (negative)

90 of 153

How Can MiCE Edits Be Used?

MiCE’s edits can offer hypotheses about model “bugs”

90

[Ross, Marasović, Peters, Findings of ACL 2021]

An interesting pairing of stories, this little flick manages to bring together seemingly different characters and story lines all in the backdrop of WWII and succeeds in tying them together without losing the audience. I was impressed by the depth portrayed by the different characters and also by how much I really felt I understood them and their motivations, even though the time spent on the development of each character was very limited. The outstanding acting abilities of the individuals involved with this picture are easily noted. A fun, stylized movie with a slew of comic moments and a bunch more head shaking events. 7/10 4/10

MiCE’s edit contrast prediction (negative)

91 of 153

How Can MiCE Edits Be Used?

Test the hypothesis using MiCE’s edits:

  1. Filter instances with edits of size ≤ 0.05
  2. Select tokens that are removed/inserted more than expected given their frequency in the original IMDB inputs

MiCE’s edits can offer hypotheses about model “bugs”

Hypothesis:
The model learned to rely heavily on numerical ratings ⭐

91

[Ross, Marasović, Peters, Findings of ACL 2021]
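A rough sketch of this analysis (the threshold and the "more than expected" test are simplified relative to the paper): keep only the smallest edits, then rank tokens by how over-represented they are among removed/inserted tokens relative to their frequency in the original inputs.

```python
# Illustrative analysis of which tokens MiCE-style edits touch most often.
from collections import Counter

def overrepresented_tokens(edits, corpus_tokens, top_k=10):
    # edits: list of (removed_tokens, inserted_tokens) pairs for edits with size <= 0.05
    changed = Counter()
    for removed, inserted in edits:
        changed.update(removed)
        changed.update(inserted)
    base = Counter(corpus_tokens)
    total_changed, total_base = sum(changed.values()), sum(base.values())
    # ratio of a token's frequency among edited tokens to its frequency in the corpus
    scores = {t: (c / total_changed) / (base[t] / total_base)
              for t, c in changed.items() if base[t] > 0}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Tokens like "7/10" or "4/10" ranking near the top would support the hypothesis
# that the sentiment model leans heavily on numerical ratings.
```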

92 of 153

✅ We present the first method for contrastive editing of text beyond binary classification

✅ Contrastive editing is already achieving decent performance

92

93 of 153

The maximum number of iterations for a single instance:

(# binary-search levels s ⨉ # samples at each masking position m)   ← first round
+ (beam size b ⨉ # binary-search levels s ⨉ # samples at each masking position m ⨉ # of additional rounds)   ← other rounds

= 4 ⨉ 15 + 3 ⨉ 4 ⨉ 15 ⨉ 2 = 420

That’s a lot, and also there is no guarantee that a smaller contrastive edit does not exist

93


[Ross, Marasović, Peters, Findings of ACL 2021]

94 of 153

✅ We present the first method for contrastive editing of text beyond binary classification

✅ Contrastive editing is already achieving decent performance

Needed improvements:

  • fewer iterations
  • more precise minimality

94

95 of 153

Free-text explanations: given in plain language, immediately provide the gist of why the input is labeled as it is

Contrastive explanations: explain how to minimally modify the input to change the prediction to something else

95

Extend models to self-explain: predict & elaborate on the prediction

Intuitive explanations motivated by frameworks of explainability in social sciences

Help them create a mental model about how to interact with AI

Supporting Users’ Agency

96 of 153

Intuitive Interaction:

What is Next?

96

97 of 153

97

Although local explanations are motivated specifically by their usefulness to people, there is no convincing evidence yet that local explanations help people who are using language technology

98 of 153

An AI model is trustworthy to a given contract if it is capable of maintaining the contract.

If a human perceives that an AI model is trustworthy to a contract, and therefore accepts vulnerability to the AI’s actions, then the human trusts the AI contractually. Otherwise, the human distrusts the AI contractually.

Trust does not exist if the human does not perceive risk.

A human’s contractual trust in AI is warranted if it is caused by the AI’s trustworthiness. Otherwise, the human’s trust in AI is unwarranted.

98

[Jacovi, Marasović, Miller, Goldberg; FAccT 2021]

99 of 153

Trust does not exist if the human does not perceive risk, but…

Researchers focus on grand AI challenges that people are good at (e.g., commonsense QA, “Where is a frisbee in play likely to be?”)

Researchers use simple tasks that people don’t need help with (e.g., claim verification against a very short text)

99

100 of 153

Future Direction:

How to measure utility of explanations?

What are potentially useful language applications & who is the target audience?
(e.g., journalists and automatic fact checking)

How might explanations help people who use these applications?
(e.g., by helping them verify information faster with equal accuracy)

Test them exactly for those purposes

100

101 of 153

101

(not) true because {explanation}

claim & documents

(not) true

Expert / Journalist

💰💰💰💰💰

102 of 153

102

(not) true because {explanation}

claim & documents

(not) true

Crowdsourcing??

💰💰

103 of 153

An AI model is trustworthy to a given contract if it is capable of maintaining the contract.

If a human perceives that an AI model is trustworthy to a contract, and therefore accepts vulnerability to the AI’s actions, then the human trusts the AI contractually. Otherwise, the human distrusts the AI contractually.

Trust does not exist if the human does not perceive risk.

A human’s contractual trust in AI is warranted if it is caused by the AI’s trustworthiness. Otherwise, the human’s trust in AI is unwarranted.

103

[Jacovi, Marasović, Miller, Goldberg; FAccT 2021]

104 of 153

Future Direction:

Can laypeople measure the utility of explanations?

Can we use games with a purpose (GWAP) to simulate risk?

A person plays the role of a journalist; their character:

  • Is invited to a 1-on-1 meeting by their manager where they talk about their bad performance :(
  • Doesn’t get a promotion because they cited incorrect information following the AI’s advice :(
  • Is discussed on social media because they stated a wrong fact :(

104

105 of 153

105

Domain Experts

AI Ethics Researchers

HCI Researchers

Evaluation Metric

Evaluation

People (GWAP)

Fairness Researchers

AI/NLP/CV Researchers

106 of 153

106

Domain Experts

AI Ethics Researchers

HCI Researchers

Evaluation Metric

Evaluation

People (GWAP)

Fairness Researchers

AI/NLP/CV Researchers

Modeling

107 of 153

Future Direction:

How to provide explanations for many modalities and query types, while using only a few human-authored explanations as supervision?

text

images

videos

speech

. . .

Query Type 1

Query Type 2

Query Type 3

108 of 153

Future Direction:

How to model and evaluate explainability as a conversation?

Explanations are a transfer of knowledge, presented as part of an interaction

  • Personalized & contextualized explainability
  • Text generation conditioned on the model’s beliefs about the explainee’s background
  • Challenges of dialog & multi-turn evaluation, and trade-offs between privacy & personalization

108

Explanation Type 1

Explanation Type 2

Query Type 1

109 of 153

109

Robustness

Quality & Integrity of Data

Supporting Users’ Agency

[Marasović*, Beltagy*, et al.; Under Review]
[Wiegreffe, Marasović, Smith; EMNLP 2021]
[Ross, Marasović, Peters; Findings of ACL 2021]
[Sun and Marasović; Findings of ACL 2021]
[Jacovi, Marasović, et al.; FAccT 2021]
[Marasović et al.; Findings of EMNLP 2020]

[Wiegreffe* and Marasović*; NeurIPS 2021]
[Dodge, Sap, Marasović, et al.; EMNLP 2021]
[Ning, …, Marasović, Nie; EMNLP Demo 2020]

[Hoyle, Marasović, Smith; Findings of ACL 2021]
[Gururangan, Marasović, et al.; ACL 2020] Honorable Mention for Best Paper
[Dasigi, Liu, Marasović, et al.; EMNLP 2019]
[Marasović et al.; EMNLP 2017]
[Marasović and Frank; Repl4NLP 2016]

[Marasović et al.; LILT 2016]

[Marasović and Frank; NAACL 2018]

[Zopf, …, Marasović, …, Frank; SNAMS 2018]

Green AI

My work: Build AI Trustworthy to These Contracts

110 of 153

110

Robustness

Quality & Integrity of Data

Supporting Users’ Agency

[Marasović*, Beltagy*, et al.; Under Review]
[Wiegreffe, Marasović, Smith; EMNLP 2021]
[Ross, Marasović, Peters; Findings of ACL 2021]
[Sun and Marasović; Findings of ACL 2021]
[Jacovi, Marasović, et al.; FAccT 2021]
[Marasović et al.; Findings of EMNLP 2020]

[Wiegreffe* and Marasović*; NeurIPS 2021]
[Dodge, Sap, Marasović, et al.; EMNLP 2021]
[Ning, …, Marasović, Nie; EMNLP Demo 2020]

[Hoyle, Marasović, Smith; Findings of ACL 2021]
[Gururangan, Marasović, et al.; ACL 2020] Honorable Mention for Best Paper
[Dasigi, Liu, Marasović, et al.; EMNLP 2019]
[Marasović et al.; EMNLP 2017]
[Marasović and Frank; Repl4NLP 2016]

[Marasović et al.; LILT 2016]

[Marasović and Frank; NAACL 2018]

[Zopf, …, Marasović, …, Frank; SNAMS 2018]

Green AI

My work: Build AI Trustworthy to These Contracts

111 of 153

Future Direction:

Understanding multimodal & few-shot learning beyond explainability is crucial

111

text

images

videos

speech

Query Type 1

Query Type 2

Query Type 3

112 of 153

Future Direction:

Understanding multimodal & few-shot learning beyond explainability is crucial

112

text

images

videos

speech

Query Type 1

Query Type 2

Query Type 3

  • Do models copy from data or learn generalizable abstractions?

113 of 153

Future Direction:

Understanding multimodal & few-shot learning beyond explainability is crucial

113

text

images

videos

speech

Query Type 1

Query Type 2

Query Type 3

  • Do models copy from data or learn generalizable abstractions?
  • Is there an equally powerful but more data-efficient architecture than transformers?

114 of 153

Future Direction:

Understanding multimodal & few-shot learning beyond explainability is crucial

114

text

images

videos

speech

Query Type 1

Query Type 2

Query Type 3

  • Do models copy from data or learn generalizable abstractions?
  • Is there an equally powerful but more data-efficient architecture than transformers?
  • If not, can we prove the causal relation between performance improvements & increasing the model and data size?

115 of 153

Future Direction:

Understanding multimodal & few-shot learning beyond explainability is crucial

115

text

images

videos

speech

Query Type 1

Query Type 2

Query Type 3

  • Do models copy from data or learn generalizable abstractions?
  • Is there an equally powerful but more data-efficient architecture than transformers?
  • If not, can we prove the causal relation between performance improvements & increasing the model and data size?
  • Is there more information-efficient data?

116 of 153

116

Domain Experts

AI Ethics Researchers

HCI Researchers

Evaluation Metric

Evaluation

People (GWAP)

Fairness Researchers

AI/NLP/CV Researchers

Modeling

Multimodality

Few-Shot Learning

Conversational AI

117 of 153

117

Robustness

Quality & Integrity of Data

Supporting Users’ Agency

[Marasović*, Beltagy*, et al.; Under Review]
[Wiegreffe, Marasović, Smith; EMNLP 2021]
[Ross, Marasović, Peters; Findings of ACL 2021]
[Sun and Marasović; Findings of ACL 2021]
[Jacovi, Marasović, et al.; FAccT 2021]
[Marasović et al.; Findings of EMNLP 2020]

[Wiegreffe* and Marasović*; NeurIPS 2021]
[Dodge, Sap, Marasović, et al.; EMNLP 2021]
[Ning, …, Marasović, Nie; EMNLP Demo 2020]

[Hoyle, Marasović, Smith; Findings of ACL 2021]
[Gururangan, Marasović, et al.; ACL 2020] Honorable Mention for Best Paper
[Dasigi, Liu, Marasović, et al.; EMNLP 2019]
[Marasović et al.; EMNLP 2017]
[Marasović and Frank; Repl4NLP 2016]

[Marasović et al.; LILT 2016]

[Marasović and Frank; NAACL 2018]

[Zopf, …, Marasović, …, Frank; SNAMS 2018]

Green AI


118 of 153

118

. . .

THANK YOU!

YOUR QUESTION ⤵

Why this answer?

How to change the answer?

What if I change the input in this way?

119 of 153

References

119

120 of 153

References

120

121 of 153

MiCE Details

121

122 of 153

122

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample m spans at masked positions

s different values of n to minimize the edit*

* s=4 in the paper

How to pick the values of n?

Binary search on [0,55]

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

123 of 153

123

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample m spans at masked positions

s different values of n to minimize the edit*

* s=4 in the paper

How to pick the values of n?

Binary search on [0,55]

Start: n(1) = 27.5%

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

124 of 153

124

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample m spans at masked positions

s different values of n to minimize the edit*

* s=4 in the paper

How to pick the values of n?

Binary search on [0,55]

Start: n(1) = 27.5%

  • If a contrastive edit found: n(2)=13.75%

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

125 of 153

125

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample m spans at masked positions

s different values of n to minimize the edit*

* s=4 in the paper

How to pick the values of n?

Binary search on [0,55]

Start: n(1) = 27.5%

  • If a contrastive edit found: n(2)=13.75%


  • If a contrastive edit not found: n(2)=41.25%

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

126 of 153

126

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample m spans at masked positions

s different values of n to minimize the edit*

* s=4 in the paper

How to pick the values of n?

Binary search on [0,55]

Start: n(1) = 27.5%

  • If a contrastive edit found: n(2)=13.75%
    • If a contrastive edit found: n(3)=6.875%

  • If a contrastive edit not found: n(2)=41.25%
    • If a contrastive edit found: n(3)=20.625%

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

127 of 153

127

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample m spans at masked positions

s different values of n to minimize the edit*

* s=4 in the paper

How to pick the values of n?

Binary search on [0,55]

Start: n(1) = 27.5%

  • If a contrastive edit found: n(2)=13.75%
    • If a contrastive edit found: n(3)=6.875%
    • If a contrastive edit not found: n(3)=20.625%

  • If a contrastive edit not found: n(2)=41.25%
    • If a contrastive edit found: n(3)=20.625%
    • If a contrastive edit not found: n(3)=48.125%

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
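A sketch of this search (following the walkthrough above, with `try_edit(n)` standing in for one editing round at masking percentage n): when an edit is found at n, the search continues on [0, n] looking for a smaller edit; when none is found, it moves to larger masking percentages.

```python
# Illustrative search over the masking percentage n on [0, 55], s = 4 levels.
def search_mask_percentage(try_edit, levels=4, hi=55.0):
    lo, best = 0.0, None
    for _ in range(levels):
        n = (lo + hi) / 2                # 27.5, then 13.75 or 41.25, ...
        if try_edit(n):                  # a contrastive edit exists at this masking percentage
            best, lo, hi = n, 0.0, n     # remember it and look for a smaller edit
        else:
            lo = n                       # no edit found; allow more masking
    return best                          # smallest n at which an edit was found, or None
```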

128 of 153

128

How to pick masking positions?

Based on token importance for the original prediction

Rank input tokens based on the gradient magnitude of the model we’re explaining

Mask the top n% of ranked tokens

  • Prepend the contrast label to the input
  • Mask n% of the input tokens
  • Sample m spans at masked positions

s different values of n to minimize the edit*

* s=4 in the paper

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
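A minimal PyTorch sketch of gradient-based masking (my illustration of the idea, not the exact MiCE implementation): score each token by the gradient magnitude of the predictor’s loss with respect to its embedding, then mask the top-n% highest-scoring positions.

```python
# Illustration only: `predictor` maps input embeddings to class logits.
import torch
import torch.nn.functional as F

def gradient_mask_positions(predictor, embeddings, label_id, n_frac):
    # embeddings: (seq_len, hidden) leaf tensor with requires_grad=True
    logits = predictor(embeddings.unsqueeze(0))                # (1, num_labels)
    loss = F.cross_entropy(logits, torch.tensor([label_id]))
    loss.backward()
    scores = embeddings.grad.norm(dim=-1)                      # importance score per token
    k = max(1, int(n_frac * scores.numel()))
    return torch.topk(scores, k).indices                       # positions to replace with <mask>
```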

129 of 153

129

Can a pretrained model without any additional tweaks fill in the spans?

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

130 of 153

130

Can a pretrained model without any additional tweaks fill in the spans?

We find it’s important to prepare the editor by finetuning it to infill masked spans given masked text and a target end-task label

(standard masking) Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...

(targeted masking) label: negative input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

131 of 153

131

Can a pretrained model without any additional tweaks fill in the spans?

We find it’s important to prepare the editor by finetuning it to infill masked spans given masked text and a target end-task label

We find that labels predicted by the model we’re explaining can be used in this step without a big loss in performance

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

132 of 153

132

Can a pretrained model without any additional tweaks fill in the spans?

We find it’s important to prepare the editor by finetuning it to infill masked spans given masked text and a target end-task label

We find that labels predicted by the model we’re explaining can be used in this step without a big loss in performance

Gradient-based masking in this step gives better performance

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

133 of 153

133

Can a pretrained model without any additional tweaks fill in the spans?

We find it’s important to prepare the editor by finetuning it to infill masked spans given masked text and a target end-task label

We find that labels predicted by the model we’re explaining can be used in this step without a big loss in performance

Gradient-based masking in this step gives better performance

MiCE is a two-stage approach to generating contrastive edits

Stage 1: prepare an editor

Stage 2: make edits guided with gradients & logits of the model we’re explaining

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

134 of 153

Categorization of Current Methods for Contrastive Explanations in NLP

134

135 of 153

NLP is starting to pay attention!

COLING 2020 ⇾ Yang et al. Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification.

TACL 2021 ⇾ Jacovi and Goldberg. Aligning Faithful Interpretations with their Social Attribution.

(Findings of) ACL 2021

Chen et al. KACE: Generating Knowledge-Aware Contrastive Explanations for NLI.
Ross et al. Explaining NLP Models via Minimal Contrastive Editing (MiCE).
Paranjape et al. Prompting Contrastive Explanations for Commonsense Reasoning Tasks.
Wu et al. Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models.

EMNLP 2021 ⇾ Jacovi et al. Contrastive Explanations for Model Interpretability.

135

136 of 153

✅ Almost all of these papers begin by citing Miller’s overview of frameworks of explanations from social science

Are technical proposals the same?

136

137 of 153

Contrastive Explanations of NLP Models

Contrastive vector representation:

A dense representation of the input that captures latent features that differentiate two classes
Jacovi et al. EMNLP 2021.

...abstract them into templates, automatically fill in the templates (template-based infilling)

Paranjape et al. Findings of ACL 2021.

Contrastive input editing:

Minimal edits to the input that change model output to the contrast case

Yang et al. COLING 2020.

Jacovi and Goldberg. TACL 2021.
Ross et al. Findings of ACL 2021.
Wu et al. ACL 2021.

Collect free-text human contrastive explanations, ...

137

...and generate them left-to-right Chen et al. ACL 2021.

138 of 153

Contrastive Explanations of NLP Models

Contrastive vector representation:

A dense representation of the input that captures latent features that differentiate two classes
Jacovi et al. EMNLP 2021.

...abstract them into templates, automatically fill in the templates (template-based infilling)

Paranjape et al. Findings of ACL 2021.

Contrastive input editing:

Minimal edits to the input that change model output to the contrast case

Yang et al. COLING 2020.

Jacovi and Goldberg. TACL 2021.
Ross et al. Findings of ACL 2021.
Wu et al. ACL 2021.

Collect free-text human contrastive explanations, ...

138

...and generate them left-to-right Chen et al. ACL 2021.

139 of 153

Contrastive Explanations via Contrastive Editing

The key idea:

“Why P not Q?” ⇒ “How to change the answer from P to Q?”
⇒ By making a contrastive minimal edit

A minimal edit to the input that causes the model output to change to the contrast case has hallmark characteristics of a human contrastive explanation:

cites contrastive features

selects a few relevant causes

139

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

140 of 153

Contrastive Explanations via Contrastive Editing

140

Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.

Context:

...Dear Ann, I hope that you and your children will be here in two weeks. My husband and I will go to meet you at the train station. Our town is small...

MiCE-Edited Context:

...Dear Ann, I hope that you and your children will be here in two weeks. My husband and I will go to meet you at the train station your home on foot. Our town house is small...

Question:

Ann and her children are going to Linda’s home ____.

(a) by bus (b) by car (c) on foot (d) by train

Why “by train” (d) and not “on foot” (c)?
How to change the answer from “by train” (d) to “on foot” (c)?

141 of 153

Contrastive Explanations of NLP Models

Contrastive vector representation:

A dense representation of the input that captures latent features that differentiate two classes
Jacovi et al. EMNLP 2021.

...abstract them into templates, automatically fill in the templates (template-based infilling)

Paranjape et al. Findings of ACL 2021.

Contrastive input editing:

Automatic edits to the input that change model output to the contrast case

Yang et al. COLING 2020.

Jacovi and Goldberg. TACL 2021.
Ross et al. Findings of ACL 2021.
Wu et al. ACL 2021.

Collect free-text human contrastive explanations, ...

141

...and generate them left-to-right Chen et al. ACL 2021.

142 of 153

Contrastive Explanations via Conditional Generation

The key idea (IMO):

Contrastive edits may still not be immediately understandable (the cognitive load can still be notable)

“Why P not Q?” ⇒ Generate free-text contrastive explanations

Example: The model predicts “by train” because the context mentions meeting at “the train station”. If the context had said that they will meet at “your home on foot” the prediction would be “on foot”.

142

143 of 153

Contrastive Explanations via Conditional Generation

Step 1: Generate contrastive edits

(1.a) Highlight important tokens

(1.b) Replace important tokens with WordNet hypernyms and hyponyms

(1.c) Minimize the loss between the predicted and contrast label for examples in (1b)

(1.d) Minimize the distance between the original and edited examples in (1b)

(1.e) Maximize the diversity of edited examples in (1b)

Step 2: Compose a free-text contrastive explanation by generating “Why P” and “Why not Q” explanations from two supervised models, given the original instance, the contrastively edited instance (Step 1), and external knowledge

143

144 of 153

Contrastive Explanations of NLP Models

Contrastive vector representation:

A dense representation of the input that captures latent features that differentiate two classes
Jacovi et al. EMNLP 2021.

...abstract them into templates, automatically fill in the templates (template-based infilling)

Paranjape et al. Findings of ACL 2021.

Contrastive input editing:

Automatic edits to the input that change model output to the contrast case

Yang et al. COLING 2020.

Jacovi and Goldberg. TACL 2021.
Ross et al. Findings of ACL 2021.
Wu et al. ACL 2021.

Collect free-text human contrastive explanations, ...

144

...and generate them left-to-right Chen et al. ACL 2021.

145 of 153

Contrastive Explanations via Template-Based Infilling

The key idea (IMO): �

“Why P not Q?” ⇒ Develop templates (prompts) to retrieve “contrastive knowledge”* – a comparison of P and Q along a distinguishing attribute – from a pretrained model

* Example: Peanuts are salty while raisins tend to be sweet.

145

146 of 153

Contrastive Explanations via Template-Based Infilling

Data Step 1: Collect human-written free-text contrastive explanations

Data Step 2: Abstract them into templates with placeholders

146

Human contrastive explanation: Ruler is hard while a ribbon is flexible.

Template:

P is ___ while Q is ___

How to tie pieces of paper together?

(a) Thread ruler through the holes.

(b) Thread ribbon through the holes. [correct]

147 of 153

Contrastive Explanations via Template-Based Infilling

Modeling Step 1:

Generate contrastive knowledge by filling in the placeholders in explanation templates

147

To prepare the puff pastry for your pie, line a baking sheet with parchment. Then ___

(a) Unroll the pastry, lay it over baking twine. [correct]

(b) Unroll the pastry, lay it over fishing line.

Contrastive knowledge:

Baking twine is used in baking while fishing line is used in fishing.
Baking twine takes longer to catch fish than fishing line.

Baking twine can cause fire while fishing line results in tangling.

...

Templates:

P is ___ while Q is ___

P takes longer to ___ than Q

P can cause ___ while Q results in ___

...

148 of 153

Contrastive Explanations via Template-Based Infilling

Modeling Step 2:

Augment the input with contrastive knowledge and make a prediction with the same model

148

To prepare the puff pastry for your pie, line a baking sheet with parchment. Then unroll the pastry, lay it over baking twine.

Contrastive knowledge:

Baking twine is used in baking while fishing line is used in fishing.
Baking twine takes longer to catch fish than fishing line.

Baking twine can cause fire while fishing line results in tangling.

...

The model scores (context, answer candidate, contrastive knowledge) tuples

The highest-scoring explanation is THE explanation

To prepare the puff pastry for your pie, line a baking sheet with parchment. Then unroll the pastry, lay it over fishing line.

149 of 153

Contrastive Explanations of NLP Models

Contrastive vector representation:

A dense representation of the input that captures latent features that differentiate two classes
Jacovi et al. EMNLP 2021.

...abstract them into templates, automatically fill in the templates (template-based infilling)

Paranjape et al. Findings of ACL 2021.

Contrastive input editing:

Automatic edits to the input that change model output to the contrast case

Yang et al. COLING 2020.

Jacovi and Goldberg. TACL 2021.
Ross et al. Findings of ACL 2021.
Wu et al. ACL 2021.

Collect free-text human contrastive explanations, ...

149

...and generate them left-to-right Chen et al. ACL 2021.

150 of 153

Contrastive Explanations via Contrastive Projection

The key idea (IMO):

“Why P not Q?” ⇒ Select latent contrastive features in the space of hidden representations instead of selecting them in the input (discrete tokens)

150

151 of 153

Contrastive Explanations via Contrastive Projection

Thesis: Entailment because of a high lexical overlap between the premise and hypothesis

Overlap concept: All of the content words in the hypothesis also exist in the premise

Causal Intervention (Why P?)

Study how model logits change by removing all features in the hidden representation indicative of the overlap concept

151

152 of 153

Contrastive Explanations via Contrastive Projection

Thesis: Entailment because of a high lexical overlap between the premise and hypothesis

Overlap concept: All of the content words in the hypothesis also exist in the premise

Causal Intervention (Why P?)

Study how model logits change by removing all features in the hidden representation indicative of the overlap concept

Doesn’t answer whether (a subset of) these features differentiate entailment from the other classes

152

153 of 153

Contrastive Explanations via Contrastive Projection

Thesis: Entailment because of a high lexical overlap between the premise and hypothesis

Overlap concept: All of the content words in the hypothesis also exist in the premise

Contrastive Intervention (Why P not Q?)

Project the hidden representation onto the space of contrastive features, i.e., remove hidden features that the model doesn’t use to differentiate class P (entailment) from class Q (contradiction or neutral)

Study how model logits change by removing all features in the contrastively projected hidden representation indicative of the overlap concept

153