Self-Explaining for Intuitive Interaction with AI
Ana Marasović
Allen Institute for AI (AI2) ⨉ AllenNLP ⨉ University of Washington
2
AI technology has become an integral part of most people’s daily lives
3
+
text + labels
neural network
+
(not) spam
4
+
(not) sick
+
text + labels
neural network
(not) sick
AI Developer
Domain Experts (Doctors)
People Affected by AI
(Patients)
5
Increasingly hard to opt out
Doctors
Patients
Promised Benefits
Risks
Challenge: How to maximize the benefits of AI systems while preventing and minimizing risks?
Approach: Build AI systems that can maintain contracts designed to give people appropriate confidence in the AI’s development and its applications
An AI model is trustworthy to a given contract if it is capable of maintaining the contract.
If a human perceives that an AI model is trustworthy to a contract, and accepts vulnerability to the AI’s actions, then the human trusts the AI contractually. Otherwise, the human distrusts the AI contractually.
Trust does not exist if the human does not perceive risk.
A human’s contractual trust in AI is warranted if it is caused by the AI’s trustworthiness. Otherwise, the human’s trust in AI is unwarranted.
10
[Jacovi, Marasović, Miller, Goldberg; FAccT 2021]
11
My work: Build AI Trustworthy to These Contracts
15
Supporting Users’ Agency
[Marasović*, Beltagy*, et al.; Under Review]
[Wiegreffe, Marasović, Smith; EMNLP 2021]
[Ross, Marasović, Peters; Findings of ACL 2021]
[Sun and Marasović; Findings of ACL 2021]
[Jacovi, Marasović, et al.; FAccT 2021]
[Marasović et al.; Findings of EMNLP 2020]
Quality & Integrity of Data
[Wiegreffe* and Marasović*; NeurIPS 2021]
[Dodge, Sap, Marasović, et al.; EMNLP 2021]
[Ning, …, Marasović, Nie; EMNLP Demo 2020]
Robustness
[Hoyle, Marasović, Smith; Findings of ACL 2021]
[Gururangan, Marasović et al.; ACL 2020] Honorable Mention for Best Paper
[Dasigi, Liu, Marasović, et al.; EMNLP 2019]
[Marasović et al.; EMNLP 2017]
[Marasović and Frank; Repl4NLP 2016]
[Marasović et al.; LILT 2016]
[Marasović and Frank; NAACL 2018]
[Zopf, …, Marasović, …, Frank; SNAMS 2018]
Green AI
My work: Build AI Trustworthy to These Contracts
17
Extend models to self-explain: predict & elaborate on the prediction
Help users create a mental model of how to interact with AI
Intuitive explanations motivated by frameworks of explainability in social sciences
Supporting Users’ Agency
Local explanations: Justifications of models’ individual predictions
A dominant ML/NLP perspective on local explanations:
18
Insights from Social Science
Explanations are selected (in a biased manner) because:
19
[Miller, 2019]
20
+ documents from the Web
misleading
21
+ documents from the Web
misleading
Why misleading?
Am I missing out on this? Is the flagging wrong (again)?
Self-explaining with free-text explanations: given in plain language, they immediately provide the gist of why the input is labeled as it is
22
Misleading because not every American over 65 can get these cards since they are not provided by Medicare, the federal health insurance program for senior citizens. They are offered as a benefit to some customers by private insurance companies that sell Medicare Advantage plans. The cards are available in limited geographic areas. Only the chronically ill qualify to use the cards for items such as food and produce.
+ documents from the Web
💡
Insights from Social Science
Explanations are selected (in a biased manner) because:
23
[Miller, 2019]
Insights from Social Science
24
Explanations are contrastive = responses to:
“Why P rather than Q?”
“What changes to the input would hypothetically change the answer from P to Q?”
where P is an observed event (fact), and Q an imagined, counterfactual event that did not occur (foil)
[Miller, 2019]
25
Why is my post misleading?
How can I change it to make it clear/correct?
misleading
+ documents from the Web
Contrastive explanations: explain how to minimally modify the input to change the prediction to something else
26
Why is my post misleading?
How can I change it to make it clear/correct?
I AM SO HAPPY I JUST LEARNED THIS!
As an American over 65, I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!
misleading
Contrastive explanations: explain how to minimally modify the input to change the prediction to something else
27
I AM SO HAPPY I JUST LEARNED THIS!
As an American over 65, I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!
I AM SO HAPPY I JUST LEARNED THIS!
As [an American over 65 → someone who has private health insurance with the Medicare Advantage plan, lives in X, and is chronically ill], I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!
misleading
correct
💡
28
People assign human-like traits to AI models (anthropomorphic bias)
⇒ People expect explanations of models’ behavior to follow the same conceptual framework used to explain human behavior
⇒ No users’ agency otherwise
Why?
How to?
“Understanding how people define, generate, select, evaluate, and present explanations seems almost essential”
[Miller, 2019]
Free-text explanations: given in plain language, they immediately provide the gist of why the input is labeled as it is
Contrastive explanations: explain how to minimally modify the input to change the prediction to something else
29
Extend models to self-explain: predict & elaborate on the prediction
Intuitive explanations motivated by frameworks of explainability in social sciences
Help users create a mental model of how to interact with AI
Supporting Users’ Agency
30
Contrastive explanations: explain how to minimally modify the input to change the prediction to something else
How to generate contrastive explanations for standard NLP tasks such as sentiment classification, document classification, or multiple-choice question answering?
Free-text explanations: given in plain language, they immediately provide the gist of why the input is labeled as it is
How to generate free-text explanations for visual reasoning tasks, e.g., for answering questions about images that require commonsense understanding?
Explaining
Visual Reasoning
31
Marasović et al. (2020)
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
32
Answering “why” by highlighting
Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story that thinks it can use various explosions to make it interesting, "the specialist" is about as exciting as an episode of "dragnet," and about as well acted. Even some attempts at film noir mood are destroyed by a sappy script, stupid and unlikable characters, and just plain nothingness. Who knew a big explosion could be so boring and anti-climactic?
Label: negative sentiment
33
[Zaidan et al., 2007]
[Lei et al., 2016]
Answering “why” by highlighting
34
[Adebayo et al., 2018]
35
[Zellers et al., 2019]
Question: What is going to happen next?
Answer: [person2] holding the photo will tell [person4] how cute their children are.
Free-text explanation: It looks like [person4] is showing the photo to [person2], and they will want to be polite.
Answering “why” by highlighting…
…doesn’t work when the reason is not explicitly stated in the input
36
Free-text explanation:
We cannot highlight this in the input!
Answering “why” by highlighting…
…doesn’t work when the reason is not explicitly stated in the input
[Zellers et al., 2019]
Question: Where is a frisbee in play likely to be?
Answer choices: outside, park, roof, tree, air
Free-text explanation: A frisbee is a concave plastic disc designed for skimming through the air as an outdoor game so while in play it is most likely to be in the air.
37
Answering “why” by highlighting…
…doesn’t work when the reason is not explicitly stated in the input
[Aggarwal et al., 2021]
How to generate free-text explanations?
Step 1:
Find some human-written explanations♢
Step 2:
Finetune a pretrained transformer-based generation model (GPT-2)
38
♢ [Wiegreffe* and Marasović*, NeurIPS 2021]
Pretrain-Finetune Paradigm
39
pretrain model
finetune model
text
text + labels
Option 1: mask & infill a word/span
Option 2: generate next word
standard supervised training
Pretrain-Finetune Paradigm
40
pretrain model
finetune model
text + labels
text
Pretrain-Finetune Paradigm
41
pretrain model
finetune model
text + labels
text
How to generate free-text explanations?
Step 1:
Find some human-written explanations♢
Step 2:
Finetune a pretrained transformer-based generation model (T5, GPT-2/Neo)
42
♢ [Wiegreffe* and Marasović*, NeurIPS 2021]
Transformer
43
How to generate free-text explanations?
Question: Where is a frisbee in play likely to be?
Answer choices: outside, park, roof, tree, air
Free-text explanation: A frisbee is a concave plastic disc designed for skimming through the air as an outdoor game so while in play it is most likely to be in the air.
44
[Aggarwal et al., 2021]
Generating
Explanations
45
question: where is a frisbee in play likely to be? choice: outside choice: park choice: roof choice: tree choice: air
[Marasović et al., Findings of EMNLP 2020]
Generating
Explanations
46
question: where is a frisbee in play likely to be? choice: outside choice: park choice: roof choice: tree choice: air
Air because a frisbee is a concave plastic disc designed for skimming through the air as an outdoor game so while in play it is most likely to be in the air.
[Marasović et al., Findings of EMNLP 2020]
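The input and output strings shown on this slide can be assembled with a few lines of code. The sketch below is only an illustration of that text format; the function name `format_example` and its arguments are hypothetical, and lowercasing the question is an assumption read off the example.

```python
def format_example(question, choices, answer=None, explanation=None):
    """Build the (source, target) strings for explanation generation.

    The layout mirrors the slide: "question: ... choice: ... choice: ..."
    as the source and "<Answer> because <explanation>" as the target.
    The function itself is an illustrative helper, not the authors' code.
    """
    choice_text = " ".join(f"choice: {choice}" for choice in choices)
    source = f"question: {question.lower()} {choice_text}"
    if answer is None or explanation is None:
        # At inference time there is no target; the model generates it.
        return source, None
    target = f"{answer.capitalize()} because {explanation}"
    return source, target


source, target = format_example(
    "Where is a frisbee in play likely to be?",
    ["outside", "park", "roof", "tree", "air"],
    answer="air",
    explanation="a frisbee is a concave plastic disc designed for skimming "
                "through the air as an outdoor game so while in play it is "
                "most likely to be in the air.",
)
```

During finetuning, `source` would be fed to the generator and `target` used as the supervised output.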
47
question : where
[Marasović et al., Findings of EMNLP 2020]
48
question : where
???
???
???
???
[Marasović et al., Findings of EMNLP 2020]
Key challenge: image representation beyond explicit content
49
[person4]
photo
Question: What is going to happen next?
Answer: He is telling the waitress that the person on the left ordered the pancakes.
Free-text explanation: It looks like [person4] is showing the photo to [person2], and they will want to be polite.
showing
[person2] will want to be polite.
Raw features
Relations (semantics)
Inferences (pragmatics)
[Marasović et al., Findings of EMNLP 2020]
50
object detection♢
grounded situation recognition○
visual commonsense graph□
[person2] will want to be polite.
Raw features
Relations (semantics)
Inferences (pragmatics)
♢ [Ren et al., 2015]
○ [Pratt et al., 2020]
□ [Park et al., 2020]
[person4]
showing
photo
51
question : where
???
???
???
???
object detection
grounded situation recognition
visual commonsense graph
[Marasović et al., Findings of EMNLP 2020]
Back to Basics: Object Detection
52
[Ren et al., 2015]
Output:
for each detected object
Uniform fusion: Prepend object labels to text
53
cup question :
image-related features
text-related features
Pro: very simple
Con: prone to propagation of errors from external vision models
[Marasović et al., Findings of EMNLP 2020]
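Uniform fusion, as described on this slide, amounts to string concatenation. A minimal sketch, assuming the external vision model returns a list of label strings (the name `uniform_fusion` is hypothetical):

```python
def uniform_fusion(object_labels, text):
    """Prepend detected object labels to the textual input as plain words.

    Because the labels become ordinary text tokens, any detection error
    is passed straight to the generator (the "Con" noted on the slide).
    """
    return " ".join(list(object_labels) + [text])


fused = uniform_fusion(["cup", "person"], "question: what is going to happen next?")
```

The generator then tokenizes `fused` like any other text, so no architectural change is needed.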
Hybrid fusion: Prepend object vectors to text embeddings
54
[Marasović et al., Findings of EMNLP 2020]
Hybrid fusion: Prepend object vectors to text embeddings
55
Pro: less error-prone
Con: image and text embeddings come from different vector spaces
box feature vector ➞ project
box’s coordinates ➞ project
sum
question :
text-related features
[Marasović et al., Findings of EMNLP 2020]
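The hybrid-fusion diagram (project the box feature vector, project the box coordinates, sum, then prepend to the text embeddings) can be sketched without any deep-learning library. Everything below is illustrative: the weight matrices `w_feat` and `w_coord` stand in for learned projections, and plain Python lists stand in for tensors.

```python
def project(vector, weights):
    """Linear map: `vector` (length d_in) times `weights` (d_in rows, d_out columns)."""
    d_out = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vector, weights)) for j in range(d_out)]


def hybrid_fusion(box_features, box_coords, text_embeddings, w_feat, w_coord):
    """Prepend one projected vector per detected box to the text embeddings."""
    visual = []
    for features, coords in zip(box_features, box_coords):
        projected_features = project(features, w_feat)    # box feature vector -> project
        projected_coords = project(coords, w_coord)       # box coordinates -> project
        visual.append([a + b for a, b in zip(projected_features, projected_coords)])  # sum
    # Visual vectors come first, followed by the unchanged text embeddings.
    return visual + list(text_embeddings)


sequence = hybrid_fusion(
    box_features=[[1.0, 2.0]],
    box_coords=[[0.0, 0.0, 1.0, 1.0]],
    text_embeddings=[[9.0, 9.0, 9.0]],
    w_feat=[[1, 0, 0], [0, 1, 0]],
    w_coord=[[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]],
)
```

Both projections map into the text embedding dimension, which is what lets the visual vectors sit in the same input sequence as the word embeddings.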
Tasks & Datasets
56
[Marasović et al., Findings of EMNLP 2020]
Tasks & Datasets
Visual Question Answering
57
[Marasović et al., Findings of EMNLP 2020]
Question: Where is the dog laying?
Answer: sidewalk
Free-text explanation: The white dog lays next to the bicycle on the sidewalk.
Tasks & Datasets
Visual Commonsense Reasoning
58
[Marasović et al., Findings of EMNLP 2020]
Question: What is going to happen next?
Answer: [person2] holding the photo will tell [person4] how cute their children are.
Free-text explanation: It looks like [person4] is showing the photo to [person2], and they will want to be polite.
Tasks & Datasets
Visual-Textual Entailment
59
[Marasović et al., Findings of EMNLP 2020]
Hypothesis: A man is sitting down in a rocking chair.
Label: contradiction
Free-text explanation: The man is not sitting on a rocking chair, he is sitting in front of a building.
Baseline
Research Questions:
Given the correct answer in the input, do proposed visual features help GPT-2 generate explanations that…
Baseline: GPT-2 without any information about the image
60
[Marasović et al., Findings of EMNLP 2020]
Evaluation Metric
3 crowdworkers answer:
Does the explanation support a given answer in the context of the image?
61
62
| VCR | VQA-X | E-SNLI-VE (contradiction) | E-SNLI-VE (entailment) |
Best Features | visual commonsense graphs | grounded situation recognition | grounded situation recognition | grounded situation recognition |
Best Fusion Type | uniform | hybrid | hybrid / uniform | uniform |
RVT (Ours) | 60.93 | 63.33 | 60.96 / 59.16 | 38.12 |
Baseline | 53.14 | 47.20 | 46.85 | 46.75 |
Human | 87.16 | 66.53 | 80.78 | 76.98 |
Results: Explanation Plausibility
Free-text explanations: given in plain language, they immediately provide the gist of why the input is labeled as it is
Contrastive explanations: explain how to minimally modify the input to change the prediction to something else
64
Extend models to self-explain: predict & elaborate on the prediction
Intuitive explanations motivated by frameworks of explainability in social sciences
Help users create a mental model of how to interact with AI
Supporting Users’ Agency
Contrastive explanations: explain how to minimally modify the input to change the prediction to something else
66
I AM SO HAPPY I JUST LEARNED THIS!
As an American over 65, I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!
I AM SO HAPPY I JUST LEARNED THIS!
As [an American over 65 → someone who has private health insurance with the Medicare Advantage plan, lives in X, and is chronically ill], I qualified for the “Elderly Spend Card”, which pays for my groceries, my dental, and my prescription refills. All I did to qualify, was tap the image below, entered my zip and I got my flex card in the mail a week later!
misleading
correct
💡
Contrastive Explanations via Contrastive Editing
67
Contrastive Explanations via Contrastive Editing
68
Context:
...Dear Ann, I hope that you and your children will be here in two weeks. My husband and I will go to meet you at the train station. Our town is small...
MiCE-Edited Context:
...Dear Ann, I hope that you and your children will be here in two weeks. My husband and I will go to meet you at [the train station → your home on foot]. Our town house is small...
Question:
Ann and her children are going to Linda’s home ____.
(a) by bus (b) by car (c) on foot (d) by train
Why “by train” (d) and not “on foot” (c)?
How to change the answer from “by train” (d) to “on foot” (c)?
Example from the RACE dataset [Lai et al., 2017]
[Ross, Marasović, Peters, Findings of ACL 2021]
Deeper Into Contrastive Editing
69
Ross, Marasović, Peters (2021)
Explaining NLP Models via Minimal Contrastive Editing (MiCE)
70
Goal:
Explain a Predictor model by automatically finding a minimal edit to the input that causes the model output to change to the contrast case
A very high-level idea of 🐭:
71
[Ross, Marasović, Peters, Findings of ACL 2021]
72
input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...
[Ross, Marasović, Peters, Findings of ACL 2021]
73
label: positive input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...
the contrast label (foil)
[Ross, Marasović, Peters, Findings of ACL 2021]
74
label: positive input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...
the contrast label (foil)
label: positive input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...
mask n% of input tokens
[Ross, Marasović, Peters, Findings of ACL 2021]
75
label: positive input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...
the contrast label (foil)
label: positive input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...
...
15. label: positive input: Sylvester Stallone has made some wonderful films in his lifetime, but this has got to be one of the greatest. A totally tedious story...
sample 15 spans at each masked position
mask n% of input tokens
76
label: positive input: Sylvester Stallone has made some crap films in his lifetime, but this has got to be one of the worst. A totally dull story...
the contrast label (foil)
label: positive input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...
...
15. label: positive input: Sylvester Stallone has made some wonderful films in his lifetime, but this has got to be one of the greatest. A totally tedious story...
get the logit of the contrast label
sample 15 spans at each masked position
mask n% of input tokens
77
4*15=60 samples
✕
4 different values of n to minimize the edit
[Ross, Marasović, Peters, Findings of ACL 2021]
78
4*15=60 samples
rank 60 samples w.r.t. the logit of the contrast label
[Ross, Marasović, Peters, Findings of ACL 2021]
✕
4 different values of n to minimize the edit
79
keep top-3 samples
beam
[Ross, Marasović, Peters, Findings of ACL 2021]
4*15=60 samples
rank 60 samples w.r.t. the logit of the contrast label
✕
4 different values of n to minimize the edit
80
if a contrastive edit is found
[Ross, Marasović, Peters, Findings of ACL 2021]
keep top-3 samples
beam
4*15=60 samples
rank 60 samples w.r.t. the logit of the contrast label
✕
4 different values of n to minimize the edit
🛑
81
keep top-3 samples
beam
4*15=60 samples
rank 60 samples w.r.t. the logit of the contrast label
Repeat for every instance in the beam, for at most 2 more rounds
✕
4 different values of n to minimize the edit
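The search loop built up over the preceding slides (sample edits for several values of n, rank by the contrast-label logit, stop if the prediction flips, otherwise keep a beam of 3 and repeat) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `predict`, `contrast_logit`, and `propose_edit` are hypothetical callables, the four mask fractions are placeholder values (the slides give only their count), and the values of n are simply tried one after another.

```python
import random


def mice_search(tokens, contrast_label, predict, contrast_logit, propose_edit,
                mask_fractions=(0.1, 0.2, 0.3, 0.4),  # placeholder values of n
                samples_per_mask=15, beam_size=3, max_rounds=3):
    """Return (edited_tokens or None, number of sampled edits)."""
    beam, sampled = [list(tokens)], 0
    for _round in range(max_rounds):          # first round + at most 2 more
        candidates = []
        for instance in beam:
            for fraction in mask_fractions:   # 4 values of n
                for _sample in range(samples_per_mask):
                    candidates.append(propose_edit(instance, fraction, contrast_label))
                    sampled += 1
        # Rank all samples by the logit of the contrast label, best first.
        candidates.sort(key=lambda c: contrast_logit(c, contrast_label), reverse=True)
        for candidate in candidates:
            if predict(candidate) == contrast_label:
                return candidate, sampled     # contrastive edit found: stop
        beam = candidates[:beam_size]         # keep top-3 samples
    return None, sampled


# Toy stand-ins for the Predictor and the Editor (purely illustrative).
POSITIVE = ["wonderful", "greatest", "exciting"]
NEGATIVE = ["crap", "worst", "dull"]


def toy_logit(tokens, label):
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg if label == "positive" else neg - pos


def toy_predict(tokens):
    return "positive" if toy_logit(tokens, "positive") > 0 else "negative"


def make_toy_editor(seed=0):
    rng = random.Random(seed)

    def propose_edit(tokens, fraction, label):
        # Replace a random `fraction` of positions with words tied to the label.
        k = max(1, round(fraction * len(tokens)))
        vocab = POSITIVE if label == "positive" else NEGATIVE
        edited = list(tokens)
        for position in rng.sample(range(len(tokens)), k):
            edited[position] = rng.choice(vocab)
        return edited

    return propose_edit


review = "some crap films but this is one of the worst a totally dull story".split()
edit, sampled = mice_search(review, "positive", toy_predict, toy_logit, make_toy_editor())
```

With these defaults the loop draws 60 samples in the first round and 180 in each later round, so it never exceeds 420 samples per instance.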
82
Can a pretrained model without any additional tweaks fill in the spans?
We find it’s important to prepare the editor by finetuning it to infill masked spans given masked text and a target end-task label
(standard masking) Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...
(targeted masking) label: negative input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...
Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
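The difference between standard and targeted masking is only in how the Editor's input string is built. The sketch below illustrates that construction. The slides do not say how the n% of tokens to mask are chosen, so the per-token importance `scores` are an assumption, and all function names are hypothetical.

```python
import math


def positions_to_mask(scores, fraction):
    """Pick the top `fraction` of token positions by score (at least one)."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def mask_tokens(tokens, positions, mask_token="<mask>"):
    """Replace the chosen positions with the mask token."""
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]


def editor_input(masked_tokens, label=None):
    """Standard masking when `label` is None, targeted masking otherwise."""
    text = " ".join(masked_tokens)
    return text if label is None else f"label: {label} input: {text}"


tokens = "Stallone has made some crap films".split()
scores = [0.0, 0.0, 0.0, 0.1, 0.9, 0.2]   # made-up importance scores
masked = mask_tokens(tokens, positions_to_mask(scores, 0.1))
standard = editor_input(masked)
targeted = editor_input(masked, label="negative")
```

When searching for an edit, the same targeted format is reused with the contrast label in place of the original one, as in the `label: positive input: ...` strings on the earlier slides.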
83
MiCE is a two-stage approach to generating contrastive edits
[Ross, Marasović, Peters, Findings of ACL 2021]
Tasks & Datasets
84
Results – Flip Rate
85
1.0 when we find a contrastive edit for all instances
[Ross, Marasović, Peters, Findings of ACL 2021]
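Read as a formula, the flip rate is the share of instances for which a contrastive edit was found. A minimal sketch, assuming each instance's search result is stored as the edited text or `None` (a hypothetical convention):

```python
def flip_rate(edits):
    """Fraction of instances with a contrastive edit; 1.0 if all were flipped."""
    if not edits:
        raise ValueError("flip rate is undefined for an empty set of instances")
    return sum(edit is not None for edit in edits) / len(edits)


rate = flip_rate(["edited review 1", None, "edited review 3", "edited review 4"])
```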
Results – Edit Minimality
86
The minimum number of deletions, insertions, or substitutions required to transform the original to the edited instance
lower is better; we change on average 18.5-33.5% of the input tokens
The size of the IMDB edits is similar to human edits*
* Compared to IMDB edits in [Gardner et al., 2020]
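The minimality measure described here is a word-level edit distance. The sketch below computes it with the standard dynamic program; splitting on whitespace and normalizing by the number of original tokens are assumptions made so that the score reads as a fraction of the input changed.

```python
def levenshtein(a, b):
    """Minimum number of deletions, insertions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        previous = current
    return previous[-1]


def minimality(original, edited):
    """Word-level edit distance, normalized by the original length (lower is better)."""
    original_tokens, edited_tokens = original.split(), edited.split()
    return levenshtein(original_tokens, edited_tokens) / len(original_tokens)


score = minimality("A fun movie 7/10", "A fun movie 4/10")
```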
Results – Edit Fluency
87
1.0 when the LM loss is the same before and after editing
[Ross, Marasović, Peters, Findings of ACL 2021]
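One way to obtain a score that equals 1.0 when the language-model loss is unchanged is to take the ratio of the two losses. The sketch below assumes that ratio form and takes the losses as given numbers; how they are computed is outside its scope.

```python
def fluency(loss_original, loss_edited):
    """Ratio of LM loss after editing to LM loss before editing (assumed form)."""
    if loss_original <= 0:
        raise ValueError("the original loss must be positive")
    return loss_edited / loss_original


unchanged = fluency(2.0, 2.0)   # the edit left the loss as it was
degraded = fluency(2.0, 3.0)    # the edit made the text less likely under the LM
```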
How Can MiCE Edits Be Used?
MiCE’s edits can offer hypotheses about model “bugs”
88
[Ross, Marasović, Peters, Findings of ACL 2021]
An interesting pairing of stories, this little flick manages to bring together seemingly different characters and story lines all in the backdrop of WWII and succeeds in tying them together without losing the audience. I was impressed by the depth portrayed by the different characters and also by how much I really felt I understood them and their motivations, even though the time spent on the development of each character was very limited. The outstanding acting abilities of the individuals involved with this picture are easily noted. A fun, stylized movie with a slew of comic moments and a bunch more head shaking events. 7/10
Original prediction: positive
How Can MiCE Edits Be Used?
MiCE’s edits can offer hypotheses about model “bugs”
89
[Ross, Marasović, Peters, Findings of ACL 2021]
An interesting pairing of stories, this little flick manages to bring together seemingly different characters and story lines all in the backdrop of WWII and succeeds in tying them together without losing the audience. I was impressed by the depth portrayed by the different characters and also by how much I really felt I understood them and their motivations, even though the time spent on the development of each character was very limited. The outstanding acting abilities of the individuals involved with this picture are easily noted. A fun, stylized movie with a slew of comic moments and a bunch more head shaking events. [7/10 → 4/10]
MiCE’s edit ✕ contrast prediction (negative)
How Can MiCE Edits Be Used?
Test the hypothesis using MiCE’s edits:
MiCE’s edits can offer hypotheses about model “bugs”
Hypothesis: The model learned to rely heavily on numerical ratings ⭐
91
[Ross, Marasović, Peters, Findings of ACL 2021]
✅ We present the first method for contrastive editing of text beyond binary classification
✅ Contrastive editing is already achieving decent performance
92
The maximum number of iterations for a single instance:
# binary search levels s ⨉ # samples at each masking position m +
beam size b ⨉ # binary search levels s ⨉ # samples at each masking position m ⨉ # of rounds =
4 ⨉ 15 + 3 ⨉ 4 ⨉ 15 ⨉ 2 = 420
That’s a lot, and also there is no guarantee that a smaller contrastive edit does not exist
93
first round
other rounds
[Ross, Marasović, Peters, Findings of ACL 2021]
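The count on this slide can be checked with a short calculation. The parameter names below are illustrative; the values are the ones given on the slide.

```python
def max_samples(levels=4, samples=15, beam=3, extra_rounds=2):
    """Upper bound on sampled edits for one instance."""
    first_round = levels * samples                         # 4 x 15 = 60
    other_rounds = beam * levels * samples * extra_rounds  # 3 x 4 x 15 x 2 = 360
    return first_round + other_rounds


total = max_samples()
```

The first round starts from a single instance, while each later round expands every instance in the beam, which is why the beam size multiplies only the second term.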
✅ We present the first method for contrastive editing of text beyond binary classification
✅ Contrastive editing is already achieving decent performance
❗ Needed improvements:
94
Free-text explanations: given in plain language, they immediately provide the gist of why the input is labeled as it is
Contrastive explanations: explain how to minimally modify the input to change the prediction to something else
95
Extend models to self-explain: predict & elaborate on the prediction
Intuitive explanations motivated by frameworks of explainability in social sciences
Help users create a mental model of how to interact with AI
Supporting Users’ Agency
Intuitive Interaction: What Is Next?
96
97
Although local explanations are specifically motivated for people to use, there is no convincing evidence yet that local explanations help people who are using language technology
An AI model is trustworthy to a given contract if it is capable of maintaining the contract.
If a human perceives that an AI model is trustworthy to a contract, and therefore accepts vulnerability to the AI’s actions, then the human trusts the AI contractually. Otherwise, the human distrusts the AI contractually.
Trust does not exist if the human does not perceive risk.
A human’s contractual trust in AI is warranted if it is caused by the AI’s trustworthiness. Otherwise, the human’s trust in AI is unwarranted.
98
[Jacovi, Marasović, Miller, Goldberg; FAccT 2021]
Trust does not exist if the human does not perceive risk, but…
Researchers focus on grand AI challenges that people are good at (e.g., commonsense QA, “Where is a frisbee in play likely to be?”)
Researchers use simple tasks that people don’t need help with (e.g., claim verification against a very short text)
99
Future Direction:
How to measure the utility of explanations?
What are potentially useful language applications, and who is the target audience? (e.g., journalists and automatic fact checking)
How might explanations help people using these applications? (e.g., by helping them verify information faster with equal accuracy)
Test them exactly for those purposes
100
101
(not) true because {explanation}
claim & documents
(not) true
Expert / Journalist
💰💰💰💰💰
102
(not) true because {explanation}
claim & documents
(not) true
Crowdsourcing??
💰💰
Future Direction:
Can lay people measure the utility of explanations?
Can we use games with a purpose (GWAP) to simulate risk?
A person plays the role of a journalist; their character:
104
105
Domain Experts
AI Ethics Researchers
HCI Researchers
Evaluation Metric
Evaluation
People (GWAP)
Fairness Researchers
AI/NLP/CV Researchers
106
Domain Experts
AI Ethics Researchers
HCI Researchers
Evaluation Metric
Evaluation
People (GWAP)
Fairness Researchers
AI/NLP/CV Researchers
Modeling
Future Direction:
How to provide explanations for many modalities and query types, while using only a few human-authored explanations as supervision?
text
images
videos
speech
...
Query Type 1
Query Type 2
Query Type 3
Future Direction:
How to model and evaluate explainability as a conversational model?
Explanations are a transfer of knowledge, presented as part of an interaction
108
Explanation Type 1
Explanation Type 2
Query Type 1
109
Robustness
Quality & Integrity of Data
Supporting Users’ Agency
[Marasović*, Beltagy*, et al.; Under Review]�[Wiegreffe, Marasović, Smith; EMNLP 2021]�[Ross, Marasović, Peters; Findings of ACL 2021] �[Sun and Marasović; Findings of ACL 2021]�[Jacovi, Marasović, et al; FAccT 2021]�[Marasović et al; Findings of EMNLP 2020]
�
[Wiegreffe* and Marasović*; NeurIPS 2021]�[Dodge, Sap, Marasović, et al.; EMNLP 2021]�[Ning, …, Marasović, Nie; EMNLP Demo 2020]
[Hoyle, Marasović, Smith; Findings of ACL 2021]�[Gururangan, Marasović et al; ACL 2020] Honorable Mention for Best Paper�[Dasigi, Liu, Marasović, et al.; EMNLP 2019]�[Marasović et al; EMNLP 2017]�[Marasović and Frank; Repl4NLP 2016]
[Marasović et al.; LILT 2016]
[Marasović and Frank; NAACL 2018]
[Zopf, …, Marasović, …, Frank; SNAMS 2018]
Green AI
My work: Build AI Trustworthy to These Contracts
Future Direction:
Understanding multimodal & few-shot learning beyond explainability is crucial
111
text
images
videos
speech
Query Type 1
Query Type 2
Query Type 3
116
Domain Experts
AI Ethics Researchers
HCI Researchers
Evaluation Metric
Evaluation
People (GWAP)
Fairness Researchers
AI/NLP/CV Researchers
Modeling
Multimodality
Few-Shot Learning
Conversational AI
118
. . .
THANK YOU!
YOUR QUESTION ⤵
Why this answer?
How to change the answer?
What if I change the input in this way?
References
119
MiCE Details
121
122
✕
s different values of n to minimize the edit*
* s=4 in the paper
How to pick which values for n?
Binary search on [0%, 55%]
Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
123
✕
s different values of n to minimize the edit*
* s=4 in the paper
How to pick which values for n?
Binary search on [0%, 55%]
Start: n(1) = 27.5%
Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
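The search over n can be sketched as below. This is a minimal illustration, not the authors' code. `edit_flips_prediction` is a hypothetical callback standing in for the whole "mask top-n%, infill with the editor, check whether the model now predicts the contrast label" pipeline. The update rule (try a smaller n after a success, a larger n after a failure) is an assumption that fits the goal of minimizing the edit.

```python
from typing import Callable, Optional


def search_mask_percentage(
    edit_flips_prediction: Callable[[float], bool],
    low: float = 0.0,
    high: float = 55.0,
    levels: int = 4,
) -> Optional[float]:
    """Binary search for a small masking percentage n that still flips the prediction.

    edit_flips_prediction is a hypothetical stand-in for the full
    mask -> infill -> classify pipeline; it is not part of any library.
    Returns None if no tried value of n succeeded.
    """
    best = None
    for _ in range(levels):      # s = 4 values of n are tried
        n = (low + high) / 2     # first value: (0 + 55) / 2 = 27.5
        if edit_flips_prediction(n):
            best = n             # success: remember n, then try masking less
            high = n
        else:
            low = n              # failure: mask more of the input
    return best


# Toy stand-in: pretend masking at least 10% of tokens is enough to flip the prediction.
tried = []


def toy_check(n: float) -> bool:
    tried.append(n)
    return n >= 10.0


result = search_mask_percentage(toy_check)
print(result)  # 10.3125
print(tried)   # [27.5, 13.75, 6.875, 10.3125]
```

With s = 4 the search touches only four masking percentages, so the expensive infill-and-classify step runs four times per search.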
128
How to pick masking positions?
Based on token importance for the original prediction
Rank input tokens based on the gradient magnitude of the model we’re explaining
Mask the top-n% of ranked tokens
✕
s different values of n to minimize the edit*
* s=4 in the paper
Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
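A minimal sketch of the masking step, assuming the per-token gradient magnitudes have already been computed elsewhere. The function name, the made-up scores, and the choice to round the number of masked tokens up are all assumptions of this sketch. Merging adjacent masks into spans is left out.

```python
import math
from typing import List, Sequence


def mask_top_n_percent(
    tokens: Sequence[str],
    grad_magnitudes: Sequence[float],
    n_percent: float,
    mask_token: str = "<mask>",
) -> List[str]:
    """Replace the top-n% most important tokens with a mask token."""
    if len(tokens) != len(grad_magnitudes):
        raise ValueError("need one gradient magnitude per token")
    # Number of tokens to mask; rounding up is an assumption of this sketch.
    k = math.ceil(len(tokens) * n_percent / 100)
    # Token indices ordered from largest to smallest gradient magnitude.
    ranked = sorted(range(len(tokens)), key=lambda i: grad_magnitudes[i], reverse=True)
    to_mask = set(ranked[:k])
    return [mask_token if i in to_mask else tok for i, tok in enumerate(tokens)]


tokens = "this has got to be one of the worst".split()
# Made-up importance scores, for illustration only.
grads = [0.1, 0.2, 0.1, 0.1, 0.1, 0.3, 0.1, 0.1, 0.9]
masked = mask_top_n_percent(tokens, grads, 20)
print(" ".join(masked))  # this has got to be <mask> of the <mask>
```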
129
Can a pretrained model without any additional tweaks fill in the spans?
Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
130
Can a pretrained model without any additional tweaks fill in the spans?
We find it’s important to prepare the editor by finetuning it to infill masked spans given masked text and a target end-task label
(standard masking) Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...
(targeted masking) label: negative input: Sylvester Stallone has made some <mask> films in his lifetime, but this has got to be one of the <mask>. A totally <mask> story...
Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
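The difference between the two input formats can be sketched as a small formatting helper. `build_editor_input` is a hypothetical name; only the "label: ... input: ..." pattern comes from the example above.

```python
def build_editor_input(masked_text: str, label: str, targeted: bool = True) -> str:
    """Format an infilling input for the editor.

    With targeted=True the target end-task label is prepended, following the
    "label: ... input: ..." pattern of the targeted-masking example.
    With targeted=False the masked text is returned unchanged (standard masking).
    """
    if targeted:
        return f"label: {label} input: {masked_text}"
    return masked_text


masked_review = (
    "Sylvester Stallone has made some <mask> films in his lifetime, "
    "but this has got to be one of the <mask>."
)
print(build_editor_input(masked_review, "negative"))
print(build_editor_input(masked_review, "negative", targeted=False))
```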
133
Can a pretrained model without any additional tweaks fill in the spans?
We find it’s important to prepare the editor by finetuning it to infill masked spans given masked text and a target end-task label
We find that labels predicted by the model we’re explaining can be used in this step without a big loss in performance
Gradient-based masking in this step gives better performance
⇒ MiCE is a two-stage approach to generating contrastive edits
Stage 1: prepare an editor
Stage 2: make edits guided with gradients & logits of the model we’re explaining
Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
Categorization of Current Methods for Contrastive Explanations in NLP
134
NLP is starting to pay attention!
COLING 2020
⇾ Yang et al. Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification.
TACL 2021
⇾ Jacovi and Goldberg. Aligning Faithful Interpretations with their Social Attribution.
(Findings of) ACL 2021
⇾ Chen et al. KACE: Generating Knowledge-Aware Contrastive Explanations for NLI.
⇾ Ross et al. Explaining NLP Models via Minimal Contrastive Editing (MiCE).
⇾ Paranjape et al. Prompting Contrastive Explanations for Commonsense Reasoning Tasks.
⇾ Wu et al. Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models.
EMNLP 2021
⇾ Jacovi et al. Contrastive Explanations for Model Interpretability.
135
✅ Almost all of these papers begin by citing Miller’s overview of frameworks of explanations from social science
Are technical proposals the same?
136
Contrastive Explanations of NLP Models
Contrastive vector representation:
A dense representation of the input that captures latent features that differentiate two classes
Jacovi et al. EMNLP 2021.
Contrastive input editing:
Minimal edits to the input that change model output to the contrast case
Yang et al. COLING 2020.
Jacovi and Goldberg. TACL 2021.
Ross et al. Findings of ACL 2021.
Wu et al. ACL 2021.
Collect free-text human contrastive explanations, ...
...abstract them into templates, automatically fill in the templates (template-based infilling)
Paranjape et al. Findings of ACL 2021.
...and generate them left-to-right
Chen et al. ACL 2021.
137
Contrastive Explanations via Contrastive Editing
The key idea:
“Why P not Q?” ⇒ “How to change the answer from P to Q?” ⇒ By making a contrastive minimal edit
A minimal edit to the input that causes the model output to change to the contrast case has hallmark characteristics of a human contrastive explanation:
⇾ cites contrastive features
⇾ selects a few relevant causes
139
Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
Contrastive Explanations via Contrastive Editing
140
Ross, Marasović, Peters. Explaining NLP Models via Minimal Contrastive Editing (MiCE). Findings of ACL 2021.
Context:
...Dear Ann, I hope that you and your children will be here in two weeks. My husband and I will go to meet you at the train station. Our town is small...
MiCE-Edited Context:
...Dear Ann, I hope that you and your children will be here in two weeks. My husband and I will go to meet you at [the train station → your home on foot]. Our [town → house] is small...
(edits shown as [original → replacement])
Question:
Ann and her children are going to Linda’s home ____.
(a) by bus (b) by car (c) on foot (d) by train
Why “by train” (d) and not “on foot” (c)?
How to change the answer from “by train” (d) to “on foot” (c)?
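The edit in this example can be recovered by a token-level diff of the two contexts. The sketch below uses Python's standard `difflib` and is only an illustration of what "the edit" is. It assumes the edited context replaces "the train station" with "your home on foot" and "town" with "house". It is not how MiCE produces or measures edits.

```python
import difflib
from typing import List, Tuple


def token_edits(original: str, edited: str) -> List[Tuple[str, str, str]]:
    """Return (operation, original span, edited span) for every changed token span."""
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


# Assumed original and edited contexts (shortened from the example).
original = "I will go to meet you at the train station. Our town is small"
edited = "I will go to meet you at your home on foot. Our house is small"

edits = token_edits(original, edited)
for operation, before, after in edits:
    print(f"{operation}: '{before}' -> '{after}'")
# replace: 'the train station.' -> 'your home on foot.'
# replace: 'town' -> 'house'
```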
Contrastive Explanations via Conditional Generation
The key idea (IMO):
Contrastive edits may still not be immediately understandable (the cognitive load can still be notable)
“Why P not Q?” ⇒ Generate free-text contrastive explanations
Example: The model predicts “by train” because the context mentions meeting at “the train station”. If the context had said that they will meet at “your home on foot” the prediction would be “on foot”.
142
Chen et al. KACE: Generating Knowledge-Aware Contrastive Explanations for Natural Language Inference. ACL 2021.
Contrastive Explanations via Conditional Generation
Step 1: Generate contrastive edits
(1.a) Highlight important tokens
(1.b) Replace important tokens with WordNet hypernyms and hyponyms
(1.c) Minimize the loss between the predicted and contrast label for examples in (1.b)
(1.d) Minimize the distance between the original and edited examples in (1.b)
(1.e) Maximize the diversity of edited examples in (1.b)
Step 2: Compose a free-text contrastive explanation by generating “Why P” and “Why not Q” explanations from two supervised models, given the original instance, the contrastively edited instance (Step 1), and external knowledge
143
Chen et al. KACE: Generating Knowledge-Aware Contrastive Explanations for Natural Language Inference. ACL 2021.
Contrastive Explanations via Template-Based Infilling
The key idea (IMO):
“Why P not Q?” ⇒ Develop templates (prompts) to retrieve “contrastive knowledge”*, a comparison of P and Q along a distinguishing attribute, from a pretrained model
* Example: Peanuts are salty while raisins tend to be sweet.
145
Paranjape et al. Prompting Contrastive Explanations for Commonsense Reasoning Tasks. Findings of ACL 2021.
Contrastive Explanations via Template-Based Infilling
Data Step 1: Collect human-written free-text contrastive explanations
Data Step 2: Abstract them into templates with placeholders
146
Human contrastive explanation:
Ruler is hard while a ribbon is flexible.
Template:
P is ___ while Q is ___
How to tie pieces of paper together?
(a) Thread ruler through the holes.
(b) Thread ribbon through the holes. [correct]
Paranjape et al. Prompting Contrastive Explanations for Commonsense Reasoning Tasks. Findings of ACL 2021.
Contrastive Explanations via Template-Based Infilling
Modeling Step 1:
Generate contrastive knowledge by filling in the placeholders in explanation templates
147
To prepare the puff pastry for your pie, line a baking sheet with parchment. Then ___
(a) Unroll the pastry, lay it over baking twine. [correct]
(b) Unroll the pastry, lay it over fishing line.
Contrastive knowledge:
⇾ Baking twine is used in baking while fishing line is used in fishing.
⇾ Baking twine takes longer to catch fish than fishing line.
⇾ Baking twine can cause fire while fishing line results in tangling.
...
Templates:
⇾ P is ___ while Q is ___
⇾ P takes longer to ___ than Q
⇾ P can cause ___ while Q results in ___
...
Paranjape et al. Prompting Contrastive Explanations for Commonsense Reasoning Tasks. Findings of ACL 2021.
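The first half of this step, turning templates into prompts for a given pair of answer candidates, can be sketched as plain string substitution. `instantiate_templates` is a hypothetical helper. Filling the remaining blanks is the job of the pretrained model and is not shown.

```python
from typing import List, Sequence

# The three templates from the slide, with P and Q as named placeholders.
TEMPLATES = (
    "{P} is ___ while {Q} is ___",
    "{P} takes longer to ___ than {Q}",
    "{P} can cause ___ while {Q} results in ___",
)


def instantiate_templates(p: str, q: str, templates: Sequence[str] = TEMPLATES) -> List[str]:
    """Substitute the two answer candidates into each template.

    The blanks (___) are left in place for a pretrained model to fill in.
    """
    return [template.format(P=p, Q=q) for template in templates]


prompts = instantiate_templates("Baking twine", "fishing line")
for prompt in prompts:
    print(prompt)
# Baking twine is ___ while fishing line is ___
# Baking twine takes longer to ___ than fishing line
# Baking twine can cause ___ while fishing line results in ___
```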
Contrastive Explanations via Template-Based Infilling
Modeling Step 2:
Augment the input with contrastive knowledge and make a prediction with the same model
148
To prepare the puff pastry for your pie, line a baking sheet with parchment. Then unroll the pastry, lay it over baking twine.
To prepare the puff pastry for your pie, line a baking sheet with parchment. Then unroll the pastry, lay it over fishing line.
⨉
Contrastive knowledge:
⇾ Baking twine is used in baking while fishing line is used in fishing.
⇾ Baking twine takes longer to catch fish than fishing line.
⇾ Baking twine can cause fire while fishing line results in tangling.
...
⬇
The model scores (context, answer candidate, contrastive knowledge) tuples
⬇
The highest-scoring explanation is THE explanation
Paranjape et al. Prompting Contrastive Explanations for Commonsense Reasoning Tasks. Findings of ACL 2021.
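The selection step can be sketched as an argmax over (answer candidate, contrastive knowledge) pairs. The scorer here is a made-up lookup table. In the actual method the scores come from the task model, and how they are computed is not shown on the slide.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def best_answer_and_explanation(
    context: str,
    answers: Sequence[str],
    knowledge: Sequence[str],
    score: Callable[[str, str, str], float],
) -> Tuple[str, str]:
    """Return the (answer, contrastive knowledge) pair with the highest score."""
    return max(
        product(answers, knowledge),
        key=lambda pair: score(context, pair[0], pair[1]),
    )


context = "To prepare the puff pastry for your pie, line a baking sheet with parchment. Then"
answers = [
    "unroll the pastry, lay it over baking twine.",
    "unroll the pastry, lay it over fishing line.",
]
knowledge = [
    "Baking twine is used in baking while fishing line is used in fishing.",
    "Baking twine can cause fire while fishing line results in tangling.",
]

# Made-up scores, for illustration only.
toy_scores = {
    (answers[0], knowledge[0]): 0.9,
    (answers[0], knowledge[1]): 0.4,
    (answers[1], knowledge[0]): 0.2,
    (answers[1], knowledge[1]): 0.3,
}


def toy_score(context: str, answer: str, fact: str) -> float:
    """Hypothetical scorer: looks up a fixed score and ignores the context."""
    return toy_scores[(answer, fact)]


answer, explanation = best_answer_and_explanation(context, answers, knowledge, toy_score)
print(answer)       # unroll the pastry, lay it over baking twine.
print(explanation)  # Baking twine is used in baking while fishing line is used in fishing.
```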
Contrastive Explanations via Contrastive Projection
The key idea (IMO):
“Why P not Q?” ⇒ Select latent contrastive features in the space of hidden representations instead of selecting them in the input (discrete tokens)
150
Jacovi et al. Contrastive Explanations for Model Interpretability. EMNLP 2021.
Contrastive Explanations via Contrastive Projection
Thesis: Entailment because of a high lexical overlap between the premise and hypothesis
Overlap concept: All of the content words in the hypothesis also exist in the premise
Causal Intervention (Why P?)
⇾ Study how model logits change by removing all features in the hidden representation indicative of the overlap concept
This doesn’t answer whether (a subset of) these features differentiate entailment from other classes
152
Jacovi et al. Contrastive Explanations for Model Interpretability. EMNLP 2021.
Contrastive Explanations via Contrastive Projection
Thesis: Entailment because of a high lexical overlap between the premise and hypothesis
Overlap concept: All of the content words in the hypothesis also exist in the premise
Contrastive Intervention (Why P not Q?)
⇾ Project the hidden representation to the space of contrastive features, i.e., remove hidden features that the model doesn’t use to differentiate class P (entailment) from class Q (contradiction or neutral)
⇾ Study how model logits change by removing all features in the contrastively projected hidden representation indicative of the overlap concept
153
Jacovi et al. Contrastive Explanations for Model Interpretability. EMNLP 2021.
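A minimal sketch of one way to realize such a projection. It assumes a linear output layer with one weight vector per class, and it assumes the contrastive space for a pair of classes is the single direction w_P − w_Q. The slides do not spell out the construction, so treat this as an illustration, not as the paper's exact method. The vectors are toy values.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_projection(
    h: Sequence[float], w_p: Sequence[float], w_q: Sequence[float]
) -> List[float]:
    """Project hidden representation h onto the direction w_p - w_q.

    Everything orthogonal to w_p - w_q is removed, because it cannot change
    the logit difference between class P and class Q under a linear layer.
    """
    u = [a - b for a, b in zip(w_p, w_q)]
    norm_sq = dot(u, u)
    if norm_sq == 0:
        raise ValueError("w_p and w_q are identical; no contrastive direction")
    scale = dot(h, u) / norm_sq
    return [scale * x for x in u]


# Toy hidden representation and class weight vectors.
h = [1.0, 2.0, 3.0]
w_p = [1.0, 0.0, 1.0]  # weights for class P (e.g. entailment)
w_q = [0.0, 0.0, 1.0]  # weights for class Q (e.g. contradiction)

h_contrastive = contrastive_projection(h, w_p, w_q)
print(h_contrastive)  # [1.0, 0.0, 0.0]

# The P-vs-Q logit difference is unchanged by the projection.
diff_before = dot(w_p, h) - dot(w_q, h)
diff_after = dot(w_p, h_contrastive) - dot(w_q, h_contrastive)
print(diff_before, diff_after)  # 1.0 1.0
```

The second and third coordinates of h vanish because neither contributes to telling P from Q in this toy setup. An intervention applied after the projection therefore only measures features the model uses for that specific contrast.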