[NeurIPS’23]
Background
Example: “RRR movie has a great story and amazing visuals.” → Model → Positive / Negative
Corrective input: “The keywords ’great’ and ’amazing’ are important cues in predicting the sentiment of this sentence.”
“in-context learning”
Motivation
→ “In-context learning” with such explanations requires human involvement to write them.
→ This raises scalability challenges.
→ Goal: generate the explanations automatically.
Introduction
→ Compute post hoc explanations using a smaller proxy model and then incorporate these explanations into prompts for larger language models.
→ Takes advantage of the accessibility of smaller, open-source models.
AMPLIFY
AMPLIFY
STEP 1. Proxy Model Selection
→ The proxy models (smaller models) do not perform well on the reasoning tasks by themselves.
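A minimal sketch of instantiating a proxy, assuming a Hugging Face Transformers setup; the checkpoint named below is only an illustrative small open-source classifier, not necessarily the one used in the paper.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative checkpoint: any small, open-source classifier fine-tuned on the
# task can serve as the proxy; this SST-2 sentiment model is just an example.
PROXY_NAME = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(PROXY_NAME)
proxy_model = AutoModelForSequenceClassification.from_pretrained(PROXY_NAME)
proxy_model.eval()  # the proxy is only queried (and differentiated), not trained further here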
AMPLIFY
STEP 2. Few-shot Sample Selection
→ The samples with the highest misclassification confidence score (MCS) represent the most egregious misclassifications and are selected as the few-shot examples (see the sketch below).
*x : Input sequence
*y : Incorrect label
*ŷ : Ground truth label
*f : Fine-tuned LM
x = { RRR movie has a great story and amazing visuals. }
Proxy model → Positive / Negative
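A minimal sketch of the selection step, assuming MCS is the confidence f assigns to its incorrect prediction y minus the confidence it assigns to the ground-truth label ŷ (in the notation above, roughly f(x)_y − f(x)_ŷ); proxy_model and tokenizer come from the Step 1 sketch, and the function names are illustrative.

import torch

def mcs(logits: torch.Tensor, true_label: int) -> float:
    """Misclassification confidence score for one sample: confidence in the
    predicted (incorrect) label minus confidence in the ground-truth label
    (assumed definition)."""
    probs = torch.softmax(logits, dim=-1)
    predicted = int(torch.argmax(probs))
    return float(probs[predicted] - probs[true_label])

def select_few_shot(samples, model, tokenizer, k=4):
    """Keep the k misclassified samples with the highest MCS."""
    scored = []
    for text, true_label in samples:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits[0]
        if int(torch.argmax(logits)) != true_label:   # misclassified samples only
            scored.append((mcs(logits, true_label), text, true_label))
    scored.sort(key=lambda t: t[0], reverse=True)     # highest MCS first
    return [(text, label) for _, text, label in scored[:k]]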
AMPLIFY
STEP 3. Rationale Generation
→ Run a post hoc explanation method on the proxy model and output the set of top-k words (highest attribution) for each selected input sample (see the sketch below).
“RRR movie has a great story and amazing visuals.”
Proxy model → {great, amazing, …} (top-k words)
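A minimal sketch of rationale generation using a simple gradient × input attribution over the proxy model’s token embeddings; the paper compares several post hoc explanation methods (see the Evaluation), so this particular scoring rule and the function name are illustrative.

import torch

def top_k_rationale(model, tokenizer, text, label, k=5):
    """Score each token with a gradient x input attribution toward `label`
    and return the k highest-scoring tokens as the rationale keywords."""
    enc = tokenizer(text, return_tensors="pt")
    # Embed the tokens explicitly so gradients can be taken w.r.t. the embeddings.
    embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds,
                   attention_mask=enc["attention_mask"]).logits
    logits[0, label].backward()
    # Gradient x input, summed over the embedding dimension -> one score per token.
    scores = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    ranked = sorted(zip(tokens, scores.tolist()), key=lambda t: -t[1])
    keywords = [tok for tok, _ in ranked if tok not in tokenizer.all_special_tokens]
    return keywords[:k]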
AMPLIFY
STEP 4. Prompt Design for LLMs
"The key words: ’great’ and ’amazing’ are important clues to predict ‘Positive’ as the correct answer."
“RRR movie has a great story and amazing visuals.”
Positive
Negative
corrective input
Model
+ {test samples}
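A minimal sketch of assembling the prompt, assuming each few-shot example is laid out as input sentence, rationale built from its top-k keywords, and answer, followed by the test sample for the large LM to complete; the helper name and the extra test sentence in the usage example are illustrative.

def build_amplify_prompt(few_shot_with_rationales, test_text):
    """few_shot_with_rationales: list of (text, label_name, keywords) tuples."""
    parts = []
    for text, label_name, keywords in few_shot_with_rationales:
        quoted = " and ".join(f"'{w}'" for w in keywords)
        parts.append(text)
        parts.append(f"The key words: {quoted} are important clues to "
                     f"predict '{label_name}' as the correct answer.")
        parts.append(f"Answer: {label_name}")
        parts.append("")                     # blank line between examples
    parts.append(test_text)
    parts.append("Answer:")                  # the large LM completes this line
    return "\n".join(parts)

# Usage with the running example from the slides (the test sentence is made up):
prompt = build_amplify_prompt(
    [("RRR movie has a great story and amazing visuals.",
      "Positive", ["great", "amazing"])],
    "The plot was dull and the acting was worse.")
print(prompt)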
Experiment Setup
→ Tasks: Formal Fallacies, Salient Translation Error Detection
→ Datasets: CommonsenseQA, Coin Flip
Experiment Setup
* Post hoc explanation method: gradient-based attribution
Evaluation
Overall Task Performance
Evaluation
Impact of Proxy Model Selection on LLM Performance
*E: number of fine-tuning epochs for the proxy model
Evaluation
Impact of Selection Strategies on LLM Performance
Evaluation
Impact of Post Hoc Explanation Method on LLM Performance