Rethinking Why Intermediate-Task Fine-Tuning Works
Ting-Yun Chang
tingyun@usc.edu.tw
Chi-Jen Lu
cjlu@iis.sinica.edu.tw
Intermediate-Task Fine-Tuning
What kinds of intermediate tasks work well?
Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding:
When and Why Does It Work? (Pruksachatkun et al. 2020)
Figure (from Pruksachatkun et al. 2020): blue = helpful intermediate tasks; red = harmful ones
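For context, here is a minimal sketch of the two-stage pipeline these slides refer to: fine-tune a pretrained model on an intermediate task, then continue fine-tuning on the target task. It assumes Hugging Face transformers/datasets; the dataset identifiers intermediate_task and target_task, label counts, and hyperparameters are hypothetical placeholders, not the paper's configuration.

```python
# Minimal sketch of two-stage (intermediate -> target) fine-tuning.
# Dataset names, label counts, and hyperparameters are placeholders,
# not the exact configuration used in the paper.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

def fine_tune(model, dataset_name, output_dir):
    data = load_dataset(dataset_name)                  # hypothetical dataset id
    train = data["train"].map(tokenize, batched=True)
    args = TrainingArguments(output_dir=output_dir,
                             num_train_epochs=3,
                             per_device_train_batch_size=16,
                             learning_rate=1e-5)
    Trainer(model=model, args=args, train_dataset=train).train()
    model.save_pretrained(output_dir)

# Stage 1: fine-tune RoBERTa on the intermediate task.
inter_model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)
fine_tune(inter_model, "intermediate_task", "ckpt-intermediate")

# Stage 2: warm-start from the intermediate checkpoint; the classification
# head is re-initialized for the target task's label space.
target_model = AutoModelForSequenceClassification.from_pretrained(
    "ckpt-intermediate", num_labels=3, ignore_mismatched_sizes=True)
fine_tune(target_model, "target_task", "ckpt-target")
```

Multiple-choice intermediate tasks such as HellaSwag would use a multiple-choice head (e.g., AutoModelForMultipleChoice) instead of sequence classification, but the two-stage structure is the same.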
But HellaSwag is a synthetic dataset...
Previous work has found that RoBERTa tends to use artifacts in HellaSwag to make predictions
Ablating Common Sense: Two Simple Baselines (1)
HellaSwag-p
Ablating Common Sense: Two Simple Baselines (2)
SynthesisGPT2 (Syn_GPT2)
Example of generated text: "The topography of the city center was also changed by the construction of a seawall."
=> We do not introduce extra commonsense information
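To make the idea of a fully machine-generated intermediate dataset concrete, below is a hedged sketch that samples both a context and candidate endings from GPT-2. The prompt, sampling settings, and labeling scheme (label: 0) are illustrative assumptions for this sketch; the actual Syn_GPT2 construction follows the paper.

```python
# Illustrative sketch: building a synthetic "choose the continuation" example
# entirely from GPT-2 samples. Prompts, sampling settings, and the labeling
# scheme are assumptions for illustration, not the paper's exact recipe.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sample(prompt, max_new_tokens=30, num_return_sequences=1):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(**inputs,
                                 do_sample=True,
                                 top_p=0.95,
                                 max_new_tokens=max_new_tokens,
                                 num_return_sequences=num_return_sequences,
                                 pad_token_id=tokenizer.eos_token_id)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)

# A GPT-2-generated context (no human-written source text is used).
context = sample("The city council announced that", max_new_tokens=40)[0]

# Candidate endings are also GPT-2 samples; one is (arbitrarily, for this
# sketch) treated as the correct continuation and the rest as distractors.
endings = sample(context, max_new_tokens=25, num_return_sequences=4)
example = {"context": context, "endings": endings, "label": 0}
print(example)
```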
Tasks
Target tasks
We intentionally include target tasks that require specific knowledge
A Good Intermediate Task
36 hyperparameter trials for each intermediate-target task pair (each blue violin in the plot; see the sweep sketch below)
Generally, Syn_GPT2 & HellaSwag-p slightly improve the best performance on dev sets
Syn_GPT2 & HellaSwag-p can greatly improve the average performance on dev sets
Using only 2k intermediate training examples from HellaSwag-p already helps!
(Plot y-axis: improvement in accuracy over vanilla fine-tuning)
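To picture what is behind each violin, the sketch below runs a hypothetical sweep of 36 trials per intermediate-target pair. The grid (learning rates, batch sizes, seeds) and the run_transfer stub are assumptions standing in for the two-stage fine-tuning sketched earlier, not the paper's actual search space.

```python
# Hypothetical sketch of the per-pair hyperparameter sweep behind each violin.
# The grid, seeds, task names, and `run_transfer` are illustrative assumptions.
import itertools
import random

intermediate_tasks = ["HellaSwag-p", "Syn_GPT2"]
target_tasks = ["target_A", "target_B"]            # placeholder task names

learning_rates = [1e-5, 2e-5, 3e-5]
batch_sizes = [16, 32]
seeds = [0, 1, 2, 3, 4, 5]                         # 3 * 2 * 6 = 36 trials per pair

def run_transfer(intermediate, target, lr, bs, seed):
    """Stand-in for: fine-tune on `intermediate`, then on `target`, and
    return dev-set accuracy. Replaced here by a deterministic random number."""
    random.seed(hash((intermediate, target, lr, bs, seed)) % (2 ** 32))
    return random.uniform(0.6, 0.9)

results = {}
for inter, target in itertools.product(intermediate_tasks, target_tasks):
    accs = [run_transfer(inter, target, lr, bs, seed)
            for lr, bs, seed in itertools.product(learning_rates, batch_sizes, seeds)]
    # Each (intermediate, target) pair yields 36 dev accuracies -> one violin.
    results[(inter, target)] = {"best": max(accs), "mean": sum(accs) / len(accs)}

for pair, stats in results.items():
    print(pair, stats)
```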
Contribution