NLU Lab: Paper Reading (14 Feb 2024)
Today:
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
(McCoy et al., 2019; arXiv:1902.01007)
What can you learn from title + abstract?
Kinds of NLP Papers
Theory: “Prove transformers cannot learn to multiply arbitrary sequences in S5”
Analysis: “Models rely on shallow heuristics to solve NLI tasks”
Modeling: “Behold: the Transformer”
Empirical tests need benchmarks!
“Benchmark” = a fixed dataset + an agreed metric, so competing models are tested on common ground (sketch below)
Good:
“Modifying model X by doing Y improves performance on Z”
Bad:
“Sentiment classification models work better on English than on Icelandic”
Why is the good example good?
Why is the bad example bad?
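To make “common ground” concrete, here is a minimal Python sketch of what a benchmark amounts to: a frozen labeled test set plus an agreed metric, so any model exposing the same prediction interface can be compared directly. All names (EXAMPLES, accuracy, the baseline) are illustrative, not from the paper; the two pairs are HANS-style items in the spirit of McCoy et al.'s examples.

```python
# Minimal sketch of a benchmark: frozen labeled examples + an agreed metric.
# All names are illustrative; the pairs are HANS-style items in the spirit
# of McCoy et al.'s examples.
from typing import Callable, List, Tuple

EXAMPLES: List[Tuple[str, str, str]] = [
    # (premise, hypothesis, gold label)
    ("The doctor was paid by the actor.", "The doctor paid the actor.", "non-entailment"),
    ("The judges heard the actors resigned.", "The judges heard the actors.", "non-entailment"),
]

def words(s: str) -> set:
    """Crude tokenizer: lowercase, drop periods, split on whitespace."""
    return set(s.lower().replace(".", "").split())

def accuracy(predict: Callable[[str, str], str]) -> float:
    """Fraction of items where the model's label matches the gold label."""
    hits = sum(predict(p, h) == gold for p, h, gold in EXAMPLES)
    return hits / len(EXAMPLES)

def lexical_overlap_baseline(premise: str, hypothesis: str) -> str:
    """The shallow heuristic the paper diagnoses: if every hypothesis word
    also appears in the premise, guess entailment."""
    return "entailment" if words(hypothesis) <= words(premise) else "non-entailment"

# Any two models exposing the same (premise, hypothesis) -> label interface
# are now directly comparable on shared data with a shared metric:
print(accuracy(lexical_overlap_baseline))  # 0.0: the heuristic fails both items
```

The design point: the benchmark fixes both the data and the metric, and only the model varies, which is what makes claims of the “X improves performance on Z” form checkable.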
Sections & their purposes
From McCoy et al. (2019):
Where do they state their hypothesis?
Why are Sections 3 (dataset construction) and 4 (experimental setup) different sections?
What’s going on with Section 7 (augmenting the training data)?
What’s missing?
What is their hypothesis?
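One reading of their hypothesis, hedged here as our summary of the paper: models trained on MNLI adopt three fallible syntactic heuristics (lexical overlap, subsequence, constituent) rather than learning genuine inference. The sketch below enumerates the three, each with a HANS-style pair where the heuristic wrongly predicts entailment; the strings follow the paper's illustrative examples.

```python
# The three heuristics McCoy et al. diagnose, each paired with a HANS-style
# example where the heuristic's prediction ("entailment") is wrong.
# Strings follow the paper's illustrative examples.
HEURISTICS = {
    "lexical overlap": {
        "assumes": "a premise entails any hypothesis built from its words",
        "premise": "The doctor was paid by the actor.",
        "hypothesis": "The doctor paid the actor.",
        "gold": "non-entailment",
    },
    "subsequence": {
        "assumes": "a premise entails any contiguous subsequence of itself",
        "premise": "The doctor near the actor danced.",
        "hypothesis": "The actor danced.",
        "gold": "non-entailment",
    },
    "constituent": {
        "assumes": "a premise entails any complete subtree (constituent) of itself",
        "premise": "If the artist slept, the actor ran.",
        "hypothesis": "The artist slept.",
        "gold": "non-entailment",
    },
}

for name, case in HEURISTICS.items():
    print(f"{name}: heuristic says entailment; gold is {case['gold']}")
    print(f"  {case['premise']!r} -/-> {case['hypothesis']!r}")
```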
Quantitative & qualitative explanations
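The paper supports its hypothesis in both modes: quantitatively, with accuracies broken down per heuristic and per gold label (near-zero scores on the non-entailment cases), and qualitatively, by reading individual errors. Below is a minimal sketch of the two views; the records are hypothetical placeholders, not the paper's actual model outputs.

```python
# Two complementary ways to explain an evaluation result.
# Records are hypothetical placeholders, not the paper's actual outputs.
from collections import defaultdict

RESULTS = [
    # (heuristic, gold label, predicted label, premise, hypothesis)
    ("lexical overlap", "entailment", "entailment",
     "The actor paid the doctor.", "The actor paid the doctor."),
    ("lexical overlap", "non-entailment", "entailment",
     "The doctor was paid by the actor.", "The doctor paid the actor."),
    ("subsequence", "non-entailment", "entailment",
     "The doctor near the actor danced.", "The actor danced."),
]

# Quantitative: aggregate accuracy per (heuristic, gold label) cell, the
# kind of breakdown behind the paper's headline result.
cells = defaultdict(lambda: [0, 0])  # (heuristic, gold) -> [correct, total]
for heuristic, gold, pred, _, _ in RESULTS:
    cell = cells[(heuristic, gold)]
    cell[0] += int(pred == gold)
    cell[1] += 1
for (heuristic, gold), (correct, total) in sorted(cells.items()):
    print(f"{heuristic:16s} {gold:15s} acc = {correct}/{total}")

# Qualitative: read the individual errors to see *why* the model fails.
for heuristic, gold, pred, premise, hypothesis in RESULTS:
    if pred != gold:
        print(f"[{heuristic}] predicted {pred}, gold {gold}:")
        print(f"  P: {premise}\n  H: {hypothesis}")
```

The quantitative view tells you where the model fails; the qualitative view is what licenses an explanation of the failure in terms of heuristics rather than noise.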
Meta Questions