How to Present
Gabi Stanovsky
I’m a Naturally Bad Speaker
Presenting is Important
Present as Often as Possible
Disclaimer
Agenda
2. Know your audience
3. Know how much time you have
Agenda
Running Example
High level (NLP topics)
Mid level (takeaways)
Low level (numbers)
time
I work on combining discrete and end-to-end models in various real-world NLP tasks
I found that models are biased
Google Translate scores 52% on the WinoMT benchmark
I combined the two to create a better model
It does well on news, less so on sports; here’s an analysis
Finding adverse drug reactions at 83% F1
This approach works well on a real-world medical task
Start with the high-level topic
Natural Language Processing
Models which take human-written text as input and perform a task requiring some form of human intelligence
Slide #1
Grand Challenges in NLP
Automated assistants
“I got one of those terrible headaches from lack of sleep. Can you give me something for it?”
Automated assistants Moon
Machine translation
“the universal translator, invented in 2151, is used for deciphering unknown languages”
Machine translation Star Trek
Information retrieval Star Wars
Information retrieval
“What’s the second largest star in this galaxy?”
Slide #2
2. Sign Post
Outline: Research Questions
Do NLP models capture meaning?
ACL 2019 🎉 Nominated for Best Paper
MRQA 2019 🎉 Best Paper award
EMNLP 2018
Can we integrate meaning into NLP?
ACL 2015, EACL 2017, SemEval 2017,
NAACL 2017, SemEval 2019
How can we build parsers for meaning?
EMNLP 2016a, EMNLP 2016b,
ACL 2016a, ACL 2016b, ACL 2017,
NAACL 2018, EMNLP 2018a,
EMNLP 2018b,
CoNLL 2019 🎉 Honorable mention
Models miss crucial meaning aspects
Gender bias in machine translation
Data collection
QA is an intuitive annotation format
Model design
Robust performance across domains
Real-world application
Adverse drug reactions on social media
Slide #10
Outline: Research Questions
Can we integrate meaning into NLP?
ACL 2015, EACL 2017, SemEval 2017,
NAACL 2017, SemEval 2019
How can we build parsers for meaning?
EMNLP 2016a, EMNLP 2016b,
ACL 2016a, ACL 2016b, ACL 2017,
NAACL 2018, EMNLP 2018a,
EMNLP 2018b,
CoNLL 2019 Honorable mention
Do NLP models capture meaning?
ACL 2019 🎉 Nominated for Best Paper
MRQA 2019 🎉 Best Paper award
EMNLP 2018
Models miss crucial meaning aspects
Gender bias in machine translation
Slide #11
Research Questions
Can we integrate meaning into NLP?
ACL 2015, EACL 2017, SemEval 2017,
NAACL 2017, SemEval 2019
How can we build parsers for meaning?
EMNLP 2016a, EMNLP 2016b,
ACL 2016a, ACL 2016b, ACL 2017,
NAACL 2018, EMNLP 2018a,
EMNLP 2018b,
CoNLL 2019 🎉 Honorable mention
Weaknesses in state of the art
ACL 2019 🎉 Nominated for Best Paper
MRQA 2019 🎉 Best Paper award
EMNLP 2018
Data collection
QA is an intuitive annotation format
Model design
Robust performance across domains
Real-world application
Adverse drug reactions on social media
Slide #70
Research Questions
Building meaning representations
EMNLP 2016a, EMNLP 2016b,
ACL 2016a, ACL 2016b, ACL 2017,
NAACL 2018, EMNLP 2018a,
EMNLP 2018b,
CoNLL 2019 🎉 Honorable mention
Weaknesses in state of the art
ACL 2019 🎉 Nominated for Best Paper
MRQA 2019 🎉 Best Paper award
EMNLP 2018
Can we integrate meaning into NLP?
ACL 2015, EACL 2017, SemEval 2017,
NAACL 2017, SemEval 2019
Real-world application
Adverse drug reactions on social media
Slide #103
3. If it’s important – put it on the screen
Grand Challenges in Natural Language Processing (NLP)
NLP models need to capture the meaning behind our words and interact accordingly
Slide #3
4. Keep a running example
Methodology: Automatic evaluation of gender accuracy
The doctor asked the nurse to help her in the procedure.
Input: MT model + target language
Output: Gender accuracy
Slide #55
Methodology: Automatic evaluation of gender accuracy
The doctor asked the nurse to help her in the procedure.
La doctora le pidió a la enfermera que le ayudara con el procedimiento.
Input: MT model + target language
Output: Gender accuracy
Slide #56
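A minimal sketch of the final scoring step behind "Output: Gender accuracy", assuming the gender of the translated entity has already been extracted from each translation (the slides do not show how, so that step is left out). The function name `gender_accuracy` and the toy labels are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def gender_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the percentage of sentences whose translated entity keeps the gold gender."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one sentence to score")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * matches / len(gold)


# Hypothetical toy labels. In the running example the gold gender of "doctor"
# is female ("help her"), and "La doctora" is feminine, so it counts as a match.
gold = ["female", "male", "female", "male"]
predicted = ["female", "male", "male", "male"]

acc = gender_accuracy(gold, predicted)
print(f"Gender accuracy: {acc:.1f}%")
```

Reporting the score as a percentage matches the "Acc (%)" axis used on the results slides.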
Results
Google Translate
Acc (%)
Human performance
random
The doctor asked the nurse to help him in the procedure.
Slide #60
Results
Google Translate
Acc (%)
The doctor asked the nurse to help her in the procedure.
Human performance
random
Slide #61
Results
Acc (%)
Google Translate
Gender bias
Human performance
random
Slide #62
5. Form a visual language
Background: How should we represent text?
Explicitly! We should define a formal representation of meaning
Explicit Representations
Background: How should we represent text?
Implicitly! Models should learn a useful latent representation for an end task
Implicit Representations
[1] Peters et al., 2018 [2] Devlin et al., 2019
Natural Language Processing in 2019
Implicit representation
Explicit representation
Results and Analysis
implicit
explicit
6. Present figures and tables thoughtfully
Automatic Evaluation: Single Token Level
Fetaya 2020
Automatic Evaluation: Single Token Level
zero-shot
Fetaya 2020
Automatic Evaluation: Single Token Level
zero-shot
Fetaya 2020
from scratch
Automatic Evaluation: Single Token Level
zero-shot
Fetaya 2020
from scratch
multilingual
Automatic Evaluation: Single Token Level
Our models achieve SotA results by combining multilingual pre-training + Akkadian fine-tuning
Automatic Evaluation: Single Token Level
MBERT zero-shot outperforms training from scratch without training on Akkadian text!
Results
Google Translate
Acc (%)
Human performance
random
The doctor asked the nurse to help him in the procedure.
Results
Google Translate
Acc (%)
The doctor asked the nurse to help her in the procedure.
Human performance
random
Results
Acc (%)
Google Translate
Gender bias
Human performance
random
Evaluation - Open IE
QA data
High confidence threshold → accurate propositions, relatively few of them
Low confidence threshold → more propositions, relatively less accurate
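The threshold tradeoff above can be sketched on toy data. This is an illustration only: the confidence scores, correctness labels, gold count, and the name `precision_recall_at` are all hypothetical and are not taken from the Open IE evaluation on the slide.

```python
from typing import List, Tuple


def precision_recall_at(
    threshold: float,
    scored: List[Tuple[float, bool]],
    num_gold: int,
) -> Tuple[float, float]:
    """Precision and recall when keeping only propositions with confidence >= threshold.

    scored holds (confidence, is_correct) pairs for extracted propositions;
    num_gold is the number of gold propositions.
    """
    kept = [correct for conf, correct in scored if conf >= threshold]
    if not kept:
        # Convention assumed here: with no predictions, precision is 1 and recall is 0.
        return 1.0, 0.0
    true_pos = sum(kept)
    return true_pos / len(kept), true_pos / num_gold


# Hypothetical extractions: (confidence, is_correct)
scored = [
    (0.95, True), (0.90, True), (0.70, False),
    (0.60, True), (0.30, False), (0.20, True),
]
NUM_GOLD = 5  # hypothetical number of gold propositions

high = precision_recall_at(0.8, scored, NUM_GOLD)  # few, accurate propositions
low = precision_recall_at(0.1, scored, NUM_GOLD)   # more propositions, less accurate

print("high threshold (precision, recall):", high)
print("low threshold  (precision, recall):", low)
```

Sweeping the threshold over all observed confidence values and plotting the resulting pairs gives a precision-recall curve of the kind these slides show.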
Evaluation - Open IE
QA data
Our approach presents a favorable precision-recall tradeoff on our data
Evaluation - Open IE
QA data
Other datasets
We generalize well to datasets unseen during training
Evaluation - Open IE
4 points over state of the art
QA data
Other datasets
Our method
7. You don’t have to present everything you did
8. Keep your slides simple
9. Be Consistent
10. Don’t have a blank “thank you” slide
Conclusion
Thanks for listening!
Come to the Gender Bias Workshop! (Friday)
¡Gracias por su atención!
Merci pour l'écoute!
Grazie per aver ascoltato!
Спасибо за внимание!
Спасибі за слухання!
!תודה על ההקשבה
!شكرا على الإنصات
Danke fürs Zuhören!
Agenda
2. Practice your presentation often
3. Interact with the audience if appropriate
4. Count to 10 when interacting with the audience
5. Let people finish their questions
6. Don’t take questions as personal attacks
Conclusions
Preliminaries
Design
Presentation
Watch this excellent presentation for more tips
Thank you!