Transfer Learning and tools for Conversational Agents
Thomas Wolf - HuggingFace Inc.
Hugging Face: Democratizing NLP
Transfer Learning in NLP: Concepts, Tools & Trends - Thomas Wolf - Slide 3
Transfer Learning for Language Generation
A dialog generation task:
More complex adaptation:
Transfer Learning for Language Generation
The Conversational Intelligence Challenge 2 ("ConvAI2"), a NIPS 2018 competition
Final Automatic Evaluation Leaderboard (hidden test set)
Democratizing NLP – sharing knowledge, code, data
Libraries
Transformers library
We’ve built an opinionated framework providing state-of-the-art general-purpose tools for Natural Language Understanding and Generation.
Features:
Transformers library: code example
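The code example can be sketched as follows: a minimal inference snippet using the library's public pipeline API (the checkpoint name is one of the sentiment models hosted on the hub, chosen here for illustration):

```python
from transformers import pipeline

# Download a pretrained sentiment-analysis checkpoint from the model hub
# and run inference in a couple of lines.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

result = classifier("Transfer learning makes state-of-the-art NLP accessible!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': ...}]
```

The same pattern (one function call, one pretrained checkpoint) covers classification, question answering, generation, and more.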
💥 Check it out at 💥 https://github.com/huggingface/transformers
Transformers: model hub
💥 Check it out at 💥 huggingface.co
Tokenizers library
Now that neural nets have fast implementations, a bottleneck in Deep-Learning based NLP pipelines is often tokenization: converting strings ➡️ model inputs.
We have just released 🤗Tokenizers: ultra-fast & versatile tokenization
Features:
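To make the string-to-model-inputs step concrete, here is a toy, pure-Python sketch of the BPE (byte-pair encoding) training loop that subword tokenizers are built on. This is only an illustration of the algorithm; the real 🤗Tokenizers library implements it in Rust with many optimizations:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    # Start from characters; repeatedly merge the most frequent adjacent pair.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

merges = learn_bpe(["low", "low", "lower", "newest", "newest", "newest"], 5)
print(merges[0])  # ('w', 'e'): the most frequent adjacent pair is merged first
```

Frequent subwords like "est" or "low" end up as single tokens, which is how such tokenizers keep vocabularies small while avoiding out-of-vocabulary words.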
Datasets library
The full data-processing pipeline goes beyond tokenization and models to include data access and preprocessing at the beginning and model evaluation at the end.
We have recently released a new library 🤗Datasets to improve the situation on both ends of the pipeline.
[Pipeline diagram] Data → 🤗Datasets · Tokenization → 🤗Tokenizers · Prediction → 🤗Transformers · Metrics → 🤗Datasets
Datasets is a lightweight and extensible library to easily access and process datasets and evaluation metrics for Natural Language Processing (NLP).
Features:
Datasets: code example
💥 Check it out at 💥 https://github.com/huggingface/datasets
Datasets: datasets hub
💥 Check it out at 💥 huggingface.co
Tools for generation
Decoding methods for language generation with Transformers
Since February 2020 (v2.4.0), the Transformers library includes a generate() method that works with any model that has output embeddings and supports many decoding methods.
Decoding methods for language generation� with Transformers
Beam search reduces the risk of missing hidden high-probability word sequences by keeping the num_beams most likely hypotheses at each time step and eventually choosing the hypothesis with the overall highest probability.
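A toy illustration of beam search over a hand-written bigram distribution (a sketch of the algorithm, not the library's implementation):

```python
import math

# Hand-written toy next-token probabilities (bigram transitions).
PROBS = {
    "<s>": {"the": 0.40, "a": 0.35, "dog": 0.25},
    "the": {"dog": 0.55, "cat": 0.45},
    "a":   {"cat": 0.75, "dog": 0.25},
    "cat": {"purrs": 0.90, "runs": 0.10},
    "dog": {"barks": 0.90, "runs": 0.10},
}

def beam_search(start, steps, num_beams):
    beams = [(0.0, [start])]  # (cumulative log-probability, token sequence)
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            for tok, p in PROBS.get(seq[-1], {}).items():
                candidates.append((logp + math.log(p), seq + [tok]))
        # keep only the num_beams most likely hypotheses
        beams = sorted(candidates, reverse=True)[:num_beams]
    return beams[0]

logp, seq = beam_search("<s>", steps=3, num_beams=2)
print(seq)  # ['<s>', 'a', 'cat', 'purrs']
```

Greedy decoding would pick "the" (0.40) first and end with "the dog barks" (overall probability 0.198), while the beam recovers "a cat purrs" (0.236): the higher-probability sequence hidden behind a locally weaker first token.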
In open-ended generation, beam search might not be the best option:
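The usual alternative is sampling, with top-k and top-p (nucleus) filtering to cut off the unlikely tail of the distribution. A pure-Python sketch (the function name filter_top_k_top_p is hypothetical; the library exposes these as top_k/top_p arguments to generate()):

```python
import random

def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    # probs: token -> probability; keep the head of the distribution, renormalize.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[:top_k]            # top-k: keep the k most likely tokens
    if top_p < 1.0:                      # top-p: smallest set with mass >= p
        kept, total = [], 0.0
        for tok, p in items:
            kept.append((tok, p))
            total += p
            if total >= top_p:
                break
        items = kept
    z = sum(p for _, p in items)
    return {tok: p / z for tok, p in items}

random.seed(0)
next_probs = {"the": 0.50, "a": 0.30, "dog": 0.15, "zebra": 0.05}
filtered = filter_top_k_top_p(next_probs, top_p=0.9)  # drops the unlikely tail
token = random.choices(list(filtered), weights=list(filtered.values()))[0]
print(sorted(filtered), token)
```

Unlike beam search, this keeps generation diverse while still excluding very low-probability tokens that produce incoherent text.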
For more information, see the "How to generate" post on the Hugging Face blog: https://huggingface.co/blog/how-to-generate
Thanks for listening!
Concepts
What is Transfer Learning?
Adapted from NAACL 2019 Tutorial: https://tinyurl.com/NAACLTransfer
Sequential Transfer Learning
Learn on one task/dataset, transfer to another task/dataset
[Diagram] Pretraining (the computationally intensive step) produces a general-purpose model (word2vec, GloVe, skip-thought, InferSent, ELMo, ULMFiT, GPT, BERT, DistilBERT), which is then adapted to downstream tasks: text classification, word labeling, question answering, ...
Training: The rise of language modeling pretraining
Many currently successful pretraining approaches are based on language modeling: learning to predict Pϴ(text) or Pϴ(text | other text)
Advantages:
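The objective Pϴ(text) can be made concrete with a toy bigram model scored via the chain rule (a hand-rolled counting sketch, not how large neural models are trained):

```python
import math
from collections import Counter

# Estimate P(w_i | w_{i-1}) by counting on a tiny corpus, then score a
# sentence with the chain rule: P(text) = prod_i P(w_i | w_{i-1}).
corpus = ["the cat sat", "the dog sat", "the cat ran"]
bigrams, unigrams = Counter(), Counter()
for line in corpus:
    toks = ["<s>"] + line.split()
    unigrams.update(toks[:-1])
    bigrams.update(zip(toks, toks[1:]))

def log_prob(sentence):
    toks = ["<s>"] + sentence.split()
    return sum(math.log(bigrams[(a, b)] / unigrams[a])
               for a, b in zip(toks, toks[1:]))

print(log_prob("the cat sat"))  # log(1.0 * 2/3 * 1/2) = log(1/3)
```

Neural language models replace the count table with a parameterized network, but the training signal is the same: raw text supplies its own labels, so no annotation is needed.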
Pretraining Transformers models (BERT, GPT…)
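As a sketch of the BERT-style masked-language-modeling objective (the 15% masking rate follows the BERT paper; the 80/10/10 token-replacement details are omitted for brevity, and GPT instead uses plain next-token prediction):

```python
import random

random.seed(0)

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    # BERT-style denoising: hide ~15% of the input tokens; the model is
    # trained to recover the original token at each masked position.
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(tok)      # loss is computed at this position
        else:
            inputs.append(tok)
            labels.append(None)     # position ignored by the loss
    return inputs, labels

inputs, labels = mask_tokens("jim henson was a puppeteer".split())
print(inputs)
```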
Sequential Transfer Learning (continued): adaptation is the data-efficient step, turning the general-purpose pretrained model into a task-specific, high-performance model.
Model: Adapting for target task
General workflow:
Sometimes very complex: adapting to a structurally different task.
Ex: pretraining with a single input sequence, then adapting to a task with several input sequences (e.g. translation, conditional generation...)
➯ Use the pretrained model to initialize as much as possible of the target model
➯ Ramachandran et al., EMNLP 2017; Lample & Conneau, 2019
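The "initialize as much as possible" idea can be sketched with plain dictionaries standing in for model state dicts (all parameter names here are hypothetical):

```python
import random

random.seed(0)
# Toy "state dicts": parameter name -> weight (scalars stand in for tensors).
pretrained = {"embeddings": 0.1, "layer_0": 0.2, "layer_1": 0.3, "lm_head": 0.4}
# The target model adds modules the pretrained model never had
# (e.g. cross-attention for conditional generation).
target = {name: random.random() for name in
          ["embeddings", "layer_0", "layer_1", "cross_attention", "decoder_head"]}

loaded, randomly_initialized = [], []
for name in target:
    if name in pretrained:
        target[name] = pretrained[name]       # reuse the pretrained weight
        loaded.append(name)
    else:
        randomly_initialized.append(name)     # train this part from scratch

print(loaded, randomly_initialized)
```

Only the structurally new parts start from random initialization, so most of the pretrained knowledge carries over.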
Downstream Tasks and Model Adaptation: Quick Examples
A – Transfer Learning for text classification
[Diagram] The adaptation pipeline for classification: the tokenizer splits "Jim Henson was a puppeteer" into subword tokens (Jim | Henson | was | a | puppet | ##eer) and converts them to vocabulary indices (11067 | 5567 | 245 | 120 | 7756 | 9908); the pretrained model maps these to hidden-state vectors; a classifier adaptation head on top produces the task output (True: 0.7886, False: -0.223).
A – Transfer Learning for text classification
Remarks:
Trends and limits of Transfer Learning in NLP
Model size and Computational efficiency
Model sizes keep climbing: GPT-3 reaches 175B parameters, GShard 600B.
Why is this a problem?
“Energy and Policy Considerations for Deep Learning in NLP” - Strubell, Ganesh, McCallum - ACL 2019
Reducing the size of a pretrained model
Three main techniques currently investigated:
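One such technique, used to produce the DistilBERT model mentioned earlier, is knowledge distillation: train a small student to match a large teacher's softened output distribution. A pure-Python toy sketch of the soft-target loss, with a temperature T as in Hinton et al.:

```python
import math

def softmax(logits, T=1.0):
    # Temperature T > 1 softens the distribution, exposing the teacher's
    # "dark knowledge" about the relative probabilities of wrong classes.
    exps = [math.exp(l / T) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
print(matched < mismatched)  # matching the teacher lowers the loss
```

In practice this soft loss is combined with the ordinary hard-label training loss, and the student has fewer layers than the teacher.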
The generalization problem: current pretrained models remain brittle (small adversarial perturbations of the input break them) and spurious (they exploit annotation artifacts rather than generalizing consistently).
Robin Jia and Percy Liang, "Adversarial Examples for Evaluating Reading Comprehension Systems," arXiv:1707.07328 [cs], 2017, http://arxiv.org/abs/1707.07328.
R. Thomas McCoy, Junghyun Min, and Tal Linzen, "BERTs of a Feather Do Not Generalize Together: Large Variability in Generalization across Models with Similar Test Set Performance," arXiv:1911.02969 [cs], 2019, http://arxiv.org/abs/1911.02969.
Shortcomings of language modeling in general
Need for grounded representations
Current transfer learning performs adaptation only once.
Main challenge: catastrophic forgetting.
Different approaches from the literature:
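One family of approaches from this literature regularizes fine-tuning so the weights stay close to the pretrained solution, as in EWC-style quadratic penalties. A toy sketch with an identity Fisher term for simplicity (the function name is hypothetical):

```python
def penalized_loss(task_loss, weights, pretrained_weights, lam=0.1):
    # Quadratic penalty pulling fine-tuned weights back toward the pretrained
    # solution, so adapting to a new task erases less of the old knowledge.
    penalty = sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained_weights))
    return task_loss + lam * penalty

print(penalized_loss(1.0, [0.5, 0.5], [0.0, 0.0]))  # 1.0 + 0.1 * 0.5 = 1.05
```

Other options include freezing lower layers, using lower learning rates for pretrained parameters, or replaying data from the original task.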