Self-Introduction and Research Topics
D3 - Jorge Balazs
2018-11-30
Contents
2
Refining Raw Sentence Representations for Textual Entailment Recognition
via Attention
IIIDYT at IEST 2018: Implicit Emotion Classification With Deep
Contextualized Word Representations
About me
3
About me
4
My name is Jorge [ˈxoɾxe]
But you can call me George, ホルヘ, or じょうじ
About me
5
I come from Chile!
About me
6
I graduated as an Industrial Engineer from the University of Chile (4-year Bachelor's + 2-year specialization)
About me - Research Interests
8
“There are any number of questions that might lead one to undertake a study of language. Personally, I am primarily intrigued by the possibility of learning something, from the study of language, that will bring to light inherent properties of the human mind.”
Noam Chomsky, “Language and Mind”
Contents
9
Refining Raw Sentence Representations for Textual Entailment Recognition
via Attention
IIIDYT at IEST 2018: Implicit Emotion Classification With Deep
Contextualized Word Representations
RepEval Shared Task
Refining Raw Sentence Representations for Textual Entailment Recognition via Attention
10
RepEval - Task Description
11
Task:
Classify pairs of sentences into one of three categories: Entailment, Contradiction, or Neutral.
Dataset:
The Multi-Genre Natural Language Inference (MultiNLI) corpus: premise-hypothesis pairs with gold labels.
Examples:
Premise: At the other end of Pennsylvania Avenue, people began to line up for a White House tour.
Hypothesis: People formed a line at the end of Pennsylvania Avenue.
Label: entailment
Premise: This site includes a list of all award winners and a searchable database of Government Executive articles.
Hypothesis: The Government Executive articles housed on the website are not able to be searched.
Label: contradiction
Premise: The new rights are nice enough
Hypothesis: Everyone really likes the newest benefits
Label: neutral
RepEval - Approach 1
12
[Diagram: the four BiMPM matching strategies: Full, Maxpooling, Attentive, and Max-Attentive matching]
1 Wang et al., 2017, Bilateral Multi-Perspective Matching for Natural Language Sentences
RepEval - Approach 1 Results
13
Method | Accuracy |
CBOW Baseline | 64.7 |
ESIM (Tree-based encoder) | 72.2 |
Our implementation of BiMPM | 73.4 |
1 Wang et al., 2017, Bilateral Multi-Perspective Matching for Natural Language Sentences
RepEval
14
1 Wang et al., 2017, Bilateral Multi-Perspective Matching for Natural Language Sentences
First attempt: IBM's BiMPM model¹ (details)
Problem: this kind of model was not allowed in the RepEval competition.
The purpose of the competition was to create good models for encoding single sentences into vectors; BiMPM instead matches the two sentences against each other, so it does not produce standalone sentence representations.
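For context, a minimal PyTorch-style sketch (the framework is an assumption) of how sentence-encoder models in this setting typically turn two independently encoded sentence vectors into classifier features; the [u; v; |u - v|; u * v] combination follows Conneau et al. (2017) and is assumed here rather than taken from this deck.

```python
import torch

def match_features(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Combine independently encoded premise (u) and hypothesis (v) vectors into
    a single feature vector for a 3-way entailment classifier.
    Feature set follows Conneau et al. (2017); assumed, not confirmed by the deck."""
    return torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)

# Example: a batch of 2 sentence pairs with 512-dimensional sentence vectors.
u = torch.randn(2, 512)
v = torch.randn(2, 512)
features = match_features(u, v)   # shape: (2, 2048)
```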
RepEval - Approach 2
15-19
Second attempt: character-aware Inner-attention mechanism² (details)
[Architecture diagram, built up layer by layer: Word Encoder (per-word character EMB + LSTM) → Context Layer (BiLSTM) → Sentence Encoder (Inner Attention + Pooling) → Feature Extractor → Classifier (Linear → Softmax & Argmax → Label)]
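A minimal PyTorch-style sketch of an inner-attention pooling layer of the kind named in the Sentence Encoder above; the single-head tanh scoring and the dimensions are assumptions, not the exact configuration that was submitted.

```python
import torch
import torch.nn as nn

class InnerAttention(nn.Module):
    """Self-attentive pooling over BiLSTM states: score each time step, then
    return the attention-weighted sum as the sentence vector."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, h: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim); mask: (batch, seq_len), 1 for real tokens
        scores = self.score(torch.tanh(self.proj(h))).squeeze(-1)   # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1).unsqueeze(-1)         # (batch, seq_len, 1)
        return (alpha * h).sum(dim=1)                               # (batch, hidden_dim)

# Example: 2 sentences, 7 tokens each, 600-dimensional BiLSTM outputs.
h = torch.randn(2, 7, 600)
mask = torch.ones(2, 7)
sentence_vector = InnerAttention(600)(h, mask)   # shape: (2, 600)
```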
RepEval - Approach 2 Results
20
Inner-attention mechanism² (details)
Method | Accuracy |
CBOW Baseline | 64.7 |
ESIM (Tree-based encoder) | 72.2 |
Our implementation of BiMPM | 73.4 |
Our implementation of the Inner-Attention model | 72.3 |
RepEval - Approach 2 Results
21
Table: RepEval results on the test set for each team (Nangia et al., 2017)
Contents
22
Refining Raw Sentence Representations for Textual Entailment Recognition
via Attention
IIIDYT at IEST 2018: Implicit Emotion Classification With Deep
Contextualized Word Representations
Implicit Emotion Shared Task
Implicit Emotion Classification with Deep Contextualized Representations
23
Implicit Emotion ST - Task Description
24
Task:
Given a tweet from which a word has been removed, predict the emotion of that word. Classes: sad, fear, disgust, surprise, anger, and joy.
Dataset:
Tweets with a hidden word, and a label indicating the emotion of the removed word
Examples:
It's [#TRIGGERWORD#] when you feel like you are invisible to others. → sad
My step mom got so [#TRIGGERWORD#] when she came home from work and saw that the boys didn't come to Austin with me. → sad
We are so [#TRIGGERWORD#] that people must think we are on good drugs or just really good actors. → joy
Implicit Emotion ST - Proposed Architecture
25
[Architecture diagram: Word Encoder (ELMo Layer over the tweet tokens: "It's", "[#TRIGGERWORD#]", "when", ...) → Context Layer (BiLSTM) → Sentence Encoder (Max Pooling) → Classifier (Linear → Softmax & Argmax → Label)]
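A minimal PyTorch-style sketch of the classifier shape above (contextualized embeddings, BiLSTM context layer, max pooling, linear classifier); the ELMo layer is treated as an upstream component that produces `elmo_embeddings`, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """ELMo-style embeddings -> BiLSTM -> max pooling over time -> linear layer.
    Dimensions are illustrative assumptions, not the submitted configuration."""
    def __init__(self, elmo_dim: int = 1024, hidden: int = 512, n_classes: int = 6):
        super().__init__()
        self.bilstm = nn.LSTM(elmo_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, elmo_embeddings: torch.Tensor) -> torch.Tensor:
        # elmo_embeddings: (batch, seq_len, elmo_dim), produced by an ELMo layer upstream
        h, _ = self.bilstm(elmo_embeddings)   # (batch, seq_len, 2 * hidden)
        sent = h.max(dim=1).values            # max pooling over the time dimension
        return self.out(sent)                 # logits; argmax gives the predicted emotion

logits = EmotionClassifier()(torch.randn(4, 20, 1024))
predictions = logits.argmax(dim=-1)   # predicted emotion indices for 4 tweets
```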
Implicit Emotion ST - Results
26
Implicit Emotion ST - Info Sources
27
Implicit Emotion ST - Methods
28
Implicit Emotion ST - Tools
29
Implicit Emotion ST - Ablation Study
30
Implicit Emotion ST - Dropout Ablation Study
31
Implicit Emotion ST - Confusion Matrix
32
Implicit Emotion ST - Annotation Artifact
33
The separate joy cluster corresponds to the sentences containing the “un[#TRIGGERWORD#]” pattern.
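For reference, a minimal sketch of how such sentences could be flagged, assuming the tweets are available as plain strings (the helper name is hypothetical):

```python
def has_un_trigger(tweet: str) -> bool:
    """True if the tweet contains the negated-trigger pattern, e.g. 'un[#TRIGGERWORD#]'."""
    return "un[#triggerword#]" in tweet.lower()
```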
Implicit Emotion ST - Emoji
34
Different emoji affect classification performance in different ways
😷💕😍❤️😡😢😭😒😩😂😅😕
Contents
35
Refining Raw Sentence Representations for Textual Entailment Recognition
via Attention
IIIDYT at IEST 2018: Implicit Emotion Classification With Deep
Contextualized Word Representations
Gating Mechanisms for Combining Character and Word-level Word Representations
36
Problem
37
Incorporating subword information (characters, morphemes, byte-pair encodings, etc.) has been shown to yield better word representations; however, it remains unclear how subword and word-level representations are best combined.
Research Question
38
Are there any fundamental principles underlying the way in which we combine representations in NLP?
A first step towards answering the question above would be to answer:
What is the best way in which we can combine character and word-level representations?
Research Question
39
What is the best way in which we can combine character and word-level representations?
[Diagram: candidate ways of combining the word and character vectors: Concat? Sum? Multiply? Scalar-weighted sum?]
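Written out, the candidate combinations from the diagram above, with w the word-level vector, c the character-level vector, and λ a scalar weight (how λ is obtained is an assumption):

```latex
\begin{align*}
\text{Concat:} \quad & \mathbf{x} = [\mathbf{w};\, \mathbf{c}] \\
\text{Sum:} \quad & \mathbf{x} = \mathbf{w} + \mathbf{c} \\
\text{Multiply:} \quad & \mathbf{x} = \mathbf{w} \odot \mathbf{c} \\
\text{Scalar-weighted sum:} \quad & \mathbf{x} = \lambda\, \mathbf{c} + (1 - \lambda)\, \mathbf{w}
\end{align*}
```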
Approach
40
Architecture
41
[Architecture diagram: Word Encoder (combination method "?") → Context Layer (BiLSTM) → Sentence Encoder (Max Pooling) → Feature Extractor → Classifier (Linear → Softmax & Argmax → Label)]
Architecture based on Conneau et al., 2017
Conneau et al., 2017, Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Word Encoder Variations
42
What is the best way of combining word- and character-level vectors?
Word-only: EMB
Char-only: LSTM over characters
Concat: EMB + LSTM
Scalar Gate: EMB + LSTM, Linear (d x 1), g x c + (1 - g) x w
Vector Gate: EMB + LSTM, Linear (d x d), g c + (1 - g) w
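A minimal PyTorch-style sketch of the scalar and vector gates listed above; computing the gate from the word-level embedding matches the Linear (d x 1) / (d x d) boxes, but the exact wiring is otherwise an assumption.

```python
import torch
import torch.nn as nn

class ScalarGate(nn.Module):
    """g = sigmoid(Linear_{d x 1}(w)); x = g * c + (1 - g) * w (one gate value per word)."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)

    def forward(self, w: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(w))   # (batch, seq_len, 1), broadcast over dimensions
        return g * c + (1 - g) * w

class VectorGate(nn.Module):
    """g = sigmoid(Linear_{d x d}(w)); x = g * c + (1 - g) * w (one gate value per dimension)."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, dim)

    def forward(self, w: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(w))   # (batch, seq_len, dim), elementwise gate
        return g * c + (1 - g) * w

# w: word embeddings; c: character-level word representations of the same shape.
w = torch.randn(2, 10, 300)
c = torch.randn(2, 10, 300)
x_scalar = ScalarGate(300)(w, c)
x_vector = VectorGate(300)(w, c)
```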
Results
43
Analysis
44
Overview of the Reviews
45
Attempted to submit a previous version of this research as a short paper to EMNLP 2018, but it was rejected because:
Vector gate representations
46-48
Test accuracy: 84.4
[Word Encoder: Vector Gate (EMB + LSTM, Linear (d x d), g c + (1 - g) w)]
Character only representations
49
For reference only; test accuracy: 79.4
[Word Encoder: Char-only (LSTM over characters)]
Concat representations
50-51
Test accuracy: 84.6
[Word Encoder: Concat (EMB + LSTM)]
Concat representations (randomly initialized word embeddings)
52
Test accuracy: 81.6
[Word Encoder: Concat (EMB + LSTM)]
Concat representations (randomly initialized word embeddings + norm.)
53
Test accuracy: 79.3
[Word Encoder: Concat (EMB + LSTM)]
Concat representations (GloVe pre-trained word embeddings + norm.)
54
Test accuracy: 84.0
[Word Encoder: Concat (EMB + LSTM)]
Concat representations (MultiNLI)
55
[Word Encoder: Concat (EMB + LSTM)]
Discussion
56
Characters don’t seem to help in the SNLI task
Possible causes:
However, they do help in MultiNLI
Discussion
57
The distributions of character-level and word-level representations seem to differ greatly
Normalizing the word- and character-level representations worsens classification results
[Figure: 2D PCA projections of word representations learned by different architectures (panels: cat, normalized cat, vector gate, normalized vector gate). ‘words’ corresponds to GloVe word embeddings, and ‘chars’ to word representations derived from character-level vectors.]
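A minimal sketch of how such projections can be produced, assuming scikit-learn and matplotlib for the analysis (the tooling is an assumption):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_pca(word_vecs: np.ndarray, char_vecs: np.ndarray, title: str) -> None:
    """Project word-level and character-level representations of the same words
    into 2D with a single PCA fit, and scatter them with separate labels."""
    pca = PCA(n_components=2)
    proj = pca.fit_transform(np.vstack([word_vecs, char_vecs]))
    n = len(word_vecs)
    plt.scatter(proj[:n, 0], proj[:n, 1], s=5, label="words")
    plt.scatter(proj[n:, 0], proj[n:, 1], s=5, label="chars")
    plt.title(title)
    plt.legend()
    plt.show()

# Stand-in random vectors; real inputs would be GloVe embeddings and the
# character-derived word representations from a trained encoder.
plot_pca(np.random.randn(500, 300), np.random.randn(500, 300), "cat")
```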
Ongoing research
58
We could apply ideas from domain adaptation to our problem.
We want to adapt the character-level domain to the word-level domain.
Ganin et al. (2016) state that:
“for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains.”
Ganin et al., 2016, Domain-Adversarial Training of Neural Networks
[Same figure as above: 2D PCA projections of word representations (panels: cat, normalized cat, vector gate, normalized vector gate)]
Ongoing research
59
Ganin et al., 2016, Domain-Adversarial Training of Neural Networks
Ongoing research
60
Ganin et al., 2016, Domain-Adversarial Training of Neural Networks
Ongoing research
61
[Architecture diagram: GloVe vector and character-level representation combined via Concat, Scalar Gate, or Vector Gate → BiLSTM → Max Pooling → Classifier (Linear → Softmax & Argmax → Label); a Gradient Reversal Layer feeds a Discriminator predicting the label "char or word"]
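A minimal PyTorch-style sketch of the Gradient Reversal Layer from Ganin et al. (2016) as it would sit in front of the char-or-word discriminator; the scaling constant and its schedule are assumptions.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambd in the
    backward pass, so the encoder learns features that fool the discriminator."""
    @staticmethod
    def forward(ctx, x, lambd: float):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lambd)

# Usage sketch: reversed features feed a discriminator that predicts whether a
# representation came from the character-level or the word-level encoder.
features = torch.randn(8, 300, requires_grad=True)
reversed_features = grad_reverse(features, lambd=0.1)
```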
Ongoing research
62
Concat non-adapted: 0.8507 vs. Concat adapted: 0.853
Latest Findings
63
Domain-adversarial training forces the model to use character-level word representations.
Preliminary results show that adversarial training can improve results in the NLI downstream task for some models.
Domain-adversarial training is more difficult to achieve when there are gating mechanisms present.
Future Research Directions
64
Goal: submit results to NAACL (deadline: December 10, 2018)
Thank you
65