Machine Translation in the Real World
Hassan Sajjad
About me
Research Experience
NLP Areas
Statistical machine translation
Neural machine translation
Neural language models
Domain adaptation
Multitask learning
Word alignment
Query expansion
Corpus generation
Transliteration mining
Part-of-speech tagging
Machine translation evaluation
Comparable corpora extraction
Interpretation of deep models
Techniques
Unsupervised methods
Deep neural networks
Supervised methods
Genre/Domain
Informal language (SMS, tweets, chat)
Spoken language (Talks, Lectures)
Formal language (News)
Languages
Low-resource languages
Resource-rich languages
Morphologically-rich languages
Research Experience
Machine Learning: ICLR, AAAI
Computational Linguistics: CL, ACL, NAACL, EMNLP, COLING, EACL
Data Resources: LREC
In this talk
Translation
Meaningful representation of one language in another language
English: He does not go home
German: Er geht ja nicht nach Hause
Spanish: No va a su casa
Chinese: 他不回家
Arabic: هو لا يذهب إلى البيت
Machine Translation (MT)
He does not go home → Er geht ja nicht nach Hause
I am working on it → Ich arbeite daran
Domain Adaptation for MT
About the problem of unwanted pregnancy
About the problem of choice overload
Domain Adaptation for MT
“Domain adaptation aims to preserve the identity of a domain while exploiting the large heterogeneous data in favor of it”
Domain Adaptation for MT
In this work:
Neural Network Joint Model (Devlin 2014)
Given a parallel corpus, minimize the negative log-likelihood of the training data
Figure: three source words, four target words
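As a sketch of this objective (notation mine, following the usual NNJM formulation): with target words t_i, their aligned source positions a_i, a target history of n − 1 words, and a source window of m words on each side,

$$\mathcal{L}(\theta) = -\sum_{i} \log P\left(t_i \mid t_{i-n+1}, \ldots, t_{i-1},\; s_{a_i-m}, \ldots, s_{a_i+m}\right)$$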
Neural Network Joint Model (Devlin 2014)
Limitations
Neural Domain Adaptation Model
Results
In this talk
Model for Transliteration Mining
In this work:
Model for Transliteration Mining
Intuition:
Model for Transliteration Mining
The transliteration mining model is defined as a mixture of a transliteration model and a non-transliteration model:
p(e, f) = (1 - λ) · p_tr(e, f) + λ · p_ntr(e, f)
where λ is the prior probability of non-transliteration and p_ntr(e, f) = p_E(e) · p_F(f) is the product of character language model probabilities.
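As a small numeric illustration (all values made up), a candidate pair can be scored by the posterior probability of the transliteration component:

```python
# Score one word pair under the mixture; p_tr, p_ntr, and lam are
# illustrative stand-ins, not values from the talk.
p_tr = 1e-6   # transliteration sub-model probability of the pair
p_ntr = 1e-9  # non-transliteration probability (product of character LMs)
lam = 0.5     # prior probability of non-transliteration

# Posterior that the pair is a transliteration; pairs scoring above a
# threshold such as 0.5 would be mined as transliterations.
posterior = (1 - lam) * p_tr / ((1 - lam) * p_tr + lam * p_ntr)
print(f"{posterior:.4f}")  # ~0.9990 for these numbers
```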
Results
In this talk
Interpretation of Neural MT
Interpretation of Neural MT
In this work:
Figure: a neural MT model from the input, through Layers 1-3, to the output
Analyzing Vector Representations
Research Questions:
Analyzing Vector Representations
Methodology:
Results
Figure: lower layers capture word-level concepts; higher layers capture syntax and semantics
Analyzing Vector Representations
Limitation:
Figure: individual neurons within each layer's representation
Analyzing Individual Neurons
Limitation:
Open questions:
Linguistic Correlation Analysis
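A minimal sketch of how such an analysis can be run, assuming neuron activations have already been extracted; the data, shapes, and the binary property below are illustrative stand-ins:

```python
# Linguistic correlation analysis sketch: fit a regularized linear probe on
# neuron activations and rank neurons by the weight the probe assigns them.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 500))  # 1000 tokens x 500 neurons (stand-in)
labels = rng.integers(0, 2, size=1000)   # a binary property, e.g. "is a verb"

probe = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.1, max_iter=2000)
probe.fit(acts, labels)

# High-magnitude weights mark the neurons most predictive of the property.
ranking = np.argsort(-np.abs(probe.coef_[0]))
print("Top 10 neurons for this property:", ranking[:10])
```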
Cross-model Correlation Analysis
Hypothesis
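As I read it, the hypothesis is that important information is shared: neurons that independently trained models learn in common matter most. A minimal sketch over stand-in activations:

```python
# Cross-model correlation sketch: score each neuron of model A by its best
# Pearson correlation with any neuron of model B over the same text.
import numpy as np

rng = np.random.default_rng(0)
acts_a = rng.standard_normal((1000, 500))  # model A: 1000 tokens x 500 neurons
acts_b = rng.standard_normal((1000, 500))  # model B over the same tokens

a = (acts_a - acts_a.mean(0)) / acts_a.std(0)
b = (acts_b - acts_b.mean(0)) / acts_b.std(0)
corr = a.T @ b / len(a)                    # (500, 500) correlation matrix

score = np.abs(corr).max(axis=1)           # best match in the other model
ranking = np.argsort(-score)
print("Most shared neurons:", ranking[:10])
```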
Visualization - Top Neurons
Figure: activation visualizations of an English verb neuron (#1902), a position neuron (#1903), and an article neuron (#590)
Focused vs. Distributed Neurons
Open class vs. closed class categories
Neuron | Top 10 Words
#1925 | August, July, January, September, October, presidential, April, May, February, December
#1960 | no, No, not, nothing, nor, neither, or, none, whether, appeal
#1590 | 50, 10, 51, 61, 47, 37, 48, 33, 43, 49
Controlling of Models
Can we use this information to control models?
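A minimal sketch of such an intervention, assuming a PyTorch model whose layer outputs can be hooked; the layer, neuron index, and clamp value are hypothetical stand-ins:

```python
import torch

# A toy stand-in for one layer of an NMT encoder.
layer = torch.nn.Linear(512, 500)

def clamp_neuron(neuron_idx, value):
    """Forward hook that pins one neuron's activation to a fixed value,
    e.g. to flip a property such as tense or number."""
    def hook(module, inputs, output):
        output = output.clone()          # don't modify autograd buffers in place
        output[..., neuron_idx] = value  # overwrite the chosen neuron
        return output                    # returned tensor replaces the output
    return hook

handle = layer.register_forward_hook(clamp_neuron(137, 3.0))  # hypothetical neuron

x = torch.randn(4, 10, 512)   # 4 sentences x 10 tokens x 512 features
h = layer(x)                  # neuron 137 is now fixed at 3.0 everywhere
assert torch.all(h[..., 137] == 3.0)
handle.remove()               # remove the intervention
```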
Media Coverage
In this talk
Practical Machine Translation
Practical Machine Translation
Ranked at the top or among the best-performing systems
WMT 2013: Russian-English - 2nd tier
IWSLT 2013 & 2016 (lecture and speech translation): Arabic-English - 1st, English-Arabic - 1st
NIST 2015: Dialectal Arabic-English - 2nd
Practical Machine Translation
Startup grant $100k
32 million tokens translated!
35 countries
Potential Research Directions
Thank you
Neural Network Language Model (Bengio 2003)
Given a monolingual corpus, minimize the negative log-likelihood of the training data:
L = - Σ_t Σ_{w ∈ V} y_w(t) · log ŷ_w(t)
where y(t) is an indicator variable (the one-hot vector of the observed word), h_t = w_{t-n+1}, …, w_{t-1} is the language model context, and ŷ(t) is the softmax output over the vocabulary given h_t.
Neural Domain Adaptation Model
Machine Translation
Machine Translation through Transliteration
Can we leverage the benefit of similar vocabulary?
Machine Translation through Transliteration
Basic idea: let's leverage the benefit of similar vocabulary by modeling transliteration between language pairs
Machine Translation through Transliteration
Figure: transliteration as a post-processing step - input sentence → decoder (translation model + language model) → initial output → transliteration of unknown words → final output
Figure: transliteration as a component of the translation model - input sentence → decoder (translation model with translation and transliteration sub-models + language model) → final output
Machine Translation through Transliteration
Machine Translation through Transliteration
Estimate conditional probability of words using an interpolation of translation sub-model and transliteration sub-model
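As a sketch with an interpolation weight λ (symbol names mine):

$$p(e \mid f) = (1 - \lambda)\, p_{\mathrm{trans}}(e \mid f) + \lambda\, p_{\mathrm{translit}}(e \mid f)$$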
Machine Translation through Transliteration
Summary
Machine Translation
Analysis of Neural Machine Translation
In this work:
Analysis of Neural Machine Translation
Research Questions:
Analysis of Neural Machine Translation
Methodology:
Analysis of Neural Machine Translation
Hypothesis:
Analysis of Neural Machine Translation
Results:
Machine Translation
Improving Neural Decoder using Multitask Learning
In this work:
Improving Neural Decoder using Multitask Learning
L = λ · L_translation + (1 - λ) · L_morphology, where λ is a hyperparameter that balances the translation and morphology prediction tasks.
Improving Neural Decoder using Multitask Learning
Results:
Machine Translation
Semi-supervised Model for Transliteration Mining
p(q) = (f_s(q) + η · p_u(q)) / (N_s + η), where f_s(q) is the labeled-data count of the character alignment q, p_u(q) is the unlabeled-data probability, η is the smoothing parameter, and N_s is the number of character alignment types observed in the Viterbi alignment of the labeled data.
Model for Transliteration Mining
Miscellaneous Projects
In this talk
Analysis of Neural Machine Translation
Methodology:
Analysis of Neural Machine Translation
Results:
Domain Adaptation for MT
In this work:
Multi-domain Training Scenario for Neural MT
Unsupervised Model for Transliteration Mining
The transliteration mining model is defined as a mixture of a transliteration model and a non-transliteration model:
p(e, f) = (1 - λ) · p_tr(e, f) + λ · p_ntr(e, f)
where λ is the prior probability of non-transliteration and p_ntr(e, f) = p_E(e) · p_F(f) is the product of character language model probabilities.
Neural Domain Adaptation Model
Method 1: Give higher weight to word sequences that are liked by the in-domain data:
L = - Σ_i p_I(x_i) · log p(x_i)
where p_I(x_i) is the probability of training instance x_i according to the in-domain model and p(x_i) is the probability under the adapted model.
Neural Domain Adaptation Model
Method 2: Additionally penalize those sequences that are liked by the out-of-domain data:
L = - Σ_i (p_I(x_i) - p_O(x_i)) · log p(x_i)
where p_O(x_i) is the probability of training instance x_i according to the out-of-domain model.
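A small numeric sketch of the two weighting schemes, assuming per-instance probabilities from the in-domain and out-of-domain models are available (all numbers illustrative):

```python
import numpy as np

p_in = np.array([0.8, 0.1, 0.6])           # in-domain model probabilities
p_out = np.array([0.2, 0.7, 0.5])          # out-of-domain model probabilities
log_p = np.log([0.5, 0.4, 0.3])            # adapted model log-probabilities

loss_m1 = -np.sum(p_in * log_p)            # Method 1: up-weight in-domain-like instances
loss_m2 = -np.sum((p_in - p_out) * log_p)  # Method 2: also penalize out-of-domain-like ones
print(loss_m1, loss_m2)
```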
Neural Domain Adaptation Model
Method 3:
Machine Translation
A few other notable projects: