Back to the noisy channel
YANS 2021: Back to the noisy channel
Early MT research
When I look at an article in Russian, I say: “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.” (Warren Weaver, letter to Norbert Wiener, 1947)
... I frankly am afraid the boundaries of words in different languages are too vague .... to make any quasimechanical translation scheme very hopeful. (Norbert Wiener, in reply)
History of MT
2016: Google NMT
2013: Neural MT
2000: Statistical MT, Phrase-based MT, Syntax-based MT
1990: Example-based MT, IBM Model
1960: ALPAC report, Rule-based MT, Systran
1950: Code breaking
MT as transfer
昨日麒麟を散歩した。
I walked a giraffe yesterday.
散歩 (arg0: ?, arg1: 麒麟, temp: 昨日)
walk (arg0: I, arg1: giraffe, temp: yesterday)
Event: walk(?, giraffe), date(yesterday)
MT as transfer: The Vauquois triangle
Levels of the triangle, from the apex down: Interlingua, Semantic, Syntax, Words
Example-based MT
Look up similar examples and edit them.
Bilingual Data
上海浦东开发与法制建设同步
新华社上海二月十日电(记者谢金虎、张持坚)
上海浦东近年来颁布实行了涉及经济、贸易、建设、规划、科技、文教等领域的七十一件法规性文件,确保了浦东开发的有序进行。
浦东开发开放是一项振兴上海,建设现代化经济、贸易、金融中心的跨世纪工程,因此大量出现的是以前不曾遇到过的新情况、新问题。
对此,浦东不是简单的采取“干一段时间,等积累了经验以后再制定法规条例”的做法,而是借鉴发达国家和深圳等特区的经验教训,聘请国内外有关专家学者,积极、及时地制定和推出法规性文件,使这些经济活动一出现就被纳入法制轨道。
去年初浦东新区诞生的中国第一家医疗机构药品采购服务中心,正因为一开始就比较规范,运转至今,成交药品一亿多元,没有发现一例回扣。
The development of Shanghai's Pudong is in step with the establishment of its legal system
Xinhua News Agency, Shanghai, February 10, by wire (reporters Jinhu Xie and Chijian Zhang)
In recent years Shanghai's Pudong has promulgated and implemented 71 regulatory documents relating to areas such as economics, trade, construction, planning, science and technology, culture and education, etc., ensuring the orderly advancement of Pudong's development.
Pudong's development and opening up is a century-spanning undertaking for vigorously promoting Shanghai and constructing a modern economic, trade, and financial center. Because of this, new situations and new questions that have not been encountered before are emerging in great numbers.
In response to this, Pudong is not simply adopting an approach of "work for a short time and then draw up laws and regulations only after waiting until experience has been accumulated." Instead, Pudong is taking advantage of the lessons from experience of developed countries and special regions such as Shenzhen by hiring appropriate domestic and foreign specialists and scholars, by actively and promptly formulating and issuing regulatory documents, and by ensuring that these economic activities are incorporated into the sphere of influence of the legal system as soon as they appear.
Precisely because as soon as it opened it was relatively standardized, China's first drug purchase service center for medical treatment institutions, which came into being at the beginning of last year in the Pudong new region, in operating up to now, has concluded transactions for drugs of over 100 million yuan and hasn't had one case of kickback.
Guess Translation
Two modeling approaches to MT
Code Breaking
Source → Y → Noisy channel → X → Decoder → Y’
Statistical MT
昨日麒麟を散歩した。
I walked a giraffe yesterday.
Translation model:
P(昨日 | yesterday), P(散歩 | walked), P(麒麟 | giraffe), P(麒麟 | dog), ...
P(昨日 | yesterday I), P(麒麟 | a giraffe), P(散歩した | I walked), P(散歩した | I walk), ...
Language model:
P(I walked a giraffe...), P(he walked a dog...)
MT as code breaking
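The translation model and the language model on this slide fit together through Bayes' rule. A short derivation, writing x for the observed source sentence and y for a candidate translation (the hat on the selected translation is notation introduced here, not taken from the slide):

```latex
\begin{align*}
\hat{y} &= \arg\max_{y} P(y \mid x) \\
        &= \arg\max_{y} \frac{P(x \mid y)\, P(y)}{P(x)} \\
        &= \arg\max_{y} \underbrace{P(x \mid y)}_{\text{translation model}} \; \underbrace{P(y)}_{\text{language model}}
\end{align*}
```

P(x) can be dropped in the last step because it does not depend on y.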
Direct Modeling
X → transfer → Y
Neural MT
昨日麒麟を散歩した。
I walked a giraffe yesterday.
Direct modeling by Neural Networks
Deeper model with residual connection
Stack layers for better representation.
Residual connections to avoid vanishing gradients.
RNNs are slow
Similar networks by Zhou et al. (2016) and Wu et al. (2016)
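A minimal sketch of the residual idea, assuming each layer maps a vector to a vector of the same size. The function `residual_stack` and the toy layer `halve` are hypothetical stand-ins for the recurrent layers of the cited systems, not their actual implementation:

```python
def residual_stack(x, layers):
    """Apply each layer as h <- h + layer(h), i.e. with a residual connection."""
    h = list(x)
    for layer in layers:
        update = layer(h)
        # The identity path (h itself) is added back, so information and
        # gradients can bypass the layer.
        h = [h_i + u_i for h_i, u_i in zip(h, update)]
    return h


def halve(h):
    """Hypothetical toy layer standing in for a real network layer."""
    return [0.5 * v for v in h]


out = residual_stack([1.0, 2.0], [halve, halve])
```

With an empty list of layers the input passes through unchanged, which is the path that keeps gradients from vanishing in deep stacks.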
Parallel computation
Encoder by multiple layers of CNN + gating
Decoder by multiple layers of CNN + gating
Long distance relation
Attention is position neutral
Transformer is attention heavy
Quality Improvements on WMT14 EN-FR by Neural MT (NMT)
Research Trends
Back to the noisy channel
Why noisy channel?
A direct model is biased by highly predictive outputs, a.k.a. explaining-away effects (Klein and Manning, 2001).
Why noisy channel?
The model selects outputs y that are likely a priori, through p(y), and that explain the input x, through p(x | y), during decoding.
Noisy channel with NN
Machine Translation
Other tasks
Simple and Effective Noisy Channel Model
Employ a standard NMT model, e.g., a Transformer (Vaswani et al., 2017), for P(x | y).
Decoding: Direct Model
At each time step, beam search keeps the k best candidates.
<s> → {I, He} → {walked, walk, walks} → {a, the} → {giraffe, dog, cat}
p(I | x)
p(He | x)
p(walked | I, x) × p(I | x)
p(walked | He, x) × p(He | x)
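The beam search on this slide can be sketched as follows. This is a minimal illustration, not a production decoder: `next_log_probs` is a hypothetical callable standing in for the direct model p(y_t | y_<t, x), and the toy probabilities are made up for the example.

```python
import math


def beam_search(next_log_probs, beam_size, max_len, eos="</s>"):
    """Keep the `beam_size` best prefixes at every time step.

    next_log_probs(prefix) -> {token: log p(token | prefix, x)}; the source
    sentence x is assumed to be baked into the callable.
    """
    beams = [((), 0.0)]  # (prefix, accumulated log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))  # finished hypothesis
                continue
            for token, logp in next_log_probs(prefix).items():
                # Chain rule: p(prefix + token | x) = p(token | prefix, x) * p(prefix | x)
                candidates.append((prefix + (token,), score + logp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams


# Toy direct model with made-up probabilities.
TOY = {
    (): {"I": 0.6, "He": 0.4},
    ("I",): {"walked": 0.7, "walk": 0.3},
    ("He",): {"walked": 0.5, "walks": 0.5},
}


def toy_model(prefix):
    dist = TOY.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in dist.items()}


best_prefix, best_score = beam_search(toy_model, beam_size=2, max_len=3)[0]
```

Scores are accumulated in log space, so the products on the slide become sums.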
Decoding: Noisy Channel Model
For each prefix y, compute p(x | y) and p(y).
<s> → {I, He} → {walked, walk, walks} → {a, the} → {giraffe, dog, cat}
p(x | I) × p(I)
p(x | He) × p(He)
p(x | I walked) × p(I walked)
p(x | He walked) × p(He walked)
Decoding: Approximation
Filter the vocabulary at each step with the direct model p(y | x).
<s> → {I, He} → {walked, walk, walks} → {a, the} → {giraffe, dog, cat}
p(x | I) × p(I)
p(x | I walked) × p(I walked)
p(x | He) × p(He)
p(x | He walked) × p(He walked)
Filter by
p(Y | y1...yt, x)
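One step of this approximation might look like the sketch below: the direct model only proposes a shortlist of next tokens, and each extended prefix is then scored by p(x | prefix) × p(prefix). The callables `direct`, `channel`, and `lm` and all toy numbers are assumptions made for illustration.

```python
import math


def noisy_channel_step(beams, direct, channel, lm, k_filter, beam_size):
    """One step of approximate noisy-channel beam search.

    direct(prefix)  -> {token: log p(token | prefix, x)}  (used only to filter)
    channel(prefix) -> log p(x | prefix)
    lm(prefix)      -> log p(prefix)
    """
    candidates = []
    for prefix in beams:
        proposals = direct(prefix)
        # Keep only the k_filter tokens the direct model likes best.
        shortlist = sorted(proposals, key=proposals.get, reverse=True)[:k_filter]
        for token in shortlist:
            new_prefix = prefix + (token,)
            candidates.append((new_prefix, channel(new_prefix) + lm(new_prefix)))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]


# Toy models with made-up probabilities.
DIRECT = {(): {"I": 0.5, "He": 0.4, "It": 0.1}}
CHANNEL = {("I",): 0.3, ("He",): 0.1, ("It",): 0.9}
LM = {("I",): 0.5, ("He",): 0.4, ("It",): 0.2}

result = noisy_channel_step(
    beams=[()],
    direct=lambda p: {t: math.log(v) for t, v in DIRECT[p].items()},
    channel=lambda p: math.log(CHANNEL[p]),
    lm=lambda p: math.log(LM[p]),
    k_filter=2,
    beam_size=2,
)
```

In this toy setup "It" would have the highest channel × language model score, but it never gets scored because the direct model filters it out. That is the price of the approximation.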
Model Combination
Combine p(y | x) and p(x | y) with length normalization.
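The slide's formula did not survive extraction, so the sketch below only illustrates one plausible form of a length-normalized combination. The exact normalization (direct score divided by target length, channel and language model scores divided by source length) and the weight `lam` are assumptions, not values taken from the slide.

```python
def combined_score(logp_direct, logp_channel, logp_lm, tgt_len, src_len, lam=1.0):
    """Length-normalized combination of direct and noisy-channel scores.

    Assumed form: (1 / |y|) * log p(y | x) + (lam / |x|) * (log p(x | y) + log p(y)).
    """
    return logp_direct / tgt_len + lam * (logp_channel + logp_lm) / src_len


score = combined_score(-4.0, -6.0, -2.0, tgt_len=4, src_len=2, lam=0.5)
```

Normalizing by length keeps short hypotheses from winning simply because they accumulate fewer log-probability terms.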
Experimental Results
WMT17 De-En BLEU
Best results by NCM
Reranking results
Larger gains by reranking
Noisy channel with NN
Machine Translation
Other tasks
Neural Noisy Channel Model
Introduce a latent variable z to indicate monotonic alignment between x and y.
Factorization
Split into two sub-models, an alignment model and a word model, using z.
Alignment model
Word model
Latent alignment variable
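The equation these labels annotate is not recoverable from the extracted text. A sketch of the general shape such a factorization takes is given below. The ellipses stand for the remaining conditioning context, which is left elided here exactly as on the slides, so this should be read as an assumption about the form rather than the slide's own equation:

```latex
\begin{align*}
p(x \mid y) &= \sum_{z} p(x, z \mid y) \\
p(x, z \mid y) &= \prod_{i=1}^{|x|}
  \underbrace{p(z_i \mid z_{i-1}, \ldots)}_{\text{alignment model}} \;
  \underbrace{p(x_i \mid z_i, \ldots)}_{\text{word model}}
\end{align*}
```

Here z is the latent alignment variable, which is summed out to obtain the channel probability.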
Alignment: z
For each position in x, specify how to monotonically segment y, i.e., the end position of each span.
Alignment grid over y and x: z1 = 2, z2 = 2, z3 = 3, z4 = 4, z5 = 4, z6 = 5, z7 = 5, z8 = 5, z9 = 5, z10 = 7, z11 = 7, z12 = 8
Action sequence: a
Transitions are modeled by an action sequence a = {SHIFT, EMIT}^(|x| × |y|).
y
x
SHIFT, EMIT
EMIT
SHIFT, EMIT
SHIFT, EMIT
EMIT
SHIFT, EMIT
EMIT
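The relation between an alignment z and its action sequence can be sketched in a few lines. The function name `alignment_to_actions` is hypothetical, and the convention that one token of y has already been read before the first emission (`start=1`) is an assumption, chosen because it reproduces the action labels in the figure for the example alignment.

```python
def alignment_to_actions(z, start=1):
    """Convert a monotone alignment z into a flat SHIFT/EMIT sequence.

    z[i] is the end position in y of the span read when x_{i+1} is emitted.
    SHIFT reads one more token of y; EMIT produces the next token of x.
    """
    actions = []
    pos = start
    for z_i in z:
        if z_i < pos:
            raise ValueError("alignment must be monotonically non-decreasing")
        actions.extend(["SHIFT"] * (z_i - pos))  # read y up to position z_i
        actions.append("EMIT")                   # then emit one token of x
        pos = z_i
    return actions


# The example alignment from the "Alignment: z" slide.
z = [2, 2, 3, 4, 4, 5, 5, 5, 5, 7, 7, 8]
actions = alignment_to_actions(z)
```

Every position in x contributes exactly one EMIT, and the number of SHIFT actions equals the final position minus the start position.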
Alignment Model
p(zi | zi-1, ...) conditionally depends on a.
Instantiated as NN
The model is instantiated as two LSTMs, one for x and the other for y.
x1, x2, x3
y1, y2, y3
Inference
Forward-backward algorithm (Rabiner, 1989) for efficient inference.
An external variable to memorize intermediate computation.
Recurrence is happening here.
Sum over all z
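The sum over all z can be computed with a dynamic program instead of enumerating alignments. The sketch below shows only the forward pass, under simplifying assumptions that are not taken from the slide: positions are 0-based, the alignment and word probabilities are given as plain tables (`trans`, `emit`) instead of neural network outputs, and the final sum runs over every end position.

```python
from itertools import product


def forward_marginal(trans, emit, start=0):
    """Sum p(x, z | y) over all monotone alignments z by dynamic programming.

    trans[k][j]: p(z_i = j | z_{i-1} = k), zero whenever j < k.
    emit[i][j]:  p(x_i | z_i = j).
    """
    n_y = len(trans)
    # alpha[j] memorizes the total probability of all partial alignments ending at j.
    alpha = [1.0 if j == start else 0.0 for j in range(n_y)]
    for emit_i in emit:
        alpha = [
            sum(alpha[k] * trans[k][j] for k in range(j + 1)) * emit_i[j]
            for j in range(n_y)
        ]
    return sum(alpha)


def brute_force_marginal(trans, emit, start=0):
    """Reference implementation: enumerate every alignment explicitly."""
    n_y = len(trans)
    total = 0.0
    for z in product(range(n_y), repeat=len(emit)):
        prev, p = start, 1.0
        for i, j in enumerate(z):
            p *= trans[prev][j] * emit[i][j] if j >= prev else 0.0
            prev = j
        total += p
    return total


# Made-up toy tables: |y| = 3 positions, |x| = 2 tokens.
trans = [[0.5, 0.3, 0.2], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
emit = [[0.2, 0.5, 0.1], [0.3, 0.3, 0.7]]
marginal = forward_marginal(trans, emit)
```

The dynamic program costs O(|x| · |y|²) here, while the brute-force sum grows exponentially in |x|.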
Decoding
Approximate decoding by searching for the y and z that jointly maximize the model score.
Model Combination
Model combination of two directions and length bias.
Experimental Results
LDC zh-en
Noisy channel with NN
Machine Translation
Other tasks
Document-level MT
Extend the MT task to document-level context.
Factorization
Translate the whole document x into y with a noisy channel model.
Graphical Model
Very strong conditional independence assumption of x and y.
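The slide's graphical model is not recoverable from the text. One way to write down a factorization with such an independence assumption is sketched below, where x_i and y_i denote the i-th source and target sentences of an n-sentence document. The specific form (each source sentence depends only on its own translation, with a document-level language model over y) is an assumption made here for illustration:

```latex
\begin{align*}
\hat{y} &= \arg\max_{y} \; p(x \mid y)\, p(y) \\
p(x \mid y) &= \prod_{i=1}^{n} p(x_i \mid y_i) \\
p(y) &= \prod_{i=1}^{n} p(y_i \mid y_{<i})
\end{align*}
```

Under this form, all cross-sentence context enters through the language model p(y), while the channel model stays sentence-level.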
Decoding
Compute k-best translations for all sentences in x using q(y | x), then rescore with beam search using p(x | y) p(y).
x1 → y1,1  y1,2  y1,3
x2 → y2,1  y2,2  y2,3
x3 → y3,1  y3,2  y3,3
Proposal by q(y | x)
Search by p(x | y) p(y)
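The two-stage procedure can be sketched as a beam search over per-sentence candidate lists. The function `rerank_document`, the callables `channel_logp` and `doc_lm_logp`, and all toy scores are hypothetical. The candidate lists are assumed to come from a separate proposal model.

```python
def rerank_document(candidates, channel_logp, doc_lm_logp, beam_size):
    """Pick one candidate per sentence, left to right, with beam search.

    candidates[i]        : k-best translations proposed for source sentence i
    channel_logp(i, y_i) : log p(x_i | y_i)
    doc_lm_logp(doc, y_i): log p(y_i | translations chosen so far)
    """
    beams = [((), 0.0)]  # (partial document translation, accumulated score)
    for i, k_best in enumerate(candidates):
        expanded = [
            (doc + (y,), score + channel_logp(i, y) + doc_lm_logp(doc, y))
            for doc, score in beams
            for y in k_best
        ]
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0]


# Toy scores (made up): the language model rewards a consistent pronoun.
CHANNEL = {
    (0, "he walks"): -1.0, (0, "it walks"): -1.2,
    (1, "he runs"): -1.5, (1, "it runs"): -1.0,
}


def toy_doc_lm(doc, y):
    consistent = not doc or doc[-1].split()[0] == y.split()[0]
    return -0.5 if consistent else -2.0


best_doc, best_score = rerank_document(
    [["he walks", "it walks"], ["he runs", "it runs"]],
    channel_logp=lambda i, y: CHANNEL[(i, y)],
    doc_lm_logp=toy_doc_lm,
    beam_size=2,
)
```

In the toy run the locally best first sentence ("he walks") does not survive, because the document-level language model prefers the consistent pair.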
Model Combination
Experimental Results
LDC zh-en
Noisy channel with NN
Machine Translation
Other tasks
Text Classification
Given a label, generate an input text (Ding and Gimpel, 2019).
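As a minimal sketch of classification in the generative direction, the example below scores each label by how well it generates the text, plus a label prior. The per-label unigram tables are a made-up toy stand-in for a neural generator, and all names are hypothetical.

```python
import math


def classify(text, labels, gen_logp, prior_logp):
    """Return the label maximizing log p(text | label) + log p(label)."""
    return max(labels, key=lambda lab: gen_logp(text, lab) + prior_logp(lab))


# Toy class-conditional unigram models over a closed three-word vocabulary.
UNIGRAMS = {
    "sports": {"goal": 0.5, "match": 0.4, "vote": 0.1},
    "politics": {"goal": 0.1, "match": 0.2, "vote": 0.7},
}


def unigram_logp(text, label):
    return sum(math.log(UNIGRAMS[label][word]) for word in text.split())


def uniform_prior(label):
    return math.log(1.0 / len(UNIGRAMS))


labels = list(UNIGRAMS)
pred_a = classify("vote vote match", labels, unigram_logp, uniform_prior)
pred_b = classify("goal match", labels, unigram_logp, uniform_prior)
```

Because every word of the input has to be generated, each word contributes to the decision; a direct classifier is free to rely on a single highly predictive feature.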
Experimental Results
Question Answering
Find an answer (a) to a question (q) under context (c), e.g., document or image (Lewis and Fan, 2019).
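In noisy channel terms, the answer plays the role of the hidden source and the question is what has to be explained. One way to write this, using the slide's symbols a, q, and c (the exact factorization of the cited work is not shown in the extracted text, so this form is an assumption):

```latex
\begin{align*}
\hat{a} &= \arg\max_{a} \; p(a \mid q, c) \\
        &= \arg\max_{a} \; p(q \mid a, c)\, p(a \mid c)
\end{align*}
```

The second line follows from Bayes' rule, dropping p(q | c) because it does not depend on a.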
Experimental Results
Robust results on adversarial SQuAD (Jia and Liang, 2017)
Good results on multi-paragraph inputs, though not trained with contexts.
Noisy channel modeling is explainable
Interpret which question words are explained by the answer.
Dialogue
Given context (C), predict state (B), dialogue act (A) and response (R) (Liu et al., 2021).
Experimental Results: MultiWOZ
Grammatical Error Correction
A straightforward application of noisy channel modeling (Flachs et al., 2019).
Experimental Results
BEA 2019 Shared Task
Zero/few-shot text classification
Given a verbalized classification label, predict an input text (Min et al., 2021).
Experimental Results
Noisy channel with NN
Machine Translation
Other tasks
Your task/application!
Back to the noisy channel
“The crude force of computers is not science” (COLING review of Brown et al., 1988)
deep learning
Recipes for the noisy channel
Questions?