Studying Shared Representations in NMT through Zero-Shot Translation
SRA “Language Similarity in Multilingual Translation”
Danni Liu
Meta
16/02/2022
Multilingual translation
[Figure: parallel data availability across language pairs in a multilingual translation setup]
[Figure: a shared multilingual encoder-decoder; sources such as "Good day" / "Guten Tag" enter one encoder, and the decoder produces targets such as "Buenos días" / "Bonjour"]
Outline
How language-independent are existing NMT models?
[Figure: a language classifier trained on top of the encoder output predicts the source language (de, en, fr, …, pt)*]
*: Source sentences without source-language tags
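A minimal sketch of such a probing classifier (in the spirit of Adi et al., 2017), assuming pre-computed mean-pooled encoder states and integer language labels; the module sizes, names, and training loop are illustrative, not the exact setup used in the experiments.

```python
# Minimal sketch of a language-ID probe on frozen encoder states.
# Assumes pooled_states: (num_sentences, dim) and labels: (num_sentences,)
# with one integer id per source language (de, en, fr, ..., pt).
import torch
import torch.nn as nn

class LanguageProbe(nn.Module):
    def __init__(self, dim, num_languages):
        super().__init__()
        self.proj = nn.Linear(dim, num_languages)

    def forward(self, pooled_states):
        return self.proj(pooled_states)

def train_probe(probe, pooled_states, labels, epochs=10):
    # The NMT encoder stays frozen; only the probe's parameters are updated.
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(probe(pooled_states), labels)
        loss.backward()
        optimizer.step()
    accuracy = (probe(pooled_states).argmax(-1) == labels).float().mean()
    return accuracy.item()
```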
How language-independent are existing NMT models?
Language classification results on Europarl (Liu et al., 2021)
[Figure: confusion matrix of true vs. predicted source language]
Overall accuracy: 87%
🙂 some notion of language similarity
😐 strong language signals preserved
Language-specific representations & zero-shot translation
[Figure: zero-shot translation X→Y with a single shared model vs. pivoting X→En→Y through two supervised steps]
[Chart: zero-shot and pivoting translation quality]
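To make the two inference modes concrete, here is a minimal sketch contrasting direct zero-shot decoding with English pivoting. It uses the public Hugging Face checkpoint facebook/m2m100_418M purely as a stand-in for the multilingual models discussed here; the example sentence and language codes are placeholders.

```python
# Direct zero-shot decoding (X -> Y) vs. pivoting through English (X -> En -> Y).
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

def translate(text, src_lang, tgt_lang):
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

direct = translate("Guten Tag", "de", "es")                        # one pass, X -> Y
pivot = translate(translate("Guten Tag", "de", "en"), "en", "es")  # two passes via En
```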
Promote similar representations - similarity regularizer
[Figure: both translation directions of a parallel pair are trained with the MT loss, plus a similarity loss that pulls the encoder representations of the source and target sentences together]
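A minimal sketch of the combined objective, assuming a hypothetical model interface that exposes a per-direction MT loss and the encoder; the mean pooling, the MSE distance, and the weight alpha are illustrative choices, not the exact configuration from the experiments.

```python
# MT loss on both directions of a parallel pair plus a similarity loss that
# pulls the two encoder representations together.
import torch.nn.functional as F

def mean_pool(hidden, mask):
    # hidden: (batch, time, dim); mask: (batch, time) with 1 for real tokens.
    mask = mask.unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1.0)

def similarity_regularized_loss(model, batch, alpha=1.0):
    # Hypothetical interface: model.mt_loss(src, tgt) returns the usual
    # cross-entropy, model.encode(x) returns encoder states.
    mt_loss = model.mt_loss(batch["src"], batch["tgt"]) \
            + model.mt_loss(batch["tgt"], batch["src"])

    h_src = mean_pool(model.encode(batch["src"]), batch["src_mask"])
    h_tgt = mean_pool(model.encode(batch["tgt"]), batch["tgt_mask"])
    sim_loss = F.mse_loss(h_src, h_tgt)

    return mt_loss + alpha * sim_loss
```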
Promote similar representations - adversarial classifier
[Figure: a language classifier is attached to the encoder output; its loss enters the objective with weight −λ (gradient reversal), so the encoder is pushed to remove source-language signals while the model is still trained with the MT loss]
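A minimal sketch of the adversarial setup with a gradient-reversal layer; the classifier, the pooled encoder states `h`, and the way the losses are combined are placeholders, with λ playing the role of the weight on the slide.

```python
# Language classifier on the encoder output; the reversed gradient (scaled by
# -lambda) pushes the encoder to hide the source language while the classifier
# itself still learns to predict it.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient back multiplied by -lambda; None is for `lam`.
        return -ctx.lam * grad_output, None

def adversarial_language_loss(classifier, h, lang_labels, lam=1.0):
    # h: pooled encoder states (batch, dim); lang_labels: (batch,) integer ids.
    logits = classifier(GradReverse.apply(h, lam))
    return F.cross_entropy(logits, lang_labels)

# total_loss = mt_loss + adversarial_language_loss(clf, pooled_states, labels)
```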
Effects of similarity-enforcing training objectives
[Chart: zero-shot translation quality on three settings]

| | Setting 1 | Setting 2 | Setting 3 |
| --- | --- | --- | --- |
| Data amount | 0.9M | 18M | 0.5M |
| # languages | 4 | 9 | 10 |
| # zero-shot directions | 6 | 56 | 72 |
| Pivoting BLEU | 19.1 | 26.0 | 22.1 |

*: Supervised BLEU degradation ≤ 0.5
🙂 improved zero-shot translation quality
😐 performance gap to pivoting
😐 unaddressed: source word order differences
Promote similar representations - source word order
[Figure: "I can there go" vs. "I can go there"; the same content in different source word orders yields different encoder representations, i.e. encoder(x) ≠ encoder(x′)]
Effects of removing residual connections in a middle layer
[Chart: zero-shot translation quality]
Analyzing source language signals: language classification results for the baseline Transformer vs. after residual removal (Liu et al., 2021)
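A minimal sketch of how dropping the residual connection around self-attention in one middle encoder layer can be implemented, following the idea in Liu et al. (2021); the layer count, chosen layer index, dimensions, and normalization placement are illustrative.

```python
# Transformer encoder layer with a switch that removes the residual path
# around self-attention, so positional information is not carried through.
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, dim=512, heads=8, drop_residual=False):
        super().__init__()
        self.drop_residual = drop_residual
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, key_padding_mask=None):
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        # Either keep the usual residual (x + attn_out) or use attn_out alone.
        x = self.norm1(attn_out if self.drop_residual else x + attn_out)
        return self.norm2(x + self.ffn(x))

# e.g. a 6-layer encoder where only the 4th layer has the residual removed:
encoder = nn.ModuleList(EncoderLayer(drop_residual=(i == 3)) for i in range(6))
```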
Complementary effects
[Chart: zero-shot translation quality; Pivoting BLEU: 26.0]
Next up: so far we focused on modeling, what about data?
Choosing similar bridging languages
Setup: WMT21 large-scale task
Data: MultiCCAligned, OpenSubtitles
Bridge language: en/id, en/hi, en/es
[Chart: zero-shot translation quality]
Pivoting BLEU: 21.6, 31.9, 13.8, 12.5, 17.9, 19.7 (one value per data/bridge-language combination)
English-centric extended with local connectivity
[Chart: zero-shot translation quality with 1-stop, 2-stop, and 3-stop connectivity]
Pivoting BLEU: 14.6
*: Avg. supervised performance degraded by 0.6~1.3 BLEU
Pretrain-finetune setup
Model: M2M-124 (Goyal et al., 2021)
Data: {id, ms, tl, jv} × {id, ms, tl, jv}
Initialize with the pretrained model → Train (+ methods to promote language similarity) → Test
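A minimal sketch of the finetuning step, using the public Hugging Face checkpoint facebook/m2m100_418M as a stand-in for M2M-124; the sentence pair, language codes, and hyperparameters are placeholders, and the similarity-promoting losses from the previous slides would be added on top of this plain MT objective.

```python
# Initialize from a pretrained many-to-many model, then continue training on
# the {id, ms, tl, jv} directions with the standard MT cross-entropy.
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def finetune_step(src_texts, tgt_texts, src_lang, tgt_lang):
    tokenizer.src_lang, tokenizer.tgt_lang = src_lang, tgt_lang
    batch = tokenizer(src_texts, text_target=tgt_texts,
                      return_tensors="pt", padding=True)
    loss = model(**batch).loss  # MT cross-entropy on the finetuning pair
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# One illustrative update on an id -> ms sentence pair:
finetune_step(["Selamat pagi."], ["Selamat pagi."], "id", "ms")
```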
Finetune pretrained models for zero-shot translation
[Chart: "zero-shot" translation quality]
Encouraging cross-modality similarity (Dinh et al., 2022)
[Figure: a similarity loss applied across modalities]
Few-shot speech translation quality on CoVoST en-de
BLEU w/ 100% data: 14.9
Ongoing: close the gap to pivot-based translation
[Figure: encoder-decoder models for languages X and Y connected through an interlingua (IL) representation, with parameter sharing]
Summary
References
Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. ICLR 2017.
Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, and Wolfgang Macherey. 2019. The missing ingredient in zero-shot neural machine translation. arXiv preprint.
Tu Anh Dinh, Danni Liu, and Jan Niehues. 2022. Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques. To appear in ICASSP 2022.
Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc'Aurelio Ranzato, Francisco Guzmán, and Angela Fan. 2021. The FLORES-101 evaluation benchmark for low-resource and multilingual machine translation.
Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, and Xian Li. 2021. Improving zero-shot translation by disentangling positional information. ACL 2021.
Ngoc-Quan Pham, Jan Niehues, Thanh-Le Ha, and Alexander Waibel. 2019. Improving zero-shot translation with language-independent constraints. WMT 2019.
Question/Discussion
Thank you! :)
{danni.liu, jan.niehues}@maastrichtuniversity.nl