1 of 44

Neural Machine Translation English-Bengali

Rishi Dey Chowdhury

Ayush Bilkhiwal

Sujeet Kumar

Confidential

Customized for Lorem Ipsum LLC

Version 1.0

2 of 44

English Input	Model generated scores	Reference
Hello!	হ্যালো!
How are you?	তুমি কেমন আছো?: 0.80, তুমি কেমন আছ?: 0.70	তুমি কেমন আছো?


3 of 44

Outline

Overview

Problems to solve

Project objective

Strategy

Model Architecture

Hyperparameters

Model Performance

Translation Results

Conclusion


4 of 44

Overview

Language is at the heart of communication. With the increase in computational power and the growing need to convert English content into Indic languages to make it more accessible to local people, neural network based methods have taken over from the earlier statistical methods and generate better machine (automated) translations.
Here we explore one such direction of Neural Machine Translation: converting English content to a North-Eastern language (in our case, Bengali).

5 of 44

Project objective

Creating baseline systems using different architectures for the English-Bengali language pair.
Identifying features that can be used in building feature-based MT systems.

6 of 44

Literature Survey

7 of 44

Literature Survey

A detailed search for previous work on Neural Machine Translation for English-to-Bengali and Bengali-to-English translation revealed that very few approaches have been tried on this language pair.
The methods adopted in these papers mostly included Simple RNN, LSTM and GRU [1].
Other efforts use Back Translation to increase the number of training examples and train a better model [2].
As per our search, only one paper used BiLSTM with attention and a self-attention based modern Transformer architecture [3].
This is mostly due to the lack of a large parallel corpus.

8 of 44

Datasets

English-Bangla (en-bn) data from Samanantar and WAT Indic (Workshop on Asian Translation).
The Samanantar dataset contains 92,51,702 parallel sentences; the WAT Indic dataset contains 2,390 parallel sentences.
The data covers domains such as news, sports, tech, entertainment, lifestyle, education, business, and general.

Training

1,54,836 parallel sentences from Samanantar for our small models.
7,28,047 parallel sentences from Samanantar for our large models.
The English and Bengali data contain 1,16,349 and 1,48,459 unique words, respectively.

9 of 44

Datasets

Validation

3694 sentences taken from Samanantar.
1194 sentences from WAT Indic.

Testing

1194 sentences from WAT Indic.

10 of 44

Data Preprocessing

Byte-Pair Encoding (BPE) is used to tokenize both the training and test data.
A vocabulary size of 32,000 is used for both English and Bengali.

Normalization

Sentences are converted to lower case and then passed through standard Unicode normalization.

Tokenization

The bos (beginning of sentence) token is added at the start and the eos (end of sentence) token at the end of each encoded sentence in both languages.
The unk (unknown) token stands in for unknown subwords, and the pad (padding) token is used to pad the sentences.
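
The preprocessing steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the special-token ids, the NFKC normalization form, the toy vocabulary and the whitespace split (standing in for a trained BPE model) are all assumptions made for the example.

```python
import unicodedata

# Hypothetical special-token ids; the slides do not state the actual values.
PAD_ID, UNK_ID, BOS_ID, EOS_ID = 0, 1, 2, 3

def normalize(text, form="NFKC"):
    """Lower-case the text, then apply Unicode normalization (the form is an assumption)."""
    return unicodedata.normalize(form, text.lower())

def encode(subwords, vocab):
    """Map subwords to ids, using unk for misses, and wrap the sequence in bos/eos."""
    ids = [vocab.get(piece, UNK_ID) for piece in subwords]
    return [BOS_ID] + ids + [EOS_ID]

def pad_batch(sequences):
    """Right-pad every sequence with PAD_ID to the length of the longest one."""
    width = max(len(seq) for seq in sequences)
    return [seq + [PAD_ID] * (width - len(seq)) for seq in sequences]

# Toy vocabulary; a real run would use the 32,000-entry BPE vocabulary instead.
toy_vocab = {"how": 4, "are": 5, "you": 6, "?": 7}
sentences = ["How are you ?", "Rishi ?"]

# Whitespace splitting is only a stand-in for BPE segmentation.
batch = pad_batch([encode(normalize(s).split(), toy_vocab) for s in sentences])
```

In this toy run the word "rishi" is not in the vocabulary, so it maps to the unk id, and the shorter sentence is padded to the length of the longer one.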

11 of 44

Output Sentence Selection

A neural network is just a way to calculate the conditional probability of the next word in the output sentence, given the words generated so far in the output sequence and, in the case of Machine Translation, the input sentence. We can then choose among several ways, such as Greedy Search, Beam Search and Minimum Bayes Risk, to find an output sequence with high joint probability. Searching over all possible combinations is computationally very expensive, so we resort to heuristic, approximate methods to generate the output sequence. We used two of them:

Greedy Search: Picks the next token id with the highest softmax probability at each step. It works for short sentences, but because it never revisits earlier choices it can miss the most probable full sequence.
Minimum Bayes Risk (MBR): Generates multiple candidate translations and compares each of them with all the others using a similarity score (in our case ROUGE). Choosing the candidate with the highest similarity score gives the translation that is most in consensus with all the generated samples.

ROUGE SCORE =

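MBR Selection Criterion:

$$\hat{y} = \arg\max_{y \in \mathcal{C}} \frac{1}{|\mathcal{C}| - 1} \sum_{y' \in \mathcal{C},\, y' \neq y} \mathrm{ROUGE}(y, y')$$

where $\mathcal{C}$ is the set of candidate translations sampled from the model.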
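
The MBR selection described on this slide can be sketched as follows. The slides do not say which ROUGE variant was used, so the unigram-overlap F1 below is an assumption, as are the function names and the toy candidates. Each candidate is scored by its mean similarity to all the other candidates, and the highest-scoring one is returned.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between two token lists (an assumed ROUGE variant)."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def mbr_select(candidates, similarity=rouge1_f1):
    """Return (best_candidate, scores); each score is the mean similarity to the others."""
    if len(candidates) < 2:
        raise ValueError("MBR needs at least two candidates to compare")
    scores = []
    for i, cand in enumerate(candidates):
        others = [similarity(cand, other)
                  for j, other in enumerate(candidates) if j != i]
        scores.append(sum(others) / len(others))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores

# Hypothetical sampled translations, already tokenized.
samples = [s.split() for s in
           ["i am hungry", "i am a hungry person", "i am hungry now"]]
best, scores = mbr_select(samples)
```

Here the shortest candidate wins because its words are shared by both of the other samples, which is the "consensus" behaviour the slide describes.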

12 of 44

MODELS

13 of 44

Architecture

1 Layer Transformer Architecture

We varied the Transformer's hyperparameters and trained 2 such models to compare their performance.

14 of 44

With Attention head 0

Reference: https://jalammar.github.io/illustrated-transformer/

15 of 44

heads(0-7)

Feed Forward Neural Network

Reference: https://jalammar.github.io/illustrated-transformer/

16 of 44

With Multi headed Self Attention

With 2 heads, the model captures the importance of the word "tired" as well as the importance of the word "animal".

If we add all the attention heads to the picture, however, things can be harder to interpret.

Reference: https://jalammar.github.io/illustrated-transformer/
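
To make the idea of several heads attending differently more concrete, here is a minimal pure-Python sketch of scaled dot-product self-attention split across heads. It is a simplification under stated assumptions: the learned query, key, value and output projections of a real Transformer are omitted (each head simply uses its slice of the input as Q, K and V), and the toy token vectors are made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, weights) as nested lists."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

def split_heads(vectors, num_heads):
    """Slice each vector into num_heads equal chunks, one list of vectors per head."""
    size = len(vectors[0]) // num_heads
    return [[vec[h * size:(h + 1) * size] for vec in vectors]
            for h in range(num_heads)]

def multi_head_self_attention(x, num_heads):
    """Self-attention with Q = K = V = the head's slice of x (projections omitted)."""
    head_outputs, head_weights = [], []
    for head in split_heads(x, num_heads):
        out, w = attention(head, head, head)
        head_outputs.append(out)
        head_weights.append(w)
    # Concatenate the per-head outputs back to the original width.
    merged = [sum((head_outputs[h][t] for h in range(num_heads)), [])
              for t in range(len(x))]
    return merged, head_weights

# Three hypothetical token vectors of width 4, attended to with 2 heads.
tokens = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 2.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
merged, head_weights = multi_head_self_attention(tokens, num_heads=2)
```

Comparing `head_weights[0]` with `head_weights[1]` shows that the two heads distribute attention over the same tokens differently, which is the effect the figure illustrates.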

17 of 44

Architecture

	8 Heads	8 Heads Big
Number of Layers	1	1
Number of Heads	8	8
Embedding Dimension	256	256
Key Dimension	32	32
Value Dimension	32	32
Number of Parallel Sentences	1,54,836	7,28,047
Epochs	10	10
Batch Size	256	256

18 of 44

Architecture

2 Layer Transformer Architecture

We varied the Transformer's hyperparameters and trained 4 such models to compare their performance.

19 of 44

Architecture

	4 Heads	8 Heads	8 Heads Big	8 Heads Dim Mod
Number of Layers	2	2	2	2
Number of Heads	4	8	4	8
Embedding Dimension	256	256	256	512
Key Dimension	32	32	32	64
Value Dimension	32	32	32	64
Number of Parallel Sentences	1,54,836	1,54,836	7,28,047	1,54,836
Epochs	10	10+10	10 (pretrained weights from 8-heads)+10	10+10
Batch Size	256	256	256	256
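
As a rough guide to how these settings change model size, the sketch below counts the weights in a single multi-head attention block for three of the configurations in the table. It assumes the common layout of separate query, key and value projections plus an output projection, with biases ignored; the slides do not describe the actual implementation, so the numbers are illustrative only.

```python
def attention_params(embed_dim, num_heads, key_dim, value_dim):
    """Weights in one multi-head attention block, biases ignored (assumed layout)."""
    query = embed_dim * num_heads * key_dim
    key = embed_dim * num_heads * key_dim
    value = embed_dim * num_heads * value_dim
    output = num_heads * value_dim * embed_dim
    return query + key + value + output

# Hyperparameters copied from the table above.
configs = {
    "4 Heads": dict(embed_dim=256, num_heads=4, key_dim=32, value_dim=32),
    "8 Heads": dict(embed_dim=256, num_heads=8, key_dim=32, value_dim=32),
    "8 Heads Dim Mod": dict(embed_dim=512, num_heads=8, key_dim=64, value_dim=64),
}
counts = {name: attention_params(**cfg) for name, cfg in configs.items()}
```

Under these assumptions, doubling the number of heads doubles the attention weights, and the Dim Mod setting has four times as many as the 8 Heads setting while being trained on the same 1,54,836 sentences.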

20 of 44

Architecture

3 Layer Transformer Architecture

Hyperparameters:

Number of Layers = 3

Number of Heads = 4

Embedding Dimension = 256

Key Dimension = 32

Value Dimension = 32

Parallel Sentences = 1,54,836

Epochs = 10

Batch Size = 256

21 of 44

Architecture

4 Layer Transformer Architecture

We varied the Transformer's hyperparameters and trained 2 such models to compare their performance.

22 of 44

Architecture

	4 Heads	4 Heads Big
Number of Layers	4	4
Number of Heads	4	4
Embedding Dimension	256	256
Key Dimension	32	32
Value Dimension	32	32
Number of Parallel Sentences	1,54,836	7,28,047
Epochs	10	10
Batch Size	256	256

23 of 44

RESULTS

24 of 44

Results

For 1 layer, we compare the accuracy of 8 Heads and 8 Heads Big.

For smaller dataset

For larger dataset

25 of 44

Results

For 1 layer, we compare the loss of 8 Heads and 8 Heads Big.

For smaller data

For larger data

26 of 44

Results

	8 Heads	8 Heads Big
BLEU	1.51	3.03
chrF2	25.52	29.33
TER	100.29	96.24
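
A TER above 100 can look surprising, so here is a small sketch of why it happens. It computes a simplified TER as word-level edit distance divided by the reference length; the real metric also allows block shifts, which this sketch omits, so it is an upper bound rather than the exact score behind the table. The function names and toy sentences are assumptions for illustration.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def simple_ter(hypothesis, reference):
    """Edits per reference word as a percentage; omits TER's block-shift operation."""
    hyp, ref = hypothesis.split(), reference.split()
    return 100.0 * edit_distance(hyp, ref) / len(ref)

# A degenerate hypothesis that repeats one word 12 times against a 3-word reference.
score = simple_ter("but " * 12, "i love you")  # 400.0
```

Because the number of edits is divided by the reference length, a hypothesis much longer than the reference needs more edits than there are reference words, which pushes the score past 100.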

27 of 44

Manual Evaluation of 8 Heads on Test Data

English Sentence	Bengali Translation	Adequacy	Fluency
Are we leaving for good?	আমরা ভাল যাব?	0	0
Investigation was taken away.	তদন্ত করে খুঁজতে তদন্ত করা হয়েছে।	0	0
The palace is an extended part of a huge complex.	প্রাসাদটি একটি জটিল জটিল একটি জটিল স্তর ধারণ করে।	0	0
Then he abruptly disappeared.	তারপর হঠাৎ হারিয়ে গেলো সে।	4	4
He had a rope tied around his waist.	তার কোমর ঘিরে ফেলে তিনি।	1	4
There have been numerous ideas and attempts to reduce the amount of carbon emissions.	অনেকগুলি অণুযায়ী অনেক কিছু সফ্টভ্যতার জন্য প্রয়োজনীয় উপাদান সরবরাহ করা হয়েছে।	0	0
In India, faith and Nature have had a deep link since ancient times.	ভারতে, বিশ্বাস এবং প্রকৃতি এবং প্রকৃতি যে প্রাচীন কালে ছিল ভারত।	2	1

28 of 44

Manual Evaluation of 8 Heads on Test Data

English Sentence	Bengali Translation	Adequacy	Fluency
What kind?	কোন ধরনের?	4	4
India has emerged as a bright spot in the global economy which is driving global growth as well.	ভারতে বিশ্ব অর্থনীতি সারা বিশ্বে একটি উজ্জ্বল শিল্প সঞ্চার করে এবং বিশ্বের অর্থনীতি বিশ্বব্যাপী বিনিয়োগ করছে।	2	2
Do you think it is possible for mere humans to come to know our almighty Creator, as stated here in the Bible?	আপনার কি মনে হয়, তুমি কি জানো আর আমাদের সৃষ্টিকর্তা সম্বন্ধে জানতে পারবে না?	3	2

29 of 44

Results

Heads	English	Bengali Translation: MBR Score(10 samples)	Reference Translation
8	I love you.	তোমায় ভালোবাসি।?: 0.83, আমি তোমায় ভালবাসি।: 0.83	আমি তোমাকে ভালোবাসি।
8	How are you.	তুমি কেমন আছ।: 0.93, তুমি কেমন আছ?: 0.89	তুমি কেমন আছো।
8	I am hungry.	আমি ক্ষুধার্ত মানুষ।: 0.80, আমি ক্ষুধার্ত।: 0.74	আমি ক্ষুধার্ত।
8	I am a boy.	আমি ছেলে।: 0.98, আমি তো ছেলে।: 0.85	আমি ছেলে।
8 Big	I love you.	ভালোবাসি।: 0.36, আমার সঙ্গে ভালবাসার সম্পর্ক।: 0.36	আমি তোমাকে ভালোবাসি।
8 Big	How are you.	কেমন আছো তুমি।: 0.99, তুমি কেমন আছো।: 0.99	তুমি কেমন আছো।
8 Big	Hyderabad is a beautiful city.	হায়দরাবাদের সুন্দর শহর।: 0.8576, হায়দ্রাবাদের একটি শহর।: 0.8573	হায়দ্রাবাদ একটি সুন্দর শহর।
8 Big	My name Rishi.	আমার নাম ঋষি।: 0.87, আমার নাম।: 0.83	আমার নাম ঋষি।

30 of 44

Results

For 2 layers, we compare the accuracy of 4 Heads and 8 Heads.

31 of 44

Results

For 2 layers, we compare the loss of 4 Heads and 8 Heads.

32 of 44

Results

For 2 layers, we compare the accuracy of 8 Heads and 8 Heads Big.

33 of 44

Results

For 2 layers, we compare the accuracy of 8 Heads Big and 8 Heads Dim Mod.

And the same trend holds for losses as well.

34 of 44

Results

	4 Heads	8 Heads	8 Heads Big	8 Heads Dim Mod
BLEU	0.80	1.07	2.82	0.06
chrF2	20.25	20.48	29.59	5.04
TER	107.92	98.69	93.91	99.45

35 of 44

Manual Evaluation of 8 Heads Big on Test Data

English Sentence	Bengali Translation	Adequacy	Fluency
His demise is anguishing.	তাঁর মৃত্যু মহাসমাঢ়।	4	3
This is the ninth interaction in the series by the Prime Minister through video conference with the beneficiaries of various Government schemes.	প্রধানমন্ত্রীর বিভিন্ন প্রকল্পের মাধ্যমে এই আলোচনা সভা ছাড়াও প্রধানমন্ত্রী বিভিন্ন ধরনের সচিবদের সঙ্গে আলাপ-আলোচনা করবেন।	0	1
He said that the Union Government is working with an approach of “isolation to integration” to develop all the hitherto under-developed parts of the country.	প্রধানমন্ত্রী বলেছেন, দেশের সার্বিক উন্নয়নের লক্ষ্যে কেন্দ্রীয় সরকার একযোগে কাজ করছে।	1	4
Imran khan taking oath	শপথ নিলেন ইমরান খান।	4	4
Samsung has been heavily rumoured to launch two new mid-end smartphones the Galaxy J7 (2017) and Galaxy J5 (2017).	স্যামসাং গ্যালাক্সি এম ০১ (২০১৭) এবং স্যামসাং (২০১৭-১৮৯), স্যামসাং এর দুটি নতুন স্মার্টফোন বাজারে এসেছে।	1	3
Hence, the people in the area are in panic.	ফলে আতঙ্কে রয়েছে এলাকাবাসী।	4	4
The issue has not come to my notice.	বিষয়টি আমার নজরে আসেনি।	4	4

36 of 44

Manual Evaluation of 8 Heads Big on Test Data

English Sentence	Bengali Translation	Adequacy	Fluency
But there is no use.	কিন্তু তাতে কোনও লাভ হয়নি।	2	4
I just hate feeling helpless.	আমি শুধু অসহায় বোধ করি।	1	4
I congratulate the Finance Minister Arun Jaitley Jee for presenting an excellent Budget.	এ নিয়ে বাজেট বক্তৃতায় প্রধানমন্ত্রী নরেন্দ্র মোদীর সঙ্গে বিভিন্ন বাজেটের জন্য শুভেচ্ছা জানাই।	0	0

37 of 44

Results

Heads	English	Bengali Translation: MBR Score(10 samples)	Reference Translation
4	I love you.	আমি তোমাকে ভালোবাসি।: 0.98, আমি তোমাকে ভালবাসি।: 0.97	আমি তোমাকে ভালোবাসি।
4	Thank You!	ধন্যবাদ!: 0.99, ধন্যবাদ তোমার!: 0.72	ধন্যবাদ!
4	Modiji is India's Prime Minister.	কিন্তু নরেন্দ্র মোদী সরকার।: 0.81, নরেন্দ্র মোদী সরকারের ভারত।: 0.77	মোদিজি ভারতের প্রধানমন্ত্রী
8	I am tired.	আমি ক্লান্ত হয়ে গেছি।: 0.93, ক্লান্ত হয়ে গেছি।: 0.86	আমি ক্লান্ত
8	How are you?	তুমি কেমন আছো?: 0.80, তুমি কেমন আছ?: 0.70	তুমি কেমন আছো
8	Hello!	হ্যালো!	হ্যালো!
8	Let's start!	আসুন শুরু যাক!	চল শুরু করি!
8 Big	Let's Start!	চলো, শুরু করি!: 0.72, চলো শুরু করছি!: 0.72	চল শুরু করি!
8 Big	I like Durga Puja very much.	আমি দুর্গা পূজা করতে খুব পছন্দ করি।: 0.76, আমার কাছে দুর্গা পূজা খুব ভাল লাগে।: 0.76, আমি দুর্গা পূজা খুব ভালোবাসি।: 0.75	আমি দুর্গা পূজা করতে খুব পছন্দ করি।
8 Big	I am hungry	আমি ক্ষুধার্ত।: 0.95, আমার ক্ষুধার্ত।: 0.87	আমি ক্ষুধার্ত।
8 Dim	I love you.	কিন্তু অপরিবর্তিত থাকতে পেরেছিলেন ঠিক।: 0.52, ১৩ ঈশ্বরের বাক্য বাইবেল জোর নেই।: 0.52	আমি তোমাকে ভালোবাসি।

38 of 44

Results

For 3 Layers & 4 Heads

39 of 44

Results

	4 Heads
BLEU	0.01
chrF2	2.85
TER	211.89

Heads	English	Bengali Translation: MBR Score(10 samples)	Reference Translation
4	I love you.	কলকাতা, নোয়াখালী মতো-এর নিচে পূর্ণ করে তাদের কম কাছে খুশি করা এবং অন্যটি যা বিভিন্ন পরিশ্রম হয়।: 0.37, তারা গোকীতে আমি এই ভূমিকা।: 0.37	আমি তোমাকে ভালোবাসি।

Manual Evaluation of all examples gives 0 Adequacy and 0 Fluency

40 of 44

Results

For 4 layers, we compare the accuracy of 4 Heads and 4 Heads Big.

41 of 44

Results

For 4 layers, we compare the loss of 4 Heads and 4 Heads Big.

42 of 44

Results

	4 Heads	4 Heads Big
BLEU	0.00	0.00
chrF2	3.29	2.18
TER	429.87	100.00

Heads	English	Bengali Translation: MBR Score(10 samples)	Reference Translation
4	I love you.	কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু কিন্তু	আমি তোমাকে ভালোবাসি।
4 Big	I love you.	দেশের ছোট সরকারি ওঁকে সামনে।: 0.43, রাস্তায় কোন তেমনটা কোন পেয়েছে পারে কারো এখানে এ গ্রেপ্তারবেন হয় করে সঙ্গে।: 0.43	আমি তোমাকে ভালোবাসি।

Manual Evaluation of all examples gives 0 Adequacy and 0 Fluency

43 of 44

Conclusion

The model has trouble fitting as it grows in complexity when there is too little data to train all the parameters.
Of all the models trained, it is quite striking that the least complex one, the Transformer with 1 layer and 8 heads, seems to perform the best. The likely reason is that the training data is not very large.
The model exposed to more data performs better than the one exposed to less data.
The model trained on more data translates named entities better, even ones it has not seen before in the data (such as the name Rishi, or place names like Hyderabad).
The model learns associations between words quite well, e.g. Modiji is converted to Narendra Modi in translations because that name appears many times in the dataset.
Larger models tend to be robust to punctuation marks.
The model has trouble discriminating between spellings like ভাল and ভালো.

44 of 44

English	Bengali Translation: MBR Score	Reference Translation
Thank You!	ধন্যবাদ!: 0.99, ধন্যবাদ তোমার!: 0.72	ধন্যবাদ!