
Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach

Authors:

G. M. Shahariar *

Tonmoy Talukder *

Rafin Alam Khan Sotez

Md. Tanvir Rouf Shawon

* denotes equal contribution

Presented by:

Tonmoy Talukder

Ahsanullah University of Science and Technology

Paper ID: 312

Contents of Presentation

  • Introduction
  • Related Work
  • Research Question
  • Objectives
  • Outcomes and Impacts
  • Dataset Preparation
  • Methodology
  • Conclusion
  • Future Research Direction
  • References


Introduction


Bengali Text Summarization


  • Languages with limited resources, such as Bengali, face challenges in developing accurate text summarization systems.

  • Pre-trained transformer models like BERT [1] and T5 [2] have improved Bengali text summarization by capturing contextual information.
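As a hedged illustration, the sketch below loads one such pre-trained multilingual model and summarizes a Bengali article with the Hugging Face transformers API. The checkpoint name is an assumption chosen for illustration (an mT5 model fine-tuned on XL-Sum); any comparable seq2seq summarizer could be substituted.

```python
# A minimal sketch of Bengali summarization with a pre-trained seq2seq
# transformer. The checkpoint below is an assumed example, not
# necessarily one of the models used in this work.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

CHECKPOINT = "csebuetnlp/mT5_multilingual_XLSum"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

def summarize(text: str) -> str:
    # Truncate the article to the encoder limit and beam-search a summary.
    inputs = tokenizer(text, max_length=512, truncation=True,
                       return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=400, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```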

Related Work

Sentence similarity measurement for Bengali abstractive text summarization [3]

  • Applies a sentence similarity measurement method using cosine similarity and word embeddings
  • Creates and uses a Bengali news corpus

  • Limitations: small corpus; no semantic or syntactic analysis

  • Differences:
    • That paper selects existing sentences; our paper generates summaries via a ranking-based approach
    • That paper: simple, fast, and domain-independent; our paper: more sophisticated and domain-specific

Automatic back transliteration of Romanized Bengali (Banglish) to Bengali [4]

  • Introduces automatic back transliteration of Romanized Bengali (Banglish) to Bengali using a ranking-based approach
  • Creates and uses a Banglish-Bengali parallel corpus

  • Limitations: limited corpus size; no handling of out-of-vocabulary words

  • Differences:
    • That paper converts Banglish to Bengali; our paper generates summaries from Bengali texts
    • That paper: solves a transliteration problem, domain-independent; our paper: solves a summarization problem, domain-specific

The evaluation of sentence similarity measures [5]

  • Evaluates various methods of measuring sentence similarity based on lexical, syntactic, semantic, and pragmatic features
  • Uses two datasets of sentence pairs from different domains and languages
  • Evaluates the methods by correlation with human judgments and by classification accuracy

  • Limitations: no analysis of the impact of individual features; no comparison with state-of-the-art methods; no application to specific tasks

  • Differences:
    • That paper measures sentence similarity; our paper generates summaries based on summary ranking
    • That paper: surveys existing methods, domain- and language-independent; our paper: proposes a new method, domain- and language-specific

Ranking paragraphs for improving answer recall in open-domain question answering [6]

  • Proposes a paragraph ranking model that uses query expansion and paragraph filtering techniques
  • Uses a large-scale corpus of web documents and questions
  • Evaluates the model on answer recall and F1-score

  • Limitations: no analysis of the impact of query expansion and paragraph filtering; no evaluation of answer quality or relevance

  • Differences:
    • That paper ranks paragraphs for question answering; our paper ranks summaries for text summarization
    • That paper: solves a question answering problem, domain- and language-independent; our paper: solves a text summarization problem, domain- and language-specific

Research Question

    • How can we select the most suitable summary for a given document in Bengali, a language with limited resources?

Objectives

    • Develop a novel ranking approach for summaries generated by pre-trained transformer models. The approach selects the most suitable summary based on its ranking score, allowing informative and coherent summaries to be identified.
    • Evaluate the effectiveness of the ranking approach using multiple metrics that measure the accuracy, fluency, and informativeness of the summaries.

Outcomes and Impacts

    • We proposed a rank-based approach that uses multiple pre-trained models to generate summaries and ranks them by quality.
    • The approach was evaluated with several metrics and shown to outperform existing methods.
    • The implementation of the approach is available for further research.

Dataset

Data Statistics

  • We have used the two datasets mentioned below:

Dataset | Total Summaries
XL-Sum [7] | 10126
Bangla Text Summarization [8] | 5000

Table 01: Dataset Statistics
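For reference, a sketch of how these corpora might be loaded. The Hub identifier for XL-Sum's Bengali configuration comes from its public release; the split choice, the Kaggle CSV file name, and its columns are assumptions for illustration.

```python
# A sketch of loading both corpora. The XL-Sum Bengali split is public on
# the Hugging Face Hub; the Kaggle corpus ships as a CSV whose file name
# is assumed here for illustration.
from datasets import load_dataset
import pandas as pd

xlsum_bn = load_dataset("csebuetnlp/xlsum", "bengali", split="test")
bangla_ts = pd.read_csv("bengali-text-summarization.csv")  # assumed file name

print(len(xlsum_bn))   # number of article/summary pairs
print(len(bangla_ts))
```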

Methodology

Methodology Flowchart

Figure 01: Flow chart of Summary Ranking.

Proposed Approach

Figure 02: Proposed Methodology.
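In outline, the pipeline generates one candidate summary per pre-trained model and keeps the candidate with the best ranking score. The sketch below illustrates that idea; scoring each candidate by BERTScore F1 against the source text is an assumption made for this sketch, not necessarily the exact ranking criterion used in the paper.

```python
# Illustrative summary-ranking pipeline: several models each propose a
# candidate, every candidate receives a ranking score, and the top-ranked
# candidate is returned. Using BERTScore F1 against the source document
# as the score is an assumption for this sketch.
from bert_score import score as bert_score

def rank_summaries(document: str, summarizers) -> str:
    # `summarizers` is a list of callables, e.g. [summarize_a, summarize_b].
    candidates = [summarize(document) for summarize in summarizers]
    _, _, f1 = bert_score(candidates, [document] * len(candidates), lang="bn")
    best = int(f1.argmax())
    return candidates[best]
```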

Output Examples

Figure 03: Example of a few candidate summaries generated by all the models along with the reference and best-ranked summary on two randomly picked newspaper texts.


Experimental Results

Hyperparameter Settings

Parameter | Value
Maximum output token length | 400
Minimum output token length | 64
Maximum input token length | 512
no_repeat_ngram_size | 2
Beam size | 4
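These settings map directly onto Hugging Face's generate API. A minimal sketch, assuming the `model`, `tokenizer`, and an `article` string from the earlier example:

```python
# Applying the decoding settings from the table above (a sketch; assumes
# `model`, `tokenizer`, and `article` are defined as in earlier examples).
inputs = tokenizer(article, max_length=512, truncation=True,  # max input tokens
                   return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_length=400,          # maximum output token length
    min_length=64,           # minimum output token length
    no_repeat_ngram_size=2,  # never repeat any 2-gram in the output
    num_beams=4,             # beam size
)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
```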

Evaluation Metrics

  • BLEU Score (Bilingual Evaluation Understudy) [9]
  • ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [10]
  • BERTScore [11]
  • METEOR (Metric for Evaluation of Translation with Explicit Ordering) [12]
  • WER (Word Error Rate) [13]
  • WIL (Word Information Lost) [13]
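As a sketch, all six metrics can be computed for a candidate/reference pair with off-the-shelf libraries; the `evaluate` and `jiwer` packages below are an assumed tooling choice, not necessarily what was used in the paper.

```python
# Computing the listed metrics for one candidate/reference pair (a sketch;
# the `evaluate` and `jiwer` packages are an assumed tooling choice).
import evaluate
import jiwer

candidate = ["..."]  # generated summary (placeholder)
reference = ["..."]  # reference summary (placeholder)

bleu = evaluate.load("bleu").compute(predictions=candidate, references=reference)
rouge = evaluate.load("rouge").compute(predictions=candidate, references=reference)
meteor = evaluate.load("meteor").compute(predictions=candidate, references=reference)
bertscore = evaluate.load("bertscore").compute(
    predictions=candidate, references=reference, lang="bn")

wer = jiwer.wer(reference[0], candidate[0])  # Word Error Rate
wil = jiwer.wil(reference[0], candidate[0])  # Word Information Lost
```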

Performance Measurements

XLSum Dataset

Summary | WIL | METEOR | WER | BERTScore (F1)
Given Summary | 0.0099 | 0.196 | 0.0098 | 0.673
Best Summary | 0.0095 | 0.347 | 0.0094 | 0.723
Model A | 0.0098 | 0.320 | 0.0097 | 0.716
Model B | 0.0098 | 0.296 | 0.0097 | 0.709
Model C | 0.0081 | 0.579 | 0.0081 | 0.625
Model D | 0.0100 | 0.025 | 0.0099 | 0.625

Bangla Text Summarization Dataset

Summary | WIL | METEOR | WER | BERTScore (F1)
Given Summary | 0.0098 | 0.278 | 0.0097 | 0.651
Best Summary | 0.0092 | 0.361 | 0.0090 | 0.725
Model A | 0.0095 | 0.332 | 0.0092 | 0.715
Model B | 0.0095 | 0.326 | 0.0093 | 0.714
Model C | 0.0082 | 0.489 | 0.0079 | 0.765
Model D | 0.0099 | 0.032 | 0.0098 | 0.624

Table 02: Performance comparison between the input text and all the summaries on the two datasets

XLSum Dataset

Summary | WIL | METEOR | WER | BERTScore (F1)
Best Summary | 0.0095 | 0.189 | 0.017 | 0.749
Model A | 0.0095 | 0.182 | 0.012 | 0.750
Model B | 0.0097 | 0.143 | 0.012 | 0.735
Model C | 0.0099 | 0.108 | 0.051 | 0.679
Model D | 0.0100 | 0.007 | 0.019 | 0.619

Bangla Text Summarization Dataset

Summary | WIL | METEOR | WER | BERTScore (F1)
Best Summary | 0.0094 | 0.192 | 0.040 | 0.708
Model A | 0.0095 | 0.164 | 0.031 | 0.701
Model B | 0.0095 | 0.163 | 0.031 | 0.702
Model C | 0.0096 | 0.185 | 0.078 | 0.681
Model D | 0.0099 | 0.033 | 0.052 | 0.635

Table 03: Performance comparison between the reference and all other summaries (candidate and best-ranked) on the two datasets

XLSum Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Best Summary | 0.783 | 0.0496 | r-1 | 0.313 | 0.222 | 0.249
| | | r-2 | 0.132 | 0.096 | 0.107
| | | r-l | 0.260 | 0.186 | 0.208

Bangla Text Summarization Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Best Summary | 0.0300 | 0.0130 | r-1 | 0.433 | 0.118 | 0.184
| | | r-2 | 0.176 | 0.044 | 0.069
| | | r-l | 0.392 | 0.107 | 0.167

Table 04: BLEU and ROUGE scores comparison between the reference and all other summaries (candidate and best-ranked) on the two datasets

XLSum Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model A | 0.0765 | 0.0463 | r-1 | 0.288 | 0.227 | 0.245
| | | r-2 | 0.125 | 0.096 | 0.105
| | | r-l | 0.245 | 0.191 | 0.208

Bangla Text Summarization Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model A | 0.0253 | 0.0108 | r-1 | 0.369 | 0.107 | 0.165
| | | r-2 | 0.144 | 0.038 | 0.060
| | | r-l | 0.337 | 0.098 | 0.151

Table 04 (continued): Model A

XLSum Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model B | 0.0502 | 0.029 | r-1 | 0.235 | 0.187 | 0.201
| | | r-2 | 0.088 | 0.068 | 0.074
| | | r-l | 0.197 | 0.155 | 0.168

Bangla Text Summarization Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model B | 0.0248 | 0.0102 | r-1 | 0.367 | 0.108 | 0.166
| | | r-2 | 0.141 | 0.038 | 0.059
| | | r-l | 0.332 | 0.098 | 0.151

Table 04 (continued): Model B

XLSum Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model C | 0.0125 | 0.0064 | r-1 | 0.277 | 0.075 | 0.112
| | | r-2 | 0.072 | 0.018 | 0.027
| | | r-l | 0.202 | 0.055 | 0.082

Bangla Text Summarization Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model C | 0.0200 | 0.0089 | r-1 | 0.454 | 0.080 | 0.132
| | | r-2 | 0.187 | 0.029 | 0.049
| | | r-l | 0.415 | 0.073 | 0.121

Table 04 (continued): Model C

XLSum Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model D | 1.13E-05 | 2.91E-82 | r-1 | 0.017 | 0.010 | 0.012
| | | r-2 | 0.001 | 0.000 | 0.000
| | | r-l | 0.016 | 0.010 | 0.012

Bangla Text Summarization Dataset

Summary Model | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model D | 0.0009 | 0.0001 | r-1 | 0.099 | 0.020 | 0.034
| | | r-2 | 0.016 | 0.003 | 0.005
| | | r-l | 0.093 | 0.019 | 0.032

Table 04 (continued): Model D


Best Summaries Statistics

Figure 04: Statistics of the summaries per model that are selected by our approach on both datasets.

Conclusion

  • Text summarization is a valuable tool for condensing large amounts of text and extracting key information.
  • Low-resource languages like Bengali pose unique challenges for text summarization.
  • A rank-based approach that leverages multiple models and selects the best summary can enhance the accuracy and quality of the generated summaries.

Future Research Direction

  • Using different pre-trained transformer models to generate summaries.
  • Developing more sophisticated ranking algorithms to select the best summary.
  • Applying the rank-based approach to other low-resource languages.
  • Investigating the impact of the rank-based approach on the accuracy and quality of the generated summaries.

References

[1] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics (Jun 2019)

[2] Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mT5: A massively multilingual pre-trained text-to-text transformer. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics (Jun 2021)

[3] Masum, A.K.M., Abujar, S., Tusher, R.T.H., Faisal, F., Hossain, S.A.: Sentence similarity measurement for Bengali abstractive text summarization. In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). pp. 1–5. IEEE (2019)

[4] Shibli, G.S., Shawon, M.T.R., Nibir, A.H., Miandad, M.Z., Mandal, N.C.: Automatic back transliteration of Romanized Bengali (Banglish) to Bengali. Iran Journal of Computer Science pp. 1–12 (2022)

[5] Achananuparp, P., Hu, X., Shen, X.: The evaluation of sentence similarity measures. In: Data Warehousing and Knowledge Discovery: 10th International Conference, DaWaK 2008 Turin, Italy, September 2-5, 2008 Proceedings 10. pp. 305–316. Springer (2008)

[6] Lee, J., Yun, S., Kim, H., Ko, M., Kang, J.: Ranking paragraphs for improving answer recall in open-domain question answering. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (Oct-Nov 2018)


[7] Hasan, T., Bhattacharjee, A., Islam, M.S., Mubasshir, K., Li, Y.F., Kang, Y.B., Rahman, M.S., Shahriyar, R.: XL-sum: Large-scale multilingual abstractive summarization for 44 languages. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics (Aug 2021)

[8] Bengali Text Summarization (n.d.). Bengali Text Summarization | Kaggle. https://www.kaggle.com/datasets/hasanmoni/bengali-text-summarization

[9] Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. pp. 311–318 (2002)

[10] Lin, C.Y.: ROUGE: A package for automatic evaluation of summaries. In: Text Summarization Branches Out. pp. 74–81 (2004)

[11] Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: BERTScore: Evaluating text generation with BERT. In: International Conference on Learning Representations (2020), https://openreview.net/forum?id=SkeHuCVFDr

[12] Banerjee, S., Lavie, A.: METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. pp. 65–72 (2005)

[13] Morris, A.C., Maier, V., Green, P.: From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition. In: Eighth International Conference on Spoken Language Processing (2004)


THANKS!

Any questions?