Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach
Authors:
G. M. Shahariar *
Tonmoy Talukder *
Rafin Alam Khan Sotez
Md. Tanvir Rouf Shawon
* denotes equal contribution
Presented by:
Tonmoy Talukder
Ahsanullah University of Science and Technology
Paper ID: 312
2
Contents of Presentation
Introduction
3
Bengali Text Summarization
4
Related Work
5
Sentence similarity measurement for Bengali abstractive text summarization. [3]
6
Automatic back transliteration of romanized Bengali (Banglish) to Bengali. [4]
7
The evaluation of sentence similarity measures. [5]
8
Ranking paragraphs for improving answer recall in open-domain question answering. [6]
9
10
Research Question
11
Objectives
12
Outcomes and Impacts
Dataset
13
Data Statistics
14
Dataset | Total Summaries
XL-Sum [7] | 10126
Bangla Text Summarization [8] | 5000
Table 01: Dataset Statistics
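Both corpora are publicly available; below is a minimal loading sketch. The Hugging Face id csebuetnlp/xlsum is the public release of XL-Sum [7]; the Kaggle corpus [8] is assumed to be downloaded locally, and its CSV filename here is a placeholder.
```python
# Sketch: loading the two corpora in Table 01. "csebuetnlp/xlsum" is the
# public Hugging Face release of XL-Sum [7]; the Kaggle corpus [8] is
# assumed to be downloaded locally (the CSV filename is a placeholder).
from datasets import load_dataset
import pandas as pd

xlsum = load_dataset("csebuetnlp/xlsum", "bengali")     # XL-Sum, Bengali split [7]
kaggle = pd.read_csv("bengali-text-summarization.csv")  # Kaggle corpus [8], local path assumed

print(len(xlsum["test"]), "XL-Sum test pairs loaded")
print(len(kaggle), "Kaggle pairs loaded")
```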
Methodology
15
Methodology Flowchart
16
Figure 01: Flowchart of summary ranking.
Proposed Approach
17
Figure 02: Proposed Methodology.
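To make the ranking step concrete, here is a minimal sketch that scores every candidate summary against the source text and keeps the top-scoring one. Scoring by BERTScore F1 [11] with its multilingual fallback model is an illustrative assumption, and rank_summaries is a hypothetical helper; the actual ranking criterion is the one defined by the flowchart in Figure 01.
```python
# Minimal sketch of the ranking step: score each candidate summary
# against the source text and keep the best one. BERTScore F1 [11] is an
# assumed scoring function here, not necessarily the paper's exact one.
from bert_score import score

def rank_summaries(source_text, candidates):
    """Return the candidate that scores highest against the source text."""
    sources = [source_text] * len(candidates)
    _, _, f1 = score(candidates, sources, lang="bn")  # returns P, R, F1 tensors
    return candidates[int(f1.argmax())]

# Usage: candidates produced by the four summarization models
# best = rank_summaries(article, [cand_a, cand_b, cand_c, cand_d])
```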
Output Examples
18
Figure 03: Example of a few candidate summaries generated by all the models along with the reference and best-ranked summary on two randomly picked newspaper texts.
Result Diagram
Experimental Results
19
Hyperparameter Settings
20
Hyperparameter | Value
Maximum output token length | 400
Minimum output token length | 64
Maximum input token length | 512
no_repeat_ngram_size | 2
Beam size | 4
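These settings map directly onto Hugging Face generate() arguments. A sketch follows, under the assumption of an mT5-style seq2seq checkpoint [2]; the model name and input text are placeholders.
```python
# Sketch: generating one candidate with the hyperparameters above.
# The checkpoint id and the input text are placeholders; any Bengali
# seq2seq summarizer (e.g., an mT5 fine-tune [2]) exposes this interface.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "your-bengali-summarizer"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "<Bengali newspaper text>"    # placeholder input
inputs = tokenizer(article, truncation=True, max_length=512,  # max input tokens
                   return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_length=400,            # maximum output token length
    min_length=64,             # minimum output token length
    no_repeat_ngram_size=2,
    num_beams=4,               # beam size
)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
```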
Evaluation Metrics
21
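All of the metrics reported below (WER/WIL [13], METEOR [12], BLEU [9], ROUGE [10], BERTScore [11]) can be computed with standard open-source libraries. A hedged sketch follows; evaluate_pair is a hypothetical helper, and note that rouge_score's default tokenizer and NLTK's METEOR synonym matching are English-oriented, so Bengali text may need a custom tokenizer.
```python
# Sketch: computing the reported metrics with standard libraries:
# jiwer (WER/WIL [13]), nltk (METEOR [12], BLEU [9]),
# rouge_score (ROUGE [10]), bert_score (BERTScore [11]).
import jiwer
from nltk.translate.meteor_score import meteor_score   # nltk.download('wordnet') may be needed
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer
from bert_score import score as bert_score

def evaluate_pair(reference, hypothesis):
    """Hypothetical helper: score one hypothesis against one reference."""
    rouge = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"])
    return {
        "WER": jiwer.wer(reference, hypothesis),
        "WIL": jiwer.wil(reference, hypothesis),
        "METEOR": meteor_score([reference.split()], hypothesis.split()),
        "BLEU-3": sentence_bleu([reference.split()], hypothesis.split(),
                                weights=(1/3, 1/3, 1/3)),
        "BLEU-4": sentence_bleu([reference.split()], hypothesis.split(),
                                weights=(0.25, 0.25, 0.25, 0.25)),
        # Caveat: rouge_score's default tokenizer targets English text.
        "ROUGE": rouge.score(reference, hypothesis),
        "BERTScore-F1": float(bert_score([hypothesis], [reference], lang="bn")[2][0]),
    }
```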
Performance Measurements
22
Table 02: Performance comparison between the input text and all the summaries on two datasets (BTS = Bangla Text Summarization dataset).
Summary | XLSum: WIL | XLSum: METEOR | XLSum: WER | XLSum: BERTScore (F1) | BTS: WIL | BTS: METEOR | BTS: WER | BTS: BERTScore (F1)
Given Summary | 0.0099 | 0.196 | 0.0098 | 0.673 | 0.0098 | 0.278 | 0.0097 | 0.651
Best Summary | 0.0095 | 0.347 | 0.0094 | 0.723 | 0.0092 | 0.361 | 0.0090 | 0.725
Model A | 0.0098 | 0.320 | 0.0097 | 0.716 | 0.0095 | 0.332 | 0.0092 | 0.715
Model B | 0.0098 | 0.296 | 0.0097 | 0.709 | 0.0095 | 0.326 | 0.0093 | 0.714
Model C | 0.0081 | 0.579 | 0.0081 | 0.625 | 0.0082 | 0.489 | 0.0079 | 0.765
Model D | 0.0100 | 0.025 | 0.0099 | 0.625 | 0.0099 | 0.032 | 0.0098 | 0.624
23
Summary | XLSum: WIL | XLSum: METEOR | XLSum: WER | XLSum: BERTScore (F1) | BTS: WIL | BTS: METEOR | BTS: WER | BTS: BERTScore (F1)
Best Summary | 0.0095 | 0.189 | 0.017 | 0.749 | 0.0094 | 0.192 | 0.040 | 0.708
Model A | 0.0095 | 0.182 | 0.012 | 0.750 | 0.0095 | 0.164 | 0.031 | 0.701
Model B | 0.0097 | 0.143 | 0.012 | 0.735 | 0.0095 | 0.163 | 0.031 | 0.702
Model C | 0.0099 | 0.108 | 0.051 | 0.679 | 0.0096 | 0.185 | 0.078 | 0.681
Model D | 0.0100 | 0.007 | 0.019 | 0.619 | 0.0099 | 0.033 | 0.052 | 0.635
Table 03: Performance comparison between the reference and all other summaries (candidate and best-ranked) on two datasets (BTS = Bangla Text Summarization dataset).
24
Summary | Dataset | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Best Summary | XLSum | 0.0783 | 0.0496 | r-1 | 0.313 | 0.222 | 0.249
| | | | r-2 | 0.132 | 0.096 | 0.107
| | | | r-l | 0.260 | 0.186 | 0.208
Best Summary | Bangla Text Summarization | 0.0300 | 0.0130 | r-1 | 0.433 | 0.118 | 0.184
| | | | r-2 | 0.176 | 0.044 | 0.069
| | | | r-l | 0.392 | 0.107 | 0.167
Table 04: BLEU and ROUGE scores comparison between the reference and all other summaries (candidate and best-ranked) on two datasets.
25
Summary | Dataset | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model A | XLSum | 0.0765 | 0.0463 | r-1 | 0.288 | 0.227 | 0.245
| | | | r-2 | 0.125 | 0.096 | 0.105
| | | | r-l | 0.245 | 0.191 | 0.208
Model A | Bangla Text Summarization | 0.0253 | 0.0108 | r-1 | 0.369 | 0.107 | 0.165
| | | | r-2 | 0.144 | 0.038 | 0.060
| | | | r-l | 0.337 | 0.098 | 0.151
Table 04 (continued).
26
Summary | Dataset | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model B | XLSum | 0.0502 | 0.029 | r-1 | 0.235 | 0.187 | 0.201
| | | | r-2 | 0.088 | 0.068 | 0.074
| | | | r-l | 0.197 | 0.155 | 0.168
Model B | Bangla Text Summarization | 0.0248 | 0.0102 | r-1 | 0.367 | 0.108 | 0.166
| | | | r-2 | 0.141 | 0.038 | 0.059
| | | | r-l | 0.332 | 0.098 | 0.151
Table 04 (continued).
27
Summary | Dataset | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model C | XLSum | 0.0125 | 0.0064 | r-1 | 0.277 | 0.075 | 0.112
| | | | r-2 | 0.072 | 0.018 | 0.027
| | | | r-l | 0.202 | 0.055 | 0.082
Model C | Bangla Text Summarization | 0.0200 | 0.0089 | r-1 | 0.454 | 0.080 | 0.132
| | | | r-2 | 0.187 | 0.029 | 0.049
| | | | r-l | 0.415 | 0.073 | 0.121
Table 04 (continued).
28
Summary | Dataset | BLEU-3 | BLEU-4 | ROUGE | Recall | Precision | F1 Score
Model D | XLSum | 1.13E-05 | 2.91E-82 | r-1 | 0.017 | 0.010 | 0.012
| | | | r-2 | 0.001 | 0.000 | 0.000
| | | | r-l | 0.016 | 0.010 | 0.012
Model D | Bangla Text Summarization | 0.0009 | 0.0001 | r-1 | 0.099 | 0.020 | 0.034
| | | | r-2 | 0.016 | 0.003 | 0.005
| | | | r-l | 0.093 | 0.019 | 0.032
Table 04 (continued).
29
Best Summaries Statistics
30
Figure 04: Statistics of the summaries per model that are selected by our approach on both datasets.
31
Conclusion
32
Future Research Direction
References
33
[1] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics (Jun 2019)
[2] Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mT5: A massively multilingual pre-trained text-to-text transformer. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics (Jun 2021)
[3] Masum, A.K.M., Abujar, S., Tusher, R.T.H., Faisal, F., Hossain, S.A.: Sentence similarity measurement for Bengali abstractive text summarization. In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). pp. 1–5. IEEE (2019)
[4] Shibli, G.S., Shawon, M.T.R., Nibir, A.H., Miandad, M.Z., Mandal, N.C.: Automatic back transliteration of romanized Bengali (Banglish) to Bengali. Iran Journal of Computer Science pp. 1–12 (2022)
[5] Achananuparp, P., Hu, X., Shen, X.: The evaluation of sentence similarity measures. In: Data Warehousing and Knowledge Discovery: 10th International Conference, DaWaK 2008 Turin, Italy, September 2-5, 2008 Proceedings 10. pp. 305–316. Springer (2008)
[6] Lee, J., Yun, S., Kim, H., Ko, M., Kang, J.: Ranking paragraphs for improving answer recall in open-domain question answering. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (Oct-Nov 2018)
References
34
[7] Hasan, T., Bhattacharjee, A., Islam, M.S., Mubasshir, K., Li, Y.F., Kang, Y.B., Rahman, M.S., Shahriyar, R.: XL-Sum: Large-scale multilingual abstractive summarization for 44 languages. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics (Aug 2021)
[8] Bengali Text Summarization dataset. Kaggle (n.d.), https://www.kaggle.com/datasets/hasanmoni/bengali-text-summarization
[9] Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. pp. 311–318 (2002)
[10] Lin, C.Y.: ROUGE: A package for automatic evaluation of summaries. In: Text Summarization Branches Out. pp. 74–81 (2004)
[11] Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: BERTScore: Evaluating text generation with BERT. In: International Conference on Learning Representations (2020), https://openreview.net/forum?id=SkeHuCVFDr
[12] Banerjee, S., Lavie, A.: METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. pp. 65–72 (2005)
[13] Morris, A.C., Maier, V., Green, P.: From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition. In: Eighth International Conference on Spoken Language Processing (2004)
THANKS!
Any questions?