X International Conference "Information Technology and Implementation" (IT&I-2023), Kyiv, Ukraine
Estimation of the Factual Correctness of Summaries of a Ukrainian-language Silver Standard Corpus
Artem Kramov, Seraf AI LLC
Dedicated to the tenth anniversary of the Faculty of Information Technology
The essence of the problem
The essence of the problem (1)
Estimating the factual correctness of datasets is especially important for low-resource languages such as Ukrainian, because in most cases such datasets (e.g., XL-Sum) are constructed in a self-supervised manner.
The goals of the work
Literature analysis
SummaC method
XL-Sum dataset
Results – Inconsistent summaries discrimination
For each document–summary pair in the dataset, another summary was picked whose ROUGE-1 F1 score against the document was higher than that of the original summary. The task tested whether the metric could identify the original summary by assigning it a higher value.
Metric | Model                                 | Accuracy, % | PCC
       | paraphrase-multilingual-mpnet-base-v2 | 85.855      | 0.168
       | distiluse-base-multilingual-cased-v2  | 81.655      | 0.273
       | xlm-roberta-large-xnli                | 75.664      | -0.075
Results - XL-Sum dataset
Results – Pre-trained model
Results – Metrics for the dataset and model
To compare the results between ground-truth and model-generated summaries, the median was taken as the average score, and the interquartile range (IQR) as the measure of the metric's spread.
Summaries | Median | IQR |
Ground-truth | 0.848 | 0.248 |
Model-generated | 0.958 | 0.186 |
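The median and IQR statistics reported above can be computed as in this minimal sketch; `median_and_iqr` is a hypothetical helper, and the exact quantile-estimation method used in the work is an assumption (the standard library's default is used here).

```python
import statistics


def median_and_iqr(scores: list[float]) -> tuple[float, float]:
    """Median as the average score; IQR (Q3 - Q1) as the spread measure."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # quartiles of the score list
    return statistics.median(scores), q3 - q1
```

Median and IQR are preferred over mean and standard deviation here because they are robust to the skewed, outlier-heavy score distributions that consistency metrics tend to produce.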
Conclusions
Limitations and future directions
Authors
Artem Kramov, Ph.D. in Information Technology, large language model engineer at Seraf AI LLC
artemkramov@gmail.com