Smart Slides (GPT)

Introduction to BLEU and ROUGE Metrics

BLEU (Bilingual Evaluation Understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) are popular metrics for evaluating the quality of text generated by machine learning models.
They are widely used in tasks such as machine translation and text summarization.

BLEU Metric

BLEU is a precision-oriented metric that measures what fraction of the n-grams in the machine-generated text also appear in the human-written reference text.
The score ranges from 0 to 1, with 1 indicating a perfect match with the reference.
Mathematically, the BLEU score is calculated as: BLEU = BP * exp(sum_{n=1}^{N} w_n * log p_n), where BP is the brevity penalty, p_n is the modified (clipped) n-gram precision, and w_n are positive weights that sum to 1 (commonly uniform, w_n = 1/N with N = 4).
The brevity penalty is BP = 1 if c > r and BP = exp(1 - r/c) otherwise, where c is the length of the candidate and r is the length of the reference.
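
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes pre-tokenized input, a single reference, uniform weights w_n = 1/N, no smoothing, and the standard brevity penalty (1 when the candidate is longer than the reference, exp(1 - r/c) otherwise). The function names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """BLEU for one candidate against one reference, with uniform weights 1/max_n."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 here
        log_precision_sum += (1.0 / max_n) * math.log(matches / total)

    c, r = len(candidate), len(reference)
    # Brevity penalty: only candidates no longer than the reference are penalized.
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(log_precision_sum)
```

Because no smoothing is applied, any candidate with zero matches at some n-gram order scores 0, which is why production implementations usually score at the corpus level or add smoothing for short sentences.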

ROUGE Metric

ROUGE is a recall-oriented metric that measures what fraction of the n-grams in the human-written reference text also appear in the machine-generated text.
There are several types of ROUGE metrics, including ROUGE-N (n-gram overlap), ROUGE-L (longest common subsequence), and ROUGE-S (skip-bigram co-occurrence).
Mathematically, the ROUGE-N score is calculated as: ROUGE-N = [sum_{S in references} sum_{gram_n in S} Count_match(gram_n)] / [sum_{S in references} sum_{gram_n in S} Count(gram_n)], where Count_match(gram_n) is the maximum number of n-grams co-occurring in a candidate summary and a set of reference summaries, and Count(gram_n) is the number of n-grams in the reference summary.
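
As a rough illustration of the ROUGE-N formula above, the sketch below sums clipped n-gram matches over all references and divides by the total number of reference n-grams. It assumes pre-tokenized input and reports recall only; the names `ngram_counts` and `rouge_n` are hypothetical, and packaged implementations typically also report precision and F1.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """ROUGE-N recall of one candidate against a list of tokenized references."""
    cand_counts = ngram_counts(candidate, n)
    matched = 0
    total = 0
    for reference in references:
        ref_counts = ngram_counts(reference, n)
        # Count_match: a reference n-gram is matched at most as many times
        # as it occurs in the candidate.
        matched += sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
        # Count: every n-gram in the reference contributes to the denominator.
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For the candidate "the cat sat on the mat" and the single reference "the cat is on the mat", five of the six reference unigrams are matched, so ROUGE-1 recall is 5/6.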

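Using BLEU and ROUGE with Hugging Face

The Hugging Face `evaluate` library provides easy-to-use implementations of the BLEU and ROUGE metrics; older releases of the `datasets` library exposed them through `load_metric`, which has since been deprecated.
After loading a metric, you can score a list of model predictions against a list of reference sentences by calling its `compute` method.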
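
A minimal sketch of that workflow is shown below. It assumes the separate `evaluate` package, where Hugging Face now maintains these metrics (older `datasets` releases offered them via `load_metric`), plus the `rouge_score` dependency and network access the first time each metric is loaded. The wrapper name `score_with_hugging_face` is hypothetical.

```python
def score_with_hugging_face(predictions, references):
    """Compute BLEU and ROUGE for parallel lists of predictions and references.

    Assumption: `pip install evaluate rouge_score` has been run and the metric
    scripts can be fetched on first use.
    """
    import evaluate  # imported lazily so this module loads without the dependency

    bleu = evaluate.load("bleu")
    rouge = evaluate.load("rouge")
    return {
        # Both metrics take the same keyword arguments in `compute`.
        "bleu": bleu.compute(predictions=predictions, references=references),
        "rouge": rouge.compute(predictions=predictions, references=references),
    }
```

Calling `score_with_hugging_face(["the cat sat on the mat"], [["the cat is on the mat"]])` should return one dictionary per metric; the BLEU result includes a `bleu` key and the ROUGE result includes keys such as `rouge1`, `rouge2`, and `rougeL`.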
