Smart Slides (GPT)

Introduction to BLEU and ROUGE Metrics

  • BLEU (Bilingual Evaluation Understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) are popular metrics for evaluating the quality of text generated by machine learning models.
  • They are widely used in tasks such as machine translation and text summarization.

BLEU Metric

  • BLEU is a precision-oriented metric that measures how many n-grams in the machine-generated text also appear in the human-generated reference text.
  • The score ranges from 0 to 1, with 1 indicating a perfect match with the reference.
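  • Mathematically, the BLEU score is calculated as: BLEU = BP * exp(sum_{n=1}^{N} w_n * log p_n), where p_n is the modified (clipped) precision of n-grams of order n, w_n are the weights for each n-gram precision (typically uniform, w_n = 1/N with N = 4), and BP is the brevity penalty: BP = 1 if the candidate length c is greater than the reference length r, and BP = exp(1 - r/c) otherwise.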
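
As a rough illustration of the formula, here is a minimal sentence-level sketch in plain Python. It assumes a single reference, whitespace-tokenized input, uniform weights, and no smoothing; the function names `ngram_counts` and `bleu` are made up for this example and are not taken from any library.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference (illustrative sketch only)."""
    weight = 1.0 / max_n  # assumption: uniform weights w_n = 1/N
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero precision gives a zero score
        log_sum += weight * math.log(clipped / total)
    c, r = len(candidate), len(reference)
    # Brevity penalty: 1 if the candidate is longer than the reference.
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(log_sum)


reference = "the cat sat on the mat".split()
print(bleu(reference, reference))                     # identical texts
print(bleu("the the the the".split(), reference, 1))  # clipping and brevity penalty
```

In the second call, the word "the" is counted only twice (its count in the reference), so the unigram precision is 2/4 rather than 4/4, and the short candidate is further penalized by BP.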

ROUGE Metric

  • ROUGE is a recall-oriented metric that measures how many n-grams in the human-generated reference text also appear in the machine-generated text.
  • There are several types of ROUGE metrics, including ROUGE-N, ROUGE-L, and ROUGE-S.
  • Mathematically, the ROUGE-N score is calculated as: ROUGE-N = [sum_{S in references} sum_{gram_n in S} Count_match(gram_n)] / [sum_{S in references} sum_{gram_n in S} Count(gram_n)], where Count_match(gram_n) is the maximum number of times the n-gram gram_n co-occurs in the candidate summary and the reference summary S, and Count(gram_n) is the number of times gram_n occurs in S.
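
The ROUGE-N formula can be sketched in a few lines of plain Python. This is a minimal illustration that assumes whitespace-tokenized input and no stemming or stopword handling; the names `ngram_counts` and `rouge_n` are hypothetical helpers for this example, not a library API.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """ROUGE-N recall of a candidate against a list of references (sketch)."""
    cand = ngram_counts(candidate, n)
    matched = 0  # numerator: reference n-grams also found in the candidate
    total = 0    # denominator: all n-grams in the references
    for ref_tokens in references:
        ref = ngram_counts(ref_tokens, n)
        # Count_match: an n-gram matches at most as often as it occurs in both texts.
        matched += sum(min(count, cand[gram]) for gram, count in ref.items())
        total += sum(ref.values())
    return matched / total if total else 0.0


candidate = "the cat was found under the bed".split()
references = ["the cat was under the bed".split()]
print(rouge_n(candidate, references, n=1))  # every reference unigram is covered
print(rouge_n(candidate, references, n=2))  # 4 of the 5 reference bigrams are covered
```

Because the denominator counts n-grams in the references rather than in the candidate, the extra word "found" does not lower the ROUGE-1 score; it only breaks one reference bigram for ROUGE-2.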

Using BLEU and ROUGE with Hugging Face

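  • The Hugging Face `evaluate` library provides easy-to-use implementations of the BLEU and ROUGE metrics. These metrics were previously loaded through `datasets.load_metric`, which has since been deprecated in favor of `evaluate`.
  • After loading a metric with `evaluate.load`, you can compute it for a list of model predictions and reference sentences using the `compute` method.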
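
A minimal sketch of this workflow follows. It assumes the separate `evaluate` package (where Hugging Face now maintains these metrics) and the `rouge_score` dependency are installed, and that the metric scripts can be downloaded on first use. The wrapper name `score_with_hugging_face` is made up for this example.

```python
def score_with_hugging_face(predictions, references):
    """Return corpus-level BLEU and ROUGE scores for parallel lists of strings."""
    # Assumption: `pip install evaluate rouge_score` has been run beforehand.
    import evaluate

    bleu = evaluate.load("bleu")
    rouge = evaluate.load("rouge")

    # BLEU allows several references per prediction, hence the nested list.
    bleu_result = bleu.compute(
        predictions=predictions,
        references=[[ref] for ref in references],
    )
    rouge_result = rouge.compute(predictions=predictions, references=references)

    return {
        "bleu": bleu_result["bleu"],
        "rouge1": rouge_result["rouge1"],
        "rouge2": rouge_result["rouge2"],
        "rougeL": rouge_result["rougeL"],
    }


# Example call (needs the packages above and network access on first run):
# scores = score_with_hugging_face(
#     predictions=["the cat sat on the mat"],
#     references=["the cat is sitting on the mat"],
# )
```

The import sits inside the function so the definition itself does not require the package; the example call is left commented out for the same reason.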
