1 of 15

Summarising News Articles

AlphaNLP

Ziyi Gan (z5505978)

Yanhao Li (z5491438)

Yanjian Shen (z5541664)

Liming Song (z5461675)

Jiawei Zhu (z5559772)

2 of 15

Problem Statement

The Need for Automated Abstractive Summarisation

  • In the digital era, the volume of online news has grown exponentially.
  • Readers face time constraints and shorter attention spans.
  • There is a growing demand for concise, coherent summaries of news articles.
  • Manual summarisation is not scalable for the vast number of daily publications.
  • This project addresses the challenge by building a deep learning-based abstractive summarisation system.


Presenter: Jiawei ZHU (z5559772)

3 of 15

Motivation

Why Abstractive Summarisation?

  • Extractive methods simply select existing sentences from the text, often lacking fluency and coherence.
  • Abstractive summarisation generates new sentences by paraphrasing and rephrasing, better imitating human summarisation.
  • This has direct applications in:
  • News aggregation platforms
  • Search engines
  • User-facing tools (e.g., reading apps, digital assistants)

Our motivation is to improve reading efficiency and comprehension quality for users in information-dense environments.


Presenter: Jiawei ZHU (z5559772)

4 of 15

Dataset Overview

Dataset Summary

The CNN/DailyMail dataset contains over 300,000 English news articles; it was originally built for machine reading comprehension and abstractive question answering, and is now widely used for summarisation.

Introduced by

Reading comprehension: Hermann et al. (2015)

Summarisation: Nallapati et al. (2016)

Pre-processing

  • Separate article and abstract
  • Multi-level text cleaning
  • Batch conversion and saving
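A minimal sketch of these pre-processing steps, assuming the Hugging Face datasets release of CNN/DailyMail (config "3.0.0"), where articles sit in the "article" field and reference summaries in "highlights"; the cleaning rules and output path below are illustrative, not the project's exact pipeline.

import re
from datasets import load_dataset

dataset = load_dataset("cnn_dailymail", "3.0.0", split="train")

def clean_text(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)       # strip URLs
    text = re.sub(r"\(CNN\)\s*--?\s*", "", text)   # strip source tags
    return re.sub(r"\s+", " ", text).strip()       # normalise whitespace

def preprocess(batch):
    # separate the article body from the reference summary ("highlights")
    return {"article": [clean_text(a) for a in batch["article"]],
            "abstract": [clean_text(h) for h in batch["highlights"]]}

cleaned = dataset.map(preprocess, batched=True, batch_size=1000)
cleaned.save_to_disk("cnn_dm_cleaned")             # batch conversion and saving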


Presenter: Yanhao Li (z5491438)


5 of 15

Data Exploration


  • Article length: mostly >30 sentences, 600–1000 tokens
  • Summary length: 50–60 tokens, high compression rate
  • Summary characteristics: sentences are rarely copied verbatim; key information is restated

Conclusion: The dataset features long articles, high compression, and abstractive rewriting, which demands strong understanding and generation capabilities from the model.
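A rough sketch of how such length statistics can be computed; whitespace tokenisation is used here for simplicity, so the counts only approximate the figures above.

import statistics
from datasets import load_dataset

sample = load_dataset("cnn_dailymail", "3.0.0", split="train[:2000]")

article_lens = [len(a.split()) for a in sample["article"]]
summary_lens = [len(h.split()) for h in sample["highlights"]]

print("median article tokens:", statistics.median(article_lens))
print("median summary tokens:", statistics.median(summary_lens))
print("compression ratio:", round(sum(article_lens) / sum(summary_lens), 1))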

Presenter: Yanhao Li (z5491438)

6 of 15

Method(s)

  • BART (facebook/bart-large)
    • Seq2Seq transformer: BERT-style encoder + GPT-style decoder.
    • Pre-training tasks: text infilling & sentence permutation → strong at denoising and paraphrasing.
    • Widely used baseline for news summarisation; excels on longer inputs.
  • PEGASUS (google/pegasus-large)
    • Purpose-built for summarisation.
    • Pre-trained with Gap-Sentence Generation (GSG) → masks whole sentences, forcing the model to predict salient ones.
    • Fewer pre-training steps, yet competitive or superior on CNN/DailyMail & XSum.
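Both checkpoints can be loaded out of the box with the transformers summarisation pipeline, as in the sketch below; the generation settings are placeholder values, and outputs before fine-tuning will be rough.

from transformers import pipeline

bart = pipeline("summarization", model="facebook/bart-large")
pegasus = pipeline("summarization", model="google/pegasus-large")

article = "..."  # a cleaned CNN/DailyMail article goes here
for name, summarizer in [("BART", bart), ("PEGASUS", pegasus)]:
    out = summarizer(article, max_length=128, min_length=30, truncation=True)
    print(name, ":", out[0]["summary_text"])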


Presenter: Liming SONG (z5461675)

7 of 15

Evaluation Metrics


Presenter: Liming SONG (z5461675)

📏 ROUGE (1/2/L): Measures lexical overlap using unigrams, bigrams, and the longest common subsequence; widely used in summarisation benchmarking (word matching).

🧠 BERTScore: Calculates semantic similarity via contextual embeddings from a pretrained BERT model (meaning matching).

🗣 BARTScore: Uses BART to compute the likelihood of the reference summary given the generated summary (fluency + semantic relevance).

🔄 Junadhi Metric: A proposed metric combining coherence, informativeness, and fluency for better human alignment (Junadhi et al., 2025).
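A minimal sketch of scoring a generated summary with the Hugging Face evaluate library; ROUGE and BERTScore have off-the-shelf implementations there, while BARTScore and the Junadhi metric would need separate code, so they are omitted.

import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

predictions = ["The child was being taught at a shooting range in Arizona."]
references = ["The child was being taught at the Bullets and Burgers range in White Hills, Arizona."]

print(rouge.compute(predictions=predictions, references=references))        # rouge1 / rouge2 / rougeL
print(bertscore.compute(predictions=predictions, references=references, lang="en"))  # precision / recall / F1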

8 of 15

Coding and Implementation

  • Data cleaning

Remove content that is not suitable for machine reading:

URLs, advertisements, author, publisher, etc.

  • Hardware

Cloud GPU rental platform (vast.ai)

Multi-GPU parallel training

  • Coding process

Handling potential exceptions and preserving model weights during long training runs (see the sketch after this list)

  • Data Analysis and Statistics

Compute statistics, create charts, and draw conclusions from the evaluation metrics
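The fault-tolerance idea from the coding process above is sketched here with a stand-in model: wrap the training loop in try/except and checkpoint periodically so a long run on a rented GPU survives interruptions. The model, schedule, and paths are placeholders, not the project's actual training code.

import torch

model = torch.nn.Linear(10, 1)                      # stand-in for the seq2seq model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def save_checkpoint(path: str) -> None:
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

try:
    for epoch in range(3):
        for step in range(1000):
            loss = model(torch.randn(8, 10)).mean() # dummy forward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if step % 500 == 0:
                save_checkpoint(f"ckpt_epoch{epoch}_step{step}.pt")
except (RuntimeError, KeyboardInterrupt):
    save_checkpoint("ckpt_interrupted.pt")          # preserve weights if the run dies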


Presenter: Yanjian Shen (z5541664)

9 of 15

Results


Presenter: Yanjian Shen (z5541664)

10 of 15

Results


Presenter: Yanjian Shen (z5541664)

11 of 15

Conclusion

  • Key Findings

BART: Achieves the largest relative improvement after fine‑tuning, producing fluent, coherent summaries with strong semantic alignment.

PEGASUS: Provides a strong baseline due to its summarisation-specific pretraining, excelling in sentence compression and abstraction.


Presenter: Ziyi Gan (z5505978)

12 of 15

Example

Original Summary (Human-written):

The child, who was on vacation from New Jersey, was being taught at the Bullets and Burgers range in White Hills, Arizona.

No charges will be filed in the shooting, which was declared an accident by authorities.

Pegasus (Before Training):

'Scroll down for video Sam Scarmardo, left, the owner of an Arizona shooting range, has come out to defend letting a 9-year-old girl fire an Uzi, after the girl accidentally shot dead her instructor Charles Vacca, right, with a single bullet to the head

Scene: The tragedy unfolded at Bullets and Burgers, an activity center 25 miles south of Las Vegas Charles Vacca, 39, a father and retired army general, was teaching a young girl how to fire an Uzi at Bullets and Burgers range in White Hills, Arizona on Monday when she lost control of the gun.'


Presenter: Ziyi Gan (z5505978)

Pegasus (After Training):

The child was being taught at the Bullets and Burgers range in White Hills, Arizona by 39-year-old Charles Vacca, a father and retired army general.

Vacca was standing next to the girl when she fired the weapon, but she and her parents only realized he was mortally wounded when one his colleagues at the shooting range rushed to help him.

13 of 15

Conclusion

  • Limitations

Evaluation limits: Relied on fluency and semantic alignment; lacked context-aware assessment and factuality checking.

Dataset & time constraints: We used an 8:0:0.01 dataset split, which cut training time to three days but limited thorough validation.

Hardware limits: Initial Tesla T4 training was slow; even with an RTX 4090, batch size and runtime limited scalability.

Architecture exploration: Could not implement token-level or hierarchical modelling due to time constraints.


Presenter: Ziyi Gan (z5505978)

14 of 15

Conclusion

  • Future Work

Integrate factuality‑checking modules to improve reliability.

Explore emotionally aware or context‑weighted loss functions to better capture human‑salient information.

Explore hierarchical or token‑level architectures to better mimic human summarisation.

Scale training with multi‑GPU and full dataset to improve stability and performance.


Presenter: Ziyi Gan (z5505978)

15 of 15

