Summarising News Articles
AlphaNLP
Ziyi Gan (z5505978)
Yanhao Li (z5491438)
Yanjian Shen (z5541664)
Liming Song (z5461675)
Jiawei Zhu (z5559772)
Problem Statement
The Need for Automated Abstractive Summarisation
Presenter: Jiawei Zhu (z5559772)
Motivation
Why Abstractive Summarisation?
Our motivation is to improve reading efficiency and comprehension quality for users in information-dense environments.
Presenter: Jiawei Zhu (z5559772)
Dataset Overview
Dataset Summary
The CNN/DailyMail dataset, which contains over 300,000 English news articles, is used for machine reading comprehension, abstractive question answering, and summarisation.
Introduced by
Reading comprehension: Hermann et al. (2015)
Summarisation: Nallapati et al. (2016)
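The corpus is available on the Hugging Face hub; a minimal loading sketch, assuming the `datasets` package:

```python
# Load the 300k+ article/summary pairs (config "3.0.0" is the
# non-anonymised version commonly used for summarisation).
from datasets import load_dataset

dataset = load_dataset("cnn_dailymail", "3.0.0")
example = dataset["train"][0]
print(example["article"][:200])   # source news article
print(example["highlights"])      # human-written reference summary
```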
Pre-processing
Presenter: Yanhao Li (z5491438)
Data Exploration
Conclusion: the dataset pairs long source texts with highly compressed, abstractively rewritten summaries, demanding strong understanding and generation capability from the model.
Presenter: Yanhao Li (z5491438)
Method(s)
Presenter: Liming Song (z5461675)
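The models compared in the following sections are BART and PEGASUS fine-tuned on CNN/DailyMail. A minimal sketch of such a fine-tuning setup, assuming the Hugging Face `transformers` library (model name, sequence lengths, and hyperparameters are illustrative, not the team's exact configuration):

```python
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "facebook/bart-large"            # or "google/pegasus-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

dataset = load_dataset("cnn_dailymail", "3.0.0")

def preprocess(batch):
    # Articles are the encoder input; human highlights are the target summary.
    model_inputs = tokenizer(batch["article"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["highlights"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bart-cnndm",
    per_device_train_batch_size=4,            # constrained by GPU memory
    num_train_epochs=1,
    save_steps=1000,                          # periodic checkpoints for long runs
    fp16=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```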
Evaluation Metrics
Presenter: Liming Song (z5461675)
| Metric | Description |
| --- | --- |
| 📏 ROUGE (1/2/L) | Measures lexical overlap using unigrams, bigrams, and the longest common subsequence; widely used in summarisation benchmarking. (word matching) |
| 🧠 BERTScore | Computes semantic similarity via contextual embeddings from a pretrained BERT model. (meaning matching) |
| 🗣 BARTScore | Uses BART to compute the likelihood of the reference summary given the generated summary (fluency + semantic relevance). |
| 🔄 Junadhi Metric | A proposed metric combining coherence, informativeness, and fluency for better human alignment (Junadhi et al., 2025). |
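For concreteness, the first two metrics can be computed with the open-source `rouge-score` and `bert-score` packages; a minimal sketch (the example texts are made up, and BARTScore and the Junadhi metric are omitted here):

```python
# Scoring one generated summary against its reference.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "No charges will be filed in the shooting."
candidate = "Authorities said no charges would be filed over the shooting."

# Lexical overlap: unigrams, bigrams, longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for name, result in scorer.score(reference, candidate).items():
    print(name, round(result.fmeasure, 3))

# Semantic similarity from contextual BERT embeddings.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print("BERTScore F1:", round(F1.mean().item(), 3))
```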
Coding and Implementation
Remove content unsuitable for model input: URLs, advertisements, author and publisher bylines, etc. (see the cleaning sketch below)
Rent cloud GPU compute (vast.ai) for multi-GPU parallel training
Handle exceptions and checkpoint weights during long training runs
Compute statistics, plot charts, and analyse the results against the metrics
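A sketch of what the cleaning step might look like; the exact patterns the team used are not shown in the slides, so these rules are assumptions:

```python
import re

# Hypothetical cleaning rules: strip URLs, bylines, and known ad phrases.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
BYLINE_RE = re.compile(r"^by .+$", re.IGNORECASE | re.MULTILINE)
AD_RE = re.compile(r"scroll down for video", re.IGNORECASE)

def clean_article(text: str) -> str:
    for pattern in (URL_RE, BYLINE_RE, AD_RE):
        text = pattern.sub("", text)
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

raw = "By John Smith\nScroll down for video\nThe range is 25 miles south of Las Vegas. http://example.com/clip"
print(clean_article(raw))  # -> "The range is 25 miles south of Las Vegas."
```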
Presenter: Yanjian Shen (z5541664)
Results
Presenter: Yanjian Shen (z5541664)
Conclusion
BART: Achieves the largest relative improvement after fine‑tuning, producing fluent, coherent summaries with strong semantic alignment.
PEGASUS: Provides a strong baseline due to its summarisation‑specific pretraining, excelling in sentence compression and abstraction.
Presenter: Ziyi Gan (z5505978)
Example
Original Summary (Human-written):
The child, who was on vacation from New Jersey, was being taught at the Bullets and Burgers range in White Hills, Arizona.
No charges will be filed in the shooting, which was declared an accident by authorities.
Pegasus (Before Training):
'Scroll down for video Sam Scarmardo, left, the owner of an Arizona shooting range, has come out to defend letting a 9-year-old girl fire an Uzi, after the girl accidentally shot dead her instructor Charles Vacca, right, with a single bullet to the head
Scene: The tragedy unfolded at Bullets and Burgers, an activity center 25 miles south of Las Vegas Charles Vacca, 39, a father and retired army general, was teaching a young girl how to fire an Uzi at Bullets and Burgers range in White Hills, Arizona on Monday when she lost control of the gun.'
Presenter: Ziyi Gan (z5505978)
Pegasus (After Training):
The child was being taught at the Bullets and Burgers range in White Hills, Arizona by 39-year-old Charles Vacca, a father and retired army general.
Vacca was standing next to the girl when she fired the weapon, but she and her parents only realized he was mortally wounded when one of his colleagues at the shooting range rushed to help him.
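Summaries like the one above are produced at inference time with beam-search generation; a sketch, where the checkpoint path and decoding parameters are assumptions:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# "pegasus-cnndm" is a hypothetical local path to the fine-tuned checkpoint.
tokenizer = AutoTokenizer.from_pretrained("pegasus-cnndm")
model = AutoModelForSeq2SeqLM.from_pretrained("pegasus-cnndm")

article = ("Charles Vacca, 39, was teaching a young girl to fire an Uzi at the "
           "Bullets and Burgers range in White Hills, Arizona, when she lost "
           "control of the gun.")
inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt")
ids = model.generate(**inputs, num_beams=4, max_length=128, length_penalty=2.0)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```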
Limitations
Evaluation limits: relied on fluency and semantic alignment; lacked context-aware assessment and factuality checking.
Dataset & time constraints: we used an 8:0:0.01 dataset split, which cut training to 3 days but limited thorough validation.
Hardware limits: initial Tesla T4 training was slow; even with an RTX 4090, batch size and runtime limited scalability.
Architecture exploration: could not implement token-level or hierarchical modelling due to time constraints.
Presenter: Ziyi Gan (z5505978)
Future Work
Integrate factuality‑checking modules to improve reliability.
Explore emotionally aware or context‑weighted loss functions to better capture human‑salient information.
Explore hierarchical or token‑level architectures to better mimic human summarisation.
Scale training with multi‑GPU and full dataset to improve stability and performance.
Presenter: Ziyi Gan (z5505978)