1 of 44

Text Summarization

2 of 44

Text Summarization

  • Goal: reduce a text with a computer program to create a summary that retains the most important points of the original text
    • Create a concise representation that retains the most relevant information.
    • What counts as relevant depends on the purpose:
  • Time saving
  • Informing
  • Decision making
  • Orientation (maps)
  • Planning (maps)

3 of 44

Table of contents

4 of 44

Abstracts of papers

5 of 44

Summarize the Web

  • Search engines organize information for accessibility and usefulness
  • Match keywords to queries (words)
  • Richer meaning: refer to entities and objects (people, places, things) in the real world

6 of 44

Knowledge Graph

  • When searching for an entity, instantly provide relevant information about it
  • Provide connections: relations between entities
  • An ever-growing database of structured knowledge
    • 500 million entities
    • 3.5 billion defining attributes and connections

7 of 44

Headline news

8 of 44

TV Guides

9 of 44

Graphical maps

10 of 44

Textual Directions

11 of 44

Questions

  • What kinds of summaries do people want?
    • What are summarizing, abstracting, gisting,...?
  • How sophisticated must summarization systems be?
    • Are statistical techniques sufficient?
    • Or do we need symbolic techniques and deep understanding as well?
  • What milestones would mark quantum leaps in summarization theory and practice?
    • How do we measure summarization quality?

12 of 44

Categories

  • Input
    • Single-Document Summarization (SDS)
    • Multiple-Document Summarization (MDS)
  • Output
    • Extractive
    • Abstractive
    • Compressive
  • Focus
    • Generic
    • Query-focused summarization
    • Domain-specific
  • Machine learning methods:
    • Supervised
    • Unsupervised

13 of 44

What to summarize? Single vs. multiple documents

  • Single-document summarization
    • Given a single document, produce:
      • an abstract
      • an outline
      • a headline
  • Multiple-document summarization
    • Given a group of documents, produce a gist of the content:
      • a series of news stories on the same event
      • a set of web pages about some topic or question

14 of 44

Single-document Summarization

15 of 44

Multiple-document Summarization

16 of 44

Query-focused Summarization & Generic Summarization

  • Generic summarization:
    • Summarize the content of a document
  • Query-focused summarization:
    • Summarize a document with respect to an information need expressed in a user query.
    • A kind of complex question answering:
      • Answer a question by summarizing a document that has the information to construct the answer

17 of 44

Snippets: query-focused summaries

18 of 44

Summarization for Question Answering: Snippets

  • Create snippets summarizing a web page for a query

19 of 44

Summarization for Question Answering: Multiple documents

  • Create answers to complex questions by summarizing multiple documents.
  • Instead of giving a separate snippet for each document, create a cohesive answer that combines information from all of them.

20 of 44

Extractive summarization & Abstractive summarization

  • Extractive summarization:
    • create the summary from phrases or sentences in the source document(s)
  • Abstractive summarization:
    • express the ideas in the source documents using (at least in part) different words

21 of 44

A Summarization Machine

22 of 44

The Modules of the Summarization Machine

23 of 44

Typical 3 Stages of Summarization

1. Topic Identification: find/extract the most important material

2. Topic Interpretation: compress it

3. Summary Generation: say it in your own words

24 of 44

Overview of Topic Extraction Methods

  • General method: score each sentence; combine scores; choose best sentence(s)
  • Scoring techniques:
    • Position in the text: lead method; optimal position policy; title/heading method
      • Claim: Important sentences occur at the beginning (and/or end) of texts.
      • Lead method: just take first sentence(s)!
    • Title-Based Method
      • Claim: Words in titles and headings are positively relevant to summarization
    • Cue phrases in sentences
      • Claim: Important sentences contain ‘bonus phrases’, such as significantly, In this paper we show, and In conclusion, while non-important sentences contain ‘stigma phrases’ such as hardly and impossible
      • Method: Add to sentence score if it contains a bonus phrase, penalize if it contains a stigma phrase
    • Word frequencies throughout the text
      • Claim: Important sentences contain words that occur “somewhat” frequently
      • Method: Increase sentence score for each frequent word.
    • Cohesion: links among words; word co-occurrence; coreference; lexical chains
      • Claim: Important sentences/paragraphs are the highest connected entities in more or less elaborate semantic structures
      • Method: determine relatedness score Si for each paragraph, and extract paragraphs with largest Si scores
    • Discourse structure of the text
      • Claim: The multi-sentence coherence structure of a text can be constructed, and the ‘centrality’ of the textual units in this structure reflects their importance
    • Information Extraction: parsing and analysis
      • Idea: content selection using forms (templates)

25 of 44

Topic Interpretation

  • From extract to abstract; topic interpretation or concept fusion
  • Concept generalization
    • Sue ate apples, pears, and bananas ⇒ Sue ate fruit
  • Meronymy replacement
    • Both wheels, the pedals, saddle, chain… ⇒ the bike
  • Script identification (Schank and Abelson, 77)
    • He sat down, read the menu, ordered, ate, paid, and left ⇒ He ate at the restaurant
  • Metonymy
    • A spokesperson for the US Government announced that… ⇒ Washington announced that...

26 of 44

NL Generation for Summaries

  • Level 1: no separate generation
    • Produce extracts, verbatim from input text.
  • Level 2: simple sentences
    • Assemble portions of extracted clauses together.
  • Level 3: full NLG

1. Sentence Planner: plan sentence content, sentence length, theme, order of constituents, words chosen... (Hovy and Wanner, 96)

2. Surface Realizer: linearize input grammatically (Elhadad, 92; Knight and Hatzivassiloglou, 95).

27 of 44

Unsupervised content selection

  • Intuition dating back to Luhn (1958):
    • Choose sentences that have salient or informative words
  • Two approaches to defining salient words
  • tf-idf: weigh each word w_i in document j by its tf-idf value

  • topic signature: choose a smaller set of salient words
    • mutual information
    • log-likelihood ratio (LLR) Dunning (1993), Lin and Hovy (2000)
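The tf-idf approach above can be sketched as follows, assuming the standard weighting tf_ij × log(N / df_i) over a small tokenized corpus (the exact tf-idf variant is an assumption; the slides do not specify one).

```python
import math
from collections import Counter

def tf_idf_weights(docs):
    """docs: list of token lists, one per document.
    Returns one {word: tf-idf weight} dict per document."""
    n = len(docs)
    df = Counter()                      # document frequency of each word
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)               # term frequency within the document
        weights.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return weights
```

Words that appear in every document get weight 0 (log 1 = 0), so only words that are distinctive for a document count as salient.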

28 of 44

Topic signature-based content selection with queries

  • choose words that are informative either
    • by log-likelihood ratio (LLR)
    • or by appearing in the query
  • Weigh a sentence (or window) by weight of its words:

(could learn more complex weights)
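The weighting above can be sketched as follows: a word contributes weight 1 if it is a topic-signature word (assumed already selected, e.g. by LLR) or appears in the query, and 0 otherwise, and the sentence score is the average over its words. The 0/1 scheme is the simple stand-in the slides allude to before learning more complex weights.

```python
def sentence_weight(sentence_tokens, signature_words, query_words):
    """Average informativeness of a sentence's words:
    1 for topic-signature or query words, 0 otherwise."""
    informative = set(signature_words) | set(query_words)
    if not sentence_tokens:
        return 0.0
    return sum(1.0 for w in sentence_tokens if w in informative) / len(sentence_tokens)
```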

29 of 44

Graph-based Ranking Algorithms

  • unsupervised sentence extraction
  1. Identify text units that best define the task at hand, and add them as vertices in the graph.
  2. Identify relations that connect such text units, and use these relations to draw edges between vertices in the graph. Edges can be directed or undirected, weighted or unweighted.
  3. Iterate the graph-based ranking algorithm until convergence.
  4. Sort vertices based on their final score. Use the values attached to each vertex for ranking/selection decisions
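The four steps above can be sketched as a TextRank-style sentence ranker. Here the assumed choices are: sentences as vertices, an undirected unweighted edge whenever two sentences share a word, a PageRank-style update with the conventional damping factor 0.85, and a fixed iteration count in place of a convergence test.

```python
def textrank(sentences, d=0.85, iters=50):
    toks = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Steps 1-2: vertices are sentences; edge if two sentences share a word
    nbrs = [[j for j in range(n) if j != i and toks[i] & toks[j]]
            for i in range(n)]
    scores = [1.0] * n
    # Step 3: iterate the ranking (fixed iterations stand in for convergence)
    for _ in range(iters):
        scores = [(1 - d) + d * sum(scores[j] / max(len(nbrs[j]), 1)
                                    for j in nbrs[i])
                  for i in range(n)]
    # Step 4: sort vertices by final score for selection
    return sorted(range(n), key=lambda i: -scores[i])
```

A real implementation would weight edges by sentence similarity (e.g. normalized word overlap) rather than treating any shared word as a full edge.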

30 of 44

Supervised content selection

  • Given:
    • a labeled training set of good summaries for each document
  • Align:
    • the sentences in the document with sentences in the summary
  • Extract features
    • position (first sentence?)
    • length of sentence
    • word informativeness, cue phrases
    • cohesion
  • Train
    • a binary classifier (put sentence in summary? yes or no)
  • Problems:
    • hard to get labeled training
    • alignment difficult
    • performance not better than unsupervised algorithms
  • So in practice:
    • Unsupervised content selection is more common
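The feature-extraction step of the supervised pipeline above can be sketched as follows; the feature set mirrors the bullets (position, length, word informativeness), while the classifier itself (a binary yes/no learner such as logistic regression) is left abstract. All feature names are illustrative.

```python
def features(sentence, index, doc_freq):
    """Feature vector for one sentence, to feed a binary
    in-summary/not-in-summary classifier."""
    words = sentence.lower().split()
    return {
        "is_first": 1.0 if index == 0 else 0.0,   # position: first sentence?
        "length": float(len(words)),               # sentence length
        # word informativeness: average document frequency of its words
        "avg_freq": sum(doc_freq.get(w, 0) for w in words) / max(len(words), 1),
    }
```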

31 of 44

Data and metrics

Q: How to find the list of commonly used datasets?

A: Look at the recent SOTA papers

32 of 44

Data and metrics

Q: How to find the list of commonly used datasets?

A: Look at the recent SOTA papers

Q: How to find the SOTA papers?

A:

  • nlpprogress.com
  • paperswithcode.com/sota
  • connectedpapers.com

33 of 44

Data and metrics

34 of 44

Data and metrics

35 of 44

Data and metrics

36 of 44

CNN/DailyMail

37 of 44

CNN/DailyMail

https://huggingface.co/datasets/cnn_dailymail

39 of 44

XSum

Let’s follow the trail…

40 of 44

XSum

41 of 44

Evaluating Summaries: ROUGE

  • ROUGE: Recall-Oriented Understudy for Gisting Evaluation
  • Intrinsic metric for automatically evaluating summaries
    • Based on BLEU (a metric used for machine translation)
    • Not as good as human evaluation (“Did this answer the user’s question?”)
    • But much more convenient
  • Given a document D, and an automatic summary X:
    • Have N humans produce a set of reference summaries of D
    • Run system, giving automatic summary X
    • What percentage of the bigrams from the reference summaries appear in X?
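The recipe above can be sketched as ROUGE-2 recall: the fraction of bigrams from the reference summaries that also appear in the system summary, pooled over all references. This is a minimal version that assumes pre-tokenized input and uses clipped counts (a reference bigram matches at most as often as it occurs in the system output).

```python
from collections import Counter

def bigrams(tokens):
    return Counter(zip(tokens, tokens[1:]))

def rouge2(system_tokens, reference_token_lists):
    """ROUGE-2 recall: matched reference bigrams / total reference bigrams."""
    sys_bg = bigrams(system_tokens)
    matched = total = 0
    for ref in reference_token_lists:
        ref_bg = bigrams(ref)
        total += sum(ref_bg.values())
        # Clipped overlap: each reference bigram counts at most as often
        # as it occurs in the system summary
        matched += sum(min(c, sys_bg[bg]) for bg, c in ref_bg.items())
    return matched / total if total else 0.0
```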

42 of 44

A ROUGE example: Q: “What is water spinach?”

  • System output: Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.
  • Human Summaries (Gold)

    • Human 1: Water spinach is a green leafy vegetable grown in the tropics.
    • Human 2: Water spinach is a semi-aquatic tropical plant grown as a vegetable.
    • Human 3: Water spinach is a commonly eaten leaf vegetable of Asia.

  • ROUGE-2 = (3 + 3 + 6) / (10 + 10 + 9) = 12/29 = .41

43 of 44

A neural attention model for abstractive sentence summarization

Rush et al., EMNLP 2015

  • Inspired by attention-based seq2seq models (Bahdanau et al., 2014)

44 of 44

Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond

Nallapati et al., CoNLL 2016

  • Implements many techniques (NMT tricks, copy mechanism, coverage, hierarchical attention, external knowledge)