1 of 23

Summarization Using Text Processing and Pre-trained Models

Fatemeh Rahimi

2 of 23

Summarization

  • Going through a lot of documents to find the most important parts
  • Manually:
    • Time consuming
    • Not practical


3 of 23

Why is Summarization Useful?

  • The amount of data available online is ever-growing.
  • Automatic summarization algorithms can be less biased than human summarizers.


4 of 23

Where can we use Summarization?

  • Researchers
    • Reading related work
  • Medical Health Records
    • Emergency rooms
  • Law firms
    • Summaries of previous court cases
  • And so on ...


5 of 23

Summarization

  • Single-Document Summarization
  • Multi-Document Summarization

6 of 23

Types of Summarization

  • Extractive Summarization
  • Abstractive Summarization

7 of 23

Extractive Summarization

  • Supervised
    • Binary classification
  • Unsupervised
    • Ranking Algorithms
    • Graph based approaches
    • Clustering


8 of 23

Extractive Summarization (Supervised)

A dataset with highlighted sentences

  Sentence                Highlighted
  By the Mid 19th…..      1
  Japan changed int ...   0
  Sentence 3              0
  Sentence 4              1
  ...                     ...
  Sentence n              0
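The table above can be read as a plain list of (sentence, label) pairs, which is the input a binary classifier expects. A minimal sketch, with the truncated sentences kept as-is:

```python
# Toy extractive-summarization dataset: each sentence is labeled
# 1 if it was highlighted (belongs in the summary), else 0.
dataset = [
    ("By the Mid 19th…..", 1),
    ("Japan changed int ...", 0),
    ("Sentence 3", 0),
    ("Sentence 4", 1),
]

# The classifier's job is to predict the label column from the sentence text.
sentences = [s for s, _ in dataset]
labels = [y for _, y in dataset]
```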

9 of 23

Extractive Summarization (Supervised)

We need numeric input for classification, but we have sentences and words.

What is the solution?


10 of 23

Pre-trained models

  • A deep neural network
  • A Transfer learning approach
  • A model that has already learned a good representation of the data
  • Use for another task

  • NLP pretrained models:
    • BERT (a breakthrough)
    • RoBERTa
    • T5 (state-of-the-art)


11 of 23

Pre-trained models (Cont.)

  • Train the neural network on Wikipedia
    • So that the model learns good features and the context of the language
  • And use it for
    • Sentiment analysis
    • Question Answering
    • Summarization
    • Semantic Textual Similarity
    • Information Retrieval
    • etc.


12 of 23

Let’s use pre-trained models for Summarization

Supervised pipeline:

  Sentences within the document → Pre-trained model → Embeddings → ML approach → Highlighted sentences

13 of 23

Extracting Embeddings with pre-trained models

What is Word Embedding?

  • A vector of numbers that represent a word
  • Use as input for Machine Learning approaches

Word Embeddings in BERT:
From the base model: [12 × 768] (12 layers, 768-dimensional vectors)


14 of 23

Extractive Summarization (Supervised)

  • Having Embeddings:
    • Vectors that represent Words
  • Binary classification
    • Deep Neural Network
      • Output layer (Logistic activation function)
    • Machine Learning
      • Ridge regression
      • Random forest


15 of 23

Extractive Summarization (Unsupervised)

(Useful when you also have a topic or question to build the summary around.)

Ranking Algorithms:

  • Cosine Similarity (most basic form)
  • BM25 (old, reliable)
  • DSSM (DNN)
  • Conv-KNRM (new, DNN)
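The most basic ranker on the list, cosine similarity, can be sketched directly on embedding vectors. The vectors and sentence names here are illustrative:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank sentence embeddings by similarity to a query/topic embedding;
# the top-ranked sentences form the query-focused extractive summary.
query = np.array([1.0, 0.0, 1.0])
sentences = {
    "s1": np.array([1.0, 0.1, 0.9]),   # close to the query
    "s2": np.array([0.0, 1.0, 0.0]),   # unrelated to the query
    "s3": np.array([0.5, 0.5, 0.5]),
}
ranked = sorted(sentences,
                key=lambda s: cosine_similarity(query, sentences[s]),
                reverse=True)
```

BM25, DSSM, and Conv-KNRM slot into the same place as better scoring functions over the query–sentence pair.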


16 of 23

Extractive Summarization (Unsupervised)

Clustering

  • K-means
  • DBSCAN
  • ….
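A minimal k-means sketch over toy sentence embeddings, assuming the usual approach of then picking the sentence nearest each centroid as the summary:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned embeddings.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two obvious groups of toy 2-D sentence embeddings.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels = kmeans(X, k=2)
```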


17 of 23

Extractive Summarization (Unsupervised)

Graph-based

  • Nodes: sentences of the documents
  • Edges: similarity between sentences

Extracting summary:

  • Finding Centroids
  • Clustering the graph


18 of 23


19 of 23

Wrap Up

We Learned:

  • What Multi-document Extractive Summarization is.
  • What pre-trained models are.
  • How to use pre-trained models for Extractive Summarization.


20 of 23

Thanks for listening

Any Questions?


21 of 23

BERT


22 of 23

Evaluation for summarization

ROUGE

  • Stands for Recall-Oriented Understudy for Gisting Evaluation

ROUGE-N

  • Measures unigram, bigram, trigram, and higher-order n-gram overlap between the system summary and the reference summary

e.g., ROUGE-1, ROUGE-2

ROUGE-L

  • Measures the longest matching sequence of words using the longest common subsequence (LCS)
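Both metrics can be sketched in a few lines. This is a simplified recall-only version (the official ROUGE also clips counts and reports precision/F1):

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(system, reference, n):
    """ROUGE-N recall: fraction of reference n-grams found in the system summary."""
    ref = ngrams(reference.split(), n)
    sys_grams = set(ngrams(system.split(), n))
    return sum(g in sys_grams for g in ref) / len(ref)

def lcs_len(a, b):
    """Length of the longest common subsequence, used by ROUGE-L."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = table[i-1][j-1] + 1 if x == y else max(table[i-1][j], table[i][j-1])
    return table[len(a)][len(b)]

system = "the cat sat on the mat"
reference = "the cat lay on the mat"
r1 = rouge_n_recall(system, reference, 1)                       # ROUGE-1 recall
rl = lcs_len(system.split(), reference.split()) / len(reference.split())  # ROUGE-L recall
```

Here both scores are 5/6: five of the six reference words appear in the system summary, and the LCS ("the cat on the mat") has length 5.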


23 of 23

DBSCAN
