1 of 26

Automatic Summarization of Scientific Articles from the Biomedical Domain

Sadik Ahammed Siddique, 1405016

Md. Kaykobad Reza, 1405057

2 of 26

Problem Definition

  • Usually there is a large medical history for every patient.
  • It is difficult for doctors to find useful information in it.
  • It contains lots of obsolete and irrelevant information.
  • It is cumbersome and time-consuming to summarize manually.
  • We need a way to extract the useful and relevant information automatically.

3 of 26

Motivation

4 of 26

Motivation

Large Medical Corpus → Computer Program → Summary

5 of 26

Motivation

6 of 26

Previous Works

  • TextRank:
    • A slight modification of Google's PageRank algorithm.
    • Builds a graph with each sentence as a node and the similarity between two sentences as a weighted edge.
    • Well suited to multi-document summarization and web link analysis.
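The graph construction and iterative ranking described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes whitespace tokenization and the classic TextRank sentence similarity (word overlap normalized by log sentence lengths).

```python
import math

def similarity(s1, s2):
    """TextRank-style sentence similarity: word overlap normalized
    by the log lengths of the two sentences."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    if overlap == 0 or len(w1) < 2 or len(w2) < 2:
        return 0.0
    return overlap / (math.log(len(w1)) + math.log(len(w2)))

def textrank(sentences, d=0.85, iterations=50):
    """Score sentences with the weighted PageRank update:
    S(i) = (1 - d) + d * sum_j w[j][i] / out(j) * S(j)."""
    n = len(sentences)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = similarity(sentences[i], sentences[j])
    scores = [1.0] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            rank = sum(w[j][i] / sum(w[j]) * scores[j]
                       for j in range(n) if w[j][i] > 0)
            new.append((1 - d) + d * rank)
        scores = new
    return scores
```

The highest-scoring sentences are then extracted as the summary.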

7 of 26

Previous Works

  • TF-IDF:
    • TF-IDF stands for Term Frequency-Inverse Document Frequency.
    • A numerical statistic that reflects the importance of a word in a document.
    • The term frequency is the raw count of a term in a document; it measures how frequently the term occurs in the document.
    • The inverse document frequency measures how much information the word provides, i.e., whether it is common or rare across documents.
    • The TF-IDF score is calculated as:
      • TF-IDF(t, d, D) = TF(t, d) × IDF(t, D)
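The formula above can be computed directly. A minimal sketch, assuming documents are represented as lists of tokens and using the common log(N/df) form of IDF:

```python
import math

def tf_idf(term, doc, docs):
    """TF-IDF(t, d, D) = TF(t, d) * IDF(t, D)."""
    tf = doc.count(term)                    # raw count of the term in d
    df = sum(1 for d in docs if term in d)  # documents containing the term
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df)          # rare terms get a large IDF
    return tf * idf
```

A term that is frequent in one document but rare across the corpus scores highest; a term that appears in every document gets an IDF of zero.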

8 of 26

Previous Works

  • Paragraph Extraction:
    • Finds the most important paragraphs of a document by determining how the paragraphs relate to each other.
    • Each paragraph is represented by a node in a graph, and similar paragraphs are linked by edges.
    • Nodes are traversed in a specific way (bushy path, depth-first path, segmented bushy path, etc.), depending on some criteria, to select the most important nodes for the summary.
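As a rough sketch of the bushy-path idea, not the exact algorithm used here: assume "bushiness" is measured by node degree (how many other paragraphs a paragraph is linked to) and a simple word-overlap (Jaccard) similarity decides the edges.

```python
def jaccard(a, b):
    """Word-overlap (Jaccard) similarity between two paragraphs."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def bushy_path(paragraphs, similarity, threshold=0.1, k=2):
    """Pick the k 'bushiest' paragraphs: those connected to the most
    other paragraphs by above-threshold similarity edges."""
    n = len(paragraphs)
    degree = [sum(1 for j in range(n)
                  if i != j and similarity(paragraphs[i], paragraphs[j]) >= threshold)
              for i in range(n)]
    ranked = sorted(range(n), key=lambda i: degree[i], reverse=True)
    return [paragraphs[i] for i in sorted(ranked[:k])]  # keep document order
```

The selected paragraphs are returned in their original order so the summary reads coherently.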

9 of 26

Methodology

  • Approach 1:
    • WordRank
      • We modified TextRank slightly to find the central words in a document.
      • We first split an article into sentences.
      • Then we extracted the nouns, adjectives, and verbs from each sentence and converted them into vectors using Word2Vec.
      • We built a graph with each word as a node; edges between nodes are weighted by cosine similarity.
      • The score of a word is the sum of the weights of all edges incident on its node.
      • The score of a sentence is the sum of the scores of all the words in it.
      • The highest-scoring sentences are picked to generate the summary.
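The scoring steps above can be sketched as follows. This is a simplified illustration: the toy vector table stands in for Word2Vec embeddings, and whitespace tokens with a vector stand in for the POS-filtered nouns, adjectives, and verbs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def wordrank_summary(sentences, vectors, n_pick=1):
    """WordRank sketch: word score = sum of the cosine edge weights
    incident on the word's node; sentence score = sum of word scores."""
    words = sorted({w for s in sentences for w in s.lower().split() if w in vectors})
    score = {w: sum(cosine(vectors[w], vectors[u]) for u in words if u != w)
             for w in words}
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sum(score.get(w, 0.0)
                                      for w in sentences[i].lower().split()),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_pick])]
```

Words whose vectors sit near many other content words accumulate high scores, pulling their sentences into the summary.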

10 of 26

Methodology

  • WordRank:

11 of 26

Comparison

  • Comparison between TextRank and WordRank

ROUGE-1 Score

Algorithm   Recall   Precision   F1-Score
TextRank    47.07    35.31       37.75
WordRank    49.79    29.48       34.92

ROUGE-2 Score

Algorithm   Recall   Precision   F1-Score
TextRank    21.39    16.26       17.31
WordRank    22.45    13.31       15.79

12 of 26

Methodology

  • Approach 2:
    • Hybrid:
      • A combination of TextRank and WordRank.
      • We found that TextRank yields higher precision, while WordRank yields higher recall.
      • So we decided to combine them in a way that improves both.
      • Four such combinations are shown on the following slides.
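The two-stage combinations (e.g. TR_50_Word_16: TextRank down to 50%, then WordRank down to the final length) can be sketched generically. The ranker callables here are hypothetical stand-ins taking (sentences, n) and returning the top n sentences:

```python
def hybrid_summary(sentences, first_pass, second_pass,
                   first_ratio=0.50, second_ratio=0.16):
    """Two-stage hybrid: the first summarizer keeps first_ratio of the
    document's sentences; the second distills that intermediate summary
    down to second_ratio of the original document length."""
    n1 = max(1, round(len(sentences) * first_ratio))
    intermediate = first_pass(sentences, n1)
    n2 = max(1, round(len(sentences) * second_ratio))
    return second_pass(intermediate, n2)
```

Swapping the order of the two passes gives the Word_*_TR_* variants.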

13 of 26

Methodology

  • TR_50_Word_16:
    Document → TextRank → Summary (50%) → WordRank → Final Summary

14 of 26

Methodology

  • TR_64_Word_125:
    Document → TextRank → Summary (64%) → WordRank → Final Summary

15 of 26

Comparison

  • Comparison between TextRank and the modified algorithms:

Algorithm        ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-SU4
TR_50_Word_16    35.14     15.96     26.00     18.31
TR_64_Word_125   35.01     15.85     25.83     18.20
TextRank         37.75     17.31     29.02     19.82

16 of 26

Methodology

  • Word_50_TR_16:
    Document → WordRank → Summary (50%) → TextRank → Final Summary

17 of 26

Methodology

  • Word_64_TR_125:
    Document → WordRank → Summary (64%) → TextRank → Final Summary

18 of 26

Result

Dataset:

  • The algorithms are evaluated on a single-document summarization task, using articles from PubMed.
  • The abstract of an article is treated as the summary of the whole article.
  • We selected 9,978 of 100,000 articles as our dataset.
  • For each document, we produced a summary whose length is 8% of the original document.
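Since the abstract serves as the reference summary, evaluation compares n-gram overlap between the generated summary and the abstract. A minimal sketch of the ROUGE-1 scores reported here, assuming whitespace tokenization (the actual evaluation presumably used a standard ROUGE toolkit):

```python
from collections import Counter

def rouge_1(summary, reference):
    """ROUGE-1: clipped unigram overlap between a system summary and the
    reference (here, the article's abstract).
    Returns (recall, precision, F1)."""
    sum_counts = Counter(summary.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((sum_counts & ref_counts).values())  # clipped match count
    recall = overlap / max(1, sum(ref_counts.values()))
    precision = overlap / max(1, sum(sum_counts.values()))
    f1 = 2 * recall * precision / (recall + precision) if overlap else 0.0
    return recall, precision, f1
```

ROUGE-2 is the same computation over bigrams; ROUGE-L uses the longest common subsequence instead of fixed n-grams.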

19 of 26

Result

  • Comparisons among all the algorithms are shown below:

20 of 26

Result

21 of 26

Result

Median of the ROUGE Scores

Algorithm              ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-SU4
Paragraph Extraction   34.41     16.31     27.78     19.09
TR_50_Word_16          35.83     15.09     26.02     17.58
TR_64_Word_125         35.68     15.00     25.89     17.45
TextRank               38.47     16.56     28.99     19.16
WordRank               35.56     14.95     25.84     17.43
Word_50_TR_16          38.18     16.61     29.00     19.15
Word_64_TR_125         38.28     16.70     29.05     19.25

22 of 26

Result

Standard Deviation (SD) of the ROUGE Scores

Algorithm              ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-SU4
Paragraph Extraction   0.0767    0.0611    0.0554    0.0606
TR_50_Word_16          0.1093    0.0833    0.0697    0.0826
TR_64_Word_125         0.1090    0.0827    0.0696    0.0821
TextRank               0.1085    0.0856    0.0750    0.0836
WordRank               0.1092    0.0825    0.0697    0.0820
Word_50_TR_16          0.1095    0.0866    0.0749    0.0849
Word_64_TR_125         0.1090    0.0862    0.0748    0.0844

23 of 26

Result

Sentiment Polarity Comparison of the Algorithms

Algorithm              No. of Summaries with Opposite Polarity   % of Error
Paragraph Extraction   1952                                      19.56
TR_50_Word_16          1973                                      19.77
TR_64_Word_125         1959                                      19.63
TextRank               1926                                      19.30
WordRank               1990                                      19.94
Word_50_TR_16          1947                                      19.51
Word_64_TR_125         1950                                      19.54

24 of 26

Result

Subjectivity Comparison of the Algorithms

Algorithm              Squared Difference   RMS Difference
Paragraph Extraction   0.0072               0.0835
TR_50_Word_16          0.0012               0.0351
TR_64_Word_125         0.0045               0.0674
TextRank               0.0102               0.1010
WordRank               0.0005               0.0222
Word_50_TR_16          0.0041               0.0638
Word_64_TR_125         0.0064               0.0798

25 of 26

Future Works

  • We will try to optimize these algorithms to perform better.
  • We will run them on other datasets to see how they perform.
  • We will ask domain experts to score our summaries.
  • We will apply machine learning methods to see if we can generate better summaries.

26 of 26

Thank You