1 of 26

Automatic Summarization of Scientific Articles from the Biomedical Domain

Sadik Ahammed Siddique, 1405016

Md. Kaykobad Reza, 1405057

2 of 26

Problem Definition

  • Usually there is a large medical history for every patient.
  • It is difficult for doctors to find useful information in it.
  • It contains lots of obsolete and irrelevant information.
  • It is cumbersome and time-consuming to summarize manually.
  • We need a way to extract the useful and relevant information automatically.

3 of 26

Motivation

4 of 26

Motivation

Large Medical Corpus → Computer Program → Summary

5 of 26

Motivation

6 of 26

Previous Works

  • TextRank:
    • A slight modification of Google's PageRank algorithm.
    • Builds a graph with each sentence as a node and the similarity between two sentences as a weighted edge.
    • Well suited to multi-document summarization and web link analysis.
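The graph construction and iterative ranking described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes whitespace tokenization and the classic TextRank sentence similarity (word overlap normalized by log sentence lengths).

```python
import math

def similarity(s1, s2):
    """TextRank-style sentence similarity: word overlap normalized
    by the log lengths of the two sentences."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    if overlap == 0 or len(w1) < 2 or len(w2) < 2:
        return 0.0
    return overlap / (math.log(len(w1)) + math.log(len(w2)))

def textrank(sentences, d=0.85, iterations=50):
    """Score sentences with the weighted PageRank update:
    S(i) = (1 - d) + d * sum_j w[j][i] / out(j) * S(j)."""
    n = len(sentences)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = similarity(sentences[i], sentences[j])
    scores = [1.0] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            rank = sum(w[j][i] / sum(w[j]) * scores[j]
                       for j in range(n) if w[j][i] > 0)
            new.append((1 - d) + d * rank)
        scores = new
    return scores
```

The highest-scoring sentences are then extracted as the summary.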

7 of 26

Previous Works

  • TF-IDF:
    • TF-IDF stands for Term Frequency-Inverse Document Frequency.
    • A numerical statistic that reflects the importance of a word in a document.
    • The term frequency is the raw count of a term in a document; it measures how frequently the term occurs in the document.
    • The inverse document frequency measures how much information the word provides, i.e., whether it is common or rare across documents.
    • The TF-IDF score is calculated as:
      • TF-IDF(t, d, D) = TF(t, d) × IDF(t, D)
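The formula above can be computed directly. A minimal sketch, assuming documents are represented as lists of tokens and using the common log(N/df) form of IDF:

```python
import math

def tf_idf(term, doc, docs):
    """TF-IDF(t, d, D) = TF(t, d) * IDF(t, D)."""
    tf = doc.count(term)                    # raw count of the term in d
    df = sum(1 for d in docs if term in d)  # documents containing the term
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df)          # rare terms get a large IDF
    return tf * idf
```

A term that is frequent in one document but rare across the corpus scores highest; a term that appears in every document gets an IDF of zero.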

8 of 26

Previous Works

  • Paragraph Extraction:
    • Finds the most important paragraphs of a document by determining how the paragraphs relate to each other.
    • Each paragraph is represented by a node in a graph, and similar paragraphs are linked by edges.
    • Nodes are traversed in a specific way (bushy path, depth-first path, segmented bushy path, etc.), depending on some criteria, to select the most important nodes for the summary.
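As a rough sketch of the bushy-path idea, not the exact algorithm used here: assume "bushiness" is measured by node degree (how many other paragraphs a paragraph is linked to) and a simple word-overlap (Jaccard) similarity decides the edges.

```python
def jaccard(a, b):
    """Word-overlap (Jaccard) similarity between two paragraphs."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def bushy_path(paragraphs, similarity, threshold=0.1, k=2):
    """Pick the k 'bushiest' paragraphs: those connected to the most
    other paragraphs by above-threshold similarity edges."""
    n = len(paragraphs)
    degree = [sum(1 for j in range(n)
                  if i != j and similarity(paragraphs[i], paragraphs[j]) >= threshold)
              for i in range(n)]
    ranked = sorted(range(n), key=lambda i: degree[i], reverse=True)
    return [paragraphs[i] for i in sorted(ranked[:k])]  # keep document order
```

The selected paragraphs are returned in their original order so the summary reads coherently.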

9 of 26

Methodology

  • Approach 1:
    • WordRank
      • We modified TextRank slightly to find the central words in a document.
      • We first split an article into sentences.
      • Then we extracted the nouns, adjectives, and verbs from each sentence and converted them into vectors using Word2Vec.
      • We built a graph with each word as a node; edges between nodes are weighted by cosine similarity.
      • The score of a word is the sum of the weights of all edges incident on its node.
      • The score of a sentence is the sum of the scores of all the words in it.
      • The highest-scoring sentences are picked to generate the summary.
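The scoring steps above can be sketched as follows. This is a simplified illustration: the toy vector table stands in for Word2Vec embeddings, and whitespace tokens with a vector stand in for the POS-filtered nouns, adjectives, and verbs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def wordrank_summary(sentences, vectors, n_pick=1):
    """WordRank sketch: word score = sum of the cosine edge weights
    incident on the word's node; sentence score = sum of word scores."""
    words = sorted({w for s in sentences for w in s.lower().split() if w in vectors})
    score = {w: sum(cosine(vectors[w], vectors[u]) for u in words if u != w)
             for w in words}
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sum(score.get(w, 0.0)
                                      for w in sentences[i].lower().split()),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_pick])]
```

Words whose vectors sit near many other content words accumulate high scores, pulling their sentences into the summary.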

10 of 26

Methodology

  • WordRank:

11 of 26

Comparison

  • Comparison between TextRank and WordRank

ROUGE-1 Score

Algorithm   Recall   Precision   F1-Score
TextRank    47.07    35.31       37.75
WordRank    49.79    29.48       34.92

ROUGE-2 Score

Algorithm   Recall   Precision   F1-Score
TextRank    21.39    16.26       17.31
WordRank    22.45    13.31       15.79

12 of 26

Methodology

  • Approach 2:
    • Hybrid:
      • A combination of TextRank and WordRank.
      • We found that TextRank yields higher precision, while WordRank yields higher recall.
      • So we decided to combine them in a way that improves both.
      • Four such combinations are shown on the following slides.
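The two-stage combinations (e.g. TR_50_Word_16: TextRank down to 50%, then WordRank down to the final length) can be sketched generically. The ranker callables here are hypothetical stand-ins taking (sentences, n) and returning the top n sentences:

```python
def hybrid_summary(sentences, first_pass, second_pass,
                   first_ratio=0.50, second_ratio=0.16):
    """Two-stage hybrid: the first summarizer keeps first_ratio of the
    document's sentences; the second distills that intermediate summary
    down to second_ratio of the original document length."""
    n1 = max(1, round(len(sentences) * first_ratio))
    intermediate = first_pass(sentences, n1)
    n2 = max(1, round(len(sentences) * second_ratio))
    return second_pass(intermediate, n2)
```

Swapping the order of the two passes gives the Word_*_TR_* variants.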

13 of 26

Methodology

  • TR_50_Word_16:
    Document → TextRank → Summary (50%) → WordRank → Final Summary

14 of 26

Methodology

  • TR_64_Word_125:
    Document → TextRank → Summary (64%) → WordRank → Final Summary

15 of 26

Comparison

  • Comparison between TextRank and the modified algorithms:

Algorithm        ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-SU4
TR_50_Word_16    35.14     15.96     26.00     18.31
TR_64_Word_125   35.01     15.85     25.83     18.20
TextRank         37.75     17.31     29.02     19.82

16 of 26

Methodology

  • Word_50_TR_16:
    Document → WordRank → Summary (50%) → TextRank → Final Summary

17 of 26

Methodology

  • Word_64_TR_125:
    Document → WordRank → Summary (64%) → TextRank → Final Summary

18 of 26

Result

Dataset:

  • The algorithms are evaluated on a single-document summarization task, using articles from PubMed.
  • The abstract of an article is treated as the summary of the whole article.
  • We selected 9,978 of 100,000 articles as our dataset.
  • For each document, we produced a summary whose length is 8% of the original document.
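Since the abstract serves as the reference summary, evaluation compares n-gram overlap between the generated summary and the abstract. A minimal sketch of the ROUGE-1 scores reported here, assuming whitespace tokenization (the actual evaluation presumably used a standard ROUGE toolkit):

```python
from collections import Counter

def rouge_1(summary, reference):
    """ROUGE-1: clipped unigram overlap between a system summary and the
    reference (here, the article's abstract).
    Returns (recall, precision, F1)."""
    sum_counts = Counter(summary.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((sum_counts & ref_counts).values())  # clipped match count
    recall = overlap / max(1, sum(ref_counts.values()))
    precision = overlap / max(1, sum(sum_counts.values()))
    f1 = 2 * recall * precision / (recall + precision) if overlap else 0.0
    return recall, precision, f1
```

ROUGE-2 is the same computation over bigrams; ROUGE-L uses the longest common subsequence instead of fixed n-grams.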

19 of 26

Result

  • Comparisons among all the algorithms are shown below:

20 of 26

Result

21 of 26

Result

Median of the ROUGE Scores

Algorithm              ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-SU4
Paragraph Extraction   34.41     16.31     27.78     19.09
TR_50_Word_16          35.83     15.09     26.02     17.58
TR_64_Word_125         35.68     15.00     25.89     17.45
TextRank               38.47     16.56     28.99     19.16
WordRank               35.56     14.95     25.84     17.43
Word_50_TR_16          38.18     16.61     29.00     19.15
Word_64_TR_125         38.28     16.70     29.05     19.25

22 of 26

Result

Standard Deviation (SD) of the ROUGE Scores

Algorithm              ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-SU4
Paragraph Extraction   0.0767    0.0611    0.0554    0.0606
TR_50_Word_16          0.1093    0.0833    0.0697    0.0826
TR_64_Word_125         0.1090    0.0827    0.0696    0.0821
TextRank               0.1085    0.0856    0.0750    0.0836
WordRank               0.1092    0.0825    0.0697    0.0820
Word_50_TR_16          0.1095    0.0866    0.0749    0.0849
Word_64_TR_125         0.1090    0.0862    0.0748    0.0844

23 of 26

Result

Sentiment Polarity Comparison of the Algorithms

Algorithm              No. of Summaries with Opposite Polarity   % of Error
Paragraph Extraction   1952                                      19.56
TR_50_Word_16          1973                                      19.77
TR_64_Word_125         1959                                      19.63
TextRank               1926                                      19.30
WordRank               1990                                      19.94
Word_50_TR_16          1947                                      19.51
Word_64_TR_125         1950                                      19.54

24 of 26

Result

Subjectivity Comparison of the Algorithms

Algorithm              Squared Difference   RMS Difference
Paragraph Extraction   0.0072               0.0835
TR_50_Word_16          0.0012               0.0351
TR_64_Word_125         0.0045               0.0674
TextRank               0.0102               0.1010
WordRank               0.0005               0.0222
Word_50_TR_16          0.0041               0.0638
Word_64_TR_125         0.0064               0.0798

25 of 26

Future Works

  • We will try to optimize these algorithms to perform better.
  • We will run them on other datasets to see how they perform.
  • We will ask domain experts to score our summaries.
  • We will apply machine learning methods to see if we can generate better summaries.

26 of 26

Thank You