1 of 15

Goal-Directed Extractive Summarization of Financial Reports

Yash Agrawal1, Vivek Anand1, Manish Gupta1,2, S Arunachalam2, Vasudeva Varma1

1IIIT-Hyderabad, 2ISB Hyderabad

2 of 15

Financial Annual Report Summarization

  • Automatically summarize financial annual reports
    • American 10-K filings
    • Extractive Summarization
  • Goal of summaries:
    • To help make a buy-or-sell decision.
    • To incorporate operations-related information.

3 of 15

Related Work

  • Financial Narrative Summarization (Shared Task)
    • Considers sections as summaries
    • Uses simple regex-based tools to extract sections
  • Xiong and Litman, COLING 2014
    • Review summarization using review helpfulness as guide
    • Used supervised LDA (sLDA) to rank sentences for extractive summarization
  • Ng et al, COLING 2012
    • Used category specific features to guide sentence ranks for summarization

4 of 15

Approach and Evaluation Overview

  • Unsupervised Approach
  • Goal Directed Summarization
  • Goal Specific Evaluation
  • Model Reliability using Portfolio Construction

[Figure: approach overview, with panels labeled Summarization and Evaluation]

5 of 15

Data Collection

  • 10-K filings from the SEC (the Indian equivalent is SEBI).
  • SIC (Standard Industrial Classification) codes from the SEC; 414 unique codes in our dataset.
  • Stock price data from Yahoo Finance.
  • MD&A (Management's Discussion and Analysis) section extracted using regex.
  • Stock price data is used to label each 10-K filing as buy or sell.
  • Half of the dataset is reserved for evaluation, because the evaluation itself involves training.
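The regex-based MD&A extraction step can be sketched as follows. This is a minimal illustration, not the authors' actual pattern: it assumes that MD&A appears as Item 7 of a 10-K and ends at Item 7A or Item 8, and the names `MDA_PATTERN` and `extract_mda` are hypothetical. Real filings vary widely in formatting, so a production pattern would need more cases.

```python
import re

# Assumption: MD&A is Item 7 and runs until the next Item 7A or Item 8 heading.
MDA_PATTERN = re.compile(
    r"item\s+7\.?\s*management['’]s\s+discussion\s+and\s+analysis"
    r"(.*?)(?=item\s+7a\b|item\s+8\b)",
    re.IGNORECASE | re.DOTALL,
)


def extract_mda(filing_text):
    """Return the longest Item 7 candidate, or None if nothing matches.

    The table of contents usually repeats the Item 7 heading, so the
    longest candidate is taken to be the section body.
    """
    candidates = [m.group(1).strip() for m in MDA_PATTERN.finditer(filing_text)]
    return max(candidates, key=len) if candidates else None
```

Choosing the longest match is a simple heuristic for skipping the table-of-contents entry; it is an assumption of this sketch rather than something stated on the slide.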

6 of 15

Architecture

  • Hierarchical network with sentence-level attention
  • SBERT encodes the sentences; a Bi-LSTM encodes the document
  • The model is referred to as HFinSum
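To illustrate how sentence-level attention can double as a sentence ranker, here is a minimal sketch in plain Python. It assumes dot-product scoring against a context vector followed by a softmax; the slide does not give the attention formula, so that choice is an assumption. In the real model the sentence vectors would come from SBERT and the Bi-LSTM, and the context vector would be learned; the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(sentence_vectors, context_vector):
    """Score each sentence encoding against the context vector, then normalize."""
    scores = [
        sum(h * u for h, u in zip(vec, context_vector))
        for vec in sentence_vectors
    ]
    return softmax(scores)


def extract_summary(sentences, weights, k):
    """Keep the k highest-weighted sentences, in original document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Returning the selected sentences in document order keeps the extractive summary readable; ranking alone would reorder them by weight.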

7 of 15

Incorporating Operations Information 

"We expect growth in real-estate business."    >>     "We expect growth."

We take a multi-task learning approach.

8 of 15

Final Architecture

  • Stock Movement Prediction (Goal 1)
  • Industry Classification (Goal 2)
  • Multi-task learning: the model is trained on both goals simultaneously; the two losses are averaged.
  • The model is referred to as MHFinSum.
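The averaged multi-task loss can be sketched as below. The slide only states that the loss is averaged; using cross-entropy for each task is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(probs[target])


def multitask_loss(stock_probs, stock_label, industry_probs, industry_label):
    """Average of the stock-movement (Goal 1) and industry (Goal 2) losses."""
    stock_loss = cross_entropy(stock_probs, stock_label)
    industry_loss = cross_entropy(industry_probs, industry_label)
    return (stock_loss + industry_loss) / 2.0
```

With equal weighting, neither goal dominates the gradient by construction; a weighted sum would be the natural variation if one goal needed more emphasis.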

9 of 15

Evaluation

  • Intrinsic: 50 filings selected at random from the evaluation split were summarized by experts; these annotations serve as reference summaries.
  • Extrinsic: Are the summaries better at the goal itself?
    • Train a HAN (Hierarchical Attention Network, a document classifier) on the given summaries for stock movement classification.
  • Baselines
    • BERTSUM: BERTSUMEXT and BERTSUMEXTABS.
    • TEXTRANK: PageRank with sentences as nodes in the graph.
    • LEXRANK: PageRank + hyperlink-induced topic search (HITS); eigenvector centrality in the graph.
    • LSA: Semantically important sentences through matrix decomposition.
  • Metrics
    • Intrinsic: ROUGE-1, ROUGE-2, ROUGE-L.
    • Extrinsic: Accuracy and Matthews Correlation Coefficient (MCC) for stock prediction.
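For reference, the two extrinsic metrics can be computed from a binary confusion matrix as in the sketch below. The label encoding (1 for buy, 0 for sell) and the function names are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def mcc(y_true, y_pred, positive=1):
    """Matthews Correlation Coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC is useful alongside accuracy because it stays near zero for a classifier that always predicts the majority class, even when the buy and sell labels are imbalanced.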

10 of 15

Results

11 of 15

Model Reliability Analysis

  • Attention weights drive the predictions; what if the predictions are wrong?
  • We analyze the predictions made by the model by constructing a portfolio.

[Figure annotations: "Used to get summary sentences"; "Used to construct portfolio for analysis"; "Stock movement prediction"]

12 of 15

Portfolio Construction Methodology

  • We use the complete evaluation split for this experiment.
  • For each company and year, the model outputs buy and sell probabilities.
  • Sort the buy probabilities in descending order and choose the top P most probable buy companies to form the portfolio.
    • We experimented with P = 10, 25, 35, 50.
  • This is an equal-weighted, long-only portfolio.
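The selection procedure above can be sketched in a few lines. The ticker symbols, the dictionary-based inputs, and the function names are hypothetical; the sketch only shows the top-P, equal-weighted, long-only construction for a single year.

```python
def build_portfolio(buy_probabilities, p):
    """Pick the p companies with the highest buy probability, equally weighted.

    buy_probabilities maps a company identifier to its predicted buy probability.
    Returns a mapping from the chosen companies to their portfolio weights.
    """
    ranked = sorted(buy_probabilities, key=buy_probabilities.get, reverse=True)
    chosen = ranked[:p]
    if not chosen:
        return {}
    weight = 1.0 / len(chosen)
    return {company: weight for company in chosen}


def portfolio_return(weights, stock_returns):
    """Weighted sum of per-company returns for the holding period."""
    return sum(w * stock_returns[company] for company, w in weights.items())


# Hypothetical example inputs for one year.
probs = {"AAA": 0.9, "BBB": 0.2, "CCC": 0.7, "DDD": 0.6}
returns = {"AAA": 0.10, "BBB": 0.30, "CCC": -0.02, "DDD": 0.05}
holdings = build_portfolio(probs, p=2)
yearly_return = portfolio_return(holdings, returns)
```

Because the portfolio is long-only, the sell probabilities are never used to open short positions; they only affect which companies fail to make the top P.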

13 of 15

Portfolio Returns

  • Calculated for the period 1994-2018.
  • Compound annual growth rate (CAGR) is the constant annual return that, compounded over the period, would yield the same overall return.
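As a worked illustration of the CAGR definition, the sketch below uses the standard formula (end value / start value)^(1 / years) - 1. The function name and the example values are hypothetical and are not results from the presentation.

```python
def cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical example: a portfolio growing from 100 to 121 over 2 years.
example_rate = cagr(100.0, 121.0, 2)
```

In this example the portfolio gains 21% overall, which corresponds to roughly 10% per year once compounding is taken into account.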

14 of 15

Conclusion

  • Proposed goal-based summarization and evaluation in the finance domain.
  • Incorporating different goals gives control over the extracted summaries.
    • Operations-related information
  • Intrinsic and extrinsic evaluation indicate that summaries extracted by the proposed method are better able to satisfy the goal.
  • The models tend to be reliable (portfolio analysis), so attention weights can be used for ranking sentences.

15 of 15

Thank You

Yash Agrawal (yash.agrawal@research.iiit.ac.in)

Vivek Anand (vivek.a@research.iiit.ac.in)

Manish Gupta (manish.gupta@iiit.ac.in)

S Arunachalam (s_arunachalam@isb.edu)

Vasudeva Varma (vv@iiit.ac.in)
