Fact Aware Multi-Task Learning for Text Coherence Modeling
Tushar Abhishek, Daksh Rawat, Manish Gupta, Vasudeva Varma
tushar.abhishek@research.iiit.ac.in, daksh.rawat@students.iiit.ac.in, manish.gupta@iiit.ac.in, vv@iiit.ac.in
What is textual coherence?
(Side-by-side example passages contrasting Coherent Text vs. Incoherent Text.)
Previous work
Proposed Transformer-based architectures
RoBERTa (Liu et al., 2019) for short sequences (less than 512 tokens)
Longformer (Beltagy et al., 2020) for very long sequences (up to 2048 tokens)
1. Vanilla Transformer
2. Fact-Aware Transformers
3. Fact-Aware Multi-Task Learning (MTL) Transformers
Combining a coherence-specific loss with an NLI-based task loss (see the sketch below)
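A minimal PyTorch sketch of the multi-task setup, assuming a shared RoBERTa encoder (the short-sequence variant above) with two heads: a coherence scorer trained with a pairwise margin-ranking loss over original vs. sentence-permuted documents, and an NLI classifier. The head shapes, margin, and the loss weight `nli_weight` are illustrative assumptions, not the paper's reported hyperparameters.

```python
import torch.nn as nn
import torch.nn.functional as F
from transformers import RobertaModel

class MTLCoherenceModel(nn.Module):
    """Shared RoBERTa encoder with a coherence-scoring head and an NLI head.
    Illustrative sketch: head sizes and loss weighting are assumptions."""
    def __init__(self, nli_weight: float = 0.5, margin: float = 1.0):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        hidden = self.encoder.config.hidden_size
        self.coherence_head = nn.Linear(hidden, 1)  # scalar coherence score
        self.nli_head = nn.Linear(hidden, 3)        # entail / neutral / contradict
        self.nli_weight = nli_weight
        self.margin = margin

    def score(self, batch):
        # Use the <s> (CLS) token representation as the document embedding.
        cls = self.encoder(**batch).last_hidden_state[:, 0]
        return self.coherence_head(cls).squeeze(-1)

    def forward(self, pos_batch, neg_batch, nli_batch, nli_labels):
        # Coherence-specific loss: rank each original document above its
        # sentence-permuted counterpart by at least `margin`.
        pos, neg = self.score(pos_batch), self.score(neg_batch)
        coherence_loss = F.relu(self.margin - pos + neg).mean()
        # Auxiliary NLI task loss computed on the shared encoder.
        nli_cls = self.encoder(**nli_batch).last_hidden_state[:, 0]
        nli_loss = F.cross_entropy(self.nli_head(nli_cls), nli_labels)
        return coherence_loss + self.nli_weight * nli_loss
```

For documents beyond RoBERTa's 512-token limit, the same heads would sit on top of Longformer instead.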
Text coherence evaluation tasks: sentence ordering (WSJ), 3-way coherence classification (GCDC), automated essay scoring (ASAP)
Text coherence datasets: WSJ (sentence ordering)
| Split | Document count | Average sent. count | Average word count | Synthetic document count |
|---|---|---|---|---|
| Train | 1376 | 21.0 | 529.8 | 29720 |
| Test | 1090 | 21.9 | 564.3 | 21800 |
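The synthetic documents are sentence-order permutations of the originals (the test split works out to exactly 20 permutations per document: 21800 / 1090). A sketch of how such negatives can be generated; the function name and permutation count are illustrative assumptions.

```python
import math
import random
from typing import List

def permute_document(sentences: List[str], n_perms: int = 20,
                     seed: int = 13) -> List[List[str]]:
    """Generate up to n_perms distinct sentence-order permutations of a
    document, excluding the original order (assumes distinct sentences)."""
    rng = random.Random(seed)
    target = min(n_perms, math.factorial(len(sentences)) - 1)
    seen, perms = {tuple(sentences)}, []
    while len(perms) < target:
        shuffled = sentences[:]
        rng.shuffle(shuffled)
        if tuple(shuffled) not in seen:
            seen.add(tuple(shuffled))
            perms.append(shuffled)
    return perms

doc = ["Mary ate some apples.", "She likes apples.", "They were fresh."]
negatives = permute_document(doc, n_perms=3)  # 3 shuffled variants
```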
Text coherence datasets: GCDC (3-way coherence classification)
| Domain | Document count | Average sent. count | Average word count | Low / Medium / High coherence (%) |
|---|---|---|---|---|
| Yahoo | 1200 | 7.5 | 162.1 | 46.6 / 17.4 / 37.0 |
| Clinton | 1200 | 6.6 | 189.0 | 28.2 / 20.6 / 51.1 |
| Enron | 1200 | 7.7 | 196.2 | 29.9 / 19.4 / 50.7 |
| Yelp | 1200 | 7.5 | 183.1 | 27.1 / 21.8 / 51.1 |
Text coherence datasets: ASAP (automated essay scoring)
| Prompt | Essay count | Genre | Average word count | Score range |
|---|---|---|---|---|
| 1 | 1783 | argumentative | 350 | 2-12 |
| 2 | 1800 | argumentative | 350 | 2-12 |
| 3 | 1726 | response | 150 | 0-3 |
| 4 | 1772 | response | 150 | 0-3 |
| 5 | 1805 | response | 150 | 0-4 |
| 6 | 1800 | response | 150 | 0-4 |
| 7 | 1569 | narrative | 250 | 0-30 |
| 8 | 723 | narrative | 650 | 0-60 |
Experimental setup
Results: Sentence ordering on WSJ
| | Model | PRA (%) |
|---|---|---|
| Baselines | LC [Li and Hovy, 2014] | 74.10 |
| | PARSEQ [Lai and Tetreault, 2018] | 74.10 |
| | Seq2Seq [Li and Jurafsky, 2017] | 86.95 |
| | CNN-Egrid [Mohiuddin et al., 2018] | 88.69 |
| | Unified (ELMo) [Moon et al., 2019] | 93.19 |
| | Coh+GR [Farag and Yannakoudakis, 2019] | 93.20 |
| | LCD-L [Xu et al., 2019] | 95.49 |
| | Coh+GR_BERT [Farag et al., 2020] | 96.10 |
| | LCD_BERT [Farag et al., 2020] | 97.10 |
| Ours | Vanilla Transformer | 97.34 |
| | Fact-aware Transformer | 97.81 |
| | Fact-aware MTL Transformer | 98.22 |
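PRA here reads as pairwise ranking accuracy: the percentage of (original, permuted) document pairs where the model scores the original higher. A minimal sketch under that assumption:

```python
from typing import Sequence

def pairwise_ranking_accuracy(orig_scores: Sequence[float],
                              perm_scores: Sequence[float]) -> float:
    """Percentage of pairs where the original document outscores its
    sentence-permuted counterpart. Assumes the two lists are aligned."""
    wins = sum(o > p for o, p in zip(orig_scores, perm_scores))
    return 100.0 * wins / len(orig_scores)

# pairwise_ranking_accuracy([2.1, 1.8], [1.3, 1.9]) -> 50.0
```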
Results: 3-way classification on GCDC (accuracy, %)
| | Model | Yahoo | Clinton | Enron | Yelp | Average |
|---|---|---|---|---|---|---|
| Baselines | Flesch-Kincaid grade level [Kincaid et al., 1975] | 43.5 | 56.0 | 52.5 | 55.0 | 51.8 |
| | Coh+SOX [Farag and Yannakoudakis, 2019] | 50.5 | 58.5 | 51.0 | - | 53.3 |
| | Hierarchical LSTM [Farag and Yannakoudakis, 2019] | 55.0 | 59.0 | 50.5 | - | 54.8 |
| | PARSEQ [Lai and Tetreault, 2018] | 54.9 | 60.2 | 53.2 | 54.4 | 55.7 |
| | LC [Li and Hovy, 2014] | 53.5 | 61.0 | 54.4 | - | 56.3 |
| | PARSEQ (all) [Lai and Tetreault, 2018] | 58.5 | 61.0 | 53.9 | 56.5 | 57.5 |
| | Coh+GR [Farag and Yannakoudakis, 2019] | 56.0 | 62.0 | 56.0 | - | 58.0 |
| | Incremental-lex-coh [Jeon et al., 2020] | 57.3 | 61.3 | 54.5 | 59.0 | 58.1 |
| | Avg-RoBERTa-Doc [Jeon et al., 2020] | 60.0 | 65.3 | 55.0 | 58.8 | 59.8 |
| | Avg-XLNET-Doc [Jeon et al., 2020] | 60.5 | 65.9 | 56.9 | 59.0 | 60.6 |
| Ours | Vanilla Transformer (all) | 58.1 | 63.9 | 55.3 | 57.6 | 58.7 |
| | Fact-aware Transformer | 59.2 | 67.2 | 56.3 | 58.5 | 60.3 |
| | Fact-aware MTL Transformer | 60.7 | 67.4 | 56.4 | 59.0 | 60.8 |
Results: Automated Essay Scoring on ASAP (QWK)
| | Model | Prompt 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| Baselines | CohLSTM [Mesgar et al., 2018] | 0.669 | 0.634 | 0.591 | 0.710 | 0.639 | 0.716 | 0.729 | 0.641 | 0.666 |
| | EASE (SVR) | 0.781 | 0.630 | 0.621 | 0.749 | 0.782 | 0.771 | 0.727 | 0.534 | 0.699 |
| | EASE (BLRR) | 0.761 | 0.606 | 0.621 | 0.742 | 0.784 | 0.775 | 0.730 | 0.617 | 0.705 |
| | EASE+CohLSTM [Mesgar et al., 2018] | 0.784 | 0.654 | 0.663 | 0.788 | 0.793 | 0.794 | 0.756 | 0.646 | 0.735 |
| | Constraint MTL [Cummins et al., 2016] | 0.816 | 0.667 | 0.654 | 0.783 | 0.801 | 0.778 | 0.787 | 0.692 | 0.747 |
| | Attention-based RCNN [Dong et al., 2017] | 0.822 | 0.682 | 0.672 | 0.814 | 0.803 | 0.811 | 0.801 | 0.705 | 0.764 |
| | SkipFlow [Tay et al., 2018] | 0.832 | 0.684 | 0.695 | 0.788 | 0.815 | 0.810 | 0.800 | 0.697 | 0.765 |
| Ours | Longformer | 0.824 | 0.660 | 0.693 | 0.820 | 0.795 | 0.810 | 0.817 | 0.701 | 0.765 |
| | Longformer + Fact-aware MTL Transformer | 0.822 | 0.674 | 0.696 | 0.821 | 0.798 | 0.812 | 0.822 | 0.699 | 0.768 |
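The ASAP figures follow the benchmark's standard agreement metric, quadratic weighted kappa (QWK), between predicted and human scores; a sketch using scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

def qwk(gold, pred):
    """Quadratic weighted kappa between gold and predicted essay scores.
    Expects integer scores on the prompt's scale (e.g. 2-12 for prompt 1)."""
    return cohen_kappa_score(gold, pred, weights="quadratic")

print(qwk([8, 9, 10, 7], [8, 9, 9, 7]))  # high agreement, close to 1.0
```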
Qualitative analysis: Lexical coherence
| Text input | Vanilla | MTL |
|---|---|---|
| Mary ate some apples. She likes apples. | 1.45 | 2.00 |
| Mary ate some apples. She likes pears. | 1.37 | 1.84 |
| Mary ate some apples. She likes Paris. | 1.26 | 1.52 |
| Pinochet was arrested. His arrest was unexpected. | 1.81 | 2.76 |
| Pinochet was arrested. His death was unexpected. | 1.67 | 1.56 |
Qualitative analysis: Temporal order
| Text input | Vanilla | Ours |
|---|---|---|
| Washington was unanimously elected president in the first two national elections. He oversaw the creation of a strong, well-financed national government. | 1.93 | 2.79 |
| Washington oversaw the creation of a strong, well-financed national government. He was unanimously elected president in the first two national elections. | 1.88 | 2.36 |
Qualitative analysis: Centering/Referential coherence
| Text input | Vanilla | Ours |
|---|---|---|
| John went to his favorite music store to buy a piano. He had frequented the store for many years. He was excited that he could finally buy a piano. He arrived just as the store was closing for the day. | 2.38 | 2.86 |
| John went to his favorite music store to buy a piano. It was a store John had frequented for many years. He was excited that he could finally buy a piano. It was closing just as John arrived. | 2.45 | 2.67 |
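The scores in these tables come from a forward pass of the trained scorer over each text. A sketch of reading off such scores, reusing the illustrative `MTLCoherenceModel` from earlier (the checkpoint and tokenizer are assumptions; a fine-tuned model would be loaded in practice):

```python
import torch
from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = MTLCoherenceModel()  # illustrative; load fine-tuned weights in practice
model.eval()

texts = [
    "Mary ate some apples. She likes apples.",
    "Mary ate some apples. She likes Paris.",
]
batch = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    scores = model.score(batch)  # higher = judged more coherent
```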
Take-aways
THANK YOU