1 of 20

WebNLG: Natural Language Generation

Team Members:

Agnese Chiatti

Thanh Tran

Tejas Mahale

Rajeev Bhatt Ambati

1

2 of 20

Motivation

  • Preprocessing
    • Submissions are simple NMT models.
    • Their high BLEU scores show the importance of preprocessing.

  • Solutions
    • Improved Delexicalisation
    • Aggregation
    • Constituency Parsed Trees


Method             BLEU

UPF-FORGE          35.70
MELBOURNE          33.27
PKU WRITER         25.36
ADAPT              10.53
TILB-NMT           25.12
Interim Baseline    1.56

3 of 20

Motivation

  • Problem of unknown tokens.
    • Neural Network architectures cannot handle out-of-vocabulary (OOV) words.


4 of 20

Motivation

  • Misplacing of nouns.
    • Embeddings of similar words are clustered.

  • Solutions
    • Pointer Generator Networks
    • Improved Data Preparation:
      • Delexicalisation, Aggregation, grammar-based templates

5 of 20

Outline

  • Data Preprocessing
    • Improved Delexicalisation
    • Aggregation
    • Constituency Parsed Trees
  • Models
    • Attention
    • Pointer Generator Networks
  • Results
  • Conclusion


6 of 20

Improved Delexicalisation

  • Example of input:
    • TRIPLES:
      • Addiction_(journal) | publisher | Wiley-Blackwell
      • Addiction_(journal) | ISSN_number | "1360-0443"
      • Addiction_(journal) | LCCN_number | 93645978
      • Addiction_(journal) | abbreviation | "Addiction"
    • LEX:
      • The Addiction Journal is published by Wiley - Blackwell and is abbreviated to Addiction. The ISSN number is 1360 - 0443 and the LCCN number is 93645978 .


7 of 20

Improved Delexicalisation

  • Example of delexicalisation:
    • TRIPLES: ENTITY1 UNIVERSITY publisher ENTITY2 ORGANIZATION ENTITY1 UNIVERSITY issn number ENTITY3 UNK ENTITY1 UNIVERSITY lccn number ENTITY4 NUMBER ENTITY1 UNIVERSITY abbreviation ENTITY5 UNK
    • LEX: The ENTITY5 Journal is published by ENTITY2 and is abbreviated to ENTITY1 number is ENTITY3 and the LCCN number is ENTITY4.
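The delexicalisation step above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical helper names; the actual pipeline also assigns semantic type tags (UNIVERSITY, ORGANIZATION, …) and handles tokenisation differences, which are omitted here.

```python
# Minimal delexicalisation sketch (hypothetical; the real pipeline also
# attaches semantic type tags and handles tokenisation mismatches).
def delexicalise(triples, lex):
    """Replace entity strings with ENTITYk placeholders in triples and lex."""
    entity_ids = {}
    out_triples = []
    for subj, prop, obj in triples:
        for ent in (subj, obj):
            if ent not in entity_ids:
                entity_ids[ent] = f"ENTITY{len(entity_ids) + 1}"
        out_triples.append((entity_ids[subj], prop, entity_ids[obj]))
    for ent, placeholder in entity_ids.items():
        # Underscores in DBpedia entity names correspond to spaces in text
        surface = ent.strip('"').replace("_", " ")
        lex = lex.replace(surface, placeholder)
    return out_triples, lex

triples = [("Addiction_(journal)", "publisher", "Wiley-Blackwell")]
delex_triples, delex_lex = delexicalise(
    triples, "Addiction (journal) is published by Wiley-Blackwell.")
# delex_lex == "ENTITY1 is published by ENTITY2."
```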


8 of 20

Aggregation

  • Example of aggregation:
    • TRIPLES: ENTITY1 UNIVERSITY publisher ENTITY2 ORGANIZATION ENTITY1 UNIVERSITY issn number ENTITY3 UNK ENTITY1 UNIVERSITY lccn number ENTITY4 NUMBER ENTITY1 UNIVERSITY abbreviation ENTITY5 UNK
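One common form of aggregation, sketched below, groups triples that share a subject so the subject token is emitted only once in the input sequence. The exact grouping scheme used here is an assumption; type tags are omitted for brevity.

```python
# Sketch of subject-based aggregation (the exact scheme is an assumption):
# triples sharing a subject are grouped so the subject appears only once.
def aggregate(triples):
    grouped = {}  # dicts preserve insertion order in Python 3.7+
    for subj, prop, obj in triples:
        grouped.setdefault(subj, []).append((prop, obj))
    tokens = []
    for subj, pairs in grouped.items():
        tokens.append(subj)
        for prop, obj in pairs:
            tokens.extend([prop, obj])
    return " ".join(tokens)

seq = aggregate([("ENTITY1", "publisher", "ENTITY2"),
                 ("ENTITY1", "abbreviation", "ENTITY5")])
# seq == "ENTITY1 publisher ENTITY2 abbreviation ENTITY5"
```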


9 of 20

Constituency Parsed Trees

  • Inspired by observations on UPF-FORGE (Mille and Dasiopoulou, 2017)
    • But manual template generation is inefficient
    • Could it create synergy if combined with an approach similar to UMEL?

  • From sentences to constituents
  • CFG (terminal/non-terminal symbols)

  • On the other hand:
    • Longer sequences require more iterations (increased computational cost)
    • Risk of underfitting


10 of 20

Constituency Parsed Trees for WebNLG

  • On training:
    • Lex sentences are parsed into constituency trees with the Stanford CoreNLP toolkit (Manning et al., 2014)
  • On Dev/Test:
    • Incoming (delexicalised) triple sequences are POS tagged
    • WordNet-based similarity with 50 randomly picked training examples is computed (pairwise comparison)
    • We followed the Mihalcea et al. (2006) approach for similarity computation
      • Corpus-based similarity to try to capture semantic similarity
      • WordNet is queried for token-level similarity
      • Then, for each token in a sequence, find its most similar word in the other sequence
      • Scores are averaged, normalised by sequence length
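The similarity computation above can be sketched as follows. The `word_sim` function stands in for a WordNet-based token similarity (which would need a WordNet interface such as NLTK's); a toy exact-match similarity is used here so the example stays self-contained.

```python
# Sketch of the Mihalcea-style sequence similarity described above.
# `word_sim` is a stand-in for WordNet-based token similarity; a toy
# exact-match score keeps the example self-contained.
def word_sim(a, b):
    return 1.0 if a == b else 0.0

def sequence_similarity(seq_a, seq_b):
    """For each token, take its best match in the other sequence,
    average per direction (normalised by length), then symmetrise."""
    def directed(src, tgt):
        if not src:
            return 0.0
        return sum(max(word_sim(w, v) for v in tgt) for w in src) / len(src)
    return 0.5 * (directed(seq_a, seq_b) + directed(seq_b, seq_a))

score = sequence_similarity("ENTITY1 publisher ENTITY2".split(),
                            "ENTITY1 abbreviation ENTITY2".split())
# score == 2/3: two of three tokens have an exact match in each direction
```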


11 of 20

Attention Model

  • Problem formulation

  • Attention distribution
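The equations on this slide were images and did not survive extraction. In standard notation (an assumption, following Bahdanau et al., 2015 and See et al., 2017), with encoder hidden states $h_i$ and decoder state $s_t$, the attention distribution is:

```latex
% Reconstructed in standard notation (assumed; original equations were images)
e_i^t = \mathrm{score}(s_t, h_i), \qquad
a^t = \mathrm{softmax}(e^t)
```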


12 of 20

Attention Model

  • Context vector

  • Scoring function
    • Bahdanau et al.: feed-forward network

    • Luong et al.:
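The original equations here were also images. The standard forms, which these slides presumably follow, are:

```latex
% Standard forms (assumed; original equations were images)
% Context vector: attention-weighted sum of encoder states
c_t = \sum_i a_i^t h_i
% Bahdanau et al. (2015): additive scoring via a feed-forward network
\mathrm{score}(s_t, h_i) = v^\top \tanh(W_s s_t + W_h h_i)
% Luong et al. (2015): multiplicative (bilinear) scoring
\mathrm{score}(s_t, h_i) = s_t^\top W h_i
```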


13 of 20

Pointer Generator Networks

  • Generation probability

  • Total probability distribution

  • Log-likelihood
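The three quantities above are, in the standard pointer-generator formulation (See et al., 2017; assumed to match the slide, whose equations were images):

```latex
% Standard pointer-generator equations (See et al., 2017; assumed)
% Generation probability from context vector c_t, decoder state s_t, input x_t:
p_{\mathrm{gen}} = \sigma\!\left(w_c^\top c_t + w_s^\top s_t + w_x^\top x_t + b\right)
% Total distribution: mix of vocabulary and copy (attention) distributions:
P(w) = p_{\mathrm{gen}}\, P_{\mathrm{vocab}}(w)
     + (1 - p_{\mathrm{gen}}) \sum_{i:\, w_i = w} a_i^t
% Per-step negative log-likelihood of the target word w_t^*:
\mathrm{loss}_t = -\log P(w_t^*)
```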


14 of 20

Visualization

  • Interactive Visualization


15 of 20

Visualization


16 of 20

Results

With our evaluation schema (Short: < 100 characters; Long: otherwise)


17 of 20

Results

With the WebNLG Challenge evaluation script (tends to overestimate)


18 of 20

Lessons Learned

  • Delexicalisation improved performance on both sets
  • Delexicalisation + aggregation was more effective, overall, than constituency trees
  • But the constituency-based solution was the only one that performed better on longer sequences than on shorter ones
  • The attention-based solution led to our top performance and would have placed 1st on unseen classes in the challenge
  • Terms were successfully copied from input to output when using the Pointer Generator Network
  • Team distribution of work was great! ☺


19 of 20

Next Steps

  • Test the attention seq2seq model on constituency trees (Ctrees)
  • For constituency parsing:
    • average across the top-K candidates instead of picking only the most similar tree
    • consider auxiliary data for development and test instead of deriving it from the training lex

  • Fine-tune the Pointer Generator Network
  • Or combine it with the attention approach
  • Test Recursive Neural Networks on this task


20 of 20

Thank you!

Q&A?
