1 of 223

Storytelling from Structured Data and Knowledge Graphs

ANIRBAN LAHA

PARAG JAIN

ABHIJIT MISHRA

KARTHIK SANKARANARAYANAN

SARAVANAN KRISHNAN

2 of 223

Natural Language Query in Weather Domain: “How is the weather this weekend in Atlanta?”

[Figure: the query is parsed (Query Parser) into SQL over a relational weather database built on a weather ontology; the tabular results are passed to a language-generation (NLG) component]

“Slight chance of showers on Saturday morning with a high of 31 degrees. Sunny day and clear skies all day Sunday.”

3 of 223

The Nikon D5300 DSLR Camera, which comes in black color, features 24.2 megapixels and 3X optical zoom. It also has image stabilization and self-timer capabilities. The package includes a lens and Lithium cell batteries.

Product Information

Product Description

4 of 223

Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.

Born: Matthew Paige Damon, October 8, 1970 | Residence: U.S. | Occupation: Actor, filmmaker, screenwriter

Input

Output

5 of 223

Knowledge Graph summarization

General graph summary:

Hugo Weaving acted in the movie Cloud Atlas (as Bill Smoke) along with Tom Hanks (as Zachry), and in the movie The Matrix (as Agent Smith). Both movies were directed by Lana Wachowski.

Query: Show me movies directed by Lana and their lead actors.

Focus Lana

Entity-focused summary (Focus: Lana):

Lana Wachowski, born in 1965, is the director of the movies Cloud Atlas (released in 2012) and The Matrix (released in 1999).

6 of 223

Text-to-Text NLG

  • Summarization
  • Headline Generation: “Attorney from Alton files a lawsuit against himself by mistake”
  • Image Captioning
  • Paraphrasing
  • Machine Translation: “L'avocat d'Alton se poursuit par accident”
  • Question Generation: “When did the Lakin firm file a complaint against Alliance Mortgage?”
  • Question Answering: Q: What are the consequences? A: Emert Wyss had hired four law firms and now all of them are after his money.

7 of 223

Natural Language Generation

  • Branch of Computational Linguistics that deals with generating natural language text from unstructured / structured, textual / non-textual data. (Reiter and Dale, 2000)
    • Focuses on computer systems
    • Produces understandable texts (in English or other human languages)

Gatt et al., 2017

Multimodal

Multilingual

8 of 223

Data-to-text NLG

  • INPUT: Non-linguistic input
  • OUTPUT: Documents, Reports, Explanations, Help messages, and other kinds of text.
  • Knowledge Required: (1) Language, and (2) Application domain.

{
  "answer": {
    "premium": {"$": 502.83},
    "initial_payment": {"$": 100},
    "monthly_payment": {"$": 85.57}
  }
}

The child and his mother:

A curious child asked his mother: “Mommy, why are some of your hairs turning grey?”

The mother tried to use this occasion to teach her child: “It is because of you, dear. Every bad action of yours will turn one of my hairs grey!”

The child replied innocently: “Now I know why grandmother has only grey hairs on her head.”

Unstructured Text

Table

Graph

XML

JSON

9 of 223

Data-to-text NLG: A 4D perspective

  • Generation Facets: Sentiment, Emotion, Complexity, Formalness, Tone
  • Paradigms: Heuristic, Statistical, Neural, Hybrid
  • Practical (Domain): Finance, Healthcare, Retail
  • Tasks: Summarization, Insightful Narratives, Report Generation, Interaction & Dialog, Tabular Data Comprehension, Open-ended vs closed generation
  • Input type: Structured, Unstructured (textual); Image, Video; Cognitive signals (EEG, Eye tracking, MEG)

Concept: CS626, IIT Bombay

10 of 223

What this tutorial is NOT about?

  • Generation of Structured Representation like AMR/RDF/KB or Code

  • Creative Content Generation or Story/Poetry Writing

  • Cross lingual settings, transfer learning, k-shot learning, domain adaptation

  • Reasoning for content planning, conversational settings, NLU

11 of 223

What this tutorial is NOT about?

  • Text-to-text generation
    • Machine Translation
    • Text summarization
    • Simplification of Complex texts
    • Automatic spelling, grammar and text correction
    • Paraphrasing of sentences
    • Automatic generation of questions from text paragraphs

  • Multimodal-to-text generation
    • Speech recognition
    • Image Captioning
    • Visual Storytelling
    • Video description and summary generation
    • Natural Language explanations generation from Deep Neural Networks.

12 of 223

Tutorial Roadmap

PART 1:

  • Introduction to NLG from Structured data and Knowledge Bases
  • Traditional NLG
  • Statistical and Neural Methods
  • Evaluation Methods for NLG

PART 2:

  • Hybrid Methods
  • Role of Semantics and Pragmatics
  • Open Problems and Future Directions
  • Conclusion and Future Remarks

13 of 223

PART - 1

Traditional NLG

Statistical and Neural Methods

Evaluation Methods for NLG

14 of 223

Traditional NLG

Rule based NLG

Template based NLG

Current Approaches

Industry Solutions

Shortcomings

15 of 223

Rule based Generation – When and When Not

  • When the phenomenon is understood AND expressed, rules are the way to go

  • Do not learn when you know!!

  • When the phenomenon “seems arbitrary” at the current state of knowledge, DATA is the only handle!
    • Why do we say Many Thanks and not Several Thanks?
    • Writing a rule for this is tedious, and the rule is fragile

  • Rely on machine learning to tease truth out of data.

Source: CS626 NLP, IIT Bombay

16 of 223

Table Description in Natural Language Text: High Level Rules

Name: Albert Einstein | Birth City: Ulm, Germany

Enrichment (verb phrase from the column header): “was born in”

Subject + Verb Phrase + Object → “Albert Einstein was born in Ulm, Germany”

Rules:

  • Consider one column as “subject” and the other column as “object”
  • Use column header and extract verb phrase VP by looking up in a lexicon
  • Realized sentences: S + VP + O

Name: Albert Einstein | Nationality: German

Albert Einstein’s nationality is German ✅

Albert Einstein is from Germany ✅

Exception: no verb phrase exists for “Nationality” (nationalized??)

Albert Einstein …….. Germany ❌
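The high-level rules above can be sketched in a few lines. The lexicon below is a hypothetical toy (its entries and the function name are illustrative, not from any real system), and the Nationality case shows exactly where the rule breaks down.

```python
# Toy sketch of the S + VP + O rule: a hand-built lexicon maps column
# headers to verb phrases (entries here are made up for illustration).
LEXICON = {
    "Birth City": "was born in",
    "Death City": "died in",
}

def verbalize(subject, header, value):
    """Realize 'Subject + VP + Object' if the header has a verb phrase."""
    vp = LEXICON.get(header)
    if vp is None:
        # Exception: no natural verb exists for headers like "Nationality"
        return None
    return f"{subject} {vp} {value}"

print(verbalize("Albert Einstein", "Birth City", "Ulm, Germany"))
# -> Albert Einstein was born in Ulm, Germany
print(verbalize("Albert Einstein", "Nationality", "German"))
# -> None (the rule fails; there is no verb like "nationalized")
```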

17 of 223

Step back…

18 of 223

[Pipeline: Communicative Goal + Knowledge Source → Content Planning → Micro planning → Realization → Text]

Natural Language Generation Pipeline

Content Planning
  • Content Selection
  • Content Ordering

Sentence Planning
  • Sentence aggregation
  • Lexicalization
  • Referring expression generation

Linguistic Realization
  • Lexical rules for realization
  • Syntax / Grammar rules

Communicative Goal:
  1. Target audience
  2. Domain
  3. Task
  • Example: Describe, Compare

Reiter et al., 2000

 

19 of 223

[Pipeline: Communicative Goal + Knowledge Source → Content Planning → Micro planning → Realization → Text]

Natural Language Generation Pipeline

  1. Target audience: Web
  2. Domain: Biography
  3. Task: Describe

Reiter et al., 2000

Output: “Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.”

20 of 223

Natural Language Generation Pipeline: Content Planning

  1. Target audience: Web
  2. Domain: Biography
  3. Task: Describe

Selected content:
  1. Name: Matthew Paige Damon
  2. Born: October 8, 1970
  3. Residence: Pacific Palisades, California, United States
  4. Occupation: Actor, filmmaker, screenwriter

At this stage we know what we want to talk about… but still have no idea how.

Content determination and selection

21 of 223

Content Planning

Natural Language Generation Pipeline

  1. Name: Matthew Paige Damon
  2. Born: October 8, 1970
  3. Residence: Pacific Palisades, California, United States
  4. Occupation: Actor, filmmaker, screenwriter

Micro planning

  1. Matthew Paige Damon born in October 8, 1970
  2. Matthew Paige Damon residence Pacific Palisades, California, United States
  3. Matthew Paige Damon is Actor. Matthew Paige Damon is filmmaker. Matthew Paige Damon is screenwriter.

Fakeness alert: for illustration the examples above look like sentences, but in reality everything at this stage is a data structure passed from one layer to another. There are no sentences yet!

  1. Matthew Paige Damon born in October 8, 1970 and residence of America. OR Matthew Paige Damon born in October 8, 1970 is an American.
  2. He is an Actor, filmmaker and screenwriter.

Sentence aggregation, Lexicalization and referring expression
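The microplanning step above can be sketched as a tiny aggregation-plus-referring-expression routine. The function below is an illustrative toy, not an actual microplanner: it merges facts sharing a subject and predicate into one sentence and replaces the repeated subject with a pronoun.

```python
# Minimal sketch of sentence aggregation and referring-expression
# generation: objects sharing a subject and predicate are merged into a
# coordinated noun phrase, and the subject is referred to by a pronoun.
def aggregate(subject, predicate, objects, pronoun="He"):
    if len(objects) == 1:
        noun_phrase = objects[0]
    else:
        noun_phrase = ", ".join(objects[:-1]) + " and " + objects[-1]
    # Referring expression: use the pronoun after the first mention.
    return f"{pronoun} {predicate} {noun_phrase}."

print(aggregate("Matthew Paige Damon", "is an",
                ["actor", "filmmaker", "screenwriter"]))
# -> He is an actor, filmmaker and screenwriter.
```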

22 of 223

Content Planning

Natural Language Generation Pipeline

Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.

Micro planning

Matthew Paige Damon(N) born in(VP, TENSE: PAST) October 8, 1970 … American(Adj). … [Actor, filmmaker, screenwriter]

Realization

Realizer

23 of 223

Extremely Simple Template-driven NLG Architecture: Insurance case

Output

Template Manager

Intent – Template mapping

Template Repository

Query: How much should I pay ?

Info 1 (intent) : query(amount(payment)).

Info 2: {
  "result": {
    "premium": {"$": 502.83},
    "initial_payment": {"$": 100},
    "monthly_payment": {"$": 85.57}
  }
}

Query Intent ⬄ Template ID

query(amount(payment)) ⬄ all_payment

Template ID : all_payment

NL text : You can choose to pay an initial payment of $ {InitPay} and a monthly payment of $ {MonthPay}, or you can pay a one-time premium of $ {prm}.

Parameters : InitPay: 100, MonthPay: 85.57, prm: 502.83

You can choose to pay an initial payment of $100 and a monthly payment of $85.57, or you can pay a one-time premium of $502.83.

If 90% of your customers are asking the same 10 questions, you can quickly build a template-driven system with a human as fallback.

Otherwise, template-based techniques quickly become difficult to manage.
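The template-manager flow on this slide can be sketched end to end: intent → template ID → template text filled with parameters pulled from the JSON result. The dictionary names below are illustrative; only the template text, intent string, and JSON come from the slide.

```python
# Sketch of a template-driven NLG system: map a query intent to a
# template ID, extract parameters from the JSON result, fill the slots.
import json

TEMPLATES = {
    "all_payment": ("You can choose to pay an initial payment of "
                    "${InitPay} and a monthly payment of ${MonthPay}, "
                    "or you can pay a one-time premium of ${prm}."),
}
INTENT_TO_TEMPLATE = {"query(amount(payment))": "all_payment"}

def render(intent, info_json):
    result = json.loads(info_json)["result"]
    params = {
        "InitPay": result["initial_payment"]["$"],
        "MonthPay": result["monthly_payment"]["$"],
        "prm": result["premium"]["$"],
    }
    template = TEMPLATES[INTENT_TO_TEMPLATE[intent]]
    return template.format(**params)

info = ('{"result": {"premium": {"$": 502.83}, '
        '"initial_payment": {"$": 100}, "monthly_payment": {"$": 85.57}}}')
print(render("query(amount(payment))", info))
# -> You can choose to pay an initial payment of $100 and a monthly
#    payment of $85.57, or you can pay a one-time premium of $502.83.
```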

24 of 223

Rule-based NLG System: SimpleNLG

[data to realize ] : [Related information]

  • Teacher : subject
    • The : determiner
  • Deliver : verb, past tense
  • Lecture : object
  • While : complementizer
    • He : subject
    • Be : verb, past tense
    • Class : prepositional object
      • In : preposition

Realization

Engine

The teacher delivered a lecture while he was in class.

25 of 223

SimpleNLG - Usage

import simplenlg.framework.*;
import simplenlg.lexicon.*;
import simplenlg.realiser.english.*;
import simplenlg.phrasespec.*;
import simplenlg.features.*;

// Setup (needed before the snippet on the slide will run):
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);

SPhraseSpec sentObj = new SPhraseSpec(nlgFactory);
sentObj.setSubject("John");
sentObj.setVerb("write");
sentObj.setObject("story");
String sentence = realiser.realiseSentence(sentObj);
System.out.println(sentence);   // “John writes story”

sentObj.getVerbPhrase().setFeature(Feature.TENSE, Tense.PAST);
sentObj.getVerbPhrase().setFeature(Feature.NEGATED, true);
sentObj.getVerbPhrase().setFeature(Feature.PASSIVE, true);
sentence = realiser.realiseSentence(sentObj);
System.out.println(sentence);   // “Story was not written by John”

sentObj.setFeature(Feature.INTERROGATIVE_TYPE, InterrogativeType.YES_NO);
sentence = realiser.realiseSentence(sentObj);
System.out.println(sentence);   // “Does John write story?”

26 of 223

Representative Public Datasets

  • ROBOCUP, for sportscasting (Chen and Mooney, 2008)
  • SUMTIME, for technical weather forecast generation (Reiter et al., 2005)
  • WEATHERGOV, for common weather forecast generation (Liang et al., 2009)
  • WikiBio (Lebret et al., 2016)
  • ROTOWIRE and SBNATION (Wiseman, Shieber, and Rush, 2017)
  • WEBNLG dataset (Gardent et al., 2017)
  • WikiTableText (Bao et al., 2018)
    • Describing a table region – typically restricted to rows.
  • WikiTablePara (Laha et al., 2018)
    • Created from the WikiTable dataset
    • 171 tables with comprehensive descriptions.

27 of 223

Heuristic driven NLG Systems

  • GenI (http://kowey.github.io/GenI):
    • Surface realizer based on Tree Adjoining Grammar and Minimal Recursion Semantics
  • RealPro
    • Input: Deep syntactic structure (like a parse tree) without function words
    • Output: Natural Language text
    • Introduces function words
  • GoPhi
    • Abstract Meaning Representation (AMR) to Natural Language Text
    • The AMR graph is converted to a tree of constituents, which is transformed into English
  • KPML
    • Rich source of grammatical structures and realization rules
    • Multilingual resources for creating and maintaining grammar rules
  • RNNLG (https://shawnwun.github.io/talks/DL4NLG_20160906.pdf)
    • Statistical / Neural sentence plan generation + Statistical sentence plan ranking + Neural Surface realization
    • Spoken Dialog Domain

Example (RealPro deep-syntactic input):

  stay
    I  John     [class: proper-noun]
    II New-york [class: proper-noun]

→ “John stays in New-york”

https://aclweb.org/aclwiki/Downloadable_NLG_systems

28 of 223

Other Industrial NLG Systems

  • Wordsmith from Automated insights: (https://automatedinsights.com)
    • Enables users to turn data into text using dynamic templates
  • Arria NLG Studio (https://www.arria.com/)
    • Powered by the proprietary Articulate Text Language (ATL), this tool enables rule-based linguistic capabilities for natural language generation
    • Heavily rule based and minimal NLG
  • “Quill” by Narrative Science (https://narrativescience.com/)
    • Considers user intent (e.g., comparison of a metric across two columns of a table), then figures out what analytics to perform on the data (many of them rule-based, heuristic-based, or simple sum/avg/percentage statistics)
    • Possibly applies a minimal classifier to figure out intent, but the generation step is heavily rule-based even though they do not apply templates.

29 of 223

Shortcomings of Traditional Approaches

  • Rule-based systems/templates are mostly inflexible and not scalable

  • Non-transferable rules pertaining to domain-specific requirements / choices of language artefacts (tone, sentiment, syntax, complexity)

  • Typically do not leverage web scale data / freely available knowledge bases (like DBPedia, Yago, Freebase)

30 of 223

Statistical and Neural Methods

Pre-neural Statistical

Neural Methods

31 of 223

Simplified Steps

We will continue explaining recent NLG systems from this pipeline perspective

Content Selection

Content Planning

Surface Realization

32 of 223

Pre-neural

33 of 223

Moving away from Templates…..

  • Templates are inflexible and not scalable to different use-cases.
  • However, templates do not require much semantic understanding or decision making.
  • Can we get best of both worlds?
    • Have a good meaning representation of input data.
    • Move the linguistic decision-making to the surface realization step.
    • This makes surface realization more flexible than templates.

  • The surface realization (generation) needs additional knowledge
    • Knowledge from corpus perhaps? [Langkilde and Knight, 1998]

  • Precompute N-gram (word-pair) frequencies.

34 of 223

Flexible Surface Realization

  • Input Meaning Representation to the generator.
    • AMR captures all things to be said.

  • The generator converts the AMR to word lattice.
    • Word lattice defines transition between states.
    • The state transitions are labeled by words.
    • The conversion uses pre-defined grammar rules.
    • The word lattice captures all things to be said.

  • Statistical Ranker selects the best path in word lattice as output.
    • N-gram frequencies are computed from monolingual corpora.
    • The pre-computed N-gram frequencies are used to score the paths in the lattice.
    • The sequence of words corresponding to the best path is the final output string.

[Langkilde and Knight, 1998]
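The lattice-plus-ranker idea above can be illustrated with a toy example in the spirit of Langkilde and Knight (1998): enumerate the paths through a small hand-made word lattice and score each with precomputed bigram counts. The lattice and counts below are made-up values, not from a real corpus, and echo the earlier "Many Thanks" example.

```python
# Toy statistical ranking over a word lattice: each slot lists the
# alternative words the grammar allows at that position; paths are
# scored by (smoothed) log bigram counts from a hypothetical corpus.
import itertools
import math

lattice = [["many", "several"], ["thanks"]]

bigram_count = {("many", "thanks"): 120, ("several", "thanks"): 2}

def score(path):
    # Sum of log bigram counts, with add-one smoothing for unseen pairs.
    return sum(math.log(bigram_count.get(bg, 0) + 1)
               for bg in zip(path, path[1:]))

paths = list(itertools.product(*lattice))
best = max(paths, key=score)
print(" ".join(best))  # -> many thanks
```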

35 of 223

WeatherGov Table Format

[Figure: a WeatherGov table, showing records, fields, and record types]

36 of 223

Generative Modeling Approach

  • Notation:
    • Text to be generated: w = w1, …, wT
    • The tabular records (the world state): s
    • Sequence of chosen records: r
    • Sequence of chosen fields in record ri: fi
  • Modeling Objective: model p(w | s) generatively, marginalizing over the latent record and field choices.
[Liang et al, 2009]

37 of 223

Generative Modeling Approach (2)

[Liang et al, 2009]

38 of 223

Generative Modeling Approach (3)

  • Record Choice Model: Markov Model on record types.

  • Field Choice Model: Markov Model on chosen fields for each chosen record

  • Word Choice Model: Generate a sequence of words (uniform distribution) for each chosen field

[Liang et al, 2009]

Coherence

Saliency
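The record-choice Markov model above can be made concrete with a toy transition table over record types. All record-type names and probabilities below are made up for illustration; greedily following the highest-probability transition gives the most coherent record sequence.

```python
# Toy record-choice Markov model: a transition table over record types
# (probabilities are illustrative). Greedily reading off the argmax
# transition at each step captures the "coherence" idea in the model.
transitions = {
    "START":       {"temperature": 0.6, "sky_cover": 0.4},
    "temperature": {"sky_cover": 0.7, "wind_speed": 0.3},
    "sky_cover":   {"wind_speed": 0.8, "temperature": 0.2},
}

def most_likely_sequence(start="START", steps=3):
    seq, state = [], start
    for _ in range(steps):
        if state not in transitions:
            break
        state = max(transitions[state], key=transitions[state].get)
        seq.append(state)
    return seq

print(most_likely_sequence())
# -> ['temperature', 'sky_cover', 'wind_speed']
```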

39 of 223

Generative Modeling Approach (4)

  • Training Objective:

  • Inference :
    • Simple way to use the generative process to generate output text (Not very effective).
    • Use dynamic programming style decoding algorithm [Kim and Mooney, 2010].

  • This approach can also be called a hierarchical hidden semi-Markov model (h-HSMM).
  • Does not involve Content Planning in modeling!!!!

[Liang et al, 2009]

[Kim and Mooney, 2010]

40 of 223

End-to-end Probabilistic Approach

  • Unified framework to perform content selection and surface realization.
  • Hierarchical Approach [like before!!!!]
    1. Choosing records from database (macro content selection)
    2. Choosing a subset of fields for a record (micro content selection)
    3. Choosing a suitable template for the selected fields (surface realization)

  • Generate a sequence (conditional probability model):
    • r1, F1, T1, r2, F2, T2, …, STOP.

  • Generation Process:

[Angeli et al, 2010]

41 of 223

End-to-end Probabilistic Approach (2)

  • The sequence of decision steps (example):
    • r1, F1, T1, r2, F2, T2, …, STOP.

[Angeli et al, 2010]

42 of 223

End-to-end Probabilistic Approach (2)

[Angeli et al, 2010]

Activation of Rules

Rule definitions

43 of 223

End-to-end Probabilistic Approach (3)

  • Generation Sequence:
    • r1, F1, T1, r2, F2, T2, …, STOP.
    • Denote it as decision sequence :

  • Probability Model to Train:

  • Decoding:
  • Greedy Fashion:

    • Sampling Strategy:

    • Viterbi decoding Algorithm :

Simple, but effective

More diverse outputs

More computation!!

[Angeli et al, 2010]
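The contrast between greedy decoding and sampling over the sequence of decisions can be sketched with toy per-step distributions (the record/field/template labels and probabilities below are illustrative, not from the paper).

```python
# Toy decision sequence r1, F1, T1, ...: at each step we have a
# distribution over choices; greedy takes the argmax, sampling draws
# from the distribution (more diverse outputs, less predictable).
import random

steps = [
    {"r1": 0.7, "r2": 0.3},   # record choice
    {"F1": 0.6, "F2": 0.4},   # field choice
    {"T1": 0.9, "T2": 0.1},   # template choice
]

def greedy(steps):
    # Simple, but effective: pick the most probable decision each step.
    return [max(d, key=d.get) for d in steps]

def sample(steps, rng):
    # More diverse outputs: draw each decision from its distribution.
    return [rng.choices(list(d), weights=list(d.values()))[0]
            for d in steps]

print(greedy(steps))                    # -> ['r1', 'F1', 'T1']
print(sample(steps, random.Random(0)))  # varies with the seed
```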

44 of 223

End-to-end Probabilistic Approach (4)

  • Shortcomings of this approach:
    1. Need to specify rules at every decision level.
    2. The rules need to be specified separately for every domain.
    3. Surface realization is again dependent on templates – Not flexible!!!

  • One of the earliest end-to-end approaches
    • Learnt both content selection and surface realization together.
    • Content planning was not done (handled by templates).

[Angeli et al, 2010]

45 of 223

Using Probabilistic Context-Free Grammars

[Konstas and Lapata 2012]

[Konstas and Lapata 2013]

[Figure: the WeatherGov table again: record types, records, and fields]

The grammar captures the structure of the table

Note the difference from parsing

46 of 223

Using Probabilistic Context-Free Grammars (2)

  • The defined grammar can be equivalently represented as a hypergraph.
  • For a predefined structure, the hypergraph representation for the grammar is constant.
    • This representation helps in computing the probabilities for the grammar rules.
    • Inside-outside Algorithm is used for the computation.
  • Generation involves finding the best derivation path in the hypergraph
    • Below, one such derivation path is shown for the string “Sunny with a low around 30”.
    • Viterbi Algorithm is used for finding best path.

[Konstas and Lapata 2012]

[Konstas and Lapata 2013]
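As a stand-in for the Viterbi search over the hypergraph, the idea of finding the best derivation can be illustrated on a toy probabilistic grammar: expanding each nonterminal with its most probable rule yields the highest-probability derivation when the grammar is acyclic. The rules and probabilities below are illustrative, not from the paper.

```python
# Toy probabilistic grammar over the table structure. Greedily expanding
# each nonterminal with its most probable rule gives the best derivation
# for this acyclic grammar (a simplified stand-in for Viterbi search).
GRAMMAR = {
    "S":    [(0.8, ["SKY", "with", "TEMP"]), (0.2, ["TEMP"])],
    "SKY":  [(0.6, ["sunny"]), (0.4, ["cloudy"])],
    "TEMP": [(1.0, ["a", "low", "around", "30"])],
}

def best_derivation(symbol):
    if symbol not in GRAMMAR:           # terminal word
        return [symbol]
    _, rhs = max(GRAMMAR[symbol])       # most probable rule
    return [w for child in rhs for w in best_derivation(child)]

print(" ".join(best_derivation("S")))
# -> sunny with a low around 30
```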

47 of 223

Using Probabilistic Context-Free Grammars (3)

  • Decoding Step for database d:

  • How to incorporate Language Model?
    • Dynamic programming based Algorithm [Huang and Chiang 2007].

  • This approach also performs Content Selection and Surface Realization end-to-end.
    • Performs them jointly, in contrast to the sequence of local decisions in [Angeli et al., 2010].

  • Still no Content/Document Planning!!!!

(The decoding objective combines a likelihood term, derived from the hypergraph, with a language model.)

[Konstas and Lapata 2012]

[Konstas and Lapata 2013]

48 of 223

Using Probabilistic Context-Free Grammars (4)

  • Incorporating document planning in an end-to-end manner.
  • For every table/database, the model first decides on a global document plan.
    • Which record types belong to each sentence (or phrase).
    • How these sentences (or phrases) should be ordered.
    • Content Selection and Surface Realization follow.

  • Two kinds of methodologies for document planning:
    • Planning with Record Sequences – The document consists of sentences delimited by periods; each sentence can be split into a sequence of record types.
    • Planning with Rhetorical Structure Theory [Mann and Thompson, 1988] – Deals with how text spans are hierarchically organized.

  • Grammar rules defined for both the above methodologies.

[Konstas and Lapata 2013]

49 of 223

Neural

50 of 223

Sequence to sequence models

Bahdanau et al., 2014

Xu et al., 2015

Rush et al.. 2015

[Figure: sequence-to-sequence model: word embeddings are encoded into encoder states; decoder states produce the output sequence]

51 of 223

Sequence to sequence models

Bahdanau et al., 2014

Xu et al., 2015

Rush et al.. 2015

[Figure: sequence-to-sequence model with an attention mechanism: at each decoding step the decoder attends over the encoder states to compute a context vector]
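The attention mechanism can be sketched numerically: score the decoder state against every encoder state, softmax-normalize the scores, and take the weighted sum of encoder states as the context vector. Dimensions and the dot-product scoring function below are illustrative choices.

```python
# Numerical sketch of attention: dot-product scores, softmax weights,
# and a context vector as the weighted sum of encoder states.
import numpy as np

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 4))   # 6 input positions, dim 4
decoder_state = rng.normal(size=4)         # current decoder state

scores = encoder_states @ decoder_state          # one score per position
weights = np.exp(scores) / np.exp(scores).sum()  # softmax normalization
context = weights @ encoder_states               # context vector, dim 4

print(weights)   # a distribution over the 6 input positions
print(context)
```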

 

52 of 223

How to use Seq2Seq for structured data?

[Figure: the table linearized into the token sequence: Matt, Damon, Oct, 8, 1970, U.S., actor, filmmaker, screenwriter]

53 of 223

How to use Seq2Seq for structured data?

[Figure: the linearized table is fed to an encoder; a decoder with attention generates “Matt Damon born on Oct 8 is an American actor…”]

54 of 223

Sequence of records…

[Figure: the input represented as a sequence of records]

55 of 223

Sequence of records…

Mei et al. 2016

[Figure: encoder-decoder with attention over the sequence of records]

  1. All records are not important
  2. Multiple records make it difficult to learn alignment

56 of 223

Sequence of records…

Mei et al. 2016

  1. All records are not important
  2. Multiple records make it difficult to learn alignment

[Figure: encoder-decoder where a refiner module computes a time-independent prior attention over the records before the attention mechanism]

Refiner: helps attention to fix on important records and not be distracted by non-salient records

57 of 223


  • Basic Encode-Attend-Decode Model

  • Too generic!

  • Unable to exploit structure

[Figure: the same linearized-table Encode-Attend-Decode model]

58 of 223

How to use structural information while encoding?

[Figure: table structure: record type, records/fields, attributes, values]

59 of 223

Capturing hierarchical structure

Jain et al. 2018

[Figure: an attribute encoder composed with a record encoder]

  • How do you encode hierarchical information present in tabular data?
  • Can we reduce the complexity of a hierarchical encoder?
  • How to encode continuous, categorical and time-range values present in the dataset?
  • Large Vocabulary?
  • Dynamic table schema?

60 of 223

It’s difficult to remember floccinaucinihilipilification, can I copy?

  • Seq2Seq is good at producing fluent outputs, but cannot handle rare words effectively.
  • A copy mechanism enables the model to copy words from the input instead of generating them (generating rare words is hard; copying can also handle OOV words to some extent).

Nallapati et al. 2016

Miao et al, 2016

Gu et al, 2016

See et al, 2017

61 of 223

Copy actions

Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.

Input

Output

62 of 223

Copy mechanism

Nallapati et al. 2016

Miao et al, 2016

Gu et al, 2016

See et al, 2017

[Figure: encoder-decoder with attention; the attention distribution over the input sequence yields a context vector]

At each time step t, decide: generate or copy.
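The soft generate-or-copy decision can be sketched in the style of See et al. (2017): the final distribution mixes the generator's vocabulary distribution with the attention distribution over input tokens. All numbers and token lists below are toy values.

```python
# Sketch of a pointer-generator-style mixture: with probability p_gen
# the word is generated from the vocabulary distribution; with
# probability 1 - p_gen it is copied from the attended input tokens.
import numpy as np

vocab = ["born", "actor", "<unk>"]
inputs = ["Damon", "actor"]                 # source tokens

p_vocab = np.array([0.6, 0.3, 0.1])         # generator distribution
attention = np.array([0.8, 0.2])            # over the input tokens
p_gen = 0.7                                 # probability of generating

def final_prob(word):
    gen = p_gen * (p_vocab[vocab.index(word)] if word in vocab else 0.0)
    copy = (1 - p_gen) * sum(a for tok, a in zip(inputs, attention)
                             if tok == word)
    return gen + copy

print(final_prob("Damon"))   # only copyable: (1 - 0.7) * 0.8
print(final_prob("actor"))   # both generated and copied
```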

63 of 223

Typical approaches for incorporating copy

Copy
  • Sequence-level copy (Zhou et al., 2018 – explicit supervision)
  • Word-level copy
    • Shared SoftMax over the output and input vocabularies: induces competition between generate & copy; joint model; no explicit supervision (Gu et al., 2016)
    • Conditional copy/generate switch with a mixture distribution: no explicit supervision (See et al., 2017) or explicit supervision (Gulcehre et al., 2016)

Other notable work: Nallapati et al., 2016; Miao et al., 2016; Nema et al., 2018

64 of 223

  • How do you encode hierarchical information present in tabular data?
  • Can we reduce the complexity of a hierarchical encoder?
  • How to encode continuous, categorical and time-range values present in the dataset?
  • Large Vocabulary? Rare words.
  • Dynamic table schema?

65 of 223

Conditional LM with structured input

Lebret et al. 2016

Introduced WikiBio dataset

 


76 of 223

Conditional LM with structured input

Lebret et al. 2016

  • Lebret et al. proposed a statistical n-gram LM with local and global conditioning
  • The feed-forward network does not capture long-range dependencies among the field/record names and values present in the sequence
  • Attention, copying, and content planning can be improved

77 of 223

Hierarchical structure aware encoding

[Figure: micro attention (α) over the values within each field K1 … KM; macro attention (β) over field representations [f(Ki); field name]; the two are fused and fed to the decoder, which generates “Matt Damon born on Oct 8 is an American actor…”]

Nema et al. 2018

Liu et al. 2018

78 of 223


Matthew Paige Damon (born October 8, 1970) is an American actor, film producer, and screenwriter.

(The sentence above is generated left to right, one phrase at a time.)

  • Input has a natural hierarchy
    • Table 🡪 Fields 🡪 Values

  • Once you visit a field you tend to stay on it for a while

  • Once you exit a field you never look back

Stay on and never look back

79 of 223

FORGET GATE: decides how long to stay on a field

Context vector seen at last time-step

  • Encodes information about previously seen field.

  • New context vector:


Modeling Stay-On Behaviour

Nema et al. 2018

Introduced German & French version of WikiBio
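The stay-on idea can be illustrated numerically: a scalar forget gate decides how much of the current field's context vector to retain at each step. This is a deliberate simplification of the Nema et al. (2018) gating, with made-up values in place of learned parameters.

```python
# Numerical sketch of the stay-on forget gate: gate near 1 keeps the
# previous field's context (stay on the field); gate near 0 lets the
# new field's context take over (move on). Values are illustrative.
import numpy as np

prev_context = np.array([1.0, 0.0, 0.5])   # context of the current field
new_context = np.array([0.2, 0.9, 0.1])    # context of the next field

def update_context(new, gate):
    return gate * prev_context + (1 - gate) * new

staying = update_context(new_context, gate=0.95)
leaving = update_context(new_context, gate=0.05)
print(staying)   # close to prev_context
print(leaving)   # close to the new field's context
```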

80 of 223

FORGET GATE: decides how long to stay on a field

(Soft) Orthogonalize the context vector once it is time to forget


Modeling Never-Look-Back


Nema et al. 2018

81 of 223

[Figure: the same micro/macro attention model, with gated orthogonalization applied between the fused attention and the decoder state]

Nema et al. 2018

82 of 223

Order-planning

Sha et al. 2018

Which field are we talking about?

83 of 223

ROTOWIRE Dataset

Biography (e.g., WikiBio): “Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.”
  • Requires short generation.
  • Less variation in style.

ROTOWIRE:
  • Longer generation required
  • Content selection
  • Content ordering

Wiseman et al. 2017

Introduced ROTOWIRE & SBNATION dataset

84 of 223


Content selection and planning

Text generation

Content selection and Planning

Puduppully et al. 2019

85 of 223

Content Selection Gate

 

 

Puduppully et al. 2019

86 of 223

Content selection and Planning

Puduppully et al. 2019

87 of 223

Content Planning

  • Content plan: Sequence of pointers each pointing to a record
  • Content plan supervision was generated via an information extraction method.
  • Not generated by humans, so it contains inaccuracies.

Puduppully et al. 2019

88 of 223

Some issues…

  • Generation is not interpretable.
  • Zero control on the generated output.

Template based generation vs. End2End methods:

  • Interpretable: with templates you can check which template was picked; with end2end methods this is difficult and needs a lot of analysis to get insights.
  • Output control: with templates you can select which template makes sense; end2end methods offer almost none.
  • Scalable: template systems need a lot of templates; end2end methods can scale well (in domain/task).

89 of 223

Neural Templates for Text Generation

Wiseman et al. 2018

90 of 223

HMM vs HSMMs (Hidden Semi-Markov Models)

HMM

HSMM

Wiseman et al. 2018

91 of 223

A Conditional (Neural) HSMM

Parameterize probabilities with neural components:

  • Segment probabilities are given by an RNN + attention + copy attention

Wiseman et al. 2018

92 of 223

Knowledge Graph to Text

93 of 223

Knowledge Graph to text

[Figure: a knowledge graph over Neil Armstrong, United States, Astronaut, and Wapakoneta, with edges occupation, nationality, birthPlace, location]

RDF triples: <Neil Armstrong, occupation, Astronaut>

Zhu et al. 2019

94 of 223

Knowledge Graph to text

[Figure: the same knowledge graph]

RDF triples:
<Neil Armstrong, occupation, Astronaut>
<Neil Armstrong, nationality, United States>
<Neil Armstrong, birthPlace, Wapakoneta>
<Wapakoneta, location, United States>

Zhu et al. 2019

95 of 223

Knowledge Graph to text

Neil Armstrong was an American astronaut born in Wapakoneta, a city in the United States.

[Figure: the same knowledge graph]

RDF triples:
<Neil Armstrong, occupation, Astronaut>
<Neil Armstrong, nationality, United States>
<Neil Armstrong, birthPlace, Wapakoneta>
<Wapakoneta, location, United States>

Triple to text

Zhu et al. 2019
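Before a seq2seq model can consume the triples, they have to be flattened into a token sequence. The delimiter convention below (`<S>`, `<P>`, `<O>` markers) is an assumption for illustration, not necessarily the scheme used in the paper.

```python
# Sketch of linearizing RDF triples into a flat token sequence for a
# seq2seq encoder; the <S>/<P>/<O> delimiter tokens are hypothetical.
triples = [
    ("Neil Armstrong", "occupation", "Astronaut"),
    ("Neil Armstrong", "birthPlace", "Wapakoneta"),
    ("Wapakoneta", "location", "United States"),
]

def linearize(triples):
    parts = []
    for s, p, o in triples:
        parts.append(f"<S> {s} <P> {p} <O> {o}")
    return " ".join(parts)

print(linearize(triples[:1]))
# -> <S> Neil Armstrong <P> occupation <O> Astronaut
```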

96 of 223

Minimizing KL divergence

  • Less loss
  • Allows fake/diverse/creative generation
  • But, low quality

Zhu et al. 2019
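The asymmetry between the KL divergence and its inverse, which drives the quality/diversity trade-off discussed on these slides, can be checked numerically on two toy discrete distributions (the values below are made up for illustration).

```python
# Numerical illustration that KL(p||q) and the inverse KL(q||p) differ:
# forward KL is mode-covering (tolerates diverse/low-quality mass),
# inverse KL is mode-seeking (prefers high quality, less diversity).
import math

p = [0.8, 0.1, 0.1]   # "data" distribution (toy)
q = [0.4, 0.3, 0.3]   # model distribution (toy)

def kl(a, b):
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

forward = kl(p, q)
inverse = kl(q, p)
print(forward, inverse)   # two different positive numbers
```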

97 of 223

Minimizing inverse KL divergence

Zhu et al. 2019

98 of 223

Training

Zhu et al. 2019

99 of 223

Capture relationships between and within triples?

100 of 223

Triple encoder

  • Handle loops by topological sort and traversing in breadth-first order

Trisedya et al. 2018


103 of 223

Tutorial Roadmap

PART 1:

  • Introduction to NLG from Structured data and Knowledge Bases
  • Traditional NLG
  • Statistical and Neural Methods
  • Evaluation Methods for NLG

PART 2:

  • Hybrid Methods
  • Role of Semantics and Pragmatics
  • Open Problems and Future Directions
  • Conclusion and Future Remarks

104 of 223

Evaluation Methods

Overlap based Metrics

Intrinsic Evaluation

Human Evaluation

105 of 223

Expectation from a Good Evaluation Metric

  • Scale for human evaluation
    • Perfect: No problem in either information or grammar
    • Fair: Easy to understand, with some unimportant information missing / flawed grammar
    • Acceptable: Broken but understandable with effort
    • Nonsense: Important information has been translated incorrectly

[Scale from Perfect to Fair to Acceptable to Nonsense, judged along the fluency and adequacy dimensions]

106 of 223

Overlap Based Metrics

107 of 223

BLEU

  • BiLingual Evaluation Understudy.
  • Traditionally used for machine translation.
    • Ubiquitous and standard evaluation metric
    • 60% of NLG works between 2012 and 2015 used BLEU
  • Automatic evaluation technique:
    • Goal: The closer a machine translation is to a professional human translation, the better it is.

  • Precision based metric.
    • How many results returned were correct?
  • Precision for NLG:
    • How many words returned were correct?

[Papineni et al., 2002]

108 of 223

BLEU evaluation

  • Candidate (Machine): It is a guide to action which ensures that the military always obeys the commands of the party.

  • References (Human):
    1. It is a guide to action that ensures that the military will forever heed Party commands.
    2. It is the guiding principle which guarantees the military forces always being under the command of the Party.
    3. It is the practical guide for the army always to heed the directions of the party.

  • Modified Unigram Precision = 17/18 (every candidate word except “obeys” appears in at least one reference)

[Papineni et al., 2002]

109 of 223

Consider this….

  • Candidate: the the the the the the the.

  • References:
    1. The cat is on the mat.
    2. There is a cat on the mat.

  • Unigram Precision = 7/7 = 1. Incorrect.
  • Maximum reference count of ‘the’ = 2
  • Modified Unigram Precision = 2/7 (based on count clipping)
  • Modified 1-gram precision → Modified n-gram precision.

[Papineni et al., 2002]

110 of 223

Modified n-gram precision

  • Candidate (Machine): It is a guide to action which ensures that the military always obeys the commands of the party.

  • List all possible n-grams. (Example bigram: ‘It is’)
  • N-gram Precision = (number of candidate n-grams appearing in any reference) / (total number of candidate n-grams)

  • Modified N-gram Precision : Produced by clipping the counts for each n-gram to maximum occurrences in a single reference.

[Papineni et al., 2002]

111 of 223

Brevity Penalty

  • Candidate sentences longer than all references are already penalized by modified n-gram precision.

  • Another multiplicative factor introduced.

  • Objective: To ensure the candidate length matches one of the reference lengths.
    • If candidate length c ≥ effective reference length r, then BP = 1.
    • Otherwise, BP = e^(1 − r/c) < 1.

[Papineni et al., 2002]

112 of 223

Final BLEU score

  • BLEU = BP · exp( Σ_{n=1..N} w_n · log p_n )
  • BP → brevity penalty.
  • p_n → modified n-gram precision.
  • N → maximum n-gram order (typically 4).
  • w_n → weights, typically uniform (w_n = 1/N).

[Papineni et al., 2002]
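The pieces above — clipped n-gram precision, the brevity penalty, and the weighted geometric mean — can be combined in a minimal sentence-level sketch. This is a toy re-implementation following the Papineni et al. formulas, not the official BLEU script:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    # Clip each candidate n-gram count to its maximum count in any single reference
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / max(sum(cand_counts.values()), 1)

def bleu(candidate, references, max_n=4):
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    # Uniformly weighted geometric mean of the modified precisions
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty against the closest reference length
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)
```

On the “the the the the the the the” example, `modified_precision` with n = 1 returns 2/7, matching the count-clipping slide.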

113 of 223

Evaluation of data-to-text NLG: More BLUEs for BLEU

  • BLEU: Truly an Understudy
  • Intrinsically Meaningless (Ananthakrishnan et al, 2009)
    • Not meaningful in itself: What does a BLEU score of 69.9 mean?
    • Only for comparison between two or more automatic systems
  • Admits too much “combinatorial” variation
    • Many syntactically and semantically incorrect variations of the hypothesis output can receive the same score
    • Reordering of unmatched n-grams may not alter the BLEU score
  • Admits too little “linguistic” variation
    • Languages allow variety in choice of vocabulary and syntax
    • Not always possible to keep all possible variations as references
    • Multiple references do not help capture variations much (Doddington, 2002; Turian et al, 2003)
  • Variants of BLEU — cBLEU (Mei et al, 2016), GLEU (Mutton et al, 2007), Q-BLEU (Nema et al, 2018) — take the input (source) into account

114 of 223

ROUGE

  • Recall-Oriented Understudy for Gisting Evaluation.
  • Recall-based metric for NLG:
    • How many correct words were returned?

  • Candidate: the cat was found under the bed.
  • Reference: the cat was under the bed.
  • Recall = 6/6 = 1 (every reference word appears in the candidate)

  • ROUGE-N = (number of reference n-grams appearing in the candidate) / (total number of reference n-grams)

[Lin 2004]
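For the candidate/reference pair above, a minimal ROUGE-N sketch (clipped n-gram overlap, reported as recall, precision, and F1) is enough to reproduce the numbers; this is a toy, not the official ROUGE package:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    def gram_counts(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = gram_counts(candidate.split())
    ref = gram_counts(reference.split())
    # Clipped overlap, as in ROUGE-N
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```

For the slide's example, recall is 6/6 = 1 while precision is only 6/7, since “found” in the candidate has no match.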

115 of 223

Variants of ROUGE

  • ROUGE can combine both Precision and Recall to compute F1 score.
    • Precision prevents prediction of too many unnecessary words.
    • Recall encourages prediction of reference words.

  • ROUGE-N : n-gram ROUGE. Most popular are ROUGE-1 and ROUGE-2.

  • ROUGE-L : Based on length of Longest Common Subsequence (LCS).

  • ROUGE-S : skip-bigram based. ‘the boy ran away’
    • skip-bigrams: ‘the boy’, ‘the ran’, ‘the away’, ‘boy ran’, ‘boy away’, ‘ran away’

[Lin 2004]
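ROUGE-L from the variants above reduces to a longest-common-subsequence computation; a minimal F1 sketch:

```python
def lcs_len(a, b):
    # Classic dynamic programming over the two token sequences
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    # F1 over LCS-based recall and precision
    return 2 * precision * recall / (precision + recall)
```

Unlike ROUGE-N, the LCS rewards in-order matches without requiring them to be contiguous.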

116 of 223

Other Metrics: METEOR, TER, WER

  • METEOR:
    • Overlap based but considers additional linguistic characteristics
    • Partial Credits for matching Stems, Synonyms and Paraphrases
    • Depends on availability of resources and tools for computing specific matches

  • Word Error Rate (WER):
    • Minimum number of edit steps to transform the output into the reference
    • Levenshtein Distance

  • Translation Edit Rate (TER):
    • Edit distance based but allows shifting of words
    • Shifting a phrase is assumed to have the same edit cost as inserting, deleting or substituting a word, regardless of the number of words being shifted

117 of 223

Other Metrics: NIST

  • Based on BLEU, but indicates how informative the N-Grams are

  • More weights to rarer N-gram matches

  • Different calculation of the brevity penalty – small variations in translation length do not impact the overall score

118 of 223

Problems with overlap based metrics

  • References needed
  • Assumes the output space is confined to the given set of references
  • Often penalizes paraphrases at syntactic and deep semantic levels
  • Task agnostic
    • Cannot reward task-specific correct generation
  • Relativistic evaluation
    • Intrinsically don’t mean anything (what does 50 BLEU mean?)

119 of 223

BLEU not perfect for evaluation…..

[Liu et al., 2016]

120 of 223

ROUGE comes at a cost….

  • [Paulus et al., 2017] used Reinforcement Learning (RL) to directly optimize for ROUGE-L
    • Instead of the usual cross-entropy loss.
    • ROUGE-L is not differentiable, hence the need for an RL-style framework.

  • Observation:
    • Outputs obtained higher ROUGE-L scores, but lower human scores for relevance and readability.

Slide credit: CS224n, Stanford

[Paulus et al., 2017]

121 of 223

Summary...

  • No automatic metric adequately captures the overall quality of generated text (w.r.t. human judgement).

  • Though more focused automatic metrics can be defined to capture particular aspects:
    • Fluency (compute probability w.r.t. well-trained Language Model).
    • Correct Style (probability w.r.t. LM trained on target corpus – still not perfect)
    • Diversity (rare word usage, uniqueness of n-grams, entropy-based measures)
    • Relevance to input (semantic similarity measures – may not be good enough – see next)
    • Simple measurable aspects like length and repetition
    • Task-specific metrics, e.g. compression rate for summarization

Slide credit: CS224n, Stanford

122 of 223

Intrinsic Evaluation

123 of 223

Document Similarity techniques

  • Candidate Representation (embedding vector) – X
  • Reference Representation (embedding vector) – Y

  • Cosine Similarity: cos(X, Y) = (X · Y) / (‖X‖ ‖Y‖)

  • Value lies between −1 and 1. Nearer to 1 implies higher similarity.
  • Objective: Candidate and Reference should be semantically similar.
  • Generally works well only for short candidate/reference length.
  • What are the representations?
    • Naïve: Obtain sentence representation by taking a weighted average of word embeddings (Mikolov, 2013; Pennington, 2014).
    • Better ones: More sophisticated sentence embedding models.
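The naïve pipeline above — average the word vectors, then compare with cosine similarity — can be sketched as follows; the toy 3-dimensional embeddings in the test are hypothetical stand-ins for word2vec/GloVe vectors:

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def sentence_vector(sentence, embeddings, dim=3):
    """Naive sentence representation: unweighted average of word vectors."""
    vecs = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

A candidate and a reference would each be mapped through `sentence_vector` and scored with `cosine_similarity`.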

124 of 223

BERT / Skip thought / Universal Encoder

  • Averaging word level embeddings disregards sentential properties (context, syntax, semantics)
  • Obtain sentence level representation using Skip-thought (Kiros 2015) / BERT (Devlin, 2018) / GPT / Embeddings from Language Models (ELMo) (Peters, 2018)

[Figure: four sentence-encoder architectures, each mapping sentences to a sentence vector — Skip-thought: RNN encoder with decoders predicting the previous and next sentence; ELMo: bidirectional LSTM/GRU with a task-specific linear combination of hidden representations; GPT: unidirectional Transformers; BERT: bidirectional Transformers]

125 of 223

Problems with Document Similarity

  • Embedding learning : Highly architecture and dataset specific
  • Expects input to be a sequence (well represented context)
    • Tabular semantics - difficult to handle
    • Altered / Reordered input may result in different output values
  • Understanding and disambiguation of senses
    • Word vectors often capture most frequent senses (disregarding context)
    • BERT, ELMo can at most understand senses from their training data
      • The score is one love
      • One love for the mother’s pride… [The famous song by Blue]
      • However, they need task-specific tuning

126 of 223

Next we discuss intrinsic metrics like complexity, grammaticality, coherence… These metrics DO NOT require reference text!

127 of 223

Text Complexity (Lexical Complexity)

  • Degree of Polysemy: the total sum of the number of senses possessed by each word in the sentence, normalized by the sentence length

It is possible that this is tough sentence. =>

{ it:1, is: 13, possible: 4, that: 1, this: 1, tough: 12, sentence: 4} = 34 / 8 = 4.25

Intuition: More polysemy + less context = harder to disambiguate

  • Fraction of Rare Words: Percentage of rare words estimated by the frequency of their occurrence in representative corpora
    • The General Service List and the Academic Word List are used as representative corpora (they contain word frequency lists)

Adaptation and mitigation efforts must therefore go hand in hand.

  • Fraction of Nouns, Verbs and Adjectives: Ratio of the number of nouns/verbs/adjectives in a sentence to the sentence length

Intuition: Define semantic roles

  • Average Syllables per Word: The average number of syllables per word in a sentence. For example:

It is possible that this is tough sentence.

    • The above sentence has a total of 11 syllables. Hence, the average number of syllables per word is 11/8 = 1.375

[Mishra et al., 2018]

128 of 223

Text Complexity (Readability)

  • Readability tests, readability formulas, or readability metrics are formulae for evaluating the readability of text, usually by counting syllables, words, and sentences.
  • Lexicalized Scores for Readability (Word, Syllable, Sentence based):
    • Automated Readability Index (ARI)
    • Coleman-Liau Index
    • Dale-Chall Readability Formula
    • Flesch-Kincaid readability tests
    • Flesch Reading Ease
    • Flesch–Kincaid Grade Level
    • Fry Readability Formula
    • Gunning-Fog Index
    • LEXILE

[Mishra et al., 2018]
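Most of the formulas listed above are simple functions of word, sentence, and syllable counts. As an example, here is a sketch of Flesch Reading Ease; the vowel-group syllable counter is a crude heuristic (real implementations use pronunciation dictionaries), so treat the scores as approximate:

```python
import re

def count_syllables(word):
    """Crude heuristic: count contiguous vowel groups, discounting a trailing 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(groups, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))
```

Higher scores indicate easier text; short monosyllabic sentences score well above 100.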

129 of 223

Text Complexity (Syntactic Complexity)

  • Dependency Distance: mean length of the dependency links appearing in the dependency parse tree of a sentence

The structural complexity of the sentence is 15/7 = 2.14

[Mishra et al., 2018]
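Given a dependency parse represented as (head index, dependent index) pairs, the metric is just the mean absolute index difference. The parse would come from a dependency parser; the arcs used below are hand-constructed for illustration:

```python
def mean_dependency_distance(arcs):
    """arcs: list of (head_index, dependent_index) pairs from a dependency parse."""
    if not arcs:
        return 0.0
    return sum(abs(head - dep) for head, dep in arcs) / len(arcs)
```

For a toy parse of "the cat sat" with arcs (2, 1) and (3, 2), the mean dependency distance is 1.0.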

130 of 223

Text Complexity (Syntactic Complexity)

  • Non terminal to terminal ratio

The non-terminal to terminal ratio is thus 10 / 9 = 1.11

  • Intuition: the ratio would be higher for sentences with nested structures, which add to the syntactic difficulty

[Mishra et al., 2018]

131 of 223

Text Complexity (Syntactic Complexity)

  • Clause Count
    • Passive clause count

The house is guarded by the dog that is taken care of by the homeowner.

    • This sentence contains two passive clauses, and hence the passive clause count is 2.

  • Coreference Distance:
    • Sum of distances, in number of words, between all pairs of co-referring text segments in a sentence
    • Larger coreference distance => greater cognitive load in processing => greater complexity

[Mishra et al., 2018]

132 of 223

Grammaticality (types of grammar error)

  • Spelling error
  • Repeated words
  • Subject-Verb agreement
    • He walk(walks) to college
  • Noun Number agreement
    • There are lot of restaurant(restaurants) in the college
  • Verb Tense
    • I have seen (saw) him yesterday
  • Pronoun
    • The girls won her (their) game
  • Preposition
    • The train will arrive within (in) five minutes
  • Articles
    • The Paris is big city => Paris is a big city
  • Double Negatives
    • I can’t hardly believe => I can hardly believe

133 of 223

Grammaticality (solutions)

  • Quick and Automatic (mostly used in NLG settings):
    • LM Perplexity / log-likelihood
    • Confidence Scores of probabilistic parsers (e.g., PCFG parsers)
  • Heuristic Based:
    • Pattern Matching and String replacement (Bustamante, 1996)
    • POS and Parse tree based rules (Naber, 2003)
  • Data Driven:
    • Labeled corpus indicating grammar error (CoNLL 2013, 2014 shared tasks, EMILLE Corpus)
    • Error Generator Tool: GenERRate (Foster, 2009)
    • Techniques: SMT, Seq2Seq, Feature based classifiers, Multi-task learning with language and error correction tasks
  • Problem:
    • Solutions address human error ; different in NLG (unrealistic errors)

134 of 223

Discourse Coherence

  • Local coherence methods:
    • Feature based supervised classification
      • Entity grid model (Barzilay, 2008)
      • HMM + Syntactic patterns (Louis, 2012)
      • Seq2seq (Li and Jurafsky, 2017)

  • Topical Coherence:
    • Measure coherence based on the degree of topic-drift (Shrivastava, 2018)
    • Entropy of Topic document distribution and topic relatedness

135 of 223

Human Evaluation

136 of 223

Human judgement scores typically considered in NLG

  • Fluency: How grammatically correct is the output sentence?

  • Adequacy: To what extent has information in the input been preserved in the output?

  • Coherence: How coherent is the output paragraph?

  • Readability: How hard is the output to comprehend?

  • Catchiness (persuasion / creative domain): How attractive is the output sentence?

“Ah, go boil yer heads, both of yeh. Harry—yer a wizard.”

INPUT: <Einstein, birthplace, Ulm> | OUTPUT: Einstein was born in Florence

The most important part of an essay is the thesis statement. Essays can be written on various topics

from domains such as politics, sports, current affairs etc. I like to write about Football because it is the

most popular team sport played at international level.

A neutron walks into a bar and asks how much for a drink. The bartender replies “for you no charge.”

MasterCard: "There are some things money can't buy. For everything else, there's MasterCard."

MasterCard: ”You can use this for shopping."

vs

137 of 223

Problems with human evaluation

  • Better AUTOMATIC evaluation metrics are NEEDED!!!!
  • Can be slow and expensive

  • Can be unreliable:
    • Humans are (1) inconsistent, (2) sometimes illogical, (3) can lose concentration, (4) misinterpret the input, (5) cannot always explain why they feel the way they do.

  • Can be subjective (vary from person to person)

  • Judgements can be affected by different expectations
    • the chatbot was very engaging because it always wrote back

Slide credit: CS224n, Stanford

138 of 223

PART - 2

Hybrid Methods

Role of Semantics and Pragmatics

Problems beyond Simple Generation

Conclusion and Future Directions

139 of 223

Hybrid Methods

140 of 223

Scalable Micro-planned Generation of Discourse from Structured Data

141 of 223

Structured data input formats

Laha et al. 2018

142 of 223

Central idea?

  • How to make learning more interpretable?
  • How to handle variable schema?
  • How to make a domain adaptable model?
  • How to train the system in domain independent manner?
  • Unsupervised scheme to train a general structured data summarization system.

Laha et al. 2018

143 of 223

Method

  • Modular Statistical System Consisting of 3 Stages

Laha et al. 2018

144 of 223

Canonicalization

name | birth place | birth date | wife
Albert Einstein | Ulm, Germany | 14 March 1879 | Elsa Lowenthal

Splitting ↓

(name: Albert Einstein, birth place: Ulm, Germany)
(name: Albert Einstein, birth date: 14 March 1879)
(name: Albert Einstein, wife: Elsa Lowenthal)

Flattening ↓

“Albert Einstein”, “birth place”, “Ulm, Germany”
“Albert Einstein”, “birth date”, “14 March 1879”
“Albert Einstein”, “wife”, “Elsa Lowenthal”

NE tagging ↓

< PERSON birth place GPE >
< PERSON birth date DATE >
< PERSON wife PERSON >

Conversion from various formats to triples made of binary relations between two entity types

Laha et al. 2018
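The splitting, flattening, and NE-tagging steps can be sketched as follows; the hard-coded type map stands in for a real named-entity tagger:

```python
def flatten_record(record, key_field="name"):
    """Split a multi-attribute record into binary <subject, relation, object> triples."""
    subject = record[key_field]
    return [(subject, relation, value)
            for relation, value in record.items() if relation != key_field]

def ne_tag_triple(triple, type_map):
    """Replace entities by their NE types, yielding a canonical triple signature."""
    subject, relation, obj = triple
    return (type_map.get(subject, subject), relation, type_map.get(obj, obj))

record = {"name": "Albert Einstein", "birth place": "Ulm, Germany",
          "birth date": "14 March 1879", "wife": "Elsa Lowenthal"}
type_map = {"Albert Einstein": "PERSON", "Ulm, Germany": "GPE",
            "14 March 1879": "DATE", "Elsa Lowenthal": "PERSON"}
```

Running `ne_tag_triple` over the flattened triples yields the canonical signatures shown on the slide, e.g. (PERSON, birth place, GPE).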

145 of 223

Data Creation for Keyword/Triple to Text Generation

EXAMPLE:

Input:

AlbertEinstein, HasWonPrize, NobelPrize

Segmentation:

Albert Einstein, has won prize, Nobel prize

Concatenation:

Albert Einstein has won prize Nobel prize

Post-process:

Albert Einstein has won Nobel prize

DeLex: PERSON has won AWARD

Concatenation and Grammar Correction

Triples from rich KBs

(Freebase, DBPedia, Yago)

Entity tagging

Original Parallel

Original

Triples

Synthesized

Sentences

Delexicalized Parallel

DeLex

Triples

DeLex

Sentences

EXAMPLE:

Input:

Obama was born in Honolulu

OpenIE:

<Obama, born in, Honolulu>

Original parallel instance:

Src: <Obama, born in, Honolulu>

Tgt: Obama was born in Honolulu

Domain agnostic parallel instance:

Src: <PERSON, born in, LOCATION>

Tgt: PERSON was born in LOCATION

Simple sentences from webscale corpus (e.g., Wikipedia Dump)

Open Information Extraction

Entity tagging

Original Parallel

Extracted

Triples

Original

Sentences

Delexicalized Parallel

DeLex

Triples

DeLex

Sentences

Combinations

of Possible

Entity

Types

VerbNet

Verbs

SVO

Template

Triples

Sentences

EXAMPLE:

Input : “go”

Possible Entities:

PERSON, LOCATION, WORK OF ART …

Possible Sentences (after correction):

1. PERSON goes to LOCATION

2. PERSON goes to WORK OF ART

N. WORK OF ART goes to PERSON

Best sentence:

PERSON goes to LOCATION

Concatenation + Correction +LM based ranking

Simple sentences from web corpus (e.g., Wikipedia Dump)

Stemming and Stopword

Removal

Entity tagging

Original Parallel

Extracted

Keywords

Original

Sentences

Delexicalized Parallel

DeLex

Triples

DeLex

Sentences

EXAMPLE:

Input :

Albert Einstein has won Nobel prize

Preprocessing:

Albert Einstein win Nobel Prize

Original parallel instance:

Src: <Albert Einstein, win, Nobel Prize>

Tgt: Albert Einstein has won Nobel prize

Domain agnostic parallel instance:

Src: PERSON win AWARD

Tgt: PERSON has won AWARD

Laha et al. 2018

146 of 223

Simple Language Generation: Triple2Text

<Sachin Tendulkar, born in, India>

<PERSON, born in, GPE>

Seq2Seq

PERSON was born in GPE.

{Sachin Tendulkar: PERSON,

India: GPE}

Sachin Tendulkar was born in India.

Laha et al. 2018
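The delexicalize → generate → re-lexicalize loop on this slide can be sketched as below; the Seq2Seq model is stubbed out with a fixed template, since the point here is only the entity handling:

```python
def delexicalize(triple, type_map):
    """Replace entities with NE types; remember the mapping for re-lexicalization."""
    subject, relation, obj = triple
    mapping = {type_map[subject]: subject, type_map[obj]: obj}
    return (type_map[subject], relation, type_map[obj]), mapping

def relexicalize(sentence, mapping):
    """Substitute entity placeholders back into the generated template."""
    for placeholder, entity in mapping.items():
        sentence = sentence.replace(placeholder, entity)
    return sentence

type_map = {"Sachin Tendulkar": "PERSON", "India": "GPE"}
delex_triple, mapping = delexicalize(("Sachin Tendulkar", "born in", "India"), type_map)
generated = "PERSON was born in GPE."   # stand-in for the Seq2Seq output
```

`relexicalize(generated, mapping)` then yields "Sachin Tendulkar was born in India."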

147 of 223

MorphKey2Text: Rich Parallel Data extraction

Albert Einstein married Elsa Lowenthal in 1919 .

PERSON NOUN married VERB PERSON NOUN DATE NOUN

1. Coarse POS Tagging

2. NE Replacement

3. Stopword Removal

1. Fine-grained POS Tagging

2. POS retention for VERBs

3. NE Replacement

PERSON marry VBD PERSON in DATE

Source

Target

Original Sentence

Laha et al. 2018

148 of 223

Simple Language Generation: MorphKey2Text

Laha et al. 2018

149 of 223

Ranking of simple sentences

 

  • One can also add rules, for example, sentences with no verbs can be removed

Fluency

Adequacy

Laha et al. 2018

150 of 223

Sentence Compounding/Aggregation (rule based)

split

Jordan played basketball

Jordan played football

<Jordan, played, basketball>

<Jordan, played, football>

Jordan played basketball and football.

e11 == e21 && rvp1 == rvp2

Rule: e11 rvp1 e12 and e22

<e11 rvp1 e12>

<e21 rvp2 e22>

Example

Laha et al. 2018

151 of 223

Sentence Compounding/Aggregation (rule based)

split

Jordan played basketball

Jordan represented USA

<Jordan, played, basketball>

<Jordan, represented, USA>

Jordan played basketball and represented USA.

e11 == e21

Rule: e11 rvp1 e12 and rvp2 e22

<e11 rvp1 e12>

<e21 rvp2 e22>

Example

Laha et al. 2018
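The two aggregation rules on these slides can be sketched as a single function over <subject, relational verb phrase, object> triples; this is an illustrative toy, not the system implementation:

```python
def aggregate(triple_1, triple_2):
    """Rule-based compounding of two <subject, verb phrase, object> triples."""
    (s1, v1, o1), (s2, v2, o2) = triple_1, triple_2
    if s1 == s2 and v1 == v2:
        # Rule: e11 rvp1 e12 and e22 (shared subject and verb phrase)
        return f"{s1} {v1} {o1} and {o2}."
    if s1 == s2:
        # Rule: e11 rvp1 e12 and rvp2 e22 (shared subject only)
        return f"{s1} {v1} {o1} and {v2} {o2}."
    # No rule applies: emit two separate sentences
    return f"{s1} {v1} {o1}. {s2} {v2} {o2}."
```

This reproduces both slide examples: shared subject and verb merges the objects; a shared subject alone merges the verb phrases.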

152 of 223

Coreference Replacement (for entities)

Jordan played basketball and represented USA. Jordan was born in New York.

Jordan<PERSON> played basketball and represented USA<GPE>. Jordan<PERSON> was born in New York<GPE>.

Jordan<PERSON>

Gender detection

He

Jordan played basketball and represented USA. He was born in New York.

Laha et al. 2018

153 of 223

Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation (NAACL 2019)

Moryossef et al. 2019

154 of 223

Triple-to-text Variations (Planning)

  • Input triples:

  • Possible output:

  • Fact re-ordering

  • Entity re-ordering

  • Sentence-split

John, who was born in London, works for IBM.

John, who works for IBM, was born in London.

IBM employs John, who was born in London.

IBM employs John. John was born in London.

Moryossef et al. 2019

155 of 223

Triple-to-Text Variations (Realization)

  • Input triples:

One-way (Sentence-split):

Verbalization:

IBM employs John. John was born in London.

IBM employs John. He was born in London.

Moryossef et al. 2019

156 of 223

Issues with end-to-end Neural Approaches

  • Lack of coherence on longer texts.
  • Not maintaining coherent order of facts.
  • Not faithful to input facts.
  • Omitting, repeating, hallucinating or changing facts.

  • Possible reasons:
    • Neural approaches are decent at language modeling / surface realization.
    • They fall behind on modeling more abstract levels of text structuring, etc.

Moryossef et al. 2019

157 of 223

Remedy: Two-step Approach

Text Planning

Plan Realization

Moryossef et al. 2019

158 of 223

Triple-to-Plan Generation

John, residence, London

England, capital, London

John, residence, London

John, occupation, Bartender

John, residence, London

England, capital, London

John, occupation, Bartender

Moryossef et al. 2019

159 of 223

Plan-to-Text Generation

Input Sequence:

Output Sequence:

Linearization

Seq2seq + Copy

Text Plan

Moryossef et al. 2019

160 of 223

Key Takeaway

Faithfulness to the input is improved compared to end-to-end Neural Approaches!!!

Also known as adequacy or correctness

Moryossef et al. 2019

161 of 223

Gaps

  • Technique is dataset dependent
    • Restricted to WebNLG.

  • Assumption for text-plan design:
    • Each entity is mentioned only once in a sentence.

  • Restrictive definition of text-plan:
    • Splitting should be the same as the reference sentence splits.
    • Order of entities must be preserved between respective splits and their reference sentences.

Moryossef et al. 2019

162 of 223

Microplanning for Sentence Realization from Data

163 of 223

Verb Selection

  • Verb -> most important part of sentence, represents action
  • Often the input contains nouns (subjects, objects and complements)

Name

Work City

Occupation

Award

Albert Einstein

Ulm, Germany

Physicist

Nobel Prize

Subject

Complement

Object

Albert Einstein worked in Ulm, Germany

Albert Einstein received Nobel Prize

Zhang et al. 2018

Often verb prediction is important

Input: PERSON, CITY -> stay / work

Input: PERSON, AWARD -> receive

164 of 223

Data-driven Solution to Verb Selection

  • Easy to obtain data from monolingual corpus of simple sentences

Raw

Corpus

(WSJ, Reuter, Wiki)

Triples

Triple

Extraction

(OpenIE)

Tagged

Triples

1:<e1, e2: verb>

2: <e1, e2: verb>

3: <e1, e2: verb>

N: <e1, e2: verb>

Tagged entities

and Verbs

Entity

Tagging

and

Delexicalization

POS and

Verb selection

Training data

Model

Massive

Classification

(MLE, Neural)

Revenue rose highly by 13%

<Revenue, rose by, 13%>

<VALUE rose by PERCENTAGE>

<VALUE rise PERCENTAGE>

<VALUE , PERCENTAGE: rise >

Entity1, Entity2

Verb

training

test

Zhang et al. 2018
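The MLE variant of the "massive classification" step amounts to counting which verb most often links a given pair of entity types; a sketch with hypothetical training triples:

```python
from collections import Counter, defaultdict

def train_verb_selector(tagged_triples):
    """MLE: count verb occurrences per (entity-type, entity-type) pair."""
    counts = defaultdict(Counter)
    for e1, e2, verb in tagged_triples:
        counts[(e1, e2)][verb] += 1
    return counts

def select_verb(counts, e1, e2):
    """Return the most likely verb for the entity-type pair, or None if unseen."""
    pair_counts = counts.get((e1, e2))
    return pair_counts.most_common(1)[0][0] if pair_counts else None
```

A neural classifier over pretrained entity-type embeddings would replace these counts; the MLE table is the simplest baseline.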

165 of 223

Preposition selection

  • Noun + Prepositional Phrase
    • wife of PERSON
  • Verb + Prepositional Phrase
    • worked at LOCATION
  • Training Dataset (verb)
    • Format: <verb-lemma, noun: preposition>

<work, location: at>

  • Source: raw corpus + constituency parsing (tricky)
  • Leveraging PP attachment dataset (Ratnaparkhi et al, 1994):
    • < join, board, as, director>: V
    • < be, chairman, of , IBM>: N
  • Training:
    • Multiclass classification
    • Using pretrained embedding helps address sparsity

<join, position_holder : as>

<position_holder, organization : of>

166 of 223

Role of Semantics and Pragmatics

167 of 223

Semantics and Pragmatics in NLG

  • Current generation paradigms focus on lexical and syntax aspects of language generation
  • However, NLG, especially data-to-text generation often requires content plans that convey more information than the input data
  • Paraphrasing at semantic / pragmatic levels: the same thing can be said in various ways

What does John do for a living? ⬄ What is john’s job?

(Not merely lexical / syntactic paraphrasing)

  • Additional information has stronger effect

Restaurant

Food Type

China Town

Chinese

China town’s food type is Chinese

VS

China town serves Chinese food

Semantics: Situation-agnostic but deeper

Pragmatics: May vary according to the situation; depends on who is listening and what the environment is

168 of 223

NLG Under Pragmatic Constraints

  • Initial approach by Hovy, 1987, PAULINE (Planning and Uttering Language in Natural Environment)
  • Semantics: Includes topics-based enrichment
  • Pragmatics: Includes extra-linguistic information involving attributes of speaker and listener
  • Characteristics of conversation setting
    • Conversational Atmosphere
      • Time: much, some, little (say, control generation (length) based on these)
      • Tone: formal, informal
      • Conditions: good, noisy
    • Speaker / Hearer
      • Topic knowledge: expert, student
      • Interest in the topic: high, low
      • Emotional state: happy, angry
    • Speaker-hearer relationship
      • Depth of acquaintance: friend, stranger
      • Emotion: like, equal, different
    • Interpersonal Goals
      • Speaker’s objective: affect hearer’s knowledge , affect hearer’s emotional state
      • Speaker-hearer relationship: affect hearer’s emotion towards speaker

169 of 223

PAULINE: System Overview

  • Characteristic decisions w.r.t pragmatic constraints:

  • Implementing Rhetorical Strategies
    • Maintain templates / heuristics for generation pertaining to each pragmatic aspect of conversation
    • E.g., if the constraint is to be “formal”, heuristics / templates for structure, word choice, topic organization, and sentence ordering will be activated

[Figure: PAULINE pipeline — input topics → Topic Collection (topic collection plans, interpretation, new topics) → Topic Organization (juxtaposition, ordering, sentence type, organization) → Realization (clauses, words) → text; strategies at each stage are driven by the pragmatic aspects of the conversation]

170 of 223

Hybrid System for Enriching NLG with Pragmatic Information

  • Approach by Shen et al, 2019
  • Tested on two tasks: data-to-text NLG (E2E dataset), text-to-text NLG (CNN abstractive summarization dataset)

[Figure: speaker model — an embedding + attention seq2seq mapping the input tuple to a generated sentence — paired with a listener model, a multiclass multitask classifier that reconstructs the input tuple from the generated sentence; base → vanilla seq2seq, R → reconstruction-based model]

171 of 223

Example output (Shen et al, 2019)

Input:

NAME [FITZBILLIES], EATTYPE [COFFEE SHOP], FOOD [ENGLISH], PRICERANGE [CHEAP], CUSTOMERRATING [5 OUT OF 5], AREA [RIVERSIDE], FAMILYFRIENDLY [YES]

Human written

A cheap coffee shop in riverside with a 5 out of 5 customer rating is Fitzbillies. Fitzbillies is family friendly and serves English food.

Basic Seq2Seq

Fitzbillies is a family friendly coffee shop located near the river.

Reconstructor-based pragmatic system

Fitzbillies is a family friendly coffee shop that serves cheap English food in the riverside area. It has a customer rating of 5 out of 5.

Note:

  • Not truly pragmatic but has a provision to include more information (through classification based on reconstruction)
  • Listener works on complete output
    • Alternative: Word / phrase level listener models (Shen et al, 2019)

172 of 223

Problems beyond Simple Generation

Controllable Text Generation

Argument Generation

Persuasive Text Generation

Theme / Topic based Generation

Creative Storytelling

173 of 223

Controllable Text Generation

174 of 223

Key Goal of “Strong AI”

Source: https://www.slideshare.net/kimveale/building-a-sense-of-humour-the-robots-guide-to-humorous-incongruity

Configurable Personalities

Set Humor to 75%

Interstellar (2014)

You want 55?

Confirmed. Self destructing in

10, 9, 8 …

Make that 60?

60% confirmed

Knock Knock…

175 of 223

Controllable Text Transformation – System overview

Transformer

Input Text

Transformed Text

{Style1:Value1, Style1:Value2, …, StyleN:ValueN}

“A deep learning server needs at least 32 GB of RAM and an NVIDIA GPU”

{Wording: “Formal”, Sentiment: “Negative”, Word Count: “<30”}

“A server having less than 32 GB of RAM and without an NVIDIA GPU is not recommended for running deep learning algorithms.”

(Control Intention: The user wants cautionary, yet formal text to be generated)

176 of 223

Control-based Text Transformation

Examples:

Sentence: The movie is terrible

Transformation: It is messy, uncouth, incomprehensible, vicious and absurd.

(Lexical, Sentiment Intensity, Formal, Complex)

Sentence: The movie is terrible

Transformation: A somewhat crudely constructed and hence, quite an unwatchable movie (it was)

(Syntactic, Semantic, Semi-Formal)

Sentence: The movie is terrible

Transformation: You sit through these kinds of movies because the theatre has air conditioning

(Pragmatic, Sentiment Intensity, Informal)

Lexical

Syntactic

Semantic

Pragmatic

Tone

Formalness

Sentiment

Emotion

Complexity

Linguistic

Perceptual

Finance

Healthcare

Retail

Practical

(Domain)

Controls

177 of 223

Need for Unsupervised Methods

  • Hard to create parallel data
    • Very large number of combinations of controls.
    • Data will be very sparse for some lesser seen combinations.
    • Hard to quantify all controls separately (a combination can mean another control)

  • Hard to quantify control values in manually annotated data
    • “It is a good movie” compared to “It is a terrible movie” does not change only in sentiment (from positive to negative); it also changes in formalness (from formal to informal).
    • It is not always possible to manually write output transformation changing only the specified control value.

  • Diverse use cases, diverse text transformations
    • Different types of linguistic aspects and domains involved.

178 of 223

Unsupervised Approaches: Background

  • Unsupervised Machine Translation: (Artetxe et al, 2017, 2018; Lampel et. al, 2017)
    • Denoising shared encoders, shared multi-lingual embeddings, back translation
  • Style Transfer using Non-parallel Text (Shen et. al, 2017)
    • Autoencoding, cross-alignment
  • Sequence to Better Sequence (Mueller et al, 2017)
    • Transformation as a correction step, guided by metrics. Training using Seq2Seq
  • Controllable Text Generation (Hu et al, 2017)
    • Conditional Variational Auto-encoders
  • Paraphrase Generation (Wubben et al., 2010; Prakash et al., 2016)
    • Monolingual Statistical Machine Translation and improvised Neural Machine Translation

179 of 223

Unsupervised Text Formalization (Jain et al, 2018)

  • Degree of formalization (control) given as input at runtime

  • Features:
    • Unsupervised training scheme; handles the infeasibility of annotating data for each <input, output, controls> triple
    • Preservation of language semantics
    • Use off-the-shelf NLP modules for verification and scoring
    • Control the degree of the intended attribute desired at the output.
    • Learning to incorporate multiple control inputs (which can be dependent)

180 of 223

Central Idea (Jain et al, 2018)

Exploration

(generate training data)

Exploitation

(retrain model)

<Sentences from unlabeled corpora, model>

<Sampled paraphrases, control values>

Model

<Model>

Control value

Input Sentence

Transformed Sentence

Training

Testing

181 of 223

Controllable Generation Architecture (Jain et al, 2018)

182 of 223

Argument Generation

183 of 223

IBM Project Debater

Project Debater

Human Debater

Grand Challenge

like Deep Blue, Jeopardy!

Man vs Machine

Debate

(Feb 11, 2019)

184 of 223

What is debating?

Debate Topic: “We should ban smoking”

For the Motion:

Against the Motion:

Smoking causes cancer

Almost half the deaths (48.5%) from 12 different types of cancer combined are attributable to cigarette smoking, according to a study by researchers from the American Cancer Society and colleagues.

Smoking creates jobs

As tobacco smoking is a common activity, there are currently 1% of the population in the country who are involved in the growing, manufacturing and ultimately distribution of tobacco in various forms.

Claim

Evidence

185 of 223

How does Project Debater work?

  1. Goes through 10 billion sentences.
  2. Picks the ones relevant to the debate motion.
  3. Selects the ones with the appropriate polarity toward the motion (For/Against).
  4. These are called arguments: claims (shorter phrases/sentences) and evidence (longer ones).
  5. The arguments are stitched together into narratives (Natural Language Generation).
  6. A text-to-speech system converts the narrative into speech.
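The pipeline above can be sketched as a filtering chain. The `is_relevant` and `polarity_of` callbacks are hypothetical placeholders for the large learned components of the real system; the short/long split for claims vs. evidence is likewise only illustrative.

```python
def build_debate_speech(sentences, motion, is_relevant, polarity_of, side="for"):
    """Sketch of the six pipeline steps; classifiers are passed in as
    callbacks because they stand in for learned models."""
    relevant = [s for s in sentences if is_relevant(s, motion)]          # steps 1-2
    arguments = [s for s in relevant if polarity_of(s, motion) == side]  # step 3
    claims = [s for s in arguments if len(s.split()) <= 10]              # step 4
    evidence = [s for s in arguments if len(s.split()) > 10]
    narrative = " ".join(claims + evidence)                              # step 5
    return narrative  # step 6: a text-to-speech system would then speak this
```

With toy classifiers, only the relevant, correctly-polarized argument survives into the narrative.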

186 of 223

Challenges in Debate Speech Construction

  • The arguments need to be of correct polarity.
    • Understanding whether an argument is supporting or contrasting is a big challenge.
    • Generating an argument with appropriate polarity is even tougher!

  • The narrative needs to be coherent and sound natural.
    • There should be a good topical and local coherence between sentences.
    • The narrative should stay on the point of discussion and not stray off.
    • The narrative should be logically correct.

  • The speech should be persuasive.
    • The arguments should be informative.
    • The arguments should be conveyed persuasively enough to convince the listener.

187 of 223

Project Debater is here at ACL 2019!

Please visit IBM Booth for a demo

188 of 223

Persuasive Text Generation

189 of 223

Persuasion

  • Task: Given a product specification, generate a persuasive description

190 of 223

System Architecture [PersuAIDE!]

[Munigala et al, 2018]

191 of 223

Example outputs

[Munigala et al, 2018]

192 of 223

Theme / Topic based Text Generation

193 of 223

Overview

  • Fusion of topic models and language models

  • Motivation:
    • LMs: primary function is to predict probability of a span of text
    • Traditionally applied at the sentence level, ignoring broader document context
    • S: “Python is a beautiful …”
      • Computer Science: P(“language” | S ) ↑
      • Biology: P(“reptile” | S ) ↑
    • Sensitizing prediction of LMs to larger document narratives (using topics)
      • P(“reptile”|S , topic = “biology”) ↑

  • Proposed work jointly learns topics and word sequence representation

[Lau et al, 2017]
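The sensitization idea can be illustrated with a simple mixture: blend the LM's next-word distribution with a topic's word distribution. Note that Lau et al. (2017) learn topics and the LM jointly in one network; this interpolation is only a simplified stand-in for the intuition.

```python
def topic_conditioned_next_word(lm_probs, topic_word_probs, alpha=0.5):
    """Toy illustration: mix the LM's next-word distribution with the
    topic's word distribution, then renormalize. alpha is a hypothetical
    mixing weight, not a parameter from the paper."""
    words = set(lm_probs) | set(topic_word_probs)
    mixed = {w: (1 - alpha) * lm_probs.get(w, 0.0)
                + alpha * topic_word_probs.get(w, 0.0)
             for w in words}
    z = sum(mixed.values())
    return {w: p / z for w, p in mixed.items()}
```

For S = "Python is a beautiful ...", a biology topic putting most of its mass on "reptile" pulls the mixed distribution toward P("reptile" | S, topic = "biology") even when the plain LM prefers "language".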

194 of 223

Approach (Lau et al, 2017)

195 of 223

Example Generated Sentences

[Lau et al, 2017]

196 of 223

Creative Storytelling

197 of 223

Desiderata for Storytelling

  • Stories must remain thematically consistent across the complete document.
    • Require modeling very long-term dependencies.

  • Stories require creativity.
    • Creative artifacts are characterized by novelty, value, unexpectedness, impact, etc.

  • Stories need a high-level plot.
    • Necessitating planning ahead of word-by-word generation.

[Fan et al., 2018]

198 of 223

Generating Story from a prompt [Fan et al., 2018]

Two main challenges:

    • Modeling very long-term dependencies (the story is very long).
    • Modeling the connection between the prompt and the story.

199 of 223

Tackling challenge 1….

  • Modeling very long-term dependencies:
    • Self-attention-based approach [a proven technique in Transformers – Vaswani et al., 2017]
    • Multi-head self-attention at the decoder.
    • Different heads look at different ranges of timesteps: the first sees the full input, the second every second input, the third every third input, and so on – this is called downsampling.

Self-attention at a single head

Multi-head self-attention

[Fan et al., 2018]
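A minimal NumPy sketch of the downsampling idea, assuming head k attends only to every k-th timestep. The learned projections, gating, and causal masking of the actual Fan et al. (2018) decoder are omitted for brevity.

```python
import numpy as np

def downsampled_self_attention(states, n_heads=3):
    """Toy downsampled multi-head self-attention over hidden states of
    shape (T, d): head k builds its keys/values from every k-th timestep."""
    T, d = states.shape
    head_outputs = []
    for k in range(1, n_heads + 1):
        keys = states[np.arange(0, T, k)]        # head k's downsampled view
        scores = states @ keys.T / np.sqrt(d)    # (T, ceil(T/k)) attention scores
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        head_outputs.append(weights @ keys)      # (T, d) per-head summary
    return np.concatenate(head_outputs, axis=1)  # heads concatenated: (T, n_heads*d)
```

Each head thus covers a progressively coarser, longer-range view of the sequence, which is what lets the model attend over very long stories cheaply.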

200 of 223

Tackling challenge 2….

  • Modeling the connection between prompt and story
    • Through a fusion approach – a second attempt to learn the connection, in case it was missed the first time.
    • Fusion of the hidden states of a pre-trained language model and the current seq2seq model.

[Fan et al., 2018]
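One way to picture the fusion is a learned gate that scales the pretrained LM's hidden state before it is combined with the seq2seq state. This is a hedged sketch of that gating idea, not the exact Fan et al. (2018) formulation; `W_gate` and `W_out` are hypothetical placeholders for trained weight matrices.

```python
import numpy as np

def fuse_hidden_states(h_seq2seq, h_lm, W_gate, W_out):
    """Gate the LM hidden state using both states, concatenate with the
    seq2seq state, and project the fused vector for the output layer."""
    concat = np.concatenate([h_seq2seq, h_lm])
    gate = 1.0 / (1.0 + np.exp(-(W_gate @ concat)))  # sigmoid gate over LM features
    fused = np.concatenate([h_seq2seq, gate * h_lm])
    return W_out @ fused  # fused representation feeding the next-word prediction
```

The gate lets training decide, per dimension, how much of the pretrained LM's signal to mix in at each decoding step.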

201 of 223

Conclusion and Future Directions

202 of 223

Holy Grail of Data-to-Text Systems

Data Scientist

Artist

Psychologist

+

+

  • Data Comprehension
  • Reasoning
  • Insights detection
  • Entertaining Text
  • Creative (open-ended)
  • Engaging Narratives
  • Understanding of listener (Empathetic)
  • Understanding of situation (Pragmatics)
  • Affective generation with desired controls (persuasive)

203 of 223

Future Goals

Short-term: in a couple of years

Mid-term: in 5-10 years

Long-term: in at least a decade

204 of 223

Cross-lingual Inference (Short-term Goal)

 

 

Leonardo di ser Piero da Vinci was an Italian painter and scientist.

Léonard de Piero da Vinci était un peintre et scientifique italien.

Leonardo di ser Piero da Vinci era un pittore e scienziato italiano.

English

French

Italian

205 of 223

Entity-focused Knowledge Graph Summarization (Short-Term)

General graph summary:

Hugo Weaving acted in the movie Cloud Atlas (as Bill Smoke) along with Tom Hanks (as Zachry), and in the movie The Matrix (as Agent Smith). Both movies were directed by Lana Wachowski.

Query: Show me movies directed by Lana and their lead actors.

Focus Lana

Entity focused summary(Focus Lana):

Lana Wachowski, born in 1965, is the director of the movies Cloud Atlas (released in 2012) and The Matrix (released in 1999).

206 of 223

Cross-lingual Learning (Mid-term Goal)

 

 

Leonardo di ser Piero da Vinci (15 April 1452 – 2 May 1519), more commonly Leonardo da Vinci or simply Leonardo, was an Italian Renaissance polymath whose areas of interest included invention, painting, sculpting, architecture, science, music, mathematics, engineering, literature, anatomy, geology, astronomy, botany, writing, history, and cartography.

English

French

207 of 223

Hierarchical Table Understanding (Mid-Term)

208 of 223

Data++ To Text (Mid-term)

name            | birth place  | birth date    | wife
----------------|--------------|---------------|---------------
Albert Einstein | Ulm, Germany | 14 March 1879 | Elsa Lowenthal

+

Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). His work is also known for its influence on the philosophy of science. He is best known to the general public for his mass–energy equivalence formula, which has been dubbed "the world's most famous equation". He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect", a pivotal step in the development of quantum theory.

Albert Einstein, a theoretical physicist born on 14 March 1879 in Ulm, Germany is prominently known for developing the theory of relativity. Here we can see him interacting with Mahatma Gandhi.

Table

Image

Text

209 of 223

Interesting Narratives Generation from Data (Long-term)

Player  | Goals | World Cup Wins | Nationality
--------|-------|----------------|------------
Messi   | 419   | 0              | Argentina
Ronaldo | 311   | 0              | Portugal
Zidane  | 155   | 1              | France

Even though Zidane has scored fewer goals than both Messi and Ronaldo, he has won the World Cup once, while the others have not.

More examples of Interesting Facts :

“Messi going goal-less in a match”

“Indian football team scoring 10 goals against Brazil”

“3 red cards in a single match”

Anomalies

Two parts to the problem:

  • Figuring out interestingness in the data (Content Selection + Reasoning).

  • Realizing the data in an interesting way for the target.
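The first part of the problem (content selection + reasoning) can be illustrated with a toy rule over the player table: flag a row that trails everyone on one metric yet leads everyone on another. The function name and template string are illustrative, not a proposed system.

```python
def contrast_facts(rows, low_metric, high_metric):
    """Toy content-selection step: surface the kind of contrast the Zidane
    sentence verbalizes. A real system would score many such candidate
    facts for interestingness."""
    facts = []
    for r in rows:
        others = [o for o in rows if o is not r]
        if (all(r[low_metric] < o[low_metric] for o in others)
                and all(r[high_metric] > o[high_metric] for o in others)):
            facts.append(f"Even though {r['player']} has scored fewer "
                         f"{low_metric} than the others, he leads them in "
                         f"{high_metric.replace('_', ' ')}.")
    return facts

table = [
    {"player": "Messi",   "goals": 419, "world_cup_wins": 0, "nationality": "Argentina"},
    {"player": "Ronaldo", "goals": 311, "world_cup_wins": 0, "nationality": "Portugal"},
    {"player": "Zidane",  "goals": 155, "world_cup_wins": 1, "nationality": "France"},
]
```

Detecting anomalies like "Messi going goal-less in a match" requires richer background statistics; the second part of the problem, realizing the fact engagingly, remains open-ended generation.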

210 of 223

Persuasive Argument Generation from Data (Long-Term)

OnePlus 7 Pro has a better camera and larger memory space to capture all your holiday photos in high quality without the fear of running out of space.

  • Involves Understanding Context
  • Involves Reasoning (Comparison)
  • Interesting insights generation
  • Creative/Persuasive Generation

This is possible through rules and templates in limited/restricted settings!

Can we do this in a more generalized way across domains?

211 of 223

https://sites.google.com/view/acl-19-nlg/

Tutorial Website:

THANK YOU

212 of 223

References

  • Ananthakrishnan, R., Bhattacharyya, P., Sasikumar, M., & Shah, R. M. (2007). Some issues in automatic evaluation of English-Hindi MT: More blues for BLEU. ICON.
  • Angeli, G., Liang, P., & Klein, D. (2010, October). A simple domain-independent probabilistic approach to generation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (pp. 502-512). Association for Computational Linguistics.
  • Artetxe, M., Labaka, G., & Agirre, E. (2018). Unsupervised statistical machine translation. arXiv preprint arXiv:1809.01272.
  • Artetxe, M., Labaka, G., Agirre, E., & Cho, K. (2017). Unsupervised neural machine translation. arXiv preprint arXiv:1710.11041.
  • Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Bustamante, F. R., & León, F. S. (1996, August). GramCheck: A grammar and style checker. In Proceedings of the 16th conference on Computational linguistics-Volume 1 (pp. 175-181). Association for Computational Linguistics.

213 of 223

References

  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Doddington, G. (2002, March). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the second international conference on Human Language Technology Research (pp. 138-145). Morgan Kaufmann Publishers Inc.
  • Fan, A., Lewis, M., & Dauphin, Y. (2018). Hierarchical neural story generation. arXiv preprint arXiv:1805.04833.
  • Foster, J., & Andersen, Ø. E. (2009). GenERRate: generating errors for use in grammatical error detection. The Association for Computational Linguistics.
  • Fu, Z., Tan, X., Peng, N., Zhao, D., & Yan, R. (2018, April). Style transfer in text: Exploration and evaluation. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Gatt, A., & Krahmer, E. (2018). Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research, 61, 65-170.

214 of 223

References

  • Gatt, A., & Reiter, E. (2009, March). SimpleNLG: A realisation engine for practical applications. In Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009) (pp. 90-93).
  • Gu, J., Lu, Z., Li, H., & Li, V. O. (2016). Incorporating copying mechanism in sequence-to-sequence learning. arXiv preprint arXiv:1603.06393.
  • Gulcehre, C., Ahn, S., Nallapati, R., Zhou, B., & Bengio, Y. (2016). Pointing the unknown words. arXiv preprint arXiv:1603.08148.
  • Hovy, E. (1987). Generating natural language under pragmatic constraints. Journal of Pragmatics, 11(6), 689-719.
  • Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., & Xing, E. P. (2017, August). Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 1587-1596). JMLR. org.
  • Huang, L., & Chiang, D. (2007, June). Forest rescoring: Faster decoding with integrated language models. In Proceedings of the 45th annual meeting of the association of computational linguistics (pp. 144-151).

215 of 223

References

  • Jain, P., Laha, A., Sankaranarayanan, K., Nema, P., Khapra, M. M., & Shetty, S. (2018). A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization. arXiv preprint arXiv:1804.07790.
  • Jain, P., Mishra, A., Azad, A. P., & Sankaranarayanan, K. (2018). Unsupervised Controllable Text Formalization. arXiv preprint arXiv:1809.04556.
  • Kim, J., & Mooney, R. J. (2010, August). Generative alignment and semantic parsing for learning from ambiguous supervision. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (pp. 543-551). Association for Computational Linguistics.
  • Kiros, R., Zhu, Y., Salakhutdinov, R. R., Zemel, R., Urtasun, R., Torralba, A., & Fidler, S. (2015). Skip-thought vectors. In Advances in neural information processing systems (pp. 3294-3302).
  • Konstas, I., & Lapata, M. (2012, June). Unsupervised concept-to-text generation with hypergraphs. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 752-761). Association for Computational Linguistics.
  • Konstas, I., & Lapata, M. (2013, October). Inducing document plans for concept-to-text generation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 1503-1514).

216 of 223

References

  • Langkilde, I., & Knight, K. (1998). Generation that exploits corpus-based statistical knowledge. In Proceedings of ACL 1998, Montreal, Canada.
  • Laha, A., Jain, P., Mishra, A., & Sankaranarayanan, K. (2018). Scalable Micro-planned Generation of Discourse from Structured Data. arXiv preprint arXiv:1810.02889.
  • Lau, J. H., Baldwin, T., & Cohn, T. (2017). Topically driven neural language model. arXiv preprint arXiv:1704.08012.
  • Lebret, R., Grangier, D., & Auli, M. (2016). Neural text generation from structured data with application to the biography domain. arXiv preprint arXiv:1603.07771.
  • Liang, P., Jordan, M. I., & Klein, D. (2009, August). Learning semantic correspondences with less supervision. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1 (pp. 91-99). Association for Computational Linguistics.
  • Lin, C. Y. (2004). Rouge: A package for automatic evaluation of summaries. In Text summarization branches out (pp. 74-81).
  • Lin, D. (1996). On the structural complexity of natural language sentences. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.

217 of 223

References

  • Liu, C. W., Lowe, R., Serban, I. V., Noseworthy, M., Charlin, L., & Pineau, J. (2016). How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. arXiv preprint arXiv:1603.08023.
  • Liu, T., Wang, K., Sha, L., Chang, B., & Sui, Z. (2018, April). Table-to-text generation by structure-aware seq2seq learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Louis, A., & Nenkova, A. (2012, July). A coherence model based on syntactic patterns. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 1157-1168). Association for Computational Linguistics.
  • Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243-281.
  • Mei, H., Bansal, M., & Walter, M. R. (2015). What to talk about and how? selective generation using lstms with coarse-to-fine alignment. arXiv preprint arXiv:1509.00838.
  • Melamed, I. D., Green, R., & Turian, J. P. (2003). Precision and recall of machine translation. In Companion Volume of the Proceedings of HLT-NAACL 2003-Short Papers.

218 of 223

References

  • Miao, Y., & Blunsom, P. (2016). Language as a latent variable: Discrete generative models for sentence compression. arXiv preprint arXiv:1609.07317.
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).
  • Mishra, A., & Bhattacharyya, P. (2018). Cognitively Inspired Natural Language Processing: An Investigation Based on Eye-tracking. Springer.
  • Mishra, A., & Bhattacharyya, P. (2018). Estimating Annotation Complexities of Text Using Gaze and Textual Information. In Cognitively Inspired Natural Language Processing (pp. 49-76). Springer, Singapore.
  • Moryossef, A., Goldberg, Y., & Dagan, I. (2019). Step-by-step: Separating planning from realization in neural data-to-text generation. arXiv preprint arXiv:1904.03396.
  • Mueller, J., Gifford, D., & Jaakkola, T. (2017, August). Sequence to better sequence: continuous revision of combinatorial structures. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 2536-2544). JMLR. org.
  • Munigala, V., Mishra, A., Tamilselvam, S. G., Khare, S., Dasgupta, R., & Sankaran, A. (2018, April). Persuaide! An adaptive persuasive text generation system for fashion domain. In Companion Proceedings of the The Web Conference 2018 (pp. 335-342). International World Wide Web Conferences Steering Committee.

219 of 223

References

  • Mutton, A., Dras, M., Wan, S., & Dale, R. (2007, June). GLEU: Automatic evaluation of sentence-level fluency. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (pp. 344-351).
  • Naber, D. (2003). A rule-based style and grammar checker (pp. 5-7). GRIN Verlag.
  • Nallapati, R., Zhou, B., Gulcehre, C., & Xiang, B. (2016). Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023.
  • Nema, P., & Khapra, M. M. (2018). Towards a better metric for evaluating question generation systems. arXiv preprint arXiv:1808.10192.
  • Nema, P., Shetty, S., Jain, P., Laha, A., Sankaranarayanan, K., & Khapra, M. M. (2018). Generating descriptions from structured data using a bifocal attention mechanism and gated orthogonalization. arXiv preprint arXiv:1804.07789.
  • Nisioi, S., Štajner, S., Ponzetto, S. P., & Dinu, L. P. (2017, July). Exploring neural text simplification models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 85-91).

220 of 223

References

  • Niu, T., & Bansal, M. (2018). Polite dialogue generation without parallel data. Transactions of the Association of Computational Linguistics, 6, 373-389.
  • Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). BLEU: a method for automatic evaluation of machine translation. ACL 2002.
  • Paulus, R., Xiong, C., & Socher, R. (2017). A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.
  • Pennington, J., Socher, R., & Manning, C. (2014, October). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).
  • Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
  • Prakash, A., Hasan, S. A., Lee, K., Datla, V., Qadir, A., Liu, J., & Farri, O. (2016). Neural paraphrase generation with stacked residual LSTM networks. arXiv preprint arXiv:1610.03098.

221 of 223

References

  • Puduppully, R., Dong, L., & Lapata, M. (2018). Data-to-text generation with content selection and planning. arXiv preprint arXiv:1809.00582.
  • Ratnaparkhi, A., Reynar, J., & Roukos, S. (1994). A maximum entropy model for prepositional phrase attachment. In HUMAN LANGUAGE TECHNOLOGY: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994.
  • Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
  • See, A., Liu, P. J., & Manning, C. D. (2017). Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368.
  • Sha, L., Mou, L., Liu, T., Poupart, P., Li, S., Chang, B., & Sui, Z. (2018, April). Order-planning neural text generation from structured data. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Sheika, F. A., & Inkpen, D. (2012). Learning to classify documents according to formal and informal style. Linguistic Issues in Language Technology, 8(1), 1-29.

222 of 223

References

  • Sheikha, F. A., & Inkpen, D. (2011, September). Generation of formal and informal sentences. In Proceedings of the 13th European Workshop on Natural Language Generation (pp. 187-193). Association for Computational Linguistics.
  • Shen, S., Fried, D., Andreas, J., & Klein, D. (2019). Pragmatically Informative Text Generation. arXiv preprint arXiv:1904.01301.
  • Shrivastava, D., Mishra, A., & Sankaranarayanan, K. (2018). Modeling Topical Coherence in Discourse without Supervision. arXiv preprint arXiv:1809.00410.
  • Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006, August). A study of translation edit rate with targeted human annotation. In Proceedings of association for machine translation in the Americas (Vol. 200, No. 6).
  • Specia, L., Turchi, M., Cancedda, N., Dymetman, M., & Cristianini, N. (2009, May). Estimating the sentence-level quality of machine translation systems. In 13th Conference of the European Association for Machine Translation (pp. 28-37).
  • Shen, T., Lei, T., Barzilay, R., & Jaakkola, T. (2017). Style transfer from non-parallel text by cross-alignment. In Advances in Neural Information Processing Systems (NeurIPS 2017).

223 of 223

References

  • Trisedya, B. D., Qi, J., Zhang, R., & Wang, W. (2018). GTR-LSTM: A triple encoder for sentence generation from RDF data. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 1627-1637).
  • Wiseman, S., Shieber, S. M., & Rush, A. M. (2017). Challenges in data-to-document generation. arXiv preprint arXiv:1707.08052.
  • Wubben, S., Van Den Bosch, A., & Krahmer, E. (2010, July). Paraphrase generation as monolingual translation: Data and evaluation. In Proceedings of the 6th International Natural Language Generation Conference (pp. 203-207). Association for Computational Linguistics.
  • Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., ... & Bengio, Y. (2015, June). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (pp. 2048-2057).
  • Zhang, D., Yuan, J., Wang, X., & Foster, A. (2018). Probabilistic verb selection for data-to-text generation. Transactions of the Association for Computational Linguistics, 6, 511-527.
  • Zhou, Q., Yang, N., Wei, F., & Zhou, M. (2018, April). Sequential copying networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Zhu, Y., Wan, J., Zhou, Z., Chen, L., Qiu, L., Zhang, W., ... & Yu, Y. (2019). Triple-to-Text: Converting RDF Triples into High-Quality Natural Languages via Optimizing an Inverse KL Divergence. arXiv preprint arXiv:1906.01965.