1 of 223

Storytelling from Structured Data and Knowledge Graphs

ANIRBAN LAHA

PARAG JAIN

ABHIJIT MISHRA

KARTHIK SANKARANARAYANAN

SARAVANAN KRISHNAN

2 of 223

Natural Language Query in Weather Domain: “How is the weather this weekend in Atlanta?”

[Figure: the query is parsed (Query Parser) into SQL over a relational weather database built on a weather ontology; the tabular results are passed to a language-generation (NLG) component]

“Slight chance of showers on Saturday morning with a high of 31 degrees. Sunny day and clear skies all day Sunday.”

3 of 223

The Nikon D5300 DSLR Camera, which comes in black color, features 24.2 megapixels and 3X optical zoom. It also has image stabilization and self-timer capabilities. The package includes a lens and Lithium cell batteries.

Product Information

Product Description

4 of 223

Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.

Born: Matthew Paige Damon, October 8, 1970 | Residence: U.S. | Occupation: Actor, filmmaker, screenwriter

Input

Output

5 of 223

Knowledge Graph summarization

General graph summary:

Hugo Weaving acted in the movie Cloud Atlas (as Bill Smoke) along with Tom Hanks (as Zachry), and in the movie The Matrix (as Agent Smith). Both movies were directed by Lana Wachowski.

Query: Show me movies directed by Lana and their lead actors.

Focus Lana

Entity-focused summary (Focus: Lana):

Lana Wachowski, born in 1965, is the director of the movies Cloud Atlas (released in 2012) and The Matrix (released in 1999).

6 of 223

Text-to-Text NLG

  • Summarization
  • Headline Generation: “Attorney from Alton files a lawsuit against himself by mistake”
  • Image Captioning
  • Paraphrasing
  • Machine Translation: “L'avocat d'Alton se poursuit par accident”
  • Question Generation: “When did the Lakin firm file a complaint against Alliance Mortgage?”
  • Question Answering: Q: What are the consequences? A: Emert Wyss had hired four law firms and now all of them are after his money.

7 of 223

Natural Language Generation

  • Branch of Computational Linguistics that deals with generating natural language text from unstructured / structured, textual / non-textual data. (Reiter and Dale, 2000)
    • Focuses on computer systems
    • Produces understandable texts (in English or other human languages)

Gatt et al., 2017

Multimodal

Multilingual

8 of 223

Data-to-text NLG

  • INPUT: Non-linguistic input
  • OUTPUT: Documents, Reports, Explanations, Help messages, and other kinds of text.
  • Knowledge Required: (1) Language, and (2) Application domain.

{
  "answer": {
    "premium": {"$": 502.83},
    "initial_payment": {"$": 100},
    "monthly_payment": {"$": 85.57}
  }
}

The child and his mother:

A curious child asked his mother: “Mommy, why are some of your hairs turning grey?”

The mother tried to use this occasion to teach her child: “It is because of you, dear. Every bad action of yours will turn one of my hairs grey!”

The child replied innocently: “Now I know why grandmother has only grey hairs on her head.”

Unstructured Text

Table

Graph

XML

JSON

9 of 223

Data-to-text NLG: A 4D perspective

  • Generation Facets: Sentiment, Emotion, Complexity, Formalness, Tone
  • Paradigms: Heuristic, Statistical, Neural, Hybrid
  • Practical (Domain): Finance, Healthcare, Retail
  • Tasks: Summarization, Insightful Narratives, Report Generation, Interaction & Dialog, Tabular Data Comprehension, Open-ended vs closed generation
  • Input type: Structured, Unstructured (textual); Image, Video; Cognitive signals (EEG, Eye tracking, MEG)

Concept: CS626, IIT Bombay

10 of 223

What this tutorial is NOT about?

  • Generation of Structured Representation like AMR/RDF/KB or Code

  • Creative Content Generation or Story/Poetry Writing

  • Cross lingual settings, transfer learning, k-shot learning, domain adaptation

  • Reasoning for content planning, conversational settings, NLU

11 of 223

What this tutorial is NOT about?

  • Text-to-text generation
    • Machine Translation
    • Text summarization
    • Simplification of Complex texts
    • Automatic spelling, grammar and text correction
    • Paraphrasing of sentences
    • Automatic generation of questions from text paragraphs

  • Multimodal-to-text generation
    • Speech recognition
    • Image Captioning
    • Visual Storytelling
    • Video description and summary generation
    • Natural Language explanations generation from Deep Neural Networks.

12 of 223

Tutorial Roadmap

PART 1:

  • Introduction to NLG from Structured data and Knowledge Bases
  • Traditional NLG
  • Statistical and Neural Methods
  • Evaluation Methods for NLG

PART 2:

  • Hybrid Methods
  • Role of Semantics and Pragmatics
  • Open Problems and Future Directions
  • Conclusion and Future Remarks

13 of 223

PART - 1

Traditional NLG

Statistical and Neural Methods

Evaluation Methods for NLG

14 of 223

Traditional NLG

Rule based NLG

Template based NLG

Current Approaches

Industry Solutions

Shortcomings

15 of 223

Rule based Generation – When and When Not

  • When the phenomenon is understood AND expressed, rules are the way to go

  • Do not learn when you know!!

  • When the phenomenon “seems arbitrary” at the current state of knowledge, DATA is the only handle!
    • Why do we say Many Thanks and not Several Thanks?
    • Writing a rule for this is tedious, and the rule is fragile

  • Rely on machine learning to tease truth out of data.

Source: CS626 NLP, IIT Bombay

16 of 223

Table Description in Natural Language Text: High Level Rules

Name: Albert Einstein | Birth City: Ulm, Germany

Enrichment (verb phrase from the column header): “was born in”

Subject + Verb Phrase + Object → “Albert Einstein was born in Ulm, Germany”

Rules:

  • Consider one column as “subject” and the other column as “object”
  • Use column header and extract verb phrase VP by looking up in a lexicon
  • Realized sentences: S + VP + O

Name: Albert Einstein | Nationality: German

Albert Einstein’s nationality is German ✅

Albert Einstein is from Germany ✅

Exception: no verb phrase exists for “Nationality” (nationalized??)

Albert Einstein …….. Germany ❌
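The high-level rules above can be sketched in a few lines. The lexicon below is a hypothetical toy (its entries and the function name are illustrative, not from any real system), and the Nationality case shows exactly where the rule breaks down.

```python
# Toy sketch of the S + VP + O rule: a hand-built lexicon maps column
# headers to verb phrases (entries here are made up for illustration).
LEXICON = {
    "Birth City": "was born in",
    "Death City": "died in",
}

def verbalize(subject, header, value):
    """Realize 'Subject + VP + Object' if the header has a verb phrase."""
    vp = LEXICON.get(header)
    if vp is None:
        # Exception: no natural verb exists for headers like "Nationality"
        return None
    return f"{subject} {vp} {value}"

print(verbalize("Albert Einstein", "Birth City", "Ulm, Germany"))
# -> Albert Einstein was born in Ulm, Germany
print(verbalize("Albert Einstein", "Nationality", "German"))
# -> None (the rule fails; there is no verb like "nationalized")
```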

17 of 223

Step back…

18 of 223

[Pipeline: Communicative Goal + Knowledge Source → Content Planning → Micro planning → Realization → Text]

Natural Language Generation Pipeline

Content Planning
  • Content Selection
  • Content Ordering

Sentence Planning
  • Sentence aggregation
  • Lexicalization
  • Referring expression generation

Linguistic Realization
  • Lexical rules for realization
  • Syntax / Grammar rules

Communicative Goal:
  1. Target audience
  2. Domain
  3. Task
  • Example: Describe, Compare

Reiter et al., 2000

 

19 of 223

[Pipeline: Communicative Goal + Knowledge Source → Content Planning → Micro planning → Realization → Text]

Natural Language Generation Pipeline

  1. Target audience: Web
  2. Domain: Biography
  3. Task: Describe

Reiter et al., 2000

Output: “Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.”

20 of 223

Natural Language Generation Pipeline: Content Planning

  1. Target audience: Web
  2. Domain: Biography
  3. Task: Describe

Selected content:
  1. Name: Matthew Paige Damon
  2. Born: October 8, 1970
  3. Residence: Pacific Palisades, California, United States
  4. Occupation: Actor, filmmaker, screenwriter

At this stage we know what we want to talk about… but still have no idea how.

Content determination and selection

21 of 223

Content Planning

Natural Language Generation Pipeline

  1. Name: Matthew Paige Damon
  2. Born: October 8, 1970
  3. Residence: Pacific Palisades, California, United States
  4. Occupation: Actor, filmmaker, screenwriter

Micro planning

  1. Matthew Paige Damon born in October 8, 1970
  2. Matthew Paige Damon residence Pacific Palisades, California, United States
  3. Matthew Paige Damon is Actor. Matthew Paige Damon is filmmaker. Matthew Paige Damon is screenwriter.

Fakeness alert: for illustration the examples above look like sentences, but in reality everything at this stage is a data structure passed from one layer to another. There are no sentences yet!

  1. Matthew Paige Damon born in October 8, 1970 and residence of America. OR Matthew Paige Damon born in October 8, 1970 is an American.
  2. He is an Actor, filmmaker and screenwriter.

Sentence aggregation, Lexicalization and referring expression
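The microplanning step above can be sketched as a tiny aggregation-plus-referring-expression routine. The function below is an illustrative toy, not an actual microplanner: it merges facts sharing a subject and predicate into one sentence and replaces the repeated subject with a pronoun.

```python
# Minimal sketch of sentence aggregation and referring-expression
# generation: objects sharing a subject and predicate are merged into a
# coordinated noun phrase, and the subject is referred to by a pronoun.
def aggregate(subject, predicate, objects, pronoun="He"):
    if len(objects) == 1:
        noun_phrase = objects[0]
    else:
        noun_phrase = ", ".join(objects[:-1]) + " and " + objects[-1]
    # Referring expression: use the pronoun after the first mention.
    return f"{pronoun} {predicate} {noun_phrase}."

print(aggregate("Matthew Paige Damon", "is an",
                ["actor", "filmmaker", "screenwriter"]))
# -> He is an actor, filmmaker and screenwriter.
```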

22 of 223

Content Planning

Natural Language Generation Pipeline

Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.

Micro planning

Matthew Paige Damon(N) born in(VP, TENSE: PAST) October 8, 1970 … American(Adj). … [Actor, filmmaker, screenwriter]

Realization

Realizer

23 of 223

Extremely Simple Template-driven NLG Architecture: Insurance case

Output

Template Manager

Intent – Template mapping

Template Repository

Query: How much should I pay ?

Info 1 (intent) : query(amount(payment)).

Info 2: {
  "result": {
    "premium": {"$": 502.83},
    "initial_payment": {"$": 100},
    "monthly_payment": {"$": 85.57}
  }
}

Query Intent ⬄ Template ID

query(amount(payment)) ⬄ all_payment

Template ID : all_payment

NL text : You can choose to pay an initial payment of $ {InitPay} and a monthly payment of $ {MonthPay}, or you can pay a one-time premium of $ {prm}.

Parameters : InitPay: 100, MonthPay: 85.57, prm: 502.83

You can choose to pay an initial payment of $100 and a monthly payment of $85.57, or you can pay a one-time premium of $502.83.

If 90% of your customers are asking the same 10 questions, you can quickly build a template-driven system with a human as fallback.

Otherwise, template-based techniques quickly become difficult to manage.
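The template-manager flow on this slide can be sketched end to end: intent → template ID → template text filled with parameters pulled from the JSON result. The dictionary names below are illustrative; only the template text, intent string, and JSON come from the slide.

```python
# Sketch of a template-driven NLG system: map a query intent to a
# template ID, extract parameters from the JSON result, fill the slots.
import json

TEMPLATES = {
    "all_payment": ("You can choose to pay an initial payment of "
                    "${InitPay} and a monthly payment of ${MonthPay}, "
                    "or you can pay a one-time premium of ${prm}."),
}
INTENT_TO_TEMPLATE = {"query(amount(payment))": "all_payment"}

def render(intent, info_json):
    result = json.loads(info_json)["result"]
    params = {
        "InitPay": result["initial_payment"]["$"],
        "MonthPay": result["monthly_payment"]["$"],
        "prm": result["premium"]["$"],
    }
    template = TEMPLATES[INTENT_TO_TEMPLATE[intent]]
    return template.format(**params)

info = ('{"result": {"premium": {"$": 502.83}, '
        '"initial_payment": {"$": 100}, "monthly_payment": {"$": 85.57}}}')
print(render("query(amount(payment))", info))
# -> You can choose to pay an initial payment of $100 and a monthly
#    payment of $85.57, or you can pay a one-time premium of $502.83.
```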

24 of 223

Rule-based NLG System: SimpleNLG

[data to realize ] : [Related information]

  • Teacher : subject
    • The : determiner
  • Deliver : verb, past tense
  • Lecture : object
  • While : complementizer
    • He : subject
    • Be : verb, past tense
    • Class : prepositional object
      • In : preposition

Realization

Engine

The teacher delivered a lecture while he was in class.

25 of 223

SimpleNLG - Usage

import simplenlg.framework.*;
import simplenlg.lexicon.*;
import simplenlg.realiser.english.*;
import simplenlg.phrasespec.*;
import simplenlg.features.*;

// Setup (needed before the snippet on the slide will run):
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);

SPhraseSpec sentObj = new SPhraseSpec(nlgFactory);
sentObj.setSubject("John");
sentObj.setVerb("write");
sentObj.setObject("story");
String sentence = realiser.realiseSentence(sentObj);
System.out.println(sentence);   // “John writes story”

sentObj.getVerbPhrase().setFeature(Feature.TENSE, Tense.PAST);
sentObj.getVerbPhrase().setFeature(Feature.NEGATED, true);
sentObj.getVerbPhrase().setFeature(Feature.PASSIVE, true);
sentence = realiser.realiseSentence(sentObj);
System.out.println(sentence);   // “Story was not written by John”

sentObj.setFeature(Feature.INTERROGATIVE_TYPE, InterrogativeType.YES_NO);
sentence = realiser.realiseSentence(sentObj);
System.out.println(sentence);   // “Does John write story?”

26 of 223

Representative Public Datasets

  • ROBOCUP, for sportscasting (Chen and Mooney, 2008)
  • SUMTIME, for technical weather forecast generation (Reiter et al., 2005)
  • WEATHERGOV, for common weather forecast generation (Liang et al., 2009)
  • WikiBio (Lebret et al., 2016)
  • ROTOWIRE and SBNATION (Wiseman, Shieber, and Rush, 2017)
  • WEBNLG dataset (Gardent et al., 2017)
  • WikiTableText (Bao et al., 2018)
    • Describing a table region – typically restricted to rows.
  • WikiTablePara (Laha et al., 2018)
    • Created from the WikiTable dataset
    • 171 tables with comprehensive descriptions.

27 of 223

Heuristic driven NLG Systems

  • GenI (http://kowey.github.io/GenI):
    • Surface realizer based on Tree Adjoining Grammar and Minimal Recursion Semantics
  • RealPro
    • Input: Deep syntactic structure (like a parse tree) without function words
    • Output: Natural Language text
    • Introduces function words
  • GoPhi
    • Abstract Meaning Representation (AMR) to Natural Language Text
    • The AMR graph is converted to a tree of constituents, which is transformed into English
  • KPML
    • Rich source of grammatical structures and realization rules
    • Multilingual resources for creating and maintaining grammar rules
  • RNNLG (https://shawnwun.github.io/talks/DL4NLG_20160906.pdf)
    • Statistical / Neural sentence plan generation + Statistical sentence plan ranking + Neural Surface realization
    • Spoken Dialog Domain

Example (RealPro deep-syntactic input):

  stay
    I  John     [class: proper-noun]
    II New-york [class: proper-noun]

→ “John stays in New-york”

https://aclweb.org/aclwiki/Downloadable_NLG_systems

28 of 223

Other Industrial NLG Systems

  • Wordsmith from Automated insights: (https://automatedinsights.com)
    • Enables users to turn data into text using dynamic templates
  • Arria NLG Studio (https://www.arria.com/)
    • Powered by the proprietary Articulate Text Language (ATL), this tool enables rule-based linguistic capabilities for natural language generation
    • Heavily rule based and minimal NLG
  • “Quill” by Narrative Science (https://narrativescience.com/)
    • Considers user intent (e.g., comparison of a metric across two columns of a table), then figures out what analytics to perform on the data (many of them rule-based, heuristic-based, or simple sum/avg/percentage statistics)
    • Possibly applies a minimal classifier to figure out intent, but the generation step is heavily rule-based even though they do not apply templates.

29 of 223

Shortcomings of Traditional Approaches

  • Rule-based systems/templates are mostly inflexible and not scalable

  • Non-transferable rules pertaining to domain-specific requirements / choices of language artefacts (tone, sentiment, syntax, complexity)

  • Typically do not leverage web scale data / freely available knowledge bases (like DBPedia, Yago, Freebase)

30 of 223

Statistical and Neural Methods

Pre-neural Statistical

Neural Methods

31 of 223

Simplified Steps

We will continue explaining recent NLG systems from this pipeline perspective

Content Selection

Content Planning

Surface Realization

32 of 223

Pre-neural

33 of 223

Moving away from Templates…..

  • Templates are inflexible and not scalable to different use-cases.
  • However, templates do not require much semantic understanding or decision making.
  • Can we get best of both worlds?
    • Have a good meaning representation of input data.
    • Move the linguistic decision-making to the surface realization step.
    • This makes surface realization more flexible than templates.

  • The surface realization (generation) needs additional knowledge
    • Knowledge from corpus perhaps? [Langkilde and Knight, 1998]

  • Precompute N-gram (word-pair) frequencies.

34 of 223

Flexible Surface Realization

  • Input Meaning Representation to the generator.
    • AMR captures all things to be said.

  • The generator converts the AMR to word lattice.
    • Word lattice defines transition between states.
    • The state transitions are labeled by words.
    • The conversion uses pre-defined grammar rules.
    • The word lattice captures all things to be said.

  • Statistical Ranker selects the best path in word lattice as output.
    • N-gram frequencies are computed from monolingual corpora.
    • The pre-computed N-gram frequencies are used to score the paths in the lattice.
    • The sequence of words corresponding to the best path is the final output string.

[Langkilde and Knight, 1998]
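The lattice-plus-ranker idea above can be illustrated with a toy example in the spirit of Langkilde and Knight (1998): enumerate the paths through a small hand-made word lattice and score each with precomputed bigram counts. The lattice and counts below are made-up values, not from a real corpus, and echo the earlier "Many Thanks" example.

```python
# Toy statistical ranking over a word lattice: each slot lists the
# alternative words the grammar allows at that position; paths are
# scored by (smoothed) log bigram counts from a hypothetical corpus.
import itertools
import math

lattice = [["many", "several"], ["thanks"]]

bigram_count = {("many", "thanks"): 120, ("several", "thanks"): 2}

def score(path):
    # Sum of log bigram counts, with add-one smoothing for unseen pairs.
    return sum(math.log(bigram_count.get(bg, 0) + 1)
               for bg in zip(path, path[1:]))

paths = list(itertools.product(*lattice))
best = max(paths, key=score)
print(" ".join(best))  # -> many thanks
```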

35 of 223

WeatherGov Table Format

[Figure: a WeatherGov table, showing records, fields, and record types]

36 of 223

Generative Modeling Approach

  • Notation:
    • Text to be generated: w = w1, …, wT
    • The tabular records (the world state): s
    • Sequence of chosen records: r
    • Sequence of chosen fields in record ri: fi
  • Modeling Objective: model p(w | s) generatively, marginalizing over the latent record and field choices.
[Liang et al, 2009]

37 of 223

Generative Modeling Approach (2)

[Liang et al, 2009]

38 of 223

Generative Modeling Approach (3)

  • Record Choice Model: Markov Model on record types.

  • Field Choice Model: Markov Model on chosen fields for each chosen record

  • Word Choice Model: Generate a sequence of words (uniform distribution) for each chosen field

[Liang et al, 2009]

Coherence

Saliency
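The record-choice Markov model above can be made concrete with a toy transition table over record types. All record-type names and probabilities below are made up for illustration; greedily following the highest-probability transition gives the most coherent record sequence.

```python
# Toy record-choice Markov model: a transition table over record types
# (probabilities are illustrative). Greedily reading off the argmax
# transition at each step captures the "coherence" idea in the model.
transitions = {
    "START":       {"temperature": 0.6, "sky_cover": 0.4},
    "temperature": {"sky_cover": 0.7, "wind_speed": 0.3},
    "sky_cover":   {"wind_speed": 0.8, "temperature": 0.2},
}

def most_likely_sequence(start="START", steps=3):
    seq, state = [], start
    for _ in range(steps):
        if state not in transitions:
            break
        state = max(transitions[state], key=transitions[state].get)
        seq.append(state)
    return seq

print(most_likely_sequence())
# -> ['temperature', 'sky_cover', 'wind_speed']
```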

39 of 223

Generative Modeling Approach (4)

  • Training Objective:

  • Inference :
    • Simple way to use the generative process to generate output text (Not very effective).
    • Use dynamic programming style decoding algorithm [Kim and Mooney, 2010].

  • This approach can also be called a hierarchical hidden semi-Markov model (h-HSMM).
  • Does not involve Content Planning in modeling!!!!

[Liang et al, 2009]

[Kim and Mooney, 2010]

40 of 223

End-to-end Probabilistic Approach

  • Unified framework to perform content selection and surface realization.
  • Hierarchical Approach [like before!!!!]
    1. Choosing records from database (macro content selection)
    2. Choosing a subset of fields for a record (micro content selection)
    3. Choosing a suitable template for the selected fields (surface realization)

  • Generate a sequence (conditional probability model):
    • r1, F1, T1, r2, F2, T2, …, STOP.

  • Generation Process:

[Angeli et al, 2010]

41 of 223

End-to-end Probabilistic Approach (2)

  • The sequence of decision steps (example):
    • r1, F1, T1, r2, F2, T2, …, STOP.

[Angeli et al, 2010]

42 of 223

End-to-end Probabilistic Approach (2)

[Angeli et al, 2010]

Activation of Rules

Rule definitions

43 of 223

End-to-end Probabilistic Approach (3)

  • Generation Sequence:
    • r1, F1, T1, r2, F2, T2, …, STOP.
    • Denote it as decision sequence :

  • Probability Model to Train:

  • Decoding:
  • Greedy Fashion:

    • Sampling Strategy:

    • Viterbi decoding Algorithm :

Simple, but effective

More diverse outputs

More computation!!

[Angeli et al, 2010]
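The contrast between greedy decoding and sampling over the sequence of decisions can be sketched with toy per-step distributions (the record/field/template labels and probabilities below are illustrative, not from the paper).

```python
# Toy decision sequence r1, F1, T1, ...: at each step we have a
# distribution over choices; greedy takes the argmax, sampling draws
# from the distribution (more diverse outputs, less predictable).
import random

steps = [
    {"r1": 0.7, "r2": 0.3},   # record choice
    {"F1": 0.6, "F2": 0.4},   # field choice
    {"T1": 0.9, "T2": 0.1},   # template choice
]

def greedy(steps):
    # Simple, but effective: pick the most probable decision each step.
    return [max(d, key=d.get) for d in steps]

def sample(steps, rng):
    # More diverse outputs: draw each decision from its distribution.
    return [rng.choices(list(d), weights=list(d.values()))[0]
            for d in steps]

print(greedy(steps))                    # -> ['r1', 'F1', 'T1']
print(sample(steps, random.Random(0)))  # varies with the seed
```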

44 of 223

End-to-end Probabilistic Approach (4)

  • Shortcomings of this approach:
    1. Need to specify rules at every decision level.
    2. The rules need to be specified separately for every domain.
    3. Surface realization is again dependent on templates – Not flexible!!!

  • One of the earliest end-to-end approaches
    • Learnt both content selection and surface realization together.
    • Content planning was not done (handled by templates).

[Angeli et al, 2010]

45 of 223

Using Probabilistic Context-Free Grammars

[Konstas and Lapata 2012]

[Konstas and Lapata 2013]

[Figure: the WeatherGov table again: record types, records, and fields]

The grammar captures the structure of the table

Note the difference from parsing

46 of 223

Using Probabilistic Context-Free Grammars (2)

  • The defined grammar can be equivalently represented as a hypergraph.
  • For a predefined structure, the hypergraph representation for the grammar is constant.
    • This representation helps in computing the probabilities for the grammar rules.
    • Inside-outside Algorithm is used for the computation.
  • Generation involves finding the best derivation path in the hypergraph
    • Below, one such derivation path is shown for the string “Sunny with a low around 30”.
    • Viterbi Algorithm is used for finding best path.

[Konstas and Lapata 2012]

[Konstas and Lapata 2013]
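As a stand-in for the Viterbi search over the hypergraph, the idea of finding the best derivation can be illustrated on a toy probabilistic grammar: expanding each nonterminal with its most probable rule yields the highest-probability derivation when the grammar is acyclic. The rules and probabilities below are illustrative, not from the paper.

```python
# Toy probabilistic grammar over the table structure. Greedily expanding
# each nonterminal with its most probable rule gives the best derivation
# for this acyclic grammar (a simplified stand-in for Viterbi search).
GRAMMAR = {
    "S":    [(0.8, ["SKY", "with", "TEMP"]), (0.2, ["TEMP"])],
    "SKY":  [(0.6, ["sunny"]), (0.4, ["cloudy"])],
    "TEMP": [(1.0, ["a", "low", "around", "30"])],
}

def best_derivation(symbol):
    if symbol not in GRAMMAR:           # terminal word
        return [symbol]
    _, rhs = max(GRAMMAR[symbol])       # most probable rule
    return [w for child in rhs for w in best_derivation(child)]

print(" ".join(best_derivation("S")))
# -> sunny with a low around 30
```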

47 of 223

Using Probabilistic Context-Free Grammars (3)

  • Decoding Step for database d:

  • How to incorporate Language Model?
    • Dynamic programming based Algorithm [Huang and Chiang 2007].

  • This approach also performs Content Selection and Surface Realization end-to-end.
    • Performs them jointly, in contrast to the sequence of local decisions in [Angeli et al., 2010].

  • Still no Content/Document Planning!!!!

(The decoding objective combines a likelihood term, derived from the hypergraph, with a language model.)

[Konstas and Lapata 2012]

[Konstas and Lapata 2013]

48 of 223

Using Probabilistic Context-Free Grammars (4)

  • Incorporating document planning in an end-to-end manner.
  • For every table/database, the model first decides on a global document plan.
    • Which record types belong to each sentence (or phrase).
    • How these sentences (or phrases) should be ordered.
    • Content Selection and Surface Realization follow.

  • Two kinds of methodologies for document planning:
    • Planning with Record Sequences – The document consists of sentences delimited by periods; each sentence can be split into a sequence of record types.
    • Planning with Rhetorical Structure Theory [Mann and Thompson, 1988] – Deals with how text spans are hierarchically organized.

  • Grammar rules defined for both the above methodologies.

[Konstas and Lapata 2013]

49 of 223

Neural

50 of 223

Sequence to sequence models

Bahdanau et al., 2014

Xu et al., 2015

Rush et al.. 2015

[Figure: sequence-to-sequence model: word embeddings are encoded into encoder states; decoder states produce the output sequence]

51 of 223

Sequence to sequence models

Bahdanau et al., 2014

Xu et al., 2015

Rush et al.. 2015

[Figure: sequence-to-sequence model with an attention mechanism: at each decoding step the decoder attends over the encoder states to compute a context vector]
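The attention mechanism can be sketched numerically: score the decoder state against every encoder state, softmax-normalize the scores, and take the weighted sum of encoder states as the context vector. Dimensions and the dot-product scoring function below are illustrative choices.

```python
# Numerical sketch of attention: dot-product scores, softmax weights,
# and a context vector as the weighted sum of encoder states.
import numpy as np

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 4))   # 6 input positions, dim 4
decoder_state = rng.normal(size=4)         # current decoder state

scores = encoder_states @ decoder_state          # one score per position
weights = np.exp(scores) / np.exp(scores).sum()  # softmax normalization
context = weights @ encoder_states               # context vector, dim 4

print(weights)   # a distribution over the 6 input positions
print(context)
```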

 

52 of 223

How to use Seq2Seq for structured data?

[Figure: the table linearized into the token sequence: Matt, Damon, Oct, 8, 1970, U.S., actor, filmmaker, screenwriter]

53 of 223

How to use Seq2Seq for structured data?

[Figure: the linearized table is fed to an encoder; a decoder with attention generates “Matt Damon born on Oct 8 is an American actor…”]

54 of 223

Sequence of records…

[Figure: the input represented as a sequence of records]

55 of 223

Sequence of records…

Mei et al. 2016

[Figure: encoder-decoder with attention over the sequence of records]

  1. All records are not important
  2. Multiple records make it difficult to learn alignment

56 of 223

Sequence of records…

Mei et al. 2016

  1. All records are not important
  2. Multiple records make it difficult to learn alignment

[Figure: encoder-decoder where a refiner module computes a time-independent prior attention over the records before the attention mechanism]

Refiner: helps attention to fix on important records and not be distracted by non-salient records

57 of 223


  • Basic Encode-Attend-Decode Model

  • Too generic!

  • Unable to exploit structure

[Figure: the same linearized-table Encode-Attend-Decode model]

58 of 223

How to use structural information while encoding?

[Figure: table structure: record type, records/fields, attributes, values]

59 of 223

Capturing hierarchical structure

Jain et al. 2018

[Figure: an attribute encoder composed with a record encoder]

  • How do you encode hierarchical information present in tabular data?
  • Can we reduce the complexity of a hierarchical encoder?
  • How to encode continuous, categorical and time-range values present in the dataset?
  • Large Vocabulary?
  • Dynamic table schema?

60 of 223

It’s difficult to remember floccinaucinihilipilification, can I copy?

  • Seq2Seq is good at producing fluent outputs, but cannot handle rare words effectively.
  • A copy mechanism enables the model to copy words from the input instead of generating them (generating rare words is hard; copying can also handle OOV words to some extent).

Nallapati et al. 2016

Miao et al, 2016

Gu et al, 2016

See et al, 2017

61 of 223

Copy actions

Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.

Input

Output

62 of 223

Copy mechanism

Nallapati et al. 2016

Miao et al, 2016

Gu et al, 2016

See et al, 2017

[Figure: encoder-decoder with attention; the attention distribution over the input sequence yields a context vector]

At each time step t, decide: generate or copy.
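The soft generate-or-copy decision can be sketched in the style of See et al. (2017): the final distribution mixes the generator's vocabulary distribution with the attention distribution over input tokens. All numbers and token lists below are toy values.

```python
# Sketch of a pointer-generator-style mixture: with probability p_gen
# the word is generated from the vocabulary distribution; with
# probability 1 - p_gen it is copied from the attended input tokens.
import numpy as np

vocab = ["born", "actor", "<unk>"]
inputs = ["Damon", "actor"]                 # source tokens

p_vocab = np.array([0.6, 0.3, 0.1])         # generator distribution
attention = np.array([0.8, 0.2])            # over the input tokens
p_gen = 0.7                                 # probability of generating

def final_prob(word):
    gen = p_gen * (p_vocab[vocab.index(word)] if word in vocab else 0.0)
    copy = (1 - p_gen) * sum(a for tok, a in zip(inputs, attention)
                             if tok == word)
    return gen + copy

print(final_prob("Damon"))   # only copyable: (1 - 0.7) * 0.8
print(final_prob("actor"))   # both generated and copied
```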

63 of 223

Typical approaches for incorporating copy

Copy
  • Sequence-level copy (Zhou et al., 2018 – explicit supervision)
  • Word-level copy
    • Shared SoftMax over the output and input vocabularies: induces competition between generate & copy; joint model; no explicit supervision (Gu et al., 2016)
    • Conditional copy/generate switch with a mixture distribution: no explicit supervision (See et al., 2017) or explicit supervision (Gulcehre et al., 2016)

Other notable work: Nallapati et al., 2016; Miao et al., 2016; Nema et al., 2018

64 of 223

  • How do you encode hierarchical information present in tabular data?
  • Can we reduce the complexity of a hierarchical encoder?
  • How to encode continuous, categorical and time-range values present in the dataset?
  • Large Vocabulary? Rare words.
  • Dynamic table schema?

65 of 223

Conditional LM with structured input

Lebret et al. 2016

Introduced WikiBio dataset

 


76 of 223

Conditional LM with structured input

Lebret et al. 2016

  • Lebret et al. proposed a statistical n-gram LM with local and global conditioning
  • The feed-forward network does not capture long-range dependencies among the field/record names and values present in the sequence
  • Attention, copying, and content planning can be improved

77 of 223

Hierarchical structure aware encoding

[Figure: micro attention (α) over the values within each field K1 … KM; macro attention (β) over field representations [f(Ki); field name]; the two are fused and fed to the decoder, which generates “Matt Damon born on Oct 8 is an American actor…”]

Nema et al. 2018

Liu et al. 2018

78 of 223


Matthew Paige Damon (born October 8, 1970) is an American actor, film producer, and screenwriter.

(The sentence above is generated left to right, one phrase at a time.)

  • Input has a natural hierarchy
    • Table 🡪 Fields 🡪 Values

  • Once you visit a field you tend to stay on it for a while

  • Once you exit a field you never look back

Stay on and never look back

79 of 223

FORGET GATE: decides how long to stay on a field

Context vector seen at last time-step

  • Encodes information about previously seen field.

  • New context vector:


Modeling Stay-On Behaviour

Nema et al. 2018

Introduced German & French version of WikiBio
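The stay-on idea can be illustrated numerically: a scalar forget gate decides how much of the current field's context vector to retain at each step. This is a deliberate simplification of the Nema et al. (2018) gating, with made-up values in place of learned parameters.

```python
# Numerical sketch of the stay-on forget gate: gate near 1 keeps the
# previous field's context (stay on the field); gate near 0 lets the
# new field's context take over (move on). Values are illustrative.
import numpy as np

prev_context = np.array([1.0, 0.0, 0.5])   # context of the current field
new_context = np.array([0.2, 0.9, 0.1])    # context of the next field

def update_context(new, gate):
    return gate * prev_context + (1 - gate) * new

staying = update_context(new_context, gate=0.95)
leaving = update_context(new_context, gate=0.05)
print(staying)   # close to prev_context
print(leaving)   # close to the new field's context
```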

80 of 223

FORGET GATE: decides how long to stay on a field

(Soft) Orthogonalize the context vector once it is time to forget


Modeling Never-Look-Back


Nema et al. 2018

81 of 223

[Figure: the same micro/macro attention model, with gated orthogonalization applied between the fused attention and the decoder state]

Nema et al. 2018

82 of 223

Order-planning

Sha et al. 2018

Which field are we talking about?

83 of 223

ROTOWIRE Dataset

Biography (e.g., WikiBio): “Matthew Paige Damon who was born in October 8, 1970 is an American actor, film producer, and screenwriter.”
  • Requires short generation.
  • Less variation in style.

ROTOWIRE:
  • Longer generation required
  • Content selection
  • Content ordering

Wiseman et al. 2017

Introduced ROTOWIRE & SBNATION dataset

84 of 223


Content selection and planning

Text generation

Content selection and Planning

Puduppully et al. 2019

85 of 223

Content Selection Gate

 

 

Puduppully et al. 2019

86 of 223

Content selection and Planning

Puduppully et al. 2019

87 of 223

Content Planning

  • Content plan: Sequence of pointers each pointing to a record
  • Content plan supervision was generated via an information extraction method.
  • Not generated by humans, so it contains inaccuracies.

Puduppully et al. 2019

88 of 223

Some issues…

  • Generation is not interpretable.
  • Zero control on the generated output.

Template based generation vs. End2End methods:

  • Interpretable: with templates you can check which template was picked; with end2end methods this is difficult and needs a lot of analysis to get insights.
  • Output control: with templates you can select which template makes sense; end2end methods offer almost none.
  • Scalable: template systems need a lot of templates; end2end methods can scale well (in domain/task).

89 of 223

Neural Templates for Text Generation

Wiseman et al. 2018

90 of 223

HMM vs HSMMs (Hidden Semi-Markov Models)

HMM

HSMM

Wiseman et al. 2018

91 of 223

A Conditional (Neural) HSMM

Parameterize probabilities with neural components:

  • Segment probabilities are given by an RNN + attention + copy attention

Wiseman et al. 2018

92 of 223

Knowledge Graph to Text

93 of 223

Knowledge Graph to text

[Figure: a knowledge graph over Neil Armstrong, United States, Astronaut, and Wapakoneta, with edges occupation, nationality, birthPlace, location]

RDF triples: <Neil Armstrong, occupation, Astronaut>

Zhu et al. 2019

94 of 223

Knowledge Graph to text

[Figure: the same knowledge graph]

RDF triples:
<Neil Armstrong, occupation, Astronaut>
<Neil Armstrong, nationality, United States>
<Neil Armstrong, birthPlace, Wapakoneta>
<Wapakoneta, location, United States>

Zhu et al. 2019

95 of 223

Knowledge Graph to text

Neil Armstrong was an American astronaut born in Wapakoneta, a city in the United States.

[Figure: the same knowledge graph]

RDF triples:
<Neil Armstrong, occupation, Astronaut>
<Neil Armstrong, nationality, United States>
<Neil Armstrong, birthPlace, Wapakoneta>
<Wapakoneta, location, United States>

Triple to text

Zhu et al. 2019
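Before a seq2seq model can consume the triples, they have to be flattened into a token sequence. The delimiter convention below (`<S>`, `<P>`, `<O>` markers) is an assumption for illustration, not necessarily the scheme used in the paper.

```python
# Sketch of linearizing RDF triples into a flat token sequence for a
# seq2seq encoder; the <S>/<P>/<O> delimiter tokens are hypothetical.
triples = [
    ("Neil Armstrong", "occupation", "Astronaut"),
    ("Neil Armstrong", "birthPlace", "Wapakoneta"),
    ("Wapakoneta", "location", "United States"),
]

def linearize(triples):
    parts = []
    for s, p, o in triples:
        parts.append(f"<S> {s} <P> {p} <O> {o}")
    return " ".join(parts)

print(linearize(triples[:1]))
# -> <S> Neil Armstrong <P> occupation <O> Astronaut
```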

96 of 223

Minimizing KL divergence

  • Less loss
  • Allows fake/diverse/creative generation
  • But, low quality

Zhu et al. 2019
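The asymmetry between the KL divergence and its inverse, which drives the quality/diversity trade-off discussed on these slides, can be checked numerically on two toy discrete distributions (the values below are made up for illustration).

```python
# Numerical illustration that KL(p||q) and the inverse KL(q||p) differ:
# forward KL is mode-covering (tolerates diverse/low-quality mass),
# inverse KL is mode-seeking (prefers high quality, less diversity).
import math

p = [0.8, 0.1, 0.1]   # "data" distribution (toy)
q = [0.4, 0.3, 0.3]   # model distribution (toy)

def kl(a, b):
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

forward = kl(p, q)
inverse = kl(q, p)
print(forward, inverse)   # two different positive numbers
```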

97 of 223

Minimizing inverse KL divergence

Zhu et al. 2019

98 of 223

Training

Zhu et al. 2019

99 of 223

Capture relationships between and within triples?

100 of 223

Triple encoder

  • Handle loops by topological sort and traversing in breadth-first order

Trisedya et al. 2018


103 of 223

Tutorial Roadmap

PART 1:

  • Introduction to NLG from Structured data and Knowledge Bases
  • Traditional NLG
  • Statistical and Neural Methods
  • Evaluation Methods for NLG

PART 2:

  • Hybrid Methods
  • Role of Semantics and Pragmatics
  • Open Problems and Future Directions
  • Conclusion and Future Remarks

104 of 223

Evaluation Methods

Overlap based Metrics

Intrinsic Evaluation

Human Evaluation

105 of 223

Expectation from a Good Evaluation Metric

  • Scale for human evaluation
    • Perfect: No problem in either information or grammar
    • Fair: Easy to understand, with some unimportant information missing / flawed grammar
    • Acceptable: Broken but understandable with effort
    • Nonsense: Important information has been translated incorrectly

[Scale from Perfect to Fair to Acceptable to Nonsense, judged along the fluency and adequacy dimensions]

106 of 223

Overlap Based Metrics

107 of 223

BLEU

  • BiLingual Evaluation Understudy.
  • Traditionally used for machine translation.
    • Ubiquitous and standard evaluation metric
    • 60% of NLG works between 2012 and 2015 used BLEU
  • Automatic evaluation technique:
    • Goal: The closer a machine translation is to a professional human translation, the better it is.

  • Precision based metric.
    • How many results returned were correct?
  • Precision for NLG:
    • How many words returned were correct?

[Papineni et al., 2002]

108 of 223

BLEU evaluation

  • Candidate (Machine): It is a guide to action which ensures that the military always obeys the commands of the party.

  • References (Human):
    1. It is a guide to action that ensures that the military will forever heed Party commands.
    2. It is the guiding principle which guarantees the military forces always being under the command of the Party.
    3. It is the practical guide for the army always to heed the directions of the party.

  • Modified Unigram Precision = 17/18 (every candidate word except “obeys” appears in at least one reference)

[Papineni et al., 2002]

109 of 223

Consider this….

  • Candidate: the the the the the the the.

  • References:
    1. The cat is on the mat.
    2. There is a cat on the mat.

  • Unigram Precision = 7/7 = 1. Incorrect.
  • Maximum reference count of ‘the’ = 2
  • Modified Unigram Precision = 2/7 (based on count clipping)
  • Modified 1-gram precision → Modified n-gram precision.

[Papineni et al., 2002]

110 of 223

Modified n-gram precision

  • Candidate (Machine): It is a guide to action which ensures that the military always obeys the commands of the party.

  • List all possible n-grams. (Example bigram: ‘It is’)
  • N-gram Precision = (number of candidate n-grams appearing in any reference) / (total number of candidate n-grams)

  • Modified N-gram Precision : Produced by clipping the counts for each n-gram to maximum occurrences in a single reference.

[Papineni et al., 2002]

111 of 223

Brevity Penalty

  • Candidate sentences longer than all references are already penalized by modified n-gram precision.

  • Another multiplicative factor introduced.

  • Objective: To ensure the candidate length matches one of the reference lengths.
    • If candidate length c ≥ effective reference length r, then BP = 1.
    • Otherwise, BP = e^(1 − r/c) < 1.

[Papineni et al., 2002]

112 of 223

Final BLEU score

  • BLEU = BP · exp( Σ_{n=1..N} w_n · log p_n )
  • BP → brevity penalty.
  • p_n → modified n-gram precision.
  • N → maximum n-gram order (typically 4).
  • w_n → weights, typically uniform (w_n = 1/N).

[Papineni et al., 2002]
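The pieces above — clipped n-gram precision, the brevity penalty, and the weighted geometric mean — can be combined in a minimal sentence-level sketch. This is a toy re-implementation following the Papineni et al. formulas, not the official BLEU script:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    # Clip each candidate n-gram count to its maximum count in any single reference
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / max(sum(cand_counts.values()), 1)

def bleu(candidate, references, max_n=4):
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    # Uniformly weighted geometric mean of the modified precisions
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty against the closest reference length
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)
```

On the “the the the the the the the” example, `modified_precision` with n = 1 returns 2/7, matching the count-clipping slide.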

113 of 223

Evaluation of data-to-text NLG: More BLUEs for BLEU

  • BLEU: Truly an Understudy
  • Intrinsically Meaningless (Ananthakrishnan et al, 2009)
    • Not meaningful in itself: What does a BLEU score of 69.9 mean?
    • Only for comparison between two or more automatic systems
  • Admits too much “combinatorial” variation
    • Many syntactically and semantically incorrect variations of the hypothesis output can receive the same score
    • Reordering of unmatched n-grams may not alter the BLEU score
  • Admits too little “linguistic” variation
    • Languages allow variety in choice of vocabulary and syntax
    • Not always possible to keep all possible variations as references
    • Multiple references do not help capture variations much (Doddington, 2002; Turian et al, 2003)
  • Variants of BLEU — cBLEU (Mei et al, 2016), GLEU (Mutton et al, 2007), Q-BLEU (Nema et al, 2018) — take the input (source) into account

114 of 223

ROUGE

  • Recall-Oriented Understudy for Gisting Evaluation.
  • Recall-based metric for NLG:
    • How many correct words were returned?

  • Candidate: the cat was found under the bed.
  • Reference: the cat was under the bed.
  • Recall = 6/6 = 1 (every reference word appears in the candidate)

  • ROUGE-N = (number of reference n-grams appearing in the candidate) / (total number of reference n-grams)

[Lin 2004]
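For the candidate/reference pair above, a minimal ROUGE-N sketch (clipped n-gram overlap, reported as recall, precision, and F1) is enough to reproduce the numbers; this is a toy, not the official ROUGE package:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    def gram_counts(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = gram_counts(candidate.split())
    ref = gram_counts(reference.split())
    # Clipped overlap, as in ROUGE-N
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```

For the slide's example, recall is 6/6 = 1 while precision is only 6/7, since “found” in the candidate has no match.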

115 of 223

Variants of ROUGE

  • ROUGE can combine both Precision and Recall to compute F1 score.
    • Precision prevents prediction of too many unnecessary words.
    • Recall encourages prediction of reference words.

  • ROUGE-N : n-gram ROUGE. Most popular are ROUGE-1 and ROUGE-2.

  • ROUGE-L : Based on length of Longest Common Subsequence (LCS).

  • ROUGE-S : skip-bigram based. ‘the boy ran away’
    • skip-bigrams: ‘the boy’, ‘the ran’, ‘the away’, ‘boy ran’, ‘boy away’, ‘ran away’

[Lin 2004]
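ROUGE-L from the variants above reduces to a longest-common-subsequence computation; a minimal F1 sketch:

```python
def lcs_len(a, b):
    # Classic dynamic programming over the two token sequences
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    # F1 over LCS-based recall and precision
    return 2 * precision * recall / (precision + recall)
```

Unlike ROUGE-N, the LCS rewards in-order matches without requiring them to be contiguous.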

116 of 223

Other Metrics: METEOR, TER, WER

  • METEOR:
    • Overlap based but considers additional linguistic characteristics
    • Partial Credits for matching Stems, Synonyms and Paraphrases
    • Depends on availability of resources and tools for computing specific matches

  • Word Error Rate (WER):
    • Minimum number of edit steps to transform the output into the reference
    • Levenshtein Distance

  • Translation Edit Rate (TER):
    • Edit distance based but allows shifting of words
    • Shifting a phrase is assumed to have the same edit cost as inserting, deleting or substituting a word, regardless of the number of words being shifted

117 of 223

Other Metrics: NIST

  • Based on BLEU, but indicates how informative the N-Grams are

  • More weights to rarer N-gram matches

  • Different calculation of the brevity penalty – small variations in translation length do not impact the overall score

118 of 223

Problems with overlap based metrics

  • References needed
  • Assumes the output space is confined to the given set of references
  • Often penalizes paraphrases at syntactic and deep semantic levels
  • Task agnostic
    • Cannot reward task-specific correct generation
  • Relativistic evaluation
    • Intrinsically don’t mean anything (what does 50 BLEU mean?)

119 of 223

BLEU not perfect for evaluation…..

[Liu et al., 2016]

120 of 223

ROUGE comes at a cost….

  • [Paulus et al., 2017] used Reinforcement Learning (RL) to directly optimize for ROUGE-L
    • Instead of the usual cross-entropy loss.
    • ROUGE-L is not differentiable, hence the need for an RL-style framework.

  • Observation:
    • Outputs obtained higher ROUGE-L scores, but lower human scores for relevance and readability.

Slide credit: CS224n, Stanford

[Paulus et al., 2017]

121 of 223

Summary...

  • No automatic metric adequately captures the overall quality of generated text (w.r.t. human judgement).

  • Though more focused automatic metrics can be defined to capture particular aspects:
    • Fluency (compute probability w.r.t. well-trained Language Model).
    • Correct Style (probability w.r.t. LM trained on target corpus – still not perfect)
    • Diversity (rare word usage, uniqueness of n-grams, entropy-based measures)
    • Relevance to input (semantic similarity measures – may not be good enough – see next)
    • Simple measurable aspects like length and repetition
    • Task-specific metrics, e.g. compression rate for summarization

Slide credit: CS224n, Stanford

122 of 223

Intrinsic Evaluation

123 of 223

Document Similarity techniques

  • Candidate Representation (embedding vector) – X
  • Reference Representation (embedding vector) – Y

  • Cosine Similarity: cos(X, Y) = (X · Y) / (‖X‖ ‖Y‖)

  • Value lies between −1 and 1. Nearer to 1 implies higher similarity.
  • Objective: Candidate and Reference should be semantically similar.
  • Generally works well only for short candidate/reference length.
  • What are the representations?
    • Naïve: Obtain sentence representation by taking a weighted average of word embeddings (Mikolov, 2013; Pennington, 2014).
    • Better ones: More sophisticated sentence embedding models.
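The naïve pipeline above — average the word vectors, then compare with cosine similarity — can be sketched as follows; the toy 3-dimensional embeddings in the test are hypothetical stand-ins for word2vec/GloVe vectors:

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def sentence_vector(sentence, embeddings, dim=3):
    """Naive sentence representation: unweighted average of word vectors."""
    vecs = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

A candidate and a reference would each be mapped through `sentence_vector` and scored with `cosine_similarity`.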

124 of 223

BERT / Skip thought / Universal Encoder

  • Averaging word level embeddings disregards sentential properties (context, syntax, semantics)
  • Obtain sentence level representation using Skip-thought (Kiros 2015) / BERT (Devlin, 2018) / GPT / Embeddings from Language Models (ELMo) (Peters, 2018)

[Figure: four sentence-encoder architectures, each mapping sentences to a sentence vector — Skip-thought: RNN encoder with decoders predicting the previous and next sentence; ELMo: bidirectional LSTM/GRU with a task-specific linear combination of hidden representations; GPT: unidirectional Transformers; BERT: bidirectional Transformers]

125 of 223

Problems with Document Similarity

  • Embedding learning : Highly architecture and dataset specific
  • Expects input to be a sequence (well represented context)
    • Tabular semantics - difficult to handle
    • Altered / Reordered input may result in different output values
  • Understanding and disambiguation of senses
    • Word vectors often capture most frequent senses (disregarding context)
    • BERT, ELMo can at most understand senses from their training data
      • The score is one love
      • One love for the mother’s pride… [The famous song by Blue]
      • However, they need task-specific tuning

126 of 223

Next we discuss intrinsic metrics like complexity, grammaticality, coherence… These metrics DO NOT require reference text!

127 of 223

Text Complexity (Lexical Complexity)

  • Degree of Polysemy: the total sum of the number of senses possessed by each word in the sentence, normalized by the sentence length

It is possible that this is tough sentence. =>

{ it:1, is: 13, possible: 4, that: 1, this: 1, tough: 12, sentence: 4} = 34 / 8 = 4.25

Intuition: More polysemy + less context = harder to disambiguate

  • Fraction of Rare Words: Percentage of rare words estimated by the frequency of their occurrence in representative corpora
    • The General Service List and the Academic Word List are used as representative corpora (they contain word frequency lists)

Adaptation and mitigation efforts must therefore go hand in hand.

  • Fraction of Nouns, Verbs and Adjectives: Ratio of the number of nouns/verbs/adjectives in a sentence to the sentence length

Intuition: Define semantic roles

  • Average Syllables per Word: The average number of syllables per word in a sentence. For example:

It is possible that this is tough sentence.

    • The above sentence has a total of 11 syllables. Hence, the average number of syllables per word is 11/8 = 1.375

[Mishra et al., 2018]

128 of 223

Text Complexity (Readability)

  • Readability tests, readability formulas, or readability metrics are formulae for evaluating the readability of text, usually by counting syllables, words, and sentences.
  • Lexicalized Scores for Readability (Word, Syllable, Sentence based):
    • Automated Readability Index (ARI)
    • Coleman-Liau Index
    • Dale-Chall Readability Formula
    • Flesch-Kincaid readability tests
    • Flesch Reading Ease
    • Flesch–Kincaid Grade Level
    • Fry Readability Formula
    • Gunning-Fog Index
    • LEXILE

[Mishra et al., 2018]
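Most of the formulas listed above are simple functions of word, sentence, and syllable counts. As an example, here is a sketch of Flesch Reading Ease; the vowel-group syllable counter is a crude heuristic (real implementations use pronunciation dictionaries), so treat the scores as approximate:

```python
import re

def count_syllables(word):
    """Crude heuristic: count contiguous vowel groups, discounting a trailing 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(groups, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))
```

Higher scores indicate easier text; short monosyllabic sentences score well above 100.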

129 of 223

Text Complexity (Syntactic Complexity)

  • Dependency Distance: mean length of the dependency links appearing in the dependency parse tree of a sentence

The structural complexity of the sentence is 15/7 = 2.14

[Mishra et al., 2018]
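Given a dependency parse represented as (head index, dependent index) pairs, the metric is just the mean absolute index difference. The parse would come from a dependency parser; the arcs used below are hand-constructed for illustration:

```python
def mean_dependency_distance(arcs):
    """arcs: list of (head_index, dependent_index) pairs from a dependency parse."""
    if not arcs:
        return 0.0
    return sum(abs(head - dep) for head, dep in arcs) / len(arcs)
```

For a toy parse of "the cat sat" with arcs (2, 1) and (3, 2), the mean dependency distance is 1.0.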

130 of 223

Text Complexity (Syntactic Complexity)

  • Non terminal to terminal ratio

The non-terminal to terminal ratio is thus 10 / 9 = 1.11

  • Intuition: the ratio would be higher for sentences with nested structures, which add to the syntactic difficulty

[Mishra et al., 2018]

131 of 223

Text Complexity (Syntactic Complexity)

  • Clause Count
    • Passive clause count

The house is guarded by the dog that is taken care of by the homeowner.

    • This sentence contains two passive clauses, and hence the passive clause count is 2.

  • Coreference Distance:
    • Sum of distances, in number of words, between all pairs of co-referring text segments in a sentence
    • Larger coreference distance => greater cognitive load in processing => greater complexity

[Mishra et al., 2018]

132 of 223

Grammaticality (types of grammar error)

  • Spelling error
  • Repeated words
  • Subject-Verb agreement
    • He walk(walks) to college
  • Noun Number agreement
    • There are lot of restaurant(restaurants) in the college
  • Verb Tense
    • I have seen (saw) him yesterday
  • Pronoun
    • The girls won her (their) game
  • Preposition
    • The train will arrive within (in) five minutes
  • Articles
    • The Paris is big city => Paris is a big city
  • Double Negatives
    • I can’t hardly believe => I can hardly believe

133 of 223

Grammaticality (solutions)

  • Quick and Automatic (mostly used in NLG settings):
    • LM Perplexity / log-likelihood
    • Confidence Scores of probabilistic parsers (e.g., PCFG parsers)
  • Heuristic Based:
    • Pattern Matching and String replacement (Bustamante, 1996)
    • POS and Parse tree based rules (Naber, 2003)
  • Data Driven:
    • Labeled corpus indicating grammar error (CoNLL 2013, 2014 shared tasks, EMILLE Corpus)
    • Error Generator Tool: GenERRate (Foster, 2009)
    • Techniques: SMT, Seq2Seq, Feature based classifiers, Multi-task learning with language and error correction tasks
  • Problem:
    • Solutions address human error ; different in NLG (unrealistic errors)

134 of 223

Discourse Coherence

  • Local coherence methods:
    • Feature based supervised classification
      • Entity grid model (Barzilay, 2008)
      • HMM + Syntactic patterns (Louis, 2012)
      • Seq2seq (Li and Jurafsky, 2017)

  • Topical Coherence:
    • Measure coherence based on the degree of topic-drift (Shrivastava, 2018)
    • Entropy of Topic document distribution and topic relatedness

135 of 223

Human Evaluation

136 of 223

Human judgement scores typically considered in NLG

  • Fluency: How grammatically correct is the output sentence?

  • Adequacy: To what extent has information in the input been preserved in the output?

  • Coherence: How coherent is the output paragraph?

  • Readability: How hard is the output to comprehend?

  • Catchiness (persuasion / creative domain): How attractive is the output sentence?

“Ah, go boil yer heads, both of yeh. Harry—yer a wizard.”

INPUT: <Einstein, birthplace, Ulm> | OUTPUT: Einstein was born in Florence

The most important part of an essay is the thesis statement. Essays can be written on various topics

from domains such as politics, sports, current affairs etc. I like to write about Football because it is the

most popular team sport played at international level.

A neutron walks into a bar and asks how much for a drink. The bartender replies “for you no charge.”

MasterCard: "There are some things money can't buy. For everything else, there's MasterCard."

MasterCard: ”You can use this for shopping."

vs

137 of 223

Problems with human evaluation

  • Better AUTOMATIC evaluation metrics are NEEDED!!!!
  • Can be slow and expensive

  • Can be unreliable:
    • Humans are (1) inconsistent, (2) sometimes illogical, (3) can lose concentration, (4) misinterpret the input, (5) cannot always explain why they feel the way they do.

  • Can be subjective (vary from person to person)

  • Judgements can be affected by different expectations
    • the chatbot was very engaging because it always wrote back

Slide credit: CS224n, Stanford

138 of 223

PART - 2

Hybrid Methods

Role of Semantics and Pragmatics

Problems beyond Simple Generation

Conclusion and Future Directions

139 of 223

Hybrid Methods

140 of 223

Scalable Micro-planned Generation of Discourse from Structured Data

141 of 223

Structured data input formats

Laha et al. 2018

142 of 223

Central idea?

  • How to make learning more interpretable?
  • How to handle variable schema?
  • How to make a domain adaptable model?
  • How to train the system in domain independent manner?
  • Unsupervised scheme to train a general structured data summarization system.

Laha et al. 2018

143 of 223

Method

  • Modular Statistical System Consisting of 3 Stages

Laha et al. 2018

144 of 223

Canonicalization

name | birth place | birth date | wife
Albert Einstein | Ulm, Germany | 14 March 1879 | Elsa Lowenthal

Splitting ↓

(name: Albert Einstein, birth place: Ulm, Germany)
(name: Albert Einstein, birth date: 14 March 1879)
(name: Albert Einstein, wife: Elsa Lowenthal)

Flattening ↓

“Albert Einstein”, “birth place”, “Ulm, Germany”
“Albert Einstein”, “birth date”, “14 March 1879”
“Albert Einstein”, “wife”, “Elsa Lowenthal”

NE tagging ↓

< PERSON birth place GPE >
< PERSON birth date DATE >
< PERSON wife PERSON >

Conversion from various formats to triples made of binary relations between two entity types

Laha et al. 2018
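The splitting, flattening, and NE-tagging steps can be sketched as follows; the hard-coded type map stands in for a real named-entity tagger:

```python
def flatten_record(record, key_field="name"):
    """Split a multi-attribute record into binary <subject, relation, object> triples."""
    subject = record[key_field]
    return [(subject, relation, value)
            for relation, value in record.items() if relation != key_field]

def ne_tag_triple(triple, type_map):
    """Replace entities by their NE types, yielding a canonical triple signature."""
    subject, relation, obj = triple
    return (type_map.get(subject, subject), relation, type_map.get(obj, obj))

record = {"name": "Albert Einstein", "birth place": "Ulm, Germany",
          "birth date": "14 March 1879", "wife": "Elsa Lowenthal"}
type_map = {"Albert Einstein": "PERSON", "Ulm, Germany": "GPE",
            "14 March 1879": "DATE", "Elsa Lowenthal": "PERSON"}
```

Running `ne_tag_triple` over the flattened triples yields the canonical signatures shown on the slide, e.g. (PERSON, birth place, GPE).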

145 of 223

Data Creation for Keyword/Triple to Text Generation

EXAMPLE:

Input:

AlbertEinstein, HasWonPrize, NobelPrize

Segmentation:

Albert Einstein, has won prize, Nobel prize

Concatenation:

Albert Einstein has won prize Nobel prize

Post-process:

Albert Einstein has won Nobel prize

DeLex: PERSON has won AWARD

Concatenation and Grammar Correction

Triples from rich KBs

(Freebase, DBPedia, Yago)

Entity tagging

Original Parallel

Original

Triples

Synthesized

Sentences

Delexicalized Parallel

DeLex

Triples

DeLex

Sentences

EXAMPLE:

Input:

Obama was born in Honolulu

OpenIE:

<Obama, born in, Honolulu>

Original parallel instance:

Src: <Obama, born in, Honolulu>

Tgt: Obama was born in Honolulu

Domain agnostic parallel instance:

Src: <PERSON, born in, LOCATION>

Tgt: PERSON was born in LOCATION

Simple sentences from webscale corpus (e.g., Wikipedia Dump)

Open Information Extraction

Entity tagging

Original Parallel

Extracted

Triples

Original

Sentences

Delexicalized Parallel

DeLex

Triples

DeLex

Sentences

Combinations

of Possible

Entity

Types

VerbNet

Verbs

SVO

Template

Triples

Sentences

EXAMPLE:

Input : “go”

Possible Entities:

PERSON, LOCATION, WORK OF ART …

Possible Sentences (after correction):

1. PERSON goes to LOCATION

2. PERSON goes to WORK OF ART

N. WORK OF ART goes to PERSON

Best sentence:

PERSON goes to LOCATION

Concatenation + Correction +LM based ranking

Simple sentences from web corpus (e.g., Wikipedia Dump)

Stemming and Stopword

Removal

Entity tagging

Original Parallel

Extracted

Keywords

Original

Sentences

Delexicalized Parallel

DeLex

Triples

DeLex

Sentences

EXAMPLE:

Input :

Albert Einstein has won Nobel prize

Preprocessing:

Albert Einstein win Nobel Prize

Original parallel instance:

Src: <Albert Einstein, win, Nobel Prize>

Tgt: Albert Einstein has won Nobel prize

Domain agnostic parallel instance:

Src: PERSON win AWARD

Tgt: PERSON has won AWARD

Laha et al. 2018

146 of 223

Simple Language Generation: Triple2Text

<Sachin Tendulkar, born in, India>

<PERSON, born in, GPE>

Seq2Seq

PERSON was born in GPE.

{Sachin Tendulkar: PERSON,

India: GPE}

Sachin Tendulkar was born in India.

Laha et al. 2018
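The delexicalize → generate → re-lexicalize loop on this slide can be sketched as below; the Seq2Seq model is stubbed out with a fixed template, since the point here is only the entity handling:

```python
def delexicalize(triple, type_map):
    """Replace entities with NE types; remember the mapping for re-lexicalization."""
    subject, relation, obj = triple
    mapping = {type_map[subject]: subject, type_map[obj]: obj}
    return (type_map[subject], relation, type_map[obj]), mapping

def relexicalize(sentence, mapping):
    """Substitute entity placeholders back into the generated template."""
    for placeholder, entity in mapping.items():
        sentence = sentence.replace(placeholder, entity)
    return sentence

type_map = {"Sachin Tendulkar": "PERSON", "India": "GPE"}
delex_triple, mapping = delexicalize(("Sachin Tendulkar", "born in", "India"), type_map)
generated = "PERSON was born in GPE."   # stand-in for the Seq2Seq output
```

`relexicalize(generated, mapping)` then yields "Sachin Tendulkar was born in India."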

147 of 223

MorphKey2Text: Rich Parallel Data extraction

Albert Einstein married Elsa Lowenthal in 1919 .

PERSON NOUN married VERB PERSON NOUN DATE NOUN

1. Coarse POS Tagging

2. NE Replacement

3. Stopword Removal

1. Fine-grained POS Tagging

2. POS retention for VERBs

3. NE Replacement

PERSON marry VBD PERSON in DATE

Source

Target

Original Sentence

Laha et al. 2018

148 of 223

Simple Language Generation: MorphKey2Text

Laha et al. 2018

149 of 223

Ranking of simple sentences

 

  • One can also add rules, for example, sentences with no verbs can be removed

Fluency

Adequacy

Laha et al. 2018

150 of 223

Sentence Compounding/Aggregation (rule based)

split

Jordan played basketball

Jordan played football

<Jordan, played, basketball>

<Jordan, played, football>

Jordan played basketball and football.

e11 == e21 && rvp1 == rvp2

Rule: e11 rvp1 e12 and e22

<e11 rvp1 e12>

<e21 rvp2 e22>

Example

Laha et al. 2018

151 of 223

Sentence Compounding/Aggregation (rule based)

split

Jordan played basketball

Jordan represented USA

<Jordan, played, basketball>

<Jordan, represented, USA>

Jordan played basketball and represented USA.

e11 == e21

Rule: e11 rvp1 e12 and rvp2 e22

<e11 rvp1 e12>

<e21 rvp2 e22>

Example

Laha et al. 2018
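The two aggregation rules on these slides can be sketched as a single function over <subject, relational verb phrase, object> triples; this is an illustrative toy, not the system implementation:

```python
def aggregate(triple_1, triple_2):
    """Rule-based compounding of two <subject, verb phrase, object> triples."""
    (s1, v1, o1), (s2, v2, o2) = triple_1, triple_2
    if s1 == s2 and v1 == v2:
        # Rule: e11 rvp1 e12 and e22 (shared subject and verb phrase)
        return f"{s1} {v1} {o1} and {o2}."
    if s1 == s2:
        # Rule: e11 rvp1 e12 and rvp2 e22 (shared subject only)
        return f"{s1} {v1} {o1} and {v2} {o2}."
    # No rule applies: emit two separate sentences
    return f"{s1} {v1} {o1}. {s2} {v2} {o2}."
```

This reproduces both slide examples: shared subject and verb merges the objects; a shared subject alone merges the verb phrases.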

152 of 223

Coreference Replacement (for entities)

Jordan played basketball and represented USA. Jordan was born in New York.

Jordan<PERSON> played basketball and represented USA<GPE>. Jordan<PERSON> was born in New York<GPE>.

Jordan<PERSON>

Gender detection

He

Jordan played basketball and represented USA. He was born in New York.

Laha et al. 2018

153 of 223

Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation (NAACL 2019)

Moryossef et al. 2019

154 of 223

Triple-to-text Variations (Planning)

  • Input triples:

  • Possible output:

  • Fact re-ordering

  • Entity re-ordering

  • Sentence-split

John, who was born in London, works for IBM.

John, who works for IBM, was born in London.

IBM employs John, who was born in London.

IBM employs John. John was born in London.

Moryossef et al. 2019

155 of 223

Triple-to-Text Variations (Realization)

  • Input triples:

One-way (Sentence-split):

Verbalization:

IBM employs John. John was born in London.

IBM employs John. He was born in London.

Moryossef et al. 2019

156 of 223

Issues with end-to-end Neural Approaches

  • Lack of coherence on longer texts.
  • Not maintaining coherent order of facts.
  • Not faithful to input facts.
  • Omitting, repeating, hallucinating or changing facts.

  • Possible reasons:
    • Neural approaches are decent at language modeling / surface realization.
    • They fall behind on modeling more abstract levels of text structuring, etc.

Moryossef et al. 2019

157 of 223

Remedy: Two-step Approach

Text Planning

Plan Realization

Moryossef et al. 2019

158 of 223

Triple-to-Plan Generation

John, residence, London

England, capital, London

John, residence, London

John, occupation, Bartender

John, residence, London

England, capital, London

John, occupation, Bartender

Moryossef et al. 2019

159 of 223

Plan-to-Text Generation

Input Sequence:

Output Sequence:

Linearization

Seq2seq + Copy

Text Plan

Moryossef et al. 2019

160 of 223

Key Takeaway

Faithfulness to the input is improved compared to end-to-end Neural Approaches!!!

Also known as adequacy or correctness

Moryossef et al. 2019

161 of 223

Gaps

  • Technique is dataset dependent
    • Restricted to WebNLG.

  • Assumption for text-plan design:
    • Each entity is mentioned only once in a sentence.

  • Restrictive definition of text-plan:
    • Splitting should be the same as the reference sentence splits.
    • Order of entities must be preserved between respective splits and their reference sentences.

Moryossef et al. 2019

162 of 223

Microplanning for Sentence Realization from Data

163 of 223

Verb Selection

  • Verb -> most important part of sentence, represents action
  • Often the input contains nouns (subjects, objects and complements)

Name

Work City

Occupation

Award

Albert Einstein

Ulm, Germany

Physicist

Nobel Prize

Subject

Complement

Object

Albert Einstein worked in Ulm, Germany

Albert Einstein received Nobel Prize

Zhang et al. 2018

Often verb prediction is important

Input: PERSON, CITY -> stay / work

Input: PERSON, AWARD -> receive

164 of 223

Data-driven Solution to Verb Selection

  • Easy to obtain data from monolingual corpus of simple sentences

Raw

Corpus

(WSJ, Reuter, Wiki)

Triples

Triple

Extraction

(OpenIE)

Tagged

Triples

1:<e1, e2: verb>

2: <e1, e2: verb>

3: <e1, e2: verb>

N: <e1, e2: verb>

Tagged entities

and Verbs

Entity

Tagging

and

Delexicalization

POS and

Verb selection

Training data

Model

Massive

Classification

(MLE, Neural)

Revenue rose highly by 13%

<Revenue, rose by, 13%>

<VALUE rose by PERCENTAGE>

<VALUE rise PERCENTAGE>

<VALUE , PERCENTAGE: rise >

Entity1, Entity2

Verb

training

test

Zhang et al. 2018
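The MLE variant of the "massive classification" step amounts to counting which verb most often links a given pair of entity types; a sketch with hypothetical training triples:

```python
from collections import Counter, defaultdict

def train_verb_selector(tagged_triples):
    """MLE: count verb occurrences per (entity-type, entity-type) pair."""
    counts = defaultdict(Counter)
    for e1, e2, verb in tagged_triples:
        counts[(e1, e2)][verb] += 1
    return counts

def select_verb(counts, e1, e2):
    """Return the most likely verb for the entity-type pair, or None if unseen."""
    pair_counts = counts.get((e1, e2))
    return pair_counts.most_common(1)[0][0] if pair_counts else None
```

A neural classifier over pretrained entity-type embeddings would replace these counts; the MLE table is the simplest baseline.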

165 of 223

Preposition selection

  • Noun + Prepositional Phrase
    • wife of PERSON
  • Verb + Prepositional Phrase
    • worked at LOCATION
  • Training Dataset (verb)
    • Format: <verb-lemma, noun: preposition>

<work, location: at>

  • Source: raw corpus + constituency parsing (tricky)
  • Leveraging PP attachment dataset (Ratnaparkhi et al, 1994):
    • < join, board, as, director>: V
    • < be, chairman, of , IBM>: N
  • Training:
    • Multiclass classification
    • Using pretrained embedding helps address sparsity

<join, position_holder : as>

<position_holder, organization : of>

166 of 223

Role of Semantics and Pragmatics

167 of 223

Semantics and Pragmatics in NLG

  • Current generation paradigms focus on lexical and syntax aspects of language generation
  • However, NLG, especially data-to-text generation often requires content plans that convey more information than the input data
  • Paraphrasing at semantic / pragmatic levels: the same thing can be said in various ways

What does John do for a living? ⬄ What is john’s job?

(Not merely lexical / syntactic paraphrasing)

  • Additional information has stronger effect

Restaurant

Food Type

China Town

Chinese

China town’s food type is Chinese

VS

China town serves Chinese food

Semantics: Situation-agnostic but deeper

Pragmatics: May vary according to the situation; depends on who is listening and what the environment is

168 of 223

NLG Under Pragmatic Constraints

  • Initial approach by Hovy, 1987, PAULINE (Planning and Uttering Language in Natural Environment)
  • Semantics: Includes topics-based enrichment
  • Pragmatics: Includes extra-linguistic information involving attributes of speaker and listener
  • Characteristics of conversation setting
    • Conversational Atmosphere
      • Time: much, some, little (say, control generation (length) based on these)
      • Tone: formal, informal
      • Conditions: good, noisy
    • Speaker / Hearer
      • Topic knowledge: expert, student
      • Interest in the topic: high, low
      • Emotional state: happy, angry
    • Speaker-hearer relationship
      • Depth of acquaintance: friend, stranger
      • Emotion: like, equal, different
    • Interpersonal Goals
      • Speaker’s objective: affect hearer’s knowledge , affect hearer’s emotional state
      • Speaker-hearer relationship: affect hearer’s emotion towards speaker

169 of 223

PAULINE: System Overview

  • Characteristic decisions w.r.t pragmatic constraints:

  • Implementing Rhetorical Strategies
    • Maintain templates / heuristics for generation pertaining to each pragmatic aspect of conversation
    • E.g., if the constraint is to be “formal”, heuristics / templates for structure, word choice, topic organization, and sentence ordering will be activated

[Figure: PAULINE pipeline — input topics → Topic Collection (topic collection plans, interpretation, new topics) → Topic Organization (juxtaposition, ordering, sentence type, organization) → Realization (clauses, words) → text; strategies at each stage are driven by the pragmatic aspects of the conversation]

170 of 223

Hybrid System for Enriching NLG with Pragmatic Information

  • Approach by Shen et al, 2019
  • Tested on two tasks: data-to-text NLG (E2E dataset), text-to-text NLG (CNN abstractive summarization dataset)

[Figure: speaker model — an embedding + attention seq2seq mapping the input tuple to a generated sentence — paired with a listener model, a multiclass multitask classifier that reconstructs the input tuple from the generated sentence; base → vanilla seq2seq, R → reconstruction-based model]

171 of 223

Example output (Shen et al, 2019)

Input:

NAME [FITZBILLIES], EATTYPE [COFFEE SHOP], FOOD [ENGLISH], PRICERANGE [CHEAP], CUSTOMERRATING [5 OUT OF 5], AREA [RIVERSIDE], FAMILYFRIENDLY [YES]

Human written

A cheap coffee shop in riverside with a 5 out of 5 customer rating is Fitzbillies. Fitzbillies is family friendly and serves English food.

Basic Seq2Seq

Fitzbillies is a family friendly coffee shop located near the river.

Reconstructor-based pragmatic system

Fitzbillies is a family friendly coffee shop that serves cheap English food in the riverside area. It has a customer rating of 5 out of 5.

Note:

  • Not truly pragmatic but has a provision to include more information (through classification based on reconstruction)
  • Listener works on complete output
    • Alternative: Word / phrase level listener models (Shen et al, 2019)

172 of 223

Problems beyond Simple Generation

Controllable Text Generation

Argument Generation

Persuasive Text Generation

Theme / Topic based Generation

Creative Storytelling

173 of 223

Controllable Text Generation

174 of 223

Key Goal of “Strong AI”

Source: https://www.slideshare.net/kimveale/building-a-sense-of-humour-the-robots-guide-to-humorous-incongruity

Configurable Personalities

Set Humor to 75%

Interstellar (2014)

You want 55?

Confirmed. Self destructing in

10, 9, 8 …

Make that 60?

60% confirmed

Knock Knock…

175 of 223

Controllable Text Transformation – System overview

Transformer

Input Text

Transformed Text

{Style1:Value1, Style1:Value2, …, StyleN:ValueN}

“A deep learning server needs at least 32 GB of RAM and an NVIDIA GPU”

{Wording: “Formal”, Sentiment: “Negative”, Word Count: “<30”}

“A server having less than 32 GB of RAM and without an NVIDIA GPU is not recommended for running deep learning algorithms.”

(Control Intention: The user wants cautionary, yet formal text to be generated)

176 of 223

Control-based Text Transformation

Examples:

Sentence: The movie is terrible

Transformation: It is messy, uncouth, incomprehensible, vicious and absurd.

(Lexical, Sentiment Intensity, Formal, Complex)

Sentence: The movie is terrible

Transformation: A somewhat crudely constructed and hence, quite an unwatchable movie (it was)

(Syntactic, Semantic, Semi-Formal)

Sentence: The movie is terrible

Transformation: You sit through these kinds of movies because the theatre has air conditioning

(Pragmatic, Sentiment Intensity, Informal)

Lexical

Syntactic

Semantic

Pragmatic

Tone

Formalness

Sentiment

Emotion

Complexity

Linguistic

Perceptual

Finance

Healthcare

Retail

Practical

(Domain)

Controls

177 of 223

Need for Unsupervised Methods

  • Hard to create parallel data
    • Very large number of combinations of controls.
    • Data will be very sparse for some lesser seen combinations.
    • Hard to quantify all controls separately (a combination can mean another control)

  • Hard to quantify control values in manually annotated data
    • “It is a good movie” compared to “It is a terrible movie” does not change only in sentiment (from positive to negative); it also changes in formalness (from formal to informal).
    • It is not always possible to manually write output transformation changing only the specified control value.

  • Diverse use cases, diverse text transformations
    • Different types of linguistic aspects and domains involved.

178 of 223

Unsupervised Approaches: Background

  • Unsupervised Machine Translation: (Artetxe et al, 2017, 2018; Lampel et. al, 2017)
    • Denoising shared encoders, shared multi-lingual embeddings, back translation
  • Style Transfer using Non-parallel Text (Shen et. al, 2017)
    • Autoencoding, cross-alignment
  • Sequence to Better Sequence (Mueller et al, 2017)
    • Transformation as a correction step, guided by metrics. Training using Seq2Seq
  • Controllable Text Generation (Hu et al, 2017)
    • Conditional Variational Auto-encoders
  • Paraphrase Generation (Wubben et al., 2010; Prakash et al., 2016)
    • Monolingual Statistical Machine Translation and improvised Neural Machine Translation

179 of 223

Unsupervised Text Formalization (Jain et al, 2018)

  • Degree of formalization (control) given as input at runtime

  • Features:
    • Unsupervised training scheme; handles the infeasibility of annotating data for each <input, output, controls> triple
    • Preservation of language semantics
    • Use off-the-shelf NLP modules for verification and scoring
    • Control the degree of the intended attribute desired at the output.
    • Learning to incorporate multiple control inputs (which can be dependent)

180 of 223

Central Idea (Jain et al, 2018)

Exploration

(generate training data)

Exploitation

(retrain model)

<Sentences from unlabeled corpora, model>

<Sampled paraphrases, control values>

Model

<Model>

Control value

Input Sentence

Transformed Sentence

Training

Testing

181 of 223

Controllable Generation Architecture (Jain et al, 2018)

182 of 223

Argument Generation

183 of 223

IBM Project Debater

Project Debater

Human Debater

Grand Challenge

like Deep Blue, Jeopardy!

Man vs Machine

Debate

(Feb 11, 2019)

184 of 223

What is debating?

Debate Topic: “We should ban smoking”

For the Motion:

Against the Motion:

Smoking causes cancer

Almost half the deaths (48.5%) from 12 different types of cancer combined are attributable to cigarette smoking, according to a study by researchers from the American Cancer Society and colleagues.

Smoking creates jobs

As tobacco smoking is a common activity, there are currently 1% of the population in the country who are involved in the growing, manufacturing and ultimately distribution of tobacco in various forms.

Claim

Evidence

185 of 223

How does Project Debater work?

  1. Goes through 10 billion sentences.
  2. Picks the ones relevant to the debate motion.
  3. Selects the ones with the appropriate polarity toward the motion (For/Against).
  4. These are called arguments: claims (shorter phrases/sentences) and evidence (longer ones).
  5. The arguments are stitched together into narratives (Natural Language Generation).
  6. A text-to-speech system converts the narrative into speech.
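The pipeline above can be sketched as a filtering chain. The `is_relevant` and `polarity_of` callbacks are hypothetical placeholders for the large learned components of the real system; the short/long split for claims vs. evidence is likewise only illustrative.

```python
def build_debate_speech(sentences, motion, is_relevant, polarity_of, side="for"):
    """Sketch of the six pipeline steps; classifiers are passed in as
    callbacks because they stand in for learned models."""
    relevant = [s for s in sentences if is_relevant(s, motion)]          # steps 1-2
    arguments = [s for s in relevant if polarity_of(s, motion) == side]  # step 3
    claims = [s for s in arguments if len(s.split()) <= 10]              # step 4
    evidence = [s for s in arguments if len(s.split()) > 10]
    narrative = " ".join(claims + evidence)                              # step 5
    return narrative  # step 6: a text-to-speech system would then speak this
```

With toy classifiers, only the relevant, correctly-polarized argument survives into the narrative.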

186 of 223

Challenges in Debate Speech Construction

  • The arguments need to be of correct polarity.
    • Understanding whether an argument is supporting or contrasting is a big challenge.
    • Generating an argument with appropriate polarity is even tougher!

  • The narrative needs to be coherent and sound natural.
    • There should be a good topical and local coherence between sentences.
    • The narrative should stay on the point of discussion and not stray off.
    • The narrative should be logically correct.

  • The speech should be persuasive.
    • The arguments should be informative.
    • The arguments should be conveyed persuasively enough to convince the listener.

187 of 223

Project Debater is here at ACL 2019!

Please visit IBM Booth for a demo

188 of 223

Persuasive Text Generation

189 of 223

Persuasion

  • Task: Given a product specification, generate a persuasive description

190 of 223

System Architecture [PersuAIDE!]

[Munigala et al, 2018]

191 of 223

Example outputs

[Munigala et al, 2018]

192 of 223

Theme / Topic based Text Generation

193 of 223

Overview

  • Fusion of topic models and language models

  • Motivation:
    • LMs: primary function is to predict probability of a span of text
    • Traditionally applied at the sentence level, ignoring broader document context
    • S: “Python is a beautiful …”
      • Computer Science: P(“language” | S ) ↑
      • Biology: P(“reptile” | S ) ↑
    • Sensitizing prediction of LMs to larger document narratives (using topics)
      • P(“reptile”|S , topic = “biology”) ↑

  • Proposed work jointly learns topics and word sequence representation

[Lau et al, 2017]
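The sensitization idea can be illustrated with a simple mixture: blend the LM's next-word distribution with a topic's word distribution. Note that Lau et al. (2017) learn topics and the LM jointly in one network; this interpolation is only a simplified stand-in for the intuition.

```python
def topic_conditioned_next_word(lm_probs, topic_word_probs, alpha=0.5):
    """Toy illustration: mix the LM's next-word distribution with the
    topic's word distribution, then renormalize. alpha is a hypothetical
    mixing weight, not a parameter from the paper."""
    words = set(lm_probs) | set(topic_word_probs)
    mixed = {w: (1 - alpha) * lm_probs.get(w, 0.0)
                + alpha * topic_word_probs.get(w, 0.0)
             for w in words}
    z = sum(mixed.values())
    return {w: p / z for w, p in mixed.items()}
```

For S = "Python is a beautiful ...", a biology topic putting most of its mass on "reptile" pulls the mixed distribution toward P("reptile" | S, topic = "biology") even when the plain LM prefers "language".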

194 of 223

Approach (Lau et al, 2017)

195 of 223

Example Generated Sentences

[Lau et al, 2017]

196 of 223

Creative Storytelling

197 of 223

Desiderata for Storytelling

  • Stories must remain thematically consistent across the complete document.
    • Require modeling very long-term dependencies.

  • Stories require creativity.
    • Creative artifacts are characterized by novelty, value, unexpectedness, impact, etc.

  • Stories need a high-level plot.
    • Necessitating planning ahead of word-by-word generation.

[Fan et al., 2018]

198 of 223

Generating Story from a prompt [Fan et al., 2018]

Two main challenges:

    • Modeling very long-term dependencies (the story is very long).
    • Modeling the connection between the prompt and the story.

199 of 223

Tackling challenge 1….

  • Modeling very long-term dependencies:
    • Self-attention-based approach [a proven technique in Transformers – Vaswani et al., 2017]
    • Multi-head self-attention at the decoder.
    • Different heads look at different ranges of timesteps: the first sees the full input, the second every second input, the third every third input, and so on – this is called downsampling.

Self-attention at a single head

Multi-head self-attention

[Fan et al., 2018]
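A minimal NumPy sketch of the downsampling idea, assuming head k attends only to every k-th timestep. The learned projections, gating, and causal masking of the actual Fan et al. (2018) decoder are omitted for brevity.

```python
import numpy as np

def downsampled_self_attention(states, n_heads=3):
    """Toy downsampled multi-head self-attention over hidden states of
    shape (T, d): head k builds its keys/values from every k-th timestep."""
    T, d = states.shape
    head_outputs = []
    for k in range(1, n_heads + 1):
        keys = states[np.arange(0, T, k)]        # head k's downsampled view
        scores = states @ keys.T / np.sqrt(d)    # (T, ceil(T/k)) attention scores
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        head_outputs.append(weights @ keys)      # (T, d) per-head summary
    return np.concatenate(head_outputs, axis=1)  # heads concatenated: (T, n_heads*d)
```

Each head thus covers a progressively coarser, longer-range view of the sequence, which is what lets the model attend over very long stories cheaply.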

200 of 223

Tackling challenge 2….

  • Modeling the connection between prompt and story
    • Through a fusion approach – a second attempt to learn the connection, in case it was missed the first time.
    • Fusion of the hidden states of a pre-trained language model and the current seq2seq model.

[Fan et al., 2018]
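One way to picture the fusion is a learned gate that scales the pretrained LM's hidden state before it is combined with the seq2seq state. This is a hedged sketch of that gating idea, not the exact Fan et al. (2018) formulation; `W_gate` and `W_out` are hypothetical placeholders for trained weight matrices.

```python
import numpy as np

def fuse_hidden_states(h_seq2seq, h_lm, W_gate, W_out):
    """Gate the LM hidden state using both states, concatenate with the
    seq2seq state, and project the fused vector for the output layer."""
    concat = np.concatenate([h_seq2seq, h_lm])
    gate = 1.0 / (1.0 + np.exp(-(W_gate @ concat)))  # sigmoid gate over LM features
    fused = np.concatenate([h_seq2seq, gate * h_lm])
    return W_out @ fused  # fused representation feeding the next-word prediction
```

The gate lets training decide, per dimension, how much of the pretrained LM's signal to mix in at each decoding step.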

201 of 223

Conclusion and Future Directions

202 of 223

Holy Grail of Data-to-Text Systems

Data Scientist

Artist

Psychologist

+

+

  • Data Comprehension
  • Reasoning
  • Insights detection
  • Entertaining Text
  • Creative (open-ended)
  • Engaging Narratives
  • Understanding of listener (Empathetic)
  • Understanding of situation (Pragmatics)
  • Affective generation with desired controls (persuasive)

203 of 223

Future Goals

Short-term: in a couple of years

Mid-term: in 5-10 years

Long-term: in at least a decade

204 of 223

Cross-lingual Inference (Short-term Goal)

 

 

Leonardo di ser Piero da Vinci was an Italian painter and scientist.

Léonard de Piero da Vinci était un peintre et scientifique italien.

Leonardo di ser Piero da Vinci era un pittore e scienziato italiano.

English

French

Italian

205 of 223

Entity-focused Knowledge Graph Summarization (Short-Term)

General graph summary:

Hugo Weaving acted in the movie Cloud Atlas (as Bill Smoke) along with Tom Hanks (as Zachry), and in the movie The Matrix (as Agent Smith). Both movies were directed by Lana Wachowski.

Query: Show me movies directed by Lana and their lead actors.

Focus Lana

Entity focused summary(Focus Lana):

Lana Wachowski, born in 1965, is the director of the movies Cloud Atlas (released in 2012) and The Matrix (released in 1999).

206 of 223

Cross-lingual Learning (Mid-term Goal)

 

 

Leonardo di ser Piero da Vinci (15 April 1452 – 2 May 1519), more commonly Leonardo da Vinci or simply Leonardo, was an Italian Renaissance polymath whose areas of interest included invention, painting, sculpting, architecture, science, music, mathematics, engineering, literature, anatomy, geology, astronomy, botany, writing, history, and cartography.

English

French

207 of 223

Hierarchical Table Understanding (Mid-Term)

208 of 223

Data++ To Text (Mid-term)

name            | birth place  | birth date    | wife
----------------|--------------|---------------|---------------
Albert Einstein | Ulm, Germany | 14 March 1879 | Elsa Lowenthal

+

Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). His work is also known for its influence on the philosophy of science. He is best known to the general public for his mass–energy equivalence formula, which has been dubbed "the world's most famous equation". He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect", a pivotal step in the development of quantum theory.

Albert Einstein, a theoretical physicist born on 14 March 1879 in Ulm, Germany is prominently known for developing the theory of relativity. Here we can see him interacting with Mahatma Gandhi.

Table

Image

Text

209 of 223

Interesting Narratives Generation from Data (Long-term)

Player  | Goals | World Cup Wins | Nationality
--------|-------|----------------|------------
Messi   | 419   | 0              | Argentina
Ronaldo | 311   | 0              | Portugal
Zidane  | 155   | 1              | France

Even though Zidane has scored fewer goals than both Messi and Ronaldo, he has won the World Cup once, while the others have not.

More examples of Interesting Facts :

“Messi going goal-less in a match”

“Indian football team scoring 10 goals against Brazil”

“3 red cards in a single match”

Anomalies

Two parts to the problem:

  • Figuring out interestingness in the data (Content Selection + Reasoning).

  • Realizing the data in an interesting way for the target.
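The first part of the problem (content selection + reasoning) can be illustrated with a toy rule over the player table: flag a row that trails everyone on one metric yet leads everyone on another. The function name and template string are illustrative, not a proposed system.

```python
def contrast_facts(rows, low_metric, high_metric):
    """Toy content-selection step: surface the kind of contrast the Zidane
    sentence verbalizes. A real system would score many such candidate
    facts for interestingness."""
    facts = []
    for r in rows:
        others = [o for o in rows if o is not r]
        if (all(r[low_metric] < o[low_metric] for o in others)
                and all(r[high_metric] > o[high_metric] for o in others)):
            facts.append(f"Even though {r['player']} has scored fewer "
                         f"{low_metric} than the others, he leads them in "
                         f"{high_metric.replace('_', ' ')}.")
    return facts

table = [
    {"player": "Messi",   "goals": 419, "world_cup_wins": 0, "nationality": "Argentina"},
    {"player": "Ronaldo", "goals": 311, "world_cup_wins": 0, "nationality": "Portugal"},
    {"player": "Zidane",  "goals": 155, "world_cup_wins": 1, "nationality": "France"},
]
```

Detecting anomalies like "Messi going goal-less in a match" requires richer background statistics; the second part of the problem, realizing the fact engagingly, remains open-ended generation.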

210 of 223

Persuasive Argument Generation from Data (Long-Term)

OnePlus 7 Pro has a better camera and larger memory space to capture all your holiday photos in high quality without the fear of running out of space.

  • Involves Understanding Context
  • Involves Reasoning (Comparison)
  • Interesting insights generation
  • Creative/Persuasive Generation

This is possible through rules and templates in limited/restricted settings!

Can we do this in a more generalized way across domains?

211 of 223

https://sites.google.com/view/acl-19-nlg/

Tutorial Website:

THANK YOU

212 of 223

References

  • Ananthakrishnan, R., Bhattacharyya, P., Sasikumar, M., & Shah, R. M. (2007). Some issues in automatic evaluation of English-Hindi MT: More blues for BLEU. ICON.
  • Angeli, G., Liang, P., & Klein, D. (2010, October). A simple domain-independent probabilistic approach to generation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (pp. 502-512). Association for Computational Linguistics.
  • Artetxe, M., Labaka, G., & Agirre, E. (2018). Unsupervised statistical machine translation. arXiv preprint arXiv:1809.01272.
  • Artetxe, M., Labaka, G., Agirre, E., & Cho, K. (2017). Unsupervised neural machine translation. arXiv preprint arXiv:1710.11041.
  • Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Bustamante, F. R., & León, F. S. (1996, August). GramCheck: A grammar and style checker. In Proceedings of the 16th conference on Computational linguistics-Volume 1 (pp. 175-181). Association for Computational Linguistics.

213 of 223

References

  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Doddington, G. (2002, March). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the second international conference on Human Language Technology Research (pp. 138-145). Morgan Kaufmann Publishers Inc.
  • Fan, A., Lewis, M., & Dauphin, Y. (2018). Hierarchical neural story generation. arXiv preprint arXiv:1805.04833.
  • Foster, J., & Andersen, Ø. E. (2009). GenERRate: generating errors for use in grammatical error detection. The Association for Computational Linguistics.
  • Fu, Z., Tan, X., Peng, N., Zhao, D., & Yan, R. (2018, April). Style transfer in text: Exploration and evaluation. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Gatt, A., & Krahmer, E. (2018). Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research, 61, 65-170.

214 of 223

References

  • Gatt, A., & Reiter, E. (2009, March). SimpleNLG: A realisation engine for practical applications. In Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009) (pp. 90-93).
  • Gu, J., Lu, Z., Li, H., & Li, V. O. (2016). Incorporating copying mechanism in sequence-to-sequence learning. arXiv preprint arXiv:1603.06393.
  • Gulcehre, C., Ahn, S., Nallapati, R., Zhou, B., & Bengio, Y. (2016). Pointing the unknown words. arXiv preprint arXiv:1603.08148.
  • Hovy, E. (1987). Generating natural language under pragmatic constraints. Journal of Pragmatics, 11(6), 689-719.
  • Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., & Xing, E. P. (2017, August). Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 1587-1596). JMLR. org.
  • Huang, L., & Chiang, D. (2007, June). Forest rescoring: Faster decoding with integrated language models. In Proceedings of the 45th annual meeting of the association of computational linguistics (pp. 144-151).

215 of 223

References

  • Jain, P., Laha, A., Sankaranarayanan, K., Nema, P., Khapra, M. M., & Shetty, S. (2018). A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization. arXiv preprint arXiv:1804.07790.
  • Jain, P., Mishra, A., Azad, A. P., & Sankaranarayanan, K. (2018). Unsupervised Controllable Text Formalization. arXiv preprint arXiv:1809.04556.
  • Kim, J., & Mooney, R. J. (2010, August). Generative alignment and semantic parsing for learning from ambiguous supervision. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (pp. 543-551). Association for Computational Linguistics.
  • Kiros, R., Zhu, Y., Salakhutdinov, R. R., Zemel, R., Urtasun, R., Torralba, A., & Fidler, S. (2015). Skip-thought vectors. In Advances in neural information processing systems (pp. 3294-3302).
  • Konstas, I., & Lapata, M. (2012, June). Unsupervised concept-to-text generation with hypergraphs. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 752-761). Association for Computational Linguistics.
  • Konstas, I., & Lapata, M. (2013, October). Inducing document plans for concept-to-text generation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 1503-1514).

216 of 223

References

  • Langkilde, I., & Knight, K. (1998). Generation that exploits corpus-based statistical knowledge. In Proceedings of ACL 1998, Montreal, Canada.
  • Laha, A., Jain, P., Mishra, A., & Sankaranarayanan, K. (2018). Scalable Micro-planned Generation of Discourse from Structured Data. arXiv preprint arXiv:1810.02889.
  • Lau, J. H., Baldwin, T., & Cohn, T. (2017). Topically driven neural language model. arXiv preprint arXiv:1704.08012.
  • Lebret, R., Grangier, D., & Auli, M. (2016). Neural text generation from structured data with application to the biography domain. arXiv preprint arXiv:1603.07771.
  • Liang, P., Jordan, M. I., & Klein, D. (2009, August). Learning semantic correspondences with less supervision. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1 (pp. 91-99). Association for Computational Linguistics.
  • Lin, C. Y. (2004). Rouge: A package for automatic evaluation of summaries. In Text summarization branches out (pp. 74-81).
  • Lin, D. (1996). On the structural complexity of natural language sentences. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.

217 of 223

References

  • Liu, C. W., Lowe, R., Serban, I. V., Noseworthy, M., Charlin, L., & Pineau, J. (2016). How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. arXiv preprint arXiv:1603.08023.
  • Liu, T., Wang, K., Sha, L., Chang, B., & Sui, Z. (2018, April). Table-to-text generation by structure-aware seq2seq learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Louis, A., & Nenkova, A. (2012, July). A coherence model based on syntactic patterns. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 1157-1168). Association for Computational Linguistics.
  • Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243-281.
  • Mei, H., Bansal, M., & Walter, M. R. (2015). What to talk about and how? selective generation using lstms with coarse-to-fine alignment. arXiv preprint arXiv:1509.00838.
  • Melamed, I. D., Green, R., & Turian, J. P. (2003). Precision and recall of machine translation. In Companion Volume of the Proceedings of HLT-NAACL 2003-Short Papers.

218 of 223

References

  • Miao, Y., & Blunsom, P. (2016). Language as a latent variable: Discrete generative models for sentence compression. arXiv preprint arXiv:1609.07317.
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).
  • Mishra, A., & Bhattacharyya, P. (2018). Cognitively Inspired Natural Language Processing: An Investigation Based on Eye-tracking. Springer.
  • Mishra, A., & Bhattacharyya, P. (2018). Estimating Annotation Complexities of Text Using Gaze and Textual Information. In Cognitively Inspired Natural Language Processing (pp. 49-76). Springer, Singapore.
  • Moryossef, A., Goldberg, Y., & Dagan, I. (2019). Step-by-step: Separating planning from realization in neural data-to-text generation. arXiv preprint arXiv:1904.03396.
  • Mueller, J., Gifford, D., & Jaakkola, T. (2017, August). Sequence to better sequence: continuous revision of combinatorial structures. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 2536-2544). JMLR. org.
  • Munigala, V., Mishra, A., Tamilselvam, S. G., Khare, S., Dasgupta, R., & Sankaran, A. (2018, April). Persuaide! An adaptive persuasive text generation system for fashion domain. In Companion Proceedings of the The Web Conference 2018 (pp. 335-342). International World Wide Web Conferences Steering Committee.

219 of 223

References

  • Mutton, A., Dras, M., Wan, S., & Dale, R. (2007, June). GLEU: Automatic evaluation of sentence-level fluency. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (pp. 344-351).
  • Naber, D. (2003). A rule-based style and grammar checker (pp. 5-7). GRIN Verlag.
  • Nallapati, R., Zhou, B., Gulcehre, C., & Xiang, B. (2016). Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023.
  • Nema, P., & Khapra, M. M. (2018). Towards a better metric for evaluating question generation systems. arXiv preprint arXiv:1808.10192.
  • Nema, P., Shetty, S., Jain, P., Laha, A., Sankaranarayanan, K., & Khapra, M. M. (2018). Generating descriptions from structured data using a bifocal attention mechanism and gated orthogonalization. arXiv preprint arXiv:1804.07789.
  • Nisioi, S., Štajner, S., Ponzetto, S. P., & Dinu, L. P. (2017, July). Exploring neural text simplification models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 85-91).

220 of 223

References

  • Niu, T., & Bansal, M. (2018). Polite dialogue generation without parallel data. Transactions of the Association of Computational Linguistics, 6, 373-389.
  • Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). BLEU: a method for automatic evaluation of machine translation. ACL 2002.
  • Paulus, R., Xiong, C., & Socher, R. (2017). A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.
  • Pennington, J., Socher, R., & Manning, C. (2014, October). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).
  • Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
  • Prakash, A., Hasan, S. A., Lee, K., Datla, V., Qadir, A., Liu, J., & Farri, O. (2016). Neural paraphrase generation with stacked residual LSTM networks. arXiv preprint arXiv:1610.03098.

221 of 223

References

  • Puduppully, R., Dong, L., & Lapata, M. (2018). Data-to-text generation with content selection and planning. arXiv preprint arXiv:1809.00582.
  • Ratnaparkhi, A., Reynar, J., & Roukos, S. (1994). A maximum entropy model for prepositional phrase attachment. In HUMAN LANGUAGE TECHNOLOGY: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994.
  • Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
  • See, A., Liu, P. J., & Manning, C. D. (2017). Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368.
  • Sha, L., Mou, L., Liu, T., Poupart, P., Li, S., Chang, B., & Sui, Z. (2018, April). Order-planning neural text generation from structured data. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Sheika, F. A., & Inkpen, D. (2012). Learning to classify documents according to formal and informal style. Linguistic Issues in Language Technology, 8(1), 1-29.

222 of 223

References

  • Sheikha, F. A., & Inkpen, D. (2011, September). Generation of formal and informal sentences. In Proceedings of the 13th European Workshop on Natural Language Generation (pp. 187-193). Association for Computational Linguistics.
  • Shen, S., Fried, D., Andreas, J., & Klein, D. (2019). Pragmatically Informative Text Generation. arXiv preprint arXiv:1904.01301.
  • Shrivastava, D., Mishra, A., & Sankaranarayanan, K. (2018). Modeling Topical Coherence in Discourse without Supervision. arXiv preprint arXiv:1809.00410.
  • Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006, August). A study of translation edit rate with targeted human annotation. In Proceedings of association for machine translation in the Americas (Vol. 200, No. 6).
  • Specia, L., Turchi, M., Cancedda, N., Dymetman, M., & Cristianini, N. (2009, May). Estimating the sentence-level quality of machine translation systems. In 13th Conference of the European Association for Machine Translation (pp. 28-37).
  • Shen, T., Lei, T., Barzilay, R., & Jaakkola, T. (2017). Style transfer from non-parallel text by cross-alignment. In Advances in Neural Information Processing Systems (NeurIPS 2017).

223 of 223

References

  • Trisedya, B. D., Qi, J., Zhang, R., & Wang, W. (2018). GTR-LSTM: A triple encoder for sentence generation from RDF data. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 1627-1637).
  • Wiseman, S., Shieber, S. M., & Rush, A. M. (2017). Challenges in data-to-document generation. arXiv preprint arXiv:1707.08052.
  • Wubben, S., Van Den Bosch, A., & Krahmer, E. (2010, July). Paraphrase generation as monolingual translation: Data and evaluation. In Proceedings of the 6th International Natural Language Generation Conference (pp. 203-207). Association for Computational Linguistics.
  • Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., ... & Bengio, Y. (2015, June). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (pp. 2048-2057).
  • Zhang, D., Yuan, J., Wang, X., & Foster, A. (2018). Probabilistic verb selection for data-to-text generation. Transactions of the Association for Computational Linguistics, 6, 511-527.
  • Zhou, Q., Yang, N., Wei, F., & Zhou, M. (2018, April). Sequential copying networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Zhu, Y., Wan, J., Zhou, Z., Chen, L., Qiu, L., Zhang, W., ... & Yu, Y. (2019). Triple-to-Text: Converting RDF Triples into High-Quality Natural Languages via Optimizing an Inverse KL Divergence. arXiv preprint arXiv:1906.01965.