Towards Exploiting Background Knowledge for Building Conversation Systems
Nikita Moghe, Siddhartha Arora, Suman Banerjee, Mitesh M. Khapra
Indian Institute of Technology Madras
Robert Bosch Centre for Data Sciences and AI, IITM
Courtesy: Google Images (Cortana, Allo, Echo, Siri, Ask Jenn)
Deep Learning for Conversational AI
Rise of data driven systems
First Chatbot
Template/Rule based Systems
Natural Language Generation
Natural Language Understanding
Dialog State Tracking
Utterance
Response
Rules
Slot-Filling
Rules
Data
Data
Data Hungry
End to End Systems
Data
Weizenbaum, 1966
Ritter et al., 2011; Vinyals and Le, 2015; Lowe et al., 2015
Perez-Marin and Pascual-Nieto, 2011; Shawar and Atwell, 2007b; Williams et al., 2013
Aust et al., 1995; McGlashan et al., 1992; Simpson and Fraser, 1993
Data Data Everywhere..
Based on Movie Scripts
Crawled from websites
60+ such datasets for dialog
~1.3M chats
~930K chats
~1.7B comments
A Survey of Available Corpora for Building Data-Driven Dialogue Systems, Serban et al., arXiv 2015
Based on Human-Human Spoken Interaction
Based on Human Machine Interaction
Logos: https://ubuntuforums.org/, https://twitter.com/, https://www.reddit.com/
Dialog as a Seq2Seq Problem!
Human: Please suggest a movie ?
Bot: Sure! Check out Titanic
please suggest a movie
sure check out <UNK>
Encoder
Decoder
But...
Humans rely on background knowledge to converse
Vinyals et al., ICML 2015
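The `<UNK>` in the decoder output above comes from a fixed vocabulary: any word outside it (here, the movie title) is replaced by a placeholder. A minimal sketch of that preprocessing step, assuming whitespace tokenization and a toy vocabulary. The function name and the vocabulary are illustrative and not taken from the talk.

```python
def to_model_tokens(utterance, vocab, unk="<UNK>"):
    """Lowercase, strip simple punctuation, and map unknown words to `unk`."""
    tokens = []
    for raw in utterance.lower().split():
        word = raw.strip("?!.,")
        if word:  # skip tokens that were punctuation only
            tokens.append(word if word in vocab else unk)
    return tokens


# Toy vocabulary (an assumption): it does not contain "titanic".
vocab = {"please", "suggest", "a", "movie", "sure", "check", "out"}

print(to_model_tokens("Please suggest a movie ?", vocab))
print(to_model_tokens("Sure! Check out Titanic", vocab))
```

A model limited to this vocabulary cannot name the movie, which is one reason to let it draw on background knowledge.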
Has this never been tried before ?
Not really...
Linux
Man Pages
Goal Oriented Dialog
Ubuntu Corpus
But the resources and chats are not tightly coupled!
Open Domain Dialog
Alexa Proceedings 2017
Lowe et al., 2015
Key Contribution
Domain-specific conversation systems with alternate responses explicitly obtained from specific background knowledge
Dataset Creation
The 4C’s of Dataset Creation
Crowdsource
Crawl
Check
Curate
9071 chats
~90K utterances
9278 resources
15.29 avg words/turn
153.07 avg words/chat
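The averages above can be recomputed from the chats themselves. A minimal sketch, assuming each chat is a list of utterance strings and that words are split on whitespace. The released data format and tokenizer may differ, and `chat_stats` is a hypothetical helper.

```python
def chat_stats(chats):
    """Return chat count, utterance count, and average lengths in words."""
    turn_lengths = [len(turn.split()) for chat in chats for turn in chat]
    chat_lengths = [sum(len(turn.split()) for turn in chat) for chat in chats]
    return {
        "chats": len(chats),
        "utterances": len(turn_lengths),
        "avg_words_per_turn": sum(turn_lengths) / len(turn_lengths),
        "avg_words_per_chat": sum(chat_lengths) / len(chat_lengths),
    }


# One two-turn chat taken from the example slides.
example_chats = [[
    "Which is your favourite character in this?",
    "My favorite character was played by Tobey Maguire.",
]]
print(chat_stats(example_chats))
```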
Movie : Spider-Man
Popular and Diverse Movie List
921 Movies
IMDb 250
Top Ten Movies by Genre
1001 Movies you must watch before you die!
Curate
.. spiders through genetic manipulation. While Peter is taking photographs of Mary Jane for the school newspaper, ...
I thought the movie very engrossing. Director Sam Raimi kept the action quotient high but also emphasized the human element of the story
Box Office | $403,706,375 |
Similar Movies | Avengers SpiderMan 2 |
Crazy attention to detail.
It was too heavily reliant on light-hearted humor.
Movie : Spider-Man
Plot
Review
Comments
Wikipedia for Plots
Collected Reviews using IMDb Most Popular Reviews
Crawl
Curate
Official Reddit Pages for Comments
Facts
Wikipedia Infoboxes for Facts
Background Knowledge
S1(N): Which is your favourite character in this?
S2(C): My favorite character was played by Tobey Maguire.
Crowdsource
Crawl
Curate
Chat opening statements
Which is your favourite scene in the movie ?
Which is your favourite character in this ?
What do you think about the movie ?
9 per movie
Facts
My favorite character was played by Tobey Maguire.
Crowdsourcing platforms are meant for Atomic Tasks!
But dialog requires two people
Same worker plays the role of Speaker 1 and Speaker 2
to complete the chat
Self-Chats
Krause et al. Alexa Proceedings 2017
S1(N): I thought he did an excellent job as Peter Parker, I didn’t see what it was that turned him into Spider-Man though.
Self-Chats
Speaker 1 and Speaker 2 are played by the same person.
Speaker 1 is free to talk about anything
S2(P): Well this happens while Peter is taking photographs of Mary Jane for the school newspaper, one of these new spiders lands on his hand and bites him.
Speaker 2 has to reply using background knowledge
Specifically, Speaker 2 selects a contiguous span of words from the resource and appends suitable words
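Given a finished Speaker 2 response, the copied span and the appended words can be recovered by looking for the longest run of words shared with the resource. A minimal sketch using the standard-library `difflib`. The tokenizer and the `locate_span` helper are assumptions for illustration and are not the annotation tooling used for the dataset.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[\w']+", text.lower())


def locate_span(response, resource):
    """Return (span copied from the resource, words added around it)."""
    resp, res = words(response), words(resource)
    matcher = SequenceMatcher(None, resp, res, autojunk=False)
    match = matcher.find_longest_match(0, len(resp), 0, len(res))
    span = res[match.b:match.b + match.size]
    added = resp[:match.a] + resp[match.a + match.size:]
    return span, added


resource = ("While Peter is taking photographs of Mary Jane for the school "
            "newspaper, one of these new spiders lands on his hand and bites him")
response = ("Well this happens while Peter is taking photographs of Mary Jane "
            "for the school newspaper, one of these new spiders lands on his "
            "hand and bites him.")

span, added = locate_span(response, resource)
print("added words:", added)
print("span length:", len(span))
```

Here the span is the plot sentence and the added words are the short lead-in that makes the reply conversational.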
AMT workers are notorious!
Check for:
Crowdsource
Crawl
Check
Curate
Amazon Mechanical Turk https://www.mturk.com
S1 (N): I see. I was very excited to see this film and it did not disappoint!
S2(R): I agree, I thoroughly enjoyed “Spider-Man”
S1(N): I loved that they stayed pretty true to the comic.
S2(C): Yeah, it was a really great comic book adaptation
S1(N): The movie is a great life lesson on balancing power.
S2(F): That is my most favorite line in the movie, ‘With great power comes great responsibility.’
Movie : Spider-Man
Review: I thoroughly enjoyed “Spider-Man” which I saw in a screening. I thought the movie very engrossing. Director Sam Raimi kept the action quotient high, but also emphasized the human element of the story. The casting was perfect. Tobey Maguire was very believable as the gawky teenager in the early part of the film and then, after his run-in with the radioactive
Plot: Peter’s science class takes a field trip to a genetics laboratory at Columbia University. The lab works on spiders and has even managed to create new species of spiders through genetic manipulation. While Peter is taking photographs of Mary Jane for the school newspaper, one of these new spiders lands on his hand and bites him
Comments:
Crazy attention to detail. My favorite character was played by Tobey Maguire. I can’t get over the “I’m gonna kill you dead” line.
No spoilers, but it does start to take itself more seriously towards the finale. It was too heavily reliant on constant light-hearted humor. How ever the constant joking around kinda was low. A really great comic book adaptation.
Fact Table:
Box Office | $403,706,375 |
Taglines | With great power comes great responsibility Get Ready For Ultimate Spin! |
Multi-Reference Test Set
S1(N): I thought he did an excellent job as Peter Parker, I didn’t see what it was that turned him into Spider-Man though.
S2(P): Well this happens while Peter is taking photographs of Mary Jane for the school newspaper, one of these new spiders lands on his hand and bites him.
S1 (N): I see. I was very excited to see this film and it did not disappoint!
Review: I thoroughly enjoyed “Spider-Man” which I saw in a screening. I thought the movie very engrossing. Director Sam Raimi kept the action quotient high, but also emphasized the human element of the story. The casting was perfect. Tobey Maguire was very believable as the gawky teenager in the early part of the film and then, after his run-in with the radioactive
S2 (R): I agree. I thoroughly enjoyed Spider-Man
S2 (R): Also, The casting was perfect
S2 (R): I think so too! Director Sam Raimi kept the action quotient high
78.04% of test set
Several Responses can be correct for a given context
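When several responses are acceptable, a prediction should not be penalized for matching a reference other than the original one. One common way to handle this, assumed here rather than taken from the talk, is to score the prediction against every reference and keep the best score. The sketch uses a simple whitespace-token F1; both helper names are hypothetical.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (whitespace tokens)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def multi_reference_f1(prediction, references):
    """Best F1 of the prediction against any of the references."""
    return max(token_f1(prediction, ref) for ref in references)


references = [
    "I agree. I thoroughly enjoyed Spider-Man",
    "Also, The casting was perfect",
    "I think so too! Director Sam Raimi kept the action quotient high",
]
prediction = "The casting was perfect"

print(token_f1(prediction, references[0]))        # against the first reference only
print(multi_reference_f1(prediction, references))  # best over all three references
```

Scored against the first reference alone, this prediction gets no credit even though it is one of the valid replies.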
But wait… Is this Natural?
500 randomly chosen chats
Evaluated by three in-house annotators per chat
How can self-chat be a dialog?
What if the conversation digressed ?
Does this copy-paste make any sense ?
Metric | Score (out of 5) |
Intelligibility | 4.47 |
Coherence | 4.33 |
Two-person Chat | 4.47 |
On-Topic | 4.57 |
Grammar | 4.41 |
Methods
Generation Based Models: Hierarchical Recurrent Encoder Decoder (HRED), no background knowledge (Serban et al., AAAI 2016)
Copy-or-Generate Models: Get to the Point (GTTP), Document <-> Resource, Summary <-> Response (See et al., ACL 2017)
Span Prediction Models: BiDirectional Attention Flow (BiDAF), Question <-> Context, Document <-> Resource (Seo et al., ICLR 2017)
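The two adaptations amount to renaming the fields of a dialog example so that it fits a summarization model or a reading-comprehension model. A minimal sketch of that mapping. The dictionary keys, the extra `context` field for GTTP, and the character-offset answer format are all assumptions of this sketch and not the input formats of the original implementations.

```python
def to_gttp_example(context, resource, response):
    """Summarization framing: resource as document, response as summary."""
    # Keeping the context in its own field is an assumption of this sketch.
    return {"document": resource, "summary": response, "context": context}


def to_bidaf_example(context, resource, span):
    """Reading-comprehension framing: context as question, resource as document."""
    start = resource.find(span)
    if start < 0:
        raise ValueError("span must occur verbatim in the resource")
    return {
        "question": context,
        "document": resource,
        "answer_start": start,           # character offsets (assumed format)
        "answer_end": start + len(span),
    }


resource = "The casting was perfect. Tobey Maguire was very believable"
context = "I see. I was very excited to see this film and it did not disappoint!"

qa = to_bidaf_example(context, resource, "The casting was perfect")
print(resource[qa["answer_start"]:qa["answer_end"]])
```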
Challenges
S1 (N): I see. I was very excited to see this film and it did not disappoint!
S2(R): I agree, I thoroughly enjoyed “Spider-Man”
S1(N): I loved that they stayed pretty true to the comic
Oracle
Mixed-Long
Mixed-Short
BiDirectional Attention Flow fails beyond 256 resource words!
Average combined resource length ~ 900 words
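With combined resources of about 900 words and a 256-word limit, the resource has to be shortened without losing the words the response was copied from. The slide does not say how the shorter version is built. One plausible approach, assumed here, is to keep a 256-word window placed around the answer span; `crop_around_span` is a hypothetical helper.

```python
def crop_around_span(words, span_start, span_end, max_words=256):
    """Return (window, new_start, new_end) with words[span_start:span_end] kept."""
    span_len = span_end - span_start
    if span_len > max_words:
        raise ValueError("span is longer than the word budget")
    if len(words) <= max_words:
        return words, span_start, span_end
    slack = max_words - span_len
    # Try to centre the span, then clamp the window to the resource bounds.
    left = max(0, span_start - slack // 2)
    left = min(left, len(words) - max_words)
    window = words[left:left + max_words]
    return window, span_start - left, span_end - left


# Dummy 900-word resource with the answer span at words 700..719.
resource_words = [f"w{i}" for i in range(900)]
window, start, end = crop_around_span(resource_words, 700, 720)
print(len(window), start, end)
```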
Results
Model | Type | F1 | BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L |
HRED | - | - | 5.23 | 24.55 | 7.61 | 18.87 |
GTTP | oracle | - | 13.92 | 30.32 | 17.78 | 25.67 |
GTTP | mixed-short | - | 11.05 | 29.66 | 17.7 | 25.13 |
GTTP | mixed-long | - | 7.51 | 23.2 | 9.91 | 17.35 |
BiDAF | oracle | 39.69 | 28.85 | 39.68 | 33.72 | 35.91 |
BiDAF | mixed-short | 45.72 | 32.95 | 45.69 | 40.18 | 43.8 |
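For readers unfamiliar with the ROUGE-L column: it rewards the longest common subsequence (LCS) of words shared by the generated and the reference response. A minimal sketch, assuming whitespace tokens and an equal weighting of precision and recall. Official ROUGE toolkits apply their own tokenization and weighting, so this will not reproduce the table values, which are reported as percentages.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """LCS-based F1 between a candidate and a reference response (0 to 1)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("I agree, I thoroughly enjoyed the movie",
                 "I agree, I thoroughly enjoyed Spider-Man"))
```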
Background knowledge helps! But the correct background knowledge helps even more!
Span prediction models perform better
BiDAF can choose to ignore noise. GTTP makes no such distinction.
Results
On Multi Reference Test Set (78.04%)
Model | Type | F1 | BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L |
HRED | - | - | 5.38 | 25.38 | 8.35 | 19.67 |
GTTP | oracle | - | 16.46 | 32.74 | 20.20 | 28.23 |
GTTP | mixed-short | - | 15.68 | 31.71 | 19.72 | 27.35 |
GTTP | mixed-long | - | 8.73 | 25.51 | 12.13 | 19.57 |
BiDAF | oracle | 47.18 | 34.98 | 46.49 | 40.58 | 42.64 |
BiDAF | mixed-short | 51.35 | 39.39 | 50.73 | 45.01 | 46.95 |
What next?
Finding the right resource is important
Cross attention mechanisms are useful
Generation models are more human-like
Code and data : https://github.com/nikitacs16/Holl-E
Siddhartha Arora
Suman Banerjee
Mitesh M. Khapra
Microsoft Research India
(Student Travel Grant)
TextKernel
(EMNLP Student Travel Scholarship)
Thank You!
Questions/Suggestions
Human Evaluation of Responses
Model | Type | Human-Like | Appropriate | Fluency | Specificity |
HRED | - | 2.91 | 1.97 | 2.74 | 2.14 |
GTTP | oracle | 4.1 | 3.82 | 4.03 | 3.33 |
GTTP | mixed-long | 2.93 | 3.46 | 3.42 | 2.6 |
BiDAF | oracle | 3.78 | 4.17 | 4.05 | 3.76 |
BiDAF | mixed-short | 3.41 | 3.5 | 3.47 | 3.3 |