Commonsense
and how far have we come in giving our NLU systems common sense?
Oct 2022
Nasrin Mostafazadeh
Co-founder at Verneek (stealth)
@nasrinmmm
nasrin@verneek.com
Verneek is a deep-tech AI company in NYC with the mission of “enabling anyone to make better & faster decisions”. Verneek’s proprietary AI technologies power truly intuitive modalities of interaction on top of heterogeneous data sources.
We are hiring across all roles!
State of AI, ~17 years ago
Robotics vs. NLP
Answering the following basic question was deemed very challenging for AI systems at the time!
Correct answer: Monkey
State of AI, now!
Robotics vs. NLP
Stanford CoreNLP Coreference Resolver
(March 2022)
And it’s still challenging!!
GPT-3
… still doesn’t work!
Why is NLU so Hard?
People outside the field often don’t understand what is even “AI”-related about human language, let alone why it’s the holy grail of AI!
… because of the Dual Problem of Language Ambiguity & Meaning Variability
The same expression can mean different things (ambiguity).
The same meaning can be expressed in many ways (variability).
… because tackling the dual problem requires enormous amounts of world, common sense, and linguistic knowledge
We use our knowledge about entities and their attributes, selectional restrictions of verbs, and beyond…
Moravec’s Paradox
Skills that appear effortless can be difficult to reverse-engineer, while skills that require effort may not necessarily be difficult to engineer at all.
So NLU is hard because of …
the knowledge acquisition bottleneck!
The Spectrum of Knowledge:
from Word Knowledge to World Knowledge
Lexical Knowledge → Common Sense Knowledge → World Knowledge
What is Common Sense Knowledge?
Well, the definition of common sense is not common sense!
My definition of common sense knowledge
The most fundamental and general knowledge about the world, shared by most people, including a 5-year-old kid. This includes:
According to my definition, here are the loose boundaries of common sense knowledge vs world/domain knowledge
It is:
Commonsense Reasoning
The basic reasoning abilities for connecting the dots and applying common sense knowledge in everyday contexts!
What does Having Common Sense Mean?
Common Sense Knowledge + Commonsense Reasoning Capabilities
A program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows. – McCarthy (1959)
AI has common sense if it doesn’t make dumb mistakes. – Someone on Twitter! (2021)
How do we tackle the problem of
common sense knowledge acquisition?
Approaches for Acquiring Common Sense Knowledge
“Let’s roll up our sleeves and get on with it!” But it’s a daunting task…
Enter:
GOFAI
CYC: The Common Sense Knowledge Base
Dark times came after commonsense reasoning failed to deliver on the investments of the 1980s!
(sources: Towards Data Science; Yejin Choi)
…when I started my PhD in Commonsense Reasoning in 2012, still no one really cared about the subject…
We have had much better approaches for acquiring common sense knowledge since the 2000s.
Overview of existing common sense resources
- Cyc (Lenat et al., 1984)
- Open Mind Common Sense (Singh, 2002)
- ConceptNet (Liu & Singh, 2004)
- OpenCyc (Lenat, 2004)
- ResearchCyc (Lenat, 2006)
- NELL (Carlson et al., 2010)
- OpenCyc 4.0 (Lenat, 2012)
- Web Child (Tandon et al., 2014)
- NELL (Mitchell et al., 2015)
- ConceptNet 5.5 (Speer et al., 2017)
- Web Child 2.0 (Tandon et al., 2017)
- ATOMIC (Sap et al., 2019)
Benchmarks with a small amount of direct supervision…
Winograd Schema Challenge (WSC)
273 examples
(Levesque, 2011)
Choice of Plausible Alternatives (COPA)
500 dev, 500 test
(Roemmele et al., 2011)
‘John didn’t see Brian’s car coming, because he was dizzy.’
‘Who was dizzy?’
‘John didn’t see Brian’s car coming, because he had his lights off.’
‘Who had his lights off?’
The Winograd Schema commonsense reasoning challenge, proposed by Levesque (2011).
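A Winograd pair like the one above can be framed programmatically. Below is a minimal sketch (my own illustrative structure, not the official WSC release format) of representing such a pair and scoring a resolver against it:

```python
# Sketch: a Winograd schema pair plus a naive baseline resolver.
# The schema text is from the slide; all class/function names are illustrative.
from dataclasses import dataclass

@dataclass
class WinogradSchema:
    sentence: str        # contains an ambiguous pronoun
    pronoun: str
    candidates: tuple    # the two candidate referents
    answer: str          # the correct referent

schemas = [
    WinogradSchema(
        sentence="John didn't see Brian's car coming, because he was dizzy.",
        pronoun="he",
        candidates=("John", "Brian"),
        answer="John",
    ),
    WinogradSchema(
        sentence="John didn't see Brian's car coming, because he had his lights off.",
        pronoun="he",
        candidates=("John", "Brian"),
        answer="Brian",
    ),
]

def evaluate(predict, schemas):
    """Accuracy of a resolver over a set of schemas."""
    correct = sum(predict(s) == s.answer for s in schemas)
    return correct / len(schemas)

# A naive "most recent candidate" baseline: pick the candidate mentioned
# last before the pronoun. It fails on the first schema, which is the whole
# point of the benchmark: both sentences look identical on the surface.
def recency_baseline(s):
    positions = {c: s.sentence.find(c) for c in s.candidates}
    return max(positions, key=positions.get)

print(evaluate(recency_baseline, schemas))  # 0.5: surface cues don't help
```

The baseline gets exactly one of the two right, illustrating why a schema pair defeats pattern matching: resolving both requires knowing that dizziness belongs to the observer and headlights to the car's driver.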
Around 2016, we saw a renewed interest in commonsense reasoning...
...with deep learning folks acknowledging it as the
Holy Grail of AI!
The paradigm shift in NLU, since 2015…
Chris Manning
Schank and Abelson, 1977; Dyer, 1983; Charniak, 1972; Turner, 1994; Schubert and Hwang, 2000, …
Story Understanding
Storytelling
Story Generation
Narrative Intelligence
Script Learning
Narrative Structure Learning
Episodic Knowledge Acquisition
Story understanding has been one of the oldest ambitions of AI, and one of its most challenging tasks!
Story Understanding
Story Generation
Story Cloze Test (Mostafazadeh et al., 2016): a commonsense reasoning benchmark
Context:
Jim found an old disposable camera in the bottom of his junk drawer. He began snapping away at everything around him. The counter clicked down to one final photo. The gravity of the situation began to dawn on Jim.
Two alternative endings:
Jim took time to decide what he would take a picture of.
Jim took 20 more photos.
A challenging commonsense reasoning task, where SOTA was ~65% for many months after the release of the dataset.
Story Cloze Test has only ~1,500 direct-supervision data points
We intentionally did not provide a large training set with direct supervision, with the goal of pushing commonsense reasoning forward and preventing the task from becoming yet another pattern-recognition/memorization task.
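The task format above reduces to a two-choice scoring problem. Here is a sketch using the story and endings from the slides; `plausibility` is a placeholder for any model score (a real system would use, e.g., a language model's likelihood of context + ending), and the toy word-overlap scorer stands in only to show the mechanics:

```python
# Sketch: Story Cloze Test as a two-choice scoring problem.
def pick_ending(context, endings, plausibility):
    """Choose the ending the scorer finds more plausible given the context."""
    return max(endings, key=lambda e: plausibility(context, e))

context = (
    "Jim found an old disposable camera in the bottom of his junk drawer. "
    "He began snapping away at everything around him. "
    "The counter clicked down to one final photo. "
    "The gravity of the situation began to dawn on Jim."
)
endings = [
    "Jim took time to decide what he would take a picture of.",
    "Jim took 20 more photos.",
]

# Toy stand-in scorer: raw word overlap with the context. A real system
# would use a trained model; shallow cues like this are exactly what the
# benchmark was designed to defeat.
def overlap_score(ctx, ending):
    ctx_words = set(ctx.lower().split())
    return len(ctx_words & set(ending.lower().split()))

print(pick_ending(context, endings, overlap_score))
```

On this particular instance the overlap scorer happens to select the first ending; in general, commonsense (here: the camera has only one photo left) rather than lexical overlap is what the task demands.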
Things got interesting in 2017-2018!
A brand-new SOTA was established on various, supposedly more complex, reading comprehension tasks.
GPT-1, 2018
(Radford et al., 2018)
These results were on the Story Cloze Test v1, where there had been some stylistic biases (Sap et al., 2017).
We tested a host of models on the new, debiased Story Cloze Test v1.5 test set (Sharma et al., 2018).
GPT-1 (Radford et al., 2018) was then the only model that still held its rather high performance!
So, do these models actually have narrative understanding? Are they actually learning to transfer various lexical, conceptual, commonsense, and world knowledge?
GPT-3, 2020...
We have come a long way… We have to keep moving the goalposts!
So we’ve come a rather long way in the last few years in giving AI systems common sense reasoning capabilities, with lots of exciting progress.
We need to work on tackling the following issues, which we are still grappling with…
Issue: our otherwise amazing models often make glaringly stupid mistakes and are brittle! This makes it hard to deploy these models in real-world products.
Other Issues...
…and we don’t yet have an AI system with the common sense of perhaps even a dog, let alone a 5-year-old kid!
Startups are a perfect setting for developing novel AI models that actually have to work in the real messy world!
Moving the Goalpost on
Natural Language Understanding
When humans, even young children, read, they make countless implicit commonsense inferences that frame their understanding of the unfolding narrative!
Peppa was riding her bike.
A car turned in front of her. Peppa turned her bike sharply.
She fell off of her bike.
Peppa skinned her knee.
(adapted from the ROCStories corpus)
While reading, humans construct a coherent representation of what happened and why, combining information from the text with relevant background knowledge.
Humans can construct the causal chain that describes how the sequence of events led to a particular outcome!
A car turned in front of Peppa
→ causes → Peppa turned her bike sharply
→ causes → Peppa fell off of her bike
→ causes → Peppa skinned her knee
→ causes → (likely) she asks for help!
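The causal chain can be sketched as a tiny directed graph in plain Python. The structure below is my own illustration; GLUCOSE itself, introduced later, represents such links as semi-structured inference rules:

```python
# Sketch: the Peppa causal chain as a directed "causes" graph.
events = [
    "A car turned in front of Peppa",
    "Peppa turned her bike sharply",
    "Peppa fell off of her bike",
    "Peppa skinned her knee",
    "(likely) Peppa asks for help",
]

# Each event causes the next one in the chain.
causes = {a: b for a, b in zip(events, events[1:])}

def downstream_effects(event):
    """Follow 'causes' edges to list everything an event leads to."""
    chain = []
    while event in causes:
        event = causes[event]
        chain.append(event)
    return chain

print(downstream_effects("Peppa turned her bike sharply"))
```

Querying any event yields its full downstream consequences, which is the kind of "connecting the dots" that humans do implicitly while reading.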
Humans can also describe how characters’ different states, such as emotions and location, changed throughout the story.
Peppa went from feeling (likely) happy to feeling in pain after falling.
Peppa was on her bike while riding it. Then, after falling, Peppa was on the ground.
Though humans build such mental models of situations with ease (Zwaan et al., 1995), AI systems for tasks such as reading comprehension and dialogue remain far from exhibiting similar commonsense reasoning capabilities!
Why?
Difficulty in acquiring (often-implicit) commonsense knowledge at scale.
Difficulty in incorporating knowledge into state-of-the-art AI systems.
GLUCOSE: GeneraLized and COntextualized Story Explanations!
(Mostafazadeh et al., 2020)
The GLUCOSE Task
GLUCOSE framework through an example:
Peppa was riding her bike. A car turned in front of her. Peppa turned her bike sharply. She fell off of her bike. Peppa skinned her knee.
Dim #1: Is there an event that directly causes or enables X?
Dim #2: Is there an emotion or basic human drive that motivates X?
Dim #3: Is there a location state that enables X?
Generalized: general rules provide mini-theories about the world!
Contextualized: specific statements exemplify how a general rule can be grounded in a particular context.
Semi-structured inference rule = antecedent + connective + consequent
GLUCOSE captures mini causal theories about the world focused around events, states (location, possession, emotion, etc), motivations, and naive human psychology.
Dim #4: Is there a possession state that enables X?
Dim #5: Are there any other attributes enabling X?
GLUCOSE offers a unique perspective on commonsense reasoning: it presents often-implicit commonsense knowledge in the form of semi-structured general inference rules that are also grounded in the context of a specific story!
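A GLUCOSE entry pairs a story-specific statement with its generalization. The sketch below is my own illustrative representation (the field names and example rule are mine, not the release format, which lives in the GLUCOSE repository linked later):

```python
# Sketch: one GLUCOSE rule, with a specific (contextualized) form and a
# general form using typed variables such as SomeoneA / SomethingA.
from dataclasses import dataclass

@dataclass
class GlucoseRule:
    dimension: int            # 1-10, e.g. 1 = "event that directly causes/enables X"
    specific_antecedent: str  # grounded in one story
    specific_consequent: str
    general_antecedent: str   # mini-theory with typed variables
    general_consequent: str
    connective: str = ">Causes/Enables>"

    def specific(self):
        return f"{self.specific_antecedent} {self.connective} {self.specific_consequent}"

    def general(self):
        return f"{self.general_antecedent} {self.connective} {self.general_consequent}"

rule = GlucoseRule(
    dimension=1,
    specific_antecedent="Peppa turned her bike sharply",
    specific_consequent="Peppa fell off of her bike",
    general_antecedent="SomeoneA turns SomethingA (that is a vehicle) sharply",
    general_consequent="SomeoneA falls off of SomethingA",
)
print(rule.specific())
print(rule.general())
```

Keeping both forms in one record is the key design choice: the specific statement grounds the rule in a story, while the general statement is the reusable mini-theory.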
How to address the problem of implicit knowledge acquisition at scale?
Filling in the GLUCOSE dimensions is a demanding task: it requires grasping the concepts of causality and generalization, and writing semi-structured inference rules.
An effective multi-stage crowdsourcing platform
After many rounds of pilot studies, we designed a platform for collecting GLUCOSE data that is cognitively accessible to laypeople!
GLUCOSE Qualification UI
GLUCOSE Main UI
GLUCOSE Review Dashboard
Statistics and Examples
Total annotations: ~670K
Total unique stories: 4,881
Workers who participated: 371
Various implicit and script-like mini-theories:
Number of rules collected for each of the GLUCOSE dimensions
GLUCOSE captures extensive commonsense knowledge that is unavailable in the existing resources
Ceiling overlap between GLUCOSE and other resources, based on a best-effort mapping of relations:
GLUCOSE Dim                        1      2      5      6      7      10
ConceptNet (Speer et al., 2017)    1.2%   0.3%   0%     1.9%   0%     0%
ATOMIC (Sap et al., 2019)          7.8%   1.2%   2.9%   5.3%   1.8%   4.9%
ATOMIC: inferential knowledge in natural language form
ATOMIC: 880,000 triples for AI systems to reason about causes and effects of everyday situations
[Figure: a central event, “X repels Y’s attack”, linked along nine inference dimensions, organized by Causes vs. Effects, Static vs. Dynamic, Involuntary vs. Voluntary, and Theme vs. Agent.]
300,000 event nodes to date
880,000 if-Event-then-* knowledge triples
ATOMIC: knowledge of cause and effect
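An ATOMIC-style if-Event-then-* triple can be sketched as a keyed lookup. The dimension names (xIntent, xEffect, oReact) are from the ATOMIC paper, but the specific inferences in this dictionary are illustrative, not drawn from the released data:

```python
# Sketch: ATOMIC-style if-Event-then-* knowledge triples.
# Keys: (event, inference dimension); values: natural-language inferences.
atomic = {
    ("PersonX repels PersonY's attack", "xIntent"): ["to protect themselves"],
    ("PersonX repels PersonY's attack", "xEffect"): ["gains respect"],
    ("PersonX repels PersonY's attack", "oReact"): ["weak", "ashamed"],
}

def infer(event, dimension):
    """Look up if-Event-then-* inferences for an event along one dimension."""
    return atomic.get((event, dimension), [])

print(infer("PersonX repels PersonY's attack", "oReact"))
```

The point of the natural-language form is that both events and inferences are free text, so a generative model can be trained to produce such inferences for events never seen in the knowledge base.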
(Pearl; Davis and Marcus 2015; Lake et al. 2017; Marcus 2018)
Theory of Mind
How do we incorporate commonsense knowledge into state-of-the-art AI systems?
GLUCOSE Empirical Evaluation Task: a testbed for evaluating models that can dynamically produce GLUCOSE-like inferences on novel input
We designed a specialized Human Evaluation UI for collecting reliable, reproducible, and calibrated ratings!
Automatic Evaluation of natural language generations in GLUCOSE
The GLUCOSE task has a systematic evaluation that is fast and easily replicable!
Strong correlation between the human and automatic metrics!
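Checking that an automatic metric tracks human judgment boils down to computing a correlation between the two rating series. Below is a toy sketch in pure Python; the rating values are made up for illustration only (the actual GLUCOSE evaluation correlates human ratings with BLEU-style scores):

```python
# Sketch: Pearson correlation between human ratings and an automatic metric.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative values only: one human score and one metric score per system output.
human_ratings = [2.8, 2.6, 2.1, 1.5, 0.9]       # e.g. a 0-3 quality scale
metric_scores = [71.3, 66.2, 58.0, 40.1, 22.5]  # e.g. BLEU-style scores

r = pearson(human_ratings, metric_scores)
print(round(r, 3))  # close to 1.0 when the metric tracks human judgment
```

A high correlation is what licenses replacing slow, expensive human evaluation with the fast automatic metric during model development.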
Example predictions, Dimension 3 (a location enabling X):
GPT-2:
  She was in front of a TV >Enables> Karen made a pan of lasagna.
Enc-Dec (Raffel et al., 2019):
  specific: Karen is in the kitchen >Enables> Karen makes a pan of lasagna
  general: SomeoneA is in a kitchen >Enables> SomeoneA cooks SomethingA
  Avg: specific 2.8/3, general 2.6/3
Human:
  specific: Karen is in the kitchen >Enables> Karen made a pan of lasagna
  general: SomeoneA is in a kitchen >Enables> SomeoneA prepares SomethingA (that is a dish)
  Avg: specific 2.6/3, general 2.3/3
Example predictions, Dimension 6 (an event that X Causes/Enables):
Enc-Dec:
  specific: Karen makes a pan of lasagna >Causes/Enables> Karen eats it for a week
  general: SomeoneA makes SomethingA (that is food) >Causes/Enables> SomeoneA eats SomethingA
Human:
  specific: Karen makes a pan of lasagna >Causes/Enables> Karen brought it to the party
  general: SomeoneA prepares SomethingA (that is a dish) >Causes/Enables> SomeoneA takes SomethingA to SomethingB (that is an event)
Conclusion
We proved the following hypothesis:
A promising new recipe for giving machines common sense is to use high-quality commonsense knowledge such as GLUCOSE for training neural models that have pre-existing lexical and conceptual knowledge.
A static commonsense knowledge base with GLUCOSE mini-theories authored by humans is less valuable than a GLUCOSE-trained model that can dynamically generate GLUCOSE dimensions for any novel input.
Classic commonsense knowledge bases have been static; new commonsense knowledge bases should be dynamic!
All the data and models can be found under: https://github.com/ElementalCognition/glucose/
Are our commonsense reasoning systems simply hallucinating?!
Perhaps...
“Our conscious reality is just a hallucination that we collectively agree upon.” --Anil Seth
Actually, as humans, we also don’t just passively perceive the world; we actively generate it and make best guesses based on our prior experiences!
So, our AI reasoning systems also simply make their best guesses given the perceptual evidence, and those guesses will always be defeasible in light of further evidence.
To conclude:
We have come a long way in giving our AI systems narrative understanding, but we are still only at the foothills...
[Figure: a mountain contrasting “where we are” (the foothills) with “where we need to get” (the summit).]
“To build truly intelligent machines, teach them cause & effect” --Judea Pearl
Truly intelligent AI systems need to learn to build causal explanations
Any Questions?
☺