The role of discreteness in gradient inference judgments
Aaron Steven White
University of Rochester
Colloquium Talk
Leipzig University
31 May 2023
A case study of lexically triggered inferences
Overarching Question: In using a particular linguistic expression, what can we mean and what must we mean?
Bo: [to Jo] I thought you were heading out to get groceries.
Jo: I remembered that I went a few days ago, but I'm now realizing I forgot to grab beer.

⇝ Bo no longer thinks that Jo is heading out to get groceries. (cessation inference)
⇝ Jo went to get groceries a few days before. (veridicality inference)
⇝ Jo did not grab beer. (veridicality inference)
Observation: What we can mean in using an expression is constrained by lexical knowledge (remembered describes a mental event grounded in experience) in conjunction with structural knowledge.
Now contrast a finite complement of forget:

Bo: [to Jo] I thought you were heading out to get groceries.
Jo: I remembered that I went a few days ago, and I'm now realizing I forgot that I even grabbed beer already.

⇝ Bo no longer thinks that Jo is heading out to get groceries. (cessation inference)
⇝ Jo went to get groceries a few days before. (veridicality inference)
⇝ Jo grabbed beer already. (veridicality inference)

With forgot that in place of forgot to, the inference about the beer flips: the finite complement is entailed rather than negated.
Question: What knowledge undergirds what we must mean in using an utterance?
[Diagram: language pairs structures with concepts]

forget + NP __ to VP ⇝ NP not VP
forget + NP __ that S ⇝ S
Question: What types of concepts does language "see"?
Prior Work: For some areas of the lexicon, we have a solid understanding of what language "sees".
Generalization #1 (Barwise & Cooper 1981): Determiners are conservative: if D is a determiner, then D expresses a relation R between sets A and B such that R(A, B) iff R(A, A ∩ B).
Every greyhound is happy.
Every greyhound is a happy greyhound.
Some greyhound is happy.
Some greyhound is a happy greyhound.
Most greyhounds are happy.
Most greyhounds are happy greyhounds.
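To make the conservativity schema concrete, here is a toy check (my own illustration, not from the talk) of the three determiner denotations as relations between Python sets; the sets and element names are invented:

```python
# Toy conservativity check: R(A, B) iff R(A, A & B) for each determiner.
def every(A, B): return A <= B                   # all As are Bs
def some(A, B):  return bool(A & B)              # some A is a B
def most(A, B):  return len(A & B) > len(A - B)  # more As in B than out

greyhounds = {"g1", "g2", "g3"}
happy = {"g1", "g2", "x9"}

for D in (every, some, most):
    # Restricting the second argument to A & B never changes the verdict.
    assert D(greyhounds, happy) == D(greyhounds, greyhounds & happy)
```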
Language only “sees” relational concepts of this form.
Generalization #2 (Gärdenfors 2000, Jäger 2010): Color terms express convex regions in color space.
Language only “sees” convex color concepts.
Challenge: As we expand to larger and more open classes of words, generalizations tend to be harder to find.

Reason: Standard methodologies for discovering lexical generalizations do not scale well, because we lack good sampling methodologies.
White, Aaron Steven. 2021. On Believing and Hoping Whether. Semantics and Pragmatics 14 (6): 1–18.
Proposed Generalization #1 (Egré 2008; see also Hintikka 1975): A predicate triggers veridicality inferences iff it takes both declarative and interrogative clauses.
[Figure: predicates that trigger veridicality inferences vs. predicates that take both declaratives and interrogatives; White & Rawlins 2018, White 2021]
Proposed Generalization #2 (Zuber 1983; Theiler et al. 2019): If a predicate triggers neg-raising inferences, it does not take interrogative clauses.
Question: Could this be because there are no lexical generalizations to be had for these inferences?

Possibility #1 (see Degen & Tonhauser 2022 on veridicality inferences): There is no discrete classification of concepts that determines these inferences. It's all pragmatics.
Simons et al. 2010, 2017
Judith Degen
Stanford University
Judith Tonhauser
University of Stuttgart
Evidence: Degen & Tonhauser 2022, Fig. 7, derived from White & Rawlins' (2018) MegaVeridicality dataset. (Note how few predicates sit at the top of the scale!)

This evidence is extremely weak! The gradience may arise from noise in the measurement that obscures discrete classes.
Possibility #2: There is such a classification. It's just obscured by noise in the measurement. Pursued in this talk!
Part 1: Inference Patterns. Do we find recurring, distributionally correlated patterns of inference when filtering noise?
Jo hated that Bo left. ⇝ Bo left.
Veridicality inference: NP V S ⇝ S

Jo hated that Bo left. ⇝ Jo believed Bo left.
Doxastic inference: NP V S ⇝ NP believe S

Jo hated that Bo left. ⇝ Jo didn't want Bo to have left.
Bouletic inference: NP V S ⇝ NP not want S
Predicate | NP V S ⇝ S | NP V S ⇝ NP believe S | NP V S ⇝ NP want S |
think | 0 | + | 0 |
doubt | 0 | - | 0 |
hope | 0 | 0 | + |
hate | + | + | - |
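One convenient way to encode the table's signatures in code (an illustrative representation I am assuming here, not the talk's actual data format):

```python
# Inference signatures from the table above:
# +1 = positive inference, -1 = negative inference, 0 = no inference.
# Columns: (NP V S ⇝ S, NP V S ⇝ NP believe S, NP V S ⇝ NP want S)
SIGNATURES = {
    "think": ( 0, +1,  0),
    "doubt": ( 0, -1,  0),
    "hope":  ( 0,  0, +1),
    "hate":  (+1, +1, -1),
}
```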
Part 1: Inference Patterns. Do we find recurring, distributionally correlated patterns of inference when filtering noise? Yes! Though some patterns are associated with defeasible inferences.

Part 2: Defeasible Inferences. What could it mean for an inference pattern to contain a defeasible inference?

Possibility #1: A pattern containing a defeasible inference implies irreducible gradience (e.g. vagueness) in the semantic content conditioning that pattern.

Possibility #2: A pattern containing a defeasible inference implies some gradient metalinguistic knowledge about the semantic content conditioning that pattern. Pursued in this talk!

Answer pursued here: It's gradient metalinguistic knowledge about sense and/or structural indeterminacy.

Takeaway: We should continue to posit discrete semantic contents that pragmatics selects among.
Inference Patterns

Collaborators: Ben Kane (University of Rochester), Will Gantt (University of Rochester), Kyle Rawlins (Johns Hopkins University), Ellise Moon (University of Rochester), Hannah An (University of Rochester), Yu'an Yang (Amazon), Zhendong Liu (University of Southern California), Nick Huang (National University of Singapore), Ben Van Durme (Johns Hopkins University), Rachel Rudinger (University of Maryland)
Approach
Veridicality inference: NP V S ⇝ S; NP not V S ⇝ (not) S
Doxastic inference: NP V S ⇝ NP believe S; NP not V S ⇝ NP (not) believe S
Bouletic inference: NP V S ⇝ NP want S; NP not V S ⇝ NP (not) want S
Neg-raising inference: NP not V S ⇝ NP V not S
Takeaway: There are recurring, distributionally correlated patterns of inference when filtering noise!
Roadmap
Inference Patterns: Measuring Distribution
Kyle Rawlins
Johns Hopkins University
MegaAcceptability dataset
Acceptability for 1,000 verbs in 50 syntactic frames focused on clause-embedding.
White & Rawlins 2016, 2020
Verbs: think, know, wonder, love, surprise, tell, say, start, stop, …
Bleaching method
Frame templates (e.g. NP __ that S) instantiated by semantically bleached fillers.
Someone __ed something happened
Someone __ed that something happened
Someone __ed whether something happened
Someone __ed which someone something happened
Someone __ed someone that something happened
Someone __ed someone whether something happened
Someone __ed to someone that something happened
Someone __ed to do something
Someone __ed someone to do something
...
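A minimal sketch of how such items can be generated, assuming a hand-written past-tense table and template syntax of my own; the dataset's actual generation pipeline is not shown in the talk:

```python
# Cross bleached frame templates with verbs. Ungrammatical combinations
# are intentional: the task is to rate their acceptability.
PAST = {"think": "thought", "know": "knew", "wonder": "wondered",
        "love": "loved", "tell": "told", "say": "said"}

FRAMES = [
    "Someone {v} that something happened",
    "Someone {v} whether something happened",
    "Someone {v} someone that something happened",
    "Someone {v} to someone that something happened",
]

items = [f.format(v=PAST[verb]) for verb in PAST for f in FRAMES]
print(items[0])  # Someone thought that something happened
```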
Verbs × Frames: 50,000 total items × 5 judgments per item
Question
Is bleaching a valid method for capturing the acceptability of a verb in a frame?
Validation Strategy
Compare judgments for bleached items against judgments from trained linguists.
Validation data: Sprouse et al.'s (2013) dataset of acceptability judgments from Linguistic Inquiry (LI).

Comparison: [Figure: correlation between judgments from the LI dataset and MegaAcceptability]
Conclusion
Safe to use bleaching to collect acceptability judgments focused on capturing selection.
Inference Patterns: Measuring Inference
Recipe
Veridicality task
White & Rawlins 2018
Kyle Rawlins
Johns Hopkins University
Ben Van Durme
Johns Hopkins University
Rachel Rudinger
University of Maryland
Someone was irritated that a particular thing happened.
Did that thing happen? [no / maybe or maybe not / yes]
Someone {knew, didn't know} that a particular thing happened.
NP _ that S
Someone {was, wasn't} surprised that a particular thing happened.
NP be _ that S
Someone {needed, didn’t need} for a particular thing to happen.
NP _ for NP to VP
Someone {told, didn’t tell} a particular person to do a particular thing.
Someone {believed, didn’t believe} a particular person to have a particular thing.
NP _ NP to VP[+/-eventive]
A particular person {was, wasn’t} excited to do a particular thing.
A particular person {was, wasn’t} suspected to have a particular thing.
NP be _ to VP[+/-eventive]
A particular person {managed, didn’t manage} to do a particular thing.
A particular person {seemed, didn’t seem} to have a particular thing.
NP _ to VP[+/-eventive]
Neg-raising task
An & White 2020
Hannah An
University of Rochester
If I were to say I don’t think that a particular thing happened, how likely is it that I mean I think that that thing didn’t happen?
[Slider response: Extremely unlikely to Extremely likely]
NP _ that S:
A particular person {didn't, doesn't} know that a particular thing happened.
I {didn't, don't} know that a particular thing happened.

NP be _ that S:
A particular person {wasn't, isn't} surprised that a particular thing happened.
I {wasn't, 'm not} surprised that a particular thing happened.

NP be _ to VP[+/-eventive]:
A particular person {wasn't, isn't} {told to do, believed to have} a particular thing.
I {wasn't, 'm not} {told to do, believed to have} a particular thing.

NP _ to VP[+/-eventive]:
A particular person {didn't, doesn't} {manage to do, seem to have} a particular thing.
I {didn't, don't} {manage to do, seem to have} a particular thing.
Doxastic task (Kane et al. 2021): If A knew that C happened, how likely is it that A believed that C happened? [Slider: Extremely unlikely to Extremely likely]
If A persuaded B that C happened, how likely is it that B believed that C happened? [same slider]
Bouletic task (Kane et al. 2021): If A was appalled that C happened, how likely is it that A wanted C to have happened? [Slider: Extremely unlikely to Extremely likely]
If A apologized to B that C happened, how likely is it that B wanted C to have happened? [same slider]
A {knew, didn't know} that C happened.
NP _ that S
A {told, didn't tell} B that C happened.
NP _ NP that S
A {said, didn't say} to B that C happened.
NP _ to NP that S
A {was, wasn’t} surprised that C happened.
NP be _ that S
A {hoped, didn't hope} that C would happen.
NP _ that S[+future]
A {promised, didn't promise} B that C would happen.
NP _ NP that S[+future]
A {predicted, didn't predict} to B that C would happen.
NP _ to NP that S[+future]
A {was, wasn’t} excited that C would happen.
NP be _ that S[+future]
Question
Is bleaching a valid method for capturing inferences associated with a verb in a frame?
Validation Strategy #1
Compare judgments for bleached items against judgments from trained linguists.
| Frame | Neg-raising | Non-neg-raising |
| NP __ that S | think, believe, feel, reckon, figure, guess, suppose, imagine | announce, claim, assert, report, know, realize, notice, find out |
| NP __ to VP | want, wish, happen, seem, plan, intend, mean, turn out | love, hate, need, continue, try, like, desire, decide |
[Figure: mean rating of bleached examples for neg-raising vs. non-neg-raising predicates]
Validation Strategy #2
Compare judgments for bleached items to judgments for more contentful items.
Implementation
For each verb-frame pair in validation set, sample five items from corpus.
[Figure: mean rating of corpus examples vs. mean rating of bleached examples; r = 0.8, p < 0.001]
Validation Strategy #3
Compare inference judgments for bleached items to acceptability judgments for established distributional diagnostic.
Implementation
For each verb-frame pair in validation set, collect acceptability of strong NPI (additive either).
Jo didn’t do a particular thing, and…
…I think that Bo didn’t do that thing either.
…I don’t think that Bo did that thing either.
[Figure: mean rating of bleached examples vs. mean acceptability of the strong NPI; r = 0.77, p < 0.001]
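All three validation strategies reduce to correlating item means across two measurement conditions. A sketch with synthetic data (the real ratings live in the Mega* datasets):

```python
# Correlate mean per-(verb, frame) ratings across two conditions,
# e.g. bleached vs. corpus-derived items. Data here are synthetic.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
bleached_means = rng.uniform(1, 7, size=50)                  # toy 7-point means
corpus_means = bleached_means + rng.normal(0, 0.7, size=50)  # correlated noise

r, p = pearsonr(bleached_means, corpus_means)
print(f"r = {r:.2f} (p = {p:.3g})")
```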
Conclusion
Safe to use bleaching to collect at least these types of inference judgments.
Important Point (again)
Be cautious in using this dataset to investigate individual predicates.
Inference Patterns: Discovering Patterns
Approach
Cluster predicate-frame pairs in inference space using a multiview mixed effects mixture model.
[Figure: inference-pattern assignment for know + NP _ that S: candidate patterns 1-12, with panels for veridicality (no / maybe / yes), neg-raising, doxastic, and bouletic judgments]
Finding clusters
Fit model to raw that-clause data in MegaVeridicality, MegaNegRaising, and MegaIntensionality using variational inference.
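The actual model is a multiview mixed-effects mixture fit by variational inference; as a much-simplified stand-in to convey the clustering idea, here is a plain Bernoulli mixture over binarized inference judgments, fit by EM:

```python
# Simplified stand-in for the clustering step: a Bernoulli mixture fit by
# EM. Rows of X are predicate-frame pairs; columns are inference types.
import numpy as np

def fit_bernoulli_mixture(X, k, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)                 # mixture weights
    theta = rng.uniform(0.25, 0.75, (k, d))  # per-pattern inference probs
    for _ in range(n_iter):
        # E-step: responsibility of pattern z for item i.
        log_r = (np.log(pi) + X @ np.log(theta).T
                 + (1 - X) @ np.log1p(-theta).T)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights and pattern parameters.
        nk = r.sum(axis=0)
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 1e-3, 1 - 1e-3)
    return pi, theta, r  # r[i] is item i's posterior over patterns
```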
Output: [Figure: posterior distribution over inference patterns for know + NP _ that S, as above]
Question
How many inference patterns should we assume there are?
Idea
Only as many as we need to explain syntactic distribution.
Implementation
Select the smallest clustering for which no larger clustering improves prediction of the judgments in MegaAcceptability.
[Diagram: predicates are grouped into clusters; clusters combine with frames to predict acceptability]
Result: The optimal number of inference patterns is 15.
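A hypothetical sketch of the selection criterion, where predict_acceptability(k) is a placeholder for fitting a k-pattern clustering and scoring held-out prediction of the MegaAcceptability judgments:

```python
# Choose the smallest k such that no larger clustering predicts the
# held-out acceptability judgments better (up to a tolerance).
def select_k(candidate_ks, predict_acceptability, tol=0.0):
    scores = {k: predict_acceptability(k) for k in candidate_ks}
    for k in sorted(candidate_ks):
        if all(scores[j] <= scores[k] + tol for j in candidate_ks if j > k):
            return k
    return max(candidate_ks)
```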
Interpretation
There are at least 15 distributionally correlated inference patterns.
Important Point #1: Not every inference pattern instantiated by particular predicates will get its own cluster.

Important Point #2: Enriching the distributional representation could increase the granularity of the patterns.
Inference Patterns: Investigating Patterns
[Figures: inference-pattern posteriors for know + NP _ that S, and a heatmap of cluster probabilities (0 to 1) by predicate and frame]
Representationals: doxastic mental states and mental processes
NP {thought, believed, suspected} that S

Speculatives: communication of uncertain beliefs
NP {ventured, guessed, gossiped} that S

Future commitment: expressions of commitment to future action or result
NP {promised, ensured, attested} S[+future]

Weak communicatives: communicative acts with weak doxastic inferences about the source
NP {reported, remarked, yelped} to NP that S

Strong communicatives: communicative acts with strong doxastic inferences about the source
NP {confessed, admitted, acknowledged} that S

Discourse commitment: communicative acts committing the source to the content's truth
NP {maintained, remarked, swore} that S[+future]

Deceptives: actions involving dishonesty, deceit, or pretense
NP {lied, misled, faked, fabricated} ((to) NP) that S

Preferentials: expressions of preference for a (future) situation
NP {hoped, wished, demanded, recommended} that S[+/-future]

Positive internal emotives: positive emotional states
NP was {pleased, thrilled, enthused} that S

Negative internal emotives: negative emotional states
NP was {frightened, disgusted, infuriated} that S

Negative emotive miratives: expressions of surprise with negative valence
NP was {dazed, flustered, alarmed} that S[+future]

Positive external emotives: expressions of positive emotion with behavioral correlates
NP was {congratulated, praised, fascinated} that S

Negative external emotives: expressions of negative emotion with behavioral correlates
NP {whined, whimpered, pouted} to NP that S[+future]

Negative emotive communicatives: communicative acts with broadly negative valence
NP {screamed, ranted, growled} to NP that S[+future]
Interpretation: There are at least 15 distributionally correlated inference patterns.

Important Point #1: Not every inference pattern instantiated by particular predicates will get its own cluster.

Important Point #2: Enriching the distributional representation could increase the granularity of the patterns.
Inference Patterns: Discussion
Part 1: Inference Patterns. Do we find recurring, distributionally correlated patterns of inference when filtering noise? Yes! Though some patterns are associated with defeasible inferences.
Part 2: Defeasible Inferences. What could it mean for an inference pattern to contain a defeasible inference?

Possibility #1: A pattern containing a defeasible inference implies irreducible gradience (e.g. vagueness) in the semantic content conditioning that pattern.

Possibility #2: A pattern containing a defeasible inference implies some gradient metalinguistic knowledge about the semantic content conditioning that pattern.

Possibility #2a: Gradient metalinguistic knowledge is knowledge about how to resolve formal indeterminacy (e.g. lexical/structural ambiguity).

Possibility #2b: Gradient metalinguistic knowledge is knowledge about contexts that a lexical item is used in.
Defeasible Inferences
Julian Grove
University of Rochester
Our Approach
Judith Degen
Stanford University
Judith Tonhauser
University of Stuttgart
Degen & Tonhauser’s Data
[Figure: empirical distribution of norming responses to "How likely is it that Zoe calculated the tip?"]
[Figures: empirical distributions of projection responses. X asks: "Did Y pretend that P?" / "Did Y know that P?"; participants rate whether X is certain that P]
Defeasible Inferences: Our Models
Julian Grove
University of Rochester
Jean-Philippe Bernardy
University of Gothenburg
Derivation of "Jo knew that Bo left" under [[ ]]c:

Jo ↦ j : e
Bo left ↦ LEAVE(b)
that ↦ λp.p
knew ↦ λp.λx. p : BEL(x, p)
knew that Bo left ↦ λx. LEAVE(b) : BEL(x, LEAVE(b))
Jo knew that Bo left ↦ LEAVE(b) : BEL(j, LEAVE(b))
[[ ]]c maps deterministic programs to probabilistic programs structured by the compositional semantics.
η (monadic return in the continuation monad): η(a) = λc. c(a), of type α → (α → r) → r, where r is the type of probabilities. A probability distribution is thus a function from continuations to probabilities, which threads probability distributions through the derivation.
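A minimal sketch of the probability-as-continuation idea in code (my formulation, not the talk's implementation): a "distribution" over values of type α is a function that takes a continuation k : α → r, with r the type of probabilities, and returns an expectation:

```python
# Continuation-monad probability: a distribution is a function from
# continuations (outcome -> probability) to probabilities.
def eta(a):
    """Monadic return: eta(a) = lambda k: k(a), type a -> (a -> r) -> r."""
    return lambda k: k(a)

def bind(m, f):
    """Sequence two probabilistic programs: m >>= f."""
    return lambda k: m(lambda a: f(a)(k))

def bernoulli(p):
    """Bernoulli(p) as a continuation-taking program (an expectation)."""
    return lambda k: p * k(True) + (1 - p) * k(False)

# P(tau_p or tau_c) with tau_p ~ Bernoulli(0.7), tau_c ~ Bernoulli(0.4):
prog = bind(bernoulli(0.7), lambda tp:
       bind(bernoulli(0.4), lambda tc:
       eta(tp or tc)))
print(prog(lambda x: 1.0 if x else 0.0))  # 1 - 0.3 * 0.6 = 0.82
```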
Chris Barker
New York University
Chung-chieh Shan
Indiana University
Allows us to stack multiple layers of uncertainty while retaining compositionality!
Julian Grove
University of Rochester
Idea: Model fundamental gradience on the "inner layer" and metalinguistic uncertainty on the "outer".

Upshot: The layer at which the probabilistic program is evaluated corresponds to different hypotheses about the nature of the gradience.
Importantly, the stacking is not done by monad transformers!
Verb discrete, context discrete

Sample prior probabilities for:
πp ∼ Prob(KNOW(j, _)) (whether the verb projects the embedded proposition)
πc ∼ Prob(LEAVE(b)) (whether the embedded proposition is true)

On each response, sample a discrete value for:
τp ∼ Bernoulli(πp) (whether the verb projects the embedded proposition)
τc ∼ Bernoulli(πc) (whether the embedded proposition is true)

Response: τp ∨ τc (equivalently, τp ∨ [¬τp → τc])
The same model as a monadic program:

do
  πc ← Prob(LEAVE(b))
  πp ← Prob(KNOW(j, _))
  τc ← Bernoulli(πc)
  τp ← Bernoulli(πp)
  η(η(τp ∨ τc))
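The same do-block, desugared into nested binds using the toy eta/bind/bernoulli sketch above; uniform2 is an invented stand-in for the priors, and a single η is used where the talk stacks two layers:

```python
# Toy two-point prior over a probability value.
uniform2 = lambda a, b: (lambda k: 0.5 * k(a) + 0.5 * k(b))

prog = bind(uniform2(0.2, 0.8), lambda pi_c:   # pi_c <- Prob(LEAVE(b))
       bind(uniform2(0.5, 0.9), lambda pi_p:   # pi_p <- Prob(KNOW(j, _))
       bind(bernoulli(pi_c), lambda tau_c:     # tau_c <- Bernoulli(pi_c)
       bind(bernoulli(pi_p), lambda tau_p:     # tau_p <- Bernoulli(pi_p)
       eta(tau_p or tau_c)))))

print(prog(lambda x: 1.0 if x else 0.0))  # marginal probability of certainty
```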
Verb gradient, context gradient

Sample prior probabilities for:
πp ∼ Prob(KNOW(j, _)) (whether the verb projects the embedded proposition)
πc ∼ Prob(LEAVE(b)) (whether the embedded proposition is true)

On each response, produce the likelihood that the verb projects or the embedded proposition is true, with τp ∼ Bernoulli(πp) and τc ∼ Bernoulli(πc) marginalized: τp ∨ τc
Verb discrete, context gradient

Sample prior probabilities for:
πp ∼ Prob(KNOW(j, _)) (whether the verb projects the embedded proposition)
πc ∼ Prob(LEAVE(b)) (whether the embedded proposition is true)

On each response, sample a discrete value only for whether the verb projects (τp ∼ Bernoulli(πp)), leaving τc ∼ Bernoulli(πc) gradient: τp ∨ τc
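A toy simulation (my own formulation of the contrast, with invented priors) of how the hypotheses differ in their predicted response distributions: discrete sampling pushes responses toward the endpoints, while gradient evaluation tracks the probabilities themselves:

```python
# Contrast "verb discrete, context discrete" with "verb gradient,
# context gradient" for a single (invented) pair of sampled priors.
import numpy as np

rng = np.random.default_rng(0)

def discrete_discrete(pi_p, pi_c, n=1000):
    tau_p = rng.random(n) < pi_p          # verb projects on this trial?
    tau_c = rng.random(n) < pi_c          # embedded proposition true?
    return (tau_p | tau_c).astype(float)  # bimodal 0/1 responses

def gradient_gradient(pi_p, pi_c, n=1000):
    p = pi_p + pi_c - pi_p * pi_c         # P(tau_p or tau_c), marginalized
    return np.full(n, p)                  # responses track the likelihood

print(discrete_discrete(0.6, 0.3).mean(), gradient_gradient(0.6, 0.3)[0])
```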
Defeasible Inferences: Modeling Prior Knowledge
Approach: as in the verb discrete, context discrete model above, now with πp ∼ Prob(KNOW(j, LEAVE(b))) and πc ∼ Prob(LEAVE(b)); the response is τp ∨ τc.
[Figures: empirical distribution of norming responses and posterior distribution of the mean for "How likely is it that Zoe calculated the tip?"]
Defeasible Inferences: Modeling Inference Judgments
Approach: the same verb discrete, context discrete model as above.
Model comparison results
[Figures: empirical and posterior-mean distributions of projection responses for "Did Y pretend that P?" and "Did Y know that P?" (Is X certain that P?)]
Defeasible Inferences: Additional Evaluation

Question: Do we see the same pattern when we take away all information about embedded clause content?
Approach
[Figures: model comparison results on bleached and on templatic items]
Answer: Yes!
Defeasible Inferences: Discussion
Part 1: Inference Patterns. Do we find recurring, distributionally correlated patterns of inference when filtering noise? Yes! Though some patterns are associated with defeasible inferences.

Part 2: Defeasible Inferences. What could it mean for an inference pattern to contain a defeasible inference?

Possibility #1: A pattern containing a defeasible inference implies irreducible gradience (e.g. vagueness) in the semantic content conditioning that pattern.

Possibility #2: A pattern containing a defeasible inference implies some gradient metalinguistic knowledge about the semantic content conditioning that pattern.

Answer: It's gradient metalinguistic knowledge about sense and/or structural indeterminacy.

Possibility #2a: Gradient metalinguistic knowledge is knowledge about how to resolve formal indeterminacy (e.g. lexical/structural ambiguity).

Possibility #2b: Gradient metalinguistic knowledge is knowledge about contexts that a lexical item is used in.
Recall the opening question: could it be that there are no lexical generalizations to be had for these inferences? Possibility #1 (see Degen & Tonhauser 2022 on veridicality inferences; Simons et al. 2010, 2017): there is no discrete classification of concepts that determines these inferences, and it's all pragmatics. Possibility #2: there is such a classification, just obscured by noise in the measurement. The results here support Possibility #2, specifically #2a: gradient metalinguistic knowledge about how to resolve formal indeterminacy (e.g. lexical/structural ambiguity).
Conclusion

Takeaway: We should continue to posit discrete semantic contents that pragmatics selects among. It's not all just pragmatics.
Question for Future Work #1: How might pragmatics select among alternative discrete semantic contents?

Possible Approach: Integrate an existing conventionalist account of lexically triggered inference into the probabilistic semantics framework (e.g. an alternative-based account, as in Abusch 2002, 2010).

Question for Future Work #2: To what extent does class-level knowledge interact with pragmatic selection processes?

Approach: Integrate the mixture model into the (hyper)priors of the probabilistic semantics model.
Question for Future Work #3: To what extent do inference types interrelate?

Approach: Integrate probabilistic inference relationships among inference types into the model.
Thanks!
Supported by NSF-BCS-1748969
The MegaAttitude Project: Investigating selection and polysemy at the scale of the lexicon
Appendix A: Further Validation of MegaAcceptability
Case Study: The vast majority of about-PPs are adjuncts (Rawlins 2013, 2014)
If XP1 V (XP2) (XP3) about XP4 is acceptable, then XP1 V (XP2) (XP3) is acceptable (Rawlins 2014).

[Figure: acceptability of NP _ed vs. NP _ed about XP]
[Figure: proportion of violations as a function of the ratio of noise variance to acceptability variance, under independence]
[Figure: acceptability of NP (was) _ed vs. NP (was) _ed about whether S]
[Figures: proportion of violations as a function of the acceptability threshold and of the noise variance / acceptability variance ratio]
Appendix B: Distribution of Inference Judgments
Appendix C: Validation of MegaIntensionality
Question
Is bleaching a valid method for capturing doxastic and bouletic inferences associated with a verb in a frame?
Challenge
Doxastic and bouletic inferences are highly sensitive to world knowledge.
Jo doubts that Bo left. ⇝ Jo doesn't believe that Bo left.
Jo doubts that Bo left. ⇝ Jo wants Bo to have left.
Trump doubts that he won in 2020.
Trump wants to have won in 2020.
Approach

Norming:
Executives generally want their deals to go through.
Executives generally believe that their deals will go through.

Contentful:
The executive knew that his deal had gone through.

Bleached:
A knew that C happened.
Appendix D: Number of Possible Inference Patterns
(3 veridicality inferences)^(2 matrix polarities) × (3 doxastic inferences)^(2 matrix polarities) × (3 bouletic inferences)^(2 matrix polarities) × 2 neg-raising inferences = 3² · 3² · 3² · 2 = 1,458 inference patterns
If any lexical knowledge relevant to any inference type is gradient (and continuous), there are uncountably many patterns.
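A quick sanity check of the count:

```python
# 3 signatures per inference type (veridicality, doxastic, bouletic),
# each across 2 matrix polarities, times 2 neg-raising values.
assert (3 ** 2) * (3 ** 2) * (3 ** 2) * 2 == 1458
```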
Appendix E: Principal Component Analysis

[Figure: number of principal components needed to capture 95% of the variance]