1 of 243

The role of discreteness in gradient inference judgments

Aaron Steven White

University of Rochester

Colloquium Talk

Leipzig University

31 May 2023

A case study of lexically triggered 

inferences

2 of 243

Overarching Question

In using a particular linguistic expression, what can we mean and what must we mean?

3 of 243

I remembered that I went a few days ago, but I’m now realizing I forgot to grab beer.

Jo:

[to Jo] I thought you were heading out to get groceries.

Bo:

Bo no longer thinks that Jo is heading out to get groceries.

Jo went to get groceries a few days before.

Jo did not grab beer.

Cessation inference

Veridicality inferences

4 of 243

Observation

What we can mean in using an expression is constrained by lexical knowledge.

5 of 243

I remembered that I went a few days ago, but I’m now realizing I forgot to grab beer.

Jo:

[to Jo] I thought you were heading out to get groceries.

Bo:

Bo no longer thinks that Jo is heading out to get groceries.

Jo went to get groceries a few days before.

Jo did not grab beer.

Cessation inference

Veridicality inferences

describes a mental event grounded in experience

6 of 243

Observation

What we can mean in using an expression is constrained by lexical knowledge.

in conjunction with structural knowledge

7 of 243

I remembered that I went a few days ago, but I’m now realizing I forgot to grab beer.

Jo:

[to Jo] I thought you were heading out to get groceries.

Bo:

Bo no longer thinks that Jo is heading out to get groceries.

Jo went to get groceries a few days before.

Jo did not grab beer.

Cessation inference

Veridicality inferences

8 of 243

I remembered that I went a few days ago, and I’m now realizing I forgot that I even grabbed beer already.

Jo:

[to Jo] I thought you were heading out to get groceries.

Bo:

Bo no longer thinks that Jo is heading out to get groceries.

Jo went to get groceries a few days before.

Jo did not grab beer.

Cessation inference

Veridicality inferences

9 of 243

Question

What knowledge undergirds what we must mean in using an utterance?

10 of 243

Language

NP

to VP

Concepts

forget

NP not VP

11 of 243

forget

Language

Concepts

NP

that S

S

12 of 243

forget

Language

Concepts

NP

that S

S

13 of 243

Question

What types of concepts does language “see”?

14 of 243

Prior Work

For some areas of the lexicon, we have a solid understanding of what language “sees”.

15 of 243

Generalization #1 (Barwise & Cooper 1981)

Determiners are conservative: if D is a determiner, then D expresses a relation R between sets A and B s.t. R(A, B) iff R(A, A ∩ B).

16 of 243

Every greyhound is happy.

Every greyhound is a happy greyhound.

Some greyhound is happy.

Some greyhound is a happy greyhound.

Most greyhounds are happy.

Most greyhounds are happy greyhounds.

17 of 243

Generalization #1 (Barwise & Cooper 1981)

Determiners are conservative: if D is a determiner, then D expresses a relation R between sets A and B s.t. R(A, B) iff R(A, A ∩ B).

Language only “sees” relational concepts of this form.
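The conservativity property can be checked mechanically. Below is a small sketch (not from the talk) that tests R(A, B) iff R(A, A ∩ B) for the standard denotations of *every*, *some*, and *most* over randomly sampled finite sets, and shows that a hypothetical non-conservative determiner meaning (reading *only* as a determiner) fails the test.

```python
import random

# Standard determiner denotations as relations between finite sets:
# A is the restrictor, B the scope.
EVERY = lambda A, B: A <= B
SOME = lambda A, B: bool(A & B)
MOST = lambda A, B: len(A & B) > len(A - B)

def is_conservative(D, universe, trials=1000, seed=0):
    """Empirically check R(A, B) iff R(A, A & B) on random subsets."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = {x for x in universe if rng.random() < 0.5}
        B = {x for x in universe if rng.random() < 0.5}
        if D(A, B) != D(A, A & B):
            return False
    return True

universe = set(range(8))
print(all(is_conservative(D, universe) for D in (EVERY, SOME, MOST)))  # True

# A hypothetical non-conservative determiner meaning fails the check:
# "only" as a determiner would require B to be a subset of A.
ONLY = lambda A, B: B <= A
print(is_conservative(ONLY, universe))  # False
```

Since ONLY(A, A ∩ B) is trivially true while ONLY(A, B) fails whenever B has members outside A, a counterexample turns up almost immediately.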

18 of 243

Generalization #2 (Gärdenfors 2000, Jäger 2010)

Color terms express convex regions in color space.

19 of 243

Language only “sees” convex color concepts.

20 of 243

Challenge

As we expand to larger and more open classes of words, generalizations tend to be harder to find.

Reason

Standard methodologies for discovering lexical generalizations do not scale well because we don’t have good sampling methodologies.

21 of 243

White, Aaron Steven. 2021. On Believing and Hoping Whether. Semantics and Pragmatics 14(6): 1–18.

22 of 243

Proposed Generalization #1 (Égré 2008; see also Hintikka 1975)

A predicate triggers veridicality inferences…

23 of 243

I remembered that I went a few days ago, but I’m now realizing I forgot to grab beer.

Jo:

[to Jo] I thought you were heading out to get groceries.

Bo:

Bo no longer thinks that Jo is heading out to get groceries.

Jo went to get groceries a few days before.

Jo did not grab beer.

Cessation inference

Veridicality inferences

24 of 243

Proposed Generalization #1 (Égré 2008; see also Hintikka 1975)

A predicate triggers veridicality inferences iff it takes both declarative and interrogative clauses.

25 of 243

Triggers veridicality inferences

Takes both declaratives and interrogatives

White & Rawlins 2018, White 2021

26 of 243

Proposed Generalization #2 (Zuber 1983, Theiler et al. 2019)

If a predicate triggers neg-raising inferences, it does not take interrogative clauses.

27 of 243

Question

Could this be because there are no lexical generalizations to be had for these inferences?

Possibility #1 (see Degen & Tonhauser 2022 on veridicality inferences)

There is no discrete classification of concepts that determines these inferences. It’s all pragmatics.

Simons et al. 2010, 2017

28 of 243

Judith Degen

Stanford University

Judith Tonhauser

University of Stuttgart

29 of 243

Question

Could this be because there are no lexical generalizations to be had for these inferences?

Possibility #1 (see Degen & Tonhauser 2022 on veridicality inferences)

There is no discrete classification of concepts that determines these inferences. It’s all pragmatics.

Simons et al. 2010, 2017

Evidence

  1. No clear separation in inference judgments.

30 of 243

Degen & Tonhauser 2022, Fig. 7, derived from White & Rawlins’ (2018) MegaVeridicality dataset

This evidence is extremely weak!

This gradience may arise from noise in the measurement that obscures discrete classes.

31 of 243

Question

Could this be because there are no lexical generalizations to be had for these inferences?

Possibility #1 (see Degen & Tonhauser 2022 on veridicality inferences)

There is no discrete classification of concepts that determines these inferences. It’s all pragmatics.

Simons et al. 2010, 2017

Evidence

  1. No clear separation in inference judgments.
  2. Almost all inferences appear defeasible.

32 of 243

Degen & Tonhauser 2022, Fig. 7, derived from White & Rawlins’ (2018) MegaVeridicality dataset

Few predicates up here!

33 of 243

Question

Could this be because there are no lexical generalizations to be had for these inferences?

Possibility #1 (see Degen & Tonhauser 2022 on veridicality inferences)

There is no discrete classification of concepts that determines these inferences. It’s all pragmatics.

Simons et al. 2010, 2017

Possibility #2

There is such a classification. It’s just obscured by noise in the measurement.

Pursued in this talk!

34 of 243

Part 1: Inference Patterns

Do we find recurring, distributionally correlated patterns of inference when filtering noise?

35 of 243

Jo hated that Bo left. ⇝ Bo left.

NP   V           S     ⇝    S    

Veridicality inference

Jo hated that Bo left. ⇝ Jo believed Bo left.

Doxastic inference

NP   V           S     ⇝ NP believe     S    

Jo hated that Bo left. ⇝ Jo didn't want Bo to have left.

Bouletic inference

NP   V           S     ⇝ NP    not want        S        

36 of 243

| Predicate | NP V S ⇝ S | NP V S ⇝ NP believe S | NP V S ⇝ NP want S |
| --- | --- | --- | --- |
| think | 0 | + | 0 |
| doubt | 0 | - | 0 |
| hope | 0 | 0 | + |
| hate | + | + | - |

37 of 243

Part 1: Inference Patterns

Do we find recurring, distributionally correlated patterns of inference when filtering noise?

Yes! Though some patterns are associated with defeasible inferences.

Part 2: Defeasible Inferences

What could it mean for an inference pattern to contain a defeasible inference?

38 of 243

Possibility #1

A pattern containing a defeasible inference implies irreducible gradience (e.g. vagueness) in the semantic content conditioning that pattern.

Possibility #2

A pattern containing a defeasible inference implies some gradient metalinguistic knowledge about the semantic content conditioning that pattern.

Pursued in this talk!

39 of 243

Part 1: Inference Patterns

Do we find recurring, distributionally correlated patterns of inference when filtering noise?

Yes! Though some patterns are associated with defeasible inferences.

Part 2: Defeasible Inferences

What could it mean for an inference pattern to contain a defeasible inference?

It’s gradient metalinguistic knowledge about sense and/or structural indeterminacy.

40 of 243

Takeaway

We should continue to posit discrete semantic contents that pragmatics selects among.

41 of 243

Inference Patterns

42 of 243

Ben Kane

University of Rochester

Will Gantt

University of Rochester

43 of 243

Kyle Rawlins

Johns Hopkins University

Ellise Moon

University of Rochester

Hannah An

University of Rochester

Ben Kane

University of Rochester

Will Gantt

University of Rochester

Yu’an Yang

Amazon

Zhendong Liu

University of Southern California

Nick Huang

National University of Singapore

Ben Van Durme

Johns Hopkins University

Rachel Rudinger

University of Maryland

44 of 243

Approach

  1. Cluster predicates based on measures of their inferential properties.

45 of 243

| Predicate | NP V S ⇝ S | NP V S ⇝ NP believe S | NP V S ⇝ NP want S |
| --- | --- | --- | --- |
| think | 0 | + | 0 |
| doubt | 0 | - | 0 |
| hope | 0 | 0 | + |
| hate | + | + | - |

46 of 243

Veridicality inference: NP (not) V S ⇝ (not) S

Doxastic inference: NP (not) V S ⇝ NP (not) believe S

Bouletic inference: NP (not) V S ⇝ NP (not) want S

Neg-raising inference: NP not V S ⇝ NP V not S

47 of 243

Approach

  1. Cluster predicates based on measures of their inferential properties.
  2. Determine optimal # of clusters based on how well particular clusterings predict syntactic distribution.
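The two-step approach can be illustrated with a toy version of step 1 (the actual model, introduced later, is a multiview mixed effects mixture model): group predicates by their inference signatures. The signatures below are the ones from the table; the entry for *love* is hypothetical.

```python
from collections import defaultdict

# Toy version of step 1: group predicates by their inference
# signatures (veridicality, doxastic, bouletic), as in the table;
# the entry for "love" is hypothetical.
signatures = {
    "think": ("0", "+", "0"),
    "doubt": ("0", "-", "0"),
    "hope":  ("0", "0", "+"),
    "hate":  ("+", "+", "-"),
    "love":  ("+", "+", "-"),
}

def cluster_by_signature(sigs):
    """Predicates with identical signatures share a cluster."""
    clusters = defaultdict(set)
    for verb, sig in sigs.items():
        clusters[sig].add(verb)
    return dict(clusters)

clusters = cluster_by_signature(signatures)
print(len(clusters))               # 4
print(clusters[("+", "+", "-")])   # {'hate', 'love'} (order may vary)
```

Step 2 then asks how coarse or fine this grouping should be, judged by how well it predicts syntactic distribution.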

48 of 243

Takeaway

There are recurring, distributionally correlated patterns of inference when filtering noise!

49 of 243

Roadmap

  1. Measuring distribution
  2. Measuring inference
  3. Discovering inference patterns
  4. Investigating inference patterns

50 of 243

Inference Patterns: Measuring Distribution

51 of 243

Kyle Rawlins

Johns Hopkins University

52 of 243

MegaAcceptability dataset

Acceptability for 1,000 verbs in 50 syntactic frames focused on clause-embedding.

White & Rawlins 2016, 2020

53 of 243

54 of 243

think

know

wonder

love

surprise

tell

say

start

stop

...

Verbs

55 of 243

Bleaching method

Frame templates (e.g. NP __ that S) instantiated by semantically bleached fillers.
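The template-filling step can be sketched as below; the verb and frame lists are tiny illustrative samples, and past-tense forms are supplied directly (the actual materials handle morphology for all 1,000 verbs).

```python
# Toy sketch of the bleaching method: frame templates with a verb
# slot, instantiated by semantically bleached fillers ("someone",
# "something").
frames = [
    "Someone {verb} that something happened",
    "Someone {verb} whether something happened",
    "Someone {verb} to do something",
]
verbs = {"think": "thought", "know": "knew", "wonder": "wondered"}

items = [frame.format(verb=past)
         for past in verbs.values()
         for frame in frames]
print(len(items))  # 9 (the full dataset crosses 1,000 verbs x 50 frames)
print(items[0])    # Someone thought that something happened
```

Note that some combinations (e.g. "Someone wondered to do something") are ill-formed; that is the point, since the acceptability ratings for such items are exactly what the dataset measures.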

56 of 243

Someone __ed something happened

Someone __ed that something happened

Someone __ed whether something happened

Someone __ed which someone something happened

Someone __ed someone that something happened

Someone __ed someone whether something happened

Someone __ed to someone that something happened

Someone __ed to do something

Someone __ed someone to do something

...

think

know

wonder

love

surprise

tell

say

start

stop

...

x

Verbs

Frames

57 of 243

50,000 total items x 5 judgments per item

MegaAcceptability dataset

Acceptability for 1,000 verbs in 50 syntactic frames focused on clause-embedding.

White & Rawlins 2016, 2020

58 of 243

Question

Is bleaching a valid method for capturing the acceptability of a verb in a frame?

Validation Strategy

Compare judgments for bleached items against judgments from trained linguists. 

59 of 243

Validation data

  1. Select 30 verbs from across Hacquard & Wellwood's (2012) classification
  2. Gather judgments for these verbs in all 50 syntactic frames from:
    1. trained linguists
    2. naïve speakers 

60 of 243

Comparison

Correlation between judgments from Linguistic Inquiry (LI) and Sprouse et al.'s (2013) dataset

61 of 243

[Figure: correlation between MegaAcceptability judgments and Sprouse et al.'s Linguistic Inquiry data]

62 of 243

Conclusion

Safe to use bleaching to collect acceptability judgments focused on capturing selection.

63 of 243

Inference Patterns: Measuring Inference

64 of 243

Veridicality inference: NP (not) V S ⇝ (not) S

Doxastic inference: NP (not) V S ⇝ NP (not) believe S

Bouletic inference: NP (not) V S ⇝ NP (not) want S

Neg-raising inference: NP not V S ⇝ NP V not S

65 of 243

Recipe

  1. Validate a bleaching paradigm for collecting judgments for an inference type.
  2. Select a set of frames of interest.
  3. Select predicates acceptable in those frames using MegaAcceptability.
  4. Collect judgments using the paradigm.

66 of 243

Veridicality task

White & Rawlins 2018

67 of 243

Kyle Rawlins

Johns Hopkins University

Ben Van Durme

Johns Hopkins University

Rachel Rudinger

University of Maryland

68 of 243

Someone was irritated that a particular thing happened.

Did that thing happen?

no      maybe or maybe not       yes

Veridicality task

White & Rawlins 2018

69 of 243

Someone {knew, didn't know} that a particular thing happened.

NP _ that S

Someone {was, wasn't} surprised that a particular thing happened.

NP be _ that S

Someone {needed, didn’t need} for a particular thing to happen

NP _ for NP to VP

Someone {told, didn’t tell} a particular person to do a particular thing

Someone {believed, didn’t believe} a particular person to have a particular thing

NP _ NP to VP[+/-eventive]

A particular person {was, wasn’t} excited to do a particular thing

A particular person {was, wasn’t} suspected to have a particular thing

NP be _ to VP[+/-eventive]

A particular person {managed, didn’t manage} to do a particular thing

A particular person {seemed, didn’t seem} to have a particular thing.

NP _ to VP[+/-eventive]

70 of 243

Neg-raising task

An & White 2020

71 of 243

Hannah An

University of Rochester

72 of 243

If I were to say I don’t think that a particular thing happened, how likely is it that I mean I think that that thing didn’t happen?

Neg-raising task

Extremely unlikely

Extremely likely

An & White 2020

73 of 243

know that a particular thing happened.

NP _ that S

A particular person {didn’t, doesn’t}

I {didn’t, don’t}

surprised that a particular thing happened.

NP be _ that S

A particular person {wasn’t, isn’t}

I {wasn’t, ‘m not}

told to do a particular thing

believed to have a particular thing

NP be _ to VP[+/-eventive]

A particular person {wasn’t, isn’t}

I {wasn’t, ‘m not}

managed to do a particular thing

seemed to have a particular thing.

NP _ to VP[+/-eventive]

A particular person {didn’t, doesn’t}

I {didn’t, don’t}

74 of 243

If A knew that C happened, how likely is it that A believed that C happened?

Doxastic task

Extremely unlikely

Extremely likely

Kane et al. 2021

75 of 243

If A persuaded B that C happened, how likely is it that B believed that C happened?

Doxastic task

Extremely unlikely

Extremely likely

Kane et al. 2021

76 of 243

If A was appalled that C happened, how likely is it that A wanted C to have happened?

Bouletic task

Extremely unlikely

Extremely likely

Kane et al. 2021

77 of 243

If A apologized to B that C happened, how likely is it that B wanted C to have happened?

Bouletic task

Extremely unlikely

Extremely likely

Kane et al. 2021

78 of 243

A {knew, didn't know} that C happened.

NP _ that S

A {told, didn't tell} B that C happened.

NP _ NP that S

A {said, didn't say} to B that C happened.

NP _ to NP that S

A {was, wasn’t} surprised that C happened.

NP _ that S

A {hoped, didn't hope} that C would happen.

NP _ that S[+future]

A {promised, didn't promise} B that C would happen.

NP _ NP that S[+future]

A {predicted, didn't predict} to B that C would happen.

NP _ to NP that S[+future]

A {was, wasn’t} excited that C would happen.

NP _ that S[+future]

79 of 243

Question

Is bleaching a valid method for capturing inferences associated with a verb in a frame?

80 of 243

Validation Strategy #1

Compare judgments for bleached items against judgments from trained linguists. 

81 of 243

Neg-raising

Non-neg-raising

NP __ that S

think, believe, feel, reckon, figure, guess, suppose, imagine

announce, claim, assert, report, know, realize, notice, find out

NP __ to VP

want, wish, happen, seem, plan, intend, mean, turn out

love, hate, need, continue, try, like, desire, decide

82 of 243

[Figure: mean neg-raising ratings of bleached examples for neg-raising vs. non-neg-raising predicates]

83 of 243

Validation Strategy #1

Compare judgments for bleached items against judgments from trained linguists. 

Validation Strategy #2

Compare judgments for bleached items to judgments for more contentful items.

84 of 243

Implementation

For each verb-frame pair in validation set, sample five items from corpus.

85 of 243

[Figure: mean rating of bleached example vs. mean rating of corpus example; r = 0.8, p < 0.001]

86 of 243

Validation Strategy #1

Compare judgments for bleached items against judgments from trained linguists. 

Validation Strategy #2

Compare judgments for bleached items to judgments for more contentful items.

Validation Strategy #3

Compare inference judgments for bleached items to acceptability judgments for an established distributional diagnostic.

87 of 243

Implementation

For each verb-frame pair in validation set, collect acceptability of strong NPI (additive either).

Jo didn’t do a particular thing, and…

…I think that Bo didn’t do that thing either.

…I don’t think that Bo did that thing either.

88 of 243

[Figure: mean acceptability of strong NPI vs. mean rating of bleached example; r = 0.77, p < 0.001]

89 of 243

Conclusion

Safe to use bleaching to collect at least these types of inference judgments.

Important Point (again)

Be cautious in using this dataset to investigate individual predicates.

90 of 243

Inference Patterns: Discovering Patterns

91 of 243

Approach

Cluster predicate-frame pairs in inference space using a multiview mixed effects mixture model.

92 of 243

| Predicate | NP V S ⇝ S | NP V S ⇝ NP believe S | NP V S ⇝ NP want S |
| --- | --- | --- | --- |
| think | 0 | + | 0 |
| doubt | 0 | - | 0 |
| hope | 0 | 0 | + |
| hate | + | + | - |

93 of 243

[Figure: distribution over inference patterns (1-12) for know + NP _ that S, with each pattern’s response distributions for the doxastic, bouletic, veridicality, and neg-raising inference types]

94 of 243

[Figure: distribution over inference patterns (1-12) for know + NP _ that S, with each pattern’s response distributions for the doxastic, bouletic, veridicality, and neg-raising inference types]

95 of 243

Finding clusters

Fit model to raw that-clause data in MegaVeridicality, MegaNegRaising, and MegaIntensionality using variational inference.
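As a simplified stand-in for that model, the sketch below fits a plain mixture of multivariate Bernoullis with EM (not variational inference, and without the multiview, ordinal-response, or mixed-effects structure) to hypothetical binarized judgments:

```python
import math
import random

def fit_bernoulli_mixture(data, k, iters=50, seed=0):
    """EM for a mixture of multivariate Bernoullis: a stripped-down
    stand-in for the talk's multiview mixed effects mixture model."""
    rng = random.Random(seed)
    d = len(data[0])
    weights = [1.0 / k] * k
    thetas = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        # E-step: responsibility of each inference pattern for each item
        resp = []
        for x in data:
            logp = [
                math.log(weights[z])
                + sum(math.log(t if xi else 1 - t)
                      for xi, t in zip(x, thetas[z]))
                for z in range(k)
            ]
            m = max(logp)
            p = [math.exp(l - m) for l in logp]
            s = sum(p)
            resp.append([pi / s for pi in p])
        # M-step: re-estimate mixture weights and Bernoulli parameters
        for z in range(k):
            nz = sum(r[z] for r in resp)
            weights[z] = max(nz / len(data), 1e-12)
            thetas[z] = [
                (sum(r[z] * x[i] for r, x in zip(resp, data)) + 1) / (nz + 2)
                for i in range(d)
            ]
    return weights, thetas

# Hypothetical binarized judgments: (veridical, doxastic, bouletic)
data = [(0, 1, 0)] * 20 + [(1, 1, 0)] * 20 + [(0, 0, 1)] * 20
weights, thetas = fit_bernoulli_mixture(data, k=3)
print([round(w, 2) for w in sorted(weights)])
```

The latent mixture components here play the role of inference patterns; the real model also ties verb-frame pairs across inference types and annotators.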

96 of 243

Output

  1. A distribution over inference patterns for each verb-frame pair.

97 of 243

know + NP _ that S

1

2

3

4

5

6

7

8

9

10

11

12

Inference patterns

1

0

1

0

1

0

Doxastic

Bouletic

no

maybe

yes

Veridicality

Neg-raising

98 of 243

Output

  1. A distribution over inference patterns for each verb-frame pair.
  2. Distributions over judgments for each inference type and inference pattern

99 of 243

know + NP _ that S

1

2

3

4

5

6

7

8

9

10

11

12

Inference patterns

1

0

1

0

1

0

Doxastic

Bouletic

no

maybe

yes

Veridicality

Neg-raising

100 of 243

Question

How many inference patterns should we assume there are?

Idea

Only as many as we need to explain syntactic distribution.

101 of 243

Implementation

Select the smallest clustering for which no larger clustering improves prediction of the judgments in MegaAcceptability.
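The selection rule can be sketched as follows; the scores are hypothetical held-out prediction scores, not the actual results:

```python
def smallest_adequate(scores, tol=0.0):
    """Return the smallest clustering size k such that no larger
    clustering beats k's held-out prediction score by more than tol."""
    ks = sorted(scores)
    for k in ks:
        if all(scores[k2] <= scores[k] + tol for k2 in ks if k2 > k):
            return k
    return ks[-1]

# Hypothetical scores for predicting MegaAcceptability judgments;
# prediction plateaus at 15 clusters, so 15 is selected.
scores = {5: 0.61, 10: 0.70, 15: 0.78, 20: 0.78, 25: 0.77}
print(smallest_adequate(scores))  # 15
```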

102 of 243


103 of 243

[Diagram: each predicate is assigned a cluster; the cluster, together with the frame, predicts the predicate’s acceptability]

104 of 243

Implementation

Select the smallest clustering for which no larger clustering improves prediction of the judgments in MegaAcceptability.

Result

Optimal number of inference patterns is 15.

105 of 243

106 of 243

107 of 243

Implementation

Select the smallest clustering for which no larger clustering improves prediction of the judgments in MegaAcceptability.

Result

Optimal number of inference patterns is 15.

108 of 243

Interpretation

There are at least 15 distributionally correlated inference patterns.

Important Point #2

Enriching the distributional representation could increase the granularity of the patterns.

Important Point #1

Not all inference patterns instantiated by particular predicates will get their own cluster.

109 of 243

Inference Patterns: Investigating Patterns

110 of 243

[Figure: distribution over inference patterns (1-12) for know + NP _ that S, with each pattern’s response distributions for the doxastic, bouletic, veridicality, and neg-raising inference types]

111 of 243

[Diagram: each predicate is assigned a cluster; the cluster, together with the frame, predicts the predicate’s acceptability]

112 of 243

113 of 243

Representationals

doxastic mental states and mental processes

NP {thought, believed, suspected} that S

114 of 243

115 of 243

116 of 243

117 of 243

Preferentials 

expressions of preference for a (future) situation.

NP {hoped, wished, demanded, recommended} that S[+/-future]​

118 of 243

119 of 243

120 of 243

121 of 243

Positive internal emotives

positive emotional states

A was {pleased, thrilled, enthused} that C happened.

Preferentials 

expressions of preference for a (future) situation.

NP {hoped, wished, demanded, recommended} that S[+/-future]​

122 of 243

123 of 243

124 of 243

125 of 243

126 of 243

Negative emotive miratives

expressions of surprise with negative valence

NP was {dazed, flustered, alarmed} that S[+future].

Negative external emotives

expressions of negative emotion with behavioral correlates

NP {whined, whimpered, pouted} to NP that S[+future].​

Positive external emotives

expressions of positive emotion with behavioral correlates

NP was {congratulated, praised, fascinated} that S.

Positive internal emotives

positive emotional states

NP was {pleased, thrilled, enthused} that S.

Preferentials 

expressions of preference for a (future) situation.

NP {hoped, wished, demanded, recommended} that S[+future/-tense]​

Negative internal emotives

negative emotional states

NP was {frightened, disgusted, infuriated} that S.​

127 of 243

128 of 243

Representationals

doxastic mental states and mental processes

NP {thought, believed, suspected} that S

Speculatives 

communication of uncertain beliefs.

NP {ventured, guessed, gossiped} that S

Future commitment

expressions of commitment to future action or result.

NP {promised, ensured, attested} S[+future]

129 of 243

130 of 243

Weak communicatives

communicative acts with weak doxastic inferences about the source.

NP {reported, remarked, yelped} to NP that S

Representationals

doxastic mental states and mental processes

NP {thought, believed, suspected} that S

Speculatives 

communication of uncertain beliefs.

NP {ventured, guessed, gossiped} that S

Future commitment

expressions of commitment to future action or result.

NP {promised, ensured, attested} S[+future]

Strong communicatives

communicative acts with strong doxastic inferences about the source.

NP {confessed, admitted, acknowledged} that S​

Discourse commitment

communicative acts committing the source to the content’s truth.

A {maintained, remarked, swore} that C would happen.

131 of 243

Negative emotive miratives

expressions of surprise with negative valence

A was {dazed, flustered, alarmed} that C would happen.

Negative external emotives

expressions of negative emotion with behavioral correlates

A {whined, whimpered, pouted} to B that C would happen.​

Positive external emotives

expressions of positive emotion with behavioral correlates

A was {congratulated, praised, fascinated} that C happened.

Positive internal emotives

positive emotional states

A was {pleased, thrilled, enthused} that C happened.

Preferentials 

expressions of preference for a (future) situation.

NP {hoped, wished, demanded, recommended} that S[+/-future]​

Negative internal emotives

negative emotional states

A was {frightened, disgusted, infuriated} that C happened.​

Negative emotive communicatives

communicative acts with broadly negative valence.

A {screamed, ranted, growled} to B that C would happen.​

132 of 243

133 of 243

Weak communicatives

communicative acts with weak doxastic inferences about the source.

NP {reported, remarked, yelped} to NP that S

Representationals

doxastic mental states and mental processes

NP {thought, believed, suspected} that S

Speculatives 

communication of uncertain beliefs.

NP {ventured, guessed, gossiped} that S

Future commitment

expressions of commitment to future action or result.

NP {promised, ensured, attested} S[+future]

Strong communicatives

communicative acts with strong doxastic inferences about the source.

NP {confessed, admitted, acknowledged} that S​

Deceptives

actions involving dishonesty, deceit, or pretense.

NP {lied, misled, faked, fabricated} ((to) NP) that S.​

Discourse commitment

communicative acts committing the source to the content’s truth.

NP {maintained, remarked, swore} that S[+future].

134 of 243

135 of 243

Interpretation

There are at least 15 distributionally correlated inference patterns.

Important Point #2

Enriching the distributional representation could increase the granularity of the patterns.

Important Point #1

Not all inference patterns instantiated by particular predicates will get their own cluster.

136 of 243

Interpretation

There are at least 15 distributionally correlated inference patterns.

Important Point #2

Enriching the distributional representation could increase the granularity of the patterns.

Important Point #1

Not all inference patterns instantiated by particular predicates will get their own cluster.

137 of 243

Inference Patterns: Discussion

138 of 243

Part 1: Inference Patterns

Do we find recurring, distributionally correlated patterns of inference when filtering noise?

Yes! Though some patterns are associated with defeasible inferences.

Part 2: Defeasible Inferences

What could it mean for an inference pattern to contain a defeasible inference?

139 of 243

Possibility #1

A pattern containing a defeasible inference implies irreducible gradience (e.g. vagueness) in the semantic content conditioning that pattern.

Possibility #2

A pattern containing a defeasible inference implies some gradient metalinguistic knowledge about the semantic content conditioning that pattern.

140 of 243

Possibility #2a

Gradient metalinguistic knowledge is knowledge about how to resolve formal indeterminacy (e.g. lexical/structural ambiguity).

Possibility #2b

Gradient metalinguistic knowledge is knowledge about contexts that a lexical item is used in.

141 of 243

Defeasible Inferences

142 of 243

Julian Grove

University of Rochester

143 of 243

Our Approach

  1. Develop a modular, minimalistic, probabilistic extension of standard compositional semantic analyses to model sources of defeasibility.
  2. Fit models assuming these different analyses to inference judgment data in order to quantitatively compare them.

144 of 243

Judith Degen

Stanford University

Judith Tonhauser

University of Stuttgart

145 of 243

Degen & Tonhauser’s Data

  1. Measure prior knowledge: rate pairings of propositions for likelihood of p2 given p1.

146 of 243

147 of 243

Empirical distribution of norming response

How likely is it that Zoe calculated the tip?

148 of 243

Degen & Tonhauser’s Data

  1. Measure prior knowledge: rate likelihood of proposition P given fact F.
  2. Measure projection: rate X’s certainty that P, given F and a context where X asks: “Did Y V that P?”

149 of 243

150 of 243

Empirical distribution of projection response

X asks: Did Y pretend that P?

Is X certain that P?

151 of 243

Empirical distribution of projection response

X asks: Did Y know that P?

Is X certain that P?

152 of 243

Defeasible Inferences�Our Models��

153 of 243

Julian Grove

University of Rochester

Jean-Philippe Bernardy

University of Gothenburg

154 of 243

Jo knew that Bo left

⟦·⟧c assigns: Jo ↦ j (type e); Bo ↦ b; left ↦ λx.LEAVE(x); that ↦ λp.p; knew ↦ λp.λx.p: BEL(x, p)

Composition: that Bo left ↦ LEAVE(b); knew that Bo left ↦ λx.LEAVE(b): BEL(x, LEAVE(b)); Jo knew that Bo left ↦ LEAVE(b): BEL(j, LEAVE(b))

155 of 243

[Derivation of ⟦Jo knew that Bo left⟧c, repeated from an earlier slide]

Maps deterministic programs to probabilistic programs structured by the compositional semantics

156 of 243

[Derivation of ⟦Jo knew that Bo left⟧c, repeated from an earlier slide]

Maps deterministic programs to probabilistic programs structured by the compositional semantics

η

Monadic return (continuation monad): η(a) = λc.c(a) : α → (α → r) → r, where r is the type of probabilities and (α → r) → r is the type of probability distributions

Threads probability distributions through the derivation
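A minimal sketch of this continuation-monadic setup (a Python stand-in, not the talk’s implementation): a probabilistic program maps a continuation of type (α → r) to a result of type r, with r the probabilities.

```python
# Continuation-monadic probability sketch: a probabilistic program
# maps a continuation (the rest of the derivation) to an expectation.

def eta(a):
    """Monadic return: eta(a) = lambda c: c(a)."""
    return lambda c: c(a)

def bind(m, f):
    """Sequence program m with a program-producing function f."""
    return lambda c: m(lambda a: f(a)(c))

def bernoulli(p):
    """A probabilistic program: expectation over {True, False}."""
    return lambda c: p * c(True) + (1 - p) * c(False)

# Thread two Bernoulli draws through a derivation and evaluate the
# probability that either holds, anticipating the projection models.
prog = bind(bernoulli(0.3), lambda tp:
       bind(bernoulli(0.5), lambda tc:
       eta(tp or tc)))

prob = prog(lambda v: 1.0 if v else 0.0)
print(round(prob, 2))  # 0.3 + 0.7 * 0.5 = 0.65
```

Applying the program to the indicator continuation extracts a probability; applying it to other continuations extracts other expectations, all while composition proceeds as usual.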

157 of 243

Chris Barker

New York University

Chung-chieh Shan

Indiana University

158 of 243

[Derivation of ⟦Jo knew that Bo left⟧c, repeated from an earlier slide]

Allows us to stack multiple layers of uncertainty while retaining compositionality!

Julian Grove

University of Rochester

159 of 243

Idea

Model fundamental gradience on the “inner layer” and metalinguistic uncertainty on the “outer”.

Upshot

The layer at which the probabilistic program is evaluated corresponds to different hypotheses about the nature of gradience.
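One way to sketch the layering idea, with hypothetical numbers: the outer layer is metalinguistic uncertainty over the verb’s projection parameter (e.g. which sense is in play), and the inner layer supplies in-context response probabilities.

```python
# Two-layer sketch (numbers hypothetical): outer layer is
# metalinguistic uncertainty about the verb's projection parameter
# pi_p; inner layer is in-context uncertainty about truth.
outer = [(0.9, 0.5), (0.2, 0.5)]  # (pi_p, weight of that hypothesis)
pi_c = 0.4                        # prior probability of the proposition

def response_prob(pi_p, pi_c):
    # Inner layer collapsed: respond true if the verb projects,
    # otherwise fall back on the prior: pi_p + (1 - pi_p) * pi_c
    return pi_p + (1 - pi_p) * pi_c

# Keeping the outer layer separate yields a mixture of two response
# tendencies; collapsing it too yields a single gradient prediction.
expected = sum(w * response_prob(p, pi_c) for p, w in outer)
per_hypothesis = [round(response_prob(p, pi_c), 2) for p, w in outer]
print(per_hypothesis, round(expected, 2))
```

Evaluating at the inner layer predicts a bimodal response distribution; evaluating at the outer layer predicts a single intermediate value, which is the contrast the model comparison exploits.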

160 of 243

[Derivation of ⟦Jo knew that Bo left⟧c, repeated from an earlier slide]

Allows us to stack multiple layers of uncertainty while retaining compositionality!

Julian Grove

University of Rochester

Importantly, stacking is not done by monad transformers!

161 of 243

Verb discrete-context discrete

Sample prior probabilities for:

πc ∼ Prob(LEAVE(b)) (whether the embedded proposition is true)
πp ∼ Prob(KNOW(j, _)) (whether the verb projects the embedded proposition)

Sample a discrete value for:

τc ∼ Bernoulli(πc) (whether the embedded proposition is true)
τp ∼ Bernoulli(πp) (whether the verb projects the embedded proposition)

Response: τp ∨ [¬τp ∧ τc]

On each response:

  1. If the verb projects, respond true.
  2. If not, respond with the truth of the embedded proposition.

162 of 243

Verb discrete-context discrete

Sample prior probabilities for:

πc ∼ Prob(LEAVE(b)) (whether the embedded proposition is true)
πp ∼ Prob(KNOW(j, _)) (whether the verb projects the embedded proposition)

Sample a discrete value for:

τc ∼ Bernoulli(πc) (whether the embedded proposition is true)
τp ∼ Bernoulli(πp) (whether the verb projects the embedded proposition)

Response: τp ∨ τc

On each response:

  1. If the verb projects, respond true.
  2. If not, respond with the truth of the embedded proposition.

163 of 243

Verb discrete-context discrete

Sample prior probabilities for:

πc ∼ Prob(LEAVE(b)) (whether the embedded proposition is true)
πp ∼ Prob(KNOW(j, _)) (whether the verb projects the embedded proposition)

Sample a discrete value for:

τc ∼ Bernoulli(πc) (whether the embedded proposition is true)
τp ∼ Bernoulli(πp) (whether the verb projects the embedded proposition)

On each response:

  1. If the verb projects, respond true.
  2. If not, respond with the truth of the embedded proposition.

As a do-block, the response is returned as η(η(τp ∨ τc)).

164 of 243

Verb gradient-context gradient

Sample prior probabilities for:

πc ∼ Prob(LEAVE(b)) (whether the embedded proposition is true)
πp ∼ Prob(KNOW(j, _)) (whether the verb projects the embedded proposition)

τc ∼ Bernoulli(πc); τp ∼ Bernoulli(πp)

On each response, produce the likelihood that the verb projects or the embedded proposition is true: τp ∨ τc

165 of 243

Verb discrete-context gradient

Sample prior probabilities for:

πc ∼ Prob(LEAVE(b)) (whether the embedded proposition is true)
πp ∼ Prob(KNOW(j, _)) (whether the verb projects the embedded proposition)

Sample a discrete value for:

τp ∼ Bernoulli(πp) (whether the verb projects the embedded proposition)

Response: τp ∨ τc

On each response:

  1. If the verb projects, respond true.
  2. If not, respond with the likelihood of the embedded proposition.
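The three response rules can be sketched side by side; the πp and πc values and the function names are mine, not the talk’s.

```python
import random

# pi_p: probability the verb projects; pi_c: prior probability of the
# embedded proposition. Values are hypothetical.
pi_p, pi_c = 0.7, 0.4

def verb_discrete_context_discrete(rng):
    # Sample both; respond with a truth value
    tau_p = rng.random() < pi_p
    tau_c = rng.random() < pi_c
    return 1.0 if (tau_p or tau_c) else 0.0

def verb_discrete_context_gradient(rng):
    # Sample only the verb; if it fails to project, fall back on the
    # likelihood of the embedded proposition
    tau_p = rng.random() < pi_p
    return 1.0 if tau_p else pi_c

def verb_gradient_context_gradient(rng):
    # Sample nothing; respond with the likelihood of the disjunction
    return pi_p + (1 - pi_p) * pi_c

# All three agree in expectation (0.7 + 0.3 * 0.4 = 0.82) but predict
# different response distributions: two-point, mixed, and point-valued.
rng = random.Random(0)
n = 100_000
for model in (verb_discrete_context_discrete,
              verb_discrete_context_gradient,
              verb_gradient_context_gradient):
    mean = sum(model(rng) for _ in range(n)) / n
    print(model.__name__, round(mean, 2))
```

Because the means coincide, it is the shape of the response distribution, not its average, that lets the data discriminate the models.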

166 of 243

Defeasible Inferences: Modeling Prior Knowledge

167 of 243

Approach

  1. Fit random effects model with parameters for mean likelihood of context and participant bias.

168 of 243

Verb discrete-context discrete

Sample prior probabilities for:

whether the verb projects the embedded proposition: πp ∼ Prob(KNOW(j, LEAVE(b)))

whether the embedded proposition is true: πc ∼ Prob(LEAVE(b))

Sample a discrete value for:

whether the verb projects the embedded proposition: τp ∼ Bernoulli(πp)

whether the embedded proposition is true: τc ∼ Bernoulli(πc)

On each response:

  1. If the verb projects, respond true.
  2. If not, respond with the truth of the embedded proposition.

Response: τp ∨ [¬τp ∧ τc]

169 of 243

Empirical distribution of norming response

How likely is it that Zoe calculated the tip?

170 of 243

Posterior distribution of mean

How likely is it that Zoe calculated the tip?

171 of 243

Approach

  1. Fit random effects model with parameters for mean likelihood of context and participant bias.
  2. Use the posterior distribution over the mean likelihood for a context as the prior distribution for that context in our models.
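As a deliberately simplified stand-in for that pipeline (my own illustration, not the talk's random-effects model), one could moment-match a Beta distribution to a context's norming responses and use it as that context's prior:

```python
import numpy as np

def beta_moment_match(responses):
    """Fit Beta(alpha, beta) to likelihood ratings in (0, 1) by matching
    the sample mean and variance; the result can serve as the prior over
    a context's mean likelihood in the inference models."""
    m = np.mean(responses)
    v = np.var(responses)
    common = m * (1.0 - m) / v - 1.0  # equals alpha + beta
    return m * common, (1.0 - m) * common
```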

172 of 243

Defeasible Inferences: Modeling Inference Judgments

173 of 243

Approach

  1. Fit random effects model using Markov Chain Monte Carlo with parameters for:
    1. mean likelihood of context (priors from norming model)
    2. mean likelihood of projection for each verb
    3. participant bias

174 of 243

Verb discrete-context discrete

Sample prior probabilities for:

whether the verb projects the embedded proposition: πp ∼ Prob(KNOW(j, LEAVE(b)))

whether the embedded proposition is true: πc ∼ Prob(LEAVE(b))

Sample a discrete value for:

whether the verb projects the embedded proposition: τp ∼ Bernoulli(πp)

whether the embedded proposition is true: τc ∼ Bernoulli(πc)

On each response:

  1. If the verb projects, respond true.
  2. If not, respond with the truth of the embedded proposition.

Response: τp ∨ [¬τp ∧ τc]

175 of 243

Approach

  1. Fit random effects model using Markov Chain Monte Carlo with parameters for:
    1. mean likelihood of context (priors from norming model)
    2. mean likelihood of projection for each verb
    3. participant bias
  2. Compare model fit using expected log pointwise predictive density (ELPD) computed using WAIC.
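For concreteness, the WAIC-based ELPD estimate can be sketched as follows. This is the standard textbook formulation in NumPy, not the talk's actual code; `log_lik` is assumed to hold pointwise log-likelihoods over posterior samples.

```python
import numpy as np

def elpd_waic(log_lik):
    """WAIC estimate of expected log pointwise predictive density.

    log_lik: array of shape (n_posterior_samples, n_observations)
             of pointwise log-likelihoods.
    """
    # Log pointwise predictive density: log of the posterior-mean likelihood.
    lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))
    # Penalty: posterior variance of each observation's log-likelihood.
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return lppd - p_waic
```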

176 of 243

Model comparison results

177 of 243

Empirical distribution of projection response

X asks: Did Y pretend that P?

Is X certain that P?

178 of 243

Empirical distribution of projection response

X asks: Did Y know that P?

Is X certain that P?

179 of 243

Posterior distribution of mean response

180 of 243

Defeasible Inferences: Additional Evaluation

181 of 243

Question
Do we see the same pattern when we take away all information about embedded clause content?

182 of 243

Approach

  1. Fit random effects model using Markov Chain Monte Carlo with parameters for:
    1. mean likelihood of context (priors from norming model)
    2. mean likelihood of projection for each verb (priors from projection model)
    3. participant bias
  2. Compare model fit using expected log pointwise predictive density (ELPD) computed using WAIC.

183 of 243

184 of 243

Model comparison results on the bleached stimuli

185 of 243

186 of 243

Model comparison results on the templatic stimuli

187 of 243

Question
Do we see the same pattern when we take away all information about embedded clause content?

Answer
Yes!

188 of 243

Defeasible Inferences: Discussion

189 of 243

Part 1: Inference Patterns
Do we find recurring, distributionally correlated patterns of inference when filtering noise?

Yes! Though some patterns are associated with defeasible inferences.

Part 2: Defeasible Inferences
What could it mean for an inference pattern to contain a defeasible inference?

190 of 243

Possibility #1
A pattern containing a defeasible inference implies irreducible gradience (e.g. vagueness) in the semantic content conditioning that pattern.

Possibility #2
A pattern containing a defeasible inference implies some gradient metalinguistic knowledge about the semantic content conditioning that pattern.

191 of 243

Part 1: Inference Patterns
Do we find recurring, distributionally correlated patterns of inference when filtering noise?

Yes! Though some patterns are associated with defeasible inferences.

Part 2: Defeasible Inferences
What could it mean for an inference pattern to contain a defeasible inference?

It’s gradient metalinguistic knowledge about sense and/or structural indeterminacy.

192 of 243

Possibility #2a
Gradient metalinguistic knowledge is knowledge about how to resolve formal indeterminacy (e.g. lexical/structural ambiguity).

Possibility #2b
Gradient metalinguistic knowledge is knowledge about contexts that a lexical item is used in.

193 of 243

Question
Could this be because there are no lexical generalizations to be had for these inferences?

Possibility #1 (see Degen & Tonhauser 2022 on veridicality inferences)
There is no discrete classification of concepts that determines these inferences. It’s all pragmatics.

Simons et al. 2010, 2017

Possibility #2
There is such a classification. It’s just obscured by noise in the measurement.

194 of 243

Possibility #2a
Gradient metalinguistic knowledge is knowledge about how to resolve formal indeterminacy (e.g. lexical/structural ambiguity).

Possibility #2b
Gradient metalinguistic knowledge is knowledge about contexts that a lexical item is used in.

195 of 243

Conclusion

196 of 243

Takeaway
We should continue to posit discrete semantic contents that pragmatics selects among.

It’s not all just pragmatics.

197 of 243

Question for Future Work #1
How might pragmatics select among alternative discrete semantic contents?

Possible Approach
Integrate an existing conventionalist account of lexically triggered inference into a probabilistic semantics framework (e.g. an alternative-based account, as in Abusch 2002, 2010).

198 of 243

Question for Future Work #2
To what extent does class-level knowledge interact with pragmatic selection processes?

Approach
Integrate a mixture model into the (hyper)priors of the probabilistic semantics model.

199 of 243

Verb discrete-context gradient

Sample prior probabilities for:

whether the verb projects the embedded proposition: πp ∼ Prob(KNOW(j, _))

whether the embedded proposition is true: πc ∼ Prob(LEAVE(b))

Sample a discrete value for:

whether the verb projects the embedded proposition: τp ∼ Bernoulli(πp)

On each response:

  1. If the verb projects, respond true.
  2. If not, respond with the likelihood of the embedded proposition.

200 of 243

Question for Future Work #3
To what extent do inference types interrelate?

Approach
Integrate probabilistic inference relationships among inference types into the model.

201 of 243

Thanks!

Supported by NSF-BCS-1748969

The MegaAttitude Project: Investigating selection and polysemy at the scale of the lexicon

202 of 243

Appendix A: Further Validation of MegaAcceptability

203 of 243

Case Study
The vast majority of about-PPs are adjuncts

Rawlins 2013, 2014

204 of 243

If

XP1 V (XP2) (XP3) about XP4

is acceptable, then

XP1 V (XP2) (XP3)

is acceptable.

205 of 243

NP _ed  

NP _ed about XP  

206 of 243

Rawlins 2014

207 of 243

NP _ed about XP  

NP _ed  

208 of 243

NP _ed about XP  

NP _ed  

209 of 243

NP _ed  

NP _ed about XP  

210 of 243

NP _ed about XP  

NP _ed  

211 of 243

NP _ed  

NP _ed about XP  

212 of 243

NP _ed about XP  

NP _ed  

213 of 243

Noise variance / acceptability variance 

Proportion violations

Independence

214 of 243

Noise variance / acceptability variance 

Proportion violations

Independence

215 of 243

NP (was) _ed  

NP (was) _ed about whether S  

216 of 243

NP (was) _ed about whether S  

NP (was) _ed  

217 of 243

NP (was) _ed about whether S  

NP (was) _ed  

218 of 243

NP (was) _ed about whether S  

NP (was) _ed  

219 of 243

Acceptability threshold 

Proportion violations

220 of 243

Noise variance / acceptability variance 

Proportion violations

Independence

221 of 243

Acceptability threshold 

Proportion violations

222 of 243

Acceptability threshold 

Proportion violations

223 of 243

Appendix B: Distribution of Inference Judgments

224 of 243

225 of 243

226 of 243

227 of 243

Appendix C: Validation of MegaIntensionality

228 of 243

Question

Is bleaching a valid method for capturing doxastic and bouletic inferences associated with a verb in a frame?

229 of 243

Challenge

Doxastic and bouletic inferences are highly sensitive to world knowledge.

230 of 243

Jo doubts that Bo left. ⇝ Jo doesn't believe that Bo left.

Jo doubts that Bo left. ⇝ Jo wants Bo to have left.

Trump doubts that he won in 2020.

Trump wants to have won in 2020.

231 of 243

Approach

  1. Norm scenarios for likelihood of prior belief or desire not conditioned on a previous sentence

232 of 243

Executives generally want their deals to go through.

Executives generally believe that their deals will go through.

Norming

233 of 243

234 of 243

Approach

  1. Norm scenarios for likelihood of prior belief or desire not conditioned on a previous sentence
  2. Test those normed scenarios in an inference task focused on 24 verbs.

235 of 243

Executives generally want their deals to go through.

Executives generally believe that their deals will go through.

Norming

The executive knew that his deal had gone through.

Contentful

236 of 243

Approach

  1. Norm scenarios for likelihood of prior belief or desire not conditioned on a previous sentence
  2. Test those normed scenarios in an inference task focused on 24 verbs.
  3. Compare to bleached variants.

237 of 243

Executives generally want their deals to go through.

Executives generally believe that their deals will go through.

Norming

The executive knew that his deal had gone through.

Contentful

A knew that C happened.

Bleached

238 of 243

239 of 243

Appendix D: Number of possible inference patterns

240 of 243

(3 veridicality inferences)^(2 matrix polarities)

×

(3 doxastic inferences)^(2 matrix polarities)

×

(3 bouletic inferences)^(2 matrix polarities)

×

2 neg-raising inferences

=

1,458 inference patterns

If any lexical knowledge relevant to any inference type is gradient (and continuous), there are an uncountable number of patterns.
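The count above can be checked directly; this is a quick sanity check with variable names of my own choosing.

```python
from math import prod

n_values_per_inference = 3   # values each inference can take
n_matrix_polarities = 2      # positive and negated matrix clause
graded_types = ["veridicality", "doxastic", "bouletic"]

# Each graded inference type contributes 3^2 = 9 combinations across
# the two matrix polarities; neg-raising contributes a factor of 2.
patterns = prod(n_values_per_inference ** n_matrix_polarities
                for _ in graded_types)
patterns *= 2                # two neg-raising inferences

assert patterns == 1458      # 3^6 * 2
```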

241 of 243

Appendix E: Principal Component Analysis

242 of 243

95% of variance

243 of 243

  1. The polarity of veridicality and doxastic inferences under negation is anticorrelated with neg-raising.
  2. The polarity of a belief presupposition about a recipient is correlated with the polarity of a desire presupposition.
  3. The valence of an emotive communicative is anticorrelated with veridicality.
  4. Bouletic inferences about the source and the target of a communication are anticorrelated with veridicality.
  5. Desire inferences about the source in a communication are anticorrelated with belief inferences about the target.
  6. Veridicality is correlated with belief inferences in the target of a communication but anticorrelated with desire inferences.