1 of 26

The Role of Verb Semantics in Hungarian Verb-Object Order

Dora Demszky

Stanford University

ddemszky@stanford.edu

January 8, 2021

LSA Annual Meeting

2 of 26

Talk Goal

To present evidence from a large-scale corpus analysis that in Hungarian, despite its status as the paradigm discourse-configurational language, the verb's lexical semantics has a significant effect on the relative order of the verb and its object.

3 of 26

Background on Hungarian

  • canonical example of a discourse-configurational language
    • word order is constrained by fixed topic and focus positions

4 of 26

Background on Hungarian

  • canonical example of a discourse-configurational language

‘Joe loves Sarah’

SVO   Józsi szereti Sárit
SOV   Józsi Sárit szereti
VSO   Szereti Józsi Sárit
OVS   Sárit szereti Józsi
OSV   Sárit Józsi szereti
VOS   Szereti Sárit Józsi

5 of 26

Background on Hungarian

  • canonical example of a discourse-configurational language

CONTEXT: Kit szeret Józsi? ‘Who does Joe love?’

‘Joe loves Sarah’

SVO   Józsi szereti Sárit
SOV   Józsi Sárit szereti
VSO   Szereti Józsi Sárit
OVS   Sárit szereti Józsi
OSV   Sárit Józsi szereti
VOS   Szereti Sárit Józsi

6 of 26

Background on Hungarian

  • canonical example of a discourse-configurational language

CONTEXT: Kit szeret Józsi? ‘Who does Joe love?’

SVO   Józsi szereti Sárit
SOV   Józsi Sárit szereti
VSO   Szereti Józsi Sárit
OVS   Sárit szereti Józsi
OSV   Sárit Józsi szereti
VOS   Szereti Sárit Józsi

focus is preverbal
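
The constraint on this slide can be illustrated with a small sketch. It is a deliberate simplification: it only checks whether the focused constituent sits immediately before the verb, and says nothing about topics or other discourse factors. The function names and the one-letter role labels are assumptions made for this example.

```python
from itertools import permutations
from typing import List

# Role -> word form for 'Joe loves Sarah' (Sárit carries accusative case).
WORDS = {"S": "Józsi", "V": "szereti", "O": "Sárit"}


def focus_is_preverbal(order: str, focus: str) -> bool:
    """Return True if the focused role immediately precedes the verb."""
    verb_index = order.index("V")
    return verb_index > 0 and order[verb_index - 1] == focus


def orders_with_preverbal_focus(focus: str) -> List[str]:
    """List the orders (e.g. 'SOV') that place `focus` right before the verb."""
    orders = ["".join(p) for p in permutations("SVO")]
    return [o for o in orders if focus_is_preverbal(o, focus)]


# Context: Kit szeret Józsi? 'Who does Joe love?' -> the object is in focus.
for order in orders_with_preverbal_focus("O"):
    print(order, " ".join(WORDS[role] for role in order))
```

With the object in focus, the sketch keeps SOV and OVS, the two orders in which Sárit directly precedes szereti.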

7 of 26

Background on Hungarian

  • canonical example of a discourse-configurational language
    • word order is constrained by fixed topic and focus positions
  • but verbs exhibit discourse-independent ordering preferences (Komlósy, 1989)

OV preferring (focus-avoiding verbs): talál ‘find’, marad ‘remain’, tartalmaz ‘contain’

vs.

VO preferring (focus-preferring verbs): tud ‘know’, utál ‘hate’, emlékszik ‘remember’

8 of 26

Background on Hungarian

  • canonical example of a discourse-configurational language
    • word order is constrained by fixed topic and focus positions
  • but verbs exhibit discourse-independent ordering preferences (Komlósy, 1989)

OV preferring (focus-avoiding verbs): talál ‘find’, marad ‘remain’, tartalmaz ‘contain’

vs.

VO preferring (focus-preferring verbs): tud ‘know’, utál ‘hate’, emlékszik ‘remember’

Hypothesis: Lexical semantics influences verbs’ ordering preference.

9 of 26

Method

  1. Extract verb-object pairs from the Hungarian Gigaword Corpus (Oravecz et al., 2014)

Józsi szereti Sárit. ‘Joe loves Sarah.’
direct obj: szereti → Sárit

Parser: Stanford CoreNLP dependency parser (Qi et al., 2018)

10 of 26

Method

  1. Extract verb-object pairs from the Hungarian Gigaword Corpus (Oravecz et al., 2014)
    • 380 unique verb lemmas [types]
    • ~1.3M verb-object pairs [tokens]

Józsi szereti Sárit. ‘Joe loves Sarah.’
direct obj: szereti → Sárit

Verb lemma       Object lemma     VO?
szeret ‘love’    Sári ‘Sarah’     yes
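
The extraction step can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes the parser output has already been converted into CoNLL-U-style tokens, that direct objects carry the Universal Dependencies label `obj`, and that the hand-written parse below is correct. The names `Token`, `VerbObjectPair`, and `extract_pairs` are hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    id: int        # 1-based position in the sentence
    form: str
    lemma: str
    upos: str
    head: int      # id of the syntactic head (0 = root)
    deprel: str


class VerbObjectPair(NamedTuple):
    verb_lemma: str
    object_lemma: str
    is_vo: bool    # True if the verb precedes its object


def extract_pairs(sentence: List[Token]) -> List[VerbObjectPair]:
    """Collect (verb lemma, object lemma, VO?) for every direct object."""
    by_id = {tok.id: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        head = by_id.get(tok.head)
        if tok.deprel == "obj" and head is not None and head.upos == "VERB":
            pairs.append(VerbObjectPair(head.lemma, tok.lemma, head.id < tok.id))
    return pairs


# Hand-written parse of 'Józsi szereti Sárit.' (assumed, not parser output).
SENTENCE = [
    Token(1, "Józsi", "Józsi", "PROPN", 2, "nsubj"),
    Token(2, "szereti", "szeret", "VERB", 0, "root"),
    Token(3, "Sárit", "Sári", "PROPN", 2, "obj"),
    Token(4, ".", ".", "PUNCT", 2, "punct"),
]

print(extract_pairs(SENTENCE))
```

For the example sentence this yields one pair, szeret / Sári, marked as VO because the verb precedes its object.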

11 of 26

Method

  1. Extract verb-object pairs from the Hungarian Gigaword Corpus (Oravecz et al., 2014)
  2. Group verbs into 11 semantic classes
    • Activity (8)
    • Affect (91)
    • Change of State/Location (110)
    • Creation/Representation (50)
    • Evaluation/Experience (56)
    • Ingestion (11)
    • Ownership (4)
    • Perception (6)
    • Preference (5)
    • Spatial Configuration (19)
    • Other (18)

12 of 26

Method

  1. Extract verb-object pairs from the Hungarian Gigaword Corpus (Oravecz et al., 2014)
  2. Group verbs into 11 semantic classes
  3. Extract control features via dependency parsing
    • object definiteness
    • object NP weight
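
One possible way to derive the two control features from a dependency parse is sketched below. The slides do not say how the features were operationalized, so both definitions are assumptions: NP weight is taken to be the number of tokens in the object's subtree, and an object counts as definite if it has a determiner dependent with the lemma `a` or `az`. The example parse is hand-written.

```python
from typing import Dict, List, NamedTuple, Set


class Token(NamedTuple):
    id: int
    form: str
    lemma: str
    upos: str
    head: int
    deprel: str


# Assumption: the Hungarian definite article has the lemmas 'a' / 'az'.
DEFINITE_ARTICLES = {"a", "az"}


def subtree_ids(sentence: List[Token], root_id: int) -> Set[int]:
    """Return the ids of `root_id` and all tokens it dominates."""
    children: Dict[int, List[int]] = {}
    for tok in sentence:
        children.setdefault(tok.head, []).append(tok.id)
    seen: Set[int] = set()
    stack = [root_id]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def np_weight(sentence: List[Token], object_id: int) -> int:
    """Assumed weight measure: number of tokens in the object NP."""
    return len(subtree_ids(sentence, object_id))


def is_definite(sentence: List[Token], object_id: int) -> bool:
    """Assumed definiteness test: the object has a definite-article child."""
    return any(
        tok.head == object_id and tok.upos == "DET" and tok.lemma in DEFINITE_ARTICLES
        for tok in sentence
    )


# 'Józsi szereti a szép könyvet.' 'Joe loves the beautiful book.'
SENTENCE = [
    Token(1, "Józsi", "Józsi", "PROPN", 2, "nsubj"),
    Token(2, "szereti", "szeret", "VERB", 0, "root"),
    Token(3, "a", "a", "DET", 5, "det"),
    Token(4, "szép", "szép", "ADJ", 5, "amod"),
    Token(5, "könyvet", "könyv", "NOUN", 2, "obj"),
    Token(6, ".", ".", "PUNCT", 2, "punct"),
]

print(np_weight(SENTENCE, 5), is_definite(SENTENCE, 5))
```

A fuller treatment of definiteness would also need to cover proper names, pronouns, and possessed nouns, which this sketch ignores.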

13 of 26

Method

  1. Extract verb-object pairs from the Hungarian Gigaword Corpus (Oravecz et al., 2014)
  2. Group verbs into 11 semantic classes
  3. Extract control features via dependency parsing
  4. Run logistic regression to predict ordering of verb-object pairs
    • Split data into training (50%) and held-out test (50%) sets

14 of 26

Method

  1. Extract verb-object pairs from the Hungarian Gigaword Corpus (Oravecz et al., 2014)
  2. Group verbs into 11 semantic classes
  3. Extract control features via dependency parsing
  4. Run logistic regression to predict ordering of verb-object pairs
    • Split data into training (50%) and held-out test (50%) sets
    • Estimate predictive power of each feature via test set accuracy
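
The evaluation in step 4 can be sketched in plain Python. Everything concrete here is an assumption made for illustration: the toy data are invented, the class labels `psych` and `stative` are placeholders, the 50/50 split is a deterministic interleaving standing in for a random split, and one model is fitted per feature by batch gradient descent.

```python
import math
from collections import Counter


def make_rows(cls, definiteness, is_vo, n):
    """Build n identical toy observations (invented data)."""
    return [{"semantic_class": cls, "definiteness": definiteness, "is_vo": is_vo}] * n


DATA = (
    make_rows("psych", "def", 1, 20) + make_rows("psych", "indef", 1, 12)
    + make_rows("psych", "def", 0, 4) + make_rows("psych", "indef", 0, 4)
    + make_rows("stative", "def", 1, 8) + make_rows("stative", "indef", 1, 2)
    + make_rows("stative", "def", 0, 12) + make_rows("stative", "indef", 0, 18)
)
# 50% training, 50% held-out test (interleaved instead of shuffled).
TRAIN, TEST = DATA[0::2], DATA[1::2]


def one_hot(rows, feature, values):
    """One-hot encode a categorical feature and append a bias term."""
    return [[1.0 if row[feature] == v else 0.0 for v in values] + [1.0] for row in rows]


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights by batch gradient descent on log-loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-dot(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def accuracy_with_feature(feature):
    """Train on TRAIN with a single feature and score on TEST."""
    values = sorted({row[feature] for row in TRAIN})
    w = fit_logreg(one_hot(TRAIN, feature, values), [r["is_vo"] for r in TRAIN])
    preds = [1 if dot(w, xi) >= 0 else 0 for xi in one_hot(TEST, feature, values)]
    return sum(p == r["is_vo"] for p, r in zip(preds, TEST)) / len(TEST)


def majority_baseline():
    """Always predict the most frequent training label."""
    majority = Counter(r["is_vo"] for r in TRAIN).most_common(1)[0][0]
    return sum(r["is_vo"] == majority for r in TEST) / len(TEST)


print("baseline", majority_baseline())
for feature in ("definiteness", "semantic_class"):
    print(feature, accuracy_with_feature(feature))
```

Comparing each single-feature accuracy against the majority baseline gives a rough measure of how much that feature alone predicts the ordering. The numbers produced by this toy data are not the study's results.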

15 of 26

Regression Results

Feature                            Accuracy
No features (majority baseline)    53%
Object NP weight                   55%
Object definiteness                58%
Semantic class                     65%

All pairwise differences among accuracies are significant (p < 0.001, two-sample t-test)

16 of 26

Ordering Preference of Verb Classes

OV: most stative verbs (except psych verbs)

  • Activity (8)
  • Affect (91)
  • Change of State/Location (110)
  • Creation/Representation (50)
  • Evaluation/Experience (56)
  • Ingestion (11)
  • Ownership (4)
  • Perception (6)
  • Preference (5)
  • Spatial Configuration (19)
  • Other (18)

17 of 26

Ordering Preference of Verb Classes

OV: most stative verbs (except psych verbs)

VO: most non-stative verbs

  • Activity (8)
  • Affect (91)
  • Change of State/Location (110)
  • Creation/Representation (50)
  • Evaluation/Experience (56)
  • Ingestion (11)
  • Ownership (4)
  • Perception (6)
  • Preference (5)
  • Spatial Configuration (19)
  • Other (18)
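
A class-level ordering preference of this kind could be read off the extracted pairs by computing the share of VO tokens per semantic class. The sketch below is an assumption about how to do that, not the analysis behind the slide: the token counts are invented, and the 0.5 cut-off and the names `vo_share_by_class` and `preference` are hypothetical.

```python
from collections import defaultdict

# Invented (semantic class, is_vo) observations for illustration only.
TOKENS = (
    [("Spatial Configuration", False)] * 7
    + [("Spatial Configuration", True)] * 3
    + [("Evaluation/Experience", True)] * 8
    + [("Evaluation/Experience", False)] * 2
)


def vo_share_by_class(tokens):
    """Return the proportion of VO tokens for each semantic class."""
    counts = defaultdict(lambda: [0, 0])  # class -> [VO count, total count]
    for cls, is_vo in tokens:
        counts[cls][0] += int(is_vo)
        counts[cls][1] += 1
    return {cls: vo / total for cls, (vo, total) in counts.items()}


def preference(share, threshold=0.5):
    """Label a class as VO-preferring if its VO share exceeds the threshold."""
    return "VO" if share > threshold else "OV"


for cls, share in vo_share_by_class(TOKENS).items():
    print(cls, round(share, 2), preference(share))
```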

18 of 26

Conclusion

Our findings

  1. suggest that lexical semantic factors play a significant role in Hungarian verb-object order,
  2. demonstrate how computational tools can be used to test similar questions at a large scale.

19 of 26

Thank you!

Advisors: Beth Levin, Dan Jurafsky, László Kálmán

20 of 26

References

Komlósy, A. (1989). Fókuszban az igék [Verbs in focus]. Általános Nyelvészeti Tanulmányok, 17, 171–82.

Oravecz, C., Váradi, T., & Sass, B. (2014). The Hungarian Gigaword Corpus. In LREC (pp. 1719–1723).

Bresnan, J., Cueni, A., Nikitina, T., & Baayen, R. H. (2007). Predicting the dative alternation. In Cognitive Foundations of Interpretation (pp. 69–94).

Benor, S. B., & Levy, R. (2006). The chicken or the egg? A probabilistic analysis of English binomials. Language, 82(2), 233–278.

Levin, B. (1993). English verb classes and alternations: A preliminary investigation. University of Chicago Press.

Wasow, T. (1997). Remarks on grammatical weight. Language Variation and Change, 9(1), 81–105.

Qi, P., Dozat, T., Zhang, Y., & Manning, C. D. (2018). Universal dependency parsing from scratch. In CoNLL (pp. 160–170).

Trón, V., Gyepesi, G., Halácsy, P., Kornai, A., Németh, L., & Varga, D. (2005). HunMorph: open source word analysis. In Proceedings of Workshop on Software (pp. 77–85).

21 of 26

Verb Classes

Verb Class                  Count  Examples
ACTIVITY                    8      keres ‘search for’, firtat ‘dwell on’, foglalkoztat ‘employ’
AFFECT                      91     tisztít ‘clean’, vereget ‘hit at’, sürget ‘urge’
CHANGE OF STATE / LOCATION  110    aktivál ‘activate’, érlel ‘ripen’, mélyít ‘deepen, aggravate’
CREATION/REPRESENTATION     50     alkot ‘create’, szemléltet ‘illustrate’, szaporít ‘breed’
EVALUATION/EXPERIENCE       56     gyűlöl ‘hate’, csodál ‘admire’, un ‘be bored by’
INGESTION                   11     fogyaszt ‘consume’, fal ‘devour’, kortyol ‘take sips of’
OWNERSHIP                   4      birtokol ‘own, possess’, érdemel ‘deserve’, illet ‘belong to’
PERCEPTION                  6      hall ‘hear’, vizsgál ‘examine’, érzékel ‘perceive’
PREFERENCE                  5      preferál ‘prefer’, választ ‘choose’, latolgat ‘ponder on’
SPATIAL CONFIGURATION       19     övez ‘surround’, óv ‘guard’, tartalmaz ‘contain’
OTHER                       18     szerkeszt ‘edit’, dédelget ‘fondle, pamper’, hallat ‘make heard’


22 of 26

Avenues for Future Work

  • extend the analysis to a larger number of verbs
    • e.g. ones with multiple arguments
  • investigate the role of additional features
    • e.g. animacy and humanness of arguments
  • better understand the relationship among different factors that influence Hungarian word order
    • discourse context, lexical semantics, cognitive constraints

26 of 26

Background on Hungarian

  • canonical example of a discourse-configurational language

CONTEXT: Ki szereti Sárit? ‘Who loves Sarah?’

SVO   Józsi szereti Sárit
SOV   Józsi Sárit szereti
VSO   Szereti Józsi Sárit
OVS   Sárit szereti Józsi
OSV   Sárit Józsi szereti
VOS   Szereti Sárit Józsi

focus