1 of 19

Sampling tonal languages

Dmitry Gerasimov, INALCO

ThoT

Nov. 16, 2023

1st ThoT workshop, Paris, Nov. 30 – Dec. 1, 2023

2 of 19

Sampling tonal languages

  1. What is the general set of all tonal languages of the world?
  2. How can we build a representative sample thereof for the purposes of the project?

3 of 19

The universe of tonal languages

  • “Modern standard Chinese, like nearly half of the world’s languages, uses pitch patterns to distinguish between one word and another.” [Ashby & Maidment 2005: 163]
  • “Probably all languages have structural intonation. However, only about half have lexical tone, …” [Gussenhoven 2004: 12]

4 of 19

The universe of tonal languages

  • “Tell someone you are writing a book about ‘Tone’, and they look blank, and yet by some estimates as much as 60–70 per cent of the world’s languages are tonal.” [Yip 2002: 1]
  • “Of the 6520 languages in the families listed in Table 2, the indications are that 3044, or 46.7 %, are tonal. <…> Moreover, given the way that the sampling was conducted this figure is more likely to be an over-estimate…” [Maddieson 2023: 12]

5 of 19

WALS

  • No tones: 307
  • Simple: 132
  • Complex: 88

6 of 19

LAPSyD

  • No tones: 503
  • Marginal: 8
  • Simple: 119
  • Moderately complex: 53
  • Complex: 68
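To set the WALS and LAPSyD counts against the roughly 40–50% estimates quoted earlier, one can compute the share of languages coded as tonal in each database. A minimal sketch; the only assumption beyond the slide counts is how LAPSyD's “Marginal” category is treated, so both options are shown:

```python
# Category counts as given on the WALS and LAPSyD slides.
WALS = {"No tones": 307, "Simple": 132, "Complex": 88}
LAPSYD = {
    "No tones": 503,
    "Marginal": 8,
    "Simple": 119,
    "Moderately complex": 53,
    "Complex": 68,
}


def tonal_share(counts, non_tonal=("No tones",)):
    """Fraction of languages whose category is not listed in `non_tonal`."""
    total = sum(counts.values())
    tonal = sum(n for category, n in counts.items() if category not in non_tonal)
    return tonal / total


print(f"WALS: {tonal_share(WALS):.1%}")  # 41.7%
# Whether "Marginal" counts as tonal is our choice here, not LAPSyD's.
print(f"LAPSyD, marginal as tonal: {tonal_share(LAPSYD):.1%}")  # 33.0%
print(f"LAPSyD, marginal as non-tonal: "
      f"{tonal_share(LAPSYD, non_tonal=('No tones', 'Marginal')):.1%}")  # 32.0%
```

On these counts WALS gives roughly 42% and LAPSyD roughly 32–33%; neither database was designed as a probability sample of tonal languages.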

7 of 19

Larry Hyman’s database

665 tonal languoids

8 of 19

Larry Hyman’s database

665 tonal languoids, of which I’ve been able to code 627

99 top-level families (per Glottolog), including 60 represented by a single language
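Family counts like these can be derived from a languoid-to-family table. A minimal sketch, assuming each coded languoid has already been mapped to its Glottolog top-level family; the function name and data layout are hypothetical, and the toy input below is not Hyman's data:

```python
from collections import Counter


def family_profile(languoid_to_family):
    """Count top-level families and those represented by a single languoid.

    `languoid_to_family` maps each coded languoid (e.g. by ISO 639-3 code)
    to its top-level family name (e.g. as given by Glottolog).
    """
    family_sizes = Counter(languoid_to_family.values())
    singletons = sum(1 for size in family_sizes.values() if size == 1)
    return len(family_sizes), singletons


# Toy input for illustration only.
toy = {
    "bcp": "Atlantic-Congo",
    "bcs": "Atlantic-Congo",
    "bcq": "Ta-Ne-Omotic",
    "bcw": "Afro-Asiatic",
}
print(family_profile(toy))  # (3, 2)
```

Run over the 627 coded languoids, a function of this kind should return (99, 60), provided it matches the counting behind the slide.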

9 of 19

[Hammarström & Mannby, forthc.]

  • Over 10K digitized sources on 4587 languages.
  • Automated search for “tone”, “tonal”, etc. and their translational equivalents.
  • If the number of hits in a source exceeds a certain threshold, that source counts as Tone = TRUE.
  • Each language is classified according to the majority of its sources.
  • Manual check of 600 languages.
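The threshold-and-majority procedure above can be sketched as follows. The English-only keyword pattern, the threshold value and the tie-breaking rule are our assumptions; the slide does not specify them, and the actual study also searches equivalents in other languages of description:

```python
import re
from collections import Counter

# Hypothetical keyword pattern (English only).
KEYWORDS = re.compile(r"\b(?:tones?|tonal|tonemes?)\b", re.IGNORECASE)


def source_says_tonal(text, threshold):
    """A source votes 'tonal' if its keyword hits exceed the threshold."""
    return len(KEYWORDS.findall(text)) > threshold


def language_is_tonal(source_texts, threshold):
    """Majority vote over all sources for one language.

    Ties are resolved as non-tonal (an assumption).
    """
    votes = Counter(source_says_tonal(text, threshold) for text in source_texts)
    return votes[True] > votes[False]


src_a = "High tone and low tone contrast; the tonal system has two tones."
src_b = "Each syllable bears a tone; tonal melodies and floating tones occur."
src_c = "Vowel harmony and stress are discussed at length."

print(language_is_tonal([src_a, src_b, src_c], threshold=2))  # True
print(language_is_tonal([src_a, src_c], threshold=2))  # False (tie)
```

Word boundaries in the pattern keep words such as “stone” or “intonation” from being counted as hits.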

10 of 19

[Hammarström & Mannby, forthc.]

  • 1865 languages identified as “tonal” (ca. 40% of 4587).
  • Manual check of 600 languages found a 9% error rate (54 of 600):
    • 28 false positives
    • 26 false negatives
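The 9% figure combines both error types. Because false positives and false negatives nearly balance, their net effect on the overall tonal share is small, assuming the 600 checked languages are representative of the whole (the slides do not say how they were chosen):

```python
N_LANGUAGES = 4587
N_TONAL_AUTO = 1865
N_CHECKED, FALSE_POS, FALSE_NEG = 600, 28, 26

auto_share = N_TONAL_AUTO / N_LANGUAGES
# Any misclassification, in either direction, counts as an error.
error_rate = (FALSE_POS + FALSE_NEG) / N_CHECKED
# Positive net bias means the automatic method over-counts tonal languages.
net_bias = (FALSE_POS - FALSE_NEG) / N_CHECKED

print(f"automatic tonal share: {auto_share:.1%}")  # 40.7%
print(f"error rate: {error_rate:.0%}")  # 9%
print(f"net bias: {net_bias:+.1%}")  # +0.3%
```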

11 of 19

Language sampling

  • Convenience sample: grabbing whatever languages are within your reach;
  • Variety sample: capturing as much linguistic diversity as possible;
  • Probability sample: for testing quantitative hypotheses;
    • e.g. in terms of TDI, do languages cluster around the four loci (omnisyllabic, tonemic, etc.), or do they form a gradual continuum?

12 of 19

Language sampling

  • We primarily want a variety sample;
  • While still keeping an eye open for a potential probability (sub)sample (cf. [Dahl 2008; Miestamo et al. 2016; Guzmán Naranjo & Becker 2022], inter alia);
  • However, lack of sufficient descriptions may push us towards something more like a convenience sample.

13 of 19

Building our sample

  • Compiling information from WALS, LAPSyD, LH (Larry Hyman’s database) and HH&EM (Hammarström & Mannby).
  • Special attention to cases where different databases disagree.
  • Removing apparent false positives from HH&EM.

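Excerpt from the compiled data (language, ISO 639-3 code, Glottolog family: values recorded in the databases):

  • Bali (Democratic Republic of Congo), bcp, Atlantic-Congo: True
  • Bench, bcq, Ta-Ne-Omotic: True; Complex; Y
  • Babine, bcr, Athabaskan-Eyak-Tlingit: False
  • Kohumono, bcs, Atlantic-Congo: True; Moderately
  • Awad Bing, bcu, Austronesian: True; Y
  • Bana, bcw, Afro-Asiatic: True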
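The compilation step, with disagreements flagged for manual review, could look like the minimal sketch below. The function name and data layout are hypothetical, graded values such as “Simple” or “Complex” are assumed to have been collapsed to True beforehand, and the language IDs and values are placeholders:

```python
def merge_codings(codings):
    """Merge per-database tone codings and list the languages they disagree on.

    `codings` maps a database label to {language_id: has_tone}.
    Returns (merged, disagreements), where merged maps each language to
    {database: has_tone} and disagreements is a sorted list of language IDs.
    """
    merged = {}
    for db, coding in codings.items():
        for lang, has_tone in coding.items():
            merged.setdefault(lang, {})[db] = has_tone
    disagreements = sorted(
        lang for lang, row in merged.items() if len(set(row.values())) > 1
    )
    return merged, disagreements


# Toy codings with placeholder language IDs (hypothetical values).
codings = {
    "WALS": {"lang_a": True, "lang_b": False},
    "HH&EM": {"lang_a": True, "lang_b": True, "lang_c": True},
}
merged, disagreements = merge_codings(codings)
print(disagreements)  # ['lang_b']
print(merged["lang_c"])  # {'HH&EM': True}
```

Languages attested in only one database (like `lang_c`) never show up as disagreements, so they need a separate check, e.g. for HH&EM false positives.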

14 of 19

Building our sample

  • The databases give us over 160 first-level families (per Glottolog) presumably containing tonal languages.
  • We also decided to split major tone-rich families (Atlantic-Congo, Afro-Asiatic, Sino-Tibetan, Nuclear Trans New Guinea), bringing the total number of taxa well over 220.
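Splitting the large families amounts to using a lower classification node as the sampling taxon for those families only. A minimal sketch, assuming the split is made at the first level below the family (the slide does not say at which level it was made); the function names are hypothetical:

```python
# The four families split into smaller taxa, by Glottolog name.
SPLIT_FAMILIES = {
    "Atlantic-Congo",
    "Afro-Asiatic",
    "Sino-Tibetan",
    "Nuclear Trans New Guinea",
}


def taxon_of(path):
    """Map a classification path (top-level family first) to a sampling taxon."""
    family = path[0]
    if family in SPLIT_FAMILIES and len(path) > 1:
        return f"{family} > {path[1]}"
    return family


def count_taxa(paths):
    """Number of distinct sampling taxa across all languages."""
    return len({taxon_of(p) for p in paths})


print(taxon_of(["Afro-Asiatic", "Chadic"]))  # Afro-Asiatic > Chadic
print(taxon_of(["Ta-Ne-Omotic"]))  # Ta-Ne-Omotic
```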

15 of 19

Building our sample

  • But:
    • Some taxa prove to contain only false positives from HH&EM;
    • Some small taxa prove to be critically underdescribed;
    • At some point it makes sense to switch to a less fine-grained genetic classification, such as that of Ethnologue (cf. [Miestamo et al. 2016]).

16 of 19

Building our sample

  • Of the resulting taxa (well over 220), about 80 contain a single presumably tonal language.
  • The number of smaller taxa that make it into the sample determines how many “slots” we need to distribute among larger, more diverse taxa.
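One way to turn this into numbers: every admitted single-language taxon takes one slot, and whatever remains is shared among the larger taxa. The sketch below gives each larger taxon one slot and distributes the rest in proportion to its number of tonal languages by the largest-remainder method. Proportional-to-size allocation is our assumption, not the project's rule; a variety sample might instead weight taxa by internal genealogical diversity (cf. [Miestamo et al. 2016]):

```python
def allocate_slots(total_slots, singleton_taxa, large_taxa_sizes):
    """Distribute sample slots between single-language and larger taxa.

    Sizes in `large_taxa_sizes` are assumed to be positive counts.
    """
    remaining = total_slots - len(singleton_taxa)
    if remaining < len(large_taxa_sizes):
        raise ValueError("too few slots for one language per larger taxon")

    # One guaranteed slot per larger taxon, then proportional quotas.
    allocation = {taxon: 1 for taxon in large_taxa_sizes}
    extra = remaining - len(large_taxa_sizes)
    total_size = sum(large_taxa_sizes.values())
    quotas = {t: extra * n / total_size for t, n in large_taxa_sizes.items()}
    for taxon, quota in quotas.items():
        allocation[taxon] += int(quota)

    # Leftover slots go to the largest fractional remainders
    # (ties keep input order, since Python's sort is stable).
    leftover = extra - sum(int(q) for q in quotas.values())
    by_remainder = sorted(
        quotas, key=lambda t: quotas[t] - int(quotas[t]), reverse=True
    )
    for taxon in by_remainder[:leftover]:
        allocation[taxon] += 1

    allocation.update({taxon: 1 for taxon in singleton_taxa})
    return allocation


# Toy figures, for illustration only.
print(allocate_slots(10, ["A", "B"], {"X": 60, "Y": 30, "Z": 10}))
# {'X': 4, 'Y': 3, 'Z': 1, 'A': 1, 'B': 1}
```

Each single-language taxon admitted to the sample reduces the slots left for the larger taxa by one, which is the trade-off the slide describes.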

17 of 19

Building our sample

  • A manual check of 25 “one-language taxa”:
    • 7 are “false positives”;
    • 6 we can’t use because of scarcity of description;
    • 7 are suitable;
    • 5 are pending further consideration.

18 of 19

References

  • Ashby, M. & Maidment, J. (2005). Introducing Phonetic Science. Cambridge: CUP.
  • Dahl, Ö. (2008). An exercise in a posteriori language sampling. STUF 61(3): 208–220.
  • Gussenhoven, C. (2004). The Phonology of Tone and Intonation. Cambridge: CUP.
  • Guzmán Naranjo, M. & Becker, L. (2022). Statistical bias control in typology. Linguistic Typology 26(3): 605–670.
  • Hammarström, H. & Mannby, E. (forthc.). On the world-wide distribution of tonal languages.
  • Maddieson, I. (2013). Tone. Chapter 13A in WALS.

19 of 19

References

  • Maddieson, I. (2023). Tone is not predominant; Tone is not primordial. Proceedings of ICPhS 2023.
  • Miestamo, M., Bakker, D. & Arppe, A. (2016). Sampling for variety. Linguistic Typology 20(2): 233–296.
  • Yip, M. (2002). Tone. Cambridge: CUP.
