1 of 42

Is it social or linguistic? Examining internal factors in language change

Ian Stewart

@alethioguy

New Methods in Computational Sociolinguistics

5 Nov 2018

2 of 42

Disclaimer

Not sure what counts as “sociolinguistics” vs. language evolution vs. diachronic...

¯\_(ツ)_/¯

3 of 42

Language change: internal factors?

Synchronic patterns of language variation can lead to diachronic effects in the overall system.

Transmission: children acquire a phonetic variable from their parents, modify it, and pass it to the next generation => sound change.

4 of 42

Language change: internal factors?

In sound change, the effect of internal factors is well established, as in chain shifts: when vowel A moves closer to vowel B, B shifts in turn, and so on.

What about other levels of language? Syntax, semantics, lexicon, etc.

6 of 42

Internal factors: morphology

Successful lexical innovations often exhibit morphological/derivational diversity. (Kershaw, Rose and Stacey 2016)

7 of 42

Internal factors: semantics

Volatility in semantics often predicts significant lexical change more readily than frequency alone (Kulkarni et al. 2016, Stewart et al. 2017).

8 of 42

Internal factors: context

Extension to more diverse contexts leads to intensifier change. (Ito and Tagliamonte, 2003)

9 of 42

Internal factors: some questions

  1. Does collocation diversity help the adoption of nonstandard lexical variables?
  2. How much does lexical “niche” influence the adoption of lexical variables? Ex. “rivalries” between words in the same niche.
  3. Is linguistic flexibility modulated by individual preferences? Ex. speaker A uses an innovation only in a fixed context, while speaker B uses it in a variety of contexts.

11 of 42

Making “fetch” happen (abridged)

12 of 42

Language change: social factors

The internet is a breeding ground for nonstandard words and phrases.

Most theories of innovation adoption focus on social factors: the more people who use a word, the more likely it is to be adopted in the future.

[Figure: usage of hella and af across subreddits (r/Atlanta, r/SanFrancisco, r/Chicago)]

13 of 42

Language change: linguistic factors

Language is not like other innovations! A word’s usage is constrained by linguistic systems such as syntax and semantics.

A word may be adopted because of its dissemination across multiple topical or grammatical contexts.

af: “sweet af bro”, “it’s hot af”, “you’re funny af”

hella: “it’s hella hot”

14 of 42

Language change: comparison

Does the social context of a word influence its adoption more than its linguistic context?

Social context of af: r/Atlanta, r/SanFrancisco, r/Chicago

Linguistic context of af: “sweet af bro”, “it’s hot af”, “you’re funny af”

social < linguistic?

16 of 42

Research questions

  1. Does social dissemination increase the likelihood of nonstandard word growth or decline?
  2. Does linguistic dissemination
    1. increase the likelihood of nonstandard word growth or decline?
    2. increase the likelihood of nonstandard word growth or decline, when controlling for social dissemination?

17 of 42

Roadmap

Detect and characterize nonstandard, non-topical words (e.g. <kinda>) that grow and decline over a long period of time.

Quantify social and linguistic dissemination.

Test the influence of dissemination on growth and decline likelihood.

18 of 42

Data processing

2013-2016

English-language subreddits

Monthly bins

Vocabulary = top 100K unigrams

  • open-access
  • reproducible
  • social factors: user, subreddit, thread
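
The binning and vocabulary steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the `(created_utc, body)` tuple layout, whitespace tokenization, and the function names are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

def monthly_unigram_counts(comments):
    """Map 'YYYY-MM' -> Counter of lowercased whitespace tokens.

    `comments` is an iterable of (created_utc, body) pairs; this layout
    is an assumption for the sketch, not the study's actual schema.
    """
    bins = defaultdict(Counter)
    for created_utc, body in comments:
        month = datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")
        bins[month].update(body.lower().split())
    return bins

def top_k_vocabulary(bins, k):
    """Return the k most frequent unigrams summed over all months."""
    total = Counter()
    for counts in bins.values():
        total.update(counts)
    return [word for word, _ in total.most_common(k)]

# Toy comments (invented for illustration).
comments = [
    (1356998400, "idk that was hella good"),  # 2013-01-01 UTC
    (1359676800, "idk tho"),                  # 2013-02-01 UTC
    (1359763200, "that was good af"),         # 2013-02-02 UTC
]
bins = monthly_unigram_counts(comments)
vocab = top_k_vocabulary(bins, k=3)  # the slides use k = 100K
```

Keeping one counter per month makes it cheap to normalize each word's count by the month's total tokens later on.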

19 of 42

Identifying nonstandard words

Find words with either:
(1) monotonic growth (Spearman coefficient), or
(2) growth followed by decline (piecewise regression, logistic).

Qualitatively filter out “newspaper” words (standard).
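
A rough sketch of criterion (1): correlate a word's monthly frequency with time using a Spearman coefficient. The helper names and the toy series are hypothetical, and no cutoff is given on the slide, so none is hard-coded here. The piecewise and logistic fits for criterion (2) are not shown.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def spearman_with_time(freqs):
    """Spearman correlation between month index and monthly frequency."""
    return pearson(ranks(list(range(len(freqs)))), ranks(freqs))

# Toy monthly relative frequencies (invented for illustration).
growing = [1e-6, 2e-6, 2.5e-6, 4e-6, 7e-6, 9e-6]
noisy = [3e-6, 1e-6, 4e-6, 1e-6, 5e-6, 2e-6]
rho_growing = spearman_with_time(growing)
rho_noisy = spearman_with_time(noisy)
```

A word would count as a monotonic-growth candidate when its coefficient is close to 1; where exactly to draw that line is a choice this sketch leaves open.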

20 of 42

Nonstandard word examples

Word        Gloss               Formation type
idk         I don’t know        acronym
shitpost    low-quality post    compound
tho         though              clipping
eyebleach   pleasing image(s)   compound
trashy      undesirable         derivation
wot         what                respelling

Trajectory labels: growth, decline

21 of 42

Social dissemination

Observed social count normalized by expected count. (Altmann et al. 2011)

Compute for: users (DU), subreddits (DS), threads (DT).
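
One way to read "observed normalized by expected" is sketched below, assuming a Poisson-style baseline in the spirit of Altmann et al. (2011): a unit (user, subreddit, or thread) that contributes m tokens is expected to use a word of overall rate f_w at least once with probability 1 - exp(-f_w * m). The function name, argument layout, and toy counts are assumptions for illustration only.

```python
import math

def dissemination(word_count_by_unit, tokens_by_unit):
    """Observed / expected number of units that use the word.

    word_count_by_unit: unit -> occurrences of the word (missing = 0)
    tokens_by_unit:     unit -> total tokens written by that unit
    The expected count assumes each unit emits the word independently
    at the word's overall rate (an assumption of this sketch).
    """
    total_tokens = sum(tokens_by_unit.values())
    f_w = sum(word_count_by_unit.values()) / total_tokens
    observed = sum(1 for c in word_count_by_unit.values() if c > 0)
    expected = sum(1 - math.exp(-f_w * m) for m in tokens_by_unit.values())
    return observed / expected

# Toy data: four users, each writing 1000 tokens (invented numbers).
tokens_by_user = {"a": 1000, "b": 1000, "c": 1000, "d": 1000}
concentrated = {"a": 8}                       # one user, eight uses
spread = {"a": 2, "b": 2, "c": 2, "d": 2}     # four users, two uses each

d_concentrated = dissemination(concentrated, tokens_by_user)
d_spread = dissemination(spread, tokens_by_user)
```

Both toy words have the same frequency, but the one used by every user scores above 1 and the one confined to a single user scores below 1. The same function applies to subreddits and threads by swapping the unit.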

[Figure: user dissemination, observed vs. expected]

22 of 42

Linguistic dissemination

Observed count of trigram contexts normalized by expected count.

Scalable to 1000s of words without memory problems.

Similar to prior “dispersion” metrics. (Chesley and Baayen 2010)
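
A minimal sketch of the idea, under two assumptions that are not stated on the slide: a "context" is any trigram containing the word, and the expected count comes from a least-squares line of log context count against log frequency fitted across words, so that dissemination is the residual. All names and numbers below are hypothetical.

```python
import math

def unique_trigram_contexts(sentences, word):
    """Number of distinct trigrams that contain `word`."""
    contexts = set()
    for tokens in sentences:
        for i in range(len(tokens) - 2):
            trigram = tuple(tokens[i:i + 3])
            if word in trigram:
                contexts.add(trigram)
    return len(contexts)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def linguistic_dissemination(freq, contexts):
    """Residual of log(context count) after regressing on log(frequency)."""
    words = sorted(freq)
    xs = [math.log(freq[w]) for w in words]
    ys = [math.log(contexts[w]) for w in words]
    slope, intercept = fit_line(xs, ys)
    return {w: y - (intercept + slope * x) for w, x, y in zip(words, xs, ys)}

# Counting contexts on a toy corpus built from the slide examples.
sentences = [s.split() for s in [
    "it's hot af", "sweet af bro", "you're funny af",
    "it's hella hot", "it's hella hot", "it's hella hot",
]]
n_af = unique_trigram_contexts(sentences, "af")
n_hella = unique_trigram_contexts(sentences, "hella")

# Invented counts for the regression step.
freq = {"af": 100, "hella": 100, "idk": 1000, "tho": 10}
contexts = {"af": 80, "hella": 20, "idk": 400, "tho": 4}
scores = linguistic_dissemination(freq, contexts)
```

Storing only a set of trigrams per word keeps memory proportional to the number of distinct contexts, which fits the scalability point above.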

23 of 42

Linguistic dissemination

[Figure: log 3gram against log probability]

24 of 42

Analysis

  1. Relative importance regression
  2. Causal inference
  3. Growth vs. decline prediction
  4. Survival analysis

25 of 42

Causal inference

Goal: determine the causal influence of dissemination (treatment) on the probability of word growth (outcome), by controlling for covariates.

Continuous treatment requires an Average Dose Response Function. (Hirano & Imbens 2005)
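
For reference, the dose-response setup can be written out as below. The notation is an assumption chosen for this summary: T is the treatment (a dissemination value), X the covariates, and Y the outcome (word growth). It follows the general generalized-propensity-score recipe rather than any detail specific to this study.

```latex
\begin{align*}
r(t, x) &= f_{T \mid X}(t \mid x) && \text{generalized propensity score} \\
\beta(t, r) &= \mathbb{E}[\, Y \mid T = t,\ R = r \,] && \text{with } R = r(T, X) \\
\mu(t) &= \mathbb{E}[\, \beta(t, r(t, X)) \,] && \text{average dose-response at dose } t
\end{align*}
```

In practice the first two lines are estimated with parametric models, and the last line is an average over all words of the predicted outcome at a fixed dose t.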

26 of 42

Causal inference

[Diagram labels: treatment (x%), covariates, propensity score, outcome; repeat over all x% values]

27 of 42

Causal inference

[Figure: P(growth) by treatment quantile for linguistic dissemination and for social dissemination (user, subreddit, thread)]

28 of 42

Causal inference

[Figure: P(growth) for user dissemination]

29 of 42

Causal inference

RQ2: Linguistic dissemination has a consistently positive influence, even when controlling for social dissemination.

RQ1: Social dissemination has little influence (except for subreddit dissemination).

[Figure: P(growth) by treatment quantile for linguistic dissemination and for social dissemination (user, subreddit, thread)]

30 of 42

Growth vs. decline prediction

Models:
f = frequency only
f+L = frequency, linguistic dissemination
f+S = frequency, social dissemination
f+L+S = all predictors

Predict growth versus decline using k initial months of data.
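
The four-model comparison could be set up along these lines. This is a sketch under several assumptions: each predictor is averaged over the first k months, the classifier is a plain logistic regression trained by gradient descent, and accuracy is measured on the training words. The toy data are constructed so that only L separates the classes; they illustrate the mechanics and are not results.

```python
import math

MODELS = {"f": ["f"], "f+L": ["f", "L"], "f+S": ["f", "S"],
          "f+L+S": ["f", "L", "S"]}

def build_features(stats, k, predictors):
    """Mean of each chosen predictor over the first k months."""
    return [sum(stats[p][:k]) / k for p in predictors]

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1 / (1 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    """1 = growth, 0 = decline."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 if z > 0 else 0

# Toy words: (monthly predictor values, label). Invented for illustration.
words = [
    ({"f": [-5.0, -5.1, -5.0], "L": [0.6, 0.7, 0.8], "S": [0.1, 0.0, 0.1]}, 1),
    ({"f": [-5.2, -5.0, -5.1], "L": [0.5, 0.4, 0.6], "S": [0.0, 0.1, 0.0]}, 1),
    ({"f": [-5.0, -5.1, -5.0], "L": [-0.6, -0.7, -0.8], "S": [0.1, 0.0, 0.1]}, 0),
    ({"f": [-5.2, -5.0, -5.1], "L": [-0.5, -0.4, -0.6], "S": [0.0, 0.1, 0.0]}, 0),
]

def training_accuracy(model_name, k):
    X = [build_features(stats, k, MODELS[model_name]) for stats, _ in words]
    y = [label for _, label in words]
    w = train_logistic(X, y)
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)

accuracy = {name: training_accuracy(name, k=2) for name in MODELS}
```

A real comparison would score held-out words (for example with cross-validation) and repeat the procedure for each value of k.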

31 of 42

Growth vs. decline prediction

[Figure: models f, f+L, f+S, f+L+S compared by months of training data]

32 of 42

Growth vs. decline prediction

RQ1: Models with social dissemination do not outperform baseline.

RQ2: Models with linguistic dissemination significantly outperform baseline.

[Figure: x-axis: months of training data]

33 of 42

Conclusions

RQ1: Social dissemination has a weak or null influence on the growth and decline of nonstandard words.

RQ2: Linguistic dissemination has a consistent influence on the growth and decline of nonstandard words.

Language change must be considered in the context of the linguistic system in which it takes place, not merely the social system.

34 of 42

Leftover questions: niche

Initially the “fetch” study was going to incorporate distributional information: how close a word is to its neighbors ≈ how “niche-y” the word is.

[Figure: af among neighboring intensifiers: hella, super, really, very, totally, asfuck, asf, extremely]

35 of 42

Leftover questions: niche

Problem: this is not actually how word embeddings work. High distance from neighbors ≠ no synonyms.

36 of 42

Leftover questions: competition

Words are not adopted or abandoned in isolation, because language is not IID!

37 of 42

Leftover questions: competition

We have lots of tools to test relationships between linguistic variables: word embeddings, PMI, Granger causality...

Tan, Card and Smith (2017)

38 of 42

Leftover questions: modeling adoption

Typical innovation adoption model = social influence + self-excitation.

How can we incorporate language? social + self + linguistic signal

[Figure: observed usages of af (“sweet af bro”, “it’s hot af”, “you’re funny af”, “I’m smart af”, then “it’s hot af” three times), followed by “???”]

39 of 42

Is it social or linguistic?

Language change is complicated and can’t always be boiled down to a few numbers or metrics.

But we can agree that social factors and linguistic factors represent different aspects of innovation adoption.

Most online language change studies have treated them separately.

Let’s use computational methods to examine linguistic factors and compare to social factors!

41 of 42

Shameless plug

GT Computational Linguistics lab has more papers on language change!

42 of 42

Thanks!

Questions?