Is it social or linguistic? Examining internal factors in language change
Ian Stewart
@alethioguy
New Methods in Computational Sociolinguistics
5 Nov 2018
Disclaimer
Not sure what counts as “sociolinguistics” vs. language evolution vs. diachronic...
¯\_(ツ)_/¯
Language change: internal factors?
Synchronic patterns of language variation can lead to diachronic effects in the overall system. Transmission: children acquire a phonetic variable from their parents, modify it, and pass it on to the next generation => sound change.
In sound change, the effect of internal factors is well established, as in chain shifts: when vowel A moves close to vowel B, B shifts in turn, and so on.
What about other levels of language? Syntax, semantics, lexicon, etc.
Internal factors: morphology
Successful lexical innovations often exhibit morphological/derivational diversity. (Kershaw, Rose and Stacey 2016)
Internal factors: semantics
Semantic volatility often predicts significant lexical change better than frequency alone (Kulkarni et al. 2016, Stewart et al. 2017).
Internal factors: context
Extension to more diverse contexts leads to intensifier change. (Ito and Tagliamonte 2003)
Internal factors: some questions
Making “fetch” happen (abridged)
Language change: social factors
The internet is a breeding ground for nonstandard words and phrases.
Most theories of innovation adoption focus on social factors: the more people who use a word, the more likely it is to be adopted in the future.
[Figure: hella and af used across subreddits such as r/Atlanta, r/SanFrancisco, r/Chicago]
Language change: linguistic factors
Language is not like other innovations! A word’s usage is constrained by linguistic systems such as syntax and semantics.
A word may be adopted because of its dissemination across multiple topical or grammatical contexts.
[Figure: af in varied contexts (“sweet af bro”, “it’s hot af”, “you’re funny af”); hella in “it’s hella hot”]
Language change: comparison
[Figure: social context of af (r/Atlanta, r/SanFrancisco, r/Chicago) vs. its linguistic context (“sweet af bro”, “it’s hot af”, “you’re funny af”)]
Does the social context of a word influence its adoption more than its linguistic context?
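Research questions
RQ1: Does social dissemination influence the growth and decline of nonstandard words?
RQ2: Does linguistic dissemination influence growth and decline, even when controlling for social dissemination?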
Roadmap
Detect and characterize nonstandard, non-topical words (<kinda>) that grow and decline over a long period of time.
Quantify social and linguistic dissemination.
Test the influence of dissemination on the likelihood of growth and decline.
Data processing
2013-2016
English-language subreddits
Monthly bins
Vocabulary = top 100K unigrams
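The preprocessing steps above (monthly bins, top-100K unigram vocabulary) can be sketched in Python. This is a minimal sketch, assuming comments arrive as hypothetical (Unix timestamp, token list) pairs that are already tokenized and restricted to English-language subreddits; the function names are illustrative, not the study's code.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def month_bin(timestamp):
    """Map a Unix timestamp (seconds) to a 'YYYY-MM' bin label in UTC."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return f"{dt.year:04d}-{dt.month:02d}"


def monthly_counts(comments, vocab_size=100_000):
    """Bin tokenized comments by month and keep the top-K unigrams.

    comments: iterable of (unix_timestamp, list_of_tokens) pairs (hypothetical
    input format). Returns (vocab, counts, totals): counts[month][word] counts
    in-vocabulary tokens, totals[month] counts all tokens, so
    counts[month][word] / totals[month] is a monthly relative frequency.
    """
    comments = list(comments)  # iterated twice below
    overall = Counter()
    for _, tokens in comments:
        overall.update(tokens)
    vocab = {w for w, _ in overall.most_common(vocab_size)}

    counts = defaultdict(Counter)
    totals = Counter()
    for ts, tokens in comments:
        month = month_bin(ts)
        counts[month].update(t for t in tokens if t in vocab)
        totals[month] += len(tokens)
    return vocab, counts, totals
```

Keeping the all-token totals alongside the vocabulary counts means relative frequencies stay comparable across months even though out-of-vocabulary tokens are dropped.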
Identifying nonstandard words
Find words with either:
(1) monotonic growth (Spearman coefficient), or
(2) growth followed by decline (piecewise regression, logistic).
Qualitatively filter out “newspaper” (standard) words.
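A minimal sketch of the two selection criteria, assuming each word has a list of monthly relative frequencies. Spearman's rho is computed from average ranks; the piecewise regression is approximated here by a simple two-segment least-squares fit with a free breakpoint, and the logistic fit mentioned on the slide is omitted. The threshold `rho_min=0.75` and minimum segment length `min_seg=3` are hypothetical values.

```python
def _ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return _pearson(_ranks(x), _ranks(y))


def _ols(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, squared error)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, sse


def best_breakpoint(series, min_seg=3):
    """Two-segment linear fit: choose the split with the least total error."""
    t = list(range(len(series)))
    best = None
    for b in range(min_seg, len(series) - min_seg + 1):
        s1, e1 = _ols(t[:b], series[:b])
        s2, e2 = _ols(t[b:], series[b:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, b, s1, s2)
    if best is None:
        raise ValueError("series needs at least 2 * min_seg months")
    return best[1:]  # (breakpoint, slope_before, slope_after)


def classify_trajectory(series, rho_min=0.75, min_seg=3):
    """Label a monthly series 'growth', 'growth-decline', or 'other'."""
    if spearman(list(range(len(series))), series) >= rho_min:
        return "growth"
    if len(series) >= 2 * min_seg:
        _, before, after = best_breakpoint(series, min_seg)
        if before > 0 and after < 0:
            return "growth-decline"
    return "other"
```

The monotonic check runs first, so a steadily rising word is never relabeled as growth-decline just because its best split happens to have a negative second slope.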
Nonstandard word examples
| Word | Gloss | Formation type |
|---|---|---|
| idk | I don’t know | acronym |
| shitpost | low-quality post | compound |
| tho | though | clipping |
| eyebleach | pleasing image(s) | compound |
| trashy | undesirable | derivation |
| wot | what | respelling |
Social dissemination
Observed count of social units using the word, normalized by the count expected given the word's frequency. (Altmann et al. 2011)
Compute for users (DU), subreddits (DS), and threads (DT).
[Figure: observed vs. expected counts (user dissemination)]
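The observed/expected ratio can be sketched as follows, using the null model of Altmann et al. (2011): a unit that writes m_u tokens is expected to use word w at least once with probability 1 - exp(-f_w * m_u), where f_w is the word's overall relative frequency. The same function covers DU, DS, and DT by grouping tokens by user, subreddit, or thread; the (unit_id, tokens) input format is an assumption.

```python
import math
from collections import Counter


def social_dissemination(word, units):
    """Observed / expected number of social units that use `word`.

    units: iterable of (unit_id, tokens) pairs, where a unit is a user,
    subreddit, or thread. Values above 1 mean the word is spread across
    more units than its frequency alone predicts; below 1, more concentrated.
    """
    tokens_per_unit = Counter()
    units_with_word = set()
    total_tokens = word_tokens = 0
    for unit, tokens in units:
        hits = tokens.count(word)
        tokens_per_unit[unit] += len(tokens)
        total_tokens += len(tokens)
        word_tokens += hits
        if hits:
            units_with_word.add(unit)
    if word_tokens == 0:
        return 0.0
    f_w = word_tokens / total_tokens
    # Expected number of units using the word at least once (Poisson null).
    expected = sum(1 - math.exp(-f_w * m) for m in tokens_per_unit.values())
    return len(units_with_word) / expected
```

For example, a word used once by each of four users scores above 1, while the same four uses by a single user score below 1.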
Linguistic dissemination
Observed count of unique trigram contexts, normalized by the count expected given the word's frequency.
Scalable to thousands of words without memory problems.
Similar to prior “dispersion” metrics. (Chesley and Baayen 2010)
[Figure: log trigram-context count vs. log probability]
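A sketch of linguistic dissemination under stated assumptions: a context is a distinct trigram containing the word with the word's slot masked, and the expected context count comes from a least-squares line fit of log(contexts) against log(frequency) across the vocabulary. The talk's exact expectation model may differ, and this in-memory version keeps every context set, so it does not reflect the scalability point on the slide. Raw counts stand in for probabilities, which only shifts the intercept of the log-log fit.

```python
import math
from collections import Counter, defaultdict


def trigram_contexts(sentences):
    """Return (frequency, number of unique trigram contexts) per word.

    A context is a trigram containing the word with the word's slot replaced
    by "_", so af in "it's hot af" gets the context ("it's", "hot", "_").
    """
    freq = Counter()
    contexts = defaultdict(set)
    for tokens in sentences:
        freq.update(tokens)
        for i in range(len(tokens) - 2):
            tri = tuple(tokens[i:i + 3])
            for slot, word in enumerate(tri):
                contexts[word].add(tri[:slot] + ("_",) + tri[slot + 1:])
    return freq, {w: len(c) for w, c in contexts.items()}


def linguistic_dissemination(freq, n_contexts):
    """Observed / expected unique contexts for every word with a context.

    Expected contexts come from an OLS fit of log(contexts) on log(frequency)
    over the vocabulary (an assumed model of the expectation).
    """
    words = list(n_contexts)
    xs = [math.log(freq[w]) for w in words]
    ys = [math.log(n_contexts[w]) for w in words]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    # The log-space residual is log(observed / expected); exponentiate it.
    return {w: math.exp(y - intercept - slope * x) for w, x, y in zip(words, xs, ys)}
```

Two words with the same frequency share the same expected count, so their scores differ exactly by the ratio of their observed context counts.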
Analysis
Causal inference
Goal: estimate the causal influence of dissemination (treatment) on the probability of word growth (outcome), controlling for covariates.
A continuous treatment calls for an average dose response function (ADRF). (Hirano and Imbens 2005)
[Diagram: treatment (x%), covariates, propensity score, outcome; repeat over all x% values]
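One way to read the diagram, sketched under explicit assumptions: for each quantile x, words at or above the x-th quantile of dissemination count as treated, a logistic propensity model predicts treatment from covariates (such as frequency), and a normalized inverse-propensity-weighted (Hájek) estimator gives P(growth) as if every word received that treatment. This binarized approximation is a simplified stand-in, not Hirano and Imbens' generalized propensity score method itself; the learning rate, epoch count, and clipping bound are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient-descent logistic regression; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def propensity(w, x):
    """Estimated P(treated | covariates x) under the fitted logistic model."""
    return _sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x)))


def dose_response(treatment, covariates, outcome, quantiles, clip=1e-3):
    """Map each quantile q to a Hájek IPW estimate of E[outcome | do(treated)],
    where 'treated' means treatment at or above the q-th quantile."""
    ranked = sorted(treatment)
    curve = {}
    for q in quantiles:
        cut = ranked[min(int(q * len(ranked)), len(ranked) - 1)]
        t = [1 if d >= cut else 0 for d in treatment]
        w = fit_logistic(covariates, t)
        num = den = 0.0
        for ti, xi, yi in zip(t, covariates, outcome):
            if ti:
                # Clip propensities so a few words cannot dominate the weights.
                e = min(max(propensity(w, xi), clip), 1 - clip)
                num += yi / e
                den += 1 / e
        curve[q] = num / den  # den > 0: the top-ranked word is always treated
    return curve
```

Plotting `curve` against q gives a P(growth) vs. treatment quantile curve of the kind shown in the results; a flat curve would suggest little influence of that dissemination measure.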
Causal inference
[Figure: P(growth) by treatment quantile for linguistic dissemination and social (user, subreddit, thread) dissemination]
Causal inference
[Figure: P(growth) for user dissemination]
Causal inference
RQ1: Social dissemination has little influence (except for subreddit dissemination).
RQ2: Linguistic dissemination has a consistently positive influence, even when controlling for social dissemination.
Growth vs. decline prediction
Models:
f = frequency only
f+L = frequency + linguistic dissemination
f+S = frequency + social dissemination
f+L+S = all predictors
Predict growth versus decline using the first k months of data.
[Figure: prediction performance by months of training data]
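The nested models differ only in which predictors they see, and each sees only the first k months. A sketch of that feature construction, assuming per-word monthly series keyed by the hypothetical names `freq`, `D_L`, `D_U`, `D_S`, `D_T`, and assuming each predictor is summarized by its mean over the k months (the slide does not say how months are aggregated). A classifier, for example logistic regression, would then be fit on these vectors for each model.

```python
from statistics import mean

# Nested predictor sets from the slide; the feature names are hypothetical.
MODELS = {
    "f": ["freq"],
    "f+L": ["freq", "D_L"],
    "f+S": ["freq", "D_U", "D_S", "D_T"],
    "f+L+S": ["freq", "D_L", "D_U", "D_S", "D_T"],
}


def initial_features(series, k, model):
    """Summarize each predictor by its mean over the first k months only.

    series: dict mapping predictor name -> list of monthly values for one word.
    Truncating to k months keeps later months, where growth or decline is
    observed, out of the model's input.
    """
    return [mean(series[name][:k]) for name in MODELS[model]]
```

Because every model shares the same truncated months, any gain of f+L over f reflects the added predictor rather than extra data.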
Growth vs. decline prediction
RQ1: Models with social dissemination do not outperform the frequency-only baseline.
RQ2: Models with linguistic dissemination significantly outperform the frequency-only baseline.
Conclusions
RQ1: Social dissemination has a weak or null influence on the growth and decline of nonstandard words.
RQ2: Linguistic dissemination has a consistent influence on the growth and decline of nonstandard words.
Language change must be considered in the context of the linguistic system in which it takes place, not merely the social system.
Leftover questions: niche
Initially the “fetch” study was going to incorporate distributional information: how close a word is to its neighbors ≈ how “niche-y” the word is.
[Figure: embedding neighborhood of intensifiers: af, hella, super, really, very, totally, asfuck, asf, extremely]
Problem: this is not actually how word embeddings work. High distance from neighbors ≠ no synonyms.
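For concreteness, the neighbor-closeness idea could be computed as the mean cosine similarity between a word's embedding and its k nearest neighbors. This is a hypothetical sketch with toy vectors and an arbitrary `k`; as the slide notes, a low score does not show that a word lacks synonyms.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighbor_closeness(word, embeddings, k=3):
    """Mean cosine similarity between `word` and its k nearest neighbors.

    Higher values mean the nearest neighbors sit close in embedding space.
    Per the slide's caveat, a low value need not mean the word has no synonyms.
    """
    target = embeddings[word]
    sims = sorted(
        (cosine(target, vec) for other, vec in embeddings.items() if other != word),
        reverse=True,
    )[:k]
    return sum(sims) / len(sims)
```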
Leftover questions: competition
Words are not adopted or abandoned in isolation, because language is not IID!
We have lots of tools to test relationships between linguistic variables: word embeddings, PMI, Granger causality...
Tan, Card and Smith (2017)
Leftover questions: modeling adoption
Typical innovation adoption model = social influence + self-excitation.
How can we incorporate language? social + self + linguistic signal
[Figure: af in varied contexts (“sweet af bro”, “it’s hot af”, “you’re funny af”, “I’m smart af”) vs. a single repeated context (“it’s hot af” ×3): ???]
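One hypothetical way to write “social + self + linguistic signal” is a Hawkes-style intensity: a baseline rate, exponentially decaying excitation from a user's own past uses (self-excitation) and from others' uses (social influence), plus a linear term for a linguistic covariate such as linguistic dissemination. The functional form and every parameter name and value below are assumptions for illustration, not a model from the talk.

```python
import math


def _decay(t, events, beta):
    """Sum of exponentially decaying excitation from events before time t."""
    return sum(math.exp(-beta * (t - s)) for s in events if s < t)


def adoption_intensity(t, own_uses, others_uses, linguistic_signal,
                       mu=0.1, alpha_self=0.5, alpha_social=0.3,
                       beta=1.0, gamma=0.2):
    """Hypothetical rate at which a user uses a word at time t.

    mu: baseline rate; alpha_self / alpha_social: jump sizes for self and
    social excitation; beta: decay rate; gamma: weight on the linguistic term.
    """
    return (mu
            + alpha_self * _decay(t, own_uses, beta)
            + alpha_social * _decay(t, others_uses, beta)
            + gamma * linguistic_signal)
```

With `gamma=0` this reduces to the typical social + self-excitation model, so fitting gamma would test whether the linguistic signal adds anything.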
Is it social or linguistic?
Language change is complicated and can’t always be boiled down to a few numbers or metrics.
But we can agree that social factors and linguistic factors represent different aspects of innovation adoption.
Most online language change studies have treated them separately.
Let’s use computational methods to examine linguistic factors and compare them to social factors!
Shameless plug
GT Computational Linguistics lab has more papers on language change!
Thanks!
Questions?