1 of 41

Additional explorations in text analysis

Kevin Lanning

SICSS – South Florida

2 of 41

Overview: Three projects

  • Using language to identify scholarly communities (brief)
  • Personality and ego development (not brief)
  • Word use in news transcripts (brief)

Papers are in Zotero and links to R code are (mostly) in the papers - or just ask me.

3 of 41

Language of scholarly communities

  • Code in Google drive

4 of 41

Language of scholarly communities

  • Code in Google drive

5 of 41

Personality and ego development

  • Background
  • Study 1: Examined words and LIWC categories characteristic of each ego level in sentence completions (Nature Human Behaviour, 2018)
  • Study 2: Expanded the ego lexicon with broader dictionaries (SPSP 2019)
  • Study 3: Moved from words to texts: compared several ways of scoring ego level and applied these to sentence completions and blogs (SPSP 2020)

6 of 41

personality > traits

7 of 41

The construct(s) of ego development

“maturity”

Dimensions: Cognitive, Social, Moral

  • Autonomous/Integrated (8-9): Fulfillment, Interdependence, Complexity
  • Individualistic (7): Development, Mutuality, Tolerance
  • Conscientious (6): Achievement, Responsibility, Self-criticism
  • Self-Aware (5): Adjustment, Helpfulness, Exceptions
  • Conformist (4): Appearances, Loyalty, Obedience
  • Self-Protective (3): Trouble, Wariness, Opportunism
  • Impulsive (2): Good vs. bad, Solipsism, Urges

8 of 41

The measure: Wash U. Sentence Completion Test

Stem: “When a child will not join in group activities…”

  • Impulsive (2): … they are sick
  • Self-Protective (3): … give him 2 choices, join or sit by himself
  • Conformist (4): … he might be tired
  • Self-Aware (5): … I wonder what is wrong
  • Conscientious (6): … I wonder if he doesn't feel good about himself
  • Individualistic (7): … it may be a healthy or unhealthy sign
  • Autonomous/Integrated (8-9): … it's sometimes a reflection on the group, not the child.

9 of 41

Overview / goals

  • Compile as many scored responses to the WUSCT as possible
  • Elucidate the construct of ego development using language analysis of both LIWC categories and individual words
    • Is there evidence supporting a stage model?
    • How are stages of development expressed in language?
  • Provide a starting point for ego level as a culturomic tool

10 of 41

The construct(s) of ego development

“maturity”

Dimensions: Cognitive, Social, Moral

  • Autonomous/Integrated (8-9): Fulfillment, Interdependence, Complexity
  • Individualistic (7): Development, Mutuality, Tolerance
  • Conscientious (6): Achievement, Responsibility, Self-criticism
  • Self-Aware (5): Adjustment, Helpfulness, Exceptions
  • Conformist (4): Appearances, Loyalty, Obedience
  • Self-Protective (3): Trouble, Wariness, Opportunism
  • Impulsive (2): Good vs. bad, Solipsism, Urges

11 of 41

Sample

The data: responses at each level (by sample) and words coded at each level

Level                         Univ I  Univ II  Midlife  Exemplar   Total  N words  N distinct words
Impulsive (2)                     63      187       27       141     418     1456               398
Self-Protective (3)              184     1045      120       402    1751     7876              1413
Conformist (4)                  1287     6746     1131      1007   10171    47969              3273
Self-Aware (5)                  1622     9933     1945      2167   15667   103618              5493
Conscientious (6)               1006     7081     1640      2412   12139   108911              6129
Individualistic (7)              178     1407      479      1130    3194    39925              3932
Autonomous/Integrated (8-9)       24      159       76       334     593    10655              2032
Total                           4364    26558     5418      7593   43933   320410             10670

12 of 41

Some LIWC results

13 of 41

Ego level as sequence

                                2     3     4     5     6     7   8-9
Impulsive (2)                   -  0.90  0.85  0.83  0.80  0.75  0.70
Self-Protective (3)          0.90     -  0.90  0.91  0.88  0.84  0.80
Conformist (4)               0.84  0.90     -  0.95  0.92  0.88  0.83
Self-Aware (5)               0.83  0.91  0.95     -  0.98  0.96  0.93
Conscientious (6)            0.79  0.87  0.91  0.98     -  0.99  0.97
Individualistic (7)          0.75  0.83  0.87  0.96  0.99     -  0.98
Autonomous/Integrated (8-9)  0.70  0.79  0.83  0.93  0.97  0.98     -

Correlations between word use at different levels support a simplex model.

Correlations between word counts across all 10670 terms (top, above the diagonal) or the 1811 most common terms (below the diagonal).
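One informal way to read the simplex claim is to average the correlations by the distance (lag) between levels: in a simplex, adjacent levels correlate most strongly and the average should fall as levels get further apart. The sketch below uses the above-diagonal values as transcribed from the slide. Reading the flattened table this way is an assumption, and `mean_by_lag` is a hypothetical helper, not the author's code.

```python
import numpy as np

levels = ["2", "3", "4", "5", "6", "7", "8-9"]

# Above-diagonal correlations as transcribed from the slide
# (index pairs refer to positions in `levels`; the layout is an assumption).
upper = {
    (0, 1): 0.90, (0, 2): 0.85, (0, 3): 0.83, (0, 4): 0.80, (0, 5): 0.75, (0, 6): 0.70,
    (1, 2): 0.90, (1, 3): 0.91, (1, 4): 0.88, (1, 5): 0.84, (1, 6): 0.80,
    (2, 3): 0.95, (2, 4): 0.92, (2, 5): 0.88, (2, 6): 0.83,
    (3, 4): 0.98, (3, 5): 0.96, (3, 6): 0.93,
    (4, 5): 0.99, (4, 6): 0.97,
    (5, 6): 0.98,
}


def mean_by_lag(pairs, n_levels):
    """Average correlation for each distance (lag) between two levels."""
    out = {}
    for lag in range(1, n_levels):
        vals = [r for (i, j), r in pairs.items() if j - i == lag]
        out[lag] = float(np.mean(vals))
    return out


lag_means = mean_by_lag(upper, len(levels))
for lag, m in lag_means.items():
    print(f"lag {lag}: mean r = {m:.3f}")
```

Individual rows need not be strictly monotone (level 3 correlates 0.90 with level 4 but 0.91 with level 5), so the lag averages are only a coarse summary of the pattern.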

14 of 41

15 of 41

16 of 41

17 of 41

18 of 41

19 of 41

20 of 41

21 of 41

22 of 41

23 of 41

Expanding the model

  • For each of the seven levels, reduce terms into a set of homogeneous facets.
  • Assess cosine similarities between these facets and vectors of the lexicon (using word embeddings from the Common Crawl).
  • Weight these vectors and combine them into new measures of the seven ego levels.
  • Combine these seven measures (Impulsive … Autonomous) into a single ego score.
  • Apply these scores to new corpora: here, offensive tweets and ads in the 2016 presidential campaign (considered briefly) and presidential speeches (considered at greater length).
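The facet-expansion step can be sketched roughly as follows: average the embeddings of a facet's seed words, then keep every lexicon word whose cosine similarity to that centroid clears a cutoff. This is a minimal illustration under stated assumptions. The 3-dimensional vectors are toy values, `expand_facet` and the 0.5 threshold are hypothetical, and real vectors would come from a pretrained embedding model.

```python
import numpy as np

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
vectors = {
    "hate":     np.array([0.90, 0.10, 0.00]),
    "fight":    np.array([0.80, 0.20, 0.10]),
    "violence": np.array([0.85, 0.15, 0.05]),
    "hatred":   np.array([0.88, 0.12, 0.02]),
    "nice":     np.array([0.10, 0.90, 0.10]),
    "table":    np.array([0.00, 0.10, 0.90]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def expand_facet(seed_words, vectors, threshold=0.5):
    """Return lexicon words whose similarity to the facet centroid is >= threshold."""
    centroid = np.mean([vectors[w] for w in seed_words], axis=0)
    sims = {w: cosine(v, centroid) for w, v in vectors.items()}
    ranked = sorted(sims.items(), key=lambda kv: -kv[1])
    return {w: s for w, s in ranked if s >= threshold}


aggression = expand_facet(["hate", "fight", "violence"], vectors)
for word, sim in aggression.items():
    print(f"{word}: {sim:.2f}")
```

In this toy lexicon the seed words and "hatred" are retained, while "nice" and "table" fall below the cutoff.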

24 of 41

Expanding the ego lexicon

Original dictionaries: words in SCTs

level  nWords  Facet Weight  facet             words in SCTs
2      4       0.9           aggression        bothers, fight, hate, violence
2      2       0.7           banal-hyperbolic  amazing, awesome
2      6       1             banal-cool        cool, fine, liked, nice, ok, okay
2      4       0.8           prohibition       cant, nobody, nothing, rules

Impulsive-aggression words in expanded dictionary:

hate (.78), violence, fight, bothers, hating, hatred, dislike, fights, racism, fighting, disrespect, annoys, hates, bashing, disgusts, think, injustice, bigotry, complain, blaming, violent, hated, bullying, pisses, bullies, scares, loathe, anymore, animosity, misogyny, irks, bullshit (.50) …

25 of 41

Ego level in multiple samples

(a weak test, passed)

26 of 41

(a stronger test, failed)

27 of 41

28 of 41

29 of 41

Step 3: Reexamining ego level in words and texts (today)

  • Is the ego level of a text essentially the average of the ego level of its constituent words?
  • Can the (expanded) dictionaries be used to score ego level in other texts?
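The first question assumes a simple baseline: score a text as the mean of the ego scores of the dictionary words it contains. A minimal sketch of that baseline follows. The word scores are made-up placeholders rather than values from the ego dictionaries, and `score_text` is a hypothetical helper.

```python
import re

# Hypothetical word-level ego scores (placeholders, not the real dictionaries).
word_scores = {
    "hate": 2.0,
    "rules": 2.0,
    "tired": 4.0,
    "wonder": 5.0,
    "reflection": 8.0,
}


def score_text(text, word_scores):
    """Mean ego score of the dictionary words in `text`; None if no word matches."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [word_scores[t] for t in tokens if t in word_scores]
    if not hits:
        return None
    return sum(hits) / len(hits)


print(score_text("I wonder if he is tired", word_scores))
print(score_text("xyz", word_scores))
```

Because the estimate is an unweighted mean, it ignores response length entirely, which is one reason to check it against rated ego level for long and short responses separately.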

30 of 41

When ego level is computed as averages, estimates for long responses are too low…

31 of 41

…and estimates for short responses may be too high.

32 of 41

LIWC scales associated with prediction errors in sentence completions

33 of 41

Exploring a regression approach

  • Series 1: Original data / original dictionaries
    • Data: Initial sample of 45000 responses, split into 80% training / 20% test
    • Model 1: Ego level
    • Model 2: From seven ego levels
    • Model 3: + WC
    • Model 4: + all LIWC
    • Model 5: - small/LASSO
  • Series 2: Original data / expanded dictionaries
  • Series 3 and 4: Blogs/ original and expanded dictionaries
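The general shape of this comparison (fit on 80%, evaluate on the held-out 20%, then add predictors) can be sketched with ordinary least squares. Everything here is an assumption for illustration: the data are synthetic, the variable names are hypothetical, and reading "WC" as LIWC's word count is an interpretation of the slide. It compares a single-predictor model with one that adds word count and does not reproduce the five models listed above or the LASSO step.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins (not the study's data).
dict_score = rng.normal(5.0, 1.0, n)            # dictionary-based ego estimate
word_count = rng.poisson(8, n).astype(float)    # response length in words
rated_level = 0.5 * dict_score + 0.15 * word_count + rng.normal(0, 0.5, n)

# 80% training / 20% test split.
idx = rng.permutation(n)
cut = int(0.8 * n)
train, test = idx[:cut], idx[cut:]


def fit_and_score(X, y, train, test):
    """Fit OLS on the training rows and return R^2 on the held-out rows."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
    pred = Xd[test] @ beta
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return 1 - ss_res / ss_tot


r2_base = fit_and_score(dict_score[:, None], rated_level, train, test)
r2_plus_wc = fit_and_score(
    np.column_stack([dict_score, word_count]), rated_level, train, test
)
print(f"dictionary score only: R^2 = {r2_base:.2f}")
print(f"+ word count:          R^2 = {r2_plus_wc:.2f}")
```

Scoring on the held-out rows rather than the training rows is what lets richer models be compared without simply rewarding extra parameters.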

34 of 41

Regressions: Sentence completions

Original dictionaries

Expanded dictionaries

35 of 41

Blogs

  • Blog Authorship Corpus (Schler et al., 2006), available in CSV format from Kaggle.
  • All blogs are from 2004 or earlier: 681,288 posts and 140 million words, or approximately 35 posts and 7250 words per person.
  • Here, a 5% fraction of the data is examined: ~33000 posts from ~9000 persons.
  • No measure of ego level - but there is age.
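Because each person contributes several posts, scores can be examined per post or averaged per person. A minimal sketch of the per-person aggregation, assuming post-level scores are already available; the author IDs and scores are made up, and `person_means` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (author_id, post_score) pairs.
posts = [
    ("a1", 4.2), ("a1", 5.0),
    ("a2", 3.1), ("a2", 3.5), ("a2", 3.9),
    ("a3", 6.0),
]


def person_means(posts):
    """Average post-level scores within each author."""
    by_person = defaultdict(list)
    for author, score in posts:
        by_person[author].append(score)
    return {author: mean(scores) for author, scores in by_person.items()}


per_person = person_means(posts)
for author, score in per_person.items():
    print(f"{author}: {score:.2f}")
```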

36 of 41

Blogs: Predicting age from sentence-completion derived models

Table columns: Model; from original dictionaries (Blogs, Persons); from expanded dictionaries (Blogs, Persons)

37 of 41

Summary

  • Blogs and responses to sentence completions are not the same
    • In the sentence completions, the more complex models outperformed simpler ones
    • This did not generalize to the very different blogs, with a very different “criterion”
  • No all-purpose tool for assessing ego level across texts is ideal. But the expanded ego dictionaries appear ok.

38 of 41

Summary of personality stuff

  • Blogs and responses to sentence completions are not the same
    • In the sentence completions, the more complex models outperformed simpler ones
    • This did not generalize to the very different blogs, with a very different “criterion”
  • No all-purpose tool for assessing ego level across texts is ideal. But the expanded ego dictionaries appear ok.

39 of 41

Fox and MSNBC

40 of 41

Fox > MSNBC

41 of 41

MSNBC > Fox