1 of 34

Computational Text Analysis

Christopher Barrie

Week 9

2 of 34

Introduction

3 of 34

Supervised learning

  • You’ve tried dictionary-based methods
    • And that’s kind of like classifying…

4 of 34

5 of 34

An example

  • Denominating by totals
    • What if we also had lots of words indicating other types of sentiment?
      • E.g., a typical phrase like “I was furious to be at a lecture at 9AM but I was filled with utter joy and elation upon entering the classroom…”
      • And then we counted up words denoting sentiment…?
      • Ideas?

6 of 34

An example

  • Denominating by totals
    • “I was furious to be at a lecture at 9AM but I was filled with utter joy and elation upon entering the classroom…”
      • + 2 “happy” words
      • + 1 “unhappy” word
      • + 23 total words
    • 2/23 = .087… (share of “happy” words)
    • 1/23 = .043… (share of “unhappy” words; see the sketch below)
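
A minimal sketch of this denominating-by-totals calculation in Python; the word lists here are illustrative stand-ins for a real sentiment dictionary:

    # Toy dictionary-based scoring: count dictionary hits, divide by total words.
    # These word lists are illustrative, not an actual sentiment dictionary.
    happy_words = {"joy", "elation", "happy"}
    unhappy_words = {"furious", "sad", "angry"}

    text = ("I was furious to be at a lecture at 9AM but I was filled "
            "with utter joy and elation upon entering the classroom")
    tokens = text.lower().split()                         # 23 tokens

    n_happy = sum(t in happy_words for t in tokens)       # 2
    n_unhappy = sum(t in unhappy_words for t in tokens)   # 1

    print(n_happy / len(tokens))     # 2/23 ≈ 0.087
    print(n_unhappy / len(tokens))   # 1/23 ≈ 0.043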

7 of 34

Supervised learning

  • You’ve tried dictionary-based methods
    • And that’s kind of like classifying…
  • We have:
    • 1. Some rule according to which we’re classifying (supervised)
    • 2. Some output unit of analysis we’re targeting

8 of 34

Supervised learning

  • You’ve tried word embedding methods
    • And that’s also useful when classifying…

9 of 34

I like doing text analysis with Chris


10 of 34

Word embeddings

  • Context window: how many words around the target word we are counting
  • Co-occurrence: for any two words, the number of times they appear together in a context window
  • Words in red = target words
  • Words in blue = context words

I like doing text analysis with Chris


11 of 34

Word embeddings

  • How does it work?
  • We count up the co-occurrences of words over our pre-specified context window (often ~6 words); see the sketch after the matrix below

I like doing text analysis with Chris

            I  like  doing  text  analysis  with  Chris
I           0     1      1     0         0     0      0
like        0     0      1     1         0     0      0
doing       0     0      0     1         1     0      0
text        0     0      0     0         1     1      0
analysis    0     0      0     0         0     1      1
with        0     0      0     0         0     0      1
Chris       0     0      0     0         0     0      0

Context window = 2 (co-occurrences counted for context words up to two positions to the right of each target word)
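
A minimal sketch of how the matrix above is built, assuming a forward-looking window of two words (a real implementation would scan an entire corpus rather than one sentence):

    from collections import defaultdict

    sentence = "I like doing text analysis with Chris".split()
    window = 2  # context window: how many words to the right of the target we count

    # cooc[target][context] = times `context` occurs within `window` words after `target`
    cooc = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(sentence):
        for context in sentence[i + 1 : i + 1 + window]:
            cooc[target][context] += 1

    # Print the co-occurrence matrix row by row (vocabulary = the 7 unique words)
    for target in sentence:
        print(f"{target:>9}", [cooc[target][c] for c in sentence])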

12 of 34

Word embeddings

  • Now imagine what this would look like for a whole book
    • many dimensions!
    • matrix of dimensions V x V, where V is the size of the corpus vocabulary
    • e.g…

13 of 34

Supervised learning

  • You’ve tried word embedding methods
    • And that’s also useful when classifying…
  • We can use DTMs (document-term matrices) as input to a classification algorithm (see the sketch below)
  • We can also use embeddings as input (not covered here)
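
A minimal sketch of the DTM-as-input route using scikit-learn; the labelled documents below are made up for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Tiny made-up labelled corpus: 1 = positive, 0 = negative
    docs = ["utter joy and elation in the classroom",
            "furious about the 9AM lecture",
            "text analysis fills me with joy",
            "this lecture made me miserable"]
    labels = [1, 0, 1, 0]

    # CountVectorizer builds the document-term matrix (DTM);
    # MultinomialNB is a Naive Bayes classifier trained on those counts
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(docs, labels)

    print(clf.predict(["what joy to enter the classroom"]))  # likely array([1])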

14 of 34

15 of 34

Supervised learning

  • Ways of approaching supervised learning problems:
    • Train your own model from scratch using a standard algorithm (Naive Bayes, Random Forest, etc.)
    • Classify using some pre-packaged engine (e.g., Perspective)
    • Classify by fine-tuning some Transformer-based architecture
    • Classify in zero- or few-shot setting

16 of 34

An example

Trumping Hate on Twitter? Online Hate Speech in the 2016 U.S. Election Campaign and its Aftermath. Siegel et al. 2021. Quarterly Journal of Political Science

17 of 34

An example

Trumping Hate on Twitter? Online Hate Speech in the 2016 U.S. Election Campaign and its Aftermath. Siegel et al. 2021. Quarterly Journal of Political Science

18 of 34

Recent advances

19 of 34

An example

Attention Is All You Need. Vaswani et al. 2017. NeurIPS

20 of 34

An example

Politics as Usual? Measuring Populism, Nationalism, and Authoritarianism in U.S. Presidential Campaigns (1952–2020) with Neural Language Models. Bonikowski et al. 2022. Sociological Methods and Research

21 of 34

An example

Politics as Usual? Measuring Populism, Nationalism, and Authoritarianism in U.S. Presidential Campaigns (1952–2020) with Neural Language Models. Bonikowski et al. 2022. Sociological Methods and Research

22 of 34

An example

Politics as Usual? Measuring Populism, Nationalism, and Authoritarianism in U.S. Presidential Campaigns (1952–2020) with Neural Language Models. Bonikowski et al. 2022. Sociological Methods and Research

23 of 34

Transformer-based models

  • Transformers:
    • Take pre-trained model
    • Fine-tune with a classification head on labelled data (see the sketch below)
    • Higher performance
    • Higher speed
    • Reproducible and accessible: https://huggingface.co/docs/transformers/model_doc/bert
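
A minimal sketch of the fine-tuning route with Hugging Face transformers, assuming a small labelled dataset; the texts, labels, and hyperparameters are placeholders, not the settings used in any of the papers above:

    from datasets import Dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              TrainingArguments, Trainer)

    # Placeholder labelled data; in practice this is your annotated training set
    data = Dataset.from_dict({
        "text": ["utter joy and elation", "furious about the 9AM lecture"],
        "label": [1, 0],
    })

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)  # pre-trained BERT + new classification head

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

    data = data.map(tokenize, batched=True)

    args = TrainingArguments(output_dir="bert-finetuned", num_train_epochs=3,
                             per_device_train_batch_size=8)
    Trainer(model=model, args=args, train_dataset=data).train()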

24 of 34

An example

Language Models are Few-Shot Learners. Brown et al. 2020. NeurIPS

25 of 34

Foundation models

  • LLMs:
    • Take pre-trained model
    • Add a prompt or a series of examples (zero- or few-shot; see the sketch below)
    • Higher performance (versus Transformers) on some tasks
    • Question marks over accessibility and reproducibility
    • An example: https://chat.openai.com/share/657b645a-f166-4457-8f16-32a9ecbf1437
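
A minimal zero-shot sketch using the OpenAI Python client; the model name and prompt wording are assumptions, and outputs are not guaranteed to be identical across runs or model versions:

    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    text = "I was furious to be at a lecture at 9AM"
    prompt = ("Classify the sentiment of the following text as POSITIVE or NEGATIVE. "
              "Answer with one word only.\n\nText: " + text)

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # reduces, but does not eliminate, run-to-run variation
    )
    print(response.choices[0].message.content)  # e.g. "NEGATIVE"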

26 of 34

An example

ChatGPT outperforms crowd workers for text-annotation tasks. Gilardi et al. 2023. PNAS

27 of 34

An example

ChatGPT outperforms crowd workers for text-annotation tasks. Gilardi et al. 2023. PNAS

28 of 34

An example

ChatGPT outperforms crowd workers for text-annotation tasks. Gilardi et al. 2023. PNAS

29 of 34

Some terminology…

30 of 34

Supervised learning

Validation

  • Often relies on human coders
    • Accuracy: % correctly classified
    • Recall: true positives / (true positives + false negatives)
    • Precision: true positives / (true positives + false positives)
    • ROC (receiver operating characteristic) curves: true positive rate (i.e., recall, a.k.a. sensitivity) versus false positive rate (i.e., false alarm rate); see the sketch below
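
A minimal sketch of computing these metrics with scikit-learn, given human-coded labels and model output; the arrays below are made up:

    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 roc_auc_score, roc_curve)

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]                       # human-coded labels
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                       # hard predictions
    y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]      # predicted probabilities

    print(accuracy_score(y_true, y_pred))    # % correctly classified
    print(recall_score(y_true, y_pred))      # TP / (TP + FN)
    print(precision_score(y_true, y_pred))   # TP / (TP + FP)

    # ROC curve: true positive rate vs false positive rate across thresholds
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print(roc_auc_score(y_true, y_score))    # area under the ROC curve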

31 of 34

32 of 34

The course book...

33 of 34

34 of 34

Thanks!