Probing Covert Racism in Language Models Through the Lens of Dialect Prejudice
Valentin Hofmann
UHH Data Science Group, 12/16/2024
V. Hofmann, P. R. Kalluri, D. Jurafsky, S. King. AI generates covertly racist decisions about people based on their dialect. Nature, 633:147–154, 2024.
F. Lin, S. Mao, E. La Malfa, V. Hofmann, A. de Wynter, J. Yao, S. Chen, M. Wooldridge, F. Wei. One language, many gaps: Evaluating dialect fairness and robustness of large language models in reasoning tasks. arXiv:2410.11005, 2024.
Racial Bias in AI Systems
Facial recognition
Criminal risk assessment
What about language models (LMs)?
Example Study 1: Sheng et al. (2019)
Example Study 2: Tamkin et al. (2023)
Evidence for positive racial discrimination in Claude 2.0
positive discrimination
negative discrimination
Has racial bias in LMs been resolved?
Shortcoming of Prior Work
Raciolinguistic Stereotypes
Jeantel
Martin
Zimmerman
Questions of This Talk
Methodology
Matched Guise Probing
Meaning-matched texts in SAE and AAE
Prompts asking for speaker traits
Language model
Adjectives
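The probing setup above can be sketched in a few lines. The prompt templates here are hypothetical stand-ins for the ones used in the study; the AAE example sentence is taken from later in this talk:

```python
# Hypothetical prompt templates asking for speaker traits; the actual study
# embeds each text into several such templates and aggregates over them.
TEMPLATES = [
    'A person who says "{text}" tends to be',
    'The person says: "{text}" The person is',
]

def build_guise_prompts(aae_text, sae_text, template):
    """Wrap a meaning-matched AAE/SAE text pair in the same template, so any
    difference in the LM's continuation is attributable to dialect alone."""
    return template.format(text=aae_text), template.format(text=sae_text)

aae_prompt, sae_prompt = build_guise_prompts(
    "he be drinkin", "he is usually drinking", TEMPLATES[0])
print(aae_prompt)  # A person who says "he be drinkin" tends to be
print(sae_prompt)  # A person who says "he is usually drinking" tends to be
```

The key design choice, inherited from matched guise studies in sociolinguistics, is that everything except the dialect of the embedded text is held constant.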
Computing AAE Association Scores
q(x; v, θ) = (1/n) Σᵢ log [ p(x | vᵢ(AAE); θ) / p(x | vᵢ(SAE); θ) ]
n: number of AAE/SAE text pairs
p(x | vᵢ(AAE); θ): probability of adjective x following the AAE version
p(x | vᵢ(SAE); θ): probability of adjective x following the SAE version
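As a toy sketch of this computation (the texts and probabilities below are made up for demonstration; a real run would query an actual LM for them):

```python
import math

# Illustrative stand-in for p(x | t; θ): the probability an LM assigns to
# adjective x as a continuation of a prompt containing text t.
TOY_PROBS = {
    ("lazy", "I be so happy when I wake up"): 0.020,
    ("lazy", "I am so happy when I wake up"): 0.005,
    ("brilliant", "I be so happy when I wake up"): 0.004,
    ("brilliant", "I am so happy when I wake up"): 0.012,
}

def association_score(adjective, pairs):
    """Mean log-probability ratio over n AAE/SAE text pairs:
    q(x; v, theta) = (1/n) * sum_i log(p(x | AAE_i) / p(x | SAE_i))."""
    ratios = [math.log(TOY_PROBS[(adjective, aae)] / TOY_PROBS[(adjective, sae)])
              for aae, sae in pairs]
    return sum(ratios) / len(ratios)

pairs = [("I be so happy when I wake up", "I am so happy when I wake up")]
print(association_score("lazy", pairs))       # positive: more associated with AAE
print(association_score("brilliant", pairs))  # negative: more associated with SAE
```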
Interpretation of AAE Association Scores
q(x; v, θ) > 0: adjective x is more strongly associated with AAE than with SAE
q(x; v, θ) < 0: adjective x is more strongly associated with SAE than with AAE
Data
Experimental Setup
Top Stereotypes About African Americans
Adjectives with the highest average association scores q(x; v, θ)
Overt stereotypes of all LMs are much more positive than their covert stereotypes
Covert stereotypes of all LMs are more negative than human stereotypes reported in any year
Covert stereotypes of GPT2, RoBERTa, and T5 are strikingly similar to human stereotypes from 1933!
Stereotypes of GPT3.5 and GPT4 point in completely opposite directions in the overt versus the covert setting!
Favorability Analysis
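The favorability analysis summarizes how favorable humans judge the adjectives an LM most strongly associates with AAE. A minimal sketch, with illustrative ratings in place of the human favorability data used in the study:

```python
# Illustrative favorability ratings (the study uses human ratings of trait
# adjectives; a negative value means humans judge the trait unfavorable).
FAVORABILITY = {
    "lazy": -1.6, "aggressive": -1.4, "dirty": -1.8,
    "intelligent": 1.9, "brilliant": 1.8,
}

def mean_favorability(top_adjectives):
    """Average human favorability of the adjectives an LM most strongly
    associates with a dialect: one number summarizing stereotype valence."""
    return sum(FAVORABILITY[a] for a in top_adjectives) / len(top_adjectives)

# Covert stereotypes dominated by negative traits yield a negative score,
# overt stereotypes dominated by positive traits a positive one.
print(mean_favorability(["lazy", "aggressive", "dirty"]))
print(mean_favorability(["intelligent", "brilliant"]))
```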
Temporal Agreement Analysis
Is It Really a Prejudice Against AAE?
Use of invariant be for habitual aspect, as in "he be drinkin"
Alternative Explanation 1
Alternative Explanation 2
LMs hold covert stereotypes about African Americans that are triggered by dialect
Questions of This Talk
Experimental Setup
Note that the use of LMs in such contexts is inherently problematic, and we do not support it in any way!
Employability Analysis
Most jobs get less likely with AAE!
Criminality Analysis
Dialect prejudice in LMs perpetuates discrimination against African Americans
Questions of This Talk
Experimental Setup
Scaling Analysis
Human Feedback Analysis
Currently used methods do not resolve dialect prejudice
Questions of This Talk
Model Sandbagging
Experimental Setup
General Performance
All tested LMs perform substantially worse on dialectal input
Comparison with Perturbed Text
Dialect prejudice might affect performance on downstream tasks
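The perturbed-text comparison asks whether the performance drop merely reflects non-standard surface forms rather than dialect itself. One simple way to build such a noise baseline (an illustrative scheme, not necessarily the exact perturbation used in the study):

```python
import random

def perturb_chars(text, rate=0.1, seed=0):
    """Swap adjacent letters at a given rate -- a simple noise baseline.
    If LMs merely struggled with non-standard surface forms, SAE text noised
    like this should hurt performance as much as genuine dialect does."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip ahead so a character is swapped at most once
        else:
            i += 1
    return "".join(chars)

print(perturb_chars("she is usually drinking coffee", rate=0.3))
```

The perturbation preserves length and character content, so any remaining performance gap between dialectal and perturbed input cannot be explained by surface noise alone.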
Conclusion 1
Conclusion 2
Thank You for Your Attention!
E-mail: valentinh@allenai.org
X: @vjhofmann
Bluesky: @valentinhofmann