Probing Covert Racism in Language Models Through the Lens of Dialect Prejudice
Valentin Hofmann
UHH Data Science Group, 12/16/2024
V. Hofmann, P. R. Kalluri, D. Jurafsky, S. King. AI generates covertly racist decisions about people based on their dialect. Nature, 633:147–154, 2024.
F. Lin, S. Mao, E. La Malfa, V. Hofmann, A. de Wynter, J. Yao, S. Chen, M. Wooldridge, F. Wei. One language, many gaps: Evaluating dialect fairness and robustness of large language models in reasoning tasks. arXiv:2410.11005, 2024.
Racial Bias in AI Systems
Facial recognition
Criminal risk assessment
What about language models (LMs)?
Example Study 1: Sheng et al. (2019)
Example Study 2: Tamkin et al. (2023)
Evidence for positive racial discrimination in Claude 2.0
positive discrimination
negative discrimination
Has racial bias in LMs been resolved?
Shortcoming of Prior Work
Raciolinguistic Stereotypes
Jeantel
Martin
Zimmerman
Questions of This Talk
Methodology
Matched Guise Probing
Meaning-matched texts in SAE and AAE
Prompts asking for speaker traits
Language model
Adjectives
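The probing setup above can be sketched in a few lines. The prompt templates here are hypothetical stand-ins for the ones used in the study; the AAE example sentence is taken from later in this talk:

```python
# Hypothetical prompt templates asking for speaker traits; the actual study
# embeds each text into several such templates and aggregates over them.
TEMPLATES = [
    'A person who says "{text}" tends to be',
    'The person says: "{text}" The person is',
]

def build_guise_prompts(aae_text, sae_text, template):
    """Wrap a meaning-matched AAE/SAE text pair in the same template, so any
    difference in the LM's continuation is attributable to dialect alone."""
    return template.format(text=aae_text), template.format(text=sae_text)

aae_prompt, sae_prompt = build_guise_prompts(
    "he be drinkin", "he is usually drinking", TEMPLATES[0])
print(aae_prompt)  # A person who says "he be drinkin" tends to be
print(sae_prompt)  # A person who says "he is usually drinking" tends to be
```

The key design choice, inherited from matched guise studies in sociolinguistics, is that everything except the dialect of the embedded text is held constant.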
Computing AAE Association Scores
q(x; v, θ) = (1/n) Σᵢ log [ p(x | vᵢ(AAE); θ) / p(x | vᵢ(SAE); θ) ]
n: number of AAE/SAE text pairs
p(x | vᵢ(AAE); θ): probability of adjective x following the AAE version
p(x | vᵢ(SAE); θ): probability of adjective x following the SAE version
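As a toy sketch of this computation (the texts and probabilities below are made up for demonstration; a real run would query an actual LM for them):

```python
import math

# Illustrative stand-in for p(x | t; θ): the probability an LM assigns to
# adjective x as a continuation of a prompt containing text t.
TOY_PROBS = {
    ("lazy", "I be so happy when I wake up"): 0.020,
    ("lazy", "I am so happy when I wake up"): 0.005,
    ("brilliant", "I be so happy when I wake up"): 0.004,
    ("brilliant", "I am so happy when I wake up"): 0.012,
}

def association_score(adjective, pairs):
    """Mean log-probability ratio over n AAE/SAE text pairs:
    q(x; v, theta) = (1/n) * sum_i log(p(x | AAE_i) / p(x | SAE_i))."""
    ratios = [math.log(TOY_PROBS[(adjective, aae)] / TOY_PROBS[(adjective, sae)])
              for aae, sae in pairs]
    return sum(ratios) / len(ratios)

pairs = [("I be so happy when I wake up", "I am so happy when I wake up")]
print(association_score("lazy", pairs))       # positive: more associated with AAE
print(association_score("brilliant", pairs))  # negative: more associated with SAE
```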
Interpretation of AAE Association Scores
q(x; v, θ) > 0: adjective x is more strongly associated with AAE than with SAE
q(x; v, θ) < 0: adjective x is more strongly associated with SAE than with AAE
Data
Experimental Setup
Top Stereotypes About African Americans
Adjectives with the highest average association scores q(x; v, θ)
Overt stereotypes of all LMs are much more positive than their covert stereotypes
Covert stereotypes of all LMs are more negative than human stereotypes reported in any year
Covert stereotypes of GPT2, RoBERTa, and T5 are strikingly similar to human stereotypes from 1933!
Stereotypes of GPT3.5 and GPT4 point in completely opposite directions in the overt versus the covert setting!
Favorability Analysis
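The favorability analysis summarizes how favorable humans judge the adjectives an LM most strongly associates with AAE. A minimal sketch, with illustrative ratings in place of the human favorability data used in the study:

```python
# Illustrative favorability ratings (the study uses human ratings of trait
# adjectives; a negative value means humans judge the trait unfavorable).
FAVORABILITY = {
    "lazy": -1.6, "aggressive": -1.4, "dirty": -1.8,
    "intelligent": 1.9, "brilliant": 1.8,
}

def mean_favorability(top_adjectives):
    """Average human favorability of the adjectives an LM most strongly
    associates with a dialect: one number summarizing stereotype valence."""
    return sum(FAVORABILITY[a] for a in top_adjectives) / len(top_adjectives)

# Covert stereotypes dominated by negative traits yield a negative score,
# overt stereotypes dominated by positive traits a positive one.
print(mean_favorability(["lazy", "aggressive", "dirty"]))
print(mean_favorability(["intelligent", "brilliant"]))
```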
Temporal Agreement Analysis
Is It Really a Prejudice Against AAE?
Use of invariant be for habitual aspect, as in "he be drinkin"
Alternative Explanation 1
Alternative Explanation 2
LMs hold covert stereotypes about African Americans that are triggered by dialect
Questions of This Talk
Experimental Setup
Note that the use of LMs in such contexts is inherently problematic, and we do not support it in any way!
Employability Analysis
Most jobs get less likely with AAE!
Criminality Analysis
Dialect prejudice in LMs perpetuates discrimination against African Americans
Questions of This Talk
Experimental Setup
Scaling Analysis
Human Feedback Analysis
Currently used methods do not resolve dialect prejudice
Questions of This Talk
Model Sandbagging
Experimental Setup
General Performance
All tested LMs perform substantially worse on dialectal input
Comparison with Perturbed Text
Dialect prejudice might affect performance on downstream tasks
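The perturbed-text comparison asks whether the performance drop merely reflects non-standard surface forms rather than dialect itself. One simple way to build such a noise baseline (an illustrative scheme, not necessarily the exact perturbation used in the study):

```python
import random

def perturb_chars(text, rate=0.1, seed=0):
    """Swap adjacent letters at a given rate -- a simple noise baseline.
    If LMs merely struggled with non-standard surface forms, SAE text noised
    like this should hurt performance as much as genuine dialect does."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip ahead so a character is swapped at most once
        else:
            i += 1
    return "".join(chars)

print(perturb_chars("she is usually drinking coffee", rate=0.3))
```

The perturbation preserves length and character content, so any remaining performance gap between dialectal and perturbed input cannot be explained by surface noise alone.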
Conclusion 1
Conclusion 2
Thank You for Your Attention!
E-mail: valentinh@allenai.org
X: @vjhofmann
Bluesky: @valentinhofmann