Social Influences on the Production of Text
AJ Alvero
SICSS-South Florida 2023
Roadmap for today's talk
Computational text analysis for social science: big picture
Example: Stereotype shifts over time in newspaper data (PNAS)
Example: Disparities in scientific uptake with dissertation abstracts (PNAS)
Example: Variation in rental postings across neighborhood racial demographics (Social Forces)
Results show that listings from White neighborhoods emphasize trust and connections to neighborhood history and culture, while listings from non-White neighborhoods offer more incentives and focus on transportation and development features, sundering these units from their surroundings.
Example: Tweets are a better predictor of county-level heart disease than common risk factors (Psychological Science)
A cross-sectional regression model based only on Twitter language predicted atherosclerotic heart disease (AHD) mortality significantly better than did a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking, diabetes, hypertension, and obesity.
Computational text analysis for social science: big picture
pollev.com/ajalvero275
My background, interests, and theoretical perspectives
Example of theory-informed CSS: sociolinguistic perspective
So how can we show these types of patterns in text?
Topic modeling: document content via word co-occurrences
Document 1: The sport I love is baseball. I hit homeruns and doubles when I bat. I also put on my mitt to play catch.
Document 2: Basketball is my favorite sport. I hit three-pointers and alley-oop. I also dribble and catch passes.
Document 3: Tennis is my top sport. I hit and serve the ball in a way that they can't volley. Doubles is fun.
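As a rough illustration of the co-occurrence signal that topic models build on, the sketch below counts how many of the three example documents each pair of words shares. It is not a topic model (methods such as LDA fit a probabilistic model on top of this kind of information), and the stopword list is a hand-picked assumption for this toy example only.

```python
import re
from collections import Counter
from itertools import combinations

# The three example documents from the slide.
DOCS = [
    "The sport I love is baseball. I hit homeruns and doubles when I bat. "
    "I also put on my mitt to play catch.",
    "Basketball is my favorite sport. I hit three-pointers and alley-oop. "
    "I also dribble and catch passes.",
    "Tennis is my top sport. I hit and serve the ball in a way that they "
    "can't volley. Doubles is fun.",
]

# Hypothetical, hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {
    "the", "i", "is", "my", "and", "when", "also", "on", "to",
    "in", "a", "that", "they", "can't", "way", "put",
}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z'-]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def cooccurrence_counts(docs):
    """Count the number of documents in which each word pair appears together."""
    pairs = Counter()
    for doc in docs:
        vocab = sorted(set(tokenize(doc)))
        pairs.update(combinations(vocab, 2))  # pairs are alphabetically ordered
    return pairs


pairs = cooccurrence_counts(DOCS)
for (w1, w2), n in pairs.most_common(5):
    print(f"{w1} + {w2}: {n} document(s)")
```

Words such as "hit" and "sport" co-occur in all three documents, so they say little about which sport is being described, while pairs such as "baseball" and "mitt" appear together in only one. A topic model uses this kind of pattern, at much larger scale, to group words into topics.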
R² = .12
R² = .19
R² = .49
Topic modeling example: what students write about is highly predictive of income and SAT score
Word embedding: word meanings based on word neighbors
Document 1: The sport I love is baseball. I hit homeruns and doubles when I bat. I also put on my mitt to play catch.
Document 2: Basketball is my favorite sport. I hit three-pointers and alley-oop. I also dribble and catch passes.
Document 3: Tennis is my top sport. I hit and serve the ball in a way that they can't volley. Doubles is fun.
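To make "word meanings based on word neighbors" concrete, here is a minimal sketch that represents each word by counts of the words within two positions of it, then compares words by cosine similarity. This is a count-based stand-in, not a trained embedding model such as word2vec; the window size of 2 and the function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# The three example documents from the slide.
DOCS = [
    "The sport I love is baseball. I hit homeruns and doubles when I bat. "
    "I also put on my mitt to play catch.",
    "Basketball is my favorite sport. I hit three-pointers and alley-oop. "
    "I also dribble and catch passes.",
    "Tennis is my top sport. I hit and serve the ball in a way that they "
    "can't volley. Doubles is fun.",
]


def tokenize(text):
    """Lowercase the text and split it into words (sentence breaks are ignored)."""
    return re.findall(r"[a-z'-]+", text.lower())


def context_vectors(docs, window=2):
    """Map each word to counts of the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = tokenize(doc)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = context_vectors(DOCS)
sim = cosine(vectors["baseball"], vectors["basketball"])
print(f"baseball vs basketball: {sim:.2f}")
```

With only three short documents the similarities are not meaningful. The point is the mechanism: words that keep the same neighbors end up with similar vectors, and trained embeddings learn dense versions of these vectors from far larger corpora.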
Word embedding example: analogies
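Analogies with word embeddings are usually computed by vector arithmetic: the answer to "a is to b as c is to ?" is the word whose vector lies closest to vec(b) - vec(a) + vec(c). The sketch below uses the well-known king/queen example with hypothetical, hand-made 3-dimensional vectors. Neither the example words nor the numbers come from the talk, and real embeddings are learned from data and have hundreds of dimensions.

```python
import math

# Hypothetical hand-made vectors, for illustration only.
# Dimensions (assumed): royalty, masculinity, femininity.
TOY_VECTORS = {
    "king": (0.9, 0.9, 0.1),
    "queen": (0.9, 0.1, 0.9),
    "man": (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "apple": (0.0, 0.2, 0.2),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, and c."""
    target = tuple(
        vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# "man is to king as woman is to ?"
answer = analogy("man", "king", "woman", TOY_VECTORS)
print(answer)
```

The three query words are excluded from the candidates because one of them is often the nearest neighbor of the target vector, which would make the analogy trivially return an input word.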
Using topic modeling to make predictions
Social Influences on Textual Production: Data
Social Influences on Textual Production: Methods
Social Influences on Textual Production: Discussion
Wrapping up
Questions/Comments/Concerns?
Thank you!
aalvero@ufl.edu
But what kinds of methodological frameworks can we use?
Prediction
This text predicts Y better than this other X.
Adding text to the model increases prediction accuracy by X%.
Quantitative
Text feature is correlated with X.
Treatment condition had effect of X on open-ended survey responses.
Qualitative
X type of text used to signify Y.
Image data described by AI using X kinds of language.