Mapping America’s Digital Dialects
Clayton Hamre
Using data science, social media, and spatial analysis to explore the English language
Research questions
Data extraction
Data processing
Global spatial autocorrelation analysis
Global Moran’s I results (North America)
Index: 0.757749
Z-score: 13.67815
p-value: < 0.000001
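As a rough illustration of what the index above measures, here is a minimal sketch of global Moran's I in plain Python. The poster does not state how the spatial weights were defined or which tool produced the figures, so the weights matrix and the function name `morans_i` are assumptions made for this example only.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix.

    weights[i][j] is the (hypothetical) spatial weight between locations
    i and j; the diagonal is expected to be zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Sum of all spatial weights (often written S0).
    s0 = sum(sum(row) for row in weights)

    # Weighted cross-products of deviations between every pair of locations.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))

    # Total squared deviation from the mean.
    den = sum(d * d for d in dev)

    return (n / s0) * (num / den)


# Four locations on a line; each is a neighbor of the adjacent ones.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], line_weights)         # similar neighbors
alternating = morans_i([1, -1, 1, -1], line_weights)  # dissimilar neighbors
```

Values near +1 indicate that similar values cluster together in space, values near 0 indicate no spatial pattern, and negative values indicate that neighbors tend to differ.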
“Raw” values for variables
Getis-Ord Gi* Z-scores
Principal components
Eight regional clusters
Local spatial autocorrelation
Principal component analysis in R
Agglomerative cluster analysis in R
Variables with significant global spatial autocorrelation were included in a regional analysis
Base map created in ArcGIS – Map content created in Inkscape
Statistics for inter-country analysis
Anyway vs. anyways in US, Canadian, and UK city subreddits
Equal usage of the two variants
Greater usage of anyway
All three countries used “anyway” more often than “anyways”, but they differed significantly in how much more often. Canada used “anyways” the most.
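The poster does not name the test behind this comparison. One common choice for counts of two variants across three groups is a chi-square test of independence, sketched below in plain Python. The counts are hypothetical placeholders, not the poster's data, and the function names are invented for this example.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if variant choice were independent of country.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees
    of freedom, where the survival function reduces to exp(-x / 2)."""
    return math.exp(-stat / 2.0)


# HYPOTHETICAL counts, for illustration only.
# Rows: US, Canada, UK. Columns: "anyway", "anyways".
counts = [
    [20, 10],
    [10, 20],
    [15, 15],
]
stat, dof = chi_square_independence(counts)
p = p_value_two_dof(stat)  # valid here because dof == 2
```

A 3 x 2 table always has two degrees of freedom, which is why the closed-form p-value applies; other table shapes would need a general chi-square distribution function.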
Network analysis
User data
Python script
Shared users table
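The step from user data to a shared users table could look something like the sketch below. The poster's actual script is not shown, so the input format (pairs of subreddit and username), the example names, and the function name `shared_users_table` are all assumptions.

```python
from itertools import combinations


def shared_users_table(posts):
    """Count users active in both subreddits, for every pair of subreddits.

    posts is an iterable of (subreddit, username) pairs; this input format
    is an assumption for the sketch.
    """
    # Collect the set of distinct users seen in each subreddit.
    users_by_subreddit = {}
    for subreddit, user in posts:
        users_by_subreddit.setdefault(subreddit, set()).add(user)

    # For each unordered pair of subreddits, count the users they share.
    table = {}
    for a, b in combinations(sorted(users_by_subreddit), 2):
        table[(a, b)] = len(users_by_subreddit[a] & users_by_subreddit[b])
    return table


# Hypothetical records, for illustration only.
example_posts = [
    ("r/boston", "user1"),
    ("r/boston", "user2"),
    ("r/boston", "user2"),  # repeat activity is counted once
    ("r/nyc", "user2"),
    ("r/nyc", "user3"),
    ("r/toronto", "user3"),
]
shared = shared_users_table(example_posts)
```

Each pair's count can then serve as an edge weight in a network of subreddits.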
There is a strong correlation between lexical distance and shared users among subreddits
Takeaways