Devika Kakkar
Center for Geographic Analysis, Harvard University
2024 ESRI User Conference (UC)
Bridging the Gap: A Case Study of Integrating Social Media Big Data with Geography
Introduction: Social Media and Geography
Definition: Social media platforms generate data that serve as social sensors over space and time
Challenges: Traditional GIS tools struggle to handle big data
Opportunity: Real-time analysis in urban dynamics, public health, and disaster management
Challenges in Big Geospatial Data Merging
Approaches
Move geospatial processing from desktops to clouds/clusters
Keep data storage close to the tools for efficient access
Leverage GPU-based solutions for high-performance tasks
Containerize applications for rapid deployment and shutdown
Scale applications based on computing needs
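As a rough illustration of scaling a job to the available compute, the sketch below splits a workload into fixed-size chunks and hands them to a pool of worker processes. It uses only the Python standard library. The names `chunked`, `count_records`, and `run_parallel` are hypothetical, and the per-chunk work is a placeholder; the slides do not describe the actual job layout used on the cluster.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_records(chunk: List) -> int:
    """Placeholder for per-chunk work (assumption: real work would go here)."""
    return len(chunk)


def run_parallel(items: Iterable, size: int, workers: int) -> List[int]:
    """Process chunks in `workers` processes; results keep chunk order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_records, chunked(items, size)))


if __name__ == "__main__":
    # Raise `workers` on a larger node, lower it on a laptop.
    print(run_parallel(range(10), size=4, workers=2))
```

The chunk size and worker count are the two knobs: they can be set from the resources a scheduler grants to the job rather than hard-coded.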
CGA’s Geotweet Archive
Collection of 10 billion geotweets
Stored on Harvard’s High Performance Cluster
Sentiment enrichment using BERT
Global coverage (164 countries)
10 years of data (2010-2023)
5 TB of data
Multilingual dataset
To the best of our knowledge, it is the first social media dataset of this scale and granularity!
In contrast to the other datasets, it is not limited to a specific topic, period, or location.
U.S. Administrative Boundaries
Methodology: Computing Environment
Cluster Computing Infrastructure
Harvard University Research Computing Cluster (FASRC)
Methodology: Geography Enrichment
GPU-Based
CPU-Based
1
2
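Geography enrichment amounts to a point-in-polygon join: each tweet coordinate is matched to the boundary polygon that contains it. The slides do not state which algorithm or library was used, so the following is only a minimal pure-Python sketch of the idea, assuming simple polygons without holes, a bounding-box prefilter, and a ray-casting test. The block IDs, coordinates, and function names are hypothetical toy values.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Ring = Sequence[Point]        # polygon exterior ring, no holes (assumption)


def bbox(ring: Ring) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of a ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray casting: toggle on each edge crossed by a ray toward +x."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_blocks(points: Sequence[Point],
                  blocks: Dict[str, Ring]) -> List[Optional[str]]:
    """Return the ID of the first containing block per point, else None."""
    boxes = {bid: bbox(ring) for bid, ring in blocks.items()}
    result: List[Optional[str]] = []
    for px, py in points:
        match = None
        for bid, ring in blocks.items():
            min_x, min_y, max_x, max_y = boxes[bid]
            # Cheap bounding-box check before the exact test.
            if min_x <= px <= max_x and min_y <= py <= max_y:
                if point_in_ring((px, py), ring):
                    match = bid
                    break
        result.append(match)
    return result


# Hypothetical toy data: two adjacent unit squares standing in for blocks.
blocks = {
    "block_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "block_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
tweets = [(0.5, 0.5), (1.5, 0.25), (3.0, 3.0)]
print(assign_blocks(tweets, blocks))  # ['block_A', 'block_B', None]
```

This loop checks every polygon for every point, which is fine for a toy example. At the scale of billions of points and millions of polygons, a spatial index and a CPU- or GPU-parallel join would be needed instead.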
Result: Geotweets Census Archive
3 billion tweets, 8.18 million U.S. Census blocks
Multiprocessing system using CPUs/GPUs on HPC
Publicly available on Dataverse
Real-time analysis at census block level for over a decade
Novel dataset of this temporal and geographic granularity
Result: Open Access Dataset
Harvard Dataverse Archive
GitHub repository
Result: Political Science Use-Case
Real-time political expression across granular geographic levels
Results and Impact
Geo-located tweets provided us a unique opportunity to observe real-time political expressions across granular geographic units over a decade. It is unprecedented to observe public opinion with such temporal and geographic granularity over a decade. This data also became easy to use with the CGA's effort enriching the data with census information.
Sun Young Park
Doctoral Student
Department of Government
Park, S. (2024), "Political Animosity in Declining and Prospering Areas: An Analysis of 3 Billion Geo-located Tweets Spanning a Decade in Three Advanced Democracies", Working Paper.
Park, S., Brown, J., and Enos, R. (2024), "Does Where You Live Influence How You Talk About Politics Online?", Working Paper.
Future Work: GenAI-Based User Interface
Acknowledgements
Co-authors
Xiaokang Fu, Jack Hayes
Questions / Comments?