Devika Kakkar
Center for Geographic Analysis, Harvard University
kakkar@fas.harvard.edu
Unlocking Geospatial Big Data for Climate Change Research
1. Climate Change Impact on Psychological Well-being
2. Climate Change Impact on Physical Well-being
Project I: Introduction
Objective: Study the effects of climate change on psychological well-being using social media data
Approach: Establish a causal link between extreme heat and expressed sentiment on Twitter
Method: Enrich every tweet in CGA’s Geotweet Archive with sentiment and geography
Vector Big Data Challenge
2010
Present
Multilingual
4TB size
10 Billion tweets
12 years
Real-time ongoing collection
150,000 tweets/hour
Global coverage
Geotweet Archive: 2021 Statistics
Solution
SMES (Social Media data Enrichment System)
Sentiment
Natural Language Processing
Geotagged Tweets
Geography
GPU Database
(GIS-based) Econometric Modeling
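To make the enrichment step concrete, the sketch below stamps a single geotagged tweet with a sentiment score and a geography label. It is only a toy illustration of the idea and not the SMES implementation: the word lists, the approximate bounding boxes, and the names `Tweet`, `sentiment_score`, `lookup_region`, and `enrich` are all hypothetical, and a production system would rely on a trained multilingual NLP model and spatial joins in a GPU database instead.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon; stands in for a trained multilingual sentiment model.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "awful", "sad", "hate"}

# Hypothetical, approximate bounding boxes: (name, min_lon, min_lat, max_lon, max_lat).
# Real boundaries are polygons; boxes keep the sketch short.
REGIONS = [
    ("Massachusetts", -73.51, 41.19, -69.86, 42.89),
    ("California", -124.41, 32.53, -114.13, 42.01),
]


@dataclass
class Tweet:
    text: str
    lon: float
    lat: float


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def lookup_region(lon, lat):
    """Return the first region whose bounding box contains the point, else None."""
    for name, min_lon, min_lat, max_lon, max_lat in REGIONS:
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            return name
    return None


def enrich(tweet):
    """Attach sentiment and geography to one tweet."""
    return {
        "text": tweet.text,
        "lon": tweet.lon,
        "lat": tweet.lat,
        "sentiment": sentiment_score(tweet.text),
        "region": lookup_region(tweet.lon, tweet.lat),
    }


record = enrich(Tweet("I love this great weather!", -71.1, 42.37))
print(record["sentiment"], record["region"])
```

Keeping the sentiment and geography steps as separate functions mirrors the two enrichment branches listed on the slide, so either one could be swapped for a heavier component without touching the other.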
Results and Impact
Dr. Juan Palacios
Head of Research, MIT SUL
"The MIT SUL Lab undertook a global study on the social costs of extreme weather events. The access to the archive of high-frequency geocoded Tweets allowed us to include in our study granular measures of heat exposure and make use of state-of-the-art econometric techniques to draw some causal links between extreme heat and expressed sentiment."
10 Billion Tweets enriched with Sentiment and Geography
High Performance System: 100,000 tweets stamped/minute
Real Time enrichment running on NERC
Open Source Social Media Data Enrichment Platform
Sentiment Data Archive under development
Open-access Data Infrastructure for wider community
Two working papers targeting Nature
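As a rough sanity check on the figures quoted for the Geotweet Archive and SMES, the arithmetic below compares the stated enrichment rate with the stated collection rate. It assumes, purely for illustration, that the 100,000 tweets/minute rate is sustained and applies to the full 10-billion-tweet archive; the slides do not state how long the actual backfill took.

```python
# Figures taken from the slides.
ARCHIVE_TWEETS = 10_000_000_000   # "10 Billion tweets"
INGEST_PER_HOUR = 150_000         # "150,000 tweets/hour" ongoing collection
ENRICH_PER_MINUTE = 100_000       # "100,000 tweets stamped/minute"

# Collection rate expressed per minute.
ingest_per_minute = INGEST_PER_HOUR / 60

# How many times faster enrichment is than real-time collection.
headroom = ENRICH_PER_MINUTE / ingest_per_minute

# Assumption: the stamping rate is sustained over the whole archive.
backfill_minutes = ARCHIVE_TWEETS / ENRICH_PER_MINUTE
backfill_days = backfill_minutes / (60 * 24)

print(f"ingest: {ingest_per_minute:.0f} tweets/min")
print(f"headroom over real-time collection: {headroom:.0f}x")
print(f"full-archive pass at that rate: {backfill_days:.1f} days")
```

Under these assumptions the enrichment rate is about 40 times the collection rate, which is consistent with running real-time enrichment alongside a historical backfill.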
Project II: Introduction
Objective: Improve the health of mothers and their children by looking at social and environmental factors
Approach: Examine the effects of climate-related environmental exposures at the address level
Method: Extract climate variables at cohort member address locations over time
This work is sponsored by Dr. Diane Gold of the Harvard T.H. Chan School of Public Health (HSPH) within the NIH-ECHO program, grant UH3OD023286. Data analysis assistance with the Viva cohort was provided by Jeff Blossom of the CGA and Heike Gibson of HSPH. This work is also partially sponsored by NSF Award #1841403.
Raster Big Data Challenge
7 climate variables
48 U.S. States
8TB Storage, 85 MB each
100,000 Rasters
20 years data
5,000 address locations
800m resolution PRISM data
1999
2018
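The storage figure on this slide can be cross-checked from the raster count and the per-raster size. The short calculation below does that; whether "MB" and "TB" on the slide are decimal or binary units is not stated, so both readings are shown as assumptions.

```python
# Figures taken from the slide.
RASTER_COUNT = 100_000   # "100,000 Rasters"
MB_PER_RASTER = 85       # "85 MB each"

total_mb = RASTER_COUNT * MB_PER_RASTER

# Reading 1 (assumption): decimal units, 1 TB = 1,000,000 MB.
total_tb_decimal = total_mb / 1_000_000

# Reading 2 (assumption): binary units, 1 TiB = 2**20 MiB.
total_tib_binary = total_mb / 2**20

print(f"{total_tb_decimal:.1f} TB (decimal), {total_tib_binary:.2f} TiB (binary)")
```

Either reading lands close to the roughly 8 TB quoted on the slide.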
Solution
RINX (Raster Information Extraction System)
BASH automation for Loading
Input Climate Rasters + Address
PostGIS for Extraction
Climate variables for each patient/day
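The core operation behind the extraction step is looking up the raster cell that contains each address, once per daily raster. RINX performs this in PostGIS; the pure-Python sketch below only illustrates the underlying lookup and is not the RINX code. The function names `value_at` and `extract`, the 0.5-degree toy grid, and the sample coordinates are all hypothetical (PRISM cells are about 800 m).

```python
def value_at(grid, origin_lon, origin_lat, cell_size, lon, lat):
    """Return the cell value containing (lon, lat), or None if outside the grid.

    grid[row][col] with row 0 as the northernmost row; (origin_lon, origin_lat)
    is the upper-left corner and cell_size is in degrees.
    """
    col = int((lon - origin_lon) // cell_size)
    row = int((origin_lat - lat) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


def extract(rasters_by_day, addresses, origin_lon, origin_lat, cell_size):
    """Build one (patient_id, day, value) row per address per daily raster."""
    rows = []
    for day, grid in sorted(rasters_by_day.items()):
        for patient_id, (lon, lat) in sorted(addresses.items()):
            value = value_at(grid, origin_lon, origin_lat, cell_size, lon, lat)
            rows.append((patient_id, day, value))
    return rows


# Toy inputs: two daily 3x3 rasters and two hypothetical address points.
rasters_by_day = {
    "1999-01-01": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    "1999-01-02": [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
}
addresses = {"p1": (-71.1, 42.37), "p2": (-80.0, 42.0)}  # p2 lies outside the grid

rows = extract(rasters_by_day, addresses, origin_lon=-72.0, origin_lat=43.0, cell_size=0.5)
for row in rows:
    print(row)
```

Returning `None` for points outside the grid keeps one output row per patient/day, so missing coverage stays visible in the result instead of silently dropping records.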
Results and Impact
10.3 Million patient/days of calculation
Extraction of 7 Climate Variables for 20 years of data
High Performance System: 1 day for loading, 4 days for extraction
Easily Replicable solution
Easily Scalable solution
Open Source System
Two working papers from Harvard School of Public Health
Dr. Nicholas Nassikas
MD, Beth Israel Deaconess Medical Center
Instructor, Harvard Medical School
"The PRISM climate data extracted by the CGA allowed us to study associations of precipitation, relative humidity and temperature with lung function in children. The climate data will also allow us to study similar associations in adults. There is a need to determine if short-term exposures to these weather conditions affect the respiratory health of children and adults, especially in the context of a changing climate."