1 of 10

Devika Kakkar

Centre for Geographic Analysis, Harvard University

kakkar@fas.harvard.edu

Unlocking Geospatial Big Data for Climate Change Research

2 of 10

1. Climate Change impact on Psychological Well-being

2. Climate Change impact on Physical Well-being

3 of 10

Project I: Introduction

Objective: Study the effects of climate change on psychological well-being using social media data

Approach: Establish a causal link between extreme heat and expressed sentiment on Twitter

Method: Enrich every tweet in CGA’s Geotweet Archive with Sentiment and Geography

4 of 10

Vector Big Data Challenge

Geotweet Archive: 2021 Statistics

2010 to present (12 years)

Multilingual

4 TB size

10 Billion tweets

Real-time ongoing collection

150,000 tweets/hour

Global coverage

5 of 10

Solution

SMES (Social Media data Enrichment System)

Sentiment

Natural Language Processing

Geotagged Tweets

Geography

GPU Database

(GIS-based) Econometric Modeling
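
The enrichment step on this slide (attach a sentiment score and a geography to each geotagged tweet) can be illustrated with a minimal Python sketch. This is not the SMES implementation: the word lists, the `REGIONS` polygons, and the names `Tweet`, `sentiment_score`, `point_in_polygon`, and `enrich` are all hypothetical. A toy lexicon stands in for the NLP model, and a ray-casting point-in-polygon test stands in for the spatial join that the slide assigns to a GPU database.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon; a real system would use a trained multilingual model.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "awful", "sad", "hate"}

# Hypothetical regions, each a polygon given as (lon, lat) vertices.
REGIONS = {
    "region_a": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "region_b": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}


@dataclass
class Tweet:
    text: str
    lon: float
    lat: float


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def point_in_polygon(lon, lat, polygon):
    """Ray casting: count polygon edges crossed by a ray going east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def enrich(tweet):
    """Attach a sentiment score and the first matching region (or None)."""
    region = next(
        (name for name, poly in REGIONS.items()
         if point_in_polygon(tweet.lon, tweet.lat, poly)),
        None,
    )
    return {
        "text": tweet.text,
        "sentiment": sentiment_score(tweet.text),
        "region": region,
    }


if __name__ == "__main__":
    print(enrich(Tweet("I love this great day", 5.0, 5.0)))
```

At archive scale the per-tweet loop would be replaced by batched model inference and an indexed spatial join; the sketch only shows what "enriched" means for a single record.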

6 of 10

Results and Impact

Dr. Juan Palacios, Head of Research, MIT SUL:

"The MIT SUL Lab undertook a global study on the social costs of extreme weather events. The access to the archive of high-frequency geocoded Tweets allowed us to include in our study granular measures of heat exposure and make use of state of the art econometric techniques to draw some causal links between extreme heat and expressed sentiment."

10 Billion Tweets enriched with Sentiment and Geography

High Performance System: 100,000 tweets stamped/minute

Real Time enrichment running on NERC

Open Source Social Media Data Enrichment Platform

Sentiment Data Archive under development

Open-access Data Infrastructure for wider community

Two working papers targeting Nature
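
The throughput figures reported in this deck can be put side by side with a short calculation. The sketch below assumes the 100,000 tweets/minute rate is sustained and that each tweet is processed once; neither assumption is stated in the slides, and the variable names are illustrative.

```python
# Figures taken from the slides.
ARCHIVE_TWEETS = 10_000_000_000   # 10 billion tweets in the archive
ENRICH_PER_MINUTE = 100_000       # reported enrichment throughput
COLLECT_PER_HOUR = 150_000        # reported real-time collection rate

# Time to enrich the full archive at the reported rate (assumed sustained).
backfill_minutes = ARCHIVE_TWEETS / ENRICH_PER_MINUTE
backfill_days = backfill_minutes / (60 * 24)

# Ratio of enrichment capacity to the incoming real-time stream.
collect_per_minute = COLLECT_PER_HOUR / 60
headroom = ENRICH_PER_MINUTE / collect_per_minute

print(f"Full-archive enrichment: about {backfill_days:.1f} days")
print(f"Capacity vs. incoming stream: {headroom:.0f}x")
```

Under these assumptions the full archive takes roughly 69 days to enrich, and the enrichment rate is 40 times the collection rate, which is consistent with enrichment running in real time.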

7 of 10

Project II: Introduction

Objective: Improve the health of mothers and their children by looking at social and environmental factors

Approach: Examine the effects of climate-related environmental exposures at the address level

Method: Extract climate variables at cohort member address locations over time

This work is sponsored by Dr. Diane Gold of the Harvard T.H. Chan School of Public Health (HSPH) within the NIH-ECHO program, grant UH3OD023286. Data analysis assistance with the Viva cohort provided by Jeff Blossom of the CGA and Heike Gibson of HSPH. This work is also partially sponsored by NSF Award #1841403.

8 of 10

Raster Big Data Challenge

7 climate variables

48 U.S. States

100,000 Rasters

8 TB Storage, 85 MB each

20 years of data (1999 to 2018)

5,000 address locations

800 m resolution PRISM data
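
As a quick consistency check on the storage figures above, the raster count and per-raster size can be multiplied out. The sketch assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which the slide does not specify.

```python
# Figures taken from the slide.
RASTER_COUNT = 100_000
MB_PER_RASTER = 85

total_mb = RASTER_COUNT * MB_PER_RASTER
total_tb = total_mb / 1_000_000          # decimal terabytes (assumed)
total_tib = total_mb * 10**6 / 2**40     # binary tebibytes, for comparison

print(f"{total_tb} TB, or about {total_tib:.2f} TiB")
```

The product is 8.5 TB (about 7.73 TiB), in line with the roughly 8 TB quoted on the slide.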

9 of 10

Solution

RINX (Raster Information Extraction System)

BASH automation for Loading

Input Climate Rasters + Address

PostGIS for Extraction

Climate variables for each patient/day
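
The extraction step in this pipeline (rasters plus address points in, one climate value per patient per day out) can be illustrated conceptually in Python. The slide states that RINX performs extraction in PostGIS; the sketch below is not that implementation. The `Raster` class, `value_at`, `extract`, and the sample grid are hypothetical, and the raster is assumed to be a north-up grid with square cells.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    origin_lon: float   # longitude of the upper-left corner
    origin_lat: float   # latitude of the upper-left corner
    pixel_size: float   # cell width and height, in degrees
    values: list        # rows of cell values, ordered north to south

    def value_at(self, lon, lat):
        """Return the value of the cell containing (lon, lat), or None."""
        col = int((lon - self.origin_lon) // self.pixel_size)
        row = int((self.origin_lat - lat) // self.pixel_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[row]):
            return self.values[row][col]
        return None  # point falls outside the raster extent


def extract(rasters_by_day, addresses):
    """Build (patient_id, day, value) rows for every patient and day."""
    rows = []
    for day, raster in sorted(rasters_by_day.items()):
        for patient_id, (lon, lat) in addresses.items():
            rows.append((patient_id, day, raster.value_at(lon, lat)))
    return rows


if __name__ == "__main__":
    # Hypothetical 2x2 grid of one climate variable for one day.
    grid = Raster(-72.0, 43.0, 0.5, [[10.0, 11.0], [12.0, 13.0]])
    addresses = {"p1": (-71.9, 42.9), "p2": (-71.2, 42.3)}
    for row in extract({"2018-07-01": grid}, addresses):
        print(row)
```

Running the same loop once per climate variable yields the per-patient, per-day table described above; a database-backed version would express the lookup as a spatial query instead of index arithmetic.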

10 of 10

Results and Impact

10.3 Million patient/days of calculation

Extraction of 7 Climate Variables for 20 years of data

High Performance System: 1 day for loading, 4 days for extraction

Easily Replicable solution

Easily Scalable solution

Open Source System

Two working papers from Harvard School of Public Health

Dr. Nicholas Nassikas, MD, Beth Israel Deaconess Medical Center; Instructor, Harvard Medical School:

"The PRISM climate data extracted by the CGA allowed us to study associations of precipitation, relative humidity and temperature with lung function in children. The climate data will also allow us to study similar associations in adults. There is a need to determine if short-term exposures to these weather conditions affect the respiratory health of children and adults, especially in the context of a changing climate."