Intro to Spatial Analysis & GIS for Spatial Epidemiology in R
SER Workshop
June 13, 2023
go.illinois.edu/SER2023
Marynia Kolak, PhD, MFA, MS
Qinyun Lin, PhD
Today’s workshop
Learning goals for today:
Today’s workshop
1:00 - 1:30pm | Introduction to R Spatial | Marynia |
1:30 - 2:00pm | Mapping Neighborhoods | Qinyun |
2:00 - 2:20pm* | Spatial Cluster Detection | Marynia |
2:30 - 3:00pm | Adding Health Resources | Marynia |
3:00 - 3:20pm* | Calculating Spatial Metrics | Qinyun |
3:30 - 4:00pm | Resource Sharing & Q&A | Qinyun |
Follow along with live coding: https://makosak.github.io/Intro2RSpatialMed/
*We will take two 10-minute breaks
Environment Set-Up
Introduction to R Spatial
What is Spatial Data?
Spatial Data = Information + Location 🎯
Spatial data is data that contains information about specific locations.
Note: the information content of the data may change with location.
On some occasions, spatial data may only include “location.”
But without “location,” the data is no longer spatial.
Spatial Data Types
Point / Line / Polygon
Non-Spatial & Spatial Data Formats
In R, we will work with the sf library.
sf uses the following data structures:
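As a minimal sketch of how sf layers its data structures (the point and the name "Point A" are hypothetical, not workshop data): a single geometry (sfg) goes into a geometry column (sfc), which is attached to a data frame to form an sf object.

```r
library(sf)

# sfg: one simple feature geometry (a single point, longitude then latitude)
pt <- st_point(c(-87.63, 41.88))

# sfc: a geometry column, which also carries the CRS
geo <- st_sfc(pt, crs = 4326)

# sf: a data frame with attributes plus the geometry column
dat <- st_sf(name = "Point A", geometry = geo)

class(pt)
class(geo)
class(dat)
```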
Coordinate Reference Systems
A Coordinate Reference System (CRS) communicates what method should be used to flatten or project the Earth’s surface onto a 2-dimensional map.
Different CRSs imply different projection methods and can produce substantially different visualizations. For example, the following world maps use different projections →
Defining your CRS
A CRS can be referenced by a Spatial Reference System Identifier (SRID), such as an EPSG code. The EPSG database is one of the most comprehensive registries, storing thousands of coordinate reference systems.
EPSG:4326 (WGS 84, unprojected latitude/longitude) is one of the most commonly used CRSs today.
In {sf}, use st_crs() to check the CRS of a dataset and st_transform() to re-project the data to a different CRS.
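A short sketch of checking and changing a CRS with sf. The two points are hypothetical locations in Chicago, and the choice of EPSG:32616 (WGS 84 / UTM zone 16N, in meters) as the target is an assumption made for illustration.

```r
library(sf)

# Hypothetical example points (longitude, latitude); not workshop data
pts <- data.frame(
  name = c("Point A", "Point B"),
  lon  = c(-87.63, -87.60),
  lat  = c(41.88, 41.79)
)

# Build an sf object and declare the coordinates as WGS 84 (EPSG:4326)
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# Check the CRS attached to the data
st_crs(pts_sf)$epsg

# Re-project to a projected CRS measured in meters (assumed target: UTM zone 16N)
pts_utm <- st_transform(pts_sf, crs = 32616)
st_crs(pts_utm)$epsg
```

Re-projecting to a CRS in meters is typically done before measuring distances or building buffers.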
R-spatial Ecosystem
An active and growing online community:
The popularity of spatial packages in R. The y-axis shows the average number of downloads per day, within a 30-day rolling window. Source: Geocomputation with R
Live demo
Mapping Neighborhoods
Thematic Mapping
Thematic or choropleth maps represent quantitative data through colors or patterns in different geographic areas.
There are many different ways to classify & thematically map data:
Image: Axis Maps
Equal interval classification with 5 bins
Data Classification Methods
Different methods arrange your data using different boundaries to separate classes.
In a quantile map, data is grouped into classes or bins that each have the same number of observations.
3 classes = tertiles
4 classes = quartiles
5 classes = quintiles
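A minimal base R sketch of quantile classification into quartiles. The `rate` values are made up for illustration.

```r
# Hypothetical rates for 12 areas (illustrative values, not workshop data)
rate <- c(2.1, 3.4, 3.9, 4.2, 5.0, 5.5, 6.1, 7.3, 8.8, 9.4, 12.0, 20.5)

# Quartile break points at 0%, 25%, 50%, 75%, 100%
breaks <- quantile(rate, probs = seq(0, 1, by = 0.25))

# Assign each area to a class; include.lowest keeps the minimum in class 1
rate_class <- cut(rate, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Each class holds the same number of observations
table(rate_class)
```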
In a natural breaks map (also called Jenks map), data is grouped so that the within-group homogeneity is maximized.
Note the number of observations in each class can be highly unequal.
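One possible way to compute natural breaks in R is the classInt package (an assumption here; the workshop may use a different tool). The `rate` values are made up for illustration.

```r
library(classInt)

# Hypothetical rates for 12 areas (illustrative values, not workshop data)
rate <- c(2.1, 3.4, 3.9, 4.2, 5.0, 5.5, 6.1, 7.3, 8.8, 9.4, 12.0, 20.5)

# Jenks natural breaks with 4 classes
nb <- classIntervals(rate, n = 4, style = "jenks")
nb$brks

# Class membership for each area; class sizes can be unequal
rate_class <- findCols(nb)
table(rate_class)
```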
A standard deviation map can be useful for helping identify outliers.
The variable is transformed to standard deviation units (mean 0, standard deviation 1), and classes cover the range from lowest to highest in steps of one standard deviation.
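A base R sketch of standard deviation classification. The `rate` values and the break points at whole standard deviations are assumptions made for illustration.

```r
# Hypothetical rates for 12 areas (illustrative values, not workshop data)
rate <- c(2.1, 3.4, 3.9, 4.2, 5.0, 5.5, 6.1, 7.3, 8.8, 9.4, 12.0, 20.5)

# Transform to standard deviation units (z-scores)
z <- (rate - mean(rate)) / sd(rate)

# Classes at whole standard deviations below and above the mean
sd_breaks <- c(-Inf, -2, -1, 0, 1, 2, Inf)
sd_class <- cut(z, breaks = sd_breaks)
table(sd_class)

# Areas more than 2 standard deviations above the mean are candidate outliers
which(z > 2)
```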
Additional methods:
Image: Axis Maps
Which method is best?
This depends on your data and on the findings you are trying to convey.
Quantile
Natural Breaks
Standard Deviation
Live demo
Spatial Cluster Detection
Formalizing Tobler’s Law: Spatial Autocorrelation
Using Test Statistics
When considering spatial autocorrelation, we test against a null hypothesis of spatial randomness.
Operationalizing Spatial Randomness
From Anselin, Fall 2017, SOC 20253
Spatial Autocorrelation Statistic
Attribute similarity
Location similarity
Defining Spatial Weights
Before we can test for spatial autocorrelation, we need to define neighbors.
We do this by creating a spatial weights matrix, which records the neighbor relationships for each unit.
When creating a spatial weight, think about how far the impact can reach...
Europe, queen contiguity weights
Spatial Weight - 2nd Order Queen Contiguity
Same area, different spatial weight (w)
2nd order queen contiguity
1st order rook contiguity
Contiguity-Based Weights
Rook Contiguity Weight
Queen Contiguity Weight
Distance-Based Weights
KNN Weight
Europe, KNN-4 weights
Matrix Representation
If two units are neighbors, the value is 1; otherwise it is 0. By convention, a unit is not its own neighbor, so the diagonal is 0.
Cell layout (3 x 3 grid):
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
Rook contiguity weights matrix:
  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
3 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
4 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
5 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
6 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
7 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
8 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
9 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
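A base R sketch that builds rook and queen contiguity matrices for a 3 x 3 grid with cells numbered left to right, top to bottom. It assumes the common convention that a unit is not its own neighbor (zero diagonal); the object names are illustrative.

```r
n_row <- 3
n_col <- 3
n <- n_row * n_col

# Grid position of each cell ID (IDs run left to right, top to bottom)
cell_row <- (seq_len(n) - 1) %/% n_col + 1
cell_col <- (seq_len(n) - 1) %% n_col + 1

# Absolute row and column offsets between every pair of cells
d_row <- abs(outer(cell_row, cell_row, "-"))
d_col <- abs(outer(cell_col, cell_col, "-"))

# Rook: neighbors share an edge (offsets sum to exactly 1)
W_rook <- (d_row + d_col == 1) * 1

# Queen: neighbors share an edge or a corner (largest offset is exactly 1)
W_queen <- (pmax(d_row, d_col) == 1) * 1

dimnames(W_rook) <- dimnames(W_queen) <- list(1:n, 1:n)

W_rook
rowSums(W_queen)  # number of queen neighbors per cell
```

In practice, weights for real polygons are usually derived from the geometries by a spatial package rather than built by hand.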
Global Spatial Autocorrelation
Moran’s I Test Statistic:
$z_i = y_i - \bar{y}$: deviations from the mean
Cross product statistic ~ correlation stat.
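For reference, the statistic is commonly written as follows. The symbols $w_{ij}$ (the spatial weight between units $i$ and $j$), $n$ (the number of units), and $S_0$ (the sum of all weights) are standard notation assumed here, not taken from the slide.

```latex
I = \frac{n}{S_0} \, \frac{\sum_{i}\sum_{j} w_{ij}\, z_i z_j}{\sum_{i} z_i^2},
\qquad z_i = y_i - \bar{y},
\qquad S_0 = \sum_{i}\sum_{j} w_{ij}
```

The numerator is the cross product of deviations for neighboring units, which is why the statistic resembles a correlation coefficient.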
z statistic: (observed - mean) / SD
Comparable across variables & spatial weights
We need to assess whether the computed value of Moran’s I differs significantly from the value expected under a spatially random distribution:
Moran’s I & the Permutation Approach
Tip: More permutations (e.g., 999 vs. 499) give more stable results.
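A base R sketch of Moran’s I with a permutation test, under stated assumptions: a hypothetical 3 x 3 grid with rook weights, made-up values `y`, and a one-sided pseudo p-value. The helper `morans_i()` is defined here for illustration and is not a package function.

```r
# Rook weights for a hypothetical 3 x 3 grid (zero diagonal)
pos <- expand.grid(col = 1:3, row = 1:3)
W <- (as.matrix(dist(pos, method = "manhattan")) == 1) * 1

# Hypothetical attribute: high values in the top row, low values in the bottom row
y <- c(9, 8, 9, 5, 4, 5, 1, 2, 1)

# Illustrative helper: Moran's I for values y and weights W
morans_i <- function(y, W) {
  z <- y - mean(y)  # deviations from the mean
  n <- length(y)
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

I_obs <- morans_i(y, W)

# Permutation approach: shuffle values across locations to mimic spatial randomness
set.seed(2023)
n_perm <- 999
I_perm <- replicate(n_perm, morans_i(sample(y), W))

# Pseudo p-value (one-sided, positive autocorrelation) and z statistic
p_pseudo <- (sum(I_perm >= I_obs) + 1) / (n_perm + 1)
z_stat <- (I_obs - mean(I_perm)) / sd(I_perm)

c(I = I_obs, p = p_pseudo, z = z_stat)
```

With 999 permutations, the smallest possible pseudo p-value is 1/1000 = 0.001.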
Spatial Autocorrelation Measures
Global Clustering Measure: Is there spatial clustering present?
Local Clustering Measures: Where and what type of spatial clustering is present?
Spatial Clusters
Spatial Outliers
Spatial Clusters
https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html
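A base R sketch of the idea behind local cluster and outlier types, using the Moran scatterplot quadrants. The labels High-High, Low-Low, High-Low, and Low-High are the conventional names and are assumed here; the grid and values are hypothetical, and significance testing is omitted.

```r
# Rook weights for a hypothetical 3 x 3 grid, row-standardized
pos <- expand.grid(col = 1:3, row = 1:3)
W <- (as.matrix(dist(pos, method = "manhattan")) == 1) * 1
W_std <- W / rowSums(W)

# Hypothetical attribute values (not workshop data)
y <- c(9, 8, 9, 5, 4, 5, 1, 2, 1)

# Standardized values and their spatial lag (average of neighbors)
z <- as.numeric(scale(y))
lag_z <- as.numeric(W_std %*% z)

# Local statistic, up to a scaling constant: own value times neighbors' average
local_i <- z * lag_z

# Same sign = potential cluster; opposite sign = potential spatial outlier
quadrant <- ifelse(z >= 0 & lag_z >= 0, "High-High",
            ifelse(z <  0 & lag_z <  0, "Low-Low",
            ifelse(z >= 0, "High-Low", "Low-High")))

data.frame(id = 1:9, y, local_i = round(local_i, 2), quadrant)
```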
Adding Resources
What is geocoding?
Geocoding is the process of converting addresses into geographic coordinates using a known coordinate reference system (CRS).
We can then use these coordinates (latitude, longitude) to spatially enable data.
Input | Output |
Address | Latitude, longitude coordinates |
In R, we use the tidygeocoder package
Read in csv or text file with addresses → data.frame
Geocode addresses to generate latitude, longitude coordinates
Prepare data
→ convert to spatial data sf
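The geocoding steps above can be sketched as follows. This assumes the OpenStreetMap (Nominatim) service via `method = "osm"`, which needs internet access; the two addresses are Chicago landmarks chosen only for illustration, and the object names are hypothetical.

```r
library(tidygeocoder)
library(sf)

# Illustrative addresses only, not workshop data; in practice these rows
# would come from read.csv() on your own address file
places <- data.frame(
  name    = c("Willis Tower", "Wrigley Field"),
  address = c("233 S Wacker Dr, Chicago, IL 60606",
              "1060 W Addison St, Chicago, IL 60613")
)

# Geocode the address column; by default the result gains `lat` and `long` columns
places_geo <- geocode(places, address = address, method = "osm")

# Drop addresses that could not be matched, then convert to sf points (WGS 84)
places_ok <- subset(places_geo, !is.na(lat) & !is.na(long))
places_sf <- st_as_sf(places_ok, coords = c("long", "lat"), crs = 4326)

places_sf
```

Checking for unmatched addresses before converting is worthwhile, since `st_as_sf()` does not accept missing coordinates by default.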
Mapping points
Now that addresses have been cleaned, geocoded, and spatially enabled, they can be added as a point layer on a map.
Live demo
Calculating Spatial Metrics
Accessibility Metrics: How accessible is a resource?
Accessibility is a multidimensional concept, and spatial distance is only one component of accessibility.
Classical model: (Penchansky & Thomas, 1981)
Choosing the “right” measure is complicated
Location optimization considers:
Different access measures could lead to different conclusions about the existence of spatial mismatch and inequity (Talen & Anselin, 1998)
It is important to consider which characterization of access is most suitable for a specific problem.
How should distance between the user/patient and the facility be characterized?
What assumptions about travel behaviors are most appropriate?
Access Measurements
Density:
Proximity:
Container Method
A count of facilities (or a measure of services) within a given geographic unit.
For example:
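A minimal sf sketch of the container method: count the points that fall inside each polygon. The two square "tracts" and three facility points are hypothetical, and `sq()` is an illustrative helper.

```r
library(sf)

# Illustrative helper: a unit square with its lower-left corner at (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two hypothetical adjacent tracts
tracts <- st_sf(tract_id = c("A", "B"), geometry = st_sfc(sq(0), sq(1)))

# Three hypothetical facilities: two in tract A, one in tract B
facilities <- st_as_sf(
  data.frame(x = c(0.2, 0.7, 1.5), y = c(0.5, 0.3, 0.5)),
  coords = c("x", "y")
)

# Container method: number of facilities intersecting each tract
tracts$n_facilities <- lengths(st_intersects(tracts, facilities))
tracts
```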
Proximity Methods
Minimum distance: the distance from an origin location to the nearest destination (e.g., methadone providers)
Travel cost: a measure of the total or average distance (or travel time) between each origin and destination
Gravity potential: resources are weighted by size and adjusted for “friction of distance”
Minimum distance to the nearest methadone provider for all contiguous US zip codes
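A minimal sf sketch of the minimum distance measure, assuming straight-line (Euclidean) distance in a projected CRS. The coordinates are hypothetical UTM zone 16N positions in meters, not real zip codes or providers.

```r
library(sf)

# Hypothetical origins (e.g., zip code centroids), projected CRS in meters
origins <- st_as_sf(
  data.frame(id = 1:3, x = c(440000, 445000, 450000), y = 4630000),
  coords = c("x", "y"), crs = 32616
)

# Hypothetical destinations (e.g., providers) in the same CRS
providers <- st_as_sf(
  data.frame(x = c(441000, 449000), y = 4630000),
  coords = c("x", "y"), crs = 32616
)

# Index of the nearest provider for each origin
nearest <- st_nearest_feature(origins, providers)

# Distance from each origin to its nearest provider, in meters
origins$min_dist_m <- as.numeric(
  st_distance(origins, providers[nearest, ], by_element = TRUE)
)
origins
```

Straight-line distance is the simplest choice; travel time along a road network would need a routing tool instead.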
Live demo
Wrap Up & Discussion
Additional R-Spatial Resources
Opioid Environment Toolkit (HEROP resource): https://geodacenter.github.io/opioid-environment-toolkit/index.html
R Spatial Workshops (CSDS/HEROP resource): https://spatialanalysis.github.io/tutorials/
Geocomputation with R (Lovelace et al., 2019): https://geocompr.robinlovelace.net
Spatial Data Science with R: https://rspatial.org/
Other topics & useful tools
Kernel density estimates (KDE)
Estimates the density of point data as a continuous surface, showing how likely an event is to occur across space, or the “intensity” of activity.
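One possible sketch of a two-dimensional KDE uses `kde2d()` from the MASS package (an assumption; the workshop may use other tools). The event locations are simulated for illustration.

```r
library(MASS)

set.seed(2023)

# Simulated event locations forming two clusters (not real data)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 4))
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 4))

# Estimate the density surface on a 50 x 50 grid
kde <- kde2d(x, y, n = 50)

# Grid location with the highest estimated intensity
peak <- which(kde$z == max(kde$z), arr.ind = TRUE)
c(x = kde$x[peak[1, "row"]], y = kde$y[peak[1, "col"]])

# image(kde) draws the surface as a heat map
```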
Resources
Spatial regression models (basics)
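Classical linear model: $y = X\beta + u$, where $E[uu'] = \sigma^2 I$
Spatial lag model: $y = \rho W y + X\beta + u$
Spatially correlated errors: $E[uu'] = \Sigma \neq \sigma^2 I$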
Spatial regression models
PCA followed by cluster analysis & spatial regression
SDOH Index | Description |
Socioeconomic advantage index (SES) | Includes socioeconomic status factors such as poverty and educational level; this measure is strongly correlated with the Singh et al. (cite) area deprivation index. |
Limited mobility index (MOB) | Captures the proportion of older adults and persons with disabilities. |
Urban core opportunity index (URB) | Reflects highly urbanized populations experiencing more opportunities and also high living costs. |
Mixed immigrant cohesion and accessibility index (MICA) | Features immigrant populations with traditional family structures and multiple accessibility stressors. |
Seven SDOH neighborhood typologies were optimized across the continental United States, as presented in this screenshot from the SDOH web application generated to visualize results.
SDOH Neighborhood Typology.
The four indices (principal components) were later used in spatial regression models to examine how SDOH are associated with COVID-19 outcomes differently in rural, urban, and suburban areas, accounting for spatial dependence.
Thank you