1 of 20

Spatial Autocorrelation is

EVERYWHERE

Pelayo Arbués

Data Scientist

@pelayoarbues

2 of 20

AGENDA

What is Spatial Autocorrelation?

How to test it?

Quick note on Spatial Cross Validation

3 of 20

“Everything is related to everything else, but near things are more related than distant things”

Waldo Tobler (1970)

4 of 20

Spatial Autocorrelation: What is it? Why you should care

The relationship between the values that a single variable takes at nearby locations

Positive:

Similar values at nearby locations (clustered)

Negative:

Dissimilar values at nearby locations (checkerboard pattern)
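
As a rough illustration of the two cases, the sketch below counts how often edge-sharing neighbors on a small grid hold the same value. The grids, their size and the helper name `same_value_share` are assumptions made for this example, not part of the original slides.

```python
import numpy as np

def same_value_share(grid):
    """Share of edge-sharing neighbor pairs whose two cells hold the same value."""
    horizontal = grid[:, :-1] == grid[:, 1:]   # left-right pairs
    vertical = grid[:-1, :] == grid[1:, :]     # up-down pairs
    return (horizontal.sum() + vertical.sum()) / (horizontal.size + vertical.size)

n = 8  # hypothetical grid size
rows, cols = np.indices((n, n))
clustered = (cols >= n // 2).astype(int)   # left half 0, right half 1
checkerboard = (rows + cols) % 2           # alternating 0 and 1
rng = np.random.default_rng(0)
shuffled = rng.permutation(clustered.ravel()).reshape(n, n)  # same values, random places

for name, grid in [("clustered", clustered),
                   ("checkerboard", checkerboard),
                   ("shuffled", shuffled)]:
    print(f"{name:>12}: {same_value_share(grid):.2f}")
```

The clustered grid should score close to 1, the checkerboard exactly 0, and the shuffled grid somewhere in between, which is the baseline that spatial randomness provides.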

Uses of Spatial Autocorrelation (Getis, 2010):

A test on model misspecification
A measure of spatial effects
A test on spatial heterogeneity
A means of identifying spatial clusters
A way to understand Modifiable Areal Unit Problem (MAUP)
A means of identifying outliers (spatial and non spatial)
...

Positive spatial autocorrelation: neighbors are more alike than they would be under spatial randomness

As Galton's problem recognizes, there are two principal sources of spatial dependence. The first is an explicitly spatial source, focused on behavioral diffusion. Spatial proximity promotes behavioral interaction; generally, units of interest in the social sciences, be they individuals, states, nation-states, or other units, are more likely to interact with each other if they are spatially proximate to each other. Spatial diffusion occurs when spatially proximate units are influenced directly by the behaviors of their neighbors, and vice versa. From a modeling perspective, positive spatial diffusion corresponds to a positive and significant parameter on a spatially lagged dependent variable, where neighbors are defined via a weights matrix such as those presented in Chapter 2.

Alternatively, neighboring units may share similar behaviors even if there is no behavioral interaction between these units. In this case, the units’ similarity in the behavior of interest is not affected directly by their neighbors’ behavior. Instead, neighboring units in this case share similar behaviors as a result of the geographic clustering of the sources of these behaviors. Such spatial dependence may be called attributional dependence because it is traced not directly to the behavior of neighboring units, but instead to shared attributes at neighboring locations. If we are unable to model fully the attributional sources of spatial dependence in the data generating process (DGP), these sources will produce spatial dependence in the error terms at neighboring locations. This spatial error dependence is modeled via a spatially lagged error term, where the error dependence is again modeled using a weights matrix such as presented in Chapter 2.
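
The two sources of dependence described above can be sketched as two data generating processes. This is a minimal simulation under stated assumptions: the units sit on a line with each unit linked to its immediate neighbors, and the values of `beta`, `rho` and `lam` are hypothetical. None of these choices come from the source.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200                                   # hypothetical number of units on a line
W = np.eye(n, k=1) + np.eye(n, k=-1)      # each unit is linked to its immediate neighbors
W = W / W.sum(axis=1, keepdims=True)      # row-standardize so each row sums to 1
I = np.eye(n)

x = rng.normal(size=n)
eps = rng.normal(size=n)
beta, rho, lam = 1.0, 0.7, 0.7            # illustrative parameter values

# Diffusion (spatial lag): y = rho*W*y + x*beta + eps, so y = (I - rho*W)^-1 (x*beta + eps)
y_lag = np.linalg.solve(I - rho * W, x * beta + eps)

# Attributional dependence (spatial error): y = x*beta + u with u = lam*W*u + eps
u = np.linalg.solve(I - lam * W, eps)
y_err = x * beta + u

def neighbor_corr(v, W):
    """Correlation between each unit's value and the average of its neighbors."""
    return np.corrcoef(v, W @ v)[0, 1]

print("independent errors:", round(neighbor_corr(eps, W), 2))
print("spatial error term:", round(neighbor_corr(u, W), 2))
print("spatial lag outcome:", round(neighbor_corr(y_lag, W), 2))
```

In the lag process the neighbors' outcomes enter the equation for `y_lag` directly, while in the error process only the unobserved term `u` is spatially dependent.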

Spatial autocorrelation that is diagnosed may be produced neither by a diffusion process nor by an attributional process. Instead, it may be produced by behavioral heterogeneity. Within a modeling context, this will take the form of spatial heterogeneity in parameters. If this parameter heterogeneity is not modeled, spatial dependence will persist in the presence of covariates. Residual spatial dependence in a multivariate model may also be produced by another undiagnosed form of spatial heterogeneity – functional form heterogeneity. Depending on the modeling strategy employed to account for spatial heterogeneity in parameters or functional form, a spatial econometric specification may not be required to model the spatial heterogeneity. Instead, the researcher may choose to model the spatial heterogeneity via standard econometric approaches.

SPATIAL HETEROGENEITY IN PARAMETERS

The intuition of how spatial heterogeneity in parameters may produce univariate spatial autocorrelation is straightforward. If a covariate differs in its effects on the phenomenon of interest to social scientists across observations, and if the effects are similar among neighboring observations, this may produce similar values on the dependent variable. I examine three sets of approaches for modeling spatial heterogeneity in parameters – spatial random coefficients models, spatial switching regressions models for discrete parameter heterogeneity, and spatially varying coefficients models for continuous parameter heterogeneity – in turn next.

SPATIAL RANDOM COEFFICIENTS MODELS

Random coefficients models have received extensive use in econometrics as an approach for modeling heterogeneity in parameters (Swamy 1970; Hsiao 1975). The standard, nonspatial random coefficients specification takes the following form:

y_i = X_i β_i + ε_i

β_i = β + μ_i,

where β_i is no longer assumed constant, but instead is allowed to vary across observations as a function of a mean, β, and a stochastic term, μ_i. The random coefficients model induces heteroskedasticity, which is modeled via feasible generalized least squares (FGLS).

In the standard random coefficients model, the stochastic variation, μ_i, around the common mean, β, is assumed to be random with regard to the spatial locations of observations. In contrast, in the spatial random coefficients approach, there is spatial dependence in this variation around the mean.
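
Substituting the second equation into the first gives y_i = X_i β + (X_i μ_i + ε_i), so the composite error has a variance that depends on X_i. The sketch below simulates the nonspatial model with one covariate to make that heteroskedasticity visible. All parameter values are hypothetical, and the residuals use the true β rather than an estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
beta, sigma_mu, sigma_eps = 2.0, 1.0, 0.5   # illustrative values, not from the source

x = rng.uniform(-3, 3, size=n)
mu = rng.normal(0, sigma_mu, size=n)        # random deviation of each coefficient
eps = rng.normal(0, sigma_eps, size=n)
y = x * (beta + mu) + eps                   # y_i = x_i * beta_i + eps_i

resid = y - x * beta                        # composite error: x_i * mu_i + eps_i
small = np.abs(x) < 1
large = np.abs(x) > 2
print("error variance, |x| < 1:", round(resid[small].var(), 2))
print("error variance, |x| > 2:", round(resid[large].var(), 2))
```

The error variance grows with the square of the covariate, which is why ordinary least squares is inefficient here and a GLS-type estimator is used.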

Definition by Getis: spatial autocorrelation represents the relationship between nearby spatial units, as seen on maps, where each unit is coded with a realization of a single variable

5 of 20

6 of 20

Introducing Uber’s H3

Data points are bucketed into hexagons
Hexagons have a regular shape, unlike postal areas, census tracts and other administrative polygons
H3 supports sixteen resolutions. Each finer resolution has cells with one seventh the area of the coarser resolution.
Square grids have two kinds of neighbors (edge and vertex); hexagonal grids have only one (edge)
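
The neighbor point can be checked with a few lines of geometry. This sketch uses a plain hexagonal grid in axial coordinates rather than the H3 library itself; the coordinate convention (pointy-top hexagons of unit size) and the helper name `hex_center` are assumptions made for the example.

```python
import math

# Square grid: offsets (dx, dy) to the eight cells surrounding a cell at the origin.
square_offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0)]
square_dists = sorted({round(math.hypot(dx, dy), 6) for dx, dy in square_offsets})

# Hexagonal grid: axial offsets (q, r) to the six cells surrounding a cell at the origin.
hex_offsets = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_center(q, r):
    """Center of a pointy-top hexagon of unit size in axial coordinates."""
    return (math.sqrt(3) * (q + r / 2), 1.5 * r)

hex_dists = sorted({round(math.hypot(*hex_center(q, r)), 6) for q, r in hex_offsets})

print("distinct center distances, square grid:", square_dists)
print("distinct center distances, hex grid:   ", hex_dists)
```

The square grid yields two distinct center-to-center distances (edge and vertex neighbors), while all six hexagonal neighbors sit at the same distance, so there is no need to choose between neighbor types.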

Source: https://eng.uber.com/h3/

7 of 20

8 of 20

9 of 20

10 of 20

Testing for Spatial Autocorrelation:

1. Choose a neighborhood criterion: Which areas are linked?

2. Assign weights to the areas that are linked: Create a spatial weights matrix (W)

3. Run statistical test, using weights matrix, to examine spatial autocorrelation
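
Steps 1 and 2 can be sketched on a regular grid, where the neighborhood criterion is rook (shared edge) or queen (shared edge or vertex) contiguity. The function names and the 3 x 3 grid are hypothetical choices for this example; real areal data would typically rely on a library that derives contiguity from polygon geometry.

```python
import numpy as np

def contiguity_weights(n_rows, n_cols, criterion="rook"):
    """Binary contiguity matrix for a regular grid, cells indexed row by row."""
    if criterion == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif criterion == "queen":
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    W = np.zeros((n_rows * n_cols, n_rows * n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[r * n_cols + c, rr * n_cols + cc] = 1.0
    return W

def row_standardize(W):
    """Scale each row to sum to 1 so that W @ x is the mean of the neighbors."""
    return W / W.sum(axis=1, keepdims=True)

W_rook = contiguity_weights(3, 3, "rook")
W_queen = contiguity_weights(3, 3, "queen")
print("neighbors of the center cell:", W_rook[4].sum(), "(rook),", W_queen[4].sum(), "(queen)")

values = np.arange(9, dtype=float)            # toy attribute, one value per cell
spatial_lag = row_standardize(W_rook) @ values
print("spatial lag of the center cell:", spatial_lag[4])
```

Row-standardizing is one common weighting choice among several; it turns the product of W and the attribute into the average value among each cell's neighbors.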

11 of 20

Source: https://crd230.github.io/

12 of 20

13 of 20

Testing for Spatial Autocorrelation:

3. Run statistical test, using weights matrix, to examine spatial autocorrelation

Global Tests: Moran’s I

Local Tests: Local Indicators of Spatial Association (LISA):

Local Moran’s I
Getis-Ord Gi and Gi*

14 of 20

15 of 20

16 of 20

Global Tests
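
A minimal sketch of a global test, assuming a square grid with rook contiguity and a one-sided permutation test for positive autocorrelation. The function names, grid size and number of permutations are choices made for this example and are not taken from the slides.

```python
import numpy as np

def grid_rook_weights(n):
    """Binary rook-contiguity matrix for an n x n grid, built from a path graph."""
    path = np.eye(n, k=1) + np.eye(n, k=-1)          # neighbors along one axis
    return np.kron(np.eye(n), path) + np.kron(path, np.eye(n))

def morans_i(values, W):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centered values."""
    z = values - values.mean()
    return (len(values) / W.sum()) * (z @ W @ z) / (z @ z)

def permutation_p_value(values, W, n_perm=999, seed=0):
    """Upper-tail pseudo p-value: how often shuffled maps reach the observed I."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, W)
    sims = np.array([morans_i(rng.permutation(values), W) for _ in range(n_perm)])
    return (np.sum(sims >= observed) + 1) / (n_perm + 1)

n = 8
W = grid_rook_weights(n)
rows, cols = np.indices((n, n))
clustered = (cols >= n // 2).astype(float).ravel()   # left half 0, right half 1
checkerboard = ((rows + cols) % 2).astype(float).ravel()

print("clustered    I:", round(morans_i(clustered, W), 3),
      " p:", permutation_p_value(clustered, W))
print("checkerboard I:", round(morans_i(checkerboard, W), 3))
```

Shuffling the values across cells simulates spatial randomness, so the pseudo p-value is the share of shuffled maps that look at least as clustered as the observed one.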

17 of 20

Local Tests
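
A minimal sketch of Local Moran's I on a one-dimensional transect, where each location is linked to its immediate neighbors. The data values are made up for illustration, and the significance step (usually a conditional permutation test) is left out.

```python
import numpy as np

# Hypothetical transect: a run of low values, a run of high values, and one isolated high value.
values = np.array([1, 1, 2, 1, 9, 8, 9, 9, 1, 2, 1, 8, 1, 2, 1], dtype=float)
n = len(values)

W = np.eye(n, k=1) + np.eye(n, k=-1)       # immediate neighbors on the line
W = W / W.sum(axis=1, keepdims=True)       # row-standardized weights

z = values - values.mean()
lag = W @ z                                # mean deviation among each location's neighbors
m2 = (z @ z) / n
local_i = z * lag / m2                     # Local Moran's I, one value per location

# Moran scatterplot quadrant: own value (High/Low) followed by neighbors' value (High/Low).
quadrant = [("H" if zi >= 0 else "L") + ("H" if li >= 0 else "L")
            for zi, li in zip(z, lag)]

for i, (v, li, q) in enumerate(zip(values, local_i, quadrant)):
    print(f"{i:2d}  value={v:4.0f}  local I={li:6.2f}  {q}")
```

Locations labeled HH or LL sit inside clusters and have a positive local statistic, while HL or LH locations are candidate spatial outliers with a negative one. With row-standardized weights, the local values average to the global Moran's I.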

18 of 20

Cross Validation under Spatial Autocorrelation

Source: https://geocompr.robinlovelace.net/
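
One common way to handle this is to hold out whole spatial blocks instead of random points, so that test observations are not surrounded by training observations. The sketch below is an assumption-laden illustration: the function name `spatial_block_folds`, the block size and the random coordinates are all hypothetical, and it is not the procedure used in the cited source.

```python
import numpy as np

def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole square blocks are held out."""
    cells = np.floor(coords / block_size).astype(int)           # block row/column per point
    _, block_id = np.unique(cells, axis=0, return_inverse=True)
    block_id = block_id.ravel()
    rng = np.random.default_rng(seed)
    shuffled_blocks = rng.permutation(block_id.max() + 1)       # random block order
    for held_out in np.array_split(shuffled_blocks, n_folds):
        test_mask = np.isin(block_id, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(500, 2))                     # hypothetical point locations
folds = list(spatial_block_folds(coords, block_size=25, n_folds=4))

for k, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {k}: {len(train_idx)} training points, {len(test_idx)} test points")
```

Compared with random folds, block-wise folds usually give a less optimistic error estimate when the data are spatially autocorrelated, because nearby and therefore similar observations no longer end up on both sides of the split.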

19 of 20

References

Arribas-Bel, Daniel (2019): Geographic Data Science Course by @darribasbel: http://darribas.org/gds19/

Uber’s H3: https://eng.uber.com/h3/

Spatial Weights Matrix: https://crd230.github.io/lab5.html#spatial_weights_matrix

Geocomputation with R: https://geocompr.robinlovelace.net/

Machine Learning for Spatial Data: http://www.opengeohub.org/machine-learning-spatial-data

Fischer, M. M., & Getis, A. (Eds.). (2010). Handbook of Applied Spatial Analysis. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-03647-7

20 of 20

We data