Spatial Autocorrelation: What is it? Why you should care
The relationship between realizations of a single variable at nearby locations
Positive: similar values occur at nearby locations. (Clustered pattern)
Negative: dissimilar values occur at nearby locations, so similar values lie farther apart. (Checkerboard pattern)
Uses of Spatial Autocorrelation (Getis, 2010):
A test on model misspecification
A measure of spatial effects
A test on spatial heterogeneity
A means of identifying spatial clusters
A way to understand the Modifiable Areal Unit Problem (MAUP)
A means of identifying outliers (spatial and nonspatial)
...
Under positive spatial autocorrelation, neighbors are similar: more alike than they would be under spatial randomness.
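The clustered and checkerboard patterns described above can be made concrete with a small numerical sketch. It uses Moran's I, a standard global measure of spatial autocorrelation that these notes do not themselves name, on a 6 x 6 grid with rook contiguity (cells sharing an edge count as neighbors). The grid size, the helper names `rook_weights` and `morans_i`, and the two toy patterns are assumptions made for illustration only.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity weights for a regular grid (cells sharing an edge)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:              # neighbor below
                j = (r + 1) * ncols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < ncols:              # neighbor to the right
                j = r * ncols + c + 1
                W[i, j] = W[j, i] = 1.0
    return W

def morans_i(values, W):
    """Global Moran's I: (n / sum(W)) * (z' W z) / (z' z), with z mean-centered."""
    z = np.asarray(values, dtype=float).ravel()
    z = z - z.mean()
    return (z.size / W.sum()) * (z @ W @ z) / (z @ z)

W = rook_weights(6, 6)
rows, cols = np.indices((6, 6))

clustered = (cols >= 3).astype(float)             # left half 0, right half 1
checkerboard = ((rows + cols) % 2).astype(float)  # alternating 0 / 1

i_clustered = morans_i(clustered, W)   # positive: neighbors mostly alike
i_checker = morans_i(checkerboard, W)  # negative: every neighbor differs

print(f"clustered:    {i_clustered:.2f}")
print(f"checkerboard: {i_checker:.2f}")
```

The clustered map gives a clearly positive value and the checkerboard a clearly negative one. Binary, unstandardized weights are used here only for simplicity; row-standardized weights are a common alternative.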
As Galton's problem recognizes, there are two principal sources of spatial dependence. The first is an explicitly spatial source, focused on behavioral diffusion. Spatial proximity promotes behavioral interaction; generally, units of interest in the social sciences, be they individuals, states, nation-states, or other units, are more likely to interact with each other if they are spatially proximate to each other. Spatial diffusion occurs when spatially proximate units are influenced directly by the behaviors of their neighbors, and vice versa. From a modeling perspective, positive spatial diffusion corresponds to a positive and significant parameter on a spatially lagged dependent variable, where neighbors are defined via a weights matrix such as those presented in Chapter 2.
Alternatively, neighboring units may share similar behaviors even if there is no behavioral interaction between these units. In this case, the units’ similarity in the behavior of interest is not affected directly by their neighbors’ behavior. Instead, neighboring units share similar behaviors as a result of the geographic clustering of the sources of these behaviors. Such spatial dependence may be called attributional dependence because it is traced not directly to the behavior of neighboring units, but instead to shared attributes at neighboring locations. If we are unable to model fully the attributional sources of spatial dependence in the data generating process (DGP), these sources will produce spatial dependence in the error terms at neighboring locations. This spatial error dependence is modeled via a spatially lagged error term, where the error dependence is again modeled using a weights matrix such as those presented in Chapter 2.
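The two sources of dependence described above are commonly written as follows. The notation is an assumption made for illustration, since the text itself does not fix one: $W$ is the weights matrix, $\rho$ is the parameter on the spatially lagged dependent variable, and $\lambda$ is the parameter on the spatially lagged error term.

```latex
% Diffusion: neighbors' outcomes enter the equation directly (spatial lag model)
y = \rho W y + X\beta + \varepsilon

% Attributional dependence: unmodeled, geographically clustered attributes
% show up in the disturbances (spatial error model)
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

In this notation, positive spatial diffusion corresponds to $\rho > 0$, while attributional dependence that is not captured by $X$ appears as $\lambda \neq 0$.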
Spatial autocorrelation that is diagnosed may be produced neither by a diffusion process nor by an attributional process. Instead, it may be produced by behavioral heterogeneity. Within a modeling context, this will take the form of spatial heterogeneity in parameters. If this parameter heterogeneity is not modeled, spatial dependence will persist in the presence of covariates. Residual spatial dependence in a multivariate model may also be produced by another undiagnosed form of spatial heterogeneity – functional form heterogeneity. Depending on the modeling strategy employed to account for spatial heterogeneity in parameters or functional form, a spatial econometric specification may not be required to model the spatial heterogeneity. Instead, the researcher may choose to model the spatial heterogeneity via standard econometric approaches.
SPATIAL HETEROGENEITY IN PARAMETERS
The intuition of how spatial heterogeneity in parameters may produce univariate spatial autocorrelation is straightforward. If a covariate differs in its effects on the phenomenon of interest to social scientists across observations, and if the effects are similar among neighboring observations, this may produce similar values on the dependent variable. I examine three sets of approaches for modeling spatial heterogeneity in parameters – spatial random coefficients models, spatial switching regressions models for discrete parameter heterogeneity, and spatially varying coefficients models for continuous parameter heterogeneity – in turn next.
SPATIAL RANDOM COEFFICIENTS MODELS
Random coefficients models have received extensive use in econometrics as an approach for modeling heterogeneity in parameters (Swamy 1970; Hsiao 1975). The standard, nonspatial random coefficients specification takes the following form:
$$y_i = X_i \beta_i + \varepsilon_i$$
$$\beta_i = \beta + \mu_i,$$
where $\beta_i$ is no longer assumed constant, but instead is allowed to vary across observations as a function of a mean, $\beta$, and a stochastic term, $\mu_i$. The random coefficients model induces heteroskedasticity, which is modeled via feasible generalized least squares (FGLS).
In the standard random coefficients model, the stochastic variation, $\mu_i$, around the common mean, $\beta$, is assumed to be random with regard to the spatial locations of observations. In contrast, in the spatial random coefficients approach, there is spatial dependence in this variation around the mean.
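To see why the random coefficients model induces heteroskedasticity, substitute the second equation into the first: the composite error is X_i μ_i + ε_i, whose variance grows with the square of X_i. The sketch below simulates the standard (nonspatial) case with a single covariate and no intercept. The sample size, parameter values, and variable names are illustrative assumptions, and the code only demonstrates the heteroskedasticity; it does not implement FGLS or the spatial variant.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20_000
beta, sigma_mu, sigma_eps = 2.0, 1.0, 0.5   # assumed illustrative values

x = rng.uniform(0.0, 3.0, size=n)           # single covariate
mu = rng.normal(0.0, sigma_mu, size=n)      # random deviation of beta_i from beta
eps = rng.normal(0.0, sigma_eps, size=n)    # ordinary disturbance

beta_i = beta + mu                          # beta_i = beta + mu_i
y = x * beta_i + eps                        # y_i = x_i * beta_i + eps_i

# OLS through the origin still recovers the mean coefficient beta ...
b_ols = (x @ y) / (x @ x)
resid = y - b_ols * x

# ... but the residual variance depends on x:
# Var(x_i * mu_i + eps_i | x_i) = x_i**2 * sigma_mu**2 + sigma_eps**2
var_low = resid[x < 1.0].var()
var_high = resid[x > 2.0].var()

print(f"OLS estimate of beta: {b_ols:.3f}")
print(f"residual variance, x < 1: {var_low:.2f}")
print(f"residual variance, x > 2: {var_high:.2f}")
```

The residual variance is several times larger for large values of the covariate than for small ones, which is the pattern FGLS is meant to account for.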
Definition by Getis: spatial autocorrelation represents the relationship between nearby spatial units, as seen on maps, where each unit is coded with a realization of a single variable.