Session 4: Univariate spatial dependence tests: advanced ESDA techniques
Instructor: Coro Chasco Yrigoyen, Universidad Autónoma de Madrid
17 to 21 May 2010
© 2010, Coro Chasco Yrigoyen. All Rights Reserved.

Course index
S1: Introduction to Spatial Econometrics
SP1: Introduction to the GeoDa program
S2: Spatial effects: spatial dependence
S3: Exploratory Spatial Data Analysis (ESDA): basic techniques
SP2: ESDA in GeoDa: basic techniques
S4: Spatial dependence tests: advanced ESDA techniques
S5: Confirmatory spatial data analysis: specification of spatial dependence models
SP3: ESDA in GeoDa: advanced techniques
S6: Estimation and testing of a spatial regression model by Ordinary Least Squares
S7: Estimation and testing of spatial dependence models
SP4: The spatial regression module in GeoDa
S8: Estimation and testing of the spatial error model, and spatial modelling strategies
SP5: Applying the classical modelling strategy to case studies with GeoDa

CHASCO, C. (2003), “Econometría espacial aplicada a la predicción-extrapolación de datos microterritoriales”. Comunidad de Madrid; pp. 62-78.

Session 4 Overview and Goals
Global spatial autocorrelation
1. Moran’s I
2. Moran’s scatterplot
3. Geary’s c
4. Mantel’s Γ
5. Getis and Ord’s G(d)
Local spatial autocorrelation
1. Getis and Ord’s local statistics
2. LISA tests & maps
Bivariate & space-time spatial autocorrelation
Spatial autocorrelation tests for rates

4.1. Global spatial autocorrelation
Global tests are used to detect general spatial trends in the distribution of a geographical variable over a whole space. But how can we determine the existence of spatial autocorrelation?
4.1.1. Moran’s I
4.1.2. Moran’s scatterplot
4.1.3. Geary’s c
4.1.4. Mantel’s Γ
4.1.5. Getis and Ord’s G(d)
[Map: disposable income per inhabitant (1997), thousand pesetas; legend classes 900 to 1,125; 1,125 to 1,400; 1,400 to 1,800]

4.1.1. Moran’s I
MORAN, P. (1948), “The interpretation of statistical maps”. Journal of the Royal Statistical Society B, vol. 10; pp. 243-251.
\[ I = \frac{N}{S_0}\,\frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}{\sum_{i=1}^{N}(x_i-\bar{x})^2}, \qquad S_0=\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij} \]
N: sample size. w_ij: elements of the spatial weights matrix W*.
Moran’s I theoretical mean: E(I) = -1/(N-1)
Positive autocorrelation: I > E(I). Negative autocorrelation: I < E(I).

4.1.1. Moran’s I (II)
CLIFF, A. and J. ORD (1973), “Spatial autocorrelation”. London: Pion.
CLIFF, A. and J. ORD (1981), “Spatial processes, models and applications”. London: Pion.
Inference is typically based on a standardized z-value:
\[ z(I) = \frac{I - E[I]}{SD[I]} \]
For N → ∞, z(I) follows a standard normal distribution: z(I) ~ N(0,1).
Assumptions:
Normalisation: the variable X is assumed to follow a normal distribution.
Randomisation by permutation: the distribution function of X is unknown.

4.1.1. Moran’s I (III)
Normalisation: the variable X follows a normal distribution.
1) For N → ∞, z_N(I) follows a standard normal distribution: z_N(I) ~ N(0,1).
2) Significance of z_N(I): read from a standard normal table.
\[ E_N[I] = -\frac{1}{N-1} \]
\[ \mathrm{Var}_N[I] = \frac{4AN^2 - 8(A+D)N + 12A^2}{4A^2(N^2-1)} - \frac{1}{(N-1)^2} \]
\[ A = \frac{1}{2}\sum_{i=1}^{N} L_i = \frac{S_0}{2}, \qquad D = \frac{1}{2}\sum_{i=1}^{N} L_i(L_i-1) \]
where L_i = Σ_j w_ij is the number of neighbours of unit i under a binary, symmetric W.

4.1.1. Moran’s I (IV)
Permutation: randomisation with unknown distribution function.
1) A reference distribution for I is generated empirically.
2) The observations are randomly permuted over the locations and Moran’s I is recomputed for each new sample. There are N! possible permutations; in practice a random subset of them is drawn.
3) E[I] and SD[I] are computed directly from the generated distribution of Moran’s I values.
4) Significance of z(I): read from a standard normal table.
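The permutation approach to Moran’s I described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the six-unit layout, the two data vectors, the function names and the choice of 999 permutations are all hypothetical and do not come from the slides.

```python
import numpy as np


def morans_i(x, w):
    """Global Moran's I: (N / S0) * z'Wz / z'z, with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)


def moran_permutation_test(x, w, n_perm=999, seed=0):
    """Return I, the empirical E[I] and SD[I], and the resulting z(I).

    n_perm random permutations stand in for the full set of N! reshuffles.
    """
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    sims = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    mean_i, sd_i = sims.mean(), sims.std(ddof=1)
    return observed, mean_i, sd_i, (observed - mean_i) / sd_i


# Hypothetical data: six units on a line, binary contiguity between neighbours.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

gradient = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])      # similar values adjacent
checkerboard = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])  # dissimilar values adjacent

i_obs, mean_i, sd_i, z_i = moran_permutation_test(gradient, w)
print(f"I = {i_obs:.3f}, E[I] = {mean_i:.3f}, SD[I] = {sd_i:.3f}, z(I) = {z_i:.2f}")
print(f"checkerboard I = {morans_i(checkerboard, w):.3f}")
```

With so few units the z-value is only indicative; the point of the sketch is that the empirical mean of the permuted values should sit close to -1/(N-1).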
4.1.1. Moran’s I (VI)
Interpretation:
Non-significant values of z(I) mean that H0 (no spatial autocorrelation) cannot be rejected.
Significant z(I) > 0: positive spatial autocorrelation. Similar high/low values of a variable X are spatially clustered more than would be expected by chance.
Significant z(I) < 0: negative spatial autocorrelation. Similar high/low values of X are spatially clustered less than would be expected by chance. This pattern is perfectly represented by a checkerboard.

4.1.1. Moran’s I (VII)
A significant negative z(I): negative spatial autocorrelation (less clustering of similar values than there would be in a random pattern).

4.1.1. Moran’s I (VIII)
CLIFF, A. and J. ORD (1981), “Spatial processes, models and applications”. London: Pion; chapter 5.
Correlogram: an analytic method that is of value in assessing the spatial scale of a process. Sometimes the strength of spatial interaction varies in a complex way with distance.
Higher-order spatial autocorrelation: spatial correlogram.
[Figure: spatial correlogram; vertical axis z(I) of Moran’s I, from -1.5 to 1.5; horizontal axis from 1 to 9]

4.1.2. Moran’s scatterplot
ANSELIN, L. & S. BAO (1997), “Exploratory Spatial Data Analysis”. In “Recent developments in spatial analysis” (Eds. Fischer and Getis), Springer-Verlag, Berlin; pp. 35-59.
Visualizes I as the slope of the regression line in a scatterplot with Wz on the Y-axis and z on the X-axis (for a row-standardised W).
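The statement that Moran’s I is the slope of the regression of Wz on z can be checked numerically. The sketch below assumes a row-standardised W and a standardised variable z; the six values and the line-shaped neighbourhood are hypothetical. The quadrant labels follow the usual Moran scatterplot numbering (I = high-high, II = low-high, III = low-low, IV = high-low).

```python
import numpy as np

# Hypothetical data: six units on a line, each adjacent to its immediate neighbours.
x = np.array([3.0, 4.0, 6.0, 5.0, 1.0, 2.0])
n = x.size
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0
w = w / w.sum(axis=1, keepdims=True)      # row-standardise: every row sums to 1

z = (x - x.mean()) / x.std()              # standardised variable (X-axis)
wz = w @ z                                # spatial lag of z (Y-axis)

slope, intercept = np.polyfit(z, wz, 1)   # OLS line of Wz on z
moran_i = (z @ w @ z) / (z @ z)           # N / S0 = 1 for a row-standardised W

# Quadrant of each observation in the scatterplot.
quadrant = np.where(z >= 0,
                    np.where(wz >= 0, "I", "IV"),
                    np.where(wz >= 0, "II", "III"))

print(f"slope = {slope:.4f}, Moran's I = {moran_i:.4f}")
print(list(zip(range(1, n + 1), quadrant)))
```

The equality holds because z has zero mean, so the OLS slope reduces to z'Wz / z'z.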
4.1.2. Moran’s scatterplot (II)
[Figure: Moran scatterplot and Moran scatterplot map. Quadrants: II (upper left, -), I (upper right, +), III (lower left, +), IV (lower right, -)]
Quadrants I and III (+) correspond to positive spatial association; quadrants II and IV (-) correspond to negative spatial association.

4.1.3. Geary’s c
\[ c = \frac{N-1}{2S_0}\,\frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}\,(x_i-x_j)^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2} \]
Geary’s c theoretical mean: E(c) = 1
Perfect positive autocorrelation: c = 0, since x_i = x_j implies x_i - x_j = 0.
Geary’s c depends on the (squared) difference between neighbouring values of a variable. It is similar to the Durbin-Watson test. It is a variance test.
Moran’s I depends on the difference between each value of the variable X and its mean. It is similar to the Pearson correlation coefficient. It is a covariance test.

4.1.3. Geary’s c (II)
Inference is typically based on a standardized z-value:
\[ z(c) = \frac{c - E[c]}{SD[c]} \]
For N → ∞, z(c) follows a standard normal distribution: z(c) ~ N(0,1).
Normalisation: the variable X is assumed to follow a normal distribution.
Randomisation by permutation: the distribution function of X is unknown.

4.1.3. Geary’s c (III)
Interpretation:
Non-significant values of z(c) mean that H0 (no spatial autocorrelation) cannot be rejected.
Significant z(c) < 0: positive spatial autocorrelation. Similar high/low values of a variable X are spatially clustered more than would be expected by chance.
Significant z(c) > 0: negative spatial autocorrelation. Similar high/low values of X are spatially clustered less than would be expected by chance.
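Geary’s c as defined above can be computed directly from pairwise squared differences. The following is a minimal sketch; the function name and the two six-unit examples are hypothetical. A smooth gradient should give c below 1 (positive autocorrelation) and a checkerboard should give c above 1.

```python
import numpy as np


def gearys_c(x, w):
    """Geary's c: (N - 1) / (2 S0) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - mean)^2."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    diff_sq = (x[:, None] - x[None, :]) ** 2          # (x_i - x_j)^2 for every pair
    numerator = (x.size - 1) * (w * diff_sq).sum()
    denominator = 2.0 * w.sum() * ((x - x.mean()) ** 2).sum()
    return numerator / denominator


# Hypothetical data: six units on a line, binary contiguity between neighbours.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

c_gradient = gearys_c([1, 2, 3, 4, 5, 6], w)       # neighbours are similar
c_checkerboard = gearys_c([1, 0, 1, 0, 1, 0], w)   # neighbours are dissimilar
print(f"gradient c = {c_gradient:.4f}, checkerboard c = {c_checkerboard:.4f}")
```

Note the reversed reading compared with Moran’s I: values of c below E(c) = 1 point to positive autocorrelation.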
4.1.4. Mantel’s Γ
Mantel (1967): matrix association index, which is the sum of the cross-products of the coincident elements of two matrices A and B:
\[ \Gamma = \sum_{i}\sum_{j} a_{ij}\,b_{ij} \]
With a_ij = w_ij:
b_ij = x_i x_j gives Moran’s I;
b_ij = (x_i - x_j)^2 gives Geary’s c.
In general, spatial association measures can be obtained by expressing similarities by means of matrices: 1) spatial similarity (e.g., the spatial weights matrix) and 2) value similarity.

4.1.5. Getis and Ord’s G(d)
Spatial autocorrelation is measured as a distance-based measure of spatial clustering. For this test, two spatial units are neighbours if they are located within a certain distance (d) of each other.
\[ G(d) = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}(d)\,x_i x_j}{\sum_{i=1}^{N}\sum_{j=1}^{N} x_i x_j}, \qquad j \neq i \]
with X > 0 and W binary and symmetric.
It measures the degree of association between the values of X around “i” and the values of X around “j”.

4.2. Local spatial autocorrelation
Concentration, in a particular zone of the global space, of particularly high/low values of a variable relative to its expected value (the mean of the variable).
This phenomenon takes place in non-stationary spatial processes: spatial dependence changes with location.
Sometimes a variable shows no global spatial autocorrelation but does show small spatial clusters with a significant concentration, or lack, of high values.
Sometimes there is global spatial autocorrelation in a variable, but each region contributes differently to it.
TESTS:
1. Getis and Ord’s local statistics
2. LISA tests
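The global G(d) statistic of section 4.1.5 can be sketched as follows, assuming point coordinates and a binary distance-band weights matrix. The coordinates, the threshold d = 1 and both data vectors are hypothetical, and the helper names are illustrative only.

```python
import numpy as np


def distance_band_weights(coords, d):
    """Binary, symmetric W: w_ij = 1 if units i and j are within distance d, i != j."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = (dist <= d).astype(float)
    np.fill_diagonal(w, 0.0)
    return w


def getis_ord_g(x, w):
    """G(d) = sum_ij w_ij(d) x_i x_j / sum_ij x_i x_j, both sums over j != i."""
    x = np.asarray(x, dtype=float)
    cross = np.outer(x, x)
    np.fill_diagonal(cross, 0.0)          # drop the j == i terms
    return (w * cross).sum() / cross.sum()


# Hypothetical data: six points spaced one unit apart on a line, d = 1.
coords = np.column_stack([np.arange(6.0), np.zeros(6)])
w = distance_band_weights(coords, d=1.0)

clustered = np.array([10.0, 10.0, 10.0, 1.0, 1.0, 1.0])    # high values side by side
alternating = np.array([10.0, 1.0, 10.0, 1.0, 10.0, 1.0])  # high values kept apart

g_clustered = getis_ord_g(clustered, w)
g_alternating = getis_ord_g(alternating, w)
print(f"G(d) clustered = {g_clustered:.4f}, alternating = {g_alternating:.4f}")
```

Both vectors contain the same values, so the denominator is identical and only the spatial arrangement changes G(d).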
4.2.1. Getis and Ord’s local statistics
Gi(d), Gi*(d), New Gi(d), New Gi*(d)
Gi(d) measures the concentration (or lack of it) of the weighted sum of values of the variable X in a subregion of “j” locations around “i” in the global space.
GLOBAL:
\[ G(d) = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}(d)\,x_i x_j}{\sum_{i=1}^{N}\sum_{j=1}^{N} x_i x_j}, \qquad j \neq i \]
LOCAL:
\[ G_i(d) = \frac{\sum_{j=1}^{N} w_{ij}(d)\,x_j}{\sum_{j=1}^{N} x_j}, \qquad j \neq i \]
for x_j > 0; W binary and symmetric.

4.2.1. Getis and Ord’s local statistics (II)
Gi*(d): the local spatial concentration also considers the value of the variable X in “i”. Since w_ii = 0, the only difference from Gi(d) is in the denominator, which now includes x_i.
\[ G_i^{*}(d) = \frac{\sum_{j=1}^{N} w_{ij}(d)\,x_j}{\sum_{j=1}^{N} x_j} \]
for x_j > 0; W binary and symmetric.

4.2.1. Getis and Ord’s local statistics (III)
New Gi(d), New Gi*(d): the standardized versions of Gi(d) and Gi*(d). They are distributed as normal variables.
Significant positive values of these tests = positive spatial autocorrelation = concentration of high values of the variable.
Significant negative values of these tests = positive spatial autocorrelation = concentration of low values of the variable.

4.2.2. LISA tests & maps
Anselin, L. (1995). “Local indicators of spatial association - LISA.” Geographical Analysis 27, 93–115.
LISA: Local Indicators of Spatial Association.
They detect the contribution of each location to global spatial autocorrelation.
Local spatial autocorrelation statistics are useful to identify hot spots (spatial concentrations of high/low values) or spatial outliers.
Local autocorrelation is always present when there is global spatial autocorrelation, but it can also exist in its absence.
Local Moran’s I is the most popular.
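The Getis and Ord local statistics of section 4.2.1 can be illustrated with a short sketch that follows the definitions given there, including w_ii = 0 for both Gi(d) and Gi*(d). The data, the line-shaped neighbourhood (units within d = 1 of each other) and the function name are hypothetical.

```python
import numpy as np


def local_g(x, w):
    """Return (G_i(d), G_i*(d)) for every unit i, with w_ii = 0.

    G_i(d)  = sum_j w_ij x_j / sum_{j != i} x_j
    G_i*(d) = sum_j w_ij x_j / sum_j x_j
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    weighted_sum = w @ x                  # values of X found around each i
    g_i = weighted_sum / (x.sum() - x)    # denominator excludes x_i
    g_i_star = weighted_sum / x.sum()     # denominator includes x_i
    return g_i, g_i_star


# Hypothetical data: six units on a line; neighbours are the units within d = 1.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

x = np.array([10.0, 10.0, 10.0, 1.0, 1.0, 1.0])
g_i, g_i_star = local_g(x, w)
for unit, (g, gs) in enumerate(zip(g_i, g_i_star), start=1):
    print(f"unit {unit}: Gi = {g:.3f}, Gi* = {gs:.3f}")
```

These are the raw statistics only; the standardised “New” versions additionally require the expected value and variance of each Gi(d), which are not reproduced here.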
4.2.2. LISA tests & maps (II)
Gives an indication of the extent of significant spatial clustering of similar values around one observation “i”.
The sum of LISAs for all observations is proportional to the global Moran’s I.
Local Moran’s I (LISA):
\[ I_i = z_i \sum_{j} w_{ij}\,z_j \]
z_i, z_j: standardized y_i values.
For a row-standardised W:
\[ I = \frac{1}{N}\sum_{i=1}^{N} I_i \]

OBS   I_DIST01     Z_DIST01     P_DIST01
1     168.0678     -1.845431    0.000557
2     -1.155578    0.842808     0.677512
3     -0.88391     0.342822     0.673284
4     0.044727     0.291255     0.950351
5     -5.440304    1.480166     0.019687

The moments of the I_i statistic, under the null hypothesis of no spatial association, can be derived under a randomisation hypothesis.

4.2.2. LISA tests & maps (III)
[Figure: LISA map. Non-significant Moran’s I over the whole area of the Spanish provinces]

4.3. Bivariate & space-time plots
LÓPEZ, F. & C. CHASCO (2004), “Space-time lags: Specification strategy in spatial regression models”. REAL 04-T17, http://www2.uiuc.edu/unit/real/d-paper/real04-t-17.pdf
Bivariate Moran spatial autocorrelation
Multivariate spatial correlation (Wartenberg, 1985):
\[ m_{kl} = z_k' W^{s} z_l \]
Anselin et al., 2002:
\[ I_{kl} = \frac{z_k' W z_l}{z_k' z_k} \]
\[ z_k = [Y_k - \mu_k]/\sigma_k, \qquad z_l = [Y_l - \mu_l]/\sigma_l \]
W^s is a doubly standardized W.
Moran space-time autocorrelation
Our proposal:
\[ I_{t-k,t} = \frac{z_{t-k}' W z_t}{z_{t-k}' z_{t-k}} \]
\[ z_t = [Y_t - \mu_t]/\sigma_t, \qquad z_{t-k} = [Y_{t-k} - \mu_{t-k}]/\sigma_{t-k} \]

4.3. Bivariate & space-time plots (II)
I_{t-k,t}: Moran space-time autocorrelation coefficient.
The value of Moran’s I coincides with the slope of the regression line of Wz_t on z_{t-k}.
[Figure: space-time Moran scatterplots of Wz_t against z_{t-k} for Employment rate (E) and Population (P); p-values shown: 0.001 and 0.427; t = 2002, k = 4; W: contiguity matrix (0-1)]
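The bivariate and space-time Moran coefficients above share one computation: the cross-product of one standardised variable with the spatial lag of another. The sketch below checks that the coefficient equals the slope of the regression of Wz_t on z_{t-k}. The two data vectors, the six-unit contiguity matrix and the function names are hypothetical.

```python
import numpy as np


def standardise(y):
    """z = (Y - mean) / standard deviation."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()


def bivariate_moran(y_k, y_l, w):
    """I_kl = z_k' W z_l / z_k' z_k; with y_k = Y_{t-k} and y_l = Y_t this is I_{t-k,t}."""
    z_k, z_l = standardise(y_k), standardise(y_l)
    return (z_k @ w @ z_l) / (z_k @ z_k)


# Hypothetical data: six units on a line, binary (0-1) contiguity matrix.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

y_past = np.array([2.0, 3.0, 5.0, 4.0, 1.0, 2.5])   # variable observed at t-k
y_now = np.array([2.5, 3.5, 4.0, 4.5, 1.5, 2.0])    # same variable observed at t

i_space_time = bivariate_moran(y_past, y_now, w)

# Slope of the OLS line of W z_t (Y-axis) on z_{t-k} (X-axis).
slope = np.polyfit(standardise(y_past), w @ standardise(y_now), 1)[0]
print(f"I(t-k, t) = {i_space_time:.4f}, regression slope = {slope:.4f}")
```

Passing two different variables measured at the same date, instead of one variable at two dates, gives the bivariate coefficient I_kl with the same function.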
4.3. Bivariate & space-time plots (IV)
[Figure: Bivariate LISA]

3.3. Spatial correlation analysis for rates
Anselin, L. (2005). “Exploring spatial data with GeoDa”. University of Illinois, Urbana-Champaign.
The computation of spatial dependence statistics, and inference on them, rests on the assumption that the original variables are stationary (mean and variance constant over space).
Constant variance: this assumption usually fails when the observations differ greatly from one another in area or population.
When variables are transformed into rates or proportions, many observations tend to have very small or zero values (in small or sparsely populated areas), and there can also be extremely high outliers (when some unexpected event occurs in a small area).
This is a common situation for variables expressed as rates of population-related events (mortality, unemployment, crime, etc.).

In medicine and epidemiology, methods are used to remove (or reduce) heteroscedasticity in rates.
Computing a rate directly (e.g. mortality) as the ratio between the number of deaths and the total population of a geographical unit can be biased, because the population at risk (the denominator of the rate) can differ greatly from place to place.
In micro-territorial economics, it is easy to obtain rates or proportions with zero (or near-zero) values in very small or sparsely populated units (rural areas with an ageing population), which hampers or biases inter-territorial comparison of the variable.
Empirical Bayes (EB) standardisation: standardisation or smoothing of rates.
Direct method of rate standardisation (Sáez y Saurina, 2007, p. 34):
\[ \hat{p}_i = p_i\,\frac{n_i}{n}, \qquad p_i = \frac{x_i}{n_i} \]
x_i: the value of the variable X in spatial unit i
n_i: the population of i
n: the population of the whole set of units in the system.

Empirical Bayes (EB) standardisation: the method implemented in the GeoDa program (Anselin, 2005).

Assunção and Reis (1999) adapt Moran’s I statistic to avoid the bias that arises in this situation.
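As an illustration of what an EB-standardised Moran’s I involves, the sketch below standardises raw rates with method-of-moments estimates before computing Moran’s I. It is not taken from the slides: the estimators are the ones commonly given for this kind of standardisation, truncating the variance term at zero is an added assumption, and the event counts, populations and function names are hypothetical.

```python
import numpy as np


def eb_standardise(events, population):
    """Standardise raw rates so that units with small populations are down-weighted."""
    e = np.asarray(events, dtype=float)
    n = np.asarray(population, dtype=float)
    p = e / n                                   # raw rate of each unit
    b = e.sum() / n.sum()                       # overall rate (prior mean)
    s2 = (n * (p - b) ** 2).sum() / n.sum()     # population-weighted variance of rates
    a = max(s2 - b / n.mean(), 0.0)             # prior variance; truncated at 0 (assumption)
    v = a + b / n                               # variance of each rate: larger when n_i is small
    return (p - b) / np.sqrt(v)


def morans_i(x, w):
    """Global Moran's I: (N / S0) * z'Wz / z'z."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)


# Hypothetical data: five units on a line, binary contiguity between neighbours.
events = np.array([5.0, 50.0, 10.0, 30.0, 2.0])
population = np.array([100.0, 1000.0, 1000.0, 1000.0, 100.0])
w = np.zeros((5, 5))
for i in range(4):
    w[i, i + 1] = w[i + 1, i] = 1.0

z_eb = eb_standardise(events, population)
print("raw rates:      ", np.round(events / population, 3))
print("EB-standardised:", np.round(z_eb, 3))
print(f"Moran's I on raw rates = {morans_i(events / population, w):.3f}")
print(f"Moran's I on EB values = {morans_i(z_eb, w):.3f}")
```

Units 1 and 2 have the same raw rate, but the unit with the smaller population receives the smaller standardised value because its rate is less reliable.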