# Transportation Geography and Network Science/Spatial Econometrics

## Introduction

The need to analyze spatial data samples whose observations are interdependent across regions motivated researchers to develop spatial econometrics. The field combines econometric methods and spatial analysis to investigate spatial autocorrelation, or neighborhood effects, among observed variables. Spatial econometrics differs from traditional econometrics in three major features: spatial dependency, spatial heterogeneity, and spatial heteroscedasticity. Spatial dependency occurs when positive or negative correlation is observed between characteristics at nearby locations. Spatial heterogeneity refers to differences across space in the relationships between dependent and independent variables, while spatial heteroscedasticity refers to heterogeneity in the variance of the unobserved component among spatial units in a study region. Because these characteristics violate the Gauss-Markov assumptions of uncorrelated and homoscedastic error terms, traditional econometric techniques have ignored them. For instance, the Gauss-Markov assumptions overlook changes in variance across the spatial data sample and posit a single linear relationship with constant variance. Over the past fifty years, a host of studies have attempted to bridge this gap using multivariate methods in spatial analysis, including spatial autocorrelation, spatial interpolation, and spatial interaction. A brief definition of each is provided below.

### Spatial Autocorrelation

Spatial autocorrelation statistics, including Moran's $I$, Geary's $C$, and Hot Spot Analysis (Getis's $G$), evaluate the dependency among observations of spatial units. A spatial weights matrix, which specifies the relationship among spatial observations, is defined in advance to explore the spatial autocorrelation of the data. Positive spatial autocorrelation indicates similarity between values at neighboring locations, whereas negative spatial autocorrelation indicates dissimilarity.
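As a rough illustration, global Moran's $I$ can be computed directly from a vector of observations and a spatial weights matrix. The sketch below assumes the standard definition $I = \frac{n}{S_0}\frac{\sum_{ij} w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z$ holds deviations from the mean and $S_0$ is the sum of all weights. The four-region layout and the function name `morans_i` are hypothetical and serve only as an example.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for a 1-D array of values and an n x n weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()          # deviations from the mean
    s0 = w.sum()              # sum of all weights
    cross = z @ w @ z         # sum over i, j of w_ij * z_i * z_j
    return (n / s0) * (cross / (z @ z))

# Hypothetical example: four regions in a row, each adjacent to the next one.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

clustered = np.array([1.0, 2.0, 3.0, 4.0])     # neighbors hold similar values
alternating = np.array([1.0, 4.0, 1.0, 4.0])   # neighbors hold dissimilar values

i_clustered = morans_i(clustered, W)
i_alternating = morans_i(alternating, W)
print(i_clustered, i_alternating)
```

In this toy layout the smoothly increasing series yields a positive statistic and the alternating series a negative one, which matches the usual reading of Moran's $I$.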

### Spatial Interpolation

The spatial interpolation method estimates variables at unobserved locations from the values of variables at observed locations. Inverse distance weighting and Kriging are commonly applied for this purpose. The former assumes that the influence of an observed location on an unobserved location diminishes continuously as the distance between them increases. The latter, on the other hand, interpolates values at unobserved locations by exploring both the systematic and the random spatial lag among spatial units.
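Inverse distance weighting can be sketched in a few lines: each estimate is a weighted average of the observed values, with weights proportional to $1/d^{p}$. The exponent `power=2.0`, the station coordinates, and the function name below are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def idw_interpolate(known_xy, known_values, query_xy, power=2.0):
    """Inverse distance weighted estimates at query points (a minimal sketch)."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Pairwise Euclidean distances with shape (n_query, n_known).
    d = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=2)
    estimates = np.empty(len(query_xy))
    for i, row in enumerate(d):
        hit = row == 0.0
        if hit.any():
            # The query point coincides with an observed location.
            estimates[i] = known_values[hit][0]
        else:
            w = 1.0 / row ** power   # weights shrink as distance grows
            estimates[i] = np.sum(w * known_values) / np.sum(w)
    return estimates

# Hypothetical data: two observed stations on a line.
stations = [(0.0, 0.0), (2.0, 0.0)]
observed = [10.0, 20.0]
queries = [(1.0, 0.0), (0.0, 0.0), (0.5, 0.0)]
est = idw_interpolate(stations, observed, queries)
print(est)
```

The midpoint receives the simple average of the two stations, while the point closer to the first station is pulled toward that station's value.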

### Spatial Interaction

Spatial interaction models, also known as gravity models, are among the primary tools of spatial analysis. A gravity function contains the attractiveness, number of commuters, and opportunities in spatial units, along with the proximity relationships between locations. The function parameters are then estimated by computational methods such as ordinary least squares, maximum likelihood, and artificial neural networks.
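One common way to estimate a gravity function by ordinary least squares is to take logarithms. The sketch below assumes the multiplicative form $T_{ij} = k\,O_i^{a} D_j^{b} / d_{ij}^{c}$, which is one of several possible specifications and is not prescribed by the text. The zone data are synthetic and noise-free, and all names are hypothetical.

```python
import numpy as np

def fit_gravity_ols(flows, origin_mass, dest_mass, distance):
    """Fit ln T = ln k + a ln O + b ln D - c ln d by least squares."""
    X = np.column_stack([
        np.ones(flows.size),
        np.log(origin_mass),
        np.log(dest_mass),
        np.log(distance),
    ])
    coef, *_ = np.linalg.lstsq(X, np.log(flows), rcond=None)
    return {"k": np.exp(coef[0]), "a": coef[1], "b": coef[2], "c": -coef[3]}

# Synthetic zones (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 6
commuters = rng.uniform(100, 1000, n)        # trips produced by each zone
opportunities = rng.uniform(100, 1000, n)    # attractiveness of each zone
coords = rng.uniform(0, 10, (n, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

i, j = np.where(~np.eye(n, dtype=bool))      # all origin-destination pairs, i != j
flows = 0.5 * commuters[i] ** 1.0 * opportunities[j] ** 0.8 / dist[i, j] ** 2.0

params = fit_gravity_ols(flows, commuters[i], opportunities[j], dist[i, j])
print(params)
```

Because the synthetic flows contain no noise, the fit recovers the generating parameters up to floating-point error. With real flow data, zero flows and heteroscedastic errors are typical reasons to prefer the maximum likelihood alternatives mentioned above.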

The characteristics of spatial analysis and the generic methods described above give the reader a general framework for spatial econometrics analysis. As discussed in detail in this article, however, quantifying the locational aspects of the data is a cornerstone of the analysis of spatial dependency, heterogeneity, and heteroscedasticity. The remainder of this article unfolds as follows. The next section discusses how the locational aspects of the data are quantified. The history of spatial econometrics is then reviewed, followed by the methods and models developed in this arena. The article ends with a review of spatial econometrics applications in transportation, along with a conclusion and possible directions for future studies.

## Quantifying location in models

Spatial data observations are usually mapped onto a Cartesian space to show location and contiguity information. Location information is represented by latitude and longitude, from which the distance between observations can be calculated for any point in space. Observations that are near each other reflect a greater degree of spatial dependency, and spatial dependency declines as the distance between observations increases. Moreover, (dis)similarity between observations that changes over space implies spatial heterogeneity in the data. Contiguity information, on the other hand, represents the relative position in space of the regional units of observation, which reflects the concept of neighboring units. Spatial dependency is observed in the data if and only if the dependency among neighboring units diminishes as one moves to units located farther apart. The similarity between neighboring units also signifies spatial heterogeneity. Two examples, quantifying spatial contiguity and quantifying a weighted spatial neighborhood matrix, are provided below to shed light on the application of these concepts.

### Quantifying Spatial Contiguity

To capture the notion of spatial contiguity between spatial units, consider the hypothetical regions shown in Figure 1. Contiguity relations for each region are generally recorded in a symmetric binary matrix. Each element represents the contiguity relationship between two regions. For instance, the matrix element in row 1 and column 3 stands for the contiguity relationship between regions 1 and 3. In a binary configuration, a matrix element is either 0 or 1, which records the absence or presence of a contiguity relationship between regions. A number of methods have been recommended for constructing a contiguity matrix, including linear contiguity, rook contiguity, bishop contiguity, and queen contiguity. Kelejian and Robinson (1995) discuss each in detail. In brief, the elements under each configuration are defined as follows.

Linear contiguity: the element representing contiguity between two regions equals 1 if and only if the regions share a common edge on the right or left side.

Rook contiguity: the element representing contiguity between two regions equals 1 if and only if the regions share a common side.

Bishop contiguity: the element representing contiguity between two regions equals 1 if and only if the regions share a common vertex.

Queen contiguity: the element representing contiguity between two regions equals 1 if and only if the regions share a common side or vertex.

It should also be kept in mind that the length of shared borders plays a pivotal role in determining the elements of a contiguity matrix. A common border between two entities might be short or long, which results in low or high contiguity, respectively. Overall, the contiguity matrix reflects the spatial dependency in the data by measuring the average influence of neighboring observations on the observations in the vector. The efficacy of the matrix in spatial analysis methods is discussed in the methods and models section.
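The four contiguity rules can be sketched for regions arranged on a regular grid, where the definitions reduce to fixed row and column offsets. This is a simplified illustration: the regions are assumed to be grid cells numbered row by row, the bishop rule is taken to mean cells that touch only at a corner, and border length is ignored. The function name is hypothetical.

```python
import numpy as np

def grid_contiguity(n_rows, n_cols, rule="rook"):
    """Binary contiguity matrix for cells of a regular grid, numbered row by row."""
    offsets = {
        "linear": [(0, -1), (0, 1)],                     # left and right only
        "rook": [(-1, 0), (1, 0), (0, -1), (0, 1)],      # shared side
        "bishop": [(-1, -1), (-1, 1), (1, -1), (1, 1)],  # shared corner only
    }
    offsets["queen"] = offsets["rook"] + offsets["bishop"]  # side or corner
    n = n_rows * n_cols
    C = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in offsets[rule]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    C[r * n_cols + c, rr * n_cols + cc] = 1
    return C

linear = grid_contiguity(3, 3, "linear")
rook = grid_contiguity(3, 3, "rook")
bishop = grid_contiguity(3, 3, "bishop")
queen = grid_contiguity(3, 3, "queen")

# Number of neighbors of the center cell (index 4) under each rule.
print(linear[4].sum(), rook[4].sum(), bishop[4].sum(), queen[4].sum())
```

For irregular regions the same rules apply to polygon boundaries rather than grid offsets, which is typically handled by a GIS or a spatial analysis library.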

### Quantifying Weighted Spatial Neighborhood Matrix

The spatial weight matrix is a cornerstone of spatial econometrics and spatial analysis. A spatial weight matrix reflects the possible relations between spatial units. Each non-negative element represents the spatial influence between two entities, and no entity influences itself. In other words, the diagonal elements are equal to zero, while the other elements of the symmetric weight matrix are measured by a number of methods. It remains unclear, however, which configuration performs better than the others. Hence, practitioners generally recommend selecting the conceptualization that best reflects the actual interaction between the spatial units in the data. For instance, some form of inverse distance is probably most appropriate when measuring seed-propagating trees in a forest. A travel time or travel cost conceptualization, on the other hand, might fit better when exploring the geographic distribution of commuters among traffic analysis zones. From the quantification perspective, spatial weight matrix configurations fall into two categories: binary patterns and variable weighting models. Each category encompasses a wide spectrum of methods. Contiguity matrices, fixed distance, Delaunay triangulation, and space-time windows are commonly employed to create a binary configuration, while inverse distance and zone of indifference methods are used to build a variable weighting matrix.
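The difference between a binary pattern and a variable weighting model can be illustrated with two distance-based constructions. The sketch below builds a fixed-distance (binary) matrix and an inverse-distance (variable) matrix, both symmetric with a zero diagonal as described above. The zone coordinates, the threshold, and the exponent are assumed values, and the points are assumed to be distinct.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of points."""
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

def fixed_distance_weights(coords, threshold):
    """Binary weights: 1 if two units lie within the threshold, else 0."""
    d = pairwise_distances(coords)
    W = (d <= threshold).astype(float)
    np.fill_diagonal(W, 0.0)          # no self-influence
    return W

def inverse_distance_weights(coords, power=1.0):
    """Variable weights: 1 / d**power off the diagonal, 0 on the diagonal."""
    d = pairwise_distances(coords)
    W = np.zeros_like(d)
    off_diagonal = ~np.eye(len(d), dtype=bool)
    W[off_diagonal] = 1.0 / d[off_diagonal] ** power
    return W

# Hypothetical zone centroids on a line.
zones = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
W_band = fixed_distance_weights(zones, threshold=1.5)
W_inv = inverse_distance_weights(zones, power=2.0)
print(W_band)
print(W_inv)
```

A travel time or travel cost conceptualization would follow the same pattern, with the distance matrix replaced by a matrix of network impedances.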

## Spatial Econometrics History: A Life-Cycle Perspective

The term "Spatial Econometrics" dates back to 1974, when Jean Paelinck argued for the need to provide a methodological foundation for urban econometric models at the Annual Meeting of the Dutch Statistical Association in Tilburg. Following that, Jean Paelinck and Leo Klaassen published an article entitled Spatial Econometrics in 1979.

The history of spatial econometrics can be divided into three major phases, namely birth, growth, and maturity, as shown in Figure 2. The birth stage began in 1970 and lasted until the late 1980s. The basic concepts, including spatial econometrics, estimation methods, spatial statistics, and spatial data analysis, were introduced in this era and became the cornerstone of the growth stage. Two important developments in geography and regional science provided further impetus for the evolution of the birth phase. One was the quantitative revolution in geography, which coincided with the introduction of spatial analysis by Berry and Marble's book "Spatial Analysis: A Reader in Statistical Geography" in 1968. The other was that spatial effects began to take root in the operational models of regional science and of regional and urban economics. Granger in 1969 and Fisher in 1971 made efforts to introduce spatial methods, and the results of these attempts were published as Econometric Estimation with Spatial Dependence in the applied economics literature.

The growth era witnessed an influx of regional scientists and geographers who were interested in spatial regression questions, such as Brunsdon, Boots, Tiefelsdorf, and Fotheringham. Following that, a new generation, including students of scholars active in the birth phase, strove continuously to expand the field of spatial econometrics. For instance, Case in 1991 conducted a study to explore the spatial relationships in Indonesian demand for rice by applying a spatial random effects model. McMillen in 1992 further endeavored to remedy the inconsistency of Probit under spatial autocorrelation, which is deeply rooted in heteroskedastic errors, by employing weighted least squares and maximum likelihood estimators. Since then, a wide range of studies have flourished in public economics, urban economics, real estate economics, and development. Furthermore, spatial econometrics studies contributed to the literature on model specification, estimation, and testing issues. Kelejian and Robinson in 1995, for instance, developed the spatial error components method, and Anselin and Kelejian in 1997 applied Moran's I to two-stage least squares regression. At the same time, geographically weighted regression, a pivotal step in spatial econometrics models, was developed by Fotheringham between 1997 and 1999 to capture spatial heterogeneity. A further distinctive characteristic of the growth era was the revolution in statistical software, which eased computation for spatial econometrics. Estimation of spatial regression models became practical with the release of SpaceStat by NCGIA in 1992. S+SpatialStats and the Matlab toolboxes are other commercial packages that were introduced contemporaneously. Anecdotal evidence shows that spatial econometrics had gained impressive acceptance among researchers and practitioners by the early 21st century and moved into the maturity phase.

The publication of numerous articles, special issues, and handbook chapters, the evolution of software and commercial packages, and growing job opportunities and research funding are good indications that a maturity era has begun. The notion of spatial econometrics also continues to permeate crime analysis, epidemiology, and public health. Several textbooks, including "Spatial Data Analysis: Theory and Practice", "Testing panel data regression models with spatial error correlation", "Spatial Analysis: A Guide for Ecologists", and "Bayesian Disease Mapping, Hierarchical Modeling in Spatial Epidemiology", have become available in this phase. Modeling has also entered a stage of maturity with the introduction of state-of-the-art methods, namely spatial panel models, models for spatial latent variables, and models for flows. Specification testing, computational aspects, and software have received considerably more attention than in the growth phase. Lagrange Multiplier tests, for example, have been extended remarkably to cover multiple sources of misspecification, different types of spatial error correlation, and model selection strategies.

## Methods and Models

Spatial econometrics models have attracted notable attention over the past decades. From the modeling perspective, most empirical studies apply a general approach to test for spatial interaction effects. Manski in 1993 introduced a model to explore three different interaction effects among spatial observations, namely endogenous interaction effects, exogenous interaction effects, and correlated effects. Endogenous interaction effects arise when the decision of a spatial unit depends on the decisions of other spatial units. Exogenous interaction effects arise when the decision of a spatial unit depends on the independent explanatory variables of other spatial units. Correlated effects, as discussed before, are also captured in models when similar unobserved spatial characteristics result in similar behavior. The spatial weight matrix is denoted by $W$. Manski's general model is formulated in equation 1, where $WY$ stands for the endogenous interaction effects, $WX$ for the exogenous interaction effects, and $Wu$ for the interaction effects among the disturbance terms of spatial units. Further, $\rho$ is called the spatial autoregressive coefficient, $\lambda$ the spatial autocorrelation coefficient, $\gamma$ is the constant term associated with the $N \times 1$ vector of ones $l_{N}$, and both $\beta$ and $\theta$ are vectors of fixed but unknown parameters.

$Y=\rho {WY}+\gamma {l_{N}}+\beta {X}+\theta {WX}+(\lambda {Wu}+\epsilon )$ (1)

Manski's equation collapses to the Kelejian-Prucha model when $\theta$ equals zero and to the spatial Durbin model when $\lambda$ equals zero. In other words, the Kelejian-Prucha model considers a spatially lagged dependent variable and a spatially autocorrelated error term, while the spatial Durbin model, which was introduced by Anselin in 1988, considers a spatially lagged dependent variable and spatially lagged independent variables. Applying further constraints to the general model yields two primary models, 1) the spatial lag model and 2) the spatial error model, which are formulated in equations 2 and 3, respectively.

$Y=\rho {WY}+\gamma {l_{N}}+\beta {X}+\epsilon$ (2)

$Y=\gamma {l_{N}}+\beta {X}+(\lambda {Wu}+\epsilon )$ (3)

Looking at the primary models, setting $\rho$ in the spatial lag model or $\lambda$ in the spatial error model to zero yields the simple regression model. Hence, to test for spatial interaction effects, many scholars start with a simple ordinary least squares model, which is known as the specific-to-general approach.
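To make the structure of the spatial lag model in equation 2 concrete, the following sketch simulates data from it through the reduced form $Y=(I-\rho W)^{-1}(\gamma l_{N}+\beta X+\epsilon)$, obtained by moving $\rho WY$ to the left-hand side. The ring-shaped, row-standardized $W$, the parameter values, and the function names are illustrative assumptions and do not come from the text.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized W for n units on a ring, two neighbors each (assumed layout)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def simulate_spatial_lag(W, X, beta, rho, gamma, eps):
    """Solve (I - rho W) Y = gamma * 1 + X beta + eps for Y."""
    n = W.shape[0]
    rhs = gamma * np.ones(n) + X @ beta + eps
    return np.linalg.solve(np.eye(n) - rho * W, rhs)

rng = np.random.default_rng(42)
n = 50
W = ring_weights(n)
X = rng.normal(size=(n, 2))
beta = np.array([1.5, -0.5])
eps = rng.normal(scale=0.1, size=n)

y_lag = simulate_spatial_lag(W, X, beta, rho=0.4, gamma=2.0, eps=eps)
y_plain = simulate_spatial_lag(W, X, beta, rho=0.0, gamma=2.0, eps=eps)

# The simulated Y satisfies the structural form of equation 2.
lag_ok = np.allclose(y_lag, 0.4 * (W @ y_lag) + 2.0 + X @ beta + eps)
# With rho = 0 the model reduces to the simple regression model.
plain_ok = np.allclose(y_plain, 2.0 + X @ beta + eps)
print(lag_ok, plain_ok)
```

The second check mirrors the nesting argument: once the spatial coefficient is set to zero, the weights matrix drops out and only the ordinary regression remains.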

From the estimation perspective, three major methods are widely applied by researchers and practitioners: maximum likelihood, generalized method of moments, and Bayesian Markov Chain Monte Carlo. The pros and cons of these methods are broadly debated in the literature. The generalized method of moments, for instance, does not rely on the assumption of normality of the disturbances and eases the computational difficulties of the other methods. The Jacobian term, however, is ignored in this method, which makes it possible to end up with coefficient estimates outside their parameter space. Nevertheless, Fingleton and Le Gallo in 2008 demonstrated that the generalized method of moments is valuable where linear spatial models contain one or more endogenous explanatory variables. Models containing a spatial lag are simply estimated by two-stage least squares, while the Bayesian Markov Chain Monte Carlo and maximum likelihood methods are extremely useful for estimating the spatial error model and the spatial Durbin error model. The Bayesian Markov Chain Monte Carlo method also incorporates regional science ideas as subjective prior information, including 1) a decay of sample data influence with distance, 2) similarity of observations to neighboring observations, 3) a hierarchy of places or regions, and 4) systematic change in parameters with movement through space.
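The role of the Jacobian term in maximum likelihood estimation can be illustrated for the spatial lag model. In the sketch below, $\beta$ and the error variance are concentrated out for each candidate $\rho$, and the remaining one-dimensional log-likelihood, which includes $\ln|I-\rho W|$, is maximized by a grid search. This is a textbook-style simplification under assumed normal errors, not the procedure of any particular software package. The ring-shaped $W$, the synthetic data, and all names are hypothetical.

```python
import numpy as np

def concentrated_loglik(rho, y, X, W):
    """Concentrated log-likelihood of the spatial lag model, up to a constant."""
    n = y.size
    A = np.eye(n) - rho * W
    sign, logdet = np.linalg.slogdet(A)     # Jacobian term: ln|I - rho W|
    if sign <= 0:
        return -np.inf                      # rho outside the admissible range
    Ay = A @ y
    coef, *_ = np.linalg.lstsq(X, Ay, rcond=None)
    resid = Ay - X @ coef
    sigma2 = resid @ resid / n
    return logdet - 0.5 * n * np.log(sigma2)

def fit_spatial_lag(y, X, W, grid=None):
    """Grid search over rho, then recover the regression coefficients."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)
    ll = [concentrated_loglik(r, y, X, W) for r in grid]
    rho_hat = float(grid[int(np.argmax(ll))])
    coef_hat, *_ = np.linalg.lstsq(X, y - rho_hat * (W @ y), rcond=None)
    return rho_hat, coef_hat

# Synthetic data from a spatial lag model on a ring (assumed setup).
rng = np.random.default_rng(0)
n = 400
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # constant plus one regressor
true_rho, true_coef = 0.6, np.array([1.0, 2.0])
y = np.linalg.solve(np.eye(n) - true_rho * W, X @ true_coef + rng.normal(size=n))

rho_hat, coef_hat = fit_spatial_lag(y, X, W)
print(rho_hat, coef_hat)
```

Dropping the `logdet` term turns the criterion into least squares on $Y-\rho WY$, which no longer penalizes values of $\rho$ near the boundary of the parameter space.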

Although significant advances in estimation methods and spatial econometrics modeling have been made over the past 25 years, the literature on estimating the spatial weights matrix ($W$) remains scant. One of the major weaknesses of current spatial econometric models is that the spatial weights matrix must be determined in advance. In practice, two general configurations are employed: a matrix whose off-diagonal elements are measured by ${\frac {1}{d^{2}}}$ and a matrix whose elements are measured by $e^{-2d}$, in which $d$ stands for the distance between two units. Further, the current literature proposes a number of statistical tests, such as goodness of fit, to select the best matrix structure to the extent possible.

## Applications

This part of the article takes a quick look at studies in the transportation arena that have employed spatial econometrics models. Skimming previous research gives a transportation planner a flavor of how spatial econometrics is applied. Possible implementations in the context of transportation are left for discussion in class.

## Conclusions

Over the past decades, researchers in regional science, geography, urban planning, and transportation have taken a growing interest in applying spatial econometrics to various issues. This article provides an overview of the birth, growth, maturity, and application of spatial econometrics and spatial analysis to the extent possible. The main distinction between a general econometrics model and a spatial econometrics model is that the latter is able to capture spatial dependency, spatial heterogeneity, and spatial heteroscedasticity. There is growing evidence that ignoring spatial dependency, spatial heterogeneity, and spatial heteroscedasticity leads to model misspecification in numerous research questions and misleads long-term and short-term policies. The core questions that researchers and practitioners seek to explore in this arena are which spatial units are interdependent, how, and why. Answering these questions has widened the circle of ideas, not only for exploring state-of-the-art methods but also for quantifying the interdependent relationships between spatial units. Only a modest literature, however, examines the interdependency between spatial entities under a specific weight matrix configuration. In other words, research on the spatial weight matrix is still in its infancy, although it has attracted remarkable attention in recent years. As a consequence, the current literature indicates that different hypotheses about spatial dependency result in different weight matrix configurations, including radial distance weights, power distance weights, exponential distance weights, spatial contiguity weights, and shared-boundary weights. It remains unclear, however, which one best fits a given problem.

On the method and model side, a number of methods have been introduced in recent decades, all of which originate from a simple regression or ordinary least squares model. Manski's general model can be regarded as a mature method that contains all types of dependency between spatial entities. On the application side, a mushrooming number of studies have explored the influence of spatial interdependency on various research subjects. However, a significant number of open research ideas remain in other transportation subjects, which are well worth exploring in the future.