Skip to contents

Introduction

This vignette demonstrates the spatialunitroot R package, which implements the spatial unit root diagnostic tests and transformations of Müller & Watson (2024, Econometrica). The package is an R port of the SPUR Stata package by Becker, Boll & Voth (2025, Stata Journal).

Spatial data often exhibit strong dependence across locations. Just as time series with unit roots can produce spurious regressions, spatial processes with (near) unit roots can lead to misleading inference. This package provides tools to:

  1. Diagnose spatial unit roots using I(1) and I(0) tests
  2. Remove spatial unit roots via spatial differencing transformations
  3. Characterize spatial persistence via half-life confidence intervals

Throughout, we replicate the numbers published in Appendix A (Table A.1) of Becker, Boll & Voth (2025), which reproduce the Chetty et al. (2014) results of Müller & Watson (2024). The dataset used there ships with this package, so the published table can be reproduced directly.

The Chetty et al. (2014) Commuting Zone Data

We use the dataset from Chetty, Hendren, Kline & Saez (2014), which contains measures of intergenerational mobility and its correlates across 722 US commuting zones. This is the same data the Stata package uses in its Appendix A.

data(chetty)
str(chetty[, 1:10], vec.len = 2)
## Classes 'tbl_df', 'tbl' and 'data.frame':    722 obs. of  10 variables:
##  $ state    : chr  "TN" "TN" ...
##  $ cz       : num  100 200 301 302 401 ...
##  $ czname   : chr  "Johnson City" "Morristown" ...
##  $ s_1      : num  36.5 36 ...
##  $ s_2      : num  -82.4 -83.4 ...
##  $ am       : num  38.4 37.8 ...
##  $ fracblack: num  0.0208 0.0198 ...
##  $ racseg   : num  0.0904 0.0932 ...
##  $ segpov25 : num  0.0302 0.0279 ...
##  $ fraccom15: num  0.325 0.276 ...

Spatial coordinates are stored in s_1 (latitude) and s_2 (longitude), following the convention of Müller & Watson’s scpc package. When latlong = TRUE, distances are computed as great-circle distances on the sphere.

Diagnosing Spatial Unit Roots

We illustrate the procedure using absolute upward mobility (am), the key outcome variable in Chetty et al. (2014) and the first row of Table A.1.

I(1) Test: Is there a spatial unit root?

The I(1) test has the null hypothesis of a spatial unit root. A high p-value means we cannot reject the presence of a unit root.

set.seed(42)
t_i1 <- spurtest(am ~ 1, data = chetty, coords = ~ s_1 + s_2,
                 type = "i1", latlong = TRUE, q = 15, nrep = 10000)
summary(t_i1)
## 
## Spatial I(1) Test Summary
## =======================================
## 
## Test Statistic :  5.75884 
## P-value        :  0.3874 
## HA Parameter   :  9.238 
## 
## Critical Values:
##   1%  :   9.7841
##   5%  :   8.5872
##   10% :   7.8594
## 
## Test Configuration:
##   Type        :  i1 
##   q (components) :  15 
##   MC replications:  10000

With a p-value of 0.39, we fail to reject the spatial unit root null. Table A.1 of Becker, Boll & Voth (2025) reports an I(1) p-value of 0.38 for absolute mobility — the small difference is Monte Carlo simulation noise.

I(0) Test: Is the variable spatially stationary?

The I(0) test has the null hypothesis of no spatial unit root (spatial stationarity). A low p-value indicates strong evidence against spatial stationarity.

set.seed(42)
t_i0 <- spurtest(am ~ 1, data = chetty, coords = ~ s_1 + s_2,
                 type = "i0", latlong = TRUE, q = 15, nrep = 10000)
summary(t_i0)
## 
## Spatial I(0) Test Summary
## =======================================
## 
## Test Statistic :  3.04364 
## P-value        :  0.0011 
## HA Parameter   :  41.56 
## 
## Critical Values:
##   1%  :   2.5225
##   5%  :   2.1020
##   10% :   1.9393
## 
## Test Configuration:
##   Type        :  i0 
##   q (components) :  15 
##   MC replications:  10000

The p-value is essentially zero, so we strongly reject the null that am is I(0). This matches Table A.1, which reports an I(0) p-value of 0.00 for absolute mobility. Together, the I(0) and I(1) tests provide strong evidence that am has a spatial unit root.

Spatial Half-Life

The half-life is the distance at which spatial correlation drops to one-half. We report it in normalized units (fractions of the maximum pairwise distance), as in Table A.1.

set.seed(42)
hl <- spurhalflife(am ~ 1, data = chetty, coords = ~ s_1 + s_2,
                   latlong = TRUE, normdist = TRUE, q = 15, nrep = 10000)
summary(hl)
## 
## Spatial Half-Life Confidence Interval Summary
## =========================================
## 
## Confidence level :  95 %
## CI Lower bound   :  0.0918182 
## CI Upper bound   : inf
## Max distance     :  4549672 
## Normalized units :  TRUE 
## 
## Configuration:
##   q (components)  :  15 
##   MC replications :  10000

The half-life confidence interval for am is approximately [0.09, ∞] in fractions of the maximum pairwise distance, matching the published interval of [0.09, ∞]. The unbounded upper bound is a hallmark of a (near) unit root.

Reproducing Table A.1

We now assess spatial persistence across a representative set of the variables in Table A.1, comparing the R output to the published Stata results. For each variable we report the I(1) and I(0) test p-values and the half-life confidence interval, with the value from Becker, Boll & Voth (2025) in parentheses.

# Variable -> published Table A.1 values: c(I1, I0, HL_lower, HL_upper)
paper <- list(
  am        = c(0.38, 0.00, 0.09, Inf),
  fracblack = c(0.11, 0.01, 0.04, Inf),
  racseg    = c(0.01, 0.13, 0.00, 0.28),
  segpov25  = c(0.28, 0.03, 0.05, Inf),
  fraccom15 = c(0.57, 0.00, 0.14, Inf),
  hipc      = c(0.13, 0.14, 0.02, Inf),
  gini      = c(0.78, 0.00, 0.26, Inf),
  tsr       = c(0.23, 0.13, 0.05, Inf),
  hsdrop    = c(0.09, 0.02, 0.03, Inf),
  scind     = c(0.72, 0.00, 0.22, Inf),
  manshare  = c(0.20, 0.00, 0.06, Inf),
  tlfpr     = c(0.51, 0.00, 0.12, Inf),
  fracfor   = c(0.56, 0.04, 0.17, Inf)
)

fmt_inf <- function(x) if (is.infinite(x)) "∞" else sprintf("%.2f", x)

results <- do.call(rbind, lapply(names(paper), function(v) {
  set.seed(42)
  t1 <- spurtest(as.formula(paste(v, "~ 1")), data = chetty,
                 coords = ~ s_1 + s_2, type = "i1",
                 latlong = TRUE, q = 15, nrep = 5000)
  set.seed(42)
  t0 <- spurtest(as.formula(paste(v, "~ 1")), data = chetty,
                 coords = ~ s_1 + s_2, type = "i0",
                 latlong = TRUE, q = 15, nrep = 5000)
  set.seed(42)
  hl <- spurhalflife(as.formula(paste(v, "~ 1")), data = chetty,
                     coords = ~ s_1 + s_2, latlong = TRUE,
                     normdist = TRUE, q = 15, nrep = 5000)
  p <- paper[[v]]
  data.frame(
    Variable    = v,
    `I(1) p`    = sprintf("%.2f (%.2f)", t1$p_value, p[1]),
    `I(0) p`    = sprintf("%.2f (%.2f)", t0$p_value, p[2]),
    `HL lower`  = sprintf("%.2f (%.2f)", hl$ci_lower, p[3]),
    `HL upper`  = sprintf("%s (%s)", fmt_inf(hl$ci_upper), fmt_inf(p[4])),
    check.names = FALSE
  )
}))

knitr::kable(
  results,
  caption = "Reproducing Table A.1 of Becker, Boll & Voth (2025). Each cell shows the R result with the published Stata value in parentheses."
)
Reproducing Table A.1 of Becker, Boll & Voth (2025). Each cell shows the R result with the published Stata value in parentheses.
Variable I(1) p I(0) p HL lower HL upper
am 0.39 (0.38) 0.00 (0.00) 0.10 (0.09) ∞ (∞)
fracblack 0.12 (0.11) 0.01 (0.01) 0.04 (0.04) ∞ (∞)
racseg 0.01 (0.01) 0.13 (0.13) 0.00 (0.00) 0.29 (0.28)
segpov25 0.28 (0.28) 0.03 (0.03) 0.05 (0.05) ∞ (∞)
fraccom15 0.56 (0.57) 0.00 (0.00) 0.13 (0.14) ∞ (∞)
hipc 0.14 (0.13) 0.14 (0.14) 0.02 (0.02) ∞ (∞)
gini 0.78 (0.78) 0.00 (0.00) 0.25 (0.26) ∞ (∞)
tsr 0.23 (0.23) 0.13 (0.13) 0.05 (0.05) ∞ (∞)
hsdrop 0.10 (0.09) 0.02 (0.02) 0.04 (0.03) ∞ (∞)
scind 0.71 (0.72) 0.00 (0.00) 0.22 (0.22) ∞ (∞)
manshare 0.21 (0.20) 0.01 (0.00) 0.06 (0.06) ∞ (∞)
tlfpr 0.51 (0.51) 0.00 (0.00) 0.12 (0.12) ∞ (∞)
fracfor 0.55 (0.56) 0.04 (0.04) 0.16 (0.17) ∞ (∞)

The R implementation reproduces the published p-values and half-life intervals to within Monte Carlo error. As Table A.1 shows, the overwhelming majority of the Chetty et al. (2014) variables fail to reject the I(1) null and reject the I(0) null, indicating strong spatial persistence.

Correcting Spurious Regression

A key application is diagnosing and correcting spurious spatial regression. Following the algorithm in Figure 5 of Becker, Boll & Voth (2025): once a variable cannot be rejected as I(1), we difference all variables (dependent and independent) before re-estimating. We illustrate the workflow with a regression of am on fracblack and racseg.

Step 1: Naïve OLS

m_ols <- feols(am ~ fracblack + racseg, data = chetty, se = "standard")
summary(m_ols)
## OLS estimation, Dep. Var.: am
## Observations: 693
## Standard-errors: IID 
##             Estimate Std. Error   t value   Pr(>|t|)    
## (Intercept)  47.8631   0.283542 168.80421  < 2.2e-16 ***
## fracblack   -24.4455   1.367710 -17.87331  < 2.2e-16 ***
## racseg      -13.4511   1.717636  -7.83115 1.8264e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 4.33291   Adj. R2: 0.411517

Step 2: Test residuals for a spatial unit root

set.seed(42)
t_resid <- spurtest(m_ols, coords = ~ s_1 + s_2,
                    type = "i1resid", latlong = TRUE,
                    q = 15, nrep = 5000)
t_resid
## 
## Spatial I(1) Residual Test Results
## ---------------------------------------
## Test Statistic :  1111 
## P-value        :  0.192 
## ---------------------------------------

The residual test fails to reject a spatial unit root in the residuals, so the OLS inference above is not reliable and we proceed to difference all variables.

Step 3: Apply the LBM-GLS transformation

transformed <- spurtransform(~ am + fracblack + racseg, data = chetty,
                              coords = ~ s_1 + s_2,
                              prefix = "h_", latlong = TRUE)

Step 4: Re-estimate on transformed data

m_trans <- feols(h_am ~ h_fracblack + h_racseg, data = transformed,
                 se = "standard")
summary(m_trans)
## OLS estimation, Dep. Var.: h_am
## Observations: 693
## Standard-errors: IID 
##                  Estimate Std. Error       t value   Pr(>|t|)    
## (Intercept) -9.350000e-16   0.999411 -9.350000e-16 1.0000e+00    
## h_fracblack -1.297061e+01   2.095557 -6.189575e+00 1.0348e-09 ***
## h_racseg    -1.151500e+01   1.107396 -1.039827e+01  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 26.3   Adj. R2: 0.222151

The R² drops from 0.413 to 0.224 after removing spatial trends, indicating that much of the apparent relationship was driven by shared spatial patterns — a classic spurious regression result. In practice, one would additionally apply SCPC inference (Müller & Watson 2022, 2023) to the transformed regression to account for any remaining weak spatial correlation.

Comparison with the Stata SPUR Package

The R implementation reproduces the Stata SPUR package (Becker, Boll & Voth, 2025). The table above replicates Table A.1 across variables; the key checks for absolute mobility (am) are:

Statistic Stata (Table A.1) R (spatialunitroot)
I(1) test p-value 0.38 0.39
I(0) test p-value 0.00 0
Half-life 95% CI (norm.) [0.09, ∞] [0.09, ∞]

Minor numerical differences arise from Monte Carlo simulation noise (of order 1/nrep1/\sqrt{n_{rep}}) and from differences between the R and Stata random number generators, but the qualitative conclusions — and the reported values to two decimal places — are identical.

References

  • Becker, S. O., Boll, P. D., & Voth, H.-J. (2025). “Testing and Correcting for Spatial Unit Roots in Regression Analysis.” Stata Journal.
  • Chetty, R., Hendren, N., Kline, P., & Saez, E. (2014). “Where is the Land of Opportunity?” Quarterly Journal of Economics, 129(4), 1553–1623.
  • Müller, U. K. & Watson, M. W. (2024). “Spatial Unit Roots and Spurious Regression.” Econometrica, 92, 1661–1695.