2 Computing Distance to the Roman Empire Border at the Wijk Level

2.1 Introduction

This notebook constructs a running variable for a regression discontinuity design (RDD) based on the historical border of the Roman Empire in the Netherlands.

The idea is straightforward: Dutch neighbourhoods (wijken) that were once inside the Roman Empire may differ systematically from those just outside it. To study this, we need to know — for each neighbourhood — how far it is from the Roman border, and whether it falls inside or outside.

We will:

Load the Roman Empire boundary and Dutch neighbourhood data.
Label each neighbourhood as inside or outside the Roman Empire.
Crop the Roman border to the relevant section running through the Netherlands.
Compute the distance from each neighbourhood centroid to that border.
Build a signed running variable (positive = inside, negative = outside).
Export the result as a shapefile.

2.2 Step 1: Load packages and spatial data

We use:

sf — for all spatial operations.
tidyverse — for data manipulation.
cawd — contains historical boundary data, including the Roman Empire at 117 AD.
cbsodataR — provides direct access to CBS (Statistics Netherlands) geographic data.

library(sf)

Linking to GEOS 3.14.1, GDAL 3.11.5, PROJ 9.6.2; sf_use_s2() is TRUE

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.6.0
✔ ggplot2   3.5.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(cawd)
library(cbsodataR)

We load the Roman Empire boundary and Dutch neighbourhood (wijk) data, making sure both use the same coordinate reference system (CRS).

roman_empire <- cawd::awmc.roman.empire.117.sp |> st_as_sf()

Loading required namespace: sp

crs <- st_crs(roman_empire)

wijk <- cbsodataR::cbs_get_sf("wijk", 2023) |> st_transform(crs)

ggplot(roman_empire) + geom_sf()

ggplot(wijk) + geom_sf()

2.3 Step 2: Label neighbourhoods as inside or outside the Roman Empire

We use st_intersects() to check which neighbourhood centroids fall within the Roman Empire polygon. The result tells us the row indices of neighbourhoods that intersect — we use those to create a binary indicator in_roman_empire.

numbers <- st_intersects(st_centroid(wijk), roman_empire) |>
  as.data.frame() |>
  select(row.id) |>
  pull()

Warning: st_centroid assumes attributes are constant over geometries

wijk <- wijk |>
  mutate(in_roman_empire = as.factor(if_else(is.element(row_number(), numbers), 1, 0)))

2.4 Step 3: Extract the Roman border running through the Netherlands

The full Roman Empire boundary covers a huge area. We only want the section that cuts through the Netherlands (the limes).

First, we extract the outer boundary of all neighbourhoods classified as inside the Roman Empire. Then we define a small polygon covering the relevant area and use st_intersection() to clip the border to that region.

# Dissolve all "inside" neighbourhoods and extract their outer boundary
boundary <- wijk |>
  filter(in_roman_empire == 1) |>
  st_union() |>
  st_boundary()

# Define a polygon covering the Dutch section of the Roman border
point1 <- c(4.5, 52.5)
point2 <- c(4.5, 53)
point3 <- c(6.4, 53)
point4 <- c(6.4, 51.75)
point5 <- c(5.5, 51.75)
point6 <- c(4.5, 52.5)

coords <- list(rbind(point1, point2, point3, point4, point5, point6))
poly_coords <- st_polygon(coords)
poly_c <- st_sfc(poly_coords, crs = st_crs(crs))
poly_l <- st_sf(poly_c) |> rename(geometry = poly_c)

# Clip the boundary to this region
border <- st_intersection(boundary, poly_l)

border |> ggplot() + geom_sf()

2.5 Step 4: Compute distances from each neighbourhood to the border

st_distance() returns a matrix of distances between every neighbourhood centroid and every segment of the border. We take the minimum across columns to get each neighbourhood’s distance to the nearest point on the border.

Distances from st_distance() are in metres.

distances <- st_distance(st_centroid(wijk), border) |>
  apply(1, min)

Warning: st_centroid assumes attributes are constant over geometries

wijk <- wijk |>
  mutate(distance_to_border = distances)

2.6 Step 5: Build the signed running variable

For an RDD, we need a single running variable that is:

Positive for neighbourhoods inside the Roman Empire.
Negative for neighbourhoods outside.

We convert from metres to kilometres by dividing by 1000.

wijk <- wijk |>
  mutate(running = if_else(in_roman_empire == 1,
                           distance_to_border / 1000,
                           -distance_to_border / 1000))

wijk

Simple feature collection with 3352 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 3.358373 ymin: 50.75136 xmax: 7.227497 ymax: 53.55358
Geodetic CRS:  +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
# A tibble: 3,352 × 6
   statcode statnaam                                    geometry in_roman_empire
 * <chr>    <chr>                             <MULTIPOLYGON [°]> <fct>          
 1 WK001400 Centrum        (((6.57088 53.2272, 6.560231 53.2276… 0              
 2 WK001401 Oud-Zuid       (((6.579803 53.20758, 6.580139 53.21… 0              
 3 WK001402 Oud-West       (((6.555863 53.21402, 6.560231 53.22… 0              
 4 WK001403 Oud-Noord      (((6.57088 53.2272, 6.584517 53.2334… 0              
 5 WK001404 Oosterparkwijk (((6.580139 53.21705, 6.598265 53.22… 0              
 6 WK001405 Zuidoost       (((6.662155 53.19982, 6.63494 53.202… 0              
 7 WK001406 Helpman e.o.   (((6.575788 53.18293, 6.603272 53.19… 0              
 8 WK001407 Zuidwest       (((6.575788 53.18293, 6.562998 53.19… 0              
 9 WK001408 Hoogkerk e.o.  (((6.532016 53.20781, 6.519302 53.20… 0              
10 WK001409 Nieuw-West     (((6.542291 53.22096, 6.518617 53.25… 0              
# ℹ 3,342 more rows
# ℹ 2 more variables: distance_to_border <dbl>, running <dbl>

2.7 Step 6: Export the result

We save the enriched neighbourhood data (with in_roman_empire, distance_to_border, and running) as a shapefile for use in downstream analyses.

sf::write_sf(wijk, "distance_to_border_wijk.shp")