7 Synthetic Control: The Economic Impact of German Reunification

7.1 Introduction

The synthetic control method is a way to estimate the causal effect of a policy when only a small number of units are treated (e.g. a handful of states or countries). The idea is to build a weighted combination of untreated units — a synthetic version of each treated unit — that closely matches its pre-treatment trajectory. After the treatment, any divergence between the treated unit and its synthetic counterpart is attributed to the policy.
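To make the idea concrete, the estimator can be written down in its simplest form. The notation below is an assumption introduced for illustration (unit 1 is treated, units 2 to J+1 are donors, T_0 is the last pre-treatment period, and \mathcal{W} is whatever constraint set is placed on the weights), and it matches on the outcome only, whereas the analysis in this notebook also matches on additional features.

```latex
% Weights: chosen to reproduce the treated unit's pre-treatment path
\hat{w} \;=\; \arg\min_{w \in \mathcal{W}} \;
  \sum_{t=1}^{T_0} \Bigl( Y_{1t} - \sum_{j=2}^{J+1} w_j \, Y_{jt} \Bigr)^{2}

% Estimated effect: observed outcome minus synthetic outcome, post-treatment
\hat{\tau}_t \;=\; Y_{1t} - \sum_{j=2}^{J+1} \hat{w}_j \, Y_{jt},
  \qquad t > T_0
```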

This notebook uses the scpi package to replicate the classic study of Abadie, Diamond & Hainmueller (2015), which estimates the economic impact of German reunification (1990) on West Germany’s GDP per capita.

We will:

Load and prepare the data.
Set up the synthetic control problem with scdata().
Estimate the treatment effect with scest().
Add uncertainty estimates (prediction intervals) with scpi().

7.2 Step 1: Load packages and data

library(tidyverse)

library(scpi)

The scpi package ships with the scpi_germany dataset containing annual economic indicators for 17 OECD countries from 1960 to 2003. The treated unit is West Germany, which was reunified with East Germany in 1990.
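Before building the synthetic control it can help to confirm the panel looks as described. The following is a minimal sketch that assumes the dataset exposes the country and year columns used later in this notebook; the variable names n_countries and year_range are introduced here for illustration only.

```r
library(scpi)

data("scpi_germany")

# Column layout and types
str(scpi_germany)

# Panel dimensions: number of countries and the span of years covered
n_countries <- length(unique(scpi_germany$country))
year_range  <- range(scpi_germany$year)
n_countries
year_range

# Share of missing values per column, worth checking before choosing
# which variables to match on
colMeans(is.na(scpi_germany))
```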

data("scpi_germany")

7.3 Step 2: Set up the synthetic control problem

scdata() prepares the data structure that scpi needs. The key arguments are:

id.var, time.var — which columns identify units and time periods.
outcome.var — the variable we want to predict/compare (gdp).
period.pre / period.post — pre- and post-treatment time periods.
unit.tr — the treated unit ("West Germany").
unit.co — the donor pool of untreated countries.
features — variables used to match the treated unit to its synthetic in the pre-period.
cov.adj — additional adjustments applied to the matching (a constant and a linear trend).

donors <- unique(scpi_germany$country)
donors <- donors[donors != "West Germany"]

df <- scdata(
  scpi_germany,
  id.var      = "country",
  outcome.var = "gdp",
  time.var    = "year",
  period.pre  = 1960:1989,
  period.post = 1990:2003,
  unit.tr     = "West Germany",
  unit.co     = donors,
  features    = c("gdp", "infrate", "trade", "schooling", "industry"),
  cov.adj     = list(c("constant", "trend"))
)

7.4 Step 3: Estimate the treatment effect

scest() finds the optimal weights for the donor countries. The constraint w.constr = list("name" = "simplex") forces the weights to be non-negative and to sum to one, so the synthetic control is a convex combination of the donors and never extrapolates outside the range they span.
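To see what the simplex constraint does, here is a toy sketch in base R on simulated data. It is not how scest() solves the problem internally: it swaps in a softmax reparameterisation with optim() so that any unconstrained vector maps to valid weights. All names (T0, J, B, A, w_true, softmax, loss) and all values are assumptions made for this illustration.

```r
set.seed(1)

T0 <- 30   # number of pre-treatment periods (toy value)
J  <- 5    # number of donor units (toy value)

# Simulated donor outcomes: one column per donor
B <- matrix(rnorm(T0 * J), nrow = T0, ncol = J)

# Treated unit built as a known convex combination of donors plus noise
w_true <- c(0.6, 0.4, 0, 0, 0)
A <- as.vector(B %*% w_true) + rnorm(T0, sd = 0.05)

# Softmax maps any real vector onto the simplex:
# every entry is non-negative and the entries sum to one
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Pre-treatment sum of squared errors as a function of unconstrained theta
loss <- function(theta) sum((A - B %*% softmax(theta))^2)

fit   <- optim(par = rep(0, J), fn = loss, method = "BFGS")
w_hat <- softmax(fit$par)

round(w_hat, 3)
```

Because the weights pass through softmax(), they satisfy the constraint by construction, and the fitted values should load mainly on the first two donors.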

scplot() then shows the actual outcome alongside the synthetic counterfactual.

res <- scest(df, w.constr = list("name" = "simplex"))
scplot(res)
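Beyond the plot, the fitted object can be inspected directly. The sketch below repeats the setup from Step 2 so it runs on its own, then pulls out the donor weights and the post-treatment gap. It assumes the result stores these under est.results$w, est.results$Y.post.fit and data$Y.post, as described in the package documentation; treat those element names as something to verify against your installed version.

```r
library(scpi)

data("scpi_germany")

# Donor pool: every country except the treated unit
donors <- setdiff(unique(scpi_germany$country), "West Germany")

df <- scdata(
  scpi_germany,
  id.var      = "country",
  outcome.var = "gdp",
  time.var    = "year",
  period.pre  = 1960:1989,
  period.post = 1990:2003,
  unit.tr     = "West Germany",
  unit.co     = donors,
  features    = c("gdp", "infrate", "trade", "schooling", "industry"),
  cov.adj     = list(c("constant", "trend"))
)

res <- scest(df, w.constr = list("name" = "simplex"))

# Donor weights (assumed location: est.results$w), flattened to a named vector
w <- drop(as.matrix(res$est.results$w))
round(w[w > 0.001], 3)   # show only donors with non-negligible weight

# Post-treatment gap: observed outcome minus synthetic prediction
gap <- as.vector(res$data$Y.post) - as.vector(res$est.results$Y.post.fit)
gap
```

A negative gap in a given year means West Germany's observed GDP per capita is below what the synthetic control predicts for that year.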

7.5 Step 4: Add uncertainty with prediction intervals

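Point estimates alone do not tell us whether the gap between treated and synthetic is statistically meaningful. scpi() adds prediction intervals that account for two sources of uncertainty: in-sample uncertainty from estimating the weights, quantified by simulation (sims = 50 draws, run in parallel on cores = 4), and out-of-sample uncertainty from the unobserved post-treatment shock, modelled here with e.method = "gaussian".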
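The reason two kinds of uncertainty appear can be seen from a short decomposition of the prediction error. The notation is an assumption introduced for illustration: p_t collects the donor outcomes and adjustment covariates in post-treatment period t, \beta_0 is the coefficient vector the estimator targets, \hat{\beta} is its estimate, and e_t is the part of the treated unit's untreated outcome that the donors cannot predict.

```latex
% Untreated potential outcome of the treated unit in period t
Y_{1t}(0) \;=\; p_t^{\top} \beta_0 + e_t

% Error of the estimated effect = error in predicting Y_{1t}(0)
\hat{\tau}_t - \tau_t
  \;=\; Y_{1t}(0) - p_t^{\top} \hat{\beta}
  \;=\; \underbrace{e_t}_{\text{out-of-sample}}
  \;-\; \underbrace{p_t^{\top} (\hat{\beta} - \beta_0)}_{\text{in-sample}}
```

The in-sample term shrinks as the weights are estimated more precisely, while the out-of-sample term remains even if the weights were known exactly.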

Calling scplot() on the scpi() result shows the time series together with the prediction intervals.

respi <- scpi(df,
              w.constr = list("name" = "simplex"),
              cores    = 4,
              sims     = 50,
              e.method = "gaussian")

scplot(respi)

Note: Increase sims (e.g. to 200 or 1000) for more stable prediction intervals in a real analysis. 50 is used here for speed.
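Why the number of simulations matters can be illustrated with a toy Monte Carlo experiment in base R. This is not the procedure scpi() runs: it simply estimates an upper 97.5% quantile from standard normal draws and measures how much that estimate moves across repetitions. The helper upper_bound and the choice of 500 repetitions are assumptions made for this sketch.

```r
set.seed(42)

# Simulated upper bound: the 97.5% quantile of `sims` standard normal draws
upper_bound <- function(sims) unname(quantile(rnorm(sims), 0.975))

# Repeat the whole simulation 500 times for each setting and measure
# how much the estimated bound varies from run to run
sd_50   <- sd(replicate(500, upper_bound(50)))
sd_1000 <- sd(replicate(500, upper_bound(1000)))

c(sims_50 = sd_50, sims_1000 = sd_1000)
```

The run-to-run spread of the bound is noticeably smaller with 1000 draws than with 50, which is the sense in which more simulations give more stable intervals.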