Introduction
Clio is an R package that serves as a bridge between the user and Clio Infra, a website and repository of publicly available, country-level data on various aspects of economic history. The package is designed to extract data quickly and efficiently, so the user can make large queries instead of manually downloading Excel files and then filtering and merging them. It also facilitates transparent and reproducible data collection, and it is ‘typo robust’ to a certain extent, which prevents annoyance. Load it with:
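The loading step can be sketched as follows. This is a minimal sketch that assumes the package is installed under the name `Clio`; the guard around `library()` is an addition for illustration, so that the snippet also runs on a machine where the package is missing.

```r
# Assumption: the package is installed under the name "Clio".
clio_available <- requireNamespace("Clio", quietly = TRUE)

if (clio_available) {
  library(Clio)
} else {
  message("Clio is not installed; install it first, then call library(Clio).")
}
```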
Functions
The package is designed to be very easy to use and contains only four (key) functions:
clio_overview provides an overview of all available variables on Clio Infra. It takes no arguments and returns a data frame with all variables, their titles, and their availability (from and to). Its goal is to help the user browse the variables without having to go back and forth to the website.
clio_overview_cat allows the user to browse through the available categories of data. It is meant to be used in two steps. First, call it without an argument, which returns a vector of the currently available categories. Second, call it again with the name of one category as a character string:
clio_overview_cat("production")
This gives the user an overview of the variables within that category. Note that the function does not support looking at two categories at the same time. Its _get equivalent, clio_get_cat, which actually provides the data to the user, does support this.
clio_get returns a data frame on the basis of a few arguments:
- variables: select variables with the help of clio_overview() or clio_overview_cat(). Enter multiple variables using c(var1, var2).
- countries: enter one or multiple countries, also using c(country1, country2). Defaults to all countries available in the query.
- from, to: restrict the dataset to a certain time period. If the restriction has no effect, no warning is given.
- list = FALSE: set to TRUE to get a list instead of a merged data.frame.
- mergetype = full_join: select a merge type: full_join, inner_join, left_join, or right_join. Ignored when list = TRUE.
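What the countries, from, and to arguments amount to can be sketched with dplyr on a made-up data frame. The helper `filter_panel` and the values in `panel` are hypothetical and only illustrate the idea; this is not the package's actual code.

```r
library(dplyr)

# Made-up panel in the same shape as the package output (illustrative values only).
panel <- tibble(
  ccode        = c(528, 528, 643, 643),
  country.name = c("Netherlands", "Netherlands", "Russia", "Russia"),
  year         = c(1850, 1900, 1850, 1900),
  value        = c(1.0, 2.0, 3.0, 4.0)
)

# Hypothetical helper: keep only the requested countries and years.
filter_panel <- function(data, countries = NULL, from = -Inf, to = Inf) {
  if (!is.null(countries)) {
    data <- filter(data, country.name %in% countries)
  }
  filter(data, year >= from, year <= to)
}

# One country, restricted to a period that contains one of its two rows.
nl_late <- filter_panel(panel, countries = "Netherlands", from = 1860, to = 1950)

# A period without any data returns an empty table, and no warning is raised.
empty <- filter_panel(panel, from = 2000, to = 2010)
```

The second call shows why a mistyped or out-of-range period can go unnoticed: the result is simply a table with zero rows.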
clio_get_cat accepts categories as inputs, as
well as combinations of categories using c(cat1, cat2). All
other arguments are passed on to clio_get.
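Passing the remaining arguments on can be pictured with R's `...` mechanism. The sketch below uses hypothetical stand-ins (`category_index`, `get_vars`, `get_cat`) rather than the package's real functions, and `get_vars` returns the request itself instead of data so that the forwarding is visible.

```r
# Hypothetical lookup from category to variable names (a small excerpt).
category_index <- list(
  Finance    = c("Gold Standard", "Long-Term Government Bond Yield"),
  Production = c("Zinc Production", "Gold Production")
)

# Hypothetical stand-in for a variable-level getter.
get_vars <- function(variables, from = NULL, to = NULL) {
  list(variables = variables, from = from, to = to)
}

# Hypothetical stand-in for a category-level getter: it resolves the
# categories to variables and forwards everything else untouched.
get_cat <- function(categories, ...) {
  variables <- unlist(category_index[categories], use.names = FALSE)
  get_vars(variables, ...)
}

request <- get_cat(c("Finance", "Production"), from = 1800, to = 1900)
```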
Demonstration
head(clio_overview(),10)
#> variable_name from to obs
#> 1 Cattle per Capita 1500 2010 7456
#> 2 Cropland per Capita 1500 2010 6226
#> 3 Goats per Capita 1500 2010 7037
#> 4 Pasture per Capita 1500 2010 5963
#> 5 Pigs per Capita 1500 2010 6841
#> 6 Sheep per Capita 1500 2010 6835
#> 7 Total Cattle 1500 2010 7457
#> 8 Total Cropland 1500 2010 6191
#> 9 Total Number of Goats 1500 2010 7037
#> 10 Total Number of Pigs 1500 2010 6841
All categories of data are shown below:
clio_overview_cat()
#> [1] "Agriculture" "Demography" "Environment"
#> [4] "Finance" "Gender Equality" "Human Capital"
#> [7] "Institutions" "Labour Relations" "National Accounts"
#> [10] "Prices and Wages" "Production"
Browse through the database by feeding arguments to clio_overview_cat(). It is typo-robust.
clio_overview_cat("Finanze")
#> variable_name from to obs
#> 1 Exchange Rates to UK Pound 1500 2013 15572
#> 2 Exchange Rates to US Dollar 1500 2013 11765
#> 3 Gold Standard 1800 2010 14359
#> 4 Long-Term Government Bond Yield 1727 2011 2849
#> 5 Total Gross Central Government Debt as a Percentage of GDP 1692 2010 7134
clio_overview_cat("prices end weeges")
#> variable_name from to obs
#> 1 Income Inequality 1820 2000 866
#> 2 Inflation 1500 2010 16676
#> 3 Labourers Real Wage 1820 2008 5053
#> 4 Wealth Decadal Ginis 1820 2010 225
#> 5 Wealth Top 1820 2010 225
#> 6 Wealth Total 1820 2010 111
#> 7 Wealth Yearly Ginis 1820 2015 749
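The two calls above resolve misspelled category names. How the package does this internally is not described here; one plausible way to get such behaviour is to pick the category with the smallest edit distance to the input, sketched below with `adist()` from base R's utils package. The helper `closest_category` is hypothetical.

```r
categories <- c("Agriculture", "Demography", "Environment", "Finance",
                "Gender Equality", "Human Capital", "Institutions",
                "Labour Relations", "National Accounts", "Prices and Wages",
                "Production")

# Hypothetical helper: return the category closest to the input,
# measured by case-insensitive edit (Levenshtein) distance.
closest_category <- function(input, choices = categories) {
  distances <- utils::adist(input, choices, ignore.case = TRUE)
  choices[which.min(distances)]
}

closest_category("Finanze")            # "Finance"
closest_category("prices end weeges")  # "Prices and Wages"
```

A design like this always returns some category, even for input that resembles none of them, so a real implementation would probably also want a maximum acceptable distance.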
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
clio_get(c("income inequality", "labouresrs real wage"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 5,702 × 5
#> ccode country.name year `Income Inequality` `Labourers Real Wage`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 24 Angola 1820 48.9 NA
#> 2 32 Argentina 1820 47.1 NA
#> 3 40 Austria 1820 53.4 NA
#> 4 56 Belgium 1820 62.4 16.6
#> 5 204 Benin 1820 48.0 NA
#> 6 76 Brazil 1820 47.1 NA
#> 7 120 Cameroon 1820 56.2 NA
#> 8 124 Canada 1820 45.1 NA
#> 9 152 Chile 1820 47.1 NA
#> 10 156 China 1820 44.9 3.71
#> # ℹ 5,692 more rows
clio_get(c("infant mortality", "zinc production"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 14,570 × 5
#> ccode country.name year `Infant Mortality` `Zinc Production`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 191 Croatia 1810 175 NA
#> 2 246 Finland 1810 200. 0
#> 3 826 United Kingdom 1810 141 0
#> 4 40 Austria 1820 188. 0
#> 5 191 Croatia 1820 150 NA
#> 6 246 Finland 1820 198. 0
#> 7 250 France 1820 182 0
#> 8 528 Netherlands 1820 179 NA
#> 9 826 United Kingdom 1820 153 0
#> 10 40 Austria 1830 251. 0
#> # ℹ 14,560 more rows
clio_get(c("biodiversity - naturalness", "xecutive Constraints (XCONST)"),
from = 1850, to = 1900,
countries = c("Armenia", "Azerbaijan"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 12 × 5
#> ccode country.name year `Biodiversity - naturalness` Executive Constraints…¹
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 51 Armenia 1850 0.903 NA
#> 2 31 Azerbaijan 1850 0.908 NA
#> 3 51 Armenia 1860 0.899 NA
#> 4 31 Azerbaijan 1860 0.900 NA
#> 5 51 Armenia 1870 0.896 NA
#> 6 31 Azerbaijan 1870 0.892 NA
#> 7 51 Armenia 1880 0.892 NA
#> 8 31 Azerbaijan 1880 0.883 NA
#> 9 51 Armenia 1890 0.888 NA
#> 10 31 Azerbaijan 1890 0.873 NA
#> 11 51 Armenia 1900 0.884 NA
#> 12 31 Azerbaijan 1900 0.863 NA
#> # ℹ abbreviated name: ¹`Executive Constraints (XCONST)`
clio_get(c("Zinc production", "Gold production"),
from = 1800, to = 1920,
countries = c("Botswana", "Zimbabwe"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 242 × 5
#> ccode country.name year `Zinc Production` `Gold Production`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 72 Botswana 1800 NA 0
#> 2 716 Zimbabwe 1800 NA 0
#> 3 72 Botswana 1801 NA 0
#> 4 716 Zimbabwe 1801 NA 0
#> 5 72 Botswana 1802 NA 0
#> 6 716 Zimbabwe 1802 NA 0
#> 7 72 Botswana 1803 NA 0
#> 8 716 Zimbabwe 1803 NA 0
#> 9 72 Botswana 1804 NA 0
#> 10 716 Zimbabwe 1804 NA 0
#> # ℹ 232 more rows
clio_get(c("Armed conflicts internal", "Gold production", "Armed conflicts international"),
mergetype = inner_join)
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 34,559 × 6
#> ccode country.name year `Armed conflicts (Internal)` `Gold Production`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 12 Algeria 1681 0 0
#> 2 24 Angola 1681 0 0
#> 3 32 Argentina 1681 0 0
#> 4 51 Armenia 1681 0 0
#> 5 36 Australia 1681 0 0
#> 6 40 Austria 1681 0 0
#> 7 31 Azerbaijan 1681 0 0
#> 8 68 Bolivia 1681 0 0
#> 9 72 Botswana 1681 0 0
#> 10 76 Brazil 1681 0 1.5
#> # ℹ 34,549 more rows
#> # ℹ 1 more variable: `Armed Conflicts (International)` <dbl>
The kind of merge is customizable. The argument name is mergetype, and it takes the values full_join (default), left_join, right_join, inner_join, etc., if you have loaded dplyr.
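The effect of the merge type can be seen on a small made-up example. The sketch below assumes that the per-variable tables are merged pairwise on ccode, country.name, and year, as the "Joining with" messages suggest; the helper `merge_variables` and all values are hypothetical.

```r
library(dplyr)

# Made-up tables for two variables (illustrative values only).
zinc <- tibble(
  ccode        = c(40, 40, 246),
  country.name = c("Austria", "Austria", "Finland"),
  year         = c(1820, 1830, 1820),
  zinc         = c(0, 0, 0)
)
gold <- tibble(
  ccode        = c(40, 76),
  country.name = c("Austria", "Brazil"),
  year         = c(1820, 1820),
  gold         = c(0, 1.5)
)

# Hypothetical helper: merge a list of tables with the chosen dplyr join.
# The join function itself is passed as the argument, unquoted.
merge_variables <- function(tables, mergetype = full_join) {
  keys <- c("ccode", "country.name", "year")
  Reduce(function(x, y) mergetype(x, y, by = keys), tables)
}

full  <- merge_variables(list(zinc, gold))                         # 4 rows, with NAs
inner <- merge_variables(list(zinc, gold), mergetype = inner_join) # 1 row
```

A full join keeps every country-year that appears in any table and fills the gaps with NA, while an inner join keeps only the country-years present in all tables. That is why an inner join over many variables can shrink the result considerably.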
library(dplyr)
clio_get_cat("finanz", list = FALSE, from = 1800, to = 1900)
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 7,074 × 8
#> ccode country.name year `Exchange Rates to UK Pound` Exchange Rates to US …¹
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 40 Austria 1800 10.1 NA
#> 2 124 Canada 1800 1.08 NA
#> 3 156 China 1800 3.64 NA
#> 4 208 Denmark 1800 5.09 NA
#> 5 276 Germany 1800 11.9 3.00
#> 6 372 Ireland 1800 1.11 NA
#> 7 380 Italy 1800 5.24 NA
#> 8 428 Latvia 1800 3.78 NA
#> 9 528 Netherlands 1800 11.3 2.61
#> 10 616 Poland 1800 21.3 NA
#> # ℹ 7,064 more rows
#> # ℹ abbreviated name: ¹`Exchange Rates to US Dollar`
#> # ℹ 3 more variables: `Gold Standard` <dbl>,
#> # `Long-Term Government Bond Yield` <dbl>,
#> # `Total Gross Central Government Debt as a Percentage of GDP` <dbl>
clio_overview_cat()
#> [1] "Agriculture" "Demography" "Environment"
#> [4] "Finance" "Gender Equality" "Human Capital"
#> [7] "Institutions" "Labour Relations" "National Accounts"
#> [10] "Prices and Wages" "Production"
clio_get_cat(c("agriculture", "environment"),
countries = "Netherlands",
mergetype = inner_join, from = 1850, to = 1900)
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 6 × 20
#> ccode country.name year `Cattle per Capita` `Cropland per Capita`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 528 Netherlands 1850 0.397 0.235
#> 2 528 Netherlands 1860 0.386 0.229
#> 3 528 Netherlands 1870 0.389 0.223
#> 4 528 Netherlands 1880 0.362 0.217
#> 5 528 Netherlands 1890 0.335 0.211
#> 6 528 Netherlands 1900 0.320 0.204
#> # ℹ 15 more variables: `Goats per Capita` <dbl>, `Pasture per Capita` <dbl>,
#> # `Pigs per Capita` <dbl>, `Sheep per Capita` <dbl>, `Total Cattle` <dbl>,
#> # `Total Cropland` <dbl>, `Total Number of Goats` <dbl>,
#> # `Total Number of Pigs` <dbl>, `Total Number of Sheep` <dbl>,
#> # `Total Pasture` <dbl>, `Biodiversity - naturalness` <dbl>,
#> # `CO2 Emissions per Capita` <dbl>, `SO2 Emissions per Capita` <dbl>,
#> # `Total CO2 Emissions` <dbl>, `Total SO2 Emissions` <dbl>
clio_get_cat("Produzioni", from = 1700, list = FALSE, mergetype = inner_join)
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 1,197 × 15
#> ccode country.name year `Aluminium Production` `Bauxite Production`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 36 Australia 1880 0 0
#> 2 76 Brazil 1880 0 0
#> 3 156 China 1880 0 0
#> 4 356 India 1880 0 0
#> 5 364 Iran 1880 0 0
#> 6 398 Kazakhstan 1880 0 0
#> 7 643 Russia 1880 0 0
#> 8 724 Spain 1880 0 0
#> 9 840 United States 1880 0 0
#> 10 36 Australia 1881 0 0
#> # ℹ 1,187 more rows
#> # ℹ 10 more variables: `Copper Production` <dbl>, `Gold Production` <dbl>,
#> # `Iron Ore Production` <dbl>, `Lead Production` <dbl>,
#> # `Manganese Production` <dbl>, `Nickel Production` <dbl>,
#> # `Silver Production` <dbl>, `Tin Production` <dbl>,
#> # `Tungsten Production` <dbl>, `Zinc Production` <dbl>
clio_get(c("Tin Production", "income inequality"), from = 1800, countries = c("Netherlands", "Russia"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 225 × 5
#> ccode country.name year `Tin Production` `Income Inequality`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 643 Russia 1800 0 NA
#> 2 643 Russia 1801 0 NA
#> 3 643 Russia 1802 0 NA
#> 4 643 Russia 1803 0 NA
#> 5 643 Russia 1804 0 NA
#> 6 643 Russia 1805 0 NA
#> 7 643 Russia 1806 0 NA
#> 8 643 Russia 1807 0 NA
#> 9 643 Russia 1808 0 NA
#> 10 643 Russia 1809 0 NA
#> # ℹ 215 more rows
clio_get_cat("labor relation")
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 6,357 × 7
#> ccode country.name year Number of Days Lost in Lab…¹ Number of Labour Dis…²
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 32 Argentina 1927 363492 56
#> 2 36 Australia 1927 1713581 441
#> 3 40 Austria 1927 686560 216
#> 4 56 Belgium 1927 1658836 186
#> 5 100 Bulgaria 1927 57196 23
#> 6 124 Canada 1927 152570 74
#> 7 156 China 1927 7622029 117
#> 8 208 Denmark 1927 119000 17
#> 9 233 Estonia 1927 3067 5
#> 10 246 Finland 1927 1528182 79
#> # ℹ 6,347 more rows
#> # ℹ abbreviated names: ¹`Number of Days Lost in Labour Disputes`,
#> # ²`Number of Labour Disputes`
#> # ℹ 2 more variables: `Number of Workers Involved in Labour Disputes` <dbl>,
#> # `Working week in manufacturing` <dbl>
Thank you for reading! For questions and remarks, reach out via GitHub or e-mail. If you have any improvements, feel free to submit a PR. In case of issues, feel free to start a thread or point them out otherwise.