Getting Started with Clio

Introduction

Clio is an R package that serves as a bridge between the user and the website Clio Infra, a repository of publicly available, country-level data on various aspects of economic history. The package is designed to extract data quickly and efficiently, so the user can make large queries instead of manually downloading Excel files and then filtering and merging them. It also facilitates transparent and reproducible data collection, and it is ‘typo robust’ to a certain extent: slightly misspelled variable and category names are still matched. Load it with:

library(clio)

Functions

The package is designed to be very easy to use and contains only four (key) functions:

clio_overview()

This function provides an overview of all variables available on Clio Infra. It takes no arguments and returns a data frame listing each variable's name, its availability (from and to), and its number of observations (obs). The function's goal is to help the user browse the variables without having to go back and forth to the website.

clio_overview_cat()

This function allows the user to browse the available categories of data. It is meant to be used in two steps. First, call it without an argument, which returns a character vector of the currently available categories. Second, call it again with the name of one of those categories as a character string:

clio_overview_cat("production")

This gives the user an overview of the variables within that category. Note that the function does not support looking at two categories at the same time. Its _get counterpart, clio_get_cat(), which actually retrieves the data, does accept several categories at once.

clio_get()

clio_get() returns a data frame based on the following arguments:
- variables: the variables to retrieve, selected with the help of clio_overview() or clio_overview_cat(). Enter multiple variables using c(var1, var2).
- countries: one or more countries, also using c(country1, country2). Defaults to all countries available in the query.
- from, to: restrict the dataset to a certain time period. If the restriction has no effect, no warning is given.
- list = FALSE: set to TRUE to return a list instead of a merged data.frame.
- mergetype: the dplyr join function used to merge the variables, for example mergetype = inner_join. Choose from full_join, inner_join, left_join, and right_join. Ignored when list = TRUE.
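To make the effect of mergetype concrete, here is a minimal sketch that does not call the package at all. The two toy tables, the helper merge_all(), and its default are assumptions made for illustration; only the key columns (ccode, country.name, year) and the values are taken from the demonstration output in this document. It merges a list of tables with whichever dplyr join function is passed in.

```r
library(dplyr)

# Toy tables keyed like the package output (values copied from the demonstration).
inequality <- data.frame(
  ccode = c(24, 56, 156),
  country.name = c("Angola", "Belgium", "China"),
  year = 1820,
  `Income Inequality` = c(48.9, 62.4, 44.9),
  check.names = FALSE
)
wages <- data.frame(
  ccode = c(56, 156),
  country.name = c("Belgium", "China"),
  year = 1820,
  `Labourers Real Wage` = c(16.6, 3.71),
  check.names = FALSE
)

# Hypothetical helper: fold a list of tables together with the given join function.
merge_all <- function(tables, mergetype = full_join) {
  keys <- c("ccode", "country.name", "year")
  Reduce(function(x, y) mergetype(x, y, by = keys), tables)
}

full  <- merge_all(list(inequality, wages))                          # keeps Angola, wage is NA
inner <- merge_all(list(inequality, wages), mergetype = inner_join)  # drops Angola

nrow(full)   # 3
nrow(inner)  # 2
```

A full join keeps every country-year that appears in any variable and fills the gaps with NA, while an inner join keeps only the country-years present in all variables.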

clio_get_cat()

clio_get_cat accepts categories as inputs, as well as combinations of categories using c(cat1, cat2). All other arguments are passed on to clio_get.
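The pattern of expanding categories into variables and handing the remaining arguments to another function can be sketched with R's `...` mechanism. Everything below is hypothetical: catalogue, toy_get(), and toy_get_cat() are stand-ins invented for illustration, not the package's internals. Only the variable names are taken from the output in this document.

```r
# Hypothetical lookup table: category -> variable names.
catalogue <- list(
  Finance    = c("Exchange Rates to UK Pound", "Gold Standard"),
  Production = c("Gold Production", "Zinc Production")
)

# Stand-in for a getter: it only records what it was asked for.
toy_get <- function(variables, from = -Inf, to = Inf) {
  list(variables = variables, from = from, to = to)
}

# Stand-in for a category getter: expand the categories, forward everything else.
toy_get_cat <- function(categories, ...) {
  variables <- unlist(catalogue[categories], use.names = FALSE)
  toy_get(variables, ...)
}

res <- toy_get_cat(c("Finance", "Production"), from = 1800)
length(res$variables)  # 4
res$from               # 1800
```

Because the extra arguments travel through `...`, the wrapper does not need to know which options the underlying getter supports.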

Demonstration

head(clio_overview(),10)
#>            variable_name from   to  obs
#> 1      Cattle per Capita 1500 2010 7456
#> 2    Cropland per Capita 1500 2010 6226
#> 3       Goats per Capita 1500 2010 7037
#> 4     Pasture per Capita 1500 2010 5963
#> 5        Pigs per Capita 1500 2010 6841
#> 6       Sheep per Capita 1500 2010 6835
#> 7           Total Cattle 1500 2010 7457
#> 8         Total Cropland 1500 2010 6191
#> 9  Total Number of Goats 1500 2010 7037
#> 10  Total Number of Pigs 1500 2010 6841
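
Because the overview is an ordinary data frame, it can be filtered with standard tools. The sketch below uses a hand-typed copy of the first five rows printed above instead of a live call, so it runs offline; the threshold of 7000 observations is an arbitrary choice for illustration.

```r
# Hand-typed copy of the first five rows of the overview shown above.
overview <- data.frame(
  variable_name = c("Cattle per Capita", "Cropland per Capita",
                    "Goats per Capita", "Pasture per Capita", "Pigs per Capita"),
  from = 1500,
  to = 2010,
  obs = c(7456, 6226, 7037, 5963, 6841)
)

# Keep only the variables with more than 7000 observations.
well_covered <- subset(overview, obs > 7000)
well_covered$variable_name
#> [1] "Cattle per Capita" "Goats per Capita"
```

With the package loaded, the same subset() call should work on the result of clio_overview() directly, assuming the column names match those printed above.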

All categories of data are shown below:

clio_overview_cat()
#>  [1] "Agriculture"       "Demography"        "Environment"      
#>  [4] "Finance"           "Gender Equality"   "Human Capital"    
#>  [7] "Institutions"      "Labour Relations"  "National Accounts"
#> [10] "Prices and Wages"  "Production"

Browse through the database by passing a category name to clio_overview_cat(). The function is typo-robust:

clio_overview_cat("Finanze")
#>                                                 variable_name from   to   obs
#> 1                                  Exchange Rates to UK Pound 1500 2013 15572
#> 2                                 Exchange Rates to US Dollar 1500 2013 11765
#> 3                                               Gold Standard 1800 2010 14359
#> 4                            Long-Term Government  Bond Yield 1727 2011  2849
#> 5 Total Gross Central Government  Debt as a Percentage of GDP 1692 2010  7134

clio_overview_cat("prices end weeges")
#>          variable_name from   to   obs
#> 1    Income Inequality 1820 2000   866
#> 2            Inflation 1500 2010 16676
#> 3  Labourers Real Wage 1820 2008  5053
#> 4 Wealth Decadal Ginis 1820 2010   225
#> 5           Wealth Top 1820 2010   225
#> 6         Wealth Total 1820 2010   111
#> 7  Wealth Yearly Ginis 1820 2015   749
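
This document does not describe how the package resolves misspelled names. One plausible way to get this kind of typo robustness is to pick the known name with the smallest edit distance to the input. The sketch below does that with base R's adist(); the helper closest_category() is hypothetical and is not the package's implementation.

```r
# Category names as returned by clio_overview_cat() above.
categories <- c("Agriculture", "Demography", "Environment", "Finance",
                "Gender Equality", "Human Capital", "Institutions",
                "Labour Relations", "National Accounts", "Prices and Wages",
                "Production")

# Hypothetical helper: return the category closest to the input,
# measured by case-insensitive Levenshtein distance.
closest_category <- function(input, choices = categories) {
  distances <- utils::adist(tolower(input), tolower(choices))
  choices[which.min(distances)]
}

closest_category("Finanze")
#> [1] "Finance"
closest_category("prices end weeges")
#> [1] "Prices and Wages"
```

A real implementation would probably also reject inputs that are too far from every known name, which this sketch does not do.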

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
clio_get(c("income inequality", "labouresrs real wage"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 5,702 × 5
#>    ccode country.name  year `Income Inequality` `Labourers Real Wage`
#>    <dbl> <chr>        <dbl>               <dbl>                 <dbl>
#>  1    24 Angola        1820                48.9                 NA   
#>  2    32 Argentina     1820                47.1                 NA   
#>  3    40 Austria       1820                53.4                 NA   
#>  4    56 Belgium       1820                62.4                 16.6 
#>  5   204 Benin         1820                48.0                 NA   
#>  6    76 Brazil        1820                47.1                 NA   
#>  7   120 Cameroon      1820                56.2                 NA   
#>  8   124 Canada        1820                45.1                 NA   
#>  9   152 Chile         1820                47.1                 NA   
#> 10   156 China         1820                44.9                  3.71
#> # ℹ 5,692 more rows

clio_get(c("infant mortality", "zinc production"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 14,570 × 5
#>    ccode country.name    year `Infant Mortality` `Zinc Production`
#>    <dbl> <chr>          <dbl>              <dbl>             <dbl>
#>  1   191 Croatia         1810               175                 NA
#>  2   246 Finland         1810               200.                 0
#>  3   826 United Kingdom  1810               141                  0
#>  4    40 Austria         1820               188.                 0
#>  5   191 Croatia         1820               150                 NA
#>  6   246 Finland         1820               198.                 0
#>  7   250 France          1820               182                  0
#>  8   528 Netherlands     1820               179                 NA
#>  9   826 United Kingdom  1820               153                  0
#> 10    40 Austria         1830               251.                 0
#> # ℹ 14,560 more rows

clio_get(c("biodiversity - naturalness", "xecutive Constraints  (XCONST)"), 
         from = 1850, to = 1900, 
         countries = c("Armenia", "Azerbaijan"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 12 × 5
#>    ccode country.name  year `Biodiversity - naturalness` Executive Constraints…¹
#>    <dbl> <chr>        <dbl>                        <dbl>                   <dbl>
#>  1    51 Armenia       1850                        0.903                      NA
#>  2    31 Azerbaijan    1850                        0.908                      NA
#>  3    51 Armenia       1860                        0.899                      NA
#>  4    31 Azerbaijan    1860                        0.900                      NA
#>  5    51 Armenia       1870                        0.896                      NA
#>  6    31 Azerbaijan    1870                        0.892                      NA
#>  7    51 Armenia       1880                        0.892                      NA
#>  8    31 Azerbaijan    1880                        0.883                      NA
#>  9    51 Armenia       1890                        0.888                      NA
#> 10    31 Azerbaijan    1890                        0.873                      NA
#> 11    51 Armenia       1900                        0.884                      NA
#> 12    31 Azerbaijan    1900                        0.863                      NA
#> # ℹ abbreviated name: ¹`Executive Constraints  (XCONST)`

clio_get(c("Zinc production", "Gold production"), 
         from = 1800, to = 1920, 
         countries = c("Botswana", "Zimbabwe"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 242 × 5
#>    ccode country.name  year `Zinc Production` `Gold Production`
#>    <dbl> <chr>        <dbl>             <dbl>             <dbl>
#>  1    72 Botswana      1800                NA                 0
#>  2   716 Zimbabwe      1800                NA                 0
#>  3    72 Botswana      1801                NA                 0
#>  4   716 Zimbabwe      1801                NA                 0
#>  5    72 Botswana      1802                NA                 0
#>  6   716 Zimbabwe      1802                NA                 0
#>  7    72 Botswana      1803                NA                 0
#>  8   716 Zimbabwe      1803                NA                 0
#>  9    72 Botswana      1804                NA                 0
#> 10   716 Zimbabwe      1804                NA                 0
#> # ℹ 232 more rows

clio_get(c("Armed conflicts internal", "Gold production", "Armed conflicts international"), 
         mergetype = inner_join)
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 34,559 × 6
#>    ccode country.name  year `Armed conflicts (Internal)` `Gold Production`
#>    <dbl> <chr>        <dbl>                        <dbl>             <dbl>
#>  1    12 Algeria       1681                            0               0  
#>  2    24 Angola        1681                            0               0  
#>  3    32 Argentina     1681                            0               0  
#>  4    51 Armenia       1681                            0               0  
#>  5    36 Australia     1681                            0               0  
#>  6    40 Austria       1681                            0               0  
#>  7    31 Azerbaijan    1681                            0               0  
#>  8    68 Bolivia       1681                            0               0  
#>  9    72 Botswana      1681                            0               0  
#> 10    76 Brazil        1681                            0               1.5
#> # ℹ 34,549 more rows
#> # ℹ 1 more variable: `Armed Conflicts (International)` <dbl>

The type of merge is customizable through the mergetype argument. It takes a dplyr join function such as full_join (default), left_join, right_join, or inner_join, so dplyr must be loaded.

library(dplyr)
clio_get_cat("finanz", list = FALSE, from = 1800, to = 1900)
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 7,074 × 8
#>    ccode country.name  year `Exchange Rates to UK Pound` Exchange Rates to US …¹
#>    <dbl> <chr>        <dbl>                        <dbl>                   <dbl>
#>  1    40 Austria       1800                        10.1                    NA   
#>  2   124 Canada        1800                         1.08                   NA   
#>  3   156 China         1800                         3.64                   NA   
#>  4   208 Denmark       1800                         5.09                   NA   
#>  5   276 Germany       1800                        11.9                     3.00
#>  6   372 Ireland       1800                         1.11                   NA   
#>  7   380 Italy         1800                         5.24                   NA   
#>  8   428 Latvia        1800                         3.78                   NA   
#>  9   528 Netherlands   1800                        11.3                     2.61
#> 10   616 Poland        1800                        21.3                    NA   
#> # ℹ 7,064 more rows
#> # ℹ abbreviated name: ¹`Exchange Rates to US Dollar`
#> # ℹ 3 more variables: `Gold Standard` <dbl>,
#> #   `Long-Term Government  Bond Yield` <dbl>,
#> #   `Total Gross Central Government  Debt as a Percentage of GDP` <dbl>

clio_overview_cat()
#>  [1] "Agriculture"       "Demography"        "Environment"      
#>  [4] "Finance"           "Gender Equality"   "Human Capital"    
#>  [7] "Institutions"      "Labour Relations"  "National Accounts"
#> [10] "Prices and Wages"  "Production"

clio_get_cat(c("agriculture", "environment"),
             countries = "Netherlands",
             mergetype = inner_join, from = 1850, to = 1900)
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 6 × 20
#>   ccode country.name  year `Cattle per Capita` `Cropland per Capita`
#>   <dbl> <chr>        <dbl>               <dbl>                 <dbl>
#> 1   528 Netherlands   1850               0.397                 0.235
#> 2   528 Netherlands   1860               0.386                 0.229
#> 3   528 Netherlands   1870               0.389                 0.223
#> 4   528 Netherlands   1880               0.362                 0.217
#> 5   528 Netherlands   1890               0.335                 0.211
#> 6   528 Netherlands   1900               0.320                 0.204
#> # ℹ 15 more variables: `Goats per Capita` <dbl>, `Pasture per Capita` <dbl>,
#> #   `Pigs per Capita` <dbl>, `Sheep per Capita` <dbl>, `Total Cattle` <dbl>,
#> #   `Total Cropland` <dbl>, `Total Number of Goats` <dbl>,
#> #   `Total Number of Pigs` <dbl>, `Total Number of Sheep` <dbl>,
#> #   `Total Pasture` <dbl>, `Biodiversity - naturalness` <dbl>,
#> #   `CO2 Emissions per Capita` <dbl>, `SO2 Emissions per Capita` <dbl>,
#> #   `Total CO2 Emissions` <dbl>, `Total SO2 Emissions` <dbl>

clio_get_cat("Produzioni", from = 1700, list = FALSE, mergetype = inner_join)
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 1,197 × 15
#>    ccode country.name   year `Aluminium Production` `Bauxite Production`
#>    <dbl> <chr>         <dbl>                  <dbl>                <dbl>
#>  1    36 Australia      1880                      0                    0
#>  2    76 Brazil         1880                      0                    0
#>  3   156 China          1880                      0                    0
#>  4   356 India          1880                      0                    0
#>  5   364 Iran           1880                      0                    0
#>  6   398 Kazakhstan     1880                      0                    0
#>  7   643 Russia         1880                      0                    0
#>  8   724 Spain          1880                      0                    0
#>  9   840 United States  1880                      0                    0
#> 10    36 Australia      1881                      0                    0
#> # ℹ 1,187 more rows
#> # ℹ 10 more variables: `Copper Production` <dbl>, `Gold Production` <dbl>,
#> #   `Iron Ore Production` <dbl>, `Lead Production` <dbl>,
#> #   `Manganese Production` <dbl>, `Nickel Production` <dbl>,
#> #   `Silver Production` <dbl>, `Tin Production` <dbl>,
#> #   `Tungsten Production` <dbl>, `Zinc Production` <dbl>

clio_get(c("Tin Production", "income inequality"), from = 1800, countries = c("Netherlands", "Russia"))
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 225 × 5
#>    ccode country.name  year `Tin Production` `Income Inequality`
#>    <dbl> <chr>        <dbl>            <dbl>               <dbl>
#>  1   643 Russia        1800                0                  NA
#>  2   643 Russia        1801                0                  NA
#>  3   643 Russia        1802                0                  NA
#>  4   643 Russia        1803                0                  NA
#>  5   643 Russia        1804                0                  NA
#>  6   643 Russia        1805                0                  NA
#>  7   643 Russia        1806                0                  NA
#>  8   643 Russia        1807                0                  NA
#>  9   643 Russia        1808                0                  NA
#> 10   643 Russia        1809                0                  NA
#> # ℹ 215 more rows

clio_get_cat("labor relation")
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> Joining with `by = join_by(ccode, country.name, year)`
#> # A tibble: 6,357 × 7
#>    ccode country.name  year Number of Days Lost in  Lab…¹ Number of Labour Dis…²
#>    <dbl> <chr>        <dbl>                         <dbl>                  <dbl>
#>  1    32 Argentina     1927                        363492                     56
#>  2    36 Australia     1927                       1713581                    441
#>  3    40 Austria       1927                        686560                    216
#>  4    56 Belgium       1927                       1658836                    186
#>  5   100 Bulgaria      1927                         57196                     23
#>  6   124 Canada        1927                        152570                     74
#>  7   156 China         1927                       7622029                    117
#>  8   208 Denmark       1927                        119000                     17
#>  9   233 Estonia       1927                          3067                      5
#> 10   246 Finland       1927                       1528182                     79
#> # ℹ 6,347 more rows
#> # ℹ abbreviated names: ¹`Number of Days Lost in  Labour Disputes`,
#> #   ²`Number of Labour Disputes`
#> # ℹ 2 more variables: `Number of Workers Involved  in Labour Disputes` <dbl>,
#> #   `Working week  in manufacturing` <dbl>

Thank you for reading! For questions and remarks, reach out via GitHub or e-mail. If you have any improvements, feel free to submit a PR. If you run into issues, feel free to open a thread or point them out some other way.