popcompr is an R package to make it easier to compare different high resolution population datasets for humanitarian and research purposes. It is under active development and has not been released, but the code is available through a GPL3 license.

To install package from github:

remotes::install_github("mrajeev08/popcompr")

An example comparing 2019 population estimates for Lesotho from World Pop & Facebook/CIESIN

Included in the package are files to reproduce a minimal working example for the country of Lesotho. lesotho_wp_2019 and lesotho_fb_2019 are included in the package (use ?lesotho_wp_2019 for more details).

First load the necessary libraries

  • some extras for plotting (plotly & ggplot2)
library(popcompr)
library(raster)
#> Loading required package: sp
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following object is masked from 'package:raster':
#> 
#>     shift
library(foreach)
library(plotly)
#> Loading required package: ggplot2
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:raster':
#> 
#>     select
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout
library(ggplot2)
library(fasterize)
#> 
#> Attaching package: 'fasterize'
#> The following object is masked from 'package:graphics':
#> 
#>     plot
#> The following object is masked from 'package:base':
#> 
#>     plot
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1

Comparing populations at pixel level

You can first get an estimate of the time required to return the comparison:

# comparing at pixel level with data included in the package
lesotho_fb_2019 <- raster(system.file("external/lso_facebook_2019.tif", package="popcompr"))
lesotho_wp_2019 <- raster(system.file("external/lso_worldpop_2019.tif", package="popcompr"))

pop_list <- list(lesotho_wp_2019, lesotho_fb_2019)
compare_pop(pop_list, parallel = FALSE, 
                   estimate_time = TRUE)
#> It will take approximately 22.73 seconds to complete the full job serially.

It shouldn’t take too long by that estimate, but here we’ll work through parallelizing. Here, I use the doParallel backend, but other do packages can be used (anything compatible with the %dopar% infix in the foreach package).

# parallelized example
library(doParallel)
#> Loading required package: iterators
#> Loading required package: parallel
cl <- makeCluster(detectCores() - 1) # how many cores do we have available
registerDoParallel(cl)
system.time(
  # defaults to estimate_time = FALSE & resolution of ~ 1km at equator
  exe <- compare_pop(pop_list, parallel = TRUE) 
)
#>    user  system elapsed 
#>   0.540   0.039  21.549
stopCluster(cl)

compare_pop will warn you if any people were not resampled to the comparison grid (this happens sometimes when pops are at the edge of the original raster).

The output is a raster brick with a layer corresponding to each of the input population rasters. We can vizualize and compare the rasters:

Vizualizing and summarizing comparisons

plot_compare and summary_compare are convenience functions to generate automatic plots. The default is map of the differences between the population datasets.

Other options include: - hex plots

plot_compare(exe, type = "hex")

plot_compare(exe, type = "pairs")
#> Loading required package: GGally
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2

  • histogram
plot_compare(exe, type = "hist")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

All plots are ggplots and as such can be customized by adding scales, themes, etc. You may get a warning if you’re overriding an existing theme.

out <- plot_compare(exe, type = "hex")
out +
  scale_fill_distiller(palette = "Greens", trans = "log")
#> Scale for 'fill' is already present. Adding another scale for 'fill', which
#> will replace the existing scale.

Note that for the map plots, changing the scales will mess up the labeling of the values. This is known issue that I’ll try to fix!

You can also make these interactive using plotly:

plot_compare(exe, type = "map", interactive = TRUE) 

Note that for the tooltips for these plots show the transformed values, should show the raw values, otherwise the interactive view is not that helpful! This is known issue that I’ll try to fix!

If you’re comparing a large number of rasters, you can also save the output, rather than returning the plot to your R session (just pass a file path and an extension).

plot_compare(exe, type = "map", save = TRUE, path = "test", ext = "jpeg") 

# interactive plots get saved as html with package `htmlwidgets`
plot_compare(exe, type = "map", save = TRUE, interactive = TRUE,
             path = "test", ext = ".html") 

You can also generate summary stats. See ?summary_compare for more details on how to customize the summary stats.

summary_compare(exe)
#>            popcmp_lso_worldpop_2019 popcmp_lso_facebook_2019
#> sum                    1.884069e+06             2.135022e+06
#> mean                   4.550781e+01             1.276545e+02
#> max                    3.130142e+03             3.683978e+03
#> min                    9.557885e-02             2.471747e+00
#> sd                     1.153118e+02             2.479142e+02
#> length_nas             6.460100e+04             8.927700e+04

Comparing at the administrative level

You can access country shapefiles using get_country_shape which will return a shapefile as an sf object. These are available through the geoBoundaries API, where more documentation can be found on available datasets. In addition, geoboundaries is a dataset provided in the package to find country codes and available admin levels.

# Find the right iso code & see which admin levels are available
dplyr::filter(geoboundaries, grepl("Les", country))
#>   iso_code   year admin_level                     source
#> 1      LSO 2017.0        ADM0              OpenStreetMap
#> 2      LSO 2017.0        ADM1              OpenStreetMap
#> 3      LSO 2020.0        ADM2 Humanitarian Data Exchange
#> 4      TLS 2017.0        ADM0              OpenStreetMap
#> 5      TLS 2017.0        ADM1              OpenStreetMap
#> 6      TLS 2018.0        ADM2                        HDX
#>                                         license     country
#> 1   Open Data Commons Open Database License 1.0     Lesotho
#> 2   Open Data Commons Open Database License 1.0     Lesotho
#> 3                                                   Lesotho
#> 4   Open Data Commons Open Database License 1.0 Timor-Leste
#> 5   Open Data Commons Open Database License 1.0 Timor-Leste
#> 6 Humanitarian use only - Noted in metadata tab Timor-Leste

We get the Lesotho shapefile to the admin level 2 (the type argument defaults to a simplified shapefile which is faster for plotting and downloading, but users may prefer unsimplified files for better accuracy see the geoBoundaries API documentation for more information).

les_shape <- get_country_shape(country_iso = "LSO", admin_level = 2)
#> Reading layer `geoBoundariesSSCU-3_0_0-LSO-ADM2' from data source `https://geoboundaries.org/data/geoBoundariesSSCU-3_0_0/LSO/ADM2/geoBoundariesSSCU-3_0_0-LSO-ADM2.geojson' using driver `GeoJSON'
#> Simple feature collection with 78 features and 5 fields
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: 27.01125 ymin: -30.67785 xmax: 29.45737 ymax: -28.57103
#> geographic CRS: WGS 84

Then we can aggregate the population rasters to it:

# Get the shapefile at admin 2
les_shape <- aggregate_to_shp(brick = exe, sf = les_shape, max_adjacent = 100)

aggregate_to_shp will warn you if grid cell values go unallocated to the shapefile. In the case that many people go missing, you can set max_adjacent higher (this determines how many grid cells to buffer to when looking for the nearest non-NA neighbor), but it may also be wise to check the extent & boundaries of the shapefile vs. the population rasters.

We can compare these differences using the same functions as with the pixel level comparisons:

plot_compare(les_shape)
#> Joining, by = "id"


plot_compare(les_shape, type = "hex")


plot_compare(les_shape, type = "pairs")


plot_compare(les_shape, type = "hist")
#> Joining, by = "id"
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

summary_compare(les_shape)
#>            popcmp_lso_worldpop_2019 popcmp_lso_facebook_2019
#> sum                     1884068.804               2135022.09
#> mean                      24154.728                 27372.08
#> max                      203692.785                229895.84
#> min                        3901.574                  7282.35
#> sd                        22081.908                 25066.39
#> length_nas                    0.000                     0.00

User specific inputs

You can use your own raster and shapefile inputs, they just need to be in the WGS84 (lat/long) coordinate system.