1. Model setup • mapspamc

To create crop distribution maps with mapspamc, several parameters need to be set first (note that we assume that the mapspamc package and other required software has been already installed and is working).

Create model template

The first step is to create a project with RStudio, activate the mapspamc package by typing library("mapspamc") and use the create_model_template function to create a set of template folders and scripts that implement the six steps to create crop distribution maps with mapspamc. An alternative approach would be to copy all the scripts from one of the country examples into the RStudio project and use those as a starting point. In the vignettes, We will use the scripts of the Malawi example for illustration.

# First create a new RStudio project (here we use the name mapspamc_mwi but any name is possible) in the folder c:/temp (or a location of your choice).

# Load mapspamc
library("mapspamc")

# Create the model template in the RStudio project folder
create_model_template("c:/temp/mapspamc_mwi")

Set the model parameters and the location of the files

Nearly all functions in mapspamc need input on (a) key parameters that determine the design of the model and how it is solved; (b) the location of the mapspamc database with the raw data that will be further processed and (c) the location of the model itself. All these pieces of information are bundled in a mapspamc_par object that is generated by the function set_mapspamc_par(). This functions is called in the script 01_model_setup\01_model_setup.R. As the model parameters are input into all all functions, this script is always loaded at the beginning of all the other scripts using the source() command. In this way, you only have to provide the parameter once. It also ensures that all the necessary packages are loaded into R.

The following model parameters need to be set before mapspamc can be run:

iso3c country code (iso3c). The three letter ISO 3166-1 alpha-3 country code, also referred to as iso3c is used to extract national statistics from global datasets. A list of country codes can be found in Wikipedia.
Year (year). The year (four digits) will be used to select relevant global datasets when data is available for multiple years, in particular the FAOSTAT national statistics and population maps. It is also useful to distinguish mapspamc output when the model is used to create crop distribution maps for multiple years.
Spatial resolution (res). The user can choose between two spatial resolutions: 30 arc seconds (~1x1 km at the equator) by selecting "30sec" and 5 arc minutes (~10x10 km at the equator) by selecting "5min". Note that selecting a resolution of 30 arcsec increases the model dimensions (number of grid cells) by a factor 100! This might mean the model becomes too big to solve and/or your computer runs out of memory. We recommend experimenting with the 5 arc minute resolution first!
Depth of subnational statistics (adm_level). This parameter defines the most detailed level at which data for subnational administrative units are available. Accepted inputs are 0 (only national level data), 1 (national level and first level subnational data) and 2 (national level, first and second level subnational data). In most cases first level data refers to states and provinces and second level data indicates districts, counties and departments. In the unusual case that only national and district level data is available, the user should set adm_level to 1 as the depth of the subnational statistics is 1.
Level at which the model is solved (solve_level). The model can be solved at the national level (0) or at the subnational level 1 (1). In the first case, all grid cells are optimized using national level constraints. This means that for crops for which no subnational level information is available, the national statistics for this crop will be distributed over the whole country, subject to the standard model constraints. In the second case, the model is split into several administrative level 1 models, which are each solved separately. Breaking the model up in smaller pieces will result in less complex and less memory intensive models, which are easier to solve. The disadvantage is that the subnational level models require full level 1 administrative unit information (i.e. harvested area statistics, cropping intensity statistics and production system shares) for all crops that are cultivated in the level 1 administrative unit. For most countries, the subnational statistics tend to be incomplete and full crop information is only available at the national level (from FAOSTAT). Hence, the only way to run the model at the level 1 administrative unit is to impute the missing crop area values (e.g. by taking values for comparable crops for which information is available, use national values or look for secondary information). Running the model at level 1 administrative unit is most useful for larger countries, such as China, Brazil and the USA, for which detailed subnational statistics are available and models tend to be large and complex due to the large size of the country.
Type of model (‘model’). You can choose between two types of models: (1) minimization of cross-entropy (’min_entropy) and (2) maximization of fitness score (max_score`) See the section on Model description for details.

The Location of the following three folders need to be added when setting up the model:

main model folder (model_path). This folder will be used to store all model related data, including processed data, mapping tables, intermediate output, final results and, unless specified otherwise by the user, the raw data.
Location of mapspam_db (db_path). Optionally the user can set a folder for the location of the mapspamc database. This might be convenient if you want to use a server to store the database (> 5 GB). A folder with the name mapspamc_db will automatically be created in the db_path set by the user. If db_path is not specified the mapspamc_db folder will be created inside the model_db folder.
Location of GAMS (gams_path). Optionally the user can set a folder for the location of GAMS, which also includes the GAMS libraries that are needed to load GAMS gdx files with R. If GAMS was installed properly, mapspamc will automatically find the GAMS system directory. However, sometimes the location is not stored correctly, resulting in errors. We therefore always recommend setting the location of GAMS manually. This might also be convenient when multiple versions of GAMS are installed on one machine and one version is preferred over the other.

The example below sets the mapspamc parameters for Malawi and stores them in an object called param. Param is an arbitrary name but we recommend not to change it as we consistently refer to param in the template scripts and use it in the documentation of most functions.

# Load mapspamc
library("mapspamc")

# Set the folder where the model will be stored
# Note that R uses forward slashes even in Windows!!
model_path <- "c:/temp/mapspamc_mwi"

# Creates a database folder with the name mapspamc_db in c:/temp.
db_path <- "c:/temp"

# Sets the location of the version of GAMS that will be used to solve the model
gams_path <- "C:/MyPrograms/GAMS/35"

# Set mapspamc parameters for the min_entropy_5min_adm_level_2_solve_level_0 model
param <- mapspamc_par(
  model_path = model_path,
  db_path = db_path,
  gams_path = gams_path,
  iso3c = "MWI",
  year = 2010,
  res = "5min",
  adm_level = 2,
  solve_level = 0,
  model = "min_entropy"
)

# Show parameters
print(param)
#> country name:  Malawi 
#> iso3n:  454 
#> iso3c:  MWI 
#> continent:  Africa 
#> year:  2010 
#> model:  min_entropy 
#> resolution:  5min 
#> adm level:  2 
#> solve level:  0 
#> model path:  c:/temp/mapspamc_mwi 
#> mapspamc_db path:  c:/temp/mapspamc_db 
#> gams path: C:/MyPrograms/GAMS/35 
#> model folder:  min_entropy_5min_adm_level_2_solve_level_0 
#> crs:  epsg:4326

Create a folder structure

mapspamc includes a function create_folders() to create the model structure in model_path defined by the user . The only input required is param which contains all the model parameters. The function will create three folders: (1) mapspamc_db and several subfolders in model_path (or in the db_path if specified by the user), where all the raw input data will be stored, (2) processed_data is used to save all the intermediary data products as well as the final results and (3) mappings contains all the lists and mappings that are needed to define the model (e.g. number crops) and harmonize various data sources (concordance tables). All tables are in csv format and will be created automatically into the mappings folder. create_folders() will only create csv files if the files do not exist. Hence, updated files will not be overwritten. If the user wants to recreate the original tables, simply remove the mappings folder and/or included files and rerun create_folders.

# Create folder structure in the mapspamc_path
create_folders(param)

Prepare the subnational administrative unit map

To allocate subnational crop area information, a complementary map (most likely in the form of a shapefile) with the location of the subnational administrative units is needed. The processing of this map is done in the script 01_model_setup\02_prepare_adm_map_and_grid. This involves multiple steps, which cannot be captured by a single function. These are explained next.

Before the shapefile can be processed the user needs to load the file in R by setting the filename (in this case adm_2010_MWI.shp), which is stored in the mapspamc_db/adm/ folder. For illustrative purposes, we also added the map (in rds format) to the mapspamc package. The sf::st_transform command projects the map to epsg:4326 so that it is consistent with all the other spatial data.

# replace the name of the shapefile with that of your own country.
iso3c_shp <- "adm_2010_MWI.shp"

# load shapefile (not needed below as adm_map_raw is stored inside the mapspamc package)
# adm_map_raw <- read_sf(file.path(param$db_path, glue("adm/{param$iso3c}/{iso3c_shp}")))

# plot
plot(mapspamc::adm_map_raw$geometry)


# Project to standard global projection
adm_map <- adm_map_raw %>%
  sf::st_transform(param$crs)

It is essential that the attribute table (i.e. the table with the list of subnational administrative units that is attached to the shapefile) with the names and codes of the administrative units has the correct column names. To rename to columns, the user has to specify the column names of the attribute table which contain the original names and code of the (sub)national units and store them in the admX_name_orig and admX_code_orig variables, respectively, where X is the administrative unit level. If you only have data at level 0 and 1, simply delete the lines of script that refer to administrative unit level 2, etc. In case of Malawi, we have data at level 0, 1 and 2 so for each administrative unit level the column names for name and code need to be set. Columns in the attribute table that are not relevant will automatically be deleted. The script will also union all polygons with the same name and code. This is convenient if the user has crop area information for the combination of two subnational regions (e.g. ADM A + ADM B) but a shapefile with the individual polygons of both regions (ADM A & ADM B). In such case, you can simply give each polygon the same name (e.g. ADM_A_B) and code (A_B) and they will automatically be dissolved into one polygon.

# Check names
names(adm_map)
#>  [1] "AREA"       "PERIMETER"  "G2008_2_"   "G2008_2_ID" "ADM2_CODE" 
#>  [6] "ADM0_CODE"  "ADM0_NAME"  "ADM1_NAME"  "ADM1_CODE"  "ADM2_NAME" 
#> [11] "DISP_AREA"  "LAST_UPDAT" "CONTINENT"  "REGION"     "STR_YEAR0" 
#> [16] "STR_YEAR1"  "STR_YEAR2"  "EXP_YEAR0"  "EXP_YEAR1"  "EXP_YEAR2" 
#> [21] "FIPS0"      "FIPS1"      "FIPS2"      "NAME1"      "NAME2"     
#> [26] "O_FIPS1"    "O_FIPS2"    "O_NAME1"    "O_NAME2"    "geometry"

# Set the original names, i.e. the ones that will be replaced. Remove adm1
# and/or adm2 entries if such data is not available.
adm0_name_orig <- "ADM0_NAME"
adm0_code_orig <- "FIPS0"
adm1_name_orig <- "ADM1_NAME"
adm1_code_orig <- "FIPS1"
adm2_name_orig <- "ADM2_NAME"
adm2_code_orig <- "FIPS2"

# Replace the names
names(adm_map)[names(adm_map) == adm0_name_orig] <- "adm0_name"
names(adm_map)[names(adm_map) == adm0_code_orig] <- "adm0_code"
names(adm_map)[names(adm_map) == adm1_name_orig] <- "adm1_name"
names(adm_map)[names(adm_map) == adm1_code_orig] <- "adm1_code"
names(adm_map)[names(adm_map) == adm2_name_orig] <- "adm2_name"
names(adm_map)[names(adm_map) == adm2_code_orig] <- "adm2_code"

# Only select relevant columns
adm_map <- adm_map %>%
  dplyr::select(adm0_name, adm0_code, adm1_name, adm1_code, adm2_name, adm2_code)

# Union separate polygons that belong to the same adm
adm_map <- adm_map %>%
  group_by(adm0_name, adm0_code, adm1_name, adm1_code, adm2_name, adm2_code) %>%
  summarize(.groups = "drop") %>%
  ungroup() %>%
  mutate(
    adm0_name = param$country,
    adm0_code = param$iso3c
  )

# Check names
head(adm_map)
#> Simple feature collection with 6 features and 6 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 32.67395 ymin: -14.60495 xmax: 35.2957 ymax: -9.49193
#> Geodetic CRS:  WGS 84
#> # A tibble: 6 × 7
#>   adm0_name adm0_code adm1_name                    adm1_code adm2_name adm2_code
#>   <chr>     <chr>     <chr>                        <chr>     <chr>     <chr>    
#> 1 Malawi    MWI       Area under National Adminis… MI01      Area und… MI01001  
#> 2 Malawi    MWI       Central Region               MI02      Dedza     MI02001  
#> 3 Malawi    MWI       Central Region               MI02      Dowa      MI02002  
#> 4 Malawi    MWI       Central Region               MI02      Kasungu   MI02003  
#> 5 Malawi    MWI       Central Region               MI02      Lilongwe  MI02004  
#> 6 Malawi    MWI       Central Region               MI02      Mchinji   MI02005  
#> # ℹ 1 more variable: geometry <POLYGON [°]>
names(adm_map)
#> [1] "adm0_name" "adm0_code" "adm1_name" "adm1_code" "adm2_name" "adm2_code"
#> [7] "geometry"

Polygons of administrative units where no crops can or should be allocated must be removed. If this is not done, and in the rare case when the area statistics are larger than the cropland exent, the model might still allocate crops into these regions. The Malawi shapefile includes two of such regions. One is the Area under National Administration, which is the part of Lake Malawi that belongs to Malawi, and Likoma, several small islands in lake Malawi that are not covered by the statistics. The script below shows how to remove them. It there is no reason to remove polygons, simply delete the lines of script in the template.

# Set names of ADMs that need to be removed from the polygon.
adm1_to_remove <- c("Area under National Administration")
adm2_to_remove <- c("Likoma")

# Remove ADMs
adm_map <- adm_map %>%
  filter(adm1_name != adm1_to_remove) %>%
  filter(adm2_name != adm2_to_remove)

plot(adm_map$geometry, main = "ADM polygons removed")

To link the subnational statistics to other spatial data a mapping table is needed (adm_list) that contains information on how the administrative units at various levels are nested, which is precisely the information that is stored in the attribute table of the administrative unit map. The following code, extracts the list from the shapefile and saves it as csv file in the processed_data/lists folder.

# Create adm_list
create_adm_list(adm_map, param)

After the administrative region shapefile has been processed, save it in the right location.

temp_path <- file.path(param$model_path, glue("processed_data/maps/adm/{param$res}"))
dir.create(temp_path, showWarnings = FALSE, recursive = TRUE)

saveRDS(adm_map, file.path(temp_path, glue("adm_map_{param$year}_{param$iso3c}.rds")))
write_sf(adm_map, file.path(temp_path, glue("adm_map_{param$year}_{param$iso3c}.shp")))

By running create_adm_map_pdf() a pdf file with maps of the (sub)national regions is created in the processed_data/maps/adm folder.

# Create pdf file with the location of administrative units
create_adm_map_pdf(param)

Create grid and rasterize the subnational administrative unit map

The create_grid() function creates a spatial grid at selected spatial resolution (i.e. 30 arc seconds or 5 arc minutes). The grid is subsequently used by rasterize_adm_map(), which rasterizes the administrative unit map at the selected resolution. Both the grid and the rasterized map are required by the model as inputs and are saved automatically in the right location.

# Create grid
create_grid(param)

# Rasterize administrative unit map
rasterize_adm_map(param)

This is all it takes to set up a mapspamc model! The next step is processing the raw subnational statistics and spatial data.