After all the input data has been processed, it needs to be
harmonized and combined into a single file that is used as input for one
of the downscaling algorithms. The sections below briefly describe the
main functions in mapspamc to do this. Almost all functions
require only one input, param, an object with the mapspamc
parameters. Note that ‘under the hood’ of these
functions a lot of other processes are triggered, which automatically
load the data that was created in previous steps, perform consistency
checks, reformat data from spatial to data table format and, where
needed, run algorithms to harmonize the various inputs. This means that
some of the functions might take some time to run, in particular if the
resolution is set to 30 arc seconds, which considerably increases the
size of the model. All the functions send a message to the screen when
they have finished so the user knows what is happening.
All the intermediate data output is saved in the
processed_data/intermediate_output folder. If the user
has set solve_level = 1, the various functions split the
data into administrative level 1 chunks, which are saved in subfolders
using the level 1 administrative unit code as name of the folder. If
solve_level = 0, only one subfolder, with the country’s
iso3c code as name, will be created.
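The folder layout described above can be sketched in a few lines of base R. The helper expected_subfolders() and the administrative unit codes are hypothetical and only serve as illustration; they are not part of mapspamc.

```r
# Hypothetical helper, not part of mapspamc: list the subfolders one would
# expect under processed_data/intermediate_output for a given solve_level.
expected_subfolders <- function(solve_level, iso3c, adm1_codes) {
  base <- file.path("processed_data", "intermediate_output")
  if (solve_level == 0) {
    # One subfolder named after the country's iso3c code
    file.path(base, iso3c)
  } else if (solve_level == 1) {
    # One subfolder per level 1 administrative unit code
    file.path(base, adm1_codes)
  } else {
    stop("solve_level must be 0 or 1")
  }
}

# Made-up country and administrative unit codes
folders_level0 <- expected_subfolders(0, "MWI", c("MI01", "MI02"))
folders_level1 <- expected_subfolders(1, "MWI", c("MI01", "MI02"))
print(folders_level0)
print(folders_level1)
```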
prepare_physical_area() combines the three agricultural
statistics input files (harvested area, production system shares and
cropping intensity) to calculate the physical cropping area for all
administrative units.
prepare_physical_area(param)
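As a rough illustration of how the three inputs relate, the sketch below assumes that physical area equals harvested area multiplied by the production system share and divided by the cropping intensity. This formula, the column names and the numbers are assumptions for illustration; the function itself may apply additional checks and corrections.

```r
# Illustrative numbers only; not taken from any real statistics.
stats <- data.frame(
  adm_code           = c("A1", "A1", "A2"),
  crop               = c("maiz", "maiz", "rice"),
  system             = c("S", "I", "I"),
  harvested_area     = c(1000, 1000, 400),  # ha, crop total in the unit
  system_share       = c(0.8, 0.2, 1.0),    # share of each production system
  cropping_intensity = c(1.0, 2.0, 2.0)     # harvests per year on the same land
)

# Assumed relation: physical area = harvested area * share / intensity
stats$physical_area <- with(
  stats,
  harvested_area * system_share / cropping_intensity
)
print(stats[, c("adm_code", "crop", "system", "physical_area")])
```

In this example, 200 ha of irrigated maize harvested twice a year occupies only 100 ha of physical cropland.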
The function prepare_cropland() combines the three
synergy cropland components (medium and maximum cropland and cropland
ranking maps) into a data table and stores this in a file.
prepare_cropland(param)
prepare_irrigated_area() is similar to the function that
prepares the cropland as it combines the synergy irrigated area maps
(maximum irrigated area and ranking maps) into one file.
prepare_irrigated_area(param)
Before the algorithms in mapspamc can be solved, it is
essential to harmonize the physical area information, which is derived
from national and subnational statistics, with the synergy cropland and
irrigated area maps, which are based on remote sensing information and
other spatially explicit data sources. As these data come from
different sources, they are not always fully consistent. This would
not be a problem if the cropland extent were larger than the physical
crop area, meaning there would be enough space to allocate the
statistics on the cropland map. Similarly, if the total irrigated area in
the irrigated area map were larger than the physical area of the
irrigated production systems, the data would fit on the map.
Unfortunately, this is often not the case and, without adjustments, the
downscaling algorithms would be impossible to solve. In practice, we use
‘slack variables’ to ensure the model always solves (see Appendix). However, large slacks in the
solution signal serious inconsistencies, and therefore we check for
inconsistencies and adjust the data already in the model preparation
stage.
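The kind of consistency check described above can be illustrated with a small base R sketch that compares the cropland available per administrative unit with the physical area from the statistics. All table and column names, and the numbers, are made up for illustration and do not reflect the mapspamc internals.

```r
# Made-up cropland per grid cell (ha) and physical area statistics (ha)
cropland <- data.frame(
  adm_code = c("A1", "A1", "A2", "A2"),
  cl_area  = c(100, 50, 30, 20)
)
pa_stat <- data.frame(
  adm_code = c("A1", "A2"),
  pa       = c(120, 80)
)

# Total cropland per administrative unit
cl_tot <- aggregate(cl_area ~ adm_code, data = cropland, FUN = sum)

# A positive shortfall means the statistics do not fit on the cropland map
check <- merge(pa_stat, cl_tot, by = "adm_code")
check$shortfall  <- pmax(check$pa - check$cl_area, 0)
check$consistent <- check$shortfall == 0
print(check)
```

Here unit A2 reports 80 ha of physical area but only has 50 ha of cropland, so without adjustments the model could only be solved by introducing 30 ha of slack.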
harmonize_inputs() uses a number of steps to harmonize
the various data sources:
In the end, the user has to decide if the slacks are acceptable or not. In our opinion, small slacks (measured as share of total or administrative unit physical crop area) are an acceptable way to deal with inconsistencies. However, if slacks become very large, we recommend scrutinizing the statistics and, where possible, making adjustments. Large slack often results from data entry errors or overly rigid cropping intensity values. We provide some advice on how to deal with slack in the Appendix.
Compare and harmonize irrigated area and statistics. In the next step, the irrigated area from the synergy irrigated area map and the irrigated physical area statistics are compared and harmonized. We start by ranking all irrigated grid cells from the most (rank 1) to the least (rank 10) reliable. Next, for each grid cell, we set the irrigated area to the minimum of the cropland area (taken from the previous harmonization step) and the irrigated area from the synergy irrigated area map. The grid cells are subsequently aggregated in rank order until the accumulated area is slightly larger than the irrigated physical area statistics. If the physical area turns out to be larger than the total irrigated cropland, a new iteration is started in which the irrigated area per grid cell is increased by taking the maximum of the cropland area and the irrigated area. If this is still not sufficient, the irrigated area is further enlarged by taking the maximum of the maximum cropland area and the irrigated area for each grid cell. If this is still not sufficient, a warning is issued that solving the model will introduce slack. Finally, the grid cell ranking from the synergy cropland is adjusted to factor in all the selected irrigated area cells.
Select grid cells to match with statistics. In the final harmonization step, the cropland and irrigation extent are compared with the crop statistics. Similar to the previous step, the grid cells are ranked and the cropland is aggregated, starting with the most preferred grid cells (now also including the irrigated area grid cells), until the total area is slightly larger than the physical area from the (sub)national statistics. This is done consecutively for each individual administrative unit, starting with the most detailed level and ending at the national level. This process makes sure that the cropland and irrigated area extent is reduced to include only the most reliable grid cells, while at the same time it ensures that the extent is still large enough to fit the physical crop area statistics, including the irrigated area. The final cropland and irrigated area extent consists of the union of grid cells that are selected at each (sub)national administrative unit level processing step.
harmonize_inputs(param)
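The rank-and-accumulate logic of the irrigated area step can be sketched as follows. This is a simplified illustration in base R under stated assumptions, not the code used by harmonize_inputs(): the helper harmonize_irrigated(), the column names and the numbers are hypothetical, and "slightly larger" is implemented as the first cumulative total that reaches the statistic.

```r
# Hypothetical sketch; column names and the helper are illustrative and are
# not the mapspamc internals.
harmonize_irrigated <- function(cells, ir_stat) {
  # Candidate irrigated area per grid cell, from most to least conservative
  candidates <- list(
    min_cl_ir     = pmin(cells$cl, cells$ir),
    max_cl_ir     = pmax(cells$cl, cells$ir),
    max_cl_max_ir = pmax(cells$cl_max, cells$ir)
  )
  # Rank 1 is the most reliable; ties are broken by grid cell id
  ord <- order(cells$ir_rank, cells$grid_id)
  for (step in names(candidates)) {
    area <- candidates[[step]][ord]
    cum_area <- cumsum(area)
    if (cum_area[length(cum_area)] >= ir_stat) {
      # Keep cells until the accumulated area covers the statistic
      n_keep <- which(cum_area >= ir_stat)[1]
      out <- cells[ord[seq_len(n_keep)], c("grid_id", "ir_rank")]
      out$ir_area <- area[seq_len(n_keep)]
      return(list(step = step, cells = out))
    }
  }
  warning("irrigated statistics exceed the available area; expect slack")
  list(step = NA_character_, cells = NULL)
}

# Made-up grid cells (areas in ha)
cells <- data.frame(
  grid_id = 1:4,
  ir_rank = c(1, 2, 3, 10),
  cl      = c(50, 40, 30, 20),  # harmonized cropland
  cl_max  = c(60, 60, 40, 30),  # maximum cropland
  ir      = c(20, 50, 10, 5)    # synergy irrigated area
)

res_small <- harmonize_irrigated(cells, ir_stat = 55)
res_large <- harmonize_irrigated(cells, ir_stat = 120)
print(res_small$step)
print(res_large$step)
```

With a statistic of 55 ha the most conservative definition is sufficient, whereas 120 ha only fits after the per-cell irrigated area is enlarged in the second iteration. The same accumulate-until-covered idea applies to the final cropland selection step, applied per administrative unit.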
prepare_priors_and_scores() creates the priors and the
scores for each grid cell. For convenience, the function always
creates data tables with both priors and scores, even if only one is needed
because the user only wants to run min_entropy, which
requires the priors, or max_score, which requires the
scores. In this way, the user can easily test different algorithms
without repeating the data pre-processing steps.
Note that the function might take some time to run as it implements
three consecutive processes. First, the biophysical suitability and
potential yield maps for all production system and crop combinations are
loaded and only grid cells that overlap with the cropland extent from
the previous step are selected, after which all data is merged into one
table and saved. This process also checks whether a map contains only
zero values and, where needed, replaces it with the map of a substitute
crop. This is important because it occasionally happens that the
biophysical suitability and potential yield maps indicate zero
suitability for a specific crop although the statistics suggest the crop
is produced in the country. If we did not correct for this, most
scores and priors for this crop would be zero, resulting in an
‘uninformed’ allocation of the crop, meaning it can be placed anywhere
as long as the constraints are satisfied and the objective function
(minimization of cross-entropy or maximization of fitness score) is
optimized. If all the substitute crops have zero values, a warning
is issued. We prepared a list of substitute crops that is stored in the
mappings/replace_gaez.sv file. You can modify the list to
add other substitute crops if you think these are more appropriate. The
only requirement is that the selected crop must be in the list of SPAM crops
that is stored in mappings/crop.csv. The second and third
processes create data files with the priors and scores using the
biophysical suitability and potential yield, among others, as input
data.
prepare_priors_and_scores(param)
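The substitute crop check can be illustrated with a small sketch. The helper pick_crop_map(), the suitability values and the substitute list are hypothetical; the actual substitutes are defined in the mapping file mentioned above and the function may apply further rules.

```r
# Hypothetical helper: return the first crop, starting with the crop itself
# and then its substitutes, whose suitability values are not all zero.
pick_crop_map <- function(crop, suit, substitutes) {
  for (candidate in c(crop, substitutes[[crop]])) {
    if (any(suit[[candidate]] > 0, na.rm = TRUE)) {
      return(candidate)
    }
  }
  warning("all maps for ", crop, " and its substitutes contain only zeros")
  NA_character_
}

# Made-up suitability values for three grid cells per crop
suit <- list(
  maiz = c(0.2, 0.5, 0),
  teas = c(0, 0, 0),
  coff = c(0.1, 0, 0.3)
)
# Made-up substitute list: use coffee when the tea map is all zero
substitutes <- list(teas = c("coff"))

map_maiz <- pick_crop_map("maiz", suit, substitutes)
map_teas <- pick_crop_map("teas", suit, substitutes)
print(c(maiz = map_maiz, teas = map_teas))
```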
Finally, all the inputs, including the harmonized cropland extent,
irrigated area extent and statistics, and the priors/scores are combined
in one GAMS gdx file, which is used as input to solve the downscaling
algorithm in GAMS. The file contains a number of sets and parameter
tables that define the model. Sets describe the dimensions of the model,
while parameters contain the data along these dimensions. As part of the
process to combine all the inputs, and if relevant, artificial
administrative units are created that represent the combination of all
administrative units per crop for which subnational statistics are
missing. These units are added to the list of administrative units from
the subnational statistics. The names of these units, stored in the
adm_area parameter table, start with the name of the lower
level administrative unit which nests the units with missing data,
followed by ART and the level for which data is missing, and
ending with the crop for which data is not available.
combine_inputs(param)
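The naming pattern for artificial administrative units can be sketched as below. The helper make_art_name() is hypothetical, and the underscore separator and the exact formatting are assumptions; only the order of the components (nesting unit, ART plus level, crop) follows the description above.

```r
# Hypothetical helper; separator and formatting are assumptions.
make_art_name <- function(parent_name, level, crop, sep = "_") {
  paste(parent_name, paste0("ART", level), crop, sep = sep)
}

# Made-up example: level 1 rice statistics are missing within unit "MWI"
art_name <- make_art_name("MWI", 1, "rice")
print(art_name)
```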