Appendix

Use of GAMS to solve the `mapspamc` models

The mapspamc package depends on the General Algebraic Modeling System (GAMS) (GAMS Development Corporation 2019) to solve the cross-entropy and fitness score models. GAMS is commercial software designed for modelling and solving large, complex optimization problems. Solving the models in mapspamc requires a GAMS licence that includes full access (without size limitations) to the CPLEX and IPOPT solvers (but see below for potential alternative solvers). The function run_mapspamc calls GAMS to run the model as defined by the user in the model setup step. The GAMS model log is printed to the screen after the model run has finished. The log is also saved as a text file, whose name starts with model_log_, in the processed_data/intermediate_output folder. The GAMS code (gms files) is stored in the gams folder in the mapspamc R library folder. Interested users may want to inspect the code and, if necessary, modify it and run it directly in GAMS, separately from the mapspamc package.

Use of slack when solving models

The GAMS code uses slack variables to deal with data inconsistencies that would otherwise result in infeasible models. A slack variable transforms an inequality constraint into an equality constraint by adding ‘slack’ (Boyd and Vandenberghe 2004). This makes it possible to solve the model even though it is technically infeasible. Different weights (i.e. penalties) are used to capture the interaction and trade-offs between the different slack variables (e.g. a lower slack for irrigated area may result in a higher slack for available cropland). Relatively high weights are used to minimize the slack for cropland availability and irrigated area, whereas low weights are used for the subnational statistics, whose values are considered less reliable (see the GAMS gms files for the weights that are used). After the model is run, the slack variables are saved in the model output gdx file, which is stored in the processed_data/intermediate_output folder.
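The trade-off between weighted slack variables can be illustrated with a small Python sketch. It is not taken from the mapspamc GAMS code: the weight values, the names WEIGHTS, slack_needed and weighted_penalty, and the toy numbers are all assumptions made for illustration. The sketch compares two ways of resolving a grid cell where the statistics report more crop area than there is cropland.

```python
# Hypothetical penalty weights: high for physical limits, low for the
# less reliable subnational statistics. The real values are in the gms files.
WEIGHTS = {"cropland": 100.0, "irrigated": 100.0, "subnational_stat": 1.0}


def slack_needed(lhs, rhs):
    """Smallest non-negative slack s for which lhs <= rhs + s holds."""
    return max(0.0, lhs - rhs)


def weighted_penalty(slacks, weights=WEIGHTS):
    """Penalty term that would be added to the objective function."""
    return sum(weights[name] * value for name, value in slacks.items())


# Toy case: 100 ha of cropland is available, the statistics report 120 ha.
available, reported = 100.0, 120.0

# Option A: allocate the full reported area and exceed the cropland limit.
option_a = {"cropland": slack_needed(reported, available), "subnational_stat": 0.0}

# Option B: respect the cropland limit and fall short of the statistics.
option_b = {"cropland": 0.0, "subnational_stat": slack_needed(reported, available)}

penalty_a = weighted_penalty(option_a)
penalty_b = weighted_penalty(option_b)
print(penalty_a, penalty_b)
```

With these assumed weights, deviating from the subnational statistics (option B) is penalized far less than exceeding the available cropland (option A), so a solver that minimizes the penalty would prefer option B.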

Excessive slack points to inconsistencies between the (sub)national statistics and the available cropland and irrigated area to which the statistics are allocated. To deal with these inconsistencies, the first step is to closely inspect the data, in particular the subnational crop statistics, for data entry errors (e.g. missing values) and, if needed, correct them. If problems remain, we recommend adjusting the subnational statistics, as they are sometimes inconsistent with the FAOSTAT national statistics.

GAMS solvers

Depending on the licence, GAMS is installed with several solvers. For each type of problem a default solver is pre-selected. Unless changed by the user, the run_mapspamc function will use the GAMS default solvers for linear problems (i.e. the max_score model) and non-linear problems (i.e. the min_entropy model). To find out which solvers are available and which are the default, open the GAMS IDE: file -> options -> solvers. The user has the option to select one of the other linear and non-linear solvers supported by GAMS: ANTIGONE, BARON, CBC, CPLEX, CONOPT4, CONOPT, GUROBI, IPOPT, IPOPTH, KNITRO, LGO, LINDO, LOCALSOLVER, MINOS, MOSEK, MSNLP, OSICPLEX, OSIGUROBI, OSIMOSEK, OSIXPRESS, PATHNLP, SCIP, SNOPT, SOPLEX, XA, XPRESS.

For the max_score model, which is a linear problem, it is recommended to use CPLEX. For non-linear problems, such as the min_entropy model, it is not possible to predict beforehand which solver will perform best. It is recommended to start with IPOPT, which has shown good performance in solving cross-entropy models. An alternative option is CONOPT4, which, however, is often much slower and in some cases is not able to solve the model.

Differences between `mapspamc` and SPAM

There are several differences between mapspamc and SPAM (Yu et al. 2020) related to the allocation algorithm, calculation of priors and the input data (Table @ref(tab:tab-spam-dif)). A main innovation is that, apart from the SPAM cross-entropy model, mapspamc also provides the option to use an alternative ‘fitness score’ model. This model was developed to create crop distribution maps at a higher resolution of 30 arc seconds (Dijk et al. 2022). As a consequence of using priors based on socio-economic and biophysical suitability measures, the cross-entropy approach allocates relatively small shares of a large number of crops and farming systems to grid cells. This is plausible for a 5 arc minute resolution, where grid cells are relatively large (~ 10x10 km) and a large diversity of crops and farming systems is to be expected. It is, however, less plausible for a high resolution of 30 arc seconds, where grid cells are much smaller (~ 1x1 km) and it is more likely to observe clusters of grid cells that are populated by a small number of crops and production systems. To simulate this process, a ‘fitness’ score between 0 and 100, which measures both the socio-economic and biophysical suitability, is calculated for each grid cell. Crops and production systems are allocated in such a way that the crop area weighted fitness score is maximized, subject to subnational crop area information and the availability of cropland and irrigated area.
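The idea of maximizing a crop area weighted fitness score can be sketched in Python for a heavily simplified case. This is not the mapspamc model, which is a multi-crop linear problem solved in GAMS. The sketch assumes a single crop, no irrigated area constraint and no slack, in which case filling the highest-scoring grid cells first gives the same result as the linear problem. The function names, cell labels and numbers are hypothetical.

```python
def allocate_single_crop(target_area, cropland, fitness):
    """Fill the grid cells with the highest fitness score first.

    target_area: crop area from the statistics that has to be allocated
    cropland:    available cropland per grid cell (upper bound)
    fitness:     fitness score (0-100) per grid cell for this crop
    """
    allocation = {cell: 0.0 for cell in cropland}
    remaining = target_area
    for cell in sorted(cropland, key=lambda c: fitness[c], reverse=True):
        if remaining <= 0:
            break
        area = min(cropland[cell], remaining)
        allocation[cell] = area
        remaining -= area
    return allocation


def weighted_score(allocation, fitness):
    """Crop area weighted fitness score, the quantity being maximized."""
    return sum(area * fitness[cell] for cell, area in allocation.items())


# Toy data: three grid cells and 70 ha of one crop to allocate.
cropland = {"c1": 50.0, "c2": 30.0, "c3": 40.0}
fitness = {"c1": 80.0, "c2": 95.0, "c3": 20.0}

allocation = allocate_single_crop(70.0, cropland, fitness)
score = weighted_score(allocation, fitness)
print(allocation, score)
```

In this toy case the crop ends up concentrated in the two best cells and the low-scoring cell stays empty, which mirrors the clustering behaviour the fitness score model is meant to produce at 30 arc seconds.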

Another difference related to the allocation algorithm is the use of slacks in the constraints to better deal with data inconsistencies (see above). We also dropped the suitability constraint in mapspamc. In SPAM, the suitable crop area, calculated as the biophysical suitability times the cropland area in a grid cell, was used as a hard constraint. In practice, this constraint was often dropped because it severely limits the space to allocate crops, resulting in infeasible models. As an alternative, the cross-entropy and fitness score models in mapspamc use the suitability information to inform the priors and fitness scores, in particular for the subsistence and low-input production systems. We also slightly modified the calculation of the priors for high-input and irrigated crops, which are now based on the geometric average of accessibility and potential revenue indicators that are normalized by means of the min-max method.
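A minimal Python sketch of min-max normalization followed by a geometric average is given below. It follows the wording of the text above and is not the package's actual code, which may apply weights or further scaling. The function names and input values are hypothetical, and the accessibility indicator is assumed to be scored so that larger values mean better access.

```python
import math


def min_max(values):
    """Rescale values to the range 0-1 with the min-max method."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def geometric_average_prior(accessibility, revenue):
    """Geometric average of two min-max normalized indicators per grid cell."""
    acc_n = min_max(accessibility)
    rev_n = min_max(revenue)
    return [math.sqrt(a * r) for a, r in zip(acc_n, rev_n)]


# Toy data for three grid cells (hypothetical units).
accessibility = [10.0, 20.0, 30.0]
revenue = [100.0, 300.0, 200.0]

priors = geometric_average_prior(accessibility, revenue)
print(priors)
```

Because the geometric average multiplies the two indicators, a grid cell that scores zero on either accessibility or potential revenue receives a zero value, whereas an arithmetic average would still give it some weight.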

Finally, we updated several input data sources with more recent and higher resolution products, in particular (a) accessibility, which is now based on travel time (Weiss et al. 2018), (b) population, taken from Tatem (2017), (c) urban extent, taken from Schiavina et al. (2022), (d) irrigated area, which is a synergy product based on GIA (Meier, Zabel, and Mauser 2018) and GMIA (Siebert et al. 2013), and (e) cropland, taken from several recent cropland products that can be combined to generate a synergy cropland map in the pre-processing step. We also reduced the number of standard crops from 42 in SPAM to 40 in mapspamc by merging the two millet (pearl and small millet) and the two coffee (arabica and robusta) species, because statistical information is difficult to find at such a detailed crop level (see below).

As a consequence of these modifications, the maps created with mapspamc will deviate from comparable information presented by SPAM. Nonetheless, output is expected to be comparable if similar model settings are used.

Differences between `mapspamc` and SPAM. SPAM refers to the latest version, which was used to generate SPAM2010 (Yu et al. 2020)
item	SPAM	mapspamc
Allocation algorithm
Objective function	Minimization of cross-entropy	Minimization of cross-entropy and maximization of fitness score
Resolution	5 arc minutes	5 arc minutes & 30 arc seconds
Slack variable	Used for model checking only	Added to the objective function to improve flexibility
Suitability constraint	Allocation cannot be larger than biophysically suitable cropland area	No longer implemented
Calculation of priors
Subsistence priors	based on share of rural population	based on share of rural population but set to zero when bio-physical suitability is zero
Low-input priors	based on potential revenue times accessibility	based on min-max normalized suitability
High-input priors	based on potential revenue times accessibility	based on weighted average of min-max normalized potential revenue and accessibility
Irrigated-area priors	based on potential revenue times accessibility	based on weighted average of min-max normalized potential revenue and accessibility
Data
Number of crops	42	40
Accessibility	Based on population density	Based on time to travel
Population	SEDAC v4	WorldPop
Urban extent	GRUMP	GHS-SMOD
Irrigation	GMIA	GIA & GMIA
Cropland	SASAM	Recent cropland products

List of `mapspamc` crops

mapspamc identifies 40 different crops (and crop groups) that together cover the full agricultural sector and are each identified by a four-letter code (Table 1). The main reason for this classification is the limited availability of crop-specific biophysical suitability maps, which form a key input of the crop allocation process (see pre-processing spatial data for more information). It would be relatively easy to add new crops by splitting them off from broader crop groups (e.g. separating tomatoes from vegetables) if appropriate agricultural statistics and suitability maps are available. We plan to add an example of how to do this in future updates. The actual number of crops in the model is determined by the number of crops for which statistical information is provided.

List of `mapspamc` crops
number	name	group
1	Wheat	Cereals
2	Rice	Cereals
3	Maize	Cereals
4	Barley	Cereals
5	Millet	Cereals
6	Sorghum	Cereals
7	Other Cereals	Cereals
8	Potato	Roots & Tubers
9	Sweet Potato	Roots & Tubers
10	Yams	Roots & Tubers
11	Cassava	Roots & Tubers
12	Other Roots	Roots & Tubers
13	Bean	Pulses
14	Chickpea	Pulses
15	Cowpea	Pulses
16	Pigeon Pea	Pulses
17	Lentil	Pulses
18	Other Pulses	Pulses
19	Soybean	Oilcrops
20	Groundnut	Oilcrops
21	Coconut	Oilcrops
22	Oilpalm	Oilcrops
23	Sunflower	Oilcrops
24	Rapeseed	Oilcrops
25	Sesame Seed	Oilcrops
26	Other Oil Crops	Oilcrops
27	Sugarcane	Sugar Crops
28	Sugarbeet	Sugar Crops
29	Cotton	Fibres
30	Other Fibre Crops	Fibres
31	Coffee	Stimulants
32	Cocoa	Stimulants
33	Tea	Stimulants
34	Tobacco	Stimulants
35	Banana	Fruits
36	Plantain	Fruits
37	Tropical Fruit	Fruits
38	Temperate Fruit	Fruits
39	Vegetables	Vegetables
40	Rest Of Crops	Other

References

Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press. https://doi.org/10.1017/CBO9780511804441.

Dijk, Michiel van, Ulrike Wood-Sichra, Yating Ru, Amanda Palazzo, Petr Havlik, and Liangzhi You. 2022. “Generating multi-period crop distribution maps for Southern Africa using a data fusion approach.”

GAMS Development Corporation. 2019. “General Algebraic Modeling System (GAMS).” Fairfax, VA. www.gams.com.

Meier, Jonas, Florian Zabel, and Wolfram Mauser. 2018. “A global approach to estimate irrigated areas – a comparison between different data and statistics.” Hydrology and Earth System Sciences 22 (2): 1119–33. https://doi.org/10.5194/hess-22-1119-2018.

Schiavina, Marcello, M. Melchiorri, Martino Pesaresi, P. Politis, S. Freire, Luca Maffenini, P. Florio, et al. 2022. GHSL Data Package 2022. Luxembourg: Publications Office of the European Union. https://doi.org/10.2760/19817, JRC 129516.

Siebert, Stefan, Verena Henrich, Karen Frenken, and Jacob Burke. 2013. “Global Map of Irrigation Areas version 5.” Bonn, Germany: Rheinisch Friedrich-Wilhelms-University.

Tatem, Andrew J. 2017. “WorldPop, open data for spatial demography.” Nature Publishing Groups. https://doi.org/10.1038/sdata.2017.4.

Weiss, D. J., A. Nelson, H. S. Gibson, W. Temperley, S. Peedell, A. Lieber, M. Hancher, et al. 2018. “A global map of travel time to cities to assess inequalities in accessibility in 2015.” Nature 553 (7688): 333–36. https://doi.org/10.1038/nature25181.

Yu, Qiangyi, Liangzhi You, Ulrike Wood-Sichra, Yating Ru, Alison K. B. Joglekar, Steffen Fritz, Wei Xiong, Miao Lu, Wenbin Wu, and Peng Yang. 2020. “A cultivated planet in 2010 – Part 2: The global gridded agricultural-production maps.” Earth System Science Data 12 (4): 3545–72. https://doi.org/10.5194/essd-12-3545-2020.
