## Objective functions

mapspamc includes two types of models: (1) minimization of cross-entropy, which is used by SPAM (You and Wood 2006; You, Wood, and Wood-Sichra 2009; You et al. 2014; Yu et al. 2020), and (2) maximization of a fitness score, which was developed by Dijk et al. (2022). The sections below describe the basic structure of the two models. Additional technical information is presented in the appendix.

### Minimization of cross-entropy

The cross-entropy approach (min_entropy) is defined as a non-linear optimization problem in which the error between prior information on the location of crop area shares ($$\pi_{ijl}$$) and model-allocated area shares ($$s_{ijl}$$) is minimized subject to a number of constraints. More precisely, the cross-entropy objective function ($$CE$$) is specified as follows:

$\begin{equation} \min\ CE(s_{ijl}, \pi_{ijl})= \sum_{i} \sum_{j} \sum_{l}s_{ijl} \ln s_{ijl}- \sum_{i} \sum_{j} \sum_{l}s_{ijl} \ln \pi_{ijl} \end{equation}$
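To make the objective concrete, the following minimal sketch evaluates $$CE$$ for a given set of shares and priors. It is an illustration only, not the solver used by mapspamc: the function name `cross_entropy`, the dictionary layout keyed by `(i, j, l)`, and the crop and system labels are assumptions made for this example. The convention $$0 \ln 0 = 0$$ is applied so that zero shares do not contribute.

```python
import math


def cross_entropy(shares, priors):
    """Evaluate CE = sum(s * ln s) - sum(s * ln pi) over all (i, j, l) keys.

    `shares` and `priors` are dicts keyed by (grid cell, crop, system).
    This layout is an assumption for illustration.
    """
    total = 0.0
    for key, s in shares.items():
        if s == 0.0:
            continue  # convention: 0 * ln(0) contributes nothing
        pi = priors[key]
        if pi <= 0.0:
            raise ValueError(f"prior share must be positive where s > 0: {key}")
        total += s * math.log(s) - s * math.log(pi)
    return total


# Hypothetical example: one crop and system, two grid cells.
priors = {(1, "maiz", "S"): 0.75, (2, "maiz", "S"): 0.25}
even_split = {(1, "maiz", "S"): 0.5, (2, "maiz", "S"): 0.5}

ce_same = cross_entropy(priors, priors)      # shares identical to the priors
ce_even = cross_entropy(even_split, priors)  # shares that deviate from the priors
```

In this example the objective is zero when the shares equal the priors and positive once the allocation moves away from them, which is why minimizing $$CE$$ keeps the solution close to the prior information.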

### Maximization of fitness score

In the maximization of fitness score approach (max_score), the spatial allocation model is defined as a linear optimization problem in which the weighted ‘fitness score’ over all grid cells, crops and production systems is maximized. The fitness score, which ranges from 0 to 1, is a composite indicator of the economic and biophysical suitability of each crop and production system combination at the grid cell level. The solution to the model implies that, on average, all crops are allocated to the locations that are considered most appropriate from both an economic and a biophysical viewpoint. The objective function ($$S$$) is defined as follows:

$\begin{equation} \max\ S(s_{ijl}, score_{ijl})= \sum_{i} \sum_{j} \sum_{l}s_{ijl} \times score_{ijl} \end{equation}$

where $$s_{ijl}$$ is the share of total physical area of crop $$j$$ and production system $$l$$ allocated to grid cell $$i$$, and $$score_{ijl}$$ is the fitness score of crop $$j$$ and production system $$l$$ in grid cell $$i$$.
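As a rough illustration of this objective, the sketch below evaluates $$S$$ for two candidate allocations. The function name `fitness_objective`, the dictionary layout and the numbers are hypothetical and chosen only for this example. Constraints are ignored here.

```python
def fitness_objective(shares, scores):
    """Evaluate S = sum(s_ijl * score_ijl) over all (i, j, l) keys."""
    return sum(s * scores[key] for key, s in shares.items())


# Hypothetical fitness scores for one crop and system in two grid cells.
scores = {(1, "maiz", "H"): 0.9, (2, "maiz", "H"): 0.4}

# Two candidate allocations; each sums to one over the grid cells.
concentrated = {(1, "maiz", "H"): 1.0, (2, "maiz", "H"): 0.0}
spread = {(1, "maiz", "H"): 0.5, (2, "maiz", "H"): 0.5}

s_concentrated = fitness_objective(concentrated, scores)
s_spread = fitness_objective(spread, scores)
```

Because the objective is linear in the shares, moving area toward the highest-scoring cell always increases $$S$$ until a constraint binds, so the concentrated allocation scores higher than the spread one in this example.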

### Comparison

Cross-entropy minimization is better suited for maps at a resolution of 5 arc minutes because it results in a more fragmented allocation per grid cell. This is desirable when the resolution is relatively coarse (e.g. 5 arc minutes) and the number of crops and production systems that can be observed in a grid cell is relatively large. In contrast, maximizing the fitness score results in a much more concentrated crop area allocation, where only a few crops and production systems are allocated to a grid cell. This makes sense for relatively small grid cells (e.g. 30 arc seconds), where one expects to observe only a small number of crops and production systems, but it is not realistic for large grid cells. Hence, we recommend using the cross-entropy model for 5 arc minute resolution and the fitness score model for 30 arc second resolution crop distribution maps.

## Economic and biophysical determinants of crop location

To solve the models in mapspamc, either fitness scores or prior area information is needed to allocate the statistics to each grid cell. The fitness scores and priors are informed by a combination of economic (e.g. market access and population density) and biophysical factors (e.g. growing conditions determined by soil, temperature and precipitation) that jointly determine the location of crops. As the impact of these factors differs between production systems, we define separate fitness scores and priors for each of them.

### Fitness scores

For subsistence farmers, who mainly grow crops for their own consumption, we assume that rural population density can be used as a proxy for the location of subsistence production systems. We use min-max normalization to create an index of rural population density between 0 and 1:

$\begin{equation} score_{ijl} = \frac{rpd_{ijl}-min(rpd_{ijl})}{max(rpd_{ijl})-min(rpd_{ijl})} \quad l = subsistence \quad \forall i\ \forall j \end{equation}$

where $$rpd_{ijl}$$ is the rural population density defined for each grid cell $$i$$, crop $$j$$ and production system $$l$$, which is converted to the $$score_{ijl}$$ between 0 and 1. In contrast to the other three systems, the allocation of subsistence farmers is modeled by means of a constraint instead of through the objective function (see next section). This ensures that the total subsistence crop area is allocated in proportion to rural population density. Distributing subsistence production systems by maximizing the fitness score might lead to a solution in which a large share of subsistence crop area is allocated to the grid cells with the highest rural population density. Such a concentrated allocation is not realistic for subsistence farming.

Low-input rainfed systems are characterized by smallholder farms that mostly sell their products on local markets. We assume that farmers will specialize in growing crops with the highest biophysical suitability, while other factors, such as market access, play a minor role for this production system. The fitness score is defined as follows:

$\begin{equation} score_{ijl} = \frac{suit\_index_{ijl}-min(suit\_index_{ijl})}{max(suit\_index_{ijl})-min(suit\_index_{ijl})} \quad l = low\ input \quad \forall i\ \forall j \end{equation}$

where $$score_{ijl}$$ and $$suit\_index_{ijl}$$ are the fitness score and the biophysical suitability score for grid cell $$i$$, crop $$j$$ and production system $$l$$, respectively. Similar to the fitness score for subsistence farming, we apply the min-max normalization approach to create an index in the range of 0 and 1.
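The min-max normalization used for both the subsistence and the low-input scores can be sketched as follows. The helper name `min_max` and the sample values are hypothetical. The sketch also assumes that the minimum and maximum are taken over the grid cells of a single crop and production system, and that a constant input maps to zero; neither choice is stated in the text above.

```python
def min_max(values):
    """Rescale a dict of raw values (keyed by grid cell) to the range [0, 1]."""
    lo = min(values.values())
    hi = max(values.values())
    if hi == lo:
        # Assumption: with no variation across cells, every score is set to 0.
        return {cell: 0.0 for cell in values}
    return {cell: (v - lo) / (hi - lo) for cell, v in values.items()}


# Hypothetical rural population density (or suitability index) per grid cell.
raw = {1: 10.0, 2: 60.0, 3: 110.0}
score = min_max(raw)
```

The cell with the lowest raw value receives a score of 0, the cell with the highest value a score of 1, and all other cells are scaled linearly in between.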

We assume that high-input rainfed and irrigated production systems mostly consist of medium to large commercial farms, whose location is largely determined by profit maximization. To simulate this, the fitness score is defined as the square root of the product of market access and potential revenue:

$\begin{equation} score_{ijl} = \frac{\sqrt{access_{i} \times rev_{ijl}}-min(\sqrt{access_{i} \times rev_{ijl}})}{max(\sqrt{access_{i} \times rev_{ijl}})-min(\sqrt{access_{i} \times rev_{ijl}})} \quad l \in \{high\ input,\ irrigated\} \quad \forall i\ \forall j \end{equation}$

where $$access_{i}$$ is market access in grid cell $$i$$, approximated by the inverse of travel time to large cities, and $$rev_{ijl}$$ is potential revenue, calculated as the product of potential yield ($$pot\_yield_{ijl}$$) for grid cell $$i$$, crop $$j$$ and system $$l$$ and the national-level crop price ($$price_j$$):

$\begin{equation} rev_{ijl} = pot\_yield_{ijl} \times price_{j} \end{equation}$

Again, we use the min-max normalization to create an index in the range of 0-1.
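The score for the commercial systems can be sketched in the same spirit. The function name `commercial_scores`, the input layout and all numbers are hypothetical. The sketch covers a single crop and production system and assumes the normalization runs over grid cells.

```python
import math


def commercial_scores(access, pot_yield, price):
    """Min-max normalized sqrt(access_i * rev_i) for one crop and system.

    `access` and `pot_yield` are dicts keyed by grid cell; `price` is the
    national-level crop price. Potential revenue is pot_yield * price.
    """
    raw = {i: math.sqrt(access[i] * pot_yield[i] * price) for i in access}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # Assumption: no variation across cells gives a score of 0 everywhere.
        return {i: 0.0 for i in raw}
    return {i: (v - lo) / (hi - lo) for i, v in raw.items()}


# Hypothetical inputs for three grid cells.
access = {1: 1.0, 2: 0.25, 3: 0.0}     # inverse travel time, arbitrary units
pot_yield = {1: 4.0, 2: 4.0, 3: 9.0}   # potential yield, e.g. t/ha
price = 100.0                          # national crop price per tonne

comm_score = commercial_scores(access, pot_yield, price)
```

In this example cell 3 has the highest potential yield but no market access, so it ends up with the lowest score. The square root dampens the influence of very large access or revenue values before the result is rescaled.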

### Priors

We use the fitness scores as a basis to estimate the priors. The priors for subsistence farming are calculated as follows:

$\begin{equation} prior_{ijl} =\frac{score_{ijl}}{\sum_{i}{score_{ijl}}} \times crop\_area_{jl} \quad l = subsistence \quad \forall i\ \forall j \end{equation}$

where $$prior_{ijl}$$ is the prior crop area in ha for subsistence farming defined for each grid cell $$i$$ and crop $$j$$ and $$crop\_area_{jl}$$ is the total physical area per crop for the subsistence production system taken from the statistics.

The prior areas for the other three systems are estimated by distributing the residual grid cell cropland (i.e. after subtracting the prior area for subsistence production systems) using the scores as weights:

$\begin{equation} prior_{ijl} = \frac{score_{ijl}}{\sum_{j} \sum_{l} score_{ijl}} \times residual\_cropland_{i} \quad l \in \{low\ input,\ high\ input,\ irrigated\} \quad \forall i\ \forall j \end{equation}$

where $$residual\_cropland_{i}$$ is the grid cell cropland after subtracting the prior for subsistence farming. If the prior area for subsistence farming is larger than the cropland in the grid cell, all the priors for the other production systems are set to zero.

Finally, the priors are converted into crop area shares ($$\pi_{ijl}$$), which are an input into the cross-entropy objective function:

$\begin{equation} \pi_{ijl} = \frac{prior_{ijl}}{\sum_{i}{prior_{ijl}}} \end{equation}$
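The last two steps, distributing the residual cropland of a grid cell and converting prior areas into shares, can be sketched as below. The function names `residual_priors` and `prior_shares`, the data layout and the numbers are hypothetical. The handling of an all-zero score or prior total is an assumption added to keep the example robust.

```python
def residual_priors(scores_i, cropland_i, subsistence_prior_i):
    """Split the residual cropland of ONE grid cell over (crop, system) pairs.

    `scores_i` maps (crop, system) to the fitness score in this cell for the
    low-input, high-input and irrigated systems.
    """
    residual = cropland_i - subsistence_prior_i
    total_score = sum(scores_i.values())
    if residual <= 0.0 or total_score == 0.0:
        # Subsistence prior uses up the cell (or no positive score): all zero.
        return {key: 0.0 for key in scores_i}
    return {key: residual * s / total_score for key, s in scores_i.items()}


def prior_shares(prior_by_cell):
    """Convert prior areas of ONE crop and system into shares over grid cells."""
    total = sum(prior_by_cell.values())
    if total == 0.0:
        return {cell: 0.0 for cell in prior_by_cell}  # assumption: no prior info
    return {cell: area / total for cell, area in prior_by_cell.items()}


# Hypothetical grid cell: 100 ha cropland, 20 ha already assigned to subsistence.
scores_cell = {("maiz", "L"): 0.75, ("rice", "I"): 0.25}
priors_cell = residual_priors(scores_cell, cropland_i=100.0, subsistence_prior_i=20.0)

# A cell where the subsistence prior exceeds the available cropland.
priors_full = residual_priors(scores_cell, cropland_i=10.0, subsistence_prior_i=15.0)

# Hypothetical prior areas (ha) of low-input maize in three grid cells.
pi = prior_shares({1: 60.0, 2: 20.0, 3: 0.0})
```

Note that the first function works within a grid cell (weights over crops and systems), whereas the second works within a crop and system (shares over grid cells), mirroring the different summation indices in the two equations.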

## Constraints

The allocation of crop area is determined by either minimizing the cross-entropy function or maximizing the fitness scores subject to a set of constraints:

1. A constraint defining the range of permitted physical area shares:

$\begin{equation} 0 \leq s_{ijl} \leq 1 \end{equation}$

1. A constraint that forces the sum of allocated physical area shares for each crop $$j$$ and production system $$l$$ over all grid cells $$i$$ to equal one. This ensures that all physical crop area is allocated to grid cells.

$\begin{equation} \sum_{i}s_{ijl}=1 \qquad \forall j\ \forall l \end{equation}$

1. A constraint, which specifies that the total physical area allocated to a grid cell $$i$$ is less than or equal to the available cropland ($$avail_{i}$$) in grid cell $$i$$.

$\begin{equation} \sum_{j} \sum_{l}crop\_area_{jl} \times s_{ijl} \leq avail_{i} \qquad \forall i \end{equation}$

where $$crop\_area_{jl}$$ is the total national-level physical area for crop $$j$$ and production system $$l$$.

1. A constraint, which ensures that the allocated physical area is equal to the sub-national statistics ($$sub\_crop\_area_{jk}$$) that are available for statistical reporting unit $$k$$ (i.e. a region or province) and crops $$J$$. Crops for which only national level information is available are not affected by this constraint.

$\begin{equation} \sum_{i \in k} \sum_{l}crop\_area_{jl} \times s_{ijl} = sub\_crop\_area_{jk} \qquad \forall j \in J\ \forall k \end{equation}$

1. A constraint, which specifies that the total allocated physical area for all irrigated crops $$L$$ in grid cell $$i$$ does not exceed the total available irrigated area ($$irr\_area_{i}$$) in that grid cell.

$\begin{equation} \sum_{j \in L}crop\_area_{jl} \times s_{ijl} \leq irr\_area_{i} \qquad \forall i \end{equation}$

1. A constraint (only for the max_score model) that allocates the subsistence production system crops in proportion to rural population density. In cases where the biophysical conditions are considered unsuitable for a certain crop, we assume a zero allocation:

$\begin{equation} \bar{s}_{ijl} = \frac{rur\_pop_i}{\sum_{i}{rur\_pop_i}} \quad l = subsistence \quad \forall i\ \forall j \end{equation}$

where $$\bar{s}_{ijl}$$ is the share of physical area allocated to grid cell $$i$$ for crop $$j$$ and production system $$l$$ and $$rur\_pop_i$$ is the rural population density in grid cell $$i$$.
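The constraints above can also be read as a feasibility check on a candidate allocation. The sketch below tests the bounds, sum-to-one, cropland and irrigated area constraints; the sub-national and subsistence constraints are left out for brevity. The function name `check_allocation`, the system code `"I"` for irrigated, the tolerance and all data are hypothetical.

```python
from collections import defaultdict


def check_allocation(shares, crop_area, avail, irr_area, irrigated="I", tol=1e-9):
    """Return a list of (constraint name, index) pairs that are violated.

    `shares` maps (cell, crop, system) to s_ijl, `crop_area` maps
    (crop, system) to national area, `avail` and `irr_area` map cell to area.
    """
    violations = []
    share_sum = defaultdict(float)   # sum of shares per (crop, system)
    cell_area = defaultdict(float)   # allocated area per grid cell
    cell_irr = defaultdict(float)    # allocated irrigated area per grid cell

    for (i, j, l), s in shares.items():
        if s < -tol or s > 1.0 + tol:
            violations.append(("bounds", (i, j, l)))
        area = crop_area[(j, l)] * s
        share_sum[(j, l)] += s
        cell_area[i] += area
        if l == irrigated:
            cell_irr[i] += area

    for jl, total in share_sum.items():
        if abs(total - 1.0) > tol:
            violations.append(("sum_to_one", jl))
    for i, area in cell_area.items():
        if area > avail[i] + tol:
            violations.append(("cropland", i))
    for i, area in cell_irr.items():
        if area > irr_area[i] + tol:
            violations.append(("irrigated", i))
    return violations


# Hypothetical data: two crops and systems, two grid cells.
crop_area = {("maiz", "L"): 100.0, ("rice", "I"): 40.0}
avail = {1: 80.0, 2: 80.0}
irr_area = {1: 10.0, 2: 40.0}

even = {(1, "maiz", "L"): 0.5, (2, "maiz", "L"): 0.5,
        (1, "rice", "I"): 0.5, (2, "rice", "I"): 0.5}
shifted = {(1, "maiz", "L"): 0.5, (2, "maiz", "L"): 0.5,
           (1, "rice", "I"): 0.25, (2, "rice", "I"): 0.75}

problems_even = check_allocation(even, crop_area, avail, irr_area)
problems_shifted = check_allocation(shifted, crop_area, avail, irr_area)
```

In this example the even split places 20 ha of irrigated rice in cell 1, which has only 10 ha of irrigated area, so the irrigated constraint is flagged. Shifting rice toward cell 2 satisfies all four checks.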

Dijk, Michiel van, Ulrike Wood-Sichra, Yating Ru, Amanda Palazzo, Petr Havlik, and Liangzhi You. 2022. “Generating multi-period crop distribution maps for Southern Africa using a data fusion approach.”
Kullback, S., and R. A. Leibler. 1951. “On Information and Sufficiency.” The Annals of Mathematical Statistics 22 (1): 79–86. https://doi.org/10.1214/AOMS/1177729694.
You, Liangzhi, and Stanley Wood. 2006. “An entropy approach to spatial disaggregation of agricultural production.” Agricultural Systems 90 (1): 329–47. https://doi.org/10.1016/j.agsy.2006.01.008.
You, Liangzhi, Stanley Wood, and Ulrike Wood-Sichra. 2009. “Generating plausible crop distribution maps for Sub-Saharan Africa using a spatially disaggregated data fusion and optimization approach.” Agricultural Systems 99 (2): 126–40. https://doi.org/10.1016/j.agsy.2008.11.003.
You, Liangzhi, Stanley Wood, Ulrike Wood-Sichra, and Wenbin Wu. 2014. “Generating global crop distribution maps: From census to grid.” Agricultural Systems 127: 53–60. https://doi.org/10.1016/j.agsy.2014.01.002.
Yu, Qiangyi, Liangzhi You, Ulrike Wood-Sichra, Yating Ru, Alison K. B. Joglekar, Steffen Fritz, Wei Xiong, Miao Lu, Wenbin Wu, and Peng Yang. 2020. “A cultivated planet in 2010 – Part 2: The global gridded agricultural-production maps.” Earth System Science Data 12 (4): 3545–72. https://doi.org/10.5194/essd-12-3545-2020.