Spatial data analysis of malnutrition among children under-five years in Ethiopia

Background Childhood malnutrition is a major cause of child mortality under the age of 5 in the sub-Saharan Africa region. This study sought to identify the risk factors and spatial distribution of the composite index of anthropometric failure (CIAF). Methods Secondary data from 2000, 2005, 2011, and 2016 Ethiopian Health and Demographic Survey (EDHS) were used. The generalized geo-additive mixed model was adopted via the Integrated Nested Laplace Approximation (INLA) with a binomial family and logit link function. Results The CIAF status of children was found to be positively associated with the male gender, the potency of contracting a disease, and multiple births. However, it was negatively associated with family wealth quartiles, parental level of education, place of residence, unemployment status of mothers, improved sanitation, media exposure, and survey years. Moreover, the study revealed significant spatial variations on the level of CIAF among administrative zones. Conclusions The generalized geo-additive mixed-effects model results identified gender of the child, presence of comorbidity, size of child at birth, dietary diversity, birth type, place of residence, age of the child, parental level of education, wealth index, sanitation facilities, and media exposure as main drivers of CIAF. The results would help decision-makers to develop and carry out target-oriented programs. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01391-x.


Introduction
A considerable number of evidence from different sources currently indicate that nearly one in every nine children under the age of five around the globe had single or multiple forms of malnutrition, and nearly half of the deaths among the under-five children population (U5C) is due to poor nutrition [1]. Moreover, this prevalence is persistently higher in the Sub-Saharan African (SSA) region including Ethiopia [2]. In Ethiopia, the proportion of underweight (low weight for age) fell from 47.12 to 21% of children, stunting (too short for age) fell from 51.22 to 37%, and wasting (too low weight for height) fell from 10.37 to 7% of children from 2000 to 2019 [3].
Previous studies on the prevalence of undernutrition in Ethiopia had focused on a single conventional anthropometric index of stunting, underweight, wasting [4][5][6][7][8][9] separately proposed by World Health Organization (WHO) [2]. However, those conventional indices, when used alone, failed to give true estimates of the real impact of childhood malnutrition. The composite index of anthropometric failure (CIAF) might overcome such limitations through an aggregation of different forms of malnutrition measures [4][5][6][7]10].
Looking into the global experience, one can see that countries such as China, India, Bangladesh, Malawi, and others have adopted the CIAF approach to defining their U5C's nutritional status [11][12][13][14][15][16]. Yet such national wide studies are missing in most developing nations like Ethiopia. Besides, the majority of the previous studies carried out on nutritional status in this country have mainly focused on different socio-economic, demographic, or health-related covariates, disregarding spatial and nonlinear effects of covariates. Moreover, the studies didn't target the impact of climate and environmental covariates which is now widely acknowledged to be a threat to food security and nutrition around the globe [17,18]. As such climate change directly affects crop production and therefore food availability [19]. Rainfall shortfalls occur frequently, while temperature increase from time to time and hence increases the rate of evapotranspiration [20]. These bring drought, ultimately leading to lower crop yields and worsened food security and nutrition for vulnerable populations [21].
Further, those studies had only reported geographical variations of CIAF at higher (country/region) aggregated levels, and zonal level variation is rarely examined. A closer look into the contents of the studies shows that their CIAF data is masked in higher-level geographical aggregates, and had an adverse effect on lower levels (zones in this context). This is inconsistent with the decentralized system of governance in Ethiopia. The zone is administrative levels where operation planning, resource allocation, and implementation of health services are made. Hence identifying the problem of malnutrition and its variation among administrative zones would provide deeper insight into the country's health priorities for the under-five children population. Particularly, this would help Zonal health departments to make informed decisions and actions in their planning, follow-up, monitoring, and evaluation of health activities at lower levels [22][23][24][25]. Addressing the health inequalities is of considerable importance for the country, requiring a major reform that will ensure access to health care services for the poor and disadvantaged groups in the zones. Therefore, the main aim of this study was to identify linear fixed, nonlinear, and spatial effects of covariates on CIAF among under-five children in administrative zones of Ethiopia.

Materials and methods
Ethiopia is the second-largest country in SSA with a population of more than 96 million people, among which more than 13 million are under five. With a surface area of 1.1 million km 2, the country shares borders with Eritrea in the north, Djibouti and Somali in the east, Sudan and South Sudan in the west, and Kenya in the south. The government of Ethiopia divides the country into 11 administrative units (regions) including Addis Ababa, the capital city of the country. The regions were further divided into 72 third-level administrative zones mainly based on ethnic groups [26]. Figure 1 shows the spatial prevalence variability of CIAF among the 72 administrative zones of Ethiopia.

Study data
The dataset of this study consists of 29,792 children from 72 Ethiopian administrative zones over 16 years. We obtained Global Positioning System (GPS) coordinates for 2208 cluster coordinates from EDHS and those clusters were linked with the corresponding administrative III (zones) for spatial analysis. The DHS program randomly displaces cluster coordinates up to 2 and 5 km for urban and rural clusters respectively. This is made to ensure the confidentiality of data and the anonymity of participants. The administrative shapefiles were freely available databases on Global administrative units hosted through the DIVA-GIS project (http://www.diva-gis.org) [27]. This database represents the 72 Ethiopian zones across 11 regions that existed in 2016.

Variables
The outcome variable of this study is the composite index for anthropometric failure (CIAF). It is computed by grouping children whose height and weight above the age-specific norm (above − 2 z-scores) and those whose height and weight for their age is below the norm and thus experiencing one or more forms of anthropometric failure. As such, they were classified into seven categories: A-no-anthropometric failure, B-wasting only, Cwasting and underweight, D-wasting, stunting and underweight, E-stunting and underweight, F-stunting only, and Y-underweight only. The CIAF is then calculated by aggregating the six (B-Y) categories [5,7,[28][29][30]. The potential risk factors comprise the child, household, and maternal covariates selected based on findings in the literature [8,[31][32][33][34][35][36][37] (Fig. 2).

Geographic covariates
In addition to the child maternal and household DHS covariates, the contextual geographic covariates were obtained from the EDHS. This dataset can easily be linked to the original EDHS datasets through the cluster identifying number (ID) [1,17,18,[38][39][40]. These factors were selected based on previous studies which demonstrated that the nutritional status was correlated with key climate factors [41][42][43][44][45]. The key contextual climate factors in the study include the average number of drought episodes (1, low and 10, high), aridity index defined as the ratio of annual precipitation (0, most arid to 300, most wet), Built-up index (UR) between 0.00 (extremely rural) and 1.00 (extremely urban), Daytime Land Surface Temperature (LST: The average annual land surface temperature during the day), and Enhanced Vegetation Index (EVI).

Statistical analysis
We adopted three models including the generalized linear mixed model (GLMM), generalized additive mixed model (GAMM), and generalized geo-additive mixed model (GGAMM). The GLMM is a strictly linear regression that assumes a linear effect of the categorical and continuous covariates. Let W be an n by p design matrix for p-categorical covariates related to n = 29,792 children and which may be linearly associated with the response variable. The response variable Y is assumed to be a member of an exponential family [46] expressed as where a, b, and c are arbitrary functions, θ and φ is the natural and scale parameter respectively. For the i th child, the GLMM can be expressed as Where g(.) is the link function, β = (β 1 , …, β p ) is a vector of unknown univariate fixed effect parameters for the categorical covariates and U h(i) , h i =1, 2, . . .,72, is the zone level random effect.
The second model is GAM [47,48], which is a nonparametric generalized linear model. This model assumes nonlinear functions for the continuous covariates and linear effects of the categorical covariates. Let X be the n by q design matrix with the q-dimensional vector of continuous predictors with possible non-linear effects related to n = 29,792 under-five children.
where f = (f 1 , f 2 , …, f q ) T is a vector of unknown smooth functions.
The third model is GGAMM, which is an extension of GAM by adding a bivariate spatial function. Supposing that the children belong to zone h and cluster k in terms of location, the random spatial effects of the GGAMM is generally given as [49,50].
Without loss of generality, the GGAMM model in our study is expressed as;

}
The nonlinear term f spat (lon k , lat k ) is a function of geographical coordinates of the k th cluster where lon k and lat k the longitude and latitude are respectively. The parameter f spat (lon k , lat k ) is random effects that capture the unobserved spatial heterogeneity at location k, for which some are spatially structured and the others are unstructured. This is denoted as: Prior distributions In the Bayesian context, all the unknown fixed effect parameters and variance (β,σ 2 ) and unknown smooth functions (f l , l = 1, 2. . . q, f str and f unstr ) are considered as random variables and this requires specifying the appropriate prior assumptions. Accordingly, for the categorical fixed effect parameters β independent diffuse priors, π (Φ) ∝ constant are assumed. Also, the non-linear smooth parameters were modeled by second-order random walk prior given by . .,f with non-informative priors for f 1 , f 2 [51][52][53][54][55].
Finally, to capture the unstructured spatial random effects (f ustr ), exchangeable normal priors are assumed,f ustr $ N 0; 2 unst À Á , where 2 unst is a variance component that allows for over-dispersion and heterogeneity [51][52][53][54][55]. It is important to note at this point, the software R-INLA assign a prior to log ( 2 unst ) which by default is a log-gamma distribution. Yet to model the spatial correlation (structured) components, neighborhoods must be defined among the study zones. Two zones are neighbors if they share common boundaries. The spatial autocorrelation is modeled using a normal distribution with mean zero and a precision matrix that controls correlations between neighbors. Hence, the structured spatial effect of correlations between areas is achieved by incorporating the f str using the Generalized Markov Random Field (GMRF) priors, where the MRF is defined as [54,55] Such that 2 str designates the unknown precision parameter that controls the degree of similarities, and Q is the spatial precision matrix that encodes the spatial structure. Hence the (i,j) th element of the spatial precision matrix Q is given by Fig. 2 Covariates included in the model Where d~e denotes the area, d is adjacent to e, and n d is the number of adjacent areas to d.
Areas are neighbors if they share common borders [51][52][53][54][55] . Thus, area d with neighboring area e, would have conditional distributions given by where γ ¼ 1 Finally, it is important to note that for all unknown variances, the proposed prior distribution is the inverse-Gamma IG (a j , b j ) [51] where the constant parameters a j > 0 and b j > 0 with hyper-parameters a j = b j = 0.001 is the standard choice for a weakly informative prior [51][52][53][54][55]. For further illustration, let us assume that Φ presents the vector of unknown parameters in the model and L(.) be the likelihood which is the product of individual likelihoods. With Φ = {β, σ 2 , f l , f strc }. Under the Bayesian approach, the posterior distribution L(Φ/y) is a function of the prior distribution L(Φ) and the likelihood function L(y/Φ) [52,54,56].

Model diagnostics
In this study, we considered three models mentioned in the methodology part. We used formal criteria like deviance information criteria (DIC) [1,2] Watanabe_Akaike information criteria (WAIC) [3], and conditional predictive ordinate (CPO) [4,5] to compare the predictive capability of different competing models including GLMM GAMM and GGAMM. The model with the lowest DIC and WAIC and the highest values of CPO was considered as the best model to fit the data. In the selected model, covariates with an association that was significant at a 10% level were included in the multivariable logit model [49,50].
The EDHS provided a sample weight for each survey respondent that accounted for the unequal likelihood of selection and non-response. To confirm that our results are representative of the Ethiopian population, the sample weights were included in the analysis through the options pop-up command of "weight" representing the sample weight by "inla" function [57].

Model validation and evaluation
The test statistic such as the AUC (which is a function of specificity and sensitivity) is used to assess the overall predictive power of the selected model. The AUC measures the ability of the model to correctly predict the presences or absences: specificity measures the percentage of absences correctly predicted, while sensitivity measures the percentage of presences correctly predicted [6][7][8].

Sensitivity analysis for priors
Was performed to investigate the influence of the choice of different variance component priors on the final model to check whether the result was unchanged or not. The following specifications were used as recommended: IG (0.001, 0.001), IG (0.01, 0.01), IG (0.5, 0.0005), and (1, 0.026) of different degrees of uncertainty [51,[58][59][60][61][62]. The first two equal prior specifications (a = b) have often been as a standard choice on the variances of random effects [60], the third specifications were suggested by [58,59], for modeling the precision of the spatial effects in MRF models. The remaining prior choice has been proposed by [59,61,62] for capturing residual odds of range (0.5, 2.0). Generally, for the large dataset and well-identified parameters, any of the prior choices have minor different effects on the posterior inferences [51]. For the analysis purpose, the Integrated Nested Laplace Approximation (INLA) software package for R was used, this is due to the time efficiency as compared to MCMC [53,63,64].

Results
This study sought to find out the determinants and spatial patterns of CIAF among U5C using the four cohort DHS studies in Ethiopia. The posterior mean (standard deviation) together with the 95% credible intervals (Crs) is determined for the linear fixed effect parameters included in the model. These figures are comprehensively presented in Figs. 2, 3 below. Also, the positive coefficients corresponding to an increased risk for the CIAF are demonstrated in the table.

Fixed effects
The findings revealed that the CIAF status of children under five in Ethiopia is positively associated with the male gender. The male under-five children are 22% more likely to be malnourished as compared to female under-five children. Children with comorbidity are 1.407 times more likely affected by malnutrition as compared to those without comorbidity.
The odds ratio of U5C with large/small size at birth decreased /increased by 0.83/1.417 respectively as compared to average size children at birth; the CIAF status of U5C is negatively associated with singleton birth type, i.e. the odds ratio of singleton birth is decreased by 64.4% compared with the multiple birth types (Fig. 3).
Turning to the link between the CIAF status of children and the educational attainments of their parents, one observes a negative association between the two. Those children born from mothers with primary, and secondary and above were 0.958 and 0.602 times less likely affected by malnutrition as compared to those born from mothers with no formal education counterparts. This is markedly different compared to those having no formal education at all. Moreover, those children born from fathers who attained primary, and secondary, and above were 0.855 and 0.804 times less likely affected by malnutrition as compared to their counterparts from those having no formal education. Children born from households with unimproved access to sanitation were 1.116 times more likely affected by CIAF compared with those who got improved sanitation facilities (Fig. 4). Also, children from the low socioeconomic household were, more undernourished than those from highincome backgrounds (the lowest quintile index is positively associated with the CIAF (odds ratio > 1), compared with the middle quintile, the reference group. Particularly, compared against the benchmark category (medium wealth quintiles), children from households with the lowest and lower wealth quintile odds ratio were increased by 26.10 and 13% respectively. In contrast for those from rich and richest wealth quintiles, the odds ratio decreased by 0.924 and 0.797 compared to the baseline category. Finally, the CIAF status of U5C in Ethiopia is negatively associated with the EDHS survey years. As such the odds ratio of CIAF was decreased by 0.620, 0.529, and 0.651 for children whose information has been collected in 2005, 2011, and 2016 respectively compared to the 2000 group, the reference category in the study (Fig. 4).

Nonlinear effects
The results for the non-linear effects of the continuous covariates and the log-odds of CIAF after accounting for other variables with a 95% confidence interval were presented in Fig. 5 and Fig. 6. The estimates for most of the continuous variables were different from zero, suggesting that there are nonlinear effects (Table 1).
Accordingly, the log-odds of U5C having CIAF is directly but nonlinearly correlated with the age of the child (in months). This shows that the CIAF status of children is negatively associated with the lower age group (up to about 12 months). The influence of a child's age on its CIAF is considerably high between the age of 15 and the age of 24 months and almost gets static up to the age of about 52 months. The log-odds of CIAF status of U5C were negatively associated with the lower birth orders (birth order < 4). It increases between the birth orders of 4 to 6 and stabilizes at the same level thereafter. Looking into the mother's body mass index and its impact on the level of their child's log-odds of CIAF, one can observe the influence expressed in the form of a u-shape in the graph. As such, the result revealed that there is a positive association between the thinness condition of the mother and the CIAF status of their U5C (Fig. 5).
Turning to the last determinant under this category, we observe that the CIAF status of U5C is indirectly but non-linearly correlated with enhanced vegetation index (EVI) and urban-rural (UR) settlement ratios. Particularly, the risk of CIAF was found to be significantly high in the lower EVI and UR. Further, looking into the aridity and land surface temperature (LST) and their impact on the log-odds level of their child's CIAF there is a negative association between the lower values of the aridity and lst., The influence is depicted in the graphs which take a clear-cut u-shape forms. Finally, while a negative association is evident in the graphs, the logodds of CIAF increase with an increase in aridity (> 20) and lst (> 30) (Fig. 6).

Spatial effects
The spatial effects presented in Figs. 6, 7 are based on the GGAMM. It shows both positive and negative spatial Fig. 3 Posterior estimates of the fixed effects of childhood parameters for CIAF among U5C in Ethiopia effects on the log-odds of CIAF status among the underfive children in Ethiopia. The estimated cluster (structured) and zonal level (unstructured) estimates were generated for 2208 clusters and 72 Ethiopian third-level administrative zones respectively. Moreover, the variance estimates for the zone-level random effect and the cluster-level spatial effects (which account for the spatial autocorrelation) were nonzero and statistically significant ( Table 1).
The colors on the map show the log-odds scale, which reveals each administrative zone's contribution to the odds of a child's CIAF in U5C.
The colors ranged from green to red such that the green dot denoting a negative effect on the posterior parameters for the log-odds of CIAF, while the yellow and red dots show the positive effects which are associated with an increased risk of the log-odds of childhood malnutrition. Moreover, the map revealed that the presence of variation in CIAF among under-five children in Ethiopian clusters. Mainly clusters in the northern, northwestern, and some parts of the western areas were more likely to be suffered from malnutrition (Fig. 7). The results of the random effects presented in Fig. 8 above reveal the variation across the posterior parameters of Fig. 5 Posterior means of the non-parametric effects of age of the child, birth order, household size and BMI of the mother CIAF among under-five children at smaller spatial units in administrative zones across the country. The positive posterior values further reveal the increased log-odds of CIAF in the zones, but the negative values showed the decreased log-odds of CIAF. Notably, children from the zones including the eastern part of Tigray, Gondar, Waghmira, Guraghe, and Alaba Tembaro were more likely to have suffered from malnutrition. Yet most of the subjects from Oromia, Somali, Afar, SNNP regions, and Addis Ababa city were less likely affected. Moreover, the maps of the 95% CrI and SD indicate higher uncertainties about the CIAF distribution in Ethiopia.

Model goodness of fit and diagnostics
The area under the curve (AUC) for the given three models compared with that of the diagonal line that is always 0.500 (half of the graph) is computed. The GGAMM model has the lowest DIC and WAIC, but the highest values of CPO and AUC indicating that it is the best model for identifying the associated effects of childhousehold level covariates on CIAF (Fig. 9).

Sensitivity analysis
The results are robust to changes in the prior information. It shows that all the four distributions provided almost equivalent values for the parameters and therefore, our model is less sensitive to the choice of the hyperparameters. Hence, the standard choice of a = b = 0.001 which is the default for R-INLA was implemented to explore this study data with GGAMM ( Table 2)

Discussion
The focus of this study was identifying factors associated with CIAF among U5C and mapping the possible spatial pattern of CIAF at zonal level heath service institutional structures. As such, it is the first study in the country that provides the new measures for the prevalence of malnutrition by considering the aggregated indicators of anthropometric failure and applies them to larger nationally representative datasets in Ethiopia over time.
The major findings of the study demonstrated that, for all EDHS datasets, the CIAF is significantly higher compared to the prevalence reported by conventional indices of stunting, underweight, and wasting [65][66][67].
The linear fixed effects have demonstrated a significant positive association between the outcome variable (CIAF) and the demographic characteristics of the child's maternal level of education, and socio-economic level of households. The results, consistent with the    studies reported among the SSA [68] regions, revealed that boys are more likely to have CIAF compared to girls. This may be because the morbidity pattern of male children under the age of 5 is more prevalent than female counterparts [68]. Our study further found out that odds of CIAF among children from a rural area are more prevalent compared to those who live in urban areas. This might be because the rural areas are less developed in healthcare facilities and communities lack awareness of food intake. A further look into the results shows that children from the lowest socio-economic households are more likely to experience CIAF compared to children from the richest households. This is also consistent with previous studies conducted in other developing countries [14,34,38,[69][70][71][72][73]. This might be because in rural areas and in the poorest wealth index families; the access to healthcare facilities and a balanced diet is limited which would lead to inadequate nutrient intakes. Moreover, parents with a limited source of income are unable to purchase food and health care services for their children, which, in effect, contribute to undernourishment.
Finally, the CIAF status is found to have negatively and significantly associated with the educational attainments of parents. The evidence also supports the evidence reported by several studies investigating child nutrition [70,[73][74][75][76]. The results suggest that the educated parents have a better knowledge of a balanced diet and the health of their children. Strengthening this implication, the risk of malnourishment (CIAF) is almost 43.8% lower for children whose mother attains secondary or higher education than fathers with similar educational attainments. Of course, it is also important to consider the employment status of the mothers in this respect. With more than 63% of women in Ethiopia being unemployed and usually spend more of their time with their children compared to their husbands, they would afford to provide childcare and nourishment services than the fathers could do. Thus, while the level of education significantly contributes to the mothers' capacity for child care and nourishment, the time factor might have complemented this impact.
Turning to another determinant, the CIAF status of under-five children was negatively associated with improved sanitation facilities. This might be due to the fact that households with improved sanitation reduce the exposure to additional diseases hence decreases the undernutrition of children. However, the use of improved/ unimproved water has no significant effects on the CIAF status of children. This might be connected to the fact that in Ethiopia, only 38% of households are using improved sanitation, but more than 61.5% of inhabitants can use improved water.
Apart from socio-demographic determinants the mothers' access to mass media use also impacted the CIAF status of the children. As such, a significant negative association is observed between the two variables, suggesting that women who are exposed to media learn about the diet and health of their children and allocating even the limited income to devote to the health nutrition of their children. This finding is consistent with previous research evidence reported in other countries [9,70,75,77,78]. Further, the CIAF status was negatively associated with the DHS year in Ethiopia. Particularly, the risk of malnourishment (CIAF) is almost 10.9, 31, and 39.5% less likely to be malnourished respectively in 2005, 2011, and 2016 as compared to the reference category (the 2000 group). This decline in under-nutrition prevalence over the past two decades evidenced that Ethiopia continued to address childhood malnutrition.
Turning to another evidence in the data, the nonlinear relationship between the age of the child (in months) and CIAF prevalence was similar to that of studies conducted elsewhere [72,73]. Moreover, children in the youngest age group (in months) had significantly lower CIAF odds than the older age groups. This result is also consistent with different studies [14,[72][73][74]. This may be because breastfeeding practices common among mothers who spent much of their time with their children [79]. Moreover, the highest CIAF risk was observed in the age group of almost 18 to 31 months and this is because in this age group mothers do not produce sufficient breast milk to fulfill the required nutrition for their children.
Finally, the predicted values from the GGAMM revealed that child under-nutrition (CIAF) remains high in almost all parts of the country with a limited exception that Zones in Addis Ababa, Gambella, Somali, and

Conclusion
The wide-ranging analyses made in the last sections of the paper demonstrate that the study, as the first of its type in using the GGAMM approach in Ethiopia, has successfully modeled the CIAF at the zonal level by using both the EDHS and geospatial covariates. Through this comprehensive modeling, it has found out that child sex, presence of comorbidity, size of child at birth, dietary diversity, birth type, place of residence, child age, parental educational level, wealth index, sanitation facilities, media exposure of mothers and the survey years have direct links with the risk of under-nutrition among children under the age of five in Ethiopia. Moreover, the minimum body mass index, the higher birth order, and the lower urban-rural ratio were associated with the rising of CIAF. Still, as a major contribution to the inquiry on the subject, this study explored and depicted the spatial maps that revealed the areas at the high (low) estimated risk of undernutrition (CIAF) among under-five children in Ethiopian administrative zones. Finally, the generated estimated values of CIAF and the identified factors would provide essential information that helps the decision-makers to allocate limited resources and program implementation in the zones that need more attention in Ethiopia.

Acknowledgments
The datasets used in this study were obtained from the DHS program thanks to the authorization received to download the dataset on the website.

Authors' contributions
HMF was involved in this study from data management, data analysis, drafting, and revising the final manuscript. TZT and EKM contributed to the conception, design, and interpretation of data, as well as to manuscript reviews and revisions. The authors read and and approved the final manuscript.

Not applicable
Availability of data and materials The dataset used and analyzed during the current study is available from the corresponding author on reasonable request.