Skip to main content

Gaussian process emulation to improve efficiency of computationally intensive multidisease models: a practical tutorial with adaptable R code



The rapidly growing burden of non-communicable diseases (NCDs) among people living with HIV in sub-Saharan Africa (SSA) has expanded the number of multidisease models predicting future care needs and health system priorities. Usefulness of these models depends on their ability to replicate real-life data and be readily understood and applied by public health decision-makers; yet existing simulation models of HIV comorbidities are computationally expensive and require large numbers of parameters and long run times, which hinders their utility in resource-constrained settings.


We present a novel, user-friendly emulator that can efficiently approximate complex simulators of long-term HIV and NCD outcomes in Africa. We describe how to implement the emulator via a tutorial based on publicly available data from Kenya. Emulator parameters relating to incidence and prevalence of HIV, hypertension and depression were derived from our own agent-based simulation model and other published literature. Gaussian processes were used to fit the emulator to simulator estimates, assuming presence of noise for design points. Bayesian posterior predictive checks and leave-one-out cross validation confirmed the emulator’s descriptive accuracy.


In this example, our emulator resulted in a 13-fold (95% Confidence Interval (CI): 8–22) improvement in computing time compared to that of more complex chronic disease simulation models. One emulator run took 3.00 seconds (95% CI: 1.65–5.28) on a 64-bit operating system laptop with 8.00 gigabytes (GB) of Random Access Memory (RAM), compared to > 11 hours for 1000 simulator runs on a high-performance computing cluster with 1500 GBs of RAM. Pareto k estimates were < 0.70 for all emulations, which demonstrates sufficient predictive accuracy of the emulator.


The emulator presented in this tutorial offers a practical and flexible modelling tool that can help inform health policy-making in countries with a generalized HIV epidemic and growing NCD burden. Future emulator applications could be used to forecast the changing burden of HIV, hypertension and depression over an extended (> 10 year) period, estimate longer-term prevalence of other co-occurring conditions (e.g., postpartum depression among women living with HIV), and project the impact of nationally-prioritized interventions such as national health insurance schemes and differentiated care models.

Peer Review reports


The need for emulation

In situations where empirical data on disease impact(s) are not universally available or a randomized controlled trial may not be feasible, complex mathematical computer models (referred to as “simulators”) [1,2,3] can characterize disease prevalence, forecast incidence, and help identify cost-effective approaches for meeting short- and long-term health system needs [4,5,6]. Though simulation models play a critical role in helping public health decision-makers synthesize data from multiple sources and compare anticipated outcomes over time, the limitations of these models are not trivial. Simulation models based on central processing units (CPUs), such as grid computing and computing clusters, [7] require significant infrastructure that can incur high costs for hardware and oversight. Even with access to high-computing infrastructure, long run times of several hours for a single simulation and large numbers of input and output parameters can greatly inhibit fitting such models, [8] and in turn restrict analysts to considering only a subset of all possible simulated scenarios [9, 10]. Furthermore, because of their complexity, components of microsimulation models can still be perceived as a black box [11, 12] because their functions and behaviors are often not exhaustively described or immediately accessible at the time of publication, all of which makes it difficult for external users to interpret and adapt model processes for their local context.

Emulators are one tool that can help mitigate these limitations [9]. An emulator, also known as a metal-model, [13] is an approximation of one or more complex mathematical model(s) that is constructed using a training sample of simulator runs [14] and computationally more efficient. Emulators reduce costs by negating the need for super-computing infrastructure and, once developed, can substantially shorten the amount of time needed to implement model runs and interpret results.

In the last decade there has been an accelerated demand for integrated responses [15,16,17,18,19,20] to the growing burden of non-communicable diseases (NCDs) – including cardiovascular disease, cancers, diabetes, and mental illness – among people living with HIV (PLWH) in low and middle income countries (LMICs) [21,22,23,24]. In response to this call, the authors of this paper recently extended an established agent-based model of HIV transmission and treatment impact to include hypertension in two rural settings in Sub-Saharan Africa [24]. The authors’ simulation model was able to generate robust estimates of changing risks across age groups and predict growing population burdens of HIV and hypertension as comorbidities; however, the simulations were resource and time intensive and it is unlikely that novice modelers would be able to adapt the model’s components without input from an expert biostatistician. Other simulators of HIV and non-communicable diseases have met the same challenges [25]. To facilitate a greater understanding and usability of these complex models, we therefore share our experience developing an open-source emulator that approximates estimates from two simulators over an input subdomain of parameters related to HIV, hypertension and depression in a Sub-Saharan African (SSA) country with a generalized HIV epidemic. The tutorial presented in this paper (i) describes the steps involved in emulator development and validation, (ii) illustrates how to interpret the emulator’s predictive accuracy and outputs in relation to those from each simulator using case study data from Kenya, and (iii) provides annotated, open-source and adaptable R code to facilitate the use of the discussed methods in practice. We expect researchers with a basic understanding of Bayesian statistics and some familiarity with R software to be able to implement this protocol independently.

The emulation method described in this paper relies on well-established and validated Gaussian processes [14, 25, 26]. To the best of our knowledge, Gaussian-process emulation has not yet been used to mimic simulation models that predict the burden of HIV-comorbidities over time, nor has it been described in sufficient detail to enable use by non-biostatisticians. Thus, this tutorial is scientifically significant in that is uses a didactic approach to demystify Gaussian process emulation methods, and offers a new tool that can potentially improve public health decision-making with less resources.

Methods: emulator development and validation

Simulator description and source data

Evidence overwhelming indicates that the disproportionate burden of non-communicable diseases among people living with HIV – compared to individuals not living with HIV – will increase rapidly in the coming decades, [27, 28] and that most health systems in Sub-Saharan Africa are not currently equipped to treat the more than 15 million patients who require integrated HIV and NCD care [19]. In this tutorial we focus on two published simulation models that have estimated the future burden of HIV and non-communicable diseases in Kenya, as well as the costs and epidemiological impact of strengthening integrated care systems in the country. The two simulators were selected as examples for this tutorial because of their longer-term (i.e., > = 10 year) forecast periods and available details surrounding their design points. The two simulators were also selected because the rise of non-communicable diseases among persons living with HIV in Kenya is indicative and representative of the rise of NCDs on the continent, which are estimated to overtake infectious diseases worldwide by 2030 [19, 27, 28].

The first model is the authors’ Integrated Modeling of Epidemiologic and Economic Long-term Outcomes in Africa (inMODELA) microsimulation model, [29] which simulates HIV and hypertension in Kenya and South Africa from 2018 to 2028. The model is an extension of the Sexually Transmitted Diseases Simulator (STDSIM), [30, 31] a stochastic agent-based model that simulates transmission of HIV and other sexually transmitted diseases (STDs) through dynamic sexual networks. inMODELA was partly calibrated using population surveillance data of hypertension and HIV from western Kenya. National-level data on the hypertension prevalence were extracted from the 2015 Kenya STEPwise Approach to NCD Risk Factor Surveillance (STEPS) survey [32] while HIV modelling was calibrated using reports from 2007 and 2012 Kenya AIDS Indicator Surveys (KAIS) and the 2016 Kenya County HIV profiles, [33, 34] the most recent national data available at the time of simulator development. HIV was modelled as having four stages: early infection, asymptomatic, symptomatic and AIDS, and treatment with antiretroviral therapy was operationalized as individual ART demand and health system capacity to meet ART demand. Hypertension was modelled as being normotensive or hypertensive (i.e., blood pressure > 140/90 mmHg), accounting for the potential effects of age, gender, and economic development on hypertension risk. Key outputs of the inMODELA model include total annual mortality, incidence, and prevalence, as well as the health system burden of hypertension, HIV, and comorbid HIV and hypertension. Additional details of the inMODELA simulator are available in a separate publication [29].

The second model in this tutorial is an individual-based simulator initially developed by Smit et al. for Zimbabwe [35] and adapted for Kenya [36]. The model estimates current and future births, deaths, HIV disease and treatment, as well as prevalence/incidence of cardiovascular disease, chronic kidney disease, depression, diabetes, hypertension, and other NCDs and cancers among adults for the period of 2018 through 2035. Simulator calibration relied on data from the Joint United Nations Programme on HIV/AIDS, [37] 2016 Global Burden of Disease estimates [38] and other sources [36].

Relevant input parameters used to calibrate the two simulation models are summarized in Supplementary Table 1.

Overview of the emulation process

Figure 1 summarizes the steps that were used to develop and validate an emulator to approximate epidemiological outputs projected over 10 years by the multidisease simulators. To develop the emulator, we first abstract relevant parameters from the more complex simulation models to serve as the emulator’s design points. In this example, parameters related to the prevalence of HIV, hypertension, comorbid HIV and hypertension, and depression among PLWH were selected. Prevalence parameters were ascertained for 2018 and for 2028. Second, we use Gaussian processes (GP) to approximate the mean and variance of each simulator’s outputs, assuming presence of noise for our design points. Third, we use Bayesian posterior predictive analysis to analyse the credibility of future emulator predictions based on the posterior distribution. Lastly, we use leave-one-out cross validation to confirm the emulator’s predictive accuracy and compare emulator estimates to simulation results.

Fig. 1
figure 1

Overview of emulator development and validation process

Figure 1 depicts the steps used to develop and validate the emulator approximating epidemiological HIV and NCD data from two established simulators. First, the R statistical package is installed. Second, relevant parameters related to the 10-year prevalence of HIV, hypertension, comorbid HIV and hypertension, and depression are abstracted from published simulators to serve as the emulator’s design points, and entered into R. Third, Gaussian processes (GP) approximate the mean and variance of each simulator’s outputs, assuming presence of noise for the emulator’s design points. Fourth, Bayesian posterior predictive analysis is used to infer the credibility of future HIV and NCD prevalence(s) based on the posterior distribution. Fifth, leave-one-out cross validation confirmed the emulator’s predictive accuracy. Lastly, the emulator’s predictions are compared and interpreted in relation to those of the simulator(s)

Step 1. Install the program

This tutorial uses the GauPro package [39], the rstanarm package [40] and the loo package [41, 42] in R version 4.1.2 for emulator construction and application. To replicate results from this tutorial, or to adapt this emulator to new simulation data, the latest version of the free R software environment needs to be installed, and can be downloaded from

Step 2. Select key design points from simulation model(s)

Not all parameters of a complex simulation model will be needed for emulation. Only those most informative for your research question should be ascertained. In this example using simulator data from Kenya, parameters relating to the prevalence of HIV, hypertension, and comorbid HIV and hypertension were ascertained from the inMODELA simulation model [29] for the period of 2018 through 2028. Parameters relating to the prevalence of depression among people living with HIV were ascertained from the Smit et al. model [36] for the period of 2018 through 2030. (Table 1) To reconcile the different forecasting periods used by the two simulation models, we assumed the prevalence of depression in 2028 to be the average prevalence for years 2025 and 2030. Supplementary Table 2 specifies the design points used to emulate annual HIV and NCD prevalence for the period of 2018 through 2028.

Table 1 Summary of final design points used for emulation, based on example national data from Kenya

Update the emulator’s R code to reflect relevant design points and corresponding emulation time period(s) abstracted from simulator data.

Step 3. Fit Gaussian processes to simulation data

A Gaussian process refers to a collection of any finite number of random variables which have a Gaussian (normal) distribution [25]. A GP emulator uses a statistical model to fit a Gaussian process to a dataset, and is defined by (1) a mean function describing the mean at any point of the input space and (2) a covariance function describing the covariance between points [25, 39]. When emulating a stochastic simulator, the unknown function is assumed to be the expectation of the ith simulator output denoted as fi(x).

We construct our GP emulator such that, for each simulator output fi(x), we select active variables (xA) and then emulate using the following process:


The first part of the emulator, \(\sum_{i=1}^q{\beta}_i{g}_i\left({x}^A\right),\) is a polynomial of active inputs xA running from i = 1, …q chosen from the simulator outputs, βi are the regression coefficients, and gi(˙) are the deterministic functions of xA which are known [43]. The second part of the emulator, μi(xA), represents a collection of any finite number of random variables which have a Gaussian distribution [43]. Supposing that μi(xA) is a Gaussian process with zero mean and known variance, we can define it as:

$$\mu \left({x}^A\right)\sim GP\left(0,c\left({x}^A,{x^A}^{\prime}\right)\right)$$

where c(xA, xA) is a covariance function that determines the relationship between μ(xA) and μ(xA) based on the distance between xA and xA. The Gaussian process used to develop the current emulator has a covariance structure given by:

$$Cov\left({\mu}_i\Big({x}_1^A\right),{\mu}_i\left({x}_2^A\right)\Big)={\sigma}_i^2\exp \left[\frac{-{\left|{x}_1^A-{x}_2^A\right|}^2}{\theta_i^2}\right].$$

The parameter σ in eq. 3 can be varied to obtain the desired amount of waves in the emulator, whereby a smaller value of σ results in less extreme waves [44]. θ > 0 are unknown correlation length parameters where large values of θ indicate a smooth output function of the ith input and small values suggest high non-linearity [45]. The last part of the emulator, δi(x), models the effects of inactive variables as random noise.

For each output of interest, the emulator provides the expectation E [fi(x)] and variance var(fi(x)) at x for every output given by i = 1, 2, …n where x denotes a vector of the emulator inputs. We evaluate fi(x) as the prevalence value(s) projected from 2018 to 2028 for HIV, hypertension, comorbid HIV and hypertension, and depression. The results are gathered into a vector D: \({D}_i=\Big(f{\left({x}_1^A,\dots, f\left({x}_q^A\right)\right)}^T\) where q represents the number of design points. The emulator is then plotted as an adjusted expectation and variance function of fj(x): ED(fj(x)) and VarD(fj(x)).

Step 4. Run graphical posterior predictive checks of emulator fit

In this step, we use graphical displays to check that the disease burden trends produced by our emulator look similar to the simulator data we observed. Our emulator uses Bayesian posterior predictive analysis to assess the credibility of future observable data based on the posterior distribution [46]. We analyze the Bayesian posterior distributions graphically to check predictive accuracy of the posterior distributions, plotting simulated y values from the posterior distribution against the actual values of y. The posterior predictive distribution is defined as:

$$p\left({y}^{rep}|y\right)=\int p\left({y}^{rep}|\theta \right)p\left(y|\theta \right)p\left(\theta \right) d\theta$$

where yrep represents future data that could be drawn from the posterior distribution, y is the current simulator data, θ is the model parameter, and p(y| θ) is the sampling density for future data conditional on the parameter [46]. The size of the posterior sample was based on 50 draws which showed to be sufficient enough to generate the yrep matrix from the posterior predictive distribution [40].

Step 5. Validate the emulator’s accuracy

We use the Bayesian leave-one-out cross validation (LOOCV) technique to confirm the emulator’s predictive accuracy. The LOOCV method applies the log-likelihood evaluation of posterior parameter values, [42] and is appropriate for smaller data sets and when similar distributions exist in the training and testing data [47]. LOOCV assesses the predictive ability of posterior simulations in which the data is iteratively partitioned into either calibration (training) sets or validation (test) sets. LOOCV is one of the most accurate ways to estimate how well a model will perform on unseen, “out-of-sample” data [42]. The calibration set is used to train the model and produce output values that are compared with the test set [48]. We validated the emulator’s accuracy using the loo package in R whereby the computation uses Pareto-smoothed importance sampling (PSIS) to regularize importance weights [42]. Following prior work [42, 49], a Pareto k estimate less than or equal to a 0.70 threshold was used to confirm reliability of the emulator’s performance. The LOOCV approach is specified in eqs. 57.

Given that data y1, …yn are independently modelled given the parameters θ then, \(p\left(y|\theta \right)={\prod}_i^np\left({y}_i|\theta \right)\). Suppose a prior distribution p(θ) gives a posterior distribution p(θ| y) and a posterior predictive distribution p(y˜| y) = ∫ p(y˜i|θ)p(θ| y). A predictive accuracy measure for n data points termed as expected log pointwise predictive density (elpd) for a new dataset is given as:

$$\sum\nolimits_1^n\int {p}_t\left(y{\sim}_i\right)\mathit{\log}\ p\left(y{\sim}_i|y\right) dy{\sim}_i$$

where pt(y˜i) is the distribution of the real data generation for y˜i. The values of pt(y˜i) are unknown and therefore cross-validation is used to approximate (5). The Bayesian leave-one-out (LOO) estimate of the predictive fit is:

$${elpd}_{loo}=\sum\nolimits_{i=1}^n\log p\left({y}_i|{y}_{-1}\right)$$


$$p\left({y}_i|{y}_{-1}\right)=\int p\left({y}_i|\theta \right)p\left(\theta |{y}_{-1}\right) d\theta$$

is the leave-one-out predictive density given the data without the ith data point.

To train the emulator, we followed the Emulation and History Matching Handbook for R [50] whereby we chose a number of training points equal to ten times the number of parameters. Given that we were estimating only one parameter (prevalence) for each of the four outcomes of interest, this resulted in 10 training points.

Results: emulator interpretation

Using outputs from the two simulation models of HIV, hypertension and depression burden, our emulator shows to be as accurate and computationally more efficient at predicting prevalence of these comorbidities in Kenyan populations over 10-years. On average, the inMODELA simulator took 11 hours and 53 minutes to perform 1000 runs on a high-performance computing cluster (HPC) and used 208 central processing units (CPU) cores, 2 graphics processing units (GPUs), and 1500 gigabytes (GBs) of random-access memory (RAM). The model by Smit et al. was coded in C++, ran using Xcode (Mac), and took a little over 3 minutes per iteration on an 8-core machine. Although run times were not overwhelmingly long, the Smit et al. simulation model faced computational constraints due to the large size of model outputs; this necessitated an additional post-processing step of aggregating and summarizing each batch of 10 model iterations in MatLab. By comparison, the present emulator was developed and validated on a Hewlett-Packard Intel Core i5 laptop with a 64-bit operating system, 2.50 CPU of 2.50 GHz and 8.00 GBs of RAM, and took only a few seconds for a single run using the same statistical program for development, validation and output interpretation.

Figure 2 shows the emulator’s projected mean prevalence over time for the example data in terms of: (A) HIV, (B) hypertension, (C) comorbid HIV & hypertension, and (D) depression among adults living with HIV. In each sub-figure of Fig. 2, black points should be interpreted as the design points (i.e., simulator outputs) used to fit the emulator, dashed red lines show the mean prevalence plotted as a function of the emulator’s predictions, solid blue lines represent the 95% Confidence Intervals (CI) for the emulator’s mean estimates, and the yellow lines are the 95% Confidence Intervals for the simulators’ mean estimates (Confidence Intervals for the annual mean prevalence of depression among PLWH were not publically available). We can then visually understand the emulated trends in disease burden to closely reproduce the trends projected by each simulator. For example, the emulator predicts that the mean prevalence of HIV will be 0.02776 [95% CI: 0.02774–0.02778] in the year 2027 compared to 0.02750 [95% CI: 0.02379–0.03121] predicted by the inMODELA simulator. The pointwise confidence intervals used for statistical uncertainty quantification are wider between each emulation year than at each annual point estimate (e.g., at the midpoint between 2027 and 2028, the mean prevalence of HIV is predicted to be 0.02573 (95% CI: 0.02219, 0.02928). The width of the credibility interval is largest between points and approaches zero near each point estimate due to the Gaussian process regression towards the mean. When considering the prevalence(s) of HIV, hypertension, and comorbid HIV & hypertension in this example, the emulator’s 95% Confidence Intervals for each mean estimate are very similar to those projected by the inMODELA simulator. Given the linearity and lack of noise in the original data, the emulator’s uncertainty ranges are within those of the inMODELA estimates. This indicates that our emulator accurately approximates outputs from the more complex model.

Fig. 2
figure 2

Gaussian process emulator for (a) HIV, (b) Hypertension, (c) Comorbid HIV & Hypertension and (d) Depression in Kenya

At each year, black dots represent the selected design points used to fit the emulator, the blue curved lines represent the predicted 95% uncertainty intervals for each design point, the dashed red lines are the mean prevalence for a given year plotted as a function presenting the emulator’s predictions and the yellow thick lines outside are the 95% confidence intervals for the original simulators. Sub-plot D does not include the 95% confidence intervals for the simulator because these data were unpublished

Figure 3 shows the Gaussian posterior predictive distributions for the example Kenyan data for HIV, hypertension, comorbid HIV & hypertension, and depression. The solid black lines can be understood as the distribution of y; the dashed red lines represent the y posterior predictive distribution, and the dotted grey lines represent the distribution of the simulations performed. We observe from panel A, B, C and D in Fig. 3 that the posterior predictive distributions denoted by post-pred y are not far off from the current fitted data denoted by y. Density values differ for each panel due to different prevalence values for the respective disease(s) on the x-axis, with greater imprecision for depression estimates (panel D) because of fewer available simulation data at each projection year.

Fig. 3
figure 3

Gaussian posterior predictive distributions for (a) HIV, (b) Hypertension, (c) Comorbid HIV & Hypertension and (d) Depression prevalence

The solid black lines represent the distribution of y ; the dashed red lines represent the y posterior predictive distribution and the dotted grey lines represent the distribution of simulations. The density values of the plots are different as generated by the system due to the differences in disease-specific prevalence values on the x-axis

For each output of interest, the LOOCV yielded Pareto k diagnostic values of < 0.70, indicating practical convergence rates and reliable Monte Carlo error estimates. Validating diagnostics for each of the emulator’s output are provided in Supplementary Figures 14.

Supplementary Figure 5a shows the emulator’s projected mean prevalence of HIV when the simulation model assumes that Kenya’s ART coverage targets are achieved (i.e., 90% of people with HIV are aware of their status, and, of those, 90% are enrolled in HIV careFootnote 1) due to the implementation of effective interventions along the care continuum. Supplementary Figure 5b shows the emulator’s projected mean prevalence of hypertension when Kenya’s hypertension treatment targets are reached (i.e., when 50% of persons with confirmed hypertension are enrolled in care and receiving drug therapy and counselling). Supplementary Figures 5c and 5d show the projected disease prevalences when one or both of these treatment targets are achieved. Supplementary Figure 6 shows the Gaussian posterior predictive distributions for these emulations. When considering the prevalence(s) of HIV and hypertension in the presence of hypothetical interventions that help achieve treatment targets, the emulator’s 95% confidence intervals for each mean estimate closely approximate those projected by the inMODELA model. Similar to emulator outputs based on disease prevalence simulations in the absence treatment, the emulator’s uncertainty ranges for prevalence estimates in the presence of wider treatment uptake are within those of the inMODELA simulation estimates. This indicates that our emulator accurately approximates outputs from more complex modelling of the disease-impact of chronic disease intervention(s). We also see a substantial lack of density values for each panel for the respective disease(s) on the x-axis in Supplementary Figure 6, further indicating high precision of the emulator for estimating treatment impacts on chronic disease burden over time. For all emulations of disease burden in the presence of higher intervention uptake, the Pareto k diagnostic values were < 0.70 (Supplementary Figures 710).


Observed accuracy and efficiency

Applying widely-used Gaussian processes, this tutorial details the steps used to develop a new, user-friendly emulator that can approximate multi-year estimates of chronic disease burden from two computationally intensive simulation models. We demonstrate that the emulator closely reproduced trends in disease burden projected by the published simulators from which our parameters were sourced [29, 36]. In this example, from 2018 through 2028, prevalence of HIV and depression among PLWH in Kenya is projected to decrease by approximately 2 and < 1 percentage points, respectively, with prevalence of hypertension increasing by 5 percentage points over the same 10-year forecast. Successful validation checks (Pareto k estimates < 0.70 for all emulations) confirmed the emulator’s predictive accuracy. Disease prevalence estimates were modelled in only a few seconds on a 64-bit operating system laptop with 2 CPU cores of 8 GBs of RAM using the emulator presented in this paper, while simulations [29] took more than 11 hours to perform 1000 runs on a high-performance computing cluster with 208 CPU cores, 2 GPUs and 1500 GBs of RAM. Thus, our emulator was able to more efficiently model disease burden over time without compromising the statistical accuracy of more computationally intensive simulators. Outputs from sensitivity analyses suggest that the emulator is equally efficient and reliable for approximating simulations of disease burden in the presence of effective treatment interventions.

Future applications

The emulator presented in this paper was developed and validated using 10-year demographic and epidemiological case study data from Kenya. However, the broader Gaussian processes described in this tutorial (and made available via the open source R code) are widely validated in the fields of public and environmental health as reliable methods for emulating results of complex and resource-intensive models, [9, 43, 49] including for those above and beyond Kenyan populations and adults living with chronic conditions. For example, a tutorial using HIV data from Uganda [8] found that history matching and emulation of an 18 output simulator had a 65% probability of fitting all simulator outputs and was several orders of magnitude faster to evaluate. A Bayesian optimization emulator with Gaussian processes [51] has similarly shown to adequately capture the input–output relationship of the OpenMalaria individual-based model (IBM) [52] of malaria transmission while improving upon the simulator’s overall goodness of fit. And a Gaussian process emulator of an IBM of microbial communities [53] has demonstrated an approximately 220-fold increase in computational efficiency, with the percentage of variance explained by the univariate emulator ranging from 83 to 99%. Thus, through the selection of alternative design points, the emulator in this tutorial has the potential to approximate other simulators outside of those for HIV, hypertension and diabetes. As is the goal with any emulation exercise, our emulator offers a statistical model that can be used as a surrogate for chronic disease simulators. Because our simple emulator showed to be valid and more efficient in mimicking 10-year prevalence of HIV comorbidities in the absence of intervention, the next step is to evaluate how the emulator will also be able to mimic future simulators that model disease burden in the presence of targeted interventions. For example, Hamilton et al. developed an agent-based simulation of HIV-1 transmission in Kenya to estimate the potential population-level impact of providing PrEP layered into standard care services over 10 years [54, 55]. These and other longitudinal simulator data offer ideal design points to further test the emulator’s predictive accuracy for modelling intervention effects on HIV treatment and prevention. Several microsimulation models have been developed to characterize and plan for the rapidly growing burden of non-communicable diseases in SSA [56,57,58] and elsewhere [59,60,61]. Also, though still in early stages, recent advances in computing power are now allowing large and complex ABMs to be simulated in reasonable amounts of time using desktop GPUs [62]. Yet, to the best of our knowledge, none of these NCD models have been made available to local public health decision makers through accessible and understandable platforms, which the present emulator may begin to correct.

Limitations and strengths of this tutorial

There are some limitations to the emulation methods presented in this paper. First, Gaussian process emulation was appropriate given the small number of design points being considered in this tutorial. However, GP emulators do not scale well when including many (e.g., > 50) inputs [9, 63, 64], such that several lower-dimensional emulators might be more appropriate when a greater number of simulator outputs are being considered [9]. Relatedly, because our original simulation model data were essential linear, the emulator increased the efficiency of our predictions (i.e., narrower uncertainty ranges). It is possible that our emulator may lose computational efficiency or yield estimates with greater uncertainty when applied to higher dimensional data or to data with asymmetric noise. Incorporating interval calibration [65], Monte Carlo [66], and other methods in future iterations of the emulator could help address this limitation. Second, we fit separate univariate GPs emulators to model each simulator output individually, which neglects any potential correlation [67] between the outputs [53] (e.g., between changes in the prevalence of hypertension and depression among PLWH over time). Future expansions of the emulator can address this issue by using a multivariate Gaussian process [68]. Lastly, as is common to mathematical modelling work, the simulation models selected for this tutorial suffered from incomplete surveillance data which restricted our ability to perform additional emulation procedures such as history matching [69] to reduce simulator input space or additional diagnostic checks [70] that rely on more robust training data.

Despite these limitations, the emulator presented in this paper offers an accessible tool for health policy makers who may not have a background in disease modelling. This will help to build the modelling capacity of local decision makers who are working to build integrated HIV and chronic disease programmes with limited resources [71]. While transparency surrounding microsimulation model development has increased in recent years as these models become more widespread, it is often impractical to document every detail of their functionality [11, 12]. Emulators do not address all transparency concerns related to black box modelling, but they can help address concerns related to communicating results from complex models for wider audiences. Thus, a key strength of this emulator is its simplicity; the step-by-step annotated code that was programmed using free R software and is available via an open-source repository can encourage future use and adaption at no cost to the user.

Secondly, our emulator uses Bayesian inferences rather than a frequentist approach in the posterior predictive analysis, which maximizes both the prior and currently available data. Third, we used the leave-one-out cross validation technique which offers an unbiased and reliable diagnostic check when similar distributions exist in the training and test data [47, 49, 70]. Fourth, our emulator fills a gap in the health decision-making toolbox in that it is one of the first to model the dual burden of HIV and hypertension and of HIV and depression for a country with a generalized HIV epidemic and growing non-communicable disease burden.


This emulation tutorial responds to calls from international donors and global health researchers [72] to “make modelling tools and analytic packages publicly available to wider audiences” and facilitate “training of decision makers to understand model outputs, particularly uncertainty and confidence intervals”. We envision future expansions of this emulator to be able to estimate changes in HIV and NCD burden with greater coverage of National Health Insurance schemes and in the presence of integrated care interventions [73, 74], and to estimate the cost-effectiveness of integrated interventions based on current [6, 75] and emerging data.

Availability of data and materials

All annotated code used to develop and run the emulator presented in this paper is freely available via the following digital repository:


  1. The simulation model was developed and published in 2019, at which time the UNAIDS 90–90-90 targets were in effect in Kenya.



Central Processing Unit




Gaussian Process


Graphics Processing Unit


Human Immunodeficiency Virus


Integrated Modeling of Epidemiologic and Economic Long-term Outcomes in Africa


Kenya AIDS Indicator Surveys (KAIS)


Low- and Middle-Income Countries


Leave-One-Out Cross Validation


Non-communicable Disease


People Living with HIV


Random Access Memory


Sub-Saharan Africa


Sexually Transmitted Disease


Sexually Transmitted Diseases Simulator


  1. Law AM. Simulation Modeling and Analysis. 5th ed. Mcgraw Hill Series in Industrial Engineering and Management; 2014. p. 776.

    Google Scholar 

  2. Rutter CM, Zaslavsky AM, Feuer EJ. Dynamic microsimulation models for health outcomes: a review. Med Decis Mak. 2011;31:10–8.

    Article  Google Scholar 

  3. Elveback L, Varma A. Simulation of mathematical models for public health problems. Public Health Rep. 1965;80:1067–76.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  4. Kopec JA, Edwards K, Manuel DG, Rutter CM. Advances in microsimulation modeling of population health determinants, diseases, and outcomes. Epidemiol Res Int. 2012;2012:1–3.

    Article  Google Scholar 

  5. Sugrue DM, Ward T, Rai S, McEwan P, van Haalen HGM. Economic modelling of chronic kidney disease: a systematic literature review to inform conceptual model design. Pharmacoeconomics. 2019;37:1451–68.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Nugent R, Barnabas RV, Golovaty I, Osetinsky B, Roberts DA, Bisson C, et al. Costs and cost-effectiveness of HIV/NCD integration in Africa: from theory to practice HHS public access. Aids. 2018;32:83–92.

    Article  Google Scholar 

  7. Nobile MS, Cazzaniga P, Tangherloni A, Besozzi D. Graphics processing units in bioinformatics, computational biology and systems biology. Brief Bioinform. 2017;18:870–85.

    PubMed  Google Scholar 

  8. Andrianakis I, Vernon IR, McCreesh N, McKinley TJ, Oakley JE, Nsubuga RN, et al. Bayesian history matching of complex infectious disease models using emulation: a tutorial and a case study on HIV in Uganda. PLoS Comput Biol. 2015;11:e1003968.

    Article  PubMed  PubMed Central  Google Scholar 

  9. Ellis AG, Iskandar R, Schmid CH, Wong JB, Trikalinos TA. Active learning for efficiently training emulators of computationally expensive mathematical models. Stat Med. 2020;39:3521–48.

    Article  PubMed  Google Scholar 

  10. Gulati R, Gore JL, Etzioni R. Comparative Effectiveness of Alternative Prostate-Specific Antigen–Based Prostate Cancer Screening Strategies. Ann Intern Med. 2013;158:145.

    Article  PubMed  PubMed Central  Google Scholar 

  11. Abraham JM. Using microsimulation models to inform U.S. health policy making. Health Serv Res. 2013;48(2 Pt 2):686–95.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Lorscheid I, Heine B-O, Meyer M. Opening the ‘black box’ of simulations: increased transparency and effective communication through the systematic design of experiments. Comput Math Organ Theory. 2012;18:22–62.

    Article  Google Scholar 

  13. Blanning RW. The construction and implementation of metamodels. Simulation. 1975;24:177–84.

    Article  Google Scholar 

  14. Conti S, Gosling JP, Oakley JE, O’Hagan A. Gaussian process emulation of dynamic computer codes. Biometrika. 2009;96:663–76.

    Article  Google Scholar 

  15. De Foo C, Shrestha P, Wang L, Du Q, García Basteiro AL, Abdullah AS, et al. Integrating tuberculosis and noncommunicable diseases care in low- and middle-income countries (LMICs): a systematic review. PLoS Med. 2022;19(1):e1003899.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Rohwer A, Uwimana Nicol J, Toews I, Young T, Bavuma CM, Meerpohl J. E ffects of integrated models of care for diabetes and hypertension in low-income and middle-income countries: a systematic review and meta-analysis. BMJ Open. 2021;11:1–9.

    Article  Google Scholar 

  17. Vorkoper S, Kupfer LE, Anand N, Patel P, Beecroft B, Tierney WM, et al. Building on the HIV chronic care platform to address noncommunicable diseases in sub-Saharan Africa: a research agenda. Aids. 2018;32(Suppl 1):S107–13.

    Article  PubMed  Google Scholar 

  18. Kemp CG, Weiner BJ, Sherr KH, Kupfer LE, Cherutich PK, Wilson D, et al. Implementation science for integration of HIV and non-communicable disease services in sub-Saharan Africa: a systematic review. Aids. 2018;32:S93–105.

    Article  PubMed  Google Scholar 

  19. Adeyemi O, Lyons M, Njim T, Okebe J, Birungi J, Nana K, et al. Integration of non-communicable disease and HIV/AIDS management: a review of healthcare policies and plans in East Africa. BMJ Glob Heal. 2021;6(5):e004669.

    Article  Google Scholar 

  20. Kasaie P, Weir B, Schnure M, Dun C, Pennington J, Teng Y, et al. Integrated screening and treatment services for HIV, hypertension and diabetes in Kenya: assessing the epidemiological impact and cost-effectiveness from a national and regional perspective. J Int AIDS Soc. 2020;23(Suppl 1):e25499.

    Article  PubMed  PubMed Central  Google Scholar 

  21. Achwoka D, Mutave R, Oyugi JO, Achia T. Tackling an emerging epidemic: the burden of non-communicable diseases among people living with hiv/aids in sub-Saharan Africa. Pan Afr Med J. 2020;36:1–9.

    Article  Google Scholar 

  22. Wanni Arachchige Dona S, Bohingamu Mudiyanselage S, Watts JJ, Sweeney R, Coghlan B, Majmudar I, et al. Added socioeconomic burden of non-communicable disease on HIV/AIDS affected households in the Asia Pacific region: a systematic review. Lancet Reg Health West Pac. 2021;9:100111.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Hyle EP, Mayosi BM, Middelkoop K, Mosepele M, Martey EB, Walensky RP, et al. The association between HIV and atherosclerotic cardiovascular disease in sub-Saharan Africa: a systematic review. BMC Public Health. 2017;17:1–15.

    Article  Google Scholar 

  24. Patel P, Rose CE, Collins PY, Nuche-Berenguer B, Sahasrabuddhe VV, Peprah E, et al. Noncommunicable diseases among HIV-infected persons in low-income and middle-income countries: a systematic review and meta-analysis. AIDS. 2018;32:S5–20.

    Article  PubMed  Google Scholar 

  25. Rasmussen CE, Williams CKI. Gaussian processes for machine learning. MIT Press; 2018.

    Google Scholar 

  26. Sacks J, Welch WJ, Mitchell TJ, Wynn HP. Design and analysis of computer experiments. Stat Sci. 1989;4(4):409–23.

    Google Scholar 

  27. Coetzee L, Bogler L, De Neve JW, Bärnighausen T, Geldsetzer P, Vollmer S. HIV, antiretroviral therapy and non-communicable diseases in sub-Saharan Africa: empirical evidence from 44 countries over the period 2000 to 2016. J Int AIDS Soc. 2019;22(7):e25364.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Bloomfield GS, Khazanie P, Morris A, Rabadán-Diehl C, Benjamin LA, Murdoch D, et al. HIV and noncommunicable cardiovascular and pulmonary diseases in low-and middle-income countries in the art era: what we know and best directions for future research. J Acquir Immune Defic Syndr. 2014;67(Suppl 1):S40–53.

    Article  PubMed  PubMed Central  Google Scholar 

  29. Osetinsky B, Hontelez JAC, Lurie MN, McGarvey ST, Bloomfield GS, Pastakia SD, et al. Epidemiological and health systems implications of evolving HIV and hypertension in South Africa and Kenya. Health Aff. 2019;38:1173–81.

    Article  Google Scholar 

  30. Bakker R, Korenromp E, Meester E, Van Der Ploeg C, Voeten H, Van Vliet C, et al. STDSIM: a microsimulation model for decision support in the Control of HIV and other STDs. Am Sex Transm Dis Assoc. 2000;27:652.

    Article  Google Scholar 

  31. Hontelez JAC, Lurie MN, Bärnighausen T, Bakker R, Baltussen R, Tanser F, et al. Elimination of HIV in South Africa through expanded access to antiretroviral therapy: a model comparison study. PLoS Med. 2013;10:e1001534.

    Article  PubMed  PubMed Central  Google Scholar 

  32. Kenya Ministry of Health. STEPwise Survey for Non CommuniKenyacable Diseases Risk Factors 2015 Report. Kenya STEPwise Surv Non Commun Dis Risk Factors 2015 Rep, vol. 5. Kenya Ministry of Health; 2015. p. 8–210.

    Google Scholar 

  33. National AIDS and STI Control Programme (NASCOP) K. Kenya AIDS Indicator survey 2012: final report. Nairobi; 2014. Accessed 26 Sep 2023

  34. National AIDS Control Council. Kenya HIV County profiles. 2016. Accessed 26 Sep 2023.

    Google Scholar 

  35. Smit M, Olney J, Ford NP, Vitoria M, Gregson S, Vassall A, et al. The growing burden of noncommunicable disease among persons living with HIV in Zimbabwe. AIDS. 2018;32:773–82.

    Article  PubMed  Google Scholar 

  36. Smit M, Perez-Guzman PN, Mutai KK, Cassidy R, Kibachio J, Kilonzo N, et al. Mapping the current and future noncommunicable disease burden in Kenya by human immunodeficiency virus status: a modeling study. Clin Infect Dis. 2020;71:1864–73.

    Article  PubMed  Google Scholar 

  37. UNAIDS. On the Fast-Track to end AIDS 2016–2021 Strategy. p. 2015. Accessed 26 Sep 2023

  38. Hashiguchi L, Achoki T, Alam U, Fullman N. The Global Burden of Disease: Generating Evidence, Guiding Policy. Nairobi, Kenya. Institute for Health Metrics and Evaluation and the International Centre for Humanitarian Affairs. 2016. Accessed 26 Sep 2023.

  39. Erickson C. A guide to the GauPro R package. 2023. Accessed 26 Sep 2023.

    Google Scholar 

  40. Gabry J, Simpson D, Vehtari A, Betancourt M, Gelman A. Visualization in Bayesian workflow. J R Stat Soc Ser A Stat Soc. 2019;182:389–402.

    Article  Google Scholar 

  41. Magnusson M, Andersen MR, Jonasson J, Vehtari A. Leave-one-out cross-validation for Bayesian model comparison in large data. Proc Mach Learn Res. 2020; arXiv:2001.00980

  42. Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput. 2017;27:1413–32.

    Article  Google Scholar 

  43. Vernon I, Goldsteiny M, Bowerz RG. Galaxy formation: a Bayesian uncertainty analysis. Bayesian Anal. 2010;05(04):619–70.

    Google Scholar 

  44. Swiler LP, Gulian M, Frankel AL, Safta C, Jakeman JD. A survey of constrained Gaussian process regression: approaches and implementation challenges. J Mach Learn Model Comput. 2020;1:119–56.

    Article  Google Scholar 

  45. Al-taweel Y. Diagnostics and simulation-based methods for validating Gaussian process diagnostics and simulation-based methods for validating Gaussian process emulators. PhD Thesis. University of Sheffield; 2018.

    Google Scholar 

  46. Lynch SM. Bayesian Statistics. Encycl Soc Meas. 2005:135–44.

  47. Shao Z, Er MJ, Wang N. An Efficient Leave-One-Out Cross-Validation-Based Extreme Learning Machine (ELOO-ELM) With Minimal User Intervention. IEEE Trans Cybern. 2016;46:1939–51.

    Article  PubMed  Google Scholar 

  48. Morrison RE, Bryant CM, Terejanu G, Prudhomme S, Miki K. Data partition methodology for validation of predictive models. Comput Math Appl. 2013;66:2114–25.

    Article  Google Scholar 

  49. Beddows AV, Kitwiroon N, Williams ML, Beevers SD. Emulation and sensitivity analysis of the community multiscale air quality model for a UK ozone pollution episode. Environ Sci Technol. 2017;51:6229–36.

    Article  PubMed  CAS  Google Scholar 

  50. Iskauskas A. Emulation and History Matching Handbook. Accessed 03 May 2023

  51. Reiker T, Golumbeanu M, Shattock A, Burgert L, Smith TA, Filippi S, et al. Emulator-based Bayesian optimization for efficient multi-objective calibration of an individual-based model of malaria. Nat Commun. 2021;12:7212.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  52. Smith T, Killeen GF, Maire N, Ross A, Molineaux L, Tediosi F, et al. Mathematical modeling of the impact of malaria vaccines on the clinical epidemiology and natural history of plasmodium falciparum malaria: overview. Am J Trop Med Hyg. 2006;75(Suppl 2):1–10.

    Article  PubMed  Google Scholar 

  53. Oyebamiji OK, Wilkinson DJ, Jayathilake PG, Curtis TP, Rushton SP, Li B, et al. Gaussian process emulation of an individual-based model simulation of microbial communities. J Comput Sci. 2017;22:69–84.

    Article  Google Scholar 

  54. Hamilton DT, Agutu C, Sirengo M, Chege W, Goodreau SM, Elder A, et al. Modeling the impact of different PrEP targeting strategies combined with a clinic-based HIV-1 nucleic acid testing intervention in Kenya. Epidemics. 2023;44:100696.

    Article  PubMed  CAS  Google Scholar 

  55. Coll P, Jarrín I, Martínez E, Martínez-Sesmero JM, Domínguez-Hernández R, Castro-Gómez A, et al. Achieving the UNAIDS goals by 2030 in people living with HIV: A simulation model to support the prioritization of health care interventions. Enfermedades Infecc y Microbiol Clin. 2023;41(10):589–95.

    Article  Google Scholar 

  56. Gouda HN, Charlson F, Sorsdahl K, Ahmadzada S, Ferrari AJ, Erskine H, et al. Burden of non-communicable diseases in sub-Saharan Africa, 1990–2017: results from the global burden of disease study 2017. Lancet Glob Heal. 2019;7(10):E1375–87.

    Article  Google Scholar 

  57. Schnure M, Dowdy D. In: Bae K-H, Feng B, Kim S, Lazarova-Molnar S, Zheng Z, Roeder T, Thiesing R, editors. Proceedings of the 2020 Winter Simulation Conference. IEEE; 2020. p. 980–91.

    Chapter  Google Scholar 

  58. Haacker M, Bärnighausen T, Atun R. HIV and the growing health burden from noncommunicable diseases in Botswana: modelling study. J Glob Health. 2019;9(1):010428.

    Article  PubMed  PubMed Central  Google Scholar 

  59. Nianogo RA, Arah OA. Forecasting obesity and type 2 diabetes incidence and burden: the ViLA-obesity simulation model. Front Public Health. 2022;10

  60. Lymer S, Schofield D, Lee CMY, Colagiuri S. NCDMod: a microsimulation model projecting chronic disease and risk factors for Australian adults. Int J Microsimulation. 2015;9:103–39.

    Article  Google Scholar 

  61. Yarnoff B, Honeycutt A, Bradley C, Khavjou O, Bates L, Bass S, et al. Validation of the prevention impacts simulation model (PRISM). Prev Chronic Dis. 2021;18:200225.

    Article  Google Scholar 

  62. Fain BG, Dobrovolny HM. GPU acceleration and data fitting: agent-based models of viral infections can now be parameterized in hours. J Comput Sci. 2022;61:101662.

    Article  Google Scholar 

  63. Rasmussen CE. Gaussian Processes in Machine Learning. Bousquet, O., von Luxburg, U., Rätsch, G. (eds) Advanced Lectures on Machine Learning. ML 2003. Lecture Notes in Computer Science, vol 3176. Springer, Berlin, Heidelberg.

  64. O’Hagan A. Bayesian analysis of computer code outputs: a tutorial. Reliab Eng Syst Saf. 2006;91(10–11):1290–300.

    Article  Google Scholar 

  65. Thiagarajan JJ, Venkatesh B, Anirudh R, Bremer P-T, Gaffney J, Anderson G, et al. Designing accurate emulators for scientific processes using calibration-driven deep models. Nat Commun. 2020;11:5622.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  66. López-Lopera AF, Bachoc F, Durrande N, Rohmer J, Idier D, Roustant O. Approximating Gaussian Process Emulators with Linear Inequality Constraints and Noisy Observations via MC and MCMC. In: Tuffin B, L'Ecuyer P, editors. Monte Carlo and Quasi-Monte Carlo Methods. MCQMC 2018. Springer Proceedings in Mathematics & Statistics, vol 324. Cham: Springer; 2020.

  67. Mayala BK, Bhatt S, Gething P. Predicting HIV/AIDS at Subnational Levels using DHS Covariates related to HIV. DHS Spatial Analysis Reports No. 18. Rockville, Maryland, USA: ICF; 2020.

    Google Scholar 

  68. Chen Z, Fan J, Wang K. Remarks on multivariate Gaussian Process. 2020. arXiv:2010.09830.

    Google Scholar 

  69. Craig PS, Goldstein M, Seheult AH, Smith JA. Pressure matching for hydrocarbon reservoirs: a case study in the use of Bayes linear strategies for large computer experiments; 1997.

    Google Scholar 

  70. Bastos LS, O’Hagan A. Diagnostics for Gaussian process emulators. Technometrics. 2009;51:425–38.

    Article  Google Scholar 

  71. Revill P, Rangaraj A, Makochekanwa A, Mpofu A, Ciaranello AL, Jahn A, et al. Perspectives on the use of modelling and economic analysis to guide HIV programmes in sub-Saharan Africa. Lancet HIV. 2022;3018:1–4.

    Google Scholar 

  72. Kupfer LE, Beecroft B, Viboud C, Wang X, Brouwers P. A call to action: strengthening the capacity for data capture and computational modelling of HIV integrated care in low- and middle-income countries. Journal of the international AIDS. Society. 2020;23(Suppl 1):e25475.

    Google Scholar 

  73. Genberg BL, Wachira J, Steingrimsson JA, Pastakia S, Tran DNT, Said JA, et al. Integrated community-based HIV and non-communicable disease care within microfinance groups in Kenya: study protocol for the Harambee cluster randomised trial. BMJ Open. 2021;11:e042662.

    Article  PubMed  PubMed Central  Google Scholar 

  74. Kibachio J, Mwenda V, Ombiro O, Kamano JH, Perez-Guzman PN, Mutai KK, et al. Recommendations for the use of mathematical modelling to support decision-making on integration of non-communicable diseases into HIV care. J Int AIDS Soc. 2020;23(Suppl 1):e25505.

    Article  PubMed  PubMed Central  Google Scholar 

  75. Osetinsky B, Mwangi A, Pastakia S, Wilson-Barthes M, Kimetto J, Rono K, et al. Layering and scaling up chronic non-communicable disease care on existing HIV care systems and acute care settings in Kenya: a cost and budget impact analysis. J Int AIDS Soc. 2020;23:e25496.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


Not applicable.


Research reported in this publication was supported by the National Institute of Mental Health of the National Institutes of Health under award number R01MH118075, and by the Civilian Research and Development Foundation (CRDF) Global under award number R-202104-67710. Approximately 50 % of this research was financed with Federal money. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or CRDF Global.

Author information

Authors and Affiliations



OG, SC, AM and FY led the conception and design of the work. SJS led data analysis and interpretation and wrote the first draft of the manuscript. RM, MWB and BO contributed to data analysis and to the conception and design of the work. All authors revised the work critically for important intellectual content, provided final approval of the version to be published, and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Omar Galárraga.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Sawe, S.J., Mugo, R., Wilson-Barthes, M. et al. Gaussian process emulation to improve efficiency of computationally intensive multidisease models: a practical tutorial with adaptable R code. BMC Med Res Methodol 24, 26 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: