 Research
 Open access
 Published:
SpatialWavePredict: a tutorialbased primer and toolbox for forecasting growth trajectories using the ensemble spatial wave subepidemic modeling framework
BMC Medical Research Methodology volume 24, Article number: 131 (2024)
Abstract
Background
Dynamical mathematical models defined by a system of differential equations are typically not easily accessible to nonexperts. However, forecasts based on these types of models can help gain insights into the mechanisms driving the process and may outcompete simpler phenomenological growth models. Here we introduce a friendly toolbox, SpatialWavePredict, to characterize and forecast the spatial wave subepidemic model, which captures diverse wave dynamics by aggregating multiple asynchronous growth processes and has outperformed simpler phenomenological growth models in shortterm forecasts of various infectious diseases outbreaks including SARS, Ebola, and the early waves of the COVID19 pandemic in the US.
Results
This tutorialbased primer introduces and illustrates a userfriendly MATLAB toolbox for fitting and forecasting timeseries trajectories using an ensemble spatial wave subepidemic model based on ordinary differential equations. Scientists, policymakers, and students can use the toolbox to conduct realtime shortterm forecasts. The fiveparameter epidemic wave model in the toolbox aggregates linked overlapping subepidemics and captures a rich spectrum of epidemic wave dynamics, including oscillatory wave behavior and plateaus. An ensemble strategy aims to improve forecasting performance by combining the resulting topranked models. The toolbox provides a tutorial for forecasting timeseries trajectories, including the full uncertainty distribution derived through parametric bootstrapping, which is needed to construct prediction intervals and evaluate their accuracy. Functions are available to assess forecasting performance, estimation methods, error structures in the data, and forecasting horizons. The toolbox also includes functions to quantify forecasting performance using metrics that evaluate point and distributional forecasts, including the weighted interval score.
Conclusions
We have developed the first comprehensive toolbox to characterize and forecast timeseries data using an ensemble spatial wave subepidemic wave model. As an epidemic situation or contagion occurs, the tools presented in this tutorial can facilitate policymakers to guide the implementation of containment strategies and assess the impact of control interventions. We demonstrate the functionality of the toolbox with examples, including a tutorial video, and is illustrated using daily data on the COVID19 pandemic in the USA.
Background
Developing reliable methods for forecasting dynamic growth processes is critical for decisionmaking in problems ranging from predicting the weather, forecasting the trajectory of an emerging epidemic, the growth or decline of economic variables, election outcomes, and sporting events [1]. While statistical methods such as ARIMA and exponential smoothing are robust and broadly competitive for forecasting time series [2,3,4,5,6], dynamical mathematical models defined by a system of differential equations are typically not easily accessible to nonexperts. However, forecasts based on these types of models can help characterize the mechanisms driving the process [7]. They may offer higher forecasting performance than purely statistical approaches based on statistical evaluation criteria like mean absolute and squared errors [8,9,10,11]. Here we focus on dynamical models that can characterize growth processes that give rise to waves of variable shapes and sizes [12,13,14]. The complexity of this family of growth models ranges from single differential equation models with a few parameters, such as the 3parameter generalizedlogistic growth model (GLM) [14], to systems of ordinary differential equations (ODEs) that capture diverse wave dynamics by aggregating multiple asynchronous growth processes [13]. The spatial wave subepidemic framework has outperformed simpler phenomenological growth models in forecasts of various infectious diseases, including severe acute respiratory syndrome (SARS), Ebola, and the early waves of the coronavirus disease 2019 (COVID19) pandemic in the United States (US) [13, 15].
This tutorial paper introduces a userfriendly MATLAB toolbox to fit and forecast timeseries trajectories using the spatial wave subepidemic dynamic growth model based on ordinary differential equations, which was initially developed to characterize and derive shortterm forecasts of epidemic trajectories [13, 16]. This mathematical framework characterizes timeseries trajectories by aggregating multiple asynchronous growth processes. Each growth process (i.e., subepidemic) is modeled using a simple phenomenological growth model such as the generalized logistic growth model (GLM). This framework supports a family of growth models that yield similar fits to the calibration data, but their corresponding forecasts could produce diverse trajectories. Hence, we also incorporate ensemble techniques to combine the resulting models to boost forecasting performance [16, 17].
This toolbox is written for a diverse audience, including students training in timeseries forecasting. It allows the user to conduct parameter estimation and forecasting with quantified uncertainty and evaluate forecasting performance using a set of standard metrics, including the coverage of the 95% prediction interval and the weighted interval score, which account for the uncertainty of the predictions. The toolbox allows scientists and policymakers to generate shortterm forecasts by relying on minimal data of the process of interest, such as an unfolding epidemic or natural disaster.
The toolbox provides prediction intervals and allows the user to employ different estimation methods, assumptions of the error structure, and forecasting horizons. For instance, the toolbox includes estimation methods such as the nonlinear least squares estimation and maximum likelihood estimation (MLE) with different assumptions about the error structure of the observed data, including Poisson, negative binomial, and normal distributions, as well as quantification of the uncertainty based on a parametric bootstrapping approach. The model also provides flexibility to choose the underlying building block of the growth process. In addition, the toolbox includes functions to derive weighted and unweighted ensembles based on the resulting topranked models. The full functionality of the toolbox is illustrated using daily time series of COVID19 cases in the US, and in the process, shows that this framework outcompetes simpler single growth models and simple timeseries models (e.g., ARIMA, GAM, SLR) in calibration and forecasting performance.
We start by describing the format of the input timeseries data, followed by the methods employed for parameter estimation. Next, we describe the underlying methodology, user parameters, and functions to calibrate, evaluate, and display the model fits. Finally, we introduce the functions to generate, display, and quantify the performance of modelbased forecasts with specific examples in the context of the daily COVID19 case data reported in the USA. A tutorial video that demonstrates the toolbox functionality is available at: https://www.youtube.com/watch?v=qxuF_tTzcR8&t=47s.
Implementation
In this section, we describe the methods implemented in this toolbox and provide a brief overview of the toolbox functions.
Installing the toolbox

Download the MATLAB code located in the folder spatialWave_subepidemicFramework code from the GitHub repository: https://github.com/gchowell/spatial_wave_subepidemic_framework.

Create an ‘input’ folder in your working directory where your input data will be stored.

Create an ‘output’ folder in your working directory where the output files will be stored.

Open a MATLAB session.
Overview of the toolbox functions
The methodological workflow of the tutorial is organized as follows: (1) plotting model simulations, (2) fitting the models to data with quantified uncertainty, (3) plotting the resulting model fits and calibration performance metrics, and (4) plotting modelbased forecasts and the associated forecasting performance metrics. Table 1 and Supplementary Table 1 list the names of both user and internal functions associated with the toolbox, along with a brief description of their role. As described below, the user needs to specify the parameters related to model fitting and forecasting in the default options_fit.m and options_forecast.m files.
Parameter estimation method
Let \(f\left(t,{\Theta }\right)\) denote the expected curve of the epidemic’s trajectory. We can estimate model parameters \({\Theta }\) by fitting the model solution to the observed data via nonlinear least squares [18] or maximum likelihood estimation with specific assumptions about the error structure in the data [19] by specifying parameter <method1> in the options.m file. For nonlinear least squares (i.e., <method1> =0), this is achieved by searching for the set of parameters \(\widehat{\varTheta }\) that minimizes the sum of squared differences between the smoothed data \({y}_{{t}_{j}=}{y}_{{t}_{1}, }{y}_{{t}_{2}}\dots .{y}_{{t}_{{n}_{d}}}\) and the model mean, corresponding to \(f(t,{\Theta })\). That is, \({\Theta }=\left({C}_{thr},r,p,q,{K}_{0}\right)\) in the subepidemic wave model (given below) is estimated by \(\widehat{{\Theta }}=\text{arg}\text{min} \sum\nolimits _{j=1}^{{n}_{d}}{(f\left({t}_{j},{\Theta }\right){y}_{{t}_{j}})}^{2}\). We estimate the parameter \({C}_{thr}\) through simple discretization of its range of plausible values. Our estimation procedure consists of two steps. First, for each \({C}_{thr}\), we search for the set of parameters \((r,p,q,{K}_{0})\) that yield the best fit to the data. Then we choose \({C}_{thr}\)and the corresponding estimates of other parameters leading to the overall bestfit to the data.
Nonlinear least squares estimation weighs each of the data points equally and does not explicitly require a specific distributional assumption for \({y}_{t}\), except for the first moment \(E\left[{y}_{t}\right]=f({t}_{i};\varTheta )\). That is, the mean of the observed data at time t is equivalent to the expected count denoted by \(f\left(t,{\Theta }\right)\)at time t [20]. This method yields asymptotically unbiased point estimates regardless of any misspecification of the variancecovariance error structure. Hence, the estimated model mean \(f({t}_{i},\widehat{{\Theta }})\) yields the best fit to observed data \({y}_{{t}_{i}}\)in terms of squared L2 norm. We can solve the nonlinear least squares optimization problem using the fmincon function in MATLAB. Moreover, we also employ MATLAB’s MultiStart feature to specify the number of random initial guesses of the model parameters using the parameter <numstartpoints> in the options.m file in order to search thoroughly for a global minimum, check that the solution is unique, and the parameters are identifiable.
We can also estimate parameters via maximum likelihood estimation (MLE) [19] and assume different error structures in the data (e.g., Poisson, negative binomial). The loglikelihood expressions derived for different error structures are specified below.
Poisson
For a Poisson error structure, the full loglikelihood of Poisson (i.e., <method1> =1) is given by:
where \({\mu }_{i}=f ({t}_{i},\theta )\) denotes the mean of \({y}_{i}\)at time \({t}_{i}\). The number of parameters is just the number of parameters estimated in the dynamical model based on ordinary differential equations.
Negative binomial
Let \(r>0\) denote the number of failures until the experiment is stopped, \(p\in [0, 1]\) denote the success probability in each experiment. The number of successes y before the rth failure occurs has a negative binomial distribution given by:
with \(\text{m}\text{e}\text{a}\text{n}=\mu =\frac{rp}{(1p)}\), variance \(={\sigma }^{2}=\frac{rp}{{\left(1p\right)}^{2}}> \mu\). For n observations \({y}_{1}, \dots ,{y}_{n}\), the full loglikelihood is
which can be expressed with \(\mu\) and σ ^{2} by pluggingin \(p=1\frac{{\upmu }}{{{\upsigma }}^{2}}\) and \(r=\frac{{\mu }^{2}}{{\sigma }^{2}\mu }\).
There are different types of variances commonly used in a negative binomial distribution. If the variance scales linearly with the mean: \({\sigma }^{2}=\mu +\alpha \mu\), (i.e., <method1> = 3 in options.m), then \(p=\frac{\alpha }{1+\alpha }\) and \(r=\mu /\alpha\). Let \(\mu =f \left(t,\theta \right)\)be the mean curve to be estimated from the differential equation. The full loglikelihood (1.1) can be expressed as follows:
If the variance scales quadratically with the mean, \({\sigma }^{2}=\mu +\alpha {\mu }^{2}\) (i.e., <method1> =4 in options.m), then \(p=\frac{\alpha \mu }{1+\alpha \mu }\)and \(r=1/\alpha\). The full loglikelihood (1.1) can be expressed as follows:
The more general form of variance is \({\sigma }^{2}=\mu +\alpha {\mu }^{d}\) (i.e., <method1> =5 in options.m) with any \({\infty }<d<{\infty }\). Then the full loglikelihood (1.1) can be expressed as follows:
where \({\mu }_{i}=f ({t}_{i},\theta ).\)
The number of parameters is 1 plus the number of parameters in the dynamical model based on ordinary differential equations (ODE) for (1.2) ~ (1.3), and 2 plus the number of parameters in the dynamical model for (1.4) if d is also estimated via MLE. Assuming Poisson or negative binomial error structures in the data, we can estimate parameters using MLE by specifying parameters in the options.m file, such as <method1 >=1 & <dist1>=1 for Poisson and <method1> & <dist1>=3, <method1>=4 & <dist1 >=4, and <method1>=5 & <dist1> = 5 for the different negative binomial error structures described above.
Parametric bootstrapping
To quantify parameter uncertainty, we follow a parametric bootstrapping approach which allows the computation of standard errors and related statistics in the absence of closedform formulas [21]. We generate B bootstrap samples from the bestfit model \(f(t,\widehat{\varTheta})\), with an assumed error structure specified using parameter <dist1> in the options.m file to quantify the uncertainty of the parameter estimates and construct confidence intervals. Typically, the error structure in the data is modeled using a probability model such as the Poisson or negative binomial distribution. Using nonlinear least squares (< method1> = 0), besides a normally distributed error structure (<dist1> = 0), we can also assume a Poisson (<dist1> = 1) or a negative binomial distribution (<dist1> = 2) whereby the variancetomean ratio is empirically estimated from the time series. To estimate this constant ratio, we group a fixed number of observations (e.g., 7 observations for daily data into a bin across time), calculate the mean and variance for each bin, and then estimate a constant variancetomean ratio by calculating the average of the variancetomean ratios over these bins.
Using the bestfit model \(f(t,\widehat{\varTheta })\), we generate Btimes replicated simulated datasets of size \({n}_{d}\), where the observation at time \({t}_{j }\)is sampled from the corresponding distribution specified by <dist1>. Next, we refit the model to each of the B simulated datasets to reestimate the parameters using the same estimation method for the bootstrap sample as for the original data. This allows us to quantify the uncertainty of the estimate using that method. The new parameter estimates for each realization are denoted by \({\widehat{{\Theta }}}_{b}\), where \(b=\text{1,2},\dots ,B\). Using the sets of reestimated parameters (\(\widehat{\Theta}_{b}\)), it is possible to characterize the empirical distribution of each estimate, calculate the variance, and construct confidence intervals for each parameter. The resulting uncertainty around the model fit can similarly be obtained from \(f\left(t,{\widehat{{\Theta }}}_{1}\right),\) \(f\left(t,{\widehat{{\Theta }}}_{2}\right),\dots ,f(t,{\widehat{{\Theta }}}_{B})\). We characterize the uncertainty using 300 bootstrap realizations (i.e., parameter <B> = 300 in the options.m file).
Modelbased forecasts with quantified uncertainty
Forecasting the model \(f\left(t,\widehat{{\Theta }}\right), h\) days ahead is based on the estimate \(f(t+h,\widehat{\Theta})\). The uncertainty of the forecasted value can be obtained using the previously described parametric bootstrap method. Let
denote the forecasted value of the current state of the system propagated by a horizon of h time units, where \({\widehat{{\Theta }}}_{b}\) denotes the estimation of parameter set \({\Theta }\) from the b _{th} bootstrap sample. We can use these values to calculate the bootstrap variance to measure the uncertainty of the forecasts and use the 2.5% and 97.5% percentiles to construct the 95% prediction intervals (95% PIs). We can set the forecasting horizon using the parameter <forecastingperiod1> in the options_forecast.m file. The structure of the options_forecast.m file is described in Supplementary Text 2.
For the COVID19 case data employed for illustration purposes, we fit the models by the nonlinear least squares method assuming a normal error structure (i.e., <method1> =0 and <dist1>=0) (Fig. 1).
Subepidemic wave model
We use a spatial wave model with up to 5 parameters that aggregate linked overlapping subepidemics [13]. This subepidemic framework can characterize diverse epidemic patterns, including the epidemic plateaus, where the epidemic stabilizes at a high level for an extended period and the epidemic waves have multiple peaks. The strength (e.g., weak vs. strong) of their overlap determines when the next subepidemic is triggered and is controlled by the onset threshold parameter, \({C}_{thr}\). The mathematical equation for the subepidemic building block is the 3parameter generalizedlogistic growth model (GLM), which is specified by setting the parameter <flag1>=1 in the options.m file. This growth model has performed well in shortterm forecasts of single outbreak trajectories for different infectious diseases, including COVID19 [22,23,24]. Alternative growth equations to model the subepidemic building block include the 3parameter Richards model (<flag1>=4) and the 2parameter logistic growth model (<flag1>=2). The following differential equation gives the generalizedlogistic growth model (GLM):
where \(C\left(t\right)\) denotes the cumulative curve at time t, and \(\frac{dC\left(t\right)}{dt}\) describes the epidemic’s incidence curve over time t. The positive parameter \(r\) denotes the growth rate per unit of time, \({K}_{0}\) is the final outbreak size, and \(p \in \left[\text{0,1}\right]\) is the “scaling of growth” parameter which allows the model to capture early subexponential and exponential growth patterns. If \(p=0\), this equation describes a constant incidence over time, while \(p=1\) indicates that the early growth phase is exponential. Intermediate values of \(p\hfill(0 <p<1)\) describe early subexponential (e.g., polynomial) growth dynamics. The subepidemic wave model consists of a system of coupled differential equations:
Here, \({C}_{i}\left(t\right)\) is the cumulative number of infections for subepidemic \(i\), and \({K}_{i}\) is the size of the \({i}_{th}\) subepidemic where \(i=1,\dots ,n\). Starting from an initial subepidemic size \({K}_{0}\), the size of consecutive subepidemics \({K}_{i}\) decline at the rate q following an exponential or powerlaw function as described below. Hence, a total of 5 parameters \((r,p,{C}_{thr}, {q,K}_{0})\) for \(i=1,\dots ,n\) are needed to characterize a subepidemic wave composed of two or more subepidemics.
The onset timing of the subsequent \({\left(i+1\right)}_{th}\) subepidemic is determined by the indicator variable \({A}_{i}\left(t\right)\). This results in a coupled system of subepidemics where the \({(i+1)}_{th }\) subepidemic is triggered when the cumulative curve for the \({i}_{th}\) subepidemic exceeds a total of \({C}_{thr}\). The subepidemics overlap because the (\(i+1{)}_{th}\)subepidemic takes off before the \({i}_{th}\) subepidemic completes its course. That is,
The threshold parameters are defined so that 1 \(\le {C}_{thr}<{K}_{0}\) and \({A}_{0}\left(t\right)=1\) for the first subepidemic. The maximum number of subepidemics considered in the epidemic wave trajectory is specified using parameter <npatches_fixed> in the options.m file. Here, we set <npatches_fixed>=3. The initial number of cases is given by \({C}_{1}\left(0\right)={I}_{0}\), where \({I}_{0}\) is the initial number of cases in the observed data.
In this framework, the size of the subsequent \({i}_{th}\) subepidemic (\({K}_{i}\)) remains steady or declines due to the effects of behavior changes or interventions. We consider both exponential and inverse decline functions to model the size of consecutive subepidemics described below.
Exponential decline of subepidemic sizes
If consecutive subepidemics follow exponential decline, then \({K}_{i}\) is given by:
where K _{0} is the size of the initial subepidemic \({(K}_{1}={K}_{0})\). If \(q=0\), the model predicts an epidemic wave composed of subepidemics of the same size. When \(q>0\), the epidemic wave is composed of a finite number of subepidemics given by \({n}_{tot}\) which is a function of \({C}_{thr}, {q, \text{and} K}_{0}\) as follows:
where the brackets \(\lfloor *\rfloor\) denote the largest integer that is smaller than or equal to \(*\). The total size of the epidemic wave composed of \({n}_{tot}\) overlapping subepidemics has the following closedform solution:
The exponential subepidemic decline function can be selected by setting the parameter <typedecline2>=1 in the options.m file.
Powerlaw decline of subepidemic sizes
If consecutive subepidemics decline according to the inverse function, we have:
When \(\text{q}>0\), the total number of subepidemics \({n}_{tot}\) comprising the epidemic wave is finite and given by:
The total size of an epidemic wave is given by the aggregation of \({n}_{tot}\) overlapping subepidemics:
The powerlaw subepidemic decline function can be selected by setting the parameter <typedecline2> = 2 in the options.m file. Selecting the type of decline function that yields the best fit to the data is also possible by setting the parameter,
<typedecline2>=[1 2].
Fixed subepidemic onset
We can also consider subepidemic wave models with a fixed onset time at 0. In this case, all subepidemics start at time 0, and the threshold parameter \({C}_{thr}\) drops from the model. We use parameter <onset_fixed> in the options.m file to specify whether the onset timing of the subepidemics is fixed at time 0 (<onset_fixed>=1) or not (<onset_fixed>=0).
Topranked subepidemic models
To select the topranked subepidemic models, we analyze the Akaike information criterion (\(AI{C}_{c}\)) values of the set of bestfit subepidemic wave models with different values of \({C}_{thr}\). The \({AIC}_{c}\) is given by [25, 26]:
where \(m\) is the number of model parameters, and \({n}_{d}\) is the number of data points. Specifically for normal distribution, the \({AIC}_{c}\) is
where \(SSE=\sum _{j=1}^{{n}_{d}}{(f\left({t}_{j},\widehat{{\Theta }}\right){y}_{{t}_{j}})}^{2}\) is the sum of squared errors, \(m\) is the number of model parameters including parameter \({C}_{thr}\). Parameter <topmodelsx> in the options.m file is used to specify the number of topranked models that will be generated and used to derive ensemble models.
To illustrate the methodology, we set <onset_fixed>=0, < typedecline2 > = 2 (powerlaw decline) and analyzed four topranking subepidemic models (<topmodelsx> = 4). The topranking models are used to construct three ensemble subepidemic models, which we refer to as: Ensemble(2), Ensemble(3), and Ensemble(4) (Fig. 2).
Plotting simulations of the spatial wave subepidemic model
Before fitting the growth model to the data, it is useful to check that the selected model yields simulations broadly consistent with the range of the time series data by generating model simulations with different parameter values. For example, if data show systematic differences that contrast with the model solutions, it may suggest that the model is not the best choice for the data at hand.
The function plot_SW_subepidemic.m can be used to plot model solutions where the user provides the type of growth model by passing parameter <flag1> (generalizedlogistic growth model, Richards, Gompertz, etc.), the model parameter values, and the initial conditions as passing input parameters to the function in the following order: <flag1>, \(r\), \(p\), \(a,K,q,n, {C}_{thr},\) <typedecline1>, \(C\left(0\right)\), and finally the duration of the simulation. For example, the following call plots a simulation of the spatial wave subepidemic model using as building block the generalized logistic growth model (< flag1 > = 1) and the following model parameter values: \(r=0.18,p=0.18,K=1000,q=0.24,n=8,{C}_{thr}=50\). The initial condition \(C\left(0\right)=5\), and the total duration of the simulation is set at 200.
>> plot_SW_subepidemic (1,0.18,0.9,[],1000,0.24,8,50,1,5,200)
Of note, in the above call, the value of parameter \(a\) is passed empty ([]) since the generalized logistic growth model does not use this parameter. This function will generate a figure (Fig. 3A) that shows the corresponding model solution \(dC\left(t\right)/dt\). Additional representative simulations with other values of the \({C}_{thr}\) are shown in Fig. 3.
In the next section, we describe four comprehensive performance metrics that can be used to assess both calibration and forecasting performance. Specifically, the mean absolute error (MAE) and the mean squared error (MSE) are used to assess the performance of point forecasts, while the coverage of the 95% prediction interval (95% PI) and the weighted interval score (WIS) evaluate the performance of distributional forecasts by accounting for uncertainty in model fit and predictions.
Performance metrics
To assess the performance of the models during the calibration or forecasting periods, we used four performance metrics: the mean absolute error (MAE), the mean squared error (MSE), the coverage of the 95% prediction intervals (95% PI), and the weighted interval score (WIS) [27]. While it is possible to generate htime units ahead forecasts of an evolving process, those forecasts looking into the future can only be evaluated until sufficient data for the htime units ahead has been collected. In the options_forecast.m file, the parameter <getperformance> is a Boolean variable (0/1) to indicate whether the user wishes to compute the performance metrics of the forecasts when sufficient data is available.
The mean absolute error (MAE) is given by:
where \({t}_{h}\)are the time points of the time series data [28], and N is the calibration or forecasting period length. Similarly, the mean squared error (MSE) is given by:
where \({t}_{h}\)are the time points of the time series data [28], and N is the calibration or forecasting period length. The coverage of the 95% prediction interval (PI) corresponds to the fraction of data points that fall within the 95% PI, calculated as
where \({L}_{{t}_{h}}\)and \({U}_{{t}_{h}}\) are the lower and upper bounds of the 95% PIs, respectively, \({Y}_{{t}_{h}}\)are the data and 1 is an indicator variable that equals 1 if \({Y}_{{t}_{h}}\)is in the specified interval and 0 otherwise.
The weighted interval score (WIS) [27, 29], which is a proper score recently embraced for quantifying model forecasting performance in epidemic forecasting studies [30,31,32,33], provides quantiles of predictive forecast distribution by combining a set of Interval Scores (IS) for probabilistic forecasts. An IS is a simple proper score that requires only a central \((1\alpha )\times 100\%\) PI [27] and is described as
In this Eq. 1 refers to the indicator function, meaning that \(1\left(y<l\right)=1\) if \(y<l\) and 0 otherwise. The terms \(l\) and \(u\) represent the \(\frac{\alpha }{2}\) and \(1\frac{\alpha }{2}\) quantiles of the forecast F. The IS consists of three distinct quantities:

1.
The sharpness of \(F,\) given by the width \(ul\) of the central \(\left(1\alpha \right) \times 100\%\) PI.

2.
A penalty term \(\frac{2}{\alpha }\times \left(ly\right)\times 1\left(y<l\right)\) for the observations that fall below the lower end point \(l\) of the \(\left(1\alpha \right)\times 100\%\) PI. This penalty term is directly proportional to the distance between \(y\) and the lower end \(l\) of the PI. The strength of the penalty depends on the level \(\alpha .\)

3.
An analogous penalty term \(\frac{2}{\alpha }\times \left(yu\right)\times 1\left(y>u\right)\) for the observations falling above the upper limit \(u\) of the PI.
To provide more detailed and accurate information on the entire predictive distribution, we report several central PIs at different levels \(\left(1{\alpha }_{1}\right)<\left(1{\alpha }_{2}\right)<\dots <\left(1{\alpha }_{K}\right)\) along with the predictive median, \(\stackrel{\sim}{y}\), which can be seen as a central prediction interval at level \(1\alpha_0\rightarrow0\). This is referred to as the WIS, and it can be evaluated as follows:
where, \({w}_{k}=\frac{{\alpha }_{k} }{2}\)for \(k=\text{1,2},\dots .K\) and \({w}_{0}=\frac{1}{2}\). Hence, WIS can be interpreted as a measure of how close the entire distribution is to the observation in units on the scale of the observed data [31, 34].
Doubling times
Doubling times characterize the sequence of times at which the cumulative incidence doubles. We denote the times at which cumulative incidence doubles by \({t}_{{d}_{j}}\), such that \({2 C(t}_{{d}_{j}})={C(t}_{{d}_{j+1}})\) where \({t}_{{d}_{0}}=0, C\left( {t}_{{d}_{0}}\right)={C}_{0}\), \(j=\text{1,2},3,\dots ,{n}_{g}\) and \({n}_{g}\) is the total number of times cumulative incidence doubles [35]. The actual sequence of “doubling times” is defined as follows:
For exponential growth, doubling times remain invariant and are given by \(\left(ln2\right)/r\), whereas the doubling times increase when the growth pattern follows subexponential growth [36]. We can characterize the doubling times and their uncertainty from the bestfit model \(f\left(t,\widehat{{\Theta }}\right)\) [37]. We can evaluate the uncertainty of the sequence of doubling times and the overall doubling time using the model parameter estimates derived from bootstrapping \(\left({\widehat{{\Theta }}}_{b}\right)\), where \(b=\text{1,2},3,\dots ,B\). That is, \({d}_{j}\left({\widehat{{\Theta }}}_{b}\right)\) provides a sequence of doubling times for a set of bootstrap parameter estimates, \({\widehat{{\Theta }}}_{b}\), where \(b=\text{1,2},3,\dots ,B\). We can use these curves to derive 95% CIs for the sequence of doubling times and quantify the probability of observing a given number of doublings.
Constructing ensemble forecasts from topranking models
Ensemble models that combine the strength of multiple models may exhibit significantly enhanced predictive performance (e.g [11, 17, 38, 39]). An ensemble model derived from the topranking \(I\) models is denoted by the Ensemble(1), illustrated in Fig. 4. Thus, Ensemble(2) and Ensemble(3) refer to the ensemble models generated from the combination of the topranking 2 and 3 models, respectively. The ensemble models can be derived from the unweighted (equal weights across contributing individual models) or a weighted combination of the highestranking subepidemic models based on the quality of fit as deemed by the \({AIC}_{{c}_{i}}\)for the ith model where \({AIC}_{{c}_{1}}\le \dots \le {AIC}_{{c}_{I}}\) and \(i\) = 1, …, I. In this case, we compute the weight \({w}_{i}\)for the ith model, \(i\) = 1, …, I, where \(\sum {w}_{i}\) = 1 as follows:
and hence \({\text{w}}_{\text{I}}\le \dots {\le \text{w}}_{1}\).
The estimated mean curve of daily COVID19 cases for the Ensemble(I) model is:
where given the training data, \(\widehat\Theta^{\left(i\right)}\) denotes the set of estimated parameters, and \(f_i\left(t,\;\widehat\Theta\right)^{\left(i\right)}\) denotes the estimated mean curve of daily COVID19 cases, for the ith model. Accordingly, we compute the weighted average and sample the bootstrap realizations of the forecasts for each model to construct the 95% CI or PI using the 2.5% and 97.5% quantiles [17]. Alternatively, we can set the ensemble weights based on different calibration performance metrics for the topranked models. For instance, we can make the ensemble weights proportional to the relative likelihood (\(l\)) rather than the reciprocal of the \(AI{C}_{c}\). Let \({AIC}_{min}\) denote the minimum \(AIC\) from the set of models. The relative likelihood of model i is given by \({l}_{i}={e}^{({(AIC}_{min}{AIC}_{i})/2)}\) [40]. We compute the weight \({w}_{i}\) for the \(i\)th model where \(\sum {w}_{i}\) = 1 as follows:
and hence \({w}_{I}\le \dots {\le w}_{1}\).
In the options_forecast.m file, we can specify four types of ensemble weights using <weight_type1>. Specifically, unweighted (<weigth_type1>=1), weighted according to the \(AI{C}_{c}\) (<weight_type1>=0), weighted based on the relative likelihood (weight_type1=1), weighted based on the reciprocal of the WIS metric of the calibration period (<weight_type1>=2).
In the options_forecast.m file, we can specify the parameters related to the epidemic forecasts, including the forecasting horizon and the type of ensemble weights (Fig. 5).
Results and discussion
The dataset
The time series data file is a text file with the extension *.txt in the input folder. The data file can contain one or more incidence time series (one per column). Each column corresponds to the incidence curve over time for each epidemic corresponding to a different area/group. For instance, each column could contain time series data corresponding to different U.S. states or countries worldwide. In the options.m file, a specific data column can be accessed for inference using the parameter <outbreakx>. If the time series file contains cumulative incidence count data, the name of the file containing the time series data starts with “cumulative” according to the following format:
cumulative < cadtemporal><caddisease><datatype><cadregion><caddate1> .txt.
where <cadtemporal> is a string parameter that indicates the temporal resolution of the data (e.g., daily, weekly, yearly). Parameter <caddisease> is a string used to indicate the name of the disease related to the time series data, <datatype> is a string parameter indicating the nature of the data (e.g., cases, deaths, and hospitalizations), whereas <cadregion> is a string parameter indicating the geographic region of the time series contained in the file (New York, USA, World, Asia, Africa). Finally, <caddate1> is a string to indicate the date for the most recent observation in the data file with the format: mmddyyyy.
To illustrate the methodology presented in this tutorial paper, we used daily COVID19 cases reported in the USA from the publicly available data tracking system of the Johns Hopkins Center for Systems Science and Engineering (CSSE) [41]. The data is also publicly available in the GitHub repository [42]. An example of a data file that we will use in this tutorial is provided in Fig. 6.
If the time series file contains incidence data, the name of the data file does not start with the word ‘cumulative’ and follows the format:
<cadtemporal><caddisease><datatype><cadregion><caddate1>.txt
For example: dailycoronaviruscasesUSA05112020.txt
In the options.m file, the parameter <datevecfirst1> is a 3value vector that specifies the date corresponding to the first data point in time series data in the format: [yyyy mm dd]. Similarly, the parameter <datevecend1> is a 3value vector that specifies the date of the most recent data file in the format: [yyyy mm dd]. The file.
cumulative<cadtemporal><caddisease><datatype><cadregion><datevecend1>.txt
in the input folder with the date <datevecend1> contains the most recent time series data and is needed to assess forecast performance. Finally, the parameter <DT> is an integer indicating the temporal resolution of the time series data (e.g., <DT> = 1 for daily data; <DT> = 7 for weekly data) (Fig. 7).
Data adjustments
Data smoothing
To reduce the noise in the original data due to artificial reasons such as the weekend effects, we can smooth out the time series data using the moving average of the time series whereby <smoothfactor1> is a parameter in the options.m file that specifies the span of the moving average (e.g., <smoothfactor1> = 1 implies no smoothing applied to the data). Let
denote the smoothed time series of the epidemic trajectory based on the moving average. Here, \({t}_{j}\)are the time points for the time series data, \({n}_{d}\) is the number of observations, and each \({y}_{{t}_{j}},\)
\(j=1,2,\dots ,{n}_{d}\), correspond to the smoothed time series. We recommend that the user set the average to multiples of seven to reduce the weekend effects in the reported data.
For the daily COVID19 case data employed for illustration purposes, we set <smoothfactor1> = 7 and smooth out the daily series using a 7day moving average to reduce the noise in the original data due to artificial reasons such as the weekend effects.
Calibration period
To fit the models to the most recent observations in a time series file, we can specify the length of the calibration period whereby <calibrationperiod1> indicates the number of recent data points that will be used to calibrate the models. If <calibrationperiod1> exceeds the length of the time series in the data file, the calibration period is set to the maximum length of the available data.
For illustration purposes, we used a 90day calibration period (i.e., <calibrationperiod1> = 90) (Fig. 8).
Fitting the subepidemic wave models to data with quantified uncertainty
To fit the subepidemic wave models to the data with quantified uncertainty, we need to run the function Run_SW_subepidemicFramework.m. This function uses the input parameters provided by the user in the options.m file. However, the function can also receive <outbreakx> and <caddate1> as passing input parameters while the remaining inputs are obtained from the options.m file.
For example, to fit the ensemble subepidemic models to the daily curve of COVID19 cases in the USA as of the week of ‘05112020’ (data file path: input/cumulativedailycoronaviruscasesUSA05112020.txt), we can run the function from MATLAB’s command line window as follows:
>> Run_SW_subepidemicFramework (52,‘05112020’)
This function will generate several output MATLAB files in the output folder. For instance, the following output file contains the fits of the topranking models:
ABCoriginalnpatchesfixed4onsetfixed0typedecline2smoothing1dailycoronaviruscasesUSAstate5205112020flag11method0dist0calibrationperiod90.mat
Please note that the names of the output files contain the values of the parameters for reference.
The following output files contain the uncertainty characteristics associated with each of the topranking models:

a)
modifiedLogisticPatchoriginalnpatchesfixed4onsetfixed0typedecline2smoothing1dailycoronaviruscasesUSAstate5205112020flag11method0dist0calibrationperiod90rank1.mat

b)
modifiedLogisticPatchoriginalnpatchesfixed4onsetfixed0typedecline2smoothing1dailycoronaviruscasesUSAstate5205112020flag11method0dist0calibrationperiod90rank2.mat

c)
modifiedLogisticPatchoriginalnpatchesfixed4onsetfixed0typedecline2smoothing1dailycoronaviruscasesUSAstate5205112020flag11method0dist0calibrationperiod90rank3.mat

d)
modifiedLogisticPatchoriginalnpatchesfixed4onsetfixed0typedecline2smoothing1dailycoronaviruscasesUSAstate5205112020flag11method0dist0calibrationperiod90rank4.mat
These output internal files are needed to plot model fits, derive parameter estimates, generate shortterm forecasts, and quantify the calibration and forecasting performance metrics.
Plot the mean model fits and quality of fit metrics for the topranked models
After running the function Run_SW_subepidemicFramework.m with the desired input parameters, we can use the function plotRankings_SW_subepidemicFramework.m to plot the mean model fits of the topranking models including their subepidemic profiles and the associated quality of model fit metrics including the \(AI{C}_{c}\), the relative likelihood, and the evidence ratio based on the inputs. However, this function can also receive <outbreakx> and <caddate1> as passing input parameters while the remaining inputs are obtained from the options.m file. Running this function from MATLAB’s command line, we have:
>>plotRankings_SW_subepidemicFramework(52,‘05112020’)
Figures 9 and 10 illustrate the outputs obtained from this function call. Figure 9 shows the mean model fits of the topranked subepidemic models, which indicates that the 1stranked model consists of 3 subepidemics. In contrast, the 2nd, 3rd, and 4th ranked subepidemic models consist of 2 subepidemics. It is important to note that there was severe underreporting of cases during the early phase of the epidemic. The corresponding goodness of fit statistics of the topranked models, including the \(AI{C}_{c}\), the relative likelihood, and the evidence ratio, are shown in Fig. 10. It also saves the \(AI{C}_{c}\) values of the topranked models in the following .csv file:
AICctopRankedonsetfixed0typedecline2flag11method0dist0dailycoronaviruscasesUSAarea5205112020.csv.
For comparison, a simpler growth model consisting of a single subepidemic (<npatches_fixed> = 1) performs substantially worse (\(AI{C}_{c}=1530.4\); Supplementary Fig. 1).
Plot the model fits, parameter estimates, and performance metrics of the topranking models
Using the function plotFit_SW_subepidemicFramework.m, we can plot the fits of the topranking models, including their subepidemic profiles, parameter estimates, and residual plots based on the inputs indicated in the options.m file. However, this function can also receive <outbreakx> and <caddate1> as passing input parameters while the remaining inputs are obtained from the options.m file.
In addition, this function also plots the empirical distributions of the parameters associated with each of the topranking models and the calibration performance metrics (MSE, MAE, 95% P.I., and WIS). Finally, this function also outputs .csv files in the output folder with the calibration performance metrics, the parameter estimates associated with the topranking models, the corresponding Monte Carlo standard errors of the parameters, and the estimated sequence of doubling times for each of the topranked models. Using the default parameter values indicated in the options.m file, the actual call to this function from MATLAB’s command line follows:
>> plotFit_SW_subepidemicFramework
Figures 11 and 12 illustrate the outputs from the above call to the function. The fits of the 1st and 2nd ranked subepidemic models, including the subepidemic profiles and residuals, to the daily curve of COVID19 cases are shown in Figs. 11 and 12. These models yield a similarly good fit to the data. The figures also include the empirical distribution of the parameter estimates. These parameter estimates are well identified as the confidence intervals lie in a welldefined range of values [13]. The calibration performance metrics capturing the quality of fit of the topranked subepidemic models are also displayed in Fig. 13. For instance, this figure indicates that the coverage of the 95% PIs varied little between ~ 93% and 95% for the topranked models. This function will store the following .csv files in the output folder:

1)
Model parameter estimates:
parameterstopRankedonsetfixed0typedecline2flag11method0dist0dailycoronaviruscasesUSAarea5205112020.csv

2)
Monte Carlo standard errors:
MCSEStopRankedonsetfixed0typedecline2flag11method0dist0dailycoronaviruscasesUSAarea5205112020.csv

3)
Calibration performance metrics:
performancecalibrationtopRankedonsetfixed0typedecline2flag11method0dist0dailycoronaviruscasesUSAarea5205112020.csv

4)
Doubling times for each of the topranked models:
doublingTimesranked(1)onsetfixed0typedecline2flag11method0dist0dailycoronaviruscasesUSAarea5205112020.csv
doublingTimesranked(2)onsetfixed0typedecline2flag11method0dist0dailycoronaviruscasesUSAarea5205112020.csv
doublingTimesranked(3)onsetfixed0typedecline2flag11method0dist0dailycoronaviruscasesUSAarea5205112020.csv
doublingTimesranked(4)onsetfixed0typedecline2flag11method0dist0dailycoronaviruscasesUSAarea5205112020.csv
A relevant issue to investigate when using any mathematical model is that of structural or practical parameter identifiability [43]. Structural identifiability arises when one or more model parameters cannot be uniquely estimated using the model, even when the data is free of noise. That is, a lack of structural identifiability is due to issues in the model structure, such as the presence of parameter correlations [12]. On the other hand, practical identifiability occurs when one or more parameters cannot be reliably estimated using the available observed data, which could be associated with the number of observations available for model calibration and the spatialtemporal resolution of the data. Because the time series of incident cases in the observed epidemic wave is an aggregation of overlapping subepidemics, there could be instances when different subepidemic profiles may give rise to indistinguishable aggregated epidemic waves as noted elsewhere [44].
Generate the topranked and ensemble subepidemic model forecasts and the associated forecasting performance metrics
Using the function plotForecast_SW_subepidemicFramework.m, we can plot the shortterm forecasts from the topranking subepidemic models and the ensemble models derived from the topranking subepidemic models based on the inputs indicated in the options.m and the options_forecast.m files. However, this function can also receive parameters < outbreakx>, <caddate1>, or <forecastingperiod> as passing input parameters while the remaining inputs are read from the options.m and options_forecast.m files. Moreover, the data associated with each topranked model and ensemble forecasts are saved as .csv files in the output folder.
In addition, this function also plots the forecasting performance metrics (MSE, MAE, 95% P.I., WIS) for the topranking models and the ensemble subepidemic wave models. Finally, this function also stores *.csv files in the output folder with the forecasting performance metrics associated with the topranking and ensemble models, and the estimated doubling times for each of the topranked models. Using the default parameter values indicated in the options.m, and options_forecast.m files, the call to this function from MATLAB’s command line follows:
>> plotForecast_subepidemicFramework
Figures 14 and 15 illustrate the outputs obtained from this function call. Figure 14 shows the 30day forecasts derived from the topranking subepidemic models, whereas Fig. 15 shows the subepidemic profiles of the forecasts. These forecasts indicate that the 1stranked model outperformed the other topranked models. Moreover, the data associated with the topranked model forecasts are also saved as .csv files in the output folder. The forecasting performance metrics for the topranked models are displayed in Fig. 16, and these metrics are also saved in a .csv file in the output folder. In comparison, the forecast derived from the simpler growth model consisting of a single subepidemic (<npatches_fixed>=1) was substantially worse, as shown in Supplementary Fig. 2.
The corresponding 3 ensemble forecasts (Ensemble(2), Ensemble(3), and Ensemble(4)) derived from the weighted combination of the topranked models based on their relative likelihood or Akaike weights (e.g., < weight_type1>=1 in the options_forecast.m file) are shown in Fig. 17. Also, the corresponding forecasting performance metrics for the ensemble models are shown in Fig. 18 and are saved in a .csv file in the output folder. The Ensemble(4) performed slightly better than the Ensemble(2) and Ensemble(3) models in terms of the WIS and coverage of the 95% prediction interval. This function will store the following .csv files in the output folder:

1)
Forecasting performance metrics of the topranked models:
performanceforecastingtopRankedonsetfixed0typedecline2flag11method0dist0horizon30dailycoronaviruscasesUSAarea5205112020.csv

2)
Forecasting performance metrics of the ensemble models:
performanceforecastingEnsembleonsetfixed0typedecline2flag11method0dist0horizon30weight_type1dailycoronaviruscasesUSAarea5205112020.csv

3)
Forecasts of the topranked models:
ranked(1)onsetfixed0typedecline2flag11method0dist0horizon30dailycoronaviruscasesUSAarea5205112020.csv
ranked(2) onsetfixed0typedecline2flag11method0dist0horizon30dailycoronaviruscasesUSAarea5205112020.csv
ranked(3) onsetfixed0typedecline2flag11method0dist0horizon30dailycoronaviruscasesUSAarea5205112020.csv
ranked(4) onsetfixed0typedecline2flag11method0dist0horizon30dailycoronaviruscasesUSAarea5205112020.csv

4)
Forecasts of the ensemble models:
Ensemble(2)onsetfixed0typedecline2flag11method0dist0horizon30weight_type1dailycoronaviruscasesUSAarea5205112020.csv
Ensemble(3) onsetfixed0typedecline2flag11method0dist0horizon30weight_type1dailycoronaviruscasesUSAarea5205112020.csv
Ensemble(4) onsetfixed0typedecline2flag11method0dist0horizon30weight_type1dailycoronaviruscasesUSAarea5205112020.csv

5)
Sequence of doubling times of the topranked models:
doublingTimesranked(1)onsetfixed0typedecline2flag11method0dist0horizon30dailycoronaviruscasesUSAarea5205112020.csv
doublingTimesranked(2)onsetfixed0typedecline2flag11method0dist0horizon30dailycoronaviruscasesUSAarea5205112020.csv
doublingTimesranked(3)onsetfixed0typedecline2flag11method0dist0horizon30dailycoronaviruscasesUSAarea5205112020.csv
doublingTimesranked(4)doublingTimesonsetfixed0typedecline2flag11method0dist0horizon30dailycoronaviruscasesUSAarea5205112020.csv

6)
Sequence of doubling times of the ensemble models:
doublingTimesEnsemble(2)onsetfixed0typedecline2flag11method0dist0horizon30weight_type1dailycoronaviruscasesUSAarea5205112020.csv
doublingTimesEnsemble(3)onsetfixed0typedecline2flag11method0dist0horizon30weight_type1dailycoronaviruscasesUSAarea5205112020.csv
doublingTimesEnsemble(4)onsetfixed0typedecline2flag11method0dist0horizon30weight_type1dailycoronaviruscasesUSAarea5205112020.csv
We can also compare the performance of unweighted ensemble models by setting the parameter <weight_type1>=1 in the options_forecast.m file while the other parameters are kept unchanged. Then we can compare the performance of the unweighted ensemble models (equal weights across topranked models) with the weighted ensemble models, where the weights are proportional to the relative likelihood of the models (<weight_type1>= 1). We can run the function plotForecast_subepidemicFramework.m to generate the new set of forecasts with the new models.
The forecasting performance metrics for the weighted and unweighted ensemble models and other statistical timeseries models are displayed in Table 2. Overall, the unweighted ensemble models performed similarly as their weighted ensemble counterparts for this forecast and outperformed some popular statistical timeseries models such as ARIMA (a brief description of the statistical models is given in Supplementary Text S3).
Conclusion
We have introduced a MATLAB toolbox to fit and forecast time series using the spatial wave subepidemic model originally developed to generate shortterm forecasts of epidemics [13] and illustrated its functionality using timeseries data of the COVID19 pandemic in the US. In particular, the subepidemic model used in this tutorial has shown competitive performance in characterizing and forecasting epidemic trajectories of infectious diseases such as COVID19, Ebola, and plague [13, 15]. The toolbox can be a helpful resource for policy makers and used as a part of the curriculum for students training in infectious disease modeling, mathematical biology, applied statistics and mathematics, and special courses in epidemic modeling and timeseries forecasting.
This new opensource toolbox and associated tutorial will be helpful to a broad community of applied scientists interested in characterizing and forecasting timeseries data that results from the aggregation of multiple asynchronous underlying growth processes. Moreover, prior publications have extensively validated the tools presented here [13, 15]. The models and methods included in the toolbox have improved shortterm forecasting performance over simpler growth models such as the Richards and generalizedlogistic growth models. Moreover, we have ensured publicly available, longterm, and stable hosting of the toolbox in a public GitHub repository. Extensions to the toolbox could include additional components, such as new model features, alternative estimation methods, and additional forecasting performance metrics.
Availability and requirements
Project name: Forecasting growth trajectories using the ensemble spatial wave subepidemic modeling framework.
Project home page: https://github.com/gchowell/spatial_wave_subepidemic_framework Operating system(s): Platform independent.
Programming language: MATLAB.
Other requirements: NA.
License: This program is free software: it can be redistributed or modified under the GNU Public License as published by the Free Software Foundation, version 3 of the License.
Any restrictions to use by nonacademics: None.
Availability of data and materials
Datasets for daily COVID19 cases reported in the USA are retrieved from the publicly available data tracking system of the Johns Hopkins Center for Systems Science and Engineering (CSSE). Code files can be accessed at https://github.com/gchowell/spatial_wave_subepidemic_framework.
Abbreviations
 AIC:

Akaike Information Criterion
 ARIMA:

Auto Regressive Integrated Moving Average
 CSSE:

Center for System Science and Engineering
 CI:

Confidence Interval
 GLM:

Generalized Logistic growth Model
 IS:

Interval Score
 MAE:

Mean Absolute Error
 MLE:

Maximum Likelihood
 MSE:

Mean Squared Error
 ODE:

Ordinary Differential Equations
 PI:

Prediction Interval
 USA:

United States of America
 WIS:

Weighted Interval Score
References
Petropoulos F, Apiletti D, Assimakopoulos V, Babai MZ, Barrow DK, Ben Taieb S, Bergmeir C, Bessa RJ, Bijak J, Boylan JE, et al. Forecasting: theory and practice. Int J Forecast. 2022;38(3):705–871.
Dimri T, Ahmad S, Sharif M. Time series analysis of climate variables using seasonal ARIMA approach. J Earth Syst Sci. 2020;129:149.
Hyndman RJ, Athanasopoulos G. Forecasting: Principles and Practice. 2nd ed. OTexts. 2018. p. 384.
Mondal P, Shit L, Goswami S. Study of effectiveness of time series modeling (ARIMA) in forecasting stock prices. Int J Sci Eng Appl. 2014;4(2):13.
Shamsnia SA, Shahidi N, Liaghat A, Sarraf A, Vahdat SF. Modeling of weather parameters using stochastic methods (ARIMA model)(case study: Abadeh Region, Iran). In: International Conference on Environment and Industrial Innovation. IPCBEE. 2011;12.
Tektaş M. Weather forecasting using ANFIS and ARIMA models. Environ Res Eng Manag. 2010;51(1):5–10.
Yan P, Chowell G. Quantitative methods for investigating infectious disease outbreaks vol. 70. Cham: Springer; 2019.
Chowell G. Fitting dynamic models to epidemic outbreaks with quantified uncertainty: a primer for parameter uncertainty, identifiability, and forecasts. Infect Dis Model. 2017;2(3):379–98.
Chowell G, CastilloChavez C, Fenimore PW, KribsZaleta CM, Arriola L, Hyman JM. Model parameters and outbreak control for SARS. Emerg Infect Dis. 2004;10(7):1258.
Keeling MJ, Hill EM, Gorsich EE, Penman B, GuyverFletcher G, Holmes A, Leng T, McKimm H, Tamborrino M, Dyson L. Predictions of COVID19 dynamics in the UK: shortterm forecasting and analysis of potential exit strategies. PLoS Comput Biol. 2021;17(1):e1008619.
Viboud C, Sun K, Gaffey R, Ajelli M, Fumanelli L, Merler S, Zhang Q, Chowell G, Simonsen L, Vespignani A. The RAPIDD ebola forecasting challenge: synthesis and lessons learnt. Epidemics. 2018;22:13–21.
Tuncer N, Timsina A, Nuno M, Chowell G, Martcheva M. Parameter identifiability and optimal control of an SARSCoV2 model early in the pandemic. J Biol Dyn. 2022;16(1):412–38.
Chowell G, Tariq A, Hyman JM. A novel subepidemic modeling framework for shortterm forecasting epidemic waves. BMC Med. 2019;17(1):164–164.
Raimund B, Gerardo C, Leidy Yissedt LD. Comparative analysis of phenomenological growth models applied to epidemic outbreaks. Math Biosci Eng. 2019;16(5):4250–73.
Chowell G, Rothenberg R, Roosa K, Tariq A, Hyman JM, Luo R. Subepidemic model forecasts during the first wave of the COVID19 pandemic in the USA and European hotspots. Mathematics of Public Health. Cham: Springer International Publishing; 2022.
Chowell G, Dahal S, Tariq A, Roosa K, Hyman JM, Luo R. An ensemble nsubepidemic modeling framework for shortterm forecasting epidemic trajectories: application to the COVID19 pandemic in the USA. PLoS Comput Biol. 2022;18(10): e1010602.
Chowell G, Luo R. Ensemble bootstrap methodology for forecasting dynamic growth processes using differential equations: application to epidemic outbreaks. BMC Med Res Methodol. 2021;21(1):34.
Banks HT, Hu S, Thompson WC. Modeling and inverse problems in the presence of uncertainty. 1st ed. Chapman and Hall/CRC; 2014. https://doi.org/10.1201/b16760.
Roosa K, Luo R, Chowell G. Comparative assessment of parameter estimation methods in the presence of overdispersion: a simulation study. Math Biosci Eng. 2019;16(5):4299–313.
Myung IJ. Tutorial on maximum likelihood estimation. J Math Pyschol. 2003;47:90–100.
Friedman J, Hastie T, Tibshirani R. The elements of statistical learning: Data mining, inference, and prediction. New York: SpringerVerlag New York; 2009.
Shanafelt DW, Jones G, Lima M, Perrings C, Chowell G. Forecasting the 2001 footandmouth disease epidemic in the UK. EcoHealth. 2018;15:338–47.
Chowell G, HincapiePalacio D, Ospina J, Pell B, Tariq A, Dahal S, Moghadas S, Smirnova A, Simonsen L, Viboud C. Using Phenomenological models to characterize transmissibility and Forecast patterns and final Burden of Zika Epidemics. PLoS Curr. 2016;8.
Pell B, Kuang Y, Viboud C, Chowell G. Using phenomenological models for forecasting the 2015 Ebola challenge. Epidemics. 2018;22:62–70.
Sugiura N. Further analysts of the data by akaike’ s information criterion and the finite corrections. Commun Stat Theory Methods. 1978;7:13–26.
Hurvich CM, Tsai CL. Regression and time series model selection in small samples. Biometrika. 1989;76:297–307.
Gneiting T, Raftery AE. Strictly proper Scoring rules, Prediction, and estimation. J Am Stat Assoc. 2007;102(477):359–78.
Kuhn M, Johnson K. Applied predictive modeling, vol. 26. New York: Springer; 2013.
Competitor’s Guide: Prizes and Rules.[https://www.m4.unic.ac.cy/wpcontent/uploads/2018/03/M4CompetitorsGuide.pdf].
Tariq A, Chakhaia T, Dahal S, Ewing A, Hua X, Ofori SK, Prince O, Salindri AD, Adeniyi AE, Banda JM, et al. An investigation of spatialtemporal patterns and predictions of the coronavirus 2019 pandemic in Colombia, 2020–2021. PLoS Negl Trop Dis. 2022;16(3): e0010228.
Bracher J, Ray EL, Gneiting T, Reich NG. Evaluating epidemic forecasts in an interval format. PLoS Comput Biol. 2021;17(2): e1008618.
Hwang E. Prediction intervals of the COVID19 cases by HAR models with growth rates and vaccination rates in top eight affected countries: bootstrap improvement. Chaos Solitons Fractals. 2022;155:111789–111789.
Roosa K, Tariq A, Yan P, Hyman JM, Chowell G. Multimodel forecasts of the ongoing ebola epidemic in the Democratic Republic of Congo, March 2013October 2019. J R Soc Interface. 2020;17(169):20200447.
Cramer EY, Ray EL, Lopez VK, Bracher J, Brennen A, Castro Rivadeneira AJ, Gerding A, Gneiting T, House KH, Huang Y, et al. Evaluation of individual and ensemble probabilistic forecasts of COVID19 mortality in the United States. Proc Natl Acad Sci U S A. 2022;119(15): e2113561119.
MunizRodriguez K, Chowell G, Cheung CH, Jia D, Lai PY, Lee Y, Liu M, Ofori SK, Roosa KM, Simonsen L, et al. Doubling time of the COVID19 epidemic by Province, China. Emerg Infect Dis. 2020;26(8):1912–4.
Smirnova A, DeCamp L, Chowell G. Mathematical and statistical analysis of doubling times to investigate the early spread of epidemics: application to the COVID19 pandemic. Mathematics. 2021;9(6): 625.
Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc R Soc B: Biol Sci. 2007;274(1609):599–604.
Chowell G, Luo R, Sun K, Roosa K, Tariq A, Viboud C. Realtime forecasting of epidemic trajectories using computational dynamic ensembles. Epidemics. 2020;30: 100379.
Ray EL, Reich NG. Prediction of infectious disease epidemics via weighted density ensembles. PLoS Comput Biol. 2018;14(2): e1005910.
Burnham KP, Anderson DR. Model selection and multimodel inference: a practical informationtheoretic approach. 2nd ed. New York: Springer; 2002. p. 488.
Hopkins J. CSSE Covid19 timeseries. GitHub; 2022.
Tariq A. GitHub Repository. 2022.
Roosa K, Chowell G. Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models. Theor Biol Med Model. 2019;16(1):1.
Chowell G, Tariq A, Hyman JM. A novel subepidemic modeling framework for shortterm forecasting epidemic waves. BMC Med. 2019;17(1):164.
Acknowledgements
None.
Funding
G.C. is partially supported from NSF grants 2125246 and 2026797 and R01 GM 130900. The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Authors and Affiliations
Contributions
G.C. conceived and developed the first version of the toolbox and wrote the first draft of the tutorial; G.C., A.T., A.B, S.D., J.M., R.L contributed to analysis and writing subsequent drafts of the tutorial. A.T. produced the tutorial video.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Chowell, G., Tariq, A., Dahal, S. et al. SpatialWavePredict: a tutorialbased primer and toolbox for forecasting growth trajectories using the ensemble spatial wave subepidemic modeling framework. BMC Med Res Methodol 24, 131 (2024). https://doi.org/10.1186/s12874024022412
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874024022412