 Research
 Open Access
 Published:
Distributed lag interrupted time series model for unclear intervention timing: effect of a statement of emergency during COVID19 pandemic
BMC Medical Research Methodology volume 22, Article number: 202 (2022)
Abstract
Background
Interrupted time series (ITS) analysis has become a popular design to evaluate the effects of health interventions. However, the most common formulation for ITS, the linear segmented regression, is not always adequate, especially when the timing of the intervention is unclear. In this study, we propose a new model to overcome this limitation.
Methods
We propose a new ITS model, ARIMAITSDL, that combines (1) the Autoregressive Integrated Moving Average (ARIMA) model and (2) distributed lag functional terms. The ARIMA technique allows us to model autocorrelation, which is frequently observed in time series data, and the decaying cumulative effect of the intervention. By contrast, the distributed lag functional terms represent the idea that the intervention effect does not start at a fixed time point but is distributed over a certain interval (thus, the intervention timing seems unclear). We discuss how to select the distribution of the effect, the model construction process, diagnosing the model fitting, and interpreting the results. Further, our model is implemented as an example of a statement of emergency (SoE) during the coronavirus disease 2019 pandemic in Japan.
Results
We illustrate the ARIMAITSDL model with some practical distributed lag terms to examine the effect of the SoE on human mobility in Japan. We confirm that the SoE was successful in reducing the movement of people (15.0–16.0% reduction in Tokyo), at least between February 20 and May 19, 2020. We also provide the R code for other researchers to easily replicate our method.
Conclusions
Our model, ARIMAITSDL, is a useful tool as it can account for the unclear intervention timing and distributed lag effect with autocorrelation and allows for flexible modeling of different types of impacts such as uniformly or normally distributed impact over time.
Introduction
Interrupted time series (ITS) has become a popular study design to evaluate the effects of health interventions in the field of public health and epidemiology [1, 2]. The effect of the intervention is estimated by comparing it with the “counterfactual" that is estimated by the expected trend assuming the absence of the intervention. ITS has been regarded as one of the best designs to estimate the causality of the intervention, especially when no control population is available, or randomized controlled trials (RCTs) are not feasible [3, 4]. Fretheim et al. (2015) demonstrated that ITS could generate effect estimates similar to those of RCTs [5]. Previous studies extensively examined the strengths and limitations of ITS and provided a practical guideline for its application [3, 6,7,8].
More formally, let Y_{t} be an outcome of interest at time t, and T_{0} be the time of the intervention of interest. For statistical inference, we assume that the sampled time series data \(\{Y_{t}\}_{t=1}^{T}\) is available. Standard ITS can be expressed as a simple segmented linear regression formulation, which is given by:
where ε_{t} is Gaussian white noise with constant variance \(\sigma ^{2}_{\varepsilon }\) (i.e., independent of t) and D_{t} is a dummy variable indicating the postintervention interval, coded as 0 in the preinterruption period and 1 in the postinterruption period. β_{0} represents the baseline level at t=0, β_{1} denotes the change in outcome associated with a onetime increase and is regarded as the underlying preintervention trend, and β_{2} and β_{3} represent the changes in the intercept and slope of the trend after the intervention, respectively. Note that although we tentatively use the linear regression formulation expressed in Eq. (1), it can be easily extended to the generalized linear regression formulation.
One issue that has not yet been covered in detail in the prior literature is how the unclear timing of the intervention should be modeled. For example, as we discuss in “Application data analysis” section, consider the case where the statement of emergency (SoE) during the coronavirus disease 2019 (COVID19) pandemic is the intervention for controlling the spread of infection. Although the date of the SoE declaration is fixed (i.e., nominal timing of the intervention is fixed and clear), the actual effect of the SoE begins long before the date of declaration because the media broadcast the declaration in advance, leading to a change in people’s behavior accordingly. In such a case, determining D_{t} in Eq. (1) becomes difficult as the actual timing of the intervention is unclear. In other words, the current ITS approach can only be employed when the timing of the intervention is clear and the nominal timing is the same as the actual timing of the intervention. To overcome this limitation, in this study, we aim to develop a new model that combines the ITS model in Eq. (1) with the 1) autoregressive integrated moving average (ARIMA) model and 2) distributed lag functional terms. The motivation to include the ARIMA technique is 1A) to model autocorrelation, which is frequently observed in time series data but cannot be treated well in Eq. (1), and 1B) to model the decaying cumulative effect of the intervention. By contrast, the motivation to include the distributed lag functional terms is 2A) to represent the idea that the effect of the intervention does not start at a fixed time point but that the intervention timing is distributed over a certain interval (thus, the intervention timing is unclear), and 2B) to model the distributed effect of the intervention over a certain period that includes the timing of the intervention.
The remainder of this study is structured as follows. In “Methods” section, we introduce the idea of the ARIMA ITS model with distributed lag functional terms to model the unclear intervention timing and its distributed effect and then describe the inference procedure. In “Application data analysis” section, we outline practical data analysis procedures of our method for the users by applying several human mobility datasets during the COVID19 pandemic in Japan. In “Discussion” section, we discuss and propose possible further developments.
Methods
ARIMAX iTS with distributed lag functional terms
In this section, we first explain an ARIMA model with exogenous variables (ARIMAX) [9]. Then, by extending the exogenous variables to our new functional variables (distributed lag functional terms), we propose our model, ARIMAITSDL, to simultaneously model both the unclear intervention timing and the distributed effect of the intervention.
Define the sampled time series data as \(\{Y_{t}, \boldsymbol {X}_{t}\}_{t=1}^{T}\). The general class of the ARIMAX (p,d,q) model without a constant takes the form of; [9, 10]
or equivalently,
where B is a lag operator such that \({BY}_{t} = Y_{t1},\ \delta (B) = 1  \delta _ {1} B \dots \delta_{p}B^{p}=1\sum_{i=1}^{p}\delta_{i}B^{i},\ \delta_{i}<1,\ \Delta = (1B),\ \theta (B) = 1  \theta_ {1} B \dots \theta_{q}B^{q}=1\sum_{i=1}^{q}\theta_{i}B^{i},\ \delta_{i}\) is the parameter for the autoregressive (AR) part, θ_{i} is the parameter for the moving average part (MA), and p and q are the orders of the AR and MA parts, respectively. For simplicity, we set d=0, and thus, ARIMAX(p,d,q) reduces to ARMAX (p,q). Even when the model where d>0 is needed, the following procedure is still valid after differencing Y_{t} before fitting the model. The example of an analytical procedure in this case is explained in “Application data analysis” section. When the covariate vector X includes a dummy variable indicating the postintervention intervals, which corresponds to D_{t} in Eq. (1), we can extend the simple ITS model to the ARIMA model [11].
To model both the unclear intervention timing and the distributed effect of the intervention over time, we propose the following ARIMAX ITS model with distributed lag functional terms with the covariates X_{t}, denoted by ARIMAITSDL (p,q,l_{1},l_{2}). We define T_{0} as the nominal timing of the intervention.
where \(\{\delta _{i}\}_{i=1}^{p}, \{w_{k}\}_{k=0}^{l_{1}+l_{2}}, \boldsymbol {\beta }^{T}\), and \(\{\theta _{i}\}_{i=1}^{q}\) are unknown parameters,
l_{1} controls the duration before the effect of the intervention appears (i.e., the start timing of the effect), and l_{2} controls the duration of the effect (i.e., the end timing of the effect). Thus, the effect of the intervention is assumed to last from T_{0}−l_{1} to T_{0}+l_{2}. Given this formalization, we can model the unclear intervention timing during t∈[T_{0}−l_{1},T_{0}+l_{2}]. The distributed lag functional term F_{t−k} represents how the effect of the intervention is distributed over the time during t∈[T_{0}−l_{1}+k,T_{0}+l_{2}]. The function f() is a probability density function (pdf) of time t that represents the proportion of the distributed effect of the intervention. Note that Eq. (4) can be considered one special case of the transfer function model popularized by Box and Jenkins [10].
Figure 1 illustrates examples of f(): ex. if the effect is assumed to be distributed uniformly and symmetrically with a peak at T_{0} or asymmetrically around T_{0}, f() might be formulated as a uniform distribution (green line in Fig. 1), (truncated) normal distribution (blue or red lines in Fig. 1), or (truncated) lognormal distribution (purple in Fig. 1), respectively. The choice of f() is discussed in the following section with practical examples.
Detailed explanation on distributed lag functional terms
For simplicity, we set p=1 and q=0 and drop X_{t} from Eq. (4). Thus, our model reduces to the ARX(1,0) model with distributed lag functional terms. Then, Eq. (4) is the reduced simple form that can be expressed as:
where δ<1 is assumed to make Y_{t} a stationary process. Now, t is assumed to indicate a day. F_{t−k} indicates the proportion of the distributed effect of the intervention and w_{k} estimates the (lagged) effect. Thus, the term w_{k}F_{t−k} indicates the distributed lag effect on Y_{t}. Another advantage of the formulation of Eq. (6) is that because of the AR parameter, δ, we can model the cumulative effect of the intervention from the previous time point. For a clearer understanding, we explain Eq. (6) with examples as follows.
From the definition of F_{t−k}, when t<T_{0}−l_{1}, Eq. (6) becomes
Consider the case where t∈[T_{0}−l_{1},T_{0}+l_{2}] (i.e., during the intervention period). When t=T_{0}−l_{1}+m (m∈[0,l_{1}+l_{2}]), the iterative use of Eq. (6) results in the general form as
This formulation represents the cumulative effect of the intervention since a day before the intervention becomes effective (i.e., the day of T_{0}−l_{1}−1). The first term is the cumulative effect of the outcome \(Y_{T_{0}l_{1}1}\) with decay δ^{k−1}, the second term is the delayed and cumulative effect of the intervention, which is explained below, and the third term is an error term.
Finally, we consider the postintervention period (i.e., t>T_{0}+l_{2}). When t>T_{0}+l_{2}+n (n∈[0,l_{1}+l_{2}]), Eq. (6) becomes
and when t>T_{0}+l_{1}+2l_{2}, Eq. (6) becomes the following simple form again as:
Now by using Eq. (6) iteratively, we obtain the general form Eq. (8). For a clearer understanding, we introduce the following example calculations, (T0T2), which correspond to the first three days after the actual intervention becomes effective (i.e., the actual timing of the intervention while the nominal intervention timing is still T_{0}). T0. When t=T_{0}−l_{1}, the outcome \(Y_{T_{0}l_{1}}\) is affected by the white noise \(\varepsilon _{T_{0}l_{1}}\), the decayed outcome at one day ago \(\delta Y_{T_{0}l_{1}1}\), where δ<1 represents the decay rate and the proportion of the effect of intervention at the first date \(F_{T_{0}l_{1}}=f(0)\), and w_{0}, which represents the effect size as follows:
T1. When t=T_{0}−l_{1}+1, the outcome \(Y_{T_{0}l_{1}+1}\) is affected by the white noise \(\varepsilon _{T_{0}l_{1}+1}\), the decayed outcome at \(\delta Y_{T_{0}l_{1}}\), the proportion of the effect of intervention on the first and second dates \(F_{T_{0}l_{1}+1} =f(1), F_{T_{0}l_{1}}=f(0)\), and w_{0} and w_{1}, which represent the 0 or 1day delayed effect), respectively, as follows:
By plugging \(Y_{T_{0}l_{1}}\) in T0, we can obtain the following equation:
T2. When t=T_{0}−l_{1}+2, we obtain \(F_{T_{0}l_{1}+2} =f(2), F_{T_{0}l_{1}+1}=f(1), F_{T_{0}l_{1}}=f(0)\). Using Eq. (6) iteratively, the model is reduced to:
For example, Eq. (11) provides an intuitive explanation of the cumulative and delayed effect of the intervention. The first term is the decayed outcome at T_{0}−l_{1}−1; the second term is the effect on the same day (i.e., T_{0}−l_{1}+2), which is decomposed into the 0daydelayed effect w_{0} and the proportion of effect on the same day \(F_{T_{0}l_{1}+2} =f(2)\); the third term is the cumulative and delayed effect from the previous day, which is decomposed into the 1daydelayed effect w_{1}, the decayed 0daydelayed effect δw_{0}, and the proportion of effect of the previous day \(F_{T_{0}l_{1}+1} =f(1)\); and the fourth term is the cumulative and delayed effect from 2day ago, which is decomposed into the 2daydelayed effect w_{2}, the decayed 0 and 1daydelayed effects δ^{2}w_{0},δw_{1}, and the proportion of effect from 2day ago \(F_{T_{0}l_{1}} =f(0)\). This example explains that we can model the proportion of the intervention on a given day and the cumulative and delayed effects of the intervention from the previous days.
Restrictions for estimation
The simplest procedure for estimating the parameters in Eq. (4) is using the maximum likelihood method as with the ordinary ARIMA modeling. Unfortunately, the precision of the estimates of \(\{w_{k}\}_{k=0}^{l_{1}+l_{2}}\) is known to sometimes become poor because of the high correlation among \(\{F_{tk}\}_{k=0}^{l_{1}+l_{2}}\), resulting in multicollinearity in the model [12, 13]. To obtain stable estimates of w_{k}s, we impose the following constraints: We redefine \(\sum_{k=0}^{l_{1}+l_{2}}w_{k}F_{tk}\) in a matrix form as:
where \(\boldsymbol {F}=(F_{t},\dots,F_{tl_{1}l_{2}})^{T}\in \mathbb {R}^{l_{1}+l_{2}+1},\ \boldsymbol {C\eta }=(w_{0},\dots,w_{l_{1}+l_{2}})^{T}\in \mathbb {R}^{l_{1}+l_{2}+1},\ \boldsymbol {C}\in \mathbb {R}^{(l_{1}+l_{2}+1)\times h}\) includes the basis variables derived from the specific constraint on w_{k}s, and \(\boldsymbol {\eta }\in \mathbb {R}^{h}\) is a vector of unknown parameters. For example, a constant decline during the lag interval is modeled by:
a moving average in the previous L period is modeled by (in the case of L=2):
and a polynomial smoothing proposed by Schwartz (2000) [11] and Rondeau et al. (2004) [14] is modeled by:
where s denotes the degree of the polynomial function. Other examples with nonlinear functions, such as splines, can be found in Gasparrini et al. (2010) [13]. These models have been popular for modeling air pollution [15] and temperatures [16].
Modeling steps, the selection of orders (p,q,l _{1},l _{2}) and f(), and creating counterfactual
We follow the Box–Jenkins threestage modeling strategy, including identification, estimation, and diagnostic checking [10]. At the identification stage, a visual inspection of the plots for the time series data allows us to check some important features, such as structural changes, outliers, and missing values, and ensure the stationarity of the time series data. Nonstationary time series invalidates the analyses with the ordinary ARIMA model. To check the stationarity, several tests, such as the Augmented Dickey–Fuller (ADF) test, Ljung–Box test, and Kwiatkowski–Philips–Schmidt–Shin test, can be used [9]. If the time series is judged as nonstationary, a common approach is to subtract successive observations—also known as differencing—to stationarize the time series data. By iteratively applying this testdifferencing approach, we can select the order d in Eq. (4). Once d is fixed, Y_{t} in Eq. (4) can be replaced with the differentiated Y_{t}.
The plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF) should be visually examined to identify the search range of the orders of the AR and MA parameters: 1) to decide an AR(p) model, the ACF should slowly decrease and PACF should cut off after lag p; 2) to decide an MA(q) model, the PACF should slowly decline and the ACF should cut off after lag q; and 3) to decide an ARMA(p,q) model, both the ACF and PACF should tail off.
At the estimation stage, the maximum likelihood method is used to estimate the regression parameters. The Akaike information criterion (AIC) or Bayesian information criterion (BIC) can be used to select the optimal order of (p,q) among the aforementioned search range defined by the ACF or PACF. In addition, regarding the search space defined by the possible combinations of (l_{1},l_{2}) and f(), they can be specified by the researcher based on the literature search or expert opinion; however, the sensitivity analysis for their choice or model fitting statistics such as AIC or BIC can be helpful. In “Application data analysis” section, we check the sensitivity of the results by changing the combination of (l_{1},l_{2}) and f(). The last step is to check the residuals of the selected model by visual inspection of the residual plot and by testing the presence of autocorrelation using methods such as the Ljung–Box test for white noise. If the autocorrelation is still judged to exist in the residuals, different AR and/or MA orders can be chosen.
Finally, once the regression parameters are estimated, the intervention effect can be estimated by calculating the differences between the observed data and counterfactual prediction. The counterfactual (i.e., Y_{t} in the absence of the intervention) can be created by substituting 0 into all the components in F_{t−k}s.
Application data analysis
Data description
In response to the global COVID19 pandemic, governments implemented largescale public health interventions, such as lockdowns and SoE declarations, to control the spread of the virus. To quantify the effect of such interventions, human mobility has been frequently measured in terms of tracing human contact in several places, such as restaurants and workplaces. Previous studies have illustrated that the SoE declaration can effectively reduce human mobility [17,18,19]. In this study, we use the human mobility index (HMI) in Japan published by Google [20]. The HMI represents the relative percentage changes in the daily number of visitors (or time spent) from the baseline period (i.e., the same day of the 5week period between January 3 and February 6, 2020) at six locations: retail and recreational places, grocery and pharmacy stores, public transportation (transit) stations, parks and outdoor spaces, workplaces, and residential areas. The HMI is available daily for all 47 prefectures in Japan. For a simple explanation of our method, we mainly use the HMI at workplaces in Tokyo from February 20 to May 19, 2020. In Tokyo, the first SoE was officially declared on April 7, 2020, although the media broadcast the event even before it was declared, which may have impacted human mobility [21, 22]. In this sense, the nominal intervention timing is clear, although the actual intervention (i.e., the first SoE) timing is unclear. In this section, we demonstrate that the proposed method can handle such cases. Lastly, we use the following covariates in X: the daily COVID19 testpositive rate, daily number of deaths, and daily (average) temperature.
Practical procedure of data analysis
The data are illustrated in Fig. 2A, where the observed HMI is plotted as a black solid line. To reduce the variation in HMI by day of the week, a sevenday rolling average is calculated in advance. The ADF test indicates that the original time series is nonstationary (p=0.571), and thus, the first difference of the time series data (Fig. 2B) is used to induce stationarity (the ADF test shows p=0.049). Therefore, Y_{t} is the first difference data, and d=0 is fixed. The ACF and PACF of the stationary (i.e., firstdifferenced) data are plotted in Fig. 2C and D, respectively. In the figures, the black solid bars above or below the blue dashed lines represent statistically significant autocorrelation with p<0.05. In Fig. 2C and D, we can check that autocorrelation does not exist after lag 1. This implies that the search range of the order for p and q might be around lag 1 [11]. We then search over a series of potential models for the best model with the lowest AIC by using the auto.arima() function in the forecast package in R. Each model is optimized using the maximum likelihood method.
To estimate the effect of the SoE on the HMI in Tokyo, the intervention effect is assumed to be distributed uniformly or normally over time, that is, f() in Eq. (5) is set to the pdf of uniform distribution or (truncated) normal distribution (truncated by l_{1} and l_{2}). In addition, for the constraint on the estimation of w_{k}s, both Eqs. (14) and (15) are used. Consequently, we have four types of ITS models by combining two f() and two constraints, as follows:

Normal1:
f() is a normal distribution N(0,1) that is truncated by l_{1}=6 and l_{2}=6 (i.e., the center is the day of the SoE (April 7, 2020) and truncated at April 1 and 13, 2020). In addition, the constraint for w_{k}s is the polynomial smoothing model Eq. (15), with s=3.

Normal2:
Similar to Normal1, but f() is truncated by different days, that is, f() is a normal distribution N(0,1) that is truncated by l_{1}=9 and l_{2}=3 (i.e., the center is the day of the SoE (April 7, 2020) and truncated at March 29 and April 10, 2020). Moreover, the constraint for w_{k}s is the polynomial smoothing model, Eq. (15), with s=3.

Normal3:
Similar to Normal2, but f() has a different normal distribution, that is, f() is a normal distribution N(−3,3) that is truncated by l_{1}=9 and l_{2}=3 (i.e., the center is 3 days before the SoE (April 4, 2020) and truncated at March 29 and April 10, 2020). Further, the constraint for w_{k}s is the MA model, Eq. (14).

Uniform:
f() is a uniform distribution with l_{1}=6 and l_{2}=6 (i.e., the center is the day of the SoE (April 7, 2020) and truncated at April 1 and 13, 2020). In addition, the constraint for w_{k}s is the MA model, Eq. (14).
The model with the lowest AIC selected by the algorithm is ARIMA(1,0,0) for all the above models. The residuals for all the models have no significant autocorrelation: all pvalues of the Ljung–Box test for white noise are more than 0.9 at six lags. As the data comprise the first difference (Fig 2B), we calculate it back into the original time series (Fig. 2A). To compare our method with the other approaches, the conventional ITS model (denoted by cITS in Figures and Tables) and the model proposed by Schaffer et al. (2021) [11], which is an ARIMAbased ITS model (denoted by ARIMA in Figures and Tables), are included.
Lastly, we validate our method in different COVID19 datasets by using different prefectures (Osaka and Ehime prefectures: Osaka is a populous and large prefecture and Ehime is a local and small prefecture in Japan), different HMI at “grocery and pharmacy store” and “public transportation (transit) stations”, and different timing of SoE (Osaka at April 7 and Ehime at April 16, 2020). The same procedures described above are used. Figure 3 and the Additional files include the detailed results of these validations.
Application results
Figures 2A and B (colored and dashed lines) indicate the observed data and counterfactual predictions in Tokyo by our ARIMAITSDL models, assuming the absence of the intervention. Dotted lines indicate the comparison groups: cITS (i.e., the conventional ITS model) and ARIMA (i.e., the model by Schaffer et al. (2021)). Table 1 provides more detailed values of the observed and estimated HMIs. It illustrates that the SoE on April 7, 2020, was successful in reducing the mobility of people: Normal1, Normal2, Normal3, and Uniform models show 392.42 (16.0%), 372.33 (15.0%), 388.61 (16.0%), and 392.20 (16.0%) (cumulative) reduction in HMI, respectively, after the SoE became effective in Tokyo. The difference between the observed and counterfactual peaks around the head of May 2020 for all models, Normal1, Normal2, Normal3, and Uniform models, show a reduction of 18.68 (31.1%), 16.66 (27.7%), 17.12 (28.5%), and 18.74 (31.2%), respectively, on May 3, 2020, which corresponds to the middle period of Japan’s long holiday season. In the comparison groups, the similar tendency are observed, while the degree of reduction in HMI are likely to be underestimated: cITS and ARIMA models show 60.15 (3.0%) and 319.21 (15.0%) (cumulative) reduction in HMI, respectively, after the SoE became effective in Tokyo. Importantly, our findings should only be valid for the study period (i.e., February 20 to May 19, 2020, in Tokyo).
Figure 3A and B indicate the validation results by using different COVID19 datasets. While only the result of cITS in Osaka doesn’t seem to estimate the counterfactual values well, the tendency is basically the same as in Tokyo (i.e., Fig. 2): compared to the counterfactual predictions made by our method, the conventional methods tend to underestimate the degree of reduction in HMI. The detailed values of the observed and estimated HMIs in the Additional files.
Discussion
ITS analysis is frequently used to quantify the effects of health interventions on health outcomes at the population level. The most popular formulation of the ITS analysis is the (Gauss–Markov type) linear regression model, Eq. (1) [1, 6, 23]. One of the key assumptions is that the residuals are independent and not correlated. However, this assumption is often violated in time series data. By incorporating dependencies between different time points, the ARIMA model is a possible solution to this problem. In addition, the timing of the intervention is assumed to be clear; however, the actual time at which the effect begins is not always clear in practice. For example, as we described in “Application data analysis” section, the SoE itself was issued on a certain date, T_{0}, but the actual influence was effective long before the issue of the SoE and distributed over time (i.e., [T_{0}−l_{1},T_{0}+l_{2}]), as the media, such as TV, announced its issuance in advance. Thus, the actual timing of the SoE is unclear. To address these issues, we proposed the ARIMA ITS model with the distributed lagged functional terms ARIMAITSDL. The lagged functional term is tailored to represent how the intervention effect is distributed before and after the nominal timing of the intervention (i.e., the date of the SoE). In addition to the distributed effect, another practical feature of our new model is that we can naturally model the cumulative effect with decaying parameter δ, which is explained in detail in “Detailed explanation on distributed lag functional terms” section. One possible difficulty highlighted by this abundance of choice (the orders (p,d,q), the distribution function f() with (l_{1},l_{2}), and the restriction form for w_{k}s) is how to choose the best combination. Another possible modeling techniques for the unclear timing of intervention would be the combination of fuzzy set theory and time series models, which have been extensively studied in the field of information science such as Chen (1996) and Singh (2021) [24, 25]. In addition, the fuzzy set theory has been frequently used for analyzing the COVID19 data [26, 27]. These papers point the way to our next research directions.
In “Application data analysis” section, we used AIC to select the orders (p,d,q), and f(), and (l_{1},l_{2}) were varied for the sensitivity check. However, a priori arguments and expert opinions may be helpful for this choice. A previous discussion concluded that the choice of the distributed lag terms should balance between sufficient complexity to capture detail and sufficient simplicity for interpretability from epidemiological or medical perspectives [28]. As we have no consensus on what is an “optimal” ARIMA model, sensitivity analyses and regression diagnoses such as residual analysis are particularly important to assess the robustness of the key conclusions. The R code, “fuzzyARIMAITS”, for the proposed method are provided in a GitHub repository (https://github.com/kingqwert/R/blob/master/ARIMAITS_DS/fuzzyARIMAITS.R) and will be hosted on the CRAN repository (https://www.rproject.org/) shortly, allowing others to apply our method easily.
Real data were examined with a detailed explanation of the analytical procedure to provide practical insights into the effect of the SoE on human mobility in Japan. We used four models consisting of two distributed lag functions (truncated normal and uniform distribution) and two restrictions on w_{k}s (polynomial and MA) and confirmed that they exhibit almost the same effect for reducing human mobility in several COVID19 datasets. Our results indicate that the SoE was successful in reducing the movement of people, at least during the study period (i.e., February 20 to May 19, 2020).
A limitation of this study is that our method was examined in only COVID19 datasets while using four different settings, two comparison methods, and models with potentially different formulations associated with the distributed lag terms to check the sensitivity of the results. We encourage the reevaluation of our approach using other datasets. Another limitation is that we assume that a stationary time series is available, which is the requirement of the AR model. The stationary assumption may be incompatible with time series data used in ITS analysis, where trends may change at certain points in time. However, in such a case, we can simply use ARIMA(0,0,0), and the idea of simultaneously using the ITS and distributed lag terms can still be valid. In this case, the model reduces to a simple form, such as Eq. (1), in which autocorrelation can be still modeled by including time as a covariate.
Conclusion
The ITS model has been a powerful study design for evaluating health intervention impacts, and its use has been increasing. The most common formulation for ITS, Eq. (1), is not always adequate, especially when the timing of the intervention is unclear. Our model, ARIMAITSDL, is a useful tool because it can account for such unclear intervention timing and distributed lag effect with autocorrelation and allows for flexible modeling of different types of impacts.
Availability of data and materials
The dataset generated during and/or analysed during the current study are available in https://www.google.com/covid19/mobility/?hl=en.
References
Bernal JL, Cummins S, Gasparrini A. Interrupted time series regression for the evaluation of public health interventions: a tutorial. Int J Epidemiol. 2017; 46(1):348–55.
Soumerai SB, Starr D, Majumdar SR. How do you know which health care effectiveness research you can trust? A guide to study design for the perplexed. Prev Chronic Dis. 2015;12:150187. http://dx.doi.org/10.5888/pcd12.150187.
Lopez Bernal J, Cummins S, Gasparrini A. The use of controls in interrupted time series studies of public health interventions. Int J Epidemiol. 2018; 47(6):2082–93.
Cook TD, Campbell DT, Shadish W. Experimental and Quasiexperimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin Harcourt; 2002.
Fretheim A, Zhang F, RossDegnan D, Oxman AD, Cheyne H, Foy R, Goodacre S, Herrin J, Kerse N, McKinlay RJ, et al.A reanalysis of cluster randomized trials showed interrupted timeseries studies were valuable in health system evaluation. J Clin Epidemiol. 2015; 68(3):324–33.
Jandoc R, Burden AM, Mamdani M, Lévesque LE, Cadarette SM. Interrupted time series analysis in drug utilization research is increasing: systematic review and recommendations. J Clin Epidemiol. 2015; 68(8):950–6.
Kontopantelis E, Doran T, Springate DA, Buchan I, Reeves D. Regression based quasiexperimental approach when randomisation is not an option: interrupted time series analysis. BMJ. 2015;350:h2750. https://doi.org/10.1136/bmj.h2750.
Wagner AK, Soumerai SB, Zhang F, RossDegnan D. Segmented regression analysis of interrupted time series studies in medication use research. J Clin Pharm Ther. 2002; 27(4):299–309.
Hamilton JD. Time Series Analysis. Princeton: Princeton university press; 2020.
Box GE, Jenkins GM, Reinsel GC, Ljung GM. Time Series Analysis: Forecasting and Control. Hoboken: Wiley; 2015.
Schaffer AL, Dobbins TA, Pearson SA. Interrupted time series analysis using autoregressive integrated moving average (arima) models: a guide for evaluating largescale health interventions. BMC Med Res Methodol. 2021; 21(1):1–12.
Zanobetti A, Schwartz J, Samoli E, Gryparis A, Touloumi G, Atkinson R, Le Tertre A, Bobros J, Celko M, Goren A, et al.The temporal pattern of mortality responses to air pollution: a multicity assessment of mortality displacement. Epidemiology. 2002; 13(1):87–93.
Gasparrini A, Armstrong B, Kenward MG. Distributed lag nonlinear models. Stat Med. 2010; 29(21):2224–34.
Rondeau V, Berhane K, Thomas DC. A threelevel model for binary timeseries data: the effects of air pollution on school absences in the southern california children’s health study. Stat Med. 2005; 24(7):1103–15.
Dominici F. Timeseries analysis of air pollution and mortality: a statistical review. Res Rep (Health Eff Inst). 2004;(123):3–27.
Anderson BG, Bell ML. Weatherrelated mortality: how heat, cold, and heat waves affect mortality in the united states. Epidemiol (Cambridge, Mass.) 2009; 20(2):205.
Arimura M, Ha TV, Okumura K, Asada T. Changes in urban mobility in sapporo city, japan due to the covid19 emergency declarations. Transp Res Interdiscip Perspect. 2020; 7:100212. https://doi.org/10.1016/j.trip.2020.100212.
Schlosser F, Maier BF, Jack O, Hinrichs D, Zachariae A, Brockmann D. Covid19 lockdown induces diseasemitigating structural changes in mobility networks. Proc Natl Acad Sci. 2020; 117(52):32883–90.
Yabe T, Tsubouchi K, Fujiwara N, Wada T, Sekimoto Y, Ukkusuri SV. Noncompulsory measures sufficiently reduced human mobility in tokyo during the covid19 epidemic. Sci Rep. 2020; 10(1):1–9.
Google. COVID19 Community Mobility Reports. https://www.google.com/covid19/mobility/?hl=en. Accessed 18 May 2022.
Nagata S, Nakaya T, Adachi Y, Inamori T, Nakamura K, Arima D, et al. Mobility change and covid19 in japan: Mobile data analysis of locations of infection. J Epidemiol. 2021;31(6):387–91. https://doi.org/10.2188/jea.JE20200625.
Yoneoka D, Tanoue Y, Kawashima T, Nomura S, Shi S, Eguchi A, Ejima K, Taniguchi T, Sakamoto H, Kunishima H, et al.Largescale epidemiological monitoring of the covid19 epidemic in tokyo. Lancet Reg HealthWest Pac. 2020; 3:100016.
Hategeka C, Ruton H, Karamouzian M, Lynd LD, Law MR. Use of interrupted time series methods in the evaluation of health system quality improvement interventions: a methodological systematic review. BMJ Glob Health. 2020; 5(10):e003567.
Chen SM. Forecasting enrollments based on fuzzy time series. Fuzzy Sets Syst. 1996; 81(3):311–9.
Singh P. Fqtsfm: A fuzzyquantum time series forecasting model. Inf Sci. 2021; 566:57–79.
Singh P, Bose SS. Ambiguous dmeans fusion clustering algorithm based on ambiguous set theory: Special application in clustering of ct scan images of covid19. KnowlBased Syst. 2021; 231:107432.
Singh P, Bose SS. A quantumclustering optimization method for covid19 ct scan image segmentation. Expert Syst Appl. 2021; 185:115637.
Armstrong B. Models for the relationship between ambient temperature and daily mortality. Epidemiology. 2006;17(6):624–31. https://doi.org/10.1097/01.ede.0000239732.50999.8f.
Acknowledgements
None.
Funding
This work was partially supported by JST, PRESTO Grant Number JPMJPR21RC, Japan, and by the GrantsinAid for Scientific Research (21K13328, 19K24340, 22K17859).
Author information
Authors and Affiliations
Contributions
D.Y. led the study. D.Y., T.K., and Y.T. conceived and designed the study. D.Y., T.K., Y.T., S.N., and A.E. analysed and interpreted the data. All authors took responsibility for the integrity of the data and the accuracy of the data analysis. All the authors made critical revisions to the manuscript for important intellectual content and gave final approval of the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
Observed and estimated human mobility index (HMI) in osaka and Ehime.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Yoneoka, D., Kawashima, T., Tanoue, Y. et al. Distributed lag interrupted time series model for unclear intervention timing: effect of a statement of emergency during COVID19 pandemic. BMC Med Res Methodol 22, 202 (2022). https://doi.org/10.1186/s12874022016621
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874022016621
Keywords
 Interrupted time series
 Distributed lag
 Unclear intervention timing
 Autoregressive integrated moving average model
 COVID19
 Human mobility index