 Research article
 Open Access
 Published:
Accounting for confounding by time, early intervention adoption, and timevarying effect modification in the design and analysis of steppedwedge designs: application to a proposed study design to reduce opioidrelated mortality
BMC Medical Research Methodology volume 21, Article number: 53 (2021)
Abstract
Background
Beginning in 2019, steppedwedge designs (SWDs) were being used in the investigation of interventions to reduce opioidrelated deaths in communities across the United States. However, these interventions are competing with external factors such as newly initiated public policies limiting opioid prescriptions, media awareness campaigns, and the COVID19 pandemic. Furthermore, control communities may prematurely adopt components of the intervention as they become available. The presence of timevarying external factors that impact study outcomes is a wellknown limitation of SWDs; common approaches to adjusting for them make use of a mixed effects modeling framework. However, these models have several shortcomings when external factors differentially impact intervention and control clusters.
Methods
We discuss limitations of commonly used mixed effects models in the context of proposed SWDs to investigate interventions intended to reduce opioidrelated mortality, and propose extensions of these models to address these limitations. We conduct an extensive simulation study of anticipated data from SWD trials targeting the current opioid epidemic in order to examine the performance of these models in the presence of external factors. We consider confounding by time, premature adoption of intervention components, and timevarying effect modification— in which external factors differentially impact intervention and control clusters.
Results
In the presence of confounding by time, commonly used mixed effects models yield unbiased intervention effect estimates, but can have inflated Type 1 error and result in under coverage of confidence intervals. These models yield biased intervention effect estimates when premature intervention adoption or effect modification are present. In such scenarios, models incorporating fixed interventionbytime interactions with an unstructured covariance for interventionbyclusterbytime random effects result in unbiased intervention effect estimates, reach nominal confidence interval coverage, and preserve Type 1 error.
Conclusions
Mixed effects models can adjust for different combinations of external factors through correct specification of fixed and random time effects. Since model choice has considerable impact on validity of results and study power, careful consideration must be given to how these external factors impact study endpoints and what estimands are most appropriate in the presence of such factors.
Background
Stepped wedge designs (SWD) are a unidirectional crossover design in which clusters switch from the control to the intervention condition at varying time points. The first phase is usually a baseline period in which no clusters receive intervention. During the second phase, clusters are randomly assigned to intervention at preselected time points until all clusters receive the intervention. The third phase corresponds to the followup period in which all clusters receive the intervention. An example of a SWD is provided in Fig. 1 and discussed in “Motivating example: HEALing communities study” section.
SWDs can be useful in public health settings where rolling out the intervention to all clusters at once is infeasible; they also ensure that all clusters in the study eventually receive the intervention [1, 2]. The SWD is particularly suitable for implementing and evaluating complex health interventions [3–5]. The current COVID19 pandemic and opioid epidemic provide settings in which such designs may prove useful; effective combinations of interventions are needed as soon as possible, but rolling them out to each community in a short time period may not be feasible. In this paper, we use the design of SWDs to combat the opioid epidemic between 2019 and 2022 as an illustrative example. In 2019, the National Institute on Drug Abuse (NIDA) awarded Kentucky and Ohio roughly $100 million each to implement an integrated set of interventions in highrisk communities using a SWD, with the objective of reducing opioid overdose deaths by 40% over a threeyear period [6–10].
Like most clinical trial designs, SWDs may be impacted by external factors that influence the primary outcomes. The term ”rising tide” has been used to describe the situation when there is a drift towards improvement in the outcomes due to factors external to the study [11]. A rising tide may be seen in the current opioid epidemic, where the severity of this crisis has led to new public policies, media awareness campaigns, and external interventions that will improve outcomes in concurrence with any proposed interventions. For example, public policies were implemented to limit opioid prescriptions in both Kentucky and Ohio in the summer of 2017 [12]. In addition, the Center for Disease Control and Prevention launched intensive media awareness campaigns on the dangers of opioids in these states in September of 2017 [13]. In the current COVID19 pandemic, interventions rolled out at the community level will compete with social distancing measures, novel treatments, and other interventions aimed at improving patient care. Such external factors may confound estimates of the intervention effect estimate in SWDs. This occurs when (i) factors external to the study (e.g., new public health policy) influence the primary outcomes over time, and (ii) the proportion of communities exposed to the intervention also increases with calendar time. This situation has been referred to as confounding by calendar time; [2, 14] an example of such confounding is the rising tide described above.
Failure to account for confounding by time might result in severely biased intervention effect estimates [2, 3, 14–20], and lead to both Type 1 and Type 2 errors. Hussey and Hughes suggested the use of mixed effects models for analyzing data from SWD [17]. To account for potential confounding by time, they recommend the incorporation of fixed time effects. However, models that incorporate secular trends common to all clusters may not be appropriate in this setting, because only some of the communities may be exposed to the external factors. Furthermore, the impact of these external factors on the outcome will likely differ in each community.
For these reasons, Girling and Hemming, Hooper et al., and Hemming et al. have suggested incorporating random clusterbytime effects into the mixed effects models [14, 18, 21]. This class of models captures confounding by time through these random effects. This is especially useful when the timing and level of exposure to external factors are unknown. However, these models implicitly assume that the random effect variance is the same for all clusters across all time points, and that the random time effects are independent within each cluster. Kasza et al. propose models with more general withincluster correlation structures [22]. These models are more applicable to the scenarios discussed here, since outcomes within a community are more likely to be similar for time periods before or after exposure to external factors. Furthermore, since exposure to external factors may increase with time, variation in the outcome may change with time as well.
The models described above assume that the random effects are identically distributed across clusters. This assumption may not appropriate in settings where the impact of external factors systematically differs between intervention and control communities. Kasza et al showed that the random effect covariance structure can have an important impact on the sample size and power [22], and that misspecification of this covariance structure can lead to biased estimates of the intervention effect [23].
Given the discussion above, we consider three mechanisms through which timevarying external factors can impact SWD studies: a) confounding of the effect of interest by time, b) inducing or facilitating noncompliance, and c) timevarying effect modification. The models we discuss below can accommodate all of these mechanisms, but interpretation of results requires consideration of the way in which external factors are operating. Noncompliance might arise, for example, through exposure to statewide policies and media awareness campaigns that cause control communities to adopt available components of the intervention prior to the scheduled rollout time. Premature adoption of components of the intervention may be seen as a form of rising tide; but its impact in this case arises through the effect of time on exposure to intervention. This differs conceptually from confounding by time in which the outcome, measured over time, is impacted by confounding factors that are unrelated to the intervention itself. Differential impact of external factors on intervention compared to control communities may be viewed as timevarying effect modification; we return to this point in the discussions below.
This paper discusses the issues described above in the context of a proposed SWD to reduce opioidrelated mortality in South Carolina, and shows these issues can be addressed through appropriate choice of mixed effects models, with regard to both fixed time effects and random effect covariance structure. Our objectives are to (1) consider the implications of assumptions regarding the impact of external factors for choice of mixedmodels – with a focus on our motivating example; (2) extend these models to accommodate different combinations of external factors; (3) conduct an extensive simulation study to examine the performance of mixed effects models under different model assumptions. Performance is assessed based on the bias, confidence interval coverage, power, and Type 1 error of the intervention effect estimate. We describe more fully what this estimate represents below. An important component of our simulation study is the generation of latent external factors from different distributions than that assumed by the model, i.e. normal distribution for random effects. This allows for investigation of the robustness of our methods to violations of model assumptions.
The outline of this paper is as follows. “Motivating example: HEALing communities study” section introduces the motivating example. In “Methods” section, we examine existing mixed effects models for SWD, discuss their limitations, and introduce alternative models that improve robustness to different combinations of external factors. “Simulation study” section conducts an extensive simulation study to examine the adequacy of these mixedeffects models under different scenarios regarding external factors. Discussion, extensions, and concluding remarks are provided in “Discussion” section.
Motivating example: HEALing communities study
The HEALing Communities Study: Developing and Testing an Integrated Approach to Address the Opioid Crisis[7] is used as an example to illustrate confounding by time in SWD. The purpose of this initiative was to develop and integrate a set of evidence based interventions using cluster randomized trials to reduce opioid overdose fatalities by 40% over a 3year period in communities across the United States. In response to this research funding announcement, our research team proposed a SWD to implement a comprehensive external facilitation intervention in 18 of South Carolina’s counties that were hit hardest by the opioid crisis.
The proposed SWD design is displayed in Fig. 1. Each cluster consists of all individuals in a given county. Each time interval corresponds to a 3month study period. The preintervention phase consists of two time periods between (study) month 0 and month 6, in which all clusters are in the control condition. The rollout phase consists of nine time periods between months 6 through 33, where two clusters are randomly assigned to receive the intervention at the beginning of each time period. The postintervention phase, in which all clusters receive the intervention, occurs between months 33 through 39. Data pertaining to the communities considered in our proposal are provided in Supplementary Table S1. The outcome, recorded at the end of each time period, consists of the total number of opioid overdose deaths in each cluster during the 3month time interval.
The proposed external facilitation intervention included the following components: 1) Integration of screening, intervention, and referral to treatment within health care settings; 2) implementation of programs and providers prescribing medicationassisted treatment and linkage to such treatment for people with opioid use disorder (OUD); 3) implementation of evidencebased school and communitybased OUD prevention programs; and 4) increasing availability and use of naloxone by first responders and the community. Given the amount of effort and resources currently directed to the fight against the opioid epidemic, there is potential for other events to affect the outcome. For example, in 2018 Executive Order No. 201743 was passed in South Carolina; this order set a 5day limit for certain opioid prescriptions [24]. Also in 2018, the South Carolina Division of Alcohol and Other Drug Abuse Services rolled out a media campaign to raise community awareness of opioid addiction [25]. This campaign included digital, social, and traditional media tactics, and was intended to cover all counties in South Carolina. These external factors are part of a rising tide of interventions and are expected to reduce the opioidrelated death count over time. Failing to account for this rising tide in an analysis will cause an upward bias in the estimation of the intervention effect. Similarly, an influx of synthetic opioids into the population will likely be associated with an increase in the death rate over time. A failure to account for this will cause underestimation of the intervention effect and may lead to Type II error. Another possibility is that control communities prematurely adopt components of the intervention before the scheduled rollout time. For example, the opioid overdose reversal medication naloxone may become widely available in control communities prior to the scheduled rollout time. Therefore premature adoption of a successful intervention will improve the outcome in control communities and attenuate the estimate of the intervention effect if unaccounted for.
Methods
We first introduce some notation. We denote Y_{ij} as the summary measure of the outcome for cluster i during time period j, i=1,...,N and j=1,...,n, where N denotes the number of clusters and n denotes the number of time periods in the study, which are assumed to be common to all clusters. In the motivating example, Y_{ij} is an aggregate count of opioid deaths in county i between months j and j+1. Using summary measures for the outcome in each cluster has implications when estimating the intervention effect under certain distributional assumptions for the outcome. We discuss this further in “Estimation” section.
We assume that the expected outcomes, denoted by μ_{ij}=E[Y_{ij}], come from a generalized linear mixed effects model (GLMM) with link function g. In the motivating example we consider μ_{ij}=E[Y_{ij}/O_{i}], where Y_{ij} assumes a Poisson distribution (g= log link), and O_{i} is an offset for the population size of cluster i. We also assume that all clusters receive the full intervention effect immediately after the scheduled implementation, i.e., that the intervention effect does not change with time. We denote the intervention effect by θ, and set the corresponding design matrix X_{ij}=1 if clusters i receives intervention at time j, and 0 otherwise.
Random intercept model to adjust for confounding by time
To adjust for confounding by calendar time, Hussey and Hughes recommended incorporation of a time effect in the GLMM [17]:
where α is the intercept, θ is the intervention effect, β_{j} is the discrete time effect, and \(b_{0i}\overset {\text {iid}}{\sim } N(0,\sigma _{b_{0}}^{2})\) for i=1,...,N are the random intercepts for each cluster.
In some cases, the timing and location of external factors are known in each cluster and their effects can be modeled. Often, investigators are unaware of all factors affecting the outcome. Throughout this paper we assume exposure to these factors are unknown. By incorporating only a single fixed effect for each time step, the standard Hussey and Hughes model requires that the effects of time are common to all clusters, and that the correlation between any two observations in the same cluster are independent of the time step. We note that with proper specification of the time effect, inference based on this model adjusts for such confounding even when it cannot be measured. Therefore, the resulting intervention effect estimate is unbiased. However, correct specification of the random effect structure is necessary for optimal precision.
Random period models to adjust for confounding by time
As misspecification of the time effects can lead to biased estimates of the intervention effect and its standard error, random clusterbytime period interaction effects have been incorporated in models [14,18,21]. This formulation, referred to as the Hooper/Girling model [22,23], allows the random intercept for each cluster to vary by time period. This model is discussed in “Random clusterbydiscrete time effect (uncorrelated, with single variance)” section below.
Random clusterbydiscrete time effect (uncorrelated, with single variance)
where \(b_{ij}\overset {\text {iid}}{\sim } N(0,\sigma _{b}^{2})\) are the timevarying random intercepts for each cluster i at time period j, i=1,...,N and j=1,...,n. These clusterspecific random effects are intended to capture the effects of confounding factors on the outcome by allowing unique secular trends for each cluster. We note that throughout this paper, we may use the term ”confounding factors” to indicate the presence of external factors that do not differentially impact intervention and control clusters.
There are limitations to the model above that arise from the distributional assumptions regarding the random effects b_{ij}: Constant variance over time, independence of the random effects, and identical correlation between two observations in the same cluster between any two time periods. If exposure to confounding factors increases over time, the variance of the random effect may increase as well. The assumption of independence of the random effects may be violated in the scenario where confounding factors impact clusters at all time intervals after exposure. Such a scenario could lead to greater similarity of random effects during time intervals that are entirely before or entirely after exposure than for random effects for a mix of intervals that are both before and after exposure. Furthermore, while the Hooper/Girling model allows both withincluster and withinperiod variation, it imposes a constant correlation across time (for observations in the same cluster).
Random clusterbydiscrete time effect (unstructured covariance)
To account for the limitations described above, we propose an unstructured covariance for the random clusterbydiscrete time interaction terms:
where \(\boldsymbol {b}_{i}^{*}=(b_{i1}^{*},...,b_{in}^{*})\overset {\text {iid}}{\sim } N(\boldsymbol {0},\boldsymbol {\Sigma _{b^{*}}})\) for i=1,...,N. Similar to model 2, the clusterspecific random effects \(b^{*}_{ij}\) allow unique secular trends for each cluster. However, the unstructured covariance matrix \(\phantom {\dot {i}\!}\boldsymbol {\Sigma _{b^{*}}}\) imposes no restrictions on the variance across time periods nor on the correlation between the latent time effects within each cluster, albeit with the assumption that the correlation structure is the same for all clusters. While model 3 imposes fewer restrictions than model 2, it requires the estimation of \(\frac {(n+1)\times n}{2}\) covariance parameters, which may greatly reduce the power to detect an intervention effect and may lead to issues of identifiability of regression parameter estimates.
Random cluster by linear time effect
An alternative to models 2 and 3 is to include a random slope for time in each cluster:
where \(\phantom {\dot {i}\!}(b_{0i},b_{1i}) \sim N(0,\boldsymbol {\Sigma _{b_{0},b_{1}}})\) and \(\boldsymbol {\Sigma _{b_{0},b_{1}}}\) is the 2×2 covariance matrix for (b_{0i},b_{1i}),i=1,...,N, and t_{j} is the time between the study start date and the beginning of time period j. The random effects b_{1i} allow for unique linear time trends in each cluster. Model 4 assumes the variance in the outcome changes monotonically with time and imposes restrictions on the correlation between observations within a cluster, and is therefore more restrictive than model 3.
Random groupbyperiod models to adjust for confounding by time and early adoption or timevarying effect modification
The models described in “Random period models to adjust for confounding by time” section are insufficient for the setting where external factors differentially impact intervention and control clusters. Examples of such scenarios are provided in Table 1. Events leading to premature adoption of intervention components by control clusters  a form of intervention noncompliance  will differentially impact intervention and control clusters. We refer to this scenario as early adoption, and assume this is unbeknownst to the investigator. To model this scenario, we need to include group ×time interaction terms, i.e., different fixed and random effects for control and intervention clusters. Unlike group ×time interaction models for timevarying treatments, these models accommodate situations in which secular trends systematically differ between intervention and control clusters.
In models 5 through 7, we incorporate a fixed discrete time effect, β_{j}, that is common to all clusters. Because incorporating a groupbydiscrete time interaction may lead to loss of power and potentially also to nonidentifiability, we include a fixed linear time effect in the control group by incorporating the term \(\phantom {\dot {i}\!}\gamma \times t_{j}\times 1_{\{X_{ij}=0\}}\) in models 5 through 7. In this setting, incorporation of a linear time effect in the control group models the effect of early adoption of a beneficial intervention on overdose death rates as a monotonic decrease with time.
We note that these same set of models would also be appropriate for the setting of timevarying effect modification. An example would be changes in insurance policy that either enhance the effect of the intervention (because access to opioid treatments that are not part of the intervention is reduced in control clusters) or reduce it (because such access is increased in control clusters). Both scenarios can be accommodated by the models below. In these models, the estimate of θ provides the intervention effect estimate, and γ captures the effect of early adoption. When timevarying effect modification is present, the parameter γ captures the difference in the effect of external factors between intervention and control groups. We discuss these models below.
Random clusterbyinterventionbydiscrete time effect (uncorrelated, with single variance for each group)
where γ×t_{j} is the difference in the time effect between control and intervention clusters at time period j, and \(c_{ij}\sim N(0,\sigma _{c}^{2})\) is a random time effect for control cluster i at time period j for i=1,...,N and \(j=1,..,t^{I}_{i}\), where \(t^{I}_{i}\) is the time period in which the intervention was scheduled to be rolledout to cluster i. For control clusters that prematurely adopt intervention components, the clusterspecific random effects (c_{ij}) would be expected to be larger (in magnitude) than those that do not. Model 5 is an extension to Model 2 (Hooper/Girling), in which the variance of the clusterbytime period random effects systematically differ between intervention and control clusters. This model is subject similar limitations as those discussed in “Random clusterbydiscrete time effect (uncorrelated, with single variance)” section.
Random cluster byinterventionbydiscrete time effect (unstructured covariance for each group)
To account for the limitations of model 5, we propose allowing an unstructured covariance for the random clusterbydiscrete time interaction terms in each group:
where \(\boldsymbol {c}_{i}^{*}=(c_{i1}^{*},...,c_{i,t^{I}_{i}}^{*})\overset {\text {iid}}{\sim } N(\boldsymbol {0},\boldsymbol {\Sigma _{c_{i}^{*}}})\) for i=1,...,N. Here \(\phantom {\dot {i}\!}\boldsymbol {\Sigma _{c_{i}^{*}}}\) consists of the first \(t^{I}_{i}\) rows and columns of \(\phantom {\dot {i}\!}\boldsymbol {\Sigma _{c^{*}}}\), where \(\phantom {\dot {i}\!}\boldsymbol {\Sigma _{c^{*}}}\) is the unstructured covariance matrix for a control cluster scheduled to receive the intervention during the final period of the rollout phase. Similar to model (5), the clusterspecific random effects, \(c^{*}_{ij}\), are intended to capture the effects of premature exposure to intervention components. This is an extension of model 3, which assumes the random clusterbydiscrete time effects shared a common (unstructured) covariance matrix. While the random effect covariance structure in model 6 has less restrictions than in model 5, it requires the estimation of \(\frac {(n+1)\times n + (n_{\tau }1)\times (n_{\tau }2)}{2}\) covariance parameters and is subject to similar limitations as discussed in “Random clusterbydiscrete time effect (unstructured covariance)” section. Here we define n_{τ} as the number of time periods prior to the followup phase of the study.
Random cluster byinterventionbylinear time effect
Similar to model 4, we can impose a random slope for time in each group.
where \(\phantom {\dot {i}\!}(c_{0i},c_{1i}) \sim N(0,\boldsymbol {\Sigma _{c_{0},c_{1}}})\) and \(\phantom {\dot {i}\!}\boldsymbol {\Sigma _{c_{0},c_{1}}}\) is the 2×2 covariance matrix for (c_{0i},c_{1i}),i=1,...,N. The clusterspecific random effects c_{1i} allow for unique linear time trends in each control cluster and are intended to capture the effects of premature adoption of intervention components.
Fixed effects for linear time in all clusters
To further limit loss of power and potential nonidentifiability due to the large number of parameters, we replace the fixed discrete time effect in models 5 through 7, β_{j}, with a fixed linear time effect, β×t_{j}, in models 8 through 10. This strategy may be useful in situations when the combined effects of external factors result in monotonic changes in the outcome over time (e.g., rising tide). For example, the implementation of a new policy intended to reduce opioid prescriptions in conjunction with media awareness campaigns may decrease the level of opioid overdose deaths with time.
Estimation
The estimands of interest in our setting may depend on the goals of the study and the nature of the timevarying external factors that impact the study. In the setting where only confounding by time is of concern, the causal estimand is the average (across clusters) intervention effect on the outcome of interest– which is assumed to be fixed. Unbiased estimation is made possible by inclusion of a fixed time effect as described above. In the setting of early adoption, the causal estimand of interest to investigators might also be the average intervention effect (compared to the control condition with no early adoption), but unbiased estimation is only possible if either we can accurately model the effect as a function of time or can measure the amount of early adoption (we return to this point in the discussion). Alternatively, investigators might be interested in the randomized intervention effect as a function of time; in the presence of early adoption, this effect would likely wane over time. Finally we consider the setting where timevarying external factors are effect modifiers. Here investigators might, once again, be interested in a causal estimand that is the average effect of intervention as a function of time. As before, the effect of the external factors must be modeled in order to obtain unbiased estimates of this estimand. We note that although spline and other flexible models might be used, we considered only the parametric models described above for inference. To investigate their robustness, we use different models for data generation (the mechanism of which is generally unknown) than for inference.
The models for inference in “Methods” section can be fit by specifying the Poisson family in the glmer function (package: lme4) in R [26]. Of note, when the outcome is a summary measure for each cluster (as is the case in our motivating example), we ran into estimation difficulties when specifying random clusterbydiscrete time effects when the outcome distribution was assumed to be Gaussian. This is due to incorporation of a residual term for each cluster for each time period, which yields unidentifiable random effects. This issue does not arise under Poisson distributional assumptions, since the variance is directly proportional to the mean and thus residual terms are not estimated. To accommodate the Gaussian distribution assumption in this scenario, multiple communities per cluster [14,18,21] or multiple intervals per time period would be needed. We note that the approach described above can be generalized to any outcome that can be modeled as arising from an exponential family distribution.
In the models above, the fixed time effects β_{j} and β are modeled to capture confounding by time. The fixed time effects γ captures the effect of early adoption or timevarying effect modification. These effects are all identifiable because the design matrices are full rank. Without additional assumptions, however, the above models cannot distinguish among the effects of external factors, early intervention adoption, or of other timevarying factors. Rather, they model temporal trends that may be a consequence of these factors; such models allow for assessment of the impact of such trends on bias, power, and Type 1 error. When used in analysis, our simulation studies demonstrate that these models can reduce bias in estimation of intervention effects, in the settings appropriate for their use.
Simulation study
We conduct a simulation study to investigate the impact of external factors and noncompliance on the bias, coverage probability of 95% confidence intervals, power, and Type 1 error of the intervention effect estimate. The data are simulated based on the motivating example described in “Motivating example: HEALing communities study” section. In this setting, we have N=18 clusters and n=13 time periods. The intervention is rolled out to each cluster according to the time line provided in Fig. 1. In all data generating models, the outcomes Y_{ij} are simulated from a Poisson distribution, and represent the number of opioid overdose deaths in cluster i during time period j. Here g is the log link, and the population size of cluster i, O_{i}, is included as an offset in all models. That is, g(μ_{ij})=log(E[Y_{ij}]/O_{i}). The intercept α is set to 10 and the standard deviation of the random intercept, \(\sigma _{b_{0}}\), is set to 0.30. These numbers were determined using the opioid death counts in the 18 South Carolina clusters between 2016 and 2018. The data are provided in Supplementary Table S1. The intervention effect θ is set to log(0.6), which represents the target 40% reduction in opioid overdose deaths due to intervention (Motivating example: HEALing communities study” section).
Simulation scenarios
We apply models 1 through 10 under four general scenarios: (1) standard, (2) confounding, (3) early adoption, and (4) confounding + early adoption or effect modification. We set the number of simulations for each scenario at 500. The data generation process for each scenario is summarized in Table 2, and is described in more detail below. It is important to distinguish between the data generation models described in this section, which are used to simulate the data, and the analysis models (for inference) described in “Methods” section, which are used to analyze the data. In these simulations, we intentionally generate the data from different models than those used for analysis. For example, we simulate the effect of confounding due to a rising tide by randomly selecting communities exposed to the underlying external event at each time point, and for each community, randomly generating the impact of the confounding factors on outcomes at each time point. This allows for investigation of the robustness of the proposed analysis models to both the parameterization of the fixed and random effects and the underlying processes (e.g., external factors, early adoption, etc.) that cause confounding, noncompliance, and/or effect modification.
In the first scenario (standard) we assume no confounding, early adoption, or effect modification is present; all clusters receive the intervention during the scheduled time period and no external factors influence the outcome. The data are generated according to scenario 1 in Table 2. The data generation model for scenario 1 is the same as (analysis) model 1 defined in “Random intercept model to adjust for confounding by time” section, with β_{j}=0 for j=1,...,n.
In the second scenario, all clusters not currently exposed to confounding factors by time period j have a 1 in N chance of exposure. Once exposed, each cluster continues to be exposed through the remainder of the study period. This is intended to reflect the situation where confounding factors have longlasting effects, such as new public policies limiting opioid prescriptions and media awareness campaigns. The number of clusters exposed to confounding factors at each time period j, \(n^{\prime }_{j}\), is simulated from a Binomial(Nj′,1/N) distribution, where \(n^{\prime }_{j}\) is the total number of clusters unexposed to the confounding factors prior to time period j.
Once exposed, the effect of confounding factors on the outcome in cluster i is set to vary uniformly at each time period j. Specifically, we let \(t^{\prime }_{i}\) denote the time period in which cluster i is exposed to confounding factors. We simulate β_{ij} under 2 settings. For j≥ti′, we simulate β_{ij}∼unif[−1,0], corresponding to a positive effect of confounding factors on the outcome opioid overdose deaths (scenario 2.1). This is intended to represent a rising tide of events aimed at improving outcomes. We set β_{ij}=0 for j<ti′, indicating that cluster i has yet to be exposed to confounding factors. To represent a negative impact of confounding factors on the outcome, we simulate β_{ij}∼unif[0,1] for j≥ti′ and β_{ij}=0 for j<ti′ (scenario 2.2).
In the third scenario (early adoption), the number of clusters prematurely adopting intervention components during each time period j, \(n^{*}_{j}\), is simulated from a \(Binomial\left (N^{*}_{j},\frac {NN^{*}_{j}+1}{2\times N}\right)\) distribution. Here \(n^{*}_{j}\) is the number of control clusters which have not adopted any intervention components by time period j. This set up reflects the nature of SWD, where the number of control clusters at risk for early adoption decreases with time as more clusters crossover to the intervention group by design. This formulation allows the probability of exposure to intervention components increase at each time point j (for an unexposed cluster). This situation is feasible in certain settings, such as when intervention components become more widely available with time, thus increasing the probability of exposure for a control cluster.
For all time periods j in which a cluster is subjected to early adoption, the magnitude of the early intervention adoption effect is set to vary uniformly. Denote \(t^{*}_{i}\) as the time period of early adoption for cluster i and denote \(t^{I}_{i}\) as the time period in which the intervention was scheduled to be rolledout to cluster i. We simulate θ_{ij}∼unif[θ,0] for \(j=t^{*}_{i},...,t^{I}_{i}1,\) and θ_{ij}=0 otherwise. Thus the maximum effect of early intervention adoption on the outcome for exposed control clusters is set to θ. In this scenario, the event (i.e., early adoption) has a positive effect on the outcome in the control group only.
In the fourth scenario, the effects of confounding factors and early adoption are simulated in the same manner as in scenario 2 and scenario 3, respectively. In scenario 4.1, confounding factors have a positive effect on the outcome. In scenario 4.2, confounding factors have a negative effect on the outcome. The negative impact of confounding factors on the outcome is partially offset by the positive impact of early adoption on the outcome (for control communities) in scenario 4.2. The data generation model for timevarying effect modification is similar to the generation model for simultaneous confounding and early adoption described in scenario 4. Here, the parameters β_{ij} would be interpreted as the effects of external factors on the outcomes in the intervention communities, and θ_{ij} would be interpreted as the differences in the effects of external factors between intervention and control communities.
Results
Simulation results are presented in Table 3. Under scenario 1 (standard), all models yield unbiased estimates of the intervention effect, reach nominal confidence interval coverage rates, and preserve Type 1 error. Model 1, which does not include any random effects for time, yields unbiased estimates of the intervention effect in scenario 2 (confounding), where external factors do not differentially impact control and intervention clusters. However, coverage probabilities are below the nominal level of 0.95 and Type 1 error is inflated. Model 1 is heavily biased in scenarios 3 (early adoption) and 4 (confounding & early adoption), where external factors differentially impact intervention and control clusters.
Models 2 through 4, which include random clusterbytime effects, yield unbiased estimates of the intervention effect when external factors do not differentially impact control and intervention clusters (i.e., scenarios 1 and 2). Model 3, which assumes an unstructured covariance for the random clusterbytime interactions, performs the best with regard to coverage probabilities of the 95% confidence intervals and Type 1 error preservation. Model 2, which assume a single variance for the random clusterbytime interactions (Hooper/Girling model), perform slightly worse on these metrics. Model 4, which assume a random slope for time for each cluster, has the highest inflation in Type 1 error with coverage probabilities well below the nominal level of 0.95. Models 2 through 4 are overpowered for scenarios 1 and 2. These models perform poorly when early adoption is present in scenarios 3 and 4 (i.e., when external factors differentially impact intervention and control clusters). The estimated intervention effect is reduced by 32% to 34.3% in these scenarios.
The performance of models with and without interventionbytime interactions is compared in Fig. 2. Models without interventionbytime interactions are displayed in the left column, where the blue shapes correspond to models 2 through 4. For comparison, we include models which replace the discrete main effect for (fixed) time in models 2 through 4, β_{j}, with the linear fixed effect β×t_{j}. These models are labeled by the red shapes in the left column of Fig. 2. The models in the right column of Fig. 2 include fixed and random interventionbytime interactions. These models correspond to models 5 through 10.
Models 5 through 10 perform well in all simulation scenarios. When no confounding, early adoption, or effect modification is present (scenario 1), these models yield little bias, reach nominal coverage probabilities for 95% confidence intervals, and preserve Type 1 error. With the exception of model 10, all models have less than 10% bias when only confounding is present (scenario 2). These models generally reach nominal confidence interval coverage probabilities and preserve Type 1 error. The exceptions are models 7 and 10, which include random clusterbygroupbylinear time interaction effects.
When external factors differentially impact intervention and control clusters due to early adoption (scenarios 3 and 4), models 5 through 10 greatly reduce the bias in the intervention effect estimate compared to models which do not include interventionbytime interactions. Models 6 and 9, which assume an unstructured covariance for the random clusterbygroupbydiscrete time interactions, generally yield the lowest bias, while reaching nominal coverage probabilities of the 95% confidence intervals and preserving Type 1 error. Models 5 and 8 (Hooper/Girling models) perform slightly worse on these metrics. Models 7 and 10 have the highest inflation in Type 1 error and lowest coverage probabilities.
The choice of a discrete or linear term for the fixed time effect does not impact power in models without interventionbytime interactions. This result is expected given the findings of Grantham et al. [27]. For models which include interventionbytime interaction terms (i.e., models 5 through 10), the strongest correlate of power is the fixed effect for time. Models with discrete time effects, which incorporate a parameter for each time period, have much lower power compared to models which incorporate a single parameter corresponding to a linear time effect. For a given fixed time effect, models with a random slope for time achieve the highest power (model 7 among fixed discrete time effects; model 10 among fixed linear time effects). Models which assume an unstructured covariance for random clusterbytime effects achieve the lowest power (model 6 among fixed discrete time effects; model 9 among fixed linear time effects).
Discussion
SWDs and alternative cluster randomized trials are currently being used to implement interventions to reduce opioid overdose deaths in communities across the United States. However, these interventions are competing with newly initiated public policies and media awareness campaigns. Furthermore, control communities may adopt components of proposed interventions as they become readily available. These scenarios induce confounding by time, treatment noncompliance, and timevarying effect modification. Given the difficulties in capturing the timing and exposure levels of external factors, we considered mixedeffects models which incorporate fixed and random time effects to account for these latent factors. We discussed the limitations of commonly used models in the context of proposed SWDs to combat the opioid epidemic, and proposed solutions to accommodate deviations from these assumptions.
Through our simulation study, we showed that mixed effects models are sensitive to the scenarios considered here (i.e., confounding, noncompliance, and effect modification). While the Hooper/Girling model [18, 21] offers an improvement over the standard Hussey and Hughes model [17], it may result in severely biased estimates of the intervention effect when secular trends systematically differ between intervention and control clusters. Even in scenarios where these models do capture the secular trend, we demonstrated how incorrect specification of the clusterlevel covariance over time can yield under coverage of confidence intervals and inflation of Type 1 error. Similar conclusions have been reached in other studies [3, 23, 28, 29].
Alternatively, models which allow secular trends to systematically differ between the intervention and control clusters through incorporation of fixed and random groupbytime effects offer a major improvement in terms of bias reduction. Our simulation studies confirmed that models incorporating unstructured clusterlevel covariances for the random interventionbyclusterbytime interaction terms yielded nominal confidence interval coverage rates and preserved Type 1 error (i.e., models 6 and 9). However, these models may be underpowered for certain parameterizations of the fixed time effects.
The proposed mixedeffects modeling framework discussed in this paper treats external factors as latent processes; these are accounted for through the incorporation of fixed and random time effects. This modeling strategy is useful in practice because investigators are often unaware of all external factors affecting the outcome, let alone each cluster’s level of exposure to these factors. For the models considered in this study, correct specification of the fixed time effects were primarily responsible for the attenuation of bias in the intervention effect estimate.
Our results also have potential implications for data collection in steppedwedge studies. As previously noted, unbiased estimation of the estimands of interest in settings of timevarying effect modification and early adoption rely on assumptions about these effects. Information collected during the study on the processes of interest could also be incorporated in models. These processes include potential confounding, early adoption, and effect modification. Modeling these processes is more straightforward in a setting in which external events, such as changes in insurance policy, cannot be affected by study outcomes. Even if this condition holds, however, important confounding factors may not be known and, even if known, may not be easy to measure. For issues such as early adoption of treatment, one must be concerned about the potential “confounding by indication” wherein participants or their providers might make choices based on the individual characteristics as they vary over time. In this context, unbiased estimation of the average intervention effect would require correct modeling of the mechanism that leads to early adoption as a function of measured confounders for early intervention [30].
Limitations and future research
An important limitation of the models we discussed in this paper is that they assume the intervention effect does not vary with time. Models attempting to account for both secular trends which differ by intervention group, and timevarying treatment effects, may lead to unidentifiable intervention effect estimates. Furthermore, the approach discussed here is limited to GLMM. Generalized estimating equations (GEE) also allow for correlation in the outcomes, and are robust to misspecification of the covariance structure [31]. Ren et al. show that GEE is more robust to model misspecification than linear mixed models when the random intercepts differ by intervention group [29]. Furthermore, GEE’s lend themselves naturally to development of doubly robust estimators for application to data sets with missing observations. Future work is needed to explore the performance of GEE models in the context considered here, where secular trends in the control and intervention clusters arise from different mechanisms, and differentially impact clusters within each group. Several nonparametric methods have also been proposed that use withinperiod [3, 28, 32] or between period [3] comparisons to account for confounding by time. While these models are robust to misspecification of the random effects, the performance of these models when external factors differentially impact intervention and control clusters, such as in the case of early intervention adoption, has not been explored.
Although our paper focused on steppedwedge designs, the secular trends in outcomes induced by rising tides and early intervention adoption may also be present in other types of cluster randomized trials. Simulation studies are needed to determine the impact of such scenarios on the bias of the intervention effect estimate, Type 1 error, and power in these settings. We note that Grantham et al. establish a sufficient condition for when the choice of time parameterization does not impact the variance of the estimated intervention effect in cluster randomized trials [27]. This information can be useful in the planning of such trials. While Grantham et al establish that categorical or linear fixed time effects do not impact the variance estimator of the intervention effect in SWD, this was not the case when groupbytime interaction terms were modeled (as demonstrated by our simulation study). Future investigation is needed to establish sufficient conditions in the presence of interactions.
Conclusion and recommendations
Steppedwedge designs are particularly suitable for epidemic and pandemic settings, in which interventions need to be implemented rapidly. In these settings, there is a high probability of external implementation of other possible interventions–or even components of the intervention under study–while the trial is ongoing. As a consequence, consideration must be given to these scenarios during the planning stages. Our paper demonstrates that incorporating fixed and random groupbytime effects can reduce bias in settings where external factors differentially impact intervention and control clusters. As incorporation of such time effects is likely to impact power, power calculations must take this into account. We note that formulas may not be currently available for models with more complex random effect covariance structures  simulation studies may be needed for accurate estimation of sample size and power for such settings.
Models incorporating an unstructured covariance for the random interventionbyclusterbytime interaction effects are most effective in reducing bias, achieving nominal confidence interval coverage, and preserving Type 1 error control; but they can lead to loss of efficiency in analysis and reduce power when used in conjunction with discrete fixed time effects. One strategy is to use fixed parametric interventionbytime effects with an unstructured covariance; such models perform reasonably well in the scenarios considered here. Alternatively, one can impose a more restrictive covariance structure such as exponential decay over time, Hooper/Girling covariance, or random slopes for time. To guard against potential under coverage of confidence intervals and Type 1 error inflation, randomizationbased inference should be used for such covariance structures [33, 34].
Availability of data and materials
The dataset supporting the simulation study in this article is included as a supplementary file.
Abbreviations
 SWD:

Steppedwedge design
 COVID19:

Coronavirus disease 2019
 OUD:

Opioid use disorder
 GLMM:

Generalized linear mixed model
References
 1
Brown CA, Lilford RJ. The stepped wedge trial design: a systematic review. BMC Med Res Methodol. 2006; 6(1):54.
 2
Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ. 2015:350.
 3
KennedyShaffer L, De Gruttola V, Lipsitch M.Novel methods for the analysis of stepped wedge cluster randomized trials. Stat Med. 2020; 39(7):815–44.
 4
Mdege ND, Man MS, Taylor Nee Brown CA, Torgerson DJ. Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation. J Clin Epidemiol. 2011; 64(9):936–48.
 5
KerielGascou M, BuchetPoyau K, Rabilloud M, Duclos A, Colin C.A stepped wedge cluster randomized trial is preferable for assessing complex health interventions. J Clin Epidemiol. 2014; 67(7):831–3.
 6
Becker WC, Fiellin DA. When Epidemics Collide: Coronavirus Disease 2019 (COVID19) and the Opioid Crisis. Ann Intern Med. 2020. American College of Physicians.
 7
NIH. HEALing Communities Study. 2019.
 8
NIH. NIH funds study in four states to reduce opioid related deaths by 40 percent over three years. 2019.
 9
NIH RePORTER. OPTIMIZING HEALING IN OHIO COMMUNITIES (OHIO). 2019.
 10
NIH RePORTER. KENTUCKY CAN HEAL (COMMUNITIES AND NETWORKS HELPING END ADDICTION LONGTERM). 2019.
 11
Bion J, Richardson A, Hibbert P, et al.‘Matching Michigan’: a 2year stepped interventional programme to minimise central venous catheterblood stream infections in intensive care units in England. BMJ Qual Saf. 2013; 22(2):110–23.
 12
Davis CS, Lieberman AJ, HernandezDelgado H, Suba C.Laws limiting the prescribing or dispensing of opioids for acute pain in the United States: A national systematic legal review. Drug Alcohol Depend. 2019; 194:166–72.
 13
Center for Disease Control and Prevention. CDC launches campaign to help states fight prescription opioid epidemic. 2017.
 14
Hemming K, Taljaard M, Forbes A.Analysis of cluster randomised stepped wedge trials with repeated crosssectional samples. Trials. 2017; 18(1):101.
 15
Thompson JA, Fielding KL, Davey C, Aiken AM, Hargreaves JR, Hayes RJ. Bias and inference from misspecified mixedeffect models in stepped wedge trial analysis. Stat Med. 2017; 36(23):3670–82.
 16
Taljaard M, Teerenstra S, Ivers NM, Fergusson DA. Substantial risks associated with few clusters in cluster randomized and stepped wedge designs. Clin Trials J Soc Clin Trials. 2016; 13(4):459–63.
 17
Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007; 28(2):182–91.
 18
Hooper R, Teerenstra S, Hoop Ed, Eldridge S. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.Stat Med. 2016; 35(26):4718–28.
 19
Barker D, McElduff P, D’Este C, Campbell MJ. Stepped wedge cluster randomised trials: a review of the statistical methodology used and available. BMC Med Res Methodol. 2016; 16(1):69.
 20
Nickless A, Voysey M, Geddes J, Yu LM, Fanshawe TR. Mixed effects approach to the analysis of the stepped wedge cluster randomised trial—Investigating the confounding effect of time through simulation. PLoS ONE. 2018; 13(12).
 21
Girling AJ, Hemming K.Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med. 2016; 35(13):2149–66.
 22
Kasza J, Hemming K, Hooper R, Matthews J, Forbes A.Impact of nonuniform correlation structure on sample size and power in multipleperiod cluster randomised trials. Stat Methods Med Res. 2019; 28(3):703–16.
 23
Kasza J, Forbes AB. Inference for the treatment effect in multipleperiod cluster randomised trials when random effect correlation structure is misspecified. Stat Methods Med Res. 2019; 28(1011):3112–22.
 24
McMaster H, Governor SCO. Executive order no. 201743. 2017.
 25
South Carolina DAODAS. Library Catalog: www.daodas.sc.gov.
 26
Bates D, Mächler M, Bolker B, Walker S. Fitting Linear MixedEffects Models using lme4. J Stat Softw. 2015; 67(1):1–48. arXiv: 1406.5823.
 27
Grantham KL, Forbes AB, Heritier S, Kasza J. Time Parameterizations in Cluster Randomized Trial Planning. Am Stat. 2020; 74(2):184–9. Taylor & Francis.
 28
Thompson JA, Davey C, Fielding K, Hargreaves JR, Hayes RJ. Robust analysis of stepped wedge trials using clusterlevel summaries within periods. Stat Med. 2018; 37(16):2487–500.
 29
Ren Y, Hughes JP, Heagerty PJ. A Simulation Study of Statistical Approaches to Data Analysis in the Stepped Wedge Design. Stat Biosci. 2019.
 30
Sitlani CM, Heagerty PJ, Blood EA, Tosteson TD. Longitudinal structural mixed models for the analysis of surgical trials with noncompliance. Stat Med. 2012; 31(16):1738–60.
 31
Fitzmaurice G, Laird NM, Ware JH. Applied longitudinal analysis: Wiley; 2012.
 32
Hughes JP, Heagerty PJ, Xia F, Ren Y. Robust Inference for the Stepped Wedge Design. Biometrics. 2019:biom.13106.
 33
Wang R, De Gruttola V. The use of permutation tests for the analysis of parallel and steppedwedge clusterrandomized trials. Stat Med. 2017; 36(18):2831–43.
 34
Ji X, Fink G, Jacob Robyn P, S. Small D. Randomization inference for steppedwedge clusterrandomized trials: An application to communitybased health insurance. Ann Appl Stat. 2017; 11:1–20.
Acknowledgments
We thank the reviewers for their valuable comments and suggestions to this manuscript.
Funding
This research is supported in part by the National Institute of Allergy and Infectious Diseases (R37 51164), National Institute on Drug Abuse (R01DA034086), and PatientCentered Outcomes Research Institute (PCORI) Award HPC150328122. These funding agencies had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Affiliations
Contributions
LR, MH, and VG conceived the methods and LR drafted the manuscript. VDG and MH helped write the paper. VDG, MH, and AHL critically reviewed the contents of the paper. AHL designed the study used as a motivating example in this paper. LR carried out the simulation study. All authors have read and approve the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
No ethical approval or consent was required for this simulation study based on a proposed trial design.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional material
Additional file 1
Opioid overdose deaths in 18 South Carolina communities between 2016 and 2018.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Rennert, L., Heo, M., Litwin, A.H. et al. Accounting for confounding by time, early intervention adoption, and timevarying effect modification in the design and analysis of steppedwedge designs: application to a proposed study design to reduce opioidrelated mortality. BMC Med Res Methodol 21, 53 (2021). https://doi.org/10.1186/s12874021012296
Received:
Accepted:
Published:
Keywords
 Stepped wedge design
 Cluster randomized trials
 Study design
 Confounding
 Secular trends
 Analysis
 Opioids
 COVID19
 Epidemic
 Pandemic