 Research
 Open access
 Published:
Power calculation for detecting interaction effect in crosssectional steppedwedge cluster randomized trials: an important tool for disparity research
BMC Medical Research Methodology volume 24, Article number: 57 (2024)
Abstract
Background
The steppedwedge cluster randomized trial (SWCRT) design has become popular in healthcare research. It is an appealing alternative to traditional cluster randomized trials (CRTs) since the burden of logistical issues and ethical problems can be reduced. Several approaches for sample size determination for the overall treatment effect in the SWCRT have been proposed. However, in certain situations we are interested in examining the heterogeneity in treatment effect (HTE) between groups instead. This is equivalent to testing the interaction effect. An important example includes the aim to reduce racial disparities through healthcare delivery interventions, where the focus is the interaction between the intervention and race. Sample size determination and power calculation for detecting an interaction effect between the intervention status variable and a key covariate in the SWCRT study has not been proposed yet for binary outcomes.
Methods
We utilize the generalized estimating equation (GEE) method for detecting the heterogeneity in treatment effect (HTE). The variance of the estimated interaction effect is approximated based on the GEE method for the marginal models. The power is calculated based on the twosided Wald test. The Kauermann and Carroll (KC) and the Mancl and DeRouen (MD) methods along with GEE (GEEKC and GEEMD) are considered as biascorrection methods.
Results
Among three approaches, GEE has the largest simulated power and GEEMD has the smallest simulated power. Given cluster size of 120, GEE has over 80% statistical power. When we have a balanced binary covariate (50%), simulated power increases compared to an unbalanced binary covariate (30%). With intermediate effect size of HTE, only cluster sizes of 100 and 120 have more than 80% power using GEE for both correlation structures. With large effect size of HTE, when cluster size is at least 60, all three approaches have more than 80% power. When we compare an increase in cluster size and increase in the number of clusters based on simulated power, the latter has a slight gain in power. When the cluster size changes from 20 to 40 with 20 clusters, power increases from 53.1% to 82.1% for GEE; 50.6% to 79.7% for GEEKC; and 48.1% to 77.1% for GEEMD. When the number of clusters changes from 20 to 40 with cluster size of 20, power increases from 53.1% to 82.1% for GEE; 50.6% to 81% for GEEKC; and 48.1% to 79.8% for GEEMD.
Conclusions
We propose three approaches for cluster size determination given the number of clusters for detecting the interaction effect in SWCRT. GEE and GEEKC have reasonable operating characteristics for both intermediate and large effect size of HTE.
Background
The push towards studying and improving racial and ethnic disparities in healthcare has grown in recent years. Across various settings, retrospective studies continue to demonstrate that Black patients are less likely to obtain adequate care and outcomes than that of White patients [1,2,3,4]. Within the field of palliative care, Black patients are less likely to receive advanced care planning and to enroll in palliative care than other racial groups [5,6,7]. One step towards mitigating these findings includes implementing interventions, first introduced through randomized controlled trials, which work to close the health disparity gap between minority and nonminority patients. Examples include properly training and removing racial bias among physicians [6], improving communication between physicians and patients [8, 9], and integrating automated tools into physicians’ decision making to prompt care [10]. While the interventions should work to increase quality of care among all patients, in order to truly tackle the issue of disparity, they should have a greater impact on minority patients, as their baseline care is usually less adequate than that of nonminorities. In the statistical models analyzing the impact of these interventions, the main interest is the magnitude and significance of the interaction term between race and the exposure variable.
Interventions aimed at reducing racial disparities in healthcare often tackle change at the provider or clinic level, suggesting the use of cluster randomized trials (CRT). This has been conducted in several published trials. In a twoarm, parallel group CRT, researchers aimed to increase unmet palliative care needs among both Black and White intensive care unit (ICU) families by introducing an automated web app, aimed at improving communication between physicians and families [9]. While this study relied on a parallel group CRT, these designs are not always feasible, particularly in cases where the intervention cannot be denied to half of the participants. In these situations, the steppedwedge cluster randomized trial (SWCRT) is an appealing design [11]. Appealing features of SWCRTs include having each cluster act as its own control and not needing to withhold the intervention from any patient due to the key feature of only needing unidirectional crossover of clusters from control to intervention in a timestaggered manner. However, from a statistical perspective, the SWCRT design gives us complex correlation structures compared to the standard CRT design due to the unidirectional crossover from control to intervention.
Several approaches for optimal sample size determination of SWCRT for estimating overall treatment effect (OTE) are available in published literature [12,13,14,15,16,17] for both continuous and binary outcomes. Various software implementations of these approaches are summarized in existing literature [18, 19]. In the study design stage for CRT, detecting the overall intervention effect is the main consideration in the sample size determination/power calculation, and examination of differential effect by group is the secondary objective [9, 20]. However, in trials assessing health disparities, the key interest is in examining whether the intervention effect differs among study subgroups. In healthcare delivery research, examination of differential effect by group would be also considered in the study planning stage. Sample size calculation methods for continuous outcomes based on the interaction level have been studied extensively in recent advances [21, 22]. However, conducting power calculations around the interaction level in these types of trials is challenging, and sample size determination for detecting an interaction effect between intervention status variable and a key covariate in the SWCRT study has not been proposed yet. Since binary outcome measures are the most common outcome in SWCRTs for healthcare delivery research, we focus on binary outcomes in this study.
There is limited evidence of similar healthequity focused CRTs properly powering their studies for the interaction term or providing enough transparency of methodologies used. The parallelgroup CRT assessing palliative care needs among ICU families powered their study around the OTE for all patients as well as by independently powering each racial subgroup for identical OTEs. However, the study admitted they likely did not have enough power for the interaction effect between the exposure and race [9]. Similarly, a SWCRT aimed at increasing the rates of advanced care planning (ACP) among AfricanAmerican patients through a structured ACP intervention conducted their main sample size calculation around the OTE, and they briefly mentioned they had sufficient power in the secondary objective for detecting a reduction in the racial disparity gap [8].Limited information was provided on how this calculation was conducted, preventing future researchers from utilizing their design. Aside from the statistical drawbacks that come with underpowering a study, failing to address that the effect size for the minority group should be larger than that of the nonminority group is an issue. By designing the study to achieve the same effect size among all subgroups, the issue of the disparity gap will remain. Instead, these types of studies should be powered to detect a larger effect size in the minority group.
Researchers at Mount Sinai Hospital System (MSHS) are planning to conduct a crosssectional SWCRT for advanced cancer patients who are at high risk for dying within a short period of time (less than 6 months). There would be two choices for the patients and their family members that oncologists are expected to discuss as the ‘goals of care’ (GoC): (1) remaining in active treatment with curative intent and (2) moving into hospice care for management of pain and discomfort only. A machinelearning based predictive model for 6month predictive mortality has already been optimized with retrospective data and validated with prospective data. A clinical decision support system (CDSS) with information from the predictive model will guide the physician through the electronic medical record (EMR) system to have a GoC for the patient predicted at highrisk for mortality. Physicians in the intervention arm will also be trained on how to have the GoC with patients and family members and will be shown data on low GoC conversations for the minority (Black and Hispanic) patients at MSHS. Therefore, the intervention of the planned SWCRT is a combination of training and a CDSS based alert about prediction. The physicians in the control arm will have neither the CDSS alert nor the training. The binary outcome in this case is having the GoC conversation or not. The binary covariate is race (White/Black). It is hypothesized that the proportion of GoC in Black patients will increase at a higher rate than for White patients with the same intervention. This is equivalent to testing the hypothesis that the interaction effect between intervention and race is significant.
In this article, our main focus of the study design is on the crosssectional SWCRT design, in which the outcome is measured on individuals within each cluster at each time period. For example, in a crosssectional SWCRT to evaluate a hospitalbased psychological care quality improvement intervention for cancer patients, the binary outcome of healthrelated quality of life improvement (improved vs worse/unchanged) was measured among crosssections of individuals attending the hospitals during each timeperiod [6]. We propose approaches to this problem in a crosssectional SWCRT setting with an individual binary covariate. We utilize the generalized estimating equation (GEE) model setting for these formulations for binary outcomes. We organize this article in the following manner. In Sect. 2, we introduce the three proposed approaches for obtaining the optimal sample size for detecting interaction effect in the crosssectional SWCRT. In Sect. 3, we describe the planned simulation study for estimating comparative empirical type I and type II error rates for several settings. In Sect. 4, we show simulation results. Section 5 discusses a motivational example for this manuscript. We provide an illustration of the proposed approaches using a real data example from the MSHS study. We conclude our article in Sect. 6 and Sect. 7 with some discussion and plans for expanding our study to continuous and censored endpoints as well as detecting the interaction effect with clusterlevel covariates.
Methods
Statistical approaches in the crosssectional SWCRT
In general, the design and analysis for SWCRT has been based on either conditional model approaches or a marginal model approach. Conditional model approaches require specification of fixed effects and random effects. Linear mixed models (LMM) are used for continuous outcomes, while generalized linear mixed models (GLMM) are used for binary outcomes. When we consider a simpler model without an interaction between intervention and covariates, fixed effects include period effect terms and intervention effect term, while random effects include random cluster effect and clusterbytime interaction. For the closed cohort SWCRT, the random effect for the repeated measures from an individual within the same cluster is needed. Using these random effect terms, conditional approaches enable us to have flexible random effect structures. While the LMM benefits from the flexible random effect structures due to the interpretation that the intervention effect is unrelated to these random effect structures. The interpretation of the intervention effect is not straightforward for the GLMM since it depends on the specification of the random effect structure. For sample size determination, flexible modeling of random effects requires distributional assumptions. These assumptions need more information in the study design stage. However, the marginal model approach provides intuitively straightforward interpretation of intervention effect since it focuses on the populationaveraged intervention effect. This approach needs two specifications: marginal mean model and a working correlation structure. Unlike conditional model approaches, the intervention effect interpretation does not depend on correlation structure modeling [23].
Models for estimating the heterogeneity of effect in the crosssectional SWCRT
We consider the marginal model via the generalized estimating equation (GEE) approach for the crosssectional SWCRT design, where we focus on the interaction effect between the intervention and a binary individual covariate. Let \({Y}_{ijk}\) be a binary outcome of the \(k\) th individual (\(k\in \left\{1,...,{m}_{i}\right\}\)) in the \(i\) th cluster (\(i\in \left\{1,...,I\right\}\)) at the \(j\) th time interval (\(j\in \left\{1,...,J\right\}\)), where m_{i} denotes cluster size for cluster i, I denotes the total number of clusters, and J denotes total time steps and \({Y}_{ijk}=1\) denotes the event of interest and \({Y}_{ijk}=0\) otherwise. A GEE model is formulated for the crosssectional SWCRT as follows:
where \({\mu }_{ijk}={\mathbb{E}}\left[{Y}_{ijk}\right]\), denotes the marginal mean response of \({Y}_{ijk}\), \({\theta }_{0}\) is the baseline logodds of the outcome in the control group corresponding to the reference group for the binary covariate (\({X}_{ijk}\in \left\{\mathrm{0,1}\right\}\), \({X}_{ijk}=0\) represents the reference group and \({X}_{ijk}=1\) represents the other), and \({\gamma }_{j}\) is the period fixed effect for the \(j\) th time interval. \({W}_{ij}\) is the design indicator; \({W}_{ij}=1\) means that all individuals in cluster \(i\) at time interval \(j\) receive the intervention and \({W}_{ij}=0\) otherwise. \({\theta }_{1}\) is the overall treatment effect (OTE), \({\theta }_{2}\) is main effect of binary covariate, and \({\theta }_{3}\) captures the interaction between treatment and the binary covariate (i.e., heterogeneity in treatment effect denoted by HTE). For identification, we set \({\gamma }_{1}=0\). We use \(\Theta =({\gamma }_{2},\cdots ,{\gamma }_{J},{\theta }_{0},{\theta }_{1},{\theta }_{2},{\theta }_{3})\in {\mathbb{R}}^{J+3}\) for the parameters and variance of \({Y}_{ijk}\) is defined as \({\upsilon }_{ijk}\), where \({\mathbb{R}}^{d}\) represents the space of all \(d\)dimensional real vectors and \({\upsilon }_{ijk}={\mu }_{ijk}(1{\mu }_{ijk})\) for binary outcome. Hence model (1) can be written as
where \({\mathbf{M}}_{ijk}:={\left[\begin{array}{ccccc}1& {\mathbf{e}}_{j}^{\intercal }& {W}_{ij}& {X}_{ijk}& {W}_{ij}{X}_{ijk}\end{array}\right]}^{\intercal }\in {\mathbb{R}}^{J+3}\) with \({\mathbf{e}}_{j}\) as the vector of length \(J1\) with all elements equal to 0 except the \(j1\) th element, which is equal to 1. Now by stacking \({\mathbf{M}}_{ijk}\)’s as the matrix of \(J{m}_{i}\) rows and \(J+3\) columns, namely,
We reach
where \({\mathbf{u}}_{i}={\left[\begin{array}{cccccc}{\mu }_{i11}& \cdots & {\mu }_{i1{m}_{i}}& {\mu }_{i21}& \cdots & {\mu }_{iJ{m}_{j}}\end{array}\right]}^{\intercal }\) and the function logit here is applied to the vector \({\mathbf{u}}_{i}\) elementwise.
If the HTE is to be tested with respect to \({X}_{ijk}\), the interaction effect parameter \({\theta }_{3}\), instead of the OTE \({\theta }_{1}\), should be considered for the sample size calculation. In this case, the null hypothesis \({H}_{0}: {\theta }_{3}=0\) is to be tested against an alternative hypothesis \({H}_{{\text{a}}}: {\theta }_{3}=\delta\) for some prespecified HTE level \(\delta \ne 0\). For the purpose of simplification, we assume the SWCRT has equal cluster sizes, i.e. \({m}_{1}=\cdots ={m}_{I}=m\). If the HTE level \({\theta }_{3}\) is estimated by a consistent, asymptotically normally distributed estimator \({\widehat{\theta }}_{3,m}\), then the power is approximately calculated using the twotailed Wald test,
where \(\Phi\) is the standard normal distribution function and \({z}_{1\alpha /2}\) is the \({(1\alpha /2)}^{{\text{th}}}\) standard normal quantile. When estimating the OTE, t distribution is also recommended particularly for SWCRT designs with small number of clusters:
where \({\Phi }_{t,{\text{df}}}\) is the cumulative t distribution function with certain degree of freedom (df). Although the proposed method can be used to determine either the number of clusters or cluster size, in this article we focus on determining cluster size and we provide R code for both methods. The use of formula (2) or (3) requires an approximation of \({\mathbb{V}}ar\left({\widehat{\theta }}_{3,m}\right)\), which is the (J + 3, J + 3)^{th} element in the modelbased variance–covariance matrix, \({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\) where \({\widehat{\Theta }}_{m}\) is the GEE estimator of \(\Theta\). In this article, we provide the sandwich form of approximation for \({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\)’s, which arises from the GEE method. Due to the presence of the individual binary covariate, the closed form of the sample size calculation is not available. We consider two correlation structures: a simple exchangeable correlation structure and a nested exchangeable correlation structure which can be based on the values of withinperiod correlation and betweenperiod correlation. The intraclass correlation (ICC) measures the correlation on the outcome for different individuals in the same cluster within a given time period. The withinperiod correlation is same as the intraclass correlation (ICC). The cluster autocorrelation (CAC) is the correlation between the population means from the same cluster at two different time periods and is defined as the ratio of betweenperiod correlation and withinperiod correlation. Hence, a simple exchangeable correlation structure has the same value for withinperiod correlation and betweenperiod correlation (i.e., CAC = 1), and a nested correlation structure has different values for withinperiod correlation and betweenperiod correlation. In general, betweenperiod correlation is smaller than withinperiod correlation (0 < CAC < 1).
Now denote the ICC as \(\alpha \in \left(\mathrm{1,1}\right)\) such that \({\mathbb{C}}ov\left({Y}_{ijk},{Y}_{ijk{\prime}}\right)=\alpha \sqrt{{\upsilon }_{ijk}{\upsilon }_{ijk{\prime}}}\) for \(k{\prime}\ne k\) and the CAC as \(\rho \in \left[\mathrm{1,1}\right]\) such that \({\mathbb{C}}ov\left({Y}_{ijk},{Y}_{ij{\prime}k{\prime}}\right)=\alpha \rho \sqrt{{\upsilon }_{ij{\prime}k}{\upsilon }_{ijk{\prime}}}\) for \(j{\prime}\ne j\). Then the variance–covariance matrix of the binary outcomes is \({\mathbf{V}}_{i}={\mathbb{V}}ar\left({\mathbf{Y}}_{i}\right)={\mathbf{A}}_{i}^\frac{1}{2}{\mathbf{R}}_{i}\left(\alpha ,\rho \right){\mathbf{A}}_{i}^\frac{1}{2}\) for cluster \(i\), where
\({\mathbf{I}}_{n}\) is the \(n\times n\) identity matrix, \({\mathbf{J}}_{n}\) is an \(n\times n\) matrix with all elements equal to 1, and \({\mathbf{A}}_{i}={\text{diag}}\left({\upsilon }_{i11},\cdots ,{\upsilon }_{i1{m}_{i}},\cdots {\upsilon }_{iJ1},\cdots ,{\upsilon }_{iJ{m}_{i}}\right)\). Hence the GEE is
By solving the GEE (3), the resulting \({\widehat{\Theta }}_{m}\) satisfies
By ignoring the \({O}_{\mathbb{P}}\left({I}^{1}\right)\) term if \(I\) is sufficiently large, we have
As a result, \({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\) can be approximately calculated as
where \({{\varvec{\Sigma}}}_{i}={\mathbf{A}}_{i}^{\frac{1}{2}}{\mathbf{R}}_{i}\left(\alpha ,\rho \right){\mathbf{A}}_{i}^{\frac{1}{2}}\) for cluster \(i\). Equation (6) leads to a modelbased (naïve) variance estimator.
where \({\widehat{{\varvec{\Sigma}}}}_{i}={\mathbf{A}}_{i}^{\frac{1}{2}}{\mathbf{R}}_{i}\left(\widehat{\alpha },\widehat{\rho }\right){\mathbf{A}}_{i}^{\frac{1}{2}}\) with \(\widehat{\alpha }\) and \(\widehat{\rho }\) also obtained by solving the GEE (4). Note that for power calculation we do not have data. Instead, we have assumptions about the values of \(\alpha\) and \(\rho\). Hence we may compute \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\) without any data analysis, i.e. \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\) is a deterministic quantity rather than a random variable. We call power calculation method (2) or (3) based on the \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\) with specific values of \(\alpha\) and \(\rho\) “GEE” for simplicity.
Biascorrection sandwich variance approaches
It is wellknown that \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\), the \(\left(J+3,J+3\right)\) th element of \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\), is less than the true value of \({\mathbb{V}}ar\left({\widehat{\theta }}_{3,m}\right)\) due to ignoring the \({O}_{\mathbb{P}}\left({I}^{1}\right)\) term particularly for cases with small \(I\). From the perspective of data analysis, if the number of clusters \(I\) is small, biascorrection techniques are recommended for obtaining \(\widehat{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\) to mitigate increased risk of type I errors [24]. In this article, we consider two biascorrection techniques from data analysis to adjust \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\) for small number of clusters: 1) the Kauermann and Carroll (KC) [25] Correction; and 2) the Mancl and DeRouen (MD) [26] Correction. Both biascorrection techniques lead to the following modified form of approximation for \({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\):
where \({{\varvec{\Omega}}}_{i}\) is the corresponding residualmodified version of \({{\varvec{\Sigma}}}_{i}\) with the form
where the residuals modification matrices \({\mathbf{F}}_{i}\) is defined as one of
and
depending on the selection of biascorrection technique (KC or MD). The matrix \({\mathbf{H}}_{i}\) is defined as
Note that by writing
the right handside of (8) collapses to \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\), which means \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\Theta }}_{m}\right)\) is also a particular case of the right handside of (8) like the KC and MD cases. For this reason, we may distinguish our power calculation methods by the choice of \({\mathbf{F}}_{i}\) in the right handside of (8), namely, “GEE” refers to the method with \({\mathbf{F}}_{i}={\mathbf{I}}_{J{m}_{i}}\); “GEEKC” refers to the method with \({\mathbf{F}}_{i}={\mathbf{F}}_{i}^{{\text{KC}}}\); and “GEEMD” refers to the method with \({\mathbf{F}}_{i}={\mathbf{F}}_{i}^{{\text{MD}}}\). Again, we do not estimate \({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\) because \({\mathbf{I}}_{J{m}_{i}}\), \({\mathbf{F}}_{i}^{{\text{KC}}}\), and \({\mathbf{F}}_{i}^{{\text{MD}}}\) are known given the assumption of \({\mathbf{R}}_{i}\left(\alpha ,\rho \right)\) ‘s and the parameters therein at the design stage.. In other words, GEEKC and GEEMD are proposed to improve the predicted power based on GEE in the situation that number of clusters is small.
Simulation study
We performed a simulation study to examine the operating characteristics of the three proposed methods described in Sect. 2 to detect the interaction effect between intervention and binary individual covariate in crosssectional SWCRT. The simulation study was conducted using statistical software R version 4.2.2.
Scheme for generating simulation datasets
We simulate binary outcomes \({Y}_{ijk}\)’s based on marginal means \({\mathbf{u}}_{i}\) and correlation matrix \({\mathbf{R}}_{i}\left(\alpha ,\rho \right)\). Note that we have
where \({\mathbf{Y}}_{i}={\left[\begin{array}{cccccc}{Y}_{i11}& \cdots & {Y}_{i1m}& {Y}_{i21}& \cdots & {Y}_{iJm}\end{array}\right]}^{\intercal }.\)
To simulate \({\mathbf{Y}}_{i}\), we consider the copulabased method for multivariate binary outcomes [27, 28] which assumes \({Y}_{ijk}=1\left\{{Z}_{ijk}\le {\Phi }^{1}\left({\mu }_{ijk}\right)\right\}\), for \(j=1,\dots ,J\) and \(k=1,\dots ,m\), where \(1\left\{\cdot \right\}\) is the indicator function, \(\Phi\) is the standard normal CDF and the random vector, \({\mathbf{Z}}_{i}={\left[\begin{array}{cccccc}{Z}_{i11}& \cdots & {Z}_{i1m}& {Z}_{i21}& \cdots & {Z}_{iJm}\end{array}\right]}^{\intercal }\sim {N}_{Jm}\left(0,{{\varvec{\Xi}}}_{i}\right)\) for some correlation matrix \({{\varvec{\Xi}}}_{i}\in {\mathbb{R}}^{Jm\times Jm}\) (hence \({Z}_{ijk}\sim N\left(\mathrm{0,1}\right)\) for all \(j=1,\dots ,J\) and \(k=1,\dots ,m\) marginally). The elements of \({{\varvec{\Xi}}}_{i}\) corresponding to the location of \({Z}_{ijk}\) and \({Z}_{ij\mathrm{^{\prime}}k\mathrm{^{\prime}}}\) within \({\mathbf{Z}}_{i}\) are defined as
For all \(j,{j}{\prime}=1,\dots ,J\) and \(k,{k}{\prime}=1,\dots ,m\). To ensure that \({\mathbf{R}}_{i}\left(\alpha ,\rho \right)\) is the correlation matrix of \({\mathbf{Y}}_{i}\), \({\xi }_{ijk,ij\mathrm{^{\prime}}k\mathrm{^{\prime}}}\) is determined by solving the equation
where \({C}_{{\xi }_{ijk,ij\mathrm{^{\prime}}k\mathrm{^{\prime}}}}\) is a bivariate Gaussian copula with correlation parameter \({\xi }_{ijk,ij\mathrm{^{\prime}}k\mathrm{^{\prime}}}\). We compute the bivariate copula components using the “pbinormcop” function of the R package “VGAM”.
Note that the resulting \({{\varvec{\Xi}}}_{i}\) is not always positively definite [29]. In this case, we need to modify this matrix to force it to become nonnegative definite such that we may simulate the random vector \({\mathbf{Z}}_{i}\) using the modified \({{\varvec{\Xi}}}_{i}\). An eigenvalue modification trick [30] is employed for this purpose. Values for parameters for the simulation study setups are shown in Table 1. We set \({\theta }_{0}={\text{log}}(0.15/0.85)\) representing 15% prevalence of outcome for control group with reference group of binary covariate. This prevalence rate came from our motivational example. We included increasing secular time trend by specifying \({\gamma }_{2}=0.1,{\gamma }_{3}=0.2,{\gamma }_{4}=0.3,\) and \({\gamma }_{5}=0.4\). In power calculation the secular time trend fixed effects should be determined either from existing literature or by a preliminary analysis. In the Supplementary materials of (16), a demonstration of the ‘power.ap’ function in the R package CRTpowerdist considers categorical time trend using coefficients 1, 2, 3, and 4 for Gaussian outcomes; for binary outcomes using logit link these values are too large for computation. As a result, we use 0.1, 0.2, 0.3, and 0.4 instead for demonstration purpose only. We simulated binary covariate, \({X}_{ijk}\) using the cluster prevalence levels 30% and 50%, respectively. These two values also came from our motivational example in which 30% of patients are Black and 48% of patients are female. We plan to evaluate the statistical operating characteristics of HTE when we have imbalanced and balanced binary covariates. For OTE, we set \({\theta }_{1}={\text{log}}\left(1.35\right)\) and \({\theta }_{1}={\text{log}}\left(1.68\right)\) representing a small and intermediate effect sizes for the intervention effect and \({\theta }_{2}={\text{log}}\left(1.5\right)\) for the binary covariate. The main quantity of interest is HTE, \({\theta }_{3}\). We considered two values, \({\text{log}}\left(1.5\right)\) and \({\text{log}}\left(2\right)\), to achieve an intermediate and large effect size, respectively.
SWCRT design related parameters
Relatively small number of clusters are often used in crosssectional SWCRT. A literature review found that 50% of 56 crosssectional SWCRTs reviewed had fewer than 10 clusters [31]. Therefore, we simulated eight clusters (I = 8), five time periods (J = 5), and several fixed cluster sizes (20, 40, 60, 80, and 120). Hence, the total numbers of subjects in this simulation study were 800, 1,600, 2,400, 3,200, and 4,800 individuals, respectively.
Since we used GEE models, we needed to specify working correlation matrix structure. Although there are three distinct correlation structures that are typically used in the crosssectional SWCRT: simple exchangeable correlation structure, nested exchangeable correlation structure, and exponential decay correlation structure, we decided to simulate the first two because they are most commonly used in simulation studies [17, 32, 33]. For both correlation structures used, we set ICC to be 0.1. We considered a CAC value of 0.8 for the nested exchangeable correlation structure.
Measure of operating characteristic for simulation study
In the simulation study we examined simulated power and empirical type I error rate for three proposed approaches from 1,000 generated datasets using the simulation setup shown in Table 1.
For the null scenario with \({\theta }_{3}=0\), the empirical Type I error rate (\({\psi }_{0}\)) was calculated as the proportion of false rejections among the 1,000 tests for \({\theta }_{3}\); for the nonnull scenarios (\({\theta }_{3}={\text{log}}\left(1.5\right)\) and \({\text{log}}\left(2\right)\)), the simulated power (\({\varphi }_{0}\)) was calculated as the proportion of correct rejections among the 1,000 tests for \({\theta }_{3}\). When fitting simulated samples using GEE, the naive standard error estimation is considered because modifications KC and MD are added in the approximation of \({\mathbb{V}}ar\left({\widehat{\Theta }}_{m}\right)\).
Comparison between simulated power and predicted power for sample size determination
We conduct a simulation study to examine sample size determination, provided in terms of cluster size, from each approach given the numbers of clusters and periods. We consider two distinct values for the prevalence of binary covariate (30% and 50%), the OTE effect size (\({\text{log}}\left(1.35\right)\) and \({\text{log}}\left(1.68\right)\)), and the HTE effect size (\({\text{log}}\left(1.5\right)\) and \({\text{log}}\left(2\right)\)).
The predicted powers/Type I errors are obtained by our proposed methods (GEE, GEEKC and GEEMD) using Eq. (2) given the design parameters. The simulated powers/empirical Type I errors are obtained by simulation based on the same set of design parameters. While the simulated powers are usually seen as the true powers for the given design due to the law of large numbers, they might be very timeconsuming depending on the complexity of the simulation scheme and the true value of OTE/HTE. For this reason, we expect to have priori information of a suitable range of cluster sizes such that our simulated power may achieve the target power (80% for example) without conducting simulation studies. To this end, mathematical approximation methods are employed to compute the predicted powers/Type I errors as fast feedbacks to guessing values of cluster sizes and to help determine the suitable cluster sizes.
Sensitivity analysis
In the sensitivity analysis for the three proposed approaches using different numbers of clusters and cluster sizes on simulated power, we consider two additional numbers of clusters (I = 20 and 40) and five additional cluster sizes (m = 20, 40, 60, 80, and 100) given a prevalence rate of 50% for the binary covariate and an intermediate OTE effect size (\({\theta }_{1}={\text{log}}\left(1.68\right)\)). Once again, we consider intermediate and large effect sizes for HTE, the quantity of interest. From different combinations of the number of clusters and cluster sizes, we examine whether an increase in the number of clusters improves statistical power compared to an increase in cluster size indirectly. We also examine the tradeoff between the number of clusters and cluster size in simulated power. We consider two total number of observations per step (160 and 800) and then two combinations of the number of clusters and cluster sizes for a given total number of observations. One pair represents small number of clusters and relatively bigger cluster size and another pair represents larger number of clusters and small cluster size. Hence, I = 8 & m = 20 and I = 40 & m = 4 for 160 observations per step and I = 8 & m = 100 and I = 40 & m = 20 for 800 observations per step. We also examine the impact of ICC and CAC on simulated power for the above setup. We consider three setups for ICC and CAC: ICC = 0.1 & CAC = 1; ICC = 0.1 & CAC = 0.8; and ICC = 0.05 & CAC = 0.4.
Results
Comparison of sample size determination
In Tables 2 and 3, we evaluate the difference between predicted standard error (SE) and empSE defined in Table 10 of [34] (\(\Delta\) SE), simulated power (\({\varphi }_{0}\)) and empirical Type I error (\({\psi }_{0}\)) based on the naïve and robust SEs of GEE for a determined cluster size satisfying 80% power from our three approaches using the following values (I = 8, J = 5, \({\theta }_{0}={\text{log}}\left(0.15/0.85\right)\)). We evaluate this for a simple exchangeable correlation structure (ICC = 0.1) and for a nested exchangeable correlation structure (ICC = 0.1, CAC = 0.8). We also consider two different values for effect sizes of OTE and HTE as well as the prevalence rate of the binary covariate. The cluster sizes are taken as multiples of ten intentionally to generate integer number of observations for both X = 0 and X = 1 within each cluster. For the same reason, only even numbers are taken into account under the scenario of prevalence = 50%. For all cases, empirical Type I errors fall into the 95% confidence interval of nominal significance level (5%). For both a simple exchangeable correlation structure and a nested exchangeable correlation structure, simulated powers of GEEKC and GEEMD are above the upper limit of the 95% confidence interval of the predicted power (\(\varphi\)) for both intermediate and large effect sizes of HTE. GEE maintains the simulated power inside of the 95% confidence interval of the predicted power in both correlation structures when naïve SE is used. If robust SE is used, simulated powers of GEE are also above the upper limit of the 95% confidence interval of the predicted power (\(\varphi\)) for both intermediate and large effect sizes. When we calculate the difference between simulated power (\({\varphi }_{0}\)) and predicted power (\(\varphi\)), GEE shows the smallest among the three approaches. This comes from relatively bigger sample size determination for the two biascorrection approaches provided.
Based on Tables 2 and 3, statistical significance is determined by the naïve SE, and GEEMD consistently overpredicts the cluster sizes while GEE consistently underpredicts the cluster sizes. However, when the statistical significance is determined by the robust SE, all three methods tend to consistently overpredict the cluster sizes due to the widelyacknowledged phenomenon that robust SE (without any bias correction) leads to inflated empirical typeI error [13, 35]. Hence the resulting cluster sizes from GEE and GEEMD may provide a priori range of cluster sizes such that the SWCRT design may achieve the target power when the data is analyzed with the naïve SE. However, if the data is analyzed with the robust SE, the resulting cluster sizes from GEE are sufficiently large, in which the empirical typeI error should be the major problem. Finally, testing the simulated power/empirical Type I error obtained from the cluster sizes within this range yields more accurate cluster size selection. Moreover, GEEKC also tends to overpredict the cluster sizes. Hence it is natural to consider the performance of the average cluster size between the cluster sizes obtained by GEE and GEEKC particularly for the low HTE scenarios when using naïve SE. We add this as a fourth method and report the empirical typeI error and power in Tables 2 and 3 as well. The corresponding predicted power is also directly calculated as the average of predicted powers obtained by GEE and GEEKC. In Supplementary materials, we show the tradeoff between the number of clusters and cluster size and impact of ICC and CAC on simulated power (Table S1A, Table S1B, and Table S1C). In general, when we have more clusters per step, simulated power increases and when we have more observations per step, the increase in simulated power is slightly bigger than that of fewer observations per step. When ICC and/or CAC decrease, simulated power decreases.
We also perform the same simulation study by replacing the required cluster sizes and predicted powers based on (2) by their analogues based on tdistribution with degrees of freedom equal to 4 (we have 8 clusters and 4 cluster level parameters \({\theta }_{0},{\theta }_{1},{\theta }_{2},{\theta }_{3}\), hence we choose the degrees of freedom 4 = 8–4 [36]). In Table S2A and B, the GEE method provides sufficiently large power compared with the corresponding predicted powers when the prevalence is 30%. For prevalence 50%, GEE method shows similar results as the scenarios based on normal Wald test. However, GEEKC and GEEMD tend to generate even larger power compared to the corresponding scenarios based on normal Wald test. For the empirical typeI error, the modelbased standard error for GEE leads to very small results for tdistribution as shown in Web Fig. 3 in the Supplementary materials of [13].
Simulated powers
We show simulated powers for a simple exchangeable correlation structure (Table 4) and a nested exchangeable correlation structure (Table 5). For both correlation structures, simulated powers are close to each other but a nested exchangeable correlation structure case has slightly lower power compared to a simple exchangeable correlation structure case. Among three approaches, GEE has the largest simulated powers and GEEMD has the smallest simulated power. This is a similar pattern as shown in [17] for the OTE. Given cluster size of 120, GEE has over 80% statistical power. When we have a balanced binary covariate (50%), simulated power increases compared to unbalanced binary covariate (30%) as shown in Tables 4 and 5. When we have a large effect size for HTE, simulated power is over 80% for both small and intermediate effect sizes of OTE.
Simulated powers in the sensitivity analysis
First, we examine the effect of different cluster sizes on the simulated power with the same setups. In Tables 6 and 7, five additional cluster sizes along with cluster size of 120 are considered (20, 40, 60, 80, and 100). With the intermediate effect size of HTE, only cluster sizes of 100 and 120 have more than 80% power using GEE for both correlation structures. With the large effect size of HTE, when cluster size is at least 60, all three approaches have more than 80% power. With cluster size of 40, GEE and GEEKC have more than 80% power.
Next, we examine the effect of different number of clusters on the simulated power. In Tables 8, 9, 10, and 11, we use 20 and 40 for the number of clusters, respectively. When we have 20 clusters, we have more than 80% power for all six different cluster sizes with the large effect size of HTE. When we have a cluster size of 60 or more, all three approaches have more than 80% power with the intermediate effect size of HTE (Tables 8 and 9). When we use 40 clusters, we have more than 80% power across all six different cluster sizes except GEEMD for cluster size of 20 (Tables 10 and 11).
When we compare the impact of increasing cluster size versus increasing the number of clusters, the latter has slightly higher gain in power. For cluster size change from 20 to 40 with 20 clusters, power increases from 53.1% to 82.1% for GEE; 50.6% to 79.7% for GEEKC; and 48.1% to 77.1% for GEEMD (Table 6 first two columns). When the number of clusters changes from 20 to 40 with cluster size fixed at 20, power increases from 53.1% to 82.1% for GEE; 50.6% to 81% for GEEKC; and 48.1% to 79.8% for GEEMD (Tables 8 and 10; column 1 each).
Power calculation for the example
Retrospective data for MSHS oncology practices show a racial disparity in who receives the goals of care (GoC) conversations: 35% of White patients get GoC while only 15% of minority patients get GoC. Hence, the primary objective of this SWCRT trial is to evaluate whether our intervention of a CDSS guided alert system, combined with training in communication and knowledge about the disparity provided to physicians, in the intervention arm results in a higher increase in the proportion of GoC in minority patients as compared to that in White patients. In other words, the main focus is the HTE, the interaction effect of intervention by race. Our hypothesis is that the rate of GoC increases from 15 to 30% (15% increase) in minority patients and from 35 to 40% (5% increase) in nonminority patients. Design parameters for this SWCRT are shown in Table 12. Based on this setup, statistical powers to detect the HTE were 17.8% for GEE, 15.4% for GEEKC, and 13.4% for GEEMD. Given a cluster size of 15 and 80% power, the effect size of HTE should be more than two times larger than the original value of HTE (= 1.96) in Table 12 (4.24 for GEE, 4.7 for GEEKC, and 5.51 for GEEMD). Given an HTE effect size of 1.96, bigger cluster sizes are needed (81 for GEE, 90 for GEEKC, and 117 for GEEMD).
Discussion
Detection of interactions between treatment effects and patient or cluster descriptors in SWCRT is critical for optimizing the healthcare delivery process. In this article, we propose three different approaches for determining sample size and statistical power for the interaction between binary intervention and patient level covariate in SWCRT. We show through an illustrative example that a much larger sample size is needed for detecting an interaction effect. In this work we focus on crosssectional SWCRT and we deal with two generic correlation structures and two biascorrection techniques. Hence, determination of cluster size should be developed for other types of design for SWCRT such as closed cohort and open cohort. Other types of biascorrection technique are also considered. In reality, the fixed cluster size assumption is too strong, and unequal cluster sizes should be considered in the sample size determination. In this article we consider binary outcomes, while an extension of this work to consider continuous and censored endpoints along with cluster level covariates is of interest for future study. In this article, we focus on the immediate treatment effect model, though we acknowledge there is the possibility of a timevarying treatment effect as proposed in [37]. Further research should be done for examining different patterns of treatment effect between subgroups.
It is worth noting that in Tables 2 and 3, the empirical Type I errors (\({\psi }_{0}\)) are wellmaintained, which deviates from the pattern of inflated empirical Type I error observed by previous studies [13, 36]. By comparing our study design with these previous studies, we found two potential reasons for the empirical Type I errors not to be inflated: 1) the treatment effect of interest (in this study we focus on HTE while the previous studies [13, 36] focus on the OTE, in Web Fig. 3 in the Supplementary materials of [13] the resulting empirical typeI error under scenarios with small number of clusters is slightly inflated above the 95% upper limit of 0.05 even with the modelbased SE); 2) the choices of ICC and CAC for the simulation (in [13] the ICC is chosen to represent the small correlations commonly reported in parallel CRTs. In such cases, using modelbased SE might be punished by higher level of misspecification of correlation structure while the robust SE fails to converge due to small number of clusters. As a result, both choices are likely to cause the inflation of empirical typeI error). To investigate these two potential reasons, we compute the difference between the empirical variances based on 1,000 simulations and the modelbased calculated variances as shown in Table 13, which indicates that the Naïve (modelbased) SE estimator underestimates the true SE (consider the empirical SE as true) for all cases as is wellknown. However, when estimating OTE \({\theta }_{1}\) the difference \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{1,m}\right)\widehat{{\mathbb{V}}ar}\left({\widehat{\theta }}_{1,m}\right)\) is quite sensitive to the selection of the ICC \(\alpha\) and CAC \(\rho\); while using the same datagenerating procedure to estimate the HTE \({\theta }_{3}\) the difference \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\widehat{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\) seems to be less sensitive to the selection of the CAC \(\rho\). In APPENDIX we show the empirical and modelbased calculated variances. The effect of CAC \(\rho\) on the modelbased \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{1,m}\right)\) seems to disappear due to replacing \({{\text{N}}}_{i}\) with \({{\text{M}}}_{i}\) in the inversion of matrix. Similar results can be found in [22] where empirical typeI errors are stably located between 0.04 to 0.06 across all scenarios compared with the inflated typeI error from Fig. 2 of [13]. Last but not least, our selection of \({\theta }_{1}={\text{log}}\left(1.68\right)\) and \({\theta }_{2}={\text{log}}\left(1.5\right)\) happens to reduce the difference between \(\widehat{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\) and \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\) for the cases with \(\rho =1\), which leads to even better results of empirical Type I error in Table 10.
The empirical Type I errors for all settings in Table 13 are given in Table 14. We observe the inflated Type I errors for cases with \(\rho =0.5\) when estimating the OTE. As to the HTE, the empirical Type I errors are wellmaintained for all cases. We acknowledge that there might be other factors affecting the difference \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\widehat{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\) as well as the performance of empirical Type I errors, such as the prevalence of the covariate \({X}_{ijk}\) within each cluster. Thus, the situation might be more complicated than our observation from Tables 13 and 14. We believe that further investigations are needed to explore how the difference \(\widetilde{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\widehat{{\mathbb{V}}ar}\left({\widehat{\theta }}_{3,m}\right)\) might be controlled such that the empirical Type I error can be maintained well.
Our work highlights the importance of properly conducting sample size calculations around the interaction of the exposure and patient level covariate in health disparities research. With the need to introduce interventions aimed at closing the racial gap, properly designing and powering these trials is critical in ensuring statistical results are reliable. The previous trials mentioned in the Background section introduce important and promising interventions, but further potential exists to more directly target the interaction effect. In the case of Cox et al.’s trial [9], the ICU web app is targeted to benefit Black and White families with the same effect size. While beneficial, this alone will not reduce the existing baseline gap between the two subgroups. In the case of Ejem et al.’s trial [8], limited information is provided on how to recreate the racialdisparity targeted sample size calculation. Our work aims to provide researchers with a transparent and accessible tool to design their own SWCRTs around similar interaction effects. We also acknowledge that the field of health disparities research requires a thoughtful, comprehensive, and nuanced approach. While studying the interaction term is important, investigators should not solely rely on this term when making conclusions about disparities. In addition to the presence or absence of a statistically significant interaction, additional factors may exacerbate health disparities, such as the distribution of the exposure and outcome prevalence across subgroups [38]. We also note that the detection and interpretation of the interaction term is scaledependent. The comparison of changes in the interaction term may be different when interpreted on a relative risk scale versus a risk difference scale. In this paper we focus the interaction on the odds ratio scale [39]. Our work is one step in the comprehensive framework of working to reduce disparities in healthcare.
Conclusion
We propose three approaches for cluster size determination given the number of clusters. These methods can be also applied for determining the number of clusters given cluster size. GEE has reasonable operating characteristics for determining cluster size in both intermediate and large effect sizes of HTE. Both GEEKC and GEEMD provide relatively large sample sizes compared to GEE. R codes used for this manuscript are available in the Supplementary material.
Availability of data and materials
The datasets analyzed in the simulation study were generated from code available in the GitHub repository: https://github.com/DeukwooKwon/HTESWCRT.
Abbreviations
 SWCRT:

SteppedWedge ClusterRandomized Trials
 GEE:

Generalized estimating equations
 OTE:

Overall treatment effect
 HTE:

Heterogeneity of treatment effect
 KC:

Kauermann and Carroll (correction)
 MD:

Mancl and DeRouen (correction)
References
Williams DR, Mohammed SA, Leavell J, Collins C. Race, socioeconomic status, and health: complexities, ongoing challenges, and research opportunities. Ann N Y Acad Sci. 2010;86:69–101. https://doi.org/10.1111/j.17496632.2009.05339.x.
Mateo CM, Williams DR. Racism: a fundamental driver of racial disparities in healthcare quality. Nat Rev Dis Primers. 2021;7(1):20. https://doi.org/10.1038/s41572021002581.
Ayanian JZ. The costs of racial disparities in health care. Harvard Business Review. 2015;93(10)
Dunlop DD, Manheim LM, Song J, Chang RW. Gender and ethnic/racial disparities in health care utilization among older adults. J Gerontol B Psychol Sci Soc Sci. 2002;57(4):S221–33. https://doi.org/10.1093/geronb/57.4.S221.
Lee K, Gani F, Canner JK, Johnston FM. Racial disparities in utilization of palliative care among patients admitted with advanced solid organ malignancies. Am J Hospice Palliat Med®. 2020;38(6):539–46. https://doi.org/10.1177/1049909120922779.
Lee KT, George M, Lowry S, Ashing KT. A review and considerations on palliative care improvements for african americans with cancer. Am J Hospice Palliat Med®. 2020;38(6):671–7. https://doi.org/10.1177/1049909120930205.
Chuang E, Hope AA, Allyn K, Szalkiewicz E, Gary B, Gong MN. Gaps in Provision of Primary and Specialty Palliative Care in the Acute Care Setting by Race and Ethnicity. J Pain Symptom Manage. 2017;54(5):645653 e1. https://doi.org/10.1016/j.jpainsymman.2017.05.001.
Ejem DB, Barrett N, Rhodes RL, et al. Reducing disparities in the quality of palliative care for older African Americans through improved advance care planning: study design and protocol. J Palliat Med. 2019;22(S1):90–100. https://doi.org/10.1089/jpm.2019.0146.
Cox CE, Riley IL, Ashana DC, et al. Improving racial disparities in unmet palliative care needs among intensive care unit family members with a needstargeted app intervention: The ICUconnect randomized clinical trial. Contemp Clin Trials. 2021;103:106319. https://doi.org/10.1016/j.cct.2021.106319.
Lopez L, Green AR, TanMcGrory A, King R, Betancourt JR. Bridging the digital divide in health care: the role of health information technology in addressing racial and ethnic disparities. Jt Comm J Qual Patient Saf. 2011;37(10):437–45. https://doi.org/10.1016/s15537250(11)370559.
Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ. 2015;350:h391–h391. https://doi.org/10.1136/bmj.h391.
Hemming K, Taljaard M. Sample size calculations for stepped wedge and cluster randomised trials: a unified approach. J Clin Epidemiol. 2016;69:137–46. https://doi.org/10.1016/j.jclinepi.2015.08.015.
Li F, Turner EL, Preisser JS. Sample size determination for GEE analyses of stepped wedge cluster randomized trials. Biometrics. 2018;74(4):1450–8. https://doi.org/10.1111/biom.12918.
Zhou X, Liao X, Kunz LM, Normand ST, Wang M, Spiegelman D. A maximum likelihood approach to power calculations for stepped wedge designs of binary outcomes. Biostatistics. 2020;21(1):102–21. https://doi.org/10.1093/biostatistics/kxy031.
Wang J, Cao J, Zhang S, Ahn C. Sample size determination for stepped wedge cluster randomized trials in pragmatic settings. Stat Methods Med Res. 2021;30(7):1609–23. https://doi.org/10.1177/09622802211022392.
Ouyang Y, Xu L, Karim ME, Gustafson P, Wong H. CRTpowerdist: An R package to calculate attained power and construct the power distribution for crosssectional steppedwedge and parallel cluster randomized trials. Comp Methods Progr Biomed. 2021;208:106255. https://doi.org/10.1016/j.cmpb.2021.106255.
Harrison LJ, Wang R. Power calculation for analyses of crosssectional steppedwedge cluster randomized trials with binary outcomes via generalized estimating equations. Stat Med. 2021;40(29):6674–88. https://doi.org/10.1002/sim.9205.
Chen J, Zhou X, Li F, Spiegelman D. swdpwr: A SAS macro and an R package for power calculations in stepped wedge cluster randomized trials. Comput Methods Progr Biomed. 2022;213:106522. https://doi.org/10.1016/j.cmpb.2021.106522.
Ouyang Y, Li F, Preisser JS, Taljaard M. Sample size calculators for planning steppedwedge cluster randomized trials: a review and comparison. Int J Epidemiol. 2022;51(6):2000–13. https://doi.org/10.1093/ije/dyac123.
O’Connor EA, Vollmer WM, Petrik AF, Green BB, Coronado GD. Moderators of the effectiveness of an intervention to increase colorectal cancer screening through mailed fecal immunochemical test kits: results from a pragmatic randomized trial. Trials. 2020;21(1):91. https://doi.org/10.1186/s1306301940277.
Tong G, Taljaard M, Li F. Sample size considerations for assessing treatment effect heterogeneity in randomized trials with heterogeneous intracluster correlations and variances. Stat Med. 2023;42(19):3392–412. https://doi.org/10.1002/sim.9811.
Yang S, Li F, Starks MA, Hernandez AF, Mentz RJ, Choudhury KR. Sample size requirements for detecting treatment effect heterogeneity in cluster randomized trials. Stat Med. 2020;39(28):4218–37. https://doi.org/10.1002/sim.8721.
Hubbard AE, Ahern J, Fleischer NL, Van der Laan M, Lippman SA, Jewell N, Bruckner T, Satariano WA. To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology. 2010;21(4):467–74. https://doi.org/10.1097/EDE.0b013e3181caeb90. (PMID: 20220526).
Kahan BC, Forbes G, Ali Y, et al. Increased risk of type I errors in cluster randomised trials with small or medium numbers of clusters: a review, reanalysis, and simulation study. Trials. 2016;17(1):438. https://doi.org/10.1186/s1306301615712.
Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc. 2001;96(456):1387–96. https://doi.org/10.1198/016214501753382309.
Mancl LA, DeRouen TA. A covariance estimator for GEE with improved smallsample properties. Biometrics. 2001;57(1):126–34. https://doi.org/10.1111/j.0006341x.2001.00126.x.
Emrich LJ, Piedmonte MR. A method for generating highdimensional multivariate binary variates. Am Stat. 1991;45(4):302–4. https://doi.org/10.1080/00031305.1991.10475828.
Wang J, Cao J, Zhang S, Ahn C. Sample size and power analysis for stepped wedge cluster randomised trials with binary outcomes. Stat Theory Relat Fields. 2021;5(2):162–9. https://doi.org/10.1080/24754269.2021.1904094.
Rebonato R, Jäckel P. The most general methodology for creating a valid correlation matrix for risk management and option pricing purposes. Journal Risk. 2000;2(2):17–27. https://doi.org/10.21314/jor.2000.023.
Fixing non positive definite correlation matrices using R. Rbloggers. Accessed Jan. 26, 2023. https://www.rbloggers.com/2012/10/fixingnonpositivedefinitecorrelationmatricesusingr2/
Barker D, McElduff P, D’Este C, Campbell MJ. Stepped wedge cluster randomised trials: a review of the statistical methodology used and available. BMC Med Res Methodol. 2016;16:69. https://doi.org/10.1186/s1287401601765.
Barker D, D’Este C, Campbell MJ, McElduff P. Minimum number of clusters and comparison of analysis methods for cross sectional stepped wedge cluster randomised trials with binary outcomes: a simulation study. Trials. 2017;18(1):119. https://doi.org/10.1186/s1306301718622.
Tian Z, Preisser JS, Esserman D, Turner EL, Rathouz PJ, Li F. Impact of unequal cluster sizes for GEE analyses of stepped wedge cluster randomized trials with binary outcomes. Biom J. 2022;64(3):419–39. https://doi.org/10.1002/bimj.202100112.
Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–102. https://doi.org/10.1002/sim.8086.
Ford WP, Westgate PM. Maintaining the validity of inference in smallsample stepped wedge cluster randomized trials with binary outcomes when using generalized estimating equations. Stat Med. 2020;39(21):2779–92. https://doi.org/10.1002/sim.8575.
Li P, Redden DT. Small sample performance of biascorrected sandwich estimators for clusterrandomized trials with binary outcomes. Stat Med. 2015;34(2):281–96. https://doi.org/10.1002/sim.6344.
Kenny A, Voldal EC, Xia F, Heagerty PJ, Hughes JP. Analysis of stepped wedge cluster randomized trials in the presence of a timevarying treatment effect. Stat Med. 2022;41(22):4311–39. https://doi.org/10.1002/sim.9511.
Ward JB, Gartner DR, Keyes KM, Fliss MD, McClure ES, Robinson WR. How do we assess a racial disparity in health? Distribution, interaction, and interpretation in epidemiological studies. Ann Epidemiol. 2019;29:1–7. https://doi.org/10.1016/j.annepidem.2018.09.007.
VanderWeele TJ, Knol MJ. A tutorial on interaction. Epidemiologic Methods. 2014;3(1):33–72. https://doi.org/10.1515/em20130005.
Acknowledgements
This research was supported by the National Cancer Institute and the National Center for Advancing Translational Sciences (NCATS) (R56CA267957, P30CA196521, U01TR00299701A1, P30AG028741, UL1 TR004419) and unrelated to this work (R35CA220491, R01 CA224319, P01 AG066605, U01CA121947), awarded to the Tisch Cancer Institute (TCI) of the Icahn School of Medicine at Mount Sinai (ISMMS) and the Biostatistics/ Epidemiology/ Research Design (BERD) component of the Center for Clinical and Translational Sciences (CCTS) for this project (UL1TR003167), funded by NCATS, awarded to the University of Texas Health Science Center at Houston.
Funding
Dr. Mazumdar receives grant funding paid to her institution for grants related to this work from NCI and NCATS (R56CA267957, P30CA196521, U01TR00299701A1, P30AG028741, UL1 TR004419) and unrelated to this work from NCI and NCATS (R35CA220491, R01 CA224319, P01 AG066605, U01CA121947) Dr. Kwon was partially supported by the support provided by the Biostatistics/ Epidemiology/ Research Design (BERD) component of the Center for Clinical and Translational Sciences (CCTS) for this project that is currently funded through a grant (UL1TR003167), funded by the National Center for Advancing Translational Sciences (NCATS), awarded to the University of Texas Health Science Center at Houston.
Author information
Authors and Affiliations
Contributions
CY, AB, MM, and DK conceived the study. CY and DK conducted the simulation study, wrote the first draft of the manuscript. AB and MM provided critical review of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
APPENDIX.
Additional file 2: Table S1A.
Calculated powers for three effect sizes in combination with number of observations=160 and 800 per step, prevalence rate=50%, OTE=log(1.68) for a simple exchangeable correlation structure (ICC=0.1 and CAC=1). Table S1B. Calculated powers for three effect sizes in combination with number of observations=160 and 800 per step, prevalence rate=50%, OTE=log(1.68) for a nested exchangeable correlation structure (ICC=0.1 and CAC=0.8). Table S1C. Calculated powers for three effect sizes in combination with number of observations=160 and 800 per step, prevalence rate=50%, OTE=log(1.68) for a nested exchangeable correlation structure (ICC=0.05 and CAC=0.4). Table S2A. Required cluster sizes (based on \({t}_{4}\)distribution) to achieve 80% predicted power to detect interaction effect sizes for a simple exchangeable correlation structure with the number of clusters=8 and ICC=0.1. Table S2B. Required cluster sizes (based on \({t}_{4}\)distribution) to achieve 80% predicted power to detect interaction effect sizes for a nested exchangeable correlation structure with the number of clusters=8, ICC=0.1, and CAC=0.8.
Additional file 3.
R codes for power calculation and simulation.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Yang, C., Berkalieva, A., Mazumdar, M. et al. Power calculation for detecting interaction effect in crosssectional steppedwedge cluster randomized trials: an important tool for disparity research. BMC Med Res Methodol 24, 57 (2024). https://doi.org/10.1186/s12874024021620
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874024021620