Accounting for expected attrition in the planning of cluster randomized trials for assessing treatment effect heterogeneity
BMC Medical Research Methodology volume 23, Article number: 85 (2023)
Abstract
Background
Detecting treatment effect heterogeneity is an important objective in cluster randomized trials and implementation research. While sample size procedures for testing the average treatment effect accounting for participant attrition assuming missing completely at random or missing at random have been previously developed, the impact of attrition on the power for detecting heterogeneous treatment effects in cluster randomized trials remains unknown.
Methods
We provide a sample size formula for testing for a heterogeneous treatment effect assuming the outcome is missing completely at random. We also propose an efficient Monte Carlo sample size procedure for assessing heterogeneous treatment effects assuming covariate-dependent outcome missingness (missing at random). We compare our sample size methods with the direct inflation method, which divides the estimated sample size by the mean follow-up rate. We also evaluate our methods through simulation studies and illustrate them with a real-world example.
Results
Simulation results show that our proposed sample size methods under both missing completely at random and missing at random provide sufficient power for assessing heterogeneous treatment effect. The proposed sample size methods lead to more accurate sample size estimates than the direct inflation method when the missingness rate is high (e.g., ≥ 30%). Moreover, sample size estimation under both missing completely at random and missing at random is sensitive to the missingness rate, but not sensitive to the intracluster correlation coefficient among the missingness indicators.
Conclusion
Our new sample size methods can assist in planning cluster randomized trials that aim to assess a heterogeneous treatment effect when participant attrition is expected to occur.
Contributions to the literature

No previous studies have formally investigated how attrition can affect the sample size estimation in cluster randomized trials when the objective is to assess treatment effect heterogeneity.

We provide a sample size formula for testing a heterogeneous treatment effect assuming the outcome is missing completely at random.

We describe an efficient Monte Carlo sample size procedure for assessing a heterogeneous treatment effect assuming covariate-dependent outcome missingness.

We found that the intracluster correlation coefficient among the missingness indicators has a limited impact on the power of heterogeneous treatment effect analysis in cluster randomized trials.
Background
Cluster randomized trials (CRTs) are a study design that allocates the intervention at the group or community level and are increasingly popular in implementation science research [1, 2]. An essential component in planning CRTs is to estimate the sample size that provides sufficient statistical power for detecting a clinically meaningful effect size [3, 4]. When randomizing clusters, the intracluster correlation coefficient (ICC), a quantity that measures the similarity in outcomes among units within the same cluster, is a driving factor for variance inflation and must be accounted for [5]. There has been growing interest in studying heterogeneous treatment effects (HTE) in clinical and public health research, particularly for studies with a health equity objective [6, 7]. Broadly, HTE refers to differences in treatment effects across subpopulations defined by levels of baseline covariates or effect modifiers (e.g., age, sex, education) and is modeled through treatment-by-covariate interaction terms. While there are different types of HTE in CRTs, our focus in this article is the systematic HTE that can be explained by measured baseline covariates, rather than the unexplained HTE across clusters that may be accounted for by a random treatment effect [8]. Accounting for treatment effect heterogeneity is important in CRTs for several reasons. First, an interaction term representing treatment effect heterogeneity in the analytical model is essential for testing and estimating differential treatment effects in patient subpopulations. Second, Tong et al. [9] and Li et al. [10] have demonstrated in different CRT designs that accounting for treatment effect heterogeneity can lead to a more efficient average treatment effect estimator; in other words, failing to account for treatment effect heterogeneity could even reduce the power to study the average treatment effect.
Assessing treatment effect heterogeneity across subgroups (e.g., by age, sex, or education) is not uncommon in CRTs [11]. For instance, in a systematic review [12] of 64 CRTs assessing cardiovascular and chronic respiratory disease interventions, 18 of the 64 trials conducted analyses involving patient-level baseline covariates. As another broad example, the 2017 Consolidated Standards of Reporting Trials (CONSORT) extension has encouraged investigators to explicitly formulate health equity objectives as a trial's primary objective [7]. In addition, the health equity best practices guidance document [13] (item 3) developed by the National Institute on Aging (NIA) IMbedded Pragmatic Alzheimer's Disease and AD-Related Dementias (AD/ADRD) Clinical Trials (IMPACT) Collaboratory includes "Be explicit in the sample size justifications with regard to health equity objectives" as a recommended practice for AD/ADRD pragmatic trials (personal communication), many of which randomize nursing homes instead of individual patients. Developing HTE-based sample size procedures therefore responds to this emerging need. Specifically, the methods developed in this work can be used in several settings. For studies whose primary interest is HTE (such as studies addressing health equity as a primary objective), our sample size methods provide tools to design a CRT with adequate power, accounting for missing outcomes. For studies whose primary interest is the average treatment effect (ATE) but which plan to study HTE as a pre-specified secondary objective, our sample size tools can formally quantify the power for that secondary objective; that is, one can assess whether sufficient power is already available for testing an HTE given a sample size calculated for the primary goal of studying the ATE.
For studies interested in both the ATE and HTE, taking the larger of the two sample sizes from the ATE and HTE objectives is a feasible, conservative approach in cases where there is no need for multiplicity adjustment.
Recent studies have introduced methods to plan CRTs for assessing systematic HTE with pre-specified effect modifiers such as sex and age [9, 14]. Yang et al. [14] proposed sample size methods for testing HTE in CRTs with a continuous endpoint and found that the sample size is influenced not only by the outcome ICC but also by the covariate ICC, a quantity that measures the degree of similarity between effect modifiers within the same cluster [15]. Tong et al. [9] generalized their sample size procedure to unequal cluster sizes and found that variable cluster sizes lead to negligible power loss for testing HTE with an individual-level effect modifier. The sample size procedure for assessing HTE has also been extended to accommodate three-level CRTs [10]. These developments respond to the need for CRT methods that generate knowledge on how individuals may respond differently to interventions, or on how an intervention may reduce existing disparities in outcomes between subgroups.
A potential limitation of the existing formulas is that they assume complete follow-up of individuals and clusters, so the impact of attrition on the power of the HTE test remains unknown [16,17,18]. Previous studies on the impact of attrition in planning CRTs focused on the average treatment effect and typically operated under the missing completely at random (MCAR) assumption [19, 20]. Taljaard et al. [21] developed sample size methods to account for expected attrition in testing the average treatment effect under MCAR, and found that the direct approach that simply inflates the sample size by the mean follow-up rate can overestimate the sample size. Xu et al. [22, 23] proposed sample size methods to address outcome attrition for continuous and binary outcomes in the matched-pair CRT design. Outside the CRT context, Zhu et al. [24] and Zhang et al. [25] studied attrition in the matched-pair design under the generalized estimating equations (GEE) framework and proposed sample size formulas for continuous and binary outcomes. Moreover, several studies concerning missing data in longitudinal studies have developed sample size methods that may be applied to CRTs. For example, Roy et al. [3] developed a sample size method to address attrition in a hierarchical longitudinal design that permits differential dropout rates, and Wang et al. [26] compared power methods for longitudinal data under a monotone missing at random (MAR) assumption.
To date, no previous studies have formally investigated how attrition can affect sample size estimation in CRTs when the objective is to assess treatment effect heterogeneity. This paper bridges the gap by contributing sample size procedures that accommodate outcome attrition under both the MCAR and MAR mechanisms. We provide a closed-form sample size formula under MCAR and discuss relevant insights. For the second mechanism, we assume the effect modifier of interest is predictive of outcome attrition, and describe an efficient Monte Carlo approach for sample size estimation. The rest of the paper is organized as follows. The Testing HTE with an individual-level effect modifier section formulates the problem by introducing both the outcome model for the analysis and the missingness model. The Accounting for expected attrition under MCAR and Accounting for expected attrition under MAR sections introduce our sample size methods for detecting HTE while allowing for expected attrition under MCAR and MAR, respectively. The Results section presents a simulation study to validate our sample size procedures. The Discussion section provides an illustration based on a real-world data example from the Work, Family, and Health Study [27]. Throughout, we compare our sample size procedures to the direct inflation approach (i.e., obtaining the sample size assuming no attrition and then inflating it by the mean follow-up rate). The Conclusions section concludes.
Methods
Testing HTE with an individual-level effect modifier
We first review the typical formulation for testing confirmatory HTE in a two-arm CRT under the linear mixed model framework. We define \({Y}_{ij}\) as the continuous outcome for the \(j\)th individual in the \(i\)th cluster, \(i\in \{1,\dots ,n\}\), \(j\in \{1,\dots ,m\}\), where \(n\) is the total number of clusters and \(m\) is the common cluster size typically assumed in study planning. Define the cluster-level treatment indicator as \({W}_{i}\), with \({W}_{i}=1\) if a cluster is randomized to the intervention and \({W}_{i}=0\) if randomized to the control. We focus on a single, individual-level effect modifier \({X}_{ij}\). Then, the linear mixed model accommodating the treatment-by-covariate interaction can be written as
\({Y}_{ij}={\beta }_{1}+{\beta }_{2}{W}_{i}+{\beta }_{3}{X}_{ij}+{\beta }_{4}{W}_{i}{X}_{ij}+{\mu }_{i}+{\epsilon }_{ij},\)
where \({\beta }_{1}\), \({\beta }_{2}\), \({\beta }_{3}\), and \({\beta }_{4}\) are the intercept, treatment main effect, covariate main effect, and treatment-by-covariate interaction effect, respectively; \({\mu }_{i}\sim N\left(0,{\sigma }_{\mu }^{2}\right)\) is the random intercept accounting for the within-cluster correlation; and \({\epsilon }_{ij}\sim N\left(0,{\sigma }_{\epsilon }^{2}\right)\) is the residual error. The variance of the HTE estimator \({\widehat{\beta }}_{4}\) has been characterized in Yang et al. [14] as
where \({\rho }_{x}\) is the covariate ICC (which quantifies the ratio of between-cluster covariate variation to the total covariate variation), \({\sigma }_{x}^{2}\) is the marginal variance of the effect modifier, \({\rho }_{yx}={\sigma }_{\mu }^{2}/{\sigma }_{yx}^{2}\) is the adjusted outcome ICC (which quantifies the ratio of between-cluster outcome variation to the total outcome variance), and \({\sigma }_{yx}^{2}={\sigma }_{\mu }^{2}+{\sigma }_{\epsilon }^{2}\) is the total adjusted outcome variance.
Here, \({\rho }_{x}\) is the counterpart of the outcome ICC and can be defined as \({\rho }_{x}=Cov\left({X}_{ij},{X}_{ik}\right)/{\sigma }_{x}^{2}\) for \(j\ne k\), where \(Cov\left({X}_{ij},{X}_{ik}\right)\) represents the common covariance between effect modifiers observed for any two individuals \(j\) and \(k\) in a given cluster \(i\). For a two-sided \(z\)-test with type I error rate \(\alpha\) to achieve a power of \(\left(1-\zeta \right)\), the required number of clusters for testing a pre-specified effect size of \(\delta\) is
where \({z}_{q}\) is the \(q\)th quantile of the standard normal distribution, and \({\sigma }_{w}^{2}\) is the Bernoulli variance of the treatment indicator; under balanced 1:1 randomization, \({\sigma }_{w}^{2}=1/4\).
The above sample size procedure has been extended to scenarios with randomly varying cluster sizes [9]. Assume the cluster sizes follow a common distribution \({m}_{i}\sim f\left({m}_{i}\right)\) with finite first and second moments \(\bar{m}\) and \({\sigma }_{m}^{2}+{\bar{m}}^{2}\). We can then define the coefficient of variation (CV) of cluster sizes as \(\text{CV}={\sigma }_{m}/\bar{m}\) [28,29,30,31]. Tong et al. [9] derived a multiplicative correction factor (CF) for the sample size requirement of the HTE test with a continuous outcome as a function of the mean cluster size and CV,
The impact of this multiplicative correction factor depends on the relative size of \({\rho }_{x}\) and \({\rho }_{yx}\): it equals one if \({\rho }_{x}={\rho }_{yx}\), is below one if \({\rho }_{x}>{\rho }_{yx}\), and is above one if \({\rho }_{x}<{\rho }_{yx}\). As shown numerically in Fig. 1 of Tong et al. [9], this correction factor is almost always 1 with a small CV of cluster size (CV = 0.3). With an extreme CV of cluster size (CV = 0.9) and a cluster size of 100, it remains close to 1 except when the covariate ICC approaches one. However, when the cluster size becomes smaller (e.g., 20), the correction factor is only close to 1 when the covariate ICC falls below 0.5 across a common range of outcome ICC (from 0 to 0.2). Overall, \(\text{CF}\left(\bar{m},\text{CV}\right)\) generally has little impact on the sample size requirement.
Accounting for expected attrition under MCAR
We propose to modify the above sample size procedure to account for expected attrition. We define a binary missingness indicator \({O}_{ij}\) such that \({O}_{ij}=1\) if the outcome \({Y}_{ij}\) is observed and \({O}_{ij}=0\) if the outcome is missing, and denote the proportion of observed outcomes, or average follow-up rate, by \(Pr\left({O}_{ij}=1\right)=\pi\). Because missingness is likely correlated within the same cluster due to cluster randomization, we assume a compound symmetric correlation structure for the missingness indicators, similar to Taljaard et al. [21]. The ICC between the \(j\)th and \(k\)th individuals' missingness indicators within the \(i\)th cluster is defined as \(corr\left({O}_{ij},{O}_{ik}\right)=\tau\) for \(j\ne k\); the correlation between missingness indicators in different clusters is \(corr\left({O}_{ij},{O}_{{i}^{{\prime }}k}\right)=0\) for \(i\ne {i}^{{\prime }}\); and by definition \(corr\left({O}_{ij},{O}_{ij}\right)=1\). To ensure that the correlation matrix of the missingness indicators is positive definite, \(\tau\) is bounded by \(-1/(m-1)\le \tau \le 1\) [32]. Here, \(\tau =0\) corresponds to missingness that is independent across individuals, whereas the upper bound is reached when the missingness indicators for all individuals within a cluster take the same value (cluster attrition) [21]. Therefore, this formulation accommodates loss to follow-up at both the individual and cluster levels.
Under MCAR, the missingness is independent of the outcome, covariate, and treatment, such that \({O}_{ij}\perp \{{Y}_{ij},{X}_{ij},{W}_{i}\}\). The number of observed outcomes in each cluster is \({m}_{i}=\sum _{j=1}^{m}{O}_{ij}\). The expected number of observed outcomes (observed cluster size) is \({m}^{c}=\sum _{j=1}^{m}Pr({O}_{ij}=1)=\pi m\), and the variance of the observed cluster size is \({\sigma }_{{m}^{c}}^{2}=\sum _{j=1}^{m}Var\left({O}_{ij}\right)+{\sum }_{j\ne {j}^{{\prime }}}Cov\left({O}_{ij},{O}_{i{j}^{{\prime }}}\right)=\pi \left(1-\pi \right)m\left\{1+\tau \left(m-1\right)\right\}\). Hence the coefficient of variation of the observed cluster size becomes \(\text{CV}={\sigma }_{{m}^{c}}/{m}^{c}=\sqrt{\left(1-\pi \right)\left\{1+\tau \left(m-1\right)\right\}/\left(\pi m\right)}\).
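These moments are easy to verify numerically. The sketch below (a minimal Python illustration; the paper's own simulations use R) computes the closed-form mean, variance, and CV of the observed cluster size, and checks them against exchangeable missingness indicators generated via the Lunn-Davies mixture construction, which is our choice here for \(\tau \in [0,1]\) and not necessarily the authors' generator.

```python
import math
import numpy as np

def observed_size_moments(m, pi, tau):
    """Closed-form mean, variance, and CV of the observed cluster size
    m_i = sum_j O_ij under MCAR with follow-up rate pi and missingness ICC tau."""
    mean = pi * m
    var = pi * (1 - pi) * m * (1 + tau * (m - 1))
    return mean, var, math.sqrt(var) / mean

def simulate_missingness(n, m, pi, tau, rng):
    """Exchangeable binary indicators with mean pi and within-cluster
    correlation tau (Lunn-Davies construction, valid for tau in [0, 1])."""
    u = rng.binomial(1, pi, size=(n, 1))      # one shared draw per cluster
    v = rng.binomial(1, pi, size=(n, m))      # independent individual draws
    b = rng.binomial(1, math.sqrt(tau), size=(n, m))
    return b * u + (1 - b) * v                # mix shared and independent draws

rng = np.random.default_rng(2023)
O = simulate_missingness(20000, 20, 0.6, 0.3, rng)
sizes = O.sum(axis=1)                         # observed cluster sizes
mean, var, cv = observed_size_moments(20, 0.6, 0.3)
print(mean, round(var, 2), round(cv, 3))      # 12.0 32.16 0.473
```

Note that at \(\tau =1\) the construction reduces to cluster-level attrition (each cluster is entirely observed or entirely missing), matching the special case discussed above.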
An important insight under MCAR is that the expected attrition leads to randomly varying cluster sizes, and therefore we can modify the formula developed in Tong et al. [9] to address attrition. Specifically, we insert \(\text{C}\text{V}\) and \({m}^{c}\) into Eqs. (2) and (3), which gives,
where,
Here, \(\text{CF}\left(\pi ,\tau \right)\) is the multiplicative correction factor for the sample size under MCAR. Interestingly, the correlation between missingness indicators enters the formula only through \(\text{CF}\left(\pi ,\tau \right)\), which increases with \(\tau\), suggesting that the power loss is larger when the missingness correlation is higher. Two special cases are of note. At the lower bound \(\tau =-1/(m-1)\), \(\text{CF}\left(\pi ,\tau \right)=1\), indicating that attrition simply scales the cluster size by the mean follow-up rate; at the upper bound \(\tau =1\) (cluster attrition), \(\text{CF}\left(\pi ,\tau \right)\) attains its maximum. However, the correction factor often takes values close to one (Tong G, Taljaard M, Li F: Sample size considerations for assessing treatment effect heterogeneity in randomized trials with heterogeneous intracluster correlations and variances, under review), and therefore we anticipate that \(\tau\) only minimally affects the sample size with an individual-level effect modifier given a fixed \(\pi\).
We compare the sample size formula in Eq. (4), denoted \({n}_{1}\), to the direct inflation approach, which divides the sample size from Eq. (2) by the average follow-up rate \(\pi\), i.e., \({n}_{0}=n/\pi\).
To facilitate the illustration, Fig. 1 provides contour plots comparing our new sample size formula with the direct inflation approach assuming \(m\in \left\{20, 100\right\}\), \(\pi =0.6\), and \(\tau \in \left\{0.05, 0.6, 1\right\}\). The ratio \({n}_{1}/{n}_{0}\) is plotted over outcome ICC values from 0 to 0.2 and covariate ICC values from 0 to 1. Several patterns emerge. First, the ratio of \({n}_{1}\) to \({n}_{0}\) is always smaller than 1 across all panels in Fig. 1, suggesting that the direct inflation approach overestimates the sample size when \(\tau\) is relatively small; this finding is consistent with that of Taljaard et al. [21] for testing the average treatment effect. However, when \(\tau\) is large, the ratio becomes slightly larger, showing that the direct inflation approach is less conservative. Second, the accuracy of the direct inflation approach is mainly driven by the outcome completion rate \(\pi\) and is relatively insensitive to the outcome ICC or covariate ICC; when \(\pi\) is smaller, the direct inflation approach can be quite conservative. Additional scenarios with \(m\in \left\{20, 100\right\}\) and \(\tau \in \left\{0.05, 0.6, 1\right\}\) under a follow-up rate of \(\pi =0.9\) are plotted in Appendix Fig. 1; the patterns are qualitatively similar.
Accounting for expected attrition under MAR
We now consider covariate-dependent attrition, or MAR. With a specific effect modifier of interest, a general formulation of MAR, sometimes called the covariate-dependent missingness mechanism, assumes that \({O}_{ij}\perp {Y}_{ij}\mid \left\{{X}_{ij},{W}_{i}\right\}\). For illustration, we only consider the scenario where the missingness depends solely on the effect modifier (\({O}_{ij}\perp \left\{{Y}_{ij},{W}_{i}\right\}\mid {X}_{ij}\)), but an extension to allow for dependence on \({W}_{i}\) is straightforward. To proceed, we specify the missingness model as \(Pr\left({O}_{ij}=1\mid {X}_{ij}\right)={\pi }_{ij}\left({X}_{ij}\right)\), and the correlation of missingness indicators between observations within the same cluster as \(\tau \left({X}_{ij},{X}_{ik}\right)=corr\left({O}_{ij},{O}_{ik}\mid {X}_{ij},{X}_{ik}\right)\). Essentially, these quantities are counterparts of those in the previous section that allow for dependence on the effect modifier.
Unlike under MCAR, deriving a closed-form expression for \(Var\left({\widehat{\beta }}_{4}\right)\) under MAR can be challenging due to the complicated correlation patterns between the missingness indicators and covariates. Under MAR, the attrition rate per cluster is no longer homogeneous, and the observed cluster size can be correlated with the covariate. A sample size formula under MAR will inevitably depend on distributional assumptions about the cluster size and its association with the covariate. Therefore, we propose an efficient Monte Carlo approach that estimates the sample size by simulating the covariates and missing data patterns under pre-specified working models. The Monte Carlo approach has proven to be a popular and effective alternative for sample size determination when a closed-form variance expression is unavailable [33,34,35].
The Monte Carlo sample size procedure for searching for the required sample size is summarized in five steps in Fig. 2. Our objective is to find the smallest number of clusters such that the predicted power is greater than or equal to a pre-specified level, such as 80%. In Step 1, we specify the parameters for the outcome model, including the outcome ICC, effect sizes, allocation ratio, and total variance of the outcome, as well as the parameters for generating covariates, including the covariate ICC and other distributional parameters. For example, we can employ a mixed-effects logistic model with a random intercept as the missingness model to induce an ICC for the missingness on the logit scale [36]. Once the model configurations are determined, we set the number of simulations \(B\) and an even integer \({n}^{\left(0\right)}\) as the initial number of clusters; the initial number can be obtained by assuming MCAR and using Eq. (4). We then iterate Steps 2–3 \(B\) times. In each iteration of Step 2, we simulate the effect modifier according to the pre-specified distributional assumption while accounting for the covariate ICC. With a continuous effect modifier, this can be achieved via a linear mixed model; with a binary effect modifier, we can use the beta-binomial model [37] or the conditional linear family approach [38] to simulate correlated binary effect modifiers within a cluster for a pre-specified covariate ICC. In addition, we can numerically tune the intercept of the mixed-effects logistic missingness model so that the marginal follow-up rate equals the given value.
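As a concrete illustration of Step 2 for a continuous effect modifier, a minimal Python sketch of the linear mixed model draw (the authors' own code, linked in the simulation section, is in R):

```python
import numpy as np

def simulate_continuous_modifier(n, m, rho_x, sigma2_x=1.0, rng=None):
    """Step 2 for a continuous modifier: X_ij = lambda_i + gamma_ij with
    Var(lambda_i) = rho_x * sigma2_x and Var(gamma_ij) = (1 - rho_x) * sigma2_x,
    so that corr(X_ij, X_ik) = rho_x within a cluster."""
    rng = rng or np.random.default_rng()
    lam = rng.normal(0.0, np.sqrt(rho_x * sigma2_x), size=(n, 1))
    gam = rng.normal(0.0, np.sqrt((1.0 - rho_x) * sigma2_x), size=(n, m))
    return lam + gam   # (n, m) array of effect modifiers
```

A quick sanity check: with \({\rho }_{x}=0.3\) and \(m=50\), the variance of cluster means should be approximately \({\rho }_{x}+(1-{\rho }_{x})/m=0.314\), and the total variance approximately \({\sigma }_{x}^{2}=1\).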
Next, we follow Yang et al. [14] and Tong et al. [9] and consider a linear mixed analysis of covariance model with a mean-centered treatment,
\({Y}_{ij}={\beta }_{1}+{\beta }_{2}\left({W}_{i}-\bar{W}\right)+{\beta }_{3}{X}_{ij}+{\beta }_{4}\left({W}_{i}-\bar{W}\right){X}_{ij}+{\mu }_{i}+{\epsilon }_{ij}.\)
The compound symmetric correlation matrix for cluster \(i\) is \({\varvec{R}}_{i}=\left(1-{\rho }_{yx}\right){\varvec{I}}_{{m}_{i}}+{\rho }_{yx}{\varvec{J}}_{{m}_{i}}\), and its inverse is given by \({\varvec{R}}_{i}^{-1}={\left(1-{\rho }_{yx}\right)}^{-1}\left\{{\varvec{I}}_{{m}_{i}}-\frac{{\rho }_{yx}}{1+\left({m}_{i}-1\right){\rho }_{yx}}{\varvec{J}}_{{m}_{i}}\right\}.\)
Here \({m}_{i}\) is the number of individuals with a measured outcome in cluster \(i\); \({\varvec{I}}_{s}\) and \({\varvec{J}}_{s}\) are the \(s\times s\) identity matrix and matrix of ones, respectively. We also define the collection of design points for each individual as \({\varvec{Z}}_{ij}={\left(1,\left({W}_{i}-\bar{W}\right),{X}_{ij},\left({W}_{i}-\bar{W}\right){X}_{ij}\right)}^{T}\), for each cluster as \({\varvec{Z}}_{i}={\left({\varvec{Z}}_{i1},\cdots ,{\varvec{Z}}_{i{m}_{i}}\right)}^{T}\), and \(\varvec{\beta }={\left({\beta }_{1},{\beta }_{2},{\beta }_{3},{\beta }_{4}\right)}^{T}\). Given values of the variance and ICC parameters, our target variance of the HTE estimator, \(Var\left({\widehat{\beta }}_{4}\right)\), can be approximated by the \(\left(\text{4,4}\right)\) element of \({\sigma }_{yx}^{2}{\left\{{\sum }_{i=1}^{n}{\varvec{Z}}_{i}^{T}{\varvec{R}}_{i}^{-1}{\varvec{Z}}_{i}\right\}}^{-1}\). Since this variance is a function of \(n\), we rely on a search algorithm to solve for the required sample size. Specifically, in each iteration \(b\) of Step 3, \(b=1,\dots ,B\), we calculate \({\sum }_{i=1}^{n}{\varvec{Z}}_{i}^{T}{\varvec{R}}_{i}^{-1}{\varvec{Z}}_{i}\) and record its realization \({\left\{{\sum }_{i=1}^{n}{\varvec{Z}}_{i}^{T}{\varvec{R}}_{i}^{-1}{\varvec{Z}}_{i}\right\}}^{\left(b\right)}\). In Step 4, we average these realizations over all iterations as \({B}^{-1}\sum _{b=1}^{B}{\left\{{\sum }_{i=1}^{n}{\varvec{Z}}_{i}^{T}{\varvec{R}}_{i}^{-1}{\varvec{Z}}_{i}\right\}}^{\left(b\right)}\) and obtain the target variance of the HTE estimator, \(Var\left({\widehat{\beta }}_{4}\right)\), as the \(\left(\text{4,4}\right)\) element of \({\sigma }_{yx}^{2}{\left\{{B}^{-1}\sum _{b=1}^{B}{\left\{{\sum }_{i=1}^{n}{\varvec{Z}}_{i}^{T}{\varvec{R}}_{i}^{-1}{\varvec{Z}}_{i}\right\}}^{\left(b\right)}\right\}}^{-1}\).
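Steps 2–4 can be sketched as follows, shown in Python rather than the authors' R code. For brevity this illustrative version assumes complete follow-up with a continuous modifier (so every \(m_i=m\)) and an even \(n\) under 1:1 randomization; under MAR, each cluster's design matrix would instead be restricted to the rows with simulated \({O}_{ij}=1\).

```python
import numpy as np

def mc_hte_variance(n, m, rho_yx, rho_x, sigma2_yx=1.0, B=200, seed=1):
    """Approximate Var(beta4_hat) as the (4,4) element of
    sigma2_yx * inv( average over B draws of sum_i Z_i' R_i^{-1} Z_i )."""
    rng = np.random.default_rng(seed)
    W = np.repeat([0.0, 1.0], n // 2)          # 1:1 cluster randomization (even n)
    Wc = W - W.mean()                          # mean-centred treatment
    # inverse of the compound-symmetric R_i = (1 - rho)I + rho J
    Rinv = np.eye(m) - (rho_yx / (1 + (m - 1) * rho_yx)) * np.ones((m, m))
    Rinv /= (1 - rho_yx)
    avg_info = np.zeros((4, 4))
    for _ in range(B):
        info = np.zeros((4, 4))
        for i in range(n):
            # draw the effect modifier with covariate ICC rho_x
            lam = rng.normal(0.0, np.sqrt(rho_x))
            X = lam + rng.normal(0.0, np.sqrt(1 - rho_x), size=m)
            Z = np.column_stack([np.ones(m), np.full(m, Wc[i]), X, Wc[i] * X])
            info += Z.T @ Rinv @ Z
        avg_info += info / B
    return sigma2_yx * np.linalg.inv(avg_info)[3, 3]
```

As a check, with \({\rho }_{yx}=0\) (so \({\varvec{R}}_{i}={\varvec{I}}_{m}\)) the result should recover approximately \({\sigma }_{yx}^{2}/(nm{\sigma }_{w}^{2}{\sigma }_{x}^{2})\).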
In Step 5, we estimate the power using a two-sided \(z\)-test with a pre-specified type I error rate \(\alpha\) and sample size \({n}^{\left(0\right)}\). Assuming our target power is 80%, for each \(k=0,1,\dots\), we assess whether the current sample size estimate \({n}^{\left(k\right)}\) ensures both \(1-\zeta \left({n}^{\left(k\right)}\right)\ge 80\text{\%}\) and \(1-\zeta \left({n}^{\left(k\right)}-2\right)<80\text{\%}\), where the power \(1-\zeta \left(n\right)\) is defined as a function of \(n\). If it does, the final sample size estimate is \({n}^{\left(k\right)}\) and the search concludes. Otherwise, we set \({n}^{(k+1)}={n}^{\left(k\right)}+2\) (assuming equal randomization) if the predicted power is below 80%, or \({n}^{(k+1)}={n}^{\left(k\right)}-2\) if the predicted power is above 80%.
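The Step 5 stopping rule can be written compactly. In this sketch the variance is passed in as an abstract function of \(n\), and the toy profile \(Var({\widehat{\beta }}_{4})=0.2/n\) is an arbitrary placeholder used only to demonstrate the search; the approximate power expression ignores the negligible opposite-tail rejection probability of the two-sided test.

```python
from math import sqrt
from statistics import NormalDist

def power_z(n, delta, var_fn, alpha=0.05):
    """Approximate power of the two-sided z-test of H0: beta4 = 0
    at effect size delta, given var_fn(n) = Var(beta4_hat) for n clusters."""
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_half - abs(delta) / sqrt(var_fn(n)))

def search_clusters(delta, var_fn, target=0.80, n0=4):
    """Step by +/-2 clusters (equal randomization) until n is the smallest
    even number with power(n) >= target and power(n - 2) < target."""
    n = n0
    while power_z(n, delta, var_fn) < target:
        n += 2
    while n > 4 and power_z(n - 2, delta, var_fn) >= target:
        n -= 2
    return n

print(search_clusters(0.25, lambda n: 0.2 / n))   # -> 26
```

In the actual procedure, the Monte Carlo variance from Step 4 would be plugged in place of the toy \(0.2/n\) profile.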
Simulation design under MCAR
For simplicity, our simulations only consider the balanced design with a single covariate (i.e., \({\sigma }_{w}^{2}=1/4\), \(p=1\)). Since the values of \({\beta }_{1},{\beta }_{2},\) and \({\beta }_{3}\) do not affect the simulation results, we choose \({\beta }_{1}=0,{\beta }_{2}=0.25,\) and \({\beta }_{3}=0.1\). We fix the type I error rate at \(\alpha =0.05\), the power at \(1-\zeta =80\%\), and the outcome variance at \({\sigma }_{yx}^{2}=1\). We set the cluster size to \(m\in \{20, 50, 100\}\) and the mean follow-up rate to \(\pi \in \{0.7, 0.9\}\). The missingness correlation takes values \(\tau \in \{0.05, 0.3, 0.6, 1\}\), where \(\tau =1\) indicates the special case of cluster-level attrition following Taljaard et al. [21]. We choose two outcome ICC values, \({\rho }_{yx}\in \{0.01, 0.1\}\), and two covariate ICC values, \({\rho }_{x}\in \{0.1, 0.5\}\). We consider both a continuous and a binary covariate, with a standardized effect size of \(\delta /{\sigma }_{x}\in \{0.1, 0.25\}\) for the continuous covariate and an effect size of \(\delta \in \left\{0.25, 0.45\right\}\) for the binary covariate. The above parameter settings total \(288\) simulation scenarios for each covariate type. The simulation code can be found at https://github.com/deckardt98/HTE_CRT_Attrition.
With a continuous covariate, we fix \({\sigma }_{x}^{2}=1\) and use the linear mixed model to simulate \({X}_{ij}=1/2+{\lambda }_{i}+{\gamma }_{ij}\), where \({\lambda }_{i}\sim N\left(0,{\rho }_{x}{\sigma }_{x}^{2}\right)\) and \({\gamma }_{ij}\sim N\left(0,\left(1-{\rho }_{x}\right){\sigma }_{x}^{2}\right)\). With a binary covariate, we assume a beta-binomial distribution, generated in two steps. First, we generate the event rate \({\pi }_{i}\) for each cluster from a beta distribution, \(\text{Beta}\left({q}_{1},{q}_{2}\right)\). Second, we randomly generate the covariate value for each individual from \(\text{Bernoulli}\left({\pi }_{i}\right)\). We fix the marginal expectation of the binary covariate at 0.3. By the laws of total expectation and total variance, we can solve for \({q}_{1}\) and \({q}_{2}\) from \({q}_{1}/({q}_{1}+{q}_{2})=0.3\) and \({\rho }_{x}=1/(1+{q}_{1}+{q}_{2})\) simultaneously. The marginal variance of the covariate is then \({\sigma }_{x}^{2}={q}_{1}{q}_{2}/{\left({q}_{1}+{q}_{2}\right)}^{2}\). Moreover, under MCAR, the missingness indicator \({O}_{ij}\) is generated using the simbinCLF() function from the geeCRT package in R, which allows the specification of a common \(\tau\) for all clusters to generate binary missingness indicators with a compound symmetric correlation structure in each cluster [32, 38].
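The two beta-binomial constraints invert in closed form: \({q}_{1}+{q}_{2}=1/{\rho }_{x}-1\), then \({q}_{1}\) is the marginal mean times that sum. A short Python sketch of this algebra and of the two-step draw (illustrative only; the paper's generators are in R):

```python
import numpy as np

def beta_binomial_params(mean_x, rho_x):
    """Invert q1/(q1+q2) = mean_x and rho_x = 1/(1 + q1 + q2)
    (from the laws of total expectation and total variance)."""
    s = 1.0 / rho_x - 1.0                  # q1 + q2
    q1 = mean_x * s
    q2 = s - q1
    sigma2_x = q1 * q2 / (q1 + q2) ** 2    # marginal variance = mean_x * (1 - mean_x)
    return q1, q2, sigma2_x

def simulate_binary_modifier(n, m, q1, q2, rng):
    """Two-step beta-binomial draw: pi_i ~ Beta(q1, q2), then X_ij ~ Bernoulli(pi_i)."""
    p = rng.beta(q1, q2, size=(n, 1))
    return rng.binomial(1, np.broadcast_to(p, (n, m)))

q1, q2, s2 = beta_binomial_params(0.3, 0.1)
print(round(q1, 2), round(q2, 2), round(s2, 2))   # 2.7 6.3 0.21
```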
We simulate each scenario as follows. (a) We first calculate the required sample size based on our proposed Eq. (4) under MCAR and round it up to the nearest even integer. (b) We generate the covariate \({X}_{ij}\) as described, and the outcome \({Y}_{ij}\) from the linear mixed model in the Methods section. (c) We generate the missingness indicators \({O}_{ij}\) to obtain the observed outcomes after attrition. (d) We fit the linear mixed effects model to the observed data and estimate \({\widehat{\beta }}_{4}\) via restricted maximum likelihood (REML) using the nlme package in R. (e) We calculate \(p\)-values under the null hypothesis \({H}_{0}:{\beta }_{4}=\delta =0\) and the alternative hypothesis \({H}_{1}:{\beta }_{4}=\delta \ne 0\), respectively. For each scenario, we repeat steps (a) to (e) \(3000\) times. The empirical type I error rate \(\psi\) is evaluated as the proportion of false positives using the data simulated under the null, whereas the empirical power \({\varphi }_{emp}\) is evaluated as the proportion of true positives using the data simulated under the alternative. We compare the empirical power \({\varphi }_{emp}\) to the analytical power \({\varphi }_{pre}\) based on Eq. (4) under the alternative, and the empirical type I error rate \(\psi\) to \(0.05\) under the null. The values of \({\varphi }_{pre}\) can be slightly larger than \(0.8\) because the estimated sample size is rounded up to the next even integer. The corresponding Monte Carlo standard errors under \(3000\) simulations, assuming Bernoulli random variables, are \(0.004\) for the type I error rate and \(0.007\) for the power; therefore, the 95% error margin is \(\pm 0.008\) for the empirical type I error rate and \(\pm 0.014\) for the empirical power. To illustrate the advantage of our methods, the sample sizes obtained from the direct inflation method are also calculated for each simulation scenario.
Simulation design under MAR
Under MAR, we preserve the design parameters of the outcome model from the MCAR simulations but introduce additional parameters in the missingness model. We employ a mixed-effects logistic regression model to simulate the missingness indicators:
\(\text{logit}\left\{Pr\left({O}_{ij}=1\mid {X}_{ij}\right)\right\}={\alpha }_{0}+{\alpha }_{1}{X}_{ij}+{b}_{i},\)
where \({b}_{i}\sim N(0,{\sigma }_{b}^{2})\) is the cluster-specific random intercept. We fix the value of \({\alpha }_{1}\) at \(0.5\) and consider a single covariate, \({X}_{ij}\), which can be either continuous or binary; the covariate data-generating procedure is identical to that under the MCAR simulation design. To make the MAR simulation results comparable with MCAR, we tune the marginal follow-up rates to \(\pi \in \{0.7, 0.9\}\) by varying the value of \({\alpha }_{0}\). For the missingness ICC, we consider \(\tau \in \{0.05, 0.3, 0.6\}\) and do not consider the extreme case of \(\tau =1\). This is because \(\tau\) equals \({\sigma }_{b}^{2}/({\sigma }_{b}^{2}+{\pi }^{2}/3)\) under the mixed-effects logistic regression model [36], where \({\pi }^{2}/3\) is the variance of the standard logistic distribution (here \(\pi\) denotes the mathematical constant); in this setup, \(\tau =1\) is theoretically unattainable but can be approximated by setting \({\sigma }_{b}^{2}\) to be extremely large. With the above parameter setup, we have \(144\) simulation scenarios for each type of covariate. We note that both the variance \(Var\left({\widehat{\beta }}_{4}\right)\) and the empirical power are obtained by Monte Carlo simulation. For each scenario, we average over \(B=1000\) Monte Carlo draws to obtain the variance \(Var\left({\widehat{\beta }}_{4}\right)\) and calculate the sample size based on this variance. After rounding up to the nearest even integer, the analytical power is calculated from Eq. (2). We then simulate \(3000\) trials to calculate the empirical power following the same procedure as under MCAR. To facilitate the comparison, we also calculate the required sample sizes using our proposed sample size formula under MCAR and the direct inflation method.
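For a target latent-scale ICC, the random-intercept variance follows directly from rearranging \(\tau ={\sigma }_{b}^{2}/({\sigma }_{b}^{2}+{\pi }^{2}/3)\), as this small check illustrates:

```python
import math

def sigma2_b_for_tau(tau):
    """Random-intercept variance of the mixed-effects logistic missingness
    model yielding latent-scale ICC tau = sigma_b^2 / (sigma_b^2 + pi^2/3)."""
    return tau / (1.0 - tau) * math.pi ** 2 / 3.0

for tau in (0.05, 0.3, 0.6):
    s2 = sigma2_b_for_tau(tau)
    assert abs(s2 / (s2 + math.pi ** 2 / 3.0) - tau) < 1e-12   # round trip
```

In practice, \({\alpha }_{0}\) would then be tuned numerically (e.g., by bisection over simulated \({O}_{ij}\)) so that the marginal follow-up rate matches the target \(\pi\), as described above.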
Results
Simulation results under MCAR
Table 1 shows the estimated required number of clusters for the \(z\)-test based on the proposed formula under MCAR given a continuous individual-level covariate. The correlation of the missingness indicators is set to \(\tau =0.05\), and the missingness rate is tuned to 0.1 or 0.3. Most scenarios have the predicted power and the empirical type I error rate within the error margin. When the number of clusters becomes smaller, cases with predicted power outside the error margin are occasionally observed. This is not unexpected because our sample size formula relies on the asymptotic distribution. Similar simulation results with a continuous covariate assuming \(\tau \in \left\{0.3, 0.6, 1\right\}\) are included in Appendix Tables 1, 2 and 3, and another set of simulations with a binary covariate is included in Appendix Tables 4, 5, 6 and 7. Overall, our simulation verifies the accuracy of our proposed sample size formula under MCAR. In addition, compared to the sample sizes estimated via the direct inflation method, our results suggest that the magnitude of sample size inflation based on our method, defined by the ratio of the unadjusted to the adjusted sample size, can be close to the follow-up rate used in the direct inflation method under many simulation scenarios. However, in scenarios with large values of the covariate ICC, the direct inflation method overestimates the sample size, and the degree of overestimation increases when the outcome ICC is large and when the follow-up rate is low. These findings are consistent with our numerical illustration in the methods section. In addition, in Appendix Tables 2 and 3, we present the results for \(\tau \in \{0.3, 0.6\}\); in Appendix Table 4, we present results for the special case of \(\tau =1\), where attrition occurs at the cluster level. The results are all qualitatively similar.
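For reference, the direct inflation comparator simply divides the complete-data number of clusters by the mean follow-up rate and rounds up to the nearest even integer, mirroring the rounding convention used throughout; a minimal sketch (the input of 24 clusters is illustrative, not a value from Table 1):

```python
from math import ceil

def round_up_even(n: float) -> int:
    """Round a sample size up to the nearest even integer."""
    m = ceil(n)
    return m if m % 2 == 0 else m + 1

def direct_inflation(n_unadjusted: int, followup_rate: float) -> int:
    """Inflate the complete-data number of clusters by 1/followup_rate."""
    return round_up_even(n_unadjusted / followup_rate)

print(direct_inflation(24, 0.9))  # 28
print(direct_inflation(24, 0.7))  # 36
```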
In general, our simulation shows that under MCAR the sample size stays approximately constant as the missingness correlation, \(\tau\), varies, even when the attrition is at the cluster level (\(\tau =1\)). Moreover, the estimated number of clusters is insensitive to the outcome ICC when the covariate ICC is small but becomes more sensitive to the outcome ICC when the covariate ICC increases.
Simulation results under MAR
Table 2 presents the estimated required number of clusters for the \(z\)-test based on the Monte Carlo method under MAR with a continuous covariate, \(\tau =0.05\), and \(\pi =0.7, 0.9\). Overall, the accuracy of our Monte Carlo-based method under MAR is confirmed, as the predicted power for most scenarios is within the error margin of the empirical power while the test maintains a valid empirical type I error rate. Only when the number of clusters becomes very small do we find empirical power lower than the predicted power. Compared to the other methods, we found that the MCAR method often underestimates the sample size when the true missingness model is MAR, especially when the prespecified effect size is small, the cluster size is small, and the missingness rate is high. In the worst cases, with the lowest follow-up rate of \(0.7\) and the smallest cluster size of \(20\), the estimated number of clusters using our formula under MCAR is \(10\) clusters fewer than the more accurate estimate based on the proposed Monte Carlo method under MAR. As for the direct inflation method, it can underestimate the sample size when the true missingness mechanism is MAR, especially when the prespecified effect size is small. However, when the covariate ICC and outcome ICC are large and the missingness rate is high, the direct inflation method can also overestimate the sample size. Similar results are observed for scenarios with a continuous covariate when \(\tau \in \{0.3, 0.6\}\) (Appendix Tables 8 and 9) as well as for scenarios with a binary covariate when \(\tau \in \{0.05, 0.3, 0.6\}\) (Appendix Tables 10, 11 and 12). Note that with a binary covariate, the sample size formula under MCAR and the direct inflation method tend to overestimate the sample size when the true missingness mechanism is MAR.
Application to the Work, Family, and Health Study
We demonstrate our proposed sample size methods using data from the Work, Family, and Health Study (WFHS) [27, 39]. WFHS implemented a social experiment among employees of a Fortune 500 information technology company and studied the effect of altered workplace practices and policies on the work-life balance of employees. Randomization was at the group level, where each group comprised employees governed by the same leadership.
There were 799 participants nested within 56 groups enrolled in this trial at baseline, and 694 participants completed the follow-up assessment at 6 months (\(\pi =0.87\)). The group size varied between \(7\) and \(60\), with an average of 29.
In this illustration, the outcome of interest is control over working hours (CWH) assessed at 6 months through a survey interview. It is a continuous outcome measuring the degree of flexibility in managing working hours and was also measured at baseline. Our goal is to estimate the sample size for studying HTE with baseline CWH as the covariate and CWH at 6 months as the outcome. According to published results [39], the outcome ICC is estimated to be 0.14; the estimated total outcome variance conditional on the baseline CWH is \({\sigma }_{yx}^{2}\approx 0.23\); the estimated marginal variance of the covariate is \({\sigma }_{x}^{2}=0.4\); and the estimated ICC for the baseline CWH is \({\rho }_{x}=0.058\). The allocation ratio is 1:1 and \({\sigma }_{W}^{2}=0.25\). We consider effect sizes on the outcome \(\delta \in \{0.2, 0.3\}\) and correlations in the missingness \(\tau \in \{0.05, 0.3, 0.6\}\). Besides \(\pi =0.87\), we also consider \(\pi \in \{0.935, 0.610\}\) to expand our illustration. We estimate the required sample size under the MCAR and MAR assumptions with the same configurations of the outcome and covariate ICCs, missingness ICC, and marginal average missingness rate. For the latter, we employ the same mixed-effects logistic model for the missingness indicator given the covariate as in the simulation design for MAR, with a slope of \({\alpha }_{1}=0.5\). We round up the calculated sample size to the nearest even integer. All calculations were performed under a type I error rate of 0.05 and power of 80%.
Table 3 summarizes the estimated number of clusters to test HTE with baseline CWH under different missingness mechanisms. The estimated number of clusters ranges from 16 to 26 when \(\delta =0.2\), and from 8 to 12 when \(\delta =0.3\). The sample size is invariant to assumptions on \(\tau\) when the other parameters are fixed, while being much more sensitive to the missingness rate, as the number of required clusters increases when the follow-up rate is lower. For \(\delta =0.2\) and under high attrition (\(\pi =0.61\)), the direct inflation method and the estimation formula under MCAR may underestimate the sample size when the actual missingness mechanism is MAR. When \(\delta =0.3\), the estimated sample sizes are relatively invariant across the different methods because the estimated sample size is generally quite small. Overall, these patterns are consistent with the findings observed in our simulation studies.
Discussion
This paper developed new sample size procedures for assessing HTE in CRTs with outcome attrition under both MCAR and MAR mechanisms. Under MCAR, we proposed a closed-form formula for sample size calculation by adapting the result of Tong et al. [9]. This closed-form sample size is easy to implement and clarifies how design parameters can influence the sample size requirement under MCAR. Under MAR, we described a Monte Carlo method to calculate the sample size. Our simulation studies show adequate performance of our proposed sample size methods under both missingness mechanisms. We also compared the performance of our sample size methods to the direct inflation method. Although the estimated sample sizes are similar across the different methods in our simulation studies with a limited number of scenarios, they are expected to differ more under other settings. For example, under MCAR, Fig. 1 illustrates that the direct inflation method becomes more conservative with an increasing covariate ICC but may be close to the proposed method when the covariate ICC approaches zero. Our simulation results also clarify the implications of calculating the sample size with the direct inflation method. Based on the results in Tables 1 and 2, we find that the direct inflation method only overestimates the sample size under MCAR, but can either overestimate or underestimate the sample size under MAR depending on the missingness rate, cluster size, and the covariate and outcome ICCs. In the context of CRTs, clusters typically refer to communities, hospitals, or health systems, and a difference of even one cluster can mean a significant change in the study budget. Regardless of the values of the design parameters, our formula and procedure provide a useful approach to accurately estimate the sample size required for a CRT.
Although sample size calculations for CRTs can be carried out via simulation-based methods, these can be computationally intensive (due to repeatedly fitting multilevel models to simulated data) and operationally cumbersome if one is interested in examining the study power across a wide range of design parameters. On the other hand, the availability of closed-form formulas and methods reduces the computational burden and effectively decreases the effort of exploring many design scenarios. Perhaps more importantly, a closed-form formula clarifies key aspects of, and provides insights into, the data-generating processes that determine the study power. For example, under the MCAR setting, our formula implies that the intracluster correlation coefficient of the missingness indicator generally has minimal impact on study power (except under whole-cluster attrition), which further indicates that it may not be necessary to explore the change in sample size under a varying intracluster correlation coefficient of the missingness indicator. Such insights can simplify power analyses because one can then focus on exploring essential design parameters to assess the sensitivity of the sample size results, instead of blindly exhausting all design parameters typically required in a simulation-based procedure. Finally, the closed-form sample size formula can offer knowledge of how key intracluster correlation parameters affect the study power, and inform investigators in selecting their values to obtain a conservative sample size when accurate knowledge is unavailable [14].
Regarding the design parameters that influence the sample size estimation, our findings resemble those discussed in Tong et al. [9]. In brief, the required sample size increases as the covariate ICC increases, and decreases as the outcome ICC increases. Regarding the missingness model parameters, under MCAR, the required sample size is not sensitive to the missingness ICC but increases almost proportionally as the missingness rate increases. This finding facilitates the use of our method in practice because the missingness rate is much easier to assume than the correlation between missingness indicators at the design stage. As under MCAR, the required sample size increases as the missingness rate increases under MAR. The choice of \(\tau\) can be slightly more influential on the sample size under MAR. However, when the cluster size is large and the estimated sample size is small, the choice of \(\tau\) still has limited impact on the sample size estimation under MAR. Reliable estimation of ICC parameters is essential for informing the design of future CRTs and has been a topic of study in many previous works; see, for example, Maas and Hox [40], Ukoumunne et al. [41], Wu et al. [42], Preisser et al. [43] and Li et al. [44] under different clustered designs. Ridout et al. [45] also compared the performance of 20 ICC estimators for binary data. When routinely collected data are available, one could use existing ICC estimators to obtain correlation parameters for study design purposes. Alternatively, several previous studies [42, 46,47,48] have published empirical ICC estimates from completed CRTs under different settings and outcome types, which could inform the design of studies with similar features. Finally, when there is neither accurate knowledge of the ICC parameters nor existing data to inform such parameters, we recommend varying the ICC parameters over a range that is considered plausible depending on the context and primary outcome.
For instance, in our data example, we conducted sensitivity analyses on the sample sizes under various outcome and covariate ICCs. We did not further examine the sensitivity to the ICC of the missingness indicator, as this parameter is, in most cases, found to have minimal impact on study power. When there are no existing, routinely collected data to inform the study design parameters, it may be challenging to accurately simulate an MAR mechanism for sample size calculation at the design stage. However, when MAR (based on the effect modifier of interest) is suspected at the design stage, a reasonable approach is to conduct sensitivity analyses varying the association parameter between the effect modifier and the missingness (\({\alpha }_{1}\) in the logistic missingness model in the simulation design under MAR) while controlling the overall missingness rate. This could generate a range of sample size estimates under different MAR mechanisms and can suggest a conservative sample size estimate over the range of missingness parameters considered. Nevertheless, in future work, it would be interesting to develop a modified sample size procedure that is robust to misspecification of the missingness model under the MAR setting.
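Such a sensitivity analysis can be set up by tuning the intercept \({\alpha }_{0}\) for each candidate slope \({\alpha }_{1}\) so that the marginal follow-up rate stays fixed; a sketch under illustrative assumptions (numpy assumed; the covariate is taken as standard normal and \({\sigma }_{b}^{2}=0.17\) is an arbitrary value, not one from the paper):

```python
import numpy as np

rng = np.random.default_rng(7)

def marginal_followup(alpha0, alpha1, sigma_b2, n=200_000):
    """Marginal follow-up rate E[P(O_ij = 1)] under the mixed-effects
    logistic model, approximated by simulation over b and X."""
    b = rng.normal(0.0, np.sqrt(sigma_b2), n)
    x = rng.normal(size=n)
    return float(np.mean(1 / (1 + np.exp(-(alpha0 + alpha1 * x + b)))))

def tune_alpha0(target_pi, alpha1, sigma_b2, lo=-5.0, hi=5.0, tol=1e-3):
    """Bisect on alpha0 (the rate is increasing in alpha0) until the
    marginal follow-up rate matches target_pi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if marginal_followup(mid, alpha1, sigma_b2) < target_pi:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Vary the MAR association alpha1 while holding the overall rate near 0.9:
for a1 in (0.25, 0.5, 1.0):
    a0 = tune_alpha0(0.9, a1, sigma_b2=0.17)
    print(round(marginal_followup(a0, a1, 0.17), 2))  # approximately 0.9 each time
```

Each tuned \(({\alpha }_{0},{\alpha }_{1})\) pair would then feed into the Monte Carlo sample size procedure, yielding a range of estimates across MAR mechanisms.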
Our Monte Carlo method for evaluating the sample size under MAR is flexible in the specification of the missingness model. Notably, correct sample size estimation under MAR relies on correct specification of the missingness model, and the use of our method relies on the ignorability assumption. In this paper, we assumed a mixed-effects logistic model, and the regression parameters can be prespecified based on prior knowledge or estimated using existing data. In terms of computation, our Monte Carlo procedure does not require repeatedly fitting mixed-effects logistic regression models and is therefore more efficient in computing time than other Monte Carlo-based sample size methods developed for CRTs [49]. From our simulation results, we observe that, except for a few scenarios where the outcome ICC is relatively small, the deviation between the predicted and empirical power is always within the expected Monte Carlo error margin (that is, \(\pm 0.014\) based on a target power of 80% under 3000 simulations). For this reason, we still recommend our methods as a computationally efficient approach to explore the sample size under MAR. However, if there are only a limited number of design scenarios and parameters to explore and computation allows, an alternative approach is to estimate the sample size based on full simulation (by repeatedly fitting the linear mixed analysis of covariance model to simulated data), which is expected to provide more accurate results.
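The variance-based step of this procedure can be sketched generically: plug a Monte Carlo estimate of \(Var\left({\widehat{\beta }}_{4}\right)\) into a two-sided z-test power formula and search for the smallest even number of clusters meeting the power target. The sketch below is our illustration, not the paper's Equation (2): it assumes, for simplicity, that the variance scales inversely with the number of clusters, and the variance value is a placeholder rather than the paper's Monte Carlo output.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def z_test_power(delta: float, var_beta4: float, alpha: float = 0.05) -> float:
    """Predicted power of a two-sided z-test of H0: beta4 = 0."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(delta) / sqrt(var_beta4) - z_crit)

def required_clusters(delta, var_ref, n_ref, power=0.8):
    """Smallest even number of clusters reaching the target power,
    assuming Var(beta4_hat) scales as 1/n around the Monte Carlo
    estimate var_ref obtained with n_ref clusters."""
    n = 2
    while z_test_power(delta, var_ref * n_ref / n) < power:
        n += 2
    return n

# Illustrative numbers only: suppose B = 1000 Monte Carlo draws with
# 30 clusters gave Var(beta4_hat) = 0.012.
print(required_clusters(delta=0.3, var_ref=0.012, n_ref=30))  # 32
```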
Finally, there are several limitations of our proposed methods that we will address in future work. First, our methods focus on a continuous outcome and may only provide an approximation when the outcome is binary or a count. Second, our paper studies a univariate effect modifier, but multivariate effect modifiers can also arise in certain situations. For instance, a trial may wish to investigate HTE with a continuous effect modifier through both a linear and a quadratic term. Third, our method does not address the missing not at random (MNAR) scenario, in which case sensitivity analysis strategies warrant further development [50].
Conclusion
Despite a growing interest in studying HTE in CRTs, no previous studies have formally investigated how attrition can affect the sample size estimation in a CRT when the objective is to assess treatment effect heterogeneity. We discussed sample size procedures for assessing HTE in CRTs with outcome attrition under MCAR and MAR mechanisms. Our simulation studies show satisfactory performance of our proposed sample size methods under both missingness mechanisms. The outcome ICC, covariate ICC and attrition rate are important input parameters for sample size determination at the design stage, but the ICC among the missingness indicators often has limited influence and can be considered as a nuisance parameter.
Availability of data and materials
Data for the application can be found at https://www.icpsr.umich.edu/web/DSDR/studies/36158.
Abbreviations
CF: Correction factor
CRT: Cluster randomized trials
CV: Coefficient of variation
CWH: Control over working hours
HTE: Heterogeneous treatment effect
ICC: Intracluster correlation coefficient
MAR: Missing at random
MCAR: Missing completely at random
MNAR: Missing not at random
WFHS: Work, Family, and Health Study
References
Turner EL, Li F, Gallis JA, et al. Review of recent methodological developments in group-randomized trials: part 1, design. Am J Public Health. 2017;107(6):907–15.
Murray DM. Design and analysis of group-randomized trials. Monographs in Epidemiology and Biostatistics. New York: Oxford University Press; 1998.
Roy A, Bhaumik DK, Aryal S, et al. Sample size determination for hierarchical longitudinal designs with differential attrition rates. Biometrics. 2007;63(3):699–707.
Demidenko E. Sample size and optimal design for logistic regression with binary interaction. Stat Med. 2008;27(1):36–46.
Rutterford C, Copas A, Eldridge S. Methods for sample size determination in cluster randomized trials. Int J Epidemiol. 2015;44(3):1051–67.
Cintron DW, Adler NE, Gottlieb LM, et al. Heterogeneous treatment effects in social policy studies: an assessment of contemporary articles in the health and social sciences. Ann Epidemiol. 2022;70:79–88.
Welch VA, Norheim OF, Jull J, et al. CONSORT-Equity 2017 extension and elaboration for better reporting of health equity in randomised trials. BMJ. 2017;359:j5085.
Hemming K, Taljaard M, Forbes A. Modeling clustering and treatment effect heterogeneity in parallel and stepped-wedge cluster randomized trials. Stat Med. 2018;37(6):883–98.
Tong G, Esserman D, Li F. Accounting for unequal cluster sizes in designing cluster randomized trials to detect treatment effect heterogeneity. Stat Med. 2022;41(8):1376–96.
Li F, Chen X, Tian Z, et al. Designing three-level cluster randomized trials to assess treatment effect heterogeneity. Biostatistics. 2022. https://doi.org/10.1093/biostatistics/kxac026.
Sun X, Briel M, Busse JW et al. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ. 2012;344.
Starks MA, Sanders GD, Coeytaux RR, et al. Assessing heterogeneity of treatment effect analyses in health-related cluster randomized trials: a systematic review. PLoS ONE. 2019;14(8):e0219894.
Collaboratory NI. Best Practices for Integrating Health Equity into Embedded Pragmatic Clinical Trials for Dementia Care 2022:4.
Yang S, Li F, Starks MA, et al. Sample size requirements for detecting treatment effect heterogeneity in cluster randomized trials. Stat Med. 2020;39(28):4218–37.
Raudenbush SW. Statistical analysis and optimal design for cluster randomized trials. Psychol Methods. 1997;2:173–85.
Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–92.
Lefante JJ. The power to detect differences in average rates of change in longitudinal studies. Stat Med. 1990;9(4):437–46.
Lu K, Mehrotra DV, Liu G. Sample size determination for constrained longitudinal data analysis. Stat Med. 2009;28(4):679–99.
Fiero MH, Huang S, Oren E, et al. Statistical analysis and handling of missing data in cluster randomized trials: a systematic review. Trials. 2016;17:72.
Zhang S, Ahn C. Sample size calculation for time-averaged differences in the presence of missing data. Contemp Clin Trials. 2012;33(3):550–6.
Taljaard M, Donner A, Klar N. Accounting for expected attrition in the planning of community intervention trials. Stat Med. 2007;26(13):2615–28.
Xu X, Zhu H, Ahn C. Sample size considerations for matched-pair cluster randomization design with incomplete observations of continuous outcomes. Contemp Clin Trials. 2021;104:106336.
Xu X, Zhu H, Hoang AQ, et al. Sample size considerations for matched-pair cluster randomization design with incomplete observations of binary outcomes. Stat Med. 2021;40(24):5397–416.
Zhu H, Xu X, Ahn C. Sample size considerations for paired experimental design with incomplete observations of continuous outcomes. Stat Methods Med Res. 2019;28(2):589–98.
Zhang S, Cao J, Ahn C. Sample size calculation for before-after experiments with partially overlapping cohorts. Contemp Clin Trials. 2018;64:274–80.
Wang C, Hall CB, Kim M. A comparison of power analysis methods for evaluating effects of a predictor on slopes in longitudinal designs with missing data. Stat Methods Med Res. 2015;24(6):1009–29.
Work F, Network H. Work, Family, and Health Study (WFHS). Interuniversity Consortium for Political and Social Research [distributor]; 2018.
van Breukelen GJ, Candel MJ, Berger MP. Relative efficiency of unequal versus equal cluster sizes in cluster randomized and multicentre trials. Stat Med. 2007;26(13):2589–603.
Candel MJ, Van Breukelen GJ. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression. Stat Med. 2010;29(14):1488–501.
Li F, Tong G. Sample size estimation for modified Poisson analysis of cluster randomized trials with a binary outcome. Stat Methods Med Res. 2021;30(5):1288–305.
Eldridge SM, Ashby D, Kerry S. Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int J Epidemiol. 2006;35(5):1292–300.
Neuhaus JM. Statistical methods for longitudinal and clustered designs with binary responses. Stat Methods Med Res. 1992;1(3):249–73.
Liu W, Ye S, Barton BA, et al. Simulation-based power and sample size calculation for designing interrupted time series analyses of count outcomes in evaluation of health policy interventions. Contemp Clin Trials Commun. 2020;17:100474.
Snell KIE, Archer L, Ensor J, et al. External validation of clinical prediction models: simulation-based sample size calculations were more reliable than rules-of-thumb. J Clin Epidemiol. 2021;135:79–89.
Shi Y, Lee JH. Sample size calculations for group randomized trials with unequal group sizes through Monte Carlo simulations. Stat Methods Med Res. 2018;27(9):2569–80.
Eldridge SM, Ukoumunne OC, Carlin JB. The intra-cluster correlation coefficient in cluster randomized trials: a review of definitions. Int Stat Rev. 2009;77(3):378–94.
Li P, Redden DT. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Stat Med. 2015;34(2):281–96.
Qaqish BF. A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika. 2003;90(2):455–63.
Bailey BE, Andridge R, Shoben AB. Multiple imputation by predictive mean matching in cluster-randomized trials. BMC Med Res Methodol. 2020;20(1):72.
Maas CJ, Hox JJ. Sufficient sample sizes for multilevel modeling. Methodology. 2005;1(3):86–92.
Ukoumunne OC. A comparison of confidence interval methods for the intraclass correlation coefficient in cluster randomized trials. Stat Med. 2002;21(24):3757–74.
Wu S, Crespi CM, Wong WK. Comparison of methods for estimating the intraclass correlation coefficient for binary responses in cancer prevention cluster randomized trials. Contemp Clin Trials. 2012;33(5):869–80.
Preisser JS, Lu B, Qaqish BF. Finite sample adjustments in estimating equations and covariance estimators for intracluster correlations. Stat Med. 2008;27(27):5764–85.
Li F, Yu H, Rathouz PJ, et al. Marginal modeling of cluster-period means and intraclass correlations in stepped wedge designs with binary outcomes. Biostatistics. 2022;23(3):772–88.
Ridout MS, Demétrio CGB, Firth D. Estimating intraclass correlation for binary data. Biometrics. 1999;55(1):137–48.
Murray DM, Blitstein JL. Methods to reduce the impact of intraclass correlation in group-randomized trials. Eval Rev. 2003;27(1):79–103.
Campbell MK, Fayers PM, Grimshaw JM. Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research. Clin Trials. 2005;2(2):99–107.
Korevaar E, Kasza J, Taljaard M, et al. Intracluster correlations from the CLustered OUtcome dataset bank to inform the design of longitudinal cluster trials. Clin Trials. 2021;18(5):529–40.
Landau S, Stahl D. Sample size and power calculations for medical studies by simulation when closed form expressions are not available. Stat Methods Med Res. 2013;22(3):324–45.
Tong G, Li F, Allen AS. Missing Data. In Piantadosi S, Meinert CL, editors. Principles and Practice of Clinical Trials. Cham: Springer International Publishing; 2019. p. 1–21.
Acknowledgements
None.
Funding
Research in this article was supported by Patient-Centered Outcomes Research Institute (PCORI®) Awards ME2020C321072 (PI: Li) and ME2020C119220 (PI: Harhay). The statements presented in this article are solely the responsibility of the authors and do not represent the views of PCORI®, its Board of Governors or Methodology Committee.
Author information
Contributions
JT, FL and GT developed the methods. JT performed the simulation and prepared the data example. JT and GT prepared the initial manuscript draft. FL and MH revised the draft. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
None.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Tong, J., Li, F., Harhay, M.O. et al. Accounting for expected attrition in the planning of cluster randomized trials for assessing treatment effect heterogeneity. BMC Med Res Methodol 23, 85 (2023). https://doi.org/10.1186/s12874023018878
Keywords
 Heterogeneity of treatment effect
 Missing data
 Missing at random
 Missing completely at random
 Power calculation
 Intracluster correlation coefficient
 Cluster randomized trial