A priori postulated and real power in cluster randomized trials: mind the gap
© Guittet et al. 2005
Received: 08 March 2005
Accepted: 18 August 2005
Published: 18 August 2005
Cluster randomization design is increasingly used for the evaluation of health-care, screening or educational interventions. The intraclass correlation coefficient (ICC) defines the clustering effect and must be specified during planning. The aim of this work is to study the influence of the ICC on power in cluster randomized trials.
Power contour graphs were drawn to illustrate the loss in power induced by an underestimation of the ICC when planning trials. We also derived the maximum achievable power given a specified ICC.
The magnitude of the ICC can have a major impact on power, and with low numbers of clusters, 80% power may not be achievable.
Underestimating the ICC when planning a cluster randomized trial can lead to a seriously underpowered trial. Publication of a priori postulated and a posteriori estimated ICCs is necessary for a more objective reading: negative trial results may be the consequence of a loss of power due to a mis-specification of the ICC.
A cluster randomized trial involves randomizing social units or clusters of individuals, rather than the individuals themselves. This design, which is increasingly used for evaluating health-care, screening and educational interventions [1–3], presents specific constraints that must be considered during planning and analysis [4, 5].
The responses of individuals within a cluster tend to be more similar than those of individuals in different clusters. This correlation increases the sample size required in a cluster randomized trial compared with an individually randomized trial, although this clustering effect is rarely taken into account. Thus, in a recent review of cluster randomized trials in primary care, Eldridge et al. reported that only 20% of studies accounted for clustering in the sample size calculation. Similar results were found in other reviews, as listed by Bland. The increase in sample size is measured through an inflation factor, which is a function of both the cluster size and the intraclass correlation coefficient (ICC), which appraises the correlation between individuals within the same cluster [1–3, 8]. Therefore, an a priori value for this correlation must be postulated during planning. However, estimates of this correlation are rarely available and, if available, are often uncertain. Indeed, the correlation may differ according to outcome, setting, intervention, covariate adjustment and also sampling [5, 9, 10]. Therefore, a discrepancy between a priori postulated and a posteriori estimated ICCs may occur.
The discrepancy between a priori postulated and a posteriori estimated ICCs may be reduced by intermediate estimation of the ICC, thus allowing a re-estimation of the required sample size. However, the room to manœuvre to increase the sample size may be restricted. Indeed, including new clusters may be difficult, either because the number of clusters is limited [10, 12–15] (which may occur when the randomization unit is defined by a geographic area or hospital, for example) or because clusters are frequently randomized all at once and not one at a time. The cluster size itself may also be limited (e.g., by the size of a family or because the number of patients followed up in a clinical practice cannot be increased), which then rules out increasing the sample size by enlarging the clusters.
The purpose of our study was to assess the influence of the ICC on power and the extent to which a discrepancy between a priori postulated and a posteriori estimated ICCs may induce a loss in power in cluster randomized trials.
We considered a completely cluster randomized design with a continuous outcome (normally distributed) measured at a single time point. We assumed an equal number of clusters randomized to each arm and a fixed common cluster size. The sample size per arm is calculated as follows:

\[
m\,g = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,[1 + (m - 1)\rho]}{ES^{2}}
\]
where m is the cluster size, g is the number of clusters per arm, ρ is the ICC, ES is the effect size (defined as the ratio between the absolute difference between the two intervention-specific means, |Δ|, and the standard deviation, σ) and z_{1-α/2} and z_{1-β} are the critical values of the standard normal distribution corresponding to error rates α (two-sided) and β, respectively. One recognizes the sample size calculation for an individually randomized trial inflated by a factor equal to [1 + (m - 1)ρ], defined as the variance inflation factor. When the cluster size varies, m refers to the average cluster size.
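As an illustration, the calculation can be sketched in Python. This is a minimal sketch, assuming the sample size formula has the standard form m·g = 2(z_{1-α/2} + z_{1-β})²[1 + (m - 1)ρ]/ES²; the function name `clusters_per_arm` and its default values (α = 5%, power = 80%) are illustrative and not taken from the original analysis.

```python
from math import ceil
from statistics import NormalDist


def clusters_per_arm(m, rho, es, alpha=0.05, power=0.80):
    """Number of clusters per arm g for a common cluster size m.

    Assumes m*g = 2*(z_{1-alpha/2} + z_{1-beta})^2 * [1 + (m-1)*rho] / ES^2.
    """
    z = NormalDist()  # standard normal distribution
    z_sum_sq = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    vif = 1 + (m - 1) * rho  # variance inflation factor
    subjects_per_arm = 2 * z_sum_sq * vif / es ** 2
    # Round up to a whole number of clusters.
    return ceil(subjects_per_arm / m)


if __name__ == "__main__":
    for rho in (0.005, 0.02, 0.05, 0.10):
        g = clusters_per_arm(m=100, rho=rho, es=0.25)
        print(f"ICC = {rho}: {g} clusters per arm")
```

Because g is rounded up to a whole number of clusters, the resulting design slightly exceeds the target power.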
To quantify the influence of the ICC on the power, we drew two kinds of power contour graphs. First, considering an effect size and an a priori postulated ICC, we considered several combinations of numbers of clusters and cluster sizes that allow for achieving 80% power. Then considering these combinations, we plotted the real power as a function of the ICC, which may differ from the a priori postulated value. Two values of a priori postulated ICC (0.005, 0.02) and five numbers of clusters per intervention arm (3, 5, 10, 20 and 40) were considered for these graphs. The effect size was fixed at 0.25.
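The gap between planned and real power can be sketched as follows. This is an illustrative sketch only: it assumes power is obtained by solving the sample size formula for z_{1-β}, that is, power = Φ(√(m·g·ES²/(2[1 + (m - 1)ρ])) - z_{1-α/2}), and it ignores the negligible probability of rejecting in the wrong direction. The function name `real_power` and the design (10 clusters of 50 subjects per arm, which gives about 80% power at a postulated ICC of 0.02) are assumptions chosen for the example.

```python
from statistics import NormalDist


def real_power(m, g, rho, es, alpha=0.05):
    """Power of a design with g clusters of size m per arm when the ICC is rho."""
    z = NormalDist()
    f = m * g * es ** 2 / (2 * (1 + (m - 1) * rho))
    return z.cdf(f ** 0.5 - z.inv_cdf(1 - alpha / 2))


# Hypothetical design planned with a postulated ICC of 0.02 and ES = 0.25.
m, g, es = 50, 10, 0.25
power_by_icc = {rho: real_power(m, g, rho, es) for rho in (0.02, 0.03, 0.05, 0.10)}

for rho, p in power_by_icc.items():
    print(f"true ICC = {rho:.2f}: power = {p:.2f}")
```

Under these assumptions, the design reaches the planned 80% only if the true ICC equals the postulated value; power falls below 60% if the true ICC is 0.05.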
We also drew power contour graphs, showing combinations of cluster sizes and number of clusters leading to a pre-specified power, with type I error fixed at 5%. Four power levels were considered (90, 80, 60 and 40%), 3 effect sizes (0.25, 0.50 and 0.75) and 4 levels of ρ (0.005, 0.020, 0.050 and 0.100). These ICC values were chosen according to previously published estimates [3, 6, 12, 16–23].
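A single point on such a contour can be computed by solving the sample size formula for the cluster size m. The sketch below assumes the formula m·g = 2(z_{1-α/2} + z_{1-β})²[1 + (m - 1)ρ]/ES², which rearranges to m = 2Z(1 - ρ)/(g·ES² - 2Zρ) with Z = (z_{1-α/2} + z_{1-β})². The function name `cluster_size_for_power` is hypothetical, and returning `None` when no finite cluster size exists is a design choice of this example.

```python
from math import ceil
from statistics import NormalDist


def cluster_size_for_power(g, rho, es, power, alpha=0.05):
    """Smallest cluster size m reaching `power` with g clusters per arm.

    Returns None when the target power cannot be reached with g clusters,
    whatever the cluster size.
    """
    z = NormalDist()
    two_z = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    denom = g * es ** 2 - two_z * rho
    if denom <= 0:
        return None  # contour has no point for this number of clusters
    return ceil(two_z * (1 - rho) / denom)


if __name__ == "__main__":
    for g in (3, 5, 10, 20, 40):
        m = cluster_size_for_power(g, rho=0.02, es=0.25, power=0.80)
        print(f"g = {g}: m = {m}")
```

Evaluating the function over a grid of g values for each power level, effect size and ICC gives the coordinates of one contour line.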
We determined the maximal achievable power given a limited number of randomized clusters (i.e., considering an infinite cluster size) or a limited cluster size (i.e., considering an infinite number of clusters). For a limited number of clusters, results were graphically illustrated by considering 5 numbers of clusters per intervention arm (3, 5, 10, 20 and 40) for 2 effect sizes (0.25, 0.5).
Second, when the mean cluster size is limited but the number of clusters is not, an increase in the ICC may also be of great consequence. As an example, considering a mean cluster size of 100, we would need to randomize 8 clusters per arm to detect a 0.25 effect size with 80% power when the ICC is fixed at 0.02. This number of clusters rises to 15 and 28 when the ICC is fixed at 0.05 and 0.10, respectively, that is, 700 and 2000 additional subjects per arm.
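The arithmetic behind these numbers can be checked by hand. The derivation below assumes a two-sided α of 5% and the sample size formula m·g = 2(z_{1-α/2} + z_{1-β})²[1 + (m - 1)ρ]/ES²; the symbol n_ind, the per-arm sample size of an individually randomized trial, is introduced here for convenience.

```latex
\begin{align*}
n_{\mathrm{ind}} &= \frac{2\,(z_{0.975} + z_{0.80})^{2}}{ES^{2}}
  = \frac{2\,(1.960 + 0.842)^{2}}{0.25^{2}} \approx 251.2 \\[4pt]
\rho = 0.02:\quad g &= \frac{251.2 \times (1 + 99 \times 0.02)}{100}
  = \frac{251.2 \times 2.98}{100} \approx 7.5 \;\Rightarrow\; 8 \text{ clusters} \\
\rho = 0.05:\quad g &= \frac{251.2 \times 5.95}{100} \approx 14.9 \;\Rightarrow\; 15 \text{ clusters} \\
\rho = 0.10:\quad g &= \frac{251.2 \times 10.9}{100} \approx 27.4 \;\Rightarrow\; 28 \text{ clusters}
\end{align*}
```

With 100 subjects per cluster, moving from 8 to 15 or 28 clusters per arm adds (15 - 8) × 100 = 700 or (28 - 8) × 100 = 2000 subjects per arm.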
In a cluster randomized trial aimed at detecting an effect size ES at a pre-specified α level with an a priori postulated ICC equal to ρ, changing the cluster size m and/or the number of clusters g per group changes β and therefore power. Power is thus related to the function f(m, g) = (z_{1-α/2} + z_{1-β})², defined as

\[
f(m, g) = \frac{m\,g\,ES^{2}}{2\,[1 + (m - 1)\rho]}
\]

When m, the mean cluster size, tends to infinity, f(m, g) tends to an asymptotic value, but f(m, g) has no finite limit when g, the number of clusters, tends to infinity:

\[
\lim_{m \to \infty} f(m, g) = \frac{g\,ES^{2}}{2\rho},
\qquad
\lim_{g \to \infty} f(m, g) = +\infty
\]

Therefore, although power is not theoretically limited when the number of clusters can be increased, there is a maximal reachable power when this number is fixed and only the cluster size can be increased. This maximum theoretical power is defined as:

\[
\mathrm{Power}_{\max} = \Phi\!\left(\sqrt{\frac{g\,ES^{2}}{2\rho}} - z_{1-\alpha/2}\right)
\]

where Φ is the cumulative distribution function of the standard normal distribution.
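The ceiling on power can be illustrated numerically. The sketch below assumes that, as the cluster size grows without bound, (z_{1-α/2} + z_{1-β})² approaches g·ES²/(2ρ), so that the maximal power is Φ(√(g·ES²/(2ρ)) - z_{1-α/2}). The function names `max_power` and `min_clusters_for_power` are hypothetical, and the second function simply inverts that limit.

```python
from math import floor, sqrt
from statistics import NormalDist


def max_power(g, rho, es, alpha=0.05):
    """Upper bound on power with g clusters per arm and unlimited cluster size."""
    z = NormalDist()
    return z.cdf(sqrt(g * es ** 2 / (2 * rho)) - z.inv_cdf(1 - alpha / 2))


def min_clusters_for_power(rho, es, power=0.80, alpha=0.05):
    """Smallest g per arm for which the target power is reachable at all."""
    z = NormalDist()
    z_sum_sq = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    # The bound must strictly exceed the target, hence floor(...) + 1.
    return floor(2 * rho * z_sum_sq / es ** 2) + 1


if __name__ == "__main__":
    for g in (3, 5, 10, 20, 40):
        print(f"g = {g}: maximal power = {max_power(g, rho=0.02, es=0.25):.3f}")
    print("minimum clusters per arm:", min_clusters_for_power(rho=0.02, es=0.25))
```

Under these assumptions, with an ICC of 0.02 and an effect size of 0.25, 5 clusters per arm cannot reach 80% power however large the clusters are; at least 6 per arm are needed.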
The ICC is a nuisance parameter that has to be specified a priori when planning a cluster randomized trial. The magnitude of this coefficient has a major impact on power, particularly with a small number of randomized clusters. Our results were derived considering a continuous outcome, but in their simulation study, Donner and Klar showed that power never differs by more than one percentage point between continuous and binary outcomes. Moreover, we did not take into account any potential variability in cluster size, which is already known to reduce power. When planning cluster randomized trials, variability in cluster size is rarely taken into account, and the cluster size m is generally replaced by the mean cluster size. With variable cluster sizes, an underestimation of the ICC may therefore be expected to have consequences similar to those seen when cluster size is constant. In the end, an underestimation of the ICC during planning could lead to a severely underpowered study and thus questionable results.
In cluster randomized trials, it is known that for a fixed total number of subjects, the higher the number of clusters (and thus the smaller the average cluster size), the higher the power [2, 4, 5, 14, 24, 26, 27]. In the extreme case of clusters of size one, individuals are randomized, with no loss of power because of correlation between subjects. Moreover, it has also been shown that increasing cluster size improves power only up to a certain threshold, which depends on the value of the ICC [24, 27]. Therefore, when planning a cluster randomized trial, the optimal strategy is to randomize a large number of clusters [1, 2, 12, 29]. Such a strategy first allows for decreasing the total sample size for a pre-specified power and second, as our results show, protects against a loss of power induced by an underestimation of the ICC when planning. However, because of logistic constraints, the number of randomized clusters may be limited, and indeed, the review by Eldridge et al. noted that half of the cluster randomized trials analyzed had fewer than 29 clusters in each arm. Therefore, for most cluster randomized trials, the a priori postulated value of the ICC has a great impact on power.
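The trade-off between number of clusters and cluster size can be illustrated with a fixed total of 1000 subjects per arm. This is a sketch under stated assumptions: power is taken as Φ(√(m·g·ES²/(2[1 + (m - 1)ρ])) - z_{1-α/2}), and the ICC (0.02), effect size (0.25) and the list of designs are values chosen for the example, not results from the study.

```python
from statistics import NormalDist


def power(m, g, rho, es, alpha=0.05):
    """Power with g clusters of size m per arm, ICC rho and effect size es."""
    z = NormalDist()
    f = m * g * es ** 2 / (2 * (1 + (m - 1) * rho))
    return z.cdf(f ** 0.5 - z.inv_cdf(1 - alpha / 2))


# (clusters per arm, cluster size): every design has 1000 subjects per arm.
# The last design, with clusters of size one, is individual randomization.
designs = [(5, 200), (10, 100), (20, 50), (40, 25), (1000, 1)]
powers = [power(m, g, rho=0.02, es=0.25) for g, m in designs]

for (g, m), p in zip(designs, powers):
    print(f"{g:>4} clusters of {m:>3}: power = {p:.3f}")
```

Under these assumptions, power rises steadily as the same 1000 subjects are spread over more clusters, and only the design with 5 clusters per arm falls short of 80%.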
When planning trials, the a priori postulated ICC will rarely be very reliable. During the study, an intermediate estimate of the ICC can be obtained, thus allowing a sample size adjustment. But this intermediate estimate is not without error, as was shown in the study by Moore et al., in which the intermediate ICC was 0.012 and the final one 0.031. A sensitivity analysis must therefore be undertaken when planning, to account for uncertainty in the ICC. In extreme situations, when very few clusters can be randomized, such a sensitivity analysis may illustrate the high risk of performing an underpowered study and thus provide arguments for not performing the study.
When reporting the study results, investigators should publish both the ICC used during planning and the a posteriori estimated one, as recommended initially by some authors and recently by the extension of the CONSORT statement for cluster randomized trials [27, 29–31]. However, such information is rarely available. We studied cluster randomized trials published between January 2003 and December 2004 in the British Medical Journal, "which contains more such reports than any other journal" and which published the extension of the CONSORT statement. Of 16 published studies, 5 (31.2%) did not report an a priori postulated ICC and 2 reported no sample size calculation. Only 5 (31.2%) reports provided a posteriori estimated ICCs (without any confidence intervals). Such under-reporting prevents assessment of the discrepancy between the a priori postulated ICC and the a posteriori estimated one. However, reporting both ICCs would help readers "assess the appropriateness of the original sample size calculations as well as the magnitude of the clustering for each outcome" and help investigators design future trials [1, 27, 31]. It would also help readers understand trial results, particularly negative ones: a study may prove to be negative simply because of a loss of power induced by an a priori underestimation of the ICC. As for format, the publication of the a posteriori estimated ICC should follow the recommendation by Campbell et al., who advocate providing a description of the data set, information on the method used to estimate the ICC, and the precision of the estimate.
In conclusion, our study supports modifications in investigators' practices when planning trials and reporting results: taking into account the uncertainty of the ICC by favoring a high number of clusters, and publishing this parameter. For readers, an objective reading of trial results, particularly negative results, requires knowledge of both the a priori postulated and the a posteriori estimated ICCs.
The authors are indebted to Dr. Sandra Eldridge for constructive comments.
This work was funded by a grant from the Foundation for Medical Research (FRM).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.