Design effect in multicenter studies: gain or loss of power?
- Emilie Vierron^{1, 2, 3, 4} and
- Bruno Giraudeau^{1, 2, 3, 4}
https://doi.org/10.1186/1471-2288-9-39
© Vierron and Giraudeau; licensee BioMed Central Ltd. 2009
Received: 22 July 2008
Accepted: 18 June 2009
Published: 18 June 2009
Abstract
Background
In a multicenter trial, responses for subjects belonging to a common center are correlated. Such a clustering is usually assessed through the design effect, defined as a ratio of two variances. The aim of this work was to describe and understand situations where the design effect involves a gain or a loss of power.
Methods
We developed a design effect formula for a multicenter study aimed at testing the effect of a binary factor (which thus defines two groups) on a continuous outcome, and explored this design effect for several designs (from individually stratified randomized trials to cluster randomized trials, and for other designs such as matched pair designs or observational multicenter studies).
Results
The design effect depends on the intraclass correlation coefficient (ICC), which assesses the correlation between data for two subjects from the same center, but also on a statistic S, which quantifies the heterogeneity of the group distributions among centers (and thus the level of association between the binary factor and the center), and on the degree of global imbalance between the two groups (i.e., unequal total numbers of subjects in the two groups). This design effect may induce either a loss or a gain in power, depending on whether the S statistic is higher or lower than 1, respectively.
Conclusion
We provide a general design effect formula that applies to any multicenter study and identifies the factors (the ICC and the distribution of the group proportions among centers) that are associated with a gain or a loss of power in such studies.
Background
Multicenter studies involve correlation in data because subjects from the same center are more similar than are those from different centers [1]. Such a correlation potentially affects the power of standard statistical tests, and conclusions made under the assumption that data are independent can be invalidated.
A usual measure of the clustering effect on an estimator (often a treatment or a group effect) is the design effect (Deff). The Deff is defined as the ratio of two variances: the variance of the estimator when the center effect is taken into account over the variance of the estimator under the hypothesis of a simple random sample [2, 3]. The Deff represents the amount by which the sample size needs to be multiplied to account for the design of the study. Ignoring clustering can lead to over- (Deff < 1) or underpowered (Deff > 1) studies.
In cluster randomized trials, clustering produces a loss of power and Donner and Klar proposed a method to inflate the sample size to take data correlation into account [4]. On the contrary, in individually randomized trials with equal treatment arm sizes, a center effect induces a gain in power, and sample size can be reduced [5]. Thus, in some situations, correlation in data induces a loss of power, and in others, a gain in power. To our knowledge, complete explanations for this striking discrepancy are lacking.
We aimed to produce a measure of clustering in multicenter studies testing the effect of a binary factor on a continuous outcome. We first present the statistical model used and the associated design-effect formula. Then we explore the general form of this design effect under particular study designs. Finally, we give examples to illustrate our results.
Methods and results
Theoretical Issues
The mixed-effects model
Group effect variance
Two-way ANOVA
One-way ANOVA
The Design Effect
Simulation study
Validation of the approximate design effect formula.
ICC = 0.01

N subjects | 100 | 100 | 100 | 200 | 200 | 200 | 200 | 500 | 500 | 500 | 500 | 500 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
N centers | 5 | 10 | 20 | 5 | 10 | 20 | 50 | 5 | 10 | 20 | 50 | 100 |
S1 Deff | 0.9969 | 0.9938 | 0.9921 | 0.9966 | 0.9936 | 0.9922 | 0.9911 | 0.9965 | 0.9933 | 0.9919 | 0.9913 | 0.9908 |
S1 rdiff | 0.0065 | 0.0032 | 0.0016 | 0.0065 | 0.0032 | 0.0016 | 0.0006 | 0.0065 | 0.0032 | 0.0016 | 0.0006 | 0.0003 |
S2 Deff | 0.9972 | 0.9949 | 0.9928 | 0.9972 | 0.9956 | 0.9938 | 0.9917 | 0.9980 | 0.9989 | 0.9956 | 0.9931 | 0.9918 |
S2 rdiff | 0.0065 | 0.0032 | 0.0014 | 0.0065 | 0.0032 | 0.0016 | 0.0005 | 0.0065 | 0.0033 | 0.0016 | 0.0006 | 0.0003 |
S3 Deff | 1.0102 | 1.0306 | 1.0147 | 1.0217 | 1.0622 | 1.0431 | 1.0132 | 1.0575 | 1.1788 | 1.1143 | 1.0487 | 1.0204 |
S3 rdiff | 0.0066 | 0.0035 | 0.0016 | 0.0066 | 0.0036 | 0.0018 | 0.0006 | 0.0066 | 0.0036 | 0.0019 | 0.0007 | 0.0003 |
S4 Deff | 1.1038 | 1.0323 | 1.0285 | 1.2026 | 1.0538 | 1.0604 | 1.0184 | 1.4788 | 1.1290 | 1.1588 | 1.0559 | 1.0186 |
S4 rdiff | 0.0077 | 0.0051 | 0.0027 | 0.0077 | 0.0052 | 0.0030 | 0.0011 | 0.0077 | 0.0053 | 0.0030 | 0.0013 | 0.0006 |

ICC = 0.10

N subjects | 100 | 100 | 100 | 200 | 200 | 200 | 200 | 500 | 500 | 500 | 500 | 500 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
N centers | 5 | 10 | 20 | 5 | 10 | 20 | 50 | 5 | 10 | 20 | 50 | 100 |
S1 Deff | 0.9655 | 0.9356 | 0.9197 | 0.9642 | 0.9337 | 0.9209 | 0.9105 | 0.9631 | 0.9313 | 0.9177 | 0.9124 | 0.9076 |
S1 rdiff | 0.0643 | 0.0318 | 0.0155 | 0.0649 | 0.0320 | 0.0160 | 0.0061 | 0.0649 | 0.0324 | 0.0161 | 0.0063 | 0.0031 |
S2 Deff | 0.9709 | 0.9469 | 0.9269 | 0.9696 | 0.9547 | 0.9359 | 0.9171 | 0.9793 | 0.9827 | 0.9549 | 0.9300 | 0.9174 |
S2 rdiff | 0.0656 | 0.0318 | 0.0142 | 0.0648 | 0.0323 | 0.0157 | 0.0053 | 0.0651 | 0.0325 | 0.0161 | 0.0063 | 0.0028 |
S3 Deff | 1.1101 | 1.3018 | 1.1721 | 1.2095 | 1.6471 | 1.4256 | 1.1337 | 1.6662 | 2.7175 | 2.1685 | 1.4965 | 1.2049 |
S3 rdiff | 0.0654 | 0.0349 | 0.0166 | 0.0659 | 0.0354 | 0.0182 | 0.0063 | 0.0662 | 0.0358 | 0.0185 | 0.0074 | 0.0034 |
S4 Deff | 2.0718 | 1.3360 | 1.2725 | 3.1669 | 1.5750 | 1.6252 | 1.1934 | 6.2708 | 2.5759 | 2.5886 | 1.5687 | 1.2017 |
S4 rdiff | 0.0768 | 0.0507 | 0.0272 | 0.0770 | 0.0517 | 0.0299 | 0.0110 | 0.0771 | 0.0513 | 0.0299 | 0.0126 | 0.0059 |
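A validation of this kind can be sketched with a small Monte Carlo experiment. The sketch below is not the authors' simulation code: it assumes a random center effect with total variance 1, analyzes the data with a simple difference in group means, and assumes that S takes the form (n1·n2/N)·Σj(m1j/n1 − m2j/n2)². The group sizes `m1` and `m2`, the helper names, the seed and the number of replications are all hypothetical.

```python
import random
import statistics


def s_statistic(m1, m2):
    """Assumed form of S: (n1*n2/N) * sum_j (m1j/n1 - m2j/n2)**2."""
    n1, n2 = sum(m1), sum(m2)
    return n1 * n2 / (n1 + n2) * sum((a / n1 - b / n2) ** 2 for a, b in zip(m1, m2))


def simulate_deff(m1, m2, icc, n_rep=3000, seed=1):
    """Empirical Deff of the difference in group means under a random center effect."""
    rng = random.Random(seed)
    n1, n2 = sum(m1), sum(m2)
    # Total variance is fixed at 1, so the ICC is the center-level variance.
    sd_center, sd_resid = icc ** 0.5, (1 - icc) ** 0.5
    diffs = []
    for _ in range(n_rep):
        sum1 = sum2 = 0.0
        for a, b in zip(m1, m2):
            center = rng.gauss(0, sd_center)  # shared by all subjects of this center
            sum1 += sum(center + rng.gauss(0, sd_resid) for _ in range(a))
            sum2 += sum(center + rng.gauss(0, sd_resid) for _ in range(b))
        diffs.append(sum1 / n1 - sum2 / n2)
    var_srs = 1 / n1 + 1 / n2  # variance under simple random sampling (sigma^2 = 1)
    return statistics.variance(diffs) / var_srs


m1 = [8, 2, 7, 3, 9]  # hypothetical group 1 sizes in 5 centers
m2 = [2, 8, 3, 7, 1]  # hypothetical group 2 sizes in the same centers
icc = 0.10
theoretical = 1 + (s_statistic(m1, m2) - 1) * icc
empirical = simulate_deff(m1, m2, icc)
print(f"formula Deff = {theoretical:.3f}, simulated Deff = {empirical:.3f}")
```

With a few thousand replications the simulated value should fall within a few percent of the formula value; the remaining gap is Monte Carlo noise.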
Some specific designs
Stratified Multicenter Individually Randomized Trial
In a stratified multicenter individually randomized trial, the Deff is smaller than 1 and decreases as the ICC increases. This implies a gain in power and allows a reduction in sample size, as shown by Vierron et al. [5].
Matched Pair Design
where $\sigma^2$ is the variance of observations in a standard parallel group design.
where d is the difference in mean responses from the two groups.
Cluster Randomized Trial and Expertise-based Randomized Trial
where $\bar{m}$ is the mean cluster size. This value is the inflation factor [4], used for sample size calculation in cluster randomized trials.
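As a worked illustration of how such an inflation factor is used, the sketch below assumes the usual form 1 + (m̄ − 1)ρ, with m̄ the mean cluster size and ρ the ICC. The function names and the numerical inputs (128 subjects, clusters of 20, ICC of 0.05) are hypothetical and are not taken from the article.

```python
import math


def cluster_inflation_factor(mean_cluster_size, icc):
    """Inflation factor, assumed here to be 1 + (m_bar - 1) * rho."""
    return 1 + (mean_cluster_size - 1) * icc


def inflated_sample_size(n_individual, mean_cluster_size, icc):
    """Total sample size after inflating an individually randomized sample size."""
    return math.ceil(n_individual * cluster_inflation_factor(mean_cluster_size, icc))


# Hypothetical planning inputs.
n_individual = 128
factor = cluster_inflation_factor(mean_cluster_size=20, icc=0.05)
n_cluster_trial = inflated_sample_size(n_individual, mean_cluster_size=20, icc=0.05)
print(f"inflation factor = {factor:.2f}, inflated sample size = {n_cluster_trial}")
```

Even a small ICC roughly doubles the required sample size here, because the factor grows with the cluster size.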
Multicenter Observational Study
Thus, in an observational study with all centers having identical group distributions, even if the global group sizes are not equal (i.e., even if $n_1 \neq n_2$), taking into account the center effect leads to increased power, as with stratified individually randomized trials.
No design effect: Deff = 1.
where Φ(·) is the cumulative distribution function of N(0,1). As the design effect increases above 1, the power decreases and the sample size has to be inflated to reach the nominal power. Conversely, when the design effect is below 1, the power exceeds the nominal power, which allows the required sample size to be reduced.
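The link between the design effect and power can be sketched numerically. The function below is an assumption, not the article's own power formula: it uses the standard normal approximation for a two-sided comparison of two means and multiplies the variance of the group effect by the Deff. The name `power_with_deff` and all inputs are hypothetical.

```python
from statistics import NormalDist


def power_with_deff(delta, sigma, n1, n2, deff, alpha=0.05):
    """Approximate power of a two-sided z-test when the variance is scaled by deff."""
    z = NormalDist()
    # Standard error of the difference in means, inflated or deflated by the Deff.
    se = sigma * ((1 / n1 + 1 / n2) * deff) ** 0.5
    return z.cdf(abs(delta) / se - z.inv_cdf(1 - alpha / 2))


# Hypothetical study: 64 subjects per group, standardized difference of 0.5.
for deff in (0.9, 1.0, 1.5):
    power = power_with_deff(delta=0.5, sigma=1.0, n1=64, n2=64, deff=deff)
    print(f"Deff = {deff:.1f}: power = {power:.3f}")
```

Under these assumptions the power at Deff = 1 is close to 80%, rises when Deff falls below 1, and drops when Deff exceeds 1.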
Example
Design effects calculations for three different group distributions among centers.
Center | Quite homogeneous: $m_{1j}$ | Quite homogeneous: $m_{2j}$ | Quite homogeneous: %* | Heterogeneous: $m_{1j}$ | Heterogeneous: $m_{2j}$ | Heterogeneous: %* | Cluster design: $m_{1j}$ | Cluster design: $m_{2j}$ | Cluster design: %* |
---|---|---|---|---|---|---|---|---|---|
Center 1 (n = 57) | 16 | 41 | 28 | 11 | 46 | 19 | 0 | 57 | 0 |
Center 2 (n = 38) | 10 | 28 | 26 | 24 | 14 | 63 | 38 | 0 | 100 |
Center 3 (n = 44) | 11 | 33 | 25 | 7 | 37 | 16 | 0 | 44 | 0 |
Center 4 (n = 15) | 3 | 12 | 20 | 1 | 14 | 7 | 0 | 15 | 0 |
Center 5 (n = 41) | 9 | 32 | 22 | 8 | 33 | 20 | 0 | 41 | 0 |
Center 6 (n = 19) | 5 | 14 | 26 | 10 | 9 | 53 | 19 | 0 | 100 |
Center 7 (n = 37) | 8 | 29 | 22 | 9 | 28 | 24 | 0 | 37 | 0 |
Center 8 (n = 52) | 12 | 40 | 23 | 4 | 48 | 8 | 0 | 52 | 0 |
Center 9 (n = 12) | 3 | 9 | 25 | 1 | 11 | 8 | 0 | 12 | 0 |
Center 10 (n = 28) | 8 | 20 | 29 | 10 | 18 | 36 | 28 | 0 | 100 |
S | 0.14 | | | 5.79 | | | 33.77 | | |
Deff (ρ = 0.10) | 0.91 | | | 1.48 | | | 4.28 | | |
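The S and Deff rows of this table can be recomputed from the group sizes per center. The sketch below assumes that S takes the form (n1·n2/N)·Σj(m1j/n1 − m2j/n2)², where n1 and n2 are the global group sizes and N = n1 + n2. The definition of S is not shown in this text, so this form is an assumption; it was chosen because it agrees with the tabulated values after rounding. The function names are hypothetical.

```python
def s_statistic(m1, m2):
    """Assumed form of S: (n1*n2/N) * sum_j (m1j/n1 - m2j/n2)**2."""
    n1, n2 = sum(m1), sum(m2)
    return n1 * n2 / (n1 + n2) * sum((a / n1 - b / n2) ** 2 for a, b in zip(m1, m2))


def design_effect(s, icc):
    """Deff = 1 + (S - 1) * rho."""
    return 1 + (s - 1) * icc


# Group sizes per center (m1j, m2j), copied from the table.
designs = {
    "quite homogeneous": ([16, 10, 11, 3, 9, 5, 8, 12, 3, 8],
                          [41, 28, 33, 12, 32, 14, 29, 40, 9, 20]),
    "heterogeneous": ([11, 24, 7, 1, 8, 10, 9, 4, 1, 10],
                      [46, 14, 37, 14, 33, 9, 28, 48, 11, 18]),
    "cluster design": ([0, 38, 0, 0, 0, 19, 0, 0, 0, 28],
                       [57, 0, 44, 15, 41, 0, 37, 52, 12, 0]),
}

results = {}
for name, (m1, m2) in designs.items():
    s = s_statistic(m1, m2)
    results[name] = (s, design_effect(s, icc=0.10))
    print(f"{name}: S = {results[name][0]:.2f}, Deff = {results[name][1]:.2f}")
```

All three designs share the same global group sizes (85 and 258 subjects), so the differences in S come only from how the two groups are distributed among centers.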
To illustrate the impact of heterogeneity between the global group sizes on the design effect, we considered hypothetical situations, less likely to occur, where 10 centers recruit 20 subjects each, for balanced designs (i.e., $n_1 = n_2$, Table S4a in Additional file 1) and imbalanced designs (i.e., $n_1 \neq n_2$, Table S4b in Additional file 1), and for different levels of heterogeneity of group distributions among centers and two ICC values. As expected, the Deff increases with S and increases with the ICC. Moreover, if we focus on the "strongly heterogeneous" column, we observe a higher Deff with imbalance between the two groups (Table S4b in Additional file 1, Deff = 1.757 for ρ = 0.1) than with balance between the groups (Table S4a in Additional file 1, Deff = 1.620 for ρ = 0.1), which can be analytically explained (Appendix 2). Thus, the impact of heterogeneity of the group distributions among centers is greater with increased imbalance between the two group sizes. See Additional file 1 for results from this example.
Discussion and conclusion
In a multicenter study, the design effect measures the effect of clustering due to multisite recruitment of subjects. As shown in formula (18), the power of such a study is directly affected by the design effect value. Our work aimed to explain why some multicenter studies, such as individually randomized trials, gain power whereas others, such as cluster randomized trials, lose power.
We derived a simple formula assessing the clustering effect in a multicenter study aiming to estimate the effect of a binary factor on a continuous outcome, through an individual-level analysis with a mixed-effects model: Deff = 1 + (S − 1)ρ. The design effect depends on ρ, the correlation between observations from the same center. It also depends on S, a statistic that quantifies the degree of heterogeneity of group distributions among centers or, in other words, the level of association between the binary factor and the center. S increases with the heterogeneity of the group distributions among centers, which leads to an increased Deff and a loss of power, and falls below 1 when the group distributions are identical between centers, thus leading to a Deff below 1 and a gain in power. It is now known that balanced designs such as individually randomized trials gain power when the center effect is included in the analyses [5], and that cluster randomized trials should increase their sample size to reach the nominal power and account for the center effect in the analyses to protect against type I error inflation [4]. Our simple formula sheds light on the relation between these two situations and allows the design effect to be calculated for any multicenter design.
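Substituting a few values of S into Deff = 1 + (S − 1)ρ shows how the same formula covers both situations. The three cases below are illustrative only: S = 0 is taken as an assumed limiting value for identical group distributions among centers, and S = m̄ (the mean cluster size) is the value that would recover the inflation factor of cluster randomized trials [4].

```latex
\begin{align*}
S = 0:       &\quad \mathrm{Deff} = 1 + (0 - 1)\rho = 1 - \rho \;<\; 1 && \text{(gain in power)} \\
S = 1:       &\quad \mathrm{Deff} = 1 + (1 - 1)\rho = 1               && \text{(no design effect)} \\
S = \bar{m}: &\quad \mathrm{Deff} = 1 + (\bar{m} - 1)\rho \;>\; 1     && \text{(loss of power, for } \bar{m} > 1\text{)}
\end{align*}
```

For any ρ > 0, the sign of S − 1 alone determines whether clustering helps or hurts, and ρ only scales the size of the effect.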
In our developments we used a weighted method to assess the group effect: this method gives equal weight to each subject, whatever the size of his or her center. Other methods of analysis could be used. In the context of multicenter randomized trials, Lin and Senn discuss this point and show that a weighted analysis is more powerful than an unweighted one, particularly when sample sizes are imbalanced between centers [11, 12]. The weighted method is therefore often recommended for analyses of data from multicenter randomized trials, which justifies our choice of model (1) [13]. However, in cluster randomized trials, Kerry and Bland show that the minimum variance weights are the most efficient weights for estimating the design effect when cluster sizes are strongly imbalanced, but that weighting the clusters by their sizes gives similar, though overestimated, results, except when clusters are large [14]. Our formula aims to apply to any multicenter study, whatever its design, from individually randomized to cluster randomized trials. It therefore may not use the most powerful method of calculation for some particular multicenter designs, but it has the great advantage of being simple and general.
Apart from the mixed-effects model (1) we described, we did not develop the practical aspects of the analysis stage of a multicenter study. Several statistical software packages are available to analyze correlated data, such as data from multicenter designs. Zhou et al. and Murray et al. review many of these programs and detail, among other things, the appropriate procedures and the options available for specifying the data model [15, 16]. Moreover, some tutorials present step-by-step illustrations of the use of SAS and SPSS mixed model procedures [17, 18]. Lastly, Pinheiro and Bates provide an overview of the application of mixed-effects models in S and S-PLUS, which is easily transposable to the R software [19].
In the field of cluster randomized trials, several authors have worked on the planning of studies through the design effect and sample size calculations and have proposed extensions of classical formulas, for example to account for imbalance in cluster sizes [20, 21]. Our formula does not aim to substitute for these more specific and precise formulas but to connect several multicenter designs through a single design effect formula. This result helps in understanding the impact of correlation on the power of multicenter studies, whatever their design, and is particularly useful for observational studies, where the center effect is often not taken into account at the planning and/or analysis stages [22, 23]. However, when extended design effect formulas exist that deal with a particular problem, such as imbalanced cluster sizes in cluster randomized trials, we recommend using them.
This simple result could now be extended to designs including, for example, several nested or crossed levels of correlation. One can then consider cluster-cluster randomization, or cluster then individual randomization, and all observational designs including multiple levels of correlation between outcomes. Such designs could produce a mixture of gains and losses of power, according to the multiple correlation levels considered. For example, Diehr et al. studied the case of matched-pair cluster designs and Giraudeau et al. the case of cluster randomized cross-over designs [24, 25]. Many such situations could be explored to extend our result to more complex designs.
To conclude, clustering of data is a logical consequence of multicenter designs [26, 27]. Some designs allow for controlling some factors (e.g., balancing and homogenizing the treatment distribution in individually randomized trials), whereas others exclude this possibility. The latter situation occurs mainly in observational studies, for which there is no way to control the prevalence or distribution of any factor. Since multicenter studies range in design from homogeneous and balanced designs to "cluster" distribution designs, the design effect can induce a gain or a loss of power, as we described. The main advantages of the design effect formula we propose are its simplicity and its ability to apply to any multicenter study. Its potential weakness is the difficulty, for an investigator planning a multicenter study, of obtaining accurate estimates of S, the degree of heterogeneity of the group distributions between centers, and of the ICC. In the field of cluster randomized trials, considerable efforts have been made to improve the reporting of ICC estimates, and this practice should now be followed for any multicenter study [28, 29]. In the same way, recommendations should be made to encourage the reporting of the Deff, or of the S statistic, in any multicenter study publication. Together with an ICC estimate, this information could help researchers in planning new multicenter studies, particularly observational ones.
Appendix 1
Calculation of the group effect variance with a two-way ANOVA
Since the centers are independent, we have:
$\operatorname{corr}(Y_{ijk}, Y_{i'j'k'}) = 0$ for $j \neq j'$ and
Appendix 2
Rewriting the S statistic with the between-center group size variances
Hence, assuming centers are of equal sizes, for a given total sample size N, number of centers Q, and between-center group size variance $V_i$, the further the ratio of the two group sizes is from 1, the higher the statistic S. The Deff therefore increases with the degree of imbalance between the two group sizes. This result generalizes to designs with unequal center sizes, because the S statistic always depends on this ratio. However, quantitative prediction of the impact of the ratio on the Deff is not straightforward because the center size variance, $V_m$, and the covariance term between $V_m$ and $V_2$ are, in this case, not null.
Declarations
Acknowledgements
EV was supported by a doctoral fellowship from the Ministère de l'Enseignement Supérieur et de la Recherche, France.
The authors thank the two referees for their helpful and constructive comments.
Authors’ Affiliations
References
- Localio AR, Berlin JA, Ten Have TR, Kimmel SE: Adjustments for center in multicenter studies: an overview. Ann Intern Med. 2001, 135: 112-123.
- Kerry SM, Bland JM: The intracluster correlation coefficient in cluster randomisation. BMJ. 1998, 316: 1455.
- Kish L: Survey sampling. 1965, New York: John Wiley.
- Donner A, Klar N: Design and Analysis of Cluster Randomization Trials in Health Research. 2000, London: Arnold.
- Vierron E, Giraudeau B: Sample size calculation for multicenter randomized trial: taking the center effect into account. Contemp Clin Trials. 2007, 28: 451-458. 10.1016/j.cct.2006.11.003.
- Fleiss JL: The Design and Analysis of Clinical Experiments. 1986, New York: Wiley.
- Machin D, Campbell M, Fayers P, Pinol A: Sample size tables for clinical studies. 1997, London: Blackwell Science, 2nd edition.
- Lee KJ, Thompson SG: Clustering by health professional in individually randomised trials. BMJ. 2005, 330: 142-144. 10.1136/bmj.330.7483.142.
- Devereaux PJ, Bhandari M, Clarke M, Montori VM, Cook DJ, Yusuf S, Sackett DL, Cina CS, Walter SD, Haynes B, Schunemann HJ, Norman GR, Guyatt GH: Need for expertise based randomised controlled trials. BMJ. 2005, 330: 88. 10.1136/bmj.330.7482.88.
- Julious SA: Sample sizes for clinical trials with normal data. Stat Med. 2004, 23: 1921-1986. 10.1002/sim.1783.
- Lin Z: An issue of statistical analysis in controlled multi-centre studies: how shall we weight the centres?. Stat Med. 1999, 18: 365-373. 10.1002/(SICI)1097-0258(19990228)18:4<365::AID-SIM46>3.0.CO;2-2.
- Senn S: Some controversies in planning and analysing multi-centre trials. Stat Med. 1998, 17: 1753-1765. 10.1002/(SICI)1097-0258(19980815/30)17:15/16<1753::AID-SIM977>3.0.CO;2-X.
- ICH Topic E 9: Note for guidance on statistical principles for clinical trials. 1998, The European Agency for the Evaluation of Medicinal Products.
- Kerry SM, Bland JM: Unequal cluster sizes for trials in English and Welsh general practice: implications for sample size calculations. Stat Med. 2001, 20: 377-390. 10.1002/1097-0258(20010215)20:3<377::AID-SIM799>3.0.CO;2-N.
- Murray DM, Varnell SP, Blitstein JL: Design and analysis of group-randomized trials: a review of recent methodological developments. Am J Public Health. 2004, 94: 423-432. 10.2105/AJPH.94.3.423.
- Zhou X, Perkins A, Hui S: Comparison of software packages for generalized linear multilevel models. American Statistician. 1999, 53: 282-290. 10.2307/2686112.
- Peugh J, Enders C: Using the SPSS mixed procedure to fit cross-sectional and longitudinal multilevel models. Educational and Psychological Measurement. 2005, 65: 717-741. 10.1177/0013164405278558.
- Singer J: Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models. Journal of Educational and Behavioral Statistics. 1998, 24: 323-355.
- Pinheiro J, Bates D: Mixed-Effects Models in S and S-PLUS. 2000, New York: Springer.
- Eldridge SM, Ashby D, Kerry S: Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int J Epidemiol. 2006, 35: 1292-1300. 10.1093/ije/dyl129.
- Guittet L, Ravaud P, Giraudeau B: Planning a cluster randomized trial with unequal cluster sizes: practical issues involving continuous outcomes. BMC Med Res Methodol. 2006, 6: 17. 10.1186/1471-2288-6-17.
- DeLong ER, Coombs LP, Ferguson TB, Peterson ED: The evaluation of treatment when center-specific selection criteria vary with respect to patient risk. Biometrics. 2005, 61: 942-949. 10.1111/j.1541-0420.2005.00358.x.
- Greenfield S, Kaplan SH, Kahn R, Ninomiya J, Griffith JL: Profiling care provided by different groups of physicians: effects of patient case-mix (bias) and physician-level clustering on quality assessment results. Ann Intern Med. 2002, 136: 111-121.
- Diehr P, Martin DC, Koepsell T, Cheadle A: Breaking the matches in a paired t-test for community interventions when the number of pairs is small. Stat Med. 1995, 14: 1491-1504. 10.1002/sim.4780141309.
- Giraudeau B, Ravaud P, Donner A: Sample size calculation for cluster randomized cross-over trials. Stat Med. 2008, 27: 5578-5585. 10.1002/sim.3383.
- Chuang JH, Hripcsak G, Heitjan DF: Design and analysis of controlled trials in naturally clustered environments: implications for medical informatics. J Am Med Inform Assoc. 2002, 9: 230-238. 10.1197/jamia.M0997.
- Lee KJ, Thompson SG: The use of random effects models to allow for clustering in individually randomized trials. Clin Trials. 2005, 2: 163-173. 10.1191/1740774505cn082oa.
- Campbell MK, Elbourne DR, Altman DG: CONSORT statement: extension to cluster randomised trials. BMJ. 2004, 328: 702-708. 10.1136/bmj.328.7441.702.
- Campbell MK, Grimshaw JM, Elbourne DR: Intracluster correlation coefficients in cluster randomized trials: empirical insights into how should they be reported. BMC Med Res Methodol. 2004, 4: 9. 10.1186/1471-2288-4-9.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/9/39/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.