Use of clustering analysis in randomized controlled trials in orthopaedic surgery
© Oltean and Gagnier; licensee BioMed Central. 2015
Received: 4 September 2014
Accepted: 13 February 2015
Published: 8 March 2015
The effects of clustering in randomized controlled trials (RCTs) and the resulting potential violation of assumptions of independence are now well recognized. When patients in a single study are treated by several therapists, there is good reason to suspect that the variation in outcome will be smaller for patients treated in the same group than for patients treated in different groups. This potential correlation of outcomes results in a loss of independence of observations. The purpose of this study is to examine the current use of clustering analysis in RCTs published in the top five journals of orthopaedic surgery.
We included RCTs published from 2006 to 2010 in the top five journals of orthopaedic surgery, as determined by 5-year impact factor, that involved multiple therapists and/or centers. Identified articles were assessed for whether they accounted for the effects of clustering of therapists and/or centers in randomization or analysis. Univariable and multivariable logistic regression models were fitted, with use of clustering analysis as the outcome. Multivariable models were constructed using stepwise deletion. An alpha level of 0.10 was considered significant.
A total of 271 articles classified as RCTs were identified from the five journals included in the study. Thirty-two articles were excluded due to inclusion of nonhuman subjects. Of the remaining 239 articles, 186 were found to include multiple centers and/or therapists. The prevalence of use of clustering analysis was 21.5%. Fewer than half of the studies reported inclusion of a statistician, epidemiologist or clinical trials methodologist on the team. In multivariable modeling, adjusting for clustering was associated with 6.7 times higher odds of inclusion of any type of specialist on the team (P = 0.08). Likewise, trials that accounted for clustering had 3.3 times the odds of including an epidemiologist/clinical trials methodologist compared with those that did not account for clustering (P = 0.04).
Including specialists on a study team, especially an epidemiologist or clinical trials methodologist, appears to be important in the decision to account for clustering in RCT reporting. The use of clustering analysis remains an important piece of unbiased reporting, and accounting for clustering in RCTs should be a standard reporting practice.
The effects of clustering in randomized controlled trials (RCTs) and the resulting potential violation of assumptions of independence are now well recognized [1,2]. For example, when patients in a single study are treated by several therapists, there may be reason to suspect that the variation in outcome will be smaller between patients treated by the same clinician than between patients treated by different clinicians [2-4]. Clustering effects may arise when there is a potential for correlation of outcomes among patients in similar groups, which can result in a loss of independence of observations. Analyses taking into account correlation among patients may be especially important for studies involving treatment and care that depend on substantial skill or training; however, differential therapist effects may also arise due to personality differences, personal experience, or infrastructure [1,2,5,6]. Clustering may also occur by location (e.g., in a multicenter trial or inpatient vs. outpatient facilities) or in a study that randomizes by cluster (a cluster randomized trial), in which those delivering the intervention (e.g., surgeons) represent different clusters. On occasion, differences between clusters may be an outcome of interest to researchers; however, the outcome in RCTs is typically a measure of effectiveness of some intervention, irrespective of any clusters. That is, if the effectiveness of an intervention is influenced by some cluster (e.g., the healthcare provider delivering treatment) and if this is not measured or accounted for in the study, this introduces a source of bias into any outcomes measured. These potential sources of bias (e.g., center cluster or therapist cluster) have implications for outcome effect magnitudes and directions, and thus should be recognized and treated accordingly. The design and analysis of RCTs should account for the possible heterogeneity in cluster size and intracluster correlation to appropriately analyze results.
The majority of statistical analyses used in RCTs are based on the assumption that observed outcomes on different patients are independent. Independence of observations is a basic assumption of many widely used statistical tests, including t-tests and generalized linear modeling. Between-cluster variation may therefore lead to a loss of precision and reduced power when estimating treatment effects [2,6]. Clustering therefore has implications for the required sample size of an RCT; the impact depends on the study design and analysis used. The magnitude of the effect additionally depends on cluster size and intracluster correlation coefficients (ICCs). The extent to which the treatment effect varies across clusters can have a major impact on the interpretation of a trial’s results; however, there is often not enough information to obtain a precise estimate of the clustering effect, since most trials are not powered to detect this variability. The magnitude of clustering may depend on cluster type, setting, and type of outcome, as well as time since receiving the intervention. Therefore, if clustering is believed a priori to be a realistic possibility, it is important to account for it in analysis to appropriately interpret the treatment effect.
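The dependence of precision on cluster size and ICC is commonly summarized by the design effect, 1 + (m − 1) × ICC, for clusters of equal size m. The following minimal Python sketch illustrates that relationship; it is not part of the original analysis, and the function names and the example values (200 patients, 20 patients per surgeon, ICC of 0.05) are hypothetical.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation due to clustering, assuming equal cluster sizes."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_patients: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    return n_patients / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical trial: 200 patients, 20 per surgeon, ICC = 0.05.
    deff = design_effect(20, 0.05)
    n_eff = effective_sample_size(200, 20, 0.05)
    print(f"design effect: {deff:.2f}")
    print(f"effective sample size: {n_eff:.1f} of 200 patients")
```

Under these assumed values, even a modest ICC roughly halves the information available, which is why an analysis that treats all 200 patients as independent would overstate precision.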
These effects are illustrated in a study by Lee and Thompson, in which two published trials were re-analyzed using an analysis method that accounted for the effects of clustering, which was not used in the original publication. They found that if potential clustering is ignored, uncertainty may be underestimated, producing too extreme p values and even altering the results of a trial. In their first re-analysis, the authors looked at a trial assessing the effectiveness of teleconsultations performed by 20 consultants. The original study analyzed the observations as independent and concluded that the treatment was significantly more effective than the control. In a re-analysis of the study data using a random effects model, Lee and Thompson found that clustering by consultant was significant. When this clustering was controlled for in the model, the resulting odds ratio became nonsignificant, therefore altering the results of the trial. In the re-analysis of a second study, the results of an exercise class delivered by 21 physiotherapists were called into question when it was determined that the standard error in a model controlling for clustering was larger than originally determined. This suggested a wide variation in treatment effect and again altered the interpretation of the study results.
In a second study that re-analyzed the data of two clinical trials to account for clustering, Roberts and Roberts again found that the standard errors of the treatment effects markedly increased. A study by Cook et al., analyzing ICCs for 198 outcomes across 10 multicenter surgical trials, demonstrated clustering effects at both the center and surgeon level and concluded that clustering of outcome is more of an issue than has been previously acknowledged. These examples demonstrate the dramatic effect that clustering may have and the mistaken conclusions that can be drawn if it is ignored in the analyses.
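The underestimation of uncertainty described in these re-analyses can be reproduced with simulated data. The sketch below is a hypothetical illustration using only the Python standard library; it does not use data from any of the cited trials. It generates outcomes with a therapist-level random effect and compares the standard error of the overall mean obtained by treating patients as independent with one based on therapist means. All parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_clustered_outcomes(n_therapists, patients_per_therapist,
                                between_sd, within_sd, seed=0):
    """Return one list of outcomes per therapist (random-intercept model)."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_therapists):
        # Each therapist shifts all of his or her patients by a shared amount.
        therapist_effect = rng.gauss(0.0, between_sd)
        clusters.append([therapist_effect + rng.gauss(0.0, within_sd)
                         for _ in range(patients_per_therapist)])
    return clusters


def naive_se(clusters):
    """SE of the overall mean if every patient is treated as independent."""
    pooled = [y for cluster in clusters for y in cluster]
    return statistics.stdev(pooled) / len(pooled) ** 0.5


def cluster_level_se(clusters):
    """SE of the overall mean based on therapist means (equal cluster sizes)."""
    means = [statistics.mean(cluster) for cluster in clusters]
    return statistics.stdev(means) / len(means) ** 0.5


if __name__ == "__main__":
    # Hypothetical setting: 20 therapists, 30 patients each, true ICC = 0.5.
    data = simulate_clustered_outcomes(20, 30, between_sd=1.0, within_sd=1.0, seed=42)
    print(f"naive SE:         {naive_se(data):.3f}")
    print(f"cluster-level SE: {cluster_level_se(data):.3f}")
```

With a nonzero between-therapist standard deviation, the cluster-level standard error is expected to be the larger of the two, mirroring the direction of the changes reported in the re-analyses above.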
In one study specifically assessing a large orthopaedics surgical trial, Biau et al. found provider effects to be highly significant in re-analysis. These provider effects were found to be more significant in highly specialized fields, such as orthopaedics, in contrast to general surgery. Using volume of patients seen per surgeon as a proxy for surgeon experience, higher surgeon experience was shown to correlate with better patient outcomes. This study therefore suggests that controlling for clustering effects is especially important in studies that involve highly skilled therapists.
Clustering in randomized clinical trials can be dealt with in many ways. Several methods of accounting for clustering are widely recognized: randomizing patients within each cluster (e.g., to the treatment provider), cluster-level analysis, fixed-effects models, random effects models, or generalized estimating equations [6,9].
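As an illustration of the first of these options, randomizing patients within each cluster, the sketch below allocates patients to arms separately for each treatment provider so that every provider delivers both treatments in balanced numbers. It is a simplified, hypothetical example: the function name and identifiers are invented, and it assumes the full patient list for each provider is known in advance, which is rarely true in practice, where permuted blocks are typically used instead.

```python
import random


def randomize_within_clusters(patients_by_cluster,
                              arms=("treatment", "control"), seed=0):
    """Map each patient ID to (cluster, arm), balancing arms within clusters."""
    rng = random.Random(seed)
    allocation = {}
    for cluster, patients in patients_by_cluster.items():
        # Build a balanced list of arm labels, then shuffle it for this cluster.
        labels = [arms[i % len(arms)] for i in range(len(patients))]
        rng.shuffle(labels)
        for patient, arm in zip(patients, labels):
            allocation[patient] = (cluster, arm)
    return allocation


if __name__ == "__main__":
    # Hypothetical surgeons and patient identifiers.
    roster = {
        "surgeon_A": ["p01", "p02", "p03", "p04"],
        "surgeon_B": ["p05", "p06"],
    }
    for patient, (cluster, arm) in sorted(randomize_within_clusters(roster).items()):
        print(patient, cluster, arm)
```

Because each provider contributes equally to both arms, differences between providers cannot masquerade as a treatment effect, although the correlation of outcomes within a provider may still need to be reflected in the analysis.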
Despite multiple studies demonstrating the importance of clustering analysis and available methodological and statistical approaches for handling it, accounting for clustering is not routine in the analysis of published RCTs. Based on findings in the general literature [4,7,10], we hypothesized that the prevalence of the use of clustering analysis reported in the orthopaedic literature would be low. Studies in the field of orthopaedics often involve highly skilled therapists and therefore have great potential to be affected by clustering. The primary objective of the present study was to determine the prevalence of reporting of the use of clustering analysis in RCTs published in the top five orthopaedic journals between 2006 and 2010. A secondary objective was to identify factors predicting the use or neglect of use of clustering analysis in the RCTs included in this study.
Identification of articles
We identified the top five journals with the highest 5-year impact factor in the area of orthopaedics as listed in the 2010 ISI Web of Knowledge Journal Citation Reports. Journals included were: American Journal of Sports Medicine (AJSM), Journal of Bone and Joint Surgery (JBJS), Journal of Orthopaedic Research (JOR), Osteoarthritis and Cartilage (OC), and The Spine Journal (SJ). All articles in all issues of these journals published between 2006 and 2010 were hand-searched. Inclusion criteria for articles were: randomized allocation of participants to two or more groups, inclusion of human subjects, and inclusion of a potential grouping variable (e.g., multiple therapists or treatment centers). Articles were excluded from analysis if they included nonhuman subjects, if they were conducted in a single center by a single therapist, or if they did not report enough information to determine inclusion. A single individual (HO) assessed all articles for inclusion, and a random sample of approximately 10% was checked by a second individual (JG) to ensure eligibility. Any disagreements between these two individuals were resolved by discussion.
Articles meeting the inclusion criteria were then searched for the following data: reporting of the number of therapists delivering treatment, number of centers used in the study, whether or not multiple therapists or centers were accounted for in randomization or statistical analysis, year of publication, the impact factor of the journal in the publication year, whether the study team included a statistician, whether the study team included an epidemiologist or clinical trials methodologist, the sample size of the study, and whether the primary outcome (defined as the outcome described as primary, to which the study was powered, or the first outcome reported) of the study was categorically positive, neutral, or negative. A statistician was defined as any person with a graduate degree in statistics or biostatistics. An epidemiologist/clinical trials methodologist was likewise defined as any person with a graduate degree in epidemiology, public health, or clinical research. If this information was not clear from the published report, the corresponding author was contacted via e-mail and asked whether either specialist was included on the study team. The outcome of the study was defined as positive if the study findings supported the a priori hypothesis, negative if the primary outcome was in the direction opposite to that hypothesized, and neutral if no significant effect was demonstrated.
The outcome measures of interest were accounting for clustering by therapist, accounting for clustering by center, and accounting for any type of clustering, either in randomization or in analysis. All data were extracted by one individual with expertise in epidemiology and biostatistics.
Data were compiled in Excel (Microsoft, Redmond, Washington) spreadsheets and imported into SAS, version 9.3, statistical software (SAS Institute Inc., Cary, North Carolina) for statistical analysis. Frequency measures were computed for all data. Logistic regression was conducted using both univariable and multivariable models. Univariable logistic regression was performed for each predictor variable on each outcome variable, with no adjustments. Multivariable models were constructed using stepwise deletion, with deletion of the variable with the highest p value in each case. All variables were checked for collinearity before inclusion in multivariable models, and collinear variables were tested separately. Clustering effects by journal were checked using generalized estimating equations (GEE). Odds ratios and 95% confidence intervals were produced in all analyses. An alpha level of 0.10 was considered significant for all tests.
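For a single binary predictor, the odds ratio from a univariable logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, and a Wald confidence interval can be computed by hand. The sketch below shows that calculation in Python rather than SAS; it is not the authors' code, the function name is invented, and the cell counts in the example are hypothetical and are not results from this study.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Odds ratio, Wald CI and two-sided Wald p value for a 2 x 2 table.

    a: predictor present, outcome present   b: predictor present, outcome absent
    c: predictor absent, outcome present    d: predictor absent, outcome absent
    z: normal quantile for the interval (1.959964 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return {
        "odds_ratio": math.exp(log_or),
        "ci_lower": math.exp(log_or - z * se),
        "ci_upper": math.exp(log_or + z * se),
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Hypothetical counts: clustering analysis used (yes/no) by
    # methodologist on the team (yes/no).
    result = odds_ratio_wald(20, 10, 10, 20)
    significant = result["p_value"] < 0.10  # alpha level used in this study
    print(result, "significant at 0.10:", significant)
```

One consequence of pairing an alpha level of 0.10 with 95% intervals is that a Wald p value between 0.05 and 0.10 would count as significant even though the matching 95% interval includes 1; passing `z=1.644854` would give a 90% interval consistent with that threshold.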
[Table: Characteristics of articles included in analysis (N = 186). Rows: reported any clustering analysis; reported inclusion of a statistician on study team; reported inclusion of a clinical trials methodologist or epidemiologist on study team; reported inclusion of either specialist; reported inclusion of both specialists; reported positive outcome; specified null hypothesis but reported positive outcome; reported negative outcome; reported neutral outcome; sample size (mean, SD).]
[Table: Methods used to account for clustering (N = 40*). Rows include: GEE or controlled variable.]
[Table: Logistic regression of use of any clustering analysis by predictors (N = 186). Columns: 95% CI (P value).]
[Table: Logistic regression of use of clustering analysis for multiple centers by predictors (N = 87). Columns: 95% CI (P value).]
[Table: Logistic regression of use of clustering analysis for multiple therapists by predictors (N = 145). Columns: 95% CI (P value).]
[Table: Multivariable analysis of any use of clustering analysis, stepwise deletion. Columns: Models 1 to 6, each reporting OR (CI) and P value.]
[Table: Multivariable analysis of any use of clustering analysis after elimination of collinear variables, stepwise deletion for each specialist variable. Columns: Models 1 to 4, each reporting OR (CI) and P value.]
Our study on the use of clustering analysis in orthopaedic research suggests that a small proportion of studies are currently employing these important statistical methods. Multivariable modeling of predictors associated with the presence of adjustment for clustering showed a strong and significant association between any type of clustering adjustment and inclusion of an epidemiologist/clinical trials methodologist on the study team.
Our study has several strengths and weaknesses. First, we systematically identified every RCT published in the top five journals of orthopaedic surgery between 2006 and 2010. Limiting the search to specific journals allowed the entire target population of articles to be identified, in contrast to an electronic literature search, which may miss articles meeting the inclusion criteria. Use of the top five journals also allows us to assume that our findings are a conservative estimate; on the other hand, the findings cannot be generalized to other journals or to the broader orthopaedic literature. In addition, although a single individual assessed all articles for inclusion, a second individual cross-checked a random selection of articles, which reduces the risk of selection bias.
Identified articles were then reviewed for inclusion, and relevant data were extracted, by a single researcher with experience in epidemiology and biostatistics. This extraction method allowed for consistency across articles and maintained homogeneous definitions throughout the process. Although extraction by a single reviewer carries a potential for bias, both authors met throughout the extraction process to clarify interpretations of extracted data. Despite efforts to extract all relevant data from all articles in the target population, data were underreported in several of the articles. Missing data were especially notable for the variables “biostatistician” and “epidemiologist/clinical trials methodologist”; the majority of author or study member specialties were not reported in the articles or easily identifiable from headings. In an effort to minimize the missing data, the corresponding author of each article was contacted and asked about the specialties of members of the study team. However, not all authors responded to the request for data, and this underreporting may bias our results. One possibility is that studies not reporting study member specialties were less likely to perform clustering analysis. If this were the case, our study would represent the higher-quality articles and would therefore potentially overestimate the use of clustering analysis. This hypothesis remains to be tested.
The method of stepwise regression used in the analysis of these data is controversial in some contexts, but generally remains an accepted method of hypothesis testing and generation. We are not aware of any other literature investigating predictors of accounting for clustering, and the investigational nature of this objective led us to this approach. Further studies are needed to verify these findings. Furthermore, the use of GEEs to account for clustering, as in our analyses, has recently been shown to increase the likelihood of type I errors in Poisson data, but not in binary outcomes. In another paper, Monte Carlo simulations showed that GEE models had better power for detecting within-cluster homogeneity than did other methods when examining binary outcomes. We recommend that additional simulations be carried out to determine the validity of this approach.
A final potential weakness of the study is the cut-off date of 2010. It is possible that in the year and a half between our cut-off date and the analysis of these data, the use of clustering analysis in orthopaedic RCTs has changed. However, we know of no identifiable event that would initiate such a change, making this a marginal concern. Strictly, our analysis applies only to the publication years we reviewed for these journals; nevertheless, we hold that it represents relatively recent RCTs in orthopaedic surgery and their use of clustering analyses.
Although several papers have previously demonstrated the importance of taking clustering into account in RCTs, this type of analysis has not yet become standard practice [7,10]. Our study suggests a low prevalence of adjustment for clustering effects in RCTs published in the orthopaedic literature, with only 21.5% of included articles using any of these important methods. To the best of our knowledge, our study is the first to look at potential predictors of the use of clustering adjustment in RCTs. Multivariable modeling of predictors associated with adjustment for clustering showed a strong and significant association between any type of clustering adjustment and inclusion of an epidemiologist/clinical trials methodologist on the study team. A large effect was also seen for the inclusion of any type of specialist (epidemiologist/clinical trials methodologist or biostatistician). This finding was expected, in that individuals specifically trained in clinical research methods are more likely to employ proper methodology. By demonstrating the association between an adjustment for clustering in a study and the presence of an epidemiologist or clinical trials methodologist on the study team, we are able to make recommendations for practical ways to improve the use of these important statistical methods. For example, the inclusion of an epidemiologist or clinical research methodologist in the study design phase a priori could ensure that proper methods are planned and implemented that limit or control for the effects of clustering (e.g., stratification, limiting the number of centers/providers, homogeneous cluster sizes, statistical analyses to adjust for clustering).
We were surprised to find that the inclusion of a biostatistician was not significantly associated with increased use of clustering adjustment methods. One potential explanation is that epidemiologists or clinical trial methodologists are often included from the design phase of a study, whereas biostatisticians are often only included in the analysis phase. Since our outcome is defined as accounting for clustering effects in either randomization or statistical analysis, involvement of a specialist a priori in the study is an important consideration. This a priori versus ad hoc inclusion may be associated with a greater use of proper adjustment techniques; however, this hypothesis remains to be tested.
In addition to a lack of proper author specialization on study teams, there are several other potential reasons that adjustment for clustering effects is not currently a common practice. As mentioned above, adjustment for clustering generally increases the sample size needed for a given power, making recruitment a longer or more difficult process and potentially increasing funding and other resource needs. This could act as a barrier to researchers who might initially be interested in examining clustering effects within their studies. We found that many of the included studies reported that the therapists had similar training or that there were no noted differences between therapists. But this is insufficient, as clustering effects may still exist and equality of therapists cannot be assumed. We recommend that clinical trialists perform these analyses where relevant and that institutional review boards and peer reviewers be careful to point out the need for these analyses. In addition, a set of standards could be developed that outline when and how these adjustments can be done, providing concrete examples and empirical evidence of this need.
The effect of clustering may be difficult to detect in studies that are underpowered when divided by cluster; however, statistical analyses that ignore the presence of potential clustering will most likely result in overly precise and therefore misleading estimates. The methods for performing sample size calculations for studies with clustering effects depend on the type of data for the primary outcome of interest (e.g., continuous, binary, count). Several methods are suggested in the literature and several statistical packages have the ability to derive these estimates [15,16]. As an example, many studies use outcome measures that produce continuous data, for which an ICC is needed to calculate sample size; this requires an a priori knowledge of within- and between-cluster variances. Several efforts are underway to encourage the use of clustering analysis through the creation of databases of ICCs for various outcomes used in surgical trials. These databases will give researchers information on the likely magnitude of ICCs for different outcomes and enable the use of clustering effect estimates in the planning stages of a trial. This in turn will enable accurate sample size calculation in the design phase of a study and thus adequate power to test hypotheses. Cook et al. suggest that the optimal use of available data would involve a formal meta-analysis of ICC estimates. Furthermore, more work is needed on sample size calculations and methods of accounting for clustering for binary and count data in clinical research. This important research should be prioritized, with the goal of informing researchers of possible clustering effects by outcome and enabling better practices in analyses through a priori understanding of potential clustering effects.
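For a continuous outcome, the link between within- and between-cluster variances, the ICC, and the required sample size can be sketched as follows. This is an illustrative example only, under stated assumptions: it uses the one-way ANOVA estimator of the ICC for equal-sized clusters and the standard inflation factor 1 + (m − 1) × ICC, the function names are invented, and the data are hypothetical pilot values rather than data from any trial discussed here.

```python
import math
import statistics


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC; requires equal-sized clusters."""
    k = len(clusters)
    m = len(clusters[0]) if clusters else 0
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size >= 2")
    cluster_means = [statistics.mean(c) for c in clusters]
    grand_mean = statistics.mean(cluster_means)  # valid because sizes are equal
    # Mean squares between and within clusters.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum((y - cm) ** 2
              for c, cm in zip(clusters, cluster_means) for y in c) / (k * (m - 1))
    denominator = msb + (m - 1) * msw
    if denominator == 0:
        raise ValueError("no variation in the data; ICC is undefined")
    return (msb - msw) / denominator


def inflated_sample_size(n_independent, cluster_size, icc):
    """Sample size after multiplying by the design effect, rounded up."""
    deff = 1.0 + (cluster_size - 1.0) * icc
    # Subtract a tiny tolerance so floating-point noise does not add a patient.
    return math.ceil(n_independent * deff - 1e-9)


if __name__ == "__main__":
    # Hypothetical pilot data: outcome scores for 3 surgeons, 4 patients each.
    pilot = [[70, 72, 68, 71], [60, 63, 61, 59], [75, 74, 77, 76]]
    icc = max(anova_icc(pilot), 0.0)  # negative estimates are truncated at 0
    print(f"estimated ICC: {icc:.2f}")
    print("patients needed:", inflated_sample_size(100, 4, icc))
```

An ICC estimated from so few clusters is very imprecise, which is one reason the text above points to databases and meta-analyses of ICC estimates as a better source of planning values.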
On the basis of our findings, we see a need for improved methodology when dealing with clustering in RCTs. Inclusion on the study team of a specialist in biostatistics and/or epidemiology/clinical trials methodology was strongly associated with adjusting for clustering. Investigators planning RCTs should select their study teams carefully to ensure that proper expertise is included. Additionally, the use of databases categorizing ICCs for different outcomes from the planning stages of a trial will improve sampling and study design and help reduce the effects of clustering.
We would like to acknowledge Ms. Whitney Townsend, MLIS, for her help with the electronic database searching for this project.
1. Cook JA, Bruckner T, MacLennan GS, Seiler CM. Clustering in surgical trials–database of intracluster correlations. Trials. 2012;13:2.
2. Roberts C, Roberts SA. Design and analysis of clinical trials with clustering effects due to treatment. Clin Trials. 2005;2:152–62.
3. Roberts C. The implications of variation in outcome between health professionals for the design and analysis of randomized controlled trials. Stat Med. 1999;18:2605–15.
4. Walwyn R, Roberts C. Therapist variation within randomised trials of psychotherapy: implications for precision, internal and external validity. Stat Methods Med Res. 2010;19:291–315.
5. Cook JA, Ramsay CR, Fayers P. Statistical evaluation of learning curve effects in surgical trials. Clin Trials. 2004;1:421–7.
6. Walters SJ. Therapist effects in randomised controlled trials: what to do about them. J Clin Nurs. 2010;19:1102–12.
7. Lee KJ, Thompson SG. The use of random effects models to allow for clustering in individually randomized trials. Clin Trials. 2005;2:163–73.
8. Biau DJ, Halm JA, Ahmadieh H, Capello WN, Jeekel J, Boutron I, et al. Provider and center effect in multicenter randomized controlled trials of surgical specialties: an analysis on patient-level data. Ann Surg. 2008;247:892–8.
9. Chu R, Thabane L, Ma J, Holbrook A, Pullenayegum E, Devereaux PJ. Comparing methods to estimate treatment effects on a continuous outcome in multicenter randomized controlled trials: a simulation study. BMC Med Res Methodol. 2011;11:21.
10. Biau DJ, Porcher R, Boutron I. The account for provider and center effects in multicenter interventional and surgical randomized controlled trials is in need of improvement: a review. J Clin Epidemiol. 2008;61:435–9.
11. Thomson Reuters Web of Science. Journal citation reports. New York: Thomson Reuters; 2010.
12. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analyses. J Clin Epidemiol. 1996;49(12):1373–9.
13. Gao D, Grunwald GK, Xu S. Statistical methods for estimating within-cluster effects for clustered Poisson data. J Biomet Biostat. 2013;4:1.
14. Austin PC. A comparison of the statistical power of different methods for the analysis of repeated cross-sectional cluster randomization trials with binary outcomes. Int J Biostat. 2010;6(1):Article 11.
15. Hemming K, Girling AJ, Sitch AJ, Marsh J, Lilford RJ. Sample size calculations for cluster randomized controlled trials with a fixed number of clusters. BMC Med Res Methodol. 2011;11:102.
16. Reich NG, Myers JA, Obeng D, Milstone AM, Perl TM. Empirical power and sample size calculations for cluster-randomized and cluster-randomized crossover studies. PLoS One. 2012;7(4):e35564.
17. Killip S, Mahfoud Z, Pearce K. What is an intracluster correlation coefficient? Crucial concepts for primary care researchers. Ann Fam Med. 2004;2(3):204–8.
18. Preisser JS, Reboussin BA, Song E-Y, Wolfson M. The importance and role of intracluster correlations in planning cluster trials. Epidemiology. 2007;18(5):552–60.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.