Testing for heterogeneity among the components of a binary composite outcome in a clinical trial
© Pogue et al; licensee BioMed Central Ltd. 2010
Received: 29 October 2009
Accepted: 7 June 2010
Published: 7 June 2010
Investigators designing clinical trials often use composite outcomes to overcome many statistical issues. Trialists want to maximize power to show a statistically significant treatment effect and avoid inflation of the Type I error rate due to evaluation of multiple individual clinical outcomes. However, if the treatment effect is not similar among the components of this composite outcome, we are left not knowing how to interpret the treatment effect on the composite itself. Given significant heterogeneity among these components, a composite outcome may be judged as being invalid or uninterpretable for estimation of the treatment effect. This paper compares the power of different tests to detect heterogeneity of treatment effect across components of a composite binary outcome.
Simulations were done comparing four different models commonly used to analyze correlated binary data. These models were: logistic regression ignoring correlation, logistic regression weighted by the intra-cluster correlation coefficient, population average logistic regression using generalized estimating equations (GEE), and random effects logistic regression.
We found that the population average model based on generalized estimating equations (GEE) had the greatest power across most scenarios. Adequate power to detect possible composite heterogeneity or variation between treatment effects of individual components of a composite outcome was seen when the power for detecting the main study treatment effect for the composite outcome was also reasonably high.
We recommend that authors report a test of composite heterogeneity whenever they publish a statistically significant main effect on a composite outcome, alongside the results for the individual components of the composite.
Composite outcomes can often be difficult to interpret, especially when the treatment effects on some of their components individually show differences in magnitude or even in direction. For example, in a trial of localized intracoronary gamma-radiation therapy versus placebo, the primary composite outcome of death, myocardial infarction, or revascularization of target lesion showed an overall benefit of gamma-radiation compared to placebo (24.4% vs. 42.1%, p = 0.02); however, myocardial infarction individually had a non-significant effect in the opposite direction (9.9% vs. 4.1%, p = 0.09). Many authors have expressed concerns regarding interpretation of a treatment effect for a composite outcome when it appears that there is heterogeneity in the treatment effect across the composite components [2–4]. How then can we best determine the existence of important composite heterogeneity in treatment effect among the individual components of a composite outcome?
A composite outcome is defined as having occurred if one of a group of outcomes occurs. The main treatment effect is defined as the absolute or relative difference between treatment and control in the proportions of participants who have at least one component of the composite. The problems with interpreting composite outcomes are well known. The treatment effect observed on the components may go in opposite directions and reduce the power of the trial [5, 6]. The components may not have similar importance or frequency to one another [2–4, 7]. These issues make composite outcomes difficult to interpret in many trials.
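The definition above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the tiny data set are hypothetical and are not taken from the paper. It marks a participant as having the composite outcome if any component occurred, then reports the absolute and relative treatment effects on that composite.

```python
from typing import Dict, List, Sequence


def composite_indicator(components: Sequence[Sequence[int]]) -> List[int]:
    """Return 1 for each participant with at least one component event, else 0."""
    return [int(any(row)) for row in components]


def risk(events: Sequence[int]) -> float:
    """Proportion of participants with an event."""
    return sum(events) / len(events)


def composite_effect(treated: Sequence[Sequence[int]],
                     control: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Absolute and relative difference in composite risk, treated vs. control."""
    p_t = risk(composite_indicator(treated))
    p_c = risk(composite_indicator(control))
    return {
        "risk_treated": p_t,
        "risk_control": p_c,
        "risk_difference": p_t - p_c,  # absolute effect
        "relative_risk": p_t / p_c,    # relative effect
    }


# Hypothetical data: one row per participant, columns are three component
# outcomes (for example myocardial infarction, stroke, cardiovascular death).
treated = [(0, 0, 0), (1, 0, 0), (0, 0, 0), (0, 1, 1)]
control = [(1, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 0)]
effect = composite_effect(treated, control)
```

Note that a participant with two component events, such as the last treated row, counts only once toward the composite, which is why the composite alone cannot reveal which components drive the effect.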
Despite difficulties with interpretation, trialists are unlikely to abandon composite outcomes. Trials in cardiovascular disease commonly use composite endpoints as their primary outcome, and there are efforts in many other areas of research to follow suit. Many authors have expressed the need to use composite outcomes to increase the feasibility of conducting clinical trials research in their areas including: cardiology [9, 10], HIV/AIDS, organ transplantation, psychiatric disorders, adverse event reporting, and obstetrics and gynecology. The reasons for use of composite outcomes are well documented and include: reduced sample size due to increased outcome rates, the ability to answer important questions quickly, capturing the multi-dimensional nature of disease, seeking a better understanding of total disease burden, the inability to select the most important of many outcomes, concerns with multiplicity for testing many outcomes, and addressing competing risks.
Various approaches have been suggested for the analysis and interpretation of composite outcomes. For example, a multivariate global test across all the components could be used to look for simultaneous demonstrated benefit, but readers may find it difficult to interpret such a result [16, 17]. Alternatively, if the composite shows a statistically significant treatment effect, the component-specific tests can be performed using a closed test procedure. Many authors recommend that each component of the composite should be defined as a secondary outcome for the trial. However, it is doubtful that there would be sufficient power to detect effects on the individual components for the very reason that the composite outcome was chosen (i.e. there are too few events for each outcome). Individual tests on each component would also inflate the overall Type I error rate for the study. Berger has suggested the use of information preserving composite endpoints and the use of omnibus test functions. However, trialists have rarely utilized this procedure. Finally, another method would involve analysis of the weighted components of the composite. Although many different weighting schemes have been suggested [6, 9, 19, 20], these methods are not in common use by trialists. Further, weighting systems can introduce their own set of problems with interpretation, due to the perceived subjectivity of the weights.
Composites may be used either under the assumption of homogeneity of treatment effect across components or to summarize a risk-benefit profile of an intervention. In this manuscript we address the former use, where the best knowledge of the disease being studied points to a likely similarity of treatment effect on all component outcomes, based on known physiological pathways and theoretical models. While the treatment effect is assumed to be similar across each of the components in terms of direction, it is recognized that the magnitude may differ [2, 5]. Many authors recommend reviewing suspected treatment homogeneity through visual inspection of the direction of relative risk estimates for individual components of the composite in a trial [2, 7]. However, it is possible to test for heterogeneity of these treatment effects across components directly using standard methods for correlated binary data. If significant heterogeneity is found, then the composite outcome may be invalidated or inappropriate for use. If not, we may have more confidence in the composite outcome, viewing it as meaningful, interpretable as representing the treatment effect as a whole, and likely free from evidence of heterogeneity. However, tests for heterogeneity have been shown to lack power in meta-analyses and subgroup analyses. The purpose of this paper is to compare the power of different tests to detect heterogeneity of treatment effect across components of composite binary outcomes. We then explore the usefulness of such tests for detecting composite heterogeneity when the power is high for the treatment comparison on the composite outcome as a whole.
A. Methods for analysis of correlated binary outcomes
Participants in a trial who are followed beyond their first outcome may experience more than one component of the composite primary outcome. For example, for a trial with the primary outcome of myocardial infarction, stroke or cardiovascular death, a participant may experience a stroke and then die a cardiovascular death. Thus there is a repeated measurement of the different component outcomes for each individual. These binary data then have an intra-cluster correlation due to repeated outcomes on the same individuals.
All models used contain parameters that estimate the treatment effect, the effect of the specific component of the composite outcome, and the interaction of these two factors. These are presented for the jth treatment group, the kth component of the composite outcome, and the ith participant in the trial. The test of the interaction term allows detection of possible heterogeneity, that is, a difference in the study treatment effect across the composite components.
Model 1 Logistic regression ignoring correlation
It is possible that the intra-cluster correlation seen among outcomes in typical cardiovascular trials is too small to make a difference to this analysis of composite homogeneity. We will fit a simple logistic regression to test this hypothesis (implemented in SAS using proc logistic). The model fit will be: $\mathrm{logit}(y_{ijk}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon_{ijk}$, where $x_1$ indicates the treatment group, $x_2$ indicates the outcome component, and $x_3$ is their interaction.
For this and all subsequent models, the test for heterogeneity will test whether $\beta_3$ is significantly different from zero at the p < 0.05 level.
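For a composite with only two components, the Model 1 interaction test can be worked by hand, which may help clarify what the test of β3 measures. The sketch below assumes a 0/1 coding for treatment and for component, with the interaction term as their product; this coding, the function names, and the event counts are all hypothetical. Under that coding the model is saturated, so the estimate of β3 is the difference between the two component log odds ratios, and its variance (ignoring correlation) is the sum of the reciprocal cell counts of both 2x2 tables.

```python
import math
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (number of events, number of participants)


def log_odds_ratio(treated: Cell, control: Cell) -> Tuple[float, float]:
    """Log odds ratio and its large-sample variance from one 2x2 table."""
    a, n_t = treated
    c, n_c = control
    b, d = n_t - a, n_c - c  # non-events in each arm
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def interaction_wald_test(comp1: Tuple[Cell, Cell],
                          comp2: Tuple[Cell, Cell]) -> Dict[str, float]:
    """Wald test that the treatment log odds ratio is equal for two components.

    Treats the two components as independent, as Model 1 does.
    """
    lor1, v1 = log_odds_ratio(*comp1)
    lor2, v2 = log_odds_ratio(*comp2)
    beta3 = lor2 - lor1                       # interaction estimate
    se = math.sqrt(v1 + v2)
    z = beta3 / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"beta3": beta3, "se": se, "z": z, "p_value": p_value}


# Hypothetical counts: (treated, control) for each component, 1000 per arm.
component_1 = ((65, 1000), (100, 1000))
component_2 = ((100, 1000), (100, 1000))
result = interaction_wald_test(component_1, component_2)
```

With more than two components, or with additional covariates, the closed form no longer applies and the model would be fit with standard logistic regression software.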
Model 2 Weighted logistic regression
Model 3 Population average logistic models (GEE)
Here treatment and outcome component effects are estimated at the margin by averaging across individuals. Generalized estimating equations (GEE) will be used, which treat the correlation among outcomes within an individual as a nuisance factor. This correlation is modeled through a working correlation matrix, and adjustments for misspecification are made using the sandwich variance formula. The covariance matrix will be unstructured to allow for different variances for each composite component (proc genmod). The model is: $\mathrm{logit}(\mu_{ijk}) = \beta^*_0 + \beta^*_1 x_1 + \beta^*_2 x_2 + \beta^*_3 x_3$, where $\mu_{ijk} = E(y_{ijk})$ is the marginal expectation and the $\beta^*$'s estimate the population average response parameters.
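To illustrate how accounting for within-participant correlation changes the interaction test, the sketch below computes a cluster-robust variance for the two-component case by the delta method. It is a simplified stand-in for a GEE fit with sandwich standard errors, not the proc genmod analysis used in the paper; the function names and the participant-level data are hypothetical. The point estimate is the same difference in log odds ratios as in the uncorrelated analysis; only the variance changes, by subtracting twice the covariance between the two component log odds within each arm.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (component 1 event, component 2 event) for one participant


def arm_summary(pairs: List[Pair]) -> Dict[str, float]:
    """Log odds of each component in one arm, with variances and covariance."""
    n = len(pairs)
    p1 = sum(y1 for y1, _ in pairs) / n
    p2 = sum(y2 for _, y2 in pairs) / n
    p12 = sum(y1 * y2 for y1, y2 in pairs) / n  # both events in same person
    return {
        "logit1": math.log(p1 / (1 - p1)),
        "logit2": math.log(p2 / (1 - p2)),
        "var1": 1 / (n * p1 * (1 - p1)),
        "var2": 1 / (n * p2 * (1 - p2)),
        # Delta-method covariance induced by measuring both on the same people.
        "cov": (p12 - p1 * p2) / (n * p1 * (1 - p1) * p2 * (1 - p2)),
    }


def robust_interaction(treated: List[Pair], control: List[Pair]) -> Dict[str, float]:
    """Interaction estimate with naive and cluster-robust standard errors."""
    t, c = arm_summary(treated), arm_summary(control)
    beta3 = (t["logit2"] - c["logit2"]) - (t["logit1"] - c["logit1"])
    naive_var = sum(s["var1"] + s["var2"] for s in (t, c))
    robust_var = naive_var - 2 * (t["cov"] + c["cov"])
    return {
        "beta3": beta3,
        "naive_se": math.sqrt(naive_var),
        "robust_se": math.sqrt(robust_var),
    }


# Hypothetical arms of 1000 participants, built from counts of each pattern.
treated = [(1, 1)] * 15 + [(1, 0)] * 50 + [(0, 1)] * 85 + [(0, 0)] * 850
control = [(1, 1)] * 20 + [(1, 0)] * 80 + [(0, 1)] * 80 + [(0, 0)] * 820
fit = robust_interaction(treated, control)
```

When the components are positively correlated within participants, the robust standard error is smaller than the naive one, which is one intuition for why a method that models the correlation can gain power for the interaction test.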
Model 4 Random effects logistic models
This model incorporates a term for the individual in the analysis and allows the intercept to vary across individuals. Individuals are considered to be randomly selected from a population that has a normally distributed intercept component. The model is
$\mathrm{logit}(E[y_{ijk} \mid \gamma_i]) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_i + \phi_{ijk}$, where $\gamma_i$ is the random effect of the participant, with composite outcome components clustered within the individual, and $\phi_{ijk}$ is the error term (proc glimmix). The covariance matrix will be unstructured, or determined by the random effect.
B. Simulation data
Degree of treatment heterogeneity of the composite components: The odds ratio of the first component (OR1) was kept constant, while the second component odds ratio (OR2) was varied to simulate composite heterogeneity. Low heterogeneity is demonstrated by both ORs showing the same direction of treatment effect, moderate heterogeneity by a neutral effect in one component, and large heterogeneity by ORs with opposite patterns of risk.
Balance of the components: Simulations included cases where the components occurred equally (1:1) or unequally. For the unequal case, the composite outcome contained one component that occurred three or five times more often than the other.
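The two design factors above can be laid out as a small scenario grid. In the sketch below, OR1 = 0.65, the moderate pattern OR2 = 1.00, and the 1:1, 3:1 and 5:1 balance ratios come from the text; the OR2 values for the low and large patterns, the control event rate, and the choice of component 1 as the more frequent one are hypothetical placeholders chosen only for illustration. The helper converts an odds ratio and a control event rate into the treated-arm event rate needed to simulate each component.

```python
from typing import Dict, List


def treated_risk(odds_ratio: float, control_risk: float) -> float:
    """Treated-arm event probability implied by an odds ratio and control risk."""
    odds = odds_ratio * control_risk / (1 - control_risk)
    return odds / (1 + odds)


OR1 = 0.65  # held constant, as in the text

# OR2 per heterogeneity pattern; only the moderate value is from the text.
OR2_BY_PATTERN = {"low": 0.80, "moderate": 1.00, "large": 1.30}

# Ratio of occurrence of component 1 to component 2 (from the text).
BALANCE_RATIOS = [1, 3, 5]

BASE_CONTROL_RISK = 0.04  # hypothetical control rate of the rarer component


def build_scenarios() -> List[Dict[str, float]]:
    """One scenario per combination of heterogeneity pattern and balance ratio."""
    scenarios = []
    for pattern, or2 in OR2_BY_PATTERN.items():
        for ratio in BALANCE_RATIOS:
            control_1 = BASE_CONTROL_RISK * ratio  # assumed more frequent
            control_2 = BASE_CONTROL_RISK
            scenarios.append({
                "pattern": pattern,
                "ratio": ratio,
                "control_1": control_1,
                "treated_1": treated_risk(OR1, control_1),
                "control_2": control_2,
                "treated_2": treated_risk(or2, control_2),
            })
    return scenarios


scenarios = build_scenarios()
```

Each row then supplies the four event probabilities needed to generate one simulated trial.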
Multivariate correlated binary data were generated using the method described in Park et al. Sums of independent Poisson random variables were generated that share components, such that the resulting sums are multiple correlated Poisson variables. Indicator functions were used to transform these variables into correlated binary data with the desired correlation structure.
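For two components, the generation scheme can be sketched as follows. This is a minimal illustration of the shared-Poisson construction for a single pair of binary variables with event probabilities p1 and p2 and a non-negative correlation rho; the function names, the parameter values, and the simple Poisson sampler are assumptions made for this example and are not the simulation code used in the paper. Each binary outcome equals 1 when its Poisson sum is zero, and the shared Poisson term induces the correlation.

```python
import math
import random
from typing import List, Tuple


def poisson(rng: random.Random, rate: float) -> int:
    """Draw a Poisson variate by Knuth's method (fine for small rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def correlated_binary_pairs(p1: float, p2: float, rho: float,
                            n: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Generate n pairs of binary outcomes with means p1, p2 and correlation rho."""
    a11, a22 = -math.log(p1), -math.log(p2)
    a12 = math.log(1 + rho * math.sqrt((1 - p1) * (1 - p2) / (p1 * p2)))
    if rho < 0 or a12 > min(a11, a22):
        raise ValueError("rho is outside the range this construction supports")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        shared = poisson(rng, a12)               # component common to both sums
        z1 = poisson(rng, a11 - a12) + shared    # Poisson with rate a11
        z2 = poisson(rng, a22 - a12) + shared    # Poisson with rate a22
        pairs.append((int(z1 == 0), int(z2 == 0)))
    return pairs


def sample_correlation(pairs: List[Tuple[int, int]]) -> float:
    """Pearson correlation between the two binary outcomes."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    m12 = sum(a * b for a, b in pairs) / n
    return (m12 - m1 * m2) / math.sqrt(m1 * (1 - m1) * m2 * (1 - m2))


# Hypothetical settings: 10% event rate for each component, correlation 0.2.
pairs = correlated_binary_pairs(0.10, 0.10, 0.20, n=40000, seed=1)
```

The construction works because the probability that a Poisson variable with rate a equals zero is exp(-a), so the rates above reproduce p1 and p2 marginally, while the shared term raises the probability that both outcomes occur together.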
[Table: Power to detect heterogeneity between the two components of a composite outcome by degree of heterogeneity (equal balance among components) with OR1 = 0.65; columns include the composite overall OR]
[Table: Power for detecting heterogeneity of treatment effect by varying degrees of balance among the components of the composite for a moderate heterogeneity pattern (OR1, OR2) = (0.65, 1.00) and ratio (p1:p2) of occurrence of components 1 and 2]
[Table: Comparison of power for the main treatment effect with power for the interaction test, using the population average model (GEE); rows for OR1 = 0.65, OR1 = 0.70 and OR1 = 0.75]
These simulations demonstrate that, of the four methods investigated, the population average (GEE) model generally has the greatest power to detect composite outcome treatment heterogeneity. This is further supported by the conclusion that population average models (GEE) are the most powerful test among possible methods for analyzing cluster randomized trials data. It should be noted that the GEE and random effects models do not estimate the same parameters, since GEE is a marginal model and the random effects model allows the estimation of individual effects. For effect estimation the GEE models are known to bias model parameter estimates towards the null, but at the same time have smaller parameter standard deviations compared to random effects models. Since the focus for this application is on the test statistic itself, rather than estimation, it seems reasonable that the population average model would have the greatest power. We found only one exception to this conclusion. When there was a large imbalance between the two composite components, where the most frequent of these had the smaller treatment effect, the weighted regression model had higher power, with the population average (GEE) model being second. We should also consider the fact that the GEE model was somewhat liberal in its Type I error rate for the case of no composite outcome heterogeneity.
Even small amounts of component heterogeneity can reduce study power to detect a treatment effect for the composite outcome. However, we did find regions where the power of both the test for the composite outcome and the test for composite heterogeneity was greater than 50%. This indicates a range of results where tests for composite heterogeneity would be useful. One may want to perform a test of composite outcome heterogeneity only when the main effect is statistically significant. Regardless of the statistical significance of the composite outcome, however, a test for composite heterogeneity may provide insight into the differing mechanisms for each component outcome. This information could then aid in the design of future trials. However, for the current trial, the presence of composite heterogeneity should never lead researchers to assume that the composite outcome as a whole would have been statistically significant if only the mix of components were slightly altered.
The use of models for correlated binary data to explore composite outcome heterogeneity has some important advantages. It can easily be implemented in common statistical software packages using currently available repeated/recurrent outcomes methods. The methodology suggested in this manuscript can be generalized to other outcome types in addition to binary, including continuous outcomes, time to first event and time to recurrent events. Given the ease of implementation and application to a variety of outcome types, trialists may be encouraged to address the issue of potential composite heterogeneity more often and more directly in the presentation of trial results.
There are limitations to the results presented here. We have not explored differing event rates, component correlations, extreme imbalance in component ratios, or the effects of more than two composite components. This area will require more research, and such simulations could be a productive exercise when designing a randomized clinical trial. The methods presented would not be appropriate to use when the composite components are expected to show differing treatment directions, as in a risk-benefit composite outcome. Lastly, failure to detect statistically significant composite heterogeneity may be a result of low power, rather than true treatment homogeneity across the composite components. Trialists would be wise to consider the power to detect composite heterogeneity in the design of trials with composite outcomes.
The methods of exploring composite outcome heterogeneity directly, using the tests described here, may partially address the concerns raised about using composite outcomes in many fields. When reporting trial results, it would seem reasonable to expect to see such a test for composite heterogeneity presented alongside a statistically significant treatment effect test for the composite outcome.
We compared the power of different tests to detect composite heterogeneity for treatment effect across components of a composite binary outcome. Simulations were done comparing four different models commonly used to analyze correlated binary data. The results of these simulations are quite clear. Generally, the GEE model should be chosen for investigating possible heterogeneity among the components of a binary composite outcome, since it demonstrated the greatest power. This is particularly true when the power for the test of treatment effect on the composite outcome as a whole was also reasonably high. It is recommended that tests of composite heterogeneity accompany the publication of results for statistically significant composite outcomes, along with results for the individual components of those outcomes. Further simulations are still required to explore the impact on power of differing event rates, component correlations, extreme imbalance in component ratios, and the effects of more than two composite components.
- Leon MB, Teirstein PS, Moses JW, Tripuranenin P, Lansky AJ, Jani S, Wong SC, Fish D, Ellis S, Holmes DR, Kerieakes D, Kuntz RE: Localized intracoronary gamma-radiation therapy to inhibit the recurrence of restenosis after stenting. The New England Journal of Medicine. 2001, 344: 250-6. 10.1056/NEJM200101253440402.
- Montori VM, Busse JW, Permanyer-Miralda G, Ferreira I, Guyatt GH: How should clinicians interpret results reflecting the effect of an intervention on composite endpoints: Should I dump this lump?. ACP Journal Club. 2005, 143: A-8-9.
- Montori VM, Permanyer-Miralda G, Ferreira-Gonzalez I, Busse JW, Pacheco-Huergo V, Bryant D, Alonso J, Akl EA, Domingo-Salvany A, Mills E, Wu P, Schunemann HJ, Jaeschke R, Guyatt GH: Validity of composite outcomes in clinical trials. British Medical Journal. 2005, 330: 594-6. 10.1136/bmj.330.7491.594.
- Ferreira-Gonzalez I, Busse JW, Heels-Ansdell, Montori VM, Akl EA, Bryant DM, Alonso-Coello P, Alonso J, Worster A, Upadhye S, Jaeschke R, Schunemann HJ, Permanyer-Miralda G, Pacheco-Huergo V, Domingo-Salvany A, Wu P, Mills EJ, Guyatt GH: Problems with use of composite end points in cardiovascular trials: systematic review of randomized controlled trials. British Medical Journal. 2007, 334 (7597): 786. 10.1136/bmj.39136.682083.AE.
- DeMets DL, Califf RM: Lessons learned from recent cardiovascular clinical trials: Part I. Circulation. 2002, 106: 746-51. 10.1161/01.CIR.0000023219.51483.66.
- Neaton JD, Gray G, Zuckerman BD, Konstam MA: Key issues in end point selection for heart failure trials: Composite end points. Journal of Cardiac Failure. 2005, 11: 567-75. 10.1016/j.cardfail.2005.08.350.
- Moye LA: Multiple analyses in clinical trials. 2003, New York: Springer
- Bergman S, Feldman LS, Barkun JS: Evaluating surgical outcomes. Surgical Clinics of North America. 2006, 86: 129-49. 10.1016/j.suc.2005.10.007.
- Califf RM, Harrelson-Woodlief L, Topol EJ: Left ventricular ejection fraction may not be useful as an end point of thrombolytic therapy comparative trials. Circulation. 1990, 82: 1847-53.
- Braunwald E, Cannon CP, McCabe CH: An approach to evaluating thrombolytic therapy in acute myocardial infarction. The 'unsatisfactory outcome' end point. Circulation. 1992, 86: 683-7.
- Follmann D, Duerr A, Tabet S, Gilber P, Moddie Z, Fast P, Cardinali M, Self S: Endpoints and regulatory issues in HIV vaccine clinical trials. Journal of Acquired Immune Deficiency Syndrome. 2007, 44: 49-60. 10.1097/01.qai.0000247227.22504.ce.
- Hariharan S, McBride MA, Cohen EP: Evolution of endpoints for renal transplant outcome. American Journal of Transplantation. 2003, 3: 933-41. 10.1034/j.1600-6143.2003.00176.x.
- Davis SM, Koch GG, Davis CE, LaVange LM: Statistical approaches to effectiveness measurement and outcome-driven re-randomizations in the clinical antipsychotic trials of intervention effectiveness (CATIE) studies. Schizophrenia Bulletin. 2003, 29: 73-80.
- Tugwell P, Judd MG, Fries JF, Singh G, Wells GA: Powering our way to the elusive side effect: A composite outcome 'basket' of predefined designated endpoints in each organ system should be included in all controlled trials. Journal of Clinical Epidemiology. 2005, 58: 785-90. 10.1016/j.jclinepi.2004.11.028.
- Ross S: Composite outcomes in randomized clinical trial: arguments for and against. American Journal of Obstetrics & Gynecology. 2007, 196: 119e1-e6.
- Huque MF, Sankoh AJ: A reviewer's perspective on multiple endpoint issues in clinical trials. Journal of Biopharmaceutical Statistics. 1997, 7: 545-64. 10.1080/10543409708835206.
- Sankoh AJ, D'Argostina RB, Huque MF: Efficacy endpoint selection and multiplicity adjustment methods in clinical trials with inherent multiple endpoint issues. Statistics in Medicine. 2003, 22: 3133-50. 10.1002/sim.1557.
- Berger V: Improving the information content of categorical clinical trials endpoints. Controlled Clinical Trials. 2002, 23: 502-14. 10.1016/S0197-2456(02)00233-7.
- Hallstrom AP, Litwin PE, Weaver WD: A method of assigning scores to the components of a composite outcome: An example from the MITI trial. Controlled Clinical Trials. 1992, 13: 148-55. 10.1016/0197-2456(92)90020-Z.
- Bjorling LE, Hodges JS: Rule-based ranking schemes for antiretroviral trials. Statistics in Medicine. 1997, 16: 1175-91. 10.1002/(SICI)1097-0258(19970530)16:10<1175::AID-SIM522>3.0.CO;2-G.
- Hardy RJ, Thompson SG: Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine. 1998, 17: 841-56. 10.1002/(SICI)1097-0258(19980430)17:8<841::AID-SIM781>3.0.CO;2-D.
- SAS Institute: SAS version 9.1. SAS Institute, Cary, NC
- Shoukri MM, Chaudhary MA: Analysis of Correlated Data with SAS and R. 2007, London, Chapman & Hall, 3
- Donald A, Donner A: Adjustment to the Mantel-Haenszel chi-squared statistic and odds ratio estimator when the data are clustered. Statistics in Medicine. 1987, 6: 491-9. 10.1002/sim.4780060408.
- Rao JNK, Scott AJ: A simple method for the analysis of clustered binary data. Biometrics. 1992, 48: 577-85. 10.2307/2532311.
- Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. 10.1093/biomet/73.1.13.
- McCullagh P, Nelder JA: Generalized Linear Models. 1989, London: Chapman and Hall
- The Heart Outcomes Prevention Evaluation (HOPE) Study Investigators: Effect of an angiotensin-converting-enzyme inhibitor, ramipril on cardiovascular events in high-risk patients. The New England Journal of Medicine. 2000, 342: 145-53. 10.1056/NEJM200001203420301.
- Park CG, Park T, Shin DW: A simple method for generating correlated binary variates. The American Statistician. 1996, 50: 306-10. 10.2307/2684925.
- Austin PC: A comparison of the statistical power of different methods for the analysis of cluster randomization trials with binary outcomes. Statistics in Medicine. 2007, 26: 3550-65. 10.1002/sim.2813.
- Hosmer DW, Lemeshow S: Applied Logistic Regression. 2000, New York: John Wiley & Sons, Inc
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/10/49/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.