  • Research article
  • Open access

A patient-centered composite endpoint weighting technique for orthopaedic trauma research



This study aimed to address the current limitations of composite endpoints in orthopaedic trauma research by quantifying the relative importance of clinical outcomes common to orthopaedic trauma patients and using those values to develop a patient-centered composite endpoint weighting technique.


A Best-Worst Scaling choice experiment was administered to 396 adult surgically-treated fracture patients. Respondents were presented with ten choice sets, each consisting of three out of ten plausible clinical outcomes. Hierarchical Bayesian modeling was used to determine the utilities associated with the outcomes.


Death was the outcome of greatest importance (mean utility = −8.91), followed by above knee amputation (−7.66), below knee amputation (−6.97), severe pain (−5.90), deep surgical site infection (SSI) (−5.69), bone healing complications (−5.20), and moderate pain (−4.59). Mild pain (−3.30) and superficial SSI (−3.29), on the other hand, were the outcomes of least importance to respondents.


This study revealed that the relative importance patients placed on clinical outcomes followed a logical gradient, with distinct and quantifiable preferences for each possible component outcome. These findings were incorporated into a novel composite endpoint weighting technique.



A commonly used definition of a composite endpoint in clinical research is the occurrence of any one of several study events of interest [1]. Incorporating multiple endpoints into a single metric increases the number of observed events, can avoid issues pertaining to multiplicity, and thus may increase statistical power [1,2,3]. Composite endpoints also enable the inclusion of rare but clinically important outcomes, thereby providing a broader interpretation of the net clinical benefit of a treatment [1].

Composite endpoints have several limitations [4,5,6,7]. The treatment effect of an outcome of high importance but low frequency, such as death, may be muted by the inclusion of more common outcomes of lesser importance, such as a superficial infection [4]. Additionally, in studies that analyze composite endpoints using a traditional time to first event analysis, or other analyses of frequency that only consider the first event, each study participant can have only one event; censoring subsequent events therefore biases treatment effects toward earlier outcomes. Efforts to address these limitations have included weighting techniques such as those utilizing the Delphi method [8], disability-adjusted life years [9, 10], or hierarchical and global ranking systems [11–14]. However, weighting methods that incorporate patient values specific to the target patient population are lacking [11, 12].

Composite endpoints are becoming increasingly common in orthopaedic trauma research. The objective of this study was to address the limitations related to the use of composite endpoints in orthopaedic trauma research. The primary aim was to quantify the utility, and the heterogeneity of utility, of clinical outcomes common to orthopaedic trauma patients using a Best-Worst Scaling experiment. The secondary aim was to use the patient values derived from the Best-Worst Scaling experiment to develop a patient-centered composite endpoint weighting technique that accounts for multiple events per patient. Finally, we provide one hypothetical clinical trial example and several options for how the weights may be applied in practice.


Study design

A Best-Worst Scaling experiment was used to determine the relative importance of common clinical outcomes to orthopaedic trauma patients. Best-Worst Scaling experiments are a type of choice experiment that was first devised for marketing research but has more recently been applied to healthcare research [13, 14]. Choice experiments assume that any product or service, such as a healthcare treatment or clinical outcome, can be described by its characteristics, or attributes [15]. In a Best-Worst Scaling experiment, respondents are presented with a set of three or more attribute levels and then asked to select the best and worst attribute level in each choice set. The utility of each attribute level is then determined based on the probability of respondents choosing one attribute level over others [16]. The mean utility of each attribute level is then reported relative to a single, common reference level. In this study, the calculated utilities were used to produce a weighting technique accounting for the patient-reported importance of orthopaedic clinical outcomes.

Attribute development and survey design

The study was performed at a single Level-1 trauma center in Baltimore and followed the International Society for Pharmacoeconomics and Outcomes Research conjoint analysis practice guidelines [17]. The attributes used in this study were selected through a combination of quantitative and qualitative methods. A literature review identified common components of composite endpoints used in orthopaedic trauma research [18,19,20,21]. Expert consensus was elicited from orthopaedic trauma surgeons at the study location. Finally, semi-structured interviews were conducted with three orthopaedic trauma patients for additional perspective on plausible clinical outcomes. Information gathered from this work informed the final selection of the included attributes and levels deemed most important by our patient and clinician stakeholders. Orthopaedic trauma patient-partners then participated in the development of patient-oriented descriptions of each attribute level. Fig. 1 lists the attributes included in the final Best-Worst Scaling experiment questionnaire. The MaxDiff Design platform in JMP Pro Version 13 (Cary, NC) was used to create a Best-Worst Scaling questionnaire. The respondent burden was reduced using a blocked, balanced, fractional factorial design, based on optimal D-efficiency [22]. The final design included four versions of the questionnaire, each consisting of 10 choice sets. The choice experiment was pilot tested on orthopaedic trauma patients in an outpatient setting to validate respondent comprehension and study feasibility before the final administration.
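The design arithmetic described above (10 choice sets of 3 outcomes drawn from 10 outcomes, so 30 slots and 3 appearances per outcome in a balanced version) can be checked with a short sketch. The cyclic construction below is an illustrative assumption only; it is not the D-efficient design produced by the JMP MaxDiff platform, and the names `cyclic_design` and `appearance_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

N_OUTCOMES = 10   # attribute levels (clinical outcomes) in the experiment
SET_SIZE = 3      # outcomes shown per choice set
N_SETS = 10       # choice sets per questionnaire version


def cyclic_design(n_items=N_OUTCOMES, offsets=(0, 1, 3)):
    """Build one illustrative questionnaire version from a cyclic pattern.

    Choice set i contains outcomes i, i+1 and i+3 (mod n_items).
    This is NOT the study's D-efficient design, only a balanced stand-in.
    """
    return [
        tuple(sorted((start + off) % n_items for off in offsets))
        for start in range(n_items)
    ]


def appearance_counts(design):
    """Count how many choice sets each outcome index appears in."""
    return Counter(item for choice_set in design for item in choice_set)


design = cyclic_design()
counts = appearance_counts(design)
# How often each pair of outcomes is shown together in a choice set.
pair_counts = Counter(pair for s in design for pair in combinations(s, 2))

for number, choice_set in enumerate(design, start=1):
    print(f"choice set {number}: outcomes {choice_set}")
print("appearances per outcome:", sorted(counts.values()))
```

In this stand-in, every outcome is shown the same number of times, which is the frequency balance a blocked, balanced design aims for; a D-efficient design additionally optimizes the precision of the estimated utilities.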

Fig. 1 Description of the attribute levels used in the Best-Worst Scaling questionnaire

Prior to completing the Best-Worst Scaling questionnaire, respondents answered several demographic questions and indicated which orthopaedic complications they had experienced during their post-operative clinical course. This process served to familiarize patients with the description of each attribute level prior to the choice experiment. To ensure face validity for the attribute descriptions, a chart review was performed to compare each patient’s reported post-surgical complications with any complications noted in the electronic medical records. Each choice set included a brief clinical scenario designed to establish a common context in which the post-surgical complications included in the choice sets could occur. Each choice set presented the respondent with three possible attribute levels (clinical outcomes) (see Fig. 2 for a sample choice set), and the respondents were asked to select the best and worst attribute level based on their personal preferences. This process was then repeated for the remaining choice sets (ten in total), with each subsequent choice set containing a different combination of the attribute levels.

Fig. 2 Example of a Best-Worst Scaling experiment choice set used in this study

Eligibility criteria

The Best-Worst Scaling questionnaire was administered from November 2017 to March 2018 to English-speaking patients, 18 years of age or older, with a surgically treated appendicular fracture. Patients were enrolled in the study at an outpatient follow-up appointment, at which time they provided written informed consent and completed the written questionnaires. Electronic medical records were reviewed to assess respondent injuries, treatments, and complications. To ensure adequate statistical power for an a priori defined subgroup analysis by injury location, study participants were purposely sampled to ensure at least 50 participants with each of the following fractures: hand/wrist; upper extremity (proximal to distal ¼ radius/ulna); hip (pelvis, acetabulum, femoral neck, and greater/lesser trochanter); tibia/femur (distal to lesser trochanter and proximal to ankle fractures); and foot/ankle.

Statistical analysis of Best-Worst Scaling data

There is no consensus on the appropriate sample size calculation for choice experiments; however, previous research recommends a minimum of 50 respondents in each sub-group included in the analysis [23]. Ten sub-groups with hypothesized divergent outcome preferences were monitored to ensure adequate representation in the sample.

The Best-Worst Scaling statistical analyses were performed using JMP Pro Version 13 (Cary, NC). Patient demographic and clinical characteristics were described using means and standard deviations for continuous variables, and frequencies and proportions for categorical variables. A hierarchical Bayesian multinomial logit model was used to estimate the utility for each of the included clinical outcomes. This technique derives posterior estimates of the respondent’s utility based on the distribution of coefficients across the study sample and the individual respondent’s utility coefficients. Model parameters were calculated iteratively using Gibbs sampling. We ran 10,000 iterations, including 5000 burn-in iterations. The respondent-level covariates were estimated based on the algorithm described by Train, which incorporates Adaptive Bayes and Metropolis-Hastings approaches [24]. The likelihood function for the utility parameters for a given respondent is based on a model for each subject’s preference within a choice set, given the attributes in the choice set [25]. The parameters for each attribute level represent the mean of these iterations, and the utility of each included outcome estimates the strength and direction of the respondents’ preference towards a given outcome. The utility estimates for a specific outcome derived in the model have no direct interpretation and can only be interpreted relative to another utility estimate in the model. We set the mean utility at zero for perfect health; all other possible outcomes are then presented as negative utilities.
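To make the link between utilities and choices concrete, the sketch below evaluates the standard conditional logit choice probability for a single choice set, using the mean utilities reported in this article. It is a simplified illustration under stated assumptions: the exact likelihood implemented in JMP is not described here, modeling the "worst" choice with negated utilities is one common convention rather than a confirmed detail of the study, and the function names are hypothetical. The final lines also show why the estimates are only interpretable relative to one another: shifting every utility by a constant leaves the probabilities unchanged.

```python
import math

# Mean utilities reported in this study; perfect health is the reference (0).
UTILITIES = {
    "perfect_health": 0.0,
    "superficial_ssi": -3.29,
    "mild_pain": -3.30,
    "moderate_pain": -4.59,
    "bone_healing_complication": -5.20,
    "deep_ssi": -5.69,
    "severe_pain": -5.90,
    "below_knee_amputation": -6.97,
    "above_knee_amputation": -7.66,
    "death": -8.91,
}


def _softmax(values):
    """Numerically stable softmax over a dict of name -> value."""
    top = max(values.values())
    exp_v = {k: math.exp(v - top) for k, v in values.items()}
    total = sum(exp_v.values())
    return {k: e / total for k, e in exp_v.items()}


def best_choice_probs(choice_set, utilities=UTILITIES):
    """Conditional logit probability that each outcome is picked as 'best'."""
    return _softmax({name: utilities[name] for name in choice_set})


def worst_choice_probs(choice_set, utilities=UTILITIES):
    """Assumed convention: 'worst' picks follow a logit on negated utilities."""
    return _softmax({name: -utilities[name] for name in choice_set})


example_set = ["mild_pain", "deep_ssi", "death"]
p_best = best_choice_probs(example_set)
p_worst = worst_choice_probs(example_set)

# Adding the same constant to every utility does not change the probabilities.
shifted = {name: u + 10.0 for name, u in UTILITIES.items()}
p_best_shifted = best_choice_probs(example_set, shifted)

for name in example_set:
    print(f"{name}: P(best)={p_best[name]:.3f}, P(worst)={p_worst[name]:.3f}")
```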

To test heterogeneity in respondents’ utility for each included clinical outcome, ten demographic and injury-specific covariates were independently tested as interaction terms in the primary model. To adjust for ten statistical tests, we set the level of significance for the interaction terms at α = 0.05/10 = 0.005. Only covariates with a significant independent interaction were jointly tested, again at an α = 0.005 level of significance. If a significant interaction was observed in the joint testing, a stratified analysis was performed for the covariate and outcomes using a one-way analysis of variance (ANOVA) test. Significant associations between the covariates and a specific outcome at α = 0.05 in the ANOVA test were further tested using a Tukey-Kramer post hoc test (Tukey JW: The problem of multiple comparisons, Unpublished; [26]). To determine if experiencing a clinical outcome is associated with a different utility for that outcome, we stratified respondents by those who had and had not experienced the outcome. The respondent-level utilities for the outcome of interest were then compared using a Student’s t-test.
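The multiplicity adjustment above is a Bonferroni correction. A minimal sketch of the screening step follows; the p-values are hypothetical placeholders chosen purely for illustration and are not results from this study, and the function names are assumptions.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests


def significant_interactions(p_values, family_alpha=0.05):
    """Return the covariates whose interaction p-value clears the adjusted level."""
    alpha = bonferroni_alpha(family_alpha, len(p_values))
    return [name for name, p in p_values.items() if p < alpha]


# HYPOTHETICAL p-values for ten covariates (not taken from the study).
hypothetical_p = {
    "covariate_1": 0.0004, "covariate_2": 0.0300, "covariate_3": 0.0049,
    "covariate_4": 0.2000, "covariate_5": 0.0051, "covariate_6": 0.7500,
    "covariate_7": 0.0010, "covariate_8": 0.0600, "covariate_9": 0.4100,
    "covariate_10": 0.0200,
}

adjusted_alpha = bonferroni_alpha(0.05, len(hypothetical_p))
selected = significant_interactions(hypothetical_p)
print(f"adjusted alpha = {adjusted_alpha:.3f}")
print("carried forward to joint testing:", selected)
```

Note that a covariate with p = 0.03 would pass an unadjusted 0.05 threshold but is screened out at the adjusted level.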

Derivation of composite endpoint weights

An orthopaedic trauma composite endpoint weighting technique based on the mean utilities of the component outcomes and a modified version of the conditional logit formula described by McFadden [25] is provided below:

$$ W_a = \frac{e^{u_b} + e^{u_i}}{e^{u_a} + e^{u_b} + e^{u_i}} $$

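The weight (W) is calculated separately for each included outcome a, where u is the mean utility of an included outcome and b and i denote the other component outcomes included in the composite. Because the utilities are negative, a more severe outcome contributes a smaller exponential term to the denominator and therefore receives a weight closer to one. A weight calculator, with sub-group adjustment, is included in Additional file 1.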
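As a worked illustration, the formula can be evaluated for the three components of the hypothetical pilon fracture composite using the mean utilities reported in this article. This is a minimal sketch and not the calculator in Additional file 1; the function name `composite_weights` is hypothetical, and extending the three-term formula to any number of components (all other components in the numerator, all components in the denominator) is an assumption about how it generalizes.

```python
import math


def composite_weights(utilities):
    """Weight each component as W_a = sum_{j != a} e^{u_j} / sum_j e^{u_j}.

    `utilities` maps component name -> mean utility (negative values,
    with perfect health as the zero reference).
    """
    exp_u = {name: math.exp(u) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: (total - e) / total for name, e in exp_u.items()}


# Mean utilities reported in this study for the three composite components.
pilon_components = {
    "deep_ssi": -5.69,
    "bone_healing_complication": -5.20,
    "superficial_ssi": -3.29,
}

weights = composite_weights(pilon_components)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

Under these assumptions the weights come out at approximately 0.93 for deep surgical site infection, 0.88 for a bone healing complication, and 0.19 for superficial surgical site infection, so the ordering of the weights follows the ordering of the utilities.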

A hypothetical pilon fracture trial was used to illustrate the application of the proposed weighting technique (Table 1). In this hypothetical trial, 1000 patients are randomized to hypothetical Treatment A (n = 498) or Treatment B (n = 502). Three components (deep surgical site infection, bone healing complication, and superficial surgical site infection) were included in the hypothetical trial’s primary composite endpoint. The effect of Treatment A versus Treatment B on the composite endpoint was then calculated using several unweighted methods, including a Fisher’s Exact Test, time to first event analysis, and a random effects model. For comparison, the treatment effect was also calculated using several methods that accounted for the proposed component weights, including a Wilcoxon Rank Sums test, time to event allowing for weighted repeated events, and a random effects model that accounted for component weights [27]. The effect sizes for the random effects models are reported as odds ratios, and hazard ratios are used for the time to event models [28]. The Probability Index was used to report the treatment effect for the Wilcoxon Rank Sums test [27,28,29]. These analyses were performed using R Version 3.6.1 (Vienna, Austria). All of the data and code for the models are included in Additional files 1 and 2. However, for simplicity, only the unweighted and weighted time to event analyses are reported in the results section.
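The idea behind the weighted comparison, allowing several events per patient and summarizing the difference between arms with a Probability Index, can be sketched as follows. This is one plausible reading rather than the code in the Additional files: summing the weights of a patient's events into a single score is an assumption, the component weights are rounded values obtained by applying the weighting formula to the reported mean utilities, and the eight patients are invented toy data, not the hypothetical trial data set.

```python
# Rounded component weights (assumed; derived from the reported mean utilities).
WEIGHTS = {
    "deep_ssi": 0.93,
    "bone_healing_complication": 0.88,
    "superficial_ssi": 0.19,
}


def patient_score(events, weights=WEIGHTS):
    """Sum the weights of every event a patient experienced (0 if none)."""
    return sum(weights[event] for event in events)


def probabilistic_index(scores_a, scores_b):
    """P(score_A < score_B) + 0.5 * P(score_A == score_B) over all cross-arm pairs.

    Values above 0.5 mean a randomly chosen patient in arm A tends to carry
    a lower weighted event burden than a randomly chosen patient in arm B.
    """
    wins = 0
    ties = 0
    for a in scores_a:
        for b in scores_b:
            if a < b:
                wins += 1
            elif a == b:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_a) * len(scores_b))


# TOY DATA: each inner list holds the events of one patient.
arm_a_events = [[], [], ["superficial_ssi"], ["bone_healing_complication"]]
arm_b_events = [
    [],
    ["deep_ssi"],
    ["superficial_ssi", "bone_healing_complication"],
    ["deep_ssi", "superficial_ssi"],
]

scores_a = [patient_score(events) for events in arm_a_events]
scores_b = [patient_score(events) for events in arm_b_events]
pi_a_vs_b = probabilistic_index(scores_a, scores_b)
print(f"Probability Index (A better than B): {pi_a_vs_b:.4f}")
```

Unlike a time to first event analysis, the score for the last patient in arm B reflects both of that patient's events, so a second, more serious complication is not censored.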

Table 1 Summary of events in a hypothetical pilon fracture trial


Sample characteristics

A total of 428 patients consented for the Best-Worst Scaling questionnaire at their scheduled follow up visits. Of those, 32 patients (7.5%) did not clearly indicate best and worst outcomes in the Best-Worst Scaling choice sets and were omitted from the analysis. The sociodemographic and fracture characteristics of the survey respondents are shown in Table 2. The mean age of the respondents was 48.7 years, and the respondents were more commonly male (58.3%) and white (66.4%). The median time from initial orthopaedic injury to survey completion was four months (IQR: 2–12 months). Nearly half (47.5%) of respondents had a tibia or femur fracture below the lesser trochanter. The most commonly experienced post-surgical outcome was ‘severe pain or discomfort’ (42.2%) followed by ‘bone healing complication’ (31.3%), and ‘moderate pain or discomfort’ (29.3%).

Table 2 Characteristics of study participants

Utilities of the clinical outcomes

The mean utility for each of the included clinical outcomes was scaled relative to “perfect health” (referenced at zero) (Table 3). Of the ten included clinical outcomes, the greatest importance was associated with death (mean utility = −8.91, 95% CI −9.23 to −8.65), followed by an above knee amputation (AKA) (−7.66, 95% CI −7.83 to −7.48). Mild pain (−3.30, 95% CI −3.46 to −3.13) and a superficial surgical site infection (−3.29, 95% CI −3.39 to −3.16) were determined to be the outcomes of least importance to the respondents. There was no overlap in the confidence intervals of the clinical outcomes, except for those of superficial surgical site infection and mild pain, where considerable overlap in their utilities was observed.

Table 3 Utility estimates for all of the included clinical outcomes

Heterogeneity in utilities of clinical outcomes

Ten covariates were independently tested as interaction terms in the primary model. There was no heterogeneity in the respondents’ mean utility of the component outcomes based on sex, time since treatment, the location of their injury, or specifically an open tibia fracture. Statistically significant interactions based on age, race, education level, income level, and health insurance status were observed. The association between these five covariates and the respondents’ utilities for the included clinical outcomes was further tested using a stratified analysis, with the findings reported in Table 4.

Table 4 Heterogeneity in the importance of clinical outcomes by patient characteristics

For each included clinical outcome, the respondent-level utilities for that specific outcome were compared between respondents who had experienced that particular outcome and those who had not. Of the 72 comparisons, only seven demonstrated significantly different mean utilities. Respondents with bone healing complications were less averse to an amputation above the knee (−7.63 vs. −7.67, P = 0.02) compared to other respondents. Respondents with an above knee amputation were more averse to death (−9.50 vs. −8.91, P < 0.01), but less averse to a superficial surgical site infection (−2.07 vs. −3.29, P < 0.01). Respondents with a below knee amputation placed less importance on mild pain (−3.49 vs. −3.30, P = 0.02) and superficial surgical site infection (−2.66 vs. −3.30, P < 0.01) but a greater importance on severe pain (−6.07 vs. −5.90, P = 0.04) compared to the other respondents. Respondents who experienced a superficial surgical site infection had a greater aversion to severe pain (−5.99 vs. −5.89, P = 0.04).

Composite outcome weighting: an example

For the hypothetical pilon fracture trial, the results with the unweighted composite endpoint using a time to first event analysis would have determined that there was no difference between the two treatments (hazard ratio (HR): 1.02, 95% CI 0.83–1.27, P = 0.83) (Fig. 3). When weights are applied to the included component outcomes, and the analysis allows for patients to have more than one event, Treatment A is superior (HR: 0.72, 95% CI 0.57–0.90, P < 0.01). A similar difference in effect size was observed when the data were analyzed using unweighted and weighted random effects models (Additional file 3). However, the treatment effect was not statistically significant when the weights were applied using a global rank approach, and treatment groups were compared using a Wilcoxon Rank Sums test and Probability Index Model.

Fig. 3 Survival curves of an unweighted time to first event analysis (a) and a weighted time to event analysis that allowed for repeated events (b) using the hypothetical pilon fracture data


This study presents a novel composite endpoint weighting technique that includes ten commonly reported orthopaedic trauma clinical outcomes. Hierarchical Bayesian modeling was used to calculate the importance, and the heterogeneity in the importance, of these outcomes in a cohort of nearly 400 orthopaedic trauma patients. Patients consistently ranked clinical outcomes according to a logical gradient ranging from perfect health to death. Some heterogeneity in importance was observed based on respondent age, race, education level, income level, and health insurance provider. We did not observe heterogeneity in responses based on the location of the fracture or time since the initial treatment, suggesting the observed utility estimates and weighting technique have face validity across multiple fracture types and clinical experiences.

To our knowledge, this is the first study to incorporate patient preferences derived from a choice experiment into a composite endpoint weighting technique for orthopaedic outcomes. Other efforts at weighting composite endpoints have included assigning weights based on clinical and research experience [1, 8, 30], hierarchical ranking of outcomes for an entire cohort of patients in a trial [31, 32], and the inclusion of a measure of “importance to patients” assigned by clinical experts [8, 32, 33]. Outside of cardiovascular research, patient surveys on the relative value of component outcomes of composite endpoints have not been incorporated into weighting techniques [11, 12, 34, 35].

This study’s patient-centered composite endpoint weighting technique represents an improvement on previous weighted composite endpoint techniques. This work advances patient-centered outcomes research by weighting study outcomes using responses derived from the study population of interest. For the orthopaedic community, the technique provides a set of ten common clinical outcomes that researchers may incorporate into future composite endpoints. The limited heterogeneity in observed preferences suggests a common value gradient for clinical outcomes that is not altered by the type of fracture or the time since injury, and that varies only slightly with the outcomes experienced. Weightings may be adjusted to reflect the relative importance of an outcome of interest for specific subpopulations when heterogeneity in that subpopulation exists on a specific outcome, such as an above knee amputation among patients over the age of 65.

Additionally, the technique addresses an important limitation of traditional composite outcomes. The weighting formula can be easily applied to several different statistical methods, including time to event analysis, multivariate modeling, or a global rank test [28, 29]. Multiple events can be included for a single patient in any of the three methods. Furthermore, multiple events per patient could be used in a time-to-event analysis, enabling a comparison of the trajectory of clinical outcomes subsequent to treatment [36]. The confidence intervals associated with the mean utility of each clinical outcome allow for a sensitivity analysis of treatment effect based on the distribution of the weightings. In the weighting formula, the weights adjust relative to the components that are included in the composite. The precision of the weights is useful in distinguishing order in a global rank test with several components of similar weight [27, 28].
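Two of the properties described above, that the weights depend on which components are included and that the reported confidence intervals support a sensitivity analysis, can be illustrated with a short sketch. It uses the mean utilities and 95% confidence intervals reported in this article; moving one utility at a time to its confidence bounds is an assumed, deliberately simple form of sensitivity analysis, and the function names are hypothetical.

```python
import math


def composite_weights(utilities):
    """W_a = sum_{j != a} e^{u_j} / sum_j e^{u_j} for each component a."""
    exp_u = {name: math.exp(u) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: (total - e) / total for name, e in exp_u.items()}


# Mean utilities and 95% confidence intervals reported in this study.
MEAN = {
    "death": -8.91,
    "above_knee_amputation": -7.66,
    "superficial_ssi": -3.29,
}
CI = {
    "death": (-9.23, -8.65),
    "above_knee_amputation": (-7.83, -7.48),
    "superficial_ssi": (-3.39, -3.16),
}


def weight_range(outcome, mean=MEAN, ci=CI):
    """Weight of `outcome` when its own utility is moved to each CI bound,
    holding the other components at their mean utilities (assumed approach)."""
    results = []
    for bound in ci[outcome]:
        scenario = dict(mean)
        scenario[outcome] = bound
        results.append(composite_weights(scenario)[outcome])
    return min(results), max(results)


# The same outcome receives a different weight in a different composite.
two_component = composite_weights(
    {name: MEAN[name] for name in ("death", "above_knee_amputation")}
)
three_component = composite_weights(MEAN)

print("AKA weight, death + AKA composite:",
      round(two_component["above_knee_amputation"], 3))
print("AKA weight, death + AKA + superficial SSI composite:",
      round(three_component["above_knee_amputation"], 3))
for name in MEAN:
    low, high = weight_range(name)
    print(f"{name}: weight range across its 95% CI = {low:.4f} to {high:.4f}")
```

Re-running a weighted analysis with the weights at each end of these ranges gives a rough indication of how sensitive a treatment effect is to uncertainty in the utilities.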

Despite the strengths of this study, several limitations must be considered. This study enrolled patients from a single trauma center. While the trauma center has a statewide catchment, sample populations from other regions may vary in their relative importance for the included outcomes. Although respondents may have had a different understanding of clinical outcomes described in the survey, a comparison of patient-reported outcomes with the medical records found 96% accuracy in reporting, suggesting an adequate comprehension of the included clinical outcomes. The questionnaire’s brief descriptions of the clinical outcomes may not have adequately conveyed the magnitude of such an event for a patient and are open to subjective interpretation. However, the overall homogeneity in the importance of the clinical outcomes suggests a consistent understanding by the respondents. Finally, the list of clinical outcomes included in the study is not exhaustive. While there are many other clinical outcomes commonly reported in orthopaedic trauma research, the identification of outcomes included in this analysis was based on a synthesis of the literature and conducted in collaboration with clinical experts and orthopaedic trauma survivors, who confirmed the proposed outcomes were both commonly used and relevant to patients. This weighting technique could be easily expanded to other outcomes and replicated in other health conditions. However, at present, the application of this weighting technique is limited to studies with component outcomes included in our model.


Based on prospectively collected preference data from nearly 400 orthopaedic trauma patients, the study proposes a novel composite endpoint weighting technique. The findings suggest an overall homogeneity among orthopaedic trauma patients in the importance they place on clinical outcomes. This composite endpoint technique applies weights to the component outcomes based on orthopaedic trauma patient preferences and can be applied to several types of statistical comparisons to estimate the clinical benefit of a treatment.

Availability of data and materials

The data supporting the conclusions of this article are included as Supplementary Material within the article.



AKA: Above knee amputation

ANOVA: Analysis of variance

SSI: Surgical site infection


  1. Braunwald E, Cannon CP, McCabe CH. An approach to evaluating thrombolytic therapy in acute myocardial infarction. The 'unsatisfactory outcome' end point. Circ. 1992;86(2):683–7.


  2. Sun H, Davison BA, Cotter G, et al. Evaluating treatment efficacy by multiple end points in phase II acute heart failure clinical trials: analyzing data using a global method. Circ Heart Fail. 2012 Nov;5(6):742–9.


  3. Brown PM, Anstrom KJ, Felker GM, et al. Composite End Points in Acute Heart Failure Research: Data Simulations Illustrate the Limitations. Can J Cardiol. 2016 Nov;32(11):1356.e21–8.


  4. Heneghan C, Goldacre B, Mahtani KR. Why clinical trial outcomes fail to translate into benefits for patients. Trials. 2017;18(1):122.


  5. Ferreira-Gonzalez I, Permanyer-Miralda G, Busse JW, Bryant DM, Montori VM, Alonso-Coello P, et al. Methodologic discussions for using and interpreting composite endpoints are limited, but still identify major concerns. J Clin Epidemiol. 2007;60(7):651–7 discussion 8-62.


  6. Heddle NM, Cook RJ. Composite outcomes in clinical trials: what are they and when should they be used? Transfus. 2011;51(1):11–3.


  7. Montori VM, Permanyer-Miralda G, Ferreira-Gonzalez I, Busse JW, Pacheco-Huergo V, Bryant D, et al. Validity of composite end points in clinical trials. BMJ. 2005;330(7491):594–6.


  8. Armstrong PW, Westerhout CM, Van de Werf F, Califf RM, Welsh RC, Wilcox RG, et al. Refining clinical trial composite outcomes: an application to the assessment of the safety and efficacy of a new Thrombolytic-3 (ASSENT-3) trial. Am Heart J. 2011;161(5):848–54.


  9. Hong KS, Ali LK, Selco SL, Fonarow GC, Saver JL. Weighting components of composite end points in clinical trials: an approach using disability-adjusted life-years. Stroke. 2011;42(6):1722–9.


  10. Global burden of disease 2004 update: disability weights for diseases and conditions. Geneva, Switzerland: World Health Organization; 2004.

  11. Stolker JM, Spertus JA, Cohen DJ, et al. Rethinking composite end points in clinical trials: insights from patients and trialists. Circ. 2014 Oct 7;130(15):1254–61.


  12. Vaanholt MCW, Kok MM, von Birgelen C, et al. Are component endpoints equal? A preference study into the practice of composite endpoints in clinical trials. Health Expect. 2018 Dec;21(6):1046–55.


  13. Finn A, Louviere JJ. Determining the appropriate response to evidence of public concern: the case of food safety. J Public Policy Mark. 1992;11(2):12–25.


  14. Szeinbach SL, Barnes JH, McGhan WF, et al. Using conjoint analysis to evaluate health state preferences. Drug Inf J. 1999;33:849–58.


  15. O'Hara NN, Roy L, O'Hara LM, Spiegel JM, Lynd LD, FitzGerald JM, et al. Healthcare worker preferences for active tuberculosis case finding programs in South Africa: a best-worst scaling choice experiment. PLoS One. 2015;10(7):e0133304.


  16. Louviere J, Lings I, Islam T, Gudergan S, Flynn T. An introduction to the application of (case 1) best–worst scaling in marketing research. Int J Res Mark. 2013;30(3):292–303.


  17. Bridges JF, Hauber AB, Marshall D, Lloyd A, Prosser LA, Regier DA, et al. Conjoint analysis applications in health--a checklist: a report of the ISPOR good research practices for conjoint analysis task force. Value Health. 2011;14(4):403–13.


  18. Bosse MJ, MacKenzie EJ, Kellam JF, Burgess AR, Webb LX, Swiontkowski MF, et al. An analysis of outcomes of reconstruction or amputation after leg-threatening injuries. N Engl J Med. 2002 Dec 12;347(24):1924–31.


  19. Investigators FLOW, Bhandari M, Jeray KJ, Petrisor BA, Devereaux PJ, Heels-Ansdell D, et al. A trial of wound irrigation in the initial management of open fracture wounds. N Engl J Med. 2015 Dec 31;373(27):2629–41.


  20. Fixation using Alternative Implants for the Treatment of Hip fractures (FAITH) Investigators. Fracture fixation in the operative management of hip fractures (FAITH): an international, multicentre, randomised controlled trial. Lancet. 2017 Apr 15;389(10078):1519–27.


  21. Study to Prospectively Evaluate Reamed Intramedullary Nails in Patients with Tibial Fractures Investigators, Bhandari M, Guyatt G, Tornetta P 3rd, Schemitsch EH, Swiontkowski M, et al. Randomized trial of reamed and unreamed intramedullary nailing of tibial shaft fractures. J Bone Joint Surg Am. 2008 Dec;90(12):2567–78.


  22. Kuhfeld WF, Tobias RD, Garratt M. Efficient experimental design with marketing research applications. J Mark Res. 1994 Nov;31(4):545–57.


  23. Cheraghi-Sohi S, Hole AR, Mead N, McDonald R, Whalley D, Bower P, et al. What patients want from primary care consultations: a discrete choice experiment to identify patients' priorities. Ann Fam Med. 2008;6(2):107–15.


  24. Train K. A comparison of hierarchical Bayes and maximum simulated likelihood for mixed logit. Univ California, Berkeley. 2001 Jun;18:1–3.


  25. McFadden D. Conditional logit analysis of qualitative choice behaviour. In: Zarembka P, editor. Frontiers in econometrics. New York: Academic Press; 1974. p. 105–42.


  26. Kramer CY. Extension of multiple range tests to group means with unequal numbers of replications. Biometrics. 1956 Sep 1;12(3):307–10.


  27. Felker GM, Maisel AS. A global rank end point for clinical trials in acute heart failure. Circ Heart Fail. 2010;3:643–6.


  28. Brown PM, Ezekowitz JA. Composite End Points in Clinical Trials of Heart Failure Therapy: How Do We Measure the Effect Size? Circ Heart Fail. 2017 Jan;10(1).

  29. Acion L, Peterson JJ, Temple S, Arndt S. Probabilistic index: an intuitive non-parametric approach to measuring the size of treatment effects. Stat Med. 2006;25:591–602.


  30. Califf RM, Harrelson-Woodlief L, Topol EJ. Left ventricular ejection fraction may not be useful as an end point of thrombolytic therapy comparative trials. Circ. 1990;82(5):1847–53.


  31. Felker GM, Anstrom KJ, Rogers JG. A global ranking approach to end points in trials of mechanical circulatory support devices. J Card Fail. 2008;14(5):368–72.


  32. Follmann D, Wittes J, Cutler JA. The use of subjective rankings in clinical trials with an application to cardiovascular disease. Stat Med. 1992;11(4):427–37 discussion 39-54.


  33. Ferreira-Gonzalez I, Busse JW, Heels-Ansdell D, Montori VM, Akl EA, Bryant DM, et al. Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials. BMJ. 2007;334(7597):786.


  34. Stolker JM, Spertus JA, Cohen DJ, Jones PG, Jain KK, Bamberger E, et al. Rethinking composite end points in clinical trials: insights from patients and trialists. Circ. 2014;130(15):1254–61.


  35. Tong BC, Huber JC, Ascheim DD, Puskas JD, Ferguson TB Jr, Blackstone EH, et al. Weighting composite endpoints in clinical trials: essential evidence for the heart team. Ann Thorac Surg. 2012;94(6):1908–13.


  36. Bakal JA, Westerhout CM, Cantor WJ, Fernandez-Aviles F, Welsh RC, Fitchett D, et al. Evaluation of early percutaneous coronary intervention vs. standard therapy after fibrinolysis for ST-segment elevation myocardial infarction: contribution of weighting the composite endpoint. Eur Heart J. 2013;34(12):903–8.




The authors would like to thank the study participants who participated in the survey.


The study was not supported by external funding.

Author information




UNU conceived and designed the study, collected the data, interpreted the data, and drafted the manuscript. AH conceived and designed the study, interpreted the data, and reviewed the manuscript critically. KF and RCC interpreted the data and reviewed the manuscript critically. MI, DC, DM, and MB conceived the study, collected the data, interpreted the data, and reviewed the manuscript critically. GPS and RVO conceived and designed the study, interpreted the data, and reviewed the manuscript critically. NNO conceived and designed the study, performed the statistical analysis, interpreted the data and drafted the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Nathan N. O’Hara.

Ethics declarations

Ethics approval and consent to participate

Ethical approval was obtained from the University of Maryland Institutional Review Board (HP-00076872) with a waiver of consent per 45 CFR 46.116(d).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Composite weighting calculator.

Additional file 2.

Data for hypothetical pilon fracture trial available in long and wide format.

Additional file 3.

Plausible unweighted and weighted methods of analyses for counts, time to event, and multivariate analysis.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Udogwu, U.N., Howe, A., Frey, K. et al. A patient-centered composite endpoint weighting technique for orthopaedic trauma research. BMC Med Res Methodol 19, 242 (2019).
