- Research article
Empirical comparison of methods for analyzing multiple time-to-event outcomes in a non-inferiority trial: a breast cancer study
© Parpia et al.; licensee BioMed Central Ltd. 2013
- Received: 17 November 2012
- Accepted: 12 March 2013
- Published: 21 March 2013
Subjects with breast cancer enrolled in trials may experience multiple events such as local recurrence, distant recurrence or death. These events are not independent: the occurrence of one may increase the risk of another, or prevent another from occurring. The most commonly used approach, the Cox proportional hazards (Cox-PH) model, ignores the relationships between events, which may affect the estimated treatment effect and the conclusions drawn. Statistical methods for analyzing multiple time-to-event outcomes have been studied mainly in superiority trials, and their application to non-inferiority trials has received little attention. We evaluate four statistical methods for multiple time-to-event endpoints in the context of a non-inferiority trial.
Three methods for analyzing multiple events data, namely i) the competing risks (CR) model, ii) the marginal model and iii) the frailty model, were compared with the Cox-PH model using data from a previously reported non-inferiority trial comparing hypofractionated with conventional radiotherapy for the prevention of local recurrence in patients with early stage breast cancer who had undergone breast conserving surgery. The methods were also compared in two simulated examples: scenario A, where the hazards of distant recurrence and death were higher in the control group, and scenario B, where the hazards of distant recurrence and death were higher in the experimental group. Both scenarios were designed with a non-inferiority margin of 1.50 on the hazard ratio scale.
In the breast cancer trial, all methods produced primary outcome results similar to those of the Cox-PH model, which gave a local recurrence hazard ratio (HR) of 0.95 with a 95% confidence interval (CI) of 0.62 to 1.46. In Scenario A, non-inferiority was shown with the Cox-PH model (HR = 1.04; CI 0.80 to 1.35) but not with the CR model (HR = 1.37; CI 1.06 to 1.79), and the average marginal and frailty models showed a positive effect of the experimental treatment. Scenario B showed the opposite pattern: non-inferiority was shown with the CR model (HR = 1.10; CI 0.87 to 1.39) but not with the Cox-PH model (HR = 1.46; CI 1.15 to 1.85), and the marginal and frailty models showed a negative effect of the experimental treatment.
When subjects are at risk of multiple events in non-inferiority trials, researchers need to consider using the CR, marginal and frailty models in addition to the Cox-PH model, to provide additional information on the disease process and to assess the robustness of the results. In the presence of competing risks, the Cox-PH model is appropriate for investigating the biologic effect of treatment, whereas the CR model yields the actual effect of treatment in the study.
- Cox model
- Marginal model
- Frailty model
- Competing risks
Randomized controlled trials are considered the gold standard for evaluating therapeutic interventions in many diseases, including cancer. Unlike trials in many other diseases, cancer trials typically follow subjects beyond the planned intervention, often for many years. During this time, subjects may be at risk of several events. For example, subjects in breast cancer trials can experience local recurrence in the treated breast, distant recurrence, death or a combination of these. In most trials, only one of these events is the primary outcome and the others are secondary outcomes. The occurrence of multiple events per subject over a period of time is sometimes referred to as event history data.
One of the most commonly used statistical approaches for analyzing such data is the Cox proportional hazards (Cox-PH) model, which models the time from randomization to a specific event. However, analyzing each outcome separately with the Cox-PH model does not make use of all the available information, because it ignores plausible relationships or correlations between events. For instance, experiencing one event may increase the risk of experiencing another. Conversely, the occurrence of one event may prevent others from occurring, a situation known as competing risks. Standard survival analysis techniques have been shown to give biased results in such circumstances [4–6]. The estimated effect of treatment may therefore differ depending on whether intermediate events are incorporated into the analysis.
Several statistical methods exist to analyze event history data. These include the Cox-PH model, the competing risks (CR) model, the marginal model and the frailty model. Most research has focused on using these methods in superiority trials, where the intervention is expected to be superior to the standard treatment; their application to non-inferiority trials has received little attention. Marginal and frailty models are efficient methods of estimating the treatment effect in studies where patients have multiple events of the same type, such as recurrent asthma attacks. They are also used in studies where treatment may affect multiple events through the same biological pathway. Research on CR models in superiority trials has shown that the Kaplan-Meier approach, which treats competing events as censored, overestimates the event rate in the presence of competing risks. However, the relative treatment effect from the CR model remains essentially the same as that from the Cox-PH model unless treatment affects the competing event.
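The overestimation by the Kaplan-Meier approach can be seen on a small example. The sketch below is a minimal illustration on hypothetical toy data (cause 1 = event of interest, 2 = competing event, 0 = censored), not data from the trial. It compares one minus the Kaplan-Meier estimate, which treats competing events as censored, with the Aalen-Johansen estimate of the cumulative incidence function; the function names are our own.

```python
def one_minus_km(times, causes, cause=1):
    """1 - Kaplan-Meier for `cause`, treating every other outcome as censored."""
    surv = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, ci in zip(times, causes) if ti == t and ci == cause)
        surv *= 1.0 - d / at_risk
    return 1.0 - surv


def cumulative_incidence(times, causes, cause=1):
    """Aalen-Johansen cumulative incidence of `cause` in the presence of competing events."""
    event_free = 1.0  # Kaplan-Meier probability of no event of any type just before t
    cif = 0.0
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        d_cause = sum(1 for ti, ci in zip(times, causes) if ti == t and ci == cause)
        d_any = sum(1 for ti, ci in zip(times, causes) if ti == t and ci != 0)
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
    return cif


# Hypothetical toy data (not from the trial)
times = [1, 2, 3, 4, 5, 6, 7, 8]
causes = [2, 1, 2, 1, 0, 2, 1, 0]
naive = one_minus_km(times, causes)
cif = cumulative_incidence(times, causes)
print(f"1 - KM = {naive:.3f}, cumulative incidence = {cif:.3f}")
```

Because subjects with a competing event are treated as if they could still have the event of interest later, the Kaplan-Meier complement is never smaller than the cumulative incidence.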
Non-inferiority randomized trials generally compare the standard treatment with a new treatment that is expected to be less toxic, less expensive or less invasive, but “no worse” than the standard treatment in terms of clinical outcome, within a pre-specified tolerance margin.
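For a hazard ratio of experimental versus control, non-inferiority is concluded when the upper bound of the two-sided 95% CI lies below the margin. The sketch below applies that rule to the HRs reported in this article, using the 1.76 margin derived for the Hypofractionation Trial and the 1.50 margin of the simulated scenarios. The helper names are hypothetical, and `wald_ci_hr` only illustrates how such intervals are formed on the log scale.

```python
import math


def wald_ci_hr(log_hr, se, z=1.959964):
    """95% Wald CI for a hazard ratio: exp(log HR -/+ z * SE of log HR)."""
    return math.exp(log_hr - z * se), math.exp(log_hr + z * se)


def non_inferior(ci_upper, margin):
    """Non-inferiority holds when the upper CI bound of HR (E vs C) is below the margin."""
    return ci_upper < margin


# (HR, lower 95% bound, upper 95% bound, margin) as reported in this article
reported = {
    "Trial, local recurrence": (0.95, 0.62, 1.46, 1.76),
    "Scenario A, Cox-PH": (1.04, 0.80, 1.35, 1.50),
    "Scenario A, CR": (1.37, 1.06, 1.79, 1.50),
    "Scenario B, Cox-PH": (1.46, 1.15, 1.85, 1.50),
    "Scenario B, CR": (1.10, 0.87, 1.39, 1.50),
}
decisions = {label: non_inferior(upper, margin)
             for label, (hr, lower, upper, margin) in reported.items()}
for label, ok in decisions.items():
    print(f"{label}: non-inferior = {ok}")
```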
The purpose of this manuscript is to empirically compare these approaches in the analysis of a non-inferiority trial in which a subject can experience more than one type of outcome event. We also compare the methods using simulated trial examples. We first give a brief overview of the methods, and then apply them to a previously reported randomized trial of hypofractionated radiotherapy in patients with breast cancer [11, 12] and to the simulated trial examples. For the purposes of this study, we consider the Cox-PH model for each type of event as the primary analysis.
Cox proportional hazards (PH) model
The Cox-PH model for the hazard of an event is

$$\lambda_i(t \mid X) = \lambda_0(t)\exp(\beta X)$$

where $\lambda_i(t \mid X)$ is the hazard of subject i at time t conditional on covariate X, $\lambda_0(t)$ is the baseline hazard at time t, X is the covariate (e.g. 1 = experimental group, 0 = control group), and β is the coefficient representing the effect of treatment, assumed constant over time. In cancer trials, this constant treatment effect is reported as the ratio of the hazard in the experimental group to that in the control group, the hazard ratio (HR), given by exp(β).
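To make the role of β concrete, the sketch below maximizes the Cox partial likelihood for a single binary treatment covariate by Newton-Raphson, using Breslow's handling of tied event times. It is a minimal illustration on hypothetical toy data, not the software used in this study (SAS and R), and the function names are our own.

```python
import math


def score_info(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood (Breslow ties, binary x)."""
    score, info = 0.0, 0.0
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue  # censored subjects contribute only through the risk sets
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]  # still at risk at ti
        s0 = sum(math.exp(beta * xj) for xj in risk)
        s1 = sum(xj * math.exp(beta * xj) for xj in risk)
        p = s1 / s0  # weighted proportion of experimental subjects in the risk set
        score += xi - p
        info += p * (1.0 - p)  # weighted variance of a 0/1 covariate
    return score, info


def fit_cox_binary(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson maximisation of the partial likelihood; returns (beta, SE of beta)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_info(beta, times, events, x)
    return beta, 1.0 / math.sqrt(info)


# Hypothetical toy data: x = 1 experimental, 0 control; event = 1 observed, 0 censored
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 0, 1, 1, 1, 0, 1]
x = [1, 0, 1, 0, 1, 1, 0, 0]
beta, se = fit_cox_binary(times, events, x)
print(f"HR = exp(beta) = {math.exp(beta):.2f}, SE(log HR) = {se:.2f}")
```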
Competing risks (CR) model
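The CR model of Fine and Gray models the hazard of the sub-distribution of each cause j:

$$\lambda^{*}_{j}(t \mid X) = \lambda^{*}_{0j}(t)\exp(\beta^{*}_{j} X)$$

where $\lambda^{*}_{j}(t \mid X)$ is the hazard of the sub-distribution for cause j, $\lambda^{*}_{0j}(t)$ is the baseline hazard of the sub-distribution, and $\beta^{*}_{j}$ is the treatment effect on the sub-distribution. Unlike the cause-specific hazard, the sub-distribution hazard keeps subjects who have experienced a competing event in the risk set. This model reduces to the standard Cox-PH model when competing risks are absent.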
The marginal model of Wei, Lin and Weissfeld (WLW) fits a Cox-PH model for each event type j:

$$\lambda_{ij}(t \mid X) = \lambda_{0j}(t)\exp(\beta_j X)$$

Stratification by event j allows the baseline hazard $\lambda_{0j}(t)$ to differ across events. In addition, treatment-by-event interactions allow event-specific treatment effects $\beta_j$ to be estimated [18, 19]. The WLW model also estimates the ‘average effect’ of treatment on all events as a weighted average of the $\hat{\beta}_j$, which we will call the average WLW model. Dependence between event times observed on the same subject is accounted for by a robust sandwich estimate of the variance. In the presence of competing risks, the WLW model models both the marginal hazard for death and the cause-specific hazard for recurrences.
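The source does not spell out the weights used for the average WLW effect. One common choice, assumed here, combines the event-specific log HR estimates with weights proportional to V⁻¹1, where V is their robust covariance matrix; this is the minimum-variance linear combination whose weights sum to one. The function name and all numbers below are hypothetical.

```python
import numpy as np


def wlw_average(beta_hat, robust_cov):
    """Weighted average of event-specific log HRs with weights V^{-1} 1 / (1' V^{-1} 1).

    Returns the average log HR, its standard error and the weights."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    v_inv = np.linalg.inv(np.asarray(robust_cov, dtype=float))
    ones = np.ones(len(beta_hat))
    denom = ones @ v_inv @ ones          # 1' V^{-1} 1
    weights = v_inv @ ones / denom       # weights sum to one
    avg = float(weights @ beta_hat)
    se = float(1.0 / np.sqrt(denom))     # Var = w' V w = 1 / (1' V^{-1} 1)
    return avg, se, weights


# Hypothetical log HRs for local recurrence, distant recurrence and death,
# with a hypothetical robust covariance matrix (positively correlated estimates)
beta_hat = [0.10, -0.05, 0.02]
robust_cov = [[0.040, 0.006, 0.004],
              [0.006, 0.010, 0.005],
              [0.004, 0.005, 0.010]]
avg, se, weights = wlw_average(beta_hat, robust_cov)
lower, upper = np.exp(avg - 1.96 * se), np.exp(avg + 1.96 * se)
print(f"average HR = {np.exp(avg):.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With a diagonal V this reduces to ordinary inverse-variance weighting; the off-diagonal terms of the sandwich estimate are what account for the correlation between events.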
The frailty model extends the Cox-PH model with a subject-level random effect:

$$\lambda_{ij}(t \mid X, \gamma_i) = \gamma_i\,\lambda_0(t)\exp(\beta X)$$

where $\gamma_i$ is the frailty of subject i, which acts multiplicatively on the hazard and can also be used to model associations between event times. A frailty above 1 indicates excess risk for that subject, and a larger variance of the frailty distribution corresponds to a stronger correlation between event times within a subject [9, 21, 22]. The model assumes that event times within a subject are independent given the frailty. As with other random-effects models, the estimated treatment effect is subject-specific (conditional on the frailty) rather than population-averaged. Several books provide excellent reviews of frailty models [9, 21, 23].
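The link between the frailty distribution and within-subject correlation can be checked by simulation. The sketch below assumes a gamma frailty with mean 1 and variance θ, and two event times per subject that are independent exponentials given the frailty, with a hypothetical common rate. Under a shared gamma frailty, Kendall's tau between the two times is θ/(θ + 2), so a larger frailty variance means a stronger correlation. It requires NumPy and SciPy; the function name is our own.

```python
import numpy as np
from scipy.stats import kendalltau


def simulate_shared_gamma_frailty(theta, n, rate=0.1, seed=2013):
    """Two event times per subject sharing a gamma frailty (mean 1, variance theta).

    Given the frailty, the times are independent exponentials with hazard frailty * rate."""
    rng = np.random.default_rng(seed)
    frailty = rng.gamma(shape=1.0 / theta, scale=theta, size=n)
    scale = 1.0 / (frailty * rate)  # exponential mean = 1 / hazard
    t1 = rng.exponential(scale)
    t2 = rng.exponential(scale)
    return t1, t2


for theta in (0.25, 1.0, 4.0):
    t1, t2 = simulate_shared_gamma_frailty(theta, n=4000)
    tau, _ = kendalltau(t1, t2)
    print(f"theta = {theta}: simulated tau = {tau:.2f}, "
          f"theoretical tau = {theta / (theta + 2):.2f}")
```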
The hypofractionation trial
Between April 1993 and September 1996, 1234 patients with early stage breast cancer who had undergone breast conserving surgery were randomly allocated to receive either 42.5 Gray of radiotherapy in 16 fractions (the experimental arm) or 50 Gray in 25 fractions (the standard arm) to the breast for the prevention of local breast recurrence; details and long-term results are described elsewhere [11, 12]. The primary outcome, local recurrence, was compared using a point-in-time comparison of local recurrence failure probabilities at five and 10 years [11, 12].
For the purpose of this paper, HRs rather than point-in-time failure probabilities will be used. The Hypofractionation Trial was designed assuming a local recurrence rate of 7% at 5 years in the control arm. The non-inferiority margin was set at 5 percentage points, tolerating an increase in local recurrence to 12% in the experimental arm. Under proportional hazards, the probability of remaining free of local recurrence in the experimental arm equals that in the control arm raised to the power HR, so this translates into HR = ln(0.88)/ln(0.93) = 1.76. Additional events of interest were distant recurrence, new primary cancer and death. Because new primaries are difficult to distinguish from distant recurrences, these are combined in the distant recurrence category. In addition, we consider only the first occurrence of each type of event.
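The conversion above follows from the proportional hazards assumption: if S_C(t) and S_E(t) are the probabilities of remaining free of local recurrence at time t, then S_E(t) = S_C(t)^HR, so HR = ln S_E(t) / ln S_C(t). A minimal sketch (the function name is hypothetical) reproduces the 1.76 margin of the Hypofractionation Trial and the 1.50 margin of the simulated trials.

```python
import math


def margin_hr(control_rate, tolerated_rate):
    """HR margin implied by fixed-time event proportions under proportional hazards:
    S_E = S_C ** HR  =>  HR = ln(S_E) / ln(S_C), where S = 1 - event proportion."""
    return math.log(1.0 - tolerated_rate) / math.log(1.0 - control_rate)


print(f"{margin_hr(0.07, 0.12):.2f}")   # Hypofractionation Trial: 7% vs 12% at 5 years -> 1.76
print(f"{margin_hr(0.10, 0.146):.2f}")  # simulated trials: 10.0% vs 14.6% at 5 years -> 1.50
```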
Suppose that a randomized non-inferiority trial similar to the Hypofractionation Trial were designed to demonstrate that an experimental therapy E is non-inferior to a control therapy C for the prevention of local recurrence in a subset of breast cancer patients. Assuming a five-year local recurrence rate of 10.0% in the control arm and a maximum tolerable five-year rate of 14.6% in the experimental arm (HR = 1.50), 1000 patients per treatment arm would be required to give 90% power with a one-sided alpha of 0.025.
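The source does not state how this sample size was calculated. As a rough illustration only, Schoenfeld's approximation gives the number of local recurrence events needed to rule out the 1.50 margin with a one-sided alpha of 0.025 and 90% power, assuming equal allocation and a true HR of 1. Converting events into patients (here, 1000 per arm) depends on accrual, follow-up and event-rate assumptions that this sketch does not reproduce; the function name is our own.

```python
import math
from statistics import NormalDist


def schoenfeld_events(margin_hr, alpha_one_sided=0.025, power=0.90,
                      alloc=0.5, true_hr=1.0):
    """Events needed to reject H0: HR >= margin_hr when the true HR is true_hr
    (Schoenfeld's approximation; `alloc` is the proportion randomised to one arm)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    log_distance = math.log(margin_hr) - math.log(true_hr)
    return (z_alpha + z_beta) ** 2 / (alloc * (1.0 - alloc) * log_distance ** 2)


events = math.ceil(schoenfeld_events(1.50))
print(f"local recurrence events required: {events}")
```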
Hazards for simulated scenarios of non-inferiority trials (LR = local recurrence, MR = distant recurrence, DT = Death)
For the Cox-PH model, we structured the data in a “wide” format (i.e. one record per subject) and fit a separate Cox-PH model for each event. In the local recurrence model, observations are censored at death or distant recurrence; in the distant recurrence model, observations are censored at death and local recurrence is ignored; in the death model, recurrences are ignored. Similarly, for the CR approach we fit Fine and Gray’s model for each event. Death and distant recurrence are treated as competing events in the local recurrence model, and death is treated as a competing event in the distant recurrence model. The CR analysis for death is equivalent to the standard Cox-PH model because no other event precludes death.
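The censoring rules above can be written as a small function. This is a minimal sketch under our own assumptions: a hypothetical `Subject` record holds the times to local recurrence (L), distant recurrence (M) and death (D), with `None` for events not observed, plus the end of follow-up (E), and a local recurrence tied with another event is counted as local recurrence. For each event it returns the time, the Cox-PH status (1 = event, 0 = censored) and the status used for the CR model (1 = event, 2 = competing event, 0 = censored).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    """Hypothetical 'wide' record: one row per subject, None = event not observed."""
    local: Optional[float]    # L, time to local recurrence
    distant: Optional[float]  # M, time to distant recurrence
    death: Optional[float]    # D, time to death
    end: float                # E, end of follow-up


def first_time(*times):
    """Earliest of the observed times."""
    return min(t for t in times if t is not None)


def per_event_outcomes(s):
    """Return {event: (time, cox_status, cr_status)} following the rules in the text."""
    out = {}
    # Local recurrence: distant recurrence and death censor (Cox-PH) or compete (CR)
    t = first_time(s.local, s.distant, s.death, s.end)
    if s.local == t:  # ties are counted as local recurrence
        out["local"] = (t, 1, 1)
    elif t in (s.distant, s.death):
        out["local"] = (t, 0, 2)
    else:
        out["local"] = (t, 0, 0)
    # Distant recurrence: death censors (Cox-PH) or competes (CR); local recurrence ignored
    t = first_time(s.distant, s.death, s.end)
    if s.distant == t:
        out["distant"] = (t, 1, 1)
    elif s.death == t:
        out["distant"] = (t, 0, 2)
    else:
        out["distant"] = (t, 0, 0)
    # Death: recurrences ignored and no competing event, so both analyses agree
    t = first_time(s.death, s.end)
    status = 1 if s.death == t else 0
    out["death"] = (t, status, status)
    return out


print(per_event_outcomes(Subject(local=None, distant=2.0, death=3.5, end=3.5)))
```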
Data structure for the WLW model for all possible combinations of events (L = time to local recurrence, M = time to distant recurrence, D = time to death, E = time at end of follow-up, + = censoring indicator)
L, M, D
The frailty model is fit using an extension of the Cox-PH model that includes a gamma-distributed frailty term, chosen because the events are assumed to be positively correlated [25, 26]. For this analysis, every subject has at least one record representing vital status (i.e. alive or dead) at the end of the study, and each recurrence is represented by an additional record.
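A minimal sketch of the record layout described above. It assumes that times are measured from randomization and that a recurrence which did not occur generates no record; the function name and the tuple layout (id, time, status, record type) are hypothetical.

```python
def frailty_records(subject_id, local=None, distant=None, death=None, end=None):
    """Long-format rows (id, time, status, record_type) for one subject.

    Every subject gets one vital-status row (time of death, or end of follow-up if alive);
    each observed recurrence adds one row."""
    rows = []
    if local is not None:
        rows.append((subject_id, local, 1, "local recurrence"))
    if distant is not None:
        rows.append((subject_id, distant, 1, "distant recurrence"))
    alive = death is None
    rows.append((subject_id, end if alive else death, 0 if alive else 1, "vital status"))
    return rows


print(frailty_records(1, end=6.0))                        # event-free: one row
print(frailty_records(2, local=1.5, death=4.0, end=4.0))  # local recurrence then death: two rows
```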
For the simulation, the HRs and their standard errors were averaged on the log scale over 1000 replications. All analyses were performed using SAS 9.2 (SAS Institute, Cary, NC) and R 2.13 (http://www.r-project.org).
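Averaging on the log scale amounts to taking the geometric mean of the replicate HRs. A minimal sketch, assuming the standard errors being averaged are those of the log HRs and a normal-approximation 95% CI; the function name and the replicate values are hypothetical.

```python
import math


def summarise_replications(hrs, se_log_hrs, z=1.959964):
    """Average replicate HRs and SEs of log HR on the log scale; return (HR, lower, upper)."""
    mean_log_hr = sum(math.log(h) for h in hrs) / len(hrs)  # log of the geometric mean HR
    mean_se = sum(se_log_hrs) / len(se_log_hrs)
    return (math.exp(mean_log_hr),
            math.exp(mean_log_hr - z * mean_se),
            math.exp(mean_log_hr + z * mean_se))


# Three hypothetical replications
print(summarise_replications([0.9, 1.1, 1.2], [0.13, 0.14, 0.12]))
```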
In non-inferiority clinical trials in breast cancer, patients may be at risk of, and may experience, multiple types of failure. The occurrence of one of these events may alter the probability that others occur. Moreover, the influence of treatment may differ depending on whether another event has occurred, which can affect the conclusions of the trial. This paper discusses and applies four approaches to analyzing non-inferiority trials with multiple events, using data from an existing trial in which subjects with breast cancer could experience local recurrence, distant recurrence, death, or a combination of these events. In addition, we compared the methods in simulated examples of non-inferiority trials.
The analysis of the Hypofractionation Trial showed that treatment was not associated with an increased risk of any of the events of interest, either individually or in combination. The results for each event from the Cox-PH and CR models are similar, suggesting that the impact of competing risks in this data set is minimal. The treatment estimates for each event from the WLW model are identical to those of the standard Cox-PH model, since the regression coefficients are estimated by equivalent methods. However, the WLW robust variance estimate, which accounts for the correlation between event times, leads to slightly different confidence intervals from those of the Cox-PH model. The WLW model is also susceptible to the competing risks problem, since subjects are considered at risk for each event until it occurs, but the model yields unbiased estimates when treatment does not influence the competing events.
Scenarios A and B provide evidence that the presence of multiple events can alter the conclusions of a trial depending on the method used for the final analysis. The Cox-PH and WLW local recurrence models ignore the hazards of distant recurrence and death, and thus lead to conclusions about local recurrence that differ from those of the CR model. Similarly, the Cox-PH and WLW distant recurrence models ignore the hazard of death. By ignoring competing risks, the Cox-PH and WLW methods model the cause-specific hazard or the marginal failure times, and the effect of treatment can be interpreted as the “pure effect”, or the biologic effect of treatment on the event of interest. This is the effect of treatment under the assumption that the competing events had not occurred, which can be of interest to investigators.
Unlike the Cox-PH and WLW models, the CR model does not censor patients who have had a distant recurrence or death, but instead assumes that these patients have zero risk of local recurrence once distant recurrence or death is observed. Censoring, by contrast, treats the patient as still at risk of local recurrence. Therefore, in the CR model, the treatment group with relatively higher hazards of distant recurrence and death will have a relatively lower hazard of local recurrence, and the HR for local recurrence will favor this treatment group. This approach models the hazard of the sub-distribution, and the effect of treatment can be described as the “real effect”, or the actual effect seen in the data [28, 29].
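The difference in risk sets can be made concrete. The sketch below uses hypothetical toy data with complete follow-up, so it ignores the inverse-probability-of-censoring weights that Fine and Gray's estimator attaches to subjects with an earlier competing event when follow-up is incomplete. It lists who is counted at risk of local recurrence just before time t under the cause-specific (censoring) approach and under the sub-distribution approach; the function name is our own.

```python
def local_recurrence_risk_sets(times, causes, t):
    """Indices of subjects counted at risk of local recurrence just before time t.

    causes: 1 = local recurrence, 2 = competing event (distant recurrence or death).
    Assumes complete follow-up, so no censoring weights are needed."""
    cause_specific = [i for i, ti in enumerate(times) if ti >= t]
    # Sub-distribution (Fine-Gray): a subject whose competing event occurred earlier
    # stays in the risk set but can never experience local recurrence.
    subdistribution = [i for i, (ti, ci) in enumerate(zip(times, causes))
                       if ti >= t or ci == 2]
    return cause_specific, subdistribution


times = [1, 2, 3, 4, 5]
causes = [2, 1, 2, 1, 2]
cs, sd = local_recurrence_risk_sets(times, causes, 3.5)
print("cause-specific:", cs)    # subjects 3 and 4
print("sub-distribution:", sd)  # also subjects 0 and 2, whose competing event came first
```

At time 4 the single local recurrence is divided by two subjects at risk under the cause-specific approach but by four under the sub-distribution approach, which is why an arm with more competing events shows a lower sub-distribution hazard of local recurrence.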
The CR model does provide additional information about the treatment when competing events are present. In scenario A, the Cox-PH model declares non-inferiority for local recurrence, but the CR model shows that, in terms of the actual effect of treatment in the study, the experimental arm has a higher hazard of local recurrence and non-inferiority is not established, because the control group has a higher hazard of competing events. However, in some situations (scenario B), the results from the CR model should be interpreted with caution, since the CR model may show that the experimental group is non-inferior to the control for local recurrence, but at the expense of increased distant recurrence or death, which are clinically worse outcomes. If this is a concern, one may opt to design the trial using an outcome such as disease-free survival, which encompasses local and distant recurrence. In addition, CR models have less power than Cox-PH models to rule out the same non-inferiority margin.
The average WLW and frailty models are useful for investigating the overall effect of treatment on any event, each accounting for the correlation between event times in its own way. The main advantage of these approaches is that they estimate regression coefficients efficiently, because they use all the data and adjust for the association between event times, thus increasing statistical power. However, their use is limited when the events are of dissimilar types with different clinical etiology, such as local and distant recurrence, because they provide HRs for treatment and other factors in relation to a combination of all events rather than to specific events. Moreover, these models do not correspond to the design of the trial, which evaluated a local treatment and was based on the rate of local recurrence.
The methods behave similarly in non-inferiority trials and in superiority trials. As in superiority trials, competing risks are an issue when treatment affects the competing event. When the distribution of competing events is similar in both treatment groups, the CR and Cox-PH models yield similar results, and the biologic and actual effects of treatment in the study are therefore similar. However, as in superiority trials, when treatment has a differential effect on competing events, the estimated biologic and actual effects of treatment can contradict each other.
A limitation of this study is that we compared analytic techniques using a single non-inferiority trial. To partly overcome this, we simulated examples to illustrate that the choice of method may influence the conclusions. However, we simulated only two scenarios, which limits the generalizability of the results. Second, we generated the data using a latent failure time approach, which is not without controversy. However, we did not use that model or its assumptions in any of our analyses, and do not recommend it for analysis. Lastly, we considered only the most commonly used methods of analysis that are readily available in current statistical software. Alternatives include jointly modeling all types of events with a joint frailty model in which each event has its own hazard function, or using a multivariate competing risks frailty model. However, such approaches would be computationally intensive and complex.
Our results show that the choice of event-specific model did not affect the non-inferiority conclusion of the Hypofractionation Trial. However, our examples showed that the CR method yielded conclusions contrasting with those of the Cox-PH and WLW models when competing events were present. In general, the method of analysis should be determined by the research question. The Cox-PH or WLW model can be used to analyze non-inferiority trials when the question concerns the biologic effect of treatment. The CR model should also be used when competing risks are present, as it provides valuable information on the actual effect of treatment in the study, especially when treatment affects the competing event. Both types of model should be part of a comprehensive analysis. The frailty and average WLW models provide similar estimates of the overall effect of treatment on all events. When subjects are at risk of multiple events in non-inferiority trials, researchers should consider using the CR, WLW and frailty models alongside the standard Cox-PH model to provide additional information on the disease process.
SP, JAJ, LT and MNL conceived the study. SP conducted the literature review, designed and implemented the simulation, performed the data analysis and wrote the initial draft of the manuscript. TJW, JAJ and MNL participated in the design and implementation of the Hypofractionation Trial. All authors reviewed and revised the draft of the manuscript, and read and approved the final version. This research was funded in part by the CANNeCTIN Program.
- Andersen PK, Borgan O, Gill RD, Keiding N: Statistical models based on counting processes. 1993, New York: Springer
- Cox DR: Regression models and life-tables. J Roy Stat Soc B Meth. 1972, 34: 187-220.
- Gooley TA, Leisenring W, Crowley J, Storer BE: Estimation of failure probabilities in the presence of competing risks: new representations of old estimators. Stat Med. 1999, 18: 695-706. 10.1002/(SICI)1097-0258(19990330)18:6<695::AID-SIM60>3.0.CO;2-O.
- Kim HT: Cumulative incidence in competing risks data and competing risks regression analysis. Clin Cancer Res. 2007, 13: 559-565. 10.1158/1078-0432.CCR-06-1210.
- Williamson PR, Kolamunnage-Dona R, Tudur Smith C: The influence of competing-risks setting on the choice of hypothesis test for treatment effect. Biostatistics. 2007, 8: 689-694.
- Tai B-C, Grundy RG, Machin D: On the importance of accounting for competing risks in paediatric cancer trials designed to delay or avoid radiotherapy: I. Basic concepts and first analyses. Int J Radiat Oncol Biol Phys. 2010, 76: 1493-1499. 10.1016/j.ijrobp.2009.03.035.
- Fine J, Gray R: A proportional hazards model for the sub-distribution of a competing risk. J Am Stat Assoc. 1999, 94: 496-509. 10.1080/01621459.1999.10474144.
- Wei LJ, Glidden DV: An overview of statistical methods for multiple failure time data in clinical trials. Stat Med. 1997, 16: 833-839. 10.1002/(SICI)1097-0258(19970430)16:8<833::AID-SIM538>3.0.CO;2-2.
- Hougaard P: Analysis of Multivariate Survival Data. 2000, New York, NY: Springer
- Bakoyannis G, Touloumi G: Practical methods for competing risks data: A review. Stat Methods Med Res. 2011, 3: 257-272.
- Whelan T, MacKenzie R, Julian J, Levine M, Shelley W, Grimard L, Lada B, Lukka H, Perera F, Fyles A, Laukkanen E, Gulavita S, Benk V, Szechtman B: Randomized trial of breast irradiation schedules after lumpectomy for women with lymph node-negative breast cancer. J Natl Cancer Inst. 2002, 94: 1143-1150. 10.1093/jnci/94.15.1143.
- Whelan TJ, Pignol J-P, Levine MN, Julian JA, MacKenzie R, Parpia S, Shelley W, Grimard L, Bowen J, Lukka H, Perera F, Fyles A, Schneider K, Gulavita S, Freeman C: Long-term results of hypofractionated radiation therapy for breast cancer. N Engl J Med. 2010, 362: 513-520. 10.1056/NEJMoa0906260.
- Parmar M, Machin D: Survival Analysis: A Practical Approach. 1995, Chichester, UK: John Wiley and Sons
- Kalbfleisch J, Prentice R: The Statistical Analysis of Failure Time Data. 1980, New York, USA: John Wiley and Sons
- Parpia S, Julian JA, Thabane L, Lee AYY, Rickles FR, Levine MN: Competing events in patients with malignant disease who are at risk for recurrent venous thromboembolism. Contemp Clin Trials. 2011, 32: 829-833. 10.1016/j.cct.2011.07.005.
- Ghosh D: Methods for analysis of multiple events in the presence of death. Control Clin Trials. 2000, 21: 115-126. 10.1016/S0197-2456(00)00043-X.
- Lim HJ, Liu J, Melzer-Lange M: Comparison of methods for analyzing recurrent events data: application to the Emergency Department Visits of Pediatric Firearm Victims. Accid Anal Prev. 2007, 39: 290-299. 10.1016/j.aap.2006.07.009.
- Metcalfe C, Thompson SG: Wei, Lin and Weissfeld’s marginal analysis of multivariate failure time data: should it be applied to a recurrent events outcome? Stat Methods Med Res. 2007, 16: 103-122. 10.1177/0962280206071926.
- Li QH, Lagakos SW: Use of the Wei-Lin-Weissfeld method for the analysis of a recurring and a terminating event. Stat Med. 1997, 16: 925-940. 10.1002/(SICI)1097-0258(19970430)16:8<925::AID-SIM545>3.0.CO;2-2.
- Wienke A: Frailty Models in Survival Analysis. 2010, Boca Raton, FL: Chapman & Hall/CRC
- Therneau T, Grambsch P: Modeling Survival Data. 2000, New York, NY: Springer
- Duchateau L, Janssen P: Evolution of recurrent asthma event rate over time in frailty models. J Roy Stat Soc C Appl Stat. 2003, 52: 355-363. 10.1111/1467-9876.00409.
- Duchateau L, Janssen P: The Frailty Model. 2010, New York, NY: Springer
- Freidlin B, Korn EL: Testing treatment effects in the presence of competing risks. Stat Med. 2005, 24: 1703-1712. 10.1002/sim.2054.
- Clayton D: A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978, 65: 141-151. 10.1093/biomet/65.1.141.
- Hougaard P: A Class of Multivariate Failure Time Distributions. Biometrika. 1986, 73: 671-678.
- Tai B-C, De Stavola BL, De Gruttola V, Gebski V, Machin D: First-event or marginal estimation of cause-specific hazards for analysing correlated multivariate failure-time data? Stat Med. 2008, 27: 922-936. 10.1002/sim.2944.
- Pintilie M: Competing Risks: A Practical Perspective. 2006, Chichester, UK: John Wiley and Sons
- Koller MT, Raatz H, Steyerberg EW, Wolbers M: Competing risks and the clinical community: irrelevance or ignorance? Stat Med. 2011, 31: 1089-1097.
- Tai B-C, Wee J, Machin D: Analysis and design of randomised clinical trials involving competing risks endpoints. Trials. 2011, 12: 127. 10.1186/1745-6215-12-127.
- Allignol A, Schumacher M, Wanner C, Drechsler C, Beyersmann J: Understanding competing risks: a simulation point of view. BMC Med Res Methodol. 2011, 11: 86. 10.1186/1471-2288-11-86.
- Liu L, Huang X: The use of Gaussian quadrature for estimation in frailty proportional hazards models. Stat Med. 2008, 27: 2665-2683. 10.1002/sim.3077.
- Dixon SN, Darlington GA, Desmond AF: A competing risks model for correlated data based on the subdistribution hazard. Lifetime Data Anal. 2011, 17: 473-495. 10.1007/s10985-011-9198-9.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/13/44/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.