This article has Open Peer Review reports available.
A DAG-based comparison of interventional effect underestimation between composite endpoint and multi-state analysis in cardiovascular trials
- Antje Jahn-Eimermacher^{1},
- Katharina Ingel^{1},
- Stella Preussler^{1},
- Antoni Bayes-Genis^{2} and
- Harald Binder^{1, 3}
https://doi.org/10.1186/s12874-017-0366-9
© The Author(s) 2017
Received: 10 November 2016
Accepted: 9 June 2017
Published: 4 July 2017
Abstract
Background
Composite endpoints comprising hospital admissions and death are the primary outcome in many cardiovascular clinical trials. For statistical analysis, a Cox proportional hazards model for the time to first event is commonly applied. There is an ongoing debate on whether multiple episodes per individual should be incorporated into the primary analysis. While the advantages in terms of power are readily apparent, potential biases have been mostly overlooked so far.
Methods
Motivated by a randomized controlled clinical trial in heart failure patients, we use directed acyclic graphs (DAGs) to investigate potential sources of bias in treatment effect estimates, depending on whether only the first or multiple episodes are considered. The biases are first explained in simplified examples and then investigated more thoroughly in simulation studies that mimic realistic patterns.
Results
The Cox model in particular is prone to potentially severe selection bias and direct effect bias, resulting in underestimation of the treatment effect when the analysis is restricted to first events. We find that both kinds of bias can be reduced simultaneously by adequately incorporating recurrent events into the analysis model. Accordingly, we point out appropriate proportional hazards-based multi-state models for decreasing bias and increasing power when analyzing multiple-episode composite endpoints in randomized clinical trials.
Conclusions
Incorporating multiple episodes per individual into the primary analysis can reduce the bias of a treatment’s total effect estimate. Our findings will help to move beyond the paradigm of considering first events only for approaches that use more information from the trial and augment interpretability, as has been called for in cardiovascular research.
Background
When analyzing composite endpoints that incorporate an endpoint with multiple episodes, such as hospital admission, a time to first event approach is frequently adopted for randomized clinical trials. Researchers from different disciplines have called for more appropriate methods of statistical analysis that more closely reflect the patients’ disease burden. This involves a discussion on whether multiple episodes per patient are to be analyzed. So far, this discussion has mostly considered power issues, while overlooking potential bias. In this work, we investigate sources of bias and show that treatment effects can be severely underestimated when estimates are based on first events only, and that this underestimation can be substantially reduced by adequately modeling multiple episodes per patient.
Composite endpoints combine several events of interest into a single variable, usually defined as a time to event outcome. They are frequently used as primary or secondary endpoints in cardiovascular clinical trials [1, 2]. Composite outcomes facilitate the evaluation of treatment effects when unrealistically large sample sizes would be required to detect differences in the incidence of single outcomes among treatment groups, for example mortality. While using a composite outcome may help in terms of power, at the same time it introduces its own difficulties concerning interpretation of trial results and methodological challenges [2–6]. One major concern is that endpoints occurring in individual patients usually are clinically related (such as nonfatal and fatal myocardial infarctions). Multi-state modeling of these relations by allowing for separate transition hazards between the different subsequent events has recently been proposed for large cardiovascular observational studies [7, 8]. However, for randomized clinical trials this is suspected to attenuate the power and confirmatory character of the trial [9]. In the majority of clinical trials, the concern for potential relations between clinical episodes is therefore addressed by counting only one event per patient and analyzing the time to the first of all components. By following this approach, only data on the first episode per individual are used for the primary statistical analysis, even when subsequent episodes (including deaths) have been recorded. There is an ongoing debate, in particular in cardiovascular research, on the efficiency and validity of this practice because it ignores a great deal of clinically relevant information [3, 10–12]. The impact of multiple episodes per patient on the power of a clinical trial is apparently promising [3, 13], and selected statistical methods have been exemplarily applied to single trial data [14–16]. 
However, less attention is paid to the estimation and interpretability of treatment effects that can be substantially attenuated depending on whether multiple episodes are analyzed or not. We consider this critical since the choice of a statistical method for analyzing trial data should not be mainly driven by power considerations but by the objective to obtain an unbiased and meaningful treatment effect estimate, i.e. to make causal inferences about the treatment and its (added) benefit and to understand how a treatment influences a patient’s disease burden.
Although randomized clinical trials are often assumed to produce unbiased results because the randomized treatment allocation prevents confounding, hazard-based survival analysis can introduce its own bias [17–20]. In particular, the Andersen-Gill approach [21] has been suspected to introduce bias by erroneously modeling that a clinical episode will leave a patient’s risk profile unchanged and will not affect the incidence rate for future episodes [22–25]. This finding has been controversially discussed as it implicitly assumes that direct effects are to be estimated [26]. The causal directed acyclic graph (DAG) approach [27, 28] has been proposed for defining adequate statistical models that prevent or minimize bias in the presence of confounding. It is a powerful tool for identifying and addressing bias and is increasingly popular, but it is primarily applied in epidemiological research. In this work, we make use of this approach for randomized clinical trials to provide an accessible explanation of potential bias in proportional hazards-based survival analysis of first and multiple episodes of a composite endpoint and to define adequate statistical models for reducing or preventing bias. While the use of DAGs may be problematic in a continuous time setting [29], we avoid such issues by first considering actual discrete states in the DAG analysis and then making the transition to continuous time settings with evidence from simulations.
The article is organized as follows: We motivate this research with a clinical example in “Cardiovascular clinical trial example” section. Then, in “Methods” section, we first formalize potential bias via directed acyclic graphs and illustrate the findings on simplified examples. Thereafter we identify statistical models that have the potential to reduce that bias. We support our findings by simulation studies that mimic the motivating clinical trial situation and present the results in “Results” section. Finally, we finish the article with a discussion in “Discussion” section.
Cardiovascular clinical trial example
This work has been motivated by the ST2 guided tReatment upON discharGe in Heart Failure (STRONG-HF) trial, a randomized controlled clinical trial that has been planned to investigate whether heart failure patients will benefit from a biomarker-based treatment scheme compared to standard care. It is planned as a multicenter, prospective, randomized, open-label for patients, blinded-endpoint and event-driven study. The primary endpoint was defined as a composite of cardiovascular mortality and recurrent worsening heart failure. Worsening heart failure includes hospitalization due to heart failure or an urgent visit to the emergency department or heart failure clinic due to decompensation needing unplanned intravenous diuretic treatment. Patients are to be uniformly recruited over a period of one year and are to be followed for one year after the end of the recruitment phase. The two regimens are to be allocated randomly and in a balanced fashion among the recruited patients. In addition to the treatment’s effect on the combined endpoint, its effects on the single components, cardiovascular death and disease-associated admissions, are also of major interest. From previous data, an annual death rate of 0.14 and an annual admission rate of 1.17 are expected for the patients under standard care (control group), defining a hazard rate for the composite endpoint of λ=1.31. Treatment is expected to decrease that rate by 25%, corresponding to a hazard ratio of HR=0.75. When the time to first composite endpoint is analyzed, a total number of N=465 patients is required to attain a power of 80% for rejecting the null hypothesis of no treatment effect on the incidence of the composite endpoint, \(H_{0}=\{HR=1\}\) [30]. Incorporating recurrent events into the statistical analysis has the potential to decrease the sample size to as few as N=223 [13], and thus is apparently promising for improving the feasibility and efficiency of the trial.
However, disease-associated complications that require a hospital admission will obviously affect the risk for further non-fatal and fatal outcomes. For example, patients who acquire a non-fatal MI have an increased risk for fatal and non-fatal outcomes thereafter. Concern arises if this might question the study results, and, more generally, how incorporating recurrent events into the primary statistical analysis will affect the treatment effect estimates and thus the interpretation of trial results.
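The planning figures quoted above can be retraced with a few lines of arithmetic. The following is a minimal sketch, not the calculation of [30]: it assumes constant hazards and uses Schoenfeld's approximation for the number of events needed by a two-sided log-rank test at the 5% level with balanced allocation. The function names are hypothetical. Converting the required events into a number of patients additionally depends on the accrual and follow-up assumptions handled in [30] and is not attempted here.

```python
import math
from statistics import NormalDist


def composite_hazard(death_rate: float, admission_rate: float) -> float:
    """Constant hazard of the composite endpoint: sum of the component hazards."""
    return death_rate + admission_rate


def schoenfeld_events(hazard_ratio: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Events needed for a two-sided log-rank test with 1:1 allocation
    (Schoenfeld approximation; an assumption, not the method of [30])."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


control_hazard = composite_hazard(0.14, 1.17)  # annual rate under standard care
treated_hazard = 0.75 * control_hazard         # 25% reduction under treatment
events_needed = schoenfeld_events(0.75)

print(round(control_hazard, 2), round(treated_hazard, 4), events_needed)
```

Under these assumptions the composite hazard is 1.31 per year in the control group and about 0.98 per year under treatment.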
Methods
Formalizing potential bias via directed acyclic graphs
The graphical representation of causal effects between variables [27, 28] helps to understand the sources of potential bias when estimating some causal effect of an exposure to an outcome and how different statistical models differently address that bias. In the causal directed acyclic graph (DAG) approach, an arrow connecting two variables indicates causation; variables with no direct causal association are left unconnected. We will use this approach for illustrating the causal system in randomized clinical trials when a composite endpoint is investigated that comprises fatal and non-fatal events. An example is the composite of cardiovascular death and hospital admission for heart failure disease as defined in the motivating clinical trial example (“Cardiovascular clinical trial example” section). Effect estimation is assumed to be hazard-based with a proportional hazards assumption.
Selection bias
Example
Table 1 Expected patient numbers in the discrete failure time example for time to first event, stratified by subgroup (“Selection bias” section). The odds ratio (OR) compares treatment with placebo within each stratum.
Stratum | Group | Event at \(t_{2}\) | No event at \(t_{2}\) | At risk at \(t_{2}\) | OR |
---|---|---|---|---|---|
Low-risk subgroup | Treatment | 144 | 576 | 720 | 0.5 |
Low-risk subgroup | Placebo | 200 | 400 | 600 | |
High-risk subgroup | Treatment | 225 | 225 | 450 | 0.5 |
High-risk subgroup | Placebo | 200 | 100 | 300 | |
All patients (unstratified) | Treatment | 369 | 801 | 1170 | 0.57 |
All patients (unstratified) | Placebo | 400 | 500 | 900 | |
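The odds ratios in the table above can be recomputed directly from the expected counts. The sketch below (function and variable names are hypothetical) shows that the odds ratio is 0.5 within each subgroup, while pooling the subgroups gives a value of about 0.576, which the table lists as 0.57. It also computes the share of high-risk patients among those still at risk at \(t_{2}\), which differs between the arms although treatment was randomized.

```python
def odds_ratio(events_trt, no_events_trt, events_ctl, no_events_ctl):
    """Odds ratio of the event, treatment versus placebo."""
    return (events_trt / no_events_trt) / (events_ctl / no_events_ctl)


# (events, no events) at t2 among patients still at risk, taken from the table
low_risk = {"treatment": (144, 576), "placebo": (200, 400)}
high_risk = {"treatment": (225, 225), "placebo": (200, 100)}

or_low = odds_ratio(*low_risk["treatment"], *low_risk["placebo"])
or_high = odds_ratio(*high_risk["treatment"], *high_risk["placebo"])

# Pool both strata, i.e. ignore the risk subgroup
or_pooled = odds_ratio(144 + 225, 576 + 225, 200 + 200, 400 + 100)

# Share of high-risk patients among those at risk at t2, by arm
share_high_trt = 450 / (720 + 450)
share_high_ctl = 300 / (600 + 300)

print(or_low, or_high, round(or_pooled, 3))
print(round(share_high_trt, 3), round(share_high_ctl, 3))
```

The pooled odds ratio is closer to 1 than either stratum-specific value because the treatment arm retains relatively more high-risk patients in the risk set.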
Direct effect bias
Example
Table 2 Expected patient numbers in the discrete failure time example for time to death, stratified by previously experienced non-fatal event (“Direct effect bias” section). The odds ratio (OR) compares treatment with placebo within each stratum.
Stratum | Group | Death at \(t_{2}\) | Alive at \(t_{2}\) | At risk | OR |
---|---|---|---|---|---|
Non-fatal event at \(t_{1}\) | Treatment | 120 | 180 | 300 | 1 |
Non-fatal event at \(t_{1}\) | Placebo | 240 | 360 | 600 | |
No non-fatal event at \(t_{1}\) | Treatment | 120 | 480 | 600 | 1 |
No non-fatal event at \(t_{1}\) | Placebo | 60 | 240 | 300 | |
All patients (unstratified) | Treatment | 240 | 660 | 900 | 0.73 |
All patients (unstratified) | Placebo | 300 | 600 | 900 | |
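The counts in the table above can be read as a small mediation calculation. In this sketch (names are hypothetical) the risk of death at \(t_{2}\) depends only on whether a non-fatal event occurred at \(t_{1}\), so the stratified odds ratio is 1. Treatment nevertheless lowers overall mortality because it lowers the proportion of patients with a non-fatal event at \(t_{1}\).

```python
# Risk of death at t2 given the event history; identical in both arms
risk_after_event = 120 / 300    # placebo arm: 240 / 600 gives the same value
risk_without_event = 120 / 600  # placebo arm: 60 / 300 gives the same value

# Treatment changes how many patients experience a non-fatal event at t1
p_nonfatal = {"treatment": 300 / 900, "placebo": 600 / 900}


def total_death_risk(p_event):
    """Overall risk of death at t2, averaging over the event history at t1."""
    return p_event * risk_after_event + (1 - p_event) * risk_without_event


def odds(p):
    return p / (1 - p)


risk = {arm: total_death_risk(p) for arm, p in p_nonfatal.items()}
total_or = odds(risk["treatment"]) / odds(risk["placebo"])

print({arm: round(r, 3) for arm, r in risk.items()}, round(total_or, 2))
```

Stratifying on the event history therefore recovers only the direct effect (odds ratio 1), whereas the unstratified comparison reflects the total effect (odds ratio about 0.73).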
Reducing bias by statistical modeling
We will now transfer the insights on biased effect estimation as derived from the DAGs to identify statistical analysis models that have the potential to reduce that bias. Consider a randomized clinical trial with n subjects followed for a composite endpoint. Subjects will be indexed by i, events by j. Let \(T_{CE,ij}^{*}\) be a series of random variables that describe the time from starting point 0 to the j-th occurrence of the composite endpoint in subject i. Let further \(C_{i}\) be independent identically distributed random variables that describe the time to censoring. We observe \(T_{CE,ij}=\min(T_{CE,ij}^{*},C_{i})\), the time to composite endpoint or censoring, whichever comes first, and the indicator variables \(\delta_{ij}=\textbf{I}\left\{T_{CE,ij}^{*}\leq C_{i}\right\}\).
as in this model the risk sets \(R_{(i)}^{C}\) are restricted to subjects that are not only still alive but also free of any previous non-fatal event at time \(t_{i1}\), the time of the first event or censoring of individual i.
and risk sets \(R_{(ij_{l})}^{MS}\) that include all subjects who have not been censored and have not died before the particular event time, respectively. This generalization of the Andersen-Gill model allowing for separate treatment effects for each component, \(\beta_{l}\), can be proposed whenever sample size and event frequency allow for such an approach. It still does not stratify on the event history and does not restrict the at-risk-set only to those subjects that are free of any event, but allows for a higher flexibility with respect to differential treatment effects.
Note that we focus on marginal models within this manuscript. By introducing a (joint) frailty term into model (5) or (6) and applying penalized likelihoods [38], a conditional joint frailty model could also be fitted. Conditioning on the frailty term minimizes the selection bias illustrated in Fig. 1, but at the price of increased model complexity through further model assumptions (joint frailty distribution) and parameters (frailty variance). We will show in the next section that in many applications one can safely stay with the marginal model, in keeping with Occam’s razor.
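The two risk-set definitions described in this section differ only in whether a previous non-fatal event removes a subject from the risk set. The following sketch illustrates that difference on three invented subjects; the `Subject` type, the function names and the data are hypothetical and serve only as an illustration, not as the estimation procedure itself.

```python
from typing import List, NamedTuple


class Subject(NamedTuple):
    nonfatal_times: List[float]  # times of non-fatal events (invented data)
    end_time: float              # time of death or censoring
    died: bool


def at_risk_first_event(s: Subject, t: float) -> bool:
    """Time-to-first-event risk set: alive, uncensored and free of any previous event."""
    first = min(s.nonfatal_times + [s.end_time])
    return t <= first


def at_risk_all_events(s: Subject, t: float) -> bool:
    """Andersen-Gill-type risk set: alive and uncensored, whatever the event history."""
    return t <= s.end_time


subjects = [
    Subject([0.3, 0.9], 1.5, True),   # two admissions, then death
    Subject([], 2.0, False),          # event-free until censoring
    Subject([1.2], 1.8, False),       # one admission, then censored
]

t = 1.0
n_first = sum(at_risk_first_event(s, t) for s in subjects)
n_all = sum(at_risk_all_events(s, t) for s in subjects)
print(n_first, n_all)
```

At time 1.0 the first subject has already left the first-event risk set but still contributes to the all-events risk set, which is why the latter is not selected on event history.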
Simulation studies
We investigate the bias in treatment effect estimation as identified in “Formalizing potential bias via directed acyclic graphs” section (selection bias, direct effect bias) in simulation studies. The simulation study mimics the clinical trial situation that has motivated this research. For this purpose, we consider a balanced randomized clinical trial with a follow-up of two years and uniformly distributed recruitment of N=380 individuals over the first year. The transition hazards \(\lambda_{1}\) and \(\lambda_{2}\) (Fig. 4 and Eq. (6)) for the transitions to fatal and non-fatal events, respectively, are defined by \(\lambda_{l}(t|X_{i})=\lambda_{l}\cdot \exp(\beta_{l} X_{i})\).
Note that the unobserved variable acts on both transition hazards, inducing a correlation between both processes. Such a joint model [38] is considered to mimic real clinical trial data more closely than simulation models that assume independence between the event processes, as in most situations patient and disease characteristics can be expected to affect adverse disease outcomes in the same direction. Different values θ∈{0,0.2,…,1} reflect different strengths of association between the unobserved variable and the fatal and non-fatal outcomes and will therefore cause different degrees of selection bias. In a second simulation study we add an indirect effect of treatment on the composite outcome by letting the transition hazards increase by a factor of ρ with each non-fatal event. By applying a range of values between ρ=1 (no increase of hazards) and ρ=1.3 (increase of hazards by 30% with each non-fatal event), different degrees of the indirect effects are evaluated.
In a third simulation study we investigate treatment effect estimation when both effects are present, that is, the transition hazards increase with each non-fatal event by a factor of ρ (ρ∈[1,1.3]) while in addition a gamma-distributed frailty term with mean 1 and a moderate variance of θ=0.6 acts on all transition hazards. For each simulation model, 5000 datasets are simulated.
All simulated data are analyzed by the Andersen-Gill model for the composite endpoint (1) and its multi-state extension (6) to estimate separate treatment effects on fatal and non-fatal outcomes. Both models are applied to the full simulated datasets and to datasets that are restricted to the first composite endpoint per individual. For the restricted data, the Andersen-Gill model then reduces to a Cox proportional hazards model and its multi-state extension to a competing risk model.
All data are simulated and analyzed in the open-source statistical environment R, version 3.1.0 (2014-04-10) [39] and by extending the published simulation algorithm for recurrent event data [40]. Mean regression coefficient estimates are derived together with standard errors as estimated from their variability among the simulations.
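To make the data-generating mechanism described above concrete, the following Python sketch simulates a trial with constant component hazards, an optional gamma frailty with mean 1 and variance θ, and hazards that are multiplied by ρ after each non-fatal event. It is an illustration under stated assumptions and not the published R algorithm [40]: all function and parameter names are hypothetical, and crude event-rate ratios (events per person-year, treatment versus control) stand in for the Cox and Andersen-Gill fits used in the study.

```python
import random


def simulate_subject(rng, treated, lam_death=0.14, lam_nonfatal=1.17,
                     hr_death=0.75, hr_nonfatal=0.75, theta=0.0, rho=1.0):
    """Simulate one subject; return (first_time, first_observed, total_time, n_events)."""
    # Uniform recruitment over the first year, study closes after two years.
    censor = 2.0 - rng.random()
    # Gamma frailty with mean 1 and variance theta; theta = 0 switches it off.
    frailty = rng.gammavariate(1.0 / theta, theta) if theta > 0 else 1.0
    h_death = lam_death * (hr_death if treated else 1.0) * frailty
    h_nonfatal = lam_nonfatal * (hr_nonfatal if treated else 1.0) * frailty
    t, n_nonfatal, died, first_time = 0.0, 0, False, None
    while True:
        # Both hazards are multiplied by rho for every previous non-fatal event.
        total_hazard = (h_death + h_nonfatal) * rho ** n_nonfatal
        t += rng.expovariate(total_hazard)
        if t >= censor:
            t = censor
            break
        if first_time is None:
            first_time = t
        # The event is fatal with probability h_death / (h_death + h_nonfatal).
        if rng.random() < h_death / (h_death + h_nonfatal):
            died = True
            break
        n_nonfatal += 1
    first_observed = first_time is not None
    if not first_observed:
        first_time = censor
    return first_time, first_observed, t, n_nonfatal + int(died)


def crude_rate_ratios(n_per_arm, seed=2017, **params):
    """Treatment/control ratio of events per person-year: (first events, all events)."""
    rng = random.Random(seed)
    rates = {}
    for treated in (False, True):
        py_first = ev_first = py_all = ev_all = 0.0
        for _ in range(n_per_arm):
            first_time, first_observed, total_time, n_events = simulate_subject(
                rng, treated, **params)
            py_first += first_time
            ev_first += first_observed
            py_all += total_time
            ev_all += n_events
        rates[treated] = (ev_first / py_first, ev_all / py_all)
    return rates[True][0] / rates[False][0], rates[True][1] / rates[False][1]


first_rr, all_rr = crude_rate_ratios(5000)
print(f"first events: {first_rr:.2f}, all events: {all_rr:.2f}")
```

With the default settings (θ=0, ρ=1) both ratios should lie close to the true hazard ratio of 0.75. Passing, for example, `theta=0.6` or `rho=1.3` to `crude_rate_ratios` gives a rough impression of how frailty and event-dependent hazards shift the first-event and all-events summaries, although the numbers are not directly comparable to the model-based estimates reported in the Results.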
Results
Table 3 Simulation results for \(\lambda_{1}=0.14\) and \(\lambda_{2}=1.17\). Columns marked “1st” give results of analyses of first events only, columns marked “all” give results of analyses of all events.
\(\exp(\beta_{1})\) | \(\exp(\beta_{2})\) | θ | ρ | 1st: \(\exp(\widehat{\beta_{CE}})\) | 1st: \(\widehat{SE}(\beta_{CE})\) | 1st: \(\exp(\widehat{\beta_{1}})\) | 1st: \(\widehat{SE}(\beta_{1})\) | 1st: \(\exp(\widehat{\beta_{2}})\) | 1st: \(\widehat{SE}(\beta_{2})\) | all: \(\exp(\widehat{\beta_{CE}})\) | all: \(\widehat{SE}(\beta_{CE})\) | all: \(\exp(\widehat{\beta_{1}})\) | all: \(\widehat{SE}(\beta_{1})\) | all: \(\exp(\widehat{\beta_{2}})\) | all: \(\widehat{SE}(\beta_{2})\) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.75 | 0.75 | 0.00 | 1.00 | 0.75 | 0.11 | 0.75 | 0.37 | 0.75 | 0.12 | 0.75 | 0.08 | 0.75 | 0.26 | 0.75 | 0.09 |
0.75 | 0.75 | 0.00 | 1.05 | 0.75 | 0.11 | 0.74 | 0.37 | 0.75 | 0.12 | 0.74 | 0.08 | 0.73 | 0.25 | 0.74 | 0.09 |
0.75 | 0.75 | 0.00 | 1.10 | 0.75 | 0.12 | 0.75 | 0.37 | 0.75 | 0.12 | 0.73 | 0.09 | 0.73 | 0.25 | 0.73 | 0.09 |
0.75 | 0.75 | 0.00 | 1.15 | 0.75 | 0.12 | 0.76 | 0.37 | 0.75 | 0.12 | 0.72 | 0.09 | 0.72 | 0.24 | 0.72 | 0.10 |
0.75 | 0.75 | 0.00 | 1.20 | 0.75 | 0.12 | 0.75 | 0.37 | 0.75 | 0.12 | 0.70 | 0.09 | 0.70 | 0.24 | 0.70 | 0.10 |
0.75 | 0.75 | 0.00 | 1.25 | 0.75 | 0.11 | 0.74 | 0.37 | 0.75 | 0.12 | 0.68 | 0.10 | 0.68 | 0.23 | 0.68 | 0.11 |
0.75 | 0.75 | 0.00 | 1.30 | 0.75 | 0.12 | 0.74 | 0.37 | 0.75 | 0.12 | 0.65 | 0.12 | 0.65 | 0.22 | 0.65 | 0.13 |
0.75 | 0.75 | 0.60 | 1.00 | 0.80 | 0.13 | 0.80 | 0.40 | 0.80 | 0.13 | 0.76 | 0.12 | 0.76 | 0.26 | 0.76 | 0.12 |
0.75 | 0.75 | 0.60 | 1.05 | 0.80 | 0.13 | 0.80 | 0.40 | 0.80 | 0.13 | 0.75 | 0.12 | 0.75 | 0.26 | 0.75 | 0.13 |
0.75 | 0.75 | 0.60 | 1.10 | 0.80 | 0.13 | 0.80 | 0.41 | 0.80 | 0.13 | 0.74 | 0.13 | 0.74 | 0.26 | 0.74 | 0.13 |
0.75 | 0.75 | 0.60 | 1.15 | 0.80 | 0.12 | 0.80 | 0.40 | 0.80 | 0.13 | 0.73 | 0.13 | 0.72 | 0.24 | 0.73 | 0.14 |
0.75 | 0.75 | 0.60 | 1.20 | 0.80 | 0.13 | 0.80 | 0.41 | 0.81 | 0.13 | 0.73 | 0.14 | 0.71 | 0.24 | 0.73 | 0.15 |
0.75 | 0.75 | 0.60 | 1.25 | 0.81 | 0.13 | 0.80 | 0.40 | 0.81 | 0.13 | 0.72 | 0.15 | 0.70 | 0.23 | 0.73 | 0.15 |
0.75 | 0.75 | 0.60 | 1.30 | 0.81 | 0.13 | 0.80 | 0.41 | 0.81 | 0.13 | 0.72 | 0.15 | 0.69 | 0.22 | 0.72 | 0.16 |
0.92 | 0.75 | 0.00 | 1.00 | 0.77 | 0.12 | 0.92 | 0.35 | 0.75 | 0.13 | 0.77 | 0.08 | 0.92 | 0.24 | 0.75 | 0.09 |
0.92 | 0.75 | 0.00 | 1.05 | 0.77 | 0.11 | 0.92 | 0.35 | 0.75 | 0.12 | 0.76 | 0.08 | 0.90 | 0.24 | 0.74 | 0.09 |
0.92 | 0.75 | 0.00 | 1.10 | 0.77 | 0.12 | 0.92 | 0.35 | 0.75 | 0.12 | 0.75 | 0.09 | 0.90 | 0.23 | 0.73 | 0.09 |
0.92 | 0.75 | 0.00 | 1.15 | 0.77 | 0.12 | 0.93 | 0.34 | 0.75 | 0.12 | 0.74 | 0.09 | 0.88 | 0.23 | 0.72 | 0.10 |
0.92 | 0.75 | 0.00 | 1.20 | 0.77 | 0.11 | 0.92 | 0.35 | 0.75 | 0.12 | 0.72 | 0.09 | 0.86 | 0.22 | 0.70 | 0.10 |
0.92 | 0.75 | 0.00 | 1.25 | 0.77 | 0.12 | 0.93 | 0.35 | 0.75 | 0.12 | 0.70 | 0.11 | 0.83 | 0.22 | 0.68 | 0.11 |
0.92 | 0.75 | 0.00 | 1.30 | 0.77 | 0.12 | 0.92 | 0.35 | 0.75 | 0.12 | 0.66 | 0.12 | 0.79 | 0.21 | 0.65 | 0.12 |
0.92 | 0.75 | 0.60 | 1.00 | 0.82 | 0.12 | 0.98 | 0.38 | 0.80 | 0.13 | 0.77 | 0.11 | 0.92 | 0.25 | 0.75 | 0.12 |
0.92 | 0.75 | 0.60 | 1.05 | 0.82 | 0.12 | 0.98 | 0.38 | 0.80 | 0.13 | 0.76 | 0.12 | 0.91 | 0.24 | 0.74 | 0.13 |
0.92 | 0.75 | 0.60 | 1.10 | 0.82 | 0.13 | 0.98 | 0.38 | 0.80 | 0.13 | 0.75 | 0.12 | 0.89 | 0.24 | 0.73 | 0.13 |
0.92 | 0.75 | 0.60 | 1.15 | 0.82 | 0.12 | 0.98 | 0.39 | 0.80 | 0.13 | 0.74 | 0.13 | 0.87 | 0.23 | 0.72 | 0.14 |
0.92 | 0.75 | 0.60 | 1.20 | 0.82 | 0.13 | 0.98 | 0.38 | 0.80 | 0.13 | 0.73 | 0.14 | 0.85 | 0.22 | 0.71 | 0.15 |
0.92 | 0.75 | 0.60 | 1.25 | 0.82 | 0.12 | 0.97 | 0.38 | 0.80 | 0.13 | 0.72 | 0.15 | 0.82 | 0.21 | 0.71 | 0.15 |
0.92 | 0.75 | 0.60 | 1.30 | 0.82 | 0.13 | 0.98 | 0.38 | 0.80 | 0.14 | 0.72 | 0.15 | 0.81 | 0.21 | 0.70 | 0.16 |
Simulation results for λ _{1}=0.655 and λ _{2}=0.655
Simulation parameters | Results of 1st-event-analyses | Results of all-events-analyses | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
exp(β _{1}) | exp(β _{2}) | θ | ρ | \(\exp (\widehat {\beta _{CE}})\) | \(\widehat {SE}(\beta _{CE})\) | \(\exp (\widehat {\beta _{1}})\) | \(\widehat {SE}(\beta _{1})\) | \(\exp (\widehat {\beta _{2}})\) | \(\widehat {SE}(\beta _{2})\) | \(\exp (\widehat {\beta _{CE}})\) | \(\widehat {SE}(\beta _{CE})\) | \(\exp (\widehat {\beta _{1}})\) | \(\widehat {SE}(\beta _{1})\) | \(\exp (\widehat {\beta _{2}})\) | \(\widehat {SE}(\beta _{2})\) | ||
0.75 | 0.75 | 0.00 | 1.00 | 0.75 | 0.12 | 0.75 | 0.17 | 0.75 | 0.16 | 0.75 | 0.10 | 0.75 | 0.14 | 0.75 | 0.14 | ||
0.75 | 0.75 | 0.00 | 1.05 | 0.75 | 0.11 | 0.75 | 0.16 | 0.75 | 0.16 | 0.74 | 0.10 | 0.75 | 0.14 | 0.74 | 0.14 | ||
0.75 | 0.75 | 0.00 | 1.10 | 0.75 | 0.11 | 0.75 | 0.16 | 0.75 | 0.16 | 0.74 | 0.10 | 0.74 | 0.14 | 0.74 | 0.14 | ||
0.75 | 0.75 | 0.00 | 1.15 | 0.75 | 0.12 | 0.75 | 0.17 | 0.75 | 0.16 | 0.74 | 0.10 | 0.74 | 0.14 | 0.74 | 0.14 | ||
0.75 | 0.75 | 0.00 | 1.20 | 0.75 | 0.12 | 0.75 | 0.16 | 0.75 | 0.16 | 0.73 | 0.10 | 0.74 | 0.13 | 0.73 | 0.14 | ||
0.75 | 0.75 | 0.00 | 1.25 | 0.75 | 0.12 | 0.75 | 0.17 | 0.75 | 0.16 | 0.73 | 0.10 | 0.73 | 0.14 | 0.73 | 0.15 | ||
0.75 | 0.75 | 0.00 | 1.30 | 0.75 | 0.12 | 0.75 | 0.16 | 0.75 | 0.17 | 0.73 | 0.11 | 0.73 | 0.13 | 0.72 | 0.15 | ||
0.75 | 0.75 | 0.60 | 1.00 | 0.80 | 0.12 | 0.80 | 0.18 | 0.80 | 0.18 | 0.78 | 0.12 | 0.78 | 0.15 | 0.78 | 0.17 | ||
0.75 | 0.75 | 0.60 | 1.05 | 0.80 | 0.12 | 0.80 | 0.18 | 0.80 | 0.18 | 0.78 | 0.12 | 0.78 | 0.15 | 0.78 | 0.18 | ||
0.75 | 0.75 | 0.60 | 1.10 | 0.80 | 0.12 | 0.80 | 0.18 | 0.80 | 0.18 | 0.78 | 0.12 | 0.78 | 0.14 | 0.78 | 0.17 | ||
0.75 | 0.75 | 0.60 | 1.15 | 0.80 | 0.12 | 0.80 | 0.18 | 0.80 | 0.18 | 0.78 | 0.12 | 0.78 | 0.15 | 0.78 | 0.18 | ||
0.75 | 0.75 | 0.60 | 1.20 | 0.80 | 0.12 | 0.81 | 0.18 | 0.80 | 0.18 | 0.78 | 0.13 | 0.78 | 0.14 | 0.78 | 0.18 | ||
0.75 | 0.75 | 0.60 | 1.25 | 0.80 | 0.13 | 0.80 | 0.18 | 0.80 | 0.18 | 0.78 | 0.13 | 0.78 | 0.15 | 0.78 | 0.18 | ||
0.75 | 0.75 | 0.60 | 1.30 | 0.80 | 0.13 | 0.80 | 0.18 | 0.80 | 0.18 | 0.78 | 0.13 | 0.78 | 0.14 | 0.78 | 0.18 | ||
0.92 | 0.75 | 0.00 | 1.00 | 0.84 | 0.11 | 0.92 | 0.16 | 0.75 | 0.17 | 0.83 | 0.10 | 0.92 | 0.13 | 0.75 | 0.14 | ||
0.92 | 0.75 | 0.00 | 1.05 | 0.83 | 0.12 | 0.92 | 0.16 | 0.75 | 0.17 | 0.83 | 0.10 | 0.91 | 0.13 | 0.75 | 0.14 | ||
0.92 | 0.75 | 0.00 | 1.10 | 0.84 | 0.11 | 0.92 | 0.16 | 0.75 | 0.16 | 0.83 | 0.10 | 0.91 | 0.13 | 0.74 | 0.14 | ||
0.92 | 0.75 | 0.00 | 1.15 | 0.84 | 0.12 | 0.92 | 0.16 | 0.75 | 0.17 | 0.82 | 0.10 | 0.90 | 0.13 | 0.74 | 0.15 | ||
0.92 | 0.75 | 0.00 | 1.20 | 0.83 | 0.11 | 0.92 | 0.16 | 0.75 | 0.17 | 0.82 | 0.10 | 0.90 | 0.13 | 0.73 | 0.15 | ||
0.92 | 0.75 | 0.00 | 1.25 | 0.83 | 0.12 | 0.92 | 0.16 | 0.75 | 0.17 | 0.81 | 0.10 | 0.89 | 0.13 | 0.73 | 0.15 | ||
0.92 | 0.75 | 0.00 | 1.30 | 0.83 | 0.11 | 0.92 | 0.16 | 0.75 | 0.17 | 0.81 | 0.10 | 0.89 | 0.13 | 0.72 | 0.15 | ||
0.92 | 0.75 | 0.60 | 1.00 | 0.87 | 0.12 | 0.95 | 0.17 | 0.78 | 0.18 | 0.84 | 0.12 | 0.93 | 0.14 | 0.76 | 0.17 | ||
0.92 | 0.75 | 0.60 | 1.05 | 0.87 | 0.12 | 0.97 | 0.17 | 0.78 | 0.18 | 0.84 | 0.12 | 0.93 | 0.14 | 0.76 | 0.18 | ||
0.92 | 0.75 | 0.60 | 1.10 | 0.87 | 0.12 | 0.96 | 0.17 | 0.78 | 0.18 | 0.84 | 0.12 | 0.93 | 0.14 | 0.75 | 0.18 | ||
0.92 | 0.75 | 0.60 | 1.15 | 0.87 | 0.12 | 0.96 | 0.17 | 0.78 | 0.18 | 0.84 | 0.12 | 0.92 | 0.14 | 0.75 | 0.18 | ||
0.92 | 0.75 | 0.60 | 1.20 | 0.87 | 0.12 | 0.96 | 0.17 | 0.78 | 0.18 | 0.83 | 0.12 | 0.92 | 0.14 | 0.75 | 0.18 | ||
0.92 | 0.75 | 0.60 | 1.25 | 0.87 | 0.12 | 0.96 | 0.17 | 0.78 | 0.18 | 0.83 | 0.13 | 0.91 | 0.14 | 0.75 | 0.19 | ||
0.92 | 0.75 | 0.60 | 1.30 | 0.87 | 0.12 | 0.96 | 0.17 | 0.78 | 0.18 | 0.83 | 0.13 | 0.91 | 0.14 | 0.74 | 0.18 |
estimates differ by outcome. However, compared to the setting with a common treatment effect (Fig. 5), all effect estimates are similarly affected by selection bias with respect to the direction and magnitude of that bias.
As the hazard for the composite endpoint is the sum of the hazards over the two components [41], the hazard ratio can be derived as \(\frac{1}{\lambda_{1}+\lambda_{2}} \sum_{i=1}^{2} \lambda_{i} \exp(\beta_{i})\) in the situation of constant hazards. This weighted sum is estimated when analysing the composite outcome using first events only or all events as long as no selection bias and no indirect effects are present, that is θ=0 for the analysis of 1st events and ρ=1 for the analysis of all events (Figs. 6 and 8). θ>0 and/or ρ>1 then affect the estimates for the composite endpoint in the same direction as the estimates for the single components.
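As a worked check of this weighted sum, the sketch below evaluates the formula for the two baseline settings used in the simulations, with component hazard ratios of 0.92 (fatal) and 0.75 (non-fatal). The function name is hypothetical.

```python
def composite_hazard_ratio(hazards, hazard_ratios):
    """Hazard ratio of the composite endpoint under constant component hazards:
    the mean of the component hazard ratios, weighted by the baseline hazards."""
    weighted = sum(lam * hr for lam, hr in zip(hazards, hazard_ratios))
    return weighted / sum(hazards)


# Low mortality setting: lambda_1 = 0.14 (fatal), lambda_2 = 1.17 (non-fatal)
hr_low_mortality = composite_hazard_ratio([0.14, 1.17], [0.92, 0.75])

# Equal rates setting: lambda_1 = lambda_2 = 0.655
hr_equal_rates = composite_hazard_ratio([0.655, 0.655], [0.92, 0.75])

print(round(hr_low_mortality, 3), round(hr_equal_rates, 3))
```

The results, about 0.768 and 0.835, agree with the composite estimates of 0.77 and 0.83 to 0.84 reported in the simulation tables for θ=0 and ρ=1. With a low death rate the composite hazard ratio is dominated by the effect on non-fatal events.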
Whereas selection bias attenuates the treatment effect estimates, hazards that increase with each non-fatal event cause the total effect estimates to indicate a larger treatment effect than the direct effect alone. As a consequence, the differences between direct and total treatment effect estimates decrease with increasing degree of selection bias. Whereas \(\exp(\hat{\beta_{2}})\) decreases from 0.75 to 0.65 when hazards increase by 0 to 30% with each non-fatal event, under θ=0.6 it decreases only to 0.72 (Table 3). Under a higher mortality rate of \(\lambda_{1}=0.655\), no decrease in the total effect estimate is observed at all (\(\exp(\hat{\beta_{2}})=0.78\)), as here the selection bias starts to prevail (Table 4).
Discussion
Potential biases in the analysis of composite endpoints that comprise endpoints with multiple episodes, such as hospital admission, have been mostly overlooked so far. To advance the state of the art, we provided an accessible explanation of biases in this setting that is supported by simulation results. Our results show that the initial step in modeling must be to define the treatment effect that is of interest: a total treatment effect estimate can only be derived by analysing all events, whereas only the direct treatment effect can be estimated from analyses of 1st events or from analyses that are stratified by event history. When interpreting trial results that may have been derived from different statistical models, one must be aware that the direct effect estimates can be considerably more prone to selection bias. Our findings will help to move beyond the paradigm of considering first events only for approaches that use more information from the trial and augment interpretability, as has been called for in cardiovascular research [11, 12].
The association of some variable with the outcome is not a reasonable criterion for covariate selection in multiple regression, as has been described in epidemiology, for example to explain the birth-weight paradox [42]. We use similar arguments in randomized clinical trials to justify that adjusting or stratifying for the patients’ disease history within trial time is inadequate for estimating a treatment’s total effect.
Selection bias in the Cox proportional hazards model as arising from the non-collapsibility of the hazard ratio estimate [18, 28] has recently been described by Aalen et al. [20]. They use a hypothetical example, where each individual who dies is replaced by an identical individual having the same covariate structure, which would prevent selection bias. In a way, the Andersen-Gill model implements this idea for non-fatal recurrent events by leaving individuals in the risk set after having experienced an event. A terminal component of the composite will still cause selection bias under the Andersen-Gill and multi-state approach. Its magnitude depends on the terminal event rate. In our simulations the terminal event rate was small, as observed for most cardiovascular studies [37], and the multi-state models provided nearly unbiased results. In contrast, Rogers et al. [43] advocate the need for joint frailty models [38, 44] to prevent bias. However, their findings are based on simulation studies with high mortality rates (up to 31%), which explains the differing conclusions. Balan et al. [45] recently proposed a score test for deciding between multi-state and joint frailty modeling. All these findings confirm that using composite endpoints in randomized clinical trials cannot eliminate the bias arising from the association between the risk processes of the single components as long as only the first event is analyzed [46].
We have focused on the estimation of a treatment effect based on proportional hazards. Additive hazard models have been recommended instead as they are unaffected by non-collapsibility [20, 47].
Hazard ratios are used to assess the early benefit of new drugs compared to some control [48]. Our results indicate the need to further specify the estimand to which the assessment refers: a treatment’s direct effect or its total effect, as the two can differ substantially.
In recent years alternatives to hazard-based analyses of composite endpoints have been proposed based on weighted outcomes [49–51] to consider that not all components are of the same clinical relevance and importance for the patients. The multi-state approach proposed in this paper allows a separate investigation of treatment effects on the different components, and it seems to be important to compare both approaches with respect to interpretability of treatment effect estimation and power. Concerning power, the multi-state approach requires some kind of multiplicity adjustment as different treatment effects are estimated for the different components. Sequentially rejective test procedures provide a powerful and flexible tool to control type I error. As with other multivariate time to event outcomes, closed form solutions for sample size planning will be difficult to obtain [52], but simulation algorithms allow for an extensive investigation of sample size requirements, including for complex models [40, 52].
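The text does not name a specific sequentially rejective procedure. As one well-known example, the sketch below implements Holm's step-down procedure for the component-wise hypotheses; the function name and the p-values are hypothetical and chosen only for illustration.

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans, True where the corresponding null
    hypothesis is rejected while controlling the family-wise error rate.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject


# Invented p-values for the treatment effects on fatal and non-fatal events
decisions = holm([0.30, 0.004])
print(decisions)
```

In this invented example only the hypothesis for the non-fatal component is rejected: its p-value falls below 0.05/2, whereas the remaining p-value exceeds 0.05.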
Conclusion
This manuscript provides an accessible explanation of potential biases in treatment effect estimation when analysing composite endpoints. It illustrates that the risk for bias and its degree depend on whether first or multiple episodes per patient are analysed. Integrating multiple episodes into the statistical analysis model has the potential to reduce selection bias and to additionally capture indirect treatment effects. In particular for cardiovascular research, these findings may help to move beyond the paradigm of considering first events only.
Declarations
Acknowledgements
We thank Daniela Zoeller and two referees for their constructive comments improving the manuscript and Kathy Taylor for proof-reading.
Funding
This research was supported by a grant of the Deutsche Forschungsgemeinschaft (DFG), grant number JA 1821/4.
Availability of data and materials
Not applicable.
Authors’ contributions
AJ developed the method, produced the results and wrote the first draft of the manuscript. KI derived the sample size requirements for the STRONG-HF trial and contributed to the methods. SP implemented the simulations. AB designed the STRONG-HF trial and contributed to the introduction, results and discussion sections. HB contributed to all parts of the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
- Lim E, Brown A, Helmy A, Mussa S, Altman DG. Composite outcomes in cardiovascular research: a survey of randomized trials. Ann Intern Med. 2008; 149(9):612–17.
- Freemantle N, Calvert M, Wood J, Eastaugh J, Griffin C. Composite outcomes in randomized trials: greater precision but with greater uncertainty? JAMA. 2003; 289(19):2554–9.
- Ferreira-González I, Permanyer-Miralda G, Busse JW, Bryant DM, Montori VM, Alonso-Coello P, Walter SD, Guyatt GH. Methodologic discussions for using and interpreting composite endpoints are limited, but still identify major concerns. J Clin Epidemiol. 2007; 60(7):651–7.
- Freemantle N, Calvert M. Weighing the pros and cons for composite outcomes in clinical trials. J Clin Epidemiol. 2007; 60(7):658–9.
- Montori VM, Permanyer-Miralda G, Ferreira-González I, Busse JW, Pacheco-Huergo V, Bryant D, Alonso J, Akl EA, Domingo-Salvany A, Mills E, Wu P, Schünemann HJ, Jaeschke R, Guyatt GH. Validity of composite end points in clinical trials. BMJ. 2005; 330(7491):594–6.
- Chi GYH. Some issues with composite endpoints in clinical trials. Fundam Clin Pharmacol. 2005; 19(6):609–19.
- Ieva F, Jackson CH, Sharples LD. Multi-state modelling of repeated hospitalisation and death in patients with heart failure: The use of large administrative databases in clinical epidemiology. Stat Methods Med Res. 2017; 26(3):1350–72.
- Ip EH, Efendi A, Molenberghs G, Bertoni AG. Comparison of risks of cardiovascular events in the elderly using standard survival analysis and multiple-events and recurrent-events methods. BMC Med Res Methodol. 2015; 15(1):15.
- Rauch G, Rauch B, Schüler S, Kieser M. Opportunities and challenges of clinical trials in cardiology using composite primary endpoints. World J Cardiol. 2015; 7(1):1–5.
- Anker SD, Schroeder S, Atar D, Bax JJ, Ceconi C, Cowie MR, Crisp A, Dominjon F, Ford I, Ghofrani HA, Gropper S, Hindricks G, Hlatky MA, Holcomb R, Honarpour N, Jukema JW, Kim AM, Kunz M, Lefkowitz M, Le Floch C, Landmesser U, McDonagh TA, McMurray JJ, Merkely B, Packer M, Prasad K, Revkin J, Rosano GMC, Somaratne R, Stough WG, Voors AA, Ruschitzka F. Traditional and new composite endpoints in heart failure clinical trials: facilitating comprehensive efficacy assessments and improving trial efficiency. Eur J Heart Fail. 2016; 18(5):482–89.
- Anker SD, McMurray JJV. Time to move on from ‘time-to-first’: should all events be included in the analysis of clinical trials? Eur Heart J. 2012; 33(22):2764–5.
- Claggett B, Wei LJ, Pfeffer MA. Moving beyond our comfort zone. Eur Heart J. 2013; 34(12):869–71.
- Ingel K, Jahn-Eimermacher A. Sample-size calculation and reestimation for a semiparametric analysis of recurrent event data taking robust standard errors into account. Biom J. 2014; 56(4):631–48.
- Rogers JK, McMurray JJV, Pocock SJ, Zannad F, Krum H, van Veldhuisen DJ, Swedberg K, Shi H, Vincent J, Pitt B. Eplerenone in patients with systolic heart failure and mild symptoms: analysis of repeat hospitalizations. Circulation. 2012; 126(19):2317–23.
- Rogers JK, Pocock SJ, McMurray JJV, Granger CB, Michelson EL, Östergren J, Pfeffer MA, Solomon SD, Swedberg K, Yusuf S. Analysing recurrent hospitalizations in heart failure: a review of statistical methodology, with application to CHARM-Preserved. Eur J Heart Fail. 2014; 16(1):33–40.
- Rogers JK, Jhund PS, Perez AC, Böhm M, Cleland JG, Gullestad L, Kjekshus J, van Veldhuisen DJ, Wikstrand J, Wedel H, McMurray JJV, Pocock SJ. Effect of rosuvastatin on repeat heart failure hospitalizations: the CORONA Trial (Controlled Rosuvastatin Multinational Trial in Heart Failure). JACC Heart Fail. 2014; 2(3):289–97.
- Schmoor C, Schumacher M. Effects of covariate omission and categorization when analysing randomized trials with the Cox model. Stat Med. 1997; 16(1-3):225–37.
- Hernán MA. The Hazards of Hazard Ratios. Epidemiology. 2010; 21(1):13–5.
- Cécilia-Joseph E, Auvert B, Broët P, Moreau T. Influence of trial duration on the bias of the estimated treatment effect in clinical trials when individual heterogeneity is ignored. Biom J. 2015; 57(3):371–83.
- Aalen OO, Cook RJ, Røysland K. Does Cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Anal. 2015; 21(4):579–93.
- Andersen PK, Gill RD. Cox’s regression model for counting processes: a large sample study. Ann Stat. 1982; 10(4):1100–20.
- Jahn-Eimermacher A. Comparison of the Andersen-Gill model with Poisson and negative binomial regression on recurrent event data. Comput Stat Data Anal. 2008; 52(11):4989–97.
- Metcalfe C, Thompson SG. The importance of varying the event generation process in simulation studies of statistical methods for recurrent events. Stat Med. 2006; 25:165–79.
- Kelly PJ, Lim LL. Survival analysis for recurrent event data: an application to childhood infectious diseases. Stat Med. 2000; 19(1):13–33.
- Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model. New York: Springer; 2000.
- Cheung YB, Xu Y, Tan SH, Cutts F, Milligan P. Estimation of intervention effects using first or multiple episodes in clinical trials: The Andersen-Gill model re-examined. Stat Med. 2010; 29(3):328–36.
- Pearl J. Causal diagrams for empirical research. Biometrika. 1995; 82(4):669–88.
- Greenland S, Pearl J, Robins JM. Causal Diagrams for Epidemiological Research. Epidemiology. 1999; 10(1):37–48.
- Aalen OO, Røysland K, Gran JM, Kouyos R, Lange T. Can we believe the DAGs? A comment on the relationship between causal DAGs and mechanisms. Stat Methods Med Res. 2016; 25(5):2294–314.
- Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics. 1983; 39(2):499–503.
- Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, Poole C. Illustrating bias due to conditioning on a collider. Int J Epidemiol. 2010; 39(2):417–20.
- Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004; 15(5):615–25.
- Cox DR. Regression models and life-tables (with discussion). J R Stat Soc Ser B. 1972; 34(2):187–220.
- Aalen O. Nonparametric inference for a family of counting processes. Ann Stat. 1978; 6(4):701–26.
- Prentice RL, Williams BJ, Peterson AV. On the regression analysis of multivariate failure time data. Biometrika. 1981; 68:373–79.
- European Medicines Agency: EMEA/CHMP/EWP/311890/2007 - Guideline on the evaluation of medicinal products for cardiovascular disease prevention. 2008. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003290.pdf. Accessed June 2017.
- Ferreira-González I, Busse JW, Heels-Ansdell D, Montori VM, Akl EA, Bryant DM, Alonso-Coello P, Alonso J, Worster A, Upadhye S, Jaeschke R, Schünemann HJ, Permanyer-Miralda G, Pacheco-Huergo V, Domingo-Salvany A, Wu P, Mills EJ, Guyatt GH. Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials. BMJ. 2007; 334(7597):786.
- Mazroui Y, Mathoulin-Pelissier S, Soubeyran P, Rondeau V. General joint frailty model for recurrent event data with a dependent terminal event: Application to follicular lymphoma data. Stat Med. 2012; 31(11-12):1162–76.
- R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2014. http://www.R-project.org/.
- Jahn-Eimermacher A, Ingel K, Ozga AK, Preussler S, Binder H. Simulating recurrent event data with hazard functions defined on a total time scale. BMC Med Res Methodol. 2015; 15:16.
- Beyersmann J, Latouche A, Buchholz A, Schumacher M. Simulating competing risks data in survival analysis. Stat Med. 2009; 28(6):956–71.
- Hernández-Díaz S, Schisterman EF, Hernán MA. The birth weight paradox uncovered? Am J Epidemiol. 2006; 164(11):1115–20.
- Rogers JK, Yaroshinsky A, Pocock SJ, Stokar D, Pogoda J. Analysis of recurrent events with an associated informative dropout time: Application of the joint frailty model. Stat Med. 2016; 35(13):2195–205.
- Liu L, Wolfe RA, Huang X. Shared frailty models for recurrent events and a terminal event. Biometrics. 2004; 60(3):747–56.
- Balan TA, Boonk SE, Vermeer MH, Putter H. Score test for association between recurrent events and a terminal event. Stat Med. 2016; 35(18):3037–48.
- Wu L, Cook RJ. Misspecification of Cox regression models with composite endpoints. Stat Med. 2012; 31(28):3545–62.
- Martinussen T, Vansteelandt S. On collapsibility and confounding bias in Cox and Aalen regression models. Lifetime Data Anal. 2013; 19(3):279–96.
- Skipka G, Wieseler B, Kaiser T, Thomas S, Bender R, Windeler J, Lange S. Methodological approach to determine minor, considerable, and major treatment effects in the early benefit assessment of new drugs. Biom J. 2016; 58(1):43–58.
- Pocock SJ, Ariti CA, Collier TJ, Wang D. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. Eur Heart J. 2012; 33(2):176–82.
- Bebu I, Lachin JM. Large sample inference for a win ratio analysis of a composite outcome based on prioritized components. Biostatistics. 2016; 17(1):178–87.
- Rauch G, Jahn-Eimermacher A, Brannath W, Kieser M. Opportunities and challenges of combined effect measures based on prioritized outcomes. Stat Med. 2014; 33(7):1104–20.
- Rauch G, Beyersmann J. Planning and evaluating clinical trials with composite time-to-first-event endpoints in a competing risk framework. Stat Med. 2013; 32(21):3595–608.