Bayesian methods in clinical trials: a Bayesian analysis of ECOG trials E1684 and E1690
© Ibrahim et al.; licensee BioMed Central Ltd. 2012
Received: 10 August 2012
Accepted: 6 November 2012
Published: 29 November 2012
E1684 was the pivotal adjuvant melanoma trial for establishment of high-dose interferon (IFN) as effective therapy of high-risk melanoma patients. E1690 was an intriguing effort to corroborate E1684, and the differences between the outcomes of these trials have embroiled the field in controversy over the past several years. The analyses of E1684 and E1690 were carried out separately when the results were published, and there were no further analyses trying to perform a single analysis of the combined trials.
In this paper, we consider such a joint analysis by carrying out a Bayesian analysis of these two trials, thus providing us with a consistent and coherent methodology for combining the results from these two trials.
The Bayesian analysis using power priors provided a more coherent, flexible, and potentially more accurate analysis than a separate analysis of these data or a frequentist analysis of these data. The methodology provides a consistent framework for carrying out a single unified analysis by combining data from two or more studies.
Such Bayesian analyses can be crucial in situations where the results from two theoretically identical trials yield somewhat conflicting or inconsistent results.
Bayesian methods in clinical trials and biomedical research, in general, have become quite prominent in the last decade due to their flexibility in use, good operating characteristics, interpretability, and ability to handle design and analysis issues in complex models, such as survival models, models for longitudinal data, and models for discrete data. Bayesian methods are becoming more and more standard in the design and analysis of clinical trials [1, 2]. One main reason for this is their flexibility and operating characteristics, for example, in adaptive designs and interim monitoring. There are many good introductory as well as advanced level books on Bayesian methods [3–5].
The Bayesian paradigm differs from the frequentist paradigm in that our uncertainty about unknown parameters in a model is expressed through an entire distribution, called the prior distribution. Thus, the prior distribution for a model parameter θ expresses our prior uncertainty about the value of the parameter, where “prior” means that we express our uncertainty before collecting the data in the study. We denote the prior distribution for θ by π(θ). The main inferential tool in the Bayesian paradigm is called the posterior distribution, that is, the distribution of θ after the data are collected. By Bayes’ theorem, the posterior distribution of θ is proportional to the likelihood function of the data times the prior. In terms of a formula, it is given by π(θ∣Data) ∝ L(Data∣θ)π(θ), where π(θ∣Data) denotes the posterior distribution of θ, that is, the distribution of θ given the data, and L(Data∣θ) denotes the likelihood function of the data given the parameter θ.
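The proportionality π(θ∣Data) ∝ L(Data∣θ)π(θ) can be illustrated numerically. The following is a minimal sketch, not taken from the trials: it assumes a binomial likelihood for a single proportion θ, a flat prior on a grid, and made-up counts (7 events in 20 patients). The function name `grid_posterior` is hypothetical.

```python
import numpy as np

def grid_posterior(theta, prior, successes, trials):
    """Posterior weights on a grid: normalize likelihood(theta) * prior(theta)."""
    # Binomial likelihood kernel; the binomial coefficient does not depend
    # on theta, so it cancels when the product is normalized.
    likelihood = theta**successes * (1.0 - theta)**(trials - successes)
    unnormalized = likelihood * prior
    # Normalize so the grid weights sum to one.
    return unnormalized / unnormalized.sum()

# Illustrative (made-up) data: 7 events among 20 patients.
theta = np.linspace(0.001, 0.999, 999)
flat_prior = np.ones_like(theta) / theta.size   # noninformative prior on the grid
posterior = grid_posterior(theta, flat_prior, successes=7, trials=20)
posterior_mean = float((theta * posterior).sum())
```

Under these assumptions the exact posterior is Beta(8, 14), so the grid-based posterior mean should be close to 8/22. Replacing `flat_prior` with a peaked prior shows how an informative prior shifts the posterior away from the likelihood alone.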
A major consideration in the Bayesian paradigm is the choice of the prior. Priors that have a minimal impact on the overall Bayesian analysis are called noninformative priors. Other names for noninformative priors include flat, reference, or vague priors. Noninformative priors yield Bayesian inferences that are very similar to frequentist inference. A noninformative prior is “flat” relative to the likelihood function, that is, it is flat relative to the distribution of the data. On the other hand, informative priors generally do not lead to results that are similar to those of the frequentist paradigm; such priors are not flat relative to the likelihood function and do have an impact on the posterior in a Bayesian analysis. Examples of potentially informative priors used in biomedical research include priors incorporating historical data [6, 7].
Bayesian models can be fit in a wide variety of statistical packages including WinBUGS and SAS. These software packages have become very powerful in providing the data analyst with a wide array of flexibility and capability for fitting complex Bayesian models. Both of these software packages use Markov chain Monte Carlo (MCMC) methods to carry out the Bayesian computation. MCMC methods are simulation-based methods that draw samples from the posterior distribution of θ and have proven to be quite powerful for fitting even the most complex models that cannot be entertained in the frequentist paradigm.
Melanoma incidence is increasing at a rate that exceeds that of all other solid tumors. Although education efforts have resulted in earlier detection of melanoma, patients who have deep primary melanoma (>4mm) or melanoma metastatic to regional draining lymph nodes, classified as high-risk melanoma patients, continue to have high relapse and mortality rates of 50% or higher. Recently, several post-operative (adjuvant) chemotherapies have been proposed for this class of melanoma patients, and the one which seems to provide the most significant impact on relapse-free survival and survival is Interferon Alpha-2b (IFN). This immunotherapy was evaluated in two observation-controlled Eastern Cooperative Oncology Group (ECOG) phase III clinical trials, E1684 and E1690. The first trial, E1684, was a two-arm clinical trial comparing high-dose interferon (IFN) to Observation (OBS). A total of 286 patients were enrolled in the study, accrued from 1984 to 1990; the study was unblinded in 1993 and published in 1996. The results of this study suggested that IFN has a significant impact on relapse-free survival and survival, which led to the U.S. Food and Drug Administration (FDA) approval of this regimen as an adjuvant therapy for high-risk melanoma patients. Here, relapse-free survival is defined as the time from randomization until progression of tumor or death, whichever comes first, and survival is defined as time from randomization until death. This regimen is widely used for adjuvant therapy of high-risk melanoma patients and is the reference standard for evaluation of alternative modalities such as vaccines in current U.S. Cooperative group trials.
The significant treatment effect favoring IFN seen in E1684 with respect to both relapse-free survival (RFS) and overall survival (OS) was expected and was accompanied by substantial side effects due to the high-dose regimen. As a result, ECOG began a second trial (E1690) in 1991 to attempt to confirm the results of E1684 and to study the potential benefit of IFN given at a lower and less toxic dosage. The ECOG trial E1690 was a three-arm phase III clinical trial, with treatment arms of high-dose interferon, low-dose interferon, and observation. This study had 427 patients on the high-dose interferon arm and observation arm combined. Throughout our analyses in this paper, we will use only the data from these two arms of E1690. E1690 was initiated right after the completion of E1684. The E1690 trial accrued patients from 1991 until 1995, was unblinded in 1998, and published in 2000. The E1690 trial was designed for exactly the same patient population as E1684, and the high-dose interferon arm in E1690 was identical to that of E1684.
E1690 was a critical trial in the assessment of the value of high-dose IFN as adjuvant therapy for melanoma. When the results of E1690 were unblinded, separate results for E1684 and E1690 were reported, and analyses of the combined results were problematic and could not be resolved into one coherent analysis. In this paper, we propose to do a combined analysis of the E1684 and E1690 trials using Bayesian methods. The Bayesian methodology lends itself well to this type of analysis since the E1684 data can effectively be used as prior information for the E1690 analysis. Using the E1684 data as the historical data and the E1690 data as the current data is natural here within the Bayesian paradigm and this will serve as the basis for our analysis. We thus examine the problem of developing suitable statistical models for high-risk melanoma patients as well as the opportunity of conducting Bayesian inference in the presence of historical data. In this article, we will discuss a Bayesian analysis for the endpoints of RFS as well as OS. It was in the OS endpoint that the results were most inconsistent between the two trials, and it was this endpoint that led to most of the controversy. The RFS results were more or less consistent between E1684 and E1690. The results of this trial also raised the issue of whether RFS can be used at all as a suitable surrogate or predictor of OS, since the RFS and OS results were not consistent in E1690.
In the present context, the incorporation of historical data, i.e., the E1684 data, into the analysis of E1690 is a natural thing to do. Towards this goal, we use the power priors of Ibrahim and Chen to construct the prior distribution. Since the FDA has in the past often required a second confirmatory trial before approving a new drug for cancer therapy, historical data often exists for constructing prior distributions in a clinical trial from a previous trial or trials comparing identical or very similar treatment regimens. Such is the case for the E1684 and E1690 trials. Thus, it appears natural to use the E1684 data somehow for the analysis of E1690.
The construction of prior distributions from historical data has been examined under various contexts by Ibrahim and Chen, where the focus is on observable quantities in the prior elicitation scheme. Specifically, prior elicitation is based on the availability of historical data D₀ and a scalar precision parameter a₀ (0 ≤ a₀ ≤ 1) quantifying the uncertainty in D₀. Then, D₀ and a₀ are used to specify a prior distribution for the parameters in a “semi-automatic” fashion. Strictly speaking, D₀ can consist of prior predictions from past data, summary statistics from previous studies, or subjective elicitation based on case-specific information. However, the most natural specification of D₀ arises when the raw data from a similar previous study is available, and this is what we focus on in this paper. There are many advantages to this type of elicitation scheme. First, the prior is constructed in a semi-automatic fashion from the historical data in the sense that the prior itself is just a weighted likelihood and thus there is minimal subjective prior elicitation. Second, the precision parameter a₀ allows the investigator a great deal of control over the influence of the historical data on the current analysis. This is important in situations when one suspects heterogeneity between the patient populations, or when the sample sizes of the two studies are quite different.
Patient heterogeneity between the two studies.
Methods for incorporation of historical data.
Consistency of the results from the two studies.
A pooled analysis.
Assessing Observation and high-dose IFN time trends.
Should the analysis of E1690 be done independently of E1684?
If we are interested in the treatment of melanoma, how can E1684 be used as historical data for E1690?
How does E1684 impact the results?
Can we assess and control the impact of E1684 by weighting it somehow?
How do we weight the historical data to account for heterogeneity between the two studies?
For example, π = 0.25 means that 25% of the population are “cured” and 75% are “not cured”. If π = 0, then we obtain the survival model with survival function S0(t). The cure rate model fits the data better than the usual Cox model when a plateau occurs in the right tail of the survival curve, and thus, for E1684, and most other melanoma adjuvant trials in ECOG, the cure rate model fits the data better than the Cox model. The cure rate model also has other attractive properties. For example, the log-rank test has nice properties (i.e., high power) when a cure rate model is used in the statistical design [9, 10]. Also, the cure rate model is quite easy to fit and computationally very straightforward to program in SAS or R.
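The role of the cure fraction π can be sketched in a few lines. The sketch below assumes the standard mixture form S(t) = π + (1 − π)S₀(t), which is consistent with the statement that π = 0 recovers S₀(t), and it assumes an exponential S₀(t) purely for illustration. The function name `cure_rate_survival` and the hazard value are hypothetical, not estimates from E1684 or E1690.

```python
import math

def cure_rate_survival(t, cure_fraction, hazard):
    """Mixture cure model S(t) = pi + (1 - pi) * S0(t).

    S0(t) = exp(-hazard * t) is an assumed exponential survival function
    for the patients who are not cured.
    """
    if not 0.0 <= cure_fraction <= 1.0:
        raise ValueError("cure_fraction must lie in [0, 1]")
    baseline = math.exp(-hazard * t)
    return cure_fraction + (1.0 - cure_fraction) * baseline

# With pi = 0.25 the curve starts at 1 and levels off at 0.25 (the plateau).
early = cure_rate_survival(0.0, cure_fraction=0.25, hazard=0.5)
late = cure_rate_survival(50.0, cure_fraction=0.25, hazard=0.5)
```

The plateau at π in the right tail is the feature that a proportional hazards model with a proper baseline survival function cannot reproduce, since such a survival curve must eventually fall to zero.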
The specification of the prior distribution is called prior elicitation. Prior elicitation plays the most crucial role in Bayesian inference. Prior distributions based on historical data are useful in applied research settings where the investigator has access to previous studies measuring the same response and covariates as the current study.
We compare important demographic and prognostic factor data from both studies, including distributions of gender, nodal status, age, Breslow depth, site of primary, ulceration, stage of disease, as well as other prognostic factors deemed important. Although not shown here, these comparisons were conducted for E1684 and E1690, and the distributions of these prognostic factors matched remarkably well. That is, the distributions of each of the demographic variables were nearly identical for both studies.
A more formal procedure involves a comparison of the posterior distributions for each study. We can compare posterior summaries, such as posterior hazard ratios, posterior standard deviations, and Highest Posterior Density (HPD) intervals.
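As a sketch of how such a posterior summary might be computed from MCMC output, the function below finds the shortest interval containing a given fraction of the posterior draws, which approximates the HPD interval for a unimodal posterior. The name `hpd_interval` and the toy draws are hypothetical and do not come from either trial.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing about `mass` of the posterior draws."""
    draws = np.sort(np.asarray(samples, dtype=float))
    n = draws.size
    window = int(np.ceil(mass * n))        # number of draws inside the interval
    if window < 2 or window > n:
        raise ValueError("need enough draws and 0 < mass <= 1")
    # Width of every candidate interval that spans `window` consecutive draws.
    widths = draws[window - 1:] - draws[:n - window + 1]
    start = int(np.argmin(widths))         # the shortest candidate wins
    return float(draws[start]), float(draws[start + window - 1])

# Toy draws: the three tightly clustered values form the shortest 50% interval.
lower, upper = hpd_interval([0.0, 10.0, 11.0, 12.0, 13.0, 30.0], mass=0.5)
```

Applied to posterior draws of a hazard ratio from each study, the two resulting intervals can be compared directly.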
Maximum likelihood estimates of hazard ratios and cure rates for E1684 and E1690
The quantity π₀(θ) is the initial prior for θ. It represents the prior for θ before observing the historical data D₀. It is reasonable to restrict the range of a₀ to be between 0 and 1, since in general, it does not make sense to weight the historical data more than the current data. The parameter a₀ controls the heaviness of the tails of the prior for θ. As a₀ becomes smaller, the tails of equation (2) become heavier. Setting a₀ = 1, equation (2) corresponds to the update of π₀(θ) using Bayes' theorem. That is, with a₀ = 1, equation (2) corresponds to the posterior distribution of θ based on the historical data alone. When a₀ = 0, equation (2) does not depend on the historical data, and in this case, π(θ|D₀, a₀ = 0) ≡ π₀(θ). Thus, a₀ = 0 is equivalent to a prior specification with no incorporation of historical data. Equation (2) can be viewed as a generalization of the usual Bayesian update of π₀(θ), and therefore serves as a coherent update of π₀(θ). Details of the power prior, likelihood, and posterior for the models used in this paper are given in Additional file 1: Appendix A. We mention here that there are two ways of specifying the power prior: i) using a₀ as a fixed parameter and ii) treating a₀ as random and specifying a beta prior for it. In our experience, essentially similar results are obtained whether we take a₀ as fixed or random. The random a₀ case is more computationally intensive than the fixed a₀ case and probably not worth the extra modeling and computation, and thus we advocate fixing a₀ and using a goodness of fit criterion (see Additional file 1: Appendix B) to select the best a₀.
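The behavior of a₀ described above can be illustrated with a deliberately simple conjugate example. The sketch assumes the standard power prior form π(θ|D₀, a₀) ∝ L(θ|D₀)^a₀ π₀(θ), a binomial likelihood for an event probability θ, and a Beta(α₀, β₀) initial prior. This is a simplification for illustration only and is not the survival model of the paper. The function names and the event counts are hypothetical.

```python
def power_prior_beta(a0, hist_events, hist_n, alpha0=1.0, beta0=1.0):
    """Beta parameters of the power prior L(theta | D0)^a0 * Beta(alpha0, beta0)."""
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    # Raising the binomial likelihood to the power a0 scales the historical
    # event and non-event counts by a0.
    return alpha0 + a0 * hist_events, beta0 + a0 * (hist_n - hist_events)

def posterior_beta(a0, hist_events, hist_n, cur_events, cur_n,
                   alpha0=1.0, beta0=1.0):
    """Posterior Beta parameters after updating the power prior with current data."""
    a, b = power_prior_beta(a0, hist_events, hist_n, alpha0, beta0)
    return a + cur_events, b + (cur_n - cur_events)

def beta_mean(a, b):
    return a / (a + b)

# Made-up counts: 60/100 events historically, 40/100 events in the current study.
means = {a0: beta_mean(*posterior_beta(a0, 60, 100, 40, 100))
         for a0 in (0.0, 0.4, 1.0)}
```

With a₀ = 0 the historical study is ignored and the power prior reduces to the initial prior. With a₀ = 1 the two studies are pooled, and intermediate values move the posterior mean smoothly between these two extremes.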
In this section, we carry out a Bayesian analysis of E1690 using E1684 as historical data. For ease of exposition, all analyses are carried out with the treatment covariate alone, not adjusting for other prognostic factors. The two datasets were quite similar with respect to the distributions of several prognostic factors, including age, Breslow depth, number of nodes, performance status, gender, site of primary, and stage of disease. Prognostic factor analyses were conducted to examine the significance of time trend covariates and institutional effects for each study alone, as well as for the combined studies, for explaining the phenomenon in Figure 1. These factors were highly non-significant. We refer the reader to Kirkwood et al. for detailed summaries of the prognostic factor distributions for both studies, as well as detailed prognostic factor analyses and various subset analyses conducted using the Cox model. The various prognostic factor and subset analyses reported there yielded very similar results to analyses using the treatment covariate alone. In addition, Bayesian analyses with the proposed models using the prognostic factors mentioned above gave very similar results to the Bayesian analyses using the treatment covariate alone. Thus we conduct all analyses here with the treatment covariate alone for brevity.
[Table: Bayesian analysis of E1690 using E1684 as historical data. Columns: Weight (a₀), Posterior hazard ratio, 95% HPD interval, Posterior cure rate estimate (95% HPD interval)]
[Table: Maximum likelihood analysis of E1690 and E1684. Columns: Weight (a₀), Hazard ratio estimate]
The upward shifts in the OBS and IFN arms in E1690 relative to E1684 do not have a clear-cut explanation. It was conjectured that the standard of care improved with time, therefore resulting in improved RFS and OS in E1690. We carried out several analyses to assess this conjecture by fitting a time trend in the Cox model. As noted in Section 5, this time trend effect was highly non-significant in all models fit. Another analysis examining institutional effects was also conducted and yielded non-significant institutional effects. Another issue that remains very difficult to explain is why OS was much better for the OBS patients in E1690 than in E1684.
As noted earlier, the KM plot for OS showed almost total overlap between the IFN and OBS arms. One potential explanation for this is that patients on the OBS arm received salvage therapies after relapse that may have improved their OS. The inclusion of these salvage therapies was not accounted for in the OS analysis. It will take further confirmatory trials to get better answers to these two unresolved issues.
Moreover, we have carried out a sensitivity analysis with respect to the choice of the initial prior, using several possible choices of initial priors. To this end, we consider π₀(β, λ) = π₀(β)π₀(λ), where β ~ N(0, τ₀²I₂), I₂ is the 2 × 2 identity matrix, and λⱼ ~ Gamma(b₀₁, b₀₂) (j = 1, 2, …, J), where J = 5 for RFS and J = 10 for OS, with density proportional to λⱼ^(b₀₁ − 1) exp(−b₀₂λⱼ) and b₀₁ ≥ 0 and b₀₂ ≥ 0. For RFS, the posterior hazard ratios and 95% HPD intervals for the treatment effect were 1.294 and (0.977, 1.626) for a₀ = 0, 1.320 and (1.033, 1.611) for a₀ = 0.4, and 1.346 and (1.109, 1.616) for a₀ = 1 when τ₀² = 10 and b₀₁ = b₀₂ = 1; and 1.293 and (0.968, 1.628) for a₀ = 0, 1.320 and (1.037, 1.616) for a₀ = 0.4, and 1.342 and (1.098, 1.597) for a₀ = 1 when τ₀² = 2 and b₀₁ = b₀₂ = 10. For OS, the posterior hazard ratios and 95% HPD intervals for the treatment effect were 1.012 and (0.726, 1.303) for a₀ = 0, 1.081 and (0.832, 1.352) for a₀ = 0.4, and 1.138 and (0.910, 1.371) for a₀ = 1 when τ₀² = 10 and b₀₁ = b₀₂ = 1; and 1.017 and (0.742, 1.317) for a₀ = 0, 1.081 and (0.826, 1.350) for a₀ = 0.4, and 1.135 and (0.909, 1.371) for a₀ = 1 when τ₀² = 2 and b₀₁ = b₀₂ = 10. These results were quite similar to those given in Table 2. Other choices of (τ₀², b₀₁, b₀₂) were also tried and the results were similar. Thus, the analysis of E1684 and E1690 is quite robust to the choice of the initial prior.
We have presented a Bayesian analysis of E1684 and E1690 using the ideas of the power prior. This prior incorporates historical data in a natural way and gives the investigator a great deal of control over the weight given to the historical data through the parameter a₀. In Section 5, we carried out a detailed Bayesian analysis using this prior and observed that the analysis provided a more coherent, flexible, and potentially more accurate analysis than a separate analysis of these data or a frequentist analysis of these data. The methodology provides a consistent framework for carrying out a single unified analysis by combining data from two studies. Our analysis showed that using a₀ = 0.4 yielded the best fitting model based on DIC and LPML, therefore suggesting that this value is the optimal value in discounting the E1684 data in the Bayesian analysis of E1690. The Bayesian analysis using a₀ = 0.4 yielded markedly different results than those of a₀ = 0 and a₀ = 1 in terms of estimated hazard ratios and reductions in relapses and/or deaths in using IFN as compared to OBS. A philosophical issue that arises here is when such an analysis should be carried out. The Bayesian analysis of E1690 was conducted after seeing the inconsistent results between the studies. A more appropriate approach would be to specify these Bayesian analyses in the protocol itself before the E1690 data are even collected. Such a decision would also impact the design of E1690.
We would like to thank the Associate Editor and two referees for their helpful comments and suggestions, which have led to an improved version of the article.
This work was carried out with partial support from the U.S. National Institutes of Health (NIH) grants #GM 70335 and #CA 74015.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.