- Research article
- Open Access
- Open Peer Review
Joint distribution approaches to simultaneously quantifying benefit and risk
BMC Medical Research Methodologyvolume 6, Article number: 48 (2006)
The benefit-risk ratio has been proposed to measure the tradeoff between benefits and risks of two therapies for a single binary measure of efficacy and a single adverse event. The ratio is calculated from the difference in risk and difference in benefit between therapies. Small sample sizes or expected differences in benefit or risk can lead to no solution or problematic solutions for confidence intervals.
Alternatively, using the joint distribution of benefit and risk, confidence regions for the differences in risk and benefit can be constructed in the benefit-risk plane. The information in the joint distribution can be summarized by choosing regions of interest in this plane. Using Bayesian methodology provides a very flexible framework for summarizing information in the joint distribution.
Data from a National Institute of Child Health & Human Development trial of hydrocortisone illustrate the construction of confidence regions and regions of interest in the benefit-risk plane, where benefit is survival without supplemental oxygen at 36 weeks postmenstrual age, and risk is gastrointestinal perforation. For the subgroup of infants exposed to chorioamnionitis the confidence interval based on the benefit-risk ratio is wide (Benefit-risk ratio: 1.52; 90% confidence interval: 0.23 to 5.25). Choosing regions of appreciable risk and acceptable risk in the benefit-risk plane confirms the uncertainty seen in the wide confidence interval for the benefit-risk ratio – there is a greater than 50% chance of falling into the region of acceptable risk – while visually allowing the uncertainty in risk and benefit to be shown separately. Applying Bayesian methodology, an incremental net health benefit analysis shows there is a 72% chance of having a positive incremental net benefit if hydrocortisone is used in place of placebo if one is willing to incur at most one gastrointestinal perforation for each additional infant that survives without supplemental oxygen.
If the benefit-risk ratio is presented, the joint distribution of benefit and risk also should be shown. These regions avoid the ambiguity associated with collapsing benefit and risk to a single dimension. Bayesian methods allow even greater flexibility in simultaneously quantifying benefit and risk.
When comparing the effects of a new therapy with an existing therapy, it is not uncommon for the new therapy to show increased risks along with increased benefits. We consider the case of a single binary measure of efficacy and a single binary measure of risk or adverse event (absent/present, ever/never) and address the questions:
1. How do you appropriately measure the tradeoff between the benefit and risk of two therapies?
2. When should you conclude the increased benefit of a new therapy outweighs the potential increased risk?
Rather than focusing on hypothesis testing and controlling the type I error rate, our interest is in jointly quantifying benefit and risk.
The benefit-risk ratio
One method that has been suggested for measuring the tradeoff between a binary measure of benefit and a binary measure of risk is the benefit-risk ratio . The benefit-risk ratio is the ratio of the difference in benefit to difference in risk, or equivalently, the ratio of Number Needed to Harm (NNH) to Number Needed to Treat (NNT):
where p E and p c are the probabilities of benefit in the experimental treatment and control arms, respectively, and q E and q c are the probabilities of risk in the experimental treatment and control arms, respectively.
The benefit-risk ratio can be interpreted as the increase in the number of expected patients who will benefit for each additional adverse event that is incurred from using the experimental treatment rather than the control. The ratio also can be viewed in the benefit-risk plane as the slope of the line that passes through the origin and point defined by the observed difference in risk and difference in benefit as shown in Figure 1. The benefit-risk ratio is similar to the incremental cost-effectiveness ratio (ICER), which measures the tradeoff between the cost and effectiveness of two therapies. The ICER is defined as the ratio of the mean treatment difference in cost to the mean treatment difference in effectiveness for two therapies:
where γ E and γ C are average costs of the experimental and control conditions, respectively, and ε E and ε C are average effectiveness measures of the experimental and control conditions, respectively. One can similarly view the ICER in the cost-effectiveness plane. Distributional assumptions may differ for the benefit-risk ratio and cost-effectiveness ratio with cost generally considered a continuous measure. And while effectiveness appears in the denominator of the ICER, benefit is in the numerator of the benefit-risk ratio. Furthermore, although the current discussion focuses on a single binary measure of risk, consolidating multiple risks into a single measure may be more problematic than combining costs.
There is some ambiguity in reducing the difference in benefit and difference in risk to a single measure. As differing magnitudes of benefit and risk can result in the same ratio, control therapy could show more benefit and more risk and yield the same ratio as a new therapy which shows more benefit and more risk. Note in Figure 1 that any observed difference in benefit and observed difference in risk that falls on the line shown through the origin will produce the same benefit-risk ratio. For example, suppose the difference in benefit favors the new therapy over control and is 0.30, but the new therapy also increases the adverse event rate by 0.20; the resulting benefit-risk ratio is 1.5. However, if the difference in benefit favors control over the new therapy and is -0.30, but the new therapy reduces the adverse event rate by 0.20, then the resulting benefit risk ratio also is 1.5. When deciding whether the new therapy is acceptable, it is unlikely that these two scenarios would be considered equivalent. In the first scenario we are weighing increased benefit against increased risk, while in the latter we are weighing decreased benefit against decreased risk. Heitjan et al. highlighted similar complications for estimation of the ICER .
Confidence intervals can be constructed for the benefit-risk ratio using methods similar to those used to compute confidence intervals for cost-effectiveness ratios [3–5]. Assuming bivariate normality, Willan et al. showed that Fieller's theorem can be used to compute confidence intervals where the variance of the bivariate normal distribution is given by
where "hats" indicate the observed values of population parameters and b E and b C are the probabilities of simultaneous benefit and risk in the same subject for the experimental treatment and control arms, respectively . The variance is estimated () by replacing the population parameters with the observed values. Calculation of the confidence limits by Fieller's theorem involves matrix manipulation which can be done in several packages including PROC IML in SAS (SAS Institute, Inc., Cary, NC), Mathematica (Wolfram Research, Inc., Champaign, IL), S-PLUS (Insightful Corporation, Seattle, WA), or the free software R . Alternatively, the bootstrap can be used to construct confidence intervals using the percentile method .
Difficulties can arise in using either Fieller's theorem or the bootstrap methods to construct confidence intervals [1, 8, 4, 2, 9]. Intractable or problematic solutions can result using Fieller's theorem because of small sample sizes and/or small expected differences in benefit and/or risk. As shown in Figure 2, the confidence limits of the benefit-risk ratio also can be represented as slopes of lines in the benefit-risk plane, and there is a discontinuity in the distribution of the benefit-risk ratio when the difference in risk is 0. For the bootstrap method, it may be unclear how to order estimates from the bootstrap samples when they fall in multiple quadrants. Heitjan et al. proposed reordering the bootstrap samples for the ICER (modified percentile bootstrap), taking into account the quadrant in which the ratio falls . A more complete solution by Heitjan et al. uses Bayesian methodology and treats the ICER as a two-dimensional parameter composed of the ICER value and the quadrant in which the effectiveness difference and cost difference fall . This methodology has been extended to handle censored effectiveness data .
Other simultaneous measures of benefit and risk
Other measures have been suggested to summarize differences in benefit and risk. An early example is the work by Tallarida et al. on a severity scale developed through physician interviews which synthesizes information on disease severity and adverse drug reactions so that these considerations can be quantitatively incorporated into a benefit-risk analysis . Chuang-Stein et al. presented three ratio measures that require assigning weights to categories of the form: (1) benefit without adverse event, (2) benefit with adverse event, (3) no benefit and no adverse event, (4) no benefit with adverse event, and (5) unacceptable adverse event leading to withdrawal . While these ratios are more general than the benefit-risk ratio, specifying weights that reflect the relative importance of the categories may be difficult. Later work by Chuang-Stein discounts benefit by risk using consolidated safety data [12, 13]. As noted by Holden, these approaches do not clearly delineate benefit and risk which makes their interpretation more complicated than the traditional benefit-risk ratio .
Rather than collapsing the difference in benefit and difference in risk into a single dimension, the joint density of benefit and risk can be represented in the benefit-risk plane. Similar methods have been proposed for cost-effectiveness analyses [15, 16]. Confidence regions can be constructed either under the bivariate normal assumption or using the bootstrap and nonparametric density estimation. Assuming bivariate normality, the confidence region is an ellipse. To construct a nonparametric confidence region, we draw repeated (bootstrap) samples with replacement and compute a benefit difference and risk difference for each of the samples. Next we obtain a two-dimensional kernel density estimate using the set of bootstrap estimates and find a contour of the kernel density estimate that includes (1 - α) × 100% of the bootstrap estimates . Two-dimensional kernel density estimation methods are available for S-PLUS or R.
In addition to plotting the confidence region in the benefit-risk plan, we also can partition the benefit-risk plane into chosen regions of interest, e.g.,
No appreciable benefit
No conclusion ("gray region")
Experimental therapy superior
and look at the proportion of bootstrap estimates that fall into each region. These regions may be easier to specify for the clinician than the weights needed for the weighted benefit-risk ratios proposed by Chuang-Stein et al. .
As an alternative to the confidence region approach, using asymptotic theory, Bayesian inference can be based on the posterior distribution of the difference in benefit and difference in risk, assuming that the prior distribution is locally uniform (or continuous and nonzero) near the true difference in risk and difference in benefit . Using the posterior distribution,
the posterior probability of falling into the chosen regions can be computed . The integration required can be carried out using the numerical integration function N Integrate in Mathematica or similar software. The probability interpretation of the Bayesian analysis is more straightforward than the confidence interpretation associated with the bootstrapping approach.
Decision analysis also can be conducted under the Bayesian framework using linear combinations of the form
f(A, B) = A(p E - p C ) - B(q E - q C )
Point estimates and probability intervals for these linear combinations can be computed by taking a large number of draws from the posterior distribution and computing f(A, B) for each draw. The median of the draws can be used as a point estimate of f(A, B), and the 100α/2 and 100(1 - α/2) centiles of these draws form a 100(1 - α)% interval estimate.
These linear combinations also can be used to conduct benefit-risk analyses analogous to the incremental net health benefit (INHB)approach used in cost-effectiveness analyses [20, 21]. In the cost-effectiveness setting, the INHB of an experimental treatment compared to a control is defined as
INHB(λ) = (ε E - ε C ) - (γ E - γ C )/λ
where λ can be thought of as the maximum society is willing to pay for an incremental gain in health . One obvious advantage of this approach is that INHB is measured in units of effectiveness so the quadrant ambiguity of the cost-effectiveness approach is no longer an issue.
Analogously, in the benefit-risk setting, we'll define an incremental health benefit of the experimental therapy compared to the control as
INHB BR (δ) = (p E - p C ) - (q E - q C )/δ
where δ can be thought of as the maximum number of adverse events one is willing to incur for each subject that benefits. Alternatively, and perhaps more meaningfully, one can interpret 1/δ as the minimum number of subjects who should benefit for each additional adverse event. Integration over the posterior distribution of the risk difference and benefit difference can be used to compute Pr[INHB BR (δ) > 0] for a particular δ value or one can look at a plot of Pr[INHB BR (δ) > 0] over a range of δ values.
Although we have used large sample theory to assume the posterior distribution of the difference in risk and difference in benefit is bivariate normal, this assumption is not necessary for these Bayesian methods. As long as it is possible to simulate draws from the posterior distribution, these point estimates and probability intervals can be calculated under other distributional assumptions. Simulation approximations to the integration required to compute the posterior probabilities, Pr[INHB BR (δ) > 0], are obtained by computing the percentage of simulation draws for which INHB BR (δ) exceeds 0. Similar simulation approximations to integration can be used to compute posterior probabilities of falling into chosen regions of interest in the benefit-risk plane.
Results and discussion
The PROPHET study is a multicenter, randomized clinical trial comparing placebo (n = 180) to low-dose hydrocortisone therapy (n = 180) in the first two weeks of life in extremely low birth weight babies (500–999 grams) to prevent chronic lung disease sponsored by National Institute of Child Health & Human Development . Enrollment was stopped at 360 babies because of an increase in spontaneous gastrointestinal (GI) perforation in the hydrocortisone-treated group. The primary benefit outcome for the study was survival without supplemental oxygen at 36 weeks postmenstrual age. While low-dose hydrocortisone did not significantly improve survival without supplemental oxygen in the overall study population, within the subgroup of babies exposed to chorioamnionitis (an a priori subgroup of interest), the hydrocortisone-treated group had significantly higher survival without supplemental oxygen. A benefit-risk analysis allows further examination of the relationship between survival without supplemental oxygen and GI perforation in the chorioamnionitis subgroup. Table 1 shows the proportion of babies exposed to chorioamnionitis in each treatment group that showed benefit or experienced a GI perforation.
Using Fieller's theorem, the benefit-risk ratio for the chorioamnionitis subgroup is 1.52 (90% confidence interval: 0.23 to 5.25). Thus, about 3 additional babies will survive without supplemental oxygen for every 2 GI perforations incurred from using hydrocortisone instead of placebo. We note in this case that the confidence interval is wide and is not inconsistent with as many as 5 babies benefiting for each additional adverse event incurred when hydrocortisone is used in place of placebo. The 90% confidence ellipse assuming bivariate normality and 90% nonparametric confidence region based on 5000 bootstrap samples are shown in Figure 3. The bootstrap estimates for the 5000 samples also are shown. Despite the small expected cell counts for GI perforations in the placebo and hydrocortisone groups, for this example the nonparametric and bivariate normal regions are very similar.
As a hypothetical example of choosing regions of interest for the PROPHET study, we separate the benefit-risk plane into the following regions:
Appreciable Risk: Risk difference > 0.10
Acceptable Risk: Risk difference = 0.10
Hydrocortisone Superior: Benefit difference > 0.20
No Conclusion: 0.10 = Benefit difference = 0.20
No Appreciable Benefit: Benefit difference < 0.10
Estimates of the probabilities of falling into the selected regions are given in Table 2. The bootstrap proportions and posterior probabilities are similar and show that there is a greater than 50% chance of falling into the region of acceptable risk. However, within the acceptable risk region there is still a substantial chance that no conclusion can be reached.
Alternatively, Figure 4 shows a plot of the probability the incremental net health benefit (INHB BR ) of hydrocortisone compared to placebo exceeds zero over a range of 1/δ, which can be interpreted here as the minimum number of babies who should survive without supplemental oxygen for each additional GI perforation incurred. If the threshold is one additional survivor without supplemental oxygen for each additional GI perforation, the probability INHB BR (1)exceeds zero is approximately 0.72. This probability quickly drops off and falls below 50% when the threshold is approximately 1.5 additional survivors without supplemental oxygen for each additional GI perforation.
These findings are not conclusive and demonstrate the need for additional study to determine how hydrocortisone therapy might be used to provide benefit in these extremely low birth weight infants without increasing risk of GI perforation. One area of potential investigation is related to indomethacin therapy's role in the development of GI perforation. There is evidence in the PROPHET study of an interaction between hydrocortisone and early indomethacin therapy, although indomethacin was not randomized in this trial. In the absence of early indomethacin, low-dose hydrocortisone therapy administered as described for this study has not previously been associated with increased incidence of GI perforation . For this analysis S-PLUS was used to construct the confidence ellipse and nonparametric region. The two-dimensional kernel density estimation function kde and the ellipse-drawing function ellipse for S-PLUS or R are available from StatLib . Mathematica was used to compute the benefit-risk ratio and associated confidence interval and all posterior probabilities, but these computations also can be done using S-PLUS or R.
It is less ambiguous to jointly look at the difference in risk and difference in benefit in the benefit-risk plane than to collapse information by computing a benefit-risk ratio. If the benefit-risk ratio is reported, the joint distribution of benefit and risk also should be presented. When looking at the joint distribution, uncertainty in benefits and risks can be represented by confidence ellipses based on the assumption of bivariate normality or plots of estimates from bootstrap samples with or without a nonparametric confidence region. To quantify the probability of falling into regions of interest, the proportion of bootstrap estimates or posterior probabilities can be computed for particular regions. Bayesian methods provide a flexible framework in which to summarize the joint distribution of benefit and risk. Using the Bayesian framework allows one to easily conduct benefit-risk analyses similar to the incremental net health benefit analyses used for cost-effectiveness research. As this approach is based on linear combinations of benefit and risk, many of the inferential problems associated with ratios are avoided.
We have chosen to focus on the comparison of two therapies for a binary measure of benefit and a binary measure of risk, as the motivating PROPHET study had a binary primary benefit outcome and an increased rate of a single adverse event, spontaneous GI perforation, which resulted in an early stop of the trial. However, the Bayesian methods easily generalize to allow for other distributions of benefit and risk, provided one can simulate samples from the posterior distribution of interest. The Bayesian methods also allow prior information to be incorporated into the inference if such information is available. When it is of interest to compare more than two therapies, the benefit-risk approaches shown can be conducted in a pairwise fashion.
Willan AR, O'Brien BJ, Cook DJ: Benefit-risk ratios in the assessment of the clinical evidence of a new therapy. Control Clin Trials. 1997, 18: 121-130. 10.1016/S0197-2456(96)00092-X.
Heitjan DF, Moskowitz AJ, Whang W: Bayesian estimation of cost-effectiveness ratios from clinical trials. Health Econ. 1999, 8: 191-201. 10.1002/(SICI)1099-1050(199905)8:3<191::AID-HEC409>3.0.CO;2-R.
Willan AR, O'Brien BJ: Confidence intervals for cost-effectiveness ratios: an application of Fieller's theorem. Health Econ. 1996, 5: 297-305. 10.1002/(SICI)1099-1050(199607)5:4<297::AID-HEC216>3.0.CO;2-T.
Heitjan DF, Moskowitz AJ, Whang W: Problems with interval estimates of the incremental cost-effectiveness ratio. Med Decis Making. 1999, 19: 9-15.
Glick H, Polsky D: Evaluating stochastic uncertainty in cost-effectiveness analysis. 2003, [Society for Medical Decision Making short course], [http://www.uphs.upenn.edu/dgimhsr]
The R Project for Statistical Computing. [http://www.r-project.org]
Efron B: The jackknife, the bootstrap and other resampling plans. 1982, SIAM [Society for Industrial and Applied Mathematics]
Briggs AH, Wonderling DE, Mooney CZ: Pulling cost-effectiveness analysis up by its bootstraps: a non-parametric approach to confidence interval estimation. Health Econ. 1997, 6: 327-340. 10.1002/(SICI)1099-1050(199707)6:4<327::AID-HEC282>3.0.CO;2-W.
Heitjan DF, Kim CY, Li H: Bayesian estimation of cost-effectiveness from censored data. Stat Med. 2004, 23: 1297-1309. 10.1002/sim.1740.
Tallarida RJ, Murray RB, Eiben C: A scale for assessing the severity of diseases and adverse drug reactions. Clin Pharmacol Ther. 1979, 25: 381-390.
Chuang-Stein C, Mohberg NR, Sinkula MS: Three measures for simultaneously evaluating benefits and risks using categorical data from clinical trials. Stat Med. 1991, 10: 1349-1359.
Chuang-Stein C, Mohberg NR, Musselman DM: Organization and analysis of safety data using a multivariate approach. Stat Med. 1992, 11: 1075-1089.
Chuang-Stein C: A new proposal for benefit-less-risk analysis in clinical trials. Control Clin Trials. 1994, 15: 30-43. 10.1016/0197-2456(94)90026-4.
Holden WL: Benefit-risk analysis: a brief review and proposed quantitative approaches. Drug Saf. 2003, 26: 853-862. 10.2165/00002018-200326120-00002.
van Hout BA, Al MJ, Gordon GS, Rutten FF: Costs, effects and C/E-ratios alongside a clinical trial. Health Econ. 1994, 3: 309-319.
Noyes K, Holloway RG: Evidence from cost-effectiveness research. NeuroRx. 2004, 1: 348-355. 10.1602/neurorx.1.3.348.
Hall P: On the bootstrap and likelihood-based confidence regions. Biometrika. 1987, 74: 481-493. 10.2307/2336687.
Gelman A, Carlin JB, Stern HS, Rubin DB: Bayesian Data Analysis. 1995, London: Chapman & Hall
Berger JO: Statistical Decision Theory and Bayesian Analysis. 1985, New York: Springer, Second
Stinnett AA, Mullahy J: Net health benefits: a new framework for the analysis of uncertainty in cost-effectiveness analysis. Med Decis Making. 1998, 18: S68-S80.
Willan AR: Analysis, sample size, and power for estimating incremental net health benefit from clinical trial data. Control Clin Trials. 2001, 22: 228-237. 10.1016/S0197-2456(01)00110-6.
Watterberg KL, Gerdes JS, Cole CH, Aucott SW, Thilo EH, Mammel MC, Couser RJ, Garland JS, Rozycki HJ, Leach CL, Backstrom C, Shaffer ML: Prophylaxis of early adrenal insufficiency to prevent bronchopulmonary dysplasia: a multicenter trial. Pediatrics. 2004, 114: 1649-1657. 10.1542/peds.2004-1159.
Watterberg KL, Gerdes JS, Gifford KL, Lin HM: Prophylaxis against early adrenal insufficiency to prevent chronic lung disease in premature infants. Pediatrics. 1999, 104: 1258-1263. 10.1542/peds.104.6.1258.
StatLib – Software and extensions for the S (Splus) language. [http://lib.stat.cmu.edu/S]
The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1471-2288/6/48/prepub
Partial support for this research was provided under Grant No. R01 HD038540 from the National Institute of Child Health & Human Development. The authors would like to thank three reviewers for helpful comments that improved the manuscript.
The authors declare that they have no competing interests.
MS developed and conducted the statistical analysis and drafted the manuscript. KW conceived of and led the PROPHET study which serves as the illustrative example. All authors participated in the interpretation of the analysis and read and approved the final manuscript.