Joint distribution approaches to simultaneously quantifying benefit and risk

Background: The benefit-risk ratio has been proposed to measure the tradeoff between benefits and risks of two therapies for a single binary measure of efficacy and a single adverse event. The ratio is calculated from the difference in risk and difference in benefit between therapies. Small sample sizes or small expected differences in benefit or risk can lead to no solution or problematic solutions for confidence intervals.
Methods: Alternatively, using the joint distribution of benefit and risk, confidence regions for the differences in risk and benefit can be constructed in the benefit-risk plane. The information in the joint distribution can be summarized by choosing regions of interest in this plane. Using Bayesian methodology provides a very flexible framework for summarizing information in the joint distribution.
Results: Data from a National Institute of Child Health & Human Development trial of hydrocortisone illustrate the construction of confidence regions and regions of interest in the benefit-risk plane, where benefit is survival without supplemental oxygen at 36 weeks postmenstrual age, and risk is gastrointestinal perforation. For the subgroup of infants exposed to chorioamnionitis, the confidence interval based on the benefit-risk ratio is wide (benefit-risk ratio: 1.52; 90% confidence interval: 0.23 to 5.25). Choosing regions of appreciable risk and acceptable risk in the benefit-risk plane confirms the uncertainty seen in the wide confidence interval for the benefit-risk ratio (there is a greater than 50% chance of falling into the region of acceptable risk) while visually allowing the uncertainty in risk and benefit to be shown separately. Applying Bayesian methodology, an incremental net health benefit analysis shows there is a 72% chance of having a positive incremental net benefit if hydrocortisone is used in place of placebo, provided one is willing to incur at most one gastrointestinal perforation for each additional infant that survives without supplemental oxygen.
Conclusion: If the benefit-risk ratio is presented, the joint distribution of benefit and risk also should be shown. Regions in the benefit-risk plane avoid the ambiguity associated with collapsing benefit and risk to a single dimension. Bayesian methods allow even greater flexibility in simultaneously quantifying benefit and risk.

Background
When comparing the effects of a new therapy with an existing therapy, it is not uncommon for the new therapy to show increased risks along with increased benefits. We consider the case of a single binary measure of efficacy and a single binary measure of risk or adverse event (absent/present, ever/never) and address two questions:
1. How do you appropriately measure the tradeoff between the benefit and risk of two therapies?
2. When should you conclude the increased benefit of a new therapy outweighs the potential increased risk?
Rather than focusing on hypothesis testing and controlling the type I error rate, our interest is in jointly quantifying benefit and risk.

The benefit-risk ratio
One method that has been suggested for measuring the tradeoff between a binary measure of benefit and a binary measure of risk is the benefit-risk ratio [1]. The benefit-risk ratio is the ratio of the difference in benefit to the difference in risk, or equivalently, the ratio of the Number Needed to Harm (NNH) to the Number Needed to Treat (NNT):
$$\text{Benefit-risk ratio} = \frac{p_E - p_C}{q_E - q_C} = \frac{\mathrm{NNH}}{\mathrm{NNT}},$$
where $p_E$ and $p_C$ are the probabilities of benefit in the experimental treatment and control arms, respectively, and $q_E$ and $q_C$ are the probabilities of risk in the experimental treatment and control arms, respectively.
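As a minimal sketch of this definition, the following Python fragment computes the benefit-risk ratio and checks that it equals NNH/NNT. The function names and the four probabilities are hypothetical and chosen only for illustration; they are not taken from any trial.

```python
def benefit_risk_ratio(p_e, p_c, q_e, q_c):
    """Difference in benefit divided by difference in risk."""
    diff_benefit = p_e - p_c
    diff_risk = q_e - q_c
    if diff_risk == 0:
        # The ratio is undefined when the risk difference is exactly 0.
        raise ZeroDivisionError("risk difference is 0")
    return diff_benefit / diff_risk


def nnt_and_nnh(p_e, p_c, q_e, q_c):
    """Number Needed to Treat and Number Needed to Harm (reciprocal differences)."""
    return 1.0 / (p_e - p_c), 1.0 / (q_e - q_c)


# Hypothetical probabilities of benefit (p) and risk (q) in each arm.
p_e, p_c, q_e, q_c = 0.40, 0.25, 0.10, 0.05
ratio = benefit_risk_ratio(p_e, p_c, q_e, q_c)
nnt, nnh = nnt_and_nnh(p_e, p_c, q_e, q_c)
print(f"benefit-risk ratio = {ratio:.2f}, NNH / NNT = {nnh / nnt:.2f}")
```

With these illustrative inputs, both expressions give a ratio of about 3, that is, roughly three additional patients benefiting per additional adverse event.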
The benefit-risk ratio can be interpreted as the increase in the number of expected patients who will benefit for each additional adverse event that is incurred from using the experimental treatment rather than the control. The ratio also can be viewed in the benefit-risk plane as the slope of the line that passes through the origin and the point defined by the observed difference in risk and difference in benefit, as shown in Figure 1. The benefit-risk ratio is similar to the incremental cost-effectiveness ratio (ICER), which measures the tradeoff between the cost and effectiveness of two therapies. The ICER is defined as the ratio of the mean treatment difference in cost to the mean treatment difference in effectiveness for two therapies:
$$\mathrm{ICER} = \frac{\gamma_E - \gamma_C}{\varepsilon_E - \varepsilon_C},$$
where $\gamma_E$ and $\gamma_C$ are average costs of the experimental and control conditions, respectively, and $\varepsilon_E$ and $\varepsilon_C$ are average effectiveness measures of the experimental and control conditions, respectively. One can similarly view the ICER in the cost-effectiveness plane. Distributional assumptions may differ for the benefit-risk ratio and cost-effectiveness ratio, with cost generally considered a continuous measure. And while effectiveness appears in the denominator of the ICER, benefit is in the numerator of the benefit-risk ratio. Furthermore, although the current discussion focuses on a single binary measure of risk, consolidating multiple risks into a single measure may be more problematic than combining costs.
There is some ambiguity in reducing the difference in benefit and difference in risk to a single measure. Differing magnitudes, and even opposite signs, of the differences in benefit and risk can result in the same ratio: a new therapy that shows less benefit and less risk than control yields the same ratio as a new therapy that shows more benefit and more risk. Note in Figure 1 that any observed difference in benefit and observed difference in risk that falls on the line shown through the origin will produce the same benefit-risk ratio. For example, suppose the difference in benefit favors the new therapy over control and is 0.30, but the new therapy also increases the adverse event rate by 0.20; the resulting benefit-risk ratio is 1.5. However, if the difference in benefit favors control over the new therapy and is -0.30, but the new therapy reduces the adverse event rate by 0.20, then the resulting benefit-risk ratio also is 1.5. When deciding whether the new therapy is acceptable, it is unlikely that these two scenarios would be viewed as equivalent.
Figure 1. The benefit-risk ratio in the benefit-risk plane. The benefit-risk ratio is the slope of the line which passes through the origin and the point defined by the observed difference in risk and observed difference in benefit.
Confidence intervals can be constructed for the benefit-risk ratio using methods similar to those used to compute confidence intervals for cost-effectiveness ratios [3][4][5]. Assuming bivariate normality, Willan et al. showed that Fieller's theorem can be used to compute confidence intervals, where the bivariate normal distribution and its variance are given by
$$\begin{pmatrix} \hat{p}_E - \hat{p}_C \\ \hat{q}_E - \hat{q}_C \end{pmatrix} \sim N\left( \begin{pmatrix} p_E - p_C \\ q_E - q_C \end{pmatrix},\; V \right), \qquad V = \begin{pmatrix} \dfrac{p_E(1 - p_E)}{n_E} + \dfrac{p_C(1 - p_C)}{n_C} & \dfrac{b_E - p_E q_E}{n_E} + \dfrac{b_C - p_C q_C}{n_C} \\[2ex] \dfrac{b_E - p_E q_E}{n_E} + \dfrac{b_C - p_C q_C}{n_C} & \dfrac{q_E(1 - q_E)}{n_E} + \dfrac{q_C(1 - q_C)}{n_C} \end{pmatrix},$$
where "hats" indicate the observed values of population parameters, $n_E$ and $n_C$ are the numbers of subjects in the experimental treatment and control arms, respectively, and $b_E$ and $b_C$ are the probabilities of simultaneous benefit and risk in the same subject for the experimental treatment and control arms, respectively [1]. The variance is estimated ($\hat{V}$) by replacing the population parameters with the observed values.
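The estimated variance can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: each arm is summarized by four counts (benefit and risk, benefit only, risk only, neither), the variances and covariance are the standard results for differences of independent binomial proportions, and the function name and counts are hypothetical.

```python
def difference_estimates(exp_counts, ctl_counts):
    """Return ((diff_benefit, diff_risk), covariance matrix).

    Each argument is a tuple of counts for one arm:
    (benefit and risk, benefit only, risk only, neither).
    """
    def arm_summary(counts):
        both, benefit_only, risk_only, neither = counts
        n = both + benefit_only + risk_only + neither
        p = (both + benefit_only) / n   # observed probability of benefit
        q = (both + risk_only) / n      # observed probability of risk
        b = both / n                    # observed probability of benefit and risk
        return p, q, p * (1 - p) / n, q * (1 - q) / n, (b - p * q) / n

    p_e, q_e, var_pe, var_qe, cov_e = arm_summary(exp_counts)
    p_c, q_c, var_pc, var_qc, cov_c = arm_summary(ctl_counts)
    # The arms are independent, so variances and covariances add.
    v_bb = var_pe + var_pc
    v_rr = var_qe + var_qc
    v_br = cov_e + cov_c
    return (p_e - p_c, q_e - q_c), ((v_bb, v_br), (v_br, v_rr))


# Hypothetical counts for two arms of 80 subjects each (not trial data).
(diff_b, diff_r), cov = difference_estimates((2, 30, 6, 42), (1, 19, 2, 58))
print(diff_b, diff_r, cov)
```

The first coordinate is the difference in benefit and the second the difference in risk, so `cov[0][1]` is the covariance between the two differences.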
Calculation of the confidence limits by Fieller's theorem involves matrix manipulation which can be done in several packages including PROC IML in SAS (SAS Institute, Inc., Cary, NC), Mathematica (Wolfram Research, Inc., Champaign, IL), S-PLUS (Insightful Corporation, Seattle, WA), or the free software R [6]. Alternatively, the bootstrap can be used to construct confidence intervals using the percentile method [7].
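For a ratio of two jointly normal estimates, Fieller's theorem reduces to solving a quadratic inequality, which can be sketched without matrix software. The Python fragment below is a minimal sketch under that assumption; the function name, the labels it returns, and the numeric inputs are hypothetical, and it is not the code used for the analysis reported here.

```python
import math
from statistics import NormalDist


def fieller_interval(diff_benefit, diff_risk, v_bb, v_rr, v_br, level=0.90):
    """Fieller confidence set for diff_benefit / diff_risk.

    Solves (diff_benefit - R * diff_risk)^2 <= z^2 * (v_bb - 2 R v_br + R^2 v_rr)
    for R. Returns (kind, limits) where kind is
      'bounded'    : the set is the interval [lo, hi];
      'complement' : the set is everything outside the open interval (lo, hi);
      'unbounded'  : no finite limits are reported (the set is the whole line
                     or, in the degenerate case a == 0, a half-line).
    """
    z2 = NormalDist().inv_cdf(0.5 + level / 2.0) ** 2
    a = diff_risk ** 2 - z2 * v_rr
    b = -2.0 * (diff_benefit * diff_risk - z2 * v_br)
    c = diff_benefit ** 2 - z2 * v_bb
    disc = b * b - 4.0 * a * c
    if a != 0 and disc > 0:
        roots = sorted(((-b - math.sqrt(disc)) / (2.0 * a),
                        (-b + math.sqrt(disc)) / (2.0 * a)))
        # a > 0 means the risk difference is clearly separated from 0.
        return ("bounded" if a > 0 else "complement"), tuple(roots)
    return "unbounded", None


# Hypothetical inputs: a large, well-estimated risk difference ...
print(fieller_interval(0.30, 0.20, 0.004, 0.002, 0.0))
# ... and a small risk difference whose own interval includes 0.
print(fieller_interval(0.15, 0.0625, 0.00534375, 0.001576171875, -0.0001484375))
```

The second call illustrates the kind of problematic solution discussed in the text: when the risk difference is not clearly different from zero, the confidence set is not a finite interval.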
Difficulties can arise in using either Fieller's theorem or the bootstrap methods to construct confidence intervals [1,8,4,2,9]. Intractable or problematic solutions can result using Fieller's theorem because of small sample sizes and/or small expected differences in benefit and/or risk. As shown in Figure 2, the confidence limits of the benefit-risk ratio also can be represented as slopes of lines in the benefit-risk plane, and there is a discontinuity in the distribution of the benefit-risk ratio when the difference in risk is 0. For the bootstrap method, it may be unclear how to order estimates from the bootstrap samples when they fall in multiple quadrants. Heitjan et al. proposed reordering the bootstrap samples for the ICER (modified percentile bootstrap), taking into account the quadrant in which the ratio falls [4]. A more complete solution by Heitjan et al. uses Bayesian methodology and treats the ICER as a two-dimensional parameter composed of the ICER value and the quadrant in which the effectiveness difference and cost difference fall [2]. This methodology has been extended to handle censored effectiveness data [9].

Other simultaneous measures of benefit and risk
Other measures have been suggested to summarize differences in benefit and risk. An early example is the work by Tallarida et al. on a severity scale developed through physician interviews which synthesizes information on disease severity and adverse drug reactions so that these considerations can be quantitatively incorporated into a benefit-risk analysis [10]. Chuang-Stein et al. presented three ratio measures that require assigning weights to categories of the form: (1) benefit without adverse event, (2) benefit with adverse event, (3) no benefit and no adverse event, (4) no benefit with adverse event, and (5) unacceptable adverse event leading to withdrawal [11]. While these ratios are more general than the benefit-risk ratio, specifying weights that reflect the relative importance of the categories may be difficult. Later work by Chuang-Stein discounts benefit by risk using consolidated safety data [12,13]. As noted by Holden, these approaches do not clearly delineate benefit and risk which makes their interpretation more complicated than the traditional benefit-risk ratio [14].
Figure 2. Confidence limits in the benefit-risk plane. The confidence limits of the benefit-risk ratio can be represented as slopes of lines (dotted) which pass through the origin. A discontinuity exists when the difference in risk is 0.

Confidence regions
Rather than collapsing the difference in benefit and difference in risk into a single dimension, the joint density of benefit and risk can be represented in the benefit-risk plane. Similar methods have been proposed for cost-effectiveness analyses [15,16]. Confidence regions can be constructed either under the bivariate normal assumption or using the bootstrap and nonparametric density estimation. Assuming bivariate normality, the confidence region is an ellipse. To construct a nonparametric confidence region, we draw repeated (bootstrap) samples with replacement and compute a benefit difference and risk difference for each of the samples. Next we obtain a two-dimensional kernel density estimate using the set of bootstrap estimates and find a contour of the kernel density estimate that includes (1 - α) × 100% of the bootstrap estimates [17]. Two-dimensional kernel density estimation methods are available for S-PLUS or R.
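The bivariate normal case can be sketched directly, because the (1 - α) ellipse is the set of points whose squared Mahalanobis distance from the estimate is at most the chi-square quantile with 2 degrees of freedom, which equals -2 ln(α). The Python fragment below is an illustrative sketch only: the function names, the center, and the covariance values are hypothetical, and the coordinate order (difference in benefit, difference in risk) is an arbitrary choice.

```python
import math


def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance of a 2-D point from the center."""
    (v11, v12), (_, v22) = cov
    det = v11 * v22 - v12 * v12
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (v22 * dx * dx - 2.0 * v12 * dx * dy + v11 * dy * dy) / det


def inside_confidence_ellipse(point, center, cov, alpha=0.10):
    """True if the point lies inside the (1 - alpha) confidence ellipse."""
    return mahalanobis_sq(point, center, cov) <= -2.0 * math.log(alpha)


def ellipse_boundary(center, cov, alpha=0.10, n_points=100):
    """Points on the ellipse boundary, e.g. for plotting."""
    (v11, v12), (_, v22) = cov
    # Cholesky factor of the 2 x 2 covariance matrix.
    l11 = math.sqrt(v11)
    l21 = v12 / l11
    l22 = math.sqrt(v22 - l21 * l21)
    radius = math.sqrt(-2.0 * math.log(alpha))
    points = []
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        u, w = radius * math.cos(angle), radius * math.sin(angle)
        points.append((center[0] + l11 * u, center[1] + l21 * u + l22 * w))
    return points


# Hypothetical estimate (difference in benefit, difference in risk) and covariance.
center = (0.15, 0.0625)
cov = ((0.0053, -0.00015), (-0.00015, 0.0016))
boundary = ellipse_boundary(center, cov)
print(inside_confidence_ellipse((0.0, 0.0), center, cov))
```

Checking whether the origin lies inside the ellipse is one simple use: if it does, the data are compatible at that level with no difference in either benefit or risk.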
In addition to plotting the confidence region in the benefit-risk plane, we also can partition the benefit-risk plane into chosen regions of interest (e.g., a region in which the experimental therapy is superior) and look at the proportion of bootstrap estimates that fall into each region. These regions may be easier to specify for the clinician than the weights needed for the weighted benefit-risk ratios proposed by Chuang-Stein et al. [11].
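The bootstrap proportions can be sketched as follows. Everything specific in this Python fragment is an assumption made for illustration: the counts are hypothetical, and the four regions and the threshold `min_ratio` are one possible partition of the plane, not the regions used in this article.

```python
import random
from collections import Counter


def expand(counts):
    """Turn (both, benefit only, risk only, neither) counts into subject records."""
    both, benefit_only, risk_only, neither = counts
    return ([(1, 1)] * both + [(1, 0)] * benefit_only
            + [(0, 1)] * risk_only + [(0, 0)] * neither)


def rates(arm):
    """Observed proportions with benefit and with the adverse event."""
    n = len(arm)
    return sum(b for b, _ in arm) / n, sum(r for _, r in arm) / n


def bootstrap_differences(exp_arm, ctl_arm, n_boot, rng):
    """Resample each arm with replacement; return (diff_benefit, diff_risk) pairs."""
    draws = []
    for _ in range(n_boot):
        p_e, q_e = rates(rng.choices(exp_arm, k=len(exp_arm)))
        p_c, q_c = rates(rng.choices(ctl_arm, k=len(ctl_arm)))
        draws.append((p_e - p_c, q_e - q_c))
    return draws


def classify(diff_benefit, diff_risk, min_ratio=1.0):
    """Assign a point in the benefit-risk plane to a hypothetical region."""
    if diff_benefit > 0 and diff_risk <= 0:
        return "experimental superior"
    if diff_benefit <= 0 and diff_risk >= 0:
        return "control superior"
    if diff_benefit > 0:  # more benefit and more risk
        if diff_benefit >= min_ratio * diff_risk:
            return "more benefit, acceptable risk"
        return "more benefit, excess risk"
    return "less benefit, less risk"


rng = random.Random(12345)
draws = bootstrap_differences(expand((2, 30, 6, 42)), expand((1, 19, 2, 58)), 2000, rng)
tally = Counter(classify(db, dr) for db, dr in draws)
proportions = {region: count / len(draws) for region, count in tally.items()}
print(proportions)
```

Because the regions partition the plane, the proportions sum to one, and changing `min_ratio` shows how sensitive the summary is to the chosen tradeoff.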

Bayesian methods
As an alternative to the confidence region approach, Bayesian inference can be based on the asymptotic posterior distribution of the difference in benefit and difference in risk, assuming that the prior distribution is locally uniform (or continuous and nonzero) near the true difference in risk and difference in benefit [18]. Using the posterior distribution, the posterior probability of falling into the chosen regions can be computed [19]. The integration required can be carried out using the numerical integration function NIntegrate in Mathematica or similar software. The probability interpretation of the Bayesian analysis is more straightforward than the confidence interpretation associated with the bootstrapping approach. Linear combinations of the difference in benefit and difference in risk also can be used to conduct benefit-risk analyses analogous to the incremental net health benefit (INHB) approach used in cost-effectiveness analyses [20,21]. In the cost-effectiveness setting, the INHB of an experimental treatment compared to a control is defined as
$$\mathrm{INHB} = (\varepsilon_E - \varepsilon_C) - \frac{\gamma_E - \gamma_C}{\lambda},$$
where λ can be thought of as the maximum society is willing to pay for an incremental gain in health [20]. One obvious advantage of this approach is that INHB is measured in units of effectiveness, so the quadrant ambiguity of the cost-effectiveness ratio is no longer an issue.
Analogously, in the benefit-risk setting, we define an incremental net health benefit of the experimental therapy compared to the control as
$$\mathrm{INHB}_{BR}(\delta) = (p_E - p_C) - \frac{q_E - q_C}{\delta},$$
where δ can be thought of as the maximum number of adverse events one is willing to incur for each subject that benefits. Alternatively, and perhaps more meaningfully, one can interpret 1/δ as the minimum number of subjects who should benefit for each additional adverse event. Integration over the posterior distribution of the risk difference and benefit difference can be used to compute point estimates and probability intervals for $\mathrm{INHB}_{BR}(\delta)$, as well as the posterior probability $\Pr[\mathrm{INHB}_{BR}(\delta) > 0]$.
Although we have used large sample theory to assume the posterior distribution of the difference in risk and difference in benefit is bivariate normal, this assumption is not necessary for these Bayesian methods. As long as it is possible to simulate draws from the posterior distribution, these point estimates and probability intervals can be calculated under other distributional assumptions. Simulation approximations to the integration required to compute the posterior probabilities, $\Pr[\mathrm{INHB}_{BR}(\delta) > 0]$, are obtained by computing the percentage of simulation draws for which $\mathrm{INHB}_{BR}(\delta)$ exceeds 0. Similar simulation approximations to integration can be used to compute posterior probabilities of falling into chosen regions of interest in the benefit-risk plane.
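A simulation approximation of this kind can be sketched in Python. The fragment makes several assumptions that are not part of the analysis reported here: it takes the incremental net health benefit to be the difference in benefit minus the difference in risk divided by δ (consistent with the interpretation of δ given above), it replaces the bivariate normal posterior with independent Dirichlet posteriors for each arm under a uniform Dirichlet(1, 1, 1, 1) prior, and the counts and function names are hypothetical.

```python
import random


def dirichlet_draw(alphas, rng):
    """One draw from a Dirichlet distribution via normalized gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def posterior_inhb_draws(exp_counts, ctl_counts, delta, n_draws, rng):
    """Posterior draws of (diff_benefit) - (diff_risk) / delta.

    Counts are (benefit and risk, benefit only, risk only, neither) per arm.
    Assumption: uniform Dirichlet prior on the four cell probabilities.
    """
    draws = []
    for _ in range(n_draws):
        e = dirichlet_draw([c + 1 for c in exp_counts], rng)
        c = dirichlet_draw([k + 1 for k in ctl_counts], rng)
        diff_benefit = (e[0] + e[1]) - (c[0] + c[1])
        diff_risk = (e[0] + e[2]) - (c[0] + c[2])
        draws.append(diff_benefit - diff_risk / delta)
    return draws


def prob_positive(draws):
    """Proportion of posterior draws with a positive net benefit."""
    return sum(d > 0 for d in draws) / len(draws)


rng = random.Random(2024)
# Hypothetical counts; delta = 1 tolerates at most one adverse event per benefit.
draws = posterior_inhb_draws((2, 30, 6, 42), (1, 19, 2, 58), 1.0, 5000, rng)
print(f"Pr[INHB > 0] is approximately {prob_positive(draws):.2f}")
```

Any other posterior that can be sampled could be substituted for `dirichlet_draw` without changing the rest of the calculation.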

Results and discussion
The PROPHET study is a multicenter, randomized clinical trial, sponsored by the National Institute of Child Health & Human Development, comparing placebo (n = 180) to low-dose hydrocortisone therapy (n = 180) in the first two weeks of life for the prevention of chronic lung disease in extremely low birth weight babies (500-999 grams) [22]. Enrollment was stopped at 360 babies because of an increase in spontaneous gastrointestinal (GI) perforation in the hydrocortisone-treated group. The primary benefit outcome for the study was survival without supplemental oxygen at 36 weeks postmenstrual age. While low-dose hydrocortisone did not significantly improve survival without supplemental oxygen in the overall study population, within the subgroup of babies exposed to chorioamnionitis (an a priori subgroup of interest), the hydrocortisone-treated group had significantly higher survival without supplemental oxygen. A benefit-risk analysis allows further examination of the relationship between survival without supplemental oxygen and GI perforation in the chorioamnionitis subgroup. Table 1 shows the proportion of babies exposed to chorioamnionitis in each treatment group that showed benefit or experienced a GI perforation.
Using Fieller's theorem, the benefit-risk ratio for the chorioamnionitis subgroup is 1.52 (90% confidence interval: 0.23 to 5.25). Thus, about 3 additional babies will survive without supplemental oxygen for every 2 GI perforations incurred from using hydrocortisone instead of placebo. We note in this case that the confidence interval is wide and is not inconsistent with as many as 5 babies benefiting for each additional adverse event incurred when hydrocortisone is used in place of placebo. The 90% confidence ellipse assuming bivariate normality and 90% nonparametric confidence region based on 5000 bootstrap samples are shown in Figure 3. The bootstrap estimates for the 5000 samples also are shown. Despite the small expected cell counts for GI perforations in the placebo and hydrocortisone groups, for this example the nonparametric and bivariate normal regions are very similar.
As a hypothetical example of choosing regions of interest for the PROPHET study, we separate the benefit-risk plane into regions that include a region of appreciable risk and a region of acceptable risk. Estimates of the probabilities of falling into the selected regions are given in Table 2. The bootstrap proportions and posterior probabilities are similar and show that there is a greater than 50% chance of falling into the region of acceptable risk. However, within the acceptable risk region there is still a substantial chance that no conclusion can be reached.
Alternatively, Figure 4 shows a plot of the probability that the incremental net health benefit ($\mathrm{INHB}_{BR}$) of hydrocortisone compared to placebo exceeds zero over a range of 1/δ, which can be interpreted here as the minimum number of babies who should survive without supplemental oxygen for each additional GI perforation incurred. If the threshold is one additional survivor without supplemental oxygen for each additional GI perforation, the probability that $\mathrm{INHB}_{BR}(1)$ exceeds zero is approximately 0.72. This probability quickly drops off and falls below 50% when the threshold is approximately 1.5 additional survivors without supplemental oxygen for each additional GI perforation.
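Under a bivariate normal posterior, a curve of this kind has a closed form, because a linear combination of jointly normal quantities is itself normal. The Python sketch below assumes the net benefit is the difference in benefit minus 1/δ times the difference in risk; the function name and all numeric inputs are hypothetical and are not the PROPHET estimates.

```python
import math
from statistics import NormalDist


def prob_inhb_positive(diff_benefit, diff_risk, v_bb, v_rr, v_br, min_benefit_per_event):
    """Posterior probability that diff_benefit - k * diff_risk > 0, with k = 1/delta.

    Assumes a bivariate normal posterior centered at the observed differences
    with variances v_bb, v_rr and covariance v_br.
    """
    k = min_benefit_per_event
    mean = diff_benefit - k * diff_risk
    variance = v_bb - 2.0 * k * v_br + k * k * v_rr
    return NormalDist().cdf(mean / math.sqrt(variance))


# Hypothetical posterior summaries.
diff_b, diff_r = 0.15, 0.0625
v_bb, v_rr, v_br = 0.0053, 0.0016, -0.00015
for k in (0.5, 1.0, 2.0, diff_b / diff_r, 4.0):
    p = prob_inhb_positive(diff_b, diff_r, v_bb, v_rr, v_br, k)
    print(f"1/delta = {k:.2f}: Pr[INHB > 0] = {p:.3f}")
```

In this normal approximation the probability equals 0.5 exactly when 1/δ equals the ratio of the difference in benefit to the difference in risk, which is why a curve of this type crosses 50% near the point estimate of the benefit-risk ratio.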
These findings are not conclusive and demonstrate the need for additional study to determine how hydrocortisone therapy might be used to provide benefit in these extremely low birth weight infants without increasing risk of GI perforation. One area of potential investigation is related to indomethacin therapy's role in the development of GI perforation. There is evidence in the PROPHET study of an interaction between hydrocortisone and early indomethacin therapy, although indomethacin was not randomized in this trial. In the absence of early indomethacin, low-dose hydrocortisone therapy administered as described for this study has not previously been associated with increased incidence of GI perforation [23].
For this analysis S-PLUS was used to construct the confidence ellipse and nonparametric region. The two-dimensional kernel density estimation function kde and the ellipse-drawing function ellipse for S-PLUS or R are available from StatLib [24]. Mathematica was used to compute the benefit-risk ratio and associated confidence interval and all posterior probabilities, but these computations also can be done using S-PLUS or R.

Conclusion
It is less ambiguous to jointly look at the difference in risk and difference in benefit in the benefit-risk plane than to collapse information by computing a benefit-risk ratio. If the benefit-risk ratio is reported, the joint distribution of benefit and risk also should be presented. When looking at the joint distribution, uncertainty in benefits and risks can be represented by confidence ellipses based on the assumption of bivariate normality or plots of estimates from bootstrap samples with or without a nonparametric confidence region. To quantify the probability of falling into regions of interest, the proportion of bootstrap estimates or posterior probabilities can be computed for particular regions. Bayesian methods provide a flexible framework in which to summarize the joint distribution of benefit and risk. Using the Bayesian framework allows one to easily conduct benefit-risk analyses similar to the incremental net health benefit analyses used for cost-effectiveness research. As this approach is based on linear combinations of benefit and risk, many of the inferential problems associated with ratios are avoided.
We have chosen to focus on the comparison of two therapies for a binary measure of benefit and a binary measure of risk, as the motivating PROPHET study had a binary primary benefit outcome and an increased rate of a single adverse event, spontaneous GI perforation, which resulted in an early stop of the trial. However, the Bayesian methods easily generalize to allow for other distributions of benefit and risk, provided one can simulate samples from the posterior distribution of interest. The Bayesian methods also allow prior information to be incorporated into the inference if such information is available. When it is of interest to compare more than two therapies, the benefit-risk approaches shown can be conducted in a pairwise fashion.