We begin by showing that fortunately – and contrary to the statement by Sondhi et al. on page 4 of their article – there is a closed-form solution for what they term the BAE “tipping point”, which is key to their approach.
Assume, as per Sondhi et al., that both the likelihood of an effect estimate \(\hat {\theta }\) (the “data”) and the prior of the underlying effect size θ are represented by normal distributions \(\hat {\theta }\,\vert \, \theta \sim \mathrm {N}(\theta, \sigma ^{2})\) and \(\theta \sim \mathrm {N}(\mu, \tau ^{2})\), with the prior evidence coming either from pre-existing insight/studies or from a subsequent replication. Bayes’s theorem then implies a posterior distribution \(\theta \,\vert \, \hat {\theta } \sim \mathrm {N}(\mu _{p}, \tau ^{2}_{p})\) whose mean and variance satisfy
$$\begin{array}{*{20}l} &\frac{\mu_{p}}{\tau^{2}_{p}} = \frac{\hat{\theta}}{\sigma^{2}} + \frac{\mu}{\tau^{2}}& &\text{and}& &\frac{1}{\tau^{2}_{p}} = \frac{1}{\sigma^{2}} + \frac{1}{\tau^{2}}&\end{array} $$
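This precision-weighted update is easy to compute directly. The following is a minimal Python sketch using only the standard library; the function name `normal_posterior` is our own, hypothetical choice, and σ and τ are assumed known and positive.

```python
def normal_posterior(theta_hat: float, sigma: float, mu: float, tau: float) -> tuple[float, float]:
    """Conjugate normal update: theta_hat | theta ~ N(theta, sigma^2), theta ~ N(mu, tau^2).

    Returns the posterior mean mu_p and the posterior variance tau_p^2.
    """
    if sigma <= 0 or tau <= 0:
        raise ValueError("sigma and tau must be positive")
    # Posterior precision is the sum of the data precision and the prior precision.
    precision_p = 1.0 / sigma**2 + 1.0 / tau**2
    tau2_p = 1.0 / precision_p
    # Posterior mean: precision-weighted combination of estimate and prior mean.
    mu_p = tau2_p * (theta_hat / sigma**2 + mu / tau**2)
    return mu_p, tau2_p
```

For instance, `normal_posterior(1.0, 1.0, 3.0, 1.0)` returns `(2.0, 0.5)`: with τ = σ the posterior mean is the average of estimate and prior mean and the posterior variance is half the data variance, which is the special case discussed next.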
Sondhi et al. further assume that \(\tau ^{2} = \sigma ^{2}\), that is, the prior variance equals the data variance, which is the square of the (known) standard error σ of the effect estimate \(\hat {\theta }\). It then follows that the posterior mean is the average of the effect estimate and the prior mean, and that the posterior variance is half the data variance
$$\begin{array}{*{20}l} &\mu_{p} = \frac{\hat{\theta} + \mu}{2}& &\text{and}& &\tau^{2}_{p} = \frac{\sigma^{2}}{2}& \end{array} \tag{1}$$
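The intermediate algebra is short: substituting \(\tau^{2} = \sigma^{2}\) into the general update gives

$$\begin{array}{l} \dfrac{1}{\tau^{2}_{p}} = \dfrac{1}{\sigma^{2}} + \dfrac{1}{\sigma^{2}} = \dfrac{2}{\sigma^{2}} \quad\Longrightarrow\quad \tau^{2}_{p} = \dfrac{\sigma^{2}}{2}, \\[1ex] \mu_{p} = \tau^{2}_{p}\left(\dfrac{\hat{\theta}}{\sigma^{2}} + \dfrac{\mu}{\sigma^{2}}\right) = \dfrac{\sigma^{2}}{2} \cdot \dfrac{\hat{\theta} + \mu}{\sigma^{2}} = \dfrac{\hat{\theta} + \mu}{2}. \end{array}$$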
The BAE “tipping point” is then defined as the least extreme prior mean that results in a posterior credible interval which excludes the null value. If the substantive hypothesis concerns positive effects (e.g. \(\log(\text{HR}) > 0\)), the BAE is the prior mean which makes the lower limit \(L_{p}\) of the 100(1−α)% posterior credible interval equal to zero
$$\begin{array}{*{20}l} L_{p} = \mu_{p} - z_{\scriptscriptstyle \alpha/2} \, \tau_{p} = 0 \end{array} \tag{2}$$
while for negative effect estimates the upper limit \(U_{p}\) is fixed to zero
$$\begin{array}{*{20}l} U_{p} = \mu_{p} + z_{\scriptscriptstyle \alpha/2} \, \tau_{p} = 0 \end{array} \tag{3}$$
with \(z_{\alpha/2}\) the \(1-\alpha/2\) quantile of the standard normal distribution. Setting the prior mean μ equal to the BAE in Eq. (1) and substituting into Eq. (2), respectively Eq. (3), leads to
$$\begin{array}{*{20}l} \text{BAE} & = \text{sign}(\hat{\theta}) \sqrt{2} \, z_{\scriptscriptstyle \alpha/2} \, \sigma - \hat{\theta} \end{array} \tag{4}$$
where \(\text {sign}(\hat {\theta }) = 1\) when \(\hat {\theta } > 0\) and \(\text {sign}(\hat {\theta }) = -1\) otherwise. For a symmetric (Wald-type) 100(1−α)% confidence interval (CI) with upper and lower limits U and L, we have \(\hat {\theta } = (U + L)/2\) and \(\sigma = (U - L)/(2 z_{\alpha/2})\), so that Eq. (4) can be re-written as
$$\begin{array}{*{20}l} \text{BAE} = \frac{\text{sign}(\hat{\theta}) \sqrt{2} (U - L) - (U + L)}{2} \end{array} \tag{5}$$
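Both closed forms take only a few lines of code. The following is a minimal Python sketch using only the standard library; the function names are our own, hypothetical choices, and the CI-based version assumes a CI that is symmetric on the analysis scale (e.g. log HR) and has the same level as the posterior credible interval, so that α cancels. The last helper plugs the BAE back into Eq. (1) and Eq. (2)/(3) as a sanity check.

```python
import math
from statistics import NormalDist


def bae_from_estimate(theta_hat: float, sigma: float, alpha: float = 0.05) -> float:
    """BAE via Eq. (4): sign(theta_hat) * sqrt(2) * z_{alpha/2} * sigma - theta_hat."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1 - alpha/2 quantile of N(0, 1)
    sign = 1.0 if theta_hat > 0 else -1.0    # sign(theta_hat) as defined in the text
    return sign * math.sqrt(2) * z * sigma - theta_hat


def bae_from_ci(lower: float, upper: float) -> float:
    """BAE via Eq. (5) from the CI limits L (lower) and U (upper) of theta_hat."""
    theta_hat = (upper + lower) / 2          # CI midpoint = effect estimate
    sign = 1.0 if theta_hat > 0 else -1.0
    return (sign * math.sqrt(2) * (upper - lower) - (upper + lower)) / 2


def posterior_limit_at_bae(theta_hat: float, sigma: float, alpha: float = 0.05) -> float:
    """Posterior credible limit nearest the null when mu = BAE and tau = sigma (should be ~0)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mu_p = (theta_hat + bae_from_estimate(theta_hat, sigma, alpha)) / 2  # Eq. (1), mean
    tau_p = sigma / math.sqrt(2)                                         # Eq. (1), sd
    # Lower limit for positive estimates (Eq. 2), upper limit otherwise (Eq. 3).
    return mu_p - z * tau_p if theta_hat > 0 else mu_p + z * tau_p
```

For example, `bae_from_estimate(0.3, 0.2)` is about 0.254 (that is, √2 × 1.96 × 0.2 − 0.3), and `posterior_limit_at_bae(0.3, 0.2)` is zero up to floating-point error.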
We see from Eq. (4) that Sondhi et al.’s proposal has the intuitive property that as the study becomes more convincing (through a larger effect size \(|\hat {\theta }|\) and/or a smaller standard error σ), the BAE decreases (increases) for positive (negative) \(\hat {\theta }\), indicating that less additional evidence is needed to push a non-significant study towards credibility. Eqs. (4) and (5) also hold for significant studies, but the BAE then represents the mean of a “sceptical” prior that just renders the study non-significant.
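Differentiating each branch of Eq. (4) with \(z_{\alpha/2}\) held fixed makes this property explicit (our own short derivation, writing \(\hat{\theta} = \pm|\hat{\theta}|\)):

$$\begin{array}{lll} \hat{\theta} > 0: & \text{BAE} = \sqrt{2}\,z_{\alpha/2}\,\sigma - |\hat{\theta}|, & \dfrac{\partial\,\text{BAE}}{\partial|\hat{\theta}|} = -1 < 0, \quad \dfrac{\partial\,\text{BAE}}{\partial\sigma} = \sqrt{2}\,z_{\alpha/2} > 0, \\[1ex] \hat{\theta} < 0: & \text{BAE} = -\sqrt{2}\,z_{\alpha/2}\,\sigma + |\hat{\theta}|, & \dfrac{\partial\,\text{BAE}}{\partial|\hat{\theta}|} = +1 > 0, \quad \dfrac{\partial\,\text{BAE}}{\partial\sigma} = -\sqrt{2}\,z_{\alpha/2} < 0. \end{array}$$

For a non-significant study (\(|\hat{\theta}| < z_{\alpha/2}\,\sigma\)) the BAE is positive in the first case and negative in the second, so in both cases a larger \(|\hat{\theta}|\) or a smaller σ moves the BAE towards the null value.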
These closed-form solutions greatly simplify the use of the BAE methodology. For example, Sondhi et al. use a trial comparing two monoclonal antibodies to show how the BAE identifies additional evidence which, when combined with a non-significant finding, leads to overall credibility. The trial estimated the hazard ratio of bevacizumab+chemo patients relative to cetuximab+chemo patients as HR=0.42 (95% CI: 0.14 to 1.23), a non-significant finding with p=0.11. On the log(HR) scale we have L=−1.97 and U=0.21, and since \(\hat {\theta } < 0\), Eq. (5) gives BAE=−0.66, equivalent to an HR of 0.52. Figure 1 shows the corresponding prior mean with its 95% prior credible interval.
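The numbers in this example can be reproduced from the reported CI alone. The following Python sketch (standard library only, our own illustration) assumes the 95% CI is symmetric on the log(HR) scale, takes the estimate as the CI midpoint as Eq. (5) implicitly does, and also recovers the standard error and the two-sided p-value implied by the CI.

```python
import math
from statistics import NormalDist

# Reported result, as quoted in the text: HR = 0.42, 95% CI 0.14 to 1.23.
hr_lower, hr_upper = 0.14, 1.23
L, U = math.log(hr_lower), math.log(hr_upper)   # CI limits on the log(HR) scale

theta_hat = (U + L) / 2                          # log(HR) estimate as CI midpoint
z = NormalDist().inv_cdf(0.975)                  # z_{alpha/2} for a 95% interval
sigma = (U - L) / (2 * z)                        # standard error implied by the CI
p_value = 2 * (1 - NormalDist().cdf(abs(theta_hat) / sigma))  # two-sided Wald p

sign = 1.0 if theta_hat > 0 else -1.0
bae = (sign * math.sqrt(2) * (U - L) - (U + L)) / 2           # Eq. (5)

print(f"L = {L:.2f}, U = {U:.2f}, p = {p_value:.2f}")
print(f"BAE = {bae:.2f} on the log(HR) scale, i.e. HR = {math.exp(bae):.2f}")
```

Run as is, it prints `L = -1.97, U = 0.21, p = 0.11` followed by `BAE = -0.66 on the log(HR) scale, i.e. HR = 0.52`, matching the figures quoted above.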
Thus additional evidence, in the form of prior insight or a subsequent replication, that supports an HR at least as impressive as this (i.e. an HR<0.52 in this case) and has a CI at least as tight as that of the original study will render this non-significant result credible at the 95% level. Sondhi et al. cite prior evidence from Innocenti et al. [4], who found HR=0.13 (95% CI: 0.06 to 0.30); this meets both criteria set by the BAE and renders the original study credible.
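As a rough check of this last claim, the sketch below (our own illustration, not Sondhi et al.'s code) converts both reported 95% CIs into estimates and standard errors on the log(HR) scale, assuming normality and symmetric CIs there, compares the prior from Innocenti et al. with the BAE, and then applies the general normal update without requiring τ = σ. The helper name `log_summary` is hypothetical.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for 95% intervals


def log_summary(hr_lower: float, hr_upper: float) -> tuple[float, float]:
    """Return (estimate, standard error) on the log(HR) scale from a 95% CI."""
    lo, hi = math.log(hr_lower), math.log(hr_upper)
    return (lo + hi) / 2, (hi - lo) / (2 * z)


theta_hat, sigma = log_summary(0.14, 1.23)   # original trial (the data)
mu, tau = log_summary(0.06, 0.30)            # Innocenti et al. [4] (the prior)
bae = -math.sqrt(2) * z * sigma - theta_hat  # Eq. (4) with sign(theta_hat) = -1

print(f"prior mean beyond BAE: {mu < bae}, prior at least as tight: {tau <= sigma}")

# General normal-normal update (precision weighting), not assuming tau = sigma.
prec_p = 1 / sigma**2 + 1 / tau**2
mu_p = (theta_hat / sigma**2 + mu / tau**2) / prec_p
upper_p = mu_p + z / math.sqrt(prec_p)       # upper limit of the 95% credible interval
print(f"posterior HR = {math.exp(mu_p):.2f}, upper 95% limit = {math.exp(upper_p):.2f}")
```

Under these assumptions both criteria are met, and the upper limit of the posterior 95% credible interval is about HR = 0.38, well below 1.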