### The proportional hazards measure

Numerous summary measures for a pair of specificity and sensitivity have been suggested: we mention here the Youden index, {J}_{i}={p}_{i}+{q}_{i}-1 [10], and the squared Euclidean distance to the upper left corner in the SROC diagram, {E}_{i}={(1-{p}_{i})}^{2}+{(1-{q}_{i})}^{2}. A review of summary measures is given in Liu [20]. Using an average over any of these measures might be problematic: not only might sensitivities and specificities be heterogeneous, this might also be true for the associated summary measures such as the Youden index or the Euclidean distance (as demonstrated in Figure 2 using the data of the meta-analysis of BNP and heart failure).
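As a quick numerical sketch (in Python, with hypothetical sensitivity/specificity pairs), the two summary measures can be computed as follows:

```python
# Youden index J = p + q - 1 and squared Euclidean distance to the
# upper-left ROC corner E = (1-p)^2 + (1-q)^2, for sensitivity p and
# specificity q.  The pairs below are hypothetical illustration values.
def youden(p, q):
    return p + q - 1

def euclid2(p, q):
    return (1 - p) ** 2 + (1 - q) ** 2

for p, q in [(0.90, 0.80), (0.95, 0.60), (0.70, 0.95)]:
    print(f"p={p:.2f} q={q:.2f}  J={youden(p, q):.2f}  E={euclid2(p, q):.4f}")
```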

We suggest using the measure \theta =\frac{logp}{log(1-q)}, which relates the log-sensitivity to the log-false positive rate; we call it the *proportional hazards (PH)* measure. In Figure 3 we see that this measure shows reduced variability for the meta-analysis of BNP and heart failure, making it more suitable as an overall measure in the meta-analysis of diagnostic studies or diagnostic problems. While the measure may appear to be just another summary measure of the pair of sensitivity and specificity, it has a specific SROC-modelling background and motivation. We have mentioned previously the cut-off value problem: observed heterogeneity might be induced by cut-off value variation, which can lead to different sensitivities and specificities – despite the accuracy of the diagnostic test itself being unchanged – and might also induce heterogeneity in the summary measure. Hence, it is unclear whether the observed heterogeneity is due to heterogeneity in the diagnostic accuracy (authentic heterogeneity) or whether it has occurred due to cut-off value variation (artificial heterogeneity). This second form of heterogeneity can also occur when the background population changes with the study.

One of the features of the SROC approach is that it incorporates the cut-off value variation in a natural way; hence a measure modelling an ROC curve is favorable. We suggest the PH measure based upon the Lehmann family in the following way:

p={(1-q)}^{\theta}

(1)

This model was suggested by Le [21] for the ROC curve. It is an appropriate model since, for feasible *q*, (1−*q*)^{θ} is also feasible as long as *θ* is positive. Note that (1) is defined for all values of *p*∈[0,1] and *q*∈[0,1], whereas \theta =\frac{logp}{log(1-q)} is only defined for *p*∈(0,1) and *q*∈(0,1). Population values of sensitivity and specificity of 1 are rarely realistic, although observed values of 1 for sensitivity and specificity do occur in samples. This can be coped with by using an appropriate smoothing constant, such as estimating specificity as ({n}_{i}-1)/{n}_{i} when {x}_{i}={n}_{i} and sensitivity as ({m}_{i}-1)/{m}_{i} when {y}_{i}={m}_{i}.
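As a minimal sketch (in Python, with hypothetical 2×2 counts), the PH measure with the smoothing rule just described can be computed as:

```python
import math

def ph_measure(y, m, x, n):
    """PH measure theta-hat = log(p-hat) / log(1 - q-hat), where
    p-hat = y/m is the observed sensitivity and q-hat = x/n the
    observed specificity.  Observed values of 1 are smoothed as
    described above: y/m -> (m-1)/m when y == m, and x/n -> (n-1)/n
    when x == n, so the logs stay finite."""
    p = (m - 1) / m if y == m else y / m   # smoothed sensitivity
    q = (n - 1) / n if x == n else x / n   # smoothed specificity
    return math.log(p) / math.log(1 - q)

# hypothetical counts: 85/100 diseased test positive,
# 90/100 non-diseased test negative
print(ph_measure(85, 100, 90, 100))
```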

In Figure 4 we see a number of examples of the proportional hazards family. It becomes clear now why *θ* is called the proportional hazards measure. Taking logarithms on both sides of (1) we obtain

\theta =\frac{logp(t)}{log[1-q(t)]},

(2)

meaning that, if model (1) holds, the ratio of log-sensitivity to log-false positive rate is constant across the range of possible cut-off value choices *t*. Hence the name proportional hazards model, which was suggested in a paper by Le [21] and used again in Gönen and Heller [22]. The idea of representing an entire ROC curve by a *single* measure is illustrated in Figure 5. While sensitivity and specificity vary over the entire interval (0,1), the value of *θ* remains constant. Hence, log-sensitivity is *proportional* to the log-false positive rate. This assumption is similar to the proportional hazards assumption in survival analysis, where the hazard rate of interest is assumed to be proportional to a baseline hazard rate; this might have motivated the choice of name used by Le [21] and Gönen and Heller [22] in this context.

However, it is not our intention to make the assumption that an entire SROC curve can be represented by model (1); the explanations above are instead meant as a motivation that the PH-measure is not just another summary measure, but can be derived from a ROC modelling perspective. We envisage that each study, with associated pair of sensitivity and specificity, can be represented by a specific PH-model, as illustrated in Figure 6.

We see indeed that each pair of sensitivity and specificity can be associated with its own ROC curve provided by

p={(1-q)}^{{\widehat{\theta}}_{i}}

(3)

where {\widehat{\theta}}_{i}=log{\widehat{p}}_{i}/log[1-{\widehat{q}}_{i}], so that the curve (3) passes exactly through the point (1-{\widehat{q}}_{i},{\widehat{p}}_{i}).
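The construction in (3) can be verified numerically: the per-study curve recovers the observed sensitivity exactly at the observed false positive rate. A minimal sketch (hypothetical study estimates):

```python
import math

p_hat, q_hat = 0.85, 0.90   # hypothetical observed sensitivity / specificity
theta_i = math.log(p_hat) / math.log(1 - q_hat)   # study-specific PH measure

def roc(q, theta):
    # per-study PH curve (3): p = (1 - q)^theta
    return (1 - q) ** theta

# evaluating the curve at the observed specificity returns the
# observed sensitivity (up to floating-point rounding)
print(roc(q_hat, theta_i))
```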

*Comparison to other approaches.* It remains to be seen how appropriate the suggested proportional hazards model is and how it compares to other existing approaches. We emphasize that in our situation we have assumed that there is only *one* pair of sensitivity and false positive rate ({\widehat{p}}_{i},1-{\widehat{q}}_{i}) per study *i*. Situations where several pairs per study are observed (such as in Aertgeerts *et al.* [23]) are rare. Hence, on the log-scale for sensitivity and false positive rate, we are not able to identify any straight-line model *within a study* with *more than one* parameter, since this would require at least two pairs of sensitivity and specificity per study; see also Rücker and Schumacher [24, 25]. However, any one-parameter straight-line model, such as the proposed proportional hazards model, is estimable within each study, although within-study model diagnostics are limited since we are fitting the full within-study model. Given that sample sizes within each diagnostic study are typically at least moderately large, it seems reasonable to assume a bivariate normal distribution for log\widehat{p} and log(1-\widehat{q}) with means log*p* and log(1−*q*), variances {\sigma}_{p}^{2} and {\sigma}_{q}^{2}, respectively, and covariance *σ* with correlation \rho =\sigma /({\sigma}_{p}{\sigma}_{q}). This is very similar to the assumptions in the approach taken by Reitsma *et al.* [17] (see also Harbord *et al.* [19]), with the difference that we use the log-transformation whereas Reitsma *et al.* [17] apply logit-transformations. Then, it is a well-known result that the mean of the random variable log\widehat{p} (with unconditional mean log*p*), conditional upon the value of the random variable log(1-\widehat{q}) (with unconditional mean log(1−*q*)), is given by

E(log\widehat{p}|log(1-\widehat{q}))=logp+\rho \frac{{\sigma}_{p}}{{\sigma}_{q}}[log(1-\widehat{q})-log(1-q)],

(4)

which can be written as \alpha +\theta [log(1-\widehat{q})] where *α*=log(*p*)−*θ* log(1−*q*) and \theta =\rho \frac{{\sigma}_{p}}{{\sigma}_{q}}. This is an *important* result since it means that, in the log-space, sensitivity and false positive rate are linearly related. Furthermore, if *α* is zero, the proportional hazards model arises.
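The conditional-mean result (4) can be checked by simulation under the bivariate normal assumption above. In the sketch below (all numbers hypothetical), Y plays the role of log\widehat{p} and X the role of log(1-\widehat{q}); the empirical regression slope of Y on X should approach \rho {\sigma}_{p}/{\sigma}_{q}:

```python
import math
import random

# Monte-Carlo check: for jointly normal (X, Y) with correlation rho,
# E(Y | X = x) is linear in x with slope rho * sd_y / sd_x, as in (4).
random.seed(1)
rho, sd_y, sd_x = 0.5, 0.3, 0.2          # hypothetical parameters
mu_y, mu_x = math.log(0.85), math.log(0.10)
slope = rho * sd_y / sd_x                # theoretical slope theta

xs, ys = [], []
for _ in range(200_000):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x = mu_x + sd_x * z1
    y = mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    xs.append(x)
    ys.append(y)

# empirical least-squares slope of y on x
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
s_xy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
s_xx = sum((a - mx) ** 2 for a in xs)
emp_slope = s_xy / s_xx
print(emp_slope, slope)  # empirical vs theoretical slope, should be close
```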

The question then arises why we should not work with the straight-line model

log{p}_{|log(1-q)}=\alpha +\theta log(1-q).

(5)

The answer is that such a model is *not identifiable* since we observe only one pair of sensitivity and specificity in each study, and a straight line cannot be uniquely determined by a single point: infinitely many lines pass through a given point in the log*p* – log(1−*q*) space. However, the proportional hazards model, as a slope-only model, *is* identifiable, and it is more plausible than other identifiable models such as the intercept-only model. Clearly, a logit-transformation would be more consistent with the existing literature [14, 15] than the log-transformation. However, both models would give a perfect fit within each study since there are no degrees of freedom left for testing the model fit. The situation changes when repeated observations of sensitivity and specificity *per study* are available; however, meta-analyses with repeated observations of sensitivity and specificity according to cut-off value variation are extremely rare.

### A mixed model approach

With the motivation of the previous sections in mind, we assume that *k* diagnostic studies are available with diagnostic accuracies {\widehat{\theta}}_{1},\dots ,{\widehat{\theta}}_{k} where

{\widehat{\theta}}_{i}=\frac{log{\widehat{p}}_{i}}{log(1-{\widehat{q}}_{i})}.

(6)

We assume the following linear mixed model for log{\widehat{\theta}}_{i}:

log{\widehat{\theta}}_{i}={\beta}^{T}{x}_{i}+{\delta}_{i}+{\epsilon}_{i}

(7)

where {x}_{i} is a known covariate vector in study *i*, {\delta}_{i}\sim N(0,{\tau}^{2}) is a normally distributed random effect with {\tau}^{2} being an unknown variance parameter, and {\epsilon}_{i}\sim N(0,{\sigma}_{i}^{2}) is a normally distributed random error with variance {\sigma}_{i}^{2} known from the *i*-th study.

There are several noteworthy points about the mixed model (7). The response is measured on the log-scale; this transformation improves the normal approximation and also brings the diagnostic accuracy into a well-known link function family, the complementary log-log function. The difference of the probability for a positive test between the groups with and without the condition is measured on the complementary log-log scale. The fixed effect part involves a covariate vector **x** which could contain study-level information such as gold standard variation, diagnostic test variation, or sample size information. It should be noted that there are two variance components, {\tau}^{2} and {\sigma}_{i}^{2}. It is important to have information on the second variance component: if it is unknown, even under the homogeneity assumption {\sigma}_{1}^{2}=\cdots ={\sigma}_{k}^{2}, the variance component model would *not* be identifiable. Hence, we need to devote some effort to deriving expressions for the within-study variances; this can be accomplished using the *δ*-method as discussed in the next section.

*Within study variance.* Let us consider (ignoring the study index *i* for the sake of simplicity)

log\widehat{\theta}=log(-log\widehat{p})-log[-log(1-\widehat{q})]

(8)

and apply the *δ*-method. Recall that the variance \mathit{\text{Var}}(T(X)) of a transformed random variable *T*(*X*) can be approximated as {[{T}^{\prime}(E(X))]}^{2}\mathit{\text{Var}}(X), assuming that the variance \mathit{\text{Var}}(X) of *X* is known. Applying this *δ*-method twice gives

\mathit{\text{Var}}log(-log\widehat{p})\approx \frac{\widehat{p}(1-\widehat{p})/m}{{\widehat{p}}^{2}{(log\widehat{p})}^{2}}

(9)

and

\mathit{\text{Var}}log(-log(1-\widehat{q}))\approx \frac{\widehat{q}(1-\widehat{q})/n}{{(1-\widehat{q})}^{2}{(log(1-\widehat{q}))}^{2}}

(10)

so that the within study variance for the *i*-th study is provided as

{\sigma}_{i}^{2}=\frac{{m}_{i}-{y}_{i}}{{m}_{i}{y}_{i}{(log({y}_{i}/{m}_{i}))}^{2}}+\frac{{x}_{i}}{{n}_{i}({n}_{i}-{x}_{i}){(log(1-{x}_{i}/{n}_{i}))}^{2}}.

(11)

We acknowledge that the above are estimates of the variances of the diagnostic accuracy estimates, but are used as if they were the true variances.
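The within-study variance formula (11) is straightforward to evaluate. A minimal sketch (in Python, with the same hypothetical counts as before; no smoothing is applied here, so the counts must avoid 0 and the totals):

```python
import math

def within_var(y, m, x, n):
    """Delta-method within-study variance (11) of log theta-hat,
    with p-hat = y/m (y of m diseased test positive) and
    q-hat = x/n (x of n non-diseased test negative)."""
    term_p = (m - y) / (m * y * math.log(y / m) ** 2)
    term_q = x / (n * (n - x) * math.log(1 - x / n) ** 2)
    return term_p + term_q

# hypothetical counts: 85/100 diseased positive, 90/100 healthy negative
print(within_var(85, 100, 90, 100))
```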

*Some important cases.* If there are no further covariates, *two* important models are easily identified as special cases of (7). One is the *fixed* effects model

log{\widehat{\theta}}_{i}={\beta}_{0}+{\epsilon}_{i}

(12)

and the other is the *random* effects model

log{\widehat{\theta}}_{i}={\beta}_{0}+{\delta}_{i}+{\epsilon}_{i}

(13)

which have gained some popularity in the meta-analytic literature.
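Models (12) and (13) can be fitted with standard inverse-variance meta-analysis machinery once the log-accuracies and their known within-study variances are available. The sketch below (hypothetical inputs) uses the DerSimonian-Laird moment estimator for *τ*^{2}; this is an assumed, conventional choice and not necessarily the estimation method used for the mixed model (7):

```python
import math

def pool_fixed(thetas_log, variances):
    """Inverse-variance pooled estimate of beta_0 on the log scale,
    corresponding to the fixed effects model (12)."""
    w = [1.0 / v for v in variances]
    return sum(wi * t for wi, t in zip(w, thetas_log)) / sum(w)

def dl_tau2(thetas_log, variances):
    """DerSimonian-Laird moment estimator of tau^2 for the random
    effects model (13) -- an assumed standard choice for this sketch."""
    w = [1.0 / v for v in variances]
    beta0 = pool_fixed(thetas_log, variances)
    q_stat = sum(wi * (t - beta0) ** 2 for wi, t in zip(w, thetas_log))
    df = len(thetas_log) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q_stat - df) / c)

# hypothetical log-accuracies log(theta-hat_i) and known variances sigma_i^2
lt = [math.log(0.07), math.log(0.05), math.log(0.09)]
v = [0.08, 0.10, 0.06]
tau2 = dl_tau2(lt, v)
print(pool_fixed(lt, v), tau2)
```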