 Research article
 Open Access
 Published:
Optimal cutpoint definition in biomarkers: the case of censored failure time outcome
BMC Medical Research Methodology volume 15, Article number: 24 (2015)
Abstract
Background
Cutpoint finding is a crucial step for clinical decision making when dealing with diagnostic (or prognostic) biomarkers. The extension of ROCbased cutpoint finding methods to the case of censored failure time outcome is of interest when we are in the presence of a biomarker, measured at baseline, used to identify whether there will be the development, or not, of some disease condition within a given time point τ of clinical interest.
Methods
Three widely used cutpoint finding methods, namely the Youden index, the concordance probability and the point closest to(0,1) corner in the ROC plane, are extended to the case of censored failure time outcome resorting to nonparametric estimators of the sensitivity and specificity that account for censoring. The performance of these methods in finding the optimal cutpoint is compared under Normal and Gamma distributions of the biomarker (in subjects developing or not the disease condition). Normality ensures that estimators point theoretically to the same cutpoint. Two motivating examples are provided in the paper.
Results
The point closestto(0,1) corner approach has the best performance from simulations in terms of mean square error and relative bias.
Conclusions
We discuss the use of the Youden index or concordance probability associated to the cutpoint identified through the closestto(0,1) corner approach to ease interpretability of the classification performance of the dichotomized biomarker. In addition, the achieved performance of the dichotomized biomarker classification associated to the estimated cutpoint can be represented through a confidence interval of the point on the ROC curve.
Background
The use of a continuous biomarker X in clinical practice often requires the definition of a cutpoint c above (or below) which subjects are classified, for instance, as diseased and diseasefree. In the presence of a binary outcome, methods based on the receiver operating characteristic (ROC) curve are commonly and indistinctly used. These methods are based on objective functions of c:

i)
the Youden function, defined as the difference between the probability of X > c in diseased subjects (sensitivity, SE) and the complement to one of the probability of X ≤ c in diseasefree subjects (specificity, SP), i.e. SE + SP1. The chosen c maximizing this function, or equivalently SE + SP, leads to a maximum value known as Youden index [1];

ii)
the concordance probability function, equal to the product of SE and SP, where the chosen c maximizes this function [2,3];

iii)
the distance between the point (1SP, SE) and the optimal point (0,1) in the ROC plane [4], where the chosen c leads to the minimum distance, and the operating point is referred as point closestto(0,1) corner.
A recently published work compared these methods by simulation in the case of a binary outcome [5]. The authors showed that the point closestto(0,1) corner [4] and concordance probability [2,3] methods outperformed both the Youden index [1] and the minimum Pvalue approaches [6].
The extension of ROCbased cutpoint finding methods to the case of censored failure time outcome is of interest when we are in the presence of a biomarker, measured at baseline in a cohort of diseasefree subjects, and used to understand whether there will be the development, or not, of a disease condition within a given time point τ of clinical interest. However, this extension is not straightforward. In fact, both SE and SP cannot be estimated by simple proportions as in the case of a binary outcome, because it is not known whether censored subjects should be considered as diseased or diseasefree up to τ. As a consequence, a suitable estimator for SE and SP need to be used to account for the presence of censoring.
We aimed to extend the Youden index, concordance probability and point closestto(0,1) corner cutpoint estimation methods to the case of censored failure time outcome. The performance of the aforementioned methods in finding the optimal cutpoint is compared under Normal homoscedastic and Gamma distributions of the biomarker in diseased and diseasefree subjects [7]. Normality ensures that estimators point theoretically to the same cutpoint, as previously shown and described [2,5].
For each method, the optimal cutpoint is empirically estimated by maximization of objective functions [8] using the estimators for SE and SP derived in Antolini and Valsecchi [9]. To illustrate the methodology, two application examples are provided, one in an observational study of a molecular biomarker in acute lymphoblastic leukemia [10], and one concerning the definition of a prognostic score for patients with primary biliary cirrhosis enrolled in a randomized clinical trial [11].
Methods
Notations and basics
For a generic subject, let Z be the survival time, defined as the time elapsed between some initial time point, where the subject is diseasefree, and the development in time of disease. Let τ be a time horizon of clinical interest. The definition of disease and diseasefree conditions depends on whether Z ≤ τ or Z > τ. It is assumed that increasing values of the biomarker X are related to a possible increment of the risk of becoming diseased. Otherwise, without loss of generality, take the negative of X. For any cutpoint c that defines a binary classification rule, a generic subject is said to be testing positive or negative depending on whether X > c or X ≤ c.
In this context, SE and SP at c are defined as the probability of testing positive given that the subject is diseased
and as the probability of testing negative given that the subject is diseasefree
The ROC curve is defined as the plot of SE(c) across 1SP(c), for varying c. It is represented in Figure 1, panel A.
The Youden function of c is the difference between SE(c) and 1SP(c):
J(c) takes values between 0, when SE(c) = 1SP(c), and 1 when SE(c) = SP(c) = 1. The behavior of (3) is represented in Figure 1 panel B, thick line segment. The Youden index J [1] is defined as the maximum of the Youden function (3), or equivalently of SE(c) + SP(c). Graphically, J represents the maximum vertical distance between the ROC curve and the diagonal chance line representing a useless biomarker. It can be also interpreted as the maximum net gain of the true positive fraction (SE) with respect to the false positive fraction, i.e. 1SP (Figure 1 panel A thick line segment). The c maximizing (3) is the optimal cutpoint.
The concordance probability function [2,3] of c is the product of SE(c) and SP(c):
CZ(c) ranges between 0 if either SE(c) = 0 or SP(c) = 0, and 1 in the ideal case where SE(c) = SP(c) = 1. CZ(c) could be also expressed as the area of a rectangle on the ROC curve of width SP(c) and height SE(c) and interpreted as probability of being below or beyond c for any random pair of diseasefree and diseased subjects (Figure 1 panel A, dotted line). The behaviour of (4) is represented in Figure 1 panel B, dotted line. The optimal cutpoint according to this method is the c that maximizes (4).
The objective function defined as the distance between the couple (1SP(c), SE(c)) and the optimal point (0,1) – representing maximum specificity (SP = 1) and maximum sensitivity (SE = 1)  in the ROC plane (Figure 1 panel A, thin line segment) is obtained by applying the Euclidean distance
The behaviour of (5) is represented in Figure 1 panel B, thin line segment. The optimal cutpoint according to this method is the c that mimimizes (5).
The three objective functions (3), (4) and (5) lead theoretically to the same cutpoint c_{opt} when considering homoscedastic Normal distributions of the biomarker in diseased and diseasefree subjects (Figure 1). A formal proof is showed in Liu [2].
Optimal cutpoint estimation
Let T_{i} = min(Z_{i},C_{i}) be the observed time, where Z_{i} is the time to event (development of disease) and C_{i} the right censoring time, δ_{i} the censoring indicator (δ_{i} = 1 if T_{i} = Z_{i} and δ_{i} = 0 if T_{i} = C_{i}) and X_{i} the biomarker value for subject i. Independence completely at random between Z and C is assumed as in the classical framework of survival analysis. In our setting, the biomarker X is measured at baseline in order to identify ahead in time subjects that will develop disease or not within τ. The observed data in a sample of size N is {(X_{i}, T_{i}, δ_{i}); i = 1, …, N}, and it can be subdivided in three subgroups:

i)
Diseasefree, if T_{i} > τ regardless of δ_{i};

ii)
Diseased, if T_{i} ≤ τ and δ_{i} = 1;

iii)
Censored by τ while Diseasefree, if T_{i} ≤ τ and δ_{i} = 0;
with cardinality \( {\mathrm{n}}_{\mathrm{z}>\tau }={\displaystyle {\sum}_{\mathrm{i}=1}^{\mathrm{N}}\mathrm{I}\left({\mathrm{T}}_{\mathrm{i}}>\tau \right)},\;{\mathrm{n}}_{\mathrm{z}\le \uptau}={\displaystyle {\sum}_{\mathrm{i}=1}^{\mathrm{N}}\mathrm{I}\left({\mathrm{T}}_{\mathrm{i}}\le \uptau \right)}\kern0.5em {\updelta}_{\mathrm{i}} \) and \( {\mathrm{n}}_{\mathrm{C}}={\displaystyle {\sum}_{\mathrm{i}=1}^{\mathrm{N}}\mathrm{I}\left({\mathrm{T}}_{\mathrm{i}}\kern0.5em \le \kern0.5em \uptau \right)\left(1\kern0.5em \kern0.5em {\updelta}_{\mathrm{i}}\right)} \), respectively.
For the n_{c} subjects belonging to iii), the disease status by τ is unknown as it is not possible to know whether they would experience or not the disease within τ if censoring would not have occurred. This leads for any c of X to a classification 3×2 matrix of the type:
where \( {\mathrm{n}}_{\mathrm{X}\le \mathrm{c}}={\displaystyle {\sum}_{\mathrm{i}=1}^{\mathrm{N}}\mathrm{I}}\left({\mathrm{X}}_{\mathrm{i}}\le \mathrm{c}\right) \) and \( {\mathrm{n}}_{\mathrm{X}>c}={\displaystyle {\sum}_{\mathrm{i}\kern0.5em =1}^{\mathrm{N}}\mathrm{I}}\left({\mathrm{X}}_{\mathrm{i}}>c\right) \).
In this setting, SE(c) (1) and SP(c) (2) are not directly estimable from (6) since it is not known how the n_{c} subjects would contribute to the classification 2x2 matrix contrasting the disease status for all N subjects to the biomarker classification.
Nonparametric estimators of SE(c) (1) and SP(c) (2) can be derived following two different approaches, the direct or the indirect one, that were recently discussed and shown to be equivalent by Antolini and Valsecchi [9]. Direct estimation is based on an inverse probability weighting scheme applied to the counts of the groups of diseasefree and diseased subjects in the classification matrix (6) (lines 1 and 2), and originates from the consideration that subjects with observed status are selected from the censoring process and can be weighted to represent the subjects that are censored (line 3 of matrix (6)). Indirect estimation relies on writing SE and SP in terms of quantities that are estimable from the available data in the presence of censoring, and on pluggingin the estimates.
A further approach, equivalent to the aforementioned ones, consists in estimating the expected number of events in each of the four cells of the 2×2 classification matrix contrasting the disease status for all N subjects to the biomarker classification, as follows:
The two survival estimates \( {\widehat{\mathrm{S}}}_{\mathrm{X}>c}\left(\uptau \right)\kern0.5em =\kern0.5em \widehat{\mathrm{P}}\left(\mathrm{Z}>\uptau \left\mathrm{X}>\mathrm{c}\right.\right) \) and \( {\widehat{\mathrm{S}}}_{\mathrm{X}\le \mathrm{c}}\left(\uptau \right)\kern0.5em =\kern0.5em \widehat{\mathrm{P}}\left(\mathrm{Z}>\uptau \left\mathrm{X}\le \mathrm{c}\right.\right) \) are simply obtained by the KaplanMeier method in the two samples as classified according to the biomarker value X.
SE(c) (1) and SP(c) (2) can now be directly estimated from matrix (7) by simple proportions as
and
As an alternative, one could use the indirect estimators of SE(c) and SP(c) based on nearest neighbor estimates of the joint survival between X and Z [12]. This could have the advantage to relax the assumption of independence completely at random between C and Z since it requires only conditional independence given X.
The investigated ROCbased cutpoint finding methods could be now easily extended to the censored failure time outcome scenario by pluggingin sample estimates (8) and (9) into objective functions (3), (4) and (5). The optimal cutpoint estimates ĉ_{J}, ĉ_{CZ} and ĉ_{ER} are then obtained by maximizing the objective functions (3), (4) and (5) over all possible cutpoint values c of X [8].
It is worth of note that although the three methods point theoretically to the same cutpoint under Normal homoscedastic distributions of the biomarker in diseased and diseasefree subjects, the corresponding sample estimators here presented do not lead necessarily to the same estimated cutpoint in a single sample. This motivates the estimator performance comparison presented in the next section.
Simulation protocol
We conducted a simulation study to compare the performance of the Youden index (3), the concordance probability (4) and the point closestto(0,1) corner in the ROC plane (5) methods in the estimation of the optimal cutpoint in a censored failure time outcome scenario.
Data were simulated as follows:

The timetoevent Z was generated according to an exponential survival function S(t) = 1 − e^{− 2t}.

τ was set equal to 0.35 in order to achieve a disease fraction of 50% and to 0.20, 0.14, 0.11 and 0.08 to achieve disease fractions of 33%, 25%, 20% and 15%.

Depending on whether Z ≤ τ or Z > τ, the biomarker X was generated according to N(μ_{Z≤τ}, 1) or N(0,1), respectively. This leads to SE(c) = 1 − Φ(c − μ_{Z ≤ τ}) and SP(c) = Φ (c), where Φ denotes the standard Normal distribution function. It has been previously shown that within this scenario the objective functions (3), (4) and (5) reach their maximum in correspondence of the same true cutpoint, i.e. c_{opt} = μ_{Z ≤ τ}/2 [2,5]. Analytically, this common cutpoint occurs at the intersection between the Normal probability density functions of diseased, i.e. f_{Z>τ} (c), and diseasefree subjects, i.e. f_{Z>τ} (c).

Similarly, X was generated according to G(2.5, β_{Z≤τ}) and G(1.5,1) depending on whether Z ≤ τ or Z > τ. This implies that \( \mathrm{S}\mathrm{E}\left(\mathrm{c}\right)=\frac{1}{\Gamma (2.5){\left({\upbeta}_{\mathrm{Z}\le \uptau}\right)}^{2.5}}{\displaystyle {\int}_{\mathrm{c}}^{\infty }{\mathrm{x}}^{1.5}{\mathrm{e}}^{\hbox{} \mathrm{x}/{\upbeta}_{\mathrm{Z}\le \uptau}}\mathrm{d}\mathrm{x}} \) and \( \mathrm{S}\mathrm{P}\left(\mathrm{c}\right)\kern0.5em =\kern0.5em \frac{1}{\Gamma (1.5)}{\displaystyle {\int}_{\infty}^{\mathrm{c}}{\mathrm{x}}^{1.5}{\mathrm{e}}^{\mathrm{x}}\mathrm{d}\mathrm{x}} \). Within this scenario, the objective functions (3), (4) and (5) point to different true cutpoints, and a closed form for c_{opt} cannot be derived [2,5].

μ_{Z≤τ} was set equal to {0.51, 1.05, 1.68, 2.56} and β_{Z≤τ} to {0.79, 1.22, 1.97, 3.82} in order to achieve a wide variety of the classification accuracy, i.e. J(c_{J}) = {0.2, 0.4, 0.6, 0.8}, CZ(c_{CZ}) = {0.36, 0.49, 0.64, 0.81}, ER(c_{ER}) = {0.57, 0.42, 0.28, 0.14}, ranging from a poor one (J = 0.2, CZ = 0.36 and ER = 0.57) to a high one (J = 0.8, CZ = 0.81 and ER = 0.14).

The simulation of the survival times first, and thereafter the biomarker values, is somehow counterintuitive since the “natural” ordering suggests that time should be generated depending on the biomarker values, and not vice versa. This choice was done only to keep directly under control the theoretical values of SE and SP, and thus of the three objective functions, Youden index (3), concordance probability (4) and point closestto(0,1) corner (5).

To simulate independent censoring, the censoring time C was generated according to a uniform distribution in the interval [0,b]. When we considered the scenario with a disease fraction equal to 50%, b was set equal to 2, 1 and 0.66 time units in order to achieve different censoring levels, i.e. 12%, 25% and 38%. Within the scenarios with disease fraction equal to 33%, 25%, 20% and 15%, b was set equal to 0.67, 0.50, 0.40 and 0.29 time units to achieve a censoring level of 25%.

The observed survival data was calculated by T = min(Z,C) and δ = 1 if T = Z and δ = 0 if T = C.
We generated 1000 samples {(X_{i}, T_{i}, δ_{i}); i = 1, …, N} of size N = 50, N = 100, N = 200 and N = 400 with a disease fraction of 50% and three different censoring levels, i.e. 12%, 25% and 38%. Moreover, we generated 1000 samples of size N = 100, N = 150, N = 200 and N = 250 with different disease fractions, i.e. 33%, 25%, 20% and 15%, and a censoring level of 25%. For each sample, we determined by empirical numerical maximization [2,5,8] the optimal cutpoint estimates ĉ_{J}, ĉ_{CZ} and ĉ_{ER} for the Youden index, the concordance probability and the point closestto(0,1) corner in the ROC plane methods, respectively. The relative bias and the mean square error (MSE) of each method were computed by E[(ĉ_{.} − c)] and E[(ĉ_{.} − c)^{2}], where the expectation was meant to be the average over the N simulated samples.
Given the computational burden, we applied the bootstrap resampling technique to estimate the standard deviation and the confidence interval (CI) for the optimal cutpoint for some selected scenarios with sample size N = 100 or N = 150. We applied the Efron and Tibshirani’s procedure [13] as follow:

From each sample {(X_{i}, T_{i}, δ_{i}); i = 1, …, N}, we applied a random sampling with replacement to draw 200 bootstrap samples in order to calculate the bootstrap estimate ĉ_{B} (B = 1, …, 200).

We applied the basic percentile method, taking the 0.025 and 0.975 percentiles of the ĉ_{B} bootstrap distribution in order to construct a 95% CI of the optimal cutpoint within each of the 1000 generated samples. Each bootstrap sample contributed one cutpoint estimate, so that the standard deviation of the 200 cutpoint estimates was used as the bootstrap estimator of the standard deviation (SD_{B}) for the estimated cutpoint.

The CI for the cutpoint for each of the investigated methods was subsequently evaluated by computing coverage probability and mean length.
Simulations have been performed in R version 2.15 [14].
Results
Simulation study
The results of the simulation exercises under Normal homoscedastic distribution of X with a diseased and diseasefree fraction of 50% are shown in Tables 1, 2 and 3 for different censoring levels, i.e. 12%, 25% and 38%, respectively. The relative bias of the investigated methods is small on all levels of classification accuracy, except for the scenario with J = 0.2 and CZ = 0.36 for samples of size N = 50 and N = 100, and it increases as the censoring level increases. By comparing the MSEs, it can be noticed that the point closestto(0,1) corner in the ROC plane and the concordance probability methods have better performance than the Youden index method. Indeed, the MSE is inversely related to sample size and it increases as the censoring level increases. The performance of the investigated methods improves with increasing classification accuracy.
Table 4 shows the results under Normal homoscedastic distribution of X when considering different disease fractions and a censoring level of 25%. The relative bias of the investigated methods is small on all levels of classification accuracy, except for the scenarios with a disease fraction equal to 15%. As above, the point closestto(0,1) corner in the ROC plane and the concordance probability methods outperform the Youden index method. The MSE is lower for the point closestto(0,1) corner in the ROC plane method, too.
The results of the simulation exercise under Gamma distribution of X when considering a diseased and diseasefree fraction of 50% and a censoring level of 25% are shown in Table 5. In such scenario, the three objective functions point to different cutpoints, and only a relative performance comparison could be made. We note that methods’ performance improve with increasing classification accuracy in terms of relative bias, and also that the Youden index method showed an unsatisfactory performance in the scenario with J = 0.2 and CZ = 0.36 for samples of size N = 50 and N = 100. The MSE is inversely related to sample size but it increases as the classification accuracy increases. As in the Normal scenarios, the MSE is lower for the point closestto(0,1) corner in the ROC plane method.
Bootstrap standard deviation, coverage probability and mean length of the 95% bootstrap CI for the cutpoint are shown in Additional file 1: Table S1 for some selected simulation scenarios under Normal homoscedastic distribution of X. The SD_{B} of the point closestto(0,1) corner in the ROC plane approach is lower than the SD_{B} of the Youden index and concordance probability methods. Coverage probabilities are fluctuating around the nominal level. 95% bootstrap CIs were narrower when considering the scenarios with better classification accuracies, i.e. J of 0.6 and 0.8.
Applicative example on acute lymphoblastic leukemia
Acute lymphoblastic leukemia (ALL) is the most common malignancy in children and it presents, in the large majority (70%), a Bcell precursor (BCP) ALL immunophenotype. The cure rate of BCPALL is nowadays higher than 80%, but the probability of survival of patients who relapse is only 40% [10]. Recent studies had reported that a higher expression of the cytokine receptorlike factor 2 (CRLF2) was associated to a higher risk of relapse. In their study, Palmi et al. [10] aimed at defining a cutpoint for the CRLF2, as measured at diagnosis, that would allow to identify those children more likely to relapse in order to be able to tailor upfront the treatment intensity in future protocols. We applied the presented methods to this study that includes 464 Italian BCPALL children enrolled (from February 2003 to July 2005) in the AIEOP (Associazione Italiana Ematologia Oncologia Pediatrica) treatment protocol “AIEOPBFM ALL2000”. The time window of interest for predicting relapse was of 5 years, and in that time frame 74 relapses had been observed over a total of 79 relapses in the cohort. Figure 2 Panel A shows the event free survival (EFS) curve along with the 95% confidence bands. The 5year EFS estimate was 81.6% (95% CI, 78.1%85.1%). The CRLF2 expression had a rightskewed distribution (ShapiroWilk normality test P < 0.01) ranging from 0.006 to 810fold change compared to the overall median value (Figure 2 Panel B) [10].
The estimated cutpoint for the CRLF2 expression was ĉ = 1.46, the same for all three methods, but the classification accuracy of this biomarker was very low, as depicted by the ROC curve and expressed by the Youden index J = 0.10 calculated for the identified cutpoint (Figure 2 Panel C). The 95% bootstrap (999 replicates) CI estimates for the cutpoint are (0.12, 21.61), (0.70, 1.98) and (0.70, 1.86), for the Youden index, the concordance probability and the point closestto(0,1) corner in the ROC plane methods, respectively. The 95% deltamethod based elliptic asymptotic confidence interval of (FPF(ĉ), TPF(ĉ)) is represented in Figure 2 Panel C. This interval is elliptic in the logit space since it was obtained from a joint interval on the logit transformation of \( \widehat{\mathrm{TPF}} \) and \( \widehat{\mathrm{FPF}}, \) which are correlated, altough modestly, since censored observations contribute to both estimators [9].
Applicative example on primary biliary cirrhosis
We used data, made available online inside the survivalROC R package [15], from a randomized placebocontrolled trial of Dpenicilliamine (DPCA) for the treatment of primary biliary cirrhosis (PBC), conducted at the Mayo Clinic between 1974 and 1984. Among the 312 subjects randomized to the study, 125 died by the end of the followup. The survival curve is shown in Figure 3 Panel A. Data from this negative study were used to develop a clinical prediction model for mortality based on bilirubin and albumin levels, prothrombin time, presence of edema and age at diagnosis [11]. We aimed to find a cutpoint for this widely used prognostic score (hereafter named MAYOSCORE5) by considering a time frame of one year, i.e. τ = 365 days, from study entry, by when 22 deaths occurred. The MAYOSCORE5 score ranged between 3.74 and 11.250 with a median of 5.75, and its distribution is quite symmetric (Figure 3 Panel B), even if not Normal according to a formal test (ShapiroWilk normality test P < 0.01). The three investigated methods, i.e. the Youden index, the concordance probability and the point closestto(0,1) corner in the ROC plane, lead to the same estimated cutpoint ĉ = 7.35. The biomarker had a good classification accuracy, with a Youden index J = 0.78 and a concordance probability CZ = 0.79, as shown in Figure 3. The bootstrap (999 replicates) standard deviations for ĉ are 0.11, 0.09 and 0.05 for the Youden index, concordance probability and point closestto(0,1) corner in the ROC plane methods, respectively. Moreover, the 95% bootstrap CI estimates for the cutpoint are (6.99, 7.35), (7.03, 7.35) and (7.30, 7.48), for the Youden index, the concordance probability and the point closestto(0,1) corner in the ROC plane methods, respectively. The 95% deltamethod based elliptic asymptotic confidence interval of (FPF(ĉ), TPF(ĉ)) is represented in Figure 3 Panel C. This interval is elliptic in the logit space since it was obtained from a joint interval on the logit transformation of \( \widehat{\mathrm{TPF}} \) and \( \widehat{\mathrm{FPF}}, \) which are correlated, altough modestly, since censored observations contribute to both estimators [9].
Discussion
In this work we extended three widely used ROCbased methods for defining a cutpoint of a continuous biomarker, namely the Youden index [1], the concordance probability [2,3] and the point closest to(0,1) corner [4], to the censored failure time outcome by using nonparametric estimators of sensitivity and specificity in the presence of censoring [9]. The minimum pvalue approach [6] was not extended to the censored data setting since its objective function is computed under the null hypothesis of absence of association between the true binary status and the biomarker classification, in contrast with the presence of some discrimination potential that leads to the dichotomization issue itself. In fact, this last method showed an unsatisfactory performance when oriented to identify a cutpoint in the presence of a binary outcome [5]. The same consideration would also apply to other testbased methods such as the logrank, which in addition it is not specifically related to a predefined time horizon [16].
The simulation protocol was set in order to keep directly under control the theoretical values of sensitivity and specificity by simulating the survival times first, and afterwards the biomarker values conditional on time. This strategy can be however reversed by working with the Bayes’ theorem.
We mainly considered the case where the three methods identify theoretically the same underlying true cutpoint, as in the presence of Gaussian homoscedastic biomarker distributions. The main issue a researcher faces in this common situation is the choice between alternative estimators of the same parameter (cutpoint). We showed that the point closestto(0,1) corner approach has the best performance from simulations in terms of mean square error and relative bias. However, the calculation of the Youden index [1] or the concordance probability [2,3] associated to the cutpoint identified through the point closestto(0,1) corner estimator could be used to ease interpretability and to communicate the classification performance of the biomarker given the lack of clinical meaning of the point closestto(0,1) corner objective function [4].
In the absence of a closed form, we provided estimation of the standard deviation and 95% confidence interval for the cutpoint by the bootstrap method [13]. We used only 200 replicates due to computational burden of the simulation exercise. This may have led to coverage under the nominal level in some scenarios. We recommend to use a larger number of replicates in real data applications. In the applications presented in this paper, we used 999 bootstrap replicates. In addition, the achieved performance of the dichotomized biomarker classification associated to the estimated cutpoint can be represented through a confidence interval of the point on the ROC curve [9,17].
It should also pointed out that a good estimation of the cutpoint did not necessary lead to a good estimation of the corresponding objective functions, and vice versa [8]. In our simulation scenario, we found an overestimation of the Youden index and concordance probability, and an underestimation of the closestto(0,1) corner objective function, at the optimal estimated cutpoint. The bias decreased with increasing sample size and classification accuracy of the biomarker. This is due to the fact that most properties of estimators, such as bias, are not preserved under nonlinear monotonic transformations [8]. Thus, when communicating the clinical value of an identified cutpoint, we also recommend to provide the confidence interval estimate of the associated objective function. For the Youden index method, the variability of the objective function estimate in the presence of censored data can be addressed by applying the delta method and by handling the covariance issue as in Antolini and Valsecchi [9].
When methods point to different true cutpoints, as in the Gamma distribution scenario, since the parameters of interest are different, estimators cannot be solely chosen relying on performance. In this case, scientists should rather choose according to the meaning of the underlying objective functions. For instance, the Youden index method [1] could be chosen if the researcher is interested in interpreting the net gain of the true positive fraction accounting for the false positive fraction, while the concordance probability approach [2,3] could be used if the researcher aims to interpret the probability of being below or above the cutpoint for any random pair of diseasefree and diseased subjects. When the focus is not on a specific time horizon, other cutpoint finding methods could be considered, such as Harrell’s C, or even modelbased derived indicators [16].
When disease prevalence is far from 50%, as in many applications, the three investigated methods could be modified by a weighting system in order to take into account the relative importance attributed to a true positive or a true negative result by addressing aspects related to both disease prevalence and patient’s benefit associated with a correct positive test result [18,19]. Moreover, in a setting with a high or poor overall survival, the estimated cutpoint may have larger variation [3]. It has also to be considered that besides the relatively high/poor overall survival, the investigated objective functions do not generally lead to optimal cutpoints on the boundary of the biomarker distribution [5]. This is nice, since a cutpoint on the boundary could indeed lead to a very limited sample size for the estimation of one of the two conditional survivals which are plugged into sensitivity and specificity.
The proposed example on CRLF2 expression in acute lymphoblastic leukemia [10] shows that in some clinical applications methods based on sensitivity and specificity may lead to unsatisfactory cutpoints due to a moderate discrimination potential of the biomarker, as represented by the whole ROC curve. By contrast, the application example on the Mayo score predicting mortality in primary biliary cirrhosis shows a satisfactory result.
Future works should address the issue of when cutpoint finding should be based on predictive values [20], more appealing for clinical interpretation and use, rather than on sensitivity and specificity. If the point of a biomarker based test is to use it to discriminate prognosis, clinicians need to know the probability that the outcome will be favourable or unfavourable given the test outcome. In this way, clinicians would approach the data from the direction of the test results, using predictive values [21], although their cutpoint definition would be influenced by the prevalence of the condition. For example, when predictive values had been used to detect a cutpoint for the CRLF2, as done in the original paper [10], a very extreme value of the cutpoint would have been identified, which defines a very rare subgroup with a high risk of relapse.
Conclusions
We showed the extension of the Youden index, the concordance probability and the point closest to(0,1) corner in the ROC plane cutpoint finding methods to the case of censored failure time outcome. When considering the Normal homoscedastic scenario where the investigated methods lead to the same cutpoint, the point closestto(0,1) corner approach has the best performance from simulations in terms of mean square error and relative bias. However, we discuss the use of the Youden index or the concordance probability associated to the cutpoint identified through the closestto(0,1) corner approach to ease interpretability of the classification performance of the biomarker.
References
 1.
Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32–5.
 2.
Liu X. Classification accuracy and cut point selection. Stat Med. 2012;31(23):2676–86.
 3.
Liu X, Jin Z. Optimal survival timerelated cutpoint with censored data. Stat Med. 2015;34(3):515–24.
 4.
Perkins NJ, Schisterman EF. The inconsistency of “optimal” cutpoints obtained using two criteria based on the receiver operating characteristic curve. Am J Epidemiol. 2006;163(7):670–5.
 5.
Rota M, Antolini L. Finding the optimal cutpoint for Gaussian and Gamma distributed biomarkers. Computational Statistics & Data Analysis. 2014;69:1–14.
 6.
Miller R, Siegmund D. Maximally selected chi square statistics. Biometrics. 1982;38(4):1011–6.
 7.
Schisterman EF, Perkins NJ, Liu A, Bondell H. Optimal cutpoint and its corresponding Youden Index to discriminate individuals using pooled blood samples. Epidemiology. 2005;16(1):73–81.
 8.
Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its associated cutoff point. Biom J. 2005;47(4):458–72.
 9.
Antolini L, Valsecchi MG. Performance of binary markers for censored failure time outcome: nonparametric approach based on proportions. Stat Med. 2012;31(11–12):1113–28.
 10.
Palmi C, Vendramini E, Silvestri D, Longinotti G, Frison D, Cario G, et al. Poor prognosis for P2RY8CRLF2 fusion but not for CRLF2 overexpression in children with intermediate risk Bcell precursor acute lymphoblastic leukemia. Leukemia. 2012;26(10):2245–53.
 11.
Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61(1):92–105.
 12.
Heagerty PJ, Lumley T, Pepe MS. Timedependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56(2):337–44.
 13.
Efron B, Tibshirani R. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat Sci. 1986;1(1):54–75.
 14.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2012.
 15.
Heagerty PJ, Saha P. survivalROC: Timedependent ROC curve estimation from censored survival data. 2013. R package version 1.0.3.
 16.
Sima CS, Gönen M. Optimal cutpoint estimation with censored data. J Stat Theory Pract. 2013;7(2):345–59.
 17.
Bantis LE, Nakas CT, Reiser B. Construction of confidence regions in the ROC space after the estimation of the optimal Youden indexbased cutoff point. Biometrics. 2014;70(1):212–23.
 18.
Jund J, Rabilloud M, Wallon M, Ecochard R. Methods to estimate the optimal threshold for normally or lognormally distributed biological tests. Med Decis Making. 2005;25(4):406–15.
 19.
Smits N. A note on Youden’s J and its cost ratio. BMC Med Res Methodol. 2010;10:89.
 20.
Linn S, Grunau PD. New patientoriented summary measure of net total gain in certainty for dichotomous diagnostic tests. Epidemiol Perspect Innov. 2006;3:11.
 21.
Altman DG, Bland JM. Diagnostic tests 2: predictive values. BMJ. 1994;309(6947):102.
Acknowledgements
This research received no specific grant from any funding agency in the public, commercial, or notforprofit sectors. The authors wish to acknowledge the European Network for Cancer Research in Children and Adolescents (ENCCA) FP7HEALTHF22011 Contract no. 261474 project for their partial support to this project.
Author information
Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
MR contributed to the methodological aspects, performed the statistical analyses and drafted the manuscript. LA contributes to the theoretical work, conception and design of the simulation study and critically revised the manuscript. MGV contributes to the interpretation of results and critically revised the manuscript. All authors read and approved the final manuscript.
Additional file
Additional file 1: Table S1.
Bootstrap standard deviation, coverage probability and mean length of the 95% confidence interval of the cutpoint in the normal homoscedastic scenario with different disease fractions and censoring levels.
Rights and permissions
About this article
Cite this article
Rota, M., Antolini, L. & Valsecchi, M.G. Optimal cutpoint definition in biomarkers: the case of censored failure time outcome. BMC Med Res Methodol 15, 24 (2015). https://doi.org/10.1186/s128740150009y
Received:
Accepted:
Published:
Keywords
 Optimal cutpoint
 Censored failure time outcome
 Youden index
 Concordance probability
 Point closestto(0,1) corner in the ROC plane