Methods of determining optimal cutpoint of diagnostic biomarkers with application of clinical data in ROC analysis: an update review
BMC Medical Research Methodology volume 24, Article number: 84 (2024)
Abstract
Introduction
An important application of ROC analysis is the determination of the optimal cutpoint for biomarkers in diagnostic studies. This comprehensive review provides a framework for cutpoint selection for biomarkers in diagnostic medicine.
Methods
Several methods have been proposed for the selection of optimal cutpoints. The validity and precision of the proposed methods are discussed, and their clinical application is illustrated with a practical example of clinical diagnostic data on C-reactive protein (CRP), erythrocyte sedimentation rate (ESR) and malondialdehyde (MDA) for the prediction of inflammatory bowel disease (IBD), using the NCSS software.
Results
Our results in the clinical data suggested that for CRP and MDA, the cutpoints calculated by the Youden index, Euclidean index, Product and Index of Union (IU) methods were consistent in predicting IBD patients, while for ESR, only the Euclidean and Product methods yielded similar estimates. However, the diagnostic odds ratio (DOR) method provided more extreme values for the optimal cutpoint for all biomarkers analyzed.
Conclusion
Overall, the four methods including the Youden index, Euclidean index, Product, and IU can produce quite similar optimal cutpoints for binormal pairs with the same variance. The cutpoint determined with the Youden index may not agree with the other three methods in the case of skewed distributions while DOR does not produce valid informative cutpoints. Therefore, more extensive Monte Carlo simulation studies are needed to investigate the conditions of test result distributions that may lead to inconsistent findings in clinical diagnostics.
Introduction
One of the most important medical challenges is the clinical evaluation of diagnostic tests, which is of interest to clinical experts and statistical researchers. Gold standard methods are often invasive and costly; therefore, the evaluation of new diagnostic tests is very important. If the result of the diagnostic test is binary, sensitivity (Se) and specificity (Sp) are used as measures of diagnostic accuracy. Se (true positive rate) is the probability of a positive test result for persons with the target condition (TC). Sp (true negative rate) is the probability that the test result is negative, given that the person is without TC [1,2,3]. From a clinical perspective, in addition to Se and Sp, two other measures, the positive and negative predictive values, are of interest to clinicians. The negative predictive value (NPV) is the probability that a person is without TC if the test result is negative. The positive predictive value (PPV) is the probability of having TC if the test result is positive. The PPV and NPV are clinically important, but they are influenced by the prevalence of TC in the target population. Clinicians are interested in the PPV and NPV because they want to assess the likelihood that a person is with or without TC based on the test results [2, 3]. As a rule, the results of the gold standard status and the test are summarized as in Fig. 1.
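To make these four measures concrete, the following minimal Python sketch computes them from the cells of a 2 × 2 table and shows how the PPV changes with prevalence when Se and Sp are held fixed. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Se, Sp, PPV and NPV from the four cells of a 2 x 2 table."""
    return {
        "Se": tp / (tp + fn),   # P(test positive | TC present)
        "Sp": tn / (tn + fp),   # P(test negative | TC absent)
        "PPV": tp / (tp + fp),  # P(TC present | test positive)
        "NPV": tn / (tn + fn),  # P(TC absent | test negative)
    }


def ppv_at_prevalence(se, sp, prevalence):
    """PPV from Bayes' theorem for a given prevalence of TC."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (illustration only)
measures = diagnostic_measures(tp=80, fp=10, fn=20, tn=90)

# Same Se and Sp, two different prevalences: the PPV drops as prevalence falls
ppv_high = ppv_at_prevalence(0.8, 0.9, prevalence=0.5)
ppv_low = ppv_at_prevalence(0.8, 0.9, prevalence=0.1)
```

Se and Sp are computed within the diseased and non-diseased columns, so they do not depend on prevalence, whereas the PPV in this sketch falls markedly when the prevalence is lowered.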
However, diagnostic tests are no longer confined to positive/negative results. Many biomarkers in laboratory tests yield results on a continuous scale. Receiver Operating Characteristic (ROC) curve analysis is a method of choice for determining diagnostic accuracy (the area under the ROC curve, AUC) and the partial area [3]. However, for clinical decision-making, it is of interest to define an optimal cutpoint on continuous biomarkers. Several methods for optimal cutpoint selection have been proposed [4,5,6,7,8,9]. The choice among these methods is a matter of interest in clinical practice. Thus, the objective of this study was to provide an updated, extensive review of ROC analysis and the methods of cutpoint selection for biomarkers, with an application to clinical data. In the following sections, we first provide an overview of ROC analysis for diagnostic biomarkers. In particular, we focus on the different methods of cutpoint selection for laboratory diagnostic tests. We illustrate five popular methods of cutpoint selection with clinical data. The consistency and inconsistency of the findings are discussed in relation to the distribution of test results in diseased and healthy populations.
Overview of ROC curve for quantitative biomarker
Many diagnostic markers in modern medicine are quantitative. Various cutoff points can be considered for them, and Se and Sp are derived for each of these points [1]. The trade-off between (1 − Sp) and Se is plotted on a coordinate system, and the resulting plot of Se versus (1 − Sp) is called the receiver operating characteristic (ROC) curve [2, 3]. This curve shows the diagnostic accuracy of the test, and the area under the curve (AUC) expresses, clinically and statistically, the diagnostic power of the test; the nonparametric AUC corresponds exactly to the Wilcoxon statistic [10]. Historically, ROC analysis was used in radar during World War II to distinguish a target or object (true positive, Se) from the clutter (FP, 1 − Sp) [11, 12]. It was later used by Lusted in radiology to characterize pulmonary tuberculosis and to determine the correlation of FP and FN findings in several studies on the interpretation of chest radiographs, and more recently in clinical epidemiology to determine the diagnostic accuracy of biomarkers [13]. In the medical and statistical literature, the ROC curve is often used to evaluate the diagnostic significance of quantitative markers. However, the most important point about the ROC curve is that it can be used to determine the optimal cutoff point for quantitative biomarkers.
The structure of the ROC graph is shown in Fig. 2. The ROC graph is plotted in a 1 × 1 square, where the vertical axis corresponds to Se and the horizontal axis corresponds to the FP rate (1 − Sp). Within this square, there is a curve and a diagonal [3, 14]. The lower left corner is Se = 0 and Sp = 1, i.e. the highest possible cutoff value of the test. As we move from the lower left corner to the upper right corner, Se increases but Sp decreases. As a result, the cutoff value gets lower and lower, and at the upper right corner of the square, Se and Sp are 1 and 0, respectively, i.e. the lowest possible cutoff value for this test [11]. The stricter the criterion for determining a positive result, the more the point on the curve shifts downwards and to the left. If, on the other hand, a looser criterion is applied, the point on the curve shifts upwards and to the right [15].
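The construction described above can be sketched in a few lines of Python. This is a minimal illustration under two assumptions that are not stated in the text: higher test values indicate disease, and a subject is called positive when the value is greater than or equal to the cutoff. The function name and the toy data are hypothetical.

```python
def roc_points(diseased, healthy):
    """Empirical ROC points (FP rate, Se), from strictest to loosest cutoff."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every value: Se = 0, Sp = 1
    for c in cutoffs:
        se = sum(x >= c for x in diseased) / len(diseased)
        fpr = sum(x >= c for x in healthy) / len(healthy)  # 1 - Sp
        points.append((fpr, se))
    return points


# Toy data (illustration only)
diseased = [3, 5, 6, 8, 9]
healthy = [1, 2, 4, 5, 7]
pts = roc_points(diseased, healthy)
```

The first point is the lower left corner (Se = 0, Sp = 1) and the last is the upper right corner (Se = 1, Sp = 0); lowering the cutoff can only move the point upwards or to the right.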
Interpretation of different shapes of ROC curve
If the ROC curve lies above the diagonal of the square, the test discriminates between the two target populations (healthy and sick people). The closer the curve is to the upper left corner, the better the diagnostic significance. If the curve passes through the upper left corner at the coordinates (0, 1), the test has full diagnostic significance (Se = Sp = 1) [11, 16]. If the curve lies on the diagonal, the two populations are classified no better than at random [11, 16]. If the curve is below the diagonal, the test results are misleading. The basic idea of this graph is therefore that the points of the curve should be near the upper left corner. Among all these points, we should look for the one with the best cutoff value, as this point is used as the threshold for distinguishing between the healthy and diseased populations.
Area Under the Curve (AUC)
The area under the ROC curve is abbreviated as AUC. The AUC can be calculated either parametrically under binormal distributions (or other pairs of distributions of test results) [17,18,19] or nonparametrically (i.e. empirically, without making any distributional assumptions about the test results) [18,19,20]. Several methods have been suggested to calculate the standard error of the AUC, either parametrically or nonparametrically. Another index is the partial area, which may be of interest over a clinically relevant range of false positive rates [19,20,21,22]. The AUC is one of the indicators of diagnostic accuracy when comparing diagnostic tests in ROC analysis. The AUC summarizes the entire position of the ROC curve and does not depend on a specific operating point [3]. The AUC is interpreted in two ways. Statistically, it is the probability that the criterion value of an individual randomly drawn from the diseased population is greater than the criterion value of another individual randomly drawn from the healthy population [18]. Alternatively, it is the mean true positive rate (average Se) over all possible FP rates. One of the purposes of the ROC curve is to compare two or more diagnostic tests. The higher the AUC, the higher the accuracy of the test. The maximum value of the AUC is 1, which means that the diagnostic test distinguishes the two populations completely (this is the case when the distributions of the test result in the healthy and diseased populations do not overlap at all). If the AUC is 0.5, the discrimination is no better than chance and the ROC curve lies exactly on the diagonal of the square.
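The two interpretations of the AUC can be checked numerically. The sketch below, which assumes that higher values indicate disease and counts ties as one half, computes the empirical AUC once as the proportion of (diseased, healthy) pairs in which the diseased subject has the higher value, and once as the trapezoidal area under the empirical ROC curve. The function names and toy data are hypothetical.

```python
def auc_pairs(diseased, healthy):
    """P(X_diseased > X_healthy) + 0.5 * P(tie), over all pairs."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def auc_trapezoid(diseased, healthy):
    """Trapezoidal area under the empirical ROC curve."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    pts = [(0.0, 0.0)]
    for c in cutoffs:
        fpr = sum(h >= c for h in healthy) / len(healthy)
        se = sum(d >= c for d in diseased) / len(diseased)
        pts.append((fpr, se))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy data (illustration only)
diseased = [3, 5, 6, 8, 9]
healthy = [1, 2, 4, 5, 7]
auc_a = auc_pairs(diseased, healthy)
auc_b = auc_trapezoid(diseased, healthy)
```

On these toy data both calculations give 0.78, which illustrates why the nonparametric AUC is described as equivalent to the Wilcoxon (Mann-Whitney) statistic.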
Parametric and nonparametric AUC
The most popular parametric model is the binormal model, which assumes that the test results in the healthy and the sick population each follow a Gaussian distribution, with different means and standard deviations. Based on this assumption, a smooth ROC curve can be derived, and the AUC can be calculated with a closed formula as follows:
\[ AUC = \Phi\left( \frac{\mu_{1} - \mu_{0}}{\sqrt{\sigma_{1}^{2} + \sigma_{0}^{2}}} \right) \]
where µ1 and µ0 are the means of the diseased and healthy populations, σ1 and σ0 are the standard deviations of the diseased and healthy populations, respectively, and Φ is the cumulative standard normal distribution function.
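As a small numerical illustration, the standard closed form for the binormal AUC, Φ((µ1 − µ0) / √(σ1² + σ0²)), can be evaluated with the Python standard library. The function name and the parameter values below are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import NormalDist


def binormal_auc(mu1, sigma1, mu0, sigma0):
    """AUC under the binormal model (diseased: mu1, sigma1; healthy: mu0, sigma0)."""
    z = (mu1 - mu0) / sqrt(sigma1 ** 2 + sigma0 ** 2)
    return NormalDist().cdf(z)  # standard normal CDF


# Hypothetical parameters (illustration only)
auc = binormal_auc(mu1=2.0, sigma1=1.5, mu0=0.0, sigma0=1.0)

# Identical distributions give no discrimination
auc_chance = binormal_auc(mu1=1.0, sigma1=1.0, mu0=1.0, sigma0=1.0)
```

When the two distributions coincide, the expression reduces to Φ(0) = 0.5, the chance level described earlier.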
A useful property of the ROC curve is that the AUC is invariant to any monotonic transformation of the decision scale. However, the binormal model is a theoretical model and is rarely observed exactly in real life, in particular when the sample size is small. The alternative nonparametric approach is more practical for non-binormal data. The nonparametric Wilcoxon statistic provides an estimate equal to the trapezoidal-rule AUC. Hajian-Tilaki and Hanley showed a practical calculation of the nonparametric AUC and its sampling variability based on pseudo-accuracies [10]. This latter approach is more convenient for non-Gaussian data with a small sample size. For example, Fig. 2 provides the binormal AUC (smooth curve) and the nonparametric (empirical) AUC. However, as already pointed out, the AUC is the Se averaged over all possible FP rates, and thus the comparison of two diagnostic tests based on the AUC can be misleading and lead to a wrong conclusion when their ROC curves cross each other. In this situation, the Se over a relevant range of the false positive fraction (FPF) and at an optimal cutpoint is of interest.
Main issues of performing diagnostic test
The main issues of diagnostic tests concern how the test results will be used in real life: is the test for "rule in" or "rule out", what is the target population, what are the next steps given a positive test result, and so on. Although the Youden index has attractive statistical properties and a clinical interpretation, it may not be recommended in real life for cutoff selection because it assumes an equal weight for Se and Sp. For example, in a screening test for cancer, false negative results are much more serious than false positive results, because positive results are usually confirmed by other tests and procedures. Medical diagnostic tests can have different indications for use, such as diagnosis, prognosis, monitoring, risk assessment, and treatment choice. For example, for a "rule-out" test for cancer, a typical cutoff is one that achieves a prespecified level of Se (for example, 99%) and a clinically acceptable level of Sp. Another issue in evaluating the Se and Sp of a diagnostic test is that the test should be applied to subjects drawn from the same source as the target population. For example, for diagnosing Alzheimer's disease (AD), the target population might be subjects with memory problems, with and without AD. If one calculates the Sp based on "healthy" subjects, the estimate is strongly biased. We should emphasize that even the best methods of cutpoint selection, with desirable statistical properties and clinical relevance, cannot solve problems in the design of a diagnostic test study. Another bias may arise in the further workup when the initial test result is negative: the result of the diagnostic test affects whether and how the gold standard test (or reference test) is used to verify it. This type of bias is sometimes called "verification bias" or "workup bias".
Partial verification may occur when only those with a positive test receive the reference standard test, and differential verification occurs when a different reference test is used depending on whether the preliminary test was positive or negative. Blinding the workup may reduce such bias.
Rationale of optimal cutoff value for quantitative diagnostic biomarkers
When a quantitative diagnostic test is performed, two groups cannot be completely distinguished due to the overlap of test results in the group of patients and healthy individuals [23]. An example: Imagine two hypothetical distributions that refer to a situation in which the average test result is 80 in the patient group and 60 in the nonpatient group. If the cutoff value is set to 70 in this situation, people with the disease whose test result is below 70 are incorrectly classified as not having the disease (FN). However, if the doctor lowers the cutoff value to 65 in order to increase the Se of the test, the number of people who test positive increases (the Se increases), but the number of FP results also increases. In general, it is important to determine a cutoff value with adequate Se and Sp, as the use of less stringent criteria to increase Se leads to a tradeoff in which Sp decreases.
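The trade-off in this example can be made numerical. The text gives only the two means (80 and 60), so the sketch below assumes, purely for illustration, that both groups are normally distributed with a standard deviation of 10 and that a result at or above the cutoff is called positive. The variable and function names are hypothetical.

```python
from statistics import NormalDist

# Means from the example; the normal shape and SD = 10 are assumptions
patients = NormalDist(mu=80, sigma=10)
non_patients = NormalDist(mu=60, sigma=10)


def se_sp(cutoff):
    """Se and Sp when results at or above the cutoff are called positive."""
    se = 1 - patients.cdf(cutoff)   # patients correctly called positive
    sp = non_patients.cdf(cutoff)   # non-patients correctly called negative
    return se, sp


se70, sp70 = se_sp(70)
se65, sp65 = se_sp(65)
```

Under these assumptions, lowering the cutoff from 70 to 65 raises Se from about 0.84 to about 0.93 while Sp falls from about 0.84 to about 0.69, which is the trade-off described above.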
Methods of determining optimal cutoff value
One of the most important applications of the ROC curve is the determination of the optimal cutoff value for quantitative biomarkers. The search for the optimal cutoff value is not only about maximizing Se and Sp but also about finding a suitable compromise between the two based on various criteria [11]. When a disease is highly contagious or associated with severe complications, Se is more important than Sp. In contrast, Sp is more important than Se when a test is expensive or risky. If neither Se nor Sp is preferred, or if both are equally important, it makes the most sense to maximize both [11]. Several methods have been introduced to determine the optimal cutoff point, of which some are in common use. Each of them has its own assumptions, and the choice among them depends on the importance of the Se relative to the Sp of the test. The most important of these methods are as follows:

1. Youden’s J statistic
2. Euclidean distance
3. Index of union (IU)
4. Cost approach
5. Positive likelihood ratio (LR+) and negative likelihood ratio (LR−)
6. Maximum product of sensitivity and specificity
7. Number needed to misdiagnose (NNM)
8. Analytical method
9. Diagnostic odds ratio (DOR)
10. Minimum P-value (min P)
Youden’s J statistic
The Youden index is the maximum vertical distance between the ROC curve and the diagonal (chance line). In effect, the Youden index maximizes the difference between Se and the FP rate; in other words, it maximizes the net proportion of correct classification:
\[ J = \max_{c} \left\{ Se(c) + Sp(c) - 1 \right\} \]
Therefore, the optimal cutoff point is the cutoff c at which Se + Sp is maximal [15, 23].
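A minimal empirical version of this search can be sketched as follows, using the standard definition J = Se + Sp − 1 evaluated at every observed value. The sketch assumes that higher values indicate disease and that a value at or above the cutoff is positive; the function name and toy data are hypothetical.

```python
def youden_cutpoint(diseased, healthy):
    """Return (cutoff, J) for the cutoff maximizing J = Se + Sp - 1."""
    best = None
    for c in sorted(set(diseased) | set(healthy)):
        se = sum(x >= c for x in diseased) / len(diseased)
        sp = sum(x < c for x in healthy) / len(healthy)
        j = se + sp - 1
        if best is None or j > best[1]:  # first maximum wins on ties
            best = (c, j)
    return best


# Toy data (illustration only)
diseased = [5, 6, 7, 8, 9]
healthy = [1, 2, 3, 4, 6]
cutoff, j = youden_cutpoint(diseased, healthy)
```

On these toy data the search selects a cutoff of 5, where Se = 1.0 and Sp = 0.8. With real data, ties between cutoffs are common and need an explicit rule; the sketch simply keeps the lowest tied cutoff.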
Euclidean distance
Another way to determine the optimal cutoff value is to use the Euclidean distance from the coordinates (0, 1) in the upper left corner of the ROC space. In this method, the optimal cutoff value is determined according to the basic principle that the AUC value should be maximal. Therefore, the distance between the coordinates (0, 1) and the ROC curve should be minimized. The Euclidean distance is defined as follows:
\[ ED = \sqrt{(1 - Se)^{2} + (1 - Sp)^{2}} \]
The point at which this value is minimized is considered the optimal cutoff value [3, 23].
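The same kind of empirical search applies here, with the distance from the ROC point (1 − Sp, Se) to the corner (0, 1), that is, √((1 − Se)² + (1 − Sp)²), as the criterion. The sketch assumes that higher values indicate disease and that a value at or above the cutoff is positive; the function name and toy data are hypothetical.

```python
from math import hypot


def euclidean_cutpoint(diseased, healthy):
    """Return (cutoff, distance) for the ROC point closest to (0, 1)."""
    best = None
    for c in sorted(set(diseased) | set(healthy)):
        se = sum(x >= c for x in diseased) / len(diseased)
        sp = sum(x < c for x in healthy) / len(healthy)
        dist = hypot(1 - se, 1 - sp)  # sqrt((1 - Se)^2 + (1 - Sp)^2)
        if best is None or dist < best[1]:
            best = (c, dist)
    return best


# Toy data (illustration only)
diseased = [5, 6, 7, 8, 9]
healthy = [1, 2, 3, 4, 6]
cutoff, dist = euclidean_cutpoint(diseased, healthy)
```

For these toy data the closest point to the corner is at a cutoff of 5, at a distance of 0.2.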
Index of Union (IU)
The Index of Union (IU) uses the absolute differences between the diagnostic measures (Se and Sp) and the AUC value to minimize the misclassification rate, and is calculated using the following formula:
\[ IU = \left| Se - AUC \right| + \left| Sp - AUC \right| \]
IU is a method to find the point at which Se and Sp are maximized simultaneously. This is similar to the Euclidean distance. The difference, however, is that it minimizes the absolute differences between the AUC value and the diagnostic measures (Se and Sp), and in doing so it also minimizes the difference between Se and Sp. The cutoff point at which the IU is minimized is optimal. This method does not require complex calculations, as it only checks whether the Se and Sp at the optimal cutoff value are sufficiently close to the AUC value. Furthermore, in most cases, IU has been reported to perform better than other methods [5].
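A minimal sketch of the IU criterion, |Se − AUC| + |Sp − AUC|, is given below. It uses the empirical (pairwise) AUC with ties counted as one half, and assumes that higher values indicate disease and that a value at or above the cutoff is positive. The function names and toy data are hypothetical.

```python
def auc_empirical(diseased, healthy):
    """Empirical AUC: share of (diseased, healthy) pairs ranked correctly."""
    score = sum(1.0 if d > h else 0.5 if d == h else 0.0
                for d in diseased for h in healthy)
    return score / (len(diseased) * len(healthy))


def iu_cutpoint(diseased, healthy):
    """Return (cutoff, IU) minimizing |Se - AUC| + |Sp - AUC|."""
    auc = auc_empirical(diseased, healthy)
    best = None
    for c in sorted(set(diseased) | set(healthy)):
        se = sum(x >= c for x in diseased) / len(diseased)
        sp = sum(x < c for x in healthy) / len(healthy)
        iu = abs(se - auc) + abs(sp - auc)
        if best is None or iu < best[1]:
            best = (c, iu)
    return best


# Toy data (illustration only)
diseased = [5, 6, 7, 8, 9]
healthy = [1, 2, 3, 4, 6]
auc = auc_empirical(diseased, healthy)
cutoff, iu = iu_cutpoint(diseased, healthy)
```

With these toy data the empirical AUC is 0.94 and the IU is smallest (0.20) at a cutoff of 5, where Se = 1.0 and Sp = 0.8 are both close to the AUC.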
Cost approach
The cost approach is a method for determining the optimal cutoff value that takes into account the benefits of correct classification or the costs of misclassification. This method can be used when the costs of true positive (TP), true negative (TN), FP, and FN results in a diagnostic test are known [24]. There are two ways to determine the cutoff value using the cost approach: to calculate the cost itself or to use the cost index (f_{m}):
\[ Cost = Pr \left[ Se \cdot C_{TP} + (1 - Se) \cdot C_{FN} \right] + (1 - Pr) \left[ Sp \cdot C_{TN} + (1 - Sp) \cdot C_{FP} \right] \]
\[ f_{m} = Se - \frac{1 - Pr}{Pr} \cdot \frac{C_{FP} - C_{TN}}{C_{FN} - C_{TP}} \cdot (1 - Sp) \]
where Pr is the prevalence and C_{TN}, C_{FP}, C_{TP}, and C_{FN} refer to the costs of TNs, FPs, TPs, and FNs, respectively. These four costs should be expressed in a common unit. When the cost index (f_{m}) is maximized, the average cost is minimized, and this point is regarded as the optimal cutoff value [24].
Another method to determine the optimal cutoff value in terms of costs is to use the misclassification cost term (MCT). Considering only the prevalence of the disease, C_{FN}, and C_{FP}, the point at which the MCT is minimized is determined as the optimal cutoff value [6, 23].
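The relationship between the average cost, the cost index and the MCT can be sketched numerically. The expressions below are the usual expected-cost forms and should be read as assumptions of this illustration: average cost = Pr[Se·C_TP + (1 − Se)·C_FN] + (1 − Pr)[Sp·C_TN + (1 − Sp)·C_FP], f_m = Se − [(1 − Pr)/Pr]·[(C_FP − C_TN)/(C_FN − C_TP)]·(1 − Sp), and MCT = (C_FN/C_FP)·Pr·(1 − Se) + (1 − Pr)·(1 − Sp). The operating points, prevalence and costs are hypothetical.

```python
def average_cost(se, sp, pr, c_fp, c_tn, c_fn, c_tp):
    """Expected cost per tested subject."""
    return (pr * (se * c_tp + (1 - se) * c_fn)
            + (1 - pr) * (sp * c_tn + (1 - sp) * c_fp))


def cost_index(se, sp, pr, c_fp, c_tn, c_fn, c_tp):
    """Cost index f_m; larger values correspond to lower average cost."""
    slope = ((1 - pr) / pr) * ((c_fp - c_tn) / (c_fn - c_tp))
    return se - slope * (1 - sp)


def mct(se, sp, pr, c_fn, c_fp):
    """Misclassification cost term, using only Pr, C_FN and C_FP."""
    return (c_fn / c_fp) * pr * (1 - se) + (1 - pr) * (1 - sp)


# Hypothetical operating points: (cutoff, Se, Sp)
points = [(4, 1.0, 0.6), (5, 1.0, 0.8), (6, 0.8, 0.8), (7, 0.6, 1.0)]
# Hypothetical prevalence and costs in a common unit
costs = dict(pr=0.2, c_fp=1.0, c_tn=0.0, c_fn=5.0, c_tp=0.0)

best_fm = max(points, key=lambda p: cost_index(p[1], p[2], **costs))[0]
best_cost = min(points, key=lambda p: average_cost(p[1], p[2], **costs))[0]
best_mct = min(points, key=lambda p: mct(p[1], p[2], costs["pr"],
                                         costs["c_fn"], costs["c_fp"]))[0]
```

In this example all three criteria select the same cutoff, as expected: f_m is a decreasing linear function of the average cost, and with C_TP = C_TN = 0 the MCT is the average cost divided by C_FP.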
Positive likelihood ratio (LR +) and negative likelihood ratio (LR–)
The positive likelihood ratio (\(L{R}^{+}\)) is the ratio of the true positive rate to the FP rate, Se / (1 − Sp), and the negative likelihood ratio (\(L{R}^{-}\)) is the ratio of the FN rate to the true negative rate, (1 − Se) / Sp. Researchers can choose a cutoff value that either maximizes \(L{R}^{+}\) or minimizes \(L{R}^{-}\). The larger the \(L{R}^{+}\), the more informative the diagnostic test; for the \(L{R}^{-}\) it is exactly the opposite: the closer it is to zero, the better the test performs [23, 24].
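As a brief illustration, both ratios follow directly from Se and Sp, using the standard definitions LR+ = Se / (1 − Sp) and LR− = (1 − Se) / Sp. The function name and the input values are hypothetical.

```python
def likelihood_ratios(se, sp):
    """Return (LR+, LR-) for a given Se and Sp."""
    lr_pos = se / (1 - sp) if sp < 1 else float("inf")  # no false positives
    lr_neg = (1 - se) / sp if sp > 0 else float("inf")  # no true negatives
    return lr_pos, lr_neg


# Hypothetical operating point (illustration only)
lr_pos, lr_neg = likelihood_ratios(se=0.8, sp=0.9)
```

Here LR+ is 8 and LR− is about 0.22. Because LR+ becomes unbounded as Sp approaches 1, a search that simply maximizes LR+ tends to drift towards very strict cutoffs.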
Maximum product of sensitivity and specificity
In this method, the point at which the product of Se and Sp reaches the maximum is regarded as the optimal cutoff value [7, 15].
Number needed to misdiagnose (NNM)
This method refers to the number of patients who need to be tested for one misdiagnosis to be expected. In other words, if the number needed to misdiagnose (NNM) = 10, ten people would need to be tested to find one misdiagnosed patient. The higher the NNM, the better the test performance, so the optimal cutoff is the one that maximizes the NNM [11].
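The NNM is the reciprocal of the proportion of tested subjects who are misdiagnosed (FN plus FP). Written in terms of Se, Sp and the prevalence Pr, this is 1 / [Pr(1 − Se) + (1 − Pr)(1 − Sp)], which is the form assumed in the sketch below. The function name and the input values are hypothetical.

```python
def number_needed_to_misdiagnose(se, sp, pr):
    """NNM = 1 / (proportion of FN + proportion of FP among those tested)."""
    misdiagnosed = pr * (1 - se) + (1 - pr) * (1 - sp)
    if misdiagnosed == 0:
        return float("inf")  # a perfect test never misdiagnoses
    return 1 / misdiagnosed


# Hypothetical test with Se = Sp = 0.9 at 50% prevalence
nnm = number_needed_to_misdiagnose(se=0.9, sp=0.9, pr=0.5)
```

With these values one subject in ten is misclassified, so the NNM is 10, matching the interpretation given above.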
Analytical method
This method is related to the NNM. The difference is that the NNM assumes that the costs of FP and FN are equal, whereas here the cost of an FN is taken to be C times the cost of an FP, resulting in a weighted NNM. To find the most appropriate cutoff value, the weighted NNM can be maximized to account for both the proximity of the test results to the gold standard results and the cost of misdiagnosis (FP and FN) [11].
Diagnostic odds ratio (DOR)
The diagnostic odds ratio (DOR) is calculated by dividing the \(L{R}^{+}\) by the \(L{R}^{-}\). By maximizing the \(L{R}^{+}\) and minimizing the \(L{R}^{-}\), the optimal cutoff point can be determined. Note that the \(L{R}^{+}\) is between 0 and +∞, but the \(L{R}^{-}\) is between 0 and 1. The DOR is between 0 and +∞; if DOR = 1, there is no relationship between the test results and the target condition. If both FP and FN are zero, the test has both Se and Sp of 100% [8, 15].
The log(DOR) has an approximately normal distribution, and with SE(log(DOR)) one can obtain a confidence interval for log(DOR) and then calculate the confidence limits for the DOR by taking the antilogarithm. Obviously, a lack of FP and FN data at a given cutoff value can lead to low accuracy in the estimation of log(DOR) [8, 15]. The DOR has a disadvantage: it produces a very low or very high cutoff point. One of the limitations of the statistical behavior of the DOR is that it is associated with a higher mean square error (MSE) in the right tail, resulting in an unstable measurement. Therefore, it is suggested to minimize the MSE instead of maximizing the DOR. Hajian-Tilaki has presented a graphical method, based on a study of the distribution of data in the population, and has shown that the DOR is not compatible with the Youden and Euclidean methods in determining the optimal cutoff point and is sometimes noninformative under certain conditions [15].
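The confidence interval calculation described above can be sketched as follows. The sketch uses the standard large-sample standard error of the log odds ratio, SE(log(DOR)) = √(1/TP + 1/FP + 1/FN + 1/TN), and a 95% normal quantile of 1.96; these choices, the function name and the counts are assumptions of the illustration.

```python
from math import exp, log, sqrt


def dor_with_ci(tp, fp, fn, tn, z=1.96):
    """DOR and its confidence limits from a 2 x 2 table.

    Undefined when any cell is zero (division by zero), which is the
    instability at extreme cutoffs noted in the text.
    """
    dor = (tp * tn) / (fp * fn)  # equals LR+ / LR-
    se_log = sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = exp(log(dor) - z * se_log)  # antilog of the lower limit
    upper = exp(log(dor) + z * se_log)  # antilog of the upper limit
    return dor, lower, upper


# Hypothetical counts for 30 diseased and 30 healthy subjects
dor, lower, upper = dor_with_ci(tp=24, fp=3, fn=6, tn=27)
```

For these counts the DOR is 36 with a wide interval (roughly 8 to 160), which shows how imprecise the estimate becomes when the FP and FN cells are small.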
Minimum P-value approach (min P)
In this method, all cutoff points resulting from the trade-off between Se and FP are determined, the P-value is calculated for each of them, and the point with the smallest P-value is selected as the optimal cutoff point [5, 9]. Statistically, this P-value is derived from a chi-square distribution with one degree of freedom.
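A minimal sketch of this search is shown below. It assumes a Pearson chi-square test without continuity correction on the 2 × 2 table formed at each cutoff, and uses the identity that, for one degree of freedom, the upper-tail probability is erfc(√(χ²/2)). Higher values are assumed to indicate disease, and the function names and toy data are hypothetical.

```python
from math import erfc, sqrt


def chi2_p_value(tp, fp, fn, tn):
    """P-value of the Pearson chi-square test (1 df) for a 2 x 2 table."""
    n = tp + fp + fn + tn
    denom = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    if denom == 0:
        return 1.0  # an empty margin: the cutoff does not split the sample
    chi2 = n * (tp * tn - fp * fn) ** 2 / denom
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def min_p_cutpoint(diseased, healthy):
    """Return (cutoff, P-value) for the cutoff with the smallest P-value."""
    best = None
    for c in sorted(set(diseased) | set(healthy)):
        tp = sum(x >= c for x in diseased)
        fn = len(diseased) - tp
        fp = sum(x >= c for x in healthy)
        tn = len(healthy) - fp
        p = chi2_p_value(tp, fp, fn, tn)
        if best is None or p < best[1]:
            best = (c, p)
    return best


# Toy data (illustration only; too small for the chi-square approximation)
diseased = [5, 6, 7, 8, 9]
healthy = [1, 2, 3, 4, 6]
cutoff, p = min_p_cutpoint(diseased, healthy)
```

On these toy data the smallest P-value (about 0.01) occurs at a cutoff of 5. The sample is far too small for the chi-square approximation to be trusted, so the numbers only illustrate the mechanics of the search.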
A review of performance of different methods of optimal cutpoint
To compare different methods for determining the optimal cutoff point, various population-based and Monte Carlo simulation studies have been conducted, the results of which are summarized in Table 1. In the study by Hajian-Tilaki, four methods (Youden's J statistic, Euclidean distance, product of Se and Sp, and diagnostic odds ratio (DOR)) were compared under different distributions of data in patients and healthy individuals. Of these methods, only the DOR differed from the others: under binormal distributions the other three methods gave nearly identical and consistent cutoff points, whereas the DOR gave a cutoff point that was too high or too low to be reliable. Specifically, when the model was binormal with similar variances for the two groups, the DOR curve was U-shaped, and maximizing it placed the optimal cutoff point at the extreme values. When the variances were different, the DOR increased exponentially, so the optimal cutoff point was very high, but when the healthy group had more variance, the optimal cutoff point was very low. Under the bilogistic model with equal variance, the DOR was constant across cutoff points, but when the variance of the patient group was larger, it increased linearly with the cutoff point (a straight line with a positive slope), making the optimal cutoff point very high. For quantitative diagnostic tests, it is therefore recommended to use the Youden index, the Euclidean index, or the product of Se and Sp to obtain optimal cutoff values [15].
In the Ünal simulation study, Youden's J statistic, minimum P-value, maximum product of Se and Sp, Euclidean distance and IU were applied to simulated data. By comparing MSE, relative bias, bootstrap SD, coverage and average length, it was found that IU and Euclidean distance determined the best cutoff point, but the author recommends IU because of its clinical significance and because it is easier for clinicians to understand [5].
In the simulation study by Rota et al., different methods for determining the optimal cutoff value were compared by simulation, as in the Ünal study, except that the IU method was not included. The Euclidean distance showed somewhat better performance in terms of MSE, bias, etc. in estimating the optimal cutoff point, although the authors did not declare it the best method for determining the optimal cutoff point [9].
Habibzadeh et al. used methods such as Se = Sp (this method determines the point corresponding to the optimal cutoff point resulting from the maximum product of Se by Sp), the Bayesian approach, Youden's J statistic, Euclidean distance, maximum weighted NNM, and an analytical method, using Hooper et al.’s population-based distribution data [25]. When the MCT was considered and information such as the costs of FP and FN and the pretest probability was available, a more appropriate optimal cutoff point could be determined by the maximum weighted NNM and analytical methods [11].
Perkins and Schisterman evaluated the Youden and Euclidean distance methods using population-based distribution data. Both methods reached almost the same optimal cutoff point, but in their study the Youden method was recommended because of its clinical meaning: it increases the rate of correct classification and decreases the rate of misclassification, whereas the Euclidean method has more geometric than clinical significance and does not necessarily minimize the rate of misclassification [26].
Liu used simulated data with a normal distribution [7]. The Youden, Euclidean, and Product methods were used to determine the optimal cutoff point. The comparison criterion for these three methods was the MSE, which was lowest for the Product and Euclidean methods, while the Youden method had the highest MSE, especially when the classification accuracy was low [7].
Moreover, Gerke et al. utilized simulation data with four different scenarios, including the healthy and sick groups with two normal distributions with different mean and variance, the healthy group with normal distribution and the sick group with gamma distribution, and the last scenario in which the healthy group had an exponential distribution and the sick group had a gamma distribution. The Youden, Euclidean, and Product methods were used to calculate the true optimal cutoff value. The result was that these three methods had the same true optimal cutoff value only in the first scenario, in which the two groups were normally distributed but had different mean values (in the other scenarios, however, there was a difference of one hundredth) [27].
Statistical software for ROC curve analysis
Statistical programs used to perform ROC curve analysis include commercial software such as IBM SPSS, MedCalc, SAS, Stata, and NCSS as well as open-source software (OSS) such as R and MetzROC [23]. IBM SPSS, the most widely used commercial software, can perform basic statistical analysis for ROC curves, such as plotting ROC curves and calculating the AUC and CI with statistical tests, but it lacks the comparison of two correlated ROC curves. This output-based software does not report the optimal cutoff point, but only gives the nonparametric ROC curve, AUC, 95% CI, and test (H0: AUC = 0.5, H1: AUC ≠ 0.5). Stata provides several functions for analyzing ROC curves, including partial AUC (pAUC) [28], comparing multiple ROC curves, determining the optimal cutoff value using the Youden index, and comparing two or more AUCs. MedCalc provides a sample size estimate for a single diagnostic test and includes various analysis methods to determine the optimal cutoff value, but does not provide a function to calculate pAUC. NCSS can generate empirical and binormal ROC curves, calculate the AUC, determine the cutoff value, calculate other ROC curve performance criteria such as the Youden index and misclassification cost, and plot the ROC curve and other diagnostic measures. SAS also has a number of functions for ROC analysis, including PROC ROC: This method can be used to generate ROC curves, calculate the AUC, and compare the AUCs of two ROC curves. PROC LOGISTIC: This method can be used to fit logistic regression models and then to create ROC curves. PROC NLMIXED: This method can be used to fit mixed nonlinear models, which can then be used to create ROC curves. In contrast to commercial software packages, R is a free OSS that contains functions for the analysis of ROC curves in packages such as ROCR, pROC and OptimalCutpoints.
Among the R packages, ROCR is one of the most comprehensive packages for ROC curve analysis and contains functions for calculating the AUC with CI. pROC can be used to compare the AUC with the pAUC of different methods and provides CI for Se, Sp, AUC, and pAUC. Similar to ROCR, pROC also offers some functions for determining the optimal cutoff value, which can be determined using the Youden index and the Euclidean index. OptimalCutpoints is a sophisticated R package specifically designed to determine the optimal cutoff value [6]. Although these R packages have a large number of functions, they require good programming knowledge of the R language. Web tools for R-based ROC curve analysis, such as easyROC and plotROC, use R packages such as plyr, pROC, and OptimalCutpoints to perform ROC curve analysis and extend the functionality of several ROC packages in R, so that researchers can perform ROC curve analysis through an easy-to-use interface without having to write R code [29, 30].
An illustration of different methods of cutpoint selection with clinical data
In a clinical study of the diagnostic accuracy of biomarkers, 30 patients with IBD and 30 healthy individuals were recruited based on pathologic examination [31]. The target population was patients who were referred to outpatient clinics for a checkup for the diagnosis of IBD. This mirrors real life, where physicians need to discriminate between IBD patients and healthy individuals. All suspected patients underwent colonoscopy with pathologic examination as the gold standard. Then, blood samples were taken from all subjects, and three biomarkers, C-reactive protein (CRP), erythrocyte sedimentation rate (ESR) and malondialdehyde (MDA), were measured blindly in the 30 patients with IBD and the 30 healthy controls. Equal sample sizes of IBD patients and healthy subjects were taken in order to achieve a higher statistical power in testing diagnostic accuracy. The resulting 50% prevalence of IBD in our dataset does not influence the sensitivity and specificity of the diagnostic biomarkers, and thus it does not distort cutpoint selection, because the criteria of all the methods are based on sensitivity and specificity, not on PPV and NPV.
In our analysis, we applied nonparametric ROC analysis to derive the AUC of the different biomarkers and their 95% confidence intervals (CI) in predicting IBD. The diagnostic accuracy of each biomarker in predicting IBD and its optimal cutoff point were calculated with five different methods using NCSS software: the Youden index, Product, Euclidean, IU, and DOR methods. In addition, R software was used to draw the density plots.
Results
Figure 3 displays the density plots of the pairs of distributions of the three biomarkers (CRP, ESR, and MDA) in IBD patients and healthy individuals. The distribution of CRP in healthy people was normal, but in IBD patients it had a long right tail and was skewed. ESR was right-skewed in both patients and healthy individuals; descriptively, the degree of skewness was greater in patients than in healthy individuals. The MDA values suggested a bimodal distribution in both patients and healthy subjects.
Table 2 and Fig. 4 present the nonparametric ROC results, which show that all three biomarkers have significant predictive power, but CRP has a higher diagnostic accuracy than MDA and ESR. Table 3 indicates that the Youden, Euclidean, and Product methods have the same optimal cutoff point for CRP. As a result, Se and Sp were the same for these three methods, while the IU estimated the cutoff point to be slightly below 6 mg/L. The cutoff point of the DOR, however, was at the upper extreme. Table 4 illustrates that the optimal cutoff point for ESR is identical for the Euclidean, Product, and IU methods, but differs markedly for the Youden method, which selected a higher value (39 mm/h) with a low Se. The DOR method, in turn, selected an extreme value for the cutoff point: this point had a high DOR, but its Se (0.22) was low compared with its Sp (0.97). Table 5 shows that the optimal cutoff point for MDA is the same for the Euclidean, Product, and IU methods (1.7 μmol/L), but higher for the Youden method (2.1 μmol/L), with Se = 0.50 and Sp = 0.93. The cutoff point of the DOR was higher still (2.3 μmol/L); the DOR was maximal there, but Se was low.
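As a brief worked illustration of why a DOR-selected point can look attractive while being clinically weak, the standard definitions of the DOR and the Youden index J can be evaluated at the ESR point reported above (Se = 0.22, Sp = 0.97) and at the Youden point for MDA (Se = 0.50, Sp = 0.93). These figures are computed here from the rounded values in the text, so they need not match the tables exactly.

```latex
\begin{align*}
\text{ESR, DOR point:}\quad
\mathrm{DOR} &= \frac{Se/(1-Se)}{(1-Sp)/Sp}
             = \frac{0.22 \times 0.97}{0.78 \times 0.03} \approx 9.1, &
J &= Se + Sp - 1 = 0.22 + 0.97 - 1 = 0.19 \\
\text{MDA, Youden point:}\quad
J &= 0.50 + 0.93 - 1 = 0.43
\end{align*}
```

The DOR is driven almost entirely by the very small false positive fraction (1 − Sp = 0.03), even though only about one in five patients would be detected at that cutoff.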
Discussion
Defining the optimal cutpoints for quantitative biomarkers plays a crucial role in clinical decision-making in diagnostic medicine. ROC analysis is an appropriate choice for determining the optimal cutoff value. However, there is no single standard method to determine the optimal cutoff value of biomarkers. As illustrated in this comprehensive review, several methods have been proposed in the context of ROC analysis. The best known is the Youden index, owing to its clinical interpretation: it maximizes the proportion of correct classification after correcting for chance. In some scenarios of the underlying distributions of biomarkers, especially for binormal distributions with equal variance, the Euclidean index, which minimizes the distance between the ROC curve and the upper-left corner of the ROC space at (0,1), may be more accurate than the Youden index [9], but these two methods gave a similar estimate of the cutpoint in the ROC space in the above scenario [15].
Our findings in the clinical investigation of biomarkers in IBD patients showed that the density functions of ESR and CRP were skewed to the right, except for the distribution of CRP in healthy individuals, while the density function of MDA indicated a bimodal shape in both IBD patients and healthy individuals. Despite the presence of bimodal shapes and right-skewed distributions, the Euclidean, Product and IU metrics yielded quite similar estimates of the optimal cutoff points, but for ESR and MDA the Youden index yielded a higher cutoff value. The greatest inconsistency was found for the DOR compared with the other metrics: it always placed the optimal cutpoint in the extreme tail. Our findings are in accordance with the results of other studies [15, 32]. The inconsistency of the DOR results is related to the convex shape of log(DOR) as a ratio metric. In particular, for a pair of Gaussian distributions, log(DOR) is U-shaped across different cutpoints [15, 32].
In several studies, population-based biomarker distributions and Monte Carlo simulation studies with repeated samples have shown that the Youden, Euclidean and Product methods yield similar estimates of cutpoints under certain conditions of Gaussian distributions [15]. However, log(DOR) results in a higher, extreme value of the cutpoint, which has very low validity and reliability. Hajian-Tilaki investigated the population distributions of test results and suggested that, in some scenarios with data from the bi-logistic model in diseased and non-diseased individuals, log(DOR) itself is non-informative because it is flat across the different cutpoints [15].
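The U-shape of log(DOR) for a Gaussian pair can be checked numerically. The sketch below is only an illustration under stated assumptions: two unit-variance normal distributions with hypothetical means 0 (healthy) and 2 (diseased), with larger values indicating disease. The function names are likewise hypothetical.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_dor(c, mu0=0.0, mu1=2.0, sigma=1.0):
    """log(DOR) = logit(Se) + logit(Sp) at cutpoint c for a binormal pair."""
    se = 1.0 - phi((c - mu1) / sigma)   # P(diseased >= c)
    sp = phi((c - mu0) / sigma)         # P(healthy < c)
    return math.log(se / (1.0 - se)) + math.log(sp / (1.0 - sp))

# Evaluate on a grid of cutpoints from -2.0 to 4.0 in steps of 0.1
grid = [i / 10 for i in range(-20, 41)]
values = [log_dor(c) for c in grid]
c_min = grid[values.index(min(values))]

print(f"log(DOR) is smallest at c = {c_min:.1f}")
print(f"log(DOR) at c = -2.0, 1.0, 4.0: "
      f"{values[0]:.2f}, {log_dor(1.0):.2f}, {values[-1]:.2f}")
```

In this configuration log(DOR) is lowest at the midpoint between the two means and rises toward both tails, so maximizing the DOR pushes the selected cutpoint to an extreme of the observed range.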
For the clinical practice of determining cutpoints, sample data were used in the current study to illustrate the practical application of the NCSS software in cutpoint selection. Software has been developed for cutpoint selection in clinical research, as described in this detailed review. The SPSS software does not offer this calculation directly. The R software offers these optimal cutpoints through its ROC analysis packages, but it may be more specialized and less familiar to clinicians. In our experience, a practitioner can use the NCSS software to obtain estimates of the optimal cutpoints using at least five methods in the ROC analysis: Youden index, Euclidean index, Product method, IU and DOR.
The present study provided a practical example and indicated how the optimal cutpoints can be calculated in clinical research. We have shown that in some scenarios, the four common methods for selecting optimal cutpoints can lead to identical results. However, inconsistent cutpoint selection is possible in other conditions, such as test results with skewed or bimodal distributions.
The results of the present study on the clinical example of biomarker data for the prediction of IBD showed that the four methods (Youden, Euclidean, Product of Se and Sp, and IU) gave a similar cutpoint for CRP, but the DOR gave a higher value. For ESR and MDA, however, the Youden index gave different results from the Euclidean, Product and IU methods. This inconsistency may depend in part on the underlying distributions of test results in the diseased and healthy populations, whose density functions we have presented graphically. A higher degree of skewness and heterogeneous variances may lead to greater inconsistency in the results. In our example, the extreme value of the DOR cutpoint can be explained by the convex shape of log(DOR) as a ratio criterion. This result is consistent with other findings in the selection of cutpoints [32]. Thus, extensive Monte Carlo simulation studies are needed to explore the conditions of the distribution of test results that may lead to inconsistent cutpoints across methods in the evaluation of clinical diagnostic tests. We had a small sample dataset, and all data were used for training. Our study is therefore limited by the lack of an external dataset for cross-validating the diagnostic performance of the optimal cutpoints calculated with the different methods, because the diagnostic performance of the selected cutpoints was calculated on the training dataset only.
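Where no external dataset is available, internal k-fold cross-validation is one way to gauge how optimistic the training-set Se and Sp of a selected cutpoint may be. The following Python sketch shows the idea for the Youden index only. It was not part of the analysis reported here; the function names and the synthetic data are hypothetical, and larger biomarker values are assumed to indicate disease.

```python
import numpy as np

def youden_cutpoint(healthy, diseased):
    """Cutpoint maximizing J = Se + Sp - 1, calling 'value >= c' positive."""
    cuts = np.unique(np.concatenate([healthy, diseased]))
    j = np.array([(diseased >= c).mean() + (healthy < c).mean() - 1 for c in cuts])
    return float(cuts[np.argmax(j)])

def cross_validated_youden(healthy, diseased, k=5, seed=0):
    """Select the cutpoint on k-1 folds, then evaluate Se and Sp on the held-out fold."""
    rng = np.random.default_rng(seed)
    h_folds = np.array_split(rng.permutation(np.asarray(healthy, dtype=float)), k)
    d_folds = np.array_split(rng.permutation(np.asarray(diseased, dtype=float)), k)
    rows = []
    for i in range(k):
        h_train = np.concatenate([f for j, f in enumerate(h_folds) if j != i])
        d_train = np.concatenate([f for j, f in enumerate(d_folds) if j != i])
        c = youden_cutpoint(h_train, d_train)
        se = float((d_folds[i] >= c).mean())   # held-out sensitivity
        sp = float((h_folds[i] < c).mean())    # held-out specificity
        rows.append((c, se, sp))
    return rows

# Synthetic example with 30 subjects per group (NOT the study data)
rng = np.random.default_rng(1)
healthy = rng.normal(5.0, 2.0, size=30)
diseased = rng.gamma(shape=2.0, scale=6.0, size=30)
for c, se, sp in cross_validated_youden(healthy, diseased):
    print(f"cutpoint={c:6.2f}  held-out Se={se:.2f}  Sp={sp:.2f}")
```

With 30 subjects per group each held-out fold contains only six subjects per group, so the fold-level estimates are coarse; the spread of the selected cutpoints across folds is often the more informative output.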
Conclusion
Overall, the four methods including Youden index, Euclidean index, Product, and IU can produce quite similar optimal cutpoints for binormal pairs with the same variance. The cutpoint determined with the Youden index may not agree with the other three methods in the case of skewed distributions while DOR may not produce valid informative cutpoints. Therefore, more extensive Monte Carlo simulation studies are needed to investigate the conditions of test result distributions that may lead to inconsistent results in clinical diagnostics.
Availability of data and materials
Data cannot be shared openly but are available on request from the corresponding author.
Abbreviations
ROC: Receiver operating characteristic
D: Diseased
ND: Non-diseased
Se: Sensitivity
Sp: Specificity
NPV: Negative predictive value
PPV: Positive predictive value
TPF: True positive fraction
TNF: True negative fraction
FNF: False negative fraction
FPF: False positive fraction
AUC: Area under the curve
IU: Index of union
DOR: Diagnostic odds ratio
C: Cost
LR+: Positive likelihood ratio
LR−: Negative likelihood ratio
NNM: Number needed to misdiagnose
pAUC: Partial area under the curve
IBD: Inflammatory bowel disease
CRP: C-reactive protein
MDA: Malondialdehyde
ESR: Erythrocyte sedimentation rate
References
1. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5(9):1315–6.
2. Eng J. Receiver operating characteristic analysis: utility, reality, covariates, and the future. Acad Radiol. 2013;20(7):795–7.
3. Hajian-Tilaki K. Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation. Caspian J Intern Med. 2013;4(2):627–35.
4. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32–5.
5. Unal I. Defining an optimal cutpoint value in ROC analysis: an alternative approach. Comput Math Methods Med. 2017;2017:3762651.
6. López-Ratón M, Rodríguez-Álvarez MX, Cadarso-Suárez C, Gude-Sampedro F. OptimalCutpoints: an R package for selecting optimal cutpoints in diagnostic tests. J Stat Softw. 2014;61(8):1–36.
7. Liu X. Classification accuracy and cut point selection. Stat Med. 2012;31(23):2676–86.
8. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003;56(11):1129–35.
9. Rota M, Antolini L. Finding the optimal cutpoint for Gaussian and Gamma distributed biomarkers. Comput Stat Data Anal. 2014;69:1–14.
10. Hanley JA, Hajian-Tilaki KO. Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: an update. Acad Radiol. 1997;4(1):49–58.
11. Habibzadeh F, Habibzadeh P, Yadollahie M. On determining the most appropriate test cutoff value: the case of tests with continuous results. Biochem Med (Zagreb). 2016;26(3):297–307.
12. Calì C, Longobardi M. Some mathematical properties of the ROC curve and their applications. Ricerche mat. 2015;64(2):391–402.
13. Lusted LB. Logical analysis in roentgen diagnosis. Radiology. 1960;74:178–93.
14. McNett M, Amato S, Olson DM. Sensitivity, specificity, and receiver operating characteristics: a primer for neuroscience nurses. J Neurosci Nurs. 2017;49(2):99–101.
15. Hajian-Tilaki K. The choice of methods in determining the optimal cutoff value for quantitative diagnostic test evaluation. Stat Methods Med Res. 2018;27(8):2374–83.
16. Pandey M, Jain AR. ROC Curve: Making way for correct diagnosis. 2016.
17. Bandos AI, Guo B, Gur D. Estimating the area under ROC curve when the fitted binormal curves demonstrate improper shape. Acad Radiol. 2017;24(2):209–19.
18. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.
19. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol. 1975;12(4):387–415.
20. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45.
21. Hajian-Tilaki KO, Hanley JA. Comparison of three methods for estimating the standard error of the area under the curve in ROC analysis of quantitative data. Acad Radiol. 2002;9(11):1278–85.
22. Hajian-Tilaki KO, Hanley JA, Joseph L, Collet JP. A comparison of parametric and nonparametric approaches to ROC analysis of quantitative diagnostic tests. Med Decis Making. 1997;17(1):94–102.
23. Nahm FS. Receiver operating characteristic curve: overview and practical use for clinicians. Korean J Anesthesiol. 2022;75(1):25–36.
24. Hintze DJL. NCSS Documentation [One ROC Curve and Cutoff Analysis, Chapter 546]. Kaysville: NCSS; 2007.
25. Hooper L, Abdelhamid A, Ali A, Bunn DK, Jennings A, John WG, et al. Diagnostic accuracy of calculated serum osmolarity to predict dehydration in older people: adding value to pathology laboratory reports. BMJ Open. 2015;5(10):e008846.
26. Perkins NJ, Schisterman EF. The inconsistency of “optimal” cutpoints obtained using two criteria based on the receiver operating characteristic curve. Am J Epidemiol. 2006;163(7):670–5.
27. Gerke O, Zapf A. Convergence behavior of optimal cutoff points derived from receiver operating characteristics curve analysis: a simulation study. Mathematics. 2022;10(22):4206.
28. Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology. 1996;201(3):745–50.
29. FORTRAN programs ROCFIT, CORROC2, LABROC1, LABROC4 and ROCPWR. University of Chicago. Chicago: Available from Metz CE, Department of Radiology; 1990.
30. Shiraishi J, Fukuoka D, Iha R, Inada H, Tanaka R, Hara T. Verification of modified receiver-operating characteristic software using simulated rating data. Radiol Phys Technol. 2018;11(4):406–14.
31. Moein S, Qujeq D, Vaghari-Tabari M, Kashifard M, Hajian-Tilaki K. Diagnostic accuracy of fecal calprotectin in assessing the severity of inflammatory bowel disease: From laboratory to clinic. Caspian J Intern Med. 2017;8(3):178–82.
32. Böhning D, Holling H, Patilea V. A limitation of the diagnostic-odds ratio in determining an optimal cutoff value for a continuous diagnostic test. Stat Methods Med Res. 2011;20(5):541–50.
Acknowledgements
We acknowledge the Deputy of Research and Technology of Babol University of Medical Sciences for their support.
Funding
Not applicable.
Author information
Contributions
M.H. contributed to the conception and design, critical review, data analysis, and drafting of the manuscript. K.H. contributed to the conception and design, critical literature review, data analysis, manuscript drafting, and supervision. All authors read and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
The retrospective clinical data used to illustrate the different methods of cutpoint selection conformed to the standards of the World Medical Association, as embodied in the Declaration of Helsinki. The related protocol was approved by the local ethics committee of Hormozgan University of Medical Sciences (ethical code: IR.HUMS.REC.94.182), and all participants gave written consent prior to participation in the study.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Hassanzad, M., Hajian-Tilaki, K. Methods of determining optimal cutpoint of diagnostic biomarkers with application of clinical data in ROC analysis: an update review. BMC Med Res Methodol 24, 84 (2024). https://doi.org/10.1186/s12874-024-02198-2