An assessment of the relationship between clinical utility and predictive ability measures and the impact of mean risk in the population
 Kevin McGeechan^{1},
 Petra Macaskill^{1, 2},
 Les Irwig^{1, 2} and
 Patrick MM Bossuyt^{3}
DOI: 10.1186/1471-2288-14-86
© McGeechan et al.; licensee BioMed Central Ltd. 2014
Received: 11 November 2013
Accepted: 26 June 2014
Published: 3 July 2014
Abstract
Background
Measures of clinical utility (net benefit and event free life years) have been recommended in the assessment of a new predictor in a risk prediction model. However, it is not clear how they relate to the measures of predictive ability and reclassification, such as the c-statistic and Net Reclassification Improvement (NRI), or how these measures are affected by differences in mean risk between populations when a fixed cutpoint to define high risk is assumed.
Methods
We examined the relationship between measures of clinical utility (net benefit, event free life years) and predictive ability (c-statistic, binary c-statistic, continuous NRI(>0), NRI with two cutpoints, binary NRI) using simulated data and the Framingham dataset.
Results
In the analysis of simulated data, the addition of a new predictor tended to result in more people being treated when the mean risk was less than the cutpoint, and fewer people being treated for mean risks beyond the cutpoint. The reclassification and clinical utility measures showed similar relationships with mean risk when the mean risk was less than the cutpoint and the baseline model was not strong. However, when the mean risk was greater than the cutpoint, or the baseline model was strong, the reclassification and clinical utility measures diverged in their relationship with mean risk.
Although the risk of CVD was lower for women than for men in the Framingham dataset, the measures of predictive ability, reclassification and clinical utility were all larger for women. The difference in these results was, in part, due to the larger hazard ratio associated with the additional risk predictor (systolic blood pressure) for women.
Conclusion
Measures such as the c-statistic and the measures of reclassification do not capture the consequences of implementing different prediction models. We do not recommend their use in evaluating which new predictors may be clinically useful in a particular population. We recommend that a measure such as net benefit or EFLY is calculated and, where appropriate, the measure is weighted to account for differences in the distribution of risks between the study population and the population in which the new predictors will be implemented.
Keywords
Biomarkers; Net reclassification improvement (NRI); Area under curve (AUC); Net benefit; Event free life years (EFLY); Risk assessment; Prediction
Background
Models that calculate the risk of disease are widely used to aid diagnosis and prognosis [1]. Examples of commonly used models include the Framingham Risk score for CVD and the Gail model for breast cancer [2, 3]. However, the predictions provided by these models are not perfect and ways to improve the predictions are frequently proposed. One such method is to include additional predictors in the model [4]. Whether the additional predictors provide better predictions and how this is evaluated has been the subject of numerous articles in recent years [5–8].
If a new predictor is to be added to a prediction model then the benefits of including this new predictor must outweigh the costs; the new predictor must demonstrate clinical utility. Measures of clinical utility that have been proposed include the net benefit and event free life years (EFLYs) [6, 9]. Several authors have suggested that such measures of clinical utility be calculated after the new predictor has demonstrated incremental predictive ability in terms of either an increase in the c-statistic, the continuous version of net reclassification improvement (NRI(>0)) or the categorical NRI [5, 10–12]. This staged approach implies the predictive ability results provide an indication of the likely clinical utility results.
The c-statistic has been criticised as being insensitive to the effect of important new predictors [13]. Therefore it is questionable whether such an insensitive measure is of use in determining which new predictors should then be assessed in terms of clinical utility. The NRI(>0) has been proposed as a better measure of discrimination than the c-statistic when comparing predictors, but how the NRI(>0) then relates to the clinical utility measures has not been examined [5]. If the measures of predictive ability do not correlate with the measures of clinical utility then it is doubtful whether they would be helpful in deciding which new predictors should be investigated further.
An additional concern is that these measures may behave differently as the mean risk of the population being studied changes. For example, the c-statistic is largely unaffected by the mean risk in the population whereas measures of reclassification may be affected by where the reclassification cutpoint is set in relation to the distribution of risk in the population [14]. The categorical NRI also implicitly weights the reclassification of cases and noncases by the prevalence in the sample population [15]. The impact of changing cutpoints on net benefit has been examined recently [16]; however, less attention has been paid to the situation where the cutpoint is fixed but the mean risk varies across the populations studied, a common situation when new cardiovascular risk predictors are assessed.
In the cardiovascular setting, the application of risk thresholds for treatment has been widely promoted for a number of years in guidelines across the world [17–20]. New predictors of cardiovascular disease have then been assessed using these fixed thresholds but in a wide variety of populations. For example, the Emerging Risk Factors Collaboration has brought together 104 prospective population-based studies across a number of countries, several of which are from North America [21]. The mean age of these North American cohort studies ranges from 54 to 78, and the percentage of males from 0% to 100% [6]. This indicates a range of mean risks, and differences in the distribution of risks, across studies in which the same threshold for treatment would be applied.
In this paper we examine how the measures of predictive ability (c-statistic, binary c-statistic, NRI(>0), NRI (with two cutpoints), binary NRI (at the upper cutpoint)) are related to the measures of clinical utility, net benefit and event free life years (EFLY), for assessing the effect of adding a new predictor to a model. We investigate how differences in the mean risk between populations affect these measures using simulated data and also using data from the Framingham Study, where the mean risk of CVD differs for men and women [22].
Methods
Measures of predictive ability
We calculated each measure at a common, fixed follow-up time of ten years, consistent with the UK guidelines that apply CVD risk prediction models [17].
The binary c-statistic at a given cutpoint is the average of sensitivity and specificity, $c_{binary} = \frac{1}{2}\left(\frac{TP}{n_{events}} + \frac{TN}{n_{nonevents}}\right)$, where the numbers of true positives (TP), true negatives (TN) and those with and without an event were estimated using the Kaplan-Meier estimates of the proportion surviving at ten years.
Each individual with an event before ten years is paired with every other person, irrespective of their event status. A pair is usable if their observed survival times differ, and the paired person had an event or their censoring time was greater than the survival time for the individual with the event. A usable pair is concordant if the predicted survival time is less for the member of the pair with the shorter observed survival time. A pair is tied if they have the same predicted survival time. People with events after ten years are considered censored at ten years [23].
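The pairing rules above can be sketched in code. This is a minimal illustration rather than the authors' implementation: the function name `harrell_c` and its arguments are hypothetical, a higher predicted risk is taken to mean a shorter predicted survival time, and tied predictions are counted as half concordant, which is a common convention that the text does not state.

```python
import numpy as np

def harrell_c(time, event, risk, horizon=10.0):
    """Concordance over usable pairs, with follow-up truncated at `horizon`.

    Assumptions (not from the source): higher `risk` means shorter predicted
    survival, and tied predictions count as half concordant.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    # Events after the horizon are treated as censored at the horizon.
    event = event & (time <= horizon)
    time = np.minimum(time, horizon)

    concordant = tied = usable = 0.0
    for i in np.flatnonzero(event):
        # Usable partners have a longer observed time, whether that time ends
        # in an event or in censoring. Partners with an earlier event are
        # counted when they are the index person, so no pair is counted twice.
        later = time > time[i]
        usable += later.sum()
        concordant += (risk[i] > risk[later]).sum()
        tied += (risk[i] == risk[later]).sum()

    if usable == 0:
        return float("nan")
    return (concordant + 0.5 * tied) / usable
```

The double loop is avoided by vectorising over partners, but the cost is still quadratic in the number of people, so this form suits illustration more than large datasets.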
Measures of reclassification
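NRI(>0) and NRI (with two cutpoints) measure the amount of reclassification that occurs when the new predictor is added to a model [24]. The proportions of events and nonevents correctly reclassified (reclassified up and down, respectively) are adjusted by the proportions of events and nonevents incorrectly reclassified.
With censored data, NRI(>0) is estimated as $\mathrm{NRI}(>0) = \frac{P(\mathrm{event} \mid U)\, n_U - P(\mathrm{event} \mid D)\, n_D}{n\, P(\mathrm{event})} + \frac{\{1 - P(\mathrm{event} \mid D)\}\, n_D - \{1 - P(\mathrm{event} \mid U)\}\, n_U}{n\, \{1 - P(\mathrm{event})\}}$, where n is the total number of people and the subscripts U and D indicate those reclassified up and down. The Kaplan-Meier estimates at ten years among all people, and those reclassified up or down, provide the probabilities (P).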
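As a hedged sketch of this calculation, the code below estimates the ten-year event probabilities with a plain Kaplan-Meier product and combines them into NRI(>0). The function names `km_risk` and `nri_continuous` are hypothetical, and the formula is the survival-data form of the continuous NRI as we understand it from the definitions of n, U, D and P given above.

```python
import numpy as np

def km_risk(time, event, horizon=10.0):
    """Kaplan-Meier estimate of the probability of an event by `horizon`."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    for t in np.unique(time[event & (time <= horizon)]):
        at_risk = (time >= t).sum()
        n_events = (event & (time == t)).sum()
        surv *= 1.0 - n_events / at_risk
    return 1.0 - surv

def nri_continuous(time, event, risk_old, risk_new, horizon=10.0):
    """NRI(>0) from Kaplan-Meier estimates among all, up and down groups.

    Assumes the overall event probability is strictly between 0 and 1.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk_old = np.asarray(risk_old, dtype=float)
    risk_new = np.asarray(risk_new, dtype=float)

    up = risk_new > risk_old      # reclassified up by the new model
    down = risk_new < risk_old    # reclassified down by the new model
    n, n_u, n_d = len(time), up.sum(), down.sum()

    p_all = km_risk(time, event, horizon)
    p_u = km_risk(time[up], event[up], horizon) if n_u else 0.0
    p_d = km_risk(time[down], event[down], horizon) if n_d else 0.0

    nri_events = (p_u * n_u - p_d * n_d) / (n * p_all)
    nri_nonevents = ((1 - p_d) * n_d - (1 - p_u) * n_u) / (n * (1 - p_all))
    return nri_events + nri_nonevents
```

With no censoring the Kaplan-Meier estimates reduce to simple proportions, so the result matches the usual uncensored definition and lies between -2 and 2.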
The categorical NRI is $\widehat{p}_{up,events} - \widehat{p}_{down,events} + \widehat{p}_{down,nonevents} - \widehat{p}_{up,nonevents}$, where $\widehat{p}_{\cdot,\cdot}$ is the proportion of events, or nonevents, that are reclassified up, or down. As the data are censored, the numbers of events and nonevents are estimated from the Kaplan-Meier estimates at ten years for each of the cells in the reclassification table [25].
Measures of clinical utility
Net benefit was calculated as $\mathrm{NB} = \frac{TP}{n} - \frac{FP}{n} \cdot \frac{p_t}{1 - p_t}$, where the numbers of true positives (TP) and false positives (FP) are estimated using the Kaplan-Meier estimates of the percentage surviving at ten years among those with calculated risks greater than the threshold probability; n is the total number of people and $p_t$ is the threshold that defines high risk.
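A small worked sketch of the net benefit calculation follows. It uses the standard decision-curve form of net benefit; the function name and arguments are hypothetical, and `risk_high` stands for the Kaplan-Meier estimate of the ten-year event probability among people classified as high risk.

```python
def net_benefit(n, n_high, risk_high, threshold):
    """Net benefit per person evaluated at a given risk threshold.

    n          total number of people evaluated
    n_high     number with calculated risk above the threshold
    risk_high  estimated ten-year event probability in the high-risk group
    threshold  risk threshold that defines high risk (p_t)
    """
    true_pos = n_high * risk_high            # high risk, event by ten years
    false_pos = n_high * (1.0 - risk_high)   # high risk, no event
    weight = threshold / (1.0 - threshold)   # cost of a false positive
    return true_pos / n - (false_pos / n) * weight

# Illustrative numbers only: 300 of 1000 people are high risk at a 20%
# threshold and 30% of them have an event within ten years.
nb_per_1000 = 1000 * net_benefit(n=1000, n_high=300, risk_high=0.3, threshold=0.2)
```

At a 20% threshold each false positive is weighted at one quarter of a true positive, so in this illustration the 90 true positives are offset by 210 false positives worth 52.5, giving 37.5 per 1000 evaluated.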
where P is the proportion of those evaluated who are treated, B(T) is the benefit in terms of event free life years gained among those treated and C(T) is the costs for those treated, measured also relative to event free life years. Briefly, individuals with a calculated risk above a treatment threshold are assumed to have their risk reduced by treatment. This reduction in risk leads to a reduction in events and an increase in the total number of event free life years for the population within a given time period (here 10 years). Each gain in EFLY is assumed to have a monetary value. However, there are costs involved in treatment particularly for those who would not have experienced an event within the time period. Assuming a particular cost per EFLY, Rapsomaniki’s method deducts these costs in terms of EFLYs from the benefit obtained from the gain in EFLY of those treated.
As in Rapsomaniki’s paper, we set the reduction in risk due to treatment at 20%, which was based on results from a meta-analysis [27]. The cost of treatment, in terms of EFLYs, is calculated assuming that the threshold for treatment is the optimal cutpoint in that benefits match costs at this point. Rapsomaniki and colleagues put a monetary value on this cost by relating it to the cost of one EFLY (£20,000) as proposed by the National Institute for Health and Clinical Excellence (NICE) [28].
Our main analysis focused on the upper cutpoint of 20% risk at ten years, which is used in the UK CVD prevention guidelines [17]. We also repeated our analyses using upper cutpoints of 10% and 50%. The lower cutpoints used in the calculation of the categorical NRI for these analyses were arbitrarily set at 5% and 25%, respectively.
Simulated data
Survival times were generated as $T = -\log(U) / \{\lambda \exp(\beta_1 x_1 + \beta_2 x_2)\}$ [29], where U is a uniform random number between 0 and 1 and $\lambda$ is the baseline hazard rate, which was varied to produce datasets with mean risks at ten years distributed between 0% and 100%. The variables $x_1$ and $x_2$ each had standard Normal distributions and were independent of each other. We carried out separate series of simulations by varying the coefficient $\beta_1$ to give hazard ratios per one standard deviation increase of 1.5 (weak baseline model), 3 (medium baseline model) and 6 (strong baseline model), and by varying the coefficient $\beta_2$ of the second covariate to produce hazard ratios of 1.2 (weak predictor), 2 (medium predictor) and 3 (strong predictor). Note that the hazard ratios derived from the Framingham dataset for the traditional CVD risk factors of age, SBP and total cholesterol ranged from 1.25 to 2.04 per one standard deviation increase (Additional file 1: Table S1). Since we have assumed a constant hazard ratio across simulated datasets that have different mean risks, the odds ratio calculated for events occurring before ten years will not be constant across the datasets [30]. The estimated odds ratio is similar to the hazard ratio if the mean risk is small, but the odds ratio increasingly overestimates the hazard ratio as the mean risk increases.
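The relationship between the hazard ratio and the odds ratio can be illustrated with a short calculation. This sketch compares two individuals who differ by one unit of a predictor under a proportional hazards model, which is a simplification of the population-level odds ratio a fitted logistic model would give; the function name is hypothetical.

```python
def implied_odds_ratio(baseline_risk, hazard_ratio):
    """Odds ratio for an event by a fixed time implied by a hazard ratio.

    Under proportional hazards, survival in the exposed person is the
    baseline survival raised to the power of the hazard ratio.
    """
    exposed_risk = 1.0 - (1.0 - baseline_risk) ** hazard_ratio
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    exposed_odds = exposed_risk / (1.0 - exposed_risk)
    return exposed_odds / baseline_odds

# With a hazard ratio of 2, the implied odds ratio is close to 2 when the
# baseline risk is 1% and rises to 3 when the baseline risk is 50%.
low_risk_or = implied_odds_ratio(0.01, 2.0)
high_risk_or = implied_odds_ratio(0.50, 2.0)
```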
We also generated censoring times that followed an exponential distribution with a 10% risk of being censored at ten years. If the censoring time was less than the survival time the observation was considered censored at the censoring time. The proportion censored decreased from 10% to approximately 2.5% as the mean risk increased. Each simulation dataset contained 10,000 observations. We simulated 1,000 datasets for each combination of baseline model and additional predictor.
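One way to generate a single simulated dataset along these lines is sketched below. It is an illustration under stated assumptions, not the authors' code: the function name, the returned dictionary and the random seed are hypothetical, and survival times come from inverse-transform sampling for a constant baseline hazard.

```python
import numpy as np

def simulate_dataset(lam, hr1, hr2, n=10_000, horizon=10.0, seed=0):
    """Simulate survival data with two independent standard Normal covariates.

    lam  baseline hazard rate (varied to change the mean risk)
    hr1  hazard ratio per SD for the baseline-model covariate x1
    hr2  hazard ratio per SD for the new predictor x2
    """
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    linear_pred = np.log(hr1) * x1 + np.log(hr2) * x2

    # Uniform draws on (0, 1] so that log(u) is always finite.
    u = 1.0 - rng.uniform(size=n)
    surv_time = -np.log(u) / (lam * np.exp(linear_pred))

    # Exponential censoring with a 10% chance of censoring by the horizon.
    cens_rate = -np.log(0.9) / horizon
    cens_time = rng.exponential(scale=1.0 / cens_rate, size=n)

    return {
        "x1": x1,
        "x2": x2,
        "time": np.minimum(surv_time, cens_time),
        "event": surv_time <= cens_time,
        # True risk at the horizon, known here because the model is simulated.
        "true_risk": 1.0 - np.exp(-lam * horizon * np.exp(linear_pred)),
    }

# Example: medium baseline model (HR 3) and medium new predictor (HR 2).
data = simulate_dataset(lam=0.01, hr1=3.0, hr2=2.0)
mean_risk = data["true_risk"].mean()
```

Repeating this over a grid of `lam` values and 1,000 seeds per combination would reproduce the overall design described above.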
For each of the simulated datasets the measures of predictive ability and clinical utility were calculated comparing models without and with the second variable. We then plotted the proportion of people classified as high risk (above the upper cutpoint) by each of the two models against the mean calculated risk of an event at ten years for that dataset. The mean risk was calculated from the model containing both covariates. We plotted the measures of predictive ability and clinical utility against the mean risk and applied a cubic spline smoother.
Empirical data
We obtained data from the Framingham Heart Study on the people included in the analysis that resulted in the 2008 Framingham risk equation [22]. At the initial visit, blood pressure, serum total cholesterol, HDL, smoking status, diabetes status and use of antihypertensive medication were recorded using standard methods. All study participants were free of prevalent CVD at the initial visit and were under continuous surveillance for the development of cardiovascular events and death. Maximum follow-up was 12 years.
We fitted two Cox proportional hazards models to the Framingham dataset consisting of the variables that were included in the proposed general CVD risk prediction model [22]. The first model included age, total cholesterol, high density lipoprotein, smoking status, diabetes status and use of antihypertensive medication. The second model included all of these variables, plus systolic blood pressure (SBP). We carried out separate analyses for men and women as the risk of CVD differs between men and women.
We compared the models without SBP and with SBP using the following measures: change in c-statistic, binary c-statistic, NRI(>0), NRI(10%, 20%), binary NRI (20%), net benefit and the event free life years (EFLY). Ninety-five percent confidence intervals were calculated for these measures using 2000 bootstrap samples. We used a treatment cutpoint of 20% for the calculations of the measures net benefit and EFLYs (and 10%, 20% for the NRI(10%, 20%)) to match current cardiovascular disease (CVD) prevention guidelines [17]. We assumed that for treated people their risk of CVD would be reduced by 20% based on the meta-analysis reported in the paper by Rapsomaniki that introduced the EFLY [6].
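A bootstrap interval of this kind can be sketched as follows. The text does not say which bootstrap interval was used, so the percentile method here is an assumption, and `bootstrap_ci` and its arguments are hypothetical names. The `statistic` argument is any function that maps a resampled dataset (rows are people) to a single number, for example the difference in net benefit between two models.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic(data)`.

    People (rows of `data`) are resampled with replacement.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample row indices with replacement
        estimates[b] = statistic(data[idx])
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(estimates, [alpha, 1.0 - alpha])
    return float(lower), float(upper)

# Illustrative use with a simple statistic on made-up data.
lower, upper = bootstrap_ci(np.arange(100.0), np.mean, n_boot=500)
```

For the model comparisons described above, the models would normally be refitted within each bootstrap sample so that the interval reflects the uncertainty in the fitted coefficients as well.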
Results
Simulated data
The difference in c-binary (which is the average of the differences in sensitivity and specificity) peaked at two points before approaching zero as the mean risk increased. The first peak corresponded to the maximum difference in sensitivity, which happened at approximately half of the cutpoint, and the second corresponded to the maximum difference in specificity, which happened above the cutpoint.
Empirical data
There were 3969 men and 4522 women included in the Framingham dataset. The mean calculated risk of CVD at 10 years was 15.6% for men and 8.2% for women.
Hazard ratios (95% CI) for the addition of systolic blood pressure to models predicting CVD for men and women in the Framingham Study

| Predictor | Men: base model | Men: base model + systolic blood pressure | Women: base model | Women: base model + systolic blood pressure |
|---|---|---|---|---|
| Age (per 10 year increase) | 1.94 (1.80, 2.01) | 1.80 (1.67, 1.95) | 1.83 (1.66, 2.03) | 1.56 (1.40, 1.74) |
| Total cholesterol (per 40 increase) | 1.25 (1.17, 1.35) | 1.23 (1.15, 1.32) | 1.26 (1.16, 1.37) | 1.23 (1.13, 1.33) |
| HDL (per 10 increase) | 0.83 (0.77, 0.88) | 0.82 (0.77, 0.87) | 0.88 (0.82, 0.94) | 0.88 (0.83, 0.94) |
| Hypertensive medication | 1.72 (1.41, 2.08) | 1.45 (1.18, 1.77) | 1.76 (1.41, 2.19) | 1.31 (1.04, 1.65) |
| Current smoker | 1.91 (1.64, 2.22) | 1.93 (1.66, 2.24) | 1.71 (1.40, 2.07) | 1.72 (1.42, 2.09) |
| Diabetes | 1.89 (1.53, 2.34) | 1.77 (1.42, 2.19) | 2.14 (1.60, 2.86) | 2.07 (1.55, 2.77) |
| Systolic blood pressure (per 20 mmHg increase) | | 1.30 (1.20, 1.41) | | 1.48 (1.34, 1.62) |
Change in measures with addition of systolic blood pressure to models predicting cardiovascular disease for men and women

| Measure | Men: base model | Men: base model + SBP | Men: difference (95% CI) | Women: base model | Women: base model + SBP | Women: difference (95% CI) |
|---|---|---|---|---|---|---|
| Sensitivity | 0.575 | 0.585 | 0.010 (0.17, 0.033) | 0.280 | 0.344 | 0.064 (0.006, 0.099) |
| Specificity | 0.729 | 0.734 | 0.005 (0.003, 0.016) | 0.921 | 0.909 | -0.013 (-0.020, -0.003) |
| c-binary | 0.652 | 0.655 | 0.008 (005, 0.018) | 0.601 | 0.626 | 0.025 (0.000, 0.041) |
| c-statistic | 0.751 | 0.758 | 0.007 (0.003, 0.012) | 0.766 | 0.782 | 0.016 (0.009, 0.024) |
| NRI binary* | | | 0.015 (0.009, 0.036) | | | 0.051 (0.000, 0.082) |
| NRI continuous* | | | 0.170 (0.071, 0.267) | | | 0.306 (0.176, 0.419) |
| NRI categorical* | | | 0.028 (0.009, 0.063) | | | 0.091 (0.010, 0.129) |
| Net benefit (per 1000 evaluated) | 44.0 | 47.5 | 3.5 (1.6, 7.7) | 8.7 | 12.1 | 3.3 (1.3, 6.4) |
| Event free life years (per 1000 evaluated) | 32.1 | 34.1 | 2.0 (1.8, 5.4) | 3.8 | 7.5 | 3.6 (0.4, 6.3) |
Discussion
We have described how the measures of predictive ability, reclassification and clinical utility used to assess a new predictor in a model depend upon the mean risk of the population. We have also demonstrated that the reclassification measures exhibit a different relationship with the mean risk than the clinical utility measures. The continuous NRI increases with increasing mean risk and the categorical NRI with two cutpoints often peaks at two points, whereas the net benefit and EFLY peak once close to the cutpoint and then generally decrease to zero as the mean risk increases.
In the Framingham Study the mean risk of CVD was higher for men than for women, and also closer to the upper cutpoint of 20%. Based on this, we may have expected the measures of predictive ability, reclassification and clinical utility to be higher among men. However the hazard ratio for systolic blood pressure when it was added to the model was higher for women compared to men, and this compensated for the lower mean risk among women. In a recent review of several new predictors of cardiovascular disease, Paynter and colleagues have also highlighted that results may differ between men and women due to differences in effect sizes of new predictors as well as the strength of the baseline model and the mean risk in the study sample [31].
In our simulations we observed that as the mean risk increased the NRI(>0), and the change in the c-statistic, also increased. In the paper that introduced the NRI(>0), Pencina suggested that one of the benefits of this measure was that it was not affected by the event rates in the population [24]. Our simulations, where we assumed a constant hazard ratio, indicate that the NRI(>0) increases as the event rate (the mean risk) increases for event rates above the cutpoint. The NRI(>0), as with the change in the c-statistic, is unaffected by event rates only if the odds ratio does not vary. However, as we have demonstrated, if the hazard ratio is assumed to be the same in populations with different event rates (a common assumption in cohort studies of cardiovascular outcomes) then the NRI(>0) will increase with increasing event rate.
In our simulations, when the mean risk in the population was less than the cutpoint the measures of reclassification and clinical utility were generally consistent with each other and increased as the mean risk increased. However, beyond this cutpoint the measures diverged. The reclassification measures continued to increase while the clinical utility measures decreased, although the NRI binary and NRI(with two cutpoints) did eventually decrease. Similar patterns were also observed by Van Calster and others when they varied the cutpoint and assumed a fixed mean risk; as the cutpoint moved away from mean risk the reclassification measures provided a more optimistic view of the new predictor compared to that provided by the difference in net benefit [16].
The clinical utility measures, difference in EFLY and difference in Net Benefit, achieved a maximum value at approximately the point where the threshold for treatment equaled the mean risk in the population, as expected [32]. However, we observed a divergence in the clinical utility measures in our simulations as the mean risk increased. This is attributable to differences between the two measures in terms of how benefits and costs are counted and the weights given to benefits and costs in populations with different mean risks.
When a new predictor is added to a model, the difference in EFLY is measured in terms of event free life years. An event free life year gained has the same value whether it occurs in a high risk or low risk population. In contrast, the difference in Net Benefit is measured in units of true positives, adjusted for false positives, with the weighting of false positives relative to true positives determined by the cutpoint defining high risk. However, the actual value of a true positive will differ in populations with different mean risks since the number of event free life years gained will be greater for an individual from a high risk population compared to a low risk population. Also, a false positive will have a greater cost in a low risk population than a high risk population as the survival time, and hence, treatment time, will be greater.
Although there are issues in using the Net Benefit when accounting for costs and benefits over a specific time period, there are also issues in the calculation of costs and benefits for the EFLY. Possible heterogeneity in treatment effects across patient subgroups is not accounted for in the EFLY. Also, the calculation of the EFLY assumes that the chosen cutpoint is the ‘optimal’ cutpoint in that costs equal benefits at this point; the cost of treatment, in terms of event free life years, is then calculated based on this assumption. Rapsomaniki and colleagues acknowledge that many factors, other than the costs and benefits they account for in their EFLY calculations, are considered when a particular cutpoint is chosen [6]. However, their assumption avoids the problem of an irrational choice of cutpoint resulting in a poorer model being favoured [6].
In previous papers the relationship between choice of cutpoint and the measures of reclassification and the difference in Net Benefit has been described when the mean risk in the population is fixed [14, 16, 33]. We observed similar results when the mean risk in the population varies but the cutpoint is fixed. The scenario we have described is the one more commonly encountered in the evaluation of new predictors of cardiovascular events. For example, the Emerging Risk Factors Collaboration (ERFC) brings together several cohort studies from the same country which have different mean risks but where the same guidelines and cutpoints for defining high risk would apply. As each of the measures we have examined is in some way affected by the mean risk in the study population, this must be taken into account when comparisons are made between different studies whose mean risk varies, or when the mean risk in the study population differs from the population in which a new predictor will ultimately be implemented.
A number of methods have been proposed to allow for these differences. Where the study data arise from a matched case-control study, Pepe has proposed a method for calculating an adjusted c-statistic that takes into account the greater similarity in risk between cases and controls that arises from matching [34]. The ERFC applied age-sex specific measures of reclassification observed in their study population to the standard European population to estimate the amount of reclassification that would occur in this standard population [35, 36]. However, this relies upon having a large enough study population to provide reliable estimates of reclassification in each age-sex stratum. If the data arise from a case-control study, Rousson suggests reweighting the proportions of cases and controls to match the proportions found in the parent population [37].
Conclusion
There have been a number of recent recommendations regarding which measures of predictive ability should be reported [5, 10, 11]. Measures such as the c-statistic and the measures of reclassification do not capture the consequences of implementing a prediction model. Hence, we do not recommend their use in evaluating which new predictor may prove to be clinically useful in a particular population as these measures assess model fit rather than clinical utility. We recommend that a measure such as net benefit is calculated and the results adjusted to allow for the difference in the mean risk between the study population and the population in which the new predictor will be implemented. If benefits and costs are to be measured over a specific time period a measure such as the EFLY should be used which accounts for the different costs and benefits that would be accrued over time in populations with different mean risks.
Abbreviations
NRI: Net reclassification improvement
ROC: Receiver operator characteristic
AUC: Area under the curve
EFLY: Event free life years
SBP: Systolic blood pressure
CVD: Cardiovascular disease
Declarations
Acknowledgements
This work was partly funded by the National Health and Medical Research Council (NHMRC) program grant 633003 to the Screening and Test Evaluation Program.
References
Moons KG, Royston P, Vergouwe Y, Grobbee DE, Altman DG: Prognosis and prognostic research: what, why, and how?. BMJ. 2009, 338: b375.
Kannel WB, D'Agostino RB, Sullivan L, Wilson PW: Concept and usefulness of cardiovascular risk profiles. Am Heart J. 2004, 148 (1): 16-26.
Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, Mulvihill JJ: Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst. 1989, 81 (24): 1879-1886.
Helfand M, Buckley DI, Freeman M, Fu R, Rogers K, Fleming C, Humphrey LL: Emerging risk factors for coronary heart disease: a summary of systematic reviews conducted for the U.S. Preventive Services Task Force. Ann Intern Med. 2009, 151 (7): 496-507.
Pencina MJ, D'Agostino RB, Pencina KM, Janssens AC, Greenland P: Interpreting incremental value of markers added to risk prediction models. Am J Epidemiol. 2012, 176 (6): 473-481.
Rapsomaniki E, White IR, Wood AM, Thompson SG: A framework for quantifying net benefits of alternative prognostic models. Stat Med. 2012, 31 (2): 114-130.
Cook NR, Paynter NP: Performance of reclassification statistics in comparing risk prediction models. Biom J. 2011, 53 (2): 237-258.
Pepe MS: Problems with risk reclassification methods for evaluating prediction models. Am J Epidemiol. 2011, 173 (11): 1327-1335.
Vickers AJ, Elkin EB: Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006, 26 (6): 565-574.
Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, Go AS, Harrell FE, Hong Y, Howard BV, Howard VJ, Hsue PY, Kramer CM, McConnell JP, Normand SL, O'Donnell CJ, Smith SC, Wilson PW: Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. Circulation. 2009, 119 (17): 2408-2416.
Steyerberg EW, Pencina MJ, Lingsma HF, Kattan MW, Vickers AJ, Van Calster B: Assessing the incremental value of diagnostic and prognostic markers: a review and illustration. Eur J Clin Invest. 2012, 42 (2): 216-228.
Leening MJ, Vedder MM, Witteman JC, Pencina MJ, Steyerberg EW: Net reclassification improvement: computation, interpretation, and controversies: a literature review and clinician's guide. Ann Intern Med. 2014, 160 (2): 122-131.
Cook NR: Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007, 115 (7): 928-935.
Mihaescu R, van Zitteren M, van Hoek M, Sijbrands EJ, Uitterlinden AG, Witteman JC, Hofman A, Hunink MG, van Duijn CM, Janssens AC: Improvement of risk prediction by genomic profiling: reclassification measures versus the area under the receiver operating characteristic curve. Am J Epidemiol. 2010, 172 (3): 353-361.
Greenland S: The need for reorientation toward cost-effective prediction: comments on 'evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond' by M. J. Pencina et al., Statistics in Medicine (DOI: 10.1002/sim.2929). Stat Med. 2008, 27 (2): 199-206.
Van Calster B, Steyerberg EW, D'Agostino RB, Pencina MJ: Sensitivity and specificity can change in opposite directions when new predictive markers are added to risk models. Med Decis Making. 2014, 34 (4): 513-522.
 JBS 2: Joint British Societies' guidelines on prevention of cardiovascular disease in clinical practice. Heart. 2005, 91 (Suppl 5): v1v52.Google Scholar
 National Heart Foundation of New Zealand, Stroke Foundation of New Zealand, New Zealand Ministry of Health, New Zealand Guidelines Group: The assessment and management of cardiovascular risk. 2003, Wellington, NZ: New Zealand Guidelines Group, 190Google Scholar
 National Cholesterol Education Program (U.S.). Expert Panel on Detection Evaluation and Treatment of High Blood Cholesterol in Adults: Third report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (adult treatment panel III): final report. NIH publication; no. 02–52152002. 2002, Bethesda, Md: National Cholesterol Education Program, National Heart, Lung, and Blood Institute, National Institutes of Health. 1 v. (various pagings)Google Scholar
 Graham I, Atar D, BorchJohnsen K, Boysen G, Burell G, Cifkova R, Dallongeville J, De Backer G, Ebrahim S, Gjelsvik B, HerrmannLingen C, Hoes A, Humphries S, Knapton M, Perk J, Priori SG, Pyorala K, Reiner Z, Ruilope L, SansMenendez S, Op Reimer WS, Weissberg P, Wood D, Yarnell J, Zamorano JL: European guidelines on cardiovascular disease prevention in clinical practice: executive summary. Atherosclerosis. 2007, 194 (1): 145.View ArticlePubMedGoogle Scholar
 Danesh J, Erqou S, Walker M, Thompson SG, Tipping R, Ford C, Pressel S, Walldius G, Jungner I, Folsom AR, Chambless LE, Knuiman M, Whincup PH, Wannamethee SG, Morris RW, Willeit J, Kiechl S, Santer P, Mayr A, Wald N, Ebrahim S, Lawlor DA, Yarnell JW, Gallacher J, Casiglia E, Tikhonoff V, Nietert PJ, Sutherland SE, Bachman DL, Keil JE: The emerging risk factors collaboration: analysis of individual data on lipid, inflammatory and other markers in over 1.1 million participants in 104 prospective studies of cardiovascular diseases. Eur J Epidemiol. 2007, 22 (12): 839869.View ArticlePubMedGoogle Scholar
 D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, Kannel WB: General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008, 117 (6): 743753.View ArticlePubMedGoogle Scholar
 Harrell FE, Lee KL, Mark DB: Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996, 15 (4): 361387.View ArticlePubMedGoogle Scholar
 Pencina MJ, D'Agostino RB, Steyerberg EW: Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med. 2011, 30 (1): 11-21.
 Steyerberg EW, Pencina MJ: Reclassification calculations for persons with incomplete follow-up. Ann Intern Med. 2010, 152 (3): 195-196. author reply 196–7
 Vickers AJ, Cronin AM, Elkin EB, Gonen M: Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Med Inform Decis Mak. 2008, 8: 53
 Baigent C, Keech A, Kearney PM, Blackwell L, Buck G, Pollicino C, Kirby A, Sourjina T, Peto R, Collins R, Simes R: Efficacy and safety of cholesterol-lowering treatment: prospective meta-analysis of data from 90,056 participants in 14 randomised trials of statins. Lancet. 2005, 366 (9493): 1267-1278.
 National Institute for Health and Clinical Excellence: Social Value Judgements: Principles for the Development of NICE Guidance, 2nd edition. London: National Institute for Health and Clinical Excellence. 2008, Available from: http://www.nice.org.uk/Media/Default/About/whatwedo/Researchanddevelopment/SocialValueJudgementsprinciplesforthedevelopmentofNICEguidance.pdf. Accessed 8 July 2014
 Bender R, Augustin T, Blettner M: Generating survival times to simulate Cox proportional hazards models. Stat Med. 2005, 24 (11): 1713-1723.
 Rothman KJ, Greenland S, Lash TL: Modern epidemiology. 2008, Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins, 7583
 Paynter NP, Everett BM, Cook NR: Cardiovascular disease risk prediction in women: is there a role for novel biomarkers? Clin Chem. 2014, 60 (1): 88-97.
 Baker SG, Cook NR, Vickers A, Kramer BS: Using relative utility curves to evaluate risk prediction. J R Stat Soc Ser A Stat Soc. 2009, 172 (4): 729-748.
 Van Calster B, Vickers AJ, Pencina MJ, Baker SG, Timmerman D, Steyerberg EW: Evaluation of markers and risk prediction models: overview of relationships between NRI and decision-analytic measures. Med Decis Making. 2013, 33 (4): 490-501.
 Pepe MS, Fan J, Seymour CW: Estimating the receiver operating characteristic curve in studies that match controls to cases on covariates. Acad Radiol. 2013, 20 (7): 863-873.
 Di Angelantonio E, Gao P, Pennells L, Kaptoge S, Caslake M, Thompson A, Butterworth AS, Sarwar N, Wormser D, Saleheen D, Ballantyne CM, Psaty BM, Sundstrom J, Ridker PM, Nagel D, Gillum RF, Ford I, Ducimetiere P, Kiechl S, Koenig W, Dullaart RP, Assmann G, D'Agostino RB, Dagenais GR, Cooper JA, Kromhout D, Onat A, Tipping RW, Gomez-de-la-Camara A, Rosengren A: Lipid-related markers and cardiovascular disease prediction. JAMA. 2012, 307 (23): 2499-2506.
 Kaptoge S, Di Angelantonio E, Pennells L, Wood AM, White IR, Gao P, Walker M, Thompson A, Sarwar N, Caslake M, Butterworth AS, Amouyel P, Assmann G, Bakker SJ, Barr EL, Barrett-Connor E, Benjamin EJ, Bjorkelund C, Brenner H, Brunner E, Clarke R, Cooper JA, Cremer P, Cushman M, Dagenais GR, D'Agostino RB, Dankner R, Davey-Smith G, Deeg D, Dekker JM: C-reactive protein, fibrinogen, and cardiovascular disease prediction. N Engl J Med. 2012, 367 (14): 1310-1320.
 Rousson V, Zumbrunn T: Decision curve analysis revisited: overall net benefit, relationships to ROC curve analysis, and application to case–control studies. BMC Med Inform Decis Mak. 2011, 11: 45
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/14712288/14/86/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.