Table 1 Summary of the performance measures

From: Review and evaluation of performance measures for survival prediction models in external validation settings

| Type of measure | Measure | Characteristics | Range and interpretation | Software |
|---|---|---|---|---|
| Overall performance | R²_BS | Assesses the relative gain in predictive accuracy at a specific time point, quantified using a squared error loss function. | Range: 0 to 1. Interpretation: % gain in predictive accuracy at a single time point relative to the null model. | Available in SAS and R; easy to implement in other software. |
| | R²_IBS | Same approach as R²_BS, but summarised over a range of time points. | Range: same as R²_BS. Interpretation: % gain in predictive accuracy over a range of time points relative to the null model. | Available in SAS and R; easy to implement in other software. |
| | R²_SH | Assesses the relative gain in predictive accuracy, quantified using an absolute error loss function. Not robust to model mis-specification. | Same as R²_IBS. | Available in SAS and R; easy to implement in other software. |
| | R²_S | Modified version of R²_SH that is robust to model mis-specification. | Same as R²_IBS. | Available in SAS and R; easy to implement in other software. |
| | R²_PM | Measures the variation in the outcome explained by the covariates in the model. Assumes that the model is correctly specified. Requires re-calibration in the validation data. | Range: 0 to 1. Interpretation: % of variation explained by the model. | Easy to implement in any software. |
| | R²_D | Measures the relative gain in prognostic separation, quantified by the D statistic. Assumes that the prognostic index (PI) is normally distributed. | Range: 0 to 1. Interpretation: % of prognostic separation explained by the model. | Available in Stata; easy to implement in other software. |
| Discrimination | C_H | Rank order statistic based on usable pairs, i.e. pairs in which the shorter time corresponds to an event. | Range: 0.5 to 1. Interpretation: probability of correct ordering for a randomly selected pair of subjects. | Available in R and Stata; easy to implement in other software. |
| | C_U | Rank order statistic based on usable pairs. Inverse probability weighting is used to compensate for censoring. | Same as C_H. | Available in R; easy to implement in other software. |
| | C_GH | Rank order statistic based on all patient pairs. Assumes that the Cox PH model is correctly specified. Requires re-calibration in the validation data. | Same as C_H. | Available in R and Stata; easy to implement in other software. |
| | D | Quantifies the observed separation between low and high risk groups. Assumes that the PI is normally distributed. | Range: 0 to ∞. Interpretation: log hazard ratio between two equal-sized prognostic groups formed by dichotomising the PI at its median. | Available in Stata; easy to implement in other software. |
| Calibration | Cal Slope | Regression slope on the PI; assesses the agreement between observed and predicted survival. | Range: −∞ to ∞. Interpretation: a value of 1 suggests perfect calibration; a value much lower than 1 suggests overfitting. | Easy to implement in any software. |
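
To make the "usable pairs" idea behind the C_H row concrete, here is a minimal Python sketch of a Harrell-type concordance calculation. It is an illustration under stated assumptions, not the implementation used in the SAS, R or Stata routines the table mentions. The function name `harrell_c` and the arguments `times`, `events` and `risk_scores` are hypothetical. The sketch assumes that a higher risk score means shorter expected survival, skips pairs with tied observed times, and counts tied risk scores as half concordant.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Share of usable pairs that the risk score orders correctly.

    A pair is usable when the subject with the shorter observed time
    had an event (events[i] is truthy); otherwise the true ordering of
    the two survival times is unknown because of censoring.
    Assumption: a higher risk score predicts a shorter survival time.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): ignore tied times
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not usable
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed first: correct ordering
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores counted as half (assumption)
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data: four subjects, the third one is censored at time 6.
c = harrell_c(times=[2, 4, 6, 8], events=[1, 1, 0, 1],
              risk_scores=[0.9, 0.7, 0.8, 0.1])
print(c)
```

In the toy data, five of the six pairs are usable (the pair whose shorter time is the censored observation is dropped), and four of those five are ordered correctly by the risk score.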