Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction

Table 4 Results of the permutation study

Colon	A	B	C	D	E	F
KNN	0.33	0.36	0.37	0.38	0.41	0.45
LDA	0.40	-	0.43	0.43	-	0.46
FDA	0.42	-	0.44	0.47	-	0.48
DLDA	0.36	-	0.41	0.42	-	0.44
PLSLDA	0.34	0.35	0.37	0.37	0.42	0.43
NNET	0.34	-	0.35	0.35	-	0.36
RF	0.40	0.40	-	-	-	0.42
SVM	0.37	-	-	-	-	0.37
PAM	0.36	-	-	-	-	0.36
L₂	0.44	-	-	-	-	0.44
All	0.31	0.32	0.33	0.33	0.34	0.43
Prostate	A	B	C	D	E	F
KNN	0.43	0.45	0.45	0.47	0.50	0.52
LDA	0.46	-	0.47	0.50	-	0.51
FDA	0.45	-	0.47	0.49	-	0.49
DLDA	0.46	-	0.49	0.49	-	0.51
PLSLDA	0.44	0.46	0.47	0.49	0.51	0.52
NNET	0.46	-	0.49	0.47	-	0.52
RF	0.52	0.54	-	-	-	0.54
SVM	0.57	-	-	-	-	0.57
PAM	0.54	-	-	-	-	0.54
L₂	0.52	-	-	-	-	0.52
All	0.41	0.42	0.43	0.44	0.46	0.52

Colon data set [17] and prostate data set [18], with variable selection (if any) based on the t-statistic.
Approach A: Minimal error rate over the different tuning parameter values (k = 1, 3, 5 for KNN, ncomp = 2, 3 for PLSLDA, mtry = , , , for RF), different numbers of genes and different gene selection methods (median over the 20 runs).
Approach B: Minimal error rate over the different numbers of genes and different gene selection methods (median over the 20 runs).
Approach C: Minimal error rate over the different tuning parameter values (k = 1, 3, 5 for KNN, ncomp = 2, 3 for PLSLDA, mtry = , , , for RF) and different gene selection methods (median over the 20 runs).
Approach D: Minimal error rate over the different tuning parameter values (k = 1, 3, 5 for KNN, ncomp = 2, 3 for PLSLDA, mtry = , , , for RF) and different numbers of genes (median over the 20 runs).
Approach E: Minimal error rate over the different tuning parameter values (k = 1, 3, 5 for KNN, ncomp = 2, 3 for PLSLDA, mtry = , , , for RF) (median over the 20 runs).
Approach F: Median of all 124 × 20 calculated error rates.
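The difference between an optimised summary (Approach A) and the unoptimised one (Approach F) can be illustrated with a small simulation. The sketch below is not the code used in the study. It assumes that "minimal error rate (median over the 20 runs)" means taking the minimum over all variants within each run and then the median of those minima. The error rates are simulated coin-flip predictions on a hypothetical test set of 50 observations, not the permuted colon or prostate data. The function names are illustrative only. The figures 124 and 20 are taken from the definition of Approach F.

```python
import random
import statistics


def simulate_null_error_rates(n_runs, n_variants, n_test, seed=0):
    """Simulate error rates for classifier variants that carry no signal.

    Each entry is the fraction of n_test coin-flip predictions that are
    wrong, so every variant has a true error rate of 0.5. This toy setup
    is an assumption of the sketch, not the paper's permutation scheme.
    """
    rng = random.Random(seed)
    return [
        [
            sum(rng.random() < 0.5 for _ in range(n_test)) / n_test
            for _ in range(n_variants)
        ]
        for _ in range(n_runs)
    ]


def median_of_minimum(error_rates):
    """Approach-A-style summary: best variant per run, then median over runs."""
    return statistics.median(min(run) for run in error_rates)


def median_of_all(error_rates):
    """Approach-F-style summary: median of every computed error rate."""
    return statistics.median(e for run in error_rates for e in run)


# 20 runs and 124 variants, as in the definition of Approach F;
# n_test = 50 is a hypothetical value chosen for the illustration.
rates = simulate_null_error_rates(n_runs=20, n_variants=124, n_test=50)
optimised = median_of_minimum(rates)
baseline = median_of_all(rates)

print(f"median of per-run minima:  {optimised:.2f}")
print(f"median of all error rates: {baseline:.2f}")
```

The simulated variants are independent of each other, whereas variants fitted to the same data are correlated, so the gap in this sketch is likely larger than the gap between columns A and F in the table. The direction is the same: selecting the minimum yields an optimistic error rate even when no variant performs better than chance.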

ISSN: 1471-2288