Table 4 Results of the permutation study

From: Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction

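| Colon | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| KNN | 0.33 | 0.36 | 0.37 | 0.38 | 0.41 | 0.45 |
| LDA | 0.40 | - | 0.43 | 0.43 | - | 0.46 |
| FDA | 0.42 | - | 0.44 | 0.47 | - | 0.48 |
| DLDA | 0.36 | - | 0.41 | 0.42 | - | 0.44 |
| PLSLDA | 0.34 | 0.35 | 0.37 | 0.37 | 0.42 | 0.43 |
| NNET | 0.34 | - | 0.35 | 0.35 | - | 0.36 |
| RF | 0.40 | 0.40 | - | - | - | 0.42 |
| SVM | 0.37 | - | - | - | - | 0.37 |
| PAM | 0.36 | - | - | - | - | 0.36 |
| L2 | 0.44 | - | - | - | - | 0.44 |
| All | 0.31 | 0.32 | 0.33 | 0.33 | 0.34 | 0.43 |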

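| Prostate | A | B | C | D | E | Baseline |
|---|---|---|---|---|---|---|
| KNN | 0.43 | 0.45 | 0.45 | 0.47 | 0.50 | 0.52 |
| LDA | 0.46 | - | 0.47 | 0.50 | - | 0.51 |
| FDA | 0.45 | - | 0.47 | 0.49 | - | 0.49 |
| DLDA | 0.46 | - | 0.49 | 0.49 | - | 0.51 |
| PLSLDA | 0.44 | 0.46 | 0.47 | 0.49 | 0.51 | 0.52 |
| NNET | 0.46 | - | 0.49 | 0.47 | - | 0.52 |
| RF | 0.52 | 0.54 | - | - | - | 0.54 |
| SVM | 0.57 | - | - | - | - | 0.57 |
| PAM | 0.54 | - | - | - | - | 0.54 |
| L2 | 0.52 | - | - | - | - | 0.52 |
| All | 0.41 | 0.42 | 0.43 | 0.44 | 0.46 | 0.52 |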

Colon data set [17] and prostate data set [18], with variable selection (if any) based on the t-statistic. Where tuning parameters are varied, the values are k = 1, 3, 5 for KNN, ncomp = 2, 3 for PLSLDA, mtry = , , , for RF.

Approach A: Minimal error rate over the different tuning parameter values, different numbers of genes and different gene selection methods (median over the 20 runs).

Approach B: Minimal error rate over the different numbers of genes and different gene selection methods (median over the 20 runs).

Approach C: Minimal error rate over the different tuning parameter values and different gene selection methods (median over the 20 runs).

Approach D: Minimal error rate over the different tuning parameter values and different numbers of genes (median over the 20 runs).

Approach E: Minimal error rate over the different tuning parameter values (median over the 20 runs).

Approach F: Median of all 124 × 20 calculated error rates.
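
The difference between the minimum-based approaches (A to E) and the plain median (F) can be illustrated with a minimal Python sketch. It assumes that "median over the 20 runs" means taking the minimal error rate within each run and then the median of those minima across runs; the table note does not spell this out, so treat that reading as an assumption. The function names, the `(k, n_genes)` configuration keys and the toy error rates are hypothetical and are not taken from the study.

```python
from statistics import median

def median_of_min(runs, keep=lambda config: True):
    """Median over runs of the minimal error rate among the kept configurations.

    Assumed reading of approaches A to E: minimise within each run first,
    then take the median across runs.
    """
    return median(
        min(err for config, err in run.items() if keep(config))
        for run in runs
    )

def median_of_all(runs):
    """Approach F: median of every error rate, with no selection step."""
    return median(err for run in runs for err in run.values())

# Hypothetical toy data: 3 runs, each mapping (k, n_genes) to an error rate.
runs = [
    {(1, 20): 0.40, (3, 20): 0.50, (1, 50): 0.30, (3, 50): 0.55},
    {(1, 20): 0.50, (3, 20): 0.35, (1, 50): 0.60, (3, 50): 0.45},
    {(1, 20): 0.55, (3, 20): 0.50, (1, 50): 0.45, (3, 50): 0.60},
]

# Minimise over every configuration (in the spirit of approach A).
over_everything = median_of_min(runs)
# Minimise over k only, with n_genes fixed at 20 (in the spirit of approach E).
over_k_only = median_of_min(runs, keep=lambda config: config[1] == 20)
# No minimisation at all (approach F).
no_selection = median_of_all(runs)

print(over_everything, over_k_only, no_selection)  # 0.35 0.4 0.5
```

On this toy input the value shrinks as the minimum is taken over more configurations, which is the same ordering the table shows from column F towards column A.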