Table 1. Overview of the datasets used in the studies on normalization and PCA. The following information is given: accession number, number of observations, number of variables, proportion of observations in the smaller class, and data type.

From: A measure of the impact of CV incompleteness on prediction error estimation with application to PCA and normalization

| Study | Label / accession number | Number of observations | Number of variables | Proportion in smaller class | Data type | ID |
|---|---|---|---|---|---|---|
| Normalization | E-GEOD-10320 | 100 | 22283 | 0.42 | transcription | 1 |
| Normalization | E-GEOD-47552 | 74 | 32321 | 0.45 | transcription | 2 |
| Normalization | E-GEOD-25639 | 57 | 54675 | 0.46 | transcription | 3 |
| Normalization | E-GEOD-29044 | 54 | 54675 | 0.41 | transcription | 4 |
| Normalization | E-MTAB-57 | 47 | 22283 | 0.47 | transcription | 5 |
| Normalization | E-GEOD-19722 | 46 | 54675 | 0.39 | transcription | 6 |
| Normalization | E-MEXP-3756 | 40 | 54675 | 0.50 | transcription | 7 |
| Normalization | E-GEOD-34465 | 26 | 32321 | 0.35 | transcription | 8 |
| Normalization | E-GEOD-30174 | 20 | 54675 | 0.50 | transcription | 9 |
| Normalization | E-GEOD-39683 | 20 | 32321 | 0.40 | transcription | 10 |
| Normalization | E-GEOD-40744 | 20 | 20706 | 0.50 | transcription | 11 |
| Normalization | E-GEOD-46053 | 20 | 54675 | 0.40 | transcription | 12 |
| PCA | E-GEOD-37582 | 121 | 48766 | 0.39 | transcription | 13 |
| PCA | ProstatecTranscr | 102 | 12625 | 0.49 | transcription | 14 |
| PCA | GSE20189 | 100 | 22277 | 0.49 | transcription | 15 |
| PCA | E-GEOD-57285 | 77 | 27578 | 0.45 | DNA methylation | 16 |
| PCA | E-GEOD-48153 | 71 | 23232 | 0.48 | proteomic | 17 |
| PCA | E-GEOD-42826 | 68 | 47323 | 0.24 | transcription | 18 |
| PCA | E-GEOD-31629 | 62 | 13737 | 0.35 | transcription | 19 |
| PCA | E-GEOD-33615 | 60 | 45015 | 0.35 | transcription | 20 |
| PCA | E-GEOD-39046 | 57 | 392 | 0.47 | transcription | 21 |
| PCA | E-GEOD-32393 | 56 | 27578 | 0.41 | DNA methylation | 22 |
| PCA | E-GEOD-42830 | 55 | 47323 | 0.31 | transcription | 23 |
| PCA | E-GEOD-39345 | 52 | 22184 | 0.38 | transcription | 24 |
| PCA | GSE33205 | 50 | 22011 | 0.50 | transcription | 25 |
| PCA | E-GEOD-36769 | 50 | 54675 | 0.28 | transcription | 26 |
| PCA | E-GEOD-43329 | 48 | 887 | 0.40 | transcription | 27 |
| PCA | E-GEOD-42042 | 47 | 27578 | 0.49 | DNA methylation | 28 |
| PCA | E-GEOD-25609 | 41 | 1145 | 0.49 | transcription | 29 |
| PCA | GSE37356 | 36 | 47231 | 0.44 | transcription | 30 |
| PCA | E-GEOD-49641 | 36 | 33297 | 0.50 | transcription | 31 |
| PCA | E-GEOD-37965 | 30 | 485563 | 0.50 | DNA methylation | 32 |

  1. ArrayExpress accession numbers have the prefix E-GEOD-; NCBI GEO accession numbers have the prefix GSE.