
Table 4 Number of samples in each class in training and test sets

From: A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance

| Disease | Prevalence | Training Set: Disease Presence | Training Set: Disease Absence | Test Set: Disease Presence | Test Set: Disease Absence |
|---|---|---|---|---|---|
| Hypertriglyceridemia | 5% | 50 | 878 | 17 | 292 |
| Venous Insufficiency | 7% | 62 | 865 | 21 | 289 |
| Asthma | 13% | 123 | 805 | 41 | 268 |
| Gout | 13% | 120 | 808 | 40 | 269 |
| OSA | 14% | 129 | 799 | 43 | 266 |
| PVD | 15% | 135 | 793 | 45 | 264 |
| Gallstones | 15% | 141 | 787 | 47 | 262 |
| OA | 18% | 168 | 760 | 56 | 253 |
| GERD | 20% | 184 | 743 | 62 | 248 |
| Depression | 20% | 187 | 741 | 62 | 247 |
| Obesity | 40% | 374 | 554 | 125 | 184 |
| CHF | 43% | 402 | 526 | 134 | 175 |
| Hypercholesterolemia | 47% | 432 | 496 | 144 | 165 |
| CAD | 55% | 512 | 416 | 170 | 139 |
| Diabetes | 66% | 616 | 312 | 205 | 104 |
| Hypertension | 73% | 677 | 250 | 226 | 84 |
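
The prevalence column can be related back to the sample counts with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes prevalence is the share of disease-present samples, presence / (presence + absence), and uses training-set counts copied from Table 4 for three diseases. The helper names `prevalence` and `imbalance_ratio` are hypothetical, and the majority-to-minority ratio is included only as one common way to summarize class imbalance.

```python
# Training-set counts (presence, absence) copied from Table 4 for three diseases.
train_counts = {
    "Hypertriglyceridemia": (50, 878),
    "Obesity": (374, 554),
    "Hypertension": (677, 250),
}


def prevalence(presence, absence):
    """Share of samples in which the disease is present (assumed definition)."""
    return presence / (presence + absence)


def imbalance_ratio(presence, absence):
    """Majority-class count divided by minority-class count (illustrative measure)."""
    return max(presence, absence) / min(presence, absence)


for disease, (pos, neg) in train_counts.items():
    print(
        f"{disease}: prevalence={prevalence(pos, neg):.1%}, "
        f"imbalance ratio={imbalance_ratio(pos, neg):.1f}"
    )
```

Under this assumed definition, the training counts for these three rows round to the percentages listed in the table (5%, 40%, and 73%).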