
Table 2 Imputation errors of different methods in HT-data

From: Generative adversarial networks for imputing missing data for big data clinical research

 

| Variable | Skewness or proportion of minority class | MICE | missForest | GAIN |
|---|---|---|---|---|
| Missingness rate = 20% | | | | |
| Continuous variables | | | | |
| Age, years | −0.018 | 0.063 ± 0.002 | 0.049 ± 0.001 a,b | 0.057 ± 0.004 a |
| SBP | 0.492 | 0.075 ± 0.001 | 0.058 ± 0.000 a | 0.048 ± 0.000 a,c |
| Charlson index | 0.146 | 0.154 ± 0.002 | 0.121 ± 0.001 a,b | 0.144 ± 0.003 a |
| TC/HDL-C ratio | 3.139 | 0.175 ± 0.003 | 0.137 ± 0.001 a | 0.115 ± 0.001 a,c |
| Hospital admission times | 7.037 | 2.379 ± 0.069 | 1.885 ± 0.042 a | 1.752 ± 0.141 a,c |
| Categorical variables | | | | |
| Smoking | 7.45% | 0.133 ± 0.007 | 0.123 ± 0.003 a | 0.098 ± 0.010 a,c |
| Hypertensive drugs | 8.10% | 0.149 ± 0.006 | 0.126 ± 0.003 a | 0.098 ± 0.002 a,c |
| Lipid Lowering drugs | 9.99% | 0.173 ± 0.007 | 0.159 ± 0.003 a | 0.129 ± 0.006 a,c |
| Overweight | 37.89% | 0.433 ± 0.01 | 0.400 ± 0.005 a | 0.359 ± 0.003 a,c |
| Sex | 41.21% | 0.448 ± 0.019 | 0.412 ± 0.004 a | 0.405 ± 0.022 a |
| Missingness rate = 50% | | | | |
| Continuous variables | | | | |
| Age, years | −0.018 | 0.129 ± 0.002 | 0.102 ± 0.001 a | 0.094 ± 0.007 a,c |
| SBP | 0.492 | 0.115 ± 0.001 | 0.095 ± 0.001 a | 0.080 ± 0.002 a |
| Charlson index | 0.146 | 0.295 ± 0.001 | 0.239 ± 0.002 a | 0.241 ± 0.009 a |
| TC/HDL-C ratio | 3.139 | 0.279 ± 0.004 | 0.235 ± 0.003 a | 0.183 ± 0.002 a,c |
| Hospital admission times | 7.037 | 3.766 ± 0.12 | 3.199 ± 0.057 a | 3.004 ± 0.246 a,c |
| Categorical variables | | | | |
| Smoking | 7.45% | 0.335 ± 0.006 | 0.277 ± 0.015 a | 0.267 ± 0.012 a,c |
| Hypertensive drugs | 8.10% | 0.368 ± 0.014 | 0.305 ± 0.004 a | 0.276 ± 0.005 a,c |
| Lipid Lowering drugs | 9.99% | 0.441 ± 0.015 | 0.319 ± 0.006 a | 0.304 ± 0.009 a,c |
| Overweight | 37.89% | 1.135 ± 0.018 | 1.029 ± 0.019 a | 0.850 ± 0.020 a,c |
| Sex | 41.21% | 1.149 ± 0.02 | 1.050 ± 0.013 a | 1.007 ± 0.055 a |
Notes

SBP: systolic blood pressure; TC: total cholesterol; HDL-C: high-density lipoprotein cholesterol.

Because NRMSE and PFC both followed a normal distribution (Shapiro-Wilk normality test, p > 0.05), the imputation errors of the different methods were compared using one-way ANOVA. If p < 0.05, pairs of methods were compared using independent-samples t-tests.

a The mean imputation error is significantly lower than that of MICE (p < 0.05).

b The mean imputation error is significantly lower than that of GAIN (p < 0.05).

c The mean imputation error is significantly lower than that of missForest (p < 0.05).
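
The comparison procedure described in the notes (normality check, one-way ANOVA, then pairwise t-tests when the ANOVA is significant) could be sketched roughly as follows. This is a minimal illustration using SciPy, not the authors' code: the function name `compare_methods`, the labels `method_A` to `method_C`, and the per-run error values are all hypothetical and are not taken from the study. The use of SciPy's default equal-variance t-test is also an assumption, since the notes do not say which variant was used.

```python
from itertools import combinations

from scipy import stats


def compare_methods(errors, alpha=0.05):
    """Compare per-run imputation errors across methods.

    errors: dict mapping a method label to a list of per-run errors.
    Returns (normality_p, anova_p, pairwise_p).
    """
    # Shapiro-Wilk test per method; p > alpha means no evidence against normality.
    normality_p = {m: stats.shapiro(v).pvalue for m, v in errors.items()}

    # One-way ANOVA across all methods.
    anova_p = stats.f_oneway(*errors.values()).pvalue

    # Pairwise independent-samples t-tests, run only if the ANOVA is significant.
    pairwise_p = {}
    if anova_p < alpha:
        for m1, m2 in combinations(errors, 2):
            pairwise_p[(m1, m2)] = stats.ttest_ind(errors[m1], errors[m2]).pvalue
    return normality_p, anova_p, pairwise_p


# Synthetic per-run errors for three hypothetical methods (illustration only).
runs = {
    "method_A": [0.075, 0.076, 0.074, 0.077, 0.073],
    "method_B": [0.058, 0.059, 0.057, 0.060, 0.056],
    "method_C": [0.048, 0.049, 0.047, 0.050, 0.046],
}

normality_p, anova_p, pairwise_p = compare_methods(runs)
print("ANOVA p-value:", anova_p)
for pair, p in pairwise_p.items():
    print(pair, "p =", p)
```

A pair whose t-test p-value falls below 0.05 would earn a marker in the table, with the marker going to the method that has the lower mean error. The sketch applies no multiple-comparison correction, which matches the procedure as stated in the notes.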