
Table 1 Imputation errors of different methods in DM-data

From: Generative adversarial networks for imputing missing data for big data clinical research

 

| Variable | Skewness or proportion of minority class | MICE (missingness rate = 20%) | missForest (missingness rate = 20%) | GAIN (missingness rate = 20%) | MICE (missingness rate = 50%) | missForest (missingness rate = 50%) | GAIN (missingness rate = 50%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Continuous variables | | | | | | | |
| Age, years | −0.106 | 0.078 ± 0.002 | 0.060 ± 0.001 a,b | 0.069 ± 0.002 a | 0.137 ± 0.002 | 0.107 ± 0.001 a | 0.111 ± 0.002 a |
| SBP, mmHg | 0.316 | 0.052 ± 0.001 | 0.041 ± 0.002 a,b | 0.048 ± 0.002 a | 0.099 ± 0.001 | 0.082 ± 0.002 a | 0.080 ± 0.002 a |
| DBP, mmHg | 0.154 | 0.070 ± 0.002 | 0.052 ± 0.002 a,b | 0.056 ± 0.002 a | 0.120 ± 0.002 | 0.094 ± 0.001 a | 0.090 ± 0.001 a,c |
| LDL-C, mmol/L | 0.379 | 0.095 ± 0.003 | 0.075 ± 0.003 a,b | 0.089 ± 0.003 a | 0.208 ± 0.003 | 0.163 ± 0.004 a | 0.161 ± 0.004 a |
| BMI, kg/m² | 0.813 | 0.064 ± 0.003 | 0.048 ± 0.002 a | 0.048 ± 0.003 a | 0.120 ± 0.004 | 0.095 ± 0.002 a | 0.090 ± 0.005 a,c |
| Waist, cm | 0.299 | 0.047 ± 0.002 | 0.036 ± 0.001 a | 0.036 ± 0.001 a | 0.088 ± 0.002 | 0.069 ± 0.001 a | 0.067 ± 0.003 a |
| TC, mmol/L | 0.564 | 0.065 ± 0.003 | 0.050 ± 0.002 a,b | 0.055 ± 0.003 a | 0.140 ± 0.003 | 0.110 ± 0.003 a | 0.102 ± 0.004 a,c |
| DM duration, years | −1.167 | 0.284 ± 0.007 | 0.206 ± 0.006 a | 0.190 ± 0.006 a,c | 0.451 ± 0.01 | 0.340 ± 0.006 a | 0.304 ± 0.012 a,c |
| eGFR, mL/min/1.73 m² | 1.368 | 0.089 ± 0.010 | 0.057 ± 0.004 a,b | 0.087 ± 0.006 | 0.195 ± 0.012 | 0.159 ± 0.012 a | 0.146 ± 0.015 a,c |
| HbA1c, % | 1.557 | 0.106 ± 0.004 | 0.077 ± 0.002 a | 0.078 ± 0.004 a | 0.177 ± 0.007 | 0.138 ± 0.004 a | 0.125 ± 0.004 a,c |
| HDL-C, mmol/L | 2.729 | 0.132 ± 0.016 | 0.111 ± 0.014 a | 0.115 ± 0.011 a | 0.251 ± 0.011 | 0.197 ± 0.014 a | 0.184 ± 0.015 a,c |
| TG, mmol/L | 3.932 | 0.287 ± 0.027 | 0.251 ± 0.022 a | 0.266 ± 0.027 | 0.610 ± 0.027 | 0.486 ± 0.023 a | 0.444 ± 0.026 a,c |
| Creatinine, μmol/L | 4.128 | 0.093 ± 0.011 | 0.089 ± 0.016 | 0.068 ± 0.015 a,c | 0.218 ± 0.013 | 0.177 ± 0.019 a | 0.169 ± 0.015 a,c |
| Fasting glucose, mmol/L | 4.681 | 0.178 ± 0.043 | 0.121 ± 0.008 a | 0.118 ± 0.007 a,c | 0.277 ± 0.024 | 0.214 ± 0.011 a | 0.195 ± 0.010 a,c |
| Urine ACR, mg/mmol | 11.450 | 2.509 ± 0.441 | 1.728 ± 0.307 a | 1.554 ± 0.266 a,c | 3.843 ± 0.405 | 2.987 ± 0.240 a | 2.690 ± 0.258 a,c |
| Categorical variables | | | | | | | |
| Lipid drug usage | 8.50% | 0.162 ± 0.013 | 0.093 ± 0.010 a | 0.083 ± 0.009 a,c | 0.159 ± 0.006 | 0.090 ± 0.004 a | 0.079 ± 0.005 a,c |
| Smoker | 10.57% | 0.176 ± 0.014 | 0.113 ± 0.010 a | 0.094 ± 0.009 a,c | 0.182 ± 0.013 | 0.122 ± 0.007 a | 0.097 ± 0.008 a,c |
| DM treatment | 10.50% | 0.179 ± 0.013 | 0.115 ± 0.009 a | 0.095 ± 0.009 a,c | 0.187 ± 0.011 | 0.120 ± 0.006 a | 0.096 ± 0.003 a,c |
| Hypertension drug usage | 29.68% | 0.318 ± 0.020 | 0.256 ± 0.015 a | 0.267 ± 0.016 a | 0.345 ± 0.01 | 0.281 ± 0.011 a | 0.274 ± 0.013 a,c |
| Sex | 45.93% | 0.205 ± 0.020 | 0.126 ± 0.009 a,b | 0.235 ± 0.027 | 0.353 ± 0.011 | 0.276 ± 0.01 a | 0.287 ± 0.014 a |
| Hypertension history | 47.190% | 0.122 ± 0.011 | 0.077 ± 0.008 a,b | 0.129 ± 0.040 | 0.255 ± 0.012 | 0.201 ± 0.019 a | 0.215 ± 0.017 a |

Notes

  1. Abbreviations: SBP, Systolic Blood Pressure; DBP, Diastolic Blood Pressure; LDL-C, Low Density Lipoprotein-Cholesterol; BMI, Body Mass Index; TC, Total Cholesterol; eGFR, Estimated Glomerular Filtration Rate; HbA1c, Hemoglobin A1c; HDL-C, High Density Lipoprotein-Cholesterol; TG, Triglyceride; Urine ACR, Urine Albumin to Creatinine Ratio.
  2. Because NRMSE and PFC both followed a normal distribution (Shapiro-Wilk normality test, p > 0.05), the imputation errors of the different methods were compared using one-way ANOVA. If p < 0.05, pairs of methods were then compared using the independent-samples t-test.
  3. a: The mean imputation error is significantly lower than that of MICE (p < 0.05).
  4. b: The mean imputation error is significantly lower than that of GAIN (p < 0.05).
  5. c: The mean imputation error is significantly lower than that of missForest (p < 0.05).
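
The comparison procedure described in the notes (Shapiro-Wilk check, one-way ANOVA, then pairwise independent-samples t-tests) can be sketched roughly as follows. This is only an illustration using SciPy, not the authors' code: the function name `compare_methods`, the number of repeated runs (10), and the per-run error values are all assumptions. The synthetic values are drawn around the mean ± SD reported for Age at 20% missingness purely to have something to run on.

```python
import numpy as np
from scipy import stats


def compare_methods(errors, alpha=0.05):
    """Compare per-run imputation errors (e.g. NRMSE or PFC) across methods.

    errors: dict mapping a method name to a 1-D sequence of per-run errors.
    Returns (normal, anova_p, pairwise). This helper is hypothetical and
    only mirrors the procedure described in the table notes.
    """
    names = list(errors)
    samples = [np.asarray(errors[n], dtype=float) for n in names]

    # Step 1: Shapiro-Wilk normality check for each method (p > alpha means
    # normality is not rejected).
    normal = {n: stats.shapiro(s)[1] > alpha for n, s in zip(names, samples)}

    # Step 2: one-way ANOVA across all methods.
    anova_p = stats.f_oneway(*samples).pvalue

    # Step 3: pairwise independent-samples t-tests, only if ANOVA is significant.
    pairwise = {}
    if anova_p < alpha:
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                p = stats.ttest_ind(samples[i], samples[j]).pvalue
                lower = names[i] if samples[i].mean() < samples[j].mean() else names[j]
                # Record the p-value and which method has the significantly
                # lower mean error (None if the difference is not significant).
                pairwise[(names[i], names[j])] = (p, lower if p < alpha else None)
    return normal, anova_p, pairwise


# Synthetic per-run errors (assumption: 10 runs per method; not the study data).
rng = np.random.default_rng(0)
example_errors = {
    "MICE": rng.normal(0.078, 0.002, size=10),
    "missForest": rng.normal(0.060, 0.001, size=10),
    "GAIN": rng.normal(0.069, 0.002, size=10),
}
normal, anova_p, pairwise = compare_methods(example_errors)
for pair, (p, lower) in pairwise.items():
    print(pair, f"p={p:.3g}", "lower:", lower)
```

In this sketch, an "a" marker in the table would correspond to a pair involving MICE in which the other method is reported as having the significantly lower mean, and similarly for "b" (GAIN) and "c" (missForest).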