Table 2 Distributional fit to the reference data of the subsamples produced by the subsampling procedures and by simple random sampling, each applied to the convenience sample bootstrap-resampled 2000 times. The Expected Value, the Effect Size, the Standardized Effect Size, and the 90% Bootstrap Intervals [90% BI] are computed as described in the Methods and adjusted for the number of quantiles chosen to compute the fit indicators. For the three sampling procedures, the percentage of bootstrap resamples in which the fit criterion improved compared to the random subsample is shown; values above 50% indicate an overall improvement.

From: Data quality assessment and subsampling strategies to correct distributional bias in prevalence studies

| Fit Indicator | Expected Value [90% BI] | Effect Size [90% BI] | Standardized Effect Size | % of subsamples with higher values than random sampling |
|---|---|---|---|---|
| Log-likelihood | | | | |
| Random | −260 [−270, −250] | | | |
| Distance (G) | −250 [−260, −240] | +7.3 [−3.9, 20] | 1.04 | 85.6% |
| Distance (S) | −250 [−250, −240] | +12 [1.4, 24] | 1.73 | 91.5% |
| Probability | −230 [−240, −220] | +29 [19, 40] | 4.39 | 100% |
| Uniform | −260 [−280, −250] | −5.9 [−18, 4.3] | −0.88 | 35.9% |
| Spearman Rho | | | | |
| Random | 0.17 [0.076, 0.26] | | | |
| Distance (G) | 0.26 [0.19, 0.34] | +0.095 [0.007, 0.19] | 1.74 | 97% |
| Distance (S) | 0.31 [0.24, 0.38] | +0.14 [0.051, 0.24] | 2.51 | 98.9% |
| Probability | 0.38 [0.33, 0.43] | +0.21 [0.12, 0.3] | 3.85 | 100% |
| Uniform | 0.16 [0.078, 0.24] | −0.013 [−0.096, 0.072] | −0.26 | 51.4% |
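
The summary columns of Table 2 can be illustrated with a minimal sketch that compares a fit indicator for one subsampling procedure against simple random sampling, paired by bootstrap resample. The exact definitions are given in the article's Methods and are not reproduced here, so the following are assumptions made only for illustration: the Effect Size is taken as the mean paired difference from the random subsample, the 90% BI as the 5th and 95th percentiles of those differences, and the Standardized Effect Size as the mean difference divided by its standard deviation. The adjustment for the number of quantiles is omitted. The function names `percentile` and `summarize` and the simulated input values are hypothetical.

```python
import random
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize(procedure_fit, random_fit):
    """Summarize paired per-resample fit values against the random baseline."""
    if len(procedure_fit) != len(random_fit):
        raise ValueError("fit values must be paired by bootstrap resample")
    # One difference per bootstrap resample: procedure minus random subsample.
    diffs = [p - r for p, r in zip(procedure_fit, random_fit)]
    ordered = sorted(diffs)
    effect = statistics.fmean(diffs)
    return {
        "effect_size": effect,
        "bi90": (percentile(ordered, 5), percentile(ordered, 95)),
        # Assumption: standardized by the SD of the paired differences.
        "standardized_effect_size": effect / statistics.stdev(diffs),
        # Share of resamples in which the procedure beats random sampling.
        "pct_improved": 100.0 * sum(d > 0 for d in diffs) / len(diffs),
    }


# Simulated fit values for 2000 bootstrap resamples (made-up numbers,
# not the study's data).
rng = random.Random(0)
random_fit = [rng.gauss(-260, 6) for _ in range(2000)]
procedure_fit = [r + rng.gauss(12, 7) for r in random_fit]
summary = summarize(procedure_fit, random_fit)
print(summary)
```

Under these assumptions, a `pct_improved` value above 50 corresponds to the caption's reading of an overall improvement over random sampling, and a 90% BI for the effect size that excludes zero indicates that the improvement holds across most resamples.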