Data quality assessment and subsampling strategies to correct distributional bias in prevalence studies

Table 2 Distributional fit to the reference data of the subsamples produced by the subsampling procedures and by simple random sampling, applied to 2000 bootstrap resamples of the convenience sample. Shown are the Expected Value, the Effect Size and the Standardized Effect Size, with 90% Bootstrap Intervals [90% BI], as described in the Methods and adjusted for the number of quantiles chosen to compute the fit indicators. For each subsampling procedure, the last column gives the percentage of bootstrap resamples in which the fit criterion improved compared to the random subsample; values above 50% indicate an overall improvement.

Fit Indicator	Expected Value [90% BI]	Effect Size [90% BI]	Standardized Effect Size	% of subsamples with higher values than random sampling
Log-likelihood
Random	−260 [−270, −250]
Distance (G)	−250 [−260, −240]	+7.3 [−3.9, 20]	1.04	85.6%
Distance (S)	−250 [−250, −240]	+12 [1.4, 24]	1.73	91.5%
Probability	−230 [−240, −220]	+29 [19, 40]	4.39	100%
Uniform	−260 [−280, −250]	−5.9 [−18, 4.3]	−0.88	35.9%
Spearman Rho
Random	0.17 [0.076, 0.26]
Distance (G)	0.26 [0.19, 0.34]	+0.095 [0.007, 0.19]	1.74	97%
Distance (S)	0.31 [0.24, 0.38]	+0.14 [0.051, 0.24]	2.51	98.9%
Probability	0.38 [0.33, 0.43]	+0.21 [0.12, 0.3]	3.85	100%
Uniform	0.16 [0.078, 0.24]	−0.013 [−0.096, 0.072]	−0.26	51.4%
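The summary columns of Table 2 can be sketched from paired per-resample fit values (one value for the procedure's subsample and one for the random subsample in each bootstrap resample). This is a minimal sketch under assumptions that the Methods may not match exactly:

* The Effect Size is the mean of the paired differences (procedure minus random).
* The Standardized Effect Size is that mean divided by the standard deviation of the differences.
* The 90% BI is the 5th to 95th percentile range.
* The last column is the share of resamples with a strictly higher value than the random subsample.

The adjustment for the number of quantiles is not modelled. The helper names `percentile` and `summarize` are hypothetical, and the usage input is synthetic, not the study's data.

```python
import random
import statistics


def percentile(values, q):
    """Linear-interpolation percentile (same rule as NumPy's default), q in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(procedure, random_baseline):
    """Summarize paired per-resample fit values (procedure vs. random subsample).

    ASSUMED definitions (hypothetical, not taken from the paper's Methods):
    effect size = mean paired difference, standardized effect size = mean
    difference / SD of differences, 90% BI = 5th-95th percentiles.
    """
    if len(procedure) != len(random_baseline):
        raise ValueError("need one paired value per bootstrap resample")
    diffs = [p - r for p, r in zip(procedure, random_baseline)]
    effect = statistics.mean(diffs)
    return {
        "expected_value": statistics.mean(procedure),
        "bi90": (percentile(procedure, 5), percentile(procedure, 95)),
        "effect_size": effect,
        "effect_bi90": (percentile(diffs, 5), percentile(diffs, 95)),
        "standardized_effect_size": effect / statistics.stdev(diffs),
        # Higher is better for both log-likelihood and Spearman's rho.
        "pct_higher": 100.0 * sum(d > 0 for d in diffs) / len(diffs),
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic paired values for 2000 bootstrap resamples (NOT the study's data).
    baseline = [rng.gauss(-260, 6) for _ in range(2000)]
    proc = [b + rng.gauss(29, 6.5) for b in baseline]
    print(summarize(proc, baseline))
```

For the Random row, only the mean and the 5th and 95th percentiles of the baseline values would be reported, since there is no paired difference to summarize.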

ISSN: 1471-2288