Can electronic search engines optimize screening of search results in systematic reviews: an empirical study

Table 2 Performance of the first ranking attempt for each review

Review	N of records retrieved from MEDLINE (b)	N of included studies ranked in the top 500 (c)	N of included studies indexed in MEDLINE (d)	Proportion of records selected by the search engine (p = a/b)	Recall (q = c/d)	p-value
Gibbs [11]	5743	11	27	0.09	0.41	<0.001
Yeung [12]	4996	6	11	0.10	0.55	<0.001
Smeeth [17]	3119	4	5	0.16	0.80	0.003
Towheed [18]	1556	6	17	0.32	0.35	0.80
Shelley [19]	1486	5	5	0.34	1.00	0.004
Karjalainen [20]	1244	2	2	0.40	1.00	0.16
Malthaner [21]	2321	6	6	0.22	1.00	<0.001
Bowen [22]	4629	12	14	0.11	0.86	<0.001
Mulrow [23]	1405	36	39	0.36	0.92	<0.001
Overall	26499	88	126	0.17	0.70

Suppose there are b records in the initial retrieval. Suppose the top a (here we consider a = 500) of these records are selected by the search engine, i.e. a proportion p = a/b. Suppose further that this subset includes c of the d relevant records from the initial retrieval, i.e. a proportion q = c/d. If the search engine performs no better than would be expected by chance, then we would expect q = p. For each systematic review, we treated c as a binomial random variable with denominator d, and conducted a two-sided exact binomial test of the hypothesis that the expected value of q was equal to p.
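
The test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names `binom_pmf` and `exact_binomial_two_sided` are hypothetical, and the two-sided p-value is assumed to be the sum of the probabilities of all outcomes that are no more likely than the observed one (a common convention for exact binomial tests; the text does not state which convention was used). The inputs are taken from the Karjalainen [20] row of Table 2.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_two_sided(c, d, p):
    """Two-sided exact binomial p-value for observing c successes out of d.

    Assumption (not stated in the source): the p-value sums the
    probabilities of every outcome that is no more likely than the
    observed outcome under the null hypothesis E[q] = p.
    """
    observed = binom_pmf(c, d, p)
    # Small relative tolerance so that ties are not lost to rounding error.
    threshold = observed * (1 + 1e-7)
    total = 0.0
    for k in range(d + 1):
        prob = binom_pmf(k, d, p)
        if prob <= threshold:
            total += prob
    return min(1.0, total)


# Karjalainen [20]: a = 500 records selected out of b = 1244 retrieved,
# c = 2 of the d = 2 included studies ranked in the top 500.
a, b, c, d = 500, 1244, 2, 2
p = a / b  # proportion of records selected by the search engine
q = c / d  # recall
p_value = exact_binomial_two_sided(c, d, p)
print(f"p = {p:.2f}, q = {q:.2f}, p-value = {p_value:.2f}")
```

Under this assumption the sketch gives a p-value of about 0.16 for Karjalainen [20], in agreement with Table 2: with only two relevant records, even perfect recall cannot be distinguished from chance.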

ISSN: 1471-2288