Review            N of records     N of included     N of included     Proportion of records    Recall      p-value
                  retrieved from   studies ranked    studies indexed   selected by the search   (q = c/d)
                  MEDLINE (b)      in the top 500    in MEDLINE (d)    engine (p = a/b)
                                   (c)

Gibbs [11]        5743             11                27                0.09                     0.41        <0.001
Yeung [12]        4996             6                 11                0.10                     0.55        <0.001
Smeeth [17]       3119             4                 5                 0.16                     0.80        0.003
Towheed [18]      1556             6                 17                0.32                     0.35        0.80
Shelley [19]      1486             5                 5                 0.34                     1.00        0.004
Karjalainen [20]  1244             2                 2                 0.40                     1.00        0.16
Malthaner [21]    2321             6                 6                 0.22                     1.00        <0.001
Bowen [22]        4629             12                14                0.11                     0.86        <0.001
Mulrow [23]       1405             36                39                0.36                     0.92        <0.001
Overall           26499            88                126               0.17                     0.70
 
Suppose there are b records in the initial retrieval, and that the search engine selects the top a of these records (here a = 500), i.e. a proportion p = a/b. Suppose further that this subset includes c of the d relevant records from the initial retrieval, i.e. a proportion q = c/d. If the search engine performed no better than chance, we would expect q = p. For each systematic review, we treated c as a binomial random variable with denominator d and conducted a two-sided exact binomial test of the hypothesis that the expected value of q was equal to p.
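As an illustration of the test described above, the following minimal Python sketch computes p, q and a two-sided exact binomial p-value for a single review. The function names are hypothetical, and the two-sided rule used here (summing the probabilities of all outcomes that are no more likely than the observed count) is an assumption: it is a common convention, but the text does not state which two-sided definition was applied.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_two_sided(c: int, d: int, p: float) -> float:
    """Two-sided exact binomial p-value for observing c successes out of d.

    Assumed convention: sum the probabilities of every outcome whose
    probability does not exceed that of the observed count c.
    """
    observed = binom_pmf(c, d, p)
    # Small relative tolerance so floating-point ties count as "equally likely".
    threshold = observed * (1 + 1e-7)
    total = 0.0
    for k in range(d + 1):
        prob = binom_pmf(k, d, p)
        if prob <= threshold:
            total += prob
    return min(1.0, total)


def review_statistics(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Return (p, q, p_value) for one systematic review."""
    p = a / b  # proportion of records selected by the search engine
    q = c / d  # recall among the relevant records
    return p, q, exact_binomial_two_sided(c, d, p)


if __name__ == "__main__":
    a = 500  # number of top-ranked records considered
    # (b, c, d) taken from two rows of the table
    rows = {
        "Smeeth [17]": (3119, 4, 5),
        "Karjalainen [20]": (1244, 2, 2),
    }
    for name, (b, c, d) in rows.items():
        p, q, p_value = review_statistics(a, b, c, d)
        print(f"{name}: p = {p:.2f}, q = {q:.2f}, p-value = {p_value:.3f}")
```

Under this convention the sketch gives p-values of about 0.003 for Smeeth [17] and 0.16 for Karjalainen [20], in agreement with the table.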