An optimal search filter for retrieving systematic reviews and meta-analyses

Background Health-evidence.ca is an online registry of systematic reviews evaluating the effectiveness of public health interventions. Extensive searching of bibliographic databases is required to keep the registry up to date. However, search filters have been developed to assist in searching the extensive amount of published literature indexed. Search filters can be designed to find literature related to a certain subject (i.e. content-specific filter) or particular study designs (i.e. methodological filter). The objective of this paper is to describe the development and validation of the health-evidence.ca Systematic Review search filter and to compare its performance to other available systematic review filters. Methods This analysis of search filters was conducted in MEDLINE, EMBASE, and CINAHL. The performance of thirty-one search filters in total was assessed. A validation data set of 219 articles indexed between January 2004 and December 2005 was used to evaluate performance on sensitivity, specificity, precision and the number needed to read for each filter. Results Nineteen of 31 search filters were effective in retrieving a high level of relevant articles (sensitivity scores greater than 85%). The majority achieved a high degree of sensitivity at the expense of precision and yielded large result sets. The main advantage of the health-evidence.ca Systematic Review search filter in comparison to the other filters was that it maintained the same level of sensitivity while reducing the number of articles that needed to be screened. Conclusions The health-evidence.ca Systematic Review search filter is a useful tool for identifying published systematic reviews, with further screening to identify those evaluating the effectiveness of public health interventions. The filter that narrows the focus saves considerable time and resources during updates of this online resource, without sacrificing sensitivity.



Background
Systematic reviews have been integral to the evidence-informed practice movement [1][2][3][4][5] in the field of public health [6][7][8][9]. A systematic review consists of an examination of all of the primary studies on a topic, which includes searching for, collating, and assessing the studies, to establish conclusive evidence about a topic [10]. Systematic reviews present a more consistent and conservative estimate of the effect of interventions across a body of literature and as such, can have an important impact on program planning decisions in public health.
However, public health decision makers state that finding and accessing systematic reviews related to public health continues to be a barrier to evidence-informed public health practice [11][12][13][14][15][16]. The field of public health can be defined as a combination of sciences, skills, and values that function through collective societal, legislative, and political activities. It involves both public and private programs, services, and institutions aimed at protecting and improving the health of all people, including preventing disease, promoting health and wellbeing, and prolonging life. When necessary, public health also engages in restoring the health of individuals, specified groups, populations or communities through mobilizing and engaging local, state, national, and international resources to assure the conditions in which people can be healthy [17][18][19]. In short, the field of public health is broad, and decision makers wear many hats, requiring evidence on a wide range of topics.
Public health practitioners have expressed a need for a single place where they can access reviews evaluating the effectiveness of interventions, have confidence in the methodological quality of the evidence, and access plain language review summaries with corresponding implications for policy and practice [20]. Health-evidence.ca is a free, searchable online registry of systematic reviews and meta-analyses evaluating the effectiveness of public health and health promotion interventions. This registry represents one component of a larger knowledge translation and exchange (KTE) [21] strategy that supports users in accessing and interpreting research evidence. KTE is a two-way process involving dialogue, interaction, and the sharing of knowledge and evidence between and among the producers and users of knowledge and research evidence. It is a broad term that is often used to include knowledge transfer, exchange, translation, dissemination, and diffusion. The target audience for health-evidence.ca is decision makers working in public health and health promotion at all levels (front line practitioners to senior management and policy makers in government). Public health decision makers need to find, assess and interpret research evidence quickly and easily if it is to inform program and policy decisions. Health-evidence.ca provides decision makers with easy access to public health-relevant, quality-appraised systematic reviews evaluating the effectiveness of public health interventions. The site is freely accessible and can be searched by selecting common public health indexing terms. Search results include links to published review abstracts and a rating of the methodological quality of each review. In addition, health-evidence.ca team members write evidence summaries for reviews of good methodological quality to summarize key findings and provide recommendations for policy and practice.
A more complete description of this online resource has been published and is accessible at http://www.biomedcentral.com/1471-2458/10/496. Health-evidence.ca was updated quarterly until 2012 and is now updated on a monthly basis. Updates consist of conducting monthly searches of relevant electronic databases, importing results into a bibliographic database management program, screening titles to identify relevant articles, retrieving potentially relevant articles and screening full document versions for inclusion. Included reviews must meet relevance criteria and must be systematic reviews that focus on public health, provide outcome data on the effectiveness of interventions, and include a documented search strategy.
As of February 2012, over 1,017,500 titles had been screened, yielding 2,450 relevant reviews. The large number of titles screened to reach the final, relevant set reflects the challenges of searching bibliographic databases for public health and health promotion literature. These challenges stem from the lack of a single database dedicated exclusively to public health and health promotion literature, requiring searches in multiple health (MEDLINE, EMBASE, CINAHL), science, and social science databases (BIOSIS, PsycINFO, SPORTDiscus, Sociological Abstracts). There are also several limitations inherent in searching these databases. For example, 33-44% of the journals identified by experts in the field as public health journals are not indexed in MEDLINE. These challenges are not limited to public health as others have encountered similar difficulties in searching for mental health content [23] and health services research literature [24]. A further challenge is identifying what is relevant to public health and health promotion practitioners, given that it is a dynamic field characterized by a wide scope of practice, defined regionally and changing constantly.
Along with the challenges of searching for public health and health promotion content, review literature, though rapidly growing, remains limited in volume when compared to primary studies. For example, over 700,000 articles were indexed in MEDLINE in 2010, of which approximately 2,500 (0.36%) were health-related systematic reviews [25]. Currently, there is no single MEDLINE subject heading term for 'systematic review'; this lack of an indexing term requires the end user to employ a Clinical Query developed to locate systematic reviews, or to screen very large sets of irrelevant articles in order to retrieve systematic reviews. MEDLINE does have an indexing term for 'review'; however, its application is very broad. Of the 19,430,768 articles indexed in MEDLINE as of February 13, 2012, 8.5% (1,656,583) [26] were indexed as reviews. Upon screening a small portion of this results set, it was evident that the majority were not systematic reviews, but rather literature reviews and overviews. While the MEDLINE indexing term 'meta-analysis' is useful for identifying systematic reviews, it only captures systematic reviews that use statistical software to combine the results of the included primary studies in a single pooled estimate of effect. However, meta-analyses represent a small portion of all reviews evaluating the effectiveness of public health interventions. For example, fewer than half of public health intervention reviews indexed on health-evidence.ca are meta-analyses; thus, reliance on this term alone to identify reviews is not sufficient. A combination of indexing terms is required to detect relevant reviews that can be captured in online databases such as MEDLINE. Thus, although it has been time-consuming, screening a high number of irrelevant articles has been necessary.
Search filters, also referred to as "search hedges", are "collections of search terms intended to capture frequently sought research methods such as randomized controlled trials, or other aspects of health care" [27]. While search filters for the retrieval of systematic reviews were being used by others for searching MEDLINE [19][20][21][22][23][24][25][26][27][28][29][30][31], EMBASE [32], and CINAHL [33], to our knowledge none had been used and tested for locating public health and health promotion reviews at the time of this project. These filters, including those targeting content-specific literature relevant to the subject of interest [24,25], provided guidance as we developed a systematic review filter for health-evidence.ca.
Prior to 2008, we used a Public Health (PH) search filter that was developed in collaboration with health science librarians at McMaster University. The Head of Public Services worked with one of the authors (KD) to systematically run and informally evaluate the results of various search strategies for retrieving systematic reviews and meta-analyses evaluating the effectiveness of public health interventions in MEDLINE, EMBASE, CINAHL, PsycINFO, and Sociological Abstracts. Search strategies were assessed and improvements made based on findings. The resulting PH search filter consisted of two distinct components: 1) indexing terms and keywords referring to systematic review methods, combined with the Boolean 'OR' operator (systematic, meta analysis, review); and 2) indexing terms and keywords referring to public health content areas, combined with the Boolean 'OR' operator (community health services, education, health education, health promotion, prevention, preventive). The content and methods components were then combined using the Boolean 'AND' operator. Seventeen topic areas were included in the content component: addiction, adult health, chronic diseases, communicable disease and infection, community health, dental health, environmental health, food safety and inspection, injury prevention and safety, mental health, nutrition, parenting, physical activity, pregnancy, sexual education, sexually transmitted infections, and women's health. This search strategy also made it more likely that we would capture articles for which established indexing terms did not exist, such as social determinants of health and healthy communities.
Our PH search filter typically yielded a very high volume of results with very low precision. For example, between January 2006 and December 2007, of the 136,427 titles screened, 409 were relevant for the health-evidence.ca registry; in other words, precision was 0.3%. In addition to using the PH search filter, more than 40 public health-relevant journals were hand searched annually, as well as the reference lists of all relevant reviews. Given this systematic search of the published review literature, we were reasonably confident that our retrieval methods were capturing a near complete set of relevant articles. We considered this set (the electronic database searches plus additional search strategies) the 'gold standard' for health-evidence.ca. A gold standard is "a set of relevant records against which a new search filter is tested and validated to determine how effective it is at retrieving particular types of records" [34]. While it is impossible to prove that the gold standard for health-evidence.ca identified all public health relevant systematic reviews, we are confident that this approach captured the vast majority of relevant reviews.
Given that the precision of the PH search filter was so low, we began to create an effective search filter that would decrease the total number of results retrieved, while maximizing the number of relevant results. The health-evidence.ca Systematic Review (SR) search filter we developed in 2008 was adapted from a previously validated filter [30], which included the terms: MEDLINE.tw, systematic review.tw, meta-analysis.pt, combined with the Boolean OR operator. While this filter was highly specific, it captured less than 82% of articles identified by our gold standard set. To customize this filter to retrieve only those systematic reviews of interventions, the term 'intervention' was added as an indexing term. This is referred to as the development data set.
The MEDLINE version of our health-evidence.ca SR search filter included the following indexing terms, combined with the Boolean 'OR' operator: MEDLINE.tw, systematic review.tw, meta-analysis.pt, intervention$.ti. We slightly modified the filter for use in EMBASE and CINAHL due to differences in indexing terms between the various databases. The indexing terms systematic review.tw and intervention$.ti are viable in both EMBASE and CINAHL, therefore these terms were consistent across all three databases. However, in both EMBASE and CINAHL, meta-analysis was not an indexed publication type, and therefore the term meta-analysis was included as a keyword in the search filter for these two databases. Each database employs a unique controlled vocabulary, thus the search strategy is tailored to the database. For example, MEDLINE does not have a preferred search term for systematic review so that concept must be searched as a text word. EMBASE and CINAHL, however, do have a specific indexing term for systematic review, so that term is used when tailoring the search to those databases.
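As a minimal illustration of how the MEDLINE version of the filter is assembled, the sketch below joins the four terms listed above with the Boolean OR operator. The helper name `or_group` and the use of parentheses are our own illustrative choices; the exact syntax required to enter the terms in the OVID interface is not shown in this paper and is not reproduced here.

```python
def or_group(terms):
    """Join search terms with the Boolean OR operator, wrapped in parentheses.

    Hypothetical helper for illustration only; it builds a plain string and
    does not talk to any database interface.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    return "(" + " OR ".join(terms) + ")"


# Terms of the MEDLINE version of the filter, copied from the text above.
MEDLINE_SR_TERMS = [
    "MEDLINE.tw",
    "systematic review.tw",
    "meta-analysis.pt",
    "intervention$.ti",
]

medline_sr_filter = or_group(MEDLINE_SR_TERMS)
print(medline_sr_filter)
```

Because the four terms are combined with OR, a record needs to match only one of them to be retrieved, which is what keeps the filter sensitive.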
The objective of this paper is to report the results of our efforts to evaluate and validate the health-evidence.ca SR search filter for retrieving systematic reviews and meta-analyses that evaluate the effectiveness of interventions. First, we compared the performance of the health-evidence.ca SR search filter to the PH search filter. We then compared the health-evidence.ca SR search filter to other known search filters targeted at capturing systematic reviews in existence at the time (Tables 1, 2 and 3).
Our intent was to identify a search filter that resulted in the optimal use of time and resources in updating the health-evidence.ca registry. Specifically, this paper reports the performance of each filter with respect to sensitivity, specificity, precision, and the number needed to read. The best option for our purposes is one that achieves high precision while not compromising sensitivity.

Methods
The health-evidence.ca SR search filter was evaluated and validated in two distinct ways.

Health-evidence.ca SR search filter vs. PH search filter
We compared the retrieval performance of the health-evidence.ca SR search filter in MEDLINE, EMBASE, and CINAHL with what we had retrieved using the gold standard, for both our development and validation data sets. The results are reported in Table 4. To test our health-evidence.ca SR search filter, we selected sub-sets from our gold standard.
Four indices were used to evaluate filter performance: sensitivity, specificity, precision and "number needed to read (NNR)". Sensitivity is a measure of the proportion of actual positives which are correctly identified. We defined sensitivity as the proportion of systematic reviews identified by the gold standard that were also identified by each search filter. Sensitivity was calculated as:
$$\text{Sensitivity} = \frac{\text{number of relevant systematic reviews retrieved by a search filter}}{\text{number of articles in the gold standard}} \times 100$$
The higher the sensitivity, the more successful the search filter was in capturing a large number of the articles, in comparison to the gold standard, with 100% meaning there was perfect agreement between the search filter and the gold standard.
Specificity is a measure of the proportion of negatives which are correctly identified. We defined specificity as the proportion of irrelevant articles not retrieved by the search filters. Specificity was calculated as:
$$\text{Specificity} = \frac{\text{number of non-relevant articles not retrieved by a search filter}}{\text{total number of records that are not relevant systematic reviews}} \times 100$$
Specificity is a reflection of how well a search filter omits non-relevant articles from the retrieved set, which in this case were articles that were not systematic reviews. The specificity score declines if a search filter retrieves an article that it deems to be relevant when, in fact, it is not (a false positive). A specificity of 100% means that the filter recognized all actual non-relevant articles; no articles were retrieved that were not relevant systematic reviews.
Precision (or positive predictive value) is the proportion of retrieved articles that represent relevant articles and can be calculated as:
$$\text{Precision} = \frac{\text{number of relevant articles retrieved by a search filter}}{\text{total number of articles retrieved by a search filter}}$$
If a search filter has a high degree of precision, it can locate a high number of relevant articles while keeping the number of non-relevant articles retrieved low. A good precision score (approaching 1.0) indicates that a high proportion of all articles retrieved for a particular search were actually relevant. In other words, if a search identified 10,000 articles of which 100 were relevant, the precision score would be 0.01, which would be low precision.
Finally, the NNR represents the number of articles that must be read before a relevant article is identified.
$$\text{NNR} = \frac{1}{\text{precision}}$$
For example, if the NNR was 16, then for every 16 articles identified by the search filter and read, one would be deemed relevant.
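The four indices defined above can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure used in the study: the class and function names are our own, and the counts of missed relevant articles and of non-relevant articles not retrieved in the usage example are hypothetical. Only the 10,000 retrieved and 100 relevant articles come from the precision example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterCounts:
    """Counts for one search filter compared against a gold standard."""
    relevant_retrieved: int        # relevant articles the filter found
    irrelevant_retrieved: int      # non-relevant articles the filter found
    relevant_missed: int           # gold standard articles the filter missed
    irrelevant_not_retrieved: int  # non-relevant articles correctly left out


def sensitivity(c: FilterCounts) -> float:
    """Percentage of gold standard articles retrieved by the filter."""
    return 100 * c.relevant_retrieved / (c.relevant_retrieved + c.relevant_missed)


def specificity(c: FilterCounts) -> float:
    """Percentage of non-relevant articles that the filter did not retrieve."""
    return 100 * c.irrelevant_not_retrieved / (
        c.irrelevant_not_retrieved + c.irrelevant_retrieved
    )


def precision(c: FilterCounts) -> float:
    """Proportion (0 to 1) of retrieved articles that are relevant."""
    return c.relevant_retrieved / (c.relevant_retrieved + c.irrelevant_retrieved)


def number_needed_to_read(c: FilterCounts) -> float:
    """Articles that must be read to find one relevant article (1 / precision)."""
    return 1 / precision(c)


# 10,000 retrieved articles, 100 of them relevant (from the text); the last
# two counts are hypothetical and only serve to exercise the functions.
example = FilterCounts(
    relevant_retrieved=100,
    irrelevant_retrieved=9_900,
    relevant_missed=10,
    irrelevant_not_retrieved=990_000,
)
print(f"sensitivity = {sensitivity(example):.1f}%")
print(f"specificity = {specificity(example):.1f}%")
print(f"precision   = {precision(example):.4f}")
print(f"NNR         = {number_needed_to_read(example):.1f}")
```

Precision is kept as a proportion here so that the NNR is simply its reciprocal; multiplying by 100 gives the percentage form.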

Results
Fifty-three relevant articles were identified in the development data set between January 1 and December 31, 2001. Of those 53 relevant reviews, all 53 were indexed in MEDLINE, 33 in EMBASE and 36 in CINAHL (see Table 4), with some articles indexed in more than one of the databases. The initial set of 53 results (development data set) used to test and develop the search strategy was used to explore the sensitivity, specificity, precision, and NNR for both the PH and health-evidence.ca SR search filters.
The second set of 219 results (validation data set) represented a sub-set of the gold standard and was made up of relevant articles indexed in each of the 3 databases of interest between January 1, 2004 and December 31, 2005. Of the 219 articles, 207 were indexed in MEDLINE, 107 in EMBASE, and 129 in CINAHL, again with some articles indexed in more than one of the databases. During that same time period, a total of 1,174,817 records were indexed in MEDLINE, 990,862 records in EMBASE, and 272,264 records in CINAHL (see Table 4).

Specificity
In addition to being sensitive, the health-evidence.ca SR search filter demonstrated a slightly higher degree of [...] [30][31][35][36][37] offered relatively high sensitivity (85.5%-88.9%) combined with good performance on specificity (98.5%-99.2%), precision (1.1-1.9), and number needed to read (52.0-94.9).
Table 2 describes the results of the health-evidence.ca SR search filter in comparison to the seven other search filters tested in EMBASE. The health-evidence.ca SR and Scottish Intercollegiate Guidelines Network [37] search filters performed the best overall in terms of the combination of outcomes for sensitivity (87.9% and 81.3%), specificity (98.2% and 99.0%), precision (0.5 and 0.8) and NNR (186.0 and 118.6). The health-evidence.ca SR search filter, while having greater sensitivity, resulted in an additional 67 articles having to be read in comparison to the Scottish Intercollegiate Guidelines Network filter.
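To see why precision stays low even when specificity exceeds 98%, the following sketch estimates precision and NNR from the rounded EMBASE figures reported above (87.9% sensitivity, 98.2% specificity, 107 relevant articles among 990,862 indexed records). It assumes that all records indexed in EMBASE during the validation period form the denominator, and because the inputs are rounded, the output only approximates the values in Table 2. The function name is our own.

```python
def expected_precision(sensitivity_pct, specificity_pct, n_relevant, n_total):
    """Estimate precision (as a proportion) from sensitivity and specificity.

    Retrieved relevant articles     = sensitivity * relevant articles
    Retrieved non-relevant articles = (1 - specificity) * non-relevant articles
    """
    relevant_retrieved = sensitivity_pct / 100 * n_relevant
    irrelevant_retrieved = (1 - specificity_pct / 100) * (n_total - n_relevant)
    return relevant_retrieved / (relevant_retrieved + irrelevant_retrieved)


# Rounded EMBASE values for the health-evidence.ca SR search filter.
embase_precision = expected_precision(87.9, 98.2, 107, 990_862)
embase_nnr = 1 / embase_precision

print(f"estimated precision: {100 * embase_precision:.2f}%")
print(f"estimated NNR: {embase_nnr:.0f}")
```

With so few relevant articles among nearly a million records, the 1.8% of non-relevant records that the filter still retrieves outnumbers the relevant ones by a wide margin, giving a precision of roughly 0.5% and an NNR on the order of 190, in line with the reported 0.5 and 186.0.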

EMBASE
Sensitivity
The health-evidence.ca SR search filter's sensitivity of 87.9% was lower than that of the two top performing search filters, which both obtained sensitivity scores of 96.3% (Wilczynski and Haynes, Sensitive query; Wilczynski and Haynes, Best optimization query).
Specificity
All but the Wilczynski and Haynes (2007) search filter (sensitive query) achieved a level of specificity above 85%, with the health-evidence.ca SR search filter achieving 98.2%. The health-evidence.ca SR search filter was outperformed by the two Wilczynski and Haynes filters (99.3% for the 'Small drop in specificity, substantive gain in sensitivity' query, and 99.5% for the specific query), the BMJ Best Clinical Evidence filter (98.5%), and the Scottish Intercollegiate Guidelines Network filter (99.0%).
Precision
The most precise filter had a score of 1.1 (Wilczynski and Haynes, 'Small drop in specificity, substantive gain in sensitivity' query), with a sensitivity of 75.7%. The health-evidence.ca SR search filter offered moderate precision (0.5) in comparison.
Number needed to read
The best performing filters for NNR were SIGN, the BMJ Clinical Evidence filter, and the health-evidence.ca SR search filter at 118.6, 167.9, and 186.0, respectively. Although the Wilczynski and Haynes ('Small drop in specificity, substantive gain in sensitivity' query) filter offered an NNR of 88.2, its sensitivity was much lower than that of other filters at 75.7%.
Table 3 presents the results of the health-evidence.ca SR search filter along with the six other search filters tested in CINAHL. Although it did not achieve the best result on any single outcome, the health-evidence.ca SR search filter appeared to offer the best overall combination of sensitivity (89.9%), specificity (97.6%), precision (1.8), and NNR (57.2).

CINAHL
Sensitivity
Two search strategies achieved a sensitivity of greater than 95% (Wong, Best sensitivity; Centre for Reviews and Dissemination (CRD) [38] filters), with the health-evidence.ca SR search filter achieving 89.9% sensitivity.
Specificity
The Wong Best sensitivity query scored highest on specificity (99.4%), matched by the Wong Best optimization (sensitivity > specificity) query (99.4%). The Wong queries were followed closely in specificity by the McKibbon (1998) filter (98.9%) and the health-evidence.ca SR search filter (97.6%).
Precision
The most precise search filter was Wong's Best optimization query at 3.8, followed by the Best Specificity Query [33]

Discussion
The objective of health-evidence.ca is to contribute to evidence-informed decision making in public health by facilitating access to published systematic reviews evaluating the effectiveness of public health and health promotion interventions. An optimal search filter for health-evidence.ca is one that has high sensitivity, specificity, and precision and a relatively low NNR. However, any reduction in NNR was desirable. A filter such as this allows us to have confidence that all relevant articles will be identified (sensitivity), fewer non-relevant articles will be retrieved (specificity), most of the identified articles will be relevant (precision), and the NNR will be reduced. Reducing the NNR is of great importance since screening is a resource- and time-intensive process.
Although a search filter may perform exceptionally well on any single outcome, it is the balance of performance across these four domains (sensitivity, specificity, precision, NNR) that distinguishes the best filter for our purposes. By replacing the PH search filter with the health-evidence.ca SR search filter, the overall number of articles retrieved from health-evidence.ca electronic searches was greatly reduced without losing relevant content. The balance struck by the SR search filter means that this filter would be useful to those wishing to retrieve systematic reviews related to health care, with wider application than that of our own database of reviews on the effectiveness of interventions. The desired benefit of filters is that they save time both in search strategy development and screening. One study demonstrated how filters reduce the number of results needed to screen [37], while another found that saving time both in search strategy development and screening of results was the most common benefit reported by librarians [38]. For our purposes, the health-evidence.ca SR search filter offered overall improvements in specificity and precision, with the associated decrease in the NNR, substantially decreasing screening time. The desired improvement in precision was feasible while only minimally impacting the sensitivity of the search strategy. The results of this study illustrate that for the most part, the health-evidence.ca SR search filter outperformed the PH search filter with respect to sensitivity, specificity, precision and NNR in all three databases. However, it was the overall balance among these variables and the fact that high precision could be combined with high sensitivity that made the health-evidence.ca SR search filter the optimal choice for identifying systematic reviews evaluating the effectiveness of interventions.
When compared to other filters in MEDLINE, EMBASE and CINAHL, overall, the health-evidence.ca SR search filter offered the right balance of sensitivity, specificity, precision, and NNR. Although other filters had higher sensitivity scores than the health-evidence.ca SR search filter in MEDLINE, these higher sensitivity scores were generally accompanied by poorer precision and NNR performance. In EMBASE, the health-evidence.ca SR and Scottish Intercollegiate Guidelines Network search filters performed the best overall and were comparable in terms of performance across all of the outcome measures. Likewise in CINAHL, though the health-evidence.ca SR search filter did not outperform other filters on any single outcome, it offered the most robust overall result of high sensitivity and specificity with a reasonably low NNR in comparison to other filters.
The health-evidence.ca SR search filter streamlines the process of locating and screening relevant reviews by allowing us to effectively search health databases with a simpler strategy that maintains a high level of both sensitivity and precision. The task of searching the health databases for every relevant systematic review evaluating effectiveness of public health interventions is a challenging one that requires balance. Because of the growth of the literature in the area of systematic reviews, highly sensitive searches often produce result sets that are unmanageably large. However, if a search is too specific, it risks missing relevant articles. It is important to establish the right balance in the trade-off between sensitivity and specificity depending on what will best serve the purpose at hand [39,40]. Using the health-evidence.ca SR search filter has allowed us to achieve the right balance in our searches by retaining greater than 85% sensitivity across all three databases, while reducing the NNR by two thirds. We estimate that this has translated into a savings of 384 hours of staff time per quarterly update of health-evidence.ca by reducing the hours required to execute database searches, screen results, retrieve full-text versions of potentially relevant reviews, and test reviews for relevance. The reduction has meant that resources are available for the exploration and development of new protocols for searching other relevant but previously unexplored electronic databases covering areas such as environmental health, social welfare, and veterinary sciences for relevant public health content.
The health-evidence.ca SR search filter is an easy-to-use tool. It can be entered into the OVID interface for searching in MEDLINE and EMBASE. Compared to other more complex filters, the health-evidence.ca SR search filter is easily entered. A survey of librarians revealed that users find search strings too long [38,40]. The SR search filter used by health-evidence.ca is relatively short, and other authors have also found that brief search filters work well. Our results, which are similar to those of others [38,39], indicate that methodological search filters can be as or more effective than content filters for retrieving relevant systematic reviews [27][28][29][30][31][32][33][34][35][39]. Using a methodological filter allows us to circumvent the need to generate an accurate and all-encompassing definition of public health that can be translated and applied across indexing systems within different databases. However, if desired, the search strategy can be combined (using Boolean logic, e.g. AND) with topic-specific search terms to reduce the number of articles retrieved for a specific topic area (e.g. influenza).

Limitations
Searching was conducted in OVID's search interface for all three databases; other search interfaces for these databases (e.g. PubMed) may handle the searches somewhat differently. As of August 30, 2008, CINAHL moved from OVID Technologies to be hosted exclusively by EBSCO. Unfortunately, this change to EBSCO renders the CINAHL filters included in this paper, including our filters, out of date. These filters would need to be re-evaluated on the EBSCO platform before being applied. This highlights a key limitation of search filters: creation dates must always be considered before using a filter, as changes to indexing terms and hosting platforms can impact filter function.
The sensitivity scores calculated for each search filter can be applied to broader searches for systematic reviews evaluating various interventions and are not necessarily applicable only to public health interventions. However, precision and NNR scores were calculated specifically for public health content and cannot be generalized to topic areas outside of public health. The low precision scores yielded across all search filters were expected, since precision is generally low when searching large databases [39,40]. Lastly, our group's own manual screening set was used as the gold standard. Although a consistent set of relevance criteria was applied to generate this results set, screening was shared between two authors (MD, KD) and several other members of the health-evidence.ca team. Although either MD or KD acted as second reviewer on each article, there was still potential for reviewer bias through the involvement of a small number of reviewers. Additionally, having a combination of both systematic review methodology indexing terms and public health indexing terms in our PH search filter doubly limited our results sets, retrieving only content which met all requirements for both methodology and public health content.