BMC Medical Research Methodology, Open Access
Identifying Observational Studies of Surgical Interventions in MEDLINE and EMBASE

Background: Health technology assessments of surgical interventions frequently require the inclusion of non-randomised evidence. Literature search strategies employed to identify this evidence often exclude a methodological component because of uncertainty surrounding the use of appropriate search terms. This can result in the retrieval of a large number of irrelevant records. Methodological filters would help to minimise this, making literature searching more efficient.


Background
When assessing the safety and efficacy or effectiveness of health technologies it may not be appropriate to restrict the evidence to randomised controlled trials (RCTs) [1]. There are many situations where it may be considered necessary to include evidence from observational studies such as non-randomised comparative and case series studies [1]. In a survey of health technology assessments undertaken for the UK National Institute for Health and Clinical Excellence (NICE), 30% were found to include case series evidence [1]. A review of the included evidence for six systematic reviews, commissioned by NICE for their Interventional Procedures Program (IPP), found that 86% of the data came from case series with less than 5% from RCTs [2]. The most common reason given for inclusion of case series data was the absence of sufficient data from randomised evidence to assess effectiveness or safety outcomes. Indeed, in some cases, it was the only form of evidence available.
Searching for randomised evidence is relatively straightforward with the introduction of several initiatives to aid retrieval: the CENTRAL database of trials in The Cochrane Library; appropriate indexing terms in MEDLINE and EMBASE; and published highly sensitive filters [3][4][5][6]. There is also some evidence that the introduction of the CONSORT statement is associated with better reporting of randomised trials in the titles and abstracts and hence facilitates both searching and indexing [7].
Searching for non-randomised evidence of safety and efficacy or effectiveness from primary studies, however, is more problematic and there has been little published research to date. Indexing terms are less well established [8] and, when they do exist, are used inconsistently [9]. The reporting of methodological detail is often poor in observational studies and this contributes to problems in indexing and searching effectively [1].
The uncertainty in identifying appropriate search terms for non-randomised evidence has meant that a methodology component is often excluded from search strategies. This can lead to an inefficient use of valuable resources in terms of the time involved in screening the titles and abstracts of a large number of irrelevant records. For the purposes of this study, health technology assessments and systematic reviews commissioned by NICE and published on their website as of October 2005 were surveyed. Seventy-seven health technology assessments from the main NICE programme and the seven systematic reviews carried out for the IPP were reviewed. Twenty-eight (36.4%) of the technology assessments and all seven (100%) of the IPP reviews included non-randomised evidence. Of these 35 reports, 31 (88.6%) used no methodology filter in their search strategies, while two reviews of diagnostic interventions used diagnostic filters and two strategies used adverse events filters to identify supplementary safety data.
Efficient searching for the evidence for health technology assessments requires an effective filter. The filter should maintain the sensitivity (or recall) of the original subject-only search, retrieving the same relevant studies (cell A of Table 1). In addition, the filter should reduce the number of irrelevant records retrieved (decrease in cell B and increase in cell D of Table 1), and hence increase the precision and specificity of the search. While precision measures the number of relevant retrievals (A) as a proportion of the total number retrieved (A+B), specificity measures the proportion of the non-reference standard (B+D) that is not retrieved (D). An effective filter, then, would maintain the sensitivity of the original subject search while maximising precision and specificity.
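The three retrieval parameters can be written out as a minimal Python sketch. The cell names follow Table 1 as described in the text; the meaning of cell C (relevant records not retrieved) is inferred from the text, and all counts below are hypothetical values chosen only for illustration, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalTable:
    """2x2 retrieval table; cell letters follow Table 1 as described in the text."""
    a: int  # relevant (reference standard) records retrieved
    b: int  # irrelevant records retrieved
    c: int  # relevant records not retrieved (assumed meaning of cell C)
    d: int  # irrelevant records not retrieved

    def sensitivity(self) -> float:
        # Proportion of the reference standard that is retrieved.
        return self.a / (self.a + self.c)

    def precision(self) -> float:
        # Proportion of all retrieved records that are relevant.
        return self.a / (self.a + self.b)

    def specificity(self) -> float:
        # Proportion of non-reference-standard records that are not retrieved.
        return self.d / (self.b + self.d)


# Hypothetical counts: a subject-only search, then the same search AND a filter.
subject_only = RetrievalTable(a=50, b=150, c=0, d=0)
filtered = RetrievalTable(a=50, b=90, c=0, d=60)

for name, table in (("subject only", subject_only), ("with filter", filtered)):
    print(name, table.sensitivity(), table.precision(), table.specificity())
```

In this illustration the filter moves 60 irrelevant records from cell B to cell D, so sensitivity is unchanged while precision and specificity both rise, which is the behaviour an effective filter is meant to show.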
The aim of this study was to develop effective MEDLINE and EMBASE filters, to identify non-randomised evidence for surgical interventions, to be used in conjunction (using Boolean operator AND) with a subject search strategy.

Establishment of a reference standard
In a recent systematic review to assess the effectiveness and safety of laser in-situ keratomileusis (LASIK), the MEDLINE and EMBASE search strategies used to identify the evidence had incorporated terms that pertained only to the intervention and medical conditions of interest to the review and were restricted to the publication years 2000-2004 [10]. The MEDLINE and EMBASE strategies were run simultaneously as a multi-file search in Ovid and the results de-duplicated using the Ovid de-duplication tool. The retrieved titles and abstracts were screened and all reports within the scope of the topic that appeared to be randomised controlled trials, non-randomised comparative studies and case series were identified. Both prospective and retrospective studies were included. The full papers of these reports were acquired and study design assessed by one experienced reviewer. Any uncertainty was resolved by consultation with another reviewer involved in the review. The non-randomised studies constituted the reference standard for this study, against which the new methodology filters could be developed and assessed. (Subsequently for the purposes of the systematic review, smaller non-randomised and case series studies were excluded.)

Identification of the candidate terms for the filters
The titles, abstracts, thesaurus controlled subject headings (for MEDLINE and EMBASE) and the publication type field (for MEDLINE) were subjectively assessed by the information specialist for all the MEDLINE and EMBASE records of the reference standard. Terms that explained or gave an indication of the methodology employed or of systematic assessment of postoperative sequelae were identified and considered as candidate terms for the MEDLINE and EMBASE methodology filters. Each term was combined individually with the original MEDLINE or EMBASE subject search strategy (using the Boolean operator AND), and the sensitivity (proportion of the reference standard retrieved), precision (proportion of the total retrieved that were included in the reference standard) and specificity (proportion of non-reference standard studies that were not retrieved) of each candidate term were calculated.
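The per-term calculation can be illustrated with sets of record identifiers. This is a sketch only: the function name `term_metrics` and the record numbers are hypothetical, and it assumes the reference standard is a subset of the subject-only search results.

```python
def term_metrics(subject_hits: set, term_hits: set, reference: set):
    """Return (sensitivity, precision, specificity) for one candidate term.

    subject_hits: records retrieved by the subject-only search
    term_hits:    records matching the candidate term anywhere in the database
    reference:    reference standard records (assumed subset of subject_hits)
    """
    retrieved = subject_hits & term_hits           # subject search AND term
    relevant_retrieved = retrieved & reference
    non_reference = subject_hits - reference
    irrelevant_excluded = non_reference - retrieved

    sensitivity = len(relevant_retrieved) / len(reference)
    precision = len(relevant_retrieved) / len(retrieved) if retrieved else 0.0
    specificity = (
        len(irrelevant_excluded) / len(non_reference) if non_reference else 0.0
    )
    return sensitivity, precision, specificity


# Hypothetical record identifiers, for illustration only.
subject_hits = set(range(1, 11))   # ten records from the subject-only search
reference = {1, 2, 3, 4}           # four of them form the reference standard
term_hits = {1, 2, 5, 20}          # record 20 lies outside the subject search

sens, prec, spec = term_metrics(subject_hits, term_hits, reference)
print(sens, prec, spec)
```

Here the term retrieves records 1, 2 and 5 within the subject search, so it finds two of the four reference records, two of its three retrievals are relevant, and five of the six irrelevant records are excluded.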

Development of the filters
Initially all the candidate terms for each database were combined, using the Boolean operator OR, to form the separate MEDLINE and EMBASE filters. These were run in combination with the subject-only search strategies (using the Boolean operator AND). Each candidate term was then tested to establish if its removal from the filter reduced overall sensitivity. If sensitivity was unaffected the term was considered redundant and was excluded from further analysis, while if sensitivity decreased the term was re-instated. To minimise the number of irrelevant records retrieved, two approaches were explored:
1. The terms were tested in order of precision, beginning with the lowest so that preference was given to retaining the terms with higher precision. The resulting MEDLINE and EMBASE filters are referred to as the Precision Terms Filters.
2. The terms were tested in order of specificity, beginning with the lowest so that preference was given to retaining the terms with higher specificity. The resulting MEDLINE and EMBASE filters are referred to as the Specificity Terms Filters.
By this process of elimination, redundant terms were removed and the combination of retained terms aimed to minimise the number of retrieved irrelevant records. Four filters were thus developed: Precision Terms Filters for both MEDLINE and EMBASE, and Specificity Terms Filters for both MEDLINE and EMBASE.
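The elimination procedure described above can be sketched as a simple backward-elimination loop. This is an illustrative reading of the text, not the authors' actual implementation: the function names, the hit sets and the scores are hypothetical, and each term's hits are assumed to be already restricted to the subject-only search results.

```python
def filter_hits(terms, term_hits):
    """Records retrieved when the given terms are combined with OR."""
    hits = set()
    for term in terms:
        hits |= term_hits[term]
    return hits


def prune_filter(term_hits, reference, score):
    """Drop terms whose removal does not reduce sensitivity.

    Terms are tested in ascending order of `score` (precision or
    specificity), so lower-scoring terms are the first to be discarded.
    """
    retained = list(term_hits)

    def recalled(terms):
        return len(filter_hits(terms, term_hits) & reference)

    for term in sorted(term_hits, key=lambda t: score[t]):
        trial = [t for t in retained if t != term]
        if recalled(trial) == recalled(retained):
            retained = trial   # redundant: sensitivity unaffected
        # otherwise the term stays in the filter (re-instated)
    return retained


# Hypothetical hit sets and specificity scores, for illustration only.
term_hits = {
    "evaluat$": {1, 2, 3, 9},
    "reviewed": {3, 4},
    "broad_term": {1, 2, 3, 4, 7, 8, 9, 10},   # hypothetical low-specificity term
}
reference = {1, 2, 3, 4}
specificity = {"broad_term": 0.2, "evaluat$": 0.8, "reviewed": 1.0}

pruned = prune_filter(term_hits, reference, specificity)
print(pruned)
```

In this example the broad term is tested first and removed, because the two remaining terms still retrieve all four reference records; neither of the other two can then be removed without losing a reference record. Passing precision scores instead of specificity scores gives the other ordering described in the text.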

Assessing the performance of the filters
The subject-only MEDLINE and EMBASE search strategies were run with (using Boolean operator AND) and without the resulting filters. The total number of records and the number of reference standard records that were retrieved were used to calculate the retrieval parameters. The Specificity Terms Filters for MEDLINE and EMBASE and the Precision Terms Filters for MEDLINE and EMBASE were also run simultaneously in multi-file MEDLINE and EMBASE searches, the results de-duplicated using the Ovid de-duplication tool, and retrieval parameters calculated.

Validation of the filters
The performance of the preferred filters was tested against two validation standards. These comprised the included non-randomised studies from two other reviews: a systematic review of photorefractive keratectomy (PRK) for myopia [10] and a review of electrosurgery for tonsillectomy [11]. The original MEDLINE and EMBASE strategies used to find the evidence for these reviews were subject-only searches and did not include any methodology filters. The PRK search strategy covered publications from 2000 to 2004 and the tonsillectomy strategy covered 1990 to 2004.
The validation standards were incomplete in comparison to our reference standard because they did not include all non-randomised studies that met the inclusion criteria in terms of intervention and medical condition. Some non-randomised studies had been excluded at the screening stage because they did not fulfil other criteria, such as sample size, or, in the case of the tonsillectomy review, because they were retrospective. These studies were not readily identifiable for inclusion in this study. The validation standards, therefore, underestimated the total number of non-randomised studies that were identified from the subject-only searches (cells A and C in Table 1) and falsely assigned these non-randomised studies to the non-validation standards (cells B and D of Table 1). Because of these false assignments it was not possible to calculate the search parameters accurately. The original search strategies were run with and without the inclusion of the new filters, and performance was compared in terms of the proportion of the validation standards that were retrieved and the reduction in the number of retrievals.

Description of reference standard
The reference standard comprised 217 articles.
There was considerable similarity in retained text words between the strategies with three common to all four search filters (chang$, evaluat$ and reviewed) and two text words included in three of the strategies (baseline, compare$ or compara$). The term preoperat$ or pre operat$.mp was also included in three strategies.
The performances of the filters are detailed in Table 6. While all the EMBASE and combined MEDLINE/EMBASE filters achieved 100% sensitivity, the two MEDLINE filters achieved a sensitivity of 99.5%, with both failing to retrieve the same article. A detailed examination of this MEDLINE record failed to find any words in the title or abstract, or any index term, that gave an indication of methodology [12]. The Specificity Terms Filter again performed marginally better. This was a consistent finding for both databases separately as well as in combination. The filters substantially reduced the number of articles retrieved, with reductions varying from 30.0% to 37.9%.
Once again the Specificity Terms Filter showed marginally superior performance, reducing the number retrieved from 1564 to 972 for MEDLINE and from 1521 to 1016 for EMBASE. Given the consistent finding that the Specificity Terms Filter performed marginally better than the Precision Terms Filter, the former was chosen as the preferred filter and was tested against the validation standards.

Validation of filters
The MEDLINE and EMBASE Specificity Terms Filters were run against the two validation standards and the results are presented in     text word consecutive while the excluded tonsillectomy record contained the text word retrospective and was inaccurately indexed with the term Retrospective study, since all the tonsillectomy studies had been assessed as being prospective in design.
Both the MEDLINE and EMBASE filters resulted in a substantial reduction in the number of retrievals, but the reduction was greatest for the MEDLINE filter (PRK: 33.7% vs 21.9% and tonsillectomy: 39.6% vs 29.8%). Using the filters in combination in a multi-file search resulted in a reduction of 28.4% (from 792 to 567) in retrievals for the PRK search and 30.1% (from 1712 to 1196) for the tonsillectomy search.
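The percentage reductions quoted for the combined multi-file searches follow directly from the before and after retrieval counts. A short check, using only the counts reported in the text (the function name `percent_reduction` is introduced here for illustration):

```python
def percent_reduction(before: int, after: int) -> float:
    """Reduction in retrievals as a percentage of the unfiltered total."""
    return 100.0 * (before - after) / before


# Counts reported in the text for the combined MEDLINE/EMBASE searches.
prk = percent_reduction(792, 567)              # PRK validation search
tonsillectomy = percent_reduction(1712, 1196)  # tonsillectomy validation search

print(round(prk, 1), round(tonsillectomy, 1))
```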

Discussion
In a review of methodological search filters, Jenkins describes the identification of a gold or quasi-gold standard as the set of relevant records against which filters are assessed [19]. Typically this is achieved by hand searching a set of journals. Our study, however, was carried out in conjunction with a health technology assessment, where we used the results of screening titles and abstracts from a subject-only search. We therefore recognise that this may have failed to pick up some relevant records, either because the subject search failed to retrieve them, because they were missed during the screening process due to error, or because the titles and abstracts failed to provide sufficient information or gave misleading information. For these reasons we called our set a reference standard, rather than the gold standard.
Ideally the gold or reference standard should be representative of all indexed records to ensure that the resulting filter has generalizability. This can be a limitation of using a gold standard based on hand searching a particular set of journals [19]. While our reference standard was not limited in this way, having been derived from an electronic search of all MEDLINE and EMBASE for particular years, there may still be bias in that our set was confined to those journals that publish papers on refractive surgery. Furthermore our reference set was limited to the publication years 2000-2004 and may have benefited from better indexing in these years [19]. Some support for these limitations is evident from the performance of the filters on the validation sets. The MEDLINE and EMBASE filters, run separately, performed better on the PRK set (refractive surgery and same time frame) than on the tonsillectomy set (otolaryngology and includes an earlier time period).
Reporting of studies in the titles and abstracts infrequently used explicit terms that describe study design. Terms such as case series, cohort, observational, non-random and non comparative (including variations of these terms) appeared in only a small proportion of records and hence had low sensitivity. The exception was the term compare$ or compara$.
Terminology that was used in the abstracts was often non-explicit, giving an indication of general systematic assessment. The terms of this kind retained in the filters were evaluat$, reviewed, chang$, consecutive$ and preoperat$. The use of structured abstracts, to improve explicit reporting of methodology in the abstract, would facilitate text word searching and could assist in more effective indexing.
The search filters were developed using an objective approach. The criterion for inclusion of terms was based on each term's ability to exclude irrelevant articles rather than ability to retrieve relevant ones. This approach aimed to include those terms with highest specificity or precision that in combination produced maximum sensitivity irrespective of the sensitivity of the individual terms. The performances of the Specificity and Precision Terms Filters were similar with the former being consistently marginally better in terms of overall precision and specificity for both databases separately and in combination, while sensitivity was the same. The similar performance is not unexpected given that the majority of the search terms were common to both filters. For MEDLINE 10 out of 12 terms were included in both filters while for EMBASE, 7 were the same out of 8 in the Precision Terms Filter and out of 9 in the Specificity Terms Filter.
The validation standards were pragmatically derived, comprising the included studies in two other health technology assessments. Although both sets contained a small number of records, they were independent of the reference standard. The MEDLINE and EMBASE filters retrieved between 85.2% and 100% of the validation standards. These retrieval rates would have been improved with the addition of the terms post operative and case control for MEDLINE and consecutive and retrospective for EMBASE (although only prospective studies had been included), but these additions would have reduced the precision of the searches. The one MEDLINE record that did not have an abstract or any methodology-related terms would only have been retrieved if no filter had been used.
In combination, however, when both MEDLINE and EMBASE were searched, 100% retrieval was achieved for both validation standards, while reducing the number of retrievals by the same order as found for the reference standard. When comprehensive searching is required, as is usually the case for health technology assessments or systematic reviews, searching both MEDLINE and EMBASE is generally undertaken. The results of the validation would therefore suggest that the deficiencies in sensitivity of the individual filters would be considerably reduced when used in the context of multi-database searching.
The resulting filters were developed and tested on sets of studies relating to surgical interventions. With the excep-