Selection of systematic reviews
We sought Cochrane reviews that used at least one phase of the HSSS to identify randomized controlled trials (RCTs) in MEDLINE. To be eligible, each Cochrane review also had to i) report the citations for included and excluded studies and ii) be a review of RCTs or quasi-RCTs. The Cochrane Database of Systematic Reviews (CDSR) 3rd Quarter 2002 was searched through the Ovid interface using the search string (hsss or highly sensitive search).tw. to identify potential studies. Two reviewers assessed each systematic review against the eligibility criteria and resolved any conflicts through consultation. SRS™, a web-based platform for conducting systematic reviews, was used for all screening and data extraction.
The size of the MEDLINE retrieval was determined by replicating and running the MEDLINE search strategy. Cochrane reviews with a MEDLINE retrieval of 1000–6000 records were selected for testing the performance of the Ultraseek search engine ranking.
One librarian extracted descriptive data about the eligible reviews. The following elements were recorded: the number of included studies and the number of excluded studies cited in the review; the number of included studies indexed in MEDLINE; the number of excluded studies indexed in MEDLINE; date of the MEDLINE search reported in the body of the review; level of detail in which the search was reported; phases of the HSSS used; searching techniques employed (such as thesaurus terms, term explosion, free text terms, truncation, adjacency operators); and restrictions such as date, language of publication, age groups or methodological filters. Electronic databases searched as well as other sources used (such as checking reference lists or contacting authors or manufacturers) were also recorded.
A known-item search was undertaken in MEDLINE for each included and excluded study listed in the review. A single librarian (MS) completed the searching using the Ovid interface for MEDLINE 1966-April 2003. The indexing status of each study was recorded as indexed or not indexed, and for each review, the set of included studies was aggregated using OR statements, as was a set of excluded studies. Each set was downloaded for subsequent analysis.
The Ovid bibliographic records for all studies retrieved from MEDLINE by the replicated search were also downloaded. When the review reported the size of the MEDLINE retrieval, we compared it with the size of our retrieval to validate the replication. Where our search result was smaller than that reported in the review, the review was excluded as irreproducible.
Search engine configuration
Produced by Verity, Ultraseek was originally a successful web search engine and is now focused on helping businesses manage their digital information. The Ultraseek search engine (Version 5.0) was selected on the basis of its ability to deal with meta-data and assign weights to various fields. As we were dealing with indexed records and indexers have the benefit of access to the full text of the document, we anticipated that relevance ranking could be optimized by assigning greatest weight to terms appearing in indexing fields, intermediate weight to terms appearing in the title field, and lowest weight to terms appearing in the abstract field, following Hutchinson. By comparison, in Boolean searching, each condition in the search is assigned a weight of 1 if present (i.e., the item is retrieved), and 0 if absent (i.e., the item is not retrieved).
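The contrast between field-weighted relevance and Boolean retrieval can be illustrated with a minimal sketch. This is not Ultraseek's actual scoring algorithm, which is not described here. The weight values, the field names (`mesh`, `title`, `abstract`), the sample record and the substring matching rule are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical weights: indexing terms highest, title intermediate, abstract lowest.
FIELD_WEIGHTS: Dict[str, float] = {"mesh": 3.0, "title": 2.0, "abstract": 1.0}


def weighted_score(record: Dict[str, str], terms: List[str],
                   weights: Dict[str, float] = FIELD_WEIGHTS) -> float:
    """Add a field's weight once for each search term found in that field."""
    score = 0.0
    for term in terms:
        needle = term.lower()
        for field, weight in weights.items():
            # Case-insensitive substring match (an assumption, not Ultraseek's rule).
            if needle in record.get(field, "").lower():
                score += weight
    return score


def boolean_match(record: Dict[str, str], terms: List[str]) -> int:
    """Boolean AND: 1 if every term occurs somewhere in the record, else 0."""
    text = " ".join(record.values()).lower()
    return int(all(term.lower() in text for term in terms))


# Hypothetical bibliographic record, for illustration only.
record = {
    "mesh": "Warts; Cryotherapy",
    "title": "Cryotherapy for common warts",
    "abstract": "A randomized trial of topical treatment.",
}
terms = ["warts", "cryotherapy", "random"]
print(weighted_score(record, terms), boolean_match(record, terms))
```

Under these assumed weights, a term found in the indexing field contributes three times as much as the same term found only in the abstract, whereas the Boolean function reports only whether the record is retrieved.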
The Ultraseek search engine indexes "collections". The bibliographic records associated with each systematic review were treated as a collection. Bibliographic records were downloaded from MEDLINE into Reference Manager databases, and tagged according to their inclusion status in the review. A Reference Manager output format was created to write each record with HTML tags. Three sets of fields were written as meta-data – MeSH headings, title and abstract (see sample record, Appendix 1). A Perl script was used to separate the HTML tagged bibliographic records into individual files. File names encoded the ID number of the review, whether the record was included or excluded from the review, and the reference ID number within that review, in the form http://10included3.html. Thus the collection consisted of the bibliographic records re-written as HTML files tagged with meta-data. The search engine was installed on a laptop computer where the collections resided. The search engine indexed all records in the collection. When a search was run against the collection, the number of items with relevance greater than zero was returned, along with a list of up to the first 500 relevant items, sorted by relevance.
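The file naming convention can be sketched as follows. This is a Python illustration rather than the Perl script used in the study. The function names are hypothetical, and the exact status labels (`included`, `excluded`) are inferred from the single example file name given above.

```python
import re
from typing import Tuple

# Pattern inferred from the example "10included3.html":
# review ID, inclusion status, reference ID within the review.
FILENAME_PATTERN = re.compile(r"^(\d+)(included|excluded)(\d+)\.html$")


def build_filename(review_id: int, status: str, ref_id: int) -> str:
    """Encode review ID, inclusion status and reference ID in a file name."""
    if status not in ("included", "excluded"):
        raise ValueError(f"unexpected status: {status!r}")
    return f"{review_id}{status}{ref_id}.html"


def parse_filename(name: str) -> Tuple[int, str, int]:
    """Recover (review ID, status, reference ID) from a file name."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return int(match.group(1)), match.group(2), int(match.group(3))


print(build_filename(10, "included", 3))
print(parse_filename("10included3.html"))
```

Encoding the inclusion status in the file name means that the status of each ranked item can be read directly from the search output, without a separate lookup table.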
The Ultraseek search engine was configured to provide weights to the meta-data fields – index terms, title and description (abstract). When the weights given to the meta-data fields were varied in preliminary testing, the relevance scores changed, but the order of items, which was the variable of interest, did not. Thus, the search engine was configured with all elements equally weighted, and the collections were indexed.
For each eligible review, one member of the research team (MS) identified subject terms to be entered into the Ultraseek search. In exploratory work, it became apparent that the number of tied relevance scores depended largely on the number of terms entered. Thus we decided to standardize our Ultraseek searches at 7 terms, the minimum number that seemed to reduce ties to a workable number.
We also established that the order in which terms were entered influenced the final relevance score. Terms were therefore entered in order of perceived importance (see Table 3 for examples). A final eighth term, "random*", was included in each search. The asterisk is the truncation symbol used with Ultraseek.
Terms were selected to describe the topic of the review, focusing usually on the intervention, but in some cases on the population. A number of reviews studied a constellation of interventions for a single condition, such as interventions for warts. In those cases, the interventions may have been determined by reviewers post-hoc, so terms focused on the condition – warts – rather than on any interventions in the review. When the reviewers reported challenges in study identification, for instance, identifying injuries caused by distance running, versus running in other contexts, such as playing soccer, we attempted to address that difficulty in the selected terms.
Once the terms were defined, they were entered into the search box of the basic interface of the Ultraseek search engine. Terms were entered in lowercase text and truncated, and each collection was searched. Search outputs were saved for subsequent analysis.
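The standardized query format described above (seven truncated, lowercase subject terms in order of perceived importance, followed by "random*") can be sketched as below. The function name, the space-separated join and the example stems are assumptions for illustration. The stems are not taken from Table 3, and choosing where to truncate each term is left to the searcher.

```python
from typing import List

REQUIRED_TOPIC_TERMS = 7   # standardized number of subject terms per search
METHOD_TERM = "random*"    # final eighth term included in every search


def build_query(topic_stems: List[str]) -> str:
    """Lowercase and truncate each stem, keep the given order, append 'random*'."""
    if len(topic_stems) != REQUIRED_TOPIC_TERMS:
        raise ValueError(
            f"expected {REQUIRED_TOPIC_TERMS} subject terms, got {len(topic_stems)}"
        )
    # Order is preserved because term order influenced the relevance score.
    terms = [stem.strip().lower().rstrip("*") + "*" for stem in topic_stems]
    terms.append(METHOD_TERM)
    # Joining with spaces is an assumption about the basic search box syntax.
    return " ".join(terms)


# Hypothetical stems, most important first.
stems = ["Wart", "verruca", "cryotherap", "salicyl", "topical", "treat", "therap"]
print(build_query(stems))
```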
We examined i) the rankings of included studies within a collection comprising the entire MEDLINE retrieval and ii) the ranking of included studies where only the studies listed as included or excluded in the Cochrane report comprised the collection. As we were concerned that the search engine might be optimized to place highly relevant items in the top few items with less exact ranking further back in the pack, we also examined the precision of the top 10 rankings.
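Precision of the top 10 rankings can be computed as in the following minimal sketch. The function name and the record identifiers are hypothetical. Dividing by the number of items actually returned when fewer than 10 are available is one convention among several and is an assumption here.

```python
from typing import List, Set


def precision_at_k(ranked_ids: List[str], included_ids: Set[str], k: int = 10) -> float:
    """Proportion of the top-k ranked records that are included studies."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    hits = sum(1 for record_id in top if record_id in included_ids)
    # Assumption: divide by the number of items returned, which is k
    # whenever at least k items were ranked.
    return hits / len(top)


# Hypothetical ranking of 11 records, best first; three are included studies.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"]
included = {"a", "c", "k"}
print(precision_at_k(ranked, included))
```

In this example, two of the three included studies fall in the top 10, so precision at 10 is 2/10; the third, ranked eleventh, does not count.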
When testing the initial MEDLINE retrieval, recall was determined as the proportion of included studies ranking within the top 500. We compared the proportion falling within the top 500, based on their relevance rank, with the proportion expected if ranking were random. When testing listed included and excluded studies, the rankings were analyzed with a Wilcoxon rank sum test using SAS for exact permutations, in order to best handle the small data sets and frequent ties.
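Both comparisons can be illustrated with a short sketch. It is written in Python rather than SAS and is not the procedure used in the study. It assumes that, under random ranking, the expected proportion of included studies in the top 500 equals 500 divided by the retrieval size, and it implements a one-sided exact permutation version of the rank sum test with midranks for ties. The one-sided alternative (included studies score higher) and all function names are assumptions.

```python
from itertools import combinations
from typing import List, Sequence


def expected_random_recall(retrieval_size: int, cutoff: int = 500) -> float:
    """Expected share of included studies in the top `cutoff` if ranking were random."""
    return min(1.0, cutoff / retrieval_size)


def midranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = average
        i = j + 1
    return ranks


def exact_rank_sum_p(included: Sequence[float], excluded: Sequence[float]) -> float:
    """One-sided exact p-value: chance that a random relabeling gives the
    included group a rank sum at least as large as the one observed."""
    pooled = list(included) + list(excluded)
    ranks = midranks(pooled)
    n_included = len(included)
    observed = sum(ranks[:n_included])
    as_extreme = total = 0
    # Enumerate every way of choosing which records are labeled "included".
    for combo in combinations(range(len(pooled)), n_included):
        total += 1
        if sum(ranks[i] for i in combo) >= observed - 1e-9:
            as_extreme += 1
    return as_extreme / total


# Hypothetical relevance scores, for illustration only.
print(expected_random_recall(2000))
print(exact_rank_sum_p([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
```

Full enumeration is feasible only for small collections, which is the situation described above; for larger sets a Monte Carlo sample of permutations would be a common substitute.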