Identifying nurse staffing research in Medline: development and testing of empirically derived search strategies with the PubMed interface

Background: The identification of health services research in databases such as PubMed/Medline is a cumbersome task. This task becomes even more difficult if the field of interest involves the use of diverse methods and data sources, as is the case with nurse staffing research, which investigates the association between nurse staffing parameters and nursing and patient outcomes. A comprehensively developed search strategy may help identify nurse staffing research in PubMed/Medline.

Methods: A set of relevant references in PubMed/Medline was identified by means of three systematic reviews. This development set was used to detect candidate free-text and MeSH terms. The frequency of these terms was compared to a random sample from PubMed/Medline in order to identify terms specific to nurse staffing research, which were then used to develop a sensitive, a precise and a balanced search strategy. To determine their precision, the newly developed search strategies were tested against a) the pool of relevant references extracted from the systematic reviews, b) a reference set identified from an electronic journal screening, and c) a sample from PubMed/Medline. Finally, all newly developed strategies were compared to PubMed's Health Services Research Queries (PubMed's HSR Queries).

Results: The sensitivities of the newly developed search strategies were almost 100% in all three test sets; precision ranged from 6.1% to 32.0%. PubMed's HSR Queries were less sensitive (83.3% to 88.2%) than the new search strategies. Only minor differences in precision were found (5.0% to 32.0%).

Conclusions: As with other literature on health services research, nurse staffing studies are difficult to identify in PubMed/Medline. Depending on the purpose of the search, researchers can choose between high sensitivity, i.e. retrieval of a large number of references, and high precision, i.e. an increased risk of missing relevant references. More standardized terminology (e.g. consistent use of the term "nurse staffing") could improve the precision of future searches in this field. Empirically selected search terms can help to develop effective search strategies. The high consistency between all test sets confirmed the validity of our approach.


Background
PubMed/Medline contains more than 18 million references. The identification of relevant literature in this wide-ranging source is of great importance to researchers, both for remaining up to date with the latest developments in their field of interest and for conducting comprehensive literature reviews. "Search filters are collections of search terms intended to capture frequently sought research methods, such as randomized controlled trials, or aspects of health care" [1]. While this definition covers methods filters for common research methods such as randomized controlled trials (RCTs) [2][3][4][5] and systematic reviews [3,[6][7][8][9], the identification of relevant literature in fields with less standardized methods, such as health services research, remains a cumbersome task. Furthermore, methods filters need to be complemented with terms describing the topic of interest in order to identify the relevant literature. The development of this topic-specific part of a search strategy usually consists of an arbitrary selection of terms. Few studies have been conducted with the aim of identifying this topic-specific part systematically [10][11][12]. An approach guiding the selection of relevant terms could help researchers develop search strategies in a more objective and systematic manner for both topic- and methods-related searches.
Nurse staffing research investigates the association between nurse staffing parameters and nursing and patient outcomes [13]. The basic question in nurse staffing research is which nurse-to-patient ratios result in high-quality patient care. Although previous research in this field has largely been observational in nature, a wide range of statistical methods and data sources are used [14], which makes it difficult to identify the relevant literature effectively. Empirically tested search strategies support the identification of literature in an effective and efficient manner [1,15], and are used in searches conducted in the production of systematic reviews and in the creation of automatic e-mail updates with PubMed's My NCBI.
To date, the development of empirically tested search strategies has been focused on identifying certain study types, such as RCTs and systematic reviews. Most research on search filters has tested the developed search strategy against a defined set of references from a hand search (gold standard) or other systematic reviews (quasi-gold standard) [1,[15][16][17][18][19][20][21][22][23][24]. An approach based on a set of relevant references identifying appropriate terms and then testing the developed strategy against several test sets could be used for search strategy development in general, beyond its sole use in the development of methods filters.
In the context of systematic reviews, the number of relevant references on a given topic in a database is a matter of particular interest. An estimate of the number of relevant references in the database could be used for resource planning purposes within the framework of comprehensive systematic reviews.
The four aims of this study were to: 1) develop empirically derived sensitive, precise and balanced search strategies for nurse staffing research in PubMed/Medline; 2) test these strategies against several reference sets; 3) compare their performance with PubMed's Health Services Research Queries; and 4) estimate the overall number of relevant references in PubMed/Medline.

Methods

Search strategies solely aimed at sensitivity or precision target the extremes of the inverse relationship between these two parameters. A balanced strategy attempts to achieve both aims: high sensitivity without losing too much precision, and vice versa. Balancing is based on the iterative addition and removal of parts of the search strategy to find a compromise between sensitivity and precision. However, this balance is not precisely defined and remains a vague concept. The development process employed four sets of references to define and test the strategies. Two of these, a development set and a population set, were used to develop the search strategies.
The development set was used to identify and evaluate the sensitivity of free-text terms (title, abstract) and Medical Subject Heading (MeSH) terms. The development set consisted of a pool of 78 relevant papers from PubMed/Medline, identified in three relevant systematic reviews investigating the association between nurse staffing and patient outcomes [13,25,26]. Systematic reviews have previously been used to identify relevant references for search filter development [24]. Well-conducted systematic reviews employ comprehensive searches in various databases and are often complemented by hand searches. A set of references created by merging relevant references from different systematic reviews can be assumed to represent the total population of relevant references. The selection of systematic reviews was not based on a systematic search but on a priori knowledge of the field. The systematic reviews were selected because, to our knowledge, they employed the most comprehensive searches so far targeting nurse staffing research [13,25]. Only those studies critically appraised and included in the systematic reviews and available in PubMed/Medline were incorporated in the development set.
A population set consisting of a random sample of PubMed/Medline references was used to compare the frequency of the most sensitive terms from the development set with their frequency in the overall PubMed/Medline population. For the sampling procedure, we limited a PubMed/Medline search (using an empty search field) to the last 12 months (12/2007 to 12/2008) and saved the retrieval results as a PMID list. From this list a random sample of 10,000 references was drawn. References in the population set were not screened for relevance; all were assumed to be non-relevant.
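The sampling step can be sketched in a few lines of Python. The function name and the file format (one PMID per line) are our own assumptions; the paper only states that the retrieval results were saved as a PMID list.

```python
import random

def draw_population_sample(pmid_file, n=10000, seed=42):
    """Draw a random sample of PMIDs from a saved PubMed retrieval list.

    Assumes one PMID per line (hypothetical file format). A fixed seed
    makes the sample reproducible.
    """
    with open(pmid_file) as fh:
        pmids = [line.strip() for line in fh if line.strip()]
    rng = random.Random(seed)
    # random.sample draws without replacement, so no PMID appears twice
    return rng.sample(pmids, n)
```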
A text-mining approach was used to identify potentially relevant free-text terms from the development set. The analysis was computed with the tm package [27] in R [28], which is a statistical computing language and graphics environment.
The tm package creates a term-document matrix consisting of all terms used in a set of references (78 in this case) and expresses the frequency of each term in each reference. The PubMed/Medline references in the development set contained 1,779 terms. Terms present in at least five percent of the references (359 candidate terms) of the development set were selected for additional exploration. To further decrease the number of candidate terms, the 25 most overrepresented terms from the development set compared to the population set were used to develop the free-text part of the search strategies in PubMed/Medline. "Overrepresented" was defined as the most widely used terms in the 78 references of the development set that were prevalent in 2% or fewer references of the population set. Table 1 shows the prevalence of these 25 terms in both data sets.
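The term-selection logic can be illustrated with a small sketch in plain Python (rather than R's tm package); the thresholds follow the paper, while all function and variable names are our own.

```python
from collections import Counter

def candidate_terms(dev_docs, pop_docs, min_dev_frac=0.05, max_pop_frac=0.02):
    """Select terms frequent in the development set but rare in the population set.

    Each document is a string (e.g. title plus abstract). Document frequency
    counts a term at most once per document, mirroring a binary
    term-document matrix.
    """
    def doc_freq(docs):
        df = Counter()
        for doc in docs:
            df.update(set(doc.lower().split()))
        return df

    dev_df, pop_df = doc_freq(dev_docs), doc_freq(pop_docs)
    n_dev, n_pop = len(dev_docs), len(pop_docs)
    # keep terms present in at least 5% of development references
    # and at most 2% of population references
    return sorted(
        term for term, count in dev_df.items()
        if count / n_dev >= min_dev_frac
        and pop_df.get(term, 0) / n_pop <= max_pop_frac
    )
```

A generic term such as "patients" is frequent in both sets and is therefore filtered out, while a topic-specific term such as "staffing" survives.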
The text-mining approach applied worked reliably only for single-word terms. MeSH terms often consist of multiple words, including special characters, which led to unexpected results. Due to this technical constraint, a simplified approach was applied to identify the 20 most frequent MeSH terms for use in the search strategy: terms were selected on the basis of their frequency in the development set and their relevance to the research question.
The final stage of the search strategy development consisted of iterative queries in PubMed/Medline comparing different combinations of free-text and MeSH terms in order to develop the three search strategies: 1) sensitive, 2) precise, and 3) balanced between sensitivity and precision. The identification of the most effective combination of terms for the most sensitive, precise or balanced strategy is still performed manually. However, on the basis of the pre-selection of relevant and specific terms, this process is considerably shorter than a non-empirically informed development process. Table 2 shows the developed search strategies in PubMed/Medline. We provide a single-line syntax for PubMed (Table S1) and an untested syntax for OVID SP Medline (Table S2) as Additional file 1.
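For illustration, such a strategy combines free-text terms (searched in title/abstract with the [tiab] field tag) and MeSH terms (the [mh] tag), joined with OR for a sensitive variant or AND for a more precise one. The terms below are a hypothetical sketch, not the actual strategies from Table 2:

```text
("nurse staffing"[tiab] OR "nurse-to-patient"[tiab])
OR
("Personnel Staffing and Scheduling"[mh] OR "Nursing Staff, Hospital"[mh])
```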

Testing the search strategies
The newly developed search strategies (Table 2) were tested against three reference sets: the development, precision, and journal screening sets. All tests were conducted with the PubMed interface. PubMed/Medline was chosen for its free accessibility. The search development and testing were conducted in December 2008.
The following formulas were used for the calculation of sensitivity (1), precision (2), and the number needed to read (3):

sensitivity = relevant references retrieved / all relevant references in the test set (1)
precision = relevant references retrieved / all references retrieved (2)
number needed to read (NNR) = 1 / precision (3)

The development set consisted of 78 relevant references from the three reviews. As we did not screen the retrieved references in PubMed/Medline for relevance, we calculated precision based on the conservative assumption that all additionally retrieved references were not relevant.

For the precision set, all search strategies tested (Sensitive, Precise, Balanced, PubMed HSR Sensitive, PubMed HSR Precise) were connected with the OR operator and limited to the time frame between 1982 and 2006. A random sample of 2,195 references was drawn from the 35,708 retrieved records and screened for relevance. This set was used to determine a less biased estimate of the precision of the search strategies and to estimate the overall number of relevant references. The estimation was based on the assumption that a joint search including all search strategies (with a sensitivity of up to 1.00) should capture all relevant studies in PubMed/Medline for the given time frame. Following this assumption, it is possible to estimate the number of relevant references in PubMed/Medline using the precision set. The relevant references of the precision set overlapped with the development set, except for one reference. This overlap can be expected for two reasons: 1) both sets target the same time frame, and 2) the development set was based on three comprehensive searches, which potentially captured all relevant records in this time frame.
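The performance measures and the scaling-up step translate directly into code; the sketch below uses invented counts for illustration only, not figures from the study.

```python
def sensitivity(relevant_retrieved, relevant_total):
    """(1) Share of all relevant references that the strategy retrieves."""
    return relevant_retrieved / relevant_total

def precision(relevant_retrieved, total_retrieved):
    """(2) Share of retrieved references that are relevant."""
    return relevant_retrieved / total_retrieved

def number_needed_to_read(prec):
    """(3) References a user must screen per relevant reference found."""
    return 1 / prec

def estimated_relevant(total_retrieved, sample_size, relevant_in_sample):
    """Scale the relevant count in a screened random sample up to the full
    retrieval, under the paper's assumption that the joint search captured
    (nearly) all relevant records in the time frame."""
    return total_retrieved * relevant_in_sample / sample_size
```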
The journal screening set was based on the assessment of all available abstracts of three journals relevant to nurse staffing research (Medical Care, Health Services Research, and Journal of Nursing Administration; issues 2006 to 2008; 1,274 references). The selection of journals was based on the frequency of relevant articles in each journal in the development set. The time frame of the latest search in the systematic reviews and the journal screening overlapped by six months, which resulted in one paper being included in both sets and two papers not being identified by the systematic reviews; we assume this was caused by the delay in full indexing in PubMed/Medline. There was no overlap of references between the journal screening and precision set.
The references retrieved from the precision and journal screening set were independently assessed for relevance by two of the authors (MS, SK). Eligibility criteria were based on the three systematic reviews (Table 3). Inconsistencies in the classification of references as relevant or non-relevant were resolved by consensus.
The performance of the newly developed search strategies was compared to PubMed's sensitive and precise special queries for outcomes in health services research (PubMed HSR Queries) [29]. PubMed's HSR Queries are methods filters targeting health services research, including nurse staffing research. These were combined with the topic-specific terms from the newly developed strategies. Figure 1 outlines the development process and the testing of the search strategies.

Performance of the newly developed strategies
The sensitive search strategy captured almost 100% of the relevant references in all test sets (Table 4), while the precise strategy captured between 6.1% and 32.0%. To identify a relevant paper from the retrieved references of the sensitive strategy, users would need to screen 297 references (NNR), while the precise strategy detected one relevant reference in every three.

Table 3 Inclusion and exclusion criteria for the precision and journal screening set

Inclusion criteria:
• Studies investigating the association between staffing (e.g. nurse-to-patient ratio or work hours per patient or patient day) and a) nursing outcomes (e.g. job satisfaction, nurse vacancy rate, nurse turnover rate, nurse retention rate) or b) patient outcomes (e.g. mortality, adverse drug events, nurse quality outcomes, length of stay, patient satisfaction with nursing care)

Exclusion criteria:
• Studies not published in English
• Studies including a target population of outpatients and patients in long-term care facilities
• Studies with no information relevant to nurse staffing policies and strategies
• Studies examining the contributions of advanced practice nurses (nurse practitioners, nurse clinicians, certified nurse midwives, nurse anesthetists)
• Administrative reports and single-hospital studies that did not include control comparisons and did not test an associative hypothesis
• Systematic or non-systematic reviews
• Editorials, letters, non-original research

Discussion
The search strategies developed performed well in terms of sensitivity, with the expected pay-off for precision and vice versa. Depending on the objective of the search, all three strategies are suitable for specific purposes such as the use of sensitive strategies in systematic reviews or e-mail alerts. All strategies were assessed against three different test sets. For the measurement of sensitivity, the development and journal screening sets produced similar test results, while the results of the precision set showed greater differences. We assume these differences were caused by the insufficient sample size of the precision set. Although 2,195 references do not appear to be a small sample size, for a given prevalence of 0.0027% of relevant references in the PubMed/Medline population, the sample is still small. Precision ranged from 0.2% to 6.1% in the development set, 0.3% to 14.7% in the precision set, and 8.1% to 32.0% in the journal screening set. Although the ranges varied considerably between the test sets, the overall pattern in the comparison of the search strategies remained consistent: the precise strategies performed better than the balanced strategies and sensitive strategies. Conceptually, the precision set was the closest to the true population. Even with 2,195 screened references, this approach lacked the accuracy to differentiate between strategies in terms of sensitivity. However, precision derived from this set was not hampered by the small sample size and produced less biased estimates for the PubMed/Medline population than the development and journal screening set.
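The sample-size concern can be made concrete with a confidence interval for a proportion estimated from few events. The sketch below uses a standard Wilson score interval with example counts of our own choosing; it is not a calculation reported in the paper.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a proportion. It stays within [0, 1]
    and remains informative even when the number of successes is very small."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# With only a handful of relevant references in a sample of 2,195, the
# interval is wide relative to the point estimate itself, which is why
# sensitivity comparisons based on such a sample are imprecise.
lo, hi = wilson_interval(5, 2195)
```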
The comparison of PubMed's HSR Queries with the newly developed strategies shows advantages for the latter. This favourable assessment could be expected, given the broader scope of the HSR Queries. The comparison should therefore be considered primarily as a validation of the developed strategies and of the test sets used.
One of the strengths of the population set is the possibility of making inferences about the overall PubMed/Medline population and calculating the expected number of relevant references on the topic of interest. However, it should be taken into account that this estimate is based on the Medline references within PubMed/Medline and therefore ignores a small percentage of non-Medline references. Although limited by wide confidence intervals, the estimate, unlike those from the development and journal screening sets, allows an inference about the overall number of relevant papers.
In addition to the aims outlined, the study employed a development process for search strategies with some features that might be useful to search strategy development in general. While the identification of candidate terms and the testing of a strategy against a development set have previously been done in research on methods filters, in our opinion the population and precision sets employed here are unique features of this study. These sets allow search strategy developers (1) to select terms that are not only frequently used in relevant publications but also specific to the topic of interest, and (2) to obtain more realistic precision estimates for the PubMed/Medline population. The frequency of a term should not be the sole criterion for its selection for a search strategy. For example, the term "patients" is present in 65% of the references in the development set, but also in 77% of the population set, indicating a lack of specificity for the topic of interest. The population set enables the developer to preselect specific terms in order to develop sensitive and precise searches.
Although the development process described could support the development of performance-oriented search strategies, in general some limitations apply to this study and the generalizability of the process.
We assumed that the selected systematic reviews used for building the development set are the most comprehensive reviews in the topic area. However, we cannot rule out that other reviews containing additional relevant references exist. The search filters developed require references to be fully indexed in Medline and might not be able to fully capture citations in-process; this applies to many search filters [30] and also limits the use of search strategies as e-mail-update filters.
An untested search strategy without MeSH terms is provided in Additional file 1 (Table S3).

Conclusions
As with other literature on health services research, nurse staffing studies are difficult to identify in PubMed/Medline. Even though sensitive search strategies result in a high level of sensitivity, the considerable number of non-relevant references is a burden. Depending on the purpose of the search, researchers can choose between high sensitivity or high precision, i.e. retrieval of a large number of references or an increased risk of missing relevant references, respectively. More standardized terminology (e.g. by consistent use of the term "nurse staffing") could improve the precision of future searches in this field.
The described development process for an empirical search strategy is a useful, though technically demanding, approach to building performance-oriented strategies. The similar sensitivities of the tested strategies in the development and journal screening sets confirm the validity of this approach. The precision set can be used to provide more realistic precision estimates and to calculate the expected number of relevant references in the population set.

Additional material
Additional file 1: Single-line syntax for PubMed (Table S1) and untested syntax for OVID SP Medline (Table S2). The single-line syntax of the search strategies for PubMed and the OVID SP syntax (untested) are provided as a convenience to the reader.