The art and science of study identification: a comparative analysis of two systematic reviews
BMC Medical Research Methodology volume 16, Article number: 24 (2016)
Systematic reviews (SRs) form the foundation for guidelines and evidence-based policy in medicine and public health. Although similar systematic reviews may include non-identical sets of studies, and it is recognized that different sets of studies may lead to different conclusions, little work has been published on why SR study cohorts differ.
We took advantage of concurrent publication of two SRs on the same topic – prevention of child exposure to tobacco smoke - to understand why study cohorts differed in the two reviews. We identified all studies included in just one review, investigated validity of specified reasons for exclusions, and, using database records, explored reasons for study non-identification. We assessed review methods and discordancy, and attempted to assess whether changes in study cohorts would have changed conclusions.
Sixty-one studies were included in the two reviews. Thirty-five studies were present in just one review; of these, twenty were identified and excluded by the parallel review.
Omissions were due to: review scope (9 studies, 26 %), outcomes of interest not measured (8 studies, 23 %), exclusion of reports with inadequate reporting (6 studies, 17 %), mixed or unclear reasons (3 studies, 8 %), search strategies concerning filters, tagging, and keywords (3 studies, 8 %), search strategies regarding sources (PUBMED not searched) (2 studies, 6 %); discordant interpretation of same eligibility criteria (2 studies, 6 %), and non-identification due to non-specific study topic (2 studies, 6 %). Review conclusions differed, but were likely due to differences in synthesis methods, not differences in study cohorts.
The process of study identification for SRs is part art and part science. While some discrepancies between study cohorts stem from differences in review scope, outcomes measured, or reporting practices, others are caused by search methods or discrepancies in reviewer interpretations. Different study cohorts may or may not be a cause of differing SR results. Completeness of SR study cohorts could be enhanced by (1) independent identification of studies by at least two reviewers, as recommended by recent guidelines; (2) searching PUBMED with free-text keywords, in addition to MEDLINE, to identify recent studies; and (3) using validated search filters.
Systematic reviews (SRs) of the professional literature, sometimes termed the “platinum” standard of evidence , form the foundation for clinical guidelines and evidence-based health policy, thus shaping the policy agenda in medicine and public health [2, 3]. Contrary to narrative reviews, which may be biased due to non-systematic procedures for identifying original studies, SRs enjoy high credibility. Despite their popularity in recent years, enormous importance, and clear advantages over traditional reviews, it is not easy to validate the results of a given SR: there is simply no “gold standard” with which to compare the results. The scientific community has approached this issue by 1 – creating reporting standards [4–6]; 2 – developing tools to validate SR quality  and risk of bias ; and 3 – applying those standards and tools to published SRs [4, 9–16]. A fourth approach, made possible by the proliferation of published SRs on similar topics, is based on empirical comparisons of SR methods, results, and conclusions [17–25].
Performing a systematic review is a complex task, and searching for studies is one of the most difficult aspects. An early version of the Cochrane Handbook [Cochrane Handbook 2006, Section 5.3, p.76] , stated: “Identifying all relevant studies … is… largely what distinguishes a systematic review from a traditional narrative review.” Experience in conducting SRs over the years has shown that identifying all relevant studies was too ambitious; consequently, in recent versions of the handbook, that statement has been modified to read: “Systematic reviews of interventions require a thorough, objective and reproducible search of a range of sources to identify as many relevant studies as possible (within resource limits).”  (Section 184.108.40.206).
Indeed, identifying the entire set of relevant published studies is challenging at best, and may not be possible. In comparing SRs in 1996, Cook found “incomplete identification of relevant studies”. In 17 review sets addressing the same topics, Linde found that the set of included primary studies varied by more than 50 % in 10 review sets. Campbell reported differences in search strategies. Rosen et al., comparing reviews performed by the US Community Guide and Cochrane on tobacco control, found that the US Community Guide did not locate (for reasons other than publication date), on average, 66.1 % of original studies included in the comparative Cochrane reviews, whereas Cochrane did not locate (for reasons other than publication date), on average, 43.5 % of Guide studies. Goodyear compared 2 reviews on a similar topic, and found “imperfectly overlapping studies.” Ford, in his study of 8 meta-analyses on similar topics, found that 6 of the 8 meta-analyses missed relevant studies, and that 5 of the 8 meta-analyses included studies which were ineligible according to the review’s stated inclusion criteria. We are unaware of any previous studies comparing similar systematic reviews which focused on detailed search procedures as well as selection procedures.
On March 24th, 2014, a systematic review and meta-analysis of interventions aimed at protecting children from tobacco smoke exposure was published . One author of this paper (LJR) was the lead author on that paper, and the other author (RS) had performed the literature search. The following week, on March 31, 2014, the Cochrane Collaboration published its third issue of the year, which included a review on the same topic conducted by Baxi et al.  While many of the studies were common to both reviews, some studies appeared in just one review. We attempted to discover, for topics which were common to both reviews, which studies were present in only one review, and find an explanation for the omission in the parallel review. Our primary goal was to identify flaws in the process of article identification in this set of two reviews, and use that information to improve future study identification processes. We also considered whether identifying or including omitted studies would have changed the results or conclusions of the reviews.
In order to compare the sets of studies included in two reviews published almost simultaneously on the topic of child protection from tobacco smoke exposure (Rosen 2014  and Baxi 2014 ), we did the following: 1 - Compared inclusion and exclusion criteria in the two reviews; 2 - Compared the search processes in the two reviews; 3 - Identified which studies were included in just one review; and 4 – Attempted to discover reasons for omission from one review. For studies explicitly excluded by one review, we explored why those studies were included in just one review. For studies which were not identified by one review, we examined possible reasons for non-identification. This included carefully examining records from databases to check dates of entry into the database, and keywords associated with the record.
Information on study identification, inclusion, and exclusion was obtained from the published reviews and from correspondence with the lead author of the Cochrane review. Baxi referred to single or multiple citations, while Rosen referred to a single citation for each study. In our comparison, we used a single citation if there was one. We address issues stemming from multiple citations as necessary.
We compared the following aspects of the two reviews: methods of data synthesis; quality of reviews (AMSTAR); key results and conclusions; discordancy; and possible resultant policy recommendations (GRADE). We also considered whether identification and/or inclusion of omitted studies would have changed conclusions.
Inclusion and exclusion criteria
The differences and similarities between the inclusion criteria are presented in Table 1.
Both study titles focused on child tobacco smoke exposure (TSE) reduction. The definition of objectives as defined in the abstracts of the two reviews was essentially identical. In the text, Baxi specified objectives and outcomes which were broader than Rosen’s: she included parental/caretaker cessation and child health measures in addition to child TSE reduction. The acceptable interventions, study designs, and comparisons were identical in both reviews. Baxi’s acceptable population was broader: Rosen included parents of children aged 0-6, while Baxi included parents, caretakers, and educators of children 0–12. Rosen, but not Baxi, restricted eligibility to studies with follow-up of at least 1 month.
Baxi’s search strategy can be found at: http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD001746.pub3/full#CD001746-sec1-0011. Rosen’s search strategy can be found in Additional file 1. The search process used by each study is summarized in Table 2. Baxi’s review was the third update of previous reviews on this topic. The updated search, most recently conducted in Sept. 2013, was conducted by a librarian and a Cochrane trial search coordinator. Rosen’s review was the second in a series of meta-analyses; the first of which focused on parental cessation [32, 33]. Rosen’s electronic search was conducted by a librarian (RS), most recently in early October, 2013.
Both Baxi and Rosen reported their overall search strategies, and both gave reasons for excluded studies. Rosen presented the flow chart recommended by PRISMA regarding number of records retrieved, scanned, and number of full-text articles read.
Both Baxi and Rosen searched Ovid MEDLINE, but only Rosen also searched Ovid MEDLINE In-Process & Other Non-Indexed Citations. Both used subject headings for their main subject; Rosen also used free-text terms.
Both searched EMBASE and PsycINFO. Baxi, but not Rosen, systematically reviewed reference lists of all included studies, searched the grey literature, and searched CINAHL, ERIC, CENTRAL and the Cochrane Specialised Group Register. Rosen, but not Baxi, searched WoS and added PUBMED to identify articles not included in any segment of Ovid MEDLINE. Baxi used Cochrane’s highly sensitive filter for study design, while Rosen used a non-validated specific filter. Both reviews used published and unpublished information obtained from study authors.
Study identification and exclusion
All studies identified
Sixty-one studies were included in the two reviews. Baxi included 57 studies, and Rosen included 30 studies. Twenty-six studies were included in both reviews (Cochrane Identifier: Abdullah 2005, Baheiraei 2011, Butz 2011, Chan 2006a, Chellini 2013, Conway 2004, Eriksen 1996, Fossum 2004, Greenberg 1994, Groner 2000, Halterman 2011, Hovell 2000, Hovell 2002, Hovell 2009, Irvine 1999, Krieger 2005, McIntosh 1994, Prokhorov 2013, Severson 1997, Stotts 2012 2013, Tyc 2013, Wakefield 2002, Wilson 2001, Wilson 2011, Yilmaz 2006, Zakarian 2004). The Cochrane review provided multiple citations for many of the studies. One study which was included in both reviews was based on information from different reports [34, 35].
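The counts above are linked by simple inclusion-exclusion arithmetic. The following minimal sketch reproduces them from the three reported figures only; the variable names are illustrative and are not taken from either review.

```python
# Counts reported in the text; names are illustrative.
baxi_included = 57
rosen_included = 30
in_both = 26

# Inclusion-exclusion: studies appearing in at least one review.
total_unique = baxi_included + rosen_included - in_both

# Studies appearing in exactly one review.
only_baxi = baxi_included - in_both
only_rosen = rosen_included - in_both
in_one_only = only_baxi + only_rosen

print(f"studies in either review: {total_unique}")
print(f"only in Baxi: {only_baxi}, only in Rosen: {only_rosen}")
print(f"studies in just one review: {in_one_only}")
```

Under these figures, the studies found in just one review split unevenly between the two reviews, which is consistent with Baxi's broader scope.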
Studies identified and excluded
Baxi identified and excluded one study (Lanphear) at the abstract stage, on the basis of goals. The stated reason for the exclusion was that reducing child environmental tobacco smoke exposure was not defined as a primary objective (Personal communication, July 31, 2014). However, in the Introduction, Lanphear included secondhand smoke exposure reduction as an explicit goal. The reason for omission was categorized as “Discordant interpretation of same eligibility criteria.” The remaining 19 studies were excluded by Rosen. One study was excluded at the title stage (Ekerbicer), due to age of children. Three studies were excluded at the abstract stage: one due to age (Culp), one due to outcomes and goals (maternal tobacco consumption) (Nuesslein), and one due to outcomes (Ralston 2013). Of the 15 studies which were excluded by Rosen at the stage of full-text reading, 6 were excluded because there were no relevant outcomes (Chan, Curry, Emmons, Hughes, Ralston 2008, Winickoff). Six studies were excluded because data necessary for the meta-analysis were missing [39, 48, 51, 52, 61, 69]. All but one of these were missing means, and/or standard deviations, and/or sample sizes.
The exclusion of one study after reading the full-text of the reports was based on discordant interpretation for the same eligibility criteria (study design) . This occurred even though both reviewers had identical inclusion criteria for study design. The Borrelli study design involved randomization to 2 active intervention groups. Rosen excluded the study because it was unclear which group to include as the “intervention” and which to include as the “control”.
Rosen excluded one paper (Wahlgren) because it was a follow-up study of a paper by Hovell 1994 which she was planning to include. After the initial exclusion decision on the Wahlgren paper was made, the Hovell trial was excluded because data for the meta-analysis were unavailable from the authors. Had the Wahlgren paper been reconsidered (as it should have been), it would have been rejected because the control group was exposed to intervention materials at the close of the initial intervention period, leading to the probability that the control group was contaminated by exposure to intervention materials. This was categorized as a “mixed” reason for omission.
The Ratner paper  was excluded by Rosen due to lack of a control group. The paper described a small, uncontrolled observational study which was a follow-up of a randomized trial. That study was included by Baxi, who cited 5 papers relating to that trial, which focused on preventing smoking relapse post-partum. The additional reports were not found by, or relevant to, Rosen’s review. This could not be clearly classified as a discrepancy between reviewers, as it is possible that Baxi primarily utilized data from the other quoted studies. Therefore the reason for omission was categorized as “Mixed”.
Studies not identified
Two of the studies not identified by Rosen were screened out automatically by her age filter [44, 70]. Two studies [49, 62] were not found by Baxi because of her search methodology: she did not search PUBMED, and these two publications were available in PUBMED, but not MEDLINE, at the time of search.
The following three articles, which appeared in Baxi’s review, but were not identified by Rosen’s electronic search, were not identified due to issues with filters, index terms or free-text search terms:
The Patel study  was in EMBASE at the time of Rosen’s search. Though the Abstract and Methods sections stated that participants had been randomized to intervention or control groups, and the study design was therefore a randomized controlled trial (RCT), there was no mention of RCT in the Abstract or title. Rather, the authors used the term “prospective follow-up Pilot study” in the Methods Section of the article. The article was not indexed as an RCT.
Rosen’s search filter relied on correct indexing (“randomized controlled trial” or “controlled clinical trial” as Emtree terms). Baxi found Patel in the Cochrane register, where it had been identified through a periodic search of EMBASE.
The Vineis study was missed by Rosen’s MEDLINE search. Vineis called his study a “population-based trial” in the Abstract, and a “non-randomized experimental design” in the Methods section. Participants were assigned to intervention and control groups in a non-random manner. This was therefore a controlled trial, but was not termed as such by the authors.
The Pulley study, which was in PUBMED but not indexed in MEDLINE at the time of the search, was missed because neither the title nor the abstract mentioned that it was an RCT. Rather, it was described as a “longitudinal, quasi-experimental design” in both the Abstract and the Methods section. Confirmation that this was an RCT is in the Methods section, in the statement: “Mother-infant pairs were randomly assigned to the either the control or intervention group” (p.31). In addition, the status of this record in PUBMED is “Pubmed-not-Medline”, which means that this record is not indexed.
Five studies which were included by Baxi but not identified by Rosen addressed topics not included in Rosen’s review (prevention of postpartum relapse to smoking [46, 47, 56, 64]; cessation among young mothers ). Two additional studies which were included by Baxi but not identified by Rosen [36, 67], addressed very general health outcomes, without a stated objective regarding protection of children from tobacco smoke. One of those papers  was identified during a search for papers for a review on another topic. Another was identified in previous versions of the Cochrane review, (Personal correspondence, Ruchi Baxi, 31 July 2014) but not through an electronic search for this review.
Duplicate / unclear study identification
One study which was included by Rosen but not by Baxi  may have been identified and excluded prior to reading the full-text article, but it is unclear (Personal correspondence, Baxi, July 31, 2014).
Another study was included in both reviews, but based on two different reports [34, 35]. Baxi included a conference report  which Rosen did not locate, as she did not search grey literature sources. Rosen included a published paper  which was not identified by Baxi because she did not search PUBMED, and at the time of search the paper was available on PUBMED but not on MEDLINE.
Summary of reasons for omissions
We found that omissions were attributable to: review scope (9 studies, 26 %) [40, 42–44, 46, 47, 56, 64, 70]; outcomes of interest not measured (8 studies, 23 %) [38, 41, 45, 50, 54, 58, 59, 68]; exclusion of reports with inadequate reporting (6 studies, 17 %) [39, 48, 51, 52, 61, 69]; mixed or unclear reasons (3 studies, 8 %) [60, 63, 66]; search strategies concerning filters, tagging, and keywords (3 studies, 8 %) [55, 57, 65]; search strategies regarding sources: PUBMED not searched (2 studies, 6 %) [49, 62]; discordant interpretation of same eligibility criteria (2 studies, 6 %) [37, 53]; and non-identification due to non-specific study topic (2 studies, 6 %) [36, 67].
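The tally above can be checked with a short script. This is a minimal sketch that uses only the study counts given in the text; the dictionary keys are shortened labels chosen for illustration. Shares are printed to one decimal place so that the unrounded values can be compared with the whole-number percentages reported.

```python
# Number of omitted studies per reason, as reported in the text.
# Keys are shortened, illustrative labels.
omission_reasons = {
    "review scope": 9,
    "outcomes of interest not measured": 8,
    "inadequate reporting": 6,
    "mixed or unclear reasons": 3,
    "search filters, tagging, keywords": 3,
    "sources (PUBMED not searched)": 2,
    "discordant interpretation of criteria": 2,
    "non-specific study topic": 2,
}

# The categories should account for every study found in just one review.
total_omitted = sum(omission_reasons.values())

for reason, n in omission_reasons.items():
    share = 100 * n / total_omitted
    print(f"{reason}: {n}/{total_omitted} = {share:.1f} %")

print(f"total omitted studies: {total_omitted}")
```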
Comparison of the two reviews
Data were synthesized differently in the two reviews. Baxi used a narrative approach to synthesizing the studies, noting that this was because of heterogeneity of methodologies and outcome measures; she used the “head-counting” approach to determine how many studies showed statistically significant results. Rosen took a meta-analytic approach. She used the random effects model due to heterogeneity between studies, and standardization to overcome the problem of heterogeneity between outcome measures. Using a narrative approach, as was done by Baxi, and using random effects models, as was done by Rosen, are both considered reasonable solutions to the problem of heterogeneity.
Assessment of quality using AMSTAR
The AMSTAR [7, 73] checklist was used to assess quality of the two SRs. Baxi’s review received a perfect score (11/11). Rosen received a score of 9/11. She lost one point because she did not have a published protocol and one point because she did not search the grey literature.
Comparisons of key results and conclusions
Review conclusions differed for both primary and subgroup analyses.
Baxi: The Results Section of the Abstract reported that “In only 14 of the 57 studies was there a statistically significant intervention effect for child ETS exposure.” In the Authors’ Conclusions Section of the Plain Language Summary, Baxi interpreted this and her other statements to mean that “Although several interventions …. have been used to try to reduce children’s tobacco smoke exposure, their effectiveness has not been clearly demonstrated.”
Rosen: Rosen’s results showed that interventions demonstrated some benefit to intervention participants at follow-up for parentally-reported exposure or protection (PREP outcome) (relative risk 1.12, p < .0001) and number of cigarettes smoked around children by parents at follow-up (P = .03). There was a non-significant trend towards benefit to interventions as measured by biomarkers (RD −0.05, CI −0.13 to 0.03, P = .20). The summary statement in the Conclusions section of the abstract stated: “Interventions to prevent child TSE are moderately beneficial at the individual level.”
Using Moja’s  approach for discordant findings, Rosen’s review could be classified as “efficacious” because there was a statistically significant benefit to the intervention groups when using the parentally-reported measures, or as “mixed” as biochemical measures did not show a statistically significant effect.
The two SRs would probably be considered discordant by Moja’s criteria.
Baxi: Baxi found that “The review was unable to determine if any one intervention reduced parental smoking and child exposure more effectively than others, although seven studies were identified that reported intensive counselling or motivational interviewing provided in clinical settings was effective.” In the Conclusions Section of the text, she stated that “no intervention or setting was clearly more efficacious, and that intensive interventions for parents showed limited success.”
Rosen: In an exploratory subgroup analysis, Rosen found that “Most subgroups showed significant, albeit small, benefit to the interventions.”
Neither author had a very clear statement, as Baxi’s was ambiguous, and Rosen’s was based on exploratory analyses.
Translation into policy recommendations using GRADE. The GRADE system  provides guidance in how to use evidence to make recommendations. There are four possible recommendations: strong against, strong for, weak against, and weak for. Given the positive finding of Rosen on parentally-reported measures, and non-significant finding on biochemical measures, it is unclear what the classification would be. Baxi’s results might result in a “research only” recommendation.
Would identification and/or inclusion of studies omitted have changed review conclusions?
It is not possible to give a definitive answer to this question, but we tried to assess the possible impact of the unidentified studies.
ROSEN: Of the studies not identified by Rosen, three would have been excluded on the basis of inclusion criteria (age: Elder, Zhang; minimum follow-up: Pulley), five on the basis of goals (postpartum relapse: French, Hannover, Phillips, Van’t Hopf; cessation of young mothers: Davis), one due to broad goals with no measured outcomes of interest (Wiggens), and two due to poor reporting (no reports on relevant outcomes at study end for all randomized participants) (Vineis, Patel). One study (Armstrong) had general goals and included relevant data; whether it would have been included would have required a judgement call. Therefore, the maximum difference which could have occurred would have been the addition of one study to one of the four endpoints examined by Rosen. The meta-analysis was rerun, with nearly identical results (RR = 1.13, p < .0001). Therefore, identification of additional studies would not have affected review results.
BAXI: Two studies (Huang and Streja) included by Rosen were not identified by Baxi, an additional one (Teach) may or may not have been identified, and one more (Lanphear) may have been mistakenly omitted. Baxi (personal communication, July 31, 2014) thought that of these studies, only Streja would have been included. It is not clear how this would have been handled by two Cochrane reviewers. The maximum difference in results would come from adding all four studies to the review. Two of the studies, Huang and Teach, showed a statistically significant benefit to the intervention on parentally-reported child exposure or protection (PREP) at study end. Streja did not show a statistically significant benefit on change in PREP. Lanphear found no differences between intervention and control groups on child biomarkers at study end or on numbers of cigarettes smoked around the house.
If the Streja study had been included in the review, then 14/58 instead of 14/57 studies would have been shown to be beneficial. The addition of all 4 studies (the maximum change) would have resulted in statistically significant benefit in 16/61 (26.2 %) of studies instead of 14/57 studies (24.6 %). This would not have changed the main conclusions of the review.
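The head-counting sensitivity check described above amounts to recomputing one proportion under each scenario. The sketch below does this with the counts given in the text; the function name and scenario labels are illustrative, and the two additional significant studies in the final scenario are Huang and Teach, as stated above.

```python
def percent_significant(significant: int, total: int) -> float:
    """Share of studies with a statistically significant effect, in percent."""
    return 100 * significant / total

# (significant studies, total studies) under each scenario from the text.
scenarios = {
    "as published": (14, 57),
    "adding Streja only": (14, 58),        # Streja: no significant benefit
    "adding all four studies": (16, 61),   # Huang and Teach were significant
}

for label, (significant, total) in scenarios.items():
    pct = percent_significant(significant, total)
    print(f"{label}: {significant}/{total} = {pct:.1f} %")
```

In every scenario roughly a quarter of studies show a significant effect, which is why the head-count conclusion is insensitive to the omitted studies.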
The process of identifying studies for inclusion in a systematic review is complex, and involves both electronic and human aspects. Of the sixty-one studies included in the two reviews analyzed in this report, nearly 60 % (35/61) were present in just one review. Of these, over half (20/35), were identified and excluded by the parallel review, while over 40 % were not identified. Most omissions were due to differences in review scope (as expressed in inclusion criteria and search filters), measurement of outcomes, differing requirements for quantitative data, and search issues, including how and which sources were searched. A minority of omissions (2) resulted from discordant reviewer interpretations of identical inclusion criteria. We explore these issues below.
Review scope was an important factor in creating the different study cohorts in these reviews, as expressed in search strategies and inclusion criteria. It has been noted that differing inclusion criteria are among the factors which contribute to SR discordance. In our comparison, omission of studies from one review was sometimes due to differences in inclusion criteria, which were much broader in Baxi’s review than in Rosen’s review. Differences in age criteria (up to age 6 in Rosen, and up to age 12 in Baxi) caused the exclusion of two studies [40, 43] and were likely responsible for the non-identification of two additional studies [44, 70]. The scope of the review was likely also responsible for the non-identification of an additional 5 studies [42, 46, 47, 56, 64], which would have been excluded from Rosen’s review.
Exclusions due to measurement and reporting of outcomes
The Cochrane MECIR best practice guidelines state that neither outcomes nor poor reporting should be used as exclusion criteria. Including such studies in the review, even if some of them will not be used in the meta-analysis, is desirable because it allows readers to judge whether outcome reporting bias exists. The MECIR guideline recommends that “If authors do exclude studies on the basis of outcomes, care should be taken to ascertain that relevant outcomes are not available because they have not been measured rather than simply not reported.”
Missing relevant outcomes caused the exclusion of 8 studies by Rosen [38, 41, 45, 50, 54, 58, 59, 68]. Other studies not included in this comparison were excluded from both Rosen and Baxi due to outcomes. The absence of key data from six studies [39, 48, 51, 52, 61, 69] caused the exclusion of those studies from Rosen’s review, which employed a meta-analytic approach, but not from Baxi’s review, which used a narrative approach to synthesizing the data. Rosen differentiated between studies which didn’t collect outcomes (“no relevant outcomes reported”) and studies which didn’t report on relevant outcomes (“missing data”).
Electronic search issue 1: Search dates, MEDLINE, and PUBMED
Though MEDLINE and PUBMED are sometimes thought to be identical, they are not: According to the U.S. National Library of Medicine, as of May 2014, MEDLINE held over 21 million records, while PUBMED had over 23 million references, including all MEDLINE records and additional records . Some of the 2 million additional references refer to very recent publications which are not yet available in MEDLINE, while others refer to articles in journals not covered by MEDLINE. The reason why articles may appear in PUBMED earlier than MEDLINE is a function of the process of article entry into the two databases. When an article is first published electronically, publishers can upload it immediately to PUBMED, prior to print publication. The record receives the status of "Publisher" in PUBMED records, and at this stage it appears in PUBMED only, and does not yet appear in any segment of MEDLINE . The record goes through two additional stages before it fully enters into MEDLINE: first, the record receives a status of "In-Data Review," during which time the article data are validated; then, the record receives a status of "In Process" until MESH index terms are assigned. At this stage the record can also be found by searching Ovid MEDLINE In-Process database using free-text keywords.
This indexing process accounts for the time lag between appearance of a record in PUBMED and its appearance in MEDLINE. The time lag can be considerable. Duffy  investigated the time lags of two studies and found out that one study included in a SR was available in MEDLINE a month after it appeared in PUBMED, while a second study that was included in an SR did not appear in MEDLINE until six months after it appeared in PUBMED.
Three recent reports were missed by Baxi [35, 49, 62], who searched MEDLINE but not PUBMED, but were found by Rosen through her PUBMED search with free-text search terms. One of these studies was included in abstract form in Baxi’s review. The time period for Stotts 2013 to move from "Publisher" status to "In-Process" and enter MEDLINE was five months; for Huang 2013, it was more than 8 months. Streja entered MEDLINE only a year and a half after being available in PUBMED.
Electronic search issue 2: Indication of study design in title or abstract
In bibliographic databases such as MEDLINE, PUBMED and Embase, searches are done on the title, abstract and subject headings, not on the full-text of the article. When a record is not indexed, the ability to filter for study design is dependent upon correct reporting in the title and/or abstract of the report. One study , a “pubmed-not-medline” record, was missed because neither the title nor the abstract indicated that it was an RCT.
Electronic search issue 3: Index terms, free-text terms, and search filters
Index terms are assigned to articles in MEDLINE (MESH), EMBASE (EMTREE), and other databases. They describe the subject of the study and other parameters such as study design. Study design can serve as a filter to the search when a SR is limited to a certain study design, such as RCT. In this study, two reports were missed by Rosen due to problems with index terms [55, 65].
The indexing process is a complex task in which judgment calls by indexers play an important role. Indexing problems are not uncommon. Crumley  indicates that "For electronic databases, the reason cited most often (67 %) for missed studies was inadequate or inappropriate indexing". Our findings suggest that inadequate indexing is due at least in part to author failure to include study design explicitly in the title or abstract. Both the Cochrane Handbook  and the CONSORT reporting guidelines  require reporting of study design in the title or abstract. Our findings, and information about missed studies in the literature, support these recommendations.
For this reason, the Cochrane Handbook and other guidelines also recommend using free-text terms in addition to index terms for the search process.
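The difference between a filter that relies on index terms alone and one that also uses free-text terms can be illustrated with a toy example. The sketch below is not any review's actual search strategy and is not a validated filter: the `Record` class, the regular expression, and the three sample records are hypothetical, loosely modelled on the kinds of records described above (indexed as an RCT; not indexed but mentioning randomization in the abstract; not indexed and described only as quasi-experimental).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical, simplified bibliographic record."""
    title: str
    abstract: str
    index_terms: set = field(default_factory=set)


# Design index terms named in the text for Rosen's EMBASE filter.
DESIGN_INDEX_TERMS = {"randomized controlled trial", "controlled clinical trial"}

# Illustrative free-text pattern only; NOT a validated search filter.
DESIGN_FREE_TEXT = re.compile(r"\brandom|\bcontrolled trial\b", re.IGNORECASE)


def index_only_filter(rec: Record) -> bool:
    """Retrieve a record only if an indexer assigned a design term."""
    return bool(DESIGN_INDEX_TERMS & {t.lower() for t in rec.index_terms})


def index_or_free_text_filter(rec: Record) -> bool:
    """Also retrieve records whose title or abstract names the design."""
    text = f"{rec.title} {rec.abstract}"
    return index_only_filter(rec) or bool(DESIGN_FREE_TEXT.search(text))


# Hypothetical records, invented for illustration.
records = {
    "indexed as RCT": Record(
        "Counselling to reduce child smoke exposure",
        "Families were allocated to two arms.",
        {"Randomized Controlled Trial"},
    ),
    "not indexed, abstract mentions randomization": Record(
        "A prospective follow-up pilot study",
        "Participants were randomized to intervention or control groups.",
    ),
    "not indexed, no design wording": Record(
        "A home visiting programme",
        "A longitudinal, quasi-experimental design was used.",
    ),
}

for label, rec in records.items():
    print(label, index_only_filter(rec), index_or_free_text_filter(rec))
```

In this toy setting the free-text terms recover the second record but not the third, which mirrors the point that neither approach can retrieve a record whose title and abstract never state the study design.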
In order to help researchers combine their subject search terms with appropriate study design, methodological search filters were developed and tested for their quality. Cochrane developed filters for RCT in MEDLINE, (http://handbook.cochrane.org/ Section 6.4.11) but other organizations developed their own. For example, SIGN, a Scottish guidelines body, develops in-house filters and adapts other organizations' filters according to its needs. SIGN states that its filters "may provide less sensitive searches than used by other systematic reviewers such as The Cochrane Collaboration, but enable the retrieval of medical studies that are most likely to match SIGN's methodological criteria." (http://www.sign.ac.uk/methodology/filters.html#random) BMJ Clinical Evidence presents a different filter for RCT (http://clinicalevidence.bmj.com/x/set/static/ebm/learn/665076.html). CADTH, a Canadian organization that publishes evidence-based medicine research, also develops and maintains its own search filters (https://www.cadth.ca/resources/finding-evidence/strings-attached-cadths-database-search-filters).
Over the years, numerous filters have been developed, and one study compared as many as 38 RCT filters for MEDLINE. These filters differ in how and when they were developed, tested, and validated, and their performance is not always well reported.
Choosing the best filter is difficult, and clear guidance or a tool to help researchers in this complex task is lacking. A recent study suggested that filter performance be presented as a forest plot, to allow reviewers to visualize the benefits. Though there is no consensus on how to choose a filter, using a validated filter is recommended.
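Filter comparisons are commonly summarized in terms of sensitivity (the share of relevant records that the filter retrieves) and precision (the share of retrieved records that are relevant). The following minimal sketch computes both, assuming a gold-standard set of known relevant records is available; the function name and all record identifiers are hypothetical and carry no data from the studies cited above.

```python
def filter_performance(retrieved, relevant):
    """Return (sensitivity, precision) of a search filter against a gold standard."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records the filter actually found
    sensitivity = len(hits) / len(relevant) if relevant else float("nan")
    precision = len(hits) / len(retrieved) if retrieved else float("nan")
    return sensitivity, precision


# Hypothetical gold standard of known RCT records (r*) and irrelevant records (x*).
gold_standard = {"r1", "r2", "r3", "r4"}
broad_filter = {"r1", "r2", "r3", "r4", "x1", "x2", "x3", "x4"}
narrow_filter = {"r1", "r2", "r3", "x1"}

print(filter_performance(broad_filter, gold_standard))   # finds everything, more noise
print(filter_performance(narrow_filter, gold_standard))  # less noise, misses one record
```

In this invented example the broad filter retrieves every relevant record at the cost of screening more irrelevant ones, while the narrow filter reduces screening effort but misses a study, which is the trade-off described in the SIGN statement quoted above.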
The ubiquitous role of judgment calls
Even with clearly defined inclusion and exclusion criteria, and an appropriate search strategy, reviewers must regularly make judgment calls about whether to include or exclude a study. This occurs at all stages of study identification: when scanning records, when reading abstracts, and when reading full-text articles. It has previously been suggested that bias can be “injected” into meta-analysis at the stages of finding studies, selecting studies for inclusion, and extracting data.
In our comparison, two identified studies were omitted on the basis of differences in judgments between reviewers [37, 53]. We did not find reason to suspect bias. Rather, these examples illustrate the complexity of judgment calls.
The responsibility for making reasonable decisions falls on different individuals at different points in the process. Errors may be introduced by those who create the search strategy, which may be under the control of one or more individuals; by authors of original reports, as they decide how to write the report and which data to include; by database indexers; and by reviewers.
Possible effects of missing studies on review results and conclusions
There are several possible effects of omitting studies from systematic reviews. First, such omissions reduce usefulness, because the review is incomplete. Second, such omissions could affect conclusions, particularly if many relevant studies were missed. Even a small number of missed studies (even if their results are in the same direction as those of the found studies) may affect conclusions if the study estimates are synthesized using a meta-analytic approach: fewer studies result in a loss of power to detect a true effect if it exists. Third, such omissions may suggest bias in the results, either intentional (see Goodyear-Smith et al.) or unintentional. The smaller bias in SRs with more comprehensive search strategies has been previously noted.
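The loss of power from missed studies can be made concrete with a simplified calculation. The sketch below assumes a fixed-effect model in which all k studies share the same within-study variance and there is no heterogeneity, so the pooled standard error is the square root of (variance / k). These assumptions, the function name, and the numeric inputs are illustrative only and do not reflect the data or methods of either review.

```python
from math import sqrt
from statistics import NormalDist


def pooled_power(effect, within_var, k, alpha=0.05):
    """Approximate power of a two-sided z-test on a fixed-effect pooled estimate.

    Assumes k studies with identical within-study variance and no heterogeneity.
    """
    std_normal = NormalDist()
    se = sqrt(within_var / k)                  # pooled standard error
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / se                        # true effect in standard-error units
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical true effect and variance; only the trend across k matters here.
for k in (5, 10, 20):
    print(k, round(pooled_power(effect=0.2, within_var=0.09, k=k), 2))
```

Under these assumptions power rises steadily with the number of studies pooled, so omitting studies lowers the chance of detecting a true effect even when the omitted results point in the same direction as the included ones.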
Would the review results have changed with the identification of or inclusion of omitted studies?
Our analyses showed that identification and inclusion of omitted studies would likely not have changed the results of either review. The most likely reason for the discrepant results is the difference in summary methods. Differences in scope may also have played a role, though that is difficult to show, as the summary methods differed. It is well known that one advantage of meta-analytic synthesis over narrative synthesis is that small effects may be detected and quantified. The two reviews dealt with the heterogeneity they found in different ways, and loss of ability to detect true effects was the main downside of the narrative summary approach taken by Baxi.
Strengths and limitations
This work presents a unique approach to exploring why cohorts of original studies in similar systematic reviews differ. Unlike the few previous comparisons which explored reasons for discrepancies in SR cohorts [19, 21, 28], this study focused on the study identification process, and used detailed, date-specific information from MEDLINE, PUBMED, and EMBASE to understand why some studies were missed. The publication of two reviews almost simultaneously on the same topic, with similar dates of searching, allowed the comparison.
Although only two reviews were included in the comparison, we were able to identify several lacunae in the SR study identification and selection process. Because the causes of omission we found are general in nature, they are likely to pose problems in other systematic searches as well. More work is necessary to assess how prevalent these issues are. Adopting the comparative approach used here will likely yield information on other stumbling blocks to identification of all relevant studies for SRs. Combining knowledge from such comparisons could serve as an engine for improved methods to identify relevant studies.
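The first step of such a comparison, separating studies included in only one review into those the other review found but excluded and those it never identified, can be sketched with set operations. The function name, the dictionary keys, and the study identifiers below are hypothetical; "identified" here means any record a review retrieved, whether or not it was ultimately included.

```python
def compare_cohorts(included_a, included_b, identified_a, identified_b):
    """Classify studies by whether each review included, excluded, or never found them."""
    included_a, included_b = set(included_a), set(included_b)
    identified_a, identified_b = set(identified_a), set(identified_b)
    only_a = included_a - included_b
    only_b = included_b - included_a
    return {
        "in_both": included_a & included_b,
        "only_a_excluded_by_b": only_a & identified_b,
        "only_a_not_found_by_b": only_a - identified_b,
        "only_b_excluded_by_a": only_b & identified_a,
        "only_b_not_found_by_a": only_b - identified_a,
    }


# Hypothetical study identifiers, for illustration only.
included_a = {"s1", "s2", "s3", "s4"}
included_b = {"s1", "s2", "s5"}
identified_a = included_a | {"s5"}   # review A found s5 but excluded it
identified_b = included_b | {"s3"}   # review B found s3 but excluded it

result = compare_cohorts(included_a, included_b, identified_a, identified_b)
for category, studies in result.items():
    print(category, sorted(studies))
```

Studies in the "excluded" categories point to differences in scope, eligibility criteria, or reviewer judgment, whereas studies in the "not found" categories point to the search strategy, sources, or indexing.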
Our recommendations for enhancing complete identification of relevant studies for systematic reviews can be found in Table 4 below.
Tobacco smoke exposure
Randomized controlled (or clinical) trial
Stegenga J. Is meta-analysis the platinum standard of evidence? Stud Hist Philos Biol Biomed Sci. 2011;42(4):497–507.
Mulrow CD. Rationale for systematic reviews. BMJ. 1994;309(6954):597–9.
Mullen PD, Ramirez G. The promise and pitfalls of systematic reviews. Annu Rev Public Health. 2006;27:81–102.
Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet. 1999;354(9193):1896–900.
Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 2009;62(10):1006–12.
Hutton B, Salanti G, Chaimani A, Caldwell D, Schmid C, Thorlund K, et al. The Quality of Reporting Methods and Results in Network Meta-Analyses: An Overview of Reviews and Suggestions for Improvement. PLoS One. 2014;9(3):e92508.
Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:10.
Whiting P, Savovic J, Higgins J, Caldwell D, Reeves BC, Shea BJ, et al. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34.
Delaney A, Bagshaw SM, Ferland A, Laupland K, Manns B, Doig C. The quality of reports of critical care meta-analyses in the Cochrane Database of Systematic Reviews: an independent appraisal. Crit Care Med. 2007;35(2):589–94.
Jadad AR, Cook DJ, Jones A, Klassen TP, Tugwell P, Moher M, et al. Methodology and reports of systematic reviews and meta-analyses: a comparison of Cochrane reviews with articles published in paper-based journals. JAMA. 1998;280(3):278–80.
Moher D, Tetzlaff J, Tricco AC, Sampson M, Altman DG. Epidemiology and reporting characteristics of systematic reviews. PLoS Med. 2007;4(3):e78.
Wen J, Ren Y, Wang L, Li Y, Liu Y, Zhou M, et al. The reporting quality of meta-analyses improves: a random sampling study. J Clin Epidemiol. 2008;61(8):770–5.
Willis BH, Quigley M. The assessment of the quality of reporting of meta-analyses in diagnostic research: a systematic review. BMC Med Res Methodol. 2011;11:163.
Chandler J, Churchill R, Higgins J, Lasserson T, Tovey D. Methodological standards for the conduct of new Cochrane Intervention Reviews. The Cochrane Library 2013(2.3). http://editorial-unit.cochrane.org/sites/editorial-unit.cochrane.org/files/uploads/MECIR_conduct_standards%202.3%2002122013_0.pdf
Jadad AR, Cook DJ, Browman J. A guide to interpreting discordant systematic reviews. Can Med Assoc J. 1997;156:1411–6.
Moja L, Fernandez del Rio MP, Banzi R, Cusi C, D'Amico R, Liberati A, et al. Multiple systematic reviews: methods for assessing discordances of results. Intern Emerg Med. 2012;7(6):563–8.
Alperson SY, Berger VW. Opposing systematic reviews: the effects of two quality rating instruments on evidence regarding t'ai chi and bone mineral density in postmenopausal women. J Altern Complement Med. 2011;17(5):389–95.
Campbell J, Bellamy N, Gee T. Differences between systematic reviews/meta-analyses of hyaluronic acid/hyaluronan/hylan in osteoarthritis of the knee. Osteoarthritis Cartilage. 2007;15(12):1424–36.
Cook DJ, Reeve BK, Guyatt GH, Heyland DK, Griffith LE, Buckingham L, et al. Stress ulcer prophylaxis in critically ill patients. Resolving discordant meta-analyses. JAMA. 1996;275(4):308–14.
Katerndahl DA, Lawler WR. Variability in meta-analytic results concerning the value of cholesterol reduction in coronary heart disease: a meta-meta-analysis. Am J Epidemiol. 1999;149(5):429–41.
Linde K, Willich SN. How objective are systematic reviews? Differences between reviews on complementary medicine. J R Soc Med. 2003;96(1):17–22.
Rosen LJ, Ben Noach M. Systematic reviews on tobacco control from Cochrane and the Community Guide: different methods, similar findings. J Clin Epidemiol. 2010;63(6):596–606.
Thacker SB, Stroup DF. Methods and interpretation in systematic reviews: commentary on two parallel reviews of epidural analgesia during labor. Am J Obstet Gynecol. 2002;186(5 Suppl Nature):S78–80.
Bown MJ, Sutton AJ. Quality control in systematic reviews and meta-analyses. Eur J Vasc Endovasc Surg. 2010;40(5):669–77.
Ford A, Guyatt G, Talley N, Moayyedi P. Errors in the conduct of systematic reviews of pharmacological interventions for irritable bowel syndrome. Am J Gastroenterol. 2010;105(2):280–8.
Higgins JPT, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions 4.2.6 [updated September 2006]. Chichester, UK: The Cochrane Library, Issue 4. John Wiley & Sons, Ltd.; 2006.
Higgins J, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration; 2011.
Goodyear‐Smith F, van Driel M, Arroll B, Del Mar C. Analysis of decisions made in meta-analyses of depression screening and the risk of confirmation bias: A case study. BMC Med Res Methodol. 2012;12:76.
Rosen LJ, Myers V, Hovell M, Zucker D, Ben Noach M. Meta-analysis of parental protection of children from tobacco smoke exposure. Pediatrics. 2014;133(4):698–714.
Baxi R, Sharma M, Roseby R, Polnay A, Priest N, Waters E, et al. Family and carer smoking control programmes for reducing children's exposure to environmental tobacco smoke. Cochrane Database Syst Rev. 2014;3:CD001746.
Andrews J, Guyatt GH, Oxman AD. GRADE guidelines: 14. Going from evidence to recommendations: the significance and presentation of recommendations. J Clin Epidemiol. 2013;66(7):719–25.
Rosen LJ, Noach MB, Winickoff JP, Hovell MF. Parental smoking cessation to protect young children: a systematic review and meta-analysis. Pediatrics. 2012;129(1):141–52.
Rosen LJ. E-letter: Revised tables re: Parental Smoking Cessation to Protect Young Children: A Systematic Review and Meta-analysis. Pediatrics. 2013.
Stotts A, Northrup T, Green C, Evans P, Tyson J, Hovell M: The Baby's Breath Project: A pilot trial to reduce secondhand smoke exposure in high respiratory risk infants in the neonatal intensive care unit (POS1-69). In: Society for Research on Nicotine and Tobacco 18th Annual Meeting March 13-16, 2012, Houston, Texas. 2012: 60.
Stotts AL, Green C, Northrup TF, Dodrill CL, Evans P, Tyson J, et al. Feasibility and efficacy of an intervention to reduce secondhand smoke exposure among infants discharged from a neonatal intensive care unit. J Perinatol. 2013;33(10):811–6.
Armstrong KL, Fraser JA, Dadds MR, Morris J. Promoting secure attachment, maternal mood and child health in a vulnerable population: A randomized controlled trial. J Paediatr Child Health. 2000;36(6):555–62.
Borrelli B, McQuaid EL, Novak SP, Hammond SK, Becker B. Motivating Latino Caregivers of Children With Asthma to Quit Smoking: A Randomized Trial. J Consult Clin Psychol. 2010;78(1):34–43.
Chan SSC, Lam TH, Salili F, Leung GM, Wong DCN, Botelho RJ, et al. A randomized controlled trial of an individualized motivational intervention on smoking cessation for parents of sick children: A pilot study. Appl Nurs Res. 2005;18(3):178–81.
Chilmonczyk BA, Palomaki GE, Knight GJ, Williams J, Haddow JE. An unsuccessful cotinine-assisted intervention strategy to reduce environmental tobacco smoke exposure during infancy. Am J Dis Child. 1992;146(3):357–60.
Culp AM, Culp RE, Anderson JW, Carter S. Health and safety intervention with first-time mothers. Health Educ Res. 2007;22(2):285–94.
Curry SJ, Ludman EJ, Graham E, Stout J, Grothaus L, Lozano P. Pediatric-based smoking cessation intervention for low-income women: A randomized trial. Arch Pediatr Adolesc Med. 2003;157(3):295–302.
Davis SW, Cummings KM, Rimer BK, Sciandra R, Stone JC. The impact of tailored self-help smoking cessation guides on young mothers. Health Educ Q. 1992;19(4):495–504.
Ekerbicer HC, Celik M, Guler E, Davutoglu M, Kilinc M. Evaluating environmental tobacco smoke exposure in a group of Turkish primary school students and developing intervention methods for prevention. BMC Public Health. 2007;7.
Elder JP, Perry CL, Stone EJ, Johnson CC, Yang M, Edmundson EW, et al. Tobacco use measurement, prediction, and intervention in elementary schools in four states: The CATCH study. Prev Med. 1996;25(4):486–94.
Emmons KM, Hammond SK, Fava JL, Velicer WF, Evans JL, Monroe AD. A randomized trial to reduce passive smoke exposure in low-income households with young children. Pediatrics. 2001;108(1):18–24.
French GM, Groner JA, Wewers ME, Ahijevych K. Staying smoke free: An intervention to prevent postpartum relapse. Nicotine Tob Res. 2007;9(6):663–70.
Hannover W, Thyrian JR, Roske K, Grempler J, Rumpf HJ, John U, et al. Smoking cessation and relapse prevention for postpartum women: Results from a randomized controlled trial at 6, 12, 18 and 24 months. Addict Behav. 2009;34(1):1–8.
Herbert RJ, Gagnon AJ, O'Loughlin JL, Rennick JE. Testing an empowerment intervention to help parents make homes smoke-free: a randomized controlled trial. J Community Health. 2011;36(4):650–7.
Huang CM, Wu HL, Huang SH, Chien LY, Guo JL. Transtheoretical model-based passive smoking prevention programme among pregnant women and mothers of young children. Eur J Public Health. 2013;23(5):777–82.
Hughes DM, McLeod M, Garner B, Goldbloom RB. Controlled trial of a home and ambulatory program for asthmatic children. Pediatrics. 1991;87(1):54–61.
Kallio K, Jokinen E, Hamalainen M, Kaitosaari T, Volanen I, Viikari J, et al. Impact of repeated lifestyle counselling in an atherosclerosis prevention trial on parental smoking and children's exposure to tobacco smoke. Acta Paediatr, Int J Paediatr. 2006;95(3):283–90.
Kimata H. Cessation of passive smoking reduces allergic responses and plasma neurotrophin. Eur J Clin Invest. 2004;34(2):165–6.
Lanphear BP, Hornung RW, Khoury J, Yolton K, Lierl M, Kalkbrenner A. Effects of HEPA air cleaners on unscheduled asthma visits and asthma symptoms for children exposed to secondhand tobacco smoke. Pediatrics. 2011;127(1):93–101.
Nuesslein TG, Struwe A, Maiwald N, Rieger C, Stephan V. Maternal tobacco consumption can be reduced by simple intervention of the paediatrician. Klin Padiatr. 2006;218(5):283–6.
Patel S, Hendry P, Kalynych C, Butterfield R, Lott M, Lukens-Bull K. The impact of third-hand smoke education in a pediatric emergency department on caregiver smoking policies and quit status: A pilot study. Int J Disabil Human Dev. 2012;11(4):335–42.
Phillips RM, Merritt TA, Goldstein MR, Deming DD, Slater LE, Angeles DM. Prevention of postpartum smoking relapse in mothers of infants in the neonatal intensive care unit. J Perinatol. 2012;32(5):374–80.
Pulley KR, Flanders-Stepans MB. Smoking hygiene: an educational intervention to reduce respiratory symptoms in breastfeeding infants exposed to tobacco. J Perinat Educ. 2002;11(3):28–37.
Ralston S, Roohi M. A randomized, controlled trial of smoking cessation counseling provided during child hospitalization for respiratory illness. Pediatr Pulmonol. 2008;43(6):561–6.
Ralston S, Grohman C, Word D, Williams J. A randomized trial of a brief intervention to promote smoking cessation for parents during child hospitalization. Pediatr Pulmonol. 2013;48(6):608–13.
Ratner PA, Johnson JL, Bottorff JL. Mothers' efforts to protect their infants from environmental tobacco smoke. Can J Public Health. 2001;92(1):46–7.
Schonberger HJAM, Dompeling E, Knottnerus JA, Maas T, Muris JWM, van Weel C, et al. The PREVASC study: The clinical effect of a multifaceted educational intervention to prevent childhood asthma. Eur Respir J. 2005;25(4):660–70.
Streja L, Crespi CM, Bastani R, Wong GC, Jones CA, Bernert JT, et al. Can a minimal intervention reduce secondhand smoke exposure among children with asthma from low income minority families? Results of a randomized trial. J Immigr Minor Health. 2014;16(2):256–64.
Teach SJ, Crain EF, Quint DM, Hylan ML, Joseph JG. Improved asthma outcomes in a high-morbidity pediatric population: results of an emergency department-based randomized clinical trial. Arch Pediatr Adolesc Med. 160(5):535–41.
Van't Hof SM, Wall MA, Dowler DW, Stark MJ. Randomised controlled trial of a postpartum relapse prevention intervention. Tob Control. 2000;9 Suppl 3:III64–6.
Vineis P, Ronco G, Ciccone G, Vernero E, Troia B, D'Incalci T, et al. Prevention of exposure of young children to parental tobacco smoke: Effectiveness of an educational program. Tumori. 1993;79(3):183–6.
Wahlgren DR, Hovell MF, Meltzer SB, Hofstetter CR, Zakarian JM. Reduction of environmental tobacco smoke exposure in asthmatic children: A 2-year follow-up. Chest. 1997;111(1):81–8.
Wiggins M, Oakley A, Roberts I, Turner H, Rajan L, Austerberry H, et al. Postnatal support for mothers living in disadvantaged inner city areas: A randomised controlled trial. J Epidemiol Community Health. 2005;59(4):288–95.
Winickoff JP, Healey EA, Regan S, Park ER, Cole C, Friebely J, et al. Using the postpartum hospital stay to address mothers' and fathers' smoking: The NEWS study. Pediatrics. 2010;125(3):518–25.
Woodward A, Owen N, Grgurinovich N, Griffith F, Linke H. Trial of an intervention to reduce passive smoking in infancy. Pediatr Pulmonol. 1987;3(3):173–8.
Zhang D, Qiu X. School-based tobacco-use prevention - People's Republic of China, May 1989-January 1990. JAMA. 1993;269(23):2972.
Hovell MF, Meltzer SB, Zakarian JM, Wahlgren DR, Emerson JA, Hofstetter CR, et al. Reduction of environmental tobacco smoke exposure among asthmatic children: a controlled trial. Chest. 1994;106(2):440–6.
Borenstein M, Hedges L, Higgins J, Rothstein H. Introduction to Meta-Analysis. West Sussex, UK: Wiley; 2009.
AMSTAR Checklist [http://amstar.ca/Amstar_Checklist.php]
Fact Sheet: MEDLINE, PubMed, and PMC (PubMed Central): How are they different? [Available from: http://www.nlm.nih.gov/pubs/factsheets/dif_med_pub.html]
MEDLINE®/PubMed® Data Element (Field) Descriptions. [http://www.nlm.nih.gov/bsd/mms/medlineelements.html#stat]
Duffy S, Misso K, Noake C, Ross J, L. S: Supplementary searches of PubMed to improve currency of MEDLINE and MEDLINE In-Process searches via OvidSP. Kleijnen Systematic Reviews Ltd, York. In: UK InterTASC Information Specialists’ Sub-Group (ISSG) Workshop. Exeter: UK; 2014.
Crumley ET, Wiebe N, Cramer K, Klassen TP, Hartling L. Which resources should be used to identify RCT/CCTs for systematic reviews: a systematic review. BMC Med Res Methodol. 2005;5:24.
CONSORT 2010 checklist of information to include when reporting a randomised trial [http://www.equator-network.org/reporting-guidelines/consort/]
McKibbon KA, Wilczynski NL, Haynes RB. Retrieving randomized controlled trials from medline: a comparison of 38 published search filters. Health Inf Libr J. 2009;26(3):187–202.
Jenkins M. Evaluation of methodological search filters—a review. Health Inf Libr J. 2004;21(3):148–63.
Harbour J, Fraser C, Lefebvre C, Glanville J, Beale S, Boachie C, et al. Reporting methodological search filter performance comparisons: a literature review. Health Inf Libr J. 2014;31(3):176–94.
Felson D. Bias in meta-analytic research. J Clin Epidemiol. 1992;45(8):885–92.
We gratefully acknowledge the assistance of the Flight Attendant Medical Research Institute in this research (Award # 072086_YCSA). We thank Ruchi Baxi and Lindsay Stead for providing unpublished details about the search process in the Cochrane review. We thank Vicki Myers-Gamliel for statistical assistance with the revised meta-analysis.
The authors have no competing interests.
The study was conceived of and designed by LJR and RS. LJR compared eligibility criteria, analyzed reasons for omission, drafted most of the manuscript, and approved the final manuscript. RS carefully analyzed search procedures, contributed to the writing, critically edited the entire manuscript, and approved the final manuscript.
Rosen, L., Suhami, R. The art and science of study identification: a comparative analysis of two systematic reviews. BMC Med Res Methodol 16, 24 (2016). https://doi.org/10.1186/s12874-016-0118-2
- Systematic reviews
- Evidence-based decision making
- Electronic searching
- Tobacco smoke exposure
- Tobacco control