Disagreement in primary study selection between systematic reviews on negative pressure wound therapy

Background: Primary study selection between systematic reviews is inconsistent, and reviews on the same topic may reach different conclusions. Our main objective was to compare systematic reviews on negative pressure wound therapy (NPWT) with respect to their agreement in primary study selection.

Methods: This retrospective analysis was conducted within the framework of a systematic review (a full review and a subsequent rapid report) on NPWT prepared by the Institute for Quality and Efficiency in Health Care (IQWiG). For the IQWiG review and rapid report, 4 bibliographic databases (MEDLINE, EMBASE, The Cochrane Library, and CINAHL) were searched to identify systematic reviews and primary studies on NPWT versus conventional wound therapy in patients with acute or chronic wounds. All databases were searched from inception to December 2006. For the present analysis, reviews on NPWT were classified as eligible systematic reviews if multiple sources were systematically searched and the search strategy was documented. To ensure comparability between reviews, only reviews published in or after December 2004 and only studies published before June 2004 were considered. Eligible reviews were compared with respect to the methodology applied and the selection of primary studies.

Results: A total of 5 systematic reviews (including the IQWiG review) and 16 primary studies were analysed. The reviews included between 4 and 13 primary studies published before June 2004. Two reviews considered only randomised controlled trials (RCTs). Three reviews considered both RCTs and non-RCTs. The overall agreement in study selection between reviews was 96% for RCTs (24 of 25 options) and 57% for non-RCTs (12 of 21 options). Because of considerable disagreement in the citation and selection of non-RCTs, we contacted the review authors for clarification (this was not initially planned); all authors or institutions responded. According to published information and the additional information provided, most differences between reviews arose from variations in inclusion criteria or inter-author study classification, as well as from different reporting styles (citation or non-citation) for excluded studies.

Conclusion: The citation and selection of primary studies differ between systematic reviews on NPWT, particularly with regard to non-RCTs. Uniform methodological and reporting standards need to be applied to ensure comparability between reviews as well as the validity of their conclusions.


Background
Although systematic reviews are a valuable tool in the synthesis of evidence, they should be interpreted with caution [1]. The sharp rise in the number of systematic reviews published over the past decades has led to a concomitant increase in discordant results and conclusions between reviews on the same research question [2][3][4][5]. This has caused disputes between researchers and created difficulties for decision-makers in selecting appropriate health care interventions. Among other things, discordance between reviews may be caused by differences in primary study selection [6] due to variations in literature search strategies, selection criteria, and the application of selection criteria [2].
The Institute for Quality and Efficiency in Health Care (Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen, IQWiG) conducted a systematic review on the effectiveness and safety of negative pressure wound therapy (NPWT) versus conventional wound therapy in patients with acute or chronic wounds. The NPWT technique aims to accelerate wound healing by placing a foam dressing in the wound and applying controlled subatmospheric pressure [7]. The German-language full review and a rapid report on studies subsequently published are available on the IQWiG website [8,9]. In addition, an English-language journal article has been published [10].
An additional retrospective analysis was conducted in order to compare different systematic reviews on NPWT regarding their agreement in primary study selection. The review methodologies were also compared.

Methods
For the IQWiG review and rapid report, 4 bibliographic databases (MEDLINE, EMBASE, The Cochrane Library, and CINAHL) were searched to identify systematic reviews and primary studies on NPWT versus conventional wound therapy in patients with acute or chronic wounds. All databases were searched from inception to May 2005 (review) and between May 2005 and December 2006 (rapid report).
The multi-source search strategy and literature screening are described in detail elsewhere [8]. Eligible primary studies were randomised controlled trials (RCTs), as well as non-randomised controlled trials (non-RCTs) with a concurrent control group. Studies were classified as non-randomised if allocation concealment was judged inadequate [11]; quasi-randomised studies were therefore classified as non-randomised. The intervention was categorised as NPWT if a medical device system identical or comparable to the vacuum-assisted closure (V.A.C.®) system was used. Studies were considered eligible only if publicly accessible full-text articles or other comprehensive study information (e.g. clinical study reports provided by manufacturers) were available.
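The study-level criteria above amount to a simple decision rule. The Python sketch below is illustrative only: the `Study` record and its field names are hypothetical, and it encodes just the criteria stated in this section, not IQWiG's full screening procedure.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical summary of the features the selection criteria refer to.
    randomised: bool                  # allocation reported as random or quasi-random
    concealment_adequate: bool        # allocation concealment judged adequate
    concurrent_control: bool          # control group treated in the same period
    device_comparable_to_vac: bool    # NPWT system identical or comparable to V.A.C.
    full_information_available: bool  # full text or e.g. a clinical study report


def classify_design(study: Study) -> str:
    """Return "RCT" or "non-RCT".

    Inadequate allocation concealment (as in quasi-randomised studies)
    leads to classification as non-randomised.
    """
    if study.randomised and study.concealment_adequate:
        return "RCT"
    return "non-RCT"


def is_eligible(study: Study) -> bool:
    """Check the eligibility criteria that apply to both RCTs and non-RCTs."""
    return (
        study.concurrent_control
        and study.device_comparable_to_vac
        and study.full_information_available
    )


# A quasi-randomised trial with a concurrent control group.
quasi = Study(
    randomised=True,
    concealment_adequate=False,
    concurrent_control=True,
    device_comparable_to_vac=True,
    full_information_available=True,
)
print(classify_design(quasi), is_eligible(quasi))  # non-RCT True
```

Keeping classification and eligibility separate mirrors the text: a study can be eligible yet still be classified as a non-RCT, as in this quasi-randomised example.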
For the present analysis, an identical and sufficiently large primary study pool, i.e. the pool of studies that could potentially be identified by all reviews, was required to ensure comparability between reviews. As a preliminary analysis showed that early reviews included just 2 to 4 primary studies, only reviews published in or after December 2004 were considered.
Eligible reviews had to include data from completed primary studies on NPWT. Reviews were classified as systematic reviews (as opposed to narrative reviews) if multiple sources were searched (at least MEDLINE and The Cochrane Library), and the search strategy (including the search date) was documented [12].
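The review-level criteria from this and the preceding paragraph can be sketched in the same way. Again, the `Review` record, its fields, and the source names used as set members are hypothetical conventions for illustration; "multiple sources" is modelled as the two required databases being among those searched.

```python
from dataclasses import dataclass, field
from datetime import date

# Minimum set of sources a systematic review had to search.
REQUIRED_SOURCES = {"MEDLINE", "The Cochrane Library"}
# Only reviews published in or after December 2004 were considered.
EARLIEST_PUBLICATION = date(2004, 12, 1)


@dataclass
class Review:
    # Hypothetical summary of what a review reports about its methods.
    published: date
    sources_searched: set = field(default_factory=set)
    search_strategy_documented: bool = False
    search_date_documented: bool = False
    reports_completed_npwt_studies: bool = False


def is_systematic(review: Review) -> bool:
    """Systematic rather than narrative: required sources searched and search documented."""
    return (
        REQUIRED_SOURCES <= review.sources_searched
        and review.search_strategy_documented
        and review.search_date_documented
    )


def is_eligible_review(review: Review) -> bool:
    """Combine the publication-date cutoff with the other review criteria."""
    return (
        review.published >= EARLIEST_PUBLICATION
        and review.reports_completed_npwt_studies
        and is_systematic(review)
    )


example = Review(
    published=date(2005, 6, 1),
    sources_searched={"MEDLINE", "EMBASE", "The Cochrane Library"},
    search_strategy_documented=True,
    search_date_documented=True,
    reports_completed_npwt_studies=True,
)
print(is_eligible_review(example))  # True
```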
Primary studies were eligible for inclusion only if they had been published before June 2004 and if the entry date of a study in a database preceded the date of the literature search of any systematic review analysed.
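The two conditions for the common study pool reduce to date comparisons: a study entered in a database before the earliest review search could, in principle, have been found by every review. In the sketch below the function name is hypothetical, and the search dates in the usage example are invented placeholders, not the actual search dates of the reviews analysed.

```python
from datetime import date

# Only studies published before June 2004 were considered.
PUBLICATION_CUTOFF = date(2004, 6, 1)


def in_common_pool(published: date, entry_date: date, review_search_dates: list) -> bool:
    """True if a study could have been identified by every review analysed.

    The study must be published before the cutoff and must have been entered
    in a database before the earliest literature search among the reviews.
    """
    if not review_search_dates:
        raise ValueError("at least one review search date is required")
    return published < PUBLICATION_CUTOFF and entry_date < min(review_search_dates)


# Placeholder search dates for illustration only.
searches = [date(2004, 9, 1), date(2005, 5, 1)]
print(in_common_pool(date(2003, 3, 1), date(2003, 4, 15), searches))  # True
print(in_common_pool(date(2004, 5, 1), date(2004, 10, 1), searches))  # False
```

Comparing against the earliest search date is equivalent to requiring that the entry date precede the search date of each review.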
The methodology and primary study selection between reviews were compared, and the overall agreement in study selection between reviews was reported.
Only a summary of the reviews' quality assessment of primary studies and their conclusions on the effectiveness of NPWT is presented here, as the main focus of this paper was to compare the agreement in primary study selection between reviews.

Results
The methods applied in the included reviews are presented in Tables 2 and 3. Regarding bibliographic databases, all reviews used MEDLINE, EMBASE, and The Cochrane Library, but the nursing database CINAHL was used only by IQWiG. The search terms applied varied between reviews. Regarding study design, the IQWiG review [8], as well as the reviews by Costa 2005 [30] and Pham 2006 [31], considered both RCTs and non-RCTs, while the reviews by Samson 2004 [29] and OHTAC 2006 [32] took only RCTs into account.
As the comparison of systematic reviews based on published information showed numerous inconsistencies, we decided to contact the authors of the other reviews for clarification (this was not initially planned). We received responses from all authors approached (or from other researchers at the publishing institutions). After reviewing the responses, it became clear that reporting styles for excluded studies differed between reviews. For example, the response by OHTAC stated that "it must be noted that we do not routinely cite or analyse studies that have been excluded from our EBAs (evidence-based analyses)" [personal communication]. It consequently became apparent that some studies we had initially classified as "not identified by other reviews" had actually been identified but excluded, and therefore not reported. We therefore changed the classification of studies not cited in reviews to "not reported". In addition, the authors of reviews corrected or clarified published information (their comments are included in Tables 4 to 6); we thank them for generously providing this information.
Details of the primary study selection are presented according to the study classification by IQWiG in Tables 4 (5 RCTs), 5 (7 non-RCTs), and 6 (3 non-RCTs and 1 RCT excluded by IQWiG but included by at least one other review).
The reviews included between 4 and 13 eligible primary studies published before June 2004. With regard to RCTs, the overall agreement in primary study selection between reviews was 96% (24 of 25 options) (Table 4).
More variations were noted concerning the selection of non-RCTs; the agreement between reviews considering both RCTs and non-RCTs was 57% (12 of 21 options). Of the 9 mismatches, according to published information and the information provided by authors or institutions, 7 were due to different inclusion criteria (e.g. language criteria), and 2 were due to variations in study classification (Table 5).
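The agreement figures are simple proportions. Assuming that each "option" is one study-review pair (consistent with 5 RCTs × 5 reviews = 25 and 7 non-RCTs × 3 reviews = 21 options), they can be reproduced with the short Python sketch below; the function and variable names are hypothetical and not part of the original analysis.

```python
def percent_agreement(agreements: int, studies: int, reviews: int) -> float:
    """Percentage of study-review options on which the reviews agree.

    Assumes one option per (study, review) pair, i.e. studies * reviews options.
    """
    options = studies * reviews
    if not 0 <= agreements <= options:
        raise ValueError("agreements must lie between 0 and the number of options")
    return 100 * agreements / options


# RCTs: 5 studies, all 5 reviews considered RCTs.
rct = percent_agreement(24, studies=5, reviews=5)
# Non-RCTs: 7 studies, 3 reviews considered non-RCTs.
non_rct = percent_agreement(12, studies=7, reviews=3)
print(f"RCTs: {rct:.0f}%, non-RCTs: {non_rct:.0f}%")  # RCTs: 96%, non-RCTs: 57%

# The non-RCT mismatches and the reasons reported for them.
mismatches = 7 * 3 - 12
reasons = {"different inclusion criteria": 7, "different study classification": 2}
print(mismatches, sum(reasons.values()))  # 9 9
```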
Four studies (3 non-RCTs and 1 RCT) were excluded by IQWiG but included by at least one other review. The reasons for exclusion were as follows: the study included historical controls (2 non-RCTs [13,26]); the intervention applied was not comparable to the NPWT technique (1 non-RCT [14]); or an additional intervention was applied that may have affected the study outcomes (1 RCT [19]) (Table 6). Overall, study selection varied substantially between reviews.
Only the IQWiG review included a meta-analysis (changes in wound size), which indicated an advantage in favour of NPWT. However, only a few trials with small sample sizes were analysed.
The overall quality of the primary studies was assessed in 3 of 5 reviews, and was in general classified as poor. All reviews concluded that the evidence base on NPWT was insufficient (Table 7).

Discussion
An analysis of 5 systematic reviews on NPWT showed differences (which mainly concerned non-RCTs) in the citation and selection of primary studies.
We would like to emphasize that by presenting these differences, we are not implying that the 4 other reviews identified were of inferior quality compared with the IQWiG review. Variations in the number of primary studies identified and selected are not surprising, as the reviews used different search strategies, literature sources, and inclusion criteria. After correspondence with the authors of the other reviews, many differences regarding the citation of primary studies could be attributed to different reporting styles (citation or non-citation) for excluded studies, not to the non-detection of studies in the literature searches.
Most differences in study selection resulted from variations in inclusion and exclusion criteria. For example, because of language restrictions, studies published in German were selected by IQWiG but not by other reviews. Opinions on the relevance of language bias differ: a study published in 1997 comparing English- and German-language publications concluded that English-language bias may be introduced in systematic reviews if they include only trials reported in English [33]. In contrast, a more recent publication noted that, for conventional medicinal interventions, language restrictions did not appear to bias estimates of effectiveness [34]. Moreover, for German-language publications on RCTs, it has been reported that German medical journals no longer play a role in the dissemination of trial results [35].
The inclusion criteria for primary study design were also inconsistent: 3 reviews (including the IQWiG review) considered both RCTs and non-RCTs, and 2 reviews considered only RCTs. The non-RCTs included in our analysis were non-randomised controlled intervention studies. However, many different study types can be regarded as non-RCTs (e.g. classical observational studies). The inclusion of non-RCTs in systematic reviews is inconsistent and controversial [36][37][38][39][40]. The validity of systematic reviews including non-RCTs may be affected by the differing susceptibility of RCTs and non-RCTs to selection bias [39], although it has been suggested that, under certain conditions, estimates of effectiveness from non-RCTs may be valid if confounding is controlled for [40].

[Figure 1: Flow chart of the review selection]

[Figure: Flow chart of the study selection]

Footnotes (inclusion criteria of the reviews):
*"Considered to be of less clinical importance" [29].
† Costa also considered economic outcomes.
‡ One crossover study involving 7 patients was also included.
§ Unpublished data from primary studies were only to be considered in the review if comprehensive study information (e.g. a clinical study report) was available.
|| Additional information (IQWiG): An English-language title was required. No language restrictions were otherwise imposed. If an English-language title or abstract indicated the potential relevance of a foreign-language text, the text was obtained and translated.
¶ Personal communication (C. Perera, ASERNIP-S): "This publication draws from an accelerated systematic review which was published in 2003 and is accessible at http://www.surgeons.org/AM/Template.cfm?Section=ASERNIP_S_Publications&CONTENTID=14159&TEMPLATE=/CM/ContentDisplay.cfm. This review contains the full methodological details, including search strategies and inclusion/exclusion criteria. An accelerated systematic review uses the same methodology as a full systematic review, but may restrict the types of studies considered in order to produce the review in a shorter time period than the full systematic review. For example, accelerated reviews generally only include comparative studies and not case series, unless safety outcomes were inadequately described in the comparative evidence."
**Wound types listed in the results section, not in the methods.
†† "Conference abstracts and manufacturer's information were included if they contained relevant safety and efficacy data." [31]
‡‡ Information according to [31]. Additional information: "Searches for the review were conducted without language restriction in the first instance; however, included studies were limited to those published in English. An exception to this would be if there was a paucity of English language evidence, or if a landmark RCT was published in a non-English language, in which case the studies would then be translated and included."

Search terms and footnotes (literature searches of the reviews):
Vacuum-assisted closure terms†: "topical negative pressure"; "sub-atmospheric pressure therapy" (also "subatmospheric"); "sub-atmospheric pressure dressing" (also "subatmospheric"); "vacuum sealing"; "vacuum assisted closure"; "negative pressure dressing"; "negative pressure therapy"; "foam suction dressing"; "vacuum compression"; "vacuum pack"; "sealed surface wound suction"; "sealing aspirative therapy".
Wound terms: "wound*"; "ulcer*"; "decubit*"; "incision*"; "dressing"; "free flap"; "skin graft*"; "skin transplantation"; "degloving injuries"; "degloving injury".
Costa/MUHC TAU 2005 [30]: "vacuum" or "vacuum-assisted" or "VAC" or "negative pressure" or "suction dressing" or "subatmospheric" or "sub-atmospheric" or "subatmospheric pressure" or "NPWT" and "wound healing"||
*End of search period.
† "The intersection of the vacuum-assisted closure terms and wound terms served as the initial pool of references. These were cross-referenced with the terms for randomized trials compiled by the Cochrane Collaboration...." For further details, please see Appendix A [29].
‡ Request for "lists of published, randomized, controlled trials (RCTs), published abstracts of RCTs within the past 2 years, and published articles on study design, or protocols of any RCTs (published or in press)" [29].
§ CENTRAL: 2003 ("through issue number 4", [29]).
|| "Health technology agencies databases were also searched for technology assessment reports, systematic reviews and economic studies with the keywords 'vacuum', 'subatmospheric pressure', and 'subatmospheric pressure' used individually" [30].
¶ Unpublished data from primary studies were only to be considered in the review if comprehensive study information (e.g. a clinical study report) was available.
** "Updated searches were performed in July 2005 to include any new RCTs" [31].

Comments on individual studies (PC, personal communication):
Kamolz 2004. IQWiG: The primary outcome was not a clinical but a surrogate outcome: "The perfusion of both hands was measured using the technique of dynamic laser-fluorescence-videography" [21]. IQWiG included this study because the outcome "pain" was reported in the results section ("All patients tolerated the V.A.C. application without major reports of pain and discomfort" [21]), although the method of pain measurement was unclear.
[…] Had these studies been landmark RCTs, they would have been translated and included in the review."
Genecov 1998. PC (D. Samson): "This study was not a parallel groups or crossover randomized trial, but it was a within-subjects experimental design in which each participant served as his/her own control by receiving Opsite wound dressing and vacuum-assisted closure to separate wounds or wound areas. Since our review was focused on the primary outcome of progress to full wound healing and this study addressed only biopsy findings, this trial was excluded for reporting a non-relevant outcome."
McCallon 2000. PC (D. Samson): "Rather than excluding a marginal study like this based on quality concerns, our review selected an inclusive pool of randomized controlled trials, then evaluated study quality, noting that this trial ...used an allocation method that was probably inadequate to be considered true randomization (p. 57)."
Genecov 1998. PC (Medical Advisory Secretariat): "...the Genecov study is a case series of ten subjects and was incorrectly referred to as a randomized controlled trial (RCT). The study was excluded so how it was classified is not of particular relevance."
McCallon 2000. PC (Medical Advisory Secretariat): "... excluded based on the information reported in the abstract; there were less than 20 patients and the study was designated as an RCT by MEDLINE. We would not have retrieved the full text to further examine the study to determine how the randomization process was conducted given its exclusion based on number of subjects."

Footnotes (tables of primary studies):
*Unless otherwise noted, the language of publication (abstract and full text) is English. For this analysis, all studies not classified as randomised trials in the systematic reviews were classified as non-RCTs by IQWiG.
† Databases containing primary studies (entry date: yyyy-mm-dd).
‡ German full text.
§ Review did not consider non-English or non-French full-text publications.
|| Where authors had classified studies as RCTs, the studies were also classified as RCTs in the Pham review, regardless of the methods used to randomise patients. Where the method of randomisation was […]

Table 7 (quality assessment of primary studies and conclusions of the reviews):
Samson 2004 [29]. Quality assessed: yes (6× poor in quality). Conclusion: "The body of evidence is insufficient to support conclusions about the effectiveness of vacuum-assisted closure in the treatment of wounds."
Costa/MUHC TAU 2005 [30]. Quality assessed: no. Conclusion: "Consequently, we agree with the conclusions of the previous technology assessment reports and systematic reviews [29,55,57,60,63,64] that there is insufficient evidence to recommend the routine use of this technology."
IQWiG [8]. Quality assessed: yes (17× poor in quality). Conclusion: "There are at present no results of adequate reliability which provide proof of the superiority of NPWT in comparison with conventional therapy and which would justify broad use of this method outside clinical trial settings."
Pham/ASERNIP-S 2006 update [31]. Quality assessed: no. Conclusion: "There is a paucity of high-quality RCTs on TNP for wound management with sufficient sample size and adequate power to detect any differences between TNP and standard dressings."
OHTAC [32]. Quality assessed: yes.
RCTs with adequately concealed allocation prevent selection bias and consequent distortions of treatment effects [41], and systematic reviews including RCTs represent the highest level of evidence for therapeutic interventions [42]. However, the quality and quantity of RCTs in surgical research is limited [43], and it has therefore been proposed not to base this type of research on RCTs alone [36,44]. Indeed, for some topics, non-RCTs are the only evidence available [45].
As for NPWT, although this treatment is widely applied in clinical practice, particularly for chronic wounds, only a few RCTs were available at the time the IQWiG systematic review on NPWT was being planned, and these were of poor quality [29]. However, the number of published RCTs has recently increased, and as several trials are ongoing, more publications can be expected in the near future. One HTA agency has already changed its policy from including both RCTs and non-RCTs in systematic reviews on NPWT to including solely RCTs [32]. We agree with other researchers that non-RCTs should only be performed when RCTs are infeasible or unethical [38], and that systematic reviews including non-RCTs should only be conducted when RCTs are not available [39]. However, we emphasize that this should not be generalized into a recommendation to exclude all kinds of non-randomised studies from systematic reviews on any topic and for any outcome of interest.
The type of non-RCT considered also differed: IQWiG's precondition for inclusion was the existence of a concurrent control group; studies with a historical control group were excluded, as systematic bias may arise from time trends in the outcomes of study participants [38].
Moreover, variations in the classification of study design were noted between reviews. For example, David Samson, one of the other review authors, stated: "In general, our definition of randomized trials was probably more inclusive than yours. We decided to be inclusive due to the small number of potentially relevant studies available at that time. Our goal was to evaluate the quality of a larger pool of included studies rather than exclude more studies, based on quality concerns, to create a smaller pool of included studies" [personal communication].
As subjective factors are involved in the preparation of systematic reviews, inter-author variation is inevitable [46]. The evaluation of inter-author variation has shown that differences particularly affect the classification of study design [46,47]. One study showed that this was the case even when specific instructions and definitions were provided [47]. However, a recent analysis of the reproducibility of systematic reviews showed that, where authors were provided with guidelines for review preparation (including an algorithm to ensure that study designs were defined in a standardised manner), the overall reproducibility between reviews was good [48]. This finding emphasizes the relevance of standard reporting guidelines. The CONSORT statement on improving the quality of reporting for RCTs has been available for over a decade [49], and a revised version was published in 2001 [50]. In contrast, guidelines for non-RCTs are more recent [51,52]. The introduction of uniform reporting standards for non-RCTs may improve the future quality of reporting and lead to closer agreement between systematic reviews in the citation and selection of primary studies.
Even though the reviews analysed included different numbers and types of studies, all reviews reached similar conclusions. This may be explained by the fact that the overall quality of the data on NPWT is poor.

Conclusion
The citation and selection of primary studies differ between systematic reviews on NPWT, primarily with regard to non-RCTs. These differences arise from variations in review methodology and inter-author classification of study design, as well as from different reporting styles for excluded studies. Uniform methodological and reporting standards need to be applied to ensure comparability between reviews as well as the validity of their conclusions.