Methods to systematically review and meta-analyse observational studies: a systematic scoping review of recommendations

Mueller, Monika; D’Addario, Maddalena; Egger, Matthias; Cevallos, Myriam; Dekkers, Olaf; Mugglin, Catrina; Scott, Pippa

doi:10.1186/s12874-018-0495-9

Table 4 Key items with conflicting recommendations

Key item	Recommendations in favour	Recommendations against
Research question
Should we formulate the research question as precisely as possible?	“A focused research question is essential. The question that is asked needs to be as scientifically precise as possible.” [51] “While others (e.g., EPPI-Centre) have opted to answer very broad questions in their reviews, we have chosen to keep our questions quite specific. We have done this for two reasons. First, practitioners and policymakers want answers to specific questions, and so our reviews and their summary statements provide this. Second, keeping questions specific limits any one literature search and retrieval. Given that the “hit” rate for relevant articles in an electronic search regarding public health topics is about 10%, any review requires a lot of reviewer time to select the relevant articles from those identified. When topics are broad, the “hit” rate can be even lower, requiring more resources.” [53]	“Thus, questions that the review addresses may be broad or narrow in scope, with each one of them associated with their own advantages and disadvantages. While the questions may be refined based on the data which is available during the review, it is essential to guard against bias and modifying questions, as post-hoc questions are more susceptible to the bias than those asked a priori and data-driven questions can generate false conclusions based on spurious results.” [47] “A review needs to focus on meaningful and not trivial outcomes. The chosen focus of a review, whether broad or narrow, will not, in itself affect the quality of the review but, it will impact on its relevance.” [49] “The research question about safety and tolerability in a review may be broad or narrow in scope. […] In general, reviewers who have already identified important safety concerns (for instance, from the knowledge of the pharmacology, or anatomical site of the intervention) should carry out a narrow-focused evaluation covering particular aspects of the relevant adverse effects. On the other hand, reviewers who are not aware of any specific safety problems, could start with a general overview of the range of adverse effects associated with an intervention. A widely scoped review may be part of an initial evaluation which eventually throws up specific safety issues that merit further focused study.” [35]
Study eligibility
Should we include studies of all languages?	“Ideally, it would be best to include all studies regardless of language of publication. However, for practical reasons, many meta-analyses limit themselves to English language studies. Although this decreases the number of studies, it does not appear to bias the effect size”. [30]	“Including papers in all languages may actually introduce more bias into a meta-analysis”. [61]
Should we avoid multiple inclusions?	“authors must be careful to avoid the multiple inclusion of studies from which more than one publication has arisen”. [61]	“It is important that each entry in a meta-analysis represents an independent sample of data. Thus, for example, multiple reports of the same study need to be merged to obtain a single “best” answer for that study” [33]
Considering different study designs
Should we include both RCT and NRS in a single systematic review?	“When both randomized and non-randomized evidence are available, we favor a strategy of including NRS and RCTs in the same systematic review but synthesizing their results separately.” [75] “When an adverse event is rare or occurs a long time after intervening, including NRS in systematic reviews may be desirable because randomized controlled trials (RCTs) often have inadequate power to detect a difference in harm between intervention and control groups and commonly do not follow up participants in the long term …Another reason to include NRS in a systematic review is that there might be no or very few RCTs, and there may be a need to synthesize the best available evidence.” [75] “Systematic reviews that evaluate vaccine safety will need to expand to include study designs beyond RCTs. Randomisation is the only way to control for all unknown confounders, thereby minimising the effects of bias on the results. Only limited empirical evidence is available on the impact that non-randomised study designs may have on the measurement of adverse events.” [49] “Under ideal circumstances, studies of different designs should be included.” [34]	“Ideally, researchers should consider including only controlled trials with proper randomisation of patients that report on all initially included patients according to the intention to treat principle and with an objective, preferably blinded, outcome assessment.” [29] “Where RCTs (including cluster RCTs) are available to answer questions of effectiveness or efficacy they should be included in your review. This type of study design has the greatest potential for maximising internal validity. RCTs may not be available, and in these circumstances, non-RCTs are likely to represent the best available evidence and should be included” [39].
Should we pool results of different study designs in a single meta-analysis if results are similar over the different study designs?	“If the meta-analysis includes some randomized experiments and some observational studies, we can meta-analyze them separately and combine their results if they are quite similar, borrowing strength for the randomized experiments from the similar results of the nonrandomized studies.” [51] “The contribution of study design to heterogeneity in the effect estimates should be analysed and separate meta-analysis should be conducted by study design when the effect estimates systematically vary by design.” [34] “From these examples, we conclude that an initial stratification of results by study design is useful. A combined analysis should adjust for design features if there is heterogeneity across study designs or, alternatively, results should be reported separately for each design, and further exploration may be warranted to understand the sources of the differences.” [77]	“Generally, separate meta-analyses should be performed on studies of different designs. It is not usually advisable to combine studies of different designs in a single meta-analysis unless it can be determined that study design has little or no influence on study characteristics such as quality of data, specificity of exposure, and uniformity of diagnoses. In reality, study design is usually one of the most important determinants of data quality, exposure specificity, and diagnostic criteria. Similarly, studies with very different statistical techniques, different comparison populations, or different diagnostic categories should generally not be lumped into a single analysis.” [70] “Therefore, in most situations we do not recommend combining cohort and case-control studies in a single meta-analysis. The meta-analysis should at least be stratified by study design.” [70] “We favor a strategy of including NRS and RCTs in the same systematic review, but synthesizing their results separately. Including NRS will often make the limitations of the evidence derived from RCTs more apparent, thereby guiding inferences about generalizability, and may help with the design of the next generation of RCTs.” [75] “While there is absence of overall consensus on the reporting of nonrandomized studies, there is general agreement that combining data between nonrandomized and randomized studies is methodologically flawed, and that multilevel extrapolations should be avoided.” [56]
Risk of bias assessment
Should we use scales and summary scores to assess the quality of studies?	“The methodological quality of the recruited studies must be checked before analysis. There are several checklists and score systems to facilitate decision about the quality of a study”. [30] “The idea of computing some sort of quality score is attractive” [77]. “… a chosen quality scoring system, especially if oriented to measuring biases, might be used to adjust results” [77]	“We do not recommend the use of quality scoring for the simple reason that it would be impossible to treat different study characteristics … that are related to quality as if they are of equal importance or interchangeable and can be measured by a single score”. [70] “Most methodologists hate this. There is tremendous variability in calculating aggregate quality scores. Two biases may cancel out, have independent effects or multiplicative impact on the results”. [88] “Our broad recommendations are that tools should (i) include a small number of key domains; (ii) be as specific as possible (with due consideration of the particular study design and topic area); (iii) be a simple checklist rather than a scale and (iv) show evidence of careful development, and of their validity and reliability”. [89] “Finally, I wholeheartedly condemn quality scores because they conflate objective study properties (such as study design) with subjective and often arbitrary quality weighting schemes. Use of such scores can seriously obscure heterogeneity sources and should be replaced by stratification or regression analyses of the relation of study results to the items or components of the score”. [85] “It adds to the previous evidence that contemporary quality scores have little or no value in improving the utility of a meta-analysis. Indeed, they may introduce bias, because you get a different answer depending on which quality score you use. In addition, none of the quality scores considered clearly performed better than others when using large trials as a reference standard”. [92]
Publication bias
Should we assess publication bias with a funnel plot?	“Bias can be detected visually by drawing a funnel plot”. [55] “Publication bias is difficult to eliminate, but some statistical procedures may be helpful in detecting its presence. An inverted funnel plot is sometimes used to visually explore the possibility that publication bias is present”. [16] “A graphic device known as funnel plot can be employed to detect the presence of publication bias”. [48] “The likely presence or absence of bias should be routinely examined in sensitivity analysis and funnel plot”. [97]	“Important, but graphical attempts to detect publication bias can be influenced by the subjective expectations of the analyst”. [85]
Statistical analysis
Should we use statistical measures of heterogeneity to decide on statistical model?	“Failing to reject the null-hypothesis assumes that there is homogeneity across the studies and differences between studies are due to random error. In this case a fixed-effect analysis is appropriate” [55]. “… when statistical heterogeneity is present in a meta-analysis, a random effects model should be used to calculate the overall effect” [66].	“In taking account of heterogeneity when summarizing effect measures from observational studies many authors recommend formal tests of heterogeneity. However, the available tests often lack statistical power. This means that the possible existence should be considered even where the available tests fail to demonstrate it” [101]. “… the decision as to whether estimated differences are large enough to preclude combination or averaging across studies should depend on the scientific context, not just statistical significance” [34].
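
The quantities discussed in the last row of the table can be made concrete with a minimal Python sketch. It is not taken from any of the quoted sources. The function names and the example numbers are hypothetical, and the DerSimonian-Laird estimator of between-study variance is an assumed, commonly used choice rather than one the recommendations prescribe. The sketch computes an inverse-variance fixed-effect estimate, Cochran's Q, I², and a random-effects estimate from per-study effect estimates (for example, log odds ratios) and their variances.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, I^2, and tau^2.

    tau^2 uses the DerSimonian-Laird moment estimator (an assumption of
    this sketch; other estimators exist).
    """
    weights = [1.0 / v for v in variances]
    fixed, _ = pool_fixed(effects, variances)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, df, i2, tau2


def pool_random(effects, variances):
    """Random-effects estimate: add tau^2 to every study variance."""
    _, _, _, tau2 = heterogeneity(effects, variances)
    return pool_fixed(effects, [v + tau2 for v in variances])


# Hypothetical log odds ratios and variances from three studies.
example_effects = [0.10, 0.45, 0.80]
example_variances = [0.04, 0.09, 0.05]

q, df, i2, tau2 = heterogeneity(example_effects, example_variances)
print("Q =", round(q, 3), "on", df, "df; I^2 =", round(i2, 3))
print("fixed :", pool_fixed(example_effects, example_variances))
print("random:", pool_random(example_effects, example_variances))
```

The sketch reports both pooled estimates side by side and does not use a significance test on Q to choose between them. This is in line with the cautions quoted from [101] and [34]: the available tests often lack power, and the decision to combine studies should depend on the scientific context.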


ISSN: 1471-2288


BMC Medical Research Methodology
