Quality of observational studies of clinical interventions: a meta-epidemiological review
BMC Medical Research Methodology volume 22, Article number: 313 (2022)
This meta-epidemiological study aimed to assess methodological quality of a sample of contemporary non-randomised clinical studies of clinical interventions.
This was a cross-sectional study of observational studies published between January 1, 2012 and December 31, 2018. Studies were identified in PubMed using the search terms ‘association’, ‘observational’, ‘non-randomised’ or ‘comparative effectiveness’ within titles or abstracts. Each study was appraised against 35 quality criteria by two authors independently, with each criterion rated as fully, partially or not satisfied. These quality criteria were grouped into 6 categories: justification for observational design (n = 2); minimisation of bias in study design and data collection (n = 11); use of appropriate methods to create comparable groups (n = 6); appropriate adjustment of observed effects (n = 5); validation of observed effects (n = 9); and authors’ interpretations (n = 2).
Of 50 unique studies, 49 (98%) were published in two US general medical journals. No study fully satisfied all applicable criteria; the mean (+/−SD) proportion of applicable criteria fully satisfied across all studies was 72% (+/− 10%). The categories of quality criteria demonstrating the lowest proportions of fully satisfied criteria were measures used to adjust observed effects (criteria 20, 23, 24) and validate observed effects (criteria 25, 27, 33). Criteria associated with ≤50% of full satisfaction across studies, where applicable, comprised: imputation methods to account for missing data (50%); justification for not performing an RCT (42%); interaction analyses in identifying independent prognostic factors potentially influencing intervention effects (42%); use of statistical correction to minimise type 1 error in multiple outcome analyses (33%); clinically significant effect sizes (30%); residual bias analyses for unmeasured or unknown confounders (14%); and falsification tests for residual confounding (8%). The proportions of fully satisfied criteria did not change over time.
Recently published observational studies fail to fully satisfy more than one in four quality criteria. The criteria that were not satisfied, or only partially satisfied, serve as remediable targets for researchers and journal editors.
The growth of electronic medical records and other ‘real-world’ digitised sources of clinical data has led to a proliferation of observational studies of the effectiveness of clinical interventions. While the scientific standard for assessing intervention efficacy remains randomised controlled trials (RCTs), well-designed observational studies have been used to elucidate potential harms, and expand the evidence base in situations where existing RCTs have limited generalisability because of selective patient enrolment or outcome reporting, or new RCTs are logistically very difficult to perform. The main concern with observational studies is their vulnerability to bias, particularly confounding by indication, whereby patients receive a therapy based on certain patient or clinician characteristics which may not be explicitly stated or recorded, but which are prognostically important and influence the outcome of interest, independently of the therapy. In the past, influential observational studies have helped institutionalise, for decades, scores of clinical practices that were subsequently shown to be ineffective or indeed harmful when subjected to RCTs, where randomisation eliminated selection bias in who received the experimental therapy.
Nevertheless, reviews of observational studies suggest that they often report effects and generate inferences similar to those of RCTs studying the same therapy and involving similar populations and outcome measures [5,6,7]. Advances in study design, statistical methods and clinical informatics have the potential to lend greater rigour to observational studies. Multiple guidelines detailing methodological and reporting standards, and instruments for assessing study quality [10,11,12,13], exist. Although systems for grading evidence quality, such as Grades of Recommendation, Assessment, Development and Evaluation (GRADE), rank observational studies as being of lower quality than RCTs, such studies can be regarded as sources of valid data if they are well designed, show large effect sizes and account for all plausible confounders. Many systematic reviews include both RCTs and high-quality observational studies in their analyses when deriving causal inferences.
However, the level of trustworthiness of observational studies remains controversial. We hypothesised that, due to advances in observational research, such studies are becoming more rigorous and valid. The aim of this meta-epidemiological study was to assess the methodological quality of a sample of recently reported non-randomised clinical studies of commonly used clinical interventions, and ascertain if quality is improving over time.
In reporting this study, we applied the guidelines for meta-epidemiological methodology research proposed by Murad and Wang. No a priori study protocol existed or was registered.
A backward search from December 31, 2018 to January 1, 2012 was performed using PubMed with no language filters to identify observational studies of therapeutic interventions. Search terms comprised ‘association’, ‘observational’, ‘non-randomised’ or ‘comparative effectiveness’ within titles or abstracts. We included studies which: involved clinician-mediated therapeutic interventions administered directly to adult patients; reported comparison of two concurrent therapeutic interventions, which could include ‘usual care’; and whose outcomes included patient-important sentinel events (ie mortality, serious morbid events, hospitalisations) rather than solely disease or symptom control measures.
We excluded studies that: 1) featured case control comparisons, historical controls only, single arm cohorts, adjunct therapies, diagnostic tests (with no associated therapeutic intervention) or cost-effectiveness analyses (with no separate comparative effectiveness data); 2) compared a single intervention group with a placebo group; 3) comprised RCTs, or reviews and meta-analyses of either RCTs or observational studies; 4) involved paediatric, anaesthetic or psychiatric interventions or patients; 5) analysed therapies which were highly specialised, or not in common use (eg genetically guided therapies, investigational agents in research settings); 6) assessed effects of system-related innovations rather than direct clinical care (eg funding or governance structures); 7) studied non-medical interventions (eg effects on cardiovascular outcomes of reducing salt consumption or increasing physical activity); 8) studied exposures, risk factors or prognostic factors that may influence therapeutic effectiveness but did not involve head to head comparisons of two interventions (eg effects of dose changes or co-interventions); or 9) were descriptive studies with no reporting of outcome measures. One author (SG) performed the search and initial study selection, with subsequent independent review by the second author (IAS).
From each selected study we extracted the following data: study title, journal, and date of publication; rationale stated in the introduction for choosing an observational study design; existence of a pre-specified study protocol; patient selection criteria; methods of data collection from primary sources; reference to validation checks for coded administrative data or longitudinal data linkage processes; methods for minimising recording bias (in administrative data), recall bias, social desirability bias, and surveillance bias (in clinical registry data); methods for assessing clinical outcomes; choice of predictor variables; population characteristics and statistical methods used for balancing populations; imputation methods used for missing data; subgroup analyses and interaction testing for identifying independent prognostic variables; use of unplanned post-hoc analyses; statistical methods used for adjusting for multiple outcome analyses, clustering effects (in multicentre studies) and time-dependent bias; effect size and confidence intervals; sensitivity analyses for unmeasured confounders; stated intervention mechanism of action, temporal relation between intervention and outcomes, and dose-response relationships; any falsification tests performed; comparisons with results of other similar studies; and statements about study limitations and implications for clinical practice.
Application of quality criteria
Both authors independently read the full text articles of included studies and applied to each study a list of 35 quality criteria which were, with some modification, based on those the authors have previously published (Table 1) and which covered criteria listed in previously cited critical appraisal and reporting guidelines for observational studies [10,11,12,13]. These quality criteria were grouped into 6 categories: justification for observational design (n = 2); minimisation of bias in study design and data collection (n = 11); use of appropriate methods to create comparable groups (n = 6); appropriate adjustment of observed effects (n = 5); validation of observed effects (n = 9); and authors’ interpretations (n = 2).
For each study, the extent to which each criterion was satisfied was categorised as: fully satisfied (Y), all elements met; partially satisfied (P), some elements met; not satisfied (N), no elements met; or not applicable (NA), if that criterion was not relevant to the study design, analytic methods or outcome measures. Inter-rater agreement between authors for criterion categorisation was initially 95.2%, and consensus was reached on all criteria after discussion.
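The agreement figure above is a raw percentage. As a minimal sketch of how such a figure can be computed from two raters' categorisations, the Python below calculates raw percent agreement and, for comparison, Cohen's kappa. Kappa is not reported in this study and is shown only as a common chance-corrected alternative; the ratings and function names are hypothetical and are not study data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters chose the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten criteria (Y, P, N, NA); illustrative only.
rater_1 = ["Y", "Y", "P", "N", "Y", "NA", "Y", "P", "Y", "N"]
rater_2 = ["Y", "Y", "P", "N", "Y", "NA", "Y", "Y", "Y", "N"]

print(f"raw agreement: {percent_agreement(rater_1, rater_2):.3f}")
print(f"Cohen's kappa: {cohens_kappa(rater_1, rater_2):.3f}")
```

Raw agreement tends to look higher than kappa when one category (here, Y) dominates, which is worth bearing in mind when most criteria are fully satisfied.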
Summary measures and synthesis of results
For each study, we calculated the proportion of all applicable quality criteria categorised as fully, partially or not satisfied. For each criterion applicable at the level of individual studies, we calculated the proportion of studies which fell into each category of satisfaction. We calculated the proportion of criteria which were fully, partially or not satisfied by all studies for which criteria were applicable. Trend analysis assessed whether the proportion of applicable criteria that were fully, partially or not satisfied changed over time for studies published between 2012 and 2018. All analyses were performed using Excel functions or GraphPad software.
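The summary measures described above can be sketched as follows. The analyses in this study were run in Excel and GraphPad, so this Python version is only an illustrative equivalent. The three studies, their five criteria and all names are hypothetical; the key step is that criteria rated NA are dropped from the denominator before proportions are taken.

```python
from statistics import mean, stdev

CATEGORIES = ("Y", "P", "N")


def study_proportions(ratings):
    """Proportion of one study's applicable criteria rated Y, P and N."""
    applicable = [r for r in ratings if r != "NA"]
    if not applicable:
        raise ValueError("study has no applicable criteria")
    return {c: applicable.count(c) / len(applicable) for c in CATEGORIES}


def criterion_proportions(studies, index):
    """Proportion of studies rated Y, P and N on one criterion, where applicable."""
    column = [r[index] for r in studies.values() if r[index] != "NA"]
    if not column:
        return None  # criterion not applicable to any study
    return {c: column.count(c) / len(column) for c in CATEGORIES}


# Hypothetical ratings: three studies, five criteria each (not study data).
studies = {
    "study_a": ["Y", "Y", "P", "N", "NA"],
    "study_b": ["Y", "Y", "Y", "P", "Y"],
    "study_c": ["Y", "N", "Y", "NA", "N"],
}

per_study = {name: study_proportions(r) for name, r in studies.items()}
fully = [p["Y"] for p in per_study.values()]

print(f"mean fully satisfied: {mean(fully):.2f} (SD {stdev(fully):.2f})")
print("criterion 5:", criterion_proportions(studies, 4))
```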
The literature search identified 1076 articles from which 50 unique studies met selection criteria [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67] of which 28 (56%) assessed non-procedural, mainly pharmacological, therapies [18,19,20,21,22,23,24, 26, 28, 31,32,33, 36, 38,39,40, 43, 44, 50, 53,54,55,56, 64,65,66,67], 15 (30%) assessed invasive procedures [27, 29, 30, 34, 35, 37, 41, 42, 51, 52, 58,59,60,61,62,63], 4 (8%) assessed investigational strategies [25, 45, 46, 57] and 3 (6%) assessed models of care [47,48,49]. Studies most frequently involved interventions related to cardiology (18/50; 36%) [19,20,21, 23, 26, 30, 32, 35, 39, 40, 44, 50, 52, 61, 64,65,66,67], surgery (13/50; 26%) [27, 29, 31, 34, 37, 38, 42, 53, 58,59,60, 62, 63], neurology (4/50; 8%) [28, 43, 49, 54] and oncology (4/50; 8%) [18, 25, 56, 57]. Most studies (36/50, 72%) [18,19,20,21,22, 24,25,26,27,28,29,30,31,32, 34,35,36,37, 40, 42,43,44,45, 48, 50, 52, 54, 55, 57,58,59,60, 63,64,65] were published in one journal (JAMA), with 13/50 (26%) [33, 38, 39, 41, 46, 47, 49, 51, 53, 56, 61, 62, 66, 67] in another (JAMA Internal Medicine). Sample size varied from as low as 464 participants to as high as 1,256,725. Study characteristics are summarised in the on-line supplement, and an example of the application of the quality criteria is presented in Table 2.
Analyses of methodological quality and risk of bias
The proportions of applicable criteria which were fully, partially or not satisfied for each study are depicted in Fig. 1. No study was shown to have all applicable criteria fully satisfied, with the mean (+/−SD) proportion of applicable criteria fully satisfied across all studies being 72% (+/− 10%). This figure was similar for non-procedural (68% [+/− 9%]) and procedural (70% [+/− 10%]) interventions. The categories of quality criteria demonstrating the lowest proportions of fully satisfied criteria were measures used to adjust observed effects (criteria 20, 23, 24) and validate observed effects (criteria 25, 27, 33).
At the level of individual studies, the proportions of all criteria fully or partially satisfied ranged between 60.7% (17 of 28 criteria) and 96.6% (28 of 29 criteria) and the proportions of all criteria that were not satisfied ranged from 3.4% (1 of 29 criteria) to 39.3% (11 of 28 criteria). Only two studies had more than 80% of applicable criteria fully satisfied (Chan et al. at 87% and Friedman et al. at 81%), while two studies met only 50% of applicable criteria (Merie et al.; Shirani et al.).
At the level of individual criteria, the proportions of studies in which a specific criterion was fully, partially and not satisfied, or was not applicable, are depicted in Fig. 2. One criterion (recall bias) was not applicable to any study as informal patient self-report was not used as a data source.
Across all studies, criteria associated with high levels (≥80%) of full satisfaction (where applicable) comprised: appropriate statistical methods (most commonly propensity-based methods) used to ensure balanced groups (100%); absence of social desirability bias, as all studies either used validated, externally administered questionnaires or did not rely on patient self-reported symptoms or function as their primary end-points (100%); coherence of results with other studies of similar interventions (100%); temporal cause-effect relationships (98%); prospective, validated data collection (97%); plausibility of results (96%); absence of surveillance bias in clinical registry data (95%); formulation of pre-specified study protocol (94%); consistency of results with similar studies of same interventions (88%); clear statements on how prognostic variables were selected and measured (86%); data from the majority of the population sample being used in analyses (86%); absence of overstatement of study conclusions (84%); independent blind assessment of outcomes (84%); and adequate matching of patient populations being compared (80%).
Criteria associated with intermediate (51 to 79%) levels of full satisfaction comprised: absence of recording bias in administrative datasets (79%); presence of dose-response relationships (75%); absence of unplanned post-hoc analyses (76%); statistical exclusion of potentially beneficial effect in studies with conclusions of no effect or harm (76%); adequate accounting for selection bias in patient recruitment (74%); and representativeness of the study population (74%).
Criteria associated with low (≤50%) levels of full satisfaction comprised: imputation or other processes to account for missing data or drop-outs (50%); justification for not performing an RCT (42%); interaction analyses in identifying independent prognostic factors that may have influenced intervention effects (42%); use of statistical correction methods to minimise type 1 error arising from multiple analyses of several different outcome measures (33%); clinically significant effect sizes (30%); residual bias analyses that accounted for unmeasured or unknown confounders (14%); and falsification tests for residual confounding (8%).
The proportions of all applicable criteria that were fully, partially or not satisfied showed no appreciable change over time (Fig. 3).
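The trend test used for Fig. 3 is not specified in the text. One simple way to look for a time trend, assumed here purely for illustration, is an ordinary least-squares slope of the proportion of fully satisfied criteria against publication year; a slope near zero would agree with the finding of no appreciable change. The yearly values and function name below are hypothetical and are not study data.

```python
def ols_slope(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical mean proportion of fully satisfied criteria per publication year.
years = [2012, 2013, 2014, 2015, 2016, 2017, 2018]
fully_satisfied = [0.70, 0.74, 0.71, 0.73, 0.70, 0.74, 0.72]

slope, intercept = ols_slope(years, fully_satisfied)
print(f"change per year: {slope * 100:.2f} percentage points")
```

A formal analysis would also report a confidence interval or p-value for the slope, which this sketch omits.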
To our knowledge, this is one of only a few studies to apply a comprehensive list of criteria for assessing the methodological rigour of a cohort of contemporary observational studies of commonly used therapeutic interventions in adult patients reported in high-impact general medicine journals. Overall, there was a high level of adherence to criteria related to study protocol pre-specification, sufficiently sized and representative population samples, prospective collection of validated and objective data with minimisation of various forms of ascertainment and measurement bias, appropriate statistical methods, avoidance of post-hoc analyses, testing for causality, and impartial interpretation of overall study results. These criteria are central to most critical appraisal guides and reporting guidelines for observational studies, are well known to researchers, and hence will likely attract a high level of adherence.
However, there is room for improvement. On average, each study failed to satisfy at least one in four quality criteria which were applicable to that study. The most frequent omission was failing to conduct a falsification (or specificity of effect) test for studies which reported intervention benefits. This test demonstrates whether a benefit is seen for outcomes that can be plausibly attributed to the intervention (eg reduction in myocardial infarctions with coronary revascularisation), but no change for a separate outcome most unlikely to be affected by the intervention (eg in this example, reduction in cancer incidence), whereas if a benefit is seen for both outcomes, then the intervention is probably not the causative factor but some other confounding factor that affects both outcomes. Second was the failure to eliminate the possibility of positive effects being annulled or attenuated by an unmeasured or unknown confounder by undertaking residual (or quantitative) bias or instrumental variable analyses. A new concept called the ‘E value’ and its associated formula have recently been articulated which denotes how prevalent and sizeable in its effects such a confounder would have to be to negate the observed benefit [69, 70]. Understandably, as this is a recent innovation, studies prior to 2017 could not have used this technique, although other methods have been used in the past, and this form of bias has been known for decades. Third was the absence of large effect sizes which, according to GRADE, lessens the likelihood that the observed benefit is real, as small effect sizes provide little buffer against residual confounding. Exactly what constitutes a large enough effect size to counter such confounding remains controversial, with relative risks (RRs) > 2 (or < 0.5), ≥ 5 (or ≤ 0.2) or ≥ 10 (or ≤ 0.1) being cited as reasonable thresholds.
We chose the first of these three thresholds as the minimum necessary, cognizant of the fact that RRs varying between 0.5 and 2 are the ones most commonly reported. Fourth was the absence of correction for statistical significance (using Bonferroni or other methods) for multiple outcome analyses in avoiding type 1 errors whereby significant but spurious benefits are generated simply by the play of chance. Fifth was the omission of subgroup analyses and statistical interaction testing that could identify effect modifiers that differentially influence intervention effects. Proper use of such analyses seems to be an ongoing challenge for RCTs as well. Sixth was lack of multiple imputation processes to account for missing data or drop-outs, an omission frequently seen in clinical research. Such analyses assess the potential for observed effects to have been attenuated by unascertained adverse events occurring among those lost to follow-up at study end, particularly if the outcome of interest, such as deaths, is infrequent. Finally, many studies failed to provide a substantive reason why an RCT could not be performed in the absence of existing RCTs. While it may arguably not qualify as a quality criterion, we believe researchers are obliged to explain why a study design vulnerable to bias was preferred over more robust randomised designs if no substantive barriers to doing such an RCT existed.
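To illustrate the correction for multiple outcome analyses mentioned above, the following is a minimal sketch of the Bonferroni method: with m outcomes tested, each p-value is compared against alpha / m (equivalently, each p-value is multiplied by m and capped at 1). The p-values and function name are hypothetical; Bonferroni is only one of several available corrections and is known to be conservative when outcomes are correlated.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (p, adjusted_p, significant) for each p-value after Bonferroni correction."""
    m = len(p_values)
    if m == 0:
        raise ValueError("no p-values supplied")
    threshold = alpha / m  # per-test significance level
    return [(p, min(p * m, 1.0), p <= threshold) for p in p_values]


# Hypothetical p-values for four outcomes analysed in one study.
outcome_p_values = [0.004, 0.03, 0.2, 0.012]

for p, adjusted, significant in bonferroni(outcome_p_values):
    print(f"p = {p:.3f}  adjusted = {adjusted:.3f}  significant: {significant}")
```

In this example the outcome with p = 0.03 would be called significant at the conventional 0.05 level but not after correction, which is the type 1 error the criterion is meant to guard against.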
A further concern is that despite the promulgation of reporting guidelines for non-randomised studies and the development of statistical methods for gauging the level of sensitivity of results to residual bias, our trend analysis indicates little improvement in methodological quality of studies published between 2012 and 2018. Overall, deficits in statistical analytic methods featured more prominently than deficits in study design and conduct. In particular, the absence of falsification tests, E-value quantification, subgroup analyses using tests for interaction, and adjustment for missing data and multiple comparisons limited the ability of many studies to account for residual confounding in their results.
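As a worked illustration of the E-value discussed above, the sketch below applies the published formula for a risk ratio, E = RR + sqrt(RR × (RR − 1)), after inverting protective estimates (RR < 1) so that RR ≥ 1. For a confidence interval, the E-value is taken for the limit closest to the null, and is 1 if the interval includes 1. The function names are hypothetical, and the formula as written assumes the estimate is a risk ratio; odds ratios and hazard ratios for common outcomes need an approximate conversion first, which is not shown.

```python
import math


def e_value(rr):
    """E-value for a risk ratio point estimate."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective effects are inverted before applying the formula
    return rr + math.sqrt(rr * (rr - 1))


def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lower <= 1 <= upper:
        return 1.0  # interval includes the null: no confounding needed to explain it
    return e_value(lower if lower > 1 else upper)


# Example: an RR at the effect-size threshold used in this study (0.5, or equivalently 2).
print(f"E-value for RR = 0.5: {e_value(0.5):.2f}")
print(f"E-value for CI 0.35 to 0.70: {e_value_ci(0.35, 0.70):.2f}")
```

The first result means an unmeasured confounder would need to be associated with both treatment and outcome by a risk ratio of about 3.4 each to fully explain away an observed RR of 0.5.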
Our study has several limitations. First, despite excellent agreement between authors in categorising levels of criterion satisfaction, this task involves subjective and potentially biased judgement. However, this problem is common to most quality assessment tools. Second, our criteria have not been validated, although few tools have, and, in any event, our criteria included those contained within other well-publicised instruments which have recognised limitations. Third, some may argue that studies not using propensity score methods to create matched cohorts for primary analyses and relying solely on multivariate regression models should be classed as more vulnerable to bias than those which do. However, research has not shown propensity score methods to be necessarily superior to regression models. Fourth, we made no attempt to rank or weight criteria according to the magnitude of their potential to bias study results, but as far as we are aware, no validated weighting method has been reported. Fifth, our chosen threshold for effect size (odds ratio ≤ 0.5 or relative risk reduction ≥50%) is arbitrary and may be regarded as too stringent, but is the least stringent of the thresholds quoted by other researchers [73,74,75]. Sixth, our small sample of 50 studies, with the majority taken from only 2 journals, and identified from searching only one database is arguably not representative of all observational studies of therapeutic interventions, although PubMed is the database widely used by practising clinicians to find articles most relevant to their practice. The inclusion of the terms ‘association’ and ‘observational’ in our search strategy likely biased study retrieval towards articles published in JAMA and JAMA Internal Medicine, as these journals use these words consistently in their titles and abstracts. However, it is also possible these journals have a greater propensity than other journals to publish observational studies.
We would recommend that all journals request authors to have their study titles clearly indicate they are observational. While the sample is small, the included studies involved commonly used clinical interventions and, by being published in high impact journals, have considerable potential to influence practice. Moreover, other investigators have found it difficult to find large numbers of observational studies in specific disciplines over extended periods of time.
Contemporary observational studies published in two high impact journals show limitations that warrant remedial attention from researchers, journal editors and peer reviewers. Reporting guidelines for such studies should promulgate the need for falsification testing, quantification of E values, effect sizes that denote less vulnerability to residual confounding, appropriate statistical adjustment for multiple outcome analyses, statistical interaction tests for identifying important predictors of intervention effects, complete patient follow-up, and justification for choosing to undertake an observational study rather than an RCT.
Availability of data and materials
All data generated or analysed during this study are included in the manuscript and in the appendix.
Schünemann HJ, Tugwell P, Reeves BC, et al. Non-randomized studies as a source of complementary, sequential or replacement evidence for randomized controlled trials in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4:49–62.
Kyriacou DN, Lewis RJ. Confounding by indication in clinical research. JAMA. 2016;316(17):1818–9.
Groenwold RH, Van Deursen AM, Hoes AW, Hak E. Poor quality of reporting confounding bias in observational intervention studies: a systematic review. Ann Epidemiol. 2008;18(10):746–51.
Hemkens LG, Contopoulos-Ioannidis D, Ioannidis JPA. Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey. BMJ. 2016;352:i493.
Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342:1887–92.
MacLehose RR, Reeves BC, Harvey IM, et al. A systematic review of comparisons of effect sizes derived from randomised and non-randomised studies. Health Technol Assess. 2000;4:1–154.
Anglemyer A, Horvath HT, Bero L. Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database Syst Rev. 2014;4:MR000034.
Methodology committee of the patient-centred outcomes research institute (PCORI). Methodological standards and patient centredness in comparative effectiveness research: the PCORI perspective. JAMA. 2012;307:1636–40.
Von Elm E, Altman DG, Egger M, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Ann Intern Med. 2007;147:573–7.
Sanderson S, Tatt ID, Higgins JP. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Int J Epidemiol. 2007;36:666–76.
Wells GA, Shea B, O’Connell D, et al. The Newcastle-Ottawa scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. 2008. http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp
Sterne JAC, Hernán MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.
Dreyer NA, Bryant A, Velentgas P. The GRACE checklist: a validated assessment tool for high quality observational studies of comparative effectiveness. J Manag Care Spec Pharm. 2016;22(10):1107–13.
Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336:924–6.
Shrier I, Boivin JF, Steele RJ, et al. Should meta-analyses of interventions include observational studies in addition to randomized controlled trials? a critical examination of underlying principles. Am J Epidemiol. 2007;166:1203–9.
Murad MH, Wang Z. Guidelines for reporting meta-epidemiological methodology research. Evid Based Med. 2017;22(4):139–42.
Scott I, Attia J. Cautionary tales in the interpretation of observational studies of effects of clinical interventions. Intern Med J. 2017;47(2):144–57.
Smith GL, Xu Y, Buchholz TA, et al. Association between treatment with brachytherapy vs whole-breast irradiation and subsequent mastectomy, complications, and survival among older women with invasive breast cancer. JAMA. 2012;307(17):1827–37.
Merie C, Køber L, Olsen PS, et al. Association of warfarin therapy duration after bioprosthetic aortic valve replacement with risk of mortality, thromboembolic complications, and bleeding. JAMA. 2012;308(20):2118–25.
Lund LH, Benson L, Dahlstrom U, Edner M. Association between use of renin-angiotensin system antagonists and mortality in patients with heart failure and preserved ejection fraction. JAMA. 2012;308(20):2108–17.
Bangalore S, Steg PG, Deedwania P, et al. β-Blocker use and clinical outcomes in stable outpatients with and without coronary artery disease. JAMA. 2012;308(13):1340–9.
Marin JM, Agusti A, Villar I, et al. Association between treated and untreated obstructive sleep apnea and risk of hypertension. JAMA. 2012;307(20):2169–76.
Goldberger ZD, Chan PS, Berg RA, et al. Duration of resuscitation efforts and survival after in-hospital cardiac arrest: an observational study. Lancet. 2012;380:1473–81.
Yunos NM, Bellomo R, Hegarty C, et al. Association between a chloride-liberal vs chloride-restrictive intravenous fluid administration strategy and kidney injury in critically ill adults. JAMA. 2012;308(15):1566–72.
Stoffels I, Boy C, Poppel T, et al. Association between sentinel lymph node excision with or without preoperative SPECT/CT and metastatic node detection and disease-free survival in melanoma. JAMA. 2012;308(10):1007–14.
Andersson C, Lyngbæk S, Nguyen CD, et al. Association of clopidogrel treatment with risk of mortality and cardiovascular events following myocardial infarction in patients with and without diabetes. JAMA. 2012;308(9):882–9.
Adams TD, Davidson LE, Litwin SE, et al. Health benefits of gastric bypass surgery after 6 years. JAMA. 2012;308(11):1122–31.
Shirani A, Zhao Y, Karim ME, et al. Association between use of interferon beta and progression of disability in patients with relapsing-remitting multiple sclerosis. JAMA. 2012;308(3):247–56.
Suri RM, Vanoverschelde J-L, Grigioni F, et al. Association between early surgical intervention vs watchful waiting and outcomes for mitral regurgitation due to flail mitral valve leaflets. JAMA. 2013;310(6):609–16.
Peterson PN, Varosy PD, Heidenreich PA, et al. Association of single- vs dual-chamber ICDs with mortality, readmissions, and complications among patients receiving an ICD for primary prevention. JAMA. 2013;309(19):2025–34.
London MJ, Hur K, Schwartz GG, Henderson WG. Association of perioperative β-blockade with mortality and cardiovascular morbidity following major noncardiac surgery. JAMA. 2013;309(16):1704–13.
All methods were carried out in accordance with relevant guidelines and regulations.
There was no funding source for this study.
Ethics approval and consent to participate
As this was a secondary analysis of pre-existing published literature that is in the public domain and which reports aggregated de-identified data that does not allow identification of individuals, no consent process involving individuals was required. No experimental protocols were used as this was a study of published literature and did not involve investigation of experimental interventions. As no data from private sources were sought, accessed or used, with all analyses involving data contained within the original publications, no institutional review board consent was required.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Cite this article
Grosman, S., Scott, I.A. Quality of observational studies of clinical interventions: a meta-epidemiological review. BMC Med Res Methodol 22, 313 (2022). https://doi.org/10.1186/s12874-022-01797-1