
Table 1 Quality criteria used for assessing observational studies of interventions

From: Quality of observational studies of clinical interventions: a meta-epidemiological review

Criteria

Explanations

Justification for observational design

1. RCTs non-existent or inadequate

2. RCTs not feasible

Given that RCTs have minimal vulnerability to bias, authors should provide, at the outset, reasons why they chose to perform an observational study rather than an RCT.

Minimisation of bias in study design and data collection

3. Pre-specified study protocol

4. Clearly stated patient selection criteria

5. Representative study population

6. Prospective and verifiable data collection

7. Validation checks for coded administrative data

8. Validation checks for longitudinal data linkage processes

9. Minimisation of recording bias in administrative data

10. Minimisation of recall bias

11. Minimisation of social desirability bias

12. Minimisation of surveillance bias in clinical registry data

13. Independent assessment of outcomes

Observational studies should emulate RCTs and pre-specify all aspects of the intended study.

Source and selection of the study population should be inclusive.

Study sample should be representative of all patients in whom the intervention may be used.

Observational studies often use routinely collected clinical data that were not originally intended for research purposes and hence collected with less attention to validity or reliability.

Data that is prospectively collected (ie in real time as care is provided, even if analysed later in retrospect) and capable of verification (ie data source is subject to curation and re-analysis) is more accurate. Data quality checks should include logic and range checks, level of agreement with random health record re-abstractions and ‘gold standard’ studies comparing data abstractions with standard clinical and laboratory criteria or expert panel review.

Coded data in administrative datasets should specify the code assignment process and validation procedures for ensuring accurate ascertainment of diagnoses.

Robust and validated data linkage processes need to be in place for collecting data on the same patients over time from different data sources.

Recording bias can affect administrative datasets used for reimbursement purposes as a result of ‘up-coding’ (inclusion of all possible diagnoses and procedures) to maximise revenue.

Recall bias occurs when exposures or outcomes are ascertained from self-reporting by patients who may selectively recall past events.

Social desirability bias occurs when patients self-report outcomes that they expect clinicians will want to hear or that accord with peer norms.

Surveillance (or detection) bias may affect clinical registries whereby outcome assessments occur more often than usual in persons receiving a particular intervention.

Outcomes such as clinical events should be adjudicated independently by clinical experts who are blind to intervention assignment. Mortality data should come from verifiable sources such as death registries.

Use of appropriate methods to create comparable groups

14. Appropriate statistical regression models for balancing populations

15. Model includes all important predictor variables

16. Appropriate selection and measurement of all important predictor variables

17. Majority of population sample included in analysis

18. Imputation methods for missing data

19. Comparison groups well balanced

Observational studies are prone to selection bias in clinician decision-making, which may relate to patient factors (age, gender, diagnosis or disease severity, frailty, cognitive function, physical capacity, personal preferences), clinician factors (level of training or expertise) and system of care factors (supportive infrastructure). The intervention groups must be balanced in terms of their likelihood (or propensity) to have received the intervention under study, so as to minimise confounding by indication.

Several statistical techniques can be used to adjust for confounding, with propensity score-based methods used most commonly. These scores define an individual's 'propensity', or probability between 0 and 1, of receiving the intervention, conditional on all factors likely to influence this decision (as above). Propensity scores are derived from regression models in which the intervention is the outcome (dependent variable) and the pre-intervention factors influencing whether patients receive the intervention are the predictors (independent variables).
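The regression described above can be sketched in code. The following is a minimal illustration and not part of the original table: the dataset, the covariate names and the hand-rolled gradient-ascent fit are all hypothetical stand-ins for what a real analysis would do with a statistics package.

```python
# Minimal propensity-score sketch (illustrative only). A logistic model is
# fitted with the intervention indicator as the dependent variable and two
# pre-intervention covariates as independent variables; the fitted
# probabilities are the propensity scores.
import math
import random

random.seed(0)

# Hypothetical data: each row is (age_z, severity_z), two standardised
# pre-intervention covariates; t = 1 if the patient received the intervention.
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
t = [1 if (0.8 * a + 0.5 * s + random.gauss(0, 1)) > 0 else 0 for a, s in X]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit beta = (intercept, b_age, b_severity) by gradient ascent on the
# log-likelihood of the logistic model.
beta = [0.0, 0.0, 0.0]
lr = 0.05
for _ in range(500):
    grad = [0.0, 0.0, 0.0]
    for (a, s), ti in zip(X, t):
        p = sigmoid(beta[0] + beta[1] * a + beta[2] * s)
        err = ti - p
        grad[0] += err
        grad[1] += err * a
        grad[2] += err * s
    beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]

# Propensity score = predicted probability of receiving the intervention.
scores = [sigmoid(beta[0] + beta[1] * a + beta[2] * s) for a, s in X]
```

In practice these scores are then used for matching, stratification, weighting or covariate adjustment; the sketch stops at estimation.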

The model should include all clinically relevant predictor variables as determined by clinical experts.

A clinically valid rationale as to why these predictors were chosen, and the methods used to ascertain and measure them should be provided.

These regression models should be applied to all or most of the study sample population to maintain study power.

Imputation methods should account for missing data, especially outcome data if the outcomes are infrequent (< 5% or 5 per 100 person-years).

The models should yield well balanced comparison groups as measured by standardised mean differences < 10% or variance ratios < 2.0.
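As a concrete illustration of the balance thresholds above — not part of the original table, and with hypothetical data — the standardised mean difference and variance ratio for a single covariate can be computed as:

```python
# Illustrative balance check: pooled-variance standardised mean difference
# (SMD) and variance ratio for one covariate across two groups.
from statistics import mean, variance

def smd(group_a, group_b):
    """Absolute standardised mean difference between two samples."""
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    return abs(mean(group_a) - mean(group_b)) / pooled_sd

def variance_ratio(group_a, group_b):
    """Ratio of the sample variances, larger over smaller."""
    va, vb = variance(group_a), variance(group_b)
    return max(va, vb) / min(va, vb)

# Hypothetical ages in two well-matched groups.
treated = [61, 64, 59, 70, 66, 63, 68, 60]
control = [62, 65, 58, 69, 67, 62, 67, 61]

# Thresholds from the table: SMD < 10% (0.10) and variance ratio < 2.0.
balanced = smd(treated, control) < 0.10 and variance_ratio(treated, control) < 2.0
```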

Appropriate adjustment of observed effects

20. Subgroup analyses and interaction testing for identifying independent prognostic variables

21. Avoidance of unplanned post-hoc analysis

22. Correction for multiple outcome analyses

23. Adjustment for clustering effects in multicentre studies

24. Adjustment for time-dependent bias

The observed effects of the intervention should be subject to subgroup analyses and statistical interaction testing to identify independent prognostic variables associated with greater or lesser intervention effects.

Unplanned post-hoc analyses (or ‘data dredging’) which are not well justified should be avoided as they may be biased by researchers’ knowledge of main outcomes.

The statistical significance of results of multiple analyses of several different outcomes should be corrected by appropriate methods (such as Bonferroni).
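The Bonferroni method named above can be sketched as follows; this is an illustration added here, not part of the original table, and the p-values are hypothetical.

```python
# Bonferroni correction: with m outcome analyses, each raw p-value is
# multiplied by m (capped at 1); equivalently, the significance threshold
# is divided by m.
def bonferroni(p_values):
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

raw = [0.04, 0.012, 0.30, 0.004]           # four hypothetical outcome analyses
adjusted = bonferroni(raw)                  # [0.16, 0.048, 1.0, 0.016]
significant = [p < 0.05 for p in adjusted]  # only two results survive correction
```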

Outcomes should be corrected for clustering effects if a study has collected data from multiple sites where the interventions were being delivered, unless there is reasonable assurance that patients, clinicians, intervention mode and outcome measures were uniform across sites.

Outcomes should be adjusted for time-varying covariates (eg level of disease severity or timing of exposure to interventions) that may influence intervention effects, so as to avoid immortal time bias and reverse causality.

Validation of observed effects

25. Large effect size

26. Exclusion of possible benefit in presence of negative results

27. Sensitivity analysis for unmeasured confounders

28. Plausibility of intervention mechanism of action

29. Temporal relation between intervention and outcomes

30. Dose-response relationship

31. Consistency with other studies of same intervention

32. Coherence with other studies of similar interventions

33. Falsification test for intervention effect specificity

Effect sizes should be reasonably large, representing a high signal-to-noise ratio that provides a buffer against residual confounding. There is no validated threshold but we have chosen RR or OR ≤ 0.5 (see text).

In studies that report a point estimate of no benefit or harm, the 95% confidence interval for the effect size should not cross the line of unity, which would suggest a possible benefit that the study was unable to uncover due to inadequate power, large numbers of drop-outs, or biases in data collection, patient selection, or analytic methods.

Sensitivity analyses should be performed to assess how prevalent and influential an unknown or unmeasured confounder would have to be to attenuate or annul the observed effect. Quantitative bias analysis or E-value calculations are accepted methods.
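The E-value mentioned above has a closed form for a point-estimate risk ratio: E = RR + √(RR × (RR − 1)), after inverting protective risk ratios. A minimal sketch, added for illustration (the observed RR is hypothetical):

```python
# E-value: the minimum strength of association, on the risk-ratio scale,
# that an unmeasured confounder would need with both intervention and
# outcome to explain away an observed risk ratio.
import math

def e_value(rr):
    """E-value for a point-estimate risk ratio."""
    rr = 1.0 / rr if rr < 1 else rr   # protective effects: invert first
    return rr + math.sqrt(rr * (rr - 1.0))

# A hypothetical observed RR of 0.5 (the large-effect threshold in this
# table): a confounder would need RR >= 2 + sqrt(2) ~ 3.41 with both
# intervention and outcome to fully explain it away.
ev = e_value(0.5)
```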

A cause and effect relationship between intervention and observed outcomes is more likely if they satisfy the following Bradford-Hill causality criteria:

Plausible mechanism of action that explains how the intervention results in observed outcomes

Credible temporal relation between when the intervention is implemented and when the outcome is observed

Increasing therapeutic response with increasing intensity or dose of the intervention

Consistency of results with those reported in other trials (randomised and non-randomised) of the same intervention in similar populations

Coherence of results with those reported in trials of similar interventions

Falsification (or effect specificity) test, whereby manifestations of another disease condition on which the intervention can exert no plausible effect are compared between groups, with the expected result of no difference.

Authors’ interpretations

34. Study limitations acknowledged

35. Impartial statement of study implications

Given the vulnerability of observational studies to bias, authors should be candid and exhaustive in stating the limitations of their study, with particular emphasis on bias in patient selection and the adequacy of methods for balancing groups.

For the same reasons, authors should interpret their findings cautiously, not overstate their significance or their implications for clinical practice, and indicate when their results should be confirmed by additional studies, in particular RCTs.