
Table 1 Methods to enhance causal inference in observational research

From: Are there non-linear relationships between alcohol consumption and long-term health?: a systematic review of observational studies employing approaches to improve causal inference


Relevant sub-methods


Can address:




Reverse causality

Selection bias

Measurement error


Analytical methods applied to traditional longitudinal study designs

Propensity scores (PS)

[11, 19]

Covariate balancing propensity scores (CBPS)

-The PS is a single value reflecting the probability of exposure for an individual given their values on all relevant covariates

-PS generation occurs as a data ‘pre-processing’ step prior to main analysis

-Usually generated via logistic regression

-Once generated, the PS can be used for matching, stratification, weighting, or as a covariate for adjustment in regression


-Unlike standard methods, can handle large numbers of covariates

-Not reliant on correctly modelling covariate-outcome relationships

-Covariate balance after matching/weighting can be assessed

-Still relies on appropriate choice of covariates and accurate measurement

-PS matching may not find matches for some treatment cases, leading to reduced sample size and limiting generalizability

-Effect estimation with PS does not always perform better than regular adjustment

-Most PS methods rely on manual covariate balance checking and refitting
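The weighting use of the PS described above can be sketched in a few lines. This is a minimal illustration, not the approach of any cited study: the toy cohort, the single binary covariate `c`, and all function names are assumptions, and the PS is estimated as a within-stratum proportion rather than by the logistic regression that is usual in practice.

```python
from collections import defaultdict


def make_cohort():
    """Hypothetical cohort with y = 1 + 2*a + 3*c, so the true exposure effect is 2."""
    counts = {(0, 1): 10, (0, 0): 30, (1, 1): 30, (1, 0): 10}  # (c, a) -> n
    return [
        {"c": c, "a": a, "y": 1 + 2 * a + 3 * c}
        for (c, a), n in counts.items()
        for _ in range(n)
    ]


def propensity_scores(rows):
    """P(exposed | c), estimated as the exposed fraction within each stratum of c."""
    total, exposed = defaultdict(int), defaultdict(int)
    for r in rows:
        total[r["c"]] += 1
        exposed[r["c"]] += r["a"]
    return {c: exposed[c] / total[c] for c in total}


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def naive_difference(rows):
    """Unadjusted difference in mean outcome, exposed minus unexposed."""
    y1 = [r["y"] for r in rows if r["a"] == 1]
    y0 = [r["y"] for r in rows if r["a"] == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def ipw_difference(rows):
    """Inverse-probability-weighted difference in mean outcome."""
    ps = propensity_scores(rows)
    exposed = [r for r in rows if r["a"] == 1]
    unexposed = [r for r in rows if r["a"] == 0]
    mean1 = weighted_mean([r["y"] for r in exposed],
                          [1 / ps[r["c"]] for r in exposed])
    mean0 = weighted_mean([r["y"] for r in unexposed],
                          [1 / (1 - ps[r["c"]]) for r in unexposed])
    return mean1 - mean0


cohort = make_cohort()
print(naive_difference(cohort))  # confounded by c
print(ipw_difference(cohort))    # weighting balances c across exposure groups
```

In this toy cohort the unadjusted difference is 3.5, while the weighted difference recovers the built-in effect of 2 (up to floating-point rounding), because the weights balance `c` across exposure groups.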




G-methods

-A family of methods intended for use with time-dependent variables

-Developed as a solution to the problem of time-varying covariates affected by past exposure, including those that act as both confounders and mediators over time

-The three G-methods are the G-formula, marginal structural models, and G-estimation, each relying on its own modelling assumptions


-Unlike standard methods to control for confounding, G-methods do not fix values of covariates, thus do not block mediation via the covariate, and avoid introducing collider bias

-Accounting for changes in variables over time mitigates misclassification

-Still relies on appropriate choice of covariates and accurate measurement



G-formula (aka G-computation or G-standardization) [21, 23, 25, 26]

-First models relationship given observed data (using actual exposure for each individual), and then predicts outcomes under counterfactual exposures, with the difference taken as the causal effect

-Is a generalization of standardization (conditioning on covariates and then marginalizing) that accounts for dynamic variables by considering covariate distribution over follow-up time


(specifically, bias due to censoring)

-Can be used to calculate risk ratios and risk differences

-Well-suited to assess time-varying exposures

-Vulnerable to the ‘g-null paradox’: null hypotheses tend to be rejected in large studies even when true

-Usually requires specifying a statistical model, hence also being known as the ‘parametric’ G-formula
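The two steps of the G-formula (model the outcome under observed exposure, then predict under counterfactual exposures and take the difference) can be illustrated for the simplest case. The sketch below assumes a single time point and one binary covariate `c`, and uses cell means in place of a parametric outcome model; the time-varying setting the method is designed for would need a model per wave. The data and function names are hypothetical.

```python
from collections import defaultdict


def make_cohort():
    """Hypothetical cohort with y = 1 + 2*a + 3*c + 2*a*c.

    The exposure effect is 2 when c = 0 and 4 when c = 1; half the cohort has
    c = 1, so the marginal effect is 3.
    """
    counts = {(0, 1): 10, (0, 0): 30, (1, 1): 30, (1, 0): 10}  # (c, a) -> n
    return [
        {"c": c, "a": a, "y": 1 + 2 * a + 3 * c + 2 * a * c}
        for (c, a), n in counts.items()
        for _ in range(n)
    ]


def fit_outcome_model(rows):
    """Step 1: mean outcome for each observed (exposure, covariate) cell."""
    sums, ns = defaultdict(float), defaultdict(int)
    for r in rows:
        key = (r["a"], r["c"])
        sums[key] += r["y"]
        ns[key] += 1
    return {key: sums[key] / ns[key] for key in ns}


def standardized_mean(rows, model, a):
    """Step 2: predict every person's outcome with exposure set to `a`, then average."""
    return sum(model[(a, r["c"])] for r in rows) / len(rows)


def g_formula_effect(rows):
    """Difference between the standardized means under exposure and no exposure."""
    model = fit_outcome_model(rows)
    return standardized_mean(rows, model, 1) - standardized_mean(rows, model, 0)


cohort = make_cohort()
print(g_formula_effect(cohort))
```

Averaging predictions over the whole cohort's covariate distribution is what makes this a marginal effect; here it returns the built-in marginal effect of 3.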


Marginal structural models (MSMs)

[21, 23, 25]

-Use weights based on inverse probability of exposure at each time point to create a pseudo-population where each combination of covariates is equally present in each exposure condition

-Using these weights, MSMs then estimate the causal effect

-The most popular of the G-methods


(specifically, bias due to censoring)

-Simplest of the G-methods to understand and implement

-Can also integrate censoring weights to account for differential attrition

-Not ideal for assessing exposure-confounder interactions, with standard MSMs unable to estimate interactions involving dynamic variables

-Requires checking weight distribution, may require refitting (as with PS methods)

-Cannot be used if all participants are exposed/unexposed on a particular level of a confounder
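The weighting step of an MSM can be sketched as follows. This is an illustrative assumption about one common form (stabilized weights), not a description of any cited analysis: the per-wave probabilities are treated as given inputs that would in practice come from fitted exposure models, and the function names are hypothetical.

```python
def stabilized_weight(waves):
    """Stabilized inverse-probability weight for one participant.

    `waves` holds one (numerator, denominator) pair per wave, each being the
    probability of the exposure level the participant actually received:
      numerator   = P(exposure | past exposure)
      denominator = P(exposure | past exposure, covariates)
    """
    weight = 1.0
    for numerator, denominator in waves:
        if denominator <= 0.0:
            # Nobody with this covariate history received this exposure level,
            # so the pseudo-population cannot be constructed.
            raise ValueError("positivity violated: zero probability of observed exposure")
        weight *= numerator / denominator
    return weight


def summarize_weights(weights):
    """Mean and maximum weight, a basic check before fitting the weighted model."""
    return sum(weights) / len(weights), max(weights)


# Hypothetical participant followed over two waves.
print(stabilized_weight([(0.5, 0.25), (0.5, 0.5)]))
```

The `ValueError` branch corresponds to the limitation noted above: if everyone at some level of a confounder is exposed (or unexposed), the denominator is zero and no weight exists. A mean weight far from 1 or a very large maximum is a common prompt to revisit the exposure model.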


G-estimation of structural nested models


-At each wave assesses the relationship between exposure and likelihood of outcome given covariates, adjusting for exposure and covariate values from past waves, thus accounting for dynamic confounders affected by past exposure

-Considered semi-parametric in that mean counterfactual outcomes under no exposure are unspecified


-Can be used even if all participants are exposed/unexposed on a particular level of a confounder

-Can be used to assess exposure-confounder interactions

-Unlike the other G-methods, cannot account for selection bias arising from censoring, so data requires preliminary weighting to account for bias from censoring

Doubly robust methods


Targeted maximum likelihood estimation; Augmented inverse probability weighting

-Incorporates both an estimation of the outcome mechanism (as in regression adjustment) and the exposure mechanism (as in propensity scores)


-Only the outcome mechanism or the exposure mechanism need be consistently estimated to generate an unbiased estimate of exposure effect

-Still relies on appropriate choice of covariates and accurate measurement
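The double-robustness property can be demonstrated with augmented inverse probability weighting on toy data. In this sketch the cohort, the deliberately wrong models, and all names are assumptions made for illustration; the "models" are simple functions of one binary covariate `c` rather than fitted regressions.

```python
def make_cohort():
    """Hypothetical cohort with y = 1 + 2*a + 3*c, so the true exposure effect is 2."""
    counts = {(0, 1): 10, (0, 0): 30, (1, 1): 30, (1, 0): 10}  # (c, a) -> n
    return [
        {"c": c, "a": a, "y": 1 + 2 * a + 3 * c}
        for (c, a), n in counts.items()
        for _ in range(n)
    ]


def aipw_effect(rows, ps, m1, m0):
    """AIPW estimate of the mean difference.

    ps(c): modelled P(a = 1 | c)           (exposure mechanism)
    m1(c), m0(c): modelled mean outcome under exposure / no exposure
                                            (outcome mechanism)
    """
    n = len(rows)
    mu1 = sum(m1(r["c"]) + r["a"] * (r["y"] - m1(r["c"])) / ps(r["c"])
              for r in rows) / n
    mu0 = sum(m0(r["c"]) + (1 - r["a"]) * (r["y"] - m0(r["c"])) / (1 - ps(r["c"]))
              for r in rows) / n
    return mu1 - mu0


def true_ps(c):
    return 0.75 if c == 1 else 0.25


def wrong_ps(c):
    return 0.5  # ignores the covariate


def true_m1(c):
    return 3 + 3 * c


def true_m0(c):
    return 1 + 3 * c


def wrong_m(c):
    return 0.0  # ignores both covariate and exposure


cohort = make_cohort()
print(aipw_effect(cohort, true_ps, wrong_m, wrong_m))    # exposure model right
print(aipw_effect(cohort, wrong_ps, true_m1, true_m0))   # outcome model right
print(aipw_effect(cohort, wrong_ps, wrong_m, wrong_m))   # both wrong
```

The first two calls recover the built-in effect of 2 (up to rounding) even though one component is misspecified; with both components wrong, the estimate falls back to the confounded value of 3.5.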

Fixed effects regression [27, 28, 29, 30, 31]


-A technique developed in the econometrics literature for use with longitudinal data with repeat outcome measurements, only using information on within-subject variation, thus controlling for all time-invariant sources of confounding

-Treats time-invariant characteristics that differ between individuals as fixed parameters (unlike in mixed models), allowing estimation of parameters of interest net of stable confounders

-Each participant serves as own control


-Removes the threat of all observed and unobserved time-invariant confounding

-Models can be extended to include time-varying covariates

-Cannot overcome time-varying confounding without extending the model, and these variables must be observed/measured

-Individuals with stable exposure values do not contribute to estimates; also leads to imprecision when exposures change little over time

-Reducing confounding comes at the cost of more sampling variability

-Cannot generate parameter estimates for stable characteristics like race
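One common way to obtain a fixed effects estimate is the "within" transformation: subtract each participant's own mean from their exposure and outcome, then regress the demeaned outcome on the demeaned exposure. The sketch below assumes that estimator, a single exposure, and a hypothetical three-person panel in which y = (person-specific constant) + 2x.

```python
def within_slope(panel):
    """Fixed effects (within) slope: OLS on person-demeaned exposure and outcome.

    `panel` maps each person to a list of (x, y) observations across waves.
    """
    sxy = sxx = 0.0
    for observations in panel.values():
        x_bar = sum(x for x, _ in observations) / len(observations)
        y_bar = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


def pooled_slope(panel):
    """Ordinary pooled OLS slope that ignores who each observation belongs to."""
    observations = [pair for rows in panel.values() for pair in rows]
    x_bar = sum(x for x, _ in observations) / len(observations)
    y_bar = sum(y for _, y in observations) / len(observations)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in observations)
    sxx = sum((x - x_bar) ** 2 for x, _ in observations)
    return sxy / sxx


# Hypothetical panel: stable person-level constants are 0, 10 and 5.
panel = {
    "A": [(1, 2), (2, 4), (3, 6)],
    "B": [(4, 18), (5, 20), (6, 22)],
    "C": [(2, 9), (2, 9), (2, 9)],  # exposure never changes
}
print(within_slope(panel))
print(pooled_slope(panel))
```

The within estimator returns the built-in slope of 2, whereas the pooled slope is inflated (about 4.05) because the person-level constants are correlated with exposure. Person C's demeaned values are all zero, so that participant adds nothing to the estimate, which mirrors the limitation about stable exposure values.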

Causal mediation analysis [32, 33, 34]


-Integrates traditional mediation analysis (which separately estimates total effect of exposure on outcome, indirect effect via mediators, and direct effect unexplained by mediators) with the potential outcomes framework to allow for exposure-mediator interaction and non-linear relationships (i.e., is a non-parametric method)

-Uses the concepts of ‘controlled direct effect’, ‘natural direct effect’, and ‘natural indirect effect’

-Makes explicit underlying assumptions related to unmeasured confounding, and encourages sensitivity analyses to test robustness to assumption violations

(see advantages and limitations*)


-Effect decomposition is still possible given exposure-mediator interaction, nonlinearity, and categorical variables

-Makes underlying assumptions explicit

-Helps identify the mechanism/s of an exposure’s effect; especially useful when heterogeneous causal mechanisms are at play

-Can be extended to situations with multiple mediators

-*If a variable completely mediates the exposure-outcome relationship and is shielded from confounders, confounder measurement is not needed

-*In practice, likely there will always be exposure-mediator or mediator-outcome confounding, so still need to observe and accurately measure covariates

-Analyses make strong assumptions, necessitating sensitivity analyses
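For reference, the three effects named above are usually written in potential-outcomes notation as follows. The notation is an assumption introduced here, not taken from the table: a binary exposure with levels 0 and 1, $Y(a, m)$ for the outcome that would be observed with exposure set to $a$ and mediator set to $m$, and $M(a)$ for the mediator value that would arise under exposure $a$.

```latex
\begin{align}
\text{Controlled direct effect:}\quad \mathrm{CDE}(m) &= E[\,Y(1, m) - Y(0, m)\,] \\
\text{Natural direct effect:}\quad \mathrm{NDE} &= E[\,Y(1, M(0)) - Y(0, M(0))\,] \\
\text{Natural indirect effect:}\quad \mathrm{NIE} &= E[\,Y(1, M(1)) - Y(1, M(0))\,] \\
\text{Total effect:}\quad \mathrm{TE} &= E[\,Y(1, M(1)) - Y(0, M(0))\,] = \mathrm{NDE} + \mathrm{NIE}
\end{align}
```

The decomposition in the last line holds by adding and subtracting $E[Y(1, M(0))]$, which is why it remains available when exposure and mediator interact.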

Alternative observational study designs

Natural experiments

[17, 35, 36, 37]


-Mimic RCTs by exploiting exogenous events that are truly randomized or that approximate random assignment

-Differ from true experiments in that exposure is not assigned by the researcher

-Assignment may be as a result of naturally occurring phenomena (e.g., a weather event), or of human intervention implemented for reasons other than the research question (e.g., army draft lottery)


-In approximating randomization, obviates the need for accurate measurement of confounders

-Potential to overcome measurement error, reverse causation, and selection bias

-Rare to find truly random or as-if random exposure assignment


Standard natural experiments

-Natural experiments where individuals are randomly or as-if randomly assigned to exposure and control groups


-May be difficult to find a standard natural experiment that maps on well to the actual research question of interest


Instrumental variable analysis

-Assesses the relationship between a randomly or as-if randomly assigned proxy for the exposure of interest and the outcome

-A valid instrumental variable must be associated with the exposure of interest, be independent of confounders of the exposure-outcome relationship, and should affect the outcome only via the exposure

-Useful when the exposure itself is difficult to manipulate or measure

-Difficult to find valid instrumental variables

-Potential for weak instrument bias, i.e., when the instrument explains only a small amount of variance in exposure

-Relies on assumption that the instrumental variable is not associated with exposure-outcome confounding
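With a single instrument, the idea can be sketched with the ratio (Wald) estimator: the instrument-outcome covariance divided by the instrument-exposure covariance. The toy data below are an assumption built so that the instrument `z` is independent of an unmeasured confounder `u`, exposure is x = z + u, and outcome is y = 2x + 3u; the function names are hypothetical.

```python
def covariance(u, v):
    """Population covariance of two equal-length sequences."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / len(u)


def iv_estimate(z, x, y):
    """Ratio (Wald) estimator: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


def ols_slope(x, y):
    """Ordinary regression slope of y on x, which ignores the confounder."""
    return covariance(x, y) / covariance(x, x)


# Hypothetical data: u is never used by the estimators, only to generate x and y.
z = [0, 0, 1, 1]
u = [0, 1, 0, 1]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]

print(ols_slope(x, y))       # confounded by u
print(iv_estimate(z, x, y))  # uses only variation in x induced by z
```

Here the ordinary slope is 3.5, while the instrumental variable estimate recovers the built-in effect of 2. The denominator `covariance(z, x)` also shows why weak instruments are a problem: when it is close to zero, small errors in the numerator are greatly magnified.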


Genetic instrumental variables

-A kind of instrumental variable analysis that uses genetic variants as proxies for exposure

-The most prominent technique is Mendelian Randomisation

-Genes cannot be confounded by environment, cannot be subject to reverse causality, and are stable over time

-Multiple variants can be combined to explain more variance in exposure, mitigating weak instrument bias

-Genetic instrumental variables are proxies for lifelong exposure - this period may be longer than what the research question is interested in

-Potential for weak instrument bias, pleiotropy (gene affecting more than one phenotype) and linkage disequilibrium (genes more likely to be inherited together)

-Possible population stratification (there may be population subgroups with different distributions of genes)

Quasi-experiments [35]


-Like natural experiments, exploit exogenous events to assess relationships between exposures and outcomes, but lack random or as-if random assignment


-Same as for natural experiments

-Without random assignment, confounding is still possible, i.e., the cause of exposure may also contribute to the outcome

Family-based designs

[17, 36]

Twin studies; Sibling comparison

-By comparing genetically related participants discordant for the exposure of interest, accounts for confounding from genetic or shared environmental sources


-Controls for unmeasured/unobservable confounding (for shared covariates)

-Comparing monozygotic and dizygotic twins enhances understanding of genetic vs. environmental confounding

-Still need to observe and accurately measure non-shared environmental covariates to control for this kind of confounding

-May be difficult to find family members discordant for exposure of interest

Negative controls [18, 36]

Negative control exposures; Negative control outcomes

-Have the same confounding structures as the exposure-outcome relationship of interest, but lack a plausible causal mechanism

-If association is greater for the relationship of interest than for the negative control, a causal relationship is likely; if not, suggests confounding/other shared biases responsible

-May take the form of a negative control exposure or a negative control outcome


(specifically, immortal time bias)


-Can identify when confounding (or assumed shared bias) is responsible for apparent causal effects

-Does not require observation/measurement of covariates

-Relies on assumption of no plausible causal mechanism in the negative control relationship

-Relies on assumption that the same confounding structure is shared by the relationship of interest and the negative control relationship