Confounding-adjustment methods for the causal difference in medians

Background With continuous outcomes, the average causal effect is typically defined using a contrast of expected potential outcomes. However, in the presence of skewed outcome data, the expectation (population mean) may no longer be meaningful. In practice the typical approach is to continue defining the estimand this way or transform the outcome to obtain a more symmetric distribution, although neither approach may be entirely satisfactory. Alternatively the causal effect can be redefined as a contrast of median potential outcomes, yet discussion of confounding-adjustment methods to estimate the causal difference in medians is limited. In this study we described and compared confounding-adjustment methods to address this gap. Methods The methods considered were multivariable quantile regression, an inverse probability weighted (IPW) estimator, weighted quantile regression (another form of IPW) and two little-known implementations of g-computation for this problem. Methods were evaluated within a simulation study under varying degrees of skewness in the outcome and applied to an empirical study using data from the Longitudinal Study of Australian Children. Results Simulation results indicated the IPW estimator, weighted quantile regression and g-computation implementations minimised bias across all settings when the relevant models were correctly specified, with g-computation additionally minimising the variance. Multivariable quantile regression, which relies on a constant-effect assumption, consistently yielded biased results. Application to the empirical study illustrated the practical value of these methods. Conclusion The presented methods provide appealing avenues for estimating the causal difference in medians. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-023-02100-6.


Introduction
Causal inference is a central goal of health research, aiming to assess how intervening on a given exposure impacts an outcome of interest.[1] In a perfect randomized controlled trial (i.e., with no loss to follow-up) causal effects can in principle be directly estimated by comparing the average outcome in those randomised to each exposure level.However, in observational studies estimation of causal effects requires more sophisticated methods, in particular to adjust for potential confounding due to the lack of randomisation.
With continuous outcomes, the average causal effect is typically defined as a contrast of the expected potential outcome under exposure versus under no exposure (i.e., the estimand in the hypothetical target trial we seek to emulate with observational data [2,3]).However, epidemiological studies may suffer from skewed outcome data, for which the expectation may no longer be interpretable as the central value of the distribution.Examples of skewed outcomes are abundant in health research, particularly in areas using scale scores for measurement (e.g., self-reported quality of life via the PedsQL [4], childhood behaviour via the SDQ [5]), time-to-event outcomes in the absence of censoring (e.g., survival time), or duration of events (e.g., breastfeeding duration).
When faced with this challenge in practice, the typical approach is to either "ignore or transform" -ignore the skewness in the data entirely and continue to define the estimand as a contrast of expected potential outcomes, or transform the outcome to obtain a more symmetric distribution for which the expectation is interpretable as the centre value.Both approaches have their advantages, although neither is entirely satisfactory in many practical settings.Ignoring the skewness may be appropriate when the expected potential outcomes are of direct interest, allowing established confounding-methods to be applied [1], although this may not reflect the central value of the outcome distribution.Transformation of the outcome to be more symmetrically distributed could be an alternative solution.However, this relies on a suitable transformation existing (which may not be feasible for highly skewed distributions) and makes interpretation of the causal effects more complex than interpretation in the original scale (e.g., log-years instead of years).
An appealing alternative could be to define the causal effect using a contrast of median potential outcomes (i.e., the causal difference in medians).In fact, the causal effect has been generally defined as a contrast of any functional of the distributions of counterfactual outcomes under different exposure values.[6] However, despite being a widely acknowledged concept, there is limited availability and awareness of confounding-adjustment methods to estimate the causal difference in medians in practice.A handful of previous studies have acknowledged the need for such methods, and presented derivations of approaches to estimate causal effects as contrasts of distribution quantiles more generally.A study by Zhang et al. (2012) derived a number of methods -a quan-tile regression estimator, an inverse-probability weighted estimator, and a stratified estimator using propensity scores.[7] A more recent study in the field of environmental science defined a novel "overlap weighting" estimator using a class of balancing weights from functions of the propensity score model to weight each group to a selected target population.[8] An immediate suggestion may be to use multivariable quantile regression as explored previously [7,8], although this relies on the strict constant-effect assumption which may be too simplistic in practice.Despite these advances, application of these methods in epidemiological research remains scarce.
The current discussion and evaluation of such methods is relatively limited, which has potentially led to a lack of awareness in their existence.Furthermore, to the best of our knowledge, the use of g-computation in the context of medians has not been widely discussed, let alone studied in relation to other approaches.In addition, there has been limited investigation of how these methods perform and compare in realistic settings across various scenarios, specifically in terms of the degree of skewness in the outcome which has only been explored minimally and not for all the methods considered here.[8] In this paper we aim to describe and compare confounding-adjustment methods to estimate the causal difference in medians, intending to increase understanding of their utility and encourage application in practice where appropriate.We begin this paper by defining the causal effect of interest alongside an illustrative example from the Longitudinal Study of Australian Children (LSAC) [9], before outlining the confounding-adjustment methods considered.We then report findings from a simulation study motivated by the LSAC example, in addition to demonstration of the methods applied to the LSAC data.We conclude by summarising the key findings, strengths, limitations and practical recommendations.

Defining the causal effect using medians
Consider an observational study with continuous skewed outcome variable Y , a binary exposure variable A, and a vector of K confounder variables C. We assume C includes only binary or continuous variables, noting that categorical confounders can be represented as a set of binary indicators.For simplicity of discussion, we have restricted A to be binary and assume that no variable is subject to missingness.
The example used throughout this paper involves data from 4882 children from a longitudinal cohort study (LSAC).[9,10] Children aged 4-5 years were recruited in 2004 (wave 1; approved by the Australian Institute of Family Studies Ethics Committee), with follow-ups every two years in subsequent waves.The example investigation examined the impact of maternal mental health on a child's behaviour in early childhood.The exposure (A) was a binary indicator of probable serious We define Y a to be the potential outcome when the exposure is set to level a.In the LSAC example, Y a represents the SDQ score for a child when their mother is set to have a probable serious mental illness (a = 1) versus not (a = 0).Here we define the shorthand notation m a to denote the median (med ) potential outcome under exposure level A = a, such that m a = med[Y a ] and m a ∈ R.
Therefore, the causal difference in medians, denoted by δ, is defined as the difference between the median potential outcomes under the two exposure levels: For the LSAC example, δ represents the difference in median SDQ scores if all children were exposed to maternal mental health problems compared to if none of them were exposed.
The causal difference in medians is identifiable from observational data under the assumptions of consistency, conditional exchangeability given C and positivity (assumptions 1-3; see supplementary material for further detail), as has been shown elsewhere.[13,8] Whether these assumptions hold in practice is a matter of debate, however, for the remainder of this paper we assume these conditions do hold.

Confounding-adjustment methods
Under the aforementioned assumptions, the causal difference in medians δ can in principle be estimated from observable data using methods that adjust for potential confounding.Here we introduce the confounding-adjustment methods investigated in our study, focusing on their implementation in practice.

Multivariable quantile regression
Multivariable regression is a commmon approach to estimate the average causal effect, adjusting for confounding through conditioning on the confounders.To estimate δ, a natural adaptation is to use quantile regression (QR); a method for modelling the quantiles of the distribution of a random variable conditional on a set of covariates.[14] When applying this approach, a QR model is fitted for the outcome variable Y conditional on both the exposure A and confounder variables C, with the τ th quantile of Y modelled as By setting τ = 0.5, the coefficient of the exposure variable β 1 (0.5) encodes the difference in the conditional median outcome between exposure groups for every level of C (confounder strata), i.e., Under assumptions (1-3), the assumption of a constant causal effect across confounder strata and assuming the QR model is correctly specified, the estimated exposure coefficient β1 (0.5) has been shown to be a consistent estimator for the causal effect δ. [8] Quantile regression is a widely-applied and accessible method in practice, with implementation readily available in statistical software (e.g., the quantreg package in R [15]).However, the assumption of a constant causal effect across confounder strata may be too simplistic.

IPW estimator
An alternative approach uses the framework of inverse probability weighting (IPW) to create a pseudo-population in which the distribution of C is balanced between exposure groups, such that the association between A and Y in the pseudo-population provides an unbiased estimate for the causal effect of A on Y .[16,17] To create the pseudo-population, observations are re-weighted in a way that is inversely proportional to the probability of the observed exposure conditional on the confounding variables via calculation of inverse probability (IP) weights.These probabilities are estimated from a model for the propensity score (PS) defined as π(c) = P (A = 1|C = c).
A study by Zhang et al. (2012) derived an IPW estimator for the causal difference in medians and other quantiles of the potential outcome distribution.[7] Applying this approach in the context of medians, m a is estimated as the solution to the equation where Ŵa,i denotes the estimated weight for observation i = 1, . . ., n under exposure level A i = a.
The weights are defined as W a,i = I(A i = a)/[nP (A i = a|c i )] and calculated using estimates of the propensity score, π(c), obtained via a suitable regression model (e.g., a logistic regression model for A conditional on C).Following the suggestion of Zhang et al. ( 2012), normalised weights Ŵ * a,i are preferred to improve finite-sample performance [7], and are calculated by dividing each weight by the sum of all weights in the associated exposure group (see supplementary material for mathematical formulation).
To obtain the estimates m1 and m0 , Equation 3 is solved for each exposure level a.In practice, statistical software can be used to solve this equation (e.g., via the uniroot function in R).Under assumptions (1-3) and assuming that the propensity score model is correctly specified, the difference between these two values consistently estimates the causal difference in medians δ.

Weighted quantile regression
An alternative implementation of IPW uses IP weights to fit a weighted quantile regression (QR) model, weighting the score equations of the regression as opposed to the observed outcomes (as is done via the IPW estimator).Using this approach, a univariable QR model for the τ th quantile of Y is specified as and fit using the estimated IP weights as outlined for the IPW estimator above.By setting τ = 0.5, the coefficient of the exposure variable encodes the difference in medians between each exposure level in the pseudo-population.Under assumptions (1-3) and assuming the propensity score model is correctly specified, the estimate β * 1 (0.5) has been shown to be a consistent estimator for the causal difference in medians δ. [8] Unlike the previous IPW estimator, which needs to be hand-coded at present, implementation of weighted QR is readily available within software packages (e.g., the weights argument within the rq function in R [15,18]).

G-computation
G-computation is another popular confounding-adjustment method, arising from the g-formula which states that under assumptions (1-3), the marginal density of Y a can be identified from observable data as [19] where the outer expectation is over C, where f is defined as the density.Intuitively, the right-hand side is standardising the conditional density under an exposure value to the distribution of the confounders in the whole sample, thus addressing the imbalance in confounders between exposure groups due to non-randomisation, which enables a contrast under exposure values comparable.This result can be used to identify any functional of the marginal density of Y a .
When interest is in the median potential outcome m a , the g-formula implies that m a can be identified as the solution to Here we note the term inside the integral is the expectation of a density, and thus the median potential outcome is not identified by a simple aggregation across the sample (i.e., the median of the conditional medians or expectations).Instead, estimation of m a using g-computation requires estimation of the conditional density within the integral.Next we describe two possible implementations of g-computation for estimation of m a -a Monte Carlo integration-based approach and an approximate approach, denoted as g-comp (MC) and g-comp (approx), respectively.

G-comp (MC)
The first implementation, g-comp (MC), uses Monte Carlo simulation to perform draws from the density f Y a (y), which can then be used to estimate m a .[20] Specifically, we posit a model for the conditional density of Y given A = a and C. We then repeatedly draw from the expectation over C of this density, i.e., corresponding to draws from f Y a (y) (based on Equation 5).The median potential outcome under exposure value a is estimated as the median of these draws.[20] Implementing this approach requires a model for the distribution of Y conditional on A and C (referred to as the outcome model).Given the skewed nature of the outcome in this setting, one possible approach is to assume that Y ∼ A, C follows an approximate log-normal distribution, with the mean of the underlying normal distribution dependent on A and C. The g-comp (MC) method would therefore be implemented as follows: 1. Fit a linear model for log(Y ) conditional on A and C to the observed data.
2. Using the fitted linear model, obtain predictions of the mean outcome (on the log scale) for every observation (where i = 1, ..., n) twice: (a) Set each observation to be exposed (i.e., set (b) Set each observation to be unexposed (i.e., set 3. For a = 1, 0, repeatedly perform R draws for each observation from a log-normal distribution parametrised with the mean (on the log scale) equal to μa i and standard deviation σ equal to the estimated residual deviance of the model fitted in step 1.
4. For a = 0, 1, the median of the combined R samples drawn for each of the n observations are used to obtain an estimate of the median potential outcome ( ma ).[20]

G-comp (approx)
In practice, the above approach may be computationally intensive, particularly for a data set with a large number of observations.Therefore, an alternative approach, g-comp (approx), approximates E[f Y |A,C (y|a, c)] by obtaining estimates of it across a grid of candidate y values, denoted as y * , and then solves Equation 6 numerically to obtain the estimated median potential outcome.This approach again requires a model for the density and an assumed distribution for Y ∼ A, C (e.g., log-normality).
When implementing this approach, steps 1 and 2 are performed as outlined above.For a = 0, 1 and for each candidate y * , we then estimate fY |A,C (y * |a, c i ) for each record i by assuming a lognormal density with the mean (on the log scale) equal to μa i and standard deviation σ (as defined in step 3 above).Averaging the conditional densities over the sample yields the estimated expectation for candidate value y * : By repeating this process for every candidate value y * , the expectation within the integral in Equation 6 is estimated across the range of Y .This integral is then approximated by adding up these values cumulatively, and finding the minimum value of y * for which this sum is equal to 0.5 to estimate m a .
For both g-computation implementations, the difference between the estimates m1 and m0 consistently estimates the causal difference in medians under assumptions (1-3) and assuming that the outcome model is correctly specified.[20]

Standard error estimation
For all confounding-adjustment methods outlined here, it is advised that standard errors and confidence intervals (CIs) are estimated using bootstrap procedures (using the percentile method) based on previous recommendations [7].

Simulation Study
A simulation study was conducted motivated by the LSAC example to investigate the performance of each confounding-adjustment method in a realistic setting and under varying degrees of skewness in the outcome variable.

Design of the simulation study
We generated 1000 datasets consisting of 1000 records for each of four skewness scenarios considered.The true causal difference in medians δ in each scenario was computed by empirical methods [8] (see supplementary material for further details and true values used).Coefficients in the outcome-generating model were modified to ensure the true value was large enough to be estimable with the given sample size with adequate power (approximately 80%) in an unadjusted analysis.
The confounding-adjustment methods were applied to each simulated dataset to estimate δ, alongside an unadjusted contrast of sample medians across exposure groups.Specific details about model specifications and implementation are provided in supplementary material, although here we note that both the propensity score model and outcome model were correctly specified (i.e., consistent with the data generation approach).
For each confounding strength and skewness scenario, metrics assessing the performance of each method were calculated using the formulae in Morris et al. ( 2019) [21] and averaged over the 1000 samples, with key focus on bias in the point estimate.Monte Carlo standard errors were estimated for each metric.All analysis was conducted in R 4.0.2[18], using the quantreg [15] and boot [22,23] packages within self-developed code (available at https://github.com/daisyshep/CI-medians. git), with the IPW estimator implemented using R code supplied in the method's source paper.[7] Results from the simulation study Estimates for δ differed depending on the method used, with the variation in estimates tending to increase with increasing skewness for both confounding strengths (Figure 1).Estimates obtained using multivariable QR were biased across all skewness scenarios (relative bias range: 6.6% to 8.1% weak confounding, 9.0% to 14.4% strong confounding; Table 2); an expected result given the method's strict assumption of a constant causal effect across stratum which did not hold in the data generating mechanism.In contrast, methods which relaxed this assumption (the IPW estimator, weighted QR and both implementations of g-computation) performed well across both confounding strength settings, with minimal bias in estimates for δ (relative bias < 5% in the majority of skewness scenarios).
Both g-comp (MC) and g-comp (approx) had similar relative bias to one another across all simulation scenarios (i.e., differing by <0.1% between implementations), with g-comp (approx) being quicker computationally to implement.Furthermore, both approaches yielded lower empirical standard errors comparatively to all other methods across all settings, therefore minimising both the bias and variance simultaneously.
There was little bias in estimating the standard error (SE) with the bootstrap for all methods, with a slight overestimation for the IPW and weighted QR estimator.In all settings, the coverage probability was close to nominal level for all confounding-adjustment methods (range: 93.60% to 97.70%).

Application to LSAC
Methods were applied to the LSAC data using complete cases only (n = 3245), with specific details about their implementation provided in supplementary material.All methods estimated the median SDQ score for children exposed to maternal mental health problems to be higher than the median SDQ score for children if they were not exposed (Figure 2), suggesting moderately increased behavioural problems in early childhood for children of mothers with mental illness.
Estimated effects were more consistent across the methods than observed within the simulation study, although multivariable QR yielded lower estimates of δ than the other methods.The IPW estimator and weighted QR produced the same point estimates and bootstrap CIs to one another, as did the g-computation approaches, which estimated a slightly smaller causal effect comparatively.

Discussion
In the presence of skewed outcome data, the common approach to "ignore or transform" may not be optimal, and defining the causal effect using a contrast of median potential outcomes may be more appropriate.Despite being a widely acknowledged concept, there is scarce availability and awareness of confounding-adjustment methods to estimate this parameter in practice.A handful of previous studies have proposed approaches to estimate causal effects as contrasts of distribution  quantiles more generally [8,7], but investigation of these methods and their application in health and medical studies remain scarce.
In this paper we aimed to address this gap by describing and evaluating methods identified from previous literature (multivariable quantile regression [8], an IPW estimator [7] and weighted quantile regression [8]) alongside two implementations of g-computation that, to the best of our knowledge, have not been widely described or studied alongside other methods.The confoundingadjustment methods were selected and described with a key focus on their accessibility and ease of implementation in a practical setting, with code made available, to encourage their use in practice where applicable.
Results from the simulation study indicated varied performance of the confounding-adjustment methods when estimating the causal difference in medians.As anticipated, the multivariable QR was too simplistic for the realistic setting reflected in our simulated datasets and produced biased estimates.The IPW estimator, weighted QR and both implementations of g-computation yielded estimates with minimal bias, with g-computation additionally minimising the variance in estimates; an expected observation as IPW estimates tend to be more variable than those obtained via gcomputation.[24] We also note that the g-comp (approx) implementation was computationally more efficient than g-comp (MC) provided a suitable y * range is used.
These findings need to be interpreted in light of the fact that under our data generation approach, both the propensity score model (used for the IPW estimator and weighted QR) and outcome model (used for g-computation) were correctly specified; a critical assumption when applying the singly robust methods as noted by Zhang et al. (2012) in the case of IPW.[7] However, in the context of the medians, the outcome model for g-computation relates to specifying a model for the whole density.In practice, correctly specifying this model may be harder to achieve than a correctly specified propensity score model, and thus could be considered a stronger assumption than for the weighted methods.
A strength of this work was the design of our simulation study motivated by the LSAC example, allowing us to investigate the performance of these methods in a realistic scenario.Further we investigated varying skewness distributions in the outcome variable, alongside two different strengths of confounding bias present in the datasets, resulting in a more complex and realistic study than those explored in previous papers.[7] Additionally, our inclusion of the g-computation approach (under two implementations) in an accessible and clear manner, has brought light to a little discussed approach.
A potential limitation of our study was the restriction to singly robust methods only, which rely on correct specification of the respective model.Previous studies have presented a handful of promising doubly robust methods, which combine both a model for the outcome and a model for the exposure and rely on only one of the models being correctly specified to obtain an unbiased estimator.[7,25,26,27] Doubly robust methods have not yet been evaluated in a range of complex and realistic scenarios as is done here for singly robust methods, with their implementation not as accessible or readily available in software and therefore their application remains scarce in practice.
Both of these factors would be useful to pursue in future work.We also note the outcome distributions explored within this study were all heavily right-skewed, so results may represent an extreme representation of the methods' performance under heavily skewed outcome data.
In conclusion, when estimating the causal difference in medians the IPW estimator, weighted QR or g-computation present promising approaches, provided a richly specified model is used such that correct specification of the propensity score model or outcome model is likely.Implementation of the IPW estimator and weighted QR methods is readily available and therefore accessible to implement (e.g., rq function in R [18], open-source code of the IPW estimator [7]).Implementations of the g-computation approach are not as readily available, but we have provided source code which can guide practitioners in the implementation of this method.Overall, these methods provide appealing alternatives to the common "ignore or transform" approach or the stringent constanteffect assumption of multivariable quantile regression, enhancing our capability to obtain meaningful causal effect estimates with skewed outcome data.

S1
The LSAC case study where f Y a (y) is the density function of Y a , or alternatively as the 50 th centile of the cumulative distribution function (CDF) of Y a , F Y a (y) = P (Y a ≤ y).Under the assumption that the CDF is continuous and strictly increasing for a ∈ 0, 1, then m a = F −1 Y a (0.5).Therefore the causal difference in medians, denoted by δ, is defined as the difference between the median (denoted as med) potential outcomes under the two exposure levels:

S2.1 Identifiability assumptions
The causal difference in medians is identifiable from observational data under the following identifiability assumptions.Firstly we require the consistency assumption, which states that the exposure A corresponds to a well-defined intervention that in turn corresponds to the versions of the exposure in the data.[1] Under these conditions, the potential outcome Y a is well-defined and would be equal to Y if an individual received exposure level A = a (assumption 1).Secondly, we require the conditional exchangeability assumption given the selected set of confounders, which states that the potential outcome Y a is independent of the received exposure A given C, i.e., Y a ⊥ ⊥ A|C (assumption 2).Thirdly, we require the positivity assumption, which states that every individual in the population has a positive probability of being exposed or unexposed, that is P (A = a|C = c) > 0, for all c with positive probability of occurring (assumption 3).

S3 IPW estimator
Here we provide further explanation and details supporting the IPW estimator outlined in the main manuscript.The IPW estimator in Equation 3 from the main manuscript can be derived based on the following reasoning.Given assumptions (1-3) hold (as outlined in Section S2.1 above), then for a given exposure level a the cumulative distribution function is equal to [7] Under each exposure level a ∈ {0, 1} and given estimates of the denominator probabilities from the fitted propensity score model π(c), the expectation can be estimated using the sample average [16,28] and can be regarded as the weighted empirical distribution of Y .[7] The IPW estimator of the median outcome value m a is therefore defined as the solution to Equation 3 in the main manuscript.It is important to note that Equation 3 may not have a unique solution, and therefore m a can be estimated by the value which minimises the difference between the two sides.[7] S0.0.1 S3.1 Specification of normalised weights In place of weights W a,i in Equation 3, normalised weights W * a,i are advised to be use to improve finite-sample performance.[7] These normalised weights are calculated by dividing each weight by the sum of all weights in the associated exposure group, such that with the estimated values Ŵ * a,i replacing Ŵa,i in Equation 3 (in the main manuscript).

S4.3 True causal difference in medians
Within the simulation study, the true causal difference in medians δ, with respect to which assess bias, was computed by empirical methods outlined in the supplementary material of Sun et al. (2021).[8] Specific details of our implementation are as follows.Initially we generated a large dataset (1,000,000 observations) per skewness scenario and strength of confounding bias (weak or strong).For each large dataset, a quantile regression model (including all main effects and exposureconfounder interactions) was fitted under different quantiles τ ; we used 200 different values for τ equally distributed across the range [0.05, 0.95].For our vector of m a support values, we covered the range [3,8] in increments of 0.005.Different values of δ were obtained for each skewness scenario and under the two different strengths of confounding bias, as outlined below in Table S2.

S4.4 Implementation of confounding-adjustment methods in simulation study
Here we outline the model specification and implementation of each confounding-adjustment method investigated in our study.For the multivariable quantile regression approach, the outcome model included A and C as predictors with main effect terms only, as is the default in most software.
For both the IPW estimator and the weighted quantile regression approaches, the propensity score model regressed A on the confounders including main effect terms only.For the g-computation approaches, the outcome model was specified as log(Y ) conditional on A and C, including main effect terms and two exposure-confounder interaction terms.Here we note that both the propensity score model and the outcome model were correctly specified (i.e., consistent with the data generation approach).For g-comp (MC), we performed R = 1000 draws per observation.For g-comp (approx), the candidate y * values ranged over the values [0.01, 8] in increments of 0.01.

S4.5 Monte Carlo standard errors
Table S3: Monte Carlo standard errors of performance estimates calculated over the 1000 simulated datasets per skewness scenario and confounding bias strength under each of the confoundingadjustment methods.

S5 Implementation of confounding-adjustment methods in LSAC example
For the multivariable quantile regression approach, the outcome model included A and C (consisting of 9 confounders; Table 1) as predictors with main effect terms only.The propensity score model (used for the IPW estimator and weighted quantile regression) regressed A on C including main effect terms only.For both g-comp (MC) and g-comp (approx), the outcome model was specified as log(Y ) conditional on A and C, including two-way interaction terms between A and three confounders (child's sex, maternal completion of high school, consistent parenting) deemed plausible based on substantive knowledge.g-comp (MC), we performed R = 1000 draws per observation.
For g-comp (approx), the vector of candidate y * values ranged over the values [0.01, 18] in increments of 0.01.

For
each scenario, dataset and record, five confounder variables C k for k = 1, . . ., 5 (three binary and two continuous), a binary exposure A and a skewed continuous outcome Y were generated based on variables in the LSAC data set (see supplementary Table S1 for a full outline of each variable and their generating distribution).Values for log(Y ) were generated from a normal distribution with the mean defined by a linear regression model including A and C as predictors.Different skewed distributions in Y were established by setting the standard deviation in the generating normal distribution for log(Y ) to σ = 0.75, 1, 1.25, 1.5 signifying increasing skewness scenarios.Values for log(Y ) were exponentiated to obtain the outcome value Y , with the distribution of Y being positively skewed (supplementary FigureS2).Here we note that σ characterises other properties of the outcome distribution beyond the measure of skewness.Therefore, findings across skewness scenarios are not expected to behave in a monotonic pattern.Data was generated under two different confounding settings -weak confounding bias (approximately 10% relative bias in the unadjusted estimate relative to the true value) and strong confounding bias (approximately 20% relative bias).Modifications to coefficients in the outcome-generating model were used to achieve this.

Figure 1 :
Figure 1: Calculated bias in causal difference estimates obtained under each method, skewness scenario (strength of skewness increases with scenario number, with scenario 1 and 4 corresponding to the weakest and highest skewness, respectively) and confounding setting in the simulated datasets (1000 datasets per skewness scenario) analysed under each method.

Figure 2 :
Figure 2: Estimated causal difference in medians δ for the Longitudinal Study Of Australian Children (LSAC) example[9, 10] obtained under each confounding-adjustment method, where δ = m1 − m0 is the estimated difference in median SDQ scores if all children were exposed to maternal mental health problems compared to if none of them were exposed.Point estimates and their corresponding 95% confidence intervals are presented alongside the figure.

Figure S1 :
Figure S1: Distribution of the Strengths & Difficulties questionnaire (SDQ) scores at wave 3 for the Longitudinal Study Of Australian Children (LSAC) case study.[9,10]

S4. 1
Data generationFor each scenario, dataset and record, five confounder variables C k for k = 1, . . ., 5 (three binary and two continuous) were generated sequentially in order based on variables in the LSAC data set.Observations for the binary confounders C 1 , C 3 and C 4 (based on sex, maternal education and financial hardship, respectively) were generated from a binomial distribution with the success probability defined by a logistic regression model using all previous confounders as predictors.For the continuous confounders C 2 and C 5 (maternal age and log-transformed baseline SDQ, respectively) values were generated from normal distributions with the means defined by a linear regression model including all previous confounders as predictors.Here we note none of these models included interaction terms, as outlined in TableS1below.Secondly, a binary exposure A was generated for each record from a binomial distribution with success probability defined by a logistic regression model including all confounders as main effects.A skewed continuous outcome Y was then generated for each record.Values for log(Y ) were generated from a normal distribution with the mean defined by a linear regression model including the exposure and all confounders as predictors.The linear regression model was specified to include all main effects and two exposure-confounder interaction terms based on interactions observed in the LSAC example.Different skewed distributions in the outcome variable were established by setting the standard deviation in the generating normal distribution for log(Y ) to σ = 0.75, 1, 1.25, 1.5 for each of the four increasing skewness scenarios, respectively.Values for log(Y ) were then exponentiated to obtain the outcome value Y , with the distribution of Y being positively skewed (FigureS2).

Figure S2 :
Figure S2: Distribution of the outcome variable (Y ) under each skewness scenario and confounding bias strength used within the simulation study.

Table 1 :
[9,10]ew of variables from the Longitudinal Study of Australian Children (LSAC) example[9,10]examining the impact of maternal mental health on a child's behavioural difficulties in early childhood.The exposure and confounders are recorded at wave 1 (2004), with the outcome variable recorded at wave3 (2008).

Table 2 :
Performance of confounding-adjustment methods across confounding and skewness scenarios in the simulation study with maximum Monte Carlo standard errors (SE) provided in the table footnote (see supplementary Table S3 for all Monte Carlo SEs).

Table S1 :
Detail on the models used to generate variables for the simulation study, with the structure based on the LSAC dataset unless otherwise specified.

Table S2 :
True values for the causal difference in medians δ obtained under each skewness scenario and confounding setting.