
An improved multiply robust estimator for the average treatment effect

Abstract

Background

In observational studies, double robust and multiply robust (MR) approaches provide more protection against model misspecification than inverse probability weighting and g-computation for estimating the average treatment effect (ATE). However, these approaches are based on parametric models and therefore yield biased estimates when all of the models are incorrectly specified. Nonparametric methods, such as machine learning and nonparametric double robust approaches, are robust to model misspecification, but their efficiency is low.

Method

In this study, we proposed an improved MR method that combines parametric and nonparametric models, building on a previous MR method (Han, JASA 109(507):1159-73, 2014), to improve both robustness to model misspecification and efficiency. We performed comprehensive simulations to evaluate the performance of the proposed method.

Results

Our simulation study showed that the MR estimators using only outcome regression (OR) models, one of which is a nonparametric model, are the most recommended because they are robust to model misspecification and achieve the lowest root mean square error (RMSE) when a correct parametric OR model is included. Moreover, the performance of the recommended estimators remained comparable even when all parametric models were misspecified. As an application, the proposed method was used to estimate the effect of social activity on depression levels in the China Health and Retirement Longitudinal Study dataset.

Conclusions

The proposed estimator, which combines nonparametric and parametric models, is more robust to model misspecification.


Background

The primary goal of much applied research is to estimate the causal effect of a specific treatment (exposure or intervention) on an outcome. In randomized controlled trials, where treatments are randomly assigned to participants, the average treatment effect (ATE) can be estimated by directly comparing outcomes between the treatment and control groups [1]. In observational studies, however, covariates are usually unbalanced between the treatment and control groups because treatment assignment is not randomized. As a result, a direct comparison of outcomes between the treatment and control groups may lead to a biased estimate of the ATE [2].

Inverse probability weighting (IPW) with a propensity score (PS) model and direct confounding adjustment (known as g-computation) with an outcome regression (OR) model are the standard approaches to handling confounding bias [1, 3, 4]. Compared with IPW and g-computation, doubly robust approaches provide double protection against model misspecification [5,6,7]. However, doubly robust approaches do not offer sufficient protection for estimating the ATE in practice, as they allow only one PS model and one OR model. Recently, multiply robust (MR) approaches, which increase the likelihood of including a correct model, have been proposed for estimating the ATE or a population mean with missing values [8,9,10,11]. In particular, the previous MR approach [9] is robust against extreme values of the fitted probability of receiving treatment. However, because it considers only parametric models, the previous MR approach [9] may yield biased estimates when all of the included parametric models are incorrectly specified.

In addition, there is growing interest in developing nonparametric methods for estimating the ATE to protect against model misspecification. Machine learning, a general term for a diverse set of nonparametric algorithms, is particularly useful for classification and prediction and has been used to estimate the ATE [12,13,14,15,16,17]. However, the root mean square error (RMSE) of machine learning estimators tends to be higher than that of a correctly specified parametric model, possibly due to suboptimal hyperparameter settings [18,19,20]. Nonparametric double robust methods based on kernel smoothing [21, 22] or targeted minimum loss [23] have also been proposed to estimate the ATE. Yet the efficiency of these estimators is limited by their slow convergence rates.

In this study, building on the previous MR approach [9], we proposed an improved MR approach that considers both parametric and nonparametric models to improve robustness to model misspecification. Our simulation study showed that the proposed MR approach is more robust to model misspecification than the previous MR approach, and that the MR estimators using only OR models, one of which is a nonparametric model, were the best among all MR estimators for two reasons. First, these estimators were robust to model misspecification and had the lowest root mean square errors (RMSEs) when a correct parametric OR model was included. Second, their performance remained comparable even when all parametric models were misspecified.

Method

Notation and assumptions

Let \({{\varvec{X}}}_{i}\) be a \(p\)-dimensional vector of covariates, \({Y}_{i}\) be the observed outcome, and \({Z}_{i}\) be the treatment status, taking the value 1 if treated and 0 if untreated. Let (\({Y}^{1}, {Y}^{0}\)) be the two potential outcomes under treatment and control, respectively. The ATE is defined as

$$\Delta =E\left({Y}^{1}\right)-E\left({Y}^{0}\right)$$

To draw valid causal inferences in this study, we assume that the exchangeability, consistency, and positivity assumptions hold [24].

Previous multiply robust method

The previous MR approach proposed by Han [9] provides multiple layers of protection against model misspecification. Specifically, two sets of parametric models are specified: \(\mathcal{P}=\left\{{\pi }^{l}\left({\varvec{X}}\right),l=1, 2, 3,\dots ,L\right\}\) for the propensity score and \(\mathcal{M}=\left\{{m}_{z}^{k}\left({\varvec{X}}\right),k=1, 2, 3,\dots ,K\right\}\) for the outcome, where \({m}_{z}^{k}\left({\varvec{X}}\right)={m}^{k}\left({\varvec{X}},Z\right)\). Without loss of generality, let \({\mathbb{I}}=\left\{1,\dots ,{n}_{1}\right\}\) and \({\mathbb{J}}=\left\{1,\dots ,{n}_{0}\right\}\) be the index sets for treated and untreated subjects, respectively, where \({n}_{1}\) and \({n}_{0}=n-{n}_{1}\) denote the sizes of the treatment and control groups.

To recover the treated population average from subjects in the treatment group, the empirical likelihood weights \({w}_{i}\left(i\in {\mathbb{I}}\right)\) for the outcome \({Y}_{i}\left(i\in {\mathbb{I}}\right)\) in the treatment group are estimated by maximizing \(\prod_{i\in {\mathbb{I}}}{w}_{i}\) subject to the following constraints:

$${w}_{i}\ge 0 (i\in {\mathbb{I}})$$
$$\sum_{i\in {\mathbb{I}}}{w}_{i}=1$$
$$\sum_{i\in {\mathbb{I}}}{w}_{i}{\widehat{\pi }}^{l}\left({{\varvec{X}}}_{i}\right)={\widehat{\theta }}_{1}^{l}(l=1, 2, 3,\dots ,L)$$
$$\sum_{i\in {\mathbb{I}}}{w}_{i}{\widehat{m}}_{1}^{k}\left({{\varvec{X}}}_{i}\right)={\widehat{\eta }}_{1}^{k}(k=1, 2, 3,\dots ,K)$$

where \({\widehat{\theta }}_{1}^{l}={n}^{-1}{\sum }_{i=1}^{n}{\pi }^{l}\left({{\varvec{X}}}_{i}\right)\) and \({\widehat{\eta }}_{1}^{k}={n}^{-1}{\sum }_{i=1}^{n}{m}_{1}^{k}\left({{\varvec{X}}}_{i}\right)\). By symmetry, the weights \({w}_{j}\left(j\in {\mathbb{J}}\right)\) for the control group are given by maximizing \(\prod_{j\in {\mathbb{J}}}{w}_{j}\) according to the following constraints:

$${w}_{j}\ge 0 (j\in {\mathbb{J}})$$
$$\sum_{j\in {\mathbb{J}}}{w}_{j}=1$$
$$\sum_{j\in {\mathbb{J}}}{w}_{j}({1-\widehat{\pi }}^{l}\left({{\varvec{X}}}_{j}\right))={\widehat{\theta }}_{0}^{l}(l=1, 2, 3,\dots ,L)$$
$$\sum_{j\in {\mathbb{J}}}{w}_{j}{\widehat{m}}_{0}^{k}\left({{\varvec{X}}}_{j}\right)={\widehat{\eta }}_{0}^{k}(k=1, 2, 3,\dots ,K)$$

where \({\widehat{\theta }}_{0}^{l}={n}^{-1}{\sum }_{i=1}^{n}(1-{\pi }^{l}\left({{\varvec{X}}}_{i}\right))\) and \({\widehat{\eta }}_{0}^{k}={n}^{-1}{\sum }_{i=1}^{n}{m}_{0}^{k}\left({{\varvec{X}}}_{i}\right)\). The \({w}_{i}\) and \({w}_{j}\) are given by the Lagrange multiplier method as follows:

$${\widehat{w}}_{i}=\frac{1}{{n}_{1}}\frac{1}{1+{\widehat{{\varvec{\rho}}}}_{1}^{T}{\widehat{{\varvec{g}}}}_{1}\left({{\varvec{X}}}_{i}\right)} \left(i\in {\mathbb{I}}\right)$$
$${\widehat{w}}_{j}=\frac{1}{{n}_{0}}\frac{1}{1+{\widehat{{\varvec{\rho}}}}_{0}^{T}{\widehat{{\varvec{g}}}}_{0}\left({{\varvec{X}}}_{j}\right)} \left(j\in {\mathbb{J}}\right)$$

where

$${\widehat{{\varvec{g}}}}_{1}{\left({\varvec{X}}\right)}^{T}=\left\{{\widehat{\pi }}^{1}\left({\varvec{X}}\right)-{\widehat{\theta }}_{1}^{1},\dots ,{\widehat{\pi }}^{L}\left({\varvec{X}}\right)-{\widehat{\theta }}_{1}^{L}, {\widehat{m}}_{1}^{1}\left({\varvec{X}}\right)-{\widehat{\eta }}_{1}^{1},\dots ,{\widehat{m}}_{1}^{K}\left({\varvec{X}}\right)-{\widehat{\eta }}_{1}^{K}\right\}$$
$${\widehat{{\varvec{g}}}}_{0}{\left({\varvec{X}}\right)}^{T}=\left\{\left(1-{\widehat{\pi }}^{1}\left({\varvec{X}}\right)\right)-{\widehat{\theta }}_{0}^{1},\dots ,\left(1-{\widehat{\pi }}^{L}\left({\varvec{X}}\right)\right)-{\widehat{\theta }}_{0}^{L}, {\widehat{m}}_{0}^{1}\left({\varvec{X}}\right)-{\widehat{\eta }}_{0}^{1},\dots ,{\widehat{m}}_{0}^{K}\left({\varvec{X}}\right)-{\widehat{\eta }}_{0}^{K}\right\}$$

\({\widehat{{\varvec{\rho}}}}_{1}\) and \({\widehat{{\varvec{\rho}}}}_{0}\) are the \((L+K)\)-dimensional Lagrange multipliers solving

$$\frac1{n_1}\sum\limits_{i\in\mathbb{I}}\frac{{\widehat{{\varvec{g}}}}_1\left(\boldsymbol{X}_i\right)}{1+{\widehat{{\varvec{\rho}}}}_1^T{\widehat{{\varvec{g}}}}_1\left(\boldsymbol{X}_i\right)}=0$$
$$\frac1{n_0}\sum\limits_{j\in\mathbb{J}}\frac{{\widehat{{\varvec{g}}}}_0\left(\boldsymbol{X}_j\right)}{1+{\widehat{{\varvec{\rho}}}}_0^T{\widehat{{\varvec{g}}}}_0\left(\boldsymbol{X}_j\right)}=0$$

\({\widehat{{\varvec{\rho}}}}_{1}\) and \({\widehat{{\varvec{\rho}}}}_{0}\) must satisfy \(1+{\widehat{{\varvec{\rho}}}}_{1}^{T}{\widehat{{\varvec{g}}}}_{1}\left({{\varvec{X}}}_{i}\right)>0\) and \(1+{\widehat{{\varvec{\rho}}}}_{0}^{T}{\widehat{{\varvec{g}}}}_{0}\left({{\varvec{X}}}_{j}\right)>0\) due to the non-negativity of \({w}_{i}\) and \({w}_{j}\), respectively. The \({\widehat{w}}_{i}\) and \({\widehat{w}}_{j}\) can be computed with the Newton–Raphson algorithm [9].

In summary, the ATE estimated by the previous MR method [9] is defined as

$${\widehat{\Delta }}_{mr(Han)}=\sum_{i\in {\mathbb{I}}}{\widehat{w}}_{i}{Y}_{i}-\sum_{j\in {\mathbb{J}}}{\widehat{w}}_{j}{Y}_{j}$$
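To make the computation concrete, the following R sketch (our own illustration rather than code from [9]; the function name `el_weights` and the argument `G` are placeholders) solves the estimating equation for the Lagrange multipliers by Newton-Raphson and returns the empirical likelihood weights for one group, given the matrix whose rows are the centred fitted values \({\widehat{{\varvec{g}}}}_{1}\left({{\varvec{X}}}_{i}\right)\) (or \({\widehat{{\varvec{g}}}}_{0}\left({{\varvec{X}}}_{j}\right)\)):

```r
# Minimal sketch (assumed implementation, not the authors' code).
# G: n x (L + K) matrix whose i-th row is the centred vector g(X_i) for one group.
# Solves (1/n) * sum_i g_i / (1 + rho' g_i) = 0 for rho by Newton-Raphson and
# returns the weights w_i = (1/n) / (1 + rho' g_i).
el_weights <- function(G, max_iter = 100, tol = 1e-8) {
  n   <- nrow(G)
  rho <- rep(0, ncol(G))                          # starting value rho = 0
  for (iter in seq_len(max_iter)) {
    denom <- as.vector(1 + G %*% rho)
    if (any(denom <= 0)) stop("1 + rho'g <= 0: shrink the step or check the models")
    score <- colSums(G / denom) / n               # left-hand side of the estimating equation
    if (max(abs(score)) < tol) break
    jac <- -crossprod(G / denom) / n              # Jacobian: -(1/n) sum g g' / (1 + rho'g)^2
    rho <- rho - solve(jac, score)                # Newton-Raphson update
  }
  w <- 1 / (n * (1 + as.vector(G %*% rho)))
  list(rho = rho, w = w)
}
```

In practice a damped (step-halving) update is often used to keep \(1+{\widehat{{\varvec{\rho}}}}^{T}\widehat{{\varvec{g}}}>0\) throughout the iterations; the sketch simply stops if the constraint is violated.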

Proposed multiply robust method

The previous MR method [9] allows multiple parametric models, increasing the likelihood of including a correct model. However, the ATE estimate is still biased when all parametric models are misspecified. Building on the previous MR method, the proposed MR method allows multiple parametric models and additionally includes a nonparametric PS model and a nonparametric OR model.

We selected the neural network (NNET) as the nonparametric model in the proposed MR method. NNET, a machine learning algorithm, has been used to estimate the ATE [15, 16]. We specified a three-layer NNET (input layer, one hidden layer, output layer), which may be practical [12], with 4 nodes in the hidden layer. We fit the NNET using the nnet R package with default parameters. In summary, the proposed MR method adds an NNET-based outcome regression model (NN-OR) and an NNET-based propensity score model (NN-PS) to the previous MR method.
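As an illustration, the NN-PS and NN-OR fits could be obtained as follows (a sketch under our own naming; the data frame `dat` and the objects `pi_nn`, `m1_nn`, and `m0_nn` are placeholders, and the covariate names follow the simulation design described below):

```r
library(nnet)

# NN-PS: one hidden layer with 4 nodes; with a 0/1 response and the default
# logistic output unit, the fitted values lie in (0, 1).
fit_nnps <- nnet(Z ~ X1 + X2 + X3 + X4 + X5 + X6 + X7,
                 data = dat, size = 4, trace = FALSE)
pi_nn <- as.vector(predict(fit_nnps, newdata = dat, type = "raw"))   # fitted P(Z = 1 | X)

# NN-OR: same architecture with a linear output unit (linout = TRUE) for the
# continuous outcome; predict with Z set to 1 and 0 to get m_1(X) and m_0(X).
fit_nnor <- nnet(Y ~ X1 + X2 + X3 + X4 + X8 + X9 + X10 + Z,
                 data = dat, size = 4, linout = TRUE, trace = FALSE)
m1_nn <- as.vector(predict(fit_nnor, newdata = transform(dat, Z = 1)))
m0_nn <- as.vector(predict(fit_nnor, newdata = transform(dat, Z = 0)))
```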

Similar to the previous MR method [9], we specify two sets of models, \(\mathcal{P}=\left\{{\pi }^{l}\left({\varvec{X}}\right),l=1, 2, 3,\dots ,L,L+1\right\}\) for the propensity score and \(\mathcal{M}=\left\{{m}_{z}^{k}\left({\varvec{X}}\right),k=1, 2, 3,\dots ,K,K+1\right\}\) for the outcome. Here \({\pi }^{L+1}\left({\varvec{X}}\right)\) and \({m}_{z}^{K+1}({\varvec{X}})\) are the NN-PS and NN-OR models, respectively, and the others are parametric models. The empirical likelihood weights \({w}_{i}\left(i\in {\mathbb{I}}\right)\) for the outcome \({Y}_{i}\left(i\in {\mathbb{I}}\right)\) in the treatment group are estimated by maximizing \(\prod_{i\in {\mathbb{I}}}{w}_{i}\) subject to the following constraints:

$${w}_{i}\ge 0 (i\in {\mathbb{I}})$$
$$\sum_{i\in {\mathbb{I}}}{w}_{i}=1$$
$$\sum_{i\in {\mathbb{I}}}{w}_{i}{\widehat{\pi }}^{l}\left({{\varvec{X}}}_{i}\right)={\widehat{\theta }}_{1}^{l}(l=1, 2, 3,\dots ,L,L+1)$$
$$\sum_{i\in {\mathbb{I}}}{w}_{i}{\widehat{m}}_{1}^{k}\left({{\varvec{X}}}_{i}\right)={\widehat{\eta }}_{1}^{k}(k=1, 2, 3,\dots ,K,K+1)$$

where \({\widehat{\theta }}_{1}^{l}={n}^{-1}{\sum }_{i=1}^{n}{\pi }^{l}\left({{\varvec{X}}}_{{\varvec{i}}}\right)\) and \({\widehat{\eta }}_{1}^{k}={n}^{-1}{\sum }_{i=1}^{n}{m}_{1}^{k}\left({{\varvec{X}}}_{i}\right)\). By symmetry, the weights \({w}_{j}\left(j\in {\mathbb{J}}\right)\) for the control group are given by maximizing \(\prod_{j\in {\mathbb{J}}}{w}_{j}\) according to the following constraints:

$${w}_{j}\ge 0 (j\in {\mathbb{J}})$$
$$\sum_{j\in {\mathbb{J}}}{w}_{j}=1$$
$$\sum_{j\in {\mathbb{J}}}{w}_{j}({1-\widehat{\pi }}^{l}\left({{\varvec{X}}}_{j}\right))={\widehat{\theta }}_{0}^{l}(l=1, 2, 3,\dots ,L,L+1)$$
$$\sum_{j\in {\mathbb{J}}}{w}_{j}{\widehat{m}}_{0}^{k}\left({{\varvec{X}}}_{j}\right)={\widehat{\eta }}_{0}^{k}(k=1, 2, 3,\dots ,K,K+1)$$

where \({\widehat{\theta }}_{0}^{l}={n}^{-1}{\sum }_{i=1}^{n}(1-{\pi }^{l}\left({{\varvec{X}}}_{i}\right))\) and \({\widehat{\eta }}_{0}^{k}={n}^{-1}{\sum }_{i=1}^{n}{m}_{0}^{k}\left({{\varvec{X}}}_{i}\right)\). The \({w}_{i}\) and \({w}_{j}\) are given by the Lagrange multiplier method as follows:

$${\widehat{w}}_{i}=\frac{1}{{n}_{1}}\frac{1}{1+{\widehat{{\varvec{\rho}}}}_{1}^{T}{\widehat{{\varvec{g}}}}_{1}\left({{\varvec{X}}}_{i}\right)} \left(i\in {\mathbb{I}}\right)$$
$${\widehat{w}}_{j}=\frac{1}{{n}_{0}}\frac{1}{1+{\widehat{{\varvec{\rho}}}}_{0}^{T}{\widehat{{\varvec{g}}}}_{0}\left({{\varvec{X}}}_{j}\right)} \left(j\in {\mathbb{J}}\right)$$

where

$${\widehat{{\varvec{g}}}}_{1}{\left({\varvec{X}}\right)}^{T}=\left\{{\widehat{\pi }}^{1}\left({\varvec{X}}\right)-{\widehat{\theta }}_{1}^{1},\dots ,{\widehat{\pi }}^{L}\left({\varvec{X}}\right)-{\widehat{\theta }}_{1}^{L},{\widehat{\pi }}^{L+1}\left({\varvec{X}}\right)-{\widehat{\theta }}_{1}^{L+1}, {\widehat{m}}_{1}^{1}\left({\varvec{X}}\right)-{\widehat{\eta }}_{1}^{1},\dots ,{\widehat{m}}_{1}^{K}\left({\varvec{X}}\right)-{\widehat{\eta }}_{1}^{K},{\widehat{m}}_{1}^{K+1}\left({\varvec{X}}\right)-{\widehat{\eta }}_{1}^{K+1}\right\}$$
$${\widehat{{\varvec{g}}}}_{0}{\left({\varvec{X}}\right)}^{T}=\left\{\left(1-{\widehat{\pi }}^{1}\left({\varvec{X}}\right)\right)-{\widehat{\theta }}_{0}^{1},\dots ,\left(1-{\widehat{\pi }}^{L}\left({\varvec{X}}\right)\right)-{\widehat{\theta }}_{0}^{L}, \left(1-{\widehat{\pi }}^{L+1}\left({\varvec{X}}\right)\right)-{\widehat{\theta }}_{0}^{L+1}, {\widehat{m}}_{0}^{1}\left({\varvec{X}}\right)-{\widehat{\eta }}_{0}^{1},\dots ,{\widehat{m}}_{0}^{K}\left({\varvec{X}}\right)-{\widehat{\eta }}_{0}^{K},{\widehat{m}}_{0}^{K+1}\left({\varvec{X}}\right)-{\widehat{\eta }}_{0}^{K+1}\right\}$$

\({\widehat{{\varvec{\rho}}}}_{1}\) and \({\widehat{{\varvec{\rho}}}}_{0}\) are the \((L+K+2)\)-dimensional Lagrange multipliers solving

$$\frac1{n_1}\sum\limits_{i\in\mathbb{I}}\frac{{\widehat{{\varvec{g}}}}_1\left(\boldsymbol{X}_i\right)}{1+{\widehat{{\varvec{\rho}}}}_1^T{\widehat{{\varvec{g}}}}_1\left(\boldsymbol{X}_i\right)}=0$$
$$\frac1{n_0}\sum\limits_{j\in\mathbb{J}}\frac{{\widehat{{\varvec{g}}}}_0\left(\boldsymbol{X}_j\right)}{1+{\widehat{{\varvec{\rho}}}}_0^T{\widehat{{\varvec{g}}}}_0\left(\boldsymbol{X}_j\right)}=0$$

\({\widehat{{\varvec{\rho}}}}_{1}^{T}\) and \({\widehat{{\varvec{\rho}}}}_{0}^{T}\) must satisfy \(1+{\widehat{{\varvec{\rho}}}}_{1}^{{\varvec{T}}}{\widehat{{\varvec{g}}}}_{1}\left({{\varvec{X}}}_{i}\right)>0\) and \(1+{\widehat{{\varvec{\rho}}}}_{0}^{T}{\widehat{{\varvec{g}}}}_{0}\left({{\varvec{X}}}_{j}\right)>0\) due to the non-negativity of \({w}_{i}\) and \({w}_{j}\), respectively. The estimation of \({\widehat{w}}_{i}\) and \({\widehat{w}}_{j}\) can be solved by the Newton–Raphson algorithm [9].

The ATE estimated by the proposed method is defined as

$${\widehat{\Delta }}_{mr}=\sum_{i\in {\mathbb{I}}}{\widehat{w}}_{i}{Y}_{i}-\sum_{j\in {\mathbb{J}}}{\widehat{w}}_{j}{Y}_{j}$$
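Assembling the pieces, a sketch of the proposed estimator (reusing the `el_weights` helper above; the function name `mr_ate` and the argument layout are our own) collects the fitted values from all \(L+1\) PS models and \(K+1\) OR models, centres them at their full-sample means, and applies the resulting weights within each group:

```r
# Sketch of the proposed MR estimate (assumed implementation).
# pi_hat: n x (L+1) matrix of fitted propensity scores (parametric columns plus NN-PS).
# m1_hat, m0_hat: n x (K+1) matrices of fitted outcomes under Z = 1 and Z = 0.
mr_ate <- function(Y, Z, pi_hat, m1_hat, m0_hat) {
  theta1 <- colMeans(pi_hat);     eta1 <- colMeans(m1_hat)   # theta_1^l and eta_1^k
  theta0 <- colMeans(1 - pi_hat); eta0 <- colMeans(m0_hat)   # theta_0^l and eta_0^k
  # rows of G1 are g_1(X_i) for treated subjects; rows of G0 are g_0(X_j) for controls
  G1 <- cbind(sweep(pi_hat,     2, theta1), sweep(m1_hat, 2, eta1))[Z == 1, , drop = FALSE]
  G0 <- cbind(sweep(1 - pi_hat, 2, theta0), sweep(m0_hat, 2, eta0))[Z == 0, , drop = FALSE]
  w1 <- el_weights(G1)$w
  w0 <- el_weights(G0)$w
  sum(w1 * Y[Z == 1]) - sum(w0 * Y[Z == 0])                  # weighted difference in means
}
```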

Bootstrap confidence interval

A confidence interval for the estimator \(\widehat{\Delta }\) can be obtained by the bootstrap method, where \(\widehat{\Delta }\) may be the IPW, OR, or MR estimator. Specifically, \(n\) individuals are resampled with replacement from the original data \(B\) times to obtain \(B\) bootstrap samples, where \(B\) is a pre-specified number. For \(b=1,\dots ,B\), let \({\widehat{\Delta }}^{b}\) be the estimate from the \(b\)-th bootstrap sample. The bootstrap variance estimator for \(\widehat{\Delta }\) is given by

$$\widehat{var}\left(\widehat{\Delta }\right)= \frac{1}{B-1}\sum_{b=1}^{B}\left({\widehat{\Delta }}^{b}-\frac{1}{B}\sum_{b^{\prime}=1}^{B}{\widehat{\Delta }}^{b^{\prime}}\right)^{2}$$

A normality-based 95% confidence interval for \(\Delta\) is \(\widehat{\Delta }\pm 1.96\sqrt{\widehat{var}\left(\widehat{\Delta }\right)}\).
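A sketch of this procedure in R (the wrapper name `boot_ci` and the argument `est_fun`, a function returning \(\widehat{\Delta }\) for a given data frame, are placeholders):

```r
# Nonparametric bootstrap variance and normality-based 95% CI (assumed wrapper).
boot_ci <- function(data, est_fun, B = 500) {
  est  <- est_fun(data)                             # point estimate on the original data
  reps <- replicate(B, est_fun(data[sample(nrow(data), replace = TRUE), , drop = FALSE]))
  v    <- sum((reps - mean(reps))^2) / (B - 1)      # bootstrap variance estimator
  c(estimate = est, lower = est - 1.96 * sqrt(v), upper = est + 1.96 * sqrt(v))
}
```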

Simulation study

Simulation design

We conducted comprehensive simulation studies to evaluate the performance of the proposed MR method. Ten covariates \({X}_{1}-{X}_{10}\) were simulated from standard normal distributions, with \(\mathrm{corr}\left({X}_{1},{X}_{5}\right)=\mathrm{corr}\left({X}_{4},{X}_{9}\right)=0.9\) and \(\mathrm{corr}\left({X}_{2},{X}_{6}\right)=\mathrm{corr}\left({X}_{3},{X}_{8}\right)=0.2\). The treatment indicator was generated from a Bernoulli distribution with the following PS model

$$logit[P(Z=1|{\varvec{X}})]=0.8*{X}_{1}-0.25*{X}_{2} + 0.6*{X}_{3}-0.4*{X}_{4}-0.8*{X}_{5}-0.5*{X}_{6}+ 0.7*{X}_{7}$$

which produced a treatment prevalence of approximately 50%. The outcome was simulated from

$$Y=-3.85 + 0.3*{X}_{1}-0.36*{X}_{2}-0.73*{X}_{3}-0.2*{X}_{4} + 0.71*{X}_{8} + 0.19*{X}_{9} + 0.26*{X}_{10} + 0.3*{X}_{1}^{2}-0.36*{X}_{2}^{2} + Z + \varepsilon$$

where \(\varepsilon \sim N(0,1)\). The true value of the ATE is 1 in the simulation.
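A sketch of the data-generating step in R (assuming unit variances and independence apart from the four listed correlations; `gen_data` is a placeholder name):

```r
library(MASS)   # for mvrnorm

gen_data <- function(n) {
  # Covariance matrix: unit variances plus the four stated pairwise correlations
  Sigma <- diag(10)
  Sigma[1, 5] <- Sigma[5, 1] <- 0.9; Sigma[4, 9] <- Sigma[9, 4] <- 0.9
  Sigma[2, 6] <- Sigma[6, 2] <- 0.2; Sigma[3, 8] <- Sigma[8, 3] <- 0.2
  X <- mvrnorm(n, mu = rep(0, 10), Sigma = Sigma)
  colnames(X) <- paste0("X", 1:10)
  # Treatment from the logistic PS model, outcome from the linear model with N(0,1) error
  lp <- 0.8*X[,1] - 0.25*X[,2] + 0.6*X[,3] - 0.4*X[,4] - 0.8*X[,5] - 0.5*X[,6] + 0.7*X[,7]
  Z  <- rbinom(n, 1, plogis(lp))
  Y  <- -3.85 + 0.3*X[,1] - 0.36*X[,2] - 0.73*X[,3] - 0.2*X[,4] + 0.71*X[,8] +
        0.19*X[,9] + 0.26*X[,10] + 0.3*X[,1]^2 - 0.36*X[,2]^2 + Z + rnorm(n)
  data.frame(X, Z = Z, Y = Y)
}

dat <- gen_data(500)   # one simulated data set of size n = 500
```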

In the simulation, we specified three models, including an NN-PS model and two parametric models,

$${\mathbb{A}}=\left\{\begin{array}{c}{\pi }^{1}(X)={f}_{1}({X}_{1},{X}_{2},{X}_{3},{{X}_{4},X}_{5},{X}_{6},{X}_{7})\\ {\pi }^{2}(X)={f}_{2}\left({X}_{1},{X}_{2},{X}_{3}{,{X}_{4},X}_{5},{X}_{6},{X}_{7}\right)\\ {\pi }^{3}(X)={f}_{3}\left({X}_{1}^{2},{X}_{2}^{2},{X}_{3}^{2},{X}_{4}^{2},{X}_{5}^{2},{X}_{6}^{2},{X}_{7}^{2}\right)\end{array}\right\}$$

for the propensity score. Let \({\pi }^{1}({\varvec{X}})\) be the PS from the NN-PS model, and \({\pi }^{2}({\varvec{X}})\) and \({\pi }^{3}\left({\varvec{X}}\right)\) be the PSs from logistic regression models. We also specified three models, including an NN-OR model and two parametric models,

$${\mathbb{B}}=\left\{\begin{array}{c}{m}^{1}(X,Z)={h}_{1}({X}_{1},{X}_{2},{X}_{3},{{X}_{4},X}_{8},{X}_{9},{X}_{10},Z)\\ \begin{array}{c}{m}^{2}\left({\varvec{X}},Z\right)={h}_{2}\left({X}_{1},{X}_{2},{X}_{3}{,X}_{8},{X}_{9},{X}_{10},{X}_{1}^{2},{X}_{2}^{2},Z\right)\\ {m}^{3}\left({\varvec{X}},Z\right)={h}_{3}\left({X}_{1}{X}_{2},{X}_{3}{X}_{4},{X}_{8}{X}_{9},{X}_{1}{X}_{8},{X}_{2}{X}_{9},{X}_{3}{X}_{10},Z\right)\end{array}\end{array}\right\}$$

for outcome regression. Let \({m}^{1}({\varvec{X}},Z)\) be the outcome from the NN-OR model, and \({m}^{2}({\varvec{X}},Z)\) and \({m}^{3}\left({\varvec{X}},Z\right)\) be the outcomes from linear regression models. According to the data-generating process, \({\pi }^{2}\left({\varvec{X}}\right)\) and \({m}^{2}\left({\varvec{X}},Z\right)\) were correctly specified. To distinguish the estimation methods, the three IPW estimators are denoted “IPW.model1”, “IPW.model2”, and “IPW.model3”, where “IPW.model1” refers to the IPW estimator with the NN-PS model; the three OR estimators are denoted “OR.model1”, “OR.model2”, and “OR.model3”, where “OR.model1” refers to the OR estimator with the NN-OR model. Each MR estimator is denoted “MR000000”, where the six digits, from left to right, indicate whether \({\pi }^{1}\left({\varvec{X}}\right)\), \({\pi }^{2}\left({\varvec{X}}\right)\), \({\pi }^{3}\left({\varvec{X}}\right)\), \({m}^{1}\left({\varvec{X}},Z\right)\), \({m}^{2}\left({\varvec{X}},Z\right)\), or \({m}^{3}\left({\varvec{X}},Z\right)\) is included in the estimator (“1” means yes, and “0” means no).
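Under these specifications, the parametric models can be fit with glm/lm and combined with the nnet fits from the earlier sketch to form the fitted-value matrices for the full MR111111 estimator (dropping columns gives the other MR estimators); `el_weights` and `mr_ate` are the helper sketches defined above, and `dat` is one simulated data set:

```r
# Parametric PS models pi^2, pi^3 and parametric OR models m^2, m^3 from the lists above
ps2 <- glm(Z ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, family = binomial, data = dat)
ps3 <- glm(Z ~ I(X1^2) + I(X2^2) + I(X3^2) + I(X4^2) + I(X5^2) + I(X6^2) + I(X7^2),
           family = binomial, data = dat)
or2 <- lm(Y ~ X1 + X2 + X3 + X8 + X9 + X10 + I(X1^2) + I(X2^2) + Z, data = dat)
or3 <- lm(Y ~ X1:X2 + X3:X4 + X8:X9 + X1:X8 + X2:X9 + X3:X10 + Z, data = dat)

# Columns: NNET model first, then the two parametric models
pi_hat <- cbind(pi_nn, fitted(ps2), fitted(ps3))
m1_hat <- cbind(m1_nn, predict(or2, transform(dat, Z = 1)), predict(or3, transform(dat, Z = 1)))
m0_hat <- cbind(m0_nn, predict(or2, transform(dat, Z = 0)), predict(or3, transform(dat, Z = 0)))

ate_hat <- mr_ate(dat$Y, dat$Z, pi_hat, m1_hat, m0_hat)   # the MR111111 estimator
```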

We also studied the performance of the MR estimators when all parametric models were misspecified, as follows:

$${\mathbb{A}}=\left\{\begin{array}{c}{\pi }^{1}(X)={f}_{1}({X}_{1},{X}_{2},{X}_{3},{{X}_{4},X}_{5},{X}_{6},{X}_{7})\\ {\pi }^{2}\left({\varvec{X}}\right)={f}_{2}\left({X}_{1},{X}_{2}{,X}_{5}\right)\\ {\pi }^{3}(X)={f}_{3}\left({X}_{1}^{2},{X}_{2}^{2},{X}_{3}^{2},{X}_{4}^{2},{X}_{5}^{2},{X}_{6}^{2},{X}_{7}^{2}\right)\end{array}\right\}$$

for the propensity score, and

$${\mathbb{B}}=\left\{\begin{array}{c}{m}^{1}(X,Z)={h}_{1}({X}_{1},{X}_{2},{X}_{3},{{X}_{4},X}_{8},{X}_{9},{X}_{10},Z)\\ \begin{array}{c}{m}^{2}\left({\varvec{X}},Z\right)={h}_{2}\left({X}_{1},{X}_{2},{X}_{10},Z\right)\\ {m}^{3}\left({\varvec{X}},Z\right)={h}_{3}\left({X}_{1}{X}_{2},{X}_{3}{X}_{4},{X}_{8}{X}_{9},{X}_{1}{X}_{8},{X}_{2}{X}_{9},{X}_{3}{X}_{10},Z\right)\end{array}\end{array}\right\}$$

for outcome regression.

In addition, because the factors related to the treatment and outcome are unknown in practice, we also explored the performance of the MR estimators when the NNET models included all covariates, in situations where the parametric models either did or did not include a correct model.

In the simulation, we calculated the mean relative bias, RMSE, and coverage rate to assess the performance of the proposed MR method. All results were based on 1000 simulation replications, with sample sizes \(n=200\), \(500\), and \(2000\).

Simulation results

Table 1 and Figure S1 show the simulation results for estimating the ATE when the proposed MR method included the correct parametric models, and Table 2 and Figure S2 show the results when no correct parametric model was included in the MR method. Several conclusions can be drawn from the simulation studies.

Table 1 Simulation results with different sample sizes = 200, 500 or 2000 in the situation where the parametric models included the correct models and the neural network model included true covariates
Table 2 Simulation results with different sample sizes = 200, 500 or 2000 in the situation where the parametric models did not include the correct models and the neural network model included true covariates

According to Table 1 and Figure S1, the biases of the IPW, OR, and MR estimators were negligible when the parametric models were correctly specified or when NNET models were included. The RMSEs of estimators with a correct PS or NN-PS model were larger than those with a correct OR or NN-OR model, respectively. The estimators with the correct parametric OR model (OR.model2, MR000010) had the smallest RMSEs among all estimators, noticeably smaller than those with the NN-OR model (OR.model1, MR000100). However, the biases and RMSEs of the estimators with a wrong parametric OR model (OR.model3, MR000001) were much larger.

According to Table 2 and Figure S2, the proposed MR estimators improved robustness to model misspecification even when all parametric models were incorrectly specified (MR111000, MR000111, MR111011, MR011111, MR111111 in Table 2 and Figure S2). Although the biases of these estimators were negligible, MR000111 had the smallest RMSE among them. Further, MR000111 with a correct OR model (MR000111 in Table 1 and Figure S1) had a smaller RMSE than the MR estimators including both parametric PS and OR models, and its RMSE was as small as that of the OR estimator with the correct parametric model (OR.model2 in Table 1 and Figure S1). Even when the two parametric OR models were incorrectly specified, the RMSE of MR000111 was similar to that of the NN-OR model.

In addition, Table 3 and Figure S3 show the simulation results when the NNET models included all covariates and the parametric models included a correct model, and Table 4 and Figure S4 show the results when the NNET models included all covariates and the parametric models did not include a correct model. Similar results were observed in Tables 3 and 4 and Figures S3 and S4.

Table 3 Simulation results with different sample sizes = 200, 500 or 2000 in the situation where the parametric models included the correct models and the neural network model included all covariates
Table 4 Simulation results with different sample sizes = 200, 500 or 2000 in the situation where the parametric models did not include the correct models and the neural network model included all covariates

In conclusion, the simulation results showed that the proposed MR estimators were robust to model misspecification even when all parametric models were incorrectly specified. Further, considering both robustness to model misspecification and RMSE, the MR estimators with only OR models, one of which is an NNET model, were the most recommended. The recommended estimators were robust to model misspecification and tended to have the smallest RMSE when a correct OR model was included, and their performance remained comparable even when all parametric models were misspecified.

Empirical study

The China Health and Retirement Longitudinal Study (CHARLS) is a large-scale, nationally representative longitudinal survey of people aged 45 or older and their spouses in China, including assessments of the social, economic, and health status of community residents [25]. This analysis aimed to estimate the treatment effect of social activity on depression level using real-world data from CHARLS. Depression level was evaluated with the CES-D Depression Scale, whose total score ranges from 0 to 30, with a higher total score denoting a higher level of depression. Self-reported social activity comprises 11 categories based on individual responses to the question “Have you done any of these activities in the last month?”. The variable takes the value 1 if the participant took part in any of the activities and 0 otherwise. Participants with a value of 1 were defined as the social activity group, and those with a value of 0 as the non-social activity group. Baseline information included age, marital status, sex, region, smoking status, self-reported hypertension, self-reported diabetes, self-reported heart disease, and self-reported stroke. The inclusion criteria were: (1) participation in the 2011–2012 survey; (2) complete baseline information. A total of 10,119 participants were included in the analysis. The baseline information is summarized in Table S1. We found that the non-social activity group had a higher level of depression.

In this analysis, we specified three PS models (one NN-PS model and two logistic models) and three OR models (one NN-OR model and two linear regression models) in the MR method. For the NNET models, we included all covariates. For the parametric models, we explored the association of social activity with all covariates via a logistic model and the association of depression with all covariates via a linear model, and we identified candidate models using significance levels of 0.05 and 0.01. Hence, the three sets of covariates in [\({\pi }^{1}\left({\varvec{X}}\right),{\pi }^{2}\left({\varvec{X}}\right),{\pi }^{3}({\varvec{X}})\)] are: (i) age, marital status, sex, region, smoking status, self-reported hypertension, self-reported diabetes, self-reported heart disease, and self-reported stroke; (ii) age, marital status, sex, region, smoking status, and self-reported diabetes; (iii) age, region, and smoking status. The three sets of covariates in [\({m}^{1}({\varvec{X}},Z),{m}^{2}({\varvec{X}},Z),{m}^{3}({\varvec{X}},Z)\)] are: (i) age, marital status, sex, region, smoking status, self-reported hypertension, self-reported diabetes, self-reported heart disease, self-reported stroke, and social activity; (ii) marital status, sex, region, self-reported diabetes, self-reported heart disease, self-reported stroke, and social activity; (iii) marital status, sex, region, self-reported heart disease, self-reported stroke, and social activity. We applied the MR methods to estimate the effect. The results are shown in Table 5 and Figure S5.

Table 5 Estimating effect of social activity on depression level (non-social activity group as reference group)

According to Table 5, all estimates showed that the social activity group had lower depression scores than the non-social activity group. However, when only the NN-PS and NN-OR models were specified (MR100000, MR000100, MR100100), the estimators tended to have larger estimates and larger standard errors. The other estimators had similar estimates, and MR000111 tended to have the smallest standard errors.

Discussion

In this study, we considered estimating the ATE between treatment and control groups in observational studies. The proposed MR estimators combine parametric and nonparametric models, building on the previous MR method [9]. Our simulation study showed that the MR estimators with only outcome regression (OR) models, one of which is a nonparametric model, were the most recommended because of their robustness to model misspecification and the lowest root mean square error (RMSE) when a correct parametric OR model was included. Moreover, the performance of the recommended estimators remained comparable even when all parametric models were misspecified. Although we focused on estimating the ATE in this study, the proposed method can easily be extended to estimate other causal parameters, such as the average treatment effect on the treated, \(E({Y}^{1}-{Y}^{0}|Z=1)\) [26], and the log of the causal risk ratio for a binary outcome, \(log\frac{E({Y}^{1})}{E({Y}^{0})}\) [27].

Our simulation results showed that the IPW, direct confounding adjustment, and MR estimators with only parametric models can have large bias, large RMSE, and low coverage rates when the parametric models are misspecified. By contrast, when an NNET model was added to the MR estimators, the bias was negligible and the coverage rate was close to 95% even when all parametric models were misspecified. In addition, the estimators with only OR models are the most recommended because of their robustness to model misspecification and the smallest RMSE when a correct OR model is included. Further, NNET can capture complex relationships among variables without prior information, so similar results could still be obtained even when variables unrelated to the exposure or outcome were included in the model.

A limitation of this study is that no theoretical proof was provided; future research will focus on the theoretical properties of the proposed method. In addition, we focused on estimating the ATE in a non-survival context, but time-to-event outcomes are common in observational studies, and extending the proposed method to survival outcomes will be a topic of our future research.

Conclusions

In this study, we proposed a new MR estimator, combining nonparametric and parametric models, which is more robust to model misspecification.

Availability of data and materials

Publicly available datasets were analyzed in this study. These data can be downloaded from http://charls.pku.edu.cn/.

Abbreviations

MR:

Multiply robust

ATE:

Average treatment effect

RMSE:

Root mean square error

IPW:

Inverse probability weighting

PS:

Propensity score

OR:

Outcome regression

NNET:

Neural network

CHARLS:

China Health and Retirement Longitudinal Study

References

  1. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.


  2. Hernán MA, Robins JM. Causal inference. Boca Raton: CRC; 2010.

  3. Mansournia MA, Altman DG. Inverse probability weighting. BMJ. 2016;352.

  4. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7(9–12):1393–512.


  5. Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61(4):962–73.


  6. Cao W, Tsiatis AA, Davidian M. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika. 2009;96(3):723–34.


  7. Glynn AN, Quinn KM. An introduction to the augmented inverse propensity weighted estimator. Polit Anal. 2010;18(1):36–56.


  8. Han P, Wang L. Estimation with missing data: beyond double robustness. Biometrika. 2013;100(2):417–30.


  9. Han P. A further study of the multiply robust estimator in missing data analysis. JASA. 2014;109(507):1159–73.


  10. Wang L. Multiple robustness estimation in causal inference. Commun Stat Theory Methods. 2019;48(23):5701–18.


  11. Shu D, Han P, Wang R, Toh S. Estimating the marginal hazard ratio by simultaneously using a set of propensity score models: a multiply robust approach. Stat Med. 2021;40(5):1224–42.


  12. Westreich D, Lessler J, Funk MJ. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J Clin Epidemiol. 2010;63(8):826–33.


  13. Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med. 2010;29(3):337–46.


  14. Cannas M, Arpino B. A comparison of machine learning algorithms and covariate balance measures for propensity score matching and weighting. Biom J. 2019;61(4):1049–72.


  15. Setoguchi S, Schneeweiss S, Brookhart MA, Glynn RJ, Cook EF. Evaluating uses of data mining techniques in propensity score estimation: a simulation study. Pharmacoepidemiol Drug Saf. 2008;17(6):546–55.


  16. Gharibzadeh S, Mansournia MA, Rahimiforoushani A, Alizadeh A, Amouzegar A, Mehrabani-Zeinabad K, et al. Comparing different propensity score estimation methods for estimating the marginal causal effect through standardization to propensity scores. Commun Stat Simul Comput. 2018;47(4):964–76.


  17. Chernozhukov V, Escanciano JC, Ichimura H, Newey WK, Robins JM. Locally robust semiparametric estimation. Econometrica. 2022;90(4):1501–35.


  18. Le Borgne F, Chatton A, Léger M, Lenain R, Foucher Y. G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes. Sci Rep. 2021;11(1):1–12.


  19. Probst P, Boulesteix A-L, Bischl B. Tunability: importance of hyperparameters of machine learning algorithms. J Mach Learn Res. 2019;20(1):1934–65.


  20. Yang L, Shami A. On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing. 2020;415:295–316.


  21. Colangelo K, Lee Y-Y. Double debiased machine learning nonparametric inference with continuous treatments. arXiv preprint arXiv:200403036. 2020.

  22. Kennedy EH, Ma Z, McHugh MD, Small DS. Non-parametric methods for doubly robust estimation of continuous treatment effects. J R Stat Soc Series B Stat Methodol. 2017;79(4):1229–45.


  23. Benkeser D, Carone M, Laan MVD, Gilbert P. Doubly robust nonparametric inference on the average treatment effect. Biometrika. 2017;104(4):863–80.


  24. Hernan M, Robins J. Causal Inference: What if. Boca Raton: Chapman & Hill/CRC; 2020.


  25. Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: the China health and retirement longitudinal study (CHARLS). Int J Epidemiol. 2014;43(1):61–8.


  26. Hartman E, Grieve R, Ramsahai R, Sekhon JS. From sample average treatment effect to population average treatment effect on the treated: combining experimental with observational studies to estimate population treatment effects. J R Stat Soc A Stat Soc. 2015;178(3):757–78.


  27. Wei K, Qin G, Zhang J, Sui X. Multiply robust estimation of the average treatment effect with missing outcomes. J Stat Comput and Simul. 2023;93(10):1479–95.



Acknowledgements

Not applicable.

Funding

This work was supported by National Natural Science Foundation of China (No. 82273730 and 82173612), Shanghai Rising-Star Program (21QA1401300), Shanghai Municipal Natural Science Foundation (22ZR1414900) and Shanghai Municipal Science and Technology Major Project (ZD2021CY001). The sponsors had no role in study design, data collection, data analysis, data interpretation, or writing of this report.

Author information


Contributions

Ce Wang, Kecheng Wei, Yongfu Yu, and Guoyou Qin proposed the study concept and design; Ce Wang, Kecheng Wei, and Chen Huang drafted the manuscript; Ce Wang and Kecheng Wei performed the statistical analysis; Yongfu Yu and Guoyou Qin supervised the study; and all authors critically revised the manuscript for important intellectual content.

Corresponding authors

Correspondence to Yongfu Yu or Guoyou Qin.

Ethics declarations

Ethics approval and consent to participate

Ethics approval was not applicable for the simulated datasets, which did not involve any human data; the real data are publicly available, so ethics approval was not required.

Consent for publication

Not applicable.

Competing interests

None of the co-authors have a conflict of interest.





Cite this article

Wang, C., Wei, K., Huang, C. et al. An improved multiply robust estimator for the average treatment effect. BMC Med Res Methodol 23, 231 (2023). https://doi.org/10.1186/s12874-023-02056-7
