An improved multiply robust estimator for the average treatment effect

Background In observational studies, double robust or multiply robust (MR) approaches provide more protection from model misspecification than the inverse probability weighting and g-computation for estimating the average treatment effect (ATE). However, the approaches are based on parametric models, leading to biased estimates when all models are incorrectly specified. Nonparametric methods, such as machine learning or nonparametric double robust approaches, are robust to model misspecification, but the efficiency of nonparametric methods is low. Method In the study, we proposed an improved MR method combining parametric and nonparametric models based on the previous MR method (Han, JASA 109(507):1159-73, 2014) to improve the robustness to model misspecification and the efficiency. We performed comprehensive simulations to evaluate the performance of the proposed method. Results Our simulation study showed that the MR estimators with only outcome regression (OR) models, where one of the models was a nonparametric model, were the most recommended because of the robustness to model misspecification and the lowest root mean square error (RMSE) when including a correct parametric OR model. And the performance of the recommended estimators was comparative, even if all parametric models were misspecified. As an application, the proposed method was used to estimate the effect of social activity on depression levels in the China Health and Retirement Longitudinal Study dataset. Conclusions The proposed estimator with nonparametric and parametric models is more robust to model misspecification. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-023-02056-7.


Background
The primary goal of much-applied research is to estimate the causal effect of specific treatment (exposure or intervention) on the outcome.In randomized controlled trials, where treatments are randomly assigned to participants, the average treatment effect (ATE) can be estimated by directly comparing outcomes between treatment and control groups [1].In observational studies, however, there are usually unbalanced covariates between treatment and control groups due to the non-randomized treatment assignment.As a result, a direct comparison of outcomes between treatment and control groups may lead to a biased estimation of the ATE [2].
The inverse probability weighting (IPW) with a propensity score (PS) model and the direct confounding adjustment (known as g-computation) with an outcome regression (OR) model are the general approaches to handling confounding bias [1,3,4]; Compared to IPW and g-computation methods, doubly robust approaches provide double protection from model misspecification [5][6][7]; but, the doubly robust approach does not offer sufficient protection for estimating ATE in practice, as they allow only one PS model and one OR model.Recently, multiply robust (MR) approaches, increasing the likelihood of including the correct model, are proposed for estimating ATE or a population mean with missing values [8][9][10][11].And, the previous MR approach [9] is robust against extreme values of the fitted receiving treatment probability.However, the previous MR approach [9] only considering parametric models may lead to a biased estimation when the included parametric models are all incorrectly specified.
In addition, there is growing interest in developing nonparametric methods for estimating ATE to protect against model misspecification.Machine learning, a general term for a diverse number of nonparametric algorithms, is particularly useful for classification and prediction and is used to estimate the ATE [12][13][14][15][16][17].However, the root mean square error (RMSE) of machine learning seems to be higher than that of the correct parametric model may due to incorrect hyperparameter settings [18][19][20].Nonparametric double robust methods based on the kernel smoothing approach [21,22] or targeted minimum loss [23] have also been proposed to estimate the ATE.Yet, the efficiency of these estimators is not high because of slow convergence rates.
In this study, based on the previous MR approach [9], we proposed an improved MR approach considering both parametric and nonparametric models to improve the robustness to model misspecification.Our simulation study showed that the proposed MR approach is more robust to model misspecification than previous MR approach; and the MR estimators with only OR models, where one of the models was a nonparametric model, were the best among all MR estimators for the following two reasons.First, the MR estimators were robust to model misspecification, and had the lowest root mean square errors (RMSEs) when including a correct parametric OR model.Second, the performance of the best estimators was comparative even if all parametric models were misspecified.

Notation and assumptions
Let X i be a p-dimensional vector of covariates, Y i be the observed outcome, and Z i be the treatment status taking value 1 if treated or 0 if untreated.Let ( Y 1 , Y 0 ) be the two potential outcomes in the treatment and control groups, respectively, and the ATE is defined as And to draw a correct causal inference in the study, exchangeability, consistency, and positivity assumptions hold [24].

Previous multiply robust method
The previous MR approach proposed by Han [9] provides multiple protection to the model misspecification.Specifically, specifying two sets of parametric models, P = π l (X), l = 1, 2, 3, . . ., L for propensity score and M = m k z (X), k = 1, 2, 3, . . ., K for outcome, where m k z (X) = m k (X, Z) .Without loss of general- ity, let I = 1, . . ., n 1 and J = 1, . . ., n 0 be the indexes for treated and untreated subjects, respectively.Let n 1 and n 0 = n − n 1 represent the size of treatment and control groups, respectively.
To recover the treated population average from subjects in the treatment group, the empirical likelihood weights w i (i ∈ I) for the outcome Y i (i ∈ I) in the treat- ment group are estimated by maximizing i∈I w i subject to the following constraints: . By symmetry, the weights w j j ∈ J for the control group are given by maximizing j∈J w j according to the follow- ing constraints: The w i and w j can be given with Lagrange multiplier method as follows: where ρ T 1 and ρ T 0 are the (J + K)-dimensional Lagrange mul- tipliers solving ρ T 1 and ρ T 0 must satisfy 1 + ρ T 1 g 1 (X i ) > 0 and 1 + ρ T 0 g 0 X j > 0 due to the non-negativity of w i and w j , respectively.The estimation of w i and w j can be solved by the Newton-Raphson algorithm [9].
In summary, the ATE estimated by MR method [9] is defined as

Proposed multiply robust method
The previous MR method [9] allows multiple parametric models, increasing the likelihood of including the correct model; However, there is still a biased estimates of ATE when all parametric models are misspecified.Based on the j∈J w j m k 0 X j = η k 0 (k = 1, 2, 3, . . ., K ) previous MR method, the proposed MR method allows multiple parametric models, and also includes a nonparametric PS model and a nonparametric OR model.We select the neural network (NNET) as nonparametric models in the proposed MR method.NNET, one machine learning algorithm, has been used to estimate the ATE [15,16].We specified three-layer (input layer, one hidden layer, output layer) NNET, which may be practical [12], and the hidden layer consists of 4 nodes.We performed the NNET using the nnet R package with default parameters.Finally, the proposed MR method added a NNET-based outcome regression model (NN-OR) and a NNET-based propensity score model (NN-PS) in base of previous MR method.
Similar to the previous MR method [9], we specify two sets of models, P = π l (X), l = 1, 2, 3, . . ., L, L + 1 for pro- pensity score and M = m k z (X), k = 1, 2, 3, . . ., K , K + 1 for outcome.Assume π L+1 (X) and m K +1 z (X) are the NN-PS and NN-OR models, respectively, and the other are parametric models.The empirical likelihood weights w i (i ∈ I) for the outcome Y i (i ∈ I) in the treatment group are estimated by maximizing i∈I w i subject to the follow- ing constraints: . By symmetry, the weights w j j ∈ J for the control group are given by maximizing j∈J w j according to the follow- ing constraints: The w i and w j can be given with Lagrange multiplier method as follows: where ρ T 1 and ρ T 0 are the (J + K + 2)-dimensional Lagrange multipliers solving ρ T 1 and ρ T 0 must satisfy 1 + ρ T 1 g 1 (X i ) > 0 and 1 + ρ T 0 g 0 X j > 0 due to the non-negativity of w i and w j , respectively.The estimation of w i and w j can be solved by the Newton-Raphson algorithm [9].
The ATE estimated by the proposed method is defined as

Bootstrap confidence interval
The confidence interval of the estimators could be obtained by the bootstrap method, where maybe IPW, OR, or MR estimator.Specifically, n individuals first are resampled with replacement from the original data for B times to obtain B bootstrap sample, where is the pre-specified number.For b = 1, . . ., B , let b be the estimates of the estimator from the b-th bootstrap sample.Then the bootstrap variance estimator for b is given by

Simulation study Simulation design
We conducted comprehensive simulation studies to evaluate the performance of the proposed MR method.Ten covariates X 1 − X 10 were simulated from the standard normal distribution, where corr(X 1 , X 5 ) = corr(X 4 , X 9 ) = 0.9 and corr(X 2 , X 6 ) = corr(X 3 , X 8 ) = 0.2 .The treatment indica- tor was generated from a Bernoulli distribution with a PS model as followed , which produced a treatment prevalence of approximately 50%.The outcome was simulated from where ε ∼ N (0, 1) .The true value of ATE is 1 in the simulation.
In the simulation, we specified three models, including a NN-PS model and two parametric models for propensity score.Let π 1 (X) be the PS of NN-PS model, and π 2 (X) and π 3 (X) be the PS of logistic mod- els.And we also specified three models, including a OR-NNET model and two parametric models for outcome regression.Let m 1 (X, Z) be the outcome of NN-OR model, and m 2 (X, Z) and m 3 (X, Z) be the outcome of linear regression models.According to the data-generating process, π 2 (X) and m 2 (X, Z) were cor- rectly specified.In order to distinguish these estimation methods, three IPW estimators were defined as "IPW.model1", "IPW.model2" and "IPW.model3",where "IPW.model1" refer to the IPW estimator with a NN-PS model; three OR estimators were defined as "OR.model1", "OR.model2" and "OR.model3",where "OR.model1" refer to the OR estimator with NN-OR model.For the MR estimators, each estimator is denoted as "MR000000" where each digit of the six numbers, from left to right, indicates if π 1 (X) , π 2 (X) , π 3 (X),m 1 (X, Z), m 2 (X, Z) or m 3 (X, Z) is included in the estimator ("1" means yes, and "0" means no).
And we studied the performance of MR estimators when all parametric models were misspecified as follows for propensity score for outcome regression.
In addition, which factors related to the treatment and outcome were unknown in practice; hence, we also explored the performance of MR estimators when including NNET models with all covariates in situations where parametric models included correct models or did not include any correct model.
In the simulation, we calculated the mean relative bias, RMSE, and coverage rate to assess the performance of the proposed MR method.All results were based on 1000 simulation replications, and the sample sizes n = 200, 500 and 2000.

Simulation results
Table 1 and Figure S1 showed the simulation results of estimating ATE where the proposed MR method included the correct parametric models.And Table 2 and Figure S2 showed the simulation results when there were no correct parametric models in MR method.
We could get a few conclusions from the simulation studies.
According to Table 1 and Figure S1, biases of IPW, OR, or MR estimators were ignorable when the parametric models were correctly specified or NNET models were included.The RMSEs of estimators with correct PS or NN-PS models were larger than those with correct OR or NN-OR, respectively.And the RMSEs of the estimators with the correct parametric OR model (OR.model2,MR000010) had the smallest RMSEs among all estimators, and were significantly less than those with NN-OR model (OR.model1,MR000100).However, it can easily be seen that the biases and RMSEs of the estimator with a wrong parametric model are much larger (OR.model3,MR000001).
Together with Table 2 and Figure S2, the proposed MR estimators improved the robustness to the model misspecification even if the parametric models were all incorrectly modeled (MR111000, MR000111, MR111011, MR011111, MR111111 in Table 2 and Figure S2).Although the biases of the six estimators are negligible, the MR000111 had the smallest RMSE among the six estimators.Further, MR000111 with a correct OR model (MR000111 in Table 1 and Figure S1) had a smaller RMSE than the MR estimators with both parametric PS and OR models; and the RMSE of MR000111 was small as that of OR estimator with the correct parametric model (OR.model2 in Table 1 and Figure S1).Even if the two parametric OR models are incorrectly specified, the RMSE of MR000111 is similar to that of NN-OR model.
In addition, Table 3 and Figure S3 showed the simulation results when NNET models included all covariates where parametric models included a correct model; and Table 4 and Figure S4 showed the simulation results when NNET models included all covariates where parametric models did not include a correct model.Similar results were observed in Tables 3 and 4 and Figures S3 and S4.
In conclusion, the simulation results showed that the proposed MR estimators were robust to model misspecification even if all parametric models were incorrectly specified.Further, considering the robustness to model misspecification and RMSE, the MR estimators with only OR models, where one of the models was a NNET model, was the most recommended.The recommended estimators were robust to model misspecification and tended to have the smallest RMSE when the estimators included a correct OR model; and the performance of the recommended estimators was comparative even if all parametric models were misspecified.

Empirical study
The China Health and Retirement Longitudinal Study (CHARLS) is a large-scale, nationally representative longitudinal survey of people aged 45 or older and their spouses in China, including assessments of the social, economic, and health status of community residents [25].The study aimed to estimate the treatment effect of social activity on the depression level in the real-world data from CHARLS.The depression level was evaluated by CES-D Depression Scale, and the total score was between 0 and 30: a higher total score denotes a higher depression level, while a lower total score denotes a lower depression level.The self-reported social activity includes 11 categories based on individual responses to the question, "Have you done any of these activities in the last month".The value of the variable is 1 if the participant takes part estimators are denoted as "MR000000", where each digit of the four numbers, from left to right, indicates if π 1 (X) , π 2 (X),π 3 (X), m 1 (X, Z) , m 2 (X, Z) or m 3 (X, Z) is included in the estimator ("1" means yes and "0" means no) (2) complete baseline.A total of 10,119 participants were included in the analysis.The baseline information was summarized in Table S1.We found that the non-social activity group had a higher level of depression.

Estimator
In the study, we specified three PS models (including one NN-PS model and two logistic models) and three OR models (including one NN-OR model and two linear regression) in the MR method.For the NNET  models, we included all covariates.For the parametric models, we explored the association of social activity group/non-social activity group with all covariates via a logistic model, and the association of depression with all covariates through a linear model; and we identified candidate models with a significant level at 0.05 and 0.01.Hence, three set of covariates in [ π 1 (X), π 2 (X), π 3 (X) ] are: (i) age, marital status, sex, region, smoke status, self-reported hypertension, self-reported diabetes, selfreported heart disease and self-reported stroke; (ii) age, marital status, sex, region, smoke status, and selfreported diabetes; (iii) age, region and smoke status.Three sets of covariates in [ m 1 (X, Z), m 2 (X, Z), m 3 (X, Z) ] are: (i) age, marital status, sex, region, smoke status, self-reported hypertension, self-reported diabetes, selfreported heart disease, self-reported stroke and activity   5, all estimates showed that the social activity group had lower depression scores than the non-social activity group.However, when only specifying NN-PS model and NN-OR model (MR100000, MR000100, MR100100), the estimators tended to have higher estimated values and higher standard errors.The other estimators had similar estimates, and MR000111 tended to have the smallest standard errors.

Discussion
In this study, we considered estimating ATE between treatment and control groups in observational studies.The proposed MR estimators combined parametric and nonparametric models based on the previous MR method [8].Our simulation study showed that the MR estimators with only outcome regression (OR) models, where one of the models was a nonparametric model, were the most recommended because of the robustness to model misspecification and the lowest root mean square error (RMSE) when including a correct parametric OR model.And the performance of the recommended estimators was comparative even if all parametric models were misspecified.We mainly focused on estimating ATE in the study, and our proposed method can be easily to estimate other causal parameters, such as the average treatment effect on the treated, E(Y 1 − Y 0 |Z = 1) [26], and log of causal risk ratio for binary outcome log E(Y 1 ) E(Y 0 ) [27].Our simulation results showed that the IPW, direct confounding adjustment, or MR estimators with only parametric models might gain a large bias, large RMSE, and low coverage rate when the parametric models were misspecified.By contrast, when adding a NNET model to the MR estimators, the bias was ignorable and coverage rate was close to 95% even if we misspecified all parametric models.In addition, the estimators with only OR models are the most recommended because of the robustness to the model specification and the smallest RMSE when including a correct OR model.Further, NNET can extract the complex relationship among variables without prior information so that we put some variables unrelated to exposure or outcome in the model and could still get similar results.
A limitation of study is that no relevant theoretical proof was provided, and future research will focus on theoretical proofs and properties of the proposed method.Further, we focused on estimating ATE in a non-survival context in the study, but there are lots of time-to-event outcomes in observational studies, and the extension of the proposed method in the survival outcomes studies will be a topic of our future research.

Conclusions
In this study, we proposed a new MR estimator, considering nonparametric and parametric models, which is more robust to model misspecification.

Table 1
Simulation results with different sample sizes = 200, 500 or 2000 in the situation where the parametric models included the correct models and the neural network model included true covariates Bias (%) mean relative bias, RMSE root mean square error, CR coverage rate, IPW inverse probability weighting, OR outcome regression, MR multiply robust, MR:

Table 2
Simulation results with different sample sizes = 200, 500 or 2000 in the situation where the parametric models did not include the correct models and the neural network model included true covariates Bias (%) mean relative bias, RMSE Root mean square error, CR Coverage rate, IPW Inverse probability weighting, OR Outcome regression, MR Multiply robust; MR estimators are denoted as "MR000000", where each digit of the four numbers, from left to right, indicates if π 1 Z) is included in the estimator ("1" means yes and "0" means no)

Table 3
Simulation results with different sample sizes = 200, 500 or 2000 in the situation where the parametric models included the correct models and the neural network model included all covariates Bias (%) mean relative bias, RMSE root mean square error, CR coverage rate, IPW inverse probability weighting, OR outcome regression, MR multiply robust, MR estimators are denoted as "MR000000", where each digit of the four numbers, from left to right, indicates if π 1 We applied the MR methods to estimate the effect.The results were shown in Table5and FigureS5.From Table

Table 4
Simulation results with different sample sizes = 200, 500 or 2000 in the situation where the parametric models did not include the correct models and the neural network model included all covariates Bias (%) mean relative bias, RMSE Root mean square error, CR Coverage rate, IPW: inverse probability weighting; OR: outcome regression; MR, multiply robust; MR estimators are denoted as "MR000000", where each digit of the four numbers, from left to right, indicates if π 1 Z) is included in the estimator ("1" means yes and "0" means no)