Estimation of average treatment effect based on a multi-index propensity score
BMC Medical Research Methodology volume 22, Article number: 337 (2022)
Abstract
Background
Estimating the average effect of a treatment, exposure, or intervention on health outcomes is a primary aim of many medical studies. However, unbalanced covariates between groups can lead to confounding bias when using observational data to estimate the average treatment effect (ATE). In this study, we proposed an estimator to correct confounding bias and to provide multiple protections for estimation consistency.
Methods
With reference to the kernel function-based double-index propensity score (Ker.DiPS) estimator, we proposed the artificial neural network-based multi-index propensity score (ANN.MiPS) estimator. The ANN.MiPS estimator employed an artificial neural network to estimate the MiPS, which combines the information from multiple candidate models for the propensity score and the outcome regression. A Monte Carlo simulation study was designed to evaluate the performance of the proposed ANN.MiPS estimator. Furthermore, we applied our estimator to real data to illustrate its practicability.
Results
The simulation study showed that the bias of the ANN.MiPS estimator is very small, and its standard error similar, whenever any one of the candidate models is correctly specified, under all evaluated sample sizes, treatment rates, and covariate types. Compared to the kernel function-based estimator, the ANN.MiPS estimator usually yields a smaller standard error when the correct model is incorporated. The empirical study indicated that the point estimate of the ATE and its bootstrap standard error from the ANN.MiPS estimator are stable under different model specifications.
Conclusions
The proposed estimator extended the combination of information from two models to multiple models and achieved multiply robust estimation of the ATE. Extra efficiency was gained by our estimator compared with the kernel-based estimator. The proposed estimator provides a novel approach for estimating causal effects in observational studies.
Background
Estimating the average treatment effect (ATE) is essential for assessing the causal effects of treatments or interventions in biometrics, epidemiology, econometrics, and sociology. The ATE can be estimated by directly comparing mean outcomes between the treated and control groups in randomized controlled trials [1]. However, randomized controlled trials are often difficult to implement because of budget restrictions, ethics, and subjects’ noncompliance. Therefore, observational studies are increasingly used for estimating the ATE. However, baseline covariates are commonly unbalanced between the treated and control groups in observational studies, and simply comparing mean outcomes may induce confounding bias [2].
Inverse probability weighting (IPW) under the potential outcome framework is a popular approach for correcting confounding bias [3,4,5]. The IPW approach specifies a propensity score (PS) model to estimate subjects’ PS and uses the inverse of the PS to balance baseline covariates between groups [6, 7]. For a binary treatment, the most commonly used PS model is logistic regression. Machine learning models, such as decision trees [8] and artificial neural networks [9,10,11,12], are also used to estimate the PS. Another widely used approach is outcome regression (OR) [13]. The OR approach specifies an OR model, such as a generalized linear model [14], to model the outcome as a function of the treatment and covariates, correcting confounding bias directly. Machine learning models, such as random forests [15] and artificial neural networks [16], are also used as OR models. Both the IPW and OR approaches yield consistent estimation only if the corresponding model is correctly specified, and neither specification can be verified from the data alone.
The doubly robust approach, which combines a PS model and an OR model, yields consistent estimation when either of the two models is correctly specified (not necessarily both). Recently, a variety of doubly robust estimators for the ATE have been proposed, such as the augmented estimating equations estimator [17] and the targeted maximum likelihood estimator [18]. The kernel function-based double-index propensity score (Ker.DiPS) estimator proposed by Cheng et al. [19] is one of the weighting-based doubly robust estimators. They used a Nadaraya-Watson-type kernel function to combine the information from one PS model and one OR model into an integrated PS, which they named the double-index propensity score (DiPS). Using the IPW approach based on the DiPS, the Ker.DiPS estimator achieves doubly robust estimation of the ATE. However, the integrated PS estimated by the Nadaraya-Watson-type kernel may fall outside the range of 0 to 1. Such an invalid PS violates the positivity assumption of causal inference and may yield unreliable estimation. Moreover, the Ker.DiPS estimator allows only two opportunities for estimation consistency.
To provide more protection for estimation consistency, we would like to develop an estimator that allows multiple candidate models to be specified and achieves estimation consistency when any one of them is correctly specified. This type of estimator is known as a multiply robust estimator [20, 21]. When combining the information from multiple candidate models to obtain a multi-index propensity score (MiPS), the Nadaraya-Watson-type kernel function may yield unstable estimation because it suffers from the “curse of dimensionality” [22,23,24]. With the development of scalable computing and optimization techniques [25, 26], machine learning, such as the artificial neural network (ANN), has become one of the most promising approaches for the approximation and estimation of multivariate functions [27, 28]. The ANN has the potential to overcome the curse of dimensionality [29, 30] and has been used as a universal approximator for various functional representations [31,32,33]. Therefore, we replaced the kernel function with an ANN to conduct the nonparametric regression that estimates the MiPS. We aim to achieve multiply robust estimation of the ATE using the ANN-based MiPS.
The rest of the article is organized as follows. In the Notations and assumptions section, we introduce the necessary notations and causal inference assumptions. In the Some existing approaches section, we review existing estimators that lead to the development of our estimator. In the Proposed multi-index propensity score section, we describe the origin and construction of the proposed estimator in detail. In the Simulation studies section, we perform simulations to evaluate the performance of the proposed estimator. A real data analysis is conducted in the Application to NHEFS data section. We provide further discussion in the Discussion section and conclude the paper in the Conclusions section.
Methods
Notations and assumptions
Suppose that \({\mathbf{Z}}_{i}={\left({Y}_{i},{A}_{i},{\mathbf{X}}_{i}^{{\top }}\right)}^{{\top }}, i=1,\dots ,n\), are the observed data for the \(i\)th subject, drawn as independent and identically distributed copies of \(\mathbf{Z}={\left(Y,A,{\mathbf{X}}^{{\top }}\right)}^{{\top }}\), where \(Y\) is the outcome, \(A\) is the binary indicator of treatment (\(A=1\) if treated and \(A=0\) if controlled), and \(\mathbf{X}\) is the p-dimensional vector of pretreatment covariates. Let \({Y}^{1}\) and \({Y}^{0}\) represent the potential outcomes if a subject were assigned to the treated or control group, respectively. The average treatment effect (ATE) is defined as
\[\Delta =E\left({Y}^{1}\right)-E\left({Y}^{0}\right).\]
Under the causal inference framework, the following identifiability assumptions are usually made [6]:

Assumption 1. Consistency: \(Y=A{Y}^{1}+(1-A){Y}^{0}\) with probability 1;

Assumption 2. Ignorability: \(\left({Y}^{1},{Y}^{0}\right)\perp \!\!\! \perp A\mid \mathbf{X}\), where \(\perp \!\!\! \perp\) denotes statistical independence;

Assumption 3. Positivity: \(0<\pi \left(\mathbf{X}\right)<1\), where \(\pi \left(\mathbf{X}\right)=P\left(A=1\mid \mathbf{X}\right)\) denotes the propensity score.
Some existing approaches
The IPW estimator is commonly used for correcting confounding bias. The propensity score (PS) \(\pi \left(\mathbf{X}\right)=P\left(A=1\mid \mathbf{X}\right)\) can be modeled as \(\pi \left(\mathbf{X};\boldsymbol{\alpha }\right)={g}_{\pi }\left({\alpha }_{0}+{\boldsymbol{\alpha }}_{1}^{\mathrm{T}}\mathbf{X}\right)\), where \({g}_{\pi }\left(\cdot \right)\) is a specified link function, for example, the inverse of the logit function for logistic regression, and \(\boldsymbol{\alpha }={\left({\alpha }_{0},{\boldsymbol{\alpha }}_{1}^{\mathrm{T}}\right)}^{\mathrm{T}}\) are the unknown parameters, which can be estimated by maximum likelihood. Under the causal inference assumptions, the ATE can be estimated by the IPW estimator
\[{\widehat{\Delta }}_{IPW}=\frac{1}{n}\sum_{i=1}^{n}\frac{{A}_{i}{Y}_{i}}{\pi \left({\mathbf{X}}_{i};\widehat{\boldsymbol{\alpha }}\right)}-\frac{1}{n}\sum_{i=1}^{n}\frac{\left(1-{A}_{i}\right){Y}_{i}}{1-\pi \left({\mathbf{X}}_{i};\widehat{\boldsymbol{\alpha }}\right)},\]
where \(\widehat{\boldsymbol{\alpha }}\) is the estimated value of \(\boldsymbol{\alpha }\). If \(\pi \left(\mathbf{X};\boldsymbol{\alpha }\right)\) is correctly specified, \({\widehat{\Delta }}_{IPW}\) is a consistent estimator of \(\Delta\).
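As a concrete illustration, here is a minimal Python sketch of the IPW estimator (the paper's implementation is in R; the `ipw_ate` helper and the toy data-generating model below are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(Y, A, X):
    """IPW estimate of the ATE with a logistic PS model."""
    # a near-unpenalized logistic fit approximates maximum likelihood
    ps = LogisticRegression(C=1e6, max_iter=1000).fit(X, A).predict_proba(X)[:, 1]
    return np.mean(A * Y / ps) - np.mean((1 - A) * Y / (1 - ps))

# toy data with a true ATE of 0.4
rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1]))))
Y = 0.4 * A + X[:, 0] + X[:, 1] + rng.normal(size=n)
print(ipw_ate(Y, A, X))  # close to 0.4
```

With the PS model correctly specified, the weighted means recover the ATE; a misspecified PS model would generally leave residual bias.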
The OR estimator is another commonly used approach for correcting confounding bias. Let \({\mu }_{A}\left(\mathbf{X}\right)=E\left(Y\mid \mathbf{X},A\right)\) denote the outcome regression (OR), where \(A\in \{\mathrm{0,1}\}\). It can be modeled as \({\mu }_{A}\left(\mathbf{X};{\varvec{\beta}}\right)={g}_{\mu }\left({\beta }_{0}+{{\varvec{\beta}}}_{1}^{\mathrm{T}}\mathbf{X}+{\beta }_{2}A\right)\), where \({g}_{\mu }(\cdot )\) is a specified link function, for example, the identity function for linear regression, and \({\varvec{\beta}}={\left({\beta }_{0},{{\varvec{\beta}}}_{1}^{{\top }},{\beta }_{2}\right)}^{{\top }}\) are the unknown parameters, which can be estimated by maximum likelihood. Interactions between \(A\) and \(\mathbf{X}\) in the OR model can also be accommodated by estimating the OR separately in the treated and control groups [19]. Under the causal inference assumptions, the ATE can also be estimated by the OR estimator
\[{\widehat{\Delta }}_{OR}=\frac{1}{n}\sum_{i=1}^{n}\left[{\mu }_{1}\left({\mathbf{X}}_{i};\widehat{{\varvec{\beta}}}\right)-{\mu }_{0}\left({\mathbf{X}}_{i};\widehat{{\varvec{\beta}}}\right)\right],\]
where \(\widehat{{\varvec{\beta}}}\) is the estimated value of \({\varvec{\beta}}\). If \(\mu \left(\mathbf{X},A;{\varvec{\beta}}\right)\) is correctly specified, \({\widehat{\Delta }}_{OR}\) is a consistent estimator of \(\Delta\).
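A parallel Python sketch of the OR estimator under a hypothetical setup: fit a linear outcome model on \((\mathbf{X},A)\), then average the predicted potential outcomes under \(A=1\) and \(A=0\).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def or_ate(Y, A, X):
    """OR estimate of the ATE: average the fitted mu(X,1) - mu(X,0)."""
    fit = LinearRegression().fit(np.column_stack([X, A]), Y)
    mu1 = fit.predict(np.column_stack([X, np.ones_like(Y)]))   # predict under A=1
    mu0 = fit.predict(np.column_stack([X, np.zeros_like(Y)]))  # predict under A=0
    return np.mean(mu1 - mu0)

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 0.4 * A + X[:, 0] - X[:, 1] + rng.normal(size=n)  # true ATE = 0.4
print(or_ate(Y, A, X))
```

Because covariates enter the outcome model, the confounding by \({X}_{1}\) is adjusted for directly; a wrong functional form for the covariates would bias the estimate.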
If the PS model for the IPW estimator or the OR model for the OR estimator is incorrectly specified, the consistency of \({\widehat{\Delta }}_{IPW}\) or \({\widehat{\Delta }}_{OR}\) for \(\Delta\) cannot be guaranteed. To provide protection against model misspecification, Cheng et al. [19] considered integrating the information of the PS \(\pi \left(\mathbf{X};\boldsymbol{\alpha }\right)\) and the OR \({\mu }_{A}\left(\mathbf{X};{\varvec{\beta}}\right)\) to construct the double-index propensity score (DiPS), denoted by \(\pi \left(\mathbf{X};{\boldsymbol{\alpha }}_{1},{{\varvec{\beta}}}_{1}\right)=E\left[A\mid {\boldsymbol{\alpha }}_{1}^{\mathrm{T}}\mathbf{X},{{\varvec{\beta}}}_{1}^{\mathrm{T}}\mathbf{X}\right]\). To estimate this conditional expectation, Cheng et al. [19] first obtained the estimate \({\widehat{\boldsymbol{\alpha }}}_{1}\) from the PS model and the estimate \({\widehat{{\varvec{\beta}}}}_{1}\) from the OR model, then used the Nadaraya-Watson kernel estimator [34] to conduct a nonparametric regression of \(A\) on \({\widehat{\boldsymbol{\alpha }}}_{1}^{\mathrm{T}}\mathbf{X}\) and \({\widehat{{\varvec{\beta}}}}_{1}^{\mathrm{T}}\mathbf{X}\), giving the estimated DiPS
\[\widehat{\pi }\left(\mathbf{X};{\widehat{\boldsymbol{\alpha }}}_{1},{\widehat{{\varvec{\beta}}}}_{1}\right)=\frac{\sum_{i=1}^{n}{\mathcal{K}}_{\mathbf{H}}\left({\widehat{\mathbf{S}}}_{i}-\widehat{\mathbf{S}}\right){A}_{i}}{\sum_{i=1}^{n}{\mathcal{K}}_{\mathbf{H}}\left({\widehat{\mathbf{S}}}_{i}-\widehat{\mathbf{S}}\right)},\]
where \({\widehat{\mathbf{S}}}_{i}=\left({\widehat{\boldsymbol{\alpha }}}_{1}^{\mathrm{T}}{\mathbf{X}}_{i},{\widehat{{\varvec{\beta}}}}_{1}^{\mathrm{T}}{\mathbf{X}}_{i}\right)\) and \(\widehat{\mathbf{S}}=\left({\widehat{\boldsymbol{\alpha }}}_{1}^{\mathrm{T}}\mathbf{X},{\widehat{{\varvec{\beta}}}}_{1}^{\mathrm{T}}\mathbf{X}\right)\) are bivariate regressors, named the double-index, and \({\mathcal{K}}_{\mathbf{H}}\left(\bullet \right)\) is a kernel function with a \(2\times 2\) bandwidth matrix \(\mathbf{H}\). Using the estimated DiPS \(\widehat{\pi }\left(\mathbf{X};{\widehat{\boldsymbol{\alpha }}}_{1},{\widehat{{\varvec{\beta}}}}_{1}\right)\), the ATE can be estimated by
\[{\widehat{\Delta }}_{DiPS}=\frac{1}{n}\sum_{i=1}^{n}\frac{{A}_{i}{Y}_{i}}{\widehat{\pi }\left({\mathbf{X}}_{i};{\widehat{\boldsymbol{\alpha }}}_{1},{\widehat{{\varvec{\beta}}}}_{1}\right)}-\frac{1}{n}\sum_{i=1}^{n}\frac{\left(1-{A}_{i}\right){Y}_{i}}{1-\widehat{\pi }\left({\mathbf{X}}_{i};{\widehat{\boldsymbol{\alpha }}}_{1},{\widehat{{\varvec{\beta}}}}_{1}\right)}.\]
Cheng et al. [19] demonstrated that \({\widehat{\Delta }}_{DiPS}\) is a doubly robust estimator: it is consistent when \(\pi \left(\mathbf{X};\boldsymbol{\alpha }\right)\) is correctly specified, or \({\mu }_{A}\left(\mathbf{X};{\varvec{\beta}}\right)\) is correctly specified, but not necessarily both.
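A numeric sketch of the DiPS construction may help fix ideas. Here a Gaussian product kernel with a single scalar bandwidth `h` stands in for \({\mathcal{K}}_{\mathbf{H}}\), and the helper names and toy data are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def dips_ate(Y, A, X, h=0.5):
    """ATE via IPW with a double-index PS smoothed by a Nadaraya-Watson kernel."""
    a1 = LogisticRegression(C=1e6, max_iter=1000).fit(X, A).coef_.ravel()  # PS index
    b1 = LinearRegression().fit(np.column_stack([X, A]), Y).coef_[:-1]     # OR index
    S = np.column_stack([X @ a1, X @ b1])            # the double-index
    D = (S[:, None, :] - S[None, :, :]) / h          # pairwise scaled differences
    K = np.exp(-0.5 * (D ** 2).sum(axis=2))          # Gaussian product kernel
    pi = (K @ A) / K.sum(axis=1)                     # NW regression of A on the index
    return np.mean(A * Y / pi) - np.mean((1 - A) * Y / (1 - pi))

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))
Y = 0.4 * A + X[:, 0] + X[:, 1] + rng.normal(size=n)  # true ATE = 0.4
print(dips_ate(Y, A, X))
```

Note that with an everywhere-positive kernel such as the Gaussian above, the NW ratio necessarily stays within \([0,1]\); the out-of-range behavior discussed in the text can arise when kernels whose weights may be negative (e.g., higher-order kernels) are used.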
Proposed multiindex propensity score
Although \({\widehat{\Delta }}_{DiPS}\) achieves doubly robust estimation of the ATE, the DiPS estimated by the Nadaraya-Watson kernel estimator may fall outside the range of 0 to 1, in which case Assumption 3 above is violated. Furthermore, \({\widehat{\Delta }}_{DiPS}\) allows only a single model for the PS and a single model for the OR, so estimation consistency cannot be guaranteed when both models are incorrect. To provide more protection for estimation consistency, we develop an approach that allows multiple candidate models for the PS and/or the OR, achieving multiple robustness: the estimator is consistent when any model for the PS or any model for the OR is correctly specified.
Specifically, we consider multiple candidate models for the PS, \(\{{\pi }^{k}\left(\mathbf{X};{\boldsymbol{\alpha }}^{k}\right)={g}_{\pi }\left({\alpha }_{0}^{k}+{\boldsymbol{\alpha }}_{1}^{k\mathrm{T}}\mathbf{X}\right),k=1,\dots ,K\}\), and multiple candidate models for the OR, \(\left\{{\mu }_{A}^{l}\left(\mathbf{X};{{\varvec{\beta}}}^{l}\right)={g}_{\mu }\left({\beta }_{0}^{l}+{{\varvec{\beta}}}_{1}^{l\mathrm{T}}\mathbf{X}+{\beta }_{2}^{l}A\right),l=1,\dots ,L\right\}\), possibly with different choices or functional forms of covariates. We then integrate the information from the multiple PS models and multiple OR models to construct the multi-index propensity score (MiPS), denoted by \(\pi \left(\mathbf{X};{\boldsymbol{\alpha }}_{1}^{1},...,{\boldsymbol{\alpha }}_{1}^{K},{{\varvec{\beta}}}_{1}^{1},...,{{\varvec{\beta}}}_{1}^{L}\right)=E\left[A\mid {\boldsymbol{\alpha }}_{1}^{1\mathrm{T}}\mathbf{X},...,{\boldsymbol{\alpha }}_{1}^{K\mathrm{T}}\mathbf{X},{{\varvec{\beta}}}_{1}^{1\mathrm{T}}\mathbf{X},...,{{\varvec{\beta}}}_{1}^{L\mathrm{T}}\mathbf{X}\right]\). To estimate this conditional expectation, we first obtain the estimates \({\widehat{\boldsymbol{\alpha }}}_{1}^{1}\),…, \({\widehat{\boldsymbol{\alpha }}}_{1}^{K}\) from the multiple PS models and the estimates \({\widehat{{\varvec{\beta}}}}_{1}^{1}\),…, \({\widehat{{\varvec{\beta}}}}_{1}^{L}\) from the multiple OR models; a naive idea is then to use the multivariate Nadaraya-Watson kernel estimator to conduct a nonparametric regression of \(A\) on \({\widehat{\boldsymbol{\alpha }}}_{1}^{1\mathrm{T}}\mathbf{X}\),…, \({\widehat{\boldsymbol{\alpha }}}_{1}^{K\mathrm{T}}\mathbf{X}\) and \({\widehat{{\varvec{\beta}}}}_{1}^{1\mathrm{T}}\mathbf{X}\),…, \({\widehat{{\varvec{\beta}}}}_{1}^{L\mathrm{T}}\mathbf{X}\), giving the estimated MiPS
\[{\widehat{\pi }}^{Ker}\left(\mathbf{X};{\widehat{\boldsymbol{\alpha }}}_{1}^{1},...,{\widehat{\boldsymbol{\alpha }}}_{1}^{K},{\widehat{{\varvec{\beta}}}}_{1}^{1},...,{\widehat{{\varvec{\beta}}}}_{1}^{L}\right)=\frac{\sum_{j=1}^{n}{\mathcal{K}}_{\mathbf{H}}\left({\widehat{\mathbf{S}}}_{j}-\widehat{\mathbf{S}}\right){A}_{j}}{\sum_{j=1}^{n}{\mathcal{K}}_{\mathbf{H}}\left({\widehat{\mathbf{S}}}_{j}-\widehat{\mathbf{S}}\right)},\]
where \({\widehat{\mathbf{S}}}_{j}=\left({\widehat{\boldsymbol{\alpha }}}_{1}^{1\mathrm{T}}{\mathbf{X}}_{j},\dots , {\widehat{\boldsymbol{\alpha }}}_{1}^{K\mathrm{T}}{\mathbf{X}}_{j},{\widehat{{\varvec{\beta}}}}_{1}^{1\mathrm{T}}{\mathbf{X}}_{j},\dots , {\widehat{{\varvec{\beta}}}}_{1}^{L\mathrm{T}}{\mathbf{X}}_{j}\right)\) and \(\widehat{\mathbf{S}}=\left({\widehat{\boldsymbol{\alpha }}}_{1}^{1\mathrm{T}}\mathbf{X},\dots , {\widehat{\boldsymbol{\alpha }}}_{1}^{K\mathrm{T}}\mathbf{X},{\widehat{{\varvec{\beta}}}}_{1}^{1\mathrm{T}}\mathbf{X},\dots , {\widehat{{\varvec{\beta}}}}_{1}^{L\mathrm{T}}\mathbf{X}\right)\) are multivariate regressors, named the multi-index, and \({\mathcal{K}}_{\mathbf{H}}\left(\bullet \right)\) is a kernel function with a \(\left(K+L\right)\times \left(K+L\right)\) bandwidth matrix \(\mathbf{H}\). Using the estimated kernel-based MiPS \({\widehat{\pi }}^{Ker}\left(\mathbf{X};{\widehat{\boldsymbol{\alpha }}}_{1}^{1},...,{\widehat{\boldsymbol{\alpha }}}_{1}^{K},{\widehat{{\varvec{\beta}}}}_{1}^{1},...,{\widehat{{\varvec{\beta}}}}_{1}^{L}\right)\), the ATE can be estimated by
\[{\widehat{\Delta }}_{MiPS}^{Ker}=\frac{1}{n}\sum_{i=1}^{n}\frac{{A}_{i}{Y}_{i}}{{\widehat{\pi }}^{Ker}\left({\mathbf{X}}_{i};{\widehat{\boldsymbol{\alpha }}}_{1}^{1},...,{\widehat{{\varvec{\beta}}}}_{1}^{L}\right)}-\frac{1}{n}\sum_{i=1}^{n}\frac{\left(1-{A}_{i}\right){Y}_{i}}{1-{\widehat{\pi }}^{Ker}\left({\mathbf{X}}_{i};{\widehat{\boldsymbol{\alpha }}}_{1}^{1},...,{\widehat{{\varvec{\beta}}}}_{1}^{L}\right)}.\]
However, without additional assumptions about the regression structure, the performance of the Nadaraya-Watson kernel estimator degrades as the number of regressors increases. This degradation is often referred to as the “curse of dimensionality” [22,23,24]. Our simulation results below also show that \({\widehat{\Delta }}_{MiPS}^{Ker}\) has obvious bias when multiple candidate models are included in \({\widehat{\pi }}^{Ker}\left(\mathbf{X};{\widehat{\boldsymbol{\alpha }}}_{1}^{1},...,{\widehat{\boldsymbol{\alpha }}}_{1}^{K},{\widehat{{\varvec{\beta}}}}_{1}^{1},...,{\widehat{{\varvec{\beta}}}}_{1}^{L}\right)\), even if the correct PS and/or OR model is among them.
With the development of scalable computing and optimization techniques [25, 26], machine learning has become one of the most promising approaches for the approximation and estimation of multivariate functions [27, 28]. The artificial neural network (ANN) is one such machine learning approach. Benefiting from its flexible structure, the ANN is a universal approximator of a variety of functions [31,32,33]. The ANN comprises an input layer, a researcher-specified number of hidden layer(s), and an output layer. The hidden layer(s) and the output layer consist of a number of neurons (also specified by the researcher) with activation functions [35]. The operation of an ANN includes the following steps: 1) information enters through the input layer, which passes it to the hidden layer; 2) in the hidden layer(s), the information is multiplied by weights, a bias is added, and the result is transformed by the activation function before being passed to the next layer; 3) the information is passed layer by layer until the last layer, where it is multiplied by weights and transformed by the activation function to produce the output; and 4) the error between the output and the actual value is calculated and minimized by optimizing the weight and bias parameters through the backpropagation algorithm [36]. In addition to having the potential to overcome the “curse of dimensionality” [29, 30], the ANN is capable of automatically capturing complex relationships between variables [27]. It may be well suited to modeling the relationship between the treatment and the multi-index, because interactions commonly exist between indexes due to shared covariates in the candidate PS and/or OR models. Therefore, we replaced the kernel function with an ANN and propose our ANN-based MiPS (ANN.MiPS) estimator.
We now construct the ANN-based MiPS. We first obtain the estimates \({\widehat{\boldsymbol{\alpha }}}_{1}^{1}\),…, \({\widehat{\boldsymbol{\alpha }}}_{1}^{K}\) from the multiple PS models and the estimates \({\widehat{{\varvec{\beta}}}}_{1}^{1}\),…, \({\widehat{{\varvec{\beta}}}}_{1}^{L}\) from the multiple OR models, then use an ANN to conduct a nonparametric regression of \(A\) on the multiple indexes \({\widehat{\boldsymbol{\alpha }}}_{1}^{1\mathrm{T}}\mathbf{X}\),…, \({\widehat{\boldsymbol{\alpha }}}_{1}^{K\mathrm{T}}\mathbf{X}\) and \({\widehat{{\varvec{\beta}}}}_{1}^{1\mathrm{T}}\mathbf{X}\),…, \({\widehat{{\varvec{\beta}}}}_{1}^{L\mathrm{T}}\mathbf{X}\), giving the estimated MiPS \({\widehat{\pi }}^{Ann}\left(\mathbf{X};{\widehat{\boldsymbol{\alpha }}}_{1}^{1},...,{\widehat{\boldsymbol{\alpha }}}_{1}^{K},{\widehat{{\varvec{\beta}}}}_{1}^{1},...,{\widehat{{\varvec{\beta}}}}_{1}^{L}\right)\). The ATE can then be estimated by
\[{\widehat{\Delta }}_{MiPS}^{Ann}=\frac{1}{n}\sum_{i=1}^{n}\frac{{A}_{i}{Y}_{i}}{{\widehat{\pi }}^{Ann}\left({\mathbf{X}}_{i};{\widehat{\boldsymbol{\alpha }}}_{1}^{1},...,{\widehat{{\varvec{\beta}}}}_{1}^{L}\right)}-\frac{1}{n}\sum_{i=1}^{n}\frac{\left(1-{A}_{i}\right){Y}_{i}}{1-{\widehat{\pi }}^{Ann}\left({\mathbf{X}}_{i};{\widehat{\boldsymbol{\alpha }}}_{1}^{1},...,{\widehat{{\varvec{\beta}}}}_{1}^{L}\right)}.\]
Our simulations below indicate the multiple robustness of \({\widehat{\Delta }}_{MiPS}^{Ann}\): its bias is negligible when any model for the PS or any model for the OR is correctly specified.
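A minimal Python analog of this construction (the paper's implementation uses the R package AMORE; here scikit-learn's `MLPClassifier`, the helper names, and the toy candidate models are all stand-ins): stack the \(K+L\) fitted indices, regress \(A\) on them with a small neural network whose sigmoid output keeps the MiPS inside \((0,1)\), and weight by its inverse.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neural_network import MLPClassifier

def ann_mips_ate(Y, A, X_ps_list, X_or_list):
    """ATE via IPW with an ANN-based multi-index PS.
    X_ps_list / X_or_list hold the design matrices of the candidate models."""
    idx = []
    for Xk in X_ps_list:   # index alpha_1^k' X from each candidate PS model
        a = LogisticRegression(C=1e6, max_iter=1000).fit(Xk, A).coef_.ravel()
        idx.append(Xk @ a)
    for Xl in X_or_list:   # index beta_1^l' X from each candidate OR model
        b = LinearRegression().fit(np.column_stack([Xl, A]), Y).coef_[:-1]
        idx.append(Xl @ b)
    S = np.column_stack(idx)                       # the multi-index
    net = MLPClassifier(hidden_layer_sizes=(4, 4), activation="tanh",
                        max_iter=2000, random_state=0)
    mips = net.fit(S, A).predict_proba(S)[:, 1]    # ANN-based MiPS in (0, 1)
    return np.mean(A * Y / mips) - np.mean((1 - A) * Y / (1 - mips))

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))
Y = 0.4 * A + X[:, 0] + X[:, 1] + rng.normal(size=n)   # true ATE = 0.4
est = ann_mips_ate(Y, A, [X, X**2], [X, X**2])         # one correct, one wrong model each
print(est)
```

Because the correct PS index is among the regressors, the conditional expectation of \(A\) given the multi-index equals the true PS, so the estimate stays close to the truth even though two of the four candidate models are misspecified.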
For the ANN.MiPS estimator, we implemented an ANN containing 2 hidden layers with 4 neurons in each hidden layer using the AMORE package [37]. The total number of parameters to be estimated in the ANN is therefore \(4\times (K+L)+32\), including \(4\times (K+L)+24\) weight parameters and 8 bias parameters. The learning rate was set to 0.001 [10, 12]. The momentum was set to 0.5, the default value in the AMORE package. The hyperbolic tangent function was specified as the activation function for the hidden layers. The sigmoid function was specified as the activation function for the output layer to ensure the estimated ANN-based MiPS lies between 0 and 1 [38]. To examine the performance stability of the estimator, we performed a sensitivity analysis using different hyperparameter selections. The simulations, real data analysis, and all statistical tests were conducted using R software (Version 4.1.0) [39]. A zip file of the AMORE package and example code for implementing the ANN.MiPS approach can be found in the attachment.
Simulation studies
We conducted simulation studies to evaluate the performance of (i) single model-based estimators: the IPW and OR estimators; (ii) doubly robust estimators: the augmented inverse probability weighting (AIPW) estimator [17] and the targeted maximum likelihood estimator (TMLE) [18], which allow a single model for the PS and a single model for the OR; and (iii) multiple model-based estimators: the kernel-based (Ker.MiPS) and ANN-based (ANN.MiPS) estimators, which allow multiple candidate models for the PS and/or the OR.
Ten covariates \({X}_{1}\)–\({X}_{10}\) were generated from the standard normal distribution, and the correlations between them are shown in Fig. 1. The binary treatment indicator \(A\) was generated from a Bernoulli distribution according to the following propensity score
\({\alpha }_{0}\) was set to 0 or 1.1 so that approximately 50% or 25% of subjects entered the treatment group. The continuous outcome \(Y\) was generated from
where \(\varepsilon\) follows the standard normal distribution. The true ATE was \(\Delta =E\left({Y}^{1}\right)-E\left({Y}^{0}\right)=0.4\).
In the estimation, two estimation models were specified
for propensity score, and two estimation models were specified
for outcome regression. According to the data-generating mechanism, \({\pi }^{1}\left(\mathbf{X};{\boldsymbol{\alpha }}^{1}\right)\) and \({{\mu }_{A}}^{1}\left(\mathbf{X};{{\varvec{\beta}}}^{1}\right)\) were the correct PS and OR models, whereas \({\pi }^{2}\left(\mathbf{X};{\boldsymbol{\alpha }}^{2}\right)\) and \({{\mu }_{A}}^{2}\left(\mathbf{X};{{\varvec{\beta}}}^{2}\right)\) were incorrect PS and OR models, due to misspecified functional forms of the covariates. To distinguish the estimation methods, each estimator is denoted as "method0000". Each of the four digits, from left to right, indicates whether \({\pi }^{1}\left(\mathbf{X};{\boldsymbol{\alpha }}^{1}\right)\), \({\pi }^{2}\left(\mathbf{X};{\boldsymbol{\alpha }}^{2}\right)\), \({{\mu }_{A}}^{1}\left(\mathbf{X};{{\varvec{\beta}}}^{1}\right)\) or \({{\mu }_{A}}^{2}\left(\mathbf{X};{{\varvec{\beta}}}^{2}\right)\) is included in the estimator, where “1” indicates yes and “0” indicates no.
We investigated sample sizes of \(n=300\) and \(n=1000\) with 1000 replications in all settings. Tables 1 and 2 show the estimation results for all estimators, along with five evaluation measures: percentage bias (BIAS, in percent), root mean square error (RMSE), Monte Carlo standard error (MCSE), bootstrap standard error (BSSE) based on 100 resamples, and coverage rate of the 95% Wald confidence interval (CICov). Our bootstrap procedure resamples from the original sample with replacement until the bootstrap sample size reaches the original sample size. Fig. S1 shows the distribution of the estimated ATEs of the Ker.MiPS and ANN.MiPS estimators. The following conclusions can be drawn. For estimation bias,

(i)
If specifying one model for the PS or one for the OR: the IPW, Ker.MiPS, and ANN.MiPS estimators all have a small bias if the PS model is correctly specified (IPW.correct, Ker.MiPS1000, ANN.MiPS1000). The OR, Ker.MiPS, and ANN.MiPS estimators all have a small bias if the OR model is correctly specified (OR.correct, Ker.MiPS0010, ANN.MiPS0010).

(ii)
If specifying one model for PS and one model for OR: The AIPW, TMLE, Ker.MiPS and ANN.MiPS estimators all have a small bias if the PS model is correctly specified (AIPW1010, AIPW1001, Ker.MiPS1010, Ker.MiPS1001, ANN.MiPS1010, ANN.MiPS1001), or if the OR model is correctly specified (AIPW1010, AIPW0110, Ker.MiPS1010, Ker.MiPS0110, ANN.MiPS1010, ANN.MiPS0110).

(iii)
If specifying multiple candidate models for the PS and OR: the multiple robustness of the ANN.MiPS estimator is well demonstrated by the negligible bias of ANN.MiPS1110, ANN.MiPS1101, ANN.MiPS1011, ANN.MiPS0111, and ANN.MiPS1111. In contrast, the biases of the Ker.MiPS estimators under all model specifications are close to or larger than 5%.
For estimation efficiency,

(i)
If the models for both the PS and OR are correctly specified: the MCSEs of the AIPW1010, TMLE1010, and ANN.MiPS1010 estimators are all smaller than those of the IPW.correct and ANN.MiPS1000 estimators. The improved efficiency may come from the information in the correct OR model.

(ii)
If the multiple candidate models include the correct PS and OR models: the MCSEs of the ANN.MiPS1110, ANN.MiPS1011, and ANN.MiPS1111 estimators are all close to that of ANN.MiPS1010.
To evaluate the performance of the MiPS estimator as the number of specified models increases, we considered three additional estimators: MiPS1111-2PS, which adds two incorrect PS models \(\left\{\begin{array}{c}logit\left[{\pi }^{3}\left(\mathbf{X};{\boldsymbol{\alpha }}^{3}\right)\right]=\left(1,{X}_{1},{X}_{2},{X}_{3}\right){\boldsymbol{\alpha }}^{3}\\ logit\left[{\pi }^{4}\left(\mathbf{X};{\boldsymbol{\alpha }}^{4}\right)\right]=\left(1,{X}_{1}^{2},{X}_{2}^{2},{X}_{3}^{2}\right){\boldsymbol{\alpha }}^{4}\end{array}\right\}\) to MiPS1111; MiPS1111-2OR, which adds two incorrect OR models \(\left\{\begin{array}{c}{\mu }_{A}^{3}\left(\mathbf{X};{{\varvec{\beta}}}^{3}\right)=\left(1,{X}_{1},{X}_{2},{X}_{3},A\right){{\varvec{\beta}}}^{3}\\ {\mu }_{A}^{4}\left(\mathbf{X};{{\varvec{\beta}}}^{4}\right)=\left(1,{X}_{1}^{2},{X}_{2}^{2},{X}_{3}^{2},A\right){{\varvec{\beta}}}^{4}\end{array}\right\}\) to MiPS1111; and MiPS1111-2PS-2OR, which adds the two incorrect PS models \({\pi }^{3}\left(\mathbf{X};{\boldsymbol{\alpha }}^{3}\right)\) and \({\pi }^{4}\left(\mathbf{X};{\boldsymbol{\alpha }}^{4}\right)\) and the two incorrect OR models \({\mu }_{A}^{3}\left(\mathbf{X};{{\varvec{\beta}}}^{3}\right)\) and \({\mu }_{A}^{4}\left(\mathbf{X};{{\varvec{\beta}}}^{4}\right)\) to MiPS1111. Table 3 shows the estimation results. The following conclusions can be drawn.

(i)
The estimation bias of the ANN.MiPS1111-2PS, ANN.MiPS1111-2OR, and ANN.MiPS1111-2PS-2OR estimators is still negligible. The estimation efficiency of these estimators is hardly degraded compared to the ANN.MiPS1010 estimator.

(ii)
The estimation bias of the Ker.MiPS1111-2PS, Ker.MiPS1111-2OR, and Ker.MiPS1111-2PS-2OR estimators is close to or larger than 10%. The MCSEs of these estimators are obviously larger than that of the Ker.MiPS1010 estimator.
We also evaluated the performance of the ANN.MiPS estimator in a simulation scenario with both continuous and discrete covariates. The simulation setting is described in the Supplementary Document. Similar conclusions can be drawn as in the scenario with all continuous covariates (Tables S1, S2). The sensitivity analysis of hyperparameter selection in the ANN showed the performance of the ANN.MiPS estimator to be stable (Table S3).
Application to NHEFS data
To illustrate our proposed method, we analyzed a subset of real data from the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study (NHEFS) (wwwn.cdc.gov/nchs/nhanes/nhefs/). The dataset consists of 1,507 participants aged 25–74 who smoked at the first survey and were followed for approximately 10 years. The empirical study aimed to estimate the ATE of smoking cessation (coded as quitting and non-quitting, with non-quitting as the reference group) on weight gain. Participants were categorized as treated if they quit smoking during follow-up, and as controls otherwise. Weight gain for each individual was measured as weight at the end of follow-up minus weight at the baseline survey (in kilograms). During the 10-year follow-up, 379 (25.15%) participants quit smoking. The average weight gain was greater for those who quit smoking, with an unadjusted difference of 2.4 kg.
Table 4 summarizes the baseline characteristics, including age, gender, race, baseline weight, active life level, education level, exercise, smoking intensity, smoking years, and ever use of weight loss medication, for smoking quitters and non-quitters. As shown in the table, the distributions of age, gender, race, education level, smoking intensity, and smoking years differed between quitters and non-quitters. When estimating the ATE of smoking cessation on weight gain, these factors should be adjusted for if they are confounders.
To identify candidate models for the ANN.MiPS estimator, we explored the association of smoking cessation with all potential risk factors by logistic regression, and the association of weight gain with all potential risk factors by linear regression. The covariates in model 1 and model 2 for both the PS and OR models were identified at significance levels of 0.05 and 0.1, respectively. The covariates in PS models 1 and 2 were (i) age, gender, race, smoking intensity, and smoking years; and (ii) age, gender, race, smoking intensity, smoking years, education level, and exercise. The covariates in OR models 1 and 2 were (i) age, weight at baseline, smoking intensity, education level, and active life level; and (ii) age, weight at baseline, smoking intensity, education level, active life level, and family income level. We applied the single model-based IPW estimator, the single model-based OR estimator, and our proposed ANN.MiPS estimator to estimate the ATE. The four digits in the ANN.MiPS estimator, from left to right, indicate whether PS model 1, PS model 2, OR model 1, or OR model 2 is included in the estimator, where “1” indicates yes and “0” indicates no. For example, “ANN.MiPS1010” indicates that PS model 1 and OR model 1 are included. The standard error of the estimates was computed from 500 bootstrap resamples.
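The bootstrap standard error can be sketched as follows (hypothetical helper; `estimator` stands in for any of the point estimators above, and a naive difference in means is used only for illustration):

```python
import numpy as np

def bootstrap_se(Y, A, X, estimator, n_boot=500, seed=0):
    """Standard deviation of the estimator over resamples drawn with
    replacement, each the size of the original sample."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    ates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)    # resample to the original sample size
        ates.append(estimator(Y[idx], A[idx], X[idx]))
    return np.std(ates, ddof=1)

rng = np.random.default_rng(4)
n = 500
Y = rng.normal(size=n)
A = rng.binomial(1, 0.5, size=n)
X = rng.normal(size=(n, 1))
naive = lambda Y, A, X: Y[A == 1].mean() - Y[A == 0].mean()
print(bootstrap_se(Y, A, X, naive, n_boot=200))  # roughly sqrt(1/n1 + 1/n0)
```

The same wrapper applies unchanged to the IPW, OR, or ANN.MiPS point estimators, since each maps a resampled \((Y,A,\mathbf{X})\) to a scalar ATE estimate.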
The estimation results in Table 5 indicate that all estimators suggested quitting smoking significantly increased participants' weight gain. Most of the estimated adjusted effects were greater than the unadjusted estimate of 2.4 kg, and the adjusted estimates are expected to be more reliable. The point estimate of the ATE and its bootstrap standard error from the ANN.MiPS estimator were stable under different model specifications.
Discussion
In this paper, we considered causal inference in observational studies, where effect estimation is susceptible to confounding bias due to imbalanced covariates between groups. With reference to the Ker.DiPS estimator [19], we proposed the ANN.MiPS estimator to provide more chances to correct confounding bias. We evaluated the performance of our estimator under simulation scenarios with a small (\(n=300\)) or large (\(n=1000\)) sample size, with a treatment rate of 25% or 50%, and with covariates that were all continuous or a mix of continuous and discrete. The results demonstrated the multiple robustness of our estimator: the estimation bias is small if any model for the PS or any model for the OR is correctly specified. In addition to achieving multiply robust estimation of the ATE, the proposed estimator showed higher estimation efficiency than the kernel-based estimator when any model for the PS or OR is correctly specified, especially when only the OR model is correctly specified.
One limitation of our approach is that the multiple candidate models for the PS, \(\{{\pi }^{k}\left(\mathbf{X};{\boldsymbol{\alpha }}^{k}\right)={g}_{\pi }\left({\alpha }_{0}^{k}+{\boldsymbol{\alpha }}_{1}^{kT}\mathbf{X}\right),k=1,\dots ,K\}\), and for the OR, \(\left\{{\mu }^{l}\left(\mathbf{X},A;{{\varvec{\beta}}}^{l}\right)={g}_{\mu }\left({\beta }_{0}^{l}+{{\varvec{\beta}}}_{1}^{lT}\mathbf{X}+{\beta }_{2}^{l}A\right),l=1,\dots ,L\right\}\), need to be parametric, since the MiPS is defined as \(\pi \left(\mathbf{X};{\boldsymbol{\alpha }}_{1}^{1},...,{\boldsymbol{\alpha }}_{1}^{K},{{\varvec{\beta}}}_{1}^{1},...,{{\varvec{\beta}}}_{1}^{L}\right)=E\left[A\mid {\boldsymbol{\alpha }}_{1}^{1T}\mathbf{X},...,{\boldsymbol{\alpha }}_{1}^{KT}\mathbf{X},{{\varvec{\beta}}}_{1}^{1T}\mathbf{X},...,{{\varvec{\beta}}}_{1}^{LT}\mathbf{X}\right]\), which requires a nonparametric regression of \(A\) on \({\widehat{\boldsymbol{\alpha }}}_{1}^{1\mathrm{T}}\mathbf{X}\),…, \({\widehat{\boldsymbol{\alpha }}}_{1}^{K\mathrm{T}}\mathbf{X}\) and \({\widehat{{\varvec{\beta}}}}_{1}^{1\mathrm{T}}\mathbf{X}\),…, \({\widehat{{\varvec{\beta}}}}_{1}^{L\mathrm{T}}\mathbf{X}\). Therefore, nonparametric models, such as the kernel function, ANN, and random forest, are not suitable as candidate models for the MiPS estimator, because the coefficients of the covariates cannot be obtained. When the candidate models are constructed by nonparametric models, other multiply robust approaches may be adopted to integrate the information from multiple candidate models, such as the regression-based estimator under the least squares framework [40], the estimator based on empirical likelihood weighting [20], and the estimator based on model mixture procedures [41]. In that case, the double/debiased machine learning approach may be extended to multiple/debiased machine learning to obtain valid inference about the ATE [42].
Although the performance of the ANN.MiPS estimator remains stable with eight candidate models specified, an excessive number of models can impose a heavy computational burden. In practical applications, we therefore recommend carefully constructing a comprehensive set of reasonable but dissimilar candidate models to control the model number, using both subject-matter knowledge and reliable data-driven tools, such as causality diagrams [43], variable selection techniques [44], and covariate balancing diagnostics [45].
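Covariate balancing diagnostics [45] are commonly summarized by the standardized mean difference (SMD) between groups, with |SMD| below 0.1 conventionally taken as adequate balance. A minimal numpy sketch (illustrative only, using an assumed data-generating process with a known propensity score) might look like:

```python
import numpy as np

def smd(x, a, w=None):
    """Weighted standardized mean difference of covariate x between
    treated (a == 1) and control (a == 0) groups."""
    w = np.ones_like(x, dtype=float) if w is None else w
    m1 = np.average(x[a == 1], weights=w[a == 1])
    m0 = np.average(x[a == 0], weights=w[a == 0])
    v1 = np.average((x[a == 1] - m1) ** 2, weights=w[a == 1])
    v0 = np.average((x[a == 0] - m0) ** 2, weights=w[a == 0])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)
ps = 1 / (1 + np.exp(-x))            # treatment assignment depends on x
a = rng.binomial(1, ps)
w = a / ps + (1 - a) / (1 - ps)      # inverse-probability weights

# Unweighted SMD reveals imbalance; weighting should pull it below 0.1.
print(round(float(smd(x, a)), 2), round(float(smd(x, a, w)), 2))
```

Computing the SMD before and after weighting for each covariate is a quick check of whether a fitted PS model actually balances the groups.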
Finally, we give some intuitive discussion of the theoretical properties of the proposed estimator. Following the proof in Cheng et al. [19], \({\widehat{\Delta }}_{MiPS}^{ANN}\) is consistent for a limit quantity \({\overline{\Delta } }_{MiPS}^{ANN}\) obtained by replacing the estimated quantities with their probability limits,
where \({\widehat{\boldsymbol{\alpha }}}_{1}^{1},...,{\widehat{\boldsymbol{\alpha }}}_{1}^{K},{\widehat{{\varvec{\beta}}}}_{1}^{1},...,{\widehat{{\varvec{\beta}}}}_{1}^{L}\) converge to \({\overline{\boldsymbol{\alpha }} }_{1}^{1},...,{\overline{\boldsymbol{\alpha }} }_{1}^{K},{\overline{{\varvec{\beta}}} }_{1}^{1},...,{\overline{{\varvec{\beta}}} }_{1}^{L}\) and \({\widehat{\pi }}^{ANN}\left(\bullet \right)\) converges to \({\overline{\pi }}^{ANN}\left(\bullet \right)\). According to theoretical results on ANN, under certain conditions, \({\overline{\pi }}^{ANN}\left(\mathbf{X};{\overline{\boldsymbol{\alpha }} }_{1}^{1},...,{\overline{\boldsymbol{\alpha }} }_{1}^{K},{\overline{{\varvec{\beta}}} }_{1}^{1},...,{\overline{{\varvec{\beta}}} }_{1}^{L}\right)=\pi \left(\mathbf{X};{\overline{\boldsymbol{\alpha }} }_{1}^{1},...,{\overline{\boldsymbol{\alpha }} }_{1}^{K},{\overline{{\varvec{\beta}}} }_{1}^{1},...,{\overline{{\varvec{\beta}}} }_{1}^{L}\right)\). Hence, when one of the candidate models for PS \(\{{\pi }^{k}\left(\mathbf{X};{\boldsymbol{\alpha }}^{k}\right)={g}_{\pi }\left({\alpha }_{0}^{k}+{\boldsymbol{\alpha }}_{1}^{kT}\mathbf{X}\right),k=1,\dots ,K\}\) is correctly specified, \(\pi \left(\mathbf{X};{\overline{\boldsymbol{\alpha }} }_{1}^{1},...,{\overline{\boldsymbol{\alpha }} }_{1}^{K},{\overline{{\varvec{\beta}}} }_{1}^{1},...,{\overline{{\varvec{\beta}}} }_{1}^{L}\right)=\pi \left(\mathbf{X}\right)\) and \({\overline{\Delta } }_{MiPS}^{ANN}=\Delta\).
On the other hand, when one of the candidate models for OR \(\left\{{\mu }_{A}^{l}\left(\mathbf{X};{{\varvec{\beta}}}^{l}\right)={g}_{\mu }\left({\beta }_{0}^{l}+{{\varvec{\beta}}}_{1}^{lT}\mathbf{X}+{\beta }_{2}^{l}A\right),l=1,\dots ,L\right\}\) is correctly specified, \(E\left[Y\mid {\overline{\boldsymbol{\alpha }} }_{1}^{1T}\mathbf{X},...,{\overline{\boldsymbol{\alpha }} }_{1}^{KT}\mathbf{X},{\overline{{\varvec{\beta}}} }_{1}^{1T}\mathbf{X},...,{\overline{{\varvec{\beta}}} }_{1}^{LT}\mathbf{X},A\right]={\mu }_{A}\left(\mathbf{X}\right)\) and \({\overline{\Delta } }_{MiPS}^{ANN}=\Delta\). As for the asymptotic distribution of the proposed estimator, the variability of \({\widehat{\Delta }}_{MiPS}^{ANN}\) mainly comes from: (1) the estimates \({\widehat{\boldsymbol{\alpha }}}_{1}^{1}\),…, \({\widehat{\boldsymbol{\alpha }}}_{1}^{K}\) of the multiple PS models and the estimates \({\widehat{{\varvec{\beta}}}}_{1}^{1}\),…, \({\widehat{{\varvec{\beta}}}}_{1}^{L}\) of the multiple OR models, and (2) the nonparametric function \({\widehat{\pi }}^{ANN}\left(\bullet \right)\) estimated by the ANN. For the first source of variation, if the parameters are estimated by maximum likelihood, the asymptotic normality of the estimators was established by White [46]. For the second, the error bound and convergence rate have been discussed in theoretical research [29, 47]. Establishing and proving the theoretical properties of the \({\widehat{\Delta }}_{MiPS}^{ANN}\) estimator rigorously and systematically will be a topic of our future research.
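Because a closed-form variance is not yet available, standard errors are obtained by the nonparametric bootstrap in practice. The resampling step can be sketched as follows; the ATE estimator here is a deliberately simplified stand-in with an assumed known PS form, whereas a real application would refit the entire MiPS pipeline (all candidate models plus the ANN) on every resample.

```python
import numpy as np

rng = np.random.default_rng(3)

def ipw_ate(X, A, Y):
    """Stand-in ATE estimator with an assumed (known) PS form; in practice
    the full MiPS pipeline is refitted inside this function."""
    ps = np.clip(1 / (1 + np.exp(-X[:, 0])), 1e-3, 1 - 1e-3)
    return np.mean(A * Y / ps) - np.mean((1 - A) * Y / (1 - ps))

def bootstrap_se(X, A, Y, estimator, B=200):
    """Nonparametric bootstrap: resample subjects with replacement B times,
    re-estimate the ATE on each resample, and take the standard deviation."""
    n = len(A)
    stats = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        stats.append(estimator(X[idx], A[idx], Y[idx]))
    return float(np.std(stats, ddof=1))

# Toy data (hypothetical DGP, for illustration only).
n = 1000
X = rng.normal(size=(n, 1))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = A + X[:, 0] + rng.normal(size=n)

se = bootstrap_se(X, A, Y, ipw_ate)
print(round(se, 3))
```

Resampling whole subjects (rows) keeps the dependence between covariates, treatment, and outcome intact, so the spread of the replicated estimates reflects all the sources of variation discussed above.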
Conclusions
In this study, we proposed the ANN.MiPS estimator to correct confounding bias when using observational data to estimate the ATE. The proposed estimator allows multiple candidate models for PS and OR and guarantees that the estimated integrated PS lies between 0 and 1. The multiple robustness property of our estimator was illustrated through simulation studies, and extra efficiency was gained compared to the kernel function-based estimator. The proposed estimator provides a new choice for multiply robust estimation of the ATE in observational studies.
Availability of data and materials
The simulated datasets can be generated using the example code in the attachment. The real-world data used can be accessed from https://wwwn.cdc.gov/nchs/nhanes/nhefs/default.aspx/.
Abbreviations
ATE: Average treatment effect
IPW: Inverse probability weighting
PS: Propensity score
OR: Outcome regression
AIPW: Augmented inverse probability weighting
TMLE: Targeted maximum likelihood estimator
DiPS: Double-index propensity score
Ker.DiPS: Kernel function-based double-index propensity score
MiPS: Multi-index propensity score
ANN: Artificial neural network
ANN.MiPS: Artificial neural network-based multi-index propensity score
Ker.MiPS: Kernel function-based multi-index propensity score
RMSE: Root mean square error
MCSE: Monte Carlo standard error
BSSE: Bootstrap standard error
95CICov: 95% confidence interval coverage rate
NHEFS: National Health and Nutrition Examination Survey I Epidemiologic Follow-up Study
References
Kovesdy CP, Kalantar-Zadeh K. Observational studies versus randomized controlled trials: avenues to causal inference in nephrology. Adv Chronic Kidney Dis. 2012;19(1):11–8.
Imbens GW, Rubin DB. Causal inference in statistics, social, and biomedical sciences. New York: Cambridge University Press; 2015.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
Wooldridge JM. Inverse probability weighted M-estimators for sample selection, attrition, and stratification. Port Econ J. 2002;1(2):117–39.
Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med. 2004;23(19):2937–60.
Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020.
Joffe MM, Ten Have TR, Feldman HI, Kimmel SE. Model selection, confounder control, and marginal structural models: review and new applications. Am Stat. 2004;58(4):272–9.
Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med. 2010;29(3):337–46.
Keller B, Kim JS, Steiner PM. Neural networks for propensity score estimation: simulation results and recommendations. In: Quantitative psychology research. Wisconsin: Springer; 2015. p. 279–91.
Collier ZK, Leite WL, Zhang H. Estimating propensity scores using neural networks and traditional methods: a comparative simulation study. Commun Stat-Simul Comput. 2021:1–16.
Collier ZK, Zhang H, Liu L. Explained: Artificial intelligence for propensity score estimation in multilevel educational settings. Pract Assess Res Eval. 2022;27(1):3.
Setoguchi S, Schneeweiss S, Brookhart MA, Glynn RJ, Cook EF. Evaluating uses of data mining techniques in propensity score estimation: a simulation study. Pharmacoepidemiol Drug Saf. 2008;17(6):546–55.
Elwert F, Winship C. Effect heterogeneity and bias in main-effects-only regression models. In: Heuristics, probability and causality: a tribute to Judea Pearl; 2010. p. 327–36.
Vansteelandt S, Goetghebeur E. Causal inference with generalized structural mean models. J Roy Stat Soc Ser B (Stat Method). 2003;65(4):817–35.
Lu M, Sadiq S, Feaster DJ, Ishwaran H. Estimating individual treatment effect in observational data using random forest methods. J Comput Graph Stat. 2018;27(1):209–19.
Chen X, Liu Y, Ma S, Zhang Z. Efficient estimation of general treatment effects using neural networks with a diverging number of confounders. 2020. arXiv preprint arXiv:2009.07055.
Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Amer Statistical Assoc. 1994;89(427):846–66.
Van Der Laan MJ, Rubin D. Targeted maximum likelihood learning. Int J Biostat. 2006;2(1):1–38.
Cheng D, Chakrabortty A, Ananthakrishnan AN, Cai T. Estimating average treatment effects with a double-index propensity score. Biometrics. 2020;76(3):767–77.
Han P, Wang L. Estimation with missing data: beyond double robustness. Biometrika. 2013;100(2):417–30.
Han P. Multiply robust estimation in regression analysis with missing data. J Amer Statistical Assoc. 2014;109(507):1159–73.
Bellman RE. Curse of dimensionality. Adaptive control processes: a guided tour. New Jersey: Princeton University Press; 1961.
Donoho DL. High-dimensional data analysis: the curses and blessings of dimensionality. AMS Math Challenges Lecture. 2000;2000(1):32.
Rodríguez G. Smoothing and nonparametric regression. New Jersey: Princeton University; 2001.
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. 2016. arXiv preprint arXiv:1603.04467.
Kingma DP, Ba J. Adam: A method for stochastic optimization. 2014. arXiv preprint arXiv:1412.6980.
Mitchell TM. Machine learning, vol. 1. New York: McGraw-Hill; 1997.
Bzdok D, Krzywinski M, Altman N. Machine learning: a primer. Nat Methods. 2017;14(12):1119.
Bauer B, Kohler M. On deep learning as a remedy for the curse of dimensionality in nonparametric regression. Ann Stat. 2019;47(4):2261–85.
Chen X, White H. Improved rates and asymptotic normality for nonparametric neural network estimators. IEEE Trans Inf Theory. 1999;45(2):682–91.
White H, Gallant AR. Artificial Neural Networks: Approximation and Learning Theory. Oxford: Blackwell; 1992.
Hornik K, Stinchcombe M, White H, Auer P. Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Comput. 1994;6(6):1262–75.
Yarotsky D. Optimal approximation of continuous functions by very deep ReLU networks. In: 2018: Stockholm: PMLR: 639–649.
Conn D, Li G. An oracle property of the Nadaraya-Watson kernel estimator for high-dimensional nonparametric regression. Scand J Stat. 2019;46(3):735–64.
Hart PE, Stork DG, Duda RO. Pattern classification. New Jersey: Wiley Hoboken; 2000.
Hecht-Nielsen R. Theory of the backpropagation neural network. In: Neural networks for perception. California: Academic Press; 1992. p. 65–93.
Limas MC, Meré JBO, Marcos AG, Ascacíbar FJMdP, Espinoza AVP, Elias F, Ramos JMP. AMORE: A MORE flexible neural network package; 2014.
Kyurkchiev N, Markov S. Sigmoid functions: some approximation and modelling aspects. Saarbrucken: LAP LAMBERT Academic Publishing; 2015. p. 4.
R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2013.
Chan KCG. A simple multiply robust estimator for missing response problem. Stat. 2013;2(1):143–9.
Li W, Gu Y, Liu L. Demystifying a class of multiply robust estimators. Biometrika. 2020;107(4):919–33.
Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, Robins J. Double/debiased machine learning for treatment and structural parameters. Econom J. 2018;21(1):C1–68.
Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82(4):669–88.
VanderWeele TJ. Principles of confounder selection. Eur J Epidemiol. 2019;34(3):211–9.
Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensityscore matched samples. Stat Med. 2009;28(25):3083–107.
White H. Maximum likelihood estimation of misspecified models. Econometrica. 1982;50(1):1–25.
SchmidtHieber J. Nonparametric regression using deep neural networks with ReLU activation function. Ann Stat. 2020;48(4):1875–97.
Acknowledgements
None.
Funding
This work was funded by National Natural Science Foundation of China (No. 82173612, No. 82273730), Shanghai Rising-Star Program (21QA1401300), Shanghai Municipal Natural Science Foundation (22ZR1414900), Shanghai Special Program: Clinical Multidisciplinary Treatment System and Systems Epidemiology Research, and Shanghai Municipal Science and Technology Major Project (ZD2021CY001). The sponsors had no role in study design, data collection, data analysis, data interpretation, or writing of this report.
Author information
Contributions
GYQ and YFY designed the study. JQX and KCW wrote the manuscript. JQX performed simulations and analyzed the realworld data. CW, CH, YXX, and RZ revised the manuscript. All authors have provided critical comments on the draft, and read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Ethics approval was not applicable for the simulated datasets, as they did not involve any human data. For the real data, ethics approval was not required since the data are publicly available.
Consent for publication
Not applicable.
Competing interests
The authors declared no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1:
Fig. S1. The distribution of the estimated average treatment effect for the kernel-based MiPS estimator and the artificial neural network-based MiPS estimator in 1000 simulated data sets. The range of the y-axis is restricted from 1.4 to 0.6 given that the kernel-based MiPS estimator yields highly biased estimation under some model specifications. The dashed line denotes the true average treatment effect. Table S1. Estimation results for the scenario with both continuous and discrete covariates under 50% treated, based on 1000 replications. Table S2. Estimation results of the multi-index propensity score estimator incorporating extra incorrect models under the scenario with both continuous and discrete covariates. Table S3. Sensitivity analysis of the ANN.MiPS estimator with different tuning parameter selections for the ANN under the scenario of all continuous covariates and 50% treated.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Xu, J., Wei, K., Wang, C. et al. Estimation of average treatment effect based on a multi-index propensity score. BMC Med Res Methodol 22, 337 (2022). https://doi.org/10.1186/s12874-022-01822-3
DOI: https://doi.org/10.1186/s12874-022-01822-3