
Propensity score analysis with missing data using a multi-task neural network



Background

Propensity score analysis is increasingly used to control for confounding factors in observational studies. Unfortunately, unavoidable missing values make estimating propensity scores extremely challenging. We propose a new method for estimating propensity scores in data with missing values.

Materials and methods

Both simulated and real-world datasets are used in our experiments. The simulated datasets were constructed under 2 scenarios: the presence (effect = 1) and the absence (effect = 0) of a true effect. The real-world dataset comes from LaLonde’s employment training program. We construct missing data with varying missing rates under three missing mechanisms: MAR, MCAR, and MNAR. We then compare the proposed multi-task neural network (MTNN) with 2 traditional methods in each scenario. The experiments in each scenario were repeated 20,000 times. Our code is publicly available at


Results

Under the three missing mechanisms of MAR, MCAR and MNAR, the RMSE between the effect estimated by our proposed method and the true effect is the smallest in both simulations and real-world data. Furthermore, the standard deviation of the effect estimated by our method is the smallest. The lower the missing rate, the more accurate our method’s estimates.


Conclusions

MTNN can perform propensity score estimation and missing value imputation simultaneously through shared hidden layers and joint learning, which resolves the dilemma faced by traditional methods and makes it well suited to estimating true effects in samples with missing values. We expect the method to generalize broadly to real-world observational studies.



Background

In observational studies, propensity scores are increasingly used to control for confounding [1, 2]. When the observed baseline characteristics are sufficient to correct for confounding bias and the propensity model is correctly specified, subjects with the same propensity score are conditionally exchangeable [3, 4]. Observational studies almost inevitably contain covariates with missing values, and estimating the propensity score in their presence remains a challenge for causal inference [5,6,7,8]. Common approaches to dealing with missing values in propensity analysis include complete-case analysis, adding missing indicator variables to the propensity model, and multiple imputation [9,10,11]. Unfortunately, these methods are inherently flawed; the missing indicator method, for example, introduces new biases [12]. Some studies use machine learning methods in place of traditional logistic regression [13,14,15,16,17], but they do not address the propensity score misestimation caused by overfitting. In contrast to hand-crafted models [18], neural networks can automatically learn interactions between variables. A multi-task neural network is a network structure with multiple outputs and has been widely used in the medical field. With a multi-task neural network, propensity score estimation and missing value imputation can be performed jointly: optimizing a global objective function prevents overfitting to the propensity score task while effectively solving the missing value estimation problem [19]. This study develops a new pipeline, based on a multi-task neural network, for calculating propensity scores in samples with missing values. To evaluate the accuracy of our model in estimating the true effect, we conduct experiments on simulated and real-world data and compare our method with traditional methods.

Data and methods

Propensity score

In a study, individual subjects may have multiple covariates. Propensity scoring is a way of simplifying multiple covariates [20]. It condenses them into a single variable (the propensity score), defined as the conditional probability of being assigned to the experimental group given the covariates [21]. A propensity score can be viewed as a function of the original covariates, so it retains information about them. Rosenbaum and Rubin demonstrated that the propensity score e(X) can be used to balance the distribution of covariates between experimental and control groups when the covariates X satisfy the strong ignorability assumption [3].

$$e\left({X}_i\right)=\Pr \left({T}_i=1\left|{X}_i\right.\right)$$

Propensity score estimation

In complete data, logistic regression is the most commonly used method for estimating propensity scores when the treatment or exposure is binary [22]. The propensity score is obtained by regressing the treatment or exposure indicator on the covariates (i.e., potential confounders), which can be written as:

$$\textrm{logit}\left({p}_i\left(T=1\right)\right)={X}_i^{\prime}\beta, i=1,2,\dots, n$$

where X = (1, X1, X2, …, XK), β = (β0, β1, β2, …, βK), K is the number of covariates, and n is the number of observations. An individual’s propensity score can then be estimated as

$${p}_i=\frac{{\textrm{e}}^{X_i^{\prime}\beta }}{1+{\textrm{e}}^{X_i^{\prime}\beta }}$$
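The logistic propensity score formula above can be sketched numerically as follows; the coefficient values here are hypothetical, chosen only to illustrate the computation, not fitted to any data in the paper.

```python
import numpy as np

def propensity_scores(X, beta):
    """Logistic propensity scores p_i = exp(X_i'b) / (1 + exp(X_i'b)).

    X: (n, K) covariate matrix; an intercept column is prepended here.
    beta: length K+1 coefficient vector (b0, b1, ..., bK).
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend the intercept term
    logits = X1 @ beta                          # X_i' * beta for each subject
    return 1.0 / (1.0 + np.exp(-logits))        # inverse-logit

# Hypothetical coefficients for two covariates, for illustration only.
beta = np.array([-1.0, 0.5, 0.5])
X = np.array([[0.0, 0.0], [1.0, 1.0]])
ps = propensity_scores(X, beta)
```

In practice β would be fitted by maximum likelihood (e.g. a standard logistic regression routine); the function above only maps fitted coefficients to scores.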

In many situations, however, logistic regression may not be the best choice for estimating propensity scores. Using logistic regression to estimate exposure probabilities assumes that the log odds of exposure are linear in the covariates. This assumption is not always true: logistic regression cannot estimate propensity scores accurately when covariates interact with each other or when the relationship between covariates and treatment is nonlinear. To address this inherent limitation, some studies substitute machine learning algorithms for logistic regression, including decision trees, random forests, naive Bayes, and support vector machines [13,14,15, 23, 24]. These methods are claimed to provide more accurate propensity score estimates, but the claims have not been validated by systematic simulation studies.

Missing data

In realistic observational studies, individual covariates may have large amounts of missing data, which may lead both to loss of efficiency and to biased estimates. The magnitude of the bias varies with the degree to which the confounding factors are related to outcome and exposure.

Type of missing data

There are three types of missing data, depending on the missing mechanism: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) [25, 26]. Under MCAR, every observation has the same probability of being missing, so the missing data form a random subset of the study population. Despite its name, MAR is less intuitive: it occurs when the probability of missingness depends only on observed information. Data are MNAR when the probability of missingness depends on unobserved data, such as the value of the observation itself.

Methods for handling missing values

Complete case analysis is the easiest way to deal with incomplete confounder data: the analysis is restricted to cases in which all variables are complete. If the missingness of covariates is independent of treatment and outcome, this approach provides unbiased estimates of group effects. Another simple method is the missing indicator method [27]. Before a partially observed categorical confounder enters the propensity score model, a “missing” category is added; continuous confounders are set to a specific value, such as 0, and both the confounder and a missingness indicator (a variable indicating whether the value is observed) are included in the propensity score model. In many cases, this approach leads to biased results. Missing pattern analysis is a generalization of the missing indicator method: individuals are grouped according to their missing patterns, and propensity scores are estimated within each group separately. In practice, this method fails when the number of participants sharing a missing pattern is lower than the number of observed covariates, which typically happens when the data contain many missing patterns. Multiple imputation uses chained equations to impute incomplete data: the missing covariates are repeatedly imputed with plausible values drawn from their predicted distribution given the observed data, creating several complete datasets [28, 29]. We used MICE (version 3.3.0) in R (version 3.6.3) to perform multiple imputation, with Bayesian linear regression as the imputation model, which is commonly used when covariates and outcomes are continuous. Other parameters were set to their defaults.
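The missing indicator method described above can be sketched for a single continuous confounder; the fill value 0 follows the text, and the function name is ours.

```python
import numpy as np

def add_missing_indicator(x, fill_value=0.0):
    """Missing-indicator method for one partially observed continuous
    confounder: replace NaNs with a fixed value and return an indicator
    of missingness. Both columns would then enter the propensity score
    model alongside the other covariates."""
    indicator = np.isnan(x).astype(float)            # 1 where the value is missing
    filled = np.where(np.isnan(x), fill_value, x)    # confounder with NaNs filled
    return filled, indicator

x = np.array([1.2, np.nan, 3.4, np.nan])
filled, indicator = add_missing_indicator(x)
```

As the text notes, this simplicity comes at a price: the filled-in constant and the indicator can distort the fitted propensity model and bias the resulting estimates.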

Inverse probability weighting

Inverse probability weighting (IPW) uses the inverse of the propensity score as a weight to create a synthetic sample in which the baseline covariate distribution is independent of treatment assignment [30]. In this study, we use IPW to estimate the true effect. Unlike propensity score matching, IPW uses all individuals in both groups, avoiding sample waste and maintaining high statistical power to detect effects. However, IPW is more sensitive to erroneous propensity score estimates. This limitation emphasizes the importance of careful model selection before applying propensity score weighting; multi-task neural networks can overcome it.
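A minimal sketch of the IPW estimate: treated subjects are weighted by 1/ps, controls by 1/(1 − ps), and the effect is the difference of weighted outcome means. The toy data are ours, for illustration only.

```python
import numpy as np

def ipw_effect(y, t, ps):
    """Inverse probability weighting: weight treated subjects by 1/ps
    and controls by 1/(1 - ps), then estimate the effect as the
    difference of weighted outcome means."""
    w = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    treated = np.average(y[t == 1], weights=w[t == 1])
    control = np.average(y[t == 0], weights=w[t == 0])
    return treated - control

# Toy data with constant propensity 0.5: IPW reduces to a plain
# difference in group means.
y = np.array([3.0, 4.0, 1.0, 2.0])
t = np.array([1, 1, 0, 0])
ps = np.full(4, 0.5)
effect = ipw_effect(y, t, ps)
```

The weights blow up when ps approaches 0 or 1, which is exactly the sensitivity to misestimated propensity scores the text warns about.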

Multi-task neural network

Neural networks are excellent function approximators that can estimate both linear and nonlinear functions. They use data samples with known outcomes for supervised training, building a nonlinear model that predicts the output from the input. Figure 1(a) shows three independent neural networks. All networks have the same inputs and outputs, and each is trained separately by back-propagation. There is no connection between the three nets, so what one learns cannot help the others; this is known as single-task learning (STL). Figure 1(b) shows a single net with the same inputs but three outputs, one per learning task. All three outputs are connected to the same hidden layer and are trained by back-propagation in parallel. Because they share a hidden layer, the internal representation learned for one task is available to the others. The core idea of multi-task learning (MTL) is to share knowledge across tasks and train them simultaneously.

Fig. 1

Structure diagram of multi-task neural network

In this study, we propose a novel pipeline using a multi-task neural network (MTNN) to estimate propensity scores. Our task set has three parts: reconstructing the input covariates, estimating propensity scores, and predicting missing patterns. These tasks are closely related. The structure of MTNN is shown in Fig. 1. To achieve joint optimality across all tasks, the MTNN must correctly learn the relationships among covariates, between covariates and missingness, and between covariates and exposure. Through joint learning and shared hidden layers, MTNN reduces overfitting when estimating propensity scores. The detailed calculation procedure and more information about MTNN training can be found in Supplementary S1. Our tutorial and source code for MTNN are also available on GitHub so readers can apply our method to real problems and gain a deeper understanding of it. The models for missing value imputation and propensity score estimation are selected based on the convergence of the objective function; in all experiments in this study, we chose the model from the last epoch after convergence.
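The shared-layer structure can be sketched with a minimal numpy forward pass. The layer sizes, activations, and equal task weighting below are our illustrative assumptions; the authors' actual architecture and training procedure are in Supplementary S1 and their repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MTNNSketch:
    """Minimal forward pass of a multi-task net: one shared hidden layer
    feeding three heads (covariate reconstruction, propensity score,
    missing-pattern prediction). Sizes are illustrative."""

    def __init__(self, n_cov, n_hidden=8):
        self.W_shared = rng.normal(scale=0.1, size=(n_cov, n_hidden))
        self.W_recon = rng.normal(scale=0.1, size=(n_hidden, n_cov))
        self.W_ps = rng.normal(scale=0.1, size=(n_hidden, 1))
        self.W_miss = rng.normal(scale=0.1, size=(n_hidden, n_cov))

    def forward(self, X):
        h = relu(X @ self.W_shared)          # shared representation
        recon = h @ self.W_recon             # task 1: reconstruct covariates
        ps = sigmoid(h @ self.W_ps).ravel()  # task 2: propensity score
        miss = sigmoid(h @ self.W_miss)      # task 3: missing-pattern probs
        return recon, ps, miss

def joint_loss(X, M, T, recon, ps, miss, eps=1e-8):
    """Sum of per-task losses optimized jointly (equal weights assumed):
    squared reconstruction error plus cross-entropy for the two
    classification heads."""
    l_recon = np.mean((recon - X) ** 2)
    l_ps = -np.mean(T * np.log(ps + eps) + (1 - T) * np.log(1 - ps + eps))
    l_miss = -np.mean(M * np.log(miss + eps) + (1 - M) * np.log(1 - miss + eps))
    return l_recon + l_ps + l_miss

net = MTNNSketch(n_cov=2)
X = rng.normal(size=(5, 2))
recon, ps, miss = net.forward(X)
```

Because all three heads read the same hidden layer, gradients from the reconstruction and missing-pattern losses regularize the representation used for the propensity score, which is the overfitting-reduction mechanism the text describes.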


Simulation data

We adopted a data-generation process similar to that of Choi et al. [7]. Two scenarios were considered: one in which the outcome was treatment-related (effect ≠ 0) and one in which it was treatment-independent (effect = 0). In each scenario, we considered three missing mechanisms. First, we generated 2 continuous covariates, X1 and X2, for each subject. X1 follows a normal distribution with mean 0 and standard deviation 1, and X2 depends on X1:

$${X}_{2i}=0.5{X}_{1i}+{\varepsilon}_i\ \textrm{with}\ {\varepsilon}_i\sim N\left(\textrm{0,0.75}\right)$$

In this way, the standard deviation of X2 is also 1, and the correlation between X1 and X2 equals 0.5. The treatment T was generated from a binomial distribution, with the probability of subject i receiving treatment equal to:


By this equation, about 30% of subjects were treated.

We constructed 2 scenarios:

Scenario 1: the outcome is affected by treatment. We assume, without loss of generality, that treatment has an effect of 1 on the subject’s outcome.

$${Y}_i={X}_{1i}+{X}_{2i}+ Trea{t}_i+{\varepsilon}_i,\textrm{with}\ {\varepsilon}_i\sim N\left(0,1\right)$$

Scenario 2: the outcome is unrelated to the treatment.

$${Y}_i={X}_{1i}+{X}_{2i}+{\varepsilon}_i,\textrm{with}\ {\varepsilon}_i\sim N\left(0,1\right)$$

To test the effect of different missing rates on effect estimation in the simulated datasets, we preset 7 missing rates: 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8. Missing values in X2 were generated under three mechanisms:

  1. MCAR: In X2, a random proportion of observations is set to be missing.

  2. MAR: The higher the value of X1, the more likely the value of X2 is missing. Taking M as the missing indicator of X2, the probability that an X2 value is missing is:

  3. MNAR: The higher the value of X2, the more likely that value is missing. The probability that an X2 value is missing is:


C is a constant used to control the missing rate; for example, to obtain a missing rate of around 50%, C can be set to 0.
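The simulation above can be sketched as follows. The treatment-assignment and missingness-probability equations did not survive extraction here, so the logistic coefficients and the rank-based missingness rule below are our illustrative assumptions, not the authors' exact specification; N(0, 0.75) is read as variance 0.75, consistent with sd(X2) = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Covariates: X1 ~ N(0, 1); X2 = 0.5*X1 + eps with Var(eps) = 0.75,
# so Var(X2) = 0.25 + 0.75 = 1 and corr(X1, X2) = 0.5.
X1 = rng.normal(0.0, 1.0, n)
X2 = 0.5 * X1 + rng.normal(0.0, np.sqrt(0.75), n)

# Assumed logistic treatment model, intercept chosen so roughly 30%
# of subjects are treated (hypothetical coefficients).
pT = 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * X1 + 0.5 * X2)))
T = rng.binomial(1, pT)

# Scenario 1: treatment effect of 1. Scenario 2 would drop the T term.
Y = X1 + X2 + T + rng.normal(0.0, 1.0, n)

def make_missing(x2, x1, mechanism, rate, rng):
    """Set entries of x2 to NaN under MCAR / MAR / MNAR. For MAR/MNAR
    we use an illustrative rank-based rule: the `rate` fraction with
    the largest driver values is made missing."""
    x2 = x2.copy()
    k = int(round(rate * len(x2)))
    if mechanism == "MCAR":
        idx = rng.choice(len(x2), size=k, replace=False)
    elif mechanism == "MAR":    # higher X1 -> X2 more likely missing
        idx = np.argsort(x1)[-k:]
    elif mechanism == "MNAR":   # higher X2 itself -> more likely missing
        idx = np.argsort(x2)[-k:]
    else:
        raise ValueError(mechanism)
    x2[idx] = np.nan
    return x2

x2_mar = make_missing(X2, X1, "MAR", 0.3, rng)
```

A smooth probability model (e.g. a sigmoid in X1 with offset C) would match the text's description more closely; the rank rule is simply the easiest way to hit an exact missing rate.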

Real-world data

The real-world data come from a subset of the treated group in the National Supported Work Demonstration (NSWD) and a comparison sample from the Panel Study of Income Dynamics (PSID). The dataset has been used by many researchers to test the effects of different propensity score analysis methods [31, 32]. It contains 614 samples (185 treated and 429 controls), each with 9 variables; Table S1 provides more details. Treat is the intervention variable, re78 is the outcome, and the other 7 variables are covariates. Table S2 summarizes the distribution of covariates between treatment groups and shows that the distributions of age, race, married, nondegree, re74, and re75 differ between groups. Therefore, we need to correct the effect estimates with propensity scores.

Our experiments used the inverse probability-weighted effect size, with propensity scores calculated from the complete data, as the reference. Simulations were then performed to estimate the true effect under the three missing mechanisms. We introduced missing values into both re74 and re75; within each variable, missing values were constructed randomly. As in the simulated datasets, we used 7 missing rate settings: 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8.

  1. MCAR: In both re74 and re75, a randomly selected given proportion of observations is set to be missing.

  2. MAR: The missing rate is assumed to be proportional to a linear combination of age and education. These 2 variables were chosen arbitrarily without loss of generality, as the covariates are correlated (Table S8). To facilitate setting the missing probability, we normalize age and years of education to mean 0. Let M1 and M2 denote the missing indicators of re74 and re75, respectively; their missing probabilities are:

  3. MNAR: The higher the value of a variable, the more likely that value is missing. As with age and years of education, we normalize re74 and re75. The probability that re74/re75 is missing is:


Estimation of the true effect

The first step is to deal with missing values in the samples. Because MTNN computes propensity scores and imputed values simultaneously, it requires no separate missing value processing. When propensity scores were estimated by logistic regression, the multiple imputation and missing indicator methods were used to handle missing values. We estimate propensity scores using age, education, race, marital status, nondegree, re74, and re75 as covariates. These 7 covariates are also included in the regression analysis used to estimate the effect. Lastly, we estimated the effect using inverse probability-weighted regression on the propensity score, in which treated subjects were weighted by 1/(propensity score) and untreated subjects by 1/(1 − propensity score). Figure 2 shows the workflow of the three estimation methods.

Fig. 2

Flowchart of the three methods for estimating effect. a the missing indicator method; b the multiple imputation method; c the multi-task neural network method. MTNN, multi-task neural network


In the experiments with simulated data there are 2 kinds of effects and three missing mechanisms, i.e., 6 data-generating scenarios, and 3 methods for handling missing values. In the experiments with real-world data there are three missing mechanisms, i.e., three scenarios. For each scenario, the process of missing value imputation, propensity score calculation, and effect estimation was repeated 20,000 times before evaluating the results of the different methods. Comparisons are based on standard deviations (SD) and root mean square errors (RMSE), defined as:

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^n\kern0.1em {\left({\hat{\beta}}_i-\beta \right)}^2}$$

where \(\hat{\beta}_i\) is the estimate from the i-th repetition and β is the true value.
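The RMSE formula above is straightforward to compute over the repeated estimates; this sketch uses three toy estimates of a true effect of 1.

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean square error between repeated effect estimates
    and the true effect: sqrt(mean((beta_hat_i - beta)^2))."""
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean((estimates - truth) ** 2))

# Toy example: three repetitions estimating a true effect of 1.
val = rmse([1.1, 0.9, 1.0], 1.0)
```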


Analysis results on simulation datasets

Figure 3 shows the RMSE of the true effect estimates under the 2 true effect scenarios and three missing mechanisms. MTNN achieves the smallest RMSE in all 6 data scenarios, making it the best of the three methods. In addition, regardless of the method used, the higher the missing rate, the higher the RMSE; when the missing rate increased from 0.2 to 0.8, the RMSE of each estimation method nearly doubled. Table 1, Table S3 and Table S4 present more detailed estimation results for the three methods. In all data scenarios, MTNN is not only optimal in the deviation of the estimated true effect but also has the smallest standard deviation of its estimates. This shows that MTNN provides the most accurate estimation and is more stable than the other methods.

Fig. 3

Root mean square error of the true effect estimated by different methods under three missing mechanisms in the simulation dataset. (a), (d) are under MCAR, (b), (e) are under MAR, (c), (f) are under MNAR. For a, b and c, the true effect is 0; for d, e and f, the true effect is 1. MCAR, missing completely at random; MAR, missing at random; MNAR, missing not at random

Table 1 Estimation of the true effect in the simulated datasets using three different methods under the MCAR mechanism

Analysis results on real-world datasets

We first calculated propensity scores by logistic regression from the complete data, then used inverse probability-weighted regression to obtain an effect of 712.743 (Table S7). Since the true effect in real-world data is unknowable, we use this value as the reference standard for comparing the performance of the different methods.

Figure 4 compares the RMSE of the different methods under the three missing mechanisms. Consistent with the simulation results, MTNN exhibited the smallest RMSE under all missing mechanisms and missing rates. One difference is that in the real-world dataset the missing rate has less influence on the RMSE of the estimates. Table 2, Table S5 and Table S6 provide further details of the estimation results. The standard deviation of the MTNN estimates is clearly lower than that of the 2 other methods. Figures 5, 6 and 7 show the between-group standardized mean differences (SMD) of each covariate after adjustment by the propensity scores estimated by the three methods under the three missing mechanisms.

Fig. 4

RMSE of the true effect estimated by different methods under three missing mechanisms in the real-world dataset. a MCAR, b MAR, c MNAR

Table 2 Estimation of the true effect in the real-world datasets using three different methods under the MCAR mechanism
Fig. 5

Between-group standardized mean differences under MCAR for covariates adjusted for propensity scores calculated by three different methods

Fig. 6

Between-group standardized mean differences under MAR for covariates adjusted for propensity scores calculated by three different methods

Fig. 7

Between-group standardized mean differences under MNAR for covariates adjusted for propensity scores calculated by three different methods
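The between-group SMD reported in Figs. 5, 6 and 7 is typically the mean difference divided by the pooled standard deviation; this unweighted sketch illustrates the formula (the figures themselves would use the IPW-weighted analogue, with weighted means and variances).

```python
import numpy as np

def smd(x_treated, x_control):
    """Between-group standardized mean difference for one covariate:
    (mean_treated - mean_control) / pooled standard deviation."""
    m1, m0 = np.mean(x_treated), np.mean(x_control)
    v1, v0 = np.var(x_treated, ddof=1), np.var(x_control, ddof=1)
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)

# Toy covariate values in the two groups.
x_t = np.array([2.0, 3.0, 4.0])
x_c = np.array([1.0, 2.0, 3.0])
d = smd(x_t, x_c)
```

An |SMD| below about 0.1 is a common rule of thumb for adequate covariate balance after weighting.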


Discussion

In this study, we develop a novel method, based on multi-task neural networks, that calculates propensity scores directly for samples with missing values. On simulated and real-world datasets, we compare the proposed method with two commonly used ones. Under all three missing mechanisms, the RMSE of our proposed method for estimating the true effect is the smallest. In addition, the standard deviation of the true effect estimated by MTNN is the smallest, indicating that it is more robust than the other two methods. Previous studies have demonstrated smaller RMSEs for machine learning algorithms [33,34,35,36]; our study confirms these findings in scenarios with missing values. We also found that at lower missing rates, the RMSE of the missing indicator method is better than that of multiple imputation under all 3 missing mechanisms, consistent with a previous study [7].

Recent studies have used autoencoders to reduce the dimension of high-dimensional features and then calculated propensity scores from the reduced features [17], leveraging the ability of neural networks to handle high-dimensional data. However, they did not treat reconstruction and propensity score computation as joint tasks. Overfitting in propensity score estimation pushes propensity scores close to zero or one, resulting in biased effect estimates. We instead train the model with input reconstruction, missing pattern prediction, and propensity score estimation as joint tasks to prevent such overfitting.

As the number of variables in observational studies grows, the relationships between variables become more complex and missingness becomes harder to avoid. It also becomes increasingly difficult to manually specify propensity models for high-dimensional variables. Neural networks can model complex relationships, so there is no need to manually specify the so-called correct model; the network learns adaptively from the data. Multiple imputation is expensive for large datasets, whereas the computational cost of the MTNN model is smaller. Furthermore, compared with multiple imputation [37], MTNN requires no prior assumptions about the distribution of the data: it automatically learns the correlations between variables and thus imputes their missing values.

In practice, a missing rate greater than 30% is generally considered too high for reliable inference, but we wanted to thoroughly test the MTNN model’s stability and performance under different missing rate scenarios, so we deliberately included relatively high missing rates. We found that even when the missing rate is high, MTNN still performs well, showing that the correlations between variables can be captured and exploited very effectively. Although increasing missing rates reduce the performance of the MTNN model, it still outperforms the other methods.


Our study also has some limitations. First, there is a slight difference in the performance of the MTNN model between simulated and real data. The reason is that in real-world data the relationships between variables are more complex, and these unknowable connections are difficult to simulate manually; because our experiments simulate only the simplest possible case, the results differ slightly between the 2 types of data. Second, we cannot know the true effect in real-world data. Our model aims to estimate model parameters more accurately when missing values are present, so we use a complete-data modeling process as the evaluation standard: our goal is to show that the proposed method can estimate, from data with missing values, a parameter value as close as possible to the one estimated without missing values. Therefore, in real-world data the “true effect” should be read as “the effect estimated from complete data”. Third, MTNN assumes that the input variables are correlated; through joint learning and the shared hidden layer, this correlation is used to estimate propensity scores and fill in missing values. When the input variables are independent or only weakly correlated, MTNN may be unable to provide accurate estimates.


Conclusions

In this study, we propose a novel method for estimating propensity scores in data with missing values. It is based on a multi-task neural network in which missing value imputation and propensity score estimation are jointly trained as related tasks. Experiments on simulated and real-world data show that our model has the smallest error in estimating the true effect under different missing mechanisms and missing rates, as well as the smallest standard deviation of the effect estimates. This demonstrates the method’s applicability to real-world observational studies with missing values.

Availability of data and materials

The data in this study are available from the corresponding author on reasonable request. Readers interested in the code of the simulation analysis may contact the corresponding author.




  1. Webster‐Clark M, Stürmer T, Wang T, Man K, Marinac‐Dabic D, Rothman KJ, et al. Using propensity scores to estimate effects of treatment initiation decisions: state of the science. Stat Med. 2021;40(7):1718–35.

    Article  PubMed  Google Scholar 

  2. Austin PC, Jembere N, Chiu M. Propensity score matching and complex surveys [J]. Stat Methods Med Res. 2018;27(4):1240–57.

    Article  PubMed  Google Scholar 

  3. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.

    Article  Google Scholar 

  4. Lin J, Gamalo‐Siebers M, Tiwari R. Propensity-score-based priors for Bayesian augmented control design. Pharm Stat. 2019;18(2):223–38.

    Article  PubMed  Google Scholar 

  5. Cham H, West SG. Propensity score analysis with missing data. Psychol Methods. 2016;21(3):427.

    Article  PubMed  Google Scholar 

  6. D'Agostino RB Jr, Rubin DB. Estimating and using propensity scores with partially missing data. J Am Stat Assoc. 2000;95(451):749–59.

    Article  Google Scholar 

  7. Choi J, Dekkers OM, le Cessie S. A comparison of different methods to handle missing data in the context of propensity score analysis. Eur J Epidemiol. 2019;34(1):23–36.

    Article  CAS  PubMed  Google Scholar 

  8. Malla L, Perera-Salazar R, McFadden E, Ogero M, Stepniewska K, English M. Handling missing data in propensity score estimation in comparative effectiveness evaluations: a systematic review [J]. Journal of comparative effectiveness research. 2018;7(3):271–9.

    Article  PubMed  Google Scholar 

  9. Shao J, Wang L. Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika. 2016;103(1):175–87.

    Article  Google Scholar 

  10. Qu Y, Lipkovich I. Propensity score estimation with missing values using a multiple imputation missingness pattern (MIMP) approach. Stat Med. 2009;28(9):1402–14.

    Article  PubMed  Google Scholar 

  11. Crowe BJ, Lipkovich IA, Wang O. Comparison of several imputation methods for missing baseline data in propensity scores analysis of binary outcome. Pharm Stat. 2010;9(4):269–79.

    Article  PubMed  Google Scholar 

  12. Mattei A. Estimating and using propensity score in presence of missing background data: an application to assess the impact of childbearing on wellbeing. Statistical Methods and Applications. 2009;18(2):257–73.

    Article  Google Scholar 

  13. Linden A, Yarnold PR. Combining machine learning and propensity score weighting to estimate causal effects in multivalued treatments. J Eval Clin Pract. 2016;22(6):875–85.

    Article  Google Scholar 

  14. Cannas M, Arpino B. A comparison of machine learning algorithms and covariate balance measures for propensity score matching and weighting. Biom J. 2019;61(4):1049–72.

    PubMed  Google Scholar 

  15. Tu C. Comparison of various machine learning algorithms for estimating generalized propensity score. J Stat Comput Simul. 2019;89(4):708–19.

    Article  Google Scholar 

  16. Setoguchi S, Schneeweiss S, Brookhart MA, et al. Evaluating uses of data mining techniques in propensity score estimation: a simulation study [J]. Pharmacoepidemiol Drug Saf. 2008;17(6):546–55.

    Article  PubMed  PubMed Central  Google Scholar 

  17. Weberpals J, Becker T, Davies J, et al. Deep learning-based propensity scores for confounding control in comparative effectiveness research: a large-scale, real-world data study [J]. Epidemiology. 2021;32(3):378–88.

    Article  PubMed  Google Scholar 

  18. Kubat M. Neural networks: a comprehensive foundation by Simon Haykin Macmillan ISBN 0–02–352781-7. The Knowledge Engineering Review. 1999;13(4):409–12.

    Article  Google Scholar 

  19. Caruana R. Multitask learning. Mach Learn. 1997;28(1):41–75.

    Article  Google Scholar 

  20. Guo S, Fraser MW. Propensity score analysis: statistical methods and applications: SAGE publications; 2014.

    Google Scholar 

  21. Stuart EA. Matching methods for causal inference: a review and a look forward. Statistical science: a review journal of the Institute of Mathematical Statistics. 2010;25(1):1.

    Article  PubMed  Google Scholar 

  22. Cepeda MS, Boston R, Farrar JT, et al. Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol. 2003;158(3):280–7.

    Article  PubMed  Google Scholar 

  23. Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning [J]. Stat Med. 2010;29(3):337–46.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Westreich D, Lessler J, Funk MJ. Propensity score estimation: machine learning and classification methods as alternatives to logistic regression. J Clin Epidemiol. 2010;63(8):826.

    Article  PubMed  PubMed Central  Google Scholar 

  25. Santos MS, Pereira RC, Costa AF, et al. Generating synthetic missing data: a review by missing mechanism. IEEE Access. 2019;7:11651–67.

    Article  Google Scholar 

  26. Garciarena U, Santana R. An extensive analysis of the interaction between missing data types, imputation methods, and supervised classifiers. Expert Syst Appl. 2017;89:52–65.

    Article  Google Scholar 

  27. West SG, Cham H, Thoemmes F, et al. Propensity scores as a basis for equating groups: basic principles and application in clinical treatment outcome research. J Consult Clin Psychol. 2014;82(5):906.

  28. Zhang P. Multiple imputation: theory and method. Int Stat Rev. 2003:581–92.

  29. Li P, Stuart EA, Allison DB. Multiple imputation: a flexible tool for handling missing data. JAMA. 2015;314(18):1966–7.

  30. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res. 2011;46(3):399–424.

  31. LaLonde RJ. Evaluating the econometric evaluations of training programs with experimental data. Am Econ Rev. 1986:604–20.

  32. Dehejia RH, Wahba S. Causal effects in nonexperimental studies: reevaluating the evaluation of training programs. J Am Stat Assoc. 1999;94(448):1053–62.

  33. Karim ME, Pang M, Platt RW. Can we train machine learning methods to outperform the high-dimensional propensity score algorithm? Epidemiology. 2018;29(2):191–8.

  34. Wyss R, Schneeweiss S, Van Der Laan M, et al. Using super learner prediction modeling to improve high-dimensional propensity score estimation. Epidemiology. 2018;29(1):96–106.

  35. Ju C, Combs M, Lendle SD, et al. Propensity score prediction for electronic healthcare databases using super learner and high-dimensional propensity score methods. J Appl Stat. 2019;46(12):2216–36.

  36. Choi BY, Wang C-P, Michalek J, et al. Power comparison for propensity score methods. Comput Stat. 2019;34(2):743–61.

  37. Liu X. Methods and applications of longitudinal data analysis. Elsevier; 2015.


Acknowledgements

The authors would like to thank Professor He Daihai for theoretical guidance.


Funding

This work was partially supported by the National Natural Science Foundation of China [grant number 11901352]; the Research Grants Council of the Hong Kong Special Administrative Region, China [HKU C7123-20G]; and the “Coronavirus Disease Special Project” of Xinglin Scholars of Chengdu University of Traditional Chinese Medicine [grant number XGZX2013].

Author information



Contributions

Study conception and design: S Yang, J Luo and X Yan. Collection and creation of data: J Luo, P Du and S Yang. Data analysis and interpretation: S Yang, J Luo, X Yan, X Feng, P Du. Drafting the manuscript and figures: all authors. Final approval of manuscript: all authors.

Corresponding authors

Correspondence to Xiaodong Yan or Jiawei Luo.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors have no conflicts of interest to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

 Table S1 Variable descriptions for the real dataset. Table S2 Summary of the real dataset. Table S3 Estimation of the true effect in the simulated datasets using three different methods under the MAR mechanism. Table S4 Estimation of the true effect in the simulated datasets using three different methods under the MNAR mechanism. Table S5 Estimation of the true effect in the real datasets using three different methods under the MAR mechanism. Table S6 Estimation of the true effect in the real datasets using three different methods under the MNAR mechanism. Table S7 Regression coefficients for real-world data without missing values. Table S8 Spearman's correlation coefficient for each input variable in real-world data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Yang, S., Du, P., Feng, X. et al. Propensity score analysis with missing data using a multi-task neural network. BMC Med Res Methodol 23, 41 (2023).
