Multiply robust estimator for the difference in survival functions using pseudo-observations

Background When estimating the causal effect on survival outcomes in observational studies, it is necessary to adjust confounding factors due to unbalanced covariates between treatment and control groups. There is no study on multiple robust method for estimating the difference in survival functions. In this study, we propose a multiply robust (MR) estimator, allowing multiple propensity score models and outcome regression models, to provide multiple protection. Method Based on the previous MR estimator (Han 2014) and pseudo-observation approach, we proposed a new MR estimator for estimating the difference in survival functions. The proposed MR estimator based on the pseudo-observation approach has several advantages. First, the proposed estimator has a small bias when any PS and OR models were correctly specified. Second, the proposed estimator considers the advantage pf the pseudo-observation approach, which avoids proportional hazards assumption. A Monte Carlo simulation study was performed to evaluate the performance of the proposed estimator. And the proposed estimator was used to estimate the effect of chemotherapy on triple-negative breast cancer (TNBC) in real data. Results The simulation studies showed that the bias of the proposed estimator was small, and the coverage rate was close to 95% when any model for propensity score or outcome regression is correctly specified regardless of whether the proportional hazard assumption holds, finite sample size and censoring rate. And the simulation results also showed that even though the propensity score models are misspecified, the bias of the proposed estimator was still small when there is a correct model in candidate outcome regression models. And we applied the proposed estimator in real data, finding that chemotherapy could improve the prognosis of TNBC. Conclusions The proposed estimator, allowing multiple propensity score and outcome regression models, provides multiple protection for estimating the difference in survival functions. The proposed estimator provided a new choice when researchers have a "difficult time" choosing only one model for their studies. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-023-02065-6.


Background
Inferring the causal effect between specific treatment (exposure or intervention) and the outcome is the primary goal of much-applied research.Survival outcomes are common in epidemiologic studies and require special handling due to being subject to the right censoring during follow-up.In randomized controlled trials, the difference in survival functions based on Kaplan-Meier (KM) estimator between treatment and control groups can be used to assess the causal effect of treatment on survival outcomes [1,2].However, when treatment assignment is not randomized in observational studies, there are some unbalanced covariates between treatment and control groups, which leads to a biased estimation of the difference in survival functions [3].There are two general approaches to eliminating or reducing confounding biases.The first approach is direct confounding adjustment via an outcome regression (OR) model such as Cox model [4].The survival function in the control or treatment group is estimated by averaging the fitted values from the outcome regression model for each group, but the estimation may be biased when the proportional hazards assumption fails.Another approach is combing KM estimator with inverse probability weighting (IPW), where each subject is weighted by the inverse of the probability of receiving the treatment, known as propensity score (PS), thereby balancing the distribution of covariates between control and treatment groups [5,6].IPW method relies on the PS model reflecting the relationship of confounding factors with treatment [7].The two methods above are unbiased only if the OR and PS models are correctly specified, which are usually unknown in practice.
There is growing interest in developing doubly robust (DR) methods for estimating the average causal effect (ACE) of survival outcomes to protect against potential misspecification of OR or PS model [4,[8][9][10].DR methods are based on combing the IPW and direct confounding adjustment methods and provide double protection from model misspecification, as DR methods are valid when either one of the OR or PS models is specified correctly.However, DR methods do not provide sufficient protection for estimation consistency, as they allow only one model for IPW and one for direct confounding adjustment.Consistency of DR methods is still lost when PS and OR models are both misspecified [11,12].
Compared to DR methods, multiply robust approaches allow multiple PS or OR models, increasing the likelihood of including the correct model.And multiply robust approaches are consistent when the candidate models contain correctly specified models [13].Although multiply robust approaches are used for casual inference [14] or estimating a population mean with missing values [11,12], they are proposed in the context of a non-survival outcome.Recently, Shu et al. [13] proposed a multiply robust approach for estimating marginal hazard ratio, yet the method only includes PS models, not OR models.
In this study, we proposed a multiply robust (MR) estimator, including multiple PS and OR models, for estimating the difference in survival functions, which has not been studied in the literature.

Notation
For each subject i(i = 1, 2, . . ., n ), we observe a p-dimensional vector of covariates X i , treatment status Z i (i.e., it could be treatment status: Z i = 1 if treated and Z i = 0 if untreated), observed survival time T i (the smaller value of true survival time T i and censoring time C i ) and cen- soring indicator δ i (δ i = 1 when T i = T i and δ i = 0 when T i = C i ).We assume that true survival time T i and cen- soring time are independent, given X i and Z i .
Note that T 1 , T 0 represent a pair of potential sur- vival times and cannot be observed simultaneously for a subject, where T 1 represents the one that would occur if every subject were treated and T 0 represents the one that would happen if every subject were untreated.We relate the observed survival time to the potential survival times according to the consistency assumption, that is, T = T Z = ZT 1 + (1 − Z)T 0 .The difference in sur- vival functions between treatment and control groups is defined to be where t is the time point at which the difference in sur- vival functions is estimated.where S 1 (t) is the survival function of the treatment group and S 0 (t) is the survival function of the control group.

Pseudo-observations for survival outcomes
Although survival functions for every individual can not be observed, they can be estimated by the pseudo-observation method proposed by Andersen et al. [15] and used to estimate the survival functions at a single time point [16].Further, this approach was applied to estimate ACE of survival outcomes without considering assumptions of proportional hazards [17].The pseudo-observation for ith individual survival function at a fixed time point t is defined as where S(t) and S −i (t) are the KM estimator using all samples and leaving out ith individual, respectively.Graw et al. [18] have examined the asymptotic properties of (1) the pseudo observations for competing risk models, also applying to S i (t).

Pseudo-observation-based IPW
In observational studies, direct comparison of survival functions between two groups may lead to a biased causal effect estimation due to unbalanced covariates between two groups.The IPW method based on the PS π (X) is commonly used to adjust for confounding bias.We assume that there is no unmeasured confounding and positivity (0 < π(X) < 1) [3].The ACE can be estimated by IPW estimator in the context of a non-survival outcome as follows where Y i is the ith individual outcome (continuous or dis- crete outcomes), and π is the estimated values of π(X) , for example, derived from the logistic regression models.
For time-to-event outcomes, adapting (3) for estimating �(t) is difficult as we do not observe individual sur- vival functions; yet, the individual survival functions can be obtained according to (2).Once the individual survival functions are obtained, it is easy to estimate the �(t) for a fixed time point t with (3), and the IPW estimator based on the pseudo-observation method is defined as [4,17],

Pseudo-observation-based direct confounding adjustment method
The direct confounding adjustment method for eliminating confounding bias in an OR model is another method to estimate ACE in observational studies.For time-toevent outcomes, Cox model is the most commonly used to be as an outcome regression model where h 0 (t) is the baseline hazard function, β is a set of parameters of covariates and γ is the parameter of treat- ment variable Z .According to (5), we can estimate indi- vidual survival function S(t|X i , Z i ) for the ith subject given X i and Z i where S 0 (t) can be estimated by fitting the Cox model.The survival functions in treatment and control are estimated by averaging individual survival (3) functions estimated from (6), respectively.And �(t) can be estimated by directly comparing the survival functions between two groups.However, this method requires the proportional hazards assumption.
Pseudo observations estimated from (2) can be used to predict individual survival functions with covariates and treatment variable for a fixed time point t , which avoids the proportional hazards assumption [4,17].Specifically, let t = {t 1 , . . ., t h } as the set of distinct times and define the pseudo-observation for subject i at time t h as S i (t h ) , where h = 1, . . ., H and i = 1, . . ., n .Then we can con- vert the Cox model to a generalized linear model (GLM) with a suitable link function g(x) = log −log(x) [15], Where ξ t h is the intercept term for time t h , and γ and β are regression parameters.Note that, although ξ t h are time specific, γ and β are shared between different time points t = {t 1 , . . ., t h }.Estimating the unknown param- eters γ , β and β H = ξ 1 , . . ., ξ t h could be solved by the following generalized estimating equation (GEE) where β * = (β H , γ , β), S i (t) = ( S i (t 1 ), . . ., S i (t h )) T and V −1 i is a working covariance matrix which may account for the correlation structure inherent to the pseudoobservations [17].The difference in survival functions with direct confounding adjustment based on the pseudo-observation method under model ( 7) can be estimated as If the outcome regression model ( 5) is correctly specified, the �(t) OR is consistent.

Multiply robust estimator for the difference in survival functions
We combined the advantages of previous MR (Han, 2014) and PO method to proposed the proposed MR method.The proposed method could consider a set of OR models and a sets of PS models and also avoid the proportional hazards assumption.The step of estimating the difference in survival functions is similar to the previous MR (Han, 2014).First, specifying multiple PS models and multiple OR models.Second, estimating weights of individual in the treatment and control groups, respectively.Finally, estimating the difference in survival functions between treatment and control groups.(7) Specifically, to achieve multiple robustness, we specify multiple PS models and multiple OR models, P = π l (X), l = 1, 2, 3, . . ., L for the propensity score and M = m k Z (X), k = 1, 2, 3, . . ., K for outcome regression, where m k Z (X) = m k (X, Z) .Without loss of generality, let I = 1, . . ., n 1 represent the indexes for treated subjects and J = 1, . . ., n 0 the indexes for untreated subjects.n 1 and n 0 = n − n 1 are the size of treatment and control groups, respectively.
Second, to recover survival function from treated subjects for a time point t , we impose w i (i ∈ I) as the weights of individual pseudo-observation S i (t) from the treated subjects, which can be estimated by maximizing i∈I w i according to the following constraints: Similarly, we also impose w j j ∈ J as the weights of indi- vidual pseudo-observation S j (t) from the untreated sub- jects, which can be estimated by maximizing j∈J w j subject to the following constraints: According to Lagrange multiplier method [19], it is easy to show that w i and w j maximizing the i∈I w i and j∈J w j are given by (10) where and ρ T t = ρ t1 , ρ t2 , . . ., ρ tS is S = J + K-dimensional vector satisfying the equations Due to the non-negativity of w i and w j , ρ T t must satisfy that 1 + ρ T 1 g 1 (X i ) > 0 and 1 + ρ T 0 g 0 X j > 0 .The esti- mation of w i and w j can be solved by the Newton-Raph- son algorithm [19].
Finally, the MR estimator of the difference in survival functions between treatment and control groups for the fixed time t is defined as S i (t) and S j (t) are the individual survival functions esti- mated with PO method.

Bootstrap confidence interval
We perform the bootstrap method to estimate the confidence interval of �(t) mr .Specifically, n subjects are resampled from the original data with the replacement for B times to obtain B bootstrap samples, where B is a user-specified number.For b = 1, . . ., B , let � b mr (t) be the estimated difference of survival functions from the b-th bootstrap sample.Then the bootstrap variance estimator for � b mr (t) is defined as

Simulation design
We conducted comprehensive simulation studies to evaluate the performance of the proposed method.There were three covariates X 1 , X 2 , X 3 distributed as stand- ard normal with mean zero and unit variance, where corr(X 1 , X 3 ) = 0.2 and X 2 was independent of the X 1 and X 3 .The treatment indicator Z was simulated from the Bernoulli distribution according to the following propensity score The survival times were simulated from a Cox-Weibull model with T 1 = −log(u) for control group, where L = −3 + 0.5 * (X 1 + X 2 ) , γ = 1 , u was simulated from Uniform(0,1).And we defined the proportional hazard setting via setting the shape and scale parameters [(v treatment and control groups (Figure S1).An independent censoring time was simulated from an exponential distribution with rates, and we set the different rates [ exp(−4)orexp(−3) ] to obtain censoring rates (approximately 25% or 50%).
In this simulation, we specified two models for propensity score, and two models for outcome regression.According to the data-generating process, π 1 X; β 1 and m 1 X, Z; γ 1 were correctly specified, while π 2 X; β 2 and m 2 X, Z; γ 2 were incor- rectly specified.We were interested in estimating the difference in survival functions �(t * ) at t * = 10and20 .We performed IPW, direct confounding adjustment, and multiply robust (MR) methods to estimate the �(t) .To distin- guish these estimation methods, two IPW-based estimators were denoted as "IPW.model1"and "IPW.model2"; two direct-confounding-adjustment-based estimators were denoted as "OR.model1" and "OR.model2"; for the MR estimators, each estimator is denoted as "MR-0000", where each digit of the four numbers, from left to right, indicates if π 1 X; β 1 , π 2 X; β 2 , m 1 X, Z; γ 1 or m 2 X, Z; γ 2 is included in the estimator ("1" means yes and "0" means no).
In addition, we defined the non-proportional hazard setting via setting the shape scale parameters [(v 1 = 0.005, 1 = 3), (v 0 = 0.005, 0 = 3.2)] in the treat- ment and control groups, respectively (Figure S1); and we also specified two models incorrectly for propensity score.In the study, we investigated a sample size of n = 200 and n = 500 with 1000 replications.We used three evaluation criteria: mean relative bias, root mean square error (RMSE), and coverage rate.The boxplots of mean relative bias, RMSE and coverage rate were shown in the main article (Figs. 1 2, 3 and 4) and the table of estimation results were shown in the supplementary material (Table S1-S4).

Simulation results
Figure 1 and Table S1 showed the simulation results of the difference between treatment and control groups in the proportional hazard setting with the different censoring rates (25% or 50%) and different sample sizes (200 or 500).We could draw a few conclusions from the simulation results.
is included in the estimator ("1" means yes and "0" means no) 1.When specifying a propensity score model π(X; β) , or an outcome regression model m(X, Z; γ ) : the biases of IPW and MR were negligible if π(X; β) was correctly specified (IPW.model1,MR1000).The biases of OR and MR were small if m(X, Z; γ ) was correctly modeled (OR.model1,MR0010), and MR0010 had the smallest RMSE among all estimators.The biases and RMSEs were large, and the coverage rates were much less than 95% if π (X; β) or m(X, Z; γ ) were incorrectly specified (IPW.model2,OR.model2, MR0100, MR0001).2. When specifying a propensity score model π(X; β) , and an outcome regression model m(X, Z; γ ) : the biases MR were ignorable when either π(X; β) or m(X, Z; γ ) was correctly specified (MR1010, MR1001, MR0110).The biases and RMSEs were severe large, and the coverage rates were well below 95% when the π (X; β) and m(X, Z; γ ) were both incorrectly specified (MR0101).MR1010, MR1001 and MR0110 had almost the same RMSEs, which were smaller than IPW.model1 and OR.model1.3. When specifying multiply propensity score models π(X; β) , and multiply outcome regression model models m(X, Z; γ ) : the small biases of MR1100, MR0011, MR1110, MR1101, MR1011, MR0111, and MR1111 well suggested the multiple robustness of our proposed methods.Based on our simulation results, the RMSEs of those estimators were almost all smaller than IPW.model1 and OR.model1, suggesting that adding more models did not lead to a significant increase in RMSE.
In addition, Fig. 2 and Table S2 showed the estimation results in the non-proportional hazard setting.Figures 3  and 4 and Tables S3 and S4 showed the simulation results in proportional and non-proportional hazard settings with two PS models specified incorrectly.Similar patterns were observed in Figs. 2, 3 and 4 and Tables S2-S4.MR estimators are denoted as "MR-0000", where each digit of the four numbers, from left to right, indicates if π 1 X; β 1 , π 2 X; β 2 , m 1 X, Z; γ 1 or m 2 X, Z; γ 2 is included in the estimator ("1" means yes and "0" means no) In conclusion, the simulation results showed that our MR method could provide more protection against model misspecification even when PS models are misspecified in the proportional or non-proportional hazard settings.

Empirical study
To illustrate the proposed method, we analyzed the realworld data on triple-negative breast cancer (TNBC) from Surveillance, Epidemiology, and End Results (SEER) [20].TNBC, characterized by an absence of estrogen, progesterone receptors, and human epidermal growth factor receptor 2, has high invasiveness, high metastatic potential, proneness to relapse, and poor prognosis [21].Chemotherapy remains the primary treatment for TNBC and improves the prognosis of TNBC patients [22].The objective of the study was to estimate the effect of chemotherapy on TNBC.Baseline information included marital status, race, laterality, grade, The American Joint Committee on Cancer (AJCC) stage, distant metastasis, age, surgery, and chemotherapy.Inclusion criteria are: (1) patients diagnosed between 2010 and 2012, (2) patients aged between 20 and 79, (3) one primary tumor only, (4) complete baseline and survival information.A total of 10,613 patients were included in the analysis, with 79.9% opting for chemotherapy and a censoring rate of 80%.The baseline information was summarized in Table 1.We found that the chemotherapy group had a higher proportion of patients with III/IV stage, ≥ 50 years old, receiving radiotherapy and surgery compared to the non-chemotherapy group.
To identify the candidate models for MR method, we explored the association of chemotherapy with all covariates through the multivariate logistic regression, and the association of survival with all covariates through the multivariate Cox model.Two sets of covariates for each model were identified using the significant level at 0.05 and 0.1.Two sets of covariates in [{π 1 X; β 1 , π 2 X; β 2 ] Fig. 4 Simulation results with different sample sizes = 200 or 500 and different censoring rates = 25% or 50% in the scenario where proportional hazard assumption dose not hold based on 1000 replications when PS models misspecified.The red dotted line represents the true value, the true values of �(t = 10) or �(t = 20) are 0.2524 and 0.3816, respectively.IPW: inverse probability weighting; OR: outcome regression; MR, multiply robust; MR estimators are denoted as "MR-0000", where each digit of the four numbers, from left to right, indicates if π 1 X; β 1 , π 2 X; β 2 , m 1 X, Z; γ 1 or m 2 X, Z; γ 2 is included in the estimator ("1" means yes and "0" means no) are: (i) marital status, grade, stage, age, radiotherapy, and surgery; (ii) marital status, race, grade, stage, age, radiotherapy, and surgery.Two sets of covariates in m 1 X, Z; γ 1 , m 2 X, Z; γ 2 are: (i) marital status, race, grade, stage, distant metastasis, age, radiotherapy, surgery, and chemotherapy; (ii) marital status, race, laterality, grade, stage, distant metastasis, age, radiotherapy, surgery, and chemotherapy.We applied the IPW, direct confounding adjustment, and MR methods to estimate the �(t) at the time point t * = year 1, 2, 3, 4, 5 and 6.The results were shown in Fig. 5 and Table S5.

Discussion
In this study, we considered estimating the difference in survival functions between control and treatment groups in observational studies.The proposed MR method allowed multiple PS and OR models, and the bias of the proposed method is small when any model for propensity score or outcome regression is correctly specified regardless of in proportional or non-proportional hazard settings.Although we mainly focused on estimating the difference in survival functions in methodology, our proposed method can be easily extended to estimate other time-to-event parameters.For example, the restricted average survival causal effect t * 0 S 1 (t) − t * 0 S 0 (t) com- pares the survival time at t * , and the survival quantile effect S 1 −1 (1 − q) − S 0 −1 (1 − q) compares the q -quantile of the survival time distribution between two groups [2].
Wang [4] combined the advantages of augmented inverse propensity weighted and PO method to proposed a simple, doubly robust method for estimating the causal effect of survival outcome [4].Although the DR method also can avoid proportional hazards assumption, the DR method may lead to biased estimates when OR model and PS model are both misspecified [4].By contrast, our proposed method could allow multiple PS models and multiple OR models, which provides extra protection from model misspecification.Our simulation studies showed that the IPW and direct confounding adjustment methods could lead to large bias and low coverage rate when misspecifying the propensity score and outcome regression models.By contrast, our proposed method showed satisfactory performance with finite sample sizes.
For the proposed method, the bias was ignorable and the coverage rate was close to 95% when correctly specifying any PS and OR models.
There are some limitations in the study.First, we assume that there is no unmeasured confounding; However, there are often confounding variables that cannot be measured in observational studies, which leads to a biased estimation [23,24].Second, our proposed method could not deal with competing risks, which are also common in survival data [25,26].Third, we identify the candidate models based on the significance level of 0.05 and 0.1 in the empirical study, while the model specification may need clinical knowledge and existing literature, or statistical methods [27,28].Fourth, although the proposed method could multiple PS models and multiple OR models, which provides multiple protection against model misspecification, there may a biased estimated when the specified models do not include a correct model.

Conclusions
Under the positivity, no unmeasured confounding (conditional exchangeability), consistency assumptions, the proposed MR method, pseudo-observation-based IPW and pseudo-observation-based direct confounding adjustment methods could be used to estimate the difference in survival functions.However, Pseudo-observation-based Fig. 5 The triple-negative breast cancer data analysis: estimated �(t * ) for t * = 1, 2, 3, 4, 5 and 6 years using different methods IPW or pseudo-observation-based direct confounding adjustment may lead to a biased estimate when the PS model or OR models is misspecified.By contrast, the proposed MR method based on the pseudo-observation approach has several advantages.First, the proposed method could allow a set of PS models and a set of OR models, and the proposed method has a small bias when any PS and OR models were correctly specified.Second.The proposed method considered the advantage of pseudo-observation approach, which avoids proportional hazards assumption.In actual research, the proposed estimator provided a new choice when researchers have a difficult time choosing only one model for their studies.

Fig. 1
Fig.1 Simulation results with different sample sizes = 200 or 500 and different censoring rates = 25% or 50% in the scenario where proportional hazard assumption holds based on 1000 replication.The red dotted line represents the true value, the true values of �(t = 10) or �(t = 20) are 0.1480 and 0.2725, respectively.IPW: inverse probability weighting; OR: outcome regression; MR, multiply robust; MR estimators are denoted as "MR-0000", where each digit of the four numbers, from left to right, indicates if π 1 X; β 1 , π 2 X; β 2 , m 1 X, Z; γ 1 or m 2 X, Z; γ 2 is included in the estimator ("1" means yes and "0" means no)

Fig. 2
Fig. 2 Simulation results with different sample sizes = 200 or 500 and different censoring rates = 25% or 50% in the scenario where proportional hazard assumption dose not hold based on 1000 replications.The red dotted line represents the true value, the true values of �(t = 10)or �(t = 20) are 0.2524 and 0.3816, respectively.IPW: inverse probability weighting; OR: outcome regression; MR, multiply robust; MR estimators are denoted as "MR-0000", where each digit of the four numbers, from left to right, indicates if π 1 X; β 1 , π 2 X; β 2 , m 1 X, Z; γ 1 or m 2 X, Z; γ 2 is included in the estimator ("1" means yes and "0" means no)

Fig. 3
Fig. 3 Simulation results with different sample sizes = 200 or 500 and different censoring rates = 25% or 50% in the scenario where proportional hazard assumption holds based on 1000 replications when PS models misspecified.The red dotted line represents the true value, the true values of �(t = 10) or �(t = 20) are 0.1480 and 0.2725, respectively.IPW: inverse probability weighting; OR: outcome regression; MR, multiply robust;MR estimators are denoted as "MR-0000", where each digit of the four numbers, from left to right, indicates if π 1 X; β 1 , π 2 X; β 2 , m 1 X, Z; γ 1 or m 2 X, Z; γ 2 is included in the estimator ("1" means yes and "0" means no)

Table 1
Baseline characteristics between chemotherapy and non-chemotherapy groups