Covariate balance-related propensity score weighting in estimating overall hazard ratio with distributed survival data

Background When data is distributed across multiple sites, sharing information at the individual level among sites may be difficult. In these multi-site studies, propensity score model can be fitted with data within each site or data from all sites when using inverse probability-weighted Cox regression to estimate overall hazard ratio. However, when there is unknown heterogeneity of covariates in different sites, either approach may lead to potential bias or reduced efficiency. In this study, we proposed a method to estimate propensity score based on covariate balance-related criterion and estimate the overall hazard ratio while overcoming data sharing constraints across sites. Methods The proposed propensity score was generated by choosing between global and local propensity score based on covariate balance-related criterion, combining the global propensity score fitted in the entire population and the local propensity score fitted within each site. We used this proposed propensity score to estimate overall hazard ratio of distributed survival data with multiple sites, while requiring only the summary-level information across sites. We conducted simulation studies to evaluate the performance of the proposed method. Besides, we applied the proposed method to real-world data to examine the effect of radiation therapy on time to death among breast cancer patients. Results The simulation studies showed that the proposed method improved the performance in estimating overall hazard ratio comparing with global and local propensity score method, regardless of the number of sites and sample size in each site. Similar results were observed under both homogeneous and heterogeneous settings. Besides, the proposed method yielded identical results to the pooled individual-level data analysis. The real-world data analysis indicated that the proposed method was more likely to find a significant effect of radiation therapy on mortality compared to the global propensity score method and local propensity score method. Conclusions The proposed covariate balance-related propensity score in multi-site distributed survival data outperformed the global propensity score estimated using data from the entire population or the local propensity score estimated within each site in estimating the overall hazard ratio. The proposed approach can be performed without individual-level data transfer between sites and would yield the same results as the corresponding pooled individual-level data analysis. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-023-02055-8.


Background
The growth of large multi-site medical datasets is accelerating with the development of big data and advances in data collection and storage.If data from multiple sources can be combined, the study power and generalizability can be improved, and multi-site research collaboration can also be carried out.However, in research of data from multiple sites, it is generally challenging to share information at the individual level among sites due to privacy, network security, and transmission speed [1].Therefore, it is necessary to develop statistical methods that only require summary-level information to provide personal privacy protection while analyzing data from multiple sites.
In biomedical research, a common outcome of interest is the time-to-event endpoint, which focuses on whether or not an event occurred and when that event occurred.Cox proportional model is a popular semi-parametric approach to describe the relationship between the timeto-event endpoints and a set of covariates by estimating the hazard ratios [2].In multi-site, distributed data, Lu et al. and Vilk et al. developed distributed Cox model based on iterative methods, which required iterative data sets to be transferred multiple times between the analysis center and each site [3,4].Li et al. proposed a method for distributed Cox regression that did not need multiple iterative file transfers among sites, but used the summary-level statistical data received from each site to find the solution of parameters based on the iterative method in the analysis center [5].
In observational studies, the inverse probability weighted (IPW) Cox regression model can be used to estimate the overall hazard ratio while adjusting for measured confounders through weighting [6].Propensity score is the probability of treatment assignment conditional on the covariates and the IPW method assigns weight as the inverse of the probability of receiving the observed treatment to each individual [7][8][9].In multisite, distributed studies, considering propensity score weighting, Yoshida et al. compared three methods of sharing aggregate-level information to assess the performance of estimating hazard ratio from cox models in simulated distributed data networks [10].The estimated results were comparable to the pooled individual-level data analysis.Shu et al. estimated the hazard ratio in multi-site study based on the IPW Cox model with summary-level information and provided theoretical justification [11].Most multi-site studies obtained estimation based on local propensity score and local weight which fit propensity score models using data within each site.The local propensity score considered the possible heterogeneity of each site, while the sample size used to fit models was reduced.Alternatively, a global propensity score model can also be fitted using data from all sites based on distributed logistic regression, and the estimated treatment effect will be equivalent to a weighted pooled individual-level analysis [12].However, when there is unknown heterogeneity of covariates in different sites, either global or local propensity score to estimate the overall treatment effect may result in potential bias or lower efficiency.
In this article, we propose a new method that uses only the summary-level statistics from each site to estimate the overall hazard ratio based on the new proposed propensity score in distributed survival data.The proposed propensity score is generated by choosing between global and local propensity score based on criteria to better control confounding bias and improve estimation efficiency.Our proposed propensity score is motivated by Dong et al. who proposed the subgroup balancing propensity score to estimate the subgroup treatment effect, which combined the global and local propensity score estimation to ensure covariate balance and control variance inflation [13].
The rest of the article is organized as follows.In Sect."Data transfer from each site k to the analysis center: each site transmits distinct observed event times for site k,, to the analysis center." in methods, we present the weighted estimation of overall hazard ratio through IPW Cox model.In Sect. 2 in methods, we present the proposed method to estimate the propensity score, and provide respective algorithms using summary-level information to obtain the proposed propensity score.In Sect. 3 in methods, we present the methods of solving the estimating equations to estimate the overall hazard ratio based on the proposed propensity score.In simulations section, we present the simulation results demonstrating the performance of the proposed method and compare that to the global or local propensity score method and pooled individual-level data analysis.In application section, we give a real-world data application for illustration.At the end of the article, we conclude with some discussion.

Weighted estimation of the overall hazard ratio
Let X be a vector of measured confounders, A be a binary treatment variable ( A = 1 if treated and A = 0 if untreated).T * is the true survival time, C is the cen- soring time which assumed to be independent of T * given X .Due to censoring, we observe T = min(T * , C) and δ = I(T * ≤ C).I(•) is the indicator function.Sup- pose we observe n independent sample {A i , T i , X i , δ i } , i = 1, . . ., n , from K data-contributing sites.Let � k = {i : iinsitek, fori = 1, . . ., n} be the index set for individuals belonging to the k th sites with size n k and G i = k if individual i belongs to the k th site, where k = 1, . . ., K.
Suppose we have d distinct observed event times across all sites whereT For j = 1, . . ., d , let D j be the set of individuals who have the observed event time ofT D j ,D j = {i : . ., n} , and let R j be the risk set for individuals who are at risk at timeT D j ,R j = {i : T i ≥ T D j , i = 1, . . ., n} .Also, let R j (k) be the risk set for individuals who are at risk at time . ., n} and let R k,j (k′) be the risk set for individuals who are at risk at time , where k′ = 1, . . ., K .
In this article, we focus on estimating the overall hazard ratio, exp(θ) , between treatment and control groups in the entire population: where 0 (t) is the baseline hazard function.
IPW Cox regression model is commonly used to estimate hazard ratio.Based on the propensity score e = P(A = 1|X) , the inverse probability weight is w = A e + 1−A 1−e .We assume that the hazard ratio to be common across K data-contributing sites and all sites have a common baseline hazard 0 (t) .The weighted partial likelihood score function for the common log hazard ratio θ is [14], The estimate of the log hazard ratio θ can be obtained by solving Eq. (1).

Proposed propensity score weighting method for estimating the overall hazard ratio
We propose a new method to estimate the overall hazard ratio based on our proposed propensity score weight, which does not require individual-level data sharing

Global and local propensity score
In the setting of distributed data with K sites, the propensity score can be estimated globally using data from the entire population or locally within each site.Taking logistic regression models as an example, global propensity score is estimated by fitting logistic regression models to the overall sample: Since we assume that data at the individual level cannot be shared among sites, data from the full sample cannot be directly used to fit model, and only summarylevel statistics can be obtained from each site.The global propensity score can be obtained by distributed logistic regression.Let e(X, α g ) = P(A = 1|X), and the logistic loss is The distributed Newton-Raphson method [15,16] is used to obtain the empirical loss minimizer α := arg min α i,k M logis A i,k , X i,k , α through iterations: where is the global Hessian matrix [15].The iteration process is as follows: 1. Initialize α (0) = argmin α i∈� 1 M logis (A i , X i , α) based on data from the analysis center (e.g.site 1), and set t = 0. 2. Repeat the following steps until t meets the max iter- ation times or �G   (t) ) and the global Hessian matrix H logis n in the analysis center.c) Update α (t) in the analysis center as Then we could obtain the global propensity score e g = e X, α g based on the estimated parameter α g from itera- tions.It is worth noting that each site computes its own gradient and Hessian matrix, which are subsequently summarized to update the parameters.As a result, any site can be chosen as the analysis center.It is generally recommended to consider the hardware capabilities and computational power of each site when determining the analysis center.
An alternative approach is to estimate the local propensity score within each site: We fit the model at each site using the observations from that site and obtain the local propensity score e l = e X, α l,k based on the estimated parameter α l,k from each site.

Proposed propensity score
Motivated by Dong et al. [13] we propose a balancing propensity score to estimate the overall hazard ratio in distributed data to improve the estimation efficiency.The proposed method is to choose between the global and local propensity score by optimizing the overall confounder balance for propensity score weighting.
where e i is the estimated propensity score, x ip is the value of the pth measured confounder X p for individual i ; σ p is the standard deviations of X p for overall population.M p accounts for balancing of confounder X p in the overall sample.
Notably, M p could not be directly estimated in distrib- uted data and needs file transfer between sites.M p could be rewritten as: To obtain M p , each site should transfer the following items to the analysis center: M p could then be calculated in the analysis center using these transferred values from each site based on (2.6).The objective function is the sum of the squares of M p .
We choose between global and local propensity scores for each site to minimize the objective function F.

Stochastic search algorithm to estimate the proposed propensity score
Dong and others proposed a stochastic search algorithm to find the minimized objective function Fin (2.7) [13].For each site k = 1, . . ., K , let S k = 1 if individu- als in site k are weighted based on the estimated global propensity score, and S k = 2 if individuals in site k are weighted based on the estimated local propensity score.
The search process is as follows: 1. Initially, let all sites use the global propensity score and S k = 1 for k = 1, . . ., K .The analysis center cal- culates the initial value F int for the objective func- tion F using information transferred from each site.Let the minimum value F min = F int , and let  For each site k = 1, . . ., K , if S k,min = 1 then the pro- posed propensity score for site k is equal to the global propensity score; otherwise the proposed propensity score is equal to the local propensity score estimated within that site, i.e.,

Estimation of overall hazard ratio with distributed survival data based on proposed propensity score
Based on the proposed propensity score e i = e p,i , the inverse probability weight for individual i is Then we could estimate the log hazard ratio θ by solving Eq. ( 1).In order to obtain θ in distributed data, Eq. ( 1) can be rewritten as lǫR k,j (k′) w l exp(θA l )A l and lǫR k,j (k′) w l exp(θ A l ) in the score Eq. (3.1) can be expressed as e p = e g forsitek, ifS k,min = 1 e p = e l forsitek, ifS k,min = 2 Then the Eq.(3.1) can be further rewritten as To solve (3.2), we need to know: (u1) Particularly in (u2), should be calculated for all d distinct observed event times across all sites.Therefore, each site needs to know the d distinct observed event times, which requires each site to first send d(k) observed event times in that site to the analysis center.Then the analysis center needs to summarize the event times from each site and send back all the d distinct event times to each site.With information on d distinct event times, each site k could then calculate lǫR j (k),A l =1 w l and lǫR j (k),A l =0 w l , and sends the results to the analysis center to sum up.
Detailed procedures to obtain the estimated log hazard ratio θ in distributed data: Data transfer from analysis center to each site: The analysis center summarizes the distinct observed event times across all sites, and transmits all d event times, T D 1 , T D 2 , . . ., T D d , to each site.3. Calculation in each site and data transfer from each site to the analysis center: Each site k calcu- lates lǫR j (k),A l =1 w l and lǫR j (k),A l =0 w l for d dis- tinct observed event times, and transmits the calculation result to the analysis center.4. Data transfer from the analysis center to each site: Analysis center summarizes K k=1 lǫR j (k),A l =1 w l and K k=1 lǫR j (k),A l =0 w l for d distinct observed event times, and transmits the summarized result to each site. 5. Data transfer from each site to the analysis center: For d(k) distinct observed event times within each site, each site generates a summary-level table with 4 columns and d(k) rows.The four columns are (i) lǫR k,j (k′),A l =0 w l for the d(k) distinct observed event times T D k,j in site k.Each site transmits the 4-column table to the analysis center.An example of the 4-column summary table is presented in Table S1.
In particular, for all d distinct observed event times across all sites, K k′=1 lǫR k,j (k′),A l =1 w l and K k′=1 lǫR k,j (k′),A l =0 w l has been calculated and transmitted to each site in step 4. Therefore, for d(k) distinct event times observed in site k, columns (iii) and (iv) can be directly obtained from file transfer in step 4. 6.The analysis center solves equation (3.1) based on file transfer in step 5, and obtains the estimated log hazard ratio θ .

Simulation design
To examine the performance of the proposed method, we performed two sets of simulations.The first simulation was to compare the performance of our proposed method with the global propensity score for the entire population or local propensity score estimated within each site in distributed data with K sites.The second simulation was to compare our proposed method to the results obtained from the corresponding pooled individual-level data.
In the stochastic search process, L 1 was set to be 500, and L 2 was set to be 20.We considered K = 5, 10, 20 and n k = 500, 1000, 2000 to evaluate the impact of different numbers of sites and different sample sizes in each site on performance.We reported following performance measures: absolute bias, root mean squared error (RMSE), and ratio of RMSE of different methods against the proposed method (r-RMSE).We also presented the measure of coverage probability; however, due to constraints regarding computational costs, we only provided results for 5

Simulation results
When the covariates and the treatment assignment in each site were homogenous, the absolute bias was small for all the methods, i.e., weighted using global, local, and proposed propensity score.Compared with global weight and local weight, our proposed weight had a smaller RMSE, regardless of the number of sites and sample size in each site.The ratio of RMSE of the global or local weight to our proposed weight (r-RMSE) was up to 1.578 (Table 1).
In the heterogeneity setting, the absolute bias of our method was mostly somewhere between global and local weight, or close to that of global and local weight.Regarding RMSE, the RMSE of our proposed method remained the smallest, and the r-RMSE was up to 1.540 (Table 2).The results are similar when the treatment assignment was generated with X = (X 1 , X 2 , X 3 , X 4 ) (Table 3, Table 4).
Besides, we have computed the 95% coverage probability for 5 sites, and our proposed method achieved a coverage probability close to the nominal 95%, and was closer to the nominal 95% compared to the global and local method (Table 5).
When comparing our proposed method to the results obtained from the corresponding pooled individual-level data analysis, as expected, our proposed method in distributed data and pooled individual-level data analysis yielded identical results under all scenarios (Table 6).

Application
We apply the proposed method to real-world triplenegative breast cancer (TNBC) data from Surveillance, Epidemiology, and End Results (SEER) [17].TNBC is an aggressive subtype of breast cancer, accounting for about 20% of all breast cancer cases [18].It is known that radiation therapy can improve locoregional control in breast cancer patients and has a positive impact on the long-term survival of high-risk patients [19].
The dataset included 4120 patients aged 20-79 years diagnosed with TNBC in 2010 with complete information.The treatment variable was set to 1 if the patient received radiation therapy and 0 if not.The outcome of interest was the time to death during the follow-up of up to 71 months.Descriptive characteristics of patients according to radiation therapy were presented in Table S2.We estimated the hazard ratio and 95% confidence interval after adjusting for age, race, marital status, laterality, grade, the American Joint Committee on Cancer (AJCC) stage, surgery, distant metastasis, and chemotherapy in the propensity score model.
The patients were from five states: Connecticut ( n 1 = 717 ), Hawaii ( n 2 = 274 ), Iowa ( n 3 = 723 ), Ken- tucky ( n 4 = 1176 ), and Louisiana ( n 5 = 1230 ).Descrip- tive characteristics of patients according to five sites were presented in Table S3.We compared the proposed method with methods based on the global or local propensity score in the distributed survival data with 5 sites.We further compared the estimates from proposed methods in distributed data to estimates from the corresponding pooled individual-level data analyses.
The confidence intervals were calculated using the bootstrap method with 200 replications [11].All n indi- viduals in K sites were assigned ID of {1, 2, . . .n} .In each bootstrap replication, the analysis center re-sampled with replacement from {1, 2, . . .n} and sent the re-sampled ID of the 200 replications to each site.Each site then prepared 200 bootstrap samples based on the instruction from the analysis center.The sample size of the resulting bootstrap samples for each site may differ from that site's original size.
Table 7 presented the estimated hazard ratio and their 95% confidence intervals.Results from the proposed methods and methods based on global or local propensity score indicated that radiation therapy had a positive impact on long-term survival in patients with TNBC.The proposed method was more likely to find a significant effect (hazard ratio, 0.679; 95% confidence interval, 0.585 to 0.789) compared to the global propensity score method (0.737; 0.653 to 0.832) and local propensity score method (0.709; 0.619 to 0.812).Besides, the proposed method and methods based on global or local propensity score produced hazard ratio estimates and 95% confidence intervals equivalent to those obtained from the corresponding pooled individual-level data analyses.

Discussion
We have proposed a covariate balance-related propensity score to create inverse probability weight to make inferences on the overall hazard ratio in multi-site distributed survival data.This proposed propensity score is produced based on covariate balance-related criterion in the entire population.The proposed propensity score is shown to perform better than the global propensity score estimated using data from the entire population or the local propensity score estimated Table 2 Comparisons of proposed weight, global weight and local weight to estimate the overall hazard ratio in the simulations, with heterogeneous design and treatment assignment generated with X      within each site.Besides, the proposed method could be conducted without individual-level data transferred among sites and would yield identical results to the corresponding pooled individual-level data analysis.The proposed method is developed based on distributed data with multiple sites.Since our proposed method in distributed data and pooled individual-level data analysis yield identical results, the proposed method can be extended to the general studies that data is distributed in multiple sites, but data communication among sites is not restricted.Therefore, in multi-site data, whether or not data transmission between sites is allowed, we recommend using our proposed approach and selecting between the global and local propensity Table 5 The coverage probability of different propensity score methods with the number of sites set to 5 Setting 1: homogenous design and treatment assignment generated with X 1 ~X4 and X 2 1 , X 1 X 4 ; setting 2: heterogeneous design and treatment assignment generated with X 1 ~X4 and X   score in each site to estimate the overall treatment effect with efficiency.
In our real-world data analysis, we calculated the 95% confidence intervals based on the global bootstrap method, which re-sampled from the entire population.In practice, researchers can also use the alternative local bootstrap method for simplicity [11].Specifically, each site could generate its 200 or more bootstrap samples with replacement from the original sample in that site, which is the conventional bootstrap method within the site.We also applied the local bootstrap method to the real-world data, and the result was similar and presented in Table S4.
Our method is proposed based on the unstratified Cox model.Sometimes, if we assume the baseline hazard to vary by site, stratification on site is helpful and the stratified Cox model is used accordingly.In this case, the stratified Breslow-type weighted partial likelihood would be used instead of (1) in our study [11].The main difference is that each site no longer needs to know the information of all d distinct observed event times across all sites, but only needs to obtain its own summarized information of d(k) distinct observed event times.Accordingly, the detailed steps 1 to 5 in calculating the overall hazard ratio in our study can be replaced by a simple step, i.e., to obtain the following information within each site: (i) iǫD k,j (k) w i A i , (ii) iǫD k,j (k) w i , (iii) lǫR k,j (k′),A l =1 w l for the d(k) distinct observed event times T D k,j in site k, (iv) lǫR k,j (k′),A l =0 w l for the d(k) dis- tinct observed event times T D k,j in site k.Under such circumstances, only one file transfer from each site to the analysis center is required after obtaining the proposed propensity score.
When conducting propensity score-based analysis, it is crucial to correctly identify the set of confounders and specify the propensity score model.We assume that all confounding variables are measurable and known in our study, and that there is no misclassification, missing data, or time-varying covariates.In future studies, it is possible to consider extending our method to situations where these assumptions are not satisfied or data with a large number of candidate covariates [20,21].

Conclusions
In this study, we proposed a covariate balance-related propensity score to estimate the overall hazard ratio, which only required summary-level information across sites to provide personal privacy protection.The proposed propensity score was estimated based on covariate balance-related criterion, and was shown to outperform the global propensity score estimated using data from the entire population or the local propensity score estimated within each site.
logis n (α)� ≤ pre-specified threshold.a) Transfer α (t) to each site to compute the local gradient G logis n k (α (t) ) and the local Hessian matrix

1 .
Data transfer from each site k to the analysis center: each site transmits d(k) distinct observed event times for site k,T D k,1 , T D k,2 , . . ., T D k,d(k) , to the analysis center.

1 ~X4 and X 2 1 , X 1 X 4 Bias= 5 K = 10 K
Absolute bias, RMSE Root mean squared error, r-RMSE Ratio of RMSE of global weight or local weight against proposed weight K

K = 5 K = 10 K
and transfer the local gradient and local Hessian matrix to the analysis center.
K. 2. Repeat the following steps until the number of repeats is no smaller than L 1 or F min does not change over L 2 repeats.The values of L 1 and L 2 are pre-specified.3. Randomly permutate all the sites { 1 , 2,…, K} and get a new random ordering of the K sites, { A 1 , A 2 , . . ., A K }. 4. Following the order { A 1 , A 2 , . . ., A K } in step (a), for each site, choose the global or local propensity score that gives a smaller value of objective function F while fixing the propensity score chosen for other K − 1 sites each time.If site k chooses the global pro- pensity score, then S k = 1 ; if site k chooses the local propensity score, then S k = 2.
After all sites have selected the global or local propensity score, calculate F rep for this repeat in the analysis center.6.If F rep in step (c) is smaller than F min , then update F min = F rep and S k,min = S k ; if F rep ≥ F min , then keep F min and S k,min unchange.

Table 1
Comparisons of proposed weight, global weight and local weight to estimate the overall hazard ratio in the simulations, with homogenous design and treatment assignment generated with X 1 X 4 Bias Absolute bias, RMSE Root mean squared error, r-RMSE Ratio of RMSE of global weight or local weight against proposed weight K = 5

Table 3
Comparisons of proposed weight, global weight and local weight to estimate the overall hazard ratio in the simulations, with homogenous design and treatment assignment generated with X Bias Absolute bias, RMSE Root mean squared error, r-RMSE Ratio of RMSE of global weight or local weight against proposed weight

Table 4
Comparisons of proposed weight, global weight and local weight to estimate the overall hazard ratio in the simulations, heterogeneous design and treatment assignment generated with X Bias Absolute bias.RMSE Root mean squared error, r-RMSE Ratio of RMSE of global weight or local weight against proposed weight 21 , X 1 X 4

Table 6
Comparisons of the proposed method in distributed data and corresponding pooled individual-level data analysis Bias absolute bias, RMSE root mean squared error, r-RMSE ratio of RMSE of global weight or local weight against proposed weight Setting 1: homogenous design and treatment assignment generated with X 1 ~X4 and X 2 1 , X 1 X 4 ; setting 2: heterogeneous design and treatment assignment generated with X 1 ~X4 and X 2 1 , X 1 X 4

Table 7
Estimation of overall hazard ratio and the corresponding 95% confidence intervals using different propensity score estimation methods in distributed data and pooled individuallevel data