Treatment effect estimation using the propensity score in clinical trials with historical control

This article has been updated

Abstract

Background

Clinical trials assessing the effects of new treatments require a control group against which the pure treatment effect can be compared. However, in clinical trials on regenerative medicine, rare diseases, and intractable diseases, it may be ethically difficult to assign participants to the control group. In recent years, the use of historical control data has attracted attention as a way to supplement the number of participants in the control group. When historical control data are combined with data from a new randomized controlled trial (RCT), assessing heterogeneity using outcome data alone is not sufficient. Therefore, several statistical methods that consider both participant outcomes and baseline characteristics, including the propensity score (PS) method, have been proposed.

Methods

We propose a new method that includes “information on whether the data are RCT data or not” in the PS model when combining RCT and historical control data. The performance of the proposed method in estimating the treatment effect is evaluated using simulated data.

Results

When the distribution of covariates is similar between the RCT and historical control data, not much difference in performance is found between the proposed and conventional methods to estimate the treatment effect. On the other hand, when the distribution of covariates is not similar between the two kinds of data, the proposed method shows higher performance.

Conclusions

Even when it is not known whether RCT and historical control data are similar, the proposed PS model is useful to estimate the treatment effect appropriately in RCTs using historical control data.

Introduction

Clinical trials that assess new treatment effects require a control group so that the pure treatment effect, free of the influence of baseline characteristics, can be compared [1]. Randomized controlled trials (RCTs) are considered the gold standard in confirmatory trials for reducing bias and assessing objective effects. However, in clinical trials for regenerative medicine, rare diseases, and intractable diseases, randomly assigning participants to the control group may be ethically difficult. Recently, real-world data have been actively collected and disease registries constructed [2,3,4], and the use of historical control data has attracted attention as a way to supplement the number of control group participants in clinical trials. Appropriate use of historical control data can reduce the number of participants assigned to control groups, thus offering patients promising treatments sooner and accelerating drug development [5, 6]. The U.S. Food and Drug Administration has issued draft guidance on natural history studies for rare disease drug development [7], and further use of external controls is expected [2,3,4].

The use of historical control data is still being debated [8,9,10]. Frequentist approaches include the pooling method, in which historical control data are treated as equivalent to the new trial's control group and merged as is, and the test-then-pool method, in which the data are pooled only after a hypothesis test has assessed the similarity of the two sets of outcome data [11]. Bayesian approaches include power priors [12] and hierarchical modeling [13, 14], which discount the amount of information borrowed from historical control data [11, 15]. A previous study proposed calculating the difference between the outcome data of the new trial's control group and the historical control data, and using this difference, as an estimate of heterogeneity, to determine the weighting [16]. Evaluating heterogeneity with outcome data is useful but not sufficient in situations where measurement periods and conditions differ. Moreover, information from historical control data may distort the true results of new trials [15], or, conversely, the historical control data may barely be used at all [16]; either outcome poses a large risk for implementation.

In the causal inference framework, propensity scores (PS) [17, 18] may be used to compare groups that are not randomized. The PS is the probability of treatment assignment given baseline characteristics. By balancing baseline characteristics between treatment groups through the PS, it is possible to estimate the treatment effect while minimizing the effect of confounding on treatment allocation. For the use of historical control data, a method using the PS has been proposed to account for heterogeneity in baseline characteristics. In general, matching [19, 20] and inverse probability of treatment weighting (IPTW) [21] are used as PS methods [22, 23]. Methods using the PS to assess the generalizability of results from the RCT population to the patient population [24] and to merge RCT data with observational data [25] have also been proposed. Additionally, a method combining PS methods with a Bayesian dynamic borrowing framework has been proposed [26].

Furthermore, in the special clinical trial setting considered in this study, where historical control data are used in combination with new RCT data, the combined data include information on whether each record comes from the RCT or from the historical control data. This information could be an important confounding factor, along with baseline characteristics. Accordingly, we add “information on whether the data are RCT data or not” to the conventional PS model and use simulation data to evaluate the performance of this approach in estimating the treatment effect.

Proposal of the PS model

In a clinical trial in which the primary endpoint is a binary outcome \(Y\) (presence or absence of an event), we assume that historical control data are combined with data from a new two-armed RCT as part of the control group. For participant \(i\ \left(i=1\dots l\right)\), \({Y}_{i}=1\) indicates that an event has occurred and \({Y}_{i}=0\) indicates that no event has occurred. We define \(T\) as the treatment group indicator (\({T}_{i}=1\) for the treatment group and \({T}_{i}=0\) for the control group for participant \(i\)) and \(X\) as the vector of all covariates \({X}_{j}\) (\(j=1\dots\textit{k}\); \({X}_{ij}\) denotes the \(\textit{j}\)th covariate of participant \(i\)), which are possible confounding factors. The PS model is then generally expressed as

$$\pi=\mathrm{logit}\left\{\text{Pr}\left(T=1\mid X\right)\right\}=\beta_0+\beta_1X_1+\beta_2X_2+\dots+\beta_kX_k, \tag{1}$$

using logistic regression [27, 28], where \(\beta_0\) is the intercept and \(\beta_j\ \left(j=1\dots\textit{k}\right)\) denotes the regression coefficient of \(X_j\).
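The following is a minimal NumPy sketch, not the authors' code, of fitting the conventional PS model in Eq. (1) by maximum likelihood using Newton-Raphson iterations. The names `fit_logistic` and `propensity_score` are hypothetical; any standard logistic regression routine would serve the same purpose.

```python
import numpy as np


def fit_logistic(design, t, n_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    design: (n, p) matrix whose first column is 1 (intercept), then X_1..X_k.
    t:      (n,) array of 0/1 treatment indicators.
    Returns the coefficient vector (beta_0, ..., beta_k).
    """
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        eta = design @ beta                      # linear predictor pi of Eq. (1)
        p = 1.0 / (1.0 + np.exp(-eta))           # expit(pi) = Pr(T = 1 | X)
        grad = design.T @ (t - p)                # score vector
        hess = design.T @ (design * (p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)       # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def propensity_score(design, beta):
    """Estimated PS for each participant: expit(design @ beta)."""
    return 1.0 / (1.0 + np.exp(-(design @ beta)))
```

Here `design` holds an intercept column followed by \(X_1,\dots,X_k\), and the returned vector corresponds to \(\beta_0,\dots,\beta_k\).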

Here, we consider the information on whether the data were derived from the new RCT or from the historical control data to be a potentially important confounding factor. Therefore, the proposed PS model additionally includes this information as the indicator variable \(X_\textit{r}\): \({X}_{ir}=1\) indicates that participant \(i\) is from the RCT, and \({X}_{ir}=-1\) indicates that participant \(i\) is from the historical control data. Including \(X_\textit{r}\), the proposed PS model is expressed as

$$\pi^{*}=\mathrm{logit}\left\{\text{Pr}\left(T=1\mid X,X_r\right)\right\}=\beta_0+\beta_1X_1+\beta_2X_2+\dots+\beta_kX_k+\beta_rX_r. \tag{2}$$
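A minimal sketch (hypothetical helper `ps_design`, assuming NumPy) of how the design matrices of Eqs. (1) and (2) differ: the proposed model appends one column holding \(X_r\), coded +1 for RCT participants and -1 for historical controls.

```python
import numpy as np


def ps_design(X, is_rct, include_rct_indicator):
    """Build the PS-model design matrix.

    X:      (n, k) covariate matrix.
    is_rct: (n,) boolean, True if the participant comes from the new RCT.
    include_rct_indicator=False -> conventional model, Eq. (1)
    include_rct_indicator=True  -> proposed model, Eq. (2)
    """
    n = X.shape[0]
    cols = [np.ones(n), *X.T]                 # intercept, then X_1..X_k
    if include_rct_indicator:
        x_r = np.where(is_rct, 1.0, -1.0)     # X_r = +1 (RCT), -1 (historical control)
        cols.append(x_r)
    return np.column_stack(cols)
```

Because the model contains an intercept, coding \(X_r\) as +1/-1 rather than 1/0 only reparametrizes \(\beta_0\) and \(\beta_r\); the fitted propensity scores are the same under either coding.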

We expected that the relative performance of the conventional method using \(\pi\) and the proposed method using \({\pi }^{*}\) in estimating the treatment effect may depend on the difference in the distribution of covariates between the RCT and historical control data. In the Simulation study section, we evaluate the performance of both methods using simulated data.

Among PS methods, matching is easy to understand, but it may drastically reduce the amount of information because unmatched participants are discarded. In this study, we therefore apply the IPTW method to make use of more information when evaluating the model’s performance. When estimating the treatment effect, each participant’s weight \(w\) is \(w=T/{\text{expit}}\left(\pi \right)+\left(1-T\right)/\left\{1-{\text{expit}}\left(\pi \right)\right\}\) in the conventional method and \(w=T/{\text{expit}}\left({\pi }^{*}\right)+\left(1-T\right)/\left\{1-{\text{expit}}\left({\pi }^{*}\right)\right\}\) in the proposed method, where \(\text{expit}(x)=\exp(x)/\{1+\exp(x)\}\) is the inverse of the logit function.
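The weight formula can be sketched as follows (NumPy, hypothetical function names). Here `lin_pred` is the fitted linear predictor: \(\pi\) for the conventional model or \(\pi^{*}\) for the proposed model.

```python
import numpy as np


def expit(x):
    """Inverse logit: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))


def iptw_weights(t, lin_pred):
    """IPTW weights w = T / expit(pi) + (1 - T) / {1 - expit(pi)}.

    t:        (n,) 0/1 treatment indicator.
    lin_pred: (n,) linear predictor, pi (conventional) or pi* (proposed).
    """
    ps = expit(lin_pred)
    return t / ps + (1 - t) / (1 - ps)
```

For example, a treated participant with \(\text{expit}(\pi)=0.5\) receives weight 2, and a control participant with \(\text{expit}(\pi)=0.75\) receives weight \(1/0.25=4\).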

Simulation study

Settings

In this simulation study, we set the total number of participants to n = 900 and the allocation ratio between the RCT treatment group, RCT control group, and historical control group to 1:1:2. We set the outcome event rates to 50%, 10%, and 5%; the odds ratios to 1.0, 2.0, 5.0, and 10.0; and the two-sided significance level to 5%. We also examined cases with a small number of participants and show simulation results assuming a total of n = 200 participants; apart from the total number of participants, the methods and conditions are the same as in the n = 900 setting. As supplementary examinations, we considered odds ratios of 1.5 and 2.5 (Additional file 1: Appendix A), different allocation ratios (Additional file 1: Appendix B), and a setting in which one of the four covariates is binary (Additional file 1: Appendix C). We also conducted simulations in which treatment was assigned completely at random in the RCT population (Additional file 1: Appendix D) and simulations based on parameter settings from an actual clinical trial [29] (Additional file 1: Appendix G). To estimate the treatment effect, IPTW based on the PS is applied, and the odds ratio is estimated by a logistic regression model weighted by the IPTW weights.
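As an illustration of the weighted estimation step, the sketch below assumes that the weighted logistic regression contains only an intercept and the treatment indicator \(T\); the text does not list the model's terms. In that case the maximum weighted-likelihood estimate of the log odds ratio has a closed form based on the weighted event proportions in each arm. The function name is hypothetical, and standard errors are not computed because the variance estimator used in the study is not described here.

```python
import numpy as np


def weighted_log_odds_ratio(y, t, w):
    """Log odds ratio from a weighted logistic regression of Y on T alone.

    With only an intercept and a binary T the model is saturated, so the
    weighted-likelihood MLE equals the log odds ratio of the weighted
    event proportions in the two arms.
    """
    p1 = np.sum(w * y * t) / np.sum(w * t)              # weighted event rate, T = 1
    p0 = np.sum(w * y * (1 - t)) / np.sum(w * (1 - t))  # weighted event rate, T = 0
    return np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))
```

With all weights equal to 1 this reduces to the crude log odds ratio; with IPTW weights it estimates the marginal log odds ratio in the weighted population.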

The performance measures of the simulation are (1) the mean difference between the estimated and the true log odds ratio (bias), (2) the mean squared error (MSE), (3) the coverage of the 95% confidence interval (coverage), and (4) the type I error rate and power. The simulation data are generated under two scenarios, in which the distribution of covariates is either similar or not similar between the RCT and historical control data.
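The four performance measures can be computed from the replicate estimates as in this sketch. It assumes Wald-type 95% confidence intervals \(\hat\theta\pm1.96\,\text{SE}\) and a two-sided Wald test at the 5% level; the interval and test actually used in the study may differ. The rejection rate is the type I error rate when the true log odds ratio is 0 and the power otherwise. The function name is hypothetical.

```python
import numpy as np


def performance(est, se, true_log_or, z=1.959963984540054):
    """Summarise simulation replicates of the estimated log odds ratio.

    est, se: estimates and standard errors, one entry per replicate.
    Assumes Wald intervals est +/- z*se and a Wald test of log OR = 0.
    """
    est = np.asarray(est, dtype=float)
    se = np.asarray(se, dtype=float)
    lower, upper = est - z * se, est + z * se
    return {
        "bias": np.mean(est - true_log_or),
        "mse": np.mean((est - true_log_or) ** 2),
        "coverage": np.mean((lower <= true_log_or) & (true_log_or <= upper)),
        # Type I error rate if true_log_or == 0, power otherwise.
        "rejection_rate": np.mean(np.abs(est / se) > z),
    }
```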

Scenario (I)

In this situation, the distribution of covariates is similar between the RCT and historical control data. From the multivariate standard normal distribution, four covariates are generated for participant \(i\) as

$$\left\{X_{i1},X_{i2},X_{i3},X_{i4}\right\}\sim N\left(0,1\right). \tag{3}$$

Here, the true PS model \(\pi_{\textit{i},\textit{true}}\) is

$$\pi_{\textit{i},\textit{true}}=\mathrm{logit}\left\{\text{Pr}\left(T=1\mid X\right)\right\}=\beta_0+\beta_1X_{i1}+\beta_2X_{i2}+\beta_3X_{i3}+\beta_4X_{i4}, \tag{4}$$

and the parameters are \(\left\{{\beta }_{0}, { \beta }_{1}, { \beta }_{2},{ \beta }_{3},{ \beta }_{4}\right\}=\left\{{b}_{0}, 1.00, -0.50, 0.25, 0.10\right\}\). \({b}_{0}\) is a constant correction value corresponding to the treatment allocation ratio (Additional file 1: Appendix E). Based on Eq. (4), each participant’s treatment allocation is determined from the Bernoulli distribution:

$$T_i\sim\text{Bernoulli}\left\{\frac{\exp\left(\pi_{\textit{i},\textit{true}}\right)}{1+\exp\left(\pi_{\textit{i},\textit{true}}\right)}\right\}. \tag{5}$$

The model that generates outcome data \({y}_{i}\) is as follows:

$$y_i=\mathrm{logit}\left\{\text{Pr}\left(Y=1\mid X\right)\right\}=\alpha_0+\beta_{treat}T_i+\alpha_1X_{i1}+\alpha_2X_{i2}+\alpha_3X_{i3}+\alpha_4X_{i4}+\varepsilon_i/100, \tag{6}$$

where \(\left\{{\alpha }_{0},{ \alpha }_{1},{ \alpha }_{2},{ \alpha }_{3},{ \alpha }_{4}\right\}=\left\{{a}_{0}, 0.274, 0.137, -0.137, 0.137\right\}\). Here, \({\beta }_{treat}\) is the true log odds ratio of the treatment effect, and the error term \({\varepsilon }_{i}\sim N\left(0, 1\right)\) is generated independently for each participant. In addition, \({a}_{0}\) is a constant correction value corresponding to the outcome event rate (Additional file 1: Appendix E). Based on Eq. (6), each participant’s outcome \({Y}_{i}\) is determined from the Bernoulli distribution:

$$Y_i\sim\text{Bernoulli}\left\{\frac{\exp\left(y_i\right)}{1+\exp\left(y_i\right)}\right\}. \tag{7}$$
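A minimal sketch of one replicate of scenario (I), following Eqs. (3) to (7). The correction values \(b_0\) and \(a_0\) are defined in Additional file 1: Appendix E, which is not reproduced here. The helper `calibrate_intercept`, which chooses \(b_0\) so that the average true PS equals a target treated fraction (e.g. 1/4 for a 1:1:2 allocation), is a hypothetical stand-in for that appendix, and \(a_0\) is simply passed in.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


BETA = np.array([1.00, -0.50, 0.25, 0.10])       # beta_1..beta_4 in Eq. (4)
ALPHA = np.array([0.274, 0.137, -0.137, 0.137])  # alpha_1..alpha_4 in Eq. (6)


def calibrate_intercept(lin, target, lo=-20.0, hi=20.0, n_iter=100):
    """Hypothetical helper: bisection for c with mean(expit(c + lin)) == target."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.mean(expit(mid + lin)) < target:
            lo = mid          # average PS too small: raise the intercept
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_scenario_1(n, b0, a0, beta_treat, rng):
    """One simulated dataset (X, T, Y) for scenario (I)."""
    X = rng.standard_normal((n, 4))                       # Eq. (3)
    pi_true = b0 + X @ BETA                               # Eq. (4)
    T = rng.binomial(1, expit(pi_true))                   # Eq. (5)
    eps = rng.standard_normal(n)                          # epsilon_i ~ N(0, 1)
    y_lin = a0 + beta_treat * T + X @ ALPHA + eps / 100   # Eq. (6)
    Y = rng.binomial(1, expit(y_lin))                     # Eq. (7)
    return X, T, Y
```

For example, `simulate_scenario_1(900, b0, a0, np.log(2.0), np.random.default_rng(2))` draws one dataset with a true odds ratio of 2.0.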

Scenario (II)

In this situation, the distribution of covariates is not similar between the RCT and historical control data. As with scenario (I), after generating covariates from the multivariate standard normal distribution,

$$\left\{X'_{i1},X'_{i2},X'_{i3},X'_{i4}\right\}\sim N\left(0,1\right), \tag{8}$$

each covariate in the RCT data is transformed as follows:

$$\left\{X_{i1}=X'_{i1}-1,\; X_{i2}=X'_{i2}\times 0.7,\; X_{i3}=\ln\left|X'_{i3}\right|,\; X_{i4}=X'_{i4}\right\}. \tag{9}$$

For the historical control data, the generated covariates \({X}_{i1},{ X}_{i2},{ X}_{i3},\mathrm{and}\, {X}_{i4}\) are used without transformation, that is,

$$\left\{X_{i1}=X'_{i1},\; X_{i2}=X'_{i2},\; X_{i3}=X'_{i3},\; X_{i4}=X'_{i4}\right\}. \tag{10}$$
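The covariate generation of Eqs. (8) to (10) can be sketched as below. The numbers of RCT and historical control participants are passed in as fixed counts, an assumption made for illustration, and the function name is hypothetical.

```python
import numpy as np


def scenario_2_covariates(n_rct, n_hc, rng):
    """Scenario (II) covariates: transformed for RCT rows (Eq. 9), raw for historical controls (Eq. 10)."""
    Xp_rct = rng.standard_normal((n_rct, 4))   # X' for RCT participants, Eq. (8)
    X_rct = np.column_stack([
        Xp_rct[:, 0] - 1.0,                    # X_1 = X'_1 - 1
        Xp_rct[:, 1] * 0.7,                    # X_2 = 0.7 * X'_2
        np.log(np.abs(Xp_rct[:, 2])),          # X_3 = ln|X'_3|
        Xp_rct[:, 3],                          # X_4 = X'_4
    ])
    X_hc = rng.standard_normal((n_hc, 4))      # Eq. (10): used as generated
    X = np.vstack([X_rct, X_hc])
    # Indicator X_r: +1 for RCT rows, -1 for historical control rows.
    x_r = np.concatenate([np.ones(n_rct), -np.ones(n_hc)])
    return X, x_r
```

Under Eq. (9) the RCT covariates have mean about -1 for \(X_1\), standard deviation 0.7 for \(X_2\), and mean about -0.64 for \(X_3\), since \(E\ln|Z|=-(\gamma+\ln 2)/2\approx-0.635\) for a standard normal \(Z\) (\(\gamma\) is Euler's constant); the historical control covariates stay centred at 0.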

Here, the true PS model \(\pi_{\textit{i},\textit{true}}^{\mathit\ast}\) is provided as

$$\pi^{*}_{\textit{i},\textit{true}}=\mathrm{logit}\left\{\text{Pr}\left(T=1\mid X,X_r\right)\right\}=\beta_0+\beta_1X_{i1}+\beta_2X_{i2}+\beta_3X_{i3}+\beta_4X_{i4}+\beta_rX_{ir}, \tag{11}$$

where \(\left\{{\beta }_{0}, { \beta }_{1}, { \beta }_{2},{ \beta }_{3},{ \beta }_{4},{ \beta }_{r}\right\}=\left\{\left({b}_{0}-{b}_{r}\right), 1.00, -0.50, 0.25, 0.10, {b}_{r}\right\}\). Here, \({b}_{r}\) is the coefficient of the indicator variable \({X}_{r}\) in the true PS model for each treatment allocation ratio (Additional file 1: Appendix F). These parameters are calculated simultaneously from the true PS model for the RCT data only,

$$\pi_{\textit{i},\textit{true},\textit{RCT}}=\mathrm{logit}\left\{\text{Pr}\left(T=1\mid X\right)\right\}=b_0+1.00X_{i1}-0.50X_{i2}+0.25X_{i3}+0.10X_{i4}; \tag{12}$$

the true PS model for the historical control data only,

$$\pi_{\textit{i},\textit{true},\textit{HC}}=\mathrm{logit}\left\{\text{Pr}\left(T=1\right)\right\}=0; \tag{13}$$

and the covariates of each participant (see Additional file 1: Appendix F for the calculation method). Based on Eq. (11), treatment allocation \({T}_{i}\) for each participant is determined from the Bernoulli distribution:

$$T_i\sim\text{Bernoulli}\left\{\frac{\exp\left(\pi^{*}_{\textit{i},\textit{true}}\right)}{1+\exp\left(\pi^{*}_{\textit{i},\textit{true}}\right)}\right\}. \tag{14}$$
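Substituting the parameter values listed after Eq. (11) for each value of \(X_{ir}\) makes the role of \(b_r\) explicit: for RCT participants the model reduces exactly to Eq. (12), while for historical controls the intercept becomes \(b_0-2b_r\). This is a direct algebraic consequence of the stated parameters; how \(b_r\) itself is obtained is given in Additional file 1: Appendix F.

$$\begin{aligned}
X_{ir}=1\ \text{(RCT)}:\quad \pi^{*}_{\textit{i},\textit{true}}&=(b_0-b_r)+b_r+1.00X_{i1}-0.50X_{i2}+0.25X_{i3}+0.10X_{i4}=\pi_{\textit{i},\textit{true},\textit{RCT}},\\
X_{ir}=-1\ \text{(historical control)}:\quad \pi^{*}_{\textit{i},\textit{true}}&=(b_0-2b_r)+1.00X_{i1}-0.50X_{i2}+0.25X_{i3}+0.10X_{i4}.
\end{aligned}$$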

Outcome data \({y}_{i}\) are generated by

$$y_i=\mathrm{logit}\left\{\text{Pr}\left(Y=1\mid X\right)\right\}=\alpha_0+\beta_{treat}T_i+\alpha_1X_{i1}+\alpha_2X_{i2}+\alpha_3X_{i3}+\alpha_4X_{i4}+\alpha_rX_{ir}+\varepsilon_i/100, \tag{15}$$

where \(\left\{{\alpha }_{0},{ \alpha }_{1},{ \alpha }_{2},{ \alpha }_{3},{ \alpha }_{4},{ \alpha }_{r}\right\}=\left\{{a}_{0},0.274, 0.137, -0.137, 0.137, 0.137\right\}\). Based on Eq. (15), each participant’s outcome \({Y}_{i}\) is determined from the Bernoulli distribution:

$$Y_i\sim\text{Bernoulli}\left\{\frac{\exp\left(y_i\right)}{1+\exp\left(y_i\right)}\right\}. \tag{16}$$
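A sketch of Eqs. (15) and (16), assuming NumPy. The correction value `a0` comes from Additional file 1: Appendix E and is passed in, and the function name is hypothetical. Because \(X_r=\pm1\), the term \(\alpha_rX_{ir}\) makes the log odds of an event \(2\alpha_r=0.274\) higher for RCT participants than for historical controls with the same covariates and treatment.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


ALPHA = np.array([0.274, 0.137, -0.137, 0.137])  # alpha_1..alpha_4 in Eq. (15)
ALPHA_R = 0.137                                  # alpha_r, coefficient of X_r


def scenario_2_outcome(X, x_r, T, a0, beta_treat, rng):
    """Binary outcomes for scenario (II) from Eqs. (15)-(16).

    X: (n, 4) covariates, x_r: (n,) +1/-1 indicator, T: (n,) 0/1 treatment.
    """
    eps = rng.standard_normal(X.shape[0])        # epsilon_i ~ N(0, 1)
    y_lin = a0 + beta_treat * T + X @ ALPHA + ALPHA_R * x_r + eps / 100
    return rng.binomial(1, expit(y_lin))         # Eq. (16)
```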

Results

The usual number of participants

In scenario (I), in which the distribution of covariates is similar between the RCT and historical control data, little difference was found between the proposed and conventional methods in bias, MSE, coverage of the 95% confidence interval, and type I error rate (Table 1).

Table 1 Scenario (I): performance of the estimated propensity score (PS) model

On the other hand, in scenario (II), wherein the distribution of covariates is not similar between the RCT and historical control data, the proposed method tended to have a smaller bias, coverage of 95% confidence interval closer to 95%, and a type I error rate closer to 5%. In addition, there was not much difference between the proposed and conventional methods in the MSE (Table 2).

Table 2 Scenario (II): performance of the estimated propensity score (PS) model

When the allocation ratios between the RCT treatment group, RCT control group, and historical control group were 2:1:3 (Additional file 1: Appendix Table B.3), 1:1:4 (Additional file 1: Appendix Table B.5), 2:1:6 (Additional file 1: Appendix Table B.6), 2:1:1 (Additional file 1: Appendix Table B.11), and 3:1:2 (Additional file 1: Appendix Table B.12)—that is, different but not extremely skewed—the same tendency in all performance measurements was observed as in the allocation ratio of 1:1:2. However, when the allocation ratios were 9:1:10 (Additional file 1: Appendix Table B.4), 9:1:20 (Additional file 1: Appendix Table B.7), 1:1:18 (Additional file 1: Appendix Table B.8), 2:1:27 (Additional file 1: Appendix Table B.9), and 9:1:90 (Additional file 1: Appendix Table B.10)—that is, extremely skewed—the bias and MSE had increased.

In addition, when one of the four covariates was binary (Additional file 1: Appendix Table C.1), the same trends were observed for all performance measures as when all four covariates were continuous.

Moreover, the simulation in which the treatment variable in the RCT population was generated independently of the covariates (Additional file 1: Appendix Table D.1) showed almost the same results as those in the main text. The simulation applying the parameter settings of an actual clinical trial (Additional file 1: Appendix Table G.1) also gave results similar to those in the main text.

Small number of participants

In the case where the total number of participants is n = 200, the same tendency was observed in all performance measurements as in the case where the number of participants is n = 900.

That is, in scenario (I), in which the distribution of covariates is similar between the RCT and historical control data, little difference was found between the proposed and conventional methods in bias, MSE, coverage of the 95% confidence interval, and type I error rate (Table 3).

Table 3 Scenario (I): performance of the estimated propensity score (PS) model by simulation setting assuming n = 200

In scenario (II), in which the distribution of covariates is not similar between the RCT and historical control data, the proposed method tended to have smaller bias, coverage of the 95% confidence interval closer to 95%, and a type I error rate closer to 5%. There was little difference between the proposed and conventional methods in MSE (Table 4).

Table 4 Scenario (II): performance of the estimated propensity score (PS) model by simulation setting assuming n = 200

Discussion

The results of this study suggest that, when the distribution of covariates is similar between the RCT and historical control data (scenario (I)), including the information on whether the participant data are RCT data or not in the PS model does not affect the estimation bias of the treatment effect. In contrast, when the distribution of covariates is not similar between the RCT and historical control data (scenario (II)), the proposed PS method is recommended, because including this information improves the performance in estimating the treatment effect.

Regarding the relationship between the outcome event rate and performance in estimating the treatment effect, it is reasonable to expect that the higher the outcome event rate, the better the estimation performance. When the distributions of covariates are similar, this tendency was observed for both the proposed and conventional methods, so the treatment effect could be estimated appropriately with either method. When the distributions of covariates are not similar, the same tendency was observed with the proposed method, so the treatment effect is considered to be estimated appropriately. With the conventional method, however, performance was higher at lower outcome event rates, contrary to this expectation, so the treatment effect may not be estimated appropriately.

Moreover, when the allocation ratio between the RCT treatment group, RCT control group, and historical control group is changed but not extremely skewed, the same conclusions hold as for the 1:1:2 allocation ratio. Namely, when the distributions of covariates are similar, including the information on whether the data are RCT data or not in the PS model had little effect on the performance of estimating the treatment effect, and when the distributions of covariates are not similar, including this information improved the performance. In contrast, when the allocation ratio was extremely skewed, bias and MSE increased substantially and the treatment effect could not be estimated appropriately. This is because the number of participants in the RCT control group was extremely small under such allocation ratios.
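The effect of skewed allocation on the size of the RCT control group can be checked with simple arithmetic. The sketch below computes expected group sizes for a given ratio, assuming the total of n = 900 is kept for the ratios in Additional file 1: Appendix B; because treatment allocation in the simulation is drawn from a Bernoulli distribution, realized sizes vary around these values. The function name is hypothetical.

```python
def expected_group_sizes(n, ratio):
    """Expected sizes of (RCT treatment, RCT control, historical control).

    n:     total number of participants.
    ratio: allocation ratio as a tuple, e.g. (1, 1, 2).
    """
    total = sum(ratio)
    return tuple(n * r / total for r in ratio)


for ratio in [(1, 1, 2), (9, 1, 10), (9, 1, 20), (1, 1, 18), (2, 1, 27), (9, 1, 90)]:
    print(ratio, expected_group_sizes(900, ratio))
```

Under this assumption, 1:1:2 gives an expected 225 RCT control participants, whereas 2:1:27 gives 30 and 9:1:90 gives only 9.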

In other settings, namely when the total number of participants is small or when the covariates include binary data, the same conclusions hold as when the total number of participants is n = 900 and all covariates are continuous. The same trend is suggested when treatment in the RCT population is assigned completely at random, independently of the covariates. In other words, when the distribution of covariates is similar between the RCT and historical control data, there is little difference in performance between the proposed and conventional methods, and when the distribution differs between the two kinds of data, the proposed method performs better. In addition, the same argument appears to apply to data with the variability of an actual clinical trial.

For these reasons, when combining RCT and historical control data in a clinical trial, it is important to consider whether the distributions of important baseline characteristics that influence the outcome are similar. Moreover, for appropriate use of historical control data, it is useful to apply the proposed PS model including \(X_\textit{r}\) while assessing possible differences. However, when using historical control data to reinforce the RCT control group, it is necessary to simulate several allocation-ratio patterns from the planning stage of the clinical trial, evaluate how small the RCT control group can be while maintaining acceptable performance, and apply the method with caution.

In addition, because the proposed method relies on the PS, the possibility of unmeasured confounding, that is, whether the covariates used in the PS model are sufficient, should also be considered. The method also assumes a single historical control dataset; differences among two or more historical control datasets were not considered, which is a limitation. Furthermore, this study focused on the treatment effect in the entire population, including the historical control data, and investigated a method for estimating the average treatment effect (ATE). However, in some situations it may be preferable to estimate the average treatment effect on the treated (ATT) in the RCT population or treatment group, and evaluating performance in such cases remains a topic for future work. With attention to issues such as inflation of the type I error rate, it is possible to appropriately reduce the number of participants assigned to the RCT control group. We believe that this will help improve the efficiency of clinical trials, address ethical problems, and thus save more people.

Conclusions

In clinical trials utilizing historical control data, considering information on whether the data are RCT data or not in the proposed PS model is useful for appropriately estimating the treatment effect, even when it is not known whether the RCT data and the historical control data are similar. Promotion of appropriate utilization of historical control data will contribute to the realization of better medical care.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Change history

  • 04 March 2024

    The author requested to hyperlink "Additional file 1" instead of "Additional file 1: Appendix A" in the Supplementary Information Section.

Abbreviations

RCT:

Randomized controlled trial

PS:

Propensity score

IPTW:

Inverse probability of treatment weighting

MSE:

Mean squared error

Coverage:

Coverage of 95% confidence interval

References

  1. International Council for Harmonisation (ICH). Guidance for industry E9 statistical principles for clinical trials. https://www.fda.gov/media/71336/download. Accessed 30 Mar 2023.

  2. U.S. Food & Drug Administration. Framework for FDA’s real world evidence program. https://www.fda.gov/media/120060/download. Accessed 30 Mar 2023.

  3. European Medicines Agency. Discussion paper: use of patient disease registries for regulatory purposes - methodological and operational considerations. https://www.ema.europa.eu/documents/other/discussion-paper-use-patient-disease-registries-regulatory-purposes-methodological-operational_en.docx. Accessed 30 Mar 2023.

  4. Pharmaceuticals and Medical Devices Agency. Notification: basic principles on utilization of registry for applications. https://www.pmda.go.jp/files/000240806.pdf. Accessed 30 Mar 2023.

  5. Pocock SJ. The combination of randomized and historical controls in clinical trials. J Chronic Dis. 1976;29(3):175–88. https://doi.org/10.1016/0021-9681(76)90044-8.

  6. van Rosmalen J, Dejardin D, van Norden Y, et al. Including historical data in the analysis of clinical trials: is it worth the effort? Stat Methods Med Res. 2018;27(10):3167–82. https://doi.org/10.1177/0962280217694506.

  7. U.S. Food & Drug Administration. Rare diseases: natural history studies for drug development guidance for industry. https://www.fda.gov/media/122425/download. Accessed 30 Mar 2023.

  8. International Council for Harmonisation (ICH). Guidance for industry E10 choice of control group in clinical trials. https://www.fda.gov/media/71349/download. Accessed 30 Mar 2023.

  9. Levenson M, He W, Chen J, et al. Biostatistical considerations when using RWD and RWE in clinical studies for regulatory purposes: a landscape assessment. Stat Biopharm Res. 2023;15(1):3–13. https://doi.org/10.1080/19466315.2021.1883473.

  10. Sacks H, Chalmers TC, Smith H Jr. Randomized versus historical controls for clinical trials. Am J Med. 1982;72(2):233–40. https://doi.org/10.1016/0002-9343(82)90815-4.

  11. Viele K, Berry S, Neuenschwander B, et al. Use of historical control data for assessing treatment effects in clinical trials. Pharm Stat. 2014;13(1):41–54. https://doi.org/10.1002/pst.1589.

  12. Chen MH, Ibrahim JG. Power prior distributions for regression models. Stat Sci. 2000;15(1):46–60. https://doi.org/10.1214/ss/1009212673.

  13. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian approaches to clinical trials and health-care evaluation. Wiley; 2004. https://doi.org/10.1002/0470092602.

  14. Neuenschwander B, Capkun-Niggli G, Branson M, et al. Summarizing historical information on controls in clinical trials. Clin Trials. 2010;7(1):5–18. https://doi.org/10.1177/1740774509356002.

  15. Takeda K, Oba M, Kakizume T, et al. Bayesian approach to utilize historical control data in clinical trials. Jpn J Biom. 2015;36(1):25–50. https://doi.org/10.5691/jjb.36.25.

  16. Galwey NW. Supplementation of a clinical trial by historical control data: is the prospect of dynamic borrowing an illusion? Stat Med. 2017;36(6):899–916. https://doi.org/10.1002/sim.7180.

  17. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55. https://doi.org/10.1093/biomet/70.1.41.

  18. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res. 2011;46(3):399–424. https://doi.org/10.1080/00273171.2011.568786.

  19. Dehejia RH, Wahba S. Propensity score-matching methods for non-experimental causal studies. Rev Econ Stat. 2002;84(1):151–61. https://doi.org/10.1162/003465302317331982.

  20. Lin J, Gamalo-Siebers M, Tiwari R. Propensity score matched augmented controls in randomized clinical trials: a case study. Pharm Stat. 2018;17(5):629–47. https://doi.org/10.1002/pst.1879.

  21. Hirano K, Imbens GW, Ridder G. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica. 2003;71(4):1161–89. https://doi.org/10.1111/1468-0262.00442.

  22. D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17(19):2265–81. https://doi.org/10.1002/(sici)1097-0258(19981015)17:19<2265::aid-sim918>3.0.co;2-b.

  23. Austin PC, Mamdani MM. A comparison of propensity score methods: a case-study estimating the effectiveness of post-AMI statin use. Stat Med. 2006;25(12):2084–106. https://doi.org/10.1002/sim.2328.

  24. Stuart EA, Cole SR, Bradshaw CP, et al. The use of propensity scores to assess the generalizability of results from randomized trials. J R Stat Soc Ser A Stat Soc. 2011;174(2):369–86. https://doi.org/10.1111/j.1467-985X.2010.00673.

  25. Rosenman ETR, Owen AB, Baiocchi M, et al. Propensity score methods for merging observational and experimental datasets. Stat Med. 2022;41(1):65–86. https://doi.org/10.1002/sim.9223.

  26. Fu C, Pang H, Zhou S, et al. Covariate handling approaches in combination with dynamic borrowing for hybrid control studies. Pharm Stat. 2023;22(4):619–32. https://doi.org/10.1002/pst.2297.

  27. Austin PC. The performance of different propensity score methods for estimating marginal odds ratios. Stat Med. 2007;26(16):3078–94. https://doi.org/10.1002/sim.2781.

  28. Kang JDY, Schafer JL. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci. 2007;22(4):523–39. https://doi.org/10.1214/07-STS227.

  29. Torrelo A, Rewerska B, Galimberti M, et al. Efficacy and safety of baricitinib in combination with topical corticosteroids in paediatric patients with moderate-to-severe atopic dermatitis with an inadequate response to topical corticosteroids: results from a phase III, randomized, double-blind, placebo-controlled study (BREEZE-AD PEDS). Br J Dermatol. 2023;189(1):23–32. https://doi.org/10.1093/bjd/ljad096.

Acknowledgements

We are grateful to the teachers who gave enthusiastic guidance and insightful comments, as well as to the laboratory members of the Department of Clinical Medicine (Biostatistics), Kitasato University, for participating in meaningful discussions. I wish to thank my parents and siblings who understood and supported my research activities and my grandparents who cheered for me.

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

SK and MT approved the final version of the manuscript to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. SK and MT interpreted the results. SK contributed to the design of this study, analyzed the data, and drafted the manuscript.

Corresponding author

Correspondence to Saki Kanamori.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Appendix A. Simulation setting assuming odds ratios of 1.5 and 2.5. Appendix B. Simulation setting assuming allocation ratios other than 1:1:2 among the RCT treatment group, the RCT control group, and the historical control data. Appendix C. Simulation setting assuming that one of the covariates is binary. Appendix D. Simulation setting assuming randomized assignment of the treatment variables. Appendix E. Treatment allocation probability correction value b0 and outcome event rate correction value a0. Appendix F. Method for calculating the true PS model in scenario (II). Appendix G. Simulation based on parameter settings from an actual clinical trial.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Kanamori, S., Takeuchi, M. Treatment effect estimation using the propensity score in clinical trials with historical control. BMC Med Res Methodol 24, 47 (2024). https://doi.org/10.1186/s12874-023-02127-9

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12874-023-02127-9

Keywords