 Research
 Open access
Model-based standardization using multiple imputation
BMC Medical Research Methodology volume 24, Article number: 32 (2024)
Abstract
Background
When studying the association between treatment and a clinical outcome, a parametric multivariable model of the conditional outcome expectation is often used to adjust for covariates. The treatment coefficient of the outcome model targets a conditional treatment effect. Model-based standardization is typically applied to average the model predictions over the target covariate distribution, and generate a covariate-adjusted estimate of the marginal treatment effect.
Methods
The standard approach to model-based standardization involves maximum-likelihood estimation and use of the non-parametric bootstrap. We introduce a novel, general-purpose, model-based standardization method based on multiple imputation that is easily applicable when the outcome model is a generalized linear model. We term our proposed approach multiple imputation marginalization (MIM). MIM consists of two main stages: the generation of synthetic datasets and their analysis. MIM accommodates a Bayesian statistical framework, which naturally allows for the principled propagation of uncertainty, integrates the analysis into a probabilistic framework, and allows for the incorporation of prior evidence.
Results
We conduct a simulation study to benchmark the finite-sample performance of MIM in conjunction with a parametric outcome model. The simulations provide proof-of-principle in scenarios with binary outcomes, continuous-valued covariates, a logistic outcome model and the marginal log odds ratio as the target effect measure. When parametric modeling assumptions hold, MIM yields unbiased estimation in the target covariate distribution, valid coverage rates, and precision and efficiency similar to those of the standard approach to model-based standardization.
Conclusion
We demonstrate that multiple imputation can be used to marginalize over a target covariate distribution, providing appropriate inference with a correctly specified parametric outcome model and offering statistical performance comparable to that of the standard approach to modelbased standardization.
Background
There has been active debate on whether marginal or conditional estimands should be preferred when estimating relative treatment effects [1,2,3,4,5,6]. Many researchers argue that marginal estimands are more appropriate inferential targets for decisions at the population level [1,2,3]. The distinction between marginal and conditional treatment effects is particularly important for non-collapsible measures such as the odds ratio and the hazard ratio. For such measures, the population-level marginal effect cannot be expressed as a weighted average of individual- or subgroup-level conditional effects. Almost invariably, marginal and conditional estimands do not coincide for non-collapsible effect measures, even in the absence of confounding and effect measure modification [7,8,9,10,11,12].
In the estimation of marginal treatment effects, covariate adjustment is desirable across a range of settings: (1) it is applied in the analysis of randomized controlled trials (RCTs) to correct for “chance” covariate imbalances and increase power, precision and efficiency [13]; (2) it allows for confounding control in the analysis of observational studies [14]; and (3) it accounts for covariate differences between multiple studies in indirect treatment comparisons and transportability analyses [15,16,17,18,19]. This article focuses on the latter scenario. Nevertheless, the methodological findings are also applicable to covariate adjustment between the treatment arms of a single comparative study.
A popular approach to covariate adjustment involves fitting a parametric multivariable model of the conditional outcome mean given treatment and baseline covariates. The treatment coefficient of the model targets a conditional effect. So-called model-based standardization, G-computation or marginalization approaches are required to integrate or average the conditional outcome model over the target covariate distribution, and produce a covariate-adjusted estimate of the marginal treatment effect [16, 20,21,22,23,24,25,26,27,28,29]. The standard approach to model-based standardization uses maximum-likelihood estimation to fit the outcome model and the non-parametric bootstrap for variance estimation [16, 25,26,27,28].
We introduce a novel, general-purpose method for model-based standardization stemming from the ideas underlying multiple imputation [30]. Despite the close relationships between the methodologies, these have largely been developed separately, with some exceptions [31]. We build a link in this article,^{Footnote 1} terming our proposed approach multiple imputation marginalization (MIM). As opposed to the standard version of model-based standardization, MIM accommodates a Bayesian statistical framework, which naturally allows for the principled propagation of uncertainty, readily handles missingness in the patient-level data, integrates the analysis into a probabilistic framework, and permits the incorporation of prior evidence and other contextual information.
We conduct a simulation study to benchmark the finite-sample performance of MIM in conjunction with a parametric outcome model. The simulations provide proof-of-principle in scenarios with binary outcomes, continuous-valued covariates, a logistic outcome model and the marginal log odds ratio as the target effect measure. When parametric modeling assumptions hold, MIM yields unbiased estimation in the target covariate distribution. Code to implement the MIM methodology in R is provided in Additional file 1.
Methods
We wish to transport the results of a comparative “index” study to a target distribution of covariates. We assume that the target is characterized by a dataset that is external to the index study. In practice, this could belong to an observational study or be derived from secondary healthcare data sources (e.g. disease registries, cohort studies, insurance claims databases or electronic health records). Such administrative datasets are typically larger, less selected, and more representative of target populations of policy interest than the participants recruited by RCTs [34,35,36].
For instance, in drug development, a pivotal Phase III RCT is typically conducted pre-marketing authorization to obtain regulatory approval. Such a trial may have relatively narrow selection criteria, to enhance statistical precision and power in efficacy and safety testing [37,38,39]. Policy-makers may be interested in transporting inferences to a “real-world” target covariate distribution, which is more diverse or heterogeneous in composition, and more representative of the patients who will receive the intervention in routine clinical practice [40].
Let \(S=1\) denote the index study and let \(S=2\) denote the external target. Adopting the potential outcomes framework [41], the target marginal average treatment effect estimand for the MIM procedure described in this article is a contrast between the, possibly transformed, means of potential outcome distributions:

$$\textrm{TATE} = g\left( E\left( Y^1 \right) \right) - g\left( E\left( Y^0 \right) \right) , \qquad (1)$$
where \(Y^t\) denotes the potential outcome that would have been observed for a subject assigned to intervention \(T=t\), with \(t \in \{0, 1\}\), \(E(\cdot )\) represents an expectation taken over the distribution of potential outcomes in \(S=2\), and \(g(\cdot )\) is an appropriate “link” function, e.g. the identity, log or logit, mapping the mean potential outcomes onto an unrestricted (plus/minus infinity) range. The target estimand in Eq. 1 is the average treatment effect, constructed on an additive scale, e.g. the mean difference, (log) risk ratio or (log) odds ratio scale, had everyone in the target been assigned \(T=1\) versus \(T=0\).
We briefly outline the data requirements of the MIM procedure. Individual-level data \(\mathcal {D}=\left( \varvec{x}, \varvec{t}, \varvec{y} \right)\) for a comparative index study, randomized or non-randomized, are available. Here, \(\varvec{x}\) is an \(N \times K\) matrix of clinical or demographic baseline covariates, where N is the number of participants in the study and K is the number of baseline covariates. Each subject \(n=1,2,\dots ,N\) has a row vector \(\varvec{x}_n = \left( x_{n,1}, x_{n,2}, \dots , x_{n,K}\right)\) of K covariates. We let \(\varvec{y} = \left( y_1, y_2, \dots , y_N\right)\) denote a vector of clinical outcomes and \(\varvec{t}=\left( t_1, t_2, \dots , t_N\right)\) denote a binary treatment indicator vector, with each entry taking the value zero or one. We assume that \(\mathcal {D}\) has no missing values, but MIM can be readily adapted to address this issue, as is illustrated in Additional file 1.
The target dataset contains a matrix of covariates \(\varvec{x}^{tar}\) of dimensions \(N^{tar} \times K\), where \(N^{tar}\) is the number of subjects and K is the number of covariates. We assume that all K covariates in the index study are available in the target. Each subject has a row vector \(\varvec{x}_i^{tar}=\left( x_{i,1}^{tar}, x_{i,2}^{tar}, \dots , x_{i,K}^{tar}\right)\) of K covariates. Individual-level outcomes under the treatments being studied in the index trial are assumed unavailable in the target, as would be the case for interventions evaluated in the pre-marketing authorization setting.
In the scenario described in this article, standardization is performed with respect to an external data source, and the aim is to estimate marginal treatment effects in an external covariate distribution. This is typically the case in transportability analyses translating inferences from trials lacking external validity to the target population for decision-making, or in covariate-adjusted indirect comparisons transporting relative effects across a connected network of trials.
Nevertheless, as illustrated in Additional file 1, it is also possible to perform standardization over the covariate distribution observed in the index study. This avoids extrapolation into an external data source and may be useful when adjusting for covariate imbalances between treatment arms within randomized or non-randomized comparative studies. Within a randomized experiment, covariate adjustment is not necessary for unbiased estimation of the marginal treatment effect, but can be used to increase precision, i.e. reduce standard errors [13, 42]. Within a non-randomized study, covariate adjustment is necessary to remove confounding bias [43].
Multiple imputation marginalization
Conceptually, MIM consists of two separate stages: (1) the generation (synthesis) of synthetic datasets; and (2) the analysis of the generated datasets. The synthesis is separated from the analysis: only after the synthesis has been completed is the marginal effect of treatment on the outcome estimated. This is analogous to the separation between the imputation and analysis stages in multiple imputation.
Multiple imputation is a simulation technique that, arguably, is fundamentally Bayesian [30, 44, 45]. Its original development was grounded in Bayesian modeling, with imputed outcomes derived, at least conceptually, from a posterior predictive distribution. Computational tools such as Markov chain Monte Carlo (MCMC) and the Gibbs sampler only rose to prominence in the statistical literature several years after Rubin’s seminal paper [46]. Consequently, the typical practical implementation of multiple imputation is based on a hybrid approach [30]: “think like a Bayesian, do like a frequentist”.
Interestingly, our standardization problem can be conceptualized as a missing data problem. Outcomes for the subjects in the index study are observed, but outcomes in the target population, under the treatments examined in the index study, are systematically missing. MIM standardizes over the target by replacing the missing outcomes with a set of plausible values, conditional on some prespecified imputation mechanism. Extending the parallel with the missing data literature, MIM relies on a missing-at-random-like assumption: missing outcomes in the target are assumed conditionally exchangeable with those observed in the index study, conditioning on the adjustment model used for standardization.
MIM sits within a Bayesian framework by characterizing probabilistic relationships among a set of variables, and adopts a simulation approach. Figure 1 shows a Bayesian directed acyclic graph (DAG) summarizing the general MIM structure and the links between its modules. In this graphical representation, the nodes represent variables; single arrows indicate probabilistic relationships and double arrows indicate deterministic functions. The plate notation indicates repeated analyses. We return to Fig. 1 and provide more detailed explanations of the notation and the individual modules throughout this section.
Generation of synthetic datasets: a missing data problem
The first stage, synthetic data generation, consists of two steps. Initially, the first-stage regression captures the relationship between the outcome \(\varvec{y}\) and the covariates \(\varvec{x}\) and treatment \(\varvec{t}\) in the patient-level data for the index study. In the outcome prediction step, predicted outcomes for each treatment are generated in the target by drawing from the posterior predictive distribution of outcomes, given the observed predictor-outcome relationships in the index study, the set treatment and the target covariate distribution.
First-stage regression
Firstly, a multivariable regression of the observed outcome \(\varvec{y}\) on the baseline covariates \(\varvec{x}\) and treatment \(\varvec{t}\) is fitted to the subject-level data of the index study:

$$g\left( \mu _n \right) = \beta _0 + \varvec{x}_n \varvec{\beta }_{\varvec{1}} + \left( \beta _t + \varvec{x}_n \varvec{\beta }_{\varvec{2}} \right) t_n, \qquad (2)$$
where \(\mu _n\) is the conditional outcome expectation of subject n on the natural scale (e.g., the probability scale for binary outcomes), \(g\left( \cdot \right)\) is an appropriate link function, \(\beta _0\) is the intercept, \(\varvec{\beta }_{\varvec{1}}\) and \(\varvec{\beta }_{\varvec{2}}\) are vectors of regression coefficients, and the treatment coefficient \(\beta _t\) targets a conditional effect at baseline, when the covariates are zero. The model specification assumes that the covariates are prognostic of outcome at the individual level. Due to the presence of treatmentcovariate interactions, the covariates are also assumed to be (conditional) effect measure modifiers, i.e., predictive of treatment effect heterogeneity, at the individual level on the linear predictor scale.
The conditional outcome model in Eq. 2 will be our “working”, “nuisance” or “imputation” model from now onward. We consider this to be a parametric model within a generalized linear modeling framework. In logistic regression, the link function \(g\left( \mu _n\right) =\text {logit}\left( \mu _n\right) = \ln \left[ \mu _n/\left( 1-\mu _n\right) \right]\) is adopted. Other choices are possible in practice, such as the identity link for linear regression or the log link for Poisson regression. The conditional outcome model is to be estimated using a Bayesian approach. We shall assume that efficient simulation-based sampling methods such as MCMC are used. Prior distributions for the regression coefficients would have to be specified, potentially using contextual information.
When standardizing with respect to an external target and/or when the index study is non-randomized, one is reliant on correct specification of the outcome model for unbiased estimation. In the former case, there is particular interest in modeling covariate-treatment product terms (“interactions”) to capture (conditional) effect measure modification. In the latter case, the outcome model should adjust for potential confounders. Time and care should be dedicated to model-building, while being mindful of erroneous extrapolation outside the covariate space observed in the index study [47].
We shall assume that a single parametric outcome model is estimated, including treatment-covariate product terms to capture treatment effect heterogeneity at the individual level. An alternative strategy is to postulate two separate outcome models, one for each treatment group in the index comparative study [48]. While such an approach allows for individual-level treatment effect heterogeneity over all the baseline covariates included in the models, it prevents borrowing information across treatment groups.
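A minimal sketch of the Bayesian fit of the first-stage regression in Eq. 2 is given below, using a random-walk Metropolis sampler as a simple stand-in for the efficient MCMC samplers mentioned above. The sketch is in Python rather than the paper's R implementation (Additional file 1); the simulated index-study data, the coefficient values and the vague normal priors are all our own assumptions, chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical index study: one covariate, randomized binary treatment,
# binary outcome generated from the logistic model in Eq. 2
N = 600
x = rng.normal(size=N)
t = rng.binomial(1, 0.5, size=N)
beta_true = np.array([-0.3, 0.8, -1.0, 0.4])       # (b0, b1, bt, b2), assumed
eta_true = beta_true[0] + beta_true[1] * x + (beta_true[2] + beta_true[3] * x) * t
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))

X = np.column_stack([np.ones(N), x, t, x * t])     # design matrix for Eq. 2

def log_post(beta):
    """Bernoulli log-likelihood plus vague N(0, 10^2) priors on each coefficient."""
    eta = X @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta)) - np.sum(beta**2) / 200.0

# Minimal random-walk Metropolis sampler (a stand-in for dedicated MCMC software)
beta, lp = np.zeros(4), log_post(np.zeros(4))
draws = []
for i in range(20_000):
    prop = beta + rng.normal(0.0, 0.05, size=4)    # symmetric proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:       # Metropolis acceptance step
        beta, lp = prop, lp_prop
    if i >= 10_000:                                # discard burn-in
        draws.append(beta.copy())
draws = np.array(draws)                            # L x 4 matrix of posterior draws
```

The retained rows of `draws` play the role of the converged MCMC draws \(\varvec{\beta }^{(l)}\) used in the outcome prediction step.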
Outcome prediction
In this step, we generate predicted outcomes for the treatments under investigation, but in the target covariate distribution, by drawing from the posterior predictive distribution of outcomes. This is to be constructed using the imputation model in Eq. 2. Beforehand, a “data augmentation” step is required. We shall create a copy of the original target covariate dataset and vertically concatenate it to the original \(\varvec{x}^{tar}\). The concatenation is denoted \(\varvec{x}^{\varvec{*}}=\left[ \begin{array}{c} \varvec{x}^{tar} \\ \varvec{x}^{tar} \end{array}\right]\) and has \(N^*=\left( 2 \times N^{tar}\right)\) rows and K columns. The original \(j=1, 2, \dots , N^{tar}\) rows are assigned the treatment value \(t^{*}_j=1\) and the appended \(j=\left( N^{tar}+1\right) , \left( N^{tar}+2\right) , \dots , N^{*}\) rows are assigned the treatment value \(t^{*}_j=0\). The treatment indicator vector in the augmented dataset is denoted \(\varvec{t}^{\varvec{*}}=\left( t^*_1, t^*_2, \dots , t^*_{N^*}\right)\).
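The data augmentation step can be sketched as follows; the target covariate values here are simulated placeholders standing in for \(\varvec{x}^{tar}\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical target covariate matrix (N_tar subjects, K covariates)
N_tar, K = 500, 2
x_tar = rng.normal(size=(N_tar, K))

# Duplicate the target covariates and stack vertically: the first N_tar rows
# are assigned t* = 1 and the appended copy is assigned t* = 0
x_star = np.vstack([x_tar, x_tar])                  # (2 * N_tar) x K
t_star = np.concatenate([np.ones(N_tar), np.zeros(N_tar)])
```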
Using MCMC sampling, it is fairly straightforward to implement the estimation of both the first-stage regression and the outcome prediction steps within a single Bayesian computation module. Having fitted the first-stage regression, we will iterate over the L converged draws of the MCMC algorithm to generate \(M \le L\) synthetic datasets: \(\{ \mathcal {D}^{*} = \mathcal {D}^{*(m)}: m=1,2,\dots ,M \}\), where \(\mathcal {D}^{*(m)}=\left( \varvec{x}^{\varvec{*}}, \varvec{t}^{\varvec{*}}, \varvec{y}^{\varvec{*}(m)}\right)\). Covariates \(\varvec{x}^{\varvec{*}}\) and treatment \(\varvec{t}^{\varvec{*}}\) are fixed across all the synthetic datasets. In line with the multiple imputation framework, each synthetic dataset is filled in by drawing a vector of outcomes \(\varvec{y}^{\varvec{*}(m)} = \left( y^{*(m)}_1, y^{*(m)}_2, \dots , y^{*(m)}_{N^*}\right)\) of size \(N^*\) from the posterior predictive distribution \(p\left( \varvec{y}^{\varvec{*}} \mid \varvec{x}^{\varvec{*}}, \varvec{t}^{\varvec{*}}, \varvec{y}, \varvec{x}, \varvec{t}\right)\), given the original index trial and the augmented target datasets.
Assuming convergence of the MCMC sampling algorithm, the posterior predictive distribution of outcomes is approximated numerically as:

$$p\left( \varvec{y}^{\varvec{*}} \mid \varvec{x}^{\varvec{*}}, \varvec{t}^{\varvec{*}}, \varvec{y}, \varvec{x}, \varvec{t}\right) = \int p\left( \varvec{y}^{\varvec{*}} \mid \varvec{x}^{\varvec{*}}, \varvec{t}^{\varvec{*}}, \varvec{\beta }\right) p\left( \varvec{\beta } \mid \varvec{y}, \varvec{x}, \varvec{t}\right) \textrm{d}\varvec{\beta } \approx \frac{1}{L} \sum _{l=1}^{L} p\left( \varvec{y}^{\varvec{*}} \mid \varvec{x}^{\varvec{*}}, \varvec{t}^{\varvec{*}}, \varvec{\beta }^{(l)}\right) ,$$
where the realizations \(\varvec{\beta }^{(l)} \sim p \left( \varvec{\beta } \mid \varvec{y}, \varvec{x}, \varvec{t}\right)\) are independent draws from the posterior distribution of the first-stage regression parameters \(\varvec{\beta } = \left( \beta _0, \varvec{\beta }_{\varvec{1}}, \varvec{\beta }_{\varvec{2}}, \beta _t \right)\), which encode the predictor-outcome relationships observed in the index trial, given some suitably defined prior \(p\left( \varvec{\beta }\right)\). Here, \(l=1, 2, \dots , L\) indexes each MCMC iteration after convergence.
Consequently, L predictive samples \(\varvec{y}^{\varvec{*}(1)}, \dots , \varvec{y}^{\varvec{*}(L)} \sim p\left( \varvec{y}^{\varvec{*}} \mid \varvec{x}^{\varvec{*}}, \varvec{t}^{\varvec{*}}, \varvec{\beta }^{(l)}\right)\) are drawn independently from the posterior predictive distribution of outcomes. Dedicated MCMC programming software such as Stan [49] would typically return an \(L \times N^*\) matrix of simulations. We will “thin” the matrix such that only \(M \le L\) of the rows are retained. This is to reduce the computational run time of the MIM analysis stage. The M remaining outcome imputations are used to complete the synthetic datasets \(\{\mathcal {D}^{*(m)}=\left( \varvec{x}^{\varvec{*}}, \varvec{t}^{\varvec{*}}, \varvec{y}^{\varvec{*}(m)}\right) : m=1,2,\dots ,M\}\). Table 1 illustrates the structure of each synthetic dataset.
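The outcome prediction step might be sketched as below for a logistic outcome model. Since a full MCMC fit is out of scope here, the helper `draw_beta` is a hypothetical stand-in that mimics retained posterior draws of the first-stage coefficients with normal perturbations around assumed means; in practice these draws would come from the fitted model in Eq. 2.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical target covariates and the augmented dataset described above
N_tar, K, M = 400, 2, 100
x_tar = rng.normal(size=(N_tar, K))
x_star = np.vstack([x_tar, x_tar])
t_star = np.concatenate([np.ones(N_tar), np.zeros(N_tar)])

def draw_beta():
    """Stand-in for one retained MCMC draw of (b0, beta1, bt, beta2);
    the means and spreads here are assumptions, not fitted values."""
    return (rng.normal(-0.5, 0.10),                 # intercept
            rng.normal([0.8, -0.4], 0.10),          # prognostic coefficients
            rng.normal(-1.0, 0.15),                 # conditional treatment effect
            rng.normal([0.3, 0.2], 0.10))           # interaction coefficients

# One synthetic outcome vector per retained draw: Bernoulli samples from the
# posterior predictive distribution under the logistic model in Eq. 2
y_star = np.empty((M, 2 * N_tar), dtype=int)
for m in range(M):
    b0, b1, bt, b2 = draw_beta()
    eta = b0 + x_star @ b1 + (bt + x_star @ b2) * t_star
    y_star[m] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
```

Each row of `y_star`, together with `x_star` and `t_star`, constitutes one synthetic dataset \(\mathcal {D}^{*(m)}\).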
Analysis of synthetic datasets
In the second stage, the analysis of synthetic datasets, we seek inferences about the marginal treatment effect in the target covariate distribution (TATE in Eq. 1, but here denoted \(\Delta\)), given the synthesized outcomes. The analysis stage consists of another two steps. In the second-stage regression step, estimates of the marginal treatment effect in each synthesis \(m=1,2,\dots ,M\) are generated by regressing the predicted outcomes \(\varvec{y}^{\varvec{*}(m)}\) on the treatment indicator \(\varvec{t}^{\varvec{*}}\). In the pooling step, the treatment effect estimates and their variances are combined across all M syntheses.
In standard multiple imputation, the imputation and analysis stages may be performed simultaneously in a joint model [45]. In MIM, this is challenging because the dependent variable of the analysis is completely synthesized. Consider the Bayesian DAG in Fig. 1. In a joint model, the predicted outcomes are a collider variable, blocking the only path between the first and the second module (information from the directed arrows “collides” at the node). As a result, the data synthesis and analysis stages have been implemented as separate modules in a twostage framework. The analysis stage conditions on the outcomes predicted by the synthesis stage, treating these as observed data.
Second-stage regression
We fit M second-stage regressions of predicted outcomes \(\varvec{y}^{\varvec{*}(m)}\) on treatment \(\varvec{t}^{\varvec{*}}\) for \(m=1,2,\dots ,M\). Identical analyses are performed on each synthesis:

$$g\left( \eta _j^{(m)} \right) = \alpha ^{(m)} + \delta ^{(m)} t^{*}_j, \qquad (3)$$
where \(\eta _j^{(m)}\) is the expected outcome on the natural scale of unit j in the mth synthesis, the coefficient \(\alpha ^{(m)}\) is an intercept term and \(\delta ^{(m)}\) denotes the marginal treatment effect in the mth synthesis. Performing a Bayesian fit in this step would introduce non-trivial computational complexity, because it embeds a nested simulation scheme. Namely, if we draw M samples \(\{ \varvec{y}^{\varvec{*}(m)} : m = 1,2,\dots ,M \}\) in the synthesis stage, a further number of samples, say R, of the treatment effect \(\{ \delta ^{(m,r)}: m = 1,2,\dots ,M; r = 1, 2,\dots ,R \}\) would have to be drawn for each of these realizations separately. This structure is unlikely to be feasible in terms of running time.
Using maximum-likelihood estimation, a point estimate \(\hat{\delta }^{(m)}\) of the marginal treatment effect and a measure of its variance \(\hat{v}^{(m)}\) are generated in each synthesis \(\varvec{y}^{\varvec{*}(m)}\). Equation 3 is a marginal model of outcome on treatment alone. Adopting terminology from the missing data literature, the second-stage regression in the analysis stage is “congenial” with the first-stage regression in the synthesis stage because treatment was already included as a predictor in the first-stage regression [44].
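With a binary outcome and a logit link, the maximum-likelihood fit of Eq. 3 is saturated, so the point estimate and the usual variance estimate are available in closed form from the 2×2 table of treatment by outcome. A sketch (the helper name `marginal_log_or` is ours):

```python
import numpy as np

def marginal_log_or(y, t):
    """ML point estimate of the marginal log odds ratio (Eq. 3 with a logit
    link and a binary outcome) and its usual variance estimate, computed
    from the 2x2 table of treatment by outcome."""
    a = np.sum((y == 1) & (t == 1))    # events under treatment
    b = np.sum((y == 0) & (t == 1))    # non-events under treatment
    c = np.sum((y == 1) & (t == 0))    # events under control
    d = np.sum((y == 0) & (t == 0))    # non-events under control
    delta_hat = np.log(a * d / (b * c))
    v_hat = 1 / a + 1 / b + 1 / c + 1 / d   # sum of reciprocal cell counts
    return delta_hat, v_hat
```

Applying this function to each pair \((\varvec{y}^{\varvec{*}(m)}, \varvec{t}^{\varvec{*}})\) yields the estimators \(\hat{\delta }^{(m)}\) and \(\hat{v}^{(m)}\) used in the pooling step.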
Pooling
We must now combine the M point estimates of the marginal treatment effect and their variances to generate a posterior distribution. Pooling across multiple syntheses is a topic that has already been investigated within the domain of statistical disclosure limitation [50,51,52,53,54,55,56].
In statistical disclosure limitation, data agencies mitigate the risk of identity disclosure by releasing multiple fully synthetic datasets. These only contain simulated values, in lieu of the original confidential data of real survey respondents. Raghunathan et al. [50] describe full synthesis as a twostep process: (1) construct multiple synthetic populations by repeatedly drawing from the posterior predictive distribution, conditional on a model fitted to the original data; and (2) draw random samples from each synthetic population, releasing these synthetic samples to the public. In practice, as indicated by Reiter and Raghunathan [55], it is not a requirement to generate the populations, but only to generate values for the synthetic samples. Once the samples are released, the analyst seeks inferences based on the synthetic data alone.
MIM is analogous to this problem, although there are some differences. In MIM, the analyst also acts as the synthesizer of data, and there is no “original data” on outcomes as such if the index study has not been conducted in the target covariate distribution. In any case, values for the samples are generated in the synthesis stage by repeatedly drawing from the posterior predictive distribution of outcomes. This is conditional on the predictor-outcome relationships indexed by the model fitted to the index study, the set treatment and the target distribution of covariates.
We seek to construct a posterior distribution for the marginal treatment effect, conditional on the synthetic outcomes (and treatment). That is, \(p\left( \Delta \mid \varvec{y}^{\varvec{*}}, \varvec{t}^{\varvec{*}}\right)\). Following Raab et al. [56], each \(\varvec{y}^{\varvec{*}(m)}\) is viewed as a random sample from \(p\left( \varvec{y}^{\varvec{*}} \mid \varvec{x}^{\varvec{*}}, \varvec{t}^{\varvec{*}}, \varvec{\beta }^{(m)} \right)\), where \(\varvec{\beta }^{(m)}\) is sampled from its posterior \(p\left( \varvec{\beta } \mid \varvec{x}, \varvec{t}, \varvec{y} \right)\). Hence, the “true” marginal treatment effect \(\delta ^{(m)}\) for the mth synthesis, corresponding to \(\varvec{\beta }^{(m)}\), can be defined as a function of this sample. In each second-stage regression in Eq. 3, this is the treatment effect estimated by \(\hat{\delta }^{(m)}\).
Consequently, following Raghunathan et al. [50], the estimators \(\{ \hat{\delta }^{(m)}, \hat{v}^{(m)}; m=1,2,\dots ,M \}\) from the second-stage regressions are treated as “data”, and are used to construct an approximation to the posterior density \(p\left( \Delta \mid \varvec{y}^{\varvec{*}}, \varvec{t}^{\varvec{*}}\right)\). This density is assumed to be approximately normal and is parametrized by its first two moments: the mean \(\mu _{\Delta }\) and the variance \(\sigma _{\Delta }^2\). To derive the conditional distribution \(p\left( \mu _\Delta , \sigma _\Delta ^2 \mid \varvec{y}^{\varvec{*}}, \varvec{t}^{\varvec{*}}\right)\) of these moments given the syntheses, the estimators \(\{ \hat{\delta }^{(m)}, \hat{v}^{(m)}; m=1,2,\dots ,M \}\), where \(\hat{v}^{(m)}\) is the point estimate of the variance in the mth second-stage regression, are treated as sufficient summaries of the syntheses, and \(\mu _{\Delta }\) and \(\sigma _{\Delta }^2\) are treated as parameters. Then, the posterior distribution \(p\left( \Delta \mid \varvec{y}^{\varvec{*}}, \varvec{t}^{\varvec{*}}\right)\) is constructed as:

$$p\left( \Delta \mid \varvec{y}^{\varvec{*}}, \varvec{t}^{\varvec{*}}\right) = \int \int p\left( \Delta \mid \mu _\Delta , \sigma _\Delta ^2 \right) p\left( \mu _\Delta , \sigma _\Delta ^2 \mid \varvec{y}^{\varvec{*}}, \varvec{t}^{\varvec{*}}\right) \textrm{d}\mu _\Delta \, \textrm{d}\sigma _\Delta ^2 . \qquad (4)$$
In analogy with the theory of multiple imputation [30], the following quantities are required for inference:

$$\bar{\delta } = \frac{1}{M} \sum _{m=1}^{M} \hat{\delta }^{(m)}, \qquad (5)$$

$$\bar{v} = \frac{1}{M} \sum _{m=1}^{M} \hat{v}^{(m)}, \qquad (6)$$

$$b = \frac{1}{M-1} \sum _{m=1}^{M} \left( \hat{\delta }^{(m)} - \bar{\delta } \right) ^2, \qquad (7)$$
where \(\bar{\delta }\) is the average of the treatment effect point estimates across the M syntheses, \(\bar{v}\) is the average of the point estimates of the variance (the “within” variance), and b is the sample variance of the point estimates (the “between” variance). These quantities are computed using the point estimates from the second-stage regressions.
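The quantities in Eqs. 5, 6 and 7 are simple summaries of the second-stage estimates; a minimal sketch (the function name is ours):

```python
import numpy as np

def pooling_quantities(delta_hats, v_hats):
    """Eqs. 5-7: pooled point estimate, 'within' variance, 'between' variance."""
    delta_bar = np.mean(delta_hats)        # Eq. 5
    v_bar = np.mean(v_hats)                # Eq. 6
    b = np.var(delta_hats, ddof=1)         # Eq. 7 (sample variance, M - 1 divisor)
    return delta_bar, v_bar, b
```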
After deriving the quantities in Eqs. 5, 6 and 7, there are two options to approximate the posterior distribution of the marginal treatment effect in Eq. 4. The first involves direct Monte Carlo simulation and the second uses a simple normal approximation. In Additional file 1, the inferential framework for pooling outlined in this section is extended to scenarios involving correlated outcomes and non-scalar estimands with multiple components [57]. This involves a multivariate outcome model (i.e. with multiple dependent variables) and the combination of correlated treatment effects corresponding to multiple outcomes.
Pooling via posterior simulation
Firstly, one draws \(\mu _\Delta\) and \(\sigma _\Delta ^2\) from their posterior distributions, conditional on the syntheses. These distributions are derived by Raghunathan et al. [50]. Values of \(\mu _\Delta\) are drawn from a normal distribution:

$$\mu _\Delta \sim \textrm{N}\left( \bar{\delta }, b/M \right) . \qquad (8)$$
Values of \(\sigma _\Delta ^2\) are drawn using a chi-squared distribution with \(M-1\) degrees of freedom:

$$\sigma _\Delta ^2 \sim \left( M-1 \right) b / \chi ^2_{M-1} - \bar{v}. \qquad (9)$$
Values of \(\Delta\) are drawn from a t-distribution with \(M-1\) degrees of freedom [50]:

$$\Delta \sim t_{M-1}\left( \mu _\Delta , \left( 1 + 1/M \right) \sigma _\Delta ^2 \right) , \qquad (10)$$
where the \(\sigma _\Delta ^2/M\) term in the variance is necessary as an adjustment for there being a finite number of syntheses; as \(M \rightarrow \infty\), the variance tends to \(\sigma _\Delta ^2\).
By performing a large number of simulations, one is estimating the posterior distribution in Eq. 4 by approximating the integral of the posterior in Eq. 10 with respect to the posteriors in Eqs. 8 and 9 [50]. Hence, the resulting draws of \(\Delta\) are samples from the posterior distribution \(p\left( \Delta \mid \varvec{y}^{\varvec{*}}, \varvec{t}^{\varvec{*}} \right)\) in Eq. 4. One can take the expectation over the posterior draws to produce a point estimate \(\hat{\Delta }\) of the marginal treatment effect in the target distribution of covariates. A point estimate of its variance \(\hat{V}\left( \hat{\Delta }\right)\) can be directly computed from the draws of the posterior density. Uncertainty measures such as 95% interval estimates can be calculated from the corresponding empirical quantiles.
The posterior distributions in Eqs. 8, 9 and 10 have been derived under certain normality assumptions, which are adequate for reasonably large sample sizes, where the relevant sample sizes are both the size N of the index study and the size \(N^*\) of the synthetic datasets.
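The simulation scheme in Eqs. 8, 9 and 10 might be sketched as follows. Discarding the negative \(\sigma _\Delta ^2\) draws, as done here, is one pragmatic choice on our part rather than part of the original derivation, and the function name is ours.

```python
import numpy as np

def pool_by_simulation(delta_hats, v_hats, n_draws=100_000, seed=3):
    """Draw from the approximate posterior of the marginal treatment effect
    by simulating Eqs. 8-10; negative sigma^2 draws are simply discarded."""
    rng = np.random.default_rng(seed)
    M = len(delta_hats)
    delta_bar, v_bar = np.mean(delta_hats), np.mean(v_hats)
    b = np.var(delta_hats, ddof=1)
    mu = rng.normal(delta_bar, np.sqrt(b / M), n_draws)            # Eq. 8
    sigma2 = (M - 1) * b / rng.chisquare(M - 1, n_draws) - v_bar   # Eq. 9
    keep = sigma2 > 0                                  # pragmatic truncation
    scale = np.sqrt(sigma2[keep] * (1 + 1 / M))
    return mu[keep] + rng.standard_t(M - 1, keep.sum()) * scale    # Eq. 10

# Point estimate, variance and 95% interval from the posterior draws:
# draws = pool_by_simulation(delta_hats, v_hats)
# np.mean(draws), np.var(draws), np.quantile(draws, [0.025, 0.975])
```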
Pooling via combining rules
A simple alternative to direct Monte Carlo simulation is to use a basic approximation to the posterior density in Eq. 4, such that the sampling distribution in Eq. 10 is normal as opposed to a t-distribution. The posterior mean is the average of the treatment effect point estimates across the M syntheses. A combining rule for the variance arises from using \(b - \bar{v}\) to estimate \(\sigma ^2_\Delta\), which is equivalent to setting \(\sigma ^2_\Delta\) at its approximate posterior mean in Eq. 9 [54]. Again, the b/M term is necessary as an adjustment for there being a finite number of syntheses.
Consequently, point estimates for the marginal treatment effect in the target covariate distribution and its variance can be derived using the following plug-in estimators:

$$\hat{\Delta } = \bar{\delta }, \qquad (11)$$

$$\hat{V}\left( \hat{\Delta } \right) = \left( 1 + 1/M \right) b - \bar{v}. \qquad (12)$$
The combining rules differ slightly from Rubin’s variance estimator in multiple imputation (in Eq. 12, \(\bar{v}\) is subtracted instead of added) [30]. Interval estimates can be approximated using a normal distribution, e.g. for 95% interval estimates, taking \(\pm 1.96\) times the square root of the variance computed in Eq. 12 [50]. A more conservative, heavier-tailed t-distribution with \(\nu _f = \left( M-1 \right) \left( 1+\bar{v}/\left( \left( 1+1/M\right) b\right) \right) ^2\) degrees of freedom has also been proposed, as normal distributions may produce excessively narrow intervals and undercoverage when M is more modest [52]. Note that the combining rules in Eqs. 11 and 12 are only appropriate for reasonably large M. The choice of M is now discussed.
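The plug-in estimators in Eqs. 11 and 12, with a normal 95% interval estimate, might be implemented as below (the function name is ours; the square root assumes the variance estimate is positive, a point discussed later):

```python
import numpy as np

def pool_by_rules(delta_hats, v_hats):
    """Plug-in estimators in Eqs. 11-12 with a normal 95% interval estimate."""
    M = len(delta_hats)
    delta_hat = np.mean(delta_hats)                                       # Eq. 11
    var_hat = (1 + 1 / M) * np.var(delta_hats, ddof=1) - np.mean(v_hats)  # Eq. 12
    se = np.sqrt(var_hat)                    # assumes var_hat > 0
    return delta_hat, var_hat, (delta_hat - 1.96 * se, delta_hat + 1.96 * se)
```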
Number of synthetic datasets
In standard multiple imputation, it is not uncommon to release as few as five imputed datasets [30]. However, MIM is likely to require a larger value of M because it imputes all of the outcomes in the syntheses, as opposed to a relatively small proportion of missing values. Adopting terminology from the missing data literature, the “fraction of missing information” in MIM is 1, because the original dataset used to fit the first-stage regression is different from the augmented target dataset used to fit the second-stage regression.
In the statistical disclosure limitation literature, a common choice for the number of syntheses is \(M=100\) [52]. We encourage setting M as large as possible, in order to minimize Monte Carlo error and thereby maximize precision and efficiency. A sensible strategy is to increase the number of syntheses until repeated analyses across different random seeds give similar results, within a specified degree of accuracy. Assuming MCMC simulation is used in the synthesis stage, the value of M is likely to be a fraction of the total number of iterations or posterior samples required for convergence. As computation time is driven by the synthesis stage, increasing M provides more precise and efficient estimation [52, 58] at little additional cost in the analysis stage.
An inconvenience of the expressions in Eqs. 9 and 12 is that they may produce negative variances. When the posterior in Eq. 9 generates a negative value of \(\sigma ^2_\Delta\), i.e., when \(\frac{\left( M-1\right) b}{\chi ^*} < \bar{v}\) (where \(\chi ^*\) is the draw from the posterior in Eq. 9), the variance of the posterior distribution in Eq. 10 is negative. Similarly, Eq. 12 produces a negative variance when \((1+1/M)b < \bar{v}\). This is because the formulations have been derived using method-of-moments approximations, where estimates are not necessarily constrained to fall in the parameter space. Negative variances are unlikely to occur if M and the size of the synthetic datasets are relatively large. This is due to lower variability in \(\sigma ^2_\Delta\) and \(\hat{V}\left( \hat{\Delta }\right)\) [53]: \(\bar{v}\) decreases with larger syntheses and b is less variable with larger M [52].
Simulation study
Aims
The objectives of the simulation study are to provide proof-of-principle for MIM and to benchmark its statistical performance against that of the standard implementation of parametric model-based standardization (parametric G-computation), which uses maximum-likelihood estimation with nonparametric bootstrapping for inference [16, 25,26,27,28]. The simulation study investigates a setting in which the index study is a perfectly-executed two-arm RCT. This will be standardized to produce a marginal treatment effect in an external target covariate distribution.
Methods will be evaluated according to the following finite-sample (frequentist) characteristics [59]: (1) unbiasedness; (2) precision; (3) efficiency; and (4) coverage of interval estimates. The chosen performance metrics assess these criteria specifically. The ADEMP (Aims, Data-generating mechanisms, Estimands, Methods, Performance measures) structure by Morris et al. [59] is used to describe the simulation study design. Example R code implementing the methods on a simulated example is provided in Additional file 1. All simulations and analyses have been performed using R software version 4.1.1 [60].^{Footnote 2}
Data-generating mechanisms
The data-generating mechanisms are inspired by those presented by Phillippo et al. [61]. We consider binary outcomes using the log odds ratio as the measure of effect. An index RCT investigates the efficacy of an active treatment (coded as treatment 1) versus a control (coded as treatment 0). Outcome \(y_n\) for subject n in the index RCT is simulated from a Bernoulli distribution, with event probabilities \(\theta _n = p\left( y_n \mid \varvec{x}_n, t_n \right)\) given covariates \(\varvec{x}_n\) and treatment \(t_n\) generated using a logistic model:
\(\text {logit}\left( \theta _n \right) = \beta _0 + \beta _{1,1} x_{n,1} + \beta _{1,2} x_{n,2} + \left( \beta _t + \beta _{2,1} x_{n,1} + \beta _{2,2} x_{n,2} \right) t_n.\)
Two correlated continuous covariates, \(x_{n, 1}\) and \(x_{n, 2}\), are simulated per subject n by drawing from a multivariate Gaussian copula with prespecified means and standard deviations for the marginal distributions, and a prespecified covariance matrix. The first covariate follows the marginal distribution \(x_{n,1} \sim \text {N}(1, 0.5^2)\), and the second covariate follows the marginal distribution \(x_{n,2} \sim \text {N}(0.5, 0.2^2)\). There is some positive correlation between the two covariates, with pairwise correlation coefficients set to 0.15. Both covariates are prognostic of the outcome in the control group at the individual level. Due to the presence of treatment-covariate interactions, both covariates are also (conditional) effect measure modifiers (i.e., predictive of treatment effect heterogeneity) at the individual level on the (log) odds ratio scale. The covariates also modify marginal treatment effects at the population level on the (log) odds ratio scale.
We set the intercept to \(\beta _0 = -0.5\), coefficients for the main covariate-outcome associations to \(\beta _{1,k}=2\sigma _{k}\) for covariate k, where \(\sigma _{k}\) is the standard deviation of the sampling distribution of covariate k, and coefficients for the interaction terms to \(\beta _{2,k}=\sigma _{k}\). The treatment coefficient (i.e. the conditional log odds ratio for the active intervention versus control at baseline, when the covariate values are zero) is set to \(\beta _{t}=-1.5\). That is, if the binary outcome represents the occurrence of an adverse event, the active treatment is more efficacious than the control. The covariates could represent influential prognostic and effect-modifying comorbidities that are associated with greater odds of the adverse event and lower efficacy of active treatment versus control at the individual level on the (log) odds ratio scale.
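This data-generating mechanism can be sketched in a few lines of Python (the study itself is implemented in R; the function name is illustrative, and the negative signs on the intercept and treatment coefficient are an assumption chosen so that active treatment reduces the event probability, consistent with the reported marginal outcome probabilities):

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_index_rct(N, means=(1.0, 0.5), sds=(0.5, 0.2), rho=0.15):
    """Simulate the index RCT: two correlated Gaussian covariates (a Gaussian
    copula with normal marginals is simply a multivariate normal), 1:1
    treatment allocation, and Bernoulli outcomes from the logistic model."""
    sds = np.asarray(sds)
    cov = np.array([[sds[0] ** 2, rho * sds[0] * sds[1]],
                    [rho * sds[0] * sds[1], sds[1] ** 2]])
    x = rng.multivariate_normal(means, cov, size=N)
    t = rng.permutation(np.repeat([0.0, 1.0], N // 2))  # 1:1 allocation
    # assumed sign conventions: negative intercept and treatment coefficient
    beta0, beta_t = -0.5, -1.5
    beta1 = 2 * sds   # prognostic main effects, scaled by covariate SDs
    beta2 = sds       # treatment-covariate interactions (effect modification)
    lin_pred = beta0 + x @ beta1 + (beta_t + x @ beta2) * t
    y = rng.binomial(1, 1 / (1 + np.exp(-lin_pred)))
    return x, t, y
```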
The simulation study adopts a factorial arrangement, crossing three index trial sample sizes with two levels of overlap between the index trial and the target covariate distributions. This results in a total of six simulation scenarios. The settings are defined by varying the following parameter values:

Sample sizes of \(N \in \{ 500, 1000, 2000\}\) for the index RCT, with a 1:1 active treatment vs. control allocation ratio.

The level of (deterministic) overlap between the index RCT and the target covariate distribution: limited overlap (50% of the index study population lies outside of the target population) and full overlap (the index study population is entirely contained within the target population) [61].
Following Phillippo et al. [61], the target covariate distribution is set to achieve the required level of overlap by using a proxy parameter \(\kappa\) (\(\kappa =0.5\) corresponds to 50% overlap and \(\kappa =1\) corresponds to full overlap). Then, each covariate k in the target follows the marginal distribution \(x^{*}_{n,k} \sim \text {N} \left( m^*_k, {\sigma ^{*}_k}^{2}\right)\), with \(m^*_k = m_k\left( 1.1 + (1-\kappa )^2\right)\) and \(\sigma ^*_k = 0.75\sigma _k\), where \(m_k\) is the mean of the sampling distribution of covariate k in the index RCT. The target joint covariate distribution is a multivariate Gaussian copula with the pairwise correlation coefficients set to 0.15. \(N^{tar}=2000\) subject profiles are simulated for the target covariate dataset. Individual-level outcomes in the target, under the treatments being investigated in the index study, are assumed unavailable and not simulated.
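The construction of the target covariate distribution can be sketched as follows (a Python illustration with an assumed function name; the index-trial means and standard deviations from the text are used as defaults):

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_target_covariates(N_tar, kappa, means=(1.0, 0.5),
                               sds=(0.5, 0.2), rho=0.15):
    """Draw the external target covariate dataset: marginal means are shifted
    according to the overlap proxy kappa and standard deviations shrunk to
    0.75 times their index-trial values."""
    means, sds = np.asarray(means), np.asarray(sds)
    m_star = means * (1.1 + (1 - kappa) ** 2)  # shifted target means
    s_star = 0.75 * sds                        # reduced target spread
    cov = np.array([[s_star[0] ** 2, rho * s_star[0] * s_star[1]],
                    [rho * s_star[0] * s_star[1], s_star[1] ** 2]])
    return rng.multivariate_normal(m_star, cov, size=N_tar)
```

For example, \(\kappa =1\) shifts the first covariate's target mean to \(1 \times 1.1 = 1.1\), while \(\kappa =0.5\) shifts it to \(1 \times 1.35 = 1.35\).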
Estimands
The target estimand is the true marginal log odds ratio for active treatment versus control in the target covariate distribution. This may vary across the settings of the simulation study because, by design, changing the level of (deterministic) overlap changes the target covariate distribution, and the true marginal log odds ratio depends on the covariate distribution.
For each scenario, true values of the marginal estimand are determined by simulating a cohort of 2,000,000 subjects, a number sufficiently large to minimize sampling variability, using the target covariate distributions in the simulation study. Hypothetical subject-level binary outcomes under active treatment and control are simulated for the cohort according to the true outcome-generating mechanism. The true marginal log odds ratio is computed by averaging the simulated unit-level outcomes under each treatment and contrasting the marginal outcome expectations on the log odds ratio scale. A simulation-based approach is necessary to compute the true marginal estimands due to the non-collapsibility of the (log) odds ratio [62,63,64].
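A sketch of this truth computation in Python (with a smaller cohort than the 2,000,000 subjects used in the study; the coefficient signs are assumptions consistent with the reported marginal outcome probabilities):

```python
import numpy as np

rng = np.random.default_rng(1)

def true_marginal_log_or(kappa, N=500_000, means=(1.0, 0.5),
                         sds=(0.5, 0.2), rho=0.15):
    """Approximate the true marginal log odds ratio in the target covariate
    distribution: average the true conditional event probabilities under each
    treatment over a large simulated cohort, then contrast on the log odds
    ratio scale (necessary because the odds ratio is non-collapsible)."""
    means, sds = np.asarray(means), np.asarray(sds)
    m_star = means * (1.1 + (1 - kappa) ** 2)
    s_star = 0.75 * sds
    cov = np.array([[s_star[0] ** 2, rho * s_star[0] * s_star[1]],
                    [rho * s_star[0] * s_star[1], s_star[1] ** 2]])
    x = rng.multivariate_normal(m_star, cov, size=N)
    beta0, beta_t = -0.5, -1.5          # assumed signs (beneficial treatment)
    beta1, beta2 = 2 * sds, sds
    expit = lambda z: 1 / (1 + np.exp(-z))
    p0 = expit(beta0 + x @ beta1).mean()                       # control
    p1 = expit(beta0 + x @ beta1 + beta_t + x @ beta2).mean()  # active
    return np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))
```

Averaging the true event probabilities, rather than simulated Bernoulli outcomes, yields the same marginal means with one less layer of Monte Carlo noise.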
For \(\kappa =0.5\) (limited overlap), the true marginal outcome probabilities for active treatment and control in the target population are 0.60 and 0.75, respectively, resulting in a true marginal log odds ratio of -0.68. For \(\kappa =1\) (full overlap), the true marginal outcome probabilities for active treatment and control in the target population are 0.50 and 0.69, resulting in a true marginal log odds ratio of -0.81.^{Footnote 3}
Methods
Each simulated dataset is analyzed using: (1) the standard implementation of parametric model-based standardization (parametric G-computation) [16, 25,26,27,28]; and (2) parametric model-based standardization using MIM.
Standard model-based standardization
Among the subjects in the index RCT, outcomes are regressed on baseline covariates and treatment using a logistic model. Maximum likelihood is used to estimate the conditional outcome model, which is correctly specified. Outcome predictions under each treatment are made by applying the fitted regression to the full target covariate dataset. The marginal log odds ratio is derived by: (1) averaging the predicted conditional outcome means by treatment group over the target covariate dataset; (2) transforming the resulting marginal outcome means to the log odds ratio scale; and (3) producing a contrast for active treatment versus control on that scale [16, 27, 28]. For inference, the index RCT is resampled via the ordinary nonparametric bootstrap with replacement, using 1,000 resamples (the target covariates are assumed fixed). The average marginal log odds ratio and its standard error are computed as the mean and the standard deviation, respectively, across the resamples. Confidence intervals are computed using the “percentile” method; 95% interval estimates are derived from the 2.5th and the 97.5th percentiles across the resamples.
Multiple imputation marginalization
In the synthesis stage, the first-stage multivariable logistic regression is correctly specified and is estimated using MCMC sampling. This is implemented using the R package rstanarm [65], which builds on rstan [66]. We adopt the default normally-distributed “weakly informative” priors for the logistic regression coefficients [65]. Predicted outcomes are drawn from their posterior predictive distribution, given the augmented target dataset. We run two Markov chains with 4,000 iterations per chain, with 2,000 “burn-in” iterations that are not used for posterior inference. The MCMC chains are thinned every 4 iterations to use a total of \(M=(2000\times 2)/4=1000\) syntheses of size \(N^*=2 \times N^{tar}=4000\) in the analysis stage. The second-stage regressions are simple logistic regressions of predicted outcomes on treatment that are fitted to each synthesis using maximum-likelihood estimation. Their point estimates and variances are pooled using the combining rules in Eqs. 11 and 12. Wald-type 95% confidence intervals are estimated using t-distributions with \(\nu _f = \left( M-1 \right) \left( 1+\bar{v}/\left( \left( 1+1/M\right) b\right) \right) ^2\) degrees of freedom. Variance estimates are never negative under any simulation scenario. In a test simulation scenario (\(\kappa =0.5\) and \(N=1000\)), the selected value of \(M=1000\) is high enough such that the Monte Carlo error is small relative to the uncertainty in the estimator: upon inspection, marginal log odds ratio estimates across different random seeds agree to within approximately 0.01.
Performance measures
We simulate 1,000 datasets per scenario. For each scenario and methodology, the following performance measures are computed over the 1,000 simulated datasets: (1) bias; (2) empirical standard error (ESE); (3) mean square error (MSE); and (4) empirical coverage rate of the 95% interval estimates. These criteria are explicitly defined by Morris et al. [59] and specifically address the aims of the simulation study. The ESE evaluates precision (aim 2) and the MSE measures overall efficiency (aim 3), accounting for bias (aim 1) and precision (aim 2).
To quantify the simulation uncertainty, Monte Carlo standard errors (MCSEs) over the data replicates, as defined by Morris et al. [59], are reported for each performance metric. Based on the scenario inducing the highest long-run variability (\(\kappa =0.5\) and \(N=500\)), the MCSE of the bias of the methods is at most 0.015 under 1,000 simulations per scenario, and the MCSE of the coverage (based on an empirical coverage percentage of \(95\%\)) is \(\left( \sqrt{(95 \times 5)/1000}\right) \%=0.69\%\), with the worst case being \(1.58\%\) under \(50\%\) coverage. Such levels of simulation uncertainty are considered sufficiently precise, and 1,000 simulations per scenario are deemed appropriate.
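These MCSE computations can be reproduced with a few lines of Python (helper names are illustrative):

```python
import numpy as np

def mcse_bias(estimates):
    """MCSE of the estimated bias: the standard error of the mean of the
    replicate-specific point estimates."""
    e = np.asarray(estimates, dtype=float)
    return e.std(ddof=1) / np.sqrt(len(e))

def mcse_coverage(coverage_pct, n_sim):
    """MCSE (in percent) of an empirical coverage percentage over n_sim
    simulation replicates, from the binomial variance formula."""
    return np.sqrt(coverage_pct * (100 - coverage_pct) / n_sim)
```

With 1,000 replicates, `mcse_coverage(95, 1000)` returns approximately 0.69 and `mcse_coverage(50, 1000)` approximately 1.58, matching the figures quoted above.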
Results
Performance metrics for the six simulation scenarios are reported in Fig. 2. The limited overlap settings (\(\kappa =0.5\)) are displayed at the top (in ascending order of index trial sample size, from top to bottom), followed by the full overlap settings (\(\kappa =1\)) at the bottom. For each simulation scenario, a ridgeline plot depicts the spread of point estimates for the marginal log odds ratio across the 1,000 simulation runs. The dashed red lines indicate the true estimands. To the right of each ridgeline plot, a summary tabulation presents the empirical quantities used to measure the statistical performance of each method, with MCSEs in parentheses alongside the corresponding performance metrics.
In the full overlap scenarios, absolute bias is similarly low for MIM and the standard version of model-based standardization. In the limited overlap scenarios, the bias of both standardization methods is slightly larger in magnitude. There is a minimal increase in bias as the number of subjects in the index trial decreases. Bias is most marked in the scenarios with \(N=500\) (0.019 and 0.013 for MIM and the standard approach, respectively, in the limited overlap setting, and 0.017 and 0.017, respectively, in the full overlap setting). This is likely due to the small-sample bias inherent in logistic regression [67].
As expected, precision is lost as the index trial sample size and the level of covariate overlap are reduced. With \(\kappa =0.5\), there exists a subpopulation within the target population that does not overlap with the index trial population. Therefore, inferences in a subsample of the target covariate dataset rely on extrapolation of the conditional outcome model. With poorer covariate overlap, further extrapolation is required, thereby incurring a sharper loss of precision. Precision is very similar for both standardization methods, with virtually equal ESEs in all simulation scenarios. Similarly, efficiency is virtually identical for both standardization methods. As with the ESE, MSE values also increase as the number of subjects in the index trial and the level of overlap decrease. Because bias is almost negligible across the simulation scenarios, efficiency is driven more by precision than by bias.
From a frequentist viewpoint, the empirical coverage rate should be equal to the nominal coverage rate to obtain appropriate type I error rates for null hypothesis testing. Namely, 95% interval estimates should include the true marginal log odds ratio 95% of the time. Theoretically, due to our use of 1,000 Monte Carlo simulations per scenario, the empirical coverage rate is statistically significantly different from the desired 0.95 if it is less than 0.9365 or more than 0.9635. For both standardization methods, empirical coverage rates fall outside these boundaries only marginally (0.934 for MIM and 0.935 for the standard approach), and only in the scenario with \(N=500\) and full overlap. This suggests that uncertainty quantification by the standardization methods is adequate.
Discussion
Although statistical performance has been measured in terms of frequentist finite-sample properties, MIM offers performance comparable to that of the standard version of model-based standardization. Both approaches provide appropriate inference with a correctly specified parametric conditional outcome model. The simulation study demonstrates proof-of-principle for the standardization methods, but only considers a simple best-case scenario with correct model specification and two continuous covariates. It does not investigate how robust the methods are to failures in assumptions.
Parametric outcome models impose strong functional form assumptions; for example, that effects are linear and additive on some transformation of the conditional outcome expectation. Such modeling assumptions may not be plausible where there are a large number of covariates and complex non-linear relationships between them. To provide some protection against model misspecification bias, one may consider using flexible data-adaptive estimators, e.g. nonparametric or machine learning techniques, for the conditional outcome model (the first-stage regression in MIM). While such approaches make weaker modeling assumptions, they may still be subject to larger-than-desirable bias in finite samples and are constrained by limited theoretical justification for valid statistical inference [68].
In practice, the use of MIM is appealing for several reasons. Firstly, as illustrated in Additional file 1, MIM can readily handle missingness in the patient-level data for the comparative index study. Missing outcomes, and potentially covariate and treatment values, could be imputed in each MCMC iteration of the synthesis stage, naturally accounting for the uncertainty in the missing data of the index study.
Secondly, the Bayesian first-stage regression model can incorporate both hard external evidence (e.g. the results of a meta-analysis) and soft external evidence (e.g. expert knowledge) to construct informative prior distributions for the model coefficients. When external data cannot be leveraged, “weakly informative” contextual information can be used to construct skeptical or regularizing prior distributions. Through shrinkage, such priors can improve efficiency relative to maximum-likelihood estimators in certain scenarios [69].
Thirdly, a Bayesian formulation for the first-stage regression offers additional flexibility to address other issues, such as measurement error in the patient-level data of the index trial [70]. Bayesian model averaging can be used to capture structural or model uncertainty [71]. When one is unsure about which baseline covariates are (conditional) effect measure modifiers, one can allow interactions to be “half in, half out” by specifying skeptical prior distributions for the candidate product term coefficients in Eq. 2 [72,73,74].^{Footnote 4}
In this article, we have used multiple imputation to perform model-based standardization over a target empirical covariate distribution, assumed to belong to participants that are external to the index study. In this scenario, one is reliant on correct specification of the outcome model (the first-stage regression in MIM) for unbiased estimation in the target. One is also reliant on covariates being consistently defined across data sources and on complete information on influential covariates being available both for the index study and for the target. In practice, this is a key challenge [75, 76], which could be addressed through the development of core patient characteristic sets that define clinically important covariates to be measured and reported among specific therapeutic areas [77].
In the absence of overlap between the covariate distributions in the index study and the external target, e.g. when the index study covariate distribution lies outside the target covariate distribution, one must consider the plausibility of the outcome model extrapolation. Sensitivity analyses using alternative model specifications may be warranted to explore the dependence of inferences on the selected adjustment model. Recently, several authors have proposed sensitivity analyses that are applicable where potential effect measure modifiers are measured in the index trial but not in the target dataset [78, 79]. These techniques could be applied in conjunction with MIM.
While we have used multiple imputation to standardize over an external covariate distribution, it is also possible to standardize over the empirical covariate distribution of the index study, as illustrated in Additional file 1. This involves less stringent assumptions and avoids model-based extrapolation into an external data source. Such an approach allows for the estimation of covariate-adjusted marginal treatment effects within individual comparative studies, adjusting for covariate imbalances between treatment arms.
A limitation of this article is the lack of a real case study demonstrating the application of the new methodology. While proofofprinciple for MIM has been provided through simulation studies, the method should be applied to a real example in order to influence applied practice. This is a key priority for future research.
Availability of data and materials
The files required to generate the data, run the simulations, and reproduce the results are available at http://github.com/remiroazocar/MIM.
Notes
The files required to run the simulations are available at http://github.com/remiroazocar/MIM.
In contrast, the true average conditional log odds ratio for active treatment versus control, at the covariate means of the target population, is given by: \(\beta _t + \beta _{2, 1} m^*_1 + \beta _{2,2} m^*_2 = -1.5 + 0.5 \times (1.1 + (1-\kappa )^2) + 0.2 \times 0.5 \times (1.1 + (1-\kappa )^2)\). This is equal to -0.69 for \(\kappa =0.5\) and to -0.84 for \(\kappa =1\).
In the words of Simon and Freedman [74], this “encourages the quantification of prior belief about the size of interactions that may exist. Rather than forcing the investigator to adopt one of two extreme positions regarding interactions, it provides for the specification of intermediate positions.”
Abbreviations
ADEMP: Aims, Data-generating mechanisms, Estimands, Methods, Performance measures
DAG: Directed acyclic graph
ESE: Empirical standard error
IPD: Individual patient data
HTA: Health technology assessment
MCMC: Markov chain Monte Carlo
MCSE: Monte Carlo standard error
MIM: Multiple imputation marginalization
MSE: Mean square error
RCT: Randomized controlled trial
References
Remiro-Azócar A. Target estimands for population-adjusted indirect comparisons. Stat Med. 2022;41(28):5558–69.
Russek-Cohen E. Discussion of “target estimands for population-adjusted indirect comparisons” by Antonio Remiro-Azócar. Stat Med. 2022;41(28):5573–6.
Spieker AJ. Comments on the debate between marginal and conditional estimands. Stat Med. 2022;41(28):5589–91.
Senn S. Conditions for success and margins of error: estimation in clinical trials. Stat Med. 2022;41(28):5586–8.
Schiel A. Commentary on “Target estimands for populationadjusted indirect comparisons’’. Stat Med. 2022;41(28):5570–2.
Van Lancker K, Vo TT, Akacha M. Estimands in health technology assessment: a causal inference perspective. Stat Med. 2022;41(28):5577–85.
Greenland S, Pearl J. Adjustments and their consequences: collapsibility analysis using graphical models. Int Stat Rev. 2011;79(3):401–26.
Greenland S, Morgenstern H. Confounding in health research. Annu Rev Public Health. 2001;22(1):189–212.
Kaufman JS. Marginalia: comparing adjusted effect measures. Epidemiology. 2010;21(4):490–3.
Whittemore AS. Collapsibility of multidimensional contingency tables. J R Stat Soc: Ser B (Methodol). 1978;40(3):328–40.
Greenland S, Pearl J, Robins JM. Confounding and collapsibility in causal inference. Stat Sci. 1999;14(1):29–46.
Huitfeldt A, Stensrud MJ, Suzuki E. On the collapsibility of measures of effect in the counterfactual causal framework. Emerg Themes Epidemiol. 2019;16:1–5.
Morris TP, Walker AS, Williamson EJ, White IR. Planning a method for covariate adjustment in individuallyrandomised trials: a practical guide. Trials. 2022;23(328).
Austin PC. The performance of different propensity score methods for estimating marginal hazard ratios. Stat Med. 2013;32(16):2837–49.
Remiro-Azócar A, Heath A, Baio G. Methods for population adjustment with limited access to individual patient data: A review and simulation study. Res Synth Methods. 2021;12(6):750–75.
Remiro-Azócar A, Heath A, Baio G. Parametric G-computation for compatible indirect treatment comparisons with limited individual patient data. Res Synth Methods. 2022;13(6):716–44.
Remiro-Azócar A. Two-stage matching-adjusted indirect comparison. BMC Med Res Methodol. 2022;22(1):1–16.
Josey KP, Berkowitz SA, Ghosh D, Raghavan S. Transporting experimental results with entropy balancing. Stat Med. 2021;40(19):4310–26.
Phillippo DM, Dias S, Ades A, Belger M, Brnabic A, Schacht A, et al. Multilevel network metaregression for populationadjusted treatment comparisons. J R Stat Soc Ser A (Stat Soc). 2020;183(3):1189–210.
Robins J. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Math Model. 1986;7(9–12):1393–512.
Zhang Z. Estimating a marginal causal odds ratio subject to confounding. Commun StatTheory Methods. 2008;38(3):309–21.
Moore KL, van der Laan MJ. Covariate adjustment in randomized trials with binary outcomes: targeted maximum likelihood estimation. Stat Med. 2009;28(1):39–64.
Austin PC. Absolute risk reductions, relative risks, relative risk reductions, and numbers needed to treat can be obtained from a logistic regression model. J Clin Epidemiol. 2010;63(1):2–6.
Rosenblum M, Van Der Laan MJ. Simple, efficient estimators of treatment effects in randomized trials using generalized linear models to leverage baseline variables. Int J Biostat. 2010;6(1).
Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. Am J Epidemiol. 2011;173(7):731–8.
Wang A, Nianogo RA, Arah OA. G-computation of average treatment effects on the treated and the untreated. BMC Med Res Methodol. 2017;17(1):1–5.
Daniel R, Zhang J, Farewell D. Making apples from oranges: Comparing noncollapsible effect estimators and their standard errors after adjustment for different covariate sets. Biom J. 2021;63(3):528–57.
Campbell H, Park JE, Jansen JP, Cope S. Standardization allows for efficient unbiased estimation in observational studies and in indirect treatment comparisons: a comprehensive simulation study. 2023. arXiv preprint arXiv:2301.09661.
Vo TT, Porcher R, Chaimani A, Vansteelandt S. A novel approach for identifying and addressing case-mix heterogeneity in individual participant data meta-analysis. Res Synth Methods. 2019;10(4):582–96.
Rubin DB. Multiple imputation for nonresponse in surveys. vol. 81. New York: Wiley; 2004.
Westreich D, Edwards JK, Cole SR, Platt RW, Mumford SL, Schisterman EF. Imputation approaches for potential outcomes in causal inference. Int J Epidemiol. 2015;44(5):1731–7.
Remiro-Azócar A. Population-Adjusted Indirect Treatment Comparisons with Limited Access to Patient-Level Data. London: UCL (University College London); 2022.
Remiro-Azócar A, Heath A, Baio G. Marginalization of regression-adjusted treatment effects in indirect comparisons with limited patient-level data. 2020. arXiv preprint arXiv:2008.05951.
Girman CJ, Ritchey ME, Zhou W, Dreyer NA. Considerations in characterizing realworld data relevance and quality for regulatory purposes: a commentary. Pharmacoepidemiol Drug Saf. 2019;28(4):439.
Weiss NS. Generalizing from the results of randomized studies of treatment: Can nonrandomized studies be of help? Eur J Epidemiol. 2019;34(8):715–8.
Ramsey SD, Adamson BJ, Wang X, Bargo D, Baxi SS, Ghosh S, et al. Using electronic health record data to identify comparator populations for comparative effectiveness research. J Med Econ. 2020;23(12):1618–22.
Rothwell PM. External validity of randomised controlled trials: “to whom do the results of this trial apply?’’. Lancet. 2005;365(9453):82–93.
Rothwell PM. Commentary: External validity of results of randomized trials: disentangling a complex concept. Int J Epidemiol. 2010;39(1):94–6.
Greenhouse JB, Kaizar EE, Kelleher K, Seltman H, Gardner W. Generalizing from clinical trial data: a case study. The risk of suicidality among pediatric antidepressant users. Stat Med. 2008;27(11):1801–1813.
Happich M, Brnabic A, Faries D, Abrams K, Winfree KB, Girvan A, et al. Reweighting randomized controlled trial evidence to better reflect real life: a case study of the Innovative Medicines Initiative. Clin Pharmacol Ther. 2020;108(4):817–25.
Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688.
Tackney MS, Morris T, White I, Leyrat C, DiazOrdaz K, Williamson E. A comparison of covariate adjustment approaches under model misspecification in individually randomized trials. Trials. 2023;24(1):1–18.
Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res. 2011;46(3):399–424.
Meng XL. Multipleimputation inferences with uncongenial sources of input. Stat Sci. 1994;9(4):538–58.
Gabrio A, Mason AJ, Baio G. A full Bayesian model to handle structural ones and missingness in economic evaluations from individuallevel data. Stat Med. 2019;38(8):1399–420.
Rubin DB. Multiple imputations in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Proceedings of the survey research methods section of the American Statistical Association. vol. 1. USA: American Statistical Association Alexandria; 1978. p. 20–34.
Vo TT. A cautionary note on the use of G-computation in population adjustment. Res Synth Methods. 2023;14(3):338–41.
Dahabreh IJ, Robertson SE, Steingrimsson JA, Stuart EA, Hernan MA. Extending inferences from a randomized trial to a new target population. Stat Med. 2020;39(14):1999–2014.
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, et al. Stan: a probabilistic programming language. J Stat Softw. 2017;76(1).
Raghunathan TE, Reiter JP, Rubin DB. Multiple imputation for statistical disclosure limitation. J Off Stat. 2003;19(1):1.
Rubin DB. Statistical disclosure limitation. J Off Stat. 1993;9(2):461–8.
Reiter JP. Satisfying disclosure restrictions with synthetic data sets. J Off Stat. 2002;18(4):531.
Reiter JP. Releasing multiply imputed, synthetic public use microdata: An illustration and empirical study. J R Stat Soc Ser A (Stat Soc). 2005;168(1):185–205.
Si Y, Reiter JP. A comparison of posterior simulation and inference by combining rules for multiple imputation. J Stat Theory Pract. 2011;5(2):335–47.
Reiter JP, Raghunathan TE. The multiple adaptations of multiple imputation. J Am Stat Assoc. 2007;102(480):1462–71.
Raab GM, Nowok B, Dibben C. Practical data synthesis for large samples. J Priv Confidentiality. 2016;7(3):67–97.
Bujkiewicz S, Achana F, Papanikos T, Riley R, Abrams K. Multivariate metaanalysis of summary data for combining treatment effects on correlated outcomes and evaluating surrogate endpoints. NICE DSU Tech Support Doc. 2019;20.
Reiter JP. Inference for partially synthetic, public use microdata sets. Surv Methodol. 2003;29(2):181–8.
Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–102.
R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2013.
Phillippo DM, Dias S, Ades A, Welton NJ. Assessing the performance of population adjustment methods for anchored indirect comparisons: A simulation study. Stat Med. 2020;39(30):4885–911.
Austin PC. The performance of different propensity score methods for estimating marginal odds ratios. Stat Med. 2007;26(16):3078–94.
Austin PC, Stafford J. The performance of two data-generation processes for data with specified marginal treatment odds ratios. Commun Stat Simul Comput. 2008;37(6):1039–51.
Remiro-Azócar A. Purely prognostic variables may modify marginal treatment effects for non-collapsible effect measures. 2022. arXiv preprint arXiv:2210.01757.
Goodrich B, Gabry J, Ali I, Brilleman S. rstanarm: Bayesian applied regression modeling via Stan. R Pack Version. 2020;2(1).
Stan Development Team. RStan: the R interface to Stan. R package version 2.21.2; 2020.
Nemes S, Jonasson JM, Genell A, Steineck G. Bias in odds ratios by logistic regression modelling and sample size. BMC Med Res Methodol. 2009;9:1–5.
Naimi AI, Mishler AE, Kennedy EH. Challenges in obtaining valid causal effect estimates with machine learning algorithms. Am J Epidemiol. 2023;192(9):1536–44.
Keil AP, Daza EJ, Engel SM, Buckley JP, Edwards JK. A Bayesian approach to the g-formula. Stat Methods Med Res. 2018;27(10):3183–204.
Keil AP, Daniels JL, Hertz-Picciotto I. Autism spectrum disorder, flea and tick medication, and adjustments for exposure misclassification: the CHARGE (CHildhood Autism Risks from Genetics and Environment) case-control study. Environ Health. 2014;13(1):1–10.
Madigan D, Raftery AE. Model selection and accounting for model uncertainty in graphical models using Occam’s window. J Am Stat Assoc. 1994;89(428):1535–46.
Dixon DO, Simon R. Bayesian subset analysis. Biometrics. 1991;47(3):871–81.
Spiegelhalter DJ, Freedman LS, Parmar MK. Bayesian approaches to randomized trials. J R Stat Soc Ser A (Stat Soc). 1994;157(3):357–87.
Simon R, Freedman LS. Bayesian design and analysis of two x two factorial clinical trials. Biometrics. 1997;53(2):456–64.
Stuart EA, Bradshaw CP, Leaf PJ. Assessing the generalizability of randomized trial results to target populations. Prev Sci. 2015;16:475–85.
Stuart EA, Rhodes A. Generalizing treatment effect estimates from sample to population: A case study in the difficulties of finding sufficient data. Eval Rev. 2017;41(4):357–88.
Vuong ML, Tu PHT, Duong KL, Vo TT. Development of minimum reporting sets of patient characteristics in epidemiological research: a methodological systematic review. Res Methods Med Health Sci. 2023;26320843231191777.
Nguyen TQ, Ebnesajjad C, Cole SR, Stuart EA. Sensitivity analysis for an unobserved moderator in RCTtotargetpopulation generalization of treatment effects. Ann Appl Stat. 2017;11(1):225–47.
Dahabreh IJ, Hernán MA. Extending inferences from a randomized trial to a target population. Eur J Epidemiol. 2019;34:719–22.
Acknowledgements
Not applicable.
Funding
AH is funded by a Canada Research Chair in Statistical Trial Design; Natural Sciences and Engineering Research Council of Canada (award No. RGPIN-2021-03366).
Author information
Contributions
A.R.A., A.H. and G.B. conceived the research idea, developed the methodology, performed the analyses, prepared the figures, and wrote and reviewed the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Remiro-Azócar, A., Heath, A. & Baio, G. Model-based standardization using multiple imputation. BMC Med Res Methodol 24, 32 (2024). https://doi.org/10.1186/s12874-024-02157-x