
Two-stage matching-adjusted indirect comparison



Anchored covariate-adjusted indirect comparisons inform reimbursement decisions where there are no head-to-head trials between the treatments of interest, there is a common comparator arm shared by the studies, and there are patient-level data limitations. Matching-adjusted indirect comparison (MAIC), based on propensity score weighting, is the most widely used covariate-adjusted indirect comparison method in health technology assessment. MAIC has poor precision and is inefficient when the effective sample size after weighting is small.


A modular extension to MAIC, termed two-stage matching-adjusted indirect comparison (2SMAIC), is proposed. This uses two parametric models. One estimates the treatment assignment mechanism in the study with individual patient data (IPD), the other estimates the trial assignment mechanism. The first model produces inverse probability weights that are combined with the odds weights produced by the second model. The resulting weights seek to balance covariates between treatment arms and across studies. A simulation study provides proof-of-principle in an indirect comparison performed across two randomized trials. Nevertheless, 2SMAIC can be applied in situations where the IPD trial is observational, by including potential confounders in the treatment assignment model. The simulation study also explores the use of weight truncation in combination with MAIC for the first time.


Despite enforcing randomization and knowing the true treatment assignment mechanism in the IPD trial, 2SMAIC yields improved precision and efficiency with respect to MAIC in all scenarios, while maintaining similarly low levels of bias. The two-stage approach is effective when sample sizes in the IPD trial are low, as it controls for chance imbalances in prognostic baseline covariates between study arms. It is not as effective when overlap between the trials’ target populations is poor and the extremity of the weights is high. In these scenarios, truncation leads to substantial precision and efficiency gains but induces considerable bias. The combination of a two-stage approach with truncation produces the highest precision and efficiency improvements.


Two-stage approaches to MAIC can increase precision and efficiency with respect to the standard approach by adjusting for empirical imbalances in prognostic covariates in the IPD trial. Further modules could be incorporated for additional variance reduction or to account for missingness and non-compliance in the IPD trial.



In many countries, health technology assessment (HTA) addresses whether new treatments should be reimbursed by public health care systems [1]. This often requires estimating relative effects for interventions that have not been directly compared in a head-to-head trial [2]. Consider that there are two active treatments of interest, say A and B, that have not been evaluated in the same study, but have been contrasted against a comparator C in different studies. In this situation, an indirect comparison of relative treatment effect estimates is required. The analysis is said to be anchored by the common comparator C.

A typical situation in HTA is that where a pharmaceutical company has individual patient data (IPD) from its own study comparing A versus C, which we shall denote the index trial, but only published aggregate-level data (ALD) from another study comparing B versus C, which we call the competitor trial. In this two-study scenario, cross-trial imbalances in effect measure modifiers, effect modifiers for short, make the standard indirect treatment comparisons [3] vulnerable to bias [4]. Novel covariate-adjusted indirect comparison methods have been introduced to account for these imbalances and provide equipoise to the comparison [5,6,7,8,9].

The most popular methodology [10] in peer-reviewed publications and submissions for reimbursement is matching-adjusted indirect comparison (MAIC) [11,12,13]. MAIC weights the subjects in the index trial to create a “pseudo-sample” with balanced moments with respect to the competitor trial. The standard formulation of MAIC proposed by Signorovitch et al. [11] uses a method of moments to estimate a logistic regression, which models the trial assignment mechanism. The weights are derived from the fitted model and represent the odds of assignment to the competitor trial for the subjects in the IPD, conditional on selected baseline covariates.

Under no failures of assumptions, MAIC has produced unbiased treatment effect estimation in simulation studies [7, 14,15,16,17,18,19,20]. Nevertheless, there are some concerns about its inefficiency and instability, particularly where covariate overlap is poor and effective sample sizes (ESSs) after weighting are small [21]. These scenarios are pervasive in health technology appraisals [10]. In these cases, weighting methods are sensitive to inordinate influence by a few subjects with extreme weights and are vulnerable to poor precision. A related concern is that feasible numerical solutions may not exist where there is no common covariate support [21, 22]. Where overlap is weak, methods based on modeling the outcome expectation exhibit greater precision and efficiency than MAIC [21, 23,24,25] but are prone to extrapolation, which may lead to severe bias under model misspecification [26, 27].

Consequently, modifications of MAIC that seek to maximize precision have been presented. An alternative implementation estimates the weights using entropy balancing [17, 28]. The proposal is similar to the standard method of moments, with the additional constraint that the weights are as close as possible to unit weights, potentially penalizing extreme weighting schemes. While the approach has appealing computational properties, Phillippo et al. have proved that it is mathematically equivalent to the standard method of moments [29].

More recently, Jackson et al. have developed a distinct weight estimation procedure that satisfies the conventional method of moments while explicitly maximizing the ESS [22]. This translates into minimizing the dispersion of the weights, with more stable weights improving precision at the expense of inducing bias.

Other approaches to limit the undue impact of extreme weights involve truncating or capping the weights. These are common in survey sampling [30] and in many propensity score settings [31, 32] but are yet to be investigated specifically alongside MAIC. Again, a clear trade-off is involved from a bias-variance standpoint. Lower variance comes at the cost of sacrificing balance and accepting bias [33, 34]. Limitations of weight truncation are that it shifts the target population or estimand definition, and that it requires arbitrary ad hoc decisions on cutoff thresholds.
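As a rough illustration of weight truncation, the sketch below caps weights at an empirical percentile and reports Kish's approximate effective sample size before and after. The 95th percentile cutoff, the nearest-rank percentile definition, the function names and the toy weights are all assumptions made for this example; the text does not prescribe them.

```python
def truncate_weights(weights, pct=95):
    """Cap weights at their pct-th empirical percentile (pct is an integer, 1..100)."""
    ordered = sorted(weights)
    n = len(ordered)
    # Nearest-rank percentile: rank = ceil(pct * n / 100), in integer arithmetic
    rank = max(1, (pct * n + 99) // 100)
    cap = ordered[rank - 1]
    return [min(w, cap) for w in weights]


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / sum of squared weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical weights in which a single subject dominates the reweighted sample
weights = [0.2] * 18 + [1.0, 12.0]
capped = truncate_weights(weights, pct=95)
print(effective_sample_size(weights), effective_sample_size(capped))
```

Capping raises the effective sample size in this toy example, but the capped weights no longer satisfy the original balancing conditions, which is the bias-variance trade-off described above.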

In order to gain efficiency, I propose a modular extension to MAIC which uses two parametric models. One estimates the treatment assignment mechanism in the index study, the other estimates the trial assignment mechanism. The first model produces inverse probability of treatment weights that are combined with the weights produced by the second model. I term this approach two-stage matching-adjusted indirect comparison (2SMAIC).

In the anchored scenario, the conventional version of MAIC relies on randomization in the index trial. In this setting, the treatment assignment mechanism (the true conditional probability of treatment among the subjects) is typically known. In addition, randomization ensures that there is no confounding on expectation. Therefore, it may seem counter-intuitive to model the treatment assignment mechanism in this study. Nevertheless, this additional step is beneficial to control for finite-sample imbalances in prognostic baseline covariates. These imbalances often arise due to chance and correcting for them leads to efficiency gains.

An advantage of 2SMAIC is that, due to incorporating a treatment assignment model, it is also applicable where the index study is observational. In this case, within-study randomization is not leveraged and concerns about internal validity must be addressed by including potential confounders of the treatment-outcome association in the treatment assignment model. The estimation procedure for the trial assignment weights does not necessarily need to be that of Signorovitch et al. [11] and alternative methods could be used [16, 22]. Further modules could be incorporated to account for missingness [35] and non-compliance [36], e.g. dropout or treatment switching, in the index trial.

I conduct a proof-of-concept simulation study to examine the finite-sample performance of 2SMAIC with respect to the standard MAIC when the index study is an RCT. The two-stage approach improves the precision and efficiency of MAIC without introducing bias. The results are consistent with previous research on the efficiency of propensity score estimators [37, 38]. Finally, the use of weight truncation in combination with MAIC is explored for the first time. Example code to implement the methodologies in R is provided in Additional file 1.


Context and data structure

We focus on the following setting, which is common in submissions to HTA agencies. Let S and T denote indicators for the assigned study and the assigned treatment, respectively. There are two separate studies that enrolled distinct sets of participants and have now been completed. The index study (S=1) compares active treatment A (T=1) versus C (T=0), e.g. standard of care or placebo. The competitor study (S=2) evaluates active treatment B (T=2) versus C (T=0). Covariate-adjusted indirect comparisons such as MAIC perform a treatment comparison in the S=2 sample, implicitly assumed to be of policy interest. We ask ourselves the question: what would be the marginal treatment effect for A versus B had these treatments been compared in an RCT conducted in S=2?

The marginal treatment effect for A vs. B is estimated on the linear predictor (e.g. mean difference, log-odds ratio or log hazard ratio) scale as:

$$\hat{\Delta}_{12}^{(2)} = \hat{\Delta}_{10}^{(2)} - \hat{\Delta}_{20}^{(2)}, \tag{1}$$

where \(\hat {\Delta }_{10}^{(2)}\) is an estimate of the hypothetical marginal treatment effect for A vs. C in the competitor study sample, and \(\hat {\Delta }_{20}^{(2)}\) is an estimate of the marginal treatment effect of B vs. C in the competitor study sample. MAIC uses weighting to transport inferences for the marginal A vs. C treatment effect from S=1 to S=2. The estimate \(\hat {\Delta }_{10}^{(2)}\) is produced, which is then input into Eq. 1. Because the within-trial relative effect estimates are assumed statistically independent, their variances are summed to estimate the variance of the marginal treatment effect for A vs. B.
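The final step of the anchored comparison can be sketched in a few lines. The function name, the normal-approximation interval and the numerical inputs below are illustrative assumptions; only the subtraction of the two relative effects and the summation of their variances come from the text.

```python
import math


def anchored_indirect_comparison(delta_10, var_10, delta_20, var_20, z_crit=1.959964):
    """Combine A vs. C and B vs. C estimates (both for S=2) into A vs. B."""
    delta_12 = delta_10 - delta_20
    # The two within-trial estimates are assumed independent, so variances add
    se_12 = math.sqrt(var_10 + var_20)
    interval = (delta_12 - z_crit * se_12, delta_12 + z_crit * se_12)
    return delta_12, se_12, interval


# Hypothetical inputs on the linear predictor (here, mean difference) scale
estimate, std_error, interval = anchored_indirect_comparison(-1.2, 0.09, -0.8, 0.16)
print(estimate, std_error, interval)
```

The Wald-type interval is one convenient choice for the illustration; it is not the only way to summarize uncertainty in the indirect comparison.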

The manufacturer submitting evidence for reimbursement has access to individual-level data \(\mathcal {D}_{AC}=({\boldsymbol {x},\boldsymbol {t},\boldsymbol {y}})\) on covariates, treatment and outcomes for the participants in its trial. Here, \(\boldsymbol{x}\) is a matrix of pre-treatment baseline covariates (e.g. comorbidities, age, gender), of size \(n \times k\), where \(n\) is the total number of subjects in the study sample and \(k\) is the number of covariates. A row vector \(\boldsymbol{x}_{i}=(x_{i,1},x_{i,2},\dots,x_{i,k})\) of \(k\) covariates is recorded for each participant \(i=1,\dots,n\). We let \(\boldsymbol{y}=(y_{1},y_{2},\dots,y_{n})\) denote a vector of the clinical outcome of interest and \(\boldsymbol{t}=(t_{1},t_{2},\dots,t_{n})\) denote a binary treatment indicator vector. We shall assume that there is no loss to follow-up or missing data on covariates, treatment and outcome in \(\mathcal {D}_{AC}\).

We consider all baseline covariates to be prognostic of the clinical outcome and select a subset of these, \(\boldsymbol{z} \subseteq \boldsymbol{x}\), as marginal effect modifiers for A with respect to C on the linear predictor scale, with a row vector \(\boldsymbol{z}_{i}\) recorded for each patient i. In the absence of randomization, the variables in \(\boldsymbol{x}\) would induce confounding between the treatment arms in the index study (internal validity bias). On the other hand, cross-trial imbalances in the variables in \(\boldsymbol{z}\) induce external validity bias with respect to the competitor study sample.

Neither the manufacturer submitting the evidence nor the HTA agency evaluating it has access to IPD for the competitor trial. We let \(\mathcal {D}_{BC}=[\boldsymbol {\theta }_{\boldsymbol {x}}, \hat {\Delta }_{20}^{(2)}, \hat {V}(\hat {\Delta }_{20}^{(2)})]\) represent the published ALD that is available for this study. No patient-level covariates, treatment or outcomes are available. Here, \(\boldsymbol{\theta}_{\boldsymbol{x}}\) denotes a vector of means or proportions for the covariates, although higher-order moments such as variances may also be available. An assumption is that a sufficiently rich set of baseline covariates has been measured for the competitor study, namely that summaries for the subset \(\boldsymbol{\theta}_{\boldsymbol{z}} \subseteq \boldsymbol{\theta}_{\boldsymbol{x}}\) of covariates that are marginal effect modifiers are described in the table of baseline characteristics in the study publication.

Also available is an internally valid estimate \(\hat {\Delta }_{20}^{(2)}\) of the marginal treatment effect for B vs. C in the competitor study sample, and an estimate \(\hat {V}(\hat {\Delta }_{20}^{(2)})\) of its variance. These are either directly reported in the publication or, assuming that the competitor study is a well-conducted RCT, derived from crude aggregate outcomes in the literature.

Matching-adjusted indirect comparison

In MAIC, IPD from the index study are weighted so that the moments of selected covariates are balanced with respect to the published moments of the competitor study. The weight \(w_{i}\) for each participant i in the index trial is estimated using a logistic regression:

$$\ln(w_{i}) = \ln[w(\boldsymbol{z}_{i})] = \ln \left[ \frac{Pr(S=2 \mid \boldsymbol{z}_{i})}{1 - Pr(S=2 \mid \boldsymbol{z}_{i})} \right] = \alpha_{0} + \boldsymbol{z}_{i}\boldsymbol{\alpha}_{\boldsymbol{1}}, \tag{2}$$

where \(\alpha_{0}\) is the model intercept and \(\boldsymbol{\alpha}_{\boldsymbol{1}}\) is a vector of model coefficients. While most applications of weighting, e.g. to control for confounding in observational studies, construct “inverse probability” weights for treatment assignment, MAIC uses “odds weighting” [39, 40] to model trial assignment. The weight \(w_{i}\) represents the conditional odds that an individual i with covariates \(\boldsymbol{z}_{i}\), selected as marginal effect modifiers, is enrolled in the competitor study. Alternatively, the weight represents the inverse conditional odds that the individual is enrolled in the index study.

The logistic regression parameters in Eq. 2 cannot be derived using conventional methods such as maximum-likelihood estimation, due to unavailable IPD for the competitor trial. Signorovitch et al. propose using a method of moments instead to enforce covariate balance across studies [11]. Prior to balancing, the IPD covariates are centered on the means or proportions published for the competitor trial. The centered covariates for subject i in the IPD are defined as \(\boldsymbol {z}^{\boldsymbol {*}}_{i} = \boldsymbol {z}_{i} - \boldsymbol {\theta }_{\boldsymbol {z}}\).

Weight estimation involves minimizing the objective function:

$$Q(\boldsymbol{\alpha}_{\boldsymbol{1}}) = \sum\limits_{i=1}^{n} \exp \left(\boldsymbol{z}^{\boldsymbol{*}}_{i} \boldsymbol{\alpha}_{\boldsymbol{1}}\right). \tag{3}$$

The function \(Q(\boldsymbol{\alpha}_{\boldsymbol{1}})\) is convex [11] and can be minimized using standard convex optimization algorithms [41]. Provided that there is adequate overlap, minimization yields the unique finite solution: \(\hat {\boldsymbol {\alpha }}_{\boldsymbol {1}}=\text {argmin}[Q(\boldsymbol {\alpha }_{\boldsymbol {1}})]\). Feasible solutions do not exist if the values observed for a covariate in \(\boldsymbol{z}\) are all greater than, or all less than, its corresponding element in \(\boldsymbol{\theta}_{\boldsymbol{z}}\) [22].

After minimizing the objective function in Eq. 3, the weight estimated for the i-th participant in the IPD is:

$$\hat{w}_{i} = \exp(\boldsymbol{z}^{\boldsymbol{*}}_{i}\hat{\boldsymbol{\alpha}}_{\boldsymbol{1}}). \tag{4}$$
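The method-of-moments weight estimation can be sketched as follows. This is a minimal illustration and not the code of Additional file 1: it uses a damped Newton iteration as one possible convex optimizer, and the function names, tolerances and toy covariate values are assumptions made for the example.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def maic_weights(z, theta_z, tol=1e-10, max_iter=100):
    """Minimize Q(alpha) = sum_i exp(z*_i alpha) and return (weights, alpha)."""
    k = len(theta_z)
    # Center the IPD covariates on the published competitor-trial means
    zc = [[row[j] - theta_z[j] for j in range(k)] for row in z]

    def weights_at(alpha):
        return [math.exp(sum(c * a for c, a in zip(row, alpha))) for row in zc]

    alpha = [0.0] * k
    for _ in range(max_iter):
        w = weights_at(alpha)
        grad = [sum(wi * row[j] for wi, row in zip(w, zc)) for j in range(k)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(wi * row[p] * row[q] for wi, row in zip(w, zc))
                 for q in range(k)] for p in range(k)]
        step = solve_linear(hess, grad)
        # Backtracking: halve the Newton step until Q no longer increases
        t, q_old = 1.0, sum(w)
        while t > 1e-8:
            candidate = [a - t * s for a, s in zip(alpha, step)]
            if sum(weights_at(candidate)) <= q_old:
                break
            t /= 2
        alpha = candidate
    return weights_at(alpha), alpha


# Hypothetical IPD with two effect modifiers and hypothetical published means
z = [[0.2, 1.0], [0.5, 0.4], [0.9, 0.7], [0.3, 0.1], [0.7, 0.9], [0.6, 0.2]]
theta_z = [0.55, 0.5]
w_hat, alpha_hat = maic_weights(z, theta_z)
weighted_means = [sum(w * row[j] for w, row in zip(w_hat, z)) / sum(w_hat) for j in range(2)]
print(weighted_means)  # should reproduce theta_z up to numerical tolerance
```

Setting the gradient of Q to zero is what enforces balance: at the minimum, the weighted covariate means in the IPD coincide with the competitor-trial means.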

The estimated weights are relative, in the sense that any weights that are proportional are equally valid [22]. Weighting reduces the ESS of the index trial. The approximate ESS after weighting is typically estimated as \(\left (\sum _{i}^{n}\hat {w}_{i}\right)^{2}/\sum _{i}^{n}\hat {w}_{i}^{2}\) [5, 42]. Low values of the ESS suggest that a few influential participants with disproportionate weights dominate the reweighted sample.
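The approximate ESS is straightforward to compute from the estimated weights. The sketch below uses made-up weight vectors; the function name is an assumption.

```python
def effective_sample_size(weights):
    """Approximate ESS: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical weight vectors for 100 subjects
uniform = [1.0] * 100
skewed = [10.0] + [0.1] * 99

print(effective_sample_size(uniform))
print(effective_sample_size(skewed))
# Proportional weights give the same ESS, consistent with the weights being relative
print(effective_sample_size([5.0 * w for w in skewed]))
```

With equal weights the ESS equals the original sample size, whereas a single dominant weight collapses it to a handful of subjects.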

Consequently, marginal mean outcomes for treatments A and C in the competitor study sample (S=2) are estimated as the weighted average:

$$\hat{\mu}^{(2)}_{t} = \frac{\sum_{i=1}^{n_{t}} y_{i,t} \hat{w}_{i,t}}{\sum_{i=1}^{n_{t}} \hat{w}_{i,t}}, \tag{5}$$

where \(n_{t}\) denotes the number of participants assigned to treatment \(t \in \{0,1\}\) of the index trial, \(y_{i,t}\) represents the observed clinical outcome for subject i in arm t, and \(\hat {w}_{i,t}\) is the weight assigned to patient i under treatment t. For binary outcomes, \(\hat {\mu }_{t}\) would estimate the expected marginal outcome probability under treatment t. Absolute outcome estimates may be desirable as inputs to health economic models [25] or in unanchored comparisons made in the absence of a common control group.

In anchored comparisons, the objective is to estimate a relative effect for A vs. C, as opposed to absolute outcomes. Indirect treatment comparisons are typically conducted on the linear predictor scale [3, 4, 6]. Consequently, this scale is also used to define effect modification, which is scale specific [5].

One can convert the mean absolute outcome predictions produced by Eq. 5 from the natural scale to the linear predictor scale, and compute the marginal treatment effect for A vs. C in S=2 as the difference between the average linear predictions:

$$\hat{\Delta}_{10}^{(2)} = g \left(\hat{\mu}_{1}^{(2)} \right) - g \left(\hat{\mu}_{0}^{(2)} \right). \tag{6}$$

Here, g(·) is an appropriate link function, e.g. the identity link produces a mean difference for continuous-valued outcomes, and the \(\text {logit} \left (\hat {\mu }^{(2)}_{t} \right) = \ln \left [\hat {\mu }^{(2)}_{t}/\left (1-\hat {\mu }^{(2)}_{t} \right)\right ]\) generates a log-odds ratio for binary outcomes. Different, potentially more interpretable, choices such as relative risks and risk differences are possible for the marginal contrast. One can map to these scales by manipulating \(\hat {\mu }_{1}^{(2)}\) and \(\hat {\mu }_{0}^{(2)}\) differently.
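The weighted averages and the contrast on the link scale can be sketched as below. The function names, the binary toy outcomes and the weights are hypothetical; the default link is the identity and the logit is passed in for a log-odds ratio.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_effect(y, t, w, link=lambda mu: mu):
    """Difference in link-transformed weighted means, treatment 1 minus treatment 0."""
    mu = {}
    for arm in (0, 1):
        y_arm = [yi for yi, ti in zip(y, t) if ti == arm]
        w_arm = [wi for wi, ti in zip(w, t) if ti == arm]
        mu[arm] = weighted_mean(y_arm, w_arm)
    return link(mu[1]) - link(mu[0])


# Hypothetical binary outcomes, treatment indicators and estimated weights
y = [1, 0, 1, 1, 0, 0, 1, 0]
t = [1, 1, 1, 1, 0, 0, 0, 0]
w = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

risk_difference = marginal_effect(y, t, w)            # identity link
log_odds_ratio = marginal_effect(y, t, w, link=logit)  # logit link
print(risk_difference, log_odds_ratio)
```

Other marginal contrasts, such as a log relative risk, follow by passing a different link, e.g. `math.log`.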

Alternatively, the weights generated by Eq. 4 can be used to fit a simple regression of outcome on treatment to the IPD [43]. The model can be fitted using maximum-likelihood estimation, weighting the contribution of each individual i to the likelihood by \(\hat {w}_{i}\). In this approach, the treatment coefficient of the fitted weighted model is the estimated marginal treatment effect \(\hat {\Delta }_{10}^{(2)}\) for A vs. C in S=2.

The original approach to MAIC uses a robust sandwich-type variance estimator [44] to compute the standard error of \(\hat {\Delta }_{10}^{(2)}\). This relies on large-sample properties and has understated variability with small ESSs in a previous simulation study investigating MAIC [7] and in other settings [45,46,47,48]. In addition, most implementations of the sandwich estimator, e.g. when fitting the weighted regression [49], ignore the estimation of the trial assignment model, assuming the weights to be fixed quantities. While analytic expressions that incorporate the estimation of the weights could be derived, a practical alternative is to resample via the ordinary non-parametric bootstrap [23, 50, 51], re-estimating the weights and the marginal treatment effect for A vs. C in each bootstrap iteration. Point estimates, standard errors and interval estimates can be directly calculated from the bootstrap replicates.
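A minimal sketch of the bootstrap procedure is given below, under simplifying assumptions: a single effect modifier, a continuous outcome, a percentile interval, and simulated toy data. The function names, the number of resamples and the seed are all illustrative choices. The key point is that the weights are re-estimated inside every bootstrap iteration.

```python
import math
import random


def maic_weights_1d(z, theta, max_iter=50):
    """Method-of-moments MAIC weights for one covariate (Newton's method)."""
    zc = [zi - theta for zi in z]
    alpha = 0.0
    for _ in range(max_iter):
        w = [math.exp(alpha * c) for c in zc]
        grad = sum(wi * c for wi, c in zip(w, zc))
        hess = sum(wi * c * c for wi, c in zip(w, zc))
        step = max(-1.0, min(1.0, grad / hess))  # clamp the step for stability
        alpha -= step
        if abs(step) < 1e-12:
            break
    return [math.exp(alpha * c) for c in zc]


def maic_mean_difference(z, t, y, theta):
    """Weighted mean difference for treatment 1 vs. 0, with freshly estimated weights."""
    w = maic_weights_1d(z, theta)

    def weighted_mean(arm):
        num = sum(wi * yi for wi, yi, ti in zip(w, y, t) if ti == arm)
        den = sum(wi for wi, ti in zip(w, t) if ti == arm)
        return num / den

    return weighted_mean(1) - weighted_mean(0)


def bootstrap_maic(z, t, y, theta, n_boot=500, seed=1):
    rng = random.Random(seed)
    n = len(y)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        reps.append(maic_mean_difference(
            [z[i] for i in idx], [t[i] for i in idx], [y[i] for i in idx], theta))
    mean = sum(reps) / n_boot
    se = math.sqrt(sum((r - mean) ** 2 for r in reps) / (n_boot - 1))
    reps.sort()
    interval = (reps[int(0.025 * n_boot)], reps[int(0.975 * n_boot) - 1])
    return mean, se, interval


# Hypothetical index trial: one effect modifier, 1:1 allocation, continuous outcome
rng = random.Random(2023)
n = 200
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
t = [i % 2 for i in range(n)]
y = [1.0 + zi + ti * (-2.0 + zi) + rng.gauss(0.0, 1.0) for zi, ti in zip(z, t)]

estimate, std_error, interval = bootstrap_maic(z, t, y, theta=0.3)
print(estimate, std_error, interval)
```

In this toy set-up the treatment effect is linear in the effect modifier, so the target value at a competitor mean of 0.3 is -2 + 0.3 = -1.7. The bootstrap mean should land near it, up to sampling error.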

We briefly describe the assumptions required by MAIC and their implications:

  1. Internal validity of the effect estimates derived from the index and competitor studies. This is certainly feasible where the studies are RCTs because randomization ensures exchangeability over treatment assignment on expectation. While internal validity may hold in RCTs, it is a more stringent condition for observational studies. The absence of informative measurement error, missing data, non-adherence, etc. is assumed.

  2. Consistency under parallel studies [52]. There is only one well-defined version of each treatment [53] or any variations in the versions of treatment are irrelevant [54, 55]. This applies to the common comparator C in particular.

  3. Conditional transportability (exchangeability) of the marginal treatment effect for A vs. C from the index to the competitor study [39]. Namely, trial assignment does not affect this measure, conditional on z. Prior research has referred to this assumption as the conditional constancy of relative effects [5, 6, 9]. It is plausible if z comprises all of the covariates that are considered to modify the marginal treatment effect for A vs. C (i.e., there are no unmeasured effect modifiers) [56,57,58].

  4. Sufficient overlap. The ranges of the selected covariates in S=1 should cover their respective moments in S=2. Overlap violations can be deterministic or random. The former arise structurally, due to non-overlapping trial target populations (eligibility criteria). The latter arise empirically due to chance, particularly where sample sizes are small [60]. Therefore, overlap can be assessed based on absolute sample sizes. The ESS is a convenient one-number diagnostic.

  5. Correct specification of the S=2 covariate distribution. Analysts can only approximate the joint distribution because IPD are unavailable for the competitor study. Covariate correlations are rarely published for S=2 and therefore cannot be balanced by MAIC. In that case, they are assumed equal to those in the pseudo-sample formed by weighting the IPD [5].

I make a brief remark on the specification of the parametric trial assignment model in Eq. 2. This does not necessarily need to be correct as long as it balances all the covariates, and potential transformations of these covariates, e.g. polynomial transformations and product terms, that modify the marginal treatment effect for A vs. C [9, 23]. Squared terms are often included to balance variances for continuous covariates [11] but initial simulation studies do not report performance benefits [14, 17]. This is probably due to greater reductions in ESS and precision [25].

The identification of effect modifiers will likely require prior background knowledge and substantive domain expertise. Bias-variance trade-offs are also important. Failing to include an influential effect modifier in z, whether in imbalance or not, leads to bias in S=2 [5, 40, 61]. On the other hand, the inclusion of covariates that are not effect modifiers reduces overlap, thereby increasing the chance of extreme weights. This decreases precision without improving the potential for bias reduction [6, 62], even if the covariates are strongly imbalanced across studies, that is, even if they predict or are associated with trial assignment.

Put simply, as is the case for other weighting-based methods [63, 64], MAIC is potentially unbiased if either the trial assignment mechanism or the outcome-generating mechanism is known, with the latter leading to better performance due to reduced variance and increased efficiency.

Two-stage matching-adjusted indirect comparison

While the standard MAIC models the trial assignment mechanism, two-stage MAIC (2SMAIC) additionally models the treatment assignment mechanism in the index trial. The treatment assignment model is estimated to produce inverse probability of treatment weights. Then, these are combined with the odds weights generated by the standard MAIC. The resulting weights seek to balance covariate moments between the studies and the treatment arms of the index trial.

For the treatment assignment mechanism, a propensity score logistic regression of treatment on the covariates is fitted to the IPD:

$$\text{logit}[e_{i}] = \text{logit}[e(\boldsymbol{x}_{i})] = \text{logit}[Pr(T=1\mid \boldsymbol{x}_{i})] = \beta_{0} + \boldsymbol{x}_{i} \boldsymbol{\beta}_{\boldsymbol{1}}, \tag{7}$$

where \(\beta_{0}\) and \(\boldsymbol{\beta}_{\boldsymbol{1}}\) parametrize the logistic regression. The propensity score \(e_{i}\) is defined as the conditional probability that participant i is assigned treatment A versus treatment C given measured covariates \(\boldsymbol{x}_{i}\) [65].

Having fitted the model in Eq. 7, e.g. using maximum-likelihood estimation, propensity scores for the subjects in the index trial are predicted using:

$$\hat{e}_{i} = \text{expit}[\hat{\beta}_{0} + \boldsymbol{x}_{i} \hat{\boldsymbol{\beta}}_{\boldsymbol{1}}], \tag{8}$$

where \(\text {expit}(\cdot)=\exp (\cdot)/[1+\exp (\cdot)]\), \(\hat {\beta }_{0}\) and \(\hat {\boldsymbol {\beta }}_{\boldsymbol {1}}\) are point estimates of the logistic regression parameters, and \(\hat {e}_{i}\) is an estimate of \(e_{i}\). Inverse probability of treatment weights are constructed by taking the reciprocal of the estimated conditional probability of the treatment assigned in the index study [37]. That would be \(1/\hat {e}_{i}\) for units under treatment A and \(1/(1-\hat {e}_{i})\) for units under treatment C.
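The treatment assignment model can be fitted by maximum likelihood with a few Newton-Raphson iterations, as in the sketch below. This is an illustration only: the function names, iteration limits and the simulated two-covariate randomized trial are assumptions, and in practice a standard statistical package would be used.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_propensity_scores(x, t, max_iter=25, tol=1e-10):
    """Logistic regression of treatment on covariates; returns (beta, e_hat)."""
    design = [[1.0] + list(row) for row in x]  # prepend the intercept column
    p = len(design[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        e = [expit(sum(b * d for b, d in zip(beta, row))) for row in design]
        score = [sum((ti - ei) * row[j] for ti, ei, row in zip(t, e, design))
                 for j in range(p)]
        info = [[sum(ei * (1.0 - ei) * row[j1] * row[j2] for ei, row in zip(e, design))
                 for j2 in range(p)] for j1 in range(p)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    e_hat = [expit(sum(b * d for b, d in zip(beta, row))) for row in design]
    return beta, e_hat


# Hypothetical randomized index trial: 100 subjects, two covariates, 1:1 allocation
rng = random.Random(7)
x = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
t = [1] * 50 + [0] * 50
rng.shuffle(t)

beta_hat, e_hat = fit_propensity_scores(x, t)
ipt_weights = [1.0 / e if ti == 1 else 1.0 / (1.0 - e) for e, ti in zip(e_hat, t)]
print(beta_hat, min(e_hat), max(e_hat))
```

Under randomization the estimated propensity scores scatter around the known allocation probability of 0.5, and the deviations reflect chance imbalances in the covariates.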

Consequently, the weights produced by the standard MAIC (Eq. 4) are rescaled by the estimated inverse probability of treatment weights. The contribution of each subject i in the IPD is weighted by:

$$\hat{\omega}_{i} = \frac{t_{i} \hat{w}_{i}}{\hat{e}_{i}} + \frac{(1-t_{i}) \hat{w}_{i}}{(1-\hat{e}_{i})}. \tag{9}$$

The weights \(\{ \hat {w}_{i}, i=1,\dots,n \}\) estimated by the standard MAIC are odds, constrained to be positive. These balance the index and competitor studies in terms of the selected effect modifier moments. The estimated propensity scores \(\{ \hat {e}_{i},\, i=1,\dots,n \}\) are probabilities bounded away from zero and one. Therefore, the weights \(\{ \hat {\omega }_{i},\, i=1,\dots,n \}\) produced by 2SMAIC in Eq. 9 are constrained to be positive. These weights achieve balance in effect modifier moments across studies, but also seek to balance covariate moments between the index trial’s treatment groups.
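Combining the two sets of weights is a one-line operation once the odds weights and the propensity scores are available. The sketch below uses made-up values for both; the function name is an assumption.

```python
def two_stage_weights(w_hat, e_hat, t):
    """Rescale MAIC odds weights by inverse probability of treatment weights."""
    return [w / e if ti == 1 else w / (1.0 - e)
            for w, e, ti in zip(w_hat, e_hat, t)]


# Hypothetical odds weights, estimated propensity scores and treatment indicators
w_hat = [1.0, 0.5, 2.0, 1.5]
e_hat = [0.5, 0.4, 0.6, 0.5]
t = [1, 0, 1, 0]

omega_hat = two_stage_weights(w_hat, e_hat, t)
print(omega_hat)

# With the known randomization probability (0.5 for everyone), the two-stage
# weights are the MAIC weights multiplied by a constant, so nothing changes
omega_known = two_stage_weights(w_hat, [0.5] * 4, t)
print(omega_known)
```

The second call illustrates why the gains of the two-stage approach come from using estimated rather than true propensity scores: plugging in the true value of 0.5 only rescales the weights, and proportional weights are equivalent.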

Marginal mean outcomes for treatments A and C in the competitor study sample are estimated as the weighted average of observed outcomes:

$$\hat{\mu}^{(2)}_{t} = \frac{\sum_{i=1}^{n_{t}} y_{i,t} \hat{\omega}_{i,t}}{\sum_{i=1}^{n_{t}} \hat{\omega}_{i,t}}, \tag{10}$$

where \(\hat {\omega }_{i,t}\) is the weight assigned to patient i under treatment t. One can convert the mean absolute outcome predictions generated by Eq. 10 to the linear predictor scale, and compute the marginal treatment effect for A vs. C in S=2 as the difference between the average linear predictions, as per Eq. 6. Alternatively, a weighted regression of outcome on treatment alone can be fitted to the IPD, in which case the treatment coefficient of the fitted model represents the estimated marginal treatment effect \(\hat {\Delta }_{10}^{(2)}\) for A vs. C in S=2.

Inference can be based on a robust sandwich-type variance estimator or on resampling approaches such as the non-parametric bootstrap. As noted previously, the sandwich variance estimator is biased downwards when the ESS after weighting is small, leading to overprecision. In practice, the non-parametric bootstrap is a preferred option, re-estimating both the trial assignment model and the treatment assignment model in each iteration. This approach explicitly accounts for the estimation of the weights and is expected to perform better where the ESS is small.

It may seem counter-intuitive to estimate the treatment assignment mechanism when the index trial is an RCT. The randomized design implies that the true propensity scores \(\{e_{i},\, i=1,\dots,n\}\) are fixed and known. For instance, consider a marginally randomized two-arm trial with a 1:1 treatment allocation ratio. The trial investigators have determined in advance that the probability of being assigned to active treatment versus control is \(e_{i}=0.5\) for all i.

The rationale for estimating the propensity scores is the following. Randomization guarantees that there is no confounding on expectation [66]. Nevertheless, covariate balance is a large-sample property, and one may still observe residual covariate imbalances between treatment groups due to chance, especially when the trial sample size is small [67]. As formulated by Senn [66], “over all randomizations the groups are balanced; for a particular randomization they are unbalanced.” The use of estimated propensity scores makes it possible to correct for random finite-sample imbalances in prognostic baseline covariates. In the RCT literature, inverse probability of treatment weighting is an established approach for covariate adjustment [68], and has increased precision, efficiency and power with respect to unadjusted analyses in the estimation of marginal treatment effects [48, 69].

So far, the use of anchored MAIC has been limited to situations where the index trial is an RCT. 2SMAIC can be used when the index study is observational, provided that the baseline covariates in x offer sufficient control for confounding. In non-randomized studies, the true propensity score for each participant in the index study is unknown, and additional conditions are needed to produce internally valid estimates of the marginal treatment effect for A vs. C. These are: (1) conditional exchangeability over treatment assignment [70]; and (2) positivity of treatment assignment [60]. Randomized trials tend to meet these assumptions by design. The assumptions have conceptual parallels with the conditional transportability and overlap conditions previously described for MAIC.

The first assumption indicates that the potential outcomes of subjects in each treatment group are independent of the treatment assigned after conditioning on the selected covariates. It relies on all confounders of the effect of treatment on outcome being measured and accounted for [71]. The second assumption indicates that, for every participant in the index study, the probability of being assigned to either treatment is positive, conditional on the covariates selected to ensure exchangeability [60]. This requires overlap between the joint covariate distributions of the subjects under treatment A and under treatment C. This assumption is threatened if there are few or no individuals from either treatment group in certain covariate subgroups/strata.

Simulation study


The objectives of the simulation study are to provide proof-of-principle for 2SMAIC and to benchmark its statistical performance against that of MAIC in an anchored setting where the index study is an RCT. We also investigate whether weight truncation can improve the performance of MAIC and 2SMAIC by reducing the variance caused by extreme weights.

Each method is assessed using the following frequentist characteristics [72]: (1) unbiasedness; (2) precision; (3) efficiency (accuracy); and (4) randomization validity (valid confidence interval estimates). The selected performance metrics specifically evaluate these criteria. The ADEMP (Aims, Data-generating mechanisms, Estimands, Methods, Performance measures) framework [72] is used to describe the simulation study design. Example R code implementing the methodologies is provided in Additional file 1. All simulations and analyses have been conducted in R software version 4.1.1 [73].

Data-generating mechanisms

We consider continuous outcomes using the mean difference as the measure of effect. For the index and competitor studies, outcome \(y_{i}\) for participant i is generated as:

$$y_{i} = \beta_{0} + \boldsymbol{x}_{i}\boldsymbol{\beta}_{\boldsymbol{1}} + \left(\beta_{t} + \boldsymbol{x}_{i}\boldsymbol{\beta}_{\boldsymbol{2}} \right)\mathbb{1}(t_{i}=1) + \epsilon_{i}, \tag{11}$$

using the notation of the index study data. Each \(\boldsymbol{x}_{i}\) contains the values of three correlated continuous covariates, which have been simulated from a multivariate normal distribution with pre-specified means and covariance matrix. There is some positive correlation between the three covariates, with pairwise Pearson correlation levels set to 0.2. The covariates have main effects and are prognostic of individual-level outcomes independently of treatment. They also have first-order covariate-treatment product terms, thereby modifying the conditional (and marginal) effects of both A and B versus C on the mean difference scale, i.e., \(\boldsymbol{z}\) is equivalent to \(\boldsymbol{x}\). The term \(\epsilon_{i}\) is an error term for subject i generated from a standard (zero-mean, unit-variance) normal distribution.

The main “prognostic” coefficient is \(\beta_{1,k}=2\) for each covariate \(k\). This is considered a strong covariate-outcome association. The interaction coefficient is \(\beta_{2,k}=1\) for each covariate \(k\), indicating notable effect modification. We set the intercept \(\beta_{0}=5\). Active treatments A and B are assumed to have the same set of effect modifiers with respect to the common comparator, and identical interaction coefficients for each effect modifier. Consequently, the shared (conditional) effect modifier assumption holds [5]. The main treatment coefficient \(\beta_{t}=-2\) is considered a strong conditional treatment effect versus the control at baseline (when the covariate values are zero).
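The data-generating mechanism described above can be sketched in a few lines. The following is an illustrative Python sketch, not the R code of Additional file 1. The function name `simulate_trial` and its arguments are hypothetical. Two details are assumptions: the equicorrelated covariates are built from a shared normal factor (one of several ways to obtain a multivariate normal with pairwise correlation 0.2), and the 1:1 allocation is implemented with fixed arm sizes.

```python
import math
import random


def simulate_trial(n, cov_mean, seed, sd=0.4, rho=0.2,
                   beta0=5.0, beta1=2.0, beta2=1.0, beta_t=-2.0):
    """Simulate one two-arm trial under the linear outcome model in the text.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Assumption: fixed arm sizes (n/2 each), assigned in random order.
    arms = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(arms)
    rows = []
    for t in arms:
        # A shared standard normal factor induces pairwise correlation rho
        # between the three covariates while keeping each variance at sd**2.
        shared = rng.gauss(0.0, 1.0)
        x = [cov_mean + sd * (math.sqrt(rho) * shared
                              + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
             for _ in range(3)]
        # y = beta0 + x*beta1 + (beta_t + x*beta2) * 1(t = 1) + error
        y = (beta0 + beta1 * sum(x)
             + (beta_t + beta2 * sum(x)) * t
             + rng.gauss(0.0, 1.0))
        rows.append({"x": x, "t": t, "y": y})
    return rows


# Example: an index trial with n = 140 and covariate means of 0.3 (poor overlap).
index_trial = simulate_trial(140, 0.3, seed=2022)
```

Using the same coefficients for both studies reflects the shared effect modifier assumption; only the covariate means differ between the index and competitor trials.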

The continuous outcome may represent a biomarker indicating disease severity. The covariates are comorbidities associated with higher values of the biomarker and which interact with the active treatments to hinder their effect versus the control.

It is assumed that the index and competitor studies are simple, marginally randomized, RCTs. The number of participants in the competitor RCT is 300, with a 1:1 allocation ratio for active treatment vs. control. For this study, individual-level covariates are summarized as means. These would be available to the analyst in a table of baseline characteristics in the study publication. Individual-level outcomes are aggregated by fitting a simple linear regression of outcome on treatment to produce an unadjusted estimate of the marginal mean difference for B vs. C, with its corresponding nominal standard error. This information would also be available in the published study.

We adopt a factorial arrangement using two index trial sample sizes times three overlap settings. This results in a total of six simulation scenarios. The following parameter values are varied:

  • Sample sizes of \(n \in \{140, 200\}\) are considered for the index trial, with an allocation ratio of 1:1 for intervention A vs. C. The sample sizes are small but not unusual in applications of MAIC in HTA submissions [10]. It is anticipated that smaller trials are subject to a greater chance of covariate imbalance than larger trials [74].

  • The level of (deterministic) covariate overlap. Covariates follow normal marginal distributions in both studies. For the competitor trial, the marginal distribution means are fixed at 0.6. For the index trial, the mean \(\mu_{k} \in \{0.5, 0.4, 0.3\}\) for each covariate \(k\). These settings yield strong, moderate and poor overlap, respectively. The standard deviations in both studies are fixed at 0.4, i.e., a one standard deviation increase in each covariate is associated with a 0.8 unit increase in the outcome. Greater covariate imbalances across studies lead to poorer overlap between the trials’ target populations, which translates into more variable weights and a lower ESS. Unless otherwise stated, when describing the results of the simulation study, “covariate overlap” relates to deterministic overlap between the trials’ target populations and not to random violations arising due to small sample sizes.


The target estimand is the marginal mean difference for A vs. B in S=2. The treatment coefficient βt=−2 is the same for both A vs. C and B vs. C, and the shared (conditional) effect modifier assumption holds. Therefore, the true conditional treatment effects for A vs. C and B vs. C in S=2 are the same (−2+3×(0.6×1)=−0.2). Because mean differences are collapsible, the true marginal treatment effects for A vs. C and B vs. C coincide with the corresponding conditional estimands. The true marginal effect for A vs. B in S=2 is a composite of that for A vs. C and B vs. C, which cancel out. Consequently, the true marginal mean difference for A vs. B in S=2 is zero.
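As a worked illustration of the paragraph above, and assuming that Eq. 1 takes the usual anchored form \(\Delta_{12}^{(2)} = \Delta_{10}^{(2)} - \Delta_{20}^{(2)}\), the true effects follow directly from the outcome model. With covariate means \(\mu_{k}^{(2)}=0.6\) in S=2:

$$\Delta_{10}^{(2)} = \Delta_{20}^{(2)} = \beta_{t} + \sum_{k=1}^{3} \beta_{2,k}\,\mu_{k}^{(2)} = -2 + 3 \times (1 \times 0.6) = -0.2, \qquad \Delta_{12}^{(2)} = \Delta_{10}^{(2)} - \Delta_{20}^{(2)} = 0.$$

By the same calculation, the marginal effect of A vs. C in the index trial population (S=1) would be

$$-2 + 3 \times (1 \times \mu_{k}) \in \{-0.5, -0.8, -1.1\} \quad \text{for} \quad \mu_{k} \in \{0.5, 0.4, 0.3\},$$

so the gap that population adjustment must close grows from 0.3 to 0.9 as overlap weakens.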

Note that all the methods being compared conduct the same unadjusted analysis to estimate the marginal treatment effect of B vs. C. Because the competitor study is a randomized trial, this estimate should be unbiased with respect to the corresponding marginal estimand in S=2. Therefore, differences in performance between the methods will arise from the comparison between A and C, for which marginal and conditional estimands are non-null.


Each simulated dataset is analyzed using the following methods:

  • Matching-adjusted indirect comparison (MAIC). The trial assignment model in Eq. 2 contains main effect terms for all three effect modifiers — only covariate means are balanced. The objective function in Eq. 3 is minimized using BFGS [41]. The weights estimated by Eq. 4 are used to fit a weighted simple linear regression of outcome on treatment to the index trial IPD.

  • Two-stage matching-adjusted indirect comparison (2SMAIC). We follow the same steps as for the standard MAIC. In addition, the treatment assignment model in Eq. 7 is fitted to the index study IPD, including main effect terms for all three baseline covariates. Propensity score estimates are generated by Eq. 8 and combined with the weights generated by Eq. 4 as per Eq. 9. The resulting weights are used to fit a weighted simple linear regression of outcome on treatment to the index trial IPD.

  • Truncated matching-adjusted indirect comparison (T-MAIC). This approach is identical to MAIC but the highest estimated weights (Eq. 4) are truncated using a 95th percentile cutpoint, following Susukida et al. [75, 76], Webster-Clark et al. [77], and Lee et al. [31]. Specifically, all weights above the 95th percentile are replaced by the value of the 95th percentile.

  • Truncated two-stage matching-adjusted indirect comparison (T-2SMAIC). This approach is identical to 2SMAIC but all the estimated weights (Eq. 9) larger than the 95th percentile are set equal to the 95th percentile.
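The weighting steps shared by the four methods above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in the study: the objective \(\sum_i \exp(\boldsymbol{x}^{c}_{i}\boldsymbol{\alpha})\) on covariates centered at the competitor means is minimized with a damped Newton method rather than BFGS; the two-stage combination is assumed to be the product of the odds weight and the inverse probability of treatment weight; the propensity scores `e` are taken as inputs instead of being fitted; and the percentile uses linear interpolation between order statistics. All function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def maic_weights(x, target_means, iters=50, tol=1e-10):
    """Method-of-moments odds weights that balance covariate means (one-stage MAIC)."""
    k = len(target_means)
    xc = [[xi[j] - target_means[j] for j in range(k)] for xi in x]

    def weights(alpha):
        return [math.exp(sum(a * v for a, v in zip(alpha, row))) for row in xc]

    alpha = [0.0] * k
    for _ in range(iters):
        w = weights(alpha)
        grad = [sum(wi * row[j] for wi, row in zip(w, xc)) for j in range(k)]
        hess = [[sum(wi * row[j] * row[l] for wi, row in zip(w, xc))
                 for l in range(k)] for j in range(k)]
        step = solve_linear(hess, [-g for g in grad])
        scale = 1.0
        # Halve the Newton step until the objective (sum of weights) stops increasing.
        while (sum(weights([a + scale * s for a, s in zip(alpha, step)])) > sum(w)
               and scale > 1e-8):
            scale /= 2.0
        alpha = [a + scale * s for a, s in zip(alpha, step)]
        if max(abs(scale * s) for s in step) < tol:
            break
    return weights(alpha)


def two_stage_weights(odds_w, t, e):
    """Assumed 2SMAIC combination: odds weight times inverse probability of treatment."""
    return [w / p if ti == 1 else w / (1.0 - p) for w, ti, p in zip(odds_w, t, e)]


def truncate(weights, pct=0.95):
    """Set every weight above the pct-quantile equal to that quantile."""
    s = sorted(weights)
    pos = pct * (len(s) - 1)  # assumed convention: linear interpolation
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    cut = s[lo] + (pos - lo) * (s[hi] - s[lo])
    return [min(w, cut) for w in weights]
```

In this sketch, MAIC uses `maic_weights` alone, 2SMAIC passes its output through `two_stage_weights`, and the truncated variants apply `truncate` to the respective weights. If the competitor mean lies outside the range of an index trial covariate, the objective has no minimizer and the iteration will not converge.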

All approaches use the ordinary non-parametric bootstrap to estimate the variance of the A vs. C marginal treatment effect. 2,000 resamples of each simulated dataset are drawn with replacement [50, 78]. Due to patient-level data limitations for the competitor study, only the IPD of the index trial are resampled in the implementation of the bootstrap. The average marginal mean difference for A vs. C in S=2 is computed as the average across the bootstrap resamples. Its standard error is the standard deviation across these resamples. For the “one-stage” MAIC approaches, each bootstrap iteration re-estimates the trial assignment model. For the “two-stage” MAIC approaches, both the trial assignment and the treatment assignment model are re-estimated in each iteration.
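The bootstrap procedure just described can be sketched as below. This is an illustrative Python outline with hypothetical names; `estimate_weights` stands for whichever weighting procedure is being evaluated (one-stage or two-stage) and is assumed to take the resampled IPD and return one weight per participant. The rows are assumed to be dictionaries with a binary treatment indicator `t` and an outcome `y`.

```python
import random
import statistics


def weighted_mean_difference(rows, weights):
    """Weighted difference in mean outcome: treated (t = 1) minus control (t = 0)."""
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for row, w in zip(rows, weights):
        num[row["t"]] += w * row["y"]
        den[row["t"]] += w
    return num[1] / den[1] - num[0] / den[0]


def bootstrap(rows, estimate_weights, n_boot=2000, seed=0):
    """Non-parametric bootstrap of the weighted A vs. C mean difference."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Only the index trial IPD are resampled, with replacement.
        resample = rng.choices(rows, k=len(rows))
        # The weighting model(s) are re-estimated within every resample.
        w = estimate_weights(resample)
        estimates.append(weighted_mean_difference(resample, w))
    # Point estimate: mean over resamples; standard error: their standard deviation.
    return statistics.mean(estimates), statistics.stdev(estimates)
```

For a saturated model with a single binary treatment, the weighted difference in means coincides with the treatment coefficient of a weighted simple linear regression of outcome on treatment. The sketch does not guard against a resample containing only one arm, which is very unlikely at the sample sizes considered here.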

All methods perform the indirect treatment comparison in a final stage, where the results of the study-specific analyses are combined. The marginal mean difference for A vs. B is obtained by directly substituting the point estimates \(\hat {\Delta }_{10}^{(2)}\) and \(\hat {\Delta }_{20}^{(2)}\) in Eq. 1. Its variance is estimated by adding the point estimates of the variance for the within-study treatment effect estimates. Wald-type 95% confidence interval estimates are constructed using normal distributions.
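The final stage can be written compactly. The sketch below is illustrative Python with a hypothetical function name, and it assumes that Eq. 1 is the usual anchored contrast, i.e. the A vs. B effect is the A vs. C effect minus the B vs. C effect.

```python
import math
from statistics import NormalDist


def anchored_indirect_comparison(d_ac, var_ac, d_bc, var_bc, level=0.95):
    """Combine two within-study estimates into an A vs. B estimate with a Wald interval."""
    d_ab = d_ac - d_bc                    # assumed form of Eq. 1
    se_ab = math.sqrt(var_ac + var_bc)    # variances add: the two studies are independent
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return d_ab, se_ab, (d_ab - z * se_ab, d_ab + z * se_ab)


# Example with made-up inputs: both relative effects equal, so the contrast is zero.
estimate, se, ci = anchored_indirect_comparison(-0.2, 0.09, -0.2, 0.16)
```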

Performance measures

We generate 5,000 simulated datasets per simulation scenario. For each scenario and analysis method, the following performance metrics are computed over the 5,000 replicates: (1) bias in the estimated treatment effect; (2) empirical standard error (ESE); (3) mean square error (MSE); and (4) empirical coverage rate of the 95% confidence interval estimates. These metrics are defined explicitly in prior work [7, 72].

The bias evaluates aim 1 of the simulation study. It is equal to the average treatment effect estimate across the simulations because the true estimand is zero (\(\Delta _{12}^{(2)}=0\)). The ESE targets aim 2 and is the standard deviation of the treatment effect estimates over the 5,000 runs. The MSE is the average squared deviation of the estimates from the true estimand, which equals the squared bias plus the variance across the simulated replicates. It measures overall efficiency (aim 3), accounting for both bias (aim 1) and precision (aim 2). Coverage assesses aim 4, and is computed as the percentage of estimated 95% confidence intervals that contain the true value of the estimand.
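The four performance measures can be computed from the per-replicate results as in the following Python sketch. The function name and return structure are hypothetical; the exact definitions used in the study are those of the cited prior work [7, 72].

```python
import statistics


def performance(estimates, lower, upper, truth=0.0):
    """Bias, empirical standard error, mean square error and coverage over replicates."""
    n = len(estimates)
    bias = statistics.mean(estimates) - truth
    ese = statistics.stdev(estimates)  # sample standard deviation (n - 1 denominator)
    mse = sum((e - truth) ** 2 for e in estimates) / n
    coverage = sum(lo <= truth <= hi for lo, hi in zip(lower, upper)) / n
    return {"bias": bias, "ese": ese, "mse": mse, "coverage": coverage}
```

With these definitions the MSE equals the squared bias plus (n − 1)/n times the squared ESE, so for 5,000 replicates the decomposition into squared bias and variance holds almost exactly.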

We have used 5,000 replicates per scenario based on the analysis method and scenario with the largest long-run variability (standard MAIC with n=140 and poor overlap). Assuming \(\text {SD}(\hat {\Delta }_{12}^{(2)}) \leq 0.53\), the Monte Carlo standard error (MCSE) of the bias is at most \(\sqrt {\text {Var}(\hat {\Delta }_{12}^{(2)})/N_{sim}}=\sqrt {0.28/5000}=0.007\) under 5,000 simulations per scenario, and the MCSE of the coverage, based on an empirical coverage rate of 95% is \(\left (\sqrt {(95 \times 5)/5000}\right)\%=0.31\%\), with the worst-case being 0.71% under 50% coverage. These are considered adequate levels of simulation uncertainty.


Performance measures for all methods and simulation scenarios are reported in Fig. 1. The strong overlap settings are at the top (in ascending order of index trial sample size), followed by the moderate overlap settings and the poor overlap settings at the bottom. For each data-generating mechanism, there is a ridgeline plot visualizing the spread of point estimates for the marginal A vs. B treatment effect over the 5,000 simulation replicates. Below each plot, a table summarizing the performance metrics of each method is displayed. MCSEs for each metric, used to quantify the simulation uncertainty, have been computed and are presented in parentheses alongside the average of each performance measure. These are considered negligible due to the large number of simulated datasets per scenario. In Fig. 1, Cov denotes the empirical coverage rate of the 95% confidence interval estimates.

Fig. 1 Simulation study results. Point estimates of the treatment effect and performance metrics for all methods and simulation scenarios

In the most extreme scenario (n=140 and poor covariate overlap), weights could not be estimated for 1 of the 5,000 simulated datasets. This was due to total separation: empirically, all the values observed in the index trial for one of the baseline covariates were below the competitor study mean. Therefore, there were no feasible solutions minimizing the objective function in Eq. 3. The affected replicate was discarded, and 4,999 simulated datasets were analyzed in the corresponding scenario. With respect to the treatment assignment model, empirical overlap between treatment arms was always excellent due to randomization in the index trial.


Even with the small index trial sample sizes, bias is similarly low for MAIC and 2SMAIC without truncation in all simulation scenarios. There is a slight increase in bias as the ESS after weighting decreases, with the bias of highest magnitude occurring with n=140 and poor covariate overlap (the scenario with the lowest ESS after weighting) for MAIC (-0.041) and 2SMAIC (-0.031). In absolute terms, the bias of 2SMAIC is smaller than that of MAIC in all simulation scenarios. For 2SMAIC, it is within Monte Carlo error of zero in all scenarios except in the most extreme setting, mentioned earlier, and in the setting with n=200 and moderate overlap (-0.008). Of all methods, 2SMAIC produces the lowest bias in every simulation scenario.

Weight truncation increases absolute bias in all scenarios. T-MAIC and T-2SMAIC consistently exhibit greater bias than MAIC and 2SMAIC. When overlap is strong, truncation only induces bias very slightly. As overlap is reduced, the bias induced by truncation is more noticeable, particularly in the n=140 settings. For instance, the bias for T-MAIC and T-2SMAIC in the scenarios with poor overlap is substantial (for n=140: 0.157 and 0.160, respectively; for n=200, 0.149 and 0.153). For the truncated methods, the magnitude of the bias also appears to increase as the ESS after weighting decreases.


As expected, all methods incur precision losses as the number of subjects in the index trial and covariate overlap decrease. Despite enforcing randomization in the index trial, 2SMAIC increases precision, as measured by the ESE, with respect to MAIC in every simulation scenario. Reductions in ESE are more dramatic in the n=140 settings than in the n=200 settings. This is attributed to a greater chance of empirical covariate imbalances with smaller sample sizes. Interestingly, reduced covariate overlap seems to minimize the effect of incorporating the second (treatment assignment) stage. This is likely due to precision gains being offset by the presence of extreme weights, which lead to large reductions in ESS and inflate the ESE. The same trends are revealed for T-2SMAIC with respect to T-MAIC across the simulation scenarios. Both “two-stage” versions have reduced ESEs compared to their “one-stage” counterparts in all scenarios.

Weight truncation decreases the ESE across all simulation scenarios for one-stage and two-stage MAIC. This is to be expected as the influence of outlying weights is reduced. When overlap is strong, truncation offers only a small improvement in precision. This has little impact in comparison to the inclusion of a second stage in MAIC. For instance, under strong overlap and n=140, the ESE for MAIC and 2SMAIC is 0.516 and 0.386, respectively; compared to ESEs of 0.489 and 0.371 for the corresponding truncated versions.

The precision gains of weight truncation become more considerable as overlap weakens and the extremity of the weights increases. When overlap is poor, truncation reduces the ESE more sharply than the incorporation of a second stage in MAIC. For example, under poor overlap and n=140, the ESE of MAIC and 2SMAIC is 0.767 and 0.703, respectively, and that of the truncated versions is 0.563 and 0.519. Unsurprisingly, the combination of incorporating the second stage and truncating the weights is most effective at variance reduction. As n decreases, precision seems to be more markedly reduced for the one-stage approaches than for the two-stage approaches, and for the untruncated approaches than for the truncated ones.

Where covariate overlap is strong, T-2SMAIC has the highest precision, followed by 2SMAIC, T-MAIC and MAIC. Where covariate overlap is moderate or poor, T-2SMAIC has the highest precision, followed by T-MAIC, 2SMAIC and MAIC.


As per the ESE, MSE values decrease for all methods as the index trial sample size and covariate overlap increase. In agreement with the trends for precision, the two-stage versions of MAIC increase efficiency with respect to the corresponding one-stage methods in all scenarios, particularly in the n=140 settings. Efficiency gains for the two-stage approaches are stronger where covariate overlap is strong and become less noticeable as covariate overlap weakens, due to extreme weights. For instance, with strong overlap and n=200, MSEs for MAIC and 2SMAIC are 0.205 and 0.127, respectively. With poor overlap and n=200, these are 0.459 and 0.393, respectively.

Differences in MSE between methods are driven more by comparative precision than bias. This is expected in the strong overlap scenarios, where the bias for all methods is negligible, but also occurs in the poor overlap scenarios. The precision gains of truncation more than counterbalance the increase in bias when the variability of the weights is high. As overlap decreases, the relative efficiency of the truncated versus the untruncated approaches is markedly improved. For example, with poor overlap and n=200, the MSE of T-MAIC and T-2SMAIC is 0.263 and 0.233, respectively (compared to MSEs of 0.459 and 0.393 for MAIC and 2SMAIC).

T-2SMAIC is the most efficient method and MAIC is the least efficient method across all simulation scenarios in terms of MSE. Where covariate overlap is strong, T-2SMAIC yields the highest efficiency, followed by 2SMAIC, T-MAIC and MAIC. Where overlap is poor, T-2SMAIC has the highest efficiency, followed by T-MAIC, 2SMAIC and MAIC. Where overlap is moderate, 2SMAIC and T-MAIC have comparable efficiency.


From a frequentist perspective, 95% confidence interval estimates should include the true estimand 95% of the time. Namely, empirical coverage rates should equal the nominal coverage rates to ensure appropriate type I error rates for testing a “no effect” null hypothesis. Theoretically, due to our use of 5,000 Monte Carlo simulations per scenario, empirical coverage rates are statistically significantly different to the desired 0.95 if they are under 0.944 or over 0.956.
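These bounds follow from a normal approximation to the binomial sampling distribution of the empirical coverage rate, assuming a two-sided 5% significance level:

$$0.95 \pm 1.96\sqrt{\frac{0.95 \times 0.05}{5000}} \approx 0.95 \pm 1.96 \times 0.0031 \approx 0.95 \pm 0.006,$$

which gives the interval (0.944, 0.956).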

Empirical coverage rates for MAIC are statistically significantly different to the nominal coverage rate in all but one scenario: that with strong overlap and n=200. Where covariate overlap is strong or moderate, all other methods exhibit empirical coverage rates that are very close to the nominal value (none of the differences is statistically significant, except for T-MAIC in the scenario with strong overlap and n=140).

There is discernible undercoverage for all methods when overlap is poor. This is particularly the case for the approaches without truncation. For instance, for the smallest sample size (n=140) with poor overlap, the empirical coverage rate is 0.900 for MAIC and 0.917 for 2SMAIC. These anti-conservative inferences could arise from the use of normal distribution-based confidence intervals when the ESS after weighting is small. While the large-sample normal approximation produces asymptotically valid inferences, a reasonable alternative in small ESS scenarios could be the use of a t-distribution. An open question is how to choose the degrees of freedom of the t-distribution.

Interestingly, coverage drops are larger for the untruncated approaches than for the truncated approaches as overlap weakens. This is surprising because the truncated methods induce sizeable bias in the poor overlap settings, and one would have expected coverage rates to be degraded further by this bias. Weight truncation has improved coverage rates in another simulation study in a different context [31]. This warrants further investigation. Overcoverage is not a problem for any of the methods as the empirical coverage rates never rise above 0.956.


Limitations of simulation study

In all simulation scenarios, two-stage methods offer enhanced precision and efficiency with respect to one-stage methods. These gains are likely linked to the prognostic strength of the baseline covariates included in the treatment assignment model. We have assumed, as is typically the case in practice, that the baseline covariates are prognostic of outcome. Less notable increases in precision and efficiency are expected when covariate-outcome associations are lower.

All approaches depend on the critical assumption of conditional transportability over trials. Given the somewhat arbitrary and unclear process driving selection into different studies in our context (in reality, there is not a formal assignment process determining whether subjects are in study sample S=1 or S=2), I have not specified a true trial assignment mechanism in the simulation study. Nevertheless, the true outcome-generating mechanism imposes linearity and additivity assumptions in the covariate-outcome associations and the treatment-by-covariate interactions. Conditional transportability holds because the trial assignment model balances means for all the covariates that modify the marginal treatment effect of A vs. C.

In real-life scenarios, it is entirely possible that more complex relationships underlie the outcome-generating process. These would potentially require balancing higher-order moments, covariate-by-covariate interactions and non-linear transformations of the covariates. In practice, sensitivity analyses will be required to explore whether there are discrepancies in the results produced by different model specifications.

The methods evaluated in this article focus on correcting for imbalances in baseline covariates, i.e., the ‘P’ in the PICO (Population, Intervention, Comparator, Outcome) framework [79]. Nevertheless, there are other kinds of differences which may bias indirect treatment comparisons, e.g. in comparator or endpoint definitions. The methodologies that have been evaluated in this article cannot adjust for these types of differences.

Contributions in light of recent simulation studies

Prior simulation studies in the context of anchored indirect treatment comparisons have concluded that outcome regression is more precise and efficient than weighting when the conditional outcome-generating mechanism is known [23, 24]. This is likely to remain the case despite the performance gains of 2SMAIC and the truncated approaches with respect to MAIC.

Nevertheless, there is one caveat. In these studies, the (one-stage) MAIC trial assignment model only accounts for covariates that are marginal effect modifiers. The reason for this is that including prognostic covariates that are not effect modifiers deteriorates precision without improving the potential for bias reduction. Conversely, the outcome regression approaches have included all prognostic covariates in the outcome model, making use of this prognostic information to increase precision and efficiency. Therefore, the equipoise or fairness in previous comparisons between weighting and outcome regression is debatable.

With 2SMAIC, weighting approaches can now make use of this prognostic information by including the relevant covariates in the treatment assignment model. Future simulation studies comparing weighting and outcome regression should involve 2SMAIC as opposed to its one-stage counterpart, particularly in these “perfect information” scenarios.

Extension to observational studies

Almost invariably, anchored MAIC has been applied in a setting where the index trial is randomized. In this setting, the inclusion of the treatment assignment model leads to efficiency gains by increasing precision. Any reduction in bias will be, at most, modest due to the internal validity of the index trial. Nevertheless, in situations where the index study is observational, the treatment assignment model can be useful to reduce internal validity bias due to confounding.

Transporting the results of a non-randomized study from S=1 to S=2 requires further untestable assumptions. Additional barriers are: (1) susceptibility to unmeasured confounding; and (2) positivity issues. Due to randomization, there is typically excellent overlap between treatment arms in RCTs. However, theoretical (deterministic) violations of positivity may occur in observational study designs [34, 60, 80], e.g. subjects with certain covariate values may have a contraindication for receiving one of the treatments, resulting in a null probability of treatment assignment.

In addition to these conceptual problems, “chance” violations of positivity may occur with small sample sizes or high-dimensional data due to sampling variability, in both randomized and non-randomized studies. These have not been observed in this simulation study. Near-violations of positivity between treatment arms may lead to extreme inverse probability of treatment weights [81], further inflating variance in 2SMAIC.

Finally, it is worth noting that observational study designs have traditionally been more prone than RCTs to additional causes of internal validity bias, e.g. missing outcome data, measurement error or protocol deviations [82].

Approaches for variance reduction

Weight truncation is a relatively informal but easily implemented method to improve precision by restricting the contribution of extreme weights. The choice of a 95th percentile cutoff is based on prior literature and is somewhat arbitrary, but worked well in this simulation study. Alternative threshold values could be considered.

Lower thresholds will further reduce variance at the cost of introducing more bias and shifting the target population or estimand definition further [32, 83]. The ideal truncation level will vary on a case-by-case basis and can be set empirically, e.g. by progressively truncating the weights [32, 84]. Density plots are likely helpful to assess the dispersion of the weights and identify an optimal cutoff point. Weight truncation is likely of little utility where there is sufficient overlap and the weights are well-behaved. Efficiency gains are expected to decrease with larger sample sizes, as the induced bias could potentially offset the reduction of variance.
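Progressive truncation can be explored by tracking the effective sample size at successively lower cutpoints. The Python sketch below is illustrative, with hypothetical function names; it uses the standard approximate ESS, \((\sum_i w_i)^2 / \sum_i w_i^2\), and the "inclusive" percentile definition of the standard library, which is an assumed convention.

```python
import statistics


def effective_sample_size(weights):
    """Approximate ESS: squared sum of the weights over the sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def truncation_profile(weights, percentiles=(99, 95, 90)):
    """ESS without truncation and after truncating at each percentile cutpoint."""
    # cuts[p - 1] is the p-th percentile, interpolated between order statistics.
    cuts = statistics.quantiles(weights, n=100, method="inclusive")
    profile = {"none": effective_sample_size(weights)}
    for p in percentiles:
        cut = cuts[p - 1]
        profile[p] = effective_sample_size([min(w, cut) for w in weights])
    return profile


# Example with made-up weights: 95 participants with weight 1 and 5 with weight 20.
profile = truncation_profile([1.0] * 95 + [20.0] * 5)
```

A profile of this kind shows how much ESS each cutpoint recovers. It says nothing about the bias induced, which has to be judged separately, for instance through the resulting shift in the weighted covariate means.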

We have only explored two strategies to improve efficiency: (1) modeling the treatment assignment mechanism; and (2) truncating the weights that are above a certain level. Nevertheless, there are other approaches that could be used in practical applications, either on their own or combined with the procedures explored in this article. Weight trimming [85] is closely related to weight truncation. It involves excluding the subjects with outlying weights, thereby sharing many of the limitations of truncation: setting arbitrary cutoff points, and changing the target population even further. Trimming is unappealing because it directly throws away information, discarding data from some individuals, and likely losing precision with respect to truncation.

The use of stabilized weights is often recommended to gain precision and efficiency [32, 86], particularly when the weights are highly variable. In the implementations of MAIC in this article, the fitted weighted outcome model is considered to be “saturated” (i.e., cannot be misspecified) because it is a marginal model of outcome on a time-fixed binary treatment [87]. For saturated models, stabilized and unstabilized weights give identical results [87]. Nevertheless, weight stabilization is encouraged when the weighted outcome model is unsaturated, e.g. with dynamic (time-varying) or continuous-valued treatment regimens [44, 88].
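The equivalence for saturated models can be checked numerically. The Python sketch below is illustrative, with hypothetical names; it assumes the common form of stabilization in which each weight is multiplied by the marginal probability of the treatment actually received. That factor is constant within each arm, so it cancels from each weighted arm mean.

```python
def weighted_mean_difference(y, t, w):
    """Weighted difference in mean outcome: treated (t = 1) minus control (t = 0)."""
    mean1 = (sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == 1)
             / sum(wi for ti, wi in zip(t, w) if ti == 1))
    mean0 = (sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == 0)
             / sum(wi for ti, wi in zip(t, w) if ti == 0))
    return mean1 - mean0


def stabilize(w, t):
    """Assumed stabilization: multiply each weight by the marginal P(T = t)."""
    p1 = sum(t) / len(t)
    return [wi * (p1 if ti == 1 else 1.0 - p1) for wi, ti in zip(w, t)]


# Made-up data: the two estimates agree up to floating-point error.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
t = [1, 1, 0, 0, 1, 0]
w = [0.5, 2.0, 1.0, 3.0, 1.5, 0.7]
unstabilized = weighted_mean_difference(y, t, w)
stabilized = weighted_mean_difference(y, t, stabilize(w, t))
```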

Another approach that has been used to gain efficiency is overlap weighting [89, 90]. It also changes the target estimand, estimating treatment effects in a subsample with good overlap. While the approach is worth consideration, it is challenging to implement in our context because IPD are unavailable for the competitor study.

In the Background section, I referred to the weight estimation procedure by Jackson et al. [22], which satisfies the method of moments while maximizing the ESS, thereby reducing the dispersion of the weights. 2SMAIC is a modular framework and this approach could be used instead of the standard method of moments to estimate the trial assignment odds weights. Different weighting modules could be incorporated to account for missing outcomes [35], treatment switching [91, 92] and other forms of non-adherence to the protocol [36] in the index trial.


I have introduced 2SMAIC, an extension of MAIC that combines a model for the treatment assignment mechanism in the index trial with a model for the trial assignment mechanism. The first model accounts for covariate differences between treatment arms, producing inverse probability weights that can balance the treatment groups of the index study. The second model accounts for effect modifier differences between studies, generating odds weights that achieve balance across trials and allow us to transport the marginal effect for A vs. C from S=1 to S=2. In 2SMAIC, both weights are combined to attain balance between the treatment arms of the index trial and across the studies.

The statistical performance of 2SMAIC has been investigated in scenarios where the index study is an RCT. We find that the addition of a second (treatment assignment) stage increases precision and efficiency with respect to the standard one-stage MAIC. It does so without inducing bias and while being less prone to undercoverage. Efficiency and precision gains are prominent when the index trial has a small sample size, in which case it is subject to empirical imbalances in prognostic baseline covariates. Two-stage MAIC accounts for these chance imbalances through the treatment assignment model, mitigating the precision loss coming with decreasing sample sizes. Precision and efficiency gains are attenuated when there is poor overlap between the target populations of the studies, due to the high extremity of the estimated weights.

The inclusion of weight truncation approaches has been evaluated for the first time in the context of MAIC. The one-stage and two-stage approaches produced very little bias before truncation was applied. Where covariate overlap was strong and the variability of the weights tolerable, truncation only improved precision and efficiency slightly, while inducing bias. The benefits of truncation become more apparent in situations with weakening overlap, where it diminishes the influence of extreme weights, substantially improving precision and even coverage with respect to the untruncated approaches.

Due to bias-variance trade-offs, the precision improvements offered by truncation come at the cost of bias. In this simulation study, the trade-off favors variance reduction over the induced bias, with truncation improving efficiency in all scenarios. Nevertheless, truncation is likely unnecessary when the weights are well-behaved and the ESS after weighting is sizeable. The combination of a second stage and weight truncation is most effective in improving precision and efficiency in all simulation scenarios.

When covariate overlap is poor, undercoverage is an issue for all methods, particularly for the untruncated approaches. Novel outcome regression-based techniques [21, 23,24,25, 93] may be preferable in these situations. The development of doubly robust approaches that combine outcome modeling with a model for the trial assignment weights is also attractive, as these would give researchers two chances for correct model specification.

In the absence of a common comparator group, unanchored comparisons contrast the outcomes of single treatment arms between studies. Because one of the stages relies on estimating the treatment assignment mechanism in the index study, the two-stage approaches are not applicable in the unanchored case. This is a limitation, as many applications of covariate-adjusted indirect comparisons are in this setting [10], both in published studies and in health technology appraisals.

Finally, I address a misconception that has arisen recently in the literature [25, 94]. It is believed that MAIC replicates the unadjusted analysis that would be performed in a hypothetical “ideal RCT” because it targets a marginal estimand, and that MAIC cannot make use of information on prognostic covariates. While all approaches to MAIC target marginal estimands, these produce covariate-adjusted estimates of the marginal effect. The standard one-stage approach to MAIC accounts for covariate differences across studies. The two-stage approaches introduced in this article generate covariate-adjusted estimates that also account for imbalances between treatment arms in the index trial, as is the case in covariate-adjusted analyses of RCTs.

Availability of data and materials

The files required to generate the data, run the simulations, and reproduce the results are available at


  1. This assumption is strong and untestable. Nevertheless, it is weaker than that required by unanchored comparisons. Unanchored comparisons compare absolute outcome means as opposed to relative effect estimates. Therefore, these rely on the conditional exchangeability of the absolute outcome mean under active treatment (conditional constancy of absolute effects) [5, 6, 40, 59]. This requires capturing all factors that are prognostic of outcome given active treatment.

  2. The files required to run the simulations are available at



Abbreviations

2SMAIC: Two-stage matching-adjusted indirect comparison

ALD: Aggregate-level data

ESE: Empirical standard error

ESS: Effective sample size

IPD: Individual patient data

HTA: Health technology assessment

MAIC: Matching-adjusted indirect comparison

MCSE: Monte Carlo standard error

MSE: Mean square error

PICO: Population, Intervention, Comparator, Outcome

RCT: Randomized controlled trial

T-MAIC: Truncated matching-adjusted indirect comparison

T-2SMAIC: Truncated two-stage matching-adjusted indirect comparison


  1. Vreman RA, Naci H, Goettsch WG, Mantel-Teeuwisse AK, Schneeweiss SG, Leufkens HG, Kesselheim AS. Decision making under uncertainty: comparing regulatory and health technology assessment reviews of medicines in the United States and Europe. Clin Pharmacol Ther. 2020; 108(2):350–7.

  2. Sutton A, Ades A, Cooper N, Abrams K. Use of indirect and mixed treatment comparisons for technology assessment. Pharmacoeconomics. 2008; 26(9):753–67.

  3. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol. 1997; 50(6):683–91.

  4. Dias S, Sutton AJ, Ades A, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Dec Making. 2013; 33(5):607–17.

  5. Phillippo D, Ades T, Dias S, Palmer S, Abrams KR, Welton N. NICE DSU Technical Support Document 18: methods for population-adjusted indirect comparisons in submissions to NICE. Sheffield: NICE Decision Support Unit; 2016.

  6. Phillippo DM, Ades AE, Dias S, Palmer S, Abrams KR, Welton NJ. Methods for population-adjusted indirect comparisons in health technology appraisal. Med Dec Making. 2018; 38(2):200–11.

  7. Remiro-Azócar A, Heath A, Baio G. Methods for population adjustment with limited access to individual patient data: A review and simulation study. Res Synth Methods. 2021; 12(6):750–75.

  8. Remiro-Azócar A, Heath A, Baio G. Conflating marginal and conditional treatment effects: Comments on “assessing the performance of population adjustment methods for anchored indirect comparisons: A simulation study”. Stat Med. 2021; 40(11):2753–8.

  9. Remiro-Azócar A, Heath A, Baio G. Effect modification in anchored indirect treatment comparisons: Comments on “matching-adjusted indirect comparisons: Application to time-to-event data”. Stat Med. 2022; 41(8):1541–53.

  10. Phillippo DM, Dias S, Elsada A, Ades A, Welton NJ. Population adjustment methods for indirect comparisons: A review of National Institute for Health and Care Excellence technology appraisals. Int J Technol Assess Health Care. 2019; 35(3):221–8.

  11. Signorovitch JE, Wu EQ, Andrew PY, Gerrits CM, Kantor E, Bao Y, Gupta SR, Mulani PM. Comparative effectiveness without head-to-head trials. Pharmacoeconomics. 2010; 28(10):935–45.

  12. Signorovitch J, Erder MH, Xie J, Sikirica V, Lu M, Hodgkins PS, Wu EQ. Comparative effectiveness research using matching-adjusted indirect comparison: an application to treatment with guanfacine extended release or atomoxetine in children with attention-deficit/hyperactivity disorder and comorbid oppositional defiant disorder. Pharmacoepidemiol Drug Saf. 2012; 21:130–7.

  13. Signorovitch JE, Sikirica V, Erder MH, Xie J, Lu M, Hodgkins PS, Betts KA, Wu EQ. Matching-adjusted indirect comparisons: a new tool for timely comparative effectiveness research. Value Health. 2012; 15(6):940–7.

  14. Hatswell AJ, Freemantle N, Baio G. The effects of model misspecification in unanchored matching-adjusted indirect comparison: results of a simulation study. Value Health. 2020; 23(6):751–9.

  15. Cheng D, Ayyagari R, Signorovitch J. The statistical performance of matching-adjusted indirect comparisons: Estimating treatment effects with aggregate external control data. Ann Appl Stat. 2020; 14(4):1806–33.

  16. Wang J. On matching-adjusted indirect comparison and calibration estimation. arXiv preprint arXiv:2107.11687. 2021.

  17. Petto H, Kadziola Z, Brnabic A, Saure D, Belger M. Alternative weighting approaches for anchored matching-adjusted indirect comparisons via a common comparator. Value Health. 2019; 22(1):85–91.

  18. Kühnast S, Schiffner-Rohe J, Rahnenführer J, Leverkus F. Evaluation of adjusted and unadjusted indirect comparison methods in benefit assessment. Methods Inf Med. 2017; 56(03):261–7.

  19. Weber D, Jensen K, Kieser M. Comparison of methods for estimating therapy effects by indirect comparisons: A simulation study. Med Dec Making. 2020; 40(5):644–54.

  20. Jiang Y, Ni W. Performance of unanchored matching-adjusted indirect comparison (MAIC) for the evidence synthesis of single-arm trials with time-to-event outcomes. BMC Med Res Methodol. 2020; 20(1):1–9.

  21. Phillippo DM, Dias S, Ades A, Welton NJ. Assessing the performance of population adjustment methods for anchored indirect comparisons: A simulation study. Stat Med. 2020; 39(30):4885–911.

  22. Jackson D, Rhodes K, Ouwens M. Alternative weighting schemes when performing matching-adjusted indirect comparisons. Res Synth Methods. 2021; 12(3):333–46.

  23. Remiro-Azócar A, Heath A, Baio G. Parametric g-computation for compatible indirect treatment comparisons with limited individual patient data. arXiv preprint arXiv:2108.12208. 2021.

  24. Remiro-Azócar A, Heath A, Baio G. Marginalization of regression-adjusted treatment effects in indirect comparisons with limited patient-level data. arXiv preprint arXiv:2008.05951. 2020.

  25. Phillippo DM, Dias S, Ades AE, Welton NJ. Target estimands for efficient decision making: Response to comments on “assessing the performance of population adjustment methods for anchored indirect comparisons: A simulation study”. Stat Med. 2021; 40(11):2759–63.

  26. Ho DE, Imai K, King G, Stuart EA. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit Anal. 2007; 15(3):199–236.

  27. Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med. 1997; 127(8):757–63.

  28. Belger M, Brnabic A, Kadziola Z, Petto H, Faries D. Inclusion of multiple studies in matching adjusted indirect comparisons (MAIC). Value Health. 2015; 18(3):33.

  29. Phillippo DM, Dias S, Ades A, Welton NJ. Equivalence of entropy balancing and the method of moments for matching-adjusted indirect comparison. Res Synth Methods. 2020; 11(4):568–72.

  30. Elliott MR, Little RJ. Model-based alternatives to trimming survey weights. J Off Stat. 2000; 16(3):191–210.

  31. Lee BK, Lessler J, Stuart EA. Weight trimming and propensity score weighting. PloS ONE. 2011; 6(3):18174.

  32. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol. 2008; 168(6):656–64.

  33. Moore KL, Neugebauer R, van der Laan MJ, Tager IB. Causal inference in epidemiological studies with strong confounding. Stat Med. 2012; 31(13):1380–404.

  34. Léger M, Chatton A, Le Borgne F, Pirracchio R, Lasocki S, Foucher Y. Causal inference in case of near-violation of positivity: comparison of methods. Biom J. 2022. In press.

  35. Seaman SR, White IR. Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res. 2013; 22(3):278–95.

  36. Cain LE, Cole SR. Inverse probability-of-censoring weights for the correction of time-varying noncompliance in the effect of randomized highly active antiretroviral therapy on incident AIDS or death. Stat Med. 2009; 28(12):1725–38.

  37. Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med. 2004; 23(19):2937–60.

  38. Hahn J. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica. 1998; 66(2):315–31.

  39. Westreich D, Edwards JK, Lesko CR, Stuart E, Cole SR. Transportability of trial results using inverse odds of sampling weights. Am J Epidemiol. 2017; 186(8):1010–4.

  40. Dahabreh IJ, Robertson SE, Steingrimsson JA, Stuart EA, Hernan MA. Extending inferences from a randomized trial to a new target population. Stat Med. 2020; 39(14):1999–2014.

  41. Nocedal J, Wright S. Numerical optimization. New York: Springer Science and Business Media; 2006.

  42. Kish L. Survey Sampling. New York: Wiley; 1965.

  43. Schafer JL, Kang J. Average causal effects from nonrandomized studies: a practical guide and simulated example. Psychol Methods. 2008; 13(4):279.

  44. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000; 11(5):550–60.

  45. Fay MP, Graubard BI. Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics. 2001; 57(4):1198–206.

  46. Chen Z, Kaizar E. On variance estimation for generalizing from a trial to a target population. arXiv preprint arXiv:1704.07789. 2017.

  47. Tipton E, Hallberg K, Hedges LV, Chan W. Implications of small samples for generalization: Adjustments and rules of thumb. Eval Rev. 2017; 41(5):472–505.

  48. Raad H, Cornelius V, Chan S, Williamson E, Cro S. An evaluation of inverse probability weighting using the propensity score for baseline covariate adjustment in smaller population randomised controlled trials with a continuous outcome. BMC Med Res Methodol. 2020; 20(1):1–12.

  49. Zeileis A. Object-oriented computation of sandwich estimators. J Stat Softw. 2006; 16:1–16.

  50. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: CRC press; 1994.

  51. Sikirica V, Findling RL, Signorovitch J, Erder MH, Dammerman R, Hodgkins P, Lu M, Xie J, Wu EQ. Comparative efficacy of guanfacine extended release versus atomoxetine for the treatment of attention-deficit/hyperactivity disorder in children and adolescents: applying matching-adjusted indirect comparison methodology. CNS Drugs. 2013; 27(11):943–53.

  52. Hartman E, Grieve R, Ramsahai R, Sekhon JS. From sample average treatment effect to population average treatment effect on the treated: combining experimental with observational studies to estimate population treatment effects. J R Stat Soc Ser A (Stat Soc). 2015; 178(3):757–78.

  53. Rubin DB. Randomization analysis of experimental data: The Fisher randomization test comment. J Am Stat Assoc. 1980; 75(371):591–3.

  54. VanderWeele TJ, Hernan MA. Causal inference under multiple versions of treatment. J Causal Infer. 2013; 1(1):1–20.

  55. VanderWeele TJ. Concerning the consistency assumption in causal inference. Epidemiology. 2009; 20(6):880–3.

  56. Hernán MA, VanderWeele TJ. Compound treatments and transportability of causal inference. Epidemiology (Cambridge, Mass.) 2011; 22(3):368.

  57. O’Muircheartaigh C, Hedges LV. Generalizing from unrepresentative experiments: a stratified propensity score approach. J R Stat Soc Ser C (Appl Stat). 2014; 63(2):195–210.

  58. Zhang Z, Nie L, Soon G, Hu Z. New methods for treatment effect calibration, with applications to non-inferiority trials. Biometrics. 2016; 72(1):20–29.

  59. Rudolph KE, van der Laan MJ. Robust estimation of encouragement design intervention effects transported across sites. J R Stat Soc Ser B (Stat Methodol). 2017; 79(5):1509–25.

  60. Westreich D, Cole SR. Invited commentary: positivity in practice. Am J Epidemiol. 2010; 171(6):674–7.

  61. Stuart EA. Matching methods for causal inference: A review and a look forward. Stat Sci Rev J Inst Math Stat. 2010; 25(1):1.

  62. Nie L, Zhang Z, Rubin D, Chu J. Likelihood reweighting methods to reduce potential bias in noninferiority trials which rely on historical data to make inference. Ann Appl Stat. 2013; 7(3):1796–813.

  63. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol. 2006; 163(12):1149–56.

  64. Shortreed SM, Ertefaie A. Outcome-adaptive lasso: variable selection for causal inference. Biometrics. 2017; 73(4):1111–22.

  65. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983; 70(1):41–55.

  66. Senn S. Testing for baseline balance in clinical trials. Stat Med. 1994; 13(17):1715–26.

  67. Li X, Ding P. Rerandomization and regression adjustment. J R Stat Soc Ser B (Stat Methodol). 2020; 82(1):241–68.

  68. Morris TP, Walker AS, Williamson EJ, White IR. Planning a method for covariate adjustment in individually-randomised trials: a practical guide. Trials. 2022;23:328.

  69. Williamson EJ, Forbes A, White IR. Variance reduction in randomised trials by inverse probability weighting using the propensity score. Stat Med. 2014; 33(5):721–37.

  70. Hernán MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health. 2006; 60(7):578–86.

  71. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986; 81(396):945–60.

  72. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019; 38(11):2074–102.

  73. R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2013.

  74. Thompson DD, Lingsma HF, Whiteley WN, Murray GD, Steyerberg EW. Covariate adjustment had similar benefits in small and large randomized controlled trials. J Clin Epidemiol. 2015; 68(9):1068–75.

  75. Susukida R, Crum RM, Hong H, Stuart EA, Mojtabai R. Comparing pharmacological treatments for cocaine dependence: Incorporation of methods for enhancing generalizability in meta-analytic studies. Int J Methods Psychiatr Res. 2018; 27(4):1609.

  76. Susukida R, Crum RM, Stuart EA, Mojtabai R. Generalizability of the findings from a randomized controlled trial of a web-based substance use disorder intervention. Am J Addict. 2018; 27(3):231–7.

  77. Webster-Clark MA, Sanoff HK, Stürmer T, Peacock Hinton S, Lund JL. Diagnostic assessment of assumptions for external validity: an example using data in metastatic colorectal cancer. Epidemiology (Cambridge, Mass.) 2019; 30(1):103.

  78. Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, what? a practical guide for medical statisticians. Stat Med. 2000; 19(9):1141–64.

  79. Richardson WS, Wilson MC, Nishikawa J, Hayward RS, et al. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 1995; 123(3):12–13.

  80. Petersen ML, Porter KE, Gruber S, Wang Y, Van Der Laan MJ. Diagnosing and responding to violations in the positivity assumption. Stat Methods Med Res. 2012; 21(1):31–54.

  81. Li F, Thomas LE, Li F. Addressing extreme propensity scores via the overlap weights. Am J Epidemiol. 2019; 188(1):250–7.

  82. Deeks JJ, Dinnes J, D’Amico R, Sowden AJ, Sakarovitch C, Song F, Petticrew M, Altman D, et al. Evaluating non-randomised intervention studies. Health Technol Assess (Winchester, England). 2003; 7(27):1–173.

  83. Xiao Y, Moodie EE, Abrahamowicz M. Comparison of approaches to weight truncation for marginal structural Cox models. Epidemiol Methods. 2013; 2(1):1–20.

  84. Kish L. Weighting for unequal pi. J Off Stat. 1992; 8(2):183.

  85. Crump RK, Hotz VJ, Imbens GW, Mitnik OA. Dealing with limited overlap in estimation of average treatment effects. Biometrika. 2009; 96(1):187–99.

  86. Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med. 2015; 34(28):3661–79.

  87. Shiba K, Kawahara T. Using propensity scores for causal inference: pitfalls and tips. J Epidemiol. 2021; 31:457–63.

  88. Robins JM, Hernán MA. Estimation of the causal effects of time-varying exposures. Longitudinal Data Anal. 2009; 553:599.

  89. Zeng S, Li F, Wang R, Li F. Propensity score weighting for covariate adjustment in randomized clinical trials. Stat Med. 2021; 40(4):842–58.

  90. Desai RJ, Franklin JM. Alternative approaches for confounding adjustment in observational studies using weighting based on the propensity score: a primer for practitioners. BMJ. 2019;367:l5657.

  91. Robins JM, Finkelstein DM. Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000; 56(3):779–88.

  92. Latimer NR, Abrams K, Lambert P, Crowther M, Wailoo A, Morden J, Akehurst R, Campbell M. Adjusting for treatment switching in randomised controlled trials–a simulation study and a simplified two-stage method. Stat Methods Med Res. 2017; 26(2):724–51.

  93. Phillippo DM, Dias S, Ades A, Belger M, Brnabic A, Schacht A, Saure D, Kadziola Z, Welton NJ. Multilevel network meta-regression for population-adjusted treatment comparisons. J R Stat Soc Ser A (Stat Soc). 2020; 183(3):1189–210.

  94. Remiro-Azócar A. Target estimands for population-adjusted indirect comparisons. In press, Stat Med. 2022.


Acknowledgements

Not applicable.


Funding

No financial support was provided for this research.

Author information

ARA conceived the research idea, developed the methodology, performed the analyses, prepared the figures, and wrote and reviewed the manuscript. The author read and approved the final manuscript.

Corresponding author

Correspondence to Antonio Remiro-Azócar.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

ARA is employed by Bayer plc. The author declares that he has no competing interests.

Supplementary Information

Additional file 1.

Supplementary Material.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Remiro-Azócar, A. Two-stage matching-adjusted indirect comparison. BMC Med Res Methodol 22, 217 (2022).


Keywords

  • Health technology assessment
  • Indirect treatment comparison
  • Matching-adjusted indirect comparison
  • Covariate adjustment
  • Covariate balance
  • Inverse probability of treatment weighting
  • Evidence synthesis