Incorporating single-arm studies in meta-analysis of randomised controlled trials: a simulation study
BMC Medical Research Methodology volume 21, Article number: 114 (2021)
Abstract
Background
Use of real world data (RWD) from non-randomised studies (e.g. single-arm studies) is increasingly being explored to overcome issues associated with data from randomised controlled trials (RCTs). We aimed to compare methods for pairwise meta-analysis of RCTs and single-arm studies using aggregate data, via a simulation study and application to an illustrative example.
Methods
We considered contrast-based methods proposed by Begg & Pilote (1991) and arm-based methods by Zhang et al (2019). We performed a simulation study with scenarios varying (i) the proportion of RCTs and single-arm studies in the synthesis, (ii) the magnitude of bias, and (iii) between-study heterogeneity. We also applied methods to data from a published health technology assessment (HTA), including three RCTs and 11 single-arm studies.
Results
Our simulation study showed that the hierarchical power and commensurate prior methods by Zhang et al provided a consistent reduction in uncertainty, whilst maintaining overcoverage and small error in scenarios where there was limited RCT data, bias and differences in between-study heterogeneity between the two sets of data. The contrast-based methods provided a reduction in uncertainty, but performed worse in terms of coverage and error, unless there was no marked difference in heterogeneity between the two sets of data.
Conclusions
The hierarchical power and commensurate prior methods provide the most robust approach to synthesising aggregate data from RCTs and single-arm studies, balancing the need to account for bias and differences in between-study heterogeneity, whilst reducing uncertainty in estimates. This work was restricted to considering a pairwise meta-analysis using aggregate data.
Background
Health technology assessment (HTA) decision-makers, such as the National Institute for Health and Care Excellence in England and Wales, recommend new health technologies for reimbursement based on cost-effectiveness. They consider the clinical effectiveness of a technology against comparators, estimated by a meta-analysis of studies conducted in similar patient populations recording a common outcome measure [1]. A randomised controlled trial (RCT) provides the best evidence of relative effectiveness because random treatment allocation minimises participant selection bias between arms [2]. However, decision-makers may consider observational evidence (e.g. single-arm studies) when, for example, a technology has received accelerated regulatory approval [3]. This suggests a need to develop meta-analysis methods which can combine randomised and non-randomised studies, whilst addressing issues in non-randomised data. Bayesian methods provide a flexible approach for combining data from different sources, and can be implemented via Markov chain Monte Carlo (MCMC) sampling, which aids probabilistic decision-making in HTA [4].
A number of methods have been proposed for pairwise meta-analysis of RCTs and single-arm studies using aggregate data, which make different assumptions regarding data variability. Begg & Pilote [5] proposed a method under a frequentist framework, which assumes exchangeability for baseline treatment effects and a common relative treatment effect. The method does not distinguish between RCTs and single-arm studies, but can be extended to account for bias in single-arm data. In this context, bias refers to the systematic difference between data from RCTs and single-arm studies. Zhang et al [6] proposed several methods under a Bayesian framework, which assume exchangeability for treatment effects on each arm. The methods distinguish between RCTs and single-arm studies, by assuming correlation between RCT arms and differences in between-study heterogeneity. Some of the methods use single-arm data to inform prior distributions for model parameters. Although Zhang et al performed a simulation study to compare the relative performance of their methods [6], there has been no comparison of these methods with the methods by Begg & Pilote. Other methods, which are not considered here, use study-matching or individual participant data (IPD) to perform a network meta-analysis (NMA) of RCTs and single-arm studies. Schmitz et al [7] proposed a method using aggregate data on patient characteristics to match single-arm studies with similar patient samples, and perform an NMA of RCTs and the matched studies. Thom et al [8] proposed a method which assumes exchangeability for baseline treatment effects, and uses IPD to adjust for covariates.
In this paper, we focus on the methods proposed by Begg & Pilote [5] and Zhang et al [6], which combine data from RCTs and single-arm studies at the aggregate level. The two sets of meta-analytic methods are contrast-based and arm-based methods, respectively. We aim to compare both sets of methods to investigate how these different approaches, as well as a number of other specific assumptions, affect their relative performance. We compare the methods in an extensive simulation study, building on the simulation study by Zhang et al [6]. We evaluate performance under a number of scenarios varying the proportion of RCTs and single-arm studies in the synthesis, the magnitude of bias between data from RCTs and single-arm studies, and differences in between-study heterogeneity across RCTs and single-arm studies.
Illustrative example: dataset
Rheumatoid arthritis (RA) is a chronic autoimmune condition causing joint inflammation, which can be treated by a number of biologic disease-modifying anti-rheumatic drugs (bDMARDs): adalimumab (ADA), etanercept (ETN), infliximab (IFX), abatacept (ABT), and rituximab (RTX) [9]. The treatment response can be assessed by using the American College of Rheumatology (ACR) response criteria, where ACR20 represents a 20% improvement in symptoms [10]. Malottki et al assessed the clinical effectiveness of bDMARDs in an HTA [11], identifying three RCTs and 11 single-arm studies for which data were available on the ACR20 outcome. Figure 1 shows a forest plot illustrating the arm-level proportions of ACR20 responders in each study. The plot does not suggest a systematic bias between data from RCTs and single-arm studies on the bDMARD arm.
There are three RCTs in which participants were assigned a placebo or a bDMARD, and 11 single-arm studies in which participants were assigned a bDMARD. We select the three placebo arms as the baseline, so that π_{1} represents the marginal response probability for participants assigned a placebo, and π_{2} represents the marginal response probability for participants assigned a bDMARD. Thus, the odds ratio represents the increase in odds of achieving an ACR20 response for participants given a bDMARD versus placebo.
Methods
Methods for meta-analysis of RCTs and single-arm studies using aggregate data
In this section, we describe the methods by Begg & Pilote [5] and by Zhang et al [6] under a Bayesian framework. Although Begg & Pilote introduced methods under a frequentist framework, they are adapted here for Bayesian implementation to ensure a fair comparison between both sets of methods. For consistency, and to enable a direct comparison of all methods, we adapt the methods by Begg & Pilote to a dichotomous outcome. We consider a pairwise meta-analysis, with n RCTs assessing treatments one and two, m single-arm studies assessing treatment one, and l single-arm studies assessing treatment two. We let i=1,...,n index RCTs, i=n+1,...,n+m index single-arm studies on arm one, and i=n+m+1,...,n+m+l index single-arm studies on arm two. Here, we describe first the methods introduced by Begg & Pilote, and then the methods introduced by Zhang et al. The first set of methods parametrise treatment effect contrasts, whilst the latter parametrise treatment effects on each arm. For clarity, we define notation as we introduce each method and attempt to use the original symbols where possible. For the methods by Begg & Pilote, we begin by describing the original method (BP), and then describe the bias-adjusted (BPbias) and random-effects (BPrandom) methods by showing how they build on the BP method. For the methods by Zhang et al, we begin by describing the bivariate generalised linear mixed-effects model (BGLMM) I method, and then describe the BGLMM II, hierarchical power prior (HPP) and hierarchical commensurate prior (HCP) methods by showing how they build on the BGLMM I method. We then describe how marginal response probabilities are calculated for each method.
Begg & Pilote (BP) original method
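Adapted to Binomial data, the method by Begg & Pilote (BP) assumes that in each arm of study i the number of responders (r_{1i} and r_{2i}) follows a Binomial distribution
\(r_{1i} \sim \text{Bin}(n_{1i}, p_{1i}), \quad r_{2i} \sim \text{Bin}(n_{2i}, p_{2i}) \qquad (1)\)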
where n_{1i},n_{2i} are the numbers of participants on arms one and two, respectively, and p_{1i},p_{2i} are the response probabilities on arms one and two, respectively. The response probability in each arm is transformed onto the linear predictor scale using a suitable link function g()
\(g(p_{1i}) = \theta_{i}, \quad g(p_{2i}) = \theta_{i} + \delta \qquad (2)\)
where θ_{i} represents the baseline treatment effect (i.e. the treatment effect in arm one) in study i, and δ represents the relative treatment effect (i.e. the treatment effect in arm two relative to arm one). Here, the relative treatment effect is assumed to be identical across all studies, whilst the baseline treatment effects are exchangeable (i.e. vary across studies according to a common Normal distribution)
\(\theta_{i} \sim N(\mu, \sigma^{2}) \qquad (3)\)
with mean μ and standard deviation σ. Suitably non-informative prior distributions can be placed on μ and σ; μ∼N(0,10^{5}), σ∼Γ^{−1}(10^{−4},10^{−4}).
Begg & Pilote method with bias-adjustment (BPbias)
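The bias-adjusted version of the BP method (BPbias) extends BP in Eq. (2) with the additional assumption that single-arm data are systematically biased relative to RCT data
\(g(p_{1i}) = \begin{cases} \theta_{i}, & i=1,\ldots,n \\ \theta_{i}+\xi, & i=n+1,\ldots,n+m \end{cases} \qquad g(p_{2i}) = \begin{cases} \theta_{i}+\delta, & i=1,\ldots,n \\ \theta_{i}+\delta+\eta, & i=n+m+1,\ldots,n+m+l \end{cases} \qquad (4)\)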
where ξ (for arm one) and η (for arm two) represent bias in the single-arm data. The bias is assumed to be common across single-arm studies, and suitably non-informative Normal prior distributions can be placed on the bias parameters; ξ∼N(0,10^{5}) and η∼N(0,10^{5}).
Begg & Pilote method with random effects (BPrandom)
The BP method with random effects (BPrandom) extends BP in Eq. (2) by assuming exchangeable relative treatment effects
\(g(p_{1i}) = \theta_{i}, \quad g(p_{2i}) = \theta_{i} + \delta_{i} \qquad (5)\)
where δ_{i} are the relative treatment effects assumed to follow a Normal distribution
\(\delta_{i} \sim N(d, \tau^{2}) \qquad (6)\)
with mean d and standard deviation τ. Suitably non-informative prior distributions can be placed on d and τ; d∼N(0,10^{5}), τ∼Γ^{−1}(10^{−4},10^{−4}).
Bivariate generalised linear mixed-effects models (BGLMM) I & II
The first method proposed by Zhang et al is the bivariate generalised linear mixed-effects model (BGLMM) I, which assumes a Binomial likelihood for the arm-level data as formulated in Eq. (1). In contrast to Begg & Pilote, Zhang et al model the treatment effect in each arm of study i. For RCTs, the method assumes data are correlated between arms
\(g(p_{1i}) = \mu_{1} + \nu_{1i}, \quad g(p_{2i}) = \mu_{2} + \nu_{2i}, \quad (\nu_{1i}, \nu_{2i})^{T} \sim N(\mathbf{0}, \Sigma), \quad i=1,\ldots,n \qquad (7)\)
where μ_{1} and μ_{2} represent the mean treatment effect in each arm, whilst (ν_{1i},ν_{2i}) are assumed to follow a bivariate Normal distribution with covariance matrix Σ, which accounts for between-study heterogeneity across RCTs on each arm and correlation between arms. Non-informative Normal prior distributions can be placed on the mean treatment effects; μ_{1}∼N(0,10^{5}), μ_{2}∼N(0,10^{5}). An inverse-Wishart prior distribution can be placed on the covariance matrix; Σ∼W^{−1}(R,2), where R is a 2×2 scale matrix with diagonal elements equal to 1 and off-diagonal elements equal to 0.005. This prior distribution is weakly informative on both the correlation and standard deviation parameters, but correctly implies that the population-averaged treatment-specific event probabilities range from 0 to 1. The method assumes the same mean treatment effects μ_{1} and μ_{2} for single-arm studies
\(g(p_{1i}) = \mu_{1} + \nu_{3i}, \; \nu_{3i} \sim N(0, \sigma_{3}^{2}), \; i=n+1,\ldots,n+m; \qquad g(p_{2i}) = \mu_{2} + \nu_{4i}, \; \nu_{4i} \sim N(0, \sigma_{4}^{2}), \; i=n+m+1,\ldots,n+m+l \qquad (8)\)
where ν_{3i} and ν_{4i} are each assumed to follow a univariate Normal distribution to account for the between-study heterogeneity across single-arm studies on each arm. Similar to Zhang et al, we place inverse-Gamma prior distributions on the standard deviation parameters; σ_{3}∼Γ^{−1}(10^{−4},10^{−4}) and σ_{4}∼Γ^{−1}(10^{−4},10^{−4}).
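The BGLMM I method can be modified in Eq. (8) to assume different mean treatment effects μ_{3} and μ_{4} for single-arm studies
\(g(p_{1i}) = \mu_{3} + \nu_{3i}, \; i=n+1,\ldots,n+m; \qquad g(p_{2i}) = \mu_{4} + \nu_{4i}, \; i=n+m+1,\ldots,n+m+l \qquad (9)\)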
Non-informative Normal prior distributions can be placed on the mean treatment effects; μ_{3}∼N(0,10^{5}), μ_{4}∼N(0,10^{5}). The resulting posterior distributions can themselves be used to inform prior distributions for μ_{1} and μ_{2} in a two-step method. First, the model specified in Eq. (9) is fit to the single-arm data to estimate posterior distributions for μ_{3} and μ_{4}, from which posterior median and standard deviation estimates are obtained. Then, the model specified in Eq. (7) is fit to the RCT data, with informative prior distributions (based on the extracted estimates) placed on the mean treatment effects; \(\mu_{1} \sim N\left(\hat{\mu}_{3}, \hat{\tau}_{1}^{2}\right), \mu_{2} \sim N\left(\hat{\mu}_{4}, \hat{\tau}_{2}^{2}\right)\). This modified version of the BGLMM I method is labelled BGLMM II.
Hierarchical power prior (HPP)
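The hierarchical power prior (HPP) method extends the BGLMM I method in Eq. (1) by raising the likelihood functions L(p_{1i}) and L(p_{2i}) for the single-arm studies to a power between zero and one
\(L(p_{1i})^{\alpha_{1}}, \; i=n+1,\ldots,n+m; \qquad L(p_{2i})^{\alpha_{2}}, \; i=n+m+1,\ldots,n+m+l \qquad (10)\)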
where α_{1} and α_{2} represent the power parameters for each arm. To allow flexibility in down-weighting the single-arm data, Beta prior distributions can be placed on the power parameters; α_{1}∼β(10,1), α_{2}∼β(10,1). A β(10,1) prior has mean 0.91 and a 95% credible interval ranging from 0.69 to 0.99, which indicates a moderate-to-strong similarity between single-arm studies and RCTs, and provides a modest down-weighting [6].
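The summary of the β(10,1) prior quoted above can be checked with a minimal sketch. It relies on the fact that a Beta(a, 1) distribution has CDF x^a, so its quantiles are available in closed form; the function names `beta_a1_summary` and `powered_binomial_loglik` are illustrative and do not come from the paper or from Zhang et al.

```python
import math


def beta_a1_summary(a: float, level: float = 0.95):
    """Mean and equal-tailed interval of a Beta(a, 1) distribution.

    For Beta(a, 1) the CDF is F(x) = x**a on [0, 1], so the q-quantile
    is q**(1/a) and no numerical inversion is needed.
    """
    mean = a / (a + 1.0)
    tail = (1.0 - level) / 2.0
    lower = tail ** (1.0 / a)
    upper = (1.0 - tail) ** (1.0 / a)
    return mean, lower, upper


def powered_binomial_loglik(r: int, n: int, p: float, alpha: float) -> float:
    """Binomial log-likelihood kernel multiplied by the power alpha.

    Raising a likelihood to the power alpha (0 <= alpha <= 1) scales its
    log by alpha: alpha = 1 keeps the study at full weight, alpha = 0
    removes its contribution entirely.
    """
    return alpha * (r * math.log(p) + (n - r) * math.log(1.0 - p))


mean, lower, upper = beta_a1_summary(10.0)
print(f"mean={mean:.2f}, 95% interval=({lower:.2f}, {upper:.3f})")
```

Under this closed form the mean is 10/11 and the lower limit is about 0.69, in line with the text; the upper limit evaluates to roughly 0.997, which the text reports as 0.99.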
Hierarchical commensurate prior (HCP)
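The hierarchical commensurate prior (HCP) method assumes different mean treatment effects for RCTs and single-arm studies (described by Eqs. (7) and (9)), and places Normal prior distributions on μ_{1} and μ_{2} informed by the single-arm data;
\(\mu_{1} \sim N\left(\mu_{3}, \tau_{1}^{-1}\right), \quad \mu_{2} \sim N\left(\mu_{4}, \tau_{2}^{-1}\right) \qquad (11)\)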
where τ_{1} and τ_{2} are commensurability parameters representing agreement between data from RCTs and single-arm studies. Similar to Zhang et al, we place Gamma prior distributions on each parameter; τ_{1}∼Γ(10^{−3},10^{−3}), τ_{2}∼Γ(10^{−3},10^{−3}). For small parameter values, the variance of the single-arm data is inflated and the contribution to μ_{1} and μ_{2} is down-weighted. As parameter values approach zero, only RCT data contribute in estimating μ_{1} and μ_{2}, whilst single-arm data are ignored. As the parameter values approach infinity, data from RCTs and single-arm studies contribute equally in estimating μ_{1} and μ_{2}.
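The limiting behaviour described above can be illustrated with a simplified Normal approximation. This is only a sketch under stated assumptions, not the HCP model itself: it treats the commensurability parameter as a precision (which is how the limits in the text read), summarises each data source by a single estimate with a standard error, and uses hypothetical input values. The function name `commensurate_pooled_mean` is illustrative.

```python
def commensurate_pooled_mean(y_rct, se_rct, y_single, se_single, tau):
    """Normal-approximation sketch of borrowing under a commensurate prior.

    Assumption: mu_rct ~ N(mu_single, 1/tau), with tau a precision. The
    single-arm estimate then informs mu_rct with variance
    se_single**2 + 1/tau, and the two sources are combined by
    precision weighting.
    """
    w_rct = 1.0 / se_rct ** 2
    w_single = 1.0 / (se_single ** 2 + 1.0 / tau)
    pooled = (w_rct * y_rct + w_single * y_single) / (w_rct + w_single)
    pooled_sd = (w_rct + w_single) ** -0.5
    return pooled, pooled_sd


# Hypothetical probit-scale estimates: RCTs 0.4 (SE 0.3), single-arm 0.6 (SE 0.2).
for tau in (1e-6, 1.0, 1e6):
    est, sd = commensurate_pooled_mean(0.4, 0.3, 0.6, 0.2, tau)
    print(f"tau={tau:g}: pooled mean={est:.3f}, sd={sd:.3f}")
```

With a very small tau the pooled estimate stays at the RCT value and its standard deviation matches the RCT standard error; with a very large tau it approaches the usual inverse-variance pooled value, where both sources are treated on an equal footing.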
Marginal response probabilities
The methods described above for a dichotomous outcome model the response probability in each arm, based on the numbers of participants and responders (described by Eq. (1)), and use a link function g() to transform the response probability onto the linear predictor scale where treatment effects are additive. A logit or probit link function can be used for meta-analysis with Binomial data [4], although the logit link is often favoured in the published literature as the relative treatment effects are easier to interpret on the log odds ratio scale. The methods proposed by Zhang et al do not parametrise relative treatment effects, and instead they recommend using the probit link Φ^{−1}() and then calculating the marginal response probability in each arm (with σ_{k}^{2} denoting the kth diagonal element of Σ)
\(\pi_{k} = \Phi\left(\frac{\mu_{k}}{\sqrt{1+\sigma_{k}^{2}}}\right), \quad k=1,2 \qquad (12)\)
where Φ is the cumulative distribution function for the standard Normal distribution. The marginal response probabilities can be used to calculate a marginal odds ratio OR_{21}=π_{2}(1−π_{1})/[π_{1}(1−π_{2})]. We implement the methods proposed by Begg & Pilote using a probit link to allow a direct comparison of all methods. For the BP and BPbias methods, we obtain the marginal response probabilities using Eq. (13)
\(\pi_{1} = \Phi\left(\frac{\mu}{\sqrt{1+\sigma^{2}}}\right), \quad \pi_{2} = \Phi\left(\frac{\mu+\delta}{\sqrt{1+\sigma^{2}}}\right) \qquad (13)\)
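and for the BPrandom method using Eq. (14)
\(\pi_{1} = \Phi\left(\frac{\mu}{\sqrt{1+\sigma^{2}}}\right), \quad \pi_{2} = \Phi\left(\frac{\mu+d}{\sqrt{1+\sigma^{2}+\tau^{2}}}\right) \qquad (14)\)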
Summary of methods
In this section, we have described the details of each method (including suitable prior distributions) and the corresponding marginal response probabilities (WinBUGS code used to fit each of the methods is provided in Appendix D). We note that the methods proposed by Zhang et al reduce to the model described by Eq. (7) when applied to RCT data only (i=1,...,n), which we label BGLMM^{∗}. Similarly, the BP and BPbias methods reduce to the model described by Eqs. (2) and (3) when applied to RCT data only, which we label BP^{∗}. We label the BPrandom method applied to RCT data only as BPrandom^{∗}.
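The marginal response probabilities and marginal odds ratio described in this section can be sketched as follows. The sketch assumes the standard probit-normal identity, namely that averaging Φ(m + e) over e∼N(0, s²) gives Φ(m/√(1+s²)), and, for the random-effects case, that baseline and relative effects are independent. Function names and the example parameter values are illustrative only.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Normal cumulative distribution function


def marginal_probability(mean, *sds):
    """Marginal response probability under a probit link.

    Averages Phi(mean + e) over independent zero-mean Normal random
    effects with the given standard deviations.
    """
    total_var = sum(s * s for s in sds)
    return PHI(mean / math.sqrt(1.0 + total_var))


def marginal_odds_ratio(pi1, pi2):
    """OR_21 = pi2 (1 - pi1) / [pi1 (1 - pi2)]."""
    return (pi2 * (1.0 - pi1)) / (pi1 * (1.0 - pi2))


def bp_marginals(mu, delta, sigma):
    """Common relative effect: only the baseline varies across studies."""
    return marginal_probability(mu, sigma), marginal_probability(mu + delta, sigma)


def bp_random_marginals(mu, d, sigma, tau):
    """Random relative effect: arm two also carries the variance tau**2
    (assumes baseline and relative effects are independent)."""
    return marginal_probability(mu, sigma), marginal_probability(mu + d, sigma, tau)


# Hypothetical probit-scale values, chosen only for illustration.
pi1, pi2 = bp_marginals(mu=0.4, delta=0.7, sigma=0.6)
print(pi1, pi2, marginal_odds_ratio(pi1, pi2))
```

In a Bayesian fit these functions would be applied to each MCMC draw of the parameters, so that the posterior of π_{1}, π_{2} and the odds ratio is summarised from the transformed draws rather than from point estimates.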
Simulation study: methods
In this section, we report aims, data-generation methods, estimands, methods, and performance measures for the simulation study, as recommended by Morris et al [12]. The simulation study aimed to compare the performance of the methods described previously, under a number of scenarios varying the proportion of RCTs and single-arm studies in the synthesis, the magnitude of bias between data from RCTs and single-arm studies, and differences in between-study heterogeneity across RCTs and single-arm studies. We aimed to build on the simulation study performed by Zhang et al [6], where the estimands were the marginal response probability in each arm π_{1} and π_{2}. We evaluated performance by calculating coverage, mean-square error (MSE), and mean change in 95% credible interval length (CrIL). The latter measures the average change in CrIL when a method is applied to RCT and single-arm data versus RCT data only. The methods were implemented via MCMC sampling in the WinBUGS software, using a burn-in of 20,000 iterations and 100,000 iterations for posterior estimation [13].
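The three performance measures named above can be sketched as below. The exact definitions used in the study are not spelled out here, so this is a hedged reading: coverage as the proportion of intervals containing the true value, MSE as the mean squared difference between estimate and truth, and mean change in CrIL as the average absolute difference in interval length (the study may instead report a relative or percentage change). All function names are illustrative.

```python
from statistics import mean


def coverage(intervals, truth):
    """Proportion of (lower, upper) intervals that contain the true value."""
    return mean(1.0 if lo <= truth <= hi else 0.0 for lo, hi in intervals)


def mean_square_error(estimates, truth):
    """Mean squared difference between point estimates and the true value."""
    return mean((est - truth) ** 2 for est in estimates)


def mean_change_in_cril(intervals_combined, intervals_rct_only):
    """Average change in credible interval length, combined minus RCT-only.

    Assumption: the two lists are paired by simulated dataset. A negative
    value means that adding single-arm data narrowed the intervals.
    """
    return mean(
        (hi_c - lo_c) - (hi_r - lo_r)
        for (lo_c, hi_c), (lo_r, hi_r) in zip(intervals_combined, intervals_rct_only)
    )


# Toy inputs standing in for results over three simulated datasets.
combined = [(0.50, 0.70), (0.60, 0.80), (0.70, 0.90)]
rct_only = [(0.45, 0.80), (0.50, 0.85), (0.60, 0.95)]
print(coverage(combined, truth=0.65))
print(mean_square_error([0.60, 0.70, 0.80], truth=0.65))
print(mean_change_in_cril(combined, rct_only))
```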
Data-generation methods
As in Zhang et al [6], we let n represent the number of RCTs assessing treatments one and two, m the number of single-arm studies assessing treatment one, and l the number of single-arm studies assessing treatment two. We let i denote study, and set the total number of studies in a dataset to n+m+l=30. The data were simulated based on the BGLMM I method, modified to assume bias in the single-arm data. The steps taken to simulate a dataset were as follows. For RCT data, we specified values for the between-study heterogeneity on each arm (σ_{1} and σ_{2}) and correlation between arms (ρ) to obtain the covariance matrix (Σ) used to simulate v_{1i} and v_{2i}
\(\Sigma = \begin{pmatrix} \sigma_{1}^{2} & \rho\sigma_{1}\sigma_{2} \\ \rho\sigma_{1}\sigma_{2} & \sigma_{2}^{2} \end{pmatrix}, \quad (v_{1i}, v_{2i})^{T} \sim N(\mathbf{0}, \Sigma) \qquad (15)\)
We assigned values to the mean treatment effect in each arm (μ_{1} and μ_{2}), and applied the simulated v_{1i} and v_{2i} to obtain response probabilities on each arm
\(p_{1i} = \Phi(\mu_{1} + v_{1i}), \quad p_{2i} = \Phi(\mu_{2} + v_{2i}) \qquad (16)\)
where Φ is the cumulative distribution function for the standard Normal distribution. We set the number of participants to 100 in each arm of study i (n_{1i}=n_{2i}=100), and applied the response probabilities (p_{1i} and p_{2i}) to sample the number of responders (r_{1i} and r_{2i}) from a Binomial distribution
\(r_{1i} \sim \text{Bin}(n_{1i}, p_{1i}), \quad r_{2i} \sim \text{Bin}(n_{2i}, p_{2i}) \qquad (17)\)
For single-arm data, we specified values for the between-study heterogeneity on each arm (σ_{3} and σ_{4}) to simulate v_{3i} and v_{4i}
\(v_{3i} \sim N(0, \sigma_{3}^{2}), \quad v_{4i} \sim N(0, \sigma_{4}^{2}) \qquad (18)\)
We defined values for the bias in each arm (ξ and η), and applied the simulated v_{3i} and v_{4i} together with the mean treatment effects (μ_{1} and μ_{2}), to obtain the response probabilities on each arm
\(p_{1i} = \Phi(\mu_{1} + \xi + v_{3i}), \quad p_{2i} = \Phi(\mu_{2} + \eta + v_{4i}) \qquad (19)\)
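We applied the response probabilities (p_{1i} and p_{2i}) to sample the numbers of responders (r_{1i} and r_{2i}) from a Binomial distribution
\(r_{1i} \sim \text{Bin}(n_{1i}, p_{1i}), \; i=n+1,\ldots,n+m; \qquad r_{2i} \sim \text{Bin}(n_{2i}, p_{2i}), \; i=n+m+1,\ldots,n+m+l \qquad (20)\)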
The data were simulated under a number of scenarios adapted from scenario 1 (S1), where the number of RCTs was n=15 and the numbers of single-arm studies on each arm were m=10 and l=5. The magnitude of bias for single-arm studies was ξ=0.2 and η=0.4, and the between-study heterogeneity parameters were σ_{1}=0.6, σ_{2}=0.7, σ_{3}=0.8, σ_{4}=1. Due to lack of randomisation, single-arm data are at a higher risk of bias compared to randomised data, so we assume a systematic difference (i.e. parameters ξ=0.2 and η=0.4) and larger between-study heterogeneity (i.e. parameters σ_{3}=0.8 and σ_{4}=1.0). For all scenarios, the mean treatment effects were set to μ_{1}=0.4 and μ_{2}=1.1, and correlation was ρ=0.6. We arrange the scenarios into four groups, where in each group the scenarios focus on varying a common set of parameter values. In group one, S1-5, the number of RCTs gradually decreases (from n=15 to n=1). This was intended to clearly demonstrate the performance of the methods in scenarios where there is little randomised evidence (i.e. S4 and S5, where n=3 and n=1) compared to scenarios where there is relatively substantial randomised evidence (i.e. S1, where n=15). In group two, [S6, S1, S7-9], the bias gradually increases (from ξ=0, η=0 to ξ=0.8, η=1). In group three, [S10-12, S6], the between-study heterogeneity for single-arm data gradually increases (from σ_{3}=0.1, σ_{4}=0.3 to σ_{3}=0.8, σ_{4}=1), with zero bias (ξ=0, η=0). In group four, [S13-15, S1], the between-study heterogeneity for the single-arm data gradually increases (from σ_{3}=0.1, σ_{4}=0.3 to σ_{3}=0.8, σ_{4}=1), with non-zero bias (ξ=0.2, η=0.4). A full description of the parameter values specified in each scenario is provided in Table A.1 (Appendix A).
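The data-generation steps above can be sketched for scenario S1 using only the Python standard library. This is an illustrative sketch rather than the code used in the study (which was run in WinBUGS): the function and field names are hypothetical, the correlated RCT effects are drawn with a hand-written Cholesky step, and it assumes 100 participants per arm in single-arm studies as well as in RCTs.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Normal cumulative distribution function


def binomial(rng, n, p):
    """Draw a Binomial(n, p) count as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_dataset(rng, n=15, m=10, l=5, mu=(0.4, 1.1),
                     sigma=(0.6, 0.7, 0.8, 1.0), rho=0.6,
                     bias=(0.2, 0.4), arm_size=100):
    """Simulate one dataset; defaults are the S1 values quoted in the text."""
    studies = []
    for _ in range(n):  # RCTs: correlated random effects on both arms
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        v1 = sigma[0] * z1
        v2 = sigma[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        p1, p2 = PHI(mu[0] + v1), PHI(mu[1] + v2)
        studies.append({"design": "rct",
                        "r1": binomial(rng, arm_size, p1), "n1": arm_size,
                        "r2": binomial(rng, arm_size, p2), "n2": arm_size})
    for _ in range(m):  # single-arm studies on arm one, shifted by xi
        p1 = PHI(mu[0] + bias[0] + rng.gauss(0.0, sigma[2]))
        studies.append({"design": "single_arm_1",
                        "r1": binomial(rng, arm_size, p1), "n1": arm_size,
                        "r2": None, "n2": None})
    for _ in range(l):  # single-arm studies on arm two, shifted by eta
        p2 = PHI(mu[1] + bias[1] + rng.gauss(0.0, sigma[3]))
        studies.append({"design": "single_arm_2",
                        "r1": None, "n1": None,
                        "r2": binomial(rng, arm_size, p2), "n2": arm_size})
    return studies


dataset = simulate_dataset(random.Random(2021))
print(len(dataset), dataset[0])
```

Other scenarios follow by changing the arguments, for example `n=3, m=16, l=11` for a dataset with few RCTs; the split between m and l in that call is hypothetical, since the exact values per scenario are given in Table A.1.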
Results
Illustrative example: results
Table 1 presents posterior median estimates (and 95% credible intervals) for π_{1}, π_{2}, and the marginal odds ratio. The results are presented separately for analysis of RCT and single-arm data versus analysis of RCT data only. For the latter analysis, a random-effects meta-analysis (REMA) [14] and fixed-effect meta-analysis (FEMA) were also implemented. The odds ratio estimates range from 2.53 to 3.4, suggesting that the odds of achieving an ACR20 response were more than twice as high for participants assigned a bDMARD versus placebo. However, only the contrast-based methods show CrIs that lie entirely above one. There is a reduction in uncertainty when methods are applied to include single-arm studies in the synthesis, and the arm-based methods show a greater reduction in CrIL for the odds ratio (between 23-49%) compared to the contrast-based methods (between 17-22%). Table 1 includes estimates for the deviance information criterion (DIC), which provides a measure of model fit whilst penalising model complexity [15]. The BP method has the highest DIC value and is the simplest method in terms of model parameters.
Simulation study: results
In this section, we present the simulation study results for each method in terms of coverage, MSE and mean change in CrIL. We illustrate the results for each scenario group with a line plot of the performance measures for each estimand.
Scenarios S1-5
Across scenarios S1-5, the proportion of RCTs gradually decreases (from n=15 to n=1), whilst the total number of studies remains fixed (n+m+l=30). The results for these scenarios are presented in Fig. 2. The HPP and HCP methods, which both down-weight the single-arm data, perform relatively strongly with overcoverage (i.e. coverage above the nominal value 0.95) and small MSE for both estimands. This suggests that down-weighting the single-arm data can mitigate the effect of bias. The BPbias method, which includes a parameter on each arm to account for bias, performs strongly in S1-2 where there is a significant number of RCTs (n=15 and n=12). However, its performance drops off and MSE is much larger in S4-5 where there are few RCTs (n=3 and n=1). This suggests that it requires a significant number of both RCTs and single-arm studies to estimate bias. The BP method is naive to study design, and shows undercoverage for all scenarios, which worsens as the proportion of RCTs decreases. All methods, aside from BPbias, provide a reduction in uncertainty when including versus excluding single-arm data.
In scenarios S1-5, data were simulated using the BGLMM I method, potentially favouring arm-based methods over contrast-based methods. We performed a sensitivity analysis to further explore this, where data were simulated using the BPrandom method. We label these scenarios S1^{∗}-5^{∗}, and the results are presented in Figure B.1 (Appendix B). The HCP and HPP methods perform strongly across S1-5, and maintain their performance in S1^{∗}-5^{∗} with overcoverage and relatively small MSE. The performance of BPbias in S1^{∗}-5^{∗} mirrors its performance in S1-5, with a significant decrease in coverage and increase in MSE occurring in S5^{∗}. The BP and BPrandom methods show a reduction in undercoverage and MSE, perhaps because their assumptions are now better aligned with the data-generating method (e.g. common between-study heterogeneity across all studies). In contrast, the BGLMM I and II methods show a reduction in coverage and a small increase in MSE, because their assumptions are not as aligned to the data-generating method. The impact of between-study heterogeneity is explored further in scenarios [S10-12, S6] and [S13-15, S1].
Scenarios [S6, S1, S7-9]
Across scenarios [S6, S1, S7-9], the magnitude of bias in each arm gradually increases (from ξ=0, η=0 to ξ=0.8, η=1). The results for these scenarios are presented in Fig. 3. The BPbias method shows consistent overcoverage and small MSE, but does not offer any reduction in uncertainty when including single-arm studies. The HCP and HPP methods maintain coverage close to the nominal value and small MSE, whilst offering a consistent reduction in uncertainty. They only show a drop in performance in S9, where there is relatively large bias in single-arm data. The BP and BPrandom methods are naive to study design and show a reduction in uncertainty, but a steep decrease in coverage and increase in MSE as the bias is increased. The drop in performance is worse for π_{1} than π_{2}, perhaps because there are more single-arm studies on arm one (m=10) than arm two (l=5). The BGLMM I and II methods do not account for bias in the single-arm data, but show a more gradual decrease in coverage and increase in MSE.
Scenarios [S10-12, S6]
Across scenarios [S10-12, S6], between-study heterogeneity in single-arm studies on each arm gradually increases (from σ_{3}=0.1, σ_{4}=0.3 to σ_{3}=0.8, σ_{4}=1) but remains fixed in RCTs (σ_{1}=0.6, σ_{2}=0.7), and there is zero bias (ξ=0, η=0). Figure 4 presents the results for these scenarios. The BPbias method shows significant undercoverage and large MSE in S10 where the single-arm studies have much lower between-study heterogeneity compared to RCTs, but still provides some reduction in uncertainty. The BP and BPrandom methods show less undercoverage and much smaller MSE, whilst providing a greater reduction in uncertainty. The BGLMM I and II methods show overcoverage and little MSE, whilst providing the greatest reduction in uncertainty. As the between-study heterogeneity is increased, all methods show a smaller reduction in uncertainty, but the HCP and HPP methods are impacted the least.
Scenarios [S13-15, S1]
In contrast to [S10-12, S6], scenarios [S13-15, S1] assume non-zero bias in the single-arm data (ξ=0.2, η=0.4), and the results are presented in Fig. 5. In S13, where single-arm data are much less uncertain than the RCT data, the BGLMM I and II methods show a significant reduction in uncertainty but large undercoverage and significant MSE. In comparison, the BP and BPrandom methods provide a more modest reduction in uncertainty but better coverage, although the BPrandom method shows large MSE. The BPbias method shows improvement compared to S10, but offers only a modest reduction in uncertainty which diminishes in [S14-15, S1]. The HCP and HPP methods, which down-weight single-arm data, show overcoverage and small MSE across the scenarios whilst maintaining a reduction in uncertainty.
Discussion
In this paper, we aimed to compare methods proposed by Begg & Pilote and Zhang et al, for pairwise meta-analysis combining data from RCTs and single-arm studies using aggregate data. Based on our simulation study, we conclude that the HCP and HPP methods provide a consistent reduction in uncertainty when including single-arm data, whilst remaining robust to limited RCT data, bias, and differences in between-study heterogeneity across the two sets of data. Both methods achieve this by down-weighting the single-arm data, HPP through specification of a prior distribution, and HCP through estimating disagreement between the data from RCTs and single-arm studies. The BPbias method offers a simpler approach to mitigating bias, but requires a significant proportion of RCTs and single-arm studies in the synthesis. The BGLMM I and II methods provide a reduction in uncertainty, contingent upon little or no bias. Through our analysis of an illustrative example, we have shown that the methods can be used to combine data from RCTs and single-arm studies to achieve a significant reduction in uncertainty, compared to traditional meta-analysis of RCTs alone. We list below key recommendations in applying the methods for synthesis of RCTs and single-arm studies.
Key recommendations:

The BP method is a parsimonious approach offering significant reduction in uncertainty (compared to the analysis of RCT data alone), when there is little or no bias and no marked difference in between-study heterogeneity between the two types of data.

The BGLMM I and II methods provide a significant reduction in uncertainty whilst accounting for differences in between-study heterogeneity, when there is little or no bias.

The HPP method allows for down-weighting single-arm data, and remains robust to limited RCT data and bias whilst providing a reduction in uncertainty.

The HCP method provides a consistent reduction in uncertainty whilst accounting for disagreement between RCT and single-arm data, and also remains robust to limited RCT data.
In a traditional meta-analysis of RCTs aiming to estimate a pooled relative treatment effect [16], baseline treatment effects are allowed to vary independently to preserve randomisation in each arm and minimise bias. There has been discussion in the literature regarding arm-based and contrast-based approaches to meta-analysis. Hong et al have suggested arm-based methods can minimise bias when data are assumed to be missing in a particular arm [17]. In response, Dias & Ades have argued that arm-based models actually increase bias since they do not preserve randomisation [18]. A further exploration of contrast-based and arm-based models has been performed by White et al considering an NMA context [19]. The traditional meta-analysis approach is not feasible when seeking to combine RCTs and single-arm studies, because the single-arm studies lack a comparator arm to estimate a relative treatment effect. Consequently, exchangeability must be assumed on at least one arm to incorporate the single-arm studies. The methods by Begg & Pilote assume exchangeability on the designated baseline arm, whilst those by Zhang et al assume exchangeability on both arms. Thus, it may be beneficial to perform a sensitivity analysis using more than one method. Decision-makers can then consider the benefits offered by including the single-arm studies (e.g. reduction in uncertainty) versus the potential penalties (e.g. increased risk of bias), and whether the penalties have been mitigated by applying a suitable method.
Aside from application in HTA, the methods assessed here can also be useful in clinical settings, for instance, in early-phase cancer research. Phase II cancer trials assess treatment efficacy via randomised-controlled or single-arm study designs, where only the latter may be ethical for rare cancers [20]. Consequently, there may be data available from both RCTs and single-arm studies, which need to be synthesised to determine the feasibility of a phase III trial [21]. Thus, further methodological development for performing a meta-analysis (or NMA [22]) to combine data from different study designs is required.
In this paper, we have considered the case where only aggregate data are available, which limits the methods to rudimentary bias-adjustment. The methods, however, can be very useful in situations when there are no IPD available and RCT data are limited. Further research is required to explore methods adjusting for bias which is variable across studies, to account for differences in risk of bias due to study setting (e.g. single-centre versus multi-centre single-arm studies). When IPD are available, a more detailed adjustment for potential biases can be carried out, as there are a number of approaches available that can be applied to mitigate confounding when estimating causal treatment effects [23]. For example, availability of IPD allows for enhancing approaches for meta-regression, which is recommended to explore bias and heterogeneity in a synthesis of evidence [24]. Furthermore, we did not consider the methods recently proposed by Schmitz et al [7], which incorporate single-arm studies in NMA of RCTs. Although proposed under an NMA context, they can be adapted for pairwise meta-analysis. However, after matching single-arm studies based on covariate information, the models used to synthesise data are only applicable to two-arm studies. Including those methods in this simulation study would also require specifying a model from which to simulate data on covariates. Thus, the simulation study was restricted to the methods by Begg & Pilote and Zhang et al.
Conclusions
We have performed an extensive comparison of the methods proposed by Begg & Pilote and Zhang et al for pairwise meta-analysis combining aggregate data from RCTs and single-arm studies. We conclude that the methods by Zhang et al (HCP and HPP), which use the single-arm data to define prior distributions for model parameters, provide a consistent reduction in uncertainty when including single-arm data, whilst remaining robust to data variability. The other methods considered here perform worse when there are limited RCT data (BPbias), substantial bias (BGLMM I & II), or differences in between-study heterogeneity across the two sets of data (BP and BPrandom). We hope this study is informative for researchers seeking to perform a pairwise meta-analysis of RCTs and single-arm studies using aggregate data. We have described the existing methods in detail under a Bayesian framework, together with their advantages and disadvantages under a number of data scenarios.
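To illustrate the general idea of using single-arm data to define a prior distribution, the following minimal sketch applies a power prior to a single response probability in a conjugate beta-binomial setting. It is a simplification for illustration only and is not the hierarchical models evaluated in this study: the discount parameter `a0` is fixed rather than estimated, only one arm is modelled, and the function names and all counts are hypothetical.

```python
def power_prior_posterior(y, n, y0, n0, a0, a=1.0, b=1.0):
    """Beta posterior parameters for a response probability.

    y, n   : responders and sample size in the RCT arm (current data)
    y0, n0 : responders and sample size in the single-arm study
    a0     : fixed discount in [0, 1] applied to the single-arm likelihood
    a, b   : parameters of the initial Beta(a, b) prior (assumed vague)
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    # Raising a binomial likelihood to the power a0 scales its
    # contribution to the Beta parameters by a0.
    alpha = a + a0 * y0 + y
    beta = b + a0 * (n0 - y0) + (n - y)
    return alpha, beta


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# Hypothetical counts, chosen only for illustration.
rct_y, rct_n = 12, 40      # RCT treatment arm
sa_y, sa_n = 90, 200       # single-arm study

for a0 in (0.0, 0.5, 1.0):
    alpha, beta = power_prior_posterior(rct_y, rct_n, sa_y, sa_n, a0)
    print(f"a0={a0:.1f}  mean={beta_mean(alpha, beta):.3f}  "
          f"variance={beta_variance(alpha, beta):.5f}")
```

With `a0 = 0` the single-arm data are ignored, and with `a0 = 1` they are pooled at full weight. In this example, intermediate values down-weight the single-arm data while still narrowing the posterior relative to the RCT-only analysis, at the cost of pulling the estimate towards the single-arm response rate.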
Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files.
Declarations
Abbreviations
HTA: Health technology assessment
RCT: Randomised controlled trial
MCMC: Markov chain Monte Carlo
NMA: Network meta-analysis
BP: Begg & Pilote
BPbias: Begg & Pilote method with bias adjustment
BPrandom: Begg & Pilote method with random effects
BGLMM: Bivariate generalised linear mixed effects model
HPP: Hierarchical power prior
HCP: Hierarchical commensurate prior
MSE: Mean square error
CrIL: Credible interval length
S: Scenario
RA: Rheumatoid arthritis
bDMARDs: Biologic disease-modifying anti-rheumatic drugs
ADA: Adalimumab
ETN: Etanercept
IFX: Infliximab
ABT: Abatacept
RTX: Rituximab
ACR: American College of Rheumatology
REMA: Random effects meta-analysis
FEMA: Fixed effects meta-analysis
DIC: Deviance information criterion
IPD: Individual participant data
References
1. Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA. Cochrane Handbook for Systematic Reviews of Interventions. 2nd Edition. Chichester: Wiley; 2019.
2. Dias S, Welton NJ, Sutton AJ, Ades A. NICE DSU Technical Support Document 1: Introduction to evidence synthesis for decision making. University of Sheffield, Decision Support Unit. 2011:1–24.
3. Woolacott N, Corbett M, Jones-Diette J, Hodgson R. Methodological challenges for the evaluation of clinical effectiveness in the context of accelerated regulatory approval: an overview. J Clin Epidemiol. 2017; 90:108–18.
4. Dias S, Welton NJ, Sutton AJ, Ades AE. NICE DSU Technical Support Document 2: A Generalised Linear Modelling Framework for Pairwise and Network Meta-Analysis of Randomised Controlled Trials. 2011. http://www.nicedsu.org.uk. Last updated September 2016.
5. Begg CB, Pilote L. A model for incorporating historical controls into a meta-analysis. Biometrics. 1991; 47(3):899–906.
6. Zhang J, Ko CW, Nie L, Chen Y, Tiwari R. Bayesian hierarchical methods for meta-analysis combining randomized-controlled and single-arm studies. Stat Methods Med Res. 2019; 28(5):1293–310.
7. Schmitz S, Maguire Á, Morris J, Ruggeri K, Haller E, Kuhn I, Leahy J, Homer N, Khan A, Bowden J, et al. The use of single armed observational data to closing the gap in otherwise disconnected evidence networks: a network meta-analysis in multiple myeloma. BMC Med Res Methodol. 2018; 18(1):66.
8. Thom HH, Capkun G, Cerulli A, Nixon RM, Howard LS. Network meta-analysis combining individual patient and aggregate data from a mixture of study designs with an application to pulmonary arterial hypertension. BMC Med Res Methodol. 2015; 15(1):34.
9. Chakravarty K, McDonald H, Pullar T, Taggart A, Chalmers R, Oliver S, Mooney J, Somerville M, Bosworth A, Kennedy T. BSR/BHPR guideline for disease-modifying anti-rheumatic drug (DMARD) therapy in consultation with the British Association of Dermatologists. Rheumatology. 2008; 47(6):924–5.
10. Aletaha D, Neogi T, Silman AJ, Funovits J, Felson DT, Bingham III CO, Birnbaum NS, Burmester GR, Bykerk VP, Cohen MD, et al. 2010 rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum. 2010; 62(9):2569–81.
11. Malottki K, Barton P, Tsourapas A, Uthman A, Liu Z. Adalimumab, etanercept, infliximab, rituximab and abatacept for the treatment of rheumatoid arthritis after the failure of a tumour necrosis factor inhibitor: a systematic review and economic evaluation. Health Technol Assess. 2011; 15(14).
12. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019; 38(11):2074–102.
13. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput. 2000; 10(4):325–37.
14. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986; 7(3):177–88.
15. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Ser B Stat Methodol. 2002; 64(4):583–639.
16. Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis: a comparative study. Stat Med. 1995; 14(24):2685–99.
17. Hong H, Chu H, Zhang J, Carlin BP. A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons. Res Synth Methods. 2016; 7(1):6–22.
18. Dias S, Ades A. Absolute or relative effects? Arm-based synthesis of trial data. Res Synth Methods. 2016; 7(1):23.
19. White IR, Turner RM, Karahalios A, Salanti G. A comparison of arm-based and contrast-based models for network meta-analysis. Stat Med. 2019; 38(27):5197–213.
20. Grayling MJ, Dimairo M, Mander AP, Jaki TF. A review of perspectives on the use of randomization in phase II oncology trials. J Natl Cancer Inst. 2019; 111(12):1255–62.
21. Sabin T, Matcham J, Bray S, Copas A, Parmar MK. A quantitative process for enhancing end of phase 2 decisions. Stat Biopharm Res. 2014; 6(1):67–77.
22. Martina R, Jenkins D, Bujkiewicz S, Dequen P, Abrams K. The inclusion of real world evidence in clinical development planning. Trials. 2018; 19(1):1–12.
23. Faria R, Hernandez Alava M, Manca A, Wailoo AJ. NICE DSU Technical Support Document 17: The use of observational data to inform estimates of treatment effectiveness for Technology Appraisal: Methods for comparative individual patient data. 2015. p. 19–20. http://www.nicedsu.org.uk.
24. Phillippo DM, Ades AE, Dias S, Palmer S, Abrams KR, Welton NJ. NICE DSU Technical Support Document 18: Methods for population-adjusted indirect comparisons in submission to NICE. 2016. http://www.nicedsu.org.uk.
Acknowledgements
This research used the ALICE/SPECTRE High Performance Computing Facility at the University of Leicester.
Funding
JS was supported by UK National Institute for Health Research (NIHR) Methods Fellowship [award no. RMFI201708027] and NIHR Doctoral Research Fellowship [award no. NIHR300190]. KRA and SB were supported by the UK Medical Research Council [grant no. MR/R025223/1]. KRA is partially supported by Health Data Research (HDR) UK, the NIHR Applied Research Collaboration East Midlands (ARC EM), and as a NIHR Senior Investigator Emeritus (NFSI051210159). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
Author information
Contributions
JS undertook the data curation and formal analysis. SB and KRA conceptualised the simulation study and supervised JS. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
JS has no conflicts of interest.
SB has served as a paid consultant, providing methodological advice, to NICE and Roche, and has received research funding from the European Federation of Pharmaceutical Industries & Associations (EFPIA) and Johnson & Johnson.
KRA has served as a paid consultant, providing methodological advice, to AbbVie, Amaris, Allergan, Astellas, AstraZeneca, Boehringer Ingelheim, Bristol-Myers Squibb, Creativ-Ceutical, GSK, ICON/Oxford Outcomes, Ipsen, Janssen, Eli Lilly, Merck, NICE, Novartis, Novo Nordisk, Pfizer, PRMA, Roche and Takeda, and has received research funding from the Association of the British Pharmaceutical Industry (ABPI), the European Federation of Pharmaceutical Industries & Associations (EFPIA), Pfizer, Sanofi and Swiss Precision Diagnostics. He is a Partner and Director of Visible Analytics Limited, a healthcare consultancy company.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1
Supplementary material for: Incorporating single-arm studies in meta-analysis of randomised controlled trials: A simulation
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Singh, J., Abrams, K.R. & Bujkiewicz, S. Incorporating single-arm studies in meta-analysis of randomised controlled trials: a simulation study. BMC Med Res Methodol 21, 114 (2021). https://doi.org/10.1186/s12874-021-01301-1
Keywords
 Evidence synthesis
 Real world data
 Single-arm studies
 Bayesian hierarchical methods
 Meta-analysis
 Arm-based methods