Illustrative example and sources of evidence
In a motivating example, DMTs used in patients with RRMS were considered. A systematic review was carried out to identify studies, both randomised and observational, of different DMTs with a main focus on effectiveness of fingolimod to illustrate how the inclusion of RWE in NMA would impact the estimates of effectiveness of fingolimod in the context of a technology appraisal. The literature search was limited to studies reported prior to January 2010, when fingolimod was given licencing authorisation. Data were extracted on the effect of each treatment on relapse rate. Search terms utilised is available in Additional file 1.
Network meta-analysis
A random-effects NMA model with adjustment for multi-arm trials [11] was used as the base case meta-analytic model. To investigate the effect of fingolimod on relapse rate, the number of relapses rik in each study i and treatment arm k was modelled as count data following the Poisson distribution [12],
$${r}_{ik}\sim Poisson\left({\gamma}_{ik}{E}_{ik}\right)$$
(1)
where Eik is the exposure time in person years and γik is the rate at which events (relapses) occur in arm k for study i. Following a standard generalized linear model approach, the conjugate log link was used with random true treatment effect differences δibk between treatments k and b which are assumed to follow a common normal distribution:
$$\log \left({\gamma}_{ik}\right)={\mu}_{ib}+{\delta}_{ib k}{I}_{k\ne b}$$
(2)
$${\delta}_{ibk}\sim N\left({d}_{bk},{\sigma}^2\right)$$
(3)
Assuming consistency in the network (which means that, for example, average treatment effect difference dAC, between treatments A and C, equals the sum of average treatment effect differences dAB, between treatments A and B, and dBC, between treatments B and C) allows us to represent treatment effect for each treatment contrast dbk in the network as a difference of basic parameters which are average treatment effects of each treatment in the network compared to a common reference treatment 1; dbk = d1k − d1b. Adopting a Bayesian approach to estimating the parameters of Eqs. (1)-(3) requires that prior distributions are placed on the model parameters: the baseline study effects, μib, for example, the uniform distribution μib~Uniform(−10, 10), on the basic parameters, d1k~Uniform(−10, 10) and on the between-study variance σ~Uniform(0, 2).
For multi-arm studies, correlation between treatment effects relative to a common baseline treatment is taken into account by assuming true treatment effects \({\delta}_{i\left({bk}_n\right)}\) follow a common multivariate normal distribution which can be represented as series of univariate conditional distributions as follows:
$${\delta}_{i\left({bk}_1\right)}\sim \mathrm{Normal}\left({d}_{\left({bk}_1\right)},\kern0.75em {\sigma}^2\right)$$
(4)
$${\delta}_{i\left({bk}_n\right)}\mid \left(\begin{array}{c}{\delta}_{i\left({bk}_1\right)}\\ {}\begin{array}{c}\vdots \\ {}{\delta}_{i\left({bk}_{n-1}\right)}\end{array}\end{array}\right)\sim \mathrm{Normal}\left({d}_{\left({bk}_n\right)}+\frac{1}{n}\sum_{t=1}^{n-1}\left({\delta}_{i\left({bk}_t\right)}-{d}_{\left({bk}_t\right)}\right),\frac{\left(n+1\right)}{2n}{\sigma}^2\right)$$
(5)
where n = 2, …, p in the (p + 1)-arm study of p treatment effect estimates relative to the reference treatment.
Naïve pooling approach
The above NMA model was initially used to combine data from RCTs with RWE by including the observational studies at ‘face-value’. Data from all studies, regardless of the study design, were combined in the NMA described above.
This model was then extended to account for the differences between the designs of the studies as described in the following sections.
Power prior approach
To take into account the differences in study design between RCTs and observational studies, a ‘power transform prior’ approach was adopted [7]. This approach allows down-weighting of the RWE, thus making the data from this type of studies contribute less compared to data obtained from the RCTs. This is achieved by introducing a down-weighting factor, alpha (α), which the likelihood contribution of the RWE studies is raised to the power of. Alpha (α) is then varied between zero and one, with zero meaning that RWE is entirely discounted in the NMA, and with one indicating that all RWE is considered at ‘face-value’, which is assumed to be the same for each RWE study included in the network. The impact of different levels of weighting on the results of the NMA is performed by considering a series of values for alpha. The results are then summarised both in terms of the effect estimates (and their associated level of uncertainty) and the rankings that the treatments received (based on these effect estimates).
Considering the annualised relapse rate ratio (ARRR) and assuming δ = log(ARRR), the overall joint posterior distribution is given by,
$$P\left(\delta | RCT, RWE\right)\propto L\left(\delta | RCT\right)\times L{\left(\delta | RWE\right)}^{\alpha }P\left(\delta \right)$$
(6)
where L(θ| Y) is the likelihood of θ given data Y. Assuming a standard random-effects NMA model, we combine the likelihood contribution of RWE, raised to the power of alpha, with the likelihood of the RCT data. Together with the prior distributions for the basic parameters, this gives the overall posterior distribution with RWE discounted by the parameter alpha. Assuming that the number of relapses follow a Poisson distribution, the RWE log likelihood (LL) in (1) becomes
$${LL}_{ikh}={\log}{\left(\frac{{\gamma_{ikh}}^{r_{ikh}}{e}^{-{\gamma}_{ikh}}}{r_{ikh}!}\right)}^{\alpha_h}$$
(7)
$${LL}_{ikh}={\alpha}_h\left({r}_{ikh}\log \left({\gamma}_{ikh}\right)-{\gamma}_{ikh}-\log \left({r}_{ikh}!\right)\right)$$
(8)
where h indexes the different values of α.
Hierarchical model approach
An alternative approach to allowing differentiation between study designs in NMA is introducing another level in a Bayesian hierarchical model, modelling the between-study heterogeneity of treatment effects within each study design (RCT or RWE) and across study designs. The hierarchical model by Schmitz et al. (2012) was adapted to model count data using a Poisson distribution. Assuming j = 1, 2 where 1 represents the RCT data and 2 represents the RWE then Eq. (1) now becomes,
$${r}_{ik}^j\sim Poisson\left({\gamma}_{ik}^j{E}_{ik}^j\right)$$
(9)
And, similarly as in the general NMA model, using the log link function Eq. (2) becomes
$$\log \left({\gamma}_{ik}^j\right)={\upmu}_{ik}^j+{\delta}_{ibk}^j{I}_{k\ne b}$$
(10)
The data from the two sources of evidence, RCT data and RWE data, are modelled separately at the within-study and within-design level. Similarly, as in Schmitz et al. (2013) assuming the treatment effects from RCT and RWE evidence are exchangeable, the study designs pecific estimates are combined to estimate an overall measure of treatment effect using random-effects [9]. Thus, if \({\delta}_{ibk}^1\) and \({\delta}_{ibk}^2\) represent the treatment effect of treatment k against a reference treatment b, based on the RCT evidence and RWE respectively, then,
$${\delta}_{ibk}^1\sim N\left({d}_{bk},{\sigma}^2\right)$$
(11)
$${\delta}_{ibk}^2\sim N\left({d}_{bk},{\sigma}^2\right)$$
(12)
where dbk is the mean treatment effect of treatment k compared to a reference treatment b and σ2 is the variance representing the between-studies heterogeneity. Prior distributions need to be placed on the parameters of the model, for example, the following “vague” prior distributions:
$${d}_{bk}\sim Uniform\left(-10,10\right)$$
$$\sigma \sim Uniform\left(0,2\right)$$
This model was further extended by adopting a power prior approach at the within-study level for RWE (level one) by down weighting the likelihood contribution of the RWE by the factor alpha, as in Eq. (6), in the hierarchal model in order to provide a further sensitivity analysis. Combining average underlying study effects \({\delta}_{ibk}^1\) from RCTs with down-weighted effects \({\delta}_{ibk}^2\) from RWE produces an overall pooled ARRR combined effects dbk.
Implementation and model fit
All models were implemented in WinBUGS version 1.4.3 [13]. The first 10,000 simulations were discarded for all models as a burn-in. The main analyses were based on additional 20,000 iterations in order to ensure convergence. Convergence was investigated by visually inspecting the trace and history plots. Model fit was evaluated using the total residual deviance and the DIC for each network size [6]. Between-study heterogeneity was assessed using the standard deviation across random-effects models. Inconsistency was assessed by assessing residual deviance and performing node splitting analysis [14].