Multivariate network meta-analysis incorporating class effects

Background Network meta-analysis synthesises data from a number of clinical trials in order to assess the comparative efficacy of multiple healthcare interventions in similar patient populations. In situations where clinical trial data are heterogeneously reported i.e. data are missing for one or more outcomes of interest, synthesising such data can lead to disconnected networks of evidence, increased uncertainty, and potentially biased estimates which can have severe implications for decision-making. To overcome this issue, strength can be borrowed between outcomes of interest in multivariate network meta-analyses. Furthermore, in situations where there are relatively few trials informing each treatment comparison, there is a potential issue with the sparsity of data in the treatment networks, which can lead to substantial parameter uncertainty. A multivariate network meta-analysis approach can be further extended to borrow strength between interventions of the same class using hierarchical models. Methods We extend the trivariate network meta-analysis model to incorporate the exchangeability between treatment effects belonging to the same class of intervention to increase precision in treatment effect estimates. We further incorporate a missing data framework to estimate uncertainty in trials that did not report measures of variability in order to maximise the use of all available information for healthcare decision-making. The methods are applied to a motivating dataset in overactive bladder syndrome. The outcomes of interest were mean change from baseline in incontinence, voiding and urgency episodes. All models were fitted using Bayesian Markov Chain Monte Carlo (MCMC) methods in WinBUGS. Results All models (univariate, multivariate, and multivariate models incorporating class effects) produced similar point estimates for all treatment effects. Incorporating class effects in multivariate models often increased precision in treatment effect estimates. Conclusions Multivariate network meta-analysis incorporating class effects allowed for the comparison of all interventions across all outcome measures to ameliorate the potential impact of outcome reporting bias, and further borrowed strength between interventions belonging to the same class of treatment to increase the precision in treatment effect estimates for healthcare policy and decision-making.


Background
In a health technology assessment (HTA) setting, it is imperative that decisions regarding healthcare policy are formulated with consideration to all outcomes associated with the decision question, using all relevant evidence, and appropriately taking uncertainty in to account. In many healthcare conditions, the effects of new interventions are reported on multiple outcomes and these effects are correlated. Taking in to account the dependence between correlated treatment effects, as well as heterogeneous reporting of treatment effects, may have a substantial impact on the ability to appropriately estimate cost-effectiveness for new interventions [1], especially in terms of capturing uncertainty.
Network meta-analysis (NMA) is widely used in an evidence synthesis setting due to the attractive nature of utilizing all relevant information from both direct evidence (obtained from head-to-head trials) and indirect evidence (obtained from trials that share a common treatment comparator) [2][3][4]. Multivariate network meta-analysis (MVNMA) builds on this framework and has the ability to simultaneously model treatment effects for multiple outcomes, as well as estimate the correlation between them. Consequently, for the studies that did not report certain outcomes of interest, it is possible to obtain a predictive value for missing data by modelling the treatment effects on multiple outcomes jointly, taking into account the correlation between them. This methodology can not only increase the information for decision making, but it can also limit the potential for outcome reporting bias in the original trials [5,6], and consequently increase precision and confidence in the treatment effect estimates [7].
There are two types of correlations that need to be incorporated in the estimation of treatment effect estimates for multiple outcomes -one at the between-study level, and another at the within-study level. Betweenstudy correlations occur due to variability betweenstudies. Between-study variability occurs in situations with which studies have a different distribution of potential treatment effect modifiers, such as differences in patient characteristics, trial designs, and baseline patient severity. Within-study correlations occur between treatment effects on multiple outcomes within a trial, and are a consequence of differences in patient-level characteristics. These correlations are indicative of the association between treatment effects measured on multiple outcomes on the same patients within a study. Withinstudy correlations are often difficult to estimate and they are seldom reported in clinical trials. Thus, estimation is often required using individual participant data [8], for example by bootstrapping [9], or elicited from expert opinion [10].
The ability to simultaneously model treatment effects on multiple outcomes in a multivariate analysis is an appealing feature in many HTA settings, as interest frequently lies in multiple and correlated outcome measures [11]. In the last 20 years, evidence synthesis methods have witnessed a rapid increase in methodological developments and applications of multivariate analyses to assess interventions with two or more outcomes of interest [9,[11][12][13][14][15][16][17][18][19]. It is often desirable to account for the correlation between treatment effects on multiple outcomes in a meta-analysis framework as this has the ability to borrow strength between all reported outcomes and studies in order to inform treatment effect estimates [9]. Such an approach is commonly referred to as multivariate meta-analysis. In more recent developments, multivariate meta-analyses have been extended to incorporate multiple treatment comparisons or network meta-analyses [7,10,20]. Ades et al. [20] simultaneously modelled mutually exclusive, competing risk outcomes, using a multinomial likelihood whereby the within-study correlations were accounted for but the between-study correlations were assumed to be zero. Efthimiou et al. [10] proposed a MVNMA model that accounts for both the betweenand within-study correlations of binary outcomes. Specifically, Efthimiou et al. [10] incorporated the within-study correlations at the study-specific treatment contrast level, which can be problematic when incorporating multi-arm trials. Achana et al. [7] used a more natural modelling approach for arm-level data whereby the within-study correlations were incorporated at the treatment-arm level. Using this approach, Achana et al. [7] considered the treatment arms to be independent as a consequence of randomisation, which greatly eases computation of the likelihood for multi-arm trials. Furthermore, Achana et al. [7] developed this methodology to borrow information across outcomes in order to predict an estimate for missing data. This methodology allows disconnected interventions to be incorporated in to the analyses if they belong to a connected network for one or more additional outcomes; thereby, allowing all interventions to be evaluated across all outcome measures.
In situations with which there are a large number of interventions in the treatment networks and relatively few trials informing each treatment comparison, there is a potential issue with sparsity of data, which can lead to substantial parameter uncertainty. Collapsing the intervention arms into their respective treatment classes, also known as "lumping" interventions, increases the evidence base and precision in the effect estimates, but with such a class-based approach, the direct interpretation of individual intervention effects (especially those of dose or formulation effects) are lost, which can hinder decisionmaking. To overcome this issue, exchangeability between the same interventions, but with different formulations and/or treatment regimes can be incorporated in a threelevel hierarchical NMA to borrow information within the classes of interventions, strengthening inferences, and potentially reducing the uncertainty around the individual intervention effects [21].
Furthermore, in situations where clinical trial data are heterogeneously reported and/or missing, synthesising such data can be problematic. Meta-analysis methods require specification of both a measure of effect and a measure of variability in the original trial reports. In the absence of such data, trials are often excluded from the analysis [22]. In a decision making context, missing data can have a detrimental impact on the overall decision, as many of the interventions of interest have to be excluded from the analyses. For this reason, when meta-analysing data, it is good practice to estimate missing standard errors for all models, based on additional measures of uncertainty [23].
This paper extends the methodology of Achana et al. [7] to incorporate the exchangeability between interventions of the same class [21], and further accounts for correlations between baseline and follow-up for missing measures of variability. This approach makes use of all available data by borrowing strength within classes of interventions as well as across outcomes. Borrowing information in this way has the potential to increase the precision in treatment effect estimates, as well as allowing for the comparison of all interventions across all outcome measures, to aid decision making. The remainder of this article is set out as follows: the first section describes the motivating example in overactive bladder (OAB). This is followed by a description of the multivariate network meta-analysis methods, model developments, and model estimation, before applying these methods to the motivating dataset and presenting the results. The article concludes with a discussion and final remarks.

Motivating dataset
The cardinal symptoms of OAB syndrome are urinary incontinence, increased daytime frequency (or voiding), and urgency. A gold standard tool for assessing the quantitative measure of OAB symptoms are patient-reported bladder diaries whereby individuals with OAB record the frequency and severity of OAB symptoms on a daily basis [24]. These patient-reported bladder diaries are often used as the primary outcome in trials assessing new interventions for OAB. However, many of the key symptoms are often under-reported. For example, in the current literature urgency is defined as "the cardinal symptom" of OAB [25], but there are far fewer studies evaluating interventions for urgency (n=62) compared to incontinence (n=117) and voiding episodes (n=124). This poses several limitations for decision makers; most notably, it is difficult to estimate both clinical and costeffectiveness of interventions across the entire symptom syndrome.
In this motivating dataset, interest lies in evaluating the change from baseline for several symptomatic responses. Whilst symptoms at baseline and follow-up are commonly reported, change from baseline is often unreported; and thus, whilst it is possible to calculate the mean change from baseline, the variance is often missing. Table 1 illustrates the different scenarios for trial reporting in the included studies. In this example, there were no trials that reported follow-up data alone.
In total, 143 studies reported one of the three cardinal symptoms of OAB. Of these, only 51 (36%) reported treatment effects for all 3 outcomes. A total of 117, 124, and 62 trials evaluated 101, 108, and 58 different interventions for incontinence, voiding, and urgency episodes, respectively. Figures 1, 2 and 3 illustrate the networks of evidence for incontinence, voiding and urgency episodes, with corresponding treatment codes given in Additional file 1. Nearly one third of all studies did not report measures of variability in the mean treatment effects. In total, 44 (38%), 42 (34%), and 18 (29%) studies solely reported mean effects and gave no measure of uncertainty or variability for incontinence, voiding, and urgency episodes, respectively.
For analyses incorporating class effects, interventions were grouped according to expert clinical opinion (DGT). Figure 4 demonstrates the treatment classification of each of the individual interventions, where the central node represents the classes of treatments and the linked arms represent each of the individual interventions within those classes.

Methods
In this section, we begin by recalling the MVNMA model described by Achana et al. [7] (Model 1) and develop this framework further to incorporate the exchangeability between interventions of the same class of interventions (Model 2). We then describe a missing data framework to incorporate studies with missing measures of uncertainty or variability.

Model 1: multivariate network meta-analysis
Following the random effects MVNMA described by Achana et al. [7], let Y ij = (y ij1 ,y ij2 ,y ij3 ), be the observed vector of effects for intervention j of the i th study (i = 1, 2, ..., n s ) for each of the outcomes of interest (1 = urinary incontinence, 2 = voiding frequency, 3 = urgency episodes). Let Y ij follow a multivariate normal (MVN) distribution such that: where θ ij is a vector of true treatment effects, S ij is the treatment-specific within-study covariance matrix assumed known, μ i is a vector of true baseline effects in study i with baseline treatment b, and δ i,bk is a vector of treatment-specific effects in the k th arm relative to the baseline treatment in arm 1 of study i. The elements of S ij are expressed as: ρ w (12) se ij (1) se ij (2) ρ w (13) se ij(1) se ij (3) ρ w (12) se ij (1) se ij (2) se 2 ij (2) ρ w (23) se ij (2) se ij (3) ρ w (13) se ij(1) se ij (3) ρ w (23) se ij (2) where se ij(l) denotes the observed standard errors of intervention j for outcome, l = 1, 2, 3, and ρ w (qr) denotes the within-study correlations, for q = 1, 2 and r = 2, 3. For 2-arm trials, the treatment effect differences of δ i,bk are assumed to be normally distributed and correlated: where d t ib t ik (l) represents the pooled effect of the treatment in arm k relative to the treatment in arm b in study i for each outcome, l. is the between-study covariance matrix under an assumption of homogeneous betweenstudy variances [2] and correlations across treatment contrasts [7]. This notation can easily be extended to account for multi-arm trials [26], which is further described in Additional file 2.
The relative effect of the study-specific reference treatment in arm 1 (the control arm) relative to itself for outcome l, δ i,b1(l) , is set to 0 and as such the set of conditional univariate distributions begin with the relative effect of the intervention in arm 2 relative to the control arm, δ i,b2(l) . The study-specific treatment comparisons, δ i,b(k−1)(l) , are expressed in terms of the basic parameters of the pooled treatment effects for the intervention in arm k, d 1,t ik (l) and the basic parameters of the pooled treatment effects for the intervention in the reference treatment arm, d 1,t ib (l) . For example, for trials that compare interventions A and B for outcome l, the pooled treatment effect, d t ib t ik (l) = d AB(l) , is given by: For d AB where A > 1 and B > 1, the relative effects are expressed in terms of the basic parameters described in Eq. (5). The intervention effect of the reference treatment for the entire treatment network, j = 1(l), usually a placebo or control intervention, is set to 0 for every outcome l, such that d 11(l) = 0. In order to predict treatment effect estimates for missing data for trials that did not report all outcomes of interest, following Achana et al. [7], we assumed that the pooled effects of intervention j = 2, ..., n t relative to a reference treatment, d 1j(l) , for outcome, l, can be expressed as a sum of treatment-specific effects, α j , and outcome-specific effect, γ l , such that: The parameter ζ indicates the deviation of treatment effect profiles across outcomes. If ζ was close to zero, this would indicate a high degree of similarity between outcomes. In situations where ζ was particularly large, this would indicate a substantial deviation between treatment effect profiles across outcomes. Non-informative prior distributions were specified for μ il , α j ∼ Normal(0, 10 3 ), σ l ∼ Uniform(0,2), and ζ ∼ Uniform(0, 2), for l = 1, 2, 3. The spherical parameterisation technique based on Cholesky decomposition [27] was used to express the between-studies variance-covariance matrix, , and discussed in more detail in Additional file 3.

Model 2: multivariate network meta-analysis incorporating class effects
To utilise the additional similarity between the same interventions with different regimes, Model 1 was developed further to incorporate class effects. For study i evaluating intervention j belonging to class m, Y ij m = (y ij m 1 ,y ij m 2 ,y ij m 3 ), denotes the observed vector of effects for each of the outcomes of interest, and expressed in the following way: where μ i and δ i,bk have the same interpretation as in Model 1. The parameter θ ij m , represents a vector of true treatment effects, and S ij m represents the withinstudy covariance matrix, for intervention j belonging to a broader class of interventions m, for study i, such that: As in Model 1, the elements of δ i,bk for 2-arm trials are assumed to be drawn from a multivariate normal distribution as described in Eq. 3 (and generalised to multi-arm trials as described in Additional file 2). Here δ i,bk is expressed in terms of the pooled effect, d t ib t ik (l), of the treatment in arm k relative to the treatment in arm b in study i for each outcome, l. For example, when we incorporate class effects, the pooled treatment effect of treatment A belonging to class c, relative to the pooled treatment effect of treatment B belonging to class g, d A c B g (l) , is given by: The intervention effect of the reference treatment for the entire treatment network, j = 1 1 (l), usually a placebo or control intervention, is set to 0 for every outcome l, such that d 1 1 1 1 (l) = 0. The basic parameters for relative treatment effects, d 1 1 j m (l) , of intervention j within class m, relative to the reference treatment, were assumed to follow a normal distribution with mean equal to the treatment-specific effect, α j m , plus the outcome-specific effect, γ l , and variance, ζ 2 : where γ l and ζ 2 have the same interpretation as Model 1.
In order to incorporate the exchangeability between treatment-specific effects, α j m , within class m, α j m was assumed to follow a normal distribution with mean equal to the pooled effect estimate for the m th class of interventions, β m , with class specific between-intervention variance, ν 2 m , [21] such that: Non-informative prior distributions were specified for μ il , σ l ∼ Uniform(0,2) for l = 1, 2, 3, ζ ∼ Uniform(0, 2), β m ∼ Normal(0, 10 3 ) and ν m ∼ Uniform(0, 2). The spherical parameterisation technique based on Cholesky decomposition [27] was used to construct a prior distribution for the between-study variance-covariance matrix, , and discussed in more detail in Additional file 3. In all multivariate analyses, prior distributions for the variance parameters (i.e. σ l , ζ , and ν m ) were restricted to a Uniform(0,2) distribution on the standard deviation scale. For example, a value of 2 for the class-specific betweenintervention variance, ν m , suggests that for a random pair of interventions, the difference in the mean change from baseline could be as large as 2.2 events on average. Uniform(0,2) prior distributions were considered for variance components in multivariate analyses in order to aid computation of the variance-covariance matrix. However, there is an argument to suggest that the variance parameters in a Bayesian model can be decomposed into the sum of several other variance components [28]. Thus, in hierarchical NMAs with additional variance components at the class-level, more informative prior distributions for variance parameters may be reasonable.

Missing data framework
In situations where the observed standard errors of treatment effects, se ij , were not reported, but baseline and follow-up variances were available (scenario 5 of Table 1), the correlation, ξ , between variance at baseline, sd baseline ij 2 , and follow-up, sd followup ij 2 , were used to impute estimates of the variance for change from baseline, sd change ij 2 . This is calculated as: Using external information from trials that report all variance terms (scenario 1 of Table 1), an informative prior distribution was placed on the correlation, ξ , using Fisher's Z-transformation [23]. For trials that do not report the variability at follow-up (scenario 6 of Table 1), a linear predictor with baseline variance as a covariate was included such that: where υ represents a constant term, and λ the regression coefficient. Trials that did not provide a measure of variability (scenario 7 of Table 1) were excluded from the analyses.

Model estimation
All models were estimated using Markov Chain Monte Carlo implemented in WinBUGS 1.4.3 [29]. Example Win-BUGS code for the MVNMA incorporating class effects and the missing data framework is given in Additional file 4. Samples were collected for 150,000 MCMC iterations with the first 10,000 iterations discarded in the form of a 'burn-in. ' Convergence plots were assessed for a random sample of parameters of interest including treatment effect estimates and between-study standard deviations [30]. Brooks-Gelman-Rubin statistics, autocorrelation, history, and trace plots were used to detect non-convergence for three individual MCMC chains with disparate starting values. Sensitivity analyses were undertaken to assess the impact of the choice of prior distributions for variance parameter. For both variance parameters, two alternative distributions were considered. For prior specification of the elements of V 1/2 , (1) Gamma(0.001,0.001) on the precision scale, and (2) Halfnormal(0,1) on the standard deviation scale were considered. For prior specification of ζ , (1) Gamma(0.01,0.01) on the precision scale, and (2) Half-normal(0,1) on the standard deviation scale were considered.

Multivariate network meta-analysis
A multivariate approach allowed for the inclusion of 143 studies evaluating 115 interventions for OAB across all 3 outcomes. Previously, univariate analyses of urinary incontinence, voiding frequency and urgency episodes included 115, 119, and 60 studies evaluating 97, 100, and 54 interventions, respectively. Results for univariate analyses are given in Additional file 5 for completeness. Figure 5 illustrates the network of evidence for MVNMA evaluating incontinence, voiding, and urgency episodes. Incorporating a multivariate approach increased the ability to include more interventions for all three outcomes, compared to that of univariate analyses displayed in Figs. 1, 2 and 3. This was particularly apparent for urgency episodes, with which all interventions evaluated for incontinence and voiding could now be evaluated for urgency. Furthermore, for interventions that were evaluated for urgency in the original trials but disconnected from the univariate network of evidence (Fig. 3), e.g. reflexology (71), a MVNMA could borrow information between outcomes for which they were connected, in order to complete the network of evidence. However, eight treatments remained disconnected from the multivariate network of evidence: darifenacin 7.5-15mg once daily + BT (88), tolterodine + BT (94), darifenacin ER 7.5-15mg once daily (104), tolterodine (105), lidocaine gel 2x6ml (129), emepronium bromide immediate release 200mg three times a day (130), sacral nerve stimulation + tolterodine extended release 2mg once daily (136), tolterodine extended release 2mg once daily (137). These interventions were disconnected in all networks of evidence for univariate NMAs evaluating each of the outcomes of interest; and thus, borrowing information between outcomes had little impact on the inclusion of these particular interventions in multivariate analyses. Table 2 displays the estimated posterior median reduction and 95% credible intervals for change from baseline in incontinence, voiding, and urgency episodes. Treatment effect estimates were first ranked according to their effectiveness in reducing incontinence episodes, then by voiding frequency, and finally by urgency episodes. Sacral nerve stimulation appeared to be the most effective intervention for the management of urinary incontinence, voiding and urgency with an estimated posterior median reduction of -7.43 (95%CrI: -9.59,-4.73), -7.64 (95%CrI: -9.82,-4.91) and -7.96 (95%CrI: -10.19,-5.29) episodes, relative to placebo, respectively.
Adopting a multivariate approach changed the overall clinical decision for urgency episodes. In univariate analyses, electrostimulation with vaginal oestrogen cream 1.25mg/day appeared to be the most effective intervention for reducing urgency (see Additional file 5). Accounting for the correlation between outcomes using MVNMA, sacral nerve stimulation appeared to be the most promising intervention overall. Table 3 displays the estimated posterior median reduction and 95% credible intervals for change from baseline in incontinence, voiding, and urgency episodes obtained from multivariate network meta-analysis incorporating class effects. Sacral nerve stimulation appeared to be the most effective intervention for reducing incontinence, voiding and urgency episodes with an estimated posterior median reduction of -8 (95%CrI: -9.54, -6.27), -8.19 (95%CrI: -9.69,-6.49) and -8.49(95%CrI: -10.11,-6.78) episodes, relative to placebo, respectively.

Multivariate network meta-analysis incorporating class effects
Incorporating class effects in MVNMAs broadly increased the precision in treatment effect estimates (Table 3) compared to MVNMAs without incorporating class effects (Table 2). For example, for sacral nerve stimulation incorporating a hierarchical structure further increased precision in the treatment effect estimates by approximately 33%, 35%, and 28% for urinary incontinence, voiding and urgency episodes, respectively.
Borrowing information between outcomes generally increased the precision in treatment effect estimates compared to univariate analyses (Figs. 6, 7, and 8). This finding was particularly apparent for sacral nerve stimulation, with which a multivariate hierarchical approach incorporating class effects increased precision in the estimated treatment effects by approximately 60% and 160% for urinary incontinence (Fig. 6) and voiding frequency ( Fig. 7) respectively, compared to treatment effect estimates obtained from univariate NMAs (see additional  Figure 9 illustrates the treatment profiles for all of the cardinal symptoms of OAB. Use of MVNMA allowed for the comparison of all interventions across all outcome measures, and thus completed the intervention profiles for efficacy outcomes. Generally, the treatment rankings were broadly similar to those obtained from univariate analyses (see Additional file 6). Sacral nerve stimulation appeared to be the most effective intervention across all three outcomes. Using multivariate analyses, estriol 1mg intravesically appeared to be amongst the top ten interventions (Fig. 9). Previously, estriol 1mg intravesically was only evaluated for voiding, and ranked amongst the top interventions       . 6 Comparison of the estimated posterior median difference (and 95% credible intervals) in change from baseline in incontinence episodes relative to placebo between individual-intervention, hierarchical, and multivariate hierarchical NMA models for the top 10 interventions (see Additional file 5). Borrowing information between outcomes allowed for estimation of treatment effects for both urinary incontinence and urgency episodes, and consequently estriol 1mg intravesically was ranked in tenth place across all outcome measures. However, the remaining top ten interventions remained unchanged.

Convergence diagnostics
Convergence diagnostic plots for a small selection of basic parameters, d 1j , the pooled effect estimate of intervention j relative to placebo, are given in Additional file 7. Both the between and within chain variability appeared to reach stability, and the ratio appeared to converge to 1. All plots appeared to show reducing autocorrelation with increased Fig. 7 Comparison of the estimated posterior median difference (and 95% credible intervals) in change from baseline in voiding episodes relative to placebo between individual-intervention, hierarchical, and multivariate hierarchical NMA models for the top 10 interventions lag. History and trace plots took the appearance of random noise, with no obvious difference between multiple MCMC chains with very different starting values. Overall all diagnostic plots appeared to suggest that there was no evidence of non-convergence and both d 1j and ζ were estimated from samples which appeared to have a reasonable degree of mixing of the chains.

Sensitivity analysis
Sensitivity analyses assessing different choices of prior distributions for the between-study standard deviations, the elements of the matrix V 1/2 , and the variability of treatment effect profiles across outcomes, ζ , are given in Additional file 8. Different choices of prior distributions for variance parameters had very little impact in Fig. 8 Comparison of the estimated posterior median difference (and 95% credible intervals) in change from baseline in urgency episodes relative to placebo between individual-intervention, hierarchical, and multivariate hierarchical NMA models for the top 10 interventions the treatment effect estimates for all three outcomes, and therefore the overall clinical decisions regarding interventions effectiveness remained the same. Thus suggesting that treatment effect estimates were robust to choice of prior distribution.

Discussion
This paper extends the MVNMA framework described by Achana et al. [7], to incorporate the exchangeability of interventions belonging to the same class of interventions. This approach makes use of the correlations between multiple outcomes in order to predict and impute treatment effect estimates for missing data, and therefore, has the potential to limit the impact of outcome reporting bias, as well as, borrows strength between interventions belonging to the same class of treatment, which has the potential to increase precision in the treatment effect estimates. In the OAB example, the datasets used in univariate analyses for each outcome independently, included 115, 119 and 60 studies evaluating 97, 100, and 54 interventions for incontinence, voiding, and urgency episodes, respectively. Despite urgency being documented as "the cardinal symptom" of OAB [25], it was sufficiently underreported in the original trials, and consequently fewer interventions were able to be evaluated in univariate analyses. Adopting a multivariate approach, we were able to borrow information across outcomes and consequently included 143 studies evaluating all 115 interventions for the management of incontinence, voiding and urgency. Using this methodology completed the treatment profiles for all prominent symptoms of OAB, which in turn allows decision makers to make inferences regarding the potential treatment benefit for all interventions, across all salient outcomes. In doing so, sacral nerve stimulation appeared to be the most effective intervention for reducing incontinence, voiding and urgency episodes with an estimated posterior median reduction of -8 (95%CrI: -9.54, -6.27), -8.19 (95%CrI: -9.69,-6.49) and -8.49(95%CrI: -10.11,-6.78) episodes, relative to placebo, respectively. However, due to the limited number of studies and participants in which sacral nerve stimulation was evaluated, these results should be interpreted with caution.
Sacral nerve stimulation was not evaluated for urgency in the original trials and thus could not be assessed in univariate analyses. Using a multivariate approach therefore changed the overall clinical decision for the management of urgency, where electrostimulation in combination with vaginal oestrogen cream 1.25mg/day was found to be the most effective intervention from univariate analyses. Similarly, from univariate analyses, estriol 1mg intravesically appeared to be a promising intervention for voiding, but there was no data for all other outcomes. Using a multivariate approach, estriol 1mg intravesically ranked in the top 10 interventions for all three cardinal symptoms of OAB.
A key assumption of MVNMA is that the data is multivariately normally distributed. In situations with which we wish to synthesise multiple binomial outcomes, or a mixture of binomial and continuous outcomes, it is common to assume a normal approximation of binomial data on the log odds ratio (or log odds) scale [9]. This assumption has been shown to hold well when the proportions of events are close to 0.5, but poorly when the proportions of events are close to 0 or 1 [31]. In situations such as these more sophisticated methods, such as copula models, are required to appropriately capture the within-study variability using binomial data directly [32].
In this paper, we describe a missing data framework to incorporate estimates of uncertainty for trials that did not report any measure of variability in the mean treatment effects. It could be argued that trials reporting incomplete outcome data are at a higher risk of bias [33], and the impact of including such trials in meta-analyses should be thoroughly explored in sensitivity analyses.
A further assumption of MVNMA is that all data are assumed to be missing at random [17], which in the case of the OAB example may not be plausible [34]. It is likely that there is an element of selective reporting in the original trials, such that outcomes with which interventions perform particularly well are more likely to be reported [35]. In this situation, treatment effects may be exaggerated [17], though recent simulation studies suggest that a multivariate meta-analysis can lead to a more appropriate estimate of treatment effect in the presence of outcome reporting bias [5,6] under a variety of missing data scenarios, including missing at random and missing not at random [6]. In order to obtain more accurate estimates of treatment effects for decision making, data are needed for all interventions, across all outcome measures. Following the Core Outcome Measures in Effectiveness Trials (COMET) initiative [36], there is a clear need to define a core outcome set (COS) for the future reporting of OAB trials [34].
Previous simulation studies in multivariate metaanalysis have shown that in situations when there are a large proportion of missing data, a multivariate approach results in increased borrowing of information and increased precision in treatment effect estimates [6,16]. Thus, multivariate methods are of most use in scenarios with a larger proportion of missing data, and less use in scenarios with complete data. In a Bayesian framework, with non-informative prior distributions specified for hyper-parameters (as described in this paper), it has previously been shown that the total number of missing values must not exceed ((J − 1) × N) − ((J − 1) + N), for J interventions and N outcomes. In situations with which the total number of missing values exceeds this number, the hyper-parameters will become unidentifiable and informative prior distributions will be required to improve model convergence [7].
To ameliorate the impact of outcome reporting bias, the correlation between outcomes were used to obtain a predictive value for missing data. In order to achieve this, an assumption of constant relative effectiveness across outcomes was assumed for the basic parameters of the pooled treatment effect estimates, d (1j)l and d (1 1 j m )l , as described in Eqs. (6) and (10) for MVNMA and MVNMA incorporating class effects, respectively. If interest lies in the difference between active interventions, this may be a strong assumption as the outcome-specific effect, γ l , will cancel. For example, the relative treatment effects of intervention A relative to intervention B obtained from MVNMA are expressed in terms of the basic parameters such that d AB = (d (1B)l − d (1A)l ) ∼ Normal(α B − α A , 2ζ 2 ). Thus, in these situations, alternative methods should be explored. An assumption of constant relative effectiveness across outcomes is of less importance if interest lies in the relative rankings of the interventions, as these are calculated based on the basic parameters, d (1j)l .
As with all random effects models, in situations when the deviation of treatment effect profiles across outcomes, ζ , is large, it may not be sensible to combine data across all outcomes. In this scenario, we encourage the use of sensitivity analyses to; i) assess the impact of potentially outlying observations in univariate and multivariate analyses, and ii) assess the difference in mean treatment effects, and variability in mean treatment effects across outcomes, using a series of univariate analyses. Reducing the model to a bivariate or univariate analysis in sensitivity analyses will allow investigation of the impact of outcomes on the magnitude of ζ , as well as the influence of missing outcome data on the robustness of the overall conclusions.
In this example, the outcome-specific effect, γ l , was assumed to be constant across all interventions. Further work could extend this model to incorporate the exchangeability of outcome-specific effects within classes of interventions, such that γ l m ∼ Normal(κ m , 2 m ), where κ m denotes the pooled outcome effect for the m th class of interventions, and 2 m denotes the class-specific between intervention variance. However this approach is likely to substantially increase the number of parameters to be estimated in the model and could lead to computational difficulties. Furthermore, if there is evidence to suggest that there is a disparity in outcome-specific effects between classes, the assumption of homogeneous between-study correlations is also unlikely to be satisfied and alternative parameterisations of the between-study covariance matrix would need to be considered.
One limitation of implementing MVNMA in Win-BUGS, is the difficultly in calculating deviance statistics for the assessment of model fit and comparison. Residual deviance could be calculated by monitoring the estimated true treatment effects, θ ij , for each study i = 1, ..., n s and intervention j = 1, ..., n t , and calculating the difference of these true treatment effects relative to the observed treatment effects, Y ij , using the following equation: total residual deviance = i=n s ,j=n t i=1,j=1 (Y ij − θ ij ) 2 S ij , where S ij denotes the treatment-specific within-study covariance matrix. In this paper, the within-study correlations were incorporated at the treatment-arm level in order to appropriately account for multi-arm trials [7], thus the withinstudy model was parametrised at the arm-level whereas the between-study model, with which θ ij is estimated, was parametrised at the study-level. This makes calculation of residual deviances more difficult, especially in the presence of large amounts of missing data. In the OAB example, there were a large number of observed data-points with intermittently missing data, therefore, such an approach would be both computational-, and time-intensive. Further work is needed to explore reparameterisation of MVNMA models, and alternative methods in order to adequately assess model fit and comparison.
In this example, use of a MVNMA incorporating class effects was illustrated using the three cardinal symptoms of OAB (incontinence, voiding, and urgency). However, interest lies in both efficacy and safety outcomes. Incorporating additional outcomes results in an exponential increase in the number of parameters to be estimated in the model. This is particularly true for estimation of the parameters involved in both the within-study, and between-study covariance matrix. This substantial increase in the number of parameters can often result in computational difficulties for complex multivariate models such MVNMAs. Furthermore, estimating the withinstudy correlation structures can be particularly difficult for mixed [9] and binary outcomes [18]. This is because an analytic solution is not possible [18]. A further limitation of using MVNMA for imputing missing data with mixed outcomes, is the assumption that intervention effects are exchangeable across outcomes. This assumption may not be reasonable if the outcomes differ in an important way, e.g, if the outcomes were measured on different scales. For example, binary outcomes on a log-odds scale and continuous outcomes on a mean difference scale will differ in terms of the uncertainty with which they were estimated.
In this analysis, a homogeneity of correlations assumption was used to simplify the number of parameters in the model and to aid computation. In situations with which there are fewer interventions, and more information for each treatment comparison, it may be desirable to incorporate treatment-specific between-study correlations. It may also be desirable to incorporate treatment-specific within-study correlations, however, data for every pair of treatment comparisons may be difficult to obtain, and thus within-study correlations may be difficult to estimate. In this example, the within-study correlations were estimated from individual patient data obtained from the RELAX trial [37]. Uncertainty in estimating the withinstudy correlations needs to be further accounted for. For continuous outcomes that follow a multivariate normal distribution, it would be possible to obtain estimates of the within-study correlations directly from the covariance matrix [9]. Estimates of the within-study correlations, together with their uncertainty, could be incorporated in to the MVNMA model by applying prior distributions to the within-study correlation parameters, pw xy , using a bootstrapping method [9].