Network meta-analysis of multiple outcome measures accounting for borrowing of information across outcomes

Background Network meta-analysis (NMA) enables simultaneous comparison of multiple treatments while preserving randomisation. When summarising evidence to inform an economic evaluation, it is important that the analysis accurately reflects the dependency structure within the data, as correlations between outcomes may have implication for estimating the net benefit associated with treatment. A multivariate NMA offers a framework for evaluating multiple treatments across multiple outcome measures while accounting for the correlation structure between outcomes. Methods The standard NMA model is extended to multiple outcome settings in two stages. In the first stage, information is borrowed across outcomes as well across studies through modelling the within-study and between-study correlation structure. In the second stage, we make use of the additional assumption that intervention effects are exchangeable between outcomes to predict effect estimates for all outcomes, including effect estimates on outcomes where evidence is either sparse or the treatment had not been considered by any one of the studies included in the analysis. We apply the methods to binary outcome data from a systematic review evaluating the effectiveness of nine home safety interventions on uptake of three poisoning prevention practices (safe storage of medicines, safe storage of other household products, and possession of poison centre control telephone number) in households with children. Analyses are conducted in WinBUGS using Markov Chain Monte Carlo (MCMC) simulations. Results Univariate and the first stage multivariate models produced broadly similar point estimates of intervention effects but the uncertainty around the multivariate estimates varied depending on the prior distribution specified for the between-study covariance structure. The second stage multivariate analyses produced more precise effect estimates while enabling intervention effects to be predicted for all outcomes, including intervention effects on outcomes not directly considered by the studies included in the analysis. Conclusions Accounting for the dependency between outcomes in a multivariate meta-analysis may or may not improve the precision of effect estimates from a network meta-analysis compared to analysing each outcome separately.


Background
Meta-analysis or the quantitative synthesis of evidence, usually from systematic reviews, has become a popular tool in healthcare evaluations [1,2]. Largely driven by a desire for more realistic synthesis of complex healthcare evidence, increasingly sophisticated methodology has been developed.
One area of meta-analysis that has seen significant methodological development is the application of multivariate statistical methods for the comparison of treatments on two or more endpoints (usually known as multivariate metaanalysis) [3][4][5][6][7][8]. These methods are appealing because many studies and systematic reviews focus on broad health effects and therefore typically report several outcome measures [4,6,9]. In such instances, the multivariate approach offers some advantages over separate univariate analyses including the ability to account for the inter-relationship between outcomes and borrow strength across studies as well as across outcomes [10] through modelling the correlation structure [7,11]. This can potentially reduce outcome reporting bias [12] and the uncertainty with which intervention effects are estimated. Additionally, in a decision making context where the synthesis is meant to inform a health economic evaluation, accounting for the correlations between effect estimates on different outcomes is important as the dependence between outcomes may have implication for estimating quality of life or economic consequences associated with treatment [13]. An example is the situation where a particularly effective treatment for a disease condition is associated with a large side effect profile. Ignoring information about the inter-relationships between beneficial and 'side effect' endpoints in such instances may have implications for quantifying the benefits associated with treatment.
When summarising effectiveness evidence, correlations between the effectiveness estimates typically arise at either within-study and or between-study levels. At the withinstudy level, correlations arise mainly due to differences in patient-level characteristics. They are rarely reported in the published literature and usually have to be estimated from external sources such as individual patient level data if available or elicited from expert opinion [8,11,14,15]. At the between-study level, correlations arise from i) differences in the distribution of patient-level characteristics across studies, in which case they will be related to the within-study correlations and/or ii) differences in the distribution of other study-level characteristics such as study design, population and baseline disease severity [16]. The within-study correlations thus give an indication of the association between multiple endpoints within a study while the between-study correlations indicate how the underlying true study-specific effects on different outcomes vary jointly across studies.
A second area of rapid methodological development is network meta-analysis (NMA) [17], also known as mixed treatment comparison meta-analysis [18][19][20] or multiple treatment meta-analysis [21][22][23]. NMA methods extend standard pairwise meta-analysis to enable simultaneous comparison of multiple treatments while maintaining randomisation of individual studies [18]. The method enables 'direct' evidence (i.e. evidence from studies directly comparing two interventions of interest) and 'indirect' evidence (i.e. evidence from studies that do not compare the two interventions directly) to be pooled under the assumption of evidence consistency [24]. Estimates of intervention effects can then be obtained, including effects between treatments not directly compared within any one individual study [19]. NMA methods thus provide a coherent framework for appraising all available evidence relevant to a specific decision problem. The results from such analyses are increasingly being used to inform economic evaluations in healthcare decision making where coherent decisions (about judicious use of scarce resource) need to be made based on sound appraisal of all available evidence.
Approaches to extend NMA methodology to multiple outcome settings have been proposed in the literature [13,[25][26][27], initially focusing on mutually exclusive competing risk outcomes [13] or a single outcome measured at multiple time points [26,28]. More recently, Efthimiou et al. [14] proposed a method for modelling multiple correlated outcomes in networks of evidence with binary outcome measures. The proposed method accounts for both the within-study and between-study correlation structure and includes a strategy for eliciting expert opinion to inform the within-study correlations. This paper contributes to the growing literature on the simultaneous evaluation of correlated outcomes. We do this in two stages. In the first stage (labelled as model 2 in the remainder of the paper), information is borrowed across studies as well as across outcomes through modelling the correlations between effectiveness estimates on different outcomes. In the second stage (labelled as model 3 in the remainder of the paper), additional information is borrowed across outcomes based on ideas for combining evidence across human and animal studies originally proposed by DuMouchel and Harris [29] and also revisited by Jones et al. [30]. The proposed second stage analysis methods allows: i) disconnected treatments to be incorporated as nodes in a network of evidence and ii) prediction of intervention effects for outcomes where evidence from primary studies is either sparse or not directly available from any one study included in the analysis. The motivating application area is injury prevention in children where a broad array of outcomes and intervention packages have been evaluated with the aim of increasing safety practices around the home (to ultimately reduce household injuries).
The remainder of this paper is structured as follows: the example dataset is first described followed by a Methods section describing the statistical models developed and implementation of the models. These are followed by sections presenting the results of applying the methods to the motivating dataset and a discussion.

Dataset
The example data comes from a recently updated Cochrane systematic review of home safety education and provision of safety equipment for injury prevention in children [31]. The models developed in this paper are applied to a subset of the review evidence relating to the prevention of poisoning injuries. Table 1 presents the data from 22 studies for the following outcomes: a) Safe storage of medicines b) Safe storage of other household products (e.g. cleaning products) and c) Possession of a poison control centre (PCC) telephone number. Thirteen of the 22 studies considered at least two of the three outcomes. Of these, 8 considered storage of medicines and storage of other household products, 2 considered storage of other household products and possession of a PCC telephone number, and 3 considered all three outcome measures. Individual patient data (IPD) were available for 8 of the 13 studies, of which 7 were in a format suitable for the analysis reported here as explained by the footnotes in Table 1.
We classified the interventions trialled in the 22 studies into 9 relatively homogenous treatment packages: 3) Education + provision of free/low cost equipment (E + FE) 4) Education + provision of free/low cost equipment + home safety inspection (E + FE + HSI) 5) Education + provision of free/low cost equipment + fitting of equipment (E + FE + F) 6) Education + home safety inspection (E + HSI) 7) Education + provision of free/low cost equipment + home safety inspection + fitting of equipment (E + FE + HSI + F) 8) Education + home visit (E + HV) 9) Provision of free/low cost equipment (FE). Figure 1 shows the comparisons between the interventions that were made by individual studies and the number of comparisons in each network. All studies compared 2 intervention strategies, except Babul et al. (2007) [33] which compared 3 strategies. Data on each outcome was not available for all interventions; i.e. for the storage of medicines and other household products outcomes, interventions E + HSI and E + HV were not investigated in any of the included studies, and for possession of a PCC number interventions, E + FE + F and FE were not available.

Methods
In this section, we first present the NMA statistical model for one binary outcome measure and then extend it to compare multiple interventions across multiple outcomes. Throughout the paper, we refer to these single and multiple outcome models as univariate and multivariate NMAs respectively. Where studies report multiple outcomes, these will not be independent as each household provides information on the different outcome measures within intervention arms. The multivariate model takes this correlation structure into account by allowing the intervention effects measured by one outcome to be correlated with the intervention effects measured by other outcomes.

Model 1: Univariate NMA
Given arm-level binary data of the form presented in Table 1, a random effects NMA may be specified using the method of Lu and Ades [20]. It is assumed that the occurrence of r ik events out of a total of n ik households in the kth-arm (k = A,B,C,….,) of the ith-study follow a binomial distribution with underlying event probability p ik : where μ ib is a study-specific baseline effect (i.e. the log-odds for the control group in study i with baseline treatment b), δ i(bk) is a study-specific log-odds ratio, d (bk) is the pooled effect of treatment k relative to treatment b (a quantity usually of interest in a meta-analysis) and σ 2 bk ð Þ is the between-study variance or heterogeneity parameter. Random effects NMA assumes that intervention effects are exchangeable across the network regardless of whether or not treatments b and k are included in study i [18]. This assumption implies that the pooled effects d (bk) , can be expressed as functions of basic parameters with reference to a common comparator or baseline treatment (i.e. [24]. Throughout this paper, we take usual care (UC) intervention to be the reference or 'baseline' treatment (i.e. UC is taken as treatment A of equation (2) above). Multi-arm studies (i.e. studies with more than 2 treatment groups) present a special problem in network meta-analysis because they produce evidence on multiple treatment effects that are correlated through sharing a common reference or 'baseline' treatment. Under a homogenous variance assumption ( σ 2 bk ð Þ ¼ σ 2 ), the covariance between any two effects that share a common reference treatment is σ 2 2 [20]. The homogeneous variance assumption allows for the distribution of effects (in a study with an arbitrary number of arms) to be expressed as a univariate marginal distribution and a series of univariate conditional distributions. Specifically, for the ith-study with p + 1 arms and p treatment effect estimates relative to the reference treatment, if then the marginal and conditional univariate distributions for arm j, given the previous 1, ⋯, (j − 1) arms are: The analysis is conducted within a Bayesian framework requiring prior distributions to be specified for all model parameters. Accordingly, we specified minimally informative prior distributions corresponding to a Normal (0,10 3 ) prior distribution for the pooled mean effects relative to usual care, d (Ak) and the study-specific baseline effects, μ ib and a Uniform (0,2) prior distribution for the betweenstudy standard deviation on the log odds ratio scale σ [34].

Model 2: Multivariate NMA
We extend the univariate NMA model defined above to the multiple outcomes settings in order to account for correlations between intervention effects on different outcomes. The method presented here follows from Ades et al. (2010) NMA with competing risks model [13] where only the within-study correlations are taken into account. We extend their method to account for the between-study correlation as well.
Note that in Ades et al. (2010), a multinomial likelihood was appropriate as the three binary outcomes (relapse during treatment for Schizophrenia, discontinuation because of intolerable side effects, and discontinuation for other reasons) are mutually exclusive and event probabilities sum to 1 across outcomes. A multinomial likelihood will not be appropriate for our example dataset because each household can have one, two or all three outcomes simultaneously so that the event probabilities do not sum to 1 across outcomes. Instead, we use a Normal distribution on the log-odds scale to take account of the within-study correlations between outcomes. We assume that in each study i and for each k-th arm, the estimatesθ ikm of the observed log-odds of an event on the mth outcome (m = 1, 2, ⋯, M) jointly follow a multivariate normal distribution: represent vectors of 'true' baseline and study-specific effects in study i with baseline treatment b respectively. The quantitiesθ ik1 ;θ ik2 ; ⋯; θ ikM and (θ ik1 , θ ik2 , ⋯, θ ikM ) represent vectors of observed and 'true' log-odds of response in arm k of study i and S ik is the associated within-study covariance matrix usually assumed known but estimated in practice from the data [35]. We calculatedθ ik1 ;θ ik2 ; ⋯;θ ikM and the diagonal elements of S ik using standard formulae for log-odds and variance of the log-odds [2]. We applied a continuity correction by adding 0.5 to the numerators and 1 to the denominators of studies with 0% or 100% event rate in one of the treatment arms [36,37]. The off-diagonals of S ik were calculated from estimates of within-study correlations r mn ik between outcomes m and n (m ≠ n) in arm k of study i obtained from studies with IPD (see Additional file 1: Table S1). The method used to estimate the correlations from the IPD is described in the implementation section below.
When summarising evidence across multiple endpoints, it is common to encounter instances where some studies do not report information for all outcomes of interest leading to incomplete vectors with missing study-specific effects for the outcomes not reported [5,10]. Such studies can be included in our model under the assumption that the effects for outcomes not reported are missing at random. When implemented using the WinBUGS software, the missing study effects and standard errors are coded as NA in the data, a strategy previously outlined in Bujkiewicz et al. [10] and Dakin et al. [28]. This enables WinBUGS to automatically 'impute' values for the missing information under missing at random assumption with predicted distributions.
We refer to equation (5) as the within-study model and the model describing the distribution of the 'true' effects across studies (presented below) as the between-study model following standard terminology in multivariate metaanalysis [5,6,10,11,38,39]. For the network of two-arm trials, the between-study model for the ith study is thus given by: where the 'true' effects δ i(bk)m (m = 1, 2, ⋯, M) jointly follow a Normal distribution with mean effects d (bk)m . The parameters in equation (6) have the same interpretation as in equation (2) except that they are now specific to each outcome. The covariance matrix Σ (bk) contains terms for the between-study variances, σ 2 bk ð Þm for each outcome m and the between-study correlations ρ mn bk between effects measured by outcome m and n (m ≠ n) specific to each k versus b comparison. Fitting the full model would thus require a large number of possibly multi-arm studies in order to make Σ (bk) identifiable [5,13]. The number of parameters in Σ (bk) , can however be reduced if reasonable assumptions can be made about the covariance structure. In particular, most practical applications of NMA methods involve the assumption of a common between-study variance across treatment arms, often referred to as a homogenous variance assumption [18,40,41]. Therefore, to simplify Σ (bk) we make the additional assumption in this context of a common between-study correlation (ρ mn bk ¼ ρ mn ) leading to the following simplified between-study covariance structure for two-arm studies: where, as in the univariate case, σ m represent the common between-study standard deviation or heterogeneity parameter specific to outcome m.
To include multi-arm studies in our model, we extend equations (3) and (4) to the multiple outcome setting. We show in Appendix A, that under evidence consistency and the homogenous between-study covariance structure (σ 2 bk ð Þm ¼ σ 2 m and ρ mn bk ¼ ρ mn ), equation (3) can be extended to the multiple outcome settings by formulating the distribution of effects in a multi-arm study i with p + 1 arms reporting on m = 1, 2, ⋯, M outcomes as follows: where p is the number of treatment effect estimates. The corresponding marginal and conditional distributions for arm j, given the previous 1, ⋯, (j − 1) arms are: To complete model 2, μ ibm and d (1k)m are given minimally informative prior distributions: ð Þm e Normal 0; 10 3

À Á
Prior distributions also need to be specified for Σ (M × M) which, in general, is non-trivial because of the positive definite constraint. Initially we specified an Inverse-Wishart distribution [42]: where K is M × M scale matrix and M is the total number of outcomes. Specifying minimally informative Inverse-Wishart prior distribution is, however, problematic, especially when the amount of data is small relative to the dimensions of Σ (M × M) as is the case for our example data. Therefore, to allow for flexibility in formulating a prior distribution for Σ (M × M) , we also followed a strategy outlined by Lu and Ades (2009) [43] and more recently by Wei and Higgins (2013) [39] to express Σ (M × M) in terms of a diagonal matrix of standard deviations V 1/2 and squared positive semi-definite matrix of correlations R based on a separation strategy (Barnard et al. [44]): where the off-diagonal elements of R contain correlation terms and diagonal elements equal 1. Lu and Ades [43] and also Wei and Higgins [39] showed that R can be written as R = L T L using Cholesky decomposition where L is an upper triangular matrix. The spherical parameterization technique [39,43] can be used to express R in terms of sine and cosine functions of the elements in L. Using this later technique, we specified Uniform (0,π) prior distributions for the spherical coordinate φ mn in our model to ensure that elements of the correlation matrix R lie in the interval (−1,1). Finally, the elements of V 1/2 correspond to the between-study standard deviation terms in Σ (M × M) and are given independent Uniform (0,2) prior distributions as in the univariate case (model 1).

Model 3: Borrowing strength across interventions and outcomes
From Table 1, it can be seen that none of the studies had considered the interventions E + HSI and E + HV for storage of medicines and storage of other household products. Similarly, interventions E + FE + F and FE were not trialled by any of the included studies on possession of a PCC number. To estimate the full set of 24 basic intervention effects relative to usual care from 9 interventions on 3 outcomes, we applied methods originally proposed by DuMouchel and Harris [29] and revisited by DuMouchel and Groer [45] and Jones et al. [30]. We assume that the pooled effects of treatment k relative to usual care intervention d (Ak)m , can be expressed as a sum of treatment-specific effect α k and outcome-specific effect γ m . This assumption replaces the minimally informative prior distribution N(0, 10 3 ) specified for d (Ak)m in model 2 with: where K is the total number of treatments being evaluated across M outcomes, and for k = 1 (i.e. reference treatment A), d (Ak)m equal to zero. Note that on the logarithmic scale, this would imply that the ratio of any intervention effects is constant across outcomes as the γ m cancel, i.e.
Equation (10) thus embodies an assumption of equal or constant relative potency of treatments across outcomes which imply exchangeability of the relative effects between the non-reference/baseline treatments indicated by equation (11). For our example dataset, this implies that missing intervention effects for comparisons with the usual care intervention can be predicted directly from equation (10) as a linear combination of γ m and α k assuming that each treatment effect relative to usual care is reported on at least one outcome. The missing intervention effects between non-reference/baseline treatments if required can similarly be predicted directly from the model as linear combinations of the intervention effects relative to usual care.
The parameter τ controls the accuracy of the constant relative potency assumption. Values of τ close to zero would thus indicate a high degree of confidence (and support from the data) in the parallelism of effect profiles across outcomes and the constant relative potency assumption. Conversely, larger values of τ would indicate otherwise.
Multi-arm studies are included in model 3 based on equations (8) and (9) in the same way as in model 2. To complete model 3, the parameters α k , γ m and τ are given minimally informative prior distributions. For the mean effects, this is a normal distribution with zero mean and large variance: α k ; γ me Normal 0; 10 3 À Á We give τ a Uniform (0, 2) prior distribution, considered to be minimally informative on the log-odds ratio scale. Sensitivity analyses were conducted to assess the impact of specifying alternative prior distributions for τ that are also considered minimally informative [46]: i. Normal prior distribution centred on 0 with large variance and constrained to be positive, τ~N (0, 10 2 ), τ ≥ 0 ii. Gamma prior distribution placed on the precision: τ − 2~G amma(0.001, 0.001).
The results of the sensitivity analyses are presented in Additional file 1: Figure S1.
There is a limitation to the number of data (i.e. intervention effects relative to the usual care) on outcomes allowed to be missing for the model hyper-parameters to be identifiable. For K interventions and M outcomes, there will be (K − 1) × M equation (10) that are used to estimate a total of (K − 1) + M hyper-parameters (i.e. (K − 1) of α k and M of γ m hyper-parameters). Therefore no more than ((K − 1) × M) − ((K − 1) + M) missing values in total are allowed. For example, for K = 3 treatments and M = 2 outcomes, data has to be available on both outcomes for both treatment comparisons with the baseline when the prior distributions are non-informative. When large number of data on outcomes is missing, placing informative prior distributions on the hyper-parameters can improve convergence.

Implementation of the models
We fitted a total of four models, models 1 and 3 as described above and two versions of model 2. In model 2a, we specified an inverse-Wishart prior distribution for the between-study covariance matrix Σ (M × M) whilst in model 2b, we specified a prior distribution for Σ (M × M) based on the separation strategy. All four models allowed for multi-arm trials to be included in the analysis. To fit the multivariate NMA models, the quantities θ^i k1 ; θ ik2 ; θ^i k3 ð Þand the diagonals of S ik were estimated using standard 2×2-table formulae [2]. Next, we obtained estimates of the within-study correlations from the IPD studies using the following three methods: i) Pearson correlation coefficient between the observed outcome events ii) Bootstrapping as described in Daniel and Hughes (1998), and iii) Generalised Estimating Equations (See details in Additional file 1). All three methods produced identical estimates of the correlations between pairs of outcome specific log-odds of event from each IPD study (Additional file 1: Table S1). Therefore, we formulated informative prior distributions for the correlation terms in S ik of equation (5)  where r mn ik is the within-study correlation between the outcomes m and n effects measured on the log-odds scale in arm k of study i, and r mn and var (r mn )are the mean and variance of the within-study correlation between outcomes m and n effects measured by IPD respectively.
We also assessed consistency of the evidence within each network using a method based on node splitting [24]. We found no evidence of conflict between the direct and indirect sources (Additional file 1: Table S2) in all three outcome networks. We fitted all models described above in WinBUGS [47] using Markov Chain Monte Carlo (MCMC) simulations. The univariate models were fitted separately for each outcome using WinBUGS code available from Dias et al. [48]. The WinBUGS code for the multivariate models is provided in the appendices 1 and 2 of the Additional file 1. Convergence was assessed by examination of the trace and autocorrelation plots and the Rubin-Gelman statistic after running 400 000 simulations and discarding the first 200 000 samples as 'burn in samples'.

Univariate and multivariate analyses
Parameters of interest were the posterior median estimate (and 95% credible intervals) of the pooled intervention effects relative to the usual care intervention and the posterior median estimate (and 95% credible intervals) of the between-study standard deviation and correlation terms. Summary forest plots displaying effectiveness estimates relative to usual care on the odds ratio (OR) scale are presented in Figure 2. It can be seen that, all 4 models produced broadly similar estimates when the treatment effect is not extreme compared to the other effect estimates for the same outcome. Compared to the univariate analysis, the multivariate models produced noticeably less extreme estimates of intervention effects. This can be seen in the effect of FE + HSI (3) on possession of PCC number being shifted towards the line of no effect from an OR of 39 Posterior median and 95% credible intervals of the between-study standard deviations and correlations are presented in Table 2. The posterior medians of the between-study correlations from the multivariate models were small and estimated with considerable uncertainty (i.e. all had large variances). Estimates of the betweenstudy standard deviations were broadly similar for the univariate NMA (model 1) and the multivariate NMA using the separation strategy (model 2b), and relatively high for multivariate NMA using the inverse-Wishart prior distribution (model 2a).

Borrowing strength across outcomes
It can be seen from Figure 2 that the effect of E + HSI and E + HV relative to usual care intervention on safe storage of medicines and safe storage of other household products, and E + FE + F and FE on possession of a PCC telephone number were only estimated in model 3 as none of the studies had trialled these interventions on the respective outcomes. In this model, estimates of relative effects between non-reference/baseline treatments were assumed to be exchangeable across outcomes, which enabled estimates to be obtained for all outcomes by predicting effects where the interventions have not been considered for the particular outcome of interest. For the intervention/outcome pair where data from trials were available, this extrapolation step had the additional effect of producing more precise estimates of the treatment effect in comparison to the models that do not assume exchangeability effects across outcomes.   The posterior median and 95% credible intervals of intervention effects relative to usual care were unaffected by placing alternative minimally informative prior distributions on τ (Additional file 1: Appendix 2). The posterior median and credible intervals for τ ( Table 2) were similarly not sensitive to the choice of prior distribution placed on τ in the primary and sensitivity analyses. The posterior median estimates of τ were all close to zero, which suggest that assumptions about the parallelism of effect profiles across outcomes is supported by the data. The estimates of τ would thus suggest with 95% probability, that based on the information in our example dataset, the extrapolation model could be accurate to within a factor of about e (2 × 0.10) = 1.24 (95% CrI 1.01 to 2.87).

Discussion
We have presented methods for simultaneous comparison of multiple treatments across multiple outcome measures while preserving the internal randomisation of individual studies. Our method may be viewed as an extension of Ades et al.'s (2010) NMA with competing risks paper [13] wherein only the within-study correlation is taken into account. We have extended their method to account for the dependency between outcome effects across studies as well as within-studies.
In this particular application of the multivariate approach, accounting for the correlation between outcomes alone (models 2a and 2b) did not reduce the uncertainty around estimates of intervention effects compared to analysing each outcome separately (model 1). Assuming that intervention effects are exchangeable across outcome did however lead to a modest reduction in uncertainty around effectiveness estimates (model 3).
The between-study correlations were estimated with considerable uncertainty (Table 2) and appear to have little impact on overall effect estimates. This may be because the between-study correlation arises due to, among other things, differences in study-level characteristics that also give rise to between-study heterogeneity in a meta-analysis. Based on a criterion outlined in Spiegelhalter et al. [49] the posterior median estimates of the between-study standard deviations, σ 1 and σ 2 on the log odds ratio scale (Table 2) could be interpreted as indicating evidence of low to moderate heterogeneity for storage of medicines and storage of other household products outcomes. Only the estimates for possession of poison control centre number exhibited a considerable degree of heterogeneity. Consequently, the posterior medians of the between-study correlations were small. There was therefore very little gain (in terms of increasing the precision of estimates) from formulating the betweenstudy covariance structure described for the analysis presented here. Accounting for the between-study correlation is likely to be beneficial in situations where the between-study variance (heterogeneity) is large relative to within-study variances.
We opted to incorporate the within-study correlation through the arm-specific effects (log-odds) rather than the study-specific treatment difference (log-odds ratio) as is often done in multivariate meta-analysis [3,4,15,38]. This approach greatly simplifies the likelihood for multiarm studies because treatment arms can be considered independent as a consequence of randomisation. Hence, there is no requirement to account for the additional correlations between effect estimates which share a common comparator treatment in the model likelihood [50]. The arm-based approach is also likely to be useful when (as is typical with many practical application of multivariate meta-analysis) the within-study correlations are not available [10,12,15,51] and have to be obtained from an external source such as expert opinion [14]. In such situations, formulating questions about correlations between outcome-specific event probabilities (which can be used directly in an arm-based approach) is more likely to be intuitive and easily understood by non-statistician healthcare experts than questions about correlations between intervention effects. It is acknowledged however, that the correlations between the intervention effects if required can easily be obtained from the correlations between the outcomes [14,51]. At the between-study level, we assumed a common correlation structure across treatments in addition to the common variance assumption underlying most practical application of NMA methods. The common correlation assumption implies that if several separate multivariate meta-analyses were conducted with the same outcomes, each with a different set of k versus b comparison, the assumption is that the between-study correlations would be the same across the different sets of bk comparisons. We suggested this structure to simplify the covariance structure and reduce the number of parameters in the model. Appropriateness of such modelling assumptions would need to be considered carefully and assessed when it is feasible to do so.
Initially we specified an inverse-Wishart distribution for the between-study covariance matrix Σ (M × M) . However, we believe this prior distribution to be influential due to the small number of studies in our example dataset relative to the number of outcomes. Under these conditions, the inverse-Wishart prior distribution produced upwardly-biased estimates of σ 1 and σ 2 and downward bias in the estimate for σ 3 when compared to the corresponding estimates obtained from the univariate model (Table 2). These findings are consistent with observations in the univariate case where the use of a Gamma prior distribution (which is the univariate analogue of the Inverse-Wishart prior distribution) can lead to an overestimation of the heterogeneity parameter when the true value is close to 0 [46,52]. As an alternative to an inverse-Wishart prior distribution therefore, we followed the spherical decomposition technique suggested by Lu and Ades [43]. This parameterization offered greater flexibility in formulating independent prior distributions for the standard deviation and correlation terms in Σ (M × M) .
An obvious limitation to implementation of the multivariate models presented in this paper is the limited availability of data including i) the problem of missing within-study correlations and ii) the requirement for a relatively large number of studies to estimate all model parameters. The problem of missing within-study correlations has traditionally hampered the widespread application of multivariate meta-analysis [7,10,15]. In our example, IPD was available from a proportion of the included studies and we have used correlations estimated from the IPD to formulate informative prior distributions for the within-study model. Alternative approaches to dealing with missing within-study correlations when IPD is not available include: i) using the observed correlation from the summary study-specific effects [12], ii) eliciting information about the correlations from external sources such as clinical experts [14] and iii) specifying 'vague' prior distributions for analysis conducted within a Bayesian framework [6].
The second data issue concerns the number of studies needed to estimate the full unstructured betweenstudy covariance matrix presented in equation (6). We anticipate a large number of multi-arm studies reporting across the three outcomes will be needed to identify Σ (bk) and estimate all model parameters. This can be problematic considering the fact that most applications of network meta-analysis typically include mostly two-arm studies with very small numbers of multi-arm studies. Even with the simplification of the betweenstudy covariance matrix given in equation (5), a relatively large number of studies in comparison to the total number of outcomes being considered may still be needed. We are unable to answer the question of how many studies should be considered large enough for a NMA with multiple outcomes. As a guide, Wei and Higgins [39] recently estimated 15, 27 and 42 studies as a minimum for multivariate pairwise metaanalysis with two, three and four-outcomes respectively. Hence, we believe an even larger number of studies will be required for the NMA with multiple outcomes.
Another limitation of the multivariate models presented here is that they rely on the normal approximation to binomial distribution to incorporate the within-study correlations in the model. The normal approximation frequently fails and may not provide adequate fit to the data in the presence of studies with zero or a small number of events, necessitating use of continuity corrections. We were unable to use the exact binomial distribution as our primary interest was to develop models for summary binary data where outcomes are not mutually exclusive, and where it is not reasonable to assume that within-study correlations are zero so that the likelihood factorises easily as in Arends et al. [3]. Further methodological investigations into modelling multivariate summary data that is not normally distributed will therefore be useful. An example is provided in Chu et al. [53] where parameterization of the within-study model enabled the special case of diagnostic sensitivity and specificity to be jointly modelled with disease prevalence using a trivariate binomial likelihood. In the interim, an alternative formulation which bypasses the need for approximating normal distributions is to directly model the IPD where this is available. This will require extending Saramago et al.'s [54] NMA model with aggregate and individual participant level data from single outcome to multiple outcome settings.
We assessed the consistency of each outcome network separately using the method of node splitting [24]. We found no evidence of conflict between the direct and indirect sources on pairwise contrasts that have both sources of evidence in model 1. We did not assess the consistency of the multivariate estimates partly because we are unaware of current methods for carrying out this type of assessment. We are investigating extensions of the node-split method to multiple outcome networks and investigate the effect of jointly synthesising evidence across multiple endpoints on evidence consistency in a simulation study.
Our initial motivation for a multiple outcome NMA was to estimate intervention effects for all the outcomes, including effects of interventions on outcomes not considered by any of the studies included in the analysis. This requires the correlation structure between effects on multiple outcomes to be appropriately modelled and also ensuring the mechanism of "borrow strength" across outcomes through the assumption of exchangeability of the random effect across outcomes. This implies a priori assumption that outcomes are related but different and that there is no way of knowing the order of magnitude of effects on outcomes. If this assumption does not hold, it may potentially lead to worse or more biased effectiveness estimates. In our example, the outcomes are similar and measured on the same scale. It would be clearly inappropriate to assume that intervention effects are exchangeable across outcomes that are different in some important respects such as being measured on different scales (e.g. where one outcome reports a weighted mean difference and another outcome reports a log-odds ratio) as such estimates will differ in terms of the precision with which they are estimated.

Conclusion
Our aim in this paper was to present methods for simultaneous comparison of multiple treatments across multiple outcome measures while preserving the internal randomisation of individual studies. Application of the method to the poison prevention data yielded similar point estimates of treatment effect to those obtained from a univariate NMA but the uncertainty around the multivariate estimates increased or decreased depending on the prior distribution specified for the between-study covariance structure. The proposed method followed the usual hierarchical approach to multivariate meta-analysis where correlations between outcomes are modelled at the within-study and or between-study levels.

Appendix A
Between-study covariance for multi-arm studies reporting multiple outcomes For a multi-arm study i with K treatments labelled A, B, C, …, K reporting a total of M outcomes labelled 1, 2, …, M. A random effects between-study model can be represented as: Where δ i(Ak)m and d (Ak)m are study-specific and mean effect of treatment k relative A (reference treatment) on outcome m in study i respectively and Σ FULL is the full (K-1) × (K-1) blocks of M × M within-treatment betweenoutcome covariance matrix. The parameters in Σ FULL have the following interpretation: (σ 2 Ak ð Þ m ) indicate the variance of the effect of treatment k (k = B, C, ⋯, K) relative to A on outcome m across studies. ρ mn Ak;Ak ð Þ indicate the correlation between δ i(Ak)m and δ i(Ak)n (i.e. the correlation between the effect of treatment k relative to A on outcome m and the effect of treatment k relative to A on outcome n (m ≠ n)) specific to the Ak comparison.
ρ mm Ah;Ak ð Þ indicate the correlation between δ i(Ah)m and δ i (Ak)m (i.e. the correlation between the effect of treatment h relative to A on outcome m and the effect of treatment k relative to A (h ≠ k) on outcome m because they share a common comparator A).
The diagonal block matrices in Σ FULL thus carry terms for the between-study variance (σ 2 The homogenous between-study correlation assumption implies ρ mn Substituting these expressions into equation (A1), we see that the between-study correlation terms equal ρ mn in the diagonal block of matrices and 1 2 ρ mn in the off-diagonal block of matrices of in Σ FULL leading to the following simplificationfollowing simplification of the between-study covariance matrix: Finally by relabeling the reference treatment A as b, (δ i(AB)1 , ⋯, δ i(AK)m ) as δ i bk1