The variance structure in an MTC is challenging to estimate because it depends on the amount of evidence and on the linkage between comparisons. A number of approaches are available, but their performance is tied to the appropriateness of the assumed linkage between comparisons and, in the Bayesian framework, to the elicited variance priors. Conventional MTC models have relied on the unrealistic assumption that the between-trial variances for the included comparisons are all equal [4–6, 10, 15]. Emerging evidence (including our examples), however, suggests that this approach is sub-optimal [10, 15]. Instead, there is a need to consider ‘heterogeneous variance structures’. Because the evidence available to reliably estimate heterogeneity variance parameters is typically sparse, some precision can be gained either by incorporating informative variance priors or by using restrictive heterogeneity variance structures together with weakly informative variance priors. In this paper we considered two types of informative variance priors, frequentist-informed and empirically informed, and two restrictive variance structures with weakly informative priors: the exchangeable variances approach and the consistency variances approach.
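To make the idea of a restrictive structure more concrete, the following is a minimal Python sketch of the consistency variances restriction for a single three-treatment loop A, B, C. It assumes a commonly used formulation in which the trial-specific BC random effect equals the difference of the AC and AB random effects, so the BC heterogeneity variance is not a free parameter but is implied by the AB and AC variances and their correlation ρ. The helper name `consistency_variance` and the numeric values are hypothetical and chosen only for illustration; they are not estimates from our examples.

```python
import math


def consistency_variance(tau_ab: float, tau_ac: float, rho: float) -> float:
    """Hypothetical helper: implied BC heterogeneity variance in an A-B-C loop.

    Assumes delta_BC = delta_AC - delta_AB, where the trial-specific random
    effects delta_AB and delta_AC have standard deviations tau_ab and tau_ac
    and correlation rho. Then
        Var(delta_BC) = tau_ab**2 + tau_ac**2 - 2 * rho * tau_ab * tau_ac.
    """
    if tau_ab < 0 or tau_ac < 0:
        raise ValueError("heterogeneity SDs must be non-negative")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return tau_ab**2 + tau_ac**2 - 2.0 * rho * tau_ab * tau_ac


# Equal SDs with rho = 0.5 give back the common variance (homogeneous case).
print(consistency_variance(0.3, 0.3, 0.5))  # approximately 0.09
# Unequal SDs: the BC variance is determined by the AB and AC variances.
print(consistency_variance(0.2, 0.4, 0.5))  # approximately 0.12
```

Under this assumed formulation, the conventional homogeneous model is the special case of equal standard deviations with ρ = 0.5, and for any |ρ| ≤ 1 the implied variance cannot be negative, since it is bounded below by (τ_AB − τ_AC)². The restriction gains precision because, within a loop, only two variances and one correlation must be estimated rather than three unrelated variances.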

Our examples suggest that these four approaches all allow for reliable estimation of differing between-study heterogeneity variances across comparisons, whereas the unrestricted approach often does not. In this respect, these four approaches appear superior both to the homogeneous variance model and to the unrestricted heterogeneous variances approach. The frequentist-informed approach yielded the best model fit in both examples, and although further research is needed, one could argue for this approach as the primary supplement to the conventional homogeneous model.

Our study has several strengths, but also some limitations. Our illustrative examples differ in size and complexity and yield heterogeneity estimates for which the homogeneous variance assumption was violated to an extent that affected the findings of the MTCs. Our study is also the first to compare multiple weakly and moderately informed approaches to modelling heterogeneity in MTCs. Our findings, however, are by no means generalizable to all MTCs. Some treatment networks may exist or emerge in which, for example, the homogeneous variance model and a heterogeneous variance model yield nearly identical inferences about all comparative treatment effects. It is therefore important that authors and readers of MTCs consistently pay careful attention to the fragility of variance estimates, credible intervals and treatment rank probabilities. Another limitation is the empirical nature of this study. With empirical data we can only observe differences between models; we can never draw definitive conclusions about the truth. Simulation studies would therefore be needed to investigate the performance of the models in terms of bias, precision, mean squared error (MSE), etc., under different scenarios and types of networks. However, we believe additional empirical studies are necessary first, to inform which scenarios are truly important to explore by simulation.

Appropriate modelling of heterogeneity variances in MTCs will become increasingly important in the coming years. First, ‘statistical significance’ and treatment rank probabilities can be sensitive to the chosen variance structure and variance priors [15]. As regulatory agencies and clinical decision makers increasingly rely on comparative effectiveness inferences from MTCs, choosing appropriate variance structures and priors (and performing the necessary sensitivity analyses) becomes correspondingly more important.

Further, we will likely see an increase in MTCs that incorporate meta-regression or subgroup analysis to explain the observed heterogeneity through effect modification by one or more clinical covariates. Appropriately estimating the unexplained heterogeneity for each treatment comparison is crucial to reliably estimating such effect modification. In other words, without unbiased quantification of heterogeneity, it becomes increasingly difficult to explain it.