An empirical comparison of Bayesian modelling strategies for missing binary outcome data in network meta-analysis

Spineli, Loukia M.

doi:10.1186/s12874-019-0731-y

Research article
Open access
Published: 24 April 2019

Loukia M. Spineli ORCID: orcid.org/0000-0001-9515-582X¹

BMC Medical Research Methodology volume 19, Article number: 86 (2019)

Abstract

Background

A number of strategies have been proposed to handle missing binary outcome data (MOD) in systematic reviews. However, none of these have been evaluated empirically in a series of published systematic reviews.

Methods

Using published systematic reviews with network meta-analysis (NMA) from a wide range of health-related fields, we evaluated comparatively the most frequently described Bayesian modelling strategies for MOD in terms of log odds ratio (log OR), between-trial variance, inconsistency factor (i.e. difference between direct and indirect estimates for a comparison), surface under the cumulative ranking (SUCRA) and rankings. We extended the Bayesian random-effects NMA model to incorporate the informative missingness odds ratio (IMOR) parameter, and applied the node-splitting approach to investigate inconsistency locally. We considered both pattern-mixture and selection models, different structures for prior distribution of log IMOR, and different scenarios for MOD. To illustrate level of agreement between different strategies and scenarios, we used Bland-Altman plots.

Results

Addressing MOD using extreme scenarios and ignoring the uncertainty about the scenarios led to systematically different and more precise log ORs compared to modelling MOD under the missing at random (MAR) assumption. Hierarchical structure of log IMORs led to lower between-trial variance, especially in the case of substantial MOD. Assuming common-within-network or trial-specific log IMORs yielded similar posterior results for all NMA estimates, whereas intervention-specific structure systematically inflated uncertainty around log ORs and SUCRAs. Pattern-mixture model agreed with selection model, particularly under the trial-specific structure; however, selection model systematically reduced precision around log IMORs. Overall, different strategies and scenarios mostly had good agreement in the case of low MOD.

Conclusions

Addressing MOD using extreme scenarios and/or ignoring the uncertainty about the scenarios may negatively affect NMA estimates. Modelling MOD via the IMOR parameter can ensure bias-adjusted estimates and offer valuable insights into missingness mechanisms. The researcher should seek an expert opinion in order to decide on the structure of log IMOR that best aligns to the condition and interventions studied and to define a proper prior distribution for log IMOR. Our findings also apply to pairwise meta-analyses.

Background

Missing (participant) outcome data (MOD) in a series of trials have preoccupied a number of researchers who have contributed to the development of several methods of different complexity (for example, [1,2,3,4,5,6,7,8,9,10]) to address primarily binary MOD in a pairwise meta-analysis. Only a handful of these methodologies have been extended further to operate in a network of several interventions [8, 11]. These methodological articles provide only limited empirical evidence to demonstrate the merits and demerits of proposed methods as they usually consider one published systematic review with pairwise or network meta-analyses (NMA). Furthermore, the modelling strategies and missingness scenarios considered to investigate the value of proposed methods differ considerably across methodological articles (Additional file 1: Table S1).

There is no universally ‘best’ strategy for how authors of systematic reviews should deal with MOD in included trials. Like other types of missing data (e.g. missing studies and outcomes), successful handling of MOD rests on plausible yet untestable assumptions regarding the missingness mechanism in conjunction with appropriate analytical strategies [4]. In practice, the missingness mechanism is explored by making sensible assumptions on whether data are informatively missing, and if so, what the outcomes would plausibly be if participants had never left the trial. When included trials provide limited or no information on the reasons for MOD, in order to explore assumptions empirically, the meta-analyst examines the sensitivity of results to plausible scenarios [2]. A usual starting point of the analysis is to assume that data are missing at random (MAR) and then investigate any deviations from this assumption by performing a series of sensitivity analyses (Additional file 1: Table S1) [2,3,4, 12].

According to the Cochrane handbook (version 5.1.0) [13], the principal options for dealing with MOD in a pairwise meta-analysis are (i) exclusion of missing participants from the analysis, (ii) imputation of missing outcomes in each arm of every trial using specific scenarios and (iii) statistical modelling of the missingness mechanism. Furthermore, uncertainty induced by imputing MOD according to item (ii) may or may not be accounted for in the meta-analysis results [1, 2]. These options are also relevant in the context of NMA: since NMA is an extension of pairwise meta-analysis, they extend naturally, even though authors of the relevant published literature may not have done so explicitly (e.g. Turner et al. [10]). However, extension of these options to a network of interventions should be accompanied by comprehensive investigation and acknowledgement of the implications of MOD for core components of the NMA model (i.e. consistency equation and ranking measures). Otherwise, suboptimal reporting and handling of MOD in a network of interventions can greatly increase the risk of misleading conclusions.

We consider statistical modelling to be a more appropriate strategy for handling MOD because, contrary to exclusion or imputation of MOD before analysis, it accounts for possible bias and uncertainty around trial-specific estimates of treatment effect due to MOD while maintaining the randomised sample in each trial [10]. In particular, modelling MOD using Bayesian approaches (which are very popular in NMA as they foster probabilistic statements that are an integral part of the inferential NMA framework [14, 15]) naturally allows uncertainty induced by MOD to be incorporated into NMA estimates using proper prior distributions. To explore the implications of different Bayesian modelling strategies of binary MOD for core NMA components, we set up a comprehensive empirical study using published systematic reviews with NMA from a wide range of health-related fields [16]. In this way, we can investigate whether, and for which NMA estimates, the compared modelling strategies disagree, using real data and taking into account the extent and balance of MOD within each network, both of which are factors that may trigger this discordance. Since NMA is an increasingly applied evidence-synthesis tool that has become widely acknowledged by researchers and policy-making bodies, such as the National Institute for Health and Care Excellence [15, 17,18,19], it is crucial to provide the necessary, empirically based directions to handle MOD appropriately in a network of several interventions.

The rest of the article is organised as follows. Initially, we describe our analysed dataset and then review the modelling strategies and missingness scenarios that we incorporated in the Bayesian random-effects NMA model. Furthermore, we delineate the analyses we performed to compare the reviewed modelling strategies in terms of NMA estimates. Then, we present the results of the empirical evaluation, we discuss our results and highlight important limitations and recommendations, and we provide our conclusions.

Methods

Selection process of analysed dataset

This empirical study was based on our previous survey with systematic reviews of multiple interventions published between 01/01/2009 and 31/03/2017 in peer-reviewed journals of several health-related fields [16]. Details on the search strategy and selection process of the eligible systematic reviews and NMAs can be found in our previous work [16].

We only considered NMAs (31 in total) that provided arm-level binary outcome data with present MOD in included trials; however, we excluded one review where no NMA was employed, and one review for reporting data in a non-extractable manner. The whole selection process resulted in 29 eligible NMAs in total that comprised our empirical dataset.

We used the odds ratio (OR) as the effect measure in all eligible NMAs, mainly because of its preferable statistical properties [20]. In each network, we recorded outcome events so that an OR greater than 1 indicated a beneficial effect for the first intervention in each comparison.

Characterising networks based on prevalence and balance of MOD

We considered the ‘five-and-twenty rule’ as proposed by Sackett et al. [21] to classify a trial as having low (MOD ≤ 5%), moderate (5% < MOD ≤ 20%) or large (MOD > 20%) risk of attrition bias. Furthermore, we calculated the difference in percentage of MOD (%MOD) between compared interventions in order to define MOD as being balanced or unbalanced in each trial of every network. By applying these rules, we distinguished networks with ‘low’, ‘moderate and balanced’, ‘moderate and unbalanced’, ‘large and balanced’ and ‘large and unbalanced’ MOD. Step-by-step details on this strategy can be found in the web appendix (Additional file 2).
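The trial-level part of this classification can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of Additional file 2: it assumes the rule is applied to the total %MOD of a trial, and it only reports the largest between-arm difference in %MOD, because the threshold that separates balanced from unbalanced MOD is not stated in this section. The function names are hypothetical.

```python
def percent_mod(missing, randomised):
    """Percentage of missing outcome data among randomised participants."""
    return 100.0 * missing / randomised


def attrition_risk(total_percent_mod):
    """Five-and-twenty rule: low (<= 5%), moderate (5-20%], large (> 20%)."""
    if total_percent_mod <= 5.0:
        return "low"
    if total_percent_mod > 20.0:
        return "large"
    return "moderate"


def trial_summary(arms):
    """Summarise one trial given a list of (missing, randomised) pairs per arm.

    Assumption: the rule is applied to the trial's total %MOD. The balance
    threshold is not given here, so only the raw difference is returned.
    """
    total_missing = sum(m for m, _ in arms)
    total_randomised = sum(n for _, n in arms)
    total = percent_mod(total_missing, total_randomised)
    per_arm = [percent_mod(m, n) for m, n in arms]
    return {
        "total_pct": total,
        "risk": attrition_risk(total),
        "max_arm_difference": max(per_arm) - min(per_arm),
    }


# Hypothetical two-arm trial: 3/100 and 9/100 participants missing.
summary = trial_summary([(3, 100), (9, 100)])
print(summary["risk"], summary["total_pct"], summary["max_arm_difference"])
```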

Missingness models in network meta-analysis

In the presence of MOD, we need a model that incorporates both the missing and observed information and, in addition, allows us to learn about missingness mechanisms. We briefly describe two missingness models that have been proposed for that purpose.

Pattern-mixture model

Consider a network of N trials investigating different sets of T interventions. In arm k = 1, 2, … , a_i of trial i, we observe the number of events, r_ik, and the number of MOD, m_ik, out of the total randomised, n_ik. In arm k of trial i, the number of observed events and the number of MOD are assumed to be sampled from the corresponding binomial distributions [10]:

$$ {r}_{ik}\sim Bin\left({p}_{ik}^o,{n}_{ik}-{m}_{ik}\right)\ \mathrm{and}\ {m}_{ik}\sim Bin\left({q}_{ik},{n}_{ik}\right) $$

with $ {p}_{ik}^o $ being the probability of event conditional on the completers and q_ik being the probability of MOD.

The pattern-mixture model was the most commonly described model to address MOD in systematic reviews (Additional file 1: Table S1). It describes distribution of the outcome between completers and missing participants [3, 10]. Then, the underlying probability of event in arm k of trial i, p_ik, is modelled conditional on whether an event is observed or missing [10]:

$$ {p}_{ik}={p}_{ik}^o\bullet \left(1-{q}_{ik}\right)+{p}_{ik}^m\bullet {q}_{ik} $$

where $ {p}_{ik}^m $ indicates the probability of event conditional on missing participants in arm k of trial i. Following Turner et al. [10], the above equation can be re-arranged to link $ {p}_{ik}^o $ with the remaining parameters:

$$ {p}_{ik}^o=\frac{p_{ik}-{p}_{ik}^m\bullet {q}_{ik}}{1-{q}_{ik}} \tag{1} $$

Then, using the logit function, we define the log odds of event in arm k of trial i as follows:

$$ logit\left({p}_{ik}\right)={u}_i+{\theta}_{i,k1}\bullet I\left(k>1\right) \tag{2} $$

where u_i = logit(p_i1) is the log odds of event in the baseline arm of trial i and θ_{i, k1} is the log OR of event in arm k relative to the baseline arm of trial i. Typically, θ_{i, k1} follows a normal distribution with mean $ {\mu}_{t_{ik}{t}_{i1}} $ (i.e. the summary log OR of event between intervention t_ik and t_i1 of trial i) and variance τ², which is commonly assumed to be constant across different comparisons. The index t_ik indicates the intervention studied in arm k of trial i. In trial i with a_i ≥ 3 arms, log ORs are correlated since they share the same comparator and therefore follow a multivariate normal distribution, which is equivalent to conditional univariate normal distributions for θ_{i, k1} of arm k > 2, conditional on all arms from k = 2 to a_i − 1 (eq. 11 in Dias et al. [22]).

Under the consistency assumption (which implies statistical agreement between direct and (possibly more than one) indirect sources of evidence [14]), summary log ORs for all possible comparisons among non-reference interventions are obtained as functions of T − 1 summary log ORs for the basic parameters, namely, treatment effects relative to the reference intervention of the network (here, the reference is intervention 1):

$$ {\mu}_{tl}={\mu}_{t1}-{\mu}_{l1} \tag{3} $$

with t, l = {2, 3, … , T} and t ≠ l.
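As a small illustration of the consistency equation, the following Python sketch derives every pairwise summary log OR from the T − 1 basic parameters. The function name and the numerical values are hypothetical and serve only to show the arithmetic; comparisons against the reference are included by setting μ_11 = 0.

```python
def all_pairwise_log_or(basic):
    """Return {(t, l): mu_tl} from basic parameters {t: mu_t1}, t = 2..T.

    Uses mu_tl = mu_t1 - mu_l1, with the reference intervention labelled 1
    and mu_11 = 0 by definition.
    """
    mu = {1: 0.0}
    mu.update(basic)
    pairwise = {}
    for t in mu:
        for l in mu:
            if t != l:
                pairwise[(t, l)] = mu[t] - mu[l]
    return pairwise


# Hypothetical basic parameters for a three-intervention network.
estimates = all_pairwise_log_or({2: 0.5, 3: -0.2})
print(estimates[(2, 3)])  # log OR of intervention 2 versus intervention 3
```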

Selection model

Another way to model observed data (i.e. r_ik, n_ik − r_ik − m_ik and m_ik) is to consider the following multinomial distribution [4, 11] in arm k of trial i:

$$ {\left({r}_{ik},{n}_{ik}-{r}_{ik}-{m}_{ik},{m}_{ik}\right)}^T\sim M\left({p}_{1, ik},{p}_{2, ik},{p}_{3, ik},{n}_{ik}\right) $$

with

$$ {\displaystyle \begin{array}{c}{p}_{1, ik}=\left(1-{c}_{1, ik}\right)\bullet {p}_{ik}\\ {}{p}_{2, ik}=\left(1-{c}_{0, ik}\right)\bullet \left(1-{p}_{ik}\right)\\ {}{p}_{3, ik}={c}_{1, ik}\bullet {p}_{ik}+{c}_{0, ik}\bullet \left(1-{p}_{ik}\right)\end{array}} $$

where p_{1, ik} reflects the marginal probability of observing the underlying event, p_{2, ik} reflects the marginal probability of not observing the underlying event and p_{3, ik} is actually the probability of MOD in arm k of trial i (i.e. p_{3, ik} = q_ik) and is modelled conditional on whether the missing participants may have experienced the underlying event or not [4, 11]. The last line describes the selection model [4, 11]. Then, parameters c_{1, ik} and c_{0, ik} denote the probability of MOD conditional on those participants with the underlying event and the probability of MOD conditional on those participants without the underlying event in arm k of trial i, respectively. Only q_ik is estimable from the data, and thus, we need to assign proper prior distributions on all other parameters.
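The three multinomial cell probabilities can be checked numerically. The sketch below (hypothetical function name and input values) computes them from p_ik, c_{1, ik} and c_{0, ik}, and confirms that they sum to 1 and that, when c_{1, ik} = c_{0, ik}, the event probability among completers equals p_ik.

```python
def selection_cell_probabilities(p, c1, c0):
    """Cell probabilities of the selection model in one arm.

    p  : underlying probability of event
    c1 : probability of MOD among participants with the event
    c0 : probability of MOD among participants without the event
    """
    p1 = (1 - c1) * p              # event observed
    p2 = (1 - c0) * (1 - p)        # non-event observed
    p3 = c1 * p + c0 * (1 - p)     # outcome missing (equals q_ik)
    return p1, p2, p3


# Hypothetical values with equal missingness in events and non-events.
p1, p2, p3 = selection_cell_probabilities(p=0.3, c1=0.1, c0=0.1)
print(p1 + p2 + p3)      # the three cells partition the arm
print(p1 / (p1 + p2))    # event probability among completers
```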

Informative missingness odds ratio parameter

To be able to incorporate plausible informative prior beliefs about the missingness process, we need alternative missingness parameters to $ {p}_{ik}^m $, c_{1, ik} and c_{0, ik} that measure the relationship between the underlying outcome (event or non-event) and the status of the outcome (being missing or observed) [10]. Such alternative missingness parameters have already been proposed in the literature.

Informative missingness odds ratio (IMOR) appeared to be the most popular missingness parameter in the literature (Additional file 1: Table S1). Under the pattern-mixture model, it is defined as the ratio of the odds of an event conditional on missing participants to the odds of an event conditional on completers in arm k of trial i [2, 3, 10]:

$$ {IMOR}_{ik}={\varphi}_{ik}=\frac{p_{ik}^m/\left(1-{p}_{ik}^m\right)}{p_{ik}^o/\left(1-{p}_{ik}^o\right)}. $$

Then, Eq. (1) can be re-written as follows (see also Appendix A in Turner et al. [10]):

$$ {p}_{ik}^o=\frac{-\left(\left({q}_{ik}-{p}_{ik}\right)\left(1-{\varphi}_{ik}\right)-1\right)-\sqrt{{\left(\left({q}_{ik}-{p}_{ik}\right)\left(1-{\varphi}_{ik}\right)-1\right)}^2-4{p}_{ik}\left(1-{q}_{ik}\right)\left(1-{\varphi}_{ik}\right)}}{2\left(1-{q}_{ik}\right)\left(1-{\varphi}_{ik}\right)} $$
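The closed form above is the smaller root of a quadratic in $ {p}_{ik}^o $, obtained by substituting the IMOR definition into the mixture equation for p_ik. The following Python sketch evaluates it and verifies that the result reproduces p_ik through the mixture equation. The function names and inputs are hypothetical, and the special case IMOR = 1 is handled separately as an implementation choice, since the formula is then 0/0 and the limit is $ {p}_{ik}^o={p}_{ik} $.

```python
import math


def p_observed(p, q, imor):
    """Event probability among completers given p_ik, q_ik and IMOR_ik."""
    if abs(imor - 1.0) < 1e-8:
        return p  # limiting case: the formula is 0/0 at IMOR = 1
    a = (1 - q) * (1 - imor)
    b = (q - p) * (1 - imor) - 1
    discriminant = b * b - 4 * p * a
    return (-b - math.sqrt(discriminant)) / (2 * a)


def p_missing(p_obs, imor):
    """Event probability among missing participants implied by the IMOR."""
    odds = imor * p_obs / (1 - p_obs)
    return odds / (1 + odds)


# Hypothetical arm: p = 0.5, 20% MOD, odds of event doubled in the missing.
p, q, imor = 0.5, 0.2, 2.0
po = p_observed(p, q, imor)
pm = p_missing(po, imor)
print(po, pm, po * (1 - q) + pm * q)  # the last value recovers p
```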

Under the selection model, IMOR is defined as the ratio of the odds of MOD conditional on those with the underlying event to the odds of MOD conditional on those participants without the underlying event in arm k of trial i [4, 11]:

$$ {\varphi}_{ik}=\frac{c_{1, ik}/\left(1-{c}_{1, ik}\right)}{c_{0, ik}/\left(1-{c}_{0, ik}\right)} $$

Then, c_{1, ik} and c_{0, ik} can be parameterised with regard to φ_ik in the logarithmic scale (i.e. log(φ_ik) = δ_ik) and parameter γ_ik that indicates the average MOD across underlying event and underlying non-event in arm k of trial i as follows [4, 11]:

$$ {\displaystyle \begin{array}{c} logit\left({c}_{1, ik}\right)={\gamma}_{ik}+{\delta}_{ik}/2\\ {} logit\left({c}_{0, ik}\right)={\gamma}_{ik}-{\delta}_{ik}/2\end{array}} $$

with

$$ {\gamma}_{ik}=\frac{logit\left({c}_{1, ik}\right)+ logit\left({c}_{0, ik}\right)}{2} $$
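To see that this parameterisation is consistent with the definition of φ_ik, the sketch below (hypothetical names and values) maps γ_ik and δ_ik to c_{1, ik} and c_{0, ik} and then recovers both parameters from the resulting probabilities.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def missingness_probabilities(gamma, delta):
    """c_1 and c_0 from the average logit MOD (gamma) and log IMOR (delta)."""
    c1 = expit(gamma + delta / 2.0)
    c0 = expit(gamma - delta / 2.0)
    return c1, c0


# Hypothetical values: average MOD of 10% on the logit scale, IMOR = 2.
gamma, delta = logit(0.1), math.log(2.0)
c1, c0 = missingness_probabilities(gamma, delta)
print(logit(c1) - logit(c0))          # recovers delta, the log IMOR
print((logit(c1) + logit(c0)) / 2.0)  # recovers gamma
```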

In both missingness models, IMOR takes positive values, with an IMOR equal to 1 corresponding to MAR. In both missingness models, we use equations (2) and (3) with a random-effects model for θ_{i, k1} to apply the random-effects NMA model with consistency equations.

Similar to OR, IMOR is modelled on the logarithmic scale but is back-transformed in order to aid interpretation. A natural choice is then to apply a normal prior distribution on δ_ik:

$$ {\delta}_{ik}\sim N\left({\varDelta}_{ik},{\sigma}_{ik}^2\right) $$

where Δ_ik is the average belief about the missingness scenario in arm k of trial i and $ {\sigma}_{ik}^2 $ is the uncertainty about this belief.

Other missingness parameters that have been proposed are the event probability ratio within a pattern-mixture model by Akl et al. [6], and the response probability ratio within a selection model by Magder [23]. Being ratios of risks, these missingness parameters are more likely to be used alongside the risk ratio as the effect measure. Turner et al. [10] also reported these missingness parameters in the context of a Bayesian framework. In the present study, we preferred IMOR to these alternatives because it is intuitively related to OR and shares the same statistical properties (i.e. symmetry and prediction of event rates within [0, 1]) [2].

Identical and hierarchical structure of normal prior distribution for δ_ik

Identical structure was the preferred prior structure in the majority of methodological articles (Additional file 1: Table S1) and is the simplest assumption as it yields the fewest parameters to estimate. Under this structure, δ_ik is assumed identical across arms at one of three levels, depending on whether the missingness mechanism is taken to be common in the whole network:

$$ {\delta}_{ik}=\delta,\ \delta \sim N\left(\varDelta,{\sigma}^2\right)\ \mathrm{with}\ {\varDelta}_{ik}=\varDelta\ \mathrm{and}\ {\sigma}_{ik}^2={\sigma}^2, $$

trial-related:

$$ {\delta}_{ik}={\delta}_i,\ {\delta}_i\sim N\left({\varDelta}_i,{\sigma}_i^2\right)\ \mathrm{with}\ {\varDelta}_{ik}={\varDelta}_i\ \mathrm{and}\ {\sigma}_{ik}^2={\sigma}_i^2, $$

or intervention-related:

$$ {\delta}_{ik}={\delta}_{t_{ik}},\ {\delta}_{t_{ik}}\sim N\left({\varDelta}_{t_{ik}},{\sigma}_{t_{ik}}^2\right)\ \mathrm{with}\ {\varDelta}_{ik}={\varDelta}_{t_{ik}}\ \mathrm{and}\ {\sigma}_{ik}^2={\sigma}_{t_{ik}}^2. $$

In the present study, we considered σ², $ {\sigma}_i^2 $ and $ {\sigma}_{t_{ik}}^2 $ to be the same: $ {\sigma}^2={\sigma}_i^2={\sigma}_{t_{ik}}^2 $.

Hierarchical structure assumes that the δ_ik are different yet related to each other, which allows ‘information to be borrowed’ across arms. The δ_ik may be related within the whole network (common-within-network):

$ {\delta}_{ik}\sim N\left(\varDelta,{\sigma}^2\right) $ with $ \varDelta\sim N\left(\xi,{\psi}^2\right) $ and $ \sigma\sim U\left(0,\psi\right) $,

trial-specific (i.e. across different interventions in the same trial):

$ {\delta}_{ik}\sim N\left({\varDelta}_i,{\sigma}_i^2\right) $ with $ {\varDelta}_i\sim N\left({\xi}_i,{\psi}_i^2\right) $ and $ {\sigma}_i\sim U\left(0,{\psi}_i\right) $,

or intervention-specific (i.e. across different trials for the same intervention):

$ {\delta}_{ik}\sim N\left({\varDelta}_{t_{ik}},{\sigma}_{t_{ik}}^2\right) $ with $ {\varDelta}_{t_{ik}}\sim N\left({\xi}_{t_{ik}},{\psi}_{t_{ik}}^2\right) $ and $ {\sigma}_{t_{ik}}\sim U\left(0,{\psi}_{t_{ik}}\right) $,

with ξ, ξ_i and $ {\xi}_{t_{ik}} $ being the means of the hyper-parameters Δ, Δ_i and $ {\varDelta}_{t_{ik}} $, respectively, and ψ², $ {\psi}_i^2 $ and $ {\psi}_{t_{ik}}^2 $ being the corresponding variances. In the present study, we considered ψ², $ {\psi}_i^2 $ and $ {\psi}_{t_{ik}}^2 $ to be the same: $ {\psi}^2={\psi}_i^2={\psi}_{t_{ik}}^2 $. We assigned a uniform distribution on σ, σ_i and $ {\sigma}_{t_{ik}} $; however, researchers may consider other appropriate prior distributions for variance components [24]. Turner et al. [10] also briefly presented the independent structure, which is the weakest assumption but yields the most parameters to estimate; we did not consider it in the present study.
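The difference between the identical and the hierarchical structure can be made concrete by drawing from the two priors in the common-within-network case. The Python sketch below is a prior simulation only, not the JAGS code of Additional file 3. The function names and the defaults (zero mean, unit variance) are hypothetical choices made for illustration.

```python
import random


def draw_identical_common(n_arms, mean=0.0, sd=1.0, rng=random):
    """Identical structure: one log IMOR shared by every arm in the network."""
    delta = rng.gauss(mean, sd)
    return [delta] * n_arms


def draw_hierarchical_common(n_arms, xi=0.0, psi=1.0, rng=random):
    """Hierarchical structure: arm-level log IMORs scattered around a mean.

    The common mean is drawn from N(xi, psi^2) and the between-arm standard
    deviation from U(0, psi), mirroring the common-within-network prior.
    """
    common_mean = rng.gauss(xi, psi)
    sigma = rng.uniform(0.0, psi)
    return [rng.gauss(common_mean, sigma) for _ in range(n_arms)]


rng = random.Random(2019)  # fixed seed so the illustration is repeatable
print(draw_identical_common(4, rng=rng))     # four equal values
print(draw_hierarchical_common(4, rng=rng))  # four related but distinct values
```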

Missingness scenarios using δ_ik

On average MAR (i.e. $ \varDelta ={\varDelta}_i={\varDelta}_{t_{ik}}=0 $ and $ \xi ={\xi}_i={\xi}_{t_{ik}}=0 $ under identical and hierarchical structure, respectively) with moderate prior variance of δ_ik (i.e. σ² = 1 and ψ² = 1 under identical and hierarchical structure, respectively) was the principal scenario in the present study. In addition, we considered the following extreme scenarios for identical structure only and we applied them under the pattern-mixture model (again with σ² = 1):

$ {e}^{\varDelta_{t_{ik}}}=2 $: the odds of an event in missing participants is twice the odds of an event in completers across all interventions – we call this scenario ‘more missing cases are events’ (MME);
$ {e}^{\varDelta_{t_{ik}}}=1/2 $: the odds of an event in completers is twice the odds of an event in missing participants across all interventions – we call this scenario ‘more missing cases are non-events’ (MMNE);
the odds of an event in missing participants is twice the odds of an event in completers in all non-reference interventions of the network (i.e. $ {e}^{\varDelta_{t_{ik}}}=2 $ for t_ik ≠ 1 with 1 being the reference of the network), whereas the opposite holds for the reference intervention (i.e. $ {e}^{\varDelta_1}=1/2 $ with 1 being the reference of the network) – we call this scenario ‘more missing cases are events for the non-reference interventions of the network’ (best-case scenario (BC) for the non-reference interventions); and
the odds of an event in completers is twice the odds of an event in missing participants in all non-reference interventions of the network (i.e. $ {e}^{\varDelta_{t_{ik}}}=1/2 $ for t_ik ≠ 1 with 1 being the reference of the network), whereas the opposite holds for the reference intervention (i.e. $ {e}^{\varDelta_1}=2 $ with 1 being the reference of the network) – we call this scenario ‘more missing cases are non-events for the non-reference interventions of the network’ (worst-case scenario (WC) for the non-reference interventions).
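The scenarios listed above differ only in the prior mean of the log IMOR assigned to each intervention. A minimal Python sketch of that mapping follows. The function name and the scenario labels used as string keys are hypothetical, and the reference intervention is assumed to be labelled 1 as in the text.

```python
import math


def prior_mean_log_imor(scenario, intervention, reference=1):
    """Prior mean of the log IMOR for one intervention under a scenario."""
    is_reference = intervention == reference
    if scenario == "MAR":
        return 0.0
    if scenario == "MME":
        return math.log(2.0)
    if scenario == "MMNE":
        return math.log(0.5)
    if scenario == "BC":   # best case for the non-reference interventions
        return math.log(0.5) if is_reference else math.log(2.0)
    if scenario == "WC":   # worst case for the non-reference interventions
        return math.log(2.0) if is_reference else math.log(0.5)
    raise ValueError(f"unknown scenario: {scenario}")


# Prior means for a hypothetical network with interventions 1, 2 and 3.
for scenario in ("MAR", "MME", "MMNE", "BC", "WC"):
    print(scenario, [round(prior_mean_log_imor(scenario, t), 3) for t in (1, 2, 3)])
```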

Ideally, $ {\varDelta}_{t_{ik}}\ne 0 $ should be defined based on expert judgment tailored to the condition and interventions studied; however, we used the values we applied in our previous work [11].

Research questions investigated

We re-analysed all 29 networks while considering the aforementioned missingness models and structures of normal prior distribution for δ_ik. Initially, we investigated (i) whether there is agreement between on average MAR and extreme scenarios (analysis A1); and (ii) whether accounting for uncertainty due to MOD agrees with ignoring it under MAR and extreme scenarios (analysis A2). Then, we evaluated (i) whether there is agreement between identical and hierarchical prior structure for δ_ik while considering δ_ik to be common-within-network, trial- and intervention-specific (analysis B1); (ii) whether there is agreement among further structural assumptions (i.e. common-within-network, trial- and intervention-specific) when δ_ik has identical prior structure (analysis B2a) and when δ_ik has hierarchical prior structure (analysis B2b); and (iii) whether there is agreement between pattern-mixture and selection model while considering δ_ik to be common-within-network, trial- and intervention-specific (analysis B3). Lastly, as an additional analysis, we investigated whether moderate prior variance of δ_ik (σ² = 1, applied in all aforementioned analyses) agrees with conservative (σ² = 4) and liberal (σ² = 0.25) prior variance of δ_ik (analysis C1); the liberal variance carries more information about the missingness mechanism. These prior variance values for δ_ik have been recommended by White et al. [3, 4]. Details on missingness models, structures of δ_ik and missingness scenarios considered in each analysis can be found in Table 1.
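To give a feel for what these three prior variances imply, the sketch below back-transforms the central 95% range of a normal prior on the log IMOR to the IMOR scale. It is an illustrative calculation under the on average MAR prior mean of 0, and the function name is hypothetical.

```python
import math


def imor_prior_interval(mean_log_imor, variance, z=1.96):
    """Central 95% prior interval for the IMOR under a normal prior on its log."""
    sd = math.sqrt(variance)
    return math.exp(mean_log_imor - z * sd), math.exp(mean_log_imor + z * sd)


# Liberal, moderate and conservative prior variances, centred on MAR.
for variance in (0.25, 1.0, 4.0):
    lower, upper = imor_prior_interval(0.0, variance)
    print(f"variance {variance}: IMOR between {lower:.2f} and {upper:.2f}")
```

With σ² = 1 the interval runs from roughly 0.14 to 7.1, so the moderate prior already allows the odds of an event in missing participants to be several times larger or smaller than in completers.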

Table 1 Research questions investigated and description of applied missingness models and prior structures for δ_ik

Network estimates and measure of disagreement

We obtained the posterior distributions of log ORs for the basic parameters, τ², inconsistency factors (IF; difference between direct and indirect estimates for a comparison in a closed loop, that is, a polygon that connects three or more interventions [14]) through the node-splitting approach [25, 26], SUCRAs (surface under the cumulative ranking) and posterior median rankings for all studied interventions [27]. A brief explanation of the node-splitting approach and SUCRAs can be found in Additional file 2. For each analysis, we measured disagreement between compared methods in terms of NMA estimates using the difference in posterior mean of log ORs, IFs and SUCRAs, and the ratio of posterior median of τ². Furthermore, we measured disagreement between compared methods in terms of uncertainty around NMA estimates using the ratio of posterior standard deviation of log ORs, τ² and IFs, and the difference in posterior standard deviation of SUCRAs. Moreover, we measured disagreement between compared methods in terms of δ_ik (analyses B1, B3, and C1) using differences in posterior mean and ratio of posterior standard deviation under the corresponding structural assumptions (Table 1).
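For readers unfamiliar with SUCRA, the following Python sketch uses the standard definition: for each intervention, the cumulative probabilities of being among the best b interventions are averaged over b = 1, …, T − 1. It is an illustration under that textbook definition rather than the code used in the study, and the function name and rank probabilities are hypothetical.

```python
def sucra(rank_probabilities):
    """SUCRA per intervention from a T x T matrix of rank probabilities.

    rank_probabilities[t][b] is the probability that intervention t has
    rank b + 1, where rank 1 is the best. Values lie in [0, 1]; multiply
    by 100 to express them as percentages.
    """
    n_interventions = len(rank_probabilities)
    values = []
    for probs in rank_probabilities:
        cumulative = 0.0
        area = 0.0
        for b in range(n_interventions - 1):
            cumulative += probs[b]   # P(rank <= b + 1)
            area += cumulative
        values.append(area / (n_interventions - 1))
    return values


# Hypothetical network in which the ranking is known with certainty.
print(sucra([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
```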

Presentation of results using Bland-Altman plots and Cohen’s kappa statistic

We used Bland-Altman plots to investigate the level of agreement in all analyses [28]. In each Bland-Altman plot, we displayed the average bias (i.e. the mean of the differences, or the exponential of the mean of the log ratios) and the 95% limits of agreement (LoA; the mean of the differences or log ratios ± 1.96 ∙ s_D, where s_D is the standard deviation of the differences or log ratios, respectively) [28]. We decided in advance to consider compared methods as having good agreement when the average bias was close to 0 (for differences) or 1 (for ratios) and most of the points were uniformly scattered within the LoA – the narrower the LoA, the better the agreement. Agreement in terms of posterior median of rankings was investigated using heat-maps.
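The quantities displayed in each Bland-Altman plot can be computed as in the Python sketch below. It is an illustration, not the plotting code used in the study. The function names and input values are hypothetical, and for ratios the limits of agreement are assumed to be back-transformed by exponentiation, in the same way as the average bias.

```python
import math
import statistics


def bland_altman_differences(x, y):
    """Average bias and 95% LoA for paired estimates compared by difference."""
    differences = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(differences)
    s_d = statistics.stdev(differences)
    return bias, (bias - 1.96 * s_d, bias + 1.96 * s_d)


def bland_altman_ratios(x, y):
    """Average bias and 95% LoA for paired positive estimates compared by ratio.

    Computed on the log scale and exponentiated (assumed back-transformation).
    """
    log_ratios = [math.log(a / b) for a, b in zip(x, y)]
    mean_log = statistics.mean(log_ratios)
    s_d = statistics.stdev(log_ratios)
    limits = (math.exp(mean_log - 1.96 * s_d), math.exp(mean_log + 1.96 * s_d))
    return math.exp(mean_log), limits


# Hypothetical posterior means of log ORs under two strategies.
bias, limits = bland_altman_differences([0.5, 1.0, 1.5], [0.4, 1.1, 1.3])
print(bias, limits)
```

The x-axis of each plot is the average of the two paired estimates, which is why the results refer to "averages" of posterior means or medians.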

Furthermore, in each analysis, we compared strength and direction of evidence in posterior mean of log ORs and posterior mean of IFs. For that purpose, we applied Cohen’s kappa statistic (a coefficient that measures the inter-rater agreement for nominal items) [29] and we presented the estimated statistic alongside its 95% confidence interval. We used the divisions of agreement reported in Landis and Koch [30] in order to interpret this statistic. In a similar way, we worked with the extent of τ² in each network, where we considered empirical distributions tailored to studied outcome and intervention-comparison type per network in order to determine posterior median of τ² as low (less than the median of empirical distribution), moderate (between median and 3rd quartile) and large (above 3rd quartile) [31].
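Cohen's kappa compares the observed proportion of agreement with the proportion expected by chance from the marginal frequencies. The sketch below is a minimal Python illustration with the Landis and Koch bands attached. The function names and the category labels in the example are hypothetical, since the exact categories for strength and direction of evidence are not defined in this section, and the confidence interval is not computed.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of nominal labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Verbal interpretation of kappa following Landis and Koch."""
    if kappa < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical conclusions for four comparisons under two strategies.
method_1 = ["favours", "favours", "inconclusive", "inconclusive"]
method_2 = ["favours", "inconclusive", "inconclusive", "inconclusive"]
kappa = cohens_kappa(method_1, method_2)
print(kappa, landis_koch(kappa))
```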

Model specification

All NMA models were fitted using JAGS via the R package R2jags [32] (statistical software R, version 3.3.1 [33]), whereas the node-splitting model was performed using the R package gemtc [25, 34] in conjunction with the node-splitting model of Dias et al. [26]. Further information on specification of the NMA models and node-splitting approach (e.g. prior distributions assigned and diagnostic evaluation of convergence) can be found in the web appendix (Additional file 2). The codes to run all NMA models in JAGS can be found in Additional file 3, whereas the analysed dataset can be found in Additional file 4. We created the Bland-Altman plots ourselves using the R packages ggplot2 and cowplot [35, 36].

Results

Distribution of MOD across health-related fields

Out of 29 NMAs, 14 (48%) were judged to have ‘moderate and balanced’ MOD, followed by 12 (41%) with ‘low’ MOD, two with ‘moderate and unbalanced’ MOD, and one with ‘large and unbalanced’ MOD (Additional file 1: Tables S2 and S3). No network fell into the ‘large and balanced’ MOD category.

Overall, there was great dispersion of total %MOD (blue violin plots) across trials in all health-related fields (Fig. 1). In comparison with dermatology, diabetes, infections and ophthalmology, total %MOD for the remaining health-related fields were distributed across a greater range – most of them exceeding 10%. On the contrary, differences in %MOD between compared arms (red violin plots) were relatively less dispersed across health-related fields, except for cardiology, neurology, respiratory, rheumatology and urology (Fig. 1).

Implications of extreme scenarios about the missingness mechanism

Overall, differences in posterior mean of log ORs fell within much narrower LoA for on average MAR versus MME and MMNE than for on average MAR versus BC and WC, where almost all differences were concentrated systematically below and above 0, respectively, for networks with moderate and large MOD (Fig. 2). Most ratios of posterior median of τ² were uniformly scattered at low averages of posterior median of τ² (approximately below 0.15). In line with log ORs, differences in posterior mean of IFs and posterior mean of SUCRAs, as well as ratios of posterior standard deviations, fell overall within narrower LoA for on average MAR versus MME and MMNE than for on average MAR versus BC and WC (Fig. 2; Additional file 5: Figure S1(a)). Generally, there were small perturbations in posterior median of rankings (Additional file 5: Figure S1(b)).

Implications of discounting uncertainty due to MOD

Discounting uncertainty due to MOD led to systematically larger posterior mean of log ORs for MMNE and BC scenarios, yet systematically smaller posterior mean of log ORs for WC scenario, especially for moderate and large MOD (Fig. 3). The majority of ratios of posterior standard deviation of log ORs were systematically above 1 across all scenarios indicating a tendency for increased precision when uncertainty due to MOD was ignored (Additional file 5: Figure S2(a)).

Interestingly, posterior median of τ² was systematically larger when uncertainty due to MOD was ignored regardless of scenario (Fig. 3). Overall, ignoring uncertainty due to MOD led to slightly smaller and larger posterior mean of SUCRAs for averages below 50% and above 75%, respectively, regardless of scenario. Most differences in posterior standard deviation of SUCRAs were systematically positive across all scenarios after discounting uncertainty due to MOD, indicating a tendency for increased precision. Generally, there was little implication for posterior median of rankings (Additional file 5: Figure S2(b)).

Agreement between identical and hierarchical prior structure for δ_ik

Imposing identical, as opposed to hierarchical, structure on δ_ik led to systematically larger posterior median of τ² across all structural assumptions for δ_ik; however, ratios of posterior standard deviation of τ² were uniformly scattered (Fig. 4; Additional file 5: Figure S3(a)). Overall, differences fell within quite narrow LoA in terms of posterior mean of log ORs (mostly in the case of low MOD), posterior mean of IFs and posterior mean of SUCRAs, as did ratios of posterior standard deviations (especially for log ORs and SUCRAs under the intervention-specific assumption) (Fig. 4; Additional file 5: Figure S3(a)). In general, perturbations for posterior median of rankings were small (Additional file 5: Figure S3(b)).

In all structural assumptions, the majority of differences in posterior mean of log IMORs (i.e. δ and Δ for identical and hierarchical structure, respectively), especially those corresponding to networks with low MOD, were uniformly scattered around 0 and in a range from −0.25 to 0.25 averages of posterior mean of log IMORs (Additional file 5: Figure S3(c)). Ratios of posterior standard deviation of log IMORs were also scattered uniformly in narrow LoA (especially under the trial-specific structure).

Agreement among different prior structures for δ_ik

Under identical structure, differences in terms of posterior mean of log ORs, posterior mean of IFs and posterior mean of SUCRAs as well as ratios of posterior standard deviations were scattered in narrower LoA when common-within-network was compared with trial-specific prior structure (Additional file 5: Figure S4(a-c)). Particularly interesting were the results on posterior standard deviation of log ORs and SUCRAs as they were systematically larger under intervention-specific prior structure, especially in the case of moderate and large MOD (Additional file 5: Figure S4(b)). Under hierarchical structure, inferences were similar to those under identical structure for all NMA components (Additional file 5: Figure S5(a-c)).

Agreement between pattern-mixture model and selection model

Assuming common-within-network or intervention-specific prior structure on identical δ_ik led to relatively wider LoA for posterior mean of log ORs and SUCRAs, as opposed to trial-specific prior structure, where differences were uniformly scattered within narrower LoA (Fig. 5). Overall, ratios of posterior standard deviation of all NMA estimates were scattered within narrow LoA (especially for trial-specific structure) (Additional file 5: Figure S6(a)). Perturbations for posterior median of rankings were small (Additional file 5: Figure S6(b)). Most posterior means of δ_ik were scattered uniformly around 0, with averages of posterior mean of δ_ik ranging from −0.5 to 0.5, for all prior structures (Additional file 5: Figure S6(c)). Results for posterior standard deviation of δ_ik were particularly interesting: the selection model led to imprecise δ_ik more frequently than the pattern-mixture model for all prior structures, especially for moderate and large MOD.

Overall, there was agreement in strength and direction of posterior mean of log ORs and posterior mean of IFs in all analyses (Additional file 1: Tables S4–S9). The level of agreement in the extent of τ² could not be judged with confidence because few τ² values were estimated (only 29).

Additional analysis

Different prior values for the variance of δ_ik

Using conservative prior variance led to systematically smaller posterior median of τ², yet systematically larger posterior standard deviation of log ORs and of SUCRAs (Additional file 5: Figure S7(a)). Conversely, using liberal prior variance led to systematically smaller posterior standard deviations of log ORs and SUCRAs. Overall, differences between moderate and conservative prior variance ranged within wider LoA in terms of posterior distribution of NMA estimates than differences between moderate and liberal prior variance. Implications for posterior median of rankings were small (Additional file 5: Figure S7(b)). There was poor agreement between moderate and alternative prior variances in terms of posterior mean and posterior standard deviation of δ_ik, as indicated by evidence of proportional bias (Additional file 5: Figure S7(c)). Compared to moderate prior variance, posterior mean of δ_ik was scattered across twice the range under conservative prior variance but half the range under liberal prior variance (Additional file 5: Figure S7(d)). Furthermore, posterior standard deviation of δ_ik did not concur between moderate and alternative prior variances: the moderate prior variance always gave smaller posterior standard deviations than the conservative prior variance and larger ones than the liberal prior variance (Additional file 5: Figure S7(d)). Overall, there was good agreement in strength and direction of posterior mean of log ORs and posterior mean of IFs (Additional file 1: Table S10). The level of agreement in the extent of τ² could not be judged with confidence.

Discussion

Using a collection of 29 NMAs from a wide range of health-related fields [16], we have performed the first empirical study on the most frequently described Bayesian modelling strategies for binary MOD in meta-analyses and elucidated their implications for core NMA estimates.

We found that considering BC or WC scenarios resulted systematically in much larger and much smaller log ORs, respectively, particularly when the network was dominated by trials with moderate or large MOD (Fig. 2). A number of methodological articles have illustrated these implications in the context of pairwise and network meta-analysis using invented or real-life examples [2, 4, 8, 10, 11]. Some of the authors judged these scenarios to be unrealistic for primary and sensitivity analysis, especially for considerable numbers of missing participants in included trials [2].

Furthermore, we found that ignoring uncertainty due to MOD could affect estimation of NMA components. Specifically, such a strategy yielded systematically smaller posterior standard deviation of log ORs and of SUCRA values, systematically larger posterior mean of log ORs and larger posterior median of τ² when coupled with extreme scenarios, and slight exaggeration of the potency of highly ranked interventions in terms of SUCRA value. In our previous study, we showed that fixing δ_ik, while considering BC or WC scenarios, considerably perturbed log ORs and inflated τ² even in the case of low MOD [11]. White et al. [3], Turner et al. [10], Spineli et al. [11], and Spineli [37] also indicated an association between τ² inflation and fixing the observations or the missingness parameter, especially under extreme scenarios. A possible explanation is that fixing the observations or the missingness parameter reduces uncertainty about the trial-specific estimates and hence exposes the extent of τ².

We found that pattern-mixture and selection models yielded similar results, particularly when trial-specific structure was considered for δ_ik. White et al. [4] compared selection model with pattern-mixture model in a real meta-analysis and found a tendency of the former to provide slightly larger ORs. Nevertheless, we found that selection model yielded imprecise δ_ik and, by extension, reduced our ability to learn about the missingness mechanism with certainty.
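To make the role of the IMOR in the pattern-mixture model more concrete, the following minimal sketch may help. It assumes the usual definition of the IMOR as the ratio of the odds of an event among missing participants to the odds among observed participants, so that the log IMOR shifts the event risk of missing participants on the logit scale. The function and argument names (`overall_risk`, `p_obs`, `prop_missing`, `log_imor`) and the input values are hypothetical illustrations and are not taken from the models in Additional file 3.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def overall_risk(p_obs, prop_missing, log_imor):
    """Event risk in one trial arm under a pattern-mixture formulation.

    p_obs        : event risk among participants with observed outcome
    prop_missing : proportion of randomised participants with missing outcome
    log_imor     : log IMOR (0 corresponds to MAR on average)
    """
    # Risk among missing participants: shift the observed log odds by log IMOR.
    p_mis = expit(logit(p_obs) + log_imor)
    # Mix the two patterns according to the proportion of missing participants.
    return (1.0 - prop_missing) * p_obs + prop_missing * p_mis


# Illustrative (made-up) arm: 30% observed risk, 20% missing participants.
for delta in (-1.0, 0.0, 1.0):
    print(f"log IMOR = {delta:+.1f} -> overall risk = "
          f"{overall_risk(0.30, 0.20, delta):.4f}")
```

With a log IMOR of 0 the overall risk equals the observed risk, and the further the log IMOR departs from 0, and the larger the proportion of missing participants, the more the overall risk moves away from it. In a Bayesian analysis the log IMOR would carry a prior distribution rather than a fixed value, which is what allows uncertainty due to MOD to propagate into the NMA estimates.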

Making different assumptions about prior structure of δ_ik added further insights into implications of MOD on NMA estimates. Selecting between identical and hierarchical structure mostly affected estimation of τ², whereas the decision to select common-within-network, trial-specific or intervention-specific prior structure for δ_ik mostly affected uncertainty around the estimation of log ORs and SUCRA values, especially in the case of moderate and large MOD. We found that the intervention-specific structure led to systematically larger posterior standard deviation of log ORs and SUCRAs as opposed to the other prior structures for δ_ik. A possible explanation might be the following: since most networks had either low or moderate but balanced MOD across trials, the common-within-network and trial-specific structure (which assumed that MOD were equally informative in the whole network or in all arms of each trial, respectively [3]) assigned relatively larger weight on these trials as opposed to the intervention-specific structure (which assumed that MOD were differently informative in the arms of each trial [3]); the latter was affected by the extent of total MOD in each trial [3].

Nevertheless, as mentioned by Turner et al. [10], structural assumptions for the missingness parameter would be best led by experts and tailored to the condition and interventions investigated, since different prior structures may affect our ability to learn about the missingness mechanisms in a specific meta-analysis and, by extension, may impact meta-analysis results. In the context of NMA, the analyst deals with multiple interventions that are applied to a wider patient setting and thus, interventions may be associated with different degrees of MOD in different comparisons and possibly with different missingness mechanisms. Consequently, we view common-within-network to be a rather implausible structure, especially in networks that include interventions of different functionality (e.g. placebo and active interventions), as the missingness mechanisms are expected to differ across interventions.

The shortcomings of our study must be acknowledged. First, we were able to extract arm-level binary outcome data in every trial in only 29 (11%) out of 273 NMAs with MOD due to severe limitations in reporting quality of the reviews [16]. As a result, there was scarcity of points in Bland-Altman plots for τ² and δ_ik for the common-within-network structure that prevented us from fully understanding method performance when compared for these components. Nevertheless, we would not expect our conclusions to differ should a larger dataset be collected. Furthermore, the limited number of extracted networks did not allow us to thoroughly learn about the implications of extent of MOD (in terms of prevalence and imbalance) on NMA estimates, since relevant groups (as defined in Methods under Characterising networks based on prevalence and balance of MOD) were considerably unbalanced in frequency (Results under Distribution of MOD across health-related fields).

Second, using the extraction criteria we developed in a previous work [38], we found that extraction quality was unacceptable in 23 (79%) reviews, because reviewers provided no information on observed outcome or how MOD were handled, whereas for the remaining 6 reviews, extraction was judged as unclear, since only information on observed outcome was unavailable (Additional file 1: Table S11). Consequently, no distinction could be made between observed and imputed outcomes in order to achieve an accurate extraction. In nine networks, unacceptable extraction manifested as calculated negative non-events in some of the included trials, which we removed in order to be able to perform NMA. For discussion on the issue of negative non-events the reader could refer to Spineli [38].

Ideally, good agreement should reflect clinically meaningful differences in measurements of compared methods [28]. We determined two methods as having good agreement when average bias was close to 0 (for differences) or 1 (for ratios) and points were uniformly scattered within narrow LoA. Since we dealt with many different conditions and clinical outcomes, it was not possible to decide in advance on a specific clinically meaningful average bias that would indicate good agreement between compared methods.
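The agreement criterion described above can be sketched for the case of differences as follows. This is only an illustration of the standard Bland-Altman quantities [28], assuming 95% LoA computed as average bias ± 1.96 standard deviations of the differences; the function name `bland_altman` and the input values are hypothetical, and the case of ratios (where good agreement corresponds to a value of 1) is not covered.

```python
import statistics


def bland_altman(method_a, method_b):
    """Bland-Altman summary for paired estimates from two methods.

    Returns the per-pair averages (x-axis of the plot), the per-pair
    differences (y-axis), the average bias and the 95% limits of agreement.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods must provide the same number of estimates")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2.0 for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Made-up posterior means of log ORs under two strategies, for illustration only.
strategy_1 = [0.5, 1.0, 1.5, 2.0]
strategy_2 = [0.4, 1.1, 1.4, 2.1]
summary = bland_altman(strategy_1, strategy_2)
print("average bias:", round(summary["bias"], 3))
print("95% LoA:", tuple(round(x, 3) for x in summary["loa"]))
```

Whether a given LoA width counts as narrow remains a judgement that depends on the condition and outcome, which is the difficulty noted above.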

Finally, normal prior distributions on log IMORs were specified using values for mean and variance as recommended in relevant methodological articles [3, 4, 11] rather than based on expert opinion. Ideally, informative prior distributions should be elicited tailored to the clinical condition and interventions studied, since the extent and reasons for MOD are expected to vary across different conditions and interventions [10]. Empirical elicitation studies are needed to provide us with proper prior distributions for log IMORs.

Recommendations for good practice

While the focus of our study was on systematic reviews with NMA, the following recommendations also apply to systematic reviews with pairwise meta-analyses.

In line with other authors [2, 39, 40, 41], perform a primary analysis under the on-average MAR assumption, and opt for clinically plausible assumptions as sensitivity analyses in order to explore robustness of primary analysis results.
Avoid fixing the dataset either by imputing or excluding MOD before analysis and instead, opt for modelling the missingness mechanism via the IMOR parameter in order to accommodate uncertainty about the missingness scenarios considered.
Consider hierarchical rather than identical structure on δ_ik when MOD are substantial. Nevertheless, further research is needed to clarify conditions for proper utilization of each structure.
Opt for trial-specific prior structure on δ_ik when compared interventions are believed to trigger similar missingness mechanisms as opposed to trial set-up. Consider intervention-specific prior structure on δ_ik when missingness mechanisms are believed to differ across interventions. Avoid the common-within-network prior structure, especially in the case of moderate or large MOD. Consult an expert to discuss the prior structure on δ_ik that best fits the collected trials (i.e. good knowledge of the specific examples being considered and detailed inspection of the properties of included trials are desired). In line with the aforementioned point, further research is needed to understand in depth the performance of NMA components under different prior structures for δ_ik.
When low MOD is present, the choice between pattern-mixture and selection models could be based upon conceptual and computational convenience for the researcher. For considerable MOD, the pattern-mixture model tends to preserve precision in estimation of δ_ik. Nevertheless, further research is needed to understand when it is most appropriate to use one model over the other.
In terms of prior variance for δ_ik, select liberal prior variance (σ² = 0.25) for large MOD and moderate prior variance (σ² = 1) for moderate MOD in order to preserve precision in NMA estimates.
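To give a feel for what the liberal and moderate prior variances in the last recommendation imply on the IMOR scale, the sketch below computes the central 95% prior interval of the IMOR. It assumes a normal prior on the log IMOR with mean 0 (MAR on average); the function name `imor_prior_interval` is a hypothetical helper and not part of the author's code.

```python
import math


def imor_prior_interval(variance, mean=0.0, z=1.96):
    """Central 95% prior interval for the IMOR.

    Assumes log IMOR ~ Normal(mean, variance); the interval on the log scale
    is exponentiated to obtain the interval on the IMOR scale.
    """
    sd = math.sqrt(variance)
    return math.exp(mean - z * sd), math.exp(mean + z * sd)


# Prior variances named in the recommendations above.
for label, variance in (("liberal", 0.25), ("moderate", 1.0)):
    lower, upper = imor_prior_interval(variance)
    print(f"{label:8s} (variance = {variance}): "
          f"IMOR between {lower:.2f} and {upper:.2f}")
```

Under these assumptions, the liberal variance confines the IMOR to roughly 0.38 to 2.66, whereas the moderate variance allows roughly 0.14 to 7.10, which illustrates why the tighter prior tends to preserve precision when MOD are large.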

Conclusions

Addressing MOD using extreme scenarios and/or ignoring uncertainty induced by MOD constitutes a naïve strategy with serious implications for NMA estimates, especially when participant losses in included trials are substantial. Instead, modelling MOD via the log IMOR parameter can ensure credible NMA results via adjustment for attrition bias and, furthermore, offer valuable insights into underlying missingness mechanisms. Researchers should consult an expert in order to decide on the structure of log IMOR that best aligns with the condition and interventions studied and, in addition, to define parameter values of the prior distribution for log IMOR.

Abbreviations

BC: Best-case scenario for all non-reference interventions
IF: Inconsistency factor
IMOR: Informative missingness odds ratio
LoA: Limits of agreement
MAR: Missing at random
MME: More missing cases are event in all interventions
MMNE: More missing cases are non-event in all interventions
MOD: Missing outcome data
NMA: Network meta-analysis
OR: Odds ratio
SUCRA: Surface under the cumulative ranking curve
WC: Worst-case scenario for all non-reference interventions

References

1. Gamble C, Hollis S. Uncertainty method improved on best-worst case analysis in a binary meta-analysis. J Clin Epidemiol. 2005;58:579–88.
2. Higgins JP, White IR, Wood AM. Imputation methods for missing outcome data in meta-analysis of clinical trials. Clin Trials. 2008;5:225–39.
3. White IR, Higgins JP, Wood AM. Allowing for uncertainty due to missing data in meta-analysis--part 1: two-stage methods. Stat Med. 2008;27:711–27.
4. White IR, Welton NJ, Wood AM, Ades AE, Higgins JP. Allowing for uncertainty due to missing data in meta-analysis--part 2: hierarchical models. Stat Med. 2008;27:728–45.
5. Yuan Y, Little RJ. Meta-analysis of studies with missing data. Biometrics. 2009;65:487–96.
6. Akl EA, Johnston BC, Alonso-Coello P, Neumann I, Ebrahim S, Briel M, et al. Addressing dichotomous data for participants excluded from trial analysis: a guide for systematic reviewers. PLoS One. 2013;8:e57132.
7. Ebrahim S, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, et al. Addressing continuous data for participants excluded from trial analysis: a guide for systematic reviewers. J Clin Epidemiol. 2013;66:1014–21 e1.
8. Mavridis D, White IR, Higgins JP, Cipriani A, Salanti G. Allowing for uncertainty due to missing continuous outcome data in pairwise and network meta-analysis. Stat Med. 2015;34:721–41.
9. Dimitrakopoulou V, Efthimiou O, Leucht S, Salanti G. Accounting for uncertainty due to 'last observation carried forward' outcome imputation in a meta-analysis model. Stat Med. 2015;34:742–52.
10. Turner NL, Dias S, Ades AE, Welton NJ. A Bayesian framework to account for uncertainty due to missing binary outcome data in pairwise meta-analysis. Stat Med. 2015;34:2062–80.
11. Spineli LM, Higgins JP, Cipriani A, Leucht S, Salanti G. Evaluating the impact of imputations for missing participant outcome data in a network meta-analysis. Clin Trials. 2013;10:378–88.
12. White IR, Higgins JP. Meta-analysis with missing data. Stata J. 2009;9:57–69.
13. Higgins JPT, Deeks JJ, Altman DG. Special topics in statistics. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Version 5.1.0 (updated March 2011); 2011. The Cochrane Collaboration. http://handbook-5-1.cochrane.org/.
14. Efthimiou O, Debray TP, van Valkenhoef G, Trelle S, Panayidou K, Moons KG, et al. GetReal in network meta-analysis: a review of the methodology. Res Synth Methods. 2016;7:236–63.
15. Petropoulou M, Nikolakopoulou A, Veroniki AA, Rios P, Vafaei A, Zarin W, et al. Bibliographic study showed improving statistical methodology of network meta-analyses published between 1999 and 2015. J Clin Epidemiol. 2017;82:20–8.
16. Spineli LM, Yepes-Nuñez JJ, Schünemann HJ. A systematic survey shows that reporting and handling of missing outcome data in networks of interventions is poor. BMC Med Res Methodol. 2018;18:115.
17. Lee AW. Review of mixed treatment comparisons in published systematic reviews shows marked increase since 2009. J Clin Epidemiol. 2014;67:138–43.
18. Nikolakopoulou A, Chaimani A, Veroniki AA, Vasiliadis HS, Schmid CH, Salanti G. Characteristics of networks of interventions: a description of a database of 186 published networks. PLoS One. 2014;9:e86754.
19. Chambers JD, Naci H, Wouters OJ, Pyo J, Gunjal S, Kennedy IR, et al. An assessment of the methodological quality of published network meta-analyses: a systematic review. PLoS One. 2015;10:e0121715.
20. Walter SD. Choice of effect measure for epidemiological data. J Clin Epidemiol. 2000;53:931–9.
21. Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine: how to practice and teach EBM. New York: Churchill Livingstone; 1997.
22. Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Decis Mak. 2013;33:607–17.
23. Magder LS. Simple approaches to assess the possible impact of missing outcome information on estimates of risk ratios, odds ratios, and risk differences. Control Clin Trials. 2003;24:411–21.
24. Lambert PC, Sutton AJ, Burton PR, Abrams KR, Jones DR. How vague is vague? A simulation study of the impact of the use of vague prior distributions in MCMC using WinBUGS. Stat Med. 2005;24:2401–28.
25. van Valkenhoef G, Dias S, Ades AE, Welton NJ. Automated generation of node-splitting models for assessment of inconsistency in network meta-analysis. Res Synth Methods. 2016;7:80–93.
26. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat Med. 2010;29:932–44.
27. Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol. 2011;64:163–71.
28. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–60.
29. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20:37–46.
30. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.
31. Turner RM, Jackson D, Wei Y, Thompson SG, Higgins JP. Predictive distributions for between-study heterogeneity and simple methods for their application in Bayesian meta-analysis. Stat Med. 2015;34:984–98.
32. Su YS, Yajima M. R2jags: Using R to run ‘JAGS’. R package version 0.5–7. 2015. https://CRAN.R-project.org/package=R2jags.
33. R Core Team. A Language and Environment for Statistical Computing. Vienna; 2016. https://www.r-project.org.
34. van Valkenhoef G, Kuiper J. gemtc: Network meta-analysis using Bayesian methods. R package version 0.8-2, 2016. https://github.com/gertvv/gemtc.
35. Chang W. R Graphics Cookbook: practical recipes for visualizing data. 1st ed. California: O’Reilly Media; 2013.
36. Wilke C. cowplot: Streamlined plot theme and plot annotations for ‘ggplot2’. R package version 0.9–3. 2017. https://github.com/wilkelab/cowplot.
37. Spineli LM. Modeling missing binary outcome data while preserving transitivity assumption yielded more credible network meta-analysis results. J Clin Epidemiol. 2019;105:19–26.
38. Spineli LM. Missing binary data extraction challenges from Cochrane reviews in mental health and Campbell reviews with implications for empirical research. Res Synth Methods. 2017;8:514–25.
39. White IR, Carpenter J, Horton NJ. Including all individuals is not enough: lessons for intention-to-treat analysis. Clin Trials. 2012;9:396–407.
40. Guyatt GH, Ebrahim S, Alonso-Coello P, Johnston BC, Mathioudakis AG, Briel M, et al. GRADE guidelines 17: assessing the risk of bias associated with missing participant outcome data in a body of evidence. J Clin Epidemiol. 2017;87:14–22.
41. Akl EA, Kahale LA, Agoritsas T, Brignardello-Petersen R, Busse JW, Carrasco-Labra A, et al. Handling trial participants with missing outcome data when conducting a meta-analysis: a systematic survey of proposed approaches. Syst Rev. 2015;4:98.

Acknowledgments

The author would like to thank Chrysostomos Kalyvas for commenting on earlier versions of the article.

Funding

This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft) [grant number SP 1664/1–1]. The funder had no involvement in study design, collection, analysis and interpretation of data, writing of the report or in the decision to submit the article for publication.

Availability of data and materials

The author declares that all data supporting the findings of this study are available within the article and its Additional files.

Author information

Authors and Affiliations

Midwifery Research and Education Unit, Hannover Medical School, Carl-Neuberg-Str. 1, 30625, Hannover, Germany
Loukia M. Spineli

Contributions

LMS conceived and designed the study; acquired, analysed and interpreted the data; drafted and revised the manuscript. The author read and approved the final manuscript.

Corresponding author

Correspondence to Loukia M. Spineli.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The author has no competing interests to declare.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Table S1. Overview of published methodological and tutorial articles on missing binary outcome data in systematic reviews. Table S2. Distribution of total percentage of missing outcome data per network. Table S3. Distribution of the difference in %MOD between compared arms per network. Table S4. Agreement on direction, strength of evidence and extent of heterogeneity. Table S5. Agreement on direction, strength of evidence and extent of heterogeneity. Table S6. Agreement on direction, strength of evidence and extent of heterogeneity. Table S7. Agreement on direction, strength of evidence and extent of heterogeneity. Table S8. Agreement on direction, strength of evidence and extent of heterogeneity. Table S9. Agreement on direction, strength of evidence and extent of heterogeneity. Table S10. Agreement on direction, strength of evidence and extent of heterogeneity. Table S11. Judgment of accuracy extraction of the eligible networks with justifications. (DOCX 102 kb)

Additional file 2:

Supplementary information of the Methods. (DOCX 184 kb)

Additional file 3:

Code for all network meta-analysis models. (DOCX 41 kb)

Additional file 4:

Analysed dataset of 29 network meta-analyses and selected empirical prior distributions for between-trial variance. (TXT 39 kb)

Additional file 5:

Supplementary Figures. (DOCX 5671 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Spineli, L.M. An empirical comparison of Bayesian modelling strategies for missing binary outcome data in network meta-analysis. BMC Med Res Methodol 19, 86 (2019). https://doi.org/10.1186/s12874-019-0731-y

Received: 09 January 2019
Accepted: 11 April 2019
Published: 24 April 2019
DOI: https://doi.org/10.1186/s12874-019-0731-y
