Tipping point analysis for the between-arm correlation in an arm-based evidence synthesis
BMC Medical Research Methodology volume 24, Article number: 162 (2024)
Abstract
Systematic reviews and meta-analyses are essential tools in contemporary evidence-based medicine, synthesizing evidence from various sources to better inform clinical decision-making. However, the conclusions from different meta-analyses on the same topic can be discrepant, which has raised concerns about their reliability. One reason is that the result of a meta-analysis is sensitive to factors such as study inclusion/exclusion criteria and model assumptions. The arm-based meta-analysis model is growing in importance due to its advantage of including single-arm studies and historical controls with estimation efficiency and its flexibility in drawing conclusions with both marginal and conditional effect measures. Despite its benefits, the inference may heavily depend on the heterogeneity parameters that reflect design and model assumptions. This article aims to evaluate the robustness of meta-analyses using the arm-based model within a Bayesian framework. Specifically, we develop a tipping point analysis of the between-arm correlation parameter to assess the robustness of meta-analysis results. Additionally, we introduce some visualization tools to intuitively display its impact on meta-analysis results. We demonstrate the application of these tools in three real-world meta-analyses, one of which includes single-arm studies.
Introduction
Systematic reviews play a crucial role in evaluating multiple studies on specific research topics, providing valuable evidence to support healthcare guidelines and decision-making. A key component of systematic reviews is meta-analysis, which quantitatively synthesizes evidence from different studies to improve statistical efficiency, reduce bias, and identify discrepancies among studies. A critical challenge in contemporary meta-analyses is their results’ sensitivity to data, model specifications, or inclusion/exclusion criteria; this challenge has prompted various forms of sensitivity analysis. For instance, the fragility index, initially proposed to evaluate the impact of modifications in event status on the statistical significance of clinical trials, has been extended to assess the results of meta-analyses and network meta-analyses of multiple treatment comparisons [1,2,3]. Within the Bayesian framework, the synthesized results, particularly concerning interval estimates, are heavily influenced by the selection of prior distributions [4, 5]. In addition, different meta-analysis models can yield divergent conclusions for medical decision-making. This issue has been illustrated by Cornell et al. [6], who demonstrated inconsistent pooled results using different estimation approaches on the same datasets.
In current meta-analysis practices, the most prevalent models are contrast-based models that focus on estimating treatment contrasts. These models typically predetermine a comparative effect measure and then combine study-specific contrasts into a synthesized effect estimate. Different from contrast-based models, arm-based models include arm-level data [7], focus on estimating arm-specific parameters across studies, and use the estimates of arm-specific parameters to generate a variety of effect estimates for treatment comparisons. In this way, they provide information on the absolute effect of each arm rather than solely focusing on the comparisons between arms. Although arm-based models used to be questioned for breaking the randomization of individual clinical trials [8], studies demonstrated that arm-based models could effectively respect randomization by modeling correlations among treatment groups across trials [9]. In summary, contrast-based models assume exchangeable comparative effects across trials, while arm-based models assume exchangeable absolute effects. Comprehensive comparisons by White et al. [10] and Karahalios et al. [11] concluded that both models are valid tools for meta-analyses but are preferred under different assumptions.
Arm-based models have recently received increasing attention from the evidence synthesis community due to their advantages over contrast-based models. First, arm-based models offer flexibility without necessitating comparative effect measures from each study. They improve efficiency by borrowing information from double-zero events, single-arm studies, and historical controls, with easier model fitting and interpretation than contrast-based models [12]. Second, arm-based models can estimate absolute treatment effects from single-arm studies [12,13,14]. Single-arm trials, frequently used in phase I and II clinical trials, allow researchers to evaluate the safety and preliminary efficacy of a treatment in a small group of participants before proceeding to larger, randomized controlled trials. Third, they can simultaneously estimate various types of effect measures (e.g., odds ratios [ORs], relative risks [RRs], and risk differences [RDs]) based on the estimates of arm-specific parameters (e.g., overall event probabilities). Moreover, arm-based models can also estimate conditional effects given baseline population characteristics [15], making them more feasible than contrast-based models for delivering population-specific synthesized results.
While the arm-based model has its benefits and promising applications, its results rely heavily on the correlation between the outcomes of the treatment arms, a heterogeneity parameter that accounts for randomization within individual clinical trials [9]. In this paper, we refer to it as the between-arm correlation, which may reflect assumptions of meta-analysis designs. Some works showed that different design assumptions led to conflicting conclusions from meta-analyses on the same topic [16, 17]. Moreover, the between-arm correlation indicates the extent of information borrowing when incorporating single-arm studies. As a meta-analysis commonly contains only a limited number of studies, the estimation of this between-arm correlation may be unstable, which implies different design assumptions and affects the validity of the results. Thus, it is critical to assess the impact of its estimate on the robustness of the synthesized conclusions from an arm-based meta-analysis.
This article proposes a set of new methods based on the concept of “tipping point” to address this problem. The robustness of meta-analysis results is assessed in terms of both point estimates (magnitudes of treatment effects) and interval estimates (which inform whether the treatment likely differs from the control). The term “tipping point” is commonly used in missing data imputation and refers to a critical threshold at which the study conclusions change direction [18]. Specifically, tipping point analyses in missing data imputation evaluate the robustness of missing data assumptions by adding a successive shift parameter to overturn the conclusion [19]. These analyses are commonly required by regulatory agencies as a routine measure to address missing data issues in clinical trials. However, they remain less recognized within the evidence synthesis community.
In the subsequent sections of this article, we provide the empirical distribution of between-arm correlations in real-world meta-analyses. Then, we review an arm-based meta-analysis model, the bivariate generalized linear mixed effects model (BGLMM), and highlight the critical role of the correlation parameter in arm-based meta-analyses. Next, we propose novel tipping point methods for assessing the robustness of meta-analytical estimates and visualizing the results. Then, we demonstrate the application of the proposed methods through three case studies, encompassing scenarios with and without single-arm studies. Finally, we conclude the article with discussions.
Methods
Empirical distribution of between-arm correlations
We draw on empirical evidence from a large collection of 69,133 meta-analyses from the Cochrane Database of Systematic Reviews to illustrate the importance of between-arm correlations in arm-based meta-analyses. This database has been utilized in our previous research on evaluating the empirical performance of meta-analysis methods, with the data collection process detailed therein [20]. Here, we included all Cochrane pairwise meta-analyses from 2003 Issue 1 to 2020 Issue 1 with binary outcomes that contain at least six studies. Their between-arm correlations can be illustrated by the correlations of observed event probabilities between the two treatment groups. Figure 1 presents the percentiles of Pearson’s correlation coefficients among these Cochrane meta-analyses, categorized by the number of studies. The median Pearson’s correlation coefficient was generally around 0.7. Although the distribution ranges typically shrank as the number of studies increased, the ranges were still wide when meta-analyses contained quite large numbers of studies, with 2.5% and 97.5% quantiles being roughly 0 and 1, respectively. These observations indicate that between-arm correlations are mostly positive, but they can span a wide range, possibly affecting effect estimation.
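As a rough illustration of the descriptive statistic used here, the sketch below computes Pearson's correlation between the observed event proportions of the two arms within a single meta-analysis. The function name `between_arm_pearson` and the six-study counts are made up for illustration; they are not taken from the Cochrane data.

```python
import numpy as np

def between_arm_pearson(events_ctrl, n_ctrl, events_trt, n_trt):
    """Pearson correlation of observed event proportions between the two arms."""
    p_ctrl = np.asarray(events_ctrl, dtype=float) / np.asarray(n_ctrl, dtype=float)
    p_trt = np.asarray(events_trt, dtype=float) / np.asarray(n_trt, dtype=float)
    return float(np.corrcoef(p_ctrl, p_trt)[0, 1])

# Hypothetical meta-analysis with six studies (illustrative counts only).
events_ctrl = [12, 20, 8, 30, 15, 25]
n_ctrl = [50, 60, 40, 80, 50, 70]
events_trt = [9, 15, 7, 22, 10, 21]
n_trt = [50, 60, 40, 80, 50, 70]

r = between_arm_pearson(events_ctrl, n_ctrl, events_trt, n_trt)
print(f"observed between-arm Pearson correlation: {r:.3f}")
```

This is only the crude, observed-scale correlation; it ignores within-study sampling error, which is one reason the model-based estimate of the correlation can differ from it.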
Arm-based meta-analysis model
Model specification
Suppose a meta-analysis contains \(N\) studies. Without loss of generality, this article focuses on binary outcomes. The \(i\)th study has \({n}_{i1}\) subjects in the experimental group and \({n}_{i0}\) subjects in the control group (\(i=1, 2, \dots , N\)). Let \({\pi }_{ik}\) be the probability of events for subjects receiving the experimental (\(k=1\)) or the control (\(k=0\)) treatment in the \(i\)th study. The total number of events in the \(i\)th study’s group \(k\) is \({X}_{ik}\), which is assumed to follow a binomial distribution \(\text{Bin}\left({n}_{ik},{\pi }_{ik}\right)\) [21].
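We consider a BGLMM for the meta-analysis [7, 22, 23]; this model is referred to as bivariate because it assumes a bivariate normal distribution for the study-specific vector of transformed event probabilities for the control and experimental groups based on a specific link function:
\[\left(\begin{array}{c}g\left({\pi }_{i0}\right)\\ g\left({\pi }_{i1}\right)\end{array}\right)\sim N\left(\left(\begin{array}{c}{\mu }_{0}\\ {\mu }_{1}\end{array}\right),{\varvec{\Sigma }}_{\varvec{\mu }}\right),\quad {\varvec{\Sigma }}_{\varvec{\mu }}=\left(\begin{array}{cc}{\sigma }_{0}^{2}& \rho {\sigma }_{0}{\sigma }_{1}\\ \rho {\sigma }_{0}{\sigma }_{1}& {\sigma }_{1}^{2}\end{array}\right).\]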
Here, \({\sigma }_{0}\) and \({\sigma }_{1}\) are between-study standard deviations (SDs) of the two treatment groups, reflecting the heterogeneity in treatment arms across studies. The parameter \(\rho\) accounts for the between-arm correlation, measuring the strength and direction of the relationship between treatment effects in the two arms. It is possible to assume an equal between-study SD for both treatment groups (\({\sigma }_{0}={\sigma }_{1}=\sigma\)) to reduce the model complexity; this assumption can be assessed using criteria such as the deviance information criterion (DIC) [24] for Bayesian model selection. The link function \(g\left(\cdot \right)\) has various choices, such as logit, probit, and complementary log-log transformation functions. In addition, \({\mu }_{0}\) and \({\mu }_{1}\) are the fixed effects of arm-specific event probabilities on the transformed scale.
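To make the roles of \({\mu }_{0}\), \({\mu }_{1}\), \({\sigma }_{0}\), \({\sigma }_{1}\), and \(\rho\) concrete, the following sketch simulates arm-level data from a BGLMM with a logit link. It is a minimal illustration, not the authors' simulation code: the function name `simulate_bglmm` and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def simulate_bglmm(n_studies, n_per_arm, mu0, mu1, sigma0, sigma1, rho, seed=0):
    """Simulate event counts for control (column 0) and experimental (column 1) arms."""
    rng = np.random.default_rng(seed)
    # Between-study variance-covariance matrix built from the SDs and rho.
    cov = np.array([[sigma0 ** 2, rho * sigma0 * sigma1],
                    [rho * sigma0 * sigma1, sigma1 ** 2]])
    # Study-specific logit event probabilities for the two arms.
    theta = rng.multivariate_normal([mu0, mu1], cov, size=n_studies)
    pi = 1.0 / (1.0 + np.exp(-theta))  # inverse logit (expit)
    # Binomial event counts given the study-specific probabilities.
    x = rng.binomial(n_per_arm, pi)
    return x, pi

# Hypothetical parameter values; only the 50 studies x 50 subjects layout mirrors the text.
x, pi = simulate_bglmm(n_studies=50, n_per_arm=50, mu0=-1.0, mu1=-0.5,
                       sigma0=0.8, sigma1=0.8, rho=0.7)
print(x[:5])  # event counts for the first five simulated studies
```

Rerunning the simulation with different values of `rho` shows how strongly the arm-level probabilities move together across studies, which is the behavior the between-arm correlation is meant to capture.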
When \(g\left(\cdot \right)\) is the logit link function, \(\exp\left({\mu }_{1}-{\mu }_{0}\right)\) represents the conditional OR. The marginal event probability of treatment group \(k\) can be approximated as \({\pi }_{k}=E\left({\pi }_{ik}\right)\approx \text{expit}\left({\mu }_{k}/\sqrt{1+{C}^{2}{\sigma }_{k}^{2}}\right)\), where \(C=16\sqrt{3}/\left(15\pi \right)\); here, the \(\pi\) in \(C\) represents the mathematical constant of about 3.14 [25]. Because the scaling factor shrinks \({\mu }_{k}\) toward zero, the marginal event probability lies closer to 0.5 than the conditional event probability \(\text{expit}\left({\mu }_{k}\right)\) (smaller when \({\mu }_{k}>0\), larger when \({\mu }_{k}<0\)), unless there is no heterogeneity among studies (\({\sigma }_{k}^{2}=0\)), in which case the conditional and marginal event probabilities are equal. If we assume an equal between-study SD for the two treatment groups, the marginal OR, RR, and RD are given by OR \(=\exp\left[\left({\mu }_{1}-{\mu }_{0}\right)/\sqrt{1+{C}^{2}{\sigma }^{2}}\right]\), RR \(=\text{expit}\left({\mu }_{1}/\sqrt{1+{C}^{2}{\sigma }^{2}}\right)/\text{expit}\left({\mu }_{0}/\sqrt{1+{C}^{2}{\sigma }^{2}}\right)\), and RD \(=\text{expit}\left({\mu }_{1}/\sqrt{1+{C}^{2}{\sigma }^{2}}\right)-\text{expit}\left({\mu }_{0}/\sqrt{1+{C}^{2}{\sigma }^{2}}\right)\). Based on the marginal event probabilities, the arm-based BGLMM can yield the marginal OR, RR, and RD [22, 26]. Interested readers may refer to McCullagh [27] for discussions on the differences in conditional and marginal inferences. In the following sections, we focus on the marginal inferences because all three commonly used effect measures (OR, RR, and RD) can be estimated under this framework.
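The marginal formulas above are straightforward to evaluate numerically. The sketch below does so under the equal-SD assumption; the function name `marginal_effects` and the input values are illustrative assumptions, not estimates from any of the case studies.

```python
import math

# Constant C = 16*sqrt(3) / (15*pi) from the logistic-normal approximation.
C = 16 * math.sqrt(3) / (15 * math.pi)

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def marginal_effects(mu0, mu1, sigma):
    """Approximate marginal event probabilities, OR, RR, and RD (equal between-study SD)."""
    scale = math.sqrt(1 + C ** 2 * sigma ** 2)
    p0 = expit(mu0 / scale)
    p1 = expit(mu1 / scale)
    return {
        "p0": p0,
        "p1": p1,
        "OR": math.exp((mu1 - mu0) / scale),
        "RR": p1 / p0,
        "RD": p1 - p0,
    }

# Hypothetical values on the logit scale.
effects = marginal_effects(mu0=-1.0, mu1=-0.5, sigma=0.8)
print(effects)
```

Setting `sigma=0` recovers the conditional quantities, which gives a quick check that the marginal and conditional measures coincide when there is no between-study heterogeneity.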
Model implementation
The BGLMM can be fitted using either frequentist [22, 28] or Bayesian approaches. The Bayesian approach is usually more computationally intensive than the frequentist approach, especially when conducting a more complicated network meta-analysis [29]. Nevertheless, it is less challenging for pairwise meta-analysis models, which are the focus of this article. We adopt the Bayesian framework, where parameters can be controlled through prior distributions, and use the R package “rjags” (version 4-13) [30] for all following analyses. We use the Markov chain Monte Carlo (MCMC) algorithm with three chains of different initial values; each chain is run for 100,000 iterations, of which the first 10,000 are discarded as burn-in, with a thinning rate of 2 to reduce autocorrelation. This yields 135,000 posterior samples for each analysis, from which the treatment effects and correlation parameters can be estimated by the posterior medians and 95% credible intervals (CrIs).
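The sample count follows directly from the MCMC settings, and the posterior summaries are simple quantiles of the retained draws. The sketch below shows both steps; it is independent of rjags, and the helper names `retained_draws` and `summarize` are hypothetical.

```python
import numpy as np

def retained_draws(n_chains, n_iter, n_burnin, thin):
    """Number of posterior draws kept across all chains after burn-in and thinning."""
    return n_chains * ((n_iter - n_burnin) // thin)

def summarize(samples):
    """Posterior median and equal-tailed 95% credible interval."""
    lower, median, upper = np.percentile(samples, [2.5, 50.0, 97.5])
    return float(median), (float(lower), float(upper))

# Settings stated in the text: 3 chains, 100,000 iterations, 10,000 burn-in, thinning 2.
total = retained_draws(n_chains=3, n_iter=100_000, n_burnin=10_000, thin=2)
print(total)  # 3 * (100000 - 10000) / 2 = 135000
```

In practice, `summarize` would be applied to the pooled draws of each quantity of interest (for example the marginal RR or \(\rho\)) after checking convergence across the three chains.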
The influential role of the correlation parameter
The between-study variance-covariance matrix \({\varvec{\Sigma }}_{\varvec{\mu }}\) needs to be carefully estimated when fitting the BGLMM. Within it, the between-arm correlation is the key to accounting for randomization within individual clinical trials in a meta-analysis. Inappropriate estimation of this parameter could affect the validity of conclusions from arm-based meta-analyses. In addition, as discussed in Jackson et al. [31], there are two fitting approaches regarding whether to consider the uncertainty of the estimated variance-covariance matrix. The standard procedure approximates the true variance-covariance matrix with the estimated one when making inferences about the treatment effect. However, this approximation is improper when a meta-analysis includes a small number of studies. It remains unclear how many studies are needed in multivariate meta-analyses for reliable estimation of the variance-covariance matrix.
Bayesian analyses allow for the uncertainty in the between-study variance-covariance matrix by placing priors on parameters, where external evidence can be incorporated via informative priors. However, researchers should be cautious when using weakly informative prior distributions because different priors can sometimes lead to markedly different results. For example, Wang et al. [5] evaluated the impact of covariance priors on arm-based meta-analyses and found that the commonly used conjugate inverse-Wishart (IW) prior distribution generally produces overestimation of variances and underestimation of correlations between treatment-specific log-odds. This can cause substantial bias in the estimation of log ORs and absolute effects. Other researchers also found considerable uncertainty in the estimation of the between-study variance-covariance matrix [32].
We simulated a dataset with 50 studies, each comprising 50 subjects in both the control and experimental groups (\({n}_{i0}={n}_{i1}=50\)). Table 1 illustrates the data structure for the first five studies. Simulation details are provided in Table 2. After randomly sorting these simulated studies, we conducted a set of meta-analyses using the first 5, 10, 15, 20, 30, and 50 studies sequentially. Table 2 indicates great uncertainty in estimating the between-arm correlation, with notably wide 95% CrIs persisting across different numbers of studies. For instance, the meta-analysis with five studies yielded a 95% CrI of (\(-\)0.382, 0.973) for \(\rho\), which covers a wide range of the correlation’s possible domain (\(-\)1, 1). Even with 50 studies, the interval estimate is still wide, with a 95% CrI of (0.413, 0.807), indicating substantial uncertainty. Moreover, real-world meta-analyses often involve a limited number of studies [33, 34]. As shown in our examination of 69,133 pairwise meta-analyses from the aforementioned Cochrane database, the median number of studies was 3. Furthermore, 89.7% of them included fewer than 10 studies, with only 2.5% involving more than 20 studies. Consequently, considerable uncertainty in the between-arm correlation estimates is common in real-world meta-analyses, posing a major challenge for valid arm-based meta-analyses. To address this, we propose tipping point analyses to assess the robustness of meta-analysis results to the correlation parameter.
Tipping point analysis for the between-arm correlation
Tipping point analysis regarding interval estimates
Recall that through the implementation of the BGLMM, estimates of comparative effect measures (e.g., OR, RR, and RD) can be obtained. The determination of an effect difference between two treatments relies on whether their interval estimates cover the null values (i.e., for OR and RR, the null value is 1; for RD, the null value is 0). In this article, we refer to it as an interval conclusion. In the preliminary Bayesian implementation of the BGLMM, where the correlation parameter is assigned a weakly informative prior distribution, we term the conclusion drawn from this approach the original conclusion. To assess the robustness of these effect measure estimates regarding the original interval conclusion, a plausible range of values is assigned to the correlation parameter. The “tipping point” is identified as the value at which the original interval conclusion is flipped. This concept of a “tipping point” is borrowed from the sensitivity analysis of missing data in randomized controlled trials.
For the arm-based meta-analysis model, we focus on investigating the tipping point for the between-arm correlation coefficient \(\rho\). Potentially, \(\rho\) can vary in a range \({R}_{\rho }\), with the most general case being \({R}_{\rho }=(-1, 1)\). Alternatively, one can incorporate clinical or statistical prior knowledge to restrict \({R}_{\rho }\) to a plausible range. For example, when both treatments are acknowledged to have effects in the same direction, one may only consider a positive correlation coefficient between the treatment groups. In this case, \({R}_{\rho }=(0, 1)\).
As \(\rho\) takes continuous values, we can simplify the implementation by discretizing \({R}_{\rho }\). Taking \(B\) equally spaced points within the reasonable range \({R}_{\rho }\) (e.g., \(B=100\), or an increment of 0.01 in the \(\rho\) value), we repeatedly estimate absolute and comparative effect measures with \(\rho\) fixed at each value in the set \(\{{r}_{1}, {r}_{2}, \dots , {r}_{B}\}\). If the original interval conclusion is altered at \(\rho ={r}_{b}\), where \(b\in \{1, 2, \dots , B\}\), \({r}_{b}\) is identified as a tipping point and is denoted by \({r}_{b}^{tp}\). Due to Monte Carlo errors, the tipping point may not be a single point but may instead represent a range of values for \(\rho\), within which the conclusion may oscillate before stabilizing. We define the interval spanned by all such tipping points as the “tipping range,” denoted by \({T}_{\rho }=\left[\text{min}\left({r}_{b}^{tp}\right),\text{max}\left({r}_{b}^{tp}\right)\right]\).
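The grid search described above reduces to a simple scan once the model has been refitted at each fixed value of \(\rho\). The sketch below assumes those refits are already available as a list of credible-interval bounds; the function names and the five-point RR example are hypothetical and serve only to show the logic of identifying \({r}_{b}^{tp}\) and \({T}_{\rho }\).

```python
def covers_null(lower, upper, null):
    """True if the interval [lower, upper] contains the null value."""
    return lower <= null <= upper

def tipping_range(rho_grid, intervals, null, original_covers_null):
    """Return (min, max) of grid values where the interval conclusion flips, or None.

    rho_grid             : fixed correlation values r_1, ..., r_B
    intervals            : (lower, upper) CrI of the effect measure at each r_b
    null                 : null value (1 for OR/RR, 0 for RD)
    original_covers_null : interval conclusion of the original analysis
    """
    tipping = [r for r, (lo, hi) in zip(rho_grid, intervals)
               if covers_null(lo, hi, null) != original_covers_null]
    if not tipping:
        return None  # no tipping point exists on this grid
    return (min(tipping), max(tipping))

# Hypothetical RR intervals on a coarse grid (a real analysis would use ~100 points).
rho_grid = [0.0, 0.25, 0.5, 0.75, 0.9]
rr_intervals = [(0.70, 2.9), (0.75, 2.8), (0.80, 2.6), (0.90, 2.4), (1.05, 2.2)]
print(tipping_range(rho_grid, rr_intervals, null=1.0, original_covers_null=True))
```

Returning the minimum and maximum of the flipped grid values, rather than only the first flip, mirrors the definition of the tipping range and tolerates oscillation caused by Monte Carlo error.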
After obtaining the tipping point or the tipping range, we can compare it with the original estimate of the correlation, denoted as \(\widehat{\rho }\), from the preliminary implementation. If tipping points are close to \(\widehat{\rho }\), or if the tipping range contains \(\widehat{\rho }\), the original conclusion can be easily flipped by changing the correlation coefficient, suggesting that the meta-analysis conclusion is not robust. Conversely, if the tipping point or the tipping range does not exist or falls within an implausible region, the meta-analysis conclusion is robust. For example, if the tipping point of the correlation coefficient takes a negative value, but the two treatments are known to have a positive correlation, it suggests that the meta-analysis conclusion could still be robust.
Tipping point analysis regarding point estimates
Monitoring the change in magnitude of the effect estimate is also important for assessing the potential impact of bias [35, 36]. Thus, we propose the tipping point analysis regarding point estimates in addition to the interval estimates. Similarly, the effect measure is estimated with the correlation parameter fixed at plausible values. The tipping point or the tipping range is detected when the change of the estimated effect measure from its original estimate exceeds a predefined threshold. In this article, we consider the relative change, which is calculated as the new effect estimate (given a specific value for the correlation) minus the original effect estimate, divided by the original effect estimate. The relative change can be positive or negative, indicating the direction of change in the new estimates. In the following examples, we use \(\pm\)15% and \(\pm\)30% as thresholds of low and high relative changes. Note that the threshold can be determined based on the clinical context from experts (e.g., clinically meaningful difference).
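The relative change and its comparison with the two thresholds can be written in a few lines. The sketch below is illustrative: the labels returned by `classify_change` are an assumed naming scheme, and the RR values are made up.

```python
def relative_change(new_estimate, original_estimate):
    """(new - original) / original; the sign gives the direction of the change."""
    return (new_estimate - original_estimate) / original_estimate

def classify_change(new_estimate, original_estimate, low=0.15, high=0.30):
    """Compare the absolute relative change with the low (15%) and high (30%) thresholds."""
    rc = abs(relative_change(new_estimate, original_estimate))
    if rc > high:
        return "exceeds high threshold"
    if rc > low:
        return "exceeds low threshold"
    return "within thresholds"

# Hypothetical example: original RR of 0.24, RR of 0.30 after fixing rho at some value.
print(relative_change(0.30, 0.24))   # about 0.25, i.e. a 25% increase
print(classify_change(0.30, 0.24))
```

One caveat of any relative measure is that it is undefined when the original estimate is zero, which could matter for an RD close to its null value; an absolute threshold may be more suitable in that situation.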
Visualization of robustness assessment
We visualize tipping points regarding interval estimates in graphs by plotting both point and interval estimates of effects against the prespecified range of the correlation coefficient values. We propose to use different colors to distinguish whether the assigned correlation coefficient value falls within the 95% CrI of the original correlation parameter and whether the conclusions of treatment effects are different from the original results. Specifically, the effect estimates are colored in black if the correlation coefficient takes values within the 95% CrI of its original estimate. The effect estimates are in blue when the correlation coefficient takes the value of its posterior median (i.e., the point estimate \(\widehat{\rho }\)). The effect estimates are in red when the interval conclusion is different from the original result (in terms of whether the CrI covers the null value). A transition to the red color suggests tipping points or a tipping range of the correlation coefficient. If the potential tipping point is within the 95% CrI for the correlation coefficient, the effect estimates are colored in dark red. We also present the posterior density of the correlation parameter to reflect the likelihood of the tipping point or range. In summary, the plot suggests that results are sensitive to the between-arm correlation when several effect estimates are in dark red and their corresponding tipping point values show high posterior density.
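The coloring rules can be summarized as a small decision function. The sketch below is one possible encoding, not the authors' plotting code: the precedence among rules and the `"grey"` fallback for values outside the 95% CrI whose conclusion is unchanged are assumptions, since the text does not specify them. The example inputs reuse the posterior median and CrI reported later for the acute diverticulitis data.

```python
def point_color(rho, rho_median, rho_cri, conclusion_flipped, tol=1e-9):
    """Color of the effect estimate plotted at a fixed correlation value rho."""
    lower, upper = rho_cri
    inside_cri = lower <= rho <= upper
    if conclusion_flipped:
        # Flipped conclusions are red; dark red if rho is also plausible a posteriori.
        return "darkred" if inside_cri else "red"
    if abs(rho - rho_median) <= tol:
        return "blue"   # estimate at the posterior median of rho
    if inside_cri:
        return "black"  # unchanged conclusion, rho within its original 95% CrI
    return "grey"       # assumed fallback: unchanged conclusion, rho outside the CrI

cri = (-0.555, 0.962)
print(point_color(0.95, 0.415, cri, conclusion_flipped=True))    # darkred
print(point_color(0.98, 0.415, cri, conclusion_flipped=True))    # red
print(point_color(0.415, 0.415, cri, conclusion_flipped=False))  # blue
print(point_color(0.00, 0.415, cri, conclusion_flipped=False))   # black
```

Checking the flipped-conclusion rule first ensures that a tipping point is never hidden by the black or blue coloring, which seems to be the intent of the display.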
For tipping points regarding point estimates, we track the relative change in the magnitude of the estimated effect measures over the assigned correlation values. The trajectory is colored in red if the assigned correlation value is within the original 95% CrI of \(\rho\).
We will demonstrate these visualization approaches in the following section.
Case studies
We applied the proposed method to three pairwise meta-analyses with binary outcomes, comprising one with single-arm studies and two without single-arm studies. Figure 2 presents the forest plots of these three meta-analyses. The first meta-analysis by Au et al. [37] examined the risk of requiring additional treatment or intervention to settle in the initial episode for patients with uncomplicated acute diverticulitis. It compared patients receiving treatments without antibiotics to those receiving treatments with antibiotics. This meta-analysis had three single-arm studies exclusively focusing on treatments without antibiotics. Using a random-effects model would discard information from these three single-arm studies, resulting in a synthesized RR of 1.47, with a 95% confidence interval (CI) of (0.73, 2.97), based on the remaining six studies.
The other two meta-analyses are from the systematic review performed by Chu et al. [38], investigating the effects of preventive measures on virus transmission for respiratory diseases. One meta-analysis studied the effect of facial mask use on preventing respiratory disease infection in the healthcare setting. They obtained a synthesized RR of 0.30 with 95% CI (0.22, 0.41) from 26 studies using a random-effects model. Six studies were omitted from the analysis due to zero counts of infection events in both the experimental and control groups. Xiao et al. [39] reanalyzed the data with a frequentist BGLMM that accounted for double-zero-event studies and obtained a synthesized RR of 0.34 with 95% CI (0.23, 0.51). The last meta-analysis studied the effect of physical distance on preventing Middle East respiratory syndrome (MERS), with a synthesized RR of 0.24 and 95% CI (0.05, 1.24) in a random-effects model. In the original analysis, four studies did not contribute to the synthesized RR due to zero counts of infection events in both treatment groups.
We reanalyzed the foregoing three meta-analyses with the BGLMMs under the Bayesian framework with the logit link function. All parameters were assigned weakly informative priors: \({\mu }_{0}, {\mu }_{1} \sim N(0, {100}^{2})\), \(\rho \sim U(-1, 1)\), and \({\sigma }_{0}, {\sigma }_{1} \sim U(0, 10)\).
We first considered two candidate models with the equal variance assumption (\({\sigma }_{0}={\sigma }_{1}=\sigma\)) and the unequal variance assumption (\({\sigma }_{0}\ne {\sigma }_{1}\)), and performed a model selection procedure to find the best model based on DIC with the absolute difference strategy, where we always select the less complex model if the difference in DIC is not greater than 3 [24]. We henceforth refer to the results selected in this step (rather than those initially reported in the source papers) as the original result of the meta-analysis. Our main interest is to assess the robustness of the original results. Table 3 presents these original results for the three meta-analyses. Figure 3 displays estimated event probabilities in the experimental group compared to the control group. It suggests a stronger positive correlation between the treatment groups in the facial mask data compared to the other two datasets. This evidence aligns with the estimates presented in Table 3, where the between-arm correlation coefficient \(\rho\) for the facial mask data was around 0.9 with a narrow 95% CrI, while the other two datasets exhibited smaller correlation coefficients with wide 95% CrIs that included zero.
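The DIC-based selection rule can be stated compactly in code. The sketch below is one reading of the absolute difference strategy: the more complex (unequal-SD) model is chosen only when its DIC is lower by more than 3. The function name and the DIC values are hypothetical and are not those of Table 3.

```python
def select_by_dic(dic_equal, dic_unequal, threshold=3.0):
    """Choose between the equal-SD (simpler) and unequal-SD (more complex) models.

    The simpler model is kept unless the more complex model improves DIC
    by more than `threshold`.
    """
    if dic_equal - dic_unequal > threshold:
        return "unequal"
    return "equal"

# Hypothetical DIC values.
print(select_by_dic(dic_equal=120.4, dic_unequal=119.0))  # difference 1.4 -> "equal"
print(select_by_dic(dic_equal=310.2, dic_unequal=301.5))  # difference 8.7 -> "unequal"
```

When the equal-SD model has the lower DIC, it is selected regardless of the size of the difference, since it is both simpler and better fitting.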
Subsequently, the proposed method was applied with the selected model assumption to identify potential tipping points for the correlation coefficient \(\rho\). We re-estimated a variety of effect measures, including treatment-specific absolute risks (ARs), ORs, RRs, and RDs, by setting \(\rho\) to a range of values from \(-\)0.99 to 0.99 with an increment of 0.01. Due to space limits, we focus on tipping point analyses for \(\rho\) regarding effect measures’ interval conclusions; the tipping point analyses regarding effect measures’ magnitudes are discussed only for the third meta-analysis.
All statistical code and data for implementing the proposed method and case studies are publicly available on the Open Science Framework (https://osf.io/8z9bp/).
Results
Acute diverticulitis data
The DICs in Table 3 suggest the equal between-study SD assumption (\({\sigma }_{0}={\sigma }_{1}=\sigma\)) for the acute diverticulitis data. The posterior median of the correlation coefficient \(\rho\) was 0.415, with a wide 95% CrI of (\(-\)0.555, 0.962). Figure 4 summarizes the results of our proposed tipping point analysis for \(\rho\). In Fig. 4(A), the MCMC posterior density for the between-arm correlation \(\rho\) is left-skewed. The comparative effect measures (i.e., OR, RR, and RD) showed increasing trends as the assigned values of \(\rho\) approached one, indicating the effect of borrowing more homogeneous information between the two treatment groups. Their corresponding interval estimates became narrower, which was expected due to increased information sharing between the two treatment groups at such high correlations. Tipping points were observed at \(\rho \ge\) 0.920, a region containing 5.70% of the marginal posterior samples of \(\rho\). This provides a quantitative measure of the uncertainty surrounding the original conclusion concerning the correlation parameter and the posterior probability of the tipping range of correlation. Therefore, evaluating the impact of tipping points should be complemented by analyzing the posterior density of the tested parameter. By integrating clinical insights regarding the desired precision in the results, one can determine whether the original conclusion is robust or not. The existence of a tipping point itself does not necessarily imply that the original results are not robust.
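The posterior share attached to a tipping range is simply the fraction of the original posterior draws of \(\rho\) that fall inside it. A minimal sketch follows; the function name is hypothetical and the draws are synthetic stand-ins, not the actual MCMC output.

```python
import numpy as np

def posterior_mass_in_range(rho_samples, lower, upper=1.0):
    """Fraction of posterior draws of rho that fall within [lower, upper]."""
    samples = np.asarray(rho_samples, dtype=float)
    return float(np.mean((samples >= lower) & (samples <= upper)))

# Synthetic stand-in for posterior draws of rho (illustrative only).
rng = np.random.default_rng(1)
fake_draws = np.clip(rng.normal(0.4, 0.4, size=10_000), -0.999, 0.999)
share = posterior_mass_in_range(fake_draws, lower=0.92)
print(f"{100 * share:.2f}% of draws lie in the tipping range")
```

Applied to the real posterior draws with `lower=0.92`, the same calculation would reproduce the kind of percentage reported in the text.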
Facial masks data
In this case study, the model with the unequal between-study SD assumption (\({\sigma }_{0}\ne {\sigma }_{1}\)) was selected since the DIC difference between the two candidate models exceeded 3. The results of tipping point analyses are presented in Fig. 5. Overall, we found that the original results were robust to a wide range of values for the correlation coefficient. Although tipping points were observed, they fell outside of the 95% CrI for \(\rho\), suggesting that the correlation coefficient was not likely to take values that could affect the conclusion. Based on these observations, we concluded that the effect estimates were stable and not sensitive to the variation of correlation values in the facial mask data.
Physical distance data
Based on the DICs, we assumed \({\sigma }_{0}={\sigma }_{1}=\sigma\). The tipping point analyses in Fig. 6 identified a tipping range of [0.93, 0.99], which lies within the 95% CrI of \(\rho\). According to the posterior density, 17.3% of the posterior samples of \(\rho\) fell within this region. This raised concerns about the robustness of the original results regarding the correlation coefficient, as conclusions could be reversed by a small shift in the correlation value.
Figure 7 presents the tipping point analysis for \(\rho\) regarding effect measures’ magnitudes. The solid lines show the trajectory of relative change in magnitude of effect measures (OR, RR, and RD) when \(\rho\) is assigned to values between \(-\)1 and 1. For the values within the 95% CrI of \(\widehat{\rho }\), the lines are colored in red; for those outside the 95% CrI of \(\widehat{\rho }\), the lines are colored in black. The blue dashed vertical line marks the original point estimate of \(\rho\). Overall, most parts of the red lines are within the 15% threshold, especially for those around the point estimate of \(\rho\). This suggests that the point estimate of the correlation parameter is of less concern.
Discussion
In meta-analyses without single-arm studies, fixing values of the correlation coefficient \(\rho\) mainly impacts the interval estimates of effect measures. In meta-analyses with single-arm studies, non-robust results typically arise when most included single-arm studies have event probabilities deviating significantly from those of other comparative studies, as depicted by dashed lines in Fig. 3(A) but without intersecting the scatters. Therefore, our method provides an alternative approach to assess the reliability of results obtained from single-arm studies.
Previous studies in the literature primarily focused on the impact of prior distributions on Bayesian meta-analyses [4, 5], with little attention given to the impact of correlation parameters’ estimates on the meta-analysis results. Our study is the first to investigate such impact and develop novel methods to quantify and visualize it in the framework of Bayesian arm-based meta-analyses.
There are some limitations of this study. First, our analyses have mainly focused on interpreting the robustness of the results from a statistical perspective. However, clinical insights are highly needed in such an assessment, particularly when determining a reasonable range of values for the betweenarm correlation coefficient. Second, fixing the correlation parameter to a specific value reduces the uncertainties in the estimates compared to the original analysis, where the correlation parameter is assigned to a prior distribution. Therefore, even assigning the correlation coefficient at the same value as its point estimate in the original analysis, the resulting estimates of effect measures could still be slightly different from the original analysis. Third, this article only considered the armbased model. Contrastbased models are widely used in current metaanalysis research; similar approaches of tipping point analyses could be developed for contrastbased models.
In summary, sensitivity analyses are crucial for interpreting armbased metaanalyses, which are of growing importance. Current sensitivity analyses often consider changes in events (such as the fragility index), model choices, and prior distributions used for Bayesian analyses. Our proposed tipping point analyses tackle the problem from a different perspective, considering the impact of correlation parameters on effect measure estimates. Future work could extend to other commonly used models in metaanalyses, such as the betabinomial model [22] or contrastbased models. Contrastbased models do not involve betweenarm correlations; their heterogeneity parameter is primarily the heterogeneity variance for a treatment contrast. The proposed tipping point analyses can also be extended to network metaanalyses, where multiple treatment comparisons can be jointly synthesized. Such an extension requires a more thorough consideration, as the variancecovariance matrix in this setting can become more complex.
Conclusions
This article focused on the impact of the between-arm correlation on the results of arm-based meta-analyses, an increasingly useful method for including single-arm studies, historical controls, and population-specific estimates under the Bayesian framework. We have proposed a tipping point analysis method to quantitatively assess the robustness of meta-analysis results by assigning specific values to the correlation parameter within a plausible range. Innovative graphical tools have also been introduced to intuitively visualize the impact of the correlation parameter and its tipping points on the conclusions about treatment effects drawn from meta-analyses. We have demonstrated the proposed tipping point analysis on three real-world meta-analyses.
Data availability
The data that support the findings of this research are available from the forest plots in Fig. 2.
Abbreviations
AR: Absolute risk
BGLMM: Bivariate generalized linear mixed effects model
CI: Confidence interval
CrI: Credible interval
DIC: Deviance information criterion
IW: Inverse-Wishart prior
MCMC: Markov chain Monte Carlo
MERS: Middle East respiratory syndrome
OR: Odds ratio
RD: Risk difference
RR: Relative risk
SD: Standard deviation
References
Atal I, Porcher R, Boutron I, Ravaud P. The statistical significance of meta-analyses is frequently fragile: definition of a fragility index for meta-analyses. J Clin Epidemiol. 2019;111:32–40.
Lin L, Xing A, Chu H, Murad MH, Xu C, Baer BR, et al. Assessing the robustness of results from clinical trials and meta-analyses with the fragility index. Am J Obstet Gynecol. 2023;228:276–82.
Xing A, Chu H, Lin L. Fragility index of network meta-analysis with application to smoking cessation data. J Clin Epidemiol. 2020;127:29–39.
Rosenberger KJ, Xing A, Murad MH, Chu H, Lin L. Prior choices of between-study heterogeneity in contemporary Bayesian network meta-analyses: an empirical study. J Gen Intern Med. 2021;36:1049–57.
Wang Z, Lin L, Hodges JS, Chu H. The impact of covariance priors on arm-based Bayesian network meta-analyses with binary outcomes. Stat Med. 2020;39:2883–900.
Cornell JE, Mulrow CD, Localio R, Stack CB, Meibohm AR, Guallar E, et al. Random-effects meta-analysis of inconsistent effects: a time for change. Ann Intern Med. 2014;160:267–70.
Van Houwelingen HC, Zwinderman KH, Stijnen T. A bivariate approach to meta-analysis. Stat Med. 1993;12:2273–84.
Dias S, Ades AE. Absolute or relative effects? Arm-based synthesis of trial data. Res Synth Methods. 2016;7:23–8.
Hong H, Chu H, Zhang J, Carlin BP. Rejoinder to the discussion of a Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons, by, Dias S. and A. E. Ades. Res Synth Methods. 2016;7:29–33.
White IR, Turner RM, Karahalios A, Salanti G. A comparison of arm-based and contrast-based models for network meta-analysis. Stat Med. 2019;38:5197–213.
Karahalios A, McKenzie JE, White IR. Contrast-based and arm-based models for network meta-analysis. Methods Mol Biol. 2022;2345:203–21.
Singh J, Abrams KR, Bujkiewicz S. Incorporating single-arm studies in meta-analysis of randomised controlled trials: a simulation study. BMC Med Res Methodol. 2021;21:114.
Zhang J, Ko CW, Nie L, Chen Y, Tiwari R. Bayesian hierarchical methods for meta-analysis combining randomized-controlled and single-arm studies. Stat Methods Med Res. 2019;28:1293–310.
Wang Z, Lin L, Murray T, Hodges JS, Chu H. Bridging randomized controlled trials and single-arm trials using commensurate priors in arm-based network meta-analysis. Ann Appl Stat. 2021;15:1767–87.
Murad MH, Wang Z, Zhu Y, Saadi S, Chu H, Lin L. Methods for deriving risk difference (absolute risk reduction) from a meta-analysis. BMJ. 2023;381:e073141.
Lin L, Chu H, Hodges JS. Sensitivity to excluding treatments in network meta-analysis. Epidemiology. 2016;27:562–9.
Palpacuer C, Hammas K, Duprez R, Laviolle B, Ioannidis JPA, Naudet F. Vibration of effects from diverse inclusion/exclusion criteria and analytical choices: 9216 different ways to perform an indirect comparison meta-analysis. BMC Med. 2019;17:174.
Yan X, Lee S, Li N. Missing data handling methods in medical device clinical trials. J Biopharm Stat. 2009;19:1085–98.
Gorst-Rasmussen A, Tarp-Johansen MJ. Fast tipping point sensitivity analyses in clinical trials with missing continuous outcomes under multiple imputation. J Biopharm Stat. 2022;32:942–53.
Xiao M, Chen Y, Cole SR, MacLehose RF, Richardson DB, Chu H. Controversy and debate: questionable utility of the relative risk in clinical research: paper 2: is the odds ratio portable in meta-analysis? Time to consider bivariate generalized linear mixed model. J Clin Epidemiol. 2022;142:280–7.
Chu H, Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol. 2006;59:1331–2.
Chu H, Nie L, Chen Y, Huang Y, Sun W. Bivariate random effects models for meta-analysis of comparative studies with binary outcomes: methods for the absolute risk difference and relative risk. Stat Methods Med Res. 2012;21:621–33.
van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med. 2002;21:589–624.
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J Royal Stat Society: Ser B (Statistical Methodology). 2002;64:583–639.
Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44:1049–60.
Zhang J, Carlin BP, Neaton JD, Soon GG, Nie L, Kane R, et al. Network meta-analysis of randomized clinical trials: reporting the proper summaries. Clin Trials. 2014;11:246–62.
McCullagh P. Sampling bias and logistic models. J Royal Stat Society: Ser B (Statistical Methodology). 2008;70:643–77.
Noh M, Lee Y. REML estimation for binary data in GLMMs. J Multivar Anal. 2007;98:896–915.
Rott KW, Lin L, Hodges JS, Siegel L, Shi A, Chen Y, et al. Bayesian meta-analysis using SAS PROC BGLIMM. Res Synth Methods. 2021;12:692–700.
Plummer M, Stukalov A, Denwood M. rjags: Bayesian graphical models using MCMC. 2022.
Jackson D, Riley R, White IR. Multivariate meta-analysis: potential and promise. Stat Med. 2011;30:2481–98.
Wei Y, Higgins JPT. Bayesian multivariate meta-analysis with multiple outcomes. Stat Med. 2013;32:2911–34.
Davey J, Turner RM, Clarke MJ, Higgins JPT. Characteristics of meta-analyses and their component studies in the Cochrane Database of Systematic Reviews: a cross-sectional, descriptive analysis. BMC Med Res Methodol. 2011;11:160.
Mathes T, Kuss O. A comparison of methods for meta-analysis of a small number of studies with binary outcomes. Res Synth Methods. 2018;9:366–81.
Hultcrantz M, Rind D, Akl EA, Treweek S, Mustafa RA, Iorio A, et al. The GRADE Working Group clarifies the construct of certainty of evidence. J Clin Epidemiol. 2017;87:4–13.
Phillippo DM, Dias S, Welton NJ, Caldwell DM, Taske N, Ades AE. Threshold analysis as an alternative to GRADE for assessing confidence in guideline recommendations based on network meta-analyses. Ann Intern Med. 2019;170:538–46.
Au S, Aly EH. Treatment of uncomplicated acute diverticulitis without antibiotics: a systematic review and meta-analysis. Dis Colon Rectum. 2019;62:1533–47.
Chu DK, Akl EA, Duda S, Solo K, Yaacoub S, Schünemann HJ, et al. Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis. Lancet. 2020;395:1973–87.
Xiao M, Lin L, Hodges JS, Xu C, Chu H. Double-zero-event studies matter: a re-evaluation of physical distancing, face masks, and eye protection for preventing person-to-person transmission of COVID-19 and its policy impact. J Clin Epidemiol. 2021;133:158–60.
Acknowledgements
Not applicable.
Funding
LL was supported in part by the US National Institute of Mental Health grant R03 MH128727, the US National Library of Medicine grants R21 LM014533 and R01 LM012982, and the Arizona Department of Health Services grant RFGA202300811. ZH was supported by the US Agency for Healthcare Research and Quality (AHRQ) grant R21 HS029969, the US National Institute on Aging grants R21 AG061431 and R01 AG064529, and the US National Library of Medicine grant R21 LM013911. The content is solely the responsibility of the authors and does not necessarily represent the official views of the US National Institutes of Health or AHRQ.
Author information
Contributions
Wenshan Han: methodology, software, formal analysis, investigation, data curation, writing - original draft, writing - review & editing, visualization; Zheng Wang: writing - review & editing; Mengli Xiao: writing - review & editing; Zhe He: writing - review & editing; Haitao Chu: conceptualization, writing - review & editing; Lifeng Lin: supervision, formal analysis, writing - review & editing.
Ethics declarations
Ethics approval and consent to participate
Ethics approval and consent to participate were not required for this study, as it used published, de-identified data.
Consent for publication
Not applicable.
Competing interests
Wenshan Han, Zheng Wang, Mengli Xiao, Zhe He, and Lifeng Lin have no competing interests to report. Haitao Chu is employed by Pfizer and owns its stocks.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Han, W., Wang, Z., Xiao, M. et al. Tipping point analysis for the between-arm correlation in an arm-based evidence synthesis. BMC Med Res Methodol 24, 162 (2024). https://doi.org/10.1186/s1287402402263w
DOI: https://doi.org/10.1186/s1287402402263w