Bivariate randomeffects metaanalysis and the estimation of betweenstudy correlation
 Richard D Riley^{1}Email author,
 Keith R Abrams^{2},
 Alexander J Sutton^{2},
 Paul C Lambert^{2} and
 John R Thompson^{2}
DOI: 10.1186/1471228873
© Riley et al; licensee BioMed Central Ltd. 2007
Received: 04 September 2006
Accepted: 12 January 2007
Published: 12 January 2007
Abstract
Background
When multiple endpoints are of interest in evidence synthesis, a multivariate metaanalysis can jointly synthesise those endpoints and utilise their correlation. A multivariate randomeffects metaanalysis must incorporate and estimate the betweenstudy correlation (ρ _{ B }).
Methods
In this paper we assess maximum likelihood estimation of a general normal model and a generalised model for bivariate randomeffects metaanalysis (BRMA). We consider two applied examples, one involving a diagnostic marker and the other a surrogate outcome. These motivate a simulation study where estimation properties from BRMA are compared with those from two separate univariate randomeffects metaanalyses (URMAs), the traditional approach.
Results
The normal BRMA model estimates ρ _{ B }as 1 in both applied examples. Analytically we show this is due to the maximum likelihood estimator sensibly truncating the betweenstudy covariance matrix on the boundary of its parameter space. Our simulations reveal this commonly occurs when the number of studies is small or the withinstudy variation is relatively large; it also causes upwardly biased betweenstudy variance estimates, which are inflated to compensate for the restriction on $\widehat{\rho}$ _{ B }. Importantly, this does not induce any systematic bias in the pooled estimates and produces conservative standard errors and meansquare errors. Furthermore, the normal BRMA is preferable to two normal URMAs; the meansquare error and standard error of pooled estimates is generally smaller in the BRMA, especially given data missing at random. For metaanalysis of proportions we then show that a generalised BRMA model is better still. This correctly uses a binomial rather than normal distribution, and produces better estimates than the normal BRMA and also two generalised URMAs; however the model may sometimes not converge due to difficulties estimating ρ _{ B }.
Conclusion
A BRMA model offers numerous advantages over separate univariate synthesises; this paper highlights some of these benefits in both a normal and generalised modelling framework, and examines the estimation of betweenstudy correlation to aid practitioners.
Background
Traditionally, metaanalysis models combine summary measures of a single quantitative endpoint, taken from different studies, to produce a single pooled result. However, multiple pooled results are required whenever there are multiple outcomes [1] or multiple treatment groups [2]. A multivariate metaanalysis model uses the correlation between the endpoints and obtains the multiple pooled results collectively [3, 4]. For example, Reitsma et al. [5] have suggested a bivariate randomeffects metaanalysis (BRMA) to jointly synthesise logitsensitivity and logitspecificity values from diagnostic studies. Multivariate metaanalysis has been quite widely used, including application to genetic associations [6], surrogate endpoints [7, 8], psychological findings [9] and prognostic markers [10].
Often the advantage of a multivariate randomeffects metaanalysis lies in its ability to use the betweenstudy correlation of the multiple endpoints of interest. For example, in diagnostic studies the sensitivity and specificity are usually negatively correlated across studies due to the use of different thresholds [5]. Van Houwelingen et al. [4] use the betweenstudy correlation to describe the shape of the bivariate relationship between the true logodds in a treatment group and the true logodds in a control group (baseline risk). Riley et al. [10] algebraically assess BRMA and show that the correlation allows a 'borrowing of strength' across endpoints. This leads to pooled estimates that have a smaller standard error than those from corresponding univariate randomeffects metaanalyses (URMAs), especially when some endpoints are missing at random across studies.
Some recent articles have indicated the betweenstudy correlation may often be estimated at the end of its parameter space, as +1 or 1. Thompson et al. [6] apply a normal BRMA model to genetic studies of coronary heart disease and report that the betweenstudy correlation was 'poorly estimated' with the likelihood peaking at +1, and an estimate of +1 has also been reported in other applications [4, 11]. To aid practitioners, in this paper we analytically consider why this occurs, and then explore the impact, if any, this has on the pooled estimates and betweenstudy variance estimates. This investigation also allows us to examine the general benefits of BRMA over URMA, to build on a number of other recent articles [5, 10], and encourage a greater use and appreciation of BRMA in practice.
The outline of the paper is as follows. We begin by introducing the general normal model for BRMA and discuss analytically why the betweenstudy correlation can be estimated at the edge of its parameter space. We then apply the model to two real examples from the literature, one involving a diagnostic marker and one involving a surrogate outcome, and these both give a betweenstudy correlation estimate of 1. This then motivates a realistic simulation study to examine how the estimate of betweenstudy correlation affects the statistical properties of the pooled and betweenstudy variance estimates. It also allows us to compare the performance of the normal BRMA model to two separate URMAs, the more common approach in practice [10]. We then extend our work to consider metaanalysis of proportions, and highlight why a generalised model for BRMA is preferred to the general normal BRMA model [12], and also two separate generalised URMA models. We conclude by summarising the broad benefits of BRMA for practitioners, and discuss future research priorities.
Methods
In this section we introduce a hierarchical normal framework for BRMA and URMA. We describe how the BRMA normal model is estimated, and analytically consider the estimation of betweenstudy correlation. We then describe the rationale and methodology for our simulation study of BRMA versus URMA. To ensure a real world context, this section also includes two motivating examples from the medical literature where a BRMA is potentially important.
Motivating example 1 – the telomerase data
The telomerase data taken from the bladder cancer review of Glas et al. [13]
Study  Number of patients with bladder cancer  Number of patients with bladder cancer and a positive test result  Sensitivity  Logitsensitivity Y _{ i1}  Standard error of logitsensitivity S _{ i1}  Number of patients without bladder cancer  Number of patients without bladder cancer and a negative test result  Specificity  Logitspecificity Y _{ i2}  Standard error of logitspecificity S _{ i2}  Withinstudy correlation ? _{ Wi } 

1  33  25  0.758  1.139  0.406  26  25  0.962  3.219  1.020  0 
2  21  17  0.810  1.447  0.556  14  11  0.786  1.299  0.651  0 
3  104  88  0.846  1.705  0.272  47  31  0.660  0.661  0.308  0 
4  26  16  0.615  0.470  0.403  83  80  0.964  3.283  0.588  0 
5  57  40  0.702  0.856  0.290  138  137  0.993  4.920  1.004  0 
6  47  38  0.809  1.440  0.371  30  24  0.800  1.386  0.456  0 
7*  43  23.5  0.547  0.187  0.306  13  12.5  0.962  3.219  1.442  0 
8  33  27  0.818  1.504  0.451  20  18  0.900  2.197  0.745  0 
9  17  14  0.824  1.540  0.636  32  29  0.906  2.269  0.606  0 
10  44  37  0.841  1.665  0.412  29  7  0.241  1.145  0.434  0 
A general normal model for bivariate randomeffects metaanalysis (BRMA)
Suppose that i = 1 to n studies are identified by a systematic review, and that two endpoints (j = 1 or 2) are available from each study. Each study supplies summary measures, Y _{ ij }, and associated standard errors, s _{ ij }, for each endpoint. For instance, for diagnostic studies Reitsma et al. [5] suggest the logitsensitivity is Y _{ i1 }and the logitspecificity is Y _{ i2}. Each summary statistic (Y _{ ij }) is assumed to be an estimate of a true value (θ _{ ij }) in each study, and in a hierarchical structure each θ _{ ij }is assumed to be drawn from a distribution with mean (or 'pooled') value β _{ j }and betweenstudy variance ${\tau}_{j}^{2}$. If Y _{ ij }/θ _{ ij }and θ _{ ij }are both assumed normally distributed then the BRMA can be specified as:
$\begin{array}{ll}\left(\begin{array}{c}{Y}_{i1}\\ {Y}_{i2}\end{array}\right)~N\left(\left(\begin{array}{c}{\theta}_{i1}\\ {\theta}_{i2}\end{array}\right),{\mathbf{\delta}}_{i}\right)\hfill & {\mathbf{\delta}}_{i}=\left(\begin{array}{cc}{s}_{i1}^{2}& {s}_{i1}{s}_{i2}{\rho}_{wi}\\ {s}_{i1}{s}_{i2}{\rho}_{wi}& {s}_{i2}^{2}\end{array}\right)\hfill \\ \left(\begin{array}{c}{\theta}_{i1}\\ {\theta}_{i2}\end{array}\right)~N\left(\left(\begin{array}{c}{\beta}_{1}\\ {\beta}_{2}\end{array}\right),\mathbf{\Omega}\right)\hfill & \mathbf{\Omega}=\left(\begin{array}{cc}{\tau}_{1}^{2}& {\tau}_{1}{\tau}_{2}{\rho}_{B}\\ {\tau}_{1}{\tau}_{2}{\rho}_{B}& {\tau}_{2}^{2}\end{array}\right)\hfill \end{array}\phantom{\rule{0.5em}{0ex}}\left(1\right)$
This model is the general normal model for BRMA [4], where δ _{ i }and Ω are the withinstudy and the betweenstudy covariance matrices respectively. Usually of key interest from the analysis are the pooled estimates of β _{1} and β _{2}, although sometimes an estimated function of these may be desired; for instance in the telomerase example an estimate of the log of the diagnostic odds ratio is given by $\widehat{\beta}$ _{1} + $\widehat{\beta}$ _{2}. The BRMA differs from two independent URMAs by the inclusion of the withinstudy correlations (i.e. the ρ _{ Wi }s) and the betweenstudy correlation (ρ _{ B }). Equation (2) is equivalent to two independent URMAs when ρ _{ Wi }= ρ _{ B }= 0 for all i, i.e. there is zero correlation:
$\begin{array}{ll}\left(\begin{array}{c}{Y}_{i1}\\ {Y}_{i2}\end{array}\right)~N\left(\left(\begin{array}{c}{\theta}_{i1}\\ {\theta}_{i2}\end{array}\right),{\mathbf{\delta}}_{i}\right)\hfill & {\mathbf{\delta}}_{i}=\left(\begin{array}{cc}{s}_{i1}^{2}& 0\\ 0& {s}_{i2}^{2}\end{array}\right)\hfill \\ \left(\begin{array}{c}{\theta}_{i1}\\ {\theta}_{i2}\end{array}\right)~N\left(\left(\begin{array}{c}{\beta}_{1}\\ {\beta}_{2}\end{array}\right),\mathbf{\Omega}\right)\hfill & \mathbf{\Omega}=\left(\begin{array}{cc}{\tau}_{1}^{2}& 0\\ 0& {\tau}_{2}^{2}\end{array}\right)\hfill \end{array}\phantom{\rule{0.5em}{0ex}}\left(2\right)$
In equation (1) it is common to assume the ${s}_{ij}^{2}$ s and ρ _{ Wi }s are known [1, 4]. Including the uncertainty of the ${s}_{ij}^{2}$ s is unnecessary for URMA [14], but whether the uncertainty of the ${s}_{ij}^{2}$ s and ρ _{ Wi }s should be incorporated in BRMA is yet to be examined. This issue is outside the scope of this paper but we note that a Bayesian framework is particularly flexible for incorporating such uncertainty [15].
Within and betweenstudy correlation
The withinstudy correlation, ρ _{ Wi }, indicates whether Y _{ i1 }and Y _{ i2 }are correlated within a study, and these ρ _{ Wi }s are usually assumed known. For the telomerase data the ρ _{ Wi }s might be assumed to be zero because sensitivity and specificity values are calculated independently in a study using different sets of patients. In other BRMA applications the ρ _{ Wi }s can be nonzero, for example where the two endpoints are overall and diseasefree survival [10]. In practice it may be difficult to obtain the value of nonzero ρ _{ Wi }s, although it can be done as evident in Berkey et al. [1] and the 'motivating example 2' below [7]. Suggestions for limiting the problem of unavailable ρ _{ Wi }s have been proposed [15–17], and this issue is considered further in the discussion.
The betweenstudy correlation, ρ _{ B }, is not generally known and has to be estimated when fitting the BRMA. It indicates how the underlying true values, i.e. the θ _{ i1}s and the θ _{ i2 }s, are related across studies. It may be caused by differences across studies in patientlevel characteristics, such as age and stage of disease, or changes in studylevel characteristics, such as the threshold level in diagnostic studies
Motivating example 2 – the CD4 data
Daniels and Hughes [7] assess whether the change in CD4 cell count is a surrogate for time to either development of AIDS or death in drug trials of patients with HIV. They consider betweentreatmentarm loghazard ratios of time to onset of AIDS or death (Y _{ i1}), and betweentreatmentarm differences in mean changes in CD4 count (Y _{ i2}) from pretreatment baseline to about six months. Fifteen relevant trials were identified. Some of the trials involved three or four treatment arms, but to enable application to BRMA here we only consider outcome differences between the control arm and the first treatment arm in the reported dataset [7]. All fifteen trials provided complete data, including the withinstudy correlations which were quite small, varying between 0.22 and 0.17 with a mean of 0.08.
Estimation
In our analyses of equation (1) in this paper the betweenstudy parameters (i.e. ${\tau}_{1}^{2}$, ${\tau}_{2}^{2}$ and ρ _{ B }) and the two pooled values (β _{1} and β _{2}) are estimated iteratively using restricted maximum likelihood (REML) in SAS Proc Mixed, as described elsewhere [4]. Unless otherwise stated, we also use Cholesky decomposition [18] of Ω to ensure that this matrix is estimated to be positive semidefinite and therefore that the betweenstudy correlation estimate, $\widehat{\rho}$ _{ B }, is in the range [1,1]. Cholesky decomposition of Ω also helps ensure convergence when $\widehat{\rho}$ _{ B }is very close to 1 or 1.
Analytic consideration of the betweenstudy covariance parameters
Estimation and inference in classical linear mixed models are based on the marginal model, which for equation (1) is the bivariate normal model with variancecovariance δ _{ i }+ Ω. Assume for the sake of simplicity that the withinstudy covariance matrix δ _{ i }= δ for all studies. Then the covariance matrix of the observed Y _{ i1 }s and Y _{ i2 }s is given by V = δ + Ω, where δ is known. Now, this puts (severe) restrictions on V, namely that V  δ is a covariance matrix, that is nonnegative definite. If the estimated V does not satisfy this restriction, the maximum likelihood estimate of Ω will be truncated on the boundary of its parameter space. This means that if Ω is diagonal (as for URMA) the maximum likelihood estimate on the boundary will have one or both of the betweenstudy variances equal to zero; else if Ω is nondiagonal (as for BRMA) then either one or both of the betweenstudy variances equals zero or else the betweenstudy correlation equals 1 or +1. In metaanalysis a betweenstudy variance estimate of zero is wellunderstood, but a betweenstudy correlation estimate of 1 or +1 is likely to be less familiar to practitioners. Thompson et al. [6] refer to this issue as 'poor estimation' but it perhaps should rather be considered a natural consequence of the sensible restrictions imposed on V, and one that prevents a variance estimate < 0 or a correlation estimate > +1 or < 1, as might otherwise be obtained.
Rationale for the research in this paper
In this paper, to aid practitioners we will further assess the normal BRMA model and the role of betweenstudy correlation. In particular, we aim to identify the situations when $\widehat{\rho}$ _{ B }is likely to be +1 or 1 and examine the impact this has on the pooled and betweenstudy variance estimates. We will also evaluate the benefits of BRMA over two separate URMAs, the more common approach in practice, and explore extensions to a generalised BRMA model for metaanalysis of proportions. To achieve these goals we firstly apply the normal BRMA of equation (1) to the two motivating examples. We then perform a simulation study of the normal BRMA model, as described below. The generalised BRMA model is then introduced and assessed in relation to the normal BRMA model and two separate generalised URMAs (see Results).
A simulation study to assess BRMA and the betweenstudy correlation
Scenarios used in the simulations based on equation (1)
Pooled values  Betweenstudy variances  Within and betweenstudy correlation  Withinstudy variation  

Scenario  β _{1}  β _{2}  ${\tau}_{1}^{2}$  ${\tau}_{2}^{2}$  ρ _{ Wi }  ρ _{ B }  Median value of the ${s}_{ij}^{2}$ s:  Description  
Complete data  n = 50  n = 5  
(i)  0  2  0.25  0.25  0  0  0.254  0.147  Zero correlation; withinstudy variation similar to betweenstudy variation 
(ii)  0  2  0.25  0.25  0  0.8  0.254  0.147  No withinstudy correlation but high betweenstudy correlation; withinstudy variation similar to betweenstudy variation 
(iii)  0  2  0.25  0.25  0.8  0  0.254  0.147  High withinstudy correlation but no betweenstudy correlation; withinstudy variation similar to betweenstudy variation 
(iv)  0  2  0.25  0.25  0.8  0.8  0.254  0.147  High within and betweenstudy correlation; withinstudy variation similar to betweenstudy variation 
(v)  0  2  0.0025  0.0025  0.8  0.8  0.254  0.147  High within and betweenstudy correlation; withinstudy variation large relative to betweenstudy variation 
(vi)  0  2  0.0025  1.5  0.8  0.8  0.254  0.147  High within and betweenstudy correlation; withinstudy variation large (for endpoint 1) and small (for endpoint 2) relative to betweenstudy variation 
(vii)  0  2  1.5  1.5  0.8  0.8  0.254  0.147  High within and betweenstudy correlation; withinstudy variation small relative to betweenstudy variation 
Missing data  n = 50  n = 10  
(viii)  0  2  1.5  1.5  0  0.8  0.244  0.183  No withinstudy correlation but high betweenstudy correlation; withinstudy variance small relative to betweenstudy variance 
(ix)  0  2  0.25  0.25  0  0.8  0.244  0.183  No withinstudy correlation but high betweenstudy correlation; withinstudy variance similar to betweenstudy variance 
(x)  0  2  1.5  1.5  0.8  0.8  0.244  0.183  High within and betweenstudy correlation; withinstudy variance small relative to betweenstudy variance 
(xi)  0  2  0.25  0.25  0.8  0.8  0.244  0.183  High within and betweenstudy correlation; withinstudy variance similar to betweenstudy variance 
Description of the simulation procedure for scenario (i) with n = 50
Generation of a dataset of 1000 metaanalyses
We chose β _{1} = 0 in order to reflect little clinical benefit (e.g. a sensitivity of 50%) and in contrast β _{2} = 2 (e.g. a specificity of 88%). The within study variances required, i.e. the 50 ${s}_{i1}^{2}$ s and 50 ${s}_{i2}^{2}$ s, were each found by sampling from a N(0.25,0.50) distribution and squaring the value obtained. This produced a median ${s}_{ij}^{2}$ of 0.25 and an interquartile range of 0.7, values similar to those for the telomerase data (median ${s}_{ij}^{2}$ = 0.21) and for the BRMA of Thompson et al. [6] (median ${s}_{ij}^{2}$ = 0.28, interquartile range = 0.76). The betweenstudy variances, ${\tau}_{1}^{2}$ and ${\tau}_{2}^{2}$, were chosen to be 0.25 in scenario (i), which meant that they were similar in size to the median withinstudy variances. The within and betweenstudy correlations were both set to zero for this scenario. All these choices were substituted into equation (1) and 1000 metaanalyses each of 50 studies were generated. Calculations were performed in SPlus using the 'rmvnorm' function for generating bivariate normal values (code available upon request).
Estimation using the dataset of 1000 metaanalyses
Each of the 1000 metaanalyses in scenario (i) were analysed separately by:

fitting two separate URMAs as in equation (2) (where ρ _{ B }= 0) using REML to estimate β _{1}, β _{2}, ${\tau}_{1}^{2}$ and ${\tau}_{2}^{2}$

fitting a BRMA as in equation (1) using REML to estimate β _{1}, β _{2}, ${\tau}_{1}^{2}$, ${\tau}_{2}^{2}$, and ρ _{ B }
The 1000 BRMA estimates and the corresponding 1000 URMA estimates from scenario (i) were then compared by calculating:

average parameter estimates across all the simulations (to check for bias)

coverage of the 95% confidence intervals for β _{1} and β _{2}

average standard error and meansquare error (MSE) of β _{1} and β _{2}

the number of occasions $\widehat{\rho}$ _{ B }was equal to +1 or 1 in the BRMA
To assess coverage, the 95% confidence intervals for β _{ j }were calculated using:
${\widehat{\beta}}_{j}\pm \left({t}_{{n}_{j}1}(0.05)\ast \sqrt{var({\widehat{\beta}}_{j})}\right),$
with n _{ j }the number of studies providing endpoint j. This tdistribution is commonly used in the metaanalysis literature, although it is only an approximation [20].
Description of the simulation procedure for scenarios (ii) to (xi)
Simulations in the other scenarios followed in the same manner as described above but with the data generated from the parameter values specific to each scenario as given in Table 2. For those missing data simulations of scenarios (viii) to (xi) we simulated data as described for complete data, except that for each generated metaanalysis we removed the data for the second endpoint in a randomly selected 50% of studies. So, for example, with n = 50 in scenario (viii) each of the 1000 simulated metaanalyses contained 25 studies with complete data and 25 studies with data for the first endpoint only.
Results
Application to the telomerase and CD4 data
URMA and BRMA results for the telomerase and CD4 datasets
Dataset  Model  Pooled value endpoint 1 $\widehat{\beta}$ _{1} (s.e.)  Betweenstudy variance endpoint 2 ${\widehat{\tau}}_{1}^{2}$  Pooled value endpoint 2 $\widehat{\beta}$ _{2} (s.e.)  Betweenstudy variance endpoint 2 ${\widehat{\tau}}_{2}^{2}$  Betweenstudy correlation $\widehat{\rho}$ _{ B } 

Telomerase  Normal URMA  1.155 (0.186)  0.186  1.964 (0.541)  2.386  NA 
Normal BRMA  1.166 (0.186)  0.202  2.058 (0.554)  2.584  1.0  
Generalised URMA  1.182 (0.176)  0.155  2.215 (0.578)  2.680  NA  
CD4  Normal URMA  0.049 (0.0695)  0.025  17.300 (5.561)  379.73  NA 
Normal BRMA  0.109 (0.0748)  0.048  18.314 (5.740)  412.96  1.0 
The question is thus posed: is the estimation of ρ _{ B }at the boundary of its parameter space adversely influencing the other BRMA parameter estimates and, if so, how (e.g. does it introduce bias)? Also, in terms of the individual pooled estimates, the telomerase and CD4 examples indicate little benefit of BRMA over two separate URMAs, despite the utilisation of correlation; but is this generally true and in what situations should a BRMA be preferred? To understand the answers to these important questions, it is helpful to now consider the results from our simulation study.
We note at this point that, for both telomerase and CD4, we also tried estimation of BRMA using an unstructured form of Ω, rather than using Cholesky decomposition of Ω as previously. Interestingly, this approach produced nonsensical betweenstudy correlation estimates of 1.12 and 1.074 for the telomerase and CD4 datasets respectively. This emphasises the importance of a boundary constraint on ρ _{ B }as imposed by the Cholesky decomposition.
Simulation results
Simulation results of the normal BRMA and URMA models for scenarios (i), (ii), (viii), and (ix)
Metaanalysis model  n  No. of the 1000 simulations that converged  Bias of mean $\widehat{\beta}$ _{1}  Mean s.e. of $\widehat{\beta}$ _{1}  MSE of $\widehat{\beta}$ _{1}  Coverage of the 95% CIs for $\widehat{\beta}$ _{1}  Bias of mean $\widehat{\beta}$ _{2}  Mean s.e. of $\widehat{\beta}$ _{2}  MSE of $\widehat{\beta}$ _{2}  Coverage of the 95% CIs for $\widehat{\beta}$ _{2}  Bias of mean ${\widehat{\tau}}_{1}^{2}$(no. of ${\widehat{\tau}}_{1}^{2}$= 0)  Bias of mean ${\widehat{\tau}}_{2}^{2}$(no. of ${\widehat{\tau}}_{2}^{2}$= 0)  Bias of mean $\widehat{\rho}$ _{ B }  % of $\widehat{\rho}$ _{ B }= 1  % of $\widehat{\rho}$ _{ B }= 1 

Scenario (i): Complete data – zero correlation; withinstudy variance similar to betweenstudy variance  
URMA  50  1000  0.005  0.102  0.010  94.8%  0.001  0.107  0.0108  95.4%  0.003 (0)  0.005 (1)       
BRMA  50  1000  0.005  0.101  0.010  94.7%  0.001  0.106  0.0108  95.5%  0.003 (0)  0.005 (0)  0.001  0.2%  0.4% 
URMA  5  1000  0.002  0.267  0.081  96.0%  0.006  0.267  0.0887  94.0%  0.006 (89)  0.015 (81)       
BRMA  5  998  0.002  0.274  0.081  96.7%  0.006  0.269  0.0894  95.3%  0.008 (10)  0.024 (0)  0.027  29.6%  29.0% 
Scenario (ii): Complete data – no withinstudy correlation, high betweenstudy correlation; withinstudy variance similar to betweenstudy variance  
URMA  50  1000  0.004  0.102  0.010  95.6%  0.001  0.106  0.0114  94.4%  0 (0)  0.004 (1)       
BRMA  50  1000  0.004  0.100  0.010  95.3%  0  0.104  0.0107  95.4%  0.001 (0)  0.001 (0)  0.005  0%  25.2% 
URMA  5  999  0.002  0.271  0.077  97.3%  0.005  0.263  0.0826  94.2%  0.004 (80)  0.005 (104)       
BRMA  5  1000  0.002  0.279  0.077  98.1%  0.008  0.268  0.0819  95.7%  0.024 (15)  0.024 (0)  0.161  10.3%  60.5% 
Scenario (viii): Missing data – no withinstudy correlation, high betweenstudy correlation; withinstudy variance smaller than betweenstudy variance  
URMA  50  1000  0.004  0.071  0.005  94.9%  0  0.099  0.0101  95.0%  0.006 (0)  0.005 (0)       
BRMA  50  1000  0.004  0.071  0.005  94.9%  0  0.082  0.0068  95.2%  0.006 (0)  0.007 (0)  0.001  0%  0% 
URMA  10  1000  0.002  0.154  0.028  94.1%  0.003  0.209  0.0576  93.7%  0.006 (0)  0.006 (0)       
BRMA  10  1000  0.002  0.154  0.028  94.1%  0.001  0.174  0.0427  93.3%  0.006 (0)  0.006 (0)  0.040  0%  3.9% 
Scenario (ix): Missing data – no withinstudy correlation, high betweenstudy correlation; withinstudy variance similar to betweenstudy variance  
URMA  50  1000  0.004  0.102  0.010  95.6%  0.001  0.145  0.0228  94.2%  0 (0)  0.003 (6)       
BRMA  50  1000  0.004  0.101  0.010  95.8%  0.003  0.137  0.0203  94.7%  0.001 (0)  0.003 (0)  0.012  0.1%  35.6% 
URMA  10  1000  0.001  0.218  0.045  93.9%  0.005  0.263  0.0825  94.2%  0.006 (45)  0.005 (84)       
BRMA  10  997  0.001  0.222  0.045  96.5%  0.007  0.255  0.0797  95.6%  0.020 (0)  0.025 (0)  0.164  10.1%  60.2% 
Betweenstudy correlation
One can see from Table 4 that the normal model for BRMA often estimates the betweenstudy correlation, ρ _{ B }, as either +1 or 1, especially when the number of studies in the metaanalysis is small. For example, with n = 5 in scenario (ii), where the true ρ _{ B }was 0.8, 605 of the 1000 simulations (60.5%) gave $\widehat{\rho}$ _{ B }equal to +1 and 103 simulations (10.3%) gave $\widehat{\rho}$ _{ B }equal to 1. This led to a mean value of $\widehat{\rho}$ _{ B }equal to 0.639, a downward bias of about 20%. However, this downward bias is not in itself a concern because it is simply caused by the maximum likelihood estimator sensibly truncating $\widehat{\rho}$ _{ B }at +1 and 1, which improves the meansquare error of $\widehat{\rho}$ _{ B }. Furthermore, $\widehat{\rho}$ _{ B }is clearly asymptotically unbiased, with the occurrence of $\widehat{\rho}$ _{ B }equal to 1 or +1 and thus the bias in mean $\widehat{\rho}$ _{ B }becoming increasingly less as the number of studies in the metaanalysis increases (Table 4). Interestingly though, the number of studies required to reduce the occurrence of $\widehat{\rho}$ _{ B }equal to 1 or +1 was far greater when the withinstudy variation was large relative to the betweenstudy variation. For example, even with a large n = 50 studies in scenario (v), where the withinstudy variation was relatively large, 58% of the simulations gave a betweenstudy correlation of 1 or +1.
These findings indicate why $\widehat{\rho}$ _{ B }equals 1 in the BRMAs of the telomerase and CD4 datasets. For the telomerase data there are 10 studies; the simulations show that this magnitude of studies will often provide little information about ρ _{ B }, causing $\widehat{\rho}$ _{ B }to often be constrained at 1 or +1 so that the restrictions imposed on V are met. For the CD4 data, even though there are five more studies than telomerase, the mean withinstudy variance for endpoint j = 1 is 0.15 and this is large relative to the betweenstudy variation (${\widehat{\tau}}_{1}^{2}$ = 0.048). In such situations where the withinstudy variation dominates, the simulations again show that $\widehat{\rho}$ _{ B }will often require truncation at the end of its parameter space to ensure V is nonnegative definite.
Betweenstudy variance estimates
Our simulation results show that the betweenstudy variance estimates were less frequently truncated at zero in the BRMA than the URMA (Table 4). For example, in scenario (ii) with n = 5 ${\widehat{\tau}}_{2}^{2}$ was zero for 104 of the URMA simulations and none of the BRMA simulations. Furthermore, in those scenarios where the betweenstudy correlation was often +1 or 1 (e.g. scenario (ii) with n = 5), the normal BRMA model produces a noticeable upward bias in the betweenstudy variance estimates. For example, in scenario (ii) with n = 5 there was an upward bias of 0.024 in ${\widehat{\tau}}_{1}^{2}$ and ${\widehat{\tau}}_{2}^{2}$, about 10% above their true value. To understand analytically why this occurs, we need to consider that the betweenstudy covariance (τ _{12}) is formulated by τ _{12} = ρ _{ B } τ _{1} τ _{2}. Now, if $\widehat{\rho}$ _{ B }is constrained at 1 or +1, then to obtain the necessary solution for $\widehat{\tau}$ _{12} the maximum likelihood estimator can only increase the $\widehat{\tau}$ _{ j }s, which do not have an upper bound constraint. Thus the betweenstudy variance estimates are inflated to compensate for the constraint on $\widehat{\rho}$ _{ B }. This explains why the BRMAs of the telomerase and CD4 data, where $\widehat{\rho}$ _{ B }was truncated at 1, give $\widehat{\tau}$ _{ j }s that are noticeably larger than those from two separate URMAs. Practitioners need to be aware of this issue; however, we do not consider it a major concern as the maximum likelihood estimator for ${\tau}_{j}^{2}$ is still asymptotically unbiased (the bias decreases as the number of studies increases) and the inflation is simply caused by the sensible and necessary constraint on $\widehat{\rho}$ _{ B }. Furthermore, the inflation is essentially conservative, leading to a larger standard error and meansquare error of pooled estimates as now discussed.
Pooled estimates
For all complete and missing data scenarios, the pooled estimates were approximately unbiased for both BRMA and URMA. Even in those scenarios where $\widehat{\rho}$ _{ B }was often + 1 or 1 it is encouraging that, despite the upward bias in betweenstudy variances, there was no systematic bias in the pooled estimates from the BRMA (Table 4 and Appendix 1 – see Additional file 1). The main affect on the pooled estimates was an inflated standard error and meansquare error. This is a conservative property, but meant that in some complete data scenarios the BRMA performed slightly worse than URMA, despite the utilisation of correlation in the BRMA that might be expected to improve efficiency [10]. For example, in the n = 5 results for scenario (iii), where about 56% of simulations gave $\widehat{\rho}$ _{ B }equal to +1 or 1 and there was an upward bias in betweenstudy variances, the standard error/meansquare error of $\widehat{\beta}$ _{1} was larger in the BRMA (0.272/0.083) than the URMA (0.268/0.081). This explains the BRMA results for telomerase and CD4, where the standard errors of pooled estimates were larger in the BRMA than the URMA due to the inflated betweenstudy variances.
The coverage of β _{1} and β _{2} was between 93% and 98% in most scenarios assessed, and was often similar in URMA and BRMA. It is hard, though, to make general conclusions regarding coverage, as those situations where $\widehat{\rho}$ _{ B }is often +1 or 1 are the same situations where the tdistribution with n1 degrees of freedom is a poor approximation to the true sampling distribution. The true degrees of freedom to use here are complex and account for the withinstudy variances [12]; however this is rarely done in metaanalysis and is beyond the scope of this paper.
BRMA versus URMA for estimating the pooled values
For complete data, in most scenarios the BRMA was marginally superior to URMA as the pooled estimates had slightly smaller standard errors and meansquare errors, especially given large correlations (Table 4 and Appendix 1 – see Additional file 1). However, the URMA sometimes performed equally well, and occasionally even better in those scenarios where $\widehat{\rho}$ _{ B }was often +1 or 1 as discussed above. We also compared the subset of BRMA results where $\widehat{\rho}$ _{ B }did not equal +1 or 1 with the corresponding URMA results, and again found that BRMA was generally slightly superior to URMA. This finding agrees with previous algebraic results [10], that given complete data there is generally a very small benefit of BRMA over URMA for estimating β _{1} and β _{2} themselves. Our focus here is on the individual pooled estimates, but we note that there are also broader reasons why a BRMA may be preferred over URMA for complete data. These are summarised in the Discussion to ensure a more complete picture for practitioners considering BRMA.
For the missing data simulations, the pooled estimate for endpoint j = 2 was of particular interest because of the missing data for this endpoint. Encouragingly, the meansquare error and mean standard error of $\widehat{\beta}$ _{2} were much smaller in the BRMA than the URMA, although the coverage was comparable (Table 4 and Appendix 1 – see Additional file 1). For example, in the n = 10 simulations of scenario (xi) the mean standard error was 0.225 in the BRMA compared to 0.262 in the URMA, and the MSE was 0.0708 in the BRMA compared to 0.0921 in the URMA. The reduction in standard error and MSE was larger when both the within and betweenstudy correlations were high. Even when ρ _{ Wi }was zero there was still a reasonable benefit if ρ _{ B }was high; for example, in the n = 10 simulations of scenario (viii) the mean standard error of $\widehat{\beta}$ _{2} was 0.174 in the BRMA compared to 0.209 in the URMA. This finding agrees with algebraic work regarding the benefits of BRMA for when there are data missing at random [10]. Practitioners should again consider this benefit alongside the other broader reasons for using BRMA rather than URMA (see Discussion).
Extended simulations of the normal BRMA model
In our above simulations of the normal BRMA model we used nonnegative within and betweenstudy correlations; however, in reality negative correlations may arise as in the telomerase and CD4 examples. Also, our simulations took the withinstudy correlations to be the same in each study, while in reality their value may vary. Further simulations were thus performed to assess negative correlation and discrepant withinstudy correlations. These gave findings consistent with those identified previously (Appendix 2 – see Additional file 2); the BRMA was still beneficial over URMA for estimating the pooled endpoints, and where the betweenstudy correlation estimate was often +1 or 1 there was again an upward bias in the BRMA betweenstudy variance estimates.
simulation results for metaanalysis of proportions
Metaanalysis model  n  No. of the 1000 simulations that converged  Bias of mean $\widehat{\beta}$ _{1}  Mean s.e. of $\widehat{\beta}$ _{1}  MSE of $\widehat{\beta}$ _{1}  Coverage of the 95% CIs for $\widehat{\beta}$ _{1}  Bias of mean $\widehat{\beta}$ _{2}  Mean s.e. of $\widehat{\beta}$ _{2}  MSE of $\widehat{\beta}$ _{2}  Coverage of the 95% CIs for $\widehat{\beta}$ _{2}  Bias of mean ${\widehat{\tau}}_{1}^{2}$(no. of ${\widehat{\tau}}_{1}^{2}$= 0)  Bias of mean ${\widehat{\tau}}_{2}^{2}$(no. of ${\widehat{\tau}}_{2}^{2}$= 0)  Bias of mean $\widehat{\rho}$ _{ B }  % of $\widehat{\rho}$ _{ B }= 1  % of $\widehat{\rho}$ _{ B }= 1 

Complete data – we set 25 true positives and 25 true negatives in each study, with the pooled logitsensitivity and pooled logitspecificity set at 1.386 (i.e. a sensitivity and specificity of 0.8). The betweenstudy correlation was set as 0.8, and the betweenstudy variation was set as 1 for both sensitivity and specificity. Simulations were performed as in Chu and Cole [12].  
Normal URMA  10  995  0.152  0.311  0.120  93.1%  0.174  0.310  0.119  93.2%  0.313 (17)  0.309 (5)       
Normal BRMA  10  995  0.124  0.318  0.118  93.8%  0.144  0.317  0.115  94.5%  0.245 (7)  0.245 (2)  0.067  53.7%  0% 
Generalised URMA  10  1000  0.013  0.330  0.124  93.8%  0.037  0.328  0.116  94.0%  0.186 (20)  0.189 (21)       
Generalised URMA  10  603*  0.012  0.339  0.119  94.9%  0.030  0.342  0.118  95.8%  0.136 (0)  0.114 (0)       
Generalised BRMA  10  603  0.009  0.339  0.119  95.4%  0.029  0.341  0.116  96.2%  0.134 (0)  0.113 (0)  0.084  0%  0% 
Normal URMA  50  1000  0.175  0.141  0.049  77.2%  0.182  0.141  0.050  75.5%  0.337 (0)  0.335 (0)       
Normal BRMA  50  1000  0.151  0.142  0.042  81.4%  0.157  0.143  0.043  81.3%  0.285 (0)  0.282 (0)  0.087  17.0%  0% 
Generalised URMA  50  1000  0.018  0.157  0.024  95.1%  0.026  0.157  0.023  96.1%  0.091 (0)  0.084 (0)       
Generalised URMA  50  973*  0.019  0.157  0.024  95.1%  0.022  0.157  0.023  96.1%  0.087 (0)  0.080 (0)       
Generalised BRMA  50  973  0.016  0.157  0.024  96.0%  0.020  0.158  0.023  96.2%  0.078 (0)  0.071 (0)  0.019  0%  0% 
A generalised model for BRMA of proportions
So far in this paper we have modelled the summary statistics across studies, i.e. the Y _{ ij }s, and assumed they are normally distributed. Indeed, in our main simulations we generated the Y _{ ij }s directly from the normal BRMA model of equation (1); thus, our conclusions are only valid for Y _{ ij }s that can truly be assumed normally distributed. This normality assumption is common in the metaanalysis field, and will often be suitable (see Discussion). However, in our two motivating examples it is more plausible for the CD4 data than the telomerase data as the latter involves a metaanalysis of proportions, for which the normality assumption is not appropriate when some studies have a small number of patients or the proportions are close to 0 or 1. For this reason recent articles [12] suggest that, rather than modelling the logitproportions using the normal distribution, one should directly model the binary data using a binomial distribution. This approach also avoids the use of ad hoc continuity corrections in those studies which have zero cells. In terms of diagnostic studies, this generalised model for BRMA of sensitivity and specificity can be written as follows [12]:
no. testing positive_{ i }~Binomial (total no. true positives_{ i }, sensitivity_{ i }) logit(sensitivity_{ i }) = β _{1} + u _{1}
no. testing negative_{ i }~Binomial (total no. true negatives_{ i }, specificity_{ i }) logit(specificity_{ i }) = β _{2} + u _{2}
$\begin{array}{cc}\left(\begin{array}{c}{u}_{1}\\ {u}_{2}\end{array}\right)~N\left[\left(\begin{array}{c}0\\ 0\end{array}\right),\mathbf{\Omega}\right],& \mathbf{\Omega}=\left(\begin{array}{cc}{\tau}_{1}^{2}& {\tau}_{1}{\tau}_{2}{\rho}_{B}\\ {\tau}_{1}{\tau}_{2}{\rho}_{B}& {\tau}_{2}^{2}\end{array}\right)\end{array}\phantom{\rule{0.5em}{0ex}}\left(3\right)$
Equation (3) can be fitted using maximum likelihood estimation in SAS NLMIXED. Chu and Cole [12] show that where the true sensitivity and specificity are large, this generalised BRMA model produces close to unbiased pooled and betweenstudy correlation estimates, whereas the general normal BRMA model produces somewhat biased estimates. This can also be seen in our simulations results for metaanalysis of proportions in Table 5. The meansquare error, coverage, and, most noticeably, bias of estimates are far superior in the generalised BRMA than the normal BRMA. Furthermore, the generalised BRMA is also marginally superior in terms of bias to two separate generalised URMAs (i.e. equation (3) where ρ _{ B }= 0), emphasising that the BRMA is also beneficial over URMA in the generalised model framework. Note though that, although it is the best method, the generalised BRMA model is itself not without bias (Table 5); to rectify this, extension to REML or other estimation techniques is potentially important.
To conclude our research we applied the generalised BRMA model to the telomerase data. Unfortunately the model would not converge appropriately; different starting values all produced a betweenstudy correlation estimate of 1 but gave markedly different parameter estimates and caused spurious standard errors. For example, for one set of starting values the model gave the standard error of $\widehat{\beta}$ _{1} as 30.5, whereas for another set the standard error was close to zero. Indeed SAS provides the following warning: 'the final Hessian matrix is not positive definite, and therefore the estimated covariance matrix is not full rank and may be unreliable'. The problem here is again due to the betweenstudy correlation of estimate 1 in Ω, as this causes the determinant of $\widehat{\mathbf{\Omega}}$ to be zero. This has greater implications in equation (3) than for the normal BRMA model of equation (1). The maximum likelihood estimator for equation (1) involves the determinant of δ _{ i }+ $\widehat{\mathbf{\Omega}}$ in each study, which will not be zero unless the withinstudy correlations are also +1 or 1. However, in equation (3) the maximum likelihood estimator involves the determinant of $\widehat{\mathbf{\Omega}}$ itself, which causes problems akin to dividing by zero, which is why spurious estimates and standard errors are produced. It is clear that there is simply little information to estimate the betweenstudy correlation for the telomerase dataset, due to the small number of studies. This issue is also evident in our n = 10 simulations of the generalised BRMA model (Table 5), where 397 of the 1000 simulations did not converge appropriately. In such situations where estimating ρ _{ B }is difficult, application of two generalised URMAs may be the most appropriate option available (Table 3), although specifically for diagnostic studies other methods may also be valuable [21].
Discussion
Multivariate metaanalysis models are increasingly used to synthesise multiple, correlated endpoints of interest, especially in studies of diagnosis [5, 21] and surrogate outcomes [7, 8]. The Campbell Collaboration suggests that metaanalysts 'should not ignore the dependence among study outcomes'; however, they also note that 'the consequences of accounting for (modelling) dependence or ignoring it are not well understood' [22]. To therefore aid practitioners considering the approach, in this paper we have examined two models for BRMA and compared them to separate univariate syntheses, the traditional approach. We now discuss the main conclusions from our work and suggest future research priorities.
The general normal model for BRMA
A normal metaanalysis model is appropriate when the Y _{ ij }s can be assumed normally distributed; this assumption is commonly used, for example where the Y _{ ij }s are logodds ratios [15], loghazard ratios [10], mean differences [1] and logevent rates [11].
Betweenstudy covariance parameters
It is clear from our work that maximum likelihood estimation of a normal randomeffects metaanalysis model will often truncate the betweenstudy covariance matrix, Ω, on the boundary of its parameter space. For URMA this is observed by a betweenstudy variance estimate of zero, whilst in BRMA it is more likely observed by a betweenstudy correlation of +1 or 1. Practitioners are likely to be familiar with the concept of zero variance, but are perhaps less likely to appreciate why a correlation is estimated at unity. However, both arise for the same reason, namely that Ω must be a nonnegative definite matrix such that ${\tau}_{j}^{2}$ is not < 0 and ρ _{ B }is not > +1 or < 1. Our simulations show that, especially when the number of studies is small and/or the withinstudy variance is large relative to the betweenstudy variance, such truncation is often necessary to ensure the sensible restrictions are met. In the normal BRMA, we have also shown that a consequence of $\widehat{\rho}$ _{ B }being truncated at +1 or 1 is an upward bias in betweenstudy variance estimates, which are inflated upwards to compensate for the restriction on $\widehat{\rho}$ _{ B }. Practitioners should not, though, be overly concerned by this. We have shown it does not cause any systematic bias in the pooled estimates from BRMA, and it leads to conservative standard errors and meansquare errors.
The benefits over URMA for the pooled estimates
Our simulation results highlight that a normal model for BRMA is preferable to two separate URMAs for estimating the pooled endpoints, and our results are consistent with previous findings that show how the inclusion of correlation allows the 'borrowing of strength' across endpoints [1, 10, 19]. We thus recommend practitioners use a BRMA rather than two separate URMAs where possible. In particular, when some data are missing at random the BRMA is likely to produce a much smaller standard error and meansquare error of pooled estimates than URMA, even for moderate correlations. Riley et al. [10] give an applied example that shows this. For complete data, practitioners should not expect to see much gain in statistical efficiency over URMA; the meansquare error and standard error of pooled estimates are generally only marginally smaller in BRMA than URMA, and on the occasion of $\widehat{\rho}$ _{ B }= +1 or 1 they may even be slightly worse in BRMA (due to the inflated betweenstudy variances, as in the telomerase and CD4 examples). However, there are broader reasons why BRMA may still be preferable in this situation (see below).
The generalised model for BRMA
In equation (3) we extended our work to a generalised BRMA model for metaanalysis of two proportions. For synthesis of two proportions, like sensitivity and specificity, this approach is preferable to the general normal BRMA model (Table 5) because the normality assumption breaks down when the proportions are close to 0 or 1 and when there are small patient numbers [12]. It also avoids the use of adhoc continuity corrections when there are zero cells in some studies. Practitioners synthesising diagnostic studies are thus encouraged to use the generalised BRMA model, rather than the normal model or indeed two separate generalised URMAs. However, they should also be aware that a betweenstudy correlation estimate of +1 or 1 in the generalised BRMA model is likely to be associated with nonconvergence and unstable pooled estimates, as discussed for the telomerase data. In such situations there may be little information to estimate the correlation, and so practitioners may wish to consider other methods for synthesising diagnostic studies, such as the hierarchical summary receiver operating characteristic (HSROC) method [21]; if this is also not possible then the best option may be a generalised URMA for sensitivity and specificity separately (Hamza et al., personal communication).
The broader benefits of BRMA
Our simulations focused mainly on the benefits of BRMA over URMA for estimating the pooled endpoints. However, there are also broader reasons why a BRMA may be preferable to URMA, for either complete or missing data. For example, BRMA allows one to describe the bivariate relationship between endpoints [4, 5], model, test or make predictions from their association [7], and estimate some function of the two pooled endpoints, like β _{1}  β _{2} [10], β _{1} + β _{2} [5], or β _{1}/β _{2} [6]. For instance, for the telomerase data, a BRMA enables a single framework to estimate the pooled sensitivity, pooled specificity, and the pooled diagnostic odds ratio (exp(β _{1} + β _{2})) [21]. Furthermore, Reitsma et al. [5] show that a BRMA of diagnostic studies enables the correlation between pooled endpoints to be estimated, which allows one to measure the shape of their bivariate relation and construct confidence ellipses. It also allows calculation of the conditional variance in one parameter given a fixed value of the other parameter, and allows drawing of the summary ROC curve. In terms of the CD4 data, the estimated correlation between pooled endpoints from a BRMA enables one to predict the time of onset of AIDS or death from a future patient's CD4 level. Thus the BRMA can help establish whether CD4 should be used as a surrogate of diseasefree survival [7, 8], and further research of multivariate metaanalysis in this context is recommended. A BRMA may also be extended to a bivariate metaregression by including additional studylevel covariates that explain the betweenstudy heterogeneity. For example, one may wish to include a covariate for study quality in metaanalysis of diagnostic studies [23]. Berkey et al [1] show that a bivariate metaregression is more efficient than separate univariate metaregressions for assessing such studylevel covariates, again due to the inclusion of correlation.
Further research suggestions
For the general normal BRMA model, the role of estimation techniques other than REML would be interesting to consider, especially as other potentially better options have just been proposed [24]. For the generalised BRMA model, SAS NLMIXED currently only allows maximum likelihood estimation and so extension to REML is required, especially as the maximum likelihood estimates are not without bias (Table 5). Further research of multivariate metaanalysis within a Bayesian framework is also potentially important, as it would enable the incorporation of prior knowledge about the parameters, which may be valuable when the number of studies is small [15, 25]. It would also allow the uncertainty of the ${s}_{i1}^{2}$ s, ${s}_{i2}^{2}$ s and ρ _{ Wi }s to be taken into account, as in practice they will only be estimates themselves as mentioned by Daniels and Hughes [7]. Further assessment of the role of the withinstudy correlations is also required, in particular what should we do when they are nonzero but unavailable? For the metaanalysis of surrogate endpoints it has been suggested that the withinstudy correlations are likely to be small (between 0 and 0.2) and can plausibly be considered constant across studies [7], or even zero [26]. However, this is not necessarily true in other fields; for example, in a multivariate metaanalysis of longitudinal data the withinstudy correlations varied between 0.48 and 0.97 (Jones et al., personal communication).
The use of individual patient data (IPD) in multivariate metaanalysis should also be considered, especially as IPD is the goldstandard for metaanalysis [27] and it would allow any unavailable withinstudy correlations to be calculated directly [7]. In practice though, IPD may only be available for a proportion of studies, and so methods for multivariate metaanalysis are required that combine IPD and aggregate data [27, 28]. There has also be little consideration of how to assess dissemination bias using a multivariate metaanalysis framework [29, 30], and this warrants attention as metaanalysis datasets are often fraught with such issues as publication bias [31] and withinstudy selective reporting [32, 33]. In such scenarios some of the missing endpoints may not be missing at random, and so sensitivity analyses to assess how the metaanalysis results change under a variety of missing data assumptions would be potentially valuable [34].
Conclusion
In this paper we have used analytic reasoning, two applied examples and a realistic simulation study to highlight the benefits of a normal model for BRMA over two separate URMAs, and explain why the betweenstudy correlation is often estimated as +1 or 1. For metaanalysis of proportions, we also extended our work to a generalised model for BRMA, to ensure the binary data is modelled correctly. Our work adds to a growing body of literature indicating the rationale and benefits of a multivariate approach to metaanalysis, and we encourage metaanalysts to consider the approach in practice.
Abbreviations
 BRMA:

Bivariate randomeffects metaanalysis
 IPD:

Individual patient data
 URMA:

Univariate randomeffects metaanalysis
 REML:

Restricted maximum likelihood
Declarations
Acknowledgements
Richard Riley is funded by the Department of Health National Coordinating Centre for Research Capacity Development as a Postdoctoral Research Scientist in Evidence Synthesis. We would like to thank the three reviewers of our manuscript, whose comments have substantially improved the content of this paper. We also greatly thank Haitao Chu, who kindly provided SAS simulation code that enabled us to assess the generalised bivariate model.
Authors’ Affiliations
References
 Berkey CS, Hoaglin DC, AntczakBouckoms A, Mosteller F, Colditz GA: Metaanalysis of multiple outcomes by regression with random effects. Stat Med. 1998, 17 (22): 25372550. 10.1002/(SICI)10970258(19981130)17:22<2537::AIDSIM953>3.0.CO;2C.View ArticlePubMedGoogle Scholar
 Hasselblad V: Metaanalysis of multitreatment studies. Med Decis Making. 1998, 18 (1): 3743.View ArticlePubMedGoogle Scholar
 Becker BJ, Tinsley HEA, Brown S: . Multivariate Metaanalysis. 2000, San Diego , Academic PressView ArticleGoogle Scholar
 Van Houwelingen HC, Arends LR, Stijnen T: Advanced methods in metaanalysis: multivariate approach and metaregression. Stat Med. 2002, 21 (4): 589624. 10.1002/sim.1040.View ArticlePubMedGoogle Scholar
 Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH: Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005, 58 (10): 982990. 10.1016/j.jclinepi.2005.02.022.View ArticlePubMedGoogle Scholar
 Thompson JR, Minelli C, Abrams KR, Tobin MD, Riley RD: Metaanalysis of genetic studies using Mendelian randomizationa multivariate approach. Stat Med. 2005, 24 (14): 22412254. 10.1002/sim.2100.View ArticlePubMedGoogle Scholar
 Daniels MJ, Hughes MD: Metaanalysis for the evaluation of potential surrogate markers. Stat Med. 1997, 16 (17): 19651982. 10.1002/(SICI)10970258(19970915)16:17<1965::AIDSIM630>3.0.CO;2M.View ArticlePubMedGoogle Scholar
 Gail MH, Pfeiffer R, Van Houwelingen HC, Carroll RJ: On metaanalytic assessment of surrogate outcomes. Biostatistics. 2000, 1 (3): 231246. 10.1093/biostatistics/1.3.231.View ArticlePubMedGoogle Scholar
 Kalaian HA, Raudenbush SW: A Multivariate Mixed Linear Model for MetaAnalysis. Psychological Methods. 1996, 1(3): 227235. 10.1037/1082989X.1.3.227.View ArticleGoogle Scholar
 Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR: An evaluation of bivariate randomeffects metaanalysis for the joint synthesis of two correlated outcomes. Statistics in Medicine. 2007, 26: 7897. 10.1002/sim.2524.View ArticlePubMedGoogle Scholar
 Arends LR, Voko Z, Stijnen T: Combining multiple outcome measures in a metaanalysis: an application. Stat Med. 2003, 22 (8): 13351353. 10.1002/sim.1370.View ArticlePubMedGoogle Scholar
 Chu H, Cole SR: Bivariate metaanalysis for sensitivity and specificity with sparse data: a generalized linear mixed model approach (letter to the Editor). Journal of Clinical Epidemiology. 2006, (in press):Google Scholar
 Glas AS, Roos D, Deutekom M, Zwinderman AH, Bossuyt PM, Kurth KH: Tumor markers in the diagnosis of primary bladder cancer. A systematic review. J Urol. 2003, 169 (6): 19751982. 10.1097/01.ju.0000067461.30468.6d.View ArticlePubMedGoogle Scholar
 Hardy RJ, Thompson SG: A likelihood approach to metaanalysis with random effects. Stat Med. 1996, 15 (6): 619629. 10.1002/(SICI)10970258(19960330)15:6<619::AIDSIM188>3.0.CO;2A.View ArticlePubMedGoogle Scholar
 Nam IS, Mengersen K, Garthwaite P: Multivariate metaanalysis. Stat Med. 2003, 22 (14): 23092333. 10.1002/sim.1410.View ArticlePubMedGoogle Scholar
 Berrington A, Cox DR: Generalized least squares for the synthesis of correlated information. Biostatistics. 2003, 4 (3): 423431. 10.1093/biostatistics/4.3.423.View ArticlePubMedGoogle Scholar
 Raudenbush SW, Becker BJ, Kalaian H: Modeling multivariate effect sizes. Psychological Bulletin. 1988, 103 (1): 111120. 10.1037/00332909.103.1.111.View ArticleGoogle Scholar
 Gentle JE, Gentle JE: Cholesky Factorization. Numerical Linear Algebra for Applications in Statistics. 1998, Berlin , SpringerVerlag, 9395.View ArticleGoogle Scholar
 Sohn SY: Multivariate metaanalysis with potentially correlated marketing study results. Naval Research Logistics. 2000, 47: 500510. 10.1002/15206750(200009)47:6<500::AIDNAV3>3.0.CO;2Z.View ArticleGoogle Scholar
 Follmann DA, Proschan MA: Valid inference in random effects metaanalysis. Biometrics. 1999, 55 (3): 732737. 10.1111/j.0006341X.1999.00732.x.View ArticlePubMedGoogle Scholar
 Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JA: A unification of models for metaanalysis of diagnostic accuracy studies. Biostatistics. 2006Google Scholar
 Becker BJ, Hedges LV, Pigott TD: Campbell Collaboration Statistical Analysis Policy Brief. A Campbell Collaboration resource document (available at http://wwwcampbellcollaborationorg/ECG/policy_statasp). 2004Google Scholar
 Westwood ME, Whiting PF, Kleijnen J: How does study quality affect the results of a diagnostic metaanalysis?. BMC Med Res Methodol. 2005, 5 (1): 2010.1186/14712288520.View ArticlePubMedPubMed CentralGoogle Scholar
 Sidik K, Jonkman JN: A comparison of heterogeneity variance estimators in combining results of studies. Stat Med (in press).
 Abrams KR, Sutton AJ, Cooper NJ, Sculpher MPS, Ginlley L, Robinson M: Populating economic decision models using metaanalyses of heterogenously reported studies augmented with expert beliefs. Technical Report 0506, Centre for Biostatistics and Genetic Epidemiology, University of Leicester. 2005Google Scholar
 Korn EL, Albert PS, McShane LM: Assessing surrogates as trial endpoints using mixed models. Stat Med. 2005, 24 (2): 163182. 10.1002/sim.1779.View ArticlePubMedGoogle Scholar
 Riley RD, Look MP, Simmonds MC: Combining individual patient data and aggregate data in evidence synthesis: a systematic review identified current practice and possible methods. Journal of Clinical Epidemiology (in press). 2007Google Scholar
 Goldstein H, Yang M, Omar RZ, Turner RM, Thompson SG: Metaanalysis using multilevel models with an application to the study of class size effects. Applied Statistics. 2000, 49: 399412.Google Scholar
 Riley RD, Sutton AJ, Abrams KR, Lambert PC: Sensitivity analyses allowed more appropriate and reliable metaanalysis conclusions for multiple outcomes when missing data was present. Journal of Clinical Epidemiology. 2004, 57(9): 911924. 10.1016/j.jclinepi.2004.01.018.View ArticleGoogle Scholar
 Jackson D, Copas J, Sutton AJ: Modelling reporting bias: the operative mortality rate for ruptured abdominal aortic aneurysm repair . Journal of the Royal Statistical Society, Series A. 2005, 168: 737752.View ArticleGoogle Scholar
 Sterne JA, Egger M, Smith GD: Systematic reviews in health care: Investigating and dealing with publication and other biases in metaanalysis. BMJ. 2001, 323 (7304): 101105. 10.1136/bmj.323.7304.101.View ArticlePubMedPubMed CentralGoogle Scholar
 Hahn S, Williamson PR, Hutton JL, Garner P, Flynn EV: Assessing the potential for bias in metaanalysis due to selective reporting of subgroup analyses within studies. Stat Med. 2000, 19 (24): 33253336. 10.1002/10970258(20001230)19:24<3325::AIDSIM827>3.0.CO;2D.View ArticlePubMedGoogle Scholar
 Hutton JL, Williamson PR: Bias in metaanalysis due to outcome variable selection within studies. Appl Stat. 2000, 49: 359370.Google Scholar
 Copas J, Shi JQ: Metaanalysis, funnel plots and sensitivity analysis. Biostatistics. 2000, 1 (3): 247262. 10.1093/biostatistics/1.3.247.View ArticlePubMedGoogle Scholar
 The prepublication history for this paper can be accessed here:http://www.biomedcentral.com/14712288/7/3/prepub
Prepublication history
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.