Quantifying, displaying and accounting for heterogeneity in the meta-analysis of RCTs using standard and generalised Q statistics
 Jack Bowden^{1, 2},
 Jayne F Tierney^{1},
 Andrew J Copas^{1} and
 Sarah Burdett^{1}
DOI: 10.1186/1471-2288-11-41
© Bowden et al; licensee BioMed Central Ltd. 2011
Received: 26 November 2010
Accepted: 07 April 2011
Published: 07 April 2011
Abstract
Background
Clinical researchers have often preferred to use a fixed effects model for the primary interpretation of a meta-analysis. Heterogeneity is usually assessed via the well-known Q and I ^{2} statistics, along with the random effects estimate they imply. In recent years, alternative methods for quantifying heterogeneity have been proposed that are based on a 'generalised' Q statistic.
Methods
We review 18 IPD meta-analyses of RCTs into treatments for cancer, in order to quantify the amount of heterogeneity present and also to discuss practical methods for explaining heterogeneity.
Results
Differing results were obtained when the standard Q and I ^{2} statistics were used to test for the presence of heterogeneity. The two meta-analyses with the largest amount of heterogeneity were investigated further, and on inspection the straightforward application of a random effects model was not deemed appropriate. Compared to the standard Q statistic, the generalised Q statistic provided a more accurate platform for estimating the amount of heterogeneity in the 18 meta-analyses.
Conclusions
Explaining heterogeneity via the pre-specification of trial subgroups, graphical diagnostic tools and sensitivity analyses produced a more desirable outcome than an automatic application of the random effects model. Generalised Q statistic methods for quantifying and adjusting for heterogeneity should be incorporated as standard into statistical software. Software is provided to help achieve this aim.
Background
Meta-analysis provides a way of quantitatively synthesising the results of medical studies or trials that target a particular research question. As shown in a 2005 review of the clinical research literature [1], it is still most common to meta-analyse results across clinical studies using the inverse variance approach, to yield a 'fixed' or 'common' effect estimate. By obtaining individual patient data (IPD) from all trials in a meta-analysis, some aspects of clinical heterogeneity can be minimised through data cleaning [2]. However, regardless of whether the meta-analysis is based on IPD or aggregate data, substantial statistical heterogeneity between studies may still remain.
Cochran's Q statistic has long been used to assess statistical heterogeneity in meta-analysis. When Q is larger than its expected value E[Q] under the null hypothesis of no heterogeneity, the difference Q - E[Q] can be used to furnish the most popular estimate of the heterogeneity parameter, using the DerSimonian and Laird method [3]. Higgins and Thompson's I ^{2} statistic [4, 5] is also a simple function of Q and quantifies the proportion of total variation that is between-trial heterogeneity. Unlike Q, I ^{2} is designed to be independent of the number of trials constituting the meta-analysis and independent of the outcome's scale, so it can easily be compared across meta-analyses. It is now reported as standard, with or without Cochran's Q.
The presence of significant and substantial heterogeneity demands some form of action. Ideally, after exploration of the data, heterogeneity can be explained by variation in the constituent trials' characteristics. If this is not possible then some may feel a meta-analysis is inappropriate altogether, whereas others would opt for fitting a random effects model to the data instead. There is no accepted rule for deciding when a move from a fixed to a random effects model is the right course of action [6]. Clearly, all other things being equal, the larger the magnitude of the heterogeneity the stronger the case for a shift. However, as the amount of heterogeneity increases, so too does the potential impact of moving from one model to the other. Thus, with increasingly diverging interpretations, it is sometimes very difficult to make a satisfactory decision on which model to choose, or indeed whether to pool the trials in a meta-analysis at all.
In Methods we review the standard approach to meta-analysis and heterogeneity quantification based on the Q statistic. We then introduce a similar approach based on a 'generalised Q' statistic that has recently been proposed. In Results we analyse the summary data from 18 separate IPD meta-analyses to see whether the original conclusions could have been sensitive to the choice of fixed or random effects model. A more in-depth analysis is then conducted on the two meta-analyses with the largest observed heterogeneity. The 18 meta-analyses are then used to illustrate the relative performance of the standard and generalised Q statistics in measuring the extent of heterogeneity present. Finally, in Discussion and Conclusions we review the issues raised and offer recommendations for the future quantification and reporting of heterogeneity in meta-analysis.
The data
Table 1: Summary statistics for the 18 meta-analyses carried out by the MAG.
Meta-analysis  # trials  Q, P-value  I ^{2} (%)  Fixed Effect HR (CI) P-value

cervix 1 [15]  18  44.48, 0.00  62  1.05 (0.93-1.19) 0.39
cervix 2 [17]  18  20.83, 0.23  18  0.76 (0.67-0.85) 0.00
cervix 3 [15]  5  9.18, 0.06  56  0.65 (0.53-0.80) 0.00
bladder 1 [14]  9  7.27, 0.51  0  0.91 (0.83-1.01) 0.08
bladder 2 [16]  6  2.25, 0.81  0  0.75 (0.60-0.96) 0.02
nsclc 1 [8]  17  28.98, 0.02  45  1.04 (0.96-1.12) 0.33
nsclc 2 [8]  7  3.63, 0.73  0  0.98 (0.83-1.14) 0.76
nsclc 3 [8]  25  22.32, 0.56  0  0.90 (0.83-0.97) 0.01
nsclc 4 [8]  11  39.63, 0.00  75  0.84 (0.74-0.95) 0.01
ovarian 1 [7]  19  21.92, 0.24  18  0.98 (0.91-1.06) 0.69
ovarian 2 [7]  11  12.83, 0.23  22  0.93 (0.83-1.05) 0.23
ovarian 3 [10]  9  14.78, 0.06  46  0.88 (0.79-0.98) 0.02
ovarian 4 [10]  9  10.35, 0.24  23  0.91 (0.80-1.05) 0.21
ovarian 5 [10]  12  2.57, 1.00  0  1.02 (0.93-1.12) 0.66
port [11]  9  13.06, 0.11  39  1.21 (1.08-1.34) 0.00
sarcoma [9]  14  11.80, 0.54  0  0.89 (0.76-1.03) 0.12
oeso [12]  6  10.37, 0.07  52  0.89 (0.78-1.01) 0.06
glioma [13]  12  13.29, 0.27  17  0.85 (0.78-0.92) 0.00
Methods
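The random effects model for the effect estimate y _{ i } from study i (i = 1, ..., M) is

$$y_i = \theta + u_i + \epsilon_i. \qquad (1)$$

Here θ is the overall mean effect and u _{ i } is a study-specific deviation with mean zero and variance τ ^{2}, so that τ ^{2} = 0 corresponds to no heterogeneity. The ϵ _{ i } term relates to the precision of study i's estimate, and is assumed to follow a N(0, σ _{ i } ^{2}) distribution. The fixed effect estimate is the inverse variance weighted mean

$$\hat{\theta}_{FE} = \frac{\sum_{i=1}^{M} W_i y_i}{\sum_{i=1}^{M} W_i}, \qquad (2)$$

where W _{ i } = 1/σ _{ i } ^{2} is study i's precision.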
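Heterogeneity quantification using the standard Q statistic
Cochran's Q statistic is

$$Q = \sum_{i=1}^{M} W_i (y_i - \hat{\theta}_{FE})^2, \qquad (3)$$

which follows a $\chi^{2}_{M-1}$ distribution under the null hypothesis of no heterogeneity, so that E[Q] = M - 1. The DerSimonian and Laird estimate of τ ^{2} is

$$\hat{\tau}^{2}_{DL} = \max\left\{0,\ \frac{Q - (M-1)}{\sum W_i - \sum W_i^{2} / \sum W_i}\right\}$$

and the I ^{2} statistic it implies is

$$I^{2} = \frac{\hat{\tau}^{2}_{DL}}{\hat{\tau}^{2}_{DL} + s^{2}},$$

where $s^{2} = \frac{(M-1)\sum W_i}{(\sum W_i)^{2} - \sum W_i^{2}}$, and is referred to as the 'typical' within study variance. This simplifies to

$$I^{2} = \frac{Q - (M-1)}{Q}$$

when Q > M - 1, and I ^{2} = 0 otherwise.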
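As a worked check of the Q-based form of I ^{2}, the following sketch uses the Cervix 1 row of the summary table (Q = 44.48 from M = 18 trials). It assumes that the tabulated percentage was obtained from this formula, which the arithmetic appears to bear out:

```latex
I^{2} = \frac{Q - (M-1)}{Q} = \frac{44.48 - 17}{44.48} = \frac{27.48}{44.48} \approx 0.62
```

This agrees with the 62% reported for that meta-analysis. By the same rule, Bladder 1 (Q = 7.27, M = 9) has Q below M - 1 = 8, so its I ^{2} is set to 0.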
From a philosophical perspective, fixed effect and random effects estimates target very different quantities. Fixed effect models estimate the weighted mean of the study estimates, whereas random effects models estimate the mean of a distribution from which the study estimates were sampled. However, if model (1) is correct and we are additionally willing to assume that the u _{ i } terms are independent of the ϵ _{ i } terms, then they should both provide estimates of the same parameter θ. Another consequence of this independence assumption is that the individual study estimates should be independent of the ϵ _{ i } terms, and hence we do not expect the magnitude of the effect estimate to be correlated with its precision.
Heterogeneity quantification using a 'generalised' Q statistic
The generalised Q statistic is

$$Q(\tau^{2}) = \sum_{i=1}^{M} W_i^{*} (y_i - \hat{\theta}_{RE})^{2}, \qquad (5)$$

where $W_i^{*} = 1/(\sigma_i^{2} + \tau^{2})$ and where $\hat{\theta}_{RE}$ is also calculated from equation (2) by replacing W _{ i } with $W_i^{*}$. Like the standard Q statistic in equation (3), this also follows a $\chi^{2}_{M-1}$ distribution under the null hypothesis of no heterogeneity. Paule and Mandel [23] (PM) and DerSimonian and Kacker [22] propose to estimate τ ^{2} by iterating equation (5) until Q(τ ^{2}) equals its expected value of M - 1; this estimate will be referred to as $\hat{\tau}^{2}_{PM}$. DerSimonian and Kacker recommend using $\hat{\tau}^{2}_{PM}$ since it is still very easy to obtain, is guaranteed to have at most one solution and provides a more accurate estimate of τ ^{2} that closely mirrors both the REML estimate and the generalized Bayes estimate [24], both of which are much harder to obtain computationally.
Viechtbauer [25] suggests that equation (5) can additionally be used to provide an α-level confidence set for τ ^{2}, by finding the values of τ ^{2} that equate Q(τ ^{2}) with the α/2th and (1 - α/2)th percentiles of the $\chi^{2}_{M-1}$ distribution. He showed that this method performed very well in a simulation study that evaluated its coverage properties compared to a range of other methods, such as those of Biggerstaff and Tweedie [26] and Sidik and Jonkman [27], primarily because it is based on an exact χ ^{2} distribution, rather than a distributional approximation.
The I ^{2} statistic can be defined as

$$I^{2} = \frac{\hat{\tau}^{2}}{\hat{\tau}^{2} + s^{2}}$$

for any estimate $\hat{\tau}^{2}$ of the between study variance. From now on we will refer to Inconsistency statistics specifically utilising the DL method as $I^{2}_{DL}$ and those specifically utilising the PM method as $I^{2}_{PM}$. The term I ^{2} will be reserved for discussing the general concept of Inconsistency.
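Reference intervals for $I^{2}_{DL}$ and $I^{2}_{PM}$
A reference interval for $I^{2}_{PM}$ follows by substituting the limits of the confidence set for τ ^{2} into the I ^{2} formula, with s ^{2} the 'typical' within study variance:

$$\left(\frac{\tau^{2}_{1-\alpha/2}}{\tau^{2}_{1-\alpha/2} + s^{2}},\ \frac{\tau^{2}_{\alpha/2}}{\tau^{2}_{\alpha/2} + s^{2}}\right), \qquad (6)$$

where $\tau^{2}_{\alpha/2}$ and $\tau^{2}_{1-\alpha/2}$ represent the values of τ ^{2} equating Q(τ ^{2}) to the lower α/2 and upper 1 - α/2 percentiles of the relevant χ ^{2} distribution. Because Q(τ ^{2}) decreases as τ ^{2} increases, the upper percentile yields the lower limit.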
Results
A standard Q statistic analysis
One could use the reference intervals around $I^{2}_{DL}$ to directly test for the presence of heterogeneity, as opposed to Q; a strategy suggested by Huedo-Medina et al. [28]. From Figure 1 we see that only 3 out of the 7 meta-analyses with significant Q statistics produced significant $I^{2}_{DL}$ statistics at the 10% level. Since Q and $I^{2}_{DL}$ are so closely related it is perhaps surprising to some reviewers that such differing conclusions could arise.
From Figure 1, the two meta-analyses with the most apparent statistical heterogeneity were NSCLC 4 [8] and Cervix 1 [15]. They also exhibit the most marked differences between their fixed and random effects estimates, as highlighted by large deviations from the diagonal (shown in red in Figure 2). These two meta-analyses are now discussed further, in order to demonstrate how we chose to investigate these heterogeneous data sets.
The NSCLC 4 meta-analysis
This meta-analysis compared the effectiveness of supportive care plus chemotherapy versus supportive care alone for patients with advanced non-small cell lung cancer. The fixed effect hazard ratio estimate of 0.84 suggests a substantial and highly significant benefit from the addition of chemotherapy, with a p-value for a null effect of 0.005. The random effects model estimate of 0.77 suggested an even more extreme benefit of chemotherapy. However, such was the magnitude of heterogeneity detected (an I ^{2} of 75%) that this estimate is attributed much less certainty, with a p-value of 0.04.
Table 2: Subgroup analyses for the two examples.
Trial Group  # trials  Q (P-value) I ^{2} (%)  Fixed Effect HR (CI) P-value  Random Effects HR (CI) P-value

NSCLC data
all  11  39.6 (1.97e-05) 74.8  0.84 (0.74-0.95) 5.42e-03  0.77 (0.59-0.99) 0.042
Cisplatin  8  22.2 (2.34e-03) 68.5  0.73 (0.63-0.85) 6.63e-05  0.70 (0.53-0.93) 0.014
Q _{ int } = 39.62 - (22.20 + 8.72) = 8.70 (p = 0.003)
all*  11  0.84 (0.61-1.16) 0.21
Cervix data
>14 days  11  12.76 (0.24) 22  1.25 (1.07-1.46) 0.005  1.27 (1.06-1.53) 0.0099
≤ 14 days  7  20.74 (0.002) 71  0.83 (0.69-1.00) 0.046  0.87 (0.60-1.25) 0.44
Q _{ int } = 44.48 - (12.76 + 20.74) = 10.98 (p = 9e-04)
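The Q _{ int } rows in the table above can be checked with a few lines of R. This is a minimal sketch, assuming that Q _{ int } is the total Q minus the sum of the within-subgroup Q values and that it is referred to a χ ^{2} distribution with one degree of freedom (two subgroups minus one); the helper name q_int is hypothetical.

```r
# Between-subgroup heterogeneity: Q for all trials minus the sum of the
# within-subgroup Q values (q_int is a hypothetical helper name)
q_int <- function(q_total, q_within) q_total - sum(q_within)

# Q values copied from the subgroup table
q_int_nsclc  <- q_int(39.62, c(22.20, 8.72))
q_int_cervix <- q_int(44.48, c(12.76, 20.74))

# Assumption: two subgroups per meta-analysis, hence 2 - 1 = 1 degree of freedom
p_nsclc  <- pchisq(q_int_nsclc,  df = 1, lower.tail = FALSE)
p_cervix <- pchisq(q_int_cervix, df = 1, lower.tail = FALSE)

print(round(c(nsclc = q_int_nsclc, cervix = q_int_cervix), 2))
print(signif(c(nsclc = p_nsclc, cervix = p_cervix), 1))
```

Under these assumptions the resulting p-values round to the 0.003 and 9e-04 quoted in the table.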
The Cervix 1 meta-analysis
Significant heterogeneity persisted in the results of trials using shorter chemotherapy cycles. The fixed effect result suggested a modest benefit from short cycle chemotherapy, whereas the random effects model suggested less of an effect and a much wider confidence interval overlapping the null effect of 1. However, our conclusions were also guided by a sensitivity analysis of the shorter duration trials, excluding the MRC CeCa trial. Figure 4 (right) shows a Baujat plot [32, 33] of the data; on the horizontal axis is the contribution of each study to the overall Q statistic in equation (3), and on the vertical axis is the difference between the fixed effect estimate with and without each study, standardised by the total variance of the fixed effect estimate without that study. If the fixed effect model is correct, each point's horizontal component should be approximately $\chi^{2}_{1}$ distributed. The CeCa trial lies far from all the others, whereas the other trials all fall within the 95th percentile of this distribution. Thus the total heterogeneity present is very much a product of this single trial. Furthermore, the CeCa trial's large vertical component shows that its inclusion significantly alters the fixed effect estimate too. Excluding the CeCa trial gave a fixed effect result still favouring short cycle chemotherapy (HR = 0.76, 95% CI = 0.62-0.92) and heterogeneity was much reduced. Repeating the sensitivity analysis using a random effects model gave very similar results (HR = 0.75, 95% CI = 0.58-0.95).
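The two axes of a Baujat-type plot described above can be computed along the following lines. This is a sketch only: the data are hypothetical (not the Cervix 1 trials), the function name baujat_coords is invented for illustration, and the vertical axis is taken to be the squared change in the fixed effect estimate divided by the leave-one-out variance, which is an assumption about the exact standardisation.

```r
# Coordinates for a Baujat-type plot from effect estimates y and standard errors s
# (baujat_coords is a hypothetical helper name)
baujat_coords <- function(y, s) {
  w <- 1 / s^2
  theta_fe <- sum(w * y) / sum(w)
  # Horizontal axis: each study's contribution to Cochran's Q
  q_contrib <- w * (y - theta_fe)^2
  # Vertical axis (assumed form): squared change in the fixed effect estimate
  # when study i is left out, divided by the variance of the leave-one-out estimate
  influence <- sapply(seq_along(y), function(i) {
    theta_minus_i <- sum(w[-i] * y[-i]) / sum(w[-i])
    var_minus_i <- 1 / sum(w[-i])
    (theta_fe - theta_minus_i)^2 / var_minus_i
  })
  data.frame(q_contrib = q_contrib, influence = influence)
}

# Hypothetical log hazard ratios and standard errors; the last study is
# deliberately constructed as an outlier
y <- c(-0.20, -0.35, -0.10, -0.25, -0.30, 0.90)
s <- c(0.20, 0.25, 0.22, 0.30, 0.28, 0.24)
coords <- baujat_coords(y, s)

# Flag studies whose Q contribution exceeds the 95th percentile of chi-square(1)
coords$flagged <- coords$q_contrib > qchisq(0.95, df = 1)
print(coords)
```

In this constructed example only the final study is flagged, and it also has the largest influence on the pooled estimate, mirroring the pattern described for the CeCa trial.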
A generalised Q analysis
A simulation study
The point estimates and confidence intervals for $I^{2}_{PM}$ differ from the original $I^{2}_{DL}$; in particular the confidence intervals for $I^{2}_{PM}$ are noticeably wider. In order to see if this extra width truly reflected the uncertainty in the estimation of I ^{2}, or instead if it was over-conservative, we conducted twelve simulation studies, each one based on the characteristics of a meta-analysis which exhibited some heterogeneity (from 'Glioma' to 'NSCLC 4'). From each one we took the number of studies M, within study variances σ ^{2} and the DL heterogeneity estimate $\hat{\tau}^{2}_{DL}$. For meta-analysis j, j = 1, ..., 12, we then simulated 10,000 new meta-analyses of M _{ j } study estimates $y_i \sim N(0, \sigma_i^{2} + \hat{\tau}^{2}_{DL})$ for i = 1, ..., M _{ j }. The choice of θ = 0 is clearly unimportant. Since the within study variances and the true τ ^{2} values were held fixed, the true value of I ^{2} stayed fixed at the original value reported in Table 1. We then calculated the proportion of 95% reference intervals for $I^{2}_{DL}$ and $I^{2}_{PM}$ that contained the true value. Figure 6 (left) shows the results. Higgins and Thompson's reference interval appears to exhibit suboptimal coverage, which is especially clear when the true value of I ^{2} is large. Reference intervals for $I^{2}_{PM}$ based on equation (6) appear to maintain the desired coverage well across all 12 simulation scenarios.
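One scenario of such a coverage simulation might be sketched as follows. The within study variances and true τ ^{2} are hypothetical values rather than those of any of the 12 meta-analyses, only the generalised Q interval is evaluated, and root finding with uniroot is swapped in for the iterative scheme; the helper names gen_q and solve_tausq are invented for illustration.

```r
# Generalised Q statistic evaluated at a given tau^2
gen_q <- function(tausq, y, sigma2) {
  w <- 1 / (sigma2 + tausq)
  mu <- sum(w * y) / sum(w)
  sum(w * (y - mu)^2)
}

# Value of tau^2 solving Q(tau^2) = target, truncated at zero
solve_tausq <- function(target, y, sigma2, upper = 1e4) {
  if (gen_q(0, y, sigma2) <= target) return(0)
  uniroot(function(t) gen_q(t, y, sigma2) - target,
          lower = 0, upper = upper, tol = 1e-10)$root
}

# Hypothetical scenario: 10 studies, fixed within study variances, fixed true tau^2
set.seed(1)
sigma2 <- c(0.02, 0.03, 0.05, 0.04, 0.08, 0.10, 0.06, 0.03, 0.12, 0.07)
tausq_true <- 0.05
M <- length(sigma2)
w <- 1 / sigma2
typ_s2 <- (M - 1) * sum(w) / (sum(w)^2 - sum(w^2))  # 'typical' within study variance
isq_true <- tausq_true / (tausq_true + typ_s2)

n_sim <- 1000   # far fewer replicates than the 10,000 used per scenario in the text
alpha <- 0.05
covered <- logical(n_sim)
for (r in seq_len(n_sim)) {
  # Study estimates simulated with theta = 0
  y <- rnorm(M, mean = 0, sd = sqrt(sigma2 + tausq_true))
  # The upper chi-square percentile gives the lower limit for tau^2, and vice versa
  t_lo <- solve_tausq(qchisq(1 - alpha / 2, M - 1), y, sigma2)
  t_hi <- solve_tausq(qchisq(alpha / 2, M - 1), y, sigma2)
  isq_lo <- t_lo / (t_lo + typ_s2)
  isq_hi <- t_hi / (t_hi + typ_s2)
  covered[r] <- isq_lo <= isq_true && isq_true <= isq_hi
}
coverage <- mean(covered)
print(coverage)
```

Because the interval is built from an exact χ ^{2} pivot when the data are normal with known within study variances, the estimated coverage should sit close to the nominal 95%, up to Monte Carlo error.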
Discussion
NSCLC 4 and Cervix 1
As mentioned in Methods, in the presence of heterogeneity we still expect fixed and random effects estimates to be targeting a single quantity. However, in Results the two meta-analyses with the largest heterogeneity also showed the largest empirical differences between $\hat{\theta}_{FE}$ and $\hat{\theta}_{RE}$. The NSCLC 4 data was a good example of this, being the meta-analysis with the largest outward heterogeneity, but also with clear funnel plot asymmetry. If we had been ignorant as to the type of chemotherapy used in each study, and therefore had no way of explaining the heterogeneity, we would perhaps have considered applying a random effects model, despite suspecting small study effects. Random effects estimation in this context can start to look considerably less attractive, because $\hat{\theta}_{RE}$ gives more (rather than less) relative weight to the smaller studies than $\hat{\theta}_{FE}$ since for any study i, W _{ i } ≥ $W_i^{*}$, a fact first highlighted by Greenland [34]. This has led some to propose bias adjustment procedures to counteract small study effects [35–37]. Henmi and Copas [38] have recently advocated an interesting compromise: to use the fixed effects point estimate, which is robust to small study effects, but surrounded by a confidence interval derived under the random effects model. As shown in Table 2, when applied to all studies in the NSCLC 4 meta-analysis this puts a 95% confidence interval of (0.61-1.16) around $\hat{\theta}_{FE}$ = 0.84, with an associated p-value of 0.21, bringing the treatment's benefit severely into doubt. Fortunately, we were able to plausibly explain most of the asymmetry present by the differing types of chemotherapy regimens used, providing a much more useful answer with added clinical insight.
For the Cervix 1 meta-analysis, stratifying the trials by chemotherapy cycle duration helped to partially explain the heterogeneity. Again, in doing so it raised interesting clinical questions about the effective treatment of this cancer. The remaining heterogeneity present in the short cycle chemotherapy trials was removed by excluding an outlying study in a sensitivity analysis, guided by the results of a Baujat plot. Throwing data away is generally frowned upon by statisticians, and more sophisticated methods for incorporating so-called 'outliers' have been proposed [39]. However, for small outlying studies this strategy is clearly a convenient and effective option. We could find no explanation for the extreme effect found by the CeCa trial in its design or patient population, but it is perhaps worth noting that, along with the PMG and LGOG trials, its results were never published in a peer-reviewed journal. Clearly, one of the advantages of a meta-analysis is to bring together the totality of evidence, including especially trials whose results were not fully disseminated in the past. We do not know if the extreme results observed specifically in the CeCa and LGOG trials influenced their original non-publication, but it is certainly worrying that the overall picture of evidence is far easier to interpret in their absence.
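The weighting point attributed to Greenland can be illustrated numerically. The sketch below uses hypothetical within study variances and an assumed between study variance; it compares each study's share of the total weight under the fixed effect weights W _{ i } = 1/σ _{ i } ^{2} and the random effects weights 1/(σ _{ i } ^{2} + τ ^{2}).

```r
# Hypothetical within study variances, ordered from the largest to the smallest study
sigma2 <- c(0.01, 0.02, 0.05, 0.10, 0.25)
tausq <- 0.04   # assumed between study variance

w_fe <- 1 / sigma2             # fixed effect weights
w_re <- 1 / (sigma2 + tausq)   # random effects weights

rel_fe <- w_fe / sum(w_fe)     # share of the total weight under each model
rel_re <- w_re / sum(w_re)

print(round(data.frame(sigma2, rel_fe, rel_re), 3))
```

Every absolute weight shrinks when τ ^{2} is added, but the shrinkage is proportionally greatest for the most precise studies, so the smallest study's share of the total weight rises while the largest study's share falls.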
Standard or Generalised Q statistic?
In Methods and Results we described and demonstrated the use of meta-analytical techniques based on the generalised Q statistic. Are these worth using? As can be seen from Figure 5 (right), whenever $\hat{\tau}^{2}_{DL}$ is zero so is $\hat{\tau}^{2}_{PM}$. For non-zero values $\hat{\tau}^{2}_{PM}$ is generally greater than $\hat{\tau}^{2}_{DL}$, and the difference between the two appears to increase as the magnitude of the heterogeneity increases. This suggests that when a substantial amount of heterogeneity is present, $\hat{\tau}^{2}_{DL}$ may be systematically underestimating it because a one-iteration formula is not sufficient to arrive at an estimate near the truth. This underestimation does not affect the estimate for θ in any meaningful way. Across the 18 meta-analyses, the random effects estimates for θ based on $\hat{\tau}^{2}_{DL}$ and $\hat{\tau}^{2}_{PM}$ were very similar (and are therefore not shown) since the overall mean estimate is fairly insensitive to small changes in τ ^{2} [25, 40]. However, the variance of $\hat{\theta}_{RE}$, V _{ RE }, and I ^{2} are far more sensitive to changes in τ ^{2} and hence accurate estimation is important for these quantities.
Conclusions
In this paper we have restricted our focus to the estimation of the meta-analytical quantities τ ^{2}, I ^{2} and the overall mean parameter θ, as well as providing confidence intervals for the latter two. We note that this does not reflect the state of the art in what can be estimated via a random effects meta-analysis; one can for instance also estimate trial level effect parameters (θ + u _{ i }), predict the likely effects of future studies and test hypotheses relating to these additional parameters [19]. With this in mind, we make the following tentative conclusions.
The actual magnitude of the estimate of τ ^{2} is often overlooked as a heterogeneity measure [41], and in keeping with modern developments the DerSimonian and Laird estimate is no longer considered to be the best choice [22, 24]. We recommend using the PM estimate for τ ^{2}, and by extension the $I^{2}_{PM}$ it implies, since it is still very easy to calculate, but shares much of the accuracy and rigor of more complex methods. Van der Tweel and Bollen [42] use the PM method to estimate the overall random effects mean θ _{ RE } and heterogeneity parameter τ ^{2} within the context of a sequential meta-analysis, but appear to stick with the original $I^{2}_{DL}$ for other aspects of their analysis. We recommend that practitioners additionally make use of the PM estimate in the Inconsistency measure $I^{2}_{PM}$. R code to estimate $\hat{\tau}^{2}_{PM}$, θ _{ RE } and $I^{2}_{PM}$ (with confidence intervals) is provided below.
An I ^{2} of over 75% has traditionally been considered as indicating a high level of inconsistency, an I ^{2} above 50% as moderate and an I ^{2} below 25% as low. It is tempting to consider a random effects model when the I ^{2} is high. However, the range of the reference intervals shown in Figure 6 (left) highlights the considerable uncertainty around this measure. The recently updated Cochrane handbook [6] now gives overlapping rather than mutually exclusive regions for low, moderate and high heterogeneity, but when the heterogeneity is measured with as much uncertainty as in the Cervix 3 meta-analysis (90% reference intervals for I ^{2} of 0% to 93%) any categorisation feels dubious. Inconsistency intervals based on the $I^{2}_{PM}$ statistic will generally be wider than those based on the standard measure but are a more accurate reflection of the uncertainty present. These findings are based on a fairly large simulation study for widely varying τ ^{2}, typical within study variance s ^{2} and trial number M. Although the simulated data were normally distributed, we do not think the conclusions would have changed if the study effects had been drawn from a more non-standard distribution. By plotting $I^{2}_{PM}$ at the lower and upper reference levels, as well as at a spread of more central measures such as the mean, median and mode, one can easily and effectively convey this uncertainty to the analyst. For a comprehensive comparison of methods for estimating the heterogeneity parameter τ ^{2} see Biggerstaff and Tweedie [26] or Viechtbauer [25].
In the presence of heterogeneity, the naive and automatic application of the random effects model has been widely criticised. It is sensible to conduct a further investigation of the data [34, 43, 44], but this may not lead to the identification of any explanatory factors. If unexplained heterogeneity also leads to large differences between the fixed and random effects estimates, there is the obvious prospect that conflicting clinical interpretations could arise. When funnel plot asymmetry is the predominant cause of this, I ^{2} statistics have a less meaningful interpretation. For this reason Rücker et al. [37] have recently proposed an alternative 'G' statistic that expresses the inconsistency between studies after this asymmetry has been accounted for (through a bias correction for small study effects). As demonstrated on the NSCLC meta-analysis, the Henmi-Copas method combining a fixed effects estimate with a 'random effects' confidence interval provides an alternative way of dealing with funnel plot asymmetry without making an explicit bias correction. Both the approaches of Rücker et al. and Henmi and Copas appear to offer sensible and practical solutions to this problem, and merit further investigation.
R code
This code calculates point estimates and α-level confidence intervals for $\hat{\tau}^{2}_{PM}$, $\hat{\theta}_{RE}$ and $I^{2}_{PM}$, given the estimated effect sizes y, within study standard errors s and desired type I error Alpha. This code is based on the algorithm suggested by DerSimonian and Kacker [22].
PM = function(y, s, Alpha = 0.1){
  # y: effect estimates; s: within study standard errors; Alpha: type I error
  k = length(y) ; df = k - 1 ; sig = qnorm(1 - Alpha/2)
  # Chi-square reference values to which Q(tausq) is equated
  low = qchisq(Alpha/2, df) ; up = qchisq(1 - Alpha/2, df)
  med = qchisq(0.5, df) ; mn = df ; mode = df - 1
  Quant = c(low, mode, mn, med, up) ; L = length(Quant)
  Tausq = NULL ; Isq = NULL
  CI = matrix(nrow = L, ncol = 2) ; MU = NULL
  # typS is the 'typical' within study variance
  v = 1/s^2 ; sum.v = sum(v) ; typS = sum(v*(k - 1))/(sum.v^2 - sum(v^2))
  for(j in 1:L){
    tausq = 0 ; F = 1 ; TAUsq = NULL
    # Iterate tausq upwards until Q(tausq) no longer exceeds Quant[j]
    while(F > 0){
      TAUsq = c(TAUsq, tausq)
      w = 1/(s^2 + tausq) ; sum.w = sum(w) ; w2 = w^2
      yW = sum(y*w)/sum.w ; Q1 = sum(w*(y - yW)^2)
      Q2 = sum(w2*(y - yW)^2) ; F = Q1 - Quant[j]
      Ftau = max(F, 0) ; delta = F/Q2
      tausq = tausq + delta
    }
    MU[j] = yW ; V = 1/sum(w)
    # Truncate at zero, then convert to the Inconsistency scale
    Tausq[j] = max(tausq, 0) ; Isq[j] = Tausq[j]/(Tausq[j] + typS)
    CI[j,] = yW + sig*c(-1, 1)*sqrt(V)
  }
  return(list(tausq = Tausq, muhat = MU, Isq = Isq, CI = CI, quant = Quant))
}
Authors' information
JB is a biostatistician working within the London and Cambridge MRC hubs for trials methodology research. JFT is the head of the Meta-analysis group at the MRC Clinical Trials Unit (CTU). AJC is a senior statistician within the MRC CTU and also a senior lecturer in medical statistics at University College, London. SB is a systematic reviewer at the CTU, working within the Meta-analysis group.
List of Abbreviations
IPD: Individual Patient Data
FE: Fixed effect
DL: DerSimonian and Laird
RE: Random effects
PM: Paule-Mandel
Declarations
Acknowledgements
None declared.
Authors’ Affiliations
References
 Simmonds M, Higgins J, Stewart L, Tierney J, Clarke M, Thompson S: Meta-analysis of individual patient data from randomized trials: a review of methods used in practice. Clinical Trials. 2005, 2: 209-217. 10.1191/1740774505cn087oa.
 Stewart L, Parmar M: Bias in the analysis and reporting of randomized controlled trials. International Journal of Technology Assessment in Health Care. 1996, 12: 264-275. 10.1017/S0266462300009612.
 DerSimonian R, Laird N: Meta-analysis in clinical trials. Controlled Clinical Trials. 1986, 7: 177-188. 10.1016/0197-2456(86)90046-2.
 Higgins J, Thompson S: Quantifying heterogeneity in a meta-analysis. Statistics in Medicine. 2002, 21: 1539-1558. 10.1002/sim.1186.
 Higgins J, Thompson S, Deeks J, Altman D: Measuring inconsistency in meta-analyses. BMJ. 2003, 327: 557-560. 10.1136/bmj.327.7414.557.
 Higgins J, Green S: Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.2. 2009, The Cochrane Collaboration, [http://www.cochrane-handbook.org]
 AOCTG: Chemotherapy in advanced ovarian cancer: an overview of randomised clinical trials. Advanced Ovarian Cancer Trials Group. British Medical Journal. 1991, 303: 884-893. 10.1136/bmj.303.6807.884.
 NSCLC: Chemotherapy in non-small cell lung cancer: a meta-analysis using updated data on individual patients from 52 randomised trials. Non-Small Cell Lung Cancer Collaborative Group. British Medical Journal. 1995, 311: 899-909.
 SMC: Adjuvant chemotherapy for localised resectable soft tissue sarcoma of adults: meta-analysis of individual data. Sarcoma Meta-analysis Collaboration. Lancet. 1997, 350: 1647-1654. 10.1016/S0140-6736(97)08165-8.
 AOCTG: Chemotherapy in advanced ovarian cancer: four systematic meta-analyses of individual patient data from 37 randomized trials. Advanced Ovarian Cancer Trials Group. British Journal of Cancer. 1998, 78: 1479-1487. 10.1038/bjc.1998.710.
 PMTG: Postoperative radiotherapy in non-small-cell lung cancer: systematic review and meta-analysis of individual patient data from nine randomised controlled trials. PORT Meta-analysis Trialists Group. Lancet. 1998, 352: 257-263. 10.1016/S0140-6736(98)06341-7.
 OCCG: Preoperative radiotherapy in esophageal carcinoma: A meta-analysis using individual patient data. Oesophageal Cancer Collaborative Group. Int J Radiation Oncology Biol Phys. 1998, 41: 579-583. 10.1016/S0360-3016(97)00569-5.
 GMTG: Chemotherapy in adult high-grade glioma: a systematic review and meta-analysis of individual patient data from 12 randomised trials. Glioma Meta-analysis Trialists Group. Lancet. 2002, 359: 1011-1018. 10.1016/S0140-6736(02)08091-1.
 ABCMC: Neoadjuvant chemotherapy in invasive bladder cancer: a systematic review and meta-analysis. Advanced Bladder Cancer Meta-analysis Collaboration. Lancet. 2003, 361: 1927-1934. 10.1016/S0140-6736(03)13580-5.
 NACCCMA: Neoadjuvant chemotherapy for locally advanced cervical cancer: A systematic review and meta-analysis of individual patient data from 21 randomised trials. Neoadjuvant Chemotherapy for Cervix Cancer Meta-analysis Collaboration. European Journal of Cancer. 2003, 39: 2470-2486. 10.1016/S0959-8049(03)00425-8.
 ABCMC: Adjuvant chemotherapy in invasive bladder cancer: a systematic review and meta-analysis of individual patient data. Advanced Bladder Cancer Meta-analysis Collaboration. European Urology. 2005, 48: 189-201. 10.1016/j.eururo.2005.04.005.
 CCCMC: Reducing uncertainties about the effects of chemoradiotherapy for cervical cancer: a systematic review and meta-analysis of individual patient data from 18 randomized trials. Chemoradiotherapy for Cervical Cancer Meta-analysis Collaboration. Journal of Clinical Oncology. 2008, 26: 5802-5812. 10.1200/JCO.2008.16.4368.
 Yusuf S, Peto R: Beta blockade during and after myocardial infarction: an overview of the randomised trials. Prog Cardio Dis. 1985, 27: 335-371. 10.1016/S0033-0620(85)80003-7.
 Higgins J, Thompson S, Spiegelhalter D: A re-evaluation of random-effects meta-analysis. J Royal Statistical Soc Series A. 2009, 172: 137-159.
 Hardy R, Thompson S: A likelihood approach to meta-analysis with random effects. Statistics in Medicine. 1996, 15: 619-629. 10.1002/(SICI)1097-0258(19960330)15:6<619::AID-SIM188>3.0.CO;2-A.
 Tweedie R, Scott D, Biggerstaff B, Mengersen K: Bayesian meta-analysis, with application to studies of ETS and lung cancer. Lung Cancer. 1996, 14: S171-S194. 10.1016/S0169-5002(96)90222-6.
 DerSimonian R, Kacker R: Random-effects models for meta-analysis of clinical trials: An update. Contemporary Clinical Trials. 2007, 28: 105-114. 10.1016/j.cct.2006.04.004.
 Paule R, Mandel J: Consensus values and weighting factors. J Res Natl Bur Stand. 1982, 87: 377-385.
 Rukhin A, Biggerstaff B, Vangel M: Restricted maximum likelihood estimation of a common mean and the Mandel-Paule algorithm. Journal of Statistical Planning and Inference. 2000, 83: 319-330. 10.1016/S0378-3758(99)00098-1.
 Viechtbauer W: Confidence intervals for the amount of heterogeneity in meta-analysis. Statistics in Medicine. 2007, 26: 37-52. 10.1002/sim.2514.
 Biggerstaff B, Tweedie R: Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Statistics in Medicine. 1997, 16: 753-768. 10.1002/(SICI)1097-0258(19970415)16:7<753::AID-SIM494>3.0.CO;2-G.
 Sidik K, Jonkman J: Simple heterogeneity variance estimation for meta-analysis. J Royal Statistical Soc Series C. 2005, 54: 367-384. 10.1111/j.1467-9876.2005.00489.x.
 Huedo-Medina T, Sanchez-Meca J, Marin-Martinez F, Botella J: Assessing Heterogeneity in Meta-Analysis: Q Statistic or I ^{2} Index?. Psychological Methods. 2006, 11: 193-206. 10.1037/1082-989X.11.2.193.
 Light R, Pillemer D: Summing up: The Science of Reviewing Research. 1984, Cambridge: Harvard University Press
 Egger M, Davey Smith G, Schneider M, Minder C: Bias in meta-analysis detected by a simple graphical test. British Medical Journal. 1997, 315: 629-634.
 Copas J, Malley P: A robust P-value for treatment effect in meta-analysis with publication bias. Statistics in Medicine. 2008, 27: 4267-4278. 10.1002/sim.3284.
 Baujat B, Mahe C, Pignon J, Hill C: A graphical method for exploring heterogeneity in meta-analyses: application to a meta-analysis of 65 trials. Statistics in Medicine. 2002, 21: 2641-2652. 10.1002/sim.1221.
 Anzures-Cabrera J, Higgins J: Graphical displays for meta-analysis: An overview with suggestions for practice. Research Synthesis Methods. 2010, 1: 66-80. 10.1002/jrsm.6.
 Greenland S: Invited Commentary: A Critical Look at Some Popular Meta-Analytic Methods. American Journal of Epidemiology. 1994, 140: 291-296.
 Stanley T: Meta-regression methods for detecting and estimating empirical effects in the presence of publication selection. Oxford Bulletin of Economics and Statistics. 2008, 70: 103-127.
 Moreno S, Sutton A, Ades A, Stanley T, Abrams K, Peters J, Cooper N: Assessment of regression-based methods to adjust for publication bias through a comprehensive simulation study. BMC Medical Research Methodology. 2009, 9: 2. 10.1186/1471-2288-9-2.
 Rücker G, Schwarzer G, Carpenter J, Binder H, Schumacher M: Treatment-effect estimates adjusted for small-study effects via a limit meta-analysis. Biostatistics. 2011, 12: 122-142. 10.1093/biostatistics/kxq046.
 Henmi M, Copas J: Confidence intervals for random effects meta-analysis and robustness to publication bias. Statistics in Medicine. 2010, 29: 2969-2983. 10.1002/sim.4029.
 Baker R, Jackson D: A new approach to outliers in meta-analysis. Health Care Management Science. 2008, 11: 121-131. 10.1007/s10729-007-9041-8.
 Jackson D, Bowden J, Baker R: How does the DerSimonian and Laird procedure for random effects meta-analysis compare with its more efficient but harder to compute counterparts. Journal of Statistical Planning and Inference. 2010, 140: 961-970. 10.1016/j.jspi.2009.09.017.
 Rücker G, Schwarzer G, Carpenter J, Schumacher M: Undue reliance on I ^{2} in assessing heterogeneity may mislead. BMC Medical Research Methodology. 2008, 8: 79. 10.1186/1471-2288-8-79.
 Van der Tweel I, Bollen C: Sequential meta-analysis: an efficient decision-making tool. Clinical Trials. 2010, 7: 136-146. 10.1177/1740774509360994.
 Thompson S: Why sources of heterogeneity in meta-analysis should be investigated. British Medical Journal. 1994, 309: 1351-1355.
 Rücker G, Schwarzer G, Carpenter J, Schumacher M: Comments on "Empirical vs natural weighting in random effects meta-analysis" by JJ Shuster, Statistics in Medicine 2009. Statistics in Medicine. 2010, 29: 2963-2965. 10.1002/sim.3957.
Pre-publication history
 The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/41/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.