Estimating required information size by quantifying diversity in random-effects model meta-analyses
 Jørn Wetterslev^{1},
 Kristian Thorlund^{1},
 Jesper Brok^{1} and
 Christian Gluud^{1}
DOI: 10.1186/1471-2288-9-86
© Wetterslev et al; licensee BioMed Central Ltd. 2009
Received: 15 May 2009
Accepted: 30 December 2009
Published: 30 December 2009
Abstract
Background
There is increasing awareness that meta-analyses require a sufficiently large information size to detect or reject an anticipated intervention effect. The required information size in a meta-analysis may be calculated from an anticipated a priori intervention effect or from an intervention effect suggested by trials with low risk of bias.
Methods
Information size calculations need to consider the total model variance in a meta-analysis to control type I and type II errors. Here, we derive an adjusting factor for the required information size under any random-effects model meta-analysis.
Results
We devise a measure of diversity (D ^{2}) in a meta-analysis, which is the relative variance reduction when the meta-analysis model is changed from a random-effects into a fixed-effect model. D ^{2} is the percentage that the between-trial variability constitutes of the sum of the between-trial variability and a sampling error estimate considering the required information size. D ^{2} is different from the intuitively obvious adjusting factor based on the common quantification of heterogeneity, the inconsistency (I ^{2}), which may underestimate the required information size. Thus, D ^{2} and I ^{2} are compared and interpreted using several simulations and clinical examples. In addition, we show mathematically that diversity is equal to or greater than inconsistency, that is D ^{2} ≥ I ^{2}, for all meta-analyses.
Conclusion
We conclude that D ^{2} seems a better alternative than I ^{2} for taking model variation into account in any random-effects meta-analysis, irrespective of the choice of the between-trial variance estimator that constitutes the model. Furthermore, D ^{2} can readily adjust the required information size in any random-effects model meta-analysis.
Background
Outcome measures in a single randomised trial or a meta-analysis of several randomised trials are typically dichotomous, especially for important clinical outcomes such as death, acute myocardial infarction, etc. Although meta-analysts cannot directly influence the number of participants in a meta-analysis like trialists conducting a single trial, the assessment of the meta-analytic result depends heavily on the amount of information provided. A limited number of events from a few small trials and the associated random error may be under-recognised sources of spurious findings. If a meta-analysis is conducted before reaching a required information size (i.e., the required number of participants in a meta-analysis) it should be evaluated according to the increased risk that the result may represent a chance finding. It has recently been suggested that sample size estimation in a single trial may be less important in the era of systematic review and meta-analysis [1]. Therefore, the reliability of a conclusion drawn from a meta-analysis, despite standardly calculated confidence limits, may depend even more on the number of events and the total number of participants included than hitherto perceived [2–8]. Both numbers determine the amount of available information in a meta-analysis. The information size (IS) required for a reliable and conclusive meta-analysis may be assumed to be at least as large as the sample size (SS) of a single well-powered randomised clinical trial to detect or reject an anticipated intervention effect [2–4].
The estimation of a required information size for a meta-analysis in order to detect or reject an anticipated intervention effect on a binary outcome measure should be based on reasonable assumptions. These assumptions may be derived from two kinds of information. Firstly, an a priori intervention effect may be anticipated, most appropriately at the time when the protocol for a systematic review is prepared. An a priori intervention effect may be estimated by consulting related interventions for the same disease or the same intervention for related diseases, suggesting a clinically relevant effect to be detected or ruled out [2–4]. This situation would be almost analogous to hypothesis testing in a single randomised trial. Secondly, an intervention effect estimated by the trials with low risk of bias in the meta-analysis may represent our best estimate, at a given time point, of a possible intervention effect given the available data [5]. This would be a kind of post hoc analysis of the information needed to detect or reject an intervention effect suggested by data already available. When planning a new trial it may be very important to estimate which IS is needed for the updated meta-analysis to be conclusive. In both instances the estimated required information size may be applied to grade the evidence reported in a cumulative meta-analysis, adjusting for the risk of random error due to repetitive testing on accumulating data [5, 6]. If the number of actually accrued participants falls short of the required IS, the meta-analysis may be inconclusive even though the confidence interval is suggestive of a clinically relevant effect, because if the confidence interval (or the p-value) is appropriately adjusted with sequential methods, it may no longer show a statistically significant or clinically relevant effect.
Conversely, if the actually accrued number of participants exceeds the required information size without the meta-analysis becoming statistically significant, we may be able to rule out the anticipated intervention effect size [5].
It is not realistic to assume that the population of the included trials in a meta-analysis is truly homogeneous, as it may be in a single clinical trial. Meta-analysis, therefore, should not analyse included participants as if they are coming from one trial [9]. Consequently, the difference between obtaining the required IS and SS is rooted in the underlying assumption of between-trial variability, and thus, the chosen meta-analytical model.
If the between-trial variability of the outcome measure estimates in a meta-analysis is incorporated into the model using the traditional one-way random-effects model, the required IS will be affected [5]. In this vein, the required IS is a monotonically increasing function of the total variability among the included trials. An estimate of the required IS can therefore be derived once the degree of variability is known or pre-specified [5]. The test statistic for heterogeneity in a meta-analysis, the inconsistency factor (I ^{2}) based on Cochran's Q proposed by Higgins and Thompson [10], may seem an obvious quantity to use for this purpose as it allows us to estimate the degree of variation that is not covered by the assumption of homogeneity [5]. However, I ^{2} is derived using a set of general assumptions that may be inappropriate in this context.
In this paper we derive a general expression for the required IS in any random-effects model. We prove the monotone relationship between IS and the degree of total variability in a one-way random-effects meta-analysis. We use our results to define a quantification of diversity (D ^{2}) between included trials in a meta-analysis, which is the relative model variance reduction when the model of pooling is changed from a random-effects model into a fixed-effect model. We analyse and discuss the differences between our definition of diversity, D ^{2}, and the commonly used measure for heterogeneity, I ^{2}.
Methods
2.1 Deriving the required meta-analysis information size and diversity
If the required IS needed to detect or reject an intervention effect in a meta-analysis should be at least the sample size needed to detect or reject a similar effect in a single trial, then the following scenario applies:
This yields the intuitive interpretation that the required IS in a random-effects model is a monotone increasing function of the degree of heterogeneity.
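The reasoning can be sketched as follows. This is a hedged reconstruction from the quantities defined in the list of abbreviations and from the assumptions stated in the Discussion, not a transcription of equations 2.1 to 2.5; the constant c and the proportional form of N are assumptions of the sketch.

```latex
% Assumption: in both models the required number of participants is
% proportional to the model variance over the squared anticipated effect.
\begin{align*}
N_F &= c\,(Z_{1-\alpha/2}+Z_{1-\beta})^2\,\frac{V_F}{\mu_F^2}, &
N_R &= c\,(Z_{1-\alpha/2}+Z_{1-\beta})^2\,\frac{V_R}{\mu_R^2},\\[4pt]
A_{RF} &= \frac{N_R}{N_F}
        = \frac{V_R}{V_F}\cdot\frac{\mu_F^2}{\mu_R^2}
        \approx \frac{V_R}{V_F}
        && \text{when } \mu_F \approx \mu_R .
\end{align*}
```

Under these assumptions the term (Z_{1−α/2} + Z_{1−β})^{2} cancels, and because V_R grows with the between-trial variance while V_F does not, A_RF (and hence the required IS) increases with the degree of heterogeneity.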
2.2 Limitations of a moment-based 'sampling error' in the definition of heterogeneity, I^{2}
Table 1. Meta-analyses examples

| Meta-analysis | Title | Intervention | Outcome measure | Number of trials | Number of participants |
|---|---|---|---|---|---|
| Afshari and others, 2007 [16] | Antithrombin III for critically ill patients | Antithrombin III | Mortality | 20 | 3,458 |
| Al-Inany and others, 2006 [17] | Cycle cancellations due to poor ovarian response | Gonadotropin releasing hormone for assisted reproductive therapy | Number of cycle cancellations | 13 | 2,543 |
| Soll and others, 1997 [18] | Prophylactic surfactant to prevent morbidity and mortality in preterm infants | Surfactant | Mortality or pneumothorax | 8 | 988 |
| Wetterslev and Juhl, 2006 [19] | Effect of perioperative β-blockade on non-fatal perioperative AMI | Perioperative β-blockers for non-cardiac surgery | Perioperative myocardial infarction within 30 days of operation | 11 | 2,211 |
| Bury and Tudehope, 2000 [20] | Effect of antibiotics on necrotizing enterocolitis in newborn | Enteral antibiotics in newborn | Necrotizing enterocolitis | 5 | 458 |
| Li and others, 2007 [21] | Intravenous magnesium for acute myocardial infarction | Magnesium | Mortality | 23 | 72,472 |
| Meyhoff and others, 2008 [22] | Perioperative ventilation with 80% versus 30% oxygen during intestinal surgery | Perioperative ventilation with 80% oxygen | Wound infection within 15 days of surgery | 4 | 989 |
Table 2. Derived data from meta-analyses examples

| Meta-analysis | Range of weights w _{ i } (% weights) in the fixed-effect model | Inconsistency I ^{2} (%) | Diversity D ^{2} (%) | D ^{2} − I ^{2} (%) | A priori relative risk reduction, RRR (%) | Unadjusted information size (SS) | Heterogeneity-adjusted information size (HIS) | Diversity-adjusted information size (DIS) |
|---|---|---|---|---|---|---|---|---|
| Afshari and others, 2007 [16] | 0.2–281 (0.04–80%) | 0.0 | 0.0 | 0.0 | 10 | 3,317 | 3,317 | 3,317 |
| Al-Inany and others, 2006 [17] | 0.2–3.9 (1–18%) | 7.2 | 13.9 | 6.7 | 25 | 3,516 | 3,789 | 4,083 |
| Soll and others, 1997 [18] | 3.6–22.2 (6.2–38.1%) | 22.9 | 37.3 | 14.3 | 60 | 193 | 250 | 307 |
| Wetterslev and Juhl, 2006 [19] | 0.3–10.4 (1–42%) | 13.4 | 40.5 | 27.1 | 20 | 8,421 | 9,726 | 14,164 |
| Bury and Tudehope, 2000 [20] | 1.5–9.6 (7–38%) | 40.2 | 57.7 | 17.5 | 45 | 440 | 736 | 1,039 |
| Li and others, 2007 [21] | 0.24–565.1 (0.02–42%) | 61.9 | 89.9 | 28.0 | 10 | 31,094 | 81,466 | 306,276 |
| Meyhoff and others, 2008 [22] | 1.3–15.2 (4–47%) | 74.2 | 79.4 | 5.2 | 30 | 1,699 | 6,581 | 8,239 |
If the focus is shifted towards a sufficient IS estimation, then adjusting factors based on I ^{2} calculated from a moment-based sampling error may be insufficient. We therefore suggest considering an alternative adjusting factor to obtain an adequate estimation of the required IS.
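The two adjustments can be compared numerically. The sketch below assumes that both adjusted sizes are obtained by inflating SS by 1/(1 − I ^{2}) and 1/(1 − D ^{2}) respectively, a relation that reproduces the tabulated HIS and DIS values to within rounding; the function name is illustrative.

```python
def adjusted_information_size(ss: float, proportion: float) -> float:
    """Inflate an unadjusted sample size SS by 1 / (1 - proportion).

    `proportion` is I^2 (giving HIS) or D^2 (giving DIS), as a fraction in [0, 1).
    The inflation rule is an assumption checked against Table 2.
    """
    if not 0.0 <= proportion < 1.0:
        raise ValueError("proportion must lie in [0, 1)")
    return ss / (1.0 - proportion)


# Values taken from the Al-Inany and others row of Table 2.
ss, i2, d2 = 3516, 0.072, 0.139
his = adjusted_information_size(ss, i2)  # heterogeneity-adjusted size
dis = adjusted_information_size(ss, d2)  # diversity-adjusted size
print(f"HIS ~ {his:.0f}, DIS ~ {dis:.0f}")
```

The results land close to the tabulated 3,789 and 4,083 participants; the small remaining differences are consistent with I ^{2} and D ^{2} having been rounded to one decimal in the table.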
2.3 Defining and implementing a measure of diversity
This way, D ^{2} in a metaanalysis may become a central measure of the betweentrial variability relative to the sum of the betweentrial variability with an estimate of the sampling error basically originating from the required information size.
As such, D ^{2} is able to quantify the relative model variance change from a randomeffects into a fixedeffect model. More importantly D ^{2}, in contrast to I ^{2}, is not based on underlying assumptions of a 'typical' sampling error that are violated in most metaanalyses. D ^{2} is the percentage of the total variance (the sum of between trial variance and sampling error), in a randomeffects model, contributed by the between trial variance.
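A minimal sketch of how D ^{2} and I ^{2} might be computed from trial-level data follows. It assumes inverse-variance weights and the DerSimonian-Laird moment estimator for τ ^{2}; the function name and the example effect estimates are hypothetical, and any other between-trial variance estimator could be substituted when computing D ^{2}.

```python
def diversity_and_inconsistency(effects, variances):
    """Return (I2, D2, tau2) from trial effect estimates and sampling variances.

    Assumption of this sketch: tau^2 is the DerSimonian-Laird estimate.
    """
    k = len(effects)
    if k < 2:
        raise ValueError("at least two trials are needed")
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    mu_f = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - mu_f) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncated at zero
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    v_f = 1.0 / sw                                   # fixed-effect model variance
    v_r = 1.0 / sum(1.0 / (v + tau2) for v in variances)  # random-effects variance
    d2 = (v_r - v_f) / v_r                           # relative variance reduction
    return i2, d2, tau2


# Hypothetical log odds ratios and their variances for five trials.
effects = [-0.6, -0.1, 0.3, -0.9, 0.05]
variances = [0.04, 0.01, 0.09, 0.16, 0.0025]
i2, d2, tau2 = diversity_and_inconsistency(effects, variances)
print(f"I^2 = {i2:.3f}, D^2 = {d2:.3f}, tau^2 = {tau2:.4f}")
```

With one trial far more precise than the rest, as in this made-up example, D ^{2} comes out visibly larger than I ^{2}.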
2.4 Simulating meta-analyses
In our simulations, we considered meta-analyses with k = 6 and k = 20 trials. For each k, we considered the four combinations from two different average control event proportions (PC) of 10% and 30%, and two true values of the overall effect in terms of odds ratios of 1 and 0.7. The above values were selected aiming to cover different plausible meta-analytic scenarios. In total, these values make up 8 simulation scenarios.
For each combination of the above-mentioned variables we generated data for k 2×2 tables. For all k trials, within-group sample sizes were determined by sampling an integer between 20 and 500 participants. Group sizes were equal in each simulated trial. We drew the trial-specific control group event rate, PC _{ i }, from a uniform distribution, PC _{ i } ~ U(PC − 0.15, PC + 0.15). We drew the number of observed events in the control group from a binomial distribution, e _{ iC } ~ bin(n _{ i }, PC _{ i }). For each meta-analysis scenario we varied the degree of heterogeneity by sampling the between-trial standard deviation, τ (not the between-trial variance τ ^{2}), from a uniform distribution, τ ~ U(10^{−10}, √0.60). We simulated the underlying true trial intervention effects as log odds ratios, ln(OR _{ i }) ~ N(ln(OR), τ ^{2}), where OR is the true intervention effect expressed as an odds ratio. We drew the observed number of events in the intervention group from a binomial distribution, e _{ iE } ~ bin(n _{ i }, PE _{ i }), where PE _{ i } = PC _{ i } exp(ln(OR _{ i }))/(1 − PC _{ i } + PC _{ i } exp(ln(OR _{ i }))).
For all meta-analysis scenarios we simulated 10,000 meta-analyses and for each of these we calculated I ^{2} and D ^{2}. For each scenario we plotted D ^{2} against I ^{2} and incorporated the line of unity in the scatterplot.
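The simulation scheme can be sketched in a few lines. The code below is an illustration rather than the authors' program: it runs only 100 meta-analyses for a single scenario, clips the control event rate to stay a valid probability, adds 0.5 to every cell to avoid zero counts, and uses the DerSimonian-Laird estimator. All of these are assumptions of the sketch, as are the function names.

```python
import math
import random


def simulate_trials(k, pc, true_or, rng):
    """Simulate k two-arm trials; return log odds ratios and their variances."""
    tau = rng.uniform(1e-10, math.sqrt(0.60))        # between-trial SD
    effects, variances = [], []
    for _ in range(k):
        n = rng.randint(20, 500)                     # equal group sizes
        # Assumption: clip the control rate so it remains a valid probability.
        pc_i = min(max(rng.uniform(pc - 0.15, pc + 0.15), 0.01), 0.99)
        odds_ratio_i = math.exp(rng.gauss(math.log(true_or), tau))
        pe_i = pc_i * odds_ratio_i / (1 - pc_i + pc_i * odds_ratio_i)
        e_c = sum(rng.random() < pc_i for _ in range(n))   # binomial draw
        e_e = sum(rng.random() < pe_i for _ in range(n))
        # Assumption: 0.5 continuity correction on all four cells.
        a, b = e_e + 0.5, n - e_e + 0.5
        c, d = e_c + 0.5, n - e_c + 0.5
        effects.append(math.log(a * d / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)
    return effects, variances


def i2_and_d2(effects, variances):
    """Return (I2, D2) using the DerSimonian-Laird estimate of tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu_f = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - mu_f) ** 2 for wi, y in zip(w, effects))
    tau2 = max(0.0, (q - (k - 1)) / (sw - sum(wi ** 2 for wi in w) / sw))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    v_f = 1.0 / sw
    v_r = 1.0 / sum(1.0 / (v + tau2) for v in variances)
    return i2, (v_r - v_f) / v_r


rng = random.Random(2009)                            # fixed seed for reproducibility
pairs = []
for _ in range(100):
    effects, variances = simulate_trials(k=6, pc=0.10, true_or=0.7, rng=rng)
    pairs.append(i2_and_d2(effects, variances))

share = sum(d2 >= i2 - 1e-12 for i2, d2 in pairs) / len(pairs)
print(f"share of simulated meta-analyses with D^2 >= I^2: {share:.2f}")
```

Plotting the second element of each pair against the first, together with the line of unity, gives the kind of scatterplot described above.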
2.5 Selection of meta-analyses examples
We selected traditional randomeffects metaanalyses to cover a range of inconsistency I ^{2} from 0% to 100% and to come from a wide range of medical research fields.
Results
3.1 The relationship between diversity, D^{2}, and heterogeneity, I^{2}
So it follows from (3.5) that for all metaanalyses. Furthermore if we apply Chebyshev's inequality [13] arranging the weights and at the same time w _{1} ≥ w _{2} ≥ ...... w _{ k }then:
and, finally, D ^{2} ≥ T ^{2} ≥ I ^{2} in all metaanalyses.
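One route to this chain can be sketched as follows. The notation is an assumption of this sketch rather than a transcription of the original equations: w_i are the fixed-effect weights, w_i^* = w_i/(1 + w_i τ^2) the random-effects weights, each measure is written as τ^2/(τ^2 + σ^2) with its own 'typical' sampling error σ^2, and T^2 is taken to use σ_T^2 = k/Σw_i.

```latex
\begin{align*}
D^2 &= 1-\frac{V_F}{V_R} = 1-\frac{\sum_i w_i^*}{\sum_i w_i}
     = \frac{\tau^2}{\tau^2+\sigma_D^2},
 & \sigma_D^2 &= \frac{\sum_i w_i^*}{\sum_i w_i w_i^*},\\
T^2 &= \frac{\tau^2}{\tau^2+\sigma_T^2},
 & \sigma_T^2 &= \frac{k}{\sum_i w_i},\\
I^2 &= \frac{\tau^2}{\tau^2+\sigma_M^2},
 & \sigma_M^2 &= \frac{(k-1)\sum_i w_i}{\bigl(\sum_i w_i\bigr)^2-\sum_i w_i^2}.
\end{align*}
% Chebyshev's sum inequality (w_i and w_i^* are similarly ordered):
\[
k\sum_i w_i w_i^* \;\ge\; \sum_i w_i \sum_i w_i^*
\quad\Longrightarrow\quad \sigma_D^2 \le \sigma_T^2 .
\]
% Cauchy-Schwarz inequality:
\[
\Bigl(\sum_i w_i\Bigr)^2 \;\le\; k\sum_i w_i^2
\quad\Longrightarrow\quad \sigma_T^2 \le \sigma_M^2 .
\]
```

Since τ^2/(τ^2 + σ^2) decreases as σ^2 increases, σ_D^2 ≤ σ_T^2 ≤ σ_M^2 gives D^2 ≥ T^2 ≥ I^2, with equality when all trials carry the same weight.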
3.2 Some useful properties of D^{2}
demonstrating that the percentage of increase in variance when the model of metaanalysis is changed from a fixedeffect model into a randomeffects model can, of course, also be expressed in terms of diversity.
as (1 + w _{i}·τ ^{2}) ≥ 1 for all i and for all estimators of τ ^{2}, including the DerSimonian-Laird estimator [15], with the estimate of τ ^{2} being greater than or equal to 0.
3.3 Simulations of meta-analyses
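Both properties follow from short algebra once D^2 is written as the relative variance reduction (V_R − V_F)/V_R. The lines below are a hedged sketch using inverse-variance weights w_i and random-effects weights w_i/(1 + w_i τ^2), not a transcription of the original equations.

```latex
\begin{align*}
D^2 = \frac{V_R - V_F}{V_R}
  \;&\Longrightarrow\; \frac{V_R}{V_F} = \frac{1}{1-D^2}
  \;\Longrightarrow\; \frac{V_R - V_F}{V_F} = \frac{D^2}{1-D^2},\\[4pt]
1 + w_i\tau^2 \ge 1
  \;&\Longrightarrow\; \sum_i \frac{w_i}{1+w_i\tau^2} \le \sum_i w_i
  \;\Longrightarrow\; V_R \ge V_F
  \;\Longrightarrow\; D^2 \ge 0 .
\end{align*}
```

For example, D^2 = 50% corresponds to a doubling of the variance (a 100% increase) when moving from the fixed-effect to the random-effects model.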
3.4 Examples
We used the expression of D ^{2} to calculate this quantity in seven traditional random-effects meta-analyses [16–22] listed in Table 1. These meta-analyses cover a range of inconsistency, I ^{2}, from 0% to 74.2% and come from different medical research fields: intensive care [16], assisted reproductive technology [17], perioperative medicine [19, 22], neonatology [18, 20], and cardiology [21]. The results of the calculations of I ^{2}, D ^{2}, the inconsistency-adjusted information size HIS, and the diversity-adjusted information size DIS from these meta-analyses are shown in Table 2. The calculated unadjusted SS ranges from 193 to 31,094 participants.
Discussion
Using a mathematical derivation, meta-analysis simulations, and examples of meta-analyses we derive a concept of diversity, D ^{2}. D ^{2} may be used for adjustment of the required information size in any random-effects model meta-analysis once the between-trial variance is estimated. Focusing on the required information size estimation in a random-effects meta-analysis, D ^{2} seems less biased compared to I ^{2}. D ^{2} is directly constructed to fulfil the requirements of the information size calculation and is consequently independent of any 'typical' a priori sampling error estimate, whereas I ^{2} is influenced by an a priori 'typical' sampling error estimate. We therefore find it both possible and appropriate to take D ^{2} into consideration when calculating the required IS in meta-analyses as DIS.
DIS has several advantages. It measures the required IS needed to preserve the anticipated risk of type I and type II errors in a random-effects model meta-analysis. DIS considers the total variance change when the model shifts from a fixed-effect into a random-effects model. DIS is a model-dependent and derived estimate of the required IS. The adjustment is dependent only on the anticipated intervention effect and on the model used to incorporate the between-trial variance estimate. D ^{2} applies to random-effects models other than that proposed by DerSimonian and Laird [15] as long as the between-trial variance estimator is specified. The adjustment of IS does not depend on the level of type I and II errors, as (Z _{1−α/2} + Z _{1−β})^{2} is levelled out during the derivation of the adjustment factor A _{ RF } (see equations 2.1, 2.2, and 2.5). The relationship D ^{2} ≥ I ^{2} in all the simulations and in all the examples (shown as points above the line of unity in figures 1, 2, and 3) is in accordance with the properties of D ^{2} compared to I ^{2} derived in section 3.1.
There are limitations of DIS. Like HIS the use of DIS cannot compensate for systematic bias such as selection bias, allocation bias, reporting bias, collateral intervention bias, and time lag bias [5, 23–28]. Furthermore, DIS is always greater than or equal to HIS, which may emphasise that caution is needed when interpreting metaanalysis before the required DIS has been reached [2–8].
The calculation of HIS and DIS may seem to contrast with the SS calculation in a single trial where no adjustment for heterogeneity or diversity is performed. However, Fedorov and Jones [29] advocated the necessity of adjusting SS for heterogeneity arising from different accrual numbers among centres in a multicentre trial in order to avoid the trial being underpowered. If such an adjustment seems fair for a single trial, it also appears appropriate for a meta-analysis of several trials. As an example, we calculated the DIS to be 14,164 participants for a meta-analysis of the effect on mortality of perioperative beta-blockade in patients for non-cardiac surgery (Table 2). This may explain why a recent meta-analysis of seven randomised trials with low risk of bias including 11,862 participants indicates, but still does not convincingly show, firm evidence for harm [30]. The actual accrual of 11,862 participants is beyond the HIS of 9,726 participants, but below the DIS of 14,164 participants, and the meta-analysis [30] may still be inconclusive. This suggests that HIS is not a sufficiently adjusted meta-analytic information size. Furthermore, the example raises the important question of the stability of I ^{2} and D ^{2} beyond a certain number of trials in a meta-analysis, as I ^{2} was 13.4% in the meta-analysis after 2,211 participants [19] and has now doubled to I ^{2} = 27.0% after 11,862 accrued participants in the meta-analysis of seven trials with low risk of bias [30]. The assumption of I ^{2} and D ^{2} becoming stable after five trials is probably wrong and illustrates the moving target concept, which we have to face when doing cumulative meta-analysis as evidence accumulates. Although a moving target may cause conceptual problems, a moving target may be better than no target at all.
The assumption that the IS required for a reliable and conclusive fixedeffect metaanalysis should be as large as the SS of a single wellpowered randomised clinical trial to detect or reject an anticipated intervention effect [2–4] may not be necessary in some instances. The statistical information (SINF) required in a metaanalysis could ultimately be expressed as [31], with δ being the effect size. As SINF is the reciprocal of the variance in the metaanalysis, say , it follows that in metaanalyses with , the amount of information may eventually suffice to detect, or reject, an effect size of δ, without yet having reached HIS or DIS. This criterion, however, is not a simple one and may only be fulfilled occasionally. Furthermore, it seems impossible to forecast or even to get an idea of the magnitude of in the beginning of a series of trials as well as along the course of trials being performed.
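The comparison between required and accrued statistical information can be illustrated as below. The expression used, ((Z_{1−α/2} + Z_{1−β})/δ)^{2}, is the standard form from the group sequential literature and is an assumption here, since this text does not show the expression from [31]; the effect size and the meta-analysis variance in the example are hypothetical.

```python
from statistics import NormalDist


def required_statistical_information(alpha: float, beta: float, delta: float) -> float:
    """Statistical information needed to detect effect size delta.

    Assumed standard form: ((z_{1-alpha/2} + z_{1-beta}) / delta) ** 2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return ((z_alpha + z_beta) / delta) ** 2


# Hypothetical example: two-sided alpha 5%, power 90%, log odds ratio of 0.2.
sinf = required_statistical_information(alpha=0.05, beta=0.10, delta=0.2)

# Accrued information is the reciprocal of the meta-analysis variance
# (hypothetical variance of the pooled log odds ratio).
pooled_variance = 0.005
accrued = 1.0 / pooled_variance
print(f"required: {sinf:.1f}, accrued: {accrued:.1f}, sufficient: {accrued >= sinf}")
```

In this made-up case the accrued information of 200 falls short of the roughly 263 units required, so the meta-analysis would not yet suffice on this criterion.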
D ^{2} offers a number of useful properties compared to I ^{2}. In contrast to I ^{2}, D ^{2} reflects the relative variance expansion due to the between trial variance estimate without assuming an estimate of a 'typical' sampling error σ ^{2}. D ^{2} is reduced when the estimate is reduced, even for the same set of trials. In case diversity is larger than inconsistency this may be an indication that total variability among trials in the metaanalysis is even greater than suggested by I ^{2}. I ^{2} is intrinsically influenced by a potentially overestimated sampling error ( ), thereby underestimating and inherently placing less weight on large trials with many events. On the other hand a 'typical' sampling error originating from the required information size, , could be deduced from the D ^{2}. We would, however, advise great cautiousness in such an attempt. The difference (D ^{2}  I ^{2}) reflects the difference of the momentbased and the information sizebased 'typical' sampling error estimate. The calculation of diversity and (D ^{2}  I ^{2}) may serve as supplementary tools to the assessment of variability in a metaanalysis. D ^{2} is a transformation of the variance ratio of the variances from the randomeffects model and the fixedeffect model. This variance ratio was a candidate for the quantification of heterogeneity [10].
D ^{2} may vary within the same set of trials when different between-trial variance estimators are used in the corresponding random-effects model. In contrast, I ^{2} is intimately linked to the specific between-trial variance estimator in the DerSimonian-Laird random-effects model, as I ^{2} by definition is (Q − (k − 1))/Q [10] and Q is used to estimate a moment-based between-trial variance [15]. The interpretation of heterogeneity is obviously dependent on the variance estimator as well. An estimate of τ ^{2} is a prerequisite for any random-effects model and the actual estimated value, together with the way it is incorporated into the model, actually constitutes the model [32]. Therefore, a quantification of the between-trial variability, as opposed to sampling error, that is independent of the specific random-effects model is impossible, because the model is constituted by the between-trial variance estimator [32]. D ^{2} adapts automatically to different between-trial variance estimators [32] while I ^{2} is linked to the estimator from the DerSimonian-Laird random-effects model.
D ^{2} may have some limitations too. First, the derivation of D ^{2} depends on the assumption that the point estimate of the intervention effect in the fixed-effect model and the point estimate of the intervention effect in the random-effects model are approximately equal. Meta-analyses with a considerable difference between the point estimate in the fixed-effect model and the point estimate in the random-effects model represent specific problems. Probably more information is needed when μ _{ F } >> μ _{ R } since the formula yields higher values for N _{ R } under the assumption of a constant variance ratio. On the other hand, less information may be needed when μ _{ F } << μ _{ R } since the formula then yields lower values for N _{ R } under the assumption of a constant variance ratio. However, examples with considerable differences between the point estimates in a fixed-effect and a random-effects model presumably represent meta-analyses of interventions with considerable between-trial variance due to small trial bias. The meta-analysis of the effect of magnesium in patients with myocardial infarction is such an example [21], where one large trial totally dominates the result in the fixed-effect model but is unduly down-weighted in the random-effects model. Care should be taken when interpreting the random-effects model, despite any calculated information size, in such a situation. Further, to foresee a priori the size of the difference between μ _{ F } and μ _{ R } seems impossible and the calculation may then degenerate exclusively to a post hoc analysis.
Second, D ^{2}, though potentially unbiased with respect to information size calculations, could come with a greater variance than I ^{2} when both are calculated in the same set of meta-analyses. This latter situation presents a potentially unfavourable 'bias-variance trade-off', but an estimate of its magnitude will have to await simulation studies addressing the issue.
It may seem an advantage that I ^{2} is always reported in meta-analyses and therefore readily available to adjust the expected information size. On the other hand, D ^{2} is also readily calculable from the reported confidence intervals as D ^{2} = 1 − (width_{F}/width_{R})^{2}, where, for meta-analyses of ratio measures (e.g., RR or OR), width_{F} and width_{R} refer to the widths of the confidence intervals for the logarithmically transformed measures in the fixed-effect and the random-effects models, respectively.
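A normal-based confidence interval has a width proportional to the square root of the model variance, so V_F/V_R = (width_F/width_R)^{2} and D ^{2} follows directly from two reported intervals. The sketch below assumes both intervals use the same confidence level and are symmetric on the analysis scale; the function name and the example intervals are hypothetical.

```python
import math


def diversity_from_ci(ci_fixed, ci_random, ratio_measure=True):
    """Estimate D^2 = 1 - (width_F / width_R)^2 from two confidence intervals.

    Assumes equal confidence levels and normal-based intervals. For ratio
    measures (RR, OR) the limits are log-transformed before taking widths.
    """
    lo_f, hi_f = ci_fixed
    lo_r, hi_r = ci_random
    if ratio_measure:
        lo_f, hi_f, lo_r, hi_r = (math.log(x) for x in (lo_f, hi_f, lo_r, hi_r))
    width_f = hi_f - lo_f
    width_r = hi_r - lo_r
    if width_r <= 0 or width_f < 0:
        raise ValueError("interval limits must be ordered (lower, upper)")
    return 1.0 - (width_f / width_r) ** 2


# Hypothetical odds-ratio intervals from a fixed-effect and a random-effects model.
d2 = diversity_from_ci(ci_fixed=(0.70, 0.90), ci_random=(0.60, 1.00))
print(f"D^2 ~ {d2:.1%}")
```

For these made-up intervals D ^{2} is roughly 76%, which under the 1/(1 − D ^{2}) adjustment would inflate the unadjusted information size about fourfold.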
Last but not least, the decision to pool intervention effect estimates in a meta-analysis should be based on the clinical relevance of any inconsistency or diversity present. The between-trial variance, τ ^{2}, rather than I ^{2} or D ^{2}, may be the appropriate measure for this purpose [33–35].
The estimation of a required IS for a metaanalysis to detect or reject an anticipated intervention effect on a binary outcome measure should be considered based on reasonable assumptions. Accordingly, it may not be wise to assume absence of heterogeneity in a metaanalysis unless the intervention effect is anticipated to be zero [36, 37]. On the contrary it may be wise to anticipate moderate to substantial heterogeneity (e.g., more than 50%) in an a priori adjustment of the required IS [37]. The concept of diversity points to the fact that an adjustment based on the experience with inconsistency would result in underestimated heterogeneity and hence an underestimated required IS [37]. Alternatively for a future updated metaanalysis to become conclusive we may apply the actual estimated heterogeneity of the available trials in a metaanalysis as the best we have for the adjustment of the required IS. D ^{2} seems more capable than I ^{2} in obtaining such an adequate adjustment.
Conclusion
A quantity to characterise the proportion of between-trial variation in any meta-analysis relative to the total model variance of the included trials is needed. Diversity, D ^{2}, may be such a quantity. D ^{2} describes the relative model variance reduction when changing from a random-effects model into a fixed-effect model. Diversity may be described as the proportion of the total variance in a random-effects model contributed by the between-trial variation, irrespective of the chosen between-trial variance estimator. Furthermore, D ^{2} can adequately adjust the required information size in any random-effects meta-analysis irrespective of the meta-analytic model.
Conflicts of interests
The authors declare that they have no competing interests.
Authors information
JW is an anaesthesiologist and a trialist working with meta-analysis and trial sequential analysis at the Copenhagen Trial Unit, with special interests in perioperative medicine.
KT is a biostatistician working with meta-analysis and trial sequential analysis at the Copenhagen Trial Unit.
JB is an intern working in paediatrics with meta-analysis and trial sequential analysis.
CG is head of the Copenhagen Trial Unit, Editor-in-Chief of the Cochrane Hepato-Biliary Group, a trialist, and an associate professor at Copenhagen University.
List of abbreviations
 α: Risk of type I error
 β: Risk of type II error
 A _{ RF }: Adjustment factor of information size changing from a fixed-effect to a random-effects model
 ∀: For any...
 Q: Cochran's Q
 D ^{2}: Diversity
 DIS: Diversity-adjusted information size
 HIS: Heterogeneity-adjusted information size
 I ^{2}: Inconsistency factor
 k: Number of trials in a meta-analysis
 N _{R}: Required number of participants in a random-effects meta-analysis
 N _{F}: Required number of participants in a fixed-effect meta-analysis
 IS: Required number of participants in a meta-analysis
 μ _{F}: Estimate of the intervention effect in a fixed-effect meta-analysis
 μ _{R}: Estimate of the intervention effect in a random-effects meta-analysis
 OR: Odds ratio
 PC: Control event rate
 RRR: Relative risk reduction
 SS: Sample size in a single randomised clinical trial
 σ _{D} ^{2}: Estimate of a typical sampling error considering diversity
 σ _{M} ^{2}: Estimate of a typical moment-based sampling error
 σ̄ ^{2}: Mean of estimates of sampling errors in a meta-analysis
 τ ^{2}: Estimator of the variance of between-trial intervention effect estimates
 τ̂ ^{2}: Estimate of the variance of between-trial intervention effect estimates
 τ̂ _{DL} ^{2}: DerSimonian-Laird estimate of the variance of between-trial intervention effect estimates
 V _{ F }: The variance in a fixed-effect meta-analysis
 V _{ R }: The variance in a random-effects meta-analysis
 Z _{1−α/2}: Fractile for 1 − α/2
 Z _{1−β}: Fractile for 1 − β
Declarations
Acknowledgements
We are grateful to Jørgen Hilden, M.D., associate professor emeritus at the Department of Biostatistics, Copenhagen University, for having critically reviewed a former version of our manuscript. We thank the peer reviewers Rebecca Turner, MSc in statistics and Gerta Rücker, MSc in statistics for helpful suggestions for improvements of the manuscript.
References
 Guyatt GH, Mills EJ, Elbourne D: In the era of systematic reviews, does the size of an individual trial still matter?. PLoS Medicine. 2008, 5 (1): e410.1371/journal.pmed.0050004. doi:10.1371/jounal.pmed.0050004.View ArticlePubMedPubMed CentralGoogle Scholar
 Pogue J, Yusuf S: Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative metaanalysis. Controlled Clinical Trials. 1997, 18: 58093. 10.1016/S01972456(97)000512.View ArticlePubMedGoogle Scholar
 Pogue J, Yusuf S: Overcoming the limitations of current metaanalysis of randomised controlled trials. Lancet. 1998, 351 (9095): 4752. 10.1016/S01406736(97)084614.View ArticlePubMedGoogle Scholar
 Devereaux PJ, Beattie WS, Choi PT, Badner NH, Guyatt GH, Villar JC: How strong is the evidence for the use of perioperative betablockers in noncardiac surgery? Systematic review and metaanalysis of randomised controlled trials. BMJ. 2005, 331 (7512): 31321. 10.1136/bmj.38503.623646.8F.View ArticlePubMedPubMed CentralGoogle Scholar
 Wetterslev J, Thorlund K, Brok J, Gluud C: Trial sequential analysis may establish when firm evidence is reached in a metaanalysis. Journal of Clinical Epidemiology. 2008, 61 (1): 6475. 10.1016/j.jclinepi.2007.03.013.View ArticlePubMedGoogle Scholar
 Brok J, Thorlund K, Gluud C, Wetterslev J: Trial sequential analysis reveals insufficient information size and potentially false positive results in many metaanalyses. Journal of Clinical Epidemiology. 2008, 61 (8): 7639. 10.1016/j.jclinepi.2007.10.007.View ArticlePubMedGoogle Scholar
 Thorlund K, Devereaux PJ, Wetterslev , Guyatt G, Ioannidis JPA, Thabane L, Gluud LL, AlsNielsen B, Gluud C: Can trial sequential monitoring boundaries reduce spurious inferences from metaanalyses?. International Journal of Epidemiology. 2008, Doi:10.1093/iej/dyn179.Google Scholar
 Brok J, Thorlund K, Wetterslev J, Gluud C: Apparently conclusive metaanalyses may be inconclusiveTrial sequential analysis adjustment of random error risk due to repetitive testing of accumulating data in apparently conclusive neonatal metaanalyses. International Journal of Epidemiology. 2008, Doi:10.1093/iej/dyn188.Google Scholar
 Altman DG, Deeks JJ: Metaanalysis, Simpson's paradox, and the number needed to treat. BMC Medical Research Methodology. 2002, 2: 310.1186/1471228823.View ArticlePubMedPubMed CentralGoogle Scholar
 Higgins JP, Thompson SG: Quantifying heterogeneity in a metaanalysis. Statistics in Medicine. 2002, 21: 15391558. 10.1002/sim.1186.View ArticlePubMedGoogle Scholar
 Feinstein AR: Clinical Epidemiology: the Architecture of Clinical Research. 1985, Philadelphia: W.B. Saunders, 166Google Scholar
 Chow SC, Shao J, Wang H: Sample Size Calculation in Clinical Research. Edited by: SheinChung Chow. 2003, CRC, Taylor & Francis Group, Chapter 8.8.1: 204206.Google Scholar
 Spiegel MR: Mathematical Handbook of Formulas and Tables. 1971, Schaum's outline series, McGraw-Hill Book Company, [http://en.wikipedia.org/wiki/Chebyshev%27s_sum_inequality]
 Takkouche B, Cadarso-Suárez C, Spiegelman D: Evaluation of old and new tests of heterogeneity in epidemiologic meta-analysis. American Journal of Epidemiology. 1999, 150: 206-215.
 DerSimonian R, Laird NM: Meta-analysis in clinical trials. Controlled Clinical Trials. 1986, 7: 177-188. 10.1016/0197-2456(86)90046-2.
 Afshari A, Wetterslev J, Brok J, Møller AM: Antithrombin III in critically ill patients. A systematic review with meta-analysis and trial sequential analysis. BMJ. 2007, 335 (7632): 1219-20. 10.1136/bmj.39398.682500.25.
 Al-Inany HG, Abou-Setta AM, Aboulghar M: Gonadotrophin-releasing hormone antagonists for assisted conception. Cochrane Database of Systematic Reviews. 2006, 3: CD001750.
 Soll RF: Prophylactic natural surfactant extract for preventing morbidity and mortality in preterm infants. The Cochrane Database of Systematic Reviews. 1997, 4: CD000511. 10.1002/14651858. C.
 Wetterslev J, Juul AB: Benefits and harms of perioperative beta-blockade. Best Practice and Research of Clinical Anesthesiology. 2006, 20: 285-302. 10.1016/j.bpa.2005.10.006.
 Bury RG, Tudehope D: Enteral antibiotics for preventing necrotizing enterocolitis in low birthweight or preterm infants. The Cochrane Database of Systematic Reviews. 2000, 2: CD000405. 10.1002/14651858.CD000405.
 Li J, Zhang M, Egger M: Intravenous magnesium for acute myocardial infarction. Cochrane Database of Systematic Reviews. 2007, 2: CD002755.
 Meyhoff CS, Wetterslev J, Jorgensen LN, Henneberg SW, Simonsen I, Pulawska T, Walker LR, Skovgaard N, Heltø K, Gocht-Jensen P, Carlsson PS, Rask H, Karim S, Carlsen CG, Jensen FS, Rasmussen LS, the PROXI Trial Group: Perioperative oxygen fraction - effect on surgical site infection and pulmonary complications after abdominal surgery: a randomized clinical trial. Rationale and design of the PROXI-Trial. Trials. 2008, 9 (1): 58. 10.1186/1745-6215-9-58.
 Gluud LL: Bias in clinical intervention research. American Journal of Epidemiology. 2006, 163: 493-501. 10.1093/aje/kwj069.
 Chan AW, Hrobjartsson A, Haahr MT, Gøtzsche PC, Altman DG: Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. Journal of the American Medical Association. 2004, 291 (20): 2457-2465. 10.1001/jama.291.20.2457.
 Chan AW, Altman DG: Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ. 2005, 330 (7494): 753. 10.1136/bmj.38356.424606.8F.
 Wood L, Egger M, Gluud LL, Schulz KF, Jüni P, Altman DG, Gluud C, Martin RM, Wood AJ, Sterne JA: Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ. 2008, 336 (7644): 601-5. 10.1136/bmj.39465.451748.AD.
 Montori VM, Devereaux PJ, Adhikari NK, Burns KE, Eggert CH, Briel M, Guyatt G: Randomized trials stopped early for benefit: a systematic review. Journal of the American Medical Association. 2005, 294 (17): 2203-2209. 10.1001/jama.294.17.2203.
 Flather MD, Farkouh ME, Pogue JM, Yusuf S: Strengths and limitations of meta-analysis: larger studies may be more reliable. Controlled Clinical Trials. 1997, 18 (6): 568-579. 10.1016/S0197-2456(97)00024-X.
 Fedorov V, Jones B: The design of multicentre trials. Statistical Methods in Medical Research. 2005, 14: 205-248. 10.1191/0962280205sm399oa.
 Bangalore S, Wetterslev J, Pranesh S, Sawhney S, Gluud C, Messerli FH: Perioperative beta blockers in patients having noncardiac surgery: a meta-analysis. Lancet. 2008, 372 (9654): 1962-76. 10.1016/S0140-6736(08)61560-3.
 Jennison C, Turnbull BW: Group sequential methods with application to clinical trials. 2000, Chapman & Hall/CRC, Chapter III: 49.
 Sidik K, Jonkman JN: A comparison of heterogeneity variance estimators in combining results of studies. Statistics in Medicine. 2007, 26 (9): 1964-81. 10.1002/sim.2688.
 Rücker G, Schwarzer G, Carpenter JR, Schumacher M: Undue reliance on I ^{2} in assessing heterogeneity may mislead. BMC Medical Research Methodology. 2008, 8: 79. 10.1186/1471-2288-8-79.
 Higgins JP: Commentary: Heterogeneity in meta-analysis should be expected and appropriately quantified. International Journal of Epidemiology. 2008, 37: 1158-1160. 10.1093/ije/dyn204.
 Rücker G, Schwarzer G, Carpenter JR, Schumacher M: Are large trials less reliable than small trials? Letter to the editor. Journal of Clinical Epidemiology. 2009, 62: 886-889. 10.1016/j.jclinepi.2009.03.007.
 Ioannidis JP, Trikalinos TA, Zintzaras E: Extreme between-study homogeneity in meta-analyses could offer useful insights. Journal of Clinical Epidemiology. 2006, 59 (10): 1023-32. 10.1016/j.jclinepi.2006.02.013.
 Ioannidis JP, Patsopoulos NA, Evangelou E: Uncertainty in heterogeneity estimates in meta-analyses. BMJ. 2007, 335 (7626): 914-6. 10.1136/bmj.39343.408449.80.
Prepublication history
The prepublication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/9/86/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.