The heterogeneity statistic I² can be biased in small meta-analyses

von Hippel, Paul T

doi:10.1186/s12874-015-0024-z

Research article
Open access
Published: 14 April 2015


BMC Medical Research Methodology volume 15, Article number: 35 (2015)


Abstract

Background

Estimated effects vary across studies, partly because of random sampling error and partly because of heterogeneity. In meta-analysis, the fraction of variance that is due to heterogeneity is estimated by the statistic I². We calculate the bias of I², focusing on the situation where the number of studies in the meta-analysis is small. Small meta-analyses are common; in the Cochrane Library, the median number of studies per meta-analysis is 7 or fewer.

Methods

We use Mathematica software to calculate the expectation and bias of I².

Results

I² has a substantial bias when the number of studies is small. The bias is positive when the true fraction of heterogeneity is small, but the bias is typically negative when the true fraction of heterogeneity is large. For example, with 7 studies and no true heterogeneity, I² will overestimate heterogeneity by an average of 12 percentage points, but with 7 studies and 80 percent true heterogeneity, I² can underestimate heterogeneity by an average of 28 percentage points. Biases of 12–28 percentage points are not trivial when one considers that, in the Cochrane Library, the median I² estimate is 21 percent.

Conclusions

The point estimate I² should be interpreted cautiously when a meta-analysis has few studies. In small meta-analyses, confidence intervals should supplement or replace the biased point estimate I².


Background

When different studies estimate the effect of a treatment or exposure, the estimates will vary from one study to another. Some of this between-study variance comes from random sampling error, while some may come from heterogeneity. There are several sources of heterogeneity, including differences in the treatment, the treated population, the study design, or the data analysis method. When there is no heterogeneity, estimates are said to be homogeneous and differ only because of random sampling error.

Heterogeneity matters because it determines how far the results of existing studies can be generalized. If the existing studies of a treatment are homogeneous, or nearly homogeneous, then there is some assurance that the treatment will have a similar effect when applied to new subjects. On the other hand, if the existing studies are very heterogeneous, then unless the reasons for heterogeneity are well understood, the effect of the treatment on new subjects will be hard to predict [1].

Unfortunately, when studies are compared in a meta-analysis, it is often difficult to say anything definitive about heterogeneity. The reason for this difficulty is that most meta-analyses are small. One summary of the Cochrane Library reported that the median number of studies per meta-analysis was 7 [2], another summary reported that the median was 6 [3], and another reported that the median was just 3 [3]. With so few studies, the classical test for heterogeneity, Cochran’s Q [4], is not very informative because its result is as much a function of the number of studies as it is of the amount of heterogeneity. When the number of studies is large, Q will often reject the null hypothesis even if the true extent of heterogeneity is trivial, but if the number of studies is small, Q provides little power to reject the null hypothesis of homogeneity even if substantial heterogeneity is present [5]. The power of Q and other homogeneity tests is further reduced when the studies in the meta-analysis are unbalanced in size—for example, if one of the studies in the meta-analysis is much larger than the others [5].

To better describe heterogeneity, Higgins and Thompson [6] introduced the I² statistic, which was meant to improve in two ways on Cochran’s Q. First, I² is more interpretable than Q; specifically, I² estimates the proportion of the variance in study estimates that is due to heterogeneity. Second, unlike Q, I² was meant to be independent of the number of studies; regardless of the number of studies, I² ranges from 0 to 1 because it estimates a proportion. The I² statistic is now used not just in meta-analysis but also in other analyses where we want to know what fraction of the variance in a set of estimates is due to heterogeneity [7-9].

I² does not eliminate the uncertainty that comes from having a small number of studies. No statistic can. In small meta-analyses, for the same reason that Q has low power, I² is very imprecise. For example, if Q fails to reject the null hypothesis of homogeneity, then the confidence interval around I² will usually include 0. In meta-analyses from the Cochrane Library, the 95% confidence interval around I² typically runs approximately from 0 to .60, implying that up to 60% of the between-study variance could be due to heterogeneity, or there could be no heterogeneity at all [2]. This is not a very informative conclusion. Unfortunately, the uncertainty of the I² estimate is not obvious to the typical reader of a meta-analysis published in, for example, Epidemiology [10,11], the American Journal of Epidemiology [12,13], or the Cochrane Library [14]. These outlets do not report the confidence interval around I²; they only report the point estimate I², which may give a false impression of precision.

In this note, we show that I² is not just imprecise; it is also biased. Depending on the circumstances, the bias of I² can be small or large, positive or negative, but the bias is largest when the number of studies is small and the true fraction of variance that is due to heterogeneity is either very large or very small. For example, in meta-analyses with 7 studies and no true heterogeneity, the I² statistic will on average lead us to believe that heterogeneity accounts for about 12% of the between-study variance. At the other extreme, with 7 studies and 80% of the variance due to heterogeneity, the I² statistic can on average lead us to believe that just 52% of the variance is due to heterogeneity. These biases of 12 to 28 percentage points are not trivial when one considers that, in the Cochrane Library, the median I² value is just 21% [2].

In the following sections, we calculate and illustrate the bias of I² and discuss implications for the statistics reported in meta-analyses.

Methods

We use Mathematica software, version 8, to calculate the expectation and bias of I² analytically. This Methods section introduces notation, assumptions, and statistical properties, and describes the calculations that we submitted to Mathematica. The Results section will give the results of those calculations.

Meta-analysis

Meta-analysis summarizes the results of K studies, each of which has sample size n_k, k = 1,…,K. In each study, there is a true effect β_k estimated by $ {\widehat{\beta}}_k $, with a true standard error σ_k estimated by $ {\widehat{\sigma}}_k $, or, equivalently, a true variance $ {\sigma}_k^2 $ estimated by $ {\widehat{\sigma}}_k^2 $. With large n_k, the quantity $ \left({\widehat{\beta}}_k-{\beta}_k\right)/{\widehat{\sigma}}_k $ approaches a standard normal distribution according to the central limit theorem.

Two models can be used in meta-analysis: a fixed-effects model and a random-effects model. Some confusion is possible because the term fixed effects is used in two different senses [15]. In some literature, the term fixed effects means that the K study effects β_k are assumed to be homogeneous. We use the term fixed effects in its other sense, where it means that we seek only to generalize about the K studies in the meta-analysis. The true effects β_k can be either homogeneous or heterogeneous, but they are regarded as fixed quantities. Because of sampling error, the K studies would produce different estimates $ {\widehat{\beta}}_k $ and $ {\widehat{\sigma}}_k $ if they were repeated, but the true effects β_k and true standard errors σ_k would not change.

Under a random-effects model, by contrast, we assume that the true effects β_k in the meta-analysis were drawn at random from a larger population of effects, and we seek to make inferences about that larger population [16]. So the β_k are not fixed quantities but random variables that would be different if a different sample were drawn from the population of effects.

The estimand ι²

To understand the properties of the estimator I², we must first define the quantity that is being estimated. We call the estimand ι². It represents the fraction of variance in the estimated effects $ {\widehat{\beta}}_k $ that is due to heterogeneity rather than estimation error.

More formally, the $ {\widehat{\beta}}_k $ vary from one study to another. The variance in $ {\widehat{\beta}}_k $ is partly due to the heterogeneity of the true effects β_k and partly due to estimation error summarized by the standard errors σ_k. By the law of total variance we have

$$ \begin{array}{c}V\left({\widehat{\beta}}_k\right)=V\left({\beta}_k\right)+E\left({\sigma}_k^2\right)\\ {}={\tau}^2+{\sigma}^2\end{array} $$

(1)

where τ² = V(β_k) is the heterogeneity variance or between-study variance, and $ {\sigma}^2=E\left({\sigma}_k^2\right) $ is the average within-study variance. Under a fixed-effects model these variances and expectations refer only to the K effects β_k and standard errors σ_k in the meta-analysis. Under a random-effects model τ² refers to the larger population of effects, but σ² still refers only to the K standard errors σ_k in the meta-analysis, unless we are willing to regard the σ_k as well as the β_k as samples from a larger population.

The fraction of variance that is due to heterogeneity is

$$ {\iota}^2=\frac{V\left({\beta}_k\right)}{V\left({\widehat{\beta}}_k\right)}=\frac{\tau^2}{\tau^2+{\sigma}^2} $$

(2)

If ι² = 0 then the effects β_k are homogeneous; if ι² > 0 then they are heterogeneous.

Note that, unlike some past definitions [6], our definition of ι² does not assume equal standard errors σ₁ = σ₂ = … = σ_K. Note also that ι² is not an absolute measure of heterogeneity. Instead, τ² is an absolute measure of heterogeneity, while ι² compares τ² to σ². When the estimation error is small, as it is if n_k is large, then ι² can be large even if τ² is small [17].
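To make the relative nature of ι² concrete, the following minimal sketch evaluates equation (2) for two sets of studies that share the same heterogeneity variance τ² but differ in their average within-study variance σ². The function name `iota_squared` and the numerical values are illustrative assumptions, not quantities taken from any study.

```python
def iota_squared(tau2: float, sigma2: float) -> float:
    """Fraction of variance due to heterogeneity, equation (2)."""
    if tau2 < 0 or sigma2 <= 0:
        raise ValueError("tau2 must be >= 0 and sigma2 must be > 0")
    return tau2 / (tau2 + sigma2)


# Same heterogeneity variance, different within-study variance
# (illustrative numbers only).
tau2 = 0.01
small_studies = iota_squared(tau2, sigma2=0.04)    # 0.01 / 0.05
large_studies = iota_squared(tau2, sigma2=0.0025)  # 0.01 / 0.0125

print(round(small_studies, 3), round(large_studies, 3))
```

With the small studies ι² is 0.2, while with the large studies it is 0.8, even though the absolute heterogeneity τ² is unchanged.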

The naïve estimator $ {\widehat{\boldsymbol{\iota}}}^{\mathbf{2}} $

To estimate the fraction ι², Higgins and Thompson [6] first derived the naïve estimator

$$ {\widehat{\iota}}^2=1-\frac{df}{Q} $$

(3)

where df = K–1, Q is Cochran’s Q statistic [4]

$$ Q={\displaystyle \sum_{k=1}^K}\frac{{\left({\widehat{\beta}}_k-\widehat{\overline{\beta}}\right)}^2}{{\widehat{\sigma}}_k^2} $$

(4)

and

$$ \widehat{\overline{\beta}}=\frac{{\displaystyle {\sum}_{k=1}^K}{\widehat{\sigma}}_k^{-2}{\widehat{\beta}}_k}{{\displaystyle {\sum}_{k=1}^K}{\widehat{\sigma}}_k^{-2}} $$

(5)

is the precision-weighted average of the estimated effects.

The distribution of $ {\widehat{\iota}}^2 $ depends on the distribution of Q. Under homogeneity, with large n_k, Q has a central chi-square distribution with df degrees of freedom.

Under heterogeneity, the large-n_k distribution of Q depends on whether we regard the effects as fixed or random. Under a random-effects model, Q is distributed like a weighted sum of K–1 central $ {\chi}_1^2 $ variables, where the weights are given by a matrix function of τ² and $ {\sigma}_k^2 $ [18]. If we make the simplifying assumption that all the standard errors are equal (σ_k = σ) then the weights are all equal to 1 + τ²/σ² [18] or, in our notation, (1 − ι²)⁻¹, so that

$$ X=\left(1-{\iota}^2\right)Q $$

(6)

has a central chi-square distribution with df degrees of freedom [18]. As ι² gets small, we converge toward the homogeneous situation where Q itself has a central chi-square distribution with df degrees of freedom.
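The scaling in equation (6) can be checked informally by simulation. The sketch below is an illustration under stated assumptions, not part of the original Mathematica calculations: it assumes equal standard errors that are known exactly (so that $ {\widehat{\sigma}}_k=\sigma $), draws normally distributed effects and estimates, and averages X = (1 − ι²)Q over many simulated meta-analyses. If X is central chi-square with df degrees of freedom, the average should be close to df. The function name and parameter values are hypothetical.

```python
import random


def simulate_scaled_q(K: int, tau2: float, sigma2: float,
                      reps: int, seed: int = 0) -> float:
    """Average of X = (1 - iota^2) * Q over simulated meta-analyses."""
    rng = random.Random(seed)
    iota2 = tau2 / (tau2 + sigma2)
    # Marginally, each estimate varies around the grand mean with
    # variance tau2 + sigma2 (equation 1).
    total_sd = (tau2 + sigma2) ** 0.5
    running = 0.0
    for _ in range(reps):
        est = [rng.gauss(0.0, total_sd) for _ in range(K)]
        # With equal standard errors the precision-weighted mean
        # is the simple mean.
        mean = sum(est) / K
        q = sum((b - mean) ** 2 for b in est) / sigma2
        running += (1.0 - iota2) * q
    return running / reps


mean_x = simulate_scaled_q(K=7, tau2=0.04, sigma2=0.01, reps=20000)
print(round(mean_x, 2))  # expected to be close to df = K - 1 = 6
```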

Under a fixed-effects model, by contrast, Q has a non-central chi-square distribution with df degrees of freedom and a non-centrality parameter of [19]

$$ \lambda ={\displaystyle \sum_{k=1}^K}\frac{{\left({\beta}_k-\overline{\beta}\right)}^2}{\sigma_k^2} $$

(7)

where $ \overline{\beta} $ is the precision-weighted mean of the true effects β_k. If we make the simplifying assumption that all the standard errors are equal (σ_k= σ) then the non-centrality parameter reduces to

$$ \begin{array}{c}\lambda =\frac{1}{\sigma^2}{\displaystyle \sum_{k=1}^K}{\left({\beta}_k-\overline{\beta}\right)}^2\\ {}=K\frac{\tau^2}{\sigma^2}\\ {}=K\frac{\iota^2}{1-{\iota}^2}\end{array} $$

(8)

The last line shows that λ is an increasing function of ι² and that λ = 0 if ι² = 0. So again, as ι² gets small, Q converges toward the central chi-square distribution that it has under homogeneity.
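As a worked illustration of equation (8), with values chosen only for the example, a meta-analysis of K = 7 studies with equal standard errors has

$$ \lambda =7\times \frac{0.1}{1-0.1}\approx 0.78\kern1em \mathrm{if}\ {\iota}^2=0.1,\kern2em \lambda =7\times \frac{0.8}{1-0.8}=28\kern1em \mathrm{if}\ {\iota}^2=0.8 $$

so the non-centrality parameter grows slowly while ι² is small and rapidly as ι² approaches 1.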

The truncated estimator I²

A shortcoming of the naïve estimator $ {\widehat{\iota}}^2 $ is that it can be negative even though the estimand ι² cannot. Negative values of $ {\widehat{\iota}}^2 $ occur whenever Q < df, which is not a rare event. Figure 1 shows the probability that $ {\widehat{\iota}}^2 $ is negative when the effects are homogeneous. The probability decreases as df increases, but the probability is always greater than 50%.
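The probability described above is P(Q < df) for a central chi-square variable. The sketch below evaluates it using only the Python standard library, via the standard power series for the regularized lower incomplete gamma function. It is an illustration rather than the method used to draw Figure 1, and the function names are hypothetical.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(Q <= x) for a central chi-square variable with df degrees of
    freedom, from the series for the regularized lower incomplete gamma."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 0.0
    # First term of the series: z^a * exp(-z) / Gamma(a + 1).
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total, n = term, 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total


def prob_negative_naive(K: int) -> float:
    """Probability that the naive estimator is negative under homogeneity."""
    df = K - 1
    return chi2_cdf(df, df)


for K in (3, 7, 20, 50):
    print(K, round(prob_negative_naive(K), 3))
```

For K = 7 the probability is about 0.58, and the printed values fall toward, but stay above, 0.5 as K grows.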

To avoid negative estimates, Higgins and Thompson [6] suggested rounding them up to zero. The rounded or truncated estimator

$$ {I}^2= \max \left(0,{\widehat{\iota}}^2\right) $$

(9)

is the estimator that is widely used today. I² cannot be negative but can be zero. Values of I² = 0 occur in about one-quarter of published meta-analyses [20].
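Equations (3), (4), (5), and (9) can be combined into a few lines of code. The following sketch computes the precision-weighted average, Cochran's Q, the naïve estimator, and the truncated estimator I² from a list of estimates and standard errors. The function name, the returned dictionary keys, and the two small data sets are hypothetical and serve only to show the arithmetic.

```python
def heterogeneity_stats(estimates, std_errors):
    """Return the pooled estimate, Q, the naive estimator and I^2."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]            # precisions
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)  # eq. (5)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))      # eq. (4)
    df = len(estimates) - 1
    # Equation (3); if all estimates coincide, Q = 0 and the naive
    # estimator has no finite value, so we report minus infinity.
    naive = 1.0 - df / q if q > 0 else float("-inf")
    i2 = max(0.0, naive)                                      # eq. (9)
    return {"pooled": pooled, "Q": q, "df": df, "naive": naive, "I2": i2}


# Hypothetical data: three studies with equal standard errors.
spread_out = heterogeneity_stats([0.0, 0.2, 0.4], [0.1, 0.1, 0.1])
close_together = heterogeneity_stats([0.20, 0.25, 0.15], [0.1, 0.1, 0.1])
print(spread_out)
print(close_together)
```

In the first data set Q = 8 exceeds df = 2, so I² equals the naïve estimate of 0.75. In the second, Q = 0.5 is below df, the naïve estimate is −3, and truncation sets I² to 0.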

Expectation and bias of the estimators

The expectation of the naïve estimator $ {\widehat{\iota}}^2 $ is

$$ E\left({\widehat{\iota}}^2\right)=1-df\ E\left(\frac{1}{Q}\right) $$

(10)

This is easily calculated in the homogeneous case, where 1/Q is an inverse chi-square variable whose expectation is 1/(df − 2). It is just as easily calculated in the heterogeneous case with random effects and equal standard errors; in that case, by (6), 1/Q is a scaled inverse chi-square variable with an expectation of (1 − ι²)/(df − 2). The calculation is harder in the heterogeneous case with fixed effects; in that case, 1/Q is the inverse of a noncentral chi-square variable. Although the expectation of this inverse has a closed-form solution [21], it is not transparent or easy to calculate by hand. However, we can calculate it using Mathematica.

The expectation of the truncated estimator I² is a little harder to calculate. It is the weighted average of two conditional expectations: the expectation of I² when I² = 0 and the expectation of I² when I² > 0. The probability that I² = 0 is P(Q < df), and the probability that I² > 0 is P(Q > df). Therefore the expectation of I² is

$$ \begin{array}{c}E\left({I}^2\right)=P\left(Q<df\right)\times 0+P\left(Q>df\right)\times E\left({I}^2\left|Q>df\right.\right)\\ {} = P\left(Q>df\right)\times E\left(1-\frac{df}{Q}\left|Q>df\right.\right)\end{array} $$

(11)

Under homogeneity, Q has a central chi-square distribution and the expectation E(I²) has a closed-form solution which Mathematica can calculate.
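The decomposition in equation (11) can also be approximated by simulation. The sketch below is an illustration, not the analytic calculation performed in Mathematica: it draws Q from a central chi-square distribution (as a gamma variate with shape df/2 and scale 2), sets I² to zero whenever Q ≤ df, and averages the result. The function name, the number of replications, and the seed are arbitrary choices.

```python
import random


def simulate_expected_i2(K: int, reps: int = 200000, seed: int = 1) -> float:
    """Monte Carlo estimate of E(I^2) under homogeneity."""
    rng = random.Random(seed)
    df = K - 1
    running = 0.0
    for _ in range(reps):
        q = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        if q > df:                           # otherwise I^2 is truncated to 0
            running += 1.0 - df / q
    return running / reps


estimate = simulate_expected_i2(7)
print(round(estimate, 3))
```

Because the estimand is zero under homogeneity, the simulated average is also a simulated bias; for K = 7 it should land near 0.12, up to Monte Carlo error.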

Under heterogeneity, the expectation E(I²) depends on whether we regard the effects as fixed or random. If effects are random, then X = (1 − ι²)Q has a central chi-square distribution. The probability that I² = 0 is P(X < (1 − ι²)df), and the probability that I² > 0 is P(X > (1 − ι²)df). Therefore the expectation of I² is

$$ E\left({I}^2\right) = P\left(X>\left(1-{\iota}^2\right)df\right)\times E\left(1-\left(1-{\iota}^2\right)\frac{df}{X}\left|X>\left(1-{\iota}^2\right)df\right.\right) $$

(12)

which again has a closed-form solution which Mathematica can calculate.

If instead effects are fixed, then the expectation E(I²) in (11) has no closed-form solution. But the expectation for specific values of ι² and df can be calculated using numerical integration in Mathematica.

Results and discussion

Expectation and bias of I² under homogeneity

Under homogeneity, there are two sources of bias in I², one positive and one negative. The positive source is larger, so the net bias in I² is positive.

The first source of bias is negative bias in the naïve estimator $ {\widehat{\iota}}^2=1-df/Q $. Since the estimand ι² is zero, the bias of $ {\widehat{\iota}}^2 $ is the expectation

$$ Bias\left({\widehat{\iota}}^2\right)=E\left({\widehat{\iota}}^2\right)=\frac{-2}{df-2} $$

(13)

which is negative, and larger if df is small.
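For example, evaluating (13) at two illustrative values of df gives

$$ df=6:\kern0.5em \frac{-2}{6-2}=-0.5,\kern2em df=49:\kern0.5em \frac{-2}{49-2}\approx -0.043 $$

so with K = 7 studies the naïve estimator is on average half a unit below the true value of zero, while with K = 50 studies its bias is only about four percentage points.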

The second source of bias arises when $ {\widehat{\iota}}^2 $ is truncated to yield $ {I}^2= \max \left(0,{\widehat{\iota}}^2\right). $ Since truncation rounds negative values up to 0, the resulting truncation bias is positive. When df is small, truncation is more common (Figure 1), so the truncation bias is more severe.

While this intuitive explanation is helpful, it does not tell us whether the positive and negative components combine to produce a net bias that is positive or negative, large or small. To answer that question, we evaluate the expectation E(I²) in (11), which is also the bias since the estimand is ι² = 0. Mathematica gives the bias as

$$ Bias\left({I}^2\right)=E\left({I}^2\right)=\left(\frac{df}{df-2}\right)\frac{{\left(\frac{df}{2\mathrm{e}}\right)}^{df/2}-\Gamma \left(\frac{df}{2},\frac{df}{2}\right)}{\Gamma \left(\frac{df}{2}+1\right)} $$

(14)

where Γ(df/2 + 1) is the gamma function and Γ(df/2, df/2) is the upper incomplete gamma function (which has two arguments).
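Equation (14) can be evaluated without Mathematica. The sketch below uses only the Python standard library and rewrites (14) in terms of the regularized upper incomplete gamma function Γ(a, a)/Γ(a) with a = df/2, which is algebraically equivalent because df/(df − 2) = a/(a − 1) and Γ(a + 1) = aΓ(a). The series used for the incomplete gamma function and the function names are choices made for this illustration.

```python
import math


def reg_upper_gamma(a: float, x: float) -> float:
    """Regularized upper incomplete gamma, Gamma(a, x) / Gamma(a), for x > 0."""
    # Series for the regularized lower incomplete gamma function.
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total, n = term, 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return 1.0 - total


def bias_i2_homogeneous(K: int) -> float:
    """Bias of I^2 under homogeneity, equation (14), for K = df + 1 studies."""
    df = K - 1
    if df <= 2:
        raise ValueError("equation (14) requires df > 2")
    a = df / 2.0
    # (df / (2e))^(df/2) / Gamma(df/2), computed on the log scale for stability.
    lead = math.exp(a * math.log(a) - a - math.lgamma(a))
    return (lead - reg_upper_gamma(a, a)) / (a - 1.0)


for K in (7, 10, 50):
    print(K, round(bias_i2_homogeneous(K), 3))
```

For K = 7 the function returns about 0.12 and for K = 10 about 0.11, and the value continues to fall as K increases.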

It is hard to tell by inspecting (14) whether the bias is positive or negative, small or large. To visualize the answer, Figure 2 plots the expectation E(I²), which is also the Bias(I²), as a function of the number of studies K = df + 1. The bias is always positive, indicating that the positive truncation bias outweighs the negative bias in $ {\widehat{\iota}}^2 $. The bias shrinks at a decreasing rate as K grows. With K = 3 studies (which is the median in one summary of the Cochrane Library [22]), the bias is undefined because E(I²) is only defined if df > 2. With K = 7 studies (which is the median in another summary of the Cochrane Library [2]), the bias is .12. With K = 10 studies, the bias is .11; with K = 50 studies the bias is .06.

Expectation and bias of I² under heterogeneity

Under heterogeneity, the expectation E(I²) depends on whether we regard the effects as fixed or random.

Random-effects model

With random effects, there are still two sources of bias in I², one positive and one negative. But now the positive source can be either smaller or larger than the negative source, so that the overall bias can be either negative or positive.

The first source of bias is negative bias in the naïve estimator $ {\widehat{\iota}}^2 $:

$$ \begin{array}{c} Bias\left({\widehat{\iota}}^2\right)=E\left(1-\frac{df}{Q}\right)-{\iota}^2\\ {}=\frac{2{\iota}^2-2}{df-2}\end{array} $$

(15)

This bias is always negative since 0 ≤ ι² < 1. The bias is larger if df is small.

The second source of bias arises when $ {\widehat{\iota}}^2 $ is truncated to yield $ {I}^2= \max \left(0,{\widehat{\iota}}^2\right) $. Since truncation rounds negative values up to 0, truncation yields a positive bias. The truncation bias is smaller if df is large or ι² is large. This is because the probability of truncation is a little smaller when df is large, and a lot smaller when ι² is large. (As noted before (12), the probability of truncation is P(X < (1 − ι²)df), where $ X\sim {\chi}_{df}^2 $.)

Intuitively, when ι² is small, we approach the homogeneous case where the bias in I² is positive because of truncation. However, when ι² is large, truncation is less common and the bias in I² approaches the bias of $ {\widehat{\iota}}^2 $, which is negative.

More formally, under a random-effects model, the expectation E(I²) in (12) has a solution which Mathematica gives as

$$ E\left({I}^2\right)=\frac{\left(-2{\mathrm{e}}^{\frac{1}{2}df\left({\iota}^2-1\right)}\left(df\left({\iota}^2-1\right)-2\right)-df\left({\iota}^2-1\right)\left(df{\iota}^2-2\right){E}_{-\frac{\mathrm{df}}{2}}\left(\frac{1}{2}\left(df-df{\iota}^2\right)\right)\right)}{\left(df-2\right)df{E}_{1-\frac{df}{2}}\left(-\frac{1}{2}df\left({\iota}^2-1\right)\right)} $$

(16)

where expressions of the form E_n(z) represent the exponential integral function.

The expectation in (16) is in closed form but is even less transparent than its predecessor in (14). It is not clear from inspection whether the bias is large or small, positive or negative.

To visualize E(I²), Figure 3 gives a graphics grid displaying 9 plots of E(I²) as a function of K, for ι² values between .1 and .9. In each plot, a dotted line is drawn at the value of the estimand ι², so that the bias of I² is the difference between the dotted line and the curve E(I²).

The bias is generally larger for small K. At ι² = .1 the bias is positive. At ι² = .2 there is practically no bias, and above ι² = .2 the bias switches from positive to negative. As ι² increases beyond .2 the bias gets larger for small K, but smaller for large K.

When K is large there is practically no bias, particularly if ι² is large as well. But when K is small, as is often the case in meta-analysis, the bias can be noticeable even if ι² is large. For example, if ι² = .8 and K = 7 (a typical or even high value for the Cochrane Library [2]), the expectation of I² is just .52.

Fixed-effects model

Under heterogeneity with fixed effects, Mathematica gives the expectation of the naïve estimator $ {\widehat{\iota}}^2 $ as

$$ E\left({\widehat{\iota}}^2\right)=1+df\ {2}^{df/2-2}\ {\mathrm{e}}^{-\lambda /2}{\left(-1\right)}^{-df/2}{\lambda}^{1-df/2}\left(\Gamma \left(\frac{df}{2}-1\right)-\Gamma \left(\frac{df}{2}-1,-\frac{\lambda }{2}\right)\right) $$

(17)

where λ = Kι²/(1 − ι²) from equation (8). However, this expression for $ E\left({\widehat{\iota}}^2\right) $ is only real if df is even.^a If df is odd, a much longer exact expression for $ E\left({\widehat{\iota}}^2\right) $ can be derived using results in [21], or an approximation can be obtained numerically.

The bias of the naïve estimator is $ E\left({\widehat{\iota}}^2\right)-{\iota}^2 $. Although it is not obvious from inspection, the bias is negative for ι² < .8, and very slightly positive for ι² ≥ .8.

The bias of the truncated estimator I² is a little different. Intuitively, when ι² is small, we approach the homogeneous case where the bias in I² is positive because of truncation. However, as ι² gets large, truncation is less common and the bias in I² approaches the bias of $ {\widehat{\iota}}^2 $, which again is negative for ι² < .8, and very slightly positive for ι² ≥ .8.

The expectation of the truncated estimator I² can be calculated from equation (11) but under a fixed-effects model the solution no longer has a closed form, not even a complicated one. Instead, to evaluate E(I²) we use numerical integration in Mathematica.

Figure 4 is a graphics grid displaying 9 plots of E(I²) as a function of K, for ι² values between .1 and .9. The bias is generally larger for small K. At ι² = .1 the bias is positive. At ι² = .2 there is practically no bias except for very small K. Above ι² = .2 the bias switches from positive to negative. As ι² increases from .3 to .5 the negative bias gets larger, but as ι² increases further from .6 to .7, the bias gets smaller and is increasingly restricted to small values of K, until at ι² =.8 there is practically no bias. At ι² =.9 the bias is positive again but very small and restricted to very small values of K.

In general, the bias is milder under the fixed-effects model than under the random-effects model, particularly if ι² is large. For example, if ι² = .8 and K = 7 (a typical or even high value for the Cochrane Library [2]), the expectation of I² is just .52 under the random-effects model but is .80 (practically unbiased) under the fixed-effects model.

Conclusions

We have shown that, in small meta-analyses, the widely used heterogeneity statistic I², which was already known to be imprecise, is biased as well. The bias shrinks as the number of studies K grows, but since K is often small in published meta-analyses, the bias of I² is often large in practice.

The bias and imprecision of I² are to some extent unavoidable and should not be taken as a criticism of the I² statistic itself. All statistics are imprecise in small samples, and any reasonable estimator of the heterogeneity fraction ι² will be biased when the true value of ι² is close to 0. The reason for the bias is fundamental. Like the estimand ι², any reasonable estimator should be limited to nonnegative estimates, but the expectation of those nonnegative estimates will be positive and will exceed ι² when the true value of ι² is close to 0.

Similar bias has been observed in the heterogeneity variance τ². Any reasonable estimator of τ² will be limited to nonnegative values, and this will cause bias when the true value of τ² is close to zero [15,23]. Estimators of τ² have been constructed that are less biased or more precise under some circumstances, but all nonnegative estimators are biased when the true value of τ² is close to zero [24].

Despite its bias and imprecision, the I² statistic remains useful. In large meta-analyses, I² can be precise with little bias, and even in small meta-analyses it is better to have a biased and imprecise estimate of ι² than it is to have no estimate at all. In addition, although the bias of I² depends to some extent on the number of studies K, I² is much less dependent on K than Q is.

Nevertheless, I² should be presented and interpreted cautiously in small meta-analyses. Perhaps the most straightforward response to the bias and imprecision of I² is to report a 95% confidence interval in addition to, or even instead of, the point estimate I². Although methods for calculating confidence intervals around I² can be a bit complicated [6,19,23,25], the best methods have good coverage and they give a sense of the range of possible ι² values without highlighting a point estimate that may be biased and imprecise. While some meta-analyses do report confidence intervals around I² [26], such confidence intervals are not included in recent meta-analyses published in journals such as Epidemiology [10,11], the American Journal of Epidemiology [12,13], or the Cochrane Library. Journals publishing meta-analyses should consider requiring confidence intervals for ι².

In small meta-analyses, confidence intervals for ι² are often very wide [2] but their width tells us something. The width of the confidence intervals tells us how little information a small meta-analysis typically provides about heterogeneity. In many small meta-analyses, we may not be able to estimate heterogeneity with much precision; in fact, we may have little confidence in any estimate beyond the average effect size. No statistic can change the limitations of small meta-analyses, and the statistics that we report should make those limitations clear.

Endnote

^a We filed a bug report with Wolfram Research regarding Mathematica’s failure to provide a real solution for odd df.

References

1. Melsen WG, Bootsma MCJ, Rovers MM, Bonten MJM. The effects of clinical and statistical heterogeneity on the predictive values of results from meta-analyses. Clin Microbiol Infect. 2014;20(2):123–9.
2. Ioannidis JPA, Patsopoulos NA, Evangelou E. Uncertainty in heterogeneity estimates in meta-analyses. BMJ. 2007;335(7626):914–6.
3. Davey J, Turner RM, Clarke MJ, Higgins JP. Characteristics of meta-analyses and their component studies in the Cochrane Database of Systematic Reviews: a cross-sectional, descriptive analysis. BMC Med Res Methodol. 2011;11(1):160.
4. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10:101–29.
5. Hardy RJ, Thompson SG. Detecting and describing heterogeneity in meta-analysis. Stat Med. 1998;17(8):841–56.
6. Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539–58.
7. Koedel C. An empirical analysis of teacher spillover effects in secondary school. Econ Educ Rev. 2009;28(6):682–92.
8. Koedel C, Parsons E, Podgursky M, Ehle M. Teacher preparation programs and teacher quality: are there real differences across programs? Washington, DC: American Institutes for Research; 2012. CALDER working paper 63.
9. von Hippel PT, Osborne C, Lincove A, Bellows L, Mills N. The challenges of seeking exceptional teacher preparation programs among many noisy estimates. Rochester, NY: Social Science Research Network; 2014. SSRN Scholarly Paper ID 2506935.
10. Kivimäki M, Batty GD, Ferrie JE, Kawachi I. Cumulative meta-analysis of job strain and CHD. Epidemiology. 2014;25(3):464–5.
11. Aune D, Saugstad OD, Henriksen T, Tonstad S. Physical activity and the risk of preeclampsia: a systematic review and meta-analysis. Epidemiology. 2014;25(3):331–43.
12. Crippa A, Discacciati A, Larsson SC, Wolk A, Orsini N. Coffee consumption and mortality from all causes, cardiovascular disease, and cancer: a dose–response meta-analysis. Am J Epidemiol. 2014;180(8):763–75.
13. Kim Y, Je Y. Dietary fiber intake and total mortality: a meta-analysis of prospective cohort studies. Am J Epidemiol. 2014;180(6):565–73.
14. Cochrane Collaborative. Cochrane Library. 2015. [Online]. Available: http://www.cochranelibrary.com/. [Accessed: 25-Feb-2015].
15. Hedges LV, Vevea JL. Fixed- and random-effects models in meta-analysis. Psychol Methods. 1998;3(4):486–504.
16. Higgins JPT, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J R Stat Soc A Stat Soc. 2009;172(1):137–59.
17. Rücker G, Schwarzer G, Carpenter JR, Schumacher M. Undue reliance on I² in assessing heterogeneity may mislead. BMC Med Res Methodol. 2008;8(1):79.
18. Biggerstaff BJ, Jackson D. The exact distribution of Cochran’s heterogeneity statistic in one-way random effects meta-analysis. Stat Med. 2008;27(29):6093–110.
19. Hedges LV, Pigott TD. The power of statistical tests in meta-analysis. Psychol Methods. 2001;6(3):203–17.
20. Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Stat Med. 2000;19(13):1707–28.
21. Bock ME, Judge GG, Yancey TA. A simple form for the inverse moments of non-central χ² and F random variables and certain confluent hypergeometric functions. J Econometrics. 1984;25(1–2):217–34.
22. Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JP. Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. Int J Epidemiol. 2012;41(3):818–27.
23. Viechtbauer W. Bias and efficiency of meta-analytic variance estimators in the random-effects model. J Educ Behav Stat. 2005;30(3):261–93.
24. Chung Y, Rabe-Hesketh S, Choi I-H. Avoiding zero between-study variance estimates in random-effects meta-analysis. Stat Med. 2013;32(23):4071–89.
25. Hartung J, Knapp G. On confidence intervals for the among-group variance in the one-way random effects model with unequal error variances. J Stat Plann Infer. 2005;127(1–2):157–77.
26. Ray KK, Seshasai SRK, Erqou S, Sever P, Jukema JW, Ford I, et al. Statins and all-cause mortality in high-risk primary prevention: a meta-analysis of 11 randomized controlled trials involving 65,229 participants. Arch Intern Med. 2010;170(12):1024–31.

Acknowledgments

This article was not part of any funded project. I thank Erika Patall, James Pustejovsky, Tasha Beretvas, and journal reviewers for comments on earlier drafts.

Author information

Authors and Affiliations

Center for Health and Social Policy, LBJ School of Public Affairs, University of Texas, Austin, 2315 Red River, Box Y, Austin, TX, 78712, USA
Paul T von Hippel


Corresponding author

Correspondence to Paul T von Hippel.

Additional information

Competing interests

The author declares that he has no competing interests.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

von Hippel, P.T. The heterogeneity statistic I² can be biased in small meta-analyses. BMC Med Res Methodol 15, 35 (2015). https://doi.org/10.1186/s12874-015-0024-z


Received: 13 December 2014
Accepted: 24 March 2015
Published: 14 April 2015
DOI: https://doi.org/10.1186/s12874-015-0024-z
