Random-effects meta-analysis
Meta-analysis is very commonly performed via a random-effects approach, utilizing the normal-normal hierarchical model. Here the data are given in terms of a number k of estimates \(y_{i} \in \mathbb{R}\) that are associated with some uncertainty, given through standard errors \(s_{i} > 0\) that are taken to be known without uncertainty. The estimates are assumed to measure trial-specific parameters \(\theta_{i} \in \mathbb{R}\):
$$ y_{i} \sim \mathrm{N}\left(\theta_{i}, {s_{i}^{2}}\right) \qquad \text{for}\, i = 1,\ldots,k. $$
(1)
The parameters \(\theta_{i}\) vary from trial to trial around a global mean \(\mu \in \mathbb{R}\) due to some heterogeneity variance between trials that constitutes an additive variance component in the model,
$$ \theta_{i} \sim \mathrm{N}\left(\mu, \tau^{2}\right) \qquad \text{for}\, i = 1,\ldots,k, $$
(2)
where \(\tau^{2} \geq 0\). The model may then be simplified by integrating out the parameters \(\theta_{i}\), leading to the marginal expression
$$ y_{i} \sim \mathrm{N}\left(\mu,\, {s_{i}^{2}} + \tau^{2}\right) \qquad \text{for}\, i = 1,\ldots,k. $$
(3)
Of the two unknowns in the model, the overall mean μ, the effect, is usually the figure of primary interest, while the heterogeneity variance \(\tau^{2}\) constitutes a nuisance parameter. When \(\tau^{2}=0\), the model simplifies to the so-called fixed-effect model [1, 2].
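For illustration, data following the marginal model (3) may be generated as in the following minimal R sketch; the parameter values and the draw of standard errors are arbitrary and serve only as an example, not as the settings used in the simulations below.

```r
## simulate k estimates from the marginal random-effects model (Eq. 3);
## mu, tau and the standard errors s are arbitrary illustration values
set.seed(42)
k   <- 5
mu  <- 0.3                                 # true overall effect
tau <- 0.2                                 # heterogeneity standard deviation
s   <- runif(k, min = 0.1, max = 0.4)      # "known" study standard errors
y   <- rnorm(k, mean = mu, sd = sqrt(s^2 + tau^2))   # observed estimates y_i
```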
The “relative amount of heterogeneity” in a meta-analysis may be expressed in terms of the measure \(I^{2}\), which is defined as
$$ I^{2} = \frac{\tau^{2}}{\tau^{2}+\tilde{s}^{2}} $$
(4)
where \(\tilde{s}\) is some kind of “average” standard error among the study-specific \(s_{i}\) [24]. In the following simulation studies, we will determine \(\tilde{s}^{2}\) as the arithmetic mean of the squared standard errors.
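As a simple illustration (with hypothetical values, not data from the simulations), \(I^{2}\) may be computed from a heterogeneity variance and a set of standard errors as follows; the function name is ours and chosen only for this sketch.

```r
## relative heterogeneity I^2 (Eq. 4), with \tilde{s}^2 taken as the
## arithmetic mean of the squared standard errors
i_squared <- function(tau2, s) {
  tau2 / (tau2 + mean(s^2))
}
i_squared(tau2 = 0.04, s = c(0.2, 0.25, 0.3))   # example values only
```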
Parameter estimation
If the value of the heterogeneity variance parameter \(\tau^{2}\) were known, the (conditional) maximum-likelihood effect estimate would result as the weighted average
$$ \hat{\mu} = \frac{\sum_{i} w_{i} \,y_{i}}{\sum_{i} w_{i}} $$
(5)
with “inverse variance weights” defined as
$$ w_{i} = \frac{1}{{s_{i}^{2}} + \tau^{2}} \qquad \text{for} \,i = 1,\ldots,k. $$
(6)
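A direct transcription of Eqs. (5) and (6) into R might look as follows; function and argument names are ours and are used only for this sketch.

```r
## conditional ML effect estimate (Eqs. 5-6) for a given (assumed known)
## heterogeneity variance tau2; y are the estimates, s their standard errors
mu_hat <- function(y, s, tau2) {
  w <- 1 / (s^2 + tau2)     # inverse-variance weights (Eq. 6)
  sum(w * y) / sum(w)       # weighted average (Eq. 5)
}
```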
A common approach to inference within the random-effects model is to first estimate the heterogeneity variance \(\tau^{2}\), and subsequently estimate the effect μ conditional on the heterogeneity estimate. Note that the \(w_{i}\) are then effectively treated as “known”, while in fact both the \({s_{i}^{2}}\) and \(\tau^{2}\) are only measured with some uncertainty (which depends on the size/precision of the ith individual study and on the number of studies k). There is a wide range of different heterogeneity estimators available (see e.g. [3–6] for more details). In the following we will concentrate on some of the most common ones: the DerSimonian-Laird (DL) estimator, a moment estimator [25] with acknowledged shortcomings [26]; the restricted maximum likelihood (REML) estimator [3, 27]; and the Paule-Mandel (PM) estimator, an essentially heuristic approach [28, 29]. Software to compute the different estimates is provided e.g. in the metafor and meta R packages [30, 31].
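For instance, assuming a current version of the metafor package, the different heterogeneity estimators might be invoked roughly as sketched below (y and s as defined in the earlier sketch; exact argument names and supported method options should be checked against the package documentation):

```r
## sketch: fitting the random-effects model with different heterogeneity
## estimators via metafor's rma() function
library(metafor)
fit_dl   <- rma(yi = y, sei = s, method = "DL")    # DerSimonian-Laird
fit_reml <- rma(yi = y, sei = s, method = "REML")  # restricted ML
fit_pm   <- rma(yi = y, sei = s, method = "PM")    # Paule-Mandel
c(DL = fit_dl$tau2, REML = fit_reml$tau2, PM = fit_pm$tau2)
```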
Confidence intervals and tests
Normal approximation
Confidence intervals and, equivalently, tests for the effect μ are commonly constructed using a normal approximation for the estimate \(\hat{\mu}\). The standard error of \(\hat{\mu}\), conditional on a fixed heterogeneity variance value \(\tau^{2}\), is given by
$$ \hat{\sigma}_{\mu} = \sqrt{\frac{1}{\sum_{i} w_{i}}}. $$
(7)
A confidence interval for the effect μ then results via a normal approximation as
$$ \hat{\mu} \; \pm \; \hat{\sigma}_{\mu} \; z_{(1-\alpha/2)} $$
(8)
where \(z_{(1-\alpha/2)}\) is the (1 − α/2)-quantile of the standard normal distribution, and (1 − α) is the nominal coverage probability [1, 2]. The normal approximation based on a heterogeneity variance estimate \(\hat{\tau}^{2}\) usually works well for many studies (large k) and small standard errors (small \(s_{i}\)), or negligible heterogeneity (small variance \(\tau^{2}\)), but tends to be anticonservative otherwise [8, 9]; Friede T, Röver C, Wandel S, Neuenschwander B: Meta-analysis of few small studies in orphan diseases (submitted).
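Putting Eqs. (5), (7) and (8) together, a normal-approximation interval may be computed as in the following minimal sketch; the function name is ours.

```r
## normal-approximation confidence interval (Eqs. 7-8), given a
## heterogeneity variance estimate tau2
ci_normal <- function(y, s, tau2, level = 0.95) {
  w     <- 1 / (s^2 + tau2)
  mu    <- sum(w * y) / sum(w)          # effect estimate (Eq. 5)
  sigma <- sqrt(1 / sum(w))             # standard error (Eq. 7)
  z     <- qnorm(1 - (1 - level) / 2)
  c(lower = mu - z * sigma, upper = mu + z * sigma)
}
```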
The Hartung-Knapp-Sidik-Jonkman (HKSJ) method
Hartung and Knapp [8, 9] and Sidik and Jonkman [10] independently introduced an adjusted confidence interval. In order to determine the adjusted interval, first the quadratic form
$$ q = \frac{1}{k-1} \sum_{i} w_{i}(y_{i}-\hat{\mu})^{2} $$
(9)
is computed [8–10]. The adjusted confidence interval then results as
$$ \hat{\mu} \; \pm \; \sqrt{q} \; \hat{\sigma}_{\mu} \; t_{(k-1); (1-\alpha/2)} $$
(10)
where \(t_{(k-1);(1-\alpha/2)}\) is the (1 − α/2)-quantile of the Student-t distribution with (k − 1) degrees of freedom. Note that \(q\,\hat{\sigma}_{\mu}^{2}\) is derived from a non-negative and unbiased estimator of \(\frac{1}{\sum_{i} w_{i}}\) [32]. Confidence intervals based on the normal approximation may easily be converted to HKSJ-adjusted ones [12]. The method has also been generalized to the cases of multivariate meta-analysis and meta-regression [33].
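A minimal sketch of the HKSJ computation (Eqs. (9) and (10)) is given below; to our knowledge the adjustment is also implemented in the R packages mentioned above, and the function name here is ours.

```r
## HKSJ-adjusted confidence interval (Eqs. 9-10), given a heterogeneity
## variance estimate tau2
ci_hksj <- function(y, s, tau2, level = 0.95) {
  k     <- length(y)
  w     <- 1 / (s^2 + tau2)
  mu    <- sum(w * y) / sum(w)
  sigma <- sqrt(1 / sum(w))
  q     <- sum(w * (y - mu)^2) / (k - 1)          # quadratic form (Eq. 9)
  tq    <- qt(1 - (1 - level) / 2, df = k - 1)
  c(lower = mu - sqrt(q) * sigma * tq,
    upper = mu + sqrt(q) * sigma * tq)
}
```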
The modified Knapp-Hartung (mKH) method
The HKSJ confidence interval (10) tends to be wider than the one based on the normal approximation (8), since the Student-t quantile is larger than the corresponding normal quantile, while q will tend to be somewhere around unity. However, q may in fact also turn out arbitrarily small, and if \(\sqrt{q}<\frac{z_{(1-\alpha/2)}}{t_{(k-1);(1-\alpha/2)}}\), then the HKSJ-adjusted interval will be shorter than the normal-approximation one, which may be considered counter-intuitive. A simple ad hoc modification to the procedure results from defining
$$ q^{\star} = \max\{1, q \} $$
(11)
[11] and using \(q^{\star}\) instead of q to construct confidence intervals and tests. This will ensure a more conservative procedure. The modification was originally proposed in the meta-regression context, but the simple meta-analysis here constitutes the special case of an “intercept-only” regression.
Note that the PM heterogeneity variance estimator is effectively defined by choosing \(\hat{\tau}^{2}\) such that q (Eq. (9)) equals 1 (or is less than 1, if no such solution exists) [28, 29], so that for the PM estimator the corresponding value is always \(q^{\star} = 1\).
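Relative to the HKSJ sketch above, the modification only amounts to truncating q at unity before forming the interval; a minimal illustration:

```r
## mKH modification (Eq. 11): use q* = max(1, q) instead of q in the
## HKSJ interval, so that the result is never narrower than an
## unadjusted Student-t interval
q_star <- function(q) max(1, q)
```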
Simulations
Since a (1 − α) confidence interval is supposed to cover the true parameter value with probability (1 − α), the calibration of such intervals may be checked by repeatedly generating random data based on known parameter values and then determining the empirical frequency with which the true values are actually covered [34]. We performed such a Monte Carlo simulation comparing the HKSJ and mKH approaches, using the setup introduced by IntHout et al. [12]. Data were simulated on a continuous scale, according to the random-effects model described above, with study-specific standard errors \(s_{i}\) set to reflect certain scenarios with respect to the relative size of studies and their variation due to estimation uncertainty. Many meta-analyses are based on (discrete) count data, for which the random-effects model assumptions hold only approximately; the approximation works well unless event probabilities are very low. Alternative methods have been proposed to deal with low event probabilities [35, 36], but low-event-rate effects were not considered in the present investigation. The simulations considered four different scenarios, namely meta-analyses (A) with trials of equal size, (B) with equally sized trials but including one small trial, (C) with 50 % large and 50 % small trials, and (D) with equally sized trials but including one large trial. The sizes of “small” and “large” trials (and hence, the squared standard errors \({s_{i}^{2}}\)) differ by a factor of ten, so that the associated standard errors differ by roughly a factor of 3; for more details see ([12], Appendix 2). The numbers of studies k considered here range from 2 to 11, and the true levels of heterogeneity were \(I^{2} \in \{0.00, 0.25, 0.50, 0.75, 0.90\}\). For each combination of parameters, 10 000 meta-analyses were simulated. All simulations were performed using R [37].
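As an illustration of the general principle (not a reproduction of the IntHout et al. setup), the following sketch estimates the empirical coverage of a normal-approximation interval under known parameters; here \(\tau^{2}\) is treated as known when forming the interval, whereas in the actual simulations it is estimated, and all parameter values are illustrative only.

```r
## minimal coverage check: simulate meta-analyses with known mu, tau and s,
## and record how often the normal-approximation interval covers mu
coverage <- function(k, mu, tau, s, n_sim = 10000, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  hits <- replicate(n_sim, {
    y     <- rnorm(k, mean = mu, sd = sqrt(s^2 + tau^2))
    w     <- 1 / (s^2 + tau^2)        # tau^2 treated as known here
    m     <- sum(w * y) / sum(w)
    sigma <- sqrt(1 / sum(w))
    (m - z * sigma <= mu) && (mu <= m + z * sigma)
  })
  mean(hits)
}
set.seed(1)
coverage(k = 5, mu = 0.3, tau = 0.2, s = rep(0.25, 5))   # illustrative values
```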