In this section we develop two exact frequentist methods for computing confidence intervals for *τ*
^{2} and we also present our proposal for Bayesian meta-regressions using informative prior distributions for this parameter. By ‘exact’ we mean, under the random effects model for meta-regression (1), that these two frequentist methods provide confidence intervals with exactly the nominal coverage probability (subject to the way we interpret empty confidence intervals, see below). A disadvantage of these methods is that they are not based on sufficient statistics and so have not been shown to possess any optimality properties. We assume throughout that the design matrix is of full rank, so that all parameters are identifiable. In practice we suggest that, as a minimum, *n*=10 is needed to fit a meta-regression with a single covariate effect. Even larger *n* will be required in order to estimate the effect of multiple covariates in the same meta-regression model.

### Generalised Cochran heterogeneity statistics for meta-regression

The conventional *Q* statistic for meta-regression is

Q=\sum _{i=1}^{n}{w}_{i}{\left({y}_{i}-{\u0177}_{i}\right)}^{2}

(2)

where {w}_{i}={\sigma}_{i}^{-2} and {\u0177}_{i} is the fitted value for the *i*th study from the fixed effects model, where *τ*
^{2}=0. That is {\u0177}_{i}={\mathit{x}}_{i}{\widehat{\mathit{\beta}}}_{\mathit{F}} where {\widehat{\mathit{\beta}}}_{\mathit{F}}={\left({\mathit{X}}^{t}\mathit{W}\mathit{X}\right)}^{-1}{\mathit{X}}^{t}\mathit{W}\mathit{Y} and *W*=diag(*w*
_{
i
})=*Δ*
^{-1}. Under the null hypothesis *H*
_{0}:*τ*
^{2}=0, *Q* follows a {\chi}_{n-p}^{2} distribution and so may be used as a test statistic.

DerSimonian and Kacker [13] proposed a generalised version of *Q* in the special case of meta-analysis and where the only ‘covariate’ is the intercept. They proposed using an arbitrary set of fixed positive constants *a*
_{
i
} instead of *w*
_{
i
} when computing (2). Jackson [8] showed that this can be used to compute confidence intervals for the between-study variance in random effects meta-analyses. An obvious and more general version of DerSimonian and Kacker’s heterogeneity statistic for meta-regression is

{Q}_{a}=\sum _{i=1}^{n}{a}_{i}{\left({y}_{i}-{\u0177}_{i}\right)}^{2}

(3)

where the *a*
_{
i
} are arbitrary positive constants and {\u0177}_{i} is now {\u0177}_{i}={\mathit{x}}_{i}{\widehat{\mathit{\beta}}}_{\mathit{a}}, where {\widehat{\mathit{\beta}}}_{\mathit{a}}={\left({\mathit{X}}^{t}\mathit{A}\mathit{X}\right)}^{-1}{\mathit{X}}^{t}\mathit{A}\mathit{Y} and *A*=diag(*a*
_{
i
}). If *a*
_{
i
}=*w*
_{
i
} for all *i*, or equivalently *A*=*W*, then *Q*
_{
a
} in equation (3) reduces to the usual heterogeneity statistic (2). We will develop the theory in terms of *Q*
_{
a
} and so include the conventional *Q* as a special case.

In order to derive the properties of (3) we write this in matrix form. We have that

\mathit{Y}-\widehat{\mathit{Y}}=\left(\mathit{I}-\mathit{X}{\left({\mathit{X}}^{t}\mathit{A}\mathit{X}\right)}^{-1}{\mathit{X}}^{t}\mathit{A}\right)\mathit{Y}

and (3) can be written as

{Q}_{a}={\left(\mathit{Y}-\widehat{\mathit{Y}}\right)}^{t}\mathit{A}\left(\mathit{Y}-\widehat{\mathit{Y}}\right).

After a little manipulation we can write

{Q}_{a}={\mathit{Y}}^{t}\mathit{B}\mathit{Y}

(4)

where *B*=*A*-*A* *X*(*X*
^{t}
*A* *X*)^{-1}
*X*
^{t}
*A*.

Next, essentially following Biggerstaff and Jackson [7] and Jackson [8], let **Z** denote a standard *n* dimensional multivariate normal vector. Noting that *Q*
_{
a
} is ‘location invariant’ (by this we mean that we can write **Y**=*X* *β*+**Σ**
^{1/2}
**Z** in equation (4), where **Z** is standard multivariate normal, and the location *X* *β* cancels in the computation of *Q*
_{
a
}), we can obtain

{Q}_{a}={\mathbf{Y}}^{t}\mathbf{B}\mathbf{Y}\stackrel{d}{=}{\mathbf{Z}}^{t}{\mathbf{\Sigma}}^{1/2}\mathbf{B}{\mathbf{\Sigma}}^{1/2}\mathbf{Z}={\mathbf{Z}}^{t}\mathbf{S}\mathbf{Z},

defining **S**=**Σ**
^{1/2}
**B** **Σ**
^{1/2}. **B** is symmetric and hence so is **S**. Following the same procedure described by Biggerstaff and Jackson [7] and Jackson [8], writing **S** in terms of its spectral decomposition, we obtain

{Q}_{a}\stackrel{d}{=}\sum _{i=1}^{n-p}{\lambda}_{i}\left(\mathbf{S}\right){\chi}_{1,i}^{2}

(5)

where {\chi}_{1,i}^{2} are mutually independent chi-squared random variables with 1 degree of freedom and *λ*
_{1}(**S**)≥*λ*
_{2}(**S**)≥⋯≥*λ*
_{
n
}(**S**) are the ordered eigenvalues of **S**. Because we take the covariates, the within-study variances {\sigma}_{i}^{2} and the *a*
_{
i
} as fixed constants, from (5) we have that the only unknown parameter that the distribution of *Q*
_{
a
} depends on is *τ*
^{2}. The summation in (5) extends to *n*-*p*, rather than *n*, because *p* of the eigenvalues of **S** are zero. This is because *S*=*Σ*
^{1/2}
*A*
^{1/2}(*I*-*H*)*A*
^{1/2}
*Σ*
^{1/2}, where *H* is the hat matrix for the regression where the design matrix is *A*
^{1/2}
*X*. Because *A* and *Σ* are diagonal, and because the eigenvalues of *C* *D* are those of *D* *C* when *C* and *D* are both square matrices [14], p 57, the eigenvalues of *S* are those of *Σ*
^{1/2}
*A*
^{1/2}(*I*-*H*)*A*
^{1/2}
*Σ*
^{1/2} or equivalently those of (*I*-*H*)*A* *Σ*. Then, because (*I*-*H*) and *A* *Σ* are positive semi-definite symmetrical matrices, Theorem 8.12 from Zhang [14], p 274, implies that *λ*
_{
i
}(**S**)=*λ*
_{
i
}((*I*-*H*)*A* *Σ*)≤*λ*
_{
i
}(*I*-*H*)*λ*
_{1}(*A* *Σ*). *λ*
_{1}(*A* *Σ*)>0 and, because *Q*
_{
a
} is non-negative, the eigenvalues of **S** are also non-negative; *p* eigenvalues of (*I*-*H*) are zero and hence so are *p* eigenvalues of **S**, as stated.

#### Obtaining confidence intervals numerically

The cumulative distribution function of *Q*
_{
a
} is decreasing in *τ*
^{2}. This is because the eigenvalues of *S* are increasing in *τ*
^{2}. This can be shown in a similar way to the meta-analysis setting as explained by Jackson [8]. The eigenvalues of **S** are those of *Σ*
^{1/2}
*A*
^{1/2}(*I*-*H*)*A*
^{1/2}
*Σ*
^{1/2} as noted above. Consider a larger between-study variance, so that *Σ* becomes *Σ* *M* and *Σ*
^{1/2} becomes *Σ*
^{1/2}
*M*
^{1/2}, where **M** is a diagonal matrix in which all diagonal entries and hence all eigenvalues are also greater than one. Then Theorem 8.12 from Zhang [14], p 274, implies that *λ*
_{
i
} (*A*
^{1/2}
*Σ*
^{1/2}
*M*
^{1/2}(*I*-*H*)*A*
^{1/2}
*Σ*
^{1/2}
*M*
^{1/2})=*λ*
_{
i
}(*A*
^{1/2}
*Σ*
^{1/2}(*I*-*H*)*A*
^{1/2}
*Σ*
^{1/2}
*M*)≥*λ*
_{
i
}((*A*
^{1/2}
*Σ*
^{1/2}(*I*-*H*)*A*
^{1/2}
*Σ*
^{1/2}))*λ*
_{
n
}(**M**), which means that the larger between-study variance has resulted in larger eigenvalues of **S**. Hence these eigenvalues are increasing in *τ*
^{2} as stated.

This ensures that confidence intervals (rather than confidence sets) for *τ*
^{2} can be obtained in the way described by Jackson [8] for meta-analysis, using the distributional result in (5) and test inversion (eg Casella and Berger [15], section 9.2.1). This means that {\tau}_{J}^{2} is accepted by the two-tailed hypothesis test, and so lies in the corresponding confidence set, if and only if

P\phantom{\rule{0.1em}{0ex}}\left({Q}_{a}\ge {q}_{a};{\tau}^{2}={\tau}_{J}^{2}\right)\ge \alpha /2

(6)

and

P\phantom{\rule{0.1em}{0ex}}\left({Q}_{a}\le {q}_{a};{\tau}^{2}={\tau}_{J}^{2}\right)\ge \alpha /2

(7)

where *q*
_{
a
} is the observed value of *Q*
_{
a
} and the coverage probability is (1-*α*). As noted by Jackson [8], alternative values to *α*/2 are possible in (6) and (7), but we will use ‘equal tailed’ two-sided confidence intervals. If no {\tau}_{J}^{2} satisfies (7) then we have two main choices. The first option is to follow Jackson [8] in providing an interval of [0,0] that increases the coverage probability by *α*/2 when *τ*
^{2}=0 which is therefore conservative. The second option is to provide an empty set as suggested by Viechtbauer [9] in relation to the *Q* profile method below. Providing the empty set retains the nominal coverage probability for all *τ*
^{2} but means that the analyst is faced with the task of explaining why the empty set is appropriate to practitioners. We leave it to the reader to decide which convention they prefer. Since the empty set is itself an interval, either convention can be described as providing confidence intervals. A possible third convention, as discussed by Jackson [8], would be to refrain from giving a confidence interval in such instances and instead conclude that the interval is undefined. This statement would be accompanied with a conclusion like ‘the data appear to be highly homogenous’ or that ‘the interval estimation fails’.

Because the cumulative distribution function of *Q*
_{
a
} is decreasing in *τ*
^{2}, we can easily perform numerical searches to obtain the confidence interval limits using Farebrother’s algorithm [16] to evaluate the distribution function of *Q*
_{
a
} implied by (5). Since, as in the *Q* profile method below, the confidence intervals are based upon an exact (under the assumptions of the model) distributional result, exact confidence intervals are obtained, subject to the [0,0] or empty set caveat discussed above. An *R* function is provided in the Additional files 1, 2 and 3 that produces confidence intervals using this method, where empty confidence sets are given as [0,0]. This function has now been implemented in the *metafor* package and code for use with this package is also provided in the Additional files 1, 2 and 3.

#### Choosing the ‘weights’

One issue when using this procedure is that it involves choosing a set of ‘weights’ *a*
_{
i
}. Jackson [8] discusses possible approaches for this. In the absence of information about the likely magnitude of the between-study variance, Jackson [8] suggested using *a*
_{
i
}=1/*σ*
_{
i
} but practitioners may prefer to use the conventional fixed effects weights {a}_{i}=1/{\sigma}_{i}^{2} because these may be more familiar. An investigation into the optimal choice of weights, as undertaken in a simulation study by Jackson [8], could provide suitable further work but similar results to those in the meta-analysis (no covariates) scenario may reasonably be anticipated: that is, weights that reflect the magnitude of the true residual heterogeneity (equal to the reciprocal of the true total variances) can be expected to perform well.

#### Moments of generalised Cochran heterogeneity statistics

The above method of computing confidence intervals using generalised Cochran heterogeneity statistics requires the use of Farebrother’s algorithm and iterative methods to find the bounds of the confidence interval. In this section we show how to use the moments of generalised Cochran heterogeneity statistics to produce a point estimate of the residual between-study variance and a much simpler method for quantifying the uncertainty in this point estimate. The point estimate of *τ*
^{2} proposed in this section is a natural estimate to accompany the confidence interval obtained using a generalised Cochran heterogeneity statistic.

In Appendix 1, we show that the expectation of the generalised Cochrane heterogeneity statistic is

\mathrm{E}\left({Q}_{a}\right)=\text{tr}\left(\mathbf{B}\mathit{\Delta}\right)+\text{tr}\left(\mathbf{B}\right){\tau}^{2}

(8)

where tr (·) denotes the trace of a matrix. Alternatively, E(*Q*
_{
a
}) can be evaluated numerically as \sum _{i=1}^{n-p}{\lambda}_{i}\left(\mathbf{S}\right).

Hence (8) provides a moments based estimator of *τ*
^{2} upon replacing E(*Q*
_{
a
}) with *Q*
_{
a
}, solving the resulting equation and truncating to zero if the estimate is negative. Equation (8) reduces to the usual formula [1] for the expectation of *Q* when *a*
_{
i
}=*w*
_{
i
} for all *i*, or equivalently that **A**=*W*, so that the conventional moments based estimator is obtained as this special case. Using (8) to produce a point estimate of *τ*
^{2} using the method of moments in this way, we obtain an ‘untruncated’ (not set to zero if negative) estimate

{\widehat{\tau}}_{u}^{2}=\frac{{Q}_{a}-\text{tr}\phantom{\rule{0.1em}{0ex}}\left(\mathbf{B}\mathit{\Delta}\right)}{\text{tr}\phantom{\rule{0.1em}{0ex}}\left(\mathbf{B}\right)}.

(9)

The estimator {\widehat{\tau}}_{u}^{2} is unbiased. Upon truncating {\widehat{\tau}}_{u}^{2} to zero if it is negative, which introduces positive bias, we obtain a generalisation of the estimator using DerSimonian and Kacker’s [13] general method of moments for meta-analysis. This estimator is itself a generalisation of the method proposed by DerSimonian and Laird [17]. Since the entries of **B** are fixed constants, we have

\text{Var}\left({\widehat{\tau}}_{u}^{2}\right)=\frac{\text{Var}\phantom{\rule{0.1em}{0ex}}\left({Q}_{a}\right)}{{\left(\text{tr}\phantom{\rule{0.1em}{0ex}}\right(\mathbf{B}\left)\right)}^{2}}.

(10)

Using standard properties of quadratic forms involving normal distributions, (e.g., Searle [18], page 57, Corollary 1.2), we have that

\begin{array}{c}\text{Var}\phantom{\rule{0.1em}{0ex}}\left({Q}_{a}\right)=\phantom{\rule{0.3em}{0ex}}2\text{tr}\phantom{\rule{0.1em}{0ex}}\left(\mathbf{B}\mathit{\Sigma}\mathbf{B}\mathit{\Sigma}\right)=2\text{tr}\left(\phantom{\rule{0.1em}{0ex}}\mathbf{B}\mathit{\Delta}\mathbf{B}\mathit{\Delta}\right)+4{\tau}^{2}\text{tr}\phantom{\rule{0.1em}{0ex}}\left(\mathbf{B}\mathit{\Delta}\mathbf{B}\right)\\ \phantom{\rule{4.5em}{0ex}}+2{\tau}^{4}\text{tr}\phantom{\rule{0.1em}{0ex}}\left({\mathbf{B}}^{2}\right)\hfill \end{array}

(11)

This formula reduces to equation (7) from Biggerstaff and Tweedie [19] for meta-analyses and the standard weights if *a*
_{
i
}=*w*
_{
i
}. Alternatively, Var(*Q*
_{
a
}) can be obtained as 2\sum _{i=1}^{n-p}{\lambda}_{i}^{2}\left(\mathbf{S}\right). Together (10) and (11), after taking the square root, provide the standard error and hence a measure of the uncertainty in the ‘untruncated’ estimate {\widehat{\tau}}_{u}^{2}. However, the distribution of *Q*
_{
a
}, and hence the distribution of {\widehat{\tau}}_{u}^{2}, is usually very skewed in typical meta-analytic datasets with few studies. Hence using a normal approximation for these statistics is generally inadequate. For this reason we do not propose that confidence intervals are obtained using {\widehat{\tau}}_{u}^{2}, its standard error and a normal approximation, but the standard error of {\widehat{\tau}}_{u}^{2} could still be used to give an indication of the uncertainty in this point estimate by those who do not have access to Farebrothers’ algorithm and/or iterative methods. When {\widehat{\tau}}_{u}^{2}>0, a normal approximation on the log scale might be expected to perform better because a log transformation would counteract the skewness of *Q*
_{
a
} but the resulting confidence interval could then not include zero.

#### Using weights equal to the reciprocal of the estimated total variances

Methods for estimating *τ*
^{2} have been suggested above and it is tempting to consider using weights {a}_{i}=1/({\sigma}_{i}^{2}+{\widehat{\tau}}^{2}), where {\widehat{\tau}}^{2} is some estimate of *τ*
^{2}. Jackson [8] found that this idea performed well in random effects meta-analyses using the DerSimonian and Laird [17] estimator. Weights of this form reflect the variation in the sample of study estimates, and can be expected to provide precise estimation (and so shorter confidence intervals). These weights mean that the conventional weights for making inferences about the overall effect are also used for making inferences about the residual between-study variance. However this choice of weights invalidates the theory because the weights are now random variables and are no longer fixed constants. Jackson [8] found that this problem was of little practical consequence in the context of meta-analysis and this idea warrants further investigation in both the meta-analysis and meta-regression scenarios.

### The Q profile method for meta-regression

The *Q* profile method for meta-analysis has recently been proposed [9, 20, 21]. Viechtbauer [11] and Panityakul *et al*. [12] subsequently explained how to extend this method to meta-regression models in the way we also describe below. This method is based on the *Q* profile statistic given in equation (12) below and is not based on the profile likelihood. The *Q* profile method has been implemented in the *R* package *metafor* for both meta-analyses and meta-regressions. In this paper we show that the method proposed by Viechtbauer [11] and Panityakul *et al*. [12] for meta-regression is guaranteed to provide confidence intervals, that is, confidence sets that are not intervals are never produced. Viechtbauer [11] implicitly assumes that this is the case. The previous proof [12] that the function *Q*(*τ*
^{2}) below is decreasing in *τ*
^{2} is sophisticated, so the main contribution of this paper concerning this particular method is to prove this result in a simpler and more direct way. We do this in a similar way to Knapp *et al.*[21], who prove this result in the simpler setting of the random effects model for meta-analysis (no covariates).

In order to describe the *Q* profile method for meta-regression, we define {\u0177}_{i}\left(\widehat{\beta}\right({\tau}^{2}\left)\right) as the fitted value of *Y*
_{
i
} from the true meta-regression model. This means that {\u0177}_{i}\left(\widehat{\mathit{\beta}}\left({\tau}^{2}\right)\right)={\mathit{x}}_{i}\widehat{\mathit{\beta}}\left({\tau}^{2}\right), where \widehat{\mathit{\beta}}\left({\tau}^{2}\right)={\left({\mathit{X}}^{t}{\mathit{\Sigma}}^{-1}\mathit{X}\right)}^{-1}{\mathit{X}}^{t}{\mathit{\Sigma}}^{-1}\mathit{Y}.

The *Q* profile method is then based on

Q\phantom{\rule{0.1em}{0ex}}\left({\tau}^{2}\right)=\sum _{i=1}^{n}\frac{{\left\{{Y}_{i}-{\u0177}_{i}\left(\widehat{\mathit{\beta}}\left({\tau}^{2}\right)\right)\right\}}^{2}}{{\sigma}_{i}^{2}+{\tau}^{2}}\sim {\chi}_{n-p}^{2}

(12)

where we use (12) as a pivot to test the null hypothesis that the true value of the residual between-study variance is equal to *τ*
^{2}. If {\chi}_{(n-p,\alpha /2)}^{2}\le Q\left({\tau}^{2}\right)\le {\chi}_{(n-p,1-\alpha /2)}^{2}, where {\chi}_{(n-p,\alpha /2)}^{2} and {\chi}_{(n-p,1-\alpha /2)}^{2} denote the *α*/2 and 1-*α*/2 quantiles of the {\chi}_{n-p}^{2} distribution respectively, then we accept (or fail to reject) the null hypothesis that the true residual between-study variance is equal to *τ*
^{2} using a significance level of *α*. Otherwise, we reject this hypothesis.

The confidence set for the residual between-study variance, with (1-*α*) coverage probability, is again obtained by test inversion. The confidence set for *τ*
^{2} contains all the values of *τ*
^{2} that are accepted by the hypothesis test resulting from (12).

#### Ensuring that confidence intervals are obtained

As previously noted [12], the condition that the derivative *d* *Q*(*τ*
^{2})/*d* *τ*
^{2} is negative is sufficient to show that the *Q* profile method for meta-regression is guaranteed to provide confidence intervals. In Appendix 2, we show that

\frac{\mathit{\text{dQ}}\left({\tau}^{2}\right)}{d{\tau}^{2}}=-\sum _{i=1}^{n}\frac{{\left\{{Y}_{i}-{\u0177}_{i}\left(\widehat{\mathit{\beta}}\left({\tau}^{2}\right)\right)\right\}}^{2}}{{\left({\sigma}_{i}^{2}+{\tau}^{2}\right)}^{2}}<0

(13)

which means that *Q*(*τ*
^{2}) is decreasing in *τ*
^{2}. This is a generalisation of the result provided in Appendix 2 of Hartung and Knapp [20], that Knapp *et al.*[21] refer to when establishing that the *Q* profile method for meta-analysis is guaranteed to provide confidence intervals. The form of the derivative in (13) is considerably more straightforward than the expression previously given [12].

#### Obtaining confidence intervals numerically

From (13) we have that *Q*(*τ*
^{2}) is decreasing in *τ*
^{2}. If Q\left(0\right)<{\chi}_{(n-p,\alpha /2)}^{2} then no *τ*
^{2} satisfies {\chi}_{(n-p,\alpha /2)}^{2}\le Q\left({\tau}^{2}\right)\le {\chi}_{(n-p,1-\alpha /2)}^{2}, and an empty confidence set is obtained. This occurs when the data appear to be highly homogenous, and we can either follow Knapp *et al.*[21] and Jackson [8] by interpreting these empty sets as providing confidence intervals of [0,0], or Viechtbauer [9] and report the empty confidence set. As for the confidence intervals obtained using generalised Cochran heterogeneity statistics described above, providing the interval [0,0] increases the coverage probability by *α*/2 when *τ*
^{2}=0 but this avoids the difficulty in interpreting empty confidence sets.

If instead Q\left(0\right)\ge {\chi}_{(n-p,\alpha /2)}^{2} then there are no further difficulties. Furthermore if Q\left(0\right)\le {\chi}_{(n-p,1-\alpha /2)}^{2} then Q\left({\tau}^{2}\right)\le {\chi}_{(n-p,1-\alpha /2)}^{2} for all *τ*
^{2}, so that the confidence interval is comprised of the *τ*
^{2} that satisfy Q\left({\tau}^{2}\right)\ge {\chi}_{(n-p,\alpha /2)}^{2}. The lower bound of the confidence interval is then zero and the upper bound is given by the value of *τ*
^{2} that satisfies Q\left({\tau}^{2}\right)={\chi}_{(n-p,\alpha /2)}^{2}.

Finally, if Q\left(0\right)>{\chi}_{(n-p,1-\alpha /2)}^{2}, then the lower and upper bounds of the confidence interval are given by the values of *τ*
^{2} satisfying Q\left({\tau}^{2}\right)={\chi}_{(n-p,1-\alpha /2)}^{2} and Q\left({\tau}^{2}\right)={\chi}_{(n-p,\alpha /2)}^{2} respectively. Since *Q*(*τ*
^{2}) is continuous and decreasing in *τ*
^{2}, simple numerical search methods for finding the values of *τ*
^{2} that satisfy the equations that give rise to the confidence intervals will always converge to the correct answers.

#### A Newton Raphson procedure for implementing the *Q*profile method in practice

In order to implement the *Q* profile method, we need to solve equations of the form *Q*(*τ*
^{2})=*c*, where *c* are critical values from {\chi}_{n-p}^{2}. Since in (13) we have the derivative of *Q*(*τ*
^{2}) in easily computed form we can use the Newton Raphson procedure for this purpose, where we use the iterative method

{\tau}_{k+1}^{2}={\tau}_{k}^{2}+\frac{\sum _{i=1}^{n}\frac{{\left\{{Y}_{i}-{\u0177}_{i}\left(\widehat{\mathit{\beta}}\left({\tau}_{k}^{2}\right)\right)\right\}}^{2}}{{\sigma}_{i}^{2}+{\tau}_{k}^{2}}-c}{\sum _{i=1}^{n}\frac{{\left\{{Y}_{i}-{\u0177}_{i}\left(\widehat{\mathit{\beta}}\left({\tau}_{k}^{2}\right)\right)\right\}}^{2}}{{\left({\sigma}_{i}^{2}+{\tau}_{k}^{2}\right)}^{2}}}

with a suitable starting value {\tau}_{0}^{2}. Once {\tau}_{k+1}^{2}\approx {\tau}_{k}^{2} convergence is reached and the solution to *Q*(*τ*
^{2})=*c* is obtained. This algorithm has been found to perform well provided that estimates of *τ*
^{2} are used as starting values and any negative values of {\tau}_{k}^{2} are replaced by zero when performing the iteration.

#### A corresponding point estimate of the residual between-study variance

Paule-Mandel [13, 22] suggested solving Q\left({\widehat{\tau}}_{\mathit{\text{PM}}}^{2}\right)=n-1 in the context of meta-analysis. This idea can be generalised to the meta-regression scenario considered here [12] by solving Q\left({\widehat{\tau}}_{\mathit{\text{PM}}}^{2}\right)=n-p. This equation can be solved using the Newton Raphson procedure suggested above, with *c*=*n*-*p*, and if there is no solution to this equation then {\widehat{\tau}}_{\mathit{\text{PM}}}^{2}=0[13, 22]. The empirical Bayes estimator [23] is equivalent to this Paule-Mandel estimator [24]. By adopting the convention that empty sets are interpreted as [0,0], this estimator will always lie in the corresponding *Q* profile confidence interval. In general point estimates of *τ*
^{2} need not lie inside the various confidence intervals that are available, so this is an attractive property. Hence the Paule-Mandel estimator provides a natural point estimate to accompany the confidence interval from the *Q* profile method and we recommend using them together. The Paule-Mandel estimator is also unique, as previously noted [12].

### Bayesian analyses with informative prior distributions for the residual between study variance

In addition to specifying the form of the likelihood, Bayesian analyses also require the specification of prior distributions. ‘Vague’ or ‘uninformative’ prior distributions, that are typically taken to be uniform distributions on a particular scale, are often used for this purpose. However choosing ‘vague’ prior distributions for variance components is problematic as Lambert *et al.*[25] show for the between-study variance in random effects meta-analysis with small numbers of studies. We anticipate that that the issues surrounding the use of ‘vague’ priors for variance components, as exemplified by the simulation study by Lambert *et al.*, will also apply in the meta-regression scenario.

Informative prior distributions for parameters in random effects meta-analyses are now available for some settings [10, 26–28]. In practice we suggest also using prior distributions for the between-study variance that have been derived for meta-analysis as prior distributions for the residual between-study variance in meta-regressions. Deriving prior distributions for *τ*
^{2} for specific meta-regression models is, at best, very difficult because there are very many combinations of study covariates that could be used in regression models. Typically meta-regression models involve only a few covariates and, although we might anticipate the residual between-study variance to be less than the between-study variance in a corresponding random effects meta-analysis (because we might expect the covariates to explain some heterogeneity), unless the covariate effects are strong we can reasonably expect the difference between these two types of between-study variance to be small. Furthermore, if the residual between-study variance is indeed the smaller of these two types of between-study variances, then using priors elicited in the context of meta-analysis for meta-regression models can be thought of as being conservative. This is because the values of *τ*
^{2} supported by the prior, and hence the posterior, will be too large. This generally results in wider credible intervals and larger posterior standard deviations for the regression parameters, and favours the side of concluding that there is further unexplained variation that could be explained using additional covariates.

We follow Turner *et al.*[10] in using vague priors for the regression parameters (the entries of *β*) and informative priors for *τ*
^{2}. An informative prior for *τ*
^{2}, where available, is to be taken from the context of the meta-regression. Hence we only use out-of-sample information about the residual between-study variance, because it is this particular parameter that is hard to identify in typically small meta-analytic samples. We provide WinBUGS [29] code, for both examples, in the Additional files 1, 2 and 3 to show how informative priors for *τ*
^{2} can be used in practice. The use of MCMC to fit meta-regression models means that convergence of the chains must be carefully checked in practice. Alternative (non MCMC) methods for performing Bayesian estimation of meta-regression models deserve investigation and could form the subject of further work. Meta-analysts should be aware that Bayesian analyses of this kind make more assumptions than frequentist meta-analyses because they involve priors. However, by taking the limits of the credible interval as the 2.5% and 97.5% quantiles of the posterior distribution, Bayesian interval estimation is easily performed.