 Research article
 Open Access
 Published:
Estimation in metaanalyses of response ratios
BMC Medical Research Methodology volume 20, Article number: 263 (2020)
Abstract
Background
For outcomes that studies report as the means in the treatment and control groups, some medical applications and nearly half of metaanalyses in ecology express the effect as the ratio of means (RoM), also called the response ratio (RR), analyzed in the logarithmic scale as the logresponseratio, LRR.
Methods
In randomeffects metaanalysis of LRR, with normal and lognormal data, we studied the performance of estimators of the betweenstudy variance, τ^{2}, (measured by bias and coverage) in assessing heterogeneity of studylevel effects, and also the performance of related estimators of the overall effect in the log scale, λ. We obtained additional empirical evidence from two examples.
Results
The results of our extensive simulations showed several challenges in using LRR as an effect measure. Point estimators of τ^{2} had considerable bias or were unreliable, and interval estimators of τ^{2} seldom had the intended 95% coverage for small to moderatesized samples (n<40). Results for estimating λ differed between lognormal and normal data.
Conclusions
For lognormal data, we can recommend only SSW, a weighted average in which a study’s weight is proportional to its effective sample size, (when n≥40) and its companion interval (when n≥10). Normal data posed greater challenges. When the means were far enough from 0 (more than one standard deviation, 4 in our simulations), SSW was practically unbiased, and its companion interval was the only option.
Background
Users of metaanalysis assemble estimated effects from several studies in order to assess their heterogeneity and obtain an overall estimate. Here we focus on the measure of effect known as the response ratio (RR, also known in medical applications as the ratio of means, RoM), analyzed in the logarithmic scale as the logresponseratio, LRR. In ecology almost half of all metaanalyses use this outcome measure [1, 2]. When some studies report means but include no information on the corresponding variances, the alternative of analyzing the standardized mean difference is not available.
The RR was introduced by Hedges et al. [3] and rediscovered as RoM by Friedrich et al. [4]; they assumed normality of the underlying data. To avoid confusing RR with relative risk, we use RoM on the original scale and LRR on the logscale. Because the LRR is not defined for negative values of the study means, Lajeunesse [5] modeled the data by lognormal distributions. We explore metaanalysis of LRR under both normal and lognormal distributions, combined with the fixedeffect model and the randomeffects model.
The fixedeffect (FE) model regards the studies as sharing a single true effect, whereas the randomeffects (RE) model regards those true effects as a sample from a distribution. The variance of that distribution, usually denoted by τ^{2}, is often used in estimating the overall effect (the mean of the distribution of random effects). In studying estimation for metaanalysis of LRR, we focus first on τ^{2} and then proceed to the overall effect, which we denote by λ.
Veroniki et al. [6] give a broad overview of methods of estimating τ^{2} and its uncertainty. However, they study only “methods that can be applied for any type of outcome data.” As we show elsewhere [7], the performance of the methods varies widely among effect measures. Also, empirical information on the comparative performance of the methods remains limited. We address both of these limitations for the effect measure LRR.
Veroniki et al. [6, (Appendix Table 1)] cite no previous simulation studies on the comparative performance of estimators of τ^{2} for LRR.
Several studies have considered the quality of estimation of λ. Friedrich et al. [4] report extensive simulations for LRR under normality, but they use only the DerSimonianLaird (DL) method to estimate τ^{2} and do not report on its quality. Lajeunesse [5] discusses bias correction for studylevel LRR and its variance, and provides some simulation results for lognormal distributions, but there is no evidence on applicability of this correction under the RE model. Doncaster and Spake [8] provide some limited simulation results under normality for accuracy of estimation of the overall LRR and its variance, using the DL and restricted maximumlikelihood (REML) methods to estimate τ^{2}. To assess bias of the estimators of λ, they use mean absolute error, which is not a measure of bias; it is the linear counterpart of mean squared error.
To address this gap in information on methods of estimating the heterogeneity variance for LRR, we use simulation to study four generalpurpose methods recommended by Veroniki et al. [6]. These are the wellestablished methods of DerSimonian and Laird [9], restricted maximum likelihood, and Mandel and Paule [10] (MP) and the lessfamiliar method of Jackson [11]. We also study coverage of confidence intervals for τ^{2} achieved by four methods: the Qprofile method of Viechtbauer [12], the methods of Biggerstaff and Jackson [13] and Jackson [11], and the profilelikelihoodbased interval [14].
For each estimator of τ^{2}, we also study bias of the corresponding inversevarianceweighted estimator of the overall effect. However, it is well known that these inversevarianceweighted estimators have unacceptable bias for some other effect measures, as Bakbergenuly et al. [7] and Hamman et al. [15] show for the standardized mean difference and Bakbergenuly et al. [16] show for the logoddsratio. As a way of reducing this bias, we added an estimator (SSW) whose weights depend only on the sample sizes of the Treatment and Control arms. We study the coverage of the confidence intervals associated with the inversevarianceweighted estimators, and also the HKSJ interval [17, 18], a modification of the HKSJ interval that uses the MP estimator of τ^{2} instead of the DL estimator, and an interval centered at SSW that uses the MP estimator of τ^{2} in estimating its variance and bases its width on a t distribution.
Under “Methods” we briefly review studylevel estimation of logresponseratio and the standard randomeffects model for metaanalysis, discuss models for metaanalysis of logresponseratio, very briefly discuss point and interval estimation of betweenstudy variance and overall effect, introduce the SSW estimator, and describe our simulations. Then we summarize results of the simulations, discuss the two examples, and provide discussion and conclusions.
Additional files list the methods and provide additional plots and our R programs.
Methods
Studylevel estimation of logresponseratio
We assume that each of the K studies in the metaanalysis consists of two arms, Treatment and Control, with sample sizes n_{iT} and n_{iC}. The total sample size in Study i is n_{i}=n_{iT}+n_{iC}. The subjectlevel data in each arm are assumed to be normally or lognormally distributed with means μ_{iT} and μ_{iC} and variances \(\sigma _{iT}^{2}\) and \(\sigma _{iC}^{2}\). The sample means are \(\bar {X}_{ij}\), and the sample variances are \(s^{2}_{ij}\), for i=1,…,K and j=C or T.
The response ratio is usually metaanalyzed on a log scale, where the effect measure is λ_{i}= log(μ_{iT}/μ_{iC}), estimated by \(\hat \lambda _{i}=\log \left (\bar {X}_{iT}/\bar {X}_{iC}\right)\), and the population and sample means are assumed to be positive. As Bakbergenuly et al. [19] discuss, however, the log transformation introduces bias: the expected value of \(\exp \left (\hat {\lambda }_{i}\right)\) is not equal to exp(λ_{i}). The withinstudy variance estimate for \(\hat {\lambda }_{i}\), obtained by the delta method, is [3]
where \(\hat {V}_{ij}\) is the sample coefficient of variation (CV). The positively skewed distribution of \(\hat {\lambda }_{i}\) is not well approximated by a normal distribution when at least one of the sample means is close to zero. This behavior results in considerable bias of \(\hat {\lambda }_{i}\) and affects the quality of its variance estimate [3]. The standard advice is to use LRR for Study i only when at least one of \(\sqrt {n_{ij}} \bar {X}_{ij} / s_{ij} \geq 3\). This advice is often ignored in practice [5].
To eliminate this bias in small samples, Lajeunesse [5] proposed two biascorrected modifications, and he recommended
and estimated its variance by
However, this biascorrected estimator and its variance would be even less stable for small values of \(\bar {X}_{ij}\).
Standard randomeffects model
The standard randomeffects model accounts for within and betweenstudy variabilities by assuming approximately normal distributions of within and betweenstudy effects. We use this model to facilitate comparison with previous work, which has relied on it, and because applications have used methods based on it. This traditional model, however, has shortcomings. For most measures of effect, withinstudy variability is only approximately normal, generally in large samples. In practice, betweenstudy variability is unlikely to be normal, but few metaanalyses involve enough studies to reliably evaluate that assumption or to point toward suitable alternatives. However, some simulation evidence [20] indicates that various methods based on the randomeffects model are largely robust to departures from normality.
For a generic measure of effect, denoted by λ,
the resulting marginal distribution is \(\hat {\lambda }_{i} \sim N\left (\lambda, v_{i}^{2}+\tau ^{2}\right)\). \(\hat {\lambda }_{i}\) is the estimate of the effect in study i, and its withinstudy variance is \(v_{i}^{2}\), estimated by \(\hat {v}_{i}^{2}, i=1,\ldots, K\). The betweenstudy variance, τ^{2}, is estimated by \(\hat {\tau }^{2}\). The overall effect, λ, is customarily estimated by the weighted mean
with inversevariance (IV) weights \(\hat {w}_{i}\left (\hat {\tau }^{2}\right)=\left (\hat {v}_{i}^{2}+\hat {\tau }^{2}\right)^{1}\). In the FE estimate \(\hat {\lambda }=\hat \lambda _{FE}\) the weights are \(\hat {w}_{i} = \hat {w}_{i}(0)\). A reviewer raised the concern that, by using IV weights, randomeffects estimators of the overall effect redistribute weight from larger studies (which contain more information) to smaller studies. Whenever \(\hat {\tau }^{2} > 0\), such redistribution is an inherent feature of IV weights. Under “Point estimators” we discuss an estimator that avoids this shortcoming.
If the \(\text {Var}(\hat {\lambda }_{i})\) are the (known) true variances and \(w_{i} = 1 / \text {Var}(\hat {\lambda }_{i})\), the variance of the weighted mean of the \(\hat {\lambda }_{i}\) is \(1/ \sum w_{i}\). Thus, many authors estimate the variance of \(\hat {\lambda }_{\mathit {RE}}\) by \(\left [\sum _{i=1}^{K}\hat {w}_{i}\left (\hat {\tau }^{2}\right)\right ]^{1}\). In practice, however, this estimator may not be satisfactory [21–23].
Several methods of estimating τ^{2} use Cochran’s Q statistic [24]
or a modification of it.
Models for metaanalysis of logresponseratio
Since \(\hat {\lambda }\) is defined only in the positive quadrant \(\left (\bar {X}_{T}>0,\;\bar {X}_{C}>0\right)\), the standard FE and RE models of metaanalysis are not quite appropriate for normally distributed data. (We exclude the extraneous and unlikely possibility \(\bar {X}_{T} < 0,\;\bar {X}_{C} < 0\).) Let \(\bar {X}_{j}\sim N\left (\mu _{j},\sigma _{j}^{2}/n_{j}\right)\). Then, under the fixedeffect model, the probability that \(\hat {\lambda }\) is not defined is
where Φ(·) is the cumulative distribution function of the standard normal distribution. This probability is a decreasing function of n_{j} and an increasing function of V_{j}=σ_{j}/μ_{j}, and it is close to 0 for large sample sizes. However, very small sample sizes, starting from n_{T}=n_{C}=2, are abundant in ecology, where the LRR is a measure of choice. Keeping, say, \(\sqrt {n_{C}}/V_{C}=3\), the probability that \(\hat {\lambda }\) is not defined is 0.0027 for \(\sqrt {n_{T}}/V_{T}=3\), 0.024 for \(\sqrt {n_{T}}/V_{T}=2\), 0.160 for \(\sqrt {n_{T}}/V_{T}=1\), and 0.309 for \(\sqrt {n_{T}}/V_{T}=0.5\). Therefore, only large values of both \(\sqrt {n_{C}}/V_{C}\) and \(\sqrt {n_{T}}/V_{T}\) would avoid problems. When the population value \(\sqrt {n_{T}}/V_{T}<3\), say, studies for which \(\hat {\lambda }\) is not defined would occur with comparatively high probability and would need to be dropped from the metaanalysis. These omissions imply that the mean in each arm comes from a normal distribution truncated from below at \(\sqrt {n_{j}}/V_{j}\), even under the FE model. This truncation may introduce bias in \(\hat {\lambda }\).
Restrictions on the range are not unique to LRR; they apply to other effect measures such as the relative risk [25], where the restriction arises because the probabilities of events must be between 0 and 1.
In the RE model, λ_{i}∼N(λ,τ^{2}) and can be negative. If so, μ_{iT} (=μ_{iC} exp(λ_{i})) may be close to 0, and consequently the contribution of \(\hat {V}_{iT}^{2} / n_{iT}\) to \(\hat {v}_{i}^{2}\) in Eq. (1) may be large, further increasing the probability of truncation for \(\bar {X}_{iT}\).
Because \(\hat {\lambda }\) is not defined for negative values of the study means, Lajeunesse [5] modeled the data by lognormal distributions. In principle, lognormal distributions often make sense for nonnegative data. This choice would eliminate the restrictedrange bias of LRR. Of course, the choice of model should be based on the properties of the data and not on perceived ease of statistical modeling.
Even though sample means and variances are unbiased estimators of the population means and variances for lognormal distributions, they are very inefficient, especially in variance estimation [26, Section 14.4.1, p. 220–222]. If the data are assumed to come from lognormal distributions, a much more straightforward approach would logtransform the individual observations, which would reduce the problem to metaanalysis of mean difference. This approach would provide much better inferences. However, when individuallevel data are not available, metaanalyses must work with the sample means and variances.
Methods of estimating betweenstudy variance
In this section we briefly list point and interval estimators of betweenstudy variance τ^{2} used in our study.
Point estimators
The most popular (but rather biased) estimator of τ^{2} is the methodofmoments estimator of DerSimonian and Laird [9] (DL), denoted by \(\hat {\tau }_{\mathit {DL}}^{2}\). It is based on the appoximate first moment of Cochran’s Q, Eq. (6).
Assuming that the \(\hat {\lambda }_{i}\) are distributed as \(N\left (\lambda,\hat {v}_{i}^{2}+\tau ^{2}\right)\), the restrictedmaximumlikelihood (REML) estimator \(\hat {\tau }_{\mathit {REML}}^{2}\) maximizes the restricted (or residual) loglikelihood function l_{R}(λ,τ^{2}). REML is superior to DL because of its balance between unbiasedness and efficiency [27].
DerSimonian and Kacker [28] generalized DL, replacing the weights \(\hat {w}_{i}\) of Cochran’s Q, Eq. (6), by arbitrary fixed positive constants, a_{i}. For situations in which some unknown amount of heterogeneity is anticipated, Jackson [11] proposed the estimator of τ^{2} with \(a_{i} = 1 / \hat {v}_{i}\). We refer to this method as J.
The MandelPaule (MP) estimator [10], \(\hat {\tau }_{\mathit {MP}}^{2}\), is another momentbased estimator of the betweenstudy variance. Its calculation requires iteration. It is known to be superior to DL [6], but no simulations for LRR had been performed previously.
Interval estimators
The 95% profilelikelihood (PL) confidence interval consists of the values of τ^{2} that are not rejected by the likelihoodratio test with τ^{2} as the null hypothesis [14]. This interval is usually used with \(\hat {\tau }_{REML}^{2}\).
Similarly, the Qprofile confidence interval [12] consists of the values that are not rejected by the test based on a Q statistic (6) in which the weights are \(\hat {w}_{i}\left (\tau ^{2}\right)\). The distribution of Q is assumed to follow the chisquare distribution with K−1 degrees of freedom.
For a generic effect measure, Biggerstaff and Jackson [13] derived the exact distribution of the Q statistic with arbitrary but constant weights a_{i}. That distribution yielded a generalized Qprofile confidence interval. We refer to this interval with \(a_{i}=\hat w_{i}\) as the BJ confidence interval.
Jackson [11] proposed another generalized Qprofile confidence interval (J). The approach is the same as for the BJ interval, but with \(a_{i} = 1 /\hat {v}_{i}\).
Methods of estimating overall effect
Because some of the point estimators of the overall effect do not have corresponding interval estimators, we describe point estimators and interval estimators in separate sections.
Point estimators
A randomeffects method that estimates λ by a weighted mean with inversevariance weights, as in Eq. (5), is determined by the particular \(\hat {\tau }^{2}\) that it uses in \(\hat {w}_{i}\left (\hat {\tau }^{2}\right)\). We refer to these estimators by the names of the respective \(\hat {\tau }^{2}\). Because the studylevel effects and their variances are related (see Eq. (1) for LRR), all inversevarianceweighted estimators of overall LRR have considerable biases.
Following negative experience with the bias of inversevarianceweighted estimators of SMD, Bakbergenuly et al. [7] included a point estimator whose weights depend only on the studies’ sample sizes, as proposed by Hedges and Olkin [29] and Hunter and Schmidt [30]. For this estimator (SSW), \(w_{i} = \tilde {n}_{i} = n_{iT}n_{iC}/\left (n_{iT} + n_{iC}\right)\); that is, w_{i} sets \(\hat {V}_{iT} = \hat {V}_{iC} = 1\) in Eq. (1). To reduce bias in estimating the overall LRR, we included SSW.
Interval estimators
The point estimators DL, REML, MP, and J have companion interval estimators of λ. The customary approach estimates the variance of \(\hat {\lambda }_{\mathit {RE}}\) by \(\left [\sum \limits _{i=1}^{K}\hat {w}_{i}\left (\hat {\tau }^{2}\right)\right ]^{1}\) and bases the width of the interval on the normal distribution. These intervals are usually too narrow, and their coverage may also be reduced by bias. Hartung and Knapp [17] and, independently, Sidik and Jonkman [18] developed an improved estimator for the variance of \(\hat {\lambda }_{\mathit {RE}}\). The HartungKnappSidikJonkman (HKSJ) confidence interval uses this estimator and bases the width on critical values from the t distribution on K−1 degrees of freedom. A potential weakness of an HKSJ interval that uses \(\hat {\lambda }_{\mathit {DL}}\) as its midpoint is that it will have any bias that is present in \(\hat {\lambda }_{\mathit {DL}}\). Therefore we also consider an HKSJ confidence interval centered at \(\hat \lambda _{MP}\) (HKSJ MP).
The interval estimator corresponding to SSW (SSW MP) uses the SSW point estimator as its center, and its width equals the estimated standard deviation of SSW under the randomeffects model times twice the critical value from the t distribution on K−1 degrees of freedom. The estimator of the variance of SSW is [7, 5.2]
in which \(\hat {v}_{i}^{2}\) comes from Eq. (1) and \(\hat {\tau }^{2} = \hat {\tau }_{\mathit {MP}}^{2}\).
Simulation study
As mentioned under “Background”, a few studies have used simulation to examine estimators of the overall effect for LRR, but no studies have systematically examined estimators of τ^{2}.
The range of values of LRR may be rather wide. The empirical study by Senior [31] reports values of LRR up to 3.72 (though the secondlargest value is 1.46) and values of K from 3 to 592. The simulations by Friedrich et al. [4] used values of LRR up to 0.445 (RoM=1.56). Lajeunesse [5] used means between 0 and 8 in both arms and small sample sizes, starting from n_{T}+n_{C}=4. Unfortunately, no information is available on the accompanying range of τ^{2} values. In their simulations for SMD, Hamman et al. [15] consider the range from 0 to 2.5 as typical for ecology.
Design of the simulations
As summaries of performance our simulation study estimated bias of point estimators and coverage of interval estimators, for τ^{2} and for λ. Two basic distributions served as the source of the data in the Treatment and Control arms: the normal distribution (the subject of Bakbergenuly et al. [32]) and the lognormal distribution (the subject of a separate report [33]).
For the overall value of LRR, we chose λ=(0,0.2,0.5,1,2) (corresponding to 1≤RoM≤7.39), as realistic for a range of applications. The true values of LRR in the individual studies, λ_{i}, were generated from a normal distribution: λ_{i}∼N(λ,τ^{2}). For a given Control mean μ_{iC}, the Treatment mean was μ_{iT}=μ_{iC} exp(λ_{i}).
To evaluate performance issues related to proximity to zero, we used two values of the mean in the Control arm, μ_{iC}=1 and μ_{iC}=4, in simulations from normal distributions (with \(\sigma ^{2}_{C} = 1\)). When the data are lognormal, proximity to zero does not affect data generation or inferences. Therefore, as the mean of the Control arm we took μ_{iC}=1.
All simulations used the same numbers of studies, small (K=5, 10, 30) and large (K=50, 100, 125), and equal numbers of observations in the Control and Treatment arms.
We studied only metaanalyses in which the study size was the same in all K studies and n_{iC}=n_{iT}=n_{i}/2. The study sizes, n_{i}, started from 4, because some studies in ecology have such small sample sizes; the small sample sizes were n_{i}=4,10,20,40, and the large sample sizes were n_{i}=100,250,640,1000. By using the same set of sample sizes for each combination of the other parameters, we avoided the additional variability in the results that would arise from choosing sample sizes at random (e.g., uniformly between 100 and 250).
For simulations from normal distributions, we generated the withinstudy sample variances \(s_{ij}^{2}\) (j=T, C) from chisquare distributions as \(\sigma _{ij}^{2}\chi _{n_{ij}1}^{2}/(n_{ij}1)\). We generated the estimated means \(\bar {X}_{ij}\) from a normal distribution with mean μ_{ij} and variance \(\sigma _{ij}^{2}/n_{ij}\). We obtained the estimated withinstudy LRR as \(\hat {\lambda }_{i}=\log \left (\bar {X}_{iT}/\bar {X}_{iC}\right)\) and the estimated withinstudy variance from Eq. (1). Studies with at least one negative sample mean were discarded, and the value of K was reduced accordingly, resulting effectively in a simulation from a truncated normal distribution of means in each arm. The median number of studies was 4.859 (quartiles 4.612 and 4.961) for K=5, 9.715 (9.202, 9.925) for K=10, and 29.15 (27.60, 29.77) for K=30. In summary, we varied five parameters: the overall true LRR (λ), the betweenstudies variance (τ^{2}), the mean in the Control arm (μ_{C}), the number of studies (K), and the total sample size (n). We set \(\sigma ^{2}_{C}=\sigma ^{2}_{T}=1\). Table 1 lists the configurations.
For simulations from lognormal distributions, we generated n_{ij} independent observations from the lognormal distribution with mean \(\log \left (\mu _{ij}\right)0.5\log \left (1+\sigma _{ij}^{2}/\mu _{ij}^{2}\right)\) and variance \(\log \left (1+\sigma _{ij}^{2}/\mu _{ij}^{2}\right)\), as in Lajeunesse [5], and obtained their sample means \(\bar X_{ij}\) and sample variances \(s^{2}_{ij}\). Then we calculated the sample LRR \(\hat {\lambda }_{i}=\log \left (\bar {X}_{iT}/\bar {X}_{iC}\right)\) and the estimated variances \( \hat {v}_{i}^{2}\) as in Eq. (1). We also calculated the biascorrected estimate, \(\hat {\lambda }_{i}^{\Delta }\), Eq. (2), and its estimated variance, Eq. (3) [5]. In summary, we varied four parameters: the overall true LRR (λ), the betweenstudies variance (τ^{2}), the number of studies (K), and the total sample size (n). We set \(\sigma ^{2}_{C}=\sigma ^{2}_{T}=1\). Table 1 lists the configurations.
For each combination of parameters, we used a total of 10,000 repetitions. The resulting simulation standard error for estimated coverage of τ^{2} or λ at the 95% confidence level was roughly \(\sqrt {0.95 \times 0.05 / 10,000} = 0.00218\).
We programmed the simulations in R version 3.3.2 using the University of East Anglia 140computernode High Performance Computing (HPC) Cluster, which has a total of 2560 CPU cores, including parallel processing and large memory resources. For each configuration, the 10,000 replications consisted of 10 parallel sets of 1000 replications.
A reviewer inquired about the values of I^{2} underlying our simulations. Figures A3 and A4 in Additional File 3 plot I^{2}=100τ^{2}/(τ^{2}+s^{2}) versus τ^{2}∈[0,1] for LRR when n = 20, 40, 100, and 250, with traces for λ = 0, 0.5, 1, 1.5, and 2. As indicated by the definition, I^{2} increases as τ^{2} increases. The value of n also has a substantial impact (larger n yields higher I^{2}). Importantly, for LRR I^{2} increases as λ increases, especially for the smaller n, contrary to the scaleinvariance criterion of Higgins and Thompson [34]. We emphasize that we discourage use of I^{2}, for reasons that include those mentioned here.
Results
We report our full simulation results under the normal and lognormal distributions in eprints [32, 33]. Those reports include 240 and 160 figures, respectively; each figure presents 12 plots (versus τ^{2}) of bias or coverage of estimators of τ^{2} or estimators of λ, corresponding to four values of n and three values of K. The summary below is illustrated by Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. Those figures, however, do not give a comprehensive picture. In summarizing the results, we sometimes have to describe the joint contribution of two or three parameters.
Bias and coverage in estimation of τ^{2} under normal distribution (Figs. 1 and 2)
Bias The main feature is a negative relation between bias and τ^{2}. When μ_{C}=1 and n=4, all four estimators have definite positive bias at τ^{2}=0 (e.g., 0.13 to 0.22 when λ=0 and K=5). As τ^{2} increases, the bias trends downward, roughly linearly, and it can reach substantial negative values (e.g., −0.47 to −0.64 at τ^{2}=1 when λ=0 and K=5). When n=10, the bias at τ^{2}=0 is less than 0.05, and it goes to 0 as n increases. Also, the slopes of the traces decrease in magnitude, and the traces separate (Fig. 1, row 1). The separation is slight at n=40 but substantial at n=100. Interestingly, MP and REML improve steadily, but DL and J do not (Fig. 1, row 4); when λ=0,K=5, and n=1000, the bias at τ^{2}=1 is −0.33 for DL and −0.20 for J versus −0.06 for REML and −0.03 for MP. Increasing K (even to 50, 100, and 125) generally does not reduce the biases, but the traces separate when n≤20, especially when n=4. As λ increases, the biases at τ^{2}=0 decrease slightly, and the magnitudes of the slopes decrease substantially (Fig. 1, row 2); when λ=2 and n≥40, MP and REML are nearly unbiased. The biases are much smaller when μ_{C}=4 (see Fig. 1, row 3 for comparison); we believe the difference is due to restrictedrange issues. None of the four estimators of τ^{2} has adequately small bias overall, but (excluding n=4, which is generally quite challenging) MP often has the least bias, followed by REML, and J and DL have the most bias.
Bias increases when the biascorrected estimator \(\hat {\lambda }^{\Delta }\) is used for small n. As expected, biascorrection does not have any effect for large n.
Coverage When μ_{C}=1, coverage of τ^{2} depends strongly on K. Counterintuitively, all four methods have coverage close to nominal when K=5, somewhat below nominal when K=10 (unless n>100), and often far below nominal when K≥30 (in problematic patterns when n=4). At τ^{2}=0 coverage is noticeably above nominal when n≥10 (Fig. 2, row 1). When n≤40, coverage generally improves as λ increases (but it deteriorates at small τ^{2} when n=4). When n≥100, QP and PL generally have coverage reasonably close to nominal, even for K=30, and QP is sometimes closer; when K=30, the coverage of BJ and J decreases sharply as τ^{2} increases (except when λ=2) (Fig. 2, row 3). When μ_{C}=4, coverage of the Qprofile intervals is close to nominal for τ^{2}≥0.1 (Fig. 2, row 2). All four methods have belownominal coverage at τ^{2}=0 when n=4 and λ=0, and the deficit becomes worse and affects other values of n as λ increases. When n is large or K is large, the coverage of J and especially BJ declines steeply as τ^{2} increases (Fig. 2, row 4). Use of the biascorrected estimator \(\hat {\lambda }^{\Delta }\) does not affect coverage of τ^{2} for smaller K and makes it considerably worse for larger K.
Bias, mean squared error, and coverage in estimation of λ under normal distribution (Figs. 3, 4 and 5)
Bias and mean squared error All four IVweighted estimators of λ have practically the same bias. When μ_{C}=1 and λ=0, the bias is 0 at τ^{2}=0 and increases as τ^{2} increases (the traces resemble \(\sqrt {\tau ^{2}}\) for τ^{2}≤0.4 and are nearly linear for τ^{2}≥0.4) (Fig. 3, row 1). At τ^{2}=1 it can be large when n is small (e.g., around 0.40 when K=5 and n≤20). When λ≥0.2, the bias is negative at small τ^{2}, with magnitude exceeding 0.12 when λ≥1 and n≤10; it becomes positive as τ^{2} increases. The bias can be large for small sample sizes, and it decreases to zero only for very large sample sizes. The bias depends very little on K. The bias of SSW is positive, though considerably less than the bias of the IVweighted methods, and it decreases much more rapidly as n increases (Fig. 3, row 4). It decreases for increasing values of λ, staying at about 5% for n=10 and λ=2. Interestingly, when λ≥1 and n≥10, the bias of SSW (when positive) is roughly constant across the range of τ^{2} (Fig. 3, row 2). The bias of the IVweighted estimators is still present, though much smaller, when μ_{C}=4, whereas SSW is practically unbiased (Fig. 3, row 3). Hence bias for large μ_{C} is due to the use of the IV weights and/or to transformation bias, as restrictedrange effects are not involved. Use of the biascorrected estimator \(\hat {\lambda }^{\Delta }\) does not affect bias in the IVweighted estimators of λ. SSW, however, often produces aberrantly large values, because of the presence of \(\bar {X}_{iT}^{2}\) and \(\bar {X}_{iC}^{2}\) in the denominators in Eq. (2). An \(\bar {X}_{iC}^{2}\) near zero considerably inflates a study’s \(\hat {\lambda }_{i}\), producing an inflated SSW estimate of λ. Because the corresponding \(\hat {v}_{i}^{2}\) (Eq. (3)) are also inflated, the IVweighted methods downweight these studies, but SSW does not. To summarize, we do not recommend biascorrection under normal distribution.
Astonishingly, the MSE in estimation of λ does not reflect the biases of the respective estimators, but strongly depends on K, even though the bias in estimation of λ does not. The MSE of SSW is somewhat higher than MSE of the IVweighted estimators when K=5, can be higher or lower when K=10 and is usually lower when K=30.
Coverage When μ_{C}=1, coverage of all IVweighted methods is considerably below nominal for n≤40 (with some exceptions near τ^{2}=0 when λ is small) (Fig. 5, row 1). Coverage of the IVweighted methods deteriorates substantially for larger K and larger values of τ^{2}, especially for small values of λ (Fig. 5, rows 2 and 3). The HKSJ intervals provide nominal coverage only for n≥100, and the standard IVweighted methods never achieve the nominal level. SSW MP, with t critical values, is the only choice, but its coverage is somewhat above nominal for very small n. When μ_{C}=4, the HKSJ intervals provide nominal coverage starting from n=10 when K≤30 and τ^{2}>0, and the coverage of all other IVweighted methods is considerably lower than nominal. Coverage of all except SSW MP deteriorates for larger K (Fig. 5, row 4), though coverage of SSW MP is above nominal at τ^{2}=0. Coverage of all methods except DL and J improves for very large sample sizes, but it still can be extremely low at zero. Use of the biascorrected estimator \(\hat {\lambda }^{\Delta }\) does not affect coverage of λ.
Bias and coverage in estimation of τ^{2} under lognormal distribution (Figs. 6 and 7)
Bias When n is very small (Fig. 6, row 1), all four estimators of τ^{2} have substantial positive bias, increasing linearly with τ^{2} (when λ=0 and n=4, the intercept is around 0.4, and the slope is around 0.9). This pattern persists for λ≤1; but when λ=2, the slope is essentially 0 (Fig. 6, row 3). As n increases to 40, the intercept and slope decrease; but the trace for DL begins to diverge from the others, followed by the trace for J, and increasingly as λ increases. K has little effect. MP and REML have similar, reasonably small, bias when n≥40 (Fig. 6, rows 2 and 4). When n≥100, the traces for DL and J bend toward increasingly negative bias as τ^{2} increases; their bias becomes worse as n increases and slightly worse as K increases (for example, when λ=0,n=1000, and K≥50, the bias of DL is −0.28 at τ^{2}=1). The bias correction for \(\hat {\lambda }_{i}\) does not reduce the bias [33] (Appendices A2, A4, C2, and C4).
Coverage When n<40 and K=5, the coverage of all four intervals for τ^{2} is below the nominal 95%, especially when n=4 and τ^{2}<0.4; increasing K to 10 and 30 reduces coverage substantially and makes this pattern worse (Fig. 7, row 1), and increasing λ has little effect. Increasing K to 50 and beyond reduces coverage further, even to 0 when n=4 and τ^{2}=0 (Fig. 7, row 2). When n≥40 and K=5 or 10, BJ and J generally provide nominal or slightly higher coverage, and QP and PL are slightly lower. Situations with K≥30 are often quite challenging; BJ has low coverage for K≥30, and for larger n and K, coverage of J deteriorates similarly to BJ, but QP and PL provide good coverage (Fig. 7, row 3). Coverage of all methods improves for larger λ (Fig. 7, row 4). The bias correction does not improve coverage.
Bias, mean squared error, and coverage in estimation of λ under lognormal distribution (Figs. 8, 9 and 10)
Bias and mean squared error All five estimators of λ have bias that shows little dependence on K. When λ=0 and τ^{2}=0, they all have essentially no bias. When τ^{2}>0, the bias is very roughly linear in τ^{2}, with negative slope but a nonnegative intercept for the IVweighted estimators and a negative intercept for SSW. The intercept for the IVweighted estimators is positive for n≥10, so their bias is positive for smaller τ^{2} and negative for larger τ^{2}; but the traces flatten as n increases, and by n=40 their bias is positive for 0.1≤τ^{2}≤1. The trace for SSW flattens similarly; and when n=40, its bias has smaller magnitude than the IVweighted estimators when 0.1≤τ^{2}≤0.5 and larger magnitude when 0.6≤τ^{2}≤1 (see Fig. 8, rows 1 and 2). When λ>0, the biases of all five estimators at τ^{2}=0 and the intercepts (i.e., biases) at τ^{2}=0.1 increase; for a given λ both the intercepts and the slopes decrease as n increases. As a result, when λ≥0,5, the bias of SSW usually has smaller magnitude than the IVweighted estimators. In relative terms, when n<40, the biases are substantial: as much as 10% of λ in some cases. Here SSW has the least bias, about 10% for λ≥1 and n=10, declining to 5% for λ≥1 and n=20 (Fig. 8, row 3). The bias correction for \(\hat {\lambda }_{i}\) reduces the bias (Fig. 8, row 4) and should be used.
All estimators have approximately the same MSE. This may seem astonishing for row 1, where SSW is, on average, more biased than the IVweighted estimators. The explanation lies in inefficiency of the IV weights based on the sample means and variances, for the lognormal distribution. The fixed weights are more efficient, and this is another argument for using SSW.
Coverage.tintervals centered at SSW provide the best coverage of λ, and that coverage is satisfactory when n≥20 and K≤30. Those intervals may have coverage greater than 97% (primarily when τ^{2}=0 and K=5 or 10 and in a few cases where τ^{2}=0.1,K=5,n=20 or 40, and λ≤0.5) or coverage less than 93% (mainly when τ^{2} is small, K≥50,n=20 or 40, and λ≥0.5). All other methods have inferior coverage and are not recommended (Fig. 10, row 1). Coverage of the intervals centered at SSW is better when the bias correction is used for \(\hat {\lambda }_{i}\); then it is good when n≥10. When n is small, K≥50, and λ=0, coverage of the standard methods improves somewhat, whereas coverage of SSW MP becomes less than 93% when K>50, especially for large τ^{2} (Fig. 10, row 2). When the bias correction is used, coverage of SSW MP is the best, and it is good overall for small λ, but it is much below 95% for n=4,λ≥0.5 and small τ^{2}, where it worsens for larger K. For large K and large n, coverage of SSW MP is still the best, especially at τ^{2}=0, and the bias correction still produces better results (Fig. 10, rows 3 and 4).
Examples
Two examples from the literature illustrate ways in which metaanalyses of LRR can differ. In the first example the standardized sample means appear to be normally distributed, whereas in the second example they are skewed to the right, so the individuallevel data may be more likely to have come from lognormal distributions.
Example 1: mate choice copying
Mate choice copying (MCC, initially observed in various nonhuman animal species) occurs when an individual’s likelihood of accepting or choosing a potential mate is greater if others regard that mate as attractive. In controlled experiments, human subjects are commonly shown images of model individuals and asked to rate the model’s desirability, interest, or physical attractiveness. Designs differ between female and male subjects. GoudaVossos et al. [35] report that “The majority of studies on MCC effects in humans have focused on ‘individualbased copying’ where the dependent variable is the response of the subject toward the target individual presented alongside an oppositesex other, compared with a target presented alone. Recent studies have suggested that MCC may … include assessing the underlying biological and social qualities of potential mates. For instance, a man in the presence of a ‘high quality woman of higher mate value’ who is physically attractive or has a desirable personality informs female copiers that the man must also have highquality features that are not readily observable.”
In one metaanalysis GoudaVossos et al. combined data from 17 studies of female choice in which the experimental manipulation was the addition of a female cue; that is, the female subject viewed images of male targets presented with (vs. without) a female. The data are available from the supplementary material in GoudaVossos et al. [35] and are reproduced in Fig. 11. The armlevel sample sizes range from 30 to 263, and all but four studies have balanced sample sizes. The response ratio was used as the measure of effect size; a positive logresponseratio indicates higher ratings when the male is in the presence of a female. Parker 2009 has substantially lower armlevel means than the other studies; but all values of \(\sqrt {n_{ij}} \bar {X}_{ij} / s_{ij}\) are above 6, and only six are less than 12, so proximity to zero should not cause problems. Figure 11 shows the forest plot from the REMLbased analysis. The authors concluded that “The positive mean revealed that men are rated 6.01% more attractive/desirable when in the presence of a female.” Q–Q plots of the standardized sample means (Additional File 2) showed their distribution to be approximately normal, though with some outliers. Therefore our simulation results for the normal distribution may be more relevant.
Table 2 shows the results from various methods of estimation. The estimators of τ^{2} show clear differences. DL provides the highest value at 0.0158, followed by J at 0.0119, REML at 0.0091, and MP at 0.0080, half the size of DL.
The closest simulation scenarios are for λ=0,τ^{2}=0 and 0.1, and n=40 and n=100. Under these conditions for both normal and lognormal distributions, all methods overestimate τ^{2} at zero, but we do not have results for positive values of τ^{2} below 0.1. Our results show that for small τ^{2}>0, the standard methods overestimate λ=0, and only SSW provides nearly unbiased estimates of λ under both normal and lognormal distributions. These results agree with our findings for this example. The SSW estimate of λ, 0.0528, is lower than all IVweighted estimates (except FE, which is negative).
As the sample sizes are moderate to large, the differences in estimated τ^{2} have considerable effect on the width of the confidence intervals for λ.
Among the IVweighted methods, only REML and MP produced confidence intervals for λ that did not include 0. These reflect the combination of a positively biased estimate of λ and the low estimate of τ^{2}. The original metaanalysis used REML. Interestingly, the HKSJ confidence intervals based on MP and DL also do not cross 0. For HKSJ (DL), this happens because it is actually shorter than the DL interval for these data. The SSW MP CI is wider than the HKSJ confidence intervals, and it includes 0.
As the sample sizes are moderate to large, the bias correction is not needed. It is not recommended for normally distributed data in any case.
Because, in our simulations, the IVweighted methods tend to give positively biased results, we consider the SSW point estimate to be close to the true value of λ, and the HKSJ MP interval to provide correct coverage. Therefore, we do not consider the effect of MCC in women to be significant.
Example 2: lowdose dopamine
Controversy surrounded the use of lowdose dopamine (a catecholamine with dosedependent effects) to promote kidney function in several categories of patients (e.g., having various types of surgery, receiving intravenous contrast dye). In a systematic review and metaanalysis of randomized controlled trials that compared lowdose dopamine (≤5μg/kg of body weight per minute) with placebo or no therapy, Friedrich et al. [36] examined clinical outcomes (mortality, need for renal replacement therapy, and adverse events) and renal outcomes (urine output, creatinine level, and creatinine clearance—each on Days 1, 2, and 3 after starting therapy). Because of differences among trials in the units for reporting urine output, Friedrich et al. could not use mean difference as the measure of effect. As a more clinically meaningful summary of treatment than the SMD, they chose the relative change in the dopamine group compared with the control group, analyzed as the logtransformed RoM.
J. Friedrich kindly provided the data on the effect of lowdose dopamine therapy on urine output at Day 1 after its initiation, which we show in Fig. 12. We reanalyze the data from the full sample of 34 studies, with armlevel sample sizes ranging from 6 to 160 and, separately, the data from the 10 studies in the “Other surgery” category. All but two studies had balanced sample sizes. A positive logresponseratio indicates higher urine output with dopamine. Because urine output is nonnegative, it is reasonable to assume that all μ_{ij}>0. All \(\bar {X}_{ij}\) are large enough that proximity to 0 should not be a problem. Figure 12 shows the forest plot from the DLbased analysis. The authors concluded that, on average, dopamine increased Day 1 urine output by 24%. Q–Q plots of the standardized sample means (Additional File 2) showed their distribution to be skewed to the right, so our simulation results for the lognormal distribution may be more relevant.
Table 3 shows the results from various methods of estimation. There are clear differences between the estimators of τ^{2}. DL provides the lowest value at 0.038, followed by REML at 0.041, J at 0.057, and MP at 0.067, almost twice as large as DL. Bias correction results in lower estimates of τ^{2}, from 0.032 for DL to 0.046 for J. The RE estimates of λ vary from 0.210 for SSW to 0.220 for MP. Bias correction further reduces the differences among RE estimates of λ (0.209 to 0.214). In both analyses, all methods find a significant effect of dopamine, with SSW MP providing the widest confidence intervals.
The closest simulation scenarios are for λ=0.2,K=30,τ^{2} between 0 and 0.1, and n=10 to n=100. Under these conditions for both normal and lognormal distributions, all methods overestimate τ^{2} when n=10, but are almost unbiased for n=40. We expect that all methods would provide coverage of τ^{2} that is too low, with QP being the best. We do not expect the bias correction to improve bias or coverage. Our results show that λ=0.2 is overestimated by the standard methods for small τ^{2}>0, and only SSW provides nearly unbiased estimates of λ for both normal and lognormal underlying distributions. This agrees with our findings for this example. The SSW estimate of λ, 0.210 (or 0.209 after bias correction), is lower than all IVweighted estimates.
For the 10 studies in the “Other surgery” subset, estimates of τ^{2} are higher, at 0.093 for DL to 0.226 for MP. From our simulations, they may be more positively biased, and in these data bias correction reduces estimates of τ^{2} to 0.088 to 0.166, respectively, probably reducing biases. We expect SSW to provide a nearly unbiased estimate of λ, at 0.270, with other methods positively biased (the highest at 0.314 for MP). Bias correction does not much affect the SSW estimate, but it does somewhat reduce the positive bias of the other methods (0.292 for MP). Coverage of λ should be nearly nominal for SSW MP (−0.347, 0.888) and lower than nominal for HKSJ MP, with the IVweighted methods providing considerably lower coverage. In summary, the IVweighted methods seem to indicate a significant effect of dopamine, whereas SSW MP does not.
Discussion
Our simulations raise concern about the current state of metaanalysis of LRR. For that effect measure and some others, the relation between the studies’ estimated effects and their estimated variances has several undesirable results: the performance of all inversevarianceweighted methods depends on the effect sizes, estimates of overall effects are biased, and their confidence intervals have belownominal coverage, especially for small sample sizes. Our simulations show this clearly.
It is well known that the distribution of a ratio of normal random variables X/Y is, in general, heavytailed and has no moments. It can be approximated by a normal distribution only when both X and Y are independent and have positive means, and their coefficients of variation fulfill some conditions [37]. The log transformation is used to resolve this difficulty and to make the distribution of LRR closer to normality. Similarly, for a lognormal underlying distribution, the log transformation will make the LRR ‘almost normal’. However, as we have seen, the LRR remains a very challenging effect measure.
For normal underlying distributions, we found considerable biases in all methods of estimating τ^{2} for very small n, but overall MP and REML are reasonable choices, especially for τ^{2} farther from zero. QP provides reasonable, though not perfect, coverage. For lognormal underlying distributions, τ^{2} cannot be estimated reliably for sample sizes smaller than 100; but once more, MP is a good choice for larger sample sizes.
Most applications of metaanalysis aim primarily to provide point and interval estimates of an overall effect. For general use, a point estimator should be unbiased, and a confidence interval should have (close to) nominal coverage.
After estimating the betweenstudy variance τ^{2}, customary practice uses an inversevarianceweighted mean to estimate the overall effect. The origin of the IVweighted approach lies in the fact that, for known variances, and given unbiased estimates of the studylevel effects, it provides a uniformly minimumvariance unbiased estimate (UMVUE). However, in practice, the withinstudy variances are unknown, and using estimates for them leads to bias in the IVweighted estimate of the overall effect and belownominal coverage of the confidence interval. Thus, the IVweighted approach is misguided; for most measures of effect, it cannot avoid these shortcomings.
To improve the coverage of confidence intervals systematically, we would like to have a better understanding of the underlying sampling distributions. For the IVweighted estimators of the overall effect, in this study, we have retained the customary use of critical values from the normal distribution. To remedy the familiar problem of belownominal coverage, the HKSJ interval uses the t distribution on K−1 degrees of freedom, and the SSW MP interval adopts that choice. We have, however, seen no empirical evidence relating these choices of reference distribution to the sampling distributions of the point estimators, for LRR or other measures of effect. In future work we intend to investigate sampling distributions of estimators of overall effect and τ^{2}.
A reviewer called attention to a modification of the IVweighted estimator of the overall effect that takes into account the relative quality of the studies [38, 39]. That approach has not been studied much in simulations, perhaps because of the need to assign judgments of the probability that each study is credible.
The gaps in evidence include the possibility that the unitlevel variances in the two arms may differ, which simulations rarely, if ever, reflect. Because of the extent of our simulations, we did not attempt to fill this gap. However, we do not expect the performance of the IVweighted methods to improve under morechallenging scenarios. Rather they are likely to perform worst in situations where the (withinstudy) variances of X_{iT} and X_{iC} differ substantially.
A pragmatic solution to unbiased estimation of λ uses weights that do not involve estimated variances (for example, weights proportional to the studies’ sample sizes n_{i}). Our point estimator SSW uses weights proportional to an effective sample size, \(\tilde {n}_{i} = n_{iC}n_{iT}/n_{i}\). Then the estimate of the overall effect is \(\hat {\lambda }_{\mathit {SSW}}=\sum \tilde {n}_{i}\hat {\lambda }_{i}/\sum \tilde {n}_{i}\), and the estimate of its variance comes from Eq. (7). Finally, the tbased confidence interval for λ is centered at \(\hat {\lambda }_{\mathit {SSW}}\).
For underlying lognormal distributions, SSW, combined with the biascorrected estimator of λ_{i}, works reasonably well for sample sizes as low as 10 in interval estimation of λ, and for n≥40 in point estimation. We recommend this method for further use in applications.
Unfortunately, for underlying normal distributions, the use of samplesizebased weights does not avoid all the problems with bias in estimation of λ for very small sample sizes and λ close to 0, and the bias correction is harmful. The response ratio is a rather challenging effect measure, with two additional sources of possible bias: its restricted range near 0 under the normal model, and transformation bias in using the log scale for specification of random effects. Both issues are inherent in the choice of the respective true model and cannot be easily resolved. Farther from 0, SSW is practically unbiased, and we recommend its use, in combination with SSW MP, which is the only feasible option for confidence intervals.
Conclusions

In extensive simulations the four point estimators of τ^{2} that we included all had considerable negative bias for normal withinstudy data and very small studylevel sample sizes (n), especially when one of the means was close to zero. For lognormal underlying data, τ^{2} cannot be estimated reliably for n<40. MP had the least bias overall, and DL should not be used.

For normal underlying data, the four interval estimators of τ^{2} that we included had coverage that often was below the nominal 95%, in various patterns, especially as the number of studies (K) increased. For lognormal underlying data, many situations were quite challenging. The QP and, to a lesser extent, PL interval estimators had coverage close to 95% when n≥100 or when n was moderate and K was small.

Point estimators of λ that use inversevariance weights had substantial bias, systematically related to τ^{2}, for normal underlying data, but less bias (with weaker relation to τ^{2}) for lognormal data.

An estimator whose weights used only armlevel sample sizes (SSW) typically has less bias and is recommended. Additionally, we recommend bias correction for lognormal data, but it is contraindicated for normal data.

We recommend an interval estimator based on SSW. It mostly had coverage close to 95% for both normal and lognormal data, but for small sample sizes (n≤40) and large numbers of studies (K≥50) its coverage was sometimes low. Interval estimators based on the inversevarianceweighted point estimators were inferior, in various patterns, and we do not recommend them.
Abbreviations
 BJ:

Biggerstaff and Jackson interval
 CV:

Coefficient of variation
 DL:

DerSimonianLaird
 FE:

Fixed effect
 HKSJ:

HartungKnappSidikJonkman interval
 IV:

Inversevariance
 J:

Jackson estimator/interval
 LOR:

Logoddsratio
 LRR:

Logresponseratio
 MP:

MandelPaule estimator
 MSE:

Mean squared error
 PL:

Profile likelihood
 QP:

Qprofile interval
 QQ plot:

Quantilequantile plot
 RE:

Random effects
 REML:

Restricted maximum likelihood
 RoM:

Ratio of means
 RR:

Response ratio
 SMD:

Standardized mean difference
 SSW:

Samplesizeweighted estimator
 UMVUE:

Uniformlyminimumvarianceunbiased estimator
References
 1
Koricheva J, Gurevitch J. Uses and misuses of metaanalysis in plant ecology. J Ecol. 2014; 102(4):828–44.
 2
Nakagawa S, Santos ES. Methodological issues and advances in biological metaanalysis. Evol Ecol. 2012; 26(5):1253–74.
 3
Hedges LV, Gurevitch J, Curtis PS. The metaanalysis of response ratios in experimental ecology. Ecology. 1999; 80(4):1150–6.
 4
Friedrich JO, Adhikari NK, Beyene J. The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in metaanalysis: a simulation study. BMC Med Res Methodol. 2008; 8:32.
 5
Lajeunesse MJ. Bias and correction for the log response ratio in ecological metaanalysis. Ecology. 2015; 96(8):2056–63.
 6
Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, Knapp G, Kuss O, Higgins JPT, Langan D, Salanti G. Methods to estimate the betweenstudy variance and its uncertainty in metaanalysis. Res Synth Methods. 2016; 7(1):55–79.
 7
Bakbergenuly I, Kulinskaya E, Hoaglin DC. Estimation in metaanalyses of mean difference and standardized mean difference. Stat Med. 2020; 39(2):171–91.
 8
Doncaster CP, Spake R. Correction for bias in metaanalysis of littlereplicated studies. Methods Ecol Evol. 2018; 9(3):634–44.
 9
DerSimonian R, Laird N. Metaanalysis in clinical trials. Control Clin Trials. 1986; 7(3):177–88.
 10
Mandel J, Paule RC. Interlaboratory evaluation of a material with unequal numbers of replicates. Anal Chem. 1970; 42(11):1194–7.
 11
Jackson D. Confidence intervals for the betweenstudy variance in random effects metaanalysis using generalised Cochran heterogeneity statistics. Res Synth Methods. 2013; 4(3):220–9.
 12
Viechtbauer W. Confidence intervals for the amount of heterogeneity in metaanalysis. Stat Med. 2007; 26(1):37–52.
 13
Biggerstaff BJ, Jackson D. The exact distribution of Cochran’s heterogeneity statistic in oneway random effects metaanalysis. Stat Med. 2008; 27(29):6093–110.
 14
Hardy RJ, Thompson SG. A likelihood approach to metaanalysis with random effects. Stat Med. 1996; 15(6):619–29.
 15
Hamman EA, Pappalardo P, Bence JR, Peacor SD, Osenberg CW. Bias in metaanalyses using Hedges’d. Ecosphere. 2018; 9(9):02419.
 16
Bakbergenuly I, Hoaglin DC, Kulinskaya E. Methods for estimating betweenstudy variance and overall effect in metaanalysis of oddsratios. Res Synth Methods. 2020; 11(3):426–42. https://doi.org/10.1002/jrsm.1404.
 17
Hartung J, Knapp G. A refined method for the metaanalysis of controlled clinical trials with binary outcome. Stat Med. 2001; 20(24):3875–89.
 18
Sidik K, Jonkman JN. A simple confidence interval for metaanalysis. Stat Med. 2002; 21(21):3153–9.
 19
Bakbergenuly I, Kulinskaya E, Morgenthaler S. Inference for binomial probability based on dependent Bernoulli random variables with applications to metaanalysis and group level studies. Biom J. 2016; 58(4):896–914.
 20
RubioAparicio M, LópezLópez JA, SánchezMeca J, MarínMartínez F, Viechtbauer W, Van den Noortgate W. Estimation of an overall standardized mean difference in randomeffects metaanalysis if the distribution of random effects departs from normal. Res Synth Methods. 2018; 9(3):489–503.
 21
Sidik K, Jonkman JN. Robust variance estimation for random effects metaanalysis. Comput Stat Data Anal. 2006; 50(12):3681–701.
 22
Li Y, Shi L, Roth DH. The bias of the commonlyused estimate of variance in metaanalysis. Commun Stat – Theory Methods. 1994; 23(4):1063–85.
 23
Rukhin AL. Weighted means statistics in interlaboratory studies. Metrologia. 2009; 46(3):323.
 24
Cochran WG. The combination of estimates from different experiments. Biometrics. 1954; 10(1):101–29.
 25
Bakbergenuly I, Kulinskaya E, Hoaglin DC. Pitfalls of using the risk ratio in metaanalysis. Res Synth Methods. 2019; 10(3):398–419.
 26
Johnson NL, Kotz S, Balakrishnan N. Continuous Univariate Distributions, Volume 1, 2nd edn. New York: John Wiley & Sons; 1994.
 27
Viechtbauer W. Bias and efficiency of metaanalytic variance estimators in the randomeffects model. J Educ Behav Stat. 2005; 30(3):261–93.
 28
DerSimonian R, Kacker R. Randomeffects model for metaanalysis of clinical trials: an update. Contemp Clin Trials. 2007; 28(2):105–14.
 29
Hedges LV, Olkin I. Statistical Methods for MetaAnalysis. San Diego, CA: Academic Press; 1985.
 30
Hunter JE, Schmidt FL. Methods of metaanalysis: Correcting error and bias in research findings. Thousand Oaks, CA: Sage Publications, Inc; 1990.
 31
Senior AM, Grueber CE, Kamiya T, Lagisz M, O’Dwyer K, Santos ESA, Nakagawa S. Heterogeneity in ecological and evolutionary metaanalyses: its magnitude and implications. Ecology. 2016; 97(12):3293–9.
 32
Bakbergenuly I, Kulinskaya E, Hoaglin DC. Simulation study of estimating betweenstudy variance and overall effect in metaanalyses of logresponseratio for normal data: 2020. https://doi.org/10.31222/osf.io/3bnxs. MetaArXiv January 7.
 33
Bakbergenuly I, Kulinskaya E, Hoaglin DC. Simulation study of estimating betweenstudy variance and overall effect in metaanalyses of logresponseratio for lognormal data: 2019. eprint arXiv:1905.01243 [stat.ME].
 34
Higgins J, Thompson SG. Quantifying heterogeneity in a metaanalysis. Stat Med. 2002; 21(11):1539–58.
 35
GoudaVossos A, Nakagawa S, Dixson BJW, Brooks RC. Mate choice copying in humans: a systematic review and metaanalysis. Adapt Hum Behav Physiol. 2018; 4(4):364–86.
 36
Friedrich JO, Adhikari N, Herridge MS, Beyene J. Metaanalysis: lowdose dopamine increases urine output but does not prevent renal dysfunction or death. Ann Intern Med. 2005; 142(7):510–24.
 37
DíazFrancés E, Rubio FJ. On the existence of a normal approximation to the distribution of the ratio of two independent normal random variables. Stat Pap. 2013; 54:309–23. https://doi.org/10.1007/s0036201204292.
 38
Doi SAR, Thalib L. A qualityeffects model for metaanalysis. Epidemiology. 2008; 19(1):94–100.
 39
Doi SAR, Barendregt JJ, Khan S, Thalib L, Williams GM. Simulation comparison of the quality effects and random effects methods of metaanalysis. Epidemiology. 2015; 26(4):42–4.
Acknowledgments
We are grateful to Dr. Jan Friedrich, who kindly provided the raw data for Example 2.
Funding
The work by E. Kulinskaya was supported by the Economic and Social Research Council [grant number ES/L011859/1]. The funder had no role in study design, data collection, analysis and interpretation, decision to publish, or preparation of the manuscript.
Author information
Affiliations
Contributions
All authors contributed to the conception, design, and methodology of this study. IB carried out the simulations, DCH classified and summarized their results, and EK drafted the first version of the manuscript. All authors have participated in revisions and have read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Ethics approval was not required for the conduct of this research.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1
QQ plots for the standardized means in Examples 1 and 2.
Additional file 2
R procedures to implement MP SSW, HKSJ MP, and MP IV methods for metaanalysis of LRR.
Additional file 3
Relation of I^{2} to parameters underlying our simulations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visithttp://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Bakbergenuly, I., Hoaglin, D.C. & Kulinskaya, E. Estimation in metaanalyses of response ratios. BMC Med Res Methodol 20, 263 (2020). https://doi.org/10.1186/s12874020011371
Received:
Accepted:
Published:
Keywords
 Betweenstudy variance
 Heterogeneity
 Randomeffects model
 Metaanalysis
 Logresponseratio
 Ratio of means