 Research article
 Open Access
 Published:
A comparison of analytic approaches for individual patient data metaanalyses with binary outcomes
BMC Medical Research Methodology volume 17, Article number: 28 (2017)
Abstract
Background
Individual patient data metaanalyses (IPDMA) are often performed using a onestage approach a form of generalized linear mixed model (GLMM) for binary outcomes. We compare (i) onestage to twostage approaches (ii) the performance of two estimation procedures (Penalized QuasilikelihoodPQL and Adaptive Gaussian Hermite QuadratureAGHQ) for GLMMs with binary outcomes within the onestage approach and (iii) using stratified studyeffect or random studyeffects.
Methods
We compare the different approaches via a simulation study, in terms of bias, meansquared error (MSE), coverage and numerical convergence, of the pooled treatment effect (β _{1}) and betweenstudy heterogeneity of the treatment effect (τ _{1} ^{2} ). We varied the prevalence of the outcome, sample size, number of studies and variances and correlation of the random effects.
Results
The twostage and onestage methods produced approximately unbiased β _{1} estimates. PQL performed better than AGHQ for estimating τ _{1} ^{2} with respect to MSE, but performed comparably with AGHQ in estimating the bias of β _{1} and of τ _{1} ^{2} . The random studyeffects model outperformed the stratified studyeffects model in small size MA.
Conclusion
The onestage approach is recommended over the twostage method for small size MA. There was no meaningful difference between the PQL and AGHQ procedures. Though the randomintercept and stratifiedintercept approaches can suffer from their underlining assumptions, fitting GLMM with a randomintercept are less prone to misfit and has good convergence rate.
Background
Individual Patient Data (IPD) metaanalyses (MA) are regarded as the gold standard in evidence synthesis and are increasingly being used in current practice [1, 2]. However, the implementation of the analysis of IPDMA requires additional expertise and choices [3], particularly when the outcome is binary. These include (i) should a one or twostage model be used [4, 5], (ii) what estimation procedure should be used to estimate the onestage model [6, 7] and, (iii) should the study effect be fixed or random [8].
Although IPDMA were conventionally analyzed via a twostage approach [9], over the last decade, use of the onestage approach has increased [10]. Recently, some have suggested that the twostage and onestage frameworks produce similar results for MA of large randomized controlled trials [5]. The literature suggests the onestage method is particularly preferable when few studies or few events are available as it uses a more exact statistical approach than relying on a normality approximation [3–5].
When IPD are available and the outcome is binary, the onestage approach consists of estimating Generalized Linear Mixed Models (GLMMs) with a random slope for the exposure, to allow the exposure effect to vary across studies. Penalized quasilikelihood (PQL) introduced by Breslow and Clayton is a popular method for estimating the parameters in GLMMs [11]. However, regression parameters can be badly biased for some GLMMs, especially with binary outcomes with few observations per cluster, low outcome rates, or high between cluster variability [12, 13]. Adaptive Gaussian Hermite quadrature (AGHQ) is the current favored competitor to PQL, which approximates the maximum likelihood by numerical integration [14]. Although estimation becomes more precise as the number of quadrature points increases, it often gives rise to computational difficulties for highdimension random effects and convergence problems where variances are close to zero or cluster sizes are small [14].
The heterogeneity between studies is an important aspect to consider when carrying out IPDMA. Such heterogeneity may arise due to differences in study design, treatment protocols or patient populations [8]. When such heterogeneity is present, the convention is to include a random slope in the model as it captures the variability of the exposure across studies. However, there are corresponding assumptions in regards to the study effect being modelled as stratified or random [4, 15].
Few comparisons of GLMMs have been reported in the context of IPDMA with binary outcomes [4, 15], that is, when the number of studies and the number of subjects within each study is small, study sizes are imbalanced, in the presence of large betweenstudy heterogeneity and small exposure effects and there is an interest in the variance parameter of the random treatment effect. According to previous literature, these factors have all been identified as influencing model performance [6]. While several simulation studies have been published, these have mainly limited their attention to simple models with only random intercepts [13, 16]. Thus, the performance of the random effects models including both a random intercept and a random slope are less well known.
Our objective was to assess and compare via simulation studies, (i) onestage approaches to conventional twostage approaches (ii) the performance of different estimation procedures for GLMMs with binary outcomes, and (iii) using stratified studyeffect or random studyeffects in a randomized trial setting. We use our results to develop guidelines on the choice of methods for analyzing data from IPDMA with binary outcomes and to understand explicitly the tradeoffs between computational and statistical complexity.
Methods section introduces the models we are considering, the design of the simulation study and the assessment criteria. In Results section, results for the different methods under varying conditions are presented and discussed. Discussion section concludes with a discussion.
Methods
We conducted a simulation study to compare various analytic approaches to analyze data from IPDMA with binary outcomes. Hereto, our methods all assume that betweenstudy heterogeneity exists, as it is likely in practice, and so only random treatmenteffects IPD metaanalysis models are considered.
Data Generation
The data generation algorithm was developed to generate twolevel data sets (e.g. patients grouped into studies). We generated a binary outcome (Y _{ ij }) and a single binary exposure (X _{ ij }). We denote the number of studies j = 1, 2 …, K and i = 1, 2 …, n _{ j } denotes the individuals per study. Therefore, Y _{ ij } is the outcome observed for the i ^{th} individual from the j ^{th} study.
The dichotomous exposure variable, X _{ ij }, was generated from a Bernoulli distribution with probability = 0.5 and recoded \( \pm {\scriptscriptstyle \raisebox{1ex}{$1$}\!\left/ \!\raisebox{1ex}{$2$}\right.} \) to indicate control/treatment group [15]. To generate the binary outcome variable Y _{ ij }, first the probability of the outcome was calculated from the randomstudy and –treatment effects logistic regression model (Eq. 1), or the stratifiedstudy effects model (Eq. 2):
Here π _{ ij } is the true probability of the outcome for the i ^{th} individual from the j ^{th} study, β _{0}denotes the mean logodds of the outcome (studyeffect) and β _{1} the pooled treatment effect (log odds ratio). The random effects (b _{0j } and b _{1j }) were generated from a bivariate normal distribution with mean = 0 and variancecovariance matrix \( \Sigma =\left(\begin{array}{cc}\hfill {\sigma}^2\hfill & \hfill \rho \sigma \tau \hfill \\ {}\hfill \rho \sigma \tau \hfill & \hfill {\tau}^2\hfill \end{array}\right) \) for the random studyeffect case. In the stratified study effects case, (i.e. Eq. (2)), β _{ j }, were generated from a uniform distribution and b _{1j } was generated from a normal distribution with zero mean and variance, τ ^{2}.
A Bernoulli distribution with probability π _{ ij } from Eqs. (1) and (2) was used to generate the binary outcome Y _{ ij }.
The number of studies, study size, total sample size, variances and correlation of the random effects, and average conditional probability were all varied, with levels described in Table 1. For each distinct combination (n = 480) of simulation parameters, 1000 IPDMA were generated from each Eqs. (1) and (2), allowing us to investigate a wide range of scenarios. The heterogeneity was set at I^{2} = 0.01, 0.23 and 0.55 as defined by τ ^{2}/(τ ^{2} + π ^{2}/3) for a binary outcome using an odds ratio [17]. The levels correspond to: little or no, low and moderate heterogeneity respectively [18].
A sensitivity analysis was also considered to explore the performance of different methods when just 5% of observation had a positive outcome.
Models
Twostage IPD methods
In the twostage approach, each study in the IPD was analyzed separately via logistic regression
The first step estimated the studyspecific intercept and slope and their associated withinstudy covariance matrix (consisting of the variances of the intercept and slope, as well as the covariance) for each study. This model reduces the IPD to its relative treatment effect estimate and variance for each study then at the second stage these aggregate data (AD) are synthesized (described below).
Model 1 Bivariate metaanalysis
The AD were combined via a bivariate randomeffects model that simultaneously synthesized the estimates whilst accounting for their correlation, and the withinstudy correlation [4]. The model assumes that the true effects follow a bivariate normal distribution and is estimated via restricted maximum likelihood with the following marginal distributions of the estimates [19]:
where Σ is the unknown betweenstudy variancecovariance matrix of the true effects (γ _{0} and γ _{1}) and C _{ j } (j = 1, …, K) the within study variancecovariance matrix with the variances of the estimates.
Model 2: Conventional DerSimonian and Laird approach
The within study and betweenstudy covariance estimates are often times not estimated since most researchers assumed that studies are independent, and instead a univariate metaanalysis of the logit of the odds ratios is performed [20]. The marginal distribution of the pooled estimated treatment effect under this approach is easily obtained as:
with unknown parameters γ _{1} and τ _{1} ^{2} , estimated via the inverse variance weighted noniterative method (methodofmoments) [21].
Onestage IPD methods
The onestage approach analyzes the IPD from all studies simultaneously, while accounting for clustering of subjects within studies [4]. The onestage model is a form of GLMM. Two different specifications are considered.
Model 3 Random intercept and random slope
We estimated a GLMM with a random study effect u _{0j } and a random treatment effect u _{1j } via PQL and AGHQ, and allowed the random effects to be correlated, which implies that the betweenstudy covariance between u _{0j } and u _{1j } is fully estimated.
Model 4Stratified intercept onestage
Finally, the stratified onestage approach estimates a separate intercept for each study rather than constraining the intercepts to follow a normal or other distribution. Therefore, there is no need for the normality assumption for the study membership, hence, the betweenstudy covariance term is no longer estimated. The model is defined as follows:
where I _{ k = j } indicates that a separate intercept should be estimated for each study j = 1, …, K and u _{1j } ~ N(0, τ _{1} ^{2} ). Parameters of both Models 3 and 4 were estimated via PQL and AGHQ.
Estimation Procedures and Approximations
The parameters of the onestage models were estimated using PQL and AGHQ. For the twostage approach, a logistic regression was first estimated for each study via maximum likelihood. The parameters of the twostage model were estimated via methodofmoments (MOM) (Model 2) and restricted maximum likelihood (REML) (Model 1) [21–23] at the second stage.
Both likelihoodbased methods (PQL and AGHQ) were implemented on SAS version 9.4 using PROC GLIMMIX with default options [24]. The number of quadrature points in AGHQ was selected automatically [25], the absolute value for parameter convergence criterion was 10–8 and the maximum number of iterations was 100.
Therefore, for each generated data set the following models were fit.

Twostage approach (Models 1 and 2)

Onestage approach via GLMMs (Models 3 and 4) estimated with PQL.

Onestage approach via GLMMs (Models 3 and 4) estimated with AGHQ.
Assessment criteria
The performance of the estimation methods was evaluated using: a) numerical convergence, b) absolute bias; c) root mean square error (RMSE); and d) coverage probability  of the pooled treatment effect and its betweenstudy variability.
Numerical convergence
The convergence rate was estimated for all models fit, as the number of simulation repetitions that did converge (without returning a warning message) divided by the total attempted (M = 1000). Models that returned a warning message specifying that the estimated variancecovariance matrix was not positive definite or that the optimality condition was violated were considered not to have converged.
Bias
The Monte Carlo bias of the pooled treatment effect and its betweenstudy heterogeneity is defined as the average of the bias in the estimates provided by each method as compared to the truth, across the 1000 IPDMA in each scenario. The Monte Carlo estimate of the bias is computed as
where \( {\hat{\theta}}_j \) were the parameter estimates and θ was the true parameter of the pooled treatment effect or its betweenstudy variance. We also reported the mean absolute bias (AB).
Mean square error
The mean square error (MSE) is a useful measure of the overall accuracy, because it penalizes an estimate for both bias and inefficiency. The Monte Carlo estimate of the MSE is:
For each scenario, the RMSE of the pooled treatment effect and its betweenstudy heterogeneity was reported, as this measure is on the same scale as the parameter.
Coverage probability
We estimated coverage for the pooled treatment effect and its betweenstudy heterogeneity for the various methods. Gaussian coverage was estimated, where if \( \left\hat{\theta}\theta \right\le 1.96\times S E\left(\hat{\theta}\right) \) the true value was covered, and if \( \left\hat{\theta}\theta \right>1.96\times S E\left(\hat{\theta}\right) \) it was not.
We reported the median, the 25^{th} and 75^{th} percentiles of the AB and RMSE of the pooled treatment effect and its betweenstudy heterogeneity but reported percentages for the numerical convergence and coverage rate.
Results
Tables 2, 3, 4, 5, 6 and 7 present the median and interquartile range of the AB, RMSE, coverage and convergence of the pooled treatment effect and its betweenstudy variance, respectively, as estimated via two and onestage; AGHQ and PQL; randomintercept and stratifiedintercept methods. We reported results for data generated with imbalances in study sizes (different sample size in all studies) for both the randomintercept and stratifiedintercept data generation (Eqs. 1 and 2) with correlated random effects (ρ = 0.5), as this scenario is likely the closest to reallife.
We did not exclude results from metaanalyses that returned a warning message (imperfect convergence). These metaanalyses were included as nonconvergence and although these models failed to produce proper parameter estimates, these estimates were included in the calculation of the bias and the MSE.
One versus Twostage
In Tables 2 and 3, results for the absolute bias (AB) of the estimates for the pooled treatment effect β _{1} are given. Recalling that the true parameter value was 0.18, we see that the biases were identical and under 0.05 in the onestage and the twostage approaches in both small and large data sets. Results were very comparable when the outcome rate was reduced from 30 to 5% (Additional file 1: Table S1). For both the one and the twostage, results depended on the true τ ^{2}, and the sample size.
For the larger sample size, root mean square error (RMSE) in β _{1} was generally slightly larger when the onestage method was used than when the twostage was used. The picture was similar across all heterogeneity levels (Tables 2 and 3) and when the outcome rate was reduced (Additional file 2: Table S3).
Neither onestage nor twostage methods yielded coverage of β _{1} close to nominal levels (Tables 2 and 3). Increasing sample size had a positive effect on percent coverage, and increasing the true heterogeneity made estimation more difficult, hence decreasing the coverage (Table 3).
Absolute bias of the betweenstudy heterogeneity, τ _{1} ^{2} was usually slightly lower when the onestage approach was used than when the twostage approach was (Tables 2 and 3), particularly, when the sample size was small (Table 2) and when greater amount of heterogeneity exist in the random effects (Bottom panel of Table 2). Regarding the effects of the simulation parameters, AB decreased when data was generated with equal study sizes and increased when the rate of occurrence was reduced (Additional file 3: Table S2). In these cases, the onestage approach was most biased.
The RMSE of τ _{1} ^{2} for the onestage estimates was mostly smaller than the RMSE of the twostage method estimates. For increased sample size or reduction in the level of heterogeneity in the random effects, RMSE of τ _{1} ^{2 } decreased at least by a factor of three across both methods. While the RMSE of τ _{1} ^{2 } was inflated when the outcome rate was reduced, the onestage method continued to outperform that of the twostage method (Additional file 4: Table S4).
Convergence was not a problem for the twostage approach while convergence of the onestage method varied from 90 to 100% (Tables 2 and 3).
AGHQ versus PQL
Onestage models estimated via PQL and AGHQ methods often yielded similar AB in β _{1}. There was no observed difference in the AB (β _{1}) between the methods when the outcome rate was reduced (Additional file 1: Table S1).
RMSE of β _{1} were generally greater when AGHQ was used than when PQL was used (Tables 4 and 5). Decreasing sample size, increasing the variances of the random effects or reducing the event rate (Additional file 2: Table S3) made precise estimation more difficult, hence RMSE increased.
When the true heterogeneity was large and total sample was small (Top panel of Table 4), AGHQ provided coverage for β _{1}closer to nominal levels than PQL, while both methods provided comparable coverage when the sample size was increased (Table 5). Note that across both methods, levels of coverage were higher as heterogeneity increased and similar coverage was observed when the outcome rate was reduced (Additional file 5: Table S5).
AB in τ _{1} ^{2} , was very comparable but slightly lower when PQL was used rather than AGHQ (Tables 4 and 5). The AB decreased with increasing sample size, particularly, when PQL was used (Table 5). There was substantial bias in τ _{1} ^{2} estimates when the event rate was reduced (Additional file 3: Table S2).
On account of a better overall performance of PQL with regards to AB, RMSE of τ _{1} ^{2 } was generally lower with PQL than with AGHQ (Tables 4 and 5). RMSE decreased with decreased variability in the random effects, and with increased sample size. In addition, PQLestimates continued to yield smaller RMSE than AGHQestimates when the outcome rate was reduced (Additional file 4: Table S4).
We found important undercoverage of the estimates for τ _{1} ^{2 } for both estimation methods, particularly when PQL was used (Tables 4 and 5). The percent coverage was usually fair for both estimation methods when sample size increased, but was poor when the outcome rate was reduced (Additional file 6: Table S6).
Convergence occurred more often when AGHQ was used than when PQL was used (Tables 4 and 5). Convergence was problematic for PQL, particularly when true heterogeneity was low and sample size was small (Bottom panel of Table 4). Comparable convergence was seen when the event rate was reduced (Additional file 5: Table S5).
Random intercept versus stratifiedintercept
The results of the simulation studies, modeling the intercept as random or fixed (random slope was always considered) via PQL estimation are summarized in the Tables 6 and 7.
The convergence was markedly low (1497%) for the fixed intercept & random slope method (Tables 6 and 7). Convergence was only reasonable for the approach when the sample size was large and heterogeneity was small, whereas convergence was always greater than 80% for the random intercept and slope approach.
In general, AB in β _{1} was similar for both stratifiedintercept (randomslope only) and random intercept & slope methods. Regarding the simulation parameters, sample size and variability of the random effects, were not influential in reducing the AB in β _{1}.
The RMSE in β _{1} was smaller when estimated via the random intercept and slope model than when only a random slope was fit (Tables 6 and 7).
Increased sample size and level of heterogeneity in the random effect was most influential in determining coverage probability.
Absolute bias in τ _{1} ^{2} was clearly comparable when fit with a random intercept & slope approach or a random slope only (Tables 6 and 7). For lower outcome rate, there was a trend towards less pronounced bias when a random slope only was fit (Additional file 3: Table S2).
We observed lower RMSE of τ _{1} ^{2} when a random intercept was fit, especially when the true heterogeneity was large (Top panels of Tables 6 and 7). Comparable results were seen when both models were fit in large sample and the true heterogeneity was small (Bottom panel of Table 7) also when outcome rate was reduced (Additional file 4: Table S4).
We found significant under coverage of τ _{1} ^{2 } when both models were fit, however, this was more severe when a random slope only model was fit (Tables 6 and 7). When the generated values of τ _{0} ^{2} or τ _{1} ^{2} were low (i.e. low variability in the random effects) and sample size was increased, we had less difficulty to estimate the coverage of τ _{1} ^{2} when both models were fit. The coverage probability continued to be an issue when the rate of occurrence was reduced (Additional file 6: Table S6).
Discussion
Findings
Our simulation results indicate that when the number of subjects per study is large, the one and twostage methods yield very similar results. Our results also confirm the finding of previous empirical studies [5, 26, 27] that in some cases, the onestage and twostage IPDMA results coincide. However, we found discrepancies between these methods, with a slight preference towards the onestage method when the number of subjects per study is small. In these situations, neither method produced accurate estimates for the betweenstudy heterogeneity associated with the treatmenteffect; however, the biases were larger for the twostage approach. Furthermore, onestage methods produced less biased and more precise estimates of the variance parameter and had slightly higher coverage probabilities, though these differences may be due to using the REML estimate of τ _{1} ^{2} instead of the der Simonian and Laird estimator used in the twostage approach.
Estimation of GLMMs with binary outcomes continues to pose challenges, with many methods producing biased regression coefficients and variance components [7]. AGHQ has been shown to overestimate the variance component with few clusters or few subjects [17]. On the contrary, PQL has been found to underestimate the variance component while the standard errors are overestimated [12]. In the context of IPDMA, we found similar absolute bias of the PQL and AGHQestimated pooled treatment effect, while the PQLestimates of the betweenstudy variance had greater precision when study sizes were small and random effects were correlated. This somewhat confirms previous results, which found that PQL suffers from large biases but performs better in terms of MSE than AGHQ [6]. Both estimation methods experienced difficulty in attaining nominal coverage of the betweenstudy heterogeneity associated with the treatment effect in two situations: (i) when the number of studies included was small and/or (ii) the true variances of the random effects were small. We also found that convergence was not an important problem for AGHQ when metaanalyses included studies with less than 50 individuals per study. However, convergence was poor when the prevalence of the outcome was reduced to 5% and the true heterogeneity was close to zero.
Stratification of the intercept in onestage models avoids the need to estimate the random effect for the intercept and the correlation between the random effects. This approach may be preferable in situations not investigated in this work (e.g. when the distribution of the random effects is skewed). However, this approach suffered from marked convergence rates when fit to small data sets (15 studies and on average 500 subjects).
Strengths and Limitations
We used simulation studies to compare various analytic strategies to analyze data arising from IPDMA across a wide range of data generation scenarios but made some simplifications. We only considered binary outcomes, one dichotomous treatment variable, a twolevel data structure, and no confounders. Moreover, we estimated GLMMs via PQL and AGHQ, but did not compare Bayesian or other estimation methods, which might be particularly useful in sparse scenarios [28]. We have made the assumption throughout that IPD were available. Certainly, the time and cost associated with collecting IPD are considerable. However, once such data is in hand, we have addressed several open questions relating to the best way to analyze it. We should also note that methods exist for combining IPD and aggregated data [7]. Further study is needed to investigate alternative confidence intervals (or coverage probability) for the betweenstudy heterogeneity that can be used to remedy the undercoverage of Gaussian intervals. The normalitybased intervals (coverage rate) we studied greatly underperformed in most scenarios because the constructions of the confidence interval are likely to be invalid. A further simplification that limits the generalizability of this work is that it is restricted to only twoarm trials. The extension to three or more arms would require careful consideration of more complicated correlation structures in treatment effects across arms and within studies [29].
One important comparison we have not addressed is, computational speed where the twostage method had a distinct advantage over the onestage; PQL was faster than AGHQ and the stratifiedintercept model runtime was quicker than the randomintercept model.
As far as we know, this simulation study is the first to simultaneously generate data with normally distributed and stratified random intercepts. This study also compares approaches that include a random intercept for study membership to those that do not. Furthermore, the use of simulation  to systematically investigate the robustness of the approaches to variation in sample size, study number, outcome rate, magnitude of correlation and variances. As a result, our scenarios have allowed us to assess performance without being too exhaustive.
Guidelines for Best Practice
On the basis of these findings, we can make several recommendations. When the IPDMA included many studies and the outcome rate was not too low, this work supports the conclusion of a previous study [5] that the conventional twostage method by DerSimonian and Laird [21] is a good choice under the data conditions simulated here. Cornell et al. found that the DL method produced toonarrow confidence bounds and p values that were too small when the number of studies was small or there was high betweenstudy heterogeneity [30]. In such cases, a modification such as the HartungKnapp approach may be preferable [31]. Further, while the bivariate twostage approach is very rarely used in practice, we found that it tended to yield good overall model performance, comparable with that of the onestage models when study sizes are small. In addition, our results also suggest that the onestage method can be used in IPDMA where study sizes are less than 50 subjects per study or few events were recorded in most studies (outcome rate of 5%). In these cases, the onestage approach is more appropriate as it models the exact binomial distribution of the data and offers more flexibility in model specification over the twostage approach [32].
If interest lies in estimation of the pooled treatment effect or the betweenstudy heterogeneity of the treatment effect, estimation using PQL appeared to be a better choice due to its lower bias and mean square error for the settings considered. On the contrary, computational issues such as convergence occurred less with this technique than with AGHQ. However, it is important to note that convergence and coverage in τ ^{2} was an issue in small and large total sample sizes and also, when level of true heterogeneity was large.
For these simulated data, the results of both the randomintercept and stratifiedintercept models were not importantly different. However, under both data generations, fitting a GLMM with the randomintercept was overall less sensitive to misspecification in small sample sizes with large betweenstudy heterogeneity than the stratifiedintercept GLMM since we have observed high rates of nonconvergence via the stratifiedintercept model.
There are four important caveats to these recommendations. First, our simulations show greater accuracy of the pooled odds ratio as the number of studies increase. Therefore, an IPDMA with more studies will provide more accurate estimates. Secondly, our results show that the estimation of the betweenstudy heterogeneity of the treatment effect is highly biased regardless of the sample size and number of studies. Therefore, we should always expect that the variance parameter be estimated with some error. Thirdly, small overall samples mark the tradeoff under which a metaanalyst might consistently choose precision over bias and our simulations show that PQL estimation may be preferred in these situations. Finally, large overall sample size can eliminate the lack of statistical power present in small overall samples. In such cases, comparable results are seen for one and twostage methods and fitting a twostage analysis as a first step may be advisable. This could aid as a quick and efficient investigation of heterogeneity and treatmentoutcome association.
Conclusion
To summarize, the one and twostage methods consistently produced similar results when the number of studies and overall sample are large. Although the PQL and AGHQ estimation procedures produced similar bias of the pooled log odds ratios, PQLestimates had lower RMSE than the AGHQestimates. Both the randomintercept and stratifiedintercept models yielded precise and similar estimates for the pooled log odds ratios. However, the randomintercept models gave good coverage probabilities of the betweenstudy heterogeneity in small sample sizes and yielded overall good convergence rate as compared to the random slope only model.
Abbreviations
 AB:

Absolute bias
 AGHQ:

Adaptive Gaussian hermite quadrature
 GLMM:

Generalized linear mixed model
 IPDMA:

Individual patient data metaanalysis
 MOM:

Methodofmoments
 MSE:

Meansquared error
 PQL:

Penalized quasilikelihood
 REML:

Restricted maximum likelihood
References
 1.
Riley RD, Simmonds MC, Look MP. Evidence synthesis combining individual patient data and aggregate data: a systematic review identified current practice and possible methods. J Clin Epidemiol. 2007;60(5):431–9. doi:10.1016/j.jclinepi.2006.09.009. [published Online First: Epub Date].
 2.
Stewart LA, Parmar MK. Metaanalysis of the literature or of individual patient data: is there a difference? Lancet. 1993;341(8842):418–22.
 3.
Debray T, Moons K, Valkenhoef G, et al. Get real in individual participant data (IPD) meta‐analysis: a review of the methodology. Res Synth Methods. 2015;6(4):293–309.
 4.
Debray TPA, Moons KGM, AboZaid GMA, et al. Individual participant data metaanalysis for a binary outcome: Onestage or Twostage? PLoS ONE. 2013;8(4):e60650. doi:10.1371/journal.pone.0060650. [published Online First: Epub Date].
 5.
Stewart GB, Altman DG, Askie LM, et al. Statistical analysis of individual participant data metaanalyses: a comparison of methods and recommendations for practice. PLoS ONE. 2012;7(10):e46042. doi:10.1371/journal.pone.0046042. [published Online First: Epub Date].
 6.
Callens M, Croux C. Performance of likelihoodbased estimation methods for multilevel binary regression models. J Stat Comput Simul. 2005;75(12):1003–17. doi:10.1080/00949650412331321070. [published Online First: Epub Date].
 7.
Capanu M, Gönen M, Begg CB. An assessment of estimation methods for generalized linear mixed models with binary outcomes. Stat Med. 2013;32(26):4550–66. doi:10.1002/sim.5866. [published Online First: Epub Date].
 8.
Rondeau V, Michiels S, Liquet B, et al. Investigating trial and treatment heterogeneity in an individual patient data metaanalysis of survival data by means of the penalized maximum likelihood approach. Stat Med. 2008;27(11):1894–910. doi:10.1002/sim.3161. [published Online First: Epub Date].
 9.
Simmonds MC, Higgins JP, Stewart LA, et al. Metaanalysis of individual patient data from randomized trials: a review of methods used in practice. Clinical trials (London, England). 2005;2(3):209–17.
 10.
Thomas D, Radji S, Benedetti A. Systematic review of methods for individual patient data metaanalysis with binary outcomes. BMC Med Res Methodol. 2014;14:79.
 11.
Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc. 1993;88(421):9–25. doi:10.2307/2290687. [published Online First: Epub Date].
 12.
Breslow NE, Lin X. Bias correction in generalised linear mixed models with a single component of dispersion. Biometrika. 1995;82(1):81–91. doi:10.2307/2337629. [published Online First: Epub Date].
 13.
Jang W, Lim J. A numerical study of PQL estimation biases in generalized linear mixed models under heterogeneity of random effects. Commun Stat Simul Comput. 2009;38(4):692–702. doi:10.1080/03610910802627055. [published Online First: Epub Date].
 14.
Pinheiro JC, Bates DM. Approximations to the Loglikelihood function in the nonlinear mixedeffects model. J Comput Graph Stat. 1995;4(1):12–35. doi:10.2307/1390625. [published Online First: Epub Date].
 15.
Turner RM, Omar RZ, Yang M, et al. A multilevel model framework for metaanalysis of clinical trials with binary outcomes. Stat Med. 2000;19(24):3417–32.
 16.
Benedetti A, Platt R, Atherton J. Generalized linear mixed models for binary data: Are matching results from penalized quasilikelihood and numerical integration less biased? PLoS ONE. 2014;9(1):e84601. doi:10.1371/journal.pone.0084601. [published Online First: Epub Date].
 17.
Moineddin R, Matheson FI, Glazier RH. A simulation study of sample size for multilevel logistic regression models. BMC Med Res Methodol. 2007;7:34. doi:10.1186/14712288734. [published Online First: Epub Date].
 18.
Higgins JP, Thompson SG, Deeks JJ, et al. Measuring inconsistency in metaanalyses. BMJ (Clin Res Ed). 2003;327(7414):557–60. doi:10.1136/bmj.327.7414.557. [published Online First: Epub Date].
 19.
van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in metaanalysis: multivariate approach and metaregression. Stat Med. 2002;21(4):589–624. doi:10.1002/sim.1040. [published Online First: Epub Date].
 20.
Riley RD. Multivariate metaanalysis: the effect of ignoring withinstudy correlation. J R Stat Soc A Stat Soc. 2009;172(4):789–811. doi:10.1111/j.1467985X.2008.00593.x. [published Online First: Epub Date].
 21.
DerSimonian R, Laird N. Metaanalysis in clinical trials. Control Clin Trials. 1986;7(3):177–88.
 22.
Chen H, Manning AK, Dupuis J. A method of moments estimator for random effect multivariate metaanalysis. Biometrics. 2012;68(4):1278–84. doi:10.1111/j.15410420.2012.01761.x. [published Online First: Epub Date].
 23.
Hardy RJ, Thompson SG. A Likelihood approach to metaanalysis with random effects. Stat Med. 1996;15(6):619–29. doi:10.1002/(SICI)10970258(19960330)15:6<619::AIDSIM188>3.0.CO;2A. [published Online First: Epub Date].
 24.
Littell RC, Milliken GA, Stroup WW, Wolfinger DR. SAS system for mixed models. Cary: SAS Institute, Inc.; 1996.
 25.
Proc Glimmix. Maximum Likelihood Estimation Based on Adaptive Quadrature, SAS Institute Inc., SAS 9.4 Help and Documentation. Cary: SAS Institute Inc; 2002–2004.
 26.
AboZaid G, Guo B, Deeks JJ, et al. Individual participant data metaanalyses should not ignore clustering. J Clin Epidemiol. 2013;66(8):865–73.e4. doi:10.1016/j.jclinepi.2012.12.017. [published Online First: Epub Date].
 27.
Mathew T, Nordström K. Comparison of Onestep and Twostep metaanalysis models using individual patient data. Biom J. 2010;52(2):271–87. doi:10.1002/bimj.200900143. [published Online First: Epub Date].
 28.
Gelman A, Hill J. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press; 2007.
 29.
Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med. 2004;23(20):3105–24. doi:10.1002/sim.1875. [published Online First: Epub Date].
 30.
Cornell JE, Mulrow CD, Localio R, Stack CB, Meibohm AR, Guallar E, et al. Randomeffects metaanalysis of inconsistent effects: a time for change. Ann Intern Med. 2014;160(4):267–70.
 31.
IntHout J, Iaonnidis JPA, Borm GF. The the hartungknappsidikjonkman method for random effects metaanalysis is straightforward and considerably outperforms the standard DerSimonianlaird method. BMC Med Res Methodol. 2014;14:25.
 32.
Noh M, Lee Y. REML estimation for binary data in GLMMs. J Multivar Anal. 2007;98(5):896–915. http://dx.doi.org/10.1016/j.jmva.2006.11.009. [published Online First: Epub Date].
Acknowledgement
We have no acknowledgements.
Funding
This work was supported by an operating grant from the Canadian Institutes of Health Research. Andrea Benedetti is supported by the FRQS.
Availability of data and materials
Data are available upon request.
Authors’ contributions
DT led this project in the study design, performed simulation of data and statistical analyses, and also led the writing of the manuscripts. AB participated in the study design, guided statistical analyses and edited the final draft. RP helped draft and revised the manuscript. All authors read and approved the final manuscript.
Competing interest
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable. This article reports a simulation study and does not involve human participants.
Author information
Additional files
Additional file 1:
Median (Interquartile range (IQR)) absolute bias (%) for treatment effect, β_{1} for different approach, by number of studies, total average sample size, mixture of studies sizes and degree of random effects variances  data generated from random study and treatment effect: Eq. 1 with 5% outcome rate. (DOC 72 kb)
Additional file 2:
Median (Interquartile range (IQR)) (%) root mean square error for treatment effect, β_{1} for different approach, by number of studies, total average sample size, mixture of studies sizes and degree of random effects variances  data generated from random study and treatment effect: Eq. 1 with 5% outcome rate. (DOC 70 kb)
Additional file 3:
Median (Interquartile range (IQR)) absolute bias (%) for random treatmenteffect variance, τ^{2} _{1} for different approach, by number of studies, total average sample size, mixture of studies sizes and degree of random effects variances  data generated from random study and treatment effect: Eq. 1 with 5% outcome rate. (DOC 73 kb)
Additional file 4:
Median (Interquartile range (IQR)) (%) root mean square error for random treatmenteffect variance, τ^{2} _{1} for different approach, by number of studies, total average sample size, mixture of studies sizes and degree of random effects variances  data generated from random study and treatment effect: Eq. 1 with 5% outcome rate. (DOC 63 kb)
Additional file 5:
Percent Coverage (percent convergence rate) for treatment effect, β_{1} for different approach, by number of studies, total average sample size, mixture of studies sizes and degree of random effects variances  data generated from random study and treatment effect: Eq. 1 with 5% outcome rate. (DOC 62 kb)
Additional file 6:
Percent Coverage for random treatmenteffect variance, τ^{2} _{1} for different approach, by number of studies, total average sample size, mixture of studies sizes and degree of random effects variances  data generated from random study and treatment effect: Eq. 1 with 5% outcome rate. (DOC 63 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Thomas, D., Platt, R. & Benedetti, A. A comparison of analytic approaches for individual patient data metaanalyses with binary outcomes. BMC Med Res Methodol 17, 28 (2017). https://doi.org/10.1186/s1287401703077
Received:
Accepted:
Published:
Keywords
 Individual patient data metaanalyses
 One and twostage models
 Generalized linear mixed models
 Penalized quasilikelihood
 Adaptive gausshermite quadrature
 Fixed and random studyeffects