 Research article
 Open Access
Simulation-based power calculations for planning a two-stage individual participant data meta-analysis
BMC Medical Research Methodology volume 18, Article number: 41 (2018)
Abstract
Background
Researchers and funders should consider the statistical power of planned Individual Participant Data (IPD) meta-analysis projects, as they are often time-consuming and costly. We propose simulation-based power calculations utilising a two-stage framework, and illustrate the approach for a planned IPD meta-analysis of randomised trials with continuous outcomes where the aim is to identify treatment-covariate interactions.
Methods
The simulation approach has four steps: (i) specify an underlying (data generating) statistical model for trials in the IPD meta-analysis; (ii) use readily available information (e.g. from publications) and prior knowledge (e.g. number of studies promising IPD) to specify model parameter values (e.g. control group mean, intervention effect, treatment-covariate interaction); (iii) simulate an IPD meta-analysis dataset of a particular size from the model, and apply a two-stage IPD meta-analysis to obtain the summary estimate of interest (e.g. interaction effect) and its associated p-value; (iv) repeat the previous step (e.g. thousands of times), then estimate the power to detect a genuine effect by the proportion of summary estimates with a significant p-value.
Results
In a planned IPD meta-analysis of lifestyle interventions to reduce weight gain in pregnancy, 14 trials (1183 patients) promised their IPD to examine a treatment-BMI interaction (i.e. whether baseline BMI modifies intervention effect on weight gain). Using our simulation-based approach, a two-stage IPD meta-analysis has <60% power to detect a reduction of 1 kg weight gain for a 10-unit increase in BMI. Additional IPD from ten other published trials (containing 1761 patients) would improve power to over 80%, but only if a fixed-effect meta-analysis was appropriate. Pre-specified adjustment for prognostic factors would increase power further. Incorrect dichotomisation of BMI would reduce power by over 20%, similar to immediately throwing away IPD from ten trials.
Conclusions
Simulation-based power calculations could inform the planning and funding of IPD projects, and should be used routinely.
Background
Individual participant data (IPD) meta-analysis involves obtaining and then synthesising the raw, individual-level data from multiple studies. The approach has become increasingly common over the past decade [1,2,3], due to the increasing willingness (and expectation [4]) of collaborators to share their IPD in order to answer questions previously unconsidered or not powered in their primary studies. One typical question is whether a patient-level characteristic modifies a treatment effect, in order to identify subgroups of patients who may be at greater benefit (or harm) than others. Such stratified medicine is a major interest of clinical decision makers and pharmaceutical companies, looking to identify those populations in whom treatment is more effective (or less harmful) [5]. A single trial is usually underpowered for this purpose. Brookes et al. [6] show that if a single trial has 80% power to detect a particular treatment effect (across all patients), then its power to detect an interaction (with a binary covariate) of the same magnitude as the overall treatment effect will only be 29%. To ensure 80% power to detect the interaction, the sample size in a single trial needs to be increased approximately four-fold. Furthermore, to have 80% power to detect an interaction term half the size of the overall treatment effect, an approximately 16-fold increase in sample size is needed. A project that pools the IPD from multiple trials is therefore highly appealing to funders, as it can substantially increase the power to detect genuine treatment-covariate interactions.
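The figures attributed to Brookes et al. can be approximated with a short normal-theory calculation. The sketch below is illustrative only and rests on assumptions not stated in the text: a binary covariate that splits the trial into two equal subgroups, equal allocation and equal outcome variance, so that the standard error of the interaction estimate is twice that of the overall treatment effect. The function name `interaction_power` and its arguments are hypothetical.

```python
from statistics import NormalDist

def interaction_power(main_power=0.80, alpha=0.05, se_ratio=2.0, effect_ratio=1.0):
    """Approximate power of a two-sided interaction test in a single trial.

    se_ratio     : SE(interaction) / SE(overall effect); 2.0 is an assumption
                   (binary covariate with a 50:50 split, equal variances).
    effect_ratio : interaction size / overall treatment effect size.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    # Standardised overall effect implied by the trial's design power
    z_main = z_alpha + norm.inv_cdf(main_power)
    # Standardised interaction effect under the assumed SE ratio
    z_int = z_main * effect_ratio / se_ratio
    return norm.cdf(z_int - z_alpha) + norm.cdf(-z_int - z_alpha)

power_same = interaction_power()        # interaction as large as the main effect
inflation_same = 2.0 ** 2               # sample size multiplier to regain 80% power
inflation_half = (2.0 / 0.5) ** 2       # multiplier when the interaction is half as large
print(round(power_same, 2), inflation_same, inflation_half)
```

Under these assumptions the calculation gives a power of about 0.29 and sample size multipliers of 4 and 16, in line with the values quoted above.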
However, IPD meta-analyses are both time-consuming and expensive to perform, requiring significant resources to obtain, clean and harmonise the IPD from relevant trials before then synthesising them; a process that can take months or even years [7, 8]. Therefore, before embarking on an IPD project, researchers and funders should ensure that it is likely to be worth the effort. In particular, how many studies are likely to provide their IPD and, based on this, what is the potential power of the planned IPD meta-analysis? In our experience, power calculations and sample size justifications are rarely reported in IPD meta-analysis protocols or publications. Researchers are perhaps grateful for whatever IPD can be obtained, and appeal to any IPD meta-analysis adding value over a single trial. However, if it was known in advance that IPD from a particular number of studies would only increase power to 50%, then researchers and funders may think twice before undertaking the IPD project. Conversely, if a potential IPD meta-analysis increases the power to over 80%, then funders will be more reassured that the IPD project is worth resourcing. Power calculations could also reveal which studies contribute most to the power, and thus direct how much IPD is needed and from which studies, although this last point is potentially contentious.
Formal power calculations for an IPD meta-analysis are non-trivial and depend on many factors, which perhaps explains why they are currently neglected. The IPD cannot be considered as coming from a single trial, and thus sample size calculations must account for the clustering of patients within trials and the potential heterogeneity (e.g. in baseline risk and treatment effects) between trials. Also, the power depends on the choice and specification of the analysis model (e.g. covariates to be included, number of parameters, magnitude of effects), and the parameter estimation method, amongst other factors. Therefore, simple algebraic solutions are not straightforward unless simplifying assumptions are made [9,10,11]. For this reason, Kontopantelis et al. previously proposed a simulation-based approach, where IPD meta-analysis datasets are simulated multiple times based on a chosen data-generating mechanism (including numbers of studies, effect sizes, and heterogeneity), and then a chosen one-stage IPD meta-analysis model is applied to each dataset, with subsequent results (e.g. estimates and confidence intervals) summarised over the multiple analyses [12]. In particular, the proportion of all simulations that give a p-value < 0.05 can be calculated, to give an estimate of the power.
Complementary to this work, in this paper we also propose simulation-based power calculations but within a two-stage IPD meta-analysis framework, rather than a one-stage framework. The two-stage approach is more common in practice, as the second stage uses meta-analysis methods (such as inverse variance weighting) and estimation methods (such as DerSimonian and Laird [13] or restricted maximum likelihood, REML) that are familiar to those working in the meta-analysis field. Also, it avoids convergence problems that are often more problematic for one-stage models (due to the inclusion of many study stratification terms and/or multiple random effects), and enables novel approaches (such as Hartung-Knapp Sidik-Jonkman, HKSJ [14, 15]) to deriving confidence intervals that account for uncertainty in variance estimates. Crucially, it also automatically avoids ecological bias, which occurs in one-stage models when a treatment-covariate interaction is included without separating out individual-level associations from across-study associations [16, 17].
Below, we describe our new proposal and apply it to a real IPD meta-analysis of randomised trials in pregnancy, where the aim is to examine an interaction between baseline BMI and treatment effect. This illustrates how to tailor power calculations to the IPD meta-analysis at hand, using prior information (e.g. from published articles) and context-specific knowledge. The article is structured as follows. Section 2 briefly explains the two-stage approach to an IPD meta-analysis of continuous outcomes from randomised trials. Section 3 then outlines our simulation-based approach to power calculations, and Section 4 then details its application to the pregnancy example. Section 5 concludes with discussion, including how to extend to binary and time-to-event outcomes.
Methods
The two-stage approach to IPD meta-analysis
We now introduce the two-stage approach to IPD meta-analysis of continuous outcomes, which was recently described by Burke et al. [18].
First stage
Let us assume that there are i = 1 to K randomised trials for the IPD meta-analysis and that a treatment effect is of interest. In the two-stage approach, usually the first stage involves a separate analysis in each study to derive the K treatment effect estimates and their variances, using an appropriate method chosen by the meta-analyst. In particular, a suitable regression model can be used for the outcome of interest, as now described.
If the outcome is continuous (weight, say) then one may use, for example, maximum likelihood (ML) estimation to fit an appropriate linear regression in each study separately. The ideal approach is an analysis of covariance (ANCOVA) model [19], which regresses the final value at end of follow-up, y_{ Fij }, on the baseline value, y_{ Bij }, and treatment (x_{ ij } = 0/1 for participants in the control/treatment group) for the j^{th} participant in the i^{th} trial, as follows:
\[ y_{Fij} = \alpha_i + \delta_i y_{Bij} + \theta_i x_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_i^2) \qquad (1) \]
In this model, α_{ i } is the intercept (the expected final value in the control group for those with a baseline value of zero), δ_{ i } is the expected effect on the final value for a 1-unit increase in the baseline value, θ_{ i } is the treatment effect (the mean difference in weight between treatment groups after adjusting for baseline value), and σ_{ i }^{2} is the residual variance of the responses after accounting for the treatment effect and baseline value. As this model is fitted to each study separately, the true values of all parameters are naturally allowed to be different in each study (hence the i subscripts).
Although ANCOVA is preferred, sometimes baseline values are not provided in available IPD studies, and therefore alternative analyses are required, such as a final score model or a change score model. A final score model is the same as model (1), except without the δ_{ i }y_{ Bij } term. The change score model is sensible when only the change score for each patient (\( y_{ij} \), say) is provided in the IPD, such as the weight gain during pregnancy from baseline (e.g. first consultation during pregnancy) to end of follow-up (e.g. last consultation before birth). The change score is then regressed against the treatment group:
\[ y_{ij} = \alpha_i + \theta_i x_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_i^2) \qquad (2) \]
In this model, α_{ i } is the intercept (e.g. the expected weight gain in the control group), θ_{ i } is the treatment effect (the mean difference in weight gain between treatment groups), and σ_{ i }^{2} is the residual variance of the responses after accounting for the treatment effect. It is worth noting that where interest lies in the change rather than final score, the change score model can also be adjusted for baseline to accurately estimate the treatment effect and its uncertainty.
Further baseline covariates might also be included in eqs. (1) and (2) in order to increase power or to adjust for baseline confounding. Indeed, an IPD meta-analysis project is usually initiated in order to go beyond the overall treatment effect, and examine how baseline covariates are associated with (interact with) treatment effect, in order to identify effect modifiers. For example, to examine the interaction between baseline BMI measured as a continuous variable and treatment effect (i.e. a treatment-BMI interaction), eq. (1) can be modified to,
\[ y_{Fij} = \alpha_i + \delta_i y_{Bij} + \theta_i x_{ij} + \beta_i \mathrm{BMI}_{ij} + \lambda_i x_{ij} \mathrm{BMI}_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_i^2) \qquad (3) \]
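and eq. (2) modified to
\[ y_{ij} = \alpha_i + \theta_i x_{ij} + \beta_i \mathrm{BMI}_{ij} + \lambda_i x_{ij} \mathrm{BMI}_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_i^2) \qquad (4) \]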
where β_{ i } is the effect of a 1-unit increase in baseline BMI on the mean outcome in the control group, and the interaction term, λ_{ i }, denotes the mean increase in treatment effect for a 1-unit increase in the baseline BMI value. Estimation of eqs. (3) or (4) in each trial then provides the meta-analyst with K treatment-covariate interaction estimates (and their variances) ready for the second stage. Although continuous variables such as BMI, and interactions with BMI, could alternatively be modelled as categorical or with non-linear trends, in this article we generally assume that a linear relationship is appropriate. However, our approach can easily be adapted to situations where non-linear trends are considered more plausible.
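As an illustration of the first stage, the following sketch simulates one trial and fits the change score model with a treatment-BMI interaction, eq. (4), by ordinary least squares. It is a minimal sketch under stated assumptions rather than the authors' implementation: the article mentions ML estimation and Stata, whereas this uses a hand-written least-squares solver with the unbiased residual variance; BMI is centred on the trial mean; and the helper names `ols` and `simulate_trial`, the sample size and the seed are hypothetical. The parameter values are those quoted later in the article for the applied example.

```python
import random

def ols(X, y):
    """Least-squares fit; returns (coefficients, variances of coefficients)."""
    n, p = len(X), len(X[0])
    # Augmented system [X'X | X'y | I]; Gauss-Jordan turns it into [I | b | (X'X)^-1]
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))]
         + [float(a == b) for b in range(p)]
         for a in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    coef = [A[a][p] for a in range(p)]
    resid = [yi - sum(b * x for b, x in zip(coef, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # unbiased residual variance
    return coef, [s2 * A[a][p + 1 + a] for a in range(p)]

def simulate_trial(n, alpha, beta, theta, lam, sigma, bmi_mean, bmi_sd, rng):
    """Generate one trial; design columns are [1, centred BMI, x, x * centred BMI]."""
    bmi = [rng.gauss(bmi_mean, bmi_sd) for _ in range(n)]
    centre = sum(bmi) / n
    X, y = [], []
    for b in bmi:
        x = rng.randint(0, 1)   # random allocation, roughly 1:1
        c = b - centre          # BMI centred on the trial mean
        X.append([1.0, c, x, x * c])
        y.append(alpha + beta * c + theta * x + lam * x * c + rng.gauss(0.0, sigma))
    return X, y

rng = random.Random(2018)       # hypothetical seed
X, y = simulate_trial(n=200, alpha=13.3, beta=-0.28, theta=-0.84, lam=-0.1,
                      sigma=43.25 ** 0.5, bmi_mean=30.0, bmi_sd=3.5, rng=rng)
coef, var = ols(X, y)
lam_hat, lam_var = coef[3], var[3]  # interaction estimate and its variance
print(lam_hat, lam_var)
```

Repeating this fit in each of the K trials yields the interaction estimates and variances that are carried forward to the second stage.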
Second stage
Following estimation of an equation such as (1) to (4) in each trial separately, the meta-analyst obtains K parameter estimates of interest. For example, eqs. (1) and (2) would provide treatment effect estimates, \( {\widehat{\theta}}_i \), and their variances, Var(\( {\widehat{\theta}}_i \)); whilst eqs. (3) and (4) would provide interaction effect estimates, \( {\widehat{\lambda}}_i \), and their variances, Var(\( {\widehat{\lambda}}_i \)). These can now be combined in the second stage of the IPD meta-analysis. Let us focus on pooling treatment-covariate interactions (\( {\widehat{\lambda}}_i \)), as these are usually the primary focus for an IPD meta-analysis of randomised trials. However, what follows could equally apply to any parameter estimate of interest, such as a treatment effect or a prognostic factor effect.
A meta-analysis model is chosen to pool the interaction estimates, \( {\widehat{\lambda}}_i \), typically assuming that the true interaction is either fixed (common) or random across studies. The fixed effect model assumes that the \( {\widehat{\lambda}}_i \) are all estimates of the same underlying interaction effect in all studies, represented as λ. It can be written generally as [20],
\[ \widehat{\lambda}_i \sim N\left(\lambda, \mathrm{Var}(\widehat{\lambda}_i)\right) \qquad (5) \]
where the Var(\( {\widehat{\lambda}}_i \)) estimates are also taken from the first stage, and usually assumed known. The most common method to estimate λ is the inverse variance method, which provides a weighted average, where the weight of each trial, w_{ i }, is defined as [21],
\[ w_i = \frac{1}{\mathrm{Var}(\widehat{\lambda}_i)} \qquad (6) \]
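and the pooled interaction estimate, \( \widehat{\lambda} \), and its variance are calculated by:
\[ \widehat{\lambda} = \frac{\sum_{i=1}^{K} w_i \widehat{\lambda}_i}{\sum_{i=1}^{K} w_i} \qquad (7) \]
\[ \mathrm{Var}(\widehat{\lambda}) = \frac{1}{\sum_{i=1}^{K} w_i} \qquad (8) \]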
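The random effects model allows for between-study variation, τ^{2}, in the true interaction effect, and makes the assumption that the different studies are estimating different, yet related, interaction effects. The random effects model can be written generally as [20],
\[ \widehat{\lambda}_i \sim N\left(\lambda_i, \mathrm{Var}(\widehat{\lambda}_i)\right), \qquad \lambda_i \sim N(\lambda, \tau^2) \qquad (9) \]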
where the Var(\( {\widehat{\lambda}}_i \)) estimates are again typically assumed known, and the true interaction effect in the i^{th} trial, λ_{ i }, is assumed normally distributed about an average interaction effect, λ, with between-study variance, τ^{2}. Equation (9) reduces to equation (5) when τ^{2} equals zero. To obtain meta-analysis results, an inverse variance approach can again be taken but with the weights of each trial now adjusted to incorporate an estimate of τ^{2}:
\[ w_i^{*} = \frac{1}{\mathrm{Var}(\widehat{\lambda}_i) + \widehat{\tau}^2} \qquad (10) \]
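Then, the estimate of the summary interaction effect and its variance are calculated using:
\[ \widehat{\lambda} = \frac{\sum_{i=1}^{K} w_i^{*} \widehat{\lambda}_i}{\sum_{i=1}^{K} w_i^{*}} \qquad (11) \]
\[ \mathrm{Var}(\widehat{\lambda}) = \frac{1}{\sum_{i=1}^{K} w_i^{*}} \qquad (12) \]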
There is ongoing debate about the best method to estimate τ^{2} [15, 22]. Traditionally, the most common method of estimating τ^{2} is the non-iterative, non-parametric method of moments (MoM) estimator of DerSimonian and Laird [13]. However, other non-iterative estimators are available [23, 24], and iterative methods such as REML are also popular.
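The second stage can be sketched in a few lines. The function below, with the hypothetical name `pool`, applies inverse variance weighting for the fixed effect model and, optionally, the DerSimonian and Laird method of moments estimate of τ^{2} for the random effects model. The input estimates and variances in the usage example are made-up numbers for illustration, not results from any trial.

```python
from statistics import NormalDist

def pool(estimates, variances, random_effects=False):
    """Inverse variance pooling; returns (summary, variance of summary, tau^2).

    With random_effects=True, tau^2 is the DerSimonian-Laird method of moments
    estimate, truncated at zero; otherwise tau^2 is fixed at zero.
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    tau2 = 0.0
    if random_effects:
        k = len(estimates)
        # Cochran's Q statistic around the fixed effect summary
        q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)
        w = [1.0 / (v + tau2) for v in variances]
    summary = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    return summary, 1.0 / sum(w), tau2

def z_test_p_value(summary, variance):
    """Two-sided p-value from the standard normal-based (Wald) test."""
    z = abs(summary) / variance ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(z))

# Hypothetical interaction estimates and variances from four trials
est = [-0.12, -0.05, -0.20, 0.03]
var = [0.010, 0.020, 0.015, 0.030]
fe = pool(est, var)
re = pool(est, var, random_effects=True)
print(fe, z_test_p_value(fe[0], fe[1]))
print(re, z_test_p_value(re[0], re[1]))
```

When the estimated τ^{2} is zero the two analyses coincide, which mirrors the reduction of the random effects model to the fixed effect model.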
Following estimation of the chosen meta-analysis model, a standard 95% confidence interval for λ can be calculated as \( \widehat{\lambda} \pm 1.96\sqrt{\operatorname{var}(\widehat{\lambda})} \). However, this has been criticised because it ignores uncertainty in variance estimates, in particular \( {\widehat{\tau}}^2 \), and thus leads to inappropriate coverage of confidence intervals (inflated type I errors) [15, 25]. To address this, alternative methods have been proposed for deriving 95% confidence intervals for the summary effect; in particular, the HKSJ approach provides a modification to the variance (var_{ HKSJ }) of the summary estimate [14, 26,27,28,29], and derives 95% confidence intervals by \( \widehat{\lambda} \pm t_{0.975, K-1}\sqrt{\operatorname{var}_{HKSJ}(\widehat{\lambda})} \), which are usually appropriately wider than those from the standard approach.
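For reference, the HKSJ modified variance is usually written as below. This is the standard form of the Hartung-Knapp Sidik-Jonkman adjustment as we understand it, expressed with the random effects weights \( w_i^{*} = 1/(\mathrm{Var}(\widehat{\lambda}_i) + \widehat{\tau}^2) \); it is given here as an aid to the reader and is not quoted from the article.

```latex
\[
\operatorname{var}_{HKSJ}(\widehat{\lambda})
  = \frac{\sum_{i=1}^{K} w_i^{*}\,(\widehat{\lambda}_i - \widehat{\lambda})^2}
         {(K-1)\sum_{i=1}^{K} w_i^{*}}
\]
```

The interval then uses the 0.975 quantile of a t-distribution with K − 1 degrees of freedom in place of 1.96.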
Simulation-based power calculations for a two-stage IPD meta-analysis of continuous outcomes
We now propose our simulation-based approach to power calculations, which utilises the two-stage IPD meta-analysis framework. The general premise is that an IPD meta-analysis dataset is simulated and then a two-stage meta-analysis performed. This is repeated many (e.g. thousands of) times (m, say), and each time the resulting summary estimates, confidence intervals and p-values are stored. Based on a traditional frequentist paradigm, power can then be estimated by calculating the proportion of times the summary estimate was statistically significant (e.g. as defined by the associated 95% confidence interval excluding the null value, or equivalently an associated p-value < 0.05). The general step-by-step process is now outlined.
Step (i): Specification of a statistical model in each trial
Firstly, a data generating model needs to be assumed for each trial. Ideally, this should be in accordance with the model that will be fitted in the first stage of the two-stage IPD meta-analysis. For example, ANCOVA model (1) might be assumed when interest lies in a continuous outcome and a treatment effect, or model (3) if the focus is a treatment-covariate interaction effect. However, if baseline values are potentially not available, change score models (2) and (4) may be assumed instead. The choice may also be influenced by the information reported in the publications, for example whether final score or change score summary statistics are given, as these inform step (ii) below. Also, it may help to centre covariates about their trial-specific mean value, to ease the interpretation of the parameters for step (ii).
Step (ii): Choose parameter values for the statistical model and study characteristics (e.g. number of patients, covariate distributions)
Next, sensible parameter values need to be specified for the chosen model. Table 1 provides a summary of the input values required for continuous outcomes when adopting models (1) to (4) as the statistical model within each trial. This includes specifying the magnitude of trial intercepts (control group responses / baseline risk), the magnitude and distribution of treatment and interaction effects, and the magnitude of residual and between-study variances. Also required are characteristics of the trials themselves. That is, the number of trials promising IPD, the number of patients therein, and the distribution of covariate values (e.g. proportion in the treatment and control groups; mean and standard deviation of baseline BMI in each trial; etc).
Though this may sound onerous, it is usual to know which trials may provide (or could be approached for) their IPD. Then, aggregate information (summary statistics) available in trial publications and reports can be used to inform the values of parameters and characteristics within trials. This is illustrated in detail in the worked example in Section 4.
Step (iii): Generate an IPD meta-analysis dataset and undertake a two-stage IPD meta-analysis
Following steps (i) and (ii), an IPD meta-analysis dataset of a given number of trials and patients can be generated based on the statistical model and characteristics specified, using the simulation approach. This requires user-written software to randomly generate the IPD meta-analysis dataset based on the conditions given. Our supplementary material provides Stata code to illustrate how this can be done for the pregnancy example presented in Section 4 (see Additional file 1).
Once the IPD meta-analysis dataset is generated, a two-stage IPD meta-analysis is then applied as outlined in the previous section, to obtain the summary effect estimate of interest, and its associated confidence interval and p-value. The exact approach depends on the preference of the user. For example, after model (4) is applied to each trial separately, the second stage could implement either model (5) or (9) to pool the trial interaction estimates using either a fixed effect or random effects analysis, respectively. Confidence intervals and p-values of summary estimates can then be calculated, for example using the standard normal-based approach or the HKSJ method.
Step (iv): Repeat multiple times and evaluate power
Step (iii) is then repeated many (thousands of) times, until m summary effect estimates, confidence intervals and p-values are obtained. Assuming the IPD were simulated according to a genuine effect (e.g. a non-zero mean difference between treatment and control, or a non-zero treatment-covariate interaction), the proportion of these m results that were statistically significant gives an estimate of the power of the IPD meta-analysis. Thus, it reveals the probability that, if the IPD meta-analysis project could be repeated identically many times, the summary result would detect (with statistical significance) the genuine effect. The definition of statistical significance is of course arbitrary. Usually p < 0.05 (or equivalently the 95% confidence interval excluding the null value) will be used, but the user can adapt this if desired (e.g. p < 0.01), for example to account for multiple testing. Once the power estimate is obtained, a 95% confidence interval for the power can also be calculated (for example using an exact method [30]), which will become narrower as m increases.
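Step (iv) amounts to a proportion and its confidence interval, as the sketch below shows. Note one deliberate substitution: the article suggests an exact method for the interval, whereas this sketch uses the Wilson score interval, a simpler approximation that needs only the standard library. The function name `power_with_ci` and the p-values in the usage example are hypothetical.

```python
from statistics import NormalDist

def power_with_ci(p_values, alpha=0.05, level=0.95):
    """Estimated power and a Wilson score interval (not the exact method)."""
    m = len(p_values)
    hits = sum(p < alpha for p in p_values)  # number of significant results
    power = hits / m
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    denom = 1 + z * z / m
    centre = (power + z * z / (2 * m)) / denom
    half = z * (power * (1 - power) / m + z * z / (4 * m * m)) ** 0.5 / denom
    return power, centre - half, centre + half

# Hypothetical stored p-values: 600 significant results out of m = 1000
demo = [0.01] * 600 + [0.50] * 400
print(power_with_ci(demo))
# The same proportion with m = 10000 gives a narrower interval
print(power_with_ci(demo * 10))
```

The second call illustrates the point made above that the interval narrows as m increases.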
It is also sensible for steps (i) to (iv) to be repeated after adopting different (yet still realistic) parameter values, to ascertain if and how power changes accordingly. For example, initially the model may assume no between-trial heterogeneity in treatment or interaction effects, but this may be relaxed in subsequent simulations. This will be illustrated in Section 4.
Applied example: Power of a planned IPD meta-analysis of trials of interventions to reduce weight gain in pregnant women
We now illustrate the key concepts through an applied example. In this example, our aim is to reflect the process researchers go through when considering or planning an IPD meta-analysis project. We assume that a clinical question has been identified and an IPD meta-analysis project is desired to address it. Additionally, a set of trials has been identified (and potentially promised their IPD) and aggregate data (summary statistics) for these trials have been published. The researchers want to know, in advance of collecting IPD, whether an IPD meta-analysis of these trials is likely to be powered to answer the clinical question at hand.
Background for applied example
Thangaratinam et al. [31] performed a systematic review to investigate the effects of weight management interventions on maternal and fetal outcomes. One of the primary outcomes was maternal weight gain and their aggregate data meta-analysis of 30 randomised trials showed a significant average reduction in weight gain of 0.97 kg (95% CI: 0.34 kg to 1.60 kg reduction) for lifestyle interventions compared with control. However, there was a large amount of between-study heterogeneity, with an I-squared statistic of 87% and \( {\widehat{\tau}}^2 \) of 1.87. Therefore, a major recommendation of Thangaratinam et al. was that an “IPD meta-analysis is needed to provide robust evidence on the differential effect of intervention in various groups based on BMI, age, parity, socioeconomic status and medical conditions in pregnancy”. That is, IPD was needed to examine potential treatment-covariate interactions.
In response to this, in 2012 the Weight Management in Pregnancy International IPD Collaboration (iWIP) was established to share IPD from multiple randomised trials, and the National Institute for Health Research (NIHR) Health Technology Assessment (HTA) programme subsequently funded the project. At the time of developing the grant application, 14 of the trials (containing 1183 patients) included in the aforementioned aggregate data meta-analysis had provisionally agreed to provide their IPD. These are summarised in Table 2, including information about the weight gain in each treatment group, and the distribution of baseline BMI values. No formal power calculation was originally performed for the iWIP grant application, but it was noted that in order to detect treatment-covariate interactions “our IPD meta-analysis provides an efficient way to substantially increase the sample size, without the need for a new trial”.
Retrospectively, we now consider how our simulation-based approach to power would have been useful to the iWIP collaborators, to provide formal quantitative reassurance of the power of their planned IPD meta-analysis project. Here we focus on the power to detect a potential interaction between baseline BMI and intervention effect, which was one of the primary objectives of their study. The prior hypothesis was that those with high baseline BMI may benefit most from weight management interventions.
What is the power to detect a treatment-BMI interaction with 14 trials promising IPD?
We start by applying random effects meta-analysis model (9) to the 14 published intervention effect estimates shown in Table 2. This gives a summary mean difference of −0.84 kg (95% CI: −1.63 to −0.06), indicating an average reduction in weight gain of 0.84 kg by using an intervention rather than control. Heterogeneity was large, with an I-squared statistic of 63% and \( {\widehat{\tau}}^2 \) of 1.1, with the latter estimated by the approach of DerSimonian and Laird (method of moments) [13]. These results are very similar to those from the original aggregate data meta-analysis of 30 trials, suggesting the 14 trials are broadly representative of the original set of trials. Let us now apply our simulation-based approach, following the steps described in Section 3, to quantify the potential power to detect an interaction between baseline BMI and intervention effect using these 14 trials, with BMI measured as a continuous covariate and linear effects and interactions for BMI assumed correct.
Methods for applied example
Step (i): Specification of the treatment-covariate model in each trial
The first step of the simulation approach is to define an underlying (data generating) model for each trial. It is preferable to keep this simple and reflect the analysis model that is likely to be used in the first stage of the IPD meta-analysis. As weight is a continuous outcome, we ideally wanted to consider an ANCOVA model (1). However, the summary statistics reported in each trial mainly focused on weight gain (rather than final weight), thus it was considered sensible to focus initially on eq. (4), to ease specification of parameter values in step (ii) (NB extension to ANCOVA is considered in Section 4.5). Thus the assumed model was as follows:
\[ Y_{ij} = \alpha_i + \beta_i \overline{BMI}_{ij} + \theta_i x_{ij} + \lambda_i x_{ij} \overline{BMI}_{ij} + e_{ij} \qquad (13) \]
Here, Y_{ ij } is the weight gain during pregnancy for patient j in trial i, and this is regressed against baseline BMI value (\( \overline{BMI}_{ij} \)), the treatment group (x_{ ij }), and the interaction between baseline BMI and treatment (\( x_{ij} \times \overline{BMI}_{ij} \)). Note that \( \overline{BMI}_{ij} \) denotes the baseline BMI value for patient j centred about the mean baseline BMI in trial i. This specification greatly eases the interpretation and specification of model parameters in step (ii).
We also assumed that,
\[ e_{ij} \sim N(0, \sigma_i^2), \qquad \beta_i \sim N(\beta, \tau_{\beta}^2), \qquad \theta_i \sim N(\theta, \tau_{\theta}^2), \qquad \lambda_i \sim N(\lambda, \tau_{\lambda}^2) \qquad (14) \]
such that the residuals (e_{ ij }) in each trial have a variance of \( {\sigma}_i^2 \), and the parameters β_{ i } (the effect of a 1-unit increase in baseline BMI on the mean control group weight gain), θ_{ i } (the treatment effect for a patient with the mean baseline BMI) and λ_{ i } (the effect of a 1-unit increase of baseline BMI above the mean baseline BMI on the treatment effect) are drawn from independent normal distributions with means (β, θ, λ) and variances (\( {\tau}_{\beta}^2 \), \( {\tau}_{\theta}^2 \), \( {\tau}_{\lambda}^2 \)). This is the simplest option. Different (and dependent) between-trial distributions could of course be assumed, but for parsimony the use of independent normal distributions was deemed sensible. If considered important, between-study correlation could also be included between the baseline risk (α_{ i }) and overall treatment effect (θ_{ i }).
Step (ii): Choose parameter values for the statistical model and study characteristics
In order to simulate IPD under this model structure, the next step was to specify the assumed magnitude of α_{ i }, β, θ, λ, \( {\sigma}_i^2 \), \( {\tau}_{\alpha}^2 \), \( {\tau}_{\beta}^2 \), \( {\tau}_{\theta}^2 \), and \( {\tau}_{\lambda}^2 \). Though this may seem onerous, it is relatively straightforward. A summary of our chosen parameter values is given in Table 3, and we now explain the justification.
Each α_{ i } corresponds to the mean weight gain for control individuals with the mean BMI, which we considered similar to the mean weight gain in the control group, and was available for each trial (Table 2). For example, α_{1} was set to 13.3. The residual variance (\( {\sigma}_i^2 \)) in each trial was approximated from the standard deviation of weight gain values available from the publications (Table 2). For example, for trial 1, assuming that the residual variance would be the same for control and treatment groups, we took an average of 5.5^{2} and 7.5^{2}, which is 43.25. We assumed that θ = −0.84, which is the summary treatment effect estimate from the aforementioned meta-analysis of the 14 published estimates. Similarly, based on the estimated between-trial variability in treatment effects from this meta-analysis, we assumed that \( {\tau}_{\theta}^2 \) = 1.1.
It was considered sensible to have a parsimonious situation where \( {\tau}_{\beta}^2 \) and \( {\tau}_{\lambda}^2 \) were zero, such that there was no between-trial heterogeneity in the prognostic effect of baseline BMI or in the interaction effect (this latter assumption is relaxed in Section 4.4). A value for β was also needed. Using the nine trials with baseline BMI information, a random effects meta-regression of the mean weight gain versus the mean baseline BMI in the control group (weighted by the inverse of the variance of mean weight gain) was fitted, and this gave an association of −0.28. We took this study-level association as a proxy for the individual-level association represented by β, which suggests that weight gain decreases by 0.28 kg for every unit increase in baseline BMI. This agrees with guidelines that recommend weight gain should be lower in those with a higher baseline BMI. In this way, the generation of an individual’s change in weight is now correlated with the baseline BMI (and thus baseline weight), as expected by definition. Furthermore, it allows individuals with a high BMI to be more likely to be amongst a small subset that actually lose weight during pregnancy, which is plausible given the reported magnitude of the standard deviations for weight gain relative to the mean value (Table 2).
Lastly, we needed to choose the magnitude of λ, our key parameter of interest in the IPD meta-analysis. Our specification of model (13) assumes that the interaction effect is linear, such that a 1-unit increase in baseline BMI modifies the treatment effect on weight gain by λ. Although categorical or non-linear relationships could alternatively be assumed [32], the linear effect was chosen for parsimony. The hypothesis that the treatment effect may be larger for those with a higher baseline BMI implies that λ would be negative. Rather than choosing a single value for λ, we repeated simulations for each of a range of values between −0.01 and −0.5, moving from small (and potentially not clinically important) to extremely large interaction effects. For example, if λ was −0.05, then for a ten-unit increase in baseline BMI, there would be an extra 0.5 kg reduction in weight gain by using the intervention rather than the control.
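To make the interpretation of λ concrete, the treatment effect implied by the assumed model for a patient whose baseline BMI lies c units above the trial mean can be written out as follows. This is a worked illustration using the parameter values stated above (θ = −0.84, with λ = −0.05 or −0.1); the symbol c is introduced here for convenience and does not appear in the article.

```latex
\[
\text{treatment effect at centred BMI } c:\quad \theta + \lambda c
\]
\[
\lambda = -0.05:\qquad c = 0 \Rightarrow -0.84\ \text{kg}, \qquad
c = 10 \Rightarrow -0.84 + (-0.05)(10) = -1.34\ \text{kg}
\]
\[
\lambda = -0.10:\qquad c = 10 \Rightarrow -0.84 + (-0.10)(10) = -1.84\ \text{kg}
\]
```

The second case corresponds to a reduction of 1 kg in weight gain for a 10-unit increase in BMI.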
The number of trials in the IPD meta-analysis was set at 14, each containing the known number of patients (Table 2), with close to an even allocation of patients to treatment and control groups. The distribution of baseline BMI was also needed within each trial. For nine trials, the published data gave the mean baseline BMI and its standard deviation (Table 2), and we assumed a normal distribution for baseline BMI in these trials. For example, for trial 1, using the average observed values for the treatment and control groups, baseline BMI was assumed to be normally distributed with a mean of 34.75 and variance of 12.5. For the remaining five trials without BMI information, the mean baseline BMI was drawn from a normal distribution with a mean of 30 and standard deviation of 2.5, and a within-trial standard deviation of 3.5 was assumed; this was based on the range of baseline BMI values observed within and across the other nine trials.
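As a rough illustration of how the trial characteristics described above might be assembled before simulation, the following Python sketch (not the authors' Stata module) keeps the published BMI mean and standard deviation where available, and otherwise draws a trial mean from N(30, 2.5²) with an assumed within-trial standard deviation of 3.5. The function name, dictionary keys and the two sample sizes are hypothetical and chosen only for illustration.

```python
import random

def make_trial_characteristics(n_patients, known_bmi, rng):
    """Return one dict per trial with its size and baseline BMI distribution.

    n_patients: list of trial sample sizes.
    known_bmi:  list of (mean, sd) tuples, or None where BMI was not reported.
    """
    trials = []
    for n, bmi in zip(n_patients, known_bmi):
        if bmi is None:
            # BMI not reported: draw the trial mean from N(30, 2.5^2), assume SD 3.5
            mean_bmi, sd_bmi = rng.gauss(30.0, 2.5), 3.5
        else:
            mean_bmi, sd_bmi = bmi
        trials.append({"n": n, "bmi_mean": mean_bmi, "bmi_sd": sd_bmi})
    return trials

rng = random.Random(2018)
# Sample sizes are made up; the first trial uses mean 34.75 and variance 12.5
trials = make_trial_characteristics([100, 60], [(34.75, 12.5 ** 0.5), None], rng)
```

Drawing the unknown trial means once per simulated dataset, rather than fixing them, lets the uncertainty about those trials feed through to the power estimate.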
Step (iii): Generate an IPD meta-analysis dataset and undertake a two-stage IPD meta-analysis
We created a module within Stata that generated an IPD meta-analysis dataset containing 14 trials based on model (13) and the chosen set of parameter values and trial characteristics shown in Table 3. That is, in each trial, for each patient we randomly generated their treatment group (\( x_{ij} \)), their baseline BMI value centred about the observed trial’s mean baseline BMI (\( {\overline{BMI}}_{ij} \)), and their weight gain (\( Y_{ij} \)).
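The data generation for a single trial can be sketched in Python as follows. This is an illustration rather than the authors' Stata module, and it assumes that model (13) has the linear form: weight gain = control mean + treatment effect × group + β × centred BMI + λ × group × centred BMI + normally distributed residual. The function name and the values used for the treatment effect (−1) and the residual standard deviation (4) are hypothetical; β = −0.28, λ = −0.1 and a control mean of 11 are taken from the text.

```python
import random

def simulate_trial(n, alpha, theta, beta, lam, bmi_mean, bmi_sd, sigma, rng):
    """Simulate one trial; returns a list of (x, centred_bmi, y) tuples."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0               # random allocation, about 1:1
        bmi_c = rng.gauss(bmi_mean, bmi_sd) - bmi_mean   # centre about the trial mean
        y = (alpha + theta * x + beta * bmi_c
             + lam * x * bmi_c + rng.gauss(0.0, sigma))  # weight gain (kg)
        rows.append((x, bmi_c, y))
    return rows

rng = random.Random(1)
# theta and sigma are hypothetical values, not taken from Table 3
ipd = simulate_trial(n=200, alpha=11.0, theta=-1.0, beta=-0.28, lam=-0.1,
                     bmi_mean=30.0, bmi_sd=3.5, sigma=4.0, rng=rng)
```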
This enabled us, within the same Stata module, to then immediately undertake a two-stage IPD meta-analysis. In the first stage, model (13) was fitted to each trial separately to produce the treatment-BMI interaction estimate and its variance; then, in the second stage a fixed effect meta-analysis model (model (5)) was used to pool the interaction estimates.
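A minimal Python sketch of the two stages is given below, again as an illustration and not the authors' implementation. It assumes the first-stage model contains an intercept, treatment, centred BMI and their interaction; ordinary least squares for that model is equivalent to fitting a separate straight line in each arm with a pooled residual variance, so the interaction is the difference in slopes. The second stage is standard inverse-variance fixed-effect pooling with a normal-based p-value. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def first_stage(rows):
    """Interaction estimate and its variance for one trial.

    rows: iterable of (x, bmi_c, y) with x coded 0 (control) or 1 (treatment).
    """
    slope, sxx, rss, n = {}, {}, 0.0, 0
    for arm in (0, 1):
        pts = [(b, y) for x, b, y in rows if x == arm]
        mb = sum(b for b, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        sxx[arm] = sum((b - mb) ** 2 for b, _ in pts)
        sxy = sum((b - mb) * (y - my) for b, y in pts)
        slope[arm] = sxy / sxx[arm]
        rss += sum((y - my - slope[arm] * (b - mb)) ** 2 for b, y in pts)
        n += len(pts)
    resid_var = rss / (n - 4)          # four regression coefficients estimated
    return slope[1] - slope[0], resid_var * (1 / sxx[0] + 1 / sxx[1])

def pool_fixed(estimates):
    """Inverse-variance fixed-effect pooling of (estimate, variance) pairs."""
    w = [1 / v for _, v in estimates]
    est = sum(wi * e for wi, (e, _) in zip(w, estimates)) / sum(w)
    se = math.sqrt(1 / sum(w))
    p = 2 * (1 - NormalDist().cdf(abs(est / se)))   # two-sided normal-based p-value
    return est, se, p
```

Because each trial's interaction is estimated separately before pooling, only within-trial information contributes to the summary estimate.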
Step (iv): Repeat multiple times and evaluate power
Step (iii) was repeated until we had randomly generated 10,000 IPD meta-analysis datasets, each containing 14 trials. For each of the 10,000 datasets, the Stata module performed a two-stage IPD meta-analysis and the results were stored. This produced 10,000 summary treatment-BMI interaction estimates and their 95% confidence intervals and p-values (one for each IPD meta-analysis dataset). Confidence intervals were derived using the standard (normal-based) method. The power of the planned IPD meta-analysis was then calculated as the proportion of 10,000 meta-analyses where the summary interaction estimate was detected by a p-value < 0.05 (or equivalently a 95% confidence interval that did not contain the null value).
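The final calculation in step (iv) can be sketched in Python as below. The function name is hypothetical, and the confidence interval uses a simple Wald (normal approximation) formula as a stand-in; an exact binomial interval could be substituted and would give very similar limits for thousands of simulations.

```python
import math

def estimate_power(p_values, alpha=0.05):
    """Proportion of simulated meta-analyses with p < alpha, with a Wald 95% CI."""
    n = len(p_values)
    power = sum(p < alpha for p in p_values) / n
    half = 1.96 * math.sqrt(power * (1 - power) / n)
    return power, max(0.0, power - half), min(1.0, power + half)

# Made-up example: 700 of 1000 simulated meta-analyses were significant
power, lower, upper = estimate_power([0.01] * 700 + [0.5] * 300)
```

The width of this interval depends only on the estimated power and the number of simulations, which is why a large number of repetitions gives power estimates with narrow intervals.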
The Stata module to implement steps (i) to (iv) is provided in the supplementary material (see Additional file 1). This module allowed us to repeat steps (i) to (iv) for different assumed parameter values and model approaches. In particular, we also considered nonzero values of \( {\tau}_{\lambda}^2 \) and fitted a random effects meta-analysis model (9) in the second stage of the IPD meta-analysis, and instead calculated p-values and confidence intervals according to the HKSJ method, to examine if and how power was affected.
Results
Our simulation-based power estimates for the potential IPD meta-analysis are shown in Fig. 1, across the range of true interaction effects from −0.01 to −0.5. Power increases as the magnitude of the true interaction effect increases, which is to be expected as, other things being equal, a p-value becomes smaller as the estimate moves further from the null (which, here, is an interaction of zero).
Despite having IPD from 14 trials, including 1183 patients, the estimated power is less than 80% unless the true interaction effect is about −0.15 or larger in magnitude. For example, for a true interaction effect of −0.1, the power is estimated to be 59.2% (95% CI: 58.2%, 60.2%) because 5920 of the 10,000 simulated IPD meta-analyses produced a significant result. For a true interaction effect of −0.05, the power reduces dramatically to just 20.7%. This indicates that the planned IPD meta-analysis may be underpowered to detect potentially clinically relevant treatment-BMI interactions.
Of note, the mean interaction estimates across each set of 10,000 simulations were almost identical to the true interaction effect, across the entire range from −0.01 to −0.5. Thus, the low power was not caused by any systematic bias in the IPD meta-analysis model or estimation process.
Extension to consider obtaining IPD from additional trials
When faced with such findings of low power, researchers are then likely to enquire about whether additional IPD are available, and indeed how much IPD is required to adequately improve the power. In the iWIP project, following discussion with collaborators, IPD were additionally promised from a further 10 trials that, for various reasons, were not included in the original published meta-analysis of aggregate data [31]. Given that the collection of IPD is potentially time-consuming and resource intensive [7, 8], a dilemma is whether IPD is needed from all of these 10 trials, or perhaps just a representative subset. Power calculations are helpful to resolve this. For illustration, here we consider two options: (i) adding IPD from just the largest of the 10 additional trials, which contained 399 patients; or (ii) adding IPD from all 10 additional trials (a total of 1761 additional patients). We repeated our simulation approach for each of these situations. Sample sizes for the 10 additional trials were known, but information was often lacking about other factors (e.g. the control group mean, or the distribution of baseline BMI) and so we sampled these from the distributions observed in other trials. For example, control group mean weight gain was sampled from \( {\alpha}_i\sim N\left(\alpha, {\tau}_{\alpha}^2\right) \), with α and \( {\tau}_{\alpha}^2 \) set to 11 and 22 respectively, corresponding to their values from a random effects meta-analysis of the mean weight gain estimates for the control groups from the original 14 trials (Table 2).
The results are presented within Fig. 1, and show that adding IPD from further trials would increase the power as expected. However, adding just the IPD from the largest trial is not sufficient, as the power remains lower than typically desired at relevant values of the interaction effect. For example, with a true interaction effect of −0.1 the IPD meta-analysis of 15 trials has an estimated power of 68.7% (95% CI: 67.8% to 69.6%), and with an interaction effect of −0.05 it has an estimated power of only 23.2% (95% CI: 22.4% to 24.1%).
Findings based on adding IPD from all 10 additional trials are more promising. In particular, for a true interaction effect of −0.1 the IPD meta-analysis of 24 trials has an estimated power of 83.2% (95% CI: 82.5% to 83.9%). This is above 80% for the first time, which is a threshold often used in power calculations for single randomised trials. Thus, there is high power to detect interaction effects of −0.1 or larger in magnitude (λ ≤ −0.1). However, the power to detect an interaction of size −0.05 remains very low (31.2%). Therefore, if the true interaction effect is −0.05, then the IPD meta-analysis is unlikely to have the power required even with 24 trials.
We note that sample size is not the only criterion that will impact upon a study’s contribution toward power. For a treatment-covariate interaction, the standard deviation of covariate values is also important [9]: other things being equal, those studies with larger variation in covariate values will have a greater contribution. For example, assuming a true interaction effect of −0.1, if we remove the Barakat study from the IPD meta-analysis of 24 trials, the power estimate is higher than if we remove the Wolff study, even though the latter has far fewer patients. The reason is that the standard deviation of BMI values is substantially larger in the Wolff study (Table 2).
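The role of covariate spread can be seen from a simple approximation, sketched here in Python. Assuming equal allocation, a common residual standard deviation and the same covariate spread in both arms, the variance of a trial's interaction estimate is roughly 4σ²/(n × SD²), so a trial's inverse-variance weight grows with the square of the covariate standard deviation. The function name and the numbers are hypothetical; they are not the values for the Barakat or Wolff studies.

```python
def approx_interaction_variance(n, covariate_sd, residual_sd):
    """Approximate variance of a within-trial treatment-covariate interaction,
    assuming 1:1 allocation and equal covariate spread in each arm."""
    return 4 * residual_sd ** 2 / (n * covariate_sd ** 2)

# Hypothetical trials: a small one with wide BMI spread, a large one with narrow spread
small_wide = approx_interaction_variance(n=100, covariate_sd=6.0, residual_sd=5.0)
large_narrow = approx_interaction_variance(n=300, covariate_sd=3.0, residual_sd=5.0)
```

In this made-up comparison the trial with 100 patients yields the more precise interaction estimate, despite having a third of the sample size.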
Extension to random effects meta-analysis and alternative confidence interval derivations
The above power calculations assume a fixed effect meta-analysis of interaction estimates and no between-study heterogeneity on the interaction effect. Also, our confidence intervals and p-values for the summary interaction effect were derived using the standard normal-based approach, but options such as HKSJ are also possible, as previously mentioned [15].
We therefore repeated our power calculations for the IPD meta-analysis of 24 trials using the same parameter values as shown in Table 3, except with nonzero heterogeneity on the interaction effect (\( {\tau}_{\lambda}^2 \) > 0) and with trial interaction estimates pooled using random effects meta-analysis model (9) via the DerSimonian and Laird approach. Confidence intervals and p-values were derived using the standard approach, but also using the HKSJ approach for comparison. We focus only on the situation where λ = −0.1, as this was the critical value for an 80% power as identified from the fixed effect simulations. A range of values for \( \tau_{\lambda} \) was considered, from 0.01 to 0.05, which covered low heterogeneity to large heterogeneity relative to an interaction effect of −0.1.
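For readers unfamiliar with these second-stage options, the following Python sketch shows the DerSimonian and Laird moment estimator and the HKSJ standard error in their usual textbook forms. It is an illustration with hypothetical function names, not the authors' code. The HKSJ confidence interval and p-value use a t-distribution with k − 1 degrees of freedom; the Python standard library has no t quantile function, so the critical value (about 2.07 for 24 trials) would need to be supplied from tables or another package.

```python
import math

def dersimonian_laird(estimates):
    """Return (pooled, se, tau2) from (estimate, variance) pairs using DL."""
    k = len(estimates)
    w = [1 / v for _, v in estimates]
    fixed = sum(wi * e for wi, (e, _) in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, (e, _) in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)            # moment estimate, truncated at zero
    w_re = [1 / (v + tau2) for _, v in estimates]
    pooled = sum(wi * e for wi, (e, _) in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2

def hksj_se(estimates, pooled, tau2):
    """HKSJ standard error; pair it with a t quantile on k - 1 degrees of freedom."""
    k = len(estimates)
    w = [1 / (v + tau2) for _, v in estimates]
    qstar = sum(wi * (e - pooled) ** 2 for wi, (e, _) in zip(w, estimates))
    return math.sqrt(qstar / ((k - 1) * sum(w)))

# Toy data: three trial estimates, each with variance 1
toy = [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0)]
pooled, se, tau2 = dersimonian_laird(toy)
se_hksj = hksj_se(toy, pooled, tau2)
```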
The findings are shown in Fig. 2, and the mean \( I^2 \) value was between 10% and 13% for all scenarios. Immediately apparent is that the power gradually reduces as the size of the between-trial heterogeneity increases, and it is now about 70% or less across the range of \( \tau_{\lambda} \) values. This is alarming, as it signals that a planned random effects IPD meta-analysis of the 24 trials would not have adequate power to detect an interaction of −0.1, even with only low heterogeneity. For example, with \( \tau_{\lambda} \) = 0.01 (mean \( I^2 \) = 10%), the estimated power is 70.8% when p-values and confidence intervals are derived using the standard normal-based approach, which is a reduction of more than 10 percentage points compared to that for the fixed-effect meta-analysis given no heterogeneity (which was 83.2%, Fig. 1). Interestingly, this is mainly due to poor estimation of the between-study variance itself, as we observed an upward bias in the estimate of \( \tau_{\lambda} \) across simulations, leading to wider confidence intervals and thus lower power than if \( \tau_{\lambda} \) was truly known. The bias arises because \( \tau_{\lambda} \) is especially problematic to estimate well, as the corresponding I-squared is about 10% and the true \( \tau_{\lambda} \) of 0.01 is close to zero. This leads to large variation in estimates of \( \tau_{\lambda} \) across the 10,000 simulated datasets, and because variance estimates are bounded at zero, their average value has a notable upward bias. Consequently, we observe lower power when \( \tau_{\lambda} \) is estimated than if we truly knew \( \tau_{\lambda} \). This reflects the cost of having to estimate the between-study variance in a random-effects model.
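The upward bias caused by truncating variance estimates at zero can be demonstrated with a small Python simulation, given here as a sketch under simplifying assumptions: all 24 trials are given the same within-trial standard error (0.17, a hypothetical value), trial interaction estimates are drawn directly rather than from simulated IPD, and the DerSimonian and Laird estimator is used. Function names are hypothetical.

```python
import math
import random

def dl_tau2(estimates):
    """DerSimonian and Laird estimate of tau^2, truncated at zero."""
    k = len(estimates)
    w = [1 / v for _, v in estimates]
    fixed = sum(wi * e for wi, (e, _) in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, (e, _) in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (k - 1)) / c)

def mean_tau2_estimate(true_tau, k, se, n_sims, rng):
    """Average truncated DL estimate over n_sims simulated meta-analyses."""
    sd_total = math.sqrt(true_tau ** 2 + se ** 2)   # between- plus within-trial spread
    total = 0.0
    for _ in range(n_sims):
        ests = [(rng.gauss(-0.1, sd_total), se ** 2) for _ in range(k)]
        total += dl_tau2(ests)
    return total / n_sims

rng = random.Random(42)
avg = mean_tau2_estimate(true_tau=0.01, k=24, se=0.17, n_sims=2000, rng=rng)
```

With a true variance of 0.01² = 0.0001, the average of the truncated estimates is expected to be noticeably larger, because negative moment estimates are set to zero while positive ones are left unchanged.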
The power is also consistently lower (by about 3%) when using the HKSJ approach rather than the standard approach (Fig. 2). This is expected, as standard 95% confidence intervals are typically too narrow (leading to a > 5% type I error rate), and the HKSJ correction aims to address this, usually leading to wider confidence intervals and larger p-values.
Extension to consider BMI as a binary variable
Out of interest, we also considered the power of a two-stage IPD meta-analysis of all 24 trials that instead includes baseline BMI as a binary covariate. To do this, the IPD were again simulated according to model (13) and thus continuous BMI effects were set as the truth, and with a true interaction of −0.1 assumed between the intervention and baseline BMI. However, upon application to the simulated IPD the two-stage IPD meta-analysis wrongly included baseline BMI as a binary covariate, with a BMI ≥ 30 classed as 1 and a BMI < 30 classed as 0. This dichotomisation corresponded to a true interaction of about −0.65 kg between the intervention effect and binary BMI, such that the group of individuals with a BMI ≥ 30 have, on average, a 0.65 kg further reduction in weight gain by using the intervention rather than control, in comparison to those with a BMI < 30.
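The link between the continuous and binary interaction can be sketched in Python. Under a linear interaction λ, the difference in treatment effect between the two BMI groups is λ multiplied by the difference in mean BMI between those groups. The sketch below assumes, purely for illustration, that BMI follows a single normal distribution with mean 30 and standard deviation 4 (hypothetical values); in the simulations BMI distributions differ across trials, so this will not reproduce the −0.65 exactly. The function name is hypothetical.

```python
from statistics import NormalDist

def implied_binary_interaction(lam, bmi_mean, bmi_sd, cut=30.0):
    """lam times the difference in mean BMI above versus below the cut-point,
    for BMI ~ Normal(bmi_mean, bmi_sd^2)."""
    z = (cut - bmi_mean) / bmi_sd
    std = NormalDist()
    pdf, cdf = std.pdf(z), std.cdf(z)
    mean_above = bmi_mean + bmi_sd * pdf / (1 - cdf)   # truncated-normal mean, BMI >= cut
    mean_below = bmi_mean - bmi_sd * pdf / cdf         # truncated-normal mean, BMI < cut
    return lam * (mean_above - mean_below)

binary_effect = implied_binary_interaction(lam=-0.1, bmi_mean=30.0, bmi_sd=4.0)
```

Under these assumed values the two groups differ in mean BMI by about 6.4 units, giving a binary interaction of about −0.64 kg, which is of the same order as the value quoted above.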
When there was no heterogeneity in the interaction effect, and a two-stage fixed effect IPD meta-analysis was applied to the simulated IPD from the 24 trials, the estimated power to detect this interaction was 60.5%. This is over a 20% reduction in power compared to when baseline BMI was analysed correctly as a continuous variable (83.2%), emphasising a huge loss of information by wrongly dichotomising BMI (Fig. 2). Indeed, the estimated power of 60.5% is now similar to that for the original IPD meta-analysis of just 14 trials when baseline BMI was analysed correctly as continuous (59.2%). Therefore, in this particular example, the loss of power by dichotomising baseline BMI in the IPD meta-analysis of 24 trials is similar to throwing away IPD from 10 trials. The cost of dichotomisation is well known in single studies [33, 34], and the results here emphasise that it also generalises to the IPD meta-analysis setting.
Findings are similar in the settings with heterogeneity in the interaction effect, with power estimates now less than 50% compared to about 65–70% when analysed correctly as continuous (Fig. 2).
Consideration of an analysis of covariance approach
Due to the published information available in each trial, our power calculations assumed interaction estimates are derived from a change score analysis, as this was the typical approach taken and reported for each trial. These power estimates may be deemed conservative, as after IPD are obtained it is probable that interaction estimates could be derived from an ANCOVA, which is potentially more powerful. However, the correlation between final weight and baseline pregnancy weight is extremely high (often > 0.9) and Vickers and Altman note that: [35] “the efficiency gains of analysis of covariance compared with a change score are low when there is a high correlation (say r>0.8) between baseline and follow up measurements. This will often be the case, particularly in stable chronic conditions such as obesity. In these situations, analysis of change scores can be a reasonable alternative ...”. This reassures us that power calculations based on the change score approach are pertinent here. However, we would advocate that when IPD is obtained, the ANCOVA approach is the analysis of choice as it adjusts for any baseline imbalance in addition to improving power [19].
Adjustment for additional covariates
Given the potentially inadequate power (< 70%, Fig. 2) when there is heterogeneity, it may be of interest to prespecify the inclusion of additional covariates (prognostic factors) in the first stage of the two-stage IPD meta-analysis. Inclusion of prognostic factors would reduce the residual variance in each trial, leading to more precise interaction estimates and potentially larger power. So far the chosen size of residual variances (\( {\sigma}_i^2 \)) was based on the variance of weight gain, as reported in publications (Table 3); however, this is potentially conservative given that baseline BMI was also included as a covariate in the data generating model (13). There are also other prognostic factors in this field, such as age and parity, which could be included.
We therefore repeated our simulations of power in the IPD meta-analysis of 24 trials when residual variances were reduced by between 10% and 90% in each trial. For brevity, we again focus on a true interaction effect of −0.1, across a range of values on the between-study standard deviation (\( \tau_{\lambda} \)). The results in Fig. 3 show that the power improves as the residual variances decrease, and thus prespecified adjustment for prognostic factors is recommended. However, the power only consistently exceeds 80% across the entire range of \( \tau_{\lambda} \) values when the reduction in residual variances is at least 40%.
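The direction of this effect can be anticipated with a crude normal approximation, sketched here in Python. It assumes k equally informative trials, a known between-study standard deviation, and a within-trial variance of the interaction estimate that shrinks in proportion to the residual variance. The function name and the within-trial variance of 0.028 are hypothetical, the latter chosen so that the approximation gives about 83% power with no heterogeneity. Because it treats the between-study variance as known, this sketch will overstate power relative to simulations in which that variance has to be estimated.

```python
import math
from statistics import NormalDist

def approx_power(lam, within_var, tau, k, reduction, alpha=0.05):
    """Normal-approximation power for a pooled interaction of size lam.

    within_var: variance of one trial's interaction estimate before adjustment.
    reduction:  proportional reduction in residual variance (0 to 1).
    """
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    se = math.sqrt((within_var * (1 - reduction) + tau ** 2) / k)
    ratio = abs(lam) / se
    return std.cdf(ratio - z) + std.cdf(-ratio - z)

base = approx_power(lam=-0.1, within_var=0.028, tau=0.0, k=24, reduction=0.0)
adjusted = approx_power(lam=-0.1, within_var=0.028, tau=0.05, k=24, reduction=0.4)
unadjusted = approx_power(lam=-0.1, within_var=0.028, tau=0.05, k=24, reduction=0.0)
```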
Had this been known to the iWIP researchers when planning their IPD project, it could have motivated them to identify the strongest prognostic factors in this field, and to ascertain the likely percentage reduction in residual variance from including them (e.g. by obtaining IPD from one trial and comparing the residual variances before and after inclusion of prognostic factors).
Discussion
IPD meta-analyses are widely considered the gold standard in meta-analysis, and an increasing number are being funded to examine subgroup effects and interactions. However, it is currently rare to see power addressed in IPD meta-analysis grant applications or protocols. Yet power and sample size considerations are pivotal, as an IPD meta-analysis is costly and time-consuming, and so resources are better allocated to those projects that are adequately powered to detect effects of interest. Even when IPD are available for all studies, the power may not be adequate. Conversely, whilst ensuring selection biases are avoided [36], IPD may not be needed from all studies if a representative subset of trials has large power (e.g. > 95%), which could save considerable time, costs and frustration [7, 8].
To address this, here we outlined a simulation-based approach to power calculations for IPD meta-analysis that utilises a two-stage IPD meta-analysis framework. We demonstrated the approach for continuous outcomes, using a planned IPD meta-analysis of pregnancy trials (iWIP), and showed that IPD from 14 trials was unlikely to have adequate power to detect a treatment-BMI interaction unless the effect was very large (Fig. 1). However, IPD from 24 trials was identified to have over 80% power to detect an interaction of −0.1 or larger in magnitude, assuming a fixed-effect meta-analysis was appropriate. Had this information been available at the time, it would have helped the iWIP collaboration to justify the costs and resources needed to collate and meta-analyse IPD from 24 trials. Nevertheless, there would remain a concern that even low heterogeneity on the interaction effect would have reduced the power to 70% or less (Fig. 2) when a random-effects model was used. Therefore, we also showed the potential gain in power by including prognostic factors in the analysis, which would increase power to over 80% even with heterogeneity (Fig. 3), and thus motivates the identification and prespecification of prognostic factors for inclusion in the IPD meta-analysis. If the true relationships for BMI are linear, the power calculations also made it clear that baseline BMI should be analysed as a continuous variable, as the power is reduced dramatically when BMI is wrongly (and arbitrarily) dichotomised at 30 (Fig. 2). Of course, after IPD is obtained, one may instead examine nonlinear trends, using splines for example. Our Stata code can be easily modified to generate IPD assuming nonlinear trends and interactions, if that is considered plausible. However, unless there is evidence to the contrary, the assumption of linearity would appear a sensible starting point when considering potential power prior to the IPD being collected.
Our Stata module for the continuous outcome setting of the iWIP meta-analysis is available in the supplementary material, and requires inputs as shown in Table 1 (see Additional file 1). Users will need to tailor this for their own IPD projects, as outlined by the four-step process of Section 3. Extension to binary or survival outcomes would require consideration of event prevalence and event rates, respectively, and the latter would also require assumptions about the distribution of survival times (shape of hazard function), censoring and length of follow-up [37]. Table 4 provides key details about how to extend the approach to binary and time-to-event outcomes. Each IPD meta-analysis project is unique, and the simulation-based approach will need to be tailored to the information and setting at hand, as with standard power calculations for single trials. For example, in our application the mean and standard deviation of baseline BMI values were not known for all trials, and thus our module needed to generate BMI values differently for these trials compared to the others.
Simulation-based power calculations have been proposed by many others before us [38, 39, 40], including for random-effects models in general [41], and within the IPD meta-analysis field [12]. However, the novel aspect of our work is that it is based on a two-stage IPD meta-analysis framework [18, 42]. One-stage and two-stage approaches to IPD meta-analysis usually give similar results if their assumptions and estimation methods agree [18]. The main disadvantage of the two-stage approach is when there are rare events and/or small sample sizes, as then continuity corrections may be required and the assumption of ‘known’ within-study variances is likely to be inappropriate [18]. However, the two-stage approach also has many advantages. Firstly, it is relatively quick, and in particular facilitated by the excellent module ‘ipdmetan’ within Stata [43], which undertakes both stages automatically. Secondly, in the second stage it utilises well-known meta-analysis approaches, such as inverse variance weighted fixed effect and random effects analyses, and enables a variety of estimation methods, such as REML and the DerSimonian and Laird method as desired. Indeed, in our applied example we showed how the user can examine power for their own preferred approach and estimation methods. Thirdly, it allows novel options such as HKSJ for deriving p-values and confidence intervals, which have been shown to improve type I error rates (and thus will give more appropriate power results) [14, 15, 44]. Fourthly, and perhaps most importantly, it automatically avoids using across-trial information to inform treatment-covariate interactions, as these are estimated separately in each trial.
In contrast, one-stage models utilise both within-trial and across-trial information toward interaction estimates unless covariates are centred, and this would lead to wrongly inflated power estimates, as utilising across-trial information is strongly discouraged, being prone to ecological bias and study-level confounding [16, 17]. Indeed, two competing options to power calculations by Kontopantelis et al. [12] and by Kovalchik et al. [10, 11] utilise a one-stage IPD meta-analysis framework amalgamating within-trial and across-trial interactions. That being said, these are otherwise excellent alternative options for considering power for IPD meta-analysis, which use simulation or analytic methods. Our approach is somewhat faster than the ‘ipdpower’ module of Kontopantelis et al., as the two-stage framework is typically faster than the one-stage framework, due to the large number of parameters usually estimated simultaneously in the one-stage approach. Indeed, as noted by Kontopantelis et al. in their online help file, one-stage models are also prone to convergence problems, and for complex models (with multiple random effects) “nonconvergence is more frequent than convergence.” The analytic approach of Kovalchik et al. is restricted to a fixed interaction effect, and so is limited when heterogeneity is of interest, and does not accommodate adjustment for prognostic factors. Further research comparing power in the context of two-stage and one-stage approaches would be welcome.
Simmonds and Higgins also provide algebraic solutions for the power of an IPD meta-analysis of continuous outcomes, under certain conditions, for both a one-stage IPD meta-analysis (that amalgamates within-trial and across-trial interactions) and a two-stage IPD meta-analysis [9]. However, these are based on strong assumptions, in particular no heterogeneity of overall treatment effects or interactions, the same number of patients in each treatment group within a trial, and same residual variances in all trials. The beauty of a simulation-based approach is that such assumptions can be easily relaxed, whereas an algebraic approach quickly becomes intractable, especially for non-normal outcomes. For example, simulations can be adapted to allow non-continuous outcomes (binary, survival, ordinal, etc), non-normal distributions for continuous covariates, multiple adjustment terms, nonlinear trends, and multiple (even correlated) random-effects terms, as desired. This is at the expense of increased computational time, although 1000 simulations for our example would rarely take longer than 3 min for a particular set of inputs. The number of simulations required could be reduced in particular cases, with researchers able to calculate the number of simulations needed to achieve a given precision on the estimated power of their IPD meta-analysis. Our approach also could be extended to incorporate study-level covariates in the data generating model. This would allow true treatment and interaction effects in each trial to be tailored to study-level covariates, whereas we currently generate them randomly. Importantly, although we focussed on IPD meta-analysis of randomised trials, the simulation-based approach could be equally used to estimate power for other IPD meta-analysis research, such as prognostic factor research [45].
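The number of simulations needed for a given precision on the power estimate can be worked out from the binomial standard error, as in the following Python sketch. It assumes a Wald-type 95% confidence interval and requires a provisional guess at the power; the function name is hypothetical.

```python
import math

def simulations_needed(expected_power, half_width, z=1.96):
    """Smallest number of simulations for which the Wald 95% confidence interval
    for the power has a half-width no larger than half_width."""
    return math.ceil(z ** 2 * expected_power * (1 - expected_power) / half_width ** 2)

n_tight = simulations_needed(expected_power=0.8, half_width=0.01)
n_loose = simulations_needed(expected_power=0.8, half_width=0.02)
```

Halving the required precision reduces the number of simulations by roughly a factor of four, which may be worthwhile when each simulated meta-analysis is slow to fit.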
Conclusions
In summary, we encourage researchers and funders to make assessments of power when planning or commissioning an IPD meta-analysis project. We propose a simulation-based approach to do this, utilising a two-stage IPD meta-analysis framework, as illustrated here for continuous outcomes. This overcomes the need for deriving analytic solutions, and is flexible enough to be tailored to each IPD meta-analysis project at hand. In particular, the user can evaluate power based on chosen statistical models and estimation methods, whilst utilising existing aggregate data about the set of trials promising their IPD. This informs how much IPD is required and helps reveal whether the IPD project is worth the investment.
Abbreviations
ANCOVA: Analysis of covariance
BMI: Body mass index
HKSJ: Hartung-Knapp-Sidik-Jonkman
HTA: Health technology assessment
IPD: Individual participant data
iWIP: Weight Management in Pregnancy International IPD Collaboration
ML: Maximum likelihood
MoM: Method of moments
NIHR: National Institute for Health Research
REML: Restricted maximum likelihood
References
 1.
Riley RD, Lambert PC, AboZaid G. Metaanalysis of individual participant data: rationale, conduct, and reporting. BMJ. 2010;340:c221.
 2.
Simmonds M, Stewart G, Stewart L. A decade of individual participant data metaanalyses: a review of current practice. Contemp Clin Trials. 2015;45:76–83.
 3.
Huang Y, Mao C, Yuan J, et al. Distribution and epidemiological characteristics of published individual patient data metaanalyses. PLoS One. 2014;9:e100151.
 4.
Krumholz HM. Why data sharing should be the expected norm. BMJ. 2015;350:h599.
 5.
Hingorani AD, Windt DA, Riley RD, et al. Prognosis research strategy (PROGRESS) 4: stratified medicine research. BMJ. 2013;346:e5793.
 6.
Brookes ST, Whitely E, Egger M, Smith GD, Mulheran PA, Peters TJ. Subgroup analyses in randomized trials: risks of subgroupspecific analyses; power and sample size for the interaction test. J Clin Epidemiol. 2004;57:229–36.
 7.
Hróbjartsson A. Why did it take 19 months to retrieve clinical trial data from a nonprofit organisation? BMJ. 2013;347.
 8.
Altman DG, Trivella M, Pezzella F, Harris AL and Pastorino U. Systematic review of multiple studies of prognosis: the feasibility of obtaining individual patient data. In: Auget JL, Balakrishnan N, Mesbah M, Molenberghs G, (eds.). Advances in statistical methods for the health sciences Boston: Birkhäuser, 2006, p. 3–18.
 9.
Simmonds MC, Higgins JP. Covariate heterogeneity in metaanalysis: criteria for deciding between metaregression and individual patient data. Stat Med. 2007;26:2982–99.
 10.
Kovalchik SA. Aggregatedata estimation of an individual patient data linear random effects metaanalysis with a patient covariatetreatment interaction term. Biostatistics. 2013;14:273–83.
 11.
Kovalchik SA, Cumberland WG. Using aggregate data to estimate the standard error of a treatmentcovariate interaction in an individual patient data metaanalysis. Biom J. 2012;54:370–84.
 12.
Kontopantelis E, Springate DA, Parisi R, Reeves D. SimulationBased Power Calculations for Mixed Effects Modeling: ipdpower in Stata. J Stat Softw. 2016;1(12).
 13.
DerSimonian R, Laird N. Metaanalysis in clinical trials. Control Clin Trials. 1986;7:177–88.
 14.
Hartung J, Knapp G. A refined method for the metaanalysis of controlled clinical trials with binary outcome. Stat Med. 2001;20:3875–89.
 15.
IntHout J, Ioannidis JP, Borm GF. The HartungKnappSidikJonkman method for random effects metaanalysis is straightforward and considerably outperforms the standard DerSimonianLaird method. BMC Med Res Methodol. 2014;14:25.
 16.
Fisher DJ, Copas AJ, Tierney JF, Parmar MK. A critical review of methods for the assessment of patientlevel interactions in individual participant data metaanalysis of randomized trials, and guidance for practitioners. J Clin Epidemiol. 2011;64:949–67.
 17.
Hua H, Burke DL, Crowther MJ, Ensor J, Tudur Smith C, Riley RD. Onestage individual participant data metaanalysis models: estimation of treatmentcovariate interactions must avoid ecological bias by separating out withintrial and acrosstrial information. Stat Med. 2017;36:772–789.
 18.
Burke DL, Ensor J, Riley RD. Metaanalysis using individual participant data: onestage and twostage approaches, and why they may differ. Stat Med. 2016;
 19.
Riley RD, Kauser I, Bland M, et al. Metaanalysis of randomised trials with a continuous outcome according to baseline imbalance and availability of individual participant data. Stat Med. 2013;32:2747–66.
 20.
Whitehead A, Whitehead J. A general parametric approach to the metaanalysis of randomized clinical trials. Stat Med. 1991;10:1665–77.
 21.
Higgins JPT, Green S. Cochrane handbook for systematic reviews of interventions: John Wiley & Sons. 2011.
 22.
Veroniki AA, Jackson D, Viechtbauer W, et al. Methods to estimate the betweenstudy variance and its uncertainty in metaanalysis. Res Synth Methods. 2016;7:55–79.
 23.
Kontopantelis E, Reeves D. Performance of statistical methods for metaanalysis when true study effects are nonnormally distributed: a simulation study. Stat Methods Med Res. 2012;21:409–26.
 24.
DerSimonian R, Kacker R. Randomeffects model for metaanalysis of clinical trials: an update. Contemp Clin Trials. 2007;28:105–14.
 25.
Knapp G, Hartung J. Improved tests for a random effects metaregression with a single covariate. Stat Med. 2003;22:2693–710.
 26.
Hartung J. An alternative method for metaanalysis. Biom J. 1999;41:901–16.
 27.
Hartung J, Knapp G. On tests of the overall treatment effect in metaanalysis with normally distributed responses. Stat Med. 2001;20:1771–82.
 28.
Sidik K, Jonkman JN. A simple confidence interval for metaanalysis. Stat Med. 2002;21:3153–9.
 29.
Sidik KJ, J. N. On constructing confidence intervals for a standardized mean difference in metaanalysis. Comm StatistSimulation Comput. 2003;32:1191–203.
 30.
Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934;26:404–13.
 31.
Thangaratinam S, Rogozinska E, Jolly K, et al. Effects of interventions in pregnancy on maternal weight and obstetric outcomes: metaanalysis of randomised evidence. BMJ. 2012;344:e2088.
 32.
Kahan BC, Rushton H, Morris TP, Daniel RM. A comparison of methods to adjust for continuous covariates in the analysis of randomised trials. BMC Med Res Methodol. 2016;16:42.
 33.
Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006;25:127–41.
 34.
Altman DG, Royston P. Statistics notes: the cost of dichotomising continuous variables. BMJ. 2006;332:1080.
 35.
Vickers AJ, Altman DG. Statistics notes: Analysing controlled trials with baseline and follow up measurements. BMJ. 2001;323:1123–4.
 36.
Ahmed I, Sutton AJ, Riley RD. Assessment of publication bias, selection bias and unavailable data in metaanalyses using individual participant data: a database survey. BMJ. 2012;344:d7762.
 37.
Crowther MJ, Lambert PC. Simulating biologically plausible complex survival data. Stat Med. 2013;32:4118–34.
 38.
Arnold BF, Hogan DR, Colford JM Jr, Hubbard AE. Simulation methods to estimate design power: an overview for applied research. BMC Med Res Methodol. 2011;11:94.
 39.
Landau S, Stahl D. Sample size and power calculations for medical studies by simulation when closed form expressions are not available. Stat Methods Med Res. 2013;22:324–45.
 40.
Feiveson AH. Power by simulation. Stata J. 2002;2(2):107–24.
 41.
Browne WJ, Golalizadeh LM, Parker RMA. A guide to sample size calculation for random effects models via simulation and the MLPowSim software package. University of Bristol. 2009.
 42.
Simmonds MC, Higgins JPT, Stewart LA, Tierney JF, Clarke MJ, Thompson SG. Meta-analysis of individual patient data from randomized trials: a review of methods used in practice. Clinical Trials. 2005;2:209–17.
 43.
Fisher DJ. Two-stage individual participant data meta-analysis and generalized forest plots. Stata J. 2015;15:369–96.
 44.
Partlett C, Riley RD. Random effects meta-analysis: coverage performance of 95% confidence and prediction intervals following REML estimation. Stat Med. 2017;36:301–17.
 45.
Abo-Zaid G, Sauerbrei W, Riley RD. Individual participant data meta-analysis of prognostic factor studies: state of the art? BMC Med Res Methodol. 2012;12:56.
Acknowledgements
We thank Professor Shakila Thangaratinam for helpful feedback on an earlier version of the article.
Funding
Danielle Burke is funded by an NIHR School for Primary Care Research Post-Doctoral Fellowship. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
Availability of data and materials
Stata code to generate the simulated datasets used in this article is provided as Additional file 1.
Author information
Contributions
RR and JE developed the research question. JE and RR developed the simulation code to implement the methods and calculate power for IPD meta-analysis. RR, JE, DLB, KIES and KH developed the method and designed the power by simulation approach. JE applied the approach to the example. JE and RR wrote the first draft of the manuscript, and all authors contributed to subsequent revisions. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not Applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1:
Stata simulation program code. Stata code to simulate power for IPD meta-analysis as proposed in this article. (PDF 158 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Ensor, J., Burke, D.L., Snell, K.I.E. et al. Simulation-based power calculations for planning a two-stage individual participant data meta-analysis. BMC Med Res Methodol 18, 41 (2018). https://doi.org/10.1186/s12874-018-0492-z
Keywords
 Treatment-covariate Interactions
 Simulation-based Approach
 True Interaction Effect
 Change Score Model
 Interaction Estimates