 Research article
 Open Access
When does the use of individual patient data in network meta-analysis make a difference? A simulation study
BMC Medical Research Methodology volume 21, Article number: 21 (2021)
Abstract
Background
The use of individual patient data (IPD) in network meta-analyses (NMA) is rapidly growing. This study aimed to determine, through simulations, the impact of select factors on the validity and precision of NMA estimates when combining IPD and aggregate data (AgD) relative to using AgD only.
Methods
Three analysis strategies were compared via simulations: 1) AgD NMA without adjustments (AgD-NMA); 2) AgD NMA with meta-regression (AgD-NMA-MR); and 3) IPD-AgD NMA with meta-regression (IPD-NMA). We compared 108 parameter permutations: number of network nodes (3, 5 or 10); proportion of treatment comparisons informed by IPD (low, medium or high); equal size trials (2-armed with 200 patients per arm) or larger IPD trials (500 patients per arm); sparse or well-populated networks; and type of effect-modification (none, constant across treatment comparisons, or exchangeable). Data were generated over 200 simulations for each combination of parameters, each using linear regression with Normal distributions. To assess model performance and estimate validity, the mean squared error (MSE) and bias of treatment-effect and covariate estimates were collected. Standard errors (SE) and percentiles were used to compare estimate precision.
Results
Overall, IPD-NMA performed best in terms of validity and precision. The median MSE was lower with IPD-NMA in 88 of 108 scenarios (similar results otherwise). On average, the IPD-NMA median MSE was 0.54 times the median using AgD-NMA-MR. Similarly, the SEs of the IPD-NMA treatment-effect estimates were 1/5 the size of the AgD-NMA-MR SEs. The magnitude of the superior validity and precision of IPD-NMA varied across scenarios and was associated with the amount of IPD. Using IPD in small or sparse networks consistently led to improved validity and precision; however, in large/dense networks IPD tended to have negligible impact if too few IPD were included. Similar results also apply to the meta-regression coefficient estimates.
Conclusions
Our simulation study suggests that the use of IPD in NMA will considerably improve the validity and precision of estimates of treatment effects and regression coefficients in most NMA IPD data scenarios. However, IPD may not add meaningful validity and precision to NMAs of large and dense treatment networks when negligible IPD are used.
Background
The use of network meta-analysis (NMA) has grown exponentially over the past few years [1]. With its increased use has come a number of methodological developments, including the expansion from aggregate data (AgD) to the combined use of individual patient data (IPD) and AgD [2]. Many of these newer methods have been highlighted in the recent and highly influential National Institute for Health and Care Excellence (NICE) Technical Support Document 18 [3]. Chief among them are population-adjusted indirect comparisons (PAIC), such as matching-adjusted indirect comparisons (MAIC) [3]. The NICE guidance demonstrates that PAICs can be used in connected networks to adjust for imbalances in effect-modifiers, and in disconnected networks to adjust for both effect-modifiers and other prognostic factors. While both are feasible, the guidance emphasizes that PAIC is better suited to connected networks, where the aim is to adjust only for imbalances in effect-modifiers. The difference is akin to that between using randomized trials for causal inference and using propensity score-adjusted analyses of observational studies. PAIC has seen considerable uptake, both in the general research community [4] and within the health technology assessment (HTA) community [5]. The latter has been particularly important in the current pharmaceutical climate, in which treatments are often fast-tracked through development on the strength of very promising early results, which can lead to non-comparative studies. While the first phase of uptake of IPD within HTA submissions has focused principally on disconnected networks, a consistent criticism of such analyses has been the lack of adjustment for prognostic factors [5].
NICE guidance acknowledges that there are numerous ways to combine IPD and AgD in addition to PAIC. These methods are restricted to connected networks, thus avoiding the criticism regarding the need to adjust for prognostic factors. Primarily, they include: a two-stage approach, where data are first transformed into AgD only and then analyzed traditionally; a one-stage approach, where IPD and AgD are analyzed simultaneously; and other, less developed methods, such as hierarchical meta-regression, which may reduce some forms of bias. Despite this acknowledgement, little guidance is provided for these methods, in part due to the lack of evidence surrounding their performance. Thus, while the uptake of PAIC is partially due to its ability to handle disconnected networks, it is also due to the clearer guidance that has been provided for it. It is anticipated that as a better understanding of other IPD-AgD methods becomes available, and better guidance is provided in turn, the uptake of these other methods will increase substantially.
In NMA, effect-modification becomes a problem when one or more variables that modify the treatment-effect, dubbed effect-modifiers, are imbalanced across different edges of the network. To better adjust for imbalances in effect-modifiers in networks of evidence, the one-stage approach seems destined to play a larger role in evidence synthesis. The reason is twofold. First, relative to the two-stage approach, a one-stage approach takes full advantage of the data in a single analysis rather than adjusting each IPD trial separately. Second, it allows for adjustments in larger networks than MAIC, which can only make adjustments in small networks (3 nodes at a time).
Empirically, the use of IPD in NMA has generally been seen to have a large impact on results; however, that is not always the case [2, 4, 6]. A number of factors could explain why IPD adjustments can have minimal impact on evidence synthesis results. Ideally, the underlying reason is that the evidence base is not imbalanced with respect to effect-modifiers (i.e., no adjustments to the data are needed). Alternatively, there may simply not be enough IPD to make a meaningful impact. Donegan et al. conducted a one-stage NMA in which IPD made up 30% of patients and 25% of trials, covering much of the network geometry, and found a meaningful impact [6]. In their discussion, the authors state: “It would be interesting to compare the proposed approach with AD meta-analysis of all studies, while varying the number of studies that contribute IPD, to establish whether equally dramatic improvements are observed.” [6]
In this study, which was part of a doctoral thesis [7], we aim to determine through simulations ‘how much IPD is enough to make a difference’. Put in more concrete terms, we aim to examine whether select factors predict if the use of IPD will improve the validity and precision of estimates of comparative treatment-effects and meta-regression coefficients within NMA. These include factors specific to the IPD, namely the proportion of treatment comparisons and the number of patients for which IPD are available; factors pertaining to the network of evidence, namely its number of nodes and whether it is sparsely populated; and the presence or absence of effect-modification, and whether it was fully or partially shared across the comparisons in the network.
Methods
We performed simulations of several AgD-IPD NMA data scenarios by varying the following data properties across scenarios: proportion of treatment comparisons with IPD, proportion of patients for which IPD are available, number of nodes in the network, network density, and the nature of effect-modification in the network [8]. To ensure that observed differences could be attributed to these parameters, each was varied individually while all other factors were kept constant. Consistently across all individual simulated scenarios, three NMA models were used to analyze the data: 1) AgD NMA without adjustments (AgD-NMA); 2) AgD NMA with meta-regression (AgD-NMA-MR); and 3) IPD-AgD NMA with meta-regression (IPD-NMA).
Simulation model parameters – outline and rationale
The simulation model parameters are described in Table 1. The values for the proportion of treatment comparisons and network density in the table were purposely broad because their meanings are codependent (see Data generation below). Varying the number of nodes in the network (3, 5 and 10 nodes) was motivated primarily by the desire to understand the impact of IPD in larger networks, which is not feasible using MAIC. The trial-size parameter was included to better differentiate the number of patients with IPD from the proportion of treatment comparisons with IPD. Together, these factors are critical to answering the motivating question: how much IPD is needed to make a difference?
The type of effect-modification is an intrinsic characteristic describing how the covariates affect the treatment-effects. It would not be known to the researcher, except through clinical hypothesizing. Figure 1 presents the three types of effect-modification that were tested. Constant effect-modification implies that the relationship between outcome and covariate is shared by all trials. Exchangeable effect-modification implies that the relationship is contrast-specific (shared by trials with the same comparison). While the slopes differ for each contrast, they are drawn from a common distribution of slopes. Thus, in the exchangeable model there is a shared mean effect-modification, namely the mean of the distribution of possible effect-modifications.
Data generation
For simplicity, only two-arm trials were simulated. By default, all trials included 200 patients per arm; if the IPD trial size was set to large, the trials selected as having IPD instead had 500 patients per arm. The first step in creating the data was to determine the number of trials in the network (N) and the number of patients per trial arm (n_{i}). Figure 2 depicts the number of trials used according to the combination of number of nodes and network density (see Web Appendix for precise counts). This step allowed the treatment (t) matrix for the AgD to be built, and hence the empty y, se and n matrices to be constructed and filled in subsequently. The treatment matrix comprised the treatment numbers t_{jk} for each arm k of trial j.
The second step was to identify the IPD trials. The number of treatment comparisons with IPD was not fully fixed. When set to low, a single treatment comparison was selected to have IPD regardless of the number of nodes. When set to medium, this increased to 2 treatment comparisons for 3-node networks, 3 for 5-node networks, and 4 for 10-node networks. When set to high, the number of treatment comparisons selected to have IPD was all 3 for 3-node networks, 4–6 for 5-node networks, and 5–15 for 10-node networks. Following this, a random sample of trials along these treatment comparisons was selected to have IPD. The random selection of the number of trials helped vary the proportion of patients for whom IPD were available, which is a central feature of the research question.
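The IPD-selection rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Web Appendix code (the analyses were run in R): the helper names (`n_ipd_comparisons`, `arm_size`, `pick_ipd_trials`) and the `trials_by_edge` layout are hypothetical, and where the text gives a range of comparisons or leaves the number of IPD trials per comparison open, uniform random draws are assumed.

```python
import random

# Number of treatment comparisons (edges) with IPD, by proportion-IPD setting
# and network size, as stated in the text. Where the text gives a range
# (high setting, 5- and 10-node networks) we ASSUME a uniform integer draw.
IPD_EDGES = {
    "low": {3: (1, 1), 5: (1, 1), 10: (1, 1)},
    "medium": {3: (2, 2), 5: (3, 3), 10: (4, 4)},
    "high": {3: (3, 3), 5: (4, 6), 10: (5, 15)},
}


def n_ipd_comparisons(level, n_nodes, rng):
    lo, hi = IPD_EDGES[level][n_nodes]
    return rng.randint(lo, hi)  # randint is inclusive at both ends


def arm_size(has_ipd, large_ipd_trials):
    # 200 patients per arm by default; 500 per arm for IPD trials when the
    # "larger IPD trials" setting is switched on.
    return 500 if (has_ipd and large_ipd_trials) else 200


def pick_ipd_trials(trials_by_edge, level, n_nodes, rng):
    """Return the set of trial ids that contribute IPD.

    trials_by_edge: hypothetical mapping {(b, k): [trial ids]} for the
    comparisons present in the simulated network.
    """
    edges = sorted(trials_by_edge)
    n_edges = min(n_ipd_comparisons(level, n_nodes, rng), len(edges))
    ipd = set()
    for edge in rng.sample(edges, n_edges):
        trials = trials_by_edge[edge]
        # ASSUMED: a uniform random number (at least one) of this edge's trials get IPD
        n_trials = rng.randint(1, len(trials))
        ipd.update(rng.sample(trials, n_trials))
    return ipd


rng = random.Random(2021)
network = {(1, 2): [0, 1, 2], (1, 3): [3, 4], (2, 3): [5]}  # toy 3-node network
print(pick_ipd_trials(network, "medium", 3, rng))
```

Drawing the number of IPD trials per selected comparison at random is what lets the proportion of patients with IPD vary independently of the proportion of comparisons with IPD.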
Having identified the number of trials and patients, and having assigned them to treatment arms, it was then possible to construct the observed outcomes. Without loss of generality, the generated data were continuous rather than dichotomous, count or otherwise, principally for computing speed. The observed data comprised the mean change in the predefined outcome of interest, y_{jk}, in the case of aggregate data, or y_{ijk} in the case of IPD (a separate observation for each patient i); together with a standard error se_{jk} for the mean change in each arm for aggregate data, or a standard deviation sd_{ijk} in the case of IPD.
For an IPD trial with no effect-modification, the data were generated using:
Where μ_{j} is the study effect, generated from a uniform distribution between −3 and 6. The treatment-effects were held constant from replication to replication. To be clear, d_{5} was only used in instances where the number of nodes was set to 5 or 10. The standard deviation was set to 1.0 and the heterogeneity was set to 0.03, which can be considered moderate.
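Because the generating equation is not reproduced here, the sketch below assumes a conventional arm-based form for one two-arm IPD trial: patient outcomes are Normal around the study effect μ_j in the control arm b and around μ_j plus a trial-specific effect in arm k. The U(−3, 6) study effect and the residual standard deviation of 1.0 come from the text. Treating the heterogeneity of 0.03 as a variance, the function name `simulate_ipd_trial`, and the treatment-effect values in `d` are all assumptions.

```python
import math
import random


def simulate_ipd_trial(d, b, k, n_per_arm, rng, tau2=0.03, sd=1.0):
    """Simulate one two-arm IPD trial (b vs k) with no effect-modification.

    d: treatment effects relative to the reference treatment (d[1] == 0);
       the values passed in are hypothetical.
    tau2: between-study heterogeneity, ASSUMED here to be a variance.
    """
    mu = rng.uniform(-3, 6)  # study effect mu_j ~ U(-3, 6)
    delta = rng.gauss(d[k] - d[b], math.sqrt(tau2))  # trial-specific effect, k vs b
    arm_b = [rng.gauss(mu, sd) for _ in range(n_per_arm)]
    arm_k = [rng.gauss(mu + delta, sd) for _ in range(n_per_arm)]
    return mu, delta, arm_b, arm_k


rng = random.Random(7)
d = {1: 0.0, 2: 0.5, 3: 1.0}  # hypothetical treatment effects
mu, delta, arm_b, arm_k = simulate_ipd_trial(d, b=1, k=3, n_per_arm=200, rng=rng)
```

Under this assumed form the trial-specific effect is centred on d_k − d_b, so keeping `d` fixed across replications matches the statement that the treatment-effects were constant from replication to replication.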
For an AgD trial with no effect-modification, the data were generated using:
Effect-modification was the result of a single covariate, X, generated in two steps. To ensure variability at the aggregate level, the aggregate values of X, x.agg_{j}, were generated from a normal distribution with a mean of 0.75 and a standard deviation of 0.25. For trials that had IPD, the covariate values x_{ijk} were generated from a uniform distribution centered at x.agg_{j} and extending 0.35 in either direction. When constant effect-modification was called for, the data for IPD trials were generated in the following manner:
Where β_{1} was set to 0.5 and β_{2} was set to 1. For aggregate trials, the data were generated in the same way and then aggregated. When the effect-modification was set to exchangeable, the slopes for each treatment were generated from a normal distribution. For full transparency, the code used to generate the data is provided in the Web Appendix.
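The two-step covariate generation and the aggregation of patient-level outcomes into arm-level summaries can be sketched as follows. The distribution parameters are those stated in the text. The function names are hypothetical, and the aggregation assumes the usual arm mean with standard error sd/√n. This is an illustration, not the Web Appendix code.

```python
import random
import statistics


def draw_covariates(n_trials, n_per_arm, ipd_trials, rng):
    """Covariate X in two steps, following the text.

    x_agg[j] ~ Normal(mean 0.75, SD 0.25) for every trial j;
    for IPD trials, x_ijk ~ Uniform(x_agg[j] - 0.35, x_agg[j] + 0.35).
    """
    x_agg = [rng.gauss(0.75, 0.25) for _ in range(n_trials)]
    x_ipd = {
        j: [[rng.uniform(x_agg[j] - 0.35, x_agg[j] + 0.35) for _ in range(n_per_arm)]
            for _arm in range(2)]  # two arms per trial
        for j in ipd_trials
    }
    return x_agg, x_ipd


def aggregate_arm(y):
    """Collapse one arm of patient-level outcomes to (mean, standard error, n)."""
    n = len(y)
    return statistics.fmean(y), statistics.stdev(y) / n ** 0.5, n
```

In an AgD trial only the arm summaries from `aggregate_arm` and the trial-level covariate `x_agg[j]` would be passed to the meta-regression; an IPD trial keeps every patient's outcome and covariate value.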
Data were generated over 200 simulations for each specific combination of parameters. The choice of 200 simulations balanced computing time against having enough simulations not to be overly influenced by a single occurrence. According to Burton et al., the number of simulations can be chosen on the basis of the desired accuracy of an estimate of interest, such as a regression coefficient [9]. Given that this study did not aim to estimate a specific parameter, this approach was not used. A review of simulation studies pertaining to NMA suggests a similar distribution of the number of simulations in this field, with at least one study using fewer than 200 simulations per scenario [10]. With 200 simulations per set of parameter permutations and a total of 108 permutations (3 node settings × 3 proportion-IPD settings × 3 effect-modification settings × 2 trial-size settings × 2 network-density settings), a total of 21,600 analyses were conducted (see p.3 of the Web Appendix for computer and runtime details).
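The totals of 108 scenarios and 21,600 analyses follow from a full-factorial design. They can be checked by enumerating the factor levels. The level labels below are illustrative names for the settings described in the text.

```python
from itertools import product

# Factor levels of the full-factorial simulation design described in the text.
FACTORS = {
    "n_nodes": (3, 5, 10),
    "prop_ipd_edges": ("low", "medium", "high"),
    "effect_modification": ("none", "constant", "exchangeable"),
    "ipd_trial_size": ("equal", "large"),
    "density": ("sparse", "well-populated"),
}
N_SIMS = 200  # simulations per scenario

scenarios = [dict(zip(FACTORS, levels)) for levels in product(*FACTORS.values())]
print(len(scenarios), len(scenarios) * N_SIMS)  # 108 21600
```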
Data analysis
For each simulation, following the data generation described above, three analyses were conducted: 1) AgD-NMA; 2) AgD-NMA-MR; and 3) IPD-NMA. In all three cases, the NMA was modeled using a random-effects approach, given that the data were generated with between-study heterogeneity. Specifically, the model used for AgD-NMA was:
Where δ_{jbk} is the trial-specific treatment-effect of k relative to treatment b. These trial-specific effects are drawn from a random-effects distribution: δ_{jbk} ~ N(d_{bk}, σ^{2}). The pooled effects, d_{bk}, are identified by expressing them in terms of the reference treatment A. The heterogeneity σ^{2} is assumed constant across all treatment comparisons.
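The identification step in the preceding paragraph can be made explicit with the standard consistency relations used in random-effects NMA. The block below restates the text under that standard assumption. It is not a reproduction of the article's own model equation, and the numbers in the worked line are purely illustrative.

```latex
\delta_{jbk} \sim \mathcal{N}\!\left(d_{bk}, \sigma^{2}\right), \qquad
d_{bk} = d_{Ak} - d_{Ab}, \qquad d_{AA} = 0

\text{e.g. } d_{AB} = 0.5,\; d_{AC} = 1.2 \;\Rightarrow\; d_{BC} = d_{AC} - d_{AB} = 0.7
```

Only the effects relative to A are free parameters; every other comparison in the network is a difference of two of them.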
The model used for AgD-NMA-MR was:
Where x_{lj} is the l^{th} trial-specific covariate value and β_{lk} is the corresponding treatment-by-covariate interaction term, as suggested by the NICE DSU TSD 3 document [11].
The model used for IPD-NMA was:
For the IPD, β_{0j} is a study-specific effect of the subject-level covariate x_{ij}. β_{1Ak} − β_{1Ab} reflects the interaction effect of covariate x_{ij} for treatment k relative to control treatment b. The model estimates k − 1 different regression coefficients β_{1Ak}. The parameters of primary interest are the pooled estimates of d_{Ak}, the estimates of heterogeneity, and the treatment-by-covariate interaction effects β_{1Ak}.
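Read together with the AgD-NMA definitions, the description above is consistent with the patient-level model sketched below for patient i in trial j, whose control arm is b. This is an assumed reconstruction, not the article's displayed equation. The residual variance σ_ε² is an introduced symbol, and fixing β_{1AA} = 0 for the reference treatment A is the usual identifiability convention. That convention leaves one interaction coefficient per non-reference treatment.

```latex
y_{ijk} \sim \mathcal{N}\!\left(\theta_{ijk},\, \sigma_{\varepsilon}^{2}\right)

\theta_{ijk} =
\begin{cases}
\mu_{j} + \beta_{0j}\, x_{ij}, & k = b \\
\mu_{j} + \beta_{0j}\, x_{ij} + \delta_{jbk} + \left(\beta_{1Ak} - \beta_{1Ab}\right) x_{ij}, & k \neq b
\end{cases}

\delta_{jbk} \sim \mathcal{N}\!\left(d_{bk}, \sigma^{2}\right), \qquad
d_{bk} = d_{Ak} - d_{Ab}, \qquad \beta_{1AA} = 0
```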
The parameters of the different models were estimated using Markov Chain Monte Carlo (MCMC) methods. The first 15,000 iterations were discarded as ‘burn-in’, and inferences were based on an additional 10,000 iterations using two chains. Given that there were 21,600 analyses to conduct, convergence was assessed numerically for all analyses using the multivariate potential scale reduction factor (PSRF) [12]. Values above 1.1 were taken as evidence of non-convergence. While trace plots, density plots and Gelman-Rubin-Brooks (shrink factor) plots offer a more in-depth assessment of convergence, it was not feasible to inspect them for the entire set of simulations [12].
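The idea behind the PSRF can be shown for a single scalar parameter. The study used the multivariate PSRF [12]. That version generalizes the calculation below by replacing the scalar variances with covariance matrices and using the largest eigenvalue of W⁻¹B/n. The sketch is the basic univariate Gelman-Rubin form without the degrees-of-freedom correction. The function name is hypothetical, and in R one would normally call an existing routine instead.

```python
import random
import statistics


def psrf(chains):
    """Univariate potential scale reduction factor for one scalar parameter.

    chains: list of m >= 2 equal-length lists of post-burn-in draws.
    Basic Gelman-Rubin form, without the degrees-of-freedom correction.
    """
    m, n = len(chains), len(chains[0])
    w = statistics.fmean([statistics.variance(c) for c in chains])  # mean within-chain variance
    b_over_n = statistics.variance([statistics.fmean(c) for c in chains])  # between-chain variance / n
    v_hat = (n - 1) / n * w + (1 + 1 / m) * b_over_n  # pooled posterior variance estimate
    return (v_hat / w) ** 0.5


rng = random.Random(12)
mixed = [[rng.gauss(0, 1) for _ in range(10_000)] for _ in range(2)]
stuck = [[rng.gauss(0, 1) for _ in range(10_000)], [rng.gauss(3, 1) for _ in range(10_000)]]
print(psrf(mixed) < 1.1, psrf(stuck) > 1.1)  # True True
```

Two chains exploring the same distribution give a value close to 1. Two chains stuck in different regions inflate the between-chain term and push the value well above the 1.1 threshold used in the study.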
Data collection and measures of comparison
The final step of each replication was collecting the results. To assess model performance, the mean squared error (MSE) and the bias of the treatment-effect and covariate estimates were collected. Additionally, the power to detect the covariate effect was collected (i.e., the frequency at which the 95% credible interval for β_{1} did not contain 0). To help answer the research question, the proportion of treatment comparisons with IPD and the proportion of patients with IPD were also collected.
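The per-parameter performance measures can be sketched as follows. This is a minimal illustration with a hypothetical function name. It assumes that one posterior point estimate and one 95% credible interval are available per replication, and it computes "power" as the share of intervals that exclude 0, as defined above.

```python
import statistics


def performance(estimates, truth, lower, upper):
    """Bias, MSE and 'power' for one parameter across replications.

    estimates: posterior point estimates, one per replication
    truth: the value used to generate the data
    lower, upper: 95% credible-interval bounds, one pair per replication
    """
    errors = [est - truth for est in estimates]
    bias = statistics.fmean(errors)
    mse = statistics.fmean([e ** 2 for e in errors])
    # power: share of replications whose 95% interval excludes 0
    power = statistics.fmean([1.0 if (lo > 0 or hi < 0) else 0.0
                              for lo, hi in zip(lower, upper)])
    return bias, mse, power
```

To pool over several treatment parameters, concatenate their errors before averaging.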
The simulations included a varying number of treatment-effect parameters, ranging from two to nine according to the size of the network. To simplify the quantification of results across simulation scenarios, the MSE and bias were calculated over all treatment parameters (i.e., the bias was calculated over d_{2} and d_{3} for 3-node networks and over d_{2} through d_{10} for 10-node networks). Moreover, given the 108 scenarios resulting from the different factor permutations, results were averaged over each factor level to make them easier to interpret. In addition to comparing the summary statistics of the MSE, a paired test was used to determine whether the differences were statistically significant. To this end, for each observed MSE pair (that is, for each parameter in each instance of the analysis), the difference between the AgD-NMA-MR and IPD-NMA analyses was calculated, and the resulting sample of differences was tested using a Wilcoxon signed-rank test.
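The paired comparison can be sketched with a self-contained Wilcoxon signed-rank test using the large-sample normal approximation. The sample of paired MSE differences is large, which makes the approximation reasonable. This is a hypothetical helper for illustration only: zero differences are dropped, tied absolute differences receive average ranks, and no tie correction is applied to the variance.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x, y.

    Normal approximation; zero differences are dropped, tied |differences|
    receive average ranks, and no tie correction is applied to the variance.
    Returns (W+, p-value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over the block of tied absolute differences
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
```

For the comparison described above, `x` would hold the AgD-NMA-MR squared errors and `y` the IPD-NMA squared errors, paired by parameter and replication. In R, `wilcox.test(x, y, paired = TRUE)` would normally be used instead.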
All analyses were performed using R version 3.5.1 (http://www.r-project.org/) and JAGS version 4.3.
Results
When did IPD help?
As expected, both AgD-NMA-MR and IPD-NMA outperformed AgD-NMA (except in scenarios with no effect modification). Therefore, comparisons focus on IPD-NMA and AgD-NMA-MR unless specified otherwise. The use of IPD was beneficial to the estimation process in 88 of the 108 factor permutations explored, neutral in 11, and detrimental in 9. The scenarios with small, neutral or negative improvements were consistently densely populated, often large, and often had a low or medium proportion of edges with IPD. Indeed, the largest benefits of IPD were observed in small networks. Overall, the results suggest that more IPD is sometimes better than very few, and that in a larger, better-populated network too few IPD will have a negligible impact on the NMA results. Of the 9 scenarios where AgD-NMA-MR had a lower median MSE than IPD-NMA, 8 involved exchangeable effect-modification; the lone exception was a scenario with no effect modification. For the numeric differences in each scenario, see the Web Appendix. Having discussed the big picture, we present more detailed results in the remainder of this section.
Treatment-effect estimation
Across all scenarios, both IPD-NMA and AgD-NMA-MR had distributions of bias centered at zero. The impact of using IPD-NMA varied greatly across scenarios, from a noticeably narrower distribution of bias and more precise estimates to a negligible improvement. Averaging over these scenarios produced density plots that suggest only a moderate improvement in validity and, at times, a large improvement in precision when using IPD-NMA.
The MSE and bias for the different numbers of nodes in the network are presented in Fig. 3. The average benefits of using IPD were larger in small 3-node networks than in larger 10-node networks. Again, this aligns with the hypothesis that the benefits of IPD may be less noticeable when there are few IPD in larger evidence networks. Similarly, as presented in Fig. 4, the relative difference in MSE and bias between IPD-NMA and AgD-NMA-MR was largest among sparsely populated networks. However, while the difference was greatest in sparse networks, both methods performed better in well-populated networks, which had considerably lower MSE. This was not the case for network size, which did not impact the MSE. Although the MSE values were small, the median MSE was 3.1 times larger for AgD-NMA-MR than for IPD-NMA in sparse networks and twice as large in well-populated networks.
Figure 5 panel a presents the standard error of the treatment-effect estimates for both AgD-NMA-MR and IPD-NMA, averaged over all treatment-effects, across each of the factor levels. Network size and density showed the largest differences across levels. On average, the gains in precision were immense in 3-node networks and negligible in 10-node networks. The same contrast was seen between sparse and well-populated networks.
Factors relating to the IPD had less impact. Neither the proportion of IPD nor the trial-size setting noticeably affected the degree of improvement in precision (Fig. 5a). The differences in MSE and bias between IPD-NMA and AgD-NMA-MR across the different proportions of treatment comparisons with IPD were in the expected direction (Fig. 1 of Web Appendix); that is, networks with a higher proportion of treatment comparisons with IPD had a bigger reduction in bias. For IPD-NMA, the median MSE went from 0.0048 to 0.0038 to 0.0026 for low, medium and high proportions of IPD, respectively, while AgD-NMA-MR was consistent, with a median MSE of 0.010 across all three settings. It was also reassuring that the results of the non-IPD analyses were not affected by this factor. The difference between IPD-NMA and AgD-NMA-MR was less pronounced between large and equal sample sizes than between network sizes and densities (Figure 7 of the Web Appendix).
One potential issue with averaging over all scenarios is that we can lose sight of important interactions between factors. In this regard, it can be helpful to visualize differences in one factor for a specific combination of the other factors rather than averaging over them. Figures 2, 3, 4 and 5 of the Web Appendix compare the distribution of treatment-effect estimation bias in more specific scenarios. These highlight that in larger networks the proportion of treatment comparisons with IPD matters more.
In the trivial case of no effect-modification, the unadjusted model performed best (Figure 6 of the Web Appendix). With no effect-modification, all modeling approaches were unbiased, but the variance of the unadjusted model was considerably lower, as expected. In the presence of effect-modification, IPD-NMA performed best, given that AgD-NMA-MR had thicker tails in its bias distribution. Note that the bimodal bias distribution of the unadjusted AgD-NMA arises from pooling over the different types of effect-modification; when each type of effect-modification was examined separately, the unadjusted AgD-NMA distribution was unimodal. Finally, the advantage of IPD-NMA relative to AgD-NMA-MR was more muted when effect-modification was exchangeable (varying from one edge to another according to a Normal distribution).
Regression coefficient estimation
Understanding how the regression coefficients are estimated can add insight into the results observed for treatment-effect estimation. Table 2 presents the summary statistics of the MSE for the covariate coefficient estimates. For simplicity, only the simulations with constant effect-modification were explored. There were no covariate coefficients to estimate in simulations without effect-modification, and estimating MSE and bias was more difficult in the simulations with exchangeable effect-modification, where the coefficients were drawn from a Normal distribution. The MSE statistics for the regression coefficient estimates resemble those for the treatment-effect estimates. Specifically, both estimators appeared to be unbiased, and AgD-NMA-MR had a much larger range and standard deviation. As a result, IPD-NMA had a smaller MSE (both median and mean), suggesting that it leads to more reliable estimates.
There was additional interest in the regression coefficient estimates themselves because of their impact on the treatment-effect estimates: poor estimates of the covariate coefficient will lead to poor estimates of the treatment-effects. There was a notable difference in the precision of these estimates. Figure 5 panel b presents the standard deviation of the regression coefficient estimates for both AgD-NMA-MR and IPD-NMA across each of the factor levels. The same patterns observed for the treatment-effects were observed for the regression coefficients.
Model diagnostics
The multivariate PSRF values were collected for each model (Table 4 of the Web Appendix), and summary values were also tabulated by analysis type. Convergence was consistently met throughout the simulations, with the exception of very few simulations, and this very small proportion of non-convergence was judged to be negligible. Non-convergence occurred in small, sparse networks and in large, well-populated networks. Note that a high multivariate PSRF is not always indicative of non-convergence: each parameter can have a small univariate PSRF while the multivariate PSRF is large. Nonetheless, this does not happen commonly.
Discussion
This study used simulations to explore the improvements in estimation from using IPD and AgD, relative to using AgD only, to conduct NMA with meta-regression across numerous extrinsic and intrinsic factors of the evidence base. Study results suggest that IPD-NMA reduces estimation bias and, to a greater extent, improves the precision of treatment-effect and regression coefficient estimates over NMA conducted using AgD only. On the basis of the conducted simulations, in evidence bases afflicted by effect-modifiers, the inclusion of IPD may be most impactful among small and/or sparse networks of evidence. While IPD consistently improves validity and precision in these networks, it does not always do so in large and/or dense networks. When too few IPD are used in large or dense networks, their impact appears to be washed out and negligible. As IPD-NMA becomes more common in larger networks, care will be required to ensure that sufficient IPD are used.
Despite the promising attributes of using IPD within NMA, this study suggests caution in advising users that IPD is always the better approach. Under the strong assumption of having access to all effect-modifiers and prognostic factors, PAIC can be used to conduct NMA with disconnected networks of evidence [13]. PAIC methods are well enough understood to warrant NICE guidance on their use [13]; however, many properties of one-stage IPD-AgD NMA remain unknown. Simulation results help confirm and quantify some common-sense properties. Among small and/or sparsely populated networks, the use of IPD-NMA leads to significant improvements in both bias reduction and precision of estimates. Incidentally, PAIC tends to be used in smaller networks, so the use of IPD in this manner is likely to be equally impactful. Based on previous work, we hypothesized that too few IPD in large networks would lead to negligible impact – a form of washing out. Indeed, our simulations showed that at least 10% of patients in the network needed to come from IPD in order for results to be impacted. In this way, IPD should not be included blindly, but only in situations where it could affect the model estimates.
The selected circumstances were restricted to situations where it was unclear a priori whether there would be a meaningful advantage to using IPD-NMA. As such, all effect-modification was attributable to a single variable, and the association between treatment-effect and effect-modifier was perfectly conserved at the aggregate level. For example, in the presence of ecological fallacy (the phenomenon that arises when trends in aggregate data do not match trends in individual data), using IPD will trivially be superior to using AgD only [14]. There are various reasons this could happen, such as large differences in sample sizes and weights leading to Simpson’s paradox. Meta-regression in AgD-NMA is always at risk of making a model correction using an estimate biased by the ecological fallacy, and IPD is a simple way to avoid or reduce the impact of this issue. As another example, we can imagine many real-world situations where multiple effect-modifiers are imbalanced [15]. Only in exceptional circumstances can AgD be used to conduct meta-regression adjusting for multiple variables at a time. On the other hand, unless one is dealing with a single, small-sample trial with IPD, IPD provides many more data points than AgD and, as a result, allows for the simultaneous adjustment of multiple covariates [16]. Thus, though not demonstrated through these simulations, it is important to recognize the ability to make more complex adjustments through IPD-NMA [16]. In both of these circumstances, the added benefits of IPD are clear and there is no need to quantify these differences.
Previous studies exploring the use of IPD in combination with AgD in NMA have noted the advantages it can bring with respect to both precision and validity. As noted by Donegan and colleagues, studies have yet to explore how much IPD is enough for the gains to be impactful [6]. A review of the literature did reveal another simulation study exploring the use of IPD and AgD for NMA [10]. Leahy and colleagues explored the benefits of IPD from the perspective of model selection, rather than bias and MSE. They found that “an increased proportion of IPD resulted in more accurate and precise estimates for most models and datasets.” They concluded that the use of IPD was always beneficial relative to not having IPD. This study adds to theirs by considering the impact of network size (theirs only considered 5-node networks), network density, and the proportion of nodes and edges with available IPD. The two studies agree that IPD is beneficial to evidence synthesis; however, our study provides further insight that too few IPD within a large network will lead to negligible benefits that may not be worth the effort.
In these simulations, the impact of IPD-NMA was more notable with respect to the increased precision of estimates. Although more attention was paid to the bias and MSE of the estimates, it is important to recognize the impact of the improved precision of IPD-NMA. Improved precision increases the ability to correctly differentiate the effects of treatments and improves subsequent decision-making. Here too, gains were not uniform across all scenarios. Network density was the most influential factor, with improved precision most notable within sparse networks.
There are some limitations to the simulations conducted for this study. Firstly, the heterogeneity of studies was not varied. Network heterogeneity is an extrinsic factor that can be evaluated for a network of evidence, so understanding how the impact of IPD-NMA varies with heterogeneity would be useful to future researchers. The current study is already relatively large in scale, which led to both computational and interpretational challenges, and heterogeneity was ultimately left out of the study scope to control the complexity of the simulations. Secondly, the AgD generated for the simulations could be made more realistic in future simulations, particularly when working with large sample sizes for IPD trials. When the IPD were aggregated, the residual standard error at the aggregate level was much smaller than at the individual level in some settings.
Simulation analyses are a powerful research tool that can provide important insights into IPD-AgD NMA. While our analyses have shed light on some popular methods, future research could extend well beyond the suggestions arising from the limitations above. Chief among these is extending the simulations to other IPD-AgD methods. As previously mentioned, PAIC methods tend to be restricted to small networks, so questions about large networks do not apply to them. Nonetheless, methods have been developed to overcome the ecological fallacy in larger networks, such as those of Jackson et al. [14], and their properties are not well understood. Simulation-based comparisons with these other IPD-AgD methods, including the impact of the factors explored in this analysis, would help shed light both on those methods and on how the impact of the ecological fallacy differs between AgD and IPD-AgD models.
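The following Python sketch makes the ecological fallacy concrete by contrasting an AgD meta-regression with a within-trial IPD analysis. It is a generic illustration, not the method of Jackson et al. [14] or the models used in this study. It assumes hypothetical two-arm trials in which the individual-level treatment-by-covariate interaction is `beta_within`, while an additional trial-level effect `gamma` is correlated with each trial's mean covariate. Regressing trial-level treatment effects on mean covariates then recovers roughly `beta_within + gamma`. The within-trial interactions recover `beta_within`. They are pooled here by a simple average, which is reasonable only because all trials have equal size. All names and parameter values are illustrative.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_trials=60, n_per_arm=100, d=1.0, beta_within=0.5,
             gamma=-0.8, b=0.3, sigma=1.0, seed=14):
    """Return (AgD meta-regression slope, pooled within-trial IPD interaction).

    Hypothetical model for individual i in trial k with treatment T in {0, 1}:
      y = a_k + b*x + T*(d + beta_within*x + gamma*xbar_k) + e
    beta_within is the individual-level effect modification; gamma is a
    trial-level effect tied to the trial's mean covariate xbar_k, so the
    across-trial association is approximately beta_within + gamma.
    """
    rng = random.Random(seed)
    trial_means, trial_effects, within_interactions = [], [], []
    for _ in range(n_trials):
        xbar_k = rng.uniform(-1.0, 1.0)  # trial-level mean covariate
        a_k = rng.gauss(0.0, 1.0)        # trial-specific baseline
        arms = {}
        for t in (0, 1):
            x = [rng.gauss(xbar_k, 1.0) for _ in range(n_per_arm)]
            y = [a_k + b * xi + t * (d + beta_within * xi + gamma * xbar_k)
                 + rng.gauss(0.0, sigma) for xi in x]
            arms[t] = (x, y)
        # AgD view: one treatment-effect estimate and one mean covariate per trial.
        trial_effects.append(statistics.fmean(arms[1][1]) - statistics.fmean(arms[0][1]))
        trial_means.append(statistics.fmean(arms[0][0] + arms[1][0]))
        # IPD view: difference in arm-specific slopes, equivalent to the
        # T:x coefficient of the fully interacted model y ~ T * x in this trial.
        within_interactions.append(ols_slope(*arms[1]) - ols_slope(*arms[0]))
    agd_slope = ols_slope(trial_means, trial_effects)
    ipd_interaction = statistics.fmean(within_interactions)
    return agd_slope, ipd_interaction


if __name__ == "__main__":
    agd, ipd = simulate()
    print(f"AgD meta-regression slope: {agd:.2f} (beta_within + gamma = -0.3)")
    print(f"IPD within-trial interaction: {ipd:.2f} (beta_within = 0.5)")
```

With these hypothetical defaults, the AgD slope should be close to -0.3, the opposite sign to the individual-level interaction of 0.5. Setting `gamma=0.0` removes the trial-level confounding, and the two approaches should then roughly agree.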
Conclusion
This study illustrates the value of IPD for network meta-analysis, but also shows that it is not a panacea. When too few IPD are included in too large a network, their effects are washed out in the analysis and the potential advantages of including IPD are lost. Nonetheless, in most circumstances, IPD can be used to improve the validity and precision of treatment-effect estimates, which in turn leads to more useful model results.
Availability of data and materials
Not applicable. Code to generate the data is provided in the Web Appendix.
Abbreviations
AgD: Aggregate data
AgD-NMA: Aggregate data network meta-analysis without adjustments
AgD-NMA-MR: Aggregate data network meta-analysis with meta-regression adjustments
IPD: Individual patient data
IPD-NMA: Network meta-analysis with meta-regression adjustments, using both individual patient data and aggregate data
MAIC: Matching-adjusted indirect comparisons
MCMC: Markov Chain Monte Carlo
MSE: Mean squared error
NICE: National Institute for Health and Care Excellence
NMA: Network meta-analysis
PAIC: Population-adjusted indirect comparisons
PSRF: Potential scale reduction factor
Potential scale reduction factor
References
Nikolakopoulou A, Chaimani A, Veroniki AA, Vasiliadis HS, Schmid CH, Salanti G. Characteristics of networks of interventions: a description of a database of 186 published networks. PLoS One. 2014;9(1):e86754.
Saramago P, Sutton AJ, Cooper NJ, Manca A. Mixed treatment comparisons using aggregate and individual participant level data. Stat Med. 2012;31(28):3516–36. https://doi.org/10.1002/sim.5442 Epub 2012 Jul 5.
Signorovitch JE, Sikirica V, Erder MH, Xie J, Lu M, Hodgkins PS, et al. Matching-adjusted indirect comparisons: a new tool for timely comparative effectiveness research. Value Health. 2012;15(6):940–7. https://doi.org/10.1016/j.jval.2012.05.004.
Veroniki A, Straus S, Soobiah C, Elliott M, Tricco A. A scoping review of indirect comparison methods and applications using individual patient data. BMC Med Res Methodol. 2016;16:47. https://doi.org/10.1186/s12874-016-0146-y.
Muresan B, Hu Y, Postma M, Ouwens M, Heeg B. PCN63 – review of NICE HTA submissions including matching-adjusted indirect comparisons and simulated treatment comparisons. Value Health. 2018;21:S24.
Donegan S, Williamson P, D'Alessandro U, Garner P, Smith CT. Combining individual patient data and aggregate data in mixed treatment comparison meta-analysis: individual patient data may be beneficial if only for a subset of trials. Stat Med. 2013;32(6):914–30. https://doi.org/10.1002/sim.5584 Epub 2012 Sep 17.
Kanters S. Comparative efficacy and safety of first-line treatments for HIV patients for clinical guideline development and the impact of individual patient data. Vancouver: University of British Columbia; 2019.
Morris T, White I, Crowther M. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–102.
Burton A, Altman D, Royston P, Holder R. The design of simulation studies in medical statistics. Stat Med. 2006;25(24):4279–92.
Leahy J, O'Leary A, Afdhal N, Gray E, Milligan S, Wehmeyer MH, et al. The impact of individual patient data in a network meta-analysis: an investigation into parameter estimation and model selection. Res Synth Methods. 2018;9(3):441–69.
Dias S, Sutton A, Welton N, Ades A. Evidence synthesis for decision making 3: heterogeneity–subgroups, meta-regression, bias, and bias-adjustment. Med Decis Making. 2013;33(5):618–40.
Brooks S, Gelman A. General methods for monitoring convergence of iterative simulations. J Comput Graph Stat. 1998;7(4):434–55.
Phillippo D, Ades A, Dias S, Palmer S, Abrams K, Welton N. NICE DSU Technical support document 18: methods for population-adjusted indirect comparisons in submission to NICE; 2016.
Jackson C, Best N, Richardson S. Improving ecological inference using individual-level data. Stat Med. 2006;25(12):2136–59.
Kovic B, Zoratti M, Michalopoulos S, Silvestre C, Thorlund K, Thabane L. Deficiencies in addressing effect modification in network meta-analyses: a meta-epidemiological survey. J Clin Epidemiol. 2017;88:47–56.
Debray T, Moons K, van Valkenhoef G, Efthimiou O, Hummel N, Groenwold RH, et al. Get real in individual participant data (IPD) meta-analysis: a review of the methodology. Res Synth Methods. 2015;6(4):293–309.
Acknowledgements
The authors would like to thank Hubert Wong, Michael John Milloy and Tom Trikalinos for their critical feedback.
Funding
Steve Kanters received funding through a doctoral research award from the Canadian Institutes of Health Research, which played no role in this research.
Author information
Contributions
SK and NB had full access to all of the data in the study. SK and NB take responsibility for the integrity of the data, the accuracy of the data analysis, and the final decision to submit for publication. All authors have read and approved the manuscript. Study concept and design: SK, KT and NB. Acquisition, analysis, or interpretation of data: SK, MEK and KT. Drafting of the manuscript: SK and NB. Critical revision of the manuscript for important intellectual content: All authors. Statistical analysis: SK. Study supervision: NB and AA.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
When does use of individual patient data make a difference? A simulation study – Web Appendix. Description: R Code to generate data.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Kanters, S., Karim, M.E., Thorlund, K. et al. When does the use of individual patient data in network meta-analysis make a difference? A simulation study. BMC Med Res Methodol 21, 21 (2021). https://doi.org/10.1186/s12874-020-01198-2
DOI: https://doi.org/10.1186/s12874-020-01198-2
Keywords
 Individual patient data
 IPD
 Network meta-analyses
 NMA
 Simulation study
 Methods