Comparing the use of aggregate data and various methods of integrating individual patient data to network meta-analysis and its application to first-line ART

Background: The 2018 World Health Organization HIV guidelines were based on the results of a network meta-analysis (NMA) of published trials. This study employed individual patient-level data (IPD), aggregate data (AgD) and meta-regression methods to assess the evidence supporting the WHO recommendations and whether they needed any refinements.
Methods: Access to IPD from three trials was granted through ClinicalStudyDataRequest.com (CSDR). Seven modelling approaches were applied and compared: 1) Unadjusted AgD network meta-analysis (NMA) – the original analysis; 2) AgD-NMA with meta-regression; 3) Two-stage IPD-AgD NMA; 4) Unadjusted one-stage IPD-AgD NMA; 5) One-stage IPD-AgD NMA with meta-regression (one-stage approach); 6) Two-stage IPD-AgD NMA with empirical-priors (empirical-priors approach); 7) Hierarchical meta-regression IPD-AgD NMA (HMR approach). The first two were the models used previously. Models were compared with respect to effect estimates, changes in the effect estimates, coefficient estimates, DIC and model fit, rankings and between-study heterogeneity.
Results: IPD were available for 2160 patients, representing 6.5% of the evidence base and 3 of 24 edges. The aspect of the model affected by the choice of modelling approach appeared to differ across outcomes. HMR consistently generated wider intervals, often with credible intervals (CrI) containing the null value. Discontinuations due to adverse events and viral suppression at 96 weeks were the only two outcomes for which the unadjusted AgD NMA would not be selected. For the first, the selected model shifted the principal comparison of interest from an odds ratio of 0.28 (95% CrI: 0.17, 0.44) to 0.37 (95% CrI: 0.23, 0.58). Throughout all outcomes, the regression estimates differed substantially between AgD and IPD methods, with the latter more often being larger in magnitude and statistically significant.
Conclusions: Overall, the use of IPD often impacted the coefficient estimates, but not sufficiently as to necessitate altering the final recommendations of the 2018 WHO Guidelines. Future work should examine the features of a network where adjustments will have an impact, such as how much IPD is required in a given size of network.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-021-01254-5.



Background
With an ever-growing number of scientific publications, the need for meta-analysis to help make sense of the evidence continues to escalate [1]. Meta-analyses require that the included studies be sufficiently similar; otherwise, the resulting estimates may be biased by imbalances between studies in the distribution of trial or patient characteristics that affect the relative effectiveness of the interventions being compared, known as effect modifiers [2]. Meta-regression has long been used to overcome such biases, as well as to improve precision [3].
Meta-analyses typically consist of combining aggregate data (AgD) results from publications. As such, meta-regression most commonly consists of conducting linear regression of the study results as a function of an effect modifier, both in the aggregate. Two potential limitations of this form of meta-regression are a limited number of data points with which to reliably estimate trends and the risk of ecological fallacy (when trends at the trial level do not match trends at the individual level) [4]. A less common form of meta-regression involves using individual patient data (IPD), with or without AgD [5]. The use of IPD is less common, primarily due to the additional complications in obtaining such data [6]. Nonetheless, IPD meta-analysis can help overcome the two aforementioned limitations of AgD meta-regression [7]. Conducting meta-regression on patient-level values provides more data points, which also lends itself better to simultaneously adjusting for multiple variables [8].
Network meta-analysis (NMA) is an expansion of traditional meta-analysis that allows for the simultaneous analysis of multiple comparisons within a connected network of evidence [9]. Meta-regression is also an important technique to improve the validity and precision of estimates in NMA [2,10]. Given that NMA lends itself to larger evidence bases, IPD is most commonly used in NMA in analyses that include both IPD and AgD [11]. There are various ways to use IPD and AgD to conduct meta-regression, including two-stage approaches (whereby adjusted AgD are created using the IPD) [6] and one-stage approaches that integrate IPD and AgD together using hierarchical models [12][13][14].
In 2005, Simmonds et al. reported that 28/44 (63%) published IPD meta-analyses used the two-stage approach to IPD-AgD NMA. In a more recent 2015 review, the same researchers reported roughly even use of one- and two-stage approaches, though outside of survival outcomes, the use of one-stage IPD-AgD NMA has become more popular [6]. There have also been further developments of one-stage methods, with Jackson et al. developing an expanded hierarchical method that may improve IPD-AgD meta-analysis by further reducing the risk of ecological fallacy [15,16]. However, few studies have examined how the different types of meta-regression compare in their ability to improve analyses and conclusions. We sought to use a case study to examine whether meta-regression makes such improvements and, if so, which type (AgD or IPD) does.
The case study we used was a systematic literature review (SLR) and NMA that helped inform the 2016 World Health Organization (WHO) HIV clinical guidelines. The 2016 SLR found evidence of improved efficacy and tolerability of dolutegravir (DTG) relative to standard-dose efavirenz (EFV), the preferred first-line anchor treatment [17]. Following its completion, we sought IPD, independent of updating guidelines, to compare AgD and IPD meta-regression methods and to see whether more precise estimates might lead to stronger conclusions. In the 2016 analyses, DTG was nominally better than other treatments in its class, integrase inhibitors (INSTIs); however, these differences were seldom statistically significant. In the same year, IAS-USA released its own clinical guidelines, which suggested that all INSTIs were equivalent [18,19]. We sought to further investigate this point.
The primary objective of this study, which was part of a doctoral thesis [20], was to compare the impact of using different established AgD- and IPD-based methods for meta-regression adjustments. A secondary objective was to examine the change in outputs in the evidence synthesis of antiretroviral therapy (ART) among first-line HIV patients when including IPD, with a particular focus on the relative efficacy, safety and tolerability of DTG relative to other anchor treatments.
A comprehensive systematic search of the literature was conducted on 12 February 2018 using the following databases: MEDLINE, EMBASE, and CENTRAL (see Additional file 1: Web Appendix for search strategy). Further manual searches of the 2016-2018 Conference on Retroviruses and Opportunistic Infections (CROI), the 2016 AIDS and Glasgow HIV conferences, and the 2017 International AIDS Society (IAS) conference were conducted. Additional studies were identified through a review of clinical trial registries and the reference lists of identified publications. Two investigators, working independently, scanned all titles and abstracts identified in the literature search and reviewed subsequent full-texts. A third investigator provided arbitration as needed for discrepancies. The same approach was used for data extraction.
The validity of individual RCTs was assessed using the Risk of Bias instrument, endorsed by the Cochrane Collaboration [32]. This instrument is used to evaluate 7 key domains: sequence generation; allocation concealment; blinding of participants and personnel; blinding of outcome assessors; incomplete outcome data; selective outcome reporting; and other sources of bias.
Reporting is in accordance with the preferred reporting items for systematic review and meta-analysis of individual participant data (PRISMA-IPD) guidelines [33].
Preparation of the individual patient data
IPD were provided in a series of lengthwise tables following the Clinical Data Interchange Standards Consortium (CDISC) standards. Using these tables, an amalgamated IPD set combining all three studies was prepared. The patients were restricted to the full analysis sets, as in each of the respective trials [22,29,34]. The following outcomes were obtained: viral suppression and change from baseline in CD4 cell counts at 24, 48 and 96 weeks; discontinuations; discontinuations due to adverse events; and serious adverse events. There were no missing values except for CD4, for which analyses were only conducted on the observed data. Data were further verified to ensure that published results for each trial could be obtained from the IPD.

Statistical models
Only select outcomes were used for the purpose of comparing the various statistical models of interest for conducting meta-regression adjustments with IPD and AgD. Assessing the impact on the HIV-related results involved applying the preferred adjustment method to the remaining outcomes. The statistical models are presented below. Only the more complex random-effects models are presented, but both fixed- and random-effects models were considered throughout.

AgD NMA
This served as the "baseline" results from which to draw comparisons. The model is given in eq. (1), where $\theta_{jk}$ reflects the 'underlying' outcome for treatment $k$ in study $j$ that has been link-function-transformed to a normally distributed scale (e.g., logit link for dichotomous outcomes), and $\delta_{jbk}$ is the trial-specific treatment effect of treatment $k$ relative to treatment $b$. These trial-specific effects are drawn from a random-effects distribution: $\delta_{jbk} \sim N(d_{bk}, \sigma^2)$. The pooled effects, $d_{bk}$, are identified by expressing them in terms of the reference treatment $A$. The heterogeneity $\sigma^2$ is assumed constant for all treatment comparisons.
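For reference, a form of eq. (1) that is consistent with the symbols defined above is sketched here. It follows the usual random-effects NMA formulation; the study-specific baseline $\mu_{jb}$ and the constraint $d_{AA} = 0$ are assumptions taken from that standard formulation rather than from the text above.

$$
\theta_{jk} =
\begin{cases}
\mu_{jb}, & k = b \\
\mu_{jb} + \delta_{jbk}, & k \succ b
\end{cases}
\qquad
\delta_{jbk} \sim N(d_{bk}, \sigma^2), \quad
d_{bk} = d_{Ak} - d_{Ab}, \quad
d_{AA} = 0
$$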

AgD NMA with meta-regression
Traditional meta-regression for NMA was conducted as described in the NICE Technical Support Document 3 [2] and the statistical analysis plan (SAP) [35].

Two-stage IPD-AgD NMA
For these analyses, aggregate values for the DTG trials were calculated using the IPD. Specifically, mixed linear regression among the IPD was used to model each outcome adjusted for candidate covariates and to provide predicted estimates of the aggregate value within the target population. The adjusted aggregate values were then analyzed using the AgD NMA methods described above.
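The first stage of the two-stage approach can be illustrated with a minimal sketch. It assumes ordinary least squares with a treatment-by-covariate interaction in place of the mixed model used in the study, and the function name `adjusted_arm_means`, the simulated trial and all numeric values are hypothetical.

```python
import numpy as np

def adjusted_arm_means(y, treat, x, x_target):
    """Fit y ~ 1 + treat + x + treat*x by least squares and predict
    the mean outcome in each arm at the target covariate value."""
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=float)
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones_like(x), treat, x, treat * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    control = coef[0] + coef[2] * x_target
    active = control + coef[1] + coef[3] * x_target
    return control, active

# Hypothetical trial: outcome is change in CD4, covariate is baseline CD4.
rng = np.random.default_rng(0)
x = rng.normal(450.0, 100.0, size=400)
treat = rng.integers(0, 2, size=400)
y = 150 + 30 * treat + 0.10 * x + 0.05 * treat * x + rng.normal(0, 20, size=400)

# Predict both arms at the (assumed) mean baseline CD4 of the target population.
control, active = adjusted_arm_means(y, treat, x, x_target=350.0)
print(f"adjusted arm means at target: {control:.1f} vs {active:.1f}")
```

The predicted arm means would then replace the published aggregate values for that trial in the AgD NMA.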
One-stage IPD-AgD NMA with and without adjustments
IPD and AgD were combined, along with meta-regression, in a single model. This has the advantage of being a single model using all data. The model is shown in eq. (2), where $\theta_{ijk}$ is the link-function-transformed parameter from the likelihood function of interest for the $i$th individual, in the $j$th trial, treated with treatment $k$. Similarly, $\eta_{jk}$ is the link-function-transformed parameter from the likelihood function for the AgD. $\mu_{jb}$ and $\lambda_{jb}$ are the study effects for the IPD and AgD, respectively. When meta-regression adjustment is included, $\beta_{0j}$ is a study-specific effect of the subject-level covariate $x_{ij}$ for the IPD, and $\beta_{1Ak} - \beta_{1Ab}$ reflects the interaction effects of covariate $x_{ij}$ for treatment $k$ relative to control treatment $b$. The model estimates $k - 1$ different regression coefficients $\beta_{1Ak}$. Parameters of primary interest from the analyses are the pooled estimates of $d_{Ak}$, the estimates for the heterogeneity, and the treatment-by-covariate interaction effects $\beta_{1Ak}$.
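A form of eq. (2) consistent with the parameters defined above is sketched here. It follows the usual one-stage IPD-AgD formulation and should be read as an assumption; in particular, the study-level mean covariate $\bar{x}_j$ used in the AgD part is a symbol introduced here for illustration.

$$
\theta_{ijk} =
\begin{cases}
\mu_{jb} + \beta_{0j} x_{ij}, & k = b \\
\mu_{jb} + \delta_{jbk} + \beta_{0j} x_{ij} + (\beta_{1Ak} - \beta_{1Ab}) x_{ij}, & k \succ b
\end{cases}
$$

$$
\eta_{jk} =
\begin{cases}
\lambda_{jb}, & k = b \\
\lambda_{jb} + \delta_{jbk} + (\beta_{1Ak} - \beta_{1Ab}) \bar{x}_{j}, & k \succ b
\end{cases}
\qquad
\delta_{jbk} \sim N(d_{Ak} - d_{Ab}, \sigma^2)
$$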
Two-stage IPD-AgD NMA with empirical-priors
These models were the same as described in (2), except that the regression coefficients were provided with an empirical prior that was informed by the IPD. Rather than start with the non-informative prior for $\beta_{1Ak}$, the IPD were first used to estimate meta-regression coefficients using mixed-effects linear regression. The estimates and standard errors of the meta-regression were used to construct an empirical prior: $\beta_{1Ak} \sim \mathrm{Normal}(\beta, \mathrm{prec}_{\beta})$. The idea here is to ensure that the IPD principally inform the meta-regression (potentially avoiding some ecological fallacy bias).
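As a minimal sketch of how such a prior could be built, the snippet below converts a coefficient estimate and its standard error into the mean and precision of a Normal prior. The function name and the numeric inputs are hypothetical; the only convention relied on is that JAGS parameterizes `dnorm` by mean and precision (1 / variance).

```python
def empirical_prior(estimate, std_error):
    """Return (mean, precision) for a Normal prior built from an
    IPD regression estimate and its standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    precision = 1.0 / std_error ** 2
    return estimate, precision

# Hypothetical IPD estimate of a treatment-by-covariate interaction.
mean, prec = empirical_prior(estimate=0.5, std_error=0.2)

# JAGS-style prior statement using the mean/precision parameterization.
print(f"beta1 ~ dnorm({mean}, {prec:.2f})")
```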
One-stage IPD-AgD NMA with hierarchical meta-regression
The final model that was considered was an expansion of one-stage IPD-AgD NMA that applies the hierarchical meta-regression adjustments first described by Jackson et al. and developed for NMA by Jansen et al. [15,16]. Unfortunately, these methods have only been developed for binomial outcomes. The model is shown in (3). It shares the same notation as (2).
The IPD part of this model is the same as that of the one-stage IPD-AgD NMA with adjustments, with the exception that $\beta_0$ is not study-specific but fixed across studies because it is now also used in the AgD part of the model (which reflects different studies). For the AgD part of the model, the number of events $r$ in study $j$ for treatment $k$ is assumed to be binomially distributed with probability $q_{jk}$ and sample size $n_{jk}$. $q_{jk}$ can be considered as the average probability of the response of interest for an individual in study $j$ treated with intervention $k$.
The covariate adjustment values $\beta_{1Ak}$ are distinct from those used in previous equations in that they are patient-level effects rather than trial-level effects. Even in the other IPD models, the effects are trial-level because they are estimated by both IPD and AgD. In (3), the values $q^{0}_{jk}$ and $q^{1}_{jk}$ are latent probabilities; therefore, it is not possible to point identify $\beta_0$ and $\beta_{1Ak}$ from AgD only. As such, these are solely estimated through IPD, which removes the possibility of the ecological fallacy bias entirely.
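The idea behind the aggregate part of the hierarchical model can be illustrated with a small sketch. It assumes a binary covariate (for example, sex), so that the study-level probability is a mixture of two latent subgroup probabilities weighted by the covariate proportion; this mixture form, the single coefficient `beta` and all numeric values are illustrative assumptions rather than the study's eq. (3).

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def average_probability(baseline_logit, beta, prop_x1):
    """Study-level probability as a mixture of the latent probabilities
    for patients with x = 0 and x = 1 (patient-level covariate effect)."""
    q0 = expit(baseline_logit)
    q1 = expit(baseline_logit + beta)
    return (1.0 - prop_x1) * q0 + prop_x1 * q1

def plug_in_probability(baseline_logit, beta, prop_x1):
    """Trial-level shortcut: apply the coefficient to the mean covariate."""
    return expit(baseline_logit + beta * prop_x1)

# Hypothetical study in which half of the patients have x = 1.
mixture = average_probability(0.0, 2.0, 0.5)
shortcut = plug_in_probability(0.0, 2.0, 0.5)
print(f"mixture: {mixture:.3f}, plug-in: {shortcut:.3f}")
```

The two quantities differ because the logit link is non-linear, which is one way to see why a coefficient estimated from trial-level averages need not match the patient-level effect.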

Statistical analyses
The following outcomes were used for the comparison of meta-regression methods: viral suppression and change from baseline in CD4 cell counts at 48 weeks (± 4 weeks), discontinuations, and discontinuations due to adverse events. We selected these because DTG and EFV 400 are considered to have efficacy that is as good as or better than that of EFV, along with improved tolerability [36]. The target population was set to be the average population amongst EFV patients, as EFV was the recommended preferred first-line regimen at the time. The following baseline variables were considered for covariate adjustments: CD4 cell counts, viral RNA (log-transformed), and proportion of males.
The three trials for which IPD were available tended to include healthier patients (higher baseline CD4 and lower baseline HIV RNA) and more males than the average EFV trial. In addition to being imbalanced, these factors were both plausible effect modifiers and well reported. Analyses consisted of comparing the modeling approaches described in the previous section. Identity link functions with Normal likelihoods were used for continuous outcomes. For dichotomous outcomes, logit link functions were used.
To assess the different models, the following measures were compared:
- Treatment-effect estimates and posterior distributions of key comparisons
- Coefficient estimates and posterior distributions
- Deviance information criterion (DIC) value comparisons across models, as well as pD and deviance
- Between-study heterogeneity (between-study variance of the modelled outcome, e.g., log odds ratio [OR], as calculated in the random-effects model)
- The proportion of points falling outside the lines c = 3 and c = 4 within leverage plots (the curves are of the form $x^2 + y = c$). Points outside of the lines with c = 3 can generally be identified as contributing to the model's poor fit (see TSD2) [37].
- Change in SUCRA (surface under the cumulative ranking curve) scores
The posterior distributions for treatment-effect estimates are the outputs that are subsequently used to draw inference and for decision-making in Bayesian modeling. Therefore, these were a primary measure of modeling impact. There were no specific hypotheses beforehand regarding how these would be affected. For comparisons in treatment effect, the absolute effect was used because it is the most interpretable. For example, a difference of 5% in the proportion of viral suppression is more interpretable than a difference of 1.5 in the logarithm of the odds ratio. For the dichotomous variables, a difference of 1% was chosen as the threshold of minimal clinically important difference. For a change in CD4, a difference of 10 cells/mm³ was chosen to align with the values that were used in the WHO reviews. The SAP for this study was publicly available prior to conducting the analyses and provides further details regarding methods [35].
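The SUCRA score listed among the measures above can be computed from a matrix of rank probabilities. The sketch below uses the usual definition (the sum of cumulative rank probabilities over the first a − 1 ranks, divided by a − 1, for a treatments); the function name and the example matrix are hypothetical, and in practice the rank probabilities would be estimated from the MCMC samples.

```python
import numpy as np

def sucra(rank_probs):
    """SUCRA per treatment from a (treatments x ranks) matrix whose
    rows hold the probability of each rank (rank 1 = best)."""
    rank_probs = np.asarray(rank_probs, dtype=float)
    n_treatments = rank_probs.shape[1]
    if rank_probs.shape[0] != n_treatments:
        raise ValueError("expected a square treatments-by-ranks matrix")
    # Cumulative probability of being among the best r treatments,
    # for r = 1 .. a - 1 (the last column is always 1 and is dropped).
    cumulative = np.cumsum(rank_probs, axis=1)[:, :-1]
    return cumulative.sum(axis=1) / (n_treatments - 1)

# Hypothetical rank probabilities for three treatments.
probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
]
print(np.round(sucra(probs), 3))
```

A treatment that is certain to rank first has a SUCRA of 1, and one that is certain to rank last has a SUCRA of 0.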

Software
The parameters of the different models were estimated using a Markov Chain Monte Carlo method implemented in the JAGS software package. The first 30,000 iterations from the sampler were discarded as 'burn-in', and the inferences were based on an additional 50,000 iterations using two chains. For all analyses, model convergence was assessed through trace plots, density plots and Gelman-Rubin-Brooks (shrink factor) plots [38]. All analyses were performed using R version 3.4.4 (http://www.r-project.org/) and JAGS version 4.3. Code used to conduct the analyses is presented in the Additional file 1: Web Appendix.
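The shrink factor used in the convergence checks can be illustrated with a minimal sketch of the basic potential scale reduction factor. This is a simplified version written for illustration (it omits the degrees-of-freedom correction applied by some software) and is not the code used in the study; the function name and the simulated chains are hypothetical.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Basic Gelman-Rubin shrink factor for one parameter.

    chains: array of shape (n_chains, n_iterations) of post-burn-in draws.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()         # W
    between_over_n = chains.mean(axis=1).var(ddof=1)   # B / n
    pooled = (n - 1) / n * within + between_over_n
    return float(np.sqrt(pooled / within))

# Two hypothetical chains sampling the same distribution.
rng = np.random.default_rng(42)
draws = rng.normal(loc=0.0, scale=1.0, size=(2, 50_000))
print(f"shrink factor: {potential_scale_reduction(draws):.3f}")
```

Values close to 1 suggest that the chains are sampling the same distribution, while values well above 1 suggest that they have not yet mixed.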

Evidence base
Study and patient selection are presented in the PRISMA-IPD [33] flow diagram in Fig. 1. The search was conducted in three phases: the first search of AgD was conducted in May 2015 (the original SLR), a search for IPD was conducted on 15 August 2016, and an updated search of AgD was conducted on 12 February 2018. The IPD search in 2016 involved both YODA and CSDR; however, data were only obtained through CSDR. These included 2160 patients from FLAMINGO (DRV/r vs. DTG) [22,23], SINGLE (DTG vs. EFV) [24][25][26][27], and SPRING-2 (DTG vs. RAL) [28,29]. As shown in Fig. 1, the 2160 patients for which individual patient-level data were available represent 6.5% of the total evidence base (2160/33,148), and as shown in Fig. 2, the three trials cover 3 of the 24 edges (12.5%; shown in red) for which trials provide head-to-head evidence.
Overall study quality was generally high (i.e., low risk of bias). Exceptions were restricted to open-label trials having a high risk of bias due to blinding and some of the more recent trials that were only reported upon in posters having insufficient information to determine with certainty that the risk of bias was either low or high (Additional file 1: Web Appendix).
The patient characteristics have been described previously [21]. As shown in Figures 1-4 of the Additional file 1: Web Appendix, in addition to being the variables that were best reported in the evidence base, the covariates selected for adjustments in this study had a high degree of variability. This was especially apparent in the baseline CD4. For completeness, the reported results by study are provided in Tables 4-5 of the Additional file 1: Web Appendix.

Comparing meta-regression adjustments
Overall, the use of IPD appeared to have a negligible impact on the results. In each outcome, the use of IPD impacted an aspect of the results (say DIC, rankings or covariate estimates), but the aspect affected changed from one outcome to the next and tended to not be meaningful. The full set of results is shown for viral suppression at 48 weeks. For the remaining primary outcomes, tables and figures are presented in the Additional file 1: Web Appendix and only key highlights are discussed here.
Table 1 presents the model fit for the various models of interest for viral suppression at 48 weeks. The lowest DIC was for the unadjusted one-stage IPD-AgD NMA; however, the difference between it and the base model was not meaningful (a meaningful difference requires a difference ≥ 3, as per the SAP). The fit using the one-stage IPD-AgD NMA was considerably better than that using informative priors based on external analyses (two-stage empirical-priors approach). The use of IPD appeared to have minimal impact on the heterogeneity parameter estimate for this outcome (as calculated by the random-effects model).
The proportion of observations above the third and fourth parabola in the leverage vs. deviance plot tended to be stable. Nonetheless, the trend was towards having more outliers among the two-stage AgD NMA. Rankings remained generally unchanged by the model choice. Changes in rankings tended to happen in the models with the highest DICs, and hence those models were not at risk of being favoured. Changes in the top three rankings tended to be limited to a re-ordering of the same treatments, with DTG usually remaining on top (Additional file 1: Web Appendix). Table 2 presents the estimated effects for the comparisons of primary interest (DTG, EFV 400 and EFV). Meta-regression adjustments based on IPD tended to lower the estimated efficacy of DTG, but almost never rendered it non-significant. The exception was the use of hierarchical meta-regression, which was limited to single-variable adjustments. Importantly, these analyses produced much wider credible intervals than other analyses, and this was consistently observed throughout the outcomes. This aligns with results previously presented by Jansen [16]. These analyses also led to the largest shifts in estimates, and these were in either direction depending on the variable of adjustment. While these methods are noted for increasing validity, we cannot conclude that there was bias in the previous analyses on the basis of these results. Mean and maximum changes in the log-odds were large across all analyses. These changes are more easily interpretable through the change in proportions, where the maximum change was often close to 4%. The difference between 86% and 90% of patients being virally suppressed would have important implications.
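The translation from odds ratios to proportions referred to above can be sketched as follows. The conversion itself is standard; the function name is hypothetical, and the reference proportion of 86% is an illustrative assumption rather than an estimate from the study.

```python
def proportion_from_or(reference_proportion, odds_ratio):
    """Proportion in the comparator arm implied by an odds ratio
    applied to the proportion in the reference arm."""
    if not 0.0 < reference_proportion < 1.0:
        raise ValueError("reference proportion must be strictly between 0 and 1")
    reference_odds = reference_proportion / (1.0 - reference_proportion)
    comparator_odds = reference_odds * odds_ratio
    return comparator_odds / (1.0 + comparator_odds)

# Illustrative only: 86% viral suppression in the reference arm.
reference = 0.86
for odds_ratio in (1.94, 1.58):
    comparator = proportion_from_or(reference, odds_ratio)
    print(f"OR {odds_ratio}: {comparator:.1%} (difference {comparator - reference:+.1%})")
```

Because the mapping depends on the reference proportion, the same shift in the log odds ratio corresponds to different absolute changes in different populations.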
The estimated coefficients across the analyses are presented in Table 3. When comparing the meta-regression coefficients, the coefficient for CD4 was statistically significant in each of the IPD analyses that included it as a covariate. Moreover, its estimated effect size was consistent across the models using IPD. The coefficient estimates were notably different across AgD and IPD models, most of all for HIV RNA.
For change from baseline in CD4 at 48 weeks, no models led to a meaningfully lower DIC than the unadjusted AgD NMA; however, contrary to viral suppression, here it was the two-stage models that appeared to have the best fit among the IPD-adjusted models (DIC ranging from 182.02 to 184.23, relative to 183.63 for the base model and up to 191.41 for the rest). Moreover, the two-stage analyses also reduced the number of points outside the fourth parabola in the leverage plots to 0, suggesting an overall better fit to the data. The rankings were the measure most affected by choice of model for CD4. DTG was ranked first in the base case and in the IPD-AgD NMA, but EFV 400 was ranked first when using AgD meta-regression and two-stage IPD-AgD NMA. DTG remained the favoured treatment in the one-stage and two-stage empirical-priors approaches. With respect to the research question at hand, using a two-stage approach would impact how data were interpreted, given the change in rankings, particularly with DTG becoming a mid-ranked treatment and EFV 400 becoming the top-ranked treatment. Finally, with respect to CD4, most regression coefficients were not statistically significant, but similarly to the viral suppression analysis, the estimated coefficients using IPD were substantially different from those obtained through AgD meta-regression. For example, the effect of baseline HIV RNA went from 2.5 (95% CrI: −21.2, 26.7) to 45.5 (95% CrI: 31.3, 59.9). In other words, the AgD meta-regression estimated that, on average, a trial with a baseline HIV RNA that was one log unit higher had a relative change in CD4 that was 2.5 cells/mm³ higher, whereas the one-stage IPD-AgD NMA estimated an average increase that was 45.5 cells/mm³ higher (keeping in mind that trials did not differ by a full log unit of baseline HIV RNA).
For discontinuations, none of the models were meaningfully different from the base AgD NMA with respect to DIC. Changes in estimates tended to be minimal across models. Interestingly, the exception to this was the HMR IPD-AgD NMA with adjustments for the proportion of males, which was also the model with the lowest DIC. In this model, both DTG (OR: 0.36; 95% CrI: 0.22-0.57) and EFV 400 (OR: 0.61; 95% CrI: 0.30-1.23) were considerably more tolerable relative to EFV than in the unadjusted model, in which the ORs were 0.52 and 0.91, respectively.
Out of all the primary outcomes, only discontinuations due to adverse events had a model other than the unadjusted AgD NMA selected through a meaningfully lower DIC. In this case, it was the two-stage empirical-priors approach with adjustments for the proportion of males that was selected, with a DIC of 202.79 vs. 205.79. The one-stage analyses and two-stage empirical-priors analyses also led to a lower estimate of the between-study heterogeneity, suggesting that the adjustments helped account for between-study differences as well. The selected model shifted the principal comparison of interest from an OR of 0.28 (95% CrI: 0.17-0.44) to 0.37 (95% CrI: 0.23-0.58), but this would have little impact on decision making. With respect to absolute effects, most model adjustments led to minimal differences. This aligns well with the fact that none of the covariates were found to be statistically significant. The rankings were stable across models; however, with the selected model, DTG changed from being ranked 1st to being ranked 2nd.
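The selection rule applied above (prefer an adjusted model only when its DIC is at least 3 units below that of the unadjusted AgD NMA) can be written as a short sketch. The function name, the model labels and the second candidate DIC are hypothetical; rounding to two decimals is an implementation choice that matches the precision at which DIC values are reported and avoids floating-point artefacts at the threshold.

```python
def select_model(base_dic, candidate_dics, threshold=3.0):
    """Return the name of the candidate with the lowest DIC if it beats
    the base model by at least `threshold`; otherwise return 'base'."""
    if not candidate_dics:
        return "base"
    best_name = min(candidate_dics, key=candidate_dics.get)
    improvement = round(base_dic - candidate_dics[best_name], 2)
    return best_name if improvement >= threshold else "base"

# DIC values reported for discontinuations due to adverse events,
# plus one made-up competitor for illustration.
candidates = {
    "two-stage empirical priors (males)": 202.79,
    "hypothetical competitor": 204.50,
}
print(select_model(205.79, candidates))
```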

Comparative efficacy and safety
For the most part, the analyses of the secondary outcomes showed impacts similar to those observed in the four selected outcomes above. Only in the case of viral suppression at 96 weeks was an adjusted model, the model adjusted for baseline HIV RNA, selected instead of the unadjusted model. As shown in Table 4, the DIC for the selected model was more than 12 units smaller than that of the AgD NMA. The table also shows that other adjustments led to similar DICs, but in this case, we selected the model with the smallest DIC. There was no meaningful impact with respect to rankings across outcomes.
The impact of adjustments with IPD on the actual estimates was noticeable, particularly in the case of viral suppression and change in CD4 cell counts at 96 weeks. In the case of viral suppression, the relative efficacy of DTG was reduced relative to both EFV and EFV 400. In the selected model, the OR decreased from 1.94 (95% CrI: 1.52, 2.48) to 1.58 (95% CrI: 1.23, 2.03) relative to EFV, with a similar change relative to EFV 400. While none of the effects changed with respect to statistical significance, the average change in modeled proportions was rather large, at a mean shift of 4.1% in the selected model.

Discussion
This study examined the change in outputs in the evidence synthesis of ART among first-line HIV patients when including IPD and compared the extent of this impact using different established IPD-based methods for meta-regression adjustments utilizing a mixture of IPD and AgD. The four methods of adjusting for covariate imbalances using IPD that were compared are: a two-stage approach, a two-stage approach with empirical priors, a one-stage approach, and hierarchical meta-regression. In this case study, none of the four methods stood out as a clearly superior approach solely on the basis of the numerical results. Nonetheless, this study does provide insights into these methods of adjustment. First, while the four strategies were in general agreement in most analyses, there were situations where the results differed notably between the two-stage approach and other approaches, and thus the choice of method matters. Second, the hierarchical meta-regression tended to lead to the most considerable changes in effect estimates, but did so at the steep cost of reduced precision. Third, there was a remarkable difference between the coefficient estimates obtained through IPD methods and those obtained through more traditional meta-regression using AgD only, suggesting that when adjustments are needed, IPD is more appropriate to use.
This study also aimed to understand the potential impact of including individual patient data for the particular application of comparing the therapeutic landscape of anchor treatments in first-line ART for the treatment of HIV. To this end, it was reassuring to find that the conclusions reached through the evidence synthesis supplemented by the individual patient data did not lead to changes that would have impacted the WHO change in guidelines that took place in December 2018 and subsequently in 2020 [39,40].
The possibility that the limited impact of IPD on study results is due in part to the relatively small number of patients in the network providing IPD was investigated through a separate simulation study [41]. The simulation study was born from this work. The aim of the simulation was to investigate various network factors that could be associated with the degree of benefit from including IPD, rather than to compare the various methods of adjustment, as was the goal here. The simulation study did find that the benefits of IPD are greater in small and/or sparse networks and that having too few IPD leads to negligible benefits. Another possible reason for the lack of differences between methods is a lack of ecological fallacy (whereby trends in AgD do not reflect the trends in IPD), which is when differences between IPD and AgD adjustments are most important. Nonetheless, it is important to note that while there were minimal differences in the results between the multiple modeling methods, this does not imply that there are no differences between the methods. Several differences are still distinguishable within this case study, as further explained below.
Despite the limited impact on the interpretation of the therapeutic landscape on the basis of IPD, there are a number of advantages to the use of IPD that were observed and that have been discussed previously [6]. First, IPD more easily allows for the simultaneous adjustment of multiple covariates because it provides many more degrees of freedom. In an AgD setting, only edges with multiple trials and differences in covariate values along those edges allow for the estimation of the covariate effect of interest. Second, the results of this study suggested that where traditional AgD meta-regression was feasible, it was underpowered, as demonstrated by the estimated coefficients. Under the assumption that the IPD estimates based on 2160 data points are more accurate than the meta-regression adjustments based on trends among a small number of aggregate data points, the large differences seen in estimates suggest inaccuracy in the AgD meta-regression.
Table footnote: AgD, aggregate data; IPD, individual patient data; NMA, network meta-analysis; DIC, deviance information criterion; pD, effective number of parameters; prop3, proportion of observations above deviance² + leverage = 3; prop4, proportion of observations above deviance² + leverage = 4. Between-study heterogeneity was obtained from the random-effects model, not from the fixed-effect model if it was selected.
There is a clear trend towards improved access to IPD and its increased use [11,42,43]. The most popular IPD methods have the distinct advantage of being able to adjust for unanchored networks, but they require strong assumptions (no unobserved prognostic factors or effect-modifiers) and are usually limited to indirect comparisons [8,44]. As the use of IPD increases, we can expect increased use of IPD-AgD NMA, such as the methods compared in this study. In meta-analyses and network meta-analyses, there has been a shift from the predominant use of a two-stage approach to a one-stage approach [6]. As Simmonds et al. explain in their review, this is likely due to a growing familiarity with the methods, improvements in computing and the recognition that regression models offer the greatest flexibility for IPD analysis [6]. The two-stage analyses in this study used regression in the first stage, which has not always been the case in published two-stage analyses [6]. To the best of our knowledge, no study has directly compared the results of one-stage and two-stage IPD-AgD NMA. In most analyses, there were no meaningful differences between the results of the two approaches. Nonetheless, there were instances where the one-stage and two-stage adjustments went in opposite directions. This may be a result of the regression adjustments for the IPD being made independently for each trial in the two-stage approach, rather than collectively. In the absence of differences, the two-stage approach had the advantage of being computationally less intensive and easier to code. Conversely, the one-stage approaches had the benefit of more easily interpretable regression coefficients and of combining all the analytical steps. Given these advantages and the fact that the choice appeared to matter for some analyses, the recommendation would be to not use the traditional two-stage approach.
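One possible mechanism for adjustments going in opposite directions can be sketched with ordinary least squares on a continuous outcome. This is a simplified illustration, not the models fitted in this study: the data are invented, a single covariate slope stands in for the regression adjustment, and the function names are hypothetical. Slopes are estimated first within each trial separately, as in the first stage of a two-stage analysis, and then as one slope shared across trials with trial-specific intercepts, as a one-stage model would typically assume.

```python
def slope_parts(xs, ys):
    """Return (sxy, sxx), the centred cross-product and sum of squares for one trial."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy, sxx


def per_trial_slopes(trials):
    """Slope estimated independently within each trial."""
    slopes = []
    for xs, ys in trials:
        sxy, sxx = slope_parts(xs, ys)
        slopes.append(sxy / sxx)
    return slopes


def common_slope(trials):
    """Single slope shared by all trials, allowing a separate intercept per trial."""
    parts = [slope_parts(xs, ys) for xs, ys in trials]
    return sum(sxy for sxy, _ in parts) / sum(sxx for _, sxx in parts)


# Invented data: a small trial with a weak negative trend and a larger
# trial with a clear positive trend.
trial_a = ([0, 1, 2], [1.0, 0.9, 0.8])
trial_b = ([0, 2, 4, 6], [0.0, 1.0, 2.0, 3.0])
trials = [trial_a, trial_b]

print(per_trial_slopes(trials))  # one slope per trial
print(common_slope(trials))      # one slope for all trials
```

In this invented example the small trial's own slope is negative while the shared slope is positive, so an adjustment applied to that trial would change sign depending on whether its coefficient is estimated alone or jointly.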
The choice between one-stage IPD-AgD NMA and two-stage IPD-AgD NMA with empirical-priors is less straightforward and ultimately depends on the evidence base at hand. The difference between these two approaches was much more subtle. The empirical-priors method does not appear to have been used previously. As described in the methods, the motivation for its use was to isolate the coefficient estimation to the IPD (i.e., reduce the influence of the AgD on the estimation of the regression adjustments). As such, the greatest differences are seen in comparisons for which there are no IPD, so this method becomes more important when numerous comparisons are informed by AgD only. Inspection of the DTG vs. EFV estimates, for which there was an IPD trial, reveals general agreement between the two modeling approaches (when keeping the same covariates). On the other hand, for the EFV 400 vs. EFV comparisons, for which no IPD were available, the difference was notable, with the empirical-priors approach leading to a larger shift in estimates. In situations where many trials and treatment comparisons have IPD, such as in the Donegan et al. example [45], the one-stage approach, which is already well adopted, would be recommended. For networks of evidence that have few treatment comparisons with IPD trials, the empirical-priors approach is likely to make the most of the available IPD.
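The intuition behind the empirical-priors approach can be illustrated with a conjugate normal update for a single regression coefficient. This is a sketch under simplifying assumptions and does not reproduce the prior specification described in the methods; all numbers and names are hypothetical. An informative prior derived from the IPD keeps the coefficient close to the IPD estimate even when noisy AgD point elsewhere, whereas a vague prior lets the AgD drive the estimate.

```python
from math import sqrt


def normal_update(prior_mean, prior_sd, estimate, se):
    """Posterior mean and SD for a normal prior combined with a normal likelihood."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / se ** 2
    post_prec = prior_prec + data_prec
    # Precision-weighted average of the prior mean and the data estimate.
    post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
    return post_mean, sqrt(1.0 / post_prec)


# Hypothetical values: coefficient estimated from IPD alone (first stage) ...
ipd_coef, ipd_sd = 0.40, 0.10
# ... and a noisy, conflicting signal implied by aggregate data alone.
agd_coef, agd_se = -0.20, 0.80

# Empirical prior: centred on the IPD estimate with the IPD uncertainty.
empirical = normal_update(ipd_coef, ipd_sd, agd_coef, agd_se)
# Vague prior: the aggregate data dominate the coefficient.
vague = normal_update(0.0, 10.0, agd_coef, agd_se)

print(empirical)
print(vague)
```

Because the posterior mean is a precision-weighted average, the tighter the IPD-based prior is relative to the aggregate signal, the less the AgD can move the coefficient, which matches the stated motivation of isolating coefficient estimation to the IPD.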
Although hierarchical meta-regression has shown some promising results, more research on these methods is still needed. Simulation work has suggested that these methods reduce bias [16], which is usually favoured over precision; however, the loss of precision observed in our work was not negligible. Moreover, it was difficult to use these methods with multiple covariates at a time, and methods for continuous outcomes have not yet been published. Once this method has been developed further, it will be worthwhile to revisit the comparison with traditional one-stage analyses.
As discussed above, the implications for first-line ART regimens (i.e., our secondary objective) are minimal. The evidence continues to support DTG as the more efficacious and tolerable choice of treatment. In instances where a model other than the unadjusted AgD NMA was selected, the differences between treatments tended to be less pronounced, although DTG continued to perform best with respect to viral suppression, change in CD4 and tolerability.
There are several limitations to this study. First and foremost, IPD were obtained for very few trials, a problem commonly encountered by researchers. These trials represented a small fraction of the trials and patients in the network, which may explain why the impact on model estimates appeared to be somewhat muted (i.e., too few IPD may get washed out in a large network). This limitation was exacerbated by the missed opportunity to obtain IPD for the SPRING-1 trial. The oversight was identified too late in the process to be corrected. Given that this was a small Phase 2 trial that would have added a small fraction of patients to an already small sample of IPD, the impact of including or excluding its IPD is very likely to be negligible. Moreover, the SPRING-1 trial was still included in the analyses. Second, use of a single case study, particularly one with few IPD relative to the size of the network, limits the generalizability of the comparisons between the different methods of adjustment to other settings. To this end, while some conclusions have been reached, further research will be needed. Third, it is unclear whether the multiple forms of meta-regression interfered with one another. To account for differences in backbone regimens, an arm-based meta-regression was used in addition to the more traditional trial/patient-based regression adjustments, and this may have complicated the modeling process. Fourth, the trials for which IPD were available were principally conducted in high-income countries, which may limit the ability to make the adjustments needed for studies conducted in LMICs. Nonetheless, there tended to be a wide range of values for the covariates of interest, so this is unlikely to have been an issue [22,23,25]. Fifth, specific to this evidence base, there were numerous other potential effect-modifiers that were too poorly reported to allow for meta-regression adjustments to be made. These principally included ethnicity and acquisition risk groups. Finally, due to low event counts and data unavailability, not all outcomes were available for re-analysis using IPD.

Conclusion
There are many ways in which IPD can be integrated with AgD for the purpose of NMA, and the choice of method can affect the results. In most cases, the one-stage approach is recommended; however, in situations where few treatment comparisons have IPD, the empirical-priors approach is a viable alternative. Further research is needed to understand whether having too few IPD can diminish their beneficial impact. Finally, even with the revised analyses, DTG continues to demonstrate improved efficacy and tolerability over other anchor treatments.