Comparing Bayesian hierarchical meta-regression methods and evaluating the influence of priors for evaluations of surrogate endpoints on heterogeneous collections of clinical trials
BMC Medical Research Methodology volume 24, Article number: 39 (2024)
Abstract
Background
Surrogate endpoints, such as those of interest in chronic kidney disease (CKD), are often evaluated using Bayesian meta-regression. Trials used for the analysis can evaluate a variety of interventions for different sub-classifications of disease, which can introduce two additional goals in the analysis. The first is to infer the quality of the surrogate within specific trial subgroups defined by disease or intervention classes. The second is to generate more targeted subgroup-specific predictions of treatment effects on the clinical endpoint.
Methods
Using real data from a collection of CKD trials and a simulation study, we contrasted surrogate endpoint evaluations under different hierarchical Bayesian approaches. Each approach we considered induces different assumptions regarding the relatedness (exchangeability) of trials within and between subgroups. These include partial-pooling approaches, which allow subgroup-specific meta-regressions yet facilitate data-adaptive information sharing across subgroups to potentially improve inferential precision. Because partial-pooling models come with additional parameters relative to a standard approach assuming one meta-regression for the entire set of studies, we performed analyses to understand the impact of the parameterization and priors, with the overall goals of comparing precision in estimates of subgroup-specific meta-regression parameters and predictive performance.
Results
In the analyses considered, partial-pooling approaches to surrogate endpoint evaluation improved accuracy of estimation of subgroup-specific meta-regression parameters relative to fitting separate models within subgroups. A random rather than fixed effects approach led to reduced bias in estimation of meta-regression parameters and in prediction in subgroups where the surrogate was strong. Finally, we found that subgroup-specific meta-regression posteriors were robust to use of constrained priors under the partial-pooling approach, and that use of constrained priors could facilitate more precise prediction for clinical effects in trials of a subgroup not available for the initial surrogacy evaluation.
Conclusion
Partial-pooling modeling strategies should be considered for surrogate endpoint evaluation on collections of heterogeneous studies. Fitting these models comes with additional complexity related to choosing priors. Constrained priors should be considered when using partial-pooling models when the goal is to predict the treatment effect on the clinical endpoint.
Background
There is broad interest in the use of validated surrogate endpoints to expedite clinical trials in areas of slowly progressing disease, such as chronic kidney disease (CKD) [1,2,3,4,5]. A surrogate endpoint is typically a measure of disease progression captured earlier than an established clinical endpoint and should have the property that the treatment effect on the surrogate accurately predicts the treatment effect on the clinical endpoint [6,7,8]. This predictive potential is commonly established in a meta-regression analysis of previously conducted trials, where the meta-regression quantifies the strength of the association between treatment effects on the clinical and surrogate endpoints [3,4,5,6,7,8]. Accurate estimation of the meta-regression parameters requires variability in the treatment effects on the surrogate and clinical endpoints across trials used for analysis. To achieve this, the collection of trials can contain heterogeneity in terms of interventions and sub-classifications of disease [3, 4]. There is often interest among entities such as regulatory agencies regarding the performance of the surrogate in pre-specified, clinically or biologically motivated, and mutually exclusive subgroups defined by intervention or disease classes [1]. These interests introduce two specific goals the analytical approach must facilitate. The first is accurate estimation of subgroup-specific meta-regression parameters. The second is accurate prediction of treatment effects on the clinical endpoint, either for subgroups used in model fitting or for those not available for model fitting (e.g., for a novel intervention).
One meta-regression methodology involves a Bayesian hierarchical model, which can be used to account for estimation error of the treatment effects on both endpoints as well as the correlation of the sampling errors (a frequently used weighted generalized linear regression approach accounts only for sampling error of the effect estimate on one of the two endpoints) [6, 8, 9]. Under the hierarchical Bayesian approach, it is common to assume all trials used in the analysis to be fully exchangeable despite underlying differences in interventions or diseases across trials [4,5,6, 8]. In effect, this is accomplished by fitting a model with a single meta-regression, relating treatment effects on the clinical endpoint to those on the surrogate endpoint, to all trials available for the analysis. We refer to this as the “full-pooling” approach. Alternatively, distinct meta-regressions can be fit within subgroups in what we will refer to as the “no-pooling” approach [4, 7]. There are often too few trials and insufficient variability in treatment effects within subgroups to estimate the meta-regression parameters with satisfactory precision under a strict no-pooling strategy. An additional drawback of the full-pooling and no-pooling strategies is that each limits model-based prediction of the treatment effect on the clinical endpoint in a future trial. This is especially the case when there is interest in prediction for a trial of a “new subgroup”, one that was not available for the initial surrogacy evaluation. After all, in the ideal scenario a surrogate can be used for a trial evaluating a novel intervention or when applying an approved indication in a new patient population. Use of a full-pooling model requires the assumption that any future trial is fully exchangeable with the previous trials. Use of a no-pooling approach requires the future trial to be of a subgroup used for the surrogacy evaluation (“existing subgroup”).
Bayesian hierarchical meta-regression lends itself naturally to a “partial-pooling” compromise between these earlier approaches, where a between-subgroup distribution is assumed for some or all subgroup-specific model parameters [7]. The partial-pooling approach relaxes the assumption of full exchangeability of all trials used for the analysis, can improve precision of inference on subgroup-specific parameters due to data-adaptive information sharing across subgroups, and provides a framework for model-based prediction of an effect on a clinical endpoint for a trial of either an existing or a new subgroup. However, critical decisions needed to fit models of this class are without empirical guidance in the literature. For example, fixed and random effects approaches are used interchangeably when employing full-pooling models, and the implications of these two approaches are not well understood under a partial-pooling model [8]. To our knowledge, there is also not yet work evaluating the impact of the choice of priors under partial-pooling strategies, even though the role of certain prior distributions is likely to be amplified in the common scenario in which the number of subgroups is small.
In this paper, we provide results from a series of analyses intended to help guide practical decision making for surrogate endpoint evaluations on collections of heterogeneous studies. We explore the extent to which partial-pooling approaches improve precision in key posteriors of interest in surrogate evaluation and the extent to which bias occurs, contrast fixed and random effects variants of the models described, and explore the impact of priors. In the Methods section, we describe the modeling approaches evaluated, priors, and how these methods can be used for prediction. In the Results section, we provide results of a limited simulation study and of an applied analysis of CKD trials. We then conclude with the Discussion section.
Methods
Modeling approaches to the trial-level analysis of a surrogate
For the trial-level evaluation of a surrogate endpoint, a two-stage approach to the analysis is often used [6,7,8]. In the first stage, treatment effects on both the clinical and surrogate endpoint as well as standard errors and a within-study correlation between the errors of the estimated effects are calculated for each trial. These trial-level measures are used as the data input in the meta-regression evaluation (the second stage). A two-level hierarchical model for the meta-regression can be used to account for within-study estimation error for both treatment effects [4,5,6,7,8].
Under the two-stage approach, one key distinction between commonly used second-stage models involves whether true treatment effects on the surrogate endpoint are viewed as fixed or random [6, 8]. Under the fixed effects approach, the true treatment effects on the surrogate endpoint are fixed and the true effects on the clinical endpoint are regressed on the true effects on the surrogate assuming Gaussian residuals. Under the random effects approach, the true treatment effects on both the surrogate and the clinical endpoints are assumed to follow a bivariate normal distribution [4, 5, 8]. The within-study joint distribution can be reasonably approximated with a bivariate normal distribution due to asymptotic normality, but the bivariate normality assumption for the between-study model is made for modeling convenience. Bujkiewicz et al. contrast the predictive performance of a surrogate under fixed and random effects approaches when using the full-pooling approach, but do not summarize differences in estimates of key parameters such as the meta-regression slope [8]. Papanikos et al. evaluate and contrast different fixed effects approaches in subgroup analyses of a surrogate, but do not compare fixed and random effects approaches [7]. We hypothesized that the fixed and random effects approaches could produce differing results because there may be more or less shrinkage in the true effects on the surrogate across trials (the “x-axis” variable in the regression) depending on the method used.
We next introduce the full-pooling random and fixed effects models, which are applicable when the clinical trials being analyzed can be regarded as exchangeable. Let there be N total clinical trials, each of which compares an active treatment to a control. For trials \(j = 1, \dots , N\), \((\widehat{\theta }_{1j}, \widehat{\theta }_{2j})'\) jointly represents the suitably scaled within-study estimates of treatment effects on the clinical and surrogate endpoints for trial j. The pair \((\theta _{1j}, \theta _{2j})'\) represents the latent joint true treatment effects on the clinical and surrogate endpoints in study j. We let \(\Sigma _j\) denote a within-study variance-covariance matrix for study j: \(\Sigma _{j1,1} = SE(\widehat{\theta }_{1j})^2\) is the squared standard error of the estimated clinical effect, \(\Sigma _{j2,2} = SE(\widehat{\theta }_{2j})^2\) is the squared standard error of the estimated surrogate effect, and \(\widehat{r}_j\) is the estimated within-trial correlation for study j, implying \(\Sigma _{j1,2} = \Sigma _{j2,1} = \widehat{r}_j SE(\widehat{\theta }_{1j}) SE(\widehat{\theta }_{2j})\). When the standard errors and within-study correlation are available, it is customary to consider all entries of \(\Sigma _j\) fixed and known [6,7,8, 10, 11]. For the random effects model, \(\mu _s\) represents a population average true treatment effect on the surrogate, and \(\sigma _s^2\) the between-trial variance in true effects on the surrogate. We parameterize the model such that \(\alpha\) denotes the meta-regression intercept, \(\beta\) the slope, and \(\sigma _e\) the residual standard deviation. The following represents the full-pooling random effects model (FPRE).
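Written out from the definitions above, the FPRE model has three layers: a within-study bivariate normal for the estimated effects, a Gaussian regression of the true clinical effect on the true surrogate effect, and a Gaussian distribution for the true surrogate effects. The display below is a reconstruction from the notation of this section, so its layout is an assumption and equation numbers are omitted.

\[
\begin{aligned}
\begin{pmatrix} \widehat{\theta }_{1j} \\ \widehat{\theta }_{2j} \end{pmatrix} \Big | \begin{pmatrix} \theta _{1j} \\ \theta _{2j} \end{pmatrix} &\sim N\left( \begin{pmatrix} \theta _{1j} \\ \theta _{2j} \end{pmatrix}, \Sigma _j \right) \\
\theta _{1j} \mid \theta _{2j} &\sim N(\alpha + \beta \theta _{2j}, \sigma _e^2) \\
\theta _{2j} &\sim N(\mu _s, \sigma _s^2)
\end{aligned}
\]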
To fit a full-pooling fixed effects model (FPFE), rather than assuming a Gaussian distribution for which parameters will be estimated for \(\theta _{2j}\) as above, an independent prior is assigned directly to each \(\theta _{2j}\).
Next, suppose that the N trials are to be divided into I total subgroups because exchangeability is plausible for the trials within each subgroup but not necessarily between trials in different subgroups. In our experience, regulatory agencies have expressed concern about heterogeneity in surrogate quality across pre-specified subgroups present in the data being used to evaluate CKD-relevant surrogate endpoints. The models discussed throughout the remainder of this paper are thus intended for similar scenarios where: (i) the I subgroups which motivate concern over the full exchangeability of trials (i.e., there might be a different association between treatment effects on the clinical and surrogate endpoint depending on the subgroup a trial pertains to) are presented to the statistical analyst independent of any statistical criteria; (ii) subgroup assignment for the trials available for model fitting is not ambiguous (e.g., the inclusion and exclusion criteria of a trial would easily determine the subgroup assignment if disease-based subgroups are of interest); and (iii) trials cannot be misclassified into the wrong subgroups. When such an analytical scenario is presented, we might first consider fitting separate models within each subgroup. For \(i = 1,\dots , I\), the following represents what we refer to as a no-pooling random effects (NPRE) model for the \(j^{\text {th}}\) trial within the \(i^{\text {th}}\) subgroup.
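Using the subgroup-indexed symbols that appear elsewhere in this paper (\(\theta _{2ji}\), \(\Sigma _{ji}\), \(\alpha _i\), \(\beta _i\), \(\sigma _{ei}\), \(\mu _{si}\), \(\sigma _{si}\)), the NPRE model can be written as the full-pooling model applied separately within each subgroup. As with the display for the full-pooling model, this is a reconstruction from the surrounding definitions and its layout is an assumption.

\[
\begin{aligned}
\begin{pmatrix} \widehat{\theta }_{1ji} \\ \widehat{\theta }_{2ji} \end{pmatrix} \Big | \begin{pmatrix} \theta _{1ji} \\ \theta _{2ji} \end{pmatrix} &\sim N\left( \begin{pmatrix} \theta _{1ji} \\ \theta _{2ji} \end{pmatrix}, \Sigma _{ji} \right) \\
\theta _{1ji} \mid \theta _{2ji} &\sim N(\alpha _i + \beta _i \theta _{2ji}, \sigma _{ei}^2) \\
\theta _{2ji} &\sim N(\mu _{si}, \sigma _{si}^2)
\end{aligned}
\]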
We note that one could fit a no-pooling fixed effects model by placing a prior directly on each \(\theta _{2ji}\), rather than assuming the Gaussian distribution as above.
For the partial-pooling approach, we can incorporate between-subgroup distributions as an intermediate layer in the Bayesian analysis to induce information sharing across subgroups [7, 12]. The terms controlling heterogeneity between subgroups are informed by the data. For example, if the data suggest a lack of between-subgroup heterogeneity for any given term, fitting this model should result in substantial information sharing and similar subgroup-specific parameter estimates. The partial-pooling model may generate some amount of bias, but could counterbalance this bias with increased precision due to information sharing [12]. Because between-subgroup variation drives the data-adaptive information sharing, among other reasons, between-subgroup variance terms were of primary interest in our investigation of the influence of priors.
A partial-pooling random effects (PPRE) model is displayed below. Additional model parameters are necessary to define this model. We let \(\mu _s\) and \(\sigma _s^2\) represent the between-subgroup average and variance of true treatment effects on the surrogate; \(\alpha\) and \(\sigma _{\alpha }^2\) and \(\beta\) and \(\sigma _{\beta }^2\) represent the between-subgroup average and variance of the meta-regression intercept and slope, respectively; \(\tau _s\) and \(\tau _e\) denote the between-subgroup mean log-transformed true surrogate effects standard deviation and meta-regression residual standard deviation, respectively; \(\gamma _s^2\) and \(\gamma _e^2\) denote the between-subgroup variance of the log-transformed within-subgroup true surrogate treatment effects standard deviation and meta-regression residual standard deviation, respectively.
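One way to write the PPRE model from the parameter definitions above is sketched here. The trial-level layers match the no-pooling model, and each subgroup-specific parameter receives a between-subgroup distribution. This is a reconstruction and not a verbatim copy of the published display. In particular, writing the location of the log-scale distributions as \(\log \tau _s\) and \(\log \tau _e\) is an assumption, motivated by the inverse-gamma priors placed on \(\tau _s^2\) and \(\tau _e^2\) in the Priors section; a parameterization with \(\tau _s\) and \(\tau _e\) themselves as the log-scale means is also consistent with the verbal definition.

\[
\begin{aligned}
\begin{pmatrix} \widehat{\theta }_{1ji} \\ \widehat{\theta }_{2ji} \end{pmatrix} \Big | \begin{pmatrix} \theta _{1ji} \\ \theta _{2ji} \end{pmatrix} &\sim N\left( \begin{pmatrix} \theta _{1ji} \\ \theta _{2ji} \end{pmatrix}, \Sigma _{ji} \right) \\
\theta _{1ji} \mid \theta _{2ji} &\sim N(\alpha _i + \beta _i \theta _{2ji}, \sigma _{ei}^2) \\
\theta _{2ji} &\sim N(\mu _{si}, \sigma _{si}^2) \\
\mu _{si} \sim N(\mu _s, \sigma _s^2), \quad \alpha _i &\sim N(\alpha , \sigma _{\alpha }^2), \quad \beta _i \sim N(\beta , \sigma _{\beta }^2) \\
\log (\sigma _{si}) \sim N(\log \tau _s, \gamma _s^2), \quad \log (\sigma _{ei}) &\sim N(\log \tau _e, \gamma _e^2)
\end{aligned}
\]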
If fitting a partial-pooling fixed effects (PPFE) model, a prior can be placed directly on each \(\theta _{2ji}\), rather than assuming the hierarchical Gaussian distribution displayed above. We display an example of a PPFE model here to contrast it with the PPRE model more clearly. In this example, we place an N(0,10\(^2\)) prior on each trial’s true treatment effect on the surrogate.
To our knowledge, there has been just one other paper to evaluate partial-pooling strategies for the trial-level analysis of a surrogate. As discussed in the introduction, Papanikos et al. evaluated different fixed effects partial-pooling approaches [7]. An additional difference between the PPFE model displayed above and those considered by Papanikos et al. is that there was not a between-subgroup distribution assumed for \(\sigma _{ei}\) in their models. One advantage of allowing a between-subgroup distribution for \(\sigma _{ei}\) is that it enables estimating posteriors for parameters defining between-subgroup distributions for all meta-regression parameters (intercept, slope, and residual variance). This subsequently facilitates prediction for a trial of a new subgroup, as is discussed in the Generating posterior predictive distributions section.
Analysis set 1: simulation study
We generated trial-level summary data (estimated treatment effects, standard errors, and the within-study correlations) based on four broad simulation setups, where within each we introduced two variants depending on the distribution used to simulate true treatment effects on the surrogate. The setups considered were motivated by applied data used to evaluate GFR slope. We consider three subgroups of trials as in previous evaluations of GFR slope and to reflect the likely scenarios where the available data limits the number of subgroups, stressing the potential for benefit from data-adaptive partial-pooling [4]. We simulated 15 medium-to-large trials per subgroup (standard errors on either endpoint reflect trials with roughly 300 to 2000 patients). Within-study correlations were drawn uniformly at random from the range of values present in our application data. Without loss of generality, we modeled a negative trial-level association. As discussed in the section titled Analysis set 2: application analysis of CKD trials, there is a negative association between treatment effects on the clinical endpoint and treatment effects on GFR slope. We also varied the sizes of subgroups and the degree of between-study variability in true effects on the surrogate. Broadly, we consider one setup (S1) where there is homogeneity in the quality of the surrogate across subgroups, another setup (S2) where the surrogate is weak in two subgroups and strong in another, another setup (S3) where the surrogate is weak in one subgroup and strong in the other two, and a final setup (S4) where surrogate quality is different in all three subgroups. The strength of the surrogate was defined by the true meta-regression \(R^2\). Earlier work has proposed that \(R^2 \in (0,0.49)\), \(R^2 \in (0.5,0.72)\), and \(R^2 \in (0.73,1)\) suggest a weak, moderate, and strong surrogate, respectively [13].
For our purposes, we simulated data from true parameter values to obtain \(R^2 = 0.35,0.65,0.95\) to define the surrogate as weak, moderate or strong within subgroups, respectively.
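To make the link between simulation parameters and the \(R^2\) targets concrete, the sketch below assumes the usual definition under a Gaussian random effects model, \(R^2 = \beta ^2\sigma _s^2 / (\beta ^2\sigma _s^2 + \sigma _e^2)\), and solves it for the residual standard deviation that yields a chosen \(R^2\). The function names and the slope and standard deviation values are illustrative assumptions and are not the values used in the study.

```python
import math


def true_r_squared(beta, sigma_s, sigma_e):
    """R^2 implied by slope beta, between-trial SD of true surrogate
    effects sigma_s, and residual SD sigma_e (assumed definition)."""
    explained = (beta * sigma_s) ** 2
    return explained / (explained + sigma_e ** 2)


def residual_sd_for_target(beta, sigma_s, r2):
    """Residual SD that produces a target R^2 for given beta and sigma_s."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("target R^2 must lie in (0, 1]")
    return abs(beta) * sigma_s * math.sqrt((1.0 - r2) / r2)


# Hypothetical slope and surrogate-effect SD, chosen only for illustration.
beta, sigma_s = -0.3, 1.0
for label, target in (("weak", 0.35), ("moderate", 0.65), ("strong", 0.95)):
    sigma_e = residual_sd_for_target(beta, sigma_s, target)
    print(label, round(sigma_e, 4), round(true_r_squared(beta, sigma_s, sigma_e), 2))
```

Holding the slope and the spread of surrogate effects fixed, a weaker surrogate corresponds to a larger residual standard deviation, which is one simple way to vary surrogate quality across subgroups.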
Consider the data generating model below for the first variant (V1) of the four simulation setups. To simulate estimated clinical and surrogate effects for trial j (\(j = 1,\dots , 15\)) in subgroup i (\(i = 1,2,3\)) when true surrogate effects are Gaussian, we first drew true surrogate effects from (9), then drew conditional true clinical effects from (10), and finally drew a pair of estimated effects using (11). The standard errors and within-study correlations forming the matrices \(\Sigma _{ji}\) were drawn according to the rules described above using uniform distributions to reflect variation in trial sizes.
We also sought to contrast results under the different models when true treatment effects on the surrogate were distinctly non-Gaussian (V2). We used the following data generating model, where true effects on the surrogate for each trial were drawn from a bimodal distribution (12).
To summarize results, we provide simulation average posterior medians, \(2.5^{\text {th}}\) and \(97.5^{\text {th}}\) percentiles for models fit across 100 simulated datasets per setup. We also summarize posterior predictive distributions (PPDs, described further below).
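The three-step V1 procedure described above can be sketched as follows: draw a true surrogate effect, draw the true clinical effect conditional on it, then draw the pair of estimated effects from a bivariate normal with within-study covariance \(\Sigma _{ji}\). This is a minimal sketch, not the simulation code from Additional file 1. The function names, the uniform ranges for standard errors and correlations, and the parameter values in the usage lines are all hypothetical.

```python
import math
import random


def simulate_trial(rng, mu_s, sigma_s, alpha, beta, sigma_e, se1, se2, r):
    """Simulate one trial's true and estimated effects (V1, Gaussian)."""
    # Step 1: true treatment effect on the surrogate.
    theta2 = rng.gauss(mu_s, sigma_s)
    # Step 2: true clinical effect given the true surrogate effect.
    theta1 = rng.gauss(alpha + beta * theta2, sigma_e)
    # Step 3: estimated effects with covariance
    # [[se1^2, r*se1*se2], [r*se1*se2, se2^2]] via a Cholesky factor.
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    est1 = theta1 + se1 * z1
    est2 = theta2 + se2 * (r * z1 + math.sqrt(1.0 - r * r) * z2)
    return {"theta1": theta1, "theta2": theta2, "est1": est1,
            "est2": est2, "se1": se1, "se2": se2, "r": r}


def simulate_subgroup(rng, n_trials, mu_s, sigma_s, alpha, beta, sigma_e,
                      se1_range=(0.10, 0.30), se2_range=(0.15, 0.45),
                      r_range=(-0.3, 0.3)):
    """Simulate a subgroup; SE and correlation ranges are hypothetical."""
    trials = []
    for _ in range(n_trials):
        se1 = rng.uniform(*se1_range)
        se2 = rng.uniform(*se2_range)
        r = rng.uniform(*r_range)
        trials.append(simulate_trial(rng, mu_s, sigma_s, alpha, beta,
                                     sigma_e, se1, se2, r))
    return trials


# Usage: one subgroup of 15 trials with a negative trial-level association.
rng = random.Random(2024)
subgroup = simulate_subgroup(rng, n_trials=15, mu_s=-0.5, sigma_s=0.4,
                             alpha=0.0, beta=-0.3, sigma_e=0.05)
```

Repeating the call with different values of the meta-regression parameters for each of the three subgroups would produce one simulated dataset for a given setup.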
Analysis set 2: application analysis of CKD trials
We compare analyses using the models discussed above on a set of 66 CKD studies. Data from these studies were collected by the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI), an international research consortium [3, 4]. Evaluations of GFR slope on this collection of studies have been described extensively [3, 4]. For the purposes of this paper, we focus on the GFR “chronic slope” as the surrogate [4]. Time-to-doubling of serum creatinine or kidney failure is used as the clinical endpoint, which is accepted by regulatory agencies and is widely used as the primary endpoint in pivotal phase 3 clinical trials of CKD [3]. Treatment effects on the clinical endpoint were expressed as log-transformed hazard ratios (HRs), estimated using proportional hazards regression. A shared parameter mixed effects model was used to jointly model longitudinal GFR trajectories and the time of termination of GFR follow-up due to kidney failure or death for each randomized patient. Treatment effects on the chronic GFR slope are expressed as the mean difference in the treatment arm slope minus the control arm slope, expressed in ml/min/1.73 m\(^2\) per year. Further detail on the methods used to estimate effects on GFR slope-based endpoints is described elsewhere in the literature [4, 14]. Finally, we obtained robust sandwich estimates of the within-study correlations using a joint model as in previous work by CKD-EPI [4].
Heterogeneity across the CKD-EPI trials can be attributed to many study-level factors. We consider four disease-defined subgroups (CKD with unspecified cause (CKD-UC), diabetes (DM), glomerular diseases (GN), and cardiovascular diseases (CVD)) and 16 intervention-defined subgroups (listed in Additional file 1: Section 1). For the application analyses, we focus on fitting the FPRE and PPRE models, and use different sets of priors under the PPRE model (we also contrast results under the PPRE and PPFE models where subgroups are defined by disease to complement certain simulation analyses). To capture the scenario where there is interest in prediction for a future trial of a new subgroup, we first fit models leaving out the CVD studies, and we generated PPDs for the studies left out. For intervention-defined subgroups, we fit the model for trials of the 7 subgroups for which there were at least 3 studies, and we then generated PPDs for studies of the remaining, smaller subgroups that were left out. We also summarize PPDs obtained for studies of the subgroups used for model fitting under these two subgroup schemas.
Priors
For the purposes of the simulation study, we utilized diffuse priors, which is a common practice in surrogate endpoint evaluations [4, 6,7,8]. For the full-pooling and no-pooling models, we used the \(N(0,10^2)\) prior for the intercept (\(\alpha\) or \(\alpha _i\)) and slope (\(\beta\) or \(\beta _i\)), and for the mean true treatment effect on the surrogate (\(\mu _s\), \(\mu _{si}\) under random effects models) or for trial-specific true effects on the surrogate when fitting the fixed effects models (\(\theta _{2ji}\)). As in previous work in CKD, we used inverse-gamma priors on variance terms (\(\text {IG(a,b)}\) for shape \(\text {a}\) and scale \(\text {b}\)) [4, 5]. For the full-pooling and no-pooling models, we used \(\sigma _{ei}^2,\sigma _{e}^2 \sim \text {IG}(0.001,0.001)\). Where appropriate (random effects models), we also used \(\sigma _s^2,\sigma _{si}^2 \sim\) \(\text {IG}(0.001,0.001)\). The \(\text {IG}(0.001,0.001)\) prior is considered an approximation to the Jeffreys prior. For partial-pooling models, we let \(\tau _e^2 \sim \text {IG}(0.0025,0.001)\) and \(\gamma _e \sim \text {half-normal}(0,3^{2})\), and for the random effects variants \(\tau _s^2 \sim \text {IG}(0.0025,0.001)\) and \(\gamma _s \sim \text {half-normal}(0,3^2)\). This combination translates to priors for within-subgroup standard deviations in the partial-pooling models matching those of the no-pooling models to the extent that the 25\(^{\text {th}}\), 50\(^{\text {th}}\), and 75\(^{\text {th}}\) prior percentiles differed by less than 0.05. For \(\sigma _{\alpha }\), \(\sigma _{\beta }\), and \(\sigma _{s}\), we used \(\text {half-normal}(0,2^2)\). These specific half-normal priors should be considered highly diffuse for all of our analyses.
For our application analyses, we considered three variations on priors when employing the PPRE model. We considered different priors for partial-pooling models because we hypothesized that not only narrow priors, but also highly diffuse priors could unduly influence certain results of our analyses. This is because there is often a limited number of studies available for meta-analysis, which can limit the number of subgroups. The categorization of studies based on constructs such as disease subtype or treatment comparison class may also provide a small number of subgroups. When there are just a few subgroups, the data provide very little information on subgroup-to-subgroup variation. The posteriors for between-subgroup variance terms may be more likely to exhibit minimal updates from the priors based on the data. As such, if priors are so diffuse that they represent a range of variability that is beyond practical reality, so too could the posteriors. As described below, this is also important because between-subgroup variance parameters are utilized in generating posterior predictive distributions for a trial of a new subgroup. A practical degree of narrowing certain priors could be seen as a necessary middle ground between use of overly narrow or overly diffuse priors. While we narrowed all priors for our constrained “sets” considered, the priors we focused on were for between-subgroup standard deviations for meta-regression parameters. We first used the fully diffuse priors displayed above.
We then employed an iterative procedure, where we narrowed priors (emphasizing between-subgroup standard deviation parameters such as \(\sigma _{\alpha },\sigma _{\beta },\gamma _e\)) until a set was found that produced no more than 0.05 difference in the posterior median, 2.5\(^{\text {th}}\), and 97.5\(^{\text {th}}\) percentiles for the within-subgroup meta-regression posteriors, no matter how much narrower posteriors on between-subgroup parameters became (referred to as “Constrained Priors Set 1”, which were ultimately the same for either subgroup classification). Finally, we chose what we will refer to as “domain-constrained” priors (“Constrained Priors Set 2”). It is reasonable to choose a prior that constrains between-subgroup variability to a range that is plausible based on subject matter expertise (e.g., through a prior elicitation process). For example, in our case the intercept is the expected true log-HR on the clinical endpoint when the true effect on the surrogate is the null effect. When there is a null effect on the surrogate, we may suspect a low probability of an expected HR on the clinical endpoint that is very strong in either direction (e.g., below 0.5 or above 2.0), and this logic can be used to provide a moderate to low probability for subgroup-specific intercepts to go beyond these values. Domain-constrained priors were the narrowest among those considered for our analyses, and further detail on choosing these priors is provided in Section 2 of Additional file 1.
We also wish to emphasize that there is an important distinction between narrowing priors for the terms that define variability in the treatment effects on the surrogate across studies, and narrowing priors for the meta-regression parameters. The degree of variability of treatment effects on the surrogate influences the extent to which the data allow the quality of the surrogate to be inferred. Priors for the distribution(s) of true treatment effects on the surrogate should be left sufficiently diffuse so as not to restrict variation in effects across studies. In our cases, these were narrowed because the diffuse priors typically used are excessively wide relative to the range of treatment effects that are reasonable. The priors of primary interest are again those governing the degree of variability between subgroups in the meta-regression terms (e.g., \(\sigma _{\beta }\)).
Generating posterior predictive distributions
There are a number of strategies that can be used to generate PPDs for the treatment effect on the clinical endpoint based on the treatment effect on the surrogate. In our simulation study, we compare summaries of PPDs for the true treatment effect on the clinical endpoint, which only take into account uncertainty in the estimated meta-regression parameters. This is possible in a simulation analysis because we actually know the true effect on the surrogate [7]. For each study left out of model fitting, let the true effect on the surrogate for that study be denoted \(\theta _{2}^N\). Then, the PPD for the true effect on the clinical endpoint is generated by taking one draw for each of the \(m=1,\dots ,M\) posterior draws obtained in model fitting from \(N(\alpha ^{*m} + \beta ^{*m}\theta _{2}^N,\sigma _e^{*m2})\), where \(\alpha ^{*m}, \beta ^{*m}, \sigma _e^{*m}\) represent draws from posteriors from either the full-pooling, no-pooling or partial-pooling models. For our purposes, subgroup-specific parameters were used when trials were simulated from the same subgroup if using no-pooling or partial-pooling.
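The PPD construction for the true clinical effect described above can be sketched as follows: for each posterior draw of the intercept, slope, and residual standard deviation, one value is drawn from the corresponding normal distribution at the known true surrogate effect. This is a minimal sketch under the assumption that posterior draws are available as (intercept, slope, residual SD) tuples; the function names and the toy draws are hypothetical, and in practice the draws would come from the fitted model.

```python
import random
import statistics


def ppd_true_clinical_effect(posterior_draws, theta2_new, rng):
    """One predictive draw per posterior draw (alpha, beta, sigma_e)."""
    return [rng.gauss(alpha + beta * theta2_new, sigma_e)
            for (alpha, beta, sigma_e) in posterior_draws]


def summarize_ppd(ppd):
    """Median and 2.5th / 97.5th percentiles of the predictive draws."""
    cuts = statistics.quantiles(ppd, n=40)  # cut points every 2.5%
    return {"median": statistics.median(ppd),
            "p2.5": cuts[0], "p97.5": cuts[-1]}


# Usage with toy stand-ins for posterior draws (not fitted values).
rng = random.Random(7)
toy_draws = [(rng.gauss(0.0, 0.02), rng.gauss(-0.3, 0.03),
              abs(rng.gauss(0.05, 0.01))) for _ in range(4000)]
summary = summarize_ppd(ppd_true_clinical_effect(toy_draws, -0.75, rng))
```

Because each predictive draw is paired with its own posterior draw, the resulting interval reflects both the residual variation and the uncertainty in the meta-regression parameters.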
In application analyses, it is only possible to obtain the PPD for the estimated effect on the clinical endpoint, which involves a procedure that takes into account not only uncertainty in the meta-regression posteriors, but also uncertainty due to sampling error in the treatment effect estimates. Section 3 of Additional file 1 provides further detail on the procedures used for prediction in our application analyses. We provide an overview here. For one part of our application analyses, we generated PPDs for trials of existing subgroups. Under full-pooling models, we directly used the single set of estimated meta-regression posteriors to map the effect on the surrogate to a predicted effect on the clinical endpoint. Under no-pooling and partial-pooling models, we used the appropriate subgroup-specific meta-regression posteriors estimated directly in model fitting (e.g., to make a prediction for a trial of subgroup \(i \in \{1,\dots ,I\}\) we directly use a draw from the posterior for \(\beta _i\) obtained through model fitting). In our second prediction exercise we generated PPDs for trials of a new subgroup. Only the full-pooling and partial-pooling models were used, as no-pooling models do not facilitate estimation of parameters which allow the surrogate to be applied in a new subgroup. Again, under full-pooling models we used the single set of estimated meta-regression posteriors, which induces the assumption that the new study is fully exchangeable with those used for model fitting even though it pertains to a new subgroup.
Under partial-pooling models we used draws from population subgroup distributions (e.g., we draw a new \(\beta _{\text {new}}\) from \(N(\beta ,\sigma _{\beta }^2)\)) to map the effect on the surrogate to the predicted clinical effect. This process requires \(\sigma _{\beta }\), which again may be influenced by the choice of priors in practical scenarios where the number of subgroups is small, and this is what motivated our interest in the careful choice of priors. In this way, under the partial-pooling approach all prediction exercises used subgroup-specific meta-regression posteriors; when applying the surrogate to a new setting, these were random draws from the population distribution. When we are extrapolating the trial-level association to a new subgroup, drawing from the population distribution for each meta-regression posterior induces an additional degree of uncertainty into the prediction. This could be seen as a reasonable compromise between applying the fitted full-pooling model, which ignores that the new study represents a new scenario, and not applying the surrogate at all (i.e., the no-pooling approach). As discussed when introducing the PPRE approach, the reason why we assume between-subgroup distributions for \(\sigma _e\) is to facilitate the possibility of drawing subgroup-specific residual standard deviations needed in prediction for a trial of a new subgroup.
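The new-subgroup step can be illustrated with a simplified sketch: for each posterior draw of the between-subgroup parameters, draw a new intercept, slope, and residual standard deviation from the population distributions, then draw the predicted clinical effect. Several elements are assumptions for illustration only. The surrogate effect is treated as known, so the sampling-error component used in the application analyses is omitted. The dictionary keys are hypothetical names. The log-scale location `log_sd_location` stands in for whichever parameterization of the residual standard deviation distribution the fitted model uses.

```python
import math
import random


def draw_new_subgroup_params(draw, rng):
    """Draw (alpha_new, beta_new, sigma_e_new) for a new subgroup from
    the between-subgroup distributions implied by one posterior draw."""
    alpha_new = rng.gauss(draw["alpha"], draw["sigma_alpha"])
    beta_new = rng.gauss(draw["beta"], draw["sigma_beta"])
    # Residual SD is drawn on the log scale, then back-transformed.
    sigma_e_new = math.exp(rng.gauss(draw["log_sd_location"], draw["gamma_e"]))
    return alpha_new, beta_new, sigma_e_new


def ppd_new_subgroup(posterior_draws, theta2_new, rng):
    """One predictive draw of the clinical effect per posterior draw."""
    ppd = []
    for draw in posterior_draws:
        a, b, s = draw_new_subgroup_params(draw, rng)
        ppd.append(rng.gauss(a + b * theta2_new, s))
    return ppd


# Usage with a single toy hyperparameter draw repeated (hypothetical values).
rng = random.Random(11)
toy = {"alpha": 0.0, "sigma_alpha": 0.1, "beta": -0.3, "sigma_beta": 0.1,
       "log_sd_location": math.log(0.05), "gamma_e": 0.3}
ppd = ppd_new_subgroup([toy] * 2000, theta2_new=-0.75, rng=rng)
```

Larger values of the between-subgroup standard deviations widen the predictive distribution, which is why the priors on these terms matter for prediction in a new subgroup when few subgroups inform them.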
Software
For simulation and applied analyses, we used the University of Utah Center for High Performance Computing Linux cluster. On the cluster, we used R version 4.0.3 for data preparation and for generating model summaries. The MCMC sampling algorithms for model fitting were implemented using RStan version 2.21.12 [15]. We used the Gelman-Rubin statistic to assess convergence of chains and the effective sample size to evaluate whether there were sufficient MCMC draws to support certain posterior summaries such as tail percentiles (we also used additional visual summaries such as rank plots) [16, 17]. We settled on 10,000 to 20,000 MCMC iterations and 3 independent chains across all analyses. Finally, for the application analyses, the SAS NLMixed procedure was used to estimate treatment effects on the clinical and surrogate endpoints, standard errors, and within-study correlations within each study [18]. Example RStan code (PPRE model) and R code (for simulating data) are provided in Section 4 of Additional file 1.
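For readers unfamiliar with the convergence diagnostic mentioned above, the following is a minimal Python sketch of the classic Gelman-Rubin statistic for a single scalar parameter. It is illustrative only: the diagnostics reported by Stan include refinements such as those described in [17], and the function name and simulated chains here are hypothetical.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean(variance(c) for c in chains)   # average within-chain variance
    b = n * variance(chain_means)           # between-chain variance
    var_hat = (n - 1) / n * w + b / n       # pooled estimate of the posterior variance
    return (var_hat / w) ** 0.5


rng = random.Random(0)
# Three chains sampling the same distribution: the statistic should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# One chain stuck in a different region: the statistic should be well above 1.
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 3.0)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(round(rhat_mixed, 3), round(rhat_stuck, 3))
```

Values near 1 indicate that the between-chain and within-chain variability agree; values well above 1 indicate that the chains have not mixed.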
Results
Simulation study results
Contrasting different random effects approaches under Gaussian surrogate effects
Table 1 provides summaries of posterior distributions obtained from fitting models on simulation Setups 1 to 4 (V1 and V2). When there was no heterogeneity in the true meta-regression parameters across subgroups (Setup 1), the PPRE model resulted in limited additional uncertainty in posteriors relative to the FPRE model, and also resulted in negligible additional bias via the posterior medians. Across Setups 2 to 4, where the strength of the association between effects on the clinical and surrogate endpoint varied across subgroups, for any given meta-regression parameter summarized, use of the FPRE model naturally obscured such heterogeneity. The NPRE and PPRE models more adequately produced subgroup-specific meta-regression posteriors that suggested heterogeneity in the quality of the surrogate, but in every case the PPRE model produced more precise posteriors than the NPRE model. Benefits were especially evident when focusing on posteriors for the meta-regression slope. While the PPRE model typically resulted in a small degree of bias, between-subgroup heterogeneity was potentially more evident due to improved precision. Precision gains under the PPRE over the NPRE model were also observed in the sensitivity analyses considered (Tables 2 and 3 of Additional file 1), including where there was heterogeneity in subgroup sizes. There was a larger degree of pooling away from the true parameter values for smaller subgroups under partial-pooling, but the PPRE model still allowed for heterogeneity in posterior medians and 95% credible intervals to aid in understanding variations in surrogate quality across subgroups. One potential drawback of all approaches considered was that \(R^2\) posterior medians appeared biased in every scenario evaluated, reflecting the challenge associated with accurate estimation of \(R^2\) with limited data.
The average posterior median \(R^2\) under partial-pooling was more biased than under no-pooling in certain scenarios such as where the surrogate was weak, possibly due to information sharing. The challenges associated with estimating \(R^2\) emphasize why it is important to report not only \(R^2\) point estimates but also credible intervals. The credible intervals under the PPRE approach remained wide in subgroups where the surrogate was weak. Differences in model performance were also evident in evaluations of model-based prediction of treatment effects on the clinical endpoint (Table 2). Coverage of true clinical effects by 95% posterior prediction intervals was lower when using the FPRE model even where meta-regression parameters were truly the same across subgroups. The NPRE model resulted in the highest coverage because of excessively wide prediction intervals, whereas prediction under the PPRE model resulted in improved precision with adequate coverage.
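The two prediction summaries discussed above, coverage of the true clinical effect and interval width, can be computed as in the following minimal Python sketch. The function name and the four intervals are hypothetical and are not values from Table 2.

```python
def coverage_and_width(true_effects, intervals):
    """Proportion of true effects inside their interval, and mean interval width."""
    hits = sum(lo <= t <= hi for t, (lo, hi) in zip(true_effects, intervals))
    widths = [hi - lo for lo, hi in intervals]
    return hits / len(intervals), sum(widths) / len(widths)


# Hypothetical true clinical effects (log hazard ratios) and 95% prediction intervals.
true_effects = [-0.30, -0.10, 0.05, -0.20]
intervals = [(-0.45, -0.15), (-0.05, 0.20), (-0.10, 0.15), (-0.40, -0.05)]

coverage, mean_width = coverage_and_width(true_effects, intervals)
print(coverage, mean_width)
```

Reading the two numbers together matters: high coverage achieved through very wide intervals, as seen for the NPRE model, is less useful than adequate coverage with narrower intervals.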
Contrasting fixed vs. random effects partial-pooling models under non-Gaussian surrogate effects
Where the true treatment effects on the surrogate were non-Gaussian, the PPFE model resulted in downward bias in meta-regression intercept posteriors (e.g., via the posterior median), whereas the PPRE model either did not result in any bias or resulted in a lesser degree of bias. The PPFE model also resulted in downward bias in the meta-regression slope posteriors (regression dilution bias) in subgroups where the surrogate was simulated to be moderate-to-strong. We hypothesize that this downward bias was due to the absence of shrinkage of true treatment effects on the surrogate (the “x-axis” variable in the meta-regression) towards one another. Because no common distribution is assumed for true effects on the surrogate across studies, the true effects are likely to be more dispersed than under the random effects model, where the Gaussian distributional assumption could result in pooling of true treatment effects on the surrogate across studies. Although the random effects model resulted in a small degree of upward bias in the meta-regression slope in subgroups where the surrogate was weak, the \(R^2\) posteriors were wider and their medians lower than under the fixed effects model. This means that the risk of concluding a stronger surrogate than was true in reality was mitigated due to the less optimistic \(R^2\) posteriors. The implications of these biases observed in meta-regression posteriors are also evidenced in summaries of prediction in Table 2. Despite the use of fixed effects, coverage of the true treatment effect on the clinical endpoint by 95% posterior predictive intervals under the PPFE model was poorer than under the PPRE model, to the largest extent in subgroups where the surrogate was strongest, which is likely where prediction is of greatest interest.
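Regression dilution, the general phenomenon named above, can be illustrated with a short simulation. This is a generic errors-in-variables sketch in Python, not a reproduction of the PPFE or PPRE models: it regresses the outcome on a noisy version of the x-axis variable using ordinary least squares, and every name and numeric value is an assumption made for illustration.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n, true_slope = 5000, 1.0
tau, se = 0.2, 0.2  # SD of the true x values and SD of the error in the observed x values

true_x = [rng.gauss(0.0, tau) for _ in range(n)]
y = [true_slope * t + rng.gauss(0.0, 0.05) for t in true_x]
obs_x = [t + rng.gauss(0.0, se) for t in true_x]  # x observed with error, no shrinkage

slope_true_x = ols_slope(true_x, y)
slope_obs_x = ols_slope(obs_x, y)
expected_attenuation = tau ** 2 / (tau ** 2 + se ** 2)
print(slope_true_x, slope_obs_x, expected_attenuation)
```

When the x-axis variable is more dispersed than the underlying true values, the fitted slope is pulled toward zero by roughly the ratio of true to observed variance, which is the direction of bias reported for the fixed effects variant.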
Application analysis results
The primary goal of the application analysis was to compare meta-regression posteriors and PPDs obtained after fitting the PPRE model with different priors. However, we also note that Fig. 7 in Additional file 1 indicates differences in the meta-regression slope estimates under the PPRE and PPFE models from the analysis where models were fit to disease-defined subgroups. The discrepancy in the posterior median between the two models grew larger for subgroups with a stronger meta-regression slope under the PPRE model (under the PPRE model, medians were 0.25, 0.30, 0.35, whereas under the PPFE model these were 0.27, 0.29, 0.29).
Table 3 summarizes meta-regression slope posteriors from the application analyses (3 disease-defined subgroups, with 59 studies for model fitting in one analysis and 7 intervention-defined subgroups with 51 studies used for model fitting in the other). Additional file 1: Tables 5 and 6 contain posterior summaries for the full set of meta-regression parameters from these analyses. When there were three disease-defined subgroups, using increasingly narrow priors resulted not only in narrower posteriors for between-subgroup standard deviation parameters but also for the between-subgroup mean parameters (even when priors for between-subgroup means were left the same). However, priors could be narrowed considerably before the within-subgroup posteriors narrowed. In most cases, even the narrowest priors used did not meaningfully change the inference on subgroup-specific posteriors. When there were 7 subgroups, narrower priors again resulted in equivalent or narrower posteriors for between-subgroup means and standard deviations, but to a lesser extent when compared to the analysis with fewer subgroups. Similarly, the use of narrower priors resulted in little, if any, change in the within-subgroup posteriors under the options considered for intervention-defined subgroups.
Figures 1 and 2 illustrate the implications of the choice of priors on prediction for trials of a new subgroup or an existing subgroup. A subset of trials is displayed in the figures to be concise, and the remaining results are displayed in Additional file 1: Tables 7 to 12. Firstly, consider the trials of novel subgroups. For every study, the PPRE model resulted in wider PPDs than the FPRE model. When there were fewer subgroups, predictive distributions for left-out studies were excessively and unrealistically wide when using completely diffuse priors under the PPRE model. The use of constrained priors, especially those motivated by domain-specific reasoning (P3), resulted in PPDs which were narrowest among those obtained, but still wider than those under the FPRE model with diffuse priors. Increasingly constrained priors resulted in more realistic uncertainty in HRs relative to the use of diffuse priors. When predicting for a trial of a novel intervention class (Fig. 2), where more subgroups were available for model fitting, PPDs were narrower under the PPRE approach (contrast PPDs in Fig. 1 relative to Fig. 2). This could be because of improved inferential precision for parameters associated with between-subgroup variability when more subgroups are present. These results indicate the PPRE model may be more suitable for prediction to induce an appropriate degree of added uncertainty in predicting a clinical effect in a trial meaningfully different from those used to evaluate the surrogate. However, these results also suggest that PPDs can be excessively wide due to overly diffuse and unrealistic priors and not due to the true quality of the surrogate or its applicability to a new setting. Next, when trials were of a subgroup available for model fitting, the summaries of PPDs under the PPRE model were more robust to the choice of priors relative to prediction for studies of a new subgroup (even for subgroups with few trials).
In our setting, predictive distributions were also similar in width under the PPRE and FPRE models (evidenced by the 2.5\(^{\text {th}}\) and 97.5\(^{\text {th}}\) percentiles). The PPRE model may thus increase accuracy and precision in prediction of clinical effects for future trials of existing subgroups over use of the FPRE model by allowing subgroup-specific meta-regression parameters.
Discussion
Trial-level surrogate endpoint evaluations are often performed on collections of heterogeneous clinical trials. Standard methodology that yields estimates of a single set of meta-regression parameters may not be appropriate when trials meaningfully differ across pre-specified subgroups, and may also provide unrealistic precision in prediction of clinical effects in new studies that differ from those used to evaluate the surrogate. In this paper, we explored a class of models we refer to as “partial-pooling” models, where subgroup-specific meta-regressions are assumed, and yet between-subgroup distributions facilitate data adaptive information sharing across subgroups. Partial-pooling models provide a framework both for prediction of treatment effects on the clinical endpoint for a trial that meaningfully differs (is of a new subgroup) from those used for the surrogate evaluation itself and for prediction of future studies of an existing subgroup. There are various challenges in the implementation of a partial-pooling approach, such as the choice of priors and distribution for the true treatment effects on the surrogate. We conducted analyses to help guide such decision making.
Under the scenarios considered (e.g., unless there are a large number, exceeding at least 30, of large trials within a given subgroup), our analyses indicated that fitting separate models for surrogate endpoint evaluation within subgroups (no-pooling) can result in excessive uncertainty in posteriors. We found that partial-pooling methods can be a practical solution with noteworthy benefits (we saw improved precision in posteriors with limited bias due to information sharing in our analyses). If interest is in inference for subgroup-specific meta-regression posteriors, our results showed key differences in interpretations when using fixed versus random effects under the partial-pooling approach. In our analyses, the partial-pooling fixed effect variant produced downward bias in the meta-regression slope in subgroups of trials where the surrogate was strong, which translated to more biased prediction. The partial-pooling random effects approach did not produce such biases in subgroups where the surrogate was strong. We also did not see noteworthy biases under the partial-pooling random effects approach when the Gaussian distributional assumption of the true treatment effects on the surrogate was definitively violated.
A key theme of our results is that posterior distributions of the meta-regression parameters within each subgroup under the partial-pooling random effects model were robust to a degree of narrowing of priors on between-subgroup parameters. Similarly, inferences which apply the meta-regressions fit under the partial-pooling model to estimate the posterior predictive distribution for the treatment effect on the clinical endpoint in a new trial were robust to the prior distributions when the new trial belonged to one of the same subgroups included when fitting the meta-regression. Conversely, however, inferences to a new trial which did not belong to one of the subgroups of the prior trials could be highly dependent on the prior distributions, especially for priors on the between-subgroup standard deviations of the meta-regression parameters. Notably, when highly diffuse priors were used, the posterior predictive distributions for the new trial exhibited very high dispersion, indicating poor ability to extend the relationship between the treatment effects on the surrogate and clinical endpoints from the previous trials to the new trial. The extent to which the choice of priors influenced dispersion of posterior predictive distributions for a trial of a new subgroup was greater when there were fewer subgroups used in model fitting (e.g., if there were 3 as opposed to 7 subgroups, as in our analyses). This suggests that when fitting partial-pooling models, not only the use of overly constrained, but also the use of overly diffuse priors can unduly influence certain predictive analyses, and it is thus important to consider a strategy to identify more practical priors.
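The dependence of new-subgroup predictions on the between-subgroup standard deviation can be made explicit with the law of total variance. As a sketch for the slope only, assuming the new slope is drawn as \(\beta_{\text{new}} \sim N(\beta, \sigma_\beta^2)\) given the between-subgroup parameters, so that \(\mathrm{E}[\beta_{\text{new}} \mid \beta, \sigma_\beta] = \beta\) and \(\operatorname{Var}(\beta_{\text{new}} \mid \beta, \sigma_\beta) = \sigma_\beta^2\):

```latex
\[
\operatorname{Var}(\beta_{\text{new}} \mid \text{data})
  = \mathrm{E}\!\left[\sigma_\beta^2 \mid \text{data}\right]
  + \operatorname{Var}(\beta \mid \text{data})
\]
```

The second term reflects uncertainty in the between-subgroup mean, while the first depends entirely on the posterior for \(\sigma_\beta\). With only a handful of subgroups, the data may do little to rule out large values of \(\sigma_\beta\), so a diffuse prior can leave the first term large and the resulting predictive distribution wide.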
These quantitative findings are consistent with the general concept that the relationship between treatment effects on the surrogate and clinical endpoints observed in previously conducted trials can be reasonably applied to a new trial if at least one of the following three conditions holds: 1) there is strong evidence for a high-quality surrogate with a lack of heterogeneity in performance across a large number of subgroups representing an exhaustive array of intervention types and disease sub-classifications; 2) the new trial can be viewed as a member of the same subgroups used to evaluate the surrogate; 3) subject matter knowledge is sufficiently strong to support informative prior distributions, which mitigate heterogeneity in the meta-regression parameters between subgroups. This third condition appears related to the stress regulatory agencies place on the strength of evidence for a strong biological relationship between the surrogate and clinical endpoints. If the new trial is evaluating a novel treatment or disease subtype which is fundamentally distinct from any of the previous subgroups of trials, and subject matter knowledge cannot rule out heterogeneity in the meta-regression parameters between subgroups, application of the relationship between the surrogate and clinical endpoints observed in the prior trials to the new trial is tenuous. Of course, priors which drive the applicability of the meta-regression for prediction to a trial of a new subgroup can be tuned with multiple considerations in mind. In one regard, even without strong subject matter knowledge, basic logic can be used to narrow priors to some degree (such as for the meta-regression intercept, a log hazard ratio in our case, which is a commonly used metric and need not be expected to vary excessively). On the other hand, priors could be further constrained if there is strong subject matter knowledge indicating to do so, ideally from multiple stakeholders.
The key point is that the use of completely diffuse priors is likely to be highly impractical when employing partial-pooling models for surrogate evaluation, and the applicability of the surrogate should not depend on the excessive uncertainty imposed by the use of such priors as opposed to those that are realistic according to sound subject matter reasoning.
A noteworthy implication of our findings is that use of a partial-pooling model on a diverse collection of studies may be more useful than highly targeted surrogate evaluations on small subsets of studies. For example, there have been many evaluations of surrogates such as tumor response or progression-free survival for highly specific tumor types in cancer [19,20,21,22]. However, there may be insufficient data in such settings to truly infer the quality of the surrogate. Partial-pooling models (with appropriately defined priors) fit to data sets with more tumor types, for example, may yield more useful information than fitting separate models within the small subgroups.
There are potential limitations to our analyses and findings. The use of Bayesian methods for surrogate evaluation is computationally demanding and we thus considered a limited number of scenarios in our application and simulation analyses. There may also be many additional distributions that could provide further benefit over the Gaussian or fixed-effects approaches we considered. For example, Bujkiewicz et al. showed potential benefits of using a t-distribution for certain terms [8]. Other strategies to refine priors may also be appropriate in other disease settings. Our analyses and discussion are embedded within the context where we initiate the analysis by assuming (through our priors) there may be some heterogeneity in the meta-regression across subgroups, but that priors on terms related to between-subgroup heterogeneity can be narrowed to some degree to ensure the inference is not unduly influenced by unrealistically wide priors. An alternative approach may be to use priors which, to some degree, induce the assumption that there is no between-subgroup heterogeneity in the quality of the surrogate to start the analysis, forcing the data to provide strong evidence for heterogeneity for the meta-regression posteriors to differ at all across subgroups. For example, spike and slab priors could be considered in future work, if the use of such priors aligns with the analytical goals in a given surrogate evaluation.
It is also important to note that there are many approaches to trial-level surrogate endpoint evaluation. For example, Buyse et al. have proposed joint models that can be fit in a single-stage analysis to simultaneously estimate within- and between-study surrogacy metrics [23]. While joint modeling strategies have a number of advantages, their uptake appears less common than two-stage approaches in practice [9]. Other authors have also used network meta-regression strategies for surrogate endpoint evaluations on collections of heterogeneous studies [24]. Finally, within the context of evaluating whether there is heterogeneity in trial-level associations, alternative model structures may be useful depending on the ultimate scientific question. For example, one might consider a single linear regression with interaction terms. One potential drawback to such an approach is that with increasing trial-level factors (e.g., subgroups), such models become increasingly complex, potentially over-parameterized, and may pose challenges for non-statisticians to interpret. On the other hand, an advantage of the partial-pooling approaches discussed is that these maintain the linear regression structure within subgroups, which is again an approach that is already familiar to many investigators.
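To make the interaction-term alternative concrete, one possible form is sketched below. This is an illustration rather than a model fit in this paper: \(\theta^{\text{clin}}_j\) and \(\theta^{\text{surr}}_j\) denote the true treatment effects on the clinical and surrogate endpoints in trial \(j\), \(g(j)\) is the subgroup of trial \(j\), subgroup 1 is taken as the reference, and \(\gamma_i\), \(\delta_i\), and \(\varepsilon_j\) are hypothetical symbols introduced only for this sketch.

```latex
\[
\theta^{\text{clin}}_j
  = \alpha + \beta\,\theta^{\text{surr}}_j
  + \sum_{i=2}^{I} \mathbb{1}\{g(j) = i\}\left(\gamma_i + \delta_i\,\theta^{\text{surr}}_j\right)
  + \varepsilon_j
\]
```

Under this form the number of regression coefficients is \(2I\), growing by two for each added subgroup, with no between-subgroup distribution to share information across subgroups. This is one way to see the over-parameterization concern raised above.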
Conclusions
The methods discussed in this paper are applicable to the two-stage approach often used to establish the trial-level validity of a surrogate endpoint. Because establishing trial-level surrogacy requires a collection of clinical trials, analysts are often confronted with limited data. A strategy to overcome such data limitations is to incorporate a broad collection of studies with various disease and therapy sub-categories. However, analyses on such data in, for example, chronic kidney disease have encouraged regulatory agencies to question whether surrogate performance varies across pre-specified and clinically motivated subgroups of trials defined by disease or intervention classes. Analyses requiring subdividing available trials into subgroups will only exacerbate issues associated with model fitting on small amounts of data. We performed analyses that showed that partial-pooling modeling approaches may improve the potential to infer the quality of the surrogate within subgroups of trials even on limited datasets. However, our analyses also showed that even diffuse priors used for partial-pooling analyses can strongly influence the perceived quality of the surrogate as well as the ability to predict the treatment effect on the clinical endpoint. We discussed strategies that can be used to constrain priors used for the analysis to obtain more realistic estimates of key parameters for surrogate endpoint evaluation. Ultimately, analyses of a surrogate endpoint could result in appropriately expanding the feasibility of trials in an entire disease area, or could lead to the use of an endpoint that is not ultimately useful for patients. Partial-pooling models should be considered for surrogate endpoint evaluation on heterogeneous collections of trials, but the choice of a given model and priors to implement the model should be handled rigorously.
Availability of data and materials
Data restrictions apply to the data used for the application analyses presented, for which we were given access under license for this manuscript. These data are not publicly available due to privacy or ethical restrictions. The programs used to generate data for the simulation study are provided in the supplemental materials.
Abbreviations
CKD: Chronic kidney disease
GFR: Glomerular filtration rate
RE: Random effects
FE: Fixed effects
FP: Full-pooling
NP: No-pooling
PP: Partial-pooling
PPD: Posterior predictive distribution
DM: Diabetes mellitus
GN: Glomerular disease
CVD: Cardiovascular disease
IG: Inverse-gamma
References
Thompson A, Smith K, Lawrence J. Change in estimated GFR and albuminuria as end points in clinical trials: a viewpoint from the FDA. Am J Kidney Dis. 2020;75(1):4–5.
Food and Drug Administration US. Guidance for industry: expedited programs for serious conditions - drugs and biologics. 2014. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/expedited-programs-serious-conditions-drugs-and-biologics. Accessed 1 Jan 2022.
Levey AS, Gansevoort RT, Coresh J, Inker LA, Heerspink HL, Grams M, et al. Change in albuminuria and GFR as end points for clinical trials in early stages of CKD: a scientific workshop sponsored by the National Kidney Foundation in collaboration with the US Food and Drug Administration and European Medicines Agency. Am J Kidney Dis. 2020;75(1):84–104.
Inker LA, Heerspink HJL, Tighiouart H, Levey AS, Coresh J, Gansevoort RT, et al. GFR slope as a surrogate end point for kidney disease progression in clinical trials: a meta-analysis of treatment effects of randomized controlled trials. J Am Soc Nephrol. 2019;30(9):1735–45.
Heerspink HJL, Greene T, Tighiouart H, Gansevoort RT, Coresh J, Simon AL, et al. Change in albuminuria as a surrogate endpoint for progression of kidney disease: a meta-analysis of treatment effects in randomised clinical trials. Lancet Diabetes Endocrinol. 2019;7(2):128–39.
Daniels MJ, Hughes MD. Meta-analysis for the evaluation of potential surrogate markers. Stat Med. 1997;16(17):1965–82.
Papanikos T, Thompson JR, Abrams KR, Stadler N, O C, Taylor R, et al. Bayesian hierarchical meta-analytic methods for modeling surrogate relationships that vary across treatment classes using aggregate data. Stat Med. 2020;39(8):1103–1124.
Bujkiewicz S, Thompson JR, Spata E, Abrams KR. Uncertainty in the Bayesian meta-analysis of normally distributed surrogate endpoints. Stat Methods Med Res. 2017;26(5):2287–318.
Belin L, Tan A, De Rycke Y, Dechartres A. Progression-free survival as a surrogate for overall survival in oncology: a methodological systematic review. Br J Cancer. 2022;122(11):1707–14.
Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol. 2007;7(3):1471–2288.
Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. J R Stat Soc Series A Stat Soc. 2009;172(4):789–811.
Jones HE, Ohlssen DI, Neuenschwander B, Racine A, Branson M. Bayesian models for subgroup analysis in clinical trials. Clin Trials. 2011;8(2):129–43.
Prasad V, Kim C, Burotto M, Vandross A. The strength of association between surrogate end points and survival in oncology: a systematic review of trial-level meta-analyses. JAMA Intern Med. 2015;175(8):1389–98. https://doi.org/10.1001/jamainternmed.2015.2829.
Vonesh E, Tighiouart H, Ying J, Heerspink HJL, Lewis J, Staplin N, et al. Mixed-effects models for slope-based endpoints in clinical trials of chronic kidney disease. Stat Med. 2019;38(22):4218–39.
RStan Development Team. Rstan: The R interface to Stan. 2020. https://cran.r-project.org/web/packages/rstan/rstan.pdf. Accessed 1 Dec 2022.
Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. New York: Chapman and Hall; 1995.
Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner PC. Rank-normalization, folding, and localization: an improved R-hat for assessing convergence of MCMC (with discussion). Bayesian Anal. 2021;16(2):667–718. https://doi.org/10.1214/20-BA1221.
The SAS Institute. The NLMIXED procedure. 2015. https://support.sas.com/documentation/onlinedoc/stat/141/nlmixed.pdf. Accessed 1 Dec 2022.
Kataoka K, Nakamura K, Mizusawa J, Kato K, Eba J, Katayama H, et al. Surrogacy of progression-free survival (PFS) for overall survival (OS) in esophageal cancer trials with preoperative therapy: Literature-based meta-analysis. Eur J Surg Oncol. 2017;43(10):1956–61.
Chen YP, Sun Y, Chen L, Mao YP, Tang LL, Li WF, et al. Surrogate endpoints for overall survival in combined chemotherapy and radiotherapy trials in nasopharyngeal carcinoma: Meta-analysis of randomised controlled trials. Radiother Oncol. 2015;116(2):157–66.
Gharzai LA, Jiang R, Wallington D, Jones G, Birer S, Jairath N, et al. Intermediate clinical endpoints for surrogacy in localised prostate cancer: an aggregate meta-analysis. Lancet Oncol. 2021;22(3):402–10.
Michiels S, Pugliano L, Marguet S, Grun D, Barinoff J, Cameron D, et al. Progression-free survival as surrogate end point for overall survival in clinical trials of HER2-targeted agents in HER2-positive metastatic breast cancer. Ann Oncol. 2016;27(6):1029–34.
Buyse M, Molenberghs G, Paoletti X, Oba K, Alonso A, Elst WV, et al. Statistical evaluation of surrogate endpoints with examples from cancer clinical trials. Biom J. 2016;58(1):104–32.
Bujkiewicz S, Jackson D, Thompson JR, Turner RM, Stadler N, Abrams KR, et al. Bivariate network meta-analysis for surrogate endpoint evaluation. Stat Med. 2019;38(18):3322–41.
Acknowledgements
The support and resources from the Center for High Performance Computing at the University of Utah are gratefully acknowledged. We thank all investigators, study teams, and participants of the studies included in the Analysis set 2: application analysis of CKD trials and Application analysis results sections. Specific details for the same studies used in our analyses have been detailed in previous work by CKD-EPI [4, 5].
We also thank the following CKD-EPI investigators/collaborators representing their respective studies (study acronyms/abbreviations are listed in Table 13 of Additional file 1): AASK: Tom Greene; ABCD: Robert W. Schrier, Raymond O. Estacio; ADVANCE: Mark Woodward, John Chalmers, Min Jun; AIPRI (Maschio): Giuseppe Maschio, Francesco Locatelli; ALTITUDE: Hans-Henrik Parving, Hiddo JL Heerspink; Bari (Schena): Francesco Paolo Schena, Manno Carlo; Bologna (Zucchelli): Pietro Zucchelli, Tazeen H Jafar; Boston (Brenner): Barry M. Brenner; canPREVENT: Brendan Barrett; Copenhagen (Kamper): Anne-Lise Kamper, Svend Strandgaard; CSG (Lewis 1992, 1993): Julia B. Lewis, Edmund Lewis; EMPA-REG OUTCOME: Christoph Wanner, Maximilian von Eynatten; Fukuoka (Katafuchi): Ritsuko Katafuchi; Groningen (van Essen): Paul E. de Jong, GG van Essen, Dick de Zeeuw; Guangzhou (Hou): Fan Fan Hou, Di Xie; HALT-PKD: Arlene Chapman, Vicente Torres, Alan Yu, Godela Brosnahan; HKVIN: Philip KT Li, Kai-Ming Chow, Cheuk-Chun Szeto, Chi-Bon Leung; IDNT: Edmund Lewis, Lawrence G. Hunsicker, Julia B. Lewis; Lecco (Pozzi): Lucia Del Vecchio, Simeone Andrulli, Claudio Pozzi, Donatella Casartelli; Leuven (Maes): Bart Maes; Madrid (Goicoechea): Marian Goicoechea, Eduardo Verde, Ursula Verdalles, David Arroyo; Madrid (Praga): Fernando Caravaca-Fontán, Hernando Trujillo, Teresa Cavero, Angel Sevillano; MASTERPLAN: Jack FM Wetzels, Jan van den Brand, Peter J Blankestijn, Arjan van Zuilen; MDRD Study: Gerald Beck, Tom Greene, John Kusek, Garabed Eknoyan; Milan (Ponticelli): Claudio Ponticelli, Giuseppe Montagnino, Patrizia Passerini, Gabriella Moroni; ORIENT: Fumiaki Kobayashi, Hirofumi Makino, Sadayoshi Ito, Juliana CN Chan; Hong Kong Lupus Nephritis (Chan): Tak Mao Chan; REIN: Giuseppe Remuzzi, Piero Ruggenenti, Aneliya Parvanova, Norberto Perico; RENAAL: Dick De Zeeuw, Hiddo JL Heerspink, Barry M. Brenner, William Keane; ROAD: Fan Fan Hou, Di Xie; Rochester (Donadio): James Donadio, Fernando C. Fervenza; SHARP: Colin Baigent, Martin Landray, William Herrington, Natalie Staplin; STOP-IgAN: Jürgen Floege, Thomas Rauen, Claudia Seikrit, Stefanie Wied; Strasbourg (Hannedouche): Thierry P. Hannedouche; SUN-MACRO: Julia B. Lewis, Jamie Dwyer, Edmund Lewis; Texas (Toto): Robert D. Toto; Victoria (Ihle): Gavin J. Becker, Benno U. Ihle, Priscilla S. Kincaid-Smith.
Funding
The study was funded by the National Kidney Foundation (NKF). NKF has received consortium support from the following companies: AstraZeneca, Bayer, Cerium, Chinook, Boehringer Ingelheim, CSL Behring, Novartis and Travere. This work also received support from the Utah Study Design and Biostatistics Center, with funding in part from the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR002538.
Author information
Contributions
Willem Collier was the primary author for all sections of the manuscript, worked on the design and implementation of all analyses, wrote the programs used for analyses and results reporting, and generated summaries. Tom Greene contributed to writing and editing in all sections throughout the manuscript and helped in the design of all analyses. Benjamin Haaland contributed to writing and editing in all sections throughout the manuscript and helped in the design of all analyses. Lesley Inker contributed to writing and editing of the introduction, application analysis, and discussion sections, and helped to design the application analyses. Hiddo Heerspink contributed to writing and editing of the introduction, application analysis, and discussion sections, and helped to design the application analyses.
Ethics declarations
Ethics approval and consent to participate
The analyses presented in this study were deemed exempt from review by the Tufts Medical Center Institutional Review Board. The research presented in this paper complies with all relevant ethical regulations (Declaration of Helsinki). Only aggregated data from previously conducted clinical trials are presented. The protocol and consent documents of the individual trials used were reviewed and approved by each trial’s participating centers’ institutional review board, and informed consent was provided by all participants of the studies for which results were aggregated for our analyses.
Consent for publication
Not applicable.
Competing interests
Willem Collier received funding from the National Kidney Foundation for his graduate studies while working on aspects of the submitted work.
Benjamin Haaland is a full-time employee of Pentara Corporation and consults for the National Kidney Foundation.
Hiddo JL Heerspink received grant support from the National Kidney Foundation to his institute and is a consultant for AbbVie, AstraZeneca, Bayer, Boehringer Ingelheim, Chinook, CSL Behring, Dimerix, Eli Lilly, Gilead, GoldFinch, Janssen, Merck, Novo Nordisk and Travere Pharmaceuticals.
Lesley A Inker reports funding from the National Institutes of Health, National Kidney Foundation, Omeros, Chinook, and Reata Pharmaceuticals for research and contracts to Tufts Medical Center; consulting agreements to Tufts Medical Center with Tricida; and consulting agreements with Dimerix.
Tom Greene reports grant support from the National Kidney Foundation, Janssen Pharmaceuticals, Durect Corporation and Pfizer and statistical consulting from AstraZeneca, CSL and Boehringer Ingelheim.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Collier, W., Haaland, B., Inker, L. et al. Comparing Bayesian hierarchical metaregression methods and evaluating the influence of priors for evaluations of surrogate endpoints on heterogeneous collections of clinical trials. BMC Med Res Methodol 24, 39 (2024). https://doi.org/10.1186/s12874-024-02170-0
DOI: https://doi.org/10.1186/s12874-024-02170-0