Network meta-analysis of survival data with fractional polynomials

Background: Pairwise meta-analysis, indirect treatment comparisons and network meta-analysis for aggregate level survival data are often based on the reported hazard ratio, which relies on the proportional hazards assumption. This assumption is implausible when hazard functions intersect, and can have a huge impact on decisions based on comparisons of expected survival, such as cost-effectiveness analysis.

Methods: As an alternative to network meta-analysis of survival data in which the treatment effect is represented by the constant hazard ratio, a multi-dimensional treatment effect approach is presented. With fractional polynomials the hazard functions of interventions compared in a randomized controlled trial are modeled, and the differences between the parameters of these fractional polynomials within a trial are synthesized (and indirectly compared) across studies.

Results: The proposed models are illustrated with an analysis of survival data in non-small-cell lung cancer. Fixed and random effects first and second order fractional polynomials were evaluated.

Conclusion: (Network) meta-analysis of survival data with models where the treatment effect is represented with several parameters using fractional polynomials can be more closely fitted to the available data than meta-analysis based on the constant hazard ratio.


Background
Healthcare decision-making requires comparisons of all relevant competing interventions. If the available evidence consists of a network of multiple randomized controlled trials (RCTs) involving treatments compared directly or indirectly or both, it can be synthesized by means of network meta-analysis [1][2][3][4]. Network meta-analysis of survival data is often based on the reported hazard ratio, which relies on the proportional hazards assumption.
The proportional hazards assumption that underlies current approaches of evidence synthesis of survival outcomes is not only often implausible, but can have a huge impact on decisions based on cost-effectiveness analysis. In extreme cases survival curves intersect and the hazard ratio is not constant. Furthermore, even if survival functions do not intersect, the hazard functions might and the assumption is violated. For cost-effectiveness evaluations of competing interventions that aim to improve survival, differences in expected survival between the competing interventions are of interest. Common practice is to assume a certain parametric survival function for the baseline intervention (e.g. Weibull) and apply the treatment specific constant hazard ratio obtained with the (network) meta-analysis to calculate a corresponding survival function enabling comparisons of expected survival. Since the tail of the survival function has a great impact on the expected survival, violations of the constant hazard ratio can lead to severely biased estimates. Hence, the proportional hazards assumption has become a source of concern in drug reimbursement based on cost-effectiveness evidence.
As an alternative to a network meta-analysis of survival data in which the treatment effect is represented by a single parameter, i.e. the hazard ratio, a multi-dimensional treatment effect approach is presented. The hazard over time is modeled with fractional polynomials, so that the treatment effect is represented with multiple parameters [5]. With this approach a network meta-analysis of survival can be performed with models that can be fitted more closely to the data. With these parametric hazard functions, expected survival can be calculated to facilitate cost-effectiveness analysis. The method is illustrated with an example.

Fractional polynomials and the hazard function
Royston and Altman introduced fractional polynomials as an extension of polynomial models for determining the functional form of a continuous predictor [5]. These models are well suited for nonlinear data. In contrast to categorizing continuous predictors, the analysis is no longer dependent on the number and choice of cut points [6]. Fractional polynomials have been used in many applications including survival and meta-regression analysis [7][8][9].
By transforming t, a continuous variable, in a linear model, the first-order fractional polynomial model is obtained:

$$y = \beta_0 + \beta_1 t^{p} \qquad (1)$$

The power p is chosen from the following set: -2, -1, -0.5, 0, 0.5, 1, 2, 3, with $t^0 = \log t$. The second-order fractional polynomial is defined as:

$$y = \beta_0 + \beta_1 t^{p_1} + \beta_2 t^{p_2} \qquad (2)$$

If $p_1 = p_2 = p$ the model becomes a 'repeated powers' model:

$$y = \beta_0 + \beta_1 t^{p} + \beta_2 t^{p} \log t \qquad (3)$$

Royston and Altman showed that by varying $p_1$ and $p_2$ and the parameters $\beta_0$, $\beta_1$ and $\beta_2$ a wide range of curve shapes can be obtained [5,6,8,10,11].
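The definitions above can be made concrete with a small sketch. The Python code below is only an illustration under the conventions stated in the text ($t^0 = \log t$, and a repeated power contributing $t^{p} \log t$); the names `fp_term` and `fractional_polynomial` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

# Candidate powers from the text; p = 0 is read as log(t).
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(t: float, p: float) -> float:
    """Return t**p, using the convention t**0 = log(t)."""
    if t <= 0:
        raise ValueError("t must be positive")
    return math.log(t) if p == 0 else t ** p


def fractional_polynomial(t: float, betas: Sequence[float], powers: Sequence[float]) -> float:
    """Evaluate beta_0 + beta_1 * t^p1 (+ beta_2 * t^p2) at time t.

    If p1 == p2, the second term becomes t^p * log(t) (repeated powers).
    """
    if len(betas) != len(powers) + 1 or not 1 <= len(powers) <= 2:
        raise ValueError("expected 2 or 3 betas with 1 or 2 powers")
    terms = [fp_term(t, powers[0])]
    if len(powers) == 2:
        if powers[1] == powers[0]:
            terms.append(terms[0] * math.log(t))
        else:
            terms.append(fp_term(t, powers[1]))
    return betas[0] + sum(b * x for b, x in zip(betas[1:], terms))
```

For example, `fractional_polynomial(4.0, [0.0, 1.0, 1.0], [0.5, -1])` evaluates $\sqrt{4} + 1/4$.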
The first order fractional polynomial for the hazard at time t in a two-arm randomized controlled trial of treatment B versus A can be presented as follows:

$$\ln(h_{kt}) = \beta_{0k} + \beta_{1k} t^{p}, \quad \text{with } t^0 = \log(t)$$

$$\begin{pmatrix} \beta_{0k} \\ \beta_{1k} \end{pmatrix} = \begin{cases} \begin{pmatrix} \mu_0 \\ \mu_1 \end{pmatrix} & \text{if } k = A \\[2ex] \begin{pmatrix} \mu_0 \\ \mu_1 \end{pmatrix} + \begin{pmatrix} d_0 \\ d_1 \end{pmatrix} & \text{if } k = B \end{cases} \qquad (4)$$

where $h_{kt}$ reflects the hazard with treatment k at time t. The vector $(\mu_0, \mu_1)$ reflects the parameters $\beta_0$ and $\beta_1$ of the 'baseline' treatment A, whereas the vector $(d_0, d_1)$ reflects the difference in $\beta_0$ and $\beta_1$ of the log hazard curve for treatment B relative to A. The parameter $d_0$ corresponds to the treatment effect (the log hazard ratio) in a proportional hazards model. Under the proportional hazards assumption $d_1$ equals 0. If $d_1 \neq 0$, $d_1$ reflects the change in the log hazard ratio over time. Hence, by incorporating $d_1$ in addition to $d_0$, a multi-dimensional relative treatment effect is used rather than a single parameter for the relative treatment effect.
Hazard functions can have different shapes, including a constant hazard over time, a linearly increasing or decreasing hazard over time, and a bathtub shape. If in equation 4 $\beta_1$ equals 0, a constant log hazard function is obtained, reflecting exponentially distributed survival times. If $\beta_1 \neq 0$ and p = 1, a linear log hazard function is obtained, which corresponds to a Gompertz survival function. If $\beta_1 \neq 0$ and p = 0, a Weibull hazard function is obtained, and $(d_0, d_1)$ reflects the difference in respectively the scale and shape of the Weibull log hazard curve for treatment B relative to A. Extending the first-order fractional polynomial hazard function to a second-order fractional polynomial increases the possible (differences in) shapes even further. Hence, modeling the hazard function of competing interventions with fractional polynomials provides a general framework that includes some of the commonly used parametric survival functions and does not rely on the constant hazard ratio assumption.
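To illustrate how the two treatment-effect parameters translate into a hazard ratio that may vary over time, the sketch below evaluates the log hazard ratio of B versus A for the special cases named above. The function name `log_hazard_ratio` and the parameter values are hypothetical and chosen for illustration only.

```python
import math


def log_hazard_ratio(t: float, d0: float, d1: float, p: float) -> float:
    """ln HR(t) = d0 + d1 * t^p for B versus A, with t^0 = log(t)."""
    basis = math.log(t) if p == 0 else t ** p
    return d0 + d1 * basis


# Hypothetical treatment-effect parameters, chosen only for illustration.
d0, d1 = -0.4, 0.15
for month in (1, 6, 12, 24):
    hr_ph = math.exp(log_hazard_ratio(month, d0, 0.0, p=0))       # d1 = 0: constant HR
    hr_weibull = math.exp(log_hazard_ratio(month, d0, d1, p=0))   # p = 0: HR = exp(d0) * t^d1
    hr_gompertz = math.exp(log_hazard_ratio(month, d0, d1, p=1))  # p = 1: HR = exp(d0 + d1 * t)
    print(month, round(hr_ph, 3), round(hr_weibull, 3), round(hr_gompertz, 3))
```

With the second parameter set to 0 the hazard ratio is the same at every time point; with a non-zero second parameter it drifts with time, at a rate that depends on the chosen power.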

Network meta-analysis model for survival data using fractional polynomials
Network meta-analysis has been presented as an extension of traditional meta-analysis by including multiple different pairwise comparisons across a range of different interventions. Meta-analysis models for the comparison of treatment B versus A can be extended to models allowing simultaneous comparisons of B versus A as well as C versus A [1][2][3][4]. To preserve the randomization of the different studies in the evidence synthesis, a study of a certain pairwise comparison has to be 'linked' to any of the other studies in the network. When the network consists of AB trials, AC trials, as well as BC trials, we have a mixture of direct and indirect comparisons and these analyses have been called mixed treatment comparisons (MTC) [3]. For a network meta-analysis, the similarity and consistency relations need to hold regarding the estimated model parameters [3,12,13]. If AB trials and AC trials are comparable on effect modifiers (i.e. covariates that affect the relative treatment effect), then an indirect estimate for the relative effect of C versus B ($d_{BC}$) can be obtained from the estimates of the effect of B versus A ($d_{AB}$) and the effect of C versus A ($d_{AC}$): $d_{BC} = d_{AC} - d_{AB}$. In essence, this implies that the same $d_{BC}$ is obtained as would have been estimated in a three arm randomized ABC trial. In general, for a model described by the function $f_x(t)$ where x = A, B, or C, we have:

$$f_C(t) - f_B(t) = \left(f_C(t) - f_A(t)\right) - \left(f_B(t) - f_A(t)\right)$$

For a network meta-analysis of survival data, the comparison can be performed on the log hazard ratio, and this relation needs to apply to every time point t: $\ln(HR_{BC}(t)) = \ln(HR_{AC}(t)) - \ln(HR_{AB}(t))$, with $HR_{BC}(t)$ reflecting the hazard ratio of C relative to B at time t. Based on equation 4 it follows that:

$$\ln(HR_{BC}(t)) = (d_{0AC} - d_{0AB}) + (d_{1AC} - d_{1AB})\, t^{p}, \quad \text{so that} \quad \begin{pmatrix} d_{0BC} \\ d_{1BC} \end{pmatrix} = \begin{pmatrix} d_{0AC} \\ d_{1AC} \end{pmatrix} - \begin{pmatrix} d_{0AB} \\ d_{1AB} \end{pmatrix} \qquad (5)$$

Hence, the differences in the model parameters $\beta_0$ and $\beta_1$ of the first order fractional polynomials are independent of time.
Furthermore, according to equation 5 the difference in $\beta_0$ and $\beta_1$ of the BC comparison can be described by the difference in these parameters for the AC comparison and AB comparison. Given this relation, a network meta-analysis can be performed based on the differences in $\beta_0$ and $\beta_1$ of log hazard curves across studies. Similarly, the transitivity assumption holds for fractional polynomials of any order.
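The consistency relation for the treatment-effect parameters can be checked numerically. The sketch below assumes a first order fractional polynomial and hypothetical pooled estimates relative to treatment A; the helper names `indirect_effect` and `log_hr` are illustrative and not part of the paper.

```python
import math
from typing import Tuple

Effect = Tuple[float, float]  # (d0, d1) on the log hazard scale


def indirect_effect(d_ab: Effect, d_ac: Effect) -> Effect:
    """Indirect C-versus-B effect: d_BC = d_AC - d_AB, parameter by parameter."""
    return (d_ac[0] - d_ab[0], d_ac[1] - d_ab[1])


def log_hr(t: float, d: Effect, p: float) -> float:
    """Log hazard ratio at time t for effect d, with t^0 = log(t)."""
    basis = math.log(t) if p == 0 else t ** p
    return d[0] + d[1] * basis


# Hypothetical pooled estimates relative to treatment A.
d_ab = (-0.30, 0.05)
d_ac = (-0.10, -0.02)
d_bc = indirect_effect(d_ab, d_ac)

# The relation holds at every time point, not only on average.
for t in (0.5, 2.0, 10.0):
    lhs = log_hr(t, d_bc, p=0)
    rhs = log_hr(t, d_ac, p=0) - log_hr(t, d_ab, p=0)
    assert abs(lhs - rhs) < 1e-12
```

Because the subtraction is applied to each parameter separately, the indirect log hazard ratio curve is itself a fractional polynomial with the same power.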
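Using a similar notation as Cooper et al. [13], the random effects model for a network meta-analysis of survival data based on a fractional polynomial of order M for k treatments labeled A, B, C, etc. can be described as:

$$\ln(h_{jkt}) = \beta_{0jk} + \sum_{m=1}^{M} \beta_{mjk}\, t^{p_m}, \quad \text{with } t^0 = \log(t)$$

$$\begin{pmatrix} \beta_{0jk} \\ \vdots \\ \beta_{Mjk} \end{pmatrix} = \begin{cases} \begin{pmatrix} \mu_{0jb} \\ \vdots \\ \mu_{Mjb} \end{pmatrix} & \text{if } k = b \\[3ex] \begin{pmatrix} \mu_{0jb} \\ \vdots \\ \mu_{Mjb} \end{pmatrix} + \begin{pmatrix} \delta_{0jbk} \\ \vdots \\ \delta_{Mjbk} \end{pmatrix} & \text{if } k \neq b \end{cases}$$

$$\begin{pmatrix} \delta_{0jbk} \\ \vdots \\ \delta_{Mjbk} \end{pmatrix} \sim N\left( \begin{pmatrix} d_{0Ak} - d_{0Ab} \\ \vdots \\ d_{MAk} - d_{MAb} \end{pmatrix}, \Sigma \right), \quad d_{mAA} = 0 \qquad (6)$$

where $h_{jkt}$ reflects the underlying hazard rate in trial j for treatment k at time t, b denotes the baseline treatment of trial j, the $\mu$ parameters are the trial-specific parameters of the baseline treatment, the $\delta$ parameters are the trial-specific treatment effects of k relative to b, the d parameters are the pooled treatment effects relative to the overall reference treatment A, and $\Sigma$ is the between-study covariance matrix. A random effects model with only a heterogeneity parameter for $d_{0Ak}$ implies that the between study variance of the log hazard ratios remains constant over time. Random effects models with (additional) heterogeneity parameters for $d_{1Ak}, \ldots, d_{MAk}$ have the flexibility to capture between study variance regarding changes in the log hazard ratios over time.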
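The random effects fractional polynomial model in equation 6 treats multiple-arm trials (>2 treatments) without taking account of the correlations between the trial-specific δs that they estimate. Bayesian random effects fractional polynomial models with only a heterogeneity parameter for $d_{0Ak}$ can be easily extended to fit trials with 3 or more treatment arms by decomposition of a multivariate normal distribution as a series of conditional univariate distributions [13]. If, for a trial j with baseline treatment b and q further arms $k_1, \ldots, k_q$,

$$\begin{pmatrix} \delta_{0jbk_1} \\ \vdots \\ \delta_{0jbk_q} \end{pmatrix} \sim N\left( \begin{pmatrix} d_{0Ak_1} - d_{0Ab} \\ \vdots \\ d_{0Ak_q} - d_{0Ab} \end{pmatrix}, \begin{pmatrix} \sigma^2 & \sigma^2/2 & \cdots & \sigma^2/2 \\ \sigma^2/2 & \sigma^2 & \cdots & \sigma^2/2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2/2 & \sigma^2/2 & \cdots & \sigma^2 \end{pmatrix} \right)$$

then the conditional univariate distributions for arm i given the previous 1, ..., (i-1) arms are:

$$\delta_{0jbk_i} \mid \begin{pmatrix} \delta_{0jbk_1} \\ \vdots \\ \delta_{0jbk_{i-1}} \end{pmatrix} \sim N\left( (d_{0Ak_i} - d_{0Ab}) + \frac{1}{i} \sum_{l=1}^{i-1} \left[ \delta_{0jbk_l} - (d_{0Ak_l} - d_{0Ab}) \right], \; \frac{i+1}{2i}\, \sigma^2 \right)$$

Different values for the powers $p_1$ and $p_2$ of the fractional polynomials correspond to different models. The best fitting model can be selected based on goodness-of-fit comparisons. The goodness of fit can be computed as the difference between the deviance for the fitted model and the deviance for the saturated model (which fits the data perfectly). Within a frequentist framework the Akaike information criterion (AIC) can be used for model selection [14]. In a Bayesian framework the Bayesian information criterion (BIC) or deviance information criterion (DIC) can be used [15,16].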

Illustrative example
To understand how the analytical approach proposed can be applied in practice, an example is presented for oncology where trials are typically focused on overall (and progression free) survival.
Lung cancer is a leading cause of cancer mortality in both men and women, with non-small cell lung carcinoma (NSCLC) accounting for 80% of all cases [17]. Second line treatment for advanced NSCLC includes docetaxel and pemetrexed [18]. Gefitinib has been studied as second line treatment as well.
A literature search identified seven RCTs comparing docetaxel with best-supportive care (BSC) (1 study), gefitinib with best-supportive care (1 study), docetaxel with gefitinib (4 studies), and docetaxel with pemetrexed (1 study) [19][20][21][22][23][24][25]. The network of RCTs is presented in Figure 1 and shows that for the comparisons of BSC, docetaxel and gefitinib both direct and indirect evidence is available. For each treatment arm in each study, the reported Kaplan-Meier curves were digitized (Engauge Digitaliser v4.1). In Figure 2 the scanned survival proportions are presented. These aggregate data were analyzed with fractional polynomial network meta-analysis models.
Whilst network meta-analysis can be performed with a frequentist or a Bayesian approach, for this manuscript the focus is on the Bayesian approach. Within the Bayesian framework, analyses consist of data, likelihood, parameters, a model, and prior distributions. More specifically, Bayesian analysis involves the formal combination of a prior probability distribution, which reflects a prior belief about the possible values of the model parameters, with a likelihood distribution of the model parameters based on the observed data in the different studies, to obtain a posterior probability distribution of these parameters [26][27][28].
The scanned survival curves can be divided into multiple consecutive intervals over the follow-up period. Extracted survival proportions were used to calculate the incident number of deaths for each interval and the number of patients at risk at the beginning of that interval. A binomial likelihood distribution of the incident number of deaths for every interval [t, t+Δt] (Δt is the time from t to t+1) of the Kaplan-Meier curves can be described according to:

$$r_{jkt} \sim \mathrm{bin}(p_{jkt}, n_{jkt})$$

where $r_{jkt}$ is the observed number of incident deaths in the interval [t, t+Δt] for study j and treatment k, $n_{jkt}$ is the number of subjects alive at t, adjusted for the subjects censored in the interval [t, t+Δt], and $p_{jkt}$ is the cumulative incidence of deaths in the interval [t, t+Δt]. The appendix provides more detail on how a dataset for $n_{jkt}$ and $r_{jkt}$ can be obtained from the Kaplan-Meier curve, taking into account censoring in the interval [t, t+Δt]. In Table 1 the incident deaths and patients at risk for every 2-month period of the individual studies are presented.

Figure 1. Network of randomized controlled trials. (Treatments: gefitinib, docetaxel, pemetrexed, placebo, BSC. Studies: Lee, 2010; Kim, 2008; Maruyama, 2008; Cufer, 2006; Hanna, 2004; Chang, 2006; Shepherd, 2000.)

When the time interval is relatively short, the hazard rate can be assumed constant within the time interval, and the hazard rate $h_{jkt}$ is:

$$h_{jkt} = -\ln(1 - p_{jkt}) / \Delta t$$

In this example fixed and random effects first and second order fractional polynomial models were used with powers chosen from the following set: -2, -1, -0.5, 0, 0.5, 1, 2, 3, with $t^0 = \log t$, according to equation 6. Two different random effects second order fractional polynomial models were compared: one model with a heterogeneity parameter for $d_0$, and one model with heterogeneity parameters for all three treatment parameters ($d_0$, $d_1$ and $d_2$).
Although random effects models with a heterogeneity parameter for only $d_1$ or $d_2$ can be estimated as well, these were considered less appropriate because these models assume that heterogeneity in treatment effects only develops over time, and is not present at treatment initiation. In other words: heterogeneity is only a function of time, and not (also) a function of differences in patient characteristics across studies. If only one heterogeneity parameter is (to be) used, it should be for $d_0$ because it assumes constant variance for the complete follow-up period.
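The link between the interval data and the hazard described above can be sketched in a few lines. The code assumes a constant hazard within each interval, so that the interval risk is 1 - exp(-h·Δt); the function names and the example counts are hypothetical.

```python
import math


def interval_hazard(r: int, n: int, dt: float) -> float:
    """Constant hazard implied by r deaths among n at risk over an interval of length dt."""
    if not 0 <= r < n:
        raise ValueError("need 0 <= r < n")
    p = r / n                      # interval risk (cumulative incidence)
    return -math.log(1.0 - p) / dt


def binomial_loglik(r: int, n: int, hazard: float, dt: float) -> float:
    """Log-likelihood of r ~ bin(p, n) with p = 1 - exp(-hazard * dt), hazard > 0."""
    p = 1.0 - math.exp(-hazard * dt)
    log_choose = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return log_choose + r * math.log(p) + (n - r) * math.log(1.0 - p)


# Hypothetical interval: 12 deaths among 100 at risk over 2 months.
h_hat = interval_hazard(12, 100, 2.0)
```

In the network meta-analysis model the hazard is not estimated per interval as in `interval_hazard`; instead it is given by the fractional polynomial, and a likelihood of the form in `binomial_loglik` ties it to the observed counts.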
Non-informative prior distributions were used for the parameters of the random effects second-order fractional polynomial model with heterogeneity corresponding to $d_0$, $d_1$ and $d_2$ (according to equation 6). For a first order fractional polynomial model these 3-dimensional multivariate prior distributions are reduced to bivariate normal distributions. With a random effects model where a heterogeneity parameter is used only for $d_0$, the corresponding prior distribution can be defined as σ ~ uniform(0,2). When all relative effects parameters are assumed fixed, there is no heterogeneity to be estimated, and no such prior distribution needs to be defined.
The parameters of the different models were estimated using a Markov Chain Monte Carlo (MCMC) method as implemented in the WinBUGS software package [29]. (See appendix for the code.) The WinBUGS sampler, using two chains, was run for 30 000 iterations, which were discarded as 'burn-in', and then for a further 50 000 iterations on which inferences were based. Convergence of the chains was confirmed by the Gelman-Rubin statistic.
The DIC was used to compare the goodness-of-fit of different fixed and random effects models with first and second order fractional polynomials with different powers. DIC provides a measure of model fit that penalizes model complexity according to [16]:

$$DIC = \bar{D} + p_D, \qquad p_D = \bar{D} - \hat{D}$$

$\bar{D}$ is the posterior mean residual deviance [15], $p_D$ is the 'effective number of parameters' and $\hat{D}$ is the deviance evaluated at the posterior mean of the model parameters. The model with the lowest DIC is the model providing the 'best' fit to the data. For every combination of p1 and p2 the DIC was determined. The powers p1 and p2 corresponding to the best fitting fixed effects models were also used to evaluate corresponding random effects models.
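The DIC arithmetic can be written out as follows. This is a minimal sketch that assumes the deviance has already been monitored during MCMC sampling and that the deviance at the posterior mean of the parameters is available; the function name `dic` and all numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def dic(deviance_samples: Sequence[float], deviance_at_posterior_mean: float) -> Tuple[float, float, float]:
    """Return (Dbar, pD, DIC) from MCMC deviance output.

    Dbar is the posterior mean deviance, pD = Dbar - Dhat, DIC = Dbar + pD.
    """
    d_bar = mean(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar, p_d, d_bar + p_d


# Hypothetical deviance draws for two candidate models.
model_a = dic([950.0, 954.0, 958.0], 946.0)
model_b = dic([948.0, 953.0, 961.0], 941.0)
best = min((model_a, model_b), key=lambda m: m[2])
```

In this made-up comparison both models have the same mean deviance, so the one with the smaller effective number of parameters is preferred.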

Illustrative example
The model fit statistics for the different models are presented in Table 2. The fixed effects Weibull model (p1 = 0) was one of the worst regarding goodness-of-fit. Of the first order fractional polynomial models, the model with power p1 = -2 was the best fit. Adding a second time-related effect to this first order fractional polynomial model dramatically improved the model fit. Although the model with p1 = -2 and p2 = 1 has the lowest DIC of all the fixed effects models evaluated, the model with p1 = -2 and p2 = 2 and the model with p1 = -2 and p2 = 3 deserve consideration as well because these are within 1-2 points of the "best" model [16]. However, the modeled hazard function with p2 = 1 is not as sensitive to small sample fluctuations near the end of the follow-up of each study as the models with p2 = 2 or p2 = 3. To facilitate the extrapolation of the survival curves beyond the trial period, the model with p1 = -2 and p2 = 1 was considered the most appropriate fixed effects model. The corresponding random effects models showed similar values for the DIC, and as such the random effects models were considered more appropriate. The model with a heterogeneity parameter for $d_0$ only showed more stable parameter estimates than the random effects model with heterogeneity parameters for $d_0$, $d_1$ and $d_2$. Given the similar fit of these random effects models, the model with one heterogeneity parameter was used. Table 3 provides parameter estimates for the fixed effects first and second order fractional polynomial models with p1 = -2 and p2 = 1, as well as the corresponding random effects model with a heterogeneity parameter for $d_0$. Based on the pooled relative treatment effects regarding $\beta_0$, $\beta_1$ and $\beta_2$ of each intervention relative to docetaxel ($d_{0Ak}$, $d_{1Ak}$, and $d_{2Ak}$ with k = B, C, D corresponding to respectively gefitinib, BSC, and pemetrexed), the corresponding hazard ratios as a function of time were obtained:

$$\ln(HR_{Ak}(t)) = d_{0Ak} + d_{1Ak} \cdot t^{-2} + d_{2Ak} \cdot t$$

The hazard ratios over time obtained with the random effects model are presented in Figure 3. It is obvious that the assumption of a constant hazard ratio does not apply to any comparison involving BSC. Although for the comparison of gefitinib relative to docetaxel a constant hazard ratio over time might be defended, the additional indirect evidence via BSC for this comparison clearly does not allow this assumption. Based on this observation, one can argue that $d_1$ and $d_2$ for gefitinib and pemetrexed relative to docetaxel can be set to zero, and that $d_1$ and $d_2$ only need to be estimated for BSC versus docetaxel. However, it has to be realized that by making that assumption the uncertainty regarding the proportional hazards assumption for gefitinib and pemetrexed is no longer taken into consideration.
In the example there is both direct evidence (i.e. head-to-head studies) and indirect evidence (via BSC) for the comparison of gefitinib versus docetaxel. As such, the network meta-analysis combining both direct and indirect comparisons uses more information than a pairwise meta-analysis of the 4 gefitinib versus docetaxel studies. In Figure 4, the hazard ratio over time is presented for the pairwise meta-analysis of gefitinib versus docetaxel based on 4 studies, as well as the mixed treatment comparison. The estimates of the two analyses are comparable (at least from month 3 onwards), suggesting that inconsistency between direct and indirect estimates is not an issue of concern. However, the uncertainty of the hazard ratio over time is greater with the pairwise meta-analysis of 4 studies than with the network meta-analysis of 6 studies. By incorporating indirect evidence the parameters of the fractional polynomial can be estimated more precisely in this example.
By using the average of study specific estimates for $\beta_0$, $\beta_1$ and $\beta_2$ with docetaxel as the reference, the expected $\beta_0$, $\beta_1$ and $\beta_2$ for the other interventions were calculated using the relative treatment effects $d_{0Ak}$, $d_{1Ak}$, and $d_{2Ak}$ (see Table 4). The corresponding hazard and survival functions for each of the four interventions are presented in Figures 5 and 6A. With these parametric survival curves it is now possible to calculate the expected survival, which is presented in Table 4 as well. When, as is common practice for cost-effectiveness analysis, a constant hazard ratio in combination with a Weibull distribution was assumed, the DIC of the model was 959.1. The fitted survival curves for docetaxel, gefitinib, BSC, and pemetrexed are presented in Figure 6B. The expected survival was respectively 15.1, 14.5, 8.0, and 15.2 months, and shows the overestimate relative to the random effects second order fractional polynomial model. The greatest difference is observed for the BSC survival curve and for the tails of the active interventions. To illustrate that the fractional polynomials produce a visibly better fit to the data than a simple model like the Weibull with a proportional hazards assumption, these models are presented for 3 studies in Figure 7. For the other 4 studies, the difference between the fractional polynomial curves and Weibull curves was not as great.

Figure 3. Hazard ratio over time for each of the interventions relative to docetaxel as obtained with the random effects second order fractional polynomial (p1 = -2, p2 = 1) network meta-analysis model. (Corresponding parameter estimates are presented in Table 3.)
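The step from a fitted hazard function to expected survival is the area under the survival curve. The sketch below shows one way this could be computed numerically for a second order fractional polynomial with p1 = -2 and p2 = 1; it is not the authors' code, the integration scheme is an assumption, and the function names and any parameter values passed in are hypothetical.

```python
import math


def hazard(t: float, b0: float, b1: float, b2: float) -> float:
    """Second-order fractional polynomial hazard with p1 = -2 and p2 = 1."""
    log_h = b0 + b1 * t ** -2 + b2 * t
    # Cap the exponent so that a very large log hazard (e.g. b1 > 0 near t = 0)
    # does not overflow; survival is effectively 0 in that case anyway.
    return math.exp(min(log_h, 700.0))


def expected_survival(b0: float, b1: float, b2: float, horizon: float, step: float = 0.01) -> float:
    """Area under S(t) = exp(-cumulative hazard) from 0 to horizon.

    The cumulative hazard uses the midpoint rule (which avoids t = 0, where
    t**-2 is undefined); the area under S(t) uses the trapezoidal rule.
    """
    n_steps = int(round(horizon / step))
    cum_hazard, area, s_prev = 0.0, 0.0, 1.0
    for i in range(n_steps):
        cum_hazard += hazard((i + 0.5) * step, b0, b1, b2) * step
        s_next = math.exp(-cum_hazard)
        area += 0.5 * (s_prev + s_next) * step
        s_prev = s_next
    return area
```

Setting the two time-related coefficients to 0 reduces the model to a constant hazard, for which the result can be checked against the closed form (1 - exp(-h·T))/h. The time horizon T is a modeling choice; a finite horizon gives a restricted mean rather than the full expected survival.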

Discussion
In this paper a method for (network) meta-analysis of survival data using a multi-dimensional treatment effect is presented as an alternative to synthesis of the constant hazards ratio. With first or second order fractional polynomials the hazard functions of the interventions compared in a trial are modeled and the difference in the parameters of these fractional polynomials within a trial are considered the multidimensional treatment effect and synthesized (and indirectly compared) across studies. In essence, with this approach the treatment effects are represented with multiple parameters rather than a single parameter or outcome.
Meta-analysis of survival data using the constant hazard ratio can be considered a special case of the model presented here. When in equation 6 $d_{1Ak}, d_{2Ak}, \ldots, d_{MAk}$ equal 0, only the time independent parameters $\beta_{0jk}$ can be different across treatments within a trial and accordingly $d_{0Ak}$ reflects the constant log hazard ratio of treatment k relative to A. (Please note that the baseline hazard can still be modeled with multiple $\beta_{1jk}, \beta_{2jk}, \ldots, \beta_{Mjk}$ that can be different from 0, but these are constant across all interventions within a trial. With a Cox proportional hazards model the baseline hazard is unconstrained and not described by a parametric distribution or function.) The advantage of the approach presented here is that it does not rely on the proportional hazards assumption and as a result the model used can be more closely fitted to available survival data. In a situation where the violation of the proportional hazards assumption is less clear due to limitations of the data, modeling a multi-dimensional treatment effect can still be useful to express the uncertainty about whether the assumption is violated.
For network meta-analysis it is important that for the relative effect measure of interest the transitivity assumption holds [3,12,13]. Although the transitivity assumption holds for the constant (log) hazards ratio, violations of the proportional hazards assumption within or across trials, can result in biased indirect and mixed treatment comparisons of relative survival over time. By incorporating additional parameters for the treatment effect, the proportional hazards assumption is relaxed and therefore indirect and mixed treatment comparisons are arguably less likely to result in biased indirect estimates.
With a (network) meta-analysis the value of randomization only holds within a trial, and not across trials [3,12,13]. In other words, patients are randomly assigned to treatments within a trial, but patients are not randomly assigned to different trials. As a result there is the risk that patients assigned to the different trials are not comparable. If the distribution of patient and study level characteristics that modify the relative treatment effects is not similar across the trials that are indirectly compared, results will be affected by confounding bias [13]. In the models presented in this paper, treatment effect estimates will be biased if there is an imbalance in the distribution of treatment*covariate interactions across studies regarding the multidimensional treatment effect. Hence, it is suggested to expand the current models by incorporating treatment*covariate interactions. An additional advantage is that this can explain heterogeneity and facilitates the prediction of expected survival for subgroups [13].
In the example analysis, aggregate level data, i.e. scanned Kaplan-Meier curves, were used for all interventions compared. However, the models can also be used in combination with individual patient level data, using a different likelihood. Patient-level analyses have the advantage that no (conservative) assumption has to be made regarding the censoring process. Furthermore, patient-level network meta-analyses have greater power to estimate meta-regression models, thereby reducing inconsistency and providing the opportunity to explore differences in effect among subgroups. However, obtaining patient-level data for all RCTs in the network may be considered infeasible. As an alternative one could use patient-level data when available, and aggregate level data for studies in the network for which such data are not available, thereby improving parameter estimation over aggregate-data-only models.

Drug coverage decision-making is often informed by cost-effectiveness analysis where expected costs and expected outcomes are compared. When the main objective of the competing interventions is to improve survival, the primary outcome of interest is expected survival or quality-of-life adjusted expected survival. Unfortunately, given the available follow-up in the clinical trials, survival data are often censored and therefore the expected survival cannot be obtained without extrapolation of the data over time. Standard practice is to extrapolate the available survival data for the reference treatment using a parametric survival function (e.g. Weibull, lognormal or log-logistic). This baseline hazard function is multiplied with the constant hazard ratio for each of the competing interventions relative to this baseline to obtain hazard functions for the interventions of interest. The assumption of a constant hazard ratio implies that only the scale of these parametric functions is affected by treatment, and accordingly all the competing interventions have the same shape.
Since the tail of the survival function has a great impact on the expected survival, this assumption may lead to biased or at least highly uncertain estimates regarding differences in expected survival and therefore cost-effectiveness estimates. Given the multi-dimensional treatment effect of the approach presented in this paper, the parametric hazard functions of the competing interventions can be different regarding all of their parameters. As a result the extrapolated survival functions for all the interventions are more closely fitted to the available data and expected survival is less likely to be over- or underestimated. An additional advantage of the use of fractional polynomials is that models can be fitted that go to asymptotes, and are therefore far more stable at the ends than, say, standard polynomials or splines. Although the proposed models constitute a substantial liberalization for evidence synthesis of survival curves from RCTs, there is still a danger of understating the uncertainty in extrapolating the curves because the choice of fractional polynomials is based on model fit criteria. In order to reflect model uncertainty, it might be of interest to estimate the powers of the fractional polynomials as well.

Conclusions
(Network) meta-analysis of survival data is commonly performed with models represented with one parameter for the relative treatment effect: the constant hazard ratio. When the proportional hazards assumption does not hold, models in which the treatment effect is represented by several parameters using fractional polynomials can be more closely fitted to the available data. The models allow straightforward estimation of expected survival to facilitate cost-effectiveness analysis.

Appendix
Extraction of data from survival curves to use in the network meta-analysis model
According to the Kaplan-Meier curve, among the people alive at time point t (survival proportion $S_t$), the proportion that die between time point t and time point t+1 is equal to $(S_t - S_{t+1})/S_t$, and the number of deaths can be described by a binomial likelihood distribution: $r_t \sim \mathrm{bin}(p_t, n_t)$, where $r_t$ is the number of deaths in the interval [t, t+1], $n_t$ is the number of subjects at risk in that interval, and $p_t$ is the underlying risk.
In the absence of censoring for the interval [t, t+1], $n_t$ is the number at risk at the beginning of the interval and $r_t$ can be obtained by multiplying $n_t$ with $(S_t - S_{t+1})/S_t$.
The number at risk for a particular interval might be provided below the Kaplan-Meier graph; if not reported, it can be obtained according to $n_t = n_{t+1} S_t / S_{t+1}$ (equivalently, $n_{t+1} = n_t S_{t+1}/S_t$), starting at the time point where the number at risk is provided below the graph.
In the case of censoring, the order of censoring and deaths within the time interval [t, t+1] is unclear, and it is not possible to derive the exact number of deaths and censored subjects in the interval. As extreme cases we can assume that, on the one hand, all censoring occurs after the deaths within the interval, or, on the other hand, all censoring occurs before the deaths. In the first scenario $n_t$ is the number at risk at the beginning of the interval, whereas in the second scenario $n_t$ is the number at risk at the beginning of the interval minus the number of censored subjects. With the second scenario it is clear that $n_t$ and $r_t$ are smaller for a given $(S_t - S_{t+1})/S_t$, resulting in more uncertainty regarding the estimate $p_t$. To not underestimate the uncertainty we opted for the second scenario. Under the assumption that all censoring occurs before the deaths occur, $n_t$ can again be obtained by $n_t = n_{t+1} S_t / S_{t+1}$ with $n_{t+1}$ reported below the graph, or based on the same calculation for the interval [t+1, t+2], etc.

Figure 7. Three representative studies that illustrate that a constant hazard ratio in combination with a Weibull reference curve does not fit the data as closely as the fractional polynomial models.
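The back-calculation described in this appendix can be sketched as follows. The code assumes that all censoring in an interval occurs before the deaths, so that the censoring-adjusted number at risk satisfies $n_{t+1} = n_t S_{t+1}/S_t$, and it chains the calculation backwards from a single reported number at risk. The function name `reconstruct_counts` and the example values are hypothetical, and the counts are left unrounded.

```python
from typing import List, Sequence, Tuple


def reconstruct_counts(surv: Sequence[float], n_last: float) -> List[Tuple[float, float]]:
    """Back-calculate (n_t, r_t) for each interval [t, t+1].

    surv   : survival proportions S_0, S_1, ..., S_T read from the curve
    n_last : number at risk reported at time T
    Assumes all censoring in an interval occurs before the deaths, so that
    n_{t+1} = n_t * S_{t+1} / S_t, with n_t the censoring-adjusted number at risk.
    """
    counts: List[Tuple[float, float]] = []
    n_next = n_last
    for t in range(len(surv) - 2, -1, -1):
        n_t = n_next * surv[t] / surv[t + 1]           # adjusted number at risk
        r_t = n_t * (surv[t] - surv[t + 1]) / surv[t]  # deaths in [t, t+1]
        counts.append((n_t, r_t))
        n_next = n_t
    counts.reverse()
    return counts


# Hypothetical curve: S = 1.0, 0.8, 0.6 with 60 subjects at risk at the last time point.
counts = reconstruct_counts([1.0, 0.8, 0.6], 60.0)
```

When numbers at risk are reported at several time points, the same calculation would be restarted from each reported value rather than chained from the last one only.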