Robust estimation of the expected survival probabilities from high-dimensional Cox models with biomarker-by-treatment interactions in randomized clinical trials

Background Thanks to the advances in genomics and targeted treatments, more and more prediction models based on biomarkers are being developed to predict potential benefit from treatments in a randomized clinical trial. Despite the methodological framework for the development and validation of prediction models in a high-dimensional setting is getting more and more established, no clear guidance exists yet on how to estimate expected survival probabilities in a penalized model with biomarker-by-treatment interactions. Methods Based on a parsimonious biomarker selection in a penalized high-dimensional Cox model (lasso or adaptive lasso), we propose a unified framework to: estimate internally the predictive accuracy metrics of the developed model (using double cross-validation); estimate the individual survival probabilities at a given timepoint; construct confidence intervals thereof (analytical or bootstrap); and visualize them graphically (pointwise or smoothed with spline). We compared these strategies through a simulation study covering scenarios with or without biomarker effects. We applied the strategies to a large randomized phase III clinical trial that evaluated the effect of adding trastuzumab to chemotherapy in 1574 early breast cancer patients, for which the expression of 462 genes was measured. Results In our simulations, penalized regression models using the adaptive lasso estimated the survival probability of new patients with low bias and standard error; bootstrapped confidence intervals had empirical coverage probability close to the nominal level across very different scenarios. The double cross-validation performed on the training data set closely mimicked the predictive accuracy of the selected models in external validation data. We also propose a useful visual representation of the expected survival probabilities using splines. In the breast cancer trial, the adaptive lasso penalty selected a prediction model with 4 clinical covariates, the main effects of 98 biomarkers and 24 biomarker-by-treatment interactions, but there was high variability of the expected survival probabilities, with very large confidence intervals. Conclusion Based on our simulations, we propose a unified framework for: developing a prediction model with biomarker-by-treatment interactions in a high-dimensional setting and validating it in absence of external data; accurately estimating the expected survival probability of future patients with associated confidence intervals; and graphically visualizing the developed prediction model. All the methods are implemented in the R package biospear, publicly available on the CRAN. Electronic supplementary material The online version of this article (doi:10.1186/s12874-017-0354-0) contains supplementary material, which is available to authorized users.


Background
Thanks to the advances in genomics and targeted treatments, an increasing interest is being devoted to develop prediction models with biomarkers, called treatment-effect modifiers (for which the relative treatment effect varies according to the biomarker values), to predict how much benefit individual patients would derive from specific treatments. This aims at taking the therapeutic decision that fits the best each individual patient. Recently, there have been some attempts to identify prediction models that are associated with higher efficacy of particular treatments such as an 8-gene and a 14-gene signatures (i.e. prediction model) to evaluate the degree of trastuzumab benefit in early breast cancer patients on disease-free survival [1,2]. Statistical methodology has been proposed for the development of prediction models and their validation [3,4] and to estimate individual predictions in a survival setting [5,6]; however, no clear guidance has been yet reached and evaluated in a high-dimensional setting.
Results from randomized clinical trials are often difficult to translate into predictions for individual patients, but the estimated absolute risk reductions from large randomized trials do still provide the best guidance [7]. Therefore, in addition to the challenges coming from the identification of treatment-effect modifiers in a high-dimensional setting, it is also important to identify prognostic biomarkers (i.e. associated with the clinical outcome and independently of treatment) to adjust for the main effect of established clinical and genomic variables in order to obtain individual survival probabilities. Indeed, the clinical impact of a treatment can be judged only with the knowledge of the prognosis of a patient. It is thus of importance to reliably predict the prognosis of patients to assist treatment counseling [8].
The aim of the present study was to propose a unified framework for developing and validating a highdimensional Cox model [9] in a randomized clinical trial to estimate the expected treatment effect according to different values of the biomarkers included in the model, and to estimate survival probabilities for individual patients with associated confidence intervals. We propose different approaches that can be used in this context. We present a simulation study including null (i.e. with no treatmenteffect modifier) and alternative scenarios (i.e. with at least one treatment-effect modifier) to evaluate the performance of the proposed approaches. We also illustrate the methods in a large randomized clinical trial of breast cancer patients. Finally, we discuss the findings.

Methods
In the present section we investigate several approaches to: (i) identify a parsimonious Cox regression model in a high-dimensional setting, (ii) estimate internally the predictive accuracy metrics of the selected model, (iii) estimate the expected survival probability for patients, (iv) construct confidence intervals thereof, and (v) visualize them graphically. A schematic representation of the framework is provided in Fig. 1.

Identification of a prediction model with treatment-effect modifiers
The objective of developing a prediction model is to identify a model allowing the computation of a prognostic and a treatment-effect modifying score for each patient. From a statistical viewpoint, Rothwell put forward that the most reliable statistical approach for identifying treatmenteffect modifiers is to test the interactions between the biomarkers and the treatment effect [7]. Hereinafter, we consider the proportional hazards model where α, β = (β 1 , …, β p ) T and γ = (γ 1 , …, γ p ) T are the regression coefficients for: the treatment T coded as +0.5 and −0.5 for the experimental and control arm, respectively; the main effects of the p standardized biomarkers X = (X 1 , …, X p ) T ; and the p biomarker-by-treatment interactions, i.e. the product between the standardized biomarkers X and the treatment T. In other words, the first sum in (1) corresponds to an average prognostic component to the total score and the second sum in (1), corresponds to the treatment-effect modifying component. For simplicity, we do not consider clinical covariates in the presentation of the framework.
To overcome the nonidentifiability of the models due to the high-dimensional setting, we previously reviewed and proposed several methods to identify a parsimonious model starting from a large number of candidate biomarkers in a randomized clinical trial and to predict the magnitude of the relative treatment effect for future patients [10]. Most of these methods are based on penalized regression, which maximizes the penalized partial loglikelihood l p (α, β, γ, λ, T,X) that is the partial log-likelihood l(α, β, γ, T, X) of model (1) minus a penalty: In the present work, we implemented two penalties: the lasso [11] which corresponds to θ j = ϑ j = 1, and the adaptive lasso [12,13] which corresponds to θ j ¼ 1=β j andγ R j the regression coefficients estimated in a preliminary step through the model (1) subject to the ridge penalty. As compared to the popular lasso penalty, the adaptive lasso penalty was initially proposed as it fulfilled the oracle property [14] in the prognostic setting. However, the performance of the oracle property is less known in high-dimensional samples with relatively low number of events. In a time-to-event setting with many candidate biomarker-by-treatment interactions, the adaptive lasso penalty was one of the best performing methods for selecting these interactions in a simulation study in [10]. For both the lasso and adaptive lasso, the tuning parameter λ was determined via a k 1 - Fig. 1 Schematic representation of the proposed framework. 1CV and 2CV: single and double cross-validation, ĥ 0 (t) baseline hazard at time t,α;β;γ: j X ij : treatment-effect modifying score,π : entire linear predictor of the selected model, iBrier: integrated Brier score, C: Uno's C-statistic, ΔC: different in arm-specific Uno's C-statistics, Ŝ(t): estimated survival probability at time t fold cross-validation scheme (called 1CV for the rest of the article) by maximizing the cross-validated loglikelihood criterion of the Cox model [15]: with k 1 = 5 and whereα −k ,β −k ,γ −k and the tuning parameter are estimated in the training data excluding the subsample k: T − k and X − k . Penalized regression coefficients are estimated through the R-package glmnet [16,17]. We also investigated the "refit adaptive lasso" and the "refit lasso" for which the selected non-zero regression coefficients are re-estimated in an unpenalized Cox regression model.

Internal assessment of predictive accuracy of the prediction model
To assess the performance of a prediction model, some predictive accuracy measures (see the section "Simulation study" for more details) can be used. However, computing these metrics on the same data used to develop the model could lead to over-optimistic predictive accuracy measures and thus, external validation is crucial [18]. In the absence of external validation data set, a strategy suggested by Simon et al. [19] and illustrated by Matsui et al. [3] consists in performing a double cross-validation (2CV) to mimic an external validation. Indeed, patients are first divided into k 2 subsamples (in this work k 2 = 5); then, for each fold, the data in the fold are put apart and the data in the remaining k 2 − 1 folds are used to estimate the tuning parameter λ through k 1 -fold cross validation and to estimate the prediction scores of the left-out patients. The process is iterated for all the k 2 folds to estimate the crossvalidated prediction scores of the complete data set. Of note, these scores obtained using the 2CV are only used to compute these prediction accuracy metrics, whereas the 2CV is not used elsewhere in this study.

Point estimates of the expected survival probability
with Ĥ 0 (t) the cumulative baseline hazard at time t, that is commonly estimated through the non-parametric Breslow estimator [20].

Estimation of confidence intervals of the expected survival probability
We compared two strategies to estimate the confidence intervals at level 1 -θ of the expected survival probabilities: one based on analytical expressions deriving from the normal approximation of the estimator and one based on a non-parametric bootstrap approach. The analytical approach [21] consists in estimating the variance of the cumulative risk Ĥ i (t) = Ĥ(t, T i , X i ) based on the Breslow estimator [22] in the Cox model. Thus, the confidence interval of Ŝ i (t) is approximated as : This can be directly implemented via the R-package survival [23]. In the case that penalized coefficients are taken without re-estimation, a practical solution to estimate their standard errors can be to compute the Hessian matrix based on the estimated parameter values. The non-parametric bootstrap approach consists in generating B bootstrap samples of the original data (B = 200 in this work) and in estimating the model (1) for each of them. Thus, B models ĥ 1 (t, T, X), …, ĥ B (t, T, X) are obtained, allowing to estimate B times the expected survival probability for each patient i at time t, noted Ŝ i,boot (t) = {Ŝ i,1 (t), …, Ŝ i,B (t)}. Then, the non-parametric confidence interval of the Ŝ i (t) based on the empirical percentiles q α (⋅) of the distribution of Ŝ i, boot (t) is given by Of note, both the analytical and bootstrap confidence intervals are based on the model (1) which is estimated using a penalty parameter determined via a single crossvalidation (1CV).

Graphical visualization
We visualize the expected survival probability of the patients according to their treatment-effect modifying j X ij at a given horizon τ. A toy example is used for illustration in Fig. 2.
For the graphical visualization, the scoresη ¼η 1 ; …; ;η n À Á are scaled so that the 2.5% (i.e. q 0:025η ð Þ) and 97.5% (i.e. q 0:975η ð Þ) quantiles equal 0 and 1 respectively in the training set:η i ¼η i − q 0:025η ð Þ q 0:975η ð Þ−q 0:025η ð Þ : In the simplest case in which only one treatment-effect modifier and no prognostic biomarker (β j ¼ 0; ∀j ) are included in the model, the relationship betweenη i and Ŝ i (t) (and its confidence interval bounds) is strictly monotone (Fig. 2a). However, in the setting of many treatment-effect modifiers and no prognostic biomarker, two patients with equal scoresη i have the same Ŝ i (t), but their confidence intervals may be different (Fig. 2b), due to different biomarker values. In order to obtain smoothed average confidence bounds, we considered constrained basis splines (B-splines) spl(t) [24]. Splines are numerical functions that can be used for curvefitting by approximately fitting the data at particular nodes. The number of nodes was estimated through the Akaike Information Criterion (AIC, default criterion in the chosen R-package cobs [25]). We also forced the constraint that 0 ≤ spl(t) ≤ 1.
When the model contains also prognostic biomarkers, patients with the same treatment-effect X ij : Due to the possible large heterogeneity of Ŝ i (t) for equal scoresη i , we stratified the treatment-effect modifying plot into four groups according to the prognostic score, using the percentiles proposed by Cox [26]: 16.4%, 33.6%, 33.6% and 16.4%. For our toy example, Fig. 2c and 2d represent the relationship between Ŝ i (t) andη i for the best and worst prognostic risk groups, respectively. Heterogeneity within groups is still present for both point estimates of the expected survival probability and confidence interval bounds; therefore we also considered smoothing the point estimates through B-splines. Even though categorization reduces the available information [27], it leads to a more meaningful graphical representation of the prediction model. In the rest of the study, we investigated both the spline strategy with prognostic categorization and the pointwise strategy.

Simulation study
In this section, we present a simulation study covering several null (i.e. with no treatment-effect modifier) and alternative (i.e. with at least one true treatmenteffect modifier) scenarios to evaluate the operating characteristics of the different approaches.

Data generation and scenarios
For each simulated data set, we generated p = 500 Gaussian unit-variance variables, including prognostic biomarkers and/or treatment-effect modifiers. Data were generated for a total of n = 1500 patients, randomly assigned to the experimental or control group with equal probability of 0.5. We generated random survival times t with constant hazard. Table 1 shows the three null and three alternative scenarios considered in the simulations. Simulation parameters such as the desired baseline (i.e. x = (0, …, 0) T ) survival probability S 0 (⋅) at the horizon τ (5 years in our simulations) and the regression parameters α, β and γ were chosen on the basis of the early breast cancer trial presented in the section "Application". Thus, we fixed S 0 (5) = 77%. All the scenarios included censoring and an autoregressive correlation structure [28] between biomarkers ρ jj 0 ¼ 0:8 j−j 0 j j within 25biomarker blocks. As sensitivity analyses, we also investigated other values for: the baseline survival probability S 0 (5) (i.e. 50%), the number of biomarkers p (i.e. 100 and 1000) and the empirical censoring rate (i.e. between 36% and 62%) to assess their impact on the results. For each scenario, 250 data sets were generated and 250 additional ones were generated for external validation, each with the same parameters as each of the training data sets.

Evaluation criteria
All the evaluation criteria have been measured at τ = 5 years. Integrated brier score. As a measure of overall prediction error of the models, we used the integrated Brier score (iBrier score, [29]). The time-dependent Brier score is a quadratic score based on the predicted time-dependent survival probability defined by and takes censoring into account through Ŝ C (t i ) = P(C > t i ), the Kaplan-Meier estimate of the censoring distribution with t i the survival time of patient i. The integration of the Brier score can be done by over time t ∈ [0, τ] with respect to some weight function W(t) for which a natural choice is (1 − Ŝ(t))/(1 − Ŝ(τ)) [29]. The lower the iBrier score, the larger the prediction accuracy is.
Uno's C-statistic. To evaluate the discrimination of the prediction model, we also measured the concordance betweenπ i , the linear predictor of (1), and the survival time based on the Uno's C-statistic, one of the least biased concordance statistic estimator in the presence of censoring [30]. For the entire model, we computed The larger the C(τ), the higher the overall discrimination is.
ΔUno's C-statistic. For the treatment-effect modifying componentη i , we computed the absolute difference of the arm-specific Uno's C-statistics as we proposed previously [10] to evaluate the biomarker-by-treatment interaction strength: where UnoCη; τ; T ¼ T ð Þis the Uno's C-statistic computed within the arm T only. The larger the ΔC(τ), the higher the interaction strength is.  As a comparator, we also computed these prediction accuracy metrics (iBrier, Uno and ΔUno) for the "oracle model" that is the unpenalized Cox proportional hazards model fitted to the truly related biomarkers in the training set and applied to the validation set.

Bias and precision of the expected survival probability
To assess reliability of the estimation of the expected survival probability, we compared for each patient i, its estimated survival probability Ŝ i (τ) to its theoretical survival probability S i (τ). The theoretical survival probability is computed from the baseline survival and the regression parameters of the simulation model. We evaluated the accuracy of Ŝ i (τ) through the mean bias and its precision through its standard error Finally, to evaluate the accuracy of the developed confidence intervals, we also evaluated their empirical coverage probability as

Results
The results of the simulation study are summarized in Tables 2 and 3. These tables summarize both the predictive accuracy metrics and the accuracy and precision of survival estimates of the time-to-event regression models subject to the adaptive lasso penalty.
As the lasso penalty provided slightly worse coverage of the confidence intervals and obtained slightly less good biomarker-selection performance in some scenarios [10], its results are given in the Additional files 1 and 2. The models identified through the adaptive lasso penalty provided predictive accuracy ( Table 2) relatively close to the oracle models. Computed on an external validation data set, the iBrier score was quite low (varying from 0.099 to 0.109) and the Uno's C-statistic was quite large in presence of prognostic effects (scenarios 2 and 6, varying from 0.586 to 0.670). The ΔUno's Cstatistic was also large in presence of true treatmenteffect modifiers (scenarios 4-6, varying from +0.207 to +0.229). When evaluating these criteria on the training set used to compute the prediction model, we observed overrated performances of the models as compared to results in external data (training 1cv vs. validation: from +0.077 to +0.155 for C and from +0.062 to +0.107 for ΔC). The double cross-validation technique (training 2cv) gave internal results close to those of an external validation (validation).
When focusing on the estimation of the survival probability (Table 3), the mean bias was close to zero in all scenarios. Nonetheless, for S 0~5 0% (Additional file 3), a slight negative bias was observed (from −0.007 to −0.020). In terms of precision, the standard error of the survival point estimate was low in absence of biomarker effects (null scenarios 1-2, from 0.05 to 0.06) and was higher in presence of prognostic biomarkers (null scenario 3, 0.09) and even higher with treatment-effect modifiers (scenarios 4-5, from 0.10 to 0.11). The variability was the largest when both prognostic and treatment-modifiers were present (scenario 6, SE from 0.13). No relevant difference in terms of accuracy and precision of survival point estimate was observed between the pointwise and spline strategy ( Table 3). The standard error was inflated, too, when refitting the selected regression model in a second step ("refit adaptive For the computation of the 95% confidence interval of the estimated survival probability, the non-parametric bootstrap approach outperformed the analytical approach in terms of empirical coverage (Table 3). Despite the former (through the pointwise strategy) produced slightly overly conservative confidence intervals (CP from 0.96 to 0.97), the latter was more biased and in the opposite direction (CP from 0.91 to 0.93 in null scenarios and 0.88 in alternative scenarios). These probabilities for the analytical approach were even lower when using the refit model (CP from 0.68 to 0.73, Additional file 4), whereas remained unchanged for the bootstrap approach.
The categorization of the prognostic scores into risk groups and the use of splines slightly increased the coverage probability of the confidence intervals for the bootstrap approach (increase from 0.96-0.97 to 0.96-1.00) whereas the coverage probability remained unchanged for the analytical approach.
In sensitivity analyses, the overall number of biomarkers p and the empirical censoring rate slightly impacted the results. Indeed, for smaller p and lower censoring rate, the prediction accuracy metrics were closer to the oracle models and the overfitting issue was smaller (Additional files 5 and 6). Also, the precision of the expected survival estimates was higher for this setting (Additional files 7 and 8).

Application
A retrospective biomarker study was performed on tumor samples from n = 1574 patients in an early breast cancer randomized clinical trial comparing chemotherapy plus adjuvant trastuzumab (arm C + T, n = 779) or not (arm C, n = 795). The comprehensive description of this data set is provided in Pogue-Geile et al. [1]. Gene expression data had been collected for p = 462 genes and normalized as in the original publication. Clinicopathological covariates such as ER and nodal status, and tumor size were also available. Median follow-up time for distant-recurrence free survival (DRFS) was 7.1 years and the censoring rate was 73% (i.e. 431 events for DRFS). The arm-specific 5-year DRFS was 84% (CI95%: 81%-86%) and 64% (CI95%: 61%-68%) for patients in arm C + T and C, respectively (Fig. 3). In the 1574 patients, adjuvant trastuzumab led to significant better DRFS as compared to chemotherapy alone (Hazard Ratio = 0.46 [95% CI: 0.38-0.56]).
The clinico-genomic prediction model, established through the model (1) subject to the adaptive lasso penalty, is presented in Table 4. It contains 102 prognostic variables (4 clinical and 98 genomic variables) and 24 treatment-effect modifiers. Interestingly, some prognostic biomarkers have already been identified in the biomedical literature, such as SOX4 [31] or CSNK1D [32]. In addition, some immune genes were also identified in the treatment-effect modifying component: CD9 and CCL21, which is consistent with some articles highlighting the involvement of immune pathways in the efficacy of trastuzumab [33,34]. Fig. 4 provides a graphical visualization of the model representing the 5-year DRFS according to the treatment-effect modifying score for different prognostic risk groups. Confidence intervals were estimated through the non-parametric bootstrap approach (B = 200) and were smoothed via B-splines using two or three nodes estimated by AIC. The prognostic score was categorized through the distribution proposed by Cox [26], i.e. group 1 (φ < 0.20), group 2 (0.20 ≤φ < 0.99), group 3 (0.99 ≤φ < 1.85) and group 4 (1.85 ≥φ). The entire prediction model (i.e. 126 variables) has a moderate ability to discriminate patients according to their survival probability (C = 0.67 for a double cross-validation). Regarding the treatment-effect modifying component of the model, it only slightly discriminates patients for their treatment benefit (ΔC = Anly: analytical approach, Boot: non-parametric bootstrap approach, CI: confidence interval. Average quantities across 250 replications +0.02 for a double cross-validation): the lowest the treatment-effect modifying score, the highest the benefit of the trastuzumab is. As expected, when performing a single cross-validation the prediction measures are higher (C = 0.80 and ΔC = 0.23). In terms of point estimates, the effect of treatment with trastuzumab on absolute DRFS seems higher for the low treatment-effect modifying scores. However, the width of the confidence intervals is extremely large and the confidence intervals largely overlap between the two arms. The selected model subject to the lasso penalty identified only the SIAH2 gene as treatment-effect modifier (Additional file 9). Some recent articles have been proposed to reduce the number of selected biomarkers by selecting a more stringent tuning parameter. We previously proposed the pcvl criterion [35] that reduces the number of selected penalized biomarkers in the present application from 122 to 64 for the adaptive lasso (Additional file 10).

Discussion
Individual predictions for future patients are the ultimate goal of the stratified medicine era [36]. Individual outcome predictions from population risks are most often affected by an inherently large amount of uncertainty. Nevertheless, the expected survival probabilities under different treatment alternatives can be a valuable tool to inform therapeutic decision-making.
In the present study, we proposed a unified framework for developing, evaluating and visualizing a prediction tool for future patients in the setting of randomized clinical trials with a survival endpoint in a high-dimensional space. We performed a simulation study covering null and alternative scenarios to evaluate different approaches.  Table 4 Developed clinico-genomic model through the full biomarker-by-treatment interaction Cox model subject to the adaptive lasso penalty Based on simulations, the models identified either with the lasso or the adaptive lasso penalties give close results, with accurate and precise estimations of the expected survival probability. The lasso penalty give slightly worse results than the adaptive lasso penalty for the construction of confidence intervals through the analytical approach; however, for both penalized models, the coverage probabilities of their non-parametric 95% confidence intervals are close to the nominal level. The simulations also highlighted that refitting the regression model after biomarker selection to obtain unpenalized coefficients reduced the precision of the expected survival probability estimations. In case no external data is available, a double cross-validation is useful to reduce the overoptimism of the measures computed internally (prediction error through the iBrier score and discrimination through the Uno's C-statistics). Other resampling techniques such as bootstrap or jackknife could have been also proposed, but are known to be more time consuming. In addition, a previous study comparing resampling methods highlighted that the cross-validation performed well for predictive accuracy [37]. Regarding the construction of confidence intervals, the non-parametric (bootstrap) confidence intervals outperform the analytical approach in our simulation study. A possible explanation is that standard errors of penalized regression coefficients are difficult to estimate, we used the Hessian matrix based on the estimated parameter values in this study. Furthermore, this approach does not take into account the upstream variable selection process, which adds variability and could explain why the coverage probability is lower than the nominal level.
Finally, we also considered to possibly use smoothing splines and to categorize the prognostic score into four risk groups to obtain a more meaningful graphical visualization of the model. Of course, the cut-offs are arbitrary and should be predefined by warranting a sufficient number of patients per group. Even if the categorization of the prognostic scores slightly reduces the information, this is counterbalanced by a more straightforward graphical representation of the model  Table 4. DRFS: distant-recurrence free survival, dot: point estimate, solid curves: average smoothed splines for point estimates, dashed curves: average smoothed splines for confidence bounds predictions. In particular, this reduction in coverage probability is minimal with the bootstrap approach.
To the best of our knowledge, no simulation study has previously investigated the accuracy and precision of expected survival probabilities from a penalized Cox regression model investigating in high-dimensional spaces both prognostic biomarkers and treatment-effect modifiers in a randomized clinical trial. Matsui et al. [3] showed in a real data example how to obtain individual estimations from prediction models based on both prognostic biomarkers and treatment-effect modifiers. However, they did not mention the problem of computing confidence intervals and selected the biomarkers through a univariate approach, which is suboptimal in a high-dimensional setting [10]. The inference of survival probabilities is a topic increasingly discussed in the literature. Two recent articles [5,6] focused on a prognostic setting, mainly in low-dimensional spaces. Sinnott and Cai [5] proposed a two-step ensemble voting approach and Lin and Halabi [6] proposed a perturbation approach to better estimate the standard errors, but did not investigate the impact on expected survival probabilities.
We also applied the proposed strategies to a randomized clinical trial of early breast cancer. Interestingly, the identified model contains some biomarkers already known in the biomedical literature but the interaction strength seemed low based on double cross-validation. Due to the low number of events of the complete data set, we did not separate it into training and validation sets. Of note, the large width of the confidence intervals of the survival predictions translates the high uncertainty around these measures, as also discussed by [5]. The identified model contains a very large number of biomarkers, but parsimonious models could be preferred. Some strategies have been proposed to reduce the number of selected biomarkers by selecting a more stringent tuning parameter such as the pcvl criterion [35]. Finally, a remaining open question concerns the integration of clinical and genomic variables. Ongoing research focuses on this topic, but no consensus has been reached yet [38,39]. In the application shown, we left the clinical variables unpenalized because they are known to have a strong prognostic main effect in the early breast cancer setting.

Funding
This work was supported by the Foundation Philanthropia Lombard-Odier as a PhD scholarship. The funding sources had no role in study design, data collection, data analysis, data interpretation, or manuscript writing.