A simulation study was performed to compare the predictive performance of the RSF landmarking approach to a joint model and Cox landmarking method. Specifically, the four main aims were to compare performance under the scenarios of (1) a simple relationship between the longitudinal and survival processes, (2) a violation of the proportional hazard assumption and a complex dependence of the longitudinal marker process, (3) multiple longitudinal outcomes, and (4), multiple noise variables not associated with the survival process.
Data generation
The data were generated using a joint model. The longitudinal processes for k longitudinal outcomes were modeled using a linear mixed effects model that included only main effects for three binary baseline covariates and a random intercept and slope.
$$y_{{ik}}(t) = m_{{ik}}(t) + \epsilon_{{ik}}(t) = x_{{ik}}' \beta_{k} + z_{{ik}}' b_{{ik}} + \epsilon_{{ik}}(t) $$
where the longitudinal measurements yijk={yik(tij),j=1,…,nik} are simulated for each subject i for measurement time tijk, and xik and zik are covariate vectors associated, respectively, with the vector of fixed effects βk and vector of random effects bik,bik∼N(0,Gk), where Gk is the variance matrix, and is independent of \(\epsilon _{{ik}}(t)\sim N(0,\sigma _{k}^{2})\). The estimate of true unobserved value of each underlying longitudinal covariate is mik(t). In this simulation study, we simulate two longitudinal markers, mi1(t) and mi2(t), where xi1=xi2=(1,t,X1,X2,X3)′ and zi1=zi2=(1,t)′.
For Aims 1-2 including only one longitudinal outcome, two different scenarios were assumed for the survival process, each with varying complexity in the association between the survival and longitudinal processes. In Scenario I, we simulated from a standard joint model where the hazard of the event depended only on the true current marker level. Scenario II included an interaction between log(1+t) and the association parameter to induce a violation of proportional hazards, and also included a quadratic trend in the relationship between the longitudinal marker and the survival outcome:
$$\begin{array}{*{20}l} {}\text{Scenario I: }& h_{i}(t) = h_{0}(t)\exp\{ \gamma'w_{i} + \alpha_{1} m_{i1}(t)\}. \\ {}\text{Scenario II: }& h_{i}(t) = h_{0}(t)\exp\{ \gamma'w_{i} + \alpha_{1} m_{i1}(t)^{2}\log(1+t)\} \end{array} $$
where wi=(Xi1,Xi2,Xi3)′ is the vector of baseline covariates associated with the vector of coefficients γ=(γ1,γ2,γ3)′. The scalar α1 quantifies the association between the error free current measurement, mi1(t), and the time-to-event outcome. We use a Weibull baseline hazard, \(\phantom {\dot {i}\!}h_{0}(t) = \exp (\sigma _{1}) \sigma _{2} t^{\sigma _{2}-1}\).
For Aim 3, we included the second longitudinal marker using a quadratic relationship with the marker and the survival process:
$$\begin{array}{*{20}l} {}\text{Scenario III: }& h_{i}(t) \,=\, h_{0}(t)\exp\{ \gamma'w_{i} \,+\, \alpha_{1} m_{i1}(t) \,+\, \alpha_{2} m_{i2}(t)^{2} \}. \\ \end{array} $$
A total of 500 simulated data sets of 1000 subjects each were generated. The subjects were assumed to have been followed for a period of up to 15 years, with longitudinal measurements at baseline and at up to 30 follow-up times generated from an exponential distribution with a rate of 2 per year for both longitudinal outcomes. Censoring was simulated under a uniform distribution. Each of the scenarios resulted in approximately 50%, 43%, and 45% censoring, respectively. To address the fourth aim of the simulation study, an additional set of 7 binary “noise” variables, Xi4,…,Xi10, were simulated with varying prevalence, but not included in generating the survival and longitudinal data under Scenarios I, II, or III. Additional details and the parameter values for the simulation are given in Supplementary Section 1 in Additional file 1.
Model fitting
An identical procedure was repeated on data sets from Scenario I, II and III. Each dataset was randomly split into 50% training and 50% testing data sets. Each prediction method was then fit to the training sets and used to compute dynamic survival probabilities on the 500 test individuals, which we used to assess predictive performance. Evaluation of predictive ability via AUC, Brier score and RMSPE was performed at five follow-up times for each scenario, and values were averaged across simulated data sets. A prediction window of s=5 years was used for all three scenarios.
We fit a joint model as specified in data generation with a linear mixed effects model for the longitudinal process and the true current value of the marker included in the survival process. This model was correctly specified for Scenario I, but misspecified for Scenario II. For Scenario III, we fit two misspecified joint models. One in which fit the longitudinal marker y1 as in Scenarios I and II, and the longitudinal marker y2 is treated as a baseline covariate, and a second in which we include both longitudinal markers using a multivariate shared parameter joint model for longitudinal and survival outcomes under a Bayesian estimation approach (implemented using the JMBayes R package), ignoring the quadratic relationship with the hazard. We used Cox landmarking by fitting a simple Cox model with LOCF imputation of the longitudinal marker at each of the landmark times. Details of the Cox model and joint model that were fit and how dynamic predictions were obtained can be found in Supplementary Section 2 (Additional file 1).
For the RSF landmarking approach we fit a RSF at each landmark time, using LOCF for the longitudinal marker as in the Cox landmarking. Each RSF was run with 1000 trees and all other suggested values in Table 1. We also fit the RSF models with parameter tuning at each landmark time to select the number of covariates to try as candidates (mtry) and final node size. We fit all the modeling approaches twice: (1) with w including only the 3 baseline variables that were used in the data generation, as well as (2) including the 7 simulated noise variables.
Results
In Scenario I (Fig. 1, left panel), when there is a simple relationship between the survival and marker processes where the survival depends only on the true current level of the marker, the RSF landmarking approach does not perform better than the Cox landmarking or correctly specified joint model based on both AUC and Brier score. The RMSPE followed a similar pattern to BS (Additional file 1 Table S3) The AUC of all models decrease over time, which is likely due to a selection process that induces increasing homogeneity in the at-risk population at later prediction time points [33]. The landmark Cox and RSF models experience a greater decrease in AUC over time possibly since these models are trained on smaller data sets at later prediction times.
In Scenario II (Fig. 1, middle panel), when we violate the proportional hazards assumption and introduce a more complex relationship between the marker and the survival process, RSF landmarking performed better in terms of both AUC and BS at every time point except the earliest landmark where it performed similarly to the Cox landmarking approach and the misspecified joint model. The performance of RSF landmarking was similar across all landmark times. In contrast, both the misspecified joint model and Cox landmarking had worsening predictive performance at later landmark times, with the largest decrease over time seen in the misspecified joint model. The RMSPE of the models can be seen in Additional File 1 Table S6, and follows a similar pattern to BS.
In Scenario III (Additional File Tables S7-9), the naïve joint model that uses only longitudinal information from one marker performed similarly to RSF at baseline, but at later landmarks had worse performance. The multivariate joint model that uses both longitudinal markers (Fig. 1, right panel) performed slightly better than RSF at early landmark times but followed the same pattern of reduction over time as the naïve joint model. The Cox landmarking approach had performance similar to the joint models at later landmark times as the longitudinal quadratic relationship between y2 and the hazard changed. The RSF method performed consistently across time points and was closest to the true probability at every landmark time, with AUC, BS, and RMSPE showing similar patterns.
In all scenarios, all methods (Cox noise, JM noise, RSF noise) had decreased performance when variables were included that were not associated with the survival process (Fig. 1). In Scenario I (simple relationship), the performance of the RSF decreased more with the inclusion of the noise variables compared to the other two methods. In Scenario II (complex relationship), the reduction in the RSF landmark and joint modeling approach was less severe than the Cox landmark approach. All predictions were similar with and without noise in Scenario III. In Figures S1 in Additional file 1, we demonstrate that for Scenario II the VIMP of the noise variables under the RSF landmarking approach is close to 0 across the landmark times.
Estimation and prediction using the Cox landmarking model, the joint model, and RSF landmarking took on average about 25 s (s), 80s, and 19s, respectively, for the 5 landmark times for all scenarios. The multivariate joint model took much longer at about 15 min on average. When noise covariates were included, estimation and prediction of the RSF landmarking increased to around 30s. RSF landmarking with tuning increased time by about 10s for the correct set of covariates and about 30s with the noise variables included. No substantial improvements in predictive performance were seen from tuning in Scenario I and across all scenarios when no noise is present. But, with the more complex relationships in Scenarios II and III, slight improvements in AUC, BS and RMSPE can be seen in the tuned compared to author-selected parameter models when noise variables are included (Additional File 1, Tables S1-S9).
Overall, from the results of our simulation study we found that (1) RSF landmarking did not have good predictive performance when there was a simple relationship between the survival and longitudinal marker processes and few baseline covariates; (2) When the relationship between the marker and survival process was complex, and the proportional hazards assumption violated, the RSF landmark approach performed better than Cox landmarking and a misspecified, simple joint model; (3) The RSF method can accommodate more than one longitudinal covariate with good predictive performance relative to misspecified joint modeling and Cox landmarking; (4) When extra noise variables are included and there are complex relationships present, including these covariates in the RSF landmarking reduces predictive performance only slightly; (5) Parameter tuning when predicting with a small number of covariates did not substantially improve performance and may not be necessary.