 Research article
 Open Access
 Published:
Dynamic prediction of repeated events data based on landmarking model: application to colorectal liver metastases data
BMC Medical Research Methodology volume 19, Article number: 31 (2019)
Abstract
Background
In some clinical situations, patients experience repeated events of the same type. Among these, cancer recurrences can result in terminal events such as death. Therefore, here we dynamically predicted the risks of repeated and terminal events given longitudinal histories observed before prediction time using dynamic pseudoobservations (DPOs) in a landmarking model.
Methods
The proposed DPOs were calculated using Aalen–Johansen estimator for the event processes described in the multistate model. Furthermore, in the absence of a terminal event, a more convenient approach without matrix operation was described using the ordering of repeated events. Finally, generalized estimating equations were used to calculate probabilities of repeated and terminal events, which were treated as multinomial outcomes.
Results
Simulation studies were conducted to assess bias and investigate the efficiency of the proposed DPOs in a finite sample. Little bias was detected in DPOs even under relatively heavy censoring, and the method was applied to data from patients with colorectal liver metastases.
Conclusions
The proposed method enabled intuitive interpretations of terminal event settings.
Background
Events of interest are repeatedly observed during followup of some diseases. For example, in colorectal liver metastases following curative surgical resection, the incidence of recurrence is as high as 75% [1]. Surgical reresection for recurrence is performed when possible, and repeat recurrences are often resected until lesions are unresectable or fatal. The most recent clinical observations of tumors are highly predictive of subsequent recurrence, particularly in patients with multiple tumors or previously resected tumors that may result in recurrence. Risk of recurrence or death can be predicted easily by a conventional approach with time to recurrencefree survival. Prediction of the risk of recurrence and the risk of death separately has more clinical importance because a recurrent case may undergo reresection, unlike a fatal case, in other words, a recurrent case is still at risk. Furthermore, the prediction of the number of recurrences will be helpful for recognition of its severity. Therefore, personalized predictions of each risk of recurrence and death can be used to communicate his/her prognosis to the patient and decide optimal examination intervals for detecting recurrences.
Dynamic prediction has an intuitive expression associated with the estimated probability of event occurrence within (t, t + w), assuming that the patient is at risk at prediction time t. For univariate survival outcomes, this probability can be calculated using the Breslow estimator based on Cox proportional hazards models [2]. However, for repeated events data, dynamic prediction models are required to estimate the probability of numbers of events within (t, t + w), assuming being risk set at time t. Marginal Cox models [3] can be used to estimate ktimes event probability as the difference between the marginal probabilities of k^{th} and (k − 1)^{th} events.
During following up subjects, we often observe a terminal event, such as death, which preclude subsequent repeated events and induce informative censoring because of correlations between repeated events and a terminal event [4]. Thus, using joint frailty models [5,6,7], terminal events were regarded as informative censoring, and conditional distributions of repeated events on frailty could be obtained. This modeling of repeated events adjusted for the correlation between repeated events and a terminal event may be difficult to interpret clinically. Instead, a comprehensive approach should consider probabilities or hazards of both repeated and terminal events that are subject to estimation. In this context, numbers of repeated and terminal events are regarded as semicompeting risks [8, 9] that encompass an illness–death process [10]. Our proposed method is based on the illness–death process and dynamically predicts the probability of terminal events.
Researcher might include longitudinal data such as biomarkers, health status, and disease histories as timedependent covariates in a prediction model. These clinical measures are usually internal covariates and required extra modeling to predict survival functions accurately [11, 12], and joint modeling and landmarking are major approaches and explored the predictive performance [13,14,15]. The former couples longitudinal trajectory and survival models [16,17,18,19,20], and for continuous variables observed at some interval; the model explicitly specifies its trajectory for accurate predictions. The latter landmarking approach is robust to model misspecification for longitudinal processes [21,22,23,24] and uses only longitudinal data Z(s) accumulated until a certain fixed (landmarking) time s. This procedure treats longitudinal data as fixed external covariates and leads to adequate modeling for timedependent internal covariates. Recently parsimonious modeling approach in longitudinal data was proposed [25]. Landmarking models of residual lifetimes t – s have been developed [21, 26], and competing risks data have been considered in landmarking models after extension based on FineGray [27] models [28] and on dynamic pseudoobservations (DPOs) [29].
Mauguen et al. [18] and Musoro et al. [24] examined dynamic prediction on repeated events data. The former was interested in dynamic prediction of death using cancer relapse, which is repeated events data. Because of the dependency of death and relapse, time to death and time to relapse were linked by joint frailty term. The latter was interested in the dynamic prediction of infection risk using longitudinal marker and history of infection. Because patients repeat infection events, Cox frailty model was employed and landmarking technique was used for the longitudinal marker. The above methods cannot be used as they are in situations where the risk of each number of repeated event and the risk of a terminal event are predicted in parallel. Therefore, we propose a prediction method using DPOs for repeated events data in the presence and absence of terminal events.
DPOs are proposed in the framework of pseudoobservations, which are each subject’s contribution to occurring the event replace each observed or censored event indicator at some time point. Although the realized value of event indicator is unknown on the censored subject, the contribution can be calculated by jackknife estimates. So covariate information on a censored subject can be incorporated into a generalized linear model directly. Unlike multiple imputations, it is unnecessary of pseudoobservations to repeat the creation of datasets nor combine results obtained from datasets. DPOs are extended such pseudoobservations in dynamic prediction for competing risks [29], and the idea of DPOs was applied to illnessdeath process [30]. So our proposal is regarded as an extension of DPOs for semicompeting risks settings, and that provides a dynamic prediction method in the presence of a terminal event. It is unnecessary to specify the correlation between repeated events and a terminal event, or model hazard function addressed by existing methods mainly. Although the asymptotic behavior of this approach has been demonstrated [29, 31,32,33], few studies have assessed the performance of pseudoobservations in finite samples. Thus, we conducted simulations to evaluate bias and efficiency of DPOs. Finally, we applied the proposed method to data of Japanese patients with colorectal liver metastases.
Method
Dynamic prediction and landmarking
For subject i (i = 1, …, n), let T_{ij}, T_{i}^{D}, and C_{i} be times for j^{th} (j = 1, 2, …, J_{i}) repeated events, a terminal event and censoring time, respectively, and let Z_{i}(t) = [Z_{i1}(t), …,Z_{ip}(t)]^{T} be the timedependent covariate vector at time t. Therefore, potential data {{T_{ij}}, T_{i}^{D}, C_{i}, Z_{i}(t)} in subject i are assumed to be independent from {{T_{i’j}}, T_{i’}^{D}, C_{i’}, Z_{i’}(t)} in another subject (i’) (assumption 1). If a terminal event is not considered, T_{i}^{D} is set to ∞. Because dynamic prediction estimates a conditional probability of an event based on at risk at a certain time point s, we refined notations of random variables of repeated events time as follows: Let \( {T}_{mik}^{\ast } \) be k^{th} (k = 1, 2, …) repeated event time counted from m^{th} (m = 0,…, M) conditional(landmark) time s_{m}. For example, if the subject i has experienced two events until s_{m}, time \( {T}_{mi1}^{\ast },{T}_{mi2}^{\ast },{T}_{mi3}^{\ast } \) is assigned to T_{i3}, T_{i4}, T_{i5}, respectively. For parsimonious definition, the number of events occurred before s_{m} could be treated as covariates Z(s) or stratified factor to taking into consideration for dynamic prediction. We would illustrate the former approach with a real example in section 2.5. We abbreviate s_{m} and \( {T}_{mik}^{\ast } \) to s and \( {T}_k^{\ast } \), respectively.
Hence, the probability of just ktime events occurring within (s, s + w) and a terminal event occurring after s + w given the subject being at risk at time s is expressed as follows:
and the probability of a terminal event is expressed as follows:
where w (w > 0) is the prediction window. To make these probabilities {F^{D}, F_{0}, F_{1}, …} mutually exclusive and equal to a total of 1, we define dynamic prediction to estimate conditional probabilities {F^{D}, F_{0}, F_{1}, …} and ‘at risk’ as the status of subjects who have experienced neither censoring nor terminal events.
Proposed DPOs
The predicted probabilities presented in eq.(1) and eq.(2) are regarded as expectations of event indicators. For example, \( E\left\{I\left({T}_k^{\ast}\le s+w,{T}_{k+1}^{\ast }>s+w\left{T}^D>s\right.\right)\right\} \) represents the probability of ktimes repeated events. If there are no censored subjects, this expectation can be calculated as the average of indicators for each subject. Conversely, these indicators must be unknown for the censored subject. Andersen et al. proposed that indicators among all subjects could be replaced with pseudoobservations [34], and Nicolaie et al. extended them to dynamic predictions named dynamic pseudoobservations (DPOs) [29]. Pseudoobservations are calculated by the difference between the estimates of predicted probability multiplied by sample size n and the leaveoneout estimates multiplied by n1, and looked upon as predicted indicators if censoring does not occur in any subjects. In addition to this, predicted probabilities can be calculated from the generalized linear model with regression pseudoobservations on some covariates, directly.
For subject i at risk at landmark time s, DPOs of ktimes repeated events and the terminal event were defined as follows, respectively:
where n_{s} is the number of risk sets at the landmark time s, and \( {\widehat{F}}_k \) and \( {\widehat{F}}^D \) are consistency estimates of F_{k} and F^{D}, respectively, as described in 2.2.1 and 2.2.2. The superscript (−i) represents corresponding leaveoneout estimates from datasets that are omitted only for subject i.
We have added the following assumptions that underlie the landmark dataset.

Censoring time C_{i} is independent on potential event times {T^{*}_{ij}}, T_{i}^{D,} and covariates Z_{i}(s); noninformative censoring (assumption 2)

G(·) denotes the survival function of censoring. Hence, for any τ; τ > s + w, G(τ) > 0. (assumption 3)
From previous researches [29, 31], assumptions 1–3 were required for the DPOs listed in sections DPOs in the absence of a terminal event and DPOs in the presence of a terminal event to maintain consistency of estimated regression coefficients and corresponding variances, as presented in section Construction of the dynamic prediction regression model.
DPOs in the absence of a terminal event
In the presence of repeated events only, a terminal event may be treated as usual censoring, and we target the probability in eq.(1). Assuming the multistate model in repeated events process (Fig. 1a), prediction probabilities can be obtained using the product–integral as the state occupation probability (detailed in Additional file 1: Appendix). Under Markov processes between states, this transition probability matrix is termed Aalen–Johansen (AJ) estimator [35] and has consistency in O(n^{1/2}) with assumptions 1,2 and some regulatory conditions [36]. Furthermore, Datta and Satten [37] showed that empirical state occupation probabilities for the whole dataset could be consistently estimated using the product integral, in the same manner as that using the AJ estimator. Hence, the estimated transition matrix using product integral is referred to as the AJ estimator regardless of Markovian principles.
Because all subjects belong to state 0 (initial state) at the landmark point in the multistate model for repeated events (Fig. 1a), the probability of the k^{th} repeated event is denoted by P_{0k}(t), which is the transition probability from state 0 to state k pictured in Fig. 1a. Accordingly, DPOs for the ktime event in eq.(3) are calculated as follows:
This method is referred to as the DPOsbased on the AJ estimator.
The method described in eq.(5) requires difficult matrix computation because of the product integral. Thus, to improve convenience, we propose another DPO using the sequence of repeated events and rewrite the ktimes event probability function F_{k}(s + ws) as follows:
where \( {S}_k^{\ast}\left(\cdotp \right) \) is survival function for \( {T}_k^{\ast } \). The probability of the ktimes event is expressed as the difference between the survival probability of the (k + 1)^{th} event and the k^{th} event. Subsequently, \( {S}_k^{\ast}\left(\cdotp \right) \) can be directly and consistently estimated using Kaplan–Meier (KM) estimators [38] or using the exponential transformed Nelson–Aalen estimator in O(n^{1/2}). However, we only consider using the KM estimator to estimate \( {S}_k^{\ast}\left(\cdotp \right) \), and DPOs based on KM estimator was calculated as follows:
DPOs in the presence of a terminal event
The illnessdeath model is applicable to situations with only one terminal event and one nonterminal event. As an extension of the illnessdeath model, the multistate model shown in Fig. 1(b) expresses repeated event processes in the presence of a terminal event. Accordingly, DPOs for repeated events are as in eq.(5), and DPOs for a terminal event in eq.(4) are described as follows:
Construction of the dynamic prediction regression model
To demonstrate dynamic prediction at one fixed landmark time s, the relationships between event probabilities and landmarking covariates Z_{i}(s) are constructed in the generalized linear model framework. The fixed landmark regression model is described as follows:
where g denotes a link function. Accordingly, we regard \( {\boldsymbol{\uptheta}}_i(s)={\left[{\theta}_{i1}(s),{\theta}_{i2}(s),\cdots, {\theta}_{iJ_s}(s)\right]}^{\top } \) as multinomial probability and exclusion of θ_{i0}(s) from θ_{i}(s) evaded parameter redundancy. Note that \( {\mathbf{Z}}_i^{\ast }(s) \) is among baseline and longitudinal covariates up to the landmark time s and the intercept. For example, when we assume a multinomial model [39, 40] with a generalized logit link to DPOs, the link function is logit and covariates \( {\mathbf{Z}}_i^{\ast }(s) \) are formed as follows:
The regression coefficient vector β is calculated by solving the following estimating equation:
where V_{i}(s) follows a multinomial distribution. Accordingly, asymptotic variance of β(s) is calculated from model variance as \( \mathrm{Var}\left\{\widehat{\boldsymbol{\upbeta}}(s)\right\}={\left\{\sum \limits_i{\widehat{\mathbf{D}}}_i{(s)}^{\top }{\widehat{\mathbf{V}}}_i^{1}(s){\widehat{\mathbf{D}}}_i(s)\right\}}^{1} \). Estimation in event probability is transformed from \( \widehat{\boldsymbol{\upbeta}}(s) \) using the inverse link function, and its variance is calculated from \( \mathrm{Var}\left\{\widehat{\boldsymbol{\upbeta}}(s)\right\} \) using the delta method.
Because predicted probabilities at each time point were estimated from different fixed landmark regression models, predicted probabilities tend to vary with proceeding landmark times. Especially when the number at risk is small, predicted probabilities take a large value change. Hence, to enhance an interpretation, the smoothers f(s) against landmark times are included in regression coefficients to continuously predict event probabilities. In addition, the precision of predicted probabilities is likely improved because the information from several landmark times can be used. This supermodel is described that regression coefficients are divided into polynomial smoother parts and invariant parts and are multiplied by β(s) = f(s) · β. In practice, the following covariates are stacked for several landmark times, and their interactions are described in the following polynomial function f_{0}(s), f_{1}(s), ⋯, f_{h}(s),
Similarly, DPOs are stacked as \( {\widehat{\boldsymbol{\uptheta}}}_i={\left[{\widehat{\boldsymbol{\uptheta}}}_i{\left({s}_0\right)}^{\top },\cdots, {\widehat{\boldsymbol{\uptheta}}}_i{\left({s}_0\right)}^{\top}\right]}^{\top } \). However, because multiple DPOs from the same subject are present, a generalized estimating equations approach [41] was adapted and solved as follows:
The robust (sandwich) estimator is used for the covariance matrix of \( \tilde{\boldsymbol{\upbeta}} \) and only specified as the independent type because of consistencies of estimating equations [42].
Simulation studies
Because the above asymptotic properties of DPOs were already shown, to assess the performance of proposed DPOs in a finite and realistic sample (n = 100), we conducted Monte Carlo simulation studies. Conditions WITHOUT/WITH terminal events are described in Predictions of repeated events without a terminal event and Prediction of repeated events WITH a terminal event, respectively.
Predictions of repeated events without a terminal event
Repeated events for subject i were limited to two, and corresponding times were represented as t_{i1} and t_{i2}. Subsequent exponential distribution with hazards λ_{i1} and λ_{i2} resulted in the first event time t_{i1} and the lag time between the first and second time t_{i2} − t_{i1} as follows:
where u_{i} denotes the frailty parameter, and λ_{01} and λ_{02} are baseline hazards. To assume heterogeneity of event occurrence between subjects, u_{i} must follow a gamma distribution with shape and rate parameters of 0.5. Moreover, the correlation between t_{i1} and t_{i2} − t_{i1} is theoretically 0.5 according to Kendall’s tau, and when u_{i} is constant at 1, no heterogeneity is present. Baseline hazards λ_{01} and λ_{02} are set to reflect Markov or semiMarkov process. In particular, λ_{01} = λ_{02} = 1 under the assumption of Markovian event processes. When λ_{01} = 1 and λ_{02} = 2, a semiMarkov process is assumed, such that the first event accelerates the hazard of the second event. Finally, probabilities for each number of events at time 1 were calculated from generated data (n = 100), and empirical probabilities were considered to be true values for each replication.
Right censoring was generated using an exponential distribution that is independent of repeated event processes, with hazards λ_{c} = 0.5 or λ_{c} = 2. Proposed DPOs based on the AJ (eq.(5)) and KM (eq.(6)) estimators were applied to these rightcensored data, and expectations of these DPOs were calculated as dynamic predicted values. In all scenarios, bias, and efficiency of prediction at t = 0 with window w = 1 were evaluated from 1000 repetitions of true and dynamic predicted values using the absolute bias, and the root mean squared error (RMSE), respectively.
Prediction of repeated events WITH a terminal event
To examine the performance of DPOs in eq.(5) and eq.(7) in the presence of terminal events, we performed simulation studies as described in section Predictions of repeated events without a terminal event. In addition to the two repeated event times, the potential time for the occurrence of a terminal event t_{iD} was generated using the following exponential distribution with hazard λ_{iD}:
where λ_{0D} is a baseline hazard and is fixed at 0.3. The frailty parameter u_{i} was subject to a gamma distribution; therefore, correlations of any pair among t_{i1}, t_{i2} − t_{i1} and t_{D} were of the same strength. Other procedures were similar to those described in section Predictions of repeated events without a terminal event.
Application of the DPOs to colorectal liver metastases data
We applied the proposed DPOs method to colorectal liver metastases data of 263 patients from Japan [1]. The database had been prospectively collected from 263 patients who underwent upfront hepatic resections from January 1996 to December 2010 at the HepatoBiliaryPancreatic Surgery Division, Department of Surgery, Graduate School of Medicine, The University of Tokyo. No included patient had received postoperative adjuvant chemotherapy or was enrolled in clinical trials. A total of 202 patients (76.8%) suffered first recurrences and 108 (53.3%) of these were reresected. Patients had up to four recurrences, and dynamic predictions of 3year event risks were calculated using information from the most recent tumor to identify patients at high risk of recurrence.
In these analyses, 3year risks of recurrence were classified as no recurrence, single recurrence or multiple recurrences, and were dynamically predicted according to numbers of recurrences (≥1 vs. 0), numbers of tumors (single or multiple) and tumor lengths (> 2 cm or ≤ 2 cm) before landmark time. Initially, DPOs from eq.(5) and eq.(6) were applied, and death was thought as censoring. In addition to the fixed landmark regression model, a supermodel was constructed using cubic smoothers against landmark time as follows: f(s) = [f_{0}(s), … , f_{3}(s)] = [1, s, s^{2}, s^{3}].
To take a terminal event into consideration, 3year risks of recurrence were classified as no recurrence, single recurrence, multiple recurrence or death. DPOs in eq.(5) and eq.(7) were applied and a fixed landmark model and a supermodel with cubic smoothers against landmark time were constructed.
Results
Simulation
Performance of DPOs in the absence of a terminal event
To examine scenario characteristics, the mean true probability of event numbers, censored proportions and Kendall’s tau between t_{i1} and t_{i2} − t_{i1} were calculated (Table 1). In all scenarios, true probabilities of event numbers were > 10%, and event times were censored heavily at λ_{c} = 2 and could be observed the event time at < 25% subject.
Absolute bias and RMSE values are presented in Table 2. In all scenarios, both methods resulted in approximately zero bias indicating accurate estimates of event processes based on the use of Markovian or semiMarkovian principles and heterogeneity of event occurrences between subjects. RMSE in the method based on the AJ estimator was less than or equally efficacious compared with the method based on the KM estimator, and RMSE ratios were 0.9–1.0.
Performance of DPOs in the presence of a terminal event
Scenario characteristics are shown in Table 3. Results of absolute bias and RMSE analyses are shown in Table 4. Little bias was observed, and RMSE values were consistent with those described in section 3.1.1.
Application of the DPOs to a real example
First, we applied twotypes DPOs described in eq.(5) and eq.(6). Parameter estimates from the supermodel are shown in Table 5, and few differences in parameter estimates from the two types of DPOs were observed. Predicted probabilities are shown in Additional file 2.
Further, we applied DPOs that included terminal events. Predicted probabilities among patients with data for single and small (≤2 cm) tumors are shown in Fig. 2, and other predicted probabilities and parameter estimates of the supermodel are shown in Additional file 2. Patients who had experienced recurrences before landmark time had higher risks of recurrence and death than those who have not experienced recurrences, whereas among patients with single and small tumors at the first hepatic resection and no recurrence for approximately 3 years, subsequent recurrences were very rare. In contrast, patients with recurrences before landmark time had a moderate risk of rerecurrence. Also, multiple tumors resulted in worse prognoses compared with single tumors.
Discussion
Here we applied dynamic prediction methods for repeated events using pseudoobservations and examined that performance using simulation studies. Prediction biases of the ensuing DPOs were calculated in a finite sample and indicated good performance regardless of processes for repeated events, which were assumed to be Markovian or semiMarkovian in the presence or absence of frailty. These assumptions were not testable using the observed data; therefore, independence of the present DPOs enables application of dynamic prediction based on landmarking to most types of timetoevent data observed in medical studies. Through real example, we drew the predicted probabilities of various type of events, such as repeated and terminal events. These comprehensive graphs must improve a subjective understanding of the disease.
In practice, it is easier to calculate proposed DPOs based on the KM estimator than on the AJ estimator using standard analysis packages because the AJ estimator requires matrix multiplication using proc. IML in SAS. A structure of landmark dataset and program codes of AJ estimator are shown in our website(https://github.com/yokotai/ and http://yokotai.wordpress.com/). Although repeated events can be modeled using marginal Cox model, DPOs are preferable to dynamic predictions for the following reasons: First, DPOs are free of the proportional hazards assumptions that are imposed on marginal Cox models. Second, generalized linear models with the supermodel can be used to smoothly predict probabilities against landmark time, whereas the supermodel of the marginal Cox modeling approach returns wiggly function of predicted probabilities against landmark time because of the step functions for estimates of baseline hazards. Since it is hard to think that the predicted probabilities repeat increasing and decreasing over time within a short timespan, it would be better to get the smooth curves for interpretation. Finally, the present DPOs have sufficient flexibility to accommodate the use of several link functions. Although a prediction model based hazard function is an analog of generalized linear model linked with complementary loglog function, our methods do not restrict any link function such as log, logit or probit.
Fitting DPOs to generalized linear models as multinomial outcomes may present practical issues because of the absence of standard analytic tools. Therefore, we recommend multivariate binary models after transformation using multinomial models to provide correct point estimates [39, 40]. Although variance estimates based on multivariate binary models have suspicious zerocovariance estimates between multinomial outcomes, these estimates can be used in practice because standard errors from binary models may slightly differ from those from multinomial models. In addition, because other covariates, model forms and selections of link function affect the lengths of confidence intervals, precision may be of less importance than accuracy in the prediction context.
There were two reasons why the simulation did not deal with the evaluation of predictive performance if we use longitudinal covariates available. First, model performance depends on specification of model form. On landmark supermodel, predicted probabilities at a certain time point was affected from another time point through smoothers f(s). This fact may cause more efficiency and more bias. The number of landmarking time and the interval of landmarking on a supermodel are worth investigating, but we have to make sure in a broad situation, and we would like to make it a future work. Second, there are few methods of predictive performance which can use for repeated events with terminal events. Model selection and model validation require the performance measures. We believe that loss function approaches with squared error, such as Brier scores in survival analyses [43, 44], should be applicable.
Conclusion
In this article, we proposed a dynamic prediction method for repeated events data regardless of whether or not to consider a termination event. The method can predict the event probabilities consistently regardless of processes for repeated events, which were assumed to be Markovian or semiMarkovian in the presence or absence of frailty. Through a simulation study, the method works well in a relatively small finite sample. We contributed a new modeling method of repeated events data with a terminal event which provided predicted probabilities of his/her prognosis and had an intuitive interpretation.
Abbreviations
 AJ:

AalenJohansen
 DPO:

Dynamic pseudoobservation
 KM:

KaplanMeier
 RMSE:

The root mean squared error
References
 1.
Oba M, Hasegawa K, Matsuyama Y, Shindoh J, Mise Y, Aoki T, et al. Discrepancy between recurrencefree survival and overall survival in patients with resectable colorectal liver metastases: a potential surrogate endpoint for time to surgical failure. Ann Surg Oncol. 2014;21:1817–24.
 2.
Cox DR. Regression models and lifetables (with discussion). J R Stat Soc B. 1972;34:187–220.
 3.
Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Stat Assoc. 1989;84:1065–73.
 4.
Cook RJ, Lawless JF. Marginal analysis of recurrent events and a terminating event. Stat Med. 1997;16:911–24.
 5.
Wang MC, Qin J, Chiang CT. Analyzing recurrent event data with informative censoring. J Am Stat Assoc. 2001;96:1057–65.
 6.
Liu L, Wolfe RA, Huang XL. Shared frailty models for recurrent events and a terminal event. Biometrics. 2004;60:747–56.
 7.
Ye Y, Kalbfleisch JD, Schaubel DE. Semiparametric analysis of correlated recurrent and terminal Eeents. Biometrics. 2007;63:78–87.
 8.
Fine JP, Jiang H, Chappell R. On semicompeting risks data. Biometrika. 2001;88:907–19.
 9.
Zhou R, Zhu H, Bondy M, Ning J. Semiparametric model for semicompeting risks data with application to breast cancer study. Lifetime Data Anal. 2016;22:456–71.
 10.
Xu JF, Kalbfleisch JD, Tai BC. Statistical analysis of illnessdeath processes and semicompeting risks data. Biometrics. 2010;66:716–25.
 11.
Kalbfleish JD, Prentice RL. The statistical analysis of failure time data, second edition. Hoboken: Wiley; 2002.
 12.
Cortese G, Andersen PK. Competing risks and timedependent covariates. Biom J. 2009;51:138–58.
 13.
Cortese G, Gerds TA, Andersen PK. Comparing predictions among competing risks models with timedependent covariates. Stat Med. 2013;32:3089–101.
 14.
Maziarz M, Heagerty P, Cai T, Zheng Y. On longitudinal prediction with timetoevent outcome: comparison of modeling options. Biometrics. 2017;73:83–93.
 15.
Suresh K, Taylor JMG, Spratt DE, Daignault S, Tsodikov A. Comparison of joint modeling and landmarking for dynamic prediction under an illnessdeath model. Biom J. 2017;59:1277–300.
 16.
Rizopoulos D. Dynamic predictions and prospective accuracy in joint models for longitudinal and timetoevent data. Biometrics. 2011;67:819–29.
 17.
Rizopoulos D. Joint models for longitudinal and timetoevent data, with applications in R. Boca Raton: CRC; 2012.
 18.
Mauguen A, Rachet B, MathoulinPélissier S, MacGrogan G, Laurent A, Rondeau V. Dynamic prediction of risk of death using history of cancer recurrences in joint frailty models. Stat Med. 2013;32:5366–80.
 19.
Taylor JMG, Park Y, Ankerst DP, Bae K, Pickles T, Sandler H. Realtime individual predictions of prostate cancer recurrence using joint models. Biometrics. 2013;69:206–13.
 20.
Hickey GL, Philipson P, Jorgensen A, KolamunnageDona R. Joint modelling of timetoevent and multivariate longitudinal outcomes: recent developments and issues. BMC Med Res Methodol. 2016;16:117.
 21.
Zheng Y, Heagerty PJ. Partly conditional survival models for longitudinal data. Biometrics. 2005;61:371–91.
 22.
van Houwelingen HC. Dynamic prediction by landmarking in event history analysis. Scand Stat Theory Appl. 2007;34:70–85.
 23.
van Houwelingen HC, Putter H. Dynamic prediction in clinical survival analysis. Boca Raton: CRC; 2011.
 24.
Musoro JZ, Struijk GH, Geskus RB, Ten Berge I, Zwinderman AH. Dynamic prediction of recurrent events data by landmarking with application to a followup study of patients after kidney transplant. Stat Methods Med Res. 2018;27:832–45.
 25.
Huang X, Yan F, Ning J, Feng Z, Choi S, Cortes J. A twostage approach for dynamic prediction of timetoevent distributions. Stat Med. 2016;35:2167–82.
 26.
Parast L, Cai T. Landmark risk prediction of residual life for breast cancer survival. Stat Med. 2013;32:3459–71.
 27.
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94:496–509.
 28.
Nicolaie MA, van Houwelingen JC, de Witte TM, Putter H. Dynamic prediction by landmarking in competing risks. Stat Med. 2013;32:2031–47.
 29.
Nicolaie MA, van Houwelingen JC, de Witte TM, Putter H. Dynamic pseudoobservations: a robust approach to dynamic prediction in competing risks. Biometrics. 2013;69:1043–52.
 30.
Pötschger U, Heinzl H, Valsecchi MG, Mittlböck M. Assessing the effect of a partly unobserved, exogenous, binary timedependent covariate on survival probabilities using generalised pseudovalues. BMC Med Res Methodol. 2018;18:14.
 31.
Graw F, Gerds TA, Schumacher M. On pseudovalues for regression analysis in competing risks models. Lifetime Data Anal. 2009;15:241–55.
 32.
Jacobsen M, Martinussen T. A note on the large sample properties of estimators based on generalized linear models for correlated pseudoobservations. Scand Stat Theory Appl. 2016;43:845–62.
 33.
Overgaard M, Parner ET, Pedersen J. Asymptotic theory of generalized estimating equations based on jackknife pseudoobservations. Ann Stat. 2017;45:1988–2015.
 34.
Andersen PK, Klein JP, Rosthoj S. Generalised linear models for correlated pseudoobservations, with applications to multistate models. Biometrika. 2003;90:15–27.
 35.
Aalen OO, Johansen S. An empirical transition matrix for nonhomogeneous Markovchains based on censored observations. Scand Stat Theory Appl. 1978;5:141–50.
 36.
Andersen PK, Borgan O, Gill RD, Keiding N. Statistical models based on counting processes. NewYork: SpringerVerlag; 1993.
 37.
Datta S, Satten GA. Validity of the Aalen–Johansen estimators of stage occupation probabilities and Nelson–Aalen estimators of integrated transition hazards for nonMarkov models. Stat Probab Lett. 2001;55:403–11.
 38.
Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53:457–81.
 39.
Hartzel J, Agresti A, Caffo B. Multinomial logit random effects models. Stat Modelling. 2001;1:81–102.
 40.
Kuss O, McLerran D. A note on the estimation of the multinomial logistic model with correlated responses in SAS. Comput Methods Prog Biomed. 2007;87:262–9.
 41.
Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
 42.
Kurland BF, Heagerty PJ. Directly parameterized regression conditioning on being alive: analysis of longitudinal data truncated by deaths. Biostatistics. 2005;6:241–58.
 43.
Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Stat Med. 1999;18:2529–45.
 44.
Schoop R, Beyersmann J, Schumacher M, Binder H. Quantifying the predictive accuracy of timetoevent models in the presence of competing risks. Biom J. 2011;53:88–112.
Acknowledgments
We thank Dr. Masaru Oba and Prof. Norihiro Kokudo for collecting and providing their colorectal liver metastases data at the HepatoBiliaryPancreatic Surgery Division, Department of Surgery, Graduate School of Medicine, The University of Tokyo. We would also like to express my gratitude to Prof. Yoichi M. Ito in the institute of statistical mathematics for financial support.
Funding
No grant support or other funding was received.
Availability of data and materials
The program codes are shared in IY’s website (https://github.com/yokotai/ or http://yokotai.wordpress.com/).
Author information
Affiliations
Contributions
IY and YM designed the concept of this research. IY conducted the simulation, analysed a real example and drafted the manuscript. YM supervised this study and critically reviewed the manuscript. Both authors have read and approved the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable because of developing of statistical method. Real example retrospective data were originally published in Oba M, et al. [1].
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Appendix A short explanation of productintegral. (PDF 56 kb)
Additional file 2:
Supplemental tables and figures of proposed DPOs to a real example. (PDF 885 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Yokota, I., Matsuyama, Y. Dynamic prediction of repeated events data based on landmarking model: application to colorectal liver metastases data. BMC Med Res Methodol 19, 31 (2019). https://doi.org/10.1186/s1287401906770
Received:
Accepted:
Published:
Keywords
 Dynamic prediction
 Landmarking
 Pseudoobservations
 Repeated events
 Terminal event