### The model

The course of currently accessible CML treatment can be described by a model where a patient can be in one of four possible states at a given time point (see Figure 1). Initially, each patient appears in state 1, corresponding to the diagnosis and start of the imatinib therapy. There are two possible transitions from this state: a patient may die without achieving the CCyR (state 2) or achieve the CCyR (state 3). After achieving the CCyR, patients may suffer from disease progression manifested by loss of the CCyR (state 4) or they may die while in remission (state 2). Finally, patients in state 4 may move back to the CCyR (state 3) or they may die (state 2). Obviously, all living patients can move from the CCyR (state 3) to the cytogenetic relapse (state 4) and vice versa repeatedly.

To address the first objective, i.e. to estimate the probability that a randomly selected patient is in his first or any subsequent CCyR at time *t* after the initiation of imatinib therapy, means to quantify the probability of being in state 3 over the interval [0, *t*]. A standard competing risks methodology can be used for this purpose. More specifically, a set of cumulative incidence functions can be used to estimate the probabilities associated with the *i*th achievement of CCyR or the *i*th loss of CCyR, respectively.

Let *r* denote the maximum number of CCyRs achieved in time by any patient. Let *I*
_{1}(*t*) be the common cumulative incidence function corresponding to the first achievement of CCyR (representing, transition: state 1 → state 3) and *I*
_{
i
} (*t*), *i* = 2,..., *r*, be the cumulative incidence functions corresponding to the subsequent achievements of CCyR (representing transitions: state 4 → state 3). Similarly, let {I}_{i}^{*}\left(t\right),i=1,\dots ,r, be the cumulative incidence functions corresponding to the *i*th loss of CCyR (representing transitions: state 3 → state 4), and {I}_{i}^{**}\left(t\right),i=1,\dots ,r, be the cumulative incidence functions corresponding to death after the *i*th achievement of CCyR (representing transitions: state 3 → state 2) Then, the probability of state being in first or state any subsequent CCyR at time *t* after the initiation of imatinib therapy, denoted here as the current cumulative incidence of leukaemia-free patients (CCI), can be written using the common cumulative incidence functions:

\begin{array}{ll}\hfill CCI\left(t\right)& =\sum _{i=1}^{r}{I}_{i}\left(t\right)-\sum _{i=1}^{r}{I}_{i}^{*}\left(t\right)-\sum _{i=1}^{r}{I}_{i}^{**}\left(t\right)\phantom{\rule{2em}{0ex}}\\ =\sum _{i=1}^{r}\left[{I}_{i}\left(t\right)-{I}_{i}^{*}\left(t\right)-{I}_{i}^{**}\left(t\right)\right].\phantom{\rule{2em}{0ex}}\end{array}

(1)

Regarding the second objective of this paper, the estimation of the CLFS function, only patients that have ever reached state 3 have to be considered for the estimation process, where the starting time point is the achievement of the first CCyR. Accordingly, we consider only *r* -1 CCyRs that can be achieved in time, indexed with *i* = 2, ..., *r*. Then, let {S}_{1}^{*}\left(t\right) be the common leukaemia-free survival function, where the event of interest is death in the first CCyR (representing transition: state 3 → state 2) or the first loss of CCyR (representing a first transition from state 3 to state 4), and {S}_{i}^{*}\left(t\right),i=2,\dots ,r, be the survival functions corresponding to the subsequent losses of CCyR, where the event of interest is death prior to the *i*th loss of CCyR (any transition to state 2 prior to the *i*th transition: state 3 → state 4) or the *i*th loss of CCyR (*i*th transition: state 3 → state 4). Furthermore, let *S*
_{
i
} (*t*), *i* = 2,..., *r*, be the survival functions corresponding to the *i*th achievement of CCyR, i.e. patients who die before their *i*th CCyR (any transition to state 2 prior to the *i*th transition: state 4 → state 3) or who have achieved their *i*th CCyR (*i*th transition: state 4 → state 3) are treated as events. Then, the probability of being in first or any subsequent CCyR at time *t* after the achievement of the first CCyR, denoted as CLFS, can be written using the survival functions:

CLFS\left(t\right)={S}_{1}^{*}\left(t\right)+\sum _{i=2}^{r}\left[{S}_{i}^{*}\left(t\right)-{S}_{i}\left(t\right)\right].

(2)

### Nonparametric estimation

To estimate the individual *CCI*(*t*) components, *I*
_{
i
} (*t*) and {I}_{i}^{*}\left(t\right), respectively, we need to associate each patient with a pair of vectors denoted as (*T*
_{1},..., *T*
_{
r
} , *R*
_{1},..., *R*
_{
r
} )' and {\left({T}_{1}^{*},\dots ,{T}_{r}^{*},{R}_{1}^{*},\dots ,{R}_{r}^{*}\right)}^{\prime}, respectively, where the former vector represents the CCyR achievements and the latter represents the CCyR losses. Regarding the first of the vectors, *T*
_{
i
} is the time to the *i*th event, i.e. the *i*th CCyR or death, or *T*
_{
i
} is the censoring time, whereas *R*
_{
i
} is the failure cause. When the exact *i*th failure time is not observed, i.e. *T*
_{
i
} is censored, then *R*
_{
i
} = 0; when the exact failure time is known and the failure cause is the achievement of CCyR, then = *R*
_{
i
} = 1; and when the exact failure time is known and the failure cause is death, then *R*
_{
i
} = 2. The second vector is organized in the same manner with the exception that the event of interest is the loss of CCyR instead of the CCyR achievement.

Let *λ*
_{
i
} (*t*) and {\lambda}_{i}^{*}\left(t\right) be the cause-specific hazard functions representing the intensity of achieving the *i*th CCyR in time and the intensity of losing the *i*th CCyR in time, respectively. Similarly, let *S*
_{
i
} (*t*) and {S}_{i}^{*}\left(t\right) be the all-cause survival functions considering either the *i*th CCyR or death and the loss of the *i*th CCyR or death, respectively, as the competing causes of failure. Then, assuming independent and identically distributed observations and independent right censoring, the cumulative incidence functions *I*
_{
i
} (*t*) and {I}_{i}^{*}\left(t\right), respectively, can be expressed in the standard way as

\begin{array}{ll}\hfill {I}_{i}\left(t\right)& =P\left({T}_{i}\le t,{R}_{i}=1\right)\phantom{\rule{2em}{0ex}}\\ =\underset{0}{\overset{t}{\int}}{\lambda}_{i}\left(u\right){S}_{i}\left(u\right)du,\phantom{\rule{2em}{0ex}}\end{array}

(3)

\begin{array}{ll}\hfill {I}_{i}^{*}\left(t\right)& =P\left({T}_{i}^{*}\le t,{R}_{i}^{*}=1\right)\phantom{\rule{2em}{0ex}}\\ ={\int}_{0}^{t}{\lambda}_{i}^{*}\left(u\right){S}_{i}^{*}\left(u\right)du.\phantom{\rule{2em}{0ex}}\end{array}

(4)

The cumulative incidence function corresponding to death after the *i*th achievement of CCyR, {I}_{i}^{**}\left(t\right), can be expressed in a similar way treating death after the *i*th achievement of CCyR as an event of interest and both death before the *i*th achievement of CCyR and the *i*th loss of CCyR as competing risks.

Regarding the *i*th CCyR achievement and loss, respectively, let *t*
_{
i
}
_{1},..., *t*
_{
ij
} ,..., *t*
_{
in
} and {t}_{i1}^{*},\dots ,{t}_{ij}^{*},\dots ,{t}_{in}^{*} be the observed individual times from the imatinib therapy initiation to the *i*th CCyR achievement and loss, respectively, ranked in ascending order. The *λ*
_{
i
} (*t*) and {\lambda}_{i}^{*}\left(t\right) functions can be then estimated using the Nelson-Aalen estimator [15] which in case of *λ*
_{
i
} (*t*) and a particular time point *t*
_{
ij
} is of the form {\widehat{\lambda}}_{i}\left({t}_{ij}\right)={c}_{ij}\u2215{n}_{ij}, where *n*
_{
ij
} is the number of patients "at risk" of *i*th CCyR or death at time *t*
_{
ij
} , i.e. the number of patients with *T*
_{
i
} ≥ *t*
_{
ij
} , and *c*
_{
ij
} is the number of patients achieving the *i*th CCyR at time *t*
_{
ij
} , i.e. the number of patients with *T*
_{
i
} = *t*
_{
ij
} , and *R*
_{
i
} = 1. The overall survival functions, *S*
_{
i
} (*t*) and {S}_{i}^{*}\left(t\right), can be estimated using the standard Kaplan-Meier estimator [8] which for *S*
_{
i
} (*t*) at *t*
_{
ij
} has the form {\u015c}_{i}\left({t}_{ij}\right)={\prod}_{k:{t}_{ik}\le {t}_{ij}}\left(1-{d}_{ik}\u2215{n}_{ik}\right), where *d*
_{
ik
} is the number of patients achieving the *i*th CCyR or dying at time *t*
_{
ik
} , i.e. the number of patients with *T*
_{
i
} = *t*
_{
ik
} and *R*
_{
i
} = 1 or *R*
_{
i
} = 2. Incorporating these nonparametric estimates to eq. (3) and (4), *I*
_{
i
} (*t*) and {I}_{i}^{*}\left(t\right) functions, respectively, can be estimated with

\begin{array}{c}{\widehat{I}}_{i}(t)={\displaystyle \sum _{j:{t}_{ij}\le t}{\widehat{\lambda}}_{i}({t}_{ij}){\widehat{S}}_{i}({t}_{i(j-1)})}\\ ={\displaystyle \sum _{j:{t}_{ij}\le t}\left[\frac{{c}_{ij}}{{n}_{ij}}{\displaystyle \prod _{k:{t}_{ik}<{t}_{ij}}\left(1-\frac{{d}_{ik}}{{n}_{ik}}\right)}\right]},\end{array}

(5)

and

{\widehat{I}}_{i}^{*}(t)={\displaystyle \sum _{j:{t}_{ij}\le t}{\widehat{\lambda}}_{i}^{*}({t}_{ij}){\widehat{S}}_{i}^{*}({t}_{i(j-1)})}.

(6)

Let {\widehat{\lambda}}_{i}^{**}\left({t}_{ij}\right) be the Nelson-Aalen estimator of the cause-specific hazard function representing the intensity of dying after the *i*th CCyR in time. Then the {I}_{i}^{**}\left(t\right) function can be estimated with

{\widehat{I}}_{i}^{**}(t)={\displaystyle \sum _{j:{t}_{ij}\le t}{\widehat{\lambda}}_{i}^{**}({t}_{ij}){\widehat{S}}_{i}^{*}({t}_{i(j-1)})},

(7)

where {\u015c}_{i}^{*}\left({t}_{i\left(j-1\right)}\right) is the Kaplan-Meier estimate of the all-cause survival function considering either the loss of the *i*th CCyR or death, respectively, as the competing causes of failure. Obviously, the {\u015c}_{i}^{*}\left({t}_{i\left(j-1\right)}\right) estimate in eq. (7) is the same one as in eq. (6).

The model depicted in Figure 1 represents rather a clinical background of the process than the computational aspects. To be able to compare the proposed estimator with that of Klein et al. [11], the model has to be expressed as a progressive multi-state model with each achievement of the CCyR, each loss of the CCyR, and death following either of these possibilities being represented with one state. For example, regarding only two possible disease remissions and two subsequent relapses, the progressive multi-state model has nine states. The first state (state 0) corresponds to CML diagnosis and treatment initiation. Furthermore, a patient may die (state 1) or achieve the first CCyR (state 2). After achieving the first CCyR, a patient may lose the CCyR (state 4) or die (state 3). Once a patient has relapsed, he may die (state 5) or achieve the second CCyR (state 6). Finally, being in the second CCyR, a patient may again relapse (state 8) or die (state 7). Regarding the probability that a randomly selected patient is in his first or second CCyR at time *t* after the initiation of imatinib therapy, our interest is in estimating the probability of being in state 2 or 6.

Let *P*
_{0k
}(*t*) be the probability that a patient who was in state 0 at time 0 will be in state *k* at time *t*. Then the cumulative incidences corresponding to the achievements and the losses of CCyR as well as deaths after the *i*th achievement of CCyR can be written as follows:

\begin{array}{c}{I}_{1}\left(t\right)={P}_{02}\left(t\right)+{P}_{03}\left(t\right)+{P}_{04}\left(t\right)+{P}_{05}\left(t\right)\\ \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}+{P}_{06}\left(t\right)+{P}_{07}\left(t\right)+{P}_{08}\left(t\right),\end{array}

(8)

\begin{array}{c}{I}_{1}^{*}\left(t\right)={P}_{04}\left(t\right)+{P}_{05}\left(t\right)+{P}_{06}\left(t\right)+{P}_{07}\left(t\right)\\ \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}+{P}_{08}\left(t\right),\end{array}

(9)

{I}_{1}^{**}\left(t\right)={P}_{03}\left(t\right),

(10)

{I}_{2}\left(t\right)={P}_{06}\left(t\right)+{P}_{07}\left(t\right)+{P}_{08}\left(t\right),

(11)

{I}_{2}^{*}\left(t\right)={P}_{08}\left(t\right),

(12)

{I}_{2}^{**}\left(t\right)={P}_{07}\left(t\right).

(13)

Adding and subtracting terms given by eq. (8) - (13) according to eq. (1) we get *CCI*(*t*) = *P*
_{02}(*t*) + *P*
_{06}(*t*), which is exactly the same expression as we would have obtained considering the estimation of the probability of being in state 2 or 6 with the Markov multi-state model of Klein et al. [11]. The difference between these two estimators is in a way how the probabilities of interest are estimated. Regarding the Markov model, the transition probabilities are estimated directly using the estimates {\widehat{P}}_{02}\left(t\right) and {\widehat{P}}_{06}\left(t\right) based on the estimated transition probability matrix, whereas the expression proposed in this paper suggests estimating the probabilities of interest in an indirect manner using the Aalen-Johansen estimates of the cumulative incidences. Comparison of the Markov model and the current cumulative incidence estimates of the probability that a patient will be in any CCyR at time *t* after the initiation of imatinib therapy using the Czech CML data is provided in the Results section.

To estimate *S*
_{
i
} (*t*) and {S}_{i}^{*}\left(t\right) in eq. (2), respectively, the vectors characterizing each patient need to be rewritten with respect to the fact that the starting time point is the achievement of the first CCyR. In other words, we need to adjust all event times for the time *T*
_{1} representing the time to the first CCyR. The vectors can be rewritten as (*T*
_{2} - *T*
_{1},..., *T*
_{
r
} - *T*
_{1}, *R*
_{2},..., *R*
_{
r
} )' and {\left({T}_{1}^{*}-{T}_{1},\dots ,{T}_{r}^{*}-{T}_{1},{R}_{1}^{*},\dots ,{R}_{r}^{*}\right)}^{\prime}. Obviously, only patients with *R*
_{1} = 1 are used for the estimation of *S*
_{
i
} (*t*) and {S}_{i}^{*}\left(t\right). Both functions can be easily estimated using the above mentioned Kaplan-Meier estimator regarding *i*th achievement of CCyR or death before the *i*th CCyR and *i*th loss of CCyR or death before the *i*th loss of CCyR, respectively, as the events of interest. Let {\u015c}_{i}\left(t\right) and {\u015c}_{i}^{*}\left(t\right) be the Kaplan-Meier estimators of *S*
_{
i
} (*t*) and {S}_{i}^{*}\left(t\right), respectively. Then, the *CLFS*(*t*) estimator is given by

\hat{CLFS}\left(t\right)={\u015c}_{1}^{*}\left(t\right)+\sum _{i=2}^{r}\left[{\u015c}_{i}^{*}\left(t\right)-{\u015c}_{i}\left(t\right)\right].

(14)

It should be noted that eq. (14) is principally the same as the nonparametric estimator proposed in Klein et al. [11]. However, their model considered only two remission phases, i.e. they considered *r* = 2. In the context of the current CML therapy, we anticipate multiple disease remissions and relapses over time, which means that *r* will be dependent mainly on the follow-up of the particular group of patients.

So far, only point estimates were considered. However, confidence intervals (CI) are also necessary to fully cover the variability of the point estimates. Standard error estimator can be derived using theoretical results published by Pepe [13] and Lin [16]; however, it is neither mathematically nor computationally simple. Bootstrapping represents a feasible alternative for the estimation of the 100(1 - *α*)% confidence bands. Given the observed data, either Efron's bootstrap procedure I or II [17] can be used to resample from randomly right-censored observations. The Efron's procedure II simply draws random samples with replacement from the observed vectors of survival times and corresponding censoring indicators. On the other hand, the procedure I considers separate resampling with replacement from the empirical distribution of survival times and the empirical distribution of censoring times. The empirical distributions are estimated from data by the Kaplan-Meier estimator, and the bootstrap sample observations are then obtained taking minima and corresponding indicators. The confidence bands can then be estimated using the percentile method, where, for example, the 2.5 and the 97.5 percentiles of the estimated bootstrap distribution can be used as the limits of the 95% confidence interval.