### Estimation of CGRFS

For CGRFS, onset and resolution of chronic GVHD need to be clearly determined. Onset of chronic GVHD event was defined as moderate-to-severe chronic GVHD based on the NIH criteria [15] at the time of the most recent assessment. GVHD evaluation was prospectively performed by a single practitioner within the program. Resolution of chronic GVHD was determined if symptoms became quiescent and systemic immunosuppression discontinued.

We constructed a multi-state model to depict the disease progression after transplantation. Two episodes of chronic GVHD were incorporated in the model (Fig. 1). The model contains the following states,

State 0: Alive in initial remission state without experiencing chronic GVHD

State 1: Dead or relapsed before first chronic GVHD

State 2: Alive with first chronic GVHD

State 3: Dead or relapsed in first chronic GVHD

State 4: Alive without first chronic GVHD

State 5: Dead or relapsed in recovery from first chronic GVHD

State 6: Alive with second chronic GVHD

State 7: Dead or relapsed in second chronic GVHD

State 8: Alive without second chronic GVHD

State 9: Dead or relapsed in recovery from second chronic GVHD

CGRFS is the probability that one stays in state 0 or 4 or 8 at time *t*. Let \(P_{kl}(s,t)\) be the transition probability from state *k* to *l* in time interval [*s*, *t*]. CGRFS, denoted by *C*(*t*), is defined as the sum of three transition probabilities, \(C(t)=P_{00}(0,t)+P_{04}(0,t)+P_{08}(0,t)\). An intensity matrix can be constructed based on the transition intensities, \(\lambda _{01},\cdots ,\lambda _{89}\) (Fig. 1). The conventional method of estimating \(P_{00}(0,t), P_{04}(0,t)\) and \(P_{08}(0,t)\) is to consider the product integral of the intensity matrix [6].

The product-integral method is computationally expensive. The issue is most severe for \(P_{08}(0,t)\) because a subject has to experience three states before reaching state 8, alive without the second chronic GVHD. Following Pepe [7] and Klein, Keiding et al. [8], we developed the model-free approach for CGRFS [2]. We formulated CGRFS as a linear combination of five survival functions, \(S_1(t),\cdots ,S_5(t)\), pertaining to five composite endpoints. These five composite endpoints are explained as follows,

\(S_1(t)\): First occurrence of first chronic GVHD or death/relapse,

\(S_2(t)\): First occurrence of second chronic GVHD or death/relapse,

\(S_3(t)\): First occurrence of resolution of first chronic GVHD or death/relapse,

\(S_4(t)\): Death/relapse at any time,

\(S_5(t)\): First occurrence of resolution of second chronic GVHD or death/relapse.

Let \(T_{k}\) be the time to the *k*th composite endpoint and \(S_k(t)=Pr(T_k > t)\) (\(k=1,\cdots ,5\)). Note that \(S_1(t)\) coincides with \(P_{00}(0,t)\). The composite endpoint for \(T_2\) is the first occurrence of second chronic GVHD or death/relapse. For \(T_2\), the death/relapse event could only be death/relapse in remission, in first chronic GVHD or in resolved first chronic GVHD. A patient without experiencing this composite endpoint stays in state 0, 2 or 4 (see Fig. 1). Using transition probabilities, \(S_2(t)=\mathrm {Pr}(T_2>t)=P_{00}(0,t)+P_{02}(0,t)+P_{04}(0,t)\). Similarly, the composite endpoint for \(T_3\) is the first occurrence of resolution of first chronic GVHD or death/relapse. Without experiencing this composite event, one stays in state 0 or 2. That is, \(S_3(t)=\mathrm {Pr}(T_3>t)=P_{00}(0,t)+P_{02}(0,t)\). Consequently, \(S_2(t)-S_3(t)\) yields \(P_{04}(0,t)\). In addition, based on the composite endpoints for \(T_4\) and \(T_5\), \(S_4(t)\) is the probabilities of staying in state 0, 2, 4, 6 or 8 while \(S_5(t)\) is the probability of staying in state 0, 2, 4 or 6. We can get that \(P_{08}(0,t)=S_4(t)-S_5(t)\). In summary, CGRFS is a linear combination of these five survival functions, \(C(t)=S_1(t)+S_2(t)-S_3(t)+S_4(t)-S_5(t)\). Among these survival functions, \(S_1(t)\) and \(S_4(t)\) are practically meaningful as they are the relapse-free, chronic GVHD free survival and DFS, respectively. Other survival functions are not interpretable in real life but introduced here to find probability of being in one state. All these five survival functions can be estimated by the Kaplan-Meier method.

To introduce inferences for CGRFS, we used the counting process notations. Let *Z* be the censoring time. Suppose that data of *n* patients are collected. The sample can be summarized as \((X_{ik}, \Delta _{ik})\ (i=1,\cdots ,n;\ k=1,\cdots ,5)\), where \(X_{ik}=\min (T_{ik}, Z_i),\ \Delta _{ik}=I(T_{ik}\leq Z_i)\) and \(I(\bullet )\) is an indicator function which takes value 1 if the event happens, and 0 otherwise. \(N_{ik}(t)\) indicates whether the *i*th patient experiences the *k*th composite endpoint at or prior to *t*, that is, \(N_{ik}(t)=I(X_{ik}\leq t, \Delta _{ik}=1)\) and let \(\bar{N}_k(t)=\sum _{i=1}^n N_{ik}(t)\). Also we defined the risk sets related to the five composite endpoints. Let \(Y_{ik}(t)=I(X_{ik}\geq t)\) and \(\bar{Y}_k=\sum _{i=1}^n Y_{ik}(t)\). So \(\bar{Y}_1(t)\) is number of patients alive in remission without experiencing CGVHD, i.e., in state 0 at time *t*, and

\(\bar{Y}_2(t)\): Number of patients in states 0, 2, 4 at time *t*

\(\bar{Y}_3(t)\): Number of patients in states 0, 2 at time *t*

\(\bar{Y}_4(t)\): Number of patients in states 0, 2, 4, 6, 8 at time *t*

\(\bar{Y}_5(t)\): Number of patients in states 0, 2, 4, 6 at time *t*

Using the counting process notations, the Kaplan-Meier estimator of \(S_k(t)\) is given by

$$\begin{aligned} \widehat{S}_k(t)=\prod\limits_{u\geq 0}^t \left[ 1-\frac{d\bar{N}_k(u)}{\bar{Y}_k(u) } \right] , \quad k=1,\cdots , 5. \end{aligned}$$

CGRFS, as a linear combination of five survival functions, can be subsequently estimated by \(\widehat{C}(t)=\widehat{S}_1(t)+\widehat{S}_2(t)-\widehat{S}_3(t)+\widehat{S}_4(t)-\widehat{S}_5(t)\).

The hazard function of \(T_k\) is defined as \(\lambda _k(t)=\lim _{\Delta t\rightarrow 0}\mathrm {Pr}(t\leq T_{k} < t+\Delta t | T_k\geq t)/\Delta t\). Then we have the martingales

$$\begin{aligned} M_{ik}(t)=N_{ik}(t)- \int _0^t Y_{ik}(u) \lambda _k(u)du, \quad i=1,\cdots ,n;\ k=1,\cdots ,5. \end{aligned}$$

Define \(y_k(t)=n^{-1}\lim _{n\rightarrow \infty } \bar{Y}_k(t), \forall k=1,\cdots ,5\). Based on the facts in Andersen et al. [6] and Pepe [7], for large samples,

$$\begin{aligned} \sqrt{n}\left[ \widehat{C}(t)-C(t) \right] \approx \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n W_i(t), \end{aligned}$$

where

$$\begin{aligned} W_i(t)= & {} -S_1(t)\int _0^t \frac{dM_{i1}(u)}{y_1(u)} -S_2(t)\int _0^t \frac{dM_{i2}(u)}{y_2(u)}+S_3(t)\int _0^t\frac{dM_{i3}(u)}{y_3(u)}\\&-S_t(t)\int _0^t \frac{dM_{i4}(u)}{y_4(u)}+S_5(t)\int _0^t\frac{dM_{i5}(u)}{y_5(u)} \end{aligned}$$

Based on the martingale central limit theorem, \(\forall t\), \(\sqrt{n}\left[ \widehat{C}(t)-C(t) \right]\) converges to a mean-zero normal distribution with variance \(\sigma (t)^2=E[ W_i(t)^2]\). It should be noted that the martingale processes, \(M_{i1}(t), \cdots , M_{i5}(t)\), are not orthogonal because by definition the event counting processes may involve the same events. Consequently the covariance should be considered if one wishes to consider the martingale variation process. Here we alternatively consider a moment estimator of the variance, \(\widehat{\sigma }(t)^2=n^{-1}\sum _{i=1}^n [ \widehat{W}_i(t)^2]\), where

$$\begin{aligned} \widehat{W}_i(t)= & {} n\left[ -\widehat{S}_1(t)\int _0^t \frac{\widehat{M}_{i1}(u)}{\bar{Y}_1(u)} -\widehat{S}_2(t)\int _0^t \frac{\widehat{M}_{i2}(u)}{\bar{Y}_2(u)}+\widehat{S}_3(t)\int _0^t\frac{\widehat{M}_{i3}(u)}{\bar{Y}_3(u)} \right. \\&\left. -\widehat{S}_t(t)\frac{\widehat{M}_{i4}(u)}{\bar{Y}_4(u)}+\widehat{S}_5(t)\int _0^t \frac{\widehat{M}_{i5}(u)}{\bar{Y}_5(u)} \right] \end{aligned}$$

and

$$\begin{aligned} \widehat{M}_{ik}(t)=N_{ik}(t)- \int _0^t \frac{dN_{ik}(u)}{\bar{Y}_k(u)} . \end{aligned}$$

The above moment variance estimator is similar to the variance estimation method provided by Klein, Keiding et al. [8] for current LFS. A linear \((1-\alpha )100\%\) confidence interval for CGRFS can be calculated by \(\widehat{C}\pm n^{-1/2}z_{1-\alpha /2}\ \widehat{\sigma }(t)\). Log-log transformation is routinely used for interval estimation of a survival probability. A \((1-\alpha )100\%\) log-log transformed confidence interval for *C*(*t*) is given by

$$\begin{aligned} \left[ \widehat{C}(t)^{1/\theta },\ \widehat{C}(t)^\theta \right] \ \ \mathrm {with}\ \ \theta =\exp \left( \frac{n^{-1/2}z_{1-\alpha /2}\ \widehat{\sigma }(t)}{\widehat{C}(t)\ln \left[ \widehat{C}(t)\right] }\right) . \end{aligned}$$

(1)

### A confidence band for CGRFS

In survival analysis the simulation approach has been commonly used to find the confidence band of a survival function [16, 17]. \(\forall i,k\), let \(G_{ik}(t)\) be a standard normal random variate. The martingale process \(\int _0^t dM_{ik}(u)\) has the same distribution as \(\int _0^t G_{ik}(u)dN_{ik}(u)\). Based on this knowledge, we consider the process \(\widetilde{W}(t)\) that

$$\begin{aligned} \widetilde{W}(t)= & {} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \left[ -S_1(t)\int _0^t \frac{G_{i1}(u)dN_{i1}(u)}{y_1(u)} -S_2(t)\int _0^t \frac{G_{i2}(u)dN_{i2}(u)}{y_2(u)}\right. \nonumber \\&\left. +\ S_3(t)\int _0^t\frac{G_{i3}(u)dN_{i3}(u)}{y_3(u)} -S_t(t)\int _0^t \frac{G_{i4}(u)dN_{i4}(u)}{y_4(u)}+S_5(t)\int _0^t\frac{G_{i5}(u)dN_{i5}(u)}{y_5(u)} \right] . \end{aligned}$$

(2)

Let \(Q(t)=\left| \sqrt{n}\left\{ \widehat{C}(t) -C(t)\right\} \right|\), then *Q*(*t*) can be approximated by \(\widetilde{W}(t)\). To construct a band for *C*(*t*) in an interval \([t_1,t_2]\), one needs to find the critical value such that

$$\begin{aligned} \mathrm {Pr}\left( {\underset{t\in [t_1,t_2]}{\sup}} \left| Q(t) /\sigma (t)\right| >q_\alpha \right) =\alpha . \end{aligned}$$

To obtain a realized process, one can generate standard normal random variates for \(G_{ik}(t),\forall i, k\). Also plug in the Kaplan-Meier estimators for survival probabilities and \(Y_k(t)/n\) to replace \(y_k(t)\),

$$\begin{aligned} \widehat{\widetilde{W}}(t)= & {} \sqrt{n}\sum \limits _{i=1}^n \left[ -\widehat{S}_1(t)\int _0^t \frac{G_{i1}(u)dN_{i1}(u)}{\bar{Y}_1(u)} -\widehat{S}_2(t)\int _0^t \frac{G_{i2}(u)dN_{i2}(u)}{\bar{Y}_2(u)}\right. \nonumber \\&\left. +\ \widehat{S}_3(t)\int _0^t\frac{G_{i3}(u)dN_{i3}(u)}{\bar{Y}_3(u)} -\widehat{S}_t(t)\int _0^t \frac{G_{i4}(u)dN_{i4}(u)}{\bar{Y}_4(u)}+\widehat{S}_5(t)\int _0^t\frac{G_{i5}(u)dN_{i5}(u)}{\bar{Y}_5(u)} \right] . \end{aligned}$$

(3)

Given *B* realized processes, let \(\widehat{\widetilde{W}}_b(t)\) denote the *b*th realized process. The critical value is obtained by finding the \((1-\alpha )100th\) percentile of the supremum values, which is given by

$$\begin{aligned} B^{-1}\sum\limits_{b=1}^B\mathcal {I}\left( {\underset{t\in [t_1,t_2]}{\sup}} \left| \widehat{\widetilde{W}}_b (t)/\widehat{\sigma }(t)\right| > \tilde{q}_\alpha \right) =\alpha , \end{aligned}$$

where \(\mathcal {I}(\bullet )\) is the indicator function. A confidence band for *C*(*t*) is given by \(\widehat{C}(t)\pm n^{-1/2}\tilde{q}_\alpha \widehat{\sigma }(t), \forall t \in [t_1,t_2]\).

### A confidence band for differences in CGRFS between two independent samples

The method described in Section 2.2 can be extended to construct a confidence band for differences in CGRFS between two independent samples. Such a band could tell in what time range that CGRFS’s of two groups differ. This type of band is related to the hypotheses \(H_0: C_1(t)-C_2(t)=0\) versus \(H_a: C_1(t)-C_2(t)\ne 0, \forall t \in [t_1,t_2]\), where \(C_i(t)\) is CGRFS for sample *i*. A supremum test for the hypotheses can be developed.

Let \(\widehat{C}_1(t)\) and \(\widehat{C}_2(t)\) be the estimated CGRFS of samples 1 and 2, respectively. Let the processes described in Eq. (2) for samples 1 and 2 denoted by \(\widetilde{W}^{(1)}(t)\) and \(\widetilde{W}^{(2)}(t)\), respectively. Under the null hypothesis, asymptotically \(\sqrt{n}\left\{ C_1(t)-C_2(t)\right\}\) has the same distribution as \(\widetilde{W}^{(1)}(t)- \widetilde{W}^{(2)}(t)\). To obtain a standardized realized process, we need to estimate the standard error for \(\sqrt{n}\left\{ C_1(t)-C_2(t)\right\}\), \(\widehat{SE}(t)=\sqrt{\widehat{\sigma }_1(t)^2+\widehat{\sigma }_2(t)^2}\), where \(\widehat{\sigma }_i(t)^2 ( i=1,2)\) is the estimated variances for sample *i*. A standardized realized process is defined as \(\widehat{U}(t)=\left\{ \widehat{\widetilde{W}}^{(1)}(t)- \widehat{\widetilde{W}}^{(2)}(t)\right\} / \widehat{SE}(t)\). We can generate *B* realized processes

$$\begin{aligned} \widehat{U}_b(t)=\left\{ \widehat{\widetilde{W}}^{(1)}_b(t)- \widehat{\widetilde{W}}^{(2)}_b(t)\right\} /\widehat{SE}(t),\ b=1,\cdots ,B. \end{aligned}$$

The critical value for the band can be obtained by finding the \((1-\alpha)100th\) percentile among *B* supremum values,

$$\begin{aligned} B^{-1}\sum\limits_{b=1}^B\mathcal {I}\left( {\underset{t\in [t_1,t_2]}{\sup}} \left| \widehat{U}_b (t)\right| > \tilde{g}_\alpha \right) =\alpha . \end{aligned}$$

The \((1-\alpha )100\%\) confidence band for \(C_1(t)-C_2(t)\) is given by \(\widehat{C}_1(t)-\widehat{C}_2(t) \pm n^{-1/2}\tilde{g}_\alpha \widehat{SE}(t)\).

For the supremum test, we evaluate the supremum for the sample data,

$$\begin{aligned} \widehat{K}={\underset{t\in [t_1,t_2]}{\sup}}\left| \sqrt{n}\left\{ \widehat{C}_1(t)-\widehat{C}_2(t)\right\} /\widehat{SE}(t)\right| . \end{aligned}$$

The test *p*-value is to the probability of observing \(\widehat{K}\) or more extreme values in the sampling distribution of \(\sup _{t\in [t_1,t_2]}\left| \widehat{U}(t)\right|\). The *p*-value can be obtained by finding the proportion of the supremum values higher or equal to \(\widehat{K}\),

$$\begin{aligned} p=B^{-1} \sum\limits_{b=1}^n\mathcal {I} \left( {\underset{t\in [t_1,t_2]}{\sup}}\left| \widehat{U}_b(t)\right| \geq \widehat{K}\right) . \end{aligned}$$

In a supremum test, one should determine the time interval \([t_1,t_2]\) in which survival functions of two samples are compared. If the goal is to compare survival functions over the entire study period, one may set \(t_1=\max \left( t_{\mathrm {min}}^{(1)}, t_{\mathrm {min}}^{(2)}\right)\) where \(t_{\mathrm {min}}^{(1)}\) and \(t_{\mathrm {min}}^{(2)}\) are the smallest event times in samples 1 and 2, respectively. If a large number of subjects remain under study even after the largest event time, one may set \(t_2\) to be the largest event time in two samples.