In this section we will present: 1) the definition of the RMST; 2) the general regression model on pseudo-values; 3) the modelling of the RMST with right-censored failure time data with covariates using pseudo-values through a linear model; 4) the extension to smoothing functions and time-varying effects; 5) the model-based comparison of RMST curves.
Restricted mean survival time
In survival analysis the time T elapsed from an initial event to the possible occurrence of a terminating event is analysed. Usually, only a right-censored version of the random variable T is observed and, therefore, the mean value of T is not easy to estimate non-parametrically, [15]. As a replacement, the τ-restricted mean survival time (RMST) is defined as:
$$ \text{RMST}(\tau) = \int_{0}^{\tau} \mathrm{S}(t) dt $$
(1)
where \(\mathrm {S}(t) = \mathrm {P}(T>t) = \exp (- \int _{0}^{t} \lambda (u) du)\) is the survival function and λ(t) is the hazard function. The RMST(τ) represents the expected lifetime, E(T∧τ) over a time horizon equal to τ. The difference between RMST(τ) for different treatments has been advocated as a useful summary measure in clinical applications [7, 8, 10].
General regression model on pseudo-values
The idea of using pseudo-values for censored data analysis was introduced by Andersen et al. (2003) [16]. Let Xi,i=1,…,n, be independent and identically distributed random variables and θ be a parameter of the form
$$ \theta = E(f(X_{i})) $$
(2)
and assume that we have an (at least approximately) unbiased estimator, \(\hat {\theta }\), for this parameter. Let Zi,i=1,…,n be independent and identically distributed covariates and define the conditional expectation
$$ \theta_{i} = E(f(X_{i}) | \boldsymbol{Z}_{i}) $$
(3)
The ith pseudo-observation is defined as
$$ \hat{\theta}_{i} = n \hat{\theta} - (n-1) \widehat{\theta^{-i}} $$
(4)
where \(\widehat {\theta ^{-i}}\) is “the leave-one-out”estimator for θ based on Xj,j≠i. If all Xi are observed then θ may be estimated by the average of the f(Xi) in which case \(\hat {\theta }_{i}\) is simply f(Xi). This approach will be used here with a censored sample of the Xi. A regression model for the parameter θ corresponds to a specification of how θi depends on Zi and this may done via a generalized linear model
$$ g({\theta}_{i}) = \boldsymbol{\beta}^{T} \boldsymbol{Z}_{i} $$
(5)
where the matrix Z contains a column of 1, corresponding to the intercept. The regression coefficients β can be estimated using generalized estimating equations [17]
$$ \begin{aligned} U(\boldsymbol{\beta}) = \sum_{i=1}^{n} U_{i}(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left(\frac{\partial}{\partial \boldsymbol{\beta}} g^{-1} (\boldsymbol{\beta}^{T} \boldsymbol{Z}_{i}) \boldsymbol{R}_{i}^{-1} (\hat{\theta}_{i} - g^{-1} (\boldsymbol{\beta}^{T} \boldsymbol{Z}_{i})) \right) \end{aligned} $$
(6)
In the general situation θ may be multivariate and Ri is a working covariance matrix [18]. Andersen et al. (2003) [16] argued that the variances of β can be obtained by the standard sandwich estimator
$$ \hat{\boldsymbol{V}} = I(\hat{\boldsymbol{ \beta}})^{-1} \hat{var} \{U(\boldsymbol{\beta}) \} I(\hat{\boldsymbol{ \beta}})^{-1} $$
(7)
where
$$ I(\boldsymbol{ \beta}) = \sum_{i=1}^{n} \left(\frac{\partial g^{-1}(\boldsymbol{\beta}^{T} \boldsymbol{Z}_{i})}{\partial \boldsymbol{\beta}} \right)^{T} \boldsymbol{R}_{i}^{-1} \left(\frac{\partial g^{-1} (\boldsymbol{\beta}^{T} \boldsymbol{Z}_{i})}{\partial \boldsymbol{\beta}} \right) $$
(8)
$$ \hat{var} \{U(\boldsymbol{\beta}) \} = \sum_{i=1}^{n} U_{i}(\boldsymbol{\beta}) U_{i}(\boldsymbol{\beta})^{T} $$
(9)
After the computation of pseudo-values, parameter estimates and their standard errors can be obtained using standard statistical software for generalized estimating equations, though the standard errors may be slightly conservative. In fact, a general asymptotic theory of estimates from estimating functions based on pseudo-values demonstrated, under some regularity conditions, consistency and asymptotic normality of the estimates [19]. The ordinary sandwich estimator is however not consistent, leading to an overestimate of the standard errors. The demonstration is derived for real-valued pseudo-values, but, as stated in [19], it can be generalized to handle vector-valued pseudo-values.
Given the asymptotic normality of the model estimates, it is possible to adopt the approach based on simultaneous inference in general parametric models to obtain simultaneous test procedures and confidence intervals [20].
Modelling of the RMST with right-censored failure time data with covariates using pseudo-values through a linear model
The RMST(τ) can be estimated non-parametrically based on the Kaplan-Meier estimator [1] or model-based, possibly resorting to flexible regression.
A model estimate for the RMST can be obtained, through transformation, adopting a model for the hazard function. The piecewise-exponential model assumes proportionality for the covariate effects while separate, piecewise constant, baseline hazards, are used for the different treatments to estimate RMST(τ) [21]. The model was further developed using Cox regression with stratification [22]. The method is implemented in the function restricted.residual.mean in the package timereg [23], in the free R software, [24]. The function can use the Cox regression model or the Aalen regression model to perform the calculations.
A number of different alternatives are available for model based estimates of RMST(τ), for example one convenient possibility is the use of flexible parametric survival models [8]. In general, the standard errors of the RMST (or of the difference of RMST between treatments) are obtained using the delta method, or using the bootstrap or other resampling/simulation techniques.
An alternative estimation method is to directly model RMST as a function of covariate values. This can be achieved using inverse probability of censoring weighting [12], or with pseudo-values [25]. We will focus on estimation based on pseudo-values as it allows to use standard software for generalized linear models and for simultaneous inference in general parametric models.
For the restricted mean we have \(\theta = E(X \wedge \tau) = \int _{0}^{\tau } S(t) dt\), and we use the estimator obtained by plugging in the Kaplan-Meier estimator [25]. For this estimator, results stated in [19] are valid under the assumption of censoring independent of event times and covariates. The ith pseudo-value at time τ is therefore defined as:
$$ \widehat{\theta}_{\tau \, i} = n \int_{0}^{\tau} \widehat{\mathrm{S}}(t) dt - (n-1) \int_{0}^{\tau} \widehat{\mathrm{S}}^{-i}(t) dt $$
(10)
where \(\widehat {\mathrm {S}}^{-i}(t)\) is the Kaplan-Meier estimator excluding subject i.
Instead of considering a single τ, as in [25], we consider a finite grid of M time points τ1,…,τj,…,τM and we compute the pseudo-values for the ith subject at each τj. Time points can be selected as quantiles of the event time distribution, e.g. M could be chosen to have approximately 10 events for each pseudo-value while, in general, it is not useful to have more than 15-20 time points. In fact according to [13, 26] a number of points between 5 and 10 is sufficient to provide good estimates of the regression parameter. In [16], when studying the application to multi state models, model results using 5, 10 or 20 time points produced similar results. The problem with using a large number of time points is the computational burden as the dataset is being expanded with as many subjects’ replications as the number of time points considered.
A regression model for a vector valued \(\hat {\boldsymbol {\theta }}_{i}\), with components calculated at several τ-values, must include terms for time and possibly interaction terms between covariates and time to account for time-varying covariate effects. This was already done for model based on pseudo-values with applications to competing risks [26, 27]. A generalized linear model can be assumed as:
$$ \mathrm{g}(\theta_{\tau_{j} \, i}) = \alpha_{j} + \mathbf{\gamma}^{T} \mathbf{C}_{i} $$
(11)
where αj determines the baseline (transformed) restricted mean time at τj and is explicitly estimated and Ci are time-independent covariates.
A presentation of the different methods available for estimating RMST, applied to the classical Freireich data [28], can be found in the supplementary material [see Additional file 1].
Extension to smoothing functions and time-varying effects
The baseline estimate for the grid points can be modelled by including M−1 dummy variables. It is possible to include time-varying effects through interaction terms between time and covariates. To include a reasonable number of regressors into the model, we propose to use a smoothing function (e.g. a cubic spline) to model the restricted mean time of the reference category. In such a way, model (11) can be rewritten as:
$$ g[\theta_{\tau_{j} \, i}]=\mathbf{\delta }^{T} \mathbf{B}_{j}+\mathbf{\gamma}^{T} \mathbf{C}_{i} +\mathbf{\zeta}^{T} \mathbf{b}_{B_{j} C_{i}} = \mathbf{\beta}^{T}\mathbf{Z} $$
(12)
where Bj is the vector of the spline basis functions and δT the corresponding vector of the regression coefficients and \(\mathbf {b}_{B_{j} C_{i}}\) represents the interaction terms among covariates and the spline bases to account for time-varying effects.
When considering the identity link, as covariate effects estimate differences in RMST, a constant difference through time is not plausible. Covariates should then be included together with their time-dependent effects, i.e. interactions with time.
Model-based comparison of RMST curves
In the following the identity link will be used and, for ease of notation, only a binary covariate, A, will be considered. The regression model will therefore be specified as:
$$ \text{RMST}(\tau_{j}\mid A_{i}) = \theta_{\tau_{j} \, i} = \alpha_{1} + \alpha_{j} \mathrm{I}_{j} + \gamma A_{i} + \zeta_{j} \mathrm{I}_{j} A_{i} $$
(13)
where the Ij,j=2,…,M are M−1 indicator functions for estimation of the baseline function. The same indicator functions are used to model time-varying covariate effects.
The estimate of the difference in RMST through follow-up according to binary covariate A is given by the step function
$$ \mathrm{D}(\tau_{j}) = \Delta(\tau_{j} | A) = \gamma + \zeta_{j} \mathrm{I}_{j} $$
(14)
changing value at each time selected for the computation of the pseudo-values.
The variance of D(τj) can be estimated for each τj from model results as in standard GEE modelling (though, as already said, this may be slightly conservative). For example the variance of D(τj), can be computed using the robust sandwich variance-covariance matrix of the coefficients γ and ζj and an M-dimensional basis vector, with 1 in the first and j positions and 0 otherwise.:
$$\begin{aligned} \left(\begin{array}{ccccc} 1 & \ldots & 1 & \ldots & 0\\ \end{array}\right) \left(\begin{array}{cccc} \mathrm{V}(\widehat{\gamma}) & \ldots & \text{Cov}(\widehat{\gamma}, \widehat{\zeta}_{j}) & \ldots\\ \ldots & \ldots & \ldots & \ldots \\ \ldots & \text{Cov}(\widehat{\zeta}_{j-1}, \widehat{\zeta}_{j}) & \mathrm{V}(\widehat{\zeta}_{j}) &\ldots \\ \ldots & \ldots & \ldots & \ldots \\ \ldots & \ldots & \ldots & \mathrm{V}(\widehat{\zeta}_{M}) \\ \end{array}\right) \left(\begin{array}{c} 1 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{array}\right) \end{aligned} $$
The function D(τj) can also be obtained by incorporating a smooth spline basis into the regression model. In this case, without loss of generality, considering just two basis functions (for ease of notation) B1(t) and B2(t) and an intercept, to model the baseline RMST through time, the regression model can be written as:
$$\begin{array}{*{20}l} \text{RMST}(t \mid A_{i}) & = \delta_{0} + \delta_{1} \mathrm{B}_{1}(t) + \delta_{2} \mathrm{B}_{2}(t) + \gamma A_{i} + \\ & \qquad \zeta_{1} \mathrm{B}_{1}(t) A_{i} + \zeta_{2} \mathrm{B}_{2}(t) A_{i} \end{array} $$
(15)
and the difference in RMST curves is given by the smooth function:
$$ \mathrm{D}(t) = \Delta(t | A) = \gamma + \zeta_{1} \mathrm{B}_{1}(t) + \zeta_{2} \mathrm{B}_{2}(t) $$
(16)
The robust variance at time t of the estimate \(\widehat {\mathrm {D}}(t), V(\widehat {\mathrm {D}}(t))\) can be computed as:
$$\begin{aligned} \left(\begin{array}{ccc} 1 & \mathrm{B}_{1}(t) & \mathrm{B}_{2}(t)\\ \end{array}\right) \left(\begin{array}{ccc} \mathrm{V}(\widehat{\gamma}) & \text{Cov}(\widehat{\gamma}, \widehat{\zeta}_{1}) & \text{Cov}(\widehat{\gamma}, \widehat{\zeta}_{2})\\ \text{Cov}(\widehat{\gamma}, \widehat{\zeta}_{1}) & \mathrm{V}(\widehat{\zeta}_{1}) & \text{Cov}(\widehat{\zeta}_{1}, \widehat{\zeta}_{2})\\ \text{Cov}(\widehat{\gamma}, \widehat{\zeta}_{2}) & \text{Cov}(\widehat{\zeta}_{1}, \widehat{\zeta}_{2}) & \mathrm{V}(\widehat{\zeta}_{2}) \end{array}\right) \left(\begin{array}{c} 1 \\ \mathrm{B}_{1}(t) \\ \mathrm{B}_{2}(t) \end{array}\right) \end{aligned} $$
Based on the asymptotic normality of the model estimates, the asymptotic pointwise 95% confidence interval of R(t) is given by \([\mathrm {D}_{lo}(t); \mathrm {D}_{up}(t)] = \widehat {\gamma } + \widehat {\zeta }_{1} \mathrm {B}_{1}(t) + \widehat {\zeta }_{2} \mathrm {B}_{2}(t) \pm 1.96 \sqrt {\mathrm {V}(\widehat {\mathrm {D}}(t))}\).
The confidence region for the curve, i.e. the simultaneous 95% confidence interval of D(t), can be obtained by adopting the approach developed by Hothorn et al. (2008) [20]. In order to ensure a coverage probability of at least 95% for the entire curve, an appropriate critical value u95% must be chosen instead of the 1.96. The value can be chosen such that P(tmax≤u95%)=95%, where:
$$ t_{max} = \text{sup}_{t\in[a,b]}\frac{[\widehat{\gamma} + \widehat{\zeta}_{1} \mathrm{B}_{1}(t) + \widehat{\zeta}_{2} \mathrm{B}_{2}(t) ] - \mathrm{D}(t)}{\sqrt{\mathrm{V}(\widehat{\mathrm{D}}(t))}} $$
(17)
where the limits of the interval [a,b] span the follow-up time of interest or, more strictly, corresponds to the minimum and maximum times used to compute pseudo-values. In order to compute the u95% value, the supremum of the function can be obtained using an equally spaced grid of time points [a≤t1≤t1≤⋯≤tk=b]. The obtained value should be sufficiently close to the true value and this approach makes it possible to use standard software for the calculation [29]. The function glht from the package multcomp [20] can be used to compute the confidence band.