Computation of association measures
In many situations a researcher is interested in providing adjusted estimates of covariate associations with the outcome. In observational studies (involving no randomization) the exposure effect has to be adjusted for known confounders. Also in randomized controlled trials (RCT) the use of adjusted estimates is suggested for example to account for potential covariate imbalances or since prognostically relevant covariates were considered for a stratified randomization [20–22]. In these cases, Cox regression is widely used to adjust the estimated association between the covariate of interest and outcome for the other covariates.
For the purpose of illustration, let us consider a controlled trial, where the event of interest is death, investigating the efficacy of a new treatment (T=1) in comparison to a standard treatment (T=0). Two covariates such as age, A, and gender, G, are considered for adjustment. The multivariable proportional hazard Cox model can be specified as follows:
\lambda \left(t\rightT,A,G)={\lambda}_{0}(t)exp(\mathrm{\alpha T}+\mathrm{\beta A}+\mathrm{\gamma G})
where t is the time to event; λ(tT,A,G) is the hazard function conditional to covariate values; α, β and γ are the regression coefficients; λ
_{0}(t) is the baseline hazard for a subject in the control group (T=0), 0 years old and female (G=0). The adjusted hazard ratio for the treatment, HR constant through followuptime, is simply obtained as exp(α). Using such a model, R D(t) or N N T(t) for the treatment can be obtained specifying a covariate pattern and the baseline risk. For example, the estimated N N T(t), conditional on being male 40 years old is:
\frac{1}{\u015c\left(t\rightT=1,A=40,G=1)\u015c(tT=0,A=40,G=1)}.
(1)
where \u015c\left(t\rightT=1,A=40,G=1) is the estimated survival probability for a male 40 years old in the experimental treatment group, given by {\left[\phantom{\rule{0.3em}{0ex}}{\u015c}_{0}\right(t\left)\right]}^{\mathit{\text{exp}}(\widehat{\alpha}+\widehat{\beta}\phantom{\rule{2.77626pt}{0ex}}40+\widehat{\gamma})}, while \u015c\left(t\rightT=0,A=40,G=1) is the estimated survival probability for a male 40 years old, in the control group, given by {\left[\phantom{\rule{0.3em}{0ex}}{\u015c}_{0}\right(t\left)\right]}^{\mathit{\text{exp}}(\widehat{\beta}\phantom{\rule{2.77626pt}{0ex}}40+\widehat{\gamma})}. {\u015c}_{0}\left(t\right) is the baseline survivor function from a Cox proportional hazards model estimated according to one of the available methods [23, 24].
In order to obtain adjusted measures of association, different from the hazard ratio, the Cox proportional hazard model is used to estimate adjusted survival curves [25] as outlined in the following paragraphs.
Average covariate method
The simplest approach for obtaining adjusted survival curves is the average covariate method. The mean values among the study patients of the covariates used for adjustment are plugged into the multivariable Cox proportional hazard model. Considering the above example, if the mean age of subjects in study is 45 and 30% males are included, the adjusted \widehat{\mathit{\text{NNT}}}\left(t\right) for the treatment will be \frac{1}{\u015c\left(t\rightT=1,A=45,G=0.3)\u015c(tT=0,A=45,G=0.3)}. The average covariate method was once popular and largely adopted, due to its simplicity, but it was severely criticized [25, 26].
In fact it involves the averaging of categorical covariates, such as gender, which is difficult to understand. Moreover the method provides an estimate of the measure of effect for an hypothetical average individual and not a population averaged estimate.
Corrected group prognosis method and developments
An alternative idea is the corrected group prognosis method (CGPM), [14, 27, 28]. In the following, the CGPM to estimate R D(t), as described by Austin [14], is outlined:

A Multivariable Cox (or fully parametric) regression is used for the treatment and the covariates.

For each subject, the predicted survival probabilities, at the times of interest, are estimated using the multivariable model, assuming each subject is in the experimental treatment group; then the predictions are averaged;

the same predictions are obtained and averaged assuming each subject is in the control group.

the difference between the averaged predicted probabilities between experimental and control group is an estimate of the adjusted R D(t) for the experimental treatment at the specified times.
Pointwise confidence intervals of the obtained R D(t) estimates may be computed via bootstrap resampling [14]. For each bootstrap sample, i.e. a sample of the same size of the original one and randomly drawn with replacement from it, the R D(t) is computed according to the procedure outlined. A non parametric bootstrap 95% pointwise confidence interval is obtained resorting to the 2.5^{th} and 97.5^{th} percentiles of the obtained R D(t) bootstrap distribution.
A simulated example of the estimation of R D(t) in presence of confounding is exemplified in Figure 1.
The CGPM can be applied in principle to whatever regression model and an adequate model must be chosen. Considering, for example, the Cox regression model, in presence of time dependent covariate effects, an interaction of the covariates with a prespecified function of time should be specified, in order to estimate H R(t) varying during followup time. It is important to remark that it is not always easy to specify an adequate model in presence of time dependent covariate effects. In fact it is not always obvious how to model the time dependence itself. In general simple functions of time (linear or logarithm) or more flexible alternatives are used, [29].
To allow the estimation the data set must be augmented as it is done for true timedependent covariates [30]. It is to be remarked that, although the use of predicted values from regression models is simple from a practical point of view, the standard way to obtain summary measures of effect and their confidence interval is to use directly regression model coefficient estimates. The CGPM applied to the Cox model will be used for comparison with the method here proposed and described in the following section.
Laubender and Bender, [13], proposed different averaging techniques to estimate relevant impact numbers in observational studies using Cox model. For the purpose of illustration, let us consider the same example as before simply considering an exposure (E) instead of treatment. To obtain an estimate of N N E(t) it is possible to average predictions considering the subjects as if they were unexposed and as if they were exposed and taking the difference. As the distributions of the covariates used for adjusting are in general different in the exposed and unexposed groups, two different measures should be considered. Specifically, the estimate of the N N E(t) is obtained considering the unexposed subjects only, while E I N(t) is obtained considering the exposed subjects only. A comparison of the model based estimated R D(t) with that obtained through different averaging techniques, namely N N E(t) and E I N(t) [13, 15], is provided in the second example. However, the focus of the paper is not the comparison of different averaging techniques which are provided only for illustrative purposes. In particular, only the estimates obtained through the averaging performed over the whole population are compared with those based on transformation models methods.
Modelbased estimates of association measures
Adjusted modelbased estimates of measures of association can be obtained resorting to a general class of regression models used in Survival Analysis called transformation models [31].
Pseudo values
Considering the previous example, the transformation model can be written as g(S(tT,A,G))=g(S
_{0}(t))+α T+β A+γ G.
A possibility to estimate transformation models, using standard available software, is through pseudovalues [32]. The pseudo value is defined for each subject i at any time t and is given by
{\widehat{\theta}}_{i}\left(t\right)=\mathrm{n\u015c}\left(t\right)(n1){\u015c}_{i}\left(t\right)
(2)
where n is the sample size, \u015c\left(t\right) is the survival probability based on the KaplanMeier estimator using the whole sample and {\u015c}_{i}\left(t\right) is the survival probability obtained by deleting the i subject from the sample. When no censoring is present in the data, the pseudo values for subject i at time t is simply 1 if the subject is alive at t, while it is 0 if the event happened by t. Suppose to have an exposed male, 40 years old, which dies after 30 months of followup. The pseudo values computed at 12, 24 and 36 months are equal to 0, 0 and 1 respectively. The times at which the pseudovalues are computed are called pseudotimes.
When censoring is present in the data, pseudovalues are still defined for each subject (even those censored) and for each time, but the values may also be less than 0 or greater than 1 (See [33]; page 5310–11 for further details on the properties of pseudovalues).
In general, to allow inference on the entire survival curve, M (greater than 5) pseudo times are used, considering, for example, the quantiles of the unique failure time distribution. As M pseudo values are computed for each subject, an augmented data set is created with M observations for each subject.
Transformation models and association measures
The pseudovalues are then used as responses in a regression model for longitudinal data, where time is a covariate. As no explicit likelihood is available for pseudovalues, generalized estimating equations (GEE), [34], are used accounting for the correlation of the pseudovalues within each subject. The cluster robust variancecovariance is used for hypothesis testing using Wald tests. In general an independence working variancecovariance matrix can conveniently be used in the estimation process [32].
In order to model g(S
_{0}(t)), the transformed baseline survival function, the standard procedure is to insert in the regression model indicator functions for each pseudotime. If all event times would be used to compute the pseudovalues, the insertion of indicator functions would result in a non parametric representation of the (transformed) baseline survival, as in the Cox model. In general only a small number of pseudotimes are used obtaining a parametric baseline representation. As an alternative, spline functions can be inserted in the regression model, as did [35] in a nonpseudovalues framework.
Considering for simplicity only two spline bases, the regression model of the example can be written as follows:
g\left({\theta}_{i}\right(t\left)\right)={\varphi}_{0}+{\varphi}_{1}{B}_{1}\left(t\right)+{\varphi}_{2}{B}_{2}\left(t\right)+\alpha {T}_{i}+\beta {A}_{i}+\gamma {G}_{i}
(3)
where B
_{1}(t) and B
_{2}(t) represent the first and second spline bases for time t. For example, if a restricted cubic spline basis is used with three knots at k
_{1},k
_{2},k
_{3}, then B
_{1}(t)=t and {B}_{2}\left(t\right)={(t{k}_{1})}_{+}^{3}\frac{{(t{k}_{2})}_{+}^{3}({t}_{3}{t}_{1})}{({t}_{3}{t}_{2})}+\frac{{(t{k}_{3})}_{+}^{3}({t}_{2}{t}_{1})}{({t}_{3}{t}_{2})}, where, for example, {(t{k}_{1})}_{+}^{3} is equal to (t−k
_{1})^{3} if t>k
_{1}, otherwise is 0. Knots are chosen at quantiles of the failure time distribution. In the case of 3 knots the quantiles commonly suggested are 0.1, 0.5 and 0.9, [36]. To choose the complexity of the spline the QIC, [37], an information criterion proposed for generalized estimating equations, can be used. A less formal strategy is the graphical comparison between the KaplanMeier marginal survival probability and the marginal probability obtained from the transformation model without covariates. Such a procedure will be used in the examples.
The first part of the model, ϕ
_{0}+ϕ
_{1}
B
_{1}(t)+ϕ
_{2}
B
_{2}(t), provides a parametric representation of the (transformed) baseline survival function, g(S
_{0}(t)), during followup time.
The coefficients α, β and γ represent the covariate effects expressed as differences in the Survival probability, transformed by g associated with a unit increase in the covariates. Let us consider such an issue in detail. When g is the logit link function, a proportional odds model is estimated. Accordingly, α, β and γ represent the logarithm of the ratio of the odds of surviving associated with the change of one unit in the covariates. Such an effect is constant through followup times. The exponentiation of the parameter estimates represent therefore the ratio of the odds of surviving. Similarly, the logarithmic link produces a proportional risks model and the e x p(α), e x p(β) and e x p(γ) represent the ratio of the survival probabilities (Relative Risks, RR). The identity link produces a constant survival difference model: α, β and γ represent the adjusted differences in survival probabilities (risk differences, RD). A constant difference model through followup is often not practical as a model such that at the beginning of the followup the survival curves start at 1 and then, eventually, become different. However, it is to be noted that the first pseudotime is never placed at time 0, but later on the followup time scale. In Figure 1 an example of the model based RD estimate with pointwise confidence intervals, constant through time, is reported in the right panel. The constant model estimated RD can be used to obtain a constant estimate of NNT by inversion. In the case of treatment T: \widehat{\mathit{\text{NNT}}}={\left[\widehat{\alpha}\right]}^{1}. The value of 1 indicates the largest possible effect of NNT, while in correspondence of no covariate effect (RD=0) the NNT value is ±∞. The largest possible harmful effect is −1. Positive and negative values of NNT represent the expected number of patients needed to be treated for one additional patient to benefit and to be harmed, respectively.
In the case of the loglog link, g=l o g(−l o g(•)), e x p(α), e x p(β) and e x p(γ) are the ratio between cumulative hazard functions associated with the change of one unit in the covariates. This ratio is equal to that of hazard functions, only in the proportional hazard case.
The method allows to estimate the measures of effect also for continuous covariates. For example, the evaluation of a biomarker effect measured on a continuous scale, without cutoffs, is still possible with this methodology.
The use of different link functions to obtain a particular measure of effect, is an established technique in binomial regression, where the use of noncanonical links, such as the logarithm, allows to obtain adjusted measures of impact different from the odds ratio, [1, 38]. Wacholder, [39], is an excellent reference for deepen such aspects in the framework of logistic regression.
When there is evidence for time dependent effects of the covariates, the interaction between the covariates and the spline bases B
_{1}(t) and B
_{2}(t) are included in model (3).
\begin{array}{lcr}g\left({\theta}_{i}\right(t\left)\right)& =& {\varphi}_{0}+{\varphi}_{1}{B}_{1}\left(t\right)+{\varphi}_{2}{B}_{2}\left(t\right)+\alpha {T}_{i}+\beta {A}_{i}+\gamma {G}_{i}\hfill \\ +\phantom{\rule{0.3em}{0ex}}{\gamma}_{1}{B}_{1}\left(t\right){T}_{i}+{\gamma}_{2}{B}_{2}\left(t\right){T}_{i}+{\gamma}_{3}{B}_{1}\left(t\right){A}_{i}\hfill \\ +\phantom{\rule{0.3em}{0ex}}{\gamma}_{4}{B}_{2}\left(t\right){A}_{i}+{\gamma}_{5}{B}_{1}\left(t\right){G}_{i}+{\gamma}_{6}{B}_{2}\left(t\right){G}_{i}\hfill \end{array}
In such a case, the estimated gtransformed survival probability differences change during followup time. In order to show the effect, varying in time, of a dichotomous covariate, for example treatment T, it is useful to adopt a graphical display, where the time is put on the horizontal axis while the function e x p(α+γ
_{1}
B
_{1}(t)+γ
_{2}
B
_{2}(t)) is on the vertical axis (exponentiation is not used with the identity link; R D(t)=α+γ
_{1}
B
_{1}(t)+γ
_{2}
B
_{2}(t)). In this case the estimated N N T(t) is naturally varying through followup time and again obtain by inversion: R D(t)^{−1}.
For a continuous covariate, such as Age, A in the example, it is possible to use a surface plot, where Age and time are on the x and y axis, while the z axis reports the covariate effect with respect to a reference value. It would also be possible to model Age effect with spline bases. In this case, the interaction between Age and time is obtained through tensor product spline bases of Age and time.
When a large number of pseudotimes is used, spline functions allow to model parsimoniously the baseline risk compared to indicator functions. This is particularly important for the modelling of timedependent effects in connection to the different link functions. In principle when a covariate effect is constant using a specific link, it should be timevarying with the other links. No statistical evidence against a constant covariate effect for more than one link may only be due to lack of power. The problem can also be exacerbated by some multiple testing issue. Timedependent effects selection depends therefore on the link transformation used. As a consequence, the adjusted effect of a covariate may be constant using a link function, but timedependent using a different link.
Moreover, the fitted values of the different models selected for the different link functions are generally different, being equal only if the models are saturated. Traditionally, the strategy used in the application of transformation models such as (3) was to select the best fitting g transform, i.e., the transform where covariate effects are constant through time, see [40, 41] as examples. The approach considered here is different. The interest is in using the gtransform which is the most informative for the clinical or biological counterpart. Generally the best fitting link function and the one selected by the researcher are not the same. Time dependent effects should therefore be expected in the model.
Pointwise confidence intervals
Approximate pointwise 95% confidence intervals are calculated from model results as in standard GLM/GEE modelling. The computation is easy when covariate effects are constant on the gtransformed scale. The clusterrobust variancecovariance matrix must be used. Using model (3) as example, the 95% CI for treatment, on the transformed g scale, will be
{l}_{\mathit{\text{lower}}},{l}_{\mathit{\text{upper}}}=\widehat{\alpha}\pm 1.96\times \mathrm{st.error}\left(\widehat{\alpha}\right)
where \mathrm{st.error}\left(\widehat{\alpha}\right) is the estimated clusterrobust standard error for the model parameter α. When g is the log, logit or l o g−l o g link function, the 95% CI for the treatment effect (respectively an RR, OR or HR) is [ e x p.(l
_{
lower
}),e x p(l
_{
upper
})]. With the identity link, the 95% confidence intervals is [ l
_{
lower
},l
_{
upper
}], without additional transformations, and the corresponding interval for NNT is [ 1/l
_{
upper
},1/l
_{
lower
}].
A Clarification is necessary for the confidence interval of the NNT.
When the estimated constant RD is not statistically significant the confidence interval of RD includes 0. The limits of the confidence interval are one positive and the other negative. The resulting confidence interval for NNT should include infinity (∞), [42]:
95\mathrm{\%C.I.}=\left[\infty ,1/{l}_{\mathit{\text{lower}}}\right]\cup \left[1/{l}_{\mathit{\text{upper}}},+\infty \right]
With timevarying covariate effects, the variance of the sum of a linear combination of different parameter estimates must be computed for the each of the followup times. For example, for treatment T, the variance of interest, at a specific time t, is that of the linear combination α+γ
_{1}
B
_{1}(t)+γ
_{2}
B
_{2}(t). Written in matrix terms the variance at time t is given by:
\begin{array}{c}\left(\begin{array}{ccc}1& {B}_{1}\left(t\right)& {B}_{2}\left(t\right)\end{array}\right)\phantom{\rule{1em}{0ex}}\left(\begin{array}{ccc}V\left(\alpha \right)& \mathit{\text{Cov}}(\alpha ,{\gamma}_{1})& \mathit{\text{Cov}}(\alpha ,{\gamma}_{2})\\ \mathit{\text{Cov}}(\alpha ,{\gamma}_{1})& V\left({\gamma}_{1}\right)& \mathit{\text{Cov}}({\gamma}_{1},{\gamma}_{2})\\ \mathit{\text{Cov}}(\alpha ,{\gamma}_{2})& \mathit{\text{Cov}}({\gamma}_{1},{\gamma}_{2})& V\left({\gamma}_{2}\right)\end{array}\right)\phantom{\rule{1em}{0ex}}\left(\begin{array}{c}1\\ {B}_{1}\left(t\right)\\ {B}_{2}\left(t\right)\end{array}\right)\end{array}
where V(•) stands for the clusterrobust variance, while C o v(•,•) stands for the clusterrobust covariance of two random variables. When the variances at the different times are calculated, the pointwise 95% CI can be computed as before.
Software implementation
The approach to censored data regression based on pseudo values was applied to regression models for the cumulative incidence functions in competing risks and for multistate modeling [43], for the restricted mean [44] and for the survival function at a fixed point in time [45]. Implementation details and software can be found in Klein et al., [32], and Andersen and Perme, [46].
Software is available to compute pseudo values (macro %pseudosurv in SAS and function pseudosurv in R package pseudo[32]) Standard GEE tools available in SAS or R can be used for regression. In SAS the proc genmod allows to change link functions using the instructions FWDLINK and INVLINK. In R, the package geepack can be conveniently used, see [32] for details.
As an example of the R software implementation, the identity link is used:
where the variable pseudo contains the pseudo values and the variable tpseudo the pseudotimes according to the software reported in [32]. The R function rcs of the package rms, [47], is used to compute restricted cubic spline bases. Each subject is represented by multiple rows in the data, one for each pseudo time. The records for each subject are identified by means of the variable id which is used to estimate the robust standard error by the geese function. Using the identity link function, the estimated coefficients can be interpreted as the adjusted R D(t) estimates.