 Technical advance
 Open access
 Published:
Inference about timedependent prognostic accuracy measures in the presence of competing risks
BMC Medical Research Methodology volume 20, Article number: 219 (2020)
Abstract
Background
Evaluating a candidate marker or developing a model for predicting risk of future conditions is one of the major goals in medicine. However, model development and assessment for a timetoevent outcome may be complicated in the presence of competing risks. In this manuscript, we propose a local and a global estimators of causespecific AUC for rightcensored survival times in the presence of competing risks.
Methods
The local estimator  causespecific weighted mean rank (cWMR)  is a local average of timespecific observed causespecific AUCs within a neighborhood of given time t. The global estimator  causespecific fractional polynomials (cFPL)  is based on modelling the causespecific AUC as a function of t through fractional polynomials.
Results
We investigated the performance of the proposed cWMR and cFPL estimators through simulation studies and reallife data analysis. The estimators perform well in small samples, have minimal bias and appropriate coverage.
Conclusions
The local estimator cWMR and the global estimator cFPL will provide computationally efficient options for assessing the prognostic accuracy of markers for timetoevent outcome in the presence of competing risks in many practical settings.
Background
In modern evidencebased medicine, decisions on appropriate early medical intervention or the choice and timing of interventions frequently rely on prognostic markers that are informative for survival outcome. Such marker can be a single covariate or a risk score where the latter is estimated from a survival model of the association between covariates and survival time. For instance, a wellknown Framingham risk score predicts the 10year risk of cardiovascular disease [1]. This model often guides clinicians to identify those at high/lowrisk for future cardiovascular disease. Similar risk models and markers exist for other diseases, e.g. Gail score for breast cancer [2], prostate specific antigen for prostate cancer, kidney donor risk index (KDRI) of donor for kidney transplant etc. In liver transplantation study (LT), the sideeffects of medications put the organ recipients at risk for recurrent liver disease (liver fibrosis) which can lead to graft failure [3]. Liver biopsy is the gold standard for staging, managing and treating liver fibrosis, and useful in the prognosis of LT recipients. However, biopsy is often impractical due to its invasiveness and higher cost. Serum fibrosis markers such as the fibrosis score 4 (FIB4), and the nonalcoholic fatty liver disease fibrosis score (NAFLD) measured one year after LT have recently been shown to accurately stratify organ recipients who are at high risk from those who are at low risk of recurrent liver disease [3], [4]. Therefore, rather than performing biopsy on all patients, these markers could guide clinicians to start timely treatment for those highrisk patients while sparing lowrisk patients from the side effects of biopsy and from the unnecessary costs. Before one adopts such markers into clinical practice, it is crucial to evaluate their prognostic accuracy i.e. whether markers correctly discriminate subjects who will subsequently experience the event of interest at or by time t from those who will not experience any event by t. The goal of this paper is to develop a method to estimate timedependent prognostic accuracy measure of such marker.
The evaluation of prognostic accuracy of marker becomes complicated in the presence of competing risks. Competing risks arise when a subject experiences a terminal event due to one of the multiple mutually exclusive causes. For example, postliver transplant, patients may die due to adverse liver and/or transplantrelated outcomes (e.g. graft failure) while other patients may die due to competing events (e.g. nonliver causes) before experiencing liverrelated adverse outcomes. This leads to a competing risks situation because liverrelated marker could predict graft failure or graft related death, but may not predict nongraft related events. [5]. Here, the scientific question is how well do FIB4 or NAFLD discriminate between patients who progress to graftrelated death and those who do not. Understanding whether FIB4 or NAFLD is highly predictive for death due to graft failure but not others could potentially lead to more rational and costeffective use of specific medications or treatment strategies. In order to facilitate the assessment of prognostic accuracy of marker, the goal of this manuscript is to develop methods to estimate timedependent prognostic accuracy of a baseline marker after taking rightcensoring and competing risks into account.
A number of statistical measures have been proposed to assess the prognostic accuracy of a marker. The Receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) [6] are the most popular measures of a binary classifier system. ROC curve is a graphical illustration of the diagnostic ability of a binary classifier system as its discrimination threshold is varied and AUC provides a global summary of the discriminatory capacity of the marker. For a survival outcome, the event status of a subject can change over time and the risk of developing the event conditional on marker value changes over the followup time. Therefore, the accuracy summaries for evaluating the performance of a marker must take this timedependence into account. Timedependent ROC methods for survival time [7],[8] classify the subjects as cases or controls depending on their survival status at or by time t and compare their observed status with a predicted risk at some or all times. The incident cases/dynamic controls classification arises [8] when scientific interest focuses on correct classification of subjects who are still in risk at time t. The incident (I) cases are those subjects who had an event at t and the dynamic (D) controls are those who survived through t. Among the other definitions of timedependent ROC, cumulative (C) case and dynamic control definition pair is most common. In this classification, cumulative cases are defined as patients having an event within a certain time range, say [0,t]. The I/D version of AUC focuses on incident cases and is better suited for characterizing the trajectory of AUC over time, while C/D AUC does not characterize the evolution of accuracy over time. Another appealing property of I/D measure is that the evaluation of the model at a certain time point t only focuses of the riskset at the time t and therefore prior events and performance of the marker does not influence or distort the marker accuracy at t. As prognostic models aim to predict future, this property is very appealing when evaluating dynamic prognostic models.
In this article, we focus on I/D definition of AUC(t), which is the timedependent area under the I/D ROC curve at t and was introduced in [8]. There has been extensive research in estimating the I/D AUC(t) in the case of single cause of failure. In [9], a locally weighted mean rank (WMR) smoothing based on the intermediate concordance measure was proposed. A fractional polynomials (FPL) estimator based on modeling AUC(t) as a function of time was proposed in [10]. To evaluate the methods, the authors in [11] compared different approaches (i.e. [8], [12], [6]) of estimating the concordance index based on the AUC(t) by simulation studies. However, the estimation of the AUC(t) at different followup time points was not investigated in that study.
Though competing risks is an important issue in many practical clinical settings, there is limited research on estimating the I/D AUC(t) in the presence of competing risks. When there is only a single cause of failure, the I/D definition stratifies only the subjects in the riskset. In the presence of competing risks, I/D definition further stratifies the subjects who are still at risk at time t into a single control group and causespecific case groups depending on the cause of failure. Estimation of timedependent measures under competing risks were discussed in [13] and [14]. In [13] a timedependent I/D ROC curve was estimated using a Cox model for the causespecific hazards and riskset reweighting of the marker distribution. This approach is semiparametric, indirect, and computationally intensive. First, it requires correct specification of a conditional hazard regression model linking the marker to the event time. This approach provides biased estimate when the monotonicity of association between marker and event time is violated. Second, in order to obtain the AUC curve from the ROC, numerical integration of the ROC curve is required. Furthermore, within an interval around each unique event time, a Cox model is assumed and the parameters of the sequence of Cox models are estimated for each neighborhood around the unique event times. Therefore, their approach is also timeconsuming. The goal of this manuscript is to propose nonparametric, direct, intuitive and scalable methods to estimate the I/D causespecific AUC(t) of baseline marker, accounting for censoring and competing events. The major advantage of our proposed method over the existing semiparametric estimator will be that it requires no specification of a conditional hazard model linking the marker to the event time and hence robust to model misspecification. In addition, the inference of this measure can be developed under minimal assumptions.
The rest of the article is organized as follows. In “Notation” section, we provide the notation. In “Weighted mean rank estimator of i/D causeSpecific aUC(t) (cWMR)” and “Fractional polynomial estimator of i/D causeSpecific aUC(t) (cFPL)” sections, we introduce a local and a global estimators of causespecific I/D AUC(t) that are direct, flexible and nonparametric. We report simulation results to illustrate our methodology in “Simulation study” section. We report a real data example in ‘Application” section, and put a discussion in “Discussion” section. We end with conclusion in “Conclusions” section.
Methods
Notation
Let M denote the (baseline) marker that could potentially be used in predicting the survival time in the presence of competing risks where a subject can fail due to J mutually exclusive causes. Note that M can be a single covariate or it could be a risk score that may be calculated from a survival model (e.g. proportional hazard model). Let the implicit event times for each of the J causes be \(\{{T}^{(1)},\dots,{T}^{(J)}\}\). In the presence of competing risks, only the time to the occurrence of the first event is potentially observable. Thus, without censoring, one observes T = min \(({T}^{(1)},\dots,{T}^{(J)})\) and the failure indicator δ, which takes value j if \(T={T}^{(j)} (j=1,\dots J)\). Define the observed event time, Z = min (T,C) where C is the censoring time. A censored observation has Z=C and this is recorded by δ=0. Suppose that there are n subjects in the study. Let 1(.) denote the indicator function. Let R_{i}(t)=1{Z_{i}≥t} denote the atrisk indicator for the ith individual at time t (i=1,2,...,n). Let \(\mathbb {R}_{t}=\{i:R_{i}(t)=1\}\) denote the subjects that are in the riskset at time t. Among the subjects in \(\mathbb {R}_{t}\), the subjects who had an event from cause j at t are the jth causespecific incident (I) cases: \(\mathbb {R}^{(j)}_{1t}=\{{i:T_{i}=t,\delta _{i}=j}\}\). The subjects who did not have an event by t are the dynamic (D) controls: \(\mathbb {R}_{0t}=\{i:T_{i}>t\}\). Let n_{t} be the size of \(\mathbb {R}_{0t}\) i.e. \(n_{t}=\mathbb {R}_{0t}\) and \(d^{(j)}_{t}\) be the size of \(\mathbb {R}^{(j)}_{1t}\), \(d^{(j)}_{t}=\mathbb {R}^{(j)}_{1t}\).
If there is a single cause of failure (J=1), the timedependent incident/dynamic area under the ROC curve at time t, I/D AUC(t), is defined as
This is the timedependent probability at time t that for a randomly selected casecontrol pair (i,k) the marker value for the incident case is higher than the marker value for the control.
Timedependent i/D causeSpecific aUC(t)
In the presence of competing risks, the I case and D control definition can be extended as causespecific incident cases and dynamic controls as follows:

jth causespecific case: \(T = t,\delta =j;\,\,\, j=1,2,\dots,J.\)

Control: T>t.
The I/D AUC(t) in (1) can be redefined as the jth I/D causespecific AUC(t) and define as
where \(j=1,2,\dots,J.\) Below we propose a local and a global estimators for the timedependent causespecific AUC(t) curve.
Weighted mean rank estimator of i/D causeSpecific aUC(t) (cWMR)
A nonparametric estimator of I/D AUC(t) was proposed in [9] using a nearest neighbor method. This estimator is a local average of timespecific observed AUCs. While the method do not address the issue of more than one cause of failure, natural modification of the approach allows estimation of accuracy in the presence of multiple causes of failure. To illustrate our proposed method, let A^{(j)}(t) denote the proportion of (i,k) pairs where subject k has a lower marker value compared with that of subject i who experienced failure due to cause j, provided subject k has longer survival than subject i and define as
Note that, A^{(j)}(t) can be considered as an estimator of \(\text {AUC}^{(j)}_{t}\) in (2). However, typically failure time is measured in continuous scale and it is reasonable to assume that at a given time the likelihood of failures due to multiple causes is negligible. Therefore, there are only few cases experiencing the event of interest (e.g. jth cause) at t and often \(d^{(j)}_{t}= 1\). When \(d^{(j)}_{t} = 1\), \( \mathrm {A}^{(j)}(t)=\frac {1}{n_{t}} \sum _{k\in \mathbb {R}^{0}_{t}} \textbf {1}\{M_{i}>M_{k}\mid T_{i}=t,\delta _{i}=j, T_{k}>t\} \) is the rank of the jth causespecific case marker value among the control markers at time t. In extreme situations, this quantity could jump from 1 to 0 and back between adjacent time points. Hence, the estimation of \(\text {AUC}^{(j)}_{t}\) based on A^{(j)}(t) requires some degree of smoothing. In this situation, the information within a neighborhood around t, \({N}^{(j)}_{t}(h_{n})=\{t_{k}: tt_{k}< h_{n},\delta =j\}\) can be used to estimate marker concordance at t. Let \(\text {cWMR}^{(j)}_{t}\) be an jth causespecific weighted mean rank estimator of \(\text {AUC}^{(j)}_{t}\) and define as
where \(N^{(j)}_{t}(h_{n})\) is the size of \(N^{(j)}_{t}(h_{n}).\) The optimal bandwidth (h_{n}) balances bias and variance of \(\text {cWMR}^{(j)}_{t}\). Therefore, to select the bandwidth, we followed the leaveoneout cross validation approach of [9] and adapted it to account for competing risks. In addition, we also propose a variance estimator for \(\text {cWMR}^{(j)}_{t}\) based on the assumption of bivariate Normality of the case and control marker pairs [9]. An additional file shows this in more detail [see Additional file 1].
Asymptotic properties of \(\,\,\text {cWMR}^{(j)}_{t}\):
In order to evaluate the asymptotic behaviours of \(\,\,\text {cWMR}^{(j)}_{t},\) we restrict our attention to a neighborhood around t and cause j i.e. \({N}^{(j)}_{t}(h_{n})=\{t_{k}: tt_{k}< h_{n},\delta =j\}\) with size \(N^{(j)}_{t}(h_{n})=m^{(j)}_{t}\). Other causes can be treated in a similar way. Let B_{t} denote the number of subjects at the start of the neighborhood i.e. at time t−h_{n}.
Theorem 1
Let \(n\left (B_{t}+m_{t}^{(j)}\right)\) denote the number of subjects surviving past t+h_{n}. If \(n\left (B_{t}+m_{t}^{(j)}\right)\rightarrow \infty \) as n→∞ and h_{n}→0 then \(\text {cWMR}^{(j)}_{t}\) converges to a Normal distribution with mean \(\text {AUC}^{(j)}_{t}+b_{n}(t)\) and asymptotic variance \({\mathrm {V}^{(j)}_{n}=\text {Var}[\text {cWMR}^{(j)}_{t}].}\) Here, b_{n}(t) denotes the bias and the variance is
The proof of this theorem is provided in Additional file 1.
Estimation of variance of \(\text {cWMR}^{(j)}_{t}\)
The variance of \(\text {cWMR}^{(j)}_{t},\mathrm {V}^{(j)}_{n}\), has components var[A^{(j)}(t_{(i)})] and cov[A^{(j)}(t_{(i)}),A^{(j)}(t_{(k)})]. In order to compute these components, we propose the following estimators in the spirit of variance calculation of AUC proposed in [15] and [9]
We estimate \(P_{0}^{(j)}(.),P_{1}^{(j)}(.)\) etc. using a Normal approximation for the jth causespecific case and control markers after a rankbased Zscore transformation and then empirically estimating the parameters of the approximating normal distributions. The detail is provided in Additional file 1.
Fractional polynomial estimator of i/D causeSpecific aUC(t) (cFPL)
The authors in [10] proposed a method for modelling I/D AUC(t) defined in equation (1) in the case of single event type. Their method directly models AUC(t) as a function of the event time t through a flexible fractional polynomials model proposed in [16]. We have extended it in the presence of competing risks as follows.
The \(\text {AUC}^{(j)}_{t}\) after transformation with link function η can be specified as a parametric function of t using fractional polynomials of degree L:
where for l=1,2,...,L
with \(p_{1}\leq p_{2}\leq \dots \leq p_{L}\) realvalued powers. As suggested in [10], we consider the power set \(p_{1},\dots,p_{L}\) in {2,1,0.5,0,0.5,1,2}, which is flexible enough to accommodate most applications. The set of regression parameters \(\beta _{j}=(\beta _{j0},\beta _{j1},\dots,\beta _{j7})\) is then estimated by optimizing a likelihood function. A logit function considered for η(.) similar to [10]. Let there are h_{j} number of failures due to cause j in the study. For each event time {t_{(H)} & δ_{H}=j}, there are two types of random variables
and
where n_{1}(t_{(H)}) and n_{2}(t_{(H)}) are the number of concordant and number of discordant pairs, respectively. Note that, conditional on riskset \(\mathbb {R}_{0t_{(H)}}\), the count n_{1}(t_{(H)}) follows a Binomial distribution with probability \(\text {AUC}^{(j)}_{t_{(H)}}(\beta _{j})\). The partiallikelihood for jth cause of failure is
Maximizing this partiallikelihood yields the parameter estimate \(\hat \beta _{j}\) of β_{j}. Then, by using (4), we obtain \(\text {AUC}^{(j)}_{t}(\hat {\beta }_{j}\)) estimate as a smooth function of time t and \(\hat {\beta }_{j}\). For estimation, we evaluate the score equations that correspond to the proposed likelihood. The proposed likelihood is constructed in the spirit of the [10]. However, the main difference is that, unlike [10], we have multiple causes of failure. We can write the loglikelihood in the following way
where for \(j=1,2,\dots,J\)
We estimate \(\hat \beta _{j}\) of β_{j} as a solution of score vector U(β_{j})=0. The elements of U(β_{j}) are
where
Again, for l=1,2,..,7, we get
where
In order to make inferences about the proposed estimators of the causespecific parameters, the major challenge lies in the fact that the proposed partiallikelihood cannot be treated as a regular likelihood function. Specifically, the asymptotic variance of the estimators is not the inverse of the negative second derivative of the partiallikelihood. We propose a sandwich variance estimator for the proposed global causespecific AUC(t) estimator below.
Asymptotic properties of cFPL estimator
In this section, we describe the asymptotic properties of the model parameter estimators. We state some regularity conditions in Additional file 1. We summarize the asymptotic behavior of the regression parameter estimator in the following theorem.
Theorem 2
Under the regularity conditions, \(\hat {\beta _{j}}\) converges almost surely to β_{0j}, while \(\sqrt {n}(\hat {\beta }_{j}\beta _{0j})\) converges to a multivariate Normal distribution with mean vector 0 and covariance matrix \(\Sigma _{j1}^{1}\Sigma _{j2}\Sigma _{j1}^{1}\). Here,
where, \(g_{ik}(\beta _{j})=\frac {(f_{ik}(\beta _{j})+f_{ki}(\beta _{j}))}{2}\) and
where \(N_{i}^{(j)}(\tau)\) counts number of events due to jth cause of failure occurring over [0,τ]. Theorem 2 can be proven in the spirit of the proof in [10]. However, the main difference is that, unlike [10], our likelihood construction account multiple causes of failure. An additional file shows the proof in more detail [see Additional file 1]. The covariance can be consistently estimated by \(\hat {\Sigma }_{j1}^{1}\hat {\Sigma }_{j2}\hat {\Sigma }_{j1}^{1}\), where
and
Corollary 1: Under the regularity conditions, \(\text {AUC}_{t}^{(j)}(\hat {\beta }_{j})\) is a consistent estimator of \(\text {AUC}_{t}^{(j)}\). Furthermore, it follows that \(\sqrt {n}(\text {AUC}_{t}^{(j)}(\hat {\beta }_{j})\text {AUC}_{t}^{(j)}({\beta }_{0j}))\) converges to a Normal distribution with mean 0 and variance \([\frac {d}{d\beta _{j}}\eta ^{1}(\textbf {A}\,{\beta _{j}}^{T})]_{\beta _{j}=\hat {\beta }_{j}}^{2}\,\,\textbf {A}\hat {\Sigma }_{j1}^{1}\hat {\Sigma }_{j2}\hat {\Sigma }_{j1}^{1}\textbf {A}^{T}\), where \(\textbf {A}=(1,t^{p_{1}},t^{p_{2}},\dots,t^{p_{7}}).\)
Results
Simulation study
Extensive simulation studies are conducted in order to compare the performance of the cWMR, cFPL and semiparametric [13] estimators for estimating \(\text {AUC}^{(j)}_{t}\). We assume two causes of failure (i.e. j=1,2) and a baseline marker M that is correlated with event time of cause 1, T^{(1)} but not with event time of cause 2, T^{(2)}. We consider several parametric combinations under two major scenarios. For each setting, we generate 500 dataset with a sample size of n=500 and for each simulated dataset 200 bootstrap simulations are performed. For each simulation, we estimate \(\text {AUC}^{(j)}_{t}\) at predicted time \(\text {log}(t)=1.5,1.2,\dots,0.6\). We report the average of bootstrap mean estimate of \(\text {AUC}^{(j)}_{t}\), absolute relative bias (ARB) of the estimate, average of model based standard error estimate (SE), average of the 500 bootstrap standard errors (BSE) and the coverage probability of the 90% confidence intervals (CI) for the estimates.
Scenario 1
We assume (log(T^{(1)}),M) jointly follows a bivariate Normal distribution (BVN) with correlation ρ i.e. (log(T^{(1)}),M)∼N_{2}(μ_{1},μ_{2},σ_{1},σ_{2},ρ), where μ_{1} and σ_{1} are mean and standard deviation (SD) of log(T^{(1)}) and μ_{2} and σ_{2} are mean and SD of M. We show the results for μ_{1}=0,μ_{2}=0,σ_{1}=1,σ_{2}=1,and ρ=−0.7. We consider a negative correlation between the marker and the event time which implies that higher marker value is more indicative of poor survival outcome and hence it is indicative of shorter event time. We further assume log(T^{(2)})∼N(0,1) and log(C)∼N(0,1), such that approximately 20% subjects are censored. Since T^{(2)} and M are independent, the I/D ROC curve for the competing cause of failure (i.e. cause 2) lies diagonally on the null ROC curve.
Scenario 2
We focus on a heterogeneous population where the marginal relationship between M and T^{(1)} is nonmonotone, while M and T^{(2)} are independent. The heterogeneous population comprised two distinct subgroups (G=0 or 1) and for G=1, M is also independent of log(T^{(1)}). The distribution of (log(T^{(1)}),M) follows a mixture of two BVNs. We show the results for two different parameter combinations:
We assume \(\mathrm {G}\thicksim \) Bernoulli(0.2). Note that the semiparametric approach is biased because of violation of monotonicity in this scenario, and therefore not estimated. The resulting \(\text {AUC}^{(j)}_{t}\) curves mimic the relationship in the LT data as will be demonstrated later.
In Tables 1, 2, and 3, we summarize the simulation results for scenarios 1–2 respectively. Table 1 demonstrates that the estimates of \(\text {AUC}^{(j)}_{t}\) is less biased when derived using cWMR and cFPL compared with the semiparametric method. For instance, for cause 1, the ARB in the estimate of \(\text {AUC}^{(j)}_{t}\)for the cWMR is 0.36% corresponding to predicted time log(t) = 1.5. However, for the semiparametric approach this is 2.16%. For large predicted time (i.e. log(t) = 0.9) both the cFPL and the semiparametric estimates show large ARB. The bootstarp standard errors for both cWMR and cFPL methods are close to their corresponding modelbased standard error estimates. For cWMR, the coverage probabilities of estimated \(\text {AUC}^{(j)}_{t}\) based on 90% estimated confidence intervals are very close to the nominal value of 0.9 across all predicted times. When we have sufficient data around the given predicted times, say 1.5 ≤log(t)≤ 0.6, the coverage probabilities of estimated \(\text {AUC}^{(j)}_{t}\) using cFPL are very close to the nominal value. However, the coverage probability is much lower than the nominal level when riskset size is 4 at given time log(t)=0.9. This is perhaps the issue of oversmoothing. It could be avoided by choosing the large predicted times as the 90th percentile of the observed survival time points [10]. In Tables 2 and 3, we compare the cWMR and cFPL estimates as the semiparametric estimator is known to be highly biased under scenario 2. Note that, in Table 2, the AUC value for cause 1 increases steadily between −1.5≤ log(t) ≤ 0 and then start decreasing. This kind of nonmonotone pattern may be due to violation of monotone relationship between marker and event time. The estimates of \(\text {AUC}^{(j)}_{t}\) for both cWMR and cFPL are close to their corresponding true values when the predicted time is small. In these comparisons cFPL performs slightly better than cWMR. The cFPL method yields substantially greater variances for large values of log(t) compared to the cWMR. For both methods, the estimated coverage probabilities are very close to the nominal coverage probability of 90% except for edges.
Overall, our results demonstrate that both cWMR and cFPL appear to perform well compared to semiparametric approach in terms of bias and standard errors. In addition, in scenario 2 where the monotonicity between marker and event time is violated, the semiparametric approach is known to be biased. However, both our proposed methods perform adequately well.
Application
We demonstrate the proposed methods for estimating \(\text {AUC}^{(j)}_{t}\) using LT data from a retrospective study conducted at the McGill University Health Center [3]. The LT study included 547 patients who underwent LT between 1991 and 2012 and who met the criteria: patient with graft survival >12 months; serum fibrosis biomarkers including FIB4 and NAFLD score available at 1 year after LT; and a minimum followup of 1 year. The study found that serum fibrosis markers performed well in predicting death and graft loss in LT recipients. According to the authors, this is the first study to establish the prognostic value of the fibrosis markers in a large cohort of LT recipients over a longterm followup period. We further analyzed a subset of the subjects (n=423) after excluding subjects with missing outcome and/or marker values. During the study period, 64 patients who underwent LT died due to graftrelated causes (e.g. graft failure). However, 62 patients died of causes that are unrelated to their transplantation (e.g. sepsis, cardiovascular disease, renal, respiratory failure etc.). Different causes of death led to a competing risks situation. The research objective is to evaluate the performance of FIB4 as a marker to discriminate between subjects who died due to graft related causes and those who died of nongraft related causes after the LT. The top two subplots in Figure 1 show the estimated AUC obtained using cWMR and cFPL methods for graftrelated, nongraft related and allcause death (considering both graft and nongraft related death as events of interest). Irrespective of the methods, the AUC curve for allcause mortality is biased compared to the curves for graft and nograft related deaths. The magnitude of bias is downward compared to graft related deaths and the bias is upward compared to nongraft related deaths. It indicates that consideration of allcause mortality as an event of interest instead of competing risks will result in biased accuracy estimates. Therefore, it leads us to analyse the LT data using the methodology proposed here in the presence of competing risks. Figure 1 also illustrates estimated \(\text {AUC}^{(j)}_{t}\) curves with 95% CI for graft and nongraft related causes using cWMR and cFPL methods. In order to estimate \(\text {AUC}^{(j)}_{t}\) using cWMR, the choice of neighborhood was 1.35 years back and forward. This bandwidth of 1.35 years is obtained after minimizing the IMSE. The estimated \(\text {AUC}^{(j)}_{t}\) curves of the FIB4 for the graft related death sustains a high predictive value (above 0.65 for most of the study duration) irrespective of the estimation procedures. For example, over the first 3 years of followup the estimated \(\text {AUC}^{(j)}_{t}\) under cFPL approach ranges between 0.89 to 0.56. This implies that on any day, t, during the first 3 years of followup, the probability that a subject after LT who dies due to graft related causes on day t having a FIB4 value greater than a subject who survives beyond day t is at least 0.56. Overall, \(\text {AUC}^{(j)}_{t}\) curve estimated using cFPL is a smooth function over predicted time while estimated \(\text {AUC}^{(j)}_{t}\) curve of cWMR is less smooth. We could not find any definitive reasons for the causespecific AUC(t) for graft related cause increases between years 3.3 and 8. However, in reference to results from simulation scenario 2, we observed that if there is any latent (or unobserved) heterogeneity in the data, the AUC curve shows nonmonotone trend over time. On the other hand, these two curves for nongraft related death are almost flat around the horizontal line at AUC(t) = 0.5. Furthermore, 95% CI’s of the estimates of AUC contain the null value of 0.5 which implies that FIB4 is noninformative as a prognostic marker for nongraft related events. Therefore, FIB4 as a baseline marker does not discriminate patients with nongraft related deaths after LT, which is expected. In addition, under cFPL method the CIs of the estimates of \(\text {AUC}^{(j)}_{t}\) curves at the tails of the study period are relatively wider/narrower than that of under cWMR method. The CIs of the estimates using cFPL are wider in both small and large predicted times because cFPL may have oversmoothing issue especially towards the start and end of the study. Finally, our analysis indicates a better discrimination by FIB4 for graft related deaths than nongraft related deaths after LT for most of the study duration.
Discussion
Measures of calibration and discrimination are integral parts to evaluate prognostic accuracy of a marker. Calibration indices provide information on how close the predicted risks are to the observed risks while discrimination indices measure whether markers correctly discriminate subjects who will subsequently experience the event of interest at or by time t from those who will not experience any event by t. Calibration measures e.g. an expected Brier score for competing risks and corresponding estimator were provided in [17]. It measures the closeness of the observed event status and the model predicted event probabilities in the presence of competing risks. Here, we primarily focus on the discrimination accuracy of marker. The main goal of this manuscript is to estimate causespecific AUC(t) of a baseline marker in the presence of competing risks. During such estimation, analysts often censor subjects when a competing event occurs. For instance, the outcome in LT study is timetodeath attributable to LTrelated causes, an analyst may consider a subject as censored once that subject dies of causes unrelated to LT. Because subjects who died of non LTrelated causes are not at risk of dying due to LT, censoring these competing risks events (informative censoring) may lead to distorted risk estimates [5] and subsequently biased accuracy estimators. Alternatively, some may consider a composite event where deaths attributable to LT and nonLT deaths are merged together as any adverse events. In the “Application” section, we mimic this to our LT data to show the drawback of using a composite endpoint to demonstrate the importance of considering competing risks in prognostic accuracy estimation. It is evident from our results that simply considering a composite event instead of competing events introduces bias in accuracy estimation. For competing risks analysis, the influence of covariate can be evaluated in relation to causespecific hazard or the subdistribution hazard of different causes of failure. For estimating causespecific hazard, when a subject experience any event, they are removed from the subsequent risksets. In contrast, for estimating subdistribution hazard [18], a subject who experiences a competing event is not removed from the riskset at that time, but rather is censored at the end of the followup.
We propose a local and a global estimators of causespecific AUC(t) for rightcensored survival time outcome in the presence of competing risks. In [13], a semiparametric approach based on Cox model was suggested for estimating causespecific AUC(t). This approach provides bias estimate when the association between event time and marker is nonmonotone. Also, this is not a onestep approach for estimating AUC trajectory over time and it requires longer computation time. In addition, their method lacks analytical development of the large sample properties for statistical inference. These motivate us to propose new estimators for estimating causespecific AUC(t). As pointed out earlier, the observed proportion of controls ranked lower than the (causespecific) case, generally leads to unstable estimates because it is based on a single ’case’ subject who had an event of interest at the specific time. Hence, the estimation of causespecific AUC that is based on observed proportions requires some degree of smoothing. Our proposed estimators implement the degree of smoothing in different ways. The local estimator  causespecific weighted mean rank (cWMR)  is a local average of unsmoothed time specific observed causespecific AUCs within a neighborhood of a given time t. cWMR is sensitive to neighborhood span. The width of the neighbourhood directly influenced the smoothness of the curves especially towards the end of the study when size of the riskset gets very small. We have considered crossvalidation approach to choose optimal neighborhood span. Use of adaptive smoothing techniques, for example, fixing the number of neighbors instead of a fixed bandwidth may be useful. Instead of using local average of concordance within a neighborhood of time, we propose an alternative method based on global curve fitting approach, cFPL, which estimates the causespecific AUC as a function of time through a flexible fractional polynomial function. It expresses the unsmoothed intermediate AUCs as a function of fractional polynomials of time and then estimates the coefficients of polynomials through a partial likelihood optimization. cFPL overcomes the issue of sensitivity to neighborhood span of cWMR. However, oversmoothing is an issue with cFPL in both small and large time points. In terms of computation time, cWMR is computationally efficient compared to cFPL. We also develop the large sample properties of both estimators as well as their corresponding analytical variances. Our simulation study suggests that these two estimators perform very well compared to the existing semiparametric method for measuring causespecific AUC. The performance has been evaluated in terms of absolute relative bias and coverage probability. Between our two proposed approaches, both estimators show similar level of relative bias particularly for small predicted time in simulation studies. However, for large times the cFPL estimator shows large bias compared to the cWMR estimator. In addition, the coverage probability of cFPL estimator is very different from the nominal coverage probability particularly at the edges. In LT study, our goal is to evaluate the performance of FIB4 as a baseline biomarker to discriminate between subjects who died due to graft related causes against those who died of nongraft related causes after the liver transplantation. Our analysis indicates a better discrimination of graft related deaths than nongraft related deaths after LT for most of the study duration. The estimation methods assume that the censoring time is independent of the survival time. It warrants additional research to allow covariate dependent censoring. Furthermore, in our settings, we do not consider timevarying marker. However, the proposed methodology is equally applicable to settings where timevarying marker exist.
Finally, between two estimators, which one is better: cWMR and cFPL? There is no general answer. The former is more adaptive to the local changes while the latter is good for an overall description. Our advice is to perform sensitivity analysis in which the choice of estimation methods vary. Also, for all methods the estimates at the higher time range are unstable, emphasizing the fact that one should have a sufficient number of events for estimation of the causespecific AUC. This problem may be avoided by choosing a sufficiently wide neighborhood (e.g. fixing the number of neighbors) when using the cWMR. For cFPL, we can choose relevant short future time horizon over which we have sufficient causespecific cases for numerically stable results.
Our study has some limitations. The first limitation is related to the bandwidth selection in cWMR approach. To implement the proposed cWMR estimator in practice, the appropriate bandwidths must be chosen. In this manuscript, we have used leaveoneout cross validation approach [9]. It would be interesting to perform sensitivity analysis by varying different bandwidth selectors. Next, our proposed approaches are not applicable when the data have missing information. However, following a typical imputation method, our approach could be applied to the imputed data. Details of the inference and sensitivity to the imputation methods is yet to be explored. Furthermore, the variance calculation of cFPL estimator is computationally intensive. Future research to explore alternate methods for efficient computation may be worthwhile. Moreover, the estimation methods assume that the censoring time is independent of the survival time. Additional research to allow covariate dependent censoring is warranted. In addition, in our settings, we do not consider timevarying marker. However, the proposed methodology is equally applicable to settings where timevarying marker exist. This is to be explored in the future.
Conclusions
We developed estimation procedures of estimating timedependent prognostic accuracy measures for a rightcensored timetoevent outcome in the presence of competing risks. The proposed methods are nonparametric, direct and computationally simple that will overcome the shortcomings of the existing approach. The methods will provide computationally efficient options for assessing the prognostic accuracy of markers for timetoevent outcome.
Availability of data and materials
The liver transplant dataset is not publicly available due to patient privacy restrictions, but could be made available from the corresponding author pending appropriate approval. The R codes for simulations and data analysis are provided in additional files.
Abbreviations
 KDRI:

Kidney donor risk index
 LT:

livertransplantation
 FIB4:

fibrosis score 4
 NAFLD:

nonalcoholic fatty liver disease fibrosis score
 ROC:

Receiver operating characteristic curve
 AUC:

Area under the ROC curve
 I:

Incident cases
 D:

Dynamic Controls
 C:

Cumulative cases
 WMR:

Weighted Mean Rank
 FPL:

Fractional Polynomial
 cWMR:

Weighted Mean Rank Estimator of I/D CauseSpecific AUC(t)
 cFPL:

Fractional Polynomial Estimator of I/D CauseSpecific AUC(t)
 BVN:

bivariate Normal distribution
 ARB:

absolute relative bias
 BSE:

bootstrap standard errors
 CI:

Confidence interval
 SD:

standard deviation
References
Wilson PW, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation. 1998; 97(18):1837–47.
Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, Mulvihill JJ. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. JNCI J Natl Cancer Inst. 1989; 81(24):1879–86.
Bhat M, Ghali P, RolletKurhajec KC, Bhat A, Wong P, Deschenes M, Sebastiani G. Serum fibrosis biomarkers predict death and graft loss in liver transplantation recipients. Liver Transplant. 2015; 21(11):1383–94.
ValletPichard A, Mallet V, Nalpas B, Verkarre V, Nalpas A, DhalluinVenier V, Fontaine H, Pol S. Fib4: an inexpensive and accurate marker of fibrosis in hcv infection. comparison with liver biopsy and fibrotest. Hepatology. 2007; 46(1):32–6.
Prentice RL, Kalbfleisch JD, Peterson Jr AV, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978; 34(4):541–54.
Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. Jama. 1982; 247(18):2543–6.
Heagerty PJ, Lumley T, Pepe MS. Timedependent roc curves for censored survival data and a diagnostic marker. Biometrics. 2000; 56(2):337–44.
Heagerty PJ, Zheng Y. Survival model predictive accuracy and roc curves. Biometrics. 2005; 61(1):92–105.
SahaChaudhuri P, Heagerty P. Nonparametric estimation of a timedependent predictive accuracy curve. Biostatistics. 2013; 14(1):42–59.
Shen W, Ning J, Yuan Y. A direct method to evaluate the timedependent predictive accuracy for biomarkers. Biometrics. 2015; 71(2):439–49.
Schmid M, Potapov S. A comparison of estimators to evaluate the discriminatory power of timetoevent models. Stat Med. 2012; 31(23):2588–609.
Song X, Zhou XH. A semiparametric approach for the covariate specific roc curve with survival outcome. Stat Sin. 2008; 18:947–65.
Saha P, Heagerty P. Timedependent predictive accuracy in the presence of competing risks. Biometrics. 2010; 66(4):999–1011.
Zheng Y, Cai T, Jin Y, Feng Z. Evaluating prognostic accuracy of biomarkers under competing risk. Biometrics. 2012; 68(2):388–96.
Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Medicine; 2003.
Royston P, Altman DG. J R Stat Soc Ser C (Appl Stat). 1994; 43(3):429–53.
Schoop R, Beyersmann J, Schumacher M, Binder H. Quantifying the predictive accuracy of timetoevent models in the presence of competing risks. Biom J. 2011; 53(1):88–112.
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999; 94(446):496–509.
Acknowledgments
The authors thank Professor James A. Hanley at McGill University for many constructive inputs during writeup which lead to improvement of the manuscript.
Funding
We would like to acknowledge our funding agencies. PSC is supported by the Natural Sciences and Engineering Research Council of Canada(RGPIN201706100) and Fonds de la Recherche en Santé du Québec (Junior 1 Salary Award). GS is supported by a Junior 1 and 2 Salary Award from FRQS (27127 and 267806) and research salary from the Department of Medicine of McGill University. The funding agencies had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Authors and Affiliations
Contributions
This article is conceptualized by RD and PSC. RD carried out the methodological work, computer programming, data analysis and writing, under the supervision of PSC. GS provided the data. All authors have reviewed drafts of the manuscript, offered interpretation and critical comment. All authors have read and approved the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The need for informed consent had been waived due to the retrospective nature of the study.The Institutional Research Ethic Board of the Research Institute of McGill University Health Center approved the study, which was conducted according to the Declaration of Helsinki.
Consent for publication
Not Applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional material
Additional file 1
This file includes the proof of asymptotic properties of cWMR and cFPL estimators and their corresponding variance estimation techniques.
Additional file 2
This file includes the computer code (written in R programming language) for estimating causespecific AUC (t) using cWMR and cFPL estimators.
Additional file 3
this file includes description files for using r code provided in Additional file 2 for estimating causespecific aUC(t) using cWMR and cFPL estimators.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Dey, R., Sebastiani, G. & SahaChaudhuri, P. Inference about timedependent prognostic accuracy measures in the presence of competing risks. BMC Med Res Methodol 20, 219 (2020). https://doi.org/10.1186/s12874020011000
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874020011000