Model selection for survival individualized treatment rules using the jackknife estimator
BMC Medical Research Methodology, volume 22, Article number: 328 (2022)
Abstract
Background
Precision medicine is an emerging field that involves the selection of treatments based on patients' individual prognostic data. It is formalized through the identification of individualized treatment rules (ITRs) that maximize a clinical outcome. When the outcome is a time-to-event, the correct handling of censoring is crucial for estimating reliable optimal ITRs.
Methods
We propose a jackknife estimator of the value function to allow for right-censored data with a binary treatment. The jackknife, or leave-one-out cross-validation, approach can be used to estimate the value function and select optimal ITRs using existing machine learning methods. We address the issue of censoring in survival data by introducing an inverse probability of censoring weighted (IPCW) adjustment in the expression of the jackknife estimator of the value function. In this paper, we estimate the optimal ITR using random survival forests (RSF) and the Cox proportional hazards model (COX). We use a Z-test to compare the optimal ITRs learned by RSF and COX with the zero-order model (or one-size-fits-all rule). Through simulation studies, we investigate the asymptotic properties and the performance of our proposed estimator under different censoring rates. We illustrate our proposed method on data from a phase III clinical trial in non-small cell lung cancer.
Results
Our simulations show that COX outperforms RSF for small sample sizes. As sample sizes increase, the performance of RSF improves, in particular when the expected log failure time is not linear in the covariates. The estimator is fairly normally distributed across different combinations of simulation scenarios and censoring rates. When applied to a non-small-cell lung cancer data set, our method selects the zero-order model (ZOM) as the best performing model. This finding highlights the possibility that tailoring may not be needed for this cancer data set.
Conclusion
The jackknife approach for estimating the value function in the presence of rightcensored data shows satisfactory performance when there is small to moderate censoring. Winsorizing the upper and lower percentiles of the estimated survival weights for computing the IPCWs stabilizes the estimator.
Introduction
Precision medicine aims to inform clinical decisions by tailoring treatments to the characteristics of each patient. By using individual information, such as clinical history, lab results, demographics, and social and economic characteristics, precision medicine takes advantage of heterogeneity to provide treatment recommendations to each patient in a data-driven way [1]. Approaches for estimating an optimal treatment rule have been developed using machine learning techniques [2,3,4,5,6]. An optimal treatment rule is the rule, among a class of all possible treatment rules, that maximizes the "value", i.e., the expected reward of a potential outcome when the rule is applied to future patients [2]. Since the value is used to evaluate the performance of a treatment rule, it is important to estimate its bias and standard error accurately. Such estimation is commonly done by cross-validation (CV). The jackknife estimator, or leave-one-out cross-validation (LOOCV), is a special case of CV in which each individual in the sample is a fold: all but one fold are used for training, and the trained model is tested on the left-out fold. The procedure is repeated until every fold has been used as the test set once. The jackknife method requires very few assumptions and is approximately unbiased [7].
Jiang et al. [7, 8] used the jackknife estimator to determine the optimal individualized treatment rules for participants in the Intensive Diet and Exercise for Arthritis (IDEA) trial [9, 10]. For the three treatments for knee osteoarthritis available in the trial (exercise, dietary weight loss, and a combination of the two) and seven outcomes of interest, the authors applied their proposed method to 27 different models, including penalized regression, kernel ridge regression, tree-based methods, list-based methods, and Bayesian additive regression trees. The method identified random forest as the optimal model for most outcomes, while list-based models were optimal for a few. The results supported the overall finding of the IDEA trial that the combined intervention was optimal for most participants. However, one notable finding of the precision medicine approach was evidence that, for certain outcomes, a subgroup of participants would benefit more from diet alone.
In biomedical research, right-censored time-to-event data are frequently collected because of dropout or administrative censoring. In such settings, where the interest is in a survival outcome, parametric and semiparametric methods for estimating an individualized treatment rule have been developed to account for censoring [11,12,13]. Assuming independent or conditionally independent censoring, these methods generally perform well for a small to moderate proportion of censored observations. They are generally categorized as regression-based [14,15,16] or classification-based methods [11, 12]. Regression-based approaches model the mean outcome conditional on treatment assignment and patient data, and their performance depends on the correct specification of the posited models. Classification-based approaches were developed to avoid issues of model misspecification. Within this second class of approaches, the optimal treatment rule is determined by the classifier that minimizes the expected weighted misclassification error. For both types of approaches, an important concern is finding tools for evaluating the performance of an ITR. In this paper, we extend the jackknife method proposed by Jiang et al. [7] to a binary treatment and a right-censored survival outcome. Our extension of the jackknife method uses inverse probability of censoring weighting (IPCW) to mitigate the bias induced by censored observations. We show the consistency of the jackknife estimator for right-censored data and study its asymptotic properties through simulations.
Methods
The jackknife estimator of the value function
In this section, we introduce general notation and present some existing work relevant to this paper. Let \(\{(\mathbf {X}_{\mathbf {i}}, A_i, Y_i )\}_{i=1}^n\) be independent and identically distributed triples \(({\mathbf {X}}, A, Y )\) for n patients. We define \(\mathbf {X} \in \mathcal {X} \subseteq \mathbb {R}^p\) as a vector of baseline patient characteristics, \(A \in \{0, 1 \}\) as the binary treatment indicator, and \(Y \in \mathbb {R}\) as a scalar outcome, with higher values of Y representing more favorable outcomes. Precision medicine seeks to maximize the expected reward of a potential outcome under a treatment rule. A treatment rule maps the patient-level covariate space \(\mathcal {X}\) to the binary treatment space \(\{ 0, 1 \}\). In a single-stage setting, the optimal treatment rule is the rule, among the class of all possible treatment rules \(\mathcal {D}\), that maximizes the value of a potential outcome when applied to future patients [2]. The expected reward or value function is defined as:

\[V(d) = E\left[ \frac{1\{A = d(\mathbf {X})\}}{P(A \mid \mathbf {X})}\, Y \right],\]
where \(P(A \mid \mathbf {X})\) is the propensity score, which is known in a randomized study or can be estimated with a logistic regression or some other model in an observational study. The optimal treatment rule is defined as \(d^* = \arg \max _{d} V(d)\).
In the right-censored survival data setting, we consider \(\tilde{T}\) as the true survival time and \(T= \min (\tilde{T}, \tau )\) as the \(\tau\)-truncated survival time, where \(\tau < \infty\) is the maximum follow-up time. Also, we assume the censoring time C is independent of T given \(({\mathbf {X}}, A)\) and denote the censoring indicator by \(\delta =I(T \le C)\). When the interest is in maximizing the restricted mean survival time, the above value function becomes

\[V(d) = E\left[ \frac{1\{A = d(\mathbf {X})\}}{P(A \mid \mathbf {X})}\, T \right].\]
The optimal treatments are estimated with regression-based or classification-based approaches. Regression-based approaches model the mean outcome conditional on treatment assignment and patient data [3, 17,18,19]. The performance of regression-based approaches depends on the correct specification of the posited models. In addition, modeling treatment-covariate interactions can lead to a high-dimensional situation, further complicating the estimation procedures. Classification-based approaches were developed to avoid issues of model misspecification [4, 5, 20]. This alternative class of approaches relies on fewer modeling assumptions, and the optimal treatment rule is estimated by minimizing the expected weighted misclassification error. The selected approach is then evaluated via cross-validation techniques that assess the value function associated with the estimated treatment rule.
Jiang et al. [7] proposed a LOOCV or jackknife estimator of the value function for continuous outcomes. The approach requires only weak assumptions, namely that the data are independent and identically distributed. The proposed jackknife estimator is expressed as:

\[\widehat{V}^{JK}(\hat{d}_{n}) = \frac{\sum _{i=1}^{n} Y_{i}\, \frac{1\{A_{i}=\hat{d}^{(i)}_{n} (\mathbf {X}_{i})\}}{P(A_{i}\mid \mathbf {X}_{i})}}{\sum _{i=1}^{n} \frac{1\{A_{i}=\hat{d}^{(i)}_{n} (\mathbf {X}_{i})\}}{P(A_{i}\mid \mathbf {X}_{i})}} \qquad (1)\]
where \(\hat{d}^{(i)}_{n}\) represents the treatment rule estimated from a training set of size n with the ith observation left out, and \({P}(A_{i}\mid \mathbf {X}_{i})\) is the propensity score of test observation i. The variance of the estimator in (1) is estimated by

\[\widehat{\mathrm{Var}}\left(\widehat{V}^{JK}\right) = \frac{1}{n^{2}} \sum _{i=1}^{n} \left(R_i^{jk}\right)^{2},\]
where \(R_i^{jk}= \frac{1}{\overline{W}_{n}} U_i - \frac{\overline{U}_{n}}{\overline{W}^2_{n}} W_i\), with \(U_i = Y_{i} \frac{1\left\{A_{i}=\hat{d}^{(i)}_{n} (\mathbf {X}_{i})\right\}}{{P}(A_{i}\mid \mathbf {X}_{i})}\), \(W_i= \frac{1\left\{A_{i}=\hat{d}^{(i)}_{n} (\mathbf {X}_{i})\right\}}{{P}(A_{i}\mid \mathbf {X}_{i})}\), \(\overline{U}_{n}=n^{-1} \sum _{i=1}^{n} U_{i}\), and \(\overline{W}_{n}=n^{-1} \sum _{i=1}^{n} W_{i}\). Through proofs and simulations, Jiang et al. [7] showed consistency and asymptotic normality of the proposed jackknife estimator of the value function. Note that we are ignoring the contribution to the limiting distribution due to estimating the propensity scores. It appears from our simulation study that this omission does not harm the performance of our method for a range of sample sizes. Evaluating this question theoretically is beyond the scope of this paper.
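To make the estimator concrete, the ratio \(\overline{U}_n / \overline{W}_n\) and the residuals \(R_i^{jk}\) can be computed directly once the left-out rules \(\hat{d}^{(i)}_{n}\) are available. The following is a minimal pure-Python sketch; the function name and argument layout are illustrative, not taken from the original paper:

```python
def jackknife_value(Y, A, d_loo, prop):
    """Jackknife (LOOCV) value estimate V = U-bar / W-bar and its
    influence-function-based standard error, for an uncensored outcome.

    Y     : outcomes (higher is better)
    A     : assigned treatments (0/1)
    d_loo : d_loo[i] is the rule fitted without patient i, applied to X_i
    prop  : prop[i] = P(A_i | X_i), propensity of the observed arm
    """
    n = len(Y)
    U = [Y[i] * (A[i] == d_loo[i]) / prop[i] for i in range(n)]
    W = [(A[i] == d_loo[i]) / prop[i] for i in range(n)]
    Ubar, Wbar = sum(U) / n, sum(W) / n
    V = Ubar / Wbar
    # R_i = U_i / W-bar - (U-bar / W-bar^2) * W_i; its sample mean is zero
    R = [U[i] / Wbar - Ubar * W[i] / Wbar ** 2 for i in range(n)]
    se = (sum(r * r for r in R) / n ** 2) ** 0.5
    return V, se
```

When the left-out rules always match the assigned treatment and the propensity score is constant, the estimate reduces to the sample mean of Y, as expected.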
In the presence of dropout or administrative censoring, the unobserved events add complexity to statistical analysis. Specifically, informative censoring (i.e., censoring time that depends on the failure time) may bias estimates of both the survival times and the censoring probabilities. Thus, the higher the proportion of censoring in the data, the more caution is required during estimation. One approach commonly used to address the issues arising from censoring is the inverse probability of censoring weighted (IPCW) estimator. The IPCW was initially developed to address bias in survival probabilities due to censoring [21,22,23,24]. The IPCW estimator corrects for bias by giving zero weight to censored observations and extra weight to uncensored observations. The weights are assigned to observations with characteristics similar to those of the censored observations and are usually estimated by logistic regression or Cox proportional hazards regression. Applying the IPCW recreates a hypothetical scenario of no censoring, and the reweighted sample obtained is asymptotically unbiased. However, IPCW does not address possible model misspecification that may remain when parametric or semiparametric estimation methods are employed.
Jackknife estimator of the value for right-censored data
Definition of the jackknife estimator of the value function for right-censored data
We propose an extension of the jackknife value estimator developed by Jiang et al. [7] to the presence of right censoring. Our jackknife value estimator corrects for the censoring-induced bias by applying the IPCW method to the estimator expressed in (1). The proposed jackknife value estimator has the following form:

\[\widehat{V}^{JK}(\hat{d}_{n}) = \frac{\sum _{i=1}^{n} T_{i}\, \frac{1\{A_{i}=\hat{d}^{(i)}_{n} (\mathbf {X}_{i})\}}{P(A_{i}\mid \mathbf {X}_{i})}\, \frac{\delta _{i}}{S_{c}(T_i \mid \mathbf {X}_{i})}}{\sum _{i=1}^{n} \frac{1\{A_{i}=\hat{d}^{(i)}_{n} (\mathbf {X}_{i})\}}{P(A_{i}\mid \mathbf {X}_{i})}\, \frac{\delta _{i}}{S_{c}(T_i \mid \mathbf {X}_{i})}} \qquad (2)\]
where \(\hat{d}^{(i)}_{n}\) represents the treatment rule estimated from a training set of size n with the ith observation left out, \({P}(A_{i}\mid \mathbf {X}_{i})\) is the propensity score of test observation i, \({S}_{c}(t\mid \mathbf {X}_{i})\) is the probability that the ith observation has not been censored by time t conditional on \(\mathbf {X}_{i}\), and \(\delta _i=I(T_i \le C_i)\). Here, we assume that \({P}(A_{i}\mid \mathbf {X}_{i})\) and \({S}_{c}(t\mid \mathbf {X}_{i})\) are known. In practice, however, \({P}(A_{i}\mid \mathbf {X}_{i})\) and \({S}_{c}(t\mid \mathbf {X}_{i})\) are sometimes unknown and are estimated by regression models. Once obtained, the estimated propensity scores and censoring probabilities are plugged into the expression for the jackknife estimator in (2).
The estimated variance of the jackknife value estimator is

\[\widehat{\mathrm{Var}}\left(\widehat{V}^{JK}\right) = \frac{1}{n^{2}} \sum _{i=1}^{n} \left(R_i^{jk}\right)^{2} \qquad (3)\]
where \(R_i^{jk}= \frac{1}{\overline{W}_{n}} U_i - \frac{\overline{U}_{n}}{\overline{W}^2_{n}} W_i\) is the bias-corrected form of the value function inspired by the influence function, with \(U_i= T_{i} \frac{1\left\{A_{i}=\hat{d}^{(i)}_{n} (\mathbf {X}_{i})\right\}}{{P}(A_{i}\mid \mathbf {X}_{i})} \frac{\delta _{i}}{{S}_{c}(T_i\mid \mathbf {X}_{i})}\), \(W_i= \frac{1\left\{A_{i}=\hat{d}^{(i)}_{n} (\mathbf {X}_{i})\right\}}{{P}(A_{i}\mid \mathbf {X}_{i})} \frac{\delta _{i}}{{S}_{c}(T_i\mid \mathbf {X}_{i})}\), \(\overline{U}_{n}=n^{-1} \sum _{i=1}^{n} U_{i}\), and \(\overline{W}_{n}=n^{-1} \sum _{i=1}^{n} W_{i}\).
We note that, as before, we are ignoring the potential variability due to estimating \({P}(A_{i}\mid \mathbf {X}_{i})\) and \({S}_{c}(t\mid \mathbf {X}_{i})\), but our simulations verify that the approach works acceptably well for the range of sample sizes considered.
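The censored-data estimator differs from the uncensored one only in the per-patient weights: a censored observation (\(\delta_i = 0\)) contributes zero, and each uncensored observation is up-weighted by \(1/S_c(T_i \mid \mathbf{X}_i)\). A minimal pure-Python sketch, assuming the truncated times, event indicators, left-out rules, and (winsorized) censoring survival probabilities are supplied; the names are illustrative:

```python
def jackknife_value_ipcw(T, delta, A, d_loo, prop, Sc):
    """IPCW-adjusted jackknife value estimate and standard error.

    T     : tau-truncated observed times
    delta : event indicators I(T_i <= C_i)
    Sc    : Sc[i] = estimated P(C_i > T_i | X_i), winsorized upstream
    """
    n = len(T)
    U = [T[i] * (A[i] == d_loo[i]) / prop[i] * delta[i] / Sc[i] for i in range(n)]
    W = [(A[i] == d_loo[i]) / prop[i] * delta[i] / Sc[i] for i in range(n)]
    Ubar, Wbar = sum(U) / n, sum(W) / n
    V = Ubar / Wbar
    # influence-function residuals for the variance estimate
    R = [U[i] / Wbar - Ubar * W[i] / Wbar ** 2 for i in range(n)]
    return V, (sum(r * r for r in R) / n ** 2) ** 0.5
```

With no censoring (all \(\delta_i = 1\), all \(S_c = 1\)) this reduces to the uncensored estimator.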
Estimation of the optimal treatment rule
The optimal treatment rule \(d_n^{opt}\) maps the patient-level covariate space \(\mathcal {X}\) to the binary treatment space \(\{ 0, 1 \}\). It is selected, among the class of all possible treatment rules \(\mathcal {D}\), as the rule that maximizes the value of a potential outcome when applied to future patients [2].
A wide range of machine learning approaches has been used to obtain \(\hat{d}_{n}^{opt}\), the estimated optimal treatment rule when the outcome is right-censored. Goldberg and Kosorok [15] proposed a Q-learning algorithm with backward recursion for the multi-stage setting. Zhao et al. [11] developed a doubly robust outcome-weighted learning approach that uses IPCW and requires the correct specification of either the survival or the censoring model. Cui et al. [12] further proposed an extension of outcome-weighted learning that relaxes these requirements by using a tree-based approach. In this manuscript, we opt for random survival forests [25] (RSF) and the Cox proportional hazards model (COX) as our estimation approaches. Random survival forest is one of the most commonly used tree-based methods for survival analysis. Tree-based methods are widely used because they are flexible and make no distributional assumptions about the failure times. Even though Cox proportional hazards models [26] make some assumptions about the distribution of the failure times, they are robust to misspecification and remain the most commonly used regression method for survival analysis [27].
Using each of the selected estimation approaches, we obtain the estimated restricted mean survival times (RMST) [28] corresponding to each treatment option. The optimal treatment for patient i is then the treatment that yields the larger area under the estimated survival curve up to \(\tau\), i.e., the larger RMST.
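This area-under-the-curve comparison can be sketched as follows, assuming each arm's fitted survival curve is summarized as a right-continuous step function (a list of jump times with post-jump survival probabilities); this representation and the function names are our own simplification, not the paper's code:

```python
def rmst(times, surv, tau):
    """Restricted mean survival time: area under a step survival curve
    up to tau. surv[k] is S(t) on the interval [times[k], times[k+1])."""
    area, prev, s = 0.0, 0.0, 1.0
    for t, st in zip(times, surv):
        if t >= tau:
            break
        area += s * (t - prev)  # survival held at s over [prev, t)
        prev, s = t, st
    return area + s * (tau - prev)

def choose_treatment(curve0, curve1, tau):
    """Recommend the arm whose estimated curve has the larger RMST.
    Each curve is a (times, surv) pair for one treatment arm."""
    return int(rmst(*curve1, tau) > rmst(*curve0, tau))
```

For example, a curve that drops from 1.0 to 0.5 at time 1.0 has an RMST of 1.5 when \(\tau = 2\).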
Under mild assumptions, we prove that the proposed jackknife estimator for rightcensored data is consistent.
Consistency of the jackknife estimator for right-censored data
In this section, we prove the consistency of the jackknife estimator. First, we make the following two assumptions.
Assumption 1
\(E[P_{X}(\hat{d}_{n} (\mathbf {X}) \ne \hat{d}_{n-1} (\mathbf {X}) )] \longrightarrow 0\)
as \(n\rightarrow \infty\)
Assumption 2
\(E\Big [\frac{\mathbf {T}^{2} + 1}{P^2(A\mid \mathbf {X}) \; S^2_{c} (T\mid \mathbf {X}) } \Big ] < \infty\)
Assumption 1 states that the treatment rules based on samples of size n and \(n-1\) are asymptotically equal in probability. Assumption 2 requires that the second moment of the outcome, adjusted by the propensity score and the probability of censoring, is finite.
Theorem 1
Given the above assumptions,

\[\widehat{V}^{JK}(\hat{d}_{n}) - V(\hat{d}_{n}) \overset{P}{\longrightarrow } 0 \quad \text {as } n \rightarrow \infty ,\]
when \(P(A\mid \mathbf {X})\) and \(S_{c}(T\mid \mathbf {X})\) are known.
The complete proof is shown in Additional file 1.
Z-test for comparing two ITRs in the presence of right-censored data
We build a test to compare the optimal treatment rules from two estimation approaches. Similarly to Jiang et al. [7], the test statistic is expressed as:

\[Z = \frac{\widehat{V}(\hat{d}_{n,1}) - \widehat{V}(\hat{d}_{n,2})}{\sqrt{n^{-2} \sum _{i=1}^{n} \left(R_{1,i} - R_{2,i}\right)^{2}}} \qquad (4)\]
where \(\widehat{V}(\hat{d}_{n,1})\) and \(\widehat{V}(\hat{d}_{n,2})\) are the estimated value functions for the estimated optimal treatment rules obtained from the two estimation approaches, and \(R_{1,i}\) and \(R_{2,i}\) are the bias-corrected, influence-function-inspired contributions of the ith individual under these treatment rules.
The p-value for the test is obtained as \(p= 2 \int _{|Z|}^{\infty } f(z) \, dz\), where f is the density of the standard normal distribution and \(|Z|\) is the absolute value of the observed test statistic. We define the power of the test as the proportion of simulation replicates whose p-values fall below 0.05.
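The tail probability can be computed from the standard normal CDF via `math.erf`. A minimal sketch of the full comparison, assuming the two value estimates and their influence-function residuals are available; pooling the residual differences into a plug-in standard error is our reading of the procedure, not code from the paper:

```python
import math

def z_test_pvalue(V1, V2, R1, R2):
    """Two-sided p-value for H0: V(d1) = V(d2), from the jackknife value
    estimates and the residuals R_{1,i}, R_{2,i} of the two rules."""
    n = len(R1)
    D = [R1[i] - R2[i] for i in range(n)]
    mean = sum(D) / n
    var = sum((d - mean) ** 2 for d in D) / (n - 1)  # sample variance
    z = (V1 - V2) / math.sqrt(var / n)
    # p = 2 * P(Z > |z|) under the standard normal distribution
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
```

A value difference of 1.96 standard errors yields a p-value of about 0.05, the familiar two-sided threshold.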
Through simulations, we study the asymptotic properties of the jackknife estimator. We provide Q-Q plots for the distribution of the test statistic above and boxplots of the jackknife value estimates obtained after 500 replications.
Simulation studies
Simulation settings
We carry out simulations to assess the performance of the proposed jackknife estimator of the value function. The binary treatment \(A \in \{0, 1 \}\) is assigned with equal probabilities. We generate five covariates \(X_1, \ldots, X_5\) independently from a uniform (U[0, 1]) distribution. We obtain and compare the estimated optimal treatment rules with random survival forests (RSF) and proportional hazards regression (COX). We also compare the RSF to the zero-order model (ZOM), or one-size-fits-all rule, which assigns the best single treatment to all patients. For each estimation approach, using the five generated covariates, we estimate the optimal treatment as detailed in the Methods section. The IPCWs are obtained by Cox proportional hazards models using all five covariates and their interactions with the binary treatment. To control for potential extreme outliers in the estimated values, we apply a \(90\%\) winsorization (trimming the top and bottom \(5\%\)) to the estimated censoring probabilities [29]. The propensity scores are obtained with a logistic regression in which we regress the binary treatment on all five covariates. Although they are known a priori in our simulations, we use the estimated propensity scores to correctly reflect variability in the estimation of the value function. We train the RSF with the R package randomForestSRC [30] with the default tuning parameters. We use the survival [31] package to conduct the proportional hazards regressions.
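The \(90\%\) winsorization step can be sketched as follows; the exact percentile convention used here (the order statistics at the 5th and 95th positions) is one reasonable choice rather than necessarily the authors' implementation:

```python
def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to their empirical 5th and 95th percentiles
    (a 90% winsorization): extreme values are replaced, not dropped."""
    s = sorted(values)
    n = len(s)
    lo = s[int(n * lower_pct / 100)]
    hi = s[min(int(n * upper_pct / 100), n - 1)]
    return [min(max(v, lo), hi) for v in values]
```

Clipping, rather than deleting, the extreme censoring probabilities keeps the sample size intact while preventing a handful of near-zero probabilities from producing enormous IPCWs.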
We consider three simulation scenarios where the true survival (\(\tilde{T}\)) and censoring times (C) are conditionally independent. We use log-transformed survival and censoring times in all our analyses. For each scenario, we perform the simulations 500 times for sample sizes of 200, 400 and 800, and consider average censoring rates of \(0\%\), \(10\%\), \(20\%\) and \(40\%\). For all models, the errors for the failure times (\(\epsilon\)) are generated from a normal distribution with mean 0 and standard deviation 0.2, and the errors for the censoring times (\(\xi\)) are generated from a normal distribution with mean 0 and standard deviation 0.5. For the proportional hazards model, the baseline hazard function is \(\lambda _0 {(t)} = 2t\).
The simulation scenarios are detailed below.
\({\textbf {Scenario 1.}}\): \(\tilde{T}\) is generated from the accelerated failure time model. \(\tau\) is 1.8 and \(\alpha\) is 0.5, 0.22 and \(-0.14\) for \(10\%\), \(20\%\) and \(40\%\) average censoring rates, respectively: \(\log (\tilde{T})= 0.2 - 0.5X_{1} +0.5X_{2} + 0.4X_{3} + (0.3 - 0.1X_{1} - 0.6X_{2} + 0.1X_{3})A + \epsilon\), \(\log (C)= \alpha - 0.1X_1 +0.2X_2 +0.2X_3 + (0.5 - 0.1X_1 - 0.6X_2 + 0.3X_3)A + \xi\). The value of the optimal ITR is approximately 1.105.
\({\textbf {Scenario 2.}}\): \(\tilde{T}\) is generated from an accelerated failure time model with tree-structured effects. \(\tau\) is 8 and \(\alpha\) is 0.5, \(-0.25\) and \(-1.18\) for \(10\%\), \(20\%\) and \(40\%\) average censoring rates, respectively: \(\log (\tilde{T})= X_1 + I(X_2> 0.5) I(X_3 > 0.5) + (0.3 - X_1)A + 2\{I(X_4< 0.3) I(X_5 < 0.3) \}A + \epsilon\), \(\log (C)= \alpha - X_1 +2X_2 +2X_3 + (5 - X_1 - 6X_2 +3X_3)A + \xi\). The value of the optimal ITR is approximately 3.590.
\({\textbf {Scenario 3.}}\): \(\tilde{T}\) is generated from the Cox proportional hazards model. \(\tau\) is 2.5 and \(\alpha\) is \(-0.05\), \(-0.40\) and \(-0.93\) for \(10\%\), \(20\%\) and \(40\%\) average censoring rates, respectively: \(\lambda _{\tilde{T}}{(t \mid A,X)} = \lambda _0 {(t)} \, \text {exp}\{0.2 + 0.75 X_1^{1.5} - 0.25 X_2 + (1.6 - 1.4 X_1^{0.5} - 2.4 X_2^2) A \}\), \(\log (C)= \alpha +0.5X_1 +X_2 +0.3X_3 + 0.1X_4 + (0.1 +0.5X_1 - X_2 +0.3X_3)A + \xi\). The value of the optimal ITR is approximately 1.144.
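Scenario 1 can be generated directly from its accelerated failure time specification. In the sketch below, the minus signs in the linear predictors are our reconstruction of the garbled display in the text, so the exact coefficients should be treated as illustrative:

```python
import math
import random

def simulate_scenario1(n, alpha=0.5, tau=1.8, seed=1):
    """One Scenario-1 data set: log-normal AFT failure and censoring models,
    randomized 1:1 treatment, tau-truncated failure times.
    Returns a list of (X, A, observed_time, delta) tuples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        X = [rng.random() for _ in range(5)]          # X1..X5 ~ U[0, 1]
        A = rng.randrange(2)                          # equal-probability arms
        eps = rng.gauss(0.0, 0.2)                     # failure-time error
        xi = rng.gauss(0.0, 0.5)                      # censoring-time error
        logT = (0.2 - 0.5 * X[0] + 0.5 * X[1] + 0.4 * X[2]
                + (0.3 - 0.1 * X[0] - 0.6 * X[1] + 0.1 * X[2]) * A + eps)
        logC = (alpha - 0.1 * X[0] + 0.2 * X[1] + 0.2 * X[2]
                + (0.5 - 0.1 * X[0] - 0.6 * X[1] + 0.3 * X[2]) * A + xi)
        T = min(math.exp(logT), tau)                  # tau-truncated time
        C = math.exp(logC)
        data.append((X, A, min(T, C), int(T <= C)))   # delta = I(T <= C)
    return data
```

Lowering `alpha` shifts the censoring distribution earlier and raises the average censoring rate, which is how the three censoring levels are tuned.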
One inconvenience of the jackknife estimator is that it quickly becomes computationally intensive as the sample size increases. We mitigate this issue for large sample sizes by performing a partial jackknife (LOOCV) on a random subset of size \(r \le n\) of the original sample. Thus, we compute the jackknife value estimator in (2) for \(n=200\) and \(n=400\) and opt for the partial jackknife for \(n=800\): for each replicate, we apply the jackknife value estimator to a randomly selected subset of \(r=500\) observations from the initial sample of \(n=800\). Through our work with simulated data sets, we observe that the partial jackknife requires up to \(40\%\) less computation time than the "full jackknife" for a sample of size \(n=800\).
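The partial jackknife keeps the full training sets of size \(n-1\) but runs the leave-one-out loop over only \(r\) randomly chosen test indices. A minimal sketch with hypothetical `fit` and `evaluate` callables (both names are ours, standing in for the rule-estimation and test-set evaluation steps):

```python
import random

def partial_jackknife(data, fit, evaluate, r, seed=0):
    """Leave-one-out over a random subset of r test indices: for each chosen
    index i, fit a rule on the other n-1 observations and evaluate it on i."""
    n = len(data)
    test_ids = random.Random(seed).sample(range(n), r)
    results = []
    for i in test_ids:
        rule = fit([data[j] for j in range(n) if j != i])  # train on n-1
        results.append((i, evaluate(rule, data[i])))       # test on i
    return results
```

Only the number of outer iterations shrinks from \(n\) to \(r\); each model is still trained on \(n-1\) observations, which is why the estimator's behavior is largely preserved.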
Results
Table 1 shows the power of the test statistic expressed in (4). We provide a table with the value of the optimal ITR and the estimated values for each scenario under the \(0\%\) censoring case in Additional file 2. We compare the estimated optimal individualized treatment rules obtained from RSF and COX on the one hand, and from RSF and ZOM on the other. RSF outperforms ZOM in all scenarios. RSF tends to show increasing gains over COX as the event-time models become more nonlinear. Figure 1 shows boxplots of the estimated values obtained using each estimation method for all three scenarios.
In scenario 1 and across all censoring mechanisms, COX performs better than RSF for \(n=200\). As the sample size increases, the two estimation approaches show similar performance. As expected, RSF always performs better than ZOM, with performance improving as the sample size increases and declining as the censoring proportion grows.
In scenario 2, where the failure times are generated with a tree-based structure, COX performs slightly better than RSF for \(n=200\). For \(n=400\) both methods perform similarly, and RSF outperforms COX for \(n=800\). We observe a similar pattern to scenario 1 when we compare RSF to ZOM: RSF does better than ZOM as the sample size increases, and the performance decreases as more censoring is introduced into the data.
When the data are generated from a proportional hazards model in scenario 3, we observe similar results to scenario 1. COX consistently performs better than RSF for all censoring mechanisms and across sample sizes, while RSF always does better than ZOM.
We investigated the size of the proposed Z-test under the null hypothesis that the value (or mean outcome) is the same for the two estimation methods being compared. Across all three scenarios, we modified the posited failure time models to make one treatment option clearly more favorable than the other. Under these settings, we assess whether the type I error is preserved for the proposed Z-test. log(C) and \(\tau\) remain the same as in the previously presented simulation scenarios, and we tuned \(\alpha\) to achieve the desired proportion of censored observations in each simulation. We present the slightly modified scenarios in Additional file 3. In Table 2, we present the p-values of the tests, along with the Monte Carlo errors derived from these models. It is important to note that, for scenario 2 with 40% censoring, we do not achieve the desired test size of approximately 0.05. We suspect that, for high censoring rates and a nonlinear failure time model, a sample size larger than those considered in our simulations is necessary to achieve the nominal type I error rate.
We studied the normality of the jackknife estimators under each scenario and censoring mechanism. Figures 2 and 3 contain the Q-Q plots for the jackknife test statistic expressed in equation (4) with \(90\%\) winsorized censoring probabilities. We show the plots for \(0\%\), \(20\%\) and \(40\%\) censoring. When the proportion of censoring is relatively small (\(\approx 10\%\)), the results are analogous to the no-censoring case. Across all scenarios, when there is no or minimal censoring, the test statistic has approximately a standard normal distribution. Even as the censoring proportion increases and the failure times take on more complex structures, the test statistic remains approximately normally distributed. Trimming the top and bottom \(5\%\) of the weights avoids extreme outliers that would cause deviations from normality. In general, departures from normality in the tails are not uncommon in survival analysis when censoring is high.
Data application
We illustrate our proposed method using non-small-cell lung cancer data [32]. The Phase III randomized trial was conducted to investigate the duration of therapy that would maximize survival. Patients with advanced non-small-cell lung cancer were recruited and randomized to either four cycles of carboplatin/paclitaxel or continuous therapy with carboplatin/paclitaxel until disease progression. The trial enrolled 230 participants; our analysis data set contains information on the 224 participants with complete data. In this sample, 115 participants were assigned to continuous therapy with carboplatin/paclitaxel until disease progression and 109 participants were assigned to four cycles of carboplatin/paclitaxel. We consider five covariates in our data analysis: performance status, cancer stage, race, sex, and age. The censoring rate was around 32%.
We apply our proposed method to the analytic sample. We set \(\tau = 500\) days, thus truncating the six highest survival times to 500 days. As in the simulation settings, the IPCWs are estimated by Cox proportional hazards models using all five covariates and their interactions with the binary treatment. The propensity scores are obtained via logistic regression in which we regress the binary treatment on all five covariates. We trained the RSF with default tuning parameters. For each of RSF, COX, and ZOM, we compared the RMSTs of the two treatment groups to obtain the optimal ITR. We then computed the corresponding jackknife value estimates, along with their standard errors. We also computed the Z-tests comparing RSF to COX, RSF to ZOM, and COX to ZOM.
As illustrated by their value estimates (Table 3), RSF performed the worst, followed by COX, with ZOM performing the best. RSF performed worse than either COX (\(p=0.06\)) or ZOM (\(p=0.05\)), and ZOM's value estimate was higher than COX's, although the difference was not statistically significant (\(p=0.46\)).
There are a few possible explanations for why ZOM provided the best performance. First, we may not have a sufficiently diverse pool of patients in our analytic sample; indeed, precision medicine requires heterogeneity to perform well. It appears that tailoring is not needed in this case, and all patients are recommended to receive "continuous therapy with carboplatin/paclitaxel until disease progression". Second, the sample size was not very large, and we used only five tailoring variables in our analysis. Having more features available for thorough variable selection would have benefited RSF, as random forests are known to be highly flexible. ZOM selects "continuous therapy with carboplatin/paclitaxel until disease progression" as the best single treatment, which is consistent with the findings reported from the trial [32].
Discussion
In this article, we proposed an extension of the jackknife method to estimate value functions and optimal treatment rules when outcomes are right-censored survival data. The method shows strong performance for small to moderate censoring rates. When the outcome has a high proportion of censoring, trimming the top and bottom \(5\%\) of the estimated censoring probabilities leads to satisfactory results.
In our method, we used the complete data to estimate the inverse probability of censoring weights and the propensity weights before applying the jackknife method. It was not possible to estimate these weights within the jackknife procedure using test sets of only one observation: we would have had an empty set whenever the left-out observation was censored and thus would have been unable to compute the weights. Alternatively, we could have estimated the weights under settings of no censoring; however, this naive approach also has obvious limitations. While we expect this way of estimating the censoring probabilities to have some effect on our results, our simulations do not make clear how large that effect is.
Future research should investigate the use of leave-five-out or leave-ten-out cross-validation (instead of LOOCV) to increase predictive performance in high-censoring settings. It would also be of interest to determine a systematic procedure for choosing the truncation point needed for the RMST, to improve the method's performance in the presence of high censoring. Finally, a theoretical understanding of why the variability introduced by estimating the propensity scores and the censoring probabilities does not appear to affect the performance of our estimator remains an open question.
Availability of data and materials
The data set analyzed, and the R codes for the simulations performed in the present manuscript are available from the corresponding author on reasonable request.
Abbreviations
COX: Cox proportional hazards model
CV: Cross-validation
LOOCV: Leave-one-out cross-validation
IPCW: Inverse probability of censoring weighting
ITR: Individualized treatment rule
RMST: Restricted mean survival time
RSF: Random survival forest
ZOM: Zero-order model
References
Kosorok MR, Laber EB. Precision medicine. Ann Rev Stat Appl. 2019;6:263–86.
Qian M, Murphy SA. Performance guarantees for individualized treatment rules. Ann Stat. 2011;39(2):1180.
Moodie EE, Dean N, Sun YR. Q-learning: Flexible learning about useful utilities. Stat Biosci. 2014;6(2):223–43.
Kang C, Janes H, Huang Y. Combining biomarkers to optimize patient treatment recommendations. Biometrics. 2014;70(3):695–707.
Zhao Y, Zeng D, Rush AJ, Kosorok MR. Estimating individualized treatment rules using outcome weighted learning. J Am Stat Assoc. 2012;107(499):1106–18.
Laber EB, Zhao YQ. Tree-based methods for individualized treatment regimes. Biometrika. 2015;102(3):501–14.
Jiang X, Nelson AE, Cleveland RJ, Beavers DP, Schwartz TA, Arbeeva L, et al. Precision Medicine Approach to Develop and Internally Validate Optimal Exercise and Weight-Loss Treatments for Overweight and Obese Adults With Knee Osteoarthritis: Data From a Single-Center Randomized Trial. Arthritis Care Res. 2021;73(5):693–701.
Jiang X, Nelson AE, Cleveland RJ, Beavers DP, Schwartz TA, Arbeeva L, et al. Technical Background for "A Precision Medicine Approach to Develop and Internally Validate Optimal Exercise and Weight Loss Treatments for Overweight and Obese Adults with Knee Osteoarthritis". 2020. arXiv preprint arXiv:2001.09930.
Messier SP, Legault C, Mihalko S, Miller GD, Loeser RF, DeVita P, et al. The Intensive Diet and Exercise for Arthritis (IDEA) trial: design and rationale. BMC Musculoskelet Disord. 2009;10(1):1–14.
Messier SP, Mihalko SL, Legault C, Miller GD, Nicklas BJ, DeVita P, et al. Effects of intensive diet and exercise on knee joint loads, inflammation, and clinical outcomes among overweight and obese adults with knee osteoarthritis: the IDEA randomized clinical trial. JAMA. 2013;310(12):1263–73.
Zhao YQ, Zeng D, Laber EB, Song R, Yuan M, Kosorok MR. Doubly robust learning for estimating individualized treatment with censored data. Biometrika. 2015;102(1):151–68.
Cui Y, Zhu R, Kosorok M. Tree based weighted learning for estimating individualized treatment rules with censored data. Electron J Stat. 2017;11(2):3927.
Bai X, Tsiatis AA, Lu W, Song R. Optimal treatment regimes for survival endpoints using a locally-efficient doubly-robust estimator from a classification perspective. Lifetime Data Anal. 2017;23(4):585–604.
Zhao Y, Zeng D, Socinski MA, Kosorok MR. Reinforcement learning strategies for clinical trials in non-small cell lung cancer. Biometrics. 2011;67(4):1422–33.
Goldberg Y, Kosorok MR. Q-learning with censored data. Ann Stat. 2012;40(1):529.
Huang X, Ning J, Wahed AS. Optimization of individualized dynamic treatment regimes for recurrent diseases. Stat Med. 2014;33(14):2363–78.
Murphy SA. Optimal dynamic treatment regimes. J R Stat Soc Ser B (Stat Methodol). 2003;65(2):331–55.
Robins JM. Optimal structural nested models for optimal sequential decisions. In: Lin DY, Heagerty PJ, editors. Proceedings of the Second Seattle Symposium in Biostatistics. New York: Springer; 2004. p. 189–326.
Taylor JM, Cheng W, Foster JC. Reader reaction to "A robust method for estimating optimal treatment regimes" by Zhang et al. (2012). Biometrics. 2015;71(1):267–73.
Zhao YQ, Zeng D, Laber EB, Kosorok MR. New statistical learning methods for estimating optimal dynamic treatment regimes. J Am Stat Assoc. 2015;110(510):583–98.
Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell NP, Dietz K, Farewell VT, editors. AIDS Epidemiology. Boston, MA: Birkhäuser; 1992. p. 297–331. https://doi.org/10.1007/978-1-4757-1229-2_14.
Robins JM. Information recovery and bias adjustment in proportional hazards regression analysis of randomized trials using surrogate markers. In: Proceedings of the Biopharmaceutical Section. Virginia: American Statistical Association; 1993. p. 24–33.
Robins JM, Finkelstein DM. Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000;56(3):779–88.
Bang H, Tsiatis AA. Estimating medical costs with censored data. Biometrika. 2000;87(2):329–43.
Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008;2(3):841–60.
Finkelstein DM. A proportional hazards model for interval-censored failure time data. Biometrics. 1986;42(4):845–54.
Struthers CA, Kalbfleisch JD. Misspecified proportional hazard models. Biometrika. 1986;73(2):363–9.
Uno H, Claggett B, Tian L, Inoue E, Gallo P, Miyata T, et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J Clin Oncol. 2014;32(22):2380.
Ruppert D. Trimming and Winsorization. In: Wiley StatsRef: Statistics Reference Online. Hoboken, NJ: John Wiley & Sons; 2014.
Ishwaran H, Kogalur UB. Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC); 2022. https://cran.r-project.org/package=randomForestSRC. R package version 3.1.0. Accessed 9 Feb 2022.
Therneau TM. A Package for Survival Analysis in R; 2021. https://CRAN.R-project.org/package=survival. R package version 3.2-13. Accessed 19 Jan 2021.
Socinski MA, Schell MJ, Peterman A, Bakri K, Yates S, Gitten R, et al. Phase III trial comparing a defined duration of therapy versus continuous therapy followed by second-line therapy in advanced-stage IIIB/IV non-small-cell lung cancer. J Clin Oncol. 2002;20(5):1335–43.
Acknowledgements
We want to thank Dr. Yingqi Zhao for providing the non-small cell lung cancer data set.
Funding
GDH is supported by NCCIH T32 Research Fellowship Program (T32 AT003378).
Author information
Authors and Affiliations
Contributions
All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1:
Proof for the consistency of the proposed jackknife estimator for right-censored data.
Additional file 2:
Values obtained based on non-censored data according to the true and estimated treatment rules.
Additional file 3:
Simulation scenarios for assessing the Type I error.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Honvoh, G.D., Cho, H. & Kosorok, M.R. Model selection for survival individualized treatment rules using the jackknife estimator. BMC Med Res Methodol 22, 328 (2022). https://doi.org/10.1186/s12874-022-01811-6
DOI: https://doi.org/10.1186/s12874-022-01811-6