Quasi-linear Cox proportional hazards model with cross-$L_1$ penalty

Background: To accurately predict the response to treatment, we need a stable and effective risk score that can be calculated from patient characteristics. When such risks are evaluated from time-to-event data with right-censoring, Cox's proportional hazards model is the most popular method for estimating a linear risk score. However, the intrinsic heterogeneity of patients may prevent us from obtaining a valid score: a regression model with a single linear predictor can be insufficient.

Methods: We propose a model with a quasi-linear predictor that combines several linear predictors. This provides a natural extension of the Cox model that leads to a mixture hazards model. We investigate the properties of the maximum likelihood estimator for the proposed model. Moreover, we propose two strategies for obtaining interpretable estimates. The first is to restrict the model structure in advance, based on unsupervised learning or prior information, and the second is to obtain as parsimonious an expression as possible in the parameter estimation with the cross-$L_1$ penalty. The performance of the proposed method is evaluated by simulation and application studies.

Results: We show that the maximum likelihood estimator is consistent and asymptotically normal, and that the cross-$L_1$-regularized estimator is root-$n$ consistent. Simulation studies confirm these properties empirically, and application studies show that the proposed model improves predictive ability relative to the Cox model.

Conclusions: Capturing the intrinsic heterogeneity of patients is essential for obtaining a stable and effective risk score. The proposed hazards model can capture such heterogeneity and achieves better performance than the ordinary linear Cox proportional hazards model.
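To make the idea of a quasi-linear predictor concrete, here is a minimal sketch in Python. It assumes the predictor is a log-sum-exp combination of $K$ linear predictors (so that the resulting hazard is a mixture of the component hazards); the function name, the equal component weights, and the tuning parameter `tau` are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def quasi_linear_predictor(X, betas, tau=1.0):
    """Log-sum-exp combination of K linear predictors (illustrative form).

    X     : (n, p) covariate matrix
    betas : (K, p) coefficient vectors, one per component predictor
    tau   : nonzero mixing parameter; tau -> 0 recovers the average of
            the linear predictors, large tau approaches their maximum
    """
    Z = tau * (X @ betas.T)              # (n, K) scaled linear scores
    m = Z.max(axis=1, keepdims=True)     # stabilize the exponentials
    lse = m + np.log(np.exp(Z - m).mean(axis=1, keepdims=True))
    return lse.ravel() / tau
```

As `tau` approaches zero the score reduces to the average of the linear predictors, while a large `tau` makes the score track whichever component predictor is largest for a given patient; this is how such a combination can adapt to heterogeneous subgroups that a single linear predictor cannot capture.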


Appendix B: Proof of Proposition 2
Proof. Let $\beta$ and $\gamma$ be in $\mathcal{R}_c$ and, for $q \in [0,1]$, consider the vector $\beta_q = q\beta + (1-q)\gamma$. Then $\beta_q \in \mathcal{R}_c$, and we conclude that $\mathcal{R}_c$ is a convex set for all $c \geq 0$.
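The step establishing $\beta_q \in \mathcal{R}_c$ is lost in the extraction above. A generic version of the argument, under the assumption (not stated here) that $\mathcal{R}_c$ is the sublevel set $\{\beta : P(\beta) \le c\}$ of a convex penalty $P$, would run as follows.

```latex
\[
P(\beta_q) = P\bigl(q\beta + (1-q)\gamma\bigr)
\le q\,P(\beta) + (1-q)\,P(\gamma)
\le q\,c + (1-q)\,c = c,
\]
\[
\text{hence } \beta_q \in \mathcal{R}_c \text{ for every } q \in [0,1].
\]
```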
Proof of Theorem 1 (1). Let $X_n(\theta, t)$ denote $n^{-1}$ times the difference in log partial likelihoods over $[0, t]$, evaluated at the parameter $\theta$ and at the true value $\theta_0$. In terms of the counting processes $N_i$ it is written as
\[
X_n(\theta, t) = \frac{1}{n} \sum_{i=1}^n \int_0^t \log \frac{r(x_i, \theta)}{r(x_i, \theta_0)} \, dN_i(s)
- \frac{1}{n} \int_0^t \log \frac{S^{(0)}(\theta, s)}{S^{(0)}(\theta_0, s)} \, d\bar{N}(s),
\]
where $\bar{N} = \sum_{i=1}^n N_i$. Let $M_i$ be the martingale process of $N_i$. By the relationship $N_i = \Lambda_i + M_i$, we get
\[
X_n(\theta, t) - A_n(\theta, t) = \frac{1}{n} \sum_{i=1}^n \int_0^t H_i(\theta, s) \, dM_i(s),
\qquad
H_i(\theta, s) = \log \frac{r(x_i, \theta)}{r(x_i, \theta_0)} - \log \frac{S^{(0)}(\theta, s)}{S^{(0)}(\theta_0, s)},
\]
where $A_n(\theta, t)$ is the corresponding compensator evaluated at $\theta_0$. There exists $\epsilon > 0$ such that $r(x_i, \theta) > \epsilon$ for all $\theta \in \Theta$, so $H_i$ is locally bounded. Because $Y_i$ is a left-continuous adapted process, $Y_i$ is predictable, which implies that $H_i$ is predictable. Therefore, $H_i$ is a locally bounded $\mathcal{F}_t$-predictable process.

Moreover, $M_i = N_i - \Lambda_i$ is a local square integrable martingale by Theorem 2.3.1 in [3]. Hence, by Theorem 2.4.5 in [3], $X_n(\theta, t) - A_n(\theta, t)$ is a local square integrable martingale, and by Theorem 2.5.2 in [3] its variance process $B(\theta, t)$ is given by
\[
B(\theta, t) = \frac{1}{n^2} \sum_{i=1}^n \int_0^t H_i(\theta, s)^2 \, Y_i(s) \, r(x_i, \theta_0) \, \lambda_0(s) \, ds.
\]
Expanding the square $H_i^2$, we get that
\[
B(\theta, 1) = \frac{1}{n^2} \sum_{i=1}^n \int_0^1 \left\{ \log \frac{r(x_i, \theta)}{r(x_i, \theta_0)} \right\}^2 Y_i(s) \, r(x_i, \theta_0) \, \lambda_0(s) \, ds
- \frac{2}{n^2} \sum_{i=1}^n \int_0^1 \log \frac{r(x_i, \theta)}{r(x_i, \theta_0)} \log \frac{S^{(0)}(\theta, s)}{S^{(0)}(\theta_0, s)} Y_i(s) \, r(x_i, \theta_0) \, \lambda_0(s) \, ds
+ \frac{1}{n} \int_0^1 \left\{ \log \frac{S^{(0)}(\theta, s)}{S^{(0)}(\theta_0, s)} \right\}^2 S^{(0)}(\theta_0, s) \, \lambda_0(s) \, ds. \tag{A.4}
\]
The final integral in (A.4) converges in probability to zero by conditions A, B and C on $S^{(0)}$. By the Schwarz inequality, the middle integral converges in probability to zero if the first integral does so. The first integral also converges to zero, as follows. First, the Taylor expansion about $\theta_0$ of $\log r(x_i, \theta)/r(x_i, \theta_0)$ is written as
\[
\log \frac{r(x_i, \theta)}{r(x_i, \theta_0)} = (\theta - \theta_0)^\top \frac{\partial}{\partial \theta} \log r(x_i, \theta^*),
\]
where $\theta^*$ is between $\theta$ and $\theta_0$. Substituting this expansion into the first integral splits it again into three terms. The first of these converges in probability to zero by conditions A and B on $S^{(1)}$, in accordance with $S^{(5)} = S^{(1)}$. By the Schwarz inequality, the middle term converges in probability to zero if the final term does so, and the final term converges in probability to zero by condition D. Therefore, $B(\theta, 1)$ converges in probability to zero. By an inequality of Lenglart in [1], $X(\theta, 1)$ converges in probability to the same limit as $A(\theta, 1)$ for every $\theta \in \Theta$.

By conditions A and B, the limit of $A(\theta, 1)$ exists. As discussed in [1] and [2], we can take the first and second derivatives of this limit by taking partial derivatives inside the integral. The first derivative is equal to zero at $\theta = \theta_0$, and the second derivative at $\theta = \theta_0$ equals $-\Sigma$, where $\Sigma$ is, by condition C, a positive-definite matrix. Here, we note that $S^{(6)}(\theta_0, t) = S^{(3)}(\theta_0, t) - S^{(2)}(\theta_0, t)$. It now follows that $X(\theta, 1)$ converges in probability to a concave function $h$ of $\theta$ with a unique maximum at $\theta = \theta_0$. We find that $\hat{\theta} \xrightarrow{p} \theta_0$ by Appendix 2 in [1]. This completes the proof.

Proof of Theorem 1 (2). Consider the score statistic
\[
U(\theta) = \frac{\partial}{\partial \theta} \log L(\theta)
\]
and the Taylor series expansion of $U$ centered at the true value $\theta_0$ of $\theta$,
\[
U(\hat{\theta}) = U(\theta_0) - I(\theta^*)(\hat{\theta} - \theta_0),
\]
where $I(\theta^*)$ is the observed information matrix at $\theta^*$, which lies on the line between $\theta_0$ and $\hat{\theta}$. Because $U(\hat{\theta}) = 0$,
\[
\sqrt{n}\,(\hat{\theta} - \theta_0) = \left\{ n^{-1} I(\theta^*) \right\}^{-1} n^{-1/2}\, U(\theta_0).
\]
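The score and observed information that drive the argument in the proof of Theorem 1 (2) are easy to check numerically for the ordinary linear Cox model. Below is a minimal, self-contained Python sketch (not the paper's code; the simulation design and all function names are illustrative) that computes $U(\theta)$ and $I(\theta)$ for a linear predictor, solves $U(\hat{\theta}) = 0$ by Newton's method, and lets one watch $\hat{\theta}$ approach $\theta_0$ as $n$ grows.

```python
import numpy as np

def cox_score_info(beta, X, time, event):
    """Score U(beta) and observed information I(beta) of the Cox
    log partial likelihood (Breslow treatment of the risk sets)."""
    order = np.argsort(time)
    X, event = X[order], event[order]
    r = np.exp(X @ beta)                                 # relative risks
    # Reverse cumulative sums give the risk-set sums S^(0), S^(1), S^(2).
    s0 = np.cumsum(r[::-1])[::-1]
    s1 = np.cumsum((r[:, None] * X)[::-1], axis=0)[::-1]
    s2 = np.cumsum((r[:, None, None] * X[:, :, None] * X[:, None, :])[::-1],
                   axis=0)[::-1]
    xbar = s1 / s0[:, None]                              # S^(1)/S^(0)
    U = ((X - xbar) * event[:, None]).sum(axis=0)        # score vector
    V = s2 / s0[:, None, None] - xbar[:, :, None] * xbar[:, None, :]
    I = (V * event[:, None, None]).sum(axis=0)           # observed information
    return U, I

def fit_cox(X, time, event, iters=25):
    """Solve U(theta) = 0 by Newton's method, mirroring the Taylor
    expansion used in the proof of Theorem 1 (2)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        U, I = cox_score_info(beta, X, time, event)
        beta = beta + np.linalg.solve(I, U)              # Newton step
    return beta

rng = np.random.default_rng(0)
theta0 = np.array([1.0, -0.5])                           # true parameter
for n in (200, 2000, 20000):
    X = rng.normal(size=(n, 2))
    T = rng.exponential(1.0 / np.exp(X @ theta0))        # hazard exp(x' theta0)
    C = rng.exponential(2.0, size=n)                     # independent censoring
    time, event = np.minimum(T, C), (T <= C).astype(float)
    print(n, fit_cox(X, time, event))
```

The printed estimates should tighten around $\theta_0 = (1.0, -0.5)$ as $n$ increases, the empirical counterpart of the consistency statement in Theorem 1 (1).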