Comparing survival curves based on medians

Background Although some nonparametric methods have been proposed in the literature to test for the equality of median survival times for censored data in medical research, in general they have inflated type I error rates, which make their use limited in practice, especially when the sample sizes are small. Methods In this paper, we propose a new nonparametric test with a simple test statistic to compare median survival times. Results The results from our comprehensive simulation study show the new test controls type I error rate very well under all situations considered, even for small sample sizes. In addition, it has comparable detecting power compared to existing methods. Another advantage of the proposed test is its relatively simple formula that requires less computation. Conclusions We propose a new statistical method for comparing survival curves based on their medians. The new method can be easily implemented and applied to censored event time data.


Background
In analyzing data with censored observations, it is a common task to compare several survival curves. Many nonparametric tests based on the Kaplan-Meier estimator for survival curves [1], such as Gehan's generalized Wilcoxon test, the Cox-Mantel test, the logrank test, and Peto tests, have been proposed and implemented in major statistical software [2]. In addition, Efron proposed a test based on the conditional probability of censored data for two samples cases [3,4].
Due to the skewness of survival data, the median survival time is thought to be an important measurement of the survival distribution. If two crossing survival curves [5] are different but their median survival times are similar, then comparing the survival medians or quantiles rather than the curves is more appropriate to answer some research questions. Several nonparametric tests for comparing median survival times have been proposed in the literature [6][7][8][9][10][11]. Brookmeyer and Crowley proposed a test (called BC test hereafter) based on the weighted survival curve and the weighted median [11]. In their test, the individual survival curve for each group and their weighted survival median time need to be estimated by using Kaplan-Meier estimator; then the survival probabilities from all groups at the time point of the estimated weighted survival median are compared. The authors showed that the test statistic has an asymptotic chi-square distribution with degrees of freedom equal to the number of groups minus one, under the assumption that all of the groups have the same survival distribution. In order to get this test statistic, we need to estimate a covariance matrix and calculate its generalized inverse, which may be a tedious task [8]. Another limitation of the BC test is its inflated type I error rate, which may cause serious problems when sample sizes are small [9]. Rahbar et al. proposed another nonparametric test (called RC test hereafter) which is directly based on the estimated median survival times [9]. Although some methods of estimating the median survival time, such as Reid's method [12], have been proposed, we found that RC method using those approaches to estimate the variances had poor performance. Alternatively, resamplingbased methods, such as bootstrap can be used to estimate the variances. Through simulation study, the authors showed that the RC test performs similarly to the BC test when sample sizes are large. However, when sample sizes are medium (e.g., 50 per group), although the RC test has empirical type I error rates less than the BC test, they are still inflated.
Motivated by avoiding calculating the inverse matrix in the BC test statistic, recently Tang and Jeong proposed a simple nonparametric test (called TJ test hereafter) based on the estimated conditional probability of the censored time which is greater than the overall median [4,8]. Their test statistic is a weighted sum of statistics obtained from a series of contingency tables. The authors claimed that their test statistic under the null hypothesis could be approximated by a chi-square distribution [8].
In this paper, we propose a new nonparametric test with a simple formula. Through comprehensive simulation, we show that the proposed test controls type I error rate very well, even for situations where sample sizes are very small. Our simulation results also show that the existing methods have inflated type I error rates, which is very serious when sample sizes are small. We also illustrate the use of the proposed test by a real data application.

Method
Denoted by X i1 ; X i2 ; …; X in i (i =1, 2,…, K) the n i independent random survival times from the i th group. For right censored data, since some of the survival times may not be observed completely, the data are recorded in the form (Z ij , δ ij ), j = 1, 2, …, n i , i = 1,2,…, K, where Z ij = Min(X ij , Y ij ), δ ij = I(X ij ≤ Y ij ), Y ij is the censoring time of the j th subject in the i th group, and I (⋅) is the indicator function. In this paper, we assume the generic random variables X and Y are independent with survival distributions F i (x) = P(X ij > x), G i (y) = P(Y ij > y), respectively. The median survival time of F i is defined as θ i = inf(t : F i (t) ≤ 0.5). It can be estimated byθ i ¼ inf ðt :F i ðtÞ≤0:5Þ, whereF i is the Kaplan -Meier estimator of F i [1].
The proposed test statistic is as follows: where η i ¼F i ðθ 0 Þ,θ 0 is the overall sample median survival time obtained from the pooled data set, w i ¼ 1=σ 2 i , σ 2 i is a consistent estimate for variance of η i , and For the test statistic C in (1), we have the following theorem: Theorem 1. Under the null hypothesis that the K survival curves have the same median, statistic C has an asymptotic chi-square distribution with K-1 df.
Proof: It is known that under the null hypothesis, η i has an asymptotic normal distribution N(0.5, σ i 2 ) [11]. Then from Theorem 1 of Chen et. al [13] and Slutsky's theorem, j¼1 h j η j Þ 2 has an asymptotic chi-square distribution with df = K-1 givenσ 2 i is a consistent estimate for the variance of η i .

Variance estimate for η i
Although the Greenwood method [14] can be used to estimate the variance ofF iθ 0 , in general, the Greenwood approach tends to underestimate the variance [15]. In fact, it is well known that With Y as the estimated survival function and X as the survival data, the variance decomposition can be applied. The Greenwood estimate is an approximation of E[V(Y|X)]. Therefore if we ignore the second part, V[E(Y|X)], the Greenwood estimate will tend to underestimate the variance. It is also known that a Cochran-type test statistic such as C in (1) is anticonservative [13] and is an increasing function of the weights w i [16]. Therefore, if we replaceσ 2 i in C by the Greenwood estimate, the type I error rate will become inflated [13].
For the first part, E[V(Y|X)], we will use the Greenwood estimate; the second part, V[E(Y|X)], can be estimated using bootstrap. However, here we propose a simple approximation to it: for group i, we estimate the median survival timeθ i using the Kaplan-Meier method and find the uncensored timeθ i1 which is the closest one toθ i , then we obtain the variance betweenF iθ i . This approximation works well because either of the two times will be close to the median from the bootstrap. It is easily seen that this estimated variance has an order of n i − 2 . Hence, for large sample sizes, this part can be ignored and the new test statistic will have a similar value as the BC test statistic. Based on the simulation results (see below), when sample sizes are greater than 100 (as in Tables 1 and 2), BC and the proposed test have similar empirical type I error rates, indicating that under those situations it is safe to ignore the second part. Nevertheless, when sample sizes are smaller than 30 (see Tables 4 and 5), the second part should not be ignored, as it will result in serious inflated type I error rates (see the Results section). Table 1 The empirical type I error rates of the tests at the significance level of 0.05 when the survival times have the same uniform, exponential, or log-normal distributions with sample sizes n 1 = 100, n 2 = 150, n 3 = 150, n 4 = 200 (simulation study with 10,000 replicates for each situation) Distributions Table 2 The empirical type I error rates of the tests at the significance level 0.05 when the survival times have uniform, exponential, log-normal,uniform and exponential, uniform and log-normal, or exponential and log-normal distributions with equal medians but unequal variances and sample sizes n 1 = 100,

Simulation study
To assess the performance of the proposed test, we conducted a comprehensive simulation study to compare it with the BC test and TJ test. RC is excluded in the simulation study since it has similar performance as the BC test in general, but has worse performance when sample sizes are small [6]. All of the simulation were conducted by using R, with the R package "survival" [17].
In this simulation study, the alternative hypothesis is the medians are not the same among several populations. We choose K = 4 groups and considered the following survival time distributions (F i )): (a) all uniform, U(−a i /c i , a i /c i ) + θ i , (b) all exponential, exp(a i ) + θ i , (c) all log-normal, lnorm(θ i , a i ), (d) two uniform and two exponential, (e) two uniform and two log-normal, and (f ) two exponential and two log-normal. The censoring time , respectively, when the survival time distributions are uniform, exponential, and log-normal. Here p i is the censoring rate, which is set to be 0 (noncensoring), 0.1, 0.2, and 0.3 in the simulation. The type I error rates and power values are estimated by the empirical rejection proportions from 10,000 replicates at preset significance level 0.05. We compare these tests under both large sample sizes (n 1 = 100, n 2 = 150, n 3 = 150, n 4 = 200) and small sample sizes (n 1 = 20, n 2 = 25, n 3 = 25, n 4 = 30) situations.
Tables 1, 2 and 3 are the simulation results under the situations of large sample sizes. Table 1 reports the empirical type I error rates for these methods when all of the four groups have the same survival time distribution and the same censoring time distribution. It shows that the BC test and the proposed test control type I error rate very close to the nominal level. The TJ test can control type I error rate only when the censoring rate is 0. With the censoring rate increases, the TJ test has larger inflated empirical type I error rates. The empirical type I error rate can be as large as 0.114, for example, when the survival time distribution is uniform and the censoring rate is 0.3. Table 3 The empirical powers of the tests at the significance level 0.05 when the survival times have uniform, exponential, lognormal,uniform and exponential, uniform and log-normal, or exponential and log-normal distributions with unequal medians and sample sizes n 1 = 100, n 2 = 150,    Table 5 The empirical type I error rates of the tests at the significance level 0.05 when the survival times have uniform, exponential, log-normal,uniform and exponential, uniform and log-normal, or exponential and log-normal distributions with equal medians but unequal variances and sample sizes n 1 = 20, n 2 = 25, n 3 = 25, n 4 = 30 Distributions and parameters Censoring rate BC TJ New  Table 2 gives the empirical type I error rates for each method when the four survival distributions have the same medians but with different variances. Once again, the BC test and the new test have empirical type I error rates close to the nominal level 0.05. On the other hand, the TJ test still has inflated sizes when there are censored data present. Table 3 shows the empirical powers for each method when the four median survival times are not the same (i.e., the null hypothesis is not true). In general, the BC test and the proposed test have close power values. When there are censored data, the TJ test obtains larger power values. This may be because this test has inflated type I error rates.
Tables 4, 5 and 6 report the simulation results for small sample sizes (n 1 = 20, n 2 = 25, n 3 = 25, n 4 = 30) and different levels of censoring. In Table 4 where the null hypothesis is true and the survival time and censoring time each has the same distribution for the four groups, the proposed test controls type I error rate well. Unlike its performance under large sample size situations, when sample sizes are small, the BC test also has inflated type I error rates even for noncensored data. Once again the TJ test cannot control the type I error rate when the censoring rate is not 0.
When the four survival distributions have the same median with different variances, the empirical type I error rates of these tests are given in Table 5. If there is no censored data, the TJ test has empirical type I error rates close to the nominal level; but the BC test still has inflated type I error rates. When there are censored data, both the BC test and the TJ test have inflated type I error rates. In contrast, the proposed test can control type I error rate quite well under all situations considered. Table 6 reports the empirical powers for the three methods when the null hypothesis is not true. In general, the empirical powers from the new test are smaller than those from the BC test and the TJ test. When the censoring rate is high (e.g., 0.3), the power values for the TJ test are higher than that from the BC test. This can be explained by their performance of controlling type I Table 6 The empirical powers of the tests at the significance level 0.05 when the survival times have uniform, exponential, lognormal,uniform and exponential, uniform and log-normal, or exponential and log-normal distributions with unequal medians and sample sizes n 1 = 20, n 2 = 25, n 3 = 25, n 4 = 30

Real data application
To illustrate the use of the proposed test, we conduct a real data application. In a study to investigate the relationship between diet and the development of tumors, 90 rats were divided into three groups and 30 rats in each group were fed with the same low-fat, saturated fat or unsaturated fat. The rats had the same age, species and similar physical condition. The rats were observed for 200 days, and the times to develop a tumor were recorded. This data set is available from Table 3.4 of Lee and Wang's book [2]. The estimated median survival times (tumor-free time) were 191, 107, and 91 days for the low-fat, saturated fat and unsaturated fat groups, respectively. The survival curves (not shown) estimated by the Kaplan-Meier method for the three diet groups indicate the expected tumor-free time for rats fed with low-fat is longer than that for the rats fed with saturated fat, which in turn is longer than that for the rats fed with unsaturated fat. We compare the median survival times for these three groups using the BC test, TJ test and the proposed test. The p-values from the three tests are 0.0059, 0.017, and 0.0043, respectively. All of the three methods reject the null hypothesis at significance level 0.05. In addition, when testing for the equality of survival curves, the p-values from the log-rank test and Wilcoxon test are 3.14 × 10 -5 and 0.0020, respectively.

Discussion and conclusion
Although several nonparametric tests have been proposed to compare the median survival times, they are anticonservative which make their application limited, especially for the situations where the sample sizes are small. To overcome this shortcoming, in this paper, we proposed a new nonparametric approach to testing for the equality of median survival times. Our comprehensive simulation study shows that, unlike existing methods, the new test controls type I error rate very well under all situations considered, even if the sample sizes are small. On the other hand, the detecting power of this new method is comparable to BC when both can control type I error rate (i.e., large sample size). TJ method usually has larger power values; however, this mainly because it has inflated type I error rate, even for large sample sizes (see Tables 1  and 2). The BC test has similar power values as the proposed test when sample sizes are large; but it has inflated type I error rate when sample sizes are small. Therefore, the proposed test is preferred over the BC test.
The new test statistic has a very simple formula; it neither requires calculating a matrix and its inverse, nor relies on the time consuming resampling-based methods, such as bootstrap. Therefore, unlike those resampling based methods, this test needs less computation and can be easily implemented. Furthermore, although the proposed test is designed for detecting the difference of the median survival times, it can be extended without any difficulty to compare survival quantiles. In addition, the proposed test is also suitable for uncensored data.