Two-stage optimal designs with survival endpoint when the follow-up time is restricted

Background A survival endpoint is frequently used in early phase clinical trials as the primary endpoint to assess the activity of a new treatment. Existing two-stage optimal designs with survival endpoint either overestimate the sample size or compute power outside the alternative hypothesis space. Methods We propose a new single-arm two-stage optimal design with survival endpoint that uses the one-sample log-rank test based on exact variance estimates. The proposed design, which has restricted follow-up, is analogous to Simon's two-stage design with binary endpoint. Results We compare the proposed design with existing two-stage designs, including the two-stage design with survival endpoint based on the nonparametric Nelson-Aalen estimate, and Simon's two-stage designs with or without interim accrual. The new design always performs better than these competitors with regard to the expected total study length, and it requires a smaller expected sample size than Simon's design with interim accrual. Conclusions The proposed two-stage minimax and optimal designs with survival endpoint are recommended for use in practice to shorten the study length of clinical trials.


Background
A multiple-stage design is often preferable in early phase clinical trials to investigate the activity of a new treatment. Such a design protects patients better than the traditional one-stage design by allowing a trial to be stopped early when the new treatment is ineffective. For this reason, early stopping for futility is always allowed in these trials. Among multiple-stage designs, the two-stage design is widely used in phase II clinical trials, whose sample sizes are relatively small compared with those of the subsequent phase III trials that confirm the effectiveness of the new treatment(s).
When the outcome is binary (e.g., response vs. non-response), Simon's two-stage minimax and optimal designs are widely used in practice [1–8]. When the required number of patients in the first stage is enrolled, a trial generally has to be suspended temporarily to allow these patients to complete the treatment schedule. After that, the first-stage data are analyzed to decide whether the trial proceeds to the second stage. This suspension can lead to a longer study time than that of the modified Simon's two-stage design with interim accrual [9]. Recently, adaptive versions of Simon's two-stage design have been proposed to improve the flexibility of trials [3, 4, 10–12]. In such trials, the second-stage sample size depends on the outcome of the first stage.
*Correspondence: guogen.shan@unlv.edu; zerozhua@126.com. †Guogen Shan and Hua Zhang contributed equally to this work and are considered co-first authors.
In some other trials (e.g., of cytostatic therapies), a survival endpoint serves as the primary outcome to measure the activity of a new treatment. Feldman et al. [13] reviewed seven single-arm phase II trials for patients with refractory germ cell tumors and recommended 12-week progression-free survival, rather than the commonly used response rate, to test the activity of novel agents. For such trials, a multiple-stage design with survival endpoint is appropriate. Lin et al. [14] proposed group sequential designs for trials with survival endpoint by deriving the asymptotic joint distribution of the Nelson-Aalen estimates at different time points. Based on Lin et al.'s work, Case and Morgan [9] developed a two-stage optimal design evaluating survival probabilities with restricted follow-up. They proposed two-stage optimal designs with the smallest expected duration of accrual or the smallest expected total study length. Later, Kwak and Jung [15] proposed a two-stage optimal design based on the one-sample log-rank test without follow-up restriction. The power of their design was computed at the average of the cumulative hazard function under the null hypothesis and that under the alternative hypothesis. In addition, the asymptotic variance estimate of the one-sample log-rank test was used in the type I error rate and power calculations. Recently, Belin et al. [16] proposed a two-stage design based on the design setting of Kwak and Jung [15], but with restricted follow-up as in Case and Morgan [9].
For a trial with a survival endpoint as the primary outcome, the survival probability at a clinically meaningful follow-up time is often the parameter of interest (e.g., the survival probability at 1 year). We develop a new single-arm two-stage optimal design using the one-sample log-rank test with exact mean and variance estimates [17, 18]. A trial is allowed to stop in the first stage for futility to protect patients when the treatment under investigation is ineffective. Although exact mean and variance estimates of the one-sample log-rank test are used for sample size calculation, the joint distribution of the test statistic for the first stage and that for the two stages combined is assumed to follow a bivariate normal distribution asymptotically. For this reason, the actual power of the identified design may not be guaranteed [19]. We propose adjusting the nominal power level in the design search to guarantee that the new designs meet the power requirement. The proposed two-stage minimax and optimal designs with survival endpoint are compared with the design by Belin et al. [16] and with Simon's two-stage designs with or without interim accrual.
The rest of this article is organized as follows. In Section Methods, we present the type I error rate and power calculations for a two-stage design with survival endpoint using the one-sample log-rank test, and provide a detailed search method for two-stage minimax and optimal designs. In Section Results, we compare the performance of the newly proposed two-stage designs with Belin's existing design with survival endpoint and with Simon's two-stage design with binary endpoint. At the end of that section, we revisit two trials to illustrate the application of the proposed two-stage designs with survival endpoint. Lastly, we provide some comments in Section Discussion.

Methods
Suppose S(t) is the survival function of the survival time T. In a single-arm study, the survival probability of a new treatment at the clinically meaningful follow-up time t_c, S(t_c), is compared to the estimated historical survival probability, S_0(t_c). The hypotheses are

$$H_0: S(t_c) \le S_0(t_c) \quad \text{versus} \quad H_1: S(t_c) > S_0(t_c). \tag{1}$$

In this article, the survival function S(t) is assumed to follow the Weibull distribution with shape parameter k and scale parameter λ, specifically,

$$S(t) = \exp\left\{-\left(t/\lambda\right)^{k}\right\},$$

where k > 0 and λ > 0. The widely used exponential distribution is the special case of the Weibull distribution with k = 1. Suppose the failure rate has the same shape under the null and the alternative hypotheses (the same shape parameter k), but the scale parameters differ, with λ_0 under the null hypothesis and λ_1 under the alternative hypothesis. Then Δ = (λ_0/λ_1)^k is the hazard ratio (HR), which is less than 1 under the alternative. The hypotheses in Eq. (1) can be rewritten as

$$H_0: \Delta \ge 1 \quad \text{versus} \quad H_1: \Delta < 1. \tag{2}$$

When a new study is assumed to have a different failure rate from the historical data, the HR depends on both shape parameters, k_0 under the null hypothesis and k_1 under the alternative hypothesis.
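The Weibull quantities above are straightforward to compute. The Python sketch below (function names are our own, for illustration only) inverts S(t) = exp{−(t/λ)^k} to obtain the scale parameter that gives a target survival probability at t_c, and computes the HR under a common shape parameter. Because (t_c/λ)^k = −log S(t_c), the HR reduces to log S_1(t_c)/log S_0(t_c), so with a common k it depends only on the two survival probabilities.

```python
import math


def weibull_survival(t, scale, shape=1.0):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_scale(t_c, surv_prob, shape=1.0):
    """Scale parameter lambda such that S(t_c) = surv_prob for shape k."""
    return t_c / (-math.log(surv_prob)) ** (1.0 / shape)


def hazard_ratio(s0, s1):
    """HR Delta = (lambda0 / lambda1) ** k under a common shape parameter k.

    Since (t_c / lambda) ** k = -log S(t_c), Delta = log(s1) / log(s0).
    """
    return math.log(s1) / math.log(s0)


# Pancreatic cancer trial: S0(1) = 35%, S1(1) = 50%, exponential case (k = 1)
lam0 = weibull_scale(1.0, 0.35)
lam1 = weibull_scale(1.0, 0.50)
print(round(hazard_ratio(0.35, 0.50), 3), round(lam0 / lam1, 3))
```

Both printed values agree (about 0.66), which is the HR implied by moving the one-year survival rate from 35% to 50%.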

Simon's two-stage designs with binary endpoint
In Simon's two-stage optimal designs, a trial is allowed to be stopped in the first stage when the number of responses is insufficient. Suppose X 1 and X are the number of responses out of n 1 and n participants from the first stage and the two stages combined, respectively. The sample size in the second stage is n 2 = n − n 1 . The null hypothesis is rejected when X 1 > r 1 and X > r, where r 1 and r are the critical values for the number of responses from the first stage and both stages, respectively.
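Because X_1 and X_2 = X − X_1 are independent binomial counts, the rejection probability P(X_1 > r_1, X > r) of Simon's design can be computed exactly by summing over the first-stage outcome. A minimal Python sketch, assuming a common response probability p in both stages (helper names are hypothetical); evaluating it at the null probability gives the type I error rate, and at the alternative probability gives the power:

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_sf(x, n, p):
    """P(X > x) for X ~ Binomial(n, p)."""
    if x < 0:
        return 1.0
    return sum(binom_pmf(k, n, p) for k in range(x + 1, n + 1))


def simon_reject_prob(n1, r1, n, r, p):
    """P(X1 > r1 and X > r) for Simon's design with response probability p.

    X = X1 + X2, where X1 ~ Bin(n1, p) and X2 ~ Bin(n - n1, p) are independent.
    """
    n2 = n - n1
    return sum(binom_pmf(x1, n1, p) * binom_sf(r - x1, n2, p)
               for x1 in range(r1 + 1, n1 + 1))


# Minimax design of the pancreatic cancer trial discussed next
tie = simon_reject_prob(43, 14, 72, 30, 0.35)    # type I error rate
power = simon_reject_prob(43, 14, 72, 30, 0.50)  # power
```

Setting r_1 = −1 removes the futility stop, and the function then reduces to the one-stage tail probability P(X > r).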
In a pancreatic cancer trial with a combination of Gemcitabine and external beam radiation as the new treatment [9], the clinically meaningful follow-up time is 1 year, t_c = 1. The unacceptable one-year survival rate is S_0(1) = 35%, and the new treatment is considered promising for further investigation when S_1(1) = 50% or more. To attain 90% power at the significance level of 10%, Simon's two-stage minimax design [1] is (n_1, r_1, n, r) = (43, 14, 72, 30), with the expected sample size under the null hypothesis

$$ESS_0 = n_1 + (1 - PET)(n - n_1),$$

where PET is the probability of early termination under the null hypothesis, defined as PET = P(X_1 ≤ r_1 | S_0(1) = 35%) = 43.65%. Suppose this is a 3-year study with a patient accrual rate of θ = 24 patients per year. Then the enrollment times for the first and the second stages are t_1 = n_1/θ and t_2 = n_2/θ, respectively. Because the trial waits t_c after each stage for the enrolled patients to complete follow-up, the expected total study length (ETSL) under the null hypothesis is

$$ETSL_0 = t_1 + t_c + (1 - PET)(t_2 + t_c).$$

Simon's two-stage optimal design needs ESS_0 = 53.2 and ETSL_0 = 3.6 years (see Table 1). Its maximum possible sample size, n = 81, is much larger than n = 72 for Simon's minimax design.
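The PET, ESS_0, and ETSL_0 of Simon's design without interim accrual can be checked numerically. A short Python sketch (the function name is hypothetical) that uses the exact binomial distribution of X_1 and assumes the trial waits t_c after each stage, so that ETSL_0 = t_1 + t_c + (1 − PET)(t_2 + t_c):

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def simon_ess0_etsl0(n1, r1, n, p0, theta, t_c):
    """PET, ESS0, and ETSL0 of Simon's design without interim accrual."""
    pet = binom_cdf(r1, n1, p0)             # stop after stage 1 if X1 <= r1
    ess0 = n1 + (1 - pet) * (n - n1)        # expected sample size under H0
    t1, t2 = n1 / theta, (n - n1) / theta   # accrual time of each stage
    # each stage is followed by a wait of t_c while its patients are followed
    etsl0 = t1 + t_c + (1 - pet) * (t2 + t_c)
    return pet, ess0, etsl0


pet, ess0, etsl0 = simon_ess0_etsl0(43, 14, 72, 0.35, theta=24, t_c=1.0)
```

The call evaluates the minimax design of the pancreatic cancer trial; the returned PET can be compared with the 43.65% quoted above.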
When Simon's two-stage design allows interim accrual at the end of the first stage, the expected sample size under the null hypothesis is calculated as and the expected total study length under the null hypothesis is The results of Simon's two-stage designs with interim accrual are presented in Table 1. As compared to the traditional Simon's two-stage design without interim accrual, the modified design with interim accrual requires a shorter ETSL 0 but a larger ESS 0 .

Two-stage optimal designs with survival endpoint when the follow-up time is limited
In a two-stage design with sample sizes of n 1 in the first stage and n 2 in the second stage, the maximum possible sample size in the study is n = n 1 + n 2 . Given the patient accrual rate of θ, the accrual time for the first stage is t 1 = n 1 /θ. When the trial goes to the second stage, the total accrual time of the study is t a = n/θ, and the total study time for all patients to complete the study is t = t a + t c .
We assume that patients are enrolled uniformly, with entry times τ_1, τ_2, ..., τ_n, survival times T_1, T_2, ..., T_n, and censoring times C_1, C_2, ..., C_n. At the end of the first stage, t_1, the observed time for the i-th patient is the smallest of three quantities: (1) the event time; (2) the censoring time; and (3) the time this patient has been followed so far in the study, specifically,

$$X_i(t_1) = \min\{T_i,\; C_i,\; t_1 - \tau_i\}, \quad i = 1, \ldots, n_1.$$

Using the observed times and the censoring information of the first n_1 patients, the one-sample log-rank test is calculated as

$$Z_1 = \frac{W_1}{\hat{\sigma}_1},$$

where W_1 is a function of the difference between the observed and the expected numbers of events, and σ̂_1 is its standard deviation estimate. The detailed formulas of Z_1 under the null and the alternative hypotheses are given in the Appendix. The null hypothesis is rejected when the test statistic is small. Suppose the critical value for Z_1 is c_1. When the calculated Z_1 is greater than or equal to c_1, the trial is stopped for futility and no further investigation is warranted. Otherwise, the trial proceeds to the second stage, and an additional n_2 = n − n_1 patients are treated with the new treatment. At the end of the study, when all n patients have completed the study, the one-sample log-rank test is calculated as

$$Z = \frac{W}{\hat{\sigma}}.$$

Z_1 and Z are not independent, since the data of the first n_1 patients are used in both. The type I error (TIE) rate of the study is

$$TIE = P(Z_1 < c_1,\; Z < c \mid H_0),$$

where c is the critical value for Z. Following Kwak and Jung [15], (Z_1, Z) asymptotically follows a bivariate normal distribution. Then the TIE can be written as

$$TIE = \int_{-\infty}^{c_1} \phi(z_1)\, \Phi\!\left(\frac{c - \rho_0 z_1}{\sqrt{1 - \rho_0^2}}\right) dz_1, \tag{3}$$

where φ and Φ are the probability density function and the cumulative distribution function of the standard normal distribution, and ρ_0 is the correlation coefficient estimate between Z_1 and Z under the null hypothesis (see the Appendix for the formula of ρ_0). The actual power of the study can be computed similarly, with ρ_0 replaced by the ρ estimate under the alternative hypothesis.

Table 1 The resectable pancreatic cancer clinical trial with S_0(t_c = 1) = 35% and S_1(t_c = 1) = 50%, designed to attain 90% power at the significance level of 10% (columns: Simon's design without and with interim accrual; survival endpoint designs from the proposed method and from Belin et al.; the survival function follows an exponential distribution)
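Under the bivariate normal approximation, the TIE P(Z_1 < c_1, Z < c) reduces to a one-dimensional integral of φ(z_1)Φ((c − ρ_0 z_1)/√(1 − ρ_0²)), which any quadrature rule can evaluate. A minimal Python sketch using composite Simpson's rule and the standard library's `statistics.NormalDist` (function names are our own). The `reject_prob` wrapper standardizes Z_1 and Z first; passing their means and standard deviations under the alternative (given in the Appendix, not reproduced here) is our assumption for how the approximate power would be evaluated.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # phi = STD_NORMAL.pdf, Phi = STD_NORMAL.cdf


def bvn_lower_prob(a, b, rho, lower=-8.0, m=800):
    """P(U1 < a, U2 < b) for a standard bivariate normal pair with correlation rho.

    Integrates phi(u1) * Phi((b - rho * u1) / sqrt(1 - rho^2)) over (lower, a]
    with composite Simpson's rule; the mass below `lower` is negligible.
    Requires |rho| < 1 and an even m.
    """
    if a <= lower:
        return 0.0
    h = (a - lower) / m
    s = math.sqrt(1.0 - rho * rho)

    def f(u1):
        return STD_NORMAL.pdf(u1) * STD_NORMAL.cdf((b - rho * u1) / s)

    total = f(lower) + f(a)
    for j in range(1, m):
        total += (4.0 if j % 2 else 2.0) * f(lower + j * h)
    return total * h / 3.0


def reject_prob(c1, c, rho, mu1=0.0, sd1=1.0, mu=0.0, sd=1.0):
    """P(Z1 < c1, Z < c) when (Z1, Z) is bivariate normal with correlation rho.

    Defaults (standard margins, rho = rho0) give the TIE. Supplying the means and
    SDs of (Z1, Z) under H1 is an assumed way to obtain the approximate power.
    """
    return bvn_lower_prob((c1 - mu1) / sd1, (c - mu) / sd, rho)
```

Two quick sanity checks: with ρ = 0 the result is Φ(c_1)Φ(c), and with a very large c_1 (no futility stop) it reduces to Φ(c), the one-stage TIE.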

Optimal design search
Similar to the search for Simon's two-stage design, the two-stage optimal design with survival endpoint has to be searched over all possible sample sizes (n_1 and n) and critical values (c_1 and c), given the design parameters (α, β, t_c, S_0(t_c), S_1(t_c), θ). Although the exact variances of Z_1 and Z are available for sample size determination, the exact joint distribution of Z_1 and Z is not straightforward to obtain. For this reason, we use the limiting distribution of (Z_1, Z) to search for the two-stage optimal design for a study with design parameters (α, β, t_c, S_0(t_c), S_1(t_c), θ), and then use a simulation study to calculate the actual TIE and power of the identified design. The following three steps are used to search for the two-stage minimax and optimal designs.
Step 1: Given the total sample size n, the first-stage sample size n_1 ranges from 1 to n − 1. The critical value c_1 ranges from −0.3 to 1.6 in increments of 0.005. Similar to Kwak and Jung [15], this range of c_1 is chosen based on simulation studies for all the configurations studied in this article, and it can be modified in the software program for the design search.
For each combination of n_1 and c_1, the critical value c is determined as the largest value such that TIE(c) ≤ α in Eq. (3). The power of the study is then computed using Eq. (4) in the Appendix. If the power is above the nominal level, this set of sample sizes and critical values, (n_1, c_1, n, c), is saved as a candidate for the optimal two-stage design. Among all the sets satisfying the power requirement, the one with the smallest ESS_0 is the optimal two-stage design for total sample size n; it is denoted as B(n) = (n_1, c_1, n, c), with expected sample size ESS_0(n).
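Because the TIE is non-decreasing in c, the largest c with TIE(c) ≤ α can be found by bisection for each (n_1, c_1) pair. A Python sketch built on that observation; ρ_0 is passed in directly because its formula is given in the Appendix, and the function names are hypothetical. In a full search, this call would sit inside the loops over n_1 and the c_1 grid, with ρ_0 presumably recomputed whenever n_1 changes.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tie(c1, c, rho0, lower=-8.0, m=800):
    """P(Z1 < c1, Z < c) under H0 (bivariate normal), via composite Simpson's rule."""
    if c1 <= lower:
        return 0.0
    h = (c1 - lower) / m                  # m must be even for Simpson's rule
    s = math.sqrt(1.0 - rho0 * rho0)

    def f(z1):
        return STD_NORMAL.pdf(z1) * STD_NORMAL.cdf((c - rho0 * z1) / s)

    total = f(lower) + f(c1)
    for j in range(1, m):
        total += (4.0 if j % 2 else 2.0) * f(lower + j * h)
    return total * h / 3.0


def critical_value(c1, rho0, alpha, lo=-8.0, hi=8.0, tol=1e-6):
    """Largest c (to within tol) with TIE(c) <= alpha, for a fixed futility bound c1.

    TIE is non-decreasing in c; assuming TIE(lo) <= alpha at the start, bisection
    keeps TIE(lo) <= alpha < TIE(hi).
    """
    if tie(c1, hi, rho0) <= alpha:
        return hi                          # alpha is not binding on [lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tie(c1, mid, rho0) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo
```

With no effective futility stop (c_1 very large), the result approaches the one-stage critical value Φ⁻¹(α); a finite c_1 removes some rejection probability, so c moves upward.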
Step 2: The design search starts with a relatively small n (e.g., 5) and increases n by 1; B(n) can be an empty set when n is small. The two-stage minimax design is the one with the smallest n, n_minimax, such that B(n) is not empty. The optimal two-stage design is the one with the smallest ESS_0. The search may be stopped at n_u when ESS_0(n_u) is at least 10% larger than the smallest ESS_0 among the designs identified with n from n_minimax to n_u:

$$ESS_0(n_u) \ge 1.1 \times \min\{ESS_0(n) : n_{minimax} \le n \le n_u\}.$$
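The Step 2 scan and its stopping rule can be sketched as a loop over n. Here `best_design_for_n` is a hypothetical callable standing in for Step 1: it returns the pair (B(n), ESS_0(n)), or None when B(n) is empty.

```python
def search_designs(best_design_for_n, n_start=5, n_max=1000, stop_ratio=1.10):
    """Step 2: scan n upward and return (minimax, optimal, n_u).

    Each of minimax and optimal is a (design, ess0) pair from best_design_for_n.
    """
    minimax = optimal = None
    for n in range(n_start, n_max + 1):
        result = best_design_for_n(n)
        if result is None:
            continue                      # B(n) is empty for this n
        if minimax is None:
            minimax = result              # smallest n with a feasible design
        if optimal is None or result[1] < optimal[1]:
            optimal = result              # smallest ESS0 seen so far
        if result[1] >= stop_ratio * optimal[1]:
            return minimax, optimal, n    # ESS0(n_u) >= 110% of the minimum
    return minimax, optimal, n_max


# Toy stand-in for Step 1: infeasible below n = 10, ESS0 minimized at n = 20
def toy(n):
    return None if n < 10 else (("design", n), 15 + (n - 20) ** 2 / 20)


minimax, optimal, n_u = search_designs(toy)
```

In the toy example, the minimax design is found at n = 10, the optimal design at n = 20, and the scan stops at n = 26, the first n whose ESS_0 exceeds the minimum by at least 10%.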
Step 3: Once the minimax and optimal two-stage designs are identified in Steps 1 and 2, we use a simulation study with 100,000 replicates to calculate their actual TIE and power. We find that the actual TIE of the identified design B(n) = (n_1, c_1, n, c) is always controlled, while the power may fall below the nominal level in some cases. If the simulated power of both designs meets the nominal level, they are the final two-stage minimax and optimal designs. Otherwise, we repeat Steps 1 and 2 with the nominal power level increased by 1%, that is, with (α, β − 1%). This process stops when both the minimax and optimal two-stage designs meet the power requirement.
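The power calibration in Step 3 is an outer loop around Steps 1 and 2. In this Python sketch, `search` and `simulate_power` are hypothetical callables standing in for the design search at a given nominal power and for the 100,000-replicate simulation, respectively.

```python
def calibrated_search(search, simulate_power, beta, step=0.01, max_rounds=20):
    """Step 3: raise the nominal power until both designs pass the simulation check.

    search(nominal_power) -> (minimax, optimal) stands in for Steps 1 and 2, and
    simulate_power(design) -> float stands in for the simulation study.
    """
    target = 1.0 - beta           # required power, e.g. 0.80 when beta = 0.20
    nominal = target
    for _ in range(max_rounds):
        minimax, optimal = search(nominal)
        if all(simulate_power(d) >= target for d in (minimax, optimal)):
            return minimax, optimal, nominal
        nominal += step           # search again with (alpha, beta - 1%)
    raise RuntimeError("power requirement not met after max_rounds increases")


# Toy example: every design loses 1.5 percentage points of power in simulation
mm, opt, nominal_used = calibrated_search(lambda nom: (nom, nom),
                                          lambda d: d - 0.015, beta=0.20)
```

The required power (1 − β) stays fixed while only the nominal level used in the search moves; in the toy example, two 1% increases are needed.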

Results
We first compare the proposed two-stage minimax and optimal designs with survival endpoint, when the follow-up time is restricted, with the designs developed by Belin et al. [16] (referred to as Belin's design). They developed a two-stage optimal design as a modification of the design by Kwak and Jung [15] by adding restricted follow-up to the study design [9]. In Belin's design, the power of the study is computed at the average of the cumulative hazard functions under the null and the alternative, that is, at a point between the two hypotheses rather than at the alternative itself. As a result, the sample size is over-estimated, and the actual power is often above the nominal level. Table 2 shows the comparison between the proposed designs and Belin's design when the survival distribution is exponential. Belin et al. [16] investigated the performance of two-stage optimal designs with restricted follow-up under exponential distributions only (shape parameter k = 1 in the Weibull distribution). The clinically meaningful follow-up time t_c is assumed to be 1 year. Under the null hypothesis, the survival rate at t_c = 1 is S_0(t_c) = 50% (λ_0 = 1.44), as studied in Table 2. The hazard ratio is assumed to be Δ = λ_0/λ_1 = 0.5, so the scale parameter under the alternative is λ_1 = 2.88. The nominal power level is set at either 90% or 95%, and the accrual rate θ is 15, 30, or 50. The ESS_0 of the proposed minimax or optimal designs is often less than that of Belin's design, which may be due to the fact that the power of Belin's design is computed outside the alternative hypothesis space. The simulated TIE and power of the proposed two-stage minimax and optimal designs are shown in Table 3.

Table 2 Comparison between the proposed two-stage minimax and optimal designs with survival endpoint and Belin's two-stage optimal design with survival endpoint, when the follow-up time is restricted to the clinically meaningful follow-up time t_c = 1 year
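For the exponential case (k = 1) with t_c = 1 used in Table 2, the scale parameters quoted above follow from S(t) = exp(−t/λ). A short derivation, which also gives the alternative survival rate at t_c implied by Δ = 0.5 (our arithmetic, not a value reported in the text):

```latex
\lambda_0 = \frac{t_c}{-\log S_0(t_c)} = \frac{1}{\log 2} \approx 1.44, \qquad
\lambda_1 = \frac{\lambda_0}{\Delta} = \frac{1.44}{0.5} \approx 2.88,
\qquad
S_1(t_c) = \exp(-t_c/\lambda_1) = \exp(-\Delta \log 2) = 0.5^{\Delta} = 0.5^{0.5} \approx 0.707.
```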
In Table 3, we also report the 95% confidence intervals for the TIE and power based on 1,000 simulated TIE and power values, where each simulated value is computed from 10,000 simulations. The proposed designs control the TIE and attain the required power.
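The interval described above is a percentile interval over replicate estimates. A Python sketch, where `simulate_once` is a hypothetical callable that simulates one trial and reports whether H_0 is rejected; with n_reps = 1,000 and n_sims = 10,000 it mirrors the setting above.

```python
import random


def percentile_ci(simulate_once, n_reps=1000, n_sims=10_000, level=0.95, seed=2024):
    """Percentile interval of a rejection rate from repeated simulation batches.

    simulate_once(rng) -> bool returns True when one simulated trial rejects H0.
    Each of the n_reps batches estimates the rate from n_sims trials, and the
    interval is read off the sorted batch estimates.
    """
    rng = random.Random(seed)
    estimates = sorted(
        sum(simulate_once(rng) for _ in range(n_sims)) / n_sims
        for _ in range(n_reps)
    )
    lo_idx = round((1 - level) / 2 * n_reps)       # 0-based index 25 of 1,000
    hi_idx = round((1 + level) / 2 * n_reps) - 1   # 0-based index 974 of 1,000
    return estimates[lo_idx], estimates[hi_idx]


# Toy check with a known rejection rate of 5% (smaller sizes for speed)
lo, hi = percentile_ci(lambda rng: rng.random() < 0.05, n_reps=200, n_sims=2000)
```

An interval that covers the nominal level (here 5%) is consistent with the design controlling the TIE.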
We further compare the proposed two-stage minimax and optimal designs with survival endpoint with Simon's two-stage designs, with or without interim accrual, for a trial with binary endpoint (see Table 4), when the survival distribution follows the Weibull distribution with a common shape parameter k = 0.5. The significance level is set at 5%, and the nominal power level is 80%. The null survival probabilities at the clinically meaningful follow-up time t_c = 1, S_0(t_c) = 10% and 60%, are studied in Table 4. We consider medium to large effect sizes, S_1(t_c) − S_0(t_c) = 10%, 15%, and 20%. For each configuration of S_0(t_c) and S_1(t_c), the scale parameters λ_0 and λ_1 in the Weibull distribution are calculated, and the ESS_0 and ETSL_0 of the proposed minimax design and Simon's minimax design are computed. The patient accrual rate θ is calculated by assuming a 3-year study when Simon's two-stage minimax design with no interim accrual is used, θ = n_minimax/3. In the table, the percentage (%) is the ESS_0 or ETSL_0 saving of the proposed two-stage design with survival endpoint relative to Simon's two-stage design, computed as (Simon − New)/Simon. When the percentage saving is positive, the new design requires a smaller ESS_0 or a shorter ETSL_0 than the existing Simon's design. When the null survival probability S_0(t_c) is low, say 10%, the proposed two-stage design with survival endpoint saves sample size as compared to Simon's two-stage minimax design. This trend is reversed when S_0(t_c) = 60%. In Table 4, we also present the results of Simon's two-stage minimax design with interim accrual. The new design always requires a smaller ESS_0 than Simon's design with interim accrual, and it always saves ETSL_0 as compared to Simon's design with or without interim accrual. The saving becomes smaller as the null survival probability increases from 10% to 60%. Similar results are observed in Table 5 for the two-stage optimal designs.

We further compare the new two-stage minimax design with Simon's two-stage minimax design, with the shape parameter k ranging from 0.25 to 2, in Fig. 1, for a trial to attain 90% power at the significance level of 5%. When S_0(t_c) is low, the new design needs a smaller expected sample size than Simon's minimax design, and this trend is reversed when S_0(t_c) is high, e.g., 40% and 75%. The saving of the new design often decreases as k increases. The new design always requires a shorter expected total study length than Simon's minimax design. Similar results are observed in Fig. 2, where the new two-stage optimal design is compared with Simon's optimal design. We also compare the new design with Simon's two-stage minimax and optimal designs with interim accrual in Fig. 3 and Fig. 4, respectively. The results indicate that the new design performs better than Simon's design with interim accrual with regard to both ESS_0 and ETSL_0.

Table 4 Comparison between the proposed two-stage minimax design with survival endpoint and Simon's two-stage minimax design with binary endpoint with or without interim accrual, when α = 5%, β = 20%, and the shape parameter k = 0.5 in the Weibull distribution (columns: Simon's two-stage minimax designs with no interim accrual and with interim accrual; survival endpoint)
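The percentage saving and the accrual rate used in Tables 4 and 5 are simple arithmetic. A small Python sketch (function names are hypothetical), applied to values reported in the Examples subsection for the lung cancer trial (Simon's optimal design: ESS_0 = 24.7, ETSL_0 = 3.1 years; new design: 24.0 and 2.2 years):

```python
def percent_saving(simon, new):
    """(Simon - New) / Simon in percent; positive means the new design saves."""
    return 100.0 * (simon - new) / simon


def accrual_rate(n_simon_minimax, study_years=3.0):
    """Patients per year so that Simon's minimax design accrues in study_years."""
    return n_simon_minimax / study_years


ess_saving = percent_saving(24.7, 24.0)   # lung cancer example, ESS0
etsl_saving = percent_saving(3.1, 2.2)    # lung cancer example, ETSL0 (years)
theta = accrual_rate(72)                  # pancreatic minimax design, n = 72
```

The ESS_0 saving is under 3%, while the ETSL_0 saving is about 29%, which illustrates why the study-length comparison favors the new design more strongly than the sample-size comparison.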

Examples
We revisit the cancer trial discussed by Case and Morgan [9], introduced in the subsection "Simon's two-stage designs with binary endpoint", which investigates the effectiveness of a combination of Gemcitabine and external beam radiation for patients with resectable pancreatic cancer. The clinically meaningful follow-up time is assumed to be 1 year, t_c = 1. The survival probabilities under the null and the alternative are S_0(1) = 35% and S_1(1) = 50%, respectively. The survival function follows an exponential distribution. This trial is designed to attain 90% power at the significance level of 10%. We compute the detailed two-stage designs with survival endpoint, including the sample sizes and critical values for each stage, in Table 1. The ESS_0 of the new design is slightly larger than that of Simon's design, but much smaller than that of Simon's design with interim accrual. The ETSL_0 of the new design is always shorter than that of Simon's designs with or without interim accrual, and the saving in study time is substantial.
We also consider a second clinical trial evaluating the activity of a combination of irinotecan and cisplatin for patients with refractory or recurrent non-small cell lung cancer [20]. The response rates are 10% and 25% under the null and the alternative hypotheses, respectively. Suppose the clinically meaningful follow-up time is 1 year. For Simon's two-stage optimal design with α = 5% and β = 20%, the maximum possible sample size is n = 43 and the expected sample size under the null hypothesis is ESS_0 = 24.7 (see Table 5 for the case with S_0(t_c) = 10% and S_1(t_c) = 25%). The proposed two-stage optimal design with survival endpoint needs a slightly smaller ESS_0 of 24.0, and it shortens the expected total study length by almost 1 year (2.2 vs. 3.1 years for Simon's design). A 95% two-sided confidence interval for the response rate was reported in the original research article by Takiguchi et al. [20]. Because the hypothesis is one-sided in both Simon's design and the proposed design, a 90% two-sided confidence interval for the response rate or the survival rate should be reported when α = 5%.

Table 5 Comparison between the proposed two-stage optimal design with survival endpoint and Simon's two-stage optimal design with binary endpoint with or without interim accrual, when α = 5%, β = 20%, and the shape parameter k = 0.5 in the Weibull distribution (columns: Simon's two-stage optimal designs with no interim accrual and with interim accrual; survival endpoint)

Fig. 1 The ESS or ETSL saving of the proposed two-stage minimax design with survival endpoint as compared to Simon's two-stage minimax design with binary endpoint when α = 5% and β = 10%

Fig. 2 The ESS or ETSL saving of the proposed two-stage optimal design with survival endpoint as compared to Simon's two-stage optimal design with binary endpoint when α = 5% and β = 10%

Discussion
In the design search process, we search for the minimax and optimal designs such that both designs have power above the nominal level. In practice, when only one type of design is of interest (e.g., the two-stage minimax design), we suggest searching for the design such that the power of this particular type of design is above the nominal level. Our R program computes the designs so that both the minimax design and the optimal design meet the nominal power level; it is available upon request from the first author.

Fig. 3 The ESS or ETSL saving of the proposed two-stage minimax design with survival endpoint as compared to Simon's two-stage minimax design with interim accrual with binary endpoint when α = 5% and β = 10%

Conclusions
The commonly used Simon's two-stage design has to suspend enrollment temporarily after n_1 patients are enrolled in the first stage [5, 11, 21–28]. The research team has to wait for a period of t_c until all n_1 patients complete the study. The test statistic from the first stage is then compared to the pre-determined critical value to make a go or no-go decision for the second stage. In contrast, the proposed two-stage designs with survival endpoint do not have to suspend the trial, so a comparison between the proposed design and Simon's two-stage design with no interim accrual is not entirely appropriate. Due to the popularity of Simon's two-stage design, we include this design as a reference. Simon's two-stage design with interim accrual is a reasonable competitor for the proposed two-stage design with survival endpoint.

Fig. 4 The ESS or ETSL saving of the proposed two-stage optimal design with survival endpoint as compared to Simon's two-stage optimal design with interim accrual with binary endpoint when α = 5% and β = 10%
Y_i(t) = I(T_i ≥ t, T_i ≥ t_c) be the event process and the at-risk process, respectively. The one-sample log-rank test at the end of the first stage is expressed in terms of O and E, which are the observed number of events and the expected number of events, respectively. The one-sample log-rank test can alternatively be written as Z_1 = W_1/σ̂_1, where W_1 = (O − E)/√n and σ̂² = E/n, and σ̂_1² is the variance estimate of W_1. The one-sample log-rank test Z at the end of the study can be derived similarly by replacing N_i(t) with N_i(t) = I(T_i ≤ C_i)I(T_i ≤ t).
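The quantities O and E can be computed directly from the observed data. A Python sketch, assuming a Weibull null cumulative hazard Λ_0(x) = (x/λ_0)^{k_0} and using the familiar asymptotic form Z = (O − E)/√E rather than the exact variance estimate used in this article; function and argument names are hypothetical. The first helper builds the observed times min{T_i, C_i, t_c, t_1 − τ_i} for patients who entered before the analysis time.

```python
import math


def stage_observed_data(entry, surv, censor, t_analysis, t_c):
    """Observed times and event indicators at calendar time t_analysis.

    Each enrolled patient is followed until min(C_i, t_c, t_analysis - tau_i);
    patients entering at or after t_analysis are not yet in the data.
    """
    times, events = [], []
    for tau, t, c in zip(entry, surv, censor):
        if tau >= t_analysis:
            continue
        horizon = min(c, t_c, t_analysis - tau)
        times.append(min(t, horizon))
        events.append(1 if t <= horizon else 0)
    return times, events


def one_sample_logrank(times, events, scale0, shape0=1.0):
    """Z = (O - E) / sqrt(E), with E the sum of Weibull null cumulative hazards.

    Under H1 (better survival) fewer events than expected occur, so small
    (negative) values of Z favour H1.
    """
    observed = sum(events)
    expected = sum((x / scale0) ** shape0 for x in times)
    return (observed - expected) / math.sqrt(expected)


# Four patients; the last one enters after the interim analysis at t1 = 0.8
times, events = stage_observed_data(
    entry=[0.0, 0.2, 0.5, 0.9], surv=[0.5, 2.0, 0.1, 0.3],
    censor=[10.0] * 4, t_analysis=0.8, t_c=1.0)
z1 = one_sample_logrank(times, events, scale0=1.0)
```

Here the second patient is censored at 0.6 (followed only since entry at 0.2), and O = 2 events are compared with E = 1.2 expected under an exponential null with λ_0 = 1.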

Mean and variance estimates of W 1 and W under the null hypothesis
The mean of W_1 or W under the null hypothesis is 0. Because the clinically meaningful follow-up time t_c is the upper bound of the follow-up time for each patient, the censoring distribution is G(t) = I(t ≤ t_c). The censoring distribution for the first stage is G_1(t) = U(0, t_1)I(t ≤ t_c), due to a possibly short follow-up time at the data analysis time t_1. Then, the variances of W_1 and W are estimated as