To test the activity of a treatment in a two-stage design compared to the historical response rate \(\pi_{0}\), the hypotheses are often presented as
$$H_{0}: \pi\leq \pi_{0},$$
against the alternative
$$H_{a}: \pi\geq \pi_{1}, $$
where \(\pi_{1}\) is the estimated response rate of the new treatment. The null hypothesis is rejected for a high response rate.
Simon’s two-stage design is traditionally used in clinical trials to assess the activity of a new cancer treatment. In Simon’s design, the second stage sample size \(n_{2}(X_{1})\) is a constant when a trial goes to the second stage, where \(X_{1}\) is the number of responses from the first stage. From an adaptive perspective, a clinical trial could be much more flexible and effective when the second stage sample size is allowed to change based on the information gathered so far. It is reasonable to assume that \(n_{2}(X_{1})\) is a non-increasing function of the number of responses observed in the first stage: \(n_{2}(X_{1})\geq n_{2}(X_{1}^{'})\) when \(X_{1}<X_{1}^{'}\). Shan et al. [6] developed an adaptive optimal two-stage design that respects this non-increasing sample size relationship. This design is referred to as the AdaptiveS design. The design is given as
$$(n_{1},n_{2}(X_{1}),r(X_{1})), $$
where the number of possible responses out of \(n_{1}\) participants in the first stage is \(X_{1}=0,1,2,\ldots,n_{1}\), and \(n_{2}(X_{1})\) and \(r(X_{1})\) are the second stage sample size and the critical value for the study given \(X_{1}\) responses from the first stage, respectively. In this article, a study can be stopped in the first stage due to futility when \(X_{1}\leq r_{1}(f)\) or efficacy when \(X_{1}\geq r_{1}(e)\). When the number of responses from the first stage is between \(r_{1}(f)\) and \(r_{1}(e)\), that is, \(r_{1}(f)<X_{1}<r_{1}(e)\), the trial proceeds to the second stage with an additional \(n_{2}(X_{1})\) participants, and the final decision is made by comparing the total number of responses \(X_{1}+X_{2}\) with \(r(X_{1})\), where \(X_{2}\) is the number of responses out of the \(n_{2}(X_{1})\) participants. The new treatment is considered effective enough to proceed to the next phase when \(X_{1}+X_{2}\geq r(X_{1})\). Otherwise, the new treatment is not promising for further investigation.
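The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `decide` and the toy design values (`N1`, `R1F`, `R1E`, `N2`, `R`) are hypothetical and are not an optimal design from Shan et al. [6].

```python
# Hypothetical toy design, chosen only to illustrate the rule.
N1 = 5                      # first-stage sample size n1
R1F, R1E = 0, 4             # futility and efficacy boundaries r1(f), r1(e)
N2 = {1: 4, 2: 3, 3: 2}     # n2(x1), non-increasing in x1
R = {1: 4, 2: 4, 3: 4}      # critical value r(x1) for the total responses


def decide(x1, x2=None):
    """Return the trial decision given x1 first-stage responses and,
    when the second stage is run, x2 second-stage responses."""
    if x1 <= R1F:
        return "stop for futility"
    if x1 >= R1E:
        return "stop for efficacy"
    # Continuation region: r1(f) < x1 < r1(e), so stage two is run.
    if x2 is None or not 0 <= x2 <= N2[x1]:
        raise ValueError("x2 must lie between 0 and n2(x1)")
    return "promising" if x1 + x2 >= R[x1] else "not promising"
```

For example, `decide(2, 2)` compares the total of 4 responses with \(r(2)=4\) and reports the treatment as promising, while `decide(0)` stops at the first stage.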
Upon completion of an adaptive clinical trial, a confidence interval for the response rate should be computed and reported. The hypothesis for testing the activity of the new treatment is often one-sided, and the confidence interval and the hypothesis test should be consistent with each other. For this reason, we focus on one-sided lower intervals, as the null hypothesis is rejected when a high response rate is observed. When the significance level is α, a 1−α one-sided interval, (L,1], should be computed for statistical inference, where L is the 1−α lower limit.
The method by Clopper and Pearson [11] (CP) was used to construct exact one-sided intervals for a binomial proportion. It is exact because the coverage probability is guaranteed to be at least 1−α and the coverage probability is calculated by using the binomial probabilities, not asymptotic distributions. We extend this approach for the response rate in adaptive two-stage design settings. This approach has to be used with a method to order the sample space, which is often referred to as stochastic ordering. The complete sample space can be divided into three complementary sub-spaces,
$$\Omega=\{G_{1},G_{2},G_{3}\}, $$
where \(G_{1}=\{X_{1}:0,1,2,\ldots,r_{1}(f)\}\), \(G_{3}=\{X_{1}:r_{1}(e),r_{1}(e)+1,\ldots,n_{1}\}\), and \(G_{2}=\{(X_{1},X_{2}):r_{1}(f)<X_{1}<r_{1}(e),X_{2}\leq n_{2}(X_{1})\}\). Sets \(G_{1}\) and \(G_{3}\) contain the sample points where a trial is stopped in the first stage due to futility and efficacy, respectively. Set \(G_{2}\) contains the sample points where a trial goes to the second stage. It should be noted that set \(G_{3}\) could be empty when the optimal adaptive two-stage design stops in the first stage only due to futility, which often occurs in cases with a large \(\pi_{0}\) and a large difference between \(\pi_{0}\) and \(\pi_{1}\), as seen in Shan et al. [6]. The lower limits for sample points in set \(G_{1}\) are the smallest, followed by those in set \(G_{2}\) and then set \(G_{3}\). Within sets \(G_{1}\) and \(G_{3}\), the lower limits for the sample points are ordered by the number of responses, and they are the same as the CP lower limits for a binomial proportion. For sample points in set \(G_{2}\), the second stage sample size changes with the number of responses from the first stage. For this reason, the second stage sample size should be considered in the sample space ordering. We propose ordering the sample points in set \(G_{2}\) by the response rates (RR) in the first stage and in the two stages combined,
$$\begin{aligned} L\left(\frac{X_{1}}{n_{1}},\frac{X_{1}+X_{2}}{n_{1}+n_{2}(X_{1})}\right)\leq L\left(\frac{X_{1}^{'}}{n_{1}},\frac{X_{1}^{'}+X_{2}^{'}}{n_{1}+n_{2}\left(X_{1}^{'}\right)}\right)\\ \text{if} \ \ \frac{X_{1}}{n_{1}} \leq \frac{X_{1}^{'}}{n_{1}}\ \ \text{and}\ \ \frac{X_{1}+X_{2}}{n_{1}+n_{2}(X_{1})} \leq \frac{X_{1}^{'}+X_{2}^{'}}{n_{1}+n_{2}\left(X_{1}^{'}\right)}. \end{aligned} $$
This approach is referred to as the RR approach. This ordering is motivated by the p-value calculation for a two-stage study. The rejection region includes the extreme outcomes whose first stage and second stage responses are at least as large as the observed data [8–10, 15].
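The RR ordering is only a partial order on set \(G_{2}\), which a short sketch makes concrete. The toy design below (`N1`, `R1F`, `R1E`, `N2`) is hypothetical, and the helper names `g2_points`, `rr_leq`, and `comparable` are introduced here only for illustration. Exact fractions are used so that equal response rates are not separated by rounding.

```python
from fractions import Fraction

# Hypothetical toy design, chosen only to illustrate the ordering.
N1 = 5
R1F, R1E = 0, 4
N2 = {1: 4, 2: 3, 3: 2}     # n2(x1), non-increasing in x1


def g2_points():
    """Enumerate the sample points (x1, x2) of set G2."""
    return [(x1, x2)
            for x1 in range(R1F + 1, R1E)
            for x2 in range(N2[x1] + 1)]


def rr_leq(a, b):
    """True if point a is ordered at or below point b under the RR ordering:
    both the first-stage rate and the overall rate of a do not exceed b's."""
    (x1, x2), (y1, y2) = a, b
    first = Fraction(x1, N1) <= Fraction(y1, N1)
    overall = (Fraction(x1 + x2, N1 + N2[x1])
               <= Fraction(y1 + y2, N1 + N2[y1]))
    return first and overall


def comparable(a, b):
    """Two points can be ordered only if one lies in the other's tail area."""
    return rr_leq(a, b) or rr_leq(b, a)
```

In this toy design, the points (1, 4) and (2, 0) cannot be ordered: the first has the lower first-stage rate (1/5 versus 2/5) but the higher overall rate (5/9 versus 2/8).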
Another stochastic ordering is based on the p-value of each sample point. Similar to the RR approach, the p-values for sample points in set \(G_{1}\) are the largest, followed by those in set \(G_{2}\) and then set \(G_{3}\). A sample point with a large p-value indicates weak evidence against the null hypothesis. In other words, it should have a small lower limit. For a sample point \((X_{1},X_{2})\) in set \(G_{2}\), its associated p-value is calculated as
$${} P(X_{1},X_{2})\,=\,\sum_{(X_{1}^{'},X_{2}^{'})\in \Theta(X_{1},X_{2})}\!b\!\left(\!X_{1}^{'},n_{1},\pi_{0}\!\right)\!b\left(\!X_{2}^{'},n_{2}\left(\!X_{1}^{'}\right),\pi_{0}\!\right), $$
where \(b(\cdot)\) is the probability mass function of a binomial distribution, and \(\Theta(X_{1},X_{2})\) is the tail area
$${} \begin{aligned} \Theta(X_{1},X_{2})&=\left\{G_{3}\ \text{and}\ \left(X_{1}^{'},X_{2}^{'}\right): \frac{X_{1}}{n_{1}} \leq \frac{X_{1}^{'}}{n_{1}}, \frac{X_{1}+X_{2}}{n_{1}+n_{2}(X_{1})}\right.\\ &\left.\quad\leq \frac{X_{1}^{'}+X_{2}^{'}}{n_{1}+n_{2}\left(X_{1}^{'}\right)} \right\}. \end{aligned} $$
The response rates are used to define the tail area in the p-value calculation. Since the p-value is used to order the sample space, we call this the PV approach. Although the p-value calculation may not guarantee the type I error rate, it is still a valid measure for ordering the sample space. In the PV approach, the sample points are sorted by p-value from smallest to largest, so every sample point has an order number from 1 to the size of the sample space. In the RR approach, by contrast, two sample points from set \(G_{2}\) can be ordered only if one sample point belongs to the other’s tail area.
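The p-value of a point in set \(G_{2}\) can be computed by direct enumeration of its tail area. The following is a minimal sketch under a hypothetical toy design (`N1`, `R1F`, `R1E`, `N2`); the helper names `binom_pmf`, `rates`, and `p_value` are illustrative and not taken from the original work.

```python
from fractions import Fraction
from math import comb

# Hypothetical toy design, chosen only to illustrate the calculation.
N1 = 5
R1F, R1E = 0, 4
N2 = {1: 4, 2: 3, 3: 2}     # n2(x1), non-increasing in x1


def binom_pmf(k, n, p):
    """Binomial probability of k responses out of n at response rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def rates(point):
    """First-stage and overall response rates of a G2 point, as exact fractions."""
    x1, x2 = point
    return Fraction(x1, N1), Fraction(x1 + x2, N1 + N2[x1])


def p_value(obs, pi0):
    """p-value of an observed G2 point: probability under pi0 of the tail area,
    i.e. all of G3 plus the G2 points whose two rates are at least those of obs."""
    first_obs, overall_obs = rates(obs)
    # Contribution of G3 (first-stage stop for efficacy).
    total = sum(binom_pmf(y1, N1, pi0) for y1 in range(R1E, N1 + 1))
    # Contribution of the G2 points in the tail area.
    for y1 in range(R1F + 1, R1E):
        for y2 in range(N2[y1] + 1):
            first, overall = rates((y1, y2))
            if first >= first_obs and overall >= overall_obs:
                total += binom_pmf(y1, N1, pi0) * binom_pmf(y2, N2[y1], pi0)
    return total
```

Sorting the \(G_{2}\) points by `p_value(point, pi0)` then gives the complete ordering used by the PV approach, whereas the tail-area test inside the loop is exactly the pairwise comparison of the RR approach.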
Once a stochastic ordering of the sample space is defined, we use the CP method to compute the exact one-sided lower limit as the infimum of the collection of \(\pi\)
$$ \left\{\pi:P\big(\Omega_{\varphi}(X_{1},X_{2})|\pi\big)>\alpha\right\}, \tag{1} $$
where \(\varphi\) is the approach used to order the sample space (e.g., PV, RR), \(\Omega_{PV}(X_{1},X_{2})=\left\{\left(X_{1}^{'},X_{2}^{'}\right):P\left(X_{1}^{'},X_{2}^{'}\right)\leq P(X_{1},X_{2})\right\}\), and \(\Omega_{RR}(X_{1},X_{2})=\left\{\left(X_{1}^{'},X_{2}^{'}\right):\left(X_{1}^{'},X_{2}^{'}\right)\in \Theta(X_{1},X_{2})\right\}\). Since the null hypothesis is rejected for a large response rate, we focus on the one-sided lower limit. The proposed approach to compute exact one-sided lower limits can be readily applied to calculate exact upper limits.
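As a rough numerical sketch, the lower limit for a \(G_{2}\) point under the RR ordering can be approximated by scanning a grid of \(\pi\) values and returning the first one at which the probability of \(\Omega_{RR}\) exceeds \(\alpha\). The toy design, the helper names, and the grid step are all assumptions made for illustration; a grid scan is used here instead of root finding so that no monotonicity in \(\pi\) needs to be assumed.

```python
from fractions import Fraction
from math import comb

# Hypothetical toy design, chosen only to illustrate the calculation.
N1 = 5
R1F, R1E = 0, 4
N2 = {1: 4, 2: 3, 3: 2}     # n2(x1), non-increasing in x1


def binom_pmf(k, n, p):
    """Binomial probability of k responses out of n at response rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def rates(point):
    """First-stage and overall response rates of a G2 point, as exact fractions."""
    x1, x2 = point
    return Fraction(x1, N1), Fraction(x1 + x2, N1 + N2[x1])


def omega_rr_prob(obs, pi):
    """P(Omega_RR(obs) | pi): G3 plus the G2 points in the tail area of obs."""
    first_obs, overall_obs = rates(obs)
    prob = sum(binom_pmf(y1, N1, pi) for y1 in range(R1E, N1 + 1))
    for y1 in range(R1F + 1, R1E):
        for y2 in range(N2[y1] + 1):
            first, overall = rates((y1, y2))
            if first >= first_obs and overall >= overall_obs:
                prob += binom_pmf(y1, N1, pi) * binom_pmf(y2, N2[y1], pi)
    return prob


def lower_limit(obs, alpha=0.05, step=1e-4):
    """Smallest grid value of pi at which P(Omega_RR(obs) | pi) > alpha."""
    n_grid = round(1 / step)
    for i in range(n_grid + 1):
        pi = i * step
        if omega_rr_prob(obs, pi) > alpha:
            return pi
    return 1.0
```

The result is accurate only to the grid step. Replacing `omega_rr_prob` with the probability of \(\Omega_{PV}\) would give the corresponding PV limit.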
Theorem 1
For any given response rate \(\pi\), \(P(\Omega_{PV}(X_{1},X_{2})|\pi)\geq P(\Omega_{RR}(X_{1},X_{2})|\pi)\) is always true.
Proof
For sample points from set \(G_{1}\) and set \(G_{3}\), it is easy to show that \(P(\Omega_{PV}(X_{1},X_{2})|\pi)\) is always equal to \(P(\Omega_{RR}(X_{1},X_{2})|\pi)\). For a given sample point \(\left(X_{1}^{'},X_{2}^{'}\right)\in \Omega_{RR}(X_{1},X_{2})\), where sample points \((X_{1},X_{2})\) and \(\left(X_{1}^{'},X_{2}^{'}\right)\) are from set \(G_{2}\), the relationship between their tail areas is
$$\Theta(X_{1},X_{2}) \supseteq \Theta\left(X_{1}^{'},X_{2}^{'}\right). $$
Thus, the p-value of \((X_{1},X_{2})\) is not less than that of \(\left(X_{1}^{'},X_{2}^{'}\right)\): \(P(X_{1},X_{2})\geq P\left(X_{1}^{'},X_{2}^{'}\right)\). It follows that the sample point \(\left(X_{1}^{'},X_{2}^{'}\right)\) belongs to \(\Omega_{PV}(X_{1},X_{2})\). Therefore, \(P(\Omega_{PV}(X_{1},X_{2})|\pi)\) is always greater than or equal to \(P(\Omega_{RR}(X_{1},X_{2})|\pi)\). □
From Theorem 1 and the construction of the exact one-sided interval in Eq. (1), we see that the exact one-sided lower limit based on the RR approach is always greater than or equal to that based on the PV approach.
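The step from Theorem 1 to this comparison can be spelled out as follows, assuming the lower limit is taken as the infimum of the set in Eq. (1). The symbols \(L_{PV}\) and \(L_{RR}\) are introduced here only for illustration, and the argument \((X_{1},X_{2})\) is suppressed:
$$\begin{aligned} P\big(\Omega_{PV}|\pi\big)\geq P\big(\Omega_{RR}|\pi\big)\ \text{for all}\ \pi\ &\Rightarrow\ \left\{\pi:P\big(\Omega_{RR}|\pi\big)>\alpha\right\}\subseteq \left\{\pi:P\big(\Omega_{PV}|\pi\big)>\alpha\right\}\\ &\Rightarrow\ L_{PV}=\inf\left\{\pi:P\big(\Omega_{PV}|\pi\big)>\alpha\right\}\leq \inf\left\{\pi:P\big(\Omega_{RR}|\pi\big)>\alpha\right\}=L_{RR}. \end{aligned} $$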
Coverage probability is defined as
$${} P\Big(\pi \in (L(X_{1},X_{2}),1]\Big)=P\Big((X_{1},X_{2}):L(X_{1},X_{2})<\pi|\pi\Big). \tag{2} $$
A confidence interval is called exact if \(P(\pi\in (L(X_{1},X_{2}),1])\geq 1-\alpha\) is satisfied for any \(\pi\in [0,1]\). We present the coverage probability plots for the adaptive design with design parameters \((\pi_{0},\pi_{1},\alpha,\beta)=(20\%,40\%,0.05,0.2)\) [6] in Fig. 1. It can be seen that the PV approach is exact, with coverage probabilities of at least 95%. However, the RR approach is not exact, as the coverage probability could be as low as 90.5% at the nominal level of 95% in this configuration. One reason for the non-exactness of the RR approach is that the sample points cannot be completely ordered. To overcome this issue, we use the non-exact lower limits from the RR approach to order the sample space again. A new stochastic ordering is created by using the calculated limits. This new ordering can be viewed as a two-step ordering because the ordering is generated after the non-exact limit calculation. This approach is referred to as the RR-A approach.
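The coverage probability can be evaluated by summing the probabilities of the sample points whose lower limit falls below \(\pi\). The sketch below keeps `coverage` generic over the sample space and, as a sanity check only, applies it to a single-stage binomial with CP lower limits, which are known to be exact. For the adaptive design, `points`, `prob`, and `lower` would be replaced by the two-stage sample space, its probabilities, and the limits from the chosen ordering. All names here are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k responses out of n at response rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def cp_lower(x, n, alpha=0.05, tol=1e-12):
    """One-sided Clopper-Pearson lower limit for x responses out of n,
    found by bisection on P(X >= x | pi) = alpha."""
    if x == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        tail = sum(binom_pmf(k, n, mid) for k in range(x, n + 1))
        if tail > alpha:
            hi = mid        # limit lies at or below mid
        else:
            lo = mid        # limit lies above mid
    return hi


def coverage(pi, points, prob, lower):
    """Probability, at response rate pi, of the sample points whose
    lower limit is strictly below pi."""
    return sum(prob(pt, pi) for pt in points if lower[pt] < pi)


# Sanity check on a single-stage binomial with n = 5 (illustrative only).
n = 5
points = list(range(n + 1))
lower = {x: cp_lower(x, n) for x in points}


def prob(x, pi):
    return binom_pmf(x, n, pi)
```

Evaluating `coverage` over a fine grid of \(\pi\) values and taking the minimum is one way to check whether a set of limits attains the nominal level.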
The following three existing approaches to order the sample space have been discussed in the literature [15]. Sample space can be ordered by the average response rate from the study,
$$\frac{X_{1}+X_{2}}{n_{1}+n_{2}(X_{1})}. $$
This approach is referred to as the RR-B approach. It should be noted that this sample space ordering is equivalent to the ordering by the MLE in a one-sample problem [15]. Another existing sample space ordering is based on the likelihood ratio (LR) method, which is given as
$$\frac{X_{1}+X_{2}}{n_{1}+n_{2}(X_{1})} \sqrt{n_{2}(X_{1})}, $$
referred to as the RR-LR approach. Similar to the RR-LR approach, Rosner and Tsiatis [14] discussed another ordering based on the score test:
$$\frac{X_{1}+X_{2}}{n_{1}+n_{2}(X_{1})} {n_{2}(X_{1})}. $$
This approach is referred to as the RR-Score approach. In general, we would expect a substantial number of ties when only the response rate is used to order the sample space in traditional two-stage designs. However, the number of ties could be reduced in adaptive design settings, as the second stage sample size \(n_{2}(X_{1})\) is a non-increasing function of \(X_{1}\), not a constant.
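The three ordering statistics, as written in the formulas above, are simple to compute for each point of set \(G_{2}\). The sketch below uses a hypothetical toy design (`N1`, `R1F`, `R1E`, `N2`); the function names `rr_b`, `rr_lr`, and `rr_score` are illustrative. It also counts the distinct overall response rates to show how ties can be checked.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical toy design, chosen only to illustrate the statistics.
N1 = 5
R1F, R1E = 0, 4
N2 = {1: 4, 2: 3, 3: 2}     # n2(x1), non-increasing in x1


def rr_b(x1, x2):
    """RR-B statistic: overall response rate."""
    return (x1 + x2) / (N1 + N2[x1])


def rr_lr(x1, x2):
    """RR-LR statistic: overall response rate times sqrt(n2(x1))."""
    return rr_b(x1, x2) * sqrt(N2[x1])


def rr_score(x1, x2):
    """RR-Score statistic: overall response rate times n2(x1)."""
    return rr_b(x1, x2) * N2[x1]


# Points of G2 and the number of distinct overall rates (exact arithmetic).
G2 = [(x1, x2) for x1 in range(R1F + 1, R1E) for x2 in range(N2[x1] + 1)]
distinct_rates = {Fraction(x1 + x2, N1 + N2[x1]) for x1, x2 in G2}
```

In this toy design all 12 points of \(G_{2}\) have distinct overall response rates, so the RR-B ordering has no ties among them. With a constant second stage sample size, points with the same total number of responses would share the same rate.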