- Research Article
- Open Access
- Open Peer Review
Efficient confidence limits for adaptive one-arm two-stage clinical trials with binary endpoints
- Guogen Shan1,
- Hua Zhang2 and
- Tao Jiang3
https://doi.org/10.1186/s12874-017-0297-5
© The Author(s) 2017
- Received: 6 October 2016
- Accepted: 23 January 2017
- Published: 6 February 2017
Abstract
Background
Recently, several adaptive one-arm two-stage designs have been developed that fully use the information from previous stages to reduce the expected sample size in clinical trials with a binary endpoint as the primary outcome. It is important to compute exact confidence limits for these studies.
Methods
In this article, we propose three new one-sided limits by ordering the sample space based on p-value, average response rate at each stage, and asymptotic lower limit, and compare them with three existing sample space ordering approaches based on average response rate. Among the three proposed approaches, the one based on the average response rate at each stage is not exact, and the remaining two approaches are exact with the coverage probability guaranteed.
Results
We compare these exact intervals by using the two commonly used criteria: simple average length and expected length. The existing three approaches based on average response rate have similar performance, and they have shorter expected lengths than the two proposed exact approaches although the gain is small, while this trend is reversed under the simple average criterion.
Conclusions
We would recommend the two exact proposed approaches based on p-value and asymptotic lower limit under the simple average length criterion, and the approach based on average response rate under the expected length criterion.
Keywords
- Adaptive design
- Clopper-Pearson approach
- Exact one-sided interval
- Response rate
- Two-stage design
Background
To assess the activity of a new treatment in a cancer clinical trial, Simon’s two-stage designs [1] are traditionally used among the multi-stage designs. Simon’s two-stage designs can be improved by allowing the second stage sample size to depend on the number of responses observed from the first stage, which gives an adaptive two-stage design [2–6]. Recently, Shan et al. [6] developed an adaptive one-arm two-stage design in which the second stage sample size is a non-increasing function of the number of responses from the first stage. This monotonic sample size constraint is considered an intuitive property of adaptive two-stage designs: fewer participants are needed in the second stage when more responses are observed in the first stage. It should be noted that these adaptive designs [6] often allow early stopping in the first stage due to futility or efficacy, while Simon’s design only allows stopping for futility in the first stage. In these studies, the hypothesis for testing the activity of the new treatment is often one-sided.
Upon completion of an adaptive clinical trial, it is important to provide statistical inference based on the number of responses and the number of participants in each stage. Recently, Zhao et al. [7] proposed a likelihood based approach to construct a confidence interval for a study that is designed by Simon’s two-stage method but with an unplanned second stage sample size [8, 9]. The likelihood approach was shown to perform well with regard to coverage probability and coverage bias, but this interval is asymptotic. To guarantee the coverage probability, Shan [10] proposed several new exact confidence intervals based on exact binomial distribution calculation. These intervals are developed for traditional Simon’s two-stage designs, whose second stage sample size is fixed regardless of the number of responses observed from the first stage, as long as that number is over the threshold to move to the second stage.
In this article, we consider statistical inference for adaptive two-stage designs whose second stage sample sizes depend on the first stage responses. Adaptive designs are generally flexible and effective as compared to the traditional Simon’s design; however, they are often computationally intensive due to the many design parameters. With the new adaptive designs being proposed, it is important to develop statistical inference for these designs. To preserve the nominal coverage probability, we propose developing exact one-sided confidence intervals for the response rate in an adaptive two-stage design setting. Confidence intervals are computed by using exact binomial distributions instead of asymptotic distributions. The hypothesis for these trials is often one-sided and the null hypothesis is rejected when a high response rate is observed. For this reason, we focus our approach on exact one-sided lower intervals. The interval is computed by using the approach developed by Clopper and Pearson [11], who proposed the commonly used exact one-sided intervals for a binomial proportion. This approach has to be used in conjunction with a method to order the sample space.
Multiple approaches have been proposed to order the sample space for multi-stage designs [9, 12–18]. Four orderings were discussed by Jennison and Turnbull [15] for inference after a group sequential design: stage-wise ordering, maximum likelihood estimate (MLE) ordering, likelihood ratio (LR) ordering, and score test (Score) ordering. Lower and upper confidence limits are used in the first ordering. When the outcome is binary, the MLE ordering is equivalent to the ordering by average response rate, which is the number of responses divided by the total sample size in the study. The last two orderings depend on the average response rate and the sample size from that stage.
In addition to the three aforementioned sample space orderings for a study with binary endpoints, we propose three new methods to order the sample space. The first method is based on the response rates from the first and second stages. We consider this an intuitive method for ordering the sample space based on the information from both stages. However, we find that not all sample points can be ordered by using this method. This leads to the situation in which the nominal coverage probability is not guaranteed. Although the lower limits from the first method are not exact, they can be used as a measurement to order the sample space again. In the second method, each sample point has a unique order number based on the asymptotic lower limit from the first method. The third method uses the p-value of each sample point to create a new ordering of the sample space. We find that the ordering based on the p-value has a very interesting relationship with that from the first method.
In the “Methods” section, we first introduce the basic settings for adaptive two-stage designs, then propose three new methods to order the sample space. When a study is stopped in the first stage due to either futility or efficacy, the ordering positions of the corresponding sample points are the same in each method. For this reason, we focus on the ordering of sample points when a trial goes to the second stage. We then investigate the coverage probability for each interval. In the “Results” section, we compare the performance of the two proposed exact approaches and the three existing approaches with regard to simple average length (AL) and expected length (EL). These approaches are compared by using the complete sample space. In addition to that comparison, we also introduce a new subsample space including the sample points whose second stage response rate is within the confidence interval of the first stage response rate. In other words, if a study’s first stage response rate is very different from its second stage response rate, the study population could have changed (e.g., disease status, gender ratio), and other approaches should be explored for such cases. For this reason, we also utilize this new sample space in the performance comparison among the exact intervals. Finally, we conclude our research with a discussion in the “Discussion” section.
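As a point of reference, the Clopper and Pearson [11] one-sided lower limit for a single binomial sample can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `upper_tail` and `cp_lower_limit` and the bisection tolerance are assumptions. It solves P(X ≥ x | π) = α for π, which is the building block that the ordering-based limits for two-stage designs extend.

```python
from math import comb

def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def cp_lower_limit(x, n, alpha=0.05, tol=1e-10):
    """One-sided (1 - alpha) Clopper-Pearson lower limit for a binomial proportion.

    The upper tail probability increases with p, so bisection finds the
    value of p at which it equals alpha.
    """
    if x == 0:
        return 0.0  # no responses: the lower limit is 0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail(x, n, mid) > alpha:
            hi = mid   # mid is inside the confidence set, move down
        else:
            lo = mid   # mid is still rejected, move up
    return lo

# Example call: 11 responses among 22 patients, 95% one-sided lower limit.
print(cp_lower_limit(11, 22))
```

When all n patients respond, the limit has the closed form α^(1/n), which gives a quick sanity check of the bisection.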
Methods
Upon completion of an adaptive clinical trial, a confidence interval for the response rate should be computed and reported. The hypothesis for testing the activity of the new treatment is often one-sided, and the confidence interval and the hypothesis test should be consistent with each other. For this reason, we focus our interest on one-sided lower intervals, as the null hypothesis is rejected when a high response rate is observed. When the significance level is α, a 1−α one-sided interval, (L,1], should be computed for statistical inference, where L is the 1−α lower limit.
This approach is referred to as the RR approach. This ordering is motivated by the p-value calculation for a two-stage study. The rejection region includes the extreme outcomes whose first stage and second stage responses are at least as large as the observed data [8–10, 15].
The response rates are used to define the tail area in the p-value calculation. Since the p-value is used to order the sample space in this approach, we call it the PV approach. Although the p-value calculation may not guarantee the type I error rate, it is still a valid measurement to order the sample space. In this approach, we sort the sample points by p-value from smallest to largest. In the PV approach, every sample point has its order number based on its p-value, from 1 to the size of the sample space. In the RR approach, two sample points from set \(G_{2}\) can be ordered only if one sample point belongs to the other’s tail area.
where φ is an approach used to order the sample space (e.g., PV, RR), \(\Omega_{PV}(X_{1},X_{2})=\left\{\left(X_{1}^{'},X_{2}^{'}\right): P\left(X_{1}^{'},X_{2}^{'}\right)\leq P(X_{1},X_{2})\right\}\), and \(\Omega_{RR}(X_{1},X_{2})=\left\{\left(X_{1}^{'},X_{2}^{'}\right): \left(X_{1}^{'},X_{2}^{'}\right)\in \Theta(X_{1},X_{2})\right\}\). Since the null hypothesis is rejected for a large response rate, we focus on the one-sided lower limit. The proposed approach to compute exact one-sided lower limits can be readily applied to calculate exact upper limits.
Theorem 1
For any given response rate π, \(P(\Omega_{PV}(X_{1},X_{2})|\pi)\geq P(\Omega_{RR}(X_{1},X_{2})|\pi)\) is always true.
Proof
Thus, the p-value of \((X_{1},X_{2})\) is not less than that of \(\left(X_{1}^{'},X_{2}^{'}\right)\): \(P(X_{1},X_{2})\geq P\left(X_{1}^{'},X_{2}^{'}\right)\). It follows that the sample point \(\left(X_{1}^{'},X_{2}^{'}\right)\) belongs to \(\Omega_{PV}(X_{1},X_{2})\). Therefore, \(P(\Omega_{PV}(X_{1},X_{2})|\pi)\) is always greater than or equal to \(P(\Omega_{RR}(X_{1},X_{2})|\pi)\). □
From Theorem 1 and the construction of the exact one-sided interval in Eq. (1), we see that the exact one-sided lower limit based on the RR approach is always greater than or equal to that based on the PV approach.
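Theorem 1 and its consequence for the lower limits can be checked numerically on a small design. The sketch below is not the authors' implementation and rests on several assumptions: the design (`N1`, `N2`, `EFFICACY`, `PI0`) is hypothetical; the tail area Θ is taken to contain the second-stage outcomes whose first stage responses and second stage response rate are both at least as large as observed; outcomes stopped early for efficacy are treated as more extreme than any second-stage outcome; and the lower limit is approximated on a grid as the smallest π whose tail probability exceeds α.

```python
from math import comb

# Hypothetical adaptive design, for illustration only.
N1 = 10                              # first-stage sample size
N2 = {3: 15, 4: 12, 5: 10, 6: 8}     # non-increasing second-stage sizes n2(x1)
EFFICACY = range(7, N1 + 1)          # x1 values that stop early for efficacy
PI0 = 0.2                            # null response rate used for p-values

def binom(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Outcomes (x1, x2) of trials that proceed to the second stage.
G2 = [(x1, x2) for x1 in N2 for x2 in range(N2[x1] + 1)]

def prob(pt, p):
    x1, x2 = pt
    return binom(x1, N1, p) * binom(x2, N2[x1], p)

def prob_efficacy(p):
    return sum(binom(x1, N1, p) for x1 in EFFICACY)

def theta(obs):
    """Assumed tail area: x1' >= x1 and x2'/n2(x1') >= x2/n2(x1)."""
    x1, x2 = obs
    return [(y1, y2) for (y1, y2) in G2
            if y1 >= x1 and y2 * N2[x1] >= x2 * N2[y1]]

def tail(omega, p):
    """Probability of the ordered set plus the early-efficacy outcomes."""
    return prob_efficacy(p) + sum(prob(q, p) for q in omega)

# p-value of every second-stage outcome, evaluated at the null rate.
PVAL = {pt: tail(theta(pt), PI0) for pt in G2}

def omega_rr(obs):
    return theta(obs)

def omega_pv(obs):
    # Small tolerance guards against floating-point noise in the comparison.
    return [pt for pt in G2 if PVAL[pt] <= PVAL[obs] + 1e-12]

def lower_limit(omega, alpha=0.05, steps=2000):
    """Smallest grid value of pi whose tail probability exceeds alpha."""
    for i in range(steps + 1):
        p = i / steps
        if tail(omega, p) > alpha:
            return p
    return 1.0

obs = (4, 5)  # hypothetical observed outcome
print("RR limit:", lower_limit(omega_rr(obs)))
print("PV limit:", lower_limit(omega_pv(obs)))
```

Under these assumptions the RR set is always contained in the PV set, because the assumed tail area is transitive, so the PV tail probability is never smaller and the PV lower limit is never larger than the RR lower limit.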
Coverage probability for 95% one-sided lower intervals of the four approaches for the AdaptiveS two-stage design with design parameters \((\pi_{0},\pi_{1},\alpha,\beta)=(20\%,40\%,0.05,0.2)\). RR approach: top left; PV approach: top right; RR-A approach: bottom left; RR-B approach: bottom right
This approach is referred to as the RR-Score approach. In general, we would expect a significant number of ties when only the response rate is used to order the sample space in traditional two-stage designs. However, the number of ties could be reduced in adaptive design settings, as the second stage sample size \(n_{2}(X_{1})\) is a non-increasing function of \(X_{1}\), not a constant.
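The remark about ties can be illustrated by counting distinct average response rates in a fixed and an adaptive design. The two designs below are hypothetical (they are not taken from the article), and the average response rate is taken to be the total number of responses divided by the total sample size; exact fractions are used so that equal rates are detected without rounding error.

```python
from fractions import Fraction

N1 = 10                                     # hypothetical first-stage sample size
CONTINUE = range(3, 7)                      # hypothetical x1 values that proceed to stage 2
FIXED_N2 = {x1: 10 for x1 in CONTINUE}      # traditional design: constant n2
ADAPTIVE_N2 = {3: 15, 4: 12, 5: 10, 6: 8}   # adaptive design: non-increasing n2(x1)

def rate_counts(n2):
    """Return (number of sample points, number of distinct average response rates)."""
    rates = [Fraction(x1 + x2, N1 + n2[x1])
             for x1 in CONTINUE
             for x2 in range(n2[x1] + 1)]
    return len(rates), len(set(rates))

fixed_points, fixed_distinct = rate_counts(FIXED_N2)
adaptive_points, adaptive_distinct = rate_counts(ADAPTIVE_N2)

# Points in excess of the distinct rates share their rate with another point.
print("fixed n2:    surplus tied points =", fixed_points - fixed_distinct)
print("adaptive n2: surplus tied points =", adaptive_points - adaptive_distinct)
```

With a constant second stage size, every pair of outcomes with the same total number of responses is tied; varying denominators in the adaptive design break most of these ties.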
Results
Confidence interval length comparison for each sample point from the adaptive two-stage design with design parameters \((\pi_{0},\pi_{1},\alpha,\beta)=(30\%,50\%,0.05,0.1)\) when the AdaptiveS design is used. The RR-B approach, the RR-LR approach, and the RR-Score approach are compared
We present the coverage probability for the adaptive design with design parameters \((\pi_{0},\pi_{1},\alpha,\beta)=(20\%,40\%,0.05,0.2)\) in Fig. 1. It can be seen that the RR-A approach, the RR-B approach, and the PV approach guarantee the coverage probability while the RR approach does not. For this reason, the RR approach is not included in the following performance comparison. We have illustrated that these three approaches are exact with coverage probability guaranteed. Two criteria, simple average length and expected length, are used to compare the performance of these exact limits. As aforementioned, set \(G_{1}\) and set \(G_{3}\) represent a study being stopped in the first stage due to futility or efficacy, respectively. Their lower limits are the same for the three approaches, and they are in fact the exact limits based on the CP method for a binomial proportion. For this reason, we exclude these sample points from the performance comparison.
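The two criteria can be sketched in code under their usual definitions, which are assumed here rather than quoted from the article: the length of a one-sided interval (L,1] is 1−L, the simple average length is the unweighted mean of the lengths over the sample points considered, and the expected length weights each length by the probability of its sample point at a given response rate π. The design sizes and the lower limits in the example are placeholder numbers used only to exercise the functions.

```python
from math import comb

def binom(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def average_length(limits):
    """Unweighted mean of the interval lengths 1 - L over the sample points."""
    return sum(1 - low for low in limits.values()) / len(limits)

def expected_length(limits, n1, n2, pi):
    """Sum of interval lengths weighted by P((x1, x2) | pi) in a two-stage design."""
    total = 0.0
    for (x1, x2), low in limits.items():
        weight = binom(x1, n1, pi) * binom(x2, n2[x1], pi)
        total += (1 - low) * weight
    return total

# Placeholder inputs (hypothetical design and made-up lower limits).
n1 = 5
n2 = {2: 4, 3: 3}
limits = {(2, 1): 0.10, (2, 3): 0.30, (3, 2): 0.35}

print("AL:", average_length(limits))
print("EL at pi = 0.4:", expected_length(limits, n1, n2, pi=0.4))
```

The simple average treats every sample point equally, whereas the expected length depends on π, which is why the two criteria can rank the same set of limits differently.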
Average length comparison among the three exact approaches (the RR-A approach, the RR-B approach, and the PV approach), for the 16 different configurations with \(\pi_{1}=\pi_{0}+20\%\) when the AdaptiveS design is used. The sample space \(G_{2}\) (left side) and the subsample space \(G_{2}(CI)\) (right side) are used in computing the 95% one-sided exact lower limits. An approach with a lower average length is preferable
Confidence interval length comparison for each sample point from the adaptive two-stage design with design parameters \((\pi_{0},\pi_{1},\alpha,\beta)=(30\%,50\%,0.05,0.1)\) when the AdaptiveS design is used. The RR-A approach, the RR-B approach, and the PV approach are compared by using the sample space \(G_{2}\) (left side) and the subsample space \(G_{2}(CI)\) (right side)
Expected length comparison among the RR-A approach, the RR-B approach, and the PV approach, for the adaptive two-stage design with design parameters \((\pi_{0},\pi_{1},\alpha,\beta)=(20\%,40\%,0.05,0.1)\) when the AdaptiveS design is used. The sample space \(G_{2}\) (left side) and the subsample space \(G_{2}(CI)\) (right side) are used to compute the expected lengths
Example
Since the considered adaptive two-stage designs are relatively new, we do not expect to find a real data set to use. For this reason, we assume that a study is designed by using the AdaptiveS design with design parameters \((\pi_{0},\pi_{1},\alpha,\beta)=(30\%,50\%,0.05,0.1)\). In the first stage, \(X_{1}=11\) responses are observed among the \(n_{1}=22\) patients. Then, the required number of patients for the second stage is \(n_{2}(X_{1})=35\). At the end of the study, we assume that \(X_{2}=13\) responses are observed from the second stage. Thus, the total number of responses is 11+13=24, and the average response rate is estimated as 24/(22+35)=42.1%. The 95% lower limit for the response rate is calculated as 0.320 for the PV approach, 0.325 for the RR-A approach, 0.317 for the RR-B approach, 0.317 for the RR-LR approach, 0.317 for the RR-Score approach, and 0.344 for the RR approach. It can be seen that the RR-B approach, the RR-LR approach, and the RR-Score approach have very similar lower limits, and they are smaller than the others. The RR approach has the largest lower limit among these approaches.
Thus, the sample point \((X_{1}=12,X_{2}=11)\) does not belong to the observed data’s confidence set either. Therefore, the coverage probability at π=0.343 could be less than the nominal level. In other words, in a two-stage design, inverting the p-value function is exact only when the sample space can be ordered completely by a test statistic or a measurement.
Discussion
In addition to the proposed confidence intervals for statistical inference in adaptive two-stage designs, an unbiased response rate estimate and exact p-value calculation are two important future research topics. Due to the multi-stage nature of the design, the naive estimate that divides the total number of responses by the total number of participants is biased. Jung and Kim [21] proposed an unbiased response rate estimate for traditional non-adaptive two-stage designs, where the second stage sample size is a constant for a trial proceeding to the second stage. In adaptive two-stage design settings, the second stage sample size is a non-increasing function of the responses from the first stage. We leave the development of an unbiased response rate estimate and exact p-value calculations in adaptive two-stage design settings as future work.
Conclusions
Multiple adaptive two-stage designs have been proposed for use in practice, but there is limited research on analyzing the data once an adaptive study is finished. This article proposes three approaches to construct one-sided lower limits. Among these three new approaches, two guarantee the coverage probability. We compare these two exact intervals with three existing approaches with regard to two commonly used criteria: simple average length and expected length. The RR-A approach and the PV approach have similar performance with regard to these two criteria, although the RR-A approach performs slightly better than the PV approach under the AL criterion. The RR-B approach has a slightly longer average length than the other two approaches when all sample points are used in the calculation, and the difference becomes negligible when the subsample space \(G_{2}(CI)\) is used. In addition, the RR-B approach generally has a shorter expected length than the RR-A approach and the PV approach when the sample space \(G_{2}(CI)\) is used [22–25].
Declarations
Acknowledgements
We would like to thank the two reviewers and the associate editor for their comments, which helped us to improve the manuscript.
Funding
Shan’s research is partially supported by grants from the National Institute of General Medical Sciences from the National Institutes of Health: P20GM109025, P20GM103440, and 5U54GM104944. Zhang’s work was supported by the Zhejiang Provincial Natural Science Foundation of China (grant no. LY15F020001) and the National Natural Science Foundation of China (grant no. 61170099).
Availability of data and materials
This manuscript develops novel statistical methods; therefore, no real data are involved.
Authors’ contributions
The idea for the paper was originally developed by GS and TJ. GS and HZ computed exact confidence intervals for adaptive one-arm two-stage designs in this paper. GS, HZ and TJ drafted the manuscript, revised the paper critically and approved the final version.
Authors’ information
Guogen Shan: Epidemiology and Biostatistics Program, Department of Environmental and Occupational Health, School of Community Health Sciences, University of Nevada Las Vegas, 89154 Las Vegas, NV, USA. Hua Zhang: School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, 310018 Zhejiang, China. Tao Jiang (Corresponding author): Department of Statistics, Zhejiang Gongshang University, Hangzhou, 310018 Zhejiang, China.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
- Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials. 1989; 10(1):1–10.
- Berry DA. Adaptive clinical trials: the promise and the caution. J Clin Oncol. 2011; 29(6):606–9. doi:http://dx.doi.org/10.1200/jco.2010.32.2685.
- Yin G. Clinical trial design: Bayesian and frequentist adaptive methods, 1st edn: Wiley; 2012. http://www.worldcat.org/isbn/0470581719.
- Lin Y, Shih WJ. Adaptive two-stage designs for single-arm phase IIA cancer clinical trials. Biometrics. 2004; 60(2):482–90.
- Shan G. Exact statistical inference for categorical data: Academic Press; 2016. http://store.elsevier.com/Exact-Statistical-Inference-for-Categorical-Data/Guogen-Shan/isbn-9780081006818/.
- Shan G, Wilding GE, Hutson AD, Gerstenberger S. Optimal adaptive two-stage designs for early phase II clinical trials. Statist Med. 2016; 35(8):1257–66. doi:http://dx.doi.org/10.1002/sim.6794.
- Zhao J, Yu M, Feng X-PP. Statistical inference for extended or shortened phase II studies based on Simon’s two-stage designs. BMC Med Res Methodol. 2015; 15(1):48. doi:http://dx.doi.org/10.1186/s12874-015-0039-5.
- Koyama T, Chen H. Proper inference from Simon’s two-stage designs. Stat Med. 2008; 27(16):3145–54.
- Jovic G, Whitehead J. An exact method for analysis following a two-stage phase II cancer clinical trial. Stat Med. 2010; 29(30):3118–25.
- Shan G. Exact confidence limits for the response rate in two-stage designs with over or under enrollment in the second stage. Stat Methods Med Res. 2016. In press.
- Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934; 26(4):404–13. doi:http://dx.doi.org/10.1093/biomet/26.4.404.
- Duffy DE, Santner TJ. Confidence intervals for a binomial parameter based on multistage tests. Biometrics. 1987; 43(1):81–93.
- Atkinson EN, Brown BW. Confidence limits for probability of response in multistage phase II clinical trials. Biometrics. 1985; 41(3):741–4.
- Rosner GL, Tsiatis AA. Exact confidence intervals following a group sequential trial: a comparison of methods. Biometrika. 1988; 75(4):723–9. doi:http://dx.doi.org/10.1093/biomet/75.4.723.
- Jennison C, Turnbull BW. Group Sequential Methods (Chapman & Hall/CRC Interdisciplinary Statistics), 1st edn: Chapman and Hall/CRC; 1999. http://www.worldcat.org/isbn/0849303168.
- Shan G, Ma C. Unconditional tests for comparing two ordered multinomials. Stat Methods Med Res. 2016; 25(1):241–54. doi:http://dx.doi.org/10.1177/0962280212450957.
- Shan G, Ma C, Hutson AD, Wilding GE. An efficient and exact approach for detecting trends with binary endpoints. Stat Med. 2012; 31(2):155–64. doi:http://dx.doi.org/10.1002/sim.4411.
- Shan G, Ma C, Hutson AD, Wilding GE. Randomized two-stage phase II clinical trial designs based on Barnard’s exact test. J Biopharmaceutical Stat. 2013; 23(5):1081–90. doi:http://dx.doi.org/10.1080/10543406.2013.813525.
- Scherer R. PropCIs: Various Confidence Interval Methods for Proportions. R package version 19–0. 2013.
- Banerjee A, Tsiatis AA. Adaptive two-stage designs in phase II clinical trials. Stat Med. 2006; 25(19):3382–95.
- Jung S-HH, Kim KMM. On the estimation of the binomial probability in multistage clinical trials. Stat Med. 2004; 23(6):881–96. doi:http://dx.doi.org/10.1002/sim.1653.
- Wang W, Shan G. Exact confidence intervals for the relative risk and the odds ratio. Biometrics. 2015; 71(4):985–995.
- Shan G. Exact statistical inference for categorical data, 1st edn: Academic Press; 2015. http://www.worldcat.org/isbn/0081006810.
- Shan G, Wang W. ExactCIdiff: an R package for computing exact confidence intervals for the difference of two proportions. R Journal. 2013; 5(2):62–71.
- Wang W. Exact optimal confidence intervals for hypergeometric parameters. J Am Stat Assoc. 2015;(512). In press. doi:http://dx.doi.org/10.1080/01621459.2014.966191.