Adaptive designs based on the truncated product method
 Markus Neuhäuser^{1} and
 Frank Bretz^{2}
DOI: 10.1186/1471-2288-5-30
© Neuhäuser and Bretz; licensee BioMed Central Ltd. 2005
Received: 12 December 2004
Accepted: 19 September 2005
Published: 19 September 2005
Abstract
Background
Adaptive designs are becoming increasingly important in clinical research. One approach subdivides the study into several (two or more) stages and combines the p-values of the different stages using Fisher's combination test.
Methods
As an alternative to Fisher's test, the recently proposed truncated product method (TPM) can be applied to combine the p-values. The TPM uses the product of only those p-values that do not exceed some fixed cutoff value. Here, these two competing analyses are compared.
Results
When early termination due to insufficient effects is not appropriate, as in dose-response analyses, applying the TPM increases the probability of stopping the trial early with rejection of the null hypothesis. The expected total sample size therefore decreases, and this decrease is not accompanied by a loss in power. The TPM turns out to be less advantageous when early termination of the study due to insufficient effects is possible, because the probability of stopping the trial early then decreases.
Conclusion
It is recommended to apply the TPM rather than Fisher's combination test whenever an early termination due to insufficient effects is not suitable within the adaptive design.
Background
Randomized controlled experiments were introduced by Sir Ronald A. Fisher in the 1920s for agricultural studies and not in order to compare the effects of different treatments in humans. However, according to Palmer [1], the way clinical trials are conducted today is essentially unchanged from Fisher's day. In contrast to agricultural studies, most clinical trials require periodic monitoring of the accumulating data, e.g. to minimize the number of experimental patients who will continue with an inferior treatment [2, p. 360].
Adaptive designs with at least one interim analysis can potentially be used for periodic monitoring. All information from the first stage(s) can be used to plan the following stage(s). A number of adaptive designs have been proposed recently; for an overview see Bauer et al. [3]. Here, we consider the adaptive procedure according to Bauer and Köhne [4] that uses Fisher's product test.
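Let k be the number of stages (i.e., there are k − 1 interim analyses), and let p_{i} be the one-sided p-value observed with the ith stage's data, i = 1, ..., k. According to Fisher's product criterion [5, pp. 37–39] the null hypothesis H_{0} can be rejected at the end of the trial if

$$\prod_{i=1}^{k} p_i \le c_\alpha = \exp\left(-\tfrac{1}{2}\,\chi^2_{2k,\,1-\alpha}\right),$$

where χ^{2}_{2k, 1−α} denotes the (1 − α)-quantile of the central χ^{2}-distribution with 2k degrees of freedom.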
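The critical value of Fisher's product criterion can be checked numerically without statistical tables. The following is a minimal sketch, assuming the usual form c_{α} = exp(−χ^{2}_{2k, 1−α}/2); it uses the closed-form distribution of a product of k independent uniform p-values and a simple bisection. The function names are illustrative and not taken from the paper.

```python
from math import factorial, log


def fisher_product_cdf(w, k):
    """Pr(p_1 * ... * p_k <= w) for k independent uniform p-values, 0 < w < 1."""
    x = -log(w)
    return w * sum(x ** s / factorial(s) for s in range(k))


def fisher_critical_value(k, alpha, tol=1e-12):
    """Find c with fisher_product_cdf(c, k) = alpha by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fisher_product_cdf(mid, k) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for k in (2, 3, 4):
    print(k, round(fisher_critical_value(k, 0.05), 5))
```

For k = 2 and α = 0.05 the result agrees with the value c_{α} = 0.0087 quoted from Bauer and Köhne [4].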
In clinical trials boundaries for early stopping after an interim analysis may be incorporated. Obviously, in the case of p_{1} ≤ c_{α}, early stopping with the rejection of H_{0} is possible after stage one. In general, H_{0} can be rejected after the jth stage if p_{1}·...·p_{j} ≤ c_{α}. In addition, one may terminate the trial due to insufficient effects. A lower limit α_{0} can be included so that the trial is terminated without rejecting H_{0} if p_{1} ≥ α_{0}. According to Bauer and Köhne [4, p. 1031] a value of 0.5 may be a suitable choice for α_{0}. Bauer and Röhmel [6, p. 1596] recommended α_{0} = 1 for establishing a dose-response relationship, that is, no early stopping without rejecting H_{0} at all. In this context, early stopping due to insufficient effects is not feasible since doses in a plateau region could have been used. In that case, different doses may be used in the following stage.
Note that, in case of α_{0} < 1, larger boundaries apply for early stopping with the rejection of H_{0}. For a two-stage design, one can reject H_{0} after stage one if p_{1} ≤ α_{1} for a value of α_{1} that lies between c_{α} and α [4, 6]. This value can be calculated iteratively using the formula [4, p. 1032]

$$\alpha_1 + c_\alpha\,(\ln \alpha_0 - \ln \alpha_1) = \alpha.$$
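The iterative calculation of α_{1} can be sketched as a bisection, assuming the level condition of Bauer and Köhne [4] in the form α_{1} + c_{α}(ln α_{0} − ln α_{1}) = α. The left-hand side is increasing in α_{1} on the interval (c_{α}, α), so a root search on that interval suffices. The function name is hypothetical.

```python
from math import log


def early_rejection_bound(c_alpha, alpha0, alpha, tol=1e-12):
    """Solve alpha1 + c_alpha * (ln(alpha0) - ln(alpha1)) = alpha on (c_alpha, alpha)."""

    def level(a1):
        return a1 + c_alpha * (log(alpha0) - log(a1))

    lo, hi = c_alpha, alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if level(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two-stage design with alpha = 0.05, alpha0 = 0.5 and c_alpha = 0.0087
print(early_rejection_bound(0.0087, 0.5, 0.05))
```

With these inputs the solution is close to the value α_{1} = 0.0233 reported for α_{0} = 0.5.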
As an alternative to Fisher's product test, Zaykin et al. [7] recently introduced a truncated product method for combining p-values. To be precise, instead of calculating the product of all p-values, they suggested the use of the product of only those p-values that do not exceed some fixed cutoff value τ, 0 < τ ≤ 1. The truncated product W_{τ} is defined as

$$W_\tau = \prod_{i=1}^{k} p_i^{\,I(p_i \le \tau)},$$
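where I(.) is the indicator function. Since the p-values of the different stages are independent, the distribution of W_{τ} under H_{0} is, for w < 1 [7],

$$\Pr(W_\tau \le w) = \sum_{m=1}^{k} \binom{k}{m} (1-\tau)^{k-m} \left( w \sum_{s=0}^{m-1} \frac{(m \ln \tau - \ln w)^{s}}{s!}\, I(w \le \tau^{m}) + \tau^{m}\, I(w > \tau^{m}) \right).$$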
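When using the truncated product method, the critical value for the combination test is the α-quantile of the distribution of W_{τ} under H_{0}, that is, the value w with Pr(W_{τ} ≤ w) = α, because small values of W_{τ} lead to rejection. Analogous to Fisher's combination test, a boundary for early rejection after stage one can be calculated for a given α_{0} such that the overall type I error rate is α.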
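The critical value of the truncated product test can be obtained numerically. The sketch below assumes the null distribution of W_{τ} given by Zaykin et al. [7] for independent uniform p-values and solves Pr(W_{τ} ≤ w) = α by bisection; it covers the case without a futility bound (α_{0} = 1). The names `tpm_cdf` and `tpm_critical_value` are illustrative only.

```python
from math import comb, factorial, log


def tpm_cdf(w, k, tau):
    """Pr(W_tau <= w) under H0 for k independent uniform p-values."""
    if w <= 0.0:
        return 0.0
    if w >= 1.0:
        return 1.0
    total = 0.0
    for m in range(1, k + 1):
        # weight for "exactly m of the k p-values are <= tau" (tau^m is in `inner`)
        weight = comb(k, m) * (1.0 - tau) ** (k - m)
        if w <= tau ** m:
            a = m * log(tau) - log(w)
            inner = w * sum(a ** s / factorial(s) for s in range(m))
        else:
            inner = tau ** m
        total += weight * inner
    return total


def tpm_critical_value(k, tau, alpha, tol=1e-12):
    """Find w with tpm_cdf(w, k, tau) = alpha by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tpm_cdf(mid, k, tau) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for k in (2, 3, 4):
    print(k, tpm_critical_value(k, 0.5, 0.05))
```

Setting τ = 1 reduces the computation to Fisher's product criterion, which gives a simple consistency check.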
Zaykin et al. [7] and Neuhäuser [8] investigated the truncated product method for combining a large number of p-values and demonstrated by simulation that it can provide high power. In this paper we investigate whether the truncated product method is also useful for the adaptive design described above. In contrast to previous applications [7, 8] we consider classical experimental questions involving only a few p-values. Very recently, a rank truncated product was proposed as a further alternative [9]. That method uses the product of the K most significant p-values where K can be chosen. Since we consider the combination of 2 to 4 p-values only, the rank truncated product does not seem to be appropriate for our aim.
We first present the comparison of the combinations with and without truncation for designs with two stages. Afterwards, designs with more than two stages are investigated. We then illustrate the method using two examples, and conclusions are given in the final section.
Methods
In order to compare the adaptive procedures with and without truncation we consider the situation of two parallel groups with means μ_{1} and μ_{2}. There are 100 observations per stage. These observations are subdivided into two groups and are assumed to be normally distributed with a common, but unknown variance σ^{2}. Student's t test is performed in each of the two stages with a one-sided significance level of α = 5%.
The overall p-value, i.e. the p-value of the combination test, is defined as follows [10]: if the study stops after stage 1, the overall p-value equals p_{1}. Otherwise, it is computed from the product p_{1}p_{2} for Fisher's combination test and from the truncated product W_{τ} for the truncated product test.
The case α_{0} = 0.5
First, we consider a study that is terminated early due to insufficient effects if p_{1} ≥ α_{0} = 0.5. Without any truncation (i.e., τ = 1) we have c_{α} = 0.0087 and α_{1} = 0.0233 in this case [4]. However, when we set τ = α_{0} = 0.5, a smaller boundary for early rejection but a larger critical value for the final analysis is obtained. To be precise, the trial can be terminated early with the rejection of H_{0} if p_{1} ≤ 0.0190, and the result is significant at the end of the trial if W_{τ = 0.5} ≤ 0.0095.
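That both pairs of boundaries keep the overall level close to α = 0.05 can be checked with a short calculation. The sketch below is our own derivation under stated assumptions, not a formula from the paper: it assumes independent uniform p-values under H_{0} and the ordering (final critical value) ≤ (early rejection boundary) < α_{0} ≤ τ, so that p_{1} always enters the product when the trial continues. The function name and argument names are hypothetical.

```python
from math import log


def level_two_stage(a1, b, alpha0, tau):
    """Type I error of: reject if p1 <= a1; stop if p1 >= alpha0;
    otherwise reject at the end if the truncated product is <= b.

    Sketch only: requires b <= a1 < alpha0 <= tau.
    """
    if not (b <= a1 < alpha0 <= tau):
        raise ValueError("sketch only covers b <= a1 < alpha0 <= tau")
    # Given p1 in (a1, alpha0), stage two rejects with probability min(tau, b / p1).
    split = min(max(b / tau, a1), alpha0)  # below `split` the bound is capped at tau
    capped = tau * (split - a1)
    uncapped = b * (log(alpha0) - log(split))
    return a1 + capped + uncapped


print(level_two_stage(0.0233, 0.0087, 0.5, 1.0))  # without truncation
print(level_two_stage(0.0190, 0.0095, 0.5, 0.5))  # truncation at tau = 0.5
```

Both calls return values near 0.05; small deviations are expected because the boundaries are rounded to four decimals.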
Table I. Power to reject H_{0} in a two-stage design with α_{0} = 0.5 (combination of t tests, one-sided, α = 0.05)

| Observations per group (stage one / stage two) | τ | δ = 0.1 | δ = 0.2 | δ = 0.3 | δ = 0.4 | δ = 0.5 |
| --- | --- | --- | --- | --- | --- | --- |
| 25 / 75 | 1 | 0.149 | 0.343 | 0.595 | 0.808 | 0.929 |
| 25 / 75 | 0.5 | 0.153 | 0.352 | 0.605 | 0.815 | 0.931 |
| 50 / 50 | 1 | 0.162 | 0.377 | 0.644 | 0.854 | 0.959 |
| 50 / 50 | 0.5 | 0.165 | 0.384 | 0.652 | 0.860 | 0.961 |
| 75 / 25 | 1 | 0.166 | 0.386 | 0.654 | 0.860 | 0.962 |
| 75 / 25 | 0.5 | 0.167 | 0.389 | 0.657 | 0.863 | 0.963 |
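Table II. Boundaries c_{α} (τ = 1) and critical values of the truncated product test for τ = 0.5, for two to four stages

| α | Number of stages (k) | c_{α} | Critical value for τ = 0.5 |
| --- | --- | --- | --- |
| 0.025 | 2 | 0.00380 | 0.00408 |
| 0.025 | 3 | 0.00072 | 0.00085 |
| 0.025 | 4 | 0.00015 | 0.00020 |
| 0.05 | 2 | 0.00870 | 0.00948 |
| 0.05 | 3 | 0.00184 | 0.00222 |
| 0.05 | 4 | 0.00042 | 0.00057 |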
For instance, in the case of 50 observations per group and stage and δ = 0.4 (α = 0.05) the probability to reject H_{0} after the first stage is Pr(p_{1} ≤ α_{1}) = 0.496 without truncation and 0.461 with truncation at τ = 0.5. The probability to stop without rejecting H_{0} is Pr(p_{1} ≥ α_{0}) = 0.023 irrespective of truncation. With the fixed sample size of 100 per stage the expected total sample size is 200 − 100·Pr(stop after first stage). This expected total sample size is 148 for τ = 1, but 152 in case of truncation. Hence, the slight increase in power is connected with a larger expected total sample size.
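The arithmetic behind these expected total sample sizes can be reproduced in a few lines. This is only a sketch of the calculation with the probabilities quoted above; the function and argument names are illustrative.

```python
def expected_total_sample_size(n_stage1, n_stage2, p_reject_early, p_stop_futility):
    """Stage one is always observed; stage two only if the trial continues."""
    p_stop = p_reject_early + p_stop_futility
    return n_stage1 + n_stage2 * (1.0 - p_stop)


# 100 observations per stage, delta = 0.4, alpha0 = 0.5
print(round(expected_total_sample_size(100, 100, 0.496, 0.023)))  # tau = 1
print(round(expected_total_sample_size(100, 100, 0.461, 0.023)))  # tau = 0.5
```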
An a priori fixed sample size for stage two is uncommon within an adaptive design. Instead, a sample size reassessment can be carried out during the interim analysis [11]. Using p _{1} and the difference and variability observed in stage one, we simulated the sample size for stage two needed for an overall power of 80%. The results (not shown) indicate that, in this case, the application of the truncated product method can lead to a smaller expected total sample size.
Nevertheless, there is still a smaller probability to stop the trial after the first stage when the truncation is applied. That is a clear disadvantage in clinical development where early decisions are desirable. Therefore, despite the (small) improvement in terms of power, a truncation does not seem to be preferable within a two-stage adaptive design when α_{0} < 1.
The case α_{0} = 1
As mentioned in the introduction, α_{0} = 1 can be a suitable choice, for example when establishing a dose-response relationship. The choice α_{0} = 1 leads to the same rejection boundary c_{α} for the interim and the final analysis, respectively. Hence α_{1} = c_{α}, and the same holds for the corresponding boundaries of the truncated product test. Since c_{α} is smaller than the critical value of the truncated product test (0.00870 versus 0.00948 for α = 0.05 and τ = 0.5), the expected total sample size is decreased due to truncation even in case of a fixed sample size for stage two. For instance, in the case of 50 observations per group and stage and δ = 0.4 (α = 0.05) the probability to reject H_{0} after the first stage is 0.342 for τ = 1, but 0.354 for τ = 0.5. The resultant expected total sample sizes are 166 and 165, respectively. Therefore, a gain in power would be of more importance in case α_{0} = 1.
We now present results for adaptive designs with three and four stages, respectively, and α_{0} = 1. Again, the behaviour of the strategies is investigated for fixed sample sizes in the separate study stages without including the option for sample size reassessment. The trial can be terminated with the rejection of H_{0} after the jth stage if p_{1}·...·p_{j} ≤ c_{α} in case of τ = 1, or if the truncated product of p_{1}, ..., p_{j} does not exceed the corresponding critical value in case of truncation. For up to four stages, Table II displays the boundaries c_{α} and the critical values for τ = 0.5.
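The stopping rule for more than two stages with α_{0} = 1 can be sketched as follows. The function is a hypothetical illustration, and the stage-wise p-values in the usage lines are invented for demonstration; only the critical values 0.00184 and 0.00222 (k = 3, α = 0.05) are taken from Table II.

```python
def first_rejection_stage(p_values, critical_value, tau=1.0):
    """Return the first stage j at which the (truncated) product of
    p_1, ..., p_j is <= critical_value, or None if H0 is never rejected.

    tau = 1.0 gives Fisher's product criterion (no truncation).
    """
    product = 1.0
    for stage, p in enumerate(p_values, start=1):
        if p <= tau:  # p-values above tau are left out of the product
            product *= p
        if product <= critical_value:
            return stage
    return None


p = [0.05, 0.04, 0.6]  # invented stage-wise p-values
print(first_rejection_stage(p, 0.00184))           # Fisher, tau = 1
print(first_rejection_stage(p, 0.00222, tau=0.5))  # truncation at tau = 0.5
```

In this invented example the truncated test stops one stage earlier, which illustrates how the larger critical value translates into a smaller expected sample size.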
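Table III. Simulated power to reject H_{0} and expected total sample sizes in three- and four-stage designs with α_{0} = 1 (50 observations per group and stage, combination of t tests, one-sided, α = 0.05)

| Number of stages | Quantity | τ | δ = 0.1 | δ = 0.2 | δ = 0.3 | δ = 0.4 | δ = 0.5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | Overall power | 1 | 0.198 | 0.498 | 0.789 | 0.950 | 0.993 |
| 3 | Overall power | 0.5 | 0.198 | 0.502 | 0.799 | 0.953 | 0.993 |
| 3 | Expected total sample size | 1 | 293.3 | 278.7 | 250.0 | 213.7 | 179.6 |
| 3 | Expected total sample size | 0.5 | 292.7 | 276.9 | 247.1 | 209.9 | 176.1 |
| 4 | Overall power | 1 | 0.230 | 0.590 | 0.883 | 0.984 | 0.999 |
| 4 | Overall power | 0.5 | 0.233 | 0.596 | 0.888 | 0.985 | 0.999 |
| 4 | Expected total sample size | 1 | 389.0 | 360.0 | 308.5 | 254.1 | 207.6 |
| 4 | Expected total sample size | 0.5 | 387.6 | 356.2 | 302.5 | 246.8 | 202.3 |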
Discussion
In this section we only consider the case α = 0.025 and α_{0} = 1. The first example discussed in this section was presented by Bauer and Röhmel [6]. In a two-stage dose-response study the effect of a new drug on blood pressure was investigated. Assume that the trial started with two medium doses. The p-value for the one-sided t test between these two doses in the interim analysis was p_{1} = 0.206. Thus, the study continued with the comparison of placebo vs. a higher dose, and the second stage led to p_{2} = 0.0178. The product in the final analysis was p_{1}p_{2} = 0.00367, and the corresponding overall p-value of the non-truncated product test is 0.024. Hence, the combination test is significant even at the 0.025 level.
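As a plausibility check of the reported overall p-value, one may use the null distribution of the product of two independent uniform p-values, Pr(p_{1}p_{2} ≤ w) = w(1 − ln w). This is a sketch under the assumption that, for α_{0} = 1, the overall p-value is evaluated as this null probability at the observed product:

$$0.00367\,\bigl(1 - \ln 0.00367\bigr) \approx 0.00367 \times 6.608 \approx 0.024.$$

The result agrees with the value 0.024 stated above, and the observed product 0.00367 lies below the boundary c_{α} = 0.00380 for k = 2 and α = 0.025.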
The second example is a hypothetical clinical study with two stages. We consider a scenario similar to the example of Bauer and Köhne [4, p. 1038]. A clinical trial investigates a new therapy for an indication in which no efficient standard therapy is available. For the first stage five individual endpoints have been selected. The first stage's sample size is 30 each in the therapy and the control group. The changes to the baseline measurements of the five endpoints were combined into a single generalized least squares (GLS) criterion according to O'Brien [12], and the first stage's p-value was p_{1} = 0.1758. Hence, the study continued.
Table IV. Overall p-value of the final analysis based on the combination of p_{1} = 0.1758 and p_{2} = 0.1517 (second example) as a function of the truncation point τ; τ = 1 corresponds to Fisher's product criterion.

| τ | p-value for TPM |
| --- | --- |
| 0.1 | 1 |
| 0.2 | 0.0801 |
| 0.3 | 0.0964 |
| 0.4 | 0.1064 |
| 0.5 | 0.1130 |
| 0.6 | 0.1174 |
| 0.7 | 0.1203 |
| 0.8 | 0.1221 |
| 0.9 | 0.1230 |
| 1.0 | 0.1233 |
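The dependence of the overall p-value on τ can be retraced with a short calculation for the two-stage case. The sketch assumes α_{0} = 1 and evaluates the null probability Pr(W_{τ} ≤ observed W_{τ}) for two independent uniform p-values, following the distribution given by Zaykin et al. [7] specialised to k = 2. The function name is hypothetical.

```python
from math import log


def tpm_overall_p_two_stage(p1, p2, tau):
    """Pr(W_tau <= observed W_tau) under H0 for k = 2 stages and alpha0 = 1."""
    w = 1.0
    for p in (p1, p2):
        if p <= tau:  # only p-values not exceeding tau enter the product
            w *= p
    if w >= 1.0:  # no p-value was small enough: W_tau = 1
        return 1.0
    # exactly one of the two p-values is <= tau
    one_small = 2.0 * (1.0 - tau) * min(w, tau)
    # both p-values are <= tau
    if w <= tau ** 2:
        both_small = w * (1.0 + 2.0 * log(tau) - log(w))
    else:
        both_small = tau ** 2
    return one_small + both_small


for tau in (0.1, 0.2, 0.5, 1.0):
    print(tau, round(tpm_overall_p_two_stage(0.1758, 0.1517, tau), 4))
```

The calculation makes explicit why τ = 0.1 yields a p-value of 1 (neither stage contributes to the product) and why the p-value increases with τ once both p-values are included.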
Conclusion
The application of the truncated product method instead of Fisher's combination test within an adaptive design hardly changes the overall power. Therefore, to decide whether or not a truncation is useful one should focus on the probability to stop early and on the expected total sample size. According to these criteria, a truncation seems to be preferable in case of α_{0} = 1, but not for α_{0} < 1.
A variety of other combination functions exists [13]; for example, the inverse normal method was proposed for adaptive designs [14]. According to Rice [15] Fisher's test is "inappropriate when asking whether a set of tests, on balance, supports or refutes a common null hypothesis ... because ... Fisher's statistic is more sensitive to smaller, as compared to larger, P-values" [15, pp. 303–305]. In contrast, the inverse normal method is not differentially sensitive to data that support or refute a common null hypothesis. Thus, one may argue that the inverse normal method is more appropriate for an adaptive design if each stage tests the same null hypothesis. However, in the context of a dose-response study, discussed here as a motivation for α_{0} = 1, different doses may be tested in different stages, that is, the hypotheses tested change. The resultant question is whether at least one stage is significant, and a high sensitivity to small p-values is desirable. Consequently, Fisher's test or the TPM is appropriate. An additional advantage of these two combination methods is that an early termination with rejection of the null hypothesis is possible with α_{0} = 1 and a full level α combination test at the end.
There is also some literature related to the efficiency of adaptive designs, and to the choice of combination functions. Wassmer [16], for example, compared Fisher's product criterion with an alternative adaptive design proposed by Proschan and Hunsberger [17] based on a conditional power function. Wassmer [16] concluded that "no substantial differences between the procedures were found in terms of rejection regions, power, and expected sample sizes". Brannath and Bauer [18] were among the first to investigate optimal adaptive designs for the control of conditional power. They constructed two-stage designs with overall and conditional power, which minimize the expected sample size for different specifications of the alternative. It transpires that there is a variety of different options to combine P-values and there is no consensus on the best method to use. In this paper we show that, under special conditions, the truncated product method improves on Fisher's combination test.
It is worthwhile to note that the truncation point τ must be specified a priori in the study protocol. Otherwise, the truncated product method could be misused to obtain significance by selecting τ after the data have been observed. A post hoc choice based on the observed maximum of the individual p-values is therefore not permitted. As discussed above, τ = 0.5 may be a suitable choice. A further argument for this choice is that those p-values are excluded from the product that indicate a difference in the unanticipated direction. Note that the truncated product does not follow a χ^{2}-distribution. Thus, a penalty results for the exclusion of large p-values. Nevertheless, this exclusion can be advantageous as demonstrated by Zaykin et al. [7] and above for the case of adaptive designs.
For the presentation of the power a one-sided significance level of α = 5% was chosen in this paper. However, completely analogous results are obtained in case of α = 2.5%. Regarding the choice of α for one-sided tests we refer to Neuhäuser [19].
Appendix
The power of a two-stage test according to Bauer and Köhne [4], that is, a combination with τ = 1, is given e.g. by Wassmer [20, p. 833].
For truncation with τ = α_{0} > α, and for truncation with τ > α but α_{0} = 1, the power is obtained analogously by integrating the conditional rejection probability of the second stage over the continuation region of p_{1} with respect to f_{δ}, the density of the p-value under the alternative δ [20].
Wassmer [20] presented a SAS/IML program to calculate the power for the twostage test without truncation. Modifications of this program were used to calculate the different powers given above.
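For readers without access to SAS/IML, the structure of such a power calculation can be sketched in Python. This is not the program of Wassmer [20] and not the authors' modification of it: it covers only the case without truncation (τ = 1), replaces the t test by a normal approximation with known variance, and uses a simple trapezoidal rule. All function names and the argument `steps` are illustrative assumptions.

```python
from statistics import NormalDist

STD = NormalDist()


def reject_prob(bound, theta):
    """Pr(p <= bound) when p = 1 - Phi(Z) and Z ~ N(theta, 1)."""
    if bound >= 1.0:
        return 1.0
    return 1.0 - STD.cdf(STD.inv_cdf(1.0 - bound) - theta)


def power_two_stage(delta, n1, n2, alpha1, alpha0, crit, steps=2000):
    """Approximate power of: reject if p1 <= alpha1; stop if p1 >= alpha0;
    otherwise reject at the end if p1 * p2 <= crit.

    delta is the standardized difference; n1, n2 are observations per group.
    """
    theta1 = delta * (n1 / 2.0) ** 0.5
    theta2 = delta * (n2 / 2.0) ** 0.5
    z_hi = STD.inv_cdf(1.0 - alpha1)  # z-value at which p1 = alpha1
    # z-value at which p1 = alpha0; for alpha0 = 1 integrate far into the tail
    z_lo = STD.inv_cdf(1.0 - alpha0) if alpha0 < 1.0 else theta1 - 8.0
    stage1 = NormalDist(theta1, 1.0)
    h = (z_hi - z_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z1 = z_lo + i * h
        p1 = 1.0 - STD.cdf(z1)
        value = reject_prob(crit / p1, theta2) * stage1.pdf(z1)
        total += value if 0 < i < steps else value / 2.0  # trapezoidal weights
    return reject_prob(alpha1, theta1) + h * total


# 50 observations per group and stage, delta = 0.4, alpha0 = 0.5, tau = 1
print(power_two_stage(0.4, 50, 50, 0.0233, 0.5, 0.0087))
```

Because of the normal approximation the result is expected to lie slightly above the corresponding t-test value in the power table for α_{0} = 0.5; the sketch is meant to show the structure of the integral rather than to reproduce the tabulated figures exactly.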
Abbreviations
TPM: truncated product method
Declarations
Acknowledgements
The authors would like to thank Roswitha Senske for technical support and a reviewer for helpful comments and suggestions.
References
1. Palmer CR: Ethics, data-dependent designs, and the strategy of clinical trials: time to start learning-as-we-go? Statistical Methods in Medical Research. 2002, 11: 381–402. 10.1191/0962280202sm298ra.
2. Gauch HG: Scientific method in practice. 2002, Cambridge University Press: Cambridge.
3. Bauer P, Brannath W, Posch M: Flexible two-stage designs: an overview. Methods of Information in Medicine. 2001, 40: 117–121.
4. Bauer P, Köhne K: Evaluation of experiments with adaptive interim analyses. Biometrics. 1994, 50: 1029–1041. Correction in Biometrics 52: 380.
5. Hedges LV, Olkin I: Statistical methods for meta-analysis. 1985, Academic Press: Orlando.
6. Bauer P, Röhmel J: An adaptive method for establishing a dose-response relationship. Statistics in Medicine. 1995, 14: 1595–1607.
7. Zaykin DV, Zhivotovsky LA, Westfall PH, Weir BS: Truncated product method for combining P-values. Genetic Epidemiology. 2002, 22: 170–185. 10.1002/gepi.0042.
8. Neuhäuser M: Tests for genetic differentiation. Biometrical Journal. 2003, 45: 974–984. 10.1002/bimj.200390064.
9. Dudbridge F, Koeleman BPC: Rank truncated product of P-values, with application to genome-wide association scans. Genetic Epidemiology. 2003, 25: 360–366. 10.1002/gepi.10264.
10. Brannath W, Posch M, Bauer P: Recursive combination tests. Journal of the American Statistical Association. 2002, 97: 236–244. 10.1198/016214502753479374.
11. Friede T, Kieser M: A comparison of methods for adaptive sample size adjustment. Statistics in Medicine. 2001, 20: 3861–3873. 10.1002/sim.972.
12. O'Brien PC: Procedures for comparing samples with multiple endpoints. Biometrics. 1984, 40: 1079–1087.
13. Loughin TM: A systematic comparison of methods for combining p-values from independent tests. Computational Statistics and Data Analysis. 2004, 47: 467–485. 10.1016/j.csda.2003.11.020.
14. Lehmacher W, Wassmer G: Adaptive sample size calculations in group sequential trials. Biometrics. 1999, 55: 1286–1290. 10.1111/j.0006-341X.1999.01286.x.
15. Rice WR: A consensus combined P-value test and the family-wide significance of component tests. Biometrics. 1990, 46: 303–308.
16. Wassmer G: A comparison of two methods for adaptive interim analyses in clinical trials. Biometrics. 1998, 54: 696–705.
17. Proschan MA, Hunsberger SA: Designed extension of studies based on conditional power. Biometrics. 1995, 51: 1315–1324.
18. Brannath W, Bauer P: Optimal conditional error functions for the control of conditional power. Biometrics. 2004, 60: 715–723. 10.1111/j.0006-341X.2004.00221.x.
19. Neuhäuser M: The choice of α for one-sided tests. Drug Information Journal. 2004, 38: 57–60.
20. Wassmer G: A technical note on the power determination for Fisher's combination test. Biometrical Journal. 1997, 39: 831–838.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/5/30/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.