- Research article
- Open access
Sample size re-assessment leading to a raised sample size does not inflate type I error rate under mild conditions
BMC Medical Research Methodology volume 13, Article number: 94 (2013)
Abstract
Background
One major concern with adaptive designs, such as sample size adjustable designs, has been the fear of inflating the type I error rate. In (Stat Med 23:1023-1038, 2004) it is, however, proven that when observations follow a normal distribution and the interim results show promise, meaning that the conditional power exceeds 50%, the type I error rate is protected. This bound and the distributional assumptions may seem to impose undesirable restrictions on the use of these designs. In (Stat Med 30:3267-3284, 2011) the possibility of going below 50% is explored, and a region that permits an increased sample size without inflation is defined in terms of the conditional power at the interim.
Methods
A criterion which is implicit in (Stat Med 30:3267-3284, 2011) is derived by elementary methods and expressed in terms of the test statistic at the interim to simplify practical use. Mathematical and computational details concerning this criterion are exhibited.
Results
Under very general conditions the type I error rate is preserved under sample size adjustable schemes that permit a raise. The main result states that for normally distributed observations raising the sample size when the result looks promising, where the definition of promising depends on the amount of knowledge gathered so far, guarantees the protection of the type I error rate. Also, in the many situations where the test statistic approximately follows a normal law, the deviation from the main result remains negligible. This article provides details regarding the Weibull and binomial distributions and indicates how one may approach these distributions within the current setting.
Conclusions
There is thus reason to consider such designs more often, since they offer a means of adjusting an important design feature at little or no cost in terms of error rate.
Background
Over the last few years, interest in various adaptive trial designs has surged [1]. A greater flexibility of clinical study design and conduct has followed from the application of these new ideas [2]. In [1] adaptive designs are defined as:
“...a clinical trial design that allows adaptations or modifications to aspects of the trial after its initiation without undermining the validity or integrity of the trial. ”
More and more trials of this sort are being reported, and regulatory bodies take an increasingly favourable view of them [3]. All stand to win if these designs come to optimal use [4]. However, some concerns have been raised. One of these involves the risk of inflating the type I error rate. The current article will assess that risk in the context of sample size adjustable (SSA) designs that allow choosing between raising the sample size, continuing as originally planned, or closing the trial due to futility.
The following will recapitulate parts of [5], add detail and draw conclusions for trial procedures. In that reference the authors show that if the interim results look promising, no inflation of the type I error rate occurs. Here 'promising' means that the conditional power at the current parameter estimate, i.e. the power updated by the knowledge accumulated at the interim, amounts to at least 50%. This article will show that a less strict bound applies, in agreement with [6], exhibit the bound in terms of a test statistic, and present mathematical as well as computational aspects of it.
Methods
Assumptions
Denote the planned final sample size by $N_0$, the number of patients available at the pre-planned interim analysis by $n$, and the possible raise determined at the interim, taking conditional power into account, by $r$. Let us consider a one-sided test at level $\alpha$ based on observing $Z^{(N_{\mathrm{final}})}$. Here $N_{\mathrm{final}} = N_0$ or $N_{\mathrm{final}} = N = N_0 + r$, depending on a decision taken during the course of the trial. The main result assumes a normal distribution but, as will be outlined, it still holds true for more general distributions. Further, assume the $X_i$ to be independent normal with mean $\theta$ and variance 1. The null hypothesis states that $\theta = 0$. Define the normalised test statistic $Z^{(x)}$ by $Z^{(x)} = \sum_{i=1}^{x} X_i / \sqrt{x}$. The test rejects if $Z^{(N_{\mathrm{final}})} > z_\alpha$, where $z_\alpha$ is the $100 \times (1 - \alpha)$ percentile of the standard normal distribution: $\Phi(z_\alpha) = 1 - \alpha$ ($\Phi$ being the cumulative distribution function of the standard normal distribution). The normalised test statistic $Z^{(n)}$ is observed when $n$ patients have provided data, and the Data Monitoring Committee (DMC) will in part base its recommendations on the observed value. At this interim analysis an adaptation may lead to closing the study due to futility, continuing the study without changes, or raising the sample size by recruiting an extra $r$ subjects, yielding a total of $N = N_0 + r$ subjects. Closing the study due to futility may only decrease the type I error rate. So let us, for the sake of argument, disregard that possibility, and show that the type I error rate still remains protected.
The study protocol will specify $n$ and $N_0$, and at the interim we will consider raising the final sample size based on the conditional power evaluated at the current parameter estimate. Since the objective is to assess whether the interim results are promising, the current estimate of the parameter of interest gives the appropriate information [6]. As pointed out by Müller and Schäfer in [7], the overall type I error can be preserved unconditionally under any general adaptive change, provided the conditional type I error that would have been obtained had there been no adaptation is preserved. This article, however, only considers the case of SSA. Unlike the situation in [8], the design does not permit sequential testing. Also, the article only considers the conventional hypothesis tests and p-values without adjustments.
We assess the conditional error rate as a function of $r$. By showing that the conditional type I error rate is bounded by the error rate which arises from the design without adaptation, we prove that the unconditional error rate is controlled at the pre-specified level $\alpha$.
Derivation of the main result
We use the notation X ∼ N(μ,σ 2) to signify that X follows a normal law with mean μ and variance σ 2.
The change in type I error rate conditional on a sample size increase decided at the interim equals
$$G(r) := P\!\left(Z^{(N)} > z_\alpha \mid Z^{(n)} = z, \theta = 0\right) - P\!\left(Z^{(N_0)} > z_\alpha \mid Z^{(n)} = z, \theta = 0\right).$$
The conditional distribution equals $(Z^{(N)} \mid Z^{(n)} = z, \theta = 0) \sim N(\rho z, 1 - \rho^2)$, where $\rho = \sqrt{n/N}$, and similarly for $Z^{(N_0)}$ with $\rho_0 = \sqrt{n/N_0}$.
Expressing the difference in terms of normal distributions yields
$$G(r) = \Phi\!\left(\frac{z_\alpha - \sqrt{n/N_0}\, z}{\sqrt{1 - n/N_0}}\right) - \Phi\!\left(\frac{z_\alpha - \sqrt{n/N}\, z}{\sqrt{1 - n/N}}\right).$$
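This difference is easy to evaluate numerically. The following R sketch computes $G(r)$ for a given interim value $z$; the function name and the default level are my choices for illustration:

G.normal <- function(z, n, N0, r, alpha = 0.025) {
  za <- qnorm(1 - alpha)
  # conditional rejection probability given Z^(n) = z and theta = 0,
  # when the final sample size is m
  cond.err <- function(m) {
    rho <- sqrt(n / m)
    1 - pnorm((za - rho * z) / sqrt(1 - rho^2))
  }
  cond.err(N0 + r) - cond.err(N0)   # G(r): change in conditional type I error
}

For instance, G.normal(z = 1.5, n = 55, N0 = 110, r = 40) comes out negative, while interim values of z below the bound derived next can make the difference positive.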
Now, in order to show this difference to be less than or equal to zero, it may equivalently be shown that the difference of the arguments,
$$H(r) := \frac{z_\alpha - \sqrt{n/N_0}\, z}{\sqrt{1 - n/N_0}} - \frac{z_\alpha - \sqrt{n/N}\, z}{\sqrt{1 - n/N}},$$
is negative (in the sense of non-positive). Obviously $H(0) = 0$.
To simplify notation put $q = n/(N_0 + r)$ and $V = (N_0 + r)/N_0$ for arbitrary $n$, $N_0$ and $r$ satisfying $N_0 > n > 0$ and $r > 0$. Please note $qV = n/N_0$. Then we aim to show
$$\frac{z_\alpha - \sqrt{qV}\, z}{\sqrt{1 - qV}} \le \frac{z_\alpha - \sqrt{q}\, z}{\sqrt{1 - q}}.$$
This implies, after multiplying both sides by the positive quantity $\sqrt{(1-q)(1-qV)}$,
$$z_\alpha\sqrt{1-q} - \sqrt{qV}\sqrt{1-q}\, z \le z_\alpha\sqrt{1-qV} - \sqrt{q}\sqrt{1-qV}\, z.$$
And then, after collecting the terms in $z$, multiply by $-1$ and divide by the multiplier of $z$ (which is positive), to obtain
$$z \ge z_\alpha\, \frac{\sqrt{1-q} - \sqrt{1-qV}}{\sqrt{qV}\sqrt{1-q} - \sqrt{q}\sqrt{1-qV}},$$
which after cancelling out $\sqrt{q}$ becomes
$$z \ge z_\alpha\, \frac{\sqrt{1-q} - \sqrt{1-qV}}{\sqrt{q}\left(\sqrt{V(1-q)} - \sqrt{1-qV}\right)}. \qquad (1)$$
Now let us compare this bound to $\sqrt{qV}\, z_\alpha = \sqrt{n/N_0}\, z_\alpha$. Denote the bound in (1) by $z_\alpha b(q,V)$, and set out to prove $b(q,V) \le \sqrt{qV}$. By subtracting $\sqrt{qV}$ from both sides and equating denominators we have
$$\frac{(1-qV)\sqrt{1-q} - \left(1 - q\sqrt{V}\right)\sqrt{1-qV}}{\sqrt{q}\left(\sqrt{V(1-q)} - \sqrt{1-qV}\right)} \le 0.$$
But the denominator is positive under the current assumptions. Thus we may disregard it, and we need to prove
$$(1-qV)\sqrt{1-q} \le \left(1 - q\sqrt{V}\right)\sqrt{1-qV}.$$
Division by $\sqrt{1-qV}$ and squaring both sides (both are non-negative, since $q\sqrt{V} = \sqrt{q}\sqrt{qV} < 1$) yields
$$(1-q)(1-qV) \le \left(1 - q\sqrt{V}\right)^2.$$
By expanding the left hand side product, eliminating common terms and multiplying both sides by $-1/q$, we finally have
$$1 + V \ge 2\sqrt{V},$$
which is true for all positive $V$, since $1 + V - 2\sqrt{V} = \left(\sqrt{V} - 1\right)^2 \ge 0$.
Now regard $b$ as a function of $r$ for $n$ and $N_0$ fixed. One may show that $b(q,V) \to \sqrt{n/N_0}$ asymptotically as $r \searrow 0$. Further, $b$ decreases in a close to linear fashion as $r$ grows. Also, $b(q,V) \to \left(1 - \sqrt{1 - n/N_0}\right)\big/\sqrt{n/N_0}$ when $r$ tends to infinity.
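To indicate the first of these claims, put $V = 1 + \epsilon$ and note that $q \to n/N_0$ as $r \searrow 0$; the first-order expansion below is a sketch of the argument rather than the article's own derivation:
$$\sqrt{1-q} - \sqrt{1-qV} \approx \frac{q\,\epsilon}{2\sqrt{1-q}}, \qquad \sqrt{q}\left(\sqrt{V(1-q)} - \sqrt{1-qV}\right) \approx \sqrt{q}\,\frac{(1-q) + q}{2\sqrt{1-q}}\,\epsilon = \frac{\sqrt{q}\,\epsilon}{2\sqrt{1-q}},$$
so that $b(q,V) \to q/\sqrt{q} = \sqrt{q} \to \sqrt{n/N_0}$.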
Please note that since $b(q,V) \le \sqrt{qV}$, a sufficient but not necessary condition is $z \ge \sqrt{qV}\, z_\alpha = \sqrt{n/N_0}\, z_\alpha$, which will be seen to give a conditional power of 50% (the simple criterion). Consequently, this new criterion is less restrictive than the one presented in [5] and, importantly, changes with $r$. The reference [6] provides an example where the type I error rate remains intact although the conditional power descends to 36%.
To obtain the conditional power, please note that $\hat\theta = z/\sqrt{n}$, and furthermore that $(Z^{(N_0)} \mid Z^{(n)} = z, \theta = \hat\theta) \sim N\!\left(z\sqrt{N_0/n},\, 1 - n/N_0\right)$. Then
$$CP(z) = P\!\left(Z^{(N_0)} > z_\alpha \mid Z^{(n)} = z, \theta = \hat\theta\right) = \Phi\!\left(\frac{z/\sqrt{qV} - z_\alpha}{\sqrt{1 - qV}}\right).$$
The minimum of this probability over $z > b(q,V)\, z_\alpha$ equals
$$\Phi\!\left(z_\alpha\, \frac{b(q,V) - \sqrt{qV}}{\sqrt{qV(1-qV)}}\right).$$
From the definition of G(r) it follows that one cannot go further without increasing the conditional error rate. In this sense the bound is optimal.
Weibull distributed survival times
We will now study the situation where survival times follow a Weibull distribution and right censoring times are exponentially distributed.
In [9] the details of an Edgeworth expansion of the product limit estimator are given. First some notation: $X$ = lifetime, $T$ = left truncation time point, $Y$ = right censoring time point, $Z = \min(X, Y)$, $\delta = I(X \le Y)$. Further, put $C(z) := P(T \le z \le Z \mid T \le Z)$. But since $T \equiv 0$ this probability equals $P(Z \ge z) = P(X \ge z, Y \ge z) = P(X \ge z)P(Y \ge z)$. Then $W_1(y) := P(Z \le y, \delta = 1) = P(X \le y, Y \ge X)$. The constant appearing in the Edgeworth expansion can be expressed in terms of $C$ and $W_1$; see [9] for the explicit formula. As stated, we assume $X \sim \mathrm{Weib}(\lambda, \beta)$ and $Y \sim \mathrm{Exp}(\mu)$. From this follows that $C(y) = \exp(-\mu y - \lambda y^\beta)$ and $W_1(y) = \int_0^y \lambda\beta t^{\beta-1} e^{-\lambda t^\beta - \mu t}\, dt$. Thus one may at the interim use parameter estimates to calculate a normal approximation to the conditional power. Alternatively, one may simulate the remainder of the trial.

A third option is to base the procedure on the logrank test, whose statistic converges to a normal distribution. Consider the situation where the time to some event is compared between patients in an active treatment group and those in a control group. Let $r_i$ refer to the number of patients remaining at time $i$ and $o_i$ to the number of observed events. Further, let $A$ refer to the active treatment group and $C$ to the control group. If $O_A = \sum_i o_{Ai}$ and $E_A = \sum_i o_i\, r_{Ai}/r_i$, then
$$z = \frac{O_A - E_A}{\sqrt{\sum_i \dfrac{r_{Ai}\, r_{Ci}\, o_i\, (r_i - o_i)}{r_i^2\, (r_i - 1)}}}$$
will asymptotically be standard normal, e.g. [10]. Hence one may apply the simple criterion to $z$ observed at the interim.
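As an illustration, such a $z$ can be extracted with the survival package in R. The sketch below assumes a data set with columns time, status and group (two levels, the second being the active treatment), and uses the number of events as the information scale; both are assumptions of this sketch rather than prescriptions from the text:

library(survival)

logrank.z <- function(time, status, group) {
  fit <- survdiff(Surv(time, status) ~ group)
  # one-sided z: positive when the second group has fewer events than expected
  sign(fit$exp[2] - fit$obs[2]) * sqrt(fit$chisq)
}

# simple criterion: promising if z > sqrt(d.n / d.N0) * z_alpha, where d.n and
# d.N0 denote the interim and planned total numbers of events
promising <- function(z, d.n, d.N0, alpha = 0.025) {
  z > sqrt(d.n / d.N0) * qnorm(1 - alpha)
}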
Binomial proportion
For the sake of simplicity of exposition we focus attention on a single binomial proportion $p$ and a one-sided test at the 5% level. Let the null and alternative hypotheses be $H_0: p = p_0$ and $H_1: p > p_0$. Please note that for $m > n$ the conditional distribution of $X_m$ given $\{X_n = k\}$ is that of $k + B$ with $B \sim \mathrm{Bin}(m - n, p)$; this applies in particular to $X_{N_0}$ and $X_N$.
From this it follows that we may obtain $G(r)$ exactly, in terms of R code [11].
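For instance (the function and argument names are mine):

G.exact <- function(k, n, N0, r, p0, alpha = 0.05) {
  # upper alpha percentile of Bin(m, p0)
  q.alpha <- function(m) qbinom(1 - alpha, m, p0)
  # P(X_m > q.alpha(m) | X_n = k), using X_m | X_n = k  ~  k + Bin(m - n, p0)
  cond.err <- function(m) 1 - pbinom(q.alpha(m) - k, m - n, p0)
  cond.err(N0 + r) - cond.err(N0)   # G(r)
}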
However, we will look at a normal approximation. In the binomial case several test statistics with close to normal distributions exist:
1. the score test statistic: $z = \dfrac{k - np_0}{\sqrt{np_0(1-p_0)}}$,
2. the log-odds statistic: $z = \left(\log\dfrac{\hat p}{1-\hat p} - \log\dfrac{p_0}{1-p_0}\right)\sqrt{n\hat p(1-\hat p)}$, where $\hat p = k/n$ [12].
The simple criterion would then say that if $z$ as above exceeds $\sqrt{n/N_0}\, z_\alpha$, then the procedure protects the type I error rate (unconditionally). But we set out to find a more accurate approximation.
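In R the two statistics and the simple criterion may be coded as follows (a small sketch with names of my choosing):

score.z <- function(k, n, p0) {
  (k - n * p0) / sqrt(n * p0 * (1 - p0))
}

logodds.z <- function(k, n, p0) {
  phat <- k / n          # requires 0 < k < n
  (log(phat / (1 - phat)) - log(p0 / (1 - p0))) * sqrt(n * phat * (1 - phat))
}

# simple criterion: z observed at the interim versus sqrt(n / N0) * z_alpha
simple.ok <- function(z, n, N0, alpha = 0.05) {
  z > sqrt(n / N0) * qnorm(1 - alpha)
}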
Now, using the condition $\{X_n = k\}$,
$$G(r) = P\!\left(X_N > q_{\alpha,N} \mid X_n = k\right) - P\!\left(X_{N_0} > q_{\alpha,N_0} \mid X_n = k\right),$$
where $q_{\alpha,m}$ is the $100 \times (1-\alpha)$ percentile of $\mathrm{Bin}(m, p_0)$.
Also, the binomial distribution $X_n \sim \mathrm{Bin}(n, p)$ admits a normal approximation of the pivotal statistic $U = (X_n - E[X_n])/SD(X_n)$, which coincides with the score test statistic above, such that
$$P(U \le u) \approx \Phi(u) - \frac{\gamma}{6}\left(u^2 - 1\right)\varphi(u)$$
in terms of the third cumulant $\gamma = (1-2p)/\sqrt{np(1-p)}$ of $U$, which picks up the skewness. As a rule of thumb it is often said that the normal approximation is quite accurate when $np$ and $n(1-p)$ both exceed 5. But this statement holds even without the correction with respect to skewness.
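A quick numerical check of this rule of thumb (all values purely illustrative, and the continuity correction is my choice):

edgeworth.cdf <- function(u, n, p) {
  gam <- (1 - 2 * p) / sqrt(n * p * (1 - p))   # third cumulant of U
  pnorm(u) - gam / 6 * (u^2 - 1) * dnorm(u)
}

# exact P(X <= x) versus the skewness-corrected approximation
x <- 14; n <- 20; p <- 0.6
u <- (x + 0.5 - n * p) / sqrt(n * p * (1 - p)) # continuity-corrected point
c(exact = pbinom(x, n, p), approx = edgeworth.cdf(u, n, p))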
In this case we may approximate the difference $G(r)$ defined above by
$$G(r) \approx \Phi\!\left(\frac{q_{\alpha,N_0} - k - (N_0 - n)p_0}{\sigma_0\sqrt{N_0 - n}}\right) - \Phi\!\left(\frac{q_{\alpha,N} - k - (N - n)p_0}{\sigma_0\sqrt{N - n}}\right),$$
where we insert the percentiles obtained from the inversion of the Cornish-Fisher expansion, cf. e.g. [13]:
$$q_{\alpha,m} \approx \mu_m + \sigma_m z_\alpha + \frac{\gamma_0\sigma_0\left(z_\alpha^2 - 1\right)}{6},$$
denoting $\mu_m = mp_0$, $\sigma_m = \sigma_0\sqrt{m}$ with $\sigma_0 = \sqrt{p_0(1-p_0)}$, and the third cumulant $(1-2p_0)/\sigma_0$ by $\gamma_0$. This quantity will deviate less than 1 from the true percentile for $n$ from 20 to 200, and $\min\{np_0, n(1-p_0)\} > 5$. Let us consider $G(r)$ through the pivotal quantities
$$U_m = \frac{X_m - k - (m-n)p_0}{\sigma_0\sqrt{m-n}}, \quad m = N_0,\, N,$$
conditional on $\{X_n = k\}$.
We will be concerned with the difference
$$\frac{q_{\alpha,N} - k - (N-n)p_0}{\sigma_0\sqrt{N-n}} - \frac{q_{\alpha,N_0} - k - (N_0-n)p_0}{\sigma_0\sqrt{N_0-n}}.$$
The task is now to identify when this difference is positive, since that is exactly when the normal approximation of $G(r)$ is non-positive. To simplify notation denote by $n_1$ the larger of the two sample sizes and by $n_2$ the smaller, and put $c = \gamma_0\sigma_0(z_\alpha^2 - 1)/6$, so that by the Cornish-Fisher expansion $q_{\alpha,m} - k - (m-n)p_0 \approx np_0 - k + c + \sigma_0\sqrt{m}\, z_\alpha$. After equating the denominators the difference equals:
$$\frac{\sqrt{n_2-n}\left(np_0 - k + c + \sigma_0\sqrt{n_1}\, z_\alpha\right) - \sqrt{n_1-n}\left(np_0 - k + c + \sigma_0\sqrt{n_2}\, z_\alpha\right)}{\sigma_0\sqrt{(n_1-n)(n_2-n)}}.$$
Disregarding the positive denominator yields the condition:
$$(np_0 - k + c)\left(\sqrt{n_2-n} - \sqrt{n_1-n}\right) + \sigma_0 z_\alpha\left(\sqrt{n_1(n_2-n)} - \sqrt{n_2(n_1-n)}\right) \ge 0.$$
Some algebra will unearth the condition
$$k \ge np_0 + \frac{(1-2p_0)\left(z_\alpha^2 - 1\right)}{6} + \sigma_0 z_\alpha\, \frac{\sqrt{n_2(n_1-n)} - \sqrt{n_1(n_2-n)}}{\sqrt{n_1-n} - \sqrt{n_2-n}}.$$
Please note that the first term corresponds to the expectation of the null distribution. Further, the second term will be negative if $p_0 > 0.5$, and the third term equals $\sigma_0\sqrt{n}\, b(q,V)\, z_\alpha$ with $q = n/n_1$ and $V = n_1/n_2$, i.e. the binomial counterpart of the bound in (1). From this follows that the normal approximation of $G(r)$ is non-positive for $k$ satisfying the above condition. Finally, invoke the fast convergence of the binomial distribution towards a normal law, which means that already 20 observations will make the normal approximation quite accurate, provided $\min\{np, n(1-p)\} > 5$. Simulations indicate that this decision rule is accurate already at an interim sample size $n$ as low as 20. However, the rule does not guarantee preservation of the conditional type I error rate for all $p$. Thus the conclusion is that for the binomial distribution there is no inflation of the unconditional type I error rate under the above conditions. A total of 900000 simulations with $n$ from 20 to 100, $p_0$ picked randomly in $[5/n, 1-5/n]$, $k$ randomly generated from $\mathrm{Bin}(n, p_0)$, $N_0 = 2n$ and $r = n$ gave a median and mean of $G(r)$ equal to -0.004762 and -0.004574, respectively, over the set defined by the inequality above. A set of similar simulations using the simple criterion ($z > \sqrt{n/N_0}\, z_\alpha$) gave median and mean equal to -0.02429 and -0.02389, respectively. Thus the simple criterion will be on the conservative side.
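In R, the threshold for $k$ may be computed as follows (a sketch with names of my choosing; it can be cross-checked against the exact $G(r)$ given earlier):

k.threshold <- function(n, N0, r, p0, alpha = 0.05) {
  za <- qnorm(1 - alpha)
  s0 <- sqrt(p0 * (1 - p0))
  n1 <- N0 + r   # the larger sample size
  n2 <- N0       # the smaller sample size
  third <- s0 * za * (sqrt(n2 * (n1 - n)) - sqrt(n1 * (n2 - n))) /
    (sqrt(n1 - n) - sqrt(n2 - n))
  n * p0 + (1 - 2 * p0) * (za^2 - 1) / 6 + third
}

For example, k.threshold(n = 20, N0 = 40, r = 20, p0 = 0.5) is about 12.4, the same value as $np_0 + \sigma_0\sqrt{n}\, b(q,V)\, z_\alpha$.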
Results
Main result
In Methods the following result was derived.
A conditional power that guarantees preservation of the nominal significance level
If the conditional power at the interim, which occurs after $n$ out of the $N_0$ planned observations and leads to a raise of $r$, equals at least
$$\Phi\!\left(z_\alpha\, \frac{b(q,V) - \sqrt{qV}}{\sqrt{qV(1-qV)}}\right), \qquad (2)$$
where $q = n/(N_0 + r)$, $V = (N_0 + r)/N_0$, and
$$b(q,V) = \frac{\sqrt{1-q} - \sqrt{1-qV}}{\sqrt{q}\left(\sqrt{V(1-q)} - \sqrt{1-qV}\right)}, \qquad (3)$$
then the type I error rate is preserved. The function $b$ satisfies the inequalities
$$\frac{1 - \sqrt{1 - n/N_0}}{\sqrt{n/N_0}} \le b(q,V) \le \sqrt{n/N_0}.$$
A more practical criterion, or rule of thumb, may be to derive a test statistic $z$ with close to a standard normal distribution under the null hypothesis, and check whether $z > \sqrt{n/N_0}\, z_\alpha$. This will be referred to as the simple criterion, and stems from [5]. More generally, the condition $z > b(q,V)\, z_\alpha$ suffices (cf. equations (11) and (12) in [6]). The conditional power bound in (2) decreases as $r$ increases, but the lower bound on $b$ implies a limit.
Example
Take the example of $n = 55$, $N_0 = 110$, $r = 40$ and $\alpha = 0.025$, $z_\alpha = 1.96$. Then the minimum conditional power equals 43%, see the next subsection. Thus a conditional power considerably less than 50% is permissible from the point of view of type I error rate preservation. This may be good to know if the original sample size calculation was grossly wrong. Then recruiting more subjects than planned may resolve the issue without jeopardising the type I error rate. On the other hand, in such a situation the validity of the scientific hypotheses on which the trial design rests may be questioned, and the sponsor will have to judge whether the updated hypotheses suggest a commercially viable route. Nevertheless, in some cases raising the sample size will make sense, and may save the trial from unnecessary failure.
Above we assume the variance to be known. If it is not, we may estimate it and use, for instance, a t-test statistic, which quickly converges to a normal distribution as the sample size increases.
Examination of the t-test has provided evidence of a small degree of inflation [14]. In [15] further details of when inflation occurs are given. However, already at a sample size of 30 the t-distribution and the normal distribution appear almost identical.
Calculations in R
In the statistical software environment R [11] one may easily define functions. Let us regard the bound $z_\alpha b(q,V)$ as a function of $(n, N_0, r)$ instead. We may explore the bound $z_\alpha b(n, N_0, r)$ through the R function B.func,
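for instance defined as follows, following equation (3) (the body shown here is a sketch consistent with the values quoted below):

B.func <- function(n, N0, r) {
  q <- n / (N0 + r)     # information fraction at the interim
  V <- (N0 + r) / N0    # relative increase of the final sample size
  (sqrt(1 - q) - sqrt(1 - q * V)) /
    (sqrt(q) * (sqrt(V * (1 - q)) - sqrt(1 - q * V)))
}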
and the minimum conditional power through the function CP.min,
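for instance defined as follows, following equation (2) and reusing B.func (again a sketch reproducing the values quoted next):

CP.min <- function(alpha1, n, N0, r) {
  za <- qnorm(1 - alpha1)
  qV <- n / N0          # q * V
  b  <- B.func(n, N0, r)
  # minimum conditional power over the region z > b(q, V) * z_alpha
  pnorm(za * (b - sqrt(qV)) / sqrt(qV * (1 - qV)))
}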
So, for instance, CP.min(alpha1 = 0.025, n = 55, N0 = 110, r = 40) = 0.43, and B.func(n = 55, N0 = 110, r = 0.01) = 0.7070907, which approximately equals $\sqrt{n/N_0} = \sqrt{55/110} \approx 0.7071$. Also CP.min(alpha1 = 0.025, n = 55, N0 = 110, r = 110) = 0.3575873.
Deviations from normal distribution
If we use non-normal data, such as survival data, it is often possible to approximate the test statistic by a normal variate. Many test statistics, e.g. those derived by the maximum likelihood method, converge quickly to a normal distribution when the sample size increases. This feature extends the relevance of the main result to measurements following other distributions than the normal.
In Methods we looked into the situation where a Kaplan-Meier (KM) estimate is used. The Edgeworth expansion of the distribution of the (standardised) KM estimator has the form
$$\Phi(x) + \frac{\kappa(x)}{\sqrt{n}}\,\varphi(x) + o\!\left(n^{-1/2}\right),$$
where the coefficient $\kappa$ involves the quantities specified in Methods [9, 16], $\Phi$ is the cumulative distribution function of a standard normal variate and $\varphi$ its frequency function. So if we express the change in conditional error rate ($G(r)$ above) in terms of this expansion, the correction to the difference between normal distribution functions will approach zero at the rate $n^{-1/2}$. Assuming some parametric distribution, such as the Weibull distribution, one may work out the details regarding this approximation. Or, one may assess the deviation from normality through a simulation procedure.
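One such simulation sketch (all parameter values purely illustrative) estimates the skewness of the KM estimate of the survival probability at a time point $t_0$, under Weibull lifetimes with exponential censoring:

library(survival)

km.surv <- function(n, lambda = 0.1, beta = 1.5, mu = 0.05, t0 = 3) {
  # lifetimes with P(X > t) = exp(-lambda * t^beta) and exponential censoring
  x <- rweibull(n, shape = beta, scale = lambda^(-1 / beta))
  y <- rexp(n, rate = mu)
  fit <- survfit(Surv(pmin(x, y), x <= y) ~ 1)
  summary(fit, times = t0, extend = TRUE)$surv
}

set.seed(1)
est <- replicate(2000, km.surv(n = 100))
z <- (est - mean(est)) / sd(est)
mean(z^3)   # sample skewness; values near 0 support the normal approximation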
In the case of a single binomial proportion $p$ and a one-sided test of the null hypothesis $H_0: p = p_0$ versus the alternative hypothesis $H_1: p > p_0$, it holds that if we at the interim observe $X_n = k$ satisfying
$$k \ge np_0 + \frac{\gamma_0\sigma_0\left(z_\alpha^2 - 1\right)}{6} + \sigma_0 z_\alpha\, \frac{\sqrt{N_0(n_1 - n)} - \sqrt{n_1(N_0 - n)}}{\sqrt{n_1 - n} - \sqrt{N_0 - n}},$$
with $\sigma_0^2 = p_0(1-p_0)$, $\gamma_0 = (1-2p_0)/\sigma_0$ and $n_1 = N_0 + r$, then inflation of the type I error rate will not occur. More precisely put: on average, over all possible outcomes, the procedure will preserve the type I error rate. However, the conditional error rate will not always fall below the nominal one.
Discussion
There are operational issues with adaptive designs that must be addressed during the planning stage. In order to safeguard the integrity of the trial and avoid operational bias following an unblinded interim analysis, precautions need to be put in place to limit access both to the results and even to the analysis plans. The latter will specify the output and decision rules, but will leave open the possibility of including other information, such as external factors, in the final decision whether to stop for futility or to continue, and, if so, whether or not to raise the sample size.
Further, a number of concerns have been raised involving the risk of violating statistical principles or a lack of efficiency compared to group sequential designs, e.g. [17-19].
However valid these objections may be, more and more practitioners have felt that the challenges are tractable and have found SSA designs an attractive option. For small biotechnology companies this option gives the possibility of starting a trial with rather limited resources, followed by an additional investment conditional on the interim results being promising. Also, the SSA design makes a lot of sense whenever a fixed-size design would have to rely on a quite limited amount of information regarding the primary variable.
Several references have argued the superiority of seamless phase II/III designs over the traditional phase II and III trials. Merging the two phases produces gains in valuable time [20], and, under reasonable conditions, saves sample size [21].
Earlier research has established that a conditional power at the interim analysis exceeding 50% implies that the conditional, and hence also the unconditional, type I error rate is preserved, cf. [5, 7]. Further, the reference [6] builds on [8] and others to identify a more general region where this happens. The region is identified through equations (11) and (12) in [6]. The derivation of the region relies on results for Brownian motion. Together these two equations implicitly define a bound that coincides with b in (3) above.
Further, one cannot use a lower bound without risking inflation of the conditional error rate, and then one may not rely on the Müller-Schäfer principle of conditional error functions [7] (the new conditional error rate must not exceed the original one) to prove preservation of the unconditional error rate (see Endnotes). By virtue of the Müller-Schäfer principle, any interim decision rule, pre-defined or not, that does not violate this fundamental requirement will permit a redesign of the trial. So from this perspective the SSA designs described here are well behaved and offer great flexibility.
Conclusions
This article has shown that the risk of compromising the nominal significance level of a statistical test by allowing a sample size increase during the course of a trial remains low and controllable. The conditional error rate and power provide key decision tools.
Endnotes
1 Also, by reversing the order of terms in G(r) and tracing the same line of thought one may conclude that a sample size decrease is permissible when results are discouraging. But then it may make more sense to discontinue the trial due to futility.
References
1. Chang M: Adaptive Design Theory and Implementation Using SAS and R. 2007, Boca Raton, Florida: Chapman and Hall/CRC
2. Bretz F, Schmidli H, Konig F, Racine A, Maurer W: Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: general concepts. Biom J. 2006, 48 (4): 623-634. 10.1002/bimj.200510232.
3. FDA: Guidance for industry: adaptive design clinical trials for drugs and biologics. 2010. [http://www.fda.gov/downloads/Drugs/.../Guidances/ucm201790.pdf]
4. PhRMA: Working group on adaptive designs. Full white paper. Drug Inf J. 2006, 40: 421-484.
5. Chen YHJ, DeMets DL, Gordon Lan KK: Increasing the sample size when the unblinded interim result is promising. Stat Med. 2004, 23 (7): 1023-1038. 10.1002/sim.1688. [http://dx.doi.org/10.1002/sim.1688]
6. Mehta CR, Pocock SJ: Adaptive increase in sample size when interim results are promising: a practical guide with examples. Stat Med. 2011, 30 (28): 3267-3284. 10.1002/sim.4102. [http://dx.doi.org/10.1002/sim.4102]
7. Müller HH, Schäfer H: A general statistical principle for changing a design any time during the course of a trial. Stat Med. 2004, 23 (16): 2497-2508. 10.1002/sim.1852. [http://dx.doi.org/10.1002/sim.1852]
8. Gao P, Ware J, Mehta C: Sample size re-estimation for adaptive sequential design in clinical trials. J Biopharm Stat. 2008, 18 (6): 1184-1196. 10.1080/10543400802369053. [http://www.tandfonline.com/doi/abs/10.1080/10543400802369053]
9. Wang Q, Jing BY: Edgeworth expansion and bootstrap approximation for studentized product-limit estimator with truncated and censored data. Commun Stat - Theory Methods. 2006, 35 (4): 609-623. 10.1080/03610920500498840. [http://www.informaworld.com/10.1080/03610920500498840]
10. Whitehead J: The Design and Analysis of Sequential Clinical Trials. 1992, Chichester, West Sussex: Ellis Horwood, second edition
11. R Development Core Team: R: A Language and Environment for Statistical Computing. 2010, R Foundation for Statistical Computing, Vienna, Austria. [http://www.R-project.org] [ISBN 3-900051-07-0]
12. Zhou X, Li C, Yang Z: Improving interval estimation of binomial proportions. Philos Trans R Soc A: Math Phys Eng Sci. 2008, 366 (1874): 2405-2418. 10.1098/rsta.2008.0037. [http://rsta.royalsocietypublishing.org/content/366/1874/2405]
13. DiCiccio TJ, Efron B: Bootstrap confidence intervals. Stat Sci. 1996, 11 (3): 189-212. [http://www.jstor.org/stable/2246110]
14. Friede T, Kieser M: Sample size recalculation in internal pilot study designs: a review. Biom J. 2006, 48 (4): 537-555. 10.1002/bimj.200510238. [http://dx.doi.org/10.1002/bimj.200510238]
15. Graf AC, Bauer P: Maximum inflation of the type 1 error rate when sample size and allocation rate are adapted in a pre-planned interim look. Stat Med. 2011, 30 (14): 1637-1647. 10.1002/sim.4230. [http://dx.doi.org/10.1002/sim.4230]
16. Chang MN: Edgeworth expansion for the Kaplan-Meier estimator. Commun Stat - Theory Methods. 1991, 20 (8): 2479-2494. 10.1080/03610929108830645. [http://www.informaworld.com/10.1080/03610929108830645]
17. Burman CF, Sonesson C: Are flexible designs sound?. Biometrics. 2006, 62 (3): 664-669. 10.1111/j.1541-0420.2006.00626.x. [http://dx.doi.org/10.1111/j.1541-0420.2006.00626.x]
18. Tsiatis AA, Mehta C: On the inefficiency of the adaptive design for monitoring clinical trials. Biometrika. 2003, 90 (2): 367-378. 10.1093/biomet/90.2.367. [http://biomet.oxfordjournals.org/content/90/2/367.abstract]
19. Jennison C, Turnbull BW: Mid-course sample size modification in clinical trials based on the observed treatment effect. Stat Med. 2003, 22 (6): 971-993. 10.1002/sim.1457. [http://dx.doi.org/10.1002/sim.1457]
20. Walton M: Adaptive designs: opportunities, challenges and scope in drug development. PhRMA-FDA Workshop. 2006. [http://www.innovation.org/documents/File/Adaptive_Designs_Presentations/04_Walton_Adaptive_Designs_Trial_Issues_Goals_and_Needs.pdf]
21. Bischoff W, Miller F: A seamless phase II/III design with sample-size re-estimation. J Biopharm Stat. 2009, 19 (4): 595-609. 10.1080/10543400902963193. [http://www.tandfonline.com/doi/abs/10.1080/10543400902963193]
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/13/94/prepub
Acknowledgements
Thanks are due to the referee who found an error in an earlier version of the main result and directed my attention to crucial references I had overlooked.
Additional information
Competing interests
The author declares that he has no competing interests.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Broberg, P. Sample size re-assessment leading to a raised sample size does not inflate type I error rate under mild conditions. BMC Med Res Methodol 13, 94 (2013). https://doi.org/10.1186/1471-2288-13-94