We follow notation similar to that used by Sebille and Bellissant [8]. Let θ be a measure of the difference between the experimental and standard treatments. The clinical trial can be viewed as a test of the null hypothesis of no treatment difference, H0: θ = 0, against the alternative that there is a difference, H1: θ ≠ 0. The parameter is defined so that θ = 0 when the treatments are equivalent, θ > 0 when the experimental treatment is better than the standard one, and θ < 0 when the experimental treatment is worse.
The trials considered here involve only the comparison of two normally distributed responses using two-sided tests. We define the effect size as the difference between treatments in units of standard deviation, θR = (μ2 − μ1)/σ, where μ1 and μ2 are the means for the standard and experimental groups, respectively, and σ is the common standard deviation (σ1 = σ2 = σ).
Single stage design (SSD)
The traditional statistical approach in the analysis of clinical trials is the SSD with an equal number of patients in each group. In this method, the sample size is computed at the design phase from the significance level (α), the difference of clinical interest (θR) and the power (1 − β). In a two-group comparative study where the response measure is normally distributed, the total sample size for a two-sided test is:
$$N = \frac{4\,(Z_{\alpha/2} + Z_{\beta})^2}{\theta_R^2} \qquad (1)$$
where Zα is the upper 100α% percentile of N(0,1), that is, α = 1 − Φ(Zα), and Zβ is defined similarly, so that β = 1 − Φ(Zβ).
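The following is a minimal sketch of the equal-allocation sample size calculation. It assumes the standard form N = 4(Zα/2 + Zβ)²/θR², with Zp the upper 100p% percentile of N(0,1), and uses only Python's standard library. The function name `ssd_total_sample_size` is hypothetical. Rounding up to an even total, so that both groups are the same size, is our own choice.

```python
import math
from statistics import NormalDist


def ssd_total_sample_size(theta_r: float, alpha: float = 0.05, beta: float = 0.10) -> int:
    """Total N for a two-sided, two-group comparison of normal means with
    equal allocation: N = 4 * (Z_{alpha/2} + Z_beta)**2 / theta_r**2."""
    std_normal = NormalDist()  # N(0, 1)
    # Upper percentile Z_p satisfies p = 1 - Phi(Z_p), i.e. Z_p = Phi^{-1}(1 - p).
    z_alpha_half = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(1 - beta)
    n_raw = 4 * (z_alpha_half + z_beta) ** 2 / theta_r ** 2
    # Round up, then bump an odd total to the next even number (equal groups).
    n_total = math.ceil(n_raw)
    return n_total + n_total % 2


for theta_r in (0.4, 0.5, 0.7, 1.0):
    print(f"theta_R = {theta_r}: N = {ssd_total_sample_size(theta_r)}")
```

Because N scales with 1/θR², halving the effect size of interest roughly quadruples the required number of patients.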
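Let NE and NS denote the numbers of patients assigned to the experimental and standard treatments, with NE + NS = N fixed, and let π = NE/N denote the proportion of patients on the experimental treatment. The power of the two-sided test under H1 is then given by [16]:
$$\text{Power} = \Phi\left(\theta_R\sqrt{N\pi(1-\pi)} - Z_{\alpha/2}\right) + \Phi\left(-\theta_R\sqrt{N\pi(1-\pi)} - Z_{\alpha/2}\right) \qquad (2)$$
In this formula, Φ(·) denotes the cumulative distribution function of the standard normal distribution N(0,1).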
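The next sketch computes the power of the fixed design. It assumes the two-sided normal-approximation form Φ(θ√(Nπ(1−π)) − Zα/2) + Φ(−θ√(Nπ(1−π)) − Zα/2) with σ known. The names `ssd_power` and `prop_exp` (standing for π) are hypothetical.

```python
import math
from statistics import NormalDist


def ssd_power(n_total: float, theta: float, prop_exp: float = 0.5, alpha: float = 0.05) -> float:
    """Power of the two-sided fixed-sample z-test when a proportion `prop_exp`
    of the n_total patients receives the experimental treatment."""
    std_normal = NormalDist()
    z_alpha_half = std_normal.inv_cdf(1 - alpha / 2)
    # Mean of the standardized test statistic: theta * sqrt(N * pi * (1 - pi)).
    drift = theta * math.sqrt(n_total * prop_exp * (1 - prop_exp))
    # P(conclude "better") + P(conclude "worse").
    return std_normal.cdf(drift - z_alpha_half) + std_normal.cdf(-drift - z_alpha_half)


print(ssd_power(170, 0.5))                  # equal allocation
print(ssd_power(170, 0.5, prop_exp=2 / 3))  # R = 2 with the same total N
```

When θ > 0 the first term dominates. At θ = 0 the two terms sum to α. Moving from π = 1/2 to π = 2/3 (R = 2) while keeping N fixed lowers π(1 − π), and therefore lowers the power.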
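However, if the investigator decides to allocate patients in an unequal ratio and still aims to achieve the pre-specified power, the total sample size for the SSD must be inflated by a factor that depends on the allocation ratio. The total sample size for the adjusted design, SSDadj, is then [2]:
$$N_{adj} = N\,\frac{(1+R)^2}{4R} \qquad (3)$$
where N is given by Formula (1) and R is the ratio of the number of patients in the experimental group to that in the standard group. R may equally be taken as the reverse ratio, because the factor is unchanged when R is replaced by 1/R.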
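Below is a small sketch of the allocation adjustment. It assumes the inflation factor (1 + R)²/(4R) applied to the equal-allocation total. The helper names `allocation_factor` and `ssd_adj_total_sample_size` are hypothetical.

```python
import math
from statistics import NormalDist


def allocation_factor(r: float) -> float:
    """Inflation factor (1 + R)**2 / (4R); equals 1 for equal allocation (R = 1)."""
    return (1 + r) ** 2 / (4 * r)


def ssd_adj_total_sample_size(theta_r: float, r: float, alpha: float = 0.05, beta: float = 0.10) -> int:
    """Total N for unequal allocation R = N_E / N_S (hypothetical helper)."""
    inv = NormalDist().inv_cdf
    n_equal = 4 * (inv(1 - alpha / 2) + inv(1 - beta)) ** 2 / theta_r ** 2
    return math.ceil(n_equal * allocation_factor(r))


print(allocation_factor(2), allocation_factor(0.5))  # symmetric in R and 1/R
print(ssd_adj_total_sample_size(0.5, r=1), ssd_adj_total_sample_size(0.5, r=2))
```

The factor works as follows. With π = R/(1 + R), Nadj·π(1 − π) equals N/4, so the drift θR√(Nπ(1 − π)) is the same as under equal allocation. The result is only rounded up here. In practice one might round further so that the total splits exactly in the ratio R.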
Under the SSD or SSDadj, the statistical analysis is conducted only once all data have been collected. An ongoing trial therefore cannot be stopped before the predetermined sample size has been included, even if early data show a clear difference between treatments.
Boundaries approach: triangular and double triangular tests (TT and DTT)
Sequential boundary approaches such as the TT and DTT permit repeated statistical analyses throughout the recruitment period. This allows a trial to be terminated early while maintaining the pre-specified α and β levels. Stopping early reduces the sample size, which has ethical and economic advantages [8]. From the ethical viewpoint, it minimizes the number of patients given an inferior or ineffective treatment. From the economic viewpoint, it saves time and resources.
As shown in Figures 1 and 2, the TT and DTT are represented on two perpendicular axes [2, 9]. These axes correspond to two sample statistics that play a particularly important role in the investigation of θ and are fundamental to sequential trials. The vertical axis, denoted by Z, is a cumulative measure of the advantage of the experimental treatment (the efficient score for θ, calculated under the null hypothesis). The horizontal axis, denoted by V, measures the amount of information about θ contained in Z (Fisher's information); V increases as the trial progresses [7]. The straight-line boundaries of the tests delineate a continuation region and stopping regions. The equations of these boundaries depend on the treatment benefit to be detected, on α and β, and on the frequency of the analyses, defined as the number of patients included between two consecutive analyses [2]. At each analysis, V and Z are calculated from all the data collected since the beginning of the study, and the point (V, Z) is plotted on the sequential plan. Consecutive points form a sample path running from left to right across the plan. As long as the sample path stays between the boundaries, the study continues and new patients are included. When the path crosses a boundary, the trial is stopped.
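The monitoring procedure described above can be sketched as a loop over interim looks. This is a minimal illustration under several stated assumptions, and it is not the PEST implementation:

- σ is treated as known, so that V = nE·nS/(nE + nS) and Z = V(x̄E − x̄S)/σ. The statistics used in practice when σ is estimated are more involved.
- The boundaries are supplied as arbitrary straight lines. The coefficients in the example are made up and are not the TT or DTT boundaries.
- No correction is applied for the discreteness of the interim looks.

`Line`, `score_and_information` and `run_sequential` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Line:
    """Straight-line boundary Z = intercept + slope * V."""
    intercept: float
    slope: float

    def at(self, v: float) -> float:
        return self.intercept + self.slope * v


def score_and_information(exp: Sequence[float], std: Sequence[float], sigma: float) -> Tuple[float, float]:
    """(V, Z) for theta = (mu2 - mu1) / sigma with sigma treated as known:
    V = nE*nS/(nE + nS) and Z = V * (mean_E - mean_S) / sigma."""
    n_e, n_s = len(exp), len(std)
    v = n_e * n_s / (n_e + n_s)
    z = v * (sum(exp) / n_e - sum(std) / n_s) / sigma
    return v, z


def run_sequential(patients: Sequence[Tuple[str, float]], sigma: float,
                   upper: Line, lower: Line, n_per_look: int):
    """Re-analyse all accumulated data every n_per_look patients and stop at
    the first boundary crossing. `patients` holds (arm, response) pairs in
    arrival order, with arm "E" (experimental) or "S" (standard)."""
    path: List[Tuple[float, float]] = []
    for n_seen in range(n_per_look, len(patients) + 1, n_per_look):
        exp = [y for arm, y in patients[:n_seen] if arm == "E"]
        std = [y for arm, y in patients[:n_seen] if arm == "S"]
        if not exp or not std:
            continue  # (V, Z) is undefined until both arms have data
        v, z = score_and_information(exp, std, sigma)
        path.append((v, z))  # one point of the sample path
        if z >= upper.at(v):
            return "upper", n_seen, path
        if z <= lower.at(v):
            return "lower", n_seen, path
    return "no boundary crossed", len(patients), path


# Illustrative only: boundary coefficients are made up, not TT/DTT values.
upper, lower = Line(2.0, 0.25), Line(-2.0, 0.75)
strong_effect = [("E", 15.0), ("S", 10.0)] * 30
print(run_sequential(strong_effect, sigma=5.0, upper=upper, lower=lower, n_per_look=12))
```

Each look recomputes (V, Z) from all the data collected so far, as the text describes, rather than from the newest batch alone. If the two lines converge, the continuation region closes, and any later point necessarily lies on or beyond one of them.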
Triangular tests can be divided into two classes according to their power function. The power function, denoted by C(θ), is the probability that H0 is rejected when θ is the true parameter value. When the true treatment difference is θ, C+(θ) and C−(θ) are the probabilities of concluding that the experimental treatment is significantly better and significantly worse than the standard, respectively. Based on these definitions, two alternative power requirements can be specified: power requirement I and power requirement II. The TT is designed to satisfy power requirement I: C+(θR) = 1 − β, no requirement is placed on C−(−θR), and C−(θR) is usually negligible. The DTT is designed to satisfy power requirement II: C+(θR) = C−(−θR) = 1 − β, and both C+(−θR) and C−(θR) are negligible [2, 17].
Simulation study
We studied the average sample number (ASN) of the TT and DTT through simulations in PEST3 [17]. Our simulation design was very similar to that used by Sebille and Bellissant [8]. For each situation studied, we generated 30,000 independent comparative trials. Patient responses were drawn from a normal distribution with standard deviation 5 and mean μ1 = 10 in the standard group, so that μ2 = 10 + 5θR in the experimental group. We evaluated the influence of β and θR (through μ2) on the statistical properties of all tests. The number of patients included between two consecutive analyses (n) was 12. We also evaluated the influence of the allocation ratio R, defined as the ratio of the number of patients in the experimental group to that in the standard group. Specifically, we chose two values for β (0.05 and 0.10), seven values for θR (0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0), one value for n (n = 12) and two values for R (1 and 2). The value of α was set to 0.05 for all simulated trials. We also calculated the required sample size for the SSD and SSDadj for the same values of θR, R, β and α, using Formulas (1) and (3), respectively. Moreover, we simulated the ASN and power of the TT (n = 12), DTT (n = 12) and two-sided SSDadj for different values of R, with θR = 0.7 and α = β = 0.05. For the SSD, the required sample size and power were calculated for the same values of R, θR, α and β using Formulas (1) and (2).
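The SSDadj part of this simulation can be mimicked with a short Monte Carlo sketch. It uses the stated settings (μ1 = 10, σ = 5, μ2 = μ1 + θRσ, 30,000 trials) and estimates C+(θR) and C−(θR) for a fixed two-sided design. This is not the PEST3 procedure used for the TT and DTT, and several choices are assumptions:

- σ is treated as known and a z-test is used.
- Each group mean is drawn directly from its exact sampling distribution N(μ, σ²/n). For a fixed design this is equivalent to simulating individual patients.
- The function name and seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_fixed_design(theta_r: float, n_total: int, r: float = 1.0, alpha: float = 0.05,
                          mu1: float = 10.0, sigma: float = 5.0,
                          n_trials: int = 30_000, seed: int = 2024):
    """Monte Carlo estimates of C+(theta_r) and C-(theta_r) for a two-sided,
    fixed-sample z-test with allocation ratio r = N_E / N_S (sigma known)."""
    rng = random.Random(seed)
    mu2 = mu1 + theta_r * sigma            # from theta_R = (mu2 - mu1) / sigma
    n_exp = round(n_total * r / (1 + r))   # patients on the experimental arm
    n_std = n_total - n_exp
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = sigma * math.sqrt(1 / n_exp + 1 / n_std)
    better = worse = 0
    for _ in range(n_trials):
        # Group means drawn from their exact sampling distributions N(mu, sigma^2 / n).
        mean_exp = rng.gauss(mu2, sigma / math.sqrt(n_exp))
        mean_std = rng.gauss(mu1, sigma / math.sqrt(n_std))
        z = (mean_exp - mean_std) / se_diff
        if z >= z_crit:
            better += 1
        elif z <= -z_crit:
            worse += 1
    return better / n_trials, worse / n_trials


# theta_R = 0.7, alpha = beta = 0.05: N = 106.08 -> 107 for R = 1; x 9/8 -> 120 for R = 2.
for r, n_total in ((1.0, 107), (2.0, 120)):
    c_plus, c_minus = simulate_fixed_design(0.7, n_total, r=r)
    print(f"R = {r}: N = {n_total}, C+ = {c_plus:.3f}, C- = {c_minus:.4f}")
```

With these totals, C+(θR) should come out close to the target 1 − β = 0.95, and C−(θR) essentially zero. A sequential design would additionally need the boundary checks at each interim look and would record the number of patients at stopping to estimate the ASN.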