Conducting the non-inferiority test for the means with unknown coefficient of variation in a three-arm trial

Background The non-inferiority test is a reasonable approach to assessing a new treatment in a three-arm trial, which consists of a placebo, a reference, and an experimental treatment. Non-inferiority is often measured by the mean difference between the experimental and the placebo groups relative to the mean difference between the reference and the placebo groups.
Methods To cope with possible estimation distortion when heteroskedasticity is allowed, we adjust the measurement of non-inferiority by incorporating the coefficient of variation (CV) of the experimental, the reference and the placebo groups. In this research, we propose a generalized p-value based method (GPV-based method) to facilitate non-inferiority tests for the means with unknown coefficient of variation in a three-arm trial.
Results The simulation results show that the GPV-based method not only controls the type I error rate at the nominal level more adequately but also provides higher power than the Delta method and the empirical bootstrap method, which verifies the feasibility of our adjustment.
Conclusions We revise the measurement of non-inferiority by deducting the CV of each kind of treatment from the average effect of trials. CVs are included in the non-inferiority measure explicitly to help prevent possible estimation distortion if heteroskedasticity is allowed. In the simulation study, the GPV-based method for non-inferiority tests for the means with unknown CV in a three-arm trial performs better than the empirical bootstrap method and the Delta method for small, medium and large sample sizes.
Hence, the GPV-based method is recommended to be used to conduct the non-inferiority test for the means with unknown CV in a three-arm trial. The GPV-based method still performs well in non-normality cases. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-023-01990-w.


Background
The goal of a non-inferiority test is to determine whether the experimental treatment is statistically not inferior to the active control in a clinical trial. The three-arm clinical trial for non-inferiority testing is supported by a recommendation from the U.S. Food and Drug Administration (FDA). The three-arm trial, consisting of a placebo, a reference, and an experimental treatment, allows the substantial superiority of the comparator over the placebo to be assessed prior to the comparison of the reference and the new experimental treatment [1]. Pigeot et al. [2] formulated the non-inferiority test in a three-arm trial as a ratio of the mean in the experimental group to the mean in the reference group, after deducting the mean in the placebo group from each. Under a given threshold $\alpha_0$ (say 0.8), if the alternative hypothesis holds true, the efficacy of the experimental group relative to the placebo group is more than $\alpha_0 \times 100\%$ of the efficacy of the reference compound relative to the placebo group. Under normality and homogeneous variance assumptions, Pigeot et al. [2] developed a t-distributed test statistic and constructed the confidence interval for the ratio hypothesis by Fieller's method. Meanwhile, Hasler et al. [3] derived a t-distributed test statistic under variance heteroscedasticity, with confidence intervals based on Fieller's method.
In the above literature, the test statistic of a non-inferiority test in the three-arm trial is the sample mean difference between the experimental and placebo groups divided by that between the reference and placebo groups. It is well known that the sample mean is an unbiased estimator of the population mean. Setting unbiasedness aside, Searls [4] proposed an estimator of the mean that uses a coefficient of variation (CV) known in advance and has minimum mean square error. Wu and Hsieh [5] estimated the population means of treatment effects in a three-arm trial by Searls' estimator rather than the traditional sample mean and showed that Searls' estimator performs better in terms of empirical size and empirical power. Thangjai et al. [6] derived the expectation and variance of Searls' estimator with unknown CV. Moreover, Thangjai et al. [6] constructed confidence intervals for the mean and the difference of means of normal distributions with unknown coefficients of variation. In this study, we use the concept of Thangjai et al. [6] to propose a non-inferiority test procedure in the three-arm trial in which non-inferiority is measured as the mean difference with unknown coefficient of variation between the experimental and the placebo groups relative to that between the reference and the placebo groups. Since the assumption of heterogeneous variances complicates the distributions of the estimators of this ratio, it is a challenge to measure the non-inferiority of new treatments in the three-arm clinical trial. Consequently, we propose the generalized p-value based method (hereafter GPV-based method), a statistical test procedure to assess non-inferiority in the three-arm trial under heterogeneous variances with unknown coefficients of variation of the treatments.
Typically, in three-arm non-inferiority tests, the variances of the effects of trials are assumed to be homogeneous. When the variances are heterogeneous, the impact of heteroskedasticity on the test results is evaluated less often. Heteroskedasticity is an issue frequently encountered in econometrics; it leads to biased variance estimates and hence distorts the results of hypothesis tests such as Chow's coefficient stability test, Student's t-test, and Fisher's F-test [7]. Though earlier research used tests on variances to detect whether heteroskedasticity exists in the model, Li and Yao [8] and Tovohery et al. [7] use the coefficient of variation (CV) to detect this problem. Inspired by Searls [4], in this research we explicitly incorporate the CV into the mean of the observations of trials, that is, we replace the population mean by the mean targeted by Searls' estimator in measuring non-inferiority, to mitigate the impact of heteroskedasticity on the test results.
Tsui and Weerahandi [9] explicitly defined the generalized test variables (GTVs), showing that the generalized p-value (GPV) is an exact probability of an extreme region. On this basis, Tsui and Weerahandi [9] demonstrated how GPVs can provide small-sample solutions in cases where nuisance parameters make testing procedures difficult to conduct. Since GPVs were proposed, they have been applied to several hypothesis testing problems. For instance, Liao et al. [10,11] applied the GPV to tolerance intervals; McNally et al. [12] conducted individual and population bioequivalence tests by GPVs; Mathew and Webb [13] constructed GPVs and GCIs for variance components; Gamage [14] applied GPVs to MANOVA; and Li et al. [15] used the concept of GPVs to measure the difference in paired partial areas under the receiver operating characteristic (ROC) curves and construct a non-inferiority test for diagnostic accuracy. Gamalo et al. [16] proposed a GPV approach to assessing non-inferiority in a three-arm trial, in which the hypothesis test considered is the same as that in Hasler et al. [3].
The article is organized as follows. The second part formulates the statistical problem of the non-inferiority hypothesis test with unknown CV in a three-arm trial and derives the test procedures implemented by the bootstrap method and the Delta method. The same part also proposes the GPV-based test for the ratio of mean differences, which explicitly incorporates the unknown CV, to assess non-inferiority in a three-arm trial. The empirical size and power of the proposed testing procedures are then examined in simulation studies under a variety of scenarios. The proposed method is applied to a numerical example from the literature. Finally, conclusions and some remarks are drawn.

Methods
Let the clinical observations of the experimental treatment, reference, and placebo groups be denoted by $X_{E,i}$, $X_{R,j}$ and $X_{P,k}$, respectively, which are mutually independent and normally distributed with expectations $\mu_E$, $\mu_R$ and $\mu_P$, and unknown variances $\sigma_E^2$, $\sigma_R^2$ and $\sigma_P^2$, respectively. Since the variance in the reference group is the gold standard in the three-arm trial, to allow for a fair standard of non-inferiority test, we assume in this study that the variance of the experimental treatment group is equal to that of the reference group but differs from that of the placebo group. Specifically,
$$X_{E,i} \sim N(\mu_E, \sigma_E^2),\ i = 1, \ldots, n_E; \quad X_{R,j} \sim N(\mu_R, \sigma_R^2),\ j = 1, \ldots, n_R; \quad X_{P,k} \sim N(\mu_P, \sigma_P^2),\ k = 1, \ldots, n_P,$$
and $n_E$, $n_R$ and $n_P$ can be unequal. First, we establish the statistical testing problem
$$H_0: \theta_E - \theta_R \le \delta_0 \quad \text{versus} \quad H_1: \theta_E - \theta_R > \delta_0,$$
where $\theta_i$ ($i = E, R, P$) denotes the mean with unknown CV of group $i$, $\sigma_E^2 = \sigma_R^2$, and $\delta_0$ is a relevant non-inferiority threshold. For $\xi_0 \in (0, 1)$, we specify $\delta_0$ as a proportion of the difference between $\theta_R$ and $\theta_P$ by $\delta_0 = (\xi_0 - 1)(\theta_R - \theta_P)$. Rewriting the hypothesis in terms of the ratio of the differences in means with unknown CV then yields
$$H_0: \frac{\theta_E - \theta_P}{\theta_R - \theta_P} \le \xi_0 \quad \text{versus} \quad H_1: \frac{\theta_E - \theta_P}{\theta_R - \theta_P} > \xi_0, \tag{1}$$
where $\xi_0$ represents the effectiveness threshold between 0 and 1. The value of $\theta_R - \theta_P$ is necessarily greater than 0. Because the threshold value $\xi_0$ is defined as a proportion of the difference $\theta_R - \theta_P$, it is important to select a proper reference or positive control. In this way, the evaluation of non-inferiority in the three-arm trial is specified as a ratio of differences in population means with unknown CV, as discussed in the Background.
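The equivalence between the difference form (threshold $\delta_0$) and the ratio form (threshold $\xi_0$) of the hypothesis can be checked numerically. The following is a minimal Python sketch. It assumes a Searls-type form for the mean with CV, $\theta = \mu / (1 + \sigma^2/(n\mu^2))$, which this section does not print explicitly; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
def theta(mu, sigma, n):
    """Mean adjusted by the CV: mu / (1 + sigma^2 / (n * mu^2)).

    Assumption: Searls-type form; the section itself does not print it.
    """
    return mu / (1.0 + sigma**2 / (n * mu**2))


def ratio(theta_e, theta_r, theta_p):
    """xi = (theta_E - theta_P) / (theta_R - theta_P)."""
    return (theta_e - theta_p) / (theta_r - theta_p)


def margin(xi0, theta_r, theta_p):
    """delta_0 = (xi_0 - 1) * (theta_R - theta_P)."""
    return (xi0 - 1.0) * (theta_r - theta_p)


# Hypothetical parameter values, for illustration only.
t_e = theta(mu=24.0, sigma=7.5, n=30)
t_r = theta(mu=25.5, sigma=7.5, n=20)
t_p = theta(mu=16.5, sigma=7.5, n=10)
xi0 = 0.8

# Both formulations describe the same alternative whenever
# theta_R - theta_P > 0.
diff_form = (t_e - t_r) > margin(xi0, t_r, t_p)
ratio_form = ratio(t_e, t_r, t_p) > xi0
print(diff_form, ratio_form)
```

The two booleans agree for any inputs with $\theta_R - \theta_P > 0$, because dividing $\theta_E - \theta_P > \xi_0(\theta_R - \theta_P)$ by a positive quantity preserves the inequality.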

Empirical bootstrap method
The bootstrap method has become a widely used technique for statistical inference problems in which either the underlying distribution is not normal or the distribution of the sample statistic under the null hypothesis is not feasible to derive (Efron and Tibshirani [17]). Since the variance of the experimental treatment group is equal to that of the reference group (and differs from that of the placebo group), we use the residual method to construct the empirical bootstrap procedure to assess the non-inferiority of a new treatment in a three-arm trial. The residual method is somewhat similar to the percentile method, except that it is based on the bootstrap distribution of residuals from the original estimate [18]. The empirical bootstrap procedure can be obtained as follows.
Step 1: Suppose that $x_E = (x_{E,1}, \ldots, x_{E,n_E})$, $x_R = (x_{R,1}, \ldots, x_{R,n_R})$ and $x_P = (x_{P,1}, \ldots, x_{P,n_P})$ denote the clinical observations for the experimental, reference and placebo groups, respectively. Generate a bootstrap sample with replacement from the original sample $x = (x_E, x_R, x_P)$ by drawing with replacement from each group with sample sizes $n_E$, $n_R$ and $n_P$, respectively. Then, non-inferiority can be claimed if $L_{\xi_b} > \xi_0$.
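A residual-type bootstrap of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' exact procedure: the Searls-type plug-in estimator, the function names, and the way the lower limit is formed (estimate minus the upper quantile of the bootstrap residuals) are all assumptions, and the data are hypothetical.

```python
import random
import statistics


def searls_type_mean(sample):
    """Plug-in estimate of the mean with unknown CV (assumed form)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s2 = statistics.variance(sample)
    return xbar / (1.0 + s2 / (n * xbar**2))


def ratio_estimate(x_e, x_r, x_p):
    """Estimate of (theta_E - theta_P) / (theta_R - theta_P)."""
    t_e, t_r, t_p = (searls_type_mean(x) for x in (x_e, x_r, x_p))
    return (t_e - t_p) / (t_r - t_p)


def bootstrap_lower_limit(x_e, x_r, x_p, n_boot=1000, alpha=0.05, seed=0):
    """Return (estimate, one-sided lower limit) from bootstrap residuals."""
    rng = random.Random(seed)
    estimate = ratio_estimate(x_e, x_r, x_p)
    residuals = []
    for _ in range(n_boot):
        # Resample within each arm, keeping the original group sizes.
        b_e = rng.choices(x_e, k=len(x_e))
        b_r = rng.choices(x_r, k=len(x_r))
        b_p = rng.choices(x_p, k=len(x_p))
        residuals.append(ratio_estimate(b_e, b_r, b_p) - estimate)
    residuals.sort()
    upper = residuals[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return estimate, estimate - upper


# Hypothetical data with a 3:2:1 allocation (n = 30, 20, 10).
pattern = [-6.0, -3.0, 0.0, 3.0, 6.0]
x_p = [16.5 + d for d in pattern * 2]
x_r = [31.5 + d for d in pattern * 4]
x_e = [30.0 + d for d in pattern * 6]
est, lower = bootstrap_lower_limit(x_e, x_r, x_p)
print(est, lower, lower > 0.8)
```

Resampling is done separately within each arm so that the group sizes $n_E$, $n_R$ and $n_P$ are preserved in every bootstrap sample.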

Delta method
Let $\xi_1 = \theta_E - \theta_P$ be the difference of the population means with unknown CV between the experimental and placebo groups, and let $\xi_2 = \theta_R - \theta_P$ be the corresponding difference between the reference and placebo groups. The expectations and variances of the estimators $\hat{\xi}_1$ and $\hat{\xi}_2$ can be obtained from Thangjai et al. [6]. The Delta method was proposed in Dorfman [19]. It applies Taylor's theorem (series expansion) to obtain the asymptotic normal distribution of estimators that have complex forms. Accordingly, the estimator of the ratio $\xi = \xi_1/\xi_2$ is asymptotically normally distributed. When the null hypothesis holds, for the non-inferiority hypothesis test in terms of population means with unknown CV as shown in (1), the rejection region under the Delta method is constructed from this asymptotic distribution, where $z_\alpha$ denotes the upper $\alpha$ critical point of the standard normal distribution.
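For illustration, a generic first-order Delta-method test for a ratio can be sketched in Python. The sketch takes the estimates of $\xi_1$ and $\xi_2$ and their variances and covariance as inputs, because the specific expressions from Thangjai et al. [6] are not reproduced here; the function names and numerical inputs are hypothetical, and the one-sided rule shown is the standard textbook form rather than a transcription of the paper's rejection region.

```python
import math
from statistics import NormalDist


def delta_ratio_se(xi1, xi2, var1, var2, cov12):
    """First-order (Delta-method) standard error of xi1 / xi2."""
    grad1 = 1.0 / xi2          # derivative of xi1/xi2 with respect to xi1
    grad2 = -xi1 / xi2**2      # derivative of xi1/xi2 with respect to xi2
    var = grad1**2 * var1 + grad2**2 * var2 + 2.0 * grad1 * grad2 * cov12
    return math.sqrt(var)


def delta_test(xi1, xi2, var1, var2, cov12, xi0=0.8, alpha=0.05):
    """Return (z statistic, reject H0?) for H0: xi1/xi2 <= xi0."""
    se = delta_ratio_se(xi1, xi2, var1, var2, cov12)
    z_stat = (xi1 / xi2 - xi0) / se
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper alpha critical point
    return z_stat, z_stat > z_alpha


# Hypothetical inputs: the covariance is positive because both
# differences share the placebo arm.
z_stat, reject = delta_test(13.0, 15.0, 8.4, 8.4, 5.6)
print(z_stat, reject)
```

The positive covariance term matters: ignoring it would overstate the variance of the ratio and make the test more conservative.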

The GPV-based method
Suppose $X$ is a random variable whose PDF is $f(X; \zeta)$, where $\zeta = (\xi, \eta)$. Here $\xi = \frac{\theta_E - \theta_P}{\theta_R - \theta_P}$ is the parameter of interest and $\eta$ denotes a vector of nuisance parameters. Let $x$ be the observed value of the random variable $X$. The statistic $T = T(X; x, \zeta)$ is said to be a generalized test variable if the following three properties hold.
Property A: For fixed $x$ and $\zeta = (\xi_0, \eta)$, the distribution of $T(X; x, \zeta)$ is independent of the nuisance parameters $\eta$.
Property B: The observed value of $T(X; x, \zeta)$, $t_{obs} = T(x; x, \zeta)$, does not depend on unknown parameters.
Property C: For given $x$ and $\eta$, $T(X; x, \zeta)$ is either stochastically increasing or stochastically decreasing in $\xi$; that is, $P(T(X; x, \zeta) \ge t)$ is monotone in $\xi$ for any given $t$.
Without loss of generality, consider testing $H_0: \xi \le \xi_0$ versus $H_1: \xi > \xi_0$, where $\xi_0$ is a specified value. If $T$ is stochastically increasing in $\xi$, the generalized p-value is defined as the probability, evaluated at $\xi = \xi_0$, that $T$ falls in the extreme region determined by $t_{obs}$.
For a test with significance level $\alpha$, $H_0$ is rejected if $p < \alpha$. Because the exact distribution of the generalized test variable $T$ is complex, the generalized p-value is often computed with a Monte-Carlo algorithm.
In the following, we use the concept of the generalized pivotal quantity (GPQ) of Weerahandi [20] to develop the generalized test variables (GTVs) required to assess the non-inferiority of a new treatment in a three-arm trial, measured as a ratio of differences in means with CV of each treatment. To develop the GTV for the hypothesis test in (1), we first define GPQs for the means and variances of the three groups. Let $\bar{x}_E$, $\bar{x}_R$ and $\bar{x}_P$ be the observed values of the sample means $\bar{X}_E$, $\bar{X}_R$ and $\bar{X}_P$, and let $s_E^2$, $s_R^2$ and $s_P^2$ be the observed values of the sample variances $S_E^2$, $S_R^2$ and $S_P^2$. In addition, we use the pooled estimator $S_{pooled}^2$ to estimate both $\sigma_E^2$ and $\sigma_R^2$. The pooled estimator is defined as
$$S_{pooled}^2 = \frac{(n_E - 1)S_E^2 + (n_R - 1)S_R^2}{n_E + n_R - 2},$$
and $s_{pooled}^2$ is the observed value of $S_{pooled}^2$. The GPQs are built from standard normal variables $Z_E$, $Z_R$, $Z_P$ and chi-square variables $U_E$, $U_R$, $U_P$; for example, the GPQ of $\mu_P$ is
$$R_{\mu_P} = \bar{x}_P - \frac{Z_P}{\sqrt{U_P}} \sqrt{\frac{(n_P - 1)s_P^2}{n_P}}. \tag{4}$$
Moreover, $Z_E$, $Z_R$, $Z_P$, $U_E$, $U_R$ and $U_P$ are mutually independent.
The GPQ of $\xi = \frac{\theta_E - \theta_P}{\theta_R - \theta_P}$ can thus be defined as $R_\xi = \frac{R_{\theta_E} - R_{\theta_P}}{R_{\theta_R} - R_{\theta_P}}$, where $R_{\theta_i}$ denotes the GPQ of $\theta_i$. Hence, we can construct a GTV for $\xi$ given by $T_\xi = R_\xi - \xi$. Given the observed data, the observed value of $R_\xi$ is equal to $\xi$, and $R_\xi$ has a distribution that is free of parameters. Hence, the distribution of $T_\xi$ does not depend on nuisance parameters for any given value of $\xi = \xi_0$, and the observed value of $T_\xi$ is equal to zero. Consequently, Property A and Property B are satisfied.
Furthermore, the distribution function of $T_\xi$ can be expressed as $P(T_\xi \le t) = P(R_\xi \le t + \xi)$. Since the distribution function of $T_\xi$ is increasing in $\xi$, Property C is also satisfied. By definition, $T_\xi$ is a GTV of $\xi$. To test the hypothesis $H_0: \xi \le \xi_0$ versus $H_1: \xi > \xi_0$, the following Monte-Carlo algorithm is provided to derive the required GPV.
Step 1: Choose a sufficiently large number of Monte-Carlo samples, e.g., $H = 10{,}000$. For each $h$, $1 \le h \le H$, generate random outcomes $U_E$, $U_R$ and $U_P$ from mutually independent chi-square distributions (with $n_E - 1$, $n_R - 1$ and $n_P - 1$ degrees of freedom, respectively) and standard normal variables $Z_E$, $Z_R$ and $Z_P$.
Since $T_\xi$ is stochastically increasing in $\xi$ and the observed value of $T_\xi$ is equal to zero, the GPV is estimated by $\hat{p} = \frac{1}{H} \sum_{h=1}^{H} I(T_{\xi,h} \le 0)$. Under significance level $\alpha$, the null hypothesis $H_0: \frac{\theta_E - \theta_P}{\theta_R - \theta_P} \le \xi_0$ in (1) is rejected whenever $\hat{p} < \alpha$.
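The Monte-Carlo estimate of the GPV can be sketched in Python using only the standard library. Several ingredients are assumptions made for illustration and are not confirmed by the text: the Searls-type form $\theta = \mu/(1 + \sigma^2/(n\mu^2))$ for the mean with CV, the use of $U_E + U_R$ to form the GPQ of the common variance of the experimental and reference arms, and all function and variable names. The supplementary R program remains the reference implementation.

```python
import math
import random
import statistics


def generalized_p_value(x_e, x_r, x_p, xi0=0.8, n_draws=10000, seed=0):
    """Monte-Carlo estimate of P(T_xi <= 0) with T_xi = R_xi - xi0."""
    rng = random.Random(seed)
    n_e, n_r, n_p = len(x_e), len(x_r), len(x_p)
    m_e, m_r, m_p = (statistics.fmean(x) for x in (x_e, x_r, x_p))
    s2_e, s2_r, s2_p = (statistics.variance(x) for x in (x_e, x_r, x_p))
    s2_pooled = ((n_e - 1) * s2_e + (n_r - 1) * s2_r) / (n_e + n_r - 2)

    def chi_square(df):
        # Chi-square with df degrees of freedom is Gamma(shape=df/2, scale=2).
        return rng.gammavariate(df / 2.0, 2.0)

    def gpq_theta(mean, n, z, gpq_var):
        # GPQ of mu, then the assumed Searls-type map from (mu, sigma^2) to theta.
        gpq_mu = mean - z * math.sqrt(gpq_var / n)
        return gpq_mu / (1.0 + gpq_var / (n * gpq_mu**2))

    hits = 0
    for _ in range(n_draws):
        u_e, u_r, u_p = chi_square(n_e - 1), chi_square(n_r - 1), chi_square(n_p - 1)
        z_e, z_r, z_p = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Assumption: pooled-variance GPQ shared by the E and R arms.
        var_er = (n_e + n_r - 2) * s2_pooled / (u_e + u_r)
        var_p = (n_p - 1) * s2_p / u_p
        theta_e = gpq_theta(m_e, n_e, z_e, var_er)
        theta_r = gpq_theta(m_r, n_r, z_r, var_er)
        theta_p = gpq_theta(m_p, n_p, z_p, var_p)
        r_xi = (theta_e - theta_p) / (theta_r - theta_p)
        if r_xi - xi0 <= 0.0:
            hits += 1
    return hits / n_draws


# Hypothetical data with a 3:2:1 allocation (n = 30, 20, 10).
pattern = [-6.0, -3.0, 0.0, 3.0, 6.0]
placebo = [16.5 + d for d in pattern * 2]
reference = [31.5 + d for d in pattern * 4]
experimental = [31.5 + d for d in pattern * 6]
p_hat = generalized_p_value(experimental, reference, placebo, n_draws=2000)
print(p_hat, p_hat < 0.05)
```

A small estimated GPV means that the simulated values of $R_\xi$ rarely fall at or below $\xi_0$, which is evidence against $H_0$.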

Results
To evaluate the efficacy of the proposed method, three sets of simulation studies are conducted. First, the empirical sizes of the GPV-based method are compared with those of the Delta method and the empirical bootstrap method for various finite sample sizes. Second, we evaluate the empirical power of the three tests and compare the performance of the proposed GPV-based method with that of the other two tests. Third, we show that the GPV-based method can be applied well to non-normal cases.

Simulation study I: type I error rate
We conducted a simulation study of the type I error rates under the GPV-based, Delta and empirical bootstrap methods. The non-inferiority limit is chosen as $\xi_0 = 0.8$. We consider the following three cases of $\Delta = \mu_R - \mu_P$: (i) $\Delta = 9$; (ii) $\Delta = 15$ and (iii) $\Delta = 20$. We allocate the total sample size $n$ to the experimental, reference and placebo groups in the ratio 3:2:1, and the total sample sizes are chosen as $n = 60$, 90, 120, 480 and 900, respectively. For cases (i)-(iii), the population mean of the placebo group ($\mu_P$) is set to be 16.5, so the population mean of the reference group is $\mu_R = \mu_P + \Delta$. We set $\tau_R = \sigma_R^2/\sigma_E^2$ to be 1 and $\tau_P = \sigma_P^2/\sigma_E^2$ to be 0.5, 1.0 and 2.0, respectively. In this way, we keep the variances of the experimental and reference treatments homogeneous, while allowing heteroskedasticity for the placebo group. In this simulation study, the standard deviation of the placebo group ($\sigma_P$) is set to be 7.5, and the standard deviation of the reference group ($\sigma_R$), as well as that of the experimental group ($\sigma_E$), is equal to $\sigma_P/\sqrt{\tau_P}$. In addition, given any pair $(\mu_i, \sigma_i)$, $i = E, R, P$, the values of $\theta_i$ and hence $\theta_E - \theta_P$ and $\theta_R - \theta_P$ can be derived.
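The arithmetic behind this design can be illustrated with a short Python sketch that derives the group sizes under the 3:2:1 allocation and the common standard deviation of the experimental and reference groups from $\sigma_P = 7.5$ and $\tau_P = \sigma_P^2/\sigma_E^2$. The function names are hypothetical.

```python
import math


def group_sizes(n_total, weights=(3, 2, 1)):
    """Split the total sample size into (n_E, n_R, n_P) by allocation weights."""
    total = sum(weights)
    return tuple(n_total * w // total for w in weights)


def sigma_experimental(sigma_p, tau_p):
    """sigma_E = sigma_R = sigma_P / sqrt(tau_P), since tau_P = sigma_P^2 / sigma_E^2."""
    return sigma_p / math.sqrt(tau_p)


for n in (60, 90, 120, 480, 900):
    print(n, group_sizes(n))

for tau_p in (0.5, 1.0, 2.0):
    print(tau_p, round(sigma_experimental(7.5, tau_p), 3))
```

For example, $n = 60$ gives group sizes (30, 20, 10), and $\tau_P = 2$ gives $\sigma_E = \sigma_R \approx 5.303$, so the placebo arm is the more variable one in that setting.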
Under each parameter specification, the simulation data are independently generated 10,000 times. The empirical size and power are computed as the proportion of the 10,000 simulated p-values that are less than 5% (the significance level). Given this nominal significance level and number of simulated samples, if a testing procedure adequately controls the size at the 5% nominal level, the empirical sizes should fall into (0.0457, 0.0543). In this simulation study, for each sample, 5000 GPQs are constructed and 1000 bootstrap samples are drawn. We display the simulation results in Table 1.
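The acceptance band (0.0457, 0.0543) is consistent with a normal approximation to the binomial proportion, $0.05 \pm 1.96\sqrt{0.05 \times 0.95 / 10{,}000}$. The following Python sketch reproduces it; the choice of the 1.96 multiplier is an assumption inferred from the reported band, and the function name is hypothetical.

```python
import math


def size_band(alpha=0.05, n_sim=10000, z=1.96):
    """Normal-approximation band for an empirical size based on n_sim replications."""
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_sim)
    return alpha - half_width, alpha + half_width


low, high = size_band()
print(round(low, 4), round(high, 4))  # prints: 0.0457 0.0543
```

An empirical size below the lower end indicates a conservative test, and one above the upper end indicates an inflated type I error rate.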
Table 1 presents the simulated type I error rates, based on the ratio of population mean differences with unknown coefficients of variation, for assessing non-inferiority of a new treatment in a three-arm trial in the presence of heteroscedasticity, with a non-inferiority limit of 0.8 under the normality assumption. The simulation results lead us to the following conclusions.
(1) In Table 1, the type I error rates of the GPV-based method range over (0.0475, 0.0518). This range lies within (0.0457, 0.0543), and most of the type I error rates of the GPV-based method are quite close to the nominal value of 0.05. Therefore, the GPV-based test procedure adequately maintains the type I error rate close to the nominal level of 5%.
(2) In addition, from Table 1, the type I error rates of the Delta method range over (0.0001, 0.0058). All of them lie outside (0.0457, 0.0543) and are far below the nominal value of 0.05, so the Delta method is quite conservative. However, in some extreme cases (not shown in Table 1), such as $\tau_P = 0.01$ and $n = 96{,}000$, the Delta method controls the type I error rate much better, and the difference in power between the GPV-based and Delta methods shrinks. Such extreme cases are infeasible in practical clinical applications.
(3) On the other hand, the type I error rates of the empirical bootstrap method range over (0.0001, 0.0477). Only 5 out of 45 (11.1%) empirical sizes of the empirical bootstrap method fall within (0.0457, 0.0543). As a result, the test procedure based on the empirical bootstrap method is quite conservative, except when $\mu_R - \mu_P = 20$, $n \ge 480$, $\tau_R = 1$ and $\tau_P = 2$. As the mean difference between the reference and placebo groups gets larger, the bootstrap method controls the type I error rate better.
Taken as a whole, the GPV-based method performs extremely well in most cases, and it clearly controls the type I error rate better, especially in small samples.

Simulation study II: empirical power
To study the empirical power of the GPV-based method, we consider a simulation with $\mu_R - \mu_P = 9$ and $\mu_R - \mu_P = 20$; $\tau_R = 1$ and $\tau_P = 2$; and total sample sizes of 60, 120 and 480. We allocate the total sample to the experimental, reference and placebo groups by $n_E : n_R : n_P = 3 : 2 : 1$. The non-inferiority limit is again chosen as $\xi_0 = 0.8$, and the significance level is set to be 0.05 as well. For each combination of parameters, 10,000 random samples are generated. For each random sample, 5000 GPQs are constructed, and 1000 samples are drawn for the bootstrap method. The resulting empirical power curves are provided in Fig. 1.
Figure 1 provides the simulated power of the GPV-based method, the Delta method, and the empirical bootstrap method. Figure 1 shows the power curves as a function of $\xi = \frac{\theta_E - \theta_P}{\theta_R - \theta_P}$ for total sample sizes 60, 120 and 480, respectively. The power increases with increasing values of $\xi$ and with increasing total sample size. In Fig. 1, when the mean difference of the reference and placebo groups is 9, the GPV-based method is uniformly more powerful than the Delta method and the empirical bootstrap method. However, when the mean difference of the reference and placebo groups is 20, the empirical power curves of the GPV-based method and the empirical bootstrap method largely overlap when $\xi$ is larger than 0.9. Therefore, when the mean difference of the reference and placebo groups is equal to 9, the empirical power of the GPV-based method is better than those of the Delta method and the empirical bootstrap method. On the other hand, the empirical bootstrap method performs as well as the GPV-based method when the mean difference of the reference and placebo groups is equal to 20 and the sample size exceeds 60. In sum, the GPV-based method performs relatively better when the mean difference of the reference and placebo groups and the sample size are small.
Table 1
The type I error rates for testing non-inferiority with non-inferiority limit = 0.8 in $\tau_R = 1$, $\mu_R - \mu_P = 9$, 15 and 20, respectively. n, the total sample sizes; GP, the GPV-based method; DM, the Delta method

Simulation study III: non-normality cases
In this section, we consider two non-normal distributions, the log-normal and gamma distributions, to study the robustness of the GPV-based method. When the population is assumed to follow a log-normal distribution, let $X_i$, $i = E, R, P$, be mutually independent such that $\ln X_i$ is normally distributed with mean $\ln(\mu_i) - \frac{1}{2}\ln\left(1 + \frac{\sigma_i^2}{\mu_i^2}\right)$ and unknown variance $\ln\left(1 + \frac{\sigma_i^2}{\mu_i^2}\right)$. When the population belongs to the gamma distribution, denote $X_i \sim \text{Gamma}(\gamma_{i1}, \gamma_{i2})$, where $\gamma_{i1}$ and $\gamma_{i2}$ represent the shape and scale parameters, respectively. The simulation parameters $\mu_R - \mu_P$, $\tau_R$, $\tau_P$ and $n$ are the same as those in Simulation studies I and II. The simulation results for the type I error rates are displayed in Tables 2 and 3, and the simulation results for the empirical powers are presented in Table 4.
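To generate non-normal data with a prescribed mean $\mu_i$ and standard deviation $\sigma_i$, the distribution parameters can be obtained by moment matching. The Python sketch below shows one way to do this. The log-normal mapping follows the log-scale mean stated in the text; the gamma mapping (shape $\mu^2/\sigma^2$, scale $\sigma^2/\mu$) is the usual moment-matching choice and is an assumption here, as are the function names.

```python
import math


def lognormal_params(mu, sigma):
    """Return (mean, variance) of ln X so that X has mean mu and sd sigma."""
    log_var = math.log(1.0 + sigma**2 / mu**2)
    log_mean = math.log(mu) - 0.5 * log_var
    return log_mean, log_var


def gamma_params(mu, sigma):
    """Return (shape, scale) so that X has mean mu and sd sigma (assumed mapping)."""
    return mu**2 / sigma**2, sigma**2 / mu


# Placebo-group setting from the simulation study: mu_P = 16.5, sigma_P = 7.5.
log_mean, log_var = lognormal_params(16.5, 7.5)
shape, scale = gamma_params(16.5, 7.5)
print(log_mean, log_var)
print(shape, scale)
```

With these parameters, `random.lognormvariate(log_mean, math.sqrt(log_var))` and `random.gammavariate(shape, scale)` from the standard library draw observations with the target mean and standard deviation.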
From Tables 2 and 3, when the data follow a log-normal or gamma distribution, the GPV-based method maintains the type I error rate near the nominal level of 0.05 more appropriately than the Delta method and the empirical bootstrap method do. In addition, the type I error rate of the Delta method is quite conservative as well. Furthermore, under $\mu_R - \mu_P = 20$, $\tau_R = 1$, $\tau_P = 2$ and a total sample size greater than 900, the type I error rate of the empirical bootstrap method approaches the claimed significance level of the non-inferiority test. Moreover, in Table 4, regardless of the sample size and distribution, the GPV-based method is more powerful than the Delta method and the empirical bootstrap method, especially under $\mu_R - \mu_P = 9$, $\tau_R = 1$, $\tau_P = 2$ and a total sample size less than 120.

Numerical example: evaluation of the mutagenicity
We adopt the mutagenicity data set in Hauschke et al. [21], which was published by Adler and Kliesch [22] from a micronucleus assay on hydroquinone with a positive control of 25 mg/kg cyclophosphamide. The results for male mice at the 24 h sampling time are given in Table 5.
By comparing the difference between a dose group and a vehicle control with the difference between the positive control and the vehicle control, the non-inferiority test can also be adopted to verify safety in toxicological experiments. Therefore, the above mutagenicity data can be evaluated by such a non-inferiority test. Hothorn and Hauschke [23] used the concept of the acceptable maximal safe dose by identifying the highest dose that is non-inferior to the vehicle control, so that all dose levels below the highest one are also non-inferior. Under the assumption of normality and homogeneous variance, Hauschke et al. [21] built confidence intervals for the ratio of the difference between the dose groups and the vehicle control to the difference between a positive control and the vehicle control, in which the safety threshold is set to be 0.5. Hence, the corresponding non-inferiority test compares the ratio $\frac{\theta_E - \theta_P}{\theta_R - \theta_P}$ with the safety threshold of 0.5, where the dose group is taken as the experimental group, the vehicle control as the placebo group and the positive control as the reference group. The upper 95% confidence limits for $\frac{\theta_E - \theta_P}{\theta_R - \theta_P}$ calculated from the GPV-based method, the Delta method, and the empirical bootstrap method are presented in Table 6.
Table 2
Under log-normal distribution, the type I error rates for testing non-inferiority with non-inferiority limit = 0.8 in $\tau_R = 1$, $\mu_R - \mu_P = 9$, 15 and 20, respectively. n, the total sample sizes; GP, the GPV-based method; DM, the Delta method; EB, the empirical bootstrap method
Table 4
Under non-normal distribution, the empirical powers of testing non-inferiority with non-inferiority limit = 0.8 in $\tau_R = 1$, $\tau_P = 2$. n, the total sample sizes; GP, the GPV-based method; DM, the Delta method; EB, the empirical bootstrap method
From Table 6, one can see that safety is attainable for the two lower doses; therefore, the maximal safe dose is 50 mg/kg. The two higher dose levels, 75 and 100 mg/kg, reveal an unacceptable increase. When variance heterogeneity is taken into account in the GPV-based method, the Delta method, and the empirical bootstrap method, the results do not change.

Conclusions and discussions
We propose the GPV-based method to conduct the non-inferiority test based on the difference of means with unknown coefficients of variation between the experimental and the placebo groups relative to that between the reference and the placebo groups, under the normality assumption. The main contribution of this research is that we revise the measurement of non-inferiority by considering the coefficient of variation (CV) of each kind of treatment from the average effect of trials. This is slightly different from the traditional non-inferiority test, which uses the difference of means between the experimental and the placebo groups relative to that between the reference and the placebo groups. Besides, through the heuristic statistical testing procedure for the non-inferiority test, we incorporate unknown heterogeneous variances among the three arms. Hence, CVs are included explicitly in the non-inferiority hypothesis test to help prevent possible estimation distortion if heteroskedasticity is allowed.
Empirical results from the simulation studies show that the GPV-based method not only adequately controls the type I error rate at the nominal level but also provides higher power than the Delta method and the empirical bootstrap method. In terms of both empirical type I error rate and empirical power, the GPV-based method performs better than the Delta method and the empirical bootstrap method. Therefore, the GPV-based method is suitable for conducting the non-inferiority test for the means with unknown coefficient of variation in a three-arm trial. The R program for the proposed GPV-based method is available as Supplementary materials 1 and 2.
To further explore the properties of the compared methods, the non-inferiority limit is estimated under the same parameter settings as in the simulation studies. The non-inferiority limit is chosen as 0.8. For each specified parameter combination, the data are generated 10,000 times independently. The bias, mean square error (MSE) and coverage probability (CP) of the three methods are shown in Table 7.
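The three summary measures can be computed from replicated estimates as in the following Python sketch. It assumes that each replication yields a point estimate and an interval for the ratio; the use of two-sided (lower, upper) pairs, the function names and the toy numbers are hypothetical and do not reproduce Table 7.

```python
import statistics


def bias(estimates, true_value):
    """Average estimate minus the true value."""
    return statistics.fmean(estimates) - true_value


def mse(estimates, true_value):
    """Average squared deviation of the estimates from the true value."""
    return statistics.fmean([(e - true_value) ** 2 for e in estimates])


def coverage(intervals, true_value):
    """Share of (lower, upper) intervals that contain the true value."""
    inside = sum(1 for low, high in intervals if low <= true_value <= high)
    return inside / len(intervals)


# Toy replications, for illustration only.
estimates = [0.78, 0.82, 0.80, 0.84]
intervals = [(0.70, 0.90), (0.81, 0.95), (0.60, 0.85), (0.75, 0.80)]
print(bias(estimates, 0.8), mse(estimates, 0.8), coverage(intervals, 0.8))
```

In a full study these functions would be applied to the 10,000 replications generated for each parameter combination.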
From Table 7, the biases of the GPV-based method do not differ much from those of the Delta method, and most of them are smaller than those of the empirical bootstrap method. Furthermore, when the mean difference of the reference and placebo groups is equal to 9 and the sample size is less than 120, the GPQ from the GPV-based method has a smaller MSE than the estimators from the Delta method and the empirical bootstrap method. On the other hand, the GPV-based method generally provides coverage probabilities close to the confidence level of 0.95, and its coverage probability is better than that of the other two methods regardless of the sample size. Moreover, when the mean difference of the reference and placebo groups is as large as 20, the ratio of the variance of the reference group to that of the experimental group is 1, and the ratio of the variance of the placebo group to that of the experimental group is 2, the coverage probabilities of the empirical bootstrap method are as good as those of the GPV-based method. Additionally, the coverage probabilities of the Delta method are quite conservative as well.
Under the normality assumption, the required percentiles of the GPQ for $\frac{\theta_E - \theta_P}{\theta_R - \theta_P}$ (our measurement of non-inferiority) cannot be obtained in closed form but may be estimated using a Monte-Carlo algorithm. In addition, if the data are non-normal, we recommend applying the power transformation of Box and Cox [24].
In Wu and Hsieh [5], when conducting non-inferiority test in a three-arm trial, they estimate the sample mean by Searls' estimator (mean with CV) rather than the traditional one (pure sample mean), showing that testing results are better, in terms of empirical sizes and empirical powers.While in our research, different from the traditional three-arm trial, we conduct the non-inferiority test for the means with unknown CVs, and we show that the explicit inclusion of CVs in the measurement

Fig. 1
The power functions of the GPV-based method (GP), the Delta method (Delta) and the empirical bootstrap method (EB). Panel (A) represents the power functions when $\mu_R - \mu_P = 9$ and $n = 60$; Panel (B) represents the power functions when $\mu_R - \mu_P = 9$ and $n = 120$; Panel (C) represents the power functions when $\mu_R - \mu_P = 9$ and $n = 480$; Panel (D) represents the power functions when $\mu_R - \mu_P = 20$ and $n = 60$; Panel (E) represents the power functions when $\mu_R - \mu_P = 20$ and $n = 120$; Panel (F) represents the power functions when $\mu_R - \mu_P = 20$ and $n = 480$. The significance level of the non-inferiority test is set to be 0.05

Table 3
Under gamma distribution, the type I error rates for testing non-inferiority with non-inferiority limit = 0.8 in $\tau_R = 1$, $\mu_R - \mu_P = 9$, 15 and 20, respectively. n, the total sample sizes; GP, the GPV-based method; DM, the Delta method; EB, the empirical bootstrap method

Table 5
Summary statistics for the number of micronuclei per animal and 2000 scored cells for the vehicle control, four doses of hydroquinone and the positive control of 25 mg/kg cyclophosphamide

Table 6
Upper 95% confidence limits for $\frac{\theta_E - \theta_P}{\theta_R - \theta_P}$, based on the positive control of 25 mg/kg cyclophosphamide

Table 7
Under $\mu_R - \mu_P = 9$, 15 and 20, the bias, MSE and CP of the estimates of the non-inferiority limit by the GPV-based, the Delta, and the empirical bootstrap methods. n, the total sample sizes; GP, the GPV-based method; DM, the Delta method; EB, the empirical bootstrap method; MSE, mean square error; CP, coverage probability