Comparison of alternative approaches for difference, noninferiority, and equivalence testing of normal percentiles

Shieh, Gwowen

doi:10.1186/s12874-020-00933-z

Research article
Open access
Published: 13 March 2020

Gwowen Shieh ORCID: orcid.org/0000-0001-8611-4495¹

BMC Medical Research Methodology volume 20, Article number: 59 (2020)

Abstract

Background

Percentiles are widely used in scientific research for determining the comparative magnitude and reference limit of quantitative measurements. The investigations for point and interval estimation of normal percentiles are well documented in the literature. However, the corresponding statistical tests of hypothesis have received relatively little attention.

Methods

To facilitate data analysis and design planning of percentile study, this paper aims to present hypothesis testing procedures and associated power functions for assessing the difference, noninferiority, and equivalence of normal percentiles.

Results

Numerical illustrations about drug dissolution are provided to demonstrate the usefulness of the suggested exact approaches and the deficiency of approximate methods.

Conclusions

The exact approaches are superior to the approximate methods on the basis of control of Type I errors. Computer algorithms are constructed to implement the recommended test procedures and sample size calculations for percentile analysis.

Background

Percentiles are extremely useful for describing the reference threshold and meaningful magnitude of numerical quantities, such as achievement score, developmental index, medical measurement, and physical dimension. The inferential methods for normal means are well documented in the fundamental texts of statistical analysis. However, the methodological aspects and statistical implications of analyzing normal percentiles have been less discussed. It is essential to note that normal percentiles are a linear function of the mean and standard deviation of the underlying population. Because the sample mean and sample variance are complete and sufficient statistics for the population mean and variance, the minimum variance unbiased estimator of a normal percentile can be readily obtained. Specifically, Royston and Mathews [1] compared the minimum variance unbiased estimator and other useful formulas under the intrinsic criteria of bias and mean square error. More advanced and theoretical treatments of normal percentile estimation are also available in Keating, Mason, and Balakrishnan [2], Keating and Tripathi [3], Parrish [4], Rukhin [5], and Zidek [6, 7].

Both exact and approximate confidence intervals of normal percentiles have been considered in several analytical developments. The exact interval estimation of normal percentiles was presented in Meeker, Hahn, and Escobar [8], Johnson, Kotz, and Balakrishnan [9], and Owen [10]. Note that the exact confidence intervals involve the quantiles of a noncentral t distribution. Such critical values are not commonly available in tabulated form, and the implementation necessitates appropriate computing algorithms. To circumvent the reliance on a noncentral t distribution, approximate methods were considered by using the standardization technique and the regular t distribution. Accordingly, the approximate confidence intervals of Bland and Altman [11] and Chakraborti and Li [12] are computationally simple and the interval calculations do not require specialized software. However, the numerical study of Shieh [13] demonstrated that the confidence limits of the approximate methods generally do not preserve the nominal equal-tailed error rates. This finding provides a cautionary counterpoint on the practical value of approximate intervals, especially when the sample sizes are small.

The existing investigations present important inferential methodology for point and interval estimation of normal percentiles. However, the related hypothesis testing problems have not been properly explicated in the literature. It is well known that there exists a direct connection between confidence intervals and hypothesis testing, although the two approaches start from different viewpoints: precision for interval estimation and power for testing. Accordingly, to conduct a significance test for a percentile, the conclusion can alternatively be obtained by examining whether the specified percentile value is contained in the proper two- or one-sided confidence interval. It may therefore appear that percentile analysis can be performed without explicitly defining the test statistics and associated rejection regions. However, power evaluation and sample size planning for hypothesis testing methodologically differ from the precision and sample size considerations in the context of interval estimation. Consequently, it is of theoretical importance and practical interest to document the exact test procedures, power calculations, and sample size determinations for percentile studies.

To enhance the usage of percentile analysis, this article describes hypothesis testing procedures and associated power functions for assessing the difference, noninferiority, and equivalence of normal percentiles. The difference and noninferiority procedures closely follow the two- and one-tailed test formulations. In conventional studies of population means, a null value of zero may be informative for addressing certain essential research questions. The situations associated with percentile assessment are more complicated because the target percentile is unlikely to be zero. The percentile tests for difference and noninferiority require researchers to provide a sensible magnitude that corresponds to the percentile threshold for identifying a substantial research finding. Moreover, the importance of establishing equivalence instead of no difference has been emphasized in Blackwelder [14] and Parkhurst [15], among others. Further details on the design and analysis of noninferiority and equivalence studies can be found in Fleming et al. [16] and Wellek [17].

Notably, the binomial test of hypotheses concerning quantiles in Mood, Graybill, and Boes ([18], Section 11.3.2) provides an appealing nonparametric alternative. Although the procedure is applicable for all random samples from a continuous distribution, there are not many feasible alpha values for small sample sizes, unless randomized tests are used. In general, the nonparametric tests may be more powerful than their parametric counterparts when the normality assumption fails, whereas the nonparametric alternatives are less powerful than the parametric procedures when the conventional assumptions hold. More importantly, the undesirable properties and related problems associated with binomial tests have been addressed in Vos and Hudson [19] and Thulin [20], among others. Comprehensive discussions and reviews of the prevailing Wald large-sample normal test and other alternative interval procedures can be found in Agresti and Coull [21], Newcombe [22], Brown, Cai, and DasGupta [23, 24], and the references therein. The illustrations and appraisals in this article are confined to the test procedures that assume normality of the sampling distribution.

This paper aims to present the exact test procedures for percentile study under the three structural considerations of difference, noninferiority, and equivalence scenarios. To inform the choice of the most appropriate approach, the approximate techniques of Bland and Altman [11] and Chakraborti and Li [12] are also extended to the percentile testing problem. Specifically, Bland and Altman [11] proposed an approximate t distribution for a convenient transformation of the natural, but biased, estimator of the normal percentile. On the other hand, Chakraborti and Li [12] suggested that a standardized minimum variance unbiased estimator also has an approximate t distribution. Note that the simplified considerations proposed in Bland and Altman [11] and Chakraborti and Li [12] may be appealing as computational shortcuts, but they do not necessarily maintain the desired accuracy for all settings, especially when the sample sizes are small. Accordingly, it is essential to discern not only which method is most suitable under what circumstances but also the actual differences between the contending test procedures.

Furthermore, the corresponding power and sample size calculations for advance planning of percentile studies are explicated. A Monte Carlo simulation study was also conducted to compare the accuracy of the exact and approximate procedures with respect to the control of Type I error rate. Although an exact technique is theoretically better than the approximate methods, the actual difference in performance may not be substantial enough to justify adopting an exact approach that is methodologically sophisticated and computationally demanding. The current study provides detailed analytic explications and numerical evidence to reveal the discrepancy between the exact and approximate procedures for percentile analysis. A drug dissolution problem and accompanying software programs are employed to illustrate the usefulness of the suggested procedures for data analysis and design planning.

Methods

Exact test procedures

Assume X₁, …, X_N are a sample from a N(μ, σ²) population with unknown mean μ and variance σ² for N > 1. The 100pth percentile of the normal distribution N(μ, σ²) is denoted by θ, where

$$ \uptheta =\upmu +{\mathrm{z}}_{\mathrm{p}}\upsigma $$

(1)

and z_p is the (100·p)th percentile of the standard normal distribution N(0, 1). An intuitive, but biased, estimator of the percentile θ is

$$ {\hat{\theta}}_B=\overline{X}+{z}_pS, $$

(2)

where $ \overline{X}=\sum \limits_{i=1}^N{X}_i/N $ and $ {S}^2=\sum \limits_{i=1}^N{\left({X}_i-\overline{X}\right)}^2/\left(N-1\right) $ are the sample mean and sample variance, respectively. Accordingly, the minimum variance unbiased estimator is

$$ {\hat{\theta}}_M=\overline{X}+{z}_p cS, $$

(3)

where c = (ν/2)^{1/2}Γ(ν/2)/Γ{(ν + 1)/2} and ν = N – 1. Further details about the point estimation properties of $ {\hat{\theta}}_B $ and $ {\hat{\theta}}_M $ are available in Royston and Mathews [1]. Also, the recent study of Shieh [13] compared several confidence interval procedures of θ. In contrast, the focus here is on the hypothesis testing of normal percentiles.
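As an illustrative sketch (not part of the supplemental SAS/IML programs), the two point estimators in Eqs. 2 and 3 can be computed with the Python standard library alone. The function name `percentile_estimators` is hypothetical, and the inputs are the summary statistics of the dissolution example analyzed later in this article.

```python
import math
from statistics import NormalDist


def percentile_estimators(xbar, s, n, p):
    """Return (z_p, c, theta_B, theta_M) for the 100p-th normal percentile."""
    nu = n - 1
    z_p = NormalDist().inv_cdf(p)
    # c = (nu/2)^(1/2) * Gamma(nu/2) / Gamma((nu+1)/2), evaluated on the log scale
    c = math.sqrt(nu / 2) * math.exp(math.lgamma(nu / 2) - math.lgamma((nu + 1) / 2))
    theta_b = xbar + z_p * s        # Eq. 2: intuitive but biased estimator
    theta_m = xbar + z_p * c * s    # Eq. 3: minimum variance unbiased estimator
    return z_p, c, theta_b, theta_m


z_p, c, theta_b, theta_m = percentile_estimators(xbar=50.10, s=1.31, n=15, p=0.9)
print(f"z_p={z_p:.4f}  c={c:.4f}  theta_B={theta_b:.4f}  theta_M={theta_m:.4f}")
```

Working with `math.lgamma` rather than `math.gamma` keeps the ratio of gamma functions stable for large N, where the individual gamma values would overflow.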

Under the prescribed normal setting for the sample {X₁, …, X_N}, standard derivations show that

$$ {T}_E=\frac{\overline{X}-\theta }{{\left({S}^2/N\right)}^{1/2}}\sim t\left(v,-{z}_p{N}^{1/2}\right), $$

(4)

where t(ν, −z_pN^1/2) is a noncentral t distribution with degrees of freedom ν and noncentrality parameter –z_pN^1/2. The fundamental properties and related extensions of noncentral t distribution can be found in Johnson, Kotz, and Balakrishnan [9].
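The noncentrality parameter in Eq. 4 follows from a short decomposition. As a sketch, write $ Z=\left(\overline{X}-\mu \right)/{\left({\sigma}^2/N\right)}^{1/2}\sim N\left(0,1\right) $ and $ K=\nu {S}^2/{\sigma}^2\sim {\chi}^2\left(\nu \right) $, which are independent under normality. Substituting θ = μ + z_pσ from Eq. 1 and dividing the numerator and denominator by (σ²/N)^{1/2} gives

$$ {T}_E=\frac{\overline{X}-\mu -{z}_p\sigma }{{\left({S}^2/N\right)}^{1/2}}=\frac{Z-{z}_p{N}^{1/2}}{{\left(K/\nu \right)}^{1/2}}, $$

which is the defining form of a noncentral t variable with ν degrees of freedom and noncentrality parameter −z_pN^{1/2}.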

Tests for difference

To detect the magnitude of a percentile in terms of the hypotheses

$$ {\mathrm{H}}_0:\uptheta ={\uptheta}_0\ \mathrm{versus}\ {\mathrm{H}}_1:\uptheta \ne {\uptheta}_0, $$

(5)

the test statistic is of the form

$$ {T}_{E0}=\frac{\overline{X}-{\uptheta}_0}{{\left({S}^2/N\right)}^{1/2}}, $$

(6)

where θ₀ is a constant. The test rejects H₀ at the significance level α if T_E0 < τ_α/2 or T_E0 > τ_{1 − α/2} where τ_α/2 and τ_{1 − α/2} are the lower and upper (100·α/2)th quantiles of the distribution t(ν, −z_pN^1/2), respectively, for 0 < α < 0.5. Accordingly, it can be shown that the power function is of the form

$$ {\varPsi}_{DI}\left(\varDelta \right)=P\left\{t\left(v,\varDelta \right)<{\uptau}_{\upalpha /2}\right\}+P\left\{t\left(v,\varDelta \right)>{\uptau}_{1-\upalpha /2}\right\}, $$

(7)

where Δ = (μ – θ₀)/(σ²/N)^1/2.
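The following Python sketch shows one way the exact difference test and the power function in Eq. 7 could be evaluated. It is an assumption-laden illustration rather than the author's SAS/IML implementation: the noncentral t distribution function is obtained by numerically integrating Φ(tW − ncp) over the distribution of W = (K/ν)^{1/2} with a midpoint rule, whereas in practice a library routine for the noncentral t distribution would normally be preferred. All function names are hypothetical, and the data are the summary statistics of the dissolution example presented later.

```python
import math
from functools import lru_cache
from statistics import NormalDist

PHI = NormalDist().cdf


@lru_cache(maxsize=None)
def w_nodes(nu, grid=800):
    """Midpoint nodes and weights for W = (K/nu)^(1/2) with K ~ chi-square(nu)."""
    w_max = math.sqrt((nu + 10 * math.sqrt(2 * nu) + 50) / nu)
    h = w_max / grid
    log_norm = math.log(2) + (nu / 2) * math.log(nu / 2) - math.lgamma(nu / 2)
    nodes = []
    for i in range(grid):
        w = (i + 0.5) * h
        log_density = log_norm + (nu - 1) * math.log(w) - nu * w * w / 2
        nodes.append((w, h * math.exp(log_density)))
    return tuple(nodes)


def nct_cdf(t, nu, ncp):
    """P{T <= t} for T = (Z + ncp) / W ~ t(nu, ncp), using E[Phi(t*W - ncp)]."""
    return sum(weight * PHI(t * w - ncp) for w, weight in w_nodes(nu))


def nct_quantile(q, nu, ncp):
    """Invert nct_cdf by bracketing and bisection (the CDF increases in t)."""
    lo, hi = ncp - 1.0, ncp + 1.0
    while nct_cdf(lo, nu, ncp) > q:
        lo -= 2 * (hi - lo)
    while nct_cdf(hi, nu, ncp) < q:
        hi += 2 * (hi - lo)
    for _ in range(50):
        mid = (lo + hi) / 2
        if nct_cdf(mid, nu, ncp) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def critical_values(n, p, alpha):
    """Lower and upper alpha/2 quantiles of t(nu, -z_p * N^(1/2))."""
    ncp0 = -NormalDist().inv_cdf(p) * math.sqrt(n)
    return nct_quantile(alpha / 2, n - 1, ncp0), nct_quantile(1 - alpha / 2, n - 1, ncp0)


def exact_difference_test(xbar, s, n, theta0, p, alpha=0.05):
    """Two-sided exact test of H0: theta = theta0 based on T_E0 (Eq. 6)."""
    lower, upper = critical_values(n, p, alpha)
    t_e0 = (xbar - theta0) / (s / math.sqrt(n))
    return t_e0, lower, upper, (t_e0 < lower or t_e0 > upper)


def power_difference(mu, sigma, n, theta0, p, alpha=0.05):
    """Power function Psi_DI of Eq. 7 with Delta = (mu - theta0) / (sigma^2/N)^(1/2)."""
    lower, upper = critical_values(n, p, alpha)
    delta = (mu - theta0) / (sigma / math.sqrt(n))
    return nct_cdf(lower, n - 1, delta) + 1 - nct_cdf(upper, n - 1, delta)


t_e0, lower, upper, reject = exact_difference_test(50.10, 1.31, 15, 50.8379, 0.9)
print(f"T_E0={t_e0:.4f}  critical values=({lower:.4f}, {upper:.4f})  reject={reject}")
```

The printed critical values can be compared with τ_0.025 = −8.7695 and τ_0.975 = −2.7909 reported in the dissolution example. When the true percentile equals θ₀, Δ coincides with the noncentrality parameter of the null distribution, so `power_difference` returns α, which provides a simple internal check.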

Tests for noninferiority

In addition to the regular test of difference, it is of practical importance to test the hypotheses for noninferiority. The problem of testing noninferiority of percentiles can be presented by the following hypotheses:

$$ {\mathrm{H}}_0:\uptheta \le {\uptheta}_0\;\mathrm{versus}\ {\mathrm{H}}_1:\uptheta >{\uptheta}_0 $$

(8)

when larger values of θ are desired and θ₀ is the designated noninferiority threshold. The test procedure rejects the null hypothesis at the significance level α if T_E0 > τ_{1 − α} and the associated power function is readily obtained as

$$ {\varPsi}_{NI}\left(\varDelta \right)=P\left\{t\left(v,\varDelta \right)>{\uptau}_{1-\upalpha}\right\}. $$

(9)

On the other hand, if smaller values of θ are preferred, then the following hypotheses should be adopted for the test of noninferiority:

$$ {\mathrm{H}}_0:\uptheta \ge {\uptheta}_0\;\mathrm{versus}\ {\mathrm{H}}_1:\uptheta <{\uptheta}_0, $$

(10)

where the chosen value θ₀ represents the noninferiority bound. At the significance level α, the rejection region for the lower one-sided test is T_E0 < τ_α and the power function is expressed as

$$ {\varPsi}_{NI}\left(\varDelta \right)=P\left\{t\left(v,\varDelta \right)<{\uptau}_{\upalpha}\right\}. $$

(11)

Tests for equivalence

Unlike the traditional differences-based procedures, equivalence testing provides a proper method for demonstrating the comparability of target percentile. In general, the null and alternative hypotheses of a test of percentile equivalence can be formulated as

$$ {\mathrm{H}}_0:\uptheta -{\uptheta}_T\le -\updelta\;\mathrm{or}\;\uptheta -{\uptheta}_T\ge \updelta\;\mathrm{versus}\ {\mathrm{H}}_1:-\updelta <\uptheta -{\uptheta}_T<\updelta, $$

(12)

where θ_T and δ (> 0) are constants. Accordingly, θ_T is the target value and δ represents the minimum threshold for declaring equivalence between the population percentile θ and θ_T. Following the two one-sided tests procedure proposed by Schuirmann [25] and Westlake [26] for assessing equivalence of mean effects, the null hypothesis is rejected at the significance level α if

$$ {T}_{EL}=\frac{\overline{X}-{\uptheta}_T+\updelta}{{\left({S}^2/N\right)}^{1/2}}>{\uptau}_{1-\upalpha}\;\mathrm{and}\;{T}_{EU}=\frac{\overline{X}-{\uptheta}_T-\updelta}{{\left({S}^2/N\right)}^{1/2}}<{\uptau}_{\upalpha}. $$

(13)

It is important to note that the rejection region is an intersection of two one-sided segments defined by the lower and upper (100·α)th quantiles τ_α and τ_{1 − α} of the noncentral t distribution t(ν, −z_pN^{1/2}). The rejection region in terms of $ \overline{X} $ and S²/N has an isosceles triangular shape similar to those in Meyners [27] and Schuirmann [28] for the equivalence procedure of two treatment means. Consequently, the power function of the percentile equivalence test can be written as

$$ {\varPsi}_{EQ}=P\left\{{\uptheta}_T-\updelta +{\uptau}_{1-\upalpha}{\left({S}^2/N\right)}^{1/2}<\overline{X}<{\uptheta}_T+\updelta +{\uptau}_{\upalpha}{\left({S}^2/N\right)}^{1/2}\right\}. $$

(14)

Moreover, it is clear from the fundamental normal assumption that $ Z=\left(\overline{X}-\upmu \right)/{\left({\upsigma}^2/N\right)}^{1/2}\sim N\left(0,1\right) $ and K = νS²/σ² ~ χ²(ν), where χ²(ν) denotes the chi-square distribution with ν = N – 1 degrees of freedom, and Z and K are independent. Let H_E = 1 if K < κ_E, and H_E = 0 if K ≥ κ_E where κ_E = (4νNδ²)/{σ²(τ_{1 − α} − τ_α)²}. Then, the exact power function can be expressed by

$$ {\varPsi}_{EQ}={E}_K\left[{H}_E\left\{\varPhi \left({U}_E\right)-\varPhi \left({L}_E\right)\right\}\right], $$

(15)

where U_E = (θ_T + δ − μ)/(σ²/N)^{1/2} + τ_α(K/ν)^{1/2}, L_E = (θ_T − δ − μ)/(σ²/N)^{1/2} + τ_{1 − α}(K/ν)^{1/2}, Φ(⋅) is the cumulative distribution function of the standard normal distribution, and the expectation E_K is taken with respect to the distribution of K. It is essential to note that the probability P{K ≥ κ_E} ≐ 0 in the subsequent numerical assessments under a wide range of model configurations. This phenomenon is similar to the power computations for the equivalence procedure of two treatment means as noted in Siqueira et al. [29] and Shieh [30]. Therefore, the exact power appraisal can be numerically approximated by

$$ {\varPsi}_{AEQ}=P\;\left\{t\left(v,{\varDelta}_U\right)<{\uptau}_{\upalpha}\right\}-P\left\{t\left(v,{\varDelta}_L\right)<{\uptau}_{1-\upalpha}\right\}, $$

(16)

where Δ_U = (μ – θ_T – δ)/(σ²/N)^1/2 and Δ_L = (μ – θ_T + δ)/(σ²/N)^1/2.
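To make the relation between Eqs. 15 and 16 concrete, the sketch below evaluates both as expectations over W = (K/ν)^{1/2} with a midpoint rule. Note that P{t(ν, Δ_U) < τ_α} = E_K[Φ(U_E)] and P{t(ν, Δ_L) < τ_{1 − α}} = E_K[Φ(L_E)], so Eq. 16 is Eq. 15 without the indicator H_E. The function name `equivalence_power` and the integration grid are assumptions for illustration. The two quantiles are supplied by the caller, and the values used here are those quoted for t(14, −4.9636) in the dissolution example.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def equivalence_power(mu, sigma, n, theta_t, delta, tau_lo, tau_hi, grid=4000):
    """Return (Psi_EQ, Psi_AEQ) for tau_lo = tau_alpha and tau_hi = tau_{1-alpha}."""
    nu = n - 1
    se = sigma / math.sqrt(n)
    a_upper = (theta_t + delta - mu) / se    # constant part of U_E
    a_lower = (theta_t - delta - mu) / se    # constant part of L_E
    kappa = 4 * nu * n * delta**2 / (sigma**2 * (tau_hi - tau_lo) ** 2)
    w_max = math.sqrt((nu + 10 * math.sqrt(2 * nu) + 50) / nu)
    h = w_max / grid
    log_norm = math.log(2) + (nu / 2) * math.log(nu / 2) - math.lgamma(nu / 2)
    psi_eq = psi_aeq = 0.0
    for i in range(grid):
        w = (i + 0.5) * h
        weight = h * math.exp(log_norm + (nu - 1) * math.log(w) - nu * w * w / 2)
        gap = PHI(a_upper + tau_lo * w) - PHI(a_lower + tau_hi * w)
        psi_aeq += weight * gap              # Eq. 16: indicator omitted
        if nu * w * w < kappa:               # H_E = 1 only when K < kappa_E
            psi_eq += weight * gap           # Eq. 15
    return psi_eq, psi_aeq


psi_eq, psi_aeq = equivalence_power(
    mu=50.1, sigma=1.31, n=15, theta_t=51.6660, delta=1.2,
    tau_lo=-8.0108, tau_hi=-3.1072,
)
print(f"Psi_EQ={psi_eq:.6f}  Psi_AEQ={psi_aeq:.6f}  gap={psi_eq - psi_aeq:.2e}")
```

Because Φ(U_E) − Φ(L_E) is negative whenever K ≥ κ_E, the approximation in Eq. 16 can only understate the exact power, and the shortfall is bounded by P{K ≥ κ_E}.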

Approximate methods

For the purpose of method comparisons, two different approaches for testing normal percentiles are also presented next. To construct confidence intervals of normal percentiles, Bland and Altman [11] and Chakraborti and Li [12] considered simple t approximations for the standardized forms of $ {\hat{\theta}}_B $ and $ {\hat{\theta}}_M, $ respectively. Their methods are extended and examined here for the three types of difference, noninferiority, and equivalence testing.

The Chakraborti-Li method

In view of the desirable properties of the minimum variance unbiased estimator $ {\hat{\theta}}_M, $ Chakraborti and Li [12] suggested an approximate t distribution for the standardized quantity of $ {\hat{\theta}}_M $:

$$ {T}_M=\frac{{\hat{\theta}}_M-\theta}{{\left(m{S}^2/N\right)}^{1/2}}\dot{\sim} t(v), $$

(17)

where $ m=1+N{z}_p^2\left({c}^2-1\right) $ and t(ν) is a t distribution with degrees of freedom ν. Note that $ Var\left[{\hat{\theta}}_M\right] $ = (mσ²)/N and the denominator of T_M is obtained by a direct substitution of σ² with S² in the standard deviation of $ {\hat{\theta}}_M $.
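The variance expression can be verified in one line. As a sketch, it uses only the independence of $ \overline{X} $ and S under normality together with E[S] = σ/c (which is what makes cS unbiased for σ) and E[S²] = σ²:

$$ Var\left[{\hat{\theta}}_M\right]= Var\left[\overline{X}\right]+{z}_p^2{c}^2 Var\left[S\right]=\frac{\sigma^2}{N}+{z}_p^2{c}^2\left({\sigma}^2-\frac{\sigma^2}{c^2}\right)=\frac{\sigma^2}{N}\left\{1+N{z}_p^2\left({c}^2-1\right)\right\}=\frac{m{\sigma}^2}{N}. $$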

The simple formulation of T_M provides an alternative test statistic for judging the magnitude of normal percentiles. For the hypothesis test of difference in terms of H₀: θ = θ₀ versus H₁: θ ≠ θ₀, the null hypothesis can be rejected at the significance level α if T_M0 < t_α/2 or T_M0 > t_{1 − α/2}, or equivalently ∣T_M0 ∣ > t_{1 − α/2}, where

$$ {T}_{M0}=\frac{{\hat{\theta}}_M-{\uptheta}_0}{{\left(m{S}^2/N\right)}^{1/2}}, $$

(18)

and t_α/2 and t_{1 − α/2} are the lower and upper 100(α/2)th quantiles of a t distribution t(ν) with degrees of freedom ν, respectively. Under the approximate t assumption, the corresponding power function can be derived as

$$ {\varOmega}_{DI}\left(\varDelta \right)=P\left\{t\left(v,\varDelta \right)<{t}_{\upalpha /2}{m}^{1/2}-{z}_pc{N}^{1/2}\right\}+P\left\{t\left(v,\varDelta \right)>{t}_{1-\upalpha /2}{m}^{1/2}-{z}_pc{N}^{1/2}\right\}. $$

(19)

Similarly, the test statistic T_M0 can be applied for hypothesis testing of noninferiority of percentiles in terms of H₀: θ ≤ θ₀ versus H₁: θ > θ₀. The test procedure rejects the null hypothesis at the significance level α if T_M0 > t_{1 − α} and the associated power function is

$$ {\varOmega}_{NI}\left(\varDelta \right)=P\left\{t\left(v,\varDelta \right)>{t}_{1-\upalpha}{m}^{1/2}-{z}_pc{N}^{1/2}\right\}. $$

(20)

Moreover, under the hypotheses H₀: θ ≥ θ₀ versus H₁: θ < θ₀, the null hypothesis is rejected if T_M0 < t_α and the corresponding power is given by

$$ {\varOmega}_{NI}\left(\varDelta \right)=P\left\{t\left(v,\varDelta \right)<{t}_{\upalpha}{m}^{1/2}-{z}_pc{N}^{1/2}\right\}. $$

(21)

For the case of evaluating percentile equivalence with respect to H₀: θ – θ_T ≤ −δ or θ – θ_T ≥ δ versus H₁: –δ < θ – θ_T < δ, the null hypothesis is rejected at the significance level α if

$$ {T}_{ML}=\frac{{\hat{\theta}}_M-{\uptheta}_T+\updelta}{{\left(m{S}^2/N\right)}^{1/2}}>{t}_{1-\upalpha}\kern0.15em \mathrm{and}\;{T}_{MU}=\frac{{\hat{\theta}}_M-{\uptheta}_T-\updelta}{{\left(m{S}^2/N\right)}^{1/2}}<{t}_{\upalpha}. $$

(22)

Accordingly, the power function can be shown as

$$ {\varOmega}_{EQ}={E}_K\left[{H}_M\left\{\varPhi \left({U}_M\right)-\varPhi \left({L}_M\right)\right\}\right], $$

(23)

where U_M = (θ_T + δ – μ)/(σ²/N)^{1/2} + (t_α m^{1/2} – z_p c N^{1/2})(K/ν)^{1/2}, L_M = (θ_T – δ – μ)/(σ²/N)^{1/2} + (t_{1 − α} m^{1/2} – z_p c N^{1/2})(K/ν)^{1/2}, and H_M = 1 if K < κ_M, and H_M = 0 if K ≥ κ_M where $ {\kappa}_M=\left( vN{\updelta}^2\right)/\left\{m{\upsigma}^2{t}_{1-\upalpha}^2\right\} $. Numerically, the power calculation can be simplified just as Ψ_AEQ given above:

$$ {\varOmega}_{AEQ}=P\left\{t\left(v,{\varDelta}_U\right)<{t}_{\upalpha}{m}^{1/2}-{z}_pc{N}^{1/2}\right\}-P\left\{t\left(v,{\varDelta}_L\right)<{t}_{1-\upalpha}{m}^{1/2}-{z}_pc{N}^{1/2}\right\}. $$

(24)

The Bland-Altman method

Similar to the test procedures based on the minimum variance unbiased estimator, hypothesis testing of normal percentiles can be conducted with the following transformation of $ {\hat{\theta}}_B $ in Bland and Altman [11]:

$$ {T}_B=\frac{{\hat{\theta}}_B-\theta}{{\left(b{S}^2/N\right)}^{1/2}}\dot{\sim} t(v), $$

(25)

where $ b=1+{z}_p^2/2 $. Specifically, for the test of percentile difference in terms of H₀: θ = θ₀ versus H₁: θ ≠ θ₀, the null hypothesis is rejected at the significance level α if ∣T_B0∣ > t_{1 − α/2} where

$$ {T}_{B0}=\frac{{\hat{\theta}}_B-{\uptheta}_0}{{\left(b{S}^2/N\right)}^{1/2}}. $$

(26)

The associated power function is of the form

$$ {\varXi}_{DI}\left(\varDelta \right)=P\left\{t\left(v,\varDelta \right)<{t}_{\upalpha /2}{b}^{1/2}-{z}_p{N}^{1/2}\right\}+P\left\{t\left(v,\varDelta \right)>{t}_{1-\upalpha /2}{b}^{1/2}-{z}_p{N}^{1/2}\right\}. $$

(27)

To perform the hypothesis testing of noninferiority with H₀: θ ≤ θ₀ versus H₁: θ > θ₀, the test rejects the null hypothesis at the significance level α if T_B0 > t_{1 − α} and the power function is readily obtained as

$$ {\varXi}_{NI}\left(\varDelta \right)=P\left\{t\left(v,\varDelta \right)>{t}_{1-\upalpha}{b}^{1/2}-{z}_p{N}^{1/2}\right\}. $$

(28)

Likewise, under the hypotheses H₀: θ ≥ θ₀ versus H₁: θ < θ₀, the null hypothesis is rejected if T_B0 < t_α and the corresponding power is expressed as

$$ {\varXi}_{NI}\left(\varDelta \right)=P\left\{t\left(v,\varDelta \right)<{t}_{\upalpha}{b}^{1/2}-{z}_p{N}^{1/2}\right\}. $$

(29)

Moreover, for the equivalence test of normal percentiles under the hypotheses of H₀: θ – θ_T ≤ −δ or θ – θ_T ≥ δ versus H₁: –δ < θ – θ_T < δ, the null hypothesis is rejected at the significance level α if

$$ {T}_{BL}=\frac{{\hat{\theta}}_B-{\uptheta}_T+\updelta}{{\left(b{S}^2/N\right)}^{1/2}}>{t}_{1-\upalpha}\kern0.15em \mathrm{and}\;{T}_{BU}=\frac{{\hat{\theta}}_B-{\uptheta}_T-\updelta}{{\left(b{S}^2/N\right)}^{1/2}}<{t}_{\upalpha}. $$

(30)

In this case, the power function has the following formulation:

$$ {\varXi}_{EQ}={E}_K\left[{H}_B\left\{\varPhi \left({U}_B\right)-\varPhi \left({L}_B\right)\right\}\right], $$

(31)

where U_B = (θ_T + δ – μ)/(σ²/N)^{1/2} + (t_α b^{1/2} – z_p N^{1/2})(K/ν)^{1/2}, L_B = (θ_T – δ – μ)/(σ²/N)^{1/2} + (t_{1 − α} b^{1/2} – z_p N^{1/2})(K/ν)^{1/2}, and H_B = 1 if K < κ_B, and H_B = 0 if K ≥ κ_B where $ {\kappa}_B=\left( vN{\updelta}^2\right)/\left\{b{\upsigma}^2{t}_{1-\upalpha}^2\right\} $. Similar to the other two cases, the power computation can be well approximated by

$$ {\varXi}_{AEQ}=P\left\{t\left(v,{\varDelta}_U\right)<{t}_{\upalpha}{b}^{1/2}-{z}_p{N}^{1/2}\right\}-P\left\{t\left(v,{\varDelta}_L\right)<{t}_{1-\upalpha}{b}^{1/2}-{z}_p{N}^{1/2}\right\}. $$

(32)

Results

Numerical investigations are presented next to examine and compare the fundamental features of the exact and approximate test procedures of percentiles with respect to the control of Type I error rate and accuracy of power and sample size computation.

Tests for difference

For the purpose of illustration, the null $ N\left({\upmu}_0,{\upsigma}_0^2\right) $ distribution is set as N(0, 1) and two different mean values are considered for the alternative distribution N(μ, σ²): N(0.4, 1) and N(0.6, 1). The corresponding percentiles θ₀ and θ simplify to θ₀ = μ₀ + z_pσ₀ = z_p and θ = μ + z_pσ = μ + z_p, respectively, with μ = 0.4 and 0.6. For the difference test of percentile in terms of H₀: θ = θ₀ versus H₁: θ ≠ θ₀, the sample sizes needed to attain the specified power 0.80 for the chosen significance level α = 0.05 are determined by the power functions Ψ_DI, Ω_DI, and Ξ_DI for p = 0.1, …, 0.9. The computed sample sizes for the three procedures {T_E0, T_M0, T_B0} are summarized in Table 1 for all eighteen combined cases of μ and p. It should be noted that the parameter settings are chosen so that the resulting sample sizes have a reasonable magnitude that often occurs in practice. Moreover, these situations with small and moderate sample sizes are of great importance in the sense that the contending procedures have the obvious potential of yielding distinct outcomes. Monte Carlo simulation studies of 10,000 iterations were conducted to examine the accuracy of the power functions Ψ_DI, Ω_DI, and Ξ_DI. The results reveal that the simulated powers and the attained powers of all three methods agree to the second decimal place for all cases considered here. To save space, the details are not reported.

Table 1 The error between simulated alpha and nominal alpha for the difference tests of percentile H₀: θ = θ₀ versus H₁: θ ≠ θ₀ with μ₀ = 0, σ₀ = 1, σ = 1, and α = 0.05

Due to the approximate nature of the t distribution associated with the two approximations of Chakraborti and Li [12] and Bland and Altman [11], it is of statistical concern to validate the control of the Type I error rates. Note that the sampling distribution of the percentile estimator is skewed when the sample size is small and p deviates considerably from 0.5. This implies that the symmetric t approximation of the two test statistics T_M0 and T_B0 is presumably unsuitable. In other words, the two critical values t_α/2 and t_{1 − α/2} are theoretically inaccurate when the one-sided rejection probabilities are evaluated. It is therefore constructive to examine three distinct Type I error rates corresponding to the lower-tail, upper-tail, and two-sided rejection regions of the difference tests of percentile.

Accordingly, Monte Carlo simulation studies were also performed to compute the simulated Type I error rates of the exact and approximate test procedures for θ = θ₀ or μ = 0. The simulated Type I error rate was the proportion of the 10,000 replicates whose test statistic fell in the designated rejection region. In the process, the estimates of the lower-tail and upper-tail rejection rates were computed and summed as the overall or two-sided simulated Type I error rate. The accuracy of the control of Type I error rate can be assessed by the differences between the one-sided and two-sided simulation estimates and the nominal values 0.025 and 0.05, respectively. These differences or errors of the three contending test procedures are also reported in Table 1. It can be readily seen from the results in Table 1 that all three test methods have excellent control of the two-sided Type I error rate. The absolute magnitudes of the errors are less than 0.01 for the investigated mean and percentile configurations.

Moreover, the lower-tail and upper-tail rejection rates of the exact approach are also very close to the nominal levels. But the one-sided Type I error rates of the two approximate methods do not maintain the same accuracy, especially for low and high percentiles. Despite the desired performance of the approximate tests in overall Type I error rate, the resulting errors of the lower-tail rejection region tend to be negative for small p while those associated with large p are consistently positive. In contrast, the upper-tail errors show exactly the opposite pattern. For the particular case with μ = 0.6 and p = 0.9, the induced errors for the approximation of Chakraborti and Li [12] are 0.0201 and −0.0174 for the lower and upper rejection regions, respectively. The corresponding deviation percentages are 0.0201/0.025 = 80.4% and 0.0174/0.025 = 69.6%. For the approximate method of Bland and Altman [11], the lower-tail and upper-tail errors are 0.0248 and −0.0182 with the deviation percentages 0.0248/0.025 = 99.2% and 0.0182/0.025 = 72.8%, respectively.
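The tail behavior described above can be explored with a small Monte Carlo sketch. The configuration below is an assumption chosen for convenience and is not one of the Table 1 settings: N = 15 and p = 0.9 are used so that the critical values quoted in the dissolution example of this article (τ_0.025 = −8.7695 and τ_0.975 = −2.7909 for t(14, −4.9636), and t_0.975 = 2.1448) can be reused without a noncentral t routine. The seed and variable names are arbitrary.

```python
import math
import random
from statistics import NormalDist

N, P, REPS = 15, 0.9, 10_000
nu = N - 1
z_p = NormalDist().inv_cdf(P)
theta0 = z_p                        # 90th percentile of the null N(0, 1) population
c = math.sqrt(nu / 2) * math.exp(math.lgamma(nu / 2) - math.lgamma((nu + 1) / 2))
root_m = math.sqrt(1 + N * z_p**2 * (c**2 - 1))
root_b = math.sqrt(1 + z_p**2 / 2)

# Critical values quoted in the article's example for N = 15 and p = 0.9
TAU_LO, TAU_HI = -8.7695, -2.7909   # 2.5% and 97.5% quantiles of t(14, -4.9636)
T_CRIT = 2.1448                     # 97.5% quantile of the central t(14)

rng = random.Random(2020)           # arbitrary seed, for reproducibility only
tails = {"exact": [0, 0], "Chakraborti-Li": [0, 0], "Bland-Altman": [0, 0]}
for _ in range(REPS):
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    xbar = sum(x) / N
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / nu)
    se = s / math.sqrt(N)
    stats = {
        "exact": ((xbar - theta0) / se, TAU_LO, TAU_HI),
        "Chakraborti-Li": ((xbar + z_p * c * s - theta0) / (root_m * se), -T_CRIT, T_CRIT),
        "Bland-Altman": ((xbar + z_p * s - theta0) / (root_b * se), -T_CRIT, T_CRIT),
    }
    for name, (value, lo, hi) in stats.items():
        tails[name][0] += int(value < lo)   # lower-tail rejection
        tails[name][1] += int(value > hi)   # upper-tail rejection

rates = {name: (lo / REPS, hi / REPS) for name, (lo, hi) in tails.items()}
for name, (lo, hi) in rates.items():
    print(f"{name:15s} lower={lo:.4f}  upper={hi:.4f}  two-sided={lo + hi:.4f}")
```

Each replicate is generated under the null hypothesis, so every rejection counts as a Type I error. Comparing the lower-tail and upper-tail rates with the nominal 0.025 shows the direction of the imbalance for a high percentile.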

Tests for noninferiority

The underlying characteristics of the exact and approximate methods for the noninferiority test of percentile are also assessed. With the same model formulations as in the previous scenario of the difference test, the required sample sizes are computed for the hypotheses H₀: θ ≤ θ₀ versus H₁: θ > θ₀ with the power functions Ψ_NI, Ω_NI, and Ξ_NI. As expected, the sample sizes reported in Table 2 are smaller than their counterparts in Table 1 for identical values of μ and p. Moreover, simulation studies were also performed to appraise the actual Type I error rates for θ = θ₀ or μ = 0. The errors between the simulated rejection rates and the nominal value α = 0.05 are presented in Table 2. Unlike the exact procedure, which has good control of the Type I error rate, the two approximate tests do not maintain the required performance. Specifically, when μ = 0.6 and p = 0.9, the absolute errors (absolute error percentages) can be as large as 0.0258 (0.0258/0.05 = 51.6%) and 0.0260 (0.0260/0.05 = 52.0%) for T_M0 and T_B0 of Chakraborti and Li [12] and Bland and Altman [11], respectively. Although the situation improves with increasing sample size, as in the cases with μ = 0.4, the approximate tests still suffer from some deficiency and are outperformed by the exact test.

Table 2 The error between simulated alpha and nominal alpha for the noninferiority tests of percentile H₀: θ ≤ θ₀ versus H₁: θ > θ₀ with μ₀ = 0, σ₀ = 1, σ = 1, and α = 0.05

Tests for equivalence

For the sake of completeness, numerical examination is extended to the equivalence tests of percentile in terms of H₀: θ – θ_T ≤ −δ or θ – θ_T ≥ δ versus H₁: –δ < θ – θ_T < δ. In this case, the target percentile and threshold are set as θ_T = z_p and δ = 0.6, respectively. The alternative normal distribution is selected as N(μ, 1) and the associated percentile is θ = μ + z_pσ = μ + z_p. Then, the power functions Ψ_AEQ, Ω_AEQ, and Ξ_AEQ are applied to compute the minimum sample sizes required for attaining the nominal power 0.80 at α = 0.05. The resulting sample sizes are listed in Table 3 for μ = 0 and 0.3 and p = 0.1, …, 0.9. It was further verified with simulation studies that the power and sample size calculations of the three procedures are all extremely accurate for all eighteen cases reported here. However, power evaluation is valid and informative only when the critical value satisfies the nominal Type I error rate. Additional simulation studies were employed to assess the control of Type I error rates of the equivalence tests {T_EL, T_EU}, {T_ML, T_MU}, and {T_BL, T_BU} for θ = θ_T – δ or μ = −δ = −0.6. The errors between the simulated and nominal Type I error rates are presented in Table 3. The assessments show that the two approximate tests {T_ML, T_MU} and {T_BL, T_BU} are not as good as the exact procedure {T_EL, T_EU}. The deficiency of the two simple t distributions is particularly prominent when the sample size is small and |p – 0.5| is large.

Table 3 The error between simulated alpha and nominal alpha for the equivalence tests of percentile H₀: θ – θ_T ≤ −δ or θ – θ_T ≥ δ versus H₁: −δ < θ – θ_T < δ with δ = 0.6, θ_T = z_p, σ = 1, and α = 0.05

An example

To demonstrate the usefulness of the suggested techniques and accompanying programs, a quality control application in pharmaceutical products is exemplified and analyzed with the hypothesis testing and sample size procedures. Suppose a sample of the selected batch of tablets is obtained and tested according to the acceptance sampling plan. Specifically, the dissolution performance is assessed in terms of the percentage of tablets dissolved less than a specified amount at a certain time period.

For illustration, the summary statistics of the dissolution values are $ \overline{X}=50.10 $ and S = 1.31 for N = 15. Suppose that the experimenter is interested in the 90th percentile of the distribution of the dissolution quantity. Then, it follows that z_0.9 = 1.2816, c = 1.0180, and the minimum variance unbiased estimator is $ {\hat{\theta}}_M $ = 51.8091. Using the working settings μ₀ = 49.3 and σ₀ = 1.2, the associated 90th percentile value is computed as θ₀ = μ₀ + z_pσ₀ = 50.8379. The test statistic T_E0 has a value of −2.1816. For testing the hypotheses H₀: θ = 50.8379 versus H₁: θ ≠ 50.8379, the two critical values, namely the lower and upper 2.5th percentiles of the noncentral t distribution t(14, −4.9636), are τ_0.025 = −8.7695 and τ_0.975 = −2.7909, respectively. Because T_E0 = −2.1816 exceeds the upper critical value τ_0.975 = −2.7909, the null hypothesis is rejected, which implies that the 90th percentile of dissolution amount is not 50.8379 at α = 0.05. Also, it can be shown that the values of the two approximate test statistics for Chakraborti and Li [12] and Bland and Altman [11] are T_M0 = 2.0857 and T_B0 = 2.0614, respectively, with the critical value t_0.975 = 2.1448. Both statistics fall below this critical value; hence, the two approximate tests suggest that the null hypothesis cannot be rejected for α = 0.05.
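The statistics in this paragraph can be reproduced from the summary data alone. The short Python sketch below is illustrative and independent of the supplemental SAS/IML programs. It takes the critical values as quoted in the text rather than computing them, and the variable names are arbitrary.

```python
import math
from statistics import NormalDist

xbar, s, n, p = 50.10, 1.31, 15, 0.9   # summary statistics of the example
theta0 = 50.8379                        # null value of the 90th percentile
nu = n - 1
z_p = NormalDist().inv_cdf(p)
c = math.sqrt(nu / 2) * math.exp(math.lgamma(nu / 2) - math.lgamma((nu + 1) / 2))
m = 1 + n * z_p**2 * (c**2 - 1)         # variance factor of the Chakraborti-Li statistic
b = 1 + z_p**2 / 2                      # variance factor of the Bland-Altman statistic
se = s / math.sqrt(n)

t_e0 = (xbar - theta0) / se                                   # Eq. 6
t_m0 = (xbar + z_p * c * s - theta0) / (math.sqrt(m) * se)    # Eq. 18
t_b0 = (xbar + z_p * s - theta0) / (math.sqrt(b) * se)        # Eq. 26

# Critical values as quoted in the text
tau_lower, tau_upper = -8.7695, -2.7909   # quantiles of t(14, -4.9636)
t_crit = 2.1448                           # t_{0.975} with 14 degrees of freedom

reject_exact = t_e0 < tau_lower or t_e0 > tau_upper
reject_cl = abs(t_m0) > t_crit
reject_ba = abs(t_b0) > t_crit
print(f"T_E0={t_e0:.4f} reject={reject_exact}")
print(f"T_M0={t_m0:.4f} reject={reject_cl}")
print(f"T_B0={t_b0:.4f} reject={reject_ba}")
```

The exact statistic is compared with asymmetric critical values, whereas both approximate statistics are compared with the symmetric pair ±t_0.975, which is why the three procedures can disagree on the same data.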

Moreover, a noninferiority test can be formed as H₀: θ ≤ 50.8379 versus H₁: θ > 50.8379. The test statistic T_E0 = −2.1816 exceeds the critical value τ_0.95 = −3.1072, which indicates that the 90th percentile of dissolution distribution is higher than 50.8379 at the 5% level of significance. In this case, the critical value for the two approximate methods is t_0.95 = 1.7613. Thus, the two approximate tests lead to the same result as the exact procedure. Assume an equivalence test of the 90th percentile is expressed as H₀: θ – θ_T ≤ −δ or θ – θ_T ≥ δ versus H₁: –δ < θ – θ_T < δ with θ_T = 50 + (1.2816)(1.3) = 51.6660 and δ = 1.2. The test statistics are {T_EL, T_EU} = {−1.0821, −8.1776} and the corresponding critical values are {τ_0.95, τ_0.05} = {−3.1072, −8.0108}. Because T_EL > τ_0.95 and T_EU < τ_0.05, there is sufficient evidence to suggest that the 90th percentile is practically 51.6660 at the 5% level of significance. The resulting values for the approximate tests are {T_ML, T_MU} = {2.8845, −2.2700} and {T_BL, T_BU} = {2.8761, −2.3817}. With the associated critical values t_0.95 = 1.7613 and t_0.05 = −1.7613, they also reach the same equivalence outcome. Supplemental computer programs are provided to take advantage of the embedded statistical functions in the interactive matrix language software of Statistical Analysis System (SAS/IML) [31] for performing the prescribed exact test procedures.
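The noninferiority and equivalence decisions can be retraced in the same way. The sketch below is an illustration, not the supplemental SAS/IML code. It assumes the statistic √N(X̄ − θ_ref)/S evaluated at θ₀ for noninferiority and at θ_T − δ and θ_T + δ for the two one-sided equivalence tests, which reproduces the reported {T_EL, T_EU}. The critical values −3.1072 and −8.0108 are copied from the text, with −3.1072 used as the upper 5% critical value in both tests. The helper name `exact_stat` is hypothetical.

```python
import math

# Summary statistics quoted in the example.
n, xbar, s = 15, 50.10, 1.31

def exact_stat(theta_ref):
    """Assumed exact statistic sqrt(N) * (Xbar - theta_ref) / S."""
    return math.sqrt(n) * (xbar - theta_ref) / s

# Critical values of t(14, -4.9636) as reported in the text (not recomputed).
tau_95, tau_05 = -3.1072, -8.0108

# Noninferiority: reject H0 (theta <= theta0) when the statistic is large.
t_ni = exact_stat(50.8379)
noninferior = t_ni > tau_95

# Equivalence: both one-sided tests must reject.
theta_t, delta = 51.6660, 1.2
t_el = exact_stat(theta_t - delta)   # tests theta - theta_T <= -delta
t_eu = exact_stat(theta_t + delta)   # tests theta - theta_T >= delta
equivalent = t_el > tau_95 and t_eu < tau_05

print(f"T_NI = {t_ni:.4f}, noninferiority shown: {noninferior}")
print(f"T_EL = {t_el:.4f}, T_EU = {t_eu:.4f}, equivalence shown: {equivalent}")
```

Equivalence is declared only when both one-sided comparisons reject, which is why T_EU, although close to τ_0.05, still has to fall below it.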

For planning a future drug dissolution study, sample size calculations should be considered so that the tests have enough power to confirm a meaningful magnitude of the percentile. It is commonly assumed that typical sources like published findings or expert opinions can offer plausible and reasonable values for the vital characteristics of a future study. Hence, the summary statistics of the sample are used as the parameter values μ = 50.1 and σ = 1.31. To achieve the nominal power 0.80 with α = 0.05, the constructed SAS/IML programs reveal that the required sample sizes are N = 21 and 17 for the test of difference: H₀: θ = 50.8379 versus H₁: θ ≠ 50.8379 and the test of noninferiority: H₀: θ ≤ 50.8379 versus H₁: θ > 50.8379, respectively. Moreover, for the abovementioned test of equivalence with θ_T = 51.6660 and δ = 1.2, sample size N = 25 is needed to attain the nominal power 0.80 at α = 0.05. Note that the exemplifying configurations are included in the user specifications of the SAS/IML programs presented in the supplemental files. Accordingly, users can easily modify the input values in these statements to accommodate their own model specifications.

Discussion

The present investigation generalizes and expands current results in the statistical literature by describing both exact and approximate procedures for the three different percentile tests of difference, noninferiority, and equivalence. The exact approach employs a noncentral t distribution, while the approximate techniques follow the familiar t distribution as considered in Chakraborti and Li [12] and Bland and Altman [11]. Regarding the two approximate procedures, the results of the conventional tests for difference show that the lower critical value t_{α/2} is generally too small for lower normal percentiles and is typically too large for the higher normal percentiles. On the other hand, the upper critical value t_{1 − α/2} overestimates and underestimates the correct one for small and large p, respectively. Even if the overall Type I error is not an issue, it is statistically improper to recommend a two-sided test procedure on the basis of a combination of some noticeable under- and over-sized critical values and rejection regions. Moreover, despite the relatively involved analytic assessments and computational requirements, the comprehensive numerical appraisals show that the exact approach is superior to the approximate methods on the basis of control of Type I errors.
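The Type I error control of the exact test of difference can be checked with a small Monte Carlo sketch. It is an illustration only and does not reproduce the simulation design behind the reported tables. It assumes the statistic √N(X̄ − θ₀)/S, draws samples of size N = 15 from N(49.3, 1.2²) so that the 90th percentile equals the null value θ₀ = 50.8379, and uses the critical values τ_0.025 = −8.7695 and τ_0.975 = −2.7909 reported in the example. The function name, seed, and number of replications are arbitrary choices.

```python
import math
import random

def simulate_type1(n=15, mu=49.3, sigma=1.2, theta0=50.8379,
                   lower=-8.7695, upper=-2.7909, reps=20000, seed=1):
    """Estimate the rejection rate of the exact two-sided test under H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(x) / n
        # Sample standard deviation with divisor N - 1.
        sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
        t = math.sqrt(n) * (mean - theta0) / sd
        if t < lower or t > upper:
            rejections += 1
    return rejections / reps

rate = simulate_type1()
print(f"simulated Type I error rate: {rate:.4f} (nominal 0.05)")
```

Under these assumptions the simulated rate should fall close to the nominal 0.05, up to Monte Carlo error of roughly ±0.003 for 20,000 replications.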

Conclusion

In view of the conceptual simplicity and context-free feature, percentiles are widely used for determining the relative magnitude and substantial importance of quantitative measurements in all scientific fields. Accordingly, much of the literature has provided the inferential procedures for point and interval estimation of normal percentiles. To extend the applicability of percentile analysis, this article addresses the hypothesis testing problem for the percentiles of a normal distribution. The recommended test procedures and derived power functions are also empirically justified for percentile score assessments and sample size determinations. In order to facilitate data analysis and study planning, specialized computer programs are presented for conducting hypothesis testing and sample size calculation in percentile research.

Availability of data and materials

All data generated or analysed during this study are included in this published article.

Abbreviations

SAS/IML: The interactive matrix language software of Statistical Analysis System

References

1. Royston P, Matthews JNS. Estimation of reference ranges from normal samples. Stat Med. 1991;10:691–5.
2. Keating JP, Mason RL, Balakrishnan N. Percentile estimators in location-scale parameter families under absolute loss. Metrika. 2010;72:351–67.
3. Keating JP, Tripathi RC. Percentiles, estimation of. In: Encyclopedia of statistical sciences, vol. VI. New York: Wiley; 1985. p. 668–74.
4. Parrish RS. Comparison of quantile estimators in normal sampling. Biometrics. 1990;46:247–57.
5. Rukhin AL. A class of minimax estimators of a normal quantile. Stat Probab Lett. 1983;1:217–21.
6. Zidek JV. Inadmissibility of the best invariant estimator of extreme quantiles of the normal law under squared error loss. Ann Math Stat. 1969;40:1801–8.
7. Zidek JV. Inadmissibility of a class of estimators of a normal quantile. Ann Math Stat. 1971;42:1444–7.
8. Meeker WQ, Hahn GJ, Escobar LA. Statistical intervals: a guide for practitioners and researchers. New York: Wiley; 2017.
9. Johnson NL, Kotz S, Balakrishnan N. Continuous univariate distributions, vol. 2. 2nd ed. New York: Wiley; 1995.
10. Owen DB. A survey of properties and applications of the noncentral t-distribution. Technometrics. 1968;10:445–78.
11. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–60.
12. Chakraborti S, Li J. Confidence interval estimation of a normal percentile. Am Stat. 2007;61:331–6.
13. Shieh G. The appropriateness of Bland-Altman’s approximate confidence intervals for limits of agreement. BMC Med Res Methodol. 2018;18:45.
14. Blackwelder WC. “Proving the null hypothesis” in clinical trials. Control Clin Trials. 1982;3:345–53.
15. Parkhurst DF. Statistical significance tests: equivalence and reverse tests should reduce misinterpretation. BioScience. 2001;51:1051–7.
16. Fleming TR, Odem-Davis K, Rothmann MD, Shen YL. Some essential considerations in the design and conduct of non-inferiority trials. Clin Trials. 2011;8:432–9.
17. Wellek S. Testing statistical hypotheses of equivalence and noninferiority. 2nd ed. New York: Chapman and Hall/CRC; 2010.
18. Mood AM, Graybill FA, Boes DC. Introduction to the theory of statistics. 3rd ed. New York: McGraw-Hill; 1974.
19. Vos PW, Hudson S. Problems with binomial two-sided tests and the associated confidence intervals. Aust N Z J Stat. 2008;50:81–9.
20. Thulin M. The cost of using exact confidence intervals for a binomial proportion. Electron J Stat. 2014;8:817–40.
21. Agresti A, Coull BA. Approximate is better than “exact” for interval estimation of binomial proportions. Am Stat. 1998;52:119–26.
22. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med. 1998;17:857–72.
23. Brown LD, Cai TT, DasGupta A. Interval estimation for a binomial proportion. Stat Sci. 2001;16:101–17.
24. Brown LD, Cai TT, DasGupta A. Confidence intervals for a binomial proportion and asymptotic expansions. Ann Stat. 2002;30:160–201.
25. Schuirmann DL. On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics. 1981;37:617.
26. Westlake WJ. Response to T.B.L. Kirkwood: Bioequivalence testing–a need to rethink. Biometrics. 1981;37:589–94.
27. Meyners M. Equivalence tests–a review. Food Qual Prefer. 2012;26:231–45.
28. Schuirmann DJ. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J Pharmacokinet Biopharm. 1987;15:657–80.
29. Siqueira AL, Whitehead A, Todd S, Lucini MM. Comparison of sample size formula for 2 × 2 cross-over designs applied to bioequivalence studies. Pharm Stat. 2005;4:233–43.
30. Shieh G. Exact power and sample size calculations for the two one-sided tests of equivalence. PLoS One. 2016;11:e0162093.
31. SAS Institute. SAS/IML User’s Guide, Version 9.3. Cary: SAS Institute Inc; 2017.


Acknowledgements

The author would like to thank the Associate editor and two reviewers for their constructive comments that led to an improved article.

Funding

This work was supported by a grant from the Ministry of Science and Technology (MOST 107–2410-H-009-024-MY3).

Author information

Authors and Affiliations

Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan, 30010, Republic of China
Gwowen Shieh

Authors

Gwowen Shieh

Contributions

GS conceived of the study, conducted the theoretical examination, carried out the numerical computations, and drafted the manuscript. The author read and approved the final manuscript.

Corresponding author

Correspondence to Gwowen Shieh.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The author declares that he has no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

SAS/IML program for conducting percentile test of difference.

Additional file 2.

SAS/IML program for computing required sample size for percentile test of difference.

Additional file 3.

SAS/IML program for conducting percentile test of noninferiority.

Additional file 4.

SAS/IML program for computing required sample size for percentile test of noninferiority.

Additional file 5.

SAS/IML program for conducting percentile test of equivalence.

Additional file 6.

SAS/IML program for computing required sample size for percentile test of equivalence.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Shieh, G. Comparison of alternative approaches for difference, noninferiority, and equivalence testing of normal percentiles. BMC Med Res Methodol 20, 59 (2020). https://doi.org/10.1186/s12874-020-00933-z

Received: 13 September 2019
Accepted: 19 February 2020
Published: 13 March 2020
DOI: https://doi.org/10.1186/s12874-020-00933-z
