- Research article
- Open Access
- Open Peer Review
- Published:

# Confidence regions for repeated measures ANOVA power curves based on estimated covariance

*BMC Medical Research Methodology***volume 13**, Article number: 57 (2013)

## Abstract

### Background

Using covariance or mean estimates from previous data introduces randomness into each power value in a power curve. Creating confidence intervals about the power estimates improves study planning by allowing scientists to account for the uncertainty in the power estimates. Driving examples arise in many imaging applications.

### Methods

We use both analytical and Monte Carlo simulation methods. Our analytical derivations apply to power for tests with the univariate approach to repeated measures (UNIREP). Approximate confidence intervals and regions for power based on an estimated covariance matrix and fixed means are described. Extensive simulations are used to examine the properties of the approximations.

### Results

Closed-form expressions are given for approximate power and confidence intervals and regions. Monte Carlo simulations support the accuracy of the approximations for practical ranges of sample size, rank of the design matrix, error degrees of freedom, and the amount of deviation from sphericity. The new methods provide accurate coverage probabilities for all four UNIREP tests, even for small sample sizes. Accuracy is higher for higher power values than for lower power values, making the methods especially useful in practical research conditions. The new techniques allow the plotting of power confidence regions around an estimated power curve, an approach that has been well received by researchers. Free software makes the new methods readily available.

### Conclusions

The new techniques allow a convenient way to account for the uncertainty of using an estimated covariance matrix in choosing a sample size for a repeated measures ANOVA design. Medical imaging and many other types of healthcare research often use repeated measures ANOVA.

## Background

### Motivation

Computing power for a linear model involving repeated measures requires specifying a set of means and a covariance matrix. Scientists usually feel comfortable specifying a pattern of means that corresponds to a difference of clinical or scientific importance. However, specifying plausible variance and covariance values usually requires estimates from a previous study.

Using data from a previous study to estimate the covariance matrix makes the power value a random variable. Kraemer, et al. [1] noted that if the estimated variance is too small, i.e., when the pilot study is overly favorable, power will be over-estimated. Conversely, if the estimated variance is too large (pilot study overly conservative), power will be under-estimated. Maxwell [2] conducted simulations to illustrate the amount of bias that can occur. Taylor and Muller [3] and Muller and Pasour [4] derived exact distributions of noncentrality and power in univariate linear models based on all combinations of estimated variance and means. The results account for power computed conditional on a previous result being significant, or conditional on a previous result being non-significant. The former creates optimistic bias, while the latter creates pessimistic bias.

Providing confidence intervals to account for the uncertainty inherent in the random power values would be useful for study planning. For example, a lower bound for power would allow stating that a test has power of at least “*P*” to detect an effect, with a specified confidence. A confidence region for a power curve would be even more informative.

Medical imaging research motivated the work here because it often generates the type of complete data that can be handled with the univariate approach to repeated measures (UNIREP). Muller, et al. [5] reviewed the advantages gained by being able to use the UNIREP model, a special case of the general linear mixed model. The same authors described accurate and convenient power approximations for UNIREP analysis. The four UNIREP tests, Box conservative, Geisser-Greenhouse, Huynh-Feldt, and the uncorrected, all use the same test statistic. For data analysis, UNIREP tests differ only by their respective degrees of freedom due to different degrees of freedom multipliers, which measure sphericity in the error covariance for the hypothesis variables. Muller and Stewart [6] provided detailed discussion of the basic theory for both the null and non-null cases. Earlier work detailed basic UNIREP theory. Box [7, 8], Geisser and Greenhouse [9, 10] and Huynh and Feldt [11] gave null results. Davies [12] and Muller and Barton [13, 14] treated the non-null case.

Browne [15] evaluated the impact of using a pilot study to estimate the variance for a *t*-test. More generally, Taylor and Muller [16] demonstrated how to construct exact power confidence intervals for the general linear univariate model for a data-estimated variance and fixed means. The same authors also generalized the result to provide an exact confidence region around a power curve. Parallel results for the UNIREP setting would be equally useful. We generalize the methods in Taylor and Muller [16] to UNIREP tests for repeated measures. We use analytic and simulation results to demonstrate that the techniques allow computing approximate confidence intervals and regions for power with good accuracy for the UNIREP tests, based on an estimated covariance matrix and fixed means.

### Existing results

A vector ** z**, (

*n*×1), is lower case bold. A matrix,

**, is upper case bold with transpose**

*z*

*Z*^{′}, inverse

*Z*^{−1}and generalized inverse

*Z*^{−}. Also,

*1*_{ n }is an (

*n*×1) vector of 1’s and

*I*_{ n }is an (

*n*×

*n*) identity matrix. A diagonal matrix with (

*i*,

*i*) element

*z*

_{ i }is written Dg (

**). The expected value, variance, and trace are E(**

*z***), $\mathcal{V}\left(\mathit{Z}\right)$, and tr(**

*Z***), respectively. Throughout,**

*z**Z*∼

*χ*

^{2}(

*ν*,

*ω*) indicates that

*Z*has a noncentral chi-square distribution with

*ν*degrees of freedom and noncentrality

*ω*, while

*Z*∼

*χ*

^{2}(

*ν*) indicates a central distribution. Similarly,

*Z*∼

*F*(

*ν*

_{1},

*ν*

_{2},

*ω*) indicates

*X*has a noncentral

*F*distribution with

*ν*

_{1}numerator and

*ν*

_{2}denominator degrees of freedom, and noncentrality

*ω*with cumulative distribution function

*F*

_{ F }(

*ν*

_{1},

*ν*

_{2},

*ω*). A central

*F*is written

*Z*∼

*F*(

*ν*

_{1},

*ν*

_{2}) with quantile

*q*indicated ${F}_{F}^{-1}\left(q;{\nu}_{1},{\nu}_{2}\right)$. Writing $\mathit{z}\sim {\mathcal{N}}_{p}\left(\mathit{\mu},\mathbf{\Sigma}\right)$ indicates

**(**

*z**p*×1) is Gaussian with mean

**and covariance**

*μ***Σ**(

*p*×

*p*). If

**(**

*z**N*×

*p*) has independent rows and ${\left[{\text{row}}_{i}\left(\mathit{Z}\right)\right]}^{\prime}\sim {\mathcal{N}}_{p}\left({\mathit{\mu}}_{i},\mathbf{\Sigma}\right)$, then $\mathit{S}={\mathit{Z}}^{\prime}\mathit{Z}\sim {\mathcal{W}}_{p}\left(N,\mathbf{\Sigma},\mathit{\Omega}\right)$ indicates

**follows a Wishart distribution with**

*S**N*degrees of freedom, covariance

**Σ**, and noncentrality

**=E(**

*Ω*

*Z*^{′})E(

**)**

*Z***Σ**

^{−1}.

The general linear multivariate model,

assumes *N* independent rows and ${\left[{\text{row}}_{i}\left(\mathit{Y}\right)\right]}^{\prime}\sim {\mathcal{N}}_{p}\left({\left[{\text{row}}_{i}\left(\mathit{X}\right)\mathit{B}\right]}^{\prime},\mathbf{\Sigma}\right)$. In the model, ** X** is the fixed, known design matrix with 1≤rank(

*X*)≤

*q*, and

**contains fixed, unknown regression coefficients. For repeated measures ANOVA, one-group designs have rank (**

*B***)=1, and two-group comparisons have rank (**

*X***)=2. The associated general linear hypothesis is**

*X*suchthat ** C**definesthebetween-subject effects(rank

*a*) while

**defines the within-subject effects (rank**

*U**b*). Requiring estimable

**Θ**and full rank {

**,**

*C***} ensure a testable hypothesis. Appropriate selections of the contrast matrices (**

*U***and**

*C***) and null matrix (**

*U***Θ**

_{0}) allows testing important one-degree-of-freedom parameters, such as the difference between two means, or a comparison of two trends.

For ** M** =

**(**

*C*

*X*^{′}

**)**

*X*^{−}

*C*^{′}, unscaled noncentrality is

**= (**

*Δ***Θ**−

**Θ**

_{0})

^{′}

*M*^{−1}(

**Θ**−

**Θ**

_{0}), scaled noncentrality is $\mathit{\Omega}={\mathit{\Delta \Sigma}}_{\ast}^{-1}$. Here

**Σ**

_{∗}=

*U*^{′}

**=**

*ΣU***Dg (**

*Υ***)**

*λ*

*Υ*^{′}is the covariance matrix among the hypothesis variables, with

*ΥΥ*^{′}=

*Υ*^{′}

**=**

*Υ*

*I*_{ b }, and

**={**

*λ**λ*

_{ k }} the eigenvalues. Estimates are $\stackrel{~}{\mathit{B}}={\left({\mathit{X}}^{\prime}\mathit{X}\right)}^{-}{\mathit{X}}^{\prime}\mathit{Y}$ and $\hat{\mathbf{\Sigma}}={\mathit{Y}}^{\prime}\left[\mathit{I}-{\left({\mathit{X}}^{\prime}\mathit{X}\right)}^{-}{\mathit{X}}^{\prime}\right]\mathit{Y}/{\nu}_{e}$, with

*ν*

_{ e }=

*N*−rank(

**), the error degrees of freedom. Furthermore, $\widehat{\mathit{\Theta}}=\mathit{C}\stackrel{~}{\mathit{B}}\mathit{U}$, $\hat{\mathbf{\Delta}}={(\hat{\mathit{\Theta}}-{\mathit{\Theta}}_{0})}^{\prime}{\mathit{M}}^{-1}(\hat{\mathit{\Theta}}-{\mathit{\Theta}}_{0})\sim {\mathcal{W}}_{b}\left(a,{\mathbf{\Sigma}}_{\ast},\mathit{\Omega}\right)$ and ${\hat{\mathbf{\Sigma}}}_{\ast}={\mathit{U}}^{\prime}\hat{\mathbf{\Sigma}}\mathit{U}$, with ${\nu}_{e}{\hat{\mathbf{\Sigma}}}_{\ast}\sim {\mathcal{W}}_{b}\left({\nu}_{e},{\mathbf{\Sigma}}_{\ast}\right)$. The sum of squares hypothesis matrix is ${\mathit{S}}_{H}=\hat{\mathbf{\Delta}}$ and the sum of squares error matrix is ${\mathit{S}}_{E}={\nu}_{e}{\hat{\mathbf{\Sigma}}}_{\ast}$, which are independent of one another. The notation follows that in Muller and Stewart [6]. Additional notation is in Appendix A: Additional notation.**

*X*The univariate approach to repeated measures can be expressed in terms of the general linear multivariate model. The Box conservative (Box), the Geisser-Greenhouse (GG), the Huynh-Feldt (HF), and the uncorrected (Un) UNIREP tests use the same test statistic,

and a central *F* distribution to approximate the null distribution of *T*
_{
u
},

The sphericity parameter, *∊*=tr^{2}(**Σ**
_{∗})/[*b*tr(**Σ**∗2)], quantifies the spread of population eigenvalues and is used to discount the degrees of freedom. The term *sphericity* reflects the fact that uncorrelated Gaussian variables with equal variances in three dimensions have a spherical scattergram. The eigenvalues of **Σ**
_{∗} are the variances of the (uncorrelated) principal components of the hypothesis response variables. Perfect sphericity requires *∊* = 1, which occurs with all eigenvalues equal. Minimal sphericity has *∊*=1/*b*, which occurs with one nonzero eigenvalue. Other patterns of **Σ**
_{∗} have 1/*b*<*∊*<1.

The Box conservative test uses the fixed, lower bound of *∊*, while the uncorrected test uses the fixed, upper bound of *∊*. With sphericity (*∊*=1), the uncorrected test is exact and uniformly most powerful (among similarly invariant tests). The Geisser-Greenhouse and Huynh-Feldt tests use the observed data to estimate *∊*. The Geisser-Greenhouse estimator, $\hat{\u220a}={\text{tr}}^{2}\left({\hat{\mathbf{\Sigma}}}_{\ast}\right)/b\text{tr}\left({\hat{\mathbf{\Sigma}}}_{\ast}^{2}\right)$, is the maximum likelihood (ML) estimator. The Huynh-Feldt estimator, $\stackrel{~}{\u220a}=\left(\mathit{\text{Nb}}\hat{\u220a}-2\right)/\left[b\left({\nu}_{e}-b\hat{\u220a}\right)\right]$ was proposed as the ratio of two unbiased estimators. Their claim holds only for the special case of rank (** X**)=1. Lecoutre [17] provided a more general form. In turn, Gribbin [18] and Chi et al. [19] described a rank-adjusted approximately unbiased estimator, ${\stackrel{~}{\u220a}}_{r}=\left[\left({\nu}_{e}+1\right)b\hat{\u220a}-2\right]/\left[b\left({\nu}_{e}-b\hat{\u220a}\right)\right]$, which applies to any general linear multivariate model. The rank-adjusted power approximation was shown through simulations to approximate observed mean power values as well as, or better than, the Huynh-Feldt power approximation (Chi et al. [19]). Only the rank-adjusted Huynh-Feldt estimator will be considered in the remainder of the paper.

Although the four UNIREP tests all use the same test statistic, they each use a different measure of sphericity, here indicated *e*. For data analysis, all four tests use a critical value $q\left(e\right)={F}_{F}^{-1}\left(1-\alpha ,{\nu}_{1}e,{\nu}_{2}e\right)$. Here *ν*
_{1}=*a* *b* and *ν*
_{2}=*b* *ν*
_{
e
}. The Box test uses *e*=1/*b*, the GG test uses $e=\hat{\u220a}$, HF uses $e=\stackrel{~}{\u220a}$, and the uncorrected test uses *e*=1. The p-value is then computed, for observed test statistic *t*, as *p*=1−*F*(*t*,*ν*
_{1}
*e*,*ν*
_{2}
*e*). In all cases $1/b\le \hat{\u220a}\le \stackrel{~}{\u220a}\le 1$. In turn, the *p*-values always have the reverse order, with the Box *p*-value being largest, and the uncorrected being smallest.

Muller et al. [5] showed that the distribution function of the UNIREP test statistic can be expressed exactly in terms of the distribution function of the sum of *b* positively and *b* negatively weighted independent chi-squares, namely *y*
_{
k
h
}∼*χ*
^{2}(*a*,*ω*
_{
k
}) and *y*
_{
k
e
}∼*χ*
^{2}(*ν*
_{
e
}),

Muller et al. [5] also reported accurate *F* approximations of the form

Here, *y*
_{∗1}∼*χ*
^{2}(*ν*
_{∗1},*ω*
_{∗}), *y*
_{∗2}∼*χ*
^{2}(*ν*
_{∗2}), $\text{tr}\left(\hat{\mathbf{\Delta}}\right)\approx {\lambda}_{\ast 1}{y}_{\ast 1}$ and ${\nu}_{e}\mathit{\text{tr}}\left({\hat{\mathbf{\Sigma}}}_{\ast}\right)\approx {\lambda}_{\ast 2}{y}_{\ast 2}$. Parameters *λ*
_{∗1}, *ν*
_{∗1}, *ω*
_{∗}, *λ*
_{∗2}, and *ν*
_{∗2} are defined in Appendix A: Additional notation. Power analysis involves {*λ*
_{
k
}} and {*ω*
_{
k
}}, with

the diagonal elements of the scaled noncentrality, *Ω*_{∗}=*Υ*^{′}
** ΔΥ**Dg (

**)**

*λ*^{−1}=

*Δ*_{∗}Dg (

**)**

*λ*^{−1}.

## Methods

### Estimating approximate UNIREP power with estimated covariance and fixed means

By extending results in Muller et al. [5], the following lemma helps simplify the *F* approximations. Appendix B: Supporting lemmas and proofs contains all proofs.

#### Lemma 1

The constant in the critical value of the UNIREP test statistic approximation introduced by Muller et al. [5] is equal to 1,

Thus,

For known covariance and means, the power approximations for the Box, Geisser-Greenhouse, rank-adjusted Huynh-Feldt, and uncorrected tests are all of the form

Here, $\overline{\lambda}$ is equal to tr(**Σ**
_{∗})/*b* with *b* equal to the rank of **Σ**
_{∗}, and ${\omega}_{\ast}=\text{tr}\left(\mathbf{\Delta}\right)/\left(\overline{\lambda}/{e}_{5}\right)$. Table 1 contains values for *e*
_{1} through *e*
_{5} for the four UNIREP tests when ${\u220a}_{d}=\u220a={\text{tr}}^{2}\left({\mathbf{\Sigma}}_{\ast}\right)/b\text{tr}\left({\mathbf{\Sigma}}_{\ast}^{2}\right)$, and ${\u220a}_{n}=\left[{\text{tr}}^{2}\left({\mathbf{\Sigma}}_{\ast}\right)+2\text{tr}\left({\mathbf{\Sigma}}_{\ast}\right)\text{tr}\left(\mathbf{\Delta}/a\right)\right]\left\{b\left[\text{tr}\left({\mathbf{\Sigma}}_{\ast}^{2}\right)+2\text{tr}({\mathbf{\Sigma}}_{\ast}\mathbf{\Delta}/a)\right]\right\}$. The expressions for *∊*
_{
d
} and *∊*
_{
n
} were derived using the properties described in Lemma B.1.

In practice, some elements of $\{{e}_{1},{e}_{2},{e}_{3},{e}_{4},{e}_{5},\text{tr}(\mathbf{\Delta}),\overline{\lambda}\}$ may be estimated and hence random. The random elements imply random power values, as with estimated covariance and fixed means, $\{{\hat{\mathbf{\Sigma}}}_{\ast},\mathbf{\Delta}\}$, for ${\hat{\mathbf{\Sigma}}}_{\ast}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}{\hat{\mathit{E}}}^{\prime}\hat{\mathit{E}}/{\nu}_{\text{est}}$, the unbiased restricted maximum likelihood (REML) estimator. A distinction must be carefully maintained between the estimation study and target study. The estimation study provides the covariance estimate and has sample size *N*
_{est}, design matrix rank of rank (*X*_{est}), and *ν*
_{est}=*N*
_{est}−rank(*X*_{est}) degrees of freedom. The target study for which power is desired has sample size *N*, rank (** X**) and

*ν*

_{ e }=

*N*−rank(

**) degrees of freedom.**

*X*The ML estimator from the Geisser-Greenhouse test, $\hat{\u220a}={\text{tr}}^{2}\left({\hat{\mathbf{\Sigma}}}_{\ast}\right)/\left[b\text{tr}\right({\hat{\mathbf{\Sigma}}}_{\ast}^{2}\left)\right]$, is an obvious estimator for the target study’s *∊*. For power analysis, a parallel estimator is available for *∊*
_{
n
}:

A better choice, given in the following lemma, uses a ratio of unbiased estimators. The result generalizes the rank-adjusted Huynh-Feldt estimator for data analysis. Appendix B: Supporting lemmas and proofs has derivations of moments as well as all proofs.

#### Lemma 2

For the non-null case, a ratio estimating *∊*
_{
n
} in terms of correlated, but unbiased, estimators is [b]

The corresponding estimator for the null case is ${\stackrel{~}{\u220a}}_{r}=\left[\left({\nu}_{\text{est}}+1\right)b\hat{\u220a}-2\right]/\left[b\left({\nu}_{\text{est}}-b\hat{\u220a}\right)\right]$, the rank-adjusted Huynh-Feldt sphericity estimator.

For estimated covariance and fixed means, approximate estimated UNIREP power is

with $\overline{\hat{\lambda}}=\text{tr}\left({\hat{\mathbf{\Sigma}}}_{\ast}\right)/b$, and *e*
_{1} through *e*
_{5} estimated if unknown (Table 1). Nearly every combination of ${\hat{\u220a}}_{n}$, ${\stackrel{~}{\u220a}}_{n}$, ${\hat{\u220a}}_{d}$, ${\stackrel{~}{\u220a}}_{r}$, 1 and 1/*b* was examined for each UNIREP test for the wide range of simulations discussed in Muller et al. [5]. The values chosen provided the most accurate results. In retrospect, they are natural choices as well.

### Approximate UNIREP power confidence intervals

The solution to the UNIREP problem parallels the solution to the univariate problem in Taylor and Muller [16]. The methods apply to any general linear hypothesis, including one degree-of-freedom contrasts, such as pair-wise group comparisons and differences in linear trend between two groups. Tests giving scalar secondary parameters are also common for one-group designs and two-group comparisons. For known covariance and means, *e*
_{5} is defined to be *∊*
_{
n
} (Table 1), and the noncentrality in equation 9 is ${\omega}_{\ast}=\left[\text{tr}\left(\mathbf{\Delta}\right)\right]/(\overline{\lambda}/{\u220a}_{n})=\left[\text{tr}\left(\mathbf{\Delta}\right)\right]/{\lambda}_{\ast 1}$ with $\overline{\lambda}=\text{tr}\left({\mathbf{\Sigma}}_{\ast}\right)/b$ and ${\lambda}_{\ast 1}=\overline{\lambda}/{\u220a}_{n}$. For ${\u220a}_{n}=\left[{\text{tr}}^{2}\left({\mathbf{\Sigma}}_{\ast}\right)+2\text{tr}\left({\mathbf{\Sigma}}_{\ast}\right)\text{tr}\left(\mathbf{\Delta}/a\right)\right]\left\{b\left[\text{tr}\left({\mathbf{\Sigma}}_{\ast}^{2}\right)+2\text{tr}({\mathbf{\Sigma}}_{\ast}\mathbf{\Delta}/a)\right]\right\}$. Therefore, it follows that

For estimated covariance and fixed means, a ratio involving one biased and two unbiased estimators (Lemma B.2) for estimating *λ*
_{∗1} may be written as

In parallel to the univariate setting, the distribution of ${\stackrel{~}{\lambda}}_{\ast 1}$ can be approximated with a Satterthwaite approximation: ${\stackrel{~}{\lambda}}_{\ast 1}{\nu}_{\ast}/{\lambda}_{\ast 1}\sim {\chi}^{2}\left({\nu}_{\ast}\right)$ with ${\nu}_{\ast}=\left(b{\nu}_{\text{est}}\right)\xb7{\hat{\u220a}}_{d}/{\stackrel{~}{\u220a}}_{n}$. Lower and upper tail probabilities, *α*
_{
L
} and *α*
_{
U
}, respectively, define the confidence coefficient, *p*
_{
C
L
}=1−*α*
_{
L
}−*α*
_{
U
}. Also, ${c}_{\mathrm{\alpha L}}={F}_{{\chi}^{2}}^{-1}\left({\alpha}_{L};{\nu}_{\ast}\right)$ and ${c}_{\mathrm{\alpha U}}={F}_{{\chi}^{2}}^{-1}\left(1-{\alpha}_{U};{\nu}_{\ast}\right)$. Approximate confidence limits for the noncentrality may be calculated using the following:

Approximate lower and upper bounds are therefore ${\stackrel{~}{\omega}}_{\ast L}=\text{tr}\left(\mathbf{\Delta}\right){c}_{\mathrm{\alpha L}}/{\stackrel{~}{\lambda}}_{\ast 1}{\nu}_{\ast}$ and ${\stackrel{~}{\omega}}_{\ast U}\text{tr}\left(\mathbf{\Delta}\right){c}_{\mathrm{\alpha U}}/{\stackrel{~}{\lambda}}_{\ast 1}{\nu}_{\ast}$. The strict monotone dependence of the noncentral *F* function on the noncentrality ensures an approximate confidence interval for power. Lower and upper bounds on power are, with *e*
_{1} through *e*
_{4} defined in Table 1 for ${\hat{\mathbf{\Sigma}}}_{\ast}$,

and

Taylor and Muller [16] recommended one-sided power confidence intervals by noting that “the change from a one-sided to a two-sided confidence interval has little effect on the upper bound, but a large effect on the lower bound”. Muller and Fetterman [20] provided examples of a one-sided power confidence interval in the univariate case.

### Approximate UNRIEP power confidence regions for power curves

The new methods allow calculating a confidence interval for a single power value. The logic of a proof in Taylor and Muller ([16] equations 2.1-2.13 and surrounding text) guarantees that accurate confidence regions are provided by the point-wise calculations. The proof may be sketched for the present setting as follows. Equations 14-21 establish the validity of the approximate confidence interval for a particular alternative hypothesis, as specified by the scalar constant tr(** Δ**). The randomness in the noncentrality arises from a scalar random variable, ${\stackrel{~}{\lambda}}_{\ast 1}$, analogous to a variance. Equation 19 describes a single event with a specified probability. The inequality defining the event, and the associated probability, do not change for different values of the scalar constant tr(

**). The smooth and strictly monotone dependence of power on the noncentrality ensures the validity of equations 20-21. The proof is completed by noting that the monotonicity extends the simultaneity property to the power confidence region.**

*Δ*Figure 1 gives an example plot of approximate power confidence regions surrounding the predicted power curve for the rank-adjusted Huynh-Feldt test for *∊*=0.720. Graphical representations such as Figure 1 help researchers accurately recognize the amount of uncertainty in their power calculation, and lead to better decisions about design.

In some cases scientists prefer to consider sample size as a function of the pattern of mean differences. The theory already presented allows plotting sample size as a function of mean difference, albeit with a shift in algorithm. The power function must be numerically inverted to solve for the sample size desired. Taylor and Muller [16] outlined the steps of algorithm needed for the univariate case. Details are not presented here for the sake of brevity.

## Results

### Simulation overview

The accuracy of the new approximate confidence intervals isevaluatedfor a widerange of conditions. Appendix C: Simulation details contains more details of the simulations and examples. All simulations were conducted in SAS/IML (SAS 9.1, SAS Institute, 2003) using a version of LINMOD 3.4 modified to include the rank-adjusted Huynh-Feldt estimator and test. Predicted power values and approximate power confidence intervals were computed using a similarly modified version of POWERLIB 2.03. The modified versions of LINMOD and POWERLIB are available at http://www.health-outcomes-policy.ufl.edu/muller/.

### Simulation 1 with rank *(X)=1*(one-group repeated measures ANOVA)

*(X)=1*

The accuracy of the new approximate confidence intervals were evaluated for a completely within-subject design with *p*=9 repeated measures, *N*∈{10,20,40}, and *q*=rank(** X**)=1. Values for

**, contrast matrices**

*B***,**

*C***, and**

*U***Θ**

_{0}were chosen to test a within-subject interaction for

*α*=0.05. The model was chosen to ensure predicted power values for the Geisser-Greenhouse test of 0.20, 0.50, and 0.80, using the power approximation in Muller et al. [5]. Population covariance matrices were chosen to provide

*∊*∈{0.282,0.505,0.720,1.00}. The sphericity values were selected to cover a range of eigenvalue patterns (i.e., patterns of the principal component variances) arising from the structure of

**Σ**

_{∗}. For example, if

*b*=3, then $\mathit{\lambda}={\left[\begin{array}{lll}1& 0.12& 0.12\end{array}\right]}^{\prime}$ gives

*∊*≈0.50. In turn, $\mathit{\lambda}={\left[\begin{array}{lll}1& 0& 0\end{array}\right]}^{\prime}$gives

*∊*=1/3≈0.33. Pseudo-random realizations of the error matrix,

**, were generated and tests were calculated. The observed mean power values for the four UNIREP tests were calculated and tabulated for 500,000 replications per condition.**

*E*For the conditions described above, additional pseudo-random realizations of the error matrix were generated using an estimating study with sample size, *N*
_{est}, of 10 and rank of ** X**, rank (

*X*_{est}), of 1 with 500,000 replications per condition for all four UNIREP tests. Corresponding estimated covariance matrices were calculated, as well as lower and upper bounds for power. Both one- and two-sided confidence intervals were evaluated with target coverages of 90% and 95%. The number of replications gave a standard error of estimated coverage probability less than or equal to 0.0003 for 1−

*α*=0.95, and 0.0004 for 1−

*α*=0.90, nearly guaranteeing 3 digits of accuracy. Only coverage of observed mean power values, and not predicted, was tabulated. The accuracy of the predicted power values, with respect to the observed, made it essentially redundant to consider both.

Only the worst case results for two-sided 95% confidence intervals are presented here. The worst cases occurred with the smallest sample size for the target study. Table 2 contains results for the Box conservative test with a target sample size of 10. For a wide range of sphericity values and target power values, the target 95% estimated coverage is consistently reached. The two cases in which the target coverage is not reached occur with large population sphericity and low power. Under these conditions, the Box conservative test would not be used in practice.

Table 2 also contains coverage results for the Geisser-Greenhouse and the rank-adjusted Huynh-Feldt tests. The target 95% estimated coverage is consistently reached for extreme sphericity values for both tests. For midrange sphericity values, the coverage fell below the target coverage from 0.8% to 7.3% for the Geisser-Greenhouse, and 1.4% to 12.1% for the rank-adjusted Huynh-Feldt. Coverage accuracy improved as the estimated power increased. In practice, lower power values are of little concern. For target power of 0.80 for the Geisser-Greenhouse test, the largest deviation from the target 95% estimated coverage was 2.6% for the Geisser-Greenhouse test and 4.1% for the rank-adjusted Huynh-Feldt test. Both occurred for the population sphericity value of 0.505.

Only a spherical case is appropriate to consider for the uncorrected test because otherwise the test will have inflated test size. Simulation results in Table 3 show that the approximation for the uncorrected test (with sphericity) always reached the target estimated coverage for the uncorrected test. The conservative bias could be eliminated by using optimal maximum likelihood estimates for the common variance and covariance (Morrison [21]), rather than the unstructured covariance estimate. Additional small changes are needed, associated with degrees of freedom, and corresponding to making all choices of *e*
_{1} through *e*
_{5} equal to 1.

Although not presented here, in general, the accuracy of the coverage improved directly with increasing sample size, for all tests and conditions. The accuracy of the approximate confidence bounds for all four UNIREP tests also improved as the population sphericity increased.

### Simulation 2 with rank (*X*)*>1*

*X*

*>1*

All of the simulations in the second example considered the condition of rank of ** X** greater than 1. The cases used

*p*=5 repeated measures,

*N*∈{16,32,48}, and

*q*=rank(

**)∈{2,4,8,16}, corresponding to a three-, five-, nine-, and seventeen-group comparison, respectively. Appropriate fixed matrices of regression coefficients,**

*X***, contrast matrices,**

*B***and**

*C***, and**

*U***Θ**

_{0}were chosen to test a within-subject interaction for a test size,

*α*, of 0.05. The matrices were also chosen to ensure approximate target predicted power values for the rank-adjusted Huynh-Feldt test of 0.20, 0.50, and 0.80. Specific design matrices,

**, were defined. Population covariance matrices were chosen to provide specific population sphericity values,**

*X**∊*∈{0.282,0.505,0.720,1.00}. Observed mean power values were simulated and tabulated in a similar manner to that described in section ‘Simulation 1 with rank

**(one-group repeated measures ANOVA)’.**

*(X)=1*Pseudo-random realizations of the error matrix were generated using an estimating study with sample size, *N*
_{est}, of 16 and rank of ** X**, rank (

*X*_{est}), of 4 with 500,000 replications per condition for all four UNIREP tests. Corresponding estimated covariance matrices were calculated, as well as lower and upper bounds for power using the methods presented in section ‘Approximate UNIREP power confidence intervals UNIREP power confidence intervals’. Approximate confidence interval coverage was defined as the proportion of the 500,000 simulated bound realizations that successfully covered the observed mean power values for each condition described above. Only coverage of observed mean power values, and not predicted, were tabulated. The accuracy of the predicted power values, with respect to the observed, made it essentially redundant to consider both. Both one- and two-sided confidence intervals were evaluated with target coverages of 90% and 95%.

In practical biomedical research, low power values are of little concern. Rarely will one have a power targeted below 0.70. Therefore, only the results for target power values of 0.80 will be presented and discussed. Power confidence interval coverage converged to the target coverage as sample size increased. Only the worst case results for two-sided 95% confidence intervals are presented here. The worst cases occurred with the smallest sample size for the target study, for a variety of population sphericity values and estimated population powers.

In Table 4, the observed mean population powers are presented for the four UNIREP tests for the population sphericity values and ranks of ** X** considered for target rank-adjusted Huynh-Feldt power of 0.80 and sample size of 16 or 48. In general, as the population sphericity increased and rank of

**increased, the observed mean power values for the Box conservative, the Geisser-Greenhouse, and the rank-adjusted Huynh-Feldt tests decreased. Only the Box conservative had severely biased power values as the population sphericity increased.**

*X*In Table 5, the proportion of simulations in which the estimated confidence interval successfully covered the observed mean population power values for each test is shown. The results are based on using an estimating study with sample size, *N*
_{est}, of 16 and rank of ** X**, rank (

*X*_{est}), of 4. In general, the approximate power confidence intervals nearly always reached the target 95% coverage for the Box conservative test. The coverage became more conservative as rank of

**decreased. Similarly, the coverage became more conservative for the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests as rank of**

*X***decreased. The Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests performed adequately in all cases except for the midrange population sphericity value,**

*X**∊*=0.505. The largest deviation from the target 95% estimated coverage was 13.6% and 16.0% for the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests, respectively, which occurred for

*∊*=0.505 and rank of

**equal to 8. The approximate power confidence intervals for the uncorrected test reached the target coverage value for every case considered in which the uncorrected test would be used.**

*X*Although not presented here, in general, as sample size increased the conservative coverage values observed for the Box conservative and the uncorrected tests slowly converged to the target coverage value. This trend was observed for the conservative coverage values with the extreme population sphericity values for the Geisser-Greenhouse and the rank-adjusted Huynh-Feldt tests as well. The same is true of the liberal coverage values observed for the midrange population sphericity values for the Geisser-Greenhouse and the rank-adjusted Huynh-Feldt tests. Similar results were obtained for the target 90% two-sided confidence interval coverage as well as the 95% and 90% one-sided confidence intervals coverage.

The estimated coverages of these tabulated observed mean power values for each test were simulated for population sphericity values of 0.282, 0.505, 0.720, and 1.00. In Table 6, the worst case results from these simulations, which occurred for population sphericity 0.505, are presented. Approximate confidence intervals were simulated for 500,000 replications per condition (standard error of estimated coverage probability less than or equal to 0.0003 for 1−*α*=0.95, and 0.0004 for 1−*α*=0.90). The estimating studies use sample sizes, *N*
_{est}, of 16, 32, and 48, and ranks of *X*_{est} of 2, 4, and 8.

In general, for population sphericity values of 0.282 and 0.505, the approximate power confidence interval coverage for the Box conservative test converged to the target coverage value as rank of *X*_{est} increased, and thus *ν*
_{est} decreased. Coverage decreased as rank of ** X** from the target study increased. For larger rank of

**, the approximate power confidence interval coverage fell short of the target coverage in several instances. No clear trend was apparent as**

*X**N*

_{est}increased. The Box conservative test would not be used for larger population sphericity values. However, the realization that the target coverage was reached in nearly every case considered for the larger population sphericity values is worth mentioning.

The approximate power confidence interval coverages for both the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests seem to have converged to the target coverage value as rank of *X*_{est} increased, and thus *ν*
_{est} decreased, except in cases of sphericity. Such cases have little practical importance since exact results are available if sphericity is valid. Coverage decreased as rank of ** X** from the target study increased. As observed in previous simulations, the approximate power confidence interval coverages for both the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests fell short of the target coverage to varying degrees in nearly every case considered for midrange population sphericity values. This outcome was also observed for larger rank of

**from the target study for population sphericity of 0.282. The approximate power confidence interval coverage for the uncorrected tests reached the target coverage value in every case except for large**

*X**ν*

_{est}and small rank of

**from the target study. The approximate power confidence interval coverage increased as the ranks of**

*X***for both the target and estimating studies increased and as**

*X**N*

_{est}decreased.

The slow convergence of the approximate power confidence interval coverage to the target coverage may be due, in part, to use of ${\stackrel{~}{\u220a}}_{n}$ and ${\stackrel{~}{\u220a}}_{r}$ in the approximate power confidence interval equation. These estimators of the sphericity parameter are ratios of unbiased estimators for the non-null and null cases, respectively. The variances of these estimators are much larger than the variances for ${\hat{\u220a}}_{n}$ and ${\hat{\u220a}}_{d}$. The larger variances may account for the slow convergence to the population power as the target and estimating study sample sizes and degrees of freedom increase. Further simulations may be needed to confirm this reasoning.

### Alternate approximations

In attempts to develop even better confidence bound estimates, additional approximations were developed and evaluated. One approach approximated the distribution of ${\stackrel{~}{\lambda}}_{\ast 1}$ with an *F*. Using the methods presented in Kim et al. [22], the numerator of ${\stackrel{~}{\lambda}}_{\ast 1}$ was approximated with a weighted noncentral chi-square, while the denominator was approximated with a weighted central chi-square. Two concerns arose. First, the denominator is not necessarily a central quadratic. The 2tr(** Δ**/

*a*) component makes the denominator more of a shifted central quadratic. Second, the Kim et al. [22] result requires that the components of the numerator and denominator be mutually independent, which does not hold. Simulations demonstrated that the approximation was inaccurate in small samples.

Alternative approximations were developed and evaluated. The alternatives matched only the numerator to a weighted noncentral chi-square or to a weighted central chi-square with the denominator a constant equal to E$\left[\text{tr}\right({\hat{\mathbf{\Sigma}}}_{\ast})+2\text{tr}(\mathbf{\Delta}/a\left)\right]$. All were less accurate than the approximation presented here.

## Discussion

Even for small sample sizes, the proposed power confidence intervals attain very accurate coverage probabilities for the Box conservative test in all cases and for the uncorrected test with *∊*=1 (the only case for which it should be used). The result is also true for the extreme population sphericity values for the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests. For midrange population sphericity values, the coverage probabilities of the approximate power confidence intervals for the latter two tests often fell somewhat short of the various target coverage values considered. Coverage probabilities improve as sample size increases. Accuracy is better for higher target power values than for lower, which makes the results useful in practice. One-sided confidence intervals are recommended for lower bounds on power.

The techniques also allow plotting power confidence regions around an estimated power curve (Figure 1). The resulting plots have been well received by researchers.

## Conclusions

Good statistical practice requires associating a credible measure of uncertainty with any parameter estimate. We described and evaluated new methods to meet the need for UNIREP power estimates based on an estimated covariance with fixed means. Across a large range of conditions, the methods provide reasonably accurate coverage for all four UNIREP tests.

## Appendix

### Appendix A: Additional notation

The additional notation comes from Muller et al. [5], who showed that if ${S}_{t1}={\sum}_{k=1}^{b}{\lambda}_{k}$, ${S}_{t2}={\sum}_{k=1}^{b}{\lambda}_{k}^{2}$, ${S}_{t3}={\sum}_{k=1}^{b}{\lambda}_{k}{\omega}_{k}$ and ${S}_{t4}={\sum}_{k=1}^{b}{\lambda}_{k}^{2}{\omega}_{k}$, then

They used the parameters (assumed known) to approximate the UNIREP test statistic with a noncentral *F* distribution, as presented in equation 6.

## Appendix B: Supporting lemmas and proofs

### Lemma B.1

A.1-A.5 imply $\left\{{S}_{t1},{S}_{t2},{S}_{t3},{S}_{t4}\right\}=\phantom{\rule{2pt}{0ex}}\left\{\text{tr}\left({\mathbf{\Sigma}}_{\ast}\right),\right.\left(\right)close="\}">\text{tr}\left({\mathbf{\Sigma}}_{\ast}^{2}\right),\text{tr}\left(\mathbf{\Delta}\right),\text{tr}\left({\mathbf{\Sigma}}_{\ast}\mathbf{\Delta}\right)$.

### Proof of lemma B.1

### Lemma B.2

The first moments of $\text{tr}\left({\hat{\mathbf{\Sigma}}}_{\ast}\right)$, $\text{tr}\left({\hat{\mathbf{\Sigma}}}_{\ast}^{2}\right)$, ${\text{tr}}^{2}\left({\hat{\mathbf{\Sigma}}}_{\ast}\right)$, and $\text{tr}\left(\mathbf{\Delta}{\hat{\mathbf{\Sigma}}}_{\ast}\right)$ are known.

### Proof of lemma B.2

Following Wishart [23], $\mathit{S}={\nu}_{e}{\hat{\mathbf{\Sigma}}}_{\ast}\sim {\mathcal{W}}_{b}\left({\nu}_{e},{\mathbf{\Sigma}}_{\ast}\right)$, such that *ν*
_{
e
}=*N*−rank(** X**). For Wishart ${\u3008\mathbf{\Sigma}\u3009}_{\mathit{\text{jj}}}={\sigma}_{j}^{2}$ while here 〈

**Σ**〉

_{ j j }=

*σ*

_{ j j }and

*ρ*

_{ j k }=

*σ*

_{ j k }/(

*σ*

_{ j j }

*σ*

_{ k k })

^{1/2}. Wishart [23] said, with emphasis not in the original, “ …moment coefficients are in all cases

*except the first*calculated about the mean of the sample …”. Here

*μ*(

*n*) indicates the expression in equation

*n*at the end of Wishart [23], $\text{E}\left[\text{tr}\left(\mathit{S}\right)\right]=\text{E}\left({\sum}_{{j=1}^{b}}{s}_{\mathit{\text{jj}}}\right)={\sum}_{j=1}^{b}\text{E}\left({s}_{\mathit{\text{jj}}}\right)={\sum}_{j=1}^{b}\mu {\left(1\right)}_{j}={\sum}_{j=1}^{b}{\nu}_{e}{\sigma}_{\mathit{\text{jj}}}$. Thus

With ${\mathit{S}}^{2}={\left({\nu}_{e}{\hat{\mathbf{\Sigma}}}_{\ast}\right)}^{2}$, $\text{E}\left[\text{tr}\left({\mathit{S}}^{2}\right)\right]=\text{E}\left[\left({\sum}_{j=1}^{b}{\sum}_{k=1}^{b}{s}_{\mathit{\text{jk}}}^{2}\right)\right]={\sum}_{j=1}^{b}{\sum}_{k=1}^{b}\text{E}\left({s}_{\mathit{\text{jk}}}^{2}\right)$. In turn $\text{E}\left({s}_{\mathit{\text{jk}}}^{2}\right)=\text{E}\left\{{\left[\left[{s}_{\mathit{\text{jk}}}-\text{E}\left({s}_{\mathit{\text{jk}}}\right)\right]+\text{E}\left({s}_{\mathit{\text{jk}}}\right)\right]}^{2}\right\}=\text{E}\left\{{\left[{s}_{\mathit{\text{jk}}}-\mathrm{E}\left({s}_{\mathit{\text{jk}}}\right)\right]}^{2}+2\text{E}\left({s}_{\mathit{\text{jk}}}\right)\left[{s}_{\mathit{\text{jk}}}-\mathrm{E}\left({s}_{\mathit{\text{jk}}}\right)\right]+{\left[\text{E}\left({s}_{\mathit{\text{jk}}}\right)\right]}^{2}\right\}=\text{E}\left\{{\left[{s}_{\mathit{\text{jk}}}-\mathrm{E}\left({s}_{\mathit{\text{jk}}}\right)\right]}^{2}\right\}+2\text{E}\left({s}_{\mathit{\text{jk}}}\right)\text{E}\left[{s}_{\mathit{\text{jk}}}-\mathrm{E}\left({s}_{\mathit{\text{jk}}}\right)\right]+{\left[\text{E}\left({s}_{\mathit{\text{jk}}}\right)\right]}^{2}=\text{E}\left\{{\left[{s}_{\mathit{\text{jk}}}-\mathrm{E}\left({s}_{\mathit{\text{jk}}}\right)\right]}^{2}\right\}+{\left[\text{E}\left({s}_{\mathit{\text{jk}}}\right)\right]}^{2}$. Also:

Hence

Here $\text{E}\left[{\text{tr}}^{2}\left(\mathit{S}\right)\right]=\text{E}\left[\text{tr}\left(\mathit{S}\right)\text{tr}\left(\mathit{S}\right)\right]=\text{E}\left[\left(\sum _{j=1}^{b}{s}_{\mathit{\text{jj}}}\right)\left(\sum _{k=1}^{b}{s}_{\mathit{\text{kk}}}\right)\right]=\text{E}\left({\sum}_{j=1}^{b}{\sum}_{k=1}^{b}{s}_{\mathit{\text{jj}}}{s}_{\mathit{\text{kk}}}\right)={\sum}_{j=1}^{b}{\sum}_{k=1}^{b}\text{E}\left({s}_{\mathit{\text{jj}}}{s}_{\mathit{\text{kk}}}\right)$, with $\text{E}\left[{s}_{\mathit{\text{jj}}}{s}_{\mathit{\text{kk}}}\right]=\mu {\left(4\right)}_{\mathit{\text{jk}}}=2{\nu}_{e}{\sigma}_{\mathit{\text{jk}}}^{2}+{\nu}_{e}^{2}{\sigma}_{\mathit{\text{jj}}}{\sigma}_{\mathit{\text{kk}}}$. Thus,

Finally, $\mathit{S}={\nu}_{e}{\hat{\mathbf{\Sigma}}}_{\ast}\sim {\mathcal{W}}_{b}\left({\nu}_{e},{\mathbf{\Sigma}}_{\ast}\right)$ has E (** S**)=

*ν*

_{ e }

**Σ**

_{∗}(Muller and Stewart [6], Theorem 10.10). Hence E [tr(

**)]=tr[E(**

*ΔS***)]=tr[**

*ΔS***E(**

*Δ***)]=tr[**

*S***(**

*Δ**ν*

_{ e }

**Σ**

_{∗})]=

*ν*

_{ e }tr(

*ΔΣ*_{∗}) and

### Proof of lemma 1

Substituting equivalent terms from equations A.1-A.5 into (*λ*
_{∗2}
*a* *b* *ν*
_{∗2})/(*λ*
_{∗1}
*ν*
_{∗1}
*b* *ν*
_{
e
}) allows combining like terms and simplifying to give

If ${T}_{u}=\left[\text{tr}\right(\hat{\mathbf{\Delta}}\left)/a\right]/\left[\text{tr}\right({\hat{\mathbf{\Sigma}}}_{\ast}\left)\right]$, then $\mathit{\text{tr}}\left(\hat{\mathbf{\Delta}}\right)\approx {\lambda}_{\ast 1}{y}_{\ast 1}$ and ${\nu}_{e}\text{tr}\left({\hat{\mathbf{\Sigma}}}_{\ast}\right)\approx {\lambda}_{\ast 2}{y}_{\ast 2}$ with *y*
_{∗1}∼*χ*
^{2}(*ν*
_{∗1},*ω*
_{∗}) and *y*
_{∗2}∼*χ*
^{2}(*ν*
_{∗2}) as described in Muller et al. [5]. In turn,

Pr{(*y*
_{∗1}/*ν*
_{∗1})/(*y*
_{∗2}/*ν*
_{∗2})≤*t* *λ*
_{∗2}
*a* *b* *ν*
_{∗2}/(*λ*
_{∗1}
*ν*
_{∗1}
*b* *ν*
_{
e
})}=*F*
_{
F
}(*t*;*ν*
_{∗1},*ν*
_{∗2},*ω*
_{∗}).

### Proof of lemma 2

Using Lemma B.2, unbiased estimators for tr^{2}(**Σ**
_{∗}) and $\text{tr}\left({\mathbf{\Sigma}}_{\ast}^{2}\right)$ are ${\hat{\tau}}_{1}=\phantom{\rule{1em}{0ex}}\left[{\text{tr}}^{2}\right({\hat{\mathbf{\Sigma}}}_{\ast})-2{\left({\nu}_{e}+1\right)}^{-1}\text{tr}({\hat{\mathbf{\Sigma}}}_{\ast}^{2}\left)\right]{\{1-2{\left[{\nu}_{e}\left({\nu}_{e}+1\right)\right]}^{-1}\}}^{-1}$ and ${\hat{\tau}}_{2}=\left[{\nu}_{e}^{2}\text{tr}\right({\hat{\mathbf{\Sigma}}}_{\ast}^{2})-{\nu}_{e}{\text{tr}}^{2}({\hat{\mathbf{\Sigma}}}_{\ast}\left)\right]{\left[{\nu}_{e}\left({\nu}_{e}+1\right)-2\right]}^{-1}$, as introduced in Gribbin [18] and Chi et al. [19]. As shown in Lemma B.2, $\text{tr}\left({\hat{\mathbf{\Sigma}}}_{\ast}\right)$ and $\text{tr}\left({\hat{\mathbf{\Sigma}}}_{\ast}\mathbf{\Delta}\right)$ are unbiased estimators for $\text{tr}\left({\hat{\mathbf{\Sigma}}}_{\ast}\right)$ and $\text{tr}\left({\hat{\mathbf{\Sigma}}}_{\ast}\mathbf{\Delta}\right)$, respectively. Thus,

In the null case ** Δ**=

**and ${\stackrel{~}{\u220a}}_{n}$ reduces to the rank-adjusted sphericity estimator, ${\stackrel{~}{\u220a}}_{r}=\left[\left({\nu}_{e}\phantom{\rule{0.3em}{0ex}}+\phantom{\rule{0.3em}{0ex}}1\right)\right.\left(\right)close="]">\phantom{\rule{0.3em}{0ex}}b\hat{\u220a}-2$ $\left[b\left({\nu}_{e}-b\hat{\u220a}\right)\right]={\stackrel{~}{\u220a}}_{n}|\mathbf{\Delta}=0$.**

*0*## Appendix C: Simulation details

### Covariance conditions

Covariance conditions 5-8 from Table III of Coffey and Muller [24] were used for each example described below: **Σ**
_{∗}=Dg(*λ*_{
j
}) for *j*∈{1,2,3,4}, with

Thus, *∊*∈{0.28,0.51,0.72,1.00}. Given **Σ**
_{∗}=Dg(*λ*_{
j
}), it follows that **Σ**=*UΣ*_{∗}
*U*^{′}.

### CLAHE mammography example

Computer scientists developed the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm to improve contrast in medical images. Independent observers considered 3×3=9 Clip ×Region combinations. Region denotes the size of the image (pixels^{2}) at which contrasts are controlled and Clip level limits the maximum contrast adjustment. In the multivariate model ** X**=

*1*_{ N }, while within-person factors Clip and Region gave

**, (**

*Y**N*×9). Also

**, (1×9), contained mean log10(contrast) for the unprocessed condition minus the mean for each of the nine combinations of Clip and Region (**

*B**β*

_{cr}=

*μ*

_{unprocessed}−

*μ*

_{cr}). If

*T*_{c}contains orthonormal linear and quadratic trends for log2(Clip) ∈ {1,2,4}, and

*T*_{r}does the same for log2(Region) ∈ {1,3,5}, then the 9 × 4 within-persons contrast matrix,

*U*_{cr}is

With L the linear and Q the quadratic trends for interaction components being tested, ${\mathit{U}}_{\text{cr}}=\left[\begin{array}{llll}{\mathit{u}}_{\text{LL}}& {\mathit{u}}_{\text{LQ}}& {\mathit{u}}_{\text{QL}}& {\mathit{u}}_{\text{QQ}}\end{array}\right]$.

All four covariance patterns were factorially combined with *N*∈{10,20,40}. The multivariate test considered ${\mathit{\Theta}}_{\text{cr}}={\beta}_{P}\xb7\left[\begin{array}{llll}0.5& 1.0& -1.0& 0.5\end{array}\right]$ with *α*=0.05, and *β*
_{
P
} the scaling factor for ** B** corresponding to approximate target power

*P*∈{0.20,0.50,0.80} for the Geisser-Greenhouse approximation using methods in Muller et al. [5]. The conditions in the example were used in section ‘Simulation 1 with rank

**(one-group repeated measures ANOVA)’. More details of the example are in Muller et al. [5].**

*(X)=1*### Test of interaction with rank (*X*)*>1*Example

*X*

*>1*

All cases used 5 repeated measures, *N*∈{16,32,48}, and rank (** X**)∈{2,4,8,16}. For obvious reasons, a rank of

**equal to 16 was not considered for the smallest sample size. All four covariance patterns were factorially combined with the sample sizes and ranks**

*X***. In the multivariate model,**

*X***=**

*X*

*I*_{ q }⊗

*1*_{ r e p n }, such that

*r*

*e*

*p*

*n*=

*N*/

*q*, and ⊗ is a Kronecker product. If

then ** B**=

*β*

_{ P }·

*D*_{ a }, such that

*β*

_{ P }was the scaling factor giving approximate target power

*P*∈{0.20,0.50,0.80}, for the rank-adjusted Huynh-Feldt power approximation. The within-subject contrast,

**, (5×4), contained orthonormal linear, quadratic, cubic and quartic trends:**

*U*Each row of the between-subject contrast, ** C**, a (

*q*−1×

*q*) orthonormal matrix, contained one of the (

*q*−1) trends. The contrasts define a test of interaction of between- and within-subject trends. Without loss of generality, we assumed

**Θ**

_{0}=

**. A test size,**

*0**α*, of 0.05 was used.

### Computational methods

All power computations were conducted in SAS/IML (SAS 9.1, SAS Institute, 2003). Free software LINMOD 3.4 was used for all data analysis and includes new methods. Free software POWERLIB 2.1 in Johnson et al. [25] was used for all power analysis and includes the new methods. Both are available at http://health-outcomes-policy.ufl.edu/faculty-directory/-muller-keith/list-of-software/. UNIREP power is also available in GLIMMPSE, a free web-browser based program with a graphical user interface aimed at health scientists ( http://www.SampleSizeShop.org). The next version of GLIMMPSE is expected to implement the confidence interval methods.

## References

- 1.
Kraemer HC, Mintz J, Noda A, Tinklenberg J, Jerome A, Yesavage: Caution regarding the use of pilot studies to guide power calculations for study proposals. Arch Gen Psychiatry. 2006, 63: 484-489. 10.1001/archpsyc.63.5.484.

- 2.
Maxwell SE: The persistence of underpowered studies in psychological research:causes, consequences, and remedies. Psychol Methods. 2004, 9: 147-163.

- 3.
Taylor DJ, Muller KE: Bias in linear model power and sample size calculation due to estimating noncentrality. Commun Stat Theory Methods. 1996, 25: 1595-1610. 10.1080/03610929608831787.

- 4.
Muller KE, Pasour VB: Bias in linear model power and sample size due to estimating variance. Commun Stat Theory Methods. 1997, 26: 839-851. 10.1080/03610929708831953.

- 5.
Muller KE, Edwards LJ, Simpson SL, Taylor DJ: Statistical tests with accurate size and power for balance linear mixed models. Stat Med. 2007, 26: 3639-3660. 10.1002/sim.2827.

- 6.
Muller KE, Stewart PW: Linear Model, Theory: Univariate, Multivariate and Mixed Models. 2006, New York: Wiley

- 7.
Box GEP: Some theorems on quadratic forms applied in the study of analysis of variance problems: I. Effects of inequality of variance in a one-way classification. Ann Math Stat. 1954, 25: 290-302. 10.1214/aoms/1177728786.

- 8.
Box GEP: Some theorems on quadratic forms applied in the study of analysis of variance problems: II. Effects of inequality of variance and of correlation between errors in the two-way classification. Ann Math Stat. 1954, 25: 484-498. 10.1214/aoms/1177728717.

- 9.
Geisser S, Greenhouse SW: An extension of Box’s results on the use of the F distribution in multivariate analysis. Ann Math Stat. 1958, 29: 885-891. 10.1214/aoms/1177706545.

- 10.
Greenhouse SW, Geisser S: On methods in the analysis of profile data. Psychometrika. 1959, 24: 95-112. 10.1007/BF02289823.

- 11.
Huynh H, Feldt LS: Estimation of the Box corrections for degrees of freedom from sample data in randomized block and split-plot designs. J Edu Stat. 1976, 1: 69-82.

- 12.
Davies RB: Distribution of a linear combination of chi-square random variables. Appl Stat. 1980, 29: 323-333. 10.2307/2346911.

- 13.
Muller KE, Barton CN: Approximate power for repeated measures ANOVA lacking sphericity. J Am Stat Assoc. 1989, 84: 549-555. 10.1080/01621459.1989.10478802.

- 14.
Muller KE, Barton CN: Correction to Approximate power for repeated-measures ANOVA lacking sphericity. J Am Stat Assoc. 1991, 86: 255-256.

- 15.
Browne RH: On the use of a pilot sample for sample size determination. Stat Med. 1995, 14: 1933-1940. 10.1002/sim.4780141709.

- 16.
Taylor DJ, Muller KE: Computing confidence bounds for power and sample size of the general linear univariate model. Am Stat. 1995, 49: 43-47.

- 17.
Lecoutre B: A correction for the $\stackrel{~}{\u220a}$, approximate test with repeated measures design with two or more independent groups. J Educ Stat. 1991, 16: 371-372.

- 18.
Gribbin MJ: Better power methods for the univariate approach to repeated measures. PhD Dissertation. 2007, University of North Carolina at Chapel Hill, Department of Biostatistics

- 19.
Chi YY, Gribbin MJ, Lamers Y, Gregory JF, Muller KE: Global hypothesis testing for high-dimensional repeated measures outcomes. Stat Med. 2012, 31: 724-742. 10.1002/sim.4435.

- 20.
Muller KE, Fetterman BA: Regression and, ANOVA: An Integrated Approach Using SAS®; Software Chapter 17. 2002, Cary: SAS Institute

- 21.
Morrison DF: Multivariate Statistical, Methods. 1990, New York: McGraw-Hill

- 22.
Kim HY, Gribbin MJ, Muller KE, Taylor DJ: Analytic, computational and approximate forms for ratios of noncentral and central Gaussian quadratic forms. J Comput Graphical Stat. 2006, 15: 443-459. 10.1198/106186006X112954.

- 23.
Wishart J: The generalized product moment distribution in samples from a normal multivariate population. Biometrika. 1928, 20A: 32-52.

- 24.
Coffey CS, Muller KE: Properties of internal pilots with the univariate approach to repeated measures. Stat Med. 2003, 22: 2469-2485. 10.1002/sim.1466.

- 25.
Johnson JL, Muller KE, Slaughter JC, Gurka MJ, Gribbin MJ, and Simpson SL: POWERLIB: SAS/IML software for computing power in multivariate linear models. J Stat Softw. 2009, 30: 1-27. [http://www.jstatsoft.org/v30/i05/paper]

### Pre-publication history

The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1471-2288/13/57/prepub

## Acknowledgements

We thank the reviewers Drs. Warren Comulada, Peter Lachenbruch, and Russell Lenth for their timely review and beneficial comments. This work was supported in part by a UF CTSI core grant to Chi and Muller (NCATS UL1TR000064). Chi’s support included NIDCR R01-DE020832, NIDDK R01-DK072398, NIDA R01-DA031017, and NICHD P01-HD065647. Muller’s support included NIDCR R01-DE020832-01A1, NIDDK R01-DK072398, NIDCR U54-DE019261, NCRR K30-RR022258, NHLBI R01-HL091005, NIAAA R01-AA013458-01, and NIDA R01-DA031017.

## Author information

## Additional information

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

The majority of the work comes directly from the dissertation of MJG at UNC Chapel Hill, in partial fulfillment of the requirements of the DrPH, under the direction of KEM. Dr. KEM and Dr. Y-YC assisted in revising and compressing the notation, text and tables. Dr. PWS was a member of the dissertation committee and assisted in formulating the proof of simultaneous coverage of the confidence regions and in final editing. All authors read and approved the final manuscript.

## Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

## Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

#### Received

#### Accepted

#### Published

#### DOI

### Keywords

- Sample size
- Replication study
- Study planning
- Univariate approach
- UNIREP