Confidence regions for repeated measures ANOVA power curves based on estimated covariance

Background
Using covariance or mean estimates from previous data introduces randomness into each power value in a power curve. Creating confidence intervals about the power estimates improves study planning by allowing scientists to account for the uncertainty in the power estimates. Driving examples arise in many imaging applications.

Methods
We use both analytical and Monte Carlo simulation methods. Our analytical derivations apply to power for tests with the univariate approach to repeated measures (UNIREP). Approximate confidence intervals and regions for power based on an estimated covariance matrix and fixed means are described. Extensive simulations are used to examine the properties of the approximations.

Results
Closed-form expressions are given for approximate power and confidence intervals and regions. Monte Carlo simulations support the accuracy of the approximations for practical ranges of sample size, rank of the design matrix, error degrees of freedom, and the amount of deviation from sphericity. The new methods provide accurate coverage probabilities for all four UNIREP tests, even for small sample sizes. Accuracy is higher for higher power values than for lower power values, making the methods especially useful in practical research conditions. The new techniques allow the plotting of power confidence regions around an estimated power curve, an approach that has been well received by researchers. Free software makes the new methods readily available.

Conclusions
The new techniques allow a convenient way to account for the uncertainty of using an estimated covariance matrix in choosing a sample size for a repeated measures ANOVA design. Medical imaging and many other types of healthcare research often use repeated measures ANOVA.

http://www.biomedcentral.com/1471-2288/13/57
useful for study planning. For example, a lower bound for power would allow stating that a test has power of at least "P" to detect an effect, with a specified confidence. A confidence region for a power curve would be even more informative.
Medical imaging research motivated the work here because it often generates the type of complete data that can be handled with the univariate approach to repeated measures (UNIREP). Muller et al. [5] reviewed the advantages gained by being able to use the UNIREP model, a special case of the general linear mixed model. The same authors described accurate and convenient power approximations for UNIREP analysis. The four UNIREP tests, Box conservative, Geisser-Greenhouse, Huynh-Feldt, and the uncorrected, all use the same test statistic. For data analysis, UNIREP tests differ only in their degrees of freedom, which are scaled by different multipliers that measure sphericity in the error covariance for the hypothesis variables. Muller and Stewart [6] provided detailed discussion of the basic theory for both the null and non-null cases. Earlier work detailed basic UNIREP theory. Box [7,8], Geisser and Greenhouse [9,10] and Huynh and Feldt [11] gave null results. Davies [12] and Muller and Barton [13,14] treated the non-null case.
Browne [15] evaluated the impact of using a pilot study to estimate the variance for a t-test. More generally, Taylor and Muller [16] demonstrated how to construct exact power confidence intervals for the general linear univariate model for a data-estimated variance and fixed means. The same authors also generalized the result to provide an exact confidence region around a power curve. Parallel results for the UNIREP setting would be equally useful. We generalize the methods in Taylor and Muller [16] to UNIREP tests for repeated measures. We use analytic and simulation results to demonstrate that the techniques allow computing approximate confidence intervals and regions for power with good accuracy for the UNIREP tests, based on an estimated covariance matrix and fixed means.

Existing results
A vector $z$ $(n \times 1)$ is lower case bold. A matrix $Z$ is upper case bold, with transpose $Z'$, inverse $Z^{-1}$, and generalized inverse $Z^{-}$. Also, $1_n$ is an $(n \times 1)$ vector of 1's and $I_n$ is an $(n \times n)$ identity matrix. A diagonal matrix with $(i, i)$ element $z_i$ is written $\mathrm{Dg}(z)$. The expected value, variance, and trace are $E(Z)$, $V(Z)$, and $\mathrm{tr}(Z)$, respectively. Throughout, $Z \sim \chi^2(\nu, \omega)$ indicates that $Z$ has a noncentral chi-square distribution with $\nu$ degrees of freedom and noncentrality $\omega$, while $Z \sim \chi^2(\nu)$ indicates a central distribution. Similarly, $Z \sim F(\nu_1, \nu_2, \omega)$ indicates $Z$ has a noncentral $F$ distribution with $\nu_1$ numerator and $\nu_2$ denominator degrees of freedom and noncentrality $\omega$, with cumulative distribution function $F_F(z; \nu_1, \nu_2, \omega)$. A central $F$ is written $Z \sim F(\nu_1, \nu_2)$, with quantile $q$ indicated $F_F^{-1}(q; \nu_1, \nu_2)$. Writing $z \sim N_p(\mu, \Sigma)$ indicates $z$ $(p \times 1)$ is Gaussian with mean $\mu$ and covariance $\Sigma$ $(p \times p)$. If $Z$ $(N \times p)$ has independent rows and $[\mathrm{row}_i(Z)]' \sim N_p(\mu_i, \Sigma)$, then $S = Z'Z \sim W_p(N, \Sigma, \Omega)$ indicates $S$ follows a Wishart distribution with $N$ degrees of freedom, covariance $\Sigma$, and noncentrality $\Omega = E(Z')E(Z)\Sigma^{-1}$.
The general linear multivariate model, $Y = XB + E$, assumes $N$ independent rows and $[\mathrm{row}_i(Y)]' \sim N_p\{[\mathrm{row}_i(X)B]', \Sigma\}$. In the model, $X$ is the fixed, known design matrix with $1 \le \mathrm{rank}(X) \le q$, and $B$ contains fixed, unknown regression coefficients. For repeated measures ANOVA, one-group designs have $\mathrm{rank}(X) = 1$, and two-group comparisons have $\mathrm{rank}(X) = 2$. The associated general linear hypothesis is $H_0: \Theta = \Theta_0$, with $\Theta = CBU$, such that $C$ defines the between-subject effects (rank $a$) while $U$ defines the within-subject effects (rank $b$). Requiring estimable $\Theta$ and full rank $\{C, U\}$ ensures a testable hypothesis. Appropriate selections of the contrast matrices ($C$ and $U$) and null matrix ($\Theta_0$) allow testing important one-degree-of-freedom parameters, such as the difference between two means, or a comparison of two trends.

The covariance matrix among the hypothesis variables is $\Sigma_* = U'\Sigma U = \Upsilon \mathrm{Dg}(\lambda) \Upsilon'$, with $\Upsilon\Upsilon' = \Upsilon'\Upsilon = I_b$, and $\lambda = \{\lambda_k\}$ the eigenvalues. Estimates are $\hat{B} = (X'X)^{-}X'Y$, $\hat{\Theta} = C\hat{B}U$, and $\hat{\Sigma}_* = U'\hat{\Sigma}U$, with $\hat{\Sigma} = (Y - X\hat{B})'(Y - X\hat{B})/\nu_e$ and $\nu_e = N - \mathrm{rank}(X)$ the error degrees of freedom. Furthermore, with $M = C(X'X)^{-}C'$, $\Delta = (\Theta - \Theta_0)'M^{-1}(\Theta - \Theta_0)$ and $\hat{\Delta} = (\hat{\Theta} - \Theta_0)'M^{-1}(\hat{\Theta} - \Theta_0)$. The sum of squares hypothesis matrix is $S_H = \hat{\Delta}$ and the sum of squares error matrix is $S_E = \nu_e\hat{\Sigma}_*$, which are independent of one another. The notation follows that in Muller and Stewart [6]. Additional notation is in Appendix A.

The univariate approach to repeated measures can be expressed in terms of the general linear multivariate model. The Box conservative (Box), the Geisser-Greenhouse (GG), the Huynh-Feldt (HF), and the uncorrected (Un) UNIREP tests use the same test statistic, $T_u = [\mathrm{tr}(S_H)/(ab)]/[\mathrm{tr}(S_E)/(b\nu_e)]$, and a central $F$ distribution to approximate the null distribution of $T_u$. The sphericity parameter, $\epsilon = \mathrm{tr}^2(\Sigma_*)/[b\,\mathrm{tr}(\Sigma_*^2)]$, quantifies the spread of population eigenvalues and is used to discount the degrees of freedom. The term sphericity reflects the fact that uncorrelated Gaussian variables with equal variances in three dimensions have a spherical scattergram. The eigenvalues of $\Sigma_*$ are the variances of the (uncorrelated) principal components of the hypothesis response variables.
Perfect sphericity requires $\epsilon = 1$, which occurs with all eigenvalues equal. Minimal sphericity has $\epsilon = 1/b$, which occurs with one nonzero eigenvalue. Other patterns of $\Sigma_*$ have $1/b < \epsilon < 1$.
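As a small numerical illustration of the sphericity definition, the following Python sketch computes $\epsilon$ directly from a set of eigenvalues, using $\mathrm{tr}(\Sigma_*) = \sum_k \lambda_k$ and $\mathrm{tr}(\Sigma_*^2) = \sum_k \lambda_k^2$. The function name `sphericity` is hypothetical and is not part of LINMOD or POWERLIB; the third eigenvalue pattern is the one quoted for $b = 3$ in the description of Simulation 1.

```python
def sphericity(eigenvalues):
    """Sphericity: (sum of eigenvalues)^2 / (b * sum of squared eigenvalues)."""
    lam = [float(v) for v in eigenvalues]
    b = len(lam)
    total = sum(lam)
    total_sq = sum(v * v for v in lam)
    if b == 0 or total_sq == 0.0:
        raise ValueError("need at least one nonzero eigenvalue")
    return total * total / (b * total_sq)


equal = sphericity([2.0, 2.0, 2.0])       # all eigenvalues equal: perfect sphericity
single = sphericity([1.0, 0.0, 0.0])      # one nonzero eigenvalue: minimal sphericity, 1/b
midrange = sphericity([1.0, 0.12, 0.12])  # midrange pattern, close to 0.50
```

The two extreme patterns reproduce the bounds $1$ and $1/b$, and any other pattern falls strictly between them.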
The Box conservative test uses the fixed lower bound of $\epsilon$, $1/b$, while the uncorrected test uses the fixed upper bound, 1. With sphericity ($\epsilon = 1$), the uncorrected test is exact and uniformly most powerful (among similarly invariant tests). The Geisser-Greenhouse and Huynh-Feldt tests use the observed data to estimate $\epsilon$. The Geisser-Greenhouse estimator, $\hat{\epsilon} = \mathrm{tr}^2(\hat{\Sigma}_*)/[b\,\mathrm{tr}(\hat{\Sigma}_*^2)]$, is the maximum likelihood (ML) estimator. The Huynh-Feldt estimator, $\tilde{\epsilon} = (Nb\hat{\epsilon} - 2)/[b(N - \mathrm{rank}(X) - b\hat{\epsilon})]$, was proposed as the ratio of two unbiased estimators. Their claim holds only for the special case of $\mathrm{rank}(X) = 1$. Lecoutre [17] provided a more general form. In turn, Gribbin [18] and Chi et al. [19] described a rank-adjusted, approximately unbiased estimator, $\tilde{\epsilon}_r$, which applies to any general linear multivariate model. The rank-adjusted power approximation was shown through simulations to approximate observed mean power values as well as, or better than, the Huynh-Feldt power approximation (Chi et al. [19]). Only the rank-adjusted Huynh-Feldt estimator will be considered in the remainder of the paper.
Although the four UNIREP tests all use the same test statistic, they each use a different measure of sphericity, here indicated $e$. For data analysis, all four tests use a critical value $q(e) = F_F^{-1}(1 - \alpha; \nu_1 e, \nu_2 e)$. Here $\nu_1 = ab$ and $\nu_2 = b\nu_e$. The Box test uses $e = 1/b$, the GG test uses $e = \hat{\epsilon}$, HF uses $e = \tilde{\epsilon}$, and the uncorrected test uses $e = 1$. The p-value is then computed, for observed test statistic $t$, as $p = 1 - F_F(t; \nu_1 e, \nu_2 e)$. In all cases $1/b \le \hat{\epsilon} \le \tilde{\epsilon} \le 1$. In turn, the p-values always have the reverse order, with the Box p-value being largest, and the uncorrected being smallest.
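The way the multiplier $e$ enters the critical value and the p-value can be sketched in Python with SciPy. This is an illustrative sketch only, not the LINMOD implementation: the function names are hypothetical, the Huynh-Feldt variant is omitted (it would enter the same dictionary once its estimate is available), and the observed statistic `t_obs = 6.0` and the diagonal covariance estimate are made-up inputs.

```python
import numpy as np
from scipy import stats


def gg_epsilon(sigma_star_hat):
    """Geisser-Greenhouse estimate: tr^2(S) / (b * tr(S @ S))."""
    s = np.asarray(sigma_star_hat, dtype=float)
    b = s.shape[0]
    return float(np.trace(s) ** 2 / (b * np.trace(s @ s)))


def unirep_decisions(t_obs, sigma_star_hat, a, nu_e, alpha=0.05):
    """Critical value q(e) and p-value for each degrees-of-freedom multiplier e."""
    b = np.asarray(sigma_star_hat).shape[0]
    nu1, nu2 = a * b, b * nu_e
    multipliers = {
        "Box": 1.0 / b,                       # fixed lower bound
        "GG": gg_epsilon(sigma_star_hat),     # estimated from the data
        "Un": 1.0,                            # fixed upper bound
    }
    results = {}
    for name, e in multipliers.items():
        crit = stats.f.ppf(1.0 - alpha, nu1 * e, nu2 * e)
        p_value = stats.f.sf(t_obs, nu1 * e, nu2 * e)
        results[name] = {"e": e, "critical": float(crit), "p": float(p_value)}
    return results


# Hypothetical one-group example: a = 1, b = 3, N = 10, so nu_e = 9.
example = unirep_decisions(
    t_obs=6.0, sigma_star_hat=np.diag([1.0, 0.12, 0.12]), a=1, nu_e=9
)
```

In this example the Box test has the largest critical value and p-value and the uncorrected test the smallest, matching the ordering described above.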

Estimating approximate UNIREP power with estimated covariance and fixed means
By extending results in Muller et al. [5], the following lemma helps simplify the $F$ approximations. Appendix B contains all proofs.

For known covariance and means, the power approximations for the Box, Geisser-Greenhouse, rank-adjusted Huynh-Feldt, and uncorrected tests are all of the same form, differing only in the multipliers $e_1$ through $e_5$. Here, $\bar{\lambda}$ is equal to $\mathrm{tr}(\Sigma_*)/b$ with $b$ equal to the rank of $\Sigma_*$, and $\omega_* = \mathrm{tr}(\Delta)/(\bar{\lambda}/e_5)$. Table 1 contains values for $e_1$ through $e_5$ for the four UNIREP tests, where $\epsilon_d = \epsilon = \mathrm{tr}^2(\Sigma_*)/[b\,\mathrm{tr}(\Sigma_*^2)]$, and $\epsilon_n = [\mathrm{tr}^2(\Sigma_*) + 2\,\mathrm{tr}(\Sigma_*)\,\mathrm{tr}(\Delta/a)]/\{b\,[\mathrm{tr}(\Sigma_*^2) + 2\,\mathrm{tr}(\Sigma_*\Delta/a)]\}$. The expressions for $\epsilon_d$ and $\epsilon_n$ were derived using the properties described in Lemma B.1.
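To make the two sphericity-type quantities concrete, the sketch below evaluates $\epsilon_d$ and $\epsilon_n$ for given $\Sigma_*$, $\Delta$, and $a$, reading $\epsilon_n$ as the ratio $[\mathrm{tr}^2(\Sigma_*) + 2\mathrm{tr}(\Sigma_*)\mathrm{tr}(\Delta/a)]/\{b[\mathrm{tr}(\Sigma_*^2) + 2\mathrm{tr}(\Sigma_*\Delta/a)]\}$. The function names and the example matrices are hypothetical choices for illustration, and the grouping of the denominator is an assumption consistent with $\epsilon_n = 1$ under sphericity.

```python
import numpy as np


def epsilon_d(sigma_star):
    """Null-case sphericity: tr^2(S) / (b * tr(S^2))."""
    s = np.asarray(sigma_star, dtype=float)
    b = s.shape[0]
    return float(np.trace(s) ** 2 / (b * np.trace(s @ s)))


def epsilon_n(sigma_star, delta, a):
    """Non-null analogue, which also depends on the hypothesis matrix delta."""
    s = np.asarray(sigma_star, dtype=float)
    d = np.asarray(delta, dtype=float)
    b = s.shape[0]
    numerator = np.trace(s) ** 2 + 2.0 * np.trace(s) * np.trace(d) / a
    denominator = b * (np.trace(s @ s) + 2.0 * np.trace(s @ d) / a)
    return float(numerator / denominator)


sigma_star = np.diag([1.0, 0.12, 0.12])
no_effect = epsilon_n(sigma_star, np.zeros((3, 3)), a=1)        # reduces to epsilon_d
spherical = epsilon_n(2.0 * np.eye(3), np.diag([1.0, 0.0, 0.0]), a=1)  # equals 1
```

With $\Delta = 0$ the non-null quantity collapses to $\epsilon_d$, and with a spherical $\Sigma_*$ it equals 1 regardless of $\Delta$.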
In practice, some elements of $\{e_1, e_2, e_3, e_4, e_5, \mathrm{tr}(\Delta), \bar{\lambda}\}$ may be estimated and hence random. The random elements imply random power values, as with estimated covariance and fixed means, $\{\hat{\Sigma}_*, \Delta\}$, for $\hat{\Sigma}_* = \hat{E}'\hat{E}/\nu_{est}$, the unbiased restricted maximum likelihood (REML) estimator. A distinction must be carefully maintained between the estimation study and the target study. The estimation study provides the covariance estimate and has sample size $N_{est}$, design matrix rank $\mathrm{rank}(X_{est})$, and $\nu_{est} = N_{est} - \mathrm{rank}(X_{est})$ degrees of freedom. The target study for which power is desired has sample size $N$, design matrix rank $\mathrm{rank}(X)$, and $\nu_e = N - \mathrm{rank}(X)$ degrees of freedom.
The ML estimator from the Geisser-Greenhouse test, $\hat{\epsilon} = \mathrm{tr}^2(\hat{\Sigma}_*)/[b\,\mathrm{tr}(\hat{\Sigma}_*^2)]$, is an obvious estimator for the target study's $\epsilon$. For power analysis, a parallel estimator is available for $\epsilon_n$: $\hat{\epsilon}_n = [\mathrm{tr}^2(\hat{\Sigma}_*) + 2\,\mathrm{tr}(\hat{\Sigma}_*)\,\mathrm{tr}(\Delta/a)]/\{b\,[\mathrm{tr}(\hat{\Sigma}_*^2) + 2\,\mathrm{tr}(\hat{\Sigma}_*\Delta/a)]\}$. A better choice, given in the following lemma, uses a ratio of unbiased estimators. The result generalizes the rank-adjusted Huynh-Feldt estimator for data analysis. Appendix B has derivations of moments as well as all proofs.

Lemma 2.
For the non-null case, a ratio of correlated, but unbiased, estimators gives the estimator $\tilde{\epsilon}_n$ of $\epsilon_n$. The corresponding estimator for the null case is $\tilde{\epsilon}_r$, the rank-adjusted Huynh-Feldt sphericity estimator.
For estimated covariance and fixed means, approximate estimated UNIREP power has the same form as for known covariance, with $\hat{\bar{\lambda}} = \mathrm{tr}(\hat{\Sigma}_*)/b$, and $e_1$ through $e_5$ estimated if unknown (Table 1). Nearly every combination of $\hat{\epsilon}_n$, $\tilde{\epsilon}_n$, $\hat{\epsilon}_d$, $\tilde{\epsilon}_r$, 1 and $1/b$ was examined for each UNIREP test for the wide range of simulations discussed in Muller et al. [5]. The values chosen provided the most accurate results. In retrospect, they are natural choices as well.

Approximate UNIREP power confidence intervals
The solution to the UNIREP problem parallels the solution to the univariate problem in Taylor and Muller [16]. The methods apply to any general linear hypothesis, including one degree-of-freedom contrasts, such as pair-wise group comparisons and differences in linear trend between two groups. Tests giving scalar secondary parameters are also common for one-group designs and two-group comparisons.

For known covariance and means, $e_5$ is defined to be $\epsilon_n$ (Table 1), and the noncentrality is $\omega_* = \mathrm{tr}(\Delta)/\lambda_{*1}$, with $\lambda_{*1} = \bar{\lambda}/\epsilon_n = [\mathrm{tr}(\Sigma_*^2) + 2\,\mathrm{tr}(\Sigma_*\Delta/a)]/[\mathrm{tr}(\Sigma_*) + 2\,\mathrm{tr}(\Delta/a)]$. For estimated covariance and fixed means, a ratio involving one biased and two unbiased estimators (Lemma B.2) gives the estimator $\hat{\lambda}_{*1}$ of $\lambda_{*1}$. In parallel to the univariate setting, the distribution of $\hat{\lambda}_{*1}$ can be approximated with a Satterthwaite approximation: $\nu_*\hat{\lambda}_{*1}/\lambda_{*1}$ is approximately $\chi^2(\nu_*)$. Lower and upper tail probabilities, $\alpha_L$ and $\alpha_U$, respectively, define the confidence coefficient, $1 - \alpha_L - \alpha_U$. Approximate confidence limits for the noncentrality may be calculated using the following: $\Pr\{c_{\alpha_L} \le \nu_*\hat{\lambda}_{*1}/\lambda_{*1} \le c_{\alpha_U}\} \approx 1 - \alpha_L - \alpha_U$, with $c_{\alpha_L}$ and $c_{\alpha_U}$ the $\alpha_L$ and $1 - \alpha_U$ quantiles of $\chi^2(\nu_*)$. Approximate lower and upper bounds are therefore $\hat{\omega}_{*L} = \mathrm{tr}(\Delta)\,c_{\alpha_L}/(\hat{\lambda}_{*1}\nu_*)$ and $\hat{\omega}_{*U} = \mathrm{tr}(\Delta)\,c_{\alpha_U}/(\hat{\lambda}_{*1}\nu_*)$. The strict monotone dependence of the noncentral $F$ function on the noncentrality ensures an approximate confidence interval for power. Lower and upper bounds on power follow by evaluating the approximate power function at $\hat{\omega}_{*L}$ and $\hat{\omega}_{*U}$, with $e_1$ through $e_4$ defined in Table 1 for $\hat{\Sigma}_*$.

Taylor and Muller [16] recommended one-sided power confidence intervals by noting that "the change from a one-sided to a two-sided confidence interval has little effect on the upper bound, but a large effect on the lower bound". Muller and Fetterman [20] provided examples of a one-sided power confidence interval in the univariate case.
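The mechanics of the interval can be sketched in Python as follows. The sketch takes $\hat{\lambda}_{*1}$ and the Satterthwaite degrees of freedom $\nu_*$ as given inputs (their formulas are in Lemma B.2 and are not reproduced here), and it evaluates power with a plain noncentral $F$, which amounts to assuming every multiplier $e_1$ through $e_4$ equals 1. That simplification, the function names, and the numerical inputs are illustrative assumptions rather than the POWERLIB implementation.

```python
import math
from scipy import stats


def noncentrality_bounds(tr_delta, lambda_hat, nu_star, alpha_l=0.025, alpha_u=0.025):
    """Bounds tr(Delta) * c / (lambda_hat * nu_star) from chi-square quantiles."""
    c_l = stats.chi2.ppf(alpha_l, nu_star) if alpha_l > 0 else 0.0
    c_u = stats.chi2.ppf(1.0 - alpha_u, nu_star) if alpha_u > 0 else math.inf
    scale = tr_delta / (lambda_hat * nu_star)
    return scale * c_l, scale * c_u


def approx_power(omega, df1, df2, alpha=0.05):
    """Noncentral F power; simplified by assuming all multipliers e1..e4 equal 1."""
    f_crit = stats.f.ppf(1.0 - alpha, df1, df2)
    if omega <= 0:
        return alpha          # central case: power equals the test size
    if math.isinf(omega):
        return 1.0            # unbounded upper limit for a one-sided interval
    return float(stats.ncf.sf(f_crit, df1, df2, omega))


# Hypothetical inputs: tr(Delta) = 10, lambda_hat = 1, nu_star = 9, df = (3, 27).
omega_lo, omega_hi = noncentrality_bounds(10.0, 1.0, 9.0)
power_lo = approx_power(omega_lo, 3, 27)
power_hat = approx_power(10.0 / 1.0, 3, 27)
power_hi = approx_power(omega_hi, 3, 27)
```

Setting `alpha_u=0` gives a one-sided interval with a lower bound only, the form recommended for study planning.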

Approximate UNIREP power confidence regions for power curves
The new methods allow calculating a confidence interval for a single power value. The logic of a proof in Taylor and Muller [16] guarantees that accurate confidence regions are provided by the point-wise calculations. The proof may be sketched for the present setting as follows. Equations 14-21 establish the validity of the approximate confidence interval for a particular alternative hypothesis, as specified by the scalar constant $\mathrm{tr}(\Delta)$. The randomness in the noncentrality arises from a scalar random variable, $\hat{\lambda}_{*1}$, analogous to a variance. Equation 19 describes a single event with a specified probability. The inequality defining the event, and the associated probability, do not change for different values of the scalar constant $\mathrm{tr}(\Delta)$. The smooth and strictly monotone dependence of power on the noncentrality ensures the validity of equations 20-21. The proof is completed by noting that the monotonicity extends the simultaneity property to the power confidence region. Figure 1 gives an example plot of approximate power confidence regions surrounding the predicted power curve for the rank-adjusted Huynh-Feldt test for $\epsilon = 0.720$. Graphical representations such as Figure 1 help researchers accurately recognize the amount of uncertainty in their power calculation, and lead to better decisions about design.
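The point-wise construction of a region can be sketched as below. The key feature is that the same two chi-square quantiles are reused at every value of $\mathrm{tr}(\Delta)$, so a single random event controls the whole band. As a simplifying assumption the sketch uses a plain noncentral $F$ power function (all multipliers equal to 1) and treats $\hat{\lambda}_{*1}$ and $\nu_*$ as known inputs; names and numbers are hypothetical.

```python
import numpy as np
from scipy import stats


def power_curve_region(tr_delta_grid, lambda_hat, nu_star, df1, df2,
                       alpha=0.05, alpha_l=0.025, alpha_u=0.025):
    """Return (lower, estimate, upper) power arrays over a grid of tr(Delta) > 0."""
    grid = np.asarray(tr_delta_grid, dtype=float)
    f_crit = stats.f.ppf(1.0 - alpha, df1, df2)
    # One pair of quantiles serves every point on the curve.
    c_l = stats.chi2.ppf(alpha_l, nu_star)
    c_u = stats.chi2.ppf(1.0 - alpha_u, nu_star)

    def power(omega):
        return stats.ncf.sf(f_crit, df1, df2, omega)

    omega_hat = grid / lambda_hat
    lower = power(omega_hat * c_l / nu_star)
    upper = power(omega_hat * c_u / nu_star)
    return lower, power(omega_hat), upper


grid = np.linspace(0.5, 30.0, 60)
lower, estimate, upper = power_curve_region(grid, lambda_hat=1.0, nu_star=9.0,
                                            df1=3, df2=27)
```

Plotting `lower`, `estimate`, and `upper` against `grid` gives a band of the kind shown in Figure 1.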
In some cases scientists prefer to consider sample size as a function of the pattern of mean differences. The theory already presented allows plotting sample size as a function of mean difference, albeit with a shift in algorithm. The power function must be numerically inverted to solve for the sample size desired. Taylor and Muller [16] outlined the steps of the algorithm needed for the univariate case. Details are not presented here for the sake of brevity.
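One simple way to invert a power function numerically is a direct search over $N$, sketched below. The sketch is not the algorithm of Taylor and Muller [16]; it assumes a balanced design in which the noncentrality grows linearly with $N$ (the hypothetical parameter `omega_per_subject`), and it uses uncorrected degrees of freedom $ab$ and $b[N - \mathrm{rank}(X)]$ with a plain noncentral $F$.

```python
from scipy import stats


def power_at_n(n, omega_per_subject, a, b, rank_x, alpha=0.05):
    """Noncentral F power at total sample size n (assumes noncentrality = n * rate)."""
    df1, df2 = a * b, b * (n - rank_x)
    f_crit = stats.f.ppf(1.0 - alpha, df1, df2)
    return float(stats.ncf.sf(f_crit, df1, df2, n * omega_per_subject))


def smallest_n(target_power, omega_per_subject, a, b, rank_x,
               alpha=0.05, n_max=10000):
    """First n with positive error degrees of freedom that reaches the target power."""
    for n in range(rank_x + 1, n_max + 1):
        if power_at_n(n, omega_per_subject, a, b, rank_x, alpha) >= target_power:
            return n
    raise ValueError("target power not reached by n_max")


n_needed = smallest_n(0.80, omega_per_subject=0.5, a=1, b=3, rank_x=1)
```

To plan against a lower confidence bound rather than the point estimate, the same search can be run after shrinking `omega_per_subject` by the factor $c_{\alpha_L}/\nu_*$.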

Simulation overview
The accuracy of the new approximate confidence intervals is evaluated for a wide range of conditions. Appendix C contains more details of the simulations and examples. All simulations were conducted in SAS/IML (SAS 9.1, SAS Institute, 2003) using a version of LINMOD 3.4 modified to include the rank-adjusted Huynh-Feldt estimator and test. Predicted power values and approximate power confidence intervals were computed using a similarly modified version of POWERLIB 2.03. The modified versions of LINMOD and POWERLIB are available at http://www.health-outcomes-policy.ufl.edu/muller/.

Simulation 1 with rank (X) = 1 (one-group repeated measures ANOVA)
The accuracy of the new approximate confidence intervals was evaluated for a completely within-subject design with $p = 9$ repeated measures, $N \in \{10, 20, 40\}$, and $q = \mathrm{rank}(X) = 1$. Values for $B$, contrast matrices $C$ and $U$, and $\Theta_0$ were chosen to test a within-subject interaction for $\alpha = 0.05$. The model was chosen to ensure predicted power values for the Geisser-Greenhouse test of 0.20, 0.50, and 0.80, using the power approximation in Muller et al. [5]. Population covariance matrices were chosen to provide $\epsilon \in \{0.282, 0.505, 0.720, 1.00\}$. The sphericity values were selected to cover a range of eigenvalue patterns (i.e., patterns of the principal component variances) arising from the structure of $\Sigma_*$. For example, if $b = 3$, then $\lambda = [1\ 0.12\ 0.12]'$ gives $\epsilon \approx 0.50$. In turn, $\lambda = [1\ 0\ 0]'$ gives $\epsilon = 1/3 \approx 0.33$. Pseudo-random realizations of the error matrix, $E$, were generated and tests were calculated. The observed mean power values for the four UNIREP tests were calculated and tabulated for 500,000 replications per condition.
For the conditions described above, additional pseudo-random realizations of the error matrix were generated using an estimating study with sample size, $N_{est}$, of 10 and rank of $X$, $\mathrm{rank}(X_{est})$, of 1 with 500,000 replications per condition for all four UNIREP tests. Corresponding estimated covariance matrices were calculated, as well as lower and upper bounds for power. Both one- and two-sided confidence intervals were evaluated with target coverages of 90% and 95%. The number of replications gave a standard error of estimated coverage probability less than or equal to 0.0003 for $1 - \alpha = 0.95$, and 0.0004 for $1 - \alpha = 0.90$, nearly guaranteeing 3 digits of accuracy. Only coverage of observed mean power values, and not predicted, was tabulated. The accuracy of the predicted power values, with respect to the observed, made it essentially redundant to consider both.
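The quoted standard errors follow from the usual binomial formula for a proportion estimated from $R$ independent replications, evaluated at the target coverage. The arithmetic below is a worked check, with $R$ and $\hat{p}$ introduced here only as notation:

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{R}}, \qquad R = 500{,}000

\sqrt{\frac{0.95 \times 0.05}{500{,}000}} \approx 0.00031, \qquad
\sqrt{\frac{0.90 \times 0.10}{500{,}000}} \approx 0.00042
```

To four decimal places these are the values 0.0003 and 0.0004 given in the text.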
Only the worst case results for two-sided 95% confidence intervals are presented here. The worst cases occurred with the smallest sample size for the target study. Table 2 contains results for the Box conservative test with a target sample size of 10. For a wide range of sphericity values and target power values, the target 95% estimated coverage is consistently reached. The two cases in which the target coverage is not reached occur with large population sphericity and low power. Under these conditions, the Box conservative test would not be used in practice. Table 2 also contains coverage results for the Geisser-Greenhouse and the rank-adjusted Huynh-Feldt tests. The target 95% estimated coverage is consistently reached for extreme sphericity values for both tests. For midrange sphericity values, the coverage fell below the target coverage by 0.8% to 7.3% for the Geisser-Greenhouse test, and by 1.4% to 12.1% for the rank-adjusted Huynh-Feldt test. Coverage accuracy improved as the estimated power increased. In practice, lower power values are of little concern. For target power of 0.80 (set using the Geisser-Greenhouse approximation), the largest deviation from the target 95% estimated coverage was 2.6% for the Geisser-Greenhouse test and 4.1% for the rank-adjusted Huynh-Feldt test. Both occurred for the population sphericity value of 0.505.
Only a spherical case is appropriate to consider for the uncorrected test because otherwise the test will have inflated test size. Simulation results in Table 3 show that the approximation for the uncorrected test (with sphericity) always reached the target estimated coverage. The conservative bias could be eliminated by using optimal maximum likelihood estimates for the common variance and covariance (Morrison [21]), rather than the unstructured covariance estimate. Additional small changes are needed, associated with degrees of freedom, and corresponding to making all choices of $e_1$ through $e_5$ equal to 1.
Although not presented here, in general, the accuracy of the coverage improved directly with increasing sample size, for all tests and conditions. The accuracy of the approximate confidence bounds for all four UNIREP tests also improved as the population sphericity increased.

Simulation 2 with rank(X) > 1
All of the simulations in the second example considered the condition of rank of $X$ greater than 1. The cases used $p = 5$ repeated measures, $N \in \{16, 32, 48\}$, and $q = \mathrm{rank}(X) \in \{2, 4, 8, 16\}$, corresponding, under the cell-mean coding $X = I_q \otimes 1_{repn}$ used for the example, to a two-, four-, eight-, and sixteen-group comparison, respectively. Appropriate fixed matrices of regression coefficients, $B$, contrast matrices, $C$ and $U$, and $\Theta_0$ were chosen. Observed mean power values were simulated and tabulated in a similar manner to that described in section 'Simulation 1 with rank (X) = 1 (one-group repeated measures ANOVA)'. Pseudo-random realizations of the error matrix were generated using an estimating study with sample size, $N_{est}$, of 16 and rank of $X$, $\mathrm{rank}(X_{est})$, of 4 with 500,000 replications per condition for all four UNIREP tests. Corresponding estimated covariance matrices were calculated, as well as lower and upper bounds for power using the methods presented in section 'Approximate UNIREP power confidence intervals'. Approximate confidence interval coverage was defined as the proportion of the 500,000 simulated bound realizations that successfully covered the observed mean power values for each condition described above. Only coverage of observed mean power values, and not predicted, was tabulated. The accuracy of the predicted power values, with respect to the observed, made it essentially redundant to consider both. Both one- and two-sided confidence intervals were evaluated with target coverages of 90% and 95%.
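The definition of coverage as a proportion of simulated intervals can be illustrated with a much smaller Monte Carlo sketch. To keep the example self-contained it uses the univariate analogue, in which a scalar variance estimate satisfies $\nu\hat{\sigma}^2/\sigma^2 \sim \chi^2(\nu)$ exactly, rather than the UNIREP approximation studied in the paper; the function name, the seed, and all numerical inputs are hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_coverage(delta, sigma2, nu_est, df1, df2, reps=20000,
                      alpha=0.05, conf_alpha=0.05, seed=0):
    """Proportion of simulated power intervals that cover the true power."""
    rng = np.random.default_rng(seed)
    f_crit = stats.f.ppf(1.0 - alpha, df1, df2)
    true_power = stats.ncf.sf(f_crit, df1, df2, delta / sigma2)

    # Variance estimates from independent estimation studies.
    sigma2_hat = sigma2 * rng.chisquare(nu_est, size=reps) / nu_est
    c_l = stats.chi2.ppf(conf_alpha / 2.0, nu_est)
    c_u = stats.chi2.ppf(1.0 - conf_alpha / 2.0, nu_est)

    # Bounds on noncentrality, then on power via the monotone power function.
    omega_hat = delta / sigma2_hat
    lower = stats.ncf.sf(f_crit, df1, df2, omega_hat * c_l / nu_est)
    upper = stats.ncf.sf(f_crit, df1, df2, omega_hat * c_u / nu_est)

    covered = (lower <= true_power) & (true_power <= upper)
    return float(covered.mean())


coverage = simulate_coverage(delta=8.0, sigma2=1.0, nu_est=9, df1=3, df2=27)
```

In this exact univariate setting the estimated coverage should sit near the nominal 0.95; departures of the kind tabulated for the UNIREP tests reflect the additional approximations in that setting.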
In practical biomedical research, low power values are of little concern. Rarely will one have a power targeted below 0.70. Therefore, only the results for target power values of 0.80 will be presented and discussed. Power confidence interval coverage converged to the target coverage as sample size increased. Only the worst case results for two-sided 95% confidence intervals are presented here. The worst cases occurred with the smallest sample size for the target study, for a variety of population sphericity values and estimated population powers.
In Table 4, the observed mean population powers are presented for the four UNIREP tests for the population sphericity values and ranks of X considered for target rank-adjusted Huynh-Feldt power of 0.80 and sample size of 16 or 48. In general, as the population sphericity increased and rank of X increased, the observed mean power values for the Box conservative, the Geisser-Greenhouse, and the rank-adjusted Huynh-Feldt tests decreased. Only the Box conservative had severely biased power values as the population sphericity increased.
In Table 5, the proportion of simulations in which the estimated confidence interval successfully covered the observed mean population power values for each test is shown. The results are based on using an estimating study with sample size, $N_{est}$, of 16 and rank of $X$, $\mathrm{rank}(X_{est})$, of 4. In general, the approximate power confidence intervals nearly always reached the target 95% coverage for the Box conservative test. The coverage became more conservative as rank of $X$ decreased. Similarly, the coverage became more conservative for the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests as rank of $X$ decreased. The Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests performed adequately in all cases except for the midrange population sphericity value, $\epsilon = 0.505$. The largest deviation from the target 95% estimated coverage was 13.6% and 16.0% for the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests, respectively, which occurred for $\epsilon = 0.505$ and rank of $X$ equal to 8. The approximate power confidence intervals for the uncorrected test reached the target coverage.

Although not presented here, in general, as sample size increased the conservative coverage values observed for the Box conservative and the uncorrected tests slowly converged to the target coverage value. This trend was observed for the conservative coverage values with the extreme population sphericity values for the Geisser-Greenhouse and the rank-adjusted Huynh-Feldt tests as well. The same is true of the liberal coverage values observed for the midrange population sphericity values for the Geisser-Greenhouse and the rank-adjusted Huynh-Feldt tests. Similar results were obtained for the target 90% two-sided confidence interval coverage as well as the 95% and 90% one-sided confidence interval coverage.
The estimated coverages of these tabulated observed mean power values for each test were simulated for population sphericity values of 0.282, 0.505, 0.720, and 1.00. In Table 6, the worst case results from these simulations, which occurred for population sphericity 0.505, are presented. Approximate confidence intervals were simulated for 500,000 replications per condition (standard error of estimated coverage probability less than or equal to 0.0003 for $1 - \alpha = 0.95$, and 0.0004 for $1 - \alpha = 0.90$). The estimating studies use sample sizes, $N_{est}$, of 16, 32, and 48, and ranks of $X_{est}$ of 2, 4, and 8.
In general, for population sphericity values of 0.282 and 0.505, the approximate power confidence interval coverage for the Box conservative test converged to the target coverage value as rank of $X_{est}$ increased, and thus $\nu_{est}$ decreased. Coverage decreased as rank of $X$ from the target study increased. For larger rank of $X$, the approximate power confidence interval coverage fell short of the target coverage in several instances. No clear trend was apparent as $N_{est}$ increased. The Box conservative test would not be used for larger population sphericity values. However, the realization that the target coverage was reached in nearly every case considered for the larger population sphericity values is worth mentioning.
The approximate power confidence interval coverages for both the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests seem to have converged to the target coverage value as rank of $X_{est}$ increased, and thus $\nu_{est}$ decreased, except in cases of sphericity. Such cases have little practical importance since exact results are available if sphericity is valid. Coverage decreased as rank of $X$ from the target study increased. As observed in previous simulations, the approximate power confidence interval coverages for both the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests fell short of the target coverage to varying degrees in nearly every case considered for midrange population sphericity values. This outcome was also observed for larger rank of $X$ from the target study for population sphericity of 0.282. The approximate power confidence interval coverage for the uncorrected tests reached the target coverage value in every case except for large $\nu_{est}$ and small rank of $X$ from the target study. The approximate power confidence interval coverage increased as the ranks of $X$ for both the target and estimating studies increased and as $N_{est}$ decreased.
The slow convergence of the approximate power confidence interval coverage to the target coverage may be due, in part, to use of $\tilde{\epsilon}_n$ and $\tilde{\epsilon}_r$ in the approximate power confidence interval equation. These estimators of the sphericity parameter are ratios of unbiased estimators for the non-null and null cases, respectively. The variances of these estimators are much larger than the variances for $\hat{\epsilon}_n$ and $\hat{\epsilon}_d$. The larger variances may account for the slow convergence to the population power as the target and estimating study sample sizes and degrees of freedom increase. Further simulations may be needed to confirm this reasoning.

Table 6 Target 95% CI (two-sided) estimated coverage (×100) of simulated population power for $\epsilon = 0.505$, target power = 0.80, $N = 48$ and $\mathrm{rank}(X) = q$. Estimation study: $N_{est} \in \{16, 32, 48\}$ and $\mathrm{rank}(X_{est}) \in \{2, 4, 8\}$. Standard error of coverage probability × 100 ≈ 0.0003 × 100.

Alternate approximations
In attempts to develop even better confidence bound estimates, additional approximations were developed and evaluated. One approach approximated the distribution of $\hat{\lambda}_{*1}$ with an $F$. Using the methods presented in Kim et al. [22], the numerator of $\hat{\lambda}_{*1}$ was approximated with a weighted noncentral chi-square, while the denominator was approximated with a weighted central chi-square. Two concerns arose. First, the denominator is not necessarily a central quadratic. The $2\,\mathrm{tr}(\Delta/a)$ component makes the denominator more of a shifted central quadratic. Second, the Kim et al. [22] result requires that the components of the numerator and denominator be mutually independent, which does not hold. Simulations demonstrated that the approximation was inaccurate in small samples. Alternative approximations were developed and evaluated. The alternatives matched only the numerator to a weighted noncentral chi-square or to a weighted central chi-square, with the denominator a constant equal to $E[\mathrm{tr}(\hat{\Sigma}_*) + 2\,\mathrm{tr}(\Delta/a)]$. All were less accurate than the approximation presented here.

Discussion
Even for small sample sizes, the proposed power confidence intervals attain very accurate coverage probabilities for the Box conservative test in all cases and for the uncorrected test with $\epsilon = 1$ (the only case for which it should be used). The result is also true for the extreme population sphericity values for the Geisser-Greenhouse and rank-adjusted Huynh-Feldt tests. For midrange population sphericity values, the coverage probabilities of the approximate power confidence intervals for the latter two tests often fell somewhat short of the various target coverage values considered. Coverage probabilities improve as sample size increases. Accuracy is better for higher target power values than for lower, which makes the results useful in practice. One-sided confidence intervals are recommended for lower bounds on power.
The techniques also allow plotting power confidence regions around an estimated power curve (Figure 1). The resulting plots have been well received by researchers.

CLAHE mammography example
Computer scientists developed the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm to improve contrast in medical images. Independent observers considered $3 \times 3 = 9$ Clip×Region combinations. Region denotes the size of the image (pixels$^2$) at which contrasts are controlled and Clip level limits the maximum contrast adjustment. In the multivariate model $X = 1_N$, while within-person factors Clip and Region gave $Y$ $(N \times 9)$. Also $B$ $(1 \times 9)$ contained mean $\log_{10}(\text{contrast})$ for the unprocessed condition minus the mean for each of the nine combinations of Clip and Region ($\beta_{cr} = \mu_{\text{unprocessed}} - \mu_{cr}$). If $T_c$ contains orthonormal linear and quadratic trends for $\log_2(\text{Clip}) \in \{1, 2, 4\}$, and $T_r$ does the same for $\log_2(\text{Region}) \in \{1, 3, 5\}$, then the $9 \times 4$ within-persons contrast matrix is $U_{cr} = T_c \otimes T_r$. With $L$ the linear and $Q$ the quadratic trends for interaction components being tested, $U_{cr} = [u_{LL}\ u_{LQ}\ u_{QL}\ u_{QQ}]$.
All four covariance patterns were factorially combined with $N \in \{10, 20, 40\}$. The multivariate test considered $\Theta_{cr} = \beta_P \cdot [0.5\ \ 1.0\ \ {-1.0}\ \ 0.5]$ with $\alpha = 0.05$, and $\beta_P$ the scaling factor for $B$ corresponding to approximate target power $P \in \{0.20, 0.50, 0.80\}$ for the Geisser-Greenhouse approximation using methods in Muller et al. [5]. The conditions in the example were used in section 'Simulation 1 with rank (X) = 1 (one-group repeated measures ANOVA)'. More details of the example are in Muller et al. [5].
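A $9 \times 4$ contrast matrix of this kind can be built numerically as a Kronecker product of two sets of orthonormal trends. The sketch below obtains the trends from a QR decomposition of a Vandermonde matrix, which is one common construction and an assumption here (the original software may use a different routine, and column signs may differ). It also assumes the nine columns of $Y$ are ordered with Clip varying slowest; the helper name `orthonormal_trends` is hypothetical.

```python
import numpy as np


def orthonormal_trends(levels, degree):
    """Orthonormal polynomial trends (linear up to `degree`) at the given levels."""
    x = np.asarray(levels, dtype=float)
    vander = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    q, _ = np.linalg.qr(vander)
    return q[:, 1:]  # drop the constant column


T_c = orthonormal_trends([1, 2, 4], degree=2)  # linear, quadratic in Clip
T_r = orthonormal_trends([1, 3, 5], degree=2)  # linear, quadratic in Region

# Columns come out in the order LL, LQ, QL, QQ.
U_cr = np.kron(T_c, T_r)
```

Because each factor has orthonormal columns that are orthogonal to the constant, the product has orthonormal columns and every column sums to zero across the nine conditions.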

Test of interaction with rank(X) > 1 Example
All cases used 5 repeated measures, $N \in \{16, 32, 48\}$, and $\mathrm{rank}(X) \in \{2, 4, 8, 16\}$. A rank of $X$ equal to 16 was not considered for the smallest sample size, because $N = 16$ would leave no error degrees of freedom. All four covariance patterns were factorially combined with the sample sizes and ranks of $X$. In the multivariate model, $X = I_q \otimes 1_{repn}$, such that $repn = N/q$, and $\otimes$ is a Kronecker product. For a fixed matrix $D_a$, $B = \beta_P \cdot D_a$, such that $\beta_P$ was the scaling factor giving approximate target power $P \in \{0.20, 0.50, 0.80\}$, for the rank-adjusted Huynh-Feldt power approximation. The within-subject contrast, $U$ $(5 \times 4)$, contained orthonormal linear, quadratic, cubic and quartic trends. Each row of the between-subject contrast, $C$, a $(q - 1) \times q$ orthonormal matrix, contained one of the $(q - 1)$ trends. The contrasts define a test of interaction of between- and within-subject trends. Without loss of generality, we assumed $\Theta_0 = 0$. A test size, $\alpha$, of 0.05 was used.
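The design and contrast matrices described above can be assembled as in the following sketch. It assumes equally spaced levels for both the groups and the repeated measures, and it builds the orthonormal trends from a QR decomposition, which is an illustrative choice rather than the routine used in the original SAS/IML code. The function names are hypothetical, and the matrix $D_a$ is not constructed because its definition is not available here.

```python
import numpy as np


def orthonormal_trends(n_levels, degree):
    """Orthonormal polynomial trends over equally spaced levels 0..n_levels-1."""
    x = np.arange(n_levels, dtype=float)
    q, _ = np.linalg.qr(np.vander(x, degree + 1, increasing=True))
    return q[:, 1:]  # drop the constant column


def build_design(n_total, q_groups, p=5):
    """Return X = I_q (kron) 1_repn, C ((q-1) x q), and U (p x (p-1))."""
    if n_total % q_groups != 0:
        raise ValueError("N must be a multiple of q for a balanced design")
    repn = n_total // q_groups
    x = np.kron(np.eye(q_groups), np.ones((repn, 1)))
    c = orthonormal_trends(q_groups, q_groups - 1).T  # one trend per row
    u = orthonormal_trends(p, p - 1)                  # linear through quartic
    return x, c, u


X, C, U = build_design(n_total=16, q_groups=4)
nu_e = X.shape[0] - np.linalg.matrix_rank(X)  # error degrees of freedom, N - rank(X)
```

For $N = 16$ and $q = 4$ this gives $\nu_e = 12$; choosing $q = 16$ with $N = 16$ would give $\nu_e = 0$.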

Computational methods
All power computations were conducted in SAS/IML (SAS 9.1, SAS Institute, 2003). Free software LINMOD 3.4 was used for all data analysis and includes the new methods. Free software POWERLIB 2.1 in Johnson et al. [25] was used for all power analysis and includes the new methods. Both are available at http://health-outcomes-policy.ufl.edu/faculty-directory/-muller-keith/list-of-software/. UNIREP power is also available in GLIMMPSE, a free web-browser based program with a graphical user interface aimed at health scientists (www.SampleSizeShop.org). The next version of GLIMMPSE is expected to implement the confidence interval methods.