Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials
- Peng Li^{1} and
- David T Redden^{1}
https://doi.org/10.1186/s12874-015-0026-x
© Li and Redden; licensee BioMed Central. 2015
Received: 4 November 2014
Accepted: 26 March 2015
Published: 23 April 2015
Abstract
Background
A small number of clusters and large variation in cluster sizes are common in cluster-randomized trials (CRTs) and are often the critical factors affecting the validity and efficiency of statistical analyses. F tests are commonly used in the generalized linear mixed model (GLMM) to test intervention effects in CRTs. The most challenging issue for the approximate Wald F test is the estimation of the denominator degrees of freedom (DDF). Several DDF approximation methods have been proposed, but their small sample performance in analysing binary outcomes in CRTs with few heterogeneous clusters has not been well studied.
Methods
The small sample performances of five DDF approximations for the F test are compared and contrasted under CRT frameworks with simulations. Specifically, we illustrate how the intraclass correlation (ICC), sample size, and the variation of cluster sizes affect the type I error and statistical power when different DDF approximation methods in GLMM are used to test intervention effect in CRTs with binary outcomes. The results are also illustrated using a real CRT dataset.
Results
Our simulation results suggest that the Between-Within method maintains the nominal type I error rates even when the total number of clusters is as low as 10 and is robust to the variation of the cluster sizes. The Residual and Containment methods have inflated type I error rates when the cluster number is small (<30) and the inflation becomes more severe with increased variation in cluster sizes. In contrast, the Satterthwaite and Kenward-Roger methods can provide tests with very conservative type I error rates when the total cluster number is small (<30) and the conservativeness becomes more severe as variation in cluster sizes increases. Our simulations also suggest that the Between-Within method is statistically more powerful than the Satterthwaite or Kenward-Roger method in analysing CRTs with heterogeneous cluster sizes, especially when the cluster number is small.
Conclusion
We conclude that the Between-Within denominator degrees of freedom approximation method for F tests should be recommended when the GLMM is used in analysing CRTs with binary outcomes and few heterogeneous clusters, due to its type I error properties and relatively higher power.
Background
Cluster-randomized trials (CRTs), also called group-randomized trials, are widely used in the evaluation of interventions in health services research [1]. CRTs are distinct from other randomized controlled trials in that identifiable clusters of subjects/participants, such as medical practices, hospital wards, schools, or communities, rather than individuals, are randomly assigned to different intervention conditions [2]. Because the clusters are formed not at random but rather through some connections among their members, a positive intraclass correlation (ICC, denoted as ρ) [3] among observations in the same cluster is expected. Although the ICC is typically small (ρ < 0.05) [4] and not known when a trial is planned, adjustment for the ICC is necessary for a valid statistical analysis at the subject level. Any statistical test ignoring the non-independence of participants within clusters will underestimate the variances of the intervention effects and consequently inflate the type I error rates [5]. CRTs can be analyzed at the cluster level, by deriving summary statistics for each cluster, or at the individual level, using the data for each participant in each cluster [1]; however, only the individual-level analyses enable adjustment for participant characteristics to minimize selection bias. Two modeling approaches are commonly used for individual-level analyses of CRTs that account for clustering. One is the random effects model or generalized linear mixed model (GLMM), which incorporates random effects to reflect the correlation among observations in the same cluster [6]; the other is the marginal or population mean model using the generalized estimating equations (GEE) approach [7]. These two modeling methods should provide similar results if both models are correctly specified and their underlying assumptions hold, although the interpretation of the fixed effects estimates differs slightly [8].
The GLMM is more complicated but more informative than the GEE approach because it provides estimates of the variance components, which are otherwise treated as nuisance parameters in GEE [7]. The choice of modeling method should depend on the scientific questions and the validity of the underlying assumptions. In cases where heterogeneity is of significant interest, the GLMM could be the better choice. In addition, the pattern of missing data, which is common in most trials, is another important consideration in model selection. The GLMM is valid under both missing completely at random (MCAR) and missing at random (MAR), while the GEE approach is valid only under MCAR, even though some imputation strategies have been proposed for valid GEE inference under MAR [8].
The GLMM combines the properties of two statistical models that are widely used in different fields: generalized linear models (GLMs), which handle non-normal data from the exponential family by using link functions, and linear mixed models (LMMs), which incorporate random effects [8]. In the GLMM, the Wald statistics are recommended to test the null hypothesis of fixed effects because the likelihood ratio tests are unreliable for small to moderate sample sizes [8-10]. Wald statistics are calculated by dividing parameter estimates or linear combinations of parameter estimates by their estimated standard errors. In the GLMM, the approximate Wald F test, rather than the Chi-squared test, is recommended to handle finite sample sizes and overdispersion, which commonly occurs for binary or Poisson regression models, since the variance of both distributions is a function of the mean [8]. The most challenging issue for the approximate Wald F test is the estimation of the denominator degrees of freedom (DDF). It is expected that overestimation of the DDF will produce a liberal test leading to inflated type I error, and that underestimation of the DDF will produce a conservative test leading to potential power loss. In practice, five DDF approximations are used for the Wald F test: Residual DDF, Containment DDF, Between-Within (B-W) DDF, Satterthwaite DDF and Kenward-Roger (K-R) DDF; however, none of them works well in all situations and some are only valid under very strict conditions [8,9,11]. Simulation studies [12-15] under unbalanced split-plot designs have shown that the K-R DDF approximation has the best performance in preserving the nominal type I error, and that the covariance structure, the sample size, and the degree of imbalance are the major factors that affect the performance.
Although the K-R DDF approximation is recommended to maintain the type I error rate, its small sample performance was evaluated mainly on normally distributed outcomes under repeated measures designs [14,15]. CRTs typically have characteristics including small cluster numbers, moderate to large variable cluster sizes, and weakly correlated outcomes within the same cluster (ρ < 0.05) [4]. These characteristics are quite different from those encountered in repeated measures designs. Therefore, the validity of the K-R DDF approximation for non-normal outcomes under CRT scenarios needs further evaluation.
The purpose of the present study is to compare and contrast the statistical properties of the five DDF approximation methods for GLMM when testing intervention effects for binary outcomes in CRTs with a small number of clusters. Specifically, the type I error rates to test the null hypothesis of treatment effect are examined for each of the five DDF approximation methods (Containment, Residual, B-W, Satterthwaite, and K-R) under situations with different ICCs, sample sizes, and cluster size variation. For the methods that can maintain the nominal type I error rate, statistical power is compared. Because compound symmetry is a reasonable and the most widely accepted variance-covariance structure for CRT data, it is the only variance-covariance structure considered in this study.
Methods
Generalized linear mixed models and Wald F test
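For a CRT with K clusters, the GLMM for the i^{ th } cluster (i = 1, …, K) can be written in the general form \( {Y}_i={g}^{-1}\left({X}_i\beta +{Z}_i{b}_i\right)+{\epsilon}_i \), where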
Y_{ i } is the n_{ i } × 1 response vector for the i^{ th } cluster;
g^{− 1}(·) is the inverse of a differentiable monotonic link function;
X_{ i } is a n_{ i } × p matrix of fixed covariates;
β is a p × 1 vector of fixed-effects regression parameters;
Z_{ i } is a n_{ i } × v design matrix of random effects, where v is a design parameter;
b_{ i } is a v × 1 vector of cluster-specific random effects;
ϵ_{ i } is a n_{ i } × 1 error vector.
The parameters in GLMM can be estimated either by the standard maximum likelihood (ML) estimation, which estimates the standard deviations of the random effects assuming that the fixed effect estimates are precisely correct, or by the restricted maximum likelihood (REML) estimation, a variant that averages over some of the uncertainty in the fixed-effect parameters [8,11].
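To test a null hypothesis of the form \( {H}_0:L\beta =0 \), where L is an r × p contrast matrix of rank r, the Wald F statistic \( F=\frac{{\left(L\widehat{\beta}\right)}^{\prime }{\left(L\widehat{Y}\left(\widehat{\beta}\right){L}^{\prime}\right)}^{-1}\left(L\widehat{\beta}\right)}{r} \), in which \( \widehat{Y}\left(\widehat{\beta}\right) \) denotes the estimated variance-covariance matrix of \( \widehat{\beta} \), approximately follows an F distribution with r numerator degrees of freedom and an approximated DDF, say d. To test the null hypothesis of no intervention effect, the Wald F statistic \( F\left({\widehat{\beta}}_T\right) \) has an approximate F distribution with 1 numerator degree of freedom and d DDF, which must be specified or estimated. Five DDF approximations have been proposed to account for the correlated outcomes and are briefly discussed below.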
Residual DDF
The simplest method for the DDF estimation is the Residual method, which is calculated as N − rank[X], where \( N={\displaystyle {\sum}_{i=1}^K\;}{n}_i \) is the total number of participants across all K clusters.
Containment DDF
The Containment method chooses the DDF as the smallest rank contribution to the design matrix of the random effects that contain the fixed effect in a split-plot design [15]. This choice of DDF matches the tests performed for balanced designs and could be adequate for moderately unbalanced designs [15]. Under the framework of CRTs, if the treatment effect is fixed and not contained in any random effects, the Containment DDF is calculated as N − K.
Between-Within DDF
Schluchter and Elashoff [18] divide the residual degrees of freedom into between-cluster and within-cluster portions and suggest that in a mixed model, if a fixed effect changes within any cluster, within-cluster degrees of freedom should be assigned to the effect; otherwise, the between-cluster degrees of freedom should be assigned to the effect. In a CRT to test the intervention effect across the clusters, the between-cluster degrees of freedom will be applied and calculated as K − rank[X].
Satterthwaite DDF
Fai and Cornelius [13] follow Satterthwaite's premise [19] to propose a method for multi-degree-of-freedom tests in unbalanced split-plot designs. The degrees of freedom are calculated as a function of the variance of the parameter estimate. Briefly, \( L\widehat{Y}\left(\widehat{\beta}\right){L}^{\prime } \) is decomposed to yield \( {P}^{\prime}\left(L\widehat{Y}\left(\widehat{\beta}\right){L}^{\prime}\right)P= diag\left({\lambda}_m\right) \), where the columns of P are the normalized eigenvectors and the λ_{ m } are the corresponding eigenvalues of \( L\widehat{Y}\left(\widehat{\beta}\right){L}^{\prime } \). Let Q = rF. Using the decomposition, \( Q={\displaystyle {\sum}_{m=1}^r}\frac{{\left({p}_m^{\prime }L\widehat{\beta}\right)}^2}{\lambda_m}={\displaystyle {\sum}_{m=1}^r}\kern0.22em {t}_{U_m}^2 \), the sum of r squared approximate t variables, where \( {p}_m^{\prime } \) is the transpose of the m^{ th } eigenvector and U_{ m } is the approximate degrees of freedom for the m^{ th } independent single-degree-of-freedom t statistic. Since \( \frac{Q}{r}\sim {F}_{r,d} \), d can be solved using the relationship \( E(F)=\frac{d}{d-2} \). For r > 1, \( d=\frac{2E\left[Q\right]}{E\left[Q\right]-r} \), and for r = 1, \( d=\frac{2{\left(L\widehat{Y}\left(\widehat{\beta}\right){L}^{\prime}\right)}^2}{Var\left[L\widehat{Y}\left(\widehat{\beta}\right){L}^{\prime}\right]} \), where \( Var\left[L\widehat{Y}\left(\widehat{\beta}\right){L}^{\prime}\right] \) is approximated using the multivariate delta method.
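As a brief check of the r > 1 expression, the following short derivation is a sketch that uses only the relationships quoted above, namely Q = rF and \( E(F)=\frac{d}{d-2} \) (the latter holds for d > 2):

\[ E\left[Q\right]=r\,E(F)=\frac{rd}{d-2}\kern0.5em \Rightarrow \kern0.5em E\left[Q\right]\left(d-2\right)=rd\kern0.5em \Rightarrow \kern0.5em d\left(E\left[Q\right]-r\right)=2E\left[Q\right]\kern0.5em \Rightarrow \kern0.5em d=\frac{2E\left[Q\right]}{E\left[Q\right]-r}. \]

Because \( \frac{d}{d-2}>1 \) whenever d > 2, E[Q] exceeds r and the resulting d is positive.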
Kenward-Roger DDF
The value of d thus derived is the K-R DDF. For r = 1, the K-R DDF is the same as the Satterthwaite DDF, but the K-R approximation generates a more conservative test by inflating the variance-covariance matrix by φ.
Data simulation
The denominator degrees of freedom of GLMM Wald F test by different approximation methods in the simulations under the framework of CRTs
Methods | Estimated denominator degrees of freedom |
---|---|
Residual | \( {\displaystyle \sum_{i=1}^K}{n}_i-2 \) |
Containment | \( {\displaystyle \sum_{i=1}^K}{n}_i-K \) |
Between-Within | K − 2 |
Satterthwaite | d, estimated from data |
Kenward-Roger | d, estimated from data |
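The three closed-form rows of the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `closed_form_ddf` and the cluster sizes are hypothetical, and rank[X] = 2 (intercept plus intervention indicator) is taken from the table. The Satterthwaite and Kenward-Roger DDF are estimated from the fitted model and are not reproduced here.

```python
def closed_form_ddf(cluster_sizes, rank_x=2):
    """Residual, Containment and Between-Within DDF for a cluster-level
    fixed effect, following the closed-form rows of the table.

    cluster_sizes: list of n_i, one entry per cluster (hypothetical input).
    rank_x: rank of the fixed-effects design matrix X (2 in the table).
    """
    n_total = sum(cluster_sizes)      # N, total participants
    n_clusters = len(cluster_sizes)   # K, number of clusters
    return {
        "Residual": n_total - rank_x,           # N - rank[X]
        "Containment": n_total - n_clusters,    # N - K
        "Between-Within": n_clusters - rank_x,  # K - rank[X]
    }


# Hypothetical trial with K = 10 clusters of unequal size (N = 300).
sizes = [12, 30, 45, 8, 60, 25, 17, 33, 50, 20]
print(closed_form_ddf(sizes))
# {'Residual': 298, 'Containment': 290, 'Between-Within': 8}
```

The example makes the contrast visible: with few clusters, the Residual and Containment values are driven by N and are nearly identical, while the Between-Within value depends only on K.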
The type I error rate of each DDF approximation is calculated by computing the observed fraction of Wald F tests rejecting the null hypothesis (H_{ 0 } : β = 0) when the null hypothesis is true. At the nominal 0.05 level and with 5000 simulations, we expect the simulated type I error rate to be between 0.044 and 0.056 (95% confidence interval). Any procedure with a type I error rate below this range will be considered conservative, above this range will be considered liberal, and within this range will be considered as having the nominal type I error rate. The power is calculated by computing the observed fraction of Wald F tests rejecting the null hypothesis (H_{ 0 } : β = 0) when the true value of β is log(1.5) (i.e., the odds ratio is 1.5).
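The 0.044 to 0.056 range quoted above is consistent with a normal-approximation binomial interval around the nominal level. The sketch below assumes that construction (the text does not state how the interval was obtained); the function names `monte_carlo_bounds` and `classify` are illustrative only.

```python
import math
from statistics import NormalDist


def monte_carlo_bounds(alpha=0.05, n_sim=5000, level=0.95):
    """Normal-approximation interval for a simulated rejection rate
    when the true rejection probability equals alpha."""
    z = NormalDist().inv_cdf(0.5 + level / 2)           # about 1.96 for 95%
    half_width = z * math.sqrt(alpha * (1 - alpha) / n_sim)
    return alpha - half_width, alpha + half_width


def classify(rate, alpha=0.05, n_sim=5000):
    """Label an observed type I error rate using the rule in the text."""
    lower, upper = monte_carlo_bounds(alpha, n_sim)
    if rate < lower:
        return "conservative"
    if rate > upper:
        return "liberal"
    return "nominal"


lower, upper = monte_carlo_bounds()
print(round(lower, 3), round(upper, 3))  # 0.044 0.056
print(classify(0.030), classify(0.050), classify(0.070))
# conservative nominal liberal
```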
Real data illustration
All five DDF approximation methods are illustrated using a real CRT investigating whether an intervention in general practices improved subsequent attendance at breast screening among women who did not respond to their initial invitation in the Newham borough of East London [22,23]. Among the participating practices, 12 were randomized to the intervention group and 14 to the control group. The reception staff of the general practices allocated to the intervention group were trained to contact non-attenders for breast screening. Control practices were given no training or advice. A total of 995 women in the intervention practices and 1069 in the control practices were included in the trial. The outcome of interest was attendance at breast screening among women who did not respond to their initial invitation for routine breast screening. The intervention practices generally had higher rates of attendance than the control practices, although the attendance rate varied considerably between practices. It should be noted that a key feature of this trial is the small number of clusters (K = 26) with highly variable cluster sizes (cv ≈ 0.71).
Results
Type I error rates of Wald F tests with different DDF approximations
In this study, we compare the small sample performance of five DDF approximation methods in GLMM to test the null hypothesis of intervention effect under the framework of CRTs with binary outcomes. Specifically, we illustrate how the ICC, sample size, and the variation of the cluster size affect the type I error control of five DDF approximation methods.
The observed type I error rates of the Residual method to test the null hypotheses of intervention effect under various CRT scenarios are shown in Figure 1. The Residual DDF approximation does not consider the correlation of individuals within the same cluster and is calculated by subtracting 2 (the rank of the X matrix in our settings) from the total number of individuals across all clusters. Clearly, the observed type I error rates of the Residual method are inflated when the total cluster number is less than 30. The inflation becomes more severe as the total cluster number becomes smaller and/or the variation of cluster size becomes larger. The inflation of type I error caused by the increased variation of cluster size can be diminished by increasing the cluster number; however, the Residual method cannot keep the observed type I error rate at the nominal level even with equal cluster sizes (cv = 0). Therefore, the Residual method should not be used in the GLMM analyses of CRTs if the cluster number is smaller than 30.
The observed type I error rates of the Containment method to test the null hypotheses of intervention effect under various CRT scenarios are shown in Figure 2. In the GLMM analyses of our CRT simulations, the intervention effect is set to be fixed and all the clusters have a common variance-covariance structure. Hence, the Containment method estimates the DDF as the total number of individuals across all clusters minus the number of clusters, i.e. N − K. Because N is large and K is relatively small, the Containment method has small sample performance similar to that of the Residual method regarding the inflated type I error rates. The observed type I error rates are inflated when the cluster number is smaller than 30; the inflation becomes more severe for smaller cluster numbers and/or for larger variations of cluster size. The inflation of type I error caused by the increased variation of cluster size can be diminished by increasing the cluster number, but not to the nominal level when K < 30. Therefore, the Containment method should not be used in the GLMM analyses of CRTs with a cluster number smaller than 30.
The B-W method provides the best DDF approximation, maintaining the nominal type I error rate across our simulations, as shown in Figure 3. In most of the simulation situations, the observed type I error rates lie between 0.044 and 0.056, the 95% confidence interval of the nominal level, even when the number of clusters is as low as 10. Greater cluster size variation is associated with slight increases in the observed type I error rate when the number of clusters is small (K < 30). The Wald F test with the B-W approximation tends to be slightly conservative under a balanced design (cv = 0) and slightly liberal when the variation of cluster sizes is very high (cv > 0.8); however, the observed type I error rates under these extreme conditions are still very close to the nominal level.
The Satterthwaite method is intended as an accurate F test approximation and solves for the DDF by matching the moments of the observed Wald F statistics and an exact F distribution. Its type I error rate under various CRT scenarios is shown in Figure 4. The Wald F test with the Satterthwaite approximation can keep the type I error rates at the nominal level as long as the number of clusters is greater than 30. The method tends to be conservative when the cluster number is lower than 30, and the conservativeness becomes more severe as the cluster size variation increases. As shown in Figure 4, when the number of clusters is smaller than 20, the Wald F test with the Satterthwaite approximation keeps the observed type I error rates close to the nominal level only under the balanced design (cv = 0). As cluster size variation increases, the observed type I error rates drop dramatically. The conservative type I error rates caused by the increased variation of cluster size can be diminished by increasing the total number of clusters, but not to the nominal level. The conservativeness preserves the validity of the Wald F test, but it may decrease the statistical power of the test.
The K-R method inflates the marginal variance-covariance matrix and then applies the Satterthwaite method for the DDF approximation. Because we only test the null hypothesis of intervention effect, the K-R method has exactly the same DDF approximation as the Satterthwaite method. Its small sample performance regarding the type I error rate under various CRT scenarios is very similar to that of the Satterthwaite method, but a little more conservative due to the standard error inflation, as shown in Figure 5. Therefore, this method will preserve the validity of the Wald F test; however, its conservativeness may cause power loss, especially when there is considerable cluster size variation.
In summary, the number of clusters and the cluster size variation, rather than the ICC and the average cluster size, play important roles in the type I error control for the five DDF approximation methods in GLMM analysis to test the null intervention effect under the framework of CRTs with binary outcomes. When the cluster number is smaller than 30, neither the Residual nor the Containment method should be used due to the inflated type I errors. In contrast, both the Satterthwaite and K-R methods tend to be conservative, especially when considerable cluster size variation exists. Our simulations suggest that the B-W method preserves the type I error rates at the nominal level in the GLMM analysis of CRTs with a small number of clusters and is robust to cluster size variation. It should be noted that only binary outcomes are studied here and the aforementioned results may not be directly applicable to outcomes with different distributions.
Statistical power of Wald F tests
In summary, the B-W method is statistically more powerful than the Satterthwaite or K-R method in analysing CRTs with heterogeneous clusters, especially when the cluster number is small and the variation of cluster size is large.
Real data illustration
GLMM small sample inferences of intervention effects on women’s attendance at breast screening with different denominator degrees of freedom approximations
Method | Intervention estimate | Standard error | F value | Numerator DF | Denominator DF | P value |
---|---|---|---|---|---|---|
Residual | 0.9517 | 0.4485 | 4.50 | 1 | 2062 | 0.0340 |
Containment | 0.9517 | 0.4485 | 4.50 | 1 | 2038 | 0.0340 |
B-W | 0.9517 | 0.4485 | 4.50 | 1 | 24 | 0.0444 |
Satterthwaite | 0.9517 | 0.4485 | 4.50 | 1 | 20.85 | 0.0460 |
K-R | 0.9517 | 0.4507 | 4.46 | 1 | 20.85 | 0.0469 |
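The F values and P values in the table above can be approximately re-derived from the reported estimates, standard errors and DDF. The sketch below is an illustration only, not the software used in the study: it assumes a single-parameter Wald test, so that F = (estimate / standard error)^2 and P(F_{1,d} > F) equals the two-sided tail probability of a t distribution with d degrees of freedom, which is integrated numerically with Simpson's rule. The helper names are hypothetical.

```python
import math


def t_density(x, d):
    """Student's t density with d degrees of freedom (d may be fractional)."""
    log_const = (math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
                 - 0.5 * math.log(d * math.pi))
    return math.exp(log_const - (d + 1) / 2 * math.log1p(x * x / d))


def wald_f_p_value(f_value, ddf, n_steps=2000):
    """P(F_{1,ddf} > f_value), computed as P(|T_ddf| > sqrt(f_value)).

    The central probability P(-t < T < t) is obtained with Simpson's rule
    on [0, t] and doubled by symmetry; n_steps must be even.
    """
    t = math.sqrt(f_value)
    h = t / n_steps
    total = t_density(0.0, ddf) + t_density(t, ddf)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * t_density(k * h, ddf)
    return 1.0 - 2.0 * (h / 3.0) * total


# (estimate, standard error, DDF) as reported in the table above.
table_rows = {
    "Residual": (0.9517, 0.4485, 2062),
    "Containment": (0.9517, 0.4485, 2038),
    "B-W": (0.9517, 0.4485, 24),
    "Satterthwaite": (0.9517, 0.4485, 20.85),
    "K-R": (0.9517, 0.4507, 20.85),
}
for method, (estimate, se, ddf) in table_rows.items():
    f_value = (estimate / se) ** 2
    p_value = wald_f_p_value(f_value, ddf)
    print(f"{method:13s} F = {f_value:.2f}  DDF = {ddf:>7}  p = {p_value:.4f}")
```

Because the inputs are rounded to four decimals, the recomputed P values should agree with the table only to about three decimal places. The comparison shows how the same F statistic of 4.50 yields a larger P value as the DDF shrinks from about 2000 to about 20.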
Discussion
When the GLMM is used in the analyses of CRTs, the null hypothesis of the treatment effect can be tested using the Wald statistics by dividing treatment mean squares by the appropriate error mean square to form a variance ratio with an F distribution. The numerator degrees of freedom can be specified by the number of fixed effect contrasts being considered, but a suitable DDF must be estimated in unbalanced mixed models [24]. In this study, we compare and contrast the small sample performances of five methods of DDF approximation for the GLMM Wald F test under the framework of CRTs regarding the type I error and power. Our simulation results suggest that the B-W method maintains the type I error rates at the nominal level even when the number of clusters is as low as 10, and is robust to the variation of the cluster sizes. The Residual and Containment methods inflate the type I error rates when the cluster number is small (<30) and the inflation becomes more severe as the variation of cluster sizes increases. In contrast, the Satterthwaite and K-R methods may provide tests that are too conservative when the cluster number is small (<30) and the conservativeness becomes more severe as cluster size variation increases. However, the inflation or deflation of the type I error rates caused by the imbalance of the cluster sizes can be diminished by increasing the number of clusters. When the cluster number is greater than 30, all the methods are robust to the variations of the cluster sizes.
The Between-Within method was proposed as a small sample adjustment for longitudinal repeated measures [18]. This method divides the residual degrees of freedom into between-cluster and within-cluster values and assigns the between-cluster denominator degrees of freedom to a fixed effect that does not change within clusters. In the GLMM analyses of CRTs, the intervention effect does not change within clusters, so the between-cluster denominator degrees of freedom, K − 2, is assigned to the Wald F test of the null hypothesis of intervention effects. Although the method was developed for longitudinal repeated measures and is supposed to be valid only for balanced designs, in our simulations it preserves the type I error rates at the nominal level and is robust to the small number of clusters and the variation of cluster sizes.
The Residual method does not take the correlation into account and is only valid for independent outcomes. It is not surprising that the Wald F test with the Residual approximation of DDF has inflated type I error rates in the GLMM analyses of CRTs. The Containment method mimics the classical degrees of freedom rules for balanced ANOVA situations, and is the default method for the SAS procedures PROC MIXED and PROC GLIMMIX when the random statements are specified [11]. In the analyses of CRTs, the intervention effect is usually considered a fixed effect, so that the DDF of the single parameter Wald F statistic for the intervention effect by the Containment method is approximated in a way similar to the Residual method. Our simulation results show that, like the Residual method, the Containment method inflates the type I error rates to test the null hypothesis of intervention effect in the GLMM analyses of CRTs. Therefore, neither of these two methods should be considered in the GLMM analyses of CRTs.
Both the Satterthwaite and K-R methods estimate the DDF from the data by matching the first two moments of the Wald F statistics and the approximating F distribution [13,14]. Compared with the Satterthwaite method, the K-R method further adjusts the covariance matrix for the fixed effects parameters to accommodate the uncertainty in the covariance matrix [14]. Since their appearance, these two methods, especially the K-R method, have been favored by many studies under the random complete block design, split plot design and repeated measures design [12,15]. Spilke et al. [12] conclude that the Satterthwaite method provides good type I error control and the K-R method gives the best type I error control by reducing the bias of the estimated variance-covariance matrix of fixed effects parameters under the random complete block design. Schaalje et al. [15] investigate the repeated measures design and conclude that the K-R method works as well as or better than the Satterthwaite method in maintaining the type I error rates close to the nominal level. In contrast to these previous studies, our simulation results suggest that both the Satterthwaite and K-R methods tend to be overly conservative under the framework of CRTs with a binary outcome, especially when considerable variation of cluster sizes exists. Not surprisingly, the conservativeness causes greater power loss in analyzing CRTs with few heterogeneous clusters. Unfortunately, large variation of cluster sizes is common in CRT designs, and the resulting power loss could be very costly if the Satterthwaite or K-R method is used in the analysis.
The variance-covariance structures have been shown in many studies to affect the small sample performances of different denominator degrees of freedom approximations [12,15,18]. Under the CRT framework, compound symmetry is the most commonly accepted variance-covariance structure and is therefore the only structure considered in our study. In actual practice, the intraclass correlation within the same cluster is low and usually less than 0.05 [4]. Over the values investigated in this study (0.001, 0.01, 0.05 and 0.1), we find that the intraclass correlation has little effect on the small sample performances of all five methods we evaluated. However, for CRTs with a more complicated correlation structure, such as CRTs with binary longitudinal outcomes, the small sample performances of the DDF approximations need further evaluation. Another limitation of this study is that only binary outcomes are considered, and the small sample performances of the five DDF methods on other types of outcomes (count, time-to-event, etc.) need further investigation.
Conclusion
In conclusion, we compare the small sample performances of five DDF approximation methods in GLMM to test the null hypothesis of intervention effect under the framework of CRTs with binary outcomes, and find that the B-W method outperforms the other four methods by its ability to preserve the type I error rates to nominal level and its relatively higher statistical power. Therefore, the B-W method should be recommended in the GLMM analyses of CRTs with few heterogeneous clusters.
Declarations
Acknowledgements
This research was supported by NIH grants T32HL079888 (PL), P60AR048095 (DTR), P60AR064172 (DTR), and UL1 TR000165 (DTR).
Authors’ Affiliations
References
- Campbell MJ, Donner A, Klar N. Developments in cluster randomized trials and statistics in medicine. Stat Med. 2007;26(1):2–19.
- Donner A, Klar N. Pitfalls of and controversies in cluster randomization trials. Am J Publ Health. 2004;94(3):416–22.
- Eldridge SM, Ukoumunne OC, Carlin JB. The intra-cluster correlation coefficient in cluster randomized trials: a review of definitions. Int Stat Rev. 2009;77(3):378–94.
- Campbell MK, Piaggio G, Elbourne DR, Altman DG, CONSORT Group. Consort 2010 statement: extension to cluster randomised trials. BMJ. 2012;345:e5661.
- Murray DM, Varnell SP, Blitstein JL. Design and analysis of group-randomized trials: a review of recent methodological developments. Am J Publ Health. 2004;94(3):423–32.
- Harville DA. Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc. 1977;72:9.
- Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):10.
- Vonesh EF. Generalized linear and nonlinear models for correlated data: theory and applications using SAS. Cary, NC: SAS Institute, Inc; 2012.
- Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, et al. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol Evol. 2009;24(3):127–35.
- Bellamy SL, Li Y, Lin XH, Ryan LM. Quantifying PQL bias in estimating cluster-level covariate effects in generalized linear mixed models for group-randomized trials. Stat Sinica. 2005;15(4):1015–32.
- Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schabenberger O. SAS® for Mixed Models. 2nd ed. Cary, NC: SAS Institute Inc.; 2006.
- Spilke J, Piepho HP, Hu XY. A simulation study on tests of hypotheses and confidence intervals for fixed effects in mixed models for blocked experiments with missing data. J Agric Biol Envir S. 2005;10(3):374–89.
- Fai AHT, Cornelius PL. Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments. J Stat Comput Sim. 1996;54(4):363–78.
- Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics. 1997;53(3):983–97.
- Schaalje GB, McBride JB, Fellingham GW. Adequacy of approximations to distributions of test statistics in complex mixed linear models. J Agric Biol Envir S. 2002;7(4):512–24.
- Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc. 1993;88(421):9–25.
- McCulloch CE. Maximum likelihood algorithms for generalized linear mixed models. J Am Stat Assoc. 1997;92(437):162–70.
- Schluchter MD, Elashoff JD. Small-sample adjustments to tests with unbalanced repeated measures assuming several covariance structures. J Stat Comput Sim. 1990;37:19.
- Satterthwaite FE. An approximate distribution of estimates of variance components. Biom Bull. 1946;2:110–4.
- Lee EW, Dubin N. Estimation and sample-size considerations for clustered binary responses. Stat Med. 1994;13(12):1241–52.
- Gulliford MC, Adams G, Ukoumunne OC, Latinovic R, Chinn S, Campbell MJ. Intraclass correlation coefficient and outcome prevalence are associated in clustered binary data. J Clin Epidemiol. 2005;58(3):246–51.
- Omar RZ, Thompson SG. Analysis of a cluster randomized trial with binary outcome data using a multi-level model. Stat Med. 2000;19(19):2675–88.
- Turner RM, Omar RZ, Thompson SG. Bayesian methods of analysis for cluster randomized trials with binary outcome data. Stat Med. 2001;20(3):453–72.
- Elston DA. Estimation of denominator degrees of freedom of F-distributions for assessing Wald statistics for fixed-effect factors in unbalanced mixed models. Biometrics. 1998;54(3):1085–96.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.