Beta regression is a promising method for modeling HRQL data in cross sectional research
[3, 10], and recent methodological work has extended the beta regression model to deal with dependent observations
[8, 16]. In this paper, we examined the potential of beta regression methods in the analysis of longitudinal HRQL data. We highlighted the need to distinguish between mixed and marginal models, namely beta GLMM and beta GEE, when beta regression is extended to the longitudinal case. Using two empirical applications with data distributions typically encountered in practice, we compared the performance of the beta regression methods to that of the commonly used LMM.

Data collected in longitudinal designs typically have correlated observations, violating a basic assumption of ordinary regression methods. Longitudinal analyses require regression techniques that account for this dependence. In general, the correlation among repeated measures can be modeled implicitly, i.e. by including random effects as in the mixed model, or explicitly, i.e. by specifying a covariance structure between observations as in the marginal model. Through the inclusion of random effects, mixed models assume natural heterogeneity across individuals in some regression coefficients
[18]. Random effects can also be motivated as an omitted subject-varying covariate, thus they give a potential explanation for the sources of correlation
[19]. In contrast, marginal models treat the dependence between observations as nuisance and account for its effects by specifying a working correlation.

For linear longitudinal models, regression coefficients have the same interpretation regardless of how the correlation is modeled. For regression models with non-identity link such as beta regression, however, interpretation depends on whether a mixed model (i.e. a GLMM) or a marginal model is fitted. In the GLMM, estimated effects are adjusted for individual difference and thus only refer to within-individual change. In the marginal model, in contrast, the mean response is conditional only on covariates and not on other responses or random effects
[18].

The choice between the two depends mainly on the specific scientific question of interest. GLMMs are most useful for making inferences about individuals and tracking individual trajectories, while the marginal model is more useful for inferences about population or sub-population averages. No model is a priori more suitable for the analysis of HRQL data than the other. It has been argued that mixed models may be more appropriate in epidemiological research as they allow a better understanding of the underlying mechanisms
[28]. Also, they have a close relationship to matched-pair design methods often used in epidemiologic and public health research
[19]. Due to the individual-specific interpretation of regression coefficients, the GLMM is also most meaningful for time-varying covariates. In contrast, the interpretation of time-invariant or between-subject covariates in the GLMM is less intuitive or even misleading since they also only allow a within-subject interpretation which is difficult to imagine. For example, if a beta GLMM is used to estimate treatment effects on HRQL in clinical trials, the respective treatment arm coefficient is interpreted as the difference in outcomes between two individuals with the same covariate values and the same random effects *b*
_{
i
}, differing only in their treatment arm. It does not describe the average treatment effect which is usually of major interest in intervention studies, especially if preference-based HRQL measures are used in economic evaluation studies
[4]. Therefore, the marginal model may be more suitable in many applications in public health research. Also, it has been argued that many epidemiologic methods such as stratified methods are essentially population-averaged methods
[19]. For our empirical applications this means that the change in SF-6D index scores associated with diabetes in the KORA data may be better described by a beta GLMM, while the difference in mean SIS scores between rehabilitation phases in the ICF stroke data may be better assessed using a beta GEE. Differences between beta GLMM and beta GEE also exist with respect to the handling of missing data: In practice, the beta GLMM may be more convenient since it remains valid under the MAR assumption which is usually more plausible in quality of life studies than the MCAR assumption made by the beta GEE.

A common approach to compare regression models and assess goodness of fit is to consider likelihood-based statistics which evaluate the probability of the observed data under the model. In both of our empirical examples, beta GLMM had better fit statistics (such as AIC, BIC or pseudo-R^{2}) than the commonly used LMM, indicating that the beta distribution better accounted for the bounded support of the observed HRQL scores and their highly skewed distributions. However, an important question is whether better likelihood statistics make the beta GLMM more suitable than the LMM in practice. A similar issue has also been addressed previously: Zimprich used a beta GLMM to analyze longitudinal data on complex choice reaction time and concluded from better likelihood-based fit statistics that beta GLMM fitted the data much better than a LMM did
[17]. However, given a fairly close similarity between parameter estimates, he also raised the question whether apart from these statistical considerations, beta GLMM is worth the effort to apply in practical data analyses. We even go one step further arguing that the likelihood may not be the most relevant criterion when comparing models to analyze HRQL data. Distributional fit and predicted densities may be important in applications with focus on individual density forecasts, such as in the reaction time example. However, when analyzing HRQL data in RCTs or cohort studies, conditional means rather than predictive densities are commonly of major interest
[4]. Against this background, more attention should be attached to the question whether the mean structure is appropriately reproduced by the model. Figure
2 showed that the beta GLMM reproduced the observed values at the upper end of the distribution less satisfactorily than the LMM. This may be explained by the fact that beta regression fits both means and variances to the data. Since in the beta distribution the variance is a function of the mean, the estimated mean function may be biased. This phenomenon has already been observed in a cross-sectional design and suggests that full likelihood-based beta regression methods should be used with care when analyzing HRQL
[3].

In the marginal perspective, beta GEE produced nearly identical estimates to the LMM, however, differences could be observed with respect to standard errors, especially in the ICF stroke data (Table
4). The larger standard errors of the beta GEE for patients in phase C are probably due to the fact that beta GEE provides robust standard errors using the sandwich formula. The smaller standard errors for patients in phase D, however, indicate that beta regression accounts for heteroscedasticity related to the bounded nature of the response variable
[8]. This is because the predicted means of individuals in phase D were rather high, and for outcomes bounded on the unit interval, the variability of scores declines as the mean approaches one.

An important limitation of the beta regression is that it does not contain the boundary points 0 and 1 so that quite arbitrary transformation methods need to be applied. However, our sensitivity analyses support previous research in that parameter estimates are robust to the choice of transformation, provided that the values are moved far enough away from the boundaries.

The two empirical data sets used in this study were chosen to cover different types of studies commonly encountered in HRQL research. In particular, we addressed both disease-specific and generic HRQL scores and used data both from a cohort study and from a clinical setting. The two illustrative examples tackle clinically relevant research questions that have also been addressed in other studies
[44–46]. However, since this paper focused on the comparison between different methodological approaches, we did not deal in detail with interpreting results in the healthcare context. For the purpose of this paper we have also made some simplifications, e.g. we did not consider model building but preferred using a rather lean model with only a few covariates. Also, we treated the precision parameter in the beta GLMM as constant instead of modeling it in terms of covariates, although such an approach may have improved model fit
[10, 17, 26].

Another limitation of our study is that our empirical data only provided up to three measurements per individual. Further research is needed to examine the use of beta regression in more complex study designs. Also, we did not consider random slopes which are commonly used to model heterogeneity in the effect of time on the response variable
[15]. However, for reasons of model convergence, it is not recommended to fit anything more complex than a single random intercept model to non-normal data with only a few time points per person. Similarly, we did not consider working correlation structures other than compound symmetry in the beta GEE. However, compound symmetry assumes the same correlation for all observations within a person which we think is reasonable in the case of only a few time points per person. Furthermore, it corresponds to the correlation structure implicitly modeled by the mixed model with single random intercept.