This article has Open Peer Review reports available.
Choosing marginal or random-effects models for longitudinal binary responses: application to self-reported disability among older persons
© Carrière and Bouyer; licensee BioMed Central Ltd. 2002
Received: 29 July 2002
Accepted: 5 December 2002
Published: 5 December 2002
Longitudinal studies with binary repeated outcomes are now widespread in epidemiology. The statistical analysis of these studies presents difficulties and standard methods are inadequate.
We consider strategies for modelling binary repeated responses and focus on two specific issues: the choice between marginal and random-effects models, and the choice of the time point origin. These issues are addressed using the example of self-reported disability in older women assessed annually for 6 years. The indicator of disability "needing help to go outdoors or home-confined" is used.
In view of the observed associations between the responses for consecutive years, the baseline response was considered as a covariate. We compared the marginal and random-effects models first when only the influence of time and age is analysed and second when individual risk factors are studied in an aetiological perspective. There were substantial differences between the parameter estimates. They were due to differences between specific concepts related to the two models and the large between-individual heterogeneity revealed by the analysis.
A random-effects model appears to be most suitable for the analysis of self-reported disability in older women.
In developed countries, disability in older persons is a major concern for public health authorities. Progress in medical care together with improved living conditions have led to longer life spans. One adverse consequence is that the number of very old people who are disabled is also increasing. The study of the succession of the different stages of disability, and of the ability to recover, together with the analysis of the risk factors for disability, are important issues for research in gerontology.
Longitudinal studies are more appropriate than cross-sectional studies which suffer various limitations due to biases such as the selective removal of disabled persons by institutionalisation, or differences in the proportions of disabled persons between age groups, also confounded with changes across generations . Several authors  have analysed repeated measures of disability by comparing two by two time points. This method involves a very large number of tests and leads to partial results for fixed periods. The statistical analysis must take four main characteristics of the longitudinal data into account: 1) time may be an explanatory variable, 2) repeated observations for a subject are likely to be correlated, 3) the covariables may be time-dependent (they may vary through time for a subject), 4) missing data in the successive responses may induce a bias. Several statistical models developed for longitudinal data have become more popular and there is software available for some of them. The problem of the choice of which to use remains.
Here, we present a strategy to model binary repeated responses and to explain how to choose between marginal and random-effects models. The evolution of disability in older persons is used as an example to illustrate each step.
The French EPIDOS study is a prospective multicentre study of the risk factors for hip fractures in 7575 women aged 75 years or older, included in 1992–1993 and recruited by mailing based on large population-based listings including electoral rolls . Once in the study, these women were contacted by mail or telephone every four months to collect information about any falls and new fracture events. They had to complete a mail questionnaire annually investigating hospitalisation, new health events, changes in weight, type of housing, ability to go outside, activities of daily life (ADL), instrumental activities of daily life (IADL), medications used, and subjective health. This follow-up was initially planned for 4 years and then was extended to 6 years. In this paper we analyse the data from Montpellier (southern France), one of the 5 participating centres. We used the data from all the subjects included at this centre (1548 women) to analyse the evolution of disability.
The annual questionnaire was in some cases missing. This was mainly due to illness or to family events such as the death of the spouse. The women then postponed their answer.
Various indicators have been proposed to assess disability . Needing help for going outdoors represents a first evident level of functional limitation. It is easy to identify and concerns both men and women. Therefore we chose "needing help to go outdoors or home-confined" as an indicator of disability. The annual response variable is binary.
Marginal and random-effects models
In longitudinal studies, multiple assessments of the same subject at different time points are used and the within subject responses are then correlated. This correlation must be accounted for by analysis methods appropriate to the data . Several models have been proposed for the analysis of such data. Most of them are extensions of the well-known logistic regression that is a particular case of the generalized linear models with a logistic link function . They are usually classified into marginal or random-effects models. Random-effects models are also called generalized linear mixed models or multilevel models or conditional models. However, this last term is ambiguous as several authors use it only for conditional maximum likelihood estimation  that we do not consider in this paper. Unlike linear models, the interpretation of the coefficients of these two types of model differs (see below). The choice of one or the other depends on the objectives of the study.
Let Y ij denote a binary outcome (in our example: needing help to go outdoors or home confined) corresponding to the jth response (jth year in the study in our example, j = 1 to n i ) of the ith subject (i = 1 to K). Let also Xij be a design matrix of covariates (1 × p vector, with first element being 1 for the intercept). The covariates may be fixed (for example age at baseline), or take different values at each year of the study (for example time or hospitalised for the last year). The marginal model, also called the population-averaged model , estimates the model thus:
logit (E (Y ij | X ij )) = logit (P (Y ij = 1 | X ij )) = X ij 'β
whereas the random-effects model, also called the "subject-specified models" , estimates the model as follows:
logit (P (Y ij = 1 | X ij )) = X ij ' β i
Thus, the marginal model supposes that the relationship between the outcome Y and the covariate X is the same for all the subjects, and the random-effects model allows this relationship to differ between subjects. To highlight this point, the random-effects model may also be written:
logit (E (Y ij | X ij , U i )) = X' ij (β* + U i ) = X ij ' β* + X ij ' U i
logit (E (Y ij | X ij , Z ij , U i )) = X ij ' β* + Z ij ' U i
where the random effects U i are assumed to vary independently from one subject to another according to a common distribution. This distribution is often supposed to be normal with mean 0 and variance D. Z ij is often a subvector of X ij , which means that random effects apply only to a part of the covariates and the intercept. The variance, D, has to be estimated and represents the extent of the unexplained between-individual variability.
It is important to note that the p × 1 vectors β in the marginal model and β* in the random effects model are not equal. Hence, the estimators estimate different things and the magnitude of the difference between β and β* is function of the variance, D .
Moreover, dependencies between observations over the time are handled differently in the two models. In the marginal model, it is popular to fit the vector of parameters, β, using the Generalized Estimating Equations (GEE) proposed by Liang and Zeger  wherein the covariance matrix is structured by using a working correlation matrix R(α) fully specified by the vector of parameters, α. This working correlation is assumed to be the same for all the subjects, reflecting an average dependence among the repeated observations for all subjects. In contrast, the random-effects model allows this within-subjects dependency to vary from one subject to another, by the means of the random part of the covariable linear combination. In the simplest case, only a random intercept is introduced
logit (E (Y ij | X ij , U i )) = X ij ' β* + U i0
U i0 being an individual parameter of propensity to become disabled, constant through the time. For a given subject, whatever the interval of time between two responses, the strength of dependency is then the same. If a random slope for the time covariable is added
logit (E (Y ij | X ij , U i )) = X ij ' β* + U i0 + U i1 time
this individual strength of association increases or decreases with the width of the interval.
In the marginal model, several specific choices of the structure of the working correlation matrix R(α) are possible. For example, R(α) is called m-dependent if corr(Y ij ,Y ik ) = α t for t = 1, 2, ..., m and corr(Y ij ,Y ik ) = 0 for t >m; R(α) is exchangeable if corr(Y ij ,Y ik ) = α for j ≠ k, and it is unstructured if corr(Y ij ,Y ik ) = α ik . An advantage of the marginal model, demonstrated by Liang and Zeger, is that β and their robust variance are consistent (the estimator converges towards the parameter being estimated as the sample size increases) even when the correlation structure is misspecified. However, choosing the working correlation structure closest to the true structure increases the statistical efficiency of the parameter estimator. Consequently, it is recommended to specify the working correlation as accurately as possible, based on the knowledge of the longitudinal process .
Concerning the estimating procedures, the GEE method for marginal models is not difficult to implement and is now available in the major statistical analysis packages. The procedures are more complex for the random-effects model. The most attractive of them directly maximizes an approximate integrated likelihood. With non-linear models computational-intensive integration methods, such as Gauss-Hermite quadrature, are necessary to evaluate the likelihood .
Different assumptions are required for the two models regarding missing data. The marginal model using the GEE requires a missing data process completely at random (MCAR). Under this assumption, missingness does not depend on individual characteristics (observed or not). In contrast, random models only need the less stringent assumption of missing at random (MAR). In this process, the probability of missingness depends only on observed variables (previous covariates or outcomes) . All marginal models do not require the MCAR assumption. For example, Robins et all  proposed methods to allow for data that are MAR in marginal models, but these methods are more complicated to implement.
The interpretation of the coefficients β and β* also differs. Consider, for instance, the covariate X "living alone". The odds ratio OR* = exp(β*) represents the odds of the outcome (needing help to go outdoors or home confined) for a person living alone compared to the same person supposed not to live alone. It can be seen as an odds ratio adjusted on unobserved individual characteristics. Under the marginal model, OR = exp(β) represents the averaged odds of the sub-group living alone compared to the sub-group not living alone.
In the following sections we will first describe the data and then focus on the specific problems inherent to longitudinal analyses: how to choose the time point origin and how to take into account the influence of time on responses. Then, we will present an example of an estimation of risk factors including covariates fixed across time for a single subject, and time-dependent covariates.
Description of the data
Description of the data by year of assessment
Total number of women with disability assessment
Percent of disabled women
Time since inclusion (yrs)
Number of deaths for the preceding year
The study was initially planned to last 4 years, some women refused to continue in years 5 and 6, and the number of those returning the completed questionnaire decreased most quickly between years 4 and 5 (Table 1). Some women gave intermittent answers. For example, of the 970 women still providing a response in year 6, at least one observation in the previous years was missing for 150 (15 percent). The missingness partly depended on unobserved individual events (illness or family events) and is unlikely to be MCAR. The use of the marginal model is therefore questionable because an averaged effect is only poorly meaningful. The true time between yearly assessments and inclusion varied from one woman to another and the range tended to be higher at the end of the follow-up. Consequently, time is considered as a continuous covariable. Vital status was known throughout the 6 years for all the participants, even those who no longer sent back the questionnaires. The total number of deaths was 294 (19 percent) by the end of the study.
Choice of the time point origin
Structure of the raw data when the baseline response is taken as a part of the reponse
Covariates l = 1, ... p
Time dependent covariates l = 1, ....q
Structure of the raw data when the baseline response is taken as a baseline covariate
Response at t 0
Covariates l = 1, ... p
Baseline values of the time dependent covariates l = 1, ....q
Time dependent covariates l = 1, ....q
In our example, as in many cohort studies, the baseline examination was exhaustive. The women were proposed clinical and functional examinations (series of standard tests of physical performance) and a bone densitometry at the hospital. However, only those who felt well enough to undergo these tests were examined, probably introducing a selection bias at inclusion. During the follow-up, the women were only asked to complete a mailed questionnaire that even physically dependent persons could answer. The profile of our sample changed through time, with the women becoming more and more physically dependent. At the beginning, many of the disabled women were probably only slightly disabled and thus there was the possibility of recovery. As time passed more women were severely disabled, and stayed so until the end of the study. Thus, the association between the responses for consecutive years is likely to be stronger in the last years. Also, the baseline response may be considered as a special case, less related to the following disabled states.
Odds ratios of being disabled at each year according to disability at the previous years
Analysis of time and age evolution
The objective here is to evaluate the impact of time on the proportion of disabled women. We chose to use the age at entry (fixed covariable) and the time since baseline to characterize this phenomenon. Another solution would be to use the age at every response (as a time varying covariable) but our option allows the introduction of an interaction between the age at entry and the time since baseline to test whether the effect of time is more pronounced in the oldest women.
In view of the missingness process resulting in the sample differing at every assessment, and the possible selection bias at entry, the search for individual relationships using the random-effects model is more suitable than using population averaged associations. However the two models are interesting to compare to show that the estimate parameters can be different, and to explain these differences.
We used two methods to characterize the changes in disability over time: the marginal model with the GEE approach and the random model with likelihood integration. The random model had Gaussian random effects and errors. We used the SAS procedures GENMOD with the REPEATED statement and an unstructured working correlation matrix and also NLMIXED with the Gauss-Hermite quadrature integration method . For the SAS code refer to Appendix (see Additional file 1). In the random model, we determined the structure of the covariance introducing successively two random effects: random intercept and random slope for the time covariate.
In all the models, the following fixed effects were then tested: time since baseline (continuous variable), time square, age at entry (in years exceeding 74), interactions age × time and baseline inability to go outside without help. Only the time square effect was not significant, and was removed from the model. We also introduced an indicator variable for women dying within the 6-year period. This is a simple way to take into account dropouts due to death, a difficulty often encountered in longitudinal studies in the elderly. The use of this time-fixed indicator was possible since information about death was known for the six years for all the women. The NLMIXED procedure converged and provided estimations only when the initial parameters were close to the final solution. We calculated successive models, beginning with a simple model with only a random intercept, and at each step we used the previous estimate parameters as initialisation for the next estimation.
Marginal model and random-effects models analysing the influence of time and age
Marginal model GEE
Random model likelihood integration
Age at baseline*
Age* × time
Death during the study
Random intercept variance
Random slope variance
Differences between the estimators of the marginal and random-effects parameters are expected (see above). The marginal model expresses averaged relationships without taking into account the fact that the same subjects are considered at each time point, whereas the random-effects model gives relationships conditionally on having certain individual characteristics modelled by the random effects. In the case of only a random intercept, Nehaus et al  demonstrated that the estimates from the marginal model are systematically lower than those from the random model. This characteristic is shown in figures 1 and 2, where the curves from the marginal model are flatter than the others. The differences between the estimates of the two approaches are largely dependent on the inter-individual heterogeneity. This heterogeneity can be assessed in the random models by looking at the intercept and slope variances.
The probability of being disabled is therefore much more due to the woman's uncharacterised "frailty" than to age or to baseline disability. This wide between-individual heterogeneity explains the differences between the parameter estimates of the marginal and random-effects models.
Analysis of risk factors
Changes with time expressed as age at entry and the time since baseline were modelled in the previous section. The marginal and random-effects models can also be used to identify other risk factors for disability. It would also be interesting to test how the estimating algorithms behave when numerous covariables are jointly introduced.
Marginal model and random-effects models analysing time-fixed and time varying covariates
Marginal model* GEE
Random model* likelihood integration
< 25 (ref)
25 – 29
> 2/10 (ref)
Bad or very bad
Hospitalised (past year)
Temporarily bed confined (past year)
Number of falls ≥ 2 (past year)
We present the steps for choosing a model able to characterize disability taken as a binary response.
Several important points have to be considered in the analysis of repeated binary data. First, the choice between the marginal and random-effects models depends mainly on the aims of the study. If the goal is to predict a mean prevalence of disability over time in elderly people by sex or age group, the marginal model is suitable. In contrast, if the goal is to study the individual risk factors for aetiological considerations, the random-effects model is more suitable because it allows adjustment on non-observed individual characteristics, and a better understanding of the underlying mechanism.
Second, the missing data process has to be examined. The marginal model using GEE assumes that the sample is representative of the whole population at each time point and the missing data is MCAR. This is unlikely to be the case in our example. The random-effects model assuming an MAR process is more appropriate. By the end of the study, half of the 37% of missing data were due to deaths and most were predictable from the previously collected data. The same probably applies to missing data due to chronic illnesses. Nevertheless, we cannot exclude the possibility that unobserved level of disability in our example may influence a part of the missing process. The magnitude of the potential bias introduced by non-random missingness should therefore be examined. A sensitivity analysis to assess the impact of missing data on subsequent statistical inference would be worthwhile , but is beyond the scope of this paper.
Another important point is to determine whether the disability indicator at baseline should be considered as a response or as a covariate. In epidemiological cohorts, the conditions in which responses are collected at baseline and during the follow-up are often different. The first response is often considered as a covariate but this choice has to be confirmed in view of the analysis of dependency between the responses.
The structure of the covariance between the repeated responses has also to be chosen. In the marginal model, the inferences on the parameter estimates are asymptotically valid under any assumed structure but it is better to choose a structure corresponding to the data. In contrast, in the random model, the fixed and random parameters are simultaneously estimated and the choice of the covariance structure influences the final results. For the mixed model applied to gaussian responses, it is recommended  (i)to consider first the more general model with all the relevant covariables, (ii) to specify a model for the covariance structure (i.e. to specify the random effects), and to estimate the parameters, and (iii) to try to reduce the fixed effect portion. In practice, the estimating procedures that we used do not allow introduction of a complex covariance structure due to computational time and unstable estimations. Other more flexible methods allowing multiple levels of clustering, such as Markov chain Monte Carlo methods, can be used but are more complex to handle .
The calculation procedures for random models need to be improved. The method of estimation with likelihood integration requires excessive computation. An alternative strategy to fit mixed models is the penalized quasi-likelihood (PQL) approach [18, 19] (GLIMMIX macro in SAS) but this method provides highly biased estimates of mixed-effects parameters with binary responses [20, 21].
The analysis of disability evolution with age at entry and time, as well as the study of other risk factors, show that the differences between the estimates for the two models can be large. However, the interpretation of the estimate parameters is different. In the marginal model, the exponential of an estimate parameter is a population-averaged odds ratio for disability and concerns the sub-population that shares a characteristic relative to the sub-population not sharing this characteristic. In the random model, the exponential of an estimate is an odds ratio for a woman that has a characteristic relative to this same woman if she were free of this characteristic. The random model takes into account the underlying dependence relationship. Furthermore, the assumptions about the distributions are different, and the fact that the conditional distribution is binomial does not imply that the marginal distribution is also binomial .
In our example, the search for risk factors for disability prevention, and the characteristics of our sample, such as the missingness process and the number of drop-outs after 4 years, render the random-effects models more appropriate. The marginal model has a tendency to waste information and does not measure the association of within-subject covariate change with change in the response, the associations typically of particular scientific value in longitudinal studies.
In epidemiology, many rich databases are now available from longitudinal studies with binary repeated events. Traditional analyses comparing time points two by two have serious limitations. Marginal models are easy to implement and represent a first solution, but the random models, although more complex, use all available data and are more suitable for explicative studies.
The authors are grateful to Dr François Favier and the EPIDOS group who provided the data for this work and to Annie Lacroux for editorial assistance.
- Beckett LA, Brock DB, Lemke JH, et al: Analysis of change in self-reported physical function among older persons in four population studies. Am J Epidemiol. 1996, 143: 766-78.View ArticlePubMedGoogle Scholar
- Hébert R, Brayne C, Spiegelhalter D: Factors associated with functional decline and improvement in a very elderly community-dwelling population. Am J Epidemiol. 1999, 150: 501-10.View ArticlePubMedGoogle Scholar
- Dargent-Molina P, Favier F, Grandjean H, et al: Fall-related factors and risk of hip fracture: the EPIDOS prospective study. Lancet. 1996, 348: 145-49. 10.1016/S0140-6736(96)01440-7.View ArticlePubMedGoogle Scholar
- WHO/EUROPE, NCBS: Third consultation to develop common methods and instruments for Health interviews surveys. Voorburg, the Netherlands: WHO and Netherlands Central Bureau of Statistics. 1992Google Scholar
- Diggle PJ, Liang KY, Zeger SL: Analysis of longitudinal data. Oxford: Clarendon Press, Oxford Science Publications. 1994Google Scholar
- McCullagh P, Nelder JA: Generalized Linear Models – Second Edition. Boca Raton, London, New-York, Washington, D.C., Chapman & Hall/CRC. 1989Google Scholar
- Zeger SL, Liang KY, Albert PS: Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988, 44: 1049-60.View ArticlePubMedGoogle Scholar
- Pendergast JF, Gange SJ, Newton MA, et al: A survey of methods for analyzing clustered binary response data. Int Stat Rev. 1996, 64: 89-118.View ArticleGoogle Scholar
- Neuhaus JM, Kalbfleisch JD, Hauck WW: A comparison of cluster-specific and population-averaged approaches for analysing correlated binary date. Int Stat Rev. 1991, 59: 25-36.View ArticleGoogle Scholar
- Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22.View ArticleGoogle Scholar
- Albert PS: Longitudinal data analysis (repeated measures) in clinical trials. Stat Med. 1999, 18: 1707-32. 10.1002/(SICI)1097-0258(19990715)18:13<1707::AID-SIM138>3.0.CO;2-H.View ArticlePubMedGoogle Scholar
- Pinheiro JC, Bates DM: Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics. 1995, 4: 12-35.Google Scholar
- Little RJA, Rubin DB: Statistical analysis with missing data. New-York: Wiley and Sons. 1987Google Scholar
- Robins JM, Rotnitzky A, Zhao LP: Analysis of semiparametric regression models for repeated outcomes in presence of missing data. J Am Stat Assoc. 1995, 90: 106-121.View ArticleGoogle Scholar
- Molenberghs G, Burzykowski T, Michiels B, et al: Analysis of incomplete public health data. Rev Epidémiol Santé Publique. 1999, 47: 499-14.PubMedGoogle Scholar
- Littell RC, Pendergast J, Natarajan R: Modelling covariance structure in the analysis of repeated measures data. Stat Med. 2000, 19: 1793-1819. 10.1002/1097-0258(20000715)19:13<1793::AID-SIM482>3.3.CO;2-H.View ArticlePubMedGoogle Scholar
- Zeger SL, Karim MR: Generalized linear models with random effects: A Gibbs sampling approach. J Am Stat Assoc. 1991, 86: 79-86.View ArticleGoogle Scholar
- Breslow NE, Clayton DG: Approximate inference in generalized linear mixed models. J Am Stat Assoc. 1993, 88: 9-25.Google Scholar
- Wolfinger R, O'Connell M: Generalized linear mixed models: a pseudo-likelihood approach. J Statist Comput Simul. 1993, 48: 233-43.View ArticleGoogle Scholar
- Breslow NE, Lin X: Bias correction in generalised linear models with a single component of dispersion. Biometrika. 1995, 82: 81-91.View ArticleGoogle Scholar
- Rodriguez G., Goldman N: An assessment of estimation procedures for multilevel models with binary responses. Journal of Royal Statistical Society. 1995, A158: 73-89.View ArticleGoogle Scholar
- Lindsey JK, Lambert P: On the appropriateness of marginal models for repeated measurements in clinical trials. Stat Med. 1998, 17: 447-69. 10.1002/(SICI)1097-0258(19980228)17:4<447::AID-SIM752>3.3.CO;2-7.View ArticlePubMedGoogle Scholar
- The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1471-2288/2/15/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.