Latent variables and structural equation models for longitudinal relationships: an illustration in nutritional epidemiology
© Chavance et al; licensee BioMed Central Ltd. 2010
Received: 10 November 2009
Accepted: 30 April 2010
Published: 30 April 2010
The use of structural equation modeling and latent variables remains uncommon in epidemiology despite its potential usefulness. We illustrate this usefulness by studying cross-sectional and longitudinal relationships between eating behavior and adiposity, using four different indicators of fat mass.
Using data from a longitudinal community-based study, we fitted structural equation models including two latent variables (baseline adiposity and adiposity change after 2 years of follow-up), each defined by the following four anthropometric measurements (respectively by their changes): body mass index, waist circumference, skinfold thickness and percent body fat. Latent adiposity variables were hypothesized to depend on a cognitive restraint score, calculated from answers to an eating-behavior questionnaire (TFEQ-18), either cross-sectionally or longitudinally.
We found that high baseline adiposity was associated with a 2-year increase of the cognitive restraint score and no convincing relationship between baseline cognitive restraint and 2-year adiposity change could be established.
The latent variable modeling approach enabled presentation of synthetic results rather than separate regression models and detailed analysis of the causal effects of interest. In the general population, restrained eating appears to be an adaptive response of subjects prone to gaining weight more than as a risk factor for fat-mass increase.
Structural equation and latent variable models [1, 2] have previously been used in several fields of epidemiology. However, they deserve wider use, because introducing a latent variable becomes relevant as soon as a risk factor of interest cannot be obtained with a single exact measurement. Structural equations allow modelling of different types of correlations between observations, regardless of their source (e.g., causal relationships, multiple outcomes, repeated measurements or longitudinal designs). This approach is useful for path analysis, which, for example, enables separation of direct and indirect effects, and expands causal interpretations through the identification or elimination of potential mediators. Except for a few fields, like quality of life, psychometrics, socio-economics or dietary-intake assessments, in which the common problem is how to deal with the psychometric properties of questionnaires, these techniques remain seldom used by epidemiologists [3–7]. The aim of this paper is to encourage use of this approach. As an illustration, we applied it to data from a longitudinal study, previously analyzed with conventional regression models, about restrained eating as a risk factor for weight gain over a 2-year period, in a sample of adults from the general population. Restrained eating, which has been described as the tendency to consciously restrict food intake to control body weight or promote weight loss, might have the paradoxical effect of inducing increased adiposity, through frequent episodes of loss of control and disinhibited eating. In this analysis, different indicators of adiposity were considered because no perfect measurement of adiposity is applicable to large epidemiological studies.
Adiposity is often estimated through body mass index (BMI), but it can also be assessed through other fat-mass indicators, such as waist circumference, skinfold thickness and percent body fat, estimated with a bioimpedance analyzer. None of them provides an error-free assessment of global adiposity, but each one provides some information about body fat mass. If one tests separately the effect of restrained eating on each measurement, the familywise error rate, i.e. the probability of making at least one error in this family of tests when restrained eating has no effect on adiposity, is higher than the nominal level of each individual test. By contrast, combining the four measurements into an adiposity latent variable within a structural model avoids the drawbacks of either arbitrarily choosing a single adiposity measurement or performing separate analyses on each fat-mass indicator. The results obtained with this novel analytical approach, using structural equation models and considering latent variables to model global adiposity, were compared to those obtained with separate linear regressions.
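To make the familywise error rate concrete, here is a minimal Python sketch. It assumes the four tests are independent, which is an assumption made only for illustration: the four fat-mass indicators are correlated, so the actual rate lies somewhere between the single-test level and the Bonferroni bound. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each performed at level alpha (illustrative assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_bound(alpha: float, n_tests: int) -> float:
    """Upper bound on the familywise error rate, valid whatever the dependence."""
    return min(1.0, alpha * n_tests)


# Four separate tests (one per fat-mass indicator), each at the 5% level.
fwer = familywise_error_rate(0.05, 4)
bound = bonferroni_bound(0.05, 4)
print(round(fwer, 3))   # 0.185
print(round(bound, 3))  # 0.2
```

With a single test on the latent adiposity variable, the error rate stays at the nominal 5% level.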
The dataset is a sample of the community-based Fleurbaix Laventie Ville Santé Study II (FLVS II), whose general aim was to investigate, in the general population, risk factors for weight and adiposity changes. The results of several cross-sectional studies suggested a link between restrained eating and weight gain, but those findings remain controversial. An aim of FLVS II was to measure longitudinally the effect of restrained eating on fat-mass changes and the effect of fat-mass on restrained eating changes.
Details concerning the FLVS II study design and data collection can be found elsewhere. Briefly, a first study, FLVS I, had been conducted on the children of all 579 families who had at least one child in primary school in 1992 in Fleurbaix or Laventie. Participation in FLVS II was proposed to 393 families who had not moved and who could be contacted in 1999: 294 families were recruited on a voluntary basis. Parents' overweight status and the subjects' ages and sexes did not differ significantly between families who agreed to participate and those who did not.
In our analysis, anthropometric data (weight, height, waist circumference, the bicipital, tricipital, subscapular and suprailiac skinfold thicknesses and percent body fat determined using a Tanita TBF 310 tetrapolar foot-to-foot bioimpedance analyzer) were collected by trained technicians at baseline and 2 years later, i.e., in 1999 and 2001. We used the sum of the four skinfold thicknesses as an indicator of the subcutaneous fat mass, named "skinfold thickness" for short in the following. Eating behavior was assessed using a French translation of the Three Factor Eating Questionnaire Revised 18-item version (TFEQ-R18). We focused on the cognitive restraint scale (CRS) of the eating-behavior questionnaire for the parents. The analyzed sample was composed of 256 females and 201 males.
Latent variables and structural equation modeling
We briefly recall here the principle of this approach. Latent variables are used to translate the fact that several observed variables (also named manifest variables) are imperfect measurements of a single underlying concept. Each manifest variable is assumed to depend on the latent variable through a linear equation. The coefficients linking the latent and manifest variables are called loadings. A measurement scale has to be chosen for the latent variable. By convention, it is generally the scale of the first manifest variable, implying that the first loading is not estimated but fixed at 1. Because the indicators of the manifest variables are measured on various scales, it is useful to consider standardized estimates rather than raw loadings, using the observed standard deviations as measurement units for latent and manifest variables.
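As a minimal sketch of these definitions, the Python snippet below builds the covariance matrix implied by a one-factor measurement model and converts raw loadings into standardized ones. All numerical values are hypothetical, chosen only for illustration, and the snippet uses model-implied standard deviations, which is an assumption of this sketch.

```python
import math


def implied_covariance(loadings, latent_var, residual_vars):
    """Covariance of the manifest variables implied by a one-factor model:
    cov(Y_i, Y_j) = loading_i * loading_j * latent_var (+ residual_var_i if i == j)."""
    k = len(loadings)
    return [
        [
            loadings[i] * loadings[j] * latent_var
            + (residual_vars[i] if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]


def standardized_loadings(loadings, latent_var, residual_vars):
    """Raw loading rescaled by sd(latent) / sd(manifest)."""
    cov = implied_covariance(loadings, latent_var, residual_vars)
    sd_latent = math.sqrt(latent_var)
    return [
        lam * sd_latent / math.sqrt(cov[i][i]) for i, lam in enumerate(loadings)
    ]


# Hypothetical values: the first loading is fixed at 1 to set the latent scale.
raw = [1.0, 0.5, 2.0, 1.5]
latent_variance = 4.0
residuals = [1.0, 0.25, 9.0, 3.0]

for lam, std in zip(raw, standardized_loadings(raw, latent_variance, residuals)):
    print(f"raw loading {lam:.2f} -> standardized {std:.3f}")
```

In this one-factor setting the square of a standardized loading is the share of the manifest variable's variance explained by the latent variable, which is why standardized loadings are comparable across indicators measured in different units.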
In structural equation modeling, relationships may be assumed between all manifest and latent variables according to acquired knowledge. These relationships are also defined through linear equations and a given variable can appear explanatory in one or several equations and as the outcome in another. As a result, it is possible to distinguish direct and indirect effects between an explanatory variable X and an outcome Y. When X has a causal effect on M, which causally influences Y, part or all of the effect of X on Y can be explained by the path X → M → Y, and M is called a mediator. The indirect effect of X on Y through M is obtained as the product of the estimated coefficients associated with the two arrows in the path. The regression coefficients and the variances of the residual errors that appear in the linear equations of the structural model specify how the manifest variables vary together. When they can be identified, they are estimated by optimizing a measure of adequacy between the observed and the model-predicted variance-covariance matrix (e.g. maximizing a likelihood).
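The product rule for indirect effects can be illustrated with simulated data. The sketch below is not the study's model: it generates a hypothetical chain X → M → Y with arbitrary coefficients, estimates each equation by ordinary least squares, and checks that the total effect of X on Y equals the direct effect plus the product of the two path coefficients. The helper names are hypothetical.

```python
import random


def cov(u, v):
    """Sample covariance of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def slope(y, x):
    """Least-squares slope of y on a single predictor x."""
    return cov(x, y) / cov(x, x)


def two_predictor_slopes(y, x1, x2):
    """Least-squares slopes of y on x1 and x2 (with intercept)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Simulated data with arbitrary, hypothetical coefficients.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]                       # X -> M
y = [0.3 * xi + 0.4 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]  # X -> Y, M -> Y

a_hat = slope(m, x)                              # arrow X -> M
direct_hat, b_hat = two_predictor_slopes(y, x, m)  # arrows X -> Y and M -> Y
indirect_hat = a_hat * b_hat                     # path X -> M -> Y
total_hat = slope(y, x)                          # effect of X ignoring M

print(f"direct   = {direct_hat:.3f}")
print(f"indirect = {indirect_hat:.3f}")
print(f"total    = {total_hat:.3f}")
```

For linear equations fitted by least squares on the same sample, the decomposition total = direct + indirect holds exactly, not only approximately.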
To validate the use of a latent variable approach, we fitted preliminary latent variable models to the four anthropometric measurements (BMI, waist circumference, sum of skinfolds, percent body fat). Such a model is called a measurement model, because only one latent variable and its four manifest variables are considered. We fitted it separately to the measurements at baseline and two years later, first for the two sex groups, then for the entire sample. We also considered measurement models in which the baseline measurements and their two-year changes were explained by baseline adiposity and its two-year change. We assumed the same relationships between latent adiposity and its four indicators at baseline and two years later; this model constrained the four loadings, i.e. the regression coefficients, to be identical for baseline adiposity, adiposity two years later and adiposity change (see appendix I). We considered changes rather than final values to avoid the problems of estimation and interpretation of coefficients arising from highly correlated variables.
By contrast, the effect of baseline adiposity on CRS change was adjusted for baseline CRS and thus freed, at least partially, from the factors confounding the cross-sectional effect. Testing whether this effect is null can provide an answer to the question: Does initial adiposity predict variation of CRS over time? The direct effects of baseline CRS on adiposity and CRS changes were also adjusted for baseline adiposity and freed, at least partially, of the cross-sectional confounding effects. However, according to the orientation of the arrows, there are three paths from baseline CRS to adiposity change: the direct one and two indirect paths, one through CRS change and one through baseline adiposity. Thus, both the direct effect of baseline CRS on adiposity change and its indirect effects have to be considered to answer the second question: Could restrained eating induce an increase of adiposity over time? The indirect effect through baseline adiposity is not free of the confounding effects and does not have to be considered. The indirect effect through CRS change can be interpreted as a consequence of the change of intake. Note that since the measurement error on a baseline value also appears, with a minus sign for the corresponding change, the baseline value and its change will be negatively related, even in the absence of a causal link between the error-free baseline value and the error-free change. Appendix I provides a short formal presentation of the model.
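The negative relationship induced by measurement error can be checked with a small simulation. In this sketch, which uses hypothetical variances, the error-free value does not change at all between the two visits, yet the observed baseline and the observed change are negatively correlated because the baseline error enters the change with a minus sign.

```python
import random


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


rng = random.Random(42)
n = 5000

# Hypothetical setting: true value with variance 1, no true change,
# independent measurement errors with standard deviation 0.5 at each visit.
true_value = [rng.gauss(0, 1) for _ in range(n)]
baseline = [t + rng.gauss(0, 0.5) for t in true_value]
follow_up = [t + rng.gauss(0, 0.5) for t in true_value]
change = [f - b for f, b in zip(follow_up, baseline)]

r = pearson(baseline, change)
print(f"correlation between observed baseline and observed change: {r:.3f}")
```

Under these assumptions the covariance between baseline and change is minus the error variance (here -0.25), so the expected correlation is about -0.32 even though nothing truly changed.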
All statistical analyses were performed with SAS 9.1, using the CALIS procedure. We log-transformed BMI, skinfold thickness and waist circumference to normalize their distributions and checked with Q-Q plots and Kolmogorov-Smirnov statistics that the transformed variables did not depart significantly from normal distributions. We estimated the parameters by optimizing the normal-theory maximum likelihood criterion. Among the various goodness-of-fit criteria, we focused on the root mean squared error of approximation (RMSEA) and on the normed fit index (NFI). These criteria range from 0 to 1, with RMSEA close to 0 and NFI close to 1 for a correct fit. In order to build confidence intervals for indirect effect estimates or for the sum of direct and indirect effects, their variances were obtained by bootstrapping the sample subjects. A large number of bootstrap samples (1,000) was used, so that the assumed normal distribution of the estimators could be assessed visually.
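The bootstrap step can be sketched as follows. This is not the SAS code used in the study: it is a simplified Python illustration on simulated data with hypothetical coefficients, in which subjects are resampled with replacement, the indirect effect is re-estimated on each bootstrap sample, and a normal-theory interval is built from the bootstrap standard deviation.

```python
import random
from statistics import stdev


def cov(u, v):
    """Sample covariance of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def indirect_effect(rows):
    """rows: list of (x, m, y) tuples, one per subject.
    Returns a * b, where a is the slope of m on x and b is the
    partial slope of m in the regression of y on x and m."""
    x, m, y = zip(*rows)
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(rows, n_boot=1000, seed=1):
    """Normal-theory 95% interval using the bootstrap standard error."""
    rng = random.Random(seed)
    estimate = indirect_effect(rows)
    replicates = [
        indirect_effect(rng.choices(rows, k=len(rows)))  # resample subjects
        for _ in range(n_boot)
    ]
    se = stdev(replicates)
    return estimate, (estimate - 1.96 * se, estimate + 1.96 * se), replicates


# Simulated subjects with hypothetical path coefficients.
rng = random.Random(0)
rows = []
for _ in range(200):
    x = rng.gauss(0, 1)
    m = 0.5 * x + rng.gauss(0, 1)
    y = 0.3 * x + 0.4 * m + rng.gauss(0, 1)
    rows.append((x, m, y))

estimate, (low, high), replicates = bootstrap_ci(rows)
print(f"indirect effect = {estimate:.3f}, 95% CI [{low:.3f} ; {high:.3f}]")
```

The list of replicates can be plotted as a histogram or Q-Q plot to judge visually whether the normal approximation is reasonable.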
General characteristics of the dataset
Table: Characteristics of the Studied Population (cell values not preserved). Columns: males (n = 201) and females (n = 256). Rows: Age in 1999 (yr), followed by two successive blocks each listing Percent Body Fat (%), Body Mass Index (kg/m2), Skinfold Thickness (mm), Waist Circumference (cm) and Cognitive Restraint Score.
Measurement model for adiposity and adiposity change
Table: Measurement models for 1999 and 2001 evaluations: goodness of fit (row and column labels only partly preserved). Preserved entries: 0.00 [. ; 0.14], 0.05 [. ; 0.17], 0.16 [0.08 ; 0.26], 0.07 [. ; 0.17]. Further rows: R2 for Percent Body Fat, Body Mass Index, Skinfold Thickness and Waist Circumference.
Table: Global measurement model: standardized loadings of the two latent variables (cell values not preserved). Columns: baseline standardized estimates and change standardized estimates, each given twice. Preserved row labels: Percent Body Fat, Body Mass Index.
Longitudinal modeling of adiposity and restrained eating
Concerning the global fit of the model, the RMSEA and its 95% confidence interval were 0.11 [0.093 ; 0.014] for females and 0.16 [0.14 ; 0.18] for males, while the respective NFI values were 0.91 and 0.84. The regression coefficients of the four baseline anthropometric measurements on baseline adiposity and of the four measurement changes on adiposity change, i.e., the loadings, are given in Table 3. The standardized coefficients showed that BMI was the most highly correlated with the latent variables and skinfold thickness the least. The standardized coefficients of percent body fat, skinfold thickness and waist circumference were clearly lower for changes than for baseline measurements (around 0.6 or lower versus 0.9). On the other hand, the four BMI standardized coefficients were quite high (between 0.94 and 1.00).
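For readers unfamiliar with these two indices, the sketch below shows how they are conventionally computed from chi-square statistics. The chi-square values are hypothetical and are not those of the study; the formulas are the usual textbook forms (some software divides by N instead of N - 1 in the RMSEA), so this is an illustration rather than a reproduction of the CALIS output.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean squared error of approximation, truncated at 0
    when the chi-square statistic does not exceed its degrees of freedom."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def nfi(chi2_model: float, chi2_null: float) -> float:
    """Normed fit index: relative reduction of the chi-square statistic
    compared with the independence (null) model."""
    return (chi2_null - chi2_model) / chi2_null


# Hypothetical chi-square values for a model with 2 degrees of freedom
# fitted to 256 subjects.
print(f"RMSEA = {rmsea(chi2=12.0, df=2, n=256):.3f}")
print(f"RMSEA = {rmsea(chi2=1.5, df=2, n=256):.3f}")   # truncated at zero
print(f"NFI   = {nfi(chi2_model=20.0, chi2_null=200.0):.2f}")
```

The truncation at zero explains how a point estimate of exactly 0.00 can arise for a well-fitting model.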
Table: Structural Equation Model: Regression Coefficients (cell values not preserved).
Table: Direct and Indirect Effects of Baseline CRS on Adiposity Change (cell values not preserved). Preserved row labels: 2 (indirect through CRS change), 3 (indirect through baseline adiposity).
Comparison with usual linear regressions
Table: Comparisons of Approaches with and without Latent Variables to Study the Effect of Baseline Fat Mass Measurements on CRS Change (cell values not preserved). Columns: fat mass measurement; regression coefficient of CRS change on baseline measurements. Preserved row labels: Percent Body Fat, Body Mass Index.
Latent variables and measurement model
When fitting longitudinal models for adiposity and restrained eating, both goodness-of-fit criteria, RMSEA and NFI, worsened in comparison with the measurement models fitted separately to the four baseline anthropometric measurements and to the four measurement changes. That observation means that the relationships between each of the four indicators and its change cannot be reduced to the relationship between baseline adiposity and its change. Each of the four anthropometric indicators provides an imperfect assessment of global adiposity: BMI, because it also includes lean body mass, and the other three because they reflect local components of total fat mass: mainly the lower part of the body for percent body fat by Tanita bioimpedance analysis, the abdominal compartment for waist circumference, and the subcutaneous compartment for skinfold thicknesses. Adiposity changes may preferentially affect a given compartment for some subjects and another one for other subjects. Similarly, the effect of the explanatory variables on the indicators cannot be reduced to their effect on latent adiposity. For example, age may affect BMI through modifications of both fat mass and lean mass. However, the model used provided a reasonable fit and was able to answer the epidemiological questions of interest.
Comparison of statistical approaches
When studying a latent change, some authors prefer to use as manifest variables the baseline measurements and the time 2 measurements rather than the baseline measurements and their changes [16, 17]. Under the equality constraint on the loadings at baseline and at time 2, both measurement models are similar (see appendix I). They differ, however, in their residual errors, which should be equal or almost equal at time 1 and time 2 for any raw measurement but are different for a baseline measurement and its change. For each sex, we verified that in the measurement models, the loadings and the fit indices were similar when using either parameterization with and without the equality constraints.
What are the pros and cons of a latent variable analysis, as compared with separate analyses on each indicator? A latent variable analysis considers a combination of the four measurements which expresses what makes them vary together, namely global adiposity. Thus, it allows a synthetic presentation of results while improving precision, reducing the number of tests and limiting multiple testing difficulties. Here, each of the individual measurement analyses gave similar conclusions, which were the same as those obtained with the latent variable approach. Clearly, this cannot always be the case. When individual analyses are not consistent, a latent variable model provides an easily interpretable synthesis. Moreover, a by-product of our latent variable approach was that, among the four fat-mass indicators, BMI was the closest to latent adiposity for baseline measurement and, especially, for 2-year changes. When a single measurement exhibits a relationship with the latent variable as strong as BMI does, there is not much to gain by considering other measurements; but should one decide to consider several measurements, we recommend a latent variable rather than separate analyses of each indicator.
Structural equation and path analyses are very useful for causal interpretation. Of course, the interpretations are conditional on the validity of the assumed model. Physiologically, the short-term effect of restrained eating is decreased adiposity. However, at baseline, high CRS values were associated with high adiposity in each sex group. This cross-sectional association is insufficient to establish a long-term causal link between restrained eating and adiposity. The most likely explanation is that this association is confounded by some subjects' propensity to easily gain weight and their efforts to counterbalance this tendency through restrained eating. Accordingly, the longitudinal part of the model showed that, adjusting for baseline CRS, subjects with a high initial adiposity had a larger CRS increase during the 2-year follow-up than the others. The direct effect of baseline CRS on adiposity change was not significant for either sex, and was of opposing signs for males and females. Practically, for a given sex, a CRS 20 units above the mean implied an expected multiplicative BMI change of exp(20 × CRS effect on adiposity × loading of log(BMI)), i.e., exp(-20 × 0.0096 × 0.024) = 0.995, a decrease of 0.5%, for females, and exp(20 × 0.012 × 0.022) = 1.005, an increase of 0.5%, for males. The indirect effect of baseline CRS through CRS change was positive but small for each sex (0.004). The indirect effect through baseline adiposity is difficult to interpret because it relies on the strongly confounded cross-sectional association. In any case, its estimates were negative for females (-0.001) and males (-0.004). Finally, the longitudinal effect of baseline CRS, free of the cross-sectional confounding factors, is the sum of the direct effect and of the indirect effect through CRS change. The estimate was significantly positive for males (+0.016) but non-significant and of opposite sign for females (-0.006).
Although the effect observed for males was significantly positive, we considered that the direct effect of CRS on adiposity change (adjusted for CRS change) provides the best measurement of the effect of CRS on adiposity change. The indirect effect through CRS change is at least partly due to regression to the mean (the expected negative relationship between baseline CRS and CRS change) and to the physiologic effect of CRS change on adiposity change. The relationships observed between each baseline value and its change were negative, as expected, although only three of them were significant, probably because of limited statistical power.
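The back-transformation from the log scale can be verified numerically. The sketch below takes the female direct effect as -0.0096, the value that reproduces the reported ratio of 0.995 and agrees with the reported sum of effects (-0.006 = direct effect + 0.004); a coefficient ten times larger would give about 0.955 instead. The function name is hypothetical.

```python
import math


def bmi_ratio(crs_units: float, crs_effect: float, loading_log_bmi: float) -> float:
    """Expected multiplicative BMI change for a CRS `crs_units` above the mean.
    BMI was log-transformed, so the linear effect is exponentiated."""
    return math.exp(crs_units * crs_effect * loading_log_bmi)


females = bmi_ratio(20, -0.0096, 0.024)
males = bmi_ratio(20, 0.012, 0.022)

print(f"females: ratio {females:.3f}, change {100 * (females - 1):+.1f}%")
print(f"males:   ratio {males:.3f}, change {100 * (males - 1):+.1f}%")

# Longitudinal effect = direct effect + indirect effect through CRS change.
print(f"males:   {0.012 + 0.004:+.3f}")
print(f"females: {-0.0096 + 0.004:+.3f}")
```

Both ratios are within half a percent of 1, which illustrates how small the estimated direct effects are on the BMI scale.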
Cross-sectional studies have shown that restrained eating is frequent in those with high adiposity [18–20]. The results of prospective studies are more controversial. Higher restraint scores were associated with better weight maintenance after a weight-loss or weight-gain-prevention intervention. In the general population, Drapeau et al found that initial restrained eating was related to subsequent weight gain positively in women but negatively in men, which is the opposite of our results. Hays et al found that restraint was protective against weight gain only in women with high levels of disinhibition. That latter study was retrospective, and self-reporting of past body weight may have biased the observed relationships. In adults with a familial history of obesity, non-obese women with the highest CRS were those who had been obese in childhood or adolescence, suggesting a beneficial effect of cognitive restriction for weight control in these women. Altogether, we do not consider that the available data from the general population support the hypothesis that restrained eating could induce an increase in adiposity: i) because of the inconsistency between studies; ii) because of the inconsistency of the relationships observed according to sex; iii) because of the low level of significance of the observed relationship (p = 0.05 for males in our study).
This latent variable and structural equation model enabled us to present synthetic results rather than four separate analyses for each sex group and to perform a detailed analysis of the causal mechanisms involved. It confirmed our previous observations; in the general population, restrained eating appears to be more of an adaptive response of subjects prone to gaining weight than a risk factor for increased fat mass.
Appendix I: Latent Variables and Structural Equation Model
This is the model and the parameterization used in the article. An alternative model uses two different sets of loadings for the baseline adiposity (k = 0) and the adiposity change (k = 1). The coefficient λ 1, linking the first manifest variable (here, percent body fat) to its latent variable, is not estimated but fixed at 1. As a result, latent adiposity is arbitrarily expressed on the same measurement scale as percent body fat. Because the latent variable indicators are measured on various scales, it is useful to consider standardized estimates rather than raw loadings, using the observed standard deviations as measurement units for latent and manifest variables, namely the raw loading multiplied by the ratio of the standard deviation of the latent variable to that of the manifest variable.
Note that, for a given λ i obtained under equality constraints, there are two standardized coefficients, one for each latent variable.
where the residual errors, ζ k (k = 0, 1) and ζ 3 are Gaussian random variables with null expectation. To simplify the equations, we centered all observed variables, so that intercepts no longer appear.
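Returning to the measurement part of the model, the description above can be sketched in LaTeX as follows. The symbols λ i and the index k come from the text; the names Y, η, ε, W, A and e are notational assumptions introduced here for illustration and may differ from the authors' original notation.

```latex
% Measurement model: indicator i = 1,...,4; k = 0 (baseline) or k = 1 (two-year change).
% Y_{ik}: manifest variable, \eta_k: latent adiposity (k = 0) or adiposity change (k = 1).
\begin{align}
  Y_{ik} &= \lambda_i \,\eta_k + \varepsilon_{ik}, \qquad \lambda_1 = 1, \\
% Standardized loading: one value per indicator and per latent variable.
  \lambda^{*}_{ik} &= \lambda_i \,
      \frac{\operatorname{sd}(\eta_k)}{\operatorname{sd}(Y_{ik})}.
\end{align}

% Link with the parameterization based on raw measurements W_{it} at time t = 0 and t = 2,
% with latent adiposity A_t and equal loadings at both times:
\begin{align}
  W_{it} &= \lambda_i A_t + e_{it}, \\
  W_{i2} - W_{i0} &= \lambda_i \,(A_2 - A_0) + (e_{i2} - e_{i0}).
\end{align}
% Hence \eta_0 = A_0, \eta_1 = A_2 - A_0, \varepsilon_{i0} = e_{i0} and
% \varepsilon_{i1} = e_{i2} - e_{i0}: the loadings are the same in both
% parameterizations, but the residual of a change differs from that of a raw
% measurement, and \operatorname{cov}(\varepsilon_{i0}, \varepsilon_{i1}) = -\operatorname{var}(e_{i0})
% when e_{i0} and e_{i2} are uncorrelated.
```

The last line shows why a baseline measurement and its change are expected to be negatively related even without any causal link, and why a single λ i obtained under equality constraints yields two standardized coefficients, one for each latent variable.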
The Fleurbaix Laventie Ville Santé Study was supported by grants from the CEDUS (Centre for Sugar Research and Information), the CISB (Centre for Scientific Information on Beer), and Groupe Fournier, Knoll, Lesieur, Nestlé France, and Roche Diagnostics companies. MA Charles received grants from the ALFEDIAM (Association de Langue Française pour l'Étude du Diabète et du Métabolisme) and from the Mutuelle Générale de l'Éducation Nationale. All these funding sources were devoted to data collection, and did not interfere with analysis and interpretation of data, the writing of the manuscript or the decision to submit the manuscript for publication.
- Bollen KA: Structural equations with latent variables. 1989, New York: Wiley
- Kaplan D: Structural equation modeling. 2000, Thousand Oaks: Sage
- Proust-Lima C, Amieva H, Dartigues JF, Jacqmin-Gadda H: Sensitivity of four psychometric tests to measure cognitive changes in brain aging-population-based studies. Am J Epidemiol. 2007, 165 (3): 344-350. 10.1093/aje/kwk017.
- Silva A, Metha Z, O'Callaghan FJ: The relative effect of size at birth, postnatal growth and social factors on cognitive function in late childhood. Ann Epidemiol. 2006, 16 (6): 469-476. 10.1016/j.annepidem.2005.06.056.
- Day NE, Wong MY, Bingham S, Khaw KT, Luben R, Michels KB, Welch A, Wareham NJ: Correlated measurement error--implications for nutritional epidemiology. Int J Epidemiol. 2004, 33 (6): 1373-1381. 10.1093/ije/dyh138.
- Kaaks R, Ferrari P: Dietary intake assessments in epidemiology: can we know what we are measuring?. Ann Epidemiol. 2006, 16 (5): 377-380. 10.1016/j.annepidem.2005.06.057.
- Singh-Manoux A, Clarke P, Marmot M: Multiple measures of socio-economic position and psychosocial health: proximal and distal measures. Int J Epidemiol. 2002, 31 (6): 1192-1199. 10.1093/ije/31.6.1192. discussion 1199-1200
- de Lauzon-Guillain B, Basdevant A, Romon M, Karlsson J, Borys JM, Charles MA: Is restrained eating a risk factor for weight gain in a general population?. Am J Clin Nutr. 2006, 83 (1): 132-138.
- Herman CP, Mack D: Restrained and unrestrained eating. J Pers. 1975, 43 (4): 647-660. 10.1111/j.1467-6494.1975.tb00727.x.
- Hochberg Y, Tamhane AC, (Eds): Multiple comparison procedures. 1987, New York: John Wiley & Sons
- Lafay L, Basdevant A, Charles MA, Vray M, Balkau B, Borys JM, Eschwege E, Romon M: Determinants and nature of dietary underreporting in a free-living population: The Fleurbaix Laventie Ville Santé (FLVS) Study. Int J Obes Relat Metab Disord. 1997, 21 (7): 567-573. 10.1038/sj.ijo.0800443.
- de Lauzon B, Romon M, Deschamps V, Lafay L, Borys JM, Karlsson J, Ducimetiere P, Charles MA: The Three-Factor Eating Questionnaire-R18 is able to distinguish among different eating patterns in a general population. J Nutr. 2004, 134 (9): 2372-2380.
- Diggle P, Heagerty P, Liang K, Zeger J: Analysis of longitudinal data (2nd). 2002, Oxford: Oxford Science Publications
- Steiger JH, Lind JC: Statistically based tests for the number of common factors. Annual meeting of the Psychometric Society: 1980; Iowa City, IA. 1980
- Bentler P, Bonett D: Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin. 1980, 88 (3): 588-606. 10.1037/0033-2909.88.3.588.
- Cribbie RA, Jamieson J: Structural equation models and the regression bias for measuring correlates of change. Educ and Psychol Measurement. 2000, 60 (6): 893-907. 10.1177/00131640021970970.
- Steyer R, Eid M, Schwenkmezger P: Modeling true individual Change: true change as a latent variable. Meth Psychol Res Online. 1997, 2 (1): 21-33.
- Lluch A, Herbeth B, Mejean L, Siest G: Dietary intakes, eating style and overweight in the Stanislas Family Study. Int J Obes Relat Metab Disord. 2000, 24 (11): 1493-1499. 10.1038/sj.ijo.0801425.
- Shunk JA, Birch LL: Girls at risk for overweight at age 5 are at risk for dietary restraint, disinhibited overeating, weight concerns, and greater weight gain from 5 to 9 years. J Am Diet Assoc. 2004, 104 (7): 1120-1126. 10.1016/j.jada.2004.04.031.
- Hill AJ, Draper E, Stack J: A weight on children's minds: body shape dissatisfactions at 9-years old. Int J Obes Relat Metab Disord. 1994, 18 (6): 383-389.
- Vogels N, Diepvens K, Westerterp-Plantenga MS: Predictors of long-term weight maintenance. Obes Res. 2005, 13 (12): 2162-2168. 10.1038/oby.2005.268.
- Levine MD, Klem ML, Kalarchian MA, Wing RR, Weissfeld L, Qin L, Marcus MD: Weight gain prevention among women. Obesity (Silver Spring). 2007, 15 (5): 1267-1277. 10.1038/oby.2007.148.
- Drapeau V, Provencher V, Lemieux S, Despres JP, Bouchard C, Tremblay A: Do 6-y changes in eating behaviors predict changes in body weight? Results from the Quebec Family Study. Int J Obes Relat Metab Disord. 2003, 27 (7): 808-814. 10.1038/sj.ijo.0802303.
- Hays NP, Bathalon GP, McCrory MA, Roubenoff R, Lipman R, Roberts SB: Eating behavior correlates of adult weight gain and obesity in healthy women aged 55-65 y. Am J Clin Nutr. 2002, 75 (3): 476-483.
- Bellisle F, Clement K, Le Barzic M, Le Gall A, Guy-Grand B, Basdevant A: The Eating Inventory and body adiposity from leanness to massive obesity: a study of 2509 adults. Obes Res. 2004, 12 (12): 2023-2030. 10.1038/oby.2004.253.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/10/37/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.