Change analysis for intermediate disease markers in nutritional epidemiology: a causal inference perspective
BMC Medical Research Methodology volume 24, Article number: 49 (2024)
Abstract
Background
Several approaches are commonly used to estimate the effect of diet on changes in various intermediate disease markers in prospective studies, including “change-score analysis”, “concurrent change-change analysis” and “lagged change-change analysis”. Although empirical evidence suggests that concurrent change-change analysis is the most robust, consistent, and biologically plausible, an in-depth dissection and comparison of these approaches from a causal inference perspective is lacking. We intend to explicitly elucidate and compare the underlying causal model, causal estimand and interpretation of these approaches, illustrate them intuitively with a directed acyclic graph (DAG), and further clarify the strengths and limitations of the recommended concurrent change-change analysis through simulations.
Methods
Causal models and a DAG are used to clarify the causal estimand and interpretation of each approach theoretically. Monte Carlo simulation is used to explore the performance of the distinct approaches under different extents of time-invariant heterogeneity, and the performance of concurrent change-change analysis when its causal identification assumptions are violated.
Results
Concurrent change-change analysis targets the contemporaneous effect of exposure on outcome (measured at the same survey wave), which is more relevant and plausible when studying the associations of diet and intermediate biomarkers in prospective studies, while change-score analysis and lagged change-change analysis target the effect of exposure on outcome after a one-period timespan (typically several years). Concurrent change-change analysis always yields unbiased estimates even with severe unobserved time-invariant confounding, while the other two approaches are always biased even without time-invariant heterogeneity. However, the estimation bias of concurrent change-change analysis increases almost linearly as violations of its causal identification assumptions become more serious.
Conclusions
Concurrent change-change analysis might be the best-suited method for studying diet and intermediate biomarkers in prospective studies: it targets the most plausible estimand and circumvents the bias from unobserved individual heterogeneity. Importantly, the vital identification assumptions behind it should be carefully examined before applying this promising method.
Background
In the past decades, it has become increasingly common in epidemiology to study sensitive disease biomarkers or intermediate endpoints of diseases, such as weight gain, blood pressure, glycemia, lipid profiles, and other cardiometabolic or inflammation-related biomarkers, which can help to identify disease risk factors earlier and provide potential pathways linking these factors to distal diseases [1,2,3,4]. In contrast to dichotomous disease status, intermediate biomarkers are continuous indicators, which are more sensitive to various exposure factors and tend to fluctuate over short periods of time as exposure changes [2, 3]. Longitudinal cohort data with repeated measurements can capture the covariation between exposures, confounders, and outcome indicators over time, providing an ideal data structure for clarifying the causal associations between time-varying exposures of interest and intermediate biomarkers.
In practice, prospective studies on the relationship between diet (and other lifestyle factors such as physical activity) and pre-disease intermediate biomarkers are numerous, but their analytical methods vary and have produced very different, even contradictory, results [5,6,7]. Commonly used approaches mainly fall into the following three categories. The first approach is change-score analysis, which models the association of baseline exposure with subsequent biomarker change [7,8,9]. The second is concurrent change-change analysis, which evaluates the association of exposure change with biomarker change within the same timespan [7, 10,11,12,13,14,15]. The last is lagged change-change analysis, which models the association of previous exposure change with subsequent biomarker change [5,6,7, 12]. An empirical study has thoroughly evaluated and compared these three approaches using three well-known large-scale prospective cohorts [7]. The results showed that concurrent change-change analysis produced the most robust, consistent, and biologically plausible estimates and was therefore the recommended analytical method for assessing the relationship of diet with weight gain in prospective cohort studies [7]. Since then, this method has been widely used in longitudinal studies to explore whether and to what extent a change in diet leads to a parallel change, over a relatively short time, in weight (or other adiposity measures such as BMI and waist circumference) [16,17,18,19,20,21] as well as in many cardiometabolic and inflammation-related biomarkers [22,23,24].
Although concurrent change-change analysis is appealing in epidemiology and empirical evidence suggests that it outperforms other analysis methods, few studies explicitly elucidate and compare the rationale of the above three approaches from the perspective of causal inference, including the underlying causal model, the causal effect estimand, and the appropriate causal interpretation. Therefore, this article aims to understand and compare these methods under the framework of causal inference, illustrate them intuitively with a directed acyclic graph (DAG), and further clarify the strengths and limitations of the recommended concurrent change-change analysis through simulations, thereby explaining why concurrent change-change analysis usually works while the others do not, and under what circumstances it should work.
Theoretical interpretations
The underlying causal model and estimand
Concurrent change-change analysis
The intuitive idea of concurrent change-change analysis is to capture the covariation pattern of exposure and outcome, and then answer the question of whether and how changes in exposure cause changes in disease markers. This approach models only the within-individual variation and can thus remove the influence of between-individual heterogeneity, provided the model assumptions are satisfied. In fact, concurrent change-change analysis is identical to the fixed effects model (FEM) developed in the econometric literature. FEM is a classical causal inference method commonly applied to repeated measures data in econometrics and sociology, and is based on the principle of self-control [25, 26]. The typical linear causal model for the two-way FEM is:
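The display equation for model (1) is not reproduced in this text. A form consistent with the symbol definitions in the next paragraph (an assumed reconstruction, not a verbatim copy of the published equation) is:

\[ {y}_{it}=\beta {x}_{it}+\gamma {z}_{it}+{u}_{i}+{\lambda }_{t}+{\epsilon }_{it} \tag{1} \]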
Here, to aid the understanding of this causal model, we use the illustrative example of evaluating the effect of dairy intake on weight in a prospective cohort study. Thus, \({y}_{it}\) is the weight of individual \(i\) measured at time \(t\), \({x}_{it}\) is the dairy intake of this individual collected at time \(t\), \({z}_{it}\) denotes other observed time-varying covariates (e.g. intake of other foods, physical activity, sleep status, and so on), \({u}_{i}\) denotes the effect of unobserved individual-specific characteristics (such as genetic predisposition), \({\lambda }_{t}\) represents the time-specific effects (reflecting the effects of unobserved time-varying variables, such as economic growth, health literacy and so on), and \({\epsilon }_{it}\) is the random error term. It is worth noting that FEM assumes a concurrent influence of exposure on the outcome indicator (\({x}_{it}\to {y}_{it}\)); that is, the targeted estimand of FEM is the effect of exposure on the contemporaneously measured outcome.
Suppose we have only two-wave (\(t=0, 1\)) panel data on \(i=1,\cdots ,N\) individuals. We can obtain two equations according to model (1):
Differencing the above two equations wipes out the time-invariant unobserved term \({u}_{i}\):
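Written out under the assumption that model (1) has the linear form \({y}_{it}=\beta {x}_{it}+\gamma {z}_{it}+{u}_{i}+{\lambda }_{t}+{\epsilon }_{it}\) (a reconstruction from the symbol definitions above, since the display equations are not reproduced in this text), the two wave-specific equations are:

\[ {y}_{i0}=\beta {x}_{i0}+\gamma {z}_{i0}+{u}_{i}+{\lambda }_{0}+{\epsilon }_{i0}, \qquad {y}_{i1}=\beta {x}_{i1}+\gamma {z}_{i1}+{u}_{i}+{\lambda }_{1}+{\epsilon }_{i1} \]

Subtracting the first from the second, with \(\varDelta\) denoting the wave-1 value minus the wave-0 value, the term \({u}_{i}\) cancels:

\[ \varDelta {y}_{i}=\left({\lambda }_{1}-{\lambda }_{0}\right)+\beta \varDelta {x}_{i}+\gamma \varDelta {z}_{i}+\varDelta {\epsilon }_{i} \tag{2} \]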
Model (2) is the typical analytical model used in concurrent change-change analysis. Therefore, the estimand of concurrent change-change analysis is that of FEM (\(\beta :{x}_{it}\to {y}_{it}\)), which in this example is the average causal effect of dairy intake on contemporaneously measured weight. The most appealing strength of this method is that it exploits the idea of self-control: it relies only on intra-individual variation, and the time-invariant unobserved heterogeneity (\({u}_{i}\), such as heterogeneous genetic background) is eliminated by differencing (other “within transformations” such as demeaning also eliminate this term). Such unmeasured time-invariant confounding is a common problem in observational studies, in which effect estimates usually rely on variation both within and between individuals.
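As a rough numerical check of the point that differencing and demeaning remove the same term, the sketch below simulates a two-wave panel and compares the first-difference estimate with the two-way demeaning ("within") estimate, alongside a naive pooled regression. The parameter values, the variable names, and the helpers `ols`, `first_difference` and `within` are illustrative assumptions and are not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, gamma = 2000, 1.0, 1.0            # assumed illustrative values

u = rng.standard_normal((n, 1))            # unobserved time-invariant term u_i
z = rng.standard_normal((n, 2))            # observed time-varying covariate
x = 0.5 * z + u + rng.standard_normal((n, 2))   # exposure depends on u_i
lam = np.array([0.5, 1.0])                 # time-specific effects
y = beta * x + gamma * z + u + lam + rng.standard_normal((n, 2))


def ols(y_vec, design):
    """Least-squares coefficients of y_vec on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y_vec, rcond=None)
    return coef


def first_difference(a):
    """Wave-1 value minus wave-0 value for each individual."""
    return a[:, 1] - a[:, 0]


def within(a):
    """Two-way within transformation: remove individual means, then wave means."""
    a = a - a.mean(axis=1, keepdims=True)
    return (a - a.mean(axis=0, keepdims=True)).ravel()


# First-difference regression; the intercept absorbs lambda_1 - lambda_0.
fd_design = np.column_stack([np.ones(n), first_difference(x), first_difference(z)])
beta_fd = ols(first_difference(y), fd_design)[1]

# Within regression on the demeaned, stacked data (no intercept needed).
fe_design = np.column_stack([within(x), within(z)])
beta_fe = ols(within(y), fe_design)[0]

# Naive pooled regression that ignores u_i, shown for contrast.
pooled_design = np.column_stack([np.ones(2 * n), x.ravel(), z.ravel()])
beta_pooled = ols(y.ravel(), pooled_design)[1]

print(f"first difference: {beta_fd:.3f}, within: {beta_fe:.3f}, pooled: {beta_pooled:.3f}")
```

With two balanced waves the first-difference and within estimates coincide up to floating-point error, whereas the pooled estimate also uses between-individual variation and so inherits the confounding from \({u}_{i}\).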
However, there are several vital causal identification assumptions for the application of FEM [26, 27]:

(i)
The strict exogeneity (SE) assumption of the error term: for each \(i=1,\cdots ,N\) and \(t=0,\cdots , T\),
\({\varvec{X}}_{\varvec{i}}\) and \({\varvec{Z}}_{\varvec{i}}\) are \(T\times 1\) vectors of the exposure variable and the covariates for unit \(i\), respectively, and \({\varvec{\lambda }}_{\varvec{t}}\) is a \(T\times 1\) vector of time-specific effect terms. This assumption forbids correlation of the current error \({\epsilon }_{it}\) with past, present, and future values of the regressors, which implies the absence of dynamic causal relationships between exposure and outcome variables across different periods, specifically including the causal relation between the past outcome \({y}_{i,t-1}\) and the current outcome \({y}_{it}\) (autocorrelation), between the past exposure \({x}_{i,t-1}\) and the current outcome \({y}_{it}\) (lag effects), and between the past outcome \({y}_{i,t-1}\) and the current exposure \({x}_{it}\) (reverse causation) [27].

(ii)
The common trend (CT) assumption for different individuals: for each \(t=0,\cdots , T\) and each possible exposure level \(x\),
\({y}_{it}^{x}\) denotes the potential (counterfactual) outcome of individual \(i\) at time \(t\) if the exposure were set to level \(x\). This assumption requires that individual outcome trajectories would be parallel to each other had individuals not changed their exposure level, which implies that the time-specific effects (\({\lambda }_{t}\)) are constant (or equivalently, unmeasured time-varying variables are identical) across individuals after conditioning on all the measured confounders.
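The formal statements of the two assumptions are not reproduced in this text. One common way to write them, offered here as an assumed formalization rather than the published equations (in particular, the exact conditioning sets are a guess based on the symbols defined above), is:

\[ E\left[{\epsilon }_{it}\mid {\varvec{X}}_{\varvec{i}},{\varvec{Z}}_{\varvec{i}},{u}_{i},{\varvec{\lambda }}_{\varvec{t}}\right]=0 \quad \text{(SE)} \]

\[ E\left[{y}_{it}^{x}-{y}_{i,t-1}^{x}\mid {\varvec{X}}_{\varvec{i}},{\varvec{Z}}_{\varvec{i}}\right]=E\left[{y}_{it}^{x}-{y}_{i,t-1}^{x}\mid {\varvec{Z}}_{\varvec{i}}\right] \quad \text{(CT)} \]

Read this way, SE says the error is mean-independent of the whole exposure and covariate history, and CT says the counterfactual trend at a fixed exposure level does not depend on the exposure history actually observed, given the measured covariates.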
Change-score analysis
Similar to a conventional cohort study, which estimates the effect of baseline exposure (\({x}_{0}\)) on the follow-up disease endpoint (\({y}_{1}\)) given that all participants are free of that disease at baseline (controlling \({y}_{0}\)), change-score analysis essentially aims to obtain the effect of \({x}_{0}\) on \({y}_{1}\) (the part that has not already been determined by \({y}_{0}\)) in the setting of continuous outcomes [28]. A possible underlying linear causal model is depicted as model (3), which allows for a “true state dependence” of the outcome over time (that is, the baseline outcome causally influences the subsequent outcome).
Rather than adjusting for the baseline outcome directly in the regression model, the construction of the change score (\(\varDelta y={y}_{1}-{y}_{0}\)) attempts to remove the influence of the baseline outcome through subtraction, and the analysis model is constructed as model (4):
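Models (3) and (4) are not reproduced in this text. A plausible pair, consistent with the description above but with symbol names (\(\alpha\), \(\rho\), the starred coefficients and the error terms \({e}_{i}\)) chosen here purely for illustration, is:

\[ {y}_{i1}=\alpha +\beta {x}_{i0}+\gamma {z}_{i0}+\rho {y}_{i0}+{\epsilon }_{i1} \tag{3} \]

\[ \varDelta {y}_{i}={y}_{i1}-{y}_{i0}={\alpha }^{*}+{\beta }^{*}{x}_{i0}+{\gamma }^{*}{z}_{i0}+{e}_{i} \tag{4} \]

In this sketch \(\rho\) carries the state dependence in (3), and (4) corresponds to fixing \(\rho =1\) instead of estimating it, which is what subtracting \({y}_{i0}\) amounts to.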
In terms of the targeted estimand, change-score analysis aims to estimate the effect of exposure on the outcome after a one-period timespan (typically several years in cohort studies), which in this example is the average causal effect of dairy intake many years ago on current weight. As discussed in a recent article, the role of the baseline outcome variable (\({y}_{0}\)) is key to the success of change-score analysis, which does not provide the desired causal-effect estimates (the effect of \({x}_{0}\) on \({y}_{1}\)) unless the baseline outcome variable is independent of the baseline exposure [28]. In addition, change-score analysis is always biased when there is unmeasured confounding [28].
Lagged change-change analysis
Compared with concurrent change-change analysis, lagged change-change analysis intends to guarantee the temporality of the association between exposure and outcome through a one-period lag. This method is identical to the “lagged first-difference (LFD)” model, the linear causal model of which is depicted as model (5).
Suppose we have three-wave (\(t=0, 1, 2\)) panel data on \(i=1,\cdots ,N\) individuals. We can obtain two equations according to model (5):
Differencing the above two equations, we obtain the typical analytical model used in lagged change-change analysis:
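The equations for this derivation are not reproduced in this text. Assuming model (5) is the one-period-lagged analogue of the fixed effects model (whether the covariate \(z\) is also lagged is an assumption made here), it would read:

\[ {y}_{it}=\beta {x}_{i,t-1}+\gamma {z}_{i,t-1}+{u}_{i}+{\lambda }_{t}+{\epsilon }_{it} \tag{5} \]

Applied to waves \(t=1\) and \(t=2\):

\[ {y}_{i1}=\beta {x}_{i0}+\gamma {z}_{i0}+{u}_{i}+{\lambda }_{1}+{\epsilon }_{i1}, \qquad {y}_{i2}=\beta {x}_{i1}+\gamma {z}_{i1}+{u}_{i}+{\lambda }_{2}+{\epsilon }_{i2} \]

and differencing gives, with \(\varDelta {y}_{i2}={y}_{i2}-{y}_{i1}\) and \(\varDelta {x}_{i1}={x}_{i1}-{x}_{i0}\):

\[ \varDelta {y}_{i2}=\left({\lambda }_{2}-{\lambda }_{1}\right)+\beta \varDelta {x}_{i1}+\gamma \varDelta {z}_{i1}+\varDelta {\epsilon }_{i2} \tag{6} \]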
It is not difficult to see that lagged change-change analysis is very similar to concurrent change-change analysis. It can also deal with the unmeasured time-invariant confounding \({u}_{i}\), except that it assumes the effect time window of exposure on outcome is lagged by one period rather than concurrent (compare causal models (1) and (5)). Therefore, the targeted estimand of lagged change-change analysis is the effect of exposure on the outcome measured one period later. The key to lagged change-change analysis is the correct specification of temporal lags: the estimates suffer from severe bias if the specified lag does not match the true time window of the causal effect in the real world [29, 30].
A succinct comparison and summary of different analysis approaches is given in Table 1.
Which estimand is more appropriate?
It is important to note that these three methods have distinct estimands, so the choice of method should depend on which estimand best aligns with the research question. We focus on prospective studies aiming to evaluate the effect of diet on intermediate disease markers. Which estimand is more appropriate in such a scenario? We think it is that of concurrent change-change analysis, for the following reasons:
First, in a typical prospective cohort study, dietary habits in the previous year are usually assessed retrospectively using relevant questionnaires, and the biomarkers are measured on the spot at each survey wave. In addition, repeated surveys are usually conducted at intervals of several years. Therefore, the concurrent effect estimand of concurrent change-change analysis corresponds to an effect time window of nearly one year, while the one-period lagged effect estimand of change-score analysis and lagged change-change analysis corresponds to the effect of diet many years ago on current intermediate disease biomarkers. Given that intermediate disease biomarkers are usually highly sensitive and reversible (fluctuating over short periods of time as exposure changes), the concurrent effect estimand is more plausible and relevant for the present research question.
Second, the empirical study has shown the unbiasedness and superiority of concurrent change-change analysis, which indirectly supports the rationality of its underlying causal model and estimand for this research question (if the estimand of the other two analysis methods captured the true causal mechanism, the empirical study would have shown a very different result).
The illustration using DAG
A DAG is a useful tool for visually displaying the causal relationships between variables and for indicating how to obtain a valid causal effect based on certain criteria (such as the back-door criterion) [31]. Given that the concurrent effect might be the most relevant and plausible estimand when studying diet and intermediate biomarkers in prospective cohorts, we construct a simplified DAG based on concurrent causal relations between variables, and illustrate why concurrent change-change analysis can, while change-score analysis and lagged change-change analysis cannot, produce a valid estimate of the causal effect of interest. Taking three-wave panel data as an example, the constructed DAG is shown in Fig. 1, and the desired estimand is \(\beta\), the concurrent effect of dairy intake on weight.
Concurrent change-change analysis
The concurrent change-change analysis model is:
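Model (7) is not reproduced in this text. Based on the discussion that follows, which refers to separate coefficients on \({X}_{0}\), \({X}_{1}\), \({Z}_{0}\) and \({Z}_{1}\) and to an intercept \(\alpha\), an assumed reconstruction is:

\[ \varDelta {Y}_{1}={Y}_{1}-{Y}_{0}=\alpha -\beta {X}_{0}+\beta {X}_{1}-\gamma {Z}_{0}+\gamma {Z}_{1}+\epsilon \tag{7} \]

which is the same as regressing \(\varDelta {Y}_{1}\) on \(\varDelta {X}_{1}={X}_{1}-{X}_{0}\) and \(\varDelta {Z}_{1}={Z}_{1}-{Z}_{0}\).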
First, we examine what the coefficient of \({X}_{0}\) (\(-\widehat{\beta }\)) estimates. The true causal path from \({X}_{0}\) to \(\varDelta {Y}_{1}\) is \({X}_{0}\to {Y}_{0}\to \varDelta {Y}_{1}\), and the true causal effect is \(-\beta\) (multiplying the path coefficients in the DAG, where \({Y}_{0}\) enters \(\varDelta {Y}_{1}={Y}_{1}-{Y}_{0}\) with coefficient \(-1\)). All the non-causal paths (back-door paths) between them are:

① \({X}_{0}\leftarrow {Z}_{0}\to {Y}_{0}\to \varDelta {Y}_{1}\);

② \({X}_{0}\leftarrow U\to {Y}_{0}\to \varDelta {Y}_{1}\);

③ \({X}_{0}\leftarrow U\to {Y}_{1}\to \varDelta {Y}_{1}\);

④ \({X}_{0}\leftarrow U\to {X}_{1}\to {Y}_{1}\to \varDelta {Y}_{1}\);

⑤ \({X}_{0}\leftarrow U\to {X}_{1}\leftarrow {Z}_{1}\to {Y}_{1}\to \varDelta {Y}_{1}\);
Path ① is blocked by conditioning on \({Z}_{0}\); the confounding caused by paths ② and ③ is offset because the unobserved characteristics have the same effect on the outcome in each wave; path ④ is blocked because \({X}_{1}\) is included in the regression model; \({X}_{1}\) is a collider on path ⑤, and adjusting for \({X}_{1}\) and \({Z}_{1}\) at the same time blocks path ⑤. Thus, all five non-causal paths are blocked or cancelled out, and the coefficient \(-\widehat{\beta }\) in model (7) is an unbiased estimate of the causal effect of \({X}_{0}\) on \(\varDelta {Y}_{1}\), that is, \(-\widehat{\beta }=-\beta\). Similarly, the coefficients of \({X}_{1}\), \({Z}_{0}\) and \({Z}_{1}\) are unbiased causal estimates for the corresponding explanatory variables; thus, \(\widehat{\beta }=\beta\), \(\widehat{\gamma }=\gamma\). In addition, the intercept \(\widehat{\alpha }\) in model (7) represents the time-specific effects on \(\varDelta {Y}_{1}\) (that is, \({\lambda }_{1}-{\lambda }_{0}\)).
Change-score analysis
The change-score analysis model is:
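Model (8) is not reproduced in this text. Since change-score analysis regresses the outcome change on baseline values only, an assumed form (the starred symbols are illustrative names, not taken from the article) is:

\[ \varDelta {Y}_{1}={\alpha }^{*}+{\beta }^{*}{X}_{0}+{\gamma }^{*}{Z}_{0}+\epsilon \tag{8} \]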
As shown above, the true causal effect of \({X}_{0}\) on \({\varDelta Y}_{1}\) is \(-\beta\). Therefore, even if model (8) yielded an unbiased estimate of the causal effect of \({X}_{0}\) on \({\varDelta Y}_{1}\), that estimate would be the opposite of the desired estimand \(\beta\). Furthermore, the non-causal path ④ (\({X}_{0}\leftarrow U\to {X}_{1}\to {Y}_{1}\to \varDelta {Y}_{1}\)) is open because \({X}_{1}\) is not adjusted for in model (8), which introduces further bias into the estimate. Thus, the estimates of change-score analysis are neither the desired estimand \(\beta\) nor its opposite value.
Lagged change-change analysis
The lagged change-change analysis model is:
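Model (9) is not reproduced in this text. Given that the following paragraph refers to an estimate \({\widehat{\beta }}^{**}\) and to coefficients on \({X}_{0}\) and \({X}_{1}\) that are restricted to opposite values, an assumed reconstruction is:

\[ \varDelta {Y}_{2}={Y}_{2}-{Y}_{1}={\alpha }^{**}-{\beta }^{**}{X}_{0}+{\beta }^{**}{X}_{1}-{\gamma }^{**}{Z}_{0}+{\gamma }^{**}{Z}_{1}+\epsilon \tag{9} \]

that is, a regression of \(\varDelta {Y}_{2}\) on the previous-period changes \({X}_{1}-{X}_{0}\) and \({Z}_{1}-{Z}_{0}\).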
According to the DAG, the true causal effect of \({X}_{0}\) on \({\varDelta Y}_{2}\) is zero, and the true causal effect of \({X}_{1}\) on \({\varDelta Y}_{2}\) is \(-\beta\). The lagged change-change analysis model wrongly restricts the coefficients of \({X}_{0}\) and \({X}_{1}\) to opposite values; therefore, no matter what the estimated value of \({\widehat{\beta }}^{**}\) is, it cannot be correct.
Summary
As described above, the causal model and estimand behind concurrent change-change analysis (that is, the FEM) are more plausible than those of change-score analysis and lagged change-change analysis when studying the relationship of diet with sensitive disease biomarkers. Furthermore, the FEM has the additional strength of eliminating unmeasured time-invariant confounding and thus improving the validity of the causal estimates, which has hardly been recognized in applied studies using concurrent change-change analysis. However, the success of FEM estimation depends on two vital causal identification assumptions, which might be violated in practical study settings and thus also lead to biased estimates (see the Discussion section). We must therefore be careful when adopting concurrent change-change analysis and interpreting its results.
Methods
Simulation design
The previous section clarified theoretically the underlying causal models of three common methods for studying the relationship between diet and intermediate disease markers in prospective studies, and used a DAG to illustrate why, under the most appropriate causal model, concurrent change-change analysis can produce unbiased results while the other two approaches cannot. In this section, we conduct several simulations to demonstrate the strengths and limitations of the recommended concurrent change-change analysis. We aim to (1) compare the unbiasedness of estimates from concurrent change-change analysis, cross-sectional analysis, change-score analysis, and lagged change-change analysis under different extents of confounding caused by unobserved individual-specific heterogeneity; and (2) investigate the performance of concurrent change-change analysis in scenarios with varying degrees of violation of the SE assumption or the CT assumption, respectively.
Simulation data
For the first purpose, the basic data generation model of the simulations is as follows, and the sample size and panel waves are set at 1000 and 3, respectively:
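The data generation equations are not reproduced in this text. Based on the parameter descriptions that follow (\(\delta\) and \(\theta\) acting on the exposure, \(\beta\), \(\gamma\) and a unit coefficient on \({u}_{i}\) acting on the outcome), an assumed reconstruction is:

\[ {x}_{it}=\delta {z}_{it}+\theta {u}_{i}+{\nu }_{it} \]

\[ {y}_{it}=\beta {x}_{it}+\gamma {z}_{it}+{u}_{i}+{\lambda }_{t}+{\epsilon }_{it} \]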
\({x}_{it}\) is a continuous exposure with \(\beta =1\), \({z}_{it}\) is a continuous observed confounding covariate with \(\gamma =1, \delta =0.5\), and \({\lambda }_{t}\) is the time-specific effect with \({\lambda }_{0}=0.5, {\lambda }_{1}=1,{\lambda }_{2}=1.5\). \({u}_{i}\) is the continuous unobserved individual heterogeneity term with an effect of \(\theta\) on the exposure and 1 on the outcome, and \({\nu }_{it}\) and \({\epsilon }_{it}\) are random error terms for exposure and outcome, respectively.
We model \({z}_{it}, {u}_{i}, {\nu }_{it}\) and \({\epsilon }_{it}\) as independent standard normal random variables (\({z}_{it}, {u}_{i}, {\nu }_{it}, {\epsilon }_{it}\sim N\left(0,1\right)\), that is, all of these variables have a mean of 0 and a standard deviation of 1), and then generate \({x}_{it}\) and \({y}_{it}\) according to the above models and effect parameters. We let \(\theta\) range from 0 to 1 in steps of 0.1 to represent the absence or presence of increasing degrees of unobserved confounding resulting from \({u}_{i}\).
For the second purpose, we simulate only two-wave panel data. In terms of violation of the SE assumption, we consider only the situation of the past outcome directly affecting the current outcome, for simplicity. We thus add a lagged outcome term with an effect of \(\rho\) to the outcome model as follows:
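The modified outcome equation is not reproduced in this text. Assuming the baseline outcome model is \({y}_{it}=\beta {x}_{it}+\gamma {z}_{it}+{u}_{i}+{\lambda }_{t}+{\epsilon }_{it}\), adding the lagged outcome term described above would give:

\[ {y}_{it}=\beta {x}_{it}+\gamma {z}_{it}+{u}_{i}+{\lambda }_{t}+\rho {y}_{i,t-1}+{\epsilon }_{it} \]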
We set \(\theta =1\) and let \(\rho\) range from 0 to 1 in steps of 0.1 to indicate the absence or presence of increasing degrees of autocorrelation of the outcome. The initial outcome value \({y}_{i,-1}\) is generated from the model \({y}_{i,-1}={u}_{i}+{\epsilon }_{i,-1}\), in which \({\epsilon }_{i,-1}\) is sampled from \(N\left(0,1\right)\); the other parameter settings and the sampling process are the same as above.
In terms of violation of the CT assumption, we assume that there exists unobserved time-varying confounding \({\lambda }_{it}\), that is, the time-specific effects are not constant across individuals. The data generation model is:
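The equations for this scenario are not reproduced in this text. One specification consistent with the description, in which \({\lambda }_{it}\) replaces \({\lambda }_{t}\) in the outcome and enters the exposure with coefficient \(\omega\), is given below. Placing \(\omega\) in the exposure equation is an assumption made here by analogy with the role of \(\theta\); the published model may instead attach \(\omega\) to the outcome.

\[ {x}_{it}=\delta {z}_{it}+\theta {u}_{i}+\omega {\lambda }_{it}+{\nu }_{it} \]

\[ {y}_{it}=\beta {x}_{it}+\gamma {z}_{it}+{u}_{i}+{\lambda }_{it}+{\epsilon }_{it} \]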
We model \({\lambda }_{it}\) as a standard normal variable (\({\lambda }_{it}\sim N\left(0,1\right)\)); we also set \(\theta =1\) and let \(\omega\) range from −1 to 1 in steps of 0.2 to reflect different directions and degrees of heterogeneous trends. Other parameter settings and simulation processes are identical to the above.
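The two violation scenarios can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' simulation program: the SE scenario adds \(\rho {y}_{i,t-1}\) to the outcome, the CT scenario places \(\omega {\lambda }_{it}\) in the exposure equation (an assumed specification), 200 replications are used instead of 1000 to keep it quick, and the function names are hypothetical.

```python
import numpy as np


def ols_slope(y, *regressors):
    """Coefficient of the first regressor in an OLS fit with an intercept."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def concurrent_under_se_violation(rho, rng, n=1000, theta=1.0, beta=1.0,
                                  gamma=1.0, delta=0.5, lam=(0.5, 1.0)):
    """Concurrent change-change estimate when y_{t-1} directly affects y_t."""
    u = rng.standard_normal(n)
    y_prev = u + rng.standard_normal(n)          # initial outcome y_{i,-1}
    x, y, z = [], [], []
    for lam_t in lam:                            # waves t = 0, 1
        z_t = rng.standard_normal(n)
        x_t = delta * z_t + theta * u + rng.standard_normal(n)
        y_t = (beta * x_t + gamma * z_t + u + lam_t
               + rho * y_prev + rng.standard_normal(n))
        x.append(x_t)
        y.append(y_t)
        z.append(z_t)
        y_prev = y_t
    return ols_slope(y[1] - y[0], x[1] - x[0], z[1] - z[0])


def concurrent_under_ct_violation(omega, rng, n=1000, theta=1.0, beta=1.0,
                                  gamma=1.0, delta=0.5):
    """Concurrent change-change estimate with individual-specific time effects."""
    u = rng.standard_normal((n, 1))
    z = rng.standard_normal((n, 2))
    lam_it = rng.standard_normal((n, 2))         # unobserved, varies by person and wave
    # Assumed specification: omega scales the effect of lam_it on the exposure.
    x = delta * z + theta * u + omega * lam_it + rng.standard_normal((n, 2))
    y = beta * x + gamma * z + u + lam_it + rng.standard_normal((n, 2))
    return ols_slope(y[:, 1] - y[:, 0], x[:, 1] - x[:, 0], z[:, 1] - z[:, 0])


def mean_estimate(estimator, level, reps=200, seed=0):
    """Average the estimator over `reps` simulated datasets."""
    rng = np.random.default_rng(seed)
    return float(np.mean([estimator(level, rng) for _ in range(reps)]))


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        print("rho", rho, round(mean_estimate(concurrent_under_se_violation, rho), 3))
    for omega in (-1.0, 0.0, 1.0):
        print("omega", omega, round(mean_estimate(concurrent_under_ct_violation, omega), 3))
```

Under these assumed models the first-difference estimate stays close to the true \(\beta =1\) only when \(\rho =0\) or \(\omega =0\), and moves away from it as either parameter grows in magnitude.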
Analysis models
The concurrent change-change analysis model:
The cross-sectional analysis model:
The change-score analysis model:
The lagged change-change analysis model:
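The equations for these four analysis models are not reproduced in this text. Under the assumption that each is an ordinary linear regression adjusted for the observed covariate \(z\), they could be written, in the order listed, as follows (the choice of waves and the generic symbols \(a\), \(b\), \(c\), \(e\) are illustrative, not taken from the article):

\[ \varDelta {y}_{i1}=a+b\,\varDelta {x}_{i1}+c\,\varDelta {z}_{i1}+{e}_{i} \quad \text{(concurrent change-change)} \]

\[ {y}_{it}=a+b\,{x}_{it}+c\,{z}_{it}+{e}_{it} \quad \text{(cross-sectional)} \]

\[ \varDelta {y}_{i1}=a+b\,{x}_{i0}+c\,{z}_{i0}+{e}_{i} \quad \text{(change-score)} \]

\[ \varDelta {y}_{i2}=a+b\,\varDelta {x}_{i1}+c\,\varDelta {z}_{i1}+{e}_{i} \quad \text{(lagged change-change)} \]

where \(\varDelta {y}_{it}={y}_{it}-{y}_{i,t-1}\), and in each case \(b\) is the coefficient compared with the true \(\beta\).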
For each simulation scenario, we draw 1000 simulated datasets, produce the effect estimates using the corresponding analysis methods, and compute the mean of the estimates and their standard errors. A simplified flowchart of the simulation studies is provided in Supplementary Figure S1.
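The procedure for the first simulation aim can be sketched as below. This is an illustrative reimplementation under stated assumptions rather than the authors' code: the exposure is taken to be \({x}_{it}=\delta {z}_{it}+\theta {u}_{i}+{\nu }_{it}\), the cross-sectional and change-score models use the baseline wave, the concurrent model uses the change between waves 0 and 1, and only 200 replications are drawn to keep the run short. All function names are hypothetical.

```python
import numpy as np


def simulate_panel(n, theta, rng, beta=1.0, gamma=1.0, delta=0.5,
                   lam=(0.5, 1.0, 1.5)):
    """Draw one three-wave panel under the concurrent (fixed effects) model."""
    waves = len(lam)
    z = rng.standard_normal((n, waves))     # observed time-varying confounder
    u = rng.standard_normal((n, 1))         # unobserved individual heterogeneity
    nu = rng.standard_normal((n, waves))    # exposure error
    eps = rng.standard_normal((n, waves))   # outcome error
    x = delta * z + theta * u + nu
    y = beta * x + gamma * z + u + np.asarray(lam) + eps
    return x, y, z


def ols_slope(y, *regressors):
    """Coefficient of the first regressor in an OLS fit with an intercept."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def four_estimates(x, y, z):
    """Apply the four analysis models to one simulated panel."""
    dx1, dz1 = x[:, 1] - x[:, 0], z[:, 1] - z[:, 0]
    dy1, dy2 = y[:, 1] - y[:, 0], y[:, 2] - y[:, 1]
    return {
        # outcome change on same-period exposure change
        "concurrent": ols_slope(dy1, dx1, dz1),
        # baseline outcome level on baseline exposure level
        "cross_sectional": ols_slope(y[:, 0], x[:, 0], z[:, 0]),
        # outcome change on baseline exposure level
        "change_score": ols_slope(dy1, x[:, 0], z[:, 0]),
        # outcome change on previous-period exposure change
        "lagged": ols_slope(dy2, dx1, dz1),
    }


def monte_carlo(theta, n=1000, reps=200, seed=0):
    """Mean of each estimator over `reps` simulated datasets."""
    rng = np.random.default_rng(seed)
    draws = [four_estimates(*simulate_panel(n, theta, rng)) for _ in range(reps)]
    return {name: float(np.mean([d[name] for d in draws])) for name in draws[0]}


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        means = monte_carlo(theta)
        print(theta, {name: round(value, 3) for name, value in means.items()})
```

Because the data are generated under the concurrent model only, this sketch can show how the other estimators behave when that model is true, but not how they would behave if their own causal models held.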
Results
The trends in the estimated coefficients from the different analysis methods under various degrees of unobserved heterogeneity are shown in Fig. 2. Concurrent change-change analysis always yields unbiased estimates, as expected. Cross-sectional analysis produces unbiased estimates only when there is no unobserved confounding, and increasingly biased estimates as the unobserved heterogeneity grows (ranging from 0.999 to 1.499). However, the estimates of change-score analysis (ranging from −0.995 to −0.500) and lagged change-change analysis (remaining around −0.5) are always biased and in the opposite direction to the true causal effect, even without any unobserved heterogeneity. (See Supplementary Table S1 for full details.)
The trends in the mean estimates from concurrent change-change analysis under different degrees of violation of the SE or the CT assumption are shown in Fig. 3. The estimation bias changes approximately linearly as the violation becomes more serious (the mean estimate falls from 1.000 to 0.501 under violation of the SE assumption, and rises from 0.500 to 1.501 under violation of the CT assumption), with unbiased estimates when the assumptions are not violated. (See Supplementary Table S2 for more details.)
Discussion
Overview
This study thoroughly examines, within the causal inference framework, three analysis approaches for evaluating diet and intermediate disease markers in prospective studies, and uses simulations to demonstrate the strengths and pitfalls of the concurrent change-change analysis recommended in applied research. We find that the underlying causal model and targeted estimand differ between the analysis methods. Specifically, concurrent change-change analysis concerns the contemporaneous effect of exposure on outcome, while change-score analysis and lagged change-change analysis target the effect of exposure on outcome after a one-period timespan. In the setting of prospective cohorts with repeated measures at several-year intervals, estimating the concurrent effect of diet on sensitive biomarkers (corresponding to an effect window of nearly one year) is more relevant and plausible in practice, and the corresponding concurrent change-change analysis yields robust and unbiased estimates even with serious unobserved time-invariant confounding. Nevertheless, the SE and CT assumptions are prerequisites for applying concurrent change-change analysis, and violating them also leads to biased results.
Rationality and strength of concurrent change-change analysis
Given that the targeted estimand and the implied causal model are distinct for these three analysis methods, the fundamental criterion for judging the applicability of a method is which estimand is most relevant to the specific research question. As mentioned above, we think concurrent change-change analysis targets the most appropriate estimand in prospective studies aiming to evaluate the effect of diet on intermediate disease markers. The sensitivity and reversibility of intermediate biomarkers imply that the effect of exposure generally occurs within a short period of time; in other words, recent exposures are much more important than exposures in the distant past. In many randomized controlled trials (RCTs) evaluating the effect of diet or physical activity interventions on weight loss or improvement of cardiometabolic markers, the intervention lasts from several weeks to two years, within which researchers often observe significant favorable effects [32,33,34,35]. However, weight or those biomarkers usually rebound during longer follow-up after the end of the intervention [36, 37]. This phenomenon is consistent with the above viewpoint and potentially reinforces the rationality of the concurrent effect assumption. In addition to targeting a more plausible estimand, concurrent change-change analysis can circumvent the unobserved time-invariant (or relatively stable in the short term) confounding that plagues observational studies, arising from factors such as personality, genetic susceptibility, and cultural customs [38].
Pitfalls and relevant progress of concurrent change-change analysis (FEM)
Is concurrent change-change analysis a panacea for research questions about the relation between diet and disease biomarkers, given its favorable performance in empirical studies and its more plausible causal model in theory? The answer is of course no. From the perspective of FEM, the use of concurrent change-change analysis is conditional on two vital identification assumptions, which might be violated in practical research scenarios. For example, the SE assumption requires that the past outcome does not directly affect the subsequent outcome, thus attributing the correlation in the outcome over time to the stable unobserved individual-specific heterogeneity \({u}_{i}\) or to the temporal correlation of other factors influencing the outcome (\({x}_{it}\) or \({z}_{it}\)) [26]. This may be correct for many sensitive and reversible biomarkers, but not for others that usually indicate irreversible organic or pathological changes (for instance, extreme glucose metabolism indicators can reflect islet damage [39]). In that situation, the past biomarker level does causally affect the later biomarker level, thus violating the SE assumption. As for the CT assumption, it requires complete homogeneity of unmeasured time-varying variables across individuals; however, many ubiquitous unmeasured time-varying health determinants, such as health awareness and behavioral predisposition, tend to show strong individual heterogeneity. Fortunately, there has been some methodological progress in relaxing the SE or CT assumption in such situations [26, 40,41,42]. The most classical method to relax the SE assumption is to add the lagged dependent variable term \({y}_{i,t-1}\) to the traditional FEM (called the Dynamic Fixed Effects Model, DFEM) and to combine instrumental variable methods with the generalized method of moments procedure to obtain estimates based on first-differenced data [41]. The simplest and most common method to relax the CT assumption is to construct the fixed-effects model with individual-specific constants and slopes (FEIS) and estimate it through second differencing, thus allowing the time-specific effects or the unobserved time-varying variables to be heterogeneous [26].
In addition to the SE and CT assumptions, two other potential limitations of concurrent change-change analysis are worth noting. First, such within-individual estimators discard information and lose precision (low statistical power), and thus might require a larger sample, more waves of data, and sufficient variation over time in the exposure [38, 43, 44]. Second, this method is unable to deal with the problem of reverse causality. On the one hand, modeling the (cross-sectional) relationship between concurrent exposure and outcome cannot by itself clarify the causal order, although the proper temporality can be guaranteed by the data collection method and process (collecting the exposure retrospectively). On the other hand, if reverse causation from the previous outcome to the current exposure exists (especially for biomarkers that are known to or monitored by the study participants; for example, deterioration of blood glucose could cause individuals to modify their future lifestyles), concurrent change-change analysis also yields biased estimates. Other methods, such as the cross-lagged panel model with fixed effects, might be useful in such situations [29].
Recommendations

(1)
How to choose the appropriate analysis method, and when could we adopt the concurrent changechange analysis?
Several key factors should be considered when choosing among these methods. Firstly, the research question nature and the true temporal relationship between concerned variables is the fundamental criterion, we should employ the concurrent changechange analysis when focus on immediate or shortterm effect, and would prefer the changescore analysis or lagged changechange analysis when aiming to estimate the delayed or lagged effect. Secondly, we should contemplate whether there are important unobservable individual or groupspecific effects that lead to confounding, if there exists such unobserved heterogeneity, the changescore analysis is not a useful method, while the other two methods can deal with such problem. Finally, the autocorrelation and serial dependency is another key point, which is considered in the changescore analysis while is not allowed in the other two methods. In conclusion, we should make the choice carefully according to different scenarios and correctly interpret the results of different methods.
Our study focuses on a specific scenario: prospective studies that seek to estimate the causal relation between diet and sensitive intermediate disease biomarkers, in which most or all of the effect of the exposure occurs within a short time. Moreover, in the specific research question there should be neither dynamic causal relationships between exposure and outcome across different periods (such as “true state dependence” of the outcome indicators over time) nor clear unobserved time-varying heterogeneity. Under these conditions, concurrent change-change analysis is the most relevant and leads to the most robust and biologically plausible results, comparable to those of RCTs.

(2) How to conduct the concurrent change-change analysis?
To reiterate, careful and stringent examination of whether the SE and CT assumptions hold is necessary; if the specific research scenario diverges substantially from these assumptions, DFEM or FEIS models with corresponding estimation methods might be alternative solutions. When concurrent change-change analysis is appropriate, we can directly model the association between the change in exposure and the parallel change in the outcome indicator while adjusting for the changes in observed time-varying confounders. There is no need to include any time-invariant covariates (because both unobserved and observed time-invariant terms cancel out, provided the effects of these variables are constant over time) or the baseline levels of confounders or outcomes (because when the SE assumption is satisfied, the previous outcome is not a cause of the later outcome, although adjustment for the previous/baseline outcome is quite common in applied longitudinal studies).
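The modeling step described above can be sketched in a few lines. This is an illustrative example under stated assumptions, not the authors' simulation code: it assumes two survey waves, a linear model with a constant exposure effect, and ordinary least squares on first differences. The parameter values and the helpers `simulate_two_waves` and `ols_slopes` are hypothetical.

```python
import numpy as np

def simulate_two_waves(n=5000, beta=0.5, gamma=0.3, seed=0):
    """Simulate exposure x, time-varying confounder c and outcome y at two waves."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n, 1))   # unobserved time-invariant confounder
    c = rng.normal(size=(n, 2))   # observed time-varying confounder
    x = 0.8 * u + 0.5 * c + rng.normal(size=(n, 2))
    y = beta * x + gamma * c + 1.0 * u + rng.normal(size=(n, 2))
    return x, c, y

def ols_slopes(predictors, response):
    """Least-squares slopes (intercept fitted but not returned)."""
    design = np.column_stack([np.ones(len(response)), predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef[1:]

x, c, y = simulate_two_waves()

# Concurrent change-change analysis: change in outcome on change in exposure,
# adjusted for the change in the observed time-varying confounder only.
dx, dc, dy = x[:, 1] - x[:, 0], c[:, 1] - c[:, 0], y[:, 1] - y[:, 0]
beta_change = ols_slopes(np.column_stack([dx, dc]), dy)[0]

# Cross-sectional comparison at wave 2, which cannot remove the unobserved u.
beta_cross = ols_slopes(np.column_stack([x[:, 1], c[:, 1]]), y[:, 1])[0]

print(f"concurrent change-change estimate: {beta_change:.3f}")
print(f"cross-sectional estimate: {beta_cross:.3f}")
```

Because `u` enters both waves with the same coefficient, it drops out of the differenced variables, so the change-change estimate should lie close to the assumed true effect of 0.5, while the cross-sectional estimate is pulled upward by the omitted `u`.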
Strengths and weaknesses
To the best of our knowledge, this is the first study to thoroughly dissect three commonly used analysis approaches for diet and intermediate disease markers in prospective research from a causal inference perspective. It confirms the superiority of the recommended concurrent change-change analysis in theory and in simulation, which should support the sound application of these methods and improve research quality. However, several limitations or caveats are worth noting. First, this study covers and interprets only three mainstream methods; other analysis approaches used in similar applied studies were not considered. Second, we generated simulated data based only on the causal model of the FEM and did not consider those of the other analysis approaches, because empirical and theoretical evidence suggests that it is the most plausible for the research questions we care about. Third, the simulations in the present study are oversimplified. We did not use a specific illustrative research question and did not set the effect parameters and variable distributions according to empirical data, which might make it difficult to relate the simulations to reality. However, we mainly aimed to show intuitively that concurrent change-change analysis generally outperforms the other methods but returns biased estimates when its vital assumptions are violated. The magnitude of the bias resulting from an improper analysis method or from violation of the model identification assumptions in a specific research scenario is beyond the scope of this article.
Conclusions
In conclusion, the commonly used change-score analysis, concurrent change-change analysis and lagged change-change analysis target different estimands with different interpretations. Concurrent change-change analysis might be the best method for studying the causal relation between diet and intermediate biomarkers: it targets the most plausible estimand and greatly reduces the otherwise intractable bias from unobserved individual heterogeneity in observational studies. Although this method is highly recommended, the vital assumptions behind it should always be kept in mind.
Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files.
Abbreviations
CT: Common trend
DAG: Directed acyclic graph
DFEM: Dynamic fixed effects model
FEIS: Fixed-effects model with individual-specific constants and slopes
FEM: Fixed effects model
FFQ: Food frequency questionnaire
SE: Strict exogeneity
References
Schwingshackl L, Hoffmann G, Iqbal K, Schwedhelm C, Boeing H. Food groups and intermediate disease markers: a systematic review and network meta-analysis of randomized trials. Am J Clin Nutr. 2018;108:576–86. https://doi.org/10.1093/ajcn/nqy151.
Yetley EA, DeMets DL, Harlan WR Jr. Surrogate disease markers as substitutes for chronic disease outcomes in studies of diet and chronic disease relations. Am J Clin Nutr. 2017;106:1175–89. https://doi.org/10.3945/ajcn.117.164046.
Branca F, Hanley AB, Pool-Zobel B, Verhagen H. Biomarkers in disease and health. Br J Nutr. 2001;86(Suppl 1):55–92. https://doi.org/10.1079/bjn2001339.
Pico C, Serra F, Rodriguez AM, Keijer J, Palou A. Biomarkers of nutrition and health: new tools for new approaches. Nutrients. 2019;11(5):1092. https://doi.org/10.3390/nu11051092.
Choi Y, Larson N, Gallaher DD, Odegaard AO, Rana JS, Shikany JM, Steffen LM, Jacobs DR Jr. A shift toward a plant-centered diet from young to middle adulthood and subsequent risk of type 2 diabetes and weight gain: the Coronary Artery Risk Development in Young Adults (CARDIA) study. Diabetes Care. 2020;43:2796–803. https://doi.org/10.2337/dc20-1005.
Larsen SC, Mikkelsen ML, Frederiksen P, Heitmann BL. Habitual coffee consumption and changes in measures of adiposity: a comprehensive study of longitudinal associations. Int J Obes (Lond). 2018;42:880–6. https://doi.org/10.1038/ijo.2017.310.
Smith JD, Hou T, Hu FB, Rimm EB, Spiegelman D, Willett WC, Mozaffarian D. A comparison of different methods for evaluating diet, physical activity, and long-term weight gain in 3 prospective cohort studies. J Nutr. 2015;145:2527–34. https://doi.org/10.3945/jn.115.214171.
Du H, van der A DL, Boshuizen HC, Forouhi NG, Wareham NJ, Halkjaer J, Tjonneland A, Overvad K, Jakobsen MU, Boeing H, et al. Dietary fiber and subsequent changes in body weight and waist circumference in European men and women. Am J Clin Nutr. 2010;91:329–36. https://doi.org/10.3945/ajcn.2009.28191.
Vergnaud AC, Norat T, Romaguera D, Mouw T, May AM, Travier N, Luan J, Wareham N, Slimani N, Rinaldi S, et al. Meat consumption and prospective weight change in participants of the EPIC-PANACEA study. Am J Clin Nutr. 2010;92:398–407. https://doi.org/10.3945/ajcn.2009.28713.
Mozaffarian D, Hao T, Rimm EB, Willett WC, Hu FB. Changes in diet and lifestyle and long-term weight gain in women and men. N Engl J Med. 2011;364:2392–404. https://doi.org/10.1056/NEJMoa1014296.
Smith JD, Hou T, Ludwig DS, Rimm EB, Willett W, Hu FB, Mozaffarian D. Changes in intake of protein foods, carbohydrate amount and quality, and long-term weight change: results from 3 prospective cohorts. Am J Clin Nutr. 2015;101:1216–24. https://doi.org/10.3945/ajcn.114.100867.
Martin-Calvo N, Chavarro JE, Falbe J, Hu FB, Field AE. Adherence to the Mediterranean dietary pattern and BMI change among US adolescents. Int J Obes (Lond). 2016;40:1103–8. https://doi.org/10.1038/ijo.2016.59.
Olstad DL, Lamb KE, Thornton LE, McNaughton SA, Crawford DA, Minaker LM, Ball K. Prospective associations between diet quality and body mass index in disadvantaged women: the Resilience for Eating and Activity despite Inequality (READI) study. Int J Epidemiol. 2017;46:1433–43. https://doi.org/10.1093/ije/dyx040.
Wang T, Heianza Y, Sun D, Huang T, Ma W, Rimm EB, Manson JE, Hu FB, Willett WC, Qi L. Improving adherence to healthy dietary patterns, genetic risk, and long term weight gain: gene-diet interaction analysis in two prospective cohort studies. BMJ. 2018;360:j5644. https://doi.org/10.1136/bmj.j5644.
Xue Q, Li X, Ma H, Tao Z, Heianza Y, Rood JC, Bray GA, Sacks FM, Qi L. Changes in pedometer-measured physical activity are associated with weight loss and changes in body composition and fat distribution in response to reduced-energy diet interventions: the POUNDS Lost trial. Diabetes Obes Metab. 2022. https://doi.org/10.1111/dom.14662.
Stern D, Middaugh N, Rice MS, Laden F, Lopez-Ridaura R, Rosner B, Willett W, Lajous M. Changes in sugar-sweetened soda consumption, weight, and waist circumference: 2-year cohort of Mexican women. Am J Public Health. 2017;107:1801–8. https://doi.org/10.2105/ajph.2017.304008.
Auerbach BJ, Littman AJ, Krieger J, Young BA, Larson J, Tinker L, Neuhouser ML. Association of 100% fruit juice consumption and 3-year weight change among postmenopausal women in the Women’s Health Initiative. Prev Med. 2018;109:8–10. https://doi.org/10.1016/j.ypmed.2018.01.004.
Liu X, Li Y, Tobias DK, Wang DD, Manson JE, Willett WC, Hu FB. Changes in types of dietary fats influence long-term weight change in US women and men. J Nutr. 2018;148:1821–9. https://doi.org/10.1093/jn/nxy183.
Konieczna J, Romaguera D, Pereira V, Fiol M, Razquin C, Estruch R, Asensio EM, Babio N, Fito M, Gomez-Gracia E, et al. Longitudinal association of changes in diet with changes in body weight and waist circumference in subjects at high cardiovascular risk: the PREDIMED trial. Int J Behav Nutr Phys Act. 2019;16:139. https://doi.org/10.1186/s12966-019-0893-3.
Gonzalez-Morales R, Canto-Osorio F, Stern D, Sanchez-Romero LM, Torres-Ibarra L, Hernandez-Lopez R, Rivera-Paredez B, Vidana-Perez D, Ramirez-Palacios P, Salmeron J, et al. Soft drink intake is associated with weight gain, regardless of physical activity levels: the health workers cohort study. Int J Behav Nutr Phys Act. 2020;17:1. https://doi.org/10.1186/s12966-020-00963-2.
Lim CGY, Whitton C, Rebello SA, van Dam RM. Diet quality and lower refined grain consumption are associated with less weight gain in a multi-ethnic Asian adult population. J Nutr. 2021;151:2372–82. https://doi.org/10.1093/jn/nxab110.
Trichia E, Luben R, Khaw KT, Wareham NJ, Imamura F, Forouhi NG. The associations of longitudinal changes in consumption of total and types of dairy products and markers of metabolic risk and adiposity: findings from the European Investigation into Cancer and Nutrition (EPIC)-Norfolk study, United Kingdom. Am J Clin Nutr. 2020;111:1018–26. https://doi.org/10.1093/ajcn/nqz335.
Baden MY, Satija A, Hu FB, Huang T. Change in plant-based diet quality is associated with changes in plasma adiposity-associated biomarker concentrations in women. J Nutr. 2019;149:676–86. https://doi.org/10.1093/jn/nxy301.
Glenn AJ, Hernandez-Alonso P, Kendall CWC, Martinez-Gonzalez MA, Corella D, Fito M, Martinez JA, Alonso-Gomez AM, Warnberg J, Vioque J, et al. Longitudinal changes in adherence to the portfolio and DASH dietary patterns and cardiometabolic risk factors in the PREDIMED-Plus study. Clin Nutr. 2021;40:2825–36. https://doi.org/10.1016/j.clnu.2021.03.016.
Gangl M. Causal inference in sociological research. Ann Rev Sociol. 2010;36:21–47. https://doi.org/10.1146/annurev.soc.012809.102702.
Brüderl J, Ludwig V. Fixed-effects panel regression. In: Best H, Wolf C, editors. The SAGE handbook of regression analysis and causal inference. London: SAGE Publications Ltd; 2015. p. 327–58.
Imai K, Kim IS. When should we use unit fixed effects regression models for causal inference with longitudinal data? Am J Polit Sci. 2019;63:467–90. https://doi.org/10.1111/ajps.12417.
Tennant PWG, Arnold KF, Ellison GTH, Gilthorpe MS. Analyses of ‘change scores’ do not estimate causal effects in observational data. Int J Epidemiol. 2021. https://doi.org/10.1093/ije/dyab050.
Leszczensky L, Wolbring T. How to deal with reverse causality using panel data? Recommendations for researchers based on a simulation study. Sociol Methods Res. 2019. https://doi.org/10.1177/0049124119882473.
Vaisey S, Miles A. What you can—and can’t—do with three-wave panel data. Sociol Methods Res. 2016;46:44–67. https://doi.org/10.1177/0049124114547769.
Morgan SL, Winship C. Counterfactuals and Causal Inference: Methods and Principles for Social Research. 2nd ed. Cambridge: Cambridge University Press; 2014.
Kraus WE, Bhapkar M, Huffman KM, Pieper CF, Das SK, Redman LM, Villareal DT, Rochon J, Roberts SB, Ravussin E, et al. 2 years of calorie restriction and cardiometabolic risk (CALERIE): exploratory outcomes of a multicentre, phase 2, randomised controlled trial. Lancet Diabetes Endocrinol. 2019;7:673–83. https://doi.org/10.1016/s2213-8587(19)30151-2.
Roager HM, Vogt JK, Kristensen M, Hansen LBS, Ibrugger S, Maerkedahl RB, Bahl ML, Lind MV, Nielsen RL, Frokiaer H, et al. Whole grain-rich diet reduces body weight and systemic low-grade inflammation without inducing major changes of the gut microbiome: a randomised crossover trial. Gut. 2019;68:83–93. https://doi.org/10.1136/gutjnl-2017-314786.
Salas-Salvado J, Diaz-Lopez A, Ruiz-Canela M, Basora J, Fito M, Corella D, Serra-Majem L, Waernberg J, Romaguera D, Estruch R, et al. Effect of a lifestyle intervention program with energy-restricted Mediterranean diet and exercise on weight loss and cardiovascular risk factors: one-year results of the PREDIMED-Plus trial. Diabetes Care. 2019;42:777–88. https://doi.org/10.2337/dc18-0836.
Look AHEAD Research Group, Pi-Sunyer X, Blackburn G, Brancati FL, Bray GA, Bright R, Clark JM, Curtis JM, Espeland MA, Foreyt JP, et al. Reduction in weight and cardiovascular disease risk factors in individuals with type 2 diabetes: one-year results of the Look AHEAD trial. Diabetes Care. 2007;30:1374–83. https://doi.org/10.2337/dc07-0048.
Barte JC, ter Bogt NC, Bogers RP, Teixeira PJ, Blissmer B, Mori TA, Bemelmans WJ. Maintenance of weight loss after lifestyle interventions for overweight and obesity, a systematic review. Obes Rev. 2010;11:899–906. https://doi.org/10.1111/j.1467-789X.2010.00740.x.
Nordmo M, Danielsen YS, Nordmo M. The challenge of keeping it off, a descriptive systematic review of high-quality, follow-up studies of obesity treatments. Obes Rev. 2020;21:e12949. https://doi.org/10.1111/obr.12949.
Gunasekara FI, Richardson K, Carter K, Blakely T. Fixed effects analysis of repeated measures data. Int J Epidemiol. 2014;43:264–9. https://doi.org/10.1093/ije/dyt221.
Da Silva Xavier G. The cells of the islets of Langerhans. J Clin Med. 2018;7. https://doi.org/10.3390/jcm7030054.
Baltagi BH, Moon HR, Perron B, Phillips PCB. Incidental Parameters and Dynamic Panel Modeling. In The Oxford Handbook of Panel Data; 2015.
Wooldridge JM. More topics in linear unobserved effects models. In: Econometric Analysis of Cross Section and Panel Data. The MIT Press; 2010. pp. 345–94.
Moon HR, Weidner M. Linear regression for panel with unknown number of factors as interactive fixed effects. Econometrica. 2015;83:1543–79. https://doi.org/10.3982/ecta9382.
Hill TD, Davis AP, Roos JM, French MT. Limitations of fixed-effects models for panel data. Sociol Perspect. 2019;63:357–69. https://doi.org/10.1177/0731121419863785.
Treiman DJ. Quantitative data analysis: doing social research to test ideas. 1st ed. New York, NY: Wiley; 2009.
Acknowledgements
Not applicable.
Funding
This study was primarily funded by the National Natural Science Foundation of China (Grant No. 82273740 to XX). XZ was supported by the National Natural Science Foundation of China (Grant No. 81973151). The funding bodies played no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.
Author information
Contributions
DT, XX, and XZ designed the study. DT and XX wrote the analysis plan and drafted the manuscript. DT performed the data analysis. YH and NZ made significant contributions to editing and critically revising all drafts and the final paper. XZ had primary responsibility for final content. All authors have read and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Figure S1. The simplified flowchart of the simulation studies. Table S1. The mean of estimates and standard errors using different analysis methods under different degrees of unobserved heterogeneity. Table S2. The mean of estimates and standard errors using concurrent change-change analysis under different degrees of violation of the strict exogeneity assumption or the common trend assumption.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Tang, D., Hu, Y., Zhang, N. et al. Change analysis for intermediate disease markers in nutritional epidemiology: a causal inference perspective. BMC Med Res Methodol 24, 49 (2024). https://doi.org/10.1186/s12874-024-02167-9
DOI: https://doi.org/10.1186/s12874-024-02167-9