Skip to main content

Bayesian hierarchical piecewise regression models: a tool to detect trajectory divergence between groups in long-term observational studies



Bayesian hierarchical piecewise regression (BHPR) modeling has not been previously formulated to detect and characterise the mechanism of trajectory divergence between groups of participants that have longitudinal responses with distinct developmental phases. These models are useful when participants in a prospective cohort study are grouped according to a distal dichotomous health outcome. Indeed, a refined understanding of how deleterious risk factor profiles develop across the life-course may help inform early-life interventions. Previous techniques to determine between-group differences in risk factors at each age may result in biased estimate of the age at divergence.


We demonstrate the use of Bayesian hierarchical piecewise regression (BHPR) to generate a point estimate and credible interval for the age at which trajectories diverge between groups for continuous outcome measures that exhibit non-linear within-person response profiles over time. We illustrate our approach by modeling the divergence in childhood-to-adulthood body mass index (BMI) trajectories between two groups of adults with/without type 2 diabetes mellitus (T2DM) in the Cardiovascular Risk in Young Finns Study (YFS).


Using the proposed BHPR approach, we estimated the BMI profiles of participants with T2DM diverged from healthy participants at age 16 years for males (95% credible interval (CI):13.5–18 years) and 21 years for females (95% CI: 19.5–23 years). These data suggest that a critical window for weight management intervention in preventing T2DM might exist before the age when BMI growth rate is naturally expected to decrease. Simulation showed that when using pairwise comparison of least-square means from categorical mixed models, smaller sample sizes tended to conclude a later age of divergence. In contrast, the point estimate of the divergence time is not biased by sample size when using the proposed BHPR method.


BHPR is a powerful analytic tool to model long-term non-linear longitudinal outcomes, enabling the identification of the age at which risk factor trajectories diverge between groups of participants. The method is suitable for the analysis of unbalanced longitudinal data, with only a limited number of repeated measures per participants and where the time-related outcome is typically marked by transitional changes or by distinct phases of change over time.

Peer Review reports


Child- to-adult trajectories of health markers are likely to have implications for the risk of chronic diseases in later life, such as obesity, type 2 diabetes mellitus (T2DM) and cardiovascular diseases; it is therefore important to understand their development throughout the life-course [1,2,3,4].

Long-running observational studies that follow the same subjects participants across the life-course are especially suited to studying adult onset disorders, such as cardiometabolic disease, since they allow characterizing the development of normal vs. pathological processes overtime. A goal of such studies is often to determine how a number of patient characteristics, modifiable risk factors profiles [1, 5], their interactions and normal aging may impact the onset and progression of disease over time [6,7,8] in order to identify time periods of divergence in these factors [911].

A key statistical issue in these studies is often to determine whether the risk factor levels vary over time between and within groups of participants, and whether different groups are changing in a similar or different fashion over time [12, 13]. Depending on the study, the stratification of participants into groups can relate to participants’ characteristics or exposure (i.e. smoking status), intervention arm (i.e. control vs. medication), or it could be a later health outcome (i.e. disease status in mid-adulthood). When participants are grouped according to a distal dichotomous health outcome, longitudinal data provide the foundation to understand pathways to deleterious risk factor profiles, which may help inform the timing of interventions [8, 14, 15].

When it is established that groups of interest start with similar initial outcome levels, but do not change similarly overtime, it is often of interest to determine the point in time or age at which they start diverging in their trajectories [16,17,18,19,20]. Being able to determine how and when the change manifests between groups of participants is important, since it can help pinpoint periods in the life course that are critical in the development of abnormal risk factor profiles [21]. However, there is little methodological guidance in the literature on statistical techniques to achieve this, and several studies have noted a lack of relevant methods to investigate trajectory divergence between groups [20,21,22].

A common attempt is to fit a mixed model with time (or age) treated as categorical variable (i.e. non time-ordered/ordinated [23]) to retrieve linear predictions at each age for each group of interest from this model (i.e. means of least squares predictions, aka LS-means [24,25,26]), and to test for a group difference in these predictions using a number of contrasts (i.e. post-hoc pairwise comparisons). In this case, the age at which the difference between-groups emerge is often the age at which a significant between-group difference materializes in the LS-means [23, 27, 28]. Several studies have used this approach to determine at “what times the groups means are different” (e.g. between-subject effect or post-hoc pairwise group comparison, if there are more than two groups) and/or ‘at what times the means differ’ within each group (within-subject effect testing) [27, 29, 30]. However, even when adjustments are applied for multiple tests [27, 31,32,33], many authors advise against the unrestricted use of multiple comparisons among marginal means due to well-documented multiple testing issues, especially the increase in false positive rate as the number of hypothesis tests increases [30, 3437]. Mixed models that assume an unstructured mean response by treating age or time as categorical variables tend to be over parameterized and may be inefficient at detecting main effects [38]. Another crucial disadvantage of this approach, is that it only tests for the difference in means between groups at each time point and does not provide any information on subject-specific response evolution in time [13, 39], so that the age (or point in time) at which the group difference manifests is ultimately a question of sample size and statistical power.

In contrast, continuous time models such as individual-based trajectory modeling methods, including mixed effect [12], hierarchical [40], multilevel [41] and the closely related structural equation and Latent Growth Curve models [42], have become invaluable tools to understand the natural history of health outcome as well as risk factor/determinant trajectories [14, 43,44,45]. They have advantages over traditional approaches to repeated-measure data analysis; their main feature being that they allow summarizing each participant’s outcome trajectory with a few trajectory parameters [39, 46]. In addition, they permit the explicitly modeling of inter-individual differences in intra-individual change, permitting inference regarding the average response trajectory over time and how this evolution may vary with participant characteristics (i.e. participant-level predictors) [47,48,49,50].

Despite their flexibility, these models are not often used to analyse sparse long-term observational data since accelerated longitudinal designs [14, 22, 51] and non-linear response overtime [44, 52,53,54] both introduce significant complexity into the growth curve modeling approach [55,56,57,58]. Indeed, being able to represent non-linear patterns with a relatively small number of measurement occasions per participants (often <10 time points) and be specific about where between-participant heterogeneity appears in those patterns is a statistical challenge.

Many applications often relied on higher order time (or age) polynomials or latent basis coefficients [14, 20, 44, 59,60,61], which strengths and limitations have been described elsewhere [9, 46, 62,63,64,65]. In the context of our study the polynomial parameterisation of the growth model does not specifically yield an age or point in time when the growth pattern is changing within-and between-groups. Alternatively, piecewise models, also known as linear splines or broken stick models, can be used to break up a non-linear or curvilinear growth trajectory into several separate linear components [66]. They are particularly useful to compare growth rates in different periods over time if the functional form of the response is characterised by different phases of development, or if there is a shift in the outcome trajectory at some point in the event window (i.e. an acceleration or a deceleration in the response change rate from one point in time (or age)) [67,68,69,70,71,72,73]. Piecewise linear trajectory models have been used to model ‘multiphase’ developmental processes primarily with ‘fixed’ transition points in a variety of applications in the frequentist multilevel [40, 45, 69, 74, 75] and structural equation modeling framework [42, 76]. Bayesian applications of these processes are often referred to as ‘random change point model’ where the position of individual breakpoints is also estimated, allowing for between-person variability in the transition points [77,78,79,80,81,82,83,84,85,86].

Few studies have, however, investigated the inclusion of categorical covariates or grouping variables as level 2 predictors of the variability in the change point, and the random Bayesian change point model has not, to our knowledge, been formulated to test specifically for the existence of a ‘trajectory divergence’ between two (or more) known groups of participants that have longitudinal responses characterised by distinct developmental phases. In this paper we illustrate the use of Bayesian hierarchical piecewise regression modeling to detect trajectory divergence between groups of participants using longitudinal BMI data from the Cardiovascular Risk in Young Finns (YFS) Study, a well phenotyped prospective cohort with measures from multiple time-points. Previously published work on this data, based on categorical mixed modeling, suggested that BMI levels became statistically different between those who develop T2DM in adulthood and those who did not from the age of 15 years [87]. We re-analyse this data set to demonstrate how the Bayesian method can be used to (1) model the BMI profiles to better understand the natural history of the BMI trajectories in those who do and do not develop T2DM in adulthood while controlling for potential cohort effects, and (2) obtain a refined estimate and confidence interval of the age at which the two groups start diverging from one another, translating into significantly different BMI from a certain age onwards. In addition, we conduct a series of short simulations to illustrate the difference in the estimates of age at divergence when using the traditional approach (i.e. pairwise comparisons of marginal means from a categorical mixed model) vs. the proposed trajectory modeling approach.


Statistical model

No-covariate model

To accommodate the curvilinear developmental pattern in an individual continuous response over time while providing an adequate representation of its developmental theory, we consider a linear-linear piecewise regression model as the functional form of change in the trajectory model. The ‘change point’ (CP) represents the age (or time) at which the transition to a different growth rate occurs. We consider the following unconditional (no covariates) multilevel model:

Level 1 model:

$$ Repons{e}_{i j}={b}_{0 i}+{b}_{1 i} ag{e}_{i j}.\left(1-{u}_{C{P}_i}\left( ag{e}_{i j}\right)\right)+{b}_{2 i}\left( ag{e}_{i j}- C{P}_i\right).{u}_{C{P}_i}\left( ag{e}_{i j}\right)+{\varepsilon}_{i j} $$

Level 2 model:

$$ \begin{array}{c}\hfill {b}_{0 i}={\beta}_{00}+{v}_{0 i}\hfill \\ {}\hfill {b}_{1 i}={\beta}_{10}+{v}_{1 i}\hfill \\ {}\hfill {b}_{2 i}={\beta}_{20}+{v}_{2 i}\hfill \\ {}\hfill C{P}_i= CP+{v}_{CPi}\hfill \end{array} $$
$$ \left(\begin{array}{c}\hfill \begin{array}{c}\hfill \begin{array}{c}\hfill {v}_{0 i}\hfill \\ {}\hfill {v}_{1 i}\hfill \end{array}\hfill \\ {}\hfill {v}_{2 i}\hfill \end{array}\hfill \\ {}\hfill {v}_{CP i}\hfill \end{array}\right)\sim N\left[\left(\begin{array}{c}\hfill 0\hfill \\ {}\hfill 0\hfill \\ {}\hfill 0\hfill \\ {}\hfill 0\hfill \end{array}\right),\left(\begin{array}{cccc}\hfill {\sigma}_{v0}^2\hfill & \hfill \dots \hfill & \hfill \cdots \hfill & \hfill \dots \hfill \\ {}\hfill {\sigma}_{v01}\hfill & \hfill {\sigma}_{v1}^2\hfill & \hfill \dots \hfill & \hfill \dots \hfill \\ {}\hfill {\sigma}_{v02}\hfill & \hfill {\sigma}_{v12}\hfill & \hfill {\sigma}_{v2}^2\hfill & \hfill \dots \hfill \\ {}\hfill {\sigma}_{v0 CP}\hfill & \hfill {\sigma}_{v1 CP}\hfill & \hfill {\sigma}_{v2 CP}\hfill & \hfill {\sigma}_{CP}^2\hfill \end{array}\right)\right] $$

Where at age j for participant i, Response ij is the repeated continuous outcome measures, and age ij is the corresponding time related variables centered around its grand mean. \( {u}_{C{ P}_i}\left( ag{e}_{i j}\right) \) is a unit heavyside step function where \( {u}_{C{ P}_i}\left( ag{e}_{i j}\right) \) =1 if age ij  ≥ CP i and \( {u}_{C{ P}_i}\left( ag{e}_{i j}\right)=0 \) if age ij  < CP i . The random trajectory parameters b 0i , b 1i and b 2i correspond to the individual intercept, slope before and slope after the person-specific change point CP i , respectively. For each person i, b 0i controls the individual baseline level (or initial status) for the response and its interpretation depends on the centering of the age variable (e.g. if age is centered around 25 years, b 0i will be the expected participant-level response at 25 years of age given they are in the first phase of growth b 1i ). b 1i , b 2i and CP i , are the expected linear increase per year of age in the first phase of growth, the expected linear rate of increase after the change point, and age at which the linear perturbation to the initial trend occurs, respectively. ε ij is the level-1 residual (i.e. random within-person error for person i at age j) and is independent and normally distributed (i.e. ε ij  ~ iid N(0, σ 2 e )). v 0i , v 1i , v 2i and v CPi are the level-2 random effects, multivariate normally distributed with zero mean and variances σ 2 v0 , σ 2 v1 , σ 2 v2 and σ 2 CP respectively and full covariance matrix as shown in 1.3 . β 00, β 10, β 20 and CP are the fixed effects (i.e. population average of each trajectory parameter). In this model, the level 1 residual variance σ 2 e can be interpreted as the deviations around an individual’s trajectory and level-2 residuals as between-participant variability in the overall intercept (σ 2 v0 ), in the rate of change before and after the change point CP i (σ 2 v1 and σ 2 v2 respectively), and in the change point itself (σ 2 CP ), respectively.

Model with group-effect

To explore heterogeneity in individual trajectories between groups of interests, the unconditional segmented growth model can be expanded by including time-varying covariates (TVCs) at level-1 and time invariant covariates (TICs) at level 2, while simultaneously adjusting for the effects of variables measured on participants at all time points. Whereas TICs directly predict the growth parameters, TVCs directly predict the repeated measures while controlling for the influence of the growth parameters [43, 88]. If the TIC variable is a binary dummy grouping factor (“GRP i ”), identifying participants coming from two identified groups, the model can be rewritten as follows:

Level 1 model:

$$ Respons{e}_{i j}={b}_{0 i}+{b}_{1 i} ag{e}_{i j}.\left(1-{u}_{C{P}_i}\left( ag{e}_{i j}\right)\right)+{b}_{2 i}\left( ag{e}_{i j}- C{P}_i\right).{u}_{C{P}_i}\left( ag{e}_{i j}\right)+ T V{C}_{i j}+{\varepsilon}_{i j} $$

Level 2 model:

$$ \begin{array}{c}\hfill {b}_{0 i}={\beta}_{00}+{\beta}_{0 grp} GR{P}_i+{v}_{0 i}\hfill \\ {}\hfill {b}_{1 i}={\beta}_{10}+{\beta}_{1 grp} GR{P}_i+{v}_{1 i}\hfill \\ {}\hfill {b}_{2 i}={\beta}_{20}+{\beta}_{2 grp} GR{P}_i+{v}_{2 i}\hfill \\ {}\hfill C{P}_i= CP+{\beta}_{CP} GR{P}_i+{v}_{CP i}\hfill \end{array} $$

Where β 00, β 10, β 20 and CP are the expected trajectory parameters for the reference group (at zero values for other potential covariates); β 0grp , β 1grp , β 2grp  and β CP are the expected intergroup variations in these parameters for participants in the second group (i.e. respectively, in the mean response, in the linear age effect, in the deviation from linear rate after the CP and in the CP timing); and v 0i , v 1i , v 2i and v CPi are the level-2 residuals person i for intercept, slopes, and age at the change point after controlling for group differences. To test for a between-group difference in one trajectory parameter only, ‘GRP’ can be included as a level-2 predictor for the parameter of interest, and model all other growth parameters as random effects only (as in 1.2). For each of the p + 1 individual growth parameters, additional participant-specific covariates (TICs) can be included in a similar fashion to have multiple predictors at level 2 as follows: \( {b}_{p i}={\beta}_{p0}+{\displaystyle \sum_{q=1}^{Qp}}{\beta}_{p q}{x}_{q i}+{u}_{p i} \), with x qi , the qth measured TIC; β pq ,the effect of the TIC x qi on the (p + 1) th trajectory parameter; and u pi , the (p + 1) th random effect. The set of p + 1 random effects for person i assumed to be multivariate normally distributed with covariance matrix of dimension (p + 1) * (p + 1), although simpler variance-covariance structures of the random effects can be considered during model building (i.e. mutual independence of the random effects). It is advisory to standardize TVCs in order to stabilize the variance, improve normality of errors and linearity of the mean [88]. The common assumption for the error structure is ε ij  ~ iid N(0, σ 2 e ) but it can be relaxed to include time specific variances or residual error correlation such as AR1 errors.

The same approach can be used to expand the hierarchical piecewise trajectory model with grouping factors that have more than 2 levels. This is one of the possible approaches to test for a cohort-effect on the development of curvilinear responses over time when data arises from multi-cohort or accelerated longitudinal designs [89, 90]. If study participants belong to one of k possible birth cohorts, k-1 binary dummy variables are created to identify observations coming from people born in the same calendar year, and as in 2.2, these new k-1 grouping variables are introduced as level 2 predictors of the different trajectory parameters in the model. The binary dummy variables are introduced to sequentially shift the conditional means of each of the different trajectory parameters. The fixed effects will be the average trajectory parameters for the cohort chosen as the reference cohort in the study sample, and each (β cohort )1.. k − 1 coefficient will thus be interpreted as the variation in growth parameters in the corresponding k-1th cohort compared to the reference cohort.

Trajectory divergence mechanisms

The equation 2.2 above, allows for between-group difference in each of the 4 trajectory parameters of the piecewise model. (i.e. intercept, slope before and after the change point (CP), and the change point itself). If the focus is to determine and model the divergence in the trajectories between group, then model 2.2 can be modified by forcing the intercept and slope before the CP to be invariant across groups by setting β 0grp and β 1grp to zero at level 2 in equation 2.2. As illustrated in Fig. 1, we identify 3 possible ways in which continuous outcomes trajectories can diverge over time between groups: (1) type 1: the two groups have different slope after the CP, (2) Type 2: the two groups have different change points, and (3) Type 3: the two groups have different CP and post-CP slopes. To test for group-difference at different stages of the outcome development, our approach consists in fitting these 3 possible conditional Bayesian hierarchical models to the data and comparing model fit to determine which mechanisms provides the best representation of the underlying development of the outcome between groups of participants.

Fig. 1

Three hypothetical models of between-group divergence in curvilinear response trajectories over time. Red and black solid lines indicate the average response curve of participants belonging to one or the other group; dashed lines show the position and age at change point(s) for the two groups of participants, or the age at which trajectories diverge between the two groups. Graph obtained using simulated data

Bayesian estimation of the hierarchical piecewise model

We used a Bayesian approach to estimate and summarize the parameters of interest in the conditional multilevel piecewise model (formula 2.2) [86, 91]. In our illustrative example, all models were fit in RJAGS and R2JAGS in R.. In combined Bayesian notation, the trajectory model with a binary grouping status ‘GRP’ as the TIC covariate interacting with all 4 trajectory parameters can be written as follows:

$$ \begin{array}{l} Respons{e}_{i j}\sim Normal\left( m{u}_{i j},{\sigma}^2\right)\\ {} m{u}_{i j} = {v}_{0 i}+{\beta}_{0 grp} GR{P}_i + \left({v}_{1 i}+{\beta}_{1 grp} GR{P}_i\right) ag{e}_{i j} + \left({v}_{2 i}+{\beta}_{2 grp} GR{P}_i\right)\left( ag{e}_{i j}-\Big({v}_{CPi}+ C{P}_{grp} GR{P}_i\right)\Big){}^{+}\end{array} $$

To ensure that the effect of ‘group’ on each trajectory parameter can be either positive or negative and that the prior information does not dominate the likelihood, uninformative priors for the fixed group effects β 0grp , β 1grp , β 2grp , CP grp can be set as N ~ (0, 104). In vector notation, the random effects v i  = (v oi , v 1i  , v 2i  , v cpi  )T are assumed to follow a multivariate normal distribution with mean β and unstructured 4 x4 variance-covariance matrix φ as in 1.3, where β = (β 0, β 1, β 2, CP)T, the vector of population means. Traditionally in Bayesian analysis for random effects, InvWishart(Σ,k) is used as a conjugate prior to the unknown variance-covariance matrix of multivariate normal distributions, where Σ is a positive definite inverse scale matrix of degree of freedom k [93]. Inverse-Gamma (λ1, λ2) is often used as the conjugate prior to the variance of univariate normal distribution (i.e. for mutually independent random effects, and model error variance σ 2). Alternative prior distributions may be used for level 2 variances of independent random effects or for the variance components of multivariate normal distributions [41, 92, 93].

Significance of group-differences in trajectory parameters

Testing for group-differences in trajectory parameters is equivalent to investigating the significance of the grouping covariates parameters at level 2 in the hierarchical change point model. In the Bayesian context, this is done by looking at the posterior probability density for the " β grp ” parameters in 2.2. (i.e. β 0grp , β 1grp ., β 2grp .and β CP ) of the estimated covariate parameters. For example, the effect of ‘GRP’ on each trajectory parameter is significant if the 95% Bayesian credible intervals (CI) of the estimated regressors (i.e. each “β grp ”) exclude zero, in which case, the estimated “β grp ” can be interpreted as the shifts in each trajectory parameter in one group compared to the other group [77, 92, 94].

Model convergence, fit and adequacy

The choice of the best model among the suite of candidate (conditional) Bayesian hierarchical models can be based on two criteria: (1) the deviance information criterion DIC [95, 96], an index of quality of fit that is commonly used for Bayesian model comparison [97], and (2), the Bayesian posterior predictive p-value (PP p-value), obtained through posterior predictive checking of the likelihood of each potential model (the sum of residuals was used as a as a discrepancy measure) [41].

Illustrative data

We illustrate the application of the proposed Bayesian piecewise modeling approach by using it to investigate the divergence in child-to adult trajectories of BMI between participants who do and do not develop adult T2DM in a well-studied ongoing population-based prospective cohort, the Cardiovascular Risk in YFS [15]. Details on study design and on the collection of cardiovascular risk factors between 1980 and 2011 are published elsewhere [98] and summarized in Additional file 1.

In a previously published work on the YFS cohort, elevated BMI in children between 9 and 18 years was associated with an increased risk of developing T2DM in adulthood [87]. Additionally, a sex- and insulin-adjusted mixed model incorporating participants ages as a categorical variable, suggested that differences in average BMI values between those who do and those who do not develop adult T2DM tended to emerge during adolescence, becoming marginally significant from the age of 15 years onwards. In this approach, the between-group difference at each age groups was assessed by pairwise comparisons of the predicted marginal means (i.e. LS-means), and did not incorporate BMI trajectory information at the individual or population level. In contrast, the proposed hierarchical piecewise regression approach considers and makes full use of individual trajectory information to test for group-differences at specific stages of BMI development from childhood to adulthood. Unlike categorical approaches, the proposed growth model provides a clearer representation of the underlying pathological BMI development among those who develop T2DM in adulthood.

In our illustration, we include 2540 YFS participants (1401 females and 1139 males) followed-up a maximum of six times between 1980 and 2011 (Additional file 2: Table S1). Information on adult T2DM status was collected on participants at their latest individual adult follow-up (i.e. dichotomous outcome coded 0 for participants without T2DM, and 1 for those with T2DM in 2001, 2007, or 2011). Included participants had at least one BMI measure available in childhood (i.e. in 1980, 1983 or 1986 between age 3 and 18 years). Participants had on average 4.98 repeated measures of BMI over the study period, with 90.7% of participants having 4 or more BMI measures (Additional file 2: Figure S1). 88 included participants (3.5%, 44 females and 44 males) had T2DM in adulthood. We excluded BMI observations made among those aged 3 years in 1980 so that the ages of participants considered in the trajectory analysis ranged from 6 to 49 years. 3 year olds were not included in the analysis since only 3 participants in this birth cohort developed T2DM in adulthood. Furthermore, the lack of BMI measures between 3 and 6 years, prevented modeling the downwards slope from infancy peak, nor the age at adiposity rebound, which usually occurs before age 6 years in normal weight children [99, 100]. Using BMI data collected on participants aged 6 years and over, we expect that most participants had reached this important childhood milestone, and that a linear trend was thus an appropriate functional form to approximate childhood BMI growth from that age (Additional file 3: Figure S1).

Since sex differences in childhood growth and pubertal timing have been demonstrated [101, 102], subsequent BMI trajectory modeling between age 6 and 49 years was conducted among males and females separately [103]. BMI, especially in adulthood, is slightly right skewed, but using log10 transformed BMI in the modeling approach presented below did not alter our conclusions. For ease of interpretation, we thus present results using untransformed BMI only.

Visual inspection of the sex-specific smoothed BMI trajectories confirms the presence of a divergence between the two groups in adolescence (Additional file 3: Figure S2). Compared to participants who remain healthy, those who develop T2DM seem to have greater average BMI levels by the time they are young adults, although it is unclear whether this divergence results from a group-difference in the timing at which the transition to a slower BMI growth rate happens (Type II divergence) from a group-difference in rate itself after puberty (Type I divergence), or from both (Type III divergence).

Although the distal outcome of ‘adult T2DM’ is the grouping factor of interest in our illustrative trajectory divergence analyses, we also demonstrate how the same modeling approach can be used to investigate potential inter-cohort variation in childhood to adulthood BMI trajectories by considering models with ‘year of birth’ as a categorical level 2 predictor of each of the 4 trajectory parameters. Individual age- and sex-specific BMI Z-scores at the first clinic (in 1980) were also included as level 2 predictors of each BMI trajectory parameters to investigate if systematic deviation from participants of comparable age and sex at baseline had any influence on the development of BMI trajectories later in life. All continuous covariates used in the analyses were standardized in order to stabilize the variance, improve normality of errors and linearity of the mean.

Specific values for the hyperparameters used in our illustrative analyses are given in Additional file 4. While in principle φ can be unstructured, in our application, convergence for some parameters could not be reached when considering an unrestricted covariance structure between all four random effects in the unconditional change point model (equations 1.1 and 1.2), probably due to over parameterisation. Because initial analyses suggested a correlation between the slopes before the change point (b 1i ) and the difference in slopes after the change point (b 2i ), we constrained the model by including a non-zero correlation between these two random effects but setting independence for all other random effects, leading to a block diagonal structure of φ (Additional file 4). Based on DIC, this covariance structure was preferred over mutually independent random effects for both males and females (Additional file 4), and used when expending the trajectory models with level 2 predictors. In our application, we investigated prior sensitivity by fitting the unconditional BMI trajectory model using three sets of priors for the hyperparameters (Additional file 4). Because we found that the choice of hyperparameters had a minor influence on the marginal posterior distributions, for subsequent (conditional) analyses, we chose to report posterior estimates of parameters estimated from the set of priors that yielded the lowest DIC in the sensitivity analyses (Additional file 4). In this set the priors for the means of the change points were based on the sex-specific estimates that maximized the profile log likelihood for the fixed (population-average) breakpoints in the unconditional model (estimated at 16 years for females and 22 for males, see estimation method in Additional file 5). Using these priors for the change point means also kept computation running times reasonable.

For the other participants varying variable included in the analysis, sex- and age-specific BMI z-scores at the first visit and birth cohort, priors were set to N ~ (0,0.001) for all corresponding parameters (i.e. all β cohort priors and β initialBMI − z − score ). To remain consistent with previous analyses of this data set [87], time-varying measures of fasting insulin were log-transformed and standardized before being included as a level-1 predictor in the Bayesian hierarchical models to improve right skewedness and to linearize its relationship with BMI About 17% of the insulin measures were not available in the data. The missing data mechanism for ‘insulin’ was considered non-informative, as we have no reason to believe that the probability of an individual insulin measure being missing depends on the true value of this missing insulin observation (although it may be related to other observed variables for that individual). We thus consider that insulin is missing at random (MAR), and we specify a prior for this covariate [104]. Since log (insulin) is approximately normally distributed, we specify a N ~ (μ log(insulin), τ log(insulin) ) likelihood for log(insulin) i and place a vague prior on its variance (i.e. τ log(insulin) ~ Gamma(0.001,0,001) ). Under this parametrization, the posterior predictive distribution for μ log(insulin) and τ log(insulin) will be informed by the observed part of the data only. Although individual insulin measurements change at each data collection point, by adding log(insulin) as a level 1 covariate in the multilevel model, the estimated relationship between insulin and BMI development remains constant across time [45]. This is a reasonable assumption in our application, since data exploration did not suggest any systematic patterns of change in insulin levels at the intra-individual level as people age. That is, the age smoother estimate obtained by fitting a generalized additive mixed model had an estimated degree of freedom (edf) close to 1 and was not significant (p-value >0.3), which did not suggest a non-linear relationship between log(insulin) and age [105].

Approximate posterior distributions of the parameters of the models considered throughout the analyses are obtained via MCMC simulations. Each model ran with 4 independent parallel chains of the Gibbs sampler (see Appendix 3 for an example of code). For each model, the first 50000 iterations were discarded in a burn-in run, and the draws from the posterior were thinned by a factor of 10 to reduce serial correlation of the chains. The following 20000 iterations were used to obtain posterior distributions of model parameters by mixing the 4 sequences. Model convergence was assessed through MCMC iterations traceplots and Gelman-Rubin diagnostic [92], and residual errors were plotted to confirm they approximately followed a normal distribution.


Divergence of BMI profiles in T2DM and non-T2DM YFS participants

Following the modeling approach presented in the Methods and the priors and their corresponding hyperparameters (Additional file 4: Table S1) we fitted the following set of conditional Bayesian hierarchical piecewise models for each sex: unconditional (Model A), adult T2DM status adjusted intercept (Model B), adult T2DM status adjusted childhood slope (Model C), adult T2DM status adjusted adult slope (Model D), adult T2DM status adjusted change point (CP) (Model E), adult T2DM status adjusted CP and adult slope (Model F), adult T2DM status adjusted change point, childhood and adult slopes (Model G), adult T2DM status adjusted intercept, and change point (Model G), and a model with all four parameters adjusted for adult T2DM status (Model H). As mentioned above, previous research on this data set suggested BMI levels were not significantly different between the two groups in childhood [87]. Models C (i.e. group difference in childhood slopes) and B (i.e. BMI response consistently higher in one group across the life course) were thus fitted to demonstrate our modeling approach. An annotated extract showing the RJAGS code syntax used to fit Model E is available in Additional file 6.

For both sex, the lowest DIC was obtained when fitting model E, which was also the best fitting model with PP p-values close to 0.5 (Table 1). This supported the type II divergence mechanism where a difference in BMI levels emerged between the two groups due to a group difference in the change point timing. BMI growth rate in adulthood for both sexes was decreased by two-thirds compared to childhood (i.e. 0.67- vs. 0.18 -, and 0.61- vs. 0.15 kg/m2 per year in childhood and adulthood for females and males respectively), and participants who developed T2DM had similar BMI yearly rates in adulthood compared to those who remained healthy (β2T2DM effect not significant in model F for both sex Table 2). However, females who developed T2DM reached their developmental transition in BMI rate on average 12.37 years later (Table 2).

Table 1 Analyses of the divergence in BMI trajectories between T2DM adults and non-T2DM adults: assessment of Bayesian model complexity (effective number of parameters pD), and fit (deviance information criteria DIC) for each candidate model
Table 2 Posterior mean parameter estimates for Bayesian hierarchical Piecewise BMI trajectory for best fitting trajectory divergence models in males and females (Models E)

Similarly for males, estimated BMI growth rates were not markedly different between the two T2DM groups in childhood or in adulthood, and comparable to those estimated in females (Table 2). But again, compared with healthy adults, those who developed T2DM reached their slower BMI growth rate on average 6.47 years later.

The effect of the time-varying covariate of insulin at level 1 was significant for both males and females, with a 1-sd increase in log(Insulin) resulting in a BMI observation increased by 2.6 and 2.8 kg/m2 respectively (i.e. exp(β log(insulin)), Table 2). To assess potential differences in the magnitude of the insulin effect as a function of between-person characteristics, we expanded model E by including an interaction between ‘adult T2DM status’ and log(insulin). For each sex, the estimated parameters were not significant (95% CI included 0), suggesting that the effect of insulin on BMI was homogenous between the two groups and across genders.

The estimates of the variance-covariance parameters of model E showed that the correlation between an individual’s BMI growth rate in childhood and adulthood is equal to 0.61 for females and 0.47 for males, suggesting that children who have greater yearly BMI increase rates also have greater adult rates of increase (correlation estimated as: \( \raisebox{1ex}{${\sigma}_{\beta 1\beta 2}$}\!\left/ \!\raisebox{-1ex}{$\sqrt{\sigma_{\beta 1}^2*{\sigma}_{\beta 2}^2}$}\right. \), Table 2). The between-participant variation around the change point σ CP was comparable between males and females (Table 2).

Figure 2 shows the estimated population-average prototypical trajectories for each sex and T2DM group obtained from the estimated parameters for Model E, along with 100 trajectories predicted for each sex and T2DM group from Model E by Monte Carlo simulation. This illustrates a range of credible individual profiles generated under this model (see Appendix for code). For each sex and adult T2DM status group, Fig. 3. shows a box and whiskers plot of the estimated individual BMI slopes obtained from Model E after the average change point in the healthy group and before the T2DM groups reach their average CP (i.e. slopes between 16.02 and 28.4 years in females, and slopes between 21.62 and 28.09 years in males). Figure 3 illustrates that individual rates of change after puberty provides better discrimination of participants who went on to develop T2DM from those who did not, compared to punctual individual BMI levels at age 15 or 18 for females, and ages 21 and 24 for males (Additional file 2: Figure S2). While the distribution of BMI levels at age groups surrounding the age at divergence overlaps considerably (Additional file 2: Figure S2), individual slopes allow to differentiate participants who have switched to a rate consistent with a normal slowing down of BMI development after puberty, from those who are still on the trajectory of increasing BMI development consistent with the rate from childhood.

Fig. 2

Sex-specific population average prototypical BMI trajectories for healthy and T2DM adults in the YFS cohort (solid blue and solid red lines, respectively) and prediction of 200 individual trajectories for each sex (100 per T2DM status group). The dashed trajectories were obtained by MCMC simulation using sex-specific posterior estimates of mean and variance of growth parameters for the best fitting models (Model E). In these predictions, time varying measures of log(insulin) were set to the average log(Insulin) observed in the cohort

Fig. 3

Box and whiskers plot of fitted individuals random slopes between 16.02 and 28.4 years for females (a) and between 21.62 and 28.09 years for males (b). Individual random slopes are estimated from the Bayesian hierarchical random change point model E. Solid lines in the boxplot indicate the group-specific median for the slopes (equivalent to the 50th percentiles of the posterior distribution)

Effect of age-and sex-specific childhood Z-score on BMI trajectories

When including individual age-and sex-specific BMI z-scores at the first clinic as continuous level 2 predictors of each of the four growth parameters in sex-specific Models E, the only significant effect observed was for the childhood BMI slope, with a 1-sd increase in BMI z-score associated with a 0.056 (sd = 0.012) and a 0.038 (sd =0.009) increase in childhood (in kg/m2 per year) for male, and females respectively. This suggests that in the YFS sample, higher age- and sex-adjusted BMI at first visit in childhood were associated with faster BMI increase in childhood, but not with the age at transition in BMI development nor the change rate in adulthood.

Between cohort heterogeneity in BMI trajectories

To test whether ‘year of birth’ was associated with between-participant heterogeneity in the development of BMI from age 6 to 49 years, five binary dummy variables identifying BMI observations of people born in different years (i.e. 62, 65, 71, 74 and 77) were introduced as level 2 predictors of BMI growth parameters in sex-specific models (with year 1971 as the reference level) (Table 3). Increasing the complexity of the model did not improve model fit for males, and the lowest DIC was obtained for the unconditional model (Model A, Table 3) suggesting that their life-course BMI trajectory is more stable across cohorts. For females, model E marks a significant improvement in model fit, suggesting that the most significant predictor of between-cohort variations resides in the timing of the CP, although the best model was obtained when adjusting for a cohort effect on both the adult BMI growth rate and change point. For each sex, the posterior mean parameter estimates of the best fitting model are presented in Additional file 7. The results show that most of the between cohort variation for females is due to slight trajectory differences in two specific birth cohorts: those born in 1968, who reached the transition to adult BMI growth rate on average 2.89 years later than the cohort (year of birth 1971), and those born in 1974, who had adult BMI yearly rates increased by 0.06 (e.g. adult slopes of 0.24 compared to 0.18 kg/m2 per year on average for the other 5 cohorts) (Additional file 7).

Table 3 Analyses of inter-cohort differences in BMI trajectories: assessment of Bayesian model complexity (effective number of parameters pD), and fit (deviance information criteria DIC) for each candidate model


A short series of simulations was conducted to compare difference in estimates of the age at which the groups diverge when using the proposed Bayesian piecewise growth modeling approach compared to a more traditional approach based on pairwise comparison of LS-means estimated from a categorical mixed model. We simulated repeated measure data from a Type II divergence model (i.e. group-difference in the change point timing only), using the posterior estimates of mean growth parameters for the model fitted for females (average parameters are set to: β O = 26.5, β 1 = 0.67, CP = 16.02, = β GroupCP =12.37, β 2 = −0.49, matching Model E posterior estimates for females in Table 2), and both a participant-level random effect (σ 2 error  = 2.77) and an observation-level residual error (σ 2 error  = 2.47). Under this model, “CP”, the change point for the first group to depart from the population-average childhood slope represents the age at which the two groups of participants diverge in their outcome trajectories (i.e. the second group maintains this rate of change for 12.30 years longer). To resemble the YFS BMI data, we randomly sampled baseline ages from the YFS cohort subtracted by 25 years as ages at the first visit for each participant, with 6 non-missing repeated measures 3, 6, 9, 21, 27 and 31 years later for each participant. We considered 3 scenarios of sample sizes for the number of participants in each group (group 1/group 2): (1) 100/100, (2) 50/100, and (3) 30/100. For each of the three scenarios, we simulated 100 datasets and fitted both a mixed model with age as a categorical variable and the Type II divergence Bayesian Hierarchical piecewise model using the set of priors defined in Additional file 4. For each piecewise model, we recorded the posterior estimate for the “CP” parameter, and for each fitted categorical mixed model, we applied pairwise comparison of the least-square means (LS-means) with Tukey adjustment for multiplicity to retrieve: (1) the earliest age at which the group-difference in means was found significant (p < 0.05), and (2) the midway point between two consecutive ages that had a minimum number of non-significant differences in means before and significant differences in means after the “midway point” method (2) is a potential alternative definition of age at which the group-difference appears in the LS means. Compared to the “earliest age with p-val <0.05” method (1), the “midway” point definition minimises the impact of simulations where some tests show significance at a young age, even though tests for the surrounding ages are not. For each scenario, estimated ages at divergence using the 3 methods were averaged across the 100 simulations. Figure 4 presents the simulation results in term of the quartiles distribution and means of these estimates of age at divergence across the 100 simulations. When sample size decreases for one group of participants, the pairwise LS mean comparison method will tend to overestimate the age at divergence, with significant variability in the estimates arising due to random variation, especially when age at significance is determined using the first age at which a p-value <0.05 occurs (Fig. 4.). In contrast, the hierarchical Bayesian piecewise model was less sensitive to sample size, and the true age at divergence was consistently within the estimated interquartile range of the produced estimates, indicating that the Bayesian trajectory divergence model outperforms the LS mean method in both accuracy and precision, regardless of the way “age at divergence” is defined from the model output.

Fig. 4

Boxplots and mean “Age at divergence” (x) estimated across 100 simulations using the three methods. Bottom and top of the boxes are the lower (Q1) and upper quartiles (Q3), respectively; the bands near the middle of the boxes are the medians, the lengths of the boxes represent the interquartile range (IQR = Q3-Q1); the upper whiskers are defined as min(max(x), Q3 + 1.5 * IQR) and the lower whiskers as max(min(x), Q1 – 1.5 * IQR). Means of age at divergence across the 100 simulations for each scenario are indicated with empty circles. The horizontal dashed line indicates the true age at divergence set in the simulations (16.02 years old)


Using the repeated BMI data from the YFS study, we demonstrated how Bayesian hierarchical piecewise regression (BHPR) modeling may be used to investigate between-group trajectory divergence in non-linear longitudinal outcomes.

The non-linearity in BMI development across the life-course is well documented in the literature, with changes in BMI corresponding to a number of identified developmental phases [101, 106, 107]. In particular, BMI rate decelerates after puberty once people reach their adult height, translating to a leveling-off of the BMI trajectory in adulthood [108, 109]. Although many recent applications have relied on such approaches [99, 102, 110,111,112,113], traditional polynomial parameterizations of growth curve models are not well suited to analyse BMI development [114, 115], especially if the focus is to identify transitional changes or determine divergence between groups.

In contrast, piecewise regression is particularly suited to model BMI across different life-stages as its parameters map onto what is known about the natural development of BMI over time [116]. Since ‘change points’ (i.e. milestones in the case of BMI) are model parameters in the piecewise model, there is no need to use elaborate techniques to retrieve these points of interest [59, 112, 117]. Piecewise models are also often preferable to more general continuous non-linear models if the number of repeated measurements per participant is small (i.e. 3 to 6 data points each as in [16, 109]) as is often the case in long-running observational prospective studies [99, 112]. Moreover piecewise multi-level regression models may be used to characterize the divergence mechanisms in non-linear responses between groups by modeling change points as random parameters and introducing grouping factors as predictors of the between-person heterogeneity in responses over time.

Although our main goal was to characterize how and when the developmental patterns of BMI diverged between those who did and did not develop T2DM in the YFS, we also demonstrated the utility of the method to investigate cohort effects in the outcome response. Previous analyses of the YFS BMI and T2DM data considered categorical mixed models and tested for differences in the estimated BMI levels between the two T2DM groups at different ages by pairwise comparisons of the BMI predicted marginal means (i.e. Least-Square means) averaged over sex while adjusting for multiple testing (i.e. Tukey adjustment). This approach suggested that from age 15 years, the T2DM group had significantly higher BMI levels than those without T2DM. However, these analyses ignored the potential confounding effect of birth cohorts, and each existing “age” was treated as a non time-ordered categorical variable so that no inference could be made on individual or group-specific age-related BMI trajectories. Some age groups comprised those from up to five separate birth cohorts, while others only comprised those from a unique birth cohort (i.e. those aged 3 and 27 years). Having substantially fewer participants in one or both T2DM status groups at some age points results in a decreased power to detect a significant difference between groups (i.e. the observed difference at age 27 years was not significant in either sex-averaged or sex-adjusted LS-means, Additional file 8: Tables S1, S2 and Figure S1, S2). Because BMI development is known to progress differently in males and females, and the oldest and youngest cohorts in the YFS sample are almost a generation apart (~15 years), not taking these confounders into account may result in biased inferences. In fact, when estimating the LS-means separately for each sex, the age at which the difference between T2DM groups becomes significant is not as clear since in males the difference is not significant at age 21 and 24 years, suggesting the true divergence in BMI between T2DM groups for males occurs more around those ages (Additional file 8: Table S2 and Figure S2).

In contrast, the method we illustrate here is not sensitive to sample size and uses developmental theory to inform a model that allows between-group differences in within-person BMI trajectories at four possible levels for males and females to be examined (i.e. the overall BMI level, the childhood BMI growth rate, the adult BMI growth rate, and the age at which the transition between the two phases of change occurs).

Applied to the example data set, our method allowed us to characterise group differences in the non-linear development of BMI and to identify a critical age window at which weight intervention programs might be best applied to help reduce or delay the incidence of T2DM in adulthood. Our findings support the theory that girls who keep on gaining weight at the same rate they did in childhood past the age of 16 years are more likely to develop T2DM in adulthood. Similarly, for males, the natural deceleration in BMI velocity occurs, on average, at 21 years of age. Those who stay on their childhood BMI trajectory past that age may be at increased risk of developing T2DM.

Longitudinal studies often aim to make inferences on differences among average population health marker trajectories. Typically, this involves comparing change rates (or slope differences) in healthy participants vs. those with pathological development, specific treatment conditions, or groups following certain lifestyle patterns [118120]. Using our Bayesian hierarchical piecewise regression approach, serial measures of patient’s weight and height, often routinely collected in paediatric, general practice, and healthy or well child clinics, could be used to determine if an individual is on a path to an healthy adult weight status, or if their BMI trajectory places them in a category more susceptible to develop adult metabolic conditions such as T2DM.


Studying within-person and between-person differences in the development of continuous outcomes as a function of age in long-running multi-cohort observational studies is crucial to better understand the natural history of healthy vs. pathological risk factor profiles. Due to the typically unbalanced data designs, loss to follow-up and expected non-linear responses, it remains methodologically challenging to analyse such data. When the substantial focus is on when and how two or more groups of participants grouped according to a distal dichotomous health outcome have diverged in their response trajectories, traditional parameterisations of curvilinear growth model do not allow to identify an age at which the group that developed the condition moved onto a different path compared to the group that remained healthy. In contrast, the hierarchical piecewise multi-level modeling enables the separation of multiple aspects of change in complex developmental processes such as individual and group differences in the rates of change at different periods, and potential heterogeneity in the timing at which individuals from identified groups enter each developmental phase, providing a powerful tool to help inform intervention The methodology we illustrate here focuses on a response with only one developmental change point, but it could easily be extended to more complex non-linear responses with multiple transitions.



Bayesian hierarchical piecewise regression


Body mass index


Credible interval


Change point


Cardiovascular disease


Deviance information criteria


Markov Chain Monte Carlo

PP p-value:

Posterior predictive p-value


Type 2 Diabetes mellitus


Time-independent covariate


Time-varying covariate


Cardiovascular risk in Young Finn study


  1. 1.

    Power C, Kuh D, Morton S. From Developmental Origins of Adult Disease to Life Course Research on Adult Disease and Aging: Insights from Birth Cohort Studies. Annu Rev Public Health. 2013;34:7–28.

    PubMed  Article  Google Scholar 

  2. 2.

    Færch K, Witte DR, Tabák AG, Perreault L, Herder C, Brunner EJ, Kivimäki M, Vistisen D. Trajectories of cardiometabolic risk factors before diagnosis of three subtypes of type 2 diabetes: a post-hoc analysis of the longitudinal Whitehall II cohort study. Lancet Diabetes Endocrinol.1:43–51.

  3. 3.

    Kuh D, Ben-Shlomo Y, Lynch J, Hallqvist J, Power C. Life course epidemiology. J Epidemiol Community Health. 2003;57:778–83.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  4. 4.

    Narayan KMV, Boyle JP, Thompson TJ, Gregg EW, Williamson DF. Effect of BMI on Lifetime Risk for Diabetes in the U.S. Diabetes Care. 2007;30:1562–6.

    CAS  PubMed  Article  Google Scholar 

  5. 5.

    Ben-Shlomo Y, Kuh D. A life course approach to chronic disease epidemiology: conceptual models, empirical challenges and interdisciplinary perspectives. Int J Epidemiol. 2002;31:285–93.

    PubMed  Article  Google Scholar 

  6. 6.

    Dudina A, Cooney MT, Bacquer DD, Backer GD, Ducimetiere P, Jousilahti P, Keil U, Menotti A, Njolstad I, Oganov R, et al. Relationships between body mass index, cardiovascular mortality, and risk factors: a report from the SCORE investigators. Eur J Cardiovasc Prev Rehabil. 2011;18:731–42.

    PubMed  Article  Google Scholar 

  7. 7.

    Chen Y, Copeland WK, Vedanthan R. Association between body mass index and cardiovascular disease mortality in east Asians and south Asians: pooled analysis of prospective data from the Asia Cohort Consortium. BMJ. 2013;347:f5446.

    PubMed  Article  PubMed Central  Google Scholar 

  8. 8.

    Freedman DS, Khan LK, Dietz WH, Srinivasan SR, Berenson GS. Relationship of childhood obesity to coronary heart disease risk factors in adulthood: the Bogalusa Heart Study. Pediatrics. 2001;108:712–8.

    CAS  PubMed  Article  Google Scholar 

  9. 9.

    Twisk JWR. Applied Longitudinal Data Analysis for Epidemiology: A Practical Guide. 2nd ed. Cambridge: Cambridge medicine; 2013.

    Book  Google Scholar 

  10. 10.

    Diggle PJ, Heagerty P, Liang KY, Zeger SL. Analysis of Longitudinal Data, Oxford Statistical Science Series. 2002. ISBN 978-0-19-852484-7.

    Google Scholar 

  11. 11.

    Mattsson N, Ronnemaa T, Juonala M, Viikari JS, Raitakari OT. Childhood predictors of the metabolic syndrome in adulthood. The Cardiovascular Risk in Young Finns Study. Ann Med. 2008;40:542–52.

    CAS  PubMed  Article  Google Scholar 

  12. 12.

    Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer; 2000.

    Google Scholar 

  13. 13.

    Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G. Longitudinal data analysis. Boca Raton: Chapman and Hall/CRC; 2008.

    Google Scholar 

  14. 14.

    Tu YK, Tilling K, Sterne JA, Gilthorpe MS. A critical evaluation of statistical approaches to examining the role of growth trajectories in the developmental origins of health and disease. Int J Epidemiol. 2013;42:1327–39.

    PubMed  Article  Google Scholar 

  15. 15.

    Raitakari OT, Juonala M, Ronnemaa T, Keltikangas-Jarvinen L, Rasanen L, Pietikainen M, Hutri-Kahonen N, Taittonen L, Jokinen E, Marniemi J, et al. Cohort profile: the cardiovascular risk in Young Finns Study. Int J Epidemiol. 2008;37:1220–6.

    PubMed  Article  Google Scholar 

  16. 16.

    Li L, Hardy R, Kuh D, Lo Conte R, Power C. Child-to-adult body mass index and height trajectories: a comparison of 2 British birth cohorts. Am J Epidemiol. 2008;168:1008–15.

    PubMed  Article  PubMed Central  Google Scholar 

  17. 17.

    Stuart B, Panico L. Early-childhood BMI trajectories: evidence from a prospective, nationally representative British cohort study. Nutr Diabetes. 2016;6:e198.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  18. 18.

    Finkel D, Reynolds CA, McArdle JJ, Pedersen NL. Cohort differences in trajectories of cognitive aging. J Gerontol B Psychol Sci Soc Sci. 2007;62:P286–294.

    PubMed  Article  Google Scholar 

  19. 19.

    Gimeno D, Ferrie JE, Elovainio M, Pulkki-Raback L, Keltikangas-Jarvinen L, Eklund C, Hurme M, Lehtimaki T, Marniemi J, Viikari JS, et al. When do social inequalities in C-reactive protein start? A life course perspective from conception to adulthood in the Cardiovascular Risk in Young Finns Study. Int J Epidemiol. 2008;37:290–8.

    CAS  PubMed  Article  Google Scholar 

  20. 20.

    Howe LD, Tilling K, Galobardes B, Smith GD, Gunnell D, Lawlor DA. Socioeconomic differences in childhood growth trajectories: at what age do height inequalities emerge? J Epidemiol Community Health. 2012;66:143–8.

    PubMed  Article  Google Scholar 

  21. 21.

    Carles S, Charles M-A, Forhan A, Slama R, Heude B, Botton J, group Emcs. A Novel Method to Describe Early Offspring Body Mass Index (BMI) Trajectories and to Study Its Determinants. PLoS One. 2016;11:e0157766.

    PubMed  Article  CAS  PubMed Central  Google Scholar 

  22. 22.

    Grajeda LM, Ivanescu A, Saito M, Crainiceanu C, Jaganath D, Gilman RH, Crabtree JE, Kelleher D, Cabrera L, Cama V, Checkley W. Modeling subject-specific childhood growth using linear mixed-effect models with cubic regression splines. Emerging Themes in Epidemiology. 2016;13:1.

    PubMed  Article  PubMed Central  Google Scholar 

  23. 23.

    Locascio JJ, Atri A. An overview of longitudinal data analysis methods for neurological research. Dement Geriatr Cogn Dis Extra. 2011;1:330–57.

    PubMed  Article  PubMed Central  Google Scholar 

  24. 24.

    Lenth RV. Least-Squares Means.R package version 2.22. In: Book Least-Squares Means.R package version 2.22. 2016.

    Google Scholar 

  25. 25.

    Goodnight JH, Harvey WR. Least-Squares Means in the Fixed-Effects General Linear Models.Technical Report R-103, Book Least-Squares Means in the Fixed-Effects General Linear Models.Technical Report R-103. NC: SAS Institute Inc; 1978.

    Google Scholar 

  26. 26.

    Harvey WR. Use of the HARVEY Procedure, Book Use of the HARVEY Procedure. NC: SAS Institute Inc; 1976.

    Google Scholar 

  27. 27.

    Liu C, Cripe TP, Kim MO. Statistical issues in longitudinal data analysis for treatment efficacy studies in the biomedical sciences. Mol Ther. 2010;18:1724–30.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  28. 28.

    Diggle PJ, Heagerty P, L K-Y, Zeger SLO. Analysis of Longitudinal Data. 2nd ed. Oxford: Oxford University Press; 2002.

    Google Scholar 

  29. 29.

    Kreft IGG. Are multilevel techniques necessary? An overview, including simulation studies, Book Are multilevel techniques necessary? An overview, including simulation studies. Los Angeles: California State University at Los Angeles; 1996.

    Google Scholar 

  30. 30.

    Kowalchuk RK, Keselman HJ. Mixed-model pairwise multiple comparisons of repeated measures means. Psychol Methods. 2001;6:282–96.

    CAS  PubMed  Article  Google Scholar 

  31. 31.

    Sullivan LM. Repeated measures. Circulation. 2008;117:1238–43.

    PubMed  Article  Google Scholar 

  32. 32.

    Greenhouse SW, Geisser S. On methods in the analysis of profile data. Psychometrika. 1959;24:95–112.

    Article  Google Scholar 

  33. 33.

    Huynh H, Feldt LS. Estimation of the Box Correction for Degrees of Freedom from Sample Data in Randomized Block and Split-Plot Designs. J Educ Behav Stat. 1976;1:69–82.

    Article  Google Scholar 

  34. 34.

    Li D, Dye TD. Power and stability properties of resampling-based multiple testing procedures with applications to gene oncology studies. Comput Math Methods Med. 2013;2013:610297.

    PubMed  PubMed Central  Google Scholar 

  35. 35.

    Westfall PJ, Young SS. Resampling-Based Multiple Testing. New York: Willey; 1993.

    Google Scholar 

  36. 36.

    Pe'er I, Yelensky R, Altshuler D, Daly MJ. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol. 2008;32:381–5.

    PubMed  Article  Google Scholar 

  37. 37.

    Westfall PH, Tobias RD, Rom D, Wolfinger RD, Hochberg Y. Multiple Comparisons and Multiple Tests Using the SAS System, Book Multiple Comparisons and Multiple Tests Using the SAS System. NC: SAS Institute Inc; 1999.

    Google Scholar 

  38. 38.

    Donohue MC, Aisen PS. Mixed model of repeated measures versus slope models in Alzheimer's disease clinical trials. J Nutr Health Aging. 2012;16:360–4.

    CAS  PubMed  Article  Google Scholar 

  39. 39.

    Goldstein H, Browne W, Rabash J. Multilevel modeling of medical data. Stat Med 2002: 21:3291–315. Stat Med. 2002;21:3291–315.

    PubMed  Article  Google Scholar 

  40. 40.

    Raudenbush SW, Bryk AS. Hierarchical linear models: Applications and data analysis methods. 2. Thousand Oaks: Sage Publications; 2002.

    Google Scholar 

  41. 41.

    Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press; 2007.

    Google Scholar 

  42. 42.

    Duncan TE, Duncan SC, Strycker LA. An Introduction to Latent Variable Growth Curve Modeling: Concepts, Issues, and Application, Second Edition. Mahwah: Lawrence Erlbaum Associates; 2006.

    Google Scholar 

  43. 43.

    Curran PJ, Obeidat K, Losardo D. Twelve Frequently Asked Questions About Growth Curve Modeling. J Cogn Dev. 2010;11:121–36.

    PubMed  Article  PubMed Central  Google Scholar 

  44. 44.

    Hox J, Stoel RD. Multilevel and SEM approaches to Growth Curve modeling. In: Everitt BS, Howell DC, editors. Encyclopedia of Statistics in Behavioral Science, vol. 3. Chichester: John Wiley & Sons; 2005.

    Google Scholar 

  45. 45.

    Singer JD, Willett JB. Applied Longitudinal Data Analysis: Modeling change and event occurence. New York: Oxford University Press; 2003.

    Book  Google Scholar 

  46. 46.

    Fitzmaurice G, Laird NM, Ware JH. Applied Longitudinal Analysis. 2nd ed. New Jersey: John Wiley & Sons; 2011.

    Google Scholar 

  47. 47.

    Preacher KJ. Latent growth curve models. In: Muelle GRHRO, editor. The reviewer's guide to quantitative methods in the social sciences (pp 185–198). London: Routledge; 2010. p. 185–98.

    Google Scholar 

  48. 48.

    Mirman D. Growth Curve Analysis and Visualization Using R. Boca Raton: Chapman and Hall/CRC; 2014.

    Google Scholar 

  49. 49.

    Mirman D, Dixon JA, Magnuson JS. Statistical and computational models of the visual world paradigm: Growth curves and individual differences. J Mem Lang. 2008;59:475–94.

    PubMed  Article  PubMed Central  Google Scholar 

  50. 50.

    Heo M, Faith MS, Mott JW, Gorman BS, Redden DT, Allison DB. Hierarchical linear models for the development of growth curves: an example with body mass index in overweight/obese adults. Stat Med. 2003;22:1911–42.

    PubMed  Article  Google Scholar 

  51. 51.

    Tilling K, Macdonald-Wallis C, Lawlor DA, Hughes RA, Howe LD. Modeling Childhood Growth Using Fractional Polynomials and Linear Splines. Ann Nutr Metab. 2014;65:129–38.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  52. 52.

    Sterba S. Fitting Non linear Latent growth Curve models with Inidvidually varying Time point. Struct Equ Model Multidiscip J. 2014;21:630–47.

    Article  Google Scholar 

  53. 53.

    Cudeck R, du Toit SHC. A nonlinear form of quadratic regression with interpretable parameters. Multivar Behav Res. 2002;37:501–19.

    Article  Google Scholar 

  54. 54.

    Harring JR, Cudeck R, du Toit SH. Fitting Partially Nonlinear Random Coefficient Models as SEMs. Multivariate Behav Res. 2006;41:579–96.

    PubMed  Article  Google Scholar 

  55. 55.

    Frenk SM, Yang YC, Land KC. Assessing the Significance of Cohort and Period Effects in Hierarchical Age-Period-Cohort Models: Applications to Verbal Test Scores and Voter Turnout in U.S. Presidential Elections. Soc Forces. 2013;92:221–48.

    PubMed  Article  PubMed Central  Google Scholar 

  56. 56.

    Raudenbush SW, Chan WS. Application of a hierarchical linear model to the study of adolescent deviance in an overlapping cohort design. J Clin Consulting Psychol. 1993;61:941–51.

    CAS  Article  Google Scholar 

  57. 57.

    Yang Y, Land KC. A mixed models approach to the age-period-cohort analysis of repeated cross-section surveys, with an application to data on trends in verbal test scores. Sociol Methodol. 2006;36:75–97.

    Article  Google Scholar 

  58. 58.

    Raudenbush SW, Chan WS. Growth Curve Analysis in Accelerated Longitudinal Designs. Criminology Penology. 1992;29:387–411.

    Google Scholar 

  59. 59.

    Howe LD, Tilling K, Benfield L, Logue J, Sattar N, Ness AR, Smith GD, Lawlor DA. Changes in ponderal index and body mass index across childhood and their associations with fat mass and cardiovascular risk factors at age 15. PLoS One. 2010;5(12):e15186.

    PubMed  Article  CAS  PubMed Central  Google Scholar 

  60. 60.

    Long J, Ryoo J. Using fractional polynomials to model non-linear trends in longitudinal data. Br J Math Stat Psychol. 2010;63:177–203.

    PubMed  Article  Google Scholar 

  61. 61.

    Tilling K, Sterne JAC, Wolfe CDA. Multilevel growth curve models with covariate effects: application to recovery after stroke. Stat Med. 2001;20:685–704.

    CAS  PubMed  Article  Google Scholar 

  62. 62.

    Liu X, Engel CE. Methods and Applications of Longitudinal Data Analysis. Elsevier Science Publishing Company Incorporated; 2015

  63. 63.

    Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modeling. Appl Stat-J R Stat Soc. 1994;43:29–67.

    Google Scholar 

  64. 64.

    Royston P, Ambler G, Sauerbrei W. The use of fractional polynomials to model continuous risk variables in epidemiology. Int J Epidemiol. 1999;28(5):964–74.

    CAS  PubMed  Article  Google Scholar 

  65. 65.

    Verbeke G. Linear mixed models for longitudinal data, Book Linear mixed models in practice. New York: Springer; 1997.

    Book  Google Scholar 

  66. 66.

    Lawrence FR, Blair C. Factorial invariance in preventive intervention: modeling the development of intelligence in low birth weight, preterm infants. Prev Sci. 2003;4:249–61.

    PubMed  Article  Google Scholar 

  67. 67.

    Howe LD, Tilling K, Matijasevich A, Petherick ES, Santos AC, Fairley L, Wright J, Santos IS, Barros AJ, Martin RM, et al. Linear spline multilevel models for summarising childhood growth trajectories: A guide to their application using examples from five birth cohorts. Stat Methods Med Res. 2013;4:249–61.

    Google Scholar 

  68. 68.

    Cudeck R, Klebe KJ. Multiphase mixed-effects models for repeated measures data. Psychol Methods. 2002;7:41–63.

    PubMed  Article  Google Scholar 

  69. 69.

    Muggeo MR, Atkins DC, Gallop RJ, Dimidjan S. Segmented mixed models with random changepoints: a maximum likelihood approach with application to treatment for depression study. Stat Model. 2014;14(4):293–313.

    Article  Google Scholar 

  70. 70.

    Flora DB. Specifying piecewise latent trajectory models for longitudinal data. Struct Equ Model. 2008;15:513–33.

    Article  Google Scholar 

  71. 71.

    Kohli N, Harring JR. Modeling growth in latent variables using a piecewise function. Multivar Behav Res. 2013;48:370–97.

    Article  Google Scholar 

  72. 72.

    Willett JB, Singer JD, Martin NC. The design and analysis of longtiduinal studies of development and psychopathology in context: statistical models and methodological recommendations. Dev Psychopathol. 1998;10:395–426.

    CAS  PubMed  Article  Google Scholar 

  73. 73.

    Li L, Hardy R, Kuh D, Lo Conte R, Power C. Child-to-Adult body mass index and height ttrajectories : A comparison of 2 british cohorts. 2008.

    Google Scholar 

  74. 74.

    Muggeo MR. Segmented mixed models with random changepoints in R. Working Paper. 2016.

  75. 75.

    Muggeo MR. Modeling temperature effects on mortality: multiple segmented relationships with common break points. Biostatistics. 2008;9:613–20.

    PubMed  Article  Google Scholar 

  76. 76.

    Muthen B. Second-generation structural equation modeling with a combination of categorical and continuous latent variables: New opportunities for latent class–latent growth modeling. Washington: American Psychological Association; 2001.

    Google Scholar 

  77. 77.

    Congdon P. Bayesian statistical modeling. New York: Wiley; 2001.

    Google Scholar 

  78. 78.

    Jacqmin-Gadda H, Commenges D, Dartigues JF. Random changepoint model for joint modeling of cognitive decline and dementia. Biometrics. 2006;62:254–60.

    PubMed  Article  PubMed Central  Google Scholar 

  79. 79.

    Lange N, Carlin BP, Gelfand AE. Hierarchical Bayes models for the progression of HIV infection using longitudinal CD4 T-cell numbers. J Am Stat Assoc. 1992;87:615–26.

    Article  Google Scholar 

  80. 80.

    Slate EH, Turnbull BW. Statistical models for longitudinal biomarkers of disease onset. Stat Med. 2000;19:617–37.

    CAS  PubMed  Article  Google Scholar 

  81. 81.

    Ghosh P, Vaida F. Random changepoint modeling of HIV immunologic responses. Stat Med. 2007;26:2074–87.

    PubMed  Article  Google Scholar 

  82. 82.

    Dominicus A, Ripatti S, Pedersen NL, Palmgren J. A random change point model for assessing variability in repeated measures of cognitive function. Stat Med. 2008;27:5786–98.

    PubMed  Article  PubMed Central  Google Scholar 

  83. 83.

    Hall CB, Lipton RB, Sliwinski M, Stewart WF. A change point model for estimating the onset of cognitive decline in preclinical Alzheimer's disease. Stat Med. 2000;19:1555–66.

    CAS  PubMed  Article  Google Scholar 

  84. 84.

    Hall CB, Ying J, Kuo L, Lipton RB. Bayesian and profile likelihood change point methods for modeling cognitive function over time. Comput Stat Data Anal. 2003;42:91–109.

    Article  Google Scholar 

  85. 85.

    Kiuchi A, Hartigan JA, Holford TR. Change points in the series of T4 counts prior to AIDS. Biometrics. 1995;51:236–48.

    CAS  PubMed  Article  Google Scholar 

  86. 86.

    Muniz-Terrera G, Van den Hout A, Matthews FE. Random change point models: investigating cognitive decline in the presence of missing data. J Appl Stat. 2011;38:705–16.

    Article  Google Scholar 

  87. 87.

    Sabin MA, Magnussen CG, Juonala M, Shield JPH, Kähönen M, Lehtimäki T, Rönnemaa T, Koskinen J, Loo B-M, Knip M, et al. Insulin and BMI as Predictors of Adult Type 2 Diabetes Mellitus. Pediatrics. 2015;135:e144–51.

    PubMed  Article  Google Scholar 

  88. 88.

    McCoach DB, Kaniskan B. Using Time-Varying Covariates in Multilevel Growth Models. Front Psychol. 2010;1(17):1–12.

    Google Scholar 

  89. 89.

    Miyazaki Y, Raudenbush SW. Tests for linkage of multiple cohorts in an accelerated longitudinal design. Psychol Methods. 2000;5:44–63.

    CAS  PubMed  Article  Google Scholar 

  90. 90.

    Galbraith S, Bowden J, Mander A. Accelerated longitudinal designs: An overview of modeling, power, costs and handling missing data. Stat Methods Med Res. 2014;92:221–48.

    Google Scholar 

  91. 91.

    Mostafa AA, Ghorbal A. Bayesian and non-Bayesian analysis for random changepoint problem using standard computer packages. Int Mathematical Archive. 2011;2:1963–79.

    Google Scholar 

  92. 92.

    Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2. Boca Raton: Chapman & Hall/CRC; 2004.

    Google Scholar 

  93. 93.

    Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 2006;1:515–33.

    Article  Google Scholar 

  94. 94.

    Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press; 2006.

    Book  Google Scholar 

  95. 95.

    Spiegelhalter DJ, Best NG, Carlin BP. Bayesian measure of model complexity and fit. J R Stat Soc Ser B. 2002;64:583–639.

    Article  Google Scholar 

  96. 96.

    Plummer M. Penalized loss functions for Bayesian model comparison. Biostatistics. 2008;9:523–39.

    PubMed  Article  Google Scholar 

  97. 97.

    Burnham K, Anderson D. Model selection and multimodel inference: a practical information-theoretic approach. New York: Springer; 2002.

    Google Scholar 

  98. 98.

    Juonala M, Viikari JS, Raitakari OT. Main findings from the prospective Cardiovascular Risk in Young Finns Study. Curr Opin Lipidol. 2013;24:57–64.

    CAS  PubMed  Article  Google Scholar 

  99. 99.

    Chivers P, Hands BP, Parker H, Beilin L, Kendall G, Bulsara M. Longitudinal modeling of body mass index from birth to 14 years. Obes Facts. 2009;2:302–10.

    PubMed  Article  Google Scholar 

  100. 100.

    Rolland-Cachera MF, Deheeger M, Maillot M, Bellisle F. Early adiposity rebound: causes and consequences for obesity in children and adults. Int J Obes (Lond). 2006;30 Suppl 4:S11–17.

    Article  Google Scholar 

  101. 101.

    Karlberg J. On the modeling of human growth. Stat Med. 1987;6:185–92.

    CAS  PubMed  Article  Google Scholar 

  102. 102.

    Silverwood RJ, De Stavola BL, Cole TJ, Leon DA. BMI peak in infancy as a predictor for later BMI in the Uppsala Family Study. Int J Obes (Lond). 2009;33:929–37.

    CAS  Article  Google Scholar 

  103. 103.

    Kuczmarski RJ. 2000 CDC Growth Charts for the United States: methods and development. Vital Health Statistics1. 2002;11:1–190.

    Google Scholar 

  104. 104.

    Lunn D, Jackson C, Best N, Thomas A, Spiegelhalter D. The BUGS Book: A Practical Introduction to Bayesian Analysis. Boca Raton: CRC press; 2012.

    Google Scholar 

  105. 105.

    Wood S. Generalized additive models: an introduction with R. Boca Raton: CRC press; 2006.

    Google Scholar 

  106. 106.

    Dietz WH. Critical periods in childhood for the development of obesity. Am J Clin Nutr. 1994;59:955–9.

    CAS  PubMed  Google Scholar 

  107. 107.

    Rogol AD, Clark PA, Roemmich JN. Growth and pubertal development in children and adolescents: effects of diet and physical activity. Am J Clin Nutr. 2000;72:521–8.

    Google Scholar 

  108. 108.

    Ostbye T, Malhotra R, Landerman LR. Body mass trajectories through adulthood: results from the National Longitudinal Survey of Youth 1979 Cohort (1981–2006). Int J Epidemiol. 2011;40:240–50.

    PubMed  Article  Google Scholar 

  109. 109.

    Li L, Hardy R, Kuh D, Power C. Life-course body mass index trajectories and blood pressure in mid life in two British birth cohorts: stronger associations in the later-born generation. Int J Epidemiol. 2015;44:1018–26.

    PubMed  Article  PubMed Central  Google Scholar 

  110. 110.

    Guo SS, Huang C, Maynard LM, Demerath E, Towne B, Chumlea WC, Siervogel RM. Body mass index during childhood, adolescence and young adulthood in relation to adult overweight and adiposity: the Fels Longitudinal Study. Int J Obes Relat Metab Disord. 2000;24:1628–35.

    CAS  PubMed  Article  Google Scholar 

  111. 111.

    Williams S, Davie G, Lam F. Predicting BMI in young adults from childhood data using two approaches to modeling adiposity rebound. Int J Obes Relat Metab Disord. 1999;23:348–54.

    CAS  PubMed  Article  Google Scholar 

  112. 112.

    Wen X, Kleinman K, Gillman MW, Rifas-Shiman SL, Taveras EM. Childhood body mass index trajectories: modeling, characterizing, pairwise correlations and socio-demographic predictors of trajectory characteristics. BMC Med Res Methodol. 2012;29:12–38.

    Google Scholar 

  113. 113.

    Besharat Pour M, Bergstrom A, Bottai M, Magnusson J, Kull I, Moradi T. Age at adiposity rebound and body mass index trajectory from early childhood to adolescence; differences by breastfeeding and maternal immigration background. Pediatr Obes. 2016;30 Suppl 4:S11–17.

    Google Scholar 

  114. 114.

    Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modeling. J R Stat Soc, Appl Stat. 1994;43:429–67.

    Google Scholar 

  115. 115.

    McArdle JJ, Wang L. Modeling age-based turning points in longitudinal life-span growth curves of cognition. New York: Routledge/Taylor & Francis Group edn; 2008.

    Google Scholar 

  116. 116.

    Ram N, Grimm K. Using simple and complex growth models to articulate developmental change: Matching theory to method. Int J Behav Dev. 2007;31:303–16.

    Article  Google Scholar 

  117. 117.

    Tilling K, Davies NM, Nicoli E. Associations of growth trajectories in infancy and early childhood with later childhood outcomes. Am J Clin Nutr 2011;94 Suppl 6:1808s-13s. Am J Clin Nutr. 2011;94:1808s–13s.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  118. 118.

    Magee CA, Caputi P, Iverson DC. Identification of distinct body mass index trajectories in Australian children. Pediatr Obes. 2013;8:189–98.

    CAS  PubMed  Article  Google Scholar 

  119. 119.

    Ekberg J, Angbratt M, Valter L, Nordvall M, Timpka T. History matters: childhood weight trajectories as a basis for planning community-based obesity prevention to adolescents. Int J Obes. 2012;36:524–8.

    CAS  Article  Google Scholar 

  120. 120.

    Huang DY, Lanza HI, Wright-Volel K, Anglin MD. Developmental trajectories of childhood obesity and risk behaviors in adolescence. J Adolesc. 2013;36:139–48.

    PubMed  Article  Google Scholar 

Download references


We thank the clinic and administrative staff for their contribution to data collection. Above all, we thank the participants of the Cardiovascular Risk in Young Finns Study.


The Young Finn study was financially supported by the Academy of Finland (grants 126925, 121584, 124282, 129378, 117797, 265877 and 41071), the Social Insurance Institution of Finland, the Turku University Foundation, Paavo Nurmi Foundation, Juho Vainio Foundation, Sigrid Juselius Foundation, Maud Kuistila Foundation, Research funds from the Kuopio, Turku and Tampere University Hospitals, the Finnish Foundation of Cardiovascular Research, the Finnish Medical Foundation, the Orion-Farmos Research Foundation, and the Finnish Cultural Foundation. This work was partly funded by the National Health and Medical Research Council Project Grant (APP1098369). CGM was supported by a National Heart Foundation of Australia Future Leader Fellowship (100849).

Availability of data and materials

The dataset supporting the conclusions of this article were obtained from the Cardiovascular Risk In Young Finns study (YFS) after submission and approval of our study plan by the Young Finns Study coordinators. The YFS dataset comprises health related participant data and their use is therefore restricted under the regulations on professional secrecy (Act on the Openness of Government Activities, 612/1999) and on sensitive personal data (Personal Data Act, 523/1999, implementing the EU data protection directive 95/46/EC). Due to these ethical restrictions, the data from this study cannot be stored in public repositories or otherwise made publicly available. However, data access may be permitted on a case to case basis upon request only. Data requests should be addressed to the project coordinator, Dr. Olli Raitakari (, Tel: +358 2 333 7220 Fax: +358 2 333 7270. Written requests should be sent to the following postal address: Research Centre of Applied and Preventive Cardiovascular, Medicine University of Turku, Kiinamyllynkatu 10, FIN-20520 Turku, FINLAND.

Authors’ contributions

Analyzed the data: MJB SSW. Wrote the paper: MJB CGM RJT. Designed the study: MJB CGM RJT OTR. Collected the clinical data: JSAV TL MJ NHK OTR. Revised several drafts of the manuscript for critically for important intellectual content: DPB MAS JSAV, NHK, TL MJ CGM RJT OTR. Reviewed and approved the final manuscript: MJB SSW CGM RJT JSAV TL MJ NHK DPB MAS OTR.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

YFS Participants or their parents provided written informed consent, and the study was approved by local ethics committees (The Ethics Committee of the Hospital District of Southwest Finland) in agreement with the Declaration of Helsinki.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information



Corresponding author

Correspondence to Russell J. Thomson.

Additional files

Additional file 1:

Additional information on BMI, T2DM status and fasting insulin information collection in the YFS subset used in the illustrative analyses. (DOCX 17 kb)

Additional file 2:

Subset of the YFS cohort used for the BMI trajectory analysis. Reported are the total number (No.) of participants seen at each clinic year and their ages (Figure S1.) Density plot of the number of BMI measures per YFS participants in the subset of the cohort used for the BMI trajectory analysis (Figure S2.) and average BMI values in kg/m2 at each age stratified by T2DM group (pink, no adult T2DM; blue, Adult T2DM), with error bars representing the mean BMI ± SD (standard deviation) (Figure S3.) (DOCX 180 kb)

Additional file 3:

Spaghetti plot of the individual trajectories of those with T2DM in adulthood (N = 88) and those who did not develop T2DM in adulthood (N = 2452). Red solid line: loess smoother curve indicating the average longitudinal trend in each group (Figure S1.) and scatterplot of the life-course BMI data (in kg/m2) stratified by sex. Solid lines and gray bands: loess smoothed average trajectories and confidence intervals for each group (adult T2DM vs. non-T2DM group); dashed lines: age-specific averages of BMI levels (Figure S2.) (DOCX 1918 kb)

Additional file 4:

Prior sensitivity analyses methods and results. (DOCX 25 kb)

Additional file 5:

Log-likelihood profiling method and R-code for the choice of priors of the changepoint mean. (μcp). (DOCX 14 kb)

Additional file 6:

Annotated RJAGS sample code to fit a type 1 trajectory divergence model with a fully unstructured 4 by 4 covariance matrix for the random growth parameters. (DOCX 14 kb)

Additional file 7:

Posterior mean parameter estimates for best fitting birth cohort-adjusted trajectory model for each sex (Model E for females and Model A for males). (DOCX 16 kb)

Additional file 8:

Results of mixed models with age as a categorical predictor and log-insulin as a continuous predictor: LS means contrasts (No adult T2DM vs. adult T2DM) and significance at each age averaged over- (Table S1.) or adjusted for the levels of sex (Table S2.) and pairwise comparisons of Least-square means of BMI and 95% CIs at each age in each T2DM status group averaged over levels of sex (Figure S1.) at each age in each T2DM status group and sex group combination (M = males, F-females, 1 = No adult T2DM, 2 = adult T2DM) (Figure S2.) and adjusted for log(insulin). (DOCX 549 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Buscot, Mj., Wotherspoon, S.S., Magnussen, C.G. et al. Bayesian hierarchical piecewise regression models: a tool to detect trajectory divergence between groups in long-term observational studies. BMC Med Res Methodol 17, 86 (2017).

Download citation


  • Piecewise model
  • Hierarchical regression
  • Non-linear trajectory model
  • Accelerated longitudinal design
  • Cohort effect
  • Group divergence