 Research article
 Open Access
 Published:
Using linear and natural cubic splines, SITAR, and latent trajectory models to characterise nonlinear longitudinal growth trajectories in cohort studies
BMC Medical Research Methodology volumeÂ 22, ArticleÂ number:Â 68 (2022)
Abstract
Background
Longitudinal data analysis can improve our understanding of the influences on health trajectories across the lifecourse. There are a variety of statistical models which can be used, and their fitting and interpretation can be complex, particularly where there is a nonlinear trajectory. Our aim was to provide an accessible guide along with applied examples to using four sophisticated modelling procedures for describing nonlinear growth trajectories.
Methods
This expository paper provides an illustrative guide to summarising nonlinear growth trajectories for repeatedly measured continuous outcomes using (i) linear spline and (ii) natural cubic spline linear mixedeffects (LME) models, (iii) Super Imposition by Translation and Rotation (SITAR) nonlinear mixed effects models, and (iv) latent trajectory models. The underlying model for each approach, their similarities and differences, and their advantages and disadvantages are described. Their application and correct interpretation of their results is illustrated by analysing repeated bone mass measures to characterise bone growth patterns and their sex differences in three cohort studies from the UK, USA, and Canada comprising 8500 individuals and 37,000 measurements from ages 5â€“40â€‰years. Recommendations for choosing a modelling approach are provided along with a discussion and signposting on further modelling extensions for analysing trajectory exposures and outcomes, and multiple cohorts.
Results
Linear and natural cubic spline LME models and SITAR provided similar summary of the mean bone growth trajectory and growth velocity, and the sex differences in growth patterns. Growth velocity (in grams/year) peaked during adolescence, and peaked earlier in females than males e.g., mean age at peak bone mineral content accrual from multicohort SITAR models was 12.2â€‰years in females and 13.9â€‰years in males. Latent trajectory models (with trajectory shapesÂ estimated using a natural cubic spline)Â identified up to four subgroups of individuals with distinct trajectories throughout adolescence.
Conclusions
LME models with linear and natural cubic splines, SITAR, and latent trajectory models are useful for describing nonlinear growth trajectories, and these methods can be adapted for other complex traits. Choice of method depends on the research aims, complexity of the trajectory, and available data. Scripts and synthetic datasets are provided for readers to replicate trajectory modelling and visualisation using the R statistical computing software.
Background
Appropriate modelling of repeated measures in cohort studies can improve our understanding of: (i) patterns of change across the lifecourse (e.g., developmental trajectories to peak function and agerelated decline); (ii) influences on these patterns of change; and (iii) influence of variation in patterns of change on later health and wellbeing [1]. Many developmental processes display nonlinear patterns of change with age, especially during the growing years, which makes it important but challenging to accurately model their trajectories [2]. Also requiring attention is the choice of method to appropriately address the research question, e.g., whether to use methods that model an average trajectory in the whole sample [3,4,5] or clustering based approaches to identify groups of individuals with similar trajectories [6]. Moreover, there is a lack of accessible and practical guidance which can discourage novice researchers from using more complex methods [7]. Thus, an overview of sophisticated modelling procedures along with opensource software applications and application in multiple different cohorts can help address these challenges.
This paper provides a guide to describing nonlinear longitudinal growth trajectories for a single repeatedlymeasured continuous outcome using linear and natural cubic regression splines [3, 4], SITAR (Super Imposition by Translation and Rotation) models [5], and (splinebased)Â latent trajectory models [6] â€“ all common methods for examining growth. The next section of the paper gives an overview of modelling nonlinear growth and the various models considered. The four approaches (and the appropriate interpretation of their results) are then illustrated by modelling bone mass trajectories across three cohort studies to characterise patterns of change and their sex differences. The final section provides recommendations about when different modelling methods might be useful and discusses approaches and challenges in analysing exposures and outcomes of patterns of change, and in making crosscohort comparisons.
Methods
Modelling nonlinear growth
A variety of statistical methods are available for handling repeated (correlated) observations from the same individuals and analysing trajectories [8]. Most methods involve fitting growth models within either a structural equation modelling framework (e.g., latent growth curve analysis where change is analysed as a latent process [9]) or a multilevel modelling framework, with both giving similar results under certain conditions [10]. One type of repeated measures model that is useful when the primary interest is estimating a populationaverage trajectory is generalised estimating equations (GEE) [11, 12]. This uses a working covariance matrix to correct for the dependence among repeated observations but is usually not suited for examining variation within/between individuals. Another repeated measures model that can estimate both populationaverage and individualspecific trajectories (and is more robust to missing outcome data than GEE) is the mixedeffects model [13].
Mixedeffects models
Mixedeffects models (randomeffects, multilevel, or hierarchical models) estimate a populationaverage trajectory as â€˜fixed effectsâ€™ and variation of individual trajectories around this average as â€˜random effectsâ€™ [12,13,14]. A common form is the linear mixedeffects (LME) model where the repeated outcome is modelled by a linear combination of the fixed and random effects. An LME model for a single continuous outcome (e.g., weight), as a linear function of time, which includes random intercepts and random slopes can be written as follows:
where, y_{ij} denotes a single outcome y measured in individual iÂ (iâ€‰=â€‰1,â€‰2,â€‰â€¦,â€‰N) at time t_{ij}Â (jâ€‰=â€‰1,â€‰2,â€‰â€¦,â€‰J_{i}), with responses (y_{1}â€¦y_{N}) assumed to be independent between individuals. Î²_{oi} and Î²_{1i} are individualspecific intercept and slope terms (respectively) that have fixed effects (Î²_{o},â€‰Î²_{1}) and random effects (u_{oi},â€‰u_{1i}). The random effects u_{i} are assumed to be independently normally distributed with mean zero and covariance matrix Î©_{u}. Residual errors Îµ_{ij} are assumed to be independently identically normally distributed with variance \({\sigma}_{\varepsilon}^2\) and reflect the difference between observed and predicted values for individual i at occasion j. The random effects u_{i} and residuals Îµ_{ij} are assumed to be mutually independent.
Moving beyond a linear trajectory
The LME model in Eq. (1) assumes linear change in the outcome with increasing time (e.g., age). Nonlinear change, which is common (particularly when modelling change over a large age range and/or during periods of rapid complex growth) can be incorporated into LME models by including linear combinations of nonlinear terms for age in the model â€“ i.e., keeping the linear link function. Historically, the standard approach has been to make use of polynomial functions to approximate nonlinear curves. However, as illustrated in Fig.Â 1, polynomials have limitations, e.g., simpler polynomials give few curve shapes and more complex polynomials tend to fit badly at extremes and produce artefactual turns in the curve. A more flexible alternative to analysing complex patterns of change (e.g., with several peaks/troughs, as with body mass index (BMI) over infancy, childhood, and adolescence) is using a set of connected polynomials covering different segments of the time/age distribution known as spline functions.
Spline functions
A spline function is a set of piecewise polynomials that are joined together at break/turning points called knots (see [16] for an introductory review). Splines are formed using spline basis functions (the dimensional space containing each element from a set of local polynomials) and can be fitted by various methods including linear and nonlinear models. The polynomial degree and method of knot placement is generally what distinguishes between different spline functions. There are three main ways of estimating spline functions: as regression splines, penalised regression splines, and smoothing splines [12, 16,17,18,19]. Regression splines use fewer knots than observations â€“ here both the number and location of knots must be specified by the user. Penalised regression splines are regression splines that include an added penalty on the parameters â€“ here, the user does not need to specify the knots, instead, a (maximum) spline basis complexity (flexibility) is specified, and the spline coefficients are then penalised by a tuning parameter to avoid overfitting. Smoothing splines use a knot at every unique observation with penalisation of the estimated curve â€“ these preceded penalised regression splines and are now less widely used.
In this paper, we consider two commonly used types of regression splines: linear splines and natural cubic splines. These can be parameterised by a linear combination of the transformed covariate (e.g., age) and thus they can be fitted within an LME framework to model a repeated outcome as a nonlinear function of time. Fitting penalised regression spline models is covered briefly in the Discussion section. The next three sections give an overview to using linear and natural cubic splines in LME models and approaches to choosing the number and position of knots. SITAR, which utilises a natural cubic spline within a nonlinear mixed effects framework, is then covered. These three mixedeffects modelling approaches all estimate populationaverage trajectories (and individual variation from these) and so can be used when it is hypothesised that the study population shares a common mean trajectory. Extensions to latent trajectory models (with trajectory shapesÂ estimated using natural cubic splines)Â for situations where the aim/hypothesis is to identify suspected latent heterogenous trajectories in subpopulations as opposed to one mean trajectory for the whole population are covered in the subsequent section. All four modelling approaches are summarised in TableÂ 1.
Linear spline LME models
The simplest spline function is the linear spline which can be used to describe growth by a series of connected lines joined at knots, where the slope can change after each knot [3]. For example, a linear spline for age (measured from 5 to 40â€‰years) with 2 knots at ages 10 and 15â€‰years produces three different linear slopes of the repeated outcome measure (e.g., weight): 5 to â‰¤10, 10 to â‰¤15, and 15 to â‰¤40â€‰years. The LME model in Eq. (1) can be rewritten to include a linear spline function for age b(t) with K knots Î¾_{1}â€‰<â€‰Î¾_{2}â€‰<â€‰â€¦Î¾_{K} as:
The model in Eq. (2) includes a linear spline in both the fixed and random effects (with b_{ki} having fixed effect b_{k} and random effect v_{ki}) which allows for nonlinear mean and individual trajectories, respectively. The fixed and random splines are assumed to have the same knots in Eq. (2) however, it is possible to allow fewer knots in the random spline. If the aim was solely to model a nonlinear mean trajectory, then Eq. (2) can be simplified by replacing the random spline with a random line â€“ here, Î²_{oi} and Î²_{1i} have similar interpretation as in Eq.Â (1), with random effect v_{ki} omitted (i.e., b_{ki}â€‰=â€‰b_{k}). The rate of change of a linear spline (1st derivative) is not continuous at the knots. An alternative function that has continuous 1^{st} and 2^{nd} derivatives is the natural cubic spline.
Natural cubic spline LME models
A natural cubic spline (also known as restricted cubic spline) is a set of cubic polynomials with continuity and slope constraints at each knot, and additional constraint of linearity at the extremes of the curve, typically before the first and after the last knot [12, 16, 20, 21]. This extra linearity constraint makes the trajectory less erratic at the ends of the age distribution and so more reliable than linear splines (and unrestricted cubic splines). The inclusion of cubic terms with continuity constraints means natural cubic spline models can be more parsimonious for complex shapes than a linear spline with many knots. An LME model that includes a natural cubic spline function b(t) with K knots Î¾_{1}â€‰<â€‰Î¾_{2}â€‰<â€‰â€¦Î¾_{K} (and a linearity constraint for values tâ€‰<â€‰Î¾_{1} and tâ€‰>â€‰Î¾_{K}) can be written as:
In words, a natural cubic spline for age (measured from 5 to 40â€‰years) with 2 knots at 10 and 15â€‰years (and with first and last knots at 5 and 40â€‰years, respectively), invokes 3 cubic polynomials: between 5 to â‰¤10, 10 to â‰¤15, and 15 to â‰¤40â€‰years, and has its curvature equal to 0 at ages 5 and 40â€‰years. If the first and last knots were placed at older and younger ages respectively, then the curve would be linear from the first and last knot to the youngest and oldest ages, respectively. Note, we have defined the natural cubic spline using truncated power basis (like how the linear spline was defined). The analysis software creates this function using a Bspline basis which is mathematically challenging to represent but numerically more stable. Whatever basis is used, if the polynomial degree and knots are identical, then the spline will always be the same.
Choosing number and location of knots
The flexibility of regression splines is determined by the number/position of the knot points. For a natural cubic spline, the number of knots (rather than their position) is more important [20]. A small number of knots (between 3 and 5 knots) provides a good fit to some patterns [20] though, with many repeats (e.g., data spanning many decades) more knots may be required. Approaches to selecting theÂ number/position of knots include (i) placing knots at quantiles of the age distribution (ii) using equally spaced knots, (iii) inspecting smoothing curves and using these to select knots, (iv) starting with many knots and reducing their number, and (v) placing knots at the mean age of data collection [3].
Model selection can be done informally (i.e., by inspecting plots of fitted values/residuals from competing models with different knots). Valid comparison between models with different knots (i.e., models with nonnested mean structures) can be done using likelihoodbased information criteria (e.g., Bayesian Information Criterion value (BIC)) provided maximum likelihood (ML) estimation is used [22]. Note however, MLbased variance estimates are biased downwards [12] and so one approach is to compare models fitted by ML and by restricted maximum likelihood (REML). Crossvalidation is also useful for model selection [23]. If knot position was a primary interest (e.g., testing sensitive periods), then topic knowledge can inform placement of knots [24].
SITAR models
SITAR is a shape invariant nonlinear mixedeffects model [5, 25]. Whereas LME spline models are linear models that allow terms to describe a nonlinear trajectory with age, nonlinear mixedeffects models are fundamentally nonlinear in the coefficients [12]. SITAR assumes that a study population has a common characteristic curve (fitted as fixed effects), which through shifting and scaling (by a set of 3 random effects) can be transformed into any individual curve. Following the notation in Cole et al [5, 26], a SITAR model for outcomeÂ y can be written as:
where y_{ij} is the outcome measurement for individual i at age j; Î±_{0}, Î²_{0}, Î³_{0} are fixed effects; Î±_{i}, Î²_{i}, and Î³_{i} are random effects for the i th individual; h(t) is a natural cubic spline curve; and Îµ_{ij} are independent normally distributed errors. The 3 random effects describe the size (Î±_{i}), timing (Î²_{i}), and intensity (Î³_{i}) of individual growth relative to the mean growth curve. Î±_{i} adjusts for the differences in y and geometrically reflects individual shifts up or down (translation) in the mean curve; Î²_{i} adjusts for differences in the timing of peak growth in y and geometrically reflects left to right shifts (translation) in the mean curve; and Î³_{i} adjusts for the duration of the growth spurt and geometrically corresponds to shrinking or stretching of the age scale and rotating the curve.
Note, a key difference from other mixed effects models, with or without a natural cubic spline mean curve, is that SITAR models growth on both the x and yaxes â€“ this allows differences in developmental age to be modelled. Common practice in selecting the best fitting SITAR models is to compare models with varying number of knots placed at quantiles of the age distribution for the spline curve [5]. The internal SITAR model structure is also customisable [15].
Latent trajectory models
The spline and SITAR models described above assume that the population is homogenous and described at the population level by a mean trajectory, with variability of individuals about this mean. As an alternative, latent trajectory models assume that there is a heterogenous population composed of unknown subgroups (latent classes) of individuals, each characterised by a unique mean trajectory profile [6, 27, 28]. Therefore, latent trajectory modelling addresses a different research hypothesis where heterogeneity in the populationaverage (mean) trajectory is of key interest. Latent trajectory models aim to minimise within group variance and maximise between group differences so that individuals are more similar within groups than between groups. Each individual has a probability of belonging to each latent class and is assigned to the class with the highest probability. Class membership is defined using a latent discrete random variable, with membership probability described by a multinomial logistic model.
Several latent trajectory modelling approaches are possible (see [6] for a recent overview). These include models that ignore the longitudinal structure (known as longitudinal latent class analysis) and models assuming no variability between individuals within subgroups (known as latent class growth analysis or groupbased trajectory models). Another approach that is a direct extension of standard mixedeffects models is a growth mixture model, which involves fitting multiple growth curves to subgroups of individuals that share a common trajectory. Following the notation in [6], the LME model in Eq. (1) can be rewritten as a growth mixture model as:
where C indicates number of latent classes, with probabilities p_{c}, câ€‰=â€‰1, â€¦, C, with 0â€‰â‰¤â€‰p_{c}â€‰â‰¤â€‰1 and \({\sum}_{c=1}^C{p}_c=1\). All other terms are defined as before but specifically for each class c. Growth can be parameterised as nonlinear, e.g., using a natural cubic spline curve in each class (assuming same number/position of knots). Classspecific covariances for individuallevel error terms can be included, and both fixed and random effects can be class specific.
Model estimation is conditional on a prespecified number of classes, with the optimal number of classes identified through a combination of approaches. These include assessing interpretability and plausibility of classes e.g., inspecting if trajectories show biologically plausible patterns and examining characteristics (e.g., socioeconomic position) of the classes [29], information criteria, entropy (statistic for class separation), and numerically meaningful subgroups (e.g., â‰¥5% class size). Models with >â€‰1 class are prone to local maxima solutions (i.e.,Â convergence to theÂ best solution in a neighbourhood of the parameter space, rather than the global maximum (largest loglikelihood)). This can be avoided by using different starting values [30].
Illustrative example
Next, using data from three cohort studies, we demonstrate the application of the four modelling approaches described above to characterise bone mineral content (BMC) growth trajectories and their sex differences, including in population subgroups with potentially distinct trajectories.
Bone mass through the lifeÂ course
Bone mass in early life is thought to be an important determinant of fracture and osteoporosis risk in later life [31] however, few studies have described its developmental trajectory. Furthermore, sex differences in osteoporotic fracture are assumed to be due to menopause but may also reflect early sexual dimorphism in bone development. This is a timely exploration, given the availability of studies with repeated measurements of BMC (a marker of bone strength).
Studies and measurements
Three studies with repeated BMC measurements from childhood to adulthood were included (TableÂ 2, AdditionalÂ fileÂ 1): the Avon Longitudinal Study of Parents and Children (ALSPAC) [32, 33], Bone Mineral Density in Childhood Study (BMDCS) [34], and Pediatric Bone Mineral Accrual Study (PBMAS) [35]. Totalbody (excludinghead) BMC was measured in grams using wholebody DualEnergy Xray Absorptiometry (DXA) scans. Of note, the studies used DXA devices from different manufacturers (Lunar vs. Hologic) which scale differently and are not interchangeable but repeat scans within studies were acquired on the same device. Individuals from each study were included if they had â‰¥1 measure of BMC and no missing data on age at DXA scan (in years) or sex. Analyses were restricted to white ethnicity because 2 cohorts were ethnically homogeneous [33, 35]. The final analysis samples comprised 3888 males and 4007 females in ALSPAC, 465 males and 488 females in BMDCS, and 112 males and 127 females in PBMAS. All studies had ethics approval and obtained parental or participant informed consent.
Statistical analysis
Analyses were performed in R version 4.0.2 (R Project for Statistical Computing) and RStudio version 1.3.1 (RStudio Team). R code is available at https://github.com/aelhak/nltmr/. Synthetic versions of the PBMAS cohorts were simulated [36] and can be found in the same repository.
Prior to trajectory modelling, scatterplots of BMC against age, and line plots of the individual BMC trajectories were used to inspect the form of the trajectory and identify clearly outlying observations. The datasets used for trajectory modelling (Fig.Â 2) showed as expected nonlinear change in BMC with age, and higher BMC in ALSPAC due to Lunar device. The numbers of individuals at each visit were described and age and BMC were summarised with means and standard deviations (Additional file 1). Models were fitted separately by sex due to expected difference in bone mineral content accrual, and our aim was to explore this in the illustrative example.
Linear and natural cubic spline LME models were fitted using the â€˜lme4â€™ package [37]. Models with 2 to 6 knots (placedÂ at quantiles of the age distribution) in the fixed effects curve were compared. Nonlinear individual trajectories were allowed by including a random effects spline with 1 knot at the median. SITAR models with 2 to 6 knots (at quantiles of the age distribution) in the spline curve (and three randomeffects) were fitted using the â€˜sitarâ€™ package [15]. LME and SITAR models were fitted by ML and best fitting models (optimum number of knots) were determined by the smallest BIC (AdditionalÂ fileÂ 2). Goodness of fit for the selected models was assessed by examining residuals (conditional on the random effects) from LME models and variance in BMC explained by SITAR models. The selected models were used to describe BMC growth trajectory and growth velocity. Slopes of the fixed effects spline segments from the linear spline models were used to summarise mean BMC velocity during different age windows and identify windows for peak growth. Mean peak BMC velocity and age at peak BMC velocity were obtained from the natural cubic spline LME and SITAR models by differentiating the mean spline curves.
Growth mixture models were fitted using the â€˜lcmmâ€™ package [30]. The forms of the best fitting mean natural cubic spline curves were used to model the fixed effects age curve. Models included random intercepts and random linear age slopes. Models with different numbers of latent classes were compared: from a 1class model (i.e., a standard natural cubic spline LME model where all individuals follow a single mean trajectory) up to models with 5 classes. Models with >â€‰1 class included classspecific random effects covariance matrices. An automatic search procedure was used to estimate each 2â€“5 class model for 100 iterations using random initial values from the distribution of the 1class model. Optimal number of classes was chosen by inspecting predicted trajectory subgroups from each model for biological plausibility, in addition to the smallest BIC and biggest entropy, and by excluding small class size (â‰¥5%). Goodness of fit and discrimination capacity of the selected models (i.e. with optimal number of classes) was assessed by calculating posterior class membership probabilities [30].
Results
The mean predicted BMC trajectories in each cohort from the linear spline LME (Fig.Â 3), natural cubic spline LME (Fig.Â 4), and SITAR models (Fig.Â 5) showed that BMC increased with age up to a plateau in young adulthood, and thereafter remained stable to age 40 (inÂ PBMAS). All models showed evidence of steeper growth trajectories through adolescence coinciding with emerging sex differences, with males subsequently having higher BMC than females, and plateauing later than females. BMC trajectories from all models were broadly similar (Fig.Â 6). Both linear and natural cubicÂ spline LME models and SITAR provided a good fit to the data (OnlineÂ Resource 2). The mean BMC growth velocity in different age windows (from linear spline LME models) peaked during adolescence, and the peak was lower and occurred earlier in females than males (TableÂ 3). Mean BMC growth velocity curves from natural cubic spline LME and SITAR models were similar (Figs. 45): for all cohorts, the mean ages at peak BMC velocity from both models were within the age windows for peak growth identified by the linear spline LME models (TableÂ 4).
The selected growth mixture models identified 3 subgroups in females and 2 subgroups in males from ALSPAC and BMDCS, and 4 in females and 3 in males from PBMAS (Fig.Â 7). Overall, differences in mean BMC between subgroups were larger during adolescence than in childhood and adulthood. One group with 15% of PBMAS females had higher mean BMC up to age 40 than the remaining three groups. A group comprising 27% of PBMAS males reached a lower peak and showed signs of bone loss by age 40, compared to the other two groups. The 2 trajectory groups in ALSPAC males displayed a biologically implausible lack of plateau by early adulthood. Model discrimination capacity was better in PBMAS (and BMDCS) than in ALSPAC, likewise entropy was high in PBMAS and low in ALSPAC and BMDCS (AdditionalÂ fileÂ 3).
Discussion
Our results provide evidence on bone mineral accrual from 5 to 40â€‰years. Linear and natural cubic splineÂ LME and SITAR showed that the levels and rates of change in BMC were greater for males than females, with peak gains in adolescence in both, but later in males than females. Growth mixture (latent trajectory)Â models (with trajectory shapesÂ estimated usingÂ natural cubic splines)Â identified potentially distinct trajectory subgroups, with the greatest betweengroup differences seen in adolescence. While the main aim of these analyses was to illustrate different modelling approaches, our results are consistent with previous studies on sex differences in BMC and suggest that in both sexes puberty is an important period for peak bone accrual [38,39,40].
Choosing a modelling approach
All the modelling techniques presented may be used to model BMC and other growth processes. Both spline LME models and SITAR were previously used on height/weight/BMI [4, 26, 41, 42]; linear splines LME and SITAR models were previously used on bone density [38, 43]; linear and natural cubic spline LME were used to analyse blood pressure change [44, 45]. Choice of method will be determined by research question, including whether this is concerned with differences in mean change over time in the whole study population (where LME or SITAR may be useful), or if the aim is to identify datadriven population subgroups for patterns of change (where a latent trajectory model may be useful). Complexity of the underlying trajectory, and data availability i.e., number of individuals and repeat measurements will also influence choice of method (see [46, 47] for a discussion on sample size in growth models). All models presented can handle unbalanced datasets (i.e., with individually varying measurement occasions, as in our example). If the dataset is balanced (i.e., all measurements taken at the same schedule for everyone, e.g., participants are measured at exactly 2, 4, and 6â€‰years of age), this will limit the complexity of models that can be fitted.
When the main aim is to quantify growth rate at different periods of the lifecourse, then linear splines may be preferred because of their more interpretable slope coefficients compared to the natural cubic spline LME and SITAR models. If the aim is to describe the shape of the trajectory or identify specific peaks and troughs (e.g., age at peak velocity), then natural cubic spline LME or SITAR are more useful, because linear spline cannot identify points of maxima/minima (but can identify periods/age windows). The number and spacing of repeated measures can influence model convergence and the complexity that can be allowed [48, 49]. SITAR was designed to model adolescent growth in height, and (like other nonlinear models) its parameters reflect its specific purpose. Hence, SITAR may not work well for complex trajectories (e.g., depressive symptoms [50]), and natural cubic spline LME may offer more flexible alternatives. Notable, SITAR fitted without the timing fixed effect (Î²_{0}) is analogous to a random intercept random slope model, and so should be at least as flexible as LME. A simple tactic that may improve SITAR (and optimise knot placement in LME models) is transforming the age scale [48, 51].
The regression splines presented have the advantage of being straightforward to estimate once theÂ number/location of knots is set, with the functions calculated by solving linear equations. If the number of knots is too small, the spline can be sensitive to knot location (and placing knots at quantiles or evenly spaced may not be useful) whereas if theÂ number of knots is too large this can lead to overfitting. Penalised regression splines (introduced in the Methods) are highly flexible alternative functions which avoid the need for knot selection, though are less straightforward to estimate due to the uncertainty of the tuning parameter [12, 16, 17]. One powerful approach for applying penalised regression splines to describe populationaverage growth trajectories is using generalised additive mixed models (GAMM) [52,53,54,55]. Here, the linear predictor depends linearly on an unknown regression spline function of one or more covariates (e.g., time/age) and a tuning parameter is estimated (by REML, generalised crossvalidation or other approaches) and used to penalise the spline and avoid overfitting. When applied to PBMAS, a penalised regression spline GAMM estimated mean BMC growth trajectory and velocity curves that were broadly consistent with the curves obtained from regression spline LME and SITAR models (Fig.Â 8).
Distinct from the LME and SITAR models, we used growth mixture models with a natural cubic spline to identify population subgroups with different nonlinear BMC trajectories. Linear splines with estimated knots [56, 57] and smoothing splines [58] have also been used to parameterise the nonlinear curve in these models (i.e., instead of the natural cubic spline). Latent trajectory models (e.g., growth mixture models) have previously been used to identify trajectory subgroups for BMI [59], depressive symptoms [60], physical activity [61], glucose response [62], and environmental exposures [63], among others. It can be challenging to determine the optimal number of latent classes/subgroups, including whether the classes are meaningful or if the model is just splitting the distribution of the random effects into a larger group and smaller extremes. Model selection is often subjective, and trajectory subgroups can be cohortspecific and may not replicate in other cohorts therefore, it is important to follow established reporting guidelines [28].
Latent trajectory models will be most useful if there are strong reasons to hypothesise that a study population is composed of multiple unknown subgroups of individuals with distinct trajectories. Less complex latent trajectory models (e.g., groupbased trajectory models) are computationally more efficient than growth mixture models though, any subgroup differences identified by such models may just reflect within class variability, which is likely to be absorbed by random effects in growth mixture models. If the aim was inÂ exploring specific hypotheses (e.g., sensitive periods), then clustering (data driven) approaches may not provide a suitable subgroup to test this, e.g., if the aim was to explore the hypothesis that a lower birth weight followed by faster growth in the first 1000â€‰days increased cardiovascular risk, this approach might not identify a cluster with this specific growth pattern. There may also be value in using a combination of approaches [64], like in this paper.
Identifying determinants and outcomes of trajectories
The models in this paper can be extended to include early life exposures and later outcomes to explore their associations with trajectories. Choice of method will depend on the research aims. Exposure variables can be added within LME spline models as fixed effects and as interactions with splines to test their effect on trajectories and growth rate [3, 38, 65]. The individual growth features (e.g., peak velocity and age at peak velocity) can be obtained from spline LME model random effects and used in separate analyses as outcomes or exposures [18, 66] â€“ however, it is important to allow enough complexity in the randomeffects splines for sufficient betweenperson variability (Fig.Â 9). Individual growth features can be easily obtained from SITAR and used in subsequent analyses to examine associations with exposures or outcomes [43, 67]. Of note, 2stage approaches may be more biased than 1stage joint model [68, 69]. Exposures and outcomes can be related to latent trajectories in a joint model or a multistage process where subgroups are first identified and subsequently used in separate models (unweighted or (preferably) weighed for classification probabilities) to examine associations [30, 59,60,61,62,63]. If the aim was to identify effects of repeated exposure, then a â€˜structuredâ€™ modelling approach may be useful for testing competing hypotheses [70].
It is important to identify potential biases and explore ways for mitigating them when analysing (causal) associations between trajectories and exposures or outcomes in cohort studies. Missing data can bias associations depending on mechanism, and appropriate approaches to describe and handle missing data should be explored [12, 71,72,73,74,75,76]. In a repeated measure setting, individuals with missing outcome values can be included in the estimation sample if they have at least one observed outcome value. Mixedeffects models give unbiased results (i.e., not biased by missing data) when the probability that an outcome value is missing depends on observed values of the outcome (i.e., outcome missing at random (MAR) depending on observed outcome values). Bias due to missing data will occur for these models when the probability that the outcome is missing depends on underlying missing values (i.e., outcome missing not at random (MNAR) depending on missing outcome values). With incomplete covariates, bias can occur when the probability of excluding an individual with missing covariate data is related to the outcome. Regardless of bias, excluding individuals with missing covariate information will often mean discarding useful observed data, leading to imprecision [77].
Confounding can lead to spurious associations between trajectories and exposures or outcomes. Confounders (factors causally related to exposure and outcome) should be identified (e.g., using Directed Acyclic Graphs) and controlled for, taking care not to adjust for mediators (factors on the causal pathways) [78, 79]. Even with adjustment, residual confounding (from using poorly measured confounders or not adjusting for important confounders) can bias results. Some useful strategies for checking if residual confounding influences results include using negative control variables and comparing results across cohorts with different confounding structures [64, 80,81,82].
Comparing and modelling trajectories across cohorts
Researchers should be aware of potential differences in the participants, data collection methods and analysis models when comparing trajectories from different studies. For example, the higher BMC due to the Lunar machine in ALSPAC means it is inaccurate to conclude that Britons had higher BMC (and peak BMC velocity) than North Americans. Another example is the effect of medication use by older cohorts on combined blood pressure trajectories [83]. New and existing multicohort collaborations provide unique opportunities to jointly model such trajectories across different cohorts and to extend the amount of the lifeÂ course studied [84, 85]. However, this also generates additional challenges including on model selection and missing data (see [86] for a discussion of challenges and solutions to multicohort modelling). Whatever approach is taken (cohortspecific or multicohort modelling), data harmonisation is an important initial step that involves making data comparable across studies [87, 88], e.g. using DXA reference standards to harmonise BMC [89]. Because age (but not BMC) was fully harmonised in our example, a simple approach to obtaining valid pooled estimate of age at peak BMC velocity is to fit a growth model to all individuals with BMC expressed in cohortspecific SD units (Fig.Â 10).
Conclusions
LME models with linear and natural cubic splines, SITAR, and growth mixture models are useful for describing nonlinear growth trajectories in longitudinal population studies, and these methods can be adapted to other complex traits. Choice of method depends on research aims, complexity of the trajectory, and available data. This illustrative paper and accompanying R analysis code and example datasets will we hope be a useful resource for researchers interested in modelling nonlinear longitudinal trajectories.
Availability of data and materials
All the analysis code used in this paper can be found at https://github.com/aelhak/nltmr/. Details of all available data and the processes and procedures involved in accessing the ALSPAC resource can be found in the ALSPAC study website, which includes a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/ourdata/). The BMDCS data can be accessed through the NICHD DASH website (https://dash.nichd.nih.gov/). Researchers interested in accessing the PBMAS data should contact Professor Adam DG BaxterJones (baxter.jones@usask.ca).
Abbreviations
 ALSPAC:

Avon Longitudinal Study of Parents and Children
 BMC:

Bone mineral content
 BMDCS:

Bone Mineral Density in Childhood Study
 BMI:

Body mass index
 DXA:

DualEnergy Xray Absorptiometry
 GAMM:

Generalised additive mixed model
 GEE:

Generalised estimating equations
 LME:

Linear mixed effects
 MAR:

Missing at random
 ML:

Maximum likelihood
 MNAR:

Missing not at random
 PBMAS:

Pediatric Bone Mineral Accrual Study
 REML:

Restricted maximum likelihood
 SITAR:

Super Imposition by Translation and Rotation
References
BenShlomo Y, Cooper R, Kuh D. The last two decades of life course epidemiology, and its relevance for research on ageing. Int J Epidemiol. 2016;45(4):973â€“88.
Grimm KJ, Ram N, Hamagami F. Nonlinear growth curves in developmental research. Child Dev. 2011;82(5):1357â€“71.
Howe LD, Tilling K, Matijasevich A, Petherick ES, Santos AC, Fairley L, et al. Linear spline multilevel models for summarising childhood growth trajectories: a guide to their application using examples from five birth cohorts. Stat Methods Med Res. 2016;25(5):1854â€“74.
LourenÃ§o BH, Villamor E, Augusto RA, Cardoso MA. Influence of early life factors on body mass index trajectory during childhood: a populationbased longitudinal analysis in the Western Brazilian Amazon. Matern Child Nutr. 2015;11(2):240â€“52.
Cole TJ, Donaldson MD, BenShlomo Y. SITARa useful instrument for growth curve analysis. Int J Epidemiol. 2010;39(6):1558â€“66.
Herle M, Micali N, Abdulkadir M, Loos R, BryantWaugh R, HÃ¼bel C, et al. Identifying typical trajectories in longitudinal data: modelling strategies and interpretations. Eur J Epidemiol. 2020;35(3):205â€“22.
Sauerbrei W, Abrahamowicz M, Altman DG, le Cessie S, Carpenter J, initiative obotS. STRengthening analytical thinking for observational studies: the STRATOS initiative. Stat Med. 2014;33(30):5413â€“32.
Tu YK, Tilling K, Sterne JA, Gilthorpe MS. A critical evaluation of statistical approaches to examining the role of growth trajectories in the developmental origins of health and disease. Int J Epidemiol. 2013;42(5):1327â€“39.
Curran PJ, Obeidat K, Losardo D. Twelve frequently asked questions about growth curve modeling. J Cogn Dev. 2010;11(2):121â€“36.
MacdonaldWallis C, Lawlor DA, Palmer T, Tilling K. Multivariate multilevel spline models for parallel growth processes: application to weight and mean arterial pressure in pregnancy. Stat Med. 2012;31(26):3147â€“64.
Twisk JW. Longitudinal data analysis. A comparison between generalized estimating equations and random coefficient analysis. Eur J Epidemiol. 2004;19(8):769â€“76.
Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G. Longitudinal data analysis. USA: Chapman & Hall/CRC; 2009.
Laird NM, Ware JH. Randomeffects models for longitudinal data. Biometrics. 1982;38(4):963â€“74.
Goldstein H, De Stavola B. Statistical modelling of repeated measurement data. Longitud Life Course Stud. 2010;1(2):170â€“85.
Cole TJ: sitar: Super Imposition by Translation and Rotation growth curve analysis. R package version 1.2.0. 2021. https://cran.rproject.org/web/packages/sitar/index.html.
Perperoglou A, Sauerbrei W, Abrahamowicz M, Schmid M. A review of spline function procedures in R. BMC Med Res Methodol. 2019;19(1):46.
Suk HW, West SG, Fine KL, Grimm KJ. Nonlinear growth curve modeling using penalized spline models: a gentle introduction. Psychol Methods. 2019;24(3):269â€“90.
Aris IM, Bernard JY, Chen LW, Tint MT, Pang WW, Lim WY, et al. Infant body mass index peak and early childhood cardiometabolic risk markers in a multiethnic Asian birth cohort. Int J Epidemiol. 2017;46(2):513â€“25.
Fonseca MJ, Moreira C, Santos AC. Adiposity rebound and cardiometabolic health in childhood: results from the generation XXI birth cohort. Int J Epidemiol. 2021;50(4):126071.
Harrell F. Regression modeling strategies with applications to linear models, logistic regression, and survival analysis. 1st ed. New York: Springer; 2001.
Desquilbet L, Mariotti F. Doseresponse analyses using restricted cubic spline functions in public health research. Stat Med. 2010;29(9):1037â€“57.
Mackenzie ML, Donovan CR, McArdle BH. Regression spline mixed models: A forestry example. J Agric Biol Environ Stat. 2005;10(4):394.
James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning with applications in R. New York: Springer; 2017.
Naumova EN, Must A, Laird NM. Tutorial in biostatistics: evaluating the impact of â€˜critical periodsâ€™ in longitudinal studies of growth using piecewise mixed effects models. Int J Epidemiol. 2001;30(6):1332â€“41.
Beath KJ. Infant growth modelling using a shape invariant model with random effects. Stat Med. 2007;26(12):2547â€“64.
Cole TJ, Kuh D, Johnson W, Ward KA, Howe LD, Adams JE, et al. Using superimposition by translation and rotation (SITAR) to relate pubertal growth to bone health in later life: the Medical Research Council (MRC) National Survey of health and development. Int J Epidemiol. 2016.
Berlin KS, Parra GR, Williams NA. An introduction to latent variable mixture modeling (part 2): longitudinal latent class growth analysis and growth mixture models. J Pediatr Psychol. 2013;39(2):188â€“203.
van de Schoot R, Sijbrandij M, Winter SD, Depaoli S, Vermunt JK. The GRoLTSchecklist: guidelines for reporting on latent trajectory studies. Struct Equ Model Multidiscip J. 2017;24(3):451â€“67.
Lennon H, Kelly S, Sperrin M, Buchan I, Cross AJ, Leitzmann M, et al. Framework to construct and interpret latent class trajectory modelling. BMJ Open. 2018;8(7):e020683.
ProustLima C, Philipps V, Liquet B. Estimation of extended mixed models using latent classes and latent processes: the R package lcmm. J Stat Softw. 2017;1(Issue 2):2017.
Harvey N, Dennison E, Cooper C. Osteoporosis: a lifecourse approach. J Bone Miner Res. 2014;29(9):1917â€“25.
Fraser A, MacdonaldWallis C, Tilling K, Boyd A, Golding J, Davey Smith G, et al. Cohort profile: the Avon longitudinal study of parents and children: ALSPAC mothers cohort. Int J Epidemiol. 2013;42(1):97â€“110.
Boyd A, Golding J, Macleod J, Lawlor DA, Fraser A, Henderson J, et al. Cohort profile: the â€˜children of the 90sâ€™the index offspring of the Avon longitudinal study of parents and children. Int J Epidemiol. 2013;42(1):111â€“27.
McCormack SE, Cousminer DL, Chesi A, Mitchell JA, Roy SM, Kalkwarf HJ, et al. Association between linear growth and bone accrual in a diverse cohort of children and adolescents. JAMA Pediatr. 2017;171(9):e171769.
BaxterJones AD, Faulkner RA, Forwood MR, Mirwald RL, Bailey DA. Bone mineral accrual from 8 to 30 years of age: an estimation of peak bone mass. J Bone Miner Res. 2011;26(8):1729â€“39.
Nowok B, Raab GM, Dibben C. synthpop: bespoke creation of synthetic data in R. J Stat Softw. 2016;74(11):26.
Bates D, MÃ¤chler M, Bolker B, Walker S. Fitting linear mixedeffects models using lme4. J Stat Software. 2015;67(1):1â€“48. https://doi.org/10.18637/jss.v067.i01.
Elhakeem A, Frysz M, Tilling K, Tobias JH, Lawlor DA. Association between age at puberty and bone accrual from 10 to 25 years of age. JAMA Netw Open. 2019;2(8):e198918.
Jackowski SA, Erlandson MC, Mirwald RL, Faulkner RA, Bailey DA, Kontulainen SA, et al. Effect of maturational timing on bone mineral content accrual from childhood to adulthood: evidence from 15 years of longitudinal data. Bone. 2011;48(5):1178â€“85.
Cousminer DL, Mitchell JA, Chesi A, Roy SM, Kalkwarf HJ, Lappe JM, et al. Genetically determined later puberty impacts lowered bone mineral density in childhood and adulthood. J Bone Miner Res. 2018;33(3):430â€“6.
Khera AV, Chaffin M, Wade KH, Zahid S, Brancale J, Xia R, et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell. 2019;177(3):587â€“596.e589.
Jensen SM, Ritz C, Ejlerskov KT, MÃ¸lgaard C, Michaelsen KF. Infant BMI peak, breastfeeding, and body composition at age 3 y. Am J Clin Nutr. 2014;101(2):319â€“25.
Cousminer DL, Wagley Y, Pippin JA, Elhakeem A, Way GP, Pahl MC, et al. Genomewide association study implicates novel loci and reveals candidate effector genes for longitudinal pediatric bone accrual. Genome Biol. 2021;22(1):1.
O'Keeffe LM, Simpkin AJ, Tilling K, Anderson EL, Hughes AD, Lawlor DA, et al. Sexspecific trajectories of measures of cardiovascular health during childhood and adolescence: a prospective cohort study. Atherosclerosis. 2018;278:190â€“6.
Lambert PC, Abrams KR, Jones DR, Halligan AW, Shennan A. Analysis of ambulatory blood pressure monitor data using a hierarchical model incorporating restricted cubic splines and heterogeneous withinsubject variances. Stat Med. 2001;20(24):3789â€“805.
Snijders T. Power and sample size in multilevel modeling. In: Everitt BS, Howell DC, editors. Encyclopedia of Statistics in Behavioral Science. Chicester: Wiley; 2005.
Guo Y, Logan HL, Glueck DH, Muller KE. Selecting a sample size for studies with repeated measures. BMC Med Res Methodol. 2013;13(1):100.
Simpkin AJ, Sayers A, Gilthorpe MS, Heron J, Tilling K. Modelling height in adolescence: a comparison of methods for estimating the age at peak height velocity. Ann Hum Biol. 2017;44(8):715â€“22.
Tilling K, MacdonaldWallis C, Lawlor DA, Hughes RA, Howe LD. Modelling childhood growth using fractional polynomials and linear splines. Ann Nutr Metab. 2014;65(2â€“3):129â€“38.
Kwong ASF, Manley D, Timpson NJ, Pearson RM, Heron J, Sallis H, et al. Identifying critical points of trajectories of depressive symptoms from childhood to young adulthood. J Youth Adolesc. 2019;48(4):815â€“27.
Cole TJ. Optimal design for longitudinal studies to estimate pubertal height growth in individuals. Ann Hum Biol. 2018;45(4):314â€“20.
Wood SN. Generalized additive models an introduction with R. 2nd ed: Chapman & Hall/CRC; USA: 2017.
Pedersen EJ, Miller DL, Simpson GL, Ross N. Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ. 2019;7:e6876.
Wood SN. mgcv: Mixed GAM Computation Vehicle with Automatic Smoothness Estimation. 2021. p. 1.8136 https://cran.rproject.org/web/packages/mgcv/index.html.
Wood SN, Scheipl F. gamm4: generalized additive mixed models using â€˜mgcvâ€™and â€˜lme4â€™; 2017. p. 0.2â€“5. http://cran.nexr.com/web/packages/gamm4/index.html
Kohli N, Harring JR, Zopluoglu C. A finite mixture of nonlinear random coefficient models for continuous repeated measures data. Psychometrika. 2016;81(3):851â€“80.
Lock EF, Kohli N, Bose M. Detecting multiple random changepoints in Bayesian piecewise growth mixture models. Psychometrika. 2018;83(3):733â€“50.
Ding M, Chavarro JE, Fitzmaurice GM. Development of a mixture model allowing for smoothing functions of longitudinal trajectories. Stat Methods Med Res. 2021;30(2):549â€“62.
Buscot MJ, Thomson RJ, Juonala M, Sabin MA, Burgner DP, LehtimÃ¤ki T, et al. Distinct childtoadult body mass index trajectories are associated with different levels of adult cardiometabolic risk. Eur Heart J. 2018;39(24):2263â€“70.
Kwong ASF, LopezLopez JA, Hammerton G, Manley D, Timpson NJ, Leckie G, et al. Genetic and environmental risk factors associated with trajectories of depression symptoms from adolescence to young adulthood. JAMA Netw Open. 2019;2(6):e196587.
Elhakeem A, Heron J, Tobias JH, Lawlor DA. Physical activity throughout adolescence and peak hip strength in young adults. JAMA Netw Open. 2020;3(8):e2013463.
Hulman A, Witte DR, Vistisen D, Balkau B, Dekker JM, Herder C, et al. Pathophysiological characteristics underlying different glucose response curves: a latent class trajectory analysis from the prospective EGIRRISC study. Diabetes Care. 2018;41(8):1740â€“8.
LÃ©vÃªque E, Lacourt A, Philipps V, Luce D, GuÃ©nel P, StÃ¼cker I, et al. A new trajectory approach for investigating the association between an environmental or occupational exposure over lifetime and the risk of chronic disease: application to smoking, asbestos, and lung cancer. Plos One. 2020;15(8):e0236736.
Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45(6):1866â€“86.
Madden JM, Li X, Kearney PM, Tilling K, Fitzgerald AP. Exploring diurnal variation using piecewise linear splines: an example using blood pressure. Emerg Themes Epidemiol. 2017;14:1â€“1.
Brilleman SL, Howe LD, Wolfe R, Tilling K. Bayesian piecewise linear mixed models with a random change point: an application to BMI rebound in childhood. Epidemiology. 2017;28(6):827â€“33.
Crozier SR, Johnson W, Cole TJ, MacdonaldWallis C, MunizTerrera G, Inskip HM, et al. A discussion of statistical methods to characterise early growth and its impact on bone mineral content later in childhood. Ann Hum Biol. 2019;46(1):17â€“26.
Sayers A, Heron J, Smith A, MacdonaldWallis C, Gilthorpe MS, Steele F, et al. Joint modelling compared with two stage methods for analysing longitudinal data and prospective outcomes: a simulation study of childhood growth and BP. Stat Methods Med Res. 2017;26(1):437â€“52.
Parker RMA, Leckie G, Goldstein H, Howe LD, Heron J, Hughes AD, et al. Joint modeling of individual trajectories, withinindividual variability, and a later outcome: systolic blood pressure through childhood and left ventricular mass in early adulthood. Am J Epidemiol. 2021;190(4):65262.
Smith AD, Hardy R, Heron J, Joinson CJ, Lawlor DA, MacdonaldWallis C, et al. A structured approach to hypotheses involving continuous exposures over the life course. Int J Epidemiol. 2016;45(4):1271â€“9.
Lee KJ, Tilling K, Cornish RP, Little RJ, Bell ML, Goetghebeur E, et al. Framework for the treatment and reporting of missing data in observational studies: the TARMOS framework. J Clin Epidemiol. 2021;134:7988
van Buuren S. Flexible imputation of missing data. 2nd ed. Chapman & Hall/CRC. USA; 2018.
Matteo Quartagno SG, Carpenter J. jomo: a flexible package for twolevel joint modelling multiple imputation. R J. 2019;11(2):205â€“28.
Hughes RA, Heron J, Sterne JAC, Tilling K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. Int J Epidemiol. 2019;48(4):1294â€“304.
Twisk J, de Boer M, de Vente W, Heymans M. Multiple imputation of missing values was not necessary before performing a longitudinal mixedmodel analysis. J Clin Epidemiol. 2013;66(9):1022â€“8.
Huque MH, Carlin JB, Simpson JA, Lee KJ. A comparison of multiple imputation methods for missing data in longitudinal studies. BMC Med Res Methodol. 2018;18(1):168.
Huque MH, MorenoBetancur M, Quartagno M, Simpson JA, Carlin JB, Lee KJ. Multiple imputation methods for handling incomplete longitudinal and clustered data where the target analysis is a linear mixed effects model. Biom J. 2020;62(2):444â€“66.
VanderWeele TJ. Principles of confounder selection. Eur J Epidemiol. 2019;34(3):211â€“9.
Groenwold RHH, Palmer TM, Tilling K. To Adjust or Not to Adjust? When a "Confounder" Is Only Measured After Exposure. Epidemiology. 2021;32(2):194201. https://doi.org/10.1097/EDE.0000000000001312.
Lipsitch M, Tchetgen Tchetgen E, Cohen T. Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology. 2010;21(3):383â€“8.
Taylor K, Elhakeem A, Nader JLT, Yang T, Isaevska E, Richiardi L, et al. Effect of maternal prepregnancy/earlypregnancy BMI and pregnancy smoking and alcohol on congenital heart diseases: a parental negative control study. J Am Heart Assoc. 2021;10(11):e020051
Brion MJ, Lawlor DA, Matijasevich A, Horta B, Anselmi L, AraÃºjo CL, et al. What are the causal effects of breastfeeding on IQ, obesity and blood pressure? Evidence from comparing highincome with middleincome cohorts. Int J Epidemiol. 2011;40(3):670â€“80.
Wills AK, Lawlor DA, Matthews FE, Aihie Sayer A, Bakra E, BenShlomo Y, et al. Life course trajectories of systolic blood pressure using longitudinal data from eight UK cohorts. PLoS Med. 2011;8(6):e1000440.
Jaddoe VWV, Felix JF, Andersen AN, Charles MA, Chatzi L, Corpeleijn E, et al. The LifeCycle projectEU child cohort network: a federated analysis infrastructure and harmonized data of more than 250,000 children and parents. Eur J Epidemiol. 2020;35(7):709â€“24.
Ronkainen J, Nedelec R, Atehortua A, Balkhiyarova Z, Zhanna A, Dang V, et al. LongITools: dynamic longitudinal exposome trajectories in cardiovascular and metabolic noncommunicable diseases. Environ Epidemiol. 2021;6(1):e184. https://doi.org/10.1097/EE9.0000000000000184.
Hughes RA, Tilling K, Lawlor DA. Combining longitudinal data from different cohorts to examine the lifecourse trajectory. Am J Epidemiol. 2021;190(12):26809.
Pinot de Moira A, Haakma S, StrandbergLarsen K, van Enckevort E, Kooijman M, Cadman T, et al. The EU Child Cohort Networkâ€™s core data: establishing a set of findable, accessible, interoperable and reusable (FAIR) variables. Eur J Epidemiol. 2021;36(5):565â€“80.
Nader JL, LÃ³pez M, Julvez J, Guxens M, Cadman T, Elhakeem A, et al. Cohort description: measures of earlylife behaviour and later psychopathology in the LifeCycle project  EU child cohort network. J Epidemiol. 2021. (Epub ahead of print). https://doi.org/10.2188/jea.JE20210241.
BaxterJones AD, Burrows M, Bachrach LK, Lloyd T, Petit M, Macdonald H, et al. International longitudinal pediatric reference standards for bone mineral content. Bone. 2010;46(1):208â€“16.
Acknowledgments
We thank the study participants from each cohort who took part in this study and the scientific and data collection teams from the ALSPAC, BMDCS and PBMAS cohorts. We are extremely grateful to all the families who took part in the ALSPAC study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. We acknowledge the BMDCS Principal Investigators: Vicente Gilsanz, MD, PhD, Heidi Kalkwarf, PhD, Joan Lappe, PhD, Sharon Oberfield, MD, John Shepherd, PhD, and the National Institute of Child Health and Human Development (NICHD) Data and Specimen Hub (DASH) for providing the BMDCS data that was used for this research.
Funding
This project has received funding from the European Unionâ€™s Horizon 2020 research and innovation programme under grant agreements No. 874739 (LongITools)Â andÂ No. 733206 (LifeCycle). AE, KMT, RAH and DAL work in a unit that is supported by the University of Bristol and UK Medical Research Council (MC_UU_00011/3 & MC_UU_00011/3). RAH is supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (Grant Number 215408/Z/19/Z). ASFK is supported by an Economic Social Research Council Postdoctoral Fellowship (Grant ref.: ES/V011650/1). The UK Medical Research Council and Wellcome (Grant Ref: 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. A comprehensive list of grant funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grantacknowledgements.pdf). PBMAS was supported in part by funding from the Canadian Institutes of Health Research, the Saskatchewan Health Research Foundation (SHRF), The Dairy Farmers of Canada and the University of Saskatchewan. This research reflects only the authorsâ€™ view, and the European Commission is not responsible for any use that may be made of the information it contains. The funders had no role in the design or conduct of the study, nor management, analysis, and interpretation of the data, or the decision to submit the manuscript for publication.
Author information
Authors and Affiliations
Contributions
AE developed the idea for the paper with initial input from DAL and with further development from RAH, KMT, DLC, SAJ, TJC ASFK, ZL, SAF, ADGB, BSZ. AE conducted the analysis and wrote the first draft of the manuscript with initial input from DAL. RAH, KMT, TJC, and ZL provided advice on the description and interpretation of statistical methods. AE, RAH, KMT, DLC, SAJ, TJC ASFK, ZL, SAF, ADGB, BSZ, and DAL contributed to the development of the draft to its final version. ZL developed an R package to help complete this work (Package â€˜splutiâ€™ â€“ Utility Functions for PostProcessing Univariate Interpolation Splines, Smoothing Splines and Regression Splines: https://github.com/ZheyuanLi/spluti). The author(s) read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
All cohorts had ethical approval from their relevant local or national ethics committees. The Avon Longitudinal Study of Parents and Children (ALSPAC) obtained ethics approval from the ALSPAC Law and Ethics committee and the local National Health Service Research Ethics Committee. The Bone Mineral Density in Childhood Study (BMDCS) study was approved by the Institutional Review Board of each clinical center: Children Hospital of Los Angeles (Los Angeles, CA), Cincinnati Children Hospital Medical Center (Cincinnati, OH), Creighton University (Omaha, NE), Children Hospital of Philadelphia (CHOP) (Philadelphia, PA), and Columbia University (New York, NY). The Pediatric Bone Mineral Accrual Study (PBMAS) study was approved by the University of Saskatchewanâ€™s biomedical review committee.
All study participants gave written informed consent to participate in the respective cohorts and secondary data analyses.
Consent for publication
All study participants gave written informed consent to participate in theÂ respective cohorts and secondary data analyses.
Competing interests
DAL reported grants from national and international government and charity funders, Roche Diagnostics, and Medtronic Ltd. for work unrelated to this publication. KMT reported grants from national and international government and charity funders, for work unrelated to this publication. No other disclosures were reported.
Additional information
Publisherâ€™s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
Cohort characteristics and participant numbers, and age and BMC at each visit.
Additional file 2.
BIC and fit statistics for linear spline LME models, natural cubic spline LME models, and SITAR models with 2 to 6 knots in the fixed effects spline curve.
Additional file 3.
Fit statistics and predicted mean BMC latent trajectories for growth mixture models with 1 to 5 latent classes.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Elhakeem, A., Hughes, R.A., Tilling, K. et al. Using linear and natural cubic splines, SITAR, and latent trajectory models to characterise nonlinear longitudinal growth trajectories in cohort studies. BMC Med Res Methodol 22, 68 (2022). https://doi.org/10.1186/s12874022015428
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874022015428
Keywords
 ALSPAC
 Bone mineral content
 BMDCS
 Growth models
 Lifecourse
 Mixedeffects
 PBMAS
 Tutorial