 Software
 Open access
 Published:
nlive: an R package to facilitate the application of the sigmoidal and random changepoint mixed models
BMC Medical Research Methodology volume 23, Article number: 257 (2023)
Abstract
Background
The use of mixed effect models with a specific functional form such as the Sigmoidal Mixed Model and the Piecewise Mixed Model (or Changepoint Mixed Model) with abrupt or smooth random change allows the interpretation of the defined parameters to understand longitudinal trajectories. Currently, there are no interface R packages that can easily fit the Sigmoidal Mixed Model allowing the inclusion of covariates or incorporating recent developments to fit the Piecewise Mixed Model with random change.
Results
To facilitate the modeling of the Sigmoidal Mixed Model, and Piecewise Mixed Model with abrupt or smooth random change, we have created an R package called nlive. All needed pieces such as functions, covariance matrices, and initials generation were programmed. The package was implemented with recent developments such as the polynomial smooth transition of the piecewise mixed model with improved properties over BaconWatts, and the stochastic approximation expectationmaximization (SAEM) for efficient estimation. It was designed to help interpretation of the output by providing features such as annotated output, warnings, and graphs. Functionality, including time and convergence, was tested using simulations. We provided a data example to illustrate the package use and output features and interpretation. The package implemented in the R software is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.Rproject.org/package=nlive.
Conclusions
The nlive package for R fits the Sigmoidal Mixed Model and the Piecewise Mixed: abrupt and smooth. The nlive allows fitting these models with only five mandatory arguments that are intuitive enough to the less sophisticated users.
Background
Continuous longitudinal data may have a trajectory that is not linear. This is the case of cognitive aging and other processes in fields such as agriculture [1], pharmacology [2], and marketing [3]. Although some less parsimonious models have been proposed to model such longitudinal data, the use of models with a specific functional form such as the Sigmoidal Mixed Model (SMM) [4] and the Piecewise Mixed Model (PMM) [5] with abrupt or smooth change allow the interpretation of the defined parameters.
The SMM is currently implemented in SAS using PROC NLMIXED [4], which maximizes the marginal likelihood by using an adaptive Gaussian quadrature [6] or other approximation methods, such as the firstorder method [7]. In R, most of these packages focus on doseresponse optimization and curvefitting [8] such as qpcR [9], grofit [10], FlexParamCurve [11], drfit [12], and MCPMod [13] or aim to automate fitting and classify multiple curves [8]. However, none of these packages can fit the SMM allowing the inclusion of covariates for all 4 parameters. The PMM is commonly fitted using Bayesian inference and implemented in OpenBugs or WinBUGS, but is also commonly fit in R using the lme4 [14], which maximizes the marginal likelihood by using a Laplace approximation. A recently developed Stochastic Approximation Expectation Maximization (SAEM) algorithm was shown to be more successful [15] and faster [16] to identify the maximum likelihood estimators of nonlinear mixed models. This can be implemented directly using the package saemix [17] (version 3.0). However, one downside of having such flexible packages as the lme4 and saemix is that they require more analytical skills to code. It is worth noting that some simpletouse packages in R can fit the abrupt PMM, including segmented [18] and rcpm [19]. However, these packages also do not use SAEM and with them, it is not possible to (i) include covariates for all 4 parameters, (ii) consider a smooth polynomial transition, and/or (iii) estimate directly the last level (e.g. level close to death).
In this work, we present the the version 0.2.0 of the nlive package implemented within R software. The main objective of the package is to facilitate and broaden the application and interpretation of the SMM and PMM for longitudinal data. All needed elements to fit the models have been programmed, including the computation of the structural model and the automatic generation of initials for the main parameters. As such, less experienced R users only need to specify the model to fit via a single intuitive line of code, with only five mandatory arguments. The package was implemented with the most recent and efficient algorithms for nonlinear models. Implementation was also performed with the most interpretable parameterization and was based on the most recent developments in each type of model. For example, for the smooth PMM, instead of using the BaconWatts [20] which can create an artificial increase in the trajectory right after the changepoint [21], we considered the most recently developed polynomial smooth transition [22]. In this article, we reintroduce these models, describe the implementation of the package, and provide a simulation study to demonstrate the performance of the package. We also demonstrate the use of the model and interpretation of the output using a madeup dataset with trajectories similar to those found in the cognitive aging of individuals followed until death [23].
Model specifications
As a prelude to the introduction and demonstration of the new nlive package, we first describe the general formulation of the nonlinear mixed models implemented in the package. The simplified general form of nonlinear mixed models can be written in terms of a known nonlinear function f given by:
where \(y_{ij}\) denotes the longitudinal outcome value of subject i \((i=1, ..., N)\) collected at the observation time \(t_{ij}\) \((j=1, ..., n_i)\); \(\psi _i\) is a vector of normally distributed personspecific parameters function of fixed effects and individual random effects; and \(\epsilon _{ij}\) are random error, with \(\epsilon _{ij} \sim N(0, \sigma _{\epsilon }^2)\).
Motivated by the application on latelife cognitive decline, the nlive package implements two main classes of nonlinear mixed models: the Sigmoidal Mixed Model (SMM) [4, 24] with four parameters and the Piecewise Linear Mixed Model (PMM) [5] with two linear phases and a single changepoint. In the following subsections, we provide a brief introduction to these models. For simplicity, some annotations can be similar from one model to another, while the interpretation of the parameters remains specific to each of them.
The Sigmoidal Mixed Model
The SMM introduced by Capuano and colleagues [4] is based on the fourparameter logistic that allows the inclusion of covariates related to four parametric quantities. The nonlinear trajectory of the outcome Y can be formulated as follows:
where the first parameter, \(\psi _{1i}\), represents the personspecific initial level of the outcome before the onset of decline. The second parameter, \(\psi _{2i}\), represents the personspecific level of the outcome at a time equal to zero (e.g., death), or the intercept. We will call it the last level although the meaning of time may differ depending on the application. \(\psi _{3}\) represents the marginal time when half of the total decline occurred. We will call it the midpoint. \(\psi _{4}\) represents the marginal Hill slope and will define the nonlinear pattern of the trajectory (e.g. determining the steepness, earlier versus later acceleration of change). These two latter parameters are kept as marginal for convergence purposes [4]. The four main parameters are assumed to obey the following equations:
where \(\alpha _1\), \(\alpha _2\), \(\alpha _3\), and \(\alpha _4\) are the mean values for the last level, initial level, midpoint, and Hill slope, respectively; \(X_{1i}\), \(X_{2i}\), \(X_{3i}\), and \(X_{4i}\) are vectors of covariates associated with the vector of fixed effects \(\beta _1\), \(\beta _2\), \(\beta _3\), and \(\beta _4\), respectively; and \(\eta _{1i}\) and \(\eta _{2i}\) are random effects with \((\eta _{1i}, \eta _{2i})^\top \sim MVN(0, B)\) and B assuming correlations between \(\eta _{1i}\) and \(\eta _{2i}\).
The piecewise linear mixed model with a random changepoint
The PMM model [5] assumes that the stochastic process of the longitudinal outcome is characterized by two or more different phases. Under this class of models, the nlive package implements two PMM models with an abrupt change (PMMabrupt) [25] and a smooth polynomial transition (PMMsmooth) [22] between the two linear phases. These models provide an appealing statistical approach to detecting the time when the onset of accelerated decline occurs.
PMM with abrupt change
The PMMabrupt model (also known as the linearlinear or the brokenstick mixed model), consists of an intercept at time zero, a slope close to the intercept, a change point at which the slope changes, and a slope after this change point. The nonlinear trajectory of the outcome Y can be formulated as follows:
where the first parameter, \(\psi _{1i}\), represents the personspecific level of the outcome at time zero, or the intercept; \(\psi _{2i}\) represents the personspecific slope before the changepoint; \(\psi _{3i}\) represents the personspecific slope after the changepoint; and \(\psi _{4i}\) represents the personspecific changepoint time parameter.
Assuming an alignment at death for example (for interpretation purposes), the parameters \(\psi _{1i}\) to \(\psi _{4i}\) are supposed to obey the following equations:
where \(\alpha _1\), \(\alpha _2\), \(\alpha _3\), and \(\alpha _4\) are the mean values for the last level, the slope before the change point, the slope after the changepoint, and the changepoint time, respectively; \(X_{1i}\), \(X_{2i}\), \(X_{3i}\), and \(X_{4i}\) are vectors of covariates associated with the vector of fixed effects \(\beta _1\), \(\beta _2\), \(\beta _3\), and \(\beta _4\), respectively; and \(\eta _{1i}\) to \(\eta _{4i}\) are random effects with \((\eta _{1i}, \eta _{2i}, \eta _{3i}, \eta _{4i})^\top \sim MVN(0, B)\) and B assuming correlations only between \(\eta _{2i}\) and \(\eta _{3i}\).
PMM with smooth polynomial transition
The PMMsmooth model is an extension of the PMMabrupt. The initial smooth PMM proposed by Bacon and Watts [20] includes a hyperbolic tangent transition. In this work, however, we consider a more recent development that considers a smooth polynomial transition introduced in Van den Hout, MunizTerrera, and Matthews [22]. In contrast to the PMMabrupt, the changepoint of the PMMsmooth represents the beginning of a smooth transition.
In PMMsmooth, the transition is modeled using a thirddegree polynomial function fitted between the two straight lines. In the original work [22, 26], the intercept parameter cannot be interpreted directly as it reflects the level parameter projection using the early slope at time zero. To allow direct interpretation of the intercept, we reformulated the PMMsmooth model as:
where \(\psi _{1i}\), \(\psi _{2i}\), and \(\psi _{3i}\) have been previously defined for Eq. (7). \({\psi }_{4i}\) is the personspecific time when the smooth transition phase of length v begins. v is a value representing the time interval where the polynomial curve occurs between \(t_{ij}\) = \({\psi }_{4i}\) and \(t_{ij}\) = \({\psi }_{4i}\) + v. To be closer to the PMMabrupt, the two linear parts should intersect at the middle of the transition phase and the constraint \(\lambda _i=\psi _{1i}+\psi _{2i}({\psi _{4i}}+\frac{v}{2})\psi _{3i} ({\psi }_{4i}+\frac{v}{2})\) is imposed. Note that v set to 0 reduces to a PMMabrupt model.
The smoothness of the transition function involves four linear equations with four parameters:
where \(g_{transition}\) is obtained by solving the system of four linear equations with four unknown parameters. The derivatives of \(g_{transition}\) at the times \(t_{ij}={\psi }_{4i}\) and \(t_{ij}={\psi }_{4i}+v\) are respectively \(\psi _{3i}\) and \(\psi _{2i}\).
Implementation
Software
To facilitate the application and interpretation of the SMM, PMMabrupt, and PMMsmooth models for a broader audience, who is not necessarily familiar with statistical programming, we developed a userfriendly R package called “nlive” (nonlinear mixed models with initial values estimated) with R version 4.0.3. All needed elements to fit the models, including the definition of the structural model and the generation of initials for the four main parameters, have been programmed so that the user only needs to specify a single intuitive line of code to fit the model. A variety of options can also be specified. Along with the functions to fit individual models, the package also provides a function that displays the longitudinal data. A madeup data frame, under the name dataCog, is provided with the package. This data represents cognitive patterns previously observed in cognitive aging when participants are followed until death. Real data on cognitive aging with annual followup until death from the Religious Order Study and the Memory and Aging Project (ROSMAP) from the Rush Alzheimer’s Disease Center can be obtained under request at https://www.radc.rush.edu. These cohorts are described elsewhere [23]. The version 0.2.0 of the nlive package is freely available via the Comprehensive R Archive Network (CRAN) at https://CRAN.Rproject.org/package=nlive.
Estimation
The SMM, PMMabrupt, and PMMsmooth previously described were all estimated using the saemix package (version 3.0) developed by Comets and colleagues [17]. The saemix package, among other things, requires the definition of the structural model; thus, first, nlive includes the structures of the models. For SMM, the nlive algorithm relied on the SSlogis5() function of the nlraa package [27] (version 1.2), which initially defines a 5parameter logistic curve but can be reduced to a 4parameter logistic when the \(5^{th}\) parameter is fixed to 1. For PMMabrupt and PMMsmooth, the structure of the models are explicitly coded in the nlive algorithm. In addition to the output information provided by saemix, the nlive package also provides pvalues for the main terms [28] that would not be available otherwise.
The SAEM algorithm
The computational technique for maximum likelihood estimation implemented in saemix is the Stochastic Approximation Expectation Maximization (SAEM) algorithm, which is a stochastic approximation version of the standard EM algorithm proposed by Khuhn and Lavielle [29]. The SAEM algorithm showed to be efficient in the context of nonlinear mixed models, converging quickly to the maximum likelihood estimators [16] and achieving better performance than linearizationbased algorithms [15]. In preliminary testing during the algorithm coding process, in line with the literature, saemix showed convergence to the adequate solution more often than two main competing software package [17]: nlme [6] and lme4 [14]. Note that in saemix, the likelihood can be computed by linearisation or by importance sampling. In the linearization approach, the likelihood of the Gaussian model is estimated from the nonlinear mixed effects model using the approximation proposed by Lindstrom and Bates [30]. In the importance sampling approach, the likelihood is obtained through a MonteCarlo stochastic integration and does not require a model approximation. More information on these methods is available in the saemix documentation [17].
Initial values
One of the great advantages of the nlive package is that it has an embedded algorithm that examines the data and automatically provides informative initials. Here we briefly describe this algorithm. For SMM, the four main parameters are the last level, first level, midpoint, and Hill slope. An initial for each parameter needs to be provided. For that, we build upon an algorithm previously developed in SAS by Capuano and colleagues [4] (algorithm hosted and accessible at github.com/AWCapuano/sigmoidal). In this work, we expand this algorithm for different data structures (e.g., different time scales). Briefly, first, the algorithm segments the time into the initial (5th percentile) and final (95th percentile). At these periods the initial and final mean levels of the outcome are ascertained. The time of half decline is set to 300 if the curve is nearly linear, and to 2 otherwise. Finally, the Hill slope is set to a high and low value based (0.5 and 1.05). Similarly, for PMMabrupt and PMMsmooth, estimation of the models requires the specification of four starting values related to the four main parameters: last level, changepoint time, slope before the changepoint, and slope after the changepoint. First, the algorithm obtains the final portion of the time (the 95th percentile of time). Then, for this period of time, the mean level of the outcome is ascertained. To inform the other three parameters (changepoint, preslope, postslope), standard linear mixed models are used. First, the time is segmented into quintiles starting from the initial time to the final time. Then the algorithm approximates where the acceleration of change (i.e., changepoint) occurs by estimating five separate linear mixed models based on the quintiles of time. The changepoint time is defined as the lower bound of the time interval where the fastest slope occurred. Lastly, the early and final slopes are informed by the slope of cognitive decline estimated using a linear mixed model considering the subsets of cognitive measures collected before and after the approximated changepoint, respectively. The linear mixed models are implemented using the hlme function from the lcmm [31] package (version 1.9.5) to fit mixed effect models on segments of the longitudinal data.
Overview of the package
The nlive package offers three estimation functions (nlive.smm, nlive.pmma, nlive.pmms) to fit the SMM, PMMabrupt, PMMsmooth models, respectively, relying on the SAEM algorithm implemented in saemix. These functions all require to take as input a dataset that provides information on the longitudinal outcome of interest, participant ID, time, and predictors (if any).
The call of nlive.smm is:
nlive.smm(dataset, ID, outcome, time, var.all=NULL, var.last.level=NULL,
var.first.level=NULL, var.midpoint=NULL, var.Hslope=NULL, traj.marg=NULL,
traj.marg.group=NULL, start=NULL)
The first fourth arguments are mandatory: dataset defines the name of the data frame in the longitudinal format; ID defines the name of the variable representing the grouping structure specified with " (e.g., "ID" representing the unique identifier of participants); outcome corresponds to the name of the timevarying variable representing the longitudinal outcome specified with " (e.g., "cognition"); time is the name of the variable representing the timescale specified with " (e.g., "time"), which can be negative or positive.
All other arguments are optional: var.all specifies a vector indicating the name of the variable(s) that the four main parameters of the model will be adjusted to (e.g. var.all=c("X1"))(NULL by default); var.last.level, var.first.level, var.midpoint, var.Hslope, each specifies a vector indicating the name of the variable(s) that the related specific parameter (e.g. midpoint) can be adjusted to (e.g. var.midpoint=c("X2"))(NULL by default); traj.marg indicates whether the marginal estimated trajectories should be plotted (TRUE) or not (FALSE by default); traj.marg.group provides the name of the grouping variable listed in one of the arguments var. to contrast the estimated marginal trajectories between two specific groups (NULL by default). If the variable is binary, the trajectories are contrasted between the two groups of interest. If the variable is continuous, the \(10^{th}\) and \(90^{th}\) percentile values will automatically be considered; start specifies a vector of length 4 to override the specification of the four initial values for the main parameters (initials obtained directly from the data by default).
The calls of nlive.pmma and nlive.pmms are:
nlive.pmma(dataset, ID, outcome, time, var.all=NULL, var.last.level=NULL,
var.slope1=NULL, var.slope2=NULL, var.changepoint=NULL, ...)
nlive.pmms(dataset, ID, outcome, time, var.all=NULL, var.last.level=NULL,
var.slope1=NULL, var.slope2=NULL, var.changepoint=NULL, ...)
All arguments needed in nlive.pmma and nlive.pmms are the same as those previously described for nlive.smm. However, because the parameters are different between the models, the PMMrelated functions will have different arguments to enter covariates to specific parameters. These arguments are var.last.level, var.slope1, var.slope2, and var.changepoint. Of note, nlive has a legacy function that can fit all models. This function is being kept for older users. Details are available in Supplementary Materials (Appendix 1). Information on all the functions available in nlive is also found in the package documentation at https://CRAN.Rproject.org/package=nlive.
Results
Performance
We performed a simulation study to evaluate the performance of fitting SMM and PMM using the SAEM algorithm and of assigning informative initials using the nlive algorithm. In the first step of the study, we challenged the SAEM algorithm by running two different scenarios: varying sample sizes (n of individuals of 100, 200, and 500), and varying number of covariates (zero, one, and two covariates per parameter). In the second step we compared the gain of using the nlive algorithm to assign initials by comparing it with the use of naïve initials, that is using zero to all initials. The simulated data mimicked the longitudinal cognitive trajectoriems observed in deceased ROSMAP participants. These cohorts were chosen due to the large number of participants followed annually until death. The cohorts are described in detail elsewhere [17]. The time intervals for each visit of each individual were generated using a uniform distribution of [2, 2] months. The data generation started randomly from 24 years before death (time=24) to death (time=0). This allowed the dataset to be more realistic where each individual has a different time from baseline to death, and there is an average of 10 years of followup (SD=5). As an inclusion criterion, individuals need at least 4 cognitive observations to enter the model. Each scenario was tested using 100 replications.
In the first step, for each model, convergence was successfully reached in the second phase of the SAEM algorithm in all replications for all models under all scenarios. The run time increased with the sample size and the number of covariates and was longer for SMM (Fig. 1). However, in the most complex scenarios of fitting SMM with two covariates on a sample of 500 individuals, the run time was still less than 6 minutes. We assessed the estimation accuracy of the marginal mean cognitive values before death, year by year, using the empirical Mean Squared Errors (\(MSE{(t)}=\frac{1}{100} \sum _{r=1}^{100}(Y{(t)}  \hat{Y}_{r}{(t)})^{2}\)), where Y(t) and \(\hat{Y}_{r}{(t)}\) represent the underlying true level and the estimated level of cognition, respectively, at year t (t=24,...,0) before death for r replicates (r=1,...,100). All the models provided estimates with low bias (MSE ranged from 0.02 to 0.07).
In the second step, simulations using naïve initials started with the intermediate challenging scenario of one covariate and 500 individuals. Convergence was not a problem with all models successfully reaching a solution in the second phase of the SAEM algorithm. The naïve initials also did not significantly increase the run time (on average, the gain was \(<5\) seconds for SMMs and \(< 1\) second for PMMs). However, there was a relevant decrease in the quality of the estimates. Using naïve initials, there was up to \(30 \%\) increase in the percent bias in the average marginal estimated cognitive trajectories, with up to \(42 \%\) increase in the mean square errors. This was enough empirical evidence to convince us to use informative starting values rather than naive ones. Of note, all the models were fitted on a HP ProBook 400 G6 containing an i78565U processor and 16 gigabytes of RAM running R version 4.0.3. Together, these profiling results support that the application of the SAEM algorithm to fit SMM, PMMabrupt, and PMMsmooth is efficient.
Example
In this section, we show how nlive can be used to fit the SMM, PMMabrupt, and PMMsmooth models, and we present the main outputs provided by the package. In the context of our motivating application, latelife cognitive decline, each model was fitted using the madeup illustrative sample dataCog available in the package. Thus, the first step consists of loading nlive, which will automatically load dataCog.
R > library(nlive)
The dataCog dataset contains 1200 individuals with annual cognitive testing for at least 4 years until death (mean followup=7 [SD=5] years); a description of the data can be accessed via the command summary(dataCog). On each line, we can read the unique participant identifier (ID), the negative retrospective time before death in years (time), the repeated values of the composite score of global cognition collected over time (cognition), and the age at death of individuals in years; in the natural scale (ageDeath) and centered at its mean (ageDeath90) for interpretation purposes. The following lines create the continuous ageDeath90 variable and display the first lines of dataCog:
R > dataCog$ageDeath90 < dataCog$ageDeath  90
R > head(dataCog)
. ID time cognition ageDeath ageDeath90
1 1000 10.00 0.45 91 1
2 1000 9.08 0.27 91 1
3 1000 8.04 0.19 91 1
4 1000 6.82 0.15 91 1
5 1000 5.99 0.05 91 1
6 1000 4.98 0.15 91 1
Before fitting a model, the user can inspect the longitudinal outcome of interest (or other longitudinal variables) using the function nlive.inspect(). Rstudio is recommended for this function. Below is an example of the use of nlive.inspect() for the variable cognition:
R > nlive.inspect(dataset="dataCog", ID="ID", variable="cognition", time="time")
nlive.inspect() generates key plots, including the distribution of the longitudinal outcome, a spaghetti plot of the observed individual cognitive trajectories before death for 70 individuals randomly selected in dataCog (see Fig. 2A), and boxplots of the longitudinal observed measures, obtained every year before death, for the whole population (see Fig. 2B). Those plots can be customized as needed using the R code provided in Appendix 2. All plots are produced with the ggplot2 package [32] and allow to better appreciate the variability of the trajectories within and between individuals over time. The options plot.xlabel and plot.ylabel allows the user to specify a character string to define axes x and y, respectively (e.g. plot.xlabel=c("Years before death")). For further graphical adjustements, users can access the ggplot scripts of the plots generated in nlive.inspect using nlive.inspect in an R console.
Modeling the SMM
For demonstration purposes, we fit a relatively simple SMM model with all main parameters adjusted for ageDeath90. The user only needs to specify the name of the data frame and the columns containing the participant ID, the response, the timescale, and the predictor. We also include arguments to plot the marginal estimated trajectories before death.
R > smm.fit < nlive.smm(dataset=dataCog, ID="ID",
+ outcome="cognition",
+ time="time",
+ var.all=c("ageDeath90"),
+ traj.marg=TRUE,
+ traj.marg.group=c("ageDeath90"))
In the main output, nlive.smm() provides the general output from saemix, which include a summary of the data, the specification of the model (main parameters, covariates, correlation matrix of random effects, initial values), key algorithm options used, and several numerical results (parameter estimated, likelihood) [17]. In nlive, this output is augmented by providing the processing time of the program and pvalues for the main terms. Below, we focus on the numerical results, but the entire output is displayed in Appendix 3.
The fitted SMM model indicates that higher age at death was associated with lower cognitive level at baseline (see term beta_ageDeath90(first.level)) and close to death (see term beta_ageDeath90(last.level)). In addition, higher age at death was associated with an earlier half of cognitive decline (see term beta_ageDeath90(midpoint)). However, age at death was not associated with the Hill slope(see term beta_ageDeath90(hill.slope)).
To facilitate the interpretation of the estimated parameters, it is convenient to visualize the estimated average trajectories over time. In nlive.smm(), users can easily plot two types of marginal estimated trajectories. First, by setting up the argument traj.marg=T, the function can provide a graph of the estimated marginal trajectory of global cognition before death in the whole study sample, for the most common profile of covariates (see Fig. 3). In this example, this would represent the most common average age at death (i.e., 90 years). Second, by specifying traj.marg.group=c("ageDeath90"), the function can provide a plot of estimated marginal trajectories of global cognition contrasted between two groups corresponding to participants in the \(10^{th}\) versus \(90^{th}\) percentile of the ageDeath90 distribution, for the most common profile of covariates (see Fig. 4). Users can manually specify the percentile values using the traj.marg.group.val option. For example, traj.marg.group.val=c(0.25,0.75) will plot trajectories for the \(25^{th}\) and \(75^{th}\) percentiles, respectively.
Modeling the abrupt PMM
For PMMabrupt, the call of the function nlive.pmma() is:
R > pmm.abrupt.fit < nlive.pmma(dataset=dataCog, ID="ID",
+ outcome="cognition",
+ time="time",
+ var.all=c("ageDeath90"),
+ traj.marg=TRUE,
+ traj.marg.group=c("ageDeath90"))
The general summary output is:
In this example, for the PMMabrupt model, we found that each additional year of age at death was associated with worse mean cognitive level close to death (see term beta_ageDeath90(last.level)). In addition, each increment in the age at death was related to an earlier onset of accelerated decline (see term beta_ageDeath90(changepoint)). However, age at death was not related to the preterminal decline (see term beta_ageDeath90(slope1)) or terminal decline (see term beta_ageDeath90(slope2)). The marginal estimated trajectories in the whole study sample and in the \(10^{th}\) versus \(90^{th}\) percentiles of the age at death distribution are displayed in Fig. 4.
Modeling the smooth PMM
For PMMsmooth, the user can call the function nlive.pmms():
R > pmm.smooth.fit < nlive.pmms(dataset=dataCog, ID="ID",
+ outcome="cognition",
+ time="time",
+ var.all=c("ageDeath90"),
+ traj.marg=TRUE,
+ traj.marg.group=c("ageDeath90"))
The general summary output is:
In this example, as expected, findings are generally similar to those obtained for the PMMabrupt model. The main difference is that the estimated changepoint parameter represents here the beginning of the transition period. Marginal estimated trajectories in the whole study sample and according to age at death are displayed in Fig. 5, respectively.
Other features
Since the estimation functions from nlive fit the models based on the saemix package, users can take advantage of many generic functions from saemix. Table 1 displays a brief description of the main functions available in saemix. Consider the illustrative example of SMM. In this example, nlive will generate the SaemixObject smm.fit. With that, the user can extract the subjectspecific prediction by calling psi(smm.fit, type="mean"). The subjectspecific random effects can be extracted by calling eta(smm.fit, type="mean"). Convergence plots (Fig. 6) can also be obtained by calling saemix.plot.convergence(smm.fit) or plot(smm.fit, plot.type="convergence"). Please refer to the saemix documentation for other available functions.
Concluding remarks
In this work, we introduce the version 0.2.0 of the newly developed R package nlive to fit three nonlinear mixed models for Gaussian longitudinal data: the sigmoidal mixed model (SMM) and two piecewise linear mixed models with a random changepoint (PMMabrupt and PMMsmooth). The SMM includes 4 parameters, which allow for the estimation of early level, half of the decline, Hill slope (the steepness of the curve), and final level of the longitudinal outcome of interest. The two PMM separate the trajectory into two linear phases and allow for estimation of the early slope, changepoint, final slope, and final level. These models were chosen for the implementation as they currently cannot be easily implemented in R and are of importance, especially in aging research. All needed pieces such as functions, covariance matrices, and initials generation were programmed. The nlive() function allows fitting these models with one line of code that is intuitive enough to the less sophisticated users. The yielding product has only five mandatory arguments. Options are available to readily accommodate user preferences, including manual specification of starting values or diagnostic plots. It was also designed to help interpretation of the output by providing features such as annotated output, warnings (e.g. small sample, number of covariates), and graphs.
This package is the first to provide a seamless user interface to fit the Sigmoidal Mixed Effect Model. Some packages in R can fit the Sigmoid curve but not the mixed effect model. As for PMM, although this is not the first package to provide an interface to fit the model, the nlive package includes the more recent developments in the model structure and the likelihood maximization algorithm. The smooth PMM implemented is based on the polynomial transition that was demonstrated to have improved properties over the BaconWatts. PMM models were also reparameterized which allows the interpretation based on the estimated value at time zero and not a projection to zero from the first and more distant slope. Here we build upon recently developed tools in R such as the saemix package that utilizes the Stochastic Approximation EMbased algorithm, shown in several tests to have a better convergence rate than the Maximum Likelihood. All models were fitted with the same algorithm. Extensive testing of basic functionality was already performed for saemix development. In this interface, however, convergence adequacy was tested given the particular complexity of these models. Overall the convergence rate was high, the time was reasonable, and the bias was low.
The motivation of this package was aging research including biomarkers of the Alzheimer’s pathological cascade (a.k.a. Jack curves) [33], natural history of cognition [4, 34, 35], retesting effect [36, 37], and terminal decline [38, 39]. These models, however, are nonspecific and the nlive can be used in a wide variety of fields. Many processes were demonstrated to follow a sigmoid trajectory over time (a.k.a. 3 to 5 parameters logistic, Hill, Langmuir, LangmuirHill, and HillLangmuir equation). Such processes are found in agriculture [1], pharmacology [2] and marketing [3], to cite a few. Similarly, many processes that are initially linear may have an unknown change that may modify the trajectory. Such processes are found in a wide variety of fields from environmental sciences [40] to engineering [41].
In conclusion, we hope that this very userfriendly package will encourage the adoption of more sophisticated models for longitudinal data by the R community, with varying degrees of experience. Although illustrated in the context of cognitive aging, the package can be used in a wide variety of applications.
Availability and requirements
Project name: nlive R package
Project home page: https://cran.rproject.org/web/packages/nlive/index.html
Operatimg system: Platform independent
Programming language: R
Other requirements: No
License: MIT
Any restrictions to use by nonacademics: No
Availability of data and materials
The R package nlive can be installed from CRAN directly using in an R console install.packages("nlive"). Archived versions are available from the CRAN at https://CRAN.Rproject.org/package=nlive. The most recent update of the nlive package can be installed from Github by running the command remotes::install_github("MaudeWagner/nlive"). The madeup illustrative dataset analyzed during the current study is bundled with the R package nlive, and can be accessed by running the command data(dataCog, package = "nlive"). The R code to replicate the simulation study can be provided by the corresponding author upon reasonable request.
Abbreviations
 CRAN:

The comprehensive R archive newtwork
 PMM:

Piecewise mixed model
 SAEM:

Stochastic approximation expectation maximization
 SMM:

Sigmoidal mixed model
References
Zub H, Rambaud C, Bethencourt L, BrancourtHulmel M. Late emergence and rapid growth maximize the plant development of Miscanthus clones. Psychol Aging. 2012;5(4):841–54.
Gesztelyi R, Zsuga J, KemenyBeke A, Varga B, Juhasz B, Tosaki A. Hill equation and the origin of quantitative pharmacology. Arch Hist Exact Sci. 2012;66(4):427–38.
Gesztelyi R, Zsuga J, KemenyBeke A, Varga B, Juhasz B, Tosaki A. Strategic Marketing for High Technology Products: An Integrated Approach. Routledge; 2018.
Capuano A, Wilson R, Leurgans S, Dawson J, Bennett D, Hedeker D. Sigmoidal mixed models for longitudinal data. Stat Methods Med Res. 2018;27(3):863–75. https://doi.org/10.1177/0962280216645632.
Hall C, Lipton R, Sliwinski M, Stewart W. A change point model for estimating the onset of cognitive decline in preclinical Alzheimer’s disease. Stat Med. 2000;19(11–12):1555–66.
Pinheiro J, Bates D. Approximations to the loglikelihood function in the nonlinear mixedeffects model. J Comput Graph Stat. 1995;4(1):12–35.
Beal S, Sheiner L. Heteroscedastic nonlinear regression. Technometrics. 1988;30(3):327–38.
Caglar M, Teufel A, Wilke C. Sicegar: R package for sigmoidal and doublesigmoidal curve fitting. PeerJ. 2018;6:e4251. https://doi.org/10.7717/peerj.4251.
Ritz C, Spiess A. qpcR: an R package for sigmoidal model selection in quantitative realtime polymerase chain reaction analysis. Bioinforma (Oxford, England). 2008;24(13):1549–51.
Kahm M, Hasenbrink G, LichtenbergFrate H, Ludwig J, Kschischo M. Grofit: Fitting biological growth curves. Nat Precedings. 2010;ID:Kahm2010. https://doi.org/10.1038/npre.2010.4508.1.
Oswald S, Nisbet I, Chiaradia A, Arnold J. FlexParamCurve: R package for flexible fitting of nonlinear parametric curves. Methods Ecol Evol. 2012;3(6):1073–7.
Ranke J. Fitting doseresponse curves from bioassays and toxicity testing. R News. 2006;3:7–12.
Bornkamp B, Pinheiro J, Bretz F. MCPMod: An R Package for the Design and Analysis of DoseFinding Studies. J Stat Softw. 2009;29(7):1. https://doi.org/10.18637/jss.v029.i07.
Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixedeffects models using lme4. arXiv preprint arXiv:1406.5823. 2014.
Plan E, Maloney A, Mentré F, Karlsson M, Bertrand J. Performance comparison of various maximum likelihood nonlinear mixedeffects estimation methods for dose–response models. AAPS J. 2012;14(3):420–32.
Delyon B, Lavielle M, Moulines E. Convergence of a stochastic approximation version of the EM algorithm. Ann Stat. 1999;27(1):94–128.
Comets E, Lavenu A, Lavielle M. Parameter Estimation in Nonlinear Mixed Effect Models Using saemix, an R Implementation of the SAEM Algorithm. J Stat Softw. 2017;80. https://doi.org/10.18637/jss.v080.i03.
Muggeo V. Segmented: an R package to fit regression models with brokenline relationships. R News. 2008;8(1):20–5.
Segalas C, Amieva H, JacqminGadda H. A hypothesis testing procedure for random changepoint mixed models. Stat Med. 2019;38(20):3791–803.
Bacon D, Watts D. Estimating the transition between two intersecting straight lines. Biometrika. 1971;58(3):525–34. https://doi.org/10.1093/biomet/58.3.525.
Richardson S, Lawson R, Davis D, Stephan B, Robinson L, Matthews F, et al. Hospitalisation without delirium is not associated with cognitive decline in a populationbased sample of older people—results from a nested, longitudinal cohort study. Age Ageing. 2021;50(5):1675–81.
den Hout AV, MunizTerrera G, Matthews F. Smooth random change point models. Stat Med. 2011;30(6):599–610.
Bennett D, Buchman A, Boyle P, Barnes L, Wilson R, Schneider J. Religious Orders Study and Rush Memory and Aging Project. J Alzheimers Dis. 2018;64(s1):S161–89.
Gottschalk PG, Dunn JR. The fiveparameter logistic: a characterization and comparison with the fourparameter logistic. Anal Biochem. 2005;343(1):54–65. https://doi.org/10.1016/j.ab.2005.04.035.
Hinkley D. Inference about the intersection in twophase regression. Biometrika. 1969;56(3):495–504. https://doi.org/10.1093/biomet/56.3.495.
Hout AVD, MunizTerrera G, Matthews F. Change point models for cognitive tests using semiparametric maximum likelihood. Comput Stat Data Anal. 2013;57(1):684–98.
Miguez F, Pinheiro J. Package ‘nlraa’: Nonlinear Regression for Agricultural Applications. 2021. https://CRAN.Rproject.org/package=nlraa.
Liquet B, Commenges D. Correction of the Pvalue after multiple coding of an explanatory variable in logistic regression. Stat Med. 2001;20(19):2815–26.
Kuhn E, Lavielle M. Maximum likelihood estimation in nonlinear mixed effects models. Comput Stat Data Anal. 2005;49(4):1020–38.
Lindstrom M, Bates D. Nonlinear mixed effects models for repeated measures data. Biometrics. 2000;46(3):673–87.
ProustLima C, Philipps V, Liquet B. Estimation of extended mixed models using latent classes and latent processes: the R package lcmm. arXiv preprint arXiv:1503.00890. 2015.
Wickham H. ggplot2: elegant graphics for data analysis. Springer; 2016.
Jr CJ, Knopman D, Jagust W, Shaw L, Aisen P, Weiner M, et al. Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade. Lancet Neurol. 2010;9:119–28.
Buchman A, Capuano A, VanderHorst V, Wilson R, Oveisgharan S, Schneider J, et al. Brain betaAmyloid Links the Association of Change in Body Mass Index With Cognitive Decline in CommunityDwelling Older Adults. J Gerontol A. 2023;78(2):277–85.
Wilson R, Capuano A, Bennett D, Schneider J, Boyle P. Temporal course of neurodegenerative effects on cognition in old age. Neuropsychology. 2016;30(5):591–9.
Wilson R, Capuano A, Sytsma J, Bennett D, Barnes L. Cognitive aging in older Black and White persons. Psychol Aging. 2015;30(2):279–85.
Wilson R, Capuano A, Marquez D, Amofa P, Barnes L, Bennett D. Change in Cognitive Abilities in Older Latinos. J Int Neuropsychol Soc. 2016;22(1):58–65.
Gerstorf D, Ram N, Mayraz G, Hidajat M, Lindenberger U, Wagner G, et al. Latelife decline in wellbeing across adulthood in Germany, the United Kingdom, and the United States: Something is seriously wrong at the end of life. Psychol Aging. 2010;25(2):477–85.
Terrera G, Minett T, Crayne C, Matthews F. Education associated with a delayed onset of terminal decline. Age Ageing. 2014;43(1):26–31.
Banesh D, Petersen M, Wendelberger J, Ahrens J, Hamann B. Comparison of piecewise linear change point detection with traditional analytical methods for ocean and climate data. Environ Earth Sci. 2019;78(21):1–16.
Ozekici S. Reliability and maintenance of complex systems. Springer Science & Business Media; 2013.
Acknowledgements
The authors thank Dr. Emmanuelle Comets (Université Paris Cité, Université Sorbonne Paris Nord, Inserm, Université Rennes) for kindly reviewing and commenting on several versions of this manuscript and as a collaborator on the programming of the package. The authors also thank Dr. Lisa Barnes (Rush Alzheimer’s Disease Center [RADC]) for reviewing examples during the development of the package using the African American Clinical Core (AA Core) (data not shown), as well as the staff of RADC and the participants of the Minority Aging Research Study (MARS), Religious Orders Study (ROS), and Rush Memory and Aging Project (MAP). Finally, we thank the reviewers for taking the time to go over this article and package and provide important suggestions.
Funding
The study was supported by NIA grants Grants (R01AG 022018, P30AG072975, R01AG17917) and the Illinois Department of Public Health. Dr. Maude Wagner is supported by a postdoctoral fellowship from the French Foundation for Alzheimer’s Research (alzheimerrecherche.org). The funding organizations had no role in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; or the writing of the report or the decision to submit it for publication.
Author information
Authors and Affiliations
Contributions
AWC was responsible for the study conception and design. AWC and MW were involved in the analysis and interpretation of the data, drafting of the manuscript and critically revising the manuscript. MW performed the simulations. All authors take responsibility for the integrity of the data and the accuracy of the data analysis. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
Supplementary materials.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Capuano, A.W., Wagner, M. nlive: an R package to facilitate the application of the sigmoidal and random changepoint mixed models. BMC Med Res Methodol 23, 257 (2023). https://doi.org/10.1186/s12874023020754
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874023020754