Skip to main content

Joint modelling of longitudinal lipids and time to coronary heart disease in the Jackson Heart Study

Abstract

Background

Multiple longitudinal responses together with time-to-event outcome are common in biomedical studies. There are several instances where the longitudinal responses are correlated with each other and at the same time each longitudinal response is associated with the survival outcome. The main purpose of this study is to present and explore a joint modeling approach for multiple correlated longitudinal responses and a survival outcome. The method will be illustrated using the Jackson Heart Study (JHS), which is one of the largest cardiovascular studies among African Americans.

Methods

Four longitudinal responses, i.e., total cholesterol (TC), high density lipoprotein (HDL) cholesterol, triglyceride (TG) and inflammation measured by high-sensitivity C-reactive protein (hsCRP); and time-to-coronary heart disease (CHD) were considered from the JHS. The repeated lipid and hsCRP measurements from a given subject overtime are likely correlated with each other and could influence the subject’s risk for CHD. A joint modeling framework is considered. To deal with the high dimensionality due to the multiple longitudinal profiles, we use a pairwise bivariate model fitting approach that was developed in the context of multivariate Gaussian random effects models. The method is further explored through simulations.

Results

The proposed model performed well in terms of bias and relative efficiency. The JHS data analysis showed that lipid and hsCRP trajectories could exhibit interdependence in their joint evolution and have impact on CHD risk.

Conclusions

We applied a unified and flexible joint modeling approach to analyze multiple correlated longitudinal responses and survival outcome. The method accounts for the correlation among the longitudinal responses as well as the association between each longitudinal response and the survival outcome at once. This helps to explore how the combination of multiple longitudinal trajectories could be related to the survival process.

Peer Review reports

Background

Longitudinal data together with time-to-event measurements are common in biomedical studies. In cardiovascular researches, for example, as we focus in this paper, lipid levels, i.e., total cholesterol (TC), low density lipoprotein (LDL) cholesterol, high density lipoprotein (HDL) cholesterol and triglyceride (TG) could be measured repeatedly over time along with time-to-event data such as time to coronary heart disease (CHD). Abnormalities in LDL cholesterol, HDL cholesterol, TC and TG levels have all been linked to increased risk for CHD. Furthermore, inflammation measured by high-sensitivity C-reactive protein (hsCRP) correlates with CHD status. As people age, body mass and composition as well as physical activity levels and diet tend to change which are often associated with changes in lipid levels.

There is an extensive literature available on separate analyses of longitudinal measurement data and time-to-event data including the popular linear mixed effects model for the former [1, 2], and parametric (Weibull model or exponential model) or semiparametric (Cox) proportional hazards model for the latter [3]. However, separate analyses could be inappropriate in some applications as health status (for example, risk to CHD) is likely correlated with longitudinal lipid level trajectory. Furthermore, separate analysis may not adequately address the underlying research questions. Wang and Taylor [4] include the longitudinal marker as a time-dependent covariate in the (proportional hazards) survival model. It has been shown, however, that including longitudinal measurements as time-varying covariate may be inappropriate as the measurements could be subject to measurement error [5]. Several approaches of joint modeling of a longitudinal continuous response and a time-to-event outcome have been available. Tsiatis and Davidian [6] proposed a two-stage joint model in application to longitudinal CD4 count data as surrogate marker to HIV/AIDS survival whereby a repeated measures random components model is considered in the first stage for the longitudinal CD4 count data while a Cox model is employed in the second stage for estimating the parameters. Henderson et al. [7] proposed a flexible joint modelling framework such that the longitudinal and survival processes are connected via a latent bivariate Gaussian process. A fully Bayesian version of this approach, implemented via Markov chain Monte Carlo (MCMC) methods is presented in Guo and Carlin [8]. Ibrahim et al. [9] applied such joint modeling and explored why it is needed in cancer research. A joint model for longitudinal and survival data with nonparametirc multiplicative random effects approach has been studied by Ding and Wang [10].

The majority of the joint models in the literature focus on a single longitudinal response and a survival outcome. Little attention has been given to modeling multiple longitudinal responses and a survival process simultaneously. Flexible semiparametric multivariate joint model that relate multiple longitudinal outcomes to a time-to-event have been proposed [11, 12] with emphasis on the association between the longitudinal data and survival outcome. Such multivariate joint analysis is also needed when one aims to explore not only the association between an individual longitudinal outcome and a survival process but also the correlation between the multiple longitudinal responses. In the Jackson Heart Study (JHS), described in “Methods” section, longitudinal HDL, TC, TG and hsCRP measurements taken from each subject repeatedly over time are subject to measurement error, potentially correlated with each other, and at the same time each of these lipid and hsCRP profiles is likely associated with the subject’s risk for CHD. It is then useful to assess these features by studying their association and simultaneous covariate effects, which can be addressed in joint modelling context. However, joint modeling of such multivariate longitudinal responses and a survival outcome poses computational challenges due to the high dimensionality in the random effects model.

This paper aims to present a joint modeling approach for multiple longitudinal responses and a survival data. To deal with the high dimensionality due to multiple longitudinal components, a pairwise model fitting approach that was developed in the context of multivariate Gaussian random effects is adopted [13].

The rest of the paper is organized as follows. In “Study data” section, a motivating dataset is described with analysis reported in “Results” section. “Review of basic models” section focuses on brief review of models for separate and joint analysis of a continuous longitudinal response and a survival outcome. “Multivariate joint model” section focuses on a joint model for multivariate continuous longitudinal responses and a time-to-event outcome, followed by a likelihood based estimation approach along the lines of Fieuws and Verbeke [13] in “Estimation” section. “Simulation” section presents results of a simulation study. Discussion and concluding statements are given in “Discussion” and “Conclusion” sections, respectively.

Methods

Study data

The Jackson Heart Study (JHS) is a longitudinal community-based study that investigates the causes of cardiovascular disease among African Americans. JHS participants were recruited from urban and rural areas from the three counties that make up the Jackson metropolitan area (Hinds, Madison, and Rankin). The 5,306 enrolled participants received three clinical examinations (Exam 1, 2000-04; Exam 2, 2005-08; Exam 3, 2009-13) that generated a plethora of longitudinal data including, but not limited to, cardiovascular disease risk factors, socio-economic information and biochemical analytes. In addition to the clinic exams, JHS has kept track of the occurrence of events of interest (i.e. coronary heart disease, abbreviated as CHD) through adjudicated events. Details on the design methods and data collection of the JHS can be obtained from Taylor et al. [14] and Carpenter et al. [15].

This paper uses data from two sources of JHS: (1) the longitudinal clinical examinations where sex, diabetes status, hypertension status, statin use, body mass index, inflammation, and lipid levels are extracted from; and (2) the annual follow-up surveys where Coronary Heart Disease (CHD) adjudicated events through 2016 are obtained. We allow for right censoring, i.e., those who experienced CHD since the baseline year (2000) were considered as incident cases and those who have died, withdrew from the study or have not seen CHD until 2016 are treated as censored. The time variable (event or censoring) begins from the baseline year and we haven’t assumed truncation as present. These variables are summarized in Table 1 by CHD status across the three measurement occasions where count and percentages are used for categorical variables and means and SDs for continuous variables. There are some notable highlights. Of the 4232 participants who have lipids and hsCRP measurements available at baseline, 236 (5.6%) experienced CHD event with a median survival time of 13.8 years. The mean age at baseline for non-CHD was 53.2 ±12 years while 62.0 ±10 years for those who had CHD. The proportion of diabetes as well as hypertension is higher in the CHD incident cases as compared to those without CHD across all visits. The mean HDL seems to increase over time, though the magnitude of mean level in non-CHD participants appears slightly higher when compared across the respective visits. The mean TG, however, tends to decline in both groups and non-CHD participants appear to have lower values. Incident cases are more likely to be men. Individual profiles plots of TG, hsCRP, HDL and TC for a random sample of one hundred participants are shown in Fig. 1. Clearly, there is evidence of between-subjects variability both at baseline and over time. On the other hand, Fig. 2 shows the average profiles plots of TG, hsCRP, HDL and TC for the full cohort. The average TG trajectory tends to decline while hsCRP and HDL seem to suggest an increasing average trend. TC appears to be more or less stable. A formal statistical test is needed to see the significance of these comparisons which will be dealt with later in “Results” section.

Fig. 1
figure1

Individual profiles plot of triglyceride (TG), high-sensitivity C-reactive protein (hsCRP), high density lipoprotein (HDL) and total cholesterol (TC) for a random sample of 100 participants

Fig. 2
figure2

Average trajectory of triglyceride (TG), high-sensitivity C-reactive protein (hsCRP), high density lipoprotein (HDL) and total cholesterol (TC) for all participants

Table 1 Participants Characteristics by coronary heart disease status and visit through 2016, Jackson Heart Study

The Jackson Heart Study provides a unique opportunity to study the longitudinal trajectory of lipids and hsCRP and incident CHD. Therefore, we will assess the association between the longitudinal lipids and hsCRP with incident CHD as well as the correlation between the longitudinal lipids and hsCRP measurements in African American participants.

Review of basic models

In “Linear mixed model for longitudinal data”, we will briefly review the linear mixed model for continuous longitudinal data, followed by a basic parametric model for survival data in “Model for time-to-event data” sections. “Joint model” section focuses on a joint model for a longitudinal response and time-to-event outcome.

Linear mixed model for longitudinal data

Let \(\phantom {\dot {i}\!}y_{i1}, y_{i2}, \ldots y_{i n_{i}}\) be longitudinal measurements from the ith subject at times \(\phantom {\dot {i}\!}t_{i1}, t_{i2}, \ldots t_{i n_{i}}\). The model can be written as

$$ y_{{ij}}=\boldsymbol{{x^{\prime}_{1i}}}\left(t_{{ij}}\right) \boldsymbol{\beta} + {\boldsymbol{z}^{\prime}_{1i}}\left(t_{{ij}}\right) {\boldsymbol{b}_{1i}} + \varepsilon_{{ij}}, $$
(1)

where x1i(tij) are regressors corresponding to unknown regression coefficients β,b1i are random effects with design matrix z1(tij), and εi are measurement errors. The subject-specific random effects b1i and measurement error εi are independent and assumed to follow normal distribution, i.e., b1iN(0,D), and εiN(0,Σi). Details on linear mixed models can be found in Laird and Ware [1].

Model for time-to-event data

Suppose si denotes the survival time for subject \(i,~i=1, 2, \dots, n.\) The event of interest may not be observed for some of the subjects until the end of the follow-up period and hence their event times are right censored. Let ci be the censoring time for subject i. For an indicator function δi=I(sici), it follows that, δi=0 if the survival time for subject i is right censored and δi=1 otherwise. For subject i the observed time-to-event data are (ti,δi) where ti=min(si,ci),i=1,2…,n.

Weibull and exponential models are commonly used parameteric models in survival data analysis. In a Weibull model the density for the survival time for the ith subject is given by

$$ f(t_{i}) = \alpha {t_{i}}^{\alpha-1} \text{exp}\left(\lambda_{i} - \text{exp}(\lambda_{i}) {t_{i}}^{\alpha}\right), $$
(2)

α>0,λi can be linked with covariates as λi=x2iξ, where x2i is the vector of covariates corresponding to the ith observation and ξ is a vector of regression coefficients. The Weibull model in (2) reduces to the exponential model when α=1.

Joint model

The longitudinal response in “Linear mixed model for longitudinal data” and survival process in “Model for time-to-event data” sections can be correlated with each other. One possible route to study this association is through joint modeling, where the linear mixed model in (1) is linked with the survival model in (2) through shared random effects.

Assuming a linear mixed effects model for the longitudinal part and a Weibull model for the survival part, the joint distribution of the longitudinal response and the time-to-event outcome conditional on the random effects is:

$$\begin{array}{@{}rcl@{}} &&f_{i}\left(y_{{ij}}, t_{i}| {\boldsymbol{b}_{1i}}, {\boldsymbol{b}_{2i}},\boldsymbol{\beta},\boldsymbol{\xi}, \boldsymbol{\theta}\right)\\&&=\frac{1}{{(2\pi)}^{\frac{n_{i}}{2}}{|\Sigma_{i}|}^{\frac{1}{2}}} e^{\left\{-\frac{1}{2}\left(\boldsymbol{y_{i}}- \boldsymbol{{x^{\prime}_{1i}}} \boldsymbol{\beta} - {\boldsymbol{z}^{\prime}_{1i}} {\boldsymbol{b}_{1i}}\right)^{\prime}{\Sigma_{i}}^{-1}\left(\boldsymbol{y_{i}} - \boldsymbol{{x^{\prime}_{1i}}} \boldsymbol{\beta} - {\boldsymbol{z}^{\prime}_{1i}} {\boldsymbol{b}_{1i}}\right)\right\}} \\ &&\quad \times \left[\alpha {t_{i}}^{\alpha - 1} {e}^{\boldsymbol{{x^{\prime}_{2i}}} \boldsymbol{\xi} + {\boldsymbol{b}_{2i}}} \right]^{\delta_{i}} e^{{- t_{i}}^{\alpha} e^{\boldsymbol{{x^{\prime}_{2i}}} \boldsymbol{\xi} + {\boldsymbol{b}_{2i}}}}, \end{array} $$

where b2i=θb1i,b1i and b2i are latent zero-mean bivariate Gaussian processes corresponding to the longitudinal measurements and events, θ is vector of association parameters, and the rest of the quantities are as defined in “Linear mixed model for longitudinal data” and “Model for time-to-event data” sections. For random intercepts b0i and random slopes b1i, for example, one possible model could be

$${\boldsymbol{z}^{\prime}_{1i}} {\boldsymbol{b}_{1i}}= b_{0i} + b_{1i} t_{{ij}},$$
$${\boldsymbol{b}_{2i}}= \theta_{1} b_{0i} + \theta_{2} b_{1i},$$

where θ1 and θ2 measure the association between the two submodels induced by the random intercepts and random slopes, respectively. Other ways of parametrization for the association of the two submodels can also be used as well [7]. Furthermore, b1i=(b0i,b1i) is assumed to follow bivariate normal distribution, possibly correlated, with mean 0 and variance-covariance matrix D given by

$$ D= \left(\begin{array}{cc} d_{0}^{2}& \rho d_{0} d_{1}\\ \rho d_{0} d_{1}&d_{1}^{2}\\ \end{array} \right), $$
(3)

where ρ is the correlation between the random intercepts and random slopes.

Multivariate joint model

The joint mode in “Joint model” section can be extended to a multivariate setting where k longitudinal continuous responses can be simultaneously modeled with time-to-event data. A multivariate joint model formulation of the k longitudinal responses and the time-to-event outcome follows from combining each single joint model of the longitudinal continuous response and the time-to-event outcome via random effects that are assumed to follow multivariate normal distribution. The general expression for the resulting multivariate joint model can be given by

$$ f=\idotsint \prod\limits_{\ell=1}^{k} f (y_{\ell},t)|\boldsymbol{b_{\ell}}) \phi(\boldsymbol{b_{1}}, \ldots, \boldsymbol{b_{k}}) d\boldsymbol{b_{1}}\ldots d\boldsymbol{b_{k}}, $$
(4)

where f(y,t) is the single joint model corresponding to the th continuous response and the time-to-event data as described in “Joint model” section and ϕ is a k-dimensional Gaussian density for the random effects (b1,…,bk), such that, for subject i, each bℓi is shared between the linear mixed submodel corresponding to the th longitudinal response and the time-to-event submodel through b1i and b2i, respectively, as shown in “Joint model” section. The likelihood contribution of subject i is li(Y1i,…,Yki,ti|Λ), where Λ is the vector of fixed effect and covariance parameters in the joint model. The full multivariate joint model in (4) is complex due to the high dimensionality in the random effects and inference for Λ is not feasible in the standard statistical softwares. An estimation approach to overcome this computational challenge and reduce the dimensionality along the lines of Fieuws and Verbeke [13] will be the topic of “Estimation” section.

Estimation

We employ the pairwise bivariate fitting to reduce the high dimensionality of (4) and estimate the model parameters in Λ [13]. By pairwise fitting we mean that bivariate joint models of all possible pairwise combinations of the longitudinal continuous responses and a time-to-event data are fitted and duplicate parameter estimates are averaged across all pairs. Consider k random-intercept models that are specific to k single joint models where each joint model represents a longitudinal continuous responses and time-to-event outcome as described in “Joint model” section. The possible number of pairs out of the k continuous responses is \(P=\frac {k(k-1)}{2}\). For subject i, consider a vector of the rth and sth bivariate longitudinal responses and the time-to-event outcome (Yri,ti) and (Ysi,ti), respectively, where \(\boldsymbol {Y_{{ri}}}=\left (y_{ri1}, \ldots, y_{ri n_{{ri}}}\right)\) and \(\boldsymbol {Y_{{si}}}=\left (y_{si1}, \ldots, y_{si n_{{si}}}\right), r=1, \ldots, k-1, s=r+1,\ldots, k.\) We assume that (Yri,ti) and (Ysi,ti) are conditionally independent given bi=(bri,bsi), where biMVN(0,D) is vector of random effects corresponding to the rth and sth bivariate joint models with mean 0 and covariance matrix D.

The likelihood to be maximized corresponding to the rth and sth bivariate joint model of two longitudinal continuous responses the time-to-event data takes the form

$$ \begin{aligned} {}\prod\limits_{i=1}^{N}\int\int&\left\{ \prod\limits_{j=1}^{n_{{ri}}}f \left(y_{{rij}}, t_{i}|\boldsymbol{b_{{ri}}}\right)\right\} \\\times&\left\{ \prod\limits_{j=1}^{n_{{si}}}f \left(y_{{sij}}, t_{i}|\boldsymbol{b_{{si}}}\right)\right\} \phi\left(\boldsymbol{b_{{ri}}}, \boldsymbol{b_{{si}}}\right) d\boldsymbol{b_{{ri}}}d\boldsymbol{b_{{si}}}, \end{aligned} $$
(5)

where f and ϕ are as defined in (4) and N is the total number of subjects. Fitting these bivariate models is computationally feasible as there are only two correlated random effects. The loglikelihood across all possible pairs can be given as \({\sum \nolimits }_{p=1}^{P} l\left (\boldsymbol {Y_{p}} | \boldsymbol {\Lambda }^{\prime }_{p}\right)\), where p=1,…,P, and \(\boldsymbol {\Lambda }^{\prime }_{p}\) is the vector of parameters corresponding to the pth bivariate joint model. All pair-specific parameter vectors \(\boldsymbol {\Lambda }^{\prime }_{p}\) can be combined and the so resulting vector of parameters is denoted by Λ. Assuming that \(\widehat {\boldsymbol {\Lambda }^{\prime }_{p}}\) maximizes \(l\left (\boldsymbol {Y_{p}} | \boldsymbol {\Lambda }^{\prime }_{p}\right), \widehat {\boldsymbol {\Lambda }^{\prime }}\) would maximize l(Λ). An estimate for each parameter in Λ of the full multivariate joint model defined in “Multivariate joint model” section is obtained by averaging all pair-specific estimates in \(\widehat {\boldsymbol {\Lambda }^{\prime }}\). The resulting averages are asymptotically normal. However, the standard errors can’t be directly computed by simple averaging of their corresponding pair-specific estimates. Details on the averaging of pair specific estimates and expression of the asymptotic normality of Λ follows in a similar way as the multivariate linear mixed model and multivariate semi-continuous data and can be obtained in Fieuws and Verbeke [13]. This approach can be well generalized to random intercepts and random slopes model of course with additional computational complexity. Each bivariate pair consists of a longitudinal response and a time-to-event outcome as shown in (5) does not have a closed form solution and is fitted numerically with the SAS procedure NLMIXED using adaptive Gaussian quadrature based on quasi-Newton optimization technique. Other flexible non-linear optimizers can be employed as well. A SAS macro has been used to fit the P bivariate models and combine the results.

Results

We analyze the JHS data introduced in “Study data” section. A multivariate joint model of “Multivariate joint model” section will be fitted for the longitudinal measurements of high density lipoprotein (HDL) cholesterol, triglyceride (TG), total cholesterol (TC), high-sensitivity C-reactive protein (hsCRP), and time to coronary heart disease (CHD) to capture the association between each of these longitudinal response and the time-to-CHD as well as the correlation between HDL, TG, TC and hsCRP. Log-transformation was applied to the four longitudinal responses based on earlier exploratory analysis. Let yij,=1,…,4 be the th log-transformed continuous response for subject i at time j and we model the mean μij as

$$\begin{array}{*{20}l}\mu_{\ell ij}&=\beta_{\ell 0}+\beta_{\ell 1}A_{i}+\beta_{\ell 2} M_{i}+\beta_{\ell 3} D_{{ij}}+ \beta_{\ell 4} H_{{ij}}\\&\quad+\beta_{\ell 5} T_{{ij}}+\beta_{\ell 6} BMI_{{ij}}+\beta_{\ell 7} ST_{{ij}}+\beta_{\ell 8} SM_{i} +b_{\ell i}, \end{array} $$

where Ai baseline age for subject i, Mi a gender indicator for subject i coded as (1: male; 0: female), Dij is a diabetic status indicator for subject i at time j coded as (1: diabetic; 0: non-diabetic), Hij is a hypertension status indicator for subject i at time j coded as (1: hypertensive; 0: non-hypertensive); Tij is the j measurement time for subject i in years from baseline; BMIij is body mass index for subject i at time j, STij is a statin medication indictor for subject i at time j coded as (1: medication; 0: no medication), SMi is current smoking indicator for subject i at baseline coded as (1: smoker; 0: non-smoker) and bi are subject specific random intercepts for the th continuous response.

Turning to the time-to-CHD, we consider an exponential submodel in each joint model with the th continuous response, and further the hazard at time t is modeled as,

$$\begin{array}{*{20}l}\lambda_{\ell i} (t)&=\text{exp}\left(\xi_{\ell 0}+\xi_{\ell 1}A_{i}+\xi_{\ell 2} M_{i}+\xi_{\ell 3} D_{i}+ \xi_{\ell 4} H_{i}\right.\\&\quad\left.+\xi_{\ell 5} BMI_{i}+\xi_{\ell 6} ST_{i}+\xi_{\ell 7} SM_{i} + \theta_{\ell} b_{\ell i}\right), \end{array} $$

where θ captures the association between the linear mixed submodel and the exponential submodel induced by the random effects. The correlation between any two pairs of longitudinal responses is captured by the correlation of the respective random intercepts.

Six pairs of bivariate models were fitted for the four responses using the pairwise fitting approach outlined in “Estimation” section. Results are shown in Table 2. Clearly, the estimated association parameter of TG and CHD is significant, suggesting evidence of strong negative association between the two submodels (\(\hat {\theta }=-0.4473, p=0.009\)), and this implies that high level of TG is associated with poor survival or shorter time-to-CHD. The same is true for the association between hsCRP and CHD (\(\hat {\theta }=-0.2504, p=0.005\)). On the contrary, the estimate corresponding to HDL (\(\hat {\theta }=1.0472, p=0.002\)) is positive and significant suggesting high HDL level is associated with better survival. On the other hand, the association parameter estimate corresponding to TC provides no evidence of association between total cholestol and CHD (p=0.151). Consider testing hypotheses about the four association parameters jointly. One can employ a multivariate Wald test for the hypothesis H0=L(θ)=0, where L is a 4 ×4 identity matrix corresponding to \(\widehat {\theta }_{\ell }, \ell =1, \ldots, 4\). Thus, \(W^{2}=\left (\mathbf {L} \boldsymbol {\widehat {\theta }}\right)^{\prime } {\left (\mathbf {L}\text {COV}(\widehat {\boldsymbol {\theta }}) \mathbf {L}^{\prime }\right)}^{-1}\left (\mathbf {L}\widehat {\boldsymbol {\theta }}\right) \sim {{\chi }^{2}}_{(4)}\), where \(\text {COV}\left (\widehat {\boldsymbol {\theta }}\right)\) is the 4 ×4 covariance matrix of \(\widehat {\boldsymbol {\theta }}\) obtained from fitting the multivariate joint model. This test suggests joint significance of the association between the four longitudinal responses and CHD (χ2=18.72,p=0.0009). Another important output of this analysis is estimated correlation of the random intercepts. In this regard, we observe evidence of negative correlation between TG and HDL (−0.5026) as well as HDL and hsCRP (−0.1112), while positive correlation between TG and TC (0.4390), TG and hsCRP (0.1478), and HDL and TC (0.2235). However, there is no evidence of significant correlation between TC and hsCRP. The estimated standard deviation of the random effects are significant across all the responses, implying the presence of considerable between-subjects variability that needs to be appropriately accounted for. All the estimates corresponding to time in the linear mixed submodel \(\left (\hat {\beta _{5}}\right)\) are significant, i.e., negative time effect for TG, but positive effect for HDL, TC and hsCRP, and this is inline with what has been observed from the average profiles plot in Fig. 2. Those who are in statin medication have lower TG, TC and hsCRP, while no signifcant assocation was observed between HDL and statin use. Smokers tend to have elevated mean TG and hsCRP as compared to non-smokers. Those who are diabetic have higher TG and hsCRP but lower HDL and TC. The estimates of covariate effects in the survival submodel are quite similar except some slight variation depending on which of the longitudinal responses of TG, HDL, TC or hsCRP was modeled in the linear mixed submodel, and this is expected as these lipid and hsCRP measures have different effects on CHD survival as demonstrated by the differences in the estimates of the association parameter (θ) as well as the random intercept standard deviations (d). CHD survival is estimated to be shorter among hypertensive patients. For example, looking the parameter estimate of HTN \(\left (\hat {\xi _{4}}\right)\) correspond to TG, we observe that those who are hypertensive have about 60% (1−e−0.9204=1−0.3984) shorter survival time. Similarly, smokers, diabetic patients, male gender and older age have shorter survival or higher risk to CHD. But no significant association was observed between BMI and CHD survival. Likely due to the smaller number of repeated measurements per subject, incorporating an additional random slope term and association parameter θ2 as described in “‘Joint model” section did not improve model fit. This implies that the subject specific random intercepts are sufficient to describe the longitudinal aspect in these data.

Table 2 Parameter estimates and standard errors of multivariate joint modelling of Triglycerides (TG) with CHD, High density lipoprotein (HDL) with CHD, high-sensitivity C-reactive protein (hsCRP) with CHD, and Total cholesterol (TC) with CHD

For comparisons purpose, separate joint model of “Joint model” section were fitted. The results are summarized in Table 3. When compared with their respective multivariate joint model estimates of Table 2, we notice slight differences in the association parameters estimates corresponding to TG and hsCRP as well as the intercept terms of the survival component. The rest of the parameter estimates appear to be similar. Furthermore, we fitted an exponential model for time-to-CHD by including only baseline values of lipids separately while adjusting for the same covariates of the survival submodel considered so far. The estimated regression coefficients (standard error) corresponding to baseline TG, HDL, hsCRP and TC are −0.2354 (0.0876),0.6206 (0.1834),−0.1674 (0.0421), and −0.5710 (0.2344), respectively. Comparing these results with the association parameter estimates of joint multivariate model in Table 2, we clearly observe a remarkable difference in modeling the longitudinal trajectories versus the baseline measurements of lipids. We will explore this further in “Simulation” section with simulation.

Table 3 Parameter estimates and standard errors of univariate joint modelling of Triglycerides (TG) with CHD, High density lipoprotein (HDL) with CHD, high-sensitivity C-reactive protein (hsCRP) with CHD, and Total cholesterol (TC) with CHD

Simulation

In this section, we report on a simulation study set up to examine the performance of the pairwise bivariate fitting in joint modeling of multiple longitudinal measurements and a time-to-event data. The estimates from the pairwise bivariate fitting will be compared with the full likelihood multivariate joint model in terms of bias and relative efficiency.

We randomly generated 1000 data sets for three correlated longitudinal responses yij and a time-to-event outcome ti from the multivariate joint model of “Multivariate joint model” section for 1000 subject at 7 time points. The th longitudinal data were generated as yij=β0+β1timeij+β2malei+bi+εij, i=1,…,1000;j=0,…,6;=1,2,3. Similarly, the time-to-event data corresponding to the yij is generated from Weibull(α,μi), where μi=ξ0+ξ1malei+θbi, and the true parameter values for the three responses were β1=(3.7,−1,0.5),β2=(4,1,−0.5),β3=(5,1,−0.5),ξ=ξ=(5,−0.5), vector of association parameters of the longitudinal and time-to-event data θ=(θ1,θ2,θ3)=(−0.8,0.6,−0.5),α=α=0.5, the residual errors εij are assumed independent, each generated from normal distribution with mean 0 and standard deviations 0.8, 1 and 1. The random intercepts (b1i,b2i,b3i) are assumed correlated and generated from multivariate normal distribution with mean 0, standard deviations 0.7, 0.5, and 0.5, and correlations: ρ12=−0.5,ρ13=0.6 and ρ13=0.3.

The simulated data were analyzed by the multivariate joint model using pairwise bivariate fitting and full likelihood trivariate joint model. All the 1000 simulations converged in the pairwise fitting while 985 simulations do so in the full trivariate model, i.e., nearly 1.5% failed to converge in the latter. For comparison purposes, results of the 985 successful simulations of the full multivariate joint model and the same number of simulations from the pairwise fitting, based on simulation ID number, were used. The average estimated values from the pairwise bivariate fitting (MeanP) and full trivariate joint model (MeanF) are summarized in Table 4. All of the mean estimates from the pairwise fitting are close to the true values and nearly unbiased for the proposed approach. Furthermore, as shown in Table 5, the Monte Carlo standard errors from the pairwise fitting (MC.SEP) are close to those from the full trivariate joint model (MC.SEF) with their ratios range from 0.997 to 1.056, and this suggests that no considerable efficiency loss of the pairwise bivariate approach relative to the full likelihood multivariate model.

Table 4 Summary of simulation with pairwise fitting (MeanP) and full multivariate joint model (MeanF)
Table 5 Monte-Carlo standard error with pariwise fitting (MC.SEP) and full multivariate joint model (MC.SEF)

Furthermore, the simulated data were analyzed by Weibull model of “Model for time-to-event data” section to investigate the impact of omitting the longitudinal aspect as well as the correlation between the longitudinal responses. This was achieved by including only basline measurements of the longitudinal responses as covariates in the regression model for the time-o-event outcome. Three separate models were fitted as Weibull(α,μi), where μi=ξ0+ξ1malei+θyi,yi is baseline values for the longitudinal response yij and the rest of the quantizes remain defined earlier. The results are summarized in Table 6. Clearly, the impact of taking just the baseline values of yi and ignoring the longitudinal trajectories is remarkable. This can be clearly observed from the considerable bias of the regression coefficients including the association parameter θ.

Table 6 Summary of simulation with Weibull model

Discussion

The method has been used to jointly analyze four multivariate longitudinal responses and a time-to-event outcome from a cardiovascular study on African Americans. The data analysis showed inverse relationship between HDL cholesterol and incidence of coronary heart disease, while elevated triglyceride level and high-sensitivity C-reactive protein are associated with increased risk for CHD. Duncan et al. [16] who studied association between longitudinal lipid trajectories and atherosclerotic cardiovascular disease (ASCVD) including CHD reported a similar finding on the association of HDL cholesterol and ASCVD. This is inline with Wilikins et al. [17] and Skrettebegr et al. [18]. While Duncan’s investigation did not find a significant association between triglyceride and ASCVD, we observe an association of triglyceride with CHD which is consistent with Sarwar et al. [19]. This is likely because in Ducan’s study ASCVDE includes CHD, stroke, or transient ischemic attack, and intermittent claudiction while the present study and Sarwar’s investigation focus on CHD. Jeong et al. [20] reported a similar finding on the association of triglyceride and CHD but reported a significant association between total cholesterol and CHD which is not the case for total cholesterol in the current study. This could be attributed to the fact that Jeong et al. [20] examined young adults (aged 20-39 years) while JHS participants are relatively older.

Our simulation study suggest that bivariate fitting of such multivariate joint model yields unbiased estimates which are as nearly efficient as the full likelihood model. These results are inline with the performance of the pairwise bivariate fitting in the context of multivariate Gaussian longitudinal data as shown in Fieuws and Verbeke [13] and of semi-continuous longitudinal data given in Kassahun-Yimer et al. [21]. Pairwise bivariate fitting is approximately unbiased in missing completely at random (MCAR) and missing at random (MAR) when outcome-specific parameters are considered, but these results may not necessarily be generalized to all missing data problems [21]. Details on missing data mechanisms can be obtained in Little and Rubin [22]. Alison [23] presented a survey of methods to handle missing data and indicated that maximum likelihood based inference for handling missing data have nearly optimal statistical properties under MAR. Ding and Wang [10], however, showed that in joint modeling of univariate longitudinal response and time-to-event modeling, informative dropout on the longitudinal component could result in serious bias. Multiple imputation is also an attractive approach in handling missing data in some applications despite the several questions posed in relation to the distributional choices, number of imputations and iterations and prior knowledge about the missingness pattern [22, 23]. Fieuws and Verbeke [13] pointed out efficiency loss can be present when some of the parameters are shared by a set of outcomes. An additional simulation conducted to investigate the impact of taking just the baseline values of longitudinal response and including as covariates in survival regression indicated that parameter estimates could be severely biased. This is inline with earlier simulation studies. For example, Henderson et al. [7] and Ibrahim et al. [9] pointed out that severe bias could occur for some parameter estimates when there is ignored latent association between the time-to-event and longitudinal data.

Apart from several advantages, our proposed approach has also limitations. First, right censoring is assumed for the time-to-event outcome, though other mechanisms of censoring could also be operating in practice. Second, we assumed the link between the longitudinal and the survival submodels is constant over time. Third, earlier simulations suggest that pairwise bivariate fitting is approximately unbiased in MCAR and MAR but this may not be the case for all longitudinal data with missingness. Hence, exploring the proposed approach in various missing data mechanisms and patters together with appropriate model diagnostic tools is a useful area of future work.

Conclusion

In this paper, we have presented and studied a multivariate joint model for multiple longitudinal Gaussian responses and a time-to-event outcome which follows from combining each single joint model of a longitudinal continuous response and the time-to-event outcome via random effects whereby a linear mixed submodel is used for the longitudinal continuous part while a time-to-event submodel is considered for the survival part. The model was illustrated using the Jackson Heart Study. The association between lipid levels and CHD risk obtained from the joint model that takes into account the longitudinal lipid trajectories is substantially higher than that estimated using the baseline lipid values only. The Weibull model (exponential model as its special case) is most common for time-to-event data due to its flexibility, but alternative distributions, such as, the log-normal, log-logistic and gamma could provide better fit to a particular dataset. The choice of one distribution from the other could be based on exploratory graphical tools or formal statistical tests. The pairwise bivariate model fitting approach employed for the Weibull model can be well extended to any other probability density function in the survival submodel.

Availability of data and materials

Data used in this analysis were produced and used in accordance with the policies of the Jackson Heart Study under contracts from the National, Heart, Lung, and Blood Institute and are not the domain of the authors but that of the Jackson Heart Study. These data are available to other researchers for purposes of reproducing the results or replicating the procedures by submitting a manuscript proposal to the Jackson Heart Study at jhspub @umc.edu, as described at https://www.jacksonheartstudy.org/Research/Publications#submitmanuscript. Data updates for the Jackson Heart Study are also deposited regularly in the National Institutes of Health data repositories, dbGaP (https://www.ncbi.nlm.nih.gov/gap/) and BioLincc (https://biolincc.nhlbi.nih.gov/home/).

Abbreviations

CHD:

Coronary heart disease

HDL:

High density lipoprotein

JHS:

hsCRP: High-sensitivity C-reactive protein

Jackson heart study; SD:

Standard deviation

TC:

Total cholesterol

TG:

Triglyceride

References

  1. 1

    Laird N, Ware J. Random-effects models for longitudinal data. Biometrics. 1982; 38:963–74.

    CAS  Article  Google Scholar 

  2. 2

    Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer; 2000.

    Google Scholar 

  3. 3

    Hosmer D, Lemeshow S, May Susanne. Applied survival analysis: regression modeling of time-to-event data. New York: John Wiley & Sons, Inc.; 2008.

    Book  Google Scholar 

  4. 4

    Wang Y, Taylor J. Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. J Am Stat Assoc. 2001; 96:895–905.

    Article  Google Scholar 

  5. 5

    Rizopoulos D. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics. 2011; 67:819–29.

    Article  Google Scholar 

  6. 6

    Tsiatis A, Davidian M. Joint modeling of longitudinal and time-to-event data: An Overview. Stat Sin. 2004; 14:809–34.

    Google Scholar 

  7. 7

    Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000; 1:465–80.

    CAS  Article  Google Scholar 

  8. 8

    Guo X, Carlin B. Separate and Joint modeling of longitudinal and event time data using standard computer packages. Am Stat. 2004; 58:16–24.

    Article  Google Scholar 

  9. 9

    Ibrahim J, Chu H, Chen L. Basic concepts and methods for joint models of longitudinal and survival data. J Clin Oncol. 2010; 28:2796–801.

    Article  Google Scholar 

  10. 10

    Ding J, Wang J. Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data. Biometrics. 2008; 64:546–56.

    Article  Google Scholar 

  11. 11

    Rizopoulos D, Ghosh Pulak. A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Stat Med. 2011; 30:1366–80.

    Article  Google Scholar 

  12. 12

    Brown E, Ibrahim J, DeGruttola V. A flexible B-spline model for multiple longitudinal biomarkers and survival. Biometrics. 2005; 61:64–73.

    Article  Google Scholar 

  13. 13

    Fieuws S, Verbeke G. Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics. 2006; 62:424–31.

    Article  Google Scholar 

  14. 14

    Taylor H, Wilson J, Jones D, Sarpong D, Srinivasan A, Garrison R, Nelson C, Wyatt S. Towards resolution of cardivascular health disparities in African Americans: design and methods of the Jackson Heart Study. Ethicity Dis. 2005; 15:S6-4–S6-17.

    Google Scholar 

  15. 15

    Carpenter M, Crow R, Steffes W, Heilbraun J, Evans G, Skelton T, Jensen R, Sarpong D. Laboratory, reading center and coordinating center data management methods in the Jackson Heart Study. Am J Med Sci. 2004; 328:131–44.

    Article  Google Scholar 

  16. 16

    Duncan M, Vasan R, Xanthakis V. Trajectories of blood lipid concentrations over the adult life course and risk of cardiovascular disease and all-cause mortality: observations from the Framingham study over 35 years. J Am Heart Assoc. 2019; 8:e011433.

    CAS  Article  Google Scholar 

  17. 17

    Wilkins J, Ning H, Stone N, Criqui M, Zhao L, Greenland P, LIoyd-Jones D. Coronary heart disease risks associated with high levels of HDL cholesterol. J Am Heart Assoc. 2014; 3:e000519.

    Article  Google Scholar 

  18. 18

    Skretteberg P, Grundvold I, Kjeldsen S, Erikssen J, Sandvik L, Liestøl K, Erikssen G, Pedersen T, Bodegrad J. HDL-cholesterol and prediction of coronary heart disease: Modified by physical fitness? A 28-year follow-up of apparently healthy men. Atherosclerosis. 2012; 220:250–6.

    CAS  Article  Google Scholar 

  19. 19

    Sarwar N, Danesh J, Eiriksdottir G, Sigurdsson G, Wareham N, Bingham S, Boekholdt M, Kay-Tee K. Triglycerides and the risk of coronary heart disease 10 158 incident cases among 262 525 participants in 29 western prospective studies. Circulation. 2007; 115:450–8.

    CAS  Article  Google Scholar 

  20. 20

    Jeong Su-M, Choi S, Kim K, Kim S, Lee J, Park S, Kim Y-Y, Son J, Yun J-M, Park S. Effect of change in total cholesterol levels on cardiovascular disease among young adults. J Am Heart Assoc. 2018; 7:e008819.

    Google Scholar 

  21. 21

    Kassahun-Yimer W, Albert P, Lipsky L, Nansel T, Liu A. A joint model for multivariate hierarchical semicontinuous data with replications. Stat Methods Med Res. 2019; 28:858–70.

    Article  Google Scholar 

  22. 22

    Little RJA, Rubin DB, Statistical analysis with missing data. NJ: John Wiley & Sons, Inc; 2002.

  23. 23

    Allison P. Missing data techniques for structural equation modeling. J Abnorm Psychol. 2003; 112:545–57.

    Article  Google Scholar 

Download references

Acknowledgements

The authors thank the Editor and the reviewers for their helpful and constructive comments. The authors also wish to thank the staffs and participants of the JHS. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services.

Funding

The Jackson Heart Study is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute on Minority Health and Health Disparities (NIMHD). The views expressed in this article are those of the authors and do not necessarily represent the views of the NHLBI, the NIH, or the US Department of Health and Human Services.

Author information

Affiliations

Authors

Contributions

WKY performed statistical analysis and drafted the manuscript; KAV participated in the data management and reviewed the manuscript; AAO and MEH contributed in the drafting and reviewed the manuscript; YIM, LCS and PA contributed in the data coordination and reviewed the manuscript; AC was involved in data coordination, drafting and critically reviewed the manuscript. All author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Wondwosen Kassahun-Yimer.

Ethics declarations

Ethics approval and consent to participate

The JHS followed guidelines expressed by the Declaration of Helsinki, and all participants completed written informed consent. The available data contained deidentified information. The JHS was approved by the Institutional Review Boards of Jackson State University, Tougaloo College, and the University of Mississippi Medical Center in Jackson.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visithttp://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Kassahun-Yimer, W., Valle, K.A., Oshunbade, A.A. et al. Joint modelling of longitudinal lipids and time to coronary heart disease in the Jackson Heart Study. BMC Med Res Methodol 20, 294 (2020). https://doi.org/10.1186/s12874-020-01177-7

Download citation

Keywords

  • Joint modeling
  • Multivariate longitudinal
  • Survival data
  • Correlated responses