Multivariate longitudinal data for survival analysis of cardiovascular event prediction in young adults: insights from a comparative explainable study

Background: Multivariate longitudinal data are under-utilized for survival analysis compared to cross-sectional (CS) data (data collected once across the cohort). In cardiovascular risk prediction in particular, despite the availability of methods for longitudinal data analysis, the value of longitudinal information has not been established in terms of improved predictive accuracy and clinical applicability.

Methods: We investigated the value of longitudinal data over and above the use of cross-sectional data via 6 distinct modeling strategies from statistics, machine learning, and deep learning that incorporate repeated measures for survival analysis of the time to cardiovascular event in the Coronary Artery Risk Development in Young Adults (CARDIA) cohort. We then examined and compared the use of model-specific interpretability methods (Random Survival Forest Variable Importance) and model-agnostic methods (SHapley Additive exPlanation (SHAP) and Temporal Importance Model Explanation (TIME)) in cardiovascular risk prediction using the top-performing models.

Results: In a cohort of 3539 participants, longitudinal information from 35 variables that were repeatedly collected over 6 exam visits spanning 15 years improved subsequent long-term (17 years afterward) risk prediction by up to 8.3% in C-index compared to using baseline data (0.78 vs. 0.72), and by up to approximately 4% compared to using the last observed CS data (0.75). Time-varying AUC was also higher in models using longitudinal data (0.86–0.87 at 5 years, 0.79–0.81 at 10 years) than in models using baseline or last observed CS data (0.80–0.86 at 5 years, 0.73–0.77 at 10 years). Comparative model interpretability analysis revealed the impact of longitudinal variables on model prediction at both the individual and global scales across the different modeling strategies, and identified the best time windows and the best timing within those windows for event prediction. In terms of accuracy, the best strategy for incorporating longitudinal data was time-series massive feature extraction, and the most easily interpretable strategy was trajectory clustering.

Conclusion: Our analysis demonstrates the added value of longitudinal data for predictive accuracy and epidemiological utility in cardiovascular risk survival analysis in young adults via a unified, scalable framework that compares model performance and explainability. The framework can be extended to a larger number of variables and other longitudinal modeling methods.

Trial registration: ClinicalTrials.gov Identifier: NCT00005130. Registration date: 26/05/2000.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-023-01845-4.


List of supplementary figures and tables
Table S1. A list of the variables that were used for prediction in this study. A total of 35 variables/predictors were used; three were fixed variables (shown in italics), and the rest were longitudinal (repeating) variables that were repeatedly measured in most (if not all) exams for most participants.

Fig. S1. Cumulative incidence of CVD after Y15 (top) and Y5 (bottom). The cumulative incidence ranges from 0 to 1. Few events occurred before Y15, as the curve is relatively flat before the Y15 exam (10 years after the Y5 exam); after Y15, the incidence rate is roughly linear.

Fig. S4. Time-varying AUC on the test set using Dynamic-DeepHit for dynamic prediction on all participants in CARDIA. The model was trained and validated using 5-fold cross-validation repeated 2 times. The AUC before Y15 is unstable because of the low CVD incidence before Y15.

RSF-VIMP heatmap: … were treated as independent input variables. RSF-VIMP was used to obtain the variable importance score for each input variable. All variable importance scores were then normalized between 0 and 1 and plotted as the z-axis on the heatmap. Variables were ordered along the y-axis by their average importance score across all time points.

Outcome ascertainment
The CARDIA study outcome ascertainment protocols have been described in detail elsewhere [1]. For this study, the first CVD event was used as the endpoint [2,3]. We recorded new cardiovascular and cerebrovascular events from the baseline examination through August 2018. During their scheduled study examinations and yearly telephone interviews, each participant or designated proxy was asked about interim hospital admissions, outpatient procedures, and deaths (designated proxies do not participate in the examinations). Medical records were requested for participants who had been hospitalized or received an outpatient revascularization procedure. Vital status was assessed every 6 months; medical and other death records were requested after consent had been obtained from the next of kin. Two physician members of the Committee independently reviewed medical records and recorded information to adjudicate each possible cardiovascular or cerebrovascular event or underlying cause of death, using specific definitions and a detailed manual of operations (available online: http://www.cardia.dopm.uab.edu). If the primary reviewers disagreed, the case was reviewed by the full committee. The primary composite outcome was incident CVD, which included coronary heart disease (CHD: myocardial infarction, acute coronary syndrome, or CHD death, including fatal myocardial infarction), stroke, transient ischemic attack (TIA), hospitalization for heart failure, intervention for peripheral arterial disease, or death from cardiovascular causes. Secondary cause-specific outcomes included stroke/TIA, CHD, and CVD mortality. Participants who died from a non-CVD cause were censored in the survival models at the time of death.

Temporal Importance Model Explanation (TIME)
Here, we briefly summarize the TIME algorithm in layman's terms; for a more detailed technical description, please refer to [4]. At the core of TIME is its permutation approach.
A typical way of permuting tabular data is to replace the value of feature j in participant i with the value of feature j in another participant, then compute the difference between the permuted and baseline losses. The baseline loss is the difference between the model output and the target outcome yi, and the permuted loss is the difference between the model output using the permuted input and the same target outcome yi. If the permuted loss is significantly greater than the baseline loss on average over many permutations, the feature is deemed important. For longitudinal data, however, the typical permutation would simply replace the value of feature j at time t in participant i with the value of feature j at time t in another participant. Doing this would break the temporal dependencies and correlations within the trajectory, as noted above. To alleviate this problem, TIME performs joint permutation, which means (1) replacing the values of feature j over a time window in participant i with the values from the same time window in another participant, instead of permuting individual time points, and (2) replacing the value of feature j at time k1 with that of feature j at time k2 within the same participant, which enables ordering the importance of time points.
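To make the joint permutation concrete, the sketch below implements it for a single longitudinal feature. This is a minimal illustration under our own assumptions, not the authors' implementation: the fitted model and the loss function (e.g., a Brier score or negative log-likelihood) are hypothetical placeholders, and X is assumed to be an array of shape (participants, time points, features).

```python
# Minimal sketch of TIME-style joint permutation importance for one
# longitudinal feature. `model` and `loss(model, X, y)` are hypothetical
# placeholders for a fitted survival model and its evaluation loss.
import numpy as np

def joint_permutation_importance(model, loss, X, y, feature_j,
                                 window=(0, None), n_permutations=100,
                                 rng=None):
    """Importance of feature_j over a time window, via joint permutation."""
    rng = np.random.default_rng(rng)
    k1, k2 = window
    baseline = loss(model, X, y)          # loss on the unpermuted data
    diffs = []
    for _ in range(n_permutations):
        Xp = X.copy()
        # Jointly permute the whole trajectory segment [k1, k2) of feature_j
        # across participants, so temporal dependencies within the swapped
        # segment are preserved (unlike permuting single time points).
        perm = rng.permutation(X.shape[0])
        Xp[:, k1:k2, feature_j] = X[perm, k1:k2, feature_j]
        diffs.append(loss(model, Xp, y) - baseline)
    # The feature is deemed important if permuting it increases the loss
    # on average over many permutations.
    return float(np.mean(diffs)), np.array(diffs)
```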
As for the time window, TIME searches for the most important time window W* = [k1, k2] (1 <= k1 < k2 <= L) such that most of the effect of permutation lies in W* (L is the length of the time series, which is 6 in this work). TIME does this by searching for the largest possible prior window WP = [1, k1] and subsequent window WS = [k2, L]. TIME initializes WP to be the first half and WS to be the latter half of the series, then perturbs WP and WS and observes their importance scores. If the importance score for WP is high, WP likely contains important time steps, and the search algorithm shortens WP to exclude them; if the importance score for WP is low, WP is expanded until its importance score exceeds a threshold. This threshold of importance is determined from a user-input localization parameter that specifies the level of importance the important window should hold (for example, 90% of the total importance of the whole series). Similar logic is applied to find the subsequent window WS, and the important window W* is what lies between WP and WS.
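The window search can be sketched in the same style. The snippet below reuses the hypothetical joint_permutation_importance() from the previous sketch and implements a simplified version of the prior-window search only; the thresholding rule here is our simplification of the localization parameter, not the exact procedure in [4].

```python
def find_prior_window(model, loss, X, y, feature_j, localization=0.9):
    """Largest prior window WP = [0, k1) whose permutation importance stays
    below the localization threshold (simplified sketch of TIME's search)."""
    L = X.shape[1]                                    # series length (6 here)
    total, _ = joint_permutation_importance(model, loss, X, y, feature_j,
                                            window=(0, L))
    # If W* must hold e.g. 90% of the total importance, WP may hold at most
    # the remaining 10% (our simplified reading of the localization rule).
    threshold = (1.0 - localization) * total
    k1 = L // 2                                       # initialize WP = first half

    def wp_score(k):
        score, _ = joint_permutation_importance(model, loss, X, y, feature_j,
                                                window=(0, k))
        return score

    if wp_score(k1) > threshold:
        # WP contains important time steps: shorten it to exclude them.
        while k1 > 0 and wp_score(k1) > threshold:
            k1 -= 1
    else:
        # WP looks unimportant: expand it while it stays below the threshold.
        while k1 < L - 1 and wp_score(k1 + 1) <= threshold:
            k1 += 1
    # Symmetric logic shrinks/expands WS from the other end; the important
    # window W* is then what lies between WP and WS.
    return k1
```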
Another attractive feature of TIME is its use of hypothesis testing with correction for multiple comparisons, using the permutation test [5] to ascertain importance at three levels: overall (global), window, and ordering within the window, for each longitudinal variable. TIME uses a hierarchical false discovery rate control method [6] to address the issue of multiple comparisons in hypothesis testing.
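As a rough illustration of this testing step, the sketch below derives a one-sided permutation p-value from the loss differences computed in the earlier sketch and applies a Benjamini-Hochberg correction as a stand-in for the hierarchical FDR procedure of [6]; both functions are illustrative assumptions, not TIME's actual code.

```python
import numpy as np

def permutation_p_value(diffs):
    """One-sided, plus-one-smoothed p-value: how often did permuting the
    feature fail to increase the loss (i.e., diff <= 0)? `diffs` is the
    array returned by joint_permutation_importance() above."""
    diffs = np.asarray(diffs)
    return (np.sum(diffs <= 0) + 1) / (len(diffs) + 1)

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha (a flat BH
    correction, standing in for the hierarchical FDR method of [6])."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    # BH criterion: p_(i) <= alpha * i / m for sorted p-values.
    passing = p[order] <= alpha * (np.arange(1, m + 1) / m)
    rejected = np.zeros(m, dtype=bool)
    idx = np.nonzero(passing)[0]
    if idx.size:
        # Reject every hypothesis up to the largest rank passing the test.
        rejected[order[: idx.max() + 1]] = True
    return rejected
```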