Predicting the multi-domain progression of Parkinson’s disease: a Bayesian multivariate generalized linear mixed-effect model

Background It is challenging for current statistical models to predict clinical progression of Parkinson’s disease (PD) because of the involvement of multi-domains and longitudinal data. Methods Past univariate longitudinal or multivariate analyses from cross-sectional trials have limited power to predict individual outcomes or a single moment. The multivariate generalized linear mixed-effect model (GLMM) under the Bayesian framework was proposed to study multi-domain longitudinal outcomes obtained at baseline, 18-, and 36-month. The outcomes included motor, non-motor, and postural instability scores from the MDS-UPDRS, and demographic and standardized clinical data were utilized as covariates. The dynamic prediction was performed for both internal and external subjects using the samples from the posterior distributions of the parameter estimates and random effects, and also the predictive accuracy was evaluated based on the root of mean square error (RMSE), absolute bias (AB) and the area under the receiver operating characteristic (ROC) curve. Results First, our prediction model identified clinical data that were differentially associated with motor, non-motor, and postural stability scores. Second, the predictive accuracy of our model for the training data was assessed, and improved prediction was gained in particularly for non-motor (RMSE and AB: 2.89 and 2.20) compared to univariate analysis (RMSE and AB: 3.04 and 2.35). Third, the individual-level predictions of longitudinal trajectories for the testing data were performed, with ~80% observed values falling within the 95% credible intervals. Conclusions Multivariate general mixed models hold promise to predict clinical progression of individual outcomes in PD. Trial registration The data was obtained from Dr. Xuemei Huang’s NIH grant R01 NS060722, part of NINDS PD Biomarker Program (PDBP). All data was entered within 24 h of collection to the Data Management Repository (DMR), which is publically available (https://pdbp.ninds.nih.gov/data-management).


Background
Parkinson's disease (PD) is an age-related neurodegenerative disorder marked by dopaminergic cell loss in the substantia nigra of the basal ganglia [1,2]. In the US, nearly half a million people are living with PD and the number is expected to increase over the next decade due to aging alone [3]. Although there are effective medical and surgical procedures to treat the disease, it progresses relentlessly and causes both motor and non-motor dysfunction that lead to significant disability and decreases in quality of life [4][5][6].
Developing disease-modifying agents to slow, halt, or reverse disease progression has been a focus of PD research. Testing potential neuroprotective agents, however, is hindered by the heterogeneous clinical presentation and progression of PD. A method that can predict the longitudinal trajectories of motor and nonmotor symptoms based on information that physicians can obtain easily in a clinical setting will be useful for: 1) stratifying patients according to their progression speed for clinical trials [1], which in turn increases the power of studies; 2) consultation regarding PD prognosis in a clinical setting. Limited work exists, however, for dynamic prediction in PD due to the computational challenge in multivariate set-ups [7].
In the past, mixed-effect models were used commonly for longitudinal data analysis [8][9][10], however, these were based only on univariate analyses. Extension from a univariate to a multivariate model is straightforward mathematically under the framework of generalized linear mixed effects models (GLMM) to accommodate different types of longitudinal outcomes (continuous or discrete) by adopting different link functions (e.g., identity, logit, or log) [10,11]. Multivariate GLMM can be fitted for inference simultaneously, incorporating not only the correlation of repeated measures for each outcome within a subject, but also the association of multiple outcomes by utilizing the random effects. This method, however, has limited application for clinical data analysis, particularly in Parkinson's disease for disease progression prediction. Although the extension is relatively easy, the main practical issue is not trivial and there are several complex layers for the computation due to the numerical integration with respect to the random effects and the increasing dimensionality of parameters involved as more longitudinal outcomes are considered ( [7,12]). Several strategies have been proposed to reduce this burden. For instance, a Bayesian technique based on the Markov Chain Monte Carlo (MCMC) algorithm was applied for parameter estimation and inference instead of the likelihood approach due to its flexibility [13,14]. In addition, literature exists regarding proposals for reducing the dimensionality of parameters by considering the realization of a single latent process, which is continuous (called a "latent trait") and represents the unobserved disease severity score that combines information from multivariate longitudinal outcomes [15], and this work has been extended to joint modeling of multivariate longitudinal data and time-to-event outcomes [7]. The limitation of these studies, however, is that the association structure among multiple outcomes is restricted since only a single set of random effects accounts for the interrelationships between them. In addition, some work based on machine learning approaches (i.e., principle component analysis) has emerged for longitudinal data analysis, but has not yet been generalized to multivariate settings with mixed types [12]. Recently, Komárek and Komárková [18] conducted a data-driven clustering analysis based on multivariate continuous and discrete longitudinal data under the Bayesian framework and developed an R package "mixAK" for public usage. To implement multivariate GLMM in predicting chronic disease progression, we adopted this package for analysis because of the reliable and efficient computing performance compared to alternative computing packages/software [16,17].
To ensure our method applicable to a real clinical setting, we obtained clinical data (Common Data Elements) longitudinally from PD subjects recruited from a tertiary movement disorders clinic [18]. First, we defined three outcome measures for our study: motor, non-motor, and postural instability scores. Second, we established a multivariate GLMM for prediction based on demographic and standardized clinical data. Third, we evaluated our model's predictive accuracy and further compared its performance with univariate analyses. Lastly, the individual-level prediction of longitudinal trajectories for multiple outcomes was performed. Of note, the prediction of outcomes and their trajectories has not been investigated rigorously to date. These types of analyses may lead to a better understanding of PD heterogeneity and progression. The subjects enrolled in the study were relatively early in their disease (<10 years since their diagnosis), making them an effective cohort for identifying factors that may be amenable to early, effective treatments that may improve clinical management for PD patients. In addition, this work can interrogate questions that are clinically and transnationally important, particularly predicting clinical progression for individual patients.

Clinical and demographic data for PD longitudinal study
A PD cohort with sample size of 76 was followed longitudinally and completed comprehensive study visits at baseline, 18-, and 36-months (Huang R01 NS060722) [19]. PD patients were from a tertiary movement disorders clinic and were free of major or acute medical issues other than PD. The diagnosis of PD was made by movement disorder specialists according to published criteria [20,21]. Multi-domain clinical scales were obtained at each visit, including the Movement Disorders Society Unified PD Rating Scale (MDS-UPDRS) I-II, Hamilton Depression Rating Scale (HAMD) measuring depression, the Montreal Cognitive Assessment (MoCA) measuring global cognitive function, and the Hoehn and Yahr (HY) Scale. Levodopa equivalent daily dose (LEDD) reflecting dopaminergic drug usage was calculated at each study visit based on published conversion factors [22], and duration of illness (DOI) was defined as the time since PD diagnosis by a medical professional [18]. Besides these clinical measures, important demographic information also was collected at the baseline visit for confounding adjustment. This was a retrospective study, and no power analysis was performed.

Coding data and defining outcomes for PD longitudinal study
First, the predictor variables including time (months) since baseline visit, age at baseline (years), gender (1 = Male; 0 = Female), education (years), DOI (years), HAMD, MoCA, and LEDD were included to build up a prediction model after controlling for medication quantified by LEDD. Of note is that one point was added to the MoCA raw score if the subject's education was less than 13 years, and HAMD and LEDD were standardized before analysis [23].
We considered three major clinical outcomes: 1) the MDS-UPDRS-II score that quantified motor aspects of daily living activities using a 5-point scale (0-4). The total sum score from 13 items was treated as a continuous variable with integer scores ranging from 0 (normal) to 52 (the most severe); 2) MDS-UPDRS-I that quantified non-motor aspects of daily living activities. This also was treated as a continuous variable and used the same scale and range of scores as the MDS-UPDRS-II; 3) imbalance, which is known to mark an important functional disability regarding balance and walking. This was a binary outcome with a value of 1 if the HY scale or item 2.12 from the MDS-UPDRS-II (issues with walking and balance) was > = 3, otherwise the term was scored 0. The reason we chose the MDS-UPDRS-I and II subscales as outcome measures is that they are less rater-and drug-state dependent. In particular, the MDS-UPDRS-II was used to assess PD motor symptoms unlike other studies that focused on the MDS-UPDRS-III because of the recent finding that MDS-UPDRS-II was a better predictor for quality of life than the traditional MDS-UPDRS-III motor scale, which depends heavily on the timing of the exam, the medication status of the subject ("on," "off," or "transitional"), and the rater [24].

Statistical methods
Demographic information was summarized using the mean ± SD for continuous variables and the frequency for categorical variables. The normality assumption was investigated for continuous variables (i.e., MDS-UPDRS-I or II) based on graphical checking (e.g., histogram, QQ-plot) and the Anderson-Darling (AD) test. The pvalues for repeated measures comparisons were obtained from mixed-effect models with random intercepts and a significance level of 0.05.
We randomly split the complete dataset into two sets, the training data and the testing data. Joint modeling of multivariate longitudinal scales was performed using multivariate GLMM because of two main advantages: 1) both continuous and discrete types of outcomes can be analyzed jointly and simultaneously; 2) the correlation among repeated measures and multiple response outcomes can be incorporated into the model. The parameter estimation and inference were obtained from the Bayesian approach by utilizing the package "mixAK" in R software [17]. Then, the individual-level prediction of longitudinal trajectories for each outcome was performed. Note that the prediction methods within the training and testing dataset were different. The details of the statistical model and inference are shown next.
Let D train and D test denote a training dataset with sample size N and a testing dataset with sample size N * , where N may or may not be equal to N * and the two datasets are independent of each other. D train is used to build the prediction model and D test is used to evaluate the prediction for new subjects. For observation i in D train ,we denote Y ikj as the j th measure of the k th type of scale for the i th subject at time point t ij , which could be continuous or discrete, i = 1 , 2 , ⋯ N ; k = 1 , 2 , ⋯ K ; j = 1 , 2 , ⋯ n i . Given Y ikj following a distribution from the exponential family with the dispersion parameter ϕ k , we have the following mean model where X ikj is the p-dimentional covariate for fixed effects with β k as the associated p × 1 vector of parameters, Z ikj is the q-dimentional covariate for random effects with γ ik as the associated q × 1 vector of parameters, g is a monotone link function depending on the type of outcomes (e.g., identity function for continuous outcomes, logit function for binary outcomes, and log function for count outcomes), and the random effects Note that Σ not only takes into account the correlation of repeated measures of each outcome, but also incorporates the association between multiple outcomes. In addition, this is a hierarchically centered GLMM [25], thus X ikj and Z ikj may not contain the same variables due to the identifiability problem. Of note, Y i ⊥ Y j , γ i ⊥ γ j for i ≠ j and given γ ik ,the random variables Y ik1 ; Y ik2 ; ⋯Y ikn i are independent of each other for the i th subject. Given that the parameter vector is Here, we utilize the Bayesian approach based on MCMC for parameter estimation, inference, and prediction. The vague priors are used for all elements in Θ, and M (i.e., M = 2000 after burn in) posterior samples are obtained for the parameters and random effects de- The procedures for prediction are performed as follows: I. Internal prediction for PD subjects using the training dataset For the l th PD subject in the training set D train (l = 1, 2, ⋯N), we aim to predict the measures at a future time point t ′ given the outcome history The prediction can be achieved by plugging in the estimates of the parameters and random effects from the posterior samples. For a continuous outcome, the predicted value based on the m th posterior sample is . Thus, the predicted estimate is shown as beloŵ Similar procedures can be followed for the other types of outcomes.

II. External prediction of PD subjects using the testing dataset
For the l th new PD subject in the testing dataset D test (l = 1, 2, ⋯N * ), we also aim to predict the measures at a future time point t ′ given the outcome history Y l t ′ ð Þ and the covariate history X l t ′ ð Þ defined the same as above, and the expectation can be calculated with respect to the posterior distribution of the parameters f(Θ| D train ). The key issue is to obtain the estimate of random effects [26] as shown below The sample of random effects can be drawn from the above posterior distribution after replacing Θ by Θ (m) , denoted by γ m ð Þ l for m = 1 , 2 , ⋯ M. The procedures to obtain the predictive measureŶ ik t ′ ð Þwill be the same as above.
In addition, univariate longitudinal analyses were conducted with a different specification in Σ, where the correlation among outcomes was assumed to be 0.
Finally, we evaluated the predictive accuracy for continuous outcomes by using the statistical criteria, the root mean square error (RMSE) and the absolute bias (AB), with less values indicating better goodness-of-fit [27,28] and defined by where N is the sample size, and n i is the number of repeated observations for the i th subject. With regards to binary outcomes, we evaluated the predictive accuracy by assessing the discrimination ability using the receiver operating characteristic (ROC) curve with varied threshold on the predictive probabilities. The area under the curve (AUC) can be calculated, and the comparison between the AUCs can be tested based on the DeLong's method [29].

Multivariate GLMM under Bayesian framework for PD longitudinal study
To evaluate the prediction ability for internal (existing) and external (new) subjects, we randomly selected the majority of PD patients (N = 70) as the training data to develop the prediction model and avoid convergence issues, and the remaining PD patients (N = 6) as the testing data due to the relatively small overall sample size. Provided that three outcomes for the i th subject at the j th time visit, MDS-UPDRS-II (denoted by Y i1j ), MDS-UPDRS-I (denoted by Y i2j ), and imbalance status (denoted by Y i3j ), the multiva: where only random intercepts were considered and the mechanism of missing at random is assumed. We ran the MCMC algorithm for 4000 burn-in and 2000 subsequent iterations with 1:10 thinning to get samples from the joint posterior distribution (two sampled chains with different sets of initial values using the function "GLMM_MCMC"). Under default options in the R package of "mixAK," the shift vector and the scale matrix were pre-specified. The prior distributions for the shifted-scaled means and the β parameters were normal distributions, the inverse of the covariance matrix followed a Wishart distribution, and the dispersion parameter ϕ was a gamma distribution, which are noninformative.

Baseline and longitudinal demographic and clinical data
The summary results for all PD subjects are provided in Table 1. For the overall PD cohort, the mean ± SD for age at baseline and education years were 63.2 ± 8.4 and 14.9 ± 2.7 years, respectively. Subjects were 61.8% male, although only 57.4% of the males remained at the 36month follow-up visit. There was less attrition for females, whose rate of return at the 36-month visit was 86.2%. At baseline, the mean ± SD disease duration was 5.1 ± 5.5 years (range 0.1-25.3 years). Two subjects had missing values for item 2.12 on the MDS-UPDRS-I subscale at baseline. Of the remaining 74 subjects, 4 had balance problems at baseline (item 2.12 ≥ 3), 5 had balance problems at the 18-month visit, and 7 did at the 36-month visit. There was a significant trend for MoCA scores to decrease (p-values = 0.03), along with a significant increase in LEDD values (p-value < 0.001), over the 36-month study period. The MDS-UPDRS-I and II scores were not significantly different at the three study visits. Imbalance, however, had a significant trend of improvement over time, with the highest rate at the 18month visit that then decreased at the 36-month visit, possibly due to informative dropouts (e.g., people who have balance issues may have a harder time continuing in the study) or inaccuracy of coding (see more discussion in the limitation section). The subject-level trajectories of the MDS-UPDRS-I and II (treated as continuous variables), and the percentages of imbalance over time (binary variable), were shown in Fig. 1. The normality assumptions were satisfied with pvalues = 0.07 for MDS-UPDRS-I and 0.18 for MDS-UPDRS-II based on the AD tests. Of note is that there was substantial heterogeneity across PD subjects and thus mixed-effect models with subject-level random effects (e.g., a random intercept) were adopted for model fitting.

Impact of demographic and clinical data on outcome measures
The estimation results were summarized by the median and 95% credible intervals from the posterior distribution based on the multivariate GLMM under the Bayesian framework, which are shown in Table 2. The HAMD and DOI were associated positively with the MDS-UPDRS-II motor scores of daily living, whereas MoCA scores were associated negatively. Higher HAMD scores were associated positively with MDS-UPDRS-I, but education was associated negatively with this outcome measure of the non-motor aspects of daily living. Also, there was a significant temporal trend for the risk of imbalance that was consistent with the initial analysis of the data (Table 1). MoCA scores also were associated significantly and negatively with imbalance. Higher LEDD values appeared to significantly improve imbalance but did not affect MDS-UPDRS-I or II subscale scores. Interestingly, there also was a gender difference regarding imbalance, where male subjects were less likely to have imbalance problems. We also investigated the associations between MoCA and various factors including age at baseline, education, gender, DOI, HAMD, and LEDD based on a linear mixed-effect model and found no significant effects except for age at baseline and gender. This implies that the significant effect of MoCA on the outcome measures was not due to collinearity or confounding issues from the other factors (i.e., education, gender, DOI, HAMD, and LEDD). Using the multivariate GLMM method, the correlation estimate between the MDS-UPDRS-II and I was 0.55, which seemed smaller than the empirical estimate of 0.66 reported by He et al. [11], and this could be due to the fact that we adjusted for potential confounding variables in our current model, whereas He et al. did not.

Prediction accuracy of the GLMM and comparison with univariate models
The predictive accuracy, RMSE and AB, were 2.01 and 1.53 for the MDS-UPDRS-II and 2.89 and 2.20 for the MDS-UPDRS-I, respectively, and also ROC-AUC was 0.992 for the imbalance outcome, using the GLMM under the Bayesian framework. For comparison, similar strategies were applied for the data but assuming zero correlations among three types of outcomes, and the RMSE and AB for the MDS-UPDRS-II were 1.98 and 1.52, and 3.04 and 2.35 for the MDS-UPDRS-I, and also ROC-AUC was 0.996 (p = 0.29). It showed that the multivariate GLMM had improved performance in prediction particularly for MDS-UPDRS-I compared to univariate models; however, a larger trial or cohort study is needed for further validation.

Individual-level predictions based on the multivariate GLMM
Lastly, we conducted subject-level predictions based on our multivariate GLMM for the testing data that included 6 subjects. We conducted prediction of each measure for each subject at the 18-month and 36-month visits shown in Fig. 2, and the majority (~80%) of the 95% credible intervals included the true observed values. Table 3 shows the results for one PD subject randomly selected from the testing cohort including the prediction at 18-and 36-month visits using only baseline data, and also at the 36-month visit alone by incorporating both baseline and18-month data. As expected, the inclusion of additional information (e.g., both the baseline and 18month visits) improved the prediction (less bias) and decreased the 95% credible intervals.

Discussion
Similar to many chronic diseases, PD presents with heterogeneous symptoms that may coexist and progress at different rates. Thus, identifying factors to predict disease progression in multiple domains for this population is challenging due to the limited number of longitudinal studies using uniform data collection procedures and few advanced statistical approaches. In the current study, we sought to identify critical factors associated with PD progression while accounting for the heterogeneous and longitudinal nature of motor and non-motor symptoms by using NIH Common Data Elements (Table 1) developed for PD research. We then adopted a joint multivariate GLMM to establish a prediction model for assessing the longitudinal progression of motor and non-motor aspects of daily living activities and balance.
The present results demonstrated that the HAMD and DOI significantly impacted MDS-UPDRS-II subscale scores, with higher HAMD scores and increased DOI leading to higher MDS-UPDRS-II scores. MoCA had a significant negative effect on this outcome measure, indicating that deteriorating cognitive function resulted in increased difficulties with motor aspects of daily living activities. The HAMD and education significantly affected MDS-UPDRS-I subscale scores such that subjects with more severe depression and less education were more likely to report increased difficulties with non-motor aspects of daily living activities. The current findings make intuitive sense in a clinical setting and are consistent with previous literature reporting that disease duration, severe depressive symptoms, and cognitive impairment negatively impact PD patients overall ( [30][31][32][33]). For example, depression previously was reported as a significant predictor of non-motor aspects of daily living activities [32]. In addition, DOI was a significant predictor of motor aspects of daily living activities (i.e., MDS-UPDRS-II) [31].
The negative association of total MoCA scores not only with motor symptoms (MDS-UPDRS-II) but also imbalance is intriguing. In the past, cognitive functions (performance on frontal-executive tasks or global cognitive functions assessed by total MMSE or MoCA scores) were reported to be correlated with symptoms assessed by MDS-UPDRS total scores or sub-scores (i.e., bradykinesia or postural instability) ( [33][34][35][36]). These past studies, however, were based mainly on simple correlation analyses using cross-sectional data with limited variables of interest. The current finding of a significant association between global cognitive function and MDS-UPDRS-II and imbalance using longitudinal multivariate modeling confirms the importance of cognitive ability in quality of life and clinically-important outcomes. Most importantly, the differential contribution of the common clinical variables (DOI, depression, and MoCA) on MDS-UPDRS-II and I provides a foundation for models to predict the individual trajectory of motor and/or nonmotor symptoms.
In our study, we established a prediction model of PD progression based on multivariate longitudinal outcomes by considering important demographic and clinical risk factors, and conducted both internal (the training dataset) and external (the testing dataset) predictions for evaluation. As shown in Fig. 2 and Table 3, the model led to a well-accepted prediction for external individual patients with the majority (~80%) of predictive credible intervals including the observed values. In addition, the bias for prediction of the MDS-UPDRS-II is small; however, the prediction bias at the 36-month visit for the MDS-UPDRS-I is relatively large but tends to get smaller when more information is incorporated (i.e., a combination of the baseline and 18-month data). It is interesting to note that the variability for the MDS-UPDRS-I is larger than that for the MDS-UPDRS-II (i.e., the mean of individual-level standard deviations for MDS-UPDRS-I and II: 3.61 and 2.78) since the MDS-UPDRS-I is a composite score of multiple non-motor symptoms that are more subjective. The current approach introducing a Bayesian framework gains more power for prediction by incorporating the correlations not only among repeated measures, but also among multiple outcomes. Demonstrating the ability of our model to predict these outcomes simultaneously is a novel finding. Compared to univariate longitudinal analysis, our method achieved non-inferior predictive accuracy. In particular, borrowing information from motor symptoms and imbalance can improve the prediction of non-motor symptom progression from using only a subject's own history/observed data. This work has several limitations: 1) at the time of the current data analysis, the data collection time period was up to 36-months and included only three visits, which may have contributed to the lack of a clear, significant, temporal trend for clinical progression. A longitudinal cohort with well-characterized clinical features and measures of progression over a longer follow-up time-frame is important for us to build a prediction  a Each prediction for the 36-month outcome measure was based on baseline data alone b a combination of the baseline and 18-month data model that will increase in accuracy.
2) The drop out rate was >20% from baseline to the 36-month visit, and preferentially male. The differential drop out may influence our results and ability to predict individual outcomes. In addition, although postural instability is a major clinical feature, our outcome measurement consisted only of very limited information from a single source (MDS-UPDRS-II, item 2.12). This limitation may contribute to the insensible finding of "improved balance" over time, and low prediction accuracy in this domain. 3) Identifying the baseline risk factors that can predict PD progression continues to be a work in progress. For the current study, we used some of the features collected under the NIH CDE for PD. Future models should integrate new information as state-ofthe-art biomarker discoveries yield more information (e.g., from fluid and imaging research). 4) The linear temporal trend commonly is considered for the current analysis (see the fitted models above), but this would be less plausible in practice. A flexible strategy can be applied to relax it by adopting a spline regression in a semi-parametric framework but at the expense of more computation load.

Conclusions
In the current study, we applied multivariate generalized linear mixed effect models that incorporated not only the correlation among repeated measurements but also the correlations among multiple outcome variables to predict disease progression. This is the first study to conduct a prediction study of PD progression by considering multiple, longitudinal, and clinically meaningful outcomes (i.e., MDS-UPDRS subscales II, I, and imbalance). Using the real clinical data collected via NIH CDEs, the predictive accuracy of joint multivariate modeling was evaluated and compared to univariate modeling. The non-inferiority of the joint multivariate modeling with regards to bias and RMSE compared to univariate modeling demonstrated the promise of the proposed model for predicting PD progression in a clinical setting and/or for subject selection into clinical trials.