Study design and setting
This was a multicenter, multiregional, longitudinal observational study carried out at 32 centers in 12 countries: Argentina, Australia, Austria, Germany, Spain, France, Israel, Italy, Norway, Turkey, the United Kingdom, and the United States (Additional file 1: Table S1).
The inclusion criteria were as follows: patients aged 18 years or older with relapsing-remitting multiple sclerosis (RR-MS) according to the McDonald criteria [18, 19]; an Expanded Disability Status Scale (EDSS) score lower than 7.0; with or without treatment; followed up according to local standard-of-care practices; and with a signed informed consent form. Patients suffering from dementia were excluded. All therapeutic decisions during the study were made at the discretion of the treating physician.
Ethics committee and regulatory requirements
This study (ClinicalTrials.gov identifier: NCT00702065) was performed in accordance with the Declaration of Helsinki and all applicable regulatory authority requirements and national laws (Institutional Review Board or Independent Ethics Committee in accordance with the local requirements of each of the 12 countries). Written informed consent from patients was obtained prior to any study procedures.
Evaluation times and data collection
The follow-up measurements took place over 24 months after inclusion. At baseline, sociodemographic (age at inclusion, gender, education level, marital status, employment status) and clinical (disease duration) data were obtained. Neurological disability status was assessed using a neurologist-rated EDSS score. QoL was determined using the MusiQoL and SF-36 questionnaires when patients attended their local neurological clinic. The MusiQoL questionnaire is a self-administered, multi-dimensional, patient-based QoL instrument comprising 31 items that describe nine dimensions (activities of daily living, psychological well-being, relationships with friends, symptoms, relationships with family, relationship with the healthcare system, sentimental and sexual life, coping, and rejection). MusiQoL provides a global index score, which is calculated as the mean of the individual dimension scores. The SF-36 is composed of 36 items that are used to calculate the following eight scale scores: physical functioning (PF), social functioning (SF), role–physical (RP), role–emotional (RE), mental health (MH), vitality (Vi), bodily pain (BP), and general health (GH). Two composite summary measures are also calculated: the Physical Component Summary (PCS) and the Mental Component Summary (MCS) scores. The PCS and MCS scores are norm-based, using a linear T-score transformation with a mean of 50 and a standard deviation (SD) of 10. Both the MusiQoL and SF-36 yield scores on a 0–100 scale, in which 0 represents the lowest and 100 the highest QoL.
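As an illustration of the two scoring rules mentioned above, the following minimal Python sketch computes a global index as the mean of the dimension scores and applies a linear T-score transformation. The dimension values, the norm mean, and the norm SD are hypothetical; the sketch assumes that all nine dimension scores are available and that the T-score uses the conventional mean of 50 and SD of 10. It is not the official scoring algorithm of either questionnaire.

```python
from statistics import fmean


def global_index(dimension_scores):
    """Global index taken as the plain mean of the dimension scores (0-100)."""
    return fmean(dimension_scores.values())


def t_score(raw, norm_mean, norm_sd):
    """Linear T-score: rescale so the reference population has mean 50, SD 10."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Hypothetical dimension scores for one patient (not study data).
scores = {
    "activities of daily living": 60, "psychological well-being": 70,
    "relationships with friends": 80, "symptoms": 50,
    "relationships with family": 90, "relationship with healthcare system": 40,
    "sentimental and sexual life": 55, "coping": 65, "rejection": 75,
}
index = global_index(scores)

# Hypothetical reference mean and SD, chosen only for illustration.
example_t = t_score(60, norm_mean=50, norm_sd=20)
```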
Every 6 months up to month 24, the EDSS and QoL were recorded: at baseline (M0), 6 months (M6), 12 months (M12), 18 months (M18), and 24 months post-inclusion (M24).
Definition of disability deterioration
At 24 months, individuals were divided into two ‘disability change’ groups according to the following neurological standards [23, 24]: (1) worsened patients, who experienced a clinically meaningful worsening in the EDSS between baseline and 24 months, defined as an increase of one point if the EDSS was less than 5.5, or of half a point if the EDSS was between 5.5 and 7.0; and (2) not-worsened patients, comprising all other cases.
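The grouping rule can be written as a short Python sketch. Two readings are assumed here and are not stated explicitly in the definition: the 5.5 threshold is applied to the baseline score, and "an increase of one point" (or half a point) is read as "at least" that amount. The patient identifiers and scores are hypothetical.

```python
def is_worsened(edss_baseline, edss_m24):
    """Return True if the baseline-to-M24 EDSS change counts as worsening.

    Assumptions: the 5.5 threshold refers to the baseline EDSS, and the
    required increase is a minimum (>=), not an exact value.
    """
    required = 1.0 if edss_baseline < 5.5 else 0.5
    return (edss_m24 - edss_baseline) >= required


# Hypothetical (baseline, M24) EDSS pairs.
patients = {"A": (2.0, 3.0), "B": (2.0, 2.5), "C": (6.0, 6.5), "D": (4.0, 4.0)}
groups = {
    pid: "worsened" if is_worsened(base, m24) else "not-worsened"
    for pid, (base, m24) in patients.items()
}
```

EDSS scores move in half-point steps, which are exactly representable as floating-point numbers, so the comparison needs no tolerance.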
The not-worsened group was used as a control group in the analysis under the assumption that they were not prone to response shifts in perceived QoL.
Classification and regression trees
The Classification and Regression Trees (CART) method is a binary splitting method that recursively partitions the data set into disjoint subgroups, called leaves. It uses two algorithms. The first algorithm iteratively splits the data set into two sub-samples according to a binary rule such as “PCS < 50”. The splitting rule is based on one of the explanatory variables and on a threshold for this variable. For a continuous outcome, the rule is chosen so as to minimize the heterogeneity of the resulting sub-samples. Regression trees are constructed using the “deviance” criterion.
The two obtained sub-samples are then recursively partitioned in the same way until there are too few observations (usually five) in the obtained samples (other stopping rules are available). This procedure yields a tree that may have too many terminal nodes. The mean value of the output variable is assigned to each leaf, computed over the observations within the corresponding region.
To avoid overfitting the data when using this tree, a pruning algorithm is used to select an optimal sub-tree.
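The splitting rule used by the first algorithm can be illustrated with a minimal Python sketch for a single explanatory variable. Deviance is taken here as the sum of squared deviations from the sub-sample mean, and candidate thresholds are midpoints between consecutive distinct values; both are common conventions assumed for illustration. The data are hypothetical, and recursion, stopping rules, and pruning are left out.

```python
from statistics import fmean


def deviance(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold t for the rule "x < t" that minimizes total deviance.

    Returns (threshold, total_deviance); threshold is None if no split
    improves on the unsplit sample.
    """
    xs = sorted(set(x))
    best = (None, deviance(y))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        total = deviance(left) + deviance(right)
        if total < best[1]:
            best = (t, total)
    return best


# Hypothetical toy data: PCS scores and a global QoL index.
pcs = [30, 35, 40, 55, 60, 65]
qol = [40, 42, 41, 70, 72, 71]
threshold, total_dev = best_split(pcs, qol)
```

A full tree would apply `best_split` to every explanatory variable, keep the best rule, and repeat on each of the two resulting sub-samples.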
The random forest method
Random Forests (RF) is an ensemble method that aggregates K trees similar to the ones constructed with CART, each one grown using a bootstrap sample of the original data set. At each node, each tree in the forest considers only a random subset of the explanatory variables. The trees are not pruned. When regression trees are used, the prediction given by an RF is the mean of the predictions given by the K trees in the forest.
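Two ingredients of this description, the random subset of variables considered at a node and the averaging of tree predictions, can be sketched in a few lines of Python. The parameter name `mtry`, the stand-in trees, and the example values are hypothetical; fitting the trees themselves is not shown.

```python
import random
from statistics import fmean


def candidate_variables(variables, mtry, rng):
    """Draw the random subset of variables examined at one node."""
    return rng.sample(variables, mtry)


def forest_predict(trees, row):
    """Regression forest prediction: the mean of the individual tree predictions."""
    return fmean(tree(row) for tree in trees)


variables = ["PF", "SF", "RP", "RE", "MH", "Vi", "BP", "GH"]
subset = candidate_variables(variables, mtry=3, rng=random.Random(0))

# Stand-ins for fitted trees: each maps a row to a predicted score.
trees = [lambda row: 60.0, lambda row: 66.0, lambda row: 72.0]
prediction = forest_predict(trees, row={})
```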
As the trees in the forest are developed using bootstrap samples of the original data set, the Out-of-Bag (OOB) samples, that is, the observations left out of each bootstrap sample, are used as test samples. The performance of each tree is computed over the corresponding OOB sample. The observations of each variable in the OOB sample are then randomly permuted, and the trees’ performance is computed over the perturbed OOB samples. A variable's importance (VI) is defined as the mean relative decrease in the trees’ performance when the observations of this variable in the OOB sample are randomly permuted. To obtain more stable assessments of each VI, we ran the RF K=300 times and used the average VI over the K runs.
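The OOB permutation procedure can be sketched in Python as follows. To keep the example short, each "tree" is a deliberately trivial stand-in (a single fixed split, "pcs < 50", with leaf means estimated on the bootstrap sample) rather than a full regression tree, performance is measured by the mean squared error, and the data are hypothetical. Averaging over repeated forest runs is not shown.

```python
import random
from statistics import fmean


def mse(predict, rows, targets):
    """Mean squared error of `predict` over the given rows."""
    return fmean((predict(r) - t) ** 2 for r, t in zip(rows, targets))


def fit_stump(rows, targets):
    """Toy stand-in for a tree: fixed rule "pcs < 50", leaves predict in-bag means."""
    overall = fmean(targets)
    left = [t for r, t in zip(rows, targets) if r["pcs"] < 50]
    right = [t for r, t in zip(rows, targets) if r["pcs"] >= 50]
    left_mean = fmean(left) if left else overall      # empty leaf: fall back
    right_mean = fmean(right) if right else overall
    return lambda r: left_mean if r["pcs"] < 50 else right_mean


def oob_importance(rows, targets, variables, n_trees, rng):
    """Mean relative increase in OOB error after permuting each variable."""
    n = len(rows)
    scores = {v: [] for v in variables}
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        oob = sorted(set(range(n)) - set(in_bag))       # rows never drawn
        if not oob:
            continue
        predict = fit_stump([rows[i] for i in in_bag], [targets[i] for i in in_bag])
        oob_rows = [rows[i] for i in oob]
        oob_targets = [targets[i] for i in oob]
        base = mse(predict, oob_rows, oob_targets)
        if base == 0:
            continue  # relative change is undefined for a zero baseline error
        for v in variables:
            column = [r[v] for r in oob_rows]
            rng.shuffle(column)                          # permute one variable
            permuted = [{**r, v: x} for r, x in zip(oob_rows, column)]
            scores[v].append((mse(predict, permuted, oob_targets) - base) / base)
    return {v: (fmean(s) if s else 0.0) for v, s in scores.items()}


# Hypothetical data: the target depends on "pcs" only; "noise" is irrelevant.
rows = [{"pcs": 30 + 2 * i, "noise": (7 * i) % 5} for i in range(20)]
targets = [(40 if r["pcs"] < 50 else 70) + (1 if i % 2 else -1)
           for i, r in enumerate(rows)]
vi = oob_importance(rows, targets, ["pcs", "noise"], n_trees=50, rng=random.Random(0))
```

Because the stand-in trees never read the "noise" variable, permuting it leaves the OOB error unchanged and its importance is zero, whereas permuting "pcs" moves observations to the wrong leaf and increases the error.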
Detecting response shift reprioritization with random forest
We investigated the importance of different explanatory variables in predicting the global MusiQoL index. To do this, we calculated the VI by the RF method based on two models.
M(2) is more refined than M(1). We fitted these two models separately for the worsened group and the not-worsened group at each time point t=0,…,4 (M0 to M24). In this way, we obtained, for each explanatory variable, the average VI (AVI) and its evolution over time. We compared the evolution of the AVI for each variable in the two groups. Crossing curves were interpreted as an effect of reprioritization.
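Whether two AVI curves cross can be checked with a small Python helper that looks for a sign change in their difference between consecutive time points. The helper does not care which two curves are passed in, the AVI values below are hypothetical, and curves that merely touch (a difference of exactly zero) are not counted as crossing in this sketch.

```python
def crossing_points(curve_a, curve_b):
    """Return the indices i at which the curves swap order between t=i and t=i+1."""
    diffs = [a - b for a, b in zip(curve_a, curve_b)]
    return [i for i in range(len(diffs) - 1) if diffs[i] * diffs[i + 1] < 0]


# Hypothetical AVI values at t = 0, ..., 4 (M0 to M24).
avi_first = [0.30, 0.28, 0.25, 0.20, 0.18]
avi_second = [0.20, 0.22, 0.24, 0.26, 0.27]
crossings = crossing_points(avi_first, avi_second)
```

With these values the first curve lies above the second up to t=2 and below it from t=3 onward, so a single crossing is reported between t=2 and t=3.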
To control for the difference in baseline EDSS scores between the worsened and not-worsened groups, supplementary analyses were performed on groups matched on baseline EDSS score (100 worsened patients and 100 not-worsened patients).