Prediction of two month modified Rankin Scale with an ordinal prediction model in patients with aneurysmal subarachnoid haemorrhage

Background: Aneurysmal subarachnoid haemorrhage (aSAH) is a devastating event with a frequently disabling outcome. Our aim was to develop a prognostic model to predict an ordinal clinical outcome at two months in patients with aSAH.
Methods: We studied patients enrolled in the International Subarachnoid Aneurysm Trial (ISAT), a randomized multicentre trial to compare coiling and clipping in aSAH patients. Several models were explored to estimate a patient's outcome according to the modified Rankin Scale (mRS) at two months after aSAH. Our final model was validated internally with bootstrapping techniques.
Results: The study population comprised 2,128 patients, of whom 159 (8%) died within two months. Multivariable proportional odds analysis identified World Federation of Neurosurgical Societies (WFNS) grade as the most important predictor, followed by age, sex, lumen size of the aneurysm, Fisher grade, vasospasm on angiography, and treatment modality. The model discriminated moderately between those with poor and good mRS scores (c statistic = 0.65), with minor optimism according to bootstrap re-sampling (optimism-corrected c statistic = 0.64).
Conclusion: We presented a calibrated and internally validated ordinal prognostic model to predict two month mRS in aSAH patients who survived the early stage up to a treatment decision. Although generalizability of the model is limited due to the selected population in which it was developed, this model could eventually be used to support clinical decision making after external validation.
Trial Registration: International Standard Randomised Controlled Trial, Number ISRCTN49866681.



Background
Prediction research typically aims to predict outcome of individual patients after the onset of a certain disease, using prognostic models. These models, preferably based on data directly available at hospital admission, are essential to support clinical decision making, and to facilitate reliable comparison of outcomes between different patient series and variation in results over time. Furthermore, prognostic models have an important role in randomized controlled trials (RCT), for stratification [1] and statistical analyses that explicitly consider prognostic information, such as covariate adjustment [2,3], and may provide realistic and evidence-based expectations to relatives.
The majority of published prognostic models predict a binary outcome, such as case-fatality, using binary logistic regression [4-7]. Outcomes measured on ordinal scales are also often analysed as a dichotomized variable. However, there are several objections to collapsing an ordinal outcome scale into a binary one. First, the cut-off for dichotomisation is arbitrary and may vary between studies within a single medical field [4,5,7]. Secondly, from a statistical perspective dichotomisation wastes information and reduces statistical power for the analysis of treatment effects or other covariates of interest [8,9]. Furthermore, from a clinical point of view dichotomisation may lead to less useful models. For example, for a patient with a minor stroke a model predicting survival versus mortality is of limited value since the risk of death is low, while a prediction of complete recovery versus some remaining symptoms may be very useful. For a patient with a severe stroke, the reverse holds.
An alternative to dichotomisation is a statistical approach that uses the full ordinal outcome scale. This leads to efficient use of the data and clinically relevant predictions. Several approaches for modelling ordinal response variables have been proposed, including proportional odds (PO) logistic regression, multinomial (or polytomous) logistic regression, and simple linear regression [10]. Each of these methods has its pros and cons.
Our aim was to develop an ordinal prognostic model to predict clinical outcome at two months in patients with aneurysmal subarachnoid haemorrhage (aSAH), based on clinical features and neuro-imaging findings that are routinely available on admission to a neurological or neurosurgical unit. SAH is a devastating event, causing substantial mortality. In 85% of patients, the SAH is caused by rupture of an aneurysm (aSAH) [11,12]. Of those who survive the first month, approximately one third remain dependent in daily activities for the rest of their lives [11]. Even among patients who regain independence, quality of life remains reduced [13]. A frequently used outcome measure is the modified Rankin Scale (mRS) [14]. This is an ordered scale for measuring motor function that runs from 0 (no symptoms at all) to 6 (dead) (table 1).

Patients
Data were collected prospectively by the Medical Research Council funded International Subarachnoid Aneurysm Trial (ISAT) (International Standard Randomised Controlled Trial, Number ISRCTN49866681). All centres obtained local ethics or institutional review board consent before enrolling patients (see Appendix 1). Able patients provided written informed consent. However, some ethics committees allowed assent from relatives to enable patients who could not give their own written consent to be enrolled in the trial. Full details of ISAT are available elsewhere [15]. The aim of the trial was to determine whether treatment using endovascular coiling reduced the risk of patients being dependent or dead at one year by 25 percent (as defined by modified Rankin Scale grade 3-6) when compared with neurosurgical treatment (clipping) for that cohort of patients.

Predictors and outcome
We considered all patient characteristics that could be obtained easily and reliably within the first hours after hospital admission and that were also present in the ISAT database. These included age, gender, previous occurrence of SAH, CT scan Fisher grading, World Federation of Neurosurgical Societies (WFNS) grading, number of intracranial aneurysms, location of the aneurysm, maximum lumen size of the aneurysm, vasospasm on angiography, and intended treatment at randomization. Fisher grading of blood visible on a plain CT scan runs from grade 1 ("no blood visible") up to grade 4 ("intraventricular or intraparenchymal blood"). WFNS scale runs from grade 1 ("Glasgow Coma Scale (GCS) 15 and no motor deficit") to grade 5 ("GCS 3-6 with or without motor deficit"). One additional grade was created in ISAT for those in whom WFNS could not be assessed; 'grade 6'. The number of aneurysms was dichotomized into one or more than one intracranial aneurysms. Four aneurysm locations were distinguished: Anterior Cerebral Artery (ACA), Internal Carotid Artery (ICA), Middle Cerebral Artery (MCA), and Posterior Circulation (PC). The maximum lumen size of the aneurysm was expressed in millimetres. Vasospasm was examined on angiography and dichotomized into 'absent' or 'present'. Treatment was either neurosurgical clipping or endovascular coiling; we used treatment as allocated by the randomization procedure. The outcome measure in our study was the modified Rankin Scale (mRS) at two months (table 1) [14].

Model
We started the development of the model by discarding patients without information on outcome. The few missing values in predictors were imputed by means of single imputation (SI; in R: aregImpute with n.impute = 1 and type = 'pmm').
Table 1. Modified Rankin Scale (mRS), grades 2 to 6
Grade 2: Slight disability; unable to carry out all previous activities, but able to look after own affairs without assistance
Grade 3: Moderate disability; requiring some help, but able to walk without assistance
Grade 4: Moderately severe disability; unable to walk without assistance and unable to attend to own bodily needs without assistance
Grade 5: Severe disability; bedridden, incontinent and requiring constant nursing care and attention
Grade 6: Dead

A simple approach to analyze an ordinal outcome, such as the mRS, is to dichotomize the outcome variable at one of several possible cut-off points, e.g. 01 vs. 23456 [5], 012 vs. 3456 [4,15], 0123 vs. 456 [7], and 012345 vs. 6 (case-fatality) [6]. We applied binary logistic regression to develop models for these dichotomized responses. Next, we addressed the two main aspects of our ordinal outcome: the fact that it is ordered and the fact that it consists of separate categories. A simple solution for modelling order, while neglecting the categorical nature of the outcome, is to apply linear regression using ordinary least squares. For the opposite (modelling categories while neglecting order), we used multinomial regression.
A more sophisticated approach is to use a proportional odds (PO) model, which takes both order and separate categories into account. The PO logistic model is a rather straightforward extension of binary logistic regression [16]. A common set of regression coefficients is assumed across all levels of the outcome, and a separate intercept is estimated for each level. The advantage of the PO model is its parsimony in dealing with an ordered outcome. The price we pay is the assumption of proportionality of the odds. This assumption is equivalent to saying that any cut-point on the outcome scale would lead to the same (binary) logistic regression coefficient [10].
We inspected proportionality by studying the univariate odds ratios for each cut-off and each predictor. We also plotted the score residuals of binary logistic models for each potential predictor separately. The trend of the score components against the levels of the outcome scale should be flat if the proportional odds assumption holds [17]. When the PO assumption is not fulfilled for all potential predictors, a further alternative is the partial PO model [18], which we also investigated.
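The structure of the PO model described above can be illustrated with a minimal sketch. It assumes the convention logit P(Y >= j) = a_j + lp, where a_j is the intercept for cut-point j and lp is the linear predictor shared by all cut-points. The function names and all numeric values are hypothetical and are not the fitted ISAT model.

```python
import math

def po_probabilities(intercepts, lp):
    """Return P(Y = k) for k = 0..J under a proportional odds model.

    intercepts[j - 1] is the intercept for the cut-point Y >= j and must be
    strictly decreasing; lp is the linear predictor (beta . x), which is the
    same for every cut-point. Convention assumed: logit P(Y >= j) = a_j + lp.
    """
    # Cumulative probabilities P(Y >= j) for j = 1..J
    cumulative = [1.0 / (1.0 + math.exp(-(a + lp))) for a in intercepts]
    # P(Y >= 0) = 1 and P(Y >= J + 1) = 0 close the scale at both ends
    bounds = [1.0] + cumulative + [0.0]
    # Category probabilities are differences of adjacent cumulative ones
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

def cumulative_odds(intercepts, lp):
    """Odds of Y >= j versus Y < j at each cut-point j."""
    return [math.exp(a + lp) for a in intercepts]

# Hypothetical intercepts for the six mRS cut-points (not the ISAT estimates)
INTERCEPTS = [2.0, 0.5, -0.5, -1.5, -2.0, -2.5]

if __name__ == "__main__":
    for lp in (0.0, 0.7):
        probs = po_probabilities(INTERCEPTS, lp)
        print(lp, [round(p, 3) for p in probs])
```

Under this model two patients whose linear predictors differ by 0.7 have the same odds ratio, exp(0.7), at every cut-point, which is exactly the proportionality assumption that the cut-off-specific odds ratios and the score residual plots are meant to check.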
The association between predictors and outcome is expressed as odds ratios (OR). Predictors have statistically significant effects when the 95% confidence interval does not include the value one.
A multivariable PO model was developed containing predictors that met Akaike's Information Criterion (AIC) in a backward stepwise procedure [19]. AIC compares models based on how well they fit the data, but penalizes for the complexity of the model. AIC requires that the increase in model χ² when entering a new predictor is larger than two times the degrees of freedom (df): χ² > 2 df. When considering a predictor with 1 df, such as gender, this implies that χ² has to exceed 2, equivalent to p < 0.157. When considering a predictor with 2 df, χ² > 4, or p < 0.135; and in case of 4 df, χ² > 8, or p < 0.092 [10].
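The p-value thresholds implied by the AIC rule can be checked with a short calculation. The sketch below uses closed-form expressions for the upper tail of the χ² distribution (valid for 1 df and for even df only); the function name chi2_upper_tail is a hypothetical helper, not part of any package used in the study.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with `df` degrees of freedom > x).

    Closed forms are used, so only df = 1 and even df are supported.
    """
    if df == 1:
        # A chi-square variable with 1 df is a squared standard normal
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        terms = (half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * sum(terms)
    raise ValueError("this sketch handles only df = 1 or even df")

if __name__ == "__main__":
    for df in (1, 2, 4):
        # AIC retains a predictor when its chi-square exceeds 2 * df
        print(df, round(chi2_upper_tail(2 * df, df), 3))
    # prints: 1 0.157, 2 0.135, 4 0.092 (one pair per line)
```

The thresholds become stricter as the degrees of freedom increase, so a predictor with many categories needs a smaller p-value than a binary one to be retained.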

Performance
The performance of the final PO model was assessed with respect to calibration and discrimination. Calibration is the ability of the model to produce unbiased estimates of the probability of the outcome. Calibration was tested with a goodness of fit test, which assesses agreement between predicted and observed risks over the full range of predicted probabilities. Discrimination is the model's ability to separate patients with different outcomes. To quantify discrimination, we used the c statistic. A model with a c statistic of 0.5 has no discriminative power at all and performs no better than a coin flip. A c statistic of 1.0 reflects perfect discrimination.
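As an illustration of the c statistic, the following sketch computes a pairwise concordance index for an ordinal outcome. It assumes one common generalisation, in which every pair of patients with different observed outcomes is compared and tied predictions count as one half; the text does not state which variant was used, so this should be read as an example rather than the exact computation of the study.

```python
def c_statistic(predictions, outcomes):
    """Pairwise concordance between predictions and an ordinal outcome.

    A pair is usable when the two observed outcomes differ. It is concordant
    when the patient with the worse (higher) outcome also has the higher
    prediction; tied predictions count as half a concordant pair.
    """
    concordant = 0.0
    usable = 0
    n = len(outcomes)
    for i in range(n):
        for j in range(i + 1, n):
            if outcomes[i] == outcomes[j]:
                continue  # pairs with equal outcomes carry no information
            usable += 1
            worse, better = (i, j) if outcomes[i] > outcomes[j] else (j, i)
            if predictions[worse] > predictions[better]:
                concordant += 1.0
            elif predictions[worse] == predictions[better]:
                concordant += 0.5
    if usable == 0:
        raise ValueError("all outcomes are identical; c is undefined")
    return concordant / usable

if __name__ == "__main__":
    # Hypothetical linear predictors and mRS grades for five patients
    print(c_statistic([0.2, 0.9, 0.4, 0.1, 0.7], [1, 5, 2, 3, 2]))
```

A value of 0.65, as reported for the final model, therefore means that in about 65% of usable pairs the patient with the worse outcome received the higher predicted risk.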

Model validation
The performance of a prediction model is generally worse in new patients than initially expected. This 'optimism' of the original model can be studied with internal validation techniques [10]. Internal validity of the models was assessed with standard bootstrapping procedures. Bootstrapping involves drawing samples of patients with replacement from the development population. Each sample can be considered a repetition of the data collection with the same number of patients and under identical circumstances as the original. Regression models were estimated in 300 bootstrap samples. Each of these 300 models was evaluated on the original sample. The average difference in the c statistic was determined to indicate the optimism in the initially estimated discriminative ability [10]. A shrinkage factor was estimated from the bootstrap validation procedure and we shrank the regression coefficients to provide better predictions for future patients [10].
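The bootstrap procedure for estimating optimism can be sketched as follows. To keep the example self-contained, the regression model is replaced by a deliberately simple stand-in (choosing the single predictor column and sign with the highest apparent c statistic); this stand-in, the function names, and the defaults are assumptions for illustration and not the modelling strategy of the study.

```python
import random

def c_statistic(scores, outcomes):
    """Pairwise concordance; returns 0.5 when no pair of outcomes differs."""
    concordant, usable = 0.0, 0
    n = len(outcomes)
    for i in range(n):
        for j in range(i + 1, n):
            if outcomes[i] == outcomes[j]:
                continue
            usable += 1
            worse, better = (i, j) if outcomes[i] > outcomes[j] else (j, i)
            if scores[worse] > scores[better]:
                concordant += 1.0
            elif scores[worse] == scores[better]:
                concordant += 0.5
    return concordant / usable if usable else 0.5

def fit(xs, ys):
    """Stand-in for model fitting: pick the best single column and sign."""
    best = None
    for col in range(len(xs[0])):
        for sign in (1, -1):
            c = c_statistic([sign * x[col] for x in xs], ys)
            if best is None or c > best[0]:
                best = (c, col, sign)
    _, col, sign = best
    return lambda x: sign * x[col]

def optimism_corrected_c(xs, ys, n_boot=300, seed=0):
    """Return (apparent c, optimism-corrected c)."""
    rng = random.Random(seed)
    model = fit(xs, ys)
    apparent = c_statistic([model(x) for x in xs], ys)
    n = len(ys)
    total_optimism = 0.0
    for _ in range(n_boot):
        # Draw patients with replacement from the development sample
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        boot_model = fit(bx, by)
        c_boot = c_statistic([boot_model(x) for x in bx], by)  # in-sample
        c_orig = c_statistic([boot_model(x) for x in xs], ys)  # original data
        total_optimism += c_boot - c_orig
    return apparent, apparent - total_optimism / n_boot
```

The corrected value is the apparent c statistic minus the average amount by which bootstrap models look better on their own sample than on the original data, which corresponds to the small drop from 0.65 to 0.64 reported for the final model.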
All statistical analyses were performed using R software, version 2.8.1 (R Foundation for Statistical Computing, Vienna, Austria).

Results
A total of 2,143 patients were recruited to the ISAT trial by 43 neurosurgical centres, mainly in Europe. We excluded 15 patients with missing information on the two month mRS. Fisher grade of 14 patients was not available and in one patient no information on vasospasm was available. We statistically imputed these missing values, leaving 2,128 patients for analysis, of whom 347 were in mRS grade 0 (16%), 583 in mRS grade 1 (27%), 528 in mRS grade 2 (25%), 296 in mRS grade 3 (14%), 80 in mRS grade 4 (4%), 135 in mRS grade 5 (6%) at the two month assessment, and of whom 159 (8%) died before the two month assessment.
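To show what each possible dichotomisation of the mRS would mean in this population, the following sketch computes the proportion of patients at or above each cut-off from the counts reported above. Only the counts come from the text; the variable and function names are illustrative.

```python
# Patients per mRS grade at the two month assessment, as reported in the text
COUNTS = {0: 347, 1: 583, 2: 528, 3: 296, 4: 80, 5: 135, 6: 159}
TOTAL = sum(COUNTS.values())

def proportion_at_or_above(cutoff):
    """Proportion of patients with an mRS grade >= cutoff."""
    return sum(n for grade, n in COUNTS.items() if grade >= cutoff) / TOTAL

if __name__ == "__main__":
    for cutoff in range(1, 7):
        share = proportion_at_or_above(cutoff)
        print(f"mRS >= {cutoff}: {share:.1%}")
```

The 'poor outcome' group shrinks markedly as the cut-off moves up the scale, which illustrates why models built on different dichotomisations are hard to compare.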
Univariate analyses in the binary models for different cut offs, the PO model, and the linear regression model are presented in table 2. The ORs for each cut off were reasonably similar except for previous SAH (fu_prevhaem) and Fisher grade 2 (Fisher = 2). This violation of the PO assumption is also noted by statistically significant deviations from the horizontal line in figure 1. The linear regression coefficients were surprisingly close to the ORs from the PO model. The multinomial model yielded 108 coefficients, apart from 6 intercepts (not shown). In a partial PO model 6 intercepts were fitted, 6 coefficients for previous SAH, 18 coefficients for Fisher grade, and one for each of the other predictors (not shown).
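The difference in complexity between the three ordinal approaches can be made concrete by counting parameters. The sketch below infers 18 predictor degrees of freedom from the 108 multinomial coefficients spread over 6 outcome contrasts, and treats previous SAH as 1 df and Fisher grade as 3 df; these are inferences from the reported numbers, not values stated explicitly in the text.

```python
N_CUTPOINTS = 6                      # mRS has 7 levels, hence 6 intercepts
PREDICTOR_DF = 108 // N_CUTPOINTS    # inferred: 18 degrees of freedom
NON_PO_DF = 1 + 3                    # previous SAH (1 df) + Fisher grade (3 df)

def parameter_counts():
    """Number of estimated parameters (intercepts plus coefficients)."""
    po = N_CUTPOINTS + PREDICTOR_DF
    multinomial = N_CUTPOINTS + N_CUTPOINTS * PREDICTOR_DF
    # Partial PO: cut-point-specific coefficients only where PO is relaxed
    partial_po = (N_CUTPOINTS
                  + N_CUTPOINTS * NON_PO_DF
                  + (PREDICTOR_DF - NON_PO_DF))
    return {"PO": po, "partial PO": partial_po, "multinomial": multinomial}

if __name__ == "__main__":
    print(parameter_counts())
```

Under these assumptions the PO model needs 24 parameters, the partial PO model 44, and the multinomial model 114.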
For the sake of interpretability and clinical usefulness, we chose to accept the violation of the PO assumption in the PO model. Age and WFNS grade were the most important predictors in the multivariable PO model (table 3). Other statistically significant predictors were sex, lumen size, Fisher grade, vasospasm, and treatment modality.
The goodness of fit test yielded a p-value smaller than 0.05 for all levels of mRS, suggesting that the model fitted poorly to the data in which it was developed. In our final model the PO assumption was violated only for Fisher grade 2 (figure 2). The c statistic of the final model was 0.65 (optimism-corrected: 0.64). Details of the model are described in Appendix 2.

Discussion
We developed and internally validated a prognostic proportional odds model to predict the two month modified Rankin Scale score in individual patients after aneurysmal subarachnoid haemorrhage. Predictions were based on characteristics that were collected in a large clinical trial and that are routinely available on admission to a neurological or neurosurgical unit. The c statistic was modest, indicating a mediocre ability to predict clinical outcome at the two month assessment.
The dependence of our proportional odds model on the assumption of proportionality should not be overstressed. The potential inaccuracy caused by mild violation of the PO assumption is likely less severe than would be the case in arbitrary dichotomisation of an ordinal outcome. Dichotomisation involves more loss of information [20]. Probably one would prefer a "wrong, but useful" model, despite possibly violating some underlying model assumptions [21]. Moreover, the PO model predicts the probability of being in each mRS level for each individual patient. This makes the model useful for all patients, regardless of severity.
Besides the PO model we explored several other models. The ordinary least squares model seemed to perform quite well (see table 3). Although it neglects the categorical nature of the outcome variable, the model seems to perform reasonably and may yield estimates of regression coefficients that are quite similar to those of the PO model. This model might suffice to gain insight into which predictors play an important role in this clinical question. In contrast, the multinomial model and, to a lesser extent, the partial PO model suffer from highly limited interpretability and therefore usability. These models produce a plethora of coefficients. If one is very specifically interested in one outcome grade, such a model might be of some use. In most cases, however, we consider a more pragmatic approach preferable. There are many more potentially useful modelling techniques for ordinal outcomes [22,23]. One such technique is the continuation ratio (CR) model, which has been said to be likely to fit ordinal responses when subjects have to 'pass through' one category to get to the next. For a worked example see a tutorial by Harrell et al. [17].
Several limitations of our analyses should be acknowledged. This study used data from one large trial on a selected population of patients in equipoise regarding treatment with either endovascular coiling or neurosurgical clipping, which limits generalizability. Nonetheless, according to a recently published paper, the ISAT population proved to be quite similar to the population admitted with an aSAH to neurosurgical units in the United Kingdom [24]. The model may perform well in the development sample, but poorly when applied to other groups of patients, for example a less strictly selected one. Validation of a prognostic model in independent patient series is considered an essential next step [25]. However, since large samples of systematically collected data on aSAH are sparse, assessment of external validity is difficult. For now the generalizability and overall validity of our model remain to be established. This will be a topic of future research. Although our model represents knowledge obtained from 2,128 aSAH patients, predictions for individual aSAH patients are always subject to uncertainty. The model makes certain structural assumptions, and statistical interaction terms were not included. Hence, it is possible that specific patterns of risk factors are inadequately reflected in the model predictions. Therefore, predictions should be regarded with care and should not be applied directly to treatment-limiting decisions.
The modest performance of the presented model might potentially be improved by including neuro-imaging biomarkers other than lumen size, location, Fisher grade on plain CT scan, and vasospasm on angiography. Biomarkers regarding anatomy and morphology might be considered, as well as aneurysm characteristics obtained from three and four dimensional angiography [26,27]. Performance may also be improved by inclusion of subsequent information obtained after admission, including temporal course, neuro-imaging at later time points, eventual rebleeding of the aneurysm, delayed ischemic deficit, and other parameters such as hydrocephalus. The objective of the present study, however, was to investigate prognostic models that predict two month mRS with predictors available at admission.
For scientific purposes, we chose to present the final ordinal model as a formula. To increase usability of the model in clinical practice, it could eventually also be presented as a score chart, giving points for the presence of each predictor. The predicted probabilities for each mRS level corresponding to a certain score can subsequently be read from a score plot. Another possibility is to present the model in an Excel sheet or e.g. as a PDA application.

Conclusion
We presented a calibrated and internally validated ordinal prognostic model for predicting two month outcome after aSAH. Although generalizability of the model is limited due to the selected population in which it was developed, this model could eventually be used to support clinical decision making after external validation in a clinical setting.