Hospital mortality prediction in traumatic injuries patients: comparing different SMOTE-based machine learning algorithms
BMC Medical Research Methodology volume 23, Article number: 101 (2023)
Trauma is one of the most critical public health issues worldwide, leading to death and disability and influencing all age groups. Therefore, there is great interest in models for predicting mortality in trauma patients admitted to the ICU. The main objective of the present study is to develop and evaluate SMOTE-based machine-learning tools for predicting hospital mortality in trauma patients with imbalanced data.
This retrospective cohort study was conducted on 126 trauma patients admitted to an intensive care unit at Besat hospital in Hamadan Province, western Iran, from March 2020 to March 2021. Data were extracted from the medical information records of patients. According to the imbalanced property of the data, SMOTE techniques, namely SMOTE, Borderline-SMOTE1, Borderline-SMOTE2, SMOTE-NC, and SVM-SMOTE, were used for primary preprocessing. Then, the Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Artificial Neural Network (ANN), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost) methods were used to predict patients' hospital mortality with traumatic injuries. The performance of the methods used was evaluated by sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), accuracy, Area Under the Curve (AUC), Geometric Mean (G-means), F1 score, and P-value of McNemar's test.
Of the 126 patients admitted to an ICU, 117 (92.9%) survived and 9 (7.1%) died. The mean follow-up time from the date of trauma to the date of outcome was 3.98 ± 4.65 days. The performance of ML algorithms is not good with imbalanced data, whereas the performance of SMOTE-based ML algorithms is significantly improved. The mean area under the ROC curve (AUC) of all SMOTE-based models was more than 91%. F1-score and G-means before balancing the dataset were below 70% for all ML models except ANN. In contrast, F1-score and G-means for the balanced datasets reached more than 90% for all SMOTE-based models. Among all SMOTE-based ML methods, RF and ANN based on SMOTE and XGBoost based on SMOTE-NC achieved the highest value for all evaluation criteria.
This study has shown that SMOTE-based ML algorithms better predict outcomes in traumatic injuries than ML algorithms. They have the potential to assist ICU physicians in making clinical decisions.
Trauma is one of the world's most critical public health issues, leading to death and disability and influencing all age groups . Traumatic injuries are the leading cause of mortality in the first four decades of life . Trauma causes 4.4 million deaths annually and accounts for almost 8% of all deaths worldwide [1, 3]. In this regard, it is important to find solutions to reduce the impact of traumatic injuries and the number of deaths resulting from trauma. For example, improving the ability to predict the outcome of a trauma patient with a high degree of accuracy and identifying important factors that influence the patient's outcome can assist medical trauma teams in their rapid efforts to treat trauma patients.
Many previous studies have used traditional methods such as logistic and Poisson regression models to identify factors that influence traumatic injuries [4,5,6]. Numerous studies have also used the Trauma and Injury Severity Score (TRISS), one of the most common models, which is based on logistic regression (LR) and uses a small cohort from a single center to predict the probability of survival of patients with traumatic injuries. However, the TRISS and its various modifications are evidence-based tools, and the results of some studies indicate that they may mislead physicians by misclassifying the patient's condition. Moreover, both categories of models perform poorly when collinearity, heteroskedasticity, higher-order interactions, and nonlinear relationships among variables are present [9,10,11]. Hence, more valuable and accurate prognostic tools that are not limited to these assumptions are needed to achieve better patient outcomes and make the best use of resources.
In recent decades, methods based on machine learning algorithms have been developed whose main advantage is that they overcome the problems of classical methods [12, 13]. Recently, various ML methods have been used to predict outcomes in medical research, especially in trauma [14,15,16,17,18,19]. In addition, several studies have compared the performance of ML methods with evidence-based and regression models such as TRISS for predicting mortality in trauma patients [11, 17].
However, ML algorithms can perform poorly on imbalanced data. Imbalanced data sets are common in medical research and occur when one class (the majority class) has many more instances than the other (the minority class). In such cases, the predictive ability of classifiers is impaired because they are biased towards the majority class and misclassify minority class instances. Consequently, the classifiers achieve high predictive accuracy for the majority class only, so accuracy is not a suitable criterion for evaluating classifiers when the data are imbalanced. Yet the minority class is often the class that researchers most want to predict accurately [20,21,22]. Although the problem of imbalanced data is critical, investigations have shown that it has received little attention in recent studies. Trauma data are generally imbalanced. The results of a recent systematic review in this area show that most studies support the benefits of ML models. However, the sensitivity–specificity gap values showed a wide range (0.035 to 0.927), highlighting the risk of imbalanced data [10, 23].
There are several methods to deal with the imbalanced class, such as resampling data by oversampling or under-sampling, increasing the cost of the minority class classification error, or learning only one class [21, 24, 25]. The synthetic minority oversampling (SMOTE) method proposed by Chawla et al. is the first model in the SMOTE family to be widely used in imbalance problems . Over time, many SMOTE algorithms have been proposed, such as borderline SMOTE, ADASYN, SMOTE-NC, and SVM-SMOTE .
To the best of our knowledge, most of the studies conducted have evaluated the performance of SMOTE techniques using simulated data and publicly available data [27,28,29]. Moreover, few studies have used these techniques in trauma, and no study has addressed in depth the prediction of traumatic injury in Iran. In this work, five SMOTE methods, namely SMOTE, Borderline-SMOTE1, Borderline-SMOTE2, SMOTE-NC, and SVM-SMOTE, were used to balance imbalanced datasets. We selected these methods among the numerous SMOTE variants because they belong to the category of data-level techniques that can be flexibly combined with other methods and are easier to use than algorithm-level approaches. Moreover, these methods are more adaptable since their application does not depend on the chosen classifier. They are also the most commonly used resampling methods in the literature [21, 26,27,28].
Therefore, the main objective of this study is to comprehensively compare the performance of six ML algorithms, namely DT, RF, NB, ANN, XGBoost, and SVM, based on five techniques of the SMOTE family for predicting hospital mortality in patients with traumatic injuries. A further objective is to identify important variables for predicting hospital mortality in patients with traumatic injuries who were referred to the Besat hospital of Hamadan city from March 2020 to March 2021.
Materials and methods
Data collection and preparation
The present study was a retrospective cohort study conducted on 126 trauma patients. These patients were admitted to an intensive care unit at the Besat hospital of Hamadan province, in the west of Iran, from March 2020 to March 2021. The data were extracted from the patients' medical records. Our focus was on the status of trauma patients (alive/dead) as the response and on trauma-related risk factors. Patients were followed up from the time they entered the ICU until death or discharge, and the mean follow-up time from the date of trauma to the date of outcome was 3.98 days. To evaluate the performance of the ML methods, we chose six risk factors associated with trauma outcome: age, sex (male, female), type of trauma (blunt, penetrating), location of injuries (head and neck, thorax, abdomen and pelvis, spinal, extremities, multi-injuries), Glasgow coma scale (severe, moderate, minor), and white blood cells (k/mm3).
Decision Tree is one of the simplest and most popular algorithms for classification and regression problems. The main goal of the DT is to construct a model that can predict the value of a target variable by learning simple decision rules deduced from the data features. Nodes and branches are the two main components of a DT model. The three essential steps in building a DT model are splitting, stopping, and pruning. Tree construction starts with all training data in the first node. Then, the first partition splits the data into two or more daughter nodes based on a predictor variable.
A DT contains three types of nodes. (a) A root node, or decision node, indicates a decision that will result in the subdivision of all features into two or more mutually exclusive subsets. This node has no input branch, and the number of its output branches can be zero or more. (b) Internal nodes indicate one of the possible choices available in the tree structure; the input branch of the node is linked to its parent node, and the output branches of the node are linked to its child nodes or leaf nodes. (c) Leaf nodes, or terminal nodes, indicate the final conclusion of a combination of decisions or events. These have one input branch and no output branch.
The benefits of DT include simplicity of interpretation, the ability to handle categorical and quantitative values, the ability to fill missing values in features with the most probable value, and robustness to outliers. The main drawback of the decision tree is that it is prone to overfitting and under-fitting, especially when using a small data set.
The RF method was first proposed by Leo Breiman. This algorithm is an ensemble learning method widely used in classification and regression problems. It builds a large number of decision trees from subsamples of the dataset, and each tree generates an output. The final output is obtained by majority vote for classification and by averaging for regression. First, bootstrap samples are drawn by resampling the original data with replacement. Approximately 37% of the data are excluded from each bootstrap sample; these are called out-of-bag (OOB) data. Then, for each bootstrap sample, RF grows an unpruned tree as follows: at each node, a subset of variables is randomly selected from all variables, and the best split among those variables is chosen. All the decision trees grown from the bootstrap samples are combined to obtain the final RF model [13, 33].
The performance of the random forest can be estimated by internal validation using the OOB data. For classification problems, the RF classification error rate, called the out-of-bag (OOB) error, is calculated from the OOB data. At each bootstrap iteration, the OOB data are predicted using the tree grown from the corresponding bootstrap sample. The OOB predictions are then aggregated to compute the error rate, or OOB error. One benefit of the OOB error is that the original data are used for its estimation; another is its high computational speed. Many studies report that the RF algorithm has higher stability, robustness, and classification performance than other ML algorithms. It can also preserve high classification performance when missing data exist. Another property of the RF method is the generation of prediction rules. The method can also identify important variables.
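The figure of roughly 37% OOB data follows from sampling with replacement: the chance that a given row is never drawn in n draws is (1 - 1/n)^n, which approaches 1/e. The sketch below checks this for the cohort size reported here; the function names and the number of simulated bootstrap samples are illustrative assumptions, not part of the study's R workflow.

```python
import random

def oob_fraction_theory(n: int) -> float:
    """Probability that a given row is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n

def oob_fraction_simulated(n: int, n_samples: int = 200, seed: int = 0) -> float:
    """Average share of rows left out of a bootstrap sample, by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        drawn = {rng.randrange(n) for _ in range(n)}  # distinct rows in the sample
        total += (n - len(drawn)) / n                 # rows never drawn are OOB
    return total / n_samples

n_patients = 126  # cohort size reported in the study
theory = oob_fraction_theory(n_patients)
simulated = oob_fraction_simulated(n_patients)
print(f"theory: {theory:.3f}, simulated: {simulated:.3f}")
```

Both values should sit close to 0.37, which is why every tree has a sizeable held-out set available for the OOB error.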
The NB classifier is a simple algorithm that applies Bayes' theorem with strong independence assumptions: it assumes that all predictor variables are conditionally independent of one another given the class. The NB method aims to be a clear, simple, and very fast classifier. It classifies samples by computing the probability that an object belongs to each category. Using Bayes' formula, the posterior probability is computed from the prior probability of each class, and the class with the maximum posterior probability is chosen as the object's class. Easy implementation, good performance, the ability to work with little training data, and probabilistic predictions are advantages of the NB method. It is also not sensitive to irrelevant features. In addition, NB performs well even when the independence assumption is violated. However, it is computationally intensive, especially for models involving many variables [15, 32].
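The posterior calculation described above can be sketched for categorical predictors as follows. This is a minimal illustration, not the "naivebayes" R implementation used in the study; the six toy records, the function names, and the Laplace smoothing constant `alpha` are assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import prod

def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    levels = defaultdict(set)            # feature index -> observed values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(y, j)][v] += 1
            levels[j].add(v)
    return class_counts, value_counts, levels

def posterior(model, row, alpha=1.0):
    """P(class | row) under conditional independence, with Laplace smoothing."""
    class_counts, value_counts, levels = model
    n = sum(class_counts.values())
    scores = {}
    for y, cy in class_counts.items():
        prior = cy / n
        likelihood = prod(
            (value_counts[(y, j)][v] + alpha) / (cy + alpha * len(levels[j]))
            for j, v in enumerate(row)
        )
        scores[y] = prior * likelihood
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

# Hypothetical records: (GCS category, type of trauma); not the study data.
rows = [("severe", "penetrating"), ("severe", "blunt"), ("moderate", "blunt"),
        ("minor", "blunt"), ("minor", "blunt"), ("moderate", "penetrating")]
labels = ["dead", "dead", "alive", "alive", "alive", "alive"]

model = train_nb(rows, labels)
post = posterior(model, ("severe", "penetrating"))
print(max(post, key=post.get), post)
```

The predicted class is simply the one with the largest posterior, as in the description above.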
Artificial neural network
An artificial neural network, inspired by the operation of neurons in the human brain, is a widely used machine learning method that performs strongly in classification and pattern recognition. Learning in this method proceeds by detecting patterns and relationships in data and learning through experience. A multilayer feed-forward neural network consists of an input layer, one or more hidden layers, and an output layer. The hidden layers lie between the input and output layers, and their number is commonly chosen by cross-validation. Each layer is made up of units called neurons (nodes). The neurons in two adjacent layers are fully connected, with a weight associated with each connection, while neurons within the same layer are not connected. In the feed-forward neural network, information proceeds in one direction: from the input layer neurons through the hidden layer neurons to the output neurons. Furthermore, in a neural network, complex non-linear mappings between input and output are learned through activation functions [13, 32]. In this study, we used the sigmoid activation function because it is a non-linear activation function usually used before the output layer in binary classification.
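A forward pass through such a network, with one hidden layer and sigmoid activations, can be sketched as below. The layer sizes, weights, and input values are hypothetical and chosen only to make the example run; the study fitted its networks with the "nnet" package in R.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer -> single sigmoid output (probability of class 1)."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, [out_w], [out_b])[0]

# Hypothetical, untrained weights for 3 inputs and 2 hidden neurons.
x = [0.5, -1.0, 2.0]
hidden_w = [[0.2, -0.4, 0.1], [-0.3, 0.8, 0.5]]
hidden_b = [0.0, 0.1]
out_w = [1.5, -2.0]
out_b = 0.3

p = forward(x, hidden_w, hidden_b, out_w, out_b)
print(f"predicted probability: {p:.3f}")
```

Training would adjust the weights to reduce the classification error; only the unidirectional flow of information is shown here.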
Support vector machine
The SVM is based on statistical learning theory and was first suggested by Vapnik. The main aim of SVM is to find the linear model that maximizes the hyperplane margin, which maximizes the distance between classes. The training points nearest to the maximum-margin hyperplane are the support vectors. Classification is thus performed by mapping a vector of variables into a high-dimensional space and maximizing the margin between the two data classes. The SVM algorithm can classify both linearly and nonlinearly separable observations. When data are not linearly separable, SVM uses a kernel function to map the nonlinear input into a high-dimensional feature space and carries out the linear separation in this new space. Several kernel functions have been proposed and adopted for SVM, such as linear, radial, polynomial, and sigmoid. The choice of kernel function makes SVM a flexible method. In the present study, we employed the radial basis kernel function for its better performance.
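For reference, the radial basis kernel has the standard form K(x, z) = exp(-gamma * ||x - z||^2). The short sketch below evaluates it directly; the value of `gamma` and the example vectors are assumptions, since the tuned value belongs to Table 1.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis kernel: similarity decays with squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Hypothetical standardized feature vectors.
a = [0.0, 0.0]
b = [1.0, 0.0]
c = [3.0, 4.0]

print(rbf_kernel(a, a), rbf_kernel(a, b), rbf_kernel(a, c))
```

Identical points have similarity 1, and the value shrinks toward 0 as points move apart, which is what lets a linear separator in the induced feature space act as a nonlinear boundary in the original space.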
Extreme gradient boosting
The XGBoost algorithm has gradient boosting at its core but is an enhanced version of the gradient-boosted decision tree algorithm. It is a scalable tree-boosting system developed by Chen and Guestrin in 2016 to overcome the long learning times and overfitting of traditional boosting algorithms. The XGBoost classifier combines weak base classifiers into a strong classifier. At each step of the training process, the residual error of the previous base classifier is used by the next classifier to optimize the objective function. Moreover, this algorithm can restrict overfitting, decrease classification errors, handle missing values, and minimize learning times while developing the final model.
Machine learning models have great potential in prediction and classification. However, the results of complex predictive models are difficult to interpret, which is a barrier to the adoption of ML models. To overcome this problem, Lundberg and Lee proposed the Shapley additive explanations (SHAP) approach for interpreting the predictions of different techniques, including XGBoost. It explains the prediction for a specific input by calculating the impact of each feature on that prediction. SHAP values provide interpretability through summary plots and the global importance of each variable.
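The idea behind SHAP can be illustrated with an exact Shapley value computation on a toy model. The sketch below is a simplification under stated assumptions: features outside a coalition are replaced by a fixed baseline value, and the linear `toy_model` is invented for the example. The "SHAPforxgboost" package used in the study relies on a tree-specific algorithm rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical linear score, used only for illustration."""
    return 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes per-feature attributions comparable across patients.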
Synthetic Minority Over-Sampling Technique (SMOTE)
The imbalanced dataset classification problem occurs when the number of instances of one class is greater than that of the other class. In classification problems with two classes, the class with more specimens is named the majority class, and the class with a smaller number of specimens is called the minority class . The level of class imbalance of a dataset is measured by the imbalance ratio (IR). The IR is defined as the ratio of the number of samples in the majority class to the number of samples in the minority class. The higher the IR, the greater the imbalance . In such cases, reporting the prediction accuracy as an evaluation criterion is inappropriate, as this usually leads to a bias in favor of the majority class .
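As a worked example, the IR for the class counts reported in this study (117 survivors, 9 deaths) can be computed as below, together with the accuracy of a trivial rule that always predicts the majority class. The function names are illustrative assumptions.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Number of majority-class samples divided by minority-class samples."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def majority_baseline(labels, positive):
    """Accuracy and sensitivity of a rule that always predicts the majority class."""
    counts = Counter(labels)
    majority = counts.most_common(1)[0][0]
    accuracy = counts[majority] / len(labels)
    sensitivity = 1.0 if majority == positive else 0.0
    return accuracy, sensitivity

# Class counts reported in the study: 117 alive, 9 dead.
labels = ["alive"] * 117 + ["dead"] * 9

ir = imbalance_ratio(labels)
accuracy, sensitivity = majority_baseline(labels, positive="dead")
print(f"IR = {ir:.0f}, accuracy = {accuracy:.3f}, sensitivity = {sensitivity:.1f}")
```

A rule that never predicts death still reaches about 93% accuracy while detecting no deaths at all, which is why accuracy alone is misleading here.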
Two main approaches have been proposed to solve the class imbalance problem: a data-level approach and an algorithm-based approach. The data-level approach aims to change or modify the class distribution in the dataset before training a classifier, which is usually done in the preprocessing phase. The algorithm-level approach focuses on improving the current classifier by adapting the algorithms to learn minority classes .
The data-level approach is usually preferred and proposed to deal with unbalanced classes in classification problems. This could be due to the fact that the class composition of the data can be adjusted to a "relatively balanced" ratio by adding or removing any number of class instances in the data set, depending on the situation .
Other reasons that can be given are: 1) The samples generated by these methods represent the right trade-off between introducing variance and approximating the original distribution. 2) These techniques are easier to apply compared to algorithm-level methods because the datasets are cleaned before they are used to train different classifiers. 3) Data-level techniques can be flexibly combined with other methods [26,27,28].
Re-sampling, or data synthesis, is the most popular data-level method for processing imbalanced datasets. Re-sampling approaches can be divided into three categories: (i) over-sampling, (ii) under-sampling, and (iii) hybrid sampling. In over-sampling, the weight of the minority class is increased by repeating or generating new samples of the minority class. Under-sampling randomly deletes instances from the majority class to balance it with the minority class. Hybrid sampling combines these two methods to exploit the benefits of both while offsetting their drawbacks. The over-sampling approach is generally applied more frequently than the other approaches. The SMOTE family is a collection of numerous over-sampling techniques (85 variants) that evolved from SMOTE. SMOTE, one of the first over-sampling methods, was suggested by Chawla et al. and is a powerful tool for dealing with imbalanced data sets. It generates synthetic data for the minority class based on its k-nearest neighbors until the ratio of minority to majority classes becomes more balanced. The new synthetic data are very similar to the actual data because they are produced from the original features.
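The core SMOTE step, interpolating between a minority sample and one of its k nearest minority neighbors, can be sketched in a few lines. This is an illustration under assumptions: the nine minority points are invented (age, WBC in thousands), features are left unscaled for brevity, and the study itself used the "imbalanced-learn" package rather than this code.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating towards minority neighbors."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x by squared Euclidean distance
        neighbors = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from x to the neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic

# Hypothetical minority-class points: (age in years, WBC in thousands).
minority = [[25, 14.0], [31, 16.5], [44, 12.0], [52, 19.0], [38, 15.5],
            [29, 21.0], [47, 13.5], [58, 17.0], [35, 18.0]]

# 108 synthetic points would bring 9 minority samples up to 117.
new_points = smote(minority, n_new=108)
print(len(new_points), new_points[0])
```

Because each synthetic point lies on a segment between two real minority samples, the new data stay within the range of the observed minority features.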
The main advantage of SMOTE is that it prevents overfitting by synthesizing new samples from the minority class instead of repeating them .
There are also some disadvantages of SMOTE, however: oversampling of noisy samples and oversampling of borderline samples. To overcome these problems, many strategies have been employed in the literature, including:
Extensions of SMOTE by combining it with other techniques such as noise filtering, e.g., SMOTE-IPF and SMOTE-LOF
Modifications of SMOTE, e.g., borderline SMOTE (B1-SMOTE and B2-SMOTE) and SVM-SMOTE.
Borderline-SMOTE is an extension of SMOTE with stronger performance, proposed by Han et al. in 2005. In this method, only the borderline examples of the minority class are over-sampled. The borderline is the region where minority class samples lie close to the majority class. First, the number of majority class neighbors of each minority instance is used to split the minority instances into three groups: safe, noise, and danger. New instances are then generated from the danger group. When the neighbors used for the points in the danger region are taken from the minority class only, the method is called Borderline-SMOTE1; when they are taken from both the minority and majority classes, it is called Borderline-SMOTE2. Support vector machine SMOTE (SVM-SMOTE) is another extension of SMOTE that generates new synthetic samples near the decision boundary, which it detects using an SVM. SMOTE-Nominal Continuous (SMOTE-NC) is an over-sampling method that uses k-nearest neighbors with a modified Euclidean distance to generate new synthetic samples. In this study, the SMOTE techniques introduced above were used in the initial data preparation stage, and the ML algorithms were then trained.
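The grouping step of Borderline-SMOTE can be sketched as follows, using the usual rule from Han et al.: among the m nearest neighbors of a minority point, all majority means noise, at least half majority means danger, and fewer than half means safe. The toy points, the choice m = 3, and the function name are assumptions for illustration only.

```python
def label_minority(minority, majority, m=5):
    """Label each minority point as 'safe', 'danger', or 'noise'."""
    data = [(p, 0) for p in minority] + [(p, 1) for p in majority]  # 1 = majority
    labels = []
    for idx, x in enumerate(minority):
        others = [d for j, d in enumerate(data) if j != idx]
        others.sort(key=lambda d: sum((a - b) ** 2 for a, b in zip(x, d[0])))
        n_major = sum(flag for _, flag in others[:m])  # majority points among m nearest
        if n_major == m:
            labels.append("noise")
        elif n_major >= m / 2:
            labels.append("danger")
        else:
            labels.append("safe")
    return labels

# Hypothetical points on a line, chosen so that each group appears.
minority = [(0.0, 0.0), (0.1, 0.0), (0.3, 0.0), (4.0, 0.0), (4.3, 0.0), (10.0, 0.0)]
majority = [(4.5, 0.0), (4.8, 0.0), (5.5, 0.0), (9.7, 0.0), (10.2, 0.0), (10.4, 0.0)]

groups = label_minority(minority, majority, m=3)
print(groups)
```

Only the points labeled as danger would then be used as seeds for synthetic samples, which concentrates the new data near the class boundary.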
The predictive performance of ML algorithms was evaluated using several criteria, including sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), accuracy, Area Under the Curve (AUC), Geometric Mean (G-means), F1 score, and P-value of the McNemar test. We evaluated the predictive performance of the ML methods using a cross-validation approach in which both groups of datasets, the original imbalanced dataset and the SMOTE-balanced datasets, were randomly split into training (70%) and test (30%) sets. This process was repeated 100 times, and the mean value of each evaluation criterion was calculated over the 100 repetitions. Moreover, to prevent over-fitting, fivefold cross-validation was used to select the optimum hyperparameters of the ML algorithms. Different values of each hyperparameter were examined, and the optimum value was determined. The optimal hyperparameter values selected for each ML model are shown in Table 1.
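Most of these criteria follow directly from the confusion matrix. The sketch below computes them from the four cell counts, using standard definitions (G-mean as the square root of sensitivity times specificity, F1 as the harmonic mean of PPV and sensitivity). The example counts are hypothetical and do not come from Table 3; AUC is omitted because it requires predicted scores rather than counts.

```python
from math import sqrt

def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a denominator is empty."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics, with the minority class (death) as positive."""
    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    ppv = safe_div(tp, tp + fp)
    npv = safe_div(tn, tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "g_mean": sqrt(sensitivity * specificity),
        "f1": safe_div(2 * ppv * sensitivity, ppv + sensitivity),
    }

# Hypothetical test-set counts for an imbalanced split.
metrics = classification_metrics(tp=1, fp=1, fn=2, tn=34)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example accuracy exceeds 90% while sensitivity is only one third, and the G-mean and F1 score expose the gap that accuracy hides.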
In the present study, all SMOTE-balancing methods were executed through programming in Python software version 3.10.6 with the package "imbalanced-learn." Also, all analyses of ML methods were implemented using R software version 4.1.1, with the following packages: “e1071” for SVM; “nnet” for NN; “naivebayes” for NB; “randomForest” for RF and variable importance (VIMP) in the RF; “rpart” for DT; “xgboost” for XGBoost; and “SHAPforxgboost” for SHAP value.
In this study, of the 126 patients admitted to an intensive care unit, 117 (92.9%) were alive and 9 (7.1%) were dead. The mean follow-up time from the date of trauma to the date of outcome was 3.98 ± 4.65 days, with a mean follow-up time of 1.56 ± 0.73 days for patients who died and 4.17 ± 4.77 days for patients who survived. The overall mean (± SD) age of patients with traumatic injuries was 37.71 ± 12.78 years, with a minimum and maximum of 18 and 60 years, respectively. The characteristics of patients according to their traumatic injuries are listed in Table 2. Most of them were men, 85 (67.5%). The mean WBC value of the alive patients (9066.67 ± 2938.57) was significantly lower than that of the dead patients (15,500 ± 4492.22) (p < 0.001). Univariate analysis based on the chi-square test showed that the type of trauma in patients and the GCS were significantly related to the outcome of traumatic injuries. Mortality was significantly higher among penetrating trauma (18.5%) than in blunt trauma (4%) (p = 0.022). In patients with severe GCS (50%), mortality was significantly higher than in patients with moderate and minor GCS (8.5%) (p < 0.001).
According to the findings, the ratio of dead to alive patients was 1:13 (IR = 13), indicating an extreme imbalance between the two classes. Therefore, various SMOTE family techniques were applied to address the imbalance in the original dataset.
Initially, all classifiers were run on the imbalanced data to show the impact of the imbalanced data problem on classifier performance. Afterward, all classifiers were run on the balanced data generated by the SMOTE family techniques.
Table 3 demonstrates the performance of the six ML algorithms for the prediction of mortality in patients with traumatic injuries on the imbalanced datasets (original) and on the balanced dataset in terms of sensitivity, specificity, PPV, NPV, accuracy, AUC, G-means, F1-score, and P-value of McNemar's test. Further details on the 95% confidence intervals for each criterion of the models used are provided in Additional file 1.
One of the most important results from Table 3 is a considerable discrepancy between specificity and sensitivity in all ML methods used before balancing the dataset. In addition, it can be seen in Table 3 that in the rows of the original dataset, all methods used had high accuracy (≥ 90%). In comparison, the sensitivity values for all algorithms except ANN and XGBoost were less than 55%, which means that the classifiers are biased towards the majority class.
The results in Table 3 show that all methods used except XGBoost have high accuracy (≥ 90%) and specificity (≥ 92%) before and after SMOTE techniques. Compared with imbalanced data, the accuracy of the classifiers increases by a maximum of 8% with balanced data. The sensitivity and AUC of all the algorithms used were significantly lower before SMOTE techniques than after. The specificity of all models except XGBoost decreased slightly after the application of SMOTE techniques. In five ML models, namely SVM, NB, DT, XGBoost, and RF, the sensitivity and AUC were significantly increased by the use of SMOTE techniques, whereas the ANN model showed only a slight increase in these criteria. For example, with imbalanced data, the DT classifier achieved a sensitivity of 26%, while with the SVM-SMOTE technique it increased to 95%.
Before applying the SMOTE algorithm, the G-means score for DT was 45%, and for the other models, it was between 60 and 81%. After applying the SMOTE algorithm, the G-means score for all models was over 91%.
The F1 score ranged from 60 to 81% when unbalanced data were used, while it increased to exceed 90% for all models after the SMOTE technique was applied.
Among the SMOTE-based data-balancing techniques, SMOTE-NC attained the highest accuracy for XGBoost (100%), SVM (99%), and NB and DT (96%), while Borderline-SMOTE1 provided the highest value of 100% for the ANN model. With SMOTE, ANN and RF also obtained accuracies of 100% and 99%, respectively. With SMOTE, sensitivity was highest for ANN, RF, and NB (100%, 99%, and 99%, respectively), whereas Borderline-SMOTE1 gave the highest values of 100% for ANN and 99% for SVM. XGBoost with SMOTE-NC also yielded a sensitivity of 100%, and DT with SVM-SMOTE yielded a sensitivity of 95%. Three ML models with SMOTE-NC, namely XGBoost, SVM, and DT, achieved specificity and PPV of 100%, 99%, and 97%, respectively. The ANN model with SMOTE and with Borderline-SMOTE1 achieved a specificity and PPV of 100%. RF with SMOTE also had both specificity and PPV of 99%.
Based on the NPV comparison of ML algorithms, the performance of the ANN, SVM, and RF classifiers using the SMOTE method was 100%, 99%, and 99%, respectively. In addition, SMOTE-NC provided the highest value of 100% for XGBoost, Borderline-SMOTE1 provided the highest value of 100% for ANN, and the SVM-SMOTE method achieved the highest value of 97% for the DT model.
According to AUC, the performance of the XGBoost, SVM, NB, and DT classifiers with the SMOTE-NC method was 100%, 99%, 96%, and 96%, respectively, while Borderline-SMOTE1 gave the highest value of 100% for the ANN Model. ANN and RF classifiers with SMOTE also obtained AUC of 100% and 99%, respectively.
Finally, the P-value of McNemar’s test for all classifiers was greater than 0.05. Consequently, there was no significant difference between the frequencies of false positives and false negatives between two classes.
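McNemar's test depends only on the two discordant cells of the confusion matrix, here the false positives and false negatives. The sketch below uses the exact binomial form of the test as an assumption; the study does not state which variant was applied, and the counts shown are hypothetical.

```python
from math import comb

def mcnemar_exact(false_positives: int, false_negatives: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = false_positives + false_negatives
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of asymmetry
    k = min(false_positives, false_negatives)
    # Binomial(n, 0.5) probability of a split at least this uneven, one side
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts.
p_balanced = mcnemar_exact(1, 2)    # errors spread evenly across the two types
p_lopsided = mcnemar_exact(0, 10)   # errors concentrated in one type
print(p_balanced, p_lopsided)
```

A p-value above 0.05, as reported for all classifiers, means the two error types occur at comparable rates; a lopsided split would produce a small p-value.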
In summary, the SMOTE-NC balancing technique outperformed the other four data-balancing techniques on several evaluation criteria for four classifiers: SVM, NB, DT, and XGBoost. Moreover, the XGBoost model outperformed the other three ML models among these classifiers. The performance comparison of the classifiers with SMOTE techniques and without SMOTE in terms of accuracy, AUC, G-means, and F1 score is shown in Fig. 1. The plots comparing the performance of the classifiers according to other criteria can be found in Additional file 2.
According to the SMOTE dataset, the RF model outperformed the other ML methods based on all evaluation criteria. Therefore, Fig. 2 indicates the relative importance of each variable obtained by the RF method in terms of mean decrease accuracy and mean decrease Gini. These indices identified WBC, GCS, and Age as the three most important variables for predicting trauma injury mortality. Afterward, the location of injuries and sex were important variables.
To better understand the performance of the XGBoost model in predicting mortality and to identify the variables that influenced the prediction model, the SHAP summary plot is shown in Fig. 3. This plot shows the ranking of variable importance and the mean SHAP value. Positive SHAP values indicate that the model predicts death for a patient with traumatic injuries, while negative SHAP values indicate predicted survival. SHAP values farther from zero indicate a greater impact of a specific variable.
Figure 3 demonstrates that the most important variables that have a significant impact on the prediction of the XGBoost model are GCS, WBC, type of trauma, age, and gender. In addition, it can be seen in Fig. 3 that the patients who died according to the prediction of the model had high values in all the important variables.
According to Figs. 2 and 3, the important variables detected in predicting trauma injury mortality with RF and XGBoost models were nearly identical.
In the current study, several machine learning methods were applied to predict traumatic injury outcomes in trauma patients referred to the Besat hospital of Hamadan province. Data in this study were highly imbalanced: approximately 7% of the people were classified as dead patients. The imbalance ratio was 13, which indicates that for each sample of the minority class (dead), there were 13 samples of the majority class (alive). Hence, we first used SMOTE balancing techniques for building balanced classes in the original dataset. These techniques are data oversampling approaches that are generally used more frequently than other approaches in studies and cause the improved performance of classifiers [29, 43, 47,48,49,50,51]. Then, machine learning methods were applied to predict the in-hospital mortality of patients with traumatic injuries.
In this regard, the six algorithms of machine learning, DT, RF, NB, ANN, SVM, and XGBoost, were constructed and evaluated to predict traumatic injury outcomes on balanced and imbalanced datasets. This study tried to show the undesirable impact of imbalanced data problems on the performance of the machine learning models and apply SMOTE balancing methods to solve them.
In general, the performance of machine learning methods based on the balanced datasets was remarkably better than that of models based on the original imbalanced dataset, as expected. This indicates that applying SMOTE strategies to imbalanced data before prediction is reasonable.
The findings show a considerable difference between specificity and sensitivity for all of the ML methods before the SMOTE methods were applied, which indicates that the classifiers are biased toward the majority class. In contrast, there is little difference between the sensitivity and specificity of SMOTE-based machine learning algorithms. A similarly small difference between these two criteria was seen in other studies [48, 49, 52, 53].
Also, the evaluation results showed high accuracy for all ML methods except XGBoost before using SMOTE-balancing methods.
The main reason for achieving high accuracy in such a situation is that the classification algorithms are biased toward the majority class. Some studies have shown that the accuracy of classifiers on imbalanced data is slightly higher than on balanced data [48,49,50]. However, other studies demonstrated a slight increase in the accuracy of classifiers with balanced data compared to imbalanced data [29, 51]. In the current study, the accuracy of classifiers increased slightly with balanced data compared to imbalanced data. Therefore, accuracy is not a sufficiently robust measure for classification problems with imbalanced datasets. Instead, the AUC is widely used to evaluate classifiers on imbalanced datasets.
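The weakness of accuracy under class imbalance can be made concrete with the class counts reported in this study (117 survivors, 9 deaths). The sketch below evaluates a hypothetical degenerate classifier that labels every patient as a survivor. It is not one of the models fitted in the study, and death is treated as the positive class.

```python
from math import sqrt


def metrics(tp, fn, tn, fp):
    """Evaluation criteria from a 2x2 confusion matrix (positive = died)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": sqrt(sensitivity * specificity),
        "f1": f1,
    }


# Hypothetical classifier that predicts "survived" for all 126 patients:
# all 9 deaths are missed, all 117 survivors are classified correctly.
always_alive = metrics(tp=0, fn=9, tn=117, fp=0)
```

Such a classifier reaches an accuracy of about 92.9% while its sensitivity, G-mean, and F1 score are all zero, which is why criteria that account for both classes are more informative here.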
The findings showed that the mean area under the ROC curve for all ML models in SMOTE-balanced datasets improved significantly compared with that in the imbalanced dataset. This underscores the importance of using SMOTE balancing techniques.
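For readers less familiar with the AUC, a minimal sketch of its pairwise interpretation follows: the AUC equals the probability that a randomly chosen positive case (death) receives a higher predicted score than a randomly chosen negative case (survival), with ties counted as one half. The scores below are invented for illustration and are not results from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, matching the Mann-Whitney formulation.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of death
died = [0.9, 0.7, 0.4]
survived = [0.8, 0.3, 0.2, 0.1]
auc_value = auc(died, survived)

# A scorer that assigns the same value to everyone cannot rank patients
constant_auc = auc([0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5])
```

Unlike accuracy, this quantity depends only on how well the scores rank the two classes relative to each other, so it is not inflated simply because one class is much larger than the other.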
Although the general performance of SMOTE-based machine learning algorithms is excellent, finding the appropriate SMOTE-balancing technique to get the best results from ML algorithms is difficult. No single SMOTE-balancing technique achieves the best results for all ML algorithms.
The current study shows that ML algorithms work better on the data balanced by SMOTE-NC and SMOTE. Also, among all ML classifiers, the ANN and RF models with SMOTE and the XGBoost model with SMOTE-NC outperformed the other ML models.
It should be pointed out that it was not possible to perform a comprehensive comparison in the present study, for several reasons. First, no prior study has applied SMOTE-based ML algorithms to general trauma. However, these algorithms have been employed in related fields. For example, Karajizadeh et al. compared the balancing approaches of under-sampling, oversampling, SMOTE, and ADASYN with SVM, ANN, C5.0 tree, and CHAID tree to predict in-hospital mortality from hospital-acquired infections in trauma patients. They reported that SVM combined with SMOTE outperformed the other combinations of ML algorithm and balancing approach in terms of accuracy. The prediction accuracy of SVM with SMOTE was 100%. Kumar et al. also evaluated the performance of six ML algorithms (DT, k-Nearest Neighbor, Logistic regression, ANN, SVM, and NB) over five imbalanced clinical datasets. They used seven balancing techniques to generate balanced data, namely under-sampling, random oversampling, SMOTE, ADASYN, SVM-SMOTE, SMOTEEN, and SMOTETOMEK, and then applied the ML algorithms to classify the balanced data. They reported that among the seven balancing techniques, SMOTEEN had the best performance. Second, there are many oversampling techniques in the field of imbalanced learning. So far, 85 oversampling techniques have been developed to solve the imbalanced data problem. As a result, available studies used different SMOTE techniques, which makes comparison difficult, if not impossible. Third, the performance of both oversampling techniques and ML models is generally data-dependent, so one cannot identify an oversampling technique and ML classifier that is always the best for the classification of different datasets.
Fourth, although various studies have investigated predicting trauma patient mortality using different ML methods, most of these studies have concentrated on a specific type of trauma, such as burns, brain injuries, head injuries, and tooth injuries, and used the NN method [15,16,17, 55]. Hence, only a few studies in the trauma field have focused on general trauma.
In this research, the RF model with SMOTE outperformed most other ML methods on the evaluation criteria. Consequently, the RF model was used to identify the importance of variables in predicting mortality from traumatic injuries. The variable importance results based on the random forest model demonstrate that white blood cells, Glasgow coma scale, and age have higher relative importance than the other variables in terms of mean decrease accuracy and mean decrease Gini. Of these variables, WBC was identified as an important risk factor related to trauma mortality. This result is consistent with the findings of Almaghrabi et al. They compared the performance of DT, RF, ANN, SVM, and logistic regression in predicting traumatic injury mortality and found that all applied ML algorithms had a similar prediction accuracy of 94%. However, based on AUC, logistic regression and RF had the highest value, and SVM had the lowest value. Their results also showed that the location of treatment and age are other important factors.
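The idea behind the mean decrease accuracy criterion can be sketched as permutation importance: shuffle one variable, re-evaluate the model, and record how much accuracy drops. The sketch below is a simplified, model-agnostic version. Random forest implementations typically compute this per tree on out-of-bag samples, which is not reproduced here. The data, the rule-based `toy_model`, and all parameter names are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's label matches the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value
            X_perm = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: feature 0 fully determines the label, feature 1 is noise
data_rng = random.Random(42)
X = [(data_rng.random(), data_rng.random()) for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    return int(row[0] > 0.5)


importance = permutation_importance(toy_model, X, y)
```

In this toy setting, shuffling the informative feature reduces accuracy substantially, while shuffling the noise feature leaves it unchanged, which mirrors how a ranking such as the one in Fig. 2 separates influential variables from uninformative ones.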
External validation is critical for establishing the validity and reliability of ML algorithms. SMOTE-based ML algorithms therefore need to be validated on an independent external dataset, and the lack of such external validation is one of the limitations of the current study.
Another limitation of the present study is that the data were obtained from a registry-based retrospective study, which makes estimates of measures such as sensitivity prone to potential biases. In addition, our study had a small sample size. Therefore, studies with larger sample sizes are needed to investigate the performance and reliability of these methods. Also, factors such as the Injury Severity Score (ISS), vital signs, and infection need to be considered in future predictive models for these patients.
Recently, to overcome the limitations of SMOTE, new versions of SMOTE have been introduced. Therefore, the authors propose to use the new versions of SMOTE, e.g., A-SMOTE, RN-SMOTE, SMOTE-LOF, to deal with imbalances and compare them with the prior versions of SMOTE for further analysis [28, 57, 58].
In this study, we used SMOTE and modifications of SMOTE to account for borderline samples in the classification of imbalanced datasets. In future work, we will use variants of SMOTE to detect noise samples. We will also employ deep learning methods to detect noise and borderline samples and to resample data.
Prediction models are broadly used in healthcare management, medical sciences, and clinical decision support. These methods help to identify the rate of patient injuries, prioritize immediate threats, and support decision-making in trauma, which in turn improves medical care and the development of trauma services. Prediction models can help ICU physicians determine which patients are at high risk of mortality and who should be prioritized for treatment, enabling them to optimize clinical interventions and improve patients' prognoses. Given the excellent performance of machine learning models based on the SMOTE technique in predicting mortality in this study, decision support systems built on these models could facilitate and accelerate healthcare management processes.
Our findings demonstrated that the RF and ANN models with SMOTE and the XGBoost model with SMOTE-NC may be better than other ML models in predicting traumatic injury outcomes in trauma patients in terms of all criteria. Also, the most important variables affecting the prediction of mortality in trauma patients, based on SHAP values and RF, were white blood cells, the Glasgow coma scale, and age. However, these results are based on the findings of our study and cannot be generalized. Consequently, simulation studies are needed to investigate the overall results and to recommend a valuable tool for hospital mortality prediction in patients with traumatic injuries.
Availability of data and materials
The dataset used for analysis during the current study is not publicly available due to restrictions related to our internal review board policy. However, the dataset is available from the corresponding author upon reasonable request.
Abbreviations
ANN: Artificial Neural Network
SVM: Support Vector Machine
XGBoost: Extreme Gradient Boosting
SHAP: SHapley Additive exPlanations
PPV: Positive Predictive Value
NPV: Negative Predictive Value
AUC: Area Under the Curve
SMOTE: Synthetic Minority Over-Sampling Technique
SVM-SMOTE: Support Vector Machine Synthetic Minority Over-Sampling Technique
SMOTE-NC: Synthetic Minority Over-Sampling Technique-Nominal Continuous
Length of Stay in ICU
WBC: White Blood Cells
GCS: Glasgow Coma Scale
YousefzadehChabok S, RanjbarTaklimie F, Malekpouri R, Razzaghi A. Predicting mortality, hospital length of stay and need for surgery in pediatric trauma patients. Chin J Traumatol. 2017;20(06):339–42.
Azami-Aghdash S, Sadeghi-Bazargani H, Shabaninejad H, Gorji HA. Injury epidemiology in Iran: a systematic review. J Inj Viol Res. 2017;9(1):27.
WHO. https://www.who.int/news-room/fact-sheets/detail/injuries-and-violence. 2022.
Kashkooe A, Yadollahi M, Pazhuheian F. What factors affect length of hospital stay among trauma patients? A single-center study, Southwestern Iran. Chin J Traumatol. 2020;23(03):176–80.
Rafieemehr H, Calhor F, Esfahani H, Gholiabad SG. Risk of acute lymphoblastic leukemia: Results of a case-control study. Asian Pac J Cancer Prev. 2019;20(8):2477.
Eftekhar B, Zarei MR, Ghodsi M, MoezArdalan K, Zargar M, Ketabchi E. Comparing logistic models based on modified GCS motor component with other prognostic tools in prediction of mortality: results of study in 7226 trauma patients. Injury. 2005;36(8):900–4.
de Munter L, Polinder S, Lansink KW, Cnossen MC, Steyerberg EW, de Jongh MA. Mortality prediction models in the general trauma population: A systematic review. Injury. 2017;48(2):221–9.
Elgin LB, Appel SJ, Grisham D, Dunlap S. Comparisons of trauma outcomes and injury severity score. J Trauma Nurs. 2019;26(4):199–207.
Tapak L, Shirmohammadi-Khorram N, Amini P, Alafchi B, Hamidi O, Poorolajal J. Prediction of survival and metastasis in breast cancer patients using machine learning classifiers. Clin Epidemiol Global Health. 2019;7(3):293–9.
Rau C-S, Wu S-C, Chuang J-F, Huang C-Y, Liu H-T, Chien P-C, et al. Machine learning models of survival prediction in trauma patients. J Clin Med. 2019;8(6):799.
Kang WS, Chung H, Ko H, Kim NY, Kim DW, Cho J, et al. Artificial intelligence to predict in-hospital mortality using novel anatomical injury score. Sci Rep. 2021;11(1):23534.
Montazeri M, Montazeri M, Montazeri M, Beigzadeh A. Machine learning models in breast cancer survival prediction. Technol Health Care. 2016;24(1):31–42.
Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. New York: Springer; 2009.
Farhadian M, Torkaman S, Mojarad F. Random forest algorithm to identify factors associated with sports-related dental injuries in 6 to 13-year-old athlete children in Hamadan, Iran-2018-a cross-sectional study. BMC Sports Sci Med Rehabil. 2020;12:1–9.
Stylianou N, Akbarov A, Kontopantelis E, Buchan I, Dunn KW. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches. Burns. 2015;41(5):925–34.
Serviá L, Montserrat N, Badia M, Llompart-Pou JA, Barea-Mendoza JA, Chico-Fernández M, et al. Machine learning techniques for mortality prediction in critical traumatic patients: anatomic and physiologic variables from the RETRAUCI study. BMC Med Res Methodol. 2020;20:1–12.
Abujaber A, Fadlalla A, Gammoh D, Abdelrahman H, Mollazehi M, El-Menyar A. Prediction of in-hospital mortality in patients on mechanical ventilation post traumatic brain injury: machine learning approach. BMC Med Inform Decis Mak. 2020;20:1–11.
Xu Q, Yin J. Application of random forest algorithm in physical education. Sci Program. 2021;2021:1–10.
Jabeur SB, Mefteh-Wali S, Viviani J-L. Forecasting gold price with the XGBoost algorithm and SHAP interaction values. Ann Oper Res. 2021;23:1–21.
He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2009;21(9):1263–84.
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57.
Nejad MG, Kashan AH. An effective grouping evolution strategy algorithm enhanced with heuristic methods for assembly line balancing problem. J Adv Manuf Syst. 2019;18(03):487–509.
Liu NT, Salinas J. Machine learning for predicting outcomes in trauma. Shock Inj Inflamm Sepsis Lab Clin Approaches. 2017;48(5):504–10.
Maldonado S, Weber R, Famili F. Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf Sci. 2014;286:228–46.
Liu M, Xu C, Luo Y, Xu C, Wen Y, Tao D. Cost-sensitive feature selection by optimizing F-measures. IEEE Trans Image Process. 2017;27(3):1323–35.
Kovács G. Smote-variants: A python implementation of 85 minority oversampling techniques. Neurocomputing. 2019;366:352–4.
Kovács G. An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets. Appl Soft Comput. 2019;83:105662.
Hussein AS, Li T, Yohannese CW, Bashir K. A-SMOTE: A new preprocessing approach for highly imbalanced datasets by improving SMOTE. Int J Comput Intell Syst. 2019;12(2):1412–22.
Kumar V, Lalotra GS, Sasikala P, Rajput DS, Kaluri R, Lakshmanna K, et al., editors. Addressing binary classification over class imbalanced clinical datasets using computationally intelligent techniques. Healthcare; 2022: MDPI.
Buntine W, Niblett T. A further comparison of splitting rules for decision-tree induction. Mach Learn. 1992;8:75–85.
Zhang H, Singer BH. Recursive partitioning and applications. New York: Springer Science & Business Media; 2010.
Ray S, editor. A quick review of machine learning algorithms. 2019 International conference on machine learning, big data, cloud and parallel computing (COMITCon); 2019: IEEE.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Liaw A, Wiener M. Classification and regression by randomForest. R news. 2002;2(3):18–22.
Janitza S, Hornung R. On the overestimation of random forest’s out-of-bag error. PLoS ONE. 2018;13(8):e0201904.
Vapnik V. The nature of statistical learning theory. New York: Springer science & business media; 1999.
Chen T, Guestrin C, editors. Xgboost: A scalable tree boosting system. Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. 2016.
Deif MA, Solyman AA, Alsharif MH, Uthansakul P. Automated triage system for intensive care admissions during the COVID-19 pandemic using hybrid XGBoost-AHP approach. Sensors. 2021;21(19):6379.
AL-Shatnwai AM, Faris M. Predicting customer retention using XGBoost and balancing methods. Int J Adv Comput Sci Appl. 2020;11(7):704–12.
Ali H, Salleh MNM, Saedudin R, Hussain K, Mushtaq MF. Imbalance class problems in data mining: A review. Indonesian J Electrical Eng Comput Sci. 2019;14(3):1560–71.
Gu Q, Cai Z, Zhu L, Huang B, editors. Data mining on imbalanced data sets. 2008 International Conference on advanced computer theory and engineering; 2008: IEEE.
Pristyanto Y, Pratama I, Nugraha AF, editors. Data level approach for imbalanced class handling on educational data mining multiclass classification. 2018 International Conference on Information and Communications Technology (ICOIACT); 2018: IEEE.
Ghorbani R, Ghousi R. Comparing different resampling methods in predicting students’ performance using machine learning techniques. IEEE Access. 2020;8:67899–911.
Jeatrakul P, Wong KW, Fung CC, editors. Classification of imbalanced data by combining the complementary neural network and SMOTE algorithm. Neural Information Processing Models and Applications: 17th International Conference, ICONIP 2010, Sydney, Australia, November 22-25, 2010, Proceedings, Part II 17. Heidelberg: Springer; 2010.
Han H, Wang W-Y, Mao B-H, editors. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. Advances in Intelligent Computing: International Conference on Intelligent Computing, ICIC 2005, Hefei, China, August 23–26, 2005, Proceedings, Part I 1. Heidelberg: Springer; 2005.
Tang Y, Zhang Y-Q, Chawla NV, Krasser S. SVMs modeling for highly imbalanced classification. IEEE Trans Syst Man Cyber B Cybern. 2008;39(1):281–8.
Almaghrabi FSA. Machine learning methods for predicting traumatic injuries outcomes. United Kingdom: The University of Manchester; 2021.
Ramezankhani A, Pournik O, Shahrabi J, Azizi F, Hadaegh F, Khalili D. The impact of oversampling with SMOTE on the performance of 3 classifiers in prediction of type 2 diabetes. Med Decis Making. 2016;36(1):137–44.
Wu Y, Fang Y. Stroke prediction with machine learning methods among older Chinese. Int J Environ Res Public Health. 2020;17(6):1828.
Wang J, Wang S, Zhu MX, Yang T, Yin Q, Hou Y. Risk prediction of major adverse cardiovascular events occurrence within 6 months after coronary revascularization: machine learning study. JMIR Med Inform. 2022;10(4):e33395.
Ishaq A, Sadiq S, Umer M, Ullah S, Mirjalili S, Rupapara V, et al. Improving the prediction of heart failure patients’ survival using SMOTE and effective data mining techniques. IEEE Access. 2021;9:39707–16.
Saad AI, Omar YM, Maghraby FA. Predicting drug interaction with adenosine receptors using machine learning and SMOTE techniques. IEEE Access. 2019;7:146953–63.
Zheng X. SMOTE variants for imbalanced binary classification: heart disease prediction. Los Angeles: University of California; 2020.
Karajizadeh M, Nasiri M, Yadollahi M, Zolfaghari AH, Pakdam A. Mortality prediction from hospital-acquired infections in trauma patients using an unbalanced dataset. Healthcare Inform Res. 2020;26(4):284–94.
Thara T, Thakul O. Application of machine learning to predict the outcome of pediatric traumatic brain injury. Chin J Traumatol. 2021;24(06):350–5.
Ho SY, Phua K, Wong L, Goh WWB. Extensions of the external validation for checking learned model interpretability and generalizability. Patterns. 2020;1(8):100129.
Arafa A, El-Fishawy N, Badawy M, Radad M. RN-SMOTE: Reduced noise smote based on DBSCAN for enhancing imbalanced data classification. J King Saud Univ Comput Inform Sci. 2022;34(8):5059–74.
Maulidevi NU, Surendro K. SMOTE-LOF for noise identification in imbalanced data classification. J King Saud Univ Comput Inform Sci. 2022;34(6):3413–23.
We would like to thank the vice chancellor for research and technology of Hamadan University of Medical Sciences in Iran. This work is supported by the Vice-Chancellor for Research and Technology of Hamadan University of Medical Sciences, Iran (No. 140104282895).
This study was done with the financial assistance of Hamadan University of Medical Sciences (No. 140104282895).
Ethics approval and consent to participate
This study was approved by the research ethics committee of Hamadan University of Medical Sciences (code IR.UMSHA.REC.1401.261). The study adhered to relevant guidelines and regulations. Written informed consent was obtained from all participants, and from legally authorized representatives for illiterate participants.
Consent for publication
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1: Table A1.
Comparison of the predictive performance of SMOTE-based machine learning methods in terms of 95% confidence intervals of the evaluation criteria on a test data set.
Additional file 2:
Figure A1. The performance comparison of classifiers with SMOTE techniques and without SMOTE in terms of sensitivity and specificity. Figure A2. The performance comparison of classifiers with SMOTE techniques and without SMOTE in terms of Positive Predictive Value and Negative Predictive Value.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Hassanzadeh, R., Farhadian, M. & Rafieemehr, H. Hospital mortality prediction in traumatic injuries patients: comparing different SMOTE-based machine learning algorithms. BMC Med Res Methodol 23, 101 (2023). https://doi.org/10.1186/s12874-023-01920-w
- Imbalanced data
- Machine learning algorithms
- SMOTE family techniques
- Traumatic injuries