Table 2 Description of ML methods

From: Application of machine learning in predicting survival outcomes involving real-world data: a scoping review

| Method | Basic Concept | How It Works | Pros | Cons |
| --- | --- | --- | --- | --- |
| Random Survival Forest | An ensemble tree-based learning algorithm specialized for survival analysis | Trains multiple decision trees on different subsets of the data and averages their predictions; time-to-event data are used to split nodes and generate survival curves | Handles large, high-dimensional datasets; automatically captures feature interactions; robust to outliers | Training can be slow on very large datasets; may overfit without careful tuning |
| Boosted Tree | An ensemble tree-based method that combines weak predictors into a strong predictor | Trains simple models sequentially; each new tree tries to correct the mistakes of the previous ones | Handles different types of data; reduces bias and variance; highly accurate | Can overfit if too many trees are used; requires careful tuning; less interpretable |
| Artificial Neural Network | A model inspired by the human brain, with layers of interconnected nodes, or "neurons" | Each neuron receives input from the previous layer, applies a transformation, and passes its output to the next layer; learning consists of updating the transformation parameters | Can model complex nonlinear relationships; highly flexible and adaptable | Requires large amounts of data and computation; hard to interpret; prone to overfitting |
| Support Vector Machine | A binary classification method that finds the hyperplane maximizing the margin between classes | Finds the hyperplane that maximizes the distance to the closest points of each class; kernels allow nonlinear decision boundaries | Effective in high-dimensional spaces; relatively robust to overfitting, especially when features outnumber samples | Scales poorly to large datasets; requires careful choice of kernel; not directly applicable to multi-class problems |
| Regularization (LASSO, Ridge) | Linear models with penalty terms added to the loss function to prevent overfitting | LASSO (L1 regularization) and Ridge (L2 regularization) add penalty terms that shrink coefficients towards zero; LASSO can set some coefficients exactly to zero, performing feature selection | Prevents overfitting; reduces model complexity | May lead to underfitting if the regularization parameter is not tuned correctly |
| K-Nearest Neighbor | A simple algorithm that predicts from the k closest training examples | For a new instance, finds the k nearest instances in the training set and predicts from their outputs | Simple to understand and implement; makes no assumptions about the data distribution | Computationally expensive for large datasets; sensitive to irrelevant features; performance depends on the choice of k |
| Multi-Layer Perceptron | A type of artificial neural network with one or more hidden layers | A feed-forward neural network whose hidden layers apply successive nonlinear transformations to the input | Can model complex nonlinear relationships; flexible and adaptable | Requires large amounts of data and computation; hard to interpret; prone to overfitting |
| Naive Bayes | A probabilistic classifier based on Bayes' theorem with strong (naive) independence assumptions between features | Each feature contributes independently to the probability of each class; the class with the highest posterior probability is chosen | Fast and efficient; performs well in high dimensions; needs relatively little training data | Assumes feature independence, which often does not hold; can be biased if a class is under-represented in the training data |

Minimal, illustrative code sketches of each method follow the table.
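Random Survival Forest: a minimal sketch of fitting such a model, assuming the scikit-survival package is available. The synthetic data, the dependence of survival time on the first feature, and the ~70% event rate are all arbitrary choices for illustration, not part of the reviewed studies.

```python
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
# Hypothetical data-generating process: expected survival time
# grows with the first feature; remaining features are noise.
time = rng.exponential(scale=np.exp(X[:, 0]))
event = rng.random(n) < 0.7  # ~70% of subjects experience the event
y = Surv.from_arrays(event=event, time=time)

rsf = RandomSurvivalForest(n_estimators=100, min_samples_leaf=15, random_state=0)
rsf.fit(X, y)

print("risk scores:", rsf.predict(X[:3]))        # higher score = worse prognosis
surv_fns = rsf.predict_survival_function(X[:3])  # per-subject survival curves
```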
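Boosted Tree: a sketch using scikit-learn's GradientBoostingClassifier on synthetic data. The `staged_predict` loop illustrates the sequential nature of boosting, with accuracy typically improving as trees are added; the dataset and hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each new tree is fit to correct the errors of the current ensemble
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbm.fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after each added tree
for i, y_hat in enumerate(gbm.staged_predict(X_te)):
    if i % 50 == 0:
        print(f"trees={i + 1:3d}  accuracy={accuracy_score(y_te, y_hat):.3f}")
```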
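Artificial Neural Network: a from-scratch NumPy sketch of the neuron-level computation described in the table. The weights here are random and untrained; the point is only to show that each neuron computes a transformation of the previous layer's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# A toy two-layer network: each neuron computes activation(w . x + b)
x = rng.normal(size=3)                         # input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output layer: 1 neuron

h = relu(W1 @ x + b1)   # hidden neurons transform the input
y_hat = W2 @ h + b2     # output neuron combines the hidden activations
print(y_hat)

# "Learning" means adjusting W1, b1, W2, b2 to reduce a loss,
# typically via backpropagation and gradient descent.
```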
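Support Vector Machine: a sketch with scikit-learn's SVC and an RBF kernel on synthetic data. `decision_function` returns the signed distance to the separating hyperplane in the kernel-induced space; the kernel and `C` value are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# RBF kernel yields a nonlinear boundary; C trades margin width
# against margin violations on the training data
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_tr, y_tr)

print("test accuracy:", svm.score(X_te, y_te))
print("margins:", svm.decision_function(X_te[:5]))
```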
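Regularization (LASSO, Ridge): a sketch contrasting the two penalties with scikit-learn on synthetic regression data with only 5 informative features out of 20. The `alpha=1.0` penalty strength is an arbitrary example value; in practice it would be tuned, e.g. by cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# alpha controls penalty strength; too large a value causes underfitting
lasso = Lasso(alpha=1.0).fit(X, y)  # L1: can zero out coefficients entirely
ridge = Ridge(alpha=1.0).fit(X, y)  # L2: shrinks coefficients but rarely zeros them

print("LASSO non-zero coefficients:", np.sum(lasso.coef_ != 0))
print("Ridge non-zero coefficients:", np.sum(ridge.coef_ != 0))
```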
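K-Nearest Neighbor: a sketch showing how the choice of k affects performance, using scikit-learn on synthetic data. The values of k compared are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# Small k gives noisy, overfit boundaries; large k gives overly smooth ones
for k in (1, 5, 25):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"k={k:2d}  cross-validated accuracy={score:.3f}")
```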
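Multi-Layer Perceptron: a library-level sketch with scikit-learn's MLPClassifier, complementing the from-scratch neural network sketch above. The two hidden layers of 32 and 16 neurons are an arbitrary architecture for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Hidden layers of 32 and 16 neurons apply successive nonlinear transformations
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                  max_iter=1000, random_state=0),
)
mlp.fit(X_tr, y_tr)
print("test accuracy:", mlp.score(X_te, y_te))
```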
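Naive Bayes: a sketch with scikit-learn's GaussianNB, which models each feature's likelihood independently per class with a Gaussian; the synthetic dataset is illustrative. `predict_proba` exposes the posterior probabilities from which the highest-probability class is chosen.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each feature contributes independently to each class's likelihood;
# prediction picks the class with the highest posterior probability
nb = GaussianNB().fit(X_tr, y_tr)
print("test accuracy:", nb.score(X_te, y_te))
print("posteriors for first test case:", nb.predict_proba(X_te[:1]))
```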