From: A systematic review of the clinical application of data-driven population segmentation analysis
Methods# | No. of studies | Advantages | Disadvantages | Notes |
---|---|---|---|---|
Unsupervised Classifications | ||||
 Latent class/profile/transition/growth analysis | 96 | 1. Can handle missing data [75] 2. Goodness-of-fit measures are available to assess model fit and determine the appropriate number of segments (e.g. Akaike Information Criterion, Bayesian Information Criterion, standardized entropy) [57, 58, 59] 3. No need to standardize variables [76] | Can be computationally intensive, especially with datasets that contain thousands of observations [76] | 1. Segmenting variables need to be categorical for latent class analysis, continuous for latent profile analysis, and categorical at multiple time points for latent transition analysis [77] 2. Users need to pre-specify the desired number of segments |
 K-means cluster analysis | 60 | 1. Can deal with very large datasets [45, 78] 2. Able to handle both continuous and categorical variables [79, 80] | 1. Does not guarantee reproducible solutions (each set of specified seed points may yield a different solution) [81] 2. Sensitive to outliers [82, 83] 3. Limited statistical assistance in determining the optimal number of clusters [76] | Users need to pre-specify the desired number of segments |
 Hierarchical analysis | 50 | 1. Stopping rules are readily available (e.g. Duda’s pseudo T-square statistic and Calinski’s pseudo F statistic) to determine ideal cluster solutions [70, 84, 85, 86] 2. Dendrograms offer a simple and comprehensive visual presentation of segmentation solutions [87] 3. Can handle variables of different kinds (e.g. continuous, binary, nominal) | 1. Difficult to handle large datasets (sample size is preferably under 300–400, not exceeding 1000) [88] |  |
Supervised Classification | ||||
 Decision Tree Methods (CHAID/CART) | 10 | 1. Can handle outliers and missing data [89] 2. Computationally fast [90] | Models are based on splits that depend on previous splits; an error made in a higher split will propagate down [90] | Users need to pre-specify dependent (or target) variables |
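
Two points in the table can be made concrete with a short example: K-means solutions depend on the seed points, and a pseudo F statistic can help choose the number of segments. The following is a minimal sketch in Python using only the standard library, not the procedure of any reviewed study. The synthetic three-group data, the function names (`make_blobs`, `kmeans`, `pseudo_f`), and the use of 20 restarts are all assumptions made for illustration. The Calinski pseudo F statistic, which the table lists as a stopping rule for hierarchical analysis, is applied here to K-means solutions purely as an example of a cluster-number criterion.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def make_blobs(seed=0, per_blob=30):
    """Synthetic data (an assumption): three well-separated 2-D groups."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centres for _ in range(per_blob)]


def assign(points, centres):
    """Index of the nearest centre for each point."""
    return [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
            for p in points]


def kmeans(points, k, seed, iters=100):
    """Plain Lloyd's K-means; `seed` determines the initial seed points."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centres)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster ends up empty.
            new_centres.append(mean_point(members) if members else centres[j])
        if new_centres == centres:  # assignments have stabilised
            break
        centres = new_centres
    return assign(points, centres), centres


def pseudo_f(points, labels):
    """Calinski pseudo F: (between SS / (k - 1)) / (within SS / (n - k))."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    overall = mean_point(points)
    between = sum(len(m) * sq_dist(mean_point(m), overall)
                  for m in clusters.values())
    within = sum(sq_dist(p, mean_point(m))
                 for m in clusters.values() for p in m)
    if k < 2 or k >= n or within == 0:
        return 0.0  # statistic is undefined here; treated as 0 in this sketch
    return (between / (k - 1)) / (within / (n - k))


points = make_blobs()

# The number of segments must be pre-specified, so try several values of k.
# For each k, run several seeds and keep the best pseudo F, because a single
# seed may converge to a poor local solution.
scores = {}
for k in range(2, 6):
    scores[k] = max(pseudo_f(points, kmeans(points, k, seed)[0])
                    for seed in range(20))

best_k = max(scores, key=scores.get)
print("pseudo F by k:", {k: round(v, 1) for k, v in scores.items()})
print("k with the highest pseudo F:", best_k)
```

Fixing the seed makes a single run repeatable, but it does not remove the dependence on the starting points, which is why the sketch compares several seeds for each candidate number of segments.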