Table 1 Literature review of key characteristics of previous works for generating longitudinal synthetic health data

From: A method for generating synthetic longitudinal health data

The requirement columns fall under three headings in the original table: Data Structure, Variable Types, and Model Types.

| Title | Cross-sectional (R.1a) | Longitudinal (R.1b) | Variable length sequences (R.2) | Categorical (R.3a) | Continuous (R.3b) | Categories with high cardinality (R.3c) | Outliers removed (R.4) | Missing values present in data (R.5) | Considers all previous information (R.6) | Model informed by clinicians (R.7) |
|---|---|---|---|---|---|---|---|---|---|---|
| Variational Autoencoder Modular Bayesian Networks (VAMBN) for Simulation of Heterogeneous Clinical Study Data [61] | No | Yes | Fixed | Yes | Yes | Yes | N/D | Yes | Yes | No |
| Machine learning for comprehensive forecasting of Alzheimer’s Disease progression [62] | No | Yes | Varied | Yes | Yes | No | N/D | Yes | No | No |
| Design and Validation of a Data Simulation Model for Longitudinal Healthcare Data [63] | No | Yes | Varied | Yes | No | Yes | N/D | No | Yes | No |
| Privacy-Preserving Generative Deep Neural Networks Support Clinical Data Sharing [64] | No | Yes | Fixed | No | Yes | No | Yes | No | Yes | No |
| Analyzing Medical Research Results Based on Synthetic Data and Their Relation to Real Data Results: Systematic Comparison From Five Observational Studies [65] | Yes | No | N/A | Yes | Yes | No | N/D | Yes | N/A | No |
| Synthetic Event Time Series Health Data Generation [66] | Yes | Yes | Fixed | Yes | Yes | No | Yes | No | Yes | No |
| Data-driven approach for creating synthetic electronic medical records [67] | No | Yes | Varied | Yes | Yes | Yes | N/D | N/D | Yes | No |
| Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record [68] | Yes | Yes | Varied | Yes | Yes | Yes | N/D | No | Yes | Yes |
| Real-valued (medical) time series generation with recurrent conditional GANs [69] | No | Yes | Fixed | No | Yes | N/A | Yes | No | Yes | No |
| Generating Multi-label Discrete Patient Records using Generative Adversarial Networks [70] | Yes | No | N/A | Yes | No | No | N/D | Yes | No | No |
| Data Synthesis based on Generative Adversarial Networks [35] | Yes | Yes | Fixed | Yes | Yes | Yes | N/D | N/D | Yes | No |
| Generation and Evaluation of Privacy Preserving Synthetic Health Data [71] | Yes | No | N/A | Yes | Yes | Yes | No | No | No | No |
| Generation of Heterogeneous Synthetic Electronic Health Records using GANs [72] | Yes | No | N/A | Yes | Yes | Yes | Yes | N/D | No | No |
| Generating Electronic Health Records with Multiple Data Types and Constraints [73] | Yes | No | N/A | Yes | Yes | Yes | Yes | N/D | No | No |
| Ensuring electronic medical record simulation through better training, modeling, and evaluation [74] | Yes | No | N/A | Yes | No | Yes | Yes | N/D | No | No |
| Generative Adversarial Networks for Electronic Health Records: A Framework for Exploring and Evaluating Methods for Predicting Drug-Induced Laboratory Test Trajectories [75] | No | Yes | Fixed | No | Yes | N/A | Yes | No | Yes | No |
| Synthesizing electronic health records using improved generative adversarial networks [76] | Yes | No | N/A | Yes | No | No | Yes | N/D | Yes | No |
| Generating Fake Data Using GANs for Anonymizing Healthcare Data [77] | Yes | Yes | Fixed | Yes | Yes | No | Yes | N/D | No | No |
| CorGAN: Correlation-Capturing Convolutional Generative Adversarial Networks for Generating Synthetic Healthcare Records [78] | Yes | No | N/A | Yes | Yes | No | N/D | N/D | N/A | No |
| Generation and evaluation of synthetic patient data [79] | Yes | No | N/A | Yes | Yes | No | No | N/D | N/A | No |
| Generating and Evaluating Synthetic UK Primary Care Data: Preserving Data Utility & Patient Privacy [80] | Yes | No | N/A | Yes | Yes | No | No | N/D | N/A | No |
| SMOOTH-GAN: Towards Sharp and Smooth Synthetic EHR Data Generation [81] | Yes | No | N/A | Yes | Yes | No | Yes | N/D | N/A | No |
| Continuous Patient-Centric Sequence Generation via Sequentially Coupled Adversarial Learning [82] | No | Yes | Varied | No | Yes | N/A | Yes | No | Yes | No |
| Medical Time-Series Data Generation using Generative Adversarial Networks [83] | No | Yes | Varied | Yes | Yes | No | N/D | N/D | No | No |

N/A refers to not applicable, while N/D refers to not described.
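To make the table easier to query, the rows can be encoded as data and filtered by requirement code. The following is a minimal Python sketch under stated assumptions: the names `COLUMNS`, `ROWS`, `TABLE` and `select` are hypothetical and introduced only for illustration, and the cell values are transcribed from Table 1 with columns keyed by requirement code (R.1a to R.7). The filter is a plain string match on cell values; it makes no judgement about whether a given value (for example "Yes" under R.4, outliers removed) is desirable.

```python
from typing import Dict, List

# Requirement codes in the column order of Table 1 (hypothetical encoding).
COLUMNS: List[str] = ["R.1a", "R.1b", "R.2", "R.3a", "R.3b",
                      "R.3c", "R.4", "R.5", "R.6", "R.7"]

# Reference number -> space-separated cell values, transcribed from Table 1.
ROWS: Dict[int, str] = {
    61: "No Yes Fixed Yes Yes Yes N/D Yes Yes No",
    62: "No Yes Varied Yes Yes No N/D Yes No No",
    63: "No Yes Varied Yes No Yes N/D No Yes No",
    64: "No Yes Fixed No Yes No Yes No Yes No",
    65: "Yes No N/A Yes Yes No N/D Yes N/A No",
    66: "Yes Yes Fixed Yes Yes No Yes No Yes No",
    67: "No Yes Varied Yes Yes Yes N/D N/D Yes No",
    68: "Yes Yes Varied Yes Yes Yes N/D No Yes Yes",
    69: "No Yes Fixed No Yes N/A Yes No Yes No",
    70: "Yes No N/A Yes No No N/D Yes No No",
    35: "Yes Yes Fixed Yes Yes Yes N/D N/D Yes No",
    71: "Yes No N/A Yes Yes Yes No No No No",
    72: "Yes No N/A Yes Yes Yes Yes N/D No No",
    73: "Yes No N/A Yes Yes Yes Yes N/D No No",
    74: "Yes No N/A Yes No Yes Yes N/D No No",
    75: "No Yes Fixed No Yes N/A Yes No Yes No",
    76: "Yes No N/A Yes No No Yes N/D Yes No",
    77: "Yes Yes Fixed Yes Yes No Yes N/D No No",
    78: "Yes No N/A Yes Yes No N/D N/D N/A No",
    79: "Yes No N/A Yes Yes No No N/D N/A No",
    80: "Yes No N/A Yes Yes No No N/D N/A No",
    81: "Yes No N/A Yes Yes No Yes N/D N/A No",
    82: "No Yes Varied No Yes N/A Yes No Yes No",
    83: "No Yes Varied Yes Yes No N/D N/D No No",
}

# zip() silently truncates, so check every row has one value per column first.
for ref, cells in ROWS.items():
    if len(cells.split()) != len(COLUMNS):
        raise ValueError(f"Row [{ref}] does not have {len(COLUMNS)} cells")

# Reference number -> {requirement code: cell value}.
TABLE: Dict[int, Dict[str, str]] = {
    ref: dict(zip(COLUMNS, cells.split())) for ref, cells in ROWS.items()
}


def select(criteria: Dict[str, str]) -> List[int]:
    """Return sorted reference numbers whose cells equal every column -> value pair."""
    unknown = set(criteria) - set(COLUMNS)
    if unknown:
        raise KeyError(f"Unknown column(s): {sorted(unknown)}")
    return sorted(
        ref for ref, row in TABLE.items()
        if all(row[col] == value for col, value in criteria.items())
    )


if __name__ == "__main__":
    # Longitudinal works that handle variable-length sequences.
    print(select({"R.1b": "Yes", "R.2": "Varied"}))  # [62, 63, 67, 68, 82, 83]
    # Works whose model was informed by clinicians.
    print(select({"R.7": "Yes"}))  # [68]
```

Storing each row as a single string keeps the transcription visually close to the table, and the length check guards against a mistyped row being silently truncated by `zip`.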