Table 1 Overview of simulation process to construct replicated time-to-event datasets

From: Generating high-fidelity synthetic time-to-event datasets to improve data transparency and accessibility

Simulation Methods:

1. Clean and format the original data to be replicated

2. Define the administrative censoring date and create an individual exit variable before assigning the data as time-to-event

3. Assign new factor levels for variables with missing-data groups, generate dummy variables, and generate any non-linear and interaction terms required for the survival model

4. Fit sequentially more complex models for the individual covariates and store the model estimates from which to recover the simulated covariate distributions

5. Fit an all-cause survival model, including between-covariate interactions and time-dependent effects

6. Set a seed and the number of observations for the replica data, then use the stored estimates from each covariate model to generate covariate values in the replica data sequentially, each conditional on the values of covariates earlier in the sequence

7. Recreate any non-linear effects and model interactions that were included in the original survival model

8. Use the post-estimation predict option of the survival model to generate synthetic survival times from the stored survival model estimates, based on individual patient covariate patterns

9. Format the vital status variable and generate the diagnosis date and exit date variables

10. Re-format the vital status variable using the exit date and the administrative censoring date to reconstruct the censoring distribution of the original data

11. Clean and label all simulated variables

12. Assign the simulated data as time-to-event for use in future survival analysis
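
Steps 6 to 10 above can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the table does not name the model families, so a normal model for age, a logistic model for sex, and a Weibull proportional hazards model for survival are assumed here as stand-ins, and every coefficient, variable name, and the function `simulate_replica` itself is hypothetical. Survival times are drawn by inverting the assumed survival function, which stands in for the post-estimation predict step.

```python
import numpy as np


def simulate_replica(n_obs, seed, study_length=10.0, entry_window=8.0):
    """Simulate a replica time-to-event dataset (all estimates hypothetical).

    Times are in years since study start; study_length plays the role of
    the administrative censoring date.
    """
    # Step 6: set a seed and the number of observations.
    rng = np.random.default_rng(seed)

    # Step 6: generate covariates sequentially. Age comes first; sex is then
    # drawn conditional on age through an assumed logistic model.
    age = rng.normal(65.0, 10.0, size=n_obs)
    age_c = (age - 65.0) / 10.0  # centred and scaled age
    p_female = 1.0 / (1.0 + np.exp(-(0.2 - 0.1 * age_c)))
    female = rng.binomial(1, p_female)

    # Step 7: recreate the non-linear term and the interaction term.
    age_sq = age_c ** 2
    age_x_female = age_c * female

    # Step 8: draw survival times from an assumed Weibull proportional
    # hazards model, S(t) = exp(-scale * t**shape * exp(lin_pred)),
    # by solving S(t) = u for t with u uniform on (0, 1].
    lin_pred = 0.4 * age_c + 0.05 * age_sq - 0.3 * female + 0.1 * age_x_female
    scale, shape = 0.05, 1.2
    u = 1.0 - rng.random(n_obs)  # shift [0, 1) to (0, 1] so log(u) is finite
    surv_time = (-np.log(u) / (scale * np.exp(lin_pred))) ** (1.0 / shape)

    # Step 9: generate diagnosis times, here uniform over an entry window.
    diag_time = rng.uniform(0.0, entry_window, size=n_obs)

    # Step 10: apply administrative censoring at study_length, then derive
    # vital status and exit time from the censored follow-up.
    max_follow_up = study_length - diag_time
    dead = (surv_time <= max_follow_up).astype(int)
    obs_time = np.minimum(surv_time, max_follow_up)
    exit_time = diag_time + obs_time

    return {
        "age": age,
        "female": female,
        "diag_time": diag_time,
        "exit_time": exit_time,
        "obs_time": obs_time,
        "dead": dead,
    }


replica = simulate_replica(n_obs=1000, seed=2024)
```

Fixing the seed makes the replica reproducible, and because censoring is applied relative to each simulated diagnosis time, patients diagnosed later have shorter potential follow-up, as they would under a single administrative censoring date.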