
Table 1 Methods for fitting individual predictor-specific risk models for members of a test set by combining data from multiple cohorts. All individuals in the training and test cohorts have two predictors, PSA and age, plus any subset (possibly empty) of 10 additional predictors; the full set of 12 predictors is denoted by \(\mathrm{X}\). The set of predictors available for the new individual is denoted by \({\mathrm{X}}^{*}\). All models use logistic regression to predict clinically significant prostate cancer. MICE = multiple imputation by chained equations; BIC = Bayesian information criterion, defined as \(-2\) \(\times\) (maximized log-likelihood) + (number of covariates) \(\times\) log(sample size)

From: Accommodating heterogeneous missing data patterns for prostate cancer risk prediction

| Method | Definition |
| --- | --- |
| Available cases | Pool individual-level data from all cohorts for individuals who have every predictor in \({\mathrm{X}}^{*}\) measured, and fit a model with \({\mathrm{X}}^{*}\) as main effects |
| Iterative BIC selection | Same as available cases, but with iterative stepwise BIC-based model selection to determine the optimal subset of \({\mathrm{X}}^{*}\) and interactions |
| Cohort ensemble | A separate model is fit to each cohort, using the predictors that are available in both the cohort and the new individual |
| Categorization | All individuals in all cohorts are used. Predictors are categorized, with missing as one of the categories, so that the complete list of predictors \(\mathrm{X}\) is used |
| Missing indicator | For each continuous predictor, include an indicator for a missing value and its interaction with the predictor as additional variables in the analysis. Otherwise mostly similar to Categorization |
| Imputation | Impute missing covariates in the training set using MICE. Use mean imputation for missing values at prediction time |
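The data-handling steps behind several of these methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy records, the predictor names `dre` and `prior_biopsy`, and the cut points are all hypothetical, and missing values are assumed to be encoded as `None`. Only the BIC formula is taken directly from the caption.

```python
import math

def bic(max_log_lik, n_covariates, n_samples):
    """BIC as defined in the caption: -2 * logL + k * log(n)."""
    return -2.0 * max_log_lik + n_covariates * math.log(n_samples)

def available_cases(pooled_rows, x_star):
    """Available cases: keep pooled individuals with every predictor in x_star observed."""
    return [row for row in pooled_rows
            if all(row.get(p) is not None for p in x_star)]

def coinciding_predictors(cohort_predictors, x_star):
    """Cohort ensemble: predictors shared by one cohort and the new individual."""
    return sorted(set(cohort_predictors) & set(x_star))

def categorize_with_missing(value, cut_points):
    """Categorization: bin a value, treating a missing value as its own category."""
    if value is None:
        return "missing"
    for i, cut in enumerate(cut_points):
        if value < cut:
            return f"bin{i}"
    return f"bin{len(cut_points)}"

# Hypothetical pooled records from two cohorts (values are made up).
pooled = [
    {"cohort": "A", "psa": 4.2, "age": 61, "dre": 1, "prior_biopsy": None},
    {"cohort": "A", "psa": 6.8, "age": 70, "dre": None, "prior_biopsy": 0},
    {"cohort": "B", "psa": 3.1, "age": 55, "dre": 0, "prior_biopsy": 1},
]

# Predictors available for the new individual (X* in the table).
x_star = ["psa", "age", "dre"]

# Training rows usable for an available-cases model on x_star.
train = available_cases(pooled, x_star)

# Predictors a cohort that lacks "dre" could contribute to an ensemble member.
shared = coinciding_predictors(["psa", "age", "prior_biopsy"], x_star)

# Categorized PSA values, with illustrative cut points at 4 and 10.
psa_categories = [categorize_with_missing(r["psa"], [4.0, 10.0]) for r in pooled]
```

In this sketch a logistic regression would then be fit to `train` using the columns in `x_star`, and candidate models in a stepwise search would be compared by calling `bic` with each model's maximized log-likelihood.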