Prognostic models play an important role in the clinical decision making process as they help clinicians to determine the most appropriate management of patients. A good prognostic model can provide an insight into the relationship between the outcome of patients and known patient and disease characteristics [1, 2].

Missing covariate data and censored outcomes are unfortunately common occurrences in prognostic modelling studies [3], which can complicate the modelling process. Multiple imputation (MI) is one approach to handle the missing covariate data that can properly account for the missing data uncertainty [4]. Missing values are replaced with *m* (>1) values to give *m* imputed datasets. Previously, three to five imputations were considered sufficient to give reasonable efficiency provided that the fraction of missing information is not excessive [4]. However, with increased computer capabilities, the limitations on *m* have diminished and therefore it may be more sensible to use 20 [5] or more [6] imputations. The imputation model, used to generate plausible values for the missing data, should contain all variables to be subsequently analysed including the outcome and any variables that help to explain the missing data [7]. Outcome tends to be incorporated into the imputation model by including both the event status, indicating whether the event, i.e. death, has occurred or not, and the survival time, with the most appropriate transformation [7]. Due to censoring, this approach is not exact and may introduce some bias, but should still help to preserve important relationships in the data. The *m* imputed datasets are each analysed using standard statistical methods. The estimates from each imputed dataset must then be combined into one overall estimate together with an associated variance that incorporates both the within and between imputation variability [4]. Rubin [4] developed a set of rules for combining the individual estimates and standard errors (SE) from each of the *m* imputed datasets into an overall MI estimate and SE to provide valid statistical results, which will be described in the methods section. These rules are based on asymptotic theory [4]. It is assumed that complete data inferences about the population parameter of interest (*Q*) are based on the normal approximation
, where
is a complete data estimate of *Q* and *U* is the associated variance for
[4]. In a frequentist analysis,
would be a maximum likelihood estimate of *Q*, *U* the inverse of the observed information matrix and the sampling distribution of
is considered approximately normal with mean *Q* and variance *U* [8]. From a Bayesian perspective,
and associated variance *U* should approximate to the posterior mean and variance of *Q* respectively, under a reasonable complete data model and prior [9]. Inference is based on the large sample approximation of the posterior distribution of *Q* to the normal distribution [8]. With missing data, estimates of the parameters of interest are calculated on each of the *m* imputed datasets to give
with associated variances *U*
_{1},..., *U*
_{
m
}. Provided that the imputation procedure is proper [4], thus reflecting sufficient variability due to the missing data, and samples are large, the overall MI estimate and variance approximate the mean and variance of the posterior distribution of *Q* [6, 8]. The overall MI estimators and confidence intervals would be improved if combined on a scale where the posterior of *Q* is better approximated by the normal distribution [6, 10, 11]. When the normality assumption appears inappropriate for estimates of the parameters of interest, suitable transformations that make the normality assumption more applicable should be considered [4]. In circumstances where transformations cannot be identified, alternative robust summary measures [12], such as medians and ranges, may provide better results than applying Rubin's rules. In the context of prognostic modelling, there are no explicit guidelines for handling estimates of the parameters of interest after MI, such as predicted survival probabilities and assessments of model performance, where it is unclear whether simply applying Rubin's rules is appropriate.

Example techniques and parameters of interest in prognostic modelling studies and the rules currently available for combining estimates after MI are summarised. This paper will then provide guidelines on how estimates of the parameters of interest in prognostic modelling studies can be combined after performing MI. A review of the current practice for combining estimates after MI within published prognostic modelling studies is provided.