Use of a consistent estimator does not guarantee the absence of bias in finite samples. We only showed that MI analysis of case-cohort data provides unbiased estimates of the log-hazard ratio when the imputation model and the proportional hazard model are correctly specified. Misspecification of the imputation model can originate from an erroneous choice of distribution, from wrongly assuming that the estimators of the imputation model parameters are consistent and normally distributed, or from the omission of an important explanatory variable. Imputations carried out under a misspecified distribution can yield biased estimates of hazard ratios, especially if the specified distribution of the phase-2 variable differs from the true one in terms of symmetry (log-normal versus normal distribution). The negative bias on a log hazard ratio of 0.69 was noticeable but not large when a log-normal variable was imputed according to a normal distribution (-0.09, or -13%), but this type of misspecification is easily identified with diagnostic tools [20]. One can then transform the incomplete variable to obtain a symmetrical distribution, impute the transformed values, and apply the inverse transformation to the imputed values. Note that although a normal and a uniform distribution are quite different, both are symmetrical, and the observed bias was much smaller (only 5%).

In the 3C study of the relationship between VaD and D-dimer, we observed slightly different estimates of the log hazard ratio when comparing the third to the first tertile (2.77 versus 2.93, i.e., a relative difference of 8% between the MI and the weighted estimates). This is probably due to the qualitative imputation of D-dimer and, thus, the use of a multinomial imputation model, which implied estimating parameters in separate strata defined by D-dimer concentration tertiles, some of which had a small number of events.
Due to these small numbers (only 51 VaD cases in total), asymptotic conditions might not have been fulfilled in at least some strata, and the estimated coefficients of the imputation model could have been biased and not normally distributed. We give below some recommendations regarding the choice of explanatory variables in the imputation model. Since a potential bias of the MI estimates can be detected by comparing them to the weighted estimates, we suggest building the proportional hazard model using only the case-cohort data and weighted estimators. MI can then be used to reanalyze the data with the selected model in order to improve the precision of the results, while verifying that no bias is introduced.
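The transform-impute-back-transform recipe described above can be sketched as follows. This is a minimal illustration on synthetic data, using a crude marginal normal model on the log scale; all variable names are hypothetical, and a real imputation model would condition on the outcome and the other covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: a log-normally distributed phase-2 variable,
# measured only on a subsample (as in a case-cohort design).
n = 1000
x = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # true values
observed = rng.random(n) < 0.3                  # measured subsample
x_obs = x[observed]

# Step 1: transform the observed values to (approximate) normality.
z_obs = np.log(x_obs)

# Step 2: impute on the transformed scale with a normal model
# (marginal model here, for illustration only).
mu, sigma = z_obs.mean(), z_obs.std(ddof=1)
n_missing = int((~observed).sum())
z_imp = rng.normal(mu, sigma, size=n_missing)

# Step 3: apply the inverse transformation to the imputed values,
# which restores the skewed scale of the original variable.
x_imp = np.exp(z_imp)
```

Imputing on the symmetrical (log) scale avoids the bias described above for a normal imputation model applied directly to a skewed variable; the imputed values are guaranteed positive and follow the log-normal shape of the observed data.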

In the simulated data, the precision of the MI and full-cohort estimates of the phase-1 variable effects was similar, and greater than that of the weighted estimator. For the phase-2 variable, MI estimates were slightly more precise than weighted estimates. Overall, the mean squared errors were smaller with MI than with the weighted estimator, with one exception involving a normal imputation model for a log-normally distributed phase-2 variable, an error that should easily be avoided.

There is no standard method for estimating the predictive ability of a model in the framework of case-cohort surveys. We showed that naive application of the *C* index to case-cohort data underestimated the predictive ability of the model, with an underestimation that depended on the subcohort size when the phase-2 variable had an effect on the risk. Similarly, the naive estimates of the predictive ability of an added phase-2 variable differed notably from the full-cohort values when the effect of the phase-2 variable was not null. Harrell's *C* index could theoretically be estimated with a weighted approach, but this can be computationally demanding because it requires weighting each pair by the pairwise sampling probabilities, i.e., handling a matrix of *N*'(*N*'-1) entries, where *N*' is the size of the case-cohort sample. Computing the variance of this Horvitz-Thompson estimator requires either weighting each quadruplet by the quadruplet-wise sampling probabilities, i.e., working with an array of *N*'(*N*'-1)(*N*'-2)(*N*'-3) entries, or bootstrapping the case-cohort data. By contrast, MI easily allows estimation of the predictive ability of a model, or of an additional phase-2 variable, and of their variances in the context of case-cohort data, requiring bootstrapping only to estimate the variance of the predictive ability of the phase-2 variable. MI provided estimates of Harrell's *C*, NRI and IDI indexes similar to those obtained with the full-cohort analysis. Note, however, that the predictive abilities were always overestimated because the same data were used to estimate the model and its predictive ability.
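For concreteness, the (unweighted) pairwise computation underlying Harrell's *C* can be sketched as below. This is a naive implementation for right-censored data; a weighted case-cohort version would, as discussed above, additionally divide each pair's contribution by its pairwise sampling probability. The example data are purely illustrative.

```python
import numpy as np

def harrell_c(time, event, risk):
    """Naive Harrell C index for right-censored data.

    A pair (i, j) is usable when subject i has an event and subject j
    is still at risk afterwards (time[j] > time[i]); pairs with tied
    times are ignored in this sketch.  Among usable pairs, count the
    proportion where the subject failing earlier has the higher risk
    score, counting ties in the risk score as 1/2.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored subjects cannot be the earlier member
        for j in range(len(time)):
            if time[j] > time[i]:
                den += 1.0
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

# Toy example: risk scores perfectly ordered with event times.
t = [2.0, 4.0, 6.0, 8.0, 10.0]
e = [1, 1, 0, 1, 0]
r = [5.0, 4.0, 3.0, 2.0, 1.0]
c = harrell_c(t, e, r)  # -> 1.0 (perfect concordance)
```

The O(*N*'²) double loop makes the computational burden of the pairwise-weighted estimator, and a fortiori of its quadruplet-based variance, easy to appreciate.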

Analysis of the Three-City case-cohort study was in agreement with our previous work [10]. The weighted and MI approaches yielded similar estimates of the hazard ratios, and MI was slightly more precise, particularly for the phase-1 variables. The relative difference between the two estimates was always below 2% for the hazard ratios related to CHD and D-dimer but, as discussed earlier, could be slightly higher (8%) for a hazard ratio related to VaD and D-dimer. The precision was similar for both analyses.

The imputation model must reflect the association between the incomplete variable, the outcome, and the other explanatory variables. Therefore, the variables included in the proportional hazard model, as well as the stratification variables, must be included in the imputation model. If a surrogate of the phase-2 variable is available, it should also be included in the imputation model. On the other hand, the multiple imputation approach can provide unbiased and more efficient estimates than the weighted analysis even when no strong predictor of the phase-2 variable is available [10]. Including additional variables that are not strongly predictive can, however, increase the inter-imputation variance. This prompted the use of different imputation models for D-dimer levels in the CHD and VaD analyses. However, we verified that adding the variables used only in the CHD analysis to the model used for VaD did not modify the results of the latter (data not shown).
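The structure recommended above — an imputation model for the phase-2 variable conditioning on the phase-1 covariates, the stratification variable, the outcome, and a surrogate when one is available — can be sketched as follows. All variable names and coefficients here are hypothetical, the survival outcome is summarized crudely by the event indicator and log follow-up time, and a proper MI implementation would also draw the model parameters from their posterior to propagate parameter uncertainty.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical synthetic data (names are illustrative, not from the study).
n = 500
age = rng.normal(70.0, 5.0, n)             # phase-1 covariate
stratum = rng.integers(0, 2, n)            # stratification variable
event = rng.integers(0, 2, n)              # outcome indicator
log_t = np.log(rng.uniform(1.0, 10.0, n))  # (log) follow-up time
surrogate = rng.normal(0.0, 1.0, n)        # surrogate of the phase-2 variable

# Design matrix of the imputation model: covariates, stratification
# variable, outcome information, and the surrogate.
X = np.column_stack([np.ones(n), age, stratum, event, log_t, surrogate])
phase2 = X @ np.array([0.5, 0.01, 0.2, 0.3, 0.1, 0.8]) + rng.normal(0.0, 1.0, n)
in_phase2 = rng.random(n) < 0.3            # subjects with the variable measured

# Fit a normal imputation model on the complete records ...
beta, *_ = np.linalg.lstsq(X[in_phase2], phase2[in_phase2], rcond=None)
resid = phase2[in_phase2] - X[in_phase2] @ beta
sigma = resid.std(ddof=X.shape[1])

# ... and impute the missing values with a random draw around the
# predicted mean (residual noise only, in this simplified sketch).
miss = ~in_phase2
imputed = X[miss] @ beta + rng.normal(0.0, sigma, int(miss.sum()))
```

Dropping the outcome terms or the surrogate from `X` is exactly the kind of omission of an important explanatory variable that can bias the MI estimates, while adding weakly predictive columns mainly inflates the inter-imputation variance.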

The number of imputations required depends on the proportion of missing information which, in case-cohort studies, is considerably smaller than the percentage of incompletely observed subjects. Rubin showed that with as much as 40 per cent of the information missing, M = 5 imputations provide an asymptotic relative efficiency of 0.97 and, with 50 per cent missing information, M = 10 provides an asymptotic relative efficiency of 0.98. Thus, a small number of imputations, 5-10, should suffice [21]. In our analyses, we used 5 imputations to limit the computation time of the simulations, a reasonable choice since the proportion of missing information was always smaller than 30 per cent. However, a slightly larger number of imputations (e.g. 10) could have been performed on the 3C study data at a reasonable time cost; it would have provided a more precise estimate of the between-imputation variance and of the percentage of missing information.
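The between-imputation variance mentioned above enters Rubin's combining rules, and the efficiency figures can be reproduced from the standard asymptotic formula. The sketch below assumes the relative efficiency is expressed on the standard-error scale, which is consistent with the values quoted for M = 10; both functions are generic illustrations rather than code from the study.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Rubin's rules for M completed-data analyses: pooled estimate,
    within-imputation variance W, between-imputation variance B, and
    total variance T = W + (1 + 1/M) * B."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()            # pooled point estimate
    w = u.mean()               # within-imputation variance
    b = q.var(ddof=1)          # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b
    return qbar, w, b, t

def relative_efficiency(gamma, m):
    """Asymptotic relative efficiency (standard-error scale) of M
    imputations versus infinitely many, for a fraction gamma of
    missing information: (1 + gamma/M)^(-1/2)."""
    return (1.0 + gamma / m) ** -0.5

# With 50 per cent missing information, M = 10 gives about 0.98.
re = relative_efficiency(0.5, 10)
```

A larger M shrinks the `(1 + 1/M) * B` inflation term in T and yields a more stable estimate of B itself, which is why 10 rather than 5 imputations would have given a more precise estimate of the between-imputation variance.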

The VaD risk increased with D-dimer tertiles. However, the inclusion of D-dimer did not significantly improve the predictive ability of the model for VaD risk. Computation of the *C* and IDI indexes yielded the same conclusion. To our knowledge, no other results concerning the predictive ability of D-dimer for the risk of VaD have been published to date. The risk of CHD did not vary with D-dimer, so, not surprisingly, the predictive ability of this variable was negligible, regardless of the index used. Wang *et al.* [22] and Tzoulaki [23] reported that the use of 10 and 4 biomarkers, respectively, added only moderately to the overall risk prediction based on conventional cardiovascular risk factors.