Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study

Table 1 Data structure for the breast cancer dataset and associated means and standard deviations (SDs) after suitable transformation

Covariate	Variable type	Groupings/Measurement	Label	X	Mean (SD)
Age	Continuous	Years	Age	X₁	53.05 (10.12)
Lymph nodes	Continuous	Number of nodes	LN	X₂	1.16 (0.94)
Progesterone receptor	Continuous	fmol	PGR	X₃	3.35 (1.93)
Oestrogen receptor	Continuous	fmol	ER	X₄	3.35 (1.84)
Hormonal treatment	Binary	1 = Yes, 0 = No	TRT	X₅	0.36 (0.48)
Menopausal status	Binary	0 = Pre, 1 = Post	MENO	X₆	0.58 (0.49)
Tumour group	Binary	0 = Grade I, 1 = Grade II/III	TG	X₇	0.88 (0.32)
Tumour size	Continuous variable categorised	1 = ≤20 mm, 2 = 21-30 mm, 3 = >30 mm	TS	X₈	3.27 (0.46)

Note: Data from the breast cancer dataset for X₂ and X₈ were log transformed; X₃ and X₄ were transformed using log(X+1).
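The transformations in the note can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes natural logarithms (the note does not state the base), the function names are hypothetical, and the input values are made up purely for demonstration and are not taken from the breast cancer dataset.

```python
import math
import statistics


def log_transform(values):
    """Natural-log transform, as assumed here for X2 (LN) and X8 (TS).

    Requires strictly positive inputs.
    """
    return [math.log(v) for v in values]


def log_plus_one_transform(values):
    """log(X + 1) transform, as described for X3 (PGR) and X4 (ER).

    Adding 1 keeps zero-valued measurements defined after the transform.
    """
    return [math.log1p(v) for v in values]


def mean_sd(values):
    """Return (mean, sample standard deviation) of the transformed values."""
    return statistics.mean(values), statistics.stdev(values)


# Illustrative, made-up receptor values in fmol (NOT from the study data).
example_pgr = [0.0, 12.0, 35.0, 110.0, 480.0]
transformed = log_plus_one_transform(example_pgr)
pgr_mean, pgr_sd = mean_sd(transformed)
print(f"{pgr_mean:.2f} ({pgr_sd:.2f})")
```

The log(X+1) form is a natural choice for the receptor measurements if some of them are zero, because a plain log would be undefined for those observations.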

ISSN: 1471-2288