Sample
The Victorian Adolescent Health Cohort Study (VAHCS) is a repeated measures cohort study of health in adolescents (waves 1 to 6) and young adults (waves 7 to 9), which was conducted between 1992 and 2008. The original sample of 1943 participants was randomly sampled from schools in Victoria, Australia, when they were aged 14 – 15 years. Data collection protocols were approved by The Royal Children’s Hospital’s Ethics in Human Research Committee. For further details on the cohort, see Reference [16].
Target analysis
The target analysis in the current study was a summary of minor psychiatric illness, measured by the General Health Questionnaire (GHQ) [17] at wave 8 (age approximately 24 years), and the association between GHQ at wave 8 and the likelihood of a person continuing to live in the family home at wave 9 (at approximately 29 years).
The exposure of interest, the GHQ, is a 12-item questionnaire that was developed to measure minor psychiatric illness in the community [17]. Each of the 12 items in the GHQ screens for a symptom that is indicative of psychological distress and has four response options that reflect the increasing degree to which the participant has experienced the symptom. An example of a question in this scale is: “Have you lost much sleep over worry?” with the possible responses being: “not at all/no more than usual/rather more than usual/much more than usual”.
The GHQ can be scored using three different methods, as described by Donath [18]: the Likert, standard and CGHQ scores. The Likert scoring method (possible range 0–36) has a scoring pattern of 0123 for each of the items, with 3 representing the most extreme presence of the symptom. The total of the Likert scores provides a measure of the severity of psychological distress. The standard scoring method (possible range 0–12) has a scoring pattern of 0011 for each item, with the last two responses indicating presence of the symptom; the total measures psychological distress as a count of the items with a positive response. The CGHQ scoring (possible range 0–12) is an adaptation of the standard scoring method, with the positively worded items scored 0011 as in the standard scoring method, and the negatively worded items, such as the example above, allocated a scoring pattern of 0111. This latter approach was developed to capture the possible presence of symptoms associated with the response “no more than usual” [19].
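The three scoring patterns can be illustrated with a minimal Python sketch. It assumes responses are coded 0 to 3 in questionnaire order. The set of negatively worded item positions (NEGATIVE_ITEMS) is not given in the text and is included here only as an illustrative assumption, as are all function and variable names.

```python
# Scoring patterns per response option (coded 0..3 in questionnaire order,
# "not at all" ... "much more than usual"), as described in the text.
LIKERT = (0, 1, 2, 3)
STANDARD = (0, 0, 1, 1)
CGHQ_NEGATIVE = (0, 1, 1, 1)

# ASSUMPTION: 0-based positions of the negatively worded items. The text does
# not list them; this set is illustrative and should be checked against the
# questionnaire actually used.
NEGATIVE_ITEMS = frozenset({1, 4, 5, 8, 9, 10})


def score_ghq(responses, negative_items=NEGATIVE_ITEMS):
    """Return (likert, standard, cghq) totals for 12 responses coded 0-3."""
    if len(responses) != 12 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected 12 responses, each coded 0-3")
    likert = sum(LIKERT[r] for r in responses)
    standard = sum(STANDARD[r] for r in responses)
    # CGHQ: negatively worded items use 0111, all other items use 0011.
    cghq = sum(
        (CGHQ_NEGATIVE if i in negative_items else STANDARD)[r]
        for i, r in enumerate(responses)
    )
    return likert, standard, cghq
```

Under these assumptions, a participant answering “no more than usual” to every item contributes nothing to the standard score but is counted on each negatively worded item under CGHQ scoring, which is the distinction the CGHQ method was designed to capture.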
The outcome of interest in the target analysis was a binary indicator of whether the participants lived at their parent(s)’ home at wave 9, as determined from a direct question in the questionnaire administered at this wave.
A third variable used in this simulation study was GHQ measured at wave 9; this was a four-level categorical variable derived from the Likert scoring method, with categories of 0–5 (low), 6–8 (moderate), 9–11 (high) and 12–36 (very high). This variable was included as a complete auxiliary variable in the imputation model, as it is correlated with GHQ score at wave 8. To ensure that variation in the scaling of this auxiliary variable did not confound the results, the same categorical variable for GHQ at wave 9 was used in all imputation models, regardless of the scoring method of GHQ at wave 8.
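As a small illustration, the four-level categorisation of the wave 9 Likert score could be coded as follows. The function name is hypothetical; the cut-points are those stated above.

```python
def ghq9_category(likert_score):
    """Map a Likert GHQ total (0-36) to the four-level auxiliary category."""
    if not 0 <= likert_score <= 36:
        raise ValueError("Likert GHQ total must lie between 0 and 36")
    if likert_score <= 5:
        return "low"
    if likert_score <= 8:
        return "moderate"
    if likert_score <= 11:
        return "high"
    return "very high"
```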
For the purposes of this paper, we restricted our analysis to females with complete data on the exposure, outcome and auxiliary variable, resulting in a sample size of 714 participants.
Due to the steps involved in the simulation study (described below), particularly the reduction of the dataset to complete data and the omission of key confounders from the analysis, the reported results are not intended to realistically address the substantive question about the association between mental health and living at home in young adulthood.
Simulation method
The method for this simulation study is based on that used by Brand et al. [20] and described further by van Buuren [21]. We start with a sample that has complete data and simulate the missing data process by repeatedly setting a proportion of the data to missing.
We examined the imputation methods under both MCAR and MAR missingness conditions. For MCAR, missing values were randomly imposed for approximately 33% of values in the GHQ at wave 8. For the MAR condition, values were set to missing depending on the binary outcome (living at home at wave 9) and the 4-level ordinal auxiliary variable (GHQ at wave 9), with a probability determined by the logistic regression model:
\operatorname{logit}\Pr\left(\text{missing}\right) = \alpha + \beta_{1}\,\text{Living} + \beta_{2}\,\text{GHQ9}_{\text{moderate}} + \beta_{3}\,\text{GHQ9}_{\text{high}} + \beta_{4}\,\text{GHQ9}_{\text{very high}}
(1)
where Living is an indicator of living at home at wave 9 and GHQ9_{moderate}, GHQ9_{high} and GHQ9_{very high} represent indicators for moderate, high and very high GHQ at wave 9. We fixed the coefficients of this logistic regression to be β_{1} = 1.25 (corresponding to an odds ratio [OR] of 3.5), β_{2} = 0.2 (OR = 1.22), β_{3} = 0.3 (OR = 1.35) and β_{4} = 0.4 (OR = 1.5), which represent modest but potentially realistic relationships between these variables and missingness. The value of α was chosen empirically in order to produce missing values in approximately 33% of cases.
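The MAR mechanism in equation (1) can be sketched in Python as below. The coefficients are those given in the text. The text says only that α was chosen empirically; the bisection search used here is an assumed stand-in for that step, and the toy records are fabricated for illustration and are not VAHCS data. All names are hypothetical.

```python
import math
import random

# Coefficients fixed in the text (log odds ratios); "low" is the reference.
BETA_LIVING = 1.25
BETA_GHQ9 = {"low": 0.0, "moderate": 0.2, "high": 0.3, "very high": 0.4}


def missing_probability(alpha, living, ghq9):
    """Pr(missing) for one participant under the logistic model."""
    linear = alpha + BETA_LIVING * living + BETA_GHQ9[ghq9]
    return 1.0 / (1.0 + math.exp(-linear))


def calibrate_alpha(records, target=0.33, tol=1e-6):
    """Find alpha so that the mean Pr(missing) over records equals target.

    ASSUMPTION: bisection is only one possible way to choose alpha; the
    source does not describe the procedure actually used.
    """
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        mean_p = sum(missing_probability(mid, l, g) for l, g in records) / len(records)
        if mean_p < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def impose_mar(records, alpha, rng):
    """Return one flag per record: True where GHQ at wave 8 is set to missing."""
    return [rng.random() < missing_probability(alpha, l, g) for l, g in records]


# Toy usage with fabricated (living, ghq9) records.
rng = random.Random(0)
toy_records = [(rng.randint(0, 1), rng.choice(list(BETA_GHQ9))) for _ in range(714)]
toy_alpha = calibrate_alpha(toy_records)
toy_missing = impose_mar(toy_records, toy_alpha, rng)
```

The MCAR condition corresponds to replacing missing_probability with a constant of about 0.33 for every participant.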
For each scoring method of the GHQ at wave 8 and both missingness scenarios, we conducted the following steps N = 1000 times:

Missingness was generated in the complete dataset as described above.

The following imputation methods were used with m = 20 imputations performed for each procedure:
– Linear regression imputation (applied using the Stata command: mi impute regress) with no post-imputation rounding.
– Linear regression imputation with post-imputation rounding, with the limits specified as 0 (min) and 12 (max) for the CGHQ and standard scoring and 0 (min) and 36 (max) for the Likert scoring.
– Truncated normal regression (carried out using mi impute truncreg), with the lower and upper limits specified as the same limits used for the post-imputation rounding method.
– Predictive mean matching (carried out using mi impute pmm), with the number of nearest neighbour candidates specified as k = 5 [22].

For all imputation analyses, imputation models included the complete outcome variable (living at home at wave 9) and the complete auxiliary variable (GHQ at wave 9, included as a 4-level ordinal variable).

Each of the above methods was also applied to the incomplete GHQ variable transformed using a shifted log transformation (using the lnskew0 command in Stata version 13 [23]). Where relevant, the minimum and maximum limits were specified on the shifted log scale to be equivalent to those on the raw scale.

Target parameters of interest for evaluation of the imputation approaches were the marginal mean of the GHQ at wave 8 and the log odds of living at home at wave 9 given GHQ score at wave 8.
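Two of the imputation steps above lend themselves to a short Python sketch: post-imputation rounding and the donor-matching step of predictive mean matching. This is a simplified illustration and not the Stata implementation. In particular, the rounding rule for exact halves is an assumption, and the matching function takes predicted means as given, whereas a full procedure would typically also draw the regression parameters before predicting. All names are hypothetical.

```python
import math
import random


def round_to_range(value, lower, upper):
    """Round an imputed value to the nearest integer and clamp to [lower, upper].

    ASSUMPTION: exact halves are rounded up; the text does not specify this.
    """
    rounded = math.floor(value + 0.5)
    return min(max(rounded, lower), upper)


def pmm_draw(predicted_missing, donors, k, rng):
    """Pick an observed value from the k donors with the closest predicted mean.

    donors: list of (predicted_mean, observed_value) pairs for complete cases.
    """
    nearest = sorted(donors, key=lambda d: abs(d[0] - predicted_missing))[:k]
    return rng.choice(nearest)[1]
```

Because predictive mean matching only ever returns values that were actually observed, its imputations stay within the range of the scale without any rounding or truncation step.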
Performance measures for evaluating different methods
In order to evaluate these various imputation approaches, we compared our estimated statistics from the simulations to the complete data statistics.
Using the notation of Brand et al. [20] and van Buuren [21], we define Q to be the unknown population parameter of interest, for which we have a complete data point estimate, denoted \widehat{\mathit{Q}}. For each imputation method within one simulated dataset, we obtain an average point estimate across the m imputed datasets, which we denote {\overline{\mathit{Q}}}_{\mathit{m}}. In this simulation design, we consider \widehat{\mathit{Q}} to be both an estimate of Q and an estimand for {\overline{\mathit{Q}}}_{\mathit{m}}. Since we are fixing \widehat{\mathit{Q}}, the performance measures we consider relate to the properties of {\overline{\mathit{Q}}}_{\mathit{m}} under repeated sampling of the missingness (assuming that \widehat{\mathit{Q}} is a valid estimate of Q under repeated sampling of the complete data). We calculated bias (in the restricted sense described) by comparing the average of {\overline{\mathit{Q}}}_{\mathit{m}} over our 1000 simulated datasets (\mathit{E}\left[{\overline{\mathit{Q}}}_{\mathit{m}}\right]) with the complete data estimate (\widehat{\mathit{Q}}):
\text{Bias} = E\left[\overline{Q}_{m}\right] - \widehat{Q}
(2)
To assess the variance estimates from the various imputation approaches under this simulation design, Brand et al. [20] distinguished between the two components of variance, within-imputation and between-imputation, that are estimated and pooled using Rubin’s rules [3] to estimate the total variance of {\overline{\mathit{Q}}}_{\mathit{m}} as an estimate of Q. The within-imputation variance ({\overline{\mathit{U}}}_{\mathit{m}}), which is the average of the squared standard errors of the point estimates derived from each of the m imputed datasets, should produce an unbiased estimate (over repeated sampling of the missingness) of the complete-data variance estimate, denoted U:
\mathit{E}\left[{\overline{\mathit{U}}}_{\mathit{m}}\right]=\mathit{U}
(3)
As with the bias measure, we assess the performance of the within-imputation variance estimates by averaging {\overline{\mathit{U}}}_{\mathit{m}} across the 1000 simulations and comparing the result with U.
The second component of variance is the between-imputation variance, which represents the variability due to missing data [21], and is estimated by B_{m}, the empirical variance of the m estimates of Q obtained across the imputed datasets. On average, this quantity should estimate the actual variability observed in the MI point estimates across the repeated draws of the missingness (i.e. \text{Var}\left(\overline{Q}_{m}\right)), so that the following condition should hold:
\text{Var}\left(\overline{Q}_{m}\right) = \left(1 + m^{-1}\right) E\left[B_{m}\right]
(4)
We therefore assess the performance of the between-imputation variance B_{m} in estimating the actual variability of the estimates {\overline{\mathit{Q}}}_{\mathit{m}}.
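For reference, the pooled quantities used in these measures can be written out using the standard form of Rubin's rules. The per-imputation symbols \widehat{Q}^{(i)} and U^{(i)} (the point estimate and its squared standard error in the i-th imputed dataset) and the total variance symbol T_{m} are notation introduced here for illustration and do not appear in the text.

```latex
\overline{Q}_{m} = \frac{1}{m}\sum_{i=1}^{m}\widehat{Q}^{(i)}, \qquad
\overline{U}_{m} = \frac{1}{m}\sum_{i=1}^{m} U^{(i)}, \qquad
B_{m} = \frac{1}{m-1}\sum_{i=1}^{m}\left(\widehat{Q}^{(i)} - \overline{Q}_{m}\right)^{2}

T_{m} = \overline{U}_{m} + \left(1 + m^{-1}\right) B_{m}
```

With m = 20, the factor 1 + m^{-1} equals 1.05, so the between-imputation term is inflated by 5% to allow for the finite number of imputations.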
The final measure of performance we used is a coverage property based on the proportion of (nominal) 95% confidence intervals that contain \widehat{\mathit{Q}}, the point estimate from the complete data, over repeated draws of the missingness, which we estimate as:
P\left[\overline{Q}_{m} - \sqrt{\left(1 + m^{-1}\right) B_{m}}\; t_{m-1;0.975} \le \widehat{Q} \le \overline{Q}_{m} + \sqrt{\left(1 + m^{-1}\right) B_{m}}\; t_{m-1;0.975}\right]
(5)
This coverage proportion should equal 0.95, with both under- and over-coverage indicating a problem.
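The four performance measures can be computed from the per-simulation results along the following lines. This is a sketch under stated assumptions: the function and key names are hypothetical, the empirical variance of the MI point estimates uses the sample (n − 1) denominator, and the t quantile is passed in by the caller (for m = 20, the 0.975 quantile with 19 degrees of freedom is approximately 2.093).

```python
import math
import statistics


def performance(q_hat, u_complete, q_bar, u_bar, b, m, t_crit):
    """Summarise one imputation method over repeated draws of the missingness.

    q_hat, u_complete: complete-data point estimate and variance estimate.
    q_bar, u_bar, b: per-simulation lists of the MI point estimate, the
        within-imputation variance and the between-imputation variance.
    t_crit: the 0.975 quantile of the t distribution with m - 1 df.
    """
    scale = 1 + 1 / m
    # An interval covers q_hat when q_hat lies within the half-width of q_bar.
    covered = sum(
        abs(q - q_hat) <= t_crit * math.sqrt(scale * b_i)
        for q, b_i in zip(q_bar, b)
    )
    return {
        "bias": statistics.mean(q_bar) - q_hat,
        "mean_within_minus_U": statistics.mean(u_bar) - u_complete,
        "empirical_var_q_bar": statistics.variance(q_bar),
        "scaled_mean_between": scale * statistics.mean(b),
        "coverage": covered / len(q_bar),
    }
```

Comparing empirical_var_q_bar with scaled_mean_between corresponds to checking condition (4), and coverage is the proportion estimated in (5).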
We considered each of the above evaluations of performance for the estimates of the marginal mean of the GHQ measure at wave 8 and the log odds ratio for the association between living at home at wave 9 and GHQ score at wave 8.