The present paper is one of the few to attempt to directly quantify the likely effects of reporting error on disease-exposure associations for any anthropometric or reproductive history variables [23–27]. Purely random errors in reported values bias estimates towards the null, but inflation of estimates is also possible if systematic reporting errors work in opposition to the effects of random errors, or where errors in adjustment factors are correlated with those in the main exposure [23, 33]. For epidemiological analyses, the utility of self-reported exposure data is determined by the magnitudes of these errors, the attendant loss of power, and whether biases in estimates can be corrected either formally or informally. Methods of correction for random and systematic measurement or reporting errors, such as the regression calibration methods of Rosner et al. [44] and later developments thereof, have been used extensively in nutritional epidemiology, where discrepancies between reported and true dietary intakes can be substantial [33, 45], but in few other areas of epidemiology. The regression dilution ratio approach was developed in the context of prospective studies of clinical measurements such as blood pressure [36], which has relatively poor repeatability over time. Regression dilution ratios estimate the same quantity as the regression calibration methods familiar to nutritional epidemiologists [23], and can be applied, as we have done, to general measurement or reporting error problems in non-clinical contexts.
In contrast to statistics for agreement, which are purely descriptive, regression dilution ratios summarise the potential consequences of both random and systematic errors for epidemiological analyses. We found RDRs consistent with slight to moderate attenuation of estimates of disease-exposure associations (RDRs 0.66-0.86) for most quantitative anthropometric and reproductive history variables. A few variables (age at menarche, birth weight and waist-to-hip ratio) had smaller RDRs, consistent with more substantial attenuation of estimates (RDRs 0.44-0.50). For weight (RDR 1.02) and body mass index (RDR 1.04), however, there was little attenuation.
These regression dilution ratios provide a guide to possible effects of reporting error in one particular cohort, although in principle a good estimate of the regression dilution ratio can be used to correct estimates of linear disease-exposure associations in univariate analyses. For example, a regression dilution ratio of 0.5 corresponds to a 50% attenuation of the log relative risk (or other linear coefficient) towards 0. An estimated relative risk of 1.5 per unit self-reported exposure would then, after correction for reporting error, be equal to exp(ln(1.5)/0.5) = 2.25 per unit true exposure.
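The correction in the worked example above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `correct_relative_risk` is hypothetical, and the sketch assumes a linear (log-scale) association in a univariate analysis and a regression dilution ratio that is known without error.

```python
import math


def correct_relative_risk(rr_observed: float, rdr: float) -> float:
    """Rescale a relative risk per unit self-reported exposure to a
    relative risk per unit true exposure.

    Assumes a log-linear association and treats the regression dilution
    ratio (rdr) as a fixed, known constant.
    """
    if rr_observed <= 0:
        raise ValueError("relative risk must be positive")
    if rdr <= 0:
        raise ValueError("regression dilution ratio must be positive")
    # Divide the log relative risk by the RDR, then return to the RR scale.
    return math.exp(math.log(rr_observed) / rdr)


# Worked example from the text: RR 1.5 with an RDR of 0.5.
corrected = correct_relative_risk(1.5, 0.5)
print(round(corrected, 2))  # 2.25
```

Because the sketch treats the regression dilution ratio as fixed, it does not propagate the uncertainty in the estimated ratio into the corrected estimate.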
Regression dilution ratios are not suitable for correcting estimates of non-linear disease-exposure associations, such as the apparent J-shape in the association between BMI and all-cause mortality [27]. In these cases, the means presented in Figure 1 provide a guide to a more objective scale on which to interpret relative risks across categories of these variables. For example, relative risks within categories of BMI or other variables could be plotted against mean measured values. In addition, regression dilution ratios will not reveal situations where self-reported values are not linearly related to the reference values. However, the approximate linearity of each plot in Figure 1 (with the possible exception of the plot for waist-to-hip ratio) indicates that RDRs will provide suitable summaries of the effects of reporting errors across the ranges of each of these variables. Regression dilution ratios and the mean reference values presented in the figures are calculated under the additional assumption that NSHD reference values are unbiased but may be subject to small random errors that are uncorrelated with other quantities of interest. Results for regression calibration methods suggest that even if these assumptions are violated, imperfect adjustment for reporting error is usually better than proceeding with analyses under the false presumption that exposures are self-reported without error [39].
It must also be emphasised that methods of correction for reporting error, including the use of regression dilution ratios, are not robust to other common statistical problems. Poorly assessed outcomes, violations of assumptions underlying statistical methods, and lack of information on confounders, among other issues, can bias estimates, and this bias will remain even after accounting for reporting error.
Systematic and random reporting errors also result in a loss of power to correctly reject false null hypotheses of no effect. Squared correlation coefficients indicate the approximate effective sample sizes, as a proportion of actual sample sizes, due to loss of power [30, 31]. Correlations reported here are consistent with reductions in effective sample sizes of between 9%, for weight, and 73%, for waist-to-hip ratio. Importantly, loss of power due to reporting errors cannot be remedied by correcting estimates using RDRs or similar techniques. The sample size must also be increased, and consequently regression dilution ratios and other methods for accounting for bias due to reporting error will be most useful in large-scale studies, or those that are otherwise well-powered. (Sample size calculations for studies based on self-reported data will still be accurate, however, provided that they are interpreted as sample sizes required to detect the attenuated association between the disease and the self-reported exposure.)
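The relationship between correlation and effective sample size described above can be illustrated with a short Python sketch. The function names and the target sample size of 1000 are hypothetical, and the two correlations are back-calculated from the reductions quoted in the text (9% and 73%) rather than taken from the results tables.

```python
import math


def effective_sample_fraction(correlation: float) -> float:
    """Approximate effective sample size, as a proportion of the actual
    sample size, given the correlation between reported and reference values."""
    if not 0 < abs(correlation) <= 1:
        raise ValueError("correlation must be non-zero and within [-1, 1]")
    return correlation ** 2


def required_actual_sample_size(n_error_free: int, correlation: float) -> int:
    """Actual sample size needed to match the power of an error-free
    study of size n_error_free (rounded up)."""
    return math.ceil(n_error_free / effective_sample_fraction(correlation))


# Correlations implied by the quoted reductions (back-calculated, illustrative).
implied = {
    "weight": math.sqrt(1 - 0.09),
    "waist-to-hip ratio": math.sqrt(1 - 0.73),
}

n_error_free = 1000  # hypothetical target sample size
for variable, r in implied.items():
    n_needed = required_actual_sample_size(n_error_free, r)
    print(f"{variable}: r = {r:.2f}, actual n needed = {n_needed}")
```

Under these assumptions, a variable reported as well as weight requires only a modest increase in sample size, whereas one reported as poorly as waist-to-hip ratio requires a sample several times larger.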
We also found good overall agreement between MWS and NSHD data for quantitative anthropometric and reproductive history variables, particularly for current height, weight and body mass index reported at recruitment. However, consistent with findings of previous studies [2, 11, 12, 15, 16], differences between MWS and NSHD anthropometric data included systematic over-reporting of height, and under-reporting of weight that was more pronounced among heavier individuals. Similar differential under-reporting was observed for self-reported waist and hip circumferences [6, 13], recalled body size variables including childhood body size and birth weight [4, 7–10, 18], and reported body sizes of close relatives [5]. Comparisons between intraclass and Pearson correlations suggested that systematic reporting errors were relatively greater for waist circumference and for the derived waist-to-hip and waist-to-height ratios, than they were for other variables. For both weight and body mass index, the increased under-reporting among heavier individuals explains why their regression dilution ratios are close to 1: this differential under-reporting would inflate estimates of disease-exposure associations, counteracting the attenuation due to random reporting errors. The RDRs for other variables (except height, birth weight, and age at menarche) may also be closer to 1 than would result from random error alone, due to increased under-reporting of each variable in its upper range of values. Differential under-reporting also implies that self-reported anthropometric data are likely to be inadequate for the purposes of clinical assessment, for example when classifying an individual as normal weight, overweight or obese based on their body mass index.
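The way in which differential under-reporting can offset attenuation from random error can be seen in a simple linear error model. The sketch below is an illustration under stated assumptions, not the estimation procedure used in this study: it assumes reported = a + b × true + e, with random error e independent of the true value, and takes the regression dilution ratio to be the slope from regressing true values on reported values. The function name and all numerical inputs are hypothetical.

```python
def regression_dilution_ratio(slope_b: float, var_true: float, var_error: float) -> float:
    """Slope of true on reported values under reported = a + b * true + e.

    slope_b < 1 represents under-reporting that increases with the true value;
    var_error is the variance of the independent random reporting error.
    """
    if var_true <= 0 or var_error < 0:
        raise ValueError("var_true must be positive and var_error non-negative")
    # cov(true, reported) / var(reported)
    return slope_b * var_true / (slope_b ** 2 * var_true + var_error)


# Hypothetical scenarios, with the variance of true values fixed at 1.
scenarios = {
    "no error": (1.0, 0.0),
    "random error only": (1.0, 0.25),
    "differential under-reporting only": (0.9, 0.0),
    "both sources of error": (0.9, 0.09),
}
for label, (b, var_e) in scenarios.items():
    rdr = regression_dilution_ratio(b, 1.0, var_e)
    print(f"{label}: RDR = {rdr:.3f}")
```

In this model, random error alone pulls the ratio below 1, differential under-reporting alone pushes it above 1, and the two together can leave it close to 1, which matches the pattern described above for weight and body mass index.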
Most MWS variables on reproductive history and related factors showed good to moderate agreement with NSHD data. The exception was age at menarche, for which there was poorer agreement between the MWS and NSHD data. This level of agreement was comparable to that found in a recent validation study of recalled age at menarche in a larger subset of NSHD participants, which concluded that age at menarche self-reported in middle age may not be appropriate in a clinical setting, or to estimate risk profiles for associated diseases [22]. Several previous studies have concluded that information on having been breastfed, age at menopause and use of oral contraceptives is recalled with reasonable accuracy [17–21]; however, it is generally advisable to be cautious in the use of data recalled many years after the time of interest [22].
We also compared ordinal body size variables from the MWS, self-reported in middle age (relative body size at age 10, clothes size at age 20 and at recruitment and bra band size at recruitment), with anthropometry from the NSHD collected at the relevant ages (body mass index, waist circumference and chest circumference). Ordinal body size variables from the MWS were moderately to strongly associated with the NSHD variables against which they were compared. Notably, the strength of the relationship between clothes size reported at resurvey and measured waist circumference was comparable to that between reported waist circumference and measured waist circumference. This suggests that for the purposes of epidemiological studies, self-reported clothes size might be at least as good a proxy for waist measurements as self-reported waist circumference. Other studies have found differential systematic error in reported anthropometry in childhood and early adulthood (e.g. again, greater under-reporting of weight by heavier individuals) [3, 4, 10]. For ordinal data, however, it is not possible to assess agreement with anthropometry. Our results focus instead on the strength of the association between ordinal variables and corresponding anthropometry.
We are unaware of any studies which have directly validated self-reported clothes sizes against actual clothes sizes in either men or women, but in men measured trouser-waist size has been found to be highly correlated (r > 0.85) with clinical measurement of waist circumference [46]. Our findings suggest that clothes size might be well-reported by women and be representative of their true body size. Few studies have used clothes sizes as markers of disease risk [14, 46, 47], but the relationships they find are consistent with those for more conventional anthropometry. The mean NSHD values presented by category of clothes size and other ordinal variables (Figure 2) can be used in the interpretation of these relationships on a more objective scale.
Although most variables were validated against measured values or information from other reliable sources, clothes size at age 20 and maternal height were validated against data that was self-reported at the relevant age, and father's height and age at menarche were validated against data reported by proxy. In these cases, despite being collected close to the relevant time the reference NSHD data are not "gold standard". Hence there are two major sources of error: first, in the self-reported or proxy NSHD data, and second, in the self-reported MWS data. Because our results for these variables can at most account only for the second source of error, it is likely that they overestimate, to some degree, the levels of association and agreement between the two studies. Similarly, regression dilution ratios for MWS data on parental heights may underestimate the effects of error in these variables, which is likely to result in greater attenuation of estimates in epidemiological studies.
Other types of error are included within reporting error as assessed here, and should be considered when interpreting statistics for association and agreement, and regression dilution ratios. Survey questions were developed independently for each study. For data that were self-reported in both studies, subtle differences in the wording of questions, and differences in the requested precision of responses, could contribute to disagreement between the studies. The interval between the age at which NSHD data were collected and the age of data collection or referent age for MWS data also varied (e.g. a difference between studies of 2.3 years between the average age of collection of waist and hip measures). These differences may contribute to slightly greater apparent reporting error for some variables than would have been found if the ages could have been matched more closely. Conversely, reporting errors assessed here do not include changes in exposures during follow-up, such as have been observed for blood pressure [24, 36] and may be likely for anthropometric variables including weight. Prospective studies with a long period of follow-up should also assess the contribution of such changes over time to bias in disease-exposure associations [24].
There were few significant associations of reporting errors in the variables considered in this study with childhood social class, educational attainment, adult deprivation and whether the participant's mother was still alive. However, there were more missing values in the lower socio-economic groups, and comparisons may not be generalisable to all subgroups of these factors. Overall comparisons between variables and detailed assessments of between-study differences by socio-economic group may be further limited by small numbers, particularly for age at menopause and variables reported at MWS resurvey. One other study found no association between socio-economic factors and between-study differences in body weight [3], but several studies have found differences in reporting of anthropometry according to sex, age, education or ethnicity [1, 16, 48, 49]. Other than education, we were unable to assess these factors, due to the composition of the cohort. Further investigations of populations including men, or with different distributions of ages, socio-economic factors or ethnicities, will be required to determine whether regression dilution ratios in these other populations are similar to those presented here.
A previous report from the NSHD showed that categorical agreement between age at menarche reported during adulthood and that recorded nearer the time can vary according to educational attainment [22]. As for the other variables, we found no significant associations of quantitative between-study differences with childhood social class or educational attainment for age at menarche. Because age at menarche was reported by proxy, the magnitude and effects of reporting errors could be underestimated, though it seems likely that a participant's mother would have been able to report her daughter's age at menarche with reasonable accuracy at the time she was asked. Also, quantitative NSHD data on age at menarche are limited to women with age at menarche of at most 14-15 years. This limitation could result in exaggerated between-study differences for women reporting older ages at menarche in the MWS. For age at menopause reported at recruitment, because women matched to both studies were at most 55 years old when they joined the MWS, it was not possible to compare MWS ages at menopause greater than 55 years against NSHD data. Agreement between the studies for age at menopause was very high, although this may in part be due to improved recall in the MWS as a result of the very frequent follow-up for age at menopause in the NSHD between the ages of 47 and 54.
The matched participants in this validation study have consented to be part of two prospective cohorts, which suggests potential for self-selection biases in their data. There were few differences, however, in means of quantitative variables or proportions of categorical data between the matched participants and other MWS participants born within 1 year of the NSHD recruitment period, consistent with little additional bias. Nonetheless, the NSHD cohort has been followed since birth and participants are accustomed to providing information about their health and lifestyle, and might therefore be better able to recall information about past health and lifestyle than other women.