Collecting income information on a questionnaire or within time-limited interview situations is not straightforward. This is reflected in its absence from some research instruments, its simplified form in others and, more generally, its relatively high level of missing or improbable responses. In studies where a measure of income is entirely missing, other indicators of socio-economic position, such as social class, educational attainment or small-area-based indicators, are frequently used to approximate the material disadvantage that would have been captured by an income measure. This study has proposed and examined an alternative approach: the estimation of a synthetic measure of individual wages among workers based on detailed occupation groups from a standard occupation classification. While occupation forms a key component of many social-class-based measures, this often involves collapsing detailed occupational categories to such an extent that much ‘information’ is lost. We utilised this ‘information’ to estimate a synthetic occupation-based wage measure and then tested its external validity in relation to the prediction of an often used self-reported general health measure. We observed two main findings. Firstly, the estimates provide independent and additional explanatory power when added to models containing only social class and small-area-based measures of socio-economic position. Secondly, they behaved very similarly to ‘real’, reported measures of both household and individual income when modelling ‘general health’. These findings suggest that occupation may be a useful variable with which to estimate a synthetic measure of wages, and that such estimates may provide a reliable and effective alternative or supplement to the recording of reported income in social and health surveys.
The approach we have taken has a number of advantages, both for datasets that lack an income measure entirely and for those in which income is imprecisely measured. In the former case, our findings appear to support the notion that wage measures a different component of SEP from that captured by social class and small-area poverty or deprivation measures. This suggests that social class and small-area deprivation on their own may not be sufficient to adjust for all socio-economic differences in general health, and certainly not for those differences that are related to income.
It could be argued that the synthetic occupation-based wage estimates also provide a more analytically useful measure of ‘average income’. Research suggests that, of the many aspects of SEP, income is perhaps the component with the greatest degree of short-term variability [7], which means that a traditional cross-sectional survey, collecting data at a single time point, may not capture the underlying information of interest. Because our estimates are closer to an individual’s medium-term average wage given their occupation, they may capture important economic forces more effectively than reported measures of income for a specific period of time. This may explain why our synthetic measure discriminates better (in terms of health) at the lower end of the distribution than reported income does (see Table 4, odds ratios for deciles). At this point in the income distribution, casual employment with more variable wage rates within any period of time will be more common, and a single sample in time may therefore provide a poor estimate of average wage, which is the more important factor in the determination of health.
The methodology can be applied to a wide range of studies or datasets because the estimation models are reasonably parsimonious and require only a record of age, sex and occupation coded within some form of hierarchical or tiered standard classification. In most datasets these variables are unlikely to have substantial numbers of missing values, so the resulting estimates should in turn have few, and mostly ignorable, missing cases. It may also be possible to simplify the model further. Provided a sufficiently large dataset is available (i.e. a sufficient number of cases in each occupation group), the mean wage level within an occupational group may provide as good an estimate of wage as the empirical Bayes estimate used in this study (see Table 2).
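The comparison between a raw occupation-group mean and an empirical Bayes estimate can be illustrated with a minimal sketch. This is not the model fitted in this study: it assumes a simple one-way random-intercept structure with moment-based variance estimates, omits age, sex and the tiers of the occupational classification, and uses hypothetical names (`occupation_wage_estimates`, `example`) and made-up wage values.

```python
from collections import defaultdict


def occupation_wage_estimates(records):
    """Compare raw and shrunken (empirical Bayes style) group means.

    records: iterable of (occupation_code, wage) pairs; the wage scale
    (e.g. log hourly wage) is an assumption of this sketch.
    Returns {occupation_code: (n, raw_mean, shrunken_mean)}.
    """
    by_occ = defaultdict(list)
    for occ, wage in records:
        by_occ[occ].append(float(wage))

    all_wages = [w for ws in by_occ.values() for w in ws]
    total, k = len(all_wages), len(by_occ)
    if k < 2 or total <= k:
        raise ValueError("need at least two groups and some replication")

    grand = sum(all_wages) / total
    raw = {occ: sum(ws) / len(ws) for occ, ws in by_occ.items()}

    # Pooled within-occupation variance.
    ss_within = sum((w - raw[occ]) ** 2
                    for occ, ws in by_occ.items() for w in ws)
    sigma2 = ss_within / (total - k)

    # Between-occupation variance: one-way ANOVA moment estimator,
    # truncated at zero.
    ms_between = sum(len(ws) * (raw[occ] - grand) ** 2
                     for occ, ws in by_occ.items()) / (k - 1)
    n0 = (total - sum(len(ws) ** 2 for ws in by_occ.values()) / total) / (k - 1)
    tau2 = max((ms_between - sigma2) / n0, 0.0)

    estimates = {}
    for occ, ws in by_occ.items():
        n = len(ws)
        denom = tau2 + sigma2 / n
        # Shrinkage weight: near 1 for large groups, smaller for sparse ones.
        weight = tau2 / denom if denom > 0 else 0.0
        estimates[occ] = (n, raw[occ], grand + weight * (raw[occ] - grand))
    return estimates


# Made-up illustration: two large occupation groups and one sparse group.
example = (
    [("A", 2.0 + 0.2 * (i % 2)) for i in range(50)]
    + [("B", 2.9 + 0.2 * (i % 2)) for i in range(50)]
    + [("C", 3.9), ("C", 4.1)]
)

if __name__ == "__main__":
    for occ, (n, raw_mean, eb_mean) in sorted(
            occupation_wage_estimates(example).items()):
        print(occ, n, round(raw_mean, 4), round(eb_mean, 4))
```

In this sketch the shrinkage weight approaches one as the number of cases in an occupation group grows, so the two estimates nearly coincide for well-populated groups and diverge mainly for sparse ones. This is the sense in which a simple group mean might suffice in a sufficiently large dataset.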
The findings have a number of important implications for understanding confounding by socio-economic position and for the collection of income data in surveys for health research. Firstly, it is clear that other, non-income measures or components of SEP do not on their own entirely capture the effect of income, and that omitting an income measure risks introducing income-related confounding. This is particularly problematic in datasets in which income is not measured, such as the UK census and census-based longitudinal studies. Extending this argument, the findings may have wider implications for the measurement of income in health surveys more generally. Although we have restricted our analysis to self-reported health, the evidence begins to suggest that the collection of reported income data in health surveys may not be as crucial as the measurement of occupation. This is important because occupation is a far easier characteristic to measure and is much less problematic in terms of missing data, mismeasurement or inaccuracy.
There are limitations to the approach that we have used. Firstly, it relies on occupational information being available for subjects and, if household income needs to be calculated, for all those contributing to the household budget. Secondly, for those of working age who are not employed, and for those who have retired, a description of occupation, if available, will not necessarily be an accurate measure of their income. However, it is possible to estimate the likely income of those who are unemployed or retired by using standard welfare payments or occupation-related pension payments. For those who have retired but have a pre-retirement occupation recorded, a similar modelling approach could be used to estimate pension level. Finally, the study was restricted to a measure of self-reported general health, and it does not necessarily follow that our findings can be generalised to other health variables. For example, the shape, magnitude and functional form of the relationship between income and other health indicators, such as mortality and physical health measures, differ markedly in some cases [24–26]. It is important for future research to examine the validity of these synthetic estimates in relation to other health variables.