Our results from a cohort of middle-aged UK women showed that hours and corresponding estimated excess MET-hours for walking, cycling and gardening at follow-up were associated with the frequency of both any and strenuous physical activity reported at baseline about 3 years earlier. Mean estimated excess MET-hours for strenuous activity reported at follow-up also increased progressively with each increase in the frequency of strenuous physical activity reported at baseline. Similar relationships were found between any activity reported at baseline and an aggregate of various activities (walking, cycling, gardening, housework, and exercise causing sweating or a fast heartbeat) reported at follow-up. Hence, simple questions on the frequency of strenuous and any activity at baseline predict the estimated excess energy expenditure for strenuous and other specific activities, reported about 3 years later.
It has been suggested that older populations tend to be more sedentary, resulting in decreased between-person variability when compared to younger populations, and that this may reduce the power of a physical activity measurement instrument to discriminate between different activity levels. It has also been proposed that measurement error associated with reported physical activity may increase as the proportion of total activity consisting of light intensity activities increases, as often occurs in older populations. Social desirability bias may also influence the ability of some questionnaires to accurately assess self-reported levels of physical activity among middle-aged women.
Due to constraints on the length of the baseline questionnaire, simple, frequency-based questions were used to assess physical activity behaviours. At follow-up women reported on hours spent walking, cycling, doing gardening, doing housework, or doing exercise causing sweating or a fast heartbeat. Wareham et al. showed that a simple four-level physical activity index, derived from questions asking the number of hours per week spent doing physical exercise, cycling and type of occupational physical activity, had high repeatability and was positively associated with objective measures of physical activity, and was therefore useful when ranking the physical activity levels of participants in large scale studies. In the Million Women Study, most women were aged 50 to 64 years at baseline. Only one fifth of women reported being in full-time work on the follow-up questionnaire, hence we did not include work effort.
Women who reported never being active at baseline reported, on average, 21.0 hours per week of various activities (walking, cycling, gardening, housework, and exercise causing sweating or a fast heartbeat) at follow-up. The lack of correlation between frequency of baseline activity and housework reported at follow-up may result from women not considering housework when asked to report on physical activity at baseline. However, domestic activities may account for a large proportion of total activity among these women. For example, for women in Britain aged 60 to 79 years, only 21% were classified as regularly active when domestic activities were not taken into account, whereas more than two thirds met the recommended levels of physical activity when domestic activities were included. It is also possible that some women who reported never being active at baseline had a degree of physical impairment that prevented them from engaging in higher-intensity activities, but not necessarily in low-intensity activities such as housework. However, no information on levels of physical or functional impairment was collected at baseline, and we were therefore unable to examine this hypothesis in more detail.
We also examined repeatability of women's responses to questions on frequency of physical activity over time. The overall distribution of responses to these questions was similar between first and repeat administrations of the same baseline questionnaire, regardless of the interval between these assessments. However, agreement between individual women's responses declined as this interval increased. This was more marked for strenuous activity. In terms of repeatability, our results are consistent with those previously reported [5, 24, 25, 28, 29], which have shown a decrease in the agreement of physical activity measures with greater time periods between initial and repeat testing. At the population level, a similar distribution across activity groups despite changes among individuals between first and repeat testing has also been reported [19–22].
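The distinction drawn above, between a stable overall distribution and declining agreement for individual women, can be illustrated with a small sketch. It assumes agreement is summarised with an unweighted Cohen's kappa, which is one common choice; this section does not state which statistic was used, and the function name, category labels and responses below are hypothetical.

```python
from collections import Counter


def cohens_kappa(first, repeat):
    """Unweighted Cohen's kappa for two paired lists of categorical responses."""
    if len(first) != len(repeat) or not first:
        raise ValueError("need two non-empty lists of equal length")
    n = len(first)
    # Proportion of individuals giving the same answer on both occasions.
    observed = sum(a == b for a, b in zip(first, repeat)) / n
    # Agreement expected by chance, from the marginal distributions.
    first_counts = Counter(first)
    repeat_counts = Counter(repeat)
    expected = sum(first_counts[c] * repeat_counts[c] for c in first_counts) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when all responses fall in one category")
    return (observed - expected) / (1 - expected)


# Hypothetical responses: the marginal distributions are identical on both
# occasions, but two of the four individuals changed category.
first_answers = ["never", "rarely", "weekly", "daily"]
repeat_answers = ["rarely", "never", "weekly", "daily"]

same_distribution = Counter(first_answers) == Counter(repeat_answers)
kappa = cohens_kappa(first_answers, repeat_answers)
```

In this toy example the population-level distribution is unchanged while individual agreement is only moderate, which is the pattern described in the paragraph above.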
Measurement errors in the assessment of long term activity patterns result from a combination of variability in answers when completing a questionnaire (resulting from difficulty recalling past activity, differing interpretations of the questions, social desirability response bias, and random reporting errors) and real changes in physical activity patterns over time. Overall, these errors are likely to result in attenuation of estimates of associations between physical activity assessed at baseline and disease risk. This attenuation may be substantial and is likely to depend on the assessment instrument as well as participant characteristics. Intensive assessments of physical activity may minimise some sources of error, but can be impractical in larger-scale studies over a long period [1, 3, 5].
Test/retest reliability of physical activity questionnaires is often examined over short time periods of weeks to a few months, but our findings underscore the importance of changes in physical activity over longer periods. The reduction in the repeatability of physical activity responses over time that we observed is likely to reflect, at least in part, real changes in physical activity patterns. This type of measurement error is of great importance in prospective epidemiological studies, where risks of various health outcomes during an extended period of follow-up may be estimated according to baseline self-reported physical activity. Repeatedly measuring different aspects of physical activity over time permits the assessment of the potential magnitude of the attenuation of disease associations. This can be done, for example, by using methods analogous to those used when correcting for "regression dilution bias" [14, 16].
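As a rough illustration of how repeat measurements can be used in this way, the sketch below assumes the simplest form of regression dilution correction: the regression dilution ratio is estimated as the slope from regressing repeat values on baseline values, and an observed log relative risk per unit of baseline exposure is divided by that ratio. The function names and all numbers are hypothetical, and this is not presented as the method applied in this study.

```python
from statistics import mean


def regression_dilution_ratio(baseline, repeat):
    """Slope of repeat values regressed on baseline values."""
    if len(baseline) != len(repeat) or len(baseline) < 2:
        raise ValueError("need two paired lists with at least two values")
    mean_b, mean_r = mean(baseline), mean(repeat)
    covariance = sum((b - mean_b) * (r - mean_r) for b, r in zip(baseline, repeat))
    variance = sum((b - mean_b) ** 2 for b in baseline)
    if variance == 0:
        raise ValueError("baseline values must vary")
    return covariance / variance


def corrected_log_relative_risk(observed_log_rr, ratio):
    """Scale an attenuated log relative risk by the dilution ratio."""
    if ratio <= 0:
        raise ValueError("dilution ratio must be positive")
    return observed_log_rr / ratio


# Hypothetical activity scores for five women at baseline and at re-survey.
baseline_scores = [0, 1, 2, 3, 4]
repeat_scores = [1, 1, 2, 3, 3]

ratio = regression_dilution_ratio(baseline_scores, repeat_scores)

# Hypothetical observed association per unit of baseline score.
corrected = corrected_log_relative_risk(-0.12, ratio)
```

A ratio below 1 indicates that differences between women at baseline overstate their longer-term differences, so the corrected association is larger in magnitude than the observed one.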
Published surveys have reported evidence of changing physical activity levels in diverse populations in developed countries. An analysis of secular trends in physical activity levels in the adult population of the UK from 1991 to 2004, using data from the Health Survey for England, has shown a decline in occupational physical activity but a progressive increase in sports participation in women aged 50 years and over. Overall, in adults aged 50 to 64 years, the proportion meeting the current physical activity recommendations increased significantly from 42.9% to 47.1% (p < 0.001). Guthrie showed that within a cohort of Australian-born women, aged 45 to 55 years at baseline in the Melbourne Women's Midlife Health Project, 14% increased their frequency of physical activity by two or more sessions per week and 12% decreased their activity by the same amount over a 3-year follow-up period. Similarly, among adults aged 20 to 59 years from the Netherlands, 45% changed their level of physical activity over a ten-year period. Similar patterns of change have also been observed in a range of age-groups studied in the UK and the USA. Our study had a longer mean interval between first and repeat baseline questionnaire administrations than many other studies, and is likely to be more conservative in assessing agreement; this may give a more realistic indication of the performance of similar physical activity questions in an epidemiological setting. As our findings and those of others [5, 24, 25, 28, 29] indicate that repeatability decreases with time, it is important to regularly update measures of physical activity in prospective studies.
The main strengths of this study include the large sample size and prospective study design. A potential limitation of this work is the use of Ainsworth et al.'s compendium to estimate METs and the calculation of estimated excess METs instead of the estimation of 24-hour energy expenditure. While this compendium provides an estimate of the caloric energy expenditure required to perform a variety of different activities at various intensity levels, these values are often based on data from individuals who are atypical of the general population, such as young, active males. Furthermore, as in any epidemiological study, there may also be inter-individual variation in caloric energy expenditure resulting from differences in metabolic and mechanical efficiency [1, 34], and simple questions cannot capture all aspects of physical activity. Although we have shown that our two different self-reported measures of physical activity agree well in their ability to rank women according to their level of physical activity, we did not have objective measurements of physical activity against which to compare the questionnaire data. We were therefore unable to make direct estimates of the magnitudes of biases in associations between health outcomes and self-reported physical activity data, which are likely to require independent, objective measures of activity levels.
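To make the excess MET calculation concrete, the following is a minimal sketch. It assumes that excess MET-hours are obtained by multiplying the reported hours for each activity by its MET value minus 1 (the resting level), which is an assumption about the definition rather than something stated in this section. The MET values and hours shown are hypothetical placeholders and are not taken from the compendium.

```python
RESTING_MET = 1.0  # assumed resting energy expenditure, in METs


def excess_met_hours(hours_per_week, met_values):
    """Sum over activities of hours multiplied by (MET value - resting MET)."""
    total = 0.0
    for activity, hours in hours_per_week.items():
        if hours < 0:
            raise ValueError(f"negative hours reported for {activity}")
        total += hours * (met_values[activity] - RESTING_MET)
    return total


# Hypothetical MET values, for illustration only.
illustrative_mets = {"walking": 3.5, "cycling": 6.0, "gardening": 4.0}

# Hypothetical weekly hours reported by one woman.
reported_hours = {"walking": 3.0, "cycling": 1.0, "gardening": 2.0}

weekly_excess = excess_met_hours(reported_hours, illustrative_mets)
```

Because the same MET value is applied to every woman reporting a given activity, a calculation of this kind ranks women by reported hours weighted by intensity, and does not capture the inter-individual variation in energy expenditure noted above.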