In our study in a developed multi-ethnic urban Asian population, SP2PAQ showed a substantially higher correlation with an objective measure of energy expenditure from physical activity than the IPAQ for both moderate and vigorous activity. The validity of the IPAQ for ranking the physical activity level of individuals was inadequate in our population, whereas the validity of SP2PAQ was acceptable for this purpose, with the possible exception of ranking moderate activity in Indians. Both questionnaires tended to overestimate energy expenditure for vigorous activity, especially at higher levels of energy expenditure. For moderate activity, both questionnaires underestimated energy expenditure relative to the accelerometer measurements. The reproducibility of the two questionnaires and the accelerometer over an average of 6 months was reasonably good.
Our study showed that the corrected correlation for vigorous activity was substantially better than that for moderate activity, which is consistent with findings in other studies [41, 42]. The Stanford Five-City Project, a survey of a representative population sample of four cities in central California, compared nine measurement instruments for physical activity. In that study, recall was accurate for vigorous activity but poor for moderate activity, and this finding was consistent in men and women and in all domains of activity. The same pattern was observed in the validation of the Stanford 7-Day Recall, which showed a lower correlation for moderate activity than for vigorous activity in men (0.23 vs. 0.59).
The correlation between the IPAQ and the accelerometer in our population appears to be lower than in other populations [18, 43]. In the New Zealand population the correlation was 0.19 for moderate activity and 0.42 for vigorous activity, and in a Swedish population it was 0.21 for moderate activity and 0.71 for vigorous activity. This may be due to differences in culture as well as in the educational level of participants, both of which might affect interpretation of the questionnaire. All participants in the Swedish study had a higher education level, whereas our study population consisted of participants with varying educational levels. However, when compared with other validation studies in Asia, the correlation between the IPAQ and the accelerometer for the Chinese ethnic group in our study was similar to that in the Chinese population studied in Hong Kong (r = 0.27 for moderate activity and r = 0.28 for vigorous activity). Another validation study in the Chinese population of Hong Kong showed different results according to the accelerometer used: the correlations of the IPAQ with the Tritrac accelerometer were 0.15 and 0.18 for moderate and vigorous activity respectively, whereas with the MTI accelerometer these correlations were -0.06 and 0.44 respectively. In a Japanese population, the correlation between the IPAQ and accelerometer measurements of total physical activity was 0.36.
The validity of SP2PAQ is comparable to that of other questionnaires that have been used in large epidemiological studies. For example, the correlation of the Behavioural Risk Factor Surveillance System (BRFSS) physical activity questionnaire, used for monitoring physical activity across the U.S.A., with the accelerometer was 0.31 for moderate activity and 0.17-0.26 for vigorous activity, whereas the correlation of the New Zealand Physical Activity Questionnaire (NZPAQ-LF) with the accelerometer was 0.30 for moderate activity and 0.37 for vigorous activity.
When we compared questionnaire and accelerometer estimates of energy expenditure from physical activity using Bland-Altman plots, the differences between the two methods grew as the mean of the two measurements increased, for both moderate and vigorous activity. This may be due to either the questionnaire increasingly overestimating activity with increasing activity or the accelerometer increasingly underestimating activity with increasing activity. Klesges et al. reported that participants overestimated the duration of their physical activities, especially for aerobic activities. In addition, the Actical accelerometer may have underestimated energy expenditure, especially at high levels of energy expenditure. The accelerometer is known to substantially underestimate energy expenditure for specific activities. For example, accelerometers have limitations in detecting activities where the body is mostly stationary, such as cycling or weight lifting. Moreover, in our study, the accelerometer was taken off during water-based activities. This may have reduced the amount of activity detected by the accelerometer as compared with the questionnaire, although only five participants reported swimming during the period in which they wore the accelerometer. The combination of overestimation by the questionnaires and underestimation by the accelerometer may have given rise to the observation that the difference between these methods was greater at higher levels of activity. Similar findings were reported for a nationally representative sample of the Swedish population, where the difference between the IPAQ and accelerometer measurements of time spent on physical activity was larger at higher activity levels reported by the IPAQ.
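The Bland-Altman comparison described above can be illustrated with a minimal sketch. It is not the analysis code used in this study: the function name `bland_altman`, the input values, and the use of a regression slope of the difference on the mean as a summary of proportional bias are all assumptions made for illustration. A positive slope corresponds to the pattern reported here, in which the questionnaire exceeds the accelerometer by more at higher activity levels.

```python
import numpy as np


def bland_altman(questionnaire, accelerometer):
    """Summarise agreement between two methods (illustrative helper)."""
    q = np.asarray(questionnaire, dtype=float)
    a = np.asarray(accelerometer, dtype=float)

    diff = q - a            # questionnaire minus accelerometer
    mean = (q + a) / 2.0    # x-axis of the Bland-Altman plot

    bias = diff.mean()      # mean difference between methods
    sd = diff.std(ddof=1)   # sample SD of the differences

    # Slope of the difference regressed on the mean: a value above zero
    # means the gap between methods widens as activity increases.
    slope, _intercept = np.polyfit(mean, diff, 1)

    return {
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
        "slope": slope,
    }


# Hypothetical energy expenditure values, not data from this study.
q_example = [2.0, 4.0, 6.0, 8.0]
a_example = [1.0, 2.0, 3.0, 4.0]
result = bland_altman(q_example, a_example)
```

Here `result["slope"]` is positive because the hypothetical questionnaire values diverge from the accelerometer values as both increase.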
Several methodological differences exist between SP2PAQ and the IPAQ. The IPAQ assesses physical activity in the past week, whereas SP2PAQ assesses habitual physical activity over at least the past 3 months. For the first 120 participants, the week covered by the IPAQ differed from the week measured by the accelerometer because the IPAQ was administered before the accelerometer wearing period. However, we reversed the order of questionnaire administration for the subsequent 43 participants so that the IPAQ applied to the same week as the accelerometer measurement, and found that the order of questionnaire administration did not affect the agreement with accelerometer measurements. The mode of administration also differed: SP2PAQ is administered by an interviewer, whereas the IPAQ is self-administered. A comprehensive review of physical activity instruments concluded that the accuracy of interviewer-administered questionnaires tends to be greater than that of self-administered questionnaires. Finally, the primary intention of the IPAQ is to obtain comparable population estimates of physical activity across different countries, whereas the aim of SP2PAQ is to assess inter-individual variation in usual physical activity within a population.
To our knowledge, this is one of only a few studies that have validated physical activity questionnaires in an Asian population. In addition, the correlations were corrected for within-person variation in the accelerometer measurements. Within-person variation in the reference instrument will reduce the correlation with the evaluated questionnaires and should be corrected for in validation studies. The drop-out rate in our study was negligible, as only one person withdrew from the study.
There are also several limitations in our study that need to be considered. The size of our study population was modest, and most participants were from a hospital and a university campus, which limits generalizability. However, the participants were drawn from fairly wide age and socioeconomic groups with different educational, occupational and income levels. Although the distributions of age, gender and ethnicity were not exactly the same across the sub-groups, these differences in distribution were not statistically significant. The reference measurement used in this study was the accelerometer, which is not the gold standard for validating physical activity measurements. However, the current reference standard for validating activity questionnaires, the doubly labeled water technique, is very costly and, because it estimates total energy expenditure, does not provide information on the patterns of physical activity. The accelerometer, on the other hand, can provide the frequency, duration and intensity of free-living physical activity to obtain a good estimate of energy expenditure, and has been recommended as an objective method of choice for validating questionnaires or studying patterns of physical activity [6, 52, 53]. It has also been used to validate physical activity questionnaires in national surveys such as the England Physical Activity questionnaire and the BRFSS. Finally, the interval between the test and retest measurements was rather long. We realize that, as a result, the reliability estimates are affected by both measurement error related to the assessment of short-term activity and real changes in the activity habits of participants over time. However, in epidemiological studies we are generally interested in habitual activity over years, as this is most relevant for the development of chronic diseases. For this application, an inability of assessment methods to capture long-term physical activity is therefore a limitation, and long-term reproducibility, part of which may be due to real changes in physical activity, is most relevant.
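The correction for within-person variation in the reference instrument mentioned above can be sketched as follows. This is one common form of the deattenuation formula, r_corrected = r_observed × sqrt(1 + λ/n), where λ is the ratio of within-person to between-person variance in the reference measure and n is the number of replicate reference measurements per person. Whether this exact form was used in the present study is an assumption, and the function names and data below are hypothetical.

```python
import numpy as np


def variance_ratio(replicates):
    """Within- to between-person variance ratio (lambda).

    `replicates` is a balanced array of shape (subjects, n_replicates)
    holding repeated reference measurements for each person.
    """
    x = np.asarray(replicates, dtype=float)
    n = x.shape[1]

    # Within-person variance: average of each person's sample variance.
    within = x.var(axis=1, ddof=1).mean()

    # Variance of person means contains within / n; remove it to get
    # the between-person component.
    between = x.mean(axis=1).var(ddof=1) - within / n
    if between <= 0:
        raise ValueError("between-person variance estimate is not positive")
    return within / between


def deattenuate(r_observed, lam, n):
    """Correct an observed correlation for within-person variation."""
    return r_observed * np.sqrt(1.0 + lam / n)


# Hypothetical replicate accelerometer measurements: 2 people x 2 periods.
example_replicates = [[1.0, 3.0], [5.0, 7.0]]
lam_example = variance_ratio(example_replicates)

# Hypothetical observed correlation, corrected using the ratio above.
r_corrected = deattenuate(0.30, lam_example, n=2)
```

Because the correction factor is at least 1, the corrected correlation is never smaller in magnitude than the observed one, which matches the statement that within-person variation in the reference instrument reduces the observed correlation.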