SAMSS has contributed to the monitoring of departmental issues, key risk factors and population trends in priority chronic disease and related areas, thereby guiding investments, identifying target groups, providing important program and policy information, and assessing outcomes. Potential bias from differing probabilities of selection in the sample is addressed by weighting by age, gender, probability of selection within the household, and area of residence to the most recent estimates of the residential population, derived from census data by the Australian Bureau of Statistics. Additional bias may arise if the questions are not reliable, that is, if the questionnaire does not always elicit the same response when repeatedly administered to the same respondents. This study, undertaken to assess this source of bias, found substantial to almost perfect reliability for the SAMSS questions tested. These findings are consistent with the published literature on BRFSS survey questions [1–6] and with a previous reliability study of health survey questions asked in South Australia.
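The weighting step described above can be sketched as a simple post-stratification calculation; the cell labels and proportions below are hypothetical, chosen only to illustrate the population-share over sample-share ratio, not taken from SAMSS or census data:

```python
# Post-stratification sketch: each respondent's weight is the census share of
# their demographic cell divided by that cell's share of the achieved sample.
# Cell labels and proportions are hypothetical, for illustration only.
population_share = {"F 18-44": 0.30, "M 18-44": 0.29, "F 45+": 0.22, "M 45+": 0.19}
sample_share     = {"F 18-44": 0.25, "M 18-44": 0.20, "F 45+": 0.30, "M 45+": 0.25}

weights = {cell: population_share[cell] / sample_share[cell]
           for cell in population_share}
# Under-represented cells receive weights above 1, over-represented cells below 1.
```

In SAMSS the weighting cells also incorporate the probability of selection within the household and area of residence; the ratio logic is the same.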
The high level of reliability for demographic variables reflects survey administration protocols that ensured the respondent was in fact the original respondent, resulting in excellent agreement beyond chance on sex and age. The small variation in household composition would be expected in a random sample of the population over such a short time frame. This finding is consistent with our previous reliability study and a BRFSS study.
It is reasonable to expect some variation between surveys for behaviour-related variables such as physical activity, smoking or alcohol consumption, with a genuine change between surveys leading to a lower level of agreement for these variables. This retest was performed a minimum of 13 days after the initial survey (mean 16.8; SD 3.6). This interval was long enough that respondents were unlikely to remember the answers they gave to the first survey, yet short enough that answers were unlikely to change substantially, so the two surveys could reasonably be considered independent. Despite this reasoning, real changes could have occurred between interviews, which would weaken the reliability estimates. Alternatively, any differences could be the result of social desirability and the subjective need to report knowledge rather than actual behaviour. This could explain the lower kappa scores for smoking status (κ 0.81), smoking situation at home (κ 0.59) and alcohol consumption (reliability values from 0.66 to 0.75). Our result for smoking status, which had an observed agreement of 89.0%, is similar to our previous study (κ 0.92) and to other studies with kappa values for current smoking of 0.85, 0.90 and 0.83. Smoking situation at home had moderate agreement beyond chance (κ 0.59) but excellent observed agreement (96.1%). This low kappa value is likely due to the category prevalences being close to 0% or 100%, which depresses the kappa statistic. The results for alcohol consumption showed moderate agreement beyond chance, similar to our previous finding. As with smoking, the risk of harm to health due to alcohol consumption had a weighted kappa value of 0.66 but high observed agreement (95.0%). This suggests that the smoking status and alcohol consumption questions are relatively reliable, given the excellent observed agreement despite the moderate kappa values.
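The sensitivity of kappa to skewed category prevalences can be shown directly. The contingency tables below are hypothetical (counts chosen for illustration, not taken from SAMSS data), but reproduce the pattern described above: observed agreement near 96% yields a kappa near 0.59 when almost everyone falls in one category, yet a much higher kappa when the categories are balanced:

```python
def cohens_kappa(table):
    """Cohen's kappa for a 2x2 test-retest table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    p_obs = (a + d) / n                                    # observed agreement
    # chance agreement expected from the marginal totals
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical counts: 930 pairs agree "no smoking at home", 30 agree "yes",
# and 40 disagree -- nearly everyone is in one category.
skewed = [[930, 20], [20, 30]]
print((930 + 30) / 1000)                 # observed agreement: 0.96
print(round(cohens_kappa(skewed), 2))    # kappa: 0.58

# The same 96% observed agreement with balanced categories gives a high kappa.
balanced = [[500, 20], [20, 460]]
print(round(cohens_kappa(balanced), 2))  # kappa: 0.92
```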
Total estimated walking time had the lowest level of agreement (ICC 0.47), similar to another Australian study which reported a reliability score of 0.56, our previous study (κ = 0.54, 95% CI 0.37–0.70) and another BRFSS study with kappa values of 0.54 for sedentary lifestyle and 0.56 for inactivity. The fair agreement could be due to a real change in total walking time between the two interviews or to poor recall of the actual time spent walking. It should be noted that outliers can greatly influence the ICC, producing low values, and that converting the values to categories can result in higher reliability estimates.
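The influence of a single discordant outlier on the ICC can be illustrated with a one-way random-effects ICC(1,1) on hypothetical test-retest walking times; the minutes below are invented for illustration only:

```python
def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for test-retest pairs (k = 2 ratings)."""
    n, k = len(pairs), 2
    grand = sum(x + y for x, y in pairs) / (n * k)
    means = [(x + y) / 2 for x, y in pairs]
    # between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 + (y - m) ** 2
              for (x, y), m in zip(pairs, means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical weekly walking minutes (survey, retest) for five respondents:
pairs = [(30, 35), (60, 55), (90, 95), (120, 115), (45, 50)]
print(round(icc_oneway(pairs), 2))                # high agreement

# One discordant outlier (40 minutes, then 400) inflates the within-subject
# variance and collapses the ICC toward zero:
print(round(icc_oneway(pairs + [(40, 400)]), 2))
```

Recoding the raw minutes into broad activity categories before computing agreement dampens exactly this kind of outlier effect, which is why categorised values can show higher reliability.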
Similarly, while questions relating to self-reported risk factors such as fruit and vegetable consumption showed only fair to moderate agreement beyond chance, this could be due to changes in behaviour following the first interview. In both cases, however, the reported number of serves of fruit and vegetables appeared to decrease following the first survey. It is also possible that participants modified their answers the second time, giving socially desirable or knowledge-based answers as a result of the major advertising campaign, Go for 2&5®, conducted in SA and other states to promote the recommended 2 serves of fruit and 5 serves of vegetables per day. Alternatively, participants may not be able to recall all of the fruit or vegetables consumed per day, or to quantify them into serve sizes.
Somewhat unexpected is the excellent reliability for height, weight and BMI (ICC ranging from 0.97 to 1.00), and the excellent observed agreement (97.9%) and agreement beyond chance for the derived BMI categories (κw 0.93). These results are consistent with our previous study, in which the weighted kappa value for the BMI categories was 0.89 (95% CI 0.84–0.92), and a BRFSS study which reported excellent reliability for height, weight and BMI (Pearson’s r 0.84 to 0.94). We have previously addressed the reliability of self-reported height and weight when compared to clinic measurements, but this study has shown that BMI, as a broad measure of adiposity, is reliable when collecting self-reported data over the telephone.
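Unlike plain kappa, the weighted kappa (κw) used for the BMI categories credits partial agreement between adjacent ordered categories. A minimal linearly weighted version, applied to a hypothetical 3-category table (counts invented for illustration, not SAMSS results):

```python
def weighted_kappa(table):
    """Linearly weighted kappa for a square k x k table of ordered categories."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    obs_dis = exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)          # linear disagreement weight
            obs_dis += w * table[i][j] / n
            exp_dis += w * row_tot[i] * col_tot[j] / n ** 2
    return 1 - obs_dis / exp_dis              # 1 = perfect agreement

# Hypothetical BMI-category cross-classification (survey vs retest), where
# all disagreements fall in adjacent categories and are penalised lightly:
table = [[40,  5,  0],
         [ 5, 40,  5],
         [ 0,  5, 40]]
print(round(weighted_kappa(table), 2))   # ≈ 0.84
```

Because misclassifications between self-reported BMI categories are almost always between neighbouring classes, κw is a natural summary for this kind of ordered outcome.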
The questions relating to chronic conditions, high blood pressure and high cholesterol displayed excellent observed agreement and good to excellent agreement beyond chance, with the exception of heart disease. As stated previously, the low prevalence of heart disease depresses the kappa statistic even though the observed and expected agreements were excellent. Hence the questions used to obtain prevalence estimates for health conditions, high blood pressure and cholesterol are reliable in telephone surveys. Health service use displayed excellent agreement, although some variation would be expected over the time period of this study. Overall health status had moderate agreement beyond chance (κ 0.60) but good observed agreement (85.8%). This kappa value is lower than in a BRFSS study (κ 0.75), which could be partly because respondents’ health could have changed between the two time points, and partly because of the low proportion reporting ‘poor’ health, which affects the kappa statistic.
Weaknesses of the study include possible bias from willingness to participate, although comparison with the SAMSS November sample does not indicate this to be the case. In addition, the retest data were not weighted, so any prevalence estimates should be used with the utmost caution. Although telephones are connected in a large number of Australian households, not all are listed in the EWP (mobile-only households and silent numbers). In 2008 in South Australia, 9% of households were mobile-only and 69% of households did not have their landline or mobile number listed in the EWP. Previous work undertaken in 1999 showed that inclusion of unlisted landline numbers in the sampling did not affect the health estimates. However, the proportion of mobile-only households is increasing in South Australia, in line with international trends, and only a very small proportion (7%) elect to have their mobile number listed in the EWP. At present, the effect of excluding this group from the current sampling frame may be small in relation to the health estimates obtained using the EWP. However, the characteristics of people living in mobile-only households are distinctly different, and the rise in mobile-only households is not uniform across all groups in the community. Given these sampling issues, there is potential for bias in the results obtained in this study.
The response rate of nearly 65% is moderately acceptable for this type of survey, but the potential for survey non-response bias is acknowledged. Response rates are declining in surveys based on all forms of interviewing as people have become more active in protecting their privacy. The growth of telemarketing has disillusioned the community and diminished the success of legitimate social science research conducted by telephone.