The impact of the mode of survey administration on estimates of daily smoking for mobile phone only users

Background: Over the past decade, there have been substantial changes in landline and mobile phone ownership, with a substantial increase in the proportion of mobile-only households. Estimates of daily smoking rates for the mobile phone only (MPO) population have been found to be substantially higher than the rest of the population and telephone surveys that use a dual sampling frame (landline and mobile phones) are now considered best practice. Smoking is seen as an undesirable behaviour; measuring such behaviours using an interviewer may lead to lower estimates when using telephone based surveys compared to self-administered approaches. This study aims to assess whether higher daily smoking estimates observed for the mobile phone only population can be explained by administrative features of surveys, after accounting for differences in the phone ownership population groups.
Methods: Data on New South Wales (NSW) residents aged 18 years or older from the NSW Population Health Survey (PHS), a telephone survey, and the National Drug Strategy Household Survey (NDSHS), a self-administered survey, were combined, with weights adjusted to match the 2013 population. Design-adjusted prevalence estimates and odds ratios were calculated using survey analysis procedures available in SAS 9.4.
Results: Both the PHS and NDSHS gave the same estimates for daily smoking (12%) and similar estimates for MPO users (20% and 18% respectively). Pooled data showed that daily smoking was 19% for MPO users, compared to 10% for dual phone owners, and 12% for landline phone only users. Prevalence estimates for MPO users across both surveys were consistently higher than other phone ownership groups. Differences in estimates for the MPO population compared to other phone ownership groups persisted even after adjustment for the mode of collection and demographic factors.
Conclusions: Daily smoking rates were consistently higher for the mobile phone only population and this was not driven by the mode of survey collection. This supports the assertion that the use of a dual sampling frame addresses coverage issues that would otherwise be present in telephone surveys that only made use of a landline sampling frame.
Electronic supplementary material: The online version of this article (doi:10.1186/s12874-017-0342-4) contains supplementary material, which is available to authorized users.


Background
Over the past decade, there have been substantial changes in landline and mobile phone ownership, with most nations observing declines in landline ownership and a corresponding increase in the proportion of mobile phone only (MPO) households [1][2][3]. In Australia, these changes in phone ownership have not been uniform across all population groups, with many harder to reach groups, such as males, younger people, recent migrants, renters and people from a low socioeconomic background, less likely to own a landline telephone [1,4,5]. For this reason, health surveys of the general population that only use a landline phone number sampling frame no longer have adequate population coverage to produce unbiased estimates of health behaviours [4,6]. With the decreasing coverage of landline phone number sampling frames, it has become necessary either to use dual sampling frames, which draw on both mobile and landline phone numbers and account for the overlapping chance of selection, or to use non-telephone based survey approaches, in order to ensure that representative estimates of health behaviours can be produced. There has been substantial work undertaken to implement dual sampling frames for health surveys and to determine whether dual sampling frames are able to correct for biases in estimates of health behaviours. One key population group that appears to be quite distinct from others is the MPO population [6,7].
Prevalence estimates for a number of health indicators, including smoking, alcohol consumption and adequate physical activity, have been found to be much higher for the MPO population compared with the landline-accessible population [6]. While some of these disparities have been explained by differences in population structure, smoking estimates were found to be persistently higher for the MPO population [6][7][8]. Given the emerging use of dual frame telephone surveys, the higher smoking estimates obtained for the MPO population warrant further investigation [8].
In recognition of the increasing size of the MPO population, Livingston et al. have recommended that telephone surveys allow for a larger mobile subsample to ensure that the growing population of mobile-only users is properly represented in survey estimates [8]. However, this work predominantly focussed on measures of alcohol consumption.
There is substantial evidence that survey respondents are more likely to under-report undesirable behaviours when participating in interviewer-directed surveys (face-to-face and telephone interviewing modes) compared to self-administered surveys (such as self-complete questionnaires) [9]. Compared to self-administered surveys, interview-administered survey results were more likely to be biased towards more socially desirable responses with regard to health-related lifestyle questions [10,11]. Therefore, further work is required to identify whether it is the mode of survey collection which influences estimates for the MPO population, particularly for smoking.
This study aims to identify whether higher smoking estimates for the MPO population can be explained by the mode of data collection for a survey, after accounting for differences in the population structure of each phone ownership population. This paper focusses on comparing daily smoking estimates from a computer-assisted telephone interviewing (CATI) dual-frame survey with those arising from a self-administered survey for NSW, and briefly examines the results for key demographic strata. A comparison of estimates from the different phone ownership groups will also be made.

Data sources
Data from the New South Wales Population Health Survey (NSWPHS) [12] and the National Drug Strategy Household Survey (NDSHS) were obtained for this study [13].
New South Wales (NSW) is the most populous state in Australia with an estimated population of 7.41 million in June 2013 including both highly urbanised and rural areas, and accounts for approximately one third of the Australian population [14]. At June 2013, 21% of the total Australian adult population were estimated to be mobile phone only users and the majority of these people were aged 18-34 [1]. The NSWPHS, administered by the Centre for Epidemiology and Evidence, NSW Ministry of Health, used CATI software to interview NSW residents living in private households, with the sample drawn according to Local Health District boundaries [15]. In 2013, an overlapping dual-frame design was adopted using a sample of landline and mobile phone numbers, with a target of 30% of all interviews completed on a mobile phone and the remainder completed on a landline telephone. A stratified two-stage cluster sample design was used for the landline frame, using simple random sampling to select clusters (household telephone numbers) within Local Health District strata and to select one household resident from the selected households using the Kish Grid [16]. A simple random sample of mobile phone numbers was selected to obtain an additional sample of adult respondents. Interviews were conducted between February and December 2013. A dual-frame weighting approach was developed to allow for the different probabilities of selection for landline only, dual-phone and mobile-only users [17], by accounting for the overlapping probability of selection for dual-phone type owners. The survey was also weighted to match Local Health District, age group and sex population estimates from the Australian Bureau of Statistics (ABS) 2013 mid-calendar year population [18].
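The idea of allowing for an overlapping chance of selection can be illustrated with a minimal sketch. This is not the weighting method of [17], whose details are not given here; it assumes the landline and mobile samples are drawn independently, and the selection probabilities and function names are hypothetical values chosen for illustration.

```python
def inclusion_probability(p_landline: float, p_mobile: float) -> float:
    """Chance of being selected through at least one frame.

    Assumes (for this sketch only) that the landline and mobile samples
    are drawn independently. Pass 0.0 for a frame through which the
    respondent cannot be reached.
    """
    return p_landline + p_mobile - p_landline * p_mobile


def design_weight(p_landline: float, p_mobile: float) -> float:
    """Inverse-probability design weight, before any benchmarking."""
    p = inclusion_probability(p_landline, p_mobile)
    if p <= 0:
        raise ValueError("respondent has no chance of selection")
    return 1.0 / p


# Hypothetical selection probabilities, for illustration only.
P_LANDLINE = 0.002  # household number selected, then one adult chosen
P_MOBILE = 0.001    # mobile number selected

landline_only = design_weight(P_LANDLINE, 0.0)
mobile_only = design_weight(0.0, P_MOBILE)
dual_phone = design_weight(P_LANDLINE, P_MOBILE)
```

Under these assumptions a dual-phone owner can be reached through either frame, so their design weight is smaller than that of a landline-only or mobile-only respondent with the same frame-level probabilities.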
The NDSHS, administered by the Australian Institute of Health and Welfare, was based on private dwelling households across Australia, sampling respondents aged 12 years or older [13]. Non-private dwellings such as hotels, motels and boarding houses were excluded from the sample, as were institutional settings. A multi-stage stratified random sample was used where each state was divided into two strata: capital city and rest of state [19]. Smaller geographical areas, Statistical Area 1 (SA1) in the capital city strata and Statistical Area 2 (SA2) in the rest of state strata, were selected with probability proportional to size (based on the total number of households), with households sampled systematically within each smaller area [20]. The target respondent within each household was selected using the next birthday method. The survey was administered between 31 July and 1 December 2013, with respondents completing a de-identified paper copy of the questionnaire without an interviewer present. Population estimates used for weighting were based on the age and sex profile of each stratum using the June 2012 ABS estimated resident population. All analyses which follow are for respondents aged 18 years or older and were restricted to NSW residents.
NSWPHS data was accessed through Secure Analytics for Population Health Research and Intelligence. NDSHS data was accessed through the Australian Data Archive.

Data preparation
Data from the NSWPHS and the NDSHS was prepared to facilitate combined analyses by appending data from the NDSHS to the NSWPHS, with analyses restricted to respondents 18 years and older. A review of demographic and geographical variables common to the two surveys was undertaken to ensure that definitions were standardised prior to any analysis. Variables in both surveys were harmonised to a common standard, except where this was not possible, such as socio-economic status and remoteness status. For the NDSHS, these concepts were mapped to the data via a concordance file at the SA1-level. For the NSWPHS, these concepts were mapped to the data via a concordance file at the postcode level. SA1 and postcode represented the finest spatial boundaries available for the respective surveys, with postcodes generally representing larger spatial regions than SA1s in Australia. Socio-economic status (SES) was defined according to the Index of Relative Socio-Economic Advantage and Disadvantage [21]. Remoteness was defined according to the Accessibility and Remoteness Index of Australia [22]. Phone ownership status was derived from information about the number of landline telephones and mobile phones the respondent personally owned on the NSWPHS and by two questions on telephone ownership on the NDSHS. The NDSHS included a small proportion of respondents who had no telephone and these were excluded from the analysis. Other variables which could not be harmonised were not used in the analysis [23]. Therefore, we were not able to include income as a covariate, and used socioeconomic and remoteness status of the area as a proxy measure. Although there were slight differences in the way that the smoking status question was asked between the surveys, we were able to identify a common response category, which was daily use (see Additional file 1). Stratification variables for the two surveys were treated as distinct strata [24].
Further, as the primary sampling unit for each survey was different (SA1 and SA2 in the NDSHS; households in the NSWPHS), a new cluster variable was derived to account for this. Weights for both surveys were adjusted to represent 2013 population counts using an overall adjustment factor of $N_2/(N_1 + N_2)$, where $N_2$ is the 2013 population count and $N_1$ is the 2012 population count [23]. This adjustment ensures that the sum of the weights for the combined sample is equal to the 2013 population count; it effectively adjusts for the two samples covering the same population and ensures that they contribute approximately equally. All statistical analysis was performed on the combined data from the NSWPHS and the NDSHS.
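The effect of the overall adjustment factor can be checked with a short sketch. It assumes that each survey's original weights sum to its own population benchmark (the 2012 count for one survey and the 2013 count for the other); the function name and the toy weights are hypothetical.

```python
def rescale_pooled_weights(weights_2012, weights_2013):
    """Scale both surveys' weights by N2 / (N1 + N2).

    Assumes (for this sketch) that weights_2012 sum to the 2012
    population count N1 and weights_2013 sum to the 2013 count N2.
    Returns the two rescaled weight lists and the factor used.
    """
    n1 = sum(weights_2012)
    n2 = sum(weights_2013)
    factor = n2 / (n1 + n2)
    scaled_2012 = [w * factor for w in weights_2012]
    scaled_2013 = [w * factor for w in weights_2013]
    return scaled_2012, scaled_2013, factor


# Toy weights, for illustration only: N1 = 300 and N2 = 400.
scaled_a, scaled_b, factor = rescale_pooled_weights([100.0, 200.0],
                                                    [150.0, 250.0])
combined_total = sum(scaled_a) + sum(scaled_b)  # equals N2 = 400
```

Because the combined weights originally sum to N1 + N2, multiplying every weight by the same factor brings the pooled total back to N2. When N1 and N2 are close, the factor is near one half, which is why the two samples contribute approximately equally.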
Daily smoking prevalence estimates, odds ratios, and confidence intervals were estimated using SAS 9.4. Odds ratios were computed using the SAS SURVEYLOGISTIC procedure and prevalence estimates were obtained using the SURVEYMEANS procedure. The Taylor series linearisation method was used to estimate the variance of prevalence estimates and model parameter estimates [24].
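For readers unfamiliar with Taylor series linearisation, the following is a minimal sketch of the idea for a weighted prevalence, not a reproduction of the SAS procedures. It assumes clusters are sampled with replacement within strata and ignores finite population corrections; the record layout and function name are hypothetical.

```python
from collections import defaultdict


def weighted_prevalence(records):
    """Weighted prevalence and its linearised standard error.

    records: iterable of (stratum, cluster, weight, y) with y in {0, 1}.
    Sketch assumptions: with-replacement sampling of clusters within
    strata and no finite population correction.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    p = sum(w * y for _, _, w, y in records) / total_w

    # Linearised score of the ratio estimator, summed to cluster level.
    cluster_totals = defaultdict(float)
    for stratum, cluster, w, y in records:
        cluster_totals[(stratum, cluster)] += w * (y - p) / total_w

    by_stratum = defaultdict(list)
    for (stratum, _), z in cluster_totals.items():
        by_stratum[stratum].append(z)

    variance = 0.0
    for z_values in by_stratum.values():
        n_h = len(z_values)
        if n_h < 2:
            continue  # single-cluster strata add nothing in this sketch
        mean_z = sum(z_values) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_z) ** 2 for z in z_values)
    return p, variance ** 0.5


# Toy data: one stratum, four single-person clusters, equal weights.
toy = [("s1", "c1", 1.0, 1), ("s1", "c2", 1.0, 0),
       ("s1", "c3", 1.0, 0), ("s1", "c4", 1.0, 0)]
prevalence, std_error = weighted_prevalence(toy)
```

With equal weights and one person per cluster, the linearised variance reduces to the familiar p(1 - p)/(n - 1), which gives a quick way to sanity-check the sketch.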

Sensitivity analyses
As data from the NDSHS was only collected during the second half of 2013, a sensitivity analysis was performed by restricting data from the NSWPHS to the same period and comparing our findings from the full dataset to the restricted data. Similar results would confirm the validity of the analysis and warrant no further investigation of a possible seasonal effect in the data.
Further, to compare the results with a more precise weighting adjustment, the weighting adjustment factor of $N_2/(N_1 + N_2)$ was also applied for each age and sex stratum as a second sensitivity analysis [24]. While an overall weighting adjustment is usually recommended [23], further investigation of the impact of any adjustment factor was warranted to ensure that our findings were robust to one of the key decisions made when combining data from the two surveys.

Respondent profile
A total of 12,751 respondents from the NSWPHS were aged 18 years or older and a total of 6,009 from the NDSHS were NSW residents aged 18 years or older who owned a telephone. Response rates were 30% for the NSWPHS and 32.7% for the NDSHS using the American Association for Public Opinion Research defined Response Rate 3 [25]. Mobile-only respondents constituted 9.6% of the NSWPHS sample under consideration, 77.9% were dual phone respondents, and 12.5% were landline-only respondents. By contrast, mobile-only respondents constituted 20.5% of the NDSHS sample under consideration, 54.3% were dual phone respondents and 25.2% were landline-only respondents. The distribution of respondent characteristics across the two surveys is shown in Table 1. The sex distribution for the mobile-only respondents was similar to the other phone ownership groups. Mobile-only respondents were more likely to be younger and had the largest proportion of respondents in the 25-34 years category.

Prevalence estimates
Daily smoking estimates by key demographic characteristics, collection mode and phone ownership status are presented in Table 2. The overall daily smoking estimate from the NSWPHS was similar to the NDSHS estimate (around 12% of the population reported daily smoking). Estimates for males and for females were similar between the two surveys. Although estimates by age group differed slightly between the surveys, a similar risk profile was apparent for each collection method, and the disparities identifiable in each survey separately were preserved, such as higher smoking rates for males, people living in rural areas, and people living in the first socio-economic quintile. While daily smoking estimates for the MPO population were higher in the NSWPHS (20.3%, 95% CI: 17.5-23.1) than the NDSHS (17.8%, 95% CI: 15.1-20.5), estimates from both surveys were higher than all other phone ownership groups across the majority of subgroups. Disparities in metropolitan and rural/regional estimates were more pronounced in the NDSHS than the NSWPHS. Daily smoking estimates by country of birth and education were similar. Daily smoking estimates for the MPO population were considerably higher than estimates from other phone ownership groups.
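The gap between the two MPO estimates can be put in context with a rough calculation from the published intervals alone. This sketch is not part of the original analysis: it assumes the reported 95% confidence intervals are symmetric normal-approximation intervals and that the two samples are independent, and the function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Back out a standard error from a symmetric 95% interval."""
    return (upper - lower) / (2 * z)


def difference_with_ci(est_a, ci_a, est_b, ci_b, z=1.96):
    """Difference of two independent estimates with an approximate CI."""
    se_a = se_from_ci(ci_a[0], ci_a[1], z)
    se_b = se_from_ci(ci_b[0], ci_b[1], z)
    diff = est_a - est_b
    se_diff = math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)
    return diff, diff - z * se_diff, diff + z * se_diff


# MPO daily smoking estimates (%) as reported for the two surveys.
diff, low, high = difference_with_ci(20.3, (17.5, 23.1),
                                     17.8, (15.1, 20.5))
```

Under these assumptions the difference is 2.5 percentage points with an approximate interval of about -1.4 to 6.4, which includes zero.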

Statistical models
Odds ratios for the outcome measure of daily smoking are given in Table 3. Results from the crude model (the Phone ownership effects only model) indicate that the MPO population were more likely (OR: 2.14, 95% CI: 1.84-2.49) to be daily smokers compared with the dual-phone ownership population. Interestingly, estimates for the MPO population remained identical after controlling for the collection method. Disparities in MPO population estimates of daily smoking still persisted after accounting for age group, sex, remoteness, socio-economic status, country of birth and education, while the landline-only phone ownership group remained quite similar to the dual-phone ownership group (Fig. 1).
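As a rough cross-check on the crude odds ratio, the rounded pooled prevalences reported for MPO users (19%) and dual phone owners (10%) can be converted to odds directly. This is an arithmetic sketch using rounded figures, not the survey logistic model, so only approximate agreement should be expected; the function names are hypothetical.

```python
def odds(p):
    """Convert a proportion (0 < p < 1) to odds."""
    return p / (1.0 - p)


def odds_ratio(p_group, p_reference):
    """Odds ratio of a group relative to a reference group."""
    return odds(p_group) / odds(p_reference)


# Rounded pooled daily smoking prevalences: MPO 19%, dual phone 10%.
crude_or = odds_ratio(0.19, 0.10)
```

The result is approximately 2.11, close to the modelled crude estimate of 2.14; the small gap is consistent with rounding of the input prevalences.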

Sensitivity analyses
Restricting data from the NSWPHS to the second half of 2013 (to match the collection period for the NDSHS) produced estimates and disparities within population sub-groups which were broadly consistent with results based on a full year of data (data not shown).
Applying the weighting adjustment to both surveys by age and sex stratum produced similar results to those obtained by using the simple weighting adjustment of $N_2/(N_1 + N_2)$ (data not shown).

Discussion
This study demonstrates that CATI surveys can produce estimates that are consistent with self-administered surveys for daily smoking, not only for the total population, but also for most population sub-groups, including the MPO population. These findings were consistent even after accounting for factors such as age group, sex, remoteness, socio-economic status, country of birth and education.
Higher daily smoking estimates for the MPO population were not unique to the CATI survey, with a similar pattern of estimates by phone ownership group observable for the self-administered survey. Further, this study reinforces the argument that dual sampling frames adequately address biases that arise in estimates of smoking behaviour, when compared to a self-administered survey.
Although it was anticipated that the CATI survey results would be biased towards more socially desirable responses [9][10][11], this effect was not observed in our study: overall estimates for each survey were almost identical, and the prevalence estimates for the MPO population differed by only about 2.5 percentage points between the surveys. This can be seen in Table 2, where the estimates are around 20% and 18% for the CATI and self-administered modes respectively.
The MPO population had higher odds of daily smoking even after adjusting for important demographic characteristics, which is consistent with previous research in the area [8]. Our study builds on this research by showing that these differences cannot be attributed to different collection methods. It has also demonstrated that using a dual sampling frame (mobile and landline) for telephone interviewing can help to reduce biases arising from the declining coverage of landline phone number sampling frames, which is consistent with other findings [6,8].
Although self-administration provides some advantages in terms of coverage for surveys of the general population, one of the primary limitations of these methods is the cost, especially compared to other interviewing modes [26]. While mobile phone interviewing is more expensive compared to landline interviewing in Australia, these differences are ameliorated when conducting national phone surveys, or studies where the target population makes up a substantial proportion of the overall population.
This study benefited from the availability of large sample sizes from the two surveys, meaning that we were able to identify whether our findings were robust to the use of different weighting adjustments, and from restricting analyses to the second half of 2013. The weighting adjustment applied ensures that the sum of the weights for the combined sample is equal to the 2013 population count and effectively adjusts for the two samples covering the same population. Our planned sensitivity analyses enabled us to determine that our model estimates were relatively robust to any potential seasonal effects arising from the differences in collection period for the surveys. Variances for our prevalence and odds ratio estimates were estimated under the assumption that the two samples were independent. While there is the possibility of some overlap in respondents between the samples, it is negligible and will not affect the variance estimates to any meaningful degree [23]. The analysis would have benefited from harmonised SES and remoteness variables based on SA1 for all respondents. Although the SES and remoteness variables were derived at different geographic levels in the two surveys, disparities in estimates for both surveys broadly matched in terms of the direction of the association.
Although biochemical tests of tobacco use may be more precise, our study has shown estimates arising from a CATI survey were similar to those arising from a self-administered survey. Other studies have found that self-administered surveys are broadly consistent with the findings of biochemical tests [27][28][29]. It is also noted that responses to sensitive questions may vary depending on the specific topic [30]. Therefore, CATI responses to sensitive questions such as illicit drug use or mental health may not behave as consistently to the self-administered approach as has been observed for daily smoking estimates in this study. Further analysis and assessment of these indicators is needed in order to ascertain whether dual-frame sampling can reconcile differences in estimates of illicit drug use between CATI and self-administered surveys.

Conclusions
We have demonstrated in this study that daily smoking estimates vary consistently across both a CATI survey and a self-administered survey. Further, we have demonstrated that higher daily smoking estimates for the MPO population are not an artefact of the dual-frame design, but have also been observed in a survey where phone ownership is not relevant to the administration of the survey. Our results provide evidence that daily smoking rates for the MPO population, while high, are not being driven by the mode of collection, and lend credence to the use of dual sampling frame telephone surveys as a cost-effective tool for the collection of health risk factor and behaviour information for large populations.