Clearing the air: underestimation of youth smoking prevalence associated with proxy-reporting compared to youth self-report

Background: Smoking remains a leading cause of disease burden globally. Declining youth smoking prevalence is an essential feature of effective tobacco control; however, accurate data are required to assess progress. This study investigates bias in youth smoking prevalence estimates by respondent type (proxy-reported, self-report with parent present, or self-report independently) for Aboriginal and Torres Strait Islander and total populations of Australia.
Methods: Repeated cross-sectional analysis of representative Aboriginal and Torres Strait Islander Health and National Health Surveys, 2007-2019. Data were restricted to participants aged 15-17 years. Prevalence ratios (PR) and 95% Confidence Intervals (CI) for ever-smoking by respondent type were calculated using Poisson regression with robust standard errors. National youth current-smoking prevalence was estimated for the scenario in which all data were collected by youth self-report; estimates and trends were compared to observed estimates.
Results: Over 75% of all smoking status data were reported by proxy or with parent present. Ever-smoking prevalence among youth self-reporting independently versus proxy-reported was 1.29 (95% CI: 0.96-1.73) to 1.99 (95% CI: 1.39-2.85) times as high for Aboriginal and Torres Strait Islander youth, and 1.83 (95% CI: 0.92-3.63) to 2.72 (95% CI: 1.68-4.41) times as high for total population youth. Across surveys, predicted national current-smoking prevalence if all youth self-reported independently was generally higher than the observed estimate.
Conclusions: Estimates of youth smoking prevalence are likely inaccurate and underestimated if data are collected by proxy or with parent present. Increased reliance on data reported by youth independently is crucial to improve data accuracy, including to enable accurate assessment of national prevalence.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-022-01594-w.


Background
Tobacco use is a leading contributor to the burden of disease globally, and is an area with substantial potential for health improvement. Effective tobacco control is the combination of reducing tobacco smoking (hereafter referred to as smoking) initiation and increasing cessation in established smokers. Progress in reducing smoking prevalence in Australia is increasingly driven by reduced smoking in youth [1,2]. Accordingly, youth smoking prevalence is a key outcome measure for national monitoring and evaluation. Generally, smoking status is accurate when self-reported by the participant [3], particularly if participants perceive a high degree of confidentiality and anonymity in data collection [4]. However, some survey designs allow for youth smoking status to be reported by a proxy ("proxy-reported"), or elicited in the presence of a parent or guardian ("with parent present").
A proxy reporting on behalf of the youth may not know the youth's actual smoking behaviours. Youth who smoke may be less likely to report smoking with a parent or guardian present, resulting in underreporting. There is limited international evidence on the potential bias in reported youth smoking introduced through use of these data collection methods [5][6][7], and none in the Australian context. Progress against Australian policy targets is generally assessed using nationally representative surveys of the Aboriginal and Torres Strait Islander population and of the total population, including national health and social surveys conducted by the Australian Bureau of Statistics (ABS). These surveys collect data on youth smoking through personal interview with the youth, if a parent or guardian consents. Where consent is granted for a personal interview, some youth answer in the presence of a parent or guardian, and others answer with no parent present. Where consent is not granted for a personal interview with the youth, a parent or other adult responds on their behalf (proxy respondent). Reliance on proxy reports and on interviews with a parent present may undermine our understanding of true progress in reducing youth smoking, and overall smoking prevalence, in Australia.
We aimed to quantify potential bias by respondent type in youth (15-17 years) smoking prevalence estimates for the Aboriginal and Torres Strait Islander and total population of Australia over time. We aimed to examine the extent to which national youth smoking prevalence estimates and trends could differ from current estimates generated from these surveys if all data were self-reported by youth independently.

Data sources
Existing national cross-sectional surveys conducted by the ABS were accessed through ABS DataLab using Confidential Unit Record Data Files [8]. This included a total of six datasets from surveys conducted between 2007 and 2019: the National Aboriginal and Torres Strait Islander Health Survey 2018-19 (NATSIHS 2018-19), the Australian Aboriginal and Torres Strait Islander Health Survey 2012-13 (AATSIHS 2012-13), and the National Health Survey (NHS; 2007-08, 2011-12, 2014-15, and 2017-18). These surveys provide representative estimates for the Aboriginal and Torres Strait Islander population (NATSIHS/AATSIHS) and the total population (NHS). Each survey collects information by face-to-face interview from usual residents of private dwellings, covering around 97% of the targeted population. Briefly, the surveys are conducted using a stratified multistage area sample of private dwellings to ensure that all sections of the in-scope population are represented. The NATSIHS comprises a "community" sample, made up of discrete Indigenous communities, and a "non-community" sample, made up of persons in private dwellings in other areas. For the NATSIHS, in each identified Aboriginal and Torres Strait Islander household, up to two adults (≥ 18 years) and two children (0-17 years) were randomly selected in non-remote areas, and up to one adult and one child were randomly selected in remote areas. In the NHS, one adult and one child within each selected dwelling were randomly selected for inclusion. More details on the sampling frame and design of the surveys are available from the ABS [9, 10].
Data were restricted to participants aged 15-17 years as smoking status was not measured for youth younger than 15 years.

Outcome: smoking status
Youth smoking status was recorded as current daily smoker, current weekly smoker (at least once a week but not daily), current less-than-weekly smoker, ex-smoker, or never-smoker (does not currently smoke, has not previously smoked daily, and has smoked fewer than 100 cigarettes or 20 pipes, cigars or other tobacco products in the participant's lifetime). Smoking status relates to use of combustible tobacco products only. Participants were categorised as current-smokers (combining daily, weekly, and less-than-weekly), ex-smokers, or never-smokers. A binary 'ever-smoked' variable (current- and ex-smoker combined versus never-smoker) and a binary 'current-smoker' variable (current-smoker versus ex-smoker and never-smoker combined) were used where required for analysis.
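The recoding described above can be sketched in a few lines of Python. This is an illustrative sketch only: the category labels and function names are hypothetical and are not the variable names used in the ABS data files.

```python
# Hypothetical labels for the three "current" categories recorded in the surveys.
CURRENT_CATEGORIES = {"daily", "weekly", "less_than_weekly"}


def collapse_status(status: str) -> str:
    """Collapse the five recorded categories into current / ex / never."""
    if status in CURRENT_CATEGORIES:
        return "current"
    if status in {"ex", "never"}:
        return status
    raise ValueError(f"unknown smoking status: {status!r}")


def ever_smoked(status: str) -> bool:
    """Current- and ex-smokers combined versus never-smokers."""
    return collapse_status(status) != "never"


def current_smoker(status: str) -> bool:
    """Current-smokers versus ex- and never-smokers combined."""
    return collapse_status(status) == "current"
```

Note that an ex-smoker counts as 'ever-smoked' but not as a 'current-smoker', so the two binary outcomes can diverge for the same participant.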

Exposure: respondent type
Respondent type was categorised as: proxy-respondent, youth self-report with parent present for some or all smoking questions, or youth self-report independently.

Potentially confounding variables
Potentially confounding variables were those factors conceptually considered to be linked to both respondent type and smoking behaviour, restricted to available factors. Sex was categorised as male or female, based on self-reported responses. Age of youth was categorised as 15-16 years or 17 years. Education status of the youth was categorised as currently studying or not currently studying. Remoteness was categorised as major cities, inner regional, or outer regional and remote for all NHS analyses. Remoteness was categorised as major cities, inner regional, or outer regional, remote and very remote for distribution of respondent type by remoteness in the 2018-19 NATSIHS and the 2012-13 AATSIHS. Tailored distribution data by remoteness were provided by the ABS for the 2012-13 AATSIHS to enable comparability across surveys. Due to use of different remoteness categorisations between datasets, a binary remoteness variable (remote or non-remote) was used as the confounding variable in 2018-19 NATSIHS/2012-13 AATSIHS analyses.

Statistical analysis
All analyses were repeated for each survey. An alpha level of 0.05 was the threshold for statistical significance. Data were analysed using Stata 16, in ABS DataLab.

Unweighted analysis
We quantified the distribution (percentage and 95% Confidence Intervals (CI)) of respondent type for youth smoking data overall and by potentially confounding factors.
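As an illustration of how a percentage and its 95% CI might be computed for one respondent type, the sketch below uses the simple normal-approximation (Wald) interval. The text does not state which interval method was used, so this choice is an assumption, and the counts in any call are hypothetical.

```python
import math


def proportion_ci(count: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) using a Wald interval.

    The Wald interval is an assumption made for illustration; the
    source does not specify the interval method.
    """
    p = count / total
    se = math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot.
    return p, max(0.0, p - z * se), min(1.0, p + z * se)
```

For example, `proportion_ci(50, 100)` gives a proportion of 0.5 with a standard error of 0.05, so the interval is 0.5 plus or minus 0.098.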
The prevalence of current-, ex-, never- and ever-smoking was calculated overall and by respondent type. Prevalence Ratios (PR) and 95% CI for 'ever-smoked' by respondent type were calculated in the youth sample of each survey using Poisson regression with robust standard errors. Analyses were adjusted for the potentially confounding factors. Fit of Poisson models was confirmed using the Pearson goodness-of-fit test.
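To make the prevalence ratio concrete, the following sketch computes an unadjusted PR and a 95% CI on the log scale from a two-by-two table. It is a simplification, not the analysis reported here: the reported PRs come from Poisson regression with robust standard errors and are adjusted for age, sex, remoteness and education status, which this sketch does not do. The function name and the counts in the example are hypothetical.

```python
import math


def prevalence_ratio(exposed_cases, exposed_total, ref_cases, ref_total, z=1.96):
    """Unadjusted prevalence ratio with a log-scale Wald confidence interval.

    'Exposed' would be, for instance, youth self-reporting independently;
    'ref' would be youth whose status was proxy-reported.
    """
    p_exposed = exposed_cases / exposed_total
    p_ref = ref_cases / ref_total
    pr = p_exposed / p_ref
    # Delta-method standard error of log(PR).
    se_log = math.sqrt(
        1 / exposed_cases - 1 / exposed_total + 1 / ref_cases - 1 / ref_total
    )
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper


# Hypothetical counts: 30 of 100 ever-smokers among independent self-reporters,
# 15 of 100 among proxy-reported youth.
pr, lower, upper = prevalence_ratio(30, 100, 15, 100)
```

A PR of 2 here would be read as ever-smoking prevalence being twice as high among independent self-reporters as among proxy-reported youth.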

Weighted analysis
The above analyses were repeated with survey weights applied. For all weighted estimates, data were weighted to the total in-scope population (Aboriginal and Torres Strait Islander or total population), using replicate weights provided by the ABS, and employing the delete-a-group jackknife replication method, described in detail elsewhere [11]. To assess impact of respondent type on estimates of youth smoking status, the PR analysis was also conducted using 'current-smoker' as the outcome. These PR results were used to predict the national prevalence of current-smoking if all youth smoking data were collected by youth self-report independently (using the Stata margins command [12], Supplementary table S1). Predicted prevalence estimates and their corresponding 95% CI were compared to those of actual prevalence estimates using an upper tailed Z test. Differences in slope of predicted prevalence and actual prevalence trend lines were compared using methods outlined by Andrade and Perez [13]. Briefly, we tested the assumption of equality of variances between the two regression trend lines using an F-test. The assumption of equality of variances was met; given the small number of time periods, we performed a t-test based on a pooled standard error calculated from the standard errors of the two regression trend lines.
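The replicate-weight approach can be sketched as follows. The estimate is recomputed once per set of replicate weights, and the spread of those replicate estimates around the full-sample estimate gives the standard error. The formula below is the usual delete-a-group jackknife form; the number of replicate groups in the ABS files is not given in this text, so the sketch simply uses however many replicate estimates are supplied. All names and numbers are hypothetical.

```python
import math


def weighted_prevalence(outcomes, weights):
    """Weighted proportion of a 0/1 outcome."""
    return sum(w * y for y, w in zip(outcomes, weights)) / sum(weights)


def jackknife_se(full_estimate, replicate_estimates):
    """Delete-a-group jackknife standard error.

    SE = sqrt((G - 1) / G * sum((theta_g - theta) ** 2)), where G is the
    number of replicate groups and theta_g is the estimate from the g-th
    set of replicate weights.
    """
    g = len(replicate_estimates)
    sum_sq = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt((g - 1) / g * sum_sq)


def prevalence_with_se(outcomes, main_weights, replicate_weight_sets):
    """Full-sample weighted prevalence and its jackknife standard error."""
    estimate = weighted_prevalence(outcomes, main_weights)
    replicates = [weighted_prevalence(outcomes, rw) for rw in replicate_weight_sets]
    return estimate, jackknife_se(estimate, replicates)
```

In practice the replicate weights are supplied with the survey file, so the analyst never has to form the groups; only the re-estimation loop and the variance formula are needed.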

Results
The substantial majority of youth aged 15-17 years had their smoking status collected by proxy or with parent present (75.6-92.7% of Aboriginal youth and 77.8-86.1% of total population youth) and this proportion generally increased across years examined (Table 1). Youth aged 17 years tended to self-report smoking behaviours independently more often than those aged 15-16 years across both datasets and all years. The distribution across respondent types was generally similar for males and females, by education status and by remoteness across surveys. No participants had missing data for any included variables.
Among Aboriginal and Torres Strait Islander youth with proxy-reported smoking status, 21.3% (95% CI 16.5-26.1) were current-smokers in 2012/13 and 17.4% (95% CI 13.4-21.4) in 2018/19, compared to 25.4% (95% CI 19.1-31.7) and 33.3% (95% CI 18.5-48.2) of youth self-reporting independently in the corresponding years (Table 2). In the total population youth sample, current-smoking prevalence was 4.7% (95% CI 2.5-7.0) in 2007/08 and 5.6% (95% CI 2.9-8.3) in 2011/12 among youth with proxy-reported smoking status, compared to 15 [...] 08) times as high for total population youth. While consistent with higher current-smoking prevalence among youth reporting independently, CIs were wide due to small numbers of [...]

Table 2 Prevalence of current, ex-, never, and ever smokers overall and by respondent type, and aPRs for 'ever smoked' by respondent type, within the youth (15-17 years) sample of each survey
- indicates that data were not presented due to small numbers in one or more categories. Prevalence estimates and aPRs are calculated using unweighted data. aPR, adjusted prevalence ratio (adjusted for age, sex, remoteness and education status); CI, confidence intervals; NATSIHS, National Aboriginal and Torres Strait Islander Health Survey; NHS, National Health Survey

[...] Table S1). In all six surveys, the predicted national current-smoking prevalence if all youth were to self-report smoking status independently was substantially higher than the observed estimate based on actual responses (Fig. 1; Supplementary table S1). The difference was significant in the 2007/08 (17.4% vs 6.9% respectively, p-value = 0.011) and 2011/12 NHS (12.6% vs 6.7%, p-value = 0.035).
See Table S3 for [...]
The rate of change in Aboriginal and Torres Strait Islander youth current-smoking was similar using observed data and predicted estimates if all youth self-reported independently, while the rate of change in total population youth was significantly greater using the predicted estimate if all youth self-reported independently (p-value = 0.045) (Fig. 1).
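The comparison of a predicted and an observed prevalence with an upper-tailed Z-test can be sketched as below. The text does not give the exact test formula, so this is one plausible form under stated assumptions: the two estimates are treated as independent, and their standard errors are combined in quadrature. The function name and the example inputs are hypothetical and are not taken from the surveys.

```python
import math


def upper_tailed_z(predicted, predicted_se, observed, observed_se):
    """Z statistic and upper-tailed p-value for predicted > observed.

    Assumes the two estimates are independent, which is a simplification
    when both are derived from the same survey sample.
    """
    z = (predicted - observed) / math.sqrt(predicted_se**2 + observed_se**2)
    # Upper-tail probability of the standard normal: 1 - Phi(z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical inputs: prevalences and standard errors as proportions.
z, p_value = upper_tailed_z(0.50, 0.03, 0.40, 0.04)
```

Standard errors can be recovered approximately from a reported symmetric 95% CI as (upper - lower) / (2 x 1.96), which is how the inputs might be obtained from a table of estimates.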
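The comparison of two trend-line slopes can be sketched as follows: fit a least-squares line to each series, compare the residual variances with an F ratio, and, if equal variances are plausible, form a t statistic from a pooled residual variance. This is a common textbook formulation and may differ in detail from the method in reference [13]. The time points and prevalence values below are hypothetical, and the p-value lookup (which needs F and t distribution tables or a statistics library) is left out.

```python
import math


def fit_trend(x, y):
    """Least-squares slope, residual variance, Sxx and n for one series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, sse / (n - 2), sxx, n


def compare_slopes(x1, y1, x2, y2):
    """Return (F ratio of residual variances, t statistic, degrees of freedom).

    Requires at least three points per series and non-zero residual variance.
    """
    b1, var1, sxx1, n1 = fit_trend(x1, y1)
    b2, var2, sxx2, n2 = fit_trend(x2, y2)
    f_ratio = max(var1, var2) / min(var1, var2)
    df = n1 + n2 - 4
    pooled_var = ((n1 - 2) * var1 + (n2 - 2) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / sxx1 + 1 / sxx2))
    return f_ratio, (b1 - b2) / se_diff, df


# Hypothetical series: predicted and observed prevalence (%) at four time points.
f_ratio, t_stat, df = compare_slopes(
    [0, 1, 2, 3], [10, 8, 7, 5], [0, 1, 2, 3], [6, 6, 5, 5]
)
```

With only four time points per series the test has four degrees of freedom, so even large differences in slope yield wide uncertainty.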

Discussion
One of Australia's largest sources of representative data about youth smoking prevalence predominantly, and increasingly over time, relies on data about youth behaviours reported by proxy or with an adult present at interview. Data collection by proxy or with parental presence leads to under-reporting compared to youth self-report independently and is likely to have resulted in underestimation of actual national youth smoking prevalence from 2007-2019, with predicted prevalence 1.3 to 2.5 times as high as the observed prevalence if all youth had self-reported. Based on the most recent survey data (2018-19 NATSIHS and 2017-18 NHS), there may be up to 3200 more Aboriginal and Torres Strait Islander youth and 18,600 more total population youth currently smoking than estimated based on observed responses. There is potential for further increasing bias in national health surveys if the percentage of youth self-reporting independently continues to decline.
In Australia's total population youth, the gap between predicted and observed smoking prevalence narrowed over time, reflecting changes in the proportion of youth self-reporting independently, bias (current smoking prevalence ratio for independent report vs proxy report) at each time point, and actual smoking behaviour. If all youth had self-reported independently, we may have seen a similar absolute prevalence decline in Aboriginal and Torres Strait Islander youth (5.9% compared to 6.2% in observed data). In total population youth, we may have seen a larger absolute prevalence decline over time (12.0% compared to 3.7% in observed data), but with a much higher starting point.
These findings and the observed magnitude of association are in accordance with the limited international evidence, which suggests that youth smoking data reported by a parent proxy [6], or in the presence of a parent [7], result in underreporting of youth smoking. Harakeh et al. found that the percentage of youth aged 14-17 years who had ever tried smoking in a sample from the Netherlands was nearly double when self-reported by the youth independently versus proxy-reported by the mother (47.8% vs 26.8%) [6]. In a representative sample of Californian students aged 12-17 years, parental presence at data collection was associated with 30% lower odds of reporting current-smoking (OR 0.70, 95% CI: 0.56, 0.86) [7]. Collection of other forms of data, including alcohol and drug use and e-cigarettes, by parent proxy or with parent present shows similar results [14][15][16].
Collecting precise data on smoking behaviour is critical to monitor smoking trends over time and is consistent with the World Health Organization's Framework Convention on Tobacco Control [17], signed by Australia, which states that "each party shall endeavour to: … progressively establish and maintain updated data from national surveillance programmes…" (pg. 18). This is particularly important given youth non-uptake is vital to the success of tobacco control. The Implementation Plan for the National Aboriginal and Torres Strait Islander Health Plan 2013-2023 [2] sets targets to increase the prevalence of Aboriginal and Torres Strait Islander youth aged 15-17 years who have never smoked from 77 to 91% by 2023. Smoking population prevalence data from the ABS national health and social surveys are used to inform progress against the Implementation Plan target, and could also inform the imminent new Implementation Plan for the National Aboriginal and Torres Strait Islander Health Plan, the next iteration of the National Tobacco Strategy and the National Preventative Health Strategy. Underestimating youth smoking due to bias within these surveys may lead to a false sense of security regarding tobacco control, particularly if reliance on proxy report continues to increase. This may also lead to an artificial jump in smoking prevalence at age 18 years when survey participants all self-report independently.
It is prudent to consider alternative methods for collecting data from youth within national surveys. Strategies such as gaining consent for the youth to self-report privately using computer-assisted self-interviewing software have been employed within other surveys in efforts to ensure greater privacy in youth data collection [18]. Further research is warranted to explore whether this method could be used to produce more valid youth smoking data, including its feasibility within large quantitative survey data collection.
Within Australia, other sources of youth smoking data are also used to monitor trends, although each has its own potential limitations. The Australian Secondary Students' Alcohol and Drug Survey (ASSAD) collects data on youth smoking every three years from a representative sample of Australian students (aged 12-17 years) enrolled in school nationally [19]. The ASSAD is administered on school premises, which has been shown to result in higher reported youth smoking compared to surveys administered in the home [4,7]. However, the sample is restricted to youth who attend school, and does not adequately capture youth living in remote areas, or attending small schools (fewer than 100 students) [20]. These limits in survey scope are likely to lead to an underestimation of smoking prevalence, particularly in the Aboriginal and Torres Strait Islander youth population. The National Drug Strategy Household Survey (NDSHS) has collected data on youth smoking behaviours every three years since 1985. Although the NDSHS does not collect youth smoking data by proxy, it does allow parents to be present at data collection. In the 2019 NDSHS, parent presence for youth aged 14-15 years was around 40% [21]. Like the ASSAD, the NDSHS should not be used for Aboriginal and Torres Strait Islander-specific estimates. Australia currently lacks any other nationally representative data about Aboriginal and Torres Strait Islander youth (15-17 years) smoking, resulting in a reliance solely on the ABS national health and social surveys. A detailed overview of other sources of youth smoking data in Australia has been included in Supplementary Material (Table S4).
There are several limitations to consider in the interpretation of these findings. Changes across ABS surveys in scope, sample design, coverage, questions asked, and category definitions make it difficult to confidently assess trends in smoking. The prevalence estimates presented here may contrast with the findings of other analyses that used different categorisations of smoking status or age groups. This problem is compounded by the limited number of Aboriginal and Torres Strait Islander surveys with data on respondent type available. For example, another ABS survey, the National Aboriginal and Torres Strait Islander Social Survey (NATSISS), also collects youth smoking data by proxy or with parent present, and is also used to monitor trends in Aboriginal and Torres Strait Islander youth smoking. However, data on respondent type are unavailable for the NATSISS, so it could not be included in the analysis.
Additional factors such as the youth's level of independence and the smoking status of the parent present, not measured in this dataset, may relate to parent presence at interview and youth smoking behaviours. These factors may influence the extent to which youth are comfortable disclosing their actual smoking behaviours. However, any additional factors are unlikely to account for the magnitude of difference observed.
Lastly, we assume that youth smoking data are more likely to be correct when self-reported independently. There are additional factors that can introduce bias in self-report, as even specific characteristics of the interviewer can influence responses [22]. The self-reported youth smoking data used here were not validated with biochemical measures. However, previous research has shown that smoking in adolescents can be accurately assessed with self-reports if confidentiality and anonymity are guaranteed [3,23], as is more likely the case when youth self-report independently.

Conclusion
Our findings demonstrate that youth tobacco smoking estimates are unlikely to be accurate if drawn from data collected by a proxy respondent or with parent present. In order to improve the accuracy of data on youth smoking behaviours, it is important to collect sufficient data through self-report, in a safe and confidential manner. To achieve this within these Australian ABS surveys would require both increasing the number of youth who are present at interview, and increasing parents' willingness to have the youth self-report independently. Furthermore, it is critical to assess the suitability of available data sources for measuring and monitoring prevalence and trends in youth smoking.