 Research article
 Open Access
Analysis of self-report and biochemically verified tobacco abstinence outcomes with missing data: a sensitivity analysis using two-stage imputation
BMC Medical Research Methodology volume 18, Article number: 170 (2018)
Abstract
Background
Missing data are common in tobacco studies. It is well known that from the observed data alone, it is impossible to distinguish between missing mechanisms such as missing at random (MAR) and missing not at random (MNAR). In this paper, we propose a sensitivity analysis method to accommodate different missing mechanisms in cessation outcomes determined by self-report and urine validation results.
Methods
We propose a two-stage imputation procedure, allowing survey and urine data to be missing under different mechanisms. The motivating data were from a tobacco cessation trial examining the effects of the extended vs. standard Quit and Win contests and counseling vs. no counseling under a 2-by-2 factorial design. The primary outcome was 6-month biochemically verified tobacco abstinence.
Results
Our proposed method covers a wide spectrum of missing scenarios, including the widely adopted “missing = smoking” imputation by assuming a perfect smoking-missing correlation (an extreme case of MNAR), the MAR case by assuming a zero smoking-missing correlation, and many more in between. The analysis of the data example shows that the estimated effects of the studied interventions are sensitive to the different missing assumptions on the survey and urine data.
Conclusions
Sensitivity analysis has played a crucial role in assessing the robustness of the findings in clinical trials with missing data. The proposed method provides an effective tool for analyzing missing data introduced at two different stages of outcome assessment, the self-report and validation time. Our methods are applicable to trials studying biochemically verified abstinence from alcohol and other substances.
Background
Cigarette smoking is a risk factor for morbidity and mortality in the US and around the world [1,2,3,4]. Smoking cessation studies usually encourage cessation and provide either behavioral (e.g., counseling) or pharmaceutical (e.g., nicotine gum) interventions or both. In smoking cessation studies, missing binary abstinence outcomes (i.e., quit or not quit) are very common. These missing outcomes may lead to bias or weakened statistical power in estimating the effect of the studied intervention. Choosing appropriate statistical methods to handle binary missing data has been a continuing source of controversy [5].
The choice of methods to deal with missing data depends on assumptions about the missing mechanism [6]. Data are referred to as being missing at random (MAR) if the missing status (yes or no) is not related to the missing value itself, but can be dependent on some other observed variables. Data are referred to as being missing not at random (MNAR) or nonignorable missing if the probability of missing depends on the missing value. It is well known that from the observed data alone, it is impossible to distinguish between MAR and MNAR. Therefore, statistical analyses based on one specific missing mechanism, such as the popular MAR assumption, could lead to misleading conclusions if the data are in fact not missing at random. For example, consider a trial to incentivize smokers to quit smoking. Missing data due to survey nonresponse or dropout could depend on the smoking status of the participants, in which case the missing mechanism is not at random. Sensitivity analysis can play a crucial role in assessing the robustness of the findings in clinical trials with missing data [7]. In this paper, we aim to study sensitivity analysis methods for analyzing smoking cessation outcome data under various missing data mechanisms including MAR and MNAR.
In the literature, the standard procedure used in smoking cessation trials is to assume that all nonrespondents are smoking (referred to as the “missing = smoking” method hereinafter), which is a special case of single imputation under the MNAR assumption. According to Jackson et al. [8], around 80% of reports of smoking cessation trials adopt this assumption. However, this simple imputation approach has been shown to lead to potentially biased results [8,9,10,11,12]. Other common single imputation methods frequently used in smoking cessation trials include the last observation carried forward (LOCF), the baseline observation carried forward (BOCF) and imputations based on predicted values from a regression model or the expectation-maximization (EM) algorithm [13]. In addition, Barnes and colleagues [14] used a multiple imputation procedure with the propensity score matching method to impute missing smoking status.
Hedeker and colleagues [12] demonstrated that the simple missing = smoking imputation is essentially based on the assumption that the missing status and the smoking status have a perfect correlation (r = 1) or, equivalently, the odds ratio (OR) between missing and smoking equals positive infinity (OR = +∞). They developed both simple and multiple imputation approaches based on more relaxed assumptions which allow different levels of correlation between smoking and missing status. Although their imputation method provides a more flexible and useful alternative to the simple missing = smoking method and has been applied in various trials [8, 13], this method cannot be directly applied to data with missing values generated from multiple sources or stages. For example, when the cessation outcome is determined by self-report survey data followed by a urine validation test, both a missing survey response and a missing urine sample can lead to missing cessation outcomes. In this case, an imputation procedure designed to account for missing data generated from two different stages, the survey collection stage and the urine collection stage, would be preferable. A naive imputation approach for dealing with this type of missing data would treat only subjects who confirmed their abstinence by both self-report and a urine sample with a negative result as a treatment success (i.e., biochemically verified self-reported abstinence). All the other subjects, including those who either failed to complete the survey or failed to provide the urine sample for confirmation of self-reported abstinence, would be considered as treatment failures (i.e., not achieving abstinence). Note that this naive approach is an extreme case under the MNAR assumption, assuming a perfect correlation between survey missing and self-report failure and a perfect correlation between urine missing and urine-verified failure among people who self-reported abstinence. Hence, it does not have the flexibility to accommodate different levels of correlation between survey missing and self-report failure or between urine missing and urine-verified failure. In this paper, we extend Hedeker et al.’s method [12] to a two-stage imputation procedure to take into account missing data in either the self-report or the urine verification stages.
The rest of the article is organized as follows. In Section 2, we first introduce a randomized controlled trial of college smokers [15] which motivated this research, and then we introduce a sensitivity analysis method using a two-stage imputation procedure for missing abstinence data at the self-reporting and subsequent biochemical verification stages. In Section 3, we report the sensitivity analysis result of the college smokers study. Some discussions and concluding remarks can be found in Sections 4 and 5, respectively.
Methods
Aim, design, and setting
The data motivating this research were collected from 1217 subjects enrolled in a smoking cessation randomized clinical trial entitled “Enhanced quit and win contests to improve smoking cessation among college students” (henceforth referred to as the “Enhanced Quit & Win” study) during the academic years 2010–2013. This study utilized a two-by-two factorial design to examine the marginal effects of two distinct interventions: the impact of multiple vs. single Quit & Win contests and the effect of Motivational and Problem Solving (MAPS) counseling vs. no counseling on smoking cessation among college smokers. Specifically, participants were randomly assigned to one of four groups: (1) single contest (denoted by Tx1, n = 306), (2) single contest plus counseling (Tx2, n = 296), (3) multiple contests (Tx3, n = 309), and (4) multiple contests plus counseling (Tx4, n = 306). The primary cessation outcome was measured at 6 months post-randomization, when all participants were encouraged to complete an online survey to report their smoking status and other tobacco use in the past 30 days. Only people who reported no tobacco use in the past 30 days were invited to provide urine to biochemically (cotinine assay) confirm their self-reported abstinence. Both self-reported abstinence and biochemically verified abstinence were of interest. The study design and the characteristics of participants are described in greater detail in the parent study manuscript [15]. This trial was registered at ClinicalTrials.gov as number NCT01096108.
Sensitivity analysis using two-stage imputation
As we described earlier, the missing data in the Enhanced Quit & Win study occurred at two different stages: the survey collection stage and the urine verification stage. A common and conservative imputation approach for dealing with such two-stage missing data would treat only subjects who self-reported abstinence and provided a urine sample which confirmed the abstinence as a treatment success (i.e., biochemically verified abstinence). All the other subjects, including those who either failed to complete the survey or failed to provide urine, would be considered as a treatment failure. This is analogous to the missing = smoking method for one-stage missing data. Note that this approach is an extreme case of the single imputation approach under the missing not at random (MNAR) assumption, assuming a perfect correlation between the survey missing and self-report failure, or equivalently an infinite odds ratio between the two (denoted by OR_{1} = ∞), and at the same time a perfect correlation between the urine missing and urine-verified failure (denoted by OR_{2} = ∞).
In this paper, we propose a two-stage imputation approach under the MNAR assumption, which takes into account the two-stage missing process and allows (1) different levels of correlation between the survey missing and self-report failure (i.e., varying OR_{1}) and (2) different levels of correlation between the urine missing and urine-verified failure among those who self-reported abstinence (i.e., varying OR_{2}). This can be considered as an extension of the imputation method in Hedeker et al. [12] for one-stage missing data to a two-stage missing data situation. In this section, we present a two-stage imputation approach conducted on a summary or aggregated data basis.
The one-stage imputation method by Hedeker et al. [12] for the self-report data
We first introduce some general notation. We code “tobacco use status”, the binary dependent variable, as 1 = used tobacco/failure and 0 = did not use tobacco/abstinence, and “missing status”, the binary indicator of whether the data are missing or not, as 1 = missing and 0 = observed. Let j = 1, 2, 3, 4 index the four treatment groups, Tx1 to Tx4, respectively. Let subjects be indexed by i = 1, 2, …, n_{j}, where n_{j} denotes the total number of subjects in treatment group Tx_{j}. Since we propose to perform imputations within each treatment, in the sequel we omit j from all symbols to simplify notation. Moreover, we use superscripts 11, 12, 21, and 22 to denote the four entries of the two-by-two table between the tobacco use status and missing status, as illustrated in Table 1. Note that, in the second row of Table 1, only the total number of individuals with missing data, n^{2.}, can be observed; the abstinence statuses of these people, n^{21} and n^{22} (the second row in Table 1), are unknown and need to be estimated. Furthermore, in the summation row of Table 1, the total number of abstinence (denoted by n^{.1}) and the total number of failure (denoted by n^{.2}) are also unknown. Note that the ‘dot’ in the superscripts indicates summation over a row or column.
Following Hedeker et al. [12], in order to impute the numbers of abstinence and failure for participants with missing survey data (n^{21} and n^{22}), we assume an odds ratio between the missing survey status and self-report tobacco use status (OR_{1}) to reflect the strength of correlation between them (denoted by r_{1}). Note that the widely adopted missing = smoking method corresponds to the situation of r_{1} = 1 or OR_{1} = ∞. In that case, n^{21} is imputed with 0 and n^{22} is imputed with n^{2.}. More generally, we have
\[ OR_1 = \frac{n^{22}/n^{21}}{n^{12}/n^{11}}, \]
and then it can be shown that the unobserved values, n^{21} and n^{22}, can be imputed with the assumed OR_{1} by:
\[ n^{22} = n^{2.} \times \pi, \qquad n^{21} = n^{2.} \times (1 - \pi), \]
where Odds is the odds of tobacco use among survey respondents and can be calculated from the observed survey data by \( \frac{n^{12}}{n^{11}} \); and \( \pi =\frac{OR_1\ast Odds}{1+{OR}_1\ast Odds} \) is a multiplicative factor relating n^{22} to n^{2.}.
Participants who do not respond or are lost to follow-up in a smoking cessation study may differ from those who are retained in the study with regard to their smoking status. We often expect that the odds of tobacco use among nonrespondents is equal to or higher than that of respondents (i.e., OR_{1} ≥ 1), especially in studies where people are incentivized to quit as in the Enhanced Quit & Win study. Note that a larger OR_{1} would imply a stronger relationship between missing and tobacco use.
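The one-stage calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' SAS code: the function name `impute_one_stage` is hypothetical, and the example pools the survey counts reported for all 1217 participants (264 self-reported abstinent, 717 self-reported tobacco use, 236 missing), whereas the paper imputes within each treatment arm.

```python
import math


def impute_one_stage(n11, n12, n2_dot, or1):
    """Split n2_dot missing cases into imputed abstinence (n21) and failure (n22).

    n11, n12: observed abstinence and failure counts among survey respondents.
    or1: assumed odds ratio between missing status and tobacco use (OR_1).
    """
    if math.isinf(or1):
        pi = 1.0  # missing = smoking: every missing case is imputed as failure
    else:
        # Equals OR_1 * Odds / (1 + OR_1 * Odds) with Odds = n12 / n11,
        # rearranged so that n11 is not used as a divisor on its own.
        pi = or1 * n12 / (n11 + or1 * n12)
    n22 = n2_dot * pi
    n21 = n2_dot - n22
    return n21, n22


if __name__ == "__main__":
    n11, n12, n2_dot = 264, 717, 236  # pooled counts, for illustration only
    total = n11 + n12 + n2_dot
    for or1 in (1, 2, 3, 4, 5, math.inf):
        n21, n22 = impute_one_stage(n11, n12, n2_dot, or1)
        rate = (n11 + n21) / total
        print(f"OR1={or1}: imputed abstinent={n21:.1f}, abstinence rate={rate:.3f}")
```

With OR_{1} = 1 the imputed abstinence rate equals the respondent rate 264/981, and with OR_{1} = ∞ it falls to 264/1217, the missing = smoking value.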
The two-stage imputation method for the urine-verified data
When estimating biochemically verified abstinence, more complex conditions should be considered since missing data can be present at both the survey and the urine verification stage. Unless otherwise specified, the notation for the survey data is the same as that defined previously (see Table 1 and the top half of Fig. 1). Some additional notation, shown in the lower half of Fig. 1 and specific to the urine data, is defined as follows. Let u^{(obs)} and u^{(imp)} denote the number of urine samples provided by people who self-reported abstinence (n^{11}) and the estimated number of urine samples that could be collected from people who would report abstinence if they did not fail to respond to the survey (n^{21}), respectively; similarly, v^{(obs)} and v^{(imp)} are used for the number of missing urine samples of n^{11} and n^{21}, respectively. For the urine-verified abstinence outcome, similar notation is defined as for the self-report abstinence outcome except for using f instead of n. The superscripts 11, 12, 21, and 22 have the same meaning as those for n. In addition, we use f^{11(obs)} to denote the number of urine-verified abstinence and f^{12(obs)} the number of urine-verified failures obtained from people who actually provided urine samples, and we have u^{(obs)} = f^{11(obs)} + f^{12(obs)}. Similarly, we use f^{11(imp)} to denote the number of urine-verified abstinence and f^{12(imp)} the number of urine-verified failures obtained from the estimated available urine samples, and we have u^{(imp)} = f^{11(imp)} + f^{12(imp)}. Then we combine f^{11(obs)} and f^{11(imp)} to obtain the total number of participants with urine-verified abstinence, f^{11.}, among the urine samples that were actually provided or could have been provided if there were no surveys missing; similarly, we combine f^{12(obs)} and f^{12(imp)} to obtain the total number of urine-verified failures, f^{12.}.
Based on the previous imputation results for missing data at the survey stage, self-report abstinence (n^{11}, n^{21}) and failed abstinence (n^{12}, n^{22}) have been generated based on the imputed survey data under the assumed OR_{1} within each treatment. Next, we proceed to estimate urine-verified abstinence or failure under the assumed OR_{2} for the imputed, “complete” self-report data. Prior to imputing missing urine sample data, the numbers of subjects who would have provided urine samples (u^{(imp)}) or would not provide urine samples (v^{(imp)}) among survey nonrespondents need to be estimated. One can assume that the urine missing rate among survey nonrespondents, compared with respondents, varies by a known factor λ (λ > 0), that is
Consequently, the number of available (u^{(imp)}) and unavailable (v^{(imp)}) urine samples among imputed self-report abstinence cases can be calculated based on Equation (2) and the fact that u^{(imp)} + v^{(imp)} = n^{21}.
Similarly, one can assume that, compared to the actually provided urine samples, the urine-verified abstinence rate among the urine samples that could have been provided if the survey were completed varies by a known factor η (η > 0):
\[ \frac{f^{11(imp)}}{u^{(imp)}} = \eta \times \frac{f^{11(obs)}}{u^{(obs)}} \qquad (3) \]
Therefore, the number of urine-verified abstinence (f^{11(imp)}) and failure (f^{12(imp)}) among imputed self-report abstinence cases can be estimated based on Equation (3) and the fact that f^{11(imp)} + f^{12(imp)} = u^{(imp)}.
We can then calculate the total number of urine-verified abstinence cases by f^{11.} = f^{11(obs)} + f^{11(imp)} and the urine-verified failure by f^{12.} = f^{12(obs)} + f^{12(imp)} among all the “available” urine samples (including actually observed or imputed). Up to this point, the urine-verified abstinence (f^{21}) and urine-verified failure (f^{22}), among people whose urine was not actually provided (v^{(obs)}) or would not be provided even if their survey data were completed (v^{(imp)}), have not yet been imputed. Next, we use the fact that v = f^{22} + f^{21} = v^{(obs)} + v^{(imp)} and propose a similar imputation procedure for the urine missing data as for the survey missing data described in the previous subsection as follows:
where the second equality follows from Equation (3), and Odds^{′} and π^{′} are the odds of tobacco use and probability of tobacco use among people who provided urine sample, respectively. The overall number of participants with urine-verified abstinence can then be obtained by simply adding f^{11.} and f^{21}, and similarly, the overall number of urine-verified failure is f^{12.} + f^{22}. After all the above steps are completed for each treatment arm, we can estimate the various treatment effects based on the imputed data.
For the Enhanced Quit & Win data, we assumed a series of values ≥ 1 for OR_{1} and OR_{2} (1, 2, 3, 4, 5, and positive infinity) and set λ = η = 1 for ease of presentation, but certainly more values can be examined for these parameters in the sensitivity analysis. SAS Version 9.4 (SAS Institute Inc., Cary, NC, USA) was used for all analyses and the SAS computing code for the proposed two-stage imputation method is provided in the Additional file 1: Supplementary Material.
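A minimal sketch of the two-stage procedure, written in Python rather than the authors' SAS code, may help fix ideas. It is restricted to the case λ = η = 1 used in the main analysis, so survey nonrespondents imputed as abstinent are assumed to provide urine, and to test negative, at the same rates as observed respondents. The second stage is assumed to mirror the first, with the share imputed as failure equal to OR_{2}·Odds′/(1 + OR_{2}·Odds′). The function names and all counts in the example are hypothetical.

```python
import math


def failure_share(abstinent, failed, odds_ratio):
    """Share of missing cases imputed as failure: OR*Odds / (1 + OR*Odds)."""
    if math.isinf(odds_ratio):
        return 1.0
    return odds_ratio * failed / (abstinent + odds_ratio * failed)


def impute_two_stage(n11, n12, n2_dot, u_obs, f11_obs, or1, or2):
    """Return the imputed number of urine-verified abstinent participants in one arm.

    n11, n12: observed self-reported abstinence / failure counts.
    n2_dot:   number of missing surveys.
    u_obs:    urine samples provided among the n11 self-reported abstainers.
    f11_obs:  provided samples that verified abstinence.
    Assumes lambda = eta = 1 (a simplification made for this sketch).
    """
    # Stage 1: survey nonrespondents imputed as self-reported abstinent.
    n21 = n2_dot * (1.0 - failure_share(n11, n12, or1))

    # Urine samples that would have been available among them (lambda = 1).
    u_imp = n21 * u_obs / n11
    v = (n11 - u_obs) + (n21 - u_imp)  # v_obs + v_imp: unavailable samples

    # Verified results among available samples (eta = 1).
    f11_imp = u_imp * f11_obs / u_obs
    f11_dot = f11_obs + f11_imp
    f12_dot = (u_obs - f11_obs) + (u_imp - f11_imp)

    # Stage 2: unavailable samples imputed as verified abstinent.
    f21 = v * (1.0 - failure_share(f11_dot, f12_dot, or2))
    return f11_dot + f21


if __name__ == "__main__":
    # Hypothetical counts for a single arm, not taken from the trial.
    counts = dict(n11=100, n12=300, n2_dot=100, u_obs=80, f11_obs=60)
    total = counts["n11"] + counts["n12"] + counts["n2_dot"]
    grid = (1, 2, 3, 4, 5, math.inf)
    for or1 in grid:
        rates = [impute_two_stage(**counts, or1=or1, or2=or2) / total for or2 in grid]
        print(f"OR1={or1}: " + "  ".join(f"{r:.3f}" for r in rates))
```

With OR_{1} = OR_{2} = 1 the sketch returns 93.75 of 500 (18.75%), the product of the observed self-report rate (100/400) and verification rate (60/80). With both odds ratios infinite it returns the 60 observed verified abstainers, which corresponds to the conservative approach that counts every missing survey or urine sample as a failure.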
Results
Summary of missing data
Figure 2 shows the summary of the 6-month abstinence outcomes and missing data. Of the 1217 randomized participants, 981 (81%) completed the 6-month survey and 236 (19%) did not. Among the 981 survey completers, 264 (27%) self-reported tobacco abstinence. Among the 264 participants who self-reported abstinence, 182 (69%) provided urine. Among the 182 participants who provided urine samples, 5 were not of adequate amount for testing and 153 (84%) were biochemically confirmed as abstinent.
Table 2 presents the differential missing data patterns across treatment arms and intervention conditions by both survey missing and urine missing. Note that the five missing urine test results due to inadequate urine amount were assumed to have the same distribution as the other 177 urine samples (86% verified abstinence and 14% verified failure) and added to the corresponding columns in Table 2. We found that the no counseling groups, Tx1 and Tx3, had significantly (p = 0.003) lower survey missing rates (15.4 and 16.5%, respectively, and 15.9% for the combined group) than the two counseling arms, Tx2 and Tx4 (22.6 and 23.2%, respectively, and 22.9% for the combined group), whereas the single- and multiple-contests groups were found to have similar survey missing rates (p = 0.798). The urine missing rate was similar between the single- and multiple-contests groups and between the counseling and no counseling groups (both ps > 0.05).
Self-report abstinence outcome
The imputation results of the self-report abstinence outcome are summarized in Table 3. As a comparison, we also present the results from a complete case only analysis, where only subjects with no missing survey or urine were included. We can prove that the abstinence rate decreases as OR_{1} increases. As expected, the estimated abstinence rates and treatment effect based on the imputed data under the MAR assumption (i.e., OR_{1} = 1) are the same as those based on the complete case only analysis. However, the statistical significance is stronger (smaller p) in the former as more data are utilized. Under the MAR assumption, the estimated treatment effect of counseling vs. no counseling is significant (OR for abstinence = 1.31, p = 0.034); however, as OR_{1} increases, the estimated treatment effect becomes less significant (all ps > 0.05 for OR_{1} ≥ 2), indicating that this treatment effect is sensitive to different assumed values of OR_{1}. On the contrary, the estimated treatment effects of multiple vs. single contest are all close to 1.16 (all ps > 0.05), indicating that this treatment effect estimation was robust to different assumed values of OR_{1}. This phenomenon can be explained by the different survey missing rate between the counseling and no counseling groups, but not between the multiple and single contest groups (see the left panel in Table 2).
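The monotonicity claim can be checked directly from the imputation formula. The following short derivation is a sketch in the notation of the Methods section, using n^{22} = n^{2.}π and n^{21} = n^{2.} − n^{22}, and writing n = n^{11} + n^{12} + n^{2.} for the arm size. The symbol \hat{p} for the imputed self-report abstinence rate is introduced here for illustration only, and the strict inequalities assume Odds > 0 and n^{2.} > 0.

```latex
\[
\hat{p}(OR_1) = \frac{n^{11} + n^{21}}{n}
             = \frac{n^{11} + n^{2.}\,(1 - \pi)}{n},
\qquad
\pi = \frac{OR_1 \, Odds}{1 + OR_1 \, Odds},
\]
\[
\frac{\partial \pi}{\partial OR_1} = \frac{Odds}{(1 + OR_1 \, Odds)^2} > 0
\quad\Longrightarrow\quad
\frac{\partial \hat{p}}{\partial OR_1}
  = -\frac{n^{2.}}{n}\,\frac{Odds}{(1 + OR_1 \, Odds)^2} < 0 .
\]
\[
\text{At } OR_1 = 1:\quad 1 - \pi = \frac{n^{11}}{n^{11} + n^{12}}
\quad\Longrightarrow\quad
\hat{p}(1) = \frac{n^{11}}{n^{11} + n^{12}} .
\]
```

The last line shows why the imputed rate under OR_{1} = 1 coincides with the rate among survey respondents.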
Urine-verified abstinence outcome
The results obtained from the imputed urine-verified abstinence data are summarized in Table 4. By considering all the combinations of OR_{1} and OR_{2}, each ranging from 1 to 5 and positive infinity, we found that the abstinence rate decreases as the assumed level of dependence between missing and tobacco use, OR_{1} or OR_{2}, increases, as expected. Notice that the abstinence rates for the two studied conditions were found to be consistently higher than those of their corresponding control groups in all scenarios (i.e., the estimated treatment effects, expressed as odds ratios of abstinence, are all > 1).
As shown in the upper-left corner of Table 4, significant treatment effects were estimated for the counseling group when both OR_{1} and OR_{2} were small. Otherwise, there seemed to be no significant treatment effects for the counseling or the multiple contests groups under different combinations of OR_{1} and OR_{2}. We also found that the estimated treatment effect of counseling vs. no counseling is more sensitive to the assumed level of dependence between the survey missing and self-report abstinence, but less sensitive to the assumed level of dependence between the urine missing and urine-verified abstinence. For the estimated treatment effect of the multiple vs. single contest, we observed no obvious pattern, no matter what values were assumed for OR_{1} or OR_{2}. This can be explained by the comparable survey and urine missing rates between the two contest groups as shown in Table 2. We performed additional sensitivity analysis by assuming that survey nonrespondents would be less likely to provide urine than survey respondents (λ = 0.5). Results (shown in Additional file 1: Table S1) are consistent with the results reported above, which are based on the equal urine missing rate assumption (λ = 1).
Discussion
In many smoking cessation studies, researchers are interested in biochemically verified abstinence (e.g., urine cotinine verified abstinence). To conserve resources, it is common to only invite people who self-report abstinence to provide biochemical samples to validate self-reported abstinence. Hence, missing data can be present at either the survey completion stage or the biochemical sampling stage. The imputation approach presented in this paper takes into account this two-stage missing data challenge and allows the survey missing and biochemical sample missing to have different missing mechanisms. Our proposed imputation approach includes both the missing = smoking imputation (an extreme case of MNAR) and the MAR imputation as special cases, hence providing a more thorough sensitivity analysis result than any simple imputation method alone. The estimated effects of the treatments tested in the Enhanced Quit & Win study were sensitive to the different missing mechanisms, depending on the differential missing data patterns across treatment arms. Although the overall results were not universally impacted, these findings demonstrate that the use of one simple imputation method alone could result in misleading conclusions regarding a treatment effect estimate.
There has been a debate regarding whether treatment should be adjusted or stratified in the imputation models. Jackson et al. [16] adjusted for treatment in their imputation model since treatment was found to be associated with the missing status and predicted missing outcomes. Alternatively, in this paper, we performed imputations stratified by treatment rather than adjusting for treatment in the model [17, 18]. Although some researchers may argue that this may overestimate the treatment effect [16], it has not been demonstrated by the preponderance of evidence. Research with more data examples to investigate the difference between these two strategies is certainly warranted.
In this paper, all the imputations were performed on aggregated data. In other words, no individual-level variation has been considered. Currently, we are working on extending the proposed imputation approach for aggregated data to take into account the uncertainty in the individual probability of tobacco use as in multiple imputations. One advantage of the imputations based on aggregated data is the ease of computing, while the multiple imputations approach is expected to give more conservative results as individual-level variability is taken into account in the estimation of treatment effect. Also in this paper, we focus on the analysis of the cessation outcome at a single time point. However, with repeatedly measured outcomes, longitudinal data analysis methods for dealing with missing data could be considered [12, 19,20,21]. Note that our proposed methods are applicable to various tobacco or other substance use trials where the treatment goal is biochemically verified self-reported abstinence.
Conclusions
The proposed two-stage imputation method provides an effective sensitivity analysis tool for analyzing missing data introduced at two different stages of outcome assessment, the self-report and validation time, frequently encountered in tobacco cessation studies. Our methods are also applicable to trials studying biochemically verified abstinence from other substance use such as alcohol and recreational drugs.
Abbreviations
MAR: Missing at random
MCAR: Missing completely at random
MNAR: Missing not at random
LOCF: Last observation carried forward
BOCF: Baseline observation carried forward
EM: Expectation-maximization
OR: Odds ratio
MAPS: Motivational and problem solving counseling
References
1. Lopez AD, Collishaw NE, Piha T. A descriptive model of the cigarette epidemic in developed countries. Tob Control. 1994;3:242–7.
2. Peto R, Lopez AD, Boreham J, Thun M, Heath C Jr. Mortality from tobacco in developed countries: indirect estimation from national vital statistics. Lancet. 1992;339:1268–78.
3. Peto R, Lopez AD, Boreham J, Thun M, Heath C Jr, Doll R. Mortality from smoking worldwide. Br Med Bull. 1996;52:12–21.
4. Pirie K, Peto R, Reeves GK, Green J, Beral V, Collaborators MWS. The 21st century hazards of smoking and benefits of stopping: a prospective study of one million women in the UK. Lancet. 2013;381:133–41.
5. Delucchi KL. Methods for the analysis of binary outcome results in the presence of missing data. J Consult Clin Psychol. 1994;62:569–75.
6. Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. New York NY: Wiley; 2002.
7. Thabane L, Mbuagbaw L, Zhang S, et al. A tutorial on sensitivity analyses in clinical trials: the what, why, when and how. BMC Med Res Methodol. 2013;13:92.
8. Jackson D, White IR, Mason D, Sutton S. A general method for handling missing binary outcome data in randomized controlled trials. Addiction. 2014;109:1986–93.
9. Borland R, Balmford J, Hunt D. The effectiveness of personally tailored computer-generated letters for tobacco cessation. Addiction. 2004;99:369–77.
10. Nelson DB, Parlin MR, Fu SS, Joseph AM, An LC. Why assigning ongoing tobacco use is not necessarily a conservative approach to handling missing tobacco cessation outcomes. Nicotine Tob Res. 2009;11:77–83.
11. Blankers M, Smit ES, van der Pol P, de Vries H, Hoving C, van Laar M. The missing=smoking assumption: a fallacy in internet-based smoking cessation trials? Nicotine Tob Res. 2016;18:25–33.
12. Hedeker D, Mermelstein RJ, Demirtas H. Analysis of binary outcomes with missing data: missing=smoking, last observation carried forward, and a little multiple imputation. Addiction. 2007;102:1564–73.
13. Smolkowski K, Danaher BG, Seeley JR, Kosty DB, Severson HH. Modeling missing binary outcome data in a successful web-based smokeless tobacco cessation program. Addiction. 2010;105:1005–15.
14. Barnes SA, Larsen MD, Schroeder D, Hanson A, Decker PA. Missing data assumption and methods in a smoking cessation study. Addiction. 2010;105:431–7.
15. Thomas JL, Luo X, Bengtson J, et al. Enhancing Quit & Win contests to improve cessation among college smokers: a randomized clinical trial. Addiction. 2016;111:331–9.
16. Jackson D, Mason D, White IR, Sutton S. An exploration of the missing data mechanism in an internet based smoking cessation trial. BMC Med Res Methodol. 2012;12:157.
17. White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Statist Med. 2011;30:377–99.
18. Sullivan TR, White IR, Salter AB, Ryan P, Lee KJ. Should multiple imputation be the method of choice for handling missing data in randomized trials? Stat Methods Med Res. 2018;27:2610–26.
19. Daniels MJ, Hogan JW. Missing data in longitudinal studies. Taylor & Francis Group; 2008.
20. Demirtas H. Multiple imputation under Bayesianly smoothed pattern-mixture models for nonignorable dropout. Statist Med. 2005;24:2345–63.
21. Yang X, Shoptaw S. Assessing missing data assumptions in longitudinal studies: an example using a smoking cessation trial. Drug Alcohol Depend. 2005;77:213–25.
Acknowledgements
The authors thank the Enhanced Quit and Win study team for collecting the data and the two referees whose comments have helped to improve the manuscript substantially.
Funding
This study was supported by the Biostatistics Core of the University of Minnesota Masonic Cancer Center (funded by the National Cancer Institute 5P30CA077598) to CTL and XL, by the National Heart, Lung, and Blood Institute (5R01HL094183) to JLT, JSA, and XL, and by the Clinical and Translational Science Institute of University of Minnesota (National Center for Advancing Translational Sciences UL1TR002494). The funding body played no role in the design of the study and collection, analysis, and interpretation of data and/or in writing the manuscript.
Availability of data and materials
This is a manuscript demonstrating a novel application of a statistical method on data collected from a previous study [15]. Data requests should be addressed to JLT, the principal investigator of the Enhanced Quit & Win study.
Author information
Contributions
YZ performed all the analyses and SAS programming and co-wrote the manuscript, which was part of her dissertation when she was a master of science (MS) student at the University of Minnesota; XL developed the original idea, supervised YZ’s dissertation research, and co-wrote the manuscript; CTL and JSA were YZ’s dissertation committee members and participated in discussions; JLT was the principal investigator of the Enhanced Quit & Win study and supervised the conduct of the trial and the interpretation of the analysis results. All authors contributed to the writing and revisions of the manuscript and have read and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
The Enhanced Quit & Win study was approved by the University of Minnesota’s human subjects committee. Written informed consent was obtained from all participants in the “Quit and Win Study”.
Consent for publication
Not applicable.
Competing interests
XL is a member of the editorial board (Associate Editor) of this journal.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1:
SAS Computing Code for Analyzing Enhanced Quit & Win Data. Table S1. Summary of imputation results for urine-verified abstinence assuming different levels of association between missing and abstinence when λ = 0.5. (DOCX 58 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Zhang, Y., Luo, X., Le, C.T. et al. Analysis of self-report and biochemically verified tobacco abstinence outcomes with missing data: a sensitivity analysis using two-stage imputation. BMC Med Res Methodol 18, 170 (2018). https://doi.org/10.1186/s12874-018-0635-2
Keywords
 Abstinence outcome
 Imputation
 Missing data
 Sensitivity analysis