Handling informative dropout in longitudinal analysis of health-related quality of life: application of three approaches to data from the esophageal cancer clinical trial PRODIGE 5/ACCORD 17

Background: Health-related quality of life (HRQoL) has become a major endpoint to assess the clinical benefit of new therapeutic strategies in oncology clinical trials. Typically, HRQoL outcomes are analyzed using linear mixed models (LMMs). However, longitudinal analysis of HRQoL in the presence of missing data remains complex and unstandardized. Our objective was to compare the modeling alternatives that account for informative dropout.

Methods: We investigated three alternative methods, the selection model (SM), pattern-mixture model (PMM), and shared-parameters model (SPM), in relation to the LMM. We first compared them on the basis of methodological arguments highlighting their advantages and drawbacks. Then, we applied them to data from a randomized clinical trial that included 267 patients with advanced esophageal cancer for the analysis of four HRQoL dimensions evaluated using the European Organisation for Research and Treatment of Cancer (EORTC) QLQ-C30 questionnaire.

Results: We highlighted differences in terms of outputs, interpretation, and underlying modeling assumptions; this methodological comparison could guide the choice of method according to the context. In the application, none of the four models detected a significant difference between the two treatment arms. The estimated effect of time on HRQoL varied according to the method: for all analyzed dimensions, the PMM estimated an effect that contrasted with those estimated by the SM and SPM; the LMM estimated effects were confirmed by the SM (on two of four HRQoL dimensions) and SPM (on three of four HRQoL dimensions).

Conclusions: The PMM, SM, or SPM should be used to confirm or invalidate the results of LMM analysis when informative dropout is suspected. Of these three alternative methods, the SPM appears to be the most interesting from both theoretical and practical viewpoints.

Trial registration: This study is registered with ClinicalTrials.gov, number NCT00861094.


Keywords: Pattern-mixture model, Selection model, Shared-parameters model, Joint modeling, Health-related quality of life, Informative dropout, Cancer clinical trial

Background
Health-related quality of life (HRQoL) is often a secondary endpoint in cancer clinical trials. It is also increasingly being used as a primary or co-primary endpoint [1]. HRQoL is assessed at different time points throughout the care process (at baseline, during treatment, and during follow-up) by self-administered questionnaires composed of items assessing different HRQoL dimensions. The HRQoL outcome to be analyzed consists of longitudinal dimension-specific score data. However, the rate of completed questionnaires generally decreases over time and, in addition, some items may be missing among available questionnaires. This leads to missing data that are said to be monotone if the score is not available from a certain time point until the end of the study, and intermittent otherwise. The nature of the missing data mechanism depends on how the missingness is related to the HRQoL outcome.
Missing data are classified as missing completely at random (MCAR) if missingness is independent of the (observed or unobserved) HRQoL outcome or depends only on observed characteristics, as missing at random (MAR) if missingness additionally depends on the observed HRQoL outcome, and as missing not at random (MNAR) if missingness depends on the unobserved HRQoL outcome [2,3]. The terms informative or nonignorable are also used to refer to MNAR data. In the presence of incomplete longitudinal outcome data, the strategy of analysis should be adapted to the nature of the missing data mechanism in order to avoid biased or inaccurate results. In most studies, the missing data mechanism is not characterized, so methods used to analyze longitudinal HRQoL data in randomized clinical trials [4] are potentially inadequate.
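The practical consequence of the three mechanisms can be illustrated with a small simulation. The sketch below is only an illustration under assumed values that do not come from the trial: two visits, normally distributed scores with a true visit-2 mean of 60, and a logistic missingness model with an arbitrary coefficient. It compares the naive mean of the observed visit-2 scores with an estimate that conditions on the always-observed visit-1 score.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(mechanism, n=20000, seed=1):
    """Simulate two visits and return (naive, adjusted) estimates of the
    visit-2 mean, whose true value is 60 in this toy setting."""
    rng = random.Random(seed)
    all_y1, completers = [], []
    for _ in range(n):
        y1 = rng.gauss(60.0, 15.0)        # visit 1, always observed
        y2 = y1 + rng.gauss(0.0, 10.0)    # visit 2, possibly missing
        if mechanism == "MCAR":
            p_miss = 0.4                          # unrelated to any score
        elif mechanism == "MAR":
            p_miss = expit(-0.08 * (y1 - 60.0))   # depends on observed y1 only
        elif mechanism == "MNAR":
            p_miss = expit(-0.08 * (y2 - 60.0))   # depends on unobserved y2
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_y1.append(y1)
        if rng.random() >= p_miss:
            completers.append((y1, y2))

    # Naive estimate: mean of the observed visit-2 scores.
    mean_x = sum(x for x, _ in completers) / len(completers)
    mean_y = sum(y for _, y in completers) / len(completers)
    # Adjusted estimate: regress y2 on y1 among completers, then average the
    # predictions over all patients (this uses the observed y1 of dropouts).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in completers)
    sxx = sum((x - mean_x) ** 2 for x, _ in completers)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    adjusted = intercept + slope * sum(all_y1) / len(all_y1)
    return mean_y, adjusted


results = {m: simulate(m) for m in ("MCAR", "MAR", "MNAR")}
for name, (naive, adjusted) in results.items():
    print(f"{name}: naive mean = {naive:.1f}, adjusted mean = {adjusted:.1f}")
```

In this toy setting the naive mean is distorted under both MAR and MNAR, but conditioning on the observed visit-1 score removes the distortion only under MAR, which mirrors why analyses that use all observed data are valid under MAR and not under MNAR.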
Linear mixed models (LMMs) are powerful and flexible models for the analysis of repeated measures of a continuous outcome. This class of model is classically used to compare changes in HRQoL over time between experimental and control arms in cancer clinical trials [5,6]. However, the occurrence of intermittent or monotone missing data could compromise the longitudinal analysis of HRQoL data, leading to a loss of statistical power at best and to biased estimates at worst. This is the case, for instance, in palliative or advanced disease situations, where missing data could be related to the health status of patients who are too ill to complete their HRQoL questionnaires [7,8]. Likelihood-based methods that use all the observed information (as in LMMs) are valid when the missing data are MAR [9]. However, in the presence of informative missing data (i.e., MNAR), the longitudinal HRQoL outcome and the missing data mechanism have to be modeled jointly to prevent biased estimation [10,11].
Since the end of the 1980s, different models have been proposed for the joint distribution of the longitudinal outcome and the missingness process. More attention has been devoted to monotone missing data, corresponding to dropout, which is more likely to be informative and generally easier to handle. Pattern-mixture models (PMMs) and selection models (SMs) are based on the two possible decompositions of the joint distribution [12,13]. In recent years, the joint models or shared-parameter models (SPMs), where the association between the two processes is captured by shared parameters, have received much attention [14,15]. In clinical trials, SPMs are mostly used to jointly analyze a longitudinal outcome and overall survival. They can also be used to take into account and study the relationship between a longitudinal HRQoL outcome and time-to-dropout [16].
There are relatively few publications that compare these three approaches from the perspective of their practical application to clinical trial data [17-19]. Such comparisons are needed to further our understanding of the use and interpretation of these models; insufficient knowledge about them could explain why they are rarely used in clinical trials.
The objectives of this paper were to compare the PMM, the SM, and the SPM with each other and then to compare these models with the LMM, for the analysis of an HRQoL outcome in the presence of informative dropout. First, we compare the models from a methodological point of view, highlighting the advantages and drawbacks of each one. Then, we illustrate and interrogate them in the longitudinal analysis of four HRQoL dimensions in patients with advanced esophageal cancer from the PRODIGE 5/ACCORD 17 clinical trial.

Methods
We highlighted the differences between the PMM, SM, and SPM in handling informative dropout when analyzing a longitudinal HRQoL outcome and interpreted their results in relation to those from the LMM. For this purpose, we first made a methodological comparison of the four models by highlighting their differences in terms of underlying modeling assumptions and interpretation. The advantages and drawbacks of each model are then illustrated through an analysis of data from the PRODIGE 5/ACCORD 17 clinical trial (NCT00861094).

Illustrative clinical trial

Study design
In the PRODIGE 5/ACCORD 17 clinical trial, 267 patients with advanced esophageal cancer were randomly assigned to either an experimental arm (N = 134) receiving a FOLFOX (fluorouracil plus leucovorin and oxaliplatin) regimen or a control arm (N = 133) receiving a fluorouracil and cisplatin regimen as part of chemoradiotherapy treatment. The primary endpoint was progression-free survival and one of the secondary endpoints was HRQoL. The statistical analysis of the primary endpoint revealed no significant difference between the two treatment arms. More details concerning inclusion and exclusion criteria, study design, protocol treatment, HRQoL assessment, and compliance have been previously published [20,21].

HRQoL assessment
HRQoL was prospectively assessed using the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Core 30 (QLQ-C30, version 3.0) [22] at baseline, during treatment (months 1.25 and 3), at month 4, and after treatment during follow-up (at months 6, 12, 24, and 36). This self-administered questionnaire contains 30 items evaluating five functional scales, nine symptomatic scales/items, and one global health status/HRQoL scale. Standardized scores from 0 to 100 can be calculated for each scale according to the scoring procedure recommended by the EORTC [23]. A high score for the functional and global health status scales corresponds to good functional capacities and reflects a high level of HRQoL, whereas a high score for the symptom scales corresponds to a high level of symptoms and reflects a poor HRQoL. Four dimensions were pre-specified in the protocol as targeted dimensions: global health status/HRQoL (QL scale), physical functioning (PF scale), pain (PA scale), and fatigue (FA scale). In what follows, we will consider only these four dimensions (or scales).
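As an illustration of how a 0 to 100 scale score is obtained from item responses, the sketch below follows the linear transformation usually described for the EORTC QLQ-C30: the raw score is the mean of the answered items, rescaled by the item range (3 for items answered on a 1 to 4 scale, 6 for the two global health status items answered on a 1 to 7 scale), and reversed for functional scales. The rule that a scale is scored only when at least half of its items are answered is our assumption about the scoring procedure in [23], not something stated in this article; the function name and example responses are hypothetical.

```python
def scale_score(responses, item_range, functional):
    """Return the 0-100 scale score, or None if the scale cannot be scored.

    responses  : item answers for one scale, with None for a missing item
    item_range : 3 for 1-4 items, 6 for 1-7 items
    functional : True for functional scales (the score is reversed so that
                 a high score means good functioning)
    """
    answered = [r for r in responses if r is not None]
    if len(answered) * 2 < len(responses):
        return None  # assumed rule: fewer than half of the items answered
    raw = sum(answered) / len(answered)
    scaled = (raw - 1.0) / item_range
    if functional:
        scaled = 1.0 - scaled
    return scaled * 100.0


# Hypothetical responses for one questionnaire.
pf = scale_score([1, 2, 1, 1, 2], item_range=3, functional=True)    # physical functioning
ql = scale_score([5, 6], item_range=6, functional=False)            # global health status
pa = scale_score([None, 3], item_range=3, functional=False)         # pain, one item missing
fa = scale_score([None, None, 2], item_range=3, functional=False)   # fatigue, two items missing
```

Under the assumed half rule, the pain scale is still scored from its single answered item, whereas the fatigue scale is left missing; this is how missing items within an available questionnaire can translate into a missing dimension score.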

Statistical analysis
All analyses were performed in the evaluable intent-to-treat population: a patient was considered evaluable for a given scale when the score was available at least once during the study, whatever the corresponding measurement time. We used the four models described below in Eqs. (1), (3), (5) and (8) to analyze the longitudinal HRQoL score data conditionally on baseline covariates in the presence of potentially informative monotone missing data (dropout).
We first used the LMM that is valid under the MAR assumption. We then modeled the joint distribution of the longitudinal outcome and the dropout process using three models that are valid under the MNAR assumption: the SM and the PMM, which are based on the two existing and converse factorizations of the joint distribution, and the SPM, where the longitudinal outcome and the time-to-dropout are linked through a function of the random effects. In these three models, we used the LMM presented below as the sub-model for the HRQoL score.

Linear mixed model (LMM)
We modeled the HRQoL score trajectories by a random coefficients LMM. The HRQoL score for patient $i$ at time $t_j$ of the $j$-th planned visit was expressed as follows:

$$Y_i(t_j) = \beta_0 + b_{0i} + (\beta_1 + b_{1i})\, t_j + \beta_2\, \mathrm{arm}_i \times t_j + \varepsilon_i(t_j) \qquad (1)$$

where $\mathrm{arm}_i$ is the arm indicator variable for patient $i$ (0: control, 1: experimental), $\beta_0$ is the intercept, $\beta_1$ the slope in the control arm, and $\beta_2$ the interaction effect corresponding to the difference between the slopes in the experimental and control arms. With this parametrization, the quantity $\beta_1 + \beta_2$ represents the slope in the experimental arm. The random intercept $b_{0i}$ and the random slope $b_{1i}$ take into account the repeated measurements on the same patient and correspond to the individual deviations from the fixed intercept and slope, respectively. They are assumed to be normally distributed with a mean of 0 and an unconstrained 2 × 2 covariance matrix to be estimated. The error term $\varepsilon_i(t_j)$ is also assumed to be normally distributed with a mean of 0 and a variance to be estimated. In what follows, $Y_i$, $X_i$, and $D_i$ denote respectively the vector of longitudinal HRQoL scores, the vector of covariates, and the dropout variable for patient $i$.
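To make the parametrization concrete, the following sketch evaluates the mean trajectory and draws one simulated patient from a random intercept and slope model of this form. All numerical values (fixed effects, standard deviations, correlation) are hypothetical and chosen only for illustration; the visit schedule is the one described for the trial.

```python
import math
import random

VISIT_MONTHS = [0.0, 1.25, 3.0, 4.0, 6.0, 12.0, 24.0, 36.0]  # schedule from the text


def mean_score(t, arm, beta0, beta1, beta2):
    """Population mean at month t: beta0 + beta1*t + beta2*arm*t."""
    return beta0 + beta1 * t + beta2 * arm * t


def simulate_patient(arm, beta, sd_b0, sd_b1, rho, sd_eps, rng):
    """Draw one patient's scores at the planned visits.

    The random intercept b0 and random slope b1 are bivariate normal with
    correlation rho; simulated scores are not truncated to the 0-100 range.
    """
    beta0, beta1, beta2 = beta
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    b0 = sd_b0 * z0
    b1 = sd_b1 * (rho * z0 + math.sqrt(1.0 - rho ** 2) * z1)
    return [
        beta0 + b0 + (beta1 + b1) * t + beta2 * arm * t + rng.gauss(0.0, sd_eps)
        for t in VISIT_MONTHS
    ]


beta = (60.0, 0.5, -0.2)                 # hypothetical (beta0, beta1, beta2)
slope_control = beta[1]                  # slope in the control arm
slope_experimental = beta[1] + beta[2]   # slope in the experimental arm

rng = random.Random(42)
scores = simulate_patient(1, beta, sd_b0=12.0, sd_b1=0.8, rho=-0.3, sd_eps=10.0, rng=rng)
```

With these hypothetical values, the experimental arm gains 0.3 points per month on average against 0.5 in the control arm, the difference being the interaction effect.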

Selection model (SM)
The SM is based on the decomposition of the joint distribution into the marginal distribution of the HRQoL score and the conditional distribution of the dropout variable given the HRQoL score:

$$f(Y_i, D_i \mid X_i) = f(Y_i \mid X_i)\, f(D_i \mid Y_i, X_i) \qquad (2)$$

where the dropout variable $D_i$ corresponds to the visit at which the last available HRQoL assessment took place, i.e., before patient $i$ dropped out. In cases of no dropout, $D_i = J$, where $J$ is the number of planned visits. We modeled the HRQoL score using the LMM in Eq. (1). We modeled the conditional probability of dropout at each visit $j = 1, \ldots, J$ by the logistic regression proposed by Diggle and Kenward [24]:

$$\mathrm{logit}\, P\big(D_i = j \mid D_i \geq j,\, Y_i(t_j),\, Y_i(t_{j+1})\big) = \psi_0 + \psi_1\, Y_i(t_j) + \psi_2\, Y_i(t_{j+1}) \qquad (3)$$

The dropout probability is allowed to depend on the last (observed) HRQoL score $Y_i(t_j)$ and the current (unobserved) HRQoL score $Y_i(t_{j+1})$. A non-zero parameter $\psi_1$ would be in favor of the MAR assumption and a non-zero parameter $\psi_2$ in favor of the MNAR assumption (informative dropout). If only the $\psi_0$ parameter is non-zero, the dropout can be considered to be independent of the HRQoL score (MCAR assumption).
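The role of the three coefficients can be sketched as follows. This is only an evaluation of a logistic dropout model of the Diggle and Kenward type for given score values, not the likelihood-based estimation, which additionally requires integrating over the unobserved score. The values of psi0 and psi1 are hypothetical; psi2 = 0.107 is the estimate reported for the PF scale in the Results.

```python
import math


def dropout_probability(y_last, y_next, psi0, psi1, psi2):
    """Probability of dropping out after the visit with score y_last,
    given the (unobserved) score y_next at the following visit."""
    eta = psi0 + psi1 * y_last + psi2 * y_next
    return 1.0 / (1.0 + math.exp(-eta))


def implied_mechanism(psi1, psi2):
    """Mechanism implied by which coefficients are non-zero.
    In practice this is assessed with a statistical test, not an exact comparison."""
    if psi2 != 0.0:
        return "MNAR"
    if psi1 != 0.0:
        return "MAR"
    return "MCAR"


PSI0, PSI1 = -3.0, -0.05   # hypothetical values
PSI2_PF = 0.107            # reported SM estimate for the PF scale

p_low = dropout_probability(50.0, 40.0, PSI0, PSI1, PSI2_PF)
p_high = dropout_probability(50.0, 60.0, PSI0, PSI1, PSI2_PF)

# Odds ratio of dropout for a 10-point higher unobserved score.
odds_ratio_10_points = math.exp(10.0 * PSI2_PF)
```

A positive psi2 means that the dropout probability increases with the unobserved score, which for a functional scale such as PF corresponds to a higher level of HRQoL.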

Pattern-mixture model (PMM)
The PMM is based on the other possible decomposition of the joint distribution, that is, the decomposition into the marginal distribution of the dropout variable and the conditional distribution of the HRQoL score given the dropout variable:

$$f(Y_i, D_i \mid X_i) = f(D_i \mid X_i)\, f(Y_i \mid D_i, X_i) \qquad (4)$$

where the dropout variable corresponds to the pattern of missing data: $D_i = k$, $k = 1, \ldots, K$, where $K$ is the number of possible patterns. In the simplest case, the variable is defined as a dropout indicator ($K = 2$); in the most complex case, there are as many patterns as possible dropout times: $D_i = k$, $k = 1, \ldots, J$, where $J$ is the number of planned visits. In our application, we classified a patient as belonging to a certain pattern when she/he dropped out within a specific time interval covering one or several visits.
In the PMM, a multinomial distribution is assumed for the dropout probability, meaning that the probability of belonging to pattern $k$ is simply estimated by the proportion $\pi_k$ of patients belonging to pattern $k$.
We modeled the conditional HRQoL score trajectory using an LMM similar to the LMM in Eq. (1) in each pattern $k$:

$$Y_i(t_j) \mid (D_i = k) = \beta_0^k + b_{0i} + (\beta_1^k + b_{1i})\, t_j + \beta_2^k\, \mathrm{arm}_i \times t_j + \varepsilon_i(t_j) \qquad (5)$$

Note that in the PMM approach, the fixed effects differ according to the dropout pattern. The following formula allows estimates to be obtained for the marginal distribution of the HRQoL score (irrespective of the pattern):

$$\beta_l = \sum_{k=1}^{K} \pi_k\, \beta_l^k, \quad l = 0, 1, 2 \qquad (6)$$

It corresponds to a weighted sum of the pattern-specific parameters. Confidence intervals can then be calculated using the delta method [25].
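The weighted sum and a delta-method standard error can be sketched as below. This is one common form of the calculation and not necessarily the authors' implementation: it assumes that the pattern-specific estimates are independent of each other and of the estimated proportions, and that the proportions have a multinomial covariance. The pattern counts are those reported for the QL dimension; the pattern-specific slopes and standard errors are hypothetical.

```python
import math


def pmm_marginal(counts, beta_k, se_k):
    """Marginal estimate sum_k pi_k * beta_k with a delta-method standard error.

    counts : number of patients in each dropout pattern
    beta_k : pattern-specific estimates of one fixed effect
    se_k   : their standard errors (assumed independent across patterns)
    """
    n = sum(counts)
    pi = [c / n for c in counts]
    estimate = sum(p * b for p, b in zip(pi, beta_k))
    # Uncertainty coming from the pattern-specific estimates.
    var_from_beta = sum((p * s) ** 2 for p, s in zip(pi, se_k))
    # Uncertainty coming from the estimated proportions, using the
    # multinomial covariance (diag(pi) - pi pi') / n.
    var_from_pi = (sum(p * b * b for p, b in zip(pi, beta_k)) - estimate ** 2) / n
    se = math.sqrt(var_from_beta + var_from_pi)
    return estimate, se, (estimate - 1.96 * se, estimate + 1.96 * se)


counts = [89, 70, 58, 35]        # QL dimension, 252 evaluable patients
slopes = [-3.0, 0.2, 0.4, 0.5]   # hypothetical pattern-specific slopes
ses = [0.9, 0.3, 0.25, 0.2]      # hypothetical standard errors

estimate, se, ci = pmm_marginal(counts, slopes, ses)
```

With these hypothetical inputs, the first pattern, which has the largest weight and the steepest slope, dominates the marginal slope even though the three other patterns have positive slopes.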

Shared-parameter model (SPM)
The SPM captures the association between the time-to-dropout and the longitudinal HRQoL outcome through shared parameters that include the random effects $b_i$, so that the HRQoL score and the dropout variable are assumed to be conditionally independent given the random effects:

$$f(Y_i, D_i \mid b_i, X_i) = f(Y_i \mid b_i, X_i)\, f(D_i \mid b_i, X_i) \qquad (7)$$

where the dropout variable $D_i$ corresponds to a time-to-dropout variable. In our application, dropout is not related to an event occurring at any time but corresponds to nonresponse after a certain visit. Thus, we defined $D_i$ as the delay between inclusion and the last visit at which an HRQoL assessment occurred. We modeled the HRQoL score using the LMM in Eq. (1). We modeled the risk of dropout at time $t$ using a Cox-type survival model.
In the SPM, the association between the HRQoL score and dropout is modeled by including a function of the variables and parameters from the model for $Y_i$ as a time-dependent variable in the survival model. We used the current value parametrization, which means that the time-dependent variable corresponded to the true current HRQoL score value:

$$m_i(t) = \beta_0 + b_{0i} + (\beta_1 + b_{1i})\, t + \beta_2\, \mathrm{arm}_i \times t$$

More precisely, we used the following model for $D_i$:

$$\lambda_i(t) = \lambda_0(t)\, \exp\{\gamma\, \mathrm{arm}_i + \alpha\, m_i(t)\} \qquad (8)$$

where $\lambda_0$ is the baseline hazard function, $\gamma$ denotes the arm effect on the instantaneous risk of dropout, and $\alpha$ is the parameter that quantifies the association between risk of dropout and true current HRQoL score.
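The current value parametrization can be sketched as a function that evaluates the error-free trajectory of a patient and plugs it into a proportional hazards expression. The sketch only evaluates the hazard for given parameter values; it does not fit the joint model. The fixed effects, the arm effect, the constant baseline hazard, and the random effects are hypothetical, while alpha = -0.015 is the association estimate reported for the PF scale in the Results.

```python
import math


def current_value(t, arm, beta, b):
    """True (error-free) score of a patient at time t."""
    beta0, beta1, beta2 = beta
    b0, b1 = b
    return beta0 + b0 + (beta1 + b1) * t + beta2 * arm * t


def dropout_hazard(t, arm, beta, b, gamma, alpha, baseline_hazard):
    """Instantaneous risk of dropout at time t under the current value
    parametrization: baseline(t) * exp(gamma*arm + alpha*m_i(t))."""
    m = current_value(t, arm, beta, b)
    return baseline_hazard(t) * math.exp(gamma * arm + alpha * m)


def constant_baseline(t):
    """Hypothetical constant baseline hazard, for illustration only."""
    return 0.05


BETA = (70.0, -0.2, 0.0)   # hypothetical fixed effects
GAMMA = 0.1                # hypothetical arm effect
ALPHA_PF = -0.015          # reported association estimate for the PF scale

# Two control-arm patients whose true scores differ by 10 points at every time.
h_ref = dropout_hazard(6.0, 0, BETA, (0.0, 0.0), GAMMA, ALPHA_PF, constant_baseline)
h_low = dropout_hazard(6.0, 0, BETA, (-10.0, 0.0), GAMMA, ALPHA_PF, constant_baseline)
hazard_ratio = h_low / h_ref
```

Because the baseline hazard and the arm effect cancel in the ratio, the comparison between the two patients depends only on alpha and on the 10-point difference in their current values.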

Statistical software
We fitted the four models to the PRODIGE 5/ACCORD 17 data using the R software (code available on request). For LMM estimation, we used the restricted maximum likelihood (REML) method from the R package nlme [26]. The SM was not available in standard statistical software and required sophisticated programming: the Diggle and Kenward model involves marginalization over the unobserved outcomes, and computing the likelihood requires evaluating integrals, which we approximated using the Romberg numerical algorithm. We implemented a maximum likelihood estimation procedure based on a Newton-type algorithm. Applying the PMM required fitting an LMM with indicator variables for the pattern. We then combined the PMM estimates following Eq. (6) to obtain marginal estimates and implemented a delta method to obtain their confidence intervals. For the SPM, we used the R package JM [27], assuming a piecewise-constant baseline hazard $\lambda_0$ with seven intervals (six internal knots placed at months 1.25, 3, 4, 6, 12, and 24) and using the pseudo-adaptive Gauss-Hermite method with nine quadrature points to approximate the integrals over the random effects.

Table 1 compares the four approaches (LMM, SM, PMM, and SPM) from a methodological point of view.
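A piecewise-constant baseline hazard with the six knots described above can be sketched in a few lines. This is an illustration written in Python rather than the R code used for the analysis; the seven hazard levels are hypothetical, and the sketch covers the baseline only (no covariates and no random effects). Assigning a time that falls exactly on a knot to the earlier interval is a convention chosen here.

```python
import bisect
import math

KNOTS = [1.25, 3.0, 4.0, 6.0, 12.0, 24.0]   # internal knots (months) from the text


def baseline_hazard(t, xi):
    """Hazard level of the interval containing t (seven intervals)."""
    return xi[bisect.bisect_left(KNOTS, t)]


def cumulative_baseline_hazard(t, xi):
    """Integral of the piecewise-constant baseline hazard from 0 to t."""
    edges = [0.0] + KNOTS + [float("inf")]
    total = 0.0
    for q in range(len(xi)):
        lower, upper = edges[q], edges[q + 1]
        if t <= lower:
            break
        total += xi[q] * (min(t, upper) - lower)
    return total


# Hypothetical hazard levels, decreasing over time.
XI = [0.10, 0.08, 0.06, 0.05, 0.03, 0.02, 0.01]

h_month_3 = cumulative_baseline_hazard(3.0, XI)
h_month_36 = cumulative_baseline_hazard(36.0, XI)
still_in_study_at_36 = math.exp(-h_month_36)   # baseline probability of no dropout by month 36
```

Placing the knots at the planned assessment times lets the baseline risk of dropout take a different level between each pair of consecutive visits.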

Methodological comparison
In cases of non-informative dropout (MAR assumption), the likelihood-based LMM that uses all observed data provides valid results; in cases of informative dropout (MNAR assumption), the risk of dropout needs to be modeled using one of the three other approaches.
The SM explains the probability of dropout by a logistic regression; the PMM estimates the probability of belonging to a certain pattern of dropout with a multinomial distribution; the SPM uses a survival model for the time-to-dropout. The SM and PMM suppose that dropout occurs at the discrete assessment times of the HRQoL. By contrast, the SPM treats the time variable as continuous, making it possible to take into account the fact that the dropout could arise at any time during the study.
The fixed parameters $\beta_0$, $\beta_1$, and $\beta_2$ characterizing the mean HRQoL score trajectories are directly estimated using the LMM, SM, and SPM, or obtained indirectly by extrapolation using the PMM. More precisely, the PMM estimates the HRQoL score trajectory parameters at the level of each pattern $k$; afterwards, marginal estimates can be calculated as weighted averages using the proportion $\pi_k$ of patients in each dropout pattern. Note that this calculation implicitly extrapolates the HRQoL score trajectories beyond the dropout. Thus, all models can be used to graphically represent the mean HRQoL score over time according to treatment arm, directly (LMM, SM, SPM) or indirectly (PMM). The PMM provides complementary graphs specific to the dropout pattern, which can be useful to understand and visualize how the risk of dropout is linked to the HRQoL. The SPM allows a graphical representation of the risk of dropout over time. The informative nature of the dropout can also be tested using additional parameters of the SM or SPM: the $\psi_2$ coefficient in the logistic regression of the SM indicates how the probability that the HRQoL score is missing at a certain time depends on the missing value at this time, while the $\alpha$ coefficient in the Cox regression of the SPM indicates how the instantaneous risk of dropout at any time is associated with the current HRQoL score.
Nevertheless, the models used to study the evolution of HRQoL scores in the presence of informative dropout require additional assumptions that are untestable on the basis of the observed data. We have already mentioned extrapolating the HRQoL trajectories beyond the dropout in the PMM. The SM is based on the assumption of a normal distribution of the complete (i.e., observed and unobserved) HRQoL score variable. The SPM assumes independence between the longitudinal outcome and dropout process conditionally on the random effects.
The estimates of each model can be obtained using usual statistical software (including R, SAS, and Stata). Specific software has already been developed for LMM and SPM. However, applying the SM and the PMM requires a programming effort. In particular, applying the SM requires implementation and maximization of the likelihood function.

In fact, the proportion of available scores for scales QL, PF, PA, and FA decreased over time, mostly because of monotone missing data that can be attributed to dropouts (see Fig. 1). For example, for the QL scale, 16/130 patients (12%) in the experimental arm and 17/122 patients (14%) in the standard arm dropped out after the baseline visit (V0, baseline); at the last scheduled visit (V7, month 36), 125/130 patients (96%) in the experimental arm and 115/122 patients (94%) in the standard arm had dropped out (i.e., only 5/130 (4%) and 7/122 (6%) patients completed the questionnaire or the items associated with the QL scale until V7). The distribution of the dropouts seemed homogeneous in both treatment arms, regardless of the dimension. The compliance in completing the entire questionnaire was high at baseline (89 and 90% in experimental and standard regimen arms, respectively), then reduced during treatment and follow-up. Some missing items led to a lower compliance for dimension QL than for the others (for example, at baseline: 83% for QL vs. 89% for PF and 88% for PA and FA in the experimental regimen arm, and 86% for QL vs. 90% for PF, PA and FA in the standard regimen arm) (see Supplementary Figure 1).

Definition of the patterns for the PMM approach
We defined four dropout patterns that were clinically pertinent, well balanced in size, and contained a reasonable number of patients each (see Fig. 1).
The first pattern grouped the patients who dropped out before visit V3 (last HRQoL measurement at V0, V1, or V2), that is, during or just after the period of radiochemotherapy and chemotherapy treatment. The patients who dropped out between V3 and V5 (last measurement at V3 or V4) formed the second pattern, and those who dropped out between V5 and V6 (last measurement at V5) the third pattern. The last pattern grouped the patients who dropped out between V6 and V7 (last measurement at V6) and the patients who did not drop out. For the QL dimension for example, the 252 evaluable patients were distributed as follows: 89/252 ($\pi_1$ = 35%), 70/252 ($\pi_2$ = 28%), 58/252 ($\pi_3$ = 23%), and 35/252 ($\pi_4$ = 14%) in the four respective patterns (for the other dimensions, see Fig. 1).
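The pattern definition above amounts to a simple mapping from the last visit with an available score to one of four patterns, which can be sketched as follows. The function name and the integer coding of visits (0 for V0 up to 7 for V7) are illustrative choices; the counts are those reported for the QL dimension.

```python
def dropout_pattern(last_visit):
    """Map the index of the last visit with an available score
    (0 for V0, ..., 7 for V7) to the dropout pattern 1-4."""
    if not 0 <= last_visit <= 7:
        raise ValueError("last_visit must be between 0 (V0) and 7 (V7)")
    if last_visit <= 2:
        return 1   # last measurement at V0, V1, or V2
    if last_visit <= 4:
        return 2   # last measurement at V3 or V4
    if last_visit == 5:
        return 3   # last measurement at V5
    return 4       # last measurement at V6, or no dropout (V7)


# Pattern sizes reported for the QL dimension.
ql_counts = {1: 89, 2: 70, 3: 58, 4: 35}
n_evaluable = sum(ql_counts.values())
ql_proportions = {k: c / n_evaluable for k, c in ql_counts.items()}
ql_percentages = {k: round(100 * p) for k, p in ql_proportions.items()}
```

The resulting proportions are the weights used when the pattern-specific estimates are combined into marginal estimates.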
Table 1 Methodological comparison of the four approaches (outputs and interpretation)
LMM
  Outputs: fixed effects ($\beta_0$, $\beta_1$, and $\beta_2$)
  Interpretation: improvement/deterioration of the HRQoL
SM
  Outputs: fixed effects ($\beta_0$, $\beta_1$, and $\beta_2$); logistic regression coefficients ($\psi_0$, $\psi_1$, and $\psi_2$)
  Interpretation: improvement/deterioration of the HRQoL; testing the MNAR assumption: a non-null $\psi_2$ when the probability of dropout is associated with the unobserved Y
PMM
  Outputs: (fixed effects over all patterns ($\beta_0$, $\beta_1$, and $\beta_2$)); fixed effects in each pattern $k$ ($\beta_0^k$, $\beta_1^k$, and $\beta_2^k$); proportion in each pattern ($\pi_k$)
  Interpretation: (improvement/deterioration of the HRQoL); improvement/deterioration of the HRQoL in each dropout pattern
SPM
  Outputs: fixed effects ($\beta_0$, $\beta_1$, and $\beta_2$); association parameter ($\alpha$); effect of arm on instantaneous risk of dropout ($\gamma$)
  Interpretation: improvement/deterioration of the HRQoL; risk of dropout over time; testing the MNAR assumption: a non-null $\alpha$ when the instantaneous risk of dropout is associated with the current value of Y

The results of the longitudinal analysis of the QL, PF, PA, and FA scales of the EORTC QLQ-C30 using the four previously described approaches are summarized in Table 2 (estimates, 95% confidence intervals, and associated p-values of the Wald test) and presented graphically. The LMM showed a significant time effect for three of the four dimensions. More precisely, this model showed an increase in scale QL ($\beta_1 = 0.513$; p < 0.001) and a decrease in scales PA ($\beta_1 = -0.472$; p = 0.008) and FA ($\beta_1 = -0.514$; p = 0.003), reflecting a better level of HRQoL.
The SM confirmed or contradicted these results, depending on whether an association with the probability of dropout was detected or not. The SM and LMM estimated similar effects of time in the QL and PA scales, where the dropout seemed to be ignorable (non-significant $\psi_2$). However, these results were unclear because of optimization difficulties: for scale QL, a numerical issue when inverting the Hessian matrix made it impossible to estimate the standard error of $\beta_1$, and therefore its confidence interval and associated p-value were not available; in view of the results for the PA scale, we could question whether or not the algorithm converged to a local minimum. When the SM detected an informative dropout (PF: $\psi_2 = 0.107$; p < 0.001 and FA: $\psi_2 = -0.097$; p < 0.001), the estimated effect of time was larger than that estimated by the LMM, with a substantial increase in PF (SM: … LMM: $\beta_1 = -0.514$; p = 0.003). However, the values of $\psi_2$ were counterintuitive, suggesting that the probability of dropout increased with an unobserved score value that corresponded to a higher level of HRQoL.
The marginal effect of time derived from the PMM estimates was ambiguous for all dimensions. For scales QL and PA, the direction of the time effect (i.e., the sign of $\beta_1$) was reversed compared to the LMM and the effect was no longer significant. For the PF and FA scales, the HRQoL deterioration was aggravated compared to the LMM, with a significant decrease in PF (PMM: $\beta_1 = -2.652$; p < 0.001 vs. LMM: $\beta_1 = -0.164$; p = 0.266) and a significant increase in FA (PMM: $\beta_1 = 3.157$; p < 0.001 vs. LMM: $\beta_1 = -0.514$; p = 0.003). These are exactly the dimensions for which the SM had detected informative dropout.
We observed that the estimated effect of time in the first pattern differed greatly from those in all other patterns (see also Fig. 3, which depicts the score trajectories by pattern).
The estimates in this pattern with a maximum of three repeated measures showed poor functional capacities …

As for the treatment-by-time interaction effect, we also observed that the 95% confidence intervals for the time effect were much wider than those seen in the other three models, reflecting more uncertainty.
For scales QL and PA, the estimated effect of time in the SPM was similar to that in the LMM. No association was detected between the risk of dropout and the current HRQoL score value, which confirmed the results of a non-informative dropout already identified by the SM. In contrast with the SM, the SPM also did not detect an association between the risk of dropout and the score in the FA scale, and the estimated time effect was similar to the LMM estimate. In fact, the SPM only detected a significant association between the risk of dropout and the score in the PF scale (also found by the SM) ($\alpha = -0.015$; p = 0.006). In particular, a decrease of 10 points in the PF score corresponded to a risk of dropout multiplied by 1.16 (95% confidence interval: [1.00, 1.35]). The estimation of the time effect was impacted (SPM: $\beta_1 = -0.394$; p = 0.078 vs. LMM: $\beta_1 = -0.164$; p = 0.266). Finally, the SPM allowed a more detailed analysis of the dropout process. The baseline hazard function was high at the beginning of the study and then decreased over time for the four scales (see the $\xi_1, \ldots, \xi_7$ estimates). Besides this, the arm effect $\gamma$ in the survival model was always non-significant, which suggests that there was no difference in the risk of dropout between the treatment arms.
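The factor of 1.16 reported for the PF scale follows from the association parameter under the current value parametrization, assuming that the arm and the baseline hazard are held fixed so that they cancel in the ratio. As a worked check, writing $m_i(t)$ for the true current PF score:

```latex
\[
\frac{\lambda_0(t)\,\exp\{\gamma\,\mathrm{arm}_i + \alpha\,(m_i(t) - 10)\}}
     {\lambda_0(t)\,\exp\{\gamma\,\mathrm{arm}_i + \alpha\,m_i(t)\}}
= \exp\{\alpha \times (-10)\}
= \exp\{(-0.015) \times (-10)\}
= \exp(0.15) \approx 1.16
\]
```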
Finally, Fig. 4 depicts how the differences between the models impacted the estimated HRQoL score trajectories.
The trajectories predicted by the PMM differed from the other models, showing poor functional capacities (QL and PF) and high levels of symptoms (PA and FA). The trajectories predicted by the SM contrasted with those of the PMM, particularly for scales PF and FA. Globally, the trajectories predicted by the SPM were consistent with those of the LMM.

Discussion
Three approaches exist to model the joint distribution of a longitudinal outcome, such as a longitudinal score, and a dropout process: the SM, the PMM, and the SPM. In this article, we have compared them; firstly, from a methodological point of view, and secondly, when applied to data from the randomized clinical trial PRODIGE 5/ACCORD 17, which included 267 patients with advanced esophageal cancer. We have also compared the results of the three models with those obtained with the LMM.
All three approaches have different advantages and could be complementary. They also have different drawbacks and require assumptions that are untestable since they are based on unobserved data.
The PMM makes it possible to describe and study the HRQoL trajectories in each dropout pattern. In the application, the PMM revealed that the earlier the patients dropped out, the stronger their HRQoL deterioration. Besides this, by highlighting the different evolutions of HRQoL scores according to the dropout pattern, one can presume that the dropout process is informative. However, the PMM does not directly provide marginal estimates that would allow conclusions to be made for the whole population unless assumptions are made about the evolution of HRQoL trajectories after dropout. In our application we considered a simple PMM model with a linear HRQoL trajectory within each dropout pattern and a first pattern grouping patients with 1, 2 and 3 observations. It resulted in a direct and easy-to-implement formulation of the marginal estimates and implied that the HRQoL score evolution after dropout was extrapolated as an extension of the linear trajectories. This gave results that contradicted those obtained with the other models (the LMM, SM, and SPM) and with larger confidence intervals. Indeed, the first patterns including patients with few repeated measurements and a strong HRQoL deterioration highly influenced the marginal estimates. Note that in a more complex model, making identifying assumptions would be necessary [28]; a common strategy consists in using identifying restrictions [29]. Although unverifiable, the assumptions necessary to achieve identifiability in the PMMs and obtain marginal estimates have the advantage of being explicit.
The SM and SPM are interesting approaches because they can test the mechanism of missing data through interpretable parameters obtained from the logistic regression (SM) or the Cox model (SPM). In the application, when the dropout was detected as non-informative by the SM or the SPM, the results for the trajectories of HRQoL were similar to those of the LMM and led to the same conclusions. Both models detected an informative dropout in the PF dimension but only the SM detected an informative dropout in the FA dimension. The SPM results were consistent with the LMM results and had a coherent interpretation. In contrast, the SM results revealed that the probability of dropout increased with an unobserved score value corresponding to a higher level of HRQoL. It is possible that these unexpected results are the consequence of the strong assumption of a normal distribution of the complete (observed and unobserved) HRQoL score values. Indeed, it has been shown that the SM is particularly sensitive to this unverifiable assumption [24,30].
The SPM also makes modeling assumptions. In particular, it relies on the conditional independence between the longitudinal outcome and dropout process given the random effects. The random effects are also assumed to be normally distributed. Rizopoulos et al. showed that estimation of the parameters and standard errors could be sensitive to misspecification of the random effects distribution, especially when some patients have very few measurements (early dropout) [31]. Note that in this application, we considered that the risk of dropout was associated with the HRQoL score through its current value. Other association structures could be considered, including the current slope or the random effects alone. The SPM alone is able to take into account dropout by modeling time-to-event data. Thus, unlike the PMM and the SM, the SPM treats the time-to-dropout as continuous. In our application, we used discrete dropout times corresponding to pre-specified assessment times, but the SPM would allow researchers to take into account dropouts corresponding to clinical events such as death, which can occur at any time between the HRQoL assessment times. In contrast to the SM and the PMM, which required a programming effort, the use of the SPM was facilitated by standard statistical software [27,32-34]. Moreover, the existing programs allow for flexible models for the longitudinal outcome, more complex models for the time-to-dropout, and different association structures to capture the association between the longitudinal outcome and the time-to-dropout.
In this article, we have analyzed HRQoL data from the PRODIGE 5/ACCORD 17 clinical trial under three possible MNAR models accounting for informative dropout and under the corresponding MAR model. MNAR methods, especially the PMM, can also be used for sensitivity analysis to assess the robustness of the results [35].
This work has some limitations. The main objective was to compare MNAR models from a practical point of view, but such a comparison does not allow us to decide clearly between one model and another. A simulation study would allow a comparison based on statistical criteria, for example in cases of misspecification or when varying the proportion of missing data.
Longitudinal analysis of the HRQoL in the presence of missing data remains complex and unstandardized. Reviews and guidelines about reporting missing patient-reported outcome data in clinical trials have been published [36,37]. It is recommended that the amount of missing data in each arm is reported and that the statistical methods used to handle missing data are explicitly specified. Nevertheless, there is no consensus for analyzing such data. Indeed, there is a lack of standardization and a gap between the development of statistical methods and their use in clinical trials [38,39].

Conclusions
This article aims to facilitate the understanding and use of methods for analyzing longitudinal HRQoL data that include missing data due to dropout. Including in the clinical trial protocol a plan to collect the reasons for non-response would help to better characterize the missingness. Then, if informative dropout is suspected, we recommend using models that account for dropout, such as the SPM. In studies where no information is available on the reasons for missingness, the SPM can be used to confirm or invalidate the results of LMMs.
Additional file 1: Figure S1. Compliance in completing the EORTC QLQ-C30. Compliance in completing the entire questionnaire and for the four dimensions QL, PF, PA, and FA (ratio of the number of available questionnaires or scores to the number of expected questionnaires) at each HRQoL assessment visit (V) by treatment arm during radiochemotherapy (RT), chemotherapy (CT), and follow-up.