 Research
 Open Access
 Published:
When a joint model should be preferred over a linear mixed model for analysis of longitudinal healthrelated quality of life data in cancer clinical trials
BMC Medical Research Methodology volume 23, Article number: 36 (2023)
Abstract
Background
Patientreported outcomes such as healthrelated quality of life (HRQoL) are increasingly used as endpoints in randomized cancer clinical trials. However, the patients often drop out so that observation of the HRQoL longitudinal outcome ends prematurely, leading to monotone missing data. The patients may drop out for various reasons including occurrence of toxicities, disease progression, or may die. In case of informative dropout, the usual linear mixed model analysis will produce biased estimates. Unbiased estimates cannot be obtained unless the dropout is jointly modeled with the longitudinal outcome, for instance by using a joint model composed of a linear mixed (sub)model linked to a survival (sub)model. Our objective was to investigate in a clinical trial context the consequences of using the most frequently used linear mixed model, the random intercept and slope model, rather than its corresponding joint model.
Methods
We first illustrate and compare the models on data of patients with metastatic pancreatic cancer. We then perform a more formal comparison through a simulation study.
Results
From the application, we derived hypotheses on the situations in which biases arise and on their nature. Through the simulation study, we confirmed and complemented these hypotheses and provided general explanations of the bias mechanisms.
Conclusions
In particular, this article reveals how the linear mixed model fails in the typical situation where poor HRQoL is associated with an increased risk of dropout and the experimental treatment improves survival. Unlike the joint model, in this situation the linear mixed model will overestimate the HRQoL in both arms, but not equally, misestimating the difference between the HRQoL trajectories of the two arms to the disadvantage of the experimental arm.
Background
There is a growing interest in using patientreported outcomes (PROs) in addition to objective endpoints when assessing the benefit of new treatments or new therapeutic strategies. In cancer clinical trials, healthrelated quality of life (HRQoL) has become an almost standard secondary endpoint and is even sometimes (in palliative or supportive care) a primary or coprimary endpoint. Usually, HRQoL is assessed by a selfadministered questionnaire at different visit times during care and followup, and results in a longitudinal outcome analyzed using linear mixed models (LMMs).
However, assessment of HRQoL often ends prematurely at timepoints that differ between patients. The patients may simply stop completing the questionnaires, drop out for various reasons including occurrence of toxicities, disease progression, or die. In case of informative dropout, i.e., if the dropout is related to the HRQoL outcome, an LMM analysis will lead to biased estimates [1]. To avoid biases, the joint distribution of the HRQoL and dropout variables must be considered, for example through a joint model (JM) that consists of a linear mixed (sub)model for the longitudinal HRQoL outcome and a survival (sub)model for the time to dropout. By sharing parameters including the random effects, JMs allow for the association between a longitudinal outcome and a time to event [2]. They have been extensively used to make predictions on the occurrence of a clinical event while accounting for an endogenous timedependent covariate (the longitudinal outcome), for example predicting prostate cancer recurrence using PSA (prostatespecific antigen) measurements [3]. In clinical trials, JMs are increasingly applied using a biomarker as the longitudinal outcome. In this context, they can provide more efficient estimates of the treatment effect on the longitudinal outcome and of the direct treatment effect on survival as well as less biased estimates of the overall treatment effect on survival [4]. In addition, JMs have been applied using a HRQoL longitudinal outcome and in particular have shown that part of the survival benefit of a treatment can be masked by the negative effect that the treatment has on patients’ HRQoL [5]. Nevertheless, applications of JMs where the primary interest is in the longitudinal outcome are less frequent; among them, those using HRQoL longitudinal outcomes in clinical trial settings are rare. Indeed, there is little knowledge of the impact of taking into account or not dropout on the estimation of the parameters that characterize the HRQoL outcome trajectory; when, how, and why will the usual LMM analysis produce misleading results? Will the biases be similar in the two treatment arms or could the results affect the betweenarm comparison?
This article studies the practical implications of using an LMM to analyze a HRQoL longitudinal outcome in a randomized clinical trial where the patients may drop out.
In Section 2, we detail the models that will be compared: the random intercept and slope model, which is the most frequently used LMM in clinical trial settings, and its corresponding JM. Section 3 deals with PRODIGE 4/ACCORD 11, a randomized phase IIIII clinical trial including patients with metastatic pancreatic cancer. We apply the two models to data from this trial and derive hypotheses on the situations in which using the LMM would lead to biased estimates of the HRQoL parameters. In Section 4, we conduct a simulation study considering different scenarios to validate (or invalidate) these hypotheses and more generally to study the bias mechanisms in depth. We discuss our findings in Section 5 and conclude in Section 6.
Modeling the HRQoL longitudinal outcome
In the following, the longitudinal outcome to be considered consists of a HRQoL score and the corresponding variable is assumed to be continuous and normally distributed.
Linear mixed model (LMM)
The usual approach to analyzing longitudinal HRQoL score data consists of using an LMM. According to its general form, the HRQoL score of patient i at time t is:
where \({Y}_i^{\star }(t)\) represents the true score value at time t, β and b_{i} are the vectors of the p fixed effects and q random effects, X_{i}(t) and Z_{i}(t) are the respective design vectors of size p and q containing the covariates at time t, and ϵ_{i}(t) is the random error term at time t. It is assumed that ϵ_{i}(t) ∼ N(0, σ^{2}) and b_{i} ∼ N(0, D) where D is a q × q unstructured covariance matrix. Furthermore, the ϵ_{i}(t) are mutually independent and independent of b_{i}. Note that Y_{i}(t) is only observed at time points t_{ij}, j = 1, …, n_{i} where n_{i} is the number of HRQoL measurements for patient i, and that the t_{ij} and n_{i} can vary from a patient to another.
As it is commonly used in clinical trial settings, we have focused on the random coefficient model, or random intercept and slope model, which specifies the true HRQoL score trajectory as a linear function of time as follows:
where \(D=\left(\begin{array}{cc}{\sigma}_0^2& {\sigma}_{01}\\ {}{\sigma}_{01}& {\sigma}_1^2\end{array}\right)\). The factor arm_{i} equals 1 if patient i belongs to the experimental arm receiving the new treatment and 0 if patient i belongs to the control arm receiving the standard treatment. The fixed intercept β_{0} represents the mean score at inclusion (t =0), the fixed slope β_{1} represents the score change by unit of time in the control arm, the interaction effect β_{2} represents the difference between the slopes of the experimental and control arms, and the random intercept b_{0i} and random slope b_{1i} represent individual deviations from the fixed intercept and fixed slope, respectively. Note that β_{1} + β_{2} corresponds to the slope in the experimental arm. Note also that there is no arm effect in the model because randomization normally ensures that baseline HRQoL is similar between the two arms.
Joint model (JM)
To analyze longitudinal HRQoL score data taking into account the fact that observation ends with an event (dropout), an alternative approach consists of using a JM that links the LMM to a timetoevent model through shared parameters. In general, the latter is a proportional hazards model that includes the true current value of the longitudinal outcome as a covariate on the hazard function:
where λ_{0}(t) is the baseline hazard function, W_{i} is the vector of baseline timeindependent covariates that includes the treatment arm and possibly other prognostic factors or covariates, γ is the vector of the corresponding effects, and α is the parameter that represents the association between the risk of event and the current true value of the longitudinal outcome \({Y}_i^{\star }(t)\).
We have considered the JM where the (sub)model for the longitudinal HRQoL outcome is given by Eqs. (1) and (2) and the survival (sub)model is given by:
with γ_{1} corresponding to the direct treatment effect on the risk of event. Note that due to the presence of \({Y}_i^{\star }(t)\), the quantities given by exp{γ_{1}} and exp{α} are conditional hazard ratios (HRs) controlling for the random effects. The baseline hazard function was assumed to be piecewise constant (application of Section 3) or to follow a Weibull distribution (simulation study of Section 4).
Predicted HRQoL score trajectories
From the estimated parameters of each model, one can obtain predicted values of the HRQoL score at any time t, thus one can depict the predicted HRQoL score trajectories. The predicted HRQoL score trajectory of patient i is given by:
However, the interest often lies on the predicted mean trajectories rather than on the predicted individual trajectories, that is on plotting on the same graph the mean trajectory in the experimental arm:
and the mean trajectory in the control arm:
The PRODIGE 4/ACCORD 11 clinical trial
Description
PRODIGE4/ACCORD11 was a multicenter, randomized, phase IIIII clinical trial comparing FOLFIRINOX (combination of folinic acid, fluorouracil, irinotecan, and oxaliplatin) and gemcitabine (reference regimen) as firstline chemotherapy for patients with metastatic pancreatic cancer. Detailed inclusion and exclusion criteria, as well as study design have been previously published [6]. Inclusion criteria included a measurable metastatic pancreatic adenocarcinoma, an Eastern Cooperative Oncologic Group (ECOG) performance status score of 0 or 1, and no prior chemotherapy. The primary endpoint for the phase III analysis was overall survival, and the secondary endpoints were progressionfree survival, tumor response, safety, and HRQoL.
HRQoL assessment
HRQoL was assessed by use of the European Organization for Research and Treatment of Cancer (EORTC) QLQC30 questionnaire (version 3.0) at inclusion, then every 2 weeks during treatment, then every 2 months until progression, then every 6 months until death or end of study. The QLQC30 is a 30item selfadministered cancerspecific questionnaire composed of five functional scales (physical, role, cognitive, emotional, and social functioning), nine symptom scales (fatigue, nausea and vomiting, pain, dyspnea, insomnia, appetite loss, constipation, diarrhea, financial difficulties) and a global health status/HRQoL scale [7]. The primary endpoint for the HRQoL analysis was global health status/HRQoL domain, and the secondary endpoints were physical, role, and social functioning, and fatigue and pain. For each scale, a standardized score ranging from 0 to 100 was calculated from the item responses as recommended by the EORTC [8]. A high score for the global health status/HRQoL and a functional scale represents respectively a high HRQoL and a high level of functioning and so is associated with a good HRQoL level; conversely, a high score for a symptom scale represents a high level of symptomatology and so is associated with a poor HRQoL level.
Main findings
The main findings of the trial have been previously published [6]. All analyses were performed on the intentiontotreat (ITT) population that included 342 patients (n = 171 in the FOLFIRINOX experimental arm, n = 171 in the gemcitabine control arm). Demographic and baseline disease characteristics were similar in the two treatment arms, except for the number of measurable target lung metastases (fewer in the FOLFIRINOX arm than in the gemcitabine arm). Significant differences were found in overall survival (HR = 0.57, 95% confidence interval (CI): [0.45, 0.73], p < 0.001) and in progressionfree survival (HR = 0.47, 95% CI: [0.37; 0.59], p < 0.001) in favor of the FOLFIRINOX arm. Median overall survival was 11.1 months (95% CI: [9.0; 13.1]) in the FOLFIRINOX arm and 6.8 months (95% CI: [5.5; 7.6]) in the gemcitabine arm. Median progressionfree survival was 6.4 months (95% CI: [5.5; 7.2]) in the FOLFIRINOX arm versus 3.3 months (95% CI: [2.2; 3.6]) in the gemcitabine arm. However, more adverse events occurred in the FOLFIRINOX arm than in the gemcitabine arm. HRQoL in the two treatment arms was compared through a timetodefinitivedeterioration analysis with responder thresholds of 10 and 20 points to quantify an individual change [9]. A decreased risk of definitive deterioration in favor of the FOLFIRINOX arm was found in the six scales of interest: the global health status/HRQoL scale (HR = 2.3, p < 0.001), the physical (HR = 1.9, p =0.001), role (HR = 2.2, p < 0.001), and social (HR = 2.1, p < 0.001) functioning scales, and the fatigue (HR = 1.9, p =0.001) and pain (HR = 2.7, p < 0.001) symptom scales (values given for a 10point threshold).
Application of the LMM and JM to analyze longitudinal HRQoL data
We applied the LMM and the JM described in Section 2 to data from the PRODIGE 4/ACCORD 11 trial to analyze the HRQoL score evolution in the two treatment arms for each of the six scales of interest. Contrary to the LMM, the JM take into account the fact that death occurrence stopped the observation of the longitudinal HRQoL outcome. The analyses were performed on the 335 evaluable patients (FOLFIRINOX arm: n = 167, gemcitabine arm: n = 168) of the ITT population (i.e., with at least one HRQoL score measurement).
We used the R package nlme for the LMM (function lme) and the R package JM for the joint model (function jointModel with a piecewiseconstant baseline hazard on seven intervals and a pseudoadaptative Gauss–Hermite method with 15 quadrature points to approximate the integrals over the random effects). The main estimation results (β_{1}, β_{2}, γ_{1}, and α parameters) are summarized in Table 1 and the predicted mean score trajectories are depicted in Fig. 1. The results concerning β_{0} and the variance parameters can be found in Supplementary Table 1.
Clinical comments
For all scales except physical functioning, both models found a tendency toward improvement in the HRQoL in the control arm – that is, a score increase (\({\hat{\beta}}_1\) > 0) in the global health status/HRQoL and the functional scales and a score decrease (\({\hat{\beta}}_1\) < 0) in the symptom scales. Both models detected a significant improvement of pain symptoms (p < 10^{−3}) but only the LMM found a significant improvement of fatigue symptoms (p = 0.016).
Both models found that all dimensions of HRQoL were improved in the experimental arm versus the control arm (i.e., \({\hat{\beta}}_2\) > 0 for the global health status/HRQoL and the functional scales and \({\hat{\beta}}_2\) < 0 for the symptom scales). Both models detected that this armbytime effect was significant for the global health status/HRQoL (p = 0.013 for the LMM and p = 0.006 for the JM); only the JM detected a significant effect for the social functioning scale (p = 0.036).
For all the HRQoL dimensions, the JM detected a significant and protective (i.e., \({\hat{\gamma}}_1\) < 0) direct effect of the experimental treatment on the risk of death. We found that risk of death adjusted for the current HRQoL score was multiplied by a HR going from exp(−0.43) = 0.65 for global health status/HRQoL to exp(− 0.66) = 0.52 for physical functioning. This is consistent with the primary endpoint analysis that had found a marginal risk of death multiplied by a HR of 0.57 in the experimental arm compared with the control arm (see Section 3.3).
For all the HRQoL dimensions, the JM found that a poorer level was significantly associated with an increased risk of death. This negative association was strongest for the global health status/HRQoL scale, with an estimated value of \(\hat{\alpha}\) = − 0.029, (p < 10^{−4}), meaning that a diminution of 8.33 points, which is the difference between two adjacent possible score values in this scale, corresponds to a risk increase of exp(0.029 × 8.33) = 1.27. A consequence of this association can be observed in Fig. 1. In the control arm (dotted lines), where HRQoL over time is poorer than in the experimental arm (solid lines), the predicted mean trajectories go up to approximately 10 months (vs. 20 months in the experimental arm). Indeed, the time interval during which there are available HRQoL score data is shorter in the control arm than in the experimental arm due to earlier deaths.
Methodological comments
Compared with the LMM, the JM found that HRQoL in the control arm improved less (global health status/HRQoL: \({\hat{\beta}}_1\) = 0.31 vs. 0.63; role functioning: \({\hat{\beta}}_1\) = 0.27 vs. 0.80; fatigue: \({\hat{\beta}}_1\) = − 0.78 vs. −1.30; pain: \({\hat{\beta}}_1\) = − 1.99 vs. −2.47), deteriorated more (physical functioning: \({\hat{\beta}}_1\) = − 0.77 vs. −0.33) or deteriorated instead of improved (social functioning: \({\hat{\beta}}_1\) = − 0.35 vs. 0.11). These results suggest that the LMM would overestimate the slope parameter β_{1} in the global health status/HRQoL and functional scales and would underestimate it in the symptom scales. It is important to note that fatigue decreased nonsignificantly according to the JM while the (presumed) underestimation of the LMM made \({\hat{\beta}}_1\) cross the level of significance.
Compared with the LMM, for five out of six scales (all except for pain) the JM found a larger difference (in favor of the experimental treatment) of the HRQoL score trajectory between arms and smaller associated pvalues. This result suggests that the LMM would underestimate the armbytime interaction parameter β_{2}. In addition, the larger the direct arm effect γ_{1} was on the risk of death, the larger the difference was between the β_{2} estimates of the LMM and the JM. It should be noted that a consequence of the (presumed) underestimation of the armbytime interaction effect β_{2} by the LMM is that this model found this effect to be nonsignificant in the social functioning scale, whereas the JM found this effect to be significant.
From the previous comments, we can derive the following hypotheses:

If there is a negative association between HRQoL and the risk of event (dropout), then the LMM will overestimate (global health status/HRQoL and functional scales) or underestimate (symptomatic scales) the time effect β_{1}, leading to an overly optimistic predicted HRQoL score trajectory in the control arm.

If there is also a protective effect γ_{1} of the new treatment on the risk of event (dropout), then the LMM will underestimate (global health status/HRQoL and functional scales) or overestimate (symptomatic scales) the armbytime interaction effect β_{2}. In particular, where the HRQoL level over time is better in the experimental arm compared with the control arm, the LMM will diminish the beneficial treatment effect on HRQoL.
A simulation study is needed to validate these statements; this is the purpose of the next section.
Simulation study
We conducted a simulation study to compare both models under several scenarios. We based the design of the simulations on PRODIGE 4/ACCORD 11 and our knowledge about various clinical trials. For each scenario, we generated 1000 datasets of 500 patients randomly assigned to two treatment arms with 250 patients per arm. To compare the models, we calculated for each estimated parameter \(\hat{\beta}\) the following criteria: the mean value \(\overline{\hat{\beta}}\), the bias \(\overline{\hat{\beta}}\beta\), the relative bias \(\left(\overline{\hat{\beta}}\beta \right)/\beta\), the root mean square error (RMSE) \(\sqrt{\overline{{\left(\hat{\beta}\beta \right)}^2}}\), and the coverage rate, i.e., the proportion of samples for which the 95% CI includes the true value β.
Design
Visit times
For all patients, there were 15 scheduled visit times for HRQoL assessment at: inclusion, month 1, month 2, month 3, month 6, month 9, month 12, month 18, and then every 6 months until administrative censoring (end of study). For each patient i, we generated the individual visit times t_{ij} where j is the visit number (j = 1, …, 15). At inclusion, t_{i1} = 0 ∀ i, and, for j > 1, the t_{ij} were uniformly distributed: within ±3 days of scheduled visits in months 1 to 3, ±7 days of scheduled visits in months 6 to 12, and ± 14 days of scheduled visits at later timepoints.
Longitudinal HRQoL outcome and time to dropout
For each patient i, we generated the longitudinal HRQoL outcome following the JM defined by Eqs. (1) and (2) at the visit times t_{ij}, j = 1, …, 15. We then generated the dropout time from Eq. (4) using a Weibull distribution for the baseline hazard; that is, with λ_{0}(t) = ϕt^{ϕ − 1} exp(γ_{0}) where ϕ was the shape parameter and exp(γ_{0}) corresponded to the scale parameter (we used an exponential parametrization for the scale parameter to be consistent with the JM package that estimates γ_{0} using an intercept in the exponentiated term of the hazard function). As usual inverse transform sampling was not straightforward, in particular because of the timevarying covariate Y^{⋆}(t), we used a method implemented in the R package simsurv that allows survival times to be generated from complex models such as JMs [10].
If dropout occurred before administrative censoring, we removed all the HRQoL data for visit times after the time to dropout.
Parameter values
The choice of the parameter values was based on the estimates from the JM applied to the PRODIGE 4/ACCORD 11 trial data for the global health status/HRQoL scale while assuming a Weibull distribution to specify the baseline hazard function (main results depicted in Supplementary Table 2). In Scenario 0, all the parameter values were set to the estimates from the application rounded to the second decimal place; in Scenarios 1, 2, 3, 4, and 5, some parameter values deviated from the application (as detailed in the next section).
Scenarios
General principle
We have considered five different scenarios. The “PRODIGE 4/ACCORD 11”like scenario (Scenario 0) aimed to verify that the simulation results were consistent with the application results of Section 3.4. The other scenarios (1, 2, 3, 4, and 5) aimed to confirm and generalize the statements derived from the methodological comparison of Section 3.6.
HRQoL trajectory parameters
In all scenarios, the variance parameters were set to the rounded estimates from PRODIGE 4/ACCORD 11: σ_{0} = 15.2, σ_{1} = 2.1, ρ_{01} = − 0.4 (with ρ_{01} = σ_{01}/σ_{0}σ_{1}), σ = 13.5. In Scenario 0, all other parameters were also set to the rounded estimates from this application: β_{0} = 53.9, β_{1} = 0.3, and β_{2} = 1.2, corresponding to a HRQoL value that increased slightly over time in the control arm and substantially in the experimental arm. In the other scenarios, the mean HRQoL value at baseline was β_{0} = 50, and then HRQoL values were constant in the control arm (β_{1} = 0) and increased (β_{2} = 1.5 in Scenarios 1, 2, 3), were constant (β_{2} = 0 in Scenario 4), or decreased (β_{2} = − 1.5 in Scenario 5) in the experimental arm. The mean trajectory of the HRQoL value in each treatment arm is depicted in Fig. 2 for all scenarios.
Riskofdropout parameters
In Scenario 0, the regression parameters were set to the rounded estimates from PRODIGE 4/ACCORD 11. The arm effect was γ_{1} = − 0.4, corresponding with a reduced risk of dropout due to the direct effect of the experimental treatment, and the association parameter was α = − 0.02, meaning that an increase in the HRQoL value was associated with a reduced risk of dropout. In Scenario 1, risk of dropout was the same in both treatment arms (γ_{1} = 0) and there was no association between HRQoL and dropout (α = 0). In Scenario 2, the risk of dropout was still the same in both treatment arms (γ_{1} = 0) but the current value of HRQoL was negatively associated with the risk of dropout (α = − 0.03). In Scenarios 3, 4, and 5, HRQoL and dropout were still negatively associated (α = − 0.03) and, in addition, the experimental treatment reduced the risk of dropout (γ_{1} = − 0.7).
The Weibull parameters for the baseline hazard function were set to ϕ = 1.6 and γ_{0} = − 2.2 in Scenarios 0, 2, 3, 4, and 5 based on the PRODIGE 4/ACCORD 11 application. We replaced them with ϕ = 1.7 and γ_{0} = − 4.1 in Scenario 1, where there was no association between HRQoL and dropout (α = 0), in order to obtain an overall hazard function comparable to those of the other scenarios.
Figure 3 depicts for each scenario the hazard function in both treatment arms with a current HRQoL true value set to its theoretical mean – that is, the hazard function given by: λ_{0}(t) exp {γ_{1}arm + α(β_{0} + β_{1}t + β_{2}{arm × t})}. Note that the curves are not the same in Scenarios 3, 4, and 5, while all the parameters governing the risk of dropout are the same. This comes from different values of the HRQoL covariate since, as can be seen in Fig. 2, the HRQoL trajectory in the experimental arm varies: it increases, is constant, and decreases in Scenarios 3, 4, and 5, respectively. Depending on the scenario, the mean of the median survival times in the 1000 simulations varied from 6.2 to 9 months for the control arm and from 8.7 to 17 months for the experimental arm (see Supplementary Table 3).
Results
The results (mean, bias, relative bias, RMSE, and coverage rate) concerning the main parameters, β_{1}, β_{2}, γ_{1}, and α, are summarized in Table 2; for readability, the table also details the results (mean, bias, and relative bias) for β_{1} + β_{2}. Supplementary Table 4 details the results concerning the other parameters (intercept, variance parameters, and Weibull parameters). Figure 4 depicts the mean of the predicted mean HRQoL trajectories from the two models by treatment arm.
As can be seen in Table 2, the JM provided unbiased estimates of the main HRQoL parameters in all scenarios, in contrast with the LMM (which provided biased estimates in all scenarios other than Scenario 1). The bias and RMSE from the LMM were in general also larger than those from the JM for the other HRQoL parameters, though to a lesser extent than for β_{1} and β_{2} (see Supplementary Table 4). The JM also estimated the effects of HRQoL and treatment on the risk of dropout, α and γ_{1}, as well as the baseline hazard parameters, γ_{0} and ϕ, with almost zero biases and coverage rates close to 95% for all scenarios.
Indepth comments of the results scenario by scenario are given below.
Scenario 0
In the “PRODIGE 4/ACCORD 11”like scenario, HRQoL increased slightly over time in the control arm (slope: β_{1} = 0.3) and substantially in the experimental arm (slope: β_{1} + β_{2} = 1.5). The new treatment had a protective effect on the risk of dropout (γ_{1} = − 0.4), and an increase in the current value of HRQoL was associated with a reduced risk of dropout (α = − 0.02). The results were consistent with those found in the global health status/HRQoL scale in the application in Section 3: the LMM overestimated the slope parameter β_{1} = 0.3 (\(\overline{{\hat{\beta}}_1}\) = 0.603 compared with \(\overline{{\hat{\beta}}_1}\) = 0.297 with the JM) and underestimated the interaction parameter β_{2} = 1.2 (\(\overline{{\hat{\beta}}_2}\) = 1.141 compared with \(\overline{{\hat{\beta}}_2}\) = 1.215 with the JM). Consequently, the LMM overestimated the slope in the experimental arm (bias for β_{1} + β_{2}: 0.244) but to a lesser extent than in the control arm (bias for β_{1}: 0.303). The β_{1} coverage rate of the LMM deviated from the nominal level of 95%, unlike the JM (70.8% vs. 95.7%).
Scenario 1
Scenario 1 was the scenario of reference for subsequent Scenarios 2, 3, 4, and 5. In all these five scenarios, HRQoL in the control arm was constant over time and HRQoL in the experimental arm increased (Scenarios 1, 2, 3), was constant (Scenario 4), or decreased (Scenario 5). HRQoL was independent of risk of dropout (i.e., dropout was noninformative) in Scenario 1 (α=0), in contrast to Scenarios 2, 3, 4, and 5, so there was no need to use the JM rather than using its two linear mixed and survival submodels separately. In this scenario, the JM performed roughly as well as the LMM regarding the HRQoL parameters. The RMSEs were 0.184 (JM) and 0.183 (LMM) for β_{1}, and 0.253 (both models) for β_{2}. The JM also performed well regarding the riskofdropout parameters, with almost zero biases for γ_{1} and α.
Scenario 2
Unlike in Scenario 1, in Scenario 2 there was a negative association between the current value of HRQoL and risk of dropout (α = − 0.03). This led to an LMM mean overestimation of 0.356 for the HRQoL parameter β_{1} (=0), while the bias from the JM estimate was almost zero (0.004). The β_{1} coverage rate using the LMM was far from 95% (52.7% vs. 94.3% for the JM). For the HRQoL parameter β_{2}, the relative bias from the LMM was doubled (in absolute value) compared with that from the JM (− 0.023 vs. 0.011). However, the absolute bias from the LMM was not substantial (− 0.035). This slight underestimation of β_{2} resulted in a slope overestimation that was slightly less important in the experimental arm (bias of β_{1} + β_{2}: 0.321) than in the control arm (bias of β_{1}: 0.356). It is noticeable that the slope overestimation is less important where the risk of dropout is lower; that is, in the experimental arm (see Fig. 3). In fact, even though the new treatment had no direct effect on the risk of dropout (γ_{1} = 0), it had a protective indirect effect through higher values of HRQoL in the experimental arm than in the control arm (see Fig. 2).
Scenario 3
In Scenario 3, the LMM behaved as it did in Scenario 2 as regards the β_{1} parameter, giving a bias of 0.343. Indeed, the HRQoL trajectory in the control arm was the same in the two scenarios, as was the association effect between the current value of HRQoL and risk of dropout (α = − 0.03). In contrast with Scenario 2, in Scenario 3 the new treatment had a protective effect on the risk of dropout (γ_{1} = − 0.7). Consequently, the LMM underestimated the interaction parameter β_{2} by 0.141 on average, with a corresponding relative bias of − 0.094 (vs. 0.010 using the JM) and a coverage rate of 88.6% (vs. 93% using the JM). Notice that, in comparison with Scenario 2, a similar overestimation of the slope in the control arm β_{1} associated with a larger underestimation of β_{2} led to a smaller overestimation of the slope governing the HRQoL trajectory in the experimental arm (bias of β_{1} + β_{2}: 0.202). The lower bias of the slope β_{1} + β_{2} can be related to the decreased risk of dropout that is present in the experimental arm due to the additional treatment effect (see Fig. 3).
Scenarios 4 and 5
As expected, the LMM behaved in Scenarios 4 and 5 as it did in Scenario 3 (and 2) as regards the β_{1} parameter, since neither the HRQoL trajectory in the control arm nor the association between the current HRQoL value and risk of dropout had changed. As in Scenario 3, the new treatment was protective on the risk of dropout (γ_{1} = − 0.7 in Scenarios 3, 4, and 5) but, instead of benefiting HRQoL (β_{2} = 1.5 in Scenario 3), it had no effect in Scenario 4 (β_{2} = 0) and was deleterious in Scenario 5 (β_{2} = − 1.5). This led to ascending sizes of bias for the slope β_{1} + β_{2} in Scenarios 3, 4, and 5 (0.202, 0.254, and 0.291, respectively), which can be related to the increased risk of dropout in the experimental arm due to the indirect treatment effect through the HRQoL values. Conversely, sizes of bias for β_{2} descended in the three scenarios (− 0.141, − 0.098, and − 0.051, respectively).
Summary and further comments
The results of Scenario 0 confirmed that in the application to the PRODIGE 4/ACCORD 11 data, the LMM would have overestimated the slope parameter β_{1} and underestimated the interaction parameter β_{2} in the global health status/HRQoL scale. More generally, the results of the subsequent scenarios were consistent with the hypotheses derived from the application. In the reference Scenario 1 where there was no association between HRQoL and risk of dropout, the LMM performed well (as did the JM). In Scenarios 2, 3, 4, and 5, where a low current HRQoL value was associated with an increased risk of dropout, the LMM produced biased results for the HRQoL parameters β_{1} and β_{2}, which the JM did not. In fact, to understand the bias of β_{2}, it should be seen as the difference between the biases of the β_{1} and the β_{1} + β_{2} slopes governing the HRQoL trajectories in the control and experimental arms, respectively.
Scenarios 2, 3, 4, and 5 have shown that for a given degree of association (namely, α = − 0.03), the size of the LMM bias increases with the risk of dropout. Hence, the size of the bias varies with the factors affecting the risk of dropout, of which there were two in our simulation study: γ_{1}, the direct effect of the treatment, and \({Y}_i^{\star }(t)\), the current HRQoL value. Scenarios 2 and 3 were similar except in the case of the treatment effect γ_{1}: this was null in Scenario 2 but protective in Scenario 3. This implied a lower risk of dropout in the experimental arm in Scenario 3 and, consequently, a lower bias of the slope β_{1} + β_{2}. Similarly, in Scenario 4, nothing differentiated the two arms (which had the same constant HRQoL trajectories) apart from the effect of the treatment on dropout, γ_{1}, which was protective. This implied that the slope governing the HRQoL trajectory was less biased in the experimental arm (bias of β_{1} + β_{2}: 0.254) than in the control arm (bias of β_{1}: 0.352). Secondly, the size of the bias varied with the HRQoL itself, since \({Y}_i^{\star }(t)\) also impacted the risk of dropout. This is obvious in Scenarios 3, 4, and 5; these scenarios exhibited different HRQoL trajectories in the experimental arm but the same direct effect of the treatment on the risk of dropout. In these scenarios, the HRQoL trajectory in the experimental arm was increasing, constant, and decreasing, respectively, leading to ascending sizes of bias for the slope governing the HRQoL trajectory in this arm, β_{1} + β_{2}. Similarly, in Scenario 2 where the treatment had no direct effect on the risk of dropout, the risk of dropout was lower in the experimental arm than in the control arm because of a higher HRQoL trajectory. This led to a slope that was less biased in the experimental arm (bias for β_{1} + β_{2}: 0.321) than in the control arm (bias for β_{1}: 0.356). Finally, it can be noted that in Scenarios 2, 3, 4, and 5, where the risk of dropout in the control arm was the same (since the HRQoL score trajectories were the same and, of course, there was no treatment effect), the biases for β_{1}, the slope governing the HRQoL trajectory in the control arm, were similar (0.356, 0.343, 0.352, and 0.342, respectively).
Discussion
This article focuses on the analysis of longitudinal HRQoL data in cancer clinical trials where observation may be interrupted during treatment or followup. We have shown that as soon as the risk of dropout is associated with HRQoL, the LMM typically used for data analysis will produce biased estimates. Obtaining unbiased results cannot be achieved without jointly modeling the dropout and the HRQoL longitudinal outcome.
We first compared the LMM with a JM on HRQoL data of patients with metastatic pancreatic cancer who may drop out before end of study due to death. The JM found a significant association between HRQoL and survival for the six analyzed scales. Moreover, the two models differed in the estimation of HRQoL parameters so that the main parameters of interest, β_{1} and β_{2}, could be significant using the JM but nonsignificant using the LMM (social functioning), and vice versa (fatigue). A limitation of this application stems from the fact that death is not an ordinary dropout since the HRQoL outcome is not unobserved but cannot exist after death. Contrary to the LMM, the JM account for the dependence between the longitudinal outcome and the dropout but both models implicitly impute HRQoL data beyond dropout. Actually, if a longitudinal outcome is truncated by death rather than ordinary dropout and if the primary interest is in the longitudinal outcome rather than in the survival outcome, other kinds of models could be more relevant than JMs. In particular, the RCA (regression conditioning of being alive) models produce estimates regarding the surviving population [11, 12] but, at our knowledge, they have not been implemented in standard software. Another alternative would be stratifying the analysis by the survival time using pattern mixture models so that mean HRQoL trajectories could be plotted by groups defined by the time of death [13]. It should be noted that, because they use future survival information, pattern mixture models can be used for description but not to predict trajectories. Note that pattern mixture models are also an alternative to take into account ordinary dropout as well as selection models, but we found more advantages in using joint models [14].
In spite of the limitation we have just mentioned, the results of our application made it possible to derive two hypotheses on the bias mechanisms arising in presence of dropout, which we then confirmed and complemented by use of a simulation study.
The first hypothesis was that a negative association between HRQoL and risk of dropout would result in an overestimation of β_{1}, the slope governing the HRQoL score trajectory in the control arm. We confirmed this in the simulation study and deduced that the same assertion holds for β_{1} + β_{2}, the slope governing the HRQoL score trajectory in the experimental arm. Dropout is informative: the patients with the lowest levels of HRQoL are the most likely to drop out early, so will contribute only weakly to the likelihood function. The second hypothesis was that a protective effect of the new treatment on the risk of dropout would result in an underestimation of β_{2}, the armbytime interaction effect (slope difference between experimental and control arms). The simulation study confirmed this and also revealed that to understand why and how β_{2} is biased, the bias of the sum of β_{1} + β_{2}, rather than the bias of β_{1}, should be the focus. No bias for β_{2} means that the slopes in both arms are equally overestimated, maintaining the correct difference between the slopes. Accordingly, the underestimation of β_{2} observed in the application would be the result of a lower slope overestimation in the experimental arm than in the control arm, due to a longer time before dropout.
Indeed, in addition to the degree of association between the current value of HRQoL and risk of dropout (α parameter), the simulation study revealed that the size of the biases in the HRQoL slope estimates increases with the risk of dropout. Thus, the size of the bias depends on the factors affecting the risk of dropout; the first is the direct effect of the treatment on the risk of dropout, γ_{1}. If the treatment has a protective effect, the risk of dropout will be lower in the experimental arm than in the control arm, implying that the bias will be smaller for β_{1} + β_{2} than for β_{1}. The second factor affecting the risk of dropout is the longitudinal HRQoL outcome itself, through the variable \({Y}_i^{\star }(t)\). Consequently, if the HRQoL score trajectories are different for the two arms, the slope bias will be largest for the arm with the poorest HRQoL values. HRQoL participates here in an indirect effect of the treatment on the risk of dropout. In practice, the direct and indirect treatment effects act together, and how the slope will be biased in one arm compared with the other will in fact depend on the overall treatment effect on the risk of dropout.
In a JM, the overall treatment effect corresponds to (the logarithm of) a timevarying HR between the experimental and control arms. For two patients sharing the same random effects, the overall treatment effect in the considered JM reduces to γ_{1} + α β_{2}t, which corresponds to the sum of the direct and indirect treatment effects. This HR has a subjectspecific interpretation since the model controls for the random effects when calculating it. In other applications where the focus is on the survival submodel – typically, an application where a longitudinal biomarker serves to predict progression or death – an estimate of an overall treatment effect with a marginal interpretation could be preferred. To achieve this, van Oudenhoven et al. [15] recently proposed a method that is implemented in the JM package. However, the reason for using a JM in the present work was to obtain fair estimates of the HRQoL parameters, so we mainly focused on the linear mixed submodel.
Our findings could be extended to other JMs in many aspects. For example, the survival submodel could be adjusted on additional prognostic factors, and the linear mixed submodel could include other variables such as the treatment arm. In the latter case, the indirect treatment effect on the risk of dropout would take the form α(β_{3} + β_{2}t) where β_{3} is the baseline arm effect. Our conclusions are based on a HRQoL score for which high values are associated with a high level of HRQoL and low values are associated with a poor level of HRQoL, such as the scores from the global health status/HRQoL scale or the functional scales of the QLQC30. The same conclusions hold in the reverse situation of a scale in which high values are associated with a poor level of HRQoL and low values are associated with a high level of HRQoL, such as the score from the symptomatic scales of the QLQC30. In this case, the association parameter α would be positive instead of negative when a poor level of HRQoL is associated with an increased risk of dropout. Obviously, the present works remains valid for PROs other than HRQoL.
Finally, note that we focused on a LMM and a JM assuming a linear relationship between the HRQoL outcome and time (random coefficient model). Nevertheless, we encourage the use of models allowing for a flexible trajectory of the HRQoL outcome (for example, based on splines). Indeed, an analysis assuming a simplistic functional form for the HRQoL outcome could miss information and provide misleading results [16]. Note that the bias mechanisms that have been revealed in this article should operate similarly in case of flexible models; although the parameters would lose their interpretability, our findings could probably by extended to the resulting predicted trajectories.
Conclusion
This work suggests avoiding the usual LMM analysis of longitudinal HRQoL data in cancer clinical trials where patients may drop out. In general, dropout is associated with poor rather than good HRQoL, so that the LMM will be too optimistic in estimating the HRQoL parameters. In particular, the LMM will overestimate the slope governing the HRQoL trajectory in both arms. The direct and indirect effects of the treatment on the risk of dropout act together, potentially in opposite directions, but in general, the overall treatment effect is protective; that is, time before dropout is longer in the experimental arm. If this is the case, the LMM will also misestimate the effect of the treatment over time on HRQoL (i.e., the slope difference between experimental and control arms). More specifically, the LMM will be more optimistic for the control arm than for the experimental arm: where there is better HRQoL in the experimental arm, the LMM will diminish the beneficial treatment effect on HRQoL, and where there is poorer HRQoL in the experimental arm, it will accentuate the deleterious treatment effect on HRQoL. Such biases can be avoided using a JM and, as the JM is composed of a linear mixed submodel, the way of interpreting the HRQoL parameter results will be unchanged. On top of that, as the JM is also composed of a survival submodel, the JM will provide additional estimates and give further insight into the relationship between dropout and the longitudinal HRQoL outcome.
Availability of data and materials
The dataset from the PRODIGE 4/ACCORD 11 clinical trial is not publicly available due to confidentiality requirements. Data are however available from the main coordinator of the clinical trial, Pr Thierry Conroy, upon reasonable request, and with permissions of the study sponsor UNICANCER R&D. The R scripts to perform the data analysis and to reproduce the simulation study are available from the authors upon request.
Abbreviations
 HRQoL:

Healthrelated quality of life
 LMM:

Linear mixed model
 PRO:

Patientreported outcome
 JM:

Joint model
 HR:

Hazard ratio
 EORTC:

European Organization for Research and Treatment of Cancer
 ITT:

Intentiontotreat
 CI:

Confidence interval
 RMSE:

Root mean square error
 RB:

Relative bias
 Cov.:

Coverage rate
References
Fairclough DL, Peterson HF, Cella D, Bonomi P. Comparison of several modelbased methods for analysing incomplete quality of life data in cancer clinical trials. Stat Med. 1998;17:781–96.
Rizopoulos D. Joint models for longitudinal and timetoevent Data : with applications in R. London: Chapman and Hall/CRC; 2012.
ProustLima C, Séne M, Taylor JM, JacqminGadda H. Joint latent class models for longitudinal and timetoevent data: a review. Stat Methods Med Res. 2014;23:74–90.
Ibrahim JG, Chu H, Chen LM. Basic concepts and methods for joint models of longitudinal and survival data. JCO. 2010;28:2796–801.
Ediebah DE, GalindoGarre F, Uitdehaag BMJ, Ringash J, Reijneveld JC, Dirven L, et al. Joint modeling of longitudinal healthrelated quality of life data and survival. Qual Life Res. 2015;24:795–804.
Conroy T, Desseigne F, Ychou M, Bouché O, Guimbaud R, Bécouarn Y, et al. FOLFIRINOX versus gemcitabine for metastatic pancreatic cancer. N Engl J Med. 2011;364:1817–25.
Aaronson NK, Ahmedzai S, Bergman B, Bullinger M, Cull A, Duez NJ, et al. The European Organization for Research and Treatment of Cancer QLQC30: a qualityoflife instrument for use in international clinical trials in oncology. J Natl Cancer Inst. 1993;85:365–76.
Fayers P, Aaronson NK, Bjordal K, Groenvold M, Curran D, Bottomley A. EORTC QLQC30 Scoring Manual. 3rd ed. Brussels: European Organisation for Research and Treatment of Cancer; 2001.
GourgouBourgade S, BascoulMollevi C, Desseigne F, Ychou M, Bouché O, Guimbaud R, et al. Impact of FOLFIRINOX compared with gemcitabine on quality of life in patients with metastatic pancreatic cancer: results from the PRODIGE 4/ACCORD 11 randomized trial. J Clin Oncol. 2013;31:23–9.
Crowther MJ, Lambert PC. Simulating biologically plausible complex survival data. Stat Med. 2013;32:4118–34.
Kurland BF, Heagerty PJ. Directly parameterized regression conditioning on being alive: analysis of longitudinal data truncated by deaths. Biostatistics. 2005;6:241–58.
Kurland BF, Johnson LL, Egleston BL, Diehr PH. Longitudinal data with followup truncated by death: match the analysis method to research aims. Stat Sci. 2009;24:211–22.
Pauler DK, McCoy S, Moinpour C. Pattern mixture models for longitudinal quality of life studies in advanced stage disease. Stat Med. 2003;22:795–809.
Cuer B, Mollevi C, Anota A, Charton E, Juzyna B, Conroy T, et al. Handling informative dropout in longitudinal analysis of healthrelated quality of life: application of three approaches to data from the esophageal cancer clinical trial PRODIGE 5/ACCORD 17. BMC Med Res Methodol. 2020;20:223.
van Oudenhoven FM, Swinkels SHN, Ibrahim JG, Rizopoulos D. A marginal estimate for the overall treatment effect on a survival outcome within the joint modeling framework. Stat Med. 2020;39:4120–32.
Winter A, Cuer B, Conroy T, Juzyna B, Gourgou S, Mollevi C, et al. Flexible modeling of longitudinal healthrelated quality of life data accounting for informative dropout in a cancer clinical trial. Qual Life Res. 2022. https://doi.org/10.1007/s11136022032526.
Acknowledgements
Not applicable.
Funding
This work was supported by the “Ligue nationale contre le cancer” (“Projet de Recherche en Epidémiologie Ligue” 2019) and the “SIRIC Montpellier Cancer” (Grant INCa_Inserm_DGOS_12553).
Author information
Authors and Affiliations
Contributions
Study design: CT, CM; Statistical analysis: CT; Manuscript preparation: CT, CM, BC; Reading and approval of final manuscript: all authors.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
In this study, we take data from the PRODIGE 4/ACCORD 11 clinical trial (ClinicalTrials.gov Identifier: NCT00112658). All participants of the PRODIGE 4/ACCORD 11 trial provided written informed consent. We obtained permission for the data base access from the sponsor UNICANCER R&D.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Supplementary Table 1.
Complementary results to Table 1 regarding parameters β_{0}, σ, σ_{0}, σ_{1}, ρ_{01}. Supplementary Table 2. Main results of the JM (assuming a Weibull distribution for the baseline hazard function) fitted to the clinical trial data. Supplementary Table 3. Mean values of the median survival time in the 1000 simulations of the simulation study in each arm and in total. Supplementary Table 4. Complementary results to Table 2 regarding parameter β_{0}, σ, σ_{0}, σ_{1}, ρ_{01}.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Touraine, C., Cuer, B., Conroy, T. et al. When a joint model should be preferred over a linear mixed model for analysis of longitudinal healthrelated quality of life data in cancer clinical trials. BMC Med Res Methodol 23, 36 (2023). https://doi.org/10.1186/s12874023018463
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874023018463
Keywords
 Joint model
 Informative dropout
 Linear mixed model
 Random intercept and slope model
 Healthrelated quality of life
 Longitudinal outcome
 Clinical trials
 Cancer