 Research Article
 Open Access
 Published:
Joint model robustness compared with the timevarying covariate Cox model to evaluate the association between a longitudinal marker and a timetoevent endpoint
BMC Medical Research Methodology volume 19, Article number: 222 (2019)
Abstract
Background
The recent progress in medical research generates an increasing interest in the use of longitudinal biomarkers for characterizing the occurrence of an outcome. The present work is motivated by a study, where the objective was to explore the potential of the long pentraxin 3 (PTX3) as a prognostic marker of Acute GraftversusHost Disease (GvHD) after haematopoietic stem cell transplantation. Timevarying covariate Cox model was commonly used, despite its limiting assumptions that marker values are constant in time and measured without error. A joint model has been developed as a viable alternative; however, the approach is computationally intensive and requires additional strong assumptions, in which the impacts of their misspecification were not sufficiently studied.
Methods
We conduct an extensive simulation to clarify relevant assumptions for the understanding of joint models and assessment of its robustness under key model misspecifications. Further, we characterize the extent of bias introduced by the limiting assumptions of the timevarying covariate Cox model and compare its performance with a joint model in various contexts. We then present results of the two approaches to evaluate the potential of PTX3 as a prognostic marker of GvHD after haematopoietic stem cell transplantation.
Results
Overall, we illustrate that a joint model provides an unbiased estimate of the association between a longitudinal marker and the hazard of an event in the presence of measurement error, showing improvement over the timevarying Cox model. However, a joint model is severely biased when the baseline hazard or the shape of the longitudinal trajectories are misspecified. Both the Cox model and the joint model correctly specified indicated PTX3 as a potential prognostic marker of GvHD, with the joint model providing a higher hazard ratio estimate.
Conclusions
Joint models are beneficial to investigate the capability of the longitudinal marker to characterize timetoevent endpoint. However, the benefits are strictly linked to the correct specification of the longitudinal marker trajectory and the baseline hazard function, indicating a careful consideration of assumptions to avoid biased estimates.
Background
The recent progress in molecular biology and genetics generates an increasing interest in investigating genomic or molecular biomarkers, as markers of diagnosis, prognosis or response to treatment. The longitudinal measure of biomarkers is useful for characterizing the occurrence of an outcome of interest, as they can be predictive of treatment results or related to the event process and prognosis. For example, the present work is motivated by a study, where the objective was to explore the potential of the long pentraxin 3 (PTX3) as a prognostic marker of Acute GraftversusHost Disease (GvHD) after haematopoietic stem cell transplantation [1].
The timevarying covariate Cox model (TVCM) [2, 3] has been used to study the association between an observed longitudinal measure of biomarkers and the hazard of an event [1, 4]. This approach uses the lastobservationcarriedforward (LOCF), since marker’s observations are only available at discrete times (i.e. time of measurement), leading to the pitfall of introducing bias given the continuous nature of the biomarker [5]. Further, the TVCM fails to account for the so called “measurement error” in the biomarker. As evidenced by various studies (e.g., [6, 7]), failure to adjust for such measurement error introduces further bias into model estimates.
Shared frailty joint models address these issues by modeling simultaneously the profile of the marker and the timetoevent data [8, 9]. Within such approaches, a linear mixed model for the underlying longitudinal trajectories of the marker is linked to the survival model using shared random effects [10]. This approach allows inference on the association between the hazards of an event and the longitudinal biomarkers, by avoiding the LOCF assumption and accounting for the random measurement error [11]. However, joint models are parametric and thus require additional strong assumptions over the semiparametric Cox model with timevarying covariate [12]. Assumptions are needed on both the distribution of the marker and its trajectory, and on the shape of the hazard function of the event of interest.
The literature that evaluates the impacts of misspecification of joint models for their applications in biomedical research has been particularly rare, while methodological efforts rapidly increasing (e.g., [13]). This causes a lack of clarity on practical issues, which in turn discourages applied researchers to improve the understanding of such models [14, 15]. Few simulation studies have been performed in the joint modeling framework. [16] investigated the use of joint models to adjust for measurement error only at the baseline measurement value. Simulation by [11] evaluated the performance of the joint model and the TVCM focusing on treatment effect on the timetoevent outcome, while [17] focused on the association between marker and event under few specific scenarios. A broader simulation study that evaluates the impact of model misspecifications and that could be useful for applied statisticians in order to understand advantages and disadvantages of a joint model as compared with a Cox model in different contexts is lacking. Moreover, the distinctive role of the bias due to LOCF and measurement error in the TVCM has not received attention in the previous studies. In this paper, we conduct a comprehensive simulation study with the following goals: (a) disentangle the bias introduced by LOCF and measurement error when assessing the association between a marker and a timetoevent endpoint by the TVCM and to compare its performance with a joint model, (b) clarify relevant assumptions of the joint model and assess its robustness in the presence of key model misspecifications, in particular considering the misspecifications of the marker distribution, of the marker trajectory, and of the shape of the hazard function. Additionally, these theoretical considerations will be used to evaluate the potential of PTX3 as a prognostic marker of GvHD after haematopoietic stem cell transplantation.
In “Method” section below, we describe the TVCM and the joint model approaches. In “Simulation study” section we present the simulation studies: simulation protocol, key model misspecification scenarios and discussion of associated results. In “Motivating context” section, we present an application to illustrate the use of PTX3 as a marker of GvHD using both the TVCM and joint model. Concluding discussion is presented in “Discussion” section.
Method
Notation
Let \(T^{*}_{i}\) be the failure time of subject i (i=1,…,n) in a cohort of size n. Suppose that we want to estimate the association between a biomarker w_{i}(t), that is varying in time, and the hazard of failure. In practice, the longitudinal biomarker is measured at discrete times t_{ij},j=1,…,n_{i}. Thus, biomarker information coming from the ith subject is a vector of observed discrete values, possibly subjected to the measurement error ε_{i}(t), {y_{i}(t_{ij})=w_{i}(t_{ij})+ε_{i}(t_{ij}),j=1,…,n_{i}}. Since survival times are commonly affected by right censoring, the observed survival time is \(T_{i}=\text {min}(T^{*}_{i}, C_{i})\), where C_{i} is the right censoring time and \(\delta _{i}=I(T^{*}_{i}\leq C_{i})\) is the event indicator, indicating whether the survival time or the censoring time is observed. \(T^{*}_{i}\) and C_{i} are assumed to be independent conditional on the biomarker trajectory w_{i}(t), as commonly done in survival analysis (e.g., [18]).
The timevarying covariate Cox model
The TVCM is a generalization of the Cox model [2] accounting for covariates that can change value during the observation time. The proportional hazards model has the form
where h_{0}(t) denotes an unspecified baseline hazard, α is a parameter measuring the association between the observed longitudinal measure y_{i}(t) and the hazard at time t (h_{i}(t)). A vector of fixed baseline covariates can also be included in the model (1). The hazard ratio HR= exp(α) is interpreted as the relative increase in the hazard at any time t for a unit increase in the observed value of the biomarker at the same time point. The HR is assumed to be constant in time, thus we assume that the relative increase in the hazard for each unit increase in the biomarker is the same for all the observation time. Inference is based on maximizing the partial likelihood [3]. Of note, when y_{i}(t) is not observed at time t, the most updated value is used: y_{i}(t_{ij}),t_{ij}≤t<t_{ij+1}, using the LOCF principle [8].
Joint models
A joint model of longitudinal and survival data comprises two linked submodels: the longitudinal and the survival submodels [10, 19]. The longitudinal submodel specifies the trajectory of a biomarker over time. This is typically achieved using a linear mixed effects model [20] of the form:
in which f_{i}(t) and g_{i}(t) are vectors of functions of time t for the fixed effect parameters β and the random effect parameters b_{i}, respectively. The component ε_{i}(t) denotes mutually independent normally distributed error terms with variance \(\sigma ^{2}_{\epsilon }\). For the random effects, one assumes b_{i}∼MVN(0,Σ), where Σ is intersubject variancecovariance matrix. Further, the random effects are assumed to be independent of the error terms. In model (2) the observed marker value y_{i}(t) at time point t is decomposed into the underlying true marker value w_{i}(t) and a random error term. The survival submodel attempts to associate the marker value with the hazard of an event at the same time point t using the proportional hazards model:
Similarly to (1), the parameter α measures the association between the longitudinal biomarker and the timetoevent and the hazard ratio HR= exp(α) is assumed constant in time. A vector of fixed baseline covariates can be included in this model as well. The basic difference with (1) is that model (3) does not use the observed value of the biomarker y_{i}(t), but an estimate of the true value w_{i}(t), which is continuously updated in time and obtained by maximizing the joint likelihood of the timetoevent and longitudinal marker outcomes. As a note, an appropriate estimate of the subject trajectory w_{i}(t) requires correct specification of the design vectors f_{i}(t) and g_{i}(t). The optimization procedure involves a hybrid of expectation maximization (EM) and direct maximization as discussed in [10]. Unlike in the TVCM of (1), the baseline hazard must be specified parametrically or approximated by splinebased approaches. In fact, leaving the baseline hazard completely unspecified within the joint modeling framework severely underestimates the standard errors of the parameter estimates [21]. While the association parameter in both (3) and (1) is denoted by α, the corresponding estimates from the two models would be different.
Simulation study
In this section, we conduct a simulation study under various scenarios in order to address the two aims, (a) disentangling the bias introduced by LOCF and measurement error when assessing the association between a marker and a timetoevent by the TVCM and compare its performance with that of the joint model. The second aim (b) focuses on clarifying relevant assumptions of the joint model and assess its robustness in the presence of model misspecifications. In fact, in the joint modeling framework, the association between the longitudinal marker and the hazard of an event relies on several assumptions on the longitudinal and survival submodels, including the marker distribution, the marker trajectory and the shape of the hazard function. The impacts of misspecifying these assumptions are illustrated, respectively, in the sections b1, b2 and b3. Table 1 summarizes the main parameter values used for the simulation scenarios, which are described below. All simulations and analyses were performed using the R package JM version 1.4.7.
Simulations protocol
We considered a sample size of n=300 subjects with regular measures of the biomarker for 14 weeks, including the baseline measurement (t=0,...14). The simulation setting was inspired by the motivating context of the data in “Motivating context” section. Data were generated by the following steps:
 1
The general formula to obtain the true marker value w_{i}(t) was given as
$$ \begin{aligned} w_{i}(t)&=\beta_{0}+\beta_{1}t+\beta_{2}t^{2}+b_{i0}+b_{i1}t+b_{i2}t^{2} \\ &\boldsymbol{b}_{i}=(b_{i0},b_{i1}, b_{i2})^{T}\sim N_{3}(\boldsymbol{0}, \Sigma), \\ \end{aligned} $$(4)where Σ denotes 3 by 3 intersubject variancecovariance matrix. When a linear decreasing trajectory was considered, as for the majority of the scenarios reported in Table 1, the fixed effect parameters were chosen to be β_{0}=3.2, β_{1}=−0.07 and β_{2}=0. A basic scenario of biomarker with constant value in time was also considered by setting β_{1}=β_{2}=0 (scenario 1, Table 1). To assess the misspecification of the marker distribution (b1), a random intercept model was considered with b_{i0} generated from four different probability distributions: a Bimodal mixture of two normal distributions (hereafter called Bimodal), Chisquare, Gamma and Normal (scenarios 3 to 6). The parameter values of these distributions were chosen such that their corresponding variances equaled the random intercept variance Σ_{11}=1.44. Model (4) was used to investigate misspecification of marker trajectory (b2) by generating biomarker values with a quadratic profile in scenarios 7 and 8, as depicted in Fig. 2a.
 2
The observed marker value y_{i}(t) at time t was obtained as y_{i}(t)=w_{i}(t)+ε, where ε represents a normally distributed measurement error \(\epsilon \sim N(0,\sigma ^{2}_{\epsilon })\), with increasing variability σ_{ε}∈(0.1,0.3,0.5), corresponding to a coefficient of variation (CV), defined as the standard deviation of measurement error divided by the mean (e.g.,[22]), of 3.1%,9.4%,15.6% respectively. Regular measures of w_{i}(t) were obtained with increasing frequency, from one measurement per week (t=0,1,…,14) to 4 measurements per week (t=0,0.25,…,14), in order to examine the effect of LOCF in TVCM.
 3
The survival time \(T^{*}_{i}\) was obtained by a Weibull proportional hazard model: h_{i}(t)=λρt^{ρ−1} exp{αw_{i}(t)}, where ρ=1.4, λ=0.1. The association parameter was set at α∈(0,0.3,0.6), corresponding to no, moderate and strong association between w_{i}(t) and h_{i}(t), respectively. The survival time was generated by evaluating the inverse of a cumulative hazard (see, [23]). Since this does not lead to a closed form expression, we used the R root finder function uniroot to generate \(T^{*}_{i}\) numerically. To investigate the impact of misspecifying the distribution of the hazard function on the association parameter α (b3), in the scenario 9, the survival times were generated from a nonmonotonic baseline hazard function h_{0}(t)=νκt^{κ−1}/(c+t^{κ}), where ν=1, κ=2 and c=10. The shape of this function, along with the Weibull curve previously described, were shown in Fig. 2b.
 4
The censoring time C_{i} was generated according to a uniform distribution in (0,14), leading to around 20% of censoring proportion before week 14.
 5
The observed survival time \(T_{i}=min(T^{*}_{i},C_{i})\) was then calculated.
 6
Marker values y_{i}(t) with t>T_{i} were disregarded.
We drew B=1000 simulations for each scenario, B was chosen in order to get at least a 2% level of accuracy in the estimate of the association parameter α in about 95% of the samples, assuming a true association parameter of 0.6 with standard error 0.14 [24]. To each generated dataset, we fitted the following models: i) basic Cox model considering only the baseline measurement of a marker, y_{i}(t=0); ii) the TVCM considering the observed updated value of the marker; iii) the joint model considering the updated value of the marker. We summarized the results using: the mean of the simulation estimates (Est), empirical Monte Carlo standard error (ESE), asymptotic standard error (ASE), percentage bias (%Bias =bias/α) and 95% coverage probabilities (CP) of the association parameter α. We also used bias and the meansquared error (MSE) as necessary. The ASE was computed as the average of the estimated standard errors, and the ESE as the standard deviation of the estimates of α.
Results
a) Measurement error and last observation carried forward impact
Table 2 shows the results of the constant biomarker case (scenario 1 of Table 1). The TVCM and the baseline Cox model show a very similar performance, with increasing bias as the measurement error is increasing. This is expected given that the biomarker mean value does not change over time. In the presence of small measurement error (σ_{ε}=0.1), the joint model estimate showed a higher bias, indicating that a joint model is less beneficial in the presence of small measurement error and a constant biomarker. However, when σ_{ε} was increased to 0.3 and 0.5, the bias in the estimates of the joint model was smaller than the one in the TVCM, suggesting the ability of the joint model to account for measurement error.
Table 3 shows the results under scenario 2 (linearly decreasing marker), with α∈(0,0.3,0.6). The ESE (not reported) were always in close agreement with the ASE. When α was set at 0, a similar good performance of the three models was visible regardless of the size of σ_{ε}. In the other scenarios, we can observe increasing bias, and decreasing coverage probabilities, for the TVCM (every week) as the magnitude of σ_{ε} increases. With σ_{ε}=0.1 and α=0.3, the percentage bias was −2.3% and the coverage 95%. This percentage bias raised to −19%, and the coverage dropped to 80%, when σ_{ε} increased to 0.5, while it reduced to −0.7% when the number of measurements taken was increased to four times per week, thus the impact of LOCF estimate was reduced. The advantage of using the joint model was observed in the presence of high measurement error, where the percentage bias of −19% (TVCM) was reduced to 0.3%. The joint model, fitted using the parametric Weibull baseline hazard, provided the most unbiased estimates with coverage probabilities much closer to 95% across all the scenarios. We note that the performance of the TVCM falls further in the presence of a strong association between the marker and the timetoevent. For instance, with α=0.6 and σ_{ε}=0.5, a large percentage bias, −21%, and a very small coverage, 35%, were observed for the TVCM (once per week). In the latter setting, the improvement achieved by increasing the number of measurements was small.
b) Results under model misspecification
b1) Marker distribution
In joint modeling, the marker distribution is typically assumed to be Gaussian (e.g., [16]). Violation of this assumption is a key concern as the random effects play a central role in characterizing the association between the biomarker and hazard of an event [10]. The simulation study in this section assesses the effect of misspecifying the distribution of the random effects according to the scenarios 3 to 6 of Table 1. A random intercept model was considered to generate the random intercept b_{i0} from three nonnormal distributions and a reference Normal distribution. The joint model was fitted assuming a normally distributed random intercept in the longitudinal submodel. Five different sample sizes of 35, 75, 150, 300 and 600 subjects were considered in this setting. The measurement error standard deviation was held fixed σ_{ε}=0.3 and the true association parameter α=0.3. The results of the simulation are shown in Table 4. The joint model failed to converge for a few simulations with small sample size: 6/1000 when the data were generated using the Bimodal distribution with n=35 and 1/1000 for n=75. These nonconverging simulations were excluded from the analyses. When the marker was generated from a nonnormal distribution, the joint model produced a biased estimate of α for n=35, with a percentage bias of 22%, 17% and 7.7% when the random intercept was generated from Chisquare, Gamma and Bimodal distributions, respectively. However, the percentage bias decreased as the sample size n increased, reaching a maximum value of 3.7% with n=600 subjects, and the coverage probabilities were closer to the optimal 95% across all distributions. Further, both the ESE and the ASE decreased as the sample size increased. Thus, the estimate of the association between longitudinal marker and hazard of an event is not affected substantially by the misspecification of the random effect distribution as long as the sample size is large.
The TVCM is relatively less biased and more precise in the estimate of α for small sample sizes, indicating it could provide a good accuracy even though the marker was contaminated with a measurement error (σ_{ε}=0.3). Figure 1 shows the MSE for the joint and TVCM models under the four distributions. The MSE reflects the accuracy of each model taking into account both the bias and variability [24]. For the small sample size, the TVCM has lower MSE, except for the Normal case where MSE from both models are the same. As the sample size increases, the MSE from both models coincide.
b2) Marker trajectory
In order to appropriately characterize the association between the marker and the hazard of an event, estimation of the subjectspecific trajectory w_{i}(t) from (2) must capture the underlying shape. To evaluate the impact of misspecification of the marker profile on the estimate of α, we generated longitudinal trajectories which were quadratic in nature and fitted a joint model assuming linear trajectories with random intercept and random slope. We considered a slight and a gross departure from linearity, with parameters specified in scenarios 7 and 8 of Table 1, respectively. Figure 2a illustrates the mean longitudinal profile under both scenarios.
Table 5 reports the results of the simulation study under marker trajectory misspecification. The table includes the TVCM fitted to the generated observed longitudinal marker based on four times per week. A lack of convergence was encountered for the joint model under gross misspecification: the frequencies of nonconvergence were 16/1000 and 13/1000 for σ_{ε}=0.3 and σ_{ε}=0.5, respectively. Further, one extreme outlier estimate for each of the two σ_{ε} values was obtained. The two outliers were excluded from the results shown in Table 5. The impact of the marker trajectory misspecification is clearly observed in the estimates of the joint model. For σ_{ε}=0.3, we observe a percentage bias of −5.3% for the joint model under slight misspecification. This corresponds to an extra 5% bias as compared with the same scenario when the marker shape was correctly specified (see, Table 3). The extra bias could be as large as 8.7% under gross misspecification. These indicate that the longitudinal trajectory of a marker must be carefully specified when a joint model is considered for estimating the association between longitudinal biomarker and timetoevent. In the event of gross misspecification, the TVCM provides less biased estimates even in the presence of moderate measurement error in the biomarker.
b3) Hazard shape function
Within the joint model framework, leaving the baseline hazard unspecified severely underestimates the standard errors of the parameter estimates [21]. Thus the hazard function for the survival submodel is often assumed to be Weibull (e.g., [25]), but the evolution of the hazard rate over time can easily be nonmonotonic (e.g., [26, 27]). To investigate the impact of misspecifying the distribution of the hazard function on the association parameter α, we generated data following a nonmonotonic hazard (scenario 9 in Table 1) and fitted the joint model assuming three baseline hazard shapes: constant, Weibull and splines. For the case of splines, the baseline hazard was defined using Bsplines (e.g., [28]) with 5 internal knots placed at equallyspaced percentiles of the observed survival time T_{i}. Table 6 reports the results considering α∈(0.3,0.6) and σ_{ε}∈(0.1,0.3,0.5). The performance of the TVCM was comparable to the previous scenarios (see Table 3), while the accuracy of the joint model was strictly dependent on the assumptions on the hazard shape. The joint model with constant hazard produced severely biased estimates: for example when σ_{ε}=0.1, α=0.3 was underestimated by 39%, with a coverage of 39%, and none of the confidence intervals contained the true value, when α was set at 0.6. Thus, even if the constant hazard can be appealing for ease of computation, it often does not represent a realistic assumption. When the joint model was fitted to the generated data assuming a Weibull hazard, the estimate of α was also biased for all the scenarios. For α=0.3 and σ_{ε}=0.1, α was overestimated by 12%. Joint models based on spline functions provided the most unbiased estimates of α with coverage probability closer to 95% in most scenarios. The flexibility of spline functions allowed to capture the underlying nonlinear shape of the baseline hazard.
Motivating context
The example is coming from a study where patients with haematooncological diseases who underwent stem cell transplantation (HSCT) were evaluated to explore the potential of the long pentraxin 3 (PTX3) as a prognostic marker of Acute GraftversusHost Disease (GvHD) [1]. Acute graftversushost disease is one of the major causes of morbidity and mortality associated with allogeneic stem cell transplants [29]. Currently, the diagnosis of GvHD is based on clinical signs and symptoms and requires invasive biopsies of disease target organs in uncertain cases, which are sometimes unfeasible. To improve diagnosis and prognosis of GvHD, recent researches focus on specific biomarkers measured in the plasma or serum of HSCT patients as a new tool for detecting GvHD prior to clinical manifestation and for GvHD management. PTX3 is an acutephase protein, rapidly produced by vascular endothelial cells, mesenchymal cells and fibroblasts, as well as by innate immune response cells upon stimulation with proinflammatory cytokines, damaged tissuederived signals and microbial antigens. Differently from other acute phase proteins, such as the CReactive Protein, PTX3 is considered a rapid marker for primary local activation of innate immunity and inflammation due to its peculiar pattern of production.
In this section, we compare the use of the TVCM and joint model for the evaluation of PTX3 as a marker of GvHD. Peripheral blood samples were collected in a cohort of 116 patients before the beginning of conditioning regimen, on day 0 (HSCT), weekly after HSCT until the 14th week and at the development of symptoms consistent with GvHD. Plasma was obtained after centrifuging whole blood and PTX3 was evaluated by Sandwich ELISA assay, with a measurement precision declared as an intraassay CV lower than 10%. The median followup time was 5 weeks. Time was measured from HSCT up to the occurrence of GvHD, censoring occurred if a subject died before GvHD or was lost to followup. The followup ended at the 14th week.
Figure 3a displays the distribution of the PTX3 marker over time, showing a decreasing trend and departure of the distribution from normality. The average PTX3 at week 0 for all subjects was 29.46 ng/ml (nanograms per milliliter) with a standard deviation of 31.5. The GvHD hazard was estimated using the bshazard package [30], and plotted in Fig. 3b, which showed a highly nonmonotonic shape of the GvHD event. We fitted a TVCM and a joint model to evaluate the association between the marker and the hazard of GvHD. Consistently with the simulation study, we also considered the basic Cox model that uses only the baseline information, observed at t=0, as a covariate. For the joint model the longitudinal PTX3 was specified using a linear mixed model with random intercept and random slope, which was chosen as the best model according to AIC selection criterion when compared to a mixed model that involves a quadratic time. The baseline hazard within the joint model was specified as constant, Weibull and Bsplines with 6 internal knots placed at equallyspaced percentiles of the event time. Each model was fitted considering both the original PTX3 and the logarithmic transformation of PTX3 to satisfy the normality assumption of the linear mixed model.
The results are shown in Table 7, which reports the estimated association between PTX3 and GvHD (Est), the standard error of the estimate (SE), hazard ratio (HR), and the 95% confidence interval of the HR (95% HR CI). The baseline marker did not showed a significant association with the hazard of GvHD event. The updated values of PTX3 appear to be positively associated with the hazard of the GvHD as estimated by the TVCM, with both its original value and the log transformed version, even though the HR values are not comparable due to the log transformation. The TVCM hazard ratio of 1.14 indicates that a unit increase in the PTX3 marker corresponds to a 1.14fold increase in the hazard of developing the GvHD disease.
The joint models using constant and Weibull hazards estimated a lower nonsignificant association between PTX3 and time to GvHD. Interestingly, when the hazard was modeled by splines, the HR point estimate was equal to the one obtained by the TVCM (1.14), but with higher variability. When the log of PTX3 was used in a joint model with spline baseline hazard, a HR(95%CI) of 3.11(1.05,9.18) was obtained. It follows that a unit increase in the log of PTX3 marker was associated to a 3.11fold increase in the risk of developing the GvHD disease. This value was greater than the HR of 1.82 estimated by the TVCM, but with higher variability.
Overall, we notice a great variability among the joint model estimates of the HR, ranging from 0.76 up to 3.11. This can be directly linked to the misspecification of the marker and hazard distribution in some of the applied models, coherent with the simulation results. The Cox model was unaffected by the normality of the marker and from the hazard distribution.
Figure 4 shows the KaplanMeier (KM) estimate of GvHD occurrence and the predicted marginal survival from each one of the applied joint models. The spline based survival curve was much closer to the KM curve, suggesting that splines were able to capture the strong nonlinear hazard function shown in Fig. 3b. The curve associated to the Weibull was in agreement with the KM estimate until the 4th week of followup, but the difference with the KM estimate increased over time. As expected, the survival curve associated to the constant hazard largely deviated from the KM curve.
Discussion
Investigating biological biomarkers as markers of diagnosis/prognosis or response to treatment requires inferential tools for the association between the longitudinal process of the marker and the progression of the diseases. The TVCM has been the standard approach, but its partial likelihood assumes constant biomarker values between followup times and ignores measurement error. There has been some effort to expand the Cox model to accommodate measurement error, such as regression calibration (e.g., [33]), that however requires the availability of a validation subsample, that is not often available. The modeling of the longitudinal profile of the biomarker by a linear mixed model is another approach to obtain an estimate of the expected value of the biomarker free from measurement error, that can be included as a covariate in the TVCM with a twostage approach [17]. Joint models simultaneously analyze the longitudinal marker profile and the time to an event overcoming both the issues of LOCF and measurement error. Joint models are, however, computationally intensive and require additional assumptions over the TVCM. In this paper, we performed a comprehensive simulation study with the goal of clarifying relevant assumptions for the understanding of a joint model and for assessing its robustness under key model misspecifications. Further, we disentangled the bias introduced by LOCF and measurement error in the TVCM and compared its performance with the joint model. Overall, we illustrated that the TVCM approach underestimates the association estimates in the presence of measurement error.The major source of the TVCM bias was attributable to measurement error compared with that attributable to LOCF. On the other hand, the joint model can be severely biased under model misspecification.
Firstly we considered how estimates from a joint model may be biased under the misspecification of the normality assumption for the true marker distribution. Violation of this assumption for joint models is an issue as the random effects play a central role for characterizing the association between the marker and the hazard of an event [10]. To avoid parametric distributional assumption, joint models based on a semiparametric [31] or nonparametric assumptions [5] have been proposed. Further, [32] showed that parameter estimates are robust to misspecification as the number of measurements per subject increases. We showed that the misspecification has a negligible effect on the estimate of the association parameter as long as the sample size is large, regardless of the parametric distribution being adopted. The TVCM was not affected by the marker distribution. This is expected, but it is worth to underline it here to discourage unnecessary logtransformation to account for normality in the Cox model framework, which is sometimes seen in the medical literature (e.g., [34]).
Second, we looked into the impact of misspecifying the longitudinal marker trajectory on the estimate of the association between the marker and the hazard of an event. This is motivated by the fact that the true underlying marker trajectory is typically unknown, since we only observe error contaminated and intermittently measured marker. To effectively characterize the association estimate, the true marker trajectory must be appropriately estimated [10]. We illustrated that failing to capture the underlying marker trajectory, at different amounts of measurement error, leads to substantially biased estimates in the joint model, while the TVCM is unaffected by the misspecification, since it does not assume any form of marker shape. [17] similarly found that, at fixed measurement error, estimates from the joint model are biased under marker trajectory misspecification. However, they also suggested that the bias is still less than the bias from the TVCM.
Furthermore, we found that a misspecification of the baseline hazard in the joint modeling framework has an important effect on the estimate of the association between the longitudinal marker and the hazard of an event. This issue had never been considered in the literature of joint models, but simulations indicated that the association estimate was severely biased when the data generating hazard process was misspecified. This was particularly evident when we attempted to model a highly nonlinear hazard shape by a constant or Weibull hazard. On the other hand, association estimate using TVCM was insensitive to the misspecification of the baseline hazard, as its shape is unspecified. In the joint modeling framework leaving the baseline hazard unspecified severely underestimates the standard error of the parameters [21], even if it appears to be the most applied choice as shown in a recent metaanalysis on joint models [25]. Thus, the baseline hazard in the joint model should be carefully modeled, also with the use of splines if necessary, to avoid bias on the association estimate. The two modeling techniques were illustrated using a real data on HSCT for establishing PTX3 as a marker of GvHD. The joint model, with the hazard modeled by spline functions, provided the PTX3 as a potential diagnostic marker of GvHD. This was corroborated by the TVCM, even if it indicated a lower association estimate.
In conclusion, joint models are a powerful tool, able to account for marker measurement error and to model the marker trajectory in time. However, they require strong assumptions that need to be properly validated, and the avoidance of bias due to model misspecification is crucial in order for a joint model to provide a substantive benefit over the semiparametric Cox model with a timevarying covariate. Furthermore, it may be suggested that the better performance by the joint model is unfair because the data generating scheme in our simulation utilized a biomarker measurement error whereas the TVCM does not assume the presence of measurement error. We showed that the performance of the joint model was higher than that of a TVCM accounting for measurement error in the biomarker by a twostage approach, while requiring similar hypotheses. The results are provided in the Additional file 1.
Availability of data and materials
The datasets along with the simulation code used during the current study are available from the corresponding author on reasonable request.
Abbreviations
 ASE:

Asymptotic standard error
 CI:

Confidence interval
 CP:

Coverage probabilities
 ESE:

Empirical monte carlo standard error
 Est:

Mean of the maximum likelihood estimates
 GvHD:

Acute graftversushost disease
 HR:

Hazard ratio
 HSCT:

Haematooncological stem cell transplantation
 KM:

Kaplanmeier
 LOCF:

Last observation carried forward
 PTX3:

Long Pentraxin 3
 TVCM:

Timevarying covariate cox model
References
Dander E, De Lorenzo P, Bottazzi B, Quarello P, Vinci P, Balduzzi A, Masciocchi F, Bonanomi S, Cappuzzello C, Prunotto G, et al. Pentraxin 3 plasma levels at graftversushost disease onset predict disease severity and response to therapy in children given haematopoietic stem cell transplantation. Oncotarget. 2016; 7:82123–38.
Cox DR. Regression models and life tables. J Royal Stat Soc. 1972; 34:187–220.
Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. New Jersey: Wiley; 2012.
Andersen PK, Gill RD, Borgan Ø, Keiding N. Statistical models based on counting processes. New York: Springer; 1993.
Tsiatis AA, Davidian MA. A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error. Biometrika. 2001; 88:447–58.
Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982; 69:331–42.
Dupuy JF, Mesbah M. Joint modeling of event time and nonignorable missing longitudinal data. Light Data Anal. 2002; 8:99–115.
Tsiatis AA, Davidian MA. Joint modeling of longitudinal and timetoevent data: an overview. Stat Sin. 2004; 14:809–34.
ProustLima C, Sene M, Taylor JM, JacqminGadda H. Joint latent class models for longitudinal and timetoevent data: A review. Stat Methods Med Res. 2014; 23:74–90.
Rizopoulos D. Joint models for longitudinal and timetoevent data: With applications in R. Boca Raton: Chapman and Hall/CRC; 2012.
Ibrahim JG, Chu H, Chen LM. Basic concepts and methods for joint models of longitudinal and survival data. J Clin Oncol. 2010; 28:2796–801.
Gould AL, Boye ME, Crowther MJ, Ibrahim JG, Quartey G, Micallef S, Bois FY. Responses to discussants of ’Joint modeling of survival and longitudinal nonsurvival data: current methods and issues. report of the DIA Bayesian joint modeling working group. Stat Med. 2015; 34:2202.
Köhler M, Umlauf N, Beyerlein A, Winkler C, Ziegler AG, Greven S. Flexible Bayesian additive joint models with an application to type 1 diabetes research. Biom J. 2017; 59:1144–65.
Sauerbrei W, Abrahamowicz M, Altman DG, Cessie S, Carpenter J. STRengthening analytical thinking for observational studies: the STRATOS initiative. Stat Med. 2014; 33:5413–32.
Boulesteix AL, Binder H, Abrahamowicz M, Sauerbrei W. Simulation Panel of the STRATOS Initiative. Biom J. 2008; 60:216–8.
Crowther MJ, Lambert PC, Abrams KR. Adjusting for measurement error in baseline prognostic biomarkers included in a timetoevent analysis: a joint modelling approach. Med Res Methodol. 2013; 13:146–54.
Sweeting MJ, Thompson SG. Joint modelling of longitudinal and timetoevent data with application to predicting abdominal aortic aneurysm growth and rupture. Biom J. 2011; 53:750–63.
Marubini E, Valsecchi MG. Analysing Survival Data from Clinical Trials and Observational Studies. Wiley: West Sussex; 1996.
Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000; 1:465–80.
Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer; 2000.
Hsieh F, Tseng Y, Wang J. Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics. 2006; 62:1037–43.
Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998; 26:217–38.
Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med. 2005; 24:1713–23.
Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Stat Med. 2006; 25:4279–92.
Sudell M, KolamunnageDona R, TudurSmith C. Joint models for longitudinal and timetoevent data: a review of reporting quality with a view to metaanalysis. BMC Med Res Methodol. 2016; 16:168.
Rebora P, Galimberti S, Valsecchi MG. Using multiple timescale models for the evaluation of a timedependent treatment. Stat Med. 2015; 34:3648–60.
Lee M, Czene K, Rebora P, Reilly M. Patterns of changing cancer risks with time since diagnosis of a sibling. Int J Cancer. 2015; 136:1948–56.
Arisido MW. Functional measure of ozone exposure to model shortterm health effects. Environmetrics. 2016; 27:306–17.
Ferrara JLM, Levine JE, Reddy P, Holler E. Graftversushost disease. The Lancet. 2009; 373:1550–61.
Rebora P, Salim A, Reilly M. A flexible tool for nonparametric smoothing of the hazard function. The R J. 2014; 6/2:114–22.
Song X, Davidian M, Tsiatis AA. A semiparametric likelihood approach to joint modeling of longitudinal and timetoevent data. Biometrics. 2002; 58:742–53.
Rizopoulos D, Verbeke G, Molenberghs G. Shared parameter models under random effects misspecification. Biometrika. 2008; 95:63–74.
Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982; 69:331–42.
de Vries EM, Wang J, Williamson KD, Leeflang MM, Boonstra K, Weersma RK, et al. A novel prognostic model for transplantfree survival in primary sclerosing cholangitis. Gut. 2017. https://doi.org/10.1136/gutjnl2016313681.
Acknowledgements
Not applicable.
Funding
This work was supported by the grant SIR RBSI14LOVD of the Italian Ministry of Education, University and Research. We acknowledge that this research was partially supported by the Italian Ministry of University and Research (MIUR)  Department of Excellence project PREMIA (PREcision MedIcine Approach: bringing biomarker research to clinic). The sponsor has no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Authors and Affiliations
Contributions
MW conducted the simulations, analyzed the application data and was a major contributor in writing the manuscript. PR supervised and coordinated the research work, designed the simulation protocol and data analysis, interpreted the results and was a major contributor in writing the manuscript. LA contributed to describing the methods and interpreting the result. DPB was involved in interpreting the result and discussing the research work. MG evaluated the research work and revised the writing of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1
Supplementary Appendix.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Arisido, M.W., Antolini, L., Bernasconi, D.P. et al. Joint model robustness compared with the timevarying covariate Cox model to evaluate the association between a longitudinal marker and a timetoevent endpoint. BMC Med Res Methodol 19, 222 (2019). https://doi.org/10.1186/s128740190873y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s128740190873y
Keywords
 Cox model
 Joint model simulation
 Longitudinal biomarker
 Random effects model
 Timevarying covariate