 Research article
 Open Access
 Published:
Comparison of a timevarying covariate model and a joint model of timetoevent outcomes in the presence of measurement error and interval censoring: application to kidney transplantation
BMC Medical Research Methodology volume 19, Article number: 130 (2019)
Abstract
Background
Tacrolimus (TAC) is an immunosuppressant drug given to kidney transplant recipients posttransplant to prevent antibody formation and kidney rejection. The optimal therapeutic dose for TAC is poorly defined and therapy requires frequent monitoring of drug trough levels. Analyzing the association between TAC levels over time and the development of potentially harmful de novo donor specific antibodies (dnDSA) is complex because TAC levels are subject to measurement error and dnDSA is assessed at discrete times, so it is an interval censored timetoevent outcome.
Methods
Using data from the University of Colorado Transplant Center, we investigated the association between TAC and dnDSA using a shared random effects (intercept and slope) model with longitudinal and interval censored survival submodels (JM) and compared it with the more traditional interval censored survival model with a timevarying covariate (TVC). We carried out simulations to compare bias, level and power for the association parameter in the TVC and JM under varying conditions of measurement error and interval censoring. In addition, using Markov Chain Monte Carlo (MCMC) methods allowed us to calculate clinically relevant quantities along with credible intervals (CrI).
Results
The shared random effects model was a better fit and showed both the average TAC and the slope of TAC were associated with risk of dnDSA. The simulation studies demonstrated that, in the presence of heavy interval censoring and high measurement error, the TVC survival model underestimates the association between the survival and longitudinal measurement and has inflated type I error and considerably less power to detect associations.
Conclusions
To avoid underestimating associations, shared random effects models should be used in analyses of data with interval censoring and measurement error.
Background
There are nearly 100,000 patients on the U.S. kidney transplant waiting list, with 13 people dying every day while awaiting this lifesaving therapy [1]. This national shortage of organs has made it imperative to ensure the longest possible kidney allograft survival for the approximately 20,000 patients that receive a transplant each year, yet over half of kidney grafts fail by 10 years after transplantation [1]. There has been an evolving recognition that antibodymediated rejection represents one of the prominent barriers to improving longterm graft outcomes [2]. The antibodies that mediate this rejection process, de novo donorspecific antibodies (dnDSA), have been established as an early biomarker for posttransplant adverse kidney events and patients are now screened for dnDSA at regular intervals at most centers around the country [3, 4]. Immunosuppression therapy is likely the most directly modifiable factor in preventing dnDSA. The majority of centers in the U.S. (93%) use Tacrolimus (TAC) as the backbone of their immunosuppression protocols [5], but the drug has a narrow and poorly defined therapeutic range that varies by patient.
The relationship between TAC and dnDSA is still largely unexplored due to numerous complexities. TAC trough levels, or the lowest concentration of drug in the blood prior to the next dose of medication, are measured frequently posttransplant. The measurement error is high because of biological variability in how the drug is absorbed. In addition, TAC is considered an internal covariate in relation to dnDSA since both processes are occurring in the same individual and there may exist a feedback loop in which low TAC levels lead to dnDSA but the suspected presence of dnDSA causes a TAC dose increase. To add to the complexities, screening for dnDSA is conducted only periodically, e.g. every six months to one year, which makes dnDSA an interval censored outcome. It is known that dnDSA have developed during the interval, but the exact timing of dnDSA is unknown. In previous studies, onedimensional functions of observed TAC values were found to be associated with development of dnDSA; these functions included the coefficient of variation of TAC, the percentage of TAC levels below 5 ng/ml [6], the mean TAC troughs of less than 8 ng/ml and the percentage of time TAC was in therapeutic range [7]. These studies did not address the longitudinal aspect of TAC, the measurement error or possible feedback from dnDSA to TAC, or the intervalcensored nature of dnDSA.
We compare two methods to model a heavily interval censored (dnDSA) and a highly variable longitudinal outcome (TAC): (i) a joint model (JM) and (ii) a survival model with a time varying covariate (TVC). It has been shown that when exact event times are known, the presence of measurement error causes the association parameter estimated by a TVC model to be biased towards null [8, 9]. We sought to further this exploration by studying how the amount of measurement error and the width of the censoring intervals around the timetoevent outcome affect the association estimate in JM and TVC. Interval censored time to event outcomes occur frequently in clinical situations, in particular for conditions that are subclinical and detectable only through a specialized assessment. However, to our knowledge, a thorough comparison of TVC versus JM in the presence of differing amounts of measurement error and interval censoring has yet to be performed.
First, we fit a joint model to analyze the association between longitudinal TAC levels and interval censored dnDSA. The literature of joint models of longitudinal and survival outcomes is extensive including textbooks [10] and [11] and comprehensive review articles [12–15]. Although methods have been developed for single survival outcomes [16–19], much less attention has been given to intervalcensored outcomes in joint models [20–22]. Gueorguieva et al. developed a joint model for longitudinal and interval censored competing risk dropout outcomes using a family of parametric baseline hazards including Weibull and loglogistic and used maximum likelihood estimation [22]. We build our joint model for TAC and dnDSA based on this work, but with a single time to event interval censored outcome, a correlated error structure within the longitudinal submodel, and estimation using Markov Chain Monte Carlo (MCMC).
Second, we fit an interval censored survival model with a timevarying covariate (TVC), following the work by Sparling et al. [23]. We compare the estimates of association between TAC and dnDSA, with the hypothesis that the association parameter in the TVC model will be underestimated. Finally, we perform a series of simulation studies to further explore how the two models perform in terms of bias, level and power based on (1) varying strengths of association, (2) varying amounts of measurement error in TAC, and (3) varying degrees of interval censoring in dnDSA.
In the first section, we introduce the kidney transplant study data. Next, the models are formulated and applied to the motivating kidney transplantation dataset. A simulation study comparing the models under different levels of interval censoring and different amounts of measurement error in the longitudinal process is presented and a discussion on the findings and comparison of the models follows.
Kidney Transplant Study
This retrospective study included patients who were at least 18 years old at time of kidney transplant between September 2007 and December 2013 at the University of Colorado Hospital. Patients were excluded if they had primary nonfunction, simultaneous liver and kidney transplant, islet cell transplant, had pretransplant DSA, failed to undergo dnDSA screening, or had dnDSA within the first week posttransplant. Informed consent was provided by all patients, and this study was conducted in accordance with the Declaration of Helsinki and was approved by the ethics committee at the University of Colorado (COMIRB 133137). No organs or tissues were obtained from prisoners and all kidney transplant recipients received organs in accordance with the United Network of Organ Sharing, the Organ Procurement and Transplantation Network and a local organ procurement organization, the Donor Alliance, as directed by the United States National Organ Transplant Act.
Five hundred and thirty eight patients met the final inclusion criteria for this analysis. Baseline characteristics of interest are summarized in Table 1. It is hypothesized that age, race, and the degree of Human Leucocyte Antigen (HLA) mismatch between the donor and recipient are associated with development of dnDSA [7]. Posttransplant, each patient was closely monitored and data on TAC trough levels and formation of dnDSA were collected for up to 7 years. Each individual had up to 90 measures of TAC that ranged in value from 0 to 30. No obvious violation of normality assumption in the distribution of TAC was observed (histogram in Additional file 1: Figure S1). Due to the high variability and unreliability of TAC levels within the first week, only TAC levels after day 7 were included in the analysis. TAC levels of zero indicated that a patient was noncompliant and did not take their prescribed TAC dose or a patient was purposefully taken off the drug due to some other medical complication, such as an infection. The number and frequency of TAC measures varied by patient, with an overall median of 22 measurements per person (IQR: 1532) (Table 1). dnDSA screening was performed at 1, 6, 12 months, annually, and when clinically indicated. Of the 538 patients, 181 developed dnDSA during the study period. For the purpose of this project, dnDSA was treated as a time to event variable, with the event being the first time an individual tested positive for dnDSA. Figure 1 shows the TAC trajectories and time to dnDSA or censoring of 4 random individuals who developed dnDSA and 4 random individuals who did not develop dnDSA.
Methods
To analyze the association between TAC and dnDSA, we propose a shared random effects model and compare it with a traditional interval censored survival model treating TAC as a timevarying covariate.
M1: Shared Random Intercept and Slope Model
Suppose there are N individuals each measured at n_{i} time points, i=1,...,n_{i}. Let y_{ij} be the measurement for individual i at time t_{ij}, j=1,...,n_{i}. We assume that given the vector of individualspecific random intercept and slope (a_{0i},a_{1i})^{′}, individual outcome measures y_{ij} are independent and normally distributed with mean μ_{ij} and residual variance \(\sigma ^{2}_{e}\), where
The time component t_{ij} can be modeled with any function f. Some examples include a linear trajectory, i.e. f(t_{ij})=b_{0}+b_{1}t_{ij}; or bspline basis functions, i.e. \(f(t_{ij})=\sum _{k=1}^{K} \beta ^{\prime }_{0k} \psi _{k}(t_{ij})\), where ψ_{k}(t_{ij}) is the k^{th} of K basis functions with coefficient β_{0k} and the number of basis functions K depends on the degree of the splines (e.g. g=3) and number of inner knots (h=3; K=g+h+1). The vector x_{1i} represents baseline covariates with regression coefficients β_{1}. The parameters a_{0i}, a_{1i} are the random intercepts and slopes that allow individuals to vary in their baseline TAC level and time trend; they are normally distributed with mean zero and a correlated variance/covariance matrix
Suppose the same N individuals are measured at discrete and varying timepoints for a time to event outcome. The time to event outcome is interval censored and only known to happen between two discrete time points. Let t_{Ri} denote the time at which the outcome was detected and t_{Li} indicate the visit time immediately preceding t_{Ri}. The true time to event, t_{i}, lies somewhere in between t_{Li} and t_{Ri}. The hazard for individual i at any time t is given by
where h_{0}(t) is the baseline hazard, β_{0} is the fixed intercept and scale parameter, x_{2i} is the vector of fixed covariates and β_{2} is its parameter, (λ_{0},λ_{1})^{′} are the association parameters, and (a_{0i},a_{1i})^{′} are the same random terms as above in the longitudinal submodel (1). The baseline hazard can a number of flexible distributions, including parametric (typically exponential, Weibull, or gamma), piecewise constant or spline forms. The two submodels (1) and (3) are connected by the shared random intercepts from the survival portion with both the random intercepts and slopes from the longitudinal portion. The first association parameter, λ_{0}, corresponds to the risk of event for a difference in average longitudinal measure of one unit. Parameter λ_{1} corresponds to the risk of event for a unit increase in the slope of the longitudinal model.
Conditional on random effects, a_{0i},a_{1i}, the log likelihood can be written for each individual:
where I_{Ri} is an indicator variable that is 1 when the individual is right censored and 0 when interval censored, t_{Ri} is the right censoring time, and t_{Li} is the left censoring time. F_{i}(t) corresponds to the cumulative density function at time t. Details on how this likelihood is constructed for the Weibull hazard that was used in the results can be found in the Additional file 1.
M2 and M3: Special Cases of M1
We also consider special cases of M1 where, instead of sharing both the random intercepts and slopes from the longitudinal model (1) with the random intercepts in the survival model (3), only the random intercepts from the longitudinal model are shared with the random intercepts from the survival model. This model is called M2, and the hazard function of the survival model is the special case when λ_{1}=0 in Equation (3). Model M3 is the special case when λ_{0}=0 in Equation (3). This model shares the random slope from the longitudinal model with the random intercept from the survival model.
M4: Time Varying Covariate Survival Model
We also consider an interval censored survival model with a timevarying covariate for this data. Assume that each individual, i=1,...,N, has longitudinal measurements at update times t_{ij}, where j=1,...,n_{i}. Let z_{i} denote a vector of baseline covariates for individual i and w_{ij} denote the value of the timevarying covariate for individual i at update time j. The hazard of event for individual i at time j can be written as:
where h_{0tvc} is the baseline hazard function (which can be specified as any flexible distribution, as in M1), γ_{0} is the fixed intercept, γ^{′} is the coefficient vector for baseline covariates, and η is the coefficient for the timevarying covariate, w_{ij}. In this case, w_{ij} is the same covariate as the longitudinal outcome in M1 (y_{ij}), since we are comparing the performance of M1 and M4 in modeling the same outcome, TAC from our motivating kidney dataset.
The likelihood for this model can be written as in Sparling et al. [23]:
where \(\phantom {\dot {i}\!}F_{i}(t_{i}\boldsymbol {\gamma }_{i},\boldsymbol {w}_{i[(t_{i})]})\) is the cumulative density, conditional on the fixed covariate values γ_{i} and the relevant timedependent covariate values w_{ij} up to the right or interval censoring times. The cumulative density can be iteratively calculated based based on equation (5) for each subject, by using the cumulative hazard and survival functions.
Bayesian Model Fit
Model estimation and inference is based on the Bayesian framework where the hyperparameters are assigned weakly informative priors. Priors for the parameters are as follows:
where u∼Normal(μ,σ^{2}) with probability density \(1/(\sigma \sqrt {2\pi })\exp {\{(u\mu)^{2}/2\sigma ^{2}\}}\); u∼Uniform(a,b) with probability density 1/(b−a) for a<u<b; u∼Gamma(r,λ) with probability density λ^{r}u^{r−1}exp(−λu)/ Γ(r); and Ω∼Wishart(R,k) with probability density (∣Ω∣^{(k−p−1)/2}∣R∣^{k/2} exp{−Tr(RΩ/2})/(2^{pk/2}Γ_{p}(k/2)) for k≥p, where p=2, Tr is the trace function and Γ_{p} is the multivariate gamma function.
Markov Chain Monte Carlo (MCMC) simulations were performed to estimate the posterior distribution. A Gibbs sampler was used to construct two Markov chains using JAGS software [24] and the runjags package in R [25]. Convergence of the samples was assessed by trace plot inspection, and Gelman and Geweke tests which test for equality of the means of the first and last part of a Markov chain [26]. After a burnin period of 20,000 and thinning of 40 of 80,000 sampling iterations, 2000 samples per chain were used for inference.
Model Selection and Comparison
Model selection was carried out using (a) the Deviance Information Criterion (DIC) [27] and (b) the WatanabeAkaike information criterion (WAIC) introduced by Watanabe [28]. The DIC is a measure calculated based on the deviance, D(θ)=−2 logL(yθ), that penalizes for the number of effective degrees of freedom in the model and is defined as \( DIC=\overline {D(\boldsymbol {\theta })}+\rho _{D}=2\overline {D(\boldsymbol {\theta })}D(\hat {\boldsymbol {\theta }}) \), where \(\bar {D}\) and \(\hat {\boldsymbol {\theta }}\) are obtained from MCMC output as the mean of the deviance and the posterior mean, respectively. WAIC approximates leaveoneout crossvalidation [28] and works well for hierarchical models where the number of degrees of freedom is not obvious [29]. The best model should have the lowest values of DIC and WAIC; differences of at least 5 points will be considered important. We followed the traditional recommendations by Spiegelhalter and Ntzoufras [26, 27] to use the conditional likelihood rather than the marginal to compare model fits. More discussion on this topic can be found in the work by Kapur [30] and in the supplement (Part G) of JuarezColunga [31].
Results
Models and Model Selection
Models M1M4 were fitted to the kidney transplant data. The three joint models (M1, M2, M3) had a longitudinal submodel with TAC as the outcome, a fixed intercept and slope for time (f(t_{ij})=b_{0}+b_{1}t_{ij})) and a random intercept and/or slope for time (depending on the model) for variations across individuals. Spline terms for time were tested, but DIC and WAIC indicated that a linear trend was sufficient. For the survival submodels, various baseline hazard functions were tested (exponential, Weibull, gamma, piecewise constant) and the Weibull yielded the best model fit (h_{0}(t)=αt^{α−1}, where α is the Weibull shape parameter). The Weibull distribution makes intuitive sense for this data, as the hazard can flexibly be increasing or decreasing over time and the risk of dnDSA is thought to decrease as time from transplant progresses. The parameters from the Weibull model can easily be interpreted as the change in logrelative risk. All survival submodels had time to dnDSA as the interval censored outcome, random intercept(s) associated with the longitudinal submodel, and a vector of baseline covariates: age, ethnicity, and number of HLA mismatches. See Table 1 for the description of these covariates. Baseline covariates were tested in both the longitudinal and survival submodels, but none were significant predictors in the longitudinal submodel, so they were removed.
The interval censored survival model with a timevarying covariate (M4) had dnDSA as the outcome, TAC as the timevarying covariate, and the same baseline covariates as listed above. After considering other baseline hazard functions for M4, we chose the Weibull due to model fit and to compare to M1, so \(\phantom {\dot {i}\!}h_{0tvc}(t)=\alpha _{tvc}t^{\alpha _{tvc}1}\) where α_{tvc} is the Weibull shape parameter.
Table 2 displays the results of model comparisons based on the DIC and the WAIC. The survival submodel fits in M1M3 can be directly compared with the model fit for M4. The best fitting survival model according to both DIC and WAIC is Model 1, which shares both the random intercept and slope from the longitudinal model with the random intercept from the survival model.
Estimation of Associations
Table 3 presents the results from the final model, M1, and the timevarying covariate model, M4. The longitudinal portion of M1 shows that on average, individuals start at a TAC level of about 7 ng/ml. For every one month posttransplant, the average individual’s TAC levels decrease by 0.05 ng/ml. There is a significant negative correlation between the random intercept and slope, ρ=−0.38, indicating that individuals with higher starting TAC have a lower slope. The survival portion of the model has a decreasing hazard over time, parameterized by α=0.57. The degree of HLA mismatching is significantly associated with dnDSA; for every one unit increase in mismatching, the risk of dnDSA increases 1.26fold (95% CrI: 1.13, 1.40). Race is also associated with risk of dnDSA; compared to Caucasian individuals, African Americans and Hispanics have a higher risk of dnDSA (2.09 and 1.60, respectively). Age is also a factor, as middleaged (3049 years) and olderaged (50+ years) individuals both have a reduced risk of dnDSA compared to individuals under the age of 30 (Hazard Ratios: 0.57 and 0.30, respectively). These covariate effects are all conditional on the same initial level and slope of TAC, so these results are not explained by different TAC levels across HLA mismatches, race or age groups. The first association parameter, λ_{0}, explains the risk of dnDSA associated with a difference in overall average level of TAC. For every one ng/ml higher average TAC level, the risk of dnDSA lowers 0.64fold (95% CrI: 0.52, 0.75). The second association parameter, λ_{1}, explains the risk of dnDSA associated with a difference in the slope of TAC levels. For every 0.05 ng/ml/month higher TAC slope, the risk of dnDSA lowers 0.43fold (95% CrI: 0.32, 0.58).
The interval censored survival model M4 does not have a linear submodel, but instead uses a timevarying covariate for TAC. The scale parameter of the survival model is α=0.46, which also results in a decreasing hazard of dnDSA over time. The effects of baseline covariates, γ^{′}, are similar in magnitude to β^{′} in M1. The hazard ratio for the timevarying TAC is η=0.80. Within an individual, following a one unit increase in TAC level, the risk of dnDSA decreases 0.80fold (95% CrI: 0.75, 0.85). Though this is still a significant finding, the effect size of η in M4 is smaller than λ_{0} in M1. We explore this further in the “Simulation Study” section.
Goodness of Fit of Kidney Transplant Study
To assess the adequacy of model fit for M1M3, we simulated 1000 datasets from the posterior predictive distribution and compared them to the observed data. We performed these posterior predictive assessments separately for the longitudinal and survival submodels. This is usually done through a discrepancy function represented by a scalar or a vector, often represented graphically [32, 33]. For comparing the observed longitudinal TAC data with the posterior distribution of longitudinal measures on given individuals, we have conditioned on the posterior mean of the random effects at the individual level; e.g. for M1: \(\hat {a}_{0i}, \hat {a}_{1i}\) in (1) are considered fixed [34]. We compared 1000 posterior predictive TAC trajectories against the observed trajectory for 3 random individuals using M1, M2 and M3. Due to the large variability in TAC levels within and between individuals, the plots are all nearly identical, which is indicative that no one model fits the longitudinal trajectory better than another model. The plots are shown in Additional file 1: Figure S2.
For the survival submodel predictive check, we created artificial intervals (that resembled the kidney study data intervals) around the survival times taken from the 1000 posterior predictive datasets. Next, using these intervals, we plotted the predicted Weibull cumulative density functions from the 1000 posterior datasets and then overlaid the estimated cumulative density curve from the raw data. We chose to plot the cumulative density function instead of the survival function because of the low event rate in these subgroups [35]. We had to choose certain values of baseline covariates for these checks, so we presented three different types of individuals that represent the most prevalent combinations of covariates. The results of the survival predictive checks for three particular combinations of baseline covariates, for all three models M1M3, are found in the Additional file 1: Figure S3. The coverage of the observed survival curve by the posterior predictive survival curves is best for M1. This is indicative that M1 fits the survival data better than the other two joint models.
Sensitivity Analysis
To determine if the choice of priors affected the estimates of M1 parameters, an alternative prior distribution of Uniform (U(−100,100)) was compared against a Normal (N(0,1000)) prior for b_{0}, b_{1}, β, λ_{0} and λ_{1}, and the halfCauchy (HC(0,25)) distribution for σ_{e} was compared against the uniform prior (U(0,100)) for this variance component. The results, tabulated in Additional file 1: Table S1, indicate the choice of priors has very little effect on the parameter estimates.
Simulation Study
We conducted a simulation study to compare how the joint M1 and the time varying covariate model perform under (1) varying degrees of association, (2) varying amounts of interval censoring (IC), and (3) varying amounts of measurement error (ME). We simulated data using the joint model with shared random intercepts (M2) under nine scenarios, each with a different combination of measurement error (none \(\left [\sigma _{e}^{2}=0\right ]\), low \(\left [\sigma _{e}^{2}=1\right ]\), and high \(\left [\sigma _{e}^{2}=8\right ]\)) and degree of interval censoring (lighter [01 month, 1–6 month, 6–12 month, 12–24 month, yearly after], visits missed at random, and heavier [3 year intervals]). For the random interval censoring, we assumed patients had a fixed followup schedule matching the lighter IC scenario, but missed 50% of visits at random (we also set the missingness to 25% and the results were almost identical). All other variables were set to be similar to the kidney transplant data, with Weibull parameters α=0.50 and β_{0}=−2. Two hundred datasets were simulated under each of these six conditions, and the JM and TVC models were fitted to each dataset using MCMC. Each dataset contained 300 individuals, and approximately half developed dnDSA. The results from all simulations were obtained using two parallel chains; the TVC model had a burnin of 1000 samples and 1000 samples for inference, while the JM had a burnin of 2000 samples and 2000 samples for inference. The average model estimates from the light and heavy IC scenarios are reported in Table 4. The light and random IC scenarios did not differ significantly, so the results from the random IC scenarios are in the Additional file 1: Table S2.
To compare power for detecting an association between the longitudinal and the time to event outcomes, we varied the association between TAC and dnDSA (λ_{0} in the JM and η in the TVC) between 0.50 and 0 within each scenario. We calculated power based on each model at each true association value using the credible intervals of λ_{0} or η. For example, for simulation run m, m=1,...,M (M=200 for our simulations), let \(\hat {\lambda }_{0L}\) and \(\hat {\lambda }_{0U}\) denote the 2.5% and 97.5% percentiles of the distribution of the MCMC samples of \(\hat {\lambda }_{0}\). The power is calculated as: Power \(= 11/M\sum \limits _{m=1}^{M} I(\hat {\lambda }_{0L}<0<\hat {\lambda }_{0U})\), where I(E) is the indicator function of event E. The power curves for all nine scenarios are presented in Fig. 2.
As expected, the association parameter was consistently attenuated toward zero in the TVC compared to the JM. As more measurement error was introduced, the TVC underestimated the association even more (true value: λ_{0}=−0.50, low ME: \(\hat {\eta }\)= 0.32 and high ME: \(\hat {\eta }\)= 0.18), while there was no evidence of substantial bias for the JM (low ME: \(\hat {\lambda }_{0}= 0.53\) and high ME: \(\hat {\lambda }_{0}= 0.54\)). As heavier interval censoring was introduced, the association estimates were further from the true value in both modeling approaches, but the standard deviations also increased. The parameter that was most affected by heavier interval censoring was the Weibull shape parameter, α. Although the bias of α was lower in the TVC context, this model should not be preferred when there is high measurement error and heavy censoring, due to the underestimation of the association between the two outcomes, which is typically of more interest.
The power of the TVC model was consistently lower than the power of the JM (Fig. 2). When more measurement error was introduced, the power of the TVC model significantly decreased while the power of the JM stayed approximately the same. When visits were missed at random, the decrease in power on both approaches was minimal. The wider interval censoring around dnDSA resulted in a high type I error in the case of the TVC model (type I error=0.23 in the high ME and heavy IC simulation from Fig. 2). The average credible intervals and coverage probabilities for all simulations are reported in the Additional file 1: Figure S3.
Discussion
In this paper, we compare two modeling techniques for two outcomes: a longitudinal outcome with high measurement error and a survival outcome with heavy interval censoring. For our motivating dataset, both DIC and WAIC suggested that a shared random effects model with random effects for both the intercept and slope of the longitudinal process (TAC) was the preferred model. Results from this model and from the TVC model (M4) support the hypothesis that the association between TAC and dnDSA is underestimated when a TVC model is fit. The simulation study further confirmed this hypothesis and showed that the TVC has lower power and higher type I error compared to the JM when the outcomes have high measurement error and heavy interval censoring.
TAC is the most commonly used immunosuppressant drug in kidney transplantation and yet optimal therapeutic dose to prevent dnDSA, a common reason for graft loss, is unknown. Most published studies on the influence of immunosuppressant drugs on development of dnDSA compare drugs rather than evaluate dosing of a specific drug [36, 37]. A study in the liver transplant setting by Kaneku et al. analyzed risk factors for dnDSA in 749 adult liver transplant recipients [38]. They found 8.1% of individuals developed dnDSA at one year and at least one low TAC (<3 ng/ml) or cyclosporine (<75 ng/ml) level were associated with dnDSA (OR 2.66, p = 0.015). In our dataset, 21.7% of individuals had dnDSA by 1 year and we also found that for every one ng/ml lower average TAC the risk of dnDSA was higher (HR [95% CrI]: 1.56 [1.33, 1.92]) and the hazard of dnDSA also increased 2.33fold (95% CrI: 1.72, 3.13) for every 0.05 ng/ml/month decrease in the slope of TAC. The higher levels of dnDSA detected in our study may be due to several reasons, including center differences in detection levels of antibody counted as positive and inclusion criteria. Advancing previous models, we modeled the time to dnDSA while accounting for changes in TAC, where TAC is modeled as a variable with measurement error. Although minimization of TAC exposure with lower drug trough levels may decrease its associated toxicities, our results suggest that this strategy may place individuals at increased risk for dnDSA, which may have important implications for intermediate and longterm graft survival.
Similar to the findings presented by Kolagmunnage and Prentice [8, 9], our simulation found severe underestimation of the association between the longitudinal and survival outcomes when using the timevarying covariate model for longitudinal data with a considerable amount of measurement error. We also found a similarly severe underestimation of the association using the TVC model when the survival outcome was interval censored, even in the case of no measurement error. Importantly, the TVC model had a lower power to detect an association compared to the JM, and it had a higher type I error rate when heavy interval censoring was introduced. Interestingly, when half of the followup visits were missed at random for the scenarios studied, there was a negligible overall effect on power for both the TVC and JM. These results are important in understanding the behavior of the JM and TVC since in many clinical settings time to event outcomes can often be observed only within intervals. In future work, we would like to study the effect of informative missingness (e.g. probability of missing depends on a covariate, such as age, or the probability of missing increases as time from transplant progresses) on the power of detecting associations within these models.
We recognize the inherent differences in the two modeling approaches discussed here. The joint models (M1M3) all result in betweensubject association interpretations. For example, λ_{0} is interpreted as the increased hazard of dnDSA for an individual with an average TAC level of one ng/ml lower than another individual. In contrast, the timevarying covariate model (M4) yields a withinsubject interpretation. In this model, η relates to the increased hazard of dnDSA for a given individual, immediately following a one ng/ml decrease in TAC.
By employing MCMC to fit all models, we were able to easily obtain the posterior distribution of all estimated parameters. Since the main focus of this analysis was to assess the association between TAC and dnDSA, we were particularly interested in the association parameters: λ_{0} and λ_{1} in M1 and η in M4. We calculated credible intervals (CrI) for these parameters, which give the straightforward interpretation as an interval that contains the association parameter with 95% certainty. The MCMC framework also allows for easy calculations of dynamic predictions and this will be explored in future work. The code to fit the final joint model (M1) and timevarying covariate model (M4) on simulated data in JAGS is located in the Additional file 2. The code for fitting M1 in SAS using PROC NLMIXED is also provided.
Conclusion
The joint model examined in this paper allows for flexible modeling of dnDSA development as it relates to TAC levels over time. As our simulation study showed, this approach accommodates modeling heavily interval censored survival data jointly with highly variable longitudinal data, and if a time varying covariate (TVC) approach is used with heavy interval censoring and high measurement error, the associations may be severely underestimated, missed completely, or detected spuriously.
Availability of data and materials
Simulated datasets and the code used to fit the joint model and time varying covariate model are publicly available in the Additional file 2.
Abbreviations
 CrI:

Credible interval
 DIC:

Deviance information criterion
 dnDSA:

de novo Donor specific antibody
 HLA:

Human leucocyte antigen
 IC:

Interval censoring
 JAGS:

Just another Gibbs sampler
 JM:

Joint model
 MCMC:

Markov chain Monte Carlo
 ME:

Measurement error
 TAC:

Tacrolimus
 TVC:

Timevarying covariate
 WAIC:

WatanabeAkaike information criterion
References
 1
Hart A, Smith J, Skeans M, Gustafson S, Stewart D, Cherikh W, Wainright J, Kucheryavaya A, Woodbury M, Snyder J, et al.Optn/srtr 2015 annual data report: kidney. Am J Transplant. 2017; 17(S1):21–116.
 2
Sellares J, de Freitas D, Mengel M, Reeve J, Einecke G, Sis B. Understanding the causes of kidney transplant failure: The dominant role of antibodymediated rejection and nonadherence. Am J Transplant. 2011; 2(12):388–99.
 3
Parajuli S, Reville PK, Ellis TM, Djamali A, Mandelbrot DA. Utility of protocol kidney biopsies for de novo donorspecific antibodies. Am J Transplant. 2017; 17(12):3210–8.
 4
Schinstock C, Cosio F, Cheungpasitporn W, Dadhania D, Everly M, SamaniegoPicota M, Cornell L, Stegall M. The value of protocol biopsies to identify patients with de novo donorspecific antibody at high risk for allograft loss. Am J Transplant. 2017; 17(6):1574–84.
 5
Matas A, Smith J, Skeans M, Thompson B, Gustafson S, Schnitzler M, Stewart D, Cherikh W, Wainright J, Snyder J, et al.Optn/srtr 2012 annual data report: kidney. Am J Transplant. 2014; 14(S1):11–44.
 6
Wiebe C, Rush DN, Nevins TE, Birk PE, BlydtHansen T, Gibson IW, Goldberg A, Ho J, Karpinski M, Pochinco D, et al.Class ii eplet mismatch modulates tacrolimus trough levels required to prevent donorspecific antibody development. J Am Soc Nephrol: JASN. 2017; 28(11):3353–62.
 7
Davis S, Gralla J, Klem P, Tong S, Wedermyer G, Freed B, Wiseman A, Cooper JE. Lower tacrolimus exposure and time in therapeutic range increase the risk of de novo donorspecific antibodies in the first year of kidney transplantation. Am J Transplant. 2017; 18(4):907–15.
 8
KolamunnageDona R, Williamson PR. Timedependent efficacy of longitudinal biomarker for clinical endpoint. Stat Methods Med Res. 2018; 27(6):1909–24.
 9
Prentice R. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982; 69(2):331–42.
 10
Elashoff R, Li N, et al.Joint Modeling of Longitudinal and Timetoevent Data. Boca Raton, FL: CRC Press; 2016.
 11
Rizopoulos D. Joint Models for Longitudinal and Timetoevent Data: With Applications in R. Boca Raton, FL: CRC Press; 2012.
 12
Hogan JW, Laird NM. Modelbased approaches to analysing incomplete longitudinal and failure time data. Stat Med. 1997; 16(3):259–72.
 13
Tsiatis AA, Davidian M. Joint modeling of longitudinal and timetoevent data: an overview. Stat Sin. 2004; 14:809–34.
 14
Lawrence Gould A, Boye ME, Crowther MJ, Ibrahim JG, Quartey G, Micallef S, Bois FY. Joint modeling of survival and longitudinal nonsurvival data: current methods and issues. report of the dia bayesian joint modeling working group. Stat Med. 2015; 34(14):2181–95.
 15
Papageorgiou G, Mauff K, Tomer A, Rizopoulos D. An overview of joint modeling of timetoevent and longitudinal outcomes. Ann Rev Stat Appl. 2019; 6:223–40.
 16
Li J, Ma S. Survival Analysis in Medicine and Genetics. Boca Raton, FL: Chapman and Hall/CRC; 2013.
 17
Finkelstein DM, Wolfe RA. A semiparametric model for regression analysis of intervalcensored failure time data. Biometrics. 1985; 4:933–45.
 18
Goetghebeur E, Ryan L. Semiparametric regression analysis of intervalcensored data. Biometrics. 2000; 56(4):1139–44.
 19
Odell PM, Anderson KM, D’Agostino RB. Maximum likelihood estimation for intervalcensored data using a weibullbased accelerated failure time model. Biometrics. 1992; 48:951–9.
 20
Rouanet A, Joly P, Dartigues JF, ProustLima C, JacqminGadda H. Joint latent class model for longitudinal data and intervalcensored semicompeting events: Application to dementia. Biometrics. 2016; 72(4):1123–35.
 21
Chen CM, Shen Ps, Tseng YK. Semiparametric transformation joint models for longitudinal covariates and intervalcensored failure time. Comput Stat Data Anal. 2018; 128:116–27.
 22
Gueorguieva R, Rosenheck R, Lin H. Joint modelling of longitudinal outcome and intervalcensored competing risk dropout in a schizophrenia clinical trial. J R Stat Soc: Ser A (Stat Soc). 2012; 175(2):417–33.
 23
Sparling YvonneH, Younes Naji, Lachin John. Parametric survival models for intervalcensored data with timedependent covariates. Biostatistics. 2006; 7(4):599–614. https://doi.org/10.1093/biostatistics/kxj028.
 24
Plummer M. JAGS Version 3.3.0 User Manual. 2012. http://mcmcjags.sourceforge.net. Accessed 1 Aug 2018.
 25
Denwood MJ. runjags: An r package providing interface utilities, model templates, parallel computing methods and additional distributions for mcmc models in jags. J Stat Softw. 2016; 71(9):1–25.
 26
Ntzoufras I. Bayesian Modeling Using WinBUGS vol. 698.Hoboken, NJ: Wiley; 2011.
 27
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc: Ser B (Stat Methodol). 2002; 64(4):583–639.
 28
Watanabe S. Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. J Mach Learn Res. 2010; 11:3571–94.
 29
Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for Bayesian models. Stat Comput. 2014; 24:997–1016.
 30
Kapur K, Li X, Blood EA, Hedeker D. Bayesian mixedeffects location and scale models for multivariate longitudinal outcomes: an application to ecological momentary assessment data. Stat Med. 2015; 34(4):630–51.
 31
JuarezColunga E, Silva G, Dean C. Joint modeling of zeroinflated panel count and severity outcomes. Biometrics. 2017; 73(4):1413–23.
 32
Gelman A, Meng XL, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Stat Sin. 1996; 6(4):733–60.
 33
Lynch SM, Western B. Bayesian posterior predictive checks for complex models. Sociol Methods Res. 2004; 32(3):301–35. http://smr.sagepub.com/content/32/3/301.full.pdf+html.
 34
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis, 3rd edn.Boca Raton, FL: CRC press; 2013.
 35
Pocock SJ, Clayton TC, Altman DG. Survival plots of timetoevent outcomes in clinical trials: good practice and pitfalls. Lancet. 2002; 359(9318):1686–9.
 36
O’Leary JG SM. The influence of immunosuppressive agents on the risk of de novo donorspecific hla antibody production in solid organ transplant recipients. Transplantation. 2016; 1(100):39–53.
 37
Thaunat O, Koenig A, Leibler C, Grimbert P. Effect of immunosuppressive drugs on humoral allosensitization after kidney transplant. J Am Soc Nephrol. 2016; 27(7):1890–1990.
 38
Kaneku H, O’leary J, Banuelos N, Jennings L, Susskind B, Klintmalm G, Terasaki P. De novo donorspecific hla antibodies decrease patient and graft survival in liver transplant recipients. Am J Transplant. 2013; 13(6):1541–8.
Acknowledgments
This paper is based on Kristen Campbell’s master’s thesis supervised by Dr. Elizabeth JuarezColunga, Dr. Jane Gralla and Dr. Gary Grunwald.
Funding
Supported by NIH/NCATS Colorado CTSA Grant Number UL1 TR002535. The design of the study and the collection, analysis, and interpretation of data are the authors’ sole responsibility and do not necessarily represent official NIH views.
Author information
Affiliations
Contributions
KC, EJC, and GKG performed all statistical analysis mentioned in the paper. SD and JC provided clinical context and guidance. JG served as the liaison between the statistical and clinical coauthors. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Study data collection was conducted in accordance with the Declaration of Helsinki and was approved by the ethics committee at the University of Colorado (COMIRB 133137). All data collected was retrospective in nature and deidentified, so individual consent forms were not needed and administrative data privileges were not needed.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1
Supplemental materials. Contains details on likelihood construction, additional figures and tables, and selected output from the reproducible examples. (PDF 393 kb)
Additional file 2
∙ example1JM.R: Code to fit M1
∙ longitudinaldata.csv: simulated TAC data for M1
∙ survivaldata.csv: simulated dnDSA data for M1
∙ model1JM.txt: JAGS model, called by example1JM.R
∙ example1NLMIXED.SAS: Code to fit M1 with PROC NLMIXED, uses same simulated data
∙ example4TVC.R: Code to fit M4
∙ longitudinal data tvc.csv: simulated TAC data for M4 (carried forward values of TAC)
∙ model4TVC.txt: JAGS model, called by example4TVC.R (ZIP 199 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Campbell, K.R., JuarezColunga, E., Grunwald, G.K. et al. Comparison of a timevarying covariate model and a joint model of timetoevent outcomes in the presence of measurement error and interval censoring: application to kidney transplantation. BMC Med Res Methodol 19, 130 (2019). https://doi.org/10.1186/s1287401907731
Received:
Accepted:
Published:
Keywords
 Bayesian analysis
 Interval censoring
 Measurement error
 Shared random effects