 Software
 Open Access
 Open Peer Review
 Published:
joineRML: a joint model and software package for timetoevent and multivariate longitudinal outcomes
BMC Medical Research Methodology volume 18, Article number: 50 (2018)
Abstract
Background
Joint modelling of longitudinal and timetoevent outcomes has received considerable attention over recent years. Commensurate with this has been a rise in statistical software options for fitting these models. However, these tools have generally been limited to a single longitudinal outcome. Here, we describe the classical joint model to the case of multiple longitudinal outcomes, propose a practical algorithm for fitting the models, and demonstrate how to fit the models using a new package for the statistical software platform R, joineRML.
Results
A multivariate linear mixed submodel is specified for the longitudinal outcomes, and a Cox proportional hazards regression model with timevarying covariates is specified for the event time submodel. The association between models is captured through a zeromean multivariate latent Gaussian process. The models are fitted using a Monte Carlo ExpectationMaximisation algorithm, and inferences are based on approximate standard errors from the empirical profile information matrix, which are contrasted to an alternative bootstrap estimation approach. We illustrate the model and software on a real data example for patients with primary biliary cirrhosis with three repeatedly measured biomarkers.
Conclusions
An opensource software package capable of fitting multivariate joint models is available. The underlying algorithm and source code makes use of several methods to increase computational speed.
Background
In many clinical studies, subjects are followedup repeatedly and response data collected. For example, routine blood tests might be performed at each followup clinic appointment for patients enrolled in a randomized drug trial, and biomarker measurements recorded. An event time is also usually of interest, for example time of death or study dropout. It has been repeatedly shown elsewhere that if the longitudinal and eventtime outcomes are correlated, then modelling the two outcome processes separately, for example using linear mixed models and Cox regression models, can lead to biased effect size estimates [1]. The same criticism has also been levelled at the application of socalled twostage models [2]. The motivation for using joint models can be broadly separated into interest in drawing inference about (1) the timetoevent process whilst adjusting for the intermittently measured (and potentially errorprone) longitudinal outcomes, and (2) the longitudinal data process whilst adjusting for a potentially informative dropout mechanism [3]. The literature on joint modelling is extensive, with excellent reviews given by Tsiatis and Davidian [4], Gould et al. [5], and the book by Rizopoulos [6].
Joint modelling has until recently been predominated by modelling a single longitudinal outcome together with a solitary event time outcome; herein referred to as univariate joint modelling. Commensurate with methodological research has been an increase in wideranging clinical applications (e.g. [7]). Recent innovations in the field of joint models have included the incorporation of multivariate longitudinal data [8], competing risks data [9, 10], recurrent events data [11], multivariate timetoevent data [12, 13], noncontinuous repeated measurements (e.g. count, binary, ordinal, and censored data) [14], nonnormally and nonparametrically distributed random effects [15], alternative estimation methodologies (e.g. Bayesian fitting and conditional estimating equations) [16, 17], and different association structures [18]. In this article, we specifically focus on the first innovation: multivariate longitudinal data. In this situation, we assume that multiple longitudinal outcomes are measured on each subject, which can be unbalanced and measured at different times for each subject.
Despite the inherently obvious benefits of harnessing all data in a single model or the published research on the topic of joint models for multivariate longitudinal data, a recent literature review by Hickey et al. [19] identified that publicly available software for fitting such models was lacking, which has translated into limited uptake by biomedical researchers. In this article we present the classical joint model described by Henderson et al. [3] extended to the case of multiple longitudinal outcomes. An algorithm proposed by Lin et al. [20] is used to fit the model, augmented by techniques to reduce the computational fitting time, including a quasiNewton update approach, variance reduction method, and dynamic Monte Carlo updates. This algorithm is encoded into a R sofware package–joineRML. A simulation analysis and realworld data example are used to demonstrate the accuracy of the algorithm and the software, respectively.
Implementation
As a prelude to the introduction and demonstration of the newly introduced software package, in the following section we describe the underlying model formulation and model fitting methodology.
Model
For each subject i=1,…,n, \(\boldsymbol {y}_{i} = \left (\boldsymbol {y}_{i1}^{\top }, \dots, \boldsymbol {y}_{iK}^{\top }\right)\) is the Kvariate continuous outcome vector, where each y_{ ik } denotes an (n_{ ik }×1)vector of observed longitudinal measurements for the kth outcome type: \(\boldsymbol {y}_{ik} = (y_{i1k}, \dots, y_{in_{ik}k})^{\top }\). Each outcome is measured at observed (possibly prespecified) times t_{ ijk } for j=1,…,n_{ ik }, which can differ between subjects and outcomes. Additionally, for each subject there is an event time \(T_{i}^{*}\), which is subject to right censoring. Therefore, we observe \(T_{i} = \min (T_{i}^{*}, C_{i})\), where C_{ i } corresponds to a potential censoring time, and the failure indicator δ_{ i }, which is equal to 1 if the failure is observed \((T_{i}^{*} \leq C_{i})\) and 0 otherwise. We assume that both censoring and measurement times are noninformative.
The model we describe is the natural extension of the model proposed by Henderson et al. [3] to the case of multivariate longitudinal data. The model posits an unobserved or latent zeromean (K+1)variate Gaussian process that is realised independently for each subject, \(W_{i}(t) = \left \{W_{1i}^{(1)}(t), \dots, W_{1i}^{(K)}(t), W_{2i}(t)\right \}\). This latent process subsequently links the separate submodels via association parameters.
The kth longitudinal data submodel is given by
where μ_{ ik }(t) is the mean response, and ε_{ ik }(t) is the model error term, which we assume to be independent and identically distributed normal with mean 0 and variance \(\sigma _{k}^{2}\). The mean response is specified as a linear model
where x_{ ik }(t) is a p_{ k }vector of (possibly) timevarying covariates with corresponding fixed effect terms β_{ k }. \(W_{1i}^{(k)}(t)\) is specified as
where z_{ ik }(t) is an r_{ k }vector of (possibly) timevarying covariates with corresponding subjectandoutcome random effect terms b_{ ik }, which follow a zeromean multivariate normal distribution with (r_{ k }×r_{ k })variancecovariance matrix D_{ kk }. To account for dependence between the different longitudinal outcome outcomes, we let cov(b_{ ik },b_{ il })=D_{ kl } for k≠l. Furthermore, we assume ε_{ ik }(t) and b_{ ik } are uncorrelated, and that the censoring times are independent of the random effects. These distributional assumptions together with the model given by (1)–(3) are equivalent to the multivariate extension of the Laird and Ware [21] linear mixed effects model. More flexible specifications of \(W_{1i}^{(k)}(t)\) can be used [3], including for example, stationary Gaussian processes. However, we do not consider these cases here owing to the increased computational burden it carries, even for the univariate case.
The submodel for the timetoevent outcome is given by the hazard model
where λ_{0}(·) is an unspecified baseline hazard, and v_{ i }(t) is a qvector of (possibly) timevarying covariates with corresponding fixed effect terms γ_{ v }. Conditional on W_{ i }(t) and the observed covariate data, the longitudinal and timetoevent data generating processes are conditionally independent. To establish a latent association, we specify W_{2i}(t) as a linear combination of \(\left \{W_{1i}^{(1)}(t), \dots, W_{1i}^{(K)}(t)\right \}\):
where γ_{ y }=(γ_{y1},…,γ_{ yK }) are the corresponding association parameters. To emphasise the dependence of W_{2i}(t) on the random effects, we explicitly write it as W_{2i}(t,b_{ i }) from here onwards. As per \(W_{1i}^{(k)}(t)\), W_{2i}(t,b_{ i }) can also be flexibly extended, for example to include subjectspecific frailty effects [3].
Estimation
Likelihood
For each subject i, let \(\boldsymbol {X}_{i} = \bigoplus _{k = 1}^{K} \boldsymbol {X}_{ik}\) and \(\boldsymbol {Z}_{i} = \bigoplus _{k = 1}^{K} \boldsymbol {Z}_{ik}\) be blockdiagonal matrices, where \(\boldsymbol {X}_{ik} = \left (\boldsymbol {x}_{i1k}^{\top }, \dots, \boldsymbol {x}_{in_{ik}k}^{\top }\right)\) is an (n_{ ik }×p_{ k })design matrix, with the jth row corresponding to the p_{ k }vector of covariates measured at time t_{ ijk }, and \(\bigoplus \) denotes the direct matrix sum. The notation similarly follows for the random effects design matrices, Z_{ ik }. We denote the error terms by a diagonal matrix \(\boldsymbol {\Sigma }_{i} = \bigoplus _{k = 1}^{K} \sigma _{k}^{2} \boldsymbol {I}_{n_{ik}}\) and write the overall variancecovariance matrix for the random effects as
where I_{ n } denotes an n×n identity matrix. We further define \(\boldsymbol {\beta } = \left (\boldsymbol {\beta }_{1}^{\top }, \dots, \boldsymbol {\beta }_{K}^{\top }\right)^{\top }\) and \(\boldsymbol {b}_{i} = \left (\boldsymbol {b}_{i1}^{\top }, \dots, \boldsymbol {b}_{iK}^{\top }\right)^{\top }\). Hence, we can then rewrite the longitudinal outcome submodel as
For the estimation, we will assume that the covariates in the timetoevent submodel are timeindependent and known at baseline, i.e. v_{ i }≡v_{ i }(0). Extensions of the estimation procedure for timevarying covariates are outlined elsewhere [6, p. 115]. The observed data likelihood for the joint outcome is given by
where \(\boldsymbol {\theta } = \left (\boldsymbol {\beta }^{\top }, \text {vech}(\boldsymbol {D}), \sigma _{1}^{2}, \dots, \sigma _{K}^{2}, \lambda _{0}(t), \boldsymbol {\gamma }_{v}^{\top }, \boldsymbol {\gamma }_{y}^{\top }\right)\) is the collection of unknown parameters that we want to estimate, with vech(D) denoting the halfvectorisation operator that returns the vector of lowertriangular elements of matrix D.
As noted by Henderson et al. [3], the observed data likelihood can be calculated by rewriting it as
where the marginal distribution f(y_{ i }  θ) is a multivariate normal density with mean X_{ i }β and variancecovariance matrix \(\boldsymbol {\Sigma }_{i} + \boldsymbol {Z}_{i} \boldsymbol {D} \boldsymbol {Z}_{i}^{\top }\), and f(b_{ i }  y_{ i },θ) is given by (6).
MCEM algorithm
We determine maximum likelihood estimates of the parameters θ using the Monte Carlo Expectation Maximisation (MCEM) algorithm [22], by treating the random effects b_{ i } as missing data. This is effectively the same as the conventional ExpectationMaximisation (EM) algorithm, as used by Wulfsohn and Tsiatis [23] and Ratcliffe et al. [24] in the context of fitting univariate data joint models, except the Estep exploits a Monte Carlo (MC) integration routine as opposed to Gaussian quadrature methods, which we expect to be beneficial when the dimension of random effects becomes large.
Starting from an initial estimate of the parameters, \(\hat {\boldsymbol {\theta }}^{(0)}\), the procedure involves iterating between the following two steps until convergence is achieved.

1.
Estep. At the (m+1)th iteration, we compute the expected loglikelihood of the complete data conditional on the observed data and the current estimate of the parameters,
$$\begin{array}{@{}rcl@{}} {\selectfont{\begin{aligned} {} Q(\boldsymbol{\theta} \,\, \hat{\boldsymbol{\theta}}^{(m)}) &= \sum\limits_{i=1}^{n} \mathbb{E} \Big\{\log f(\boldsymbol{y}_{i}, T_{i}, \delta_{i}, \boldsymbol{b}_{i} \,\, \boldsymbol{\theta})\Big\} \\ &= \sum\limits _{i=1}^{n} \int_{\infty}^{\infty} \Big\{\log f(\boldsymbol{y}_{i}, T_{i}, \delta_{i}, \boldsymbol{b}_{i} \,\, \boldsymbol{\theta})\Big\} f(\boldsymbol{b}_{i} \,\!\, T_{i}, \delta_{i}, \boldsymbol{y}_{i};\hat{\boldsymbol{\theta}}^{(m)}) d\boldsymbol{b}_{i}. \end{aligned}}} \end{array} $$Here, the completedata likelihood contribution for subject i is given by the integrand of (4).

2.
Mstep. We maximise \(Q(\boldsymbol {\theta } \,\, \hat {\boldsymbol {\theta }}^{(m)})\) with respect to θ. Namely, we set
$$ \hat{\boldsymbol{\theta}}^{(m+1)} = \underset{\boldsymbol{\theta}}{\text{argmax}}\ Q\left(\boldsymbol{\theta} \,\, \hat{\boldsymbol{\theta}}^{(m)}\right). $$
The Mstep estimators naturally follow from Wulfsohn and Tsiatis [23] and Lin et al. [20]. Maximizers for all parameters except γ_{ v } and γ_{ y } are available in closedform; algebraic details are presented in Additional file 1. The parameters \(\boldsymbol {\gamma } = (\boldsymbol {\gamma _{v}}^{\top }, \boldsymbol {\gamma }_{y}^{\top })^{\top }\) are jointly updated using a onestep NewtonRaphson algorithm as
where \(\hat {\boldsymbol {\gamma }}^{(m)}\) denotes the value of γ at the current iteration, \(S\left (\hat {\boldsymbol {\gamma }}^{(m)}\right)\) is the corresponding score, and \(I\left (\hat {\boldsymbol {\gamma }}^{(m)}\right)\) is the observed information matrix, which is equal to the derivative of the negative score. Further details of this update are given in Additional file 1. The Mstep for γ is computationally expensive to evaluate. Therefore, we also propose a quasiNewton onestep update by approximating \(I\left (\hat {\boldsymbol {\gamma }}^{(m)}\right)\) by an empirical information matrix for γ, which can be considered an analogue of the GaussNewton method [25, p. 8]. To further compensate for this approximation, we also use a nominal stepsize of 0.5 rather than 1, which is used when using the NewtonRaphson update.
The Mstep involves terms of the form \(\mathbb {E}\left [h(\boldsymbol {b}_{i}) \,\, T_{i},{\vphantom {\hat {\boldsymbol {\theta }}}}\right. \left.\delta _{i}, \boldsymbol {y}_{i}; \hat {\boldsymbol {\theta }}\right ]\), for known functions h(·). The conditional expectation of a function of the random effects can be written as
where \(f(T_{i}, \delta _{i} \,\, \boldsymbol {b}_{i}; \hat {\boldsymbol {\theta }})\) is given by
and \(f(\boldsymbol {b}_{i} \,\, \boldsymbol {y}_{i}; \hat {\boldsymbol {\theta }})\) is calculated from multivariate normal distribution theory as
with \(\boldsymbol {A}_{i} = \left (\boldsymbol {Z}_{i}^{\top } \boldsymbol {\Sigma }_{i}^{1} \boldsymbol {Z}_{i} + \boldsymbol {D}^{1}\right)^{1}\). As this becomes computationally expensive using Gaussian quadrature commensurate with increasing dimension of b_{ i }, we estimate the integrals by MC sampling such that the expectation is approximated by the ratio of the sample means for \(h(\boldsymbol {b}_{i}) f(T_{i}, \delta _{i} \,\, \boldsymbol {b}_{i}; \hat {\boldsymbol {\theta }})\) and \(f(T_{i}, \delta _{i} \,\, \boldsymbol {b}_{i}; \hat {\boldsymbol {\theta }})\) evaluated at each MC draw. Furthermore, we use antithetic simulation for variance reduction in the MC integration. Instead of directly sampling from (6), we sample Ω∼N(0,I_{ r }) and obtain the pairs
where C_{ i } is the Cholesky decomposition of A_{ i } such that \(\boldsymbol {C}_{i} \boldsymbol {C}_{i}^{\top } = \boldsymbol {A}_{i}\). Therefore we only need to draw N/2 samples using this approach, and by virtue of the negative correlation between the pairs, it leads to a smaller variance in the sample means taken in the approximation than would be obtained from N independent simulations. The choice of N is described below.
Initial values
The EM algorithm requires that initial parameters are specified, namely \(\hat {\boldsymbol {\theta }}^{(0)}\). By choosing values close to the maximizer, the number of iterations required to reach convergence should be reduced.
For the timetoevent submodel, a quasitwostage model is fitted when the measurement times are balanced, i.e. when t_{ ijk }=t_{ ij }∀k. That is, we fit separate LMMs for each longitudinal outcome as per (1), ignoring the correlation between different outcomes. This is straightforward to implement using standard software, in particular using lme() and coxph() from the R packages nlme [26] and survival [27], respectively. From the fitted models, the best linear unbiased predictions (BLUPs) of the separate model random effects are used to estimate each \(W_{1i}^{(k)}(t)\) function. These estimates are then included as timevarying covariates in a Cox regression model, alongside any other fixed effect covariates, which can be straightforwardly fitted using standard software. In the situation that the data are not balanced, i.e. when t_{ ijk }≠t_{ ij }∀k, then we fit a standard Cox proportional hazards regression model to estimate γ_{ v } and set γ_{ yk }=0∀k.
For the longitudinal data submodel, when K>1 we first find the maximum likelihood estimate of \(\left \{\boldsymbol {\beta }, \text {vech}(\boldsymbol {D}), \sigma _{1}^{2}, \dots, \sigma _{K}^{2}\right \}\) by running a separate EM algorithm for the multivariate linear mixed model. Both the E and Mstep updates are available in closed form, and the initial parameters for this EM algorithm are available from the separate LMM fits, with D initialized as blockdiagonal. As these are estimated using an EM rather than MCEM algorithm, we can specify a stricter convergence criterion on the estimates.
Convergence and stopping rules
Two standard stopping rules for the deterministic EM algorithm used to declare convergence are the relative and absolute differences, defined as
respectively, for some appropriate choice of ε_{0}, ε_{1}, and ε_{2}, where the maximum is taken over the components of θ. For reference, the R package JM [28] implements (7) (in combination with another rule based on relative change in the likelihood), whereas the R package joineR [29] implements (8). The relative difference might be unstable about parameters near zero that are subject to MC error. Therefore, the convergence criterion for each parameter might be chosen separately at each EM iteration based on whether the absolute magnitude is below or above some threshold. A similar approach is adopted in the EM algorithms employed by the software package SAS [30, p. 330].
The choice of N and the monitoring of convergence are conflated when applying a MCEM algorithm, and a dynamic approach is required. As noted by [22], it is computationally inefficient to use a large N in the early phase of the algorithm when the parameter estimates are likely to be far from the maximizer. On the flip side, as the parameter estimates approach the maximizer, the stopping rules will fail as the changes in parameter estimates will be swamped by MC error. Therefore, it has been recommended that one increase N as the estimate moves towards the maximizer. Although this might be done subjectively [31] or by prespecified rules [32], an automated approach is preferable and necessary for a software implementation. Booth and Hobert [33] proposed an update rule based on a confidence ellipsoid for the maximizer at the (m+1)th iteration, calculated using an approximate sandwich estimator for the maximizer, which accounts for the MC error at each iteration. This approach requires additional variance estimation at each iteration, therefore we opt for a simpler approach described by Ripatti et al. [34]. Namely, we calculate a coefficient of variation at the (m+1)th iteration as
where \(\Delta _{\text {rel}}^{(m+1)}\) is given by (7), and sd(·) and mean(·) are the sample standard deviation and mean functions, respectively. If \(\text {cv}\left (\Delta _{\text {rel}}^{(m+1)}\right) > \text {cv}\left (\Delta _{\text {rel}}^{(m)}\right)\), then N:=N+⌊N/δ⌋, for some small positive integer δ. Typically, we run the MCEM algorithm with a small N (for a fixed number of iterations—a burnin) before implementing this update rule in order to get into the approximately correct parameter region. Appropriate values for other parameters will be application specific, however we have found δ=3, N=100K (for 100K burnin iterations), ε_{1}=0.001, and ε_{0}=ε_{2}=0.005 delivers reasonably accurate estimates in many cases, where K was earlier defined as the number of longitudinal outcomes.
As the EM monotonicity property is lost due to the MC integrations in the MCEM algorithm, convergence might be prematurely declared due to stochasticity if the εvalues are too large. To reduce the chance of this occurring, we require that the stopping rule is satisfied for 3 consecutive iterations [33, 34]. However, in any case, trace plots should be inspected to confirm convergence is appropriate.
Standard error estimation
Standard error (SE) estimation is usually based on inverting the observed information matrix. When the baseline hazard is unspecified, as is the case here, this presents several challenges. First, \(\hat {\lambda }_{0}(t)\) will generally be a highdimensional vector, which might lead to numerical difficulties in the inversion of the observed information matrix [6]. Second, the profile likelihood estimates based on the usual observed information matrix approach are known to be underestimated [35]. The reason for this is that the profile estimates are implicit, since the posterior expectations, given by (5), depend on the parameters being estimated, including λ_{0}(t) [6, p. 67].
To overcome these challenges, Hsieh et al. [35] recommended to use bootstrap methods to calculate the SEs. However, this approach is computationally expensive. Moreover, despite the purported theoretical advantages, we also note that recently it has been suggested that bootstrap estimators might actually overestimate the SEs; e.g. [36, p. 740] and [35, p. 1041]. At the model development stage, it is often of interest to gauge the strength of association of model covariates, which is not feasible with repeated bootstrap implementations. Hence, an approximate SE estimator is desirable. In either case, the theoretical properties will be contaminated by the addition of MC error from the MCEM algorithm, and it is not yet fully understood what the ramifications of this are. Hence, any standard errors must be interpreted with a degree of caution. We consider two estimators below.
1. Bootstrap method. These are estimated by sampling n subjects with replacement and relabelling the subjects with indices i^{′}=1,…,n. We then refit the model to the bootstrapsampled dataset. It is important to note that we resample subjects, not individual data points. This is repeated Btimes, for a sufficiently large integer B. Since we already have the MLEs from the fitted model, we can use these as initial values for each bootstrap model fit, thus reducing initial computational overheads in calculating approximate initial parameters. For each iteration, we extract the model parameter estimates for \(\left (\boldsymbol {\beta }^{\top }, \text {vech}(\boldsymbol {D}), \sigma _{1}^{2}, \dots, \sigma _{K}^{2}, \boldsymbol {\gamma }_{v}^{\top }, \boldsymbol {\gamma }_{y}^{\top }\right)\). Note that we do not estimate SEs for λ_{0}(t) using this approach. However, they are generally not of inferential interest. When B is sufficiently large, the SEs can be estimated from the estimated coefficients of the bootstrap samples. Alternatively, 100(1−α)%confidence intervals can be estimated from the the 100α/2th and 100(1−α/2)th percentiles.
2. Empirical information matrix method. Using the Breslow estimator for \(\int _{0}^{t} \lambda _{0}(u) \mathrm {d}u\), the profile score vector for \(\boldsymbol {\theta }_{\lambda } = (\boldsymbol {\beta }^{\top }, \text {vech}(\boldsymbol {D}), \sigma _{1}^{2}, \dots, \sigma _{K}^{2}, \boldsymbol {\gamma }^{\top })\) is calculated (see Additional file 1). We approximate the profile information for θ_{−λ} by \(I_{e}^{1/2}(\hat {\boldsymbol {\theta }}_{\lambda _{0}})\), where \(I_{e}(\boldsymbol {\theta }_{\lambda _{0}})\phantom {\dot {i}\!}\) is the observed empirical information [25] given by
s_{ i }(θ_{−λ}) is the conditional expectation of the completedata profile score for subject i, S(θ_{−λ}) is the score defined by \(S(\boldsymbol {\theta }_{\lambda }) = \sum _{i=1}^{n} s_{i}(\boldsymbol {\theta }_{\lambda })\), and a^{⊗2}=aa^{⊤} is outer product for a vector a. At the maximizer, \(S(\hat {\boldsymbol {\theta }}) = 0\), meaning that the right handside of (9) is zero. Due to the MC error in the MCEM algorithm, this will not be exactly zero, and therefore we include it in the calculations. As per the bootstrap approach, SEs for the baseline hazard are again not calculated. We note that this SE estimator will be subject to the exact same theoretical limitation of underestimation described by Hsieh et al. [35], since the profiling was implicit; that is, because the posterior expectations involve the parameters θ.
Software
The model described here is implemented in the R package joineRML, which is available on the The Comprehensive R Archive Network (CRAN) (https://CRAN.Rproject.org/package=joineRML). The principal function in joineRML is mjoint(). The primary arguments for implementing mjoint() are summarised in Table 1. To achieve computationally efficiency, parts of the MCEM algorithm in joineRML are coded in C++ using the Armadillo linear algebra library and integrated using the R package RcppArmadillo [37].
A model fitted using the mjoint() function returns an object of class mjoint. By default, approximate SE estimates are calculated using the empirical information matrix. If one wishes to use bootstrap standard error estimates, then the user can pass the model object to the bootSE() function. Several generic functions (or rather, S3 methods) can also be applied to mjoint objects, as described in Table 2. These generic functions include common methods, for example coef(), which extracts the model coefficients; ranef(), which extracts the BLUPs (and optional standard errors); and resid(), which extracts the residuals from the linear mixed submodel. The intention of these functions is to have a common syntax with standard R packages for linear mixed models [26] and survival analysis [27]. Additionally, plotting capabilities are included in joineRML. These include trace plots for assessment of convergence of the MCEM algorithm, and caterpillar plots for subjectspecific random effects (Table 2).
The package also provides several datasets, and a function simData() that allows for simulation of data from joint models with multiple longitudinal outcomes. joineRML can also fit univariate joint models, however in this case we would currently recommend that the R packages joineR [29], JM [28], or frailtypack [38] are used, which are optimized for the univariate case and exploits Gaussian quadrature. In addition, these packages allow for extensions to more complex cases; for example, competing risks [28, 29] and recurrent events [38].
Results
Simulation analysis
A simulation study was conducted assuming two longitudinal outcomes and n=200 subjects. Longitudinal data were simulated according to a followup schedule of 6 time points (at times 0,1,…,5), with each model including subjectandoutcomespecific randomintercepts and randomslopes: b_{ i }=(b_{0i1},b_{1i1},b_{0i2},b_{1i2})^{⊤}, Correlation was induced between the 2 outcomes by assuming correlation of − 0.5 between the random intercepts for each outcome. Event times were simulated from a Gompertz distribution with shape θ_{1}=−3.5 and scale exp(θ_{0})= exp(0.25)≈1.28, following the methodology described by Austin [39]. Independent censoring times were drawn from an exponential distribution with rate 0.05. Any subject where the event and censoring time exceeded 5 was administratively censored at the truncation time C=5.1. For all submodels, we included a pair of covariates X_{ i }=(x_{i1},x_{i2})^{⊤}, where x_{i1} is a continuous covariate independently drawn from N(0,1) and x_{i2} is a binary covariate independently drawn from Bin(1,0.5). The submodels are given as
where D is specified unstructured (4×4)covariance matrix with 10 unique parameters. Simulating datasets is straightforward using the joineRML package by means of the simData() function. The true parameter values and results from 500 simulations are shown in Table 3. In particular, we display the mean estimate, the bias, the empirical SE (= the standard deviation of the the parameter estimates); the mean SE (= the mean SE of each parameter calculated for each fitted model); the mean square error (MSE), and the coverage. The results confirm that the model fitting algorithm generally performs well.
A second simulation analysis was conducted using the parameters above (with n=100 subjects per dataset). However, in this case we used a heaviertailed distribution for the random effects: a multivariate t_{5} distribution [40]. The bias for the fixed effect coefficients was comparable to the multivariate normal random effects simulation study (above). The empirical standard error was consistently smaller than the mean standard error, resulting in coverage between 95% and 99% for the coefficient parameters. Rizopoulos et al. [41] noted that the misspecification of the random effects distributions was minimised as the number of longitudinal measurements per subject increased, but that the standard errors are generally affected. These findings are broadly in agreement with the simulation study conducted here, and other studies [42, 43]. Choi et al. [44] provide a review of existing research on the misspecification of random effects in joint modelling.
Example
We consider the primary biliary cirrhosis (PBC) data collected at the Mayo Clinic between 1974 to 1984 [45]. This dataset has been widely analyzed using joint modelling methods [18, 46, 47]. PBC is a longterm liver disease in which the bile ducts in the liver become damaged. Progressively, this leads to a buildup of bile in the liver, which can damage it and eventually lead to cirrhosis. If PBC is not treated or reaches an advanced stage, it can lead to several major complications, including mortality. In this study, 312 patients were randomised to receive Dpenicillamine (n=158) or placebo (n=154). In this example we analyse the subset of patients randomized to placebo.
Patients with PBC typically have abnormalities in several blood tests; hence, during followup several biomarkers associated with liver function were serially recorded for these patients. We consider three biomarkers: serum bilirunbin (denoted serBilir in the model and data; measured in units of mg/dl), serum albumin (albumin; mg/dl), and prothrombin time (prothrombin; seconds). Patients had a mean 6.3 (SD = 3.7) visits (including baseline). The data can be accessed from the joineRML package via the command data(pbc2). Profile plots for each biomarker are shown in Fig. 1, indicating distinct differences in trajectories between the those who died during followup and those who did not (rightcensored cases). A KaplanMeier curve for overall survival is shown in Fig. 2. There were a total of 69 (44.8%) deaths during followup in the placebo subset.
We fit a relatively simple joint model for the purposes of demonstration, which encompasses the following trivariate longitudinal data submodel:
and a timetoevent submodel for the study endpoint of death:
The log transformation of bilirubin is standard, and confirmed reasonable based on inspection of QQ plots for residuals from a separate fitted linear mixed model fitted using the lme() function from the R package nlme. Albumin did not require transformation. Residuals were grossly nonnormal for prothrombin time using both untransformed and logtransformed outcomes. Therefore, a BoxCox transformation was applied, which suggested an inversequartic transform might be suitable, which was confirmed by inspection of a QQ plot. The pairwise correlations for baseline measurements between the three transformed markers were 0.19 (prothrombin time vs. albumin), − 0.30 (bilirubin vs. prothrombin time and albumin). The model is fit using the joineRML R package (version 0.2.0) using the following code.
Here, we have specified a more stringent tolerance value for ε_{0} than the default setting in mjoint(). Additionally, the burnin phase was increased to 400 iterations after inspection of convergence trace plots. The model fits in 3.1 min on a MacBook Air 1.6GHz Intel Core i5 with 8GB or RAM running R version 3.3.0, having completed 423 MCEM iterations (not including the EM algorithm iterations performed for determining the initial values of the separate multivariate linear mixed submodel) with a final MC size of M=3528. The fitted model results are shown in Table 4.
The fitted model indicated that an increase in the subjectspecific random deviation from the population trajectory of serum bilirubin was significantly associated with increased hazard of death. A significant association was also detected for subjectspecific decreases in albumin from the population mean trajectory. However, prothrombin time was not significantly associated with hazard of death, although its direction is clinically consistent with PBC disease. Albert and Shih [46] analysed the first 4years followup from this dataset with the same 3 biomarkers and a discrete event time distribution using a regression calibration model. Their results were broadly consistent, although the effect of prothrombin time on the event time submodel was strongly significant.
We also fitted 3 univariate joint models to each of the biomarkers and the event time submodel using the R package joineR (version 1.2.0) owing to its optimization for such models. The LMM parameter estimates were similar, although the absolute magnitude of the slopes was smaller for the separate univariate models. Since 3 separate models were fitted, 3 estimates of γ_{ v } were estimated, with the average comparable to the multivariate model estimate. The multivariate model estimates of γ_{ y }=(γ_{bil},γ_{alb},γ_{pro})^{⊤} were substantially attenuated relative to the separate model estimates, although the directions remained consistent. It is also interesting to note that γ_{pro} was statistically significant in the univariate model. However, the univariate models are not accounting for the correlation between different outcomes, whereas the multivariate joint model does.
The model was refitted with the onestep NewtonRaphson update for γ replaced by a GaussNewtonlike update in a time of 2.2 minutes for 419 MCEM iterations with a final MC size of M=6272. This is easily achieved by running the following code.
In addition, we bootstrapped this model with B=100 samples to estimate SEs and contrast them with the approximate estimates based on the inverse empirical profile information matrix. In practice, one should choose B>100, particularly if using bootstrap percentile confidence intervals; however, we used a small value to reduce the computational burden on this process. In a similar spirit, we relaxed the convergence criteria and reduced the number of burnin iterations. This is easily implemented by running the following code, taking 1.8 h to fit.
It was observed that the choice of gradient matrix in the γupdate led to virtually indistinguishable parameter estimates, although we note the same random seed was used in both cases. The bootstrap estimated SEs were broadly consistent with the approximate SEs, with no consistent pattern in underestimation observed.
Discussion
Multivariate joint models introduce three types of correlations: (1) withinsubject serial correlation for repeated measures; (2) between longitudinal outcomes correlation; and (3) correlation between the multivariate LMM and timetoevent submodels. It is important to account for all of these types of correlations; however, some authors have reported collapsing their multivariate data to permit univariate joint models to be fitted. For example, Battes et al. [7] used an ad hoc approach of either summing or multiplying the three repeated continuous measures (standardized according to clinical upper reference limits of the biomarker assays), and then applying standard univariate joint models. Wang et al. [48] fitted separate univariate joint models to each longitudinal outcome in turn. Neither approach takes complete advantage of the correlation between the multiple longitudinal measures and the timetoevent outcome.
Here, we described a new R package joineRML that can fit the models described in this paper. This was demonstrated on a realworld dataset. Although in the fitted model we assumed linear trajectories for the biomarkers, splines could be straightforwardly employed, as have been used in other multivariate joint model applications [15], albeit at the cost of additional computational time. Despite a growing availability of software for univariate joint models, Hickey et al. [19] noted that there were very few options for fitting joint models involving multivariate longitudinal data. To the best of our knowledge, options are limited to the R packages JMbayes [49], rstanarm [50], and the Stata package stjm [47]. Moreover, none of these incorporates an unspecified baseline hazard. The first two packages use Markov chain Monte Carlo (MCMC) methods to fit the joint models. Bayesian models are potentially very useful for fitting joint models, and in particular for dynamic prediction; however, MCMC is also computationally demanding, especially in the case of multivariate models. Several other publications have made BUGS code available for use with WinBUGS and OpenBUGS (e.g. [51]), but these are not easily modifiable and postfit computations are cumbersome.
joineRML is a new software package developed to fill a void in the joint modelling field, but is still in its infancy relative to highly developed univariate joint model packages such as the R package JM [28] and Stata package stjm [47]. Future developments of joineRML intend to cover several deficiencies. First, joineRML currently only permits an association structure of the form \(W_{2i}(t) = \sum _{k=1}^{K} \gamma _{yk} W_{1i}^{(k)}(t)\). As has been demonstrated by others, the association might take different forms, including randomslopes and cumulative effects or some combination of multiple structures, and these may also be different for separate longitudinal outcomes [18]. Moreover, it is conceivable that separate longitudinal outcomes may interact in the hazard submodel. Second, the use of MC integration provides a scalable solution to the issue of increasing dimensionality in the random effects. However, for simpler cases, e.g. bivariate models with randomintercepts and randomslopes (total of 4 random effects), Gaussian quadrature might be computationally superior; this tradeoff requires further investigation. Third, joineRML can currently only model a single event time. However, there is a growing interest in competing risks [9] and recurrent events data [11], which if incorporated into joineRML, would provide a flexible allround multivariate joint modelling platform. Competing risks [28, 29] and recurrent events [38] have been incorporated into joint modelling R packages already, but are limited to the case of a solitary longitudinal outcome. Of note, the PBC trial dataset analysed in this study includes times to the competing risk of liver transplantation. Fourth, with everincreasing volumes of data collected during routine clinical visits, the need for software to fit joint models with very many longitudinal outcomes is foreseeable [52]. This would likely require the use of approximate methods for the numerical integration or data reduction methods. Fifth, additional residual diagnostics are necessary for assessing possible violations of model assumptions. The joineRML package has a resid() function for extracting the longitudinal submodel residuals; however, these are complex for diagnostic purposes due to the informative dropout, hence the development of multipleimputation based residuals [53].
Conclusions
In this paper we have presented an extension of the classical joint model proposed by Henderson et al. [3] and an estimation procedure for fitting the models that builds on the foundations laid by Lin et al. [20]. In addition, we described a new R package joineRML that can fit the models described in this paper, which leverages the MCEM algorithm and which should scale well for increasing number of longitudinal outcomes. This software is timely, as it has previously been highlighted that there is a paucity of software available to fit such models [19]. The software is being regularly updated and improved.
Availability and requirements
Project name:joineRMLProject home page:https://github.com/graemeleehickey/joineRML/Operating system(s): platform independent Programming language: R Other requirements: none License: GNU GPL3 Any restrictions to use by nonacademics: none
Abbreviations
 BLUP:

Best linear unbiased prediction
 CRAN:

The Comprehensive R Archive Network
 EM:

Expectation maximisation
 LMM:

Linear mixed models
 MC:

Monte Carlo
 MCEM:

Monte Carlo expectation maximisation
 MLE:

Maximum likelihood estimate
 PBC:

Primary biliary cirrhosis
 SD:

Standard deviation
 SE:

Standard error
References
 1
Ibrahim JG, Chu H, Chen LM. Basic concepts and methods for joint models of longitudinal and survival data. J Clin Oncol. 2010; 28(16):2796–801.
 2
Sweeting MJ, Thompson SG. Joint modelling of longitudinal and timetoevent data with application to predicting abdominal aortic aneurysm growth and rupture. Biom J. 2011; 53(5):750–63.
 3
Henderson R, Diggle PJ, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000; 1(4):465–480.
 4
Tsiatis AA, Davidian M. Joint modeling of longitudinal and timetoevent data: an overview. Stat Sin. 2004; 14:809–34.
 5
Gould AL, Boye ME, Crowther MJ, Ibrahim JG, Quartey G, Micallef S, Bois FY. Joint modeling of survival and longitudinal nonsurvival data: current methods and issues. report of the DIA Bayesian joint modeling working group. Stat Med. 2015; 34:2181–95.
 6
Rizopoulos D. Joint Models for Longitudinal and TimetoEvent Data, with Applications in R. Boca Raton: Chapman & Hall/CRC; 2012.
 7
Battes LC, Caliskan K, Rizopoulos D, Constantinescu AA, Robertus JL, Akkerhuis M, Manintveld OC, Boersma E, Kardys I. Repeated measurements of NTproBtype natriuretic peptide, troponin T or Creactive protein do not predict future allograft rejection in heart transplant recipients. Transplantation. 2015; 99(3):580–5.
 8
Song X, Davidian M, Tsiatis AA. An estimator for the proportional hazards model with multiple longitudinal covariates measured with error. Biostatistics. 2002; 3(4):511–28.
 9
Williamson P, KolamunnageDona R, Philipson P, Marson AG. Joint modelling of longitudinal and competing risks data. Stat Med. 2008; 27:6426–38.
 10
Hickey GL, Philipson P, Jorgensen A, KolamunnageDona R. A comparison of joint models for longitudinal and competing risks data, with application to an epilepsy drug randomized controlled trial. J R Stat Soc: Ser A: Stat Soc. 2018. https://doi.org/10.1111/rssa.12348.
 11
Liu L, Huang X. Joint analysis of correlated repeated measures and recurrent events processes in the presence of death, with application to a study on acquired immune deficiency syndrome. J R Stat Soc: Ser C: Appl Stat. 2009; 58(1):65–81.
 12
Chi YY, Ibrahim JG. Joint models for multivariate longitudinal and multivariate survival data. Biometrics. 2006; 62(2):432–45.
 13
Hickey GL, Philipson P, Jorgensen A, KolamunnageDona R. Joint models of longitudinal and timetoevent data with more than one event time outcome: a review. Int J Biostat. 2018. https://doi.org/10.1515/ijb20170047.
 14
Andrinopoulou ER, Rizopoulos D, Takkenberg JJM, Lesaffre E. Combined dynamic predictions using joint models of two longitudinal outcomes and competing risk data. Stat Methods Med Res. 2017; 26(4):1787–1801.
 15
Rizopoulos D, Ghosh P. A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a timetoevent. Stat Med. 2011; 30(12):1366–80.
 16
Faucett CL, Thomas DC. Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Stat Med. 1996; 15(15):1663–85.
 17
Song X, Davidian M, Tsiatis AA. A semiparametric likelihood approach to joint modeling of longitudinal and timetoevent data. Biometrics. 2002; 58(4):742–53.
 18
Andrinopoulou ER, Rizopoulos D. Bayesian shrinkage approach for a joint model of longitudinal and survival outcomes assuming different association structures. Stat Med. 2016; 35(26):4813–23.
 19
Hickey GL, Philipson P, Jorgensen A, KolamunnageDona R. Joint modelling of timetoevent and multivariate longitudinal outcomes: recent developments and issues. BMC Med Res Methodol. 2016; 16(1):1–15.
 20
Lin H, McCulloch CE, Mayne ST. Maximum likelihood estimation in the joint analysis of timetoevent and multiple longitudinal variables. Stat Med. 2002; 21:2369–82.
 21
Laird NM, Ware JH. Randomeffects models for longitudinal data. Biometrics. 1982; 38(4):963–74.
 22
Wei GC, Tanner MA. A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J Am Stat Assoc. 1990; 85(411):699–704.
 23
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997; 53(1):330–9.
 24
Ratcliffe SJ, Guo W, Ten Have TR. Joint modeling of longitudinal and survival data via a common frailty. Biometrics. 2004; 60(4):892–9.
 25
McLachlan GJ, Krishnan T. The EM Algorithm and Extensions, 2nd ed.Hoboken: WileyInterscience; 2008.
 26
Pinheiro JC, Bates DM. MixedEffects Models in S and SPLUS. New York: Springer; 2000.
 27
Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model. New Jersey: Springer; 2000, p. 350.
 28
Rizopoulos D. JM: an R package for the joint modelling of longitudinal and timetoevent data. Journal of Statistical Software. 2010; 35(9):1–33.
 29
Philipson P, Sousa I, Diggle PJ, Williamson P, KolamunnageDona R, Henderson R, Hickey GL. joineR: Joint Modelling of Repeated Measurements and Timetoevent Data. 2017. R package version 1.2.0. https://CRAN.Rproject.org/package=joineR.
 30
Dmitrienko A, Molenberghs G, ChuangStein C, Offen W. Analysis of Clinical Trials Using SAS: A Practical Guide.Cary: SAS Institute; 2005.
 31
Law NJ, Taylor JM, Sandler H. The joint modeling of a longitudinal disease progression marker and the failure time process in the presence of cure. Biostatistics. 2002; 3(4):547–63.
 32
McCulloch CE. Maximum likelihood algorithms for generalized linear mixed models. J Am Stat Assoc. 1997; 92(437):162–70.
 33
Booth JG, Hobert JP. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm.J R Stat Soc Ser B Stat Methodol. 1999; 61(1):265–85.
 34
Ripatti S, Larsen K, Palmgren J. Maximum likelihood inference for multivariate frailty models using an automated Monte Carlo EM algorithm. Lifetime Data Anal. 2002; 8(2002):349–60.
 35
Hsieh F, Tseng YK, Wang JL. Joint modeling of survival and longitudinal data: Likelihood approach revisited. Biometrics. 2006; 62(4):1037–43.
 36
Xu C, Baines PD, Wang JL. Standard error estimation using the EM algorithm for the joint modeling of survival and longitudinal data. Biostatistics. 2014; 15(4):731–44.
 37
Eddelbuettel D, Sanderson C. RcppArmadillo: accelerating r with highperformance C++ linear algebra. Comput Stat Data Anal. 2014; 71:1054–63.
 38
Król A, Mauguen A, Mazroui Y, Laurent A, Michiels S, Rondeau V. Tutorial in joint modeling and prediction: A statistical software for correlated longitudinal outcomes, recurrent events and a terminal event. J Stat Softw. 2017; 81(3):1–52.
 39
Austin PC. Generating survival times to simulate Cox proportional hazards models with timevarying covariates. Stat Med. 2012; 31(29):3946–58.
 40
Genz A, Bretz F. Computation of Multivariate Normal and t Probabilities. Berlin: Springer; 2009.
 41
Rizopoulos D, Verbeke G, Molenberghs G. Shared parameter models under randomeffects misspecification. Biometrika. 2008; 95(1):63–74.
 42
Xu J, Zeger SL. The evaluation of multiple surrogate endpoints. Biometrics. 2001; 57(1):81–7.
 43
Pantazis N, Touloumi G. Robustness of a parametric model for informatively censored bivariate longitudinal data under misspecification of its distributional assumptions: a simulation study. Stat Med. 2007; 26:5473–85.
 44
Choi J, Zeng D, Olshan AF, Cai J. Joint modeling of survival time and longitudinal outcomes with flexible random effects. Lifetime Data Anal. 2018; 24(1):126–52.
 45
Murtaugh PA, Dickson ER, Van Dam GM, Malinchoc M, Grambsch PM, Langworthy AL, Gips CH. Primary biliary cirrhosis: prediction of shortterm survival based on repeated patient visits. Hepatology. 1994; 20(1):126–134.
 46
Albert PS, Shih JH. An approach for jointly modeling multivariate longitudinal measurements and discrete timetoevent data. Ann Appl Stat. 2010; 4(3):1517–32.
 47
Crowther MJ, Abrams KR, Lambert PC. Joint modeling of longitudinal and survival data. Stata J. 2013; 13(1):165–84.
 48
Wang P, Shen W, Boye ME. Joint modeling of longitudinal outcomes and survival using latent growth modeling approach in a mesothelioma trial. Health Serv Outcome Res Methodol. 2012; 12(2–3):182–99.
 49
Rizopoulos D. The R package JMbayes for fitting joint models for longitudinal and timetoevent data using mcmc. J Stat Softw. 2016; 72(7):1–45.
 50
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker MA, Li P, Riddell A. Stan: a probabilistic programming language. J Stat Softw. 2017; 76(1):1–32.
 51
Andrinopoulou ER, Rizopoulos D, Takkenberg JJM, Lesaffre E. Joint modeling of two longitudinal outcomes and competing risk data. Stat Med. 2014; 33(18):3167–78.
 52
Jaffa MA, Gebregziabher M, Jaffa AA. A joint modeling approach for right censored high dimensiondal multivariate longitudinal data. J Biom Biostat. 2014; 5(4):1000203.
 53
Rizopoulos D, Verbeke G, Molenberghs G. Multipleimputationbased residuals and diagnostic plots for joint models of longitudinal and survival outcomes. Biometrics. 2010; 66(1):20–9.
Acknowledgements
The authors would like to thank Professor Robin Henderson (University of Newcastle) for useful discussions with regards to the MCEM algorithm, and Dr Haiqun Lin (Yale University) for helpful discussions on the likelihood specification.
Funding
Funding for the project was provided by the Medical Research Council (Grant number MR/M013227/1). The funder had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Availability of data and materials
The R package joineRML can be installed directly using install.packages(~joineRML~) in an R console. The source code is available at https://github.com/graemeleehickey/joineRML. Archived versions are available from the Comprehensive R Archive Network (CRAN) at https://cran.rproject.org/web/packages/joineRML/. joineRML is platform independent, requiring R version ≥ 3.3.0, and is published under a GNU GPL3 license. The dataset analysed during the current study is bundled with the R package joineRML, and can be accessed by running the command data(pbc2, package = ~joineRML~).
Author information
Affiliations
Contributions
All authors collaborated in developing the model fitting algorithm reported. The programming and running of the analysis was carried out by GLH. GLH wrote the first draft of the manuscript, with input provided by PP, AJ, and RKD. All authors contributed to the manuscript revisions. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Ruwanthi KolamunnageDona.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1
An appendix (appendix.pdf) is available that includes details on the score vector and Mstep estimators. (PDF 220 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Joint modelling
 Longitudinal data
 Multivariate data
 Timetoevent data
 Software