Fitting a shared frailty illness-death model to left-truncated semi-competing risks data to examine the impact of education level on incident dementia

Background Semi-competing risks arise when interest lies in the time-to-event for some non-terminal event, the observation of which is subject to some terminal event. One approach to assessing the impact of covariates on semi-competing risks data is through the illness-death model with shared frailty, where hazard regression models are used to model the effect of covariates on the endpoints. The shared frailty term, which can be viewed as an individual-specific random effect, acknowledges dependence between the events that is not accounted for by covariates. Although methods exist for fitting such a model to right-censored semi-competing risks data, there is currently a gap in the literature for fitting such models when a flexible baseline hazard specification is desired and the data are left-truncated, for example when time is on the age scale. We provide a modeling framework and openly available code for implementation. Methods We specified the model and the likelihood function that accounts for left-truncated data, and provided an approach to estimation and inference via maximum likelihood. Our model was fully parametric, specifying baseline hazards via Weibull or B-splines. Using simulated data we examined the operating characteristics of the implementation in terms of bias and coverage. We applied our methods to a dataset of 33,117 Kaiser Permanente Northern California members aged 65 or older examining the relationship between educational level (categorized as: high school or less; trade school, some college or college graduate; post-graduate) and incident dementia and death. Results A simulation study showed that our implementation provided regression parameter estimates with negligible bias and good coverage. In our data application, we found higher levels of education are associated with a lower risk of incident dementia, after adjusting for sex and race/ethnicity. Conclusions As illustrated by our analysis of Kaiser data, our proposed modeling framework allows the analyst to assess the impact of covariates on semi-competing risks data, such as incident dementia and death, while accounting for dependence between the outcomes when data are left-truncated, as is common in studies of aging and dementia. Supplementary Information The online version contains supplementary material available at (10.1186/s12874-020-01203-8).


Background
Semi-competing risks refers to the setting where interest lies in the time-to-event for some so-called non-terminal event, the observation of which is subject to some terminal event [1]. In contrast to standard competing risks, where each of the outcomes under consideration is typically terminal (e.g. death due to some cause or another), in the semi-competing risks setting it is possible to observe both events on the same study unit, so that there is at least partial information on their joint distribution [1,2]. Take as an example the study of dementia among the elderly [3], a complex neurocognitive condition that is estimated to affect nearly 6 million individuals aged 65 and older in the US [4], a number that has been projected increase to 13.9 million by 2060 [5]. It is known that the risk of death is higher among those who are diagnosed with dementia [6]. As such, studies seeking to investigate risk factors for dementia must contend with death as a competing risk, which precludes the subsequent observation of dementia. However, it is possible to observe both outcomes among individuals who die following a diagnosis of dementia. This information can potentially increase efficiency of results and be used to assess the dependence between the nonterminal and terminal events.
Towards the analysis of semi-competing risks data, the statistical literature has focused on three broad frameworks that seek to exploit the joint information on the time to the non-terminal event and the time to the terminal event [7]: those based on copulas [1,[8][9][10]; those framed from the perspective of causal inference [11,12]; and, those based on the illness-death multi-state model [2,[13][14][15][16]. In this paper, we focus on the last of these approaches, for which the philosophical underpinning is that patients begin in some initial state at time zero and may transition into the non-terminal and/or terminal state [14,[16][17][18]. Analyses typically proceed through the development of models for transition-specific hazard functions (which dictate the rate at which patients experience the events), often with the use of subject-specific frailties, which can be viewed as individual-specific random effects that acknowledge the heterogeneity across individuals that is not accounted for by covariates [19,20]. Moreover, the shared frailty accounts for dependence between the nonterminal and terminal events, which can be quantified from an estimable frailty variance parameter.
In the analysis of time-to-event outcomes, data are subject to left-truncation or delayed entry when subjects are enrolled into a study after the time origin of interest. Lefttruncation is common in the study of aging and dementia, where the age scale is commonly taken to be the time scale [21][22][23]. In this setting, sampling is biased toward longer follow-up times since patients are typically only included in the study if they are dementia-free at study entry. The analysis of left-truncated time-to-event data should apply statistical methods that account for this bias. Although current methods exist for analyzing left-truncated semicompeting risks data via a standard illness-death model (without a shared frailty) [24,25] and have been applied to Alzheimer's disease [26][27][28][29], to our knowledge, there are no published methods in the literature for fitting an illness-death model with shared frailty to left-truncated semi-competing risks data. The purpose of this paper is to fill this gap by providing a modeling framework and openly available code for implementation.
In this paper, we provide methods for fitting an illness-death model with shared frailty to left-truncated semi-competing risks data. In "Methods" section, we present the model specification ("Model specification: Illness-death model" section), methods for estimation and inference ("Estimation and inference" section), a brief simulation study ("Simulation study" section), and an analysis of data from Kaiser Permanente Northern California examining the relationship between educational level and incident dementia and death ("Assessing the impact of education level on incident dementia in a large US cohort" section). We present results in "Results" section. We conclude with a discussion in "Discussion" section and conclusions in "Conclusions" section.

Model specification: Illness-death model
Semi-competing risks refers to the setting where interest lies in a nonterminal event that is subject to dependent censoring by a terminal event [1]. This paper focuses on modeling semi-competing risks data using the illnessdeath multistate model, where in the study of dementia we take the nonterminal and terminal events to be dementia diagnosis and death. Figure 1 illustrates the possible individual trajectories following time of study accrual. Let T 1 and T 2 denote the nonterminal and terminal event times, respectively, following a well-defined time origin. Following the multistate modeling literature, the illness-death model is characterized by the transition hazards: Although a variety of hazard regression models can be adopted, including Cox-type multiplicative hazard regression models [30], additive models [31] and accelerated  [32], this paper assumes the first of these which is prevalent in the multistate modeling literature: for for where λ k,0 (·) are baseline hazard functions and β k are loghazard ratio regression parameters, for k = 1, 2, 3. We allow for two common specifications of λ 3,0 (t 2 |T 1 = t 1 ), a Markov model defined by λ 3,0 (t 2 |T 1 = t 1 ) = λ 3,0 (t 2 ) or a semi-Markov ('clock-reset') model defined by λ 3,0 (t 2 |T 1 = t 1 ) = λ 3,0 (t 2 − t 1 ). In each of expressions (4)-(6), γ is a common subject-specific frailty with mean 1.0 and variance θ, which serves two related purposes. First, similar to a random intercept in a generalized linear mixed model [33], the frailties serve to accommodate between-subject heterogeneity that is not accounted for by the covariates included in the linear predictors. Second, the frailty induces dependence between the non-terminal and terminal events since a patient with frailty larger than one will be at higher risk of both the nonterminal and terminal events than the population average (conditional on covariates).
To complete the specification of the model, the baseline hazard functions and frailty distribution must be specified. For the latter, we adopt a Gamma distribution, which is commonly used because closed-form expressions for the marginal likelihood contributions can be obtained. It is important to note that although the shared frailty term can only account for or represent positive dependence between the hazard functions [20], it is possible to have negative correlation between the hazard functions corresponding to T 1 and T 2 , e.g. if λ 1,0 (t) is monotonically increasing and λ 2,0 (t) is monotonically decreasing.
The baseline hazard functions can be specified parametrically [34], semi-parametrically [15,26], or nonparametrically [2]. In this paper, we consider Weibull baseline hazards of the form λ k,0 (t) = α k κ k t α k −1 , for k = 1, 2, 3, which are commonly used in survival analysis, and flexible B-spline baseline hazard functions that satisfy log λ(t) = B(t), where B(t) is a polynomial B-spline function of degree d with unique knots at t 0 < t 1 < · · · < t K < t K+1 and defined for t ∈[ t 0 , t K+1 ]. For continuous time-to-event outcomes with support on the positive real line, we let t 0 = 0 and t K+1 to be the largest follow-up time. Note that the B-spline function B(t) is parametrically defined as a linear combination of B-spline basis functions B b,d (t) of degree d, where η b are parameters, known as control points, and the B-spline basis functions are defined for t ∈[ t 0 , t K+1 ] [35].

Estimation and inference Observed data
Let T i1 and T i2 denote the nonterminal and terminal event times and X i a vector of patient-specific covariates that includes X i1 , X i2 and X i3 for individual i used in (4)- (6). We assume that the observed data are subject to right-censoring and left-truncation. Let C i and L i denote the right-censoring and left-truncation times which we assume are independent of T i1 and T i2 conditional on X i . Note that we assume the convention of Xu and colleagues [2], by setting T 1 = ∞ for individuals who experience the terminal event in the absence of the nonterminal event.
In the analysis of dementia on the age scale, where the nonterminal and terminal events are dementia diagnosis and death and the left-truncation time is study entry, prevalent dementia cases are typically excluded from the analysis since the primary outcome of interest is dementia [27], i.e. we require that T i1 > L i . The observed data for the i th individual is

Likelihood
Towards developing a form of the likelihood, we first present the joint density, (T 1 , T 2 ), in the absence of lefttruncation below [15,25]: where Under the assumption of independent truncation, the joint density of left-truncated semi-competing risks data, (L, T 1 , T 2 ), is: which is a product of the conditional density of (T 1 , T 2 ) given L and the marginal density of L [24,25,36] in (8), where g T 1 ,T 2 is given in (7), g L (·) is the density function corresponding to L and S T 1 (t|X) = P(T 1 > t|X) = S 1 (t|X)S 2 (t|X). When the distribution of left-truncation times is unknown, the maximum likelihood estimate can be obtained from the conditional likelihood, ignoring the marginal likelihood of the left-truncation times [24,25,36,37]. Based on the observed data, D i , there are four possible data scenarios and thus likelihood contributions corresponding to the distinct combinations of the nonterminal and terminal event indicators, δ 1 and δ 2 . Using (7) and the conditional likelihood expression (on the left) in (8), the observed data likelihood for model parameters φ = (ξ 1 , ξ 2 , ξ 3 , β 1 , β 2 , β 3 , θ), where ξ k are the baseline hazard parameters and β k are the vectors of regression coefficients for transition k = 1, 2, 3 and θ is the Gamma frailty variance, is and Since the individual-specific frailty terms, γ i , are not observed, we marginalize the likelihood with respect to γ i . Detailed derivations of the marginal likelihood components in (9) are included in Section A of the Additional file 1.

Estimation and inference
We will use maximum likelihood for estimation and inference of model parameters using the marginal loglikelihood in Section A of the Additional file 1 [38]. Let U (φ) = ∂/∂φ log L(φ) denote the score function. Using standard arguments, under certain regularity conditions and a correctly specified model, the maximum likelihood estimator of φ, denoted φ, is the solution to U (φ) = 0 and is consistent for the true φ 0 as n −→ ∞. In addition, can be estimated via the inverse of the observed information matrix.
From our experience fitting the proposed model, optimization of the log-likelihood requires careful consideration of the numerical optimization algorithm used and the choice of starting values. For modeling fitting, we used a quasi-Newton non-linear numerical optimization algorithm [39] as implemented in the optim function in R [40] to maximize the log-likelihood. For Weibull baseline hazard models, starting values for the non-linear optimization were generated by fitting the univariate Weibull regression models for each transition (4)- (6). For B-spline parameterizations of the baseline hazard functions, we used the bSpline function in the splines2 package [41] in R [40] to generate B-spline basis functions and obtained starting values for hazard parameters as follows: fit univariate Cox models for each transition hazard model; smoothed the estimated cumulative baseline hazard functions using linear interpolation; obtained smoothed baseline hazards function via numerical differentiation followed by loess; and found the control points that minimized the distance between the smoothed loghazard functions and B-spline functions by least squares. The frailty variance θ was initialized to value 0.5.

Simulation study
We performed a set of simulations to investigate the operating characteristics (e.g., bias and coverage) of the model implementation for both Weibull and B-spline parameterizations of the baseline hazard functions.
Data were generated from the model (4)-(6) assuming Weibull baseline hazard functions using simID from the SemiCompRisks package [42,43] in R [40]. We set model parameters to those obtained from fitting the model to dementia data, where the nonterminal and terminal events were dementia diagnosis and death, the origin was age 65, the left-truncation time was study entry, and a dichotomous variable was included as a covariate in all three regression models. The baseline hazard and frailty parameter values used in the simulations were: log α 1 = 1.05, log κ 1 = −9.98, log α 2 = 1.15, log κ 2 = −10.01, log α 3 = 0.92, log κ 3 = −5.92, and log θ = −1.39. The dichotomous covariate was drawn from a Bernoulli distribution with probability 0.57 and the regression coefficients for the three transitions were β 1 = −0.03, β 2 = −0.33 and β 3 = −0.11. Age at study entry was drawn from a uniform distribution with range between 65 and 78 years. We administratively censored at age 95 years.
Based on these set parameters, we simulated R = 1, 000 datasets of sample size n = 5, 000. For each simulated dataset, we fit both the model with Weibull and B-spline baseline hazard functions. For the fitted Weibull models, we reported the: mean of the parameter estimates; mean of the estimated analytical standard errors; standard deviation of the distribution of parameter estimates (empirical standard error); and the coverage probability (the proportion of Wald-based 95% confidence intervals that contained the true parameter). For the fitted B-splinebased models, we reported the operating characteristics listed for the Weibull-based fitted models for the regression and frailty parameters only. To assess the B-spline fit of the transition baseline hazard functions, for each of the three transitions, we plotted the estimated baseline hazards from the fitted models, which we compared to the true baseline hazard. We also compared the estimated baseline hazard functions to kernel smoothed Nelson-Aalen baseline hazard estimates obtained from fitting separate Cox models to each transition using the coxphHaz function from the biostat3 package [44] in R [40]. Note that the univariate Cox models for the 1-and 2-transitions accounted for left-truncation and the Cox model on the 3-transition was performed only among those who experienced the nonterminal event. To quantify the difference between the true baseline hazard function and those estimated from the B-spline model and the smoothed Nelson-Aalen, we calculated the median of the integrated squared error [26].

Assessing the impact of education level on incident dementia in a large US cohort
We applied the methodological approach presented in "Model specification: Illness-death model" and "Estimation and inference" sections to data from Kaiser Permanente Northern California (KPNC) where the goal of the analysis was to examine the association of education level on incident dementia. The initial analytic cohort consisted of 36,134 individuals who were at least 65 years old, dementia-free by 1997 and KPNC members as of January 1, 1996 (the earliest date when electronic medical records were reliably available defined as study entry) and participated in the Multiphasic Health Checkups which began in the 1960s and continued through 1996 during which information on participant educational attainment was captured. We categorized the exposure of educational attainment level based on the highest level of completion in the following groups: elementary or high school; trade degree, any college or college graduate; and post-graduate study. We focused the analyses on the four most prevalent race/ethnic subgroups, which were White, Black, Asian and Latinx, resulting in a final analytic dataset comprised of 33,117 KPNC members.
We incorporated the strong force of mortality (of all causes) of individuals aged 65 or greater in the analysis via the illness-death model in (1)-(3), where the nonterminal event of interest was incident dementia diagnosis and the terminal event was death. We assumed the semi-Markov or 'clock-reset' approach for the 3transition model (dementia → death). Follow-up was administratively censored at the earliest of age 90 or end of study defined by September 30, 2017. In our primary analysis, we used a shared frailty to account for the dependence between incident dementia and death that is not accounted for by covariates. We assumed that time was on the age scale, starting at age 65, and thus left-truncated since members were not followed before age 65. For this time origin, left-truncation was accounted for using the approach in "Estimation and inference" section, assuming B-spline parameterizations of the transition baseline hazard functions. When B-spline baseline hazards are assumed, model fitting requires the specification of the internal knots (number and placement) and polynomial degree of the B-spline function. We fit a suite of models that varied the number of internal knots (we considered 1, 2 or 3 knots), which were placed at equally spaced percentiles of the observed event times, and varied the B-spline polynomial degree (linear, quadratic, cubic). The final model was chosen based on the largest loglikelihood. Time was scaled by a factor of five so that the interpretation of hazard ratios are in terms of 5-year increments. We controlled for sex (reference group: male) and race/ethnicity (reference group: White) in all hazard transition models via covariate adjustment.
In sensitivity analyses, we fit models: 1) without a shared frailty; 2) that defined study entry as the time origin (which we refer to as time-on-study models); and 3) assumed Weibull baseline hazard functions. When the origin was taken to be study entry, for the 1-and 2transition hazard models we adjusted for age at study entry (centered at age 65); for the 3−transition hazard model, we adjusted for age at dementia (centered at the cohort mean which was 83 years). These models were fit using the SemiCompRisks package in R [42].
Note that this was an electronic medical record (EMR)based study. KPNC is an integrated health care delivery system of approximately 4.3 million members that adopted an EMR system as early as 1993. All healthcare utilization, including care visits (inpatient, outpatient (ambulatory), emergency department, telephone and video visits), pharmacy fills, etc., are captured in the EMR. Dementia diagnoses and death dates were pulled from the EMR. Dementia diagnoses were based on International Classification of Diseases (ICD) codes that were made during a member visit. Deaths were obtained from the KPNC mortality linkage file which includes data from multiple sources (internal reporting, California state death records, Social Security Administration). Followup was not prescribed within the cohort and could vary by individual. However, health system utilization is very high in this population with a median of 97 visits with a physician during the study period (minimum: 0.003 years, median: 15.4 years, maximum: 21.2 years). Details regarding the dataset, including outcome definition, exposure variable and covariates, can be found in the Additional file 1: Section C.1.

Simulation study
The operating characteristics of the regression estimates and the frailty parameter are displayed in Table 1. Both modeling approaches resulted in regression estimates with little bias and good coverage. For the Weibull-based model, the estimates of the frailty variance parameter were slightly underestimated with conservative coverage. For the B-spline based model, the mean of the frailty variance parameter was upward downward. The simulation results for the baseline hazard parameters corresponding to the fitted Weibull-based models are displayed in Additional file 1: Table B.1 and exhibit negligible bias and good coverage. In Fig. 2, we display the true baseline hazard functions in red and the estimated baseline hazard functions from the fitted B-spline-based models in the left column and the corresponding kernel smooth Nelson-Aalen estimates from separate univariate Cox models in the right column. Deviations in the estimated baseline hazard functions for both approaches are likely due to data sparsity in the latter half of the follow-up time. The median integrated squared error comparing each of the true baseline hazard functions to the estimated baseline hazard functions over the range of observed event times are displayed in Fig. 2 and indicate that the Bspline-based model provides estimates of the baseline hazard functions that are close to the truth and are One thousand data sets were generated under a model with Weibull baseline hazard functions described in "Simulation study" section with n = 5, 000 and were fit to models with Weibull and B-spline parameterized baseline hazard functions. Point estimates and analytical standard errors (SE a ) were averaged across estimates. Empirical standard errors (SE e ) correspond to the standard deviation of the parameter sampling distributions. Coverage was calculated as the proportion of estimated 95% Wald-based confidence intervals that contained the true parameter comparable, if not slightly better, than the kernel smooth Nelson-Aalen estimates corresponding to separate Cox models.

Assessing the impact of education level on incident dementia in a large US cohort
A description of the semi-competing risks data and demographic variables used in the applied analysis are presented in Table 2. Of the 33,117 members in our analytic dataset, elementary or high school was the highest level of education in 43% of members, followed by trade school, some college or a college degree in 40%, and post-graduate study in 16.7%. More than half (56.6%) were female and 72.0% were of White race. By the end of the study, nearly half (47.1%) of members died without a diagnosis of dementia, with 19.1% censored before dementia or death, 5.0% alive with dementia, and 28.9% died carrying a diagnosis of dementia. In examining the outcome prevalences among the three education level groups, there were more dementia diagnoses and all-cause deaths among members in the elementary and high school group. The regression estimates from the fitted models are presented in Table 3 and Table C.1 of the Additional file 1. Across all models and time origins, increasing education level was associated with a decreased risk of dementia. In the primary analysis (on the age scale with the inclusion of a shared frailty presented in the first two columns of Table 3), the estimated hazard ratio (HR) and 95% confidence interval (CI) was 0.87 (0.83, 0.92) comparing those with a trade degree, some college or a college degree to those who had an elementary or high school education. This protective effect was amplified when comparing those with post-graduate study to those who had an elementary or high school education (HR: 0.76, 95% CI: 0.71, 0.81). There was some variation in the estimated HR across different time origins and models. The hazard ratio comparing those with post-graduate study to those who had an elementary or high school education for the time-on-study model with shared frailty (third column of Table 3) was closer to the null with value 0.80 and to a greater degree in the time-on-study model without shared frailty (last two columns of Table 3) with an estimated HR of 0.83. Similarly, across all models and time origins, an education level of some college or more compared with a trade degree or less was associated with a decreased risk of death, with estimated HR and 95% CI in the primary model of 0.84 (0.81, 0.88) comparing those with a trade degree, some college or a college degree to those who had an elementary or high school education and 0.68 (0.65, 0.72) comparing those with post-graduate education to those who had an elementary or high school education. In our primary model, increased education level was associated with a reduced risk of death following dementia diagnosis. However, the time-on-study models (columns 3, 4, 7 and 8 of Table 3) provide no evidence of this association with point estimates that suggest an opposite effect.
In the models with a shared frailty and B-spline baseline hazard functions (first four columns of Table 3), the estimated frailty variance parameter, θ was 0.41 (0.37, 0.45) when age was the time scale and 0.25 (0.20, 0.31) for the time-on-study analysis, suggesting a greater degree of dependence between time-to-dementia and all-cause death when age is the time scale. When Weibull baseline hazards were assumed (presented in Table C.1 of the Additional file 1), the estimated frailty variance parameter, θ was comparable with value 0.36 (0.32, 0.40) when age was the time scale. However, the estimated frailty variance was null for the time-on-study analysis. This finding is difficult to explain, but an examination of the estimated baseline hazard functions from the frailty models indicate differences in the underlying baseline risk assumed in the various models. Figure 3 presents the estimated baseline hazard functions under B-spline or Weibull parameterizations and a shared frailty term for two time origins: 1) age 65 (age-scale, adjusted for left-truncation), and 2) study entry (time-on-study). After fitting a suite of models that varied the number of internal knots and the B-spline polynomial degree, the specification with the largest log-likelihood was defined by one internal knot with linear B-splines for the 1-and 2-transitions and two internals knots with cubic B-splines for the 3-transition. We observe that the baseline hazard for the 1-and 2transitions were similar for the B-spline and Weibull specifications and increasing over time, more so when time was on the age scale (left panel) than time-on-study (right Models were fit with and without a shared frailty term. Two time origins were considered: 1) age 65 (age scale, left-truncated data); and 2) study entry (time-on-study) * p-value for a categorical variable with k > 2 levels (educational attainment, race/ethnicity) is based on a (k − 1)-degree of freedom Wald test of linear hypotheses panel). For the 3-transition (death following dementia), the estimated baseline hazard functions were quite different for the B-spline and Weibull parameterizations, as the Weibull can only accommodate baseline hazards of the form of a power function, while B-splines can approximate a range of functional forms. For both B-spline and Weibull parameterizations, the estimated baseline hazard for the 3-transition indicate that the risk of death is highest shortly after dementia diagnosis, which has been observed in a study of survival after dementia diagnosis in five racial/ethnic groups [45]. This is followed by either decreases then increases (based on the B-spline parameterization) or decreases over time (based on the Weibull parameterization).

Discussion
For the analysis of left-truncated semi-competing risks data, we have provided methods and software for fitting an illness-death model with shared frailty assuming Weibull or B-spline baseline hazards. These methods were used to estimate the association of education level and dementia accounting for the competing risk of death in a cohort of 33,117 Kaiser Permanente Northern California members. We found a dose-response relationship between educational attainment and incident dementia, with a decreased risk of dementia associated with increasing levels of education, after adjusting for sex and race/ethnicity. The impact of education level and incident dementia is still not well understood, with published studies reporting both protective and null effects [46,47]. Our study supports that higher education is associated with a lower risk of dementia in a large US cohort. Note that the conclusions drawn for the outcome of interest (dementia) from the illness-death model with shared frailty aligned with those from alternative models that we considered, which omitted the shared frailty or used study entry as the time origin (as shown in Table 3). We expect that there will be variation in the estimates from these models depending on the data at hand, as was observed in the estimates of education level on the risk of death following dementia diagnosis in our study.
In our examination of model fitting operating characteristics via simulated data, we found that the regression parameters were estimated with negligible bias and good coverage. However, on average the frailty variance parameter was slightly underestimated for the Weibull baseline hazard parameterization and overestimated for the B-spline baseline hazard parameterization. For the case of the Weibull baseline hazard parameterization, the coverage was conservative for a sample size 5,000, but was closer to 95% when the sample size was increased 10,000 (see Table B.2 in the Additional file 1). For the case of the B-spline baseline hazard parameterization, the coverage was lower than 95% due to the bias in the frailty variance parameter. It is important to note that primary interest in the methods presented in this paper are the regression parameters. The frailty, and corresponding frailty variance parameter, allow us to further account for the dependence between the nonterminal and terminal events beyond covariate adjustment, analogous to a random effect in a random effects model. Similar to a random effects model, the primary interest lies in the mean outcome model and regression estimates; the variance parameter of the normally distributed random effects are typically of secondary interest.
In the analysis of dementia diagnoses such as those presented in this paper, prevalent cases at study entry are typically excluded. However, the likelihood in "Likelihood" section can be easily updated (see Additional file 1: Section D) to include prevalent nonterminal cases. Note that in the literature, there are two approaches for handling prevalent nonterminal cases in the analysis of left-truncated semi-competing risks data. The approach we take conditions on the history up to the left truncation time as in [25,37] so that prevalent nonterminal cases only contribute to the estimation of λ 3 . Estimation is straight-forward assuming an illness-death model with shared frailty since the frailty term, γ , can be easily integrated out. Alternatively, Saarela and colleagues [24] provided methods for estimation that conditions on the left-truncation time only, so that prevalent cases contribute to the estimation of all transition hazards. This approach is more efficient as it uses more of the data, but is computationally more intensive as it involves numerical integration. This approach does not accommodate an illness-death model with shared frailty well since the integration of the shared frailty term is not straight-forward.
In our approach to fitting an illness-death model with shared frailty to left-truncated semi-competing risks data, we have considered fully-parametric specifications of the baseline hazard functions, Weibull and B-spline. Both functional forms are flexible and can approximate a wide range of baseline hazard functions. At the time of submission, a pre-print by Gorfine et al. [48] proposed a semi-parametric approach to the illness-death model with shared frailty using a pseudo-likelihood approach to estimating the regression parameters and baseline hazard functions that accommodates left-truncated semicompeting risks data. While the illness-death model with shared frailty in this paper was formulated using hazard models that are conditional on the frailty in (4)-(6), Gorfine et al. [48] focused on marginal Cox hazard models. This is analogous to the conditional and marginal approaches to modeling mean outcomes in the presence of clustering via mixed models [49] and generalized estimating equations [50], respectively. Thus our approaches are complementary, filling a gap in the literature and allowing the analyst options for fitting an illness-death model with shared frailty to left-truncated semi-competing risks data.
One of the reviewers pointed out that dementia diagnoses may be subject to interval-censoring. To explore the possibility and/or extent of interval-censoring in the cohort, we looked at the patterns of inpatient and outpatient visits (during which dementia might be assessed) among two groups of members: those who were diagnosed with dementia during the study, and those who died without a dementia diagnosis. The concern is that long gaps between visits would lead to imprecise dementia diagnosis dates in the former and missed opportunities for dementia diagnoses in the latter. We found that among those who were diagnosed with dementia during the study, 81% had a visit with a physician within 60 days prior to the diagnosis date in the EMR. For those who died without a dementia diagnosis, 90% had a visit within 60 days prior to death. Plots of individuallevel visit patterns over the study period (see Supplementary File Figures C1 and C2) among members of these two groups illustrate that utilization is high in this cohort.
Based on these data, we believe that interval-censoring is not of major concern in our data, as it is in prospective studies of Alzheimer's disease or dementia, such as PAQUID [51,52] and the Adult Changes in Thought Study [53,54], where dementia screening can be years apart. While analyses of data from those prospective studies are indeed complicated by interval-censoring, they were designed for the purpose of understanding incident dementia with identification of dementia cases based on a battery of neuropsychological testing and confirmation by a neurologist. An EMR-based analysis of a cohort of high care utilizers may avoid interval-censoring, however, may capture dementia cases with less rigor. At KPNC, a similar set of EMR code used to identify dementia diagnoses was shown to have a sensitivity of 77% and a specificity of 95% compared with a consensus dementia diagnosis utilizing a neuropsychiatric battery, structured interviews, physical examination, and medical records review. If interval-censoring were evident in our data, modeling should be updated to account for interval-censoring. This can be done by updating the likelihood function, as in Touraine, et al. 2017 [52]. In the setting of a shared frailty illness-death model presented in this paper, this is an avenue for future research.
It is important to mention that we provide methods for fitting a shared frailty illness-death model subject to left-truncated data when the covariate of interest is fixed with time. As one reviewer aptly pointed out, time-varying covariates may also be of interest to an analyst. We believe that the modeling framework specified in this paper can incorporate time-dependent covariates. However, deriving the likelihood function for a shared frailty model requires marginalization (integration) of the frailty term, which is complicated when time-dependent covariates are used and, such, beyond the scope of this paper. We intend to explore the implementation of the proposed extension with time-varying covariates in future work.

Conclusions
As illustrated by our analysis of Kaiser data, our proposed modeling framework allows the analyst to assess the impact of covariates on semi-competing risks data, such as incident dementia and death, while accounting for dependence between the outcomes when data are lefttruncated, as is common in studies of aging and dementia. This approach has the potential to be applied to a wide range of settings beyond the field of aging.