Additive and multiplicative hazards modeling for recurrent event data analysis

Background Sequentially ordered multivariate failure time or recurrent event duration data are commonly observed in biomedical longitudinal studies. In general, standard hazard regression methods cannot be applied because of correlation between recurrent failure times within a subject and induced dependent censoring. Multiplicative and additive hazards models provide the two principal frameworks for studying the association between risk factors and recurrent event durations for the analysis of multivariate failure time data. Methods Using emergency department visits data, we illustrated and compared the additive and multiplicative hazards models for analysis of recurrent event durations under (i) a varying baseline with a common coefficient effect and (ii) a varying baseline with an order-specific coefficient effect. Results The analysis showed that both additive and multiplicative hazards models, with varying baseline and common coefficient effects, gave similar results with regard to covariates selected to remain in the model of our real dataset. The confidence intervals of the multiplicative hazards model were wider than the additive hazards model for each of the recurrent events. In addition, in both models, the confidence interval gets wider as the revisit order increased because the risk set decreased as the order of visit increased. Conclusions Due to the frequency of multiple failure times or recurrent event duration data in clinical and epidemiologic studies, the multiplicative and additive hazards models are widely applicable and present different information. Hence, it seems desirable to use them, not as alternatives to each other, but together as complementary methods, to provide a more comprehensive understanding of data.


Background
Sequentially ordered multivariate failure time data or recurrent event time data are commonly observed in biomedical longitudinal studies. Examples include tumor recurrences, epileptic seizures, asthma attacks, migraines, infectious episodes, myocardial infarctions, injuries, and admissions to the hospital.
In general, standard hazard regression methods cannot be applied because of correlations between multivariate failure or recurrent event times within a subject. Adjustment is necessary for existing correlations, and more sophisticated analytic approaches are needed to obtain accurate estimates and efficient inferences. In the presence of the dependence between recurrent event times within a subject and subject-specific susceptibility across subjects, a variety of statistical methods have been proposed for the estimation of the covariate effect. In survival analysis, multiplicative and additive hazards models provide the two principal frameworks for studying the association between risk factors and recurrent event durations for the analysis of multivariate failure time data.
The majority of existing regression methods for analyzing multivariate failure or recurrent event time data assumes multiplicative covariate effects. Various authors have considered multivariate failure time models to be extensions of the Cox proportional hazards model [1]. The multivariate model with a Markov assumption, the conditional approach, the marginal approach, and the random effects approach are among them. Anderson and Gill proposed use of modeling under a Markov assumption [2]. Wei et al and Lee et al proposed use of the marginal approach [3,4]. Prentice et al proposed use of a semi-parametric model when multivariate failure times are conditionally independent, given the covariates [5]. Others used the random effect frailty model or the conditional frailty model for such recurrent event data analysis [6,7]. The popularity of these multiplicative models derives not only from their utility and wide applicability, but also from convention and the availability of statistical software. In general, consideration is not given to the possibility that the true underlying covariate effects may add to, rather than multiply, the baseline hazards. The semiparametric additive hazards model proposed by Lin and Ying [8] is the most closely connected analogue of the multiplicative Cox hazards model. Their additive hazards model assumes that covariates act in an additive manner on an unknown baseline hazard rate and that the effect of a covariate is time-invariant. Numerous authors advocated and utilized the additive hazards models for multivariate failure time data [9][10][11][12][13][14].
In this paper, we applied both multiplicative and additive models to the pediatric firearm victim's emergency department visit data. We considered the gap time model to be a multiplicative hazards model, as recommended for analysis of recurrent event time data by several authors [7,15]. We considered the Lin and Ying's model [10] to be an additive hazards model in our data analysis. The multiplicative and additive hazards models for analysis of recurrent event data with two scenarios were considered: (i) a varying baseline with a common coefficient effect and (ii) a varying baseline with an order-specific coefficient effect. The proposed models were applied to the emergency department (ED) visits of pediatric firearm victims, and difference between the models was examined.
Four additional sections comprise this paper. In Section 2, the models and methods for the analysis of recurrent event duration data are reviewed. In Section 3, a description of the ED visit study and the methods applied to this dataset are provided. Section 4 contains a discussion, in which the applicability and appropriateness of each model are discussed.

Methods
Within the framework of the multiplicative or additive hazards regression models, a variety of models have been proposed and utilized in real applications. Among the rich selection of different models, the gap time model as an extension of the multiplicative Cox proportional hazards model [5] and the Lin and Ying's additive model (L-Y model) [8,10] received the greatest attention due to easy interpretation of the covariate effects. These two models assume unspecified baseline hazards and constant covariate effects. In the models, we will assume that all censoring is non-informative and independent.

Basic Notations
Suppose that there are n subjects and that each subject can experience K failures or recurrent events. Suppose that censoring is non-informative, which means that knowledge of a censoring time for a subject provides no further information about the subject's likelihood of survival at a future time. Let T ik be the time when the kth failure occurs for the ith subject and C ik be the corresponding censoring time. T ik is measured from the subject's study enrollment and the censoring C ik occurs after the subject has been entered into a study to the right of the last known failure time; thus, it is right censoring. When T ik is subject to right censoring, the kth failure time X ik is a minimum of (T ik , C ik ), i.e., X ik is equal to T ik if the event was observed and is equal to C ik if it is censored. Let δ ik = I(T ik ≤ C ik ), where I(.) is an indicator function and takes the value 1 when T ik ≤ C ik and is 0 otherwise. Let Z ik be a covariate vector of p-dimensions for the ith subject at the kth failure. For each of the K failures, the hazard function for the ith subject with respect to the kth failure,l ik (t), is assumed to take additive or multiplicative forms.

Multiplicative hazards model
The gap time model requires the same assumptions as the Cox proportional hazards model, but they allow the baseline hazard to vary from recurrence to recurrence. Gap time is defined the time between two successive failures experienced by the same subject [5]. For the gap time model, the hazard function is where t is the time since a patient's study enrollment and t k-1 is the time of the (k-1)th failure. Note that l 0k (t) are unspecified baseline hazard functions varying with k = 1, .., K. The corresponding partial likelihood function [16,17] is where G i,k = X ik -X i,k-1 is the inter-event or gap time interval and Y jk (t) = I (G ik ≥ t) is a risk set indicator. β is a p-vector of regression coefficients of Z i .
In order to draw a semi-parametric inference on β for the model (1), the score functions U( β) are obtained by differentiating the logarithm of L( β) with respect to β .
The maximum partial likelihood estimatorˆ β is obtained by solving the corresponding score equation, ∂ ln L β ∂ β = 0 . When failure times are independent, the variance-covariance matrix is estimated from the inverse of the information matrix, I −1 (ˆ β) , called the "naïve" variance-covariance matrix; however, when failure times are dependent, I −1 (ˆ β) is not a good estimator of the variance-covariance matrix. When there are dependencies, the variance-covariance matrixQ (ˆ β) , the so-called "sandwich" or "robust" variance-covariance estimator, is is a data-based estimator, i.e., the cross-product of the empirical score residual matrix W(ˆ β) . Here, Therefore, it turns out that , which is called the "robust" variance-covariance estimate, and a detailed derivation is given by Wei et al [16] and Lin [17]. To account for the within-subject correlation, we used this robust "sandwich" method in the estimation of standard errors.

Additive hazards model
The additive hazards model is considered for multivariate survival data in which individuals may experience events of same or different types and in which there may also be correlation between individuals. A p-vector of the covariates Z ik in the additive hazards model acts additively on unknown baseline hazards, while it acts multiplicatively in the multiplicative model. The hazard function l ik (t) for the kth gap time t since a patient's last failure is in a linear form, Where l 0k (t) is the unknown and unspecified baseline hazard function for the kth gap time and β is a p x 1 vector of the regression coefficients. When there is only one failure event (i.e., K = 1), the model (3) reduces to a univariate additive hazards model [8].
It turns out that a convenient representation of the data is given by the counting process, N ik (t). With the commonly employed counting process notation, an atrisk indicator is defined as Y ik (t) = I(G ik ≤ t), and observed-event counting processes are defined as By the Doob-Meyer decomposition, where M ik (t) is a local square-integrable martingale with respect to F ik (t) [2]. As a result of the underlying correlation, M ik (t) is not a martingale with respect to the joint filtration generated by all the failures, censoring, and covariates up to time t [18]. From (4), If theˆ β denote the estimates of the true regression parameter β , then under the working independent assumption, the cumulative baseline hazard function Λ 0k (t) for the kth failure can be estimated bŷ . Lin and Ying (1997) proposed to estimate β from the following estimating function By substitutingˆ 0k (t), the above function is equivalent to The estimate of the model parameter β is obtained by solving the equation U( β) = 0 and we obtain the consis- where for any vector a, a ⊗2 = aa T . The variancecovariance matrix of β may be estimated by Using empirical process theory, U ( β ) is shown to be a sum of independently, identically distributed random variables and thus follows a zero-mean Gaussian process by the functional central limit theorem; see Pollard [19], page 53, or van der Vaart & Weller [20], Section 2.11. Using Taylor's series expansion and some probability arguments, n 1/2 (ˆ β − β) converges in distribution to a zero-mean normal distribution [11].

Study Description
The pediatric firearm victim's ED visit study was a retrospective cohort study. Data consisted of medical record reviews of follow-ups of firearm victims younger than 19-years-old who were presenting to the Pediatric Emergency Department/Trauma Center at the Children's Hospital of Wisconsin and all other hospitals in the Milwaukee metropolitan area between 1990 and 1997.
More detailed descriptions of the study design and profile have been published elsewhere [21,22]. Briefly, a total of 511 subjects were eligible for this study; this sample was taken from the pediatric firearm ED visit database. The events of interest are the times to ED revisit following the initial visit due to injury. Each subject experienced, at various times, a varying number of visits to the ED, which represent the whole observable history of his/her recurrences. Each subject has a number of ED visits and contributes several observations, which are dependent when there is inter-subject variation. Of these 511 subjects, 263 (51.5%) had at least one ED revisit during the follow-up period (median follow-up time = 3.2 years). Table 1 summarizes the number of events experienced by the 511 subjects during the follow-up period. A total of 571 occurrences of ED revisits were observed, with some persons experiencing a rather large number (up to six revisits). In our study, any ED revisit due to injuries was defined as a recurrent event, and up to four revisits per subject were used in the subsequent analysis, as too few subjects experienced more than four revisits. The main purpose of this study, however, was to illustrate the application of the multiplicative and additive hazards models to recurrent event duration data and provide comparisons between the models, rather than give universally valid estimates for ED revisits in a pediatric population.

Analysis of the dataset
We applied the multiplicative and additive hazards described in Section 2 to the ED visit gap time data, allowing a varying baseline with common coefficient effects and with order-specific coefficient effects, respectively. The adequacy of the models was assessed by residuals and Arjas plots. The ED revisit was defined as the recurrent event. When a subject is already in the ED, the subject is not at the risk for an ED revisit. Four baseline characteristicsage, gender (male ≡ 1; female ≡ 0), race/ethnicity (black ≡ 1; others ≡ 0), and parents (1 if subject had parents as guardian; 0 otherwise)were included in the models. We fitted the multiplicative gap time model and the L-Y additive hazards model. When we assume that the regression parameters were similar for all ED revisits or are interested in global covariate effects, we can adopt a model with a common covariate effect, i.e., β k = β for all k = 1, .., K. Table 2 shows the coefficient estimates and standard errors of the common covariate effects. When we assume that the regression parameters are different for each ED revisit order, we can use a model with an order-specific covariate effect. Table 3 shows the coefficients estimates and standard errors of the order-specific covariate effects. The estimates from the additive and multiplicative hazards models had the same signs, indicating the same directions of the covariate effects. The standard errors from the additive hazard model were smaller than those of the multiplicative model. However, while the p-values for the two models differ, the inferences were consistent. Almost all of the models from both the common and order-specific covariate effects showed that gender was the significant risk factor for ED revisits.
Both additive and multiplicative hazards models with varying baseline and common coefficient effects gave similar results with regard to covariates selected to remain in the model (Tables 2). Three covariates showed significant impact on ED revisits in both hazards models: gender, race/ethnicity, and age. The result obtained under the additive model with a varying baseline and common coefficient effect in Table 2 suggested that females tended to have more-delayed ED revisits, compared to males, and being older and black were associated with significantly shorter gap times. On the other hand, the gap time did not seem to be related to having a parent as a guardian; the p-values were 0.167 and 0.124 for the additive and multiplicative hazards model, respectively. Estimates of the ED revisit order-specific covariate effects for the models with varying baselines are shown in Table 3. For all orders, k = 1, .., 4, gender was again the only significant risk factor for an ED revisit in the multiplicative hazards model, but this was not true for the additive hazards model. In both models, age was significant for the 2 nd revisit but not for any other revisit. Ethnicity and having a parent as a guardian were not significant in the additive model for all k.
To illustrate the prediction of the survival probability for a given subject, Figures 1a-d and 2a-d (a dashed curve for the multiplicative hazards model; a solid curve for the L-Y additive hazards model) showed the estimated survival curves for a 15-year-old patient, a black male with a parent as guardian, under the multiplicative and additive hazards models with a varying baseline and common coefficient effect. The selected covariate values were roughly the sample median. Figures 1a-d show the estimated survival functions, based on the additive and multiplicative hazards models with a varying baseline and common coefficient effect were very similar. Comparing to the Kaplan-Meier estimate, the estimates of both additive and multiplicative hazards models were larger than the K-M estimate for all k. However, the differences of the estimate increased as the order increased. Figures 2a-d show the estimated survival curves based on the additive and multiplicative hazards models, with a varying baseline and order-specific coefficient effect, for each order k = 1, .., 4. For k = 1, 2, 3, the survival curves of these two models were almost identical. For the higher revisit orders (Figures 2c-d), the multiplicative hazards model had slightly higher survival curves than the survival curves estimated under the additive hazards model. The confidence intervals of the additive and multiplicative hazards models were similar except the order 4 and get wider as the revisit order increases because the risk set decreased as the revisit order increased. All these facts concerning the prediction of the survival probability were consistently observed when the values of the covariates were changed. The martingale and deviance residuals for these two models with common coefficient effect showed that the models fit well (Figure 3a-d). The residuals plots and the Arjas plots of the covariates for the models with a varying baseline and order-specific covariate effect showed that the models also fit well. (figures not shown). For all estimations in this study, we used SAS PHREG procedure to fit the

Discussion
Among recurrent event data, correlation between event durations within a subject exists. For an example, one can suppose that more frequently a subject experiences episodes of injury, the sooner the next injury is likely to occur. In this study, the additive and multiplicative hazards regression models for the recurrent event duration analysis were examined and illustrated with a real dataset. Differences in estimates from the models under (i) a varying baseline with a common covariate effect and (ii) a varying baseline with an order-specific covariate effect were compared using the pediatric firearm victim's ED visit data. The additive and multiplicative models revealed similar results with regard to covariates selected to remain in the model: gender, race/ ethnicity, and age. The estimated survival functions, based on the additive and multiplicative hazards models from our data, were similar. Our example showed that the goodness-of-fit of both multiplicative and additive hazards models was satisfactory. The standard errors increase as the order increases because the size of the risk set for the models decreases after each revisit. If the risk set decreases rapidly, then it yields estimates that are less reliable with a small risk set size. However, in our study, the standard errors increased moderately. The coefficients of the models cannot be compared directly because the coefficients of the former act in a multiplicative way on an unknown baseline hazard, whereas the coefficients of the latter act in an additive way on unknown baseline hazard or represent coefficient function for added risks. A naïve way of comparing the models would be comparing p-values, which would indicate the power of rejecting the null hypothesis for selected covariate in the models. In order to detect any difference between models in terms of prediction, a comparison between two modelbased survival curves with the nonparametric estimate of the survival function could be performed. In our study, the survival curves of these two models were larger than the Kaplan-Meier estimate for all order, but the differences were negligible.
The additive model is plausible for many applications and is often attractive in epidemiologic applications, for example, a study of diabetic patients [23]. In such a study, λ 0 is taken to be the baseline mortality of the standard population and b measures excess risk for the patients under study. The excess mortality is more appealing than the relative mortality to provide an inference on how the study population' mortality differs from that of the standard population.
The additive and multiplicative hazards models can capture the risk process for patients with average comorbidity profiles equally well. In cases where both the additive and multiplicative models fit the data fairly well, an additive specification may be preferred, due to the interpretation of the regression parameters. One of the major advantages of using the additive hazards model over the multiplicative hazards model is that the resulting regression parameter estimator has a closed form [13]. Regression coefficients from the additive model are more interpretable in public health research since they represent differences in event rates, as opposed to ratio [24]. A practical drawback of using the additive models is that the current standard procedure for fitting additive models is still limited, whereas statistical software for the multiplicative model is available and easy to use.
In the presence of the dependence between recurrent events in multivariate failure time data, frailty model have been proposed for the estimation of the covariate effect by incorporation of additional unobserved random frailty effects into standard survival models [6]. The covariate estimates in the frailty model are estimated conditionally on the unobservable frailty, and because of this, their interpretation is often ambiguous [7]. When the primary interest of investigation is a measurement of the dependence of correlated repeated events within a subject, the frailty model approach is adequate [25]. However, our study does not focus on the dependence measurement of recurrent events.
Recurrent event duration data are the archetypical example of series data, which differ from parallel multivariate failure time data. Because the study period is typically less than the first failure time, the marginal distribution of the second gap time is not identifiable unless within-subject failure times are independent. Even if the total times are censored independently, the subsequent failure times will be subject to induced dependent censoring [17,18]. To analyze such recurrent event duration data, the non-informative censoring assumption is required for the validity of the statistical analysis. However, when the recurrence is influencing a censoring mechanism such as dropout or death, censoring is informative about the event process; therefore, the non-informative censoring assumption is violated, and subjects in the risk set do not form a representative sample from the target population. An important assumption of the models examined in this paper is that the recurrent event process is independent of the censoring process. Suitable modification of the methodologies needs to be further studied to adjust for such informative censoring mechanisms related to terminal events in the recurrent event analysis [26]. In addition, there is relatively little information in the literature on the goodness-of-fit for multiple failure time models.

Conclusion
In this study, we illustrated and compared the additive and multiplicative hazards models for analysis of recurrent event durations. In summary, the choice between the additive and multiplicative models will typically be an empirical matter. Due to the frequency of recurrent event duration data in clinical and epidemiologic studies, the proposed additive and multiplicative methods are widely applicable. The two modeling approaches have sound biological bases, providing complementary information about the association between risk factors and death. An overall conclusion is that the additive and multiplicative hazards models present different aspects of the association between risk factors and the event durations. Hence, two hazards models give different information and it seems desirable to use themnot as alternatives to each other, but as complementary methodstogether to gain a more comprehensive understanding of the data. Practitioners may benefit from the use of these statistical models, which help in predicting the effect of one or more variables and in verifying their influence on study outcomes.