Relaxing the assumption of constant transition rates in a multi-state model in hospital epidemiology

Background Multi-state models are being increasingly used to capture complex disease pathways. The convenient formula of the exponential multi-state model can facilitate a quick and accessible understanding of the data. However, assuming time constant transition rates is not always plausible. On the other hand, obtaining predictions from a fitted model with time-dependent transitions can be challenging. One proposed solution is to utilise a general simulation algorithm to calculate predictions from a fitted multi-state model. Methods Predictions obtained from an exponential multi-state model were compared to those obtained from two different parametric models and to non-parametric Aalen-Johansen estimates. The first comparative approach fitted a multi-state model with transition-specific distributions, chosen separately based on the Akaike Information Criterion. The second approach was a Royston-Parmar multi-state model with 4 degrees of freedom, which was chosen as a reference model flexible enough to capture complex hazard shapes. All quantities were obtained analytically for the exponential and Aalen-Johansen approaches. The transition rates for the two comparative approaches were also obtained analytically, while all other quantities were obtained from the fitted models via a general simulation algorithm. Metrics investigated were: transition probabilities, attributable mortality (AM), population attributable fraction (PAF) and expected length of stay. This work was performed on previously analysed hospital acquired infection (HAI) data. By definition, a HAI takes three days to develop and therefore selected metrics were also predicted from time 3 (delayed entry). Results Despite clear deviations from the constant transition rates assumption, the empirical estimates of the transition probabilities were approximated reasonably well by the exponential model. However, functions of the transition probabilities, e.g. AM and PAF, were not well approximated and the comparative models offered considerable improvements for these metrics. They also provided consistent predictions with the empirical estimates in the case of delayed entry time, unlike the exponential model. Conclusion We conclude that methods and software are readily available for obtaining predictions from multi-state models that do not assume constant transition rates. The multistate package in Stata facilitates a range of predictions with confidence intervals, which can provide a more comprehensive understanding of the data. User-friendly code is provided. Supplementary Information The online version contains supplementary material available at (10.1186/s12874-020-01192-8).

The Aalen-Johansen estimator generalises the Kaplan-Meier estimator to Markov multistate processes. It was proposed by Aalen and Johansen [1] and has been discussed in detail by Andersen et al [2]. Following Allignol et al [3], let N ab (t) be the number of direct transitions a → b occurring up to time t and Y a (t) be the number of individuals in state a at the time immediately before t, i.e. the number of individuals at risk of the transition a → b. The matrix of cumulative transition hazards A(t) can be estimated by the Nelson-Aalen estimator [2]: The Aalen-Johansen estimator of the transition probabilities is then: Where Â ab (t k ) a = b is the number of observed direct a → b transitions divided by the number of individuals in state a at the time immediately before t k . The diagonal entries Â aa (t k ) are such that the row equals 0. The Aalen-Johansen estimator is a matrix of step-functions, changing only at the times when an event is observed. Expected length of stay can then be easily calculated as a summation of rectangles.

Exponential Model
For transitions i = 1, ..5, the transition rates h i (t) for the exponential model are a non-negative

Royston-Parmar Model (4 degrees of freedom)
Royston-Parmar models utilise restricted cubic splines to model the effect of time on the log cumulative hazard ln{H i (t)} scale. A Royston-Parmar model with K knots can be fitted by creating K − 1 derived variables [4]. For models with K = 5, there will be 4 derived variables for each transition i: z ij , j = 1, ...4 and 5 knots for each transition i: k il , l = 1, ...5. The equation for ln{H i (t)} and all necessary components are [4]: Where γ ij are the parameters to be estimated from the data. The internal knot locations k i2 , k i3 and k i4 were chosen as the 25 th , 50 th and 75 th centiles of the distribution of uncensored log event times for transition i, respectively [4]. k i1 and k i5 are the boundary knots, located at the minimum and maximum of the uncensored log event times [4]. The model described here has 5 parameters: the 4 derived variables and the constant term. When fitting this model in Stata using merlin or stpm2, the user would specify 4 degrees of freedom, as the intercept is included by default. For consistency with the programming, we have named this model "RP(4)" (Royston-Parmar with 4 degrees of freedom) in the main text.
The transition rates involve the derivative of the cubic spline functions. See Royston and Parmar [5] and Lambert and Royston [4] for more details on these models.

AIC Model
The "AIC" model involved choosing the distribution with the lowest AIC for each transition. The candidate models were: exponential, Weibull, Gompertz, log-logistic, log-normal, generalised gamma and Royston-Parmar models with 2 to 5 degrees of freedom. The AIC results are given in the Results section under the subsection Transition Rates. The chosen distributions for each transition are repeated here: Parametrisation of the generalised gamma and log-normal distribution are given below. The parameters to be estimated are specific to each transition i, but the subscript has been dropped for easier viewing.

Generalised Gamma
The parametrisation follows the documentation in Stata for streg [6]. The transition rate is defined as h(t) = f (t)/S(t), where the three parametrised gamma density and survivor functions are defined as: Where γ = |κ| −2 , z = sign(κ){ln(t) − µ}/σ, u = γexp(|κ|z), Φ(.) is the standard normal cumulative distribution function and I(a, x) is the incomplete gamma function. The parameter µ and ancillary parameters κ and σ are to be estimated from the data. For more details see the help file for streg [6].

Log-Normal
The parametrisation follows the documentation in Stata for streg [6] and also that of Royston [7]. The transition rates are defined as h(t) = f (t)/S(t): Where Φ(.) is the standard normal cumulative distribution function. µ is the location parameter and σ 2 is the variance of random variable T . For more details see Royston [7] or the help file for streg [6].