### Multi-state model for hospital infections

We focus on estimating the cLOS in the hospital due to HIs and study the bias that arises when the cLOS is estimated by treating patients who die as censored.

To do so, we describe the data setting with a multi-state model, as proposed, e.g., in [7]. Figure 1 displays this model (model A), a multi-state model with states 0 = admission, 1 = infection, 2 = discharge alive and 3 = death. For simplicity, we assume that the hazard rates are constant over time, so that we can focus on the key points concerning the censoring of death cases. We denote by *α*_{ij}(*t*) = *α*_{ij} the hazard of moving from state “i” to state “j”. For example, the infection hazard satisfies

$$ \alpha_{01}(t)\,\Delta t \approx P(\text{HI acquired by time } t + \Delta t \mid \text{no HI up to time } t). $$

The actual hazard *α*_{01}(*t*) is obtained by dividing by *Δt* and taking the limit as *Δt*→0. We define the hazard rates *α*_{01} = infection hazard rate; *α*_{02} = discharge hazard rate without infection; *α*_{03} = death hazard rate without infection; *α*_{12} = discharge hazard rate with infection; and *α*_{13} = death hazard rate with infection. Under the constant hazards assumption, one estimates *α*_{ij} by the maximum likelihood estimator

$$ \hat{\alpha}_{ij} = \frac{\text{number of i} \to \text{j transitions}}{\text{person-time in state i}}. $$

(1)
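As a minimal Python sketch of estimator (1), assume the data have already been reduced to a list of observed transitions and the total person-time spent in each transient state. The input format, the function name `estimate_hazards` and the toy numbers are hypothetical illustrations, not data from a real study.

```python
from collections import Counter

# Transitions allowed in model A (0 = in hospital without HI, 1 = with HI,
# 2 = discharged alive, 3 = dead).
ALLOWED = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]


def estimate_hazards(transitions, person_time):
    """Occurrence/exposure estimate of constant hazards, as in Eq. (1).

    transitions: iterable of (i, j) pairs, one per observed i -> j transition
                 (hypothetical input format).
    person_time: dict mapping state i to total observed time spent in state i.
    """
    counts = Counter(transitions)
    # Number of i -> j transitions divided by person-time in state i;
    # transitions that never occur get an estimate of 0.
    return {(i, j): counts[(i, j)] / person_time[i] for (i, j) in ALLOWED}


# Hypothetical toy data: 100 admissions, 20 infections,
# 1000 patient-days in state 0 and 400 patient-days in state 1.
transitions = ([(0, 1)] * 20 + [(0, 2)] * 70 + [(0, 3)] * 10
               + [(1, 2)] * 12 + [(1, 3)] * 8)
person_time = {0: 1000.0, 1: 400.0}
alpha_hat = estimate_hazards(transitions, person_time)
# alpha_hat: 0.02, 0.07, 0.01 out of state 0 and 0.03, 0.02 out of state 1
```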

Under this model, the mean sojourn time of an infected patient in state 1 is \(\frac {1}{\alpha _{12}+\alpha _{13}}\) and the mean sojourn time of an uninfected patient in state 0 is \(\frac {1}{\alpha _{01}+\alpha _{02}+\alpha _{03}}\). We write *X*_{t} ∈ {0,1,2,3} for the state occupied by the patient at time *t*. By definition, all individuals start in the initial state 0 of being alive in the hospital and free of HI, i.e., *X*_{0} = 0. Let *T* = inf{*t* : *X*_{t} ∈ {2,3}} denote the first time at which the process is in an absorbing state; the hospital stay ends at time *T*, with *X*_{T} ∈ {2,3}.
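The process *X*_{t} and the stopping time *T* can be made concrete with a small Monte Carlo sketch of model A. Under constant hazards, the sojourn time in state 0 is exponential with rate *α*_{0·} = *α*_{01} + *α*_{02} + *α*_{03}, and the next state is *j* with probability *α*_{0j}/*α*_{0·}; state 1 works the same way with rate *α*_{12} + *α*_{13}. The function `simulate_patient`, the seed and the per-day hazard values are illustrative assumptions. The empirical mean stay should be close to the mean sojourn in state 0 plus the infection probability times the mean sojourn in state 1.

```python
import random


def simulate_patient(a01, a02, a03, a12, a13, rng):
    """Simulate one path of model A with constant hazards (illustrative helper).

    Returns (infection time or None, T, X_T), where T is the end of the stay.
    """
    a0 = a01 + a02 + a03                 # total hazard out of state 0
    t = rng.expovariate(a0)              # sojourn time in state 0
    u = rng.random() * a0                # pick which competing transition fired
    if u < a02:
        return None, t, 2                # discharged alive without HI
    if u < a02 + a03:
        return None, t, 3                # died without HI
    t_inf = t                            # HI acquired: X jumps from 0 to 1
    a1 = a12 + a13                       # total hazard out of state 1
    t += rng.expovariate(a1)             # sojourn time in state 1
    x_end = 2 if rng.random() * a1 < a12 else 3
    return t_inf, t, x_end


rng = random.Random(2024)
hazards = (0.02, 0.07, 0.01, 0.03, 0.02)   # hypothetical per-day hazards
paths = [simulate_patient(*hazards, rng) for _ in range(100_000)]
mean_T = sum(T for _, T, _ in paths) / len(paths)
share_infected = sum(t_inf is not None for t_inf, _, _ in paths) / len(paths)
# Theory: E(T) = 1/a0 + (a01/a0) * (1/a1) = 10 + 0.2 * 20 = 14 days; P(HI) = 0.2.
```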

To evaluate the impact of HIs on the subsequent hospital stay, Schulgen and Schumacher (1996) [6] suggested considering the difference in expected subsequent stay given the infection status at time *s*, *ϕ*(*s*) = *E*(*T* | *X*_{s} = 1) − *E*(*T* | *X*_{s} = 0). Schulgen and Schumacher called *ϕ*(*s*) the 'expected extra hospitalization time of an infected individual dependent on time s'. In our setting, the constant hazards make the process a time-homogeneous Markov process. Allignol et al. [7] studied the cLOS for model A (Fig. 1) mathematically and found that, in the homogeneous case, the cLOS does not depend on the time *s*. The cLOS can therefore be expressed as

$$ {} \text{CLOS}_{true} = \phi(s) = \left[\frac{\alpha_{02} + \alpha_{03}}{\alpha_{12} + \alpha_{13}}-1\right]\times \frac{1}{ \alpha_{01} + \alpha_{02} + \alpha_{03}} $$

(2)
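Eq. (2) can be cross-checked against the definition of *ϕ*(*s*). Because both conditional expectations contain the elapsed time *s*, *ϕ*(*s*) equals the difference of the expected residual stays, and under constant hazards these are 1/*α*_{1·} from state 1 and 1/*α*_{0·} + (*α*_{01}/*α*_{0·})·1/*α*_{1·} from state 0, with *α*_{0·} and *α*_{1·} the total hazards out of states 0 and 1. The sketch below computes *ϕ* both ways for hypothetical hazards; the function names are illustrative.

```python
def clos_true(a01, a02, a03, a12, a13):
    """Eq. (2): cLOS due to HI under constant hazards (model A)."""
    a0 = a01 + a02 + a03                    # total hazard out of state 0
    a1 = a12 + a13                          # total hazard out of state 1
    return ((a02 + a03) / a1 - 1) / a0


def clos_from_residual_stays(a01, a02, a03, a12, a13):
    """phi(s) computed directly from expected residual stays (Markov property)."""
    a0 = a01 + a02 + a03
    a1 = a12 + a13
    resid1 = 1 / a1                         # E(T - s | X_s = 1)
    resid0 = 1 / a0 + (a01 / a0) * resid1   # E(T - s | X_s = 0)
    return resid1 - resid0


hazards = (0.02, 0.07, 0.01, 0.03, 0.02)   # hypothetical per-day hazards
phi_formula = clos_true(*hazards)
phi_direct = clos_from_residual_stays(*hazards)
# Both are 6 days, up to floating-point rounding (20 - 14 = 6).
```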

Furthermore, Allignol et al. provided a formula that splits the cLOS into a part attributable to patients discharged alive and a part attributable to patients who die, under the constant hazards setup. This formula is given by

$$ {\begin{aligned} {} \text{CLOS} &= \text{CLOS(due to discharged alive)} \\ &\quad+ \text{CLOS(due to deaths)}\\ &= \frac{\alpha_{12}}{\alpha_{12} + \alpha_{13}}\times \text{CLOS} + \frac{\alpha_{13}}{\alpha_{12} + \alpha_{13}}\times \text{CLOS} \end{aligned}} $$

(3)

Hence, we can separately estimate the cLOS attributable to patients discharged alive and the cLOS attributable to death cases by plugging the estimates of the constant hazards obtained with (1) into (2) and (3).
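A short sketch of this plug-in approach, assuming hypothetical transition counts and person-time: the hazards are estimated with (1) and then inserted into (2) and (3). The data, the variable names and the helper `clos_decomposition` are illustrative only.

```python
def clos_true(a01, a02, a03, a12, a13):
    """Eq. (2): cLOS under constant hazards (model A)."""
    return ((a02 + a03) / (a12 + a13) - 1) / (a01 + a02 + a03)


def clos_decomposition(a01, a02, a03, a12, a13):
    """Eq. (3): split cLOS into parts due to discharge alive and due to death."""
    total = clos_true(a01, a02, a03, a12, a13)
    a1 = a12 + a13
    return total, a12 / a1 * total, a13 / a1 * total


# Hypothetical transition counts and person-time (days) per transient state.
counts = {(0, 1): 20, (0, 2): 70, (0, 3): 10, (1, 2): 12, (1, 3): 8}
person_time = {0: 1000.0, 1: 400.0}
# Plug-in estimates via Eq. (1): transitions out of i / person-time in i.
est = {(i, j): n / person_time[i] for (i, j), n in counts.items()}
total, due_discharge, due_death = clos_decomposition(
    est[(0, 1)], est[(0, 2)], est[(0, 3)], est[(1, 2)], est[(1, 3)]
)
# total is about 6 days, split into about 3.6 (discharge) + 2.4 (death).
```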

Model B results from model A when death cases are treated as censored. In contrast to model A, patients who die are implicitly assumed to remain at the same risk of being discharged alive as patients who are still in the hospital. While the discharge hazards of models A and B are the same, the absolute chance of discharge alive in model A depends on the competing risk of death and therefore differs from the discharge probability modelled in model B. To derive the cLOS that results from model B, we apply the formula of Allignol et al. with the death hazards set to zero, *α*_{03} = *α*_{13} = 0, which gives

$$ \text{CLOS}^{*} = \left[\frac{\alpha_{02}}{\alpha_{12}}-1\right]\frac{1}{\alpha_{01} + \alpha_{02}}. $$

(4)
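A minimal sketch of Eq. (4) with hypothetical hazards, showing that it coincides with Eq. (2) when the death hazards are set to zero. Because censoring a death keeps that patient's person-time up to death and only drops the death transition, the estimates of *α*_{01}, *α*_{02} and *α*_{12} from (1) are the same in models A and B, so the same estimates can be plugged into (4). The function names are illustrative.

```python
def clos_true(a01, a02, a03, a12, a13):
    """Eq. (2), model A."""
    return ((a02 + a03) / (a12 + a13) - 1) / (a01 + a02 + a03)


def clos_star(a01, a02, a12):
    """Eq. (4), model B: death cases treated as censored."""
    return (a02 / a12 - 1) / (a01 + a02)


hazards = dict(a01=0.02, a02=0.07, a03=0.01, a12=0.03, a13=0.02)  # hypothetical
naive = clos_star(hazards["a01"], hazards["a02"], hazards["a12"])
# Model B is model A with the death hazards removed:
same = clos_true(hazards["a01"], hazards["a02"], 0.0, hazards["a12"], 0.0)
# naive = 400/27, about 14.8 days, versus clos_true(**hazards) = 6 days.
```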

### Analytic expression for the bias

We now investigate the bias in the cLOS when patients who die are censored. Using Eqs. (2) and (4), we deduce that the bias in the cLOS due to censoring is

$$ {\begin{aligned} \text{CLOS}^{*} - \text{CLOS}_{true} =& \frac{\alpha_{03}(\alpha_{02} - \alpha_{12})}{\alpha_{12}(\alpha_{01} + \alpha_{02} + \alpha_{03})(\alpha_{01} + \alpha_{02})} \\ &+\frac{(\alpha_{02}\alpha_{13}- \alpha_{03}\alpha_{12})}{\alpha_{12}(\alpha_{01} + \alpha_{02} + \alpha_{03})(\alpha_{12}+\alpha_{13})}\\ =&\frac{\alpha_{03}(\alpha_{02} \,-\, \alpha_{12})}{\alpha_{12}\alpha_{0\cdot}\alpha^{*}_{0\cdot}} \,+\, \frac{(\alpha_{02}\alpha_{13}\,-\, \alpha_{03}\alpha_{12})}{\alpha_{0\cdot}\alpha_{1\cdot}\alpha_{12}}, \end{aligned}} $$

(5)

where *α*_{0·} = *α*_{01} + *α*_{02} + *α*_{03}, \(\alpha ^{*}_{0\cdot } = \alpha _{01} + \alpha _{02}\), *α*_{1·} = *α*_{12} + *α*_{13} and \(\alpha ^{*}_{1\cdot } = \alpha _{12}\). The formula shows that the bias is the product of the mean LOS in state 0, \(\frac{1}{\alpha_{0\cdot}}\), and a factor depending on all hazards. This second factor determines the direction of the bias, which can be positive or negative. In the following, we study the bias in specific settings, which we call differential mortality. We define “direct differential mortality” as the setting where the discharge hazards *α*_{02} and *α*_{12} are the same but the death hazards *α*_{03} and *α*_{13} differ. In contrast, “indirect differential mortality” is described by equal death hazards but different discharge hazards. Because death and discharge are competing risks, both settings influence the overall hospital mortality, directly or indirectly. We define *Δ*_{1} = *α*_{13} − *α*_{03} and *Δ*_{2} = *α*_{02} − *α*_{12} and emphasize that both quantities are likely to be positive, because infected patients often have a higher death hazard and a lower discharge hazard, i.e., they stay longer in the hospital.

A formal mathematical derivation of the bias can be found in Additional file 1.
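Eq. (5) can be sanity-checked without a computer algebra system by evaluating both sides in exact rational arithmetic (Python's `fractions` module) for many randomly drawn hazard sets. This sketch is a numerical check of the algebra, not a proof; the helper names, the seed and the sampling range are illustrative assumptions.

```python
import random
from fractions import Fraction


def clos_true(a01, a02, a03, a12, a13):
    """Eq. (2)."""
    return ((a02 + a03) / (a12 + a13) - 1) / (a01 + a02 + a03)


def clos_star(a01, a02, a12):
    """Eq. (4)."""
    return (a02 / a12 - 1) / (a01 + a02)


def bias_eq5(a01, a02, a03, a12, a13):
    """Right-hand side of Eq. (5)."""
    a0 = a01 + a02 + a03          # alpha_{0.}
    a0_star = a01 + a02           # alpha^*_{0.}
    a1 = a12 + a13                # alpha_{1.}
    return (a03 * (a02 - a12) / (a12 * a0 * a0_star)
            + (a02 * a13 - a03 * a12) / (a0 * a1 * a12))


rng = random.Random(7)
mismatches = 0
for _ in range(1000):
    # Random positive rational hazards, so the comparison below is exact.
    h = [Fraction(rng.randint(1, 100), 1000) for _ in range(5)]
    a01, a02, a03, a12, a13 = h
    if bias_eq5(*h) != clos_star(a01, a02, a12) - clos_true(*h):
        mismatches += 1
# mismatches == 0, since Eq. (5) is an algebraic identity.
```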

#### No differential mortality

The bias is fully determined by the hazard rates. In the following, we study its magnitude under the different forms of differential mortality. When there is no differential mortality, that is, no difference between the death hazards with and without infection and no difference between the discharge hazards with and without infection, *Δ*_{1} = *α*_{13} − *α*_{03} = 0 and *Δ*_{2} = *α*_{02} − *α*_{12} = 0, both numerators in (5) vanish and the bias becomes 0. For given values of the hazards, formula (5) can be used to get an idea of the magnitude and the direction of the bias when death cases are censored.
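The no-differential-mortality case can be checked by expressing the hazards of infected patients through the baseline hazards and the differences, *α*_{13} = *α*_{03} + *Δ*_{1} and *α*_{12} = *α*_{02} − *Δ*_{2}. The helper `bias` and the hazard values are hypothetical; *Δ*_{2} must stay below *α*_{02} so that *α*_{12} > 0. Exact fractions make the zero exact rather than approximate.

```python
from fractions import Fraction


def bias(a01, a02, a03, delta1, delta2):
    """CLOS* - CLOS_true with a13 = a03 + delta1 and a12 = a02 - delta2.

    Requires delta2 < a02 so that a12 stays positive.
    """
    a13 = a03 + delta1
    a12 = a02 - delta2
    clos_true = ((a02 + a03) / (a12 + a13) - 1) / (a01 + a02 + a03)  # Eq. (2)
    clos_star = (a02 / a12 - 1) / (a01 + a02)                        # Eq. (4)
    return clos_star - clos_true


base = (Fraction(2, 100), Fraction(7, 100), Fraction(1, 100))  # hypothetical a01, a02, a03
no_diff = bias(*base, 0, 0)                                    # exactly 0
with_diff = bias(*base, Fraction(1, 100), Fraction(4, 100))    # 238/27 days
```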

#### Direct differential mortality

Under direct differential mortality, there is a non-zero difference between the death hazards with and without infection while the discharge hazards with and without infection are the same, that is *Δ*_{2}=*α*_{02}−*α*_{12}=0 and *Δ*_{1}=*α*_{13}−*α*_{03}≠0. Then, the bias can be expressed as

$$\begin{array}{*{20}l} \text{CLOS}^{*} - \text{CLOS}_{true} &= (\alpha_{13} - \alpha_{03})\cdot\frac{1}{\alpha_{0\cdot}}\cdot\frac{1}{\alpha_{1\cdot}} \\ &= \Delta_{1}\cdot\frac{1}{\alpha_{0\cdot}}\cdot\frac{1}{\alpha_{1\cdot}}. \end{array} $$

(6)

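Since \(\frac {1}{\alpha _{0\cdot }}\) and \(\frac {1}{\alpha _{1\cdot }}\) are positive, the bias has the same sign as *Δ*_{1}, and its magnitude grows with |*Δ*_{1}|. In the typical situation *Δ*_{1} > 0, where infected patients have a higher death hazard, the bias is therefore positive and the cLOS is overestimated. Moreover, as \(\frac {1}{\alpha _{0\cdot }}\) and \(\frac {1}{\alpha _{1\cdot }}\) are the average sojourn times of uninfected patients in state 0 and of infected patients in state 1, respectively, the absolute bias also increases when the average sojourn times increase.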
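A small sketch of the direct case, assuming hypothetical baseline hazards: it evaluates Eq. (6) for a few values of *Δ*_{1}, checks each value against the general difference of Eqs. (4) and (2) with *α*_{12} = *α*_{02}, and shows that the bias increases with *Δ*_{1}. The helper names are illustrative.

```python
from fractions import Fraction


def bias_general(a01, a02, a03, a12, a13):
    """CLOS* - CLOS_true from Eqs. (4) and (2)."""
    clos_true = ((a02 + a03) / (a12 + a13) - 1) / (a01 + a02 + a03)
    clos_star = (a02 / a12 - 1) / (a01 + a02)
    return clos_star - clos_true


def bias_direct(a01, a02, a03, a13):
    """Eq. (6): equal discharge hazards (a12 = a02), Delta_1 = a13 - a03."""
    a0 = a01 + a02 + a03          # alpha_{0.}
    a1 = a02 + a13                # alpha_{1.} = a12 + a13 with a12 = a02
    return (a13 - a03) / (a0 * a1)


# Hypothetical baseline hazards (per day), as exact rationals.
a01, a02, a03 = Fraction(2, 100), Fraction(7, 100), Fraction(1, 100)
biases = []
for delta1 in [Fraction(0), Fraction(1, 100), Fraction(2, 100), Fraction(4, 100)]:
    a13 = a03 + delta1
    b = bias_direct(a01, a02, a03, a13)
    assert b == bias_general(a01, a02, a03, a02, a13)  # Eq. (6) matches (4) - (2)
    biases.append(b)
# biases == [0, 10/9, 2, 10/3]: positive and increasing in Delta_1
```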

#### Indirect differential mortality

Under indirect differential mortality, there is a non-zero difference between the discharge hazards with and without infection while the death hazards with and without infection are the same, that is, *Δ*_{1} = *α*_{13} − *α*_{03} = 0 and *Δ*_{2} = *α*_{02} − *α*_{12} ≠ 0. Then, the bias is

$$\begin{array}{*{20}l} \text{CLOS}^{*} - \text{CLOS}_{true} &= \left(\alpha_{02} - \alpha_{12}\right) \cdot\frac{1}{\alpha_{0\cdot}}\cdot\frac{1}{\alpha_{1\cdot}}\cdot\frac{\alpha_{03}\left(\alpha_{0\cdot} + \alpha_{12}\right)}{\alpha_{12}\alpha^{*}_{0\cdot}} \\ &= \Delta_{2} \cdot\frac{1}{\alpha_{0\cdot}}\cdot\frac{1}{\alpha_{1\cdot}}\cdot\frac{\alpha_{03}\left(\alpha_{0\cdot} + \alpha_{12}\right)}{\alpha_{12}\alpha^{*}_{0\cdot}}. \end{array} $$

(7)

Since all factors in (7) other than *Δ*_{2} are positive, the bias has the same sign as *Δ*_{2}. The bias also increases with the average sojourn times in state 0 and in state 1. In most real-world situations, we observe *Δ*_{2} > 0, which means that infected patients have lower discharge hazards than uninfected ones. Then the bias is positive, which leads to an overestimation of the cLOS.
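An analogous sketch for the indirect case, again with hypothetical baseline hazards: Eq. (7) is evaluated for a few values of *Δ*_{2} and checked against the general difference of Eqs. (4) and (2) with *α*_{13} = *α*_{03}. The helper names are illustrative.

```python
from fractions import Fraction


def bias_general(a01, a02, a03, a12, a13):
    """CLOS* - CLOS_true from Eqs. (4) and (2)."""
    clos_true = ((a02 + a03) / (a12 + a13) - 1) / (a01 + a02 + a03)
    clos_star = (a02 / a12 - 1) / (a01 + a02)
    return clos_star - clos_true


def bias_indirect(a01, a02, a03, a12):
    """Eq. (7): equal death hazards (a13 = a03), Delta_2 = a02 - a12."""
    a0 = a01 + a02 + a03          # alpha_{0.}
    a0_star = a01 + a02           # alpha^*_{0.}
    a1 = a12 + a03                # alpha_{1.} = a12 + a13 with a13 = a03
    delta2 = a02 - a12
    return delta2 / (a0 * a1) * a03 * (a0 + a12) / (a12 * a0_star)


# Hypothetical baseline hazards (per day), as exact rationals.
a01, a02, a03 = Fraction(2, 100), Fraction(7, 100), Fraction(1, 100)
biases = []
for delta2 in [Fraction(0), Fraction(2, 100), Fraction(4, 100)]:
    a12 = a02 - delta2
    b = bias_indirect(a01, a02, a03, a12)
    assert b == bias_general(a01, a02, a03, a12, a03)  # Eq. (7) matches (4) - (2)
    biases.append(b)
# biases == [0, 10/9, 130/27]: positive for Delta_2 > 0 and increasing
```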

The derived analytical expressions demonstrate, for a simplified setting (constant hazards, differential mortality), how the estimation of the cLOS is affected when death cases are censored. In these settings, the bias vanishes when HIs affect neither the death hazards nor the discharge hazards; otherwise, it increases with the magnitude of the differential mortality.