Stacked probability plots of the extended illness-death model using constant transition hazards – an easy to use shiny app
BMC Medical Research Methodology volume 24, Article number: 116 (2024)
Abstract
Background
Extended illness-death models (a specific class of multistate models) are a useful tool for analysing situations such as hospital-acquired infections, ventilation-associated pneumonia, and transfers between hospitals. The main components of these models are hazard rates and transition probabilities. Calculating and interpreting the measures derived from them can be challenging because of the models' complexity.
Methods
By assuming time-constant hazards, the complexity of these models becomes manageable and closed-form expressions for the transition probabilities can be derived. Using these expressions, we created a tool in R to visualize transition probabilities via stacked probability plots.
Results
In this article, we present this tool and give some insights into its theoretical background. Using published examples, we give guidelines on how this tool can be used. Our goal is to provide an instrument that helps obtain a deeper understanding of a complex multistate setting.
Conclusion
While multistate models (in particular extended illness-death models) can be highly complex, this tool can be used in studies both to understand the assumptions made during planning and as a first step in analysing complex data structures. An online version of this tool can be found at https://eidm.imbi.uni-freiburg.de/.
Background
Choosing and understanding statistical analysis models in epidemiology can be challenging. Many models have distinct shortcomings. For example, standard logistic regression ignores the timing of events and therefore provides only a restricted view. Kaplan-Meier models take the timing of events into account but fail to consider competing events. Analysing hospital-acquired infections (HAI), ventilation-associated pneumonia (VAP), worsening of COVID-19 cases in hospital, or transfers of COVID-19 ICU cases are just a few examples of situations where competing events such as death and discharge have to be taken into account.
The European Medicines Agency (EMA) has included the occurrence of intercurrent events in its list of things to consider when describing treatment effects [1]. Multistate models are one way to meet this requirement. U. Beyer et al. [2] and A. Erdmann et al. [3] showed the use of multistate models to address the occurrence of intermediate events in cancer patients, using slightly different models.
Multistate (especially competing risk) models are becoming an increasingly established tool for analysing such complex settings. Many authors have already pointed out the importance of being careful in the presence of competing events [4, 5] and have given suggestions on which methods to use in specific situations [6,7,8,9]. C. H. Jackson et al. [10] show the advantage of multistate models by comparing different modelling frameworks in a model that is quite similar to the one used in this article.
In this work we focus on the extended illness-death model (eidm), which is shown in Fig. 1 and described in detail in [6], under the assumption of constant hazards. The implied limitations are discussed in the conclusion section. This model distinguishes between absorbing events that occur before the intermediate event and those that occur after it. What counts as the "intermediate event" depends on the setting of the study. Examples are disease progression, as in the articles by U. Beyer et al. [2] and A. Erdmann et al. [3], or nosocomial pneumonia, as in the work of B. François et al. [11] and J. Chastre et al. [12]. The model thereby accounts for the time dependency of intermediate events such as HAI, VAP, worsening patient condition, or transfer of COVID-19 cases.
Using constant transition hazards offers the opportunity to calculate closed forms of transition probabilities at a given point in time. The second advantage of using the constant hazard framework is that there are few to no data requirements. These transition probabilities can be visualized in stacked probability plots which are discussed in detail by Hazard et al. and von Cube et al. [15, 13].
Our goal is to provide a tool that combines the benefits of a multistate model, especially in the framework of constant hazards, with the visual advantages of stacked probability plots, so that non-statisticians such as clinicians can improve the planning and analysis of epidemiologic studies. It is aimed specifically at studies with non-mortal endpoints (e.g. discharge or infection) that correspond to an extended illness-death model. Hence, we implemented an app using R [14]. This app takes hazards or hazard ratios as inputs and renders the corresponding stacked probability plots, plots of the population attributable fraction (PAF), and plots of attributable mortality (AM) (not discussed in this article).
In the following, we present statistical considerations for our calculations followed by a guideline on how to use this tool. In addition, we give some hypothetical and real examples.
Implementation
We consider a finite-state, continuous-time Markov process \(X\left(t\right)\) which can occupy states in \(\{0,\dots ,5\}\) at a given time \(t\), see Fig. 1. Since this process is Markov, the transition probabilities (i.e. the probability of being in state \(j\) at time \(t\) given that the process was in state \(i\) at time \(s\)) can be written as
\[
{P}_{ij}\left(s,t\right)=P\left(X\left(t\right)=j \mid X\left(s\right)=i\right),\qquad s\le t.
\]
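By considering every possible transition from state \(i\) to state \(j\), we get a transition probability matrix \(P\left(t\right)\) with entries \({P}_{ij}(0,t)\), given by:
\[
P\left(t\right)=\begin{pmatrix}
{P}_{00}(0,t) & {P}_{01}(0,t) & {P}_{02}(0,t) & {P}_{03}(0,t) & {P}_{04}(0,t) & {P}_{05}(0,t)\\
0 & {P}_{11}(0,t) & 0 & 0 & {P}_{14}(0,t) & {P}_{15}(0,t)\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]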
The explicit formulas for those probabilities can be found in the supplementary material. Those formulas can be used to plot the transition probabilities \({P}_{ij}(0,t)\) as functions of the hazards \({\lambda }_{ij}\). Finally, we stack these probability plots upon each other to create a stacked probability plot. See Figs. 2 and 3 for some example visualizations.
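As an orientation, the closed forms can be sketched as follows. This is the standard result for a time-homogeneous Markov process with the structure of Fig. 1 and is not copied from the supplement. It assumes the state numbering implied by the hazards used in the examples (0 initial state, 1 intermediate event, 2 and 3 discharge and death without the intermediate event, 4 and 5 discharge and death after it). The shorthands \({\lambda }_{0\cdot }={\lambda }_{01}+{\lambda }_{02}+{\lambda }_{03}\) and \({\lambda }_{1\cdot }={\lambda }_{14}+{\lambda }_{15}\) are introduced here for illustration only, and \({\lambda }_{0\cdot }\ne {\lambda }_{1\cdot }\) is assumed.

\[
\begin{aligned}
{P}_{00}(0,t) &= e^{-{\lambda }_{0\cdot }t},\\
{P}_{02}(0,t) &= \frac{{\lambda }_{02}}{{\lambda }_{0\cdot }}\left(1-e^{-{\lambda }_{0\cdot }t}\right),\qquad
{P}_{03}(0,t) = \frac{{\lambda }_{03}}{{\lambda }_{0\cdot }}\left(1-e^{-{\lambda }_{0\cdot }t}\right),\\
{P}_{01}(0,t) &= \frac{{\lambda }_{01}}{{\lambda }_{0\cdot }-{\lambda }_{1\cdot }}\left(e^{-{\lambda }_{1\cdot }t}-e^{-{\lambda }_{0\cdot }t}\right),\\
{P}_{04}(0,t) &= {\lambda }_{14}\,I(t),\qquad {P}_{05}(0,t) = {\lambda }_{15}\,I(t),\\
I(t) &= \int_{0}^{t}{P}_{01}(0,u)\,du = \frac{{\lambda }_{01}}{{\lambda }_{0\cdot }-{\lambda }_{1\cdot }}\left(\frac{1-e^{-{\lambda }_{1\cdot }t}}{{\lambda }_{1\cdot }}-\frac{1-e^{-{\lambda }_{0\cdot }t}}{{\lambda }_{0\cdot }}\right).
\end{aligned}
\]

Adding the six expressions gives \(e^{-{\lambda }_{0\cdot }t}+\left(1-e^{-{\lambda }_{0\cdot }t}\right)=1\) for every \(t\), so the probabilities of leaving state 0 for any of the six states always sum to one.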
Therefore, by plotting those probabilities over time stacked on each other, we get a graphic that ranges from \(0\) to \(1\) on the y-axis, in which each probability is represented by a specific area. Those areas can be interpreted as the time spent in a certain state [15]. Arranging the areas suitably furthermore allows the sum of specific areas to be read as a cumulative incidence; for example, \({P}_{01}\left(0,t\right)+ {P}_{04}\left(0,t\right)+ {P}_{05}\left(0,t\right)\) is the cumulative probability of experiencing an intermediate event by time \(t\).
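The stacking step can be illustrated with a minimal Python sketch (the tool itself is written in R, and this is not its code). The function names `int_exp`, `eidm_probs` and `stack_boundaries`, the state numbering (0 initial, 1 intermediate, 2/3 discharge/death without and 4/5 discharge/death after the intermediate event) and the hazard values are assumptions made for illustration. The closed forms are the standard ones for a time-homogeneous Markov process.

```python
import math
from itertools import accumulate


def int_exp(rate, t):
    """Integral of exp(-rate * u) for u in [0, t]; equals t when rate is 0."""
    return t if rate == 0 else -math.expm1(-rate * t) / rate


def eidm_probs(t, l01, l02, l03, l14, l15):
    """Return [P00, P01, P02, P03, P04, P05] at time t under constant hazards."""
    a = l01 + l02 + l03          # total hazard of leaving state 0
    b = l14 + l15                # total hazard of leaving state 1
    p00 = math.exp(-a * t)
    p02 = l02 * int_exp(a, t)
    p03 = l03 * int_exp(a, t)
    if a == 0:                   # nobody ever leaves the initial state
        p01 = cum01 = 0.0
    elif math.isclose(a, b):     # limiting case of equal exit hazards
        p01 = l01 * t * math.exp(-a * t)
        cum01 = l01 * (int_exp(a, t) - t * math.exp(-a * t)) / a
    else:
        p01 = l01 * (math.exp(-b * t) - math.exp(-a * t)) / (a - b)
        cum01 = l01 * (int_exp(b, t) - int_exp(a, t)) / (a - b)
    # cum01 is the integral of P01(0, u) over [0, t]
    return [p00, p01, p02, p03, l14 * cum01, l15 * cum01]


def stack_boundaries(probs, order):
    """Upper boundary of each stacked area when states are stacked in `order`."""
    return list(accumulate(probs[state] for state in order))


# Hypothetical hazards per patient-day, not taken from Table 2.
hazards = dict(l01=0.02, l02=0.08, l03=0.01, l14=0.04, l15=0.03)
probs = eidm_probs(10.0, **hazards)
# Death states at the bottom, discharge states at the top.
bounds = stack_boundaries(probs, order=[3, 5, 1, 0, 4, 2])
```

Evaluating `eidm_probs` on a grid of time points and drawing the boundaries returned by `stack_boundaries` as filled bands gives a stacked probability plot; the last boundary equals one up to rounding, which is why the plot fills the range from 0 to 1.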
Application
In order to use this tool, one first needs to estimate time-constant hazard rates. These are calculated by dividing the number of transitions by the time at risk in the state from which the transition occurs:
\[
{\widehat{\lambda }}_{ij}=\frac{{N}_{ij}}{{T}_{i}},
\]
where \({N}_{ij}\) is the number of transitions from state \(i\) to state \(j\) and \({T}_{i}\) is the total time at risk in state \(i\).
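As a small worked illustration of this estimate, the following Python sketch divides event counts by patient-days at risk. The counts, the number of patient-days and the function name `constant_hazard` are hypothetical and chosen only to show the arithmetic.

```python
def constant_hazard(n_transitions, time_at_risk):
    """Estimate a constant hazard as events divided by time at risk."""
    if time_at_risk <= 0:
        raise ValueError("time at risk must be positive")
    return n_transitions / time_at_risk


# Hypothetical aggregate data for transitions out of the initial state 0.
counts_from_state0 = {"01": 30, "02": 400, "03": 50}
patient_days_state0 = 5000.0

hazards = {
    transition: constant_hazard(n, patient_days_state0)
    for transition, n in counts_from_state0.items()
}
```

With these numbers, 30 intermediate events in 5000 patient-days give a hazard of 0.006 per patient-day for the transition from state 0 to state 1. The hazards out of state 1 are obtained in the same way from the events and patient-days observed after the intermediate event.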
These hazard rates can be plugged into the tool. In the following we discuss some examples to illustrate the application.
The easiest, and intended, way to use this tool is to go to https://eidm.imbi.uni-freiburg.de/ and experiment with the inputs.
If you are interested in using this tool locally, it gets more technical. The necessary code and instructions can be found in the supplementary material or at https://github.com/marlongrodd/eidm.
Figure 2 shows the interface of the application. First there are some instructions on how to use the tool. Then you can choose to enter either the hazards for each group separately, or just the hazards for group A and the corresponding hazard ratios for group B. The next input fields are for the hazards and hazard ratios, depending on the choice made earlier. The "Limit of x-axis" option can be used to limit the plots to certain time periods. The "Order of stacked plot" field is used to determine the order in which the coloured areas are stacked. Finally, you can choose to display the PAF and AM plots as well.
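The second input mode corresponds to multiplying each group A hazard by its hazard ratio. A minimal Python sketch of that relationship follows; the function name `group_b_hazards`, the example values and the convention that a missing hazard ratio means "no effect" (ratio 1) are assumptions for illustration, not a description of the app's internals.

```python
def group_b_hazards(hazards_a, hazard_ratios):
    """Derive group B hazards as group A hazards times the hazard ratios.

    Transitions without an entry in `hazard_ratios` are assumed to be
    unaffected by the group variable (hazard ratio 1).
    """
    return {
        transition: rate * hazard_ratios.get(transition, 1.0)
        for transition, rate in hazards_a.items()
    }


# Hypothetical inputs: only the hazard of the intermediate event differs.
hazards_a = {"01": 0.02, "02": 0.08, "03": 0.01, "14": 0.04, "15": 0.03}
hazards_b = group_b_hazards(hazards_a, {"01": 0.5})
```

Here group B has half the hazard of the intermediate event, while all other hazards are identical in both groups.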
Examples
Multistate models, and in particular extended illness-death models, can provide useful insights into the structure underlying many situations. Some examples in the hospital setting are listed in Table 1.
Note that in this work we focus on the hospital setting. However, there are many other settings that can be analysed using multistate models. These models are relevant as soon as there are several outcomes of interest or intermediate events, for instance disease progression as an intermediate event for outpatients. In the following, some examples in a hospital setting that loosely refer to real-world data are presented. We take a closer look at examples of ventilation-acquired infections and disease progression. Aside from describing the usage and interpretation of the stacked probability plots, a further focus is on how the plot areas can be ordered.
Three examples are considered. The corresponding hazards are given in Table 2.
Example 1 considers VAP as an intermediate event. The only effect of the group variable is on the hazard of the transition from the initial state to VAP (decreasing hazard). The question of interest is how this single difference affects the occurrence of all possible transitions.
Example 2 considers a completely different setting with disease progression as the intermediate event and only one absorbing state (death). There is an increased hazard for the intermediate event and a decreased hazard for death after the intermediate event in group B compared to group A. How do these two effects in different directions affect the overall occurrence of death?
Example 3 illustrates the differences between the full follow-up analysis and the simplified constant hazard approach. Real data from hospitalized patients are considered. Hospital-acquired infection is the intermediate event, and discharge and death are the absorbing events.
Note that a transition hazard of zero means that the transition is not possible. Thus, example 2 reduces to a simple illness-death model with one intermediate and only one absorbing state (death).
For a detailed description of the examples, see the following section.
Example 1: Ventilation-associated pneumonia
In this example, we consider ventilated patients and VAP as the intermediate event. Motivated by the project EVADE (Effort to Prevent Nosocomial Pneumonia caused by Pseudomonas aeruginosa in Mechanically ventilated Subjects), which is part of the COMBACTE (Combatting Bacterial Resistance in Europe, https://www.combacte.com/) consortium, we discuss the effects of VAP on death and discharge [12]. The aim of this project was to analyse the impact of treatment with specific antibodies against VAP in ventilated patients. To this end, information about the observation times of VAP, death and discharge was collected. Note that our focus lies in presenting the results; our aim is not to discuss medical implications. The hazards are not the real hazards from the trial, but are inspired by the hazards calculated from the dataset by dividing the number of events by the number of patient-days in hospital. Those hazards can be found in Table 2. Using those values as input, the tool provides the plots presented in Fig. 2a.
Group A corresponds to the control group and group B to the intervention group. Note that the death and discharge hazards are the same in both treatment groups. Only the VAP hazard (\({\lambda }_{01}\)) differs between the treatment groups and is lower in group B than in group A, implying an advantage in group B with respect to VAP. However, the death and discharge hazards before the intermediate event differ from the death and discharge hazards after the intermediate event (equal hazards in both groups). The hazard for death is lower before VAP than afterwards (\({\lambda }_{03}\) compared to \({\lambda }_{15}\)). In contrast, the discharge hazard before VAP is greater than afterwards (\({\lambda }_{02}\) compared to \({\lambda }_{14}\)).
Hence, as the probability of the intermediate event differs between treatment groups, the overall probabilities of dying and being discharged with and without the treatment differ too, even though there is no difference in the death and discharge hazards between the groups. The overall probability of dying declines under treatment. For discharge, the effect goes in the opposite direction. This can also be highlighted using the tool, see Fig. 2b. In this figure, the order of the areas is modified. At the top of Fig. 2b, the two desirable discharge states are combined. At the bottom, the three non-desirable states are plotted. The dark blue and dark brown areas represent the death states (without and with VAP) and thus overall mortality. These combined areas are smaller on the right-hand side (group B) than on the left-hand side (group A), in line with the lower overall probability of dying under treatment. Meanwhile, the combined areas for discharge (without or with VAP) at the top of the plot are greater in the right-hand than in the left-hand graphic.
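The indirect effect described here can be checked with a short Python sketch of the long-run outcome probabilities. With constant hazards every patient eventually reaches an absorbing state, and the probability of each route is a product of hazard proportions. The function name `long_run_outcomes` and the hazard values are hypothetical (they are not those of Table 2); they only reproduce the pattern described in the text: equal death and discharge hazards in both groups, a higher death hazard and a lower discharge hazard after VAP, and a lower VAP hazard in group B.

```python
def long_run_outcomes(l01, l02, l03, l14, l15):
    """Eventual probabilities of VAP, death and discharge (t -> infinity)."""
    a = l01 + l02 + l03          # total hazard of leaving the initial state
    b = l14 + l15                # total hazard of leaving the VAP state
    p_vap = l01 / a              # probability of ever acquiring VAP
    return {
        "vap": p_vap,
        "death": l03 / a + p_vap * l15 / b,
        "discharge": l02 / a + p_vap * l14 / b,
    }


# Hypothetical hazards per patient-day; only l01 differs between the groups.
common = dict(l02=0.08, l03=0.01, l14=0.04, l15=0.03)
group_a = long_run_outcomes(l01=0.02, **common)   # control
group_b = long_run_outcomes(l01=0.01, **common)   # intervention
```

In this sketch group B ends up with a lower eventual probability of death and a higher eventual probability of discharge than group A, although none of the death or discharge hazards was changed. The direction of the effect depends on the proportion of deaths among exits being larger after VAP than before it, which is the case for the hazards described above.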
Example 2: Disease progression
Our second example considers disease progression as an intermediate event and is based on [17]. The focus of that study was to analyse the effect of the drug Selexipag on the occurrence of complications related to pulmonary hypertension. The article provides information about the total number of patients, the number of disease progressions and deaths, and the median follow-up. With these numbers, we can calculate the corresponding transition hazards by dividing the number of events (progression or death) by the patient-days. In fact, the numbers used in our example are only inspired by those given in the paper. We ignore discharge in this example and consequently use a simple illness-death model. The resulting plots are given in Fig. 3a and b, considering different orders.
The hazard rates for discharge are set to zero, making this transition impossible and reducing the model to a simple illness-death model. Consequently, there are no areas for discharge in this plot. The hazard for complications related to pulmonary hypertension (intermediate event) is higher in group B than in group A. Thus, the area for the intermediate event is bigger on the right side. The death hazard before complications related to pulmonary hypertension is the same in both groups. However, the intermediate event leads to a higher death hazard in both groups, and this increase is larger in group A. Thus, on the one hand, group B has a disadvantage concerning the intermediate event, but on the other hand it has an advantage concerning death after the intermediate event. As a result, the two differences in hazards essentially cancel each other out. If the death hazard after the intermediate event were the same in both groups, the higher hazard for the intermediate event would lead to a much higher overall probability of dying in group B (blue and brown areas combined).
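The two opposing effects can be explored numerically with a Python sketch of the simple illness-death model (discharge hazards set to zero). The function names, the hazard values and the time horizon are hypothetical and are not those of Table 2 or of [17]; the sketch also assumes that the total exit hazards of the initial and the intermediate state differ, so that the standard closed form applies.

```python
import math


def int_exp(rate, t):
    """Integral of exp(-rate * u) for u in [0, t]; equals t when rate is 0."""
    return t if rate == 0 else -math.expm1(-rate * t) / rate


def death_probability(t, l01, l03, l15):
    """P(dead by time t) in a simple illness-death model without discharge."""
    a, b = l01 + l03, l15        # exit hazards of the initial and intermediate state
    if math.isclose(a, b):
        raise ValueError("sketch assumes different exit hazards for the two states")
    p03 = l03 * int_exp(a, t)                                   # death without progression
    p05 = l15 * l01 * (int_exp(b, t) - int_exp(a, t)) / (a - b)  # death after progression
    return p03 + p05


horizon = 1000.0  # days, hypothetical
death_a = death_probability(horizon, l01=0.002, l03=0.0005, l15=0.004)
death_b = death_probability(horizon, l01=0.003, l03=0.0005, l15=0.003)
# Counterfactual: group B keeps its higher progression hazard but has the
# same death hazard after progression as group A.
death_b_same_l15 = death_probability(horizon, l01=0.003, l03=0.0005, l15=0.004)
```

Comparing `death_a`, `death_b` and `death_b_same_l15` shows how far a lower death hazard after the intermediate event offsets a higher hazard of the intermediate event, and what the overall probability of dying would be without that offset.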
Example 3: Real data
This example uses the los.data dataset from the R package "etm" to compare the constant hazard approach with non-parametric estimation using the Aalen-Johansen estimator [18].
The los.data dataset is a sample from the SIR-3 study, an observational cohort study analysing the burden of hospital-acquired infections [16].
As this dataset does not distinguish between different groups, we focus on the differences between the models. We calculated constant transition hazards for the five possible transitions (baseline -> discharge, death or intermediate event; intermediate event -> death or discharge) by dividing the number of events by the total amount of time patients spent in each state (see section "Application" for more details), and used these hazards in the application. In addition, we used the etm function to calculate empirical state occupancy probabilities and plotted these using the R package ggplot2. We arranged the two plots side by side for better comparison (see Fig. 4).
It can be seen that the constant hazard approach can mimic the empirical state occupancy probabilities. Thus, if no individual-level data are available but the patient-days and the number of events for each transition are known, one can calculate the corresponding hazard rates and use them to get a good idea of what the probabilities from the real data set might look like.
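The hazard calculation described here can be sketched in Python for patient-level records. The record layout below (time of infection or `None`, time of leaving hospital or end of follow-up, outcome) is hypothetical and is not the column layout of los.data; the function name and the example records are likewise made up for illustration. Censored stays are assumed to contribute time at risk but no event.

```python
from collections import Counter


def hazards_from_records(records):
    """Constant hazards for the five transitions from patient-level records.

    Each record is (t_infection or None, t_end, outcome) with outcome in
    {"discharge", "death", "censored"}; times are measured from admission.
    """
    time_state0 = time_state1 = 0.0
    events = Counter()
    for t_infection, t_end, outcome in records:
        if t_infection is None:
            time_state0 += t_end
            exits = {"discharge": "02", "death": "03"}
        else:
            time_state0 += t_infection
            time_state1 += t_end - t_infection
            events["01"] += 1
            exits = {"discharge": "14", "death": "15"}
        if outcome in exits:     # censored stays add time but no event
            events[exits[outcome]] += 1
    hazards = {k: events[k] / time_state0 for k in ("01", "02", "03")}
    hazards.update({k: events[k] / time_state1 for k in ("14", "15")})
    return hazards


# Hypothetical records, times in days.
records = [
    (None, 10.0, "discharge"),
    (4.0, 12.0, "death"),
    (None, 6.0, "death"),
    (5.0, 20.0, "discharge"),
    (None, 15.0, "censored"),
]
hazards = hazards_from_records(records)
```

In this toy example the patients spend 40 days in the initial state and 23 days in the intermediate state, so two infections give a hazard of 2/40 = 0.05 per day for the transition to the intermediate event. The sketch assumes that at least one patient reaches the intermediate state; otherwise the time at risk in that state is zero and its hazards are undefined.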
On the other hand, there are clearly some differences. In this example, the first days differ between these two plots; while the intermediate event has a high slope in the constant hazard approach, this behaviour is not seen in the empirical estimate, where the high slope does not start until day four. This is an obvious violation of the constant hazard assumption and shows that this tool should not be used for the final analysis of the data, but rather to get an impression of the possible results of planned studies.
Conclusion
In conclusion, this tool can be used to translate the transition hazards into probabilities and furthermore to visualize the impact of a single varying hazard on all transition probabilities.
Furthermore, one can investigate the impact of different hazards with both desirable and non-desirable effects. Additionally, one can explore the impact of varying hazard rates on the probabilities (e.g. what would happen if, in example 1, the hazards for death before and after the intermediate event were also affected by the intervention? What are the probabilities if the intervention affects the hazard for discharge rather than the hazard for the intermediate event?).
Investigating effects in the presence of intermediate events poses several challenges. These settings can be analysed using multistate models. However, events and effects depend on time and on all possible hazard rates. This creates a level of complexity that makes proper interpretation, planning, and analysis of epidemiologic studies difficult. In this article, a tool to visualize multistate models in an extended illness-death setting was discussed. This is a common setting in epidemiologic studies addressing an intermediate event, e.g. studies on nosocomial infections [12].
This tool provides the benefits of multistate models and facilitates the interpretation of complex relationships. By visualizing the impact of expected hazard ratios in specific scenarios, users can acquire a better understanding of the effects of their intervention.
Furthermore, the area between the curves can be considered as the expected length of stay in the respective state. For example, in Fig. 3, group B stays much longer in the initial state (yellow area) than group A. In contrast, the time "spent" in the state "death after intermediate" (brown area) is greater for group A, since the corresponding area is greater. The expected length of stay in the state "intermediate" without further progression is comparable in both groups.
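Under constant hazards these areas have simple closed forms. As a sketch, with the shorthands \({\lambda }_{0\cdot }={\lambda }_{01}+{\lambda }_{02}+{\lambda }_{03}\) and \({\lambda }_{1\cdot }={\lambda }_{14}+{\lambda }_{15}\) introduced here for illustration, assuming \({\lambda }_{0\cdot }\ne {\lambda }_{1\cdot }\) and a plot limited to the time horizon \(\tau\), the expected times spent in the initial and in the intermediate state are

\[
\begin{aligned}
\int_{0}^{\tau }{P}_{00}(0,u)\,du &= \frac{1-e^{-{\lambda }_{0\cdot }\tau }}{{\lambda }_{0\cdot }},\\
\int_{0}^{\tau }{P}_{01}(0,u)\,du &= \frac{{\lambda }_{01}}{{\lambda }_{0\cdot }-{\lambda }_{1\cdot }}\left(\frac{1-e^{-{\lambda }_{1\cdot }\tau }}{{\lambda }_{1\cdot }}-\frac{1-e^{-{\lambda }_{0\cdot }\tau }}{{\lambda }_{0\cdot }}\right).
\end{aligned}
\]

For \(\tau \to \infty\) these tend to \(1/{\lambda }_{0\cdot }\) and \({\lambda }_{01}/({\lambda }_{0\cdot }{\lambda }_{1\cdot })\), i.e. the mean sojourn time in the initial state and the probability of reaching the intermediate state multiplied by the mean sojourn time there. For the absorbing states the area keeps growing with \(\tau\), so it is only meaningful relative to the chosen horizon.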
Additionally, three examples were used to exhibit possible applications of this tool. In the first example, we drew on the COMBACTE study EVADE and discussed the direct and indirect impacts of an intervention on intermediate events, death and discharge. The second example considered pulmonary hypertension [17] to cover settings where one terminal event, e.g. hospital discharge, is not present.
It is important to note that the assumption of constant hazards is a huge simplification and a limitation of this tool. Constant hazards imply that the underlying mechanisms are independent of time. This is usually not the case in practice, so this tool should be used during the design phase of a study, or at most as a first step in the process of data analysis. For more complex analyses, R packages such as etm [18] or mstate [19] are more appropriate. How to use multistate models to analyse data sets from epidemiological studies is described in more detail by R. J. Cook & J. F. Lawless [20], A. Bühler et al. [21], J. Beyersmann et al. [22] and P. Hougaard [23]. C. H. Jackson et al. [10] showed the use of a similar model, applied to outcomes after admission with COVID-19, in two frameworks of parametric models (transition-specific hazard functions and mixture multistate models) using gamma distributions.
Our aim was to provide a tool to simplify the complexity of multistate models and to promote a better understanding of these processes. An additional concern was to give this tool a wide range of possible applications; Table 1 gives some examples of possible settings in a hospital environment, without claiming to be exhaustive. Extended illness-death models can be applied whenever there is an intermediate event and a final event of interest other than death.
In further steps, our tool could be extended to other models, as seen for example in the established stand-alone tool of M. Lafuente et al. [24] for visualisation and prediction of multistate processes of ICU occupancy by patients with COVID-19.
The R code for this tool is provided in the supplement file "Additional file 1.docx" as well as on https://github.com/marlongrodd/eidm.
Availability and requirements.

Project name: Extended illness-death model with constant transition hazards.

Project home page: https://eidm.imbi.uni-freiburg.de/.

Operating system(s): Platform independent.

Programming language: R.

Other requirements: R version 4.2.2 or higher.

License: GNU GPL.
Data availability
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
Abbreviations
HAI: Hospital-acquired infections
VAP: Ventilation-associated pneumonia
eidm: Extended illness-death model
References
1. European Medicines Agency. ICH E9 (R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials [Internet]. Available from: https://www.ema.europa.eu/en/ich-e9-statistical-principles-clinical-trials-scientific-guideline.
2. Beyer U, Dejardin D, Meller M, Rufibach K, Burger HU. A multistate model for early decision-making in oncology. Biom J. 2020;62(3):550–67.
3. Erdmann A, Beyersmann J, Rufibach K. Oncology clinical trial design planning based on a multistate model that jointly models progression-free and overall survival endpoints. 2023 [cited 13 February 2024]. Available from: https://arxiv.org/abs/2301.10059.
4. Wolkewitz M, Cooper BS, Bonten MJM, Barnett AG, Schumacher M. Interpreting and comparing risks in the presence of competing events. BMJ. 2014;349(aug21 5):g5060.
5. Schumacher M, Allignol A, Beyersmann J, Binder N, Wolkewitz M. Hospital-acquired infections–appropriate statistical treatment is urgently needed! Int J Epidemiol. 2013;42(5):1502–8.
6. von Cube M, Schumacher M, Wolkewitz M. Basic parametric analysis for a multi-state model in hospital epidemiology. BMC Med Res Methodol. 2017;17(1):111.
7. Pierce RA, Lessler J, Milstone AM. Expanding the statistical toolbox: analytic approaches for cohort studies with healthcare-associated infectious outcomes. Curr Opin Infect Dis. 2015;28(4):384–91.
8. Schumacher M, Wangler M, Wolkewitz M, Beyersmann J. Attributable mortality due to nosocomial infections: a simple and useful application of multistate models. Methods Inf Med. 2007;46(05):595–600.
9. Wolkewitz M, von Cube M, Schumacher M. Multistate modeling to analyze nosocomial infection data: an introduction and demonstration. Infect Control Hosp Epidemiol. 2017;38(08):953–9.
10. Jackson CH, Tom BD, Kirwan PD, Mandal S, Seaman SR, Kunzmann K, et al. A comparison of two frameworks for multi-state modelling, applied to outcomes after hospital admissions with COVID-19. Stat Methods Med Res. 2022;31(9):1656–74.
11. François B, Chastre J, Eggiman P, Laterre PF, Torres A, Sanchez M, et al. The SAATELLITE and EVADE clinical studies within the COMBACTE Consortium: a public–private collaborative effort in designing and performing clinical trials for novel antibacterial drugs to prevent nosocomial pneumonia. Clin Infect Dis. 2016;63(suppl 2):S46–51.
12. Chastre J, François B, Bourgeois M, Komnos A, Ferrer R, Rahav G, et al. Safety, efficacy, and pharmacokinetics of gremubamab (MEDI3902), an anti-Pseudomonas aeruginosa bispecific human monoclonal antibody, in P. aeruginosa-colonised, mechanically ventilated intensive care unit patients: a randomised controlled trial. Crit Care. 2022;26(1):355.
13. von Cube M, Grodd M, Wolkewitz M, Hazard D, Wengenmayer T, Canet E, et al. Harmonizing heterogeneous endpoints in Coronavirus Disease 2019 trials without loss of information. Crit Care Med. 2021;49(1):e11–9.
14. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. 2018. Available from: https://www.R-project.org/.
15. Hazard D, Kaier K, von Cube M, Grodd M, Bugiera L, Lambert J, et al. Joint analysis of duration of ventilation, length of intensive care, and mortality of COVID-19 patients: a multistate approach. BMC Med Res Methodol. 2020;20(1):206.
16. Wolkewitz M, Vonberg R, Grundmann H, Beyersmann J, Gastmeier P, Bärwolff S, et al. Risk factors for the development of nosocomial pneumonia and mortality on intensive care units: application of competing risks models. Crit Care. 2008;12(2):R44.
17. Sitbon O, Channick R, Chin KM, Frey A, Gaine S, Galiè N, et al. Selexipag for the treatment of pulmonary arterial hypertension. N Engl J Med. 2015;373(26):2522–33.
18. Allignol A, Schumacher M, Beyersmann J. Empirical transition matrix of multi-state models: the etm package. J Stat Softw [Internet]. 2011 [cited 13 December 2022];38(4). Available from: http://www.jstatsoft.org/v38/i04/.
19. Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: competing risks and multi-state models. Stat Med. 2007;26(11):2389–430.
20. Cook RJ, Lawless JF. Multistate models for the analysis of life history data. Boca Raton, FL: CRC; 2018.
21. Bühler A, Cook RJ, Lawless JF. Multistate models as a framework for estimand specification in clinical trials of complex processes. Stat Med. 2023;42(9):1368–97.
22. Beyersmann J, Allignol A, Schumacher M. Competing risks and multistate models with R [Internet]. New York, NY: Springer New York; 2012 [cited 13 February 2024]. Available from: https://doi.org/10.1007/978-1-4614-2035-4.
23. Hougaard P. Analysis of multivariate survival data. New York: Springer; 2000. p. 542.
24. Lafuente M, López FJ, Mateo PM, Cebrián AC, Asín J, Moler JA, et al. A multistate model and its standalone tool to predict hospital and ICU occupancy by patients with COVID-19. Heliyon. 2023;9(2):e13545.
Acknowledgements
The funding bodies had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL. This work was supported by the Innovative Medicines Initiative Joint Undertaking resources (composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007–2013) and EFPIA companies) [grant number 115737-2 – COMBACTE-MAGNET and 115523 – COMBACTE-NET].
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Grodd, M., Weber, S. & Wolkewitz, M. Stacked probability plots of the extended illness-death model using constant transition hazards – an easy to use shiny app. BMC Med Res Methodol 24, 116 (2024). https://doi.org/10.1186/s12874-024-02240-3