 Research article
 Open Access
 Published:
Joint analysis of duration of ventilation, length of intensive care, and mortality of COVID19 patients: a multistate approach
BMC Medical Research Methodology volume 20, Article number: 206 (2020)
Abstract
Background
The clinical progress of patients hospitalized due to COVID19 is often associated with severe pneumonia which may require intensive care, invasive ventilation, or extracorporeal membrane oxygenation (ECMO). The length of intensive care and the duration of these supportive therapies are clinically relevant outcomes. From the statistical perspective, these quantities are challenging to estimate due to episodes being timedependent and potentially multiple, as well as being determined by the competing, terminal events of discharge alive and death.
Methods
We used multistate models to study COVID19 patients’ timedependent progress and provide a statistical framework to estimate hazard rates and transition probabilities. These estimates can then be used to quantify average sojourn times of clinically important states such as intensive care and invasive ventilation. We have made two real data sets of COVID19 patients (n = 24* and n = 53**) and the corresponding statistical code publically available.
Results
The expected lengths of intensive care unit (ICU) stay at day 28 for the two cohorts were 15.05* and 19.62** days, while expected durations of mechanical ventilation were 7.97* and 9.85** days. Predicted mortality stood at 51%* and 15%**. Patients mechanically ventilated at the start of the example studies had a longer expected duration of ventilation (12.25*, 14.57** days) compared to patients nonventilated (4.34*, 1.41** days) after 28 days. Furthermore, initially ventilated patients had a higher risk of death (54%* and 20%** vs. 48%* and 6%**) after 4 weeks. These results are further illustrated in stacked probability plots for the two groups from time zero, as well as for the entire cohort which depicts the predicted proportions of the patients in each state over followup.
Conclusions
The multistate approach gives important insights into the progress of COVID19 patients in terms of ventilation duration, length of ICU stay, and mortality. In addition to avoiding frequent pitfalls in survival analysis, the methodology enables active cases to be analyzed by allowing for censoring. The stacked probability plots provide extensive information in a concise manner that can be easily conveyed to decision makers regarding healthcare capacities. Furthermore, clear comparisons can be made among different baseline characteristics.
Background
Most individuals infected with SARSCoV2 will experience mild or moderate symptoms (such as cough, fever, shortness of breath) and do not need hospitalization. In contrast, those with a severe pneumonia require clinical support.
The temporal dynamics of illness severity among hospitalized Covid19 patients can be described in terms of length of stay in the intensive care unit, duration of invasive ventilation, and the probability of death. The present paper demonstrates the application of statistical methods for analyzing these timedependent types of data from hospitalized Covid19 patients. These models effectively map the progression of diseased patients during their ICU stay. The methodology also avoids common pitfalls and biases that arise during the analysis of hospital data when using less sophisticated survival analysis methods [1, 2]. For example, treating exposures as timefixed at baseline that can, in fact, vary over time leads to the timedependent bias [3, 4]. Furthermore, ignoring competing events that can impede the observation of the event of interest can introduce bias and facilitate false predictions [5]. The ability of multistate models to avoid these biases was demonstrated in analyzing the effect of treatment with Oseltamivir (Tamiflu) on hospital mortality and length of stay in confirmed pandemic influenza A/H1N1 2009 infected patients [6, 7]. In this current demonstration, invasive mechanical ventilation is treated as timedependent while competing risks are adequately accounted for, thus avoiding the aforementioned biases.
For illustration of the methods used, we used two real data examples extracted from figures in The New England Journal of Medicine. The first one was a case series of 24 laboratoryverified COVID19 intensive care patients admitted to hospital ICUs in the Seattle area [8]. The second one was a sample of 53 COVID19 patients from North America, Europe, and Japan that we extracted from a figure depicting a recent study of patients treated with compassionateuse Remdesivir [9]. It should be noted that the two data sets are used for demonstration and not for comparing the two cohorts. For both data sets we estimate duration in the ICU and under mechanical ventilation using multistate model methodology.
Methods
Multistate models are a powerful tool to study the course of ICU stay of diseased patients. COVID19 observational studies can, for example, be analyzed with the model shown in Fig. 1. In this model, patients may enter the study in one of two initial states: State 1: ICU without invasive mechanical ventilation (“NonMV”) and State 2: ICU with mechanical ventilation (“MV). These two states are called transient states. The model includes two absorbing states from which a patient no longer transitions: discharged alive from the ICU (State 3: “Discharge”) and dead (State 4: “Death”). From ICU without ventilation, a patient can either be ventilated, discharged, or die. From ventilation, a patient can transition into nonventilation, discharge, or death. Patients can repeatedly transition between ventilation and nonventilation.
Estimands
Formally the course of a patients ICU stay is described with a timeinhomogeneous Markov chain given by {X(t), t ≥ 0} with finite state space S = {1, 2, 3, 4} and followup time τ. X(t) denotes the state occupied at time t. Various estimands are of interest. First, we define the probability to move from one state to another within the multistate model. This includes, for example, the ICU mortality risk and the discharge probability.
The (Markovian) transition probabilities are.
and interpreted as the probabilities to transition from State l, occupied at time s, to State m within the time interval (s, t]. The Markov property states that this probability depends only on the current time s and the current state occupied at s, but not on past events. Using formula (1) we can define the transition hazards as
The transition hazards are represented graphically by the arrows between states in Fig. 1.
Subsequently, we can define cause specific cumulative hazards as
For more details, we refer to [10,11,12].
We analyze the course of a patients hospital stay from study entry (s = 0) in the following. The probabilities to start either in State 1 or State 2 define the initial distribution, which is given by.
The state occupation probabilities are
and
To determine estimands for the full cohort, we could take into account the initial distribution and use (5) and (6) in the equations that follow. However, to focus on patients that start in a specific state, (5) reduces to
for patients that start in State 1 and (6) reduces to
for patients that start in State 2. We will use (7) and (8) in what follows.
The state occupation probabilities can be used to derive the length of stay in the ICU and the duration of mechanical ventilation. The sojourn time spent in the ICU nonventilated (truncated after, for example, 28 days) is formally given by
if the patient started in State 1 and
if the patient started in State 2.
Similarly, the duration of MV is given by
if the patient started in State 1 and
if the patient started in State 2.
The total length of stay in the ICU (irrespective of being ventilated or not) up to a maximum of 28 days is simply the sum
if the patient started in State 1 and
if the patient started in State 2.
We have limited S to the 4 states depicted in Fig. 1. However, S can take the value of a finite number of J states and the methodology still holds. For an example of an extended model, see Additional file 11.
Estimators
We used the R package mstate to estimate the transition and state occupation probabilities for the patients over the course of their ICU stay up to 28 days, implying administrative censoring at day 28. The mstate package employs AalenJohansen estimators which are implemented within the Rfunction probtrans. The probabilities can be estimated from both initial states (ventilated (State 2) and nonventilated (State 1) admission to the ICU).
The AalenJohansen estimators are based on matrix multiplication and therefore depend fundamentally on the Markov assumption. As described in detail by Allignol et al. [13] and Beyersmann et al. [14], estimation is based on the causespecific cumulative hazards indicated in formula (3). These are informally given by
where L is the total number of events up to time t and t_{k}, k = 1, …L, are the event times. Then, using a representation of the state occupation probabilities as product integral (explained in detail in Additional file 12), we have
where \( \hat{\boldsymbol{A}}(t) \) is the matrix with entries \( {\hat{A}}_{ij}(t) \), I is the identity matrix and \( \Delta \hat{A}(t) \) are the difference in \( \hat{A} \) between t and the time just prior to t
For example, the transition rate between the two states (1: NonMV, 2: MV) is calculated by substituting 1 for i and 2 for j (i.e. NonMV→MV) or substituting 2 for i and 1 for j (i.e. MV→NonMV) into eq. (17). Eq. (17) is used to calculate the state occupation probabilities in eq. (16).
Beyersmann and Putter [12] describe how to estimate the sojourn times spent in State 1 and State 2 from the state occupation probabilities. This approach is also based on the Aalen Johansen estimators and implemented within the Rfunction ELOS in the mstate package. Confidence intervals can be obtained via bootstrapping.
In addition to describing the risk of death, giving the chances to be discharged, and quantifying hospital capacities (length of ICU stay, duration of mechanical ventilation), causespecific hazard regression models can be used to study the potential impact of factors on each of the transition hazards.
For a more thorough treatment of the theoretical background of the paper, see Additional file 12.
Data examples
Example 1: Case series of criticallyill COVID19 patients in Seattle, USA
In the first example, we reconstructed the patients in the case series from Bhatraju et al. [8] by extracting the data depicted in a figure in their paper. The study included 24 laboratoryconfirmed COVID19 patients admitted to ICUs in the surrounding area of Seattle, USA. The paper provides individual patient information including treatment with invasive mechanical ventilation times, as well as final outcomes (discharged alive, dead). Periods of acute care were also provided but due to the small size of the sample, we dichotomized the patients into two states: “NonMV” (ICU without MV and acute care) and “MV” (ICU with MV). This dichotomization matches the model presented in Fig. 1. Maximum followup was 31 days with each patient having at least 14 days of followup. At admission, 13 (54%) patients were NonMV while 11 (46%) were MV. Seven patients were censored. Table 1 shows a portion of the data set extracted from the published figure and adapted to the model in Fig. 1. For example, patient with id 1 started NonMV (‘from’ = 1) at ICU admission (‘entry’ = 0) and transitioned into MV (‘to’ = 2) on day 5 (‘exit’ =5). Patient 1 then moved back into NonMV on day 16, before being censored on day 25 (‘to’ = 0). ID 2 died (‘to’ = 4) on day 1. The patient with ID 3 started MV, transitioned into NonMV on day 12, and then is discharged (‘to’ = 3) on day 15. Data is required to be put into this form for the functions that are provided. The format can be easily adjusted to take into account baseline and timedependent covariates. The entire data set for example 1 is provided in Additional file 2.
Example 2: Cohort study of patients with severe COVID19 and treated with compassionateuse Remdesivir
Our second data example of COVID19 patients is a reconstruction of the study population from Grein et al. [9]. The study included patients with severe COVID19 that were treated with Remdesivir. Inclusion criteria were confirmed SARSCoV2 infection and an oxygen saturation of ≤94% or oxygen support. Followup was 28 days. Missing data regarding level of oxygen support was imputed by the method of last observation carried forward (LOCF). In this study we have detailed information not only on episodes of MV but also on other forms of intubation. To match the model shown in Fig. 1, we again dichotomized the patients into two groups: “NonMV” (noninvasive positive pressure ventilation, nasal highflow oxygen therapy, lowflow oxygen, ambient air) and “MV” (extracorporeal membrane oxygenation and invasive mechanical ventilation). At admission, 19 (36%) patients were nonMV while 34 (64%) were MV. Twentyone patients were censored.
The data set for example 2 is provided in Additional file 7.
Results
Example 1
Predictions of the expected length of stay for patients in this cohort are shown in Table 2. For example, a patient starting unventilated at the beginning of his/her ICU stay had a much shorter expected duration of ventilation (4.34 days) than a patient already ventilated at ICU admission (12.25 days). Using the initial distribution, the weighted average of the expected durations in each state determined the expected total ICU time (15.05 days). This information is vital for advance planning of both ventilation and ICU capacities. The same is true for Fig. 2: we multiplied the transition matrix (eq. (16)) by the initial distribution to produce the stacked probability plot that illustrates the predicted proportions of the states throughout the entire follow up. At day 21 after ICU admission, for instance, a predicted 21% of patients are already discharged, 18% are not invasively ventilated, 10% need invasive mechanical ventilation, and mortality is predicted at 51%. With respect to the relatively high level of predicted mortality based on this cohort, it should be noted that 4 deceased patients had donotresuscitate orders in place before their admission. R code to reproduce the analysis for example 1 is provided in Additional file 5.
Example 2
The expected sojourn times and lengths of stay for this data are presented in Table 3. Figure 3 provides a visualization of the clinical course of the full cohort for example 2. Similar to example 1, patients initially MV had a longer expected ICU stay (20.71 vs. 17.67 days) at 28 days. Figure 4 sheds light onto this finding by comparing the clinical progression for patients starting in the two initial states. The increased size of the sample in example 2 allows such visual comparisons to be made. At 21 days of followup, patients starting in NonMV had a higher probability of being discharged alive (60% vs. 31%) and a lower probability of dying (6% vs. 20%). ICU duration is shortened by a higher death probability in initially MV patients and a higher discharge probability for initially nonMV patients. This underlines the influence these competing events have on the lengths of stay. Figure 4 visually illustrates the marked difference in the progression of the hospital stay of patients in these two groups. It indicates that the ventilator demands of patients who are initially admitted nonventilated are different from those who are ventilated at admission. R code to reproduce the analysis for example 2 is provided in Additional file 8.
Discussion
We have demonstrated how researchers can model the hospital stays of COVID19 patients to determine the expected duration of mechanical ventilation, expected overall ICU stay, and patients’ predicted clinical progression while avoiding common pitfalls and biases in modeling such settings. Given the need for reliable evidence, we believe that the application of multistate models to this kind of data represents the best way to generate an extensive amount of valid evidence from observational studies.
Although the limited number of patients in the two data examples made it difficult to identify predictors or analyze treatment effects, visualizations of the results provide easytointerpret and comprehensive information of the patients’ clinical courses. The results of the two reanalyses provide insights into timedependent eventprobabilities, while estimates regarding the conditional length of stay are of major interest for capacity planning. To maintain transparency and further help researchers, code in the programming language R and the data examples are provided in the additional files.
The model selected in Fig. 1 facilitated a harmonization of the two data examples. The data sets could have, in fact, been merged if not for differing time origins (time from ICU admission vs. time from Remdesivir treatment initiation). Nonetheless, the harmonization reveals the potential for use in metanalyses and systematic literature reviews. In contrast, the flexibility of the methodology is illustrated in the various models that could have been chosen for each of the data examples. Although we dichotomized the states into NonMV and MV, example 1 also provided information on acute care while example 2 included information on 6 different levels of oxygen support. Additionally, a transition state of ‘ICU after invasive ventilation’ could be modeled to give further insights into the healthcare demands of ventilated COVID19 patients. To demonstrate the flexibility of the methodology, an extended model analysis for example 2 is provided in Additional file 11. A researcher can adapt the choice of multistate model to the data or outcome of interest.
Since publication, several researchers [15] have pointed out biases in the analyses performed by Grein et al. The bias stems from the censoring of deceased patients, whose risk of improving is not similar to noncensored patients (i.e. informative censoring). This results in an overestimated cumulative incidence of clinical improvement. It should be noted that this bias would have been avoided with our proposed multistate methodology as death and discharge from the ICU are absorbing states from which patients are no longer at risk of clinical improvement.
In addition to the biases common to survival analysis already mentioned, Lipsitch et al. [16] review biases that can occur during the outbreak of both known and unknown infectious diseases. They describe a “survivorship bias” that can occur when many infected patients die before being hospitalized, thus implying a protective effect of hospitalization on mortality. This bias is related to “length bias” [6, 7] which can be addressed by incorporating lefttruncation. Lipsitch et al. suggest comparing the risk of death among patients hospitalized and nonhospitalized on a specific day since a patient has become a case, and combining the estimates over several days. Here multistate methodology could be applied by modeling the transient states of “hospitalized” and “nonhospitalized” with a transition into “death” from both states. There are several advantages to this model. First, all patients can be included in one analysis through assignment into one of the initial states (“hospitalized” and “nonhospitalized). Similar to our comparison of stacked probability plots in Fig. 4, we could then compare the risk of death for these two groups. Second, the model accounts for the timedependent nature of hospitalization by allowing for several admission and discharge transitions. The allowance of repeated transitions among initial states is an advantage over standard competing risk models. This further highlights the utility of the proposed methods in epidemic/pandemic settings.
Although the sizes of our two data samples are rather modest, the volume and availability of COVID19 data is expanding. These methods applied to more detailed patient data could produce very informative plots for comparing age, gender, underlying health condition, or even different treatment arms [17]. This expanding capability to compare groups visually as data sets increase in size was demonstrated in Fig. 4, which was informative for the larger of our two real data sets. Furthermore, depictions like the second figures in Bhatraju et al. [8] and Grein et al. [9] provide an impression for the viewer when the sample sizes are small. However, such depictions are overwhelming and difficult to read with larger data sets. In contrast, the stacked probability plots incorporate all of the information in the aforementioned figures into one easytoview illustration regardless of the size of the sample. Naturally, the precision of the stacked probability plots increases with an expanding number of patients.
There are several limitations to our demonstration. First, we chose the model in Fig. 1 as it reflected the observed transitions recorded in the two data sets. The observations included patients who were discharged from the ICU directly from being mechanically ventilated; in other words without first being nonventilated in the ICU. From a clinical standpoint, these observed transitions do not occur. Either the patients were extubated and remained in the ICU for a couple hours, or were transferred to another ICU unit. In either case, there may be reason to adjust the data set by censoring these observations. Second, the lengths of stay estimates do not distinguish between the final outcomes of discharge alive and death. While these estimates are relevant for planning capacities, they are less clinically relevant. An alternative would be to evaluate, for example, time alive without mechanical ventilation. Third, we performed LOCF on the example 2 data to handle periods of missing information on a patient’s level of oxygen support. While this simplified the analysis, it is likely that transitions between states occurred for longer periods of missing information.
A further strength of this methodology is that it allows for censoring, thus acknowledging active cases. It therefore lends itself to ongoing as well as completed studies. Considering the wealth of COVID19related research being produced currently, the multistate approach is an invaluable addition to a COVID19 researcher’s toolkit.
Conclusions
Applying multistate methodology to ICU settings with COVID19 patients gives important insights into mechanical ventilation duration, length of ICU stay, and mortality. The visualization of these results, in the form of a stacked probability plot, is both easytoread and comprehensive. The approach also allows for clear comparisons among different baseline characteristics, and even treatment arms. The tools described here offer important aid to decision makers with regard to healthcare capacities.
Availability of data and materials
All data generated or analyzed during this study are included in this published article (and its supplementary information files).
Abbreviations
 ECMO:

Extracorporeal membrane oxygenation
 ICU:

Intensive Care Unit
 LOCF:

Last observation carried forward
 MV:

ICU with mechanical ventilation
 NonMV:

ICU without mechanical ventilation
References
 1.
Beyersmann J, Wolkewitz M, Allignol A, Grambauer N, Schumacher M. Application of multistate models in hospital epidemiology: advances and challenges. Biom J. 2011;53(2):332–50.
 2.
Wolkewitz M, von Cube M, Schumacher M. Multistate modeling to analyze nosocomial infection data: an introduction and demonstration. Infect Control Amp Hosp Epidemiol. 2017;38(8):953–9.
 3.
van Walraven C, Davis D, Forster AJ, Wells GA. Timedependent bias was common in survival analyses published in leading clinical journals. J Clin Epidemiol. 2004;57(7):672–82.
 4.
Beyersmann J, Gastmeier P, Wolkewitz M, Schumacher M. An easy mathematical proof showed that timedependent bias inevitably leads to biased effect estimation. J Clin Epidemiol. 2008;61(12):1216–21.
 5.
Wolkewitz M, Cooper BS, Bonten MJM, Barnett AG, Schumacher M. Interpreting and comparing risks in the presence of competing events. BMJ. 2014;349:g5060.
 6.
Wolkewitz M, Schumacher M. Survival biases lead to flawed conclusions in observational treatment studies of influenza patients. J Clin Epidemiol. 2017;84:121–9.
 7.
Wolkewitz M, Schumacher M. Neuraminidase inhibitors and hospital mortality in British patients with H1N1 influenza a: a reanalysis of observational data. PLoS One. 2016;11(9):e0160430.
 8.
Bhatraju PK, Ghassemieh BJ, Nichols M, Kim R, Jerome KR, Nalla AK, et al. Covid19 in Critically Ill Patients in the Seattle Region — Case Series. New England Journal of Medicine. 2020 ;382(21):2012–22.
 9.
Grein J, Ohmagari N, Shin D, Diaz G, Asperges E, Castagna A, et al. Compassionate Use of Remdesivir for Patients with Severe Covid19. New England Journal of Medicine. 2020;382(24):2327–36.
 10.
Andersen PK, Keiding N. Multistate models for event history analysis. Stat Methods Med Res. 2002;11(2):91–115.
 11.
Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer Science & Business Media; 2012. p. 779.
 12.
Beyersmann J, Putter H. A note on computing average state occupation times. Demogr Res. 2014;30:1681–96.
 13.
Allignol A, Schumacher M, Beyersmann J. Empirical Transition Matrix of MultiState Models: The etm Package. J Stat Softw. 2011;38(4) Available from: http://www.jstatsoft.org/v38/i04/. [cited 2020 Mar 2].
 14.
Beyersmann J, Allignol A, Schumacher M. Competing Risks and Multistate Models with R. New York: Springer Science & Business Media; 2011. p. 249.
 15.
Compassionate Use of Remdesivir in Covid19. N Engl J Med. 2020;382(25):e101.
 16.
Lipsitch M, Donnelly CA, Fraser C, Blake IM, Cori A, Dorigatti I, et al. Potential biases in estimating absolute and relative casefatality risks during outbreaks. PLoS Negl Trop Dis. 2015;9(7):e0003846.
 17.
Cube M von, Grodd M, Wolkewitz M, Hazard D, Lambert J. Harmonizing heterogeneous endpoints in COVID19 trials without loss of information  an essential step to facilitate decision making. medRxiv. 2020;2020.03.31.20049007.
Acknowledgements
We would like to thank Philip Hehn for his help in extracting the data sets from the published articles. We would also like to thank Prof. Martin Schumacher for his invaluable insights and recommendations.
Funding
JL was supported by Edmond de Rothschild Foundation Medical Fellowship Program. Open access funding provided by Projekt DEAL.
Author information
Affiliations
Contributions
MW conceptualized the project. DH, KK, MvC, and MW wrote the manuscript. DH, JL, and MG developed the code for the analyses and plots. LB and MG wrote Additional file 12. DH and KK extracted the data from the two published articles. All authors read and commented on several drafts of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1.
format: txt, title: README_Example1.txt, description: README file with instructions so that a researcher can reproduce the results for data example 1.
Additional file 2.
format: csv, title: Example1_Data.csv, description: Data set for example 1.
Additional file 3.
format: R, title: ext_mstate.R, description: R function to modify data frame for use in R package mstate.
Additional file 4.
format: R, title: LOS_boot.R, description: R function to calculate bootstrap confidence intervals in R package mstate.
Additional file 6.
format: txt, title: README_Example2.txt, description: README file with instructions so that a researcher can reproduce the results for data example 2.
Additional file 7.
format: csv, title: Example2_Data.csv, description: Data set for data example 2.
Additional file 9.
format: txt, title: README_Example2_Extended.txt, description: README file with instructions so that a researcher can produce results for extended data example 2.
Additional file 10.
format: csv, title: Example2_Extended_Data.csv, description: Data set for extended data example 2.
Additional file 12.
format: docx, title: Theoretical_Background.docx, description: Theoretical aspects of the analyses of the real data examples.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Hazard, D., Kaier, K., von Cube, M. et al. Joint analysis of duration of ventilation, length of intensive care, and mortality of COVID19 patients: a multistate approach. BMC Med Res Methodol 20, 206 (2020). https://doi.org/10.1186/s1287402001082z
Received:
Accepted:
Published:
Keywords
 SARSCoV2
 Multistate model
 Length of stay
 Competing risks
 Mechanical ventilation