Software
The COVID19-World application has been developed in RStudio [6], version 1.2.5033, using the Shiny package, version 1.4.0. Shiny makes it possible to build a graphical user interface (GUI) that can be run locally or deployed online; the latter is particularly useful for showing and communicating updated findings to a broad audience. All analyses have been carried out using R [7], version 3.6.3. The key R packages used in the implementation are dplyr, xlsx, and vroom for data management; sjPlot and EpiEstim for data analysis; shinydashboard, shinyFeedback, shinycssloaders, and kableExtra for application enhancement; and plotly for the graphical displays. The application is freely available online at [https://ubidi.shinyapps.io/covid19world], and the source code is available on request through GitHub at [https://github.com/ubidi/covid19world]. Menus, tabs, and outputs are available in English, Spanish, and Catalan.
The European Centre for Disease Prevention and Control (ECDC) provides a downloadable data file, updated daily, with the latest available public data on COVID-19 per day and country. The data are collected by the ECDC’s Epidemic Intelligence team from reports by health authorities worldwide. Data on diagnosed COVID-19 cases and deaths from January 1st, 2020, onwards are obtained from [https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide]. The application has an automated process that updates the data and all analyses each time a user connects to the app. Countries with fewer than 500,000 inhabitants are not included. The application is user-friendly, with intuitive menus that display the visualizations for each of the implemented analyses once a country has been chosen from the top dialog box (Fig. 1).
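As an illustration of the data-preparation step, the following Python sketch parses a CSV export of the ECDC file into one daily series per country, keeping data from January 1st, 2020, onwards and excluding countries with fewer than 500,000 inhabitants. The application itself does this in R; the column names used here (dateRep, cases, deaths, countriesAndTerritories, popData2019), the dd/mm/yyyy date format, and the function name load_country_series are assumptions for illustration and should be checked against the header of the downloaded file.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime

# Assumed column names and date format of the ECDC CSV (check the real header).
DATE_COL = "dateRep"
COUNTRY_COL = "countriesAndTerritories"
CASES_COL = "cases"
DEATHS_COL = "deaths"
POP_COL = "popData2019"
START_DATE = date(2020, 1, 1)
MIN_POPULATION = 500_000


def load_country_series(csv_text):
    """Return {country: [(date, cases, deaths), ...]} sorted by date, keeping
    rows from START_DATE onwards and countries with >= MIN_POPULATION people."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        pop = row[POP_COL].strip()
        # Drop rows without a population figure or below the threshold
        if not pop or int(pop) < MIN_POPULATION:
            continue
        day = datetime.strptime(row[DATE_COL], "%d/%m/%Y").date()
        if day < START_DATE:
            continue
        series[row[COUNTRY_COL]].append(
            (day, int(row[CASES_COL]), int(row[DEATHS_COL])))
    return {country: sorted(rows) for country, rows in series.items()}
```

Rows with a missing population are dropped as well; that is a choice of this sketch, not a documented rule of the application.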
Trends and projections
Trends in the number of diagnosed cases and deaths are estimated using Poisson regression models [8] that allow for overdispersion [9]. The expected number of events is modeled as a polynomial function of time, whose degree has been increased as the epidemic has evolved. The current model uses a fourth-degree polynomial:
$$ \log \left(\mathrm{E}\left({\mathrm{c}}_{\mathrm{t}}\right)\right)={\upbeta}_0+{\upbeta}_1\mathrm{t}+{\upbeta}_2{\mathrm{t}}^2+{\upbeta}_3{\mathrm{t}}^3+{\upbeta}_4{\mathrm{t}}^4 $$
where t = 1, 2, …, T indexes the days (from the first to the last observed day, T consecutive days in total) and $c_t$ is the number of events on day t. The estimated regression parameters and their standard errors are used to obtain short-term projections, up to 3 days ahead, with their 95% confidence intervals (CI).
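The fitting and projection steps can be sketched in Python as a quasi-Poisson regression fitted by iteratively reweighted least squares (IRLS), with the dispersion estimated from the Pearson chi-square over the residual degrees of freedom, as R's glm with the quasipoisson family does. This is not the application's code: the function name fit_poly_quasi_poisson and the rescaling of time to t/T (which changes the coefficients but not the fitted values) are our own choices, and the intervals are Wald intervals computed on the log scale and then exponentiated.

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_poly_quasi_poisson(counts, degree=4, horizon=3, max_iter=100, tol=1e-10):
    """Fit log E(c_t) = polynomial in t by IRLS; project `horizon` days ahead
    with approximate 95% confidence intervals."""
    T, p = len(counts), degree + 1

    def design_row(t):
        s = t / T  # rescaled time keeps t**4 numerically well conditioned
        return [s ** d for d in range(p)]

    X = [design_row(t) for t in range(1, T + 1)]

    def xtwx(mu):  # X' W X with Poisson/log-link weights W = diag(mu)
        return [[sum(m * x[a] * x[b] for m, x in zip(mu, X)) for b in range(p)]
                for a in range(p)]

    mu = [y + 0.5 for y in counts]  # starting values avoid log(0)
    eta = [math.log(m) for m in mu]
    for _ in range(max_iter):
        z = [e + (y - m) / m for e, y, m in zip(eta, counts, mu)]  # working response
        rhs = [sum(m * x[a] * zi for m, x, zi in zip(mu, X, z)) for a in range(p)]
        beta = _solve(xtwx(mu), rhs)
        new_eta = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
        change = max(abs(a - b) for a, b in zip(new_eta, eta))
        eta, mu = new_eta, [math.exp(e) for e in new_eta]
        if change < tol:
            break

    # Overdispersion parameter: Pearson chi-square / residual degrees of freedom
    phi = sum((y - m) ** 2 / m for y, m in zip(counts, mu)) / (T - p)
    info = xtwx(mu)
    projections = []
    for t in range(T + 1, T + horizon + 1):
        x0 = design_row(t)
        eta0 = sum(b * xi for b, xi in zip(beta, x0))
        se = math.sqrt(phi * sum(a * v for a, v in zip(x0, _solve(info, x0))))
        projections.append((t, math.exp(eta0),
                            math.exp(eta0 - 1.96 * se), math.exp(eta0 + 1.96 * se)))
    return beta, phi, projections
```

For a hypothetical list daily_cases, fit_poly_quasi_poisson(daily_cases) returns the coefficients (on the rescaled time axis), the dispersion estimate, and one (day, projection, lower, upper) tuple for each of days T+1 to T+3.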
These models are evaluated regularly by checking the overdispersion parameter, the sum of Pearson residuals, and the deviance, so that the model can be reformulated if a better-fitting specification is needed during the epidemic.
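These three checks can be computed from the observed counts and the fitted means. The sketch below is a minimal Python version with a hypothetical function name; it assumes that "the sum of Pearson residuals" refers to the sum of squared Pearson residuals (the Pearson chi-square), which is the quantity used to estimate the overdispersion parameter. A dispersion well above 1 indicates overdispersion relative to a plain Poisson model.

```python
import math


def poisson_fit_diagnostics(observed, fitted, n_params):
    """Return (pearson_chi2, deviance, dispersion) for a Poisson-type fit."""
    # Sum of squared Pearson residuals (y - mu) / sqrt(mu)
    pearson = sum((y - m) ** 2 / m for y, m in zip(observed, fitted))
    # Poisson deviance 2 * sum[y log(y / mu) - (y - mu)], taking y log y = 0 at y = 0
    deviance = 2 * sum((y * math.log(y / m) if y > 0 else 0.0) - (y - m)
                       for y, m in zip(observed, fitted))
    # Overdispersion parameter: Pearson chi-square over residual degrees of freedom
    dispersion = pearson / (len(observed) - n_params)
    return pearson, deviance, dispersion
```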
Case fatality rate
The case fatality rate is defined as the ratio between the number of deaths and the number of diagnosed cases [10]. It is therefore modeled with a Poisson regression model, also allowing for overdispersion, that includes the logarithm of the number of diagnosed cases as an offset:
$$ \log \left(\mathrm{E}\left({\mathrm{m}}_{\mathrm{t}}\right)\right)={\upbeta}_0+{\upbeta}_1\mathrm{t}+{\upbeta}_2{\mathrm{t}}^2+{\upbeta}_3{\mathrm{t}}^3+{\upbeta}_4{\mathrm{t}}^4+\log \left({\mathrm{c}}_{\mathrm{t}}\right) $$
where $m_t$ is the daily number of deaths and $c_t$ is the daily number of diagnosed cases. Case fatality rates are also calculated for the same age groups.
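Moving the offset to the left-hand side shows why this specification models the case fatality rate directly: the offset enters with a coefficient fixed at 1, so the exponentiated polynomial is the expected number of deaths per diagnosed case on day t.

$$ \log \left(\frac{\mathrm{E}\left({\mathrm{m}}_{\mathrm{t}}\right)}{{\mathrm{c}}_{\mathrm{t}}}\right)={\upbeta}_0+{\upbeta}_1\mathrm{t}+{\upbeta}_2{\mathrm{t}}^2+{\upbeta}_3{\mathrm{t}}^3+{\upbeta}_4{\mathrm{t}}^4\quad \Longrightarrow \quad {\mathrm{CFR}}_{\mathrm{t}}=\frac{\mathrm{E}\left({\mathrm{m}}_{\mathrm{t}}\right)}{{\mathrm{c}}_{\mathrm{t}}}=\exp \left({\upbeta}_0+{\upbeta}_1\mathrm{t}+{\upbeta}_2{\mathrm{t}}^2+{\upbeta}_3{\mathrm{t}}^3+{\upbeta}_4{\mathrm{t}}^4\right) $$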
We acknowledge that an accurate estimate of the case fatality rate is not possible because diagnosed cases are underreported in official statistics [11]. Nonetheless, estimating and monitoring the case fatality rate is of special interest in the current epidemic scenario.
Infection time
The infection time, i.e., the time elapsed between exposure to SARS-CoV-2 and diagnosis, is estimated following the approach of Lauer et al. [12], who recently analyzed the incubation period of COVID-19 in a cohort of symptomatic patients. For each patient, they recorded the interval of exposure to SARS-CoV-2 and the date of symptom onset. They assumed that, as in other viral respiratory tract infections, the incubation period follows a lognormal distribution:
$$ \mathrm{Lognormal}\left(\mu,{\sigma}^2\right)=\mathrm{Lognormal}\left(1.621,0.418\right) $$
We applied this distribution to the diagnosed cases to approximate their date of exposure to SARS-CoV-2 recursively:
$$ q(i)=\sum \limits_{j=1}^{14}P(j)\times {c}_{j+i} $$
where $c_{i+j}$ is the number of cases diagnosed on day i + j; $q(i)$ is the estimated number of cases infected on day i; j = 1, 2, …, 14 is the delay in days, 14 days being the maximum time the disease is expected to take to develop; and $P(j)$ is the probability of presenting symptoms on day j after exposure, according to a lognormal distribution with the parameters reported by Lauer et al. [12].
For the last 14 days, the calculation requires diagnosed cases on days that have not yet been observed; these are projected with a fourth-degree polynomial model. The resulting estimates are displayed in the application in a different color.
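The back-calculation can be sketched in Python as follows. The discretization, with P(j) taken as the lognormal probability mass on (j-1, j] renormalized over j = 1, …, 14, is our assumption, as is reading 0.418 as the standard deviation of the log incubation time (this reproduces the 97.5th percentile of about 11.5 days reported by Lauer et al.; if it is a variance, use math.sqrt(0.418) instead). The function names are hypothetical, and the input series is assumed to already contain the 14 projected days at its end.

```python
import math

MEANLOG = 1.621  # mean of log incubation time (Lauer et al.)
SDLOG = 0.418    # assumed to be the SD (not the variance) of log incubation time
MAX_DAYS = 14    # maximum delay considered


def lognormal_cdf(x, meanlog=MEANLOG, sdlog=SDLOG):
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - meanlog) / (sdlog * math.sqrt(2.0))))


def delay_probabilities(max_days=MAX_DAYS):
    """P(j), j = 1..max_days: lognormal mass on (j-1, j], renormalized to sum to 1."""
    raw = [lognormal_cdf(j) - lognormal_cdf(j - 1) for j in range(1, max_days + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def back_project(cases, max_days=MAX_DAYS):
    """q(i) = sum_{j=1}^{max_days} P(j) * c_{i+j} for every day i whose next
    max_days days are available in `cases` (observed plus projected)."""
    P = delay_probabilities(max_days)
    n = len(cases) - max_days
    return [sum(P[j - 1] * cases[i + j] for j in range(1, max_days + 1))
            for i in range(n)]
```

Because the weights sum to 1, every diagnosed case is redistributed over the 14 preceding days, so the total number of cases is preserved.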
Basic reproduction number
The basic reproduction number (R0) is the average number of secondary cases of disease caused by a single infected individual over his or her infectious period [13]. This statistic, which is time- and situation-specific, is commonly used to characterize pathogen transmissibility during an epidemic. Monitoring R0 over time provides feedback on the effectiveness of interventions and on the need to intensify control efforts. The goal of control efforts is to reduce R0 below the threshold value of 1, and as close to 0 as possible, to control the epidemic. Here, we used the R package EpiEstim to estimate the basic reproduction number through the Wallinga and Teunis method [13], which assumes a gamma distribution for the serial interval. The serial interval is the time between symptom onset in a primary case and symptom onset in its secondary cases; it is needed to estimate R0 throughout the epidemic, and the mean and standard deviation of its distribution vary by disease [13]. Recently, Nishiura et al. [14] estimated a mean and standard deviation of 4.7 and 2.9 days, respectively, for the COVID-19 serial interval distribution; these are the values we use for the gamma serial interval distribution in our analysis.
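A minimal Python sketch of the Wallinga and Teunis calculation on daily incidence is shown below. It is not EpiEstim's implementation: the gamma serial interval (mean 4.7 days, SD 2.9 days, hence shape (4.7/2.9)^2 and scale 2.9^2/4.7) is discretized by evaluating the density at days 1 to 20 and renormalizing, and both the 20-day truncation and the function names are assumptions. Each case on day s is attributed to earlier days in proportion to incidence times serial-interval weight, and R_t is the expected number of attributed secondary cases per case on day t.

```python
import math

SI_MEAN, SI_SD = 4.7, 2.9  # serial interval mean and SD in days (Nishiura et al.)
MAX_SI = 20                # truncation of the serial interval (assumption)


def serial_interval_weights(mean=SI_MEAN, sd=SI_SD, max_days=MAX_SI):
    """Gamma density at days 1..max_days, renormalized (a simple discretization)."""
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean

    def pdf(x):
        return math.exp((shape - 1) * math.log(x) - x / scale
                        - math.lgamma(shape) - shape * math.log(scale))

    dens = [pdf(k) for k in range(1, max_days + 1)]
    total = sum(dens)
    return [d / total for d in dens]


def wallinga_teunis(cases, w=None):
    """Daily case reproduction numbers from incidence; None where cases[t] == 0."""
    if w is None:
        w = serial_interval_weights()
    K, n = len(w), len(cases)
    # Infectiousness pressure on day s: sum over earlier days u of c_u * w(s - u)
    pressure = [sum(cases[u] * w[s - u - 1] for u in range(max(0, s - K), s))
                for s in range(n)]
    R = []
    for t in range(n):
        if cases[t] == 0:
            R.append(None)  # no primary cases on day t
            continue
        # A case on day s comes from day t with probability c_t * w(s - t) / pressure[s];
        # dividing by c_t gives the expected number of secondary cases per case.
        R.append(sum(cases[s] * w[s - t - 1] / pressure[s]
                     for s in range(t + 1, min(n, t + K + 1))
                     if pressure[s] > 0))
    return R
```

Estimates for the most recent days are biased downwards because their secondary cases have not been observed yet; in this sketch the last day always gets 0.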
The goodness of fit of the estimated models is evaluated so that the fit to the data can be improved as the epidemic progresses. A deviance analysis is performed to compare the fit of the models, and the Poisson overdispersion parameter, the sum of Pearson's residuals, and the deviance are shown to quantify model error.
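The deviance comparison between nested models (for example, a cubic against the current quartic polynomial) can be sketched as the F test that R's anova offers for quasi-Poisson fits (test = "F"): the drop in deviance per extra parameter, divided by the dispersion of the larger model. The function below, whose name is hypothetical, only computes the statistic; its p-value comes from an F distribution with (extra parameters, residual df of the larger model) degrees of freedom, which the Python standard library does not provide, so a statistics package or table is needed.

```python
def compare_nested_quasi_poisson(dev_reduced, df_reduced, dev_full, df_full, phi_full):
    """F statistic for a reduced vs. full nested quasi-Poisson model.

    df_* are residual degrees of freedom; phi_full is the dispersion
    (Pearson chi-square / residual df) of the full model."""
    extra_params = df_reduced - df_full
    if extra_params <= 0:
        raise ValueError("the full model must have fewer residual df than the reduced one")
    return ((dev_reduced - dev_full) / extra_params) / phi_full
```

With T observed days, a cubic model has T - 4 residual degrees of freedom and the quartic model T - 5, so the comparison has one extra parameter.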