Analysis of COVID-19 data using neutrosophic Kruskal Wallis H test

Background: The Kruskal-Wallis H test is a well-known nonparametric alternative to one-way analysis of variance in classical statistics. It is used extensively in decision-making problems that require testing the equality of several means when the observations are exact. The test cannot be applied when the data are in interval form and contain some indeterminacy.
Methods: Interval-valued data often contain uncertainty and imprecision and typically arise from situations involving vagueness and ambiguity. In this research, a modified form of the Kruskal-Wallis H test is proposed for indeterminate data. A comprehensive theoretical methodology is presented, together with an application and implementation of the test.
Results: The proposed test is applied to a Covid-19 data set. The study results suggest that the proposed modified Kruskal-Wallis H test is more suitable for interval-valued data. The application of this new neutrosophic Kruskal-Wallis test to the Covid-19 data set showed that the proposed test provides more relevant and adequate results. The data representing the daily ICU occupancy by Covid-19 patients were recorded for both determinate and indeterminate parts. The existing nonparametric Kruskal-Wallis H test under classical statistics would have given misleading results. The proposed test showed that, at a 1% level of significance, there is a statistically significant difference among the average daily ICU occupancy by corona-positive patients of different age groups.
Conclusions: The findings suggest that the proposed modified form of the Kruskal-Wallis test is appropriate in place of the classical form of the test in a neutrosophic environment.


Introduction
Hypothesis testing is a scientific process used to decide whether to accept or reject a proposition under consideration. Two approaches are used in statistics to verify a hypothesis: (1) the parametric approach and (2) the nonparametric approach. The most important requirement of the parametric approach is that the data satisfy the normality assumption, and a few tests also require equality of population variances [1]. In most situations, the distributional assumptions of a parametric test are hardly satisfied, and the use of nonparametric or distribution-free tests is common practice [2,3]. However, all such nonparametric tests apply to data containing determinate observations. In real life, there are various scenarios in which the data are imprecise, and in such cases the existing hypothesis testing approach based on classical test statistics cannot be implemented. Recent studies have suggested nonparametric tests based on interval-valued data and fuzzy logic [4]. Smarandache [5] generalized fuzzy logic in the neutrosophic sense by considering interval-valued data and the measure of indeterminacy or falseness. Smarandache introduced Neutrosophic Statistics as a generalization of classical statistics that is applied when the data under consideration are neutrosophic numbers [6]. Smarandache and Khalid [6] verified the efficiency of neutrosophic logic. Several authors have implemented neutrosophic logic for data containing uncertainty and vagueness; see refs [7][8][9][10][11].
Furthermore, several authors have developed statistical tests to analyze fuzzy data; see, for example, refs [12][13][14][15]. In fuzzy logic and neutrosophic statistics, several works have also introduced decision-making analysis for data sets containing uncertainty and vagueness [16][17][18]. Recently, Aslam introduced different statistical tests using Neutrosophic Statistics, including tests of homogeneity of variance in an uncertain environment, the goodness-of-fit test in the presence of uncertain parameters, and the Kolmogorov-Smirnov tests under uncertainty [19][20][21].
In 1952, Kruskal and Wallis [12] provided a robust rank-based test for the k-sample problem as an alternative to parametric approaches such as one-way analysis of variance (ANOVA). The Kruskal-Wallis H test has been used for analysis in various settings; for example, see refs [13][14][15][16][17][18]. In the classical k-sample problem, the data are determinate and do not contain any ambiguity or vagueness. However, in many current scientific studies, the observations are not necessarily precise, and indeterminate parts quantitatively express the uncertainties in a sample. The existing Kruskal-Wallis H test cannot be used to investigate data measured in neutrosophic intervals. A detailed literature review found no test that can serve as a nonparametric alternative to ANOVA in an indeterminate environment. The unavailability of such a method is the motivation for the current research. The goal is to develop a test that compares several samples or groups and that is easy to apply and understand. The proposed modified Kruskal-Wallis test gives its result in interval-valued form and is preferable for data containing vagueness and uncertainty. The objectives of this article are (1) to introduce the modified neutrosophic Kruskal-Wallis test; (2) to define the methodology of the neutrosophic Kruskal-Wallis test; and (3) to compare the performance of the existing Kruskal-Wallis test with the proposed test through an application to a Covid-19 data set under neutrosophy. More information about the application of neutrosophic statistics can be found in [22][23][24].
The article is organized as follows. Section 2 presents the computational method for applying the neutrosophic Kruskal-Wallis test. In Section 3, the modified Kruskal-Wallis test is demonstrated with an example based on the Covid-19 data set in order to examine its efficiency. It is anticipated that the modified nonparametric Kruskal-Wallis test will analyze data in the presence of uncertainty and vagueness more effectively than the existing Kruskal-Wallis test under classical statistics. Finally, the results are discussed and generalized with some concluding remarks.

Computational method of the modified Kruskal Wallis test under uncertainty
In Classical Statistics, nonparametric tests are methods of statistical analysis that do not require the data to follow a particular distribution. These tests apply to non-normal data sets, and for this reason they are sometimes referred to as distribution-free tests. The basic purpose of the proposed Kruskal-Wallis test is to examine whether all independent samples containing neutrosophic observations come from neutrosophic populations with equal means, implying that the populations under uncertainty are identical. The proposed nonparametric test is applicable to data for which the measure of uncertainty or the measure of falseness has been recorded. Suppose $X_N = a_N + b_N I_N$, with $X_N \in [X_L, X_U]$, is a neutrosophic number, where the first part $a_N$ represents the measure of determinacy and the second part $b_N I_N$ represents the measure of vagueness or uncertainty. When the indeterminacy $I_N \in [I_L, I_U]$ equals 0, the neutrosophic number reduces to a random variable under classical statistics. The neutrosophic variable $X_N$ represents the neutrosophic sample obtained from a population containing imprecise, uncertain, and indeterminate observations; for details, see ref [5].
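The neutrosophic number described above can be illustrated with a small sketch. The class name `NeutrosophicNumber`, its field names, and the numeric values are hypothetical and chosen only for illustration; they do not come from the paper or from any library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NeutrosophicNumber:
    """X_N = a + b * I with I in [i_low, i_high] (hypothetical helper)."""
    a: float       # determinate part a_N
    b: float       # coefficient b_N of the indeterminate part
    i_low: float   # I_L, lower bound of the indeterminacy
    i_high: float  # I_U, upper bound of the indeterminacy

    def interval(self) -> Tuple[float, float]:
        """Return (X_L, X_U), the range of a + b * I over I in [i_low, i_high]."""
        ends = (self.a + self.b * self.i_low, self.a + self.b * self.i_high)
        return (min(ends), max(ends))

    def is_classical(self) -> bool:
        """True when the indeterminacy is zero, so X_N is an ordinary number."""
        return self.i_low == 0 and self.i_high == 0


# Made-up values, for illustration only.
x = NeutrosophicNumber(a=10.0, b=2.0, i_low=0.0, i_high=0.5)
print(x.interval())       # (10.0, 11.0)
print(x.is_classical())   # False
```

With `i_low = i_high = 0` the interval collapses to the single value `a`, which mirrors the reduction to the classical case.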

Modified Kruskal Wallis H test
Under Classical Statistics, the Kruskal-Wallis H test is used to test the null hypothesis that all k independent samples come from populations having equal means against the alternative hypothesis that at least one population differs. The existing nonparametric test is a generalization of the two-sample Mann-Whitney U test. It is an extremely useful test when the assumption of normality does not hold or the population variances are not equal, but it cannot handle data under uncertainty. The modified Kruskal-Wallis test under uncertainty is applicable under the following assumptions:
1. The data consist of uncertain, imprecise, and indeterminate values.
2. The neutrosophic samples must be random.
3. The neutrosophic samples must be mutually independent.
4. The test is generally considered robust to ties, but if there are ties in the data set, they should not be concentrated together in one part of the sample.
[19] having critical values for all combinations of sample sizes up to 5,5,5. In case one of the neutrosophic samples contains more than five observations, or there are more than five observations in each neutrosophic sample and the null hypothesis is true, the neutrosophic test statistic $H_N$ follows a chi-square distribution with $k-1$ degrees of freedom.
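As a rough illustration of how an interval-valued statistic $H_N$ could be computed, the sketch below applies the classical tie-corrected Kruskal-Wallis H formula separately to the lower and upper values of each observation. Treating the two bounds independently is an assumption made here for illustration, not a restatement of the paper's formula; the function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Classical tie-corrected Kruskal-Wallis H for k groups of exact values.

    Assumes the pooled values are not all identical (otherwise the tie
    correction is zero and H is undefined).
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def neutrosophic_h(groups: Sequence[Sequence[Tuple[float, float]]]) -> Tuple[float, float]:
    """Assumed approach: H computed on lower bounds and on upper bounds."""
    h_low = kruskal_wallis_h([[lo for lo, _ in g] for g in groups])
    h_up = kruskal_wallis_h([[up for _, up in g] for g in groups])
    return (min(h_low, h_up), max(h_low, h_up))


# Made-up (lower, upper) observations for three groups; not the Covid-19 data.
toy = [
    [(1, 2), (2, 3), (3, 4)],
    [(4, 5), (5, 6), (6, 7)],
    [(7, 8), (8, 9), (9, 10)],
]
print(neutrosophic_h(toy))  # both bounds are approximately 7.2
```

In the toy data the groups are perfectly separated, so both bounds reach the largest value H can take for three groups of three observations.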

Application of the proposed modified Kruskal Wallis H test
To apply the proposed neutrosophic Kruskal-Wallis test, data representing the daily ICU occupancy by corona-positive patients in Pakistan have been considered. The hypothesis under investigation is whether there is a statistically significant difference in the daily ICU occupancy of Covid-19 patients based on their age groups. Neutrosophy, or uncertainty, is introduced in the data for a better illustration. The neutrosophic Kruskal-Wallis test is applied to test the null hypothesis that there is no difference in the daily ICU occupancy of Covid-19 patients among different age groups in Pakistan during December 2020. The daily ICU occupancy of Covid-19 patients aged 55 and above is shown in Fig. 1, that of patients aged 35-55 is shown in Fig. 2, and that of patients aged 35 and below is shown in Fig. 3.
The neutrosophic null and alternative hypotheses for the neutrosophic data given in Table 1 are, respectively, that the daily ICU occupancy does not differ among the age groups and that at least one age group differs. After arranging the data in ascending order and assigning ranks to them, it is found that ties exist in the neutrosophic data set for both the determinate and indeterminate parts. Therefore, the neutrosophic statistic is given in (4). Assuming the level of significance to be 1%, the critical region is $H_N > \chi^2_{0.01,2} = 9.21$. Since the calculated value of the test statistic based on the neutrosophic observations lies in the critical region (p-value < α), we reject the neutrosophic null hypothesis and conclude that the daily ICU occupancy of the different age groups of Covid-19 patients is not equal.
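The critical value quoted above can be checked with a short sketch. For 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2), so the critical value at level α is -2 ln(α). The function names, the decision rule for an interval-valued statistic (reject only when both bounds exceed the critical value), and the example interval are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import Tuple


def chi2_df2_critical(alpha: float) -> float:
    """Upper-tail critical value of the chi-square distribution with 2 df.

    For df = 2 the survival function is exp(-x / 2), hence x = -2 ln(alpha).
    """
    return -2.0 * math.log(alpha)


def chi2_df2_p_value(h: float) -> float:
    """Upper-tail p-value of a statistic h under chi-square with 2 df."""
    return math.exp(-h / 2.0)


def reject_interval(h_interval: Tuple[float, float], alpha: float) -> bool:
    """Assumed rule: reject H0 only if both bounds lie in the critical region."""
    crit = chi2_df2_critical(alpha)
    return min(h_interval) > crit


crit = chi2_df2_critical(0.01)
print(round(crit, 2))  # 9.21

# Hypothetical interval-valued statistic, not the value from the paper.
example_h = (12.0, 15.0)
print(reject_interval(example_h, alpha=0.01))  # True
```

The closed form holds only for 2 degrees of freedom, which corresponds to comparing three groups; other values of k would need a general chi-square quantile routine.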

Conclusion and discussion
This article proposed a modified form of the rank-based nonparametric Kruskal-Wallis H test for comparing k samples whose observations contain a measure of uncertainty or a measure of falseness. It is evident from Table 1 that the uncertain data used for illustration reduce to the determinate part under classical statistics if no observations of uncertainty are logged. For example, for sample one, the first observation, 443, for the first group represents the determinate part of the indeterminate interval. The second value, 450, represents the indeterminate part of the interval. We can observe that the modified Kruskal-Wallis test yields an indeterminacy interval rather than a single determinate value, which implies that the proposed test provides a good measure of uncertainty. Recent studies also show that methods dealing with interval-valued data are more suitable in an indeterminate environment than classical statistical techniques [25,26]. The work was originally motivated by the extensive research on fuzzy logic and neutrosophic statistics for interval-valued data sets. The proposed nonparametric test can be readily applied to compare k samples by testing the hypothesis that they have equal means.
The application of this new neutrosophic Kruskal-Wallis test on the Covid-19 data set showed that the proposed test provides more relevant and adequate results. The data representing the daily ICU occupancy by the Covid-19 patients were recorded for both determinate and indeterminate parts. The existing nonparametric Kruskal Wallis H test under Classical Statistics would have given misleading results. The proposed test showed that at a 1% level of significance, there is a statistically significant difference among the average daily ICU occupancy by corona-positive patients of different age groups.
The modified Kruskal-Wallis test can be used to compare the averages of several samples or groups, and it is easy to apply and understand. The neutrosophic Kruskal-Wallis test yields an uncertain interval, which is well suited to data measured from a complex system. The application of the proposed test is recommended for different fields, including biomedical sciences, engineering, and many other statistical areas. On the other hand, applying nonparametric tests under classical statistics to data containing vagueness can produce misleading results. In conclusion, the proposed neutrosophic nonparametric test provides data analysts with an efficient tool for analyzing k samples in the presence of uncertainty and indeterminacy. Further properties of this modified Kruskal-Wallis test, and the evaluation of the proposed test using different measures, are left for future research.