Analysis of COVID-19 data using neutrosophic Kruskal Wallis H test
BMC Medical Research Methodology volume 21, Article number: 215 (2021)
Abstract
Background
The Kruskal-Wallis H test, one of the classical statistical tests, is a well-known nonparametric alternative to one-way analysis of variance. It is used extensively in decision-making problems where several groups must be compared and the observations are in exact form. The test cannot be applied when the data are in interval form and contain some indeterminacy.
Methods
Interval-valued data often contain uncertainty and imprecision and typically arise from situations involving vagueness and ambiguity. In this research, a modified form of the Kruskal-Wallis H test is proposed for data with indeterminacy. A comprehensive theoretical methodology is presented, together with an application and an implementation of the test.
Results
The proposed test is applied to a COVID-19 data set for illustration. The results suggest that the proposed modified Kruskal-Wallis H test is more suitable for interval-valued data. Applying the new neutrosophic Kruskal-Wallis test to the COVID-19 data set showed that it provides more relevant and adequate results. Daily ICU occupancy by COVID-19 patients was recorded for both determinate and indeterminate parts, and the existing nonparametric Kruskal-Wallis H test under classical statistics would have given misleading results. The proposed test showed that, at the 1% level of significance, there is a statistically significant difference in average daily ICU occupancy by coronavirus-positive patients across age groups.
Conclusions
The findings suggest that the proposed modified form of the Kruskal-Wallis test is appropriate in place of the classical form of the test in a neutrosophic environment.
Introduction
Hypothesis testing is a scientific process for deciding whether to accept or reject a proposition under consideration. Two approaches are used in statistics to test a hypothesis: (1) the parametric approach and (2) the nonparametric approach. The most important requirement of the parametric approach is that the data satisfy the normality assumption, and some tests also require equal population variances [1]. In many situations the distributional assumptions of a parametric test are not satisfied, and nonparametric or distribution-free tests are then common practice [2, 3]. However, all such nonparametric tests apply to data consisting of determinate observations. In real life there are many scenarios with imprecise data, and in such cases hypothesis tests based on classical test statistics cannot be applied. Recent studies have suggested nonparametric tests based on interval-valued data and fuzzy logic [4]. Smarandache [5] generalized fuzzy logic in the neutrosophic sense by considering interval-valued data together with a measure of indeterminacy or falseness. Smarandache introduced neutrosophic statistics as a generalization of classical statistics that applies when the data are neutrosophic numbers [6]. Smarandache and Khalid [6] verified the efficiency of neutrosophic logic. Several authors have applied neutrosophic logic to data containing uncertainty and vagueness; see refs [7,8,9,10,11].
Furthermore, several authors have developed statistical tests for fuzzy data; see, for example, refs [12,13,14,15]. In fuzzy logic and neutrosophic statistics, several works have also contributed decision-making analyses for data sets containing uncertainty and vagueness [16,17,18]. Recently, Aslam introduced several statistical tests under neutrosophic statistics, including tests of homogeneity of variances under uncertainty, a goodness-of-fit test in the presence of uncertain parameters, and Kolmogorov-Smirnov tests under uncertainty [19,20,21].
In 1952, Kruskal and Wallis [12] proposed a robust rank-based test for the k-sample problem as an alternative to parametric approaches such as one-way analysis of variance (ANOVA). The Kruskal-Wallis H test has since been used for analysis in many settings; see, for example, refs [13,14,15,16,17,18]. In the classical k-sample problem, the data are determinate and contain no ambiguity or vagueness. However, in many current scientific studies the observations are not necessarily exact, and the uncertainty in a sample is expressed quantitatively by indeterminate parts. The existing Kruskal-Wallis H test cannot be used to analyze data measured as neutrosophic intervals. A detailed literature review found no test that can serve as a nonparametric alternative to ANOVA in an indeterminate environment, and this gap motivates the current research. The goal is to develop a test for comparing several samples or groups that is easy to apply and to understand. The proposed modified Kruskal-Wallis test produces its result in interval-valued form and is preferable for data containing vagueness and uncertainty. The objectives of this article are (1) to introduce the modified neutrosophic Kruskal-Wallis test; (2) to define the methodology of the neutrosophic Kruskal-Wallis test; and (3) to compare the performance of the existing Kruskal-Wallis test with the proposed test through an application to a COVID-19 data set under neutrosophy. More information about applications of neutrosophic statistics can be found in [22,23,24].
The article is organized as follows. Section 2 presents the computational method for applying the neutrosophic Kruskal-Wallis test. In Section 3, the modified Kruskal-Wallis test is illustrated with an example based on a COVID-19 data set to examine its efficiency. It is anticipated that the modified nonparametric Kruskal-Wallis test will analyze data in the presence of uncertainty and vagueness more effectively than the existing Kruskal-Wallis test under classical statistics. Finally, the results are discussed and summarized with some concluding remarks.
Computational method of the modified Kruskal Wallis test under uncertainty
In classical statistics, nonparametric tests are methods of statistical analysis that do not require the data to follow a particular distribution. They therefore apply to non-normal data sets and are sometimes called distribution-free tests. The purpose of the proposed Kruskal-Wallis test is to examine whether all independent samples containing neutrosophic observations come from identical neutrosophic populations. The proposed nonparametric test is applicable to data for which a measure of uncertainty or a measure of falseness has been recorded. Suppose X_{N} = a_{N} + b_{N}I_{N}; X_{N} ∈ [X_{L}, X_{U}] is a neutrosophic number, where the first part, a_{N}, is the determinate part and the second part, b_{N}I_{N}, is the indeterminate (vague or uncertain) part. When I_{N} ∈ [I_{L}, I_{U}] reduces to 0, the neutrosophic number reduces to a random variable under classical statistics. The neutrosophic variable X_{N} represents a neutrosophic sample drawn from a population containing imprecise, uncertain, and indeterminate observations; for details, see ref [5].
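The representation X_{N} = a_{N} + b_{N}I_{N} can be sketched as a small Python class. This is a minimal sketch under an assumed convention, the one this article later uses for H_{N} (a_{N} = X_{L}, b_{N} = X_{U}, and I_{N} ∈ [0, (X_{U} - X_{L})/X_{U}]); the class name NeutrosophicNumber is hypothetical. The pair 443 and 450 is the first observation of group one quoted in the conclusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeutrosophicNumber:
    """Interval observation X_N in [X_L, X_U] written as X_N = a_N + b_N * I_N.

    Assumed convention: a_N = X_L, b_N = X_U and I_N in [0, (X_U - X_L) / X_U],
    so I_N = 0 gives the classical (determinate) value and the largest I_N gives X_U.
    """

    lower: float  # X_L: determinate part a_N
    upper: float  # X_U: coefficient b_N of the indeterminacy I_N

    def __post_init__(self) -> None:
        if self.upper <= 0 or self.lower > self.upper:
            raise ValueError("this sketch assumes X_L <= X_U and X_U > 0")

    @property
    def indeterminacy(self) -> float:
        """Upper bound I_U of the indeterminacy interval [0, I_U]."""
        return (self.upper - self.lower) / self.upper

    def value(self, i_n: float) -> float:
        """Evaluate a_N + b_N * I_N for some I_N in [0, I_U]."""
        if not 0.0 <= i_n <= self.indeterminacy + 1e-12:
            raise ValueError("I_N lies outside [0, I_U]")
        return self.lower + self.upper * i_n


x = NeutrosophicNumber(443, 450)  # first observation of group one
print(round(x.indeterminacy, 4))  # 0.0156
print(x.value(0.0))               # 443.0, the classical value when I_N = 0
```

Setting I_{N} to its upper bound recovers X_{U} = 450, so the interval [443, 450] and the form 443 + 450I_{N}; I_{N} ∈ [0, 7/450] describe the same observation.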
Modified Kruskal Wallis H test
Under classical statistics, the Kruskal-Wallis H test is used to test the null hypothesis that k independent samples come from identical populations against the alternative hypothesis that at least one population differs. The existing nonparametric test is a generalization of the two-sample Mann-Whitney U test. It is extremely useful when the assumption of normality does not hold or the population variances are unequal, but it cannot handle data under uncertainty. The modified Kruskal-Wallis test under uncertainty is applicable under the following assumptions:

1.
The data consists of uncertain, imprecise, and indeterminate values.

2.
The neutrosophic samples must be random.

3.
The k neutrosophic samples must be mutually independent.

4.
The test is generally considered robust to ties, but if there are ties in the data set, they should not be concentrated in one part of the sample.
Suppose we have k_{N} independent neutrosophic samples of sizes n_{1N}, n_{2N}, …, n_{kN}, with ∑n_{iN} = n_{N}. Let X_{iN} = (X_{i1N}, X_{i2N}, X_{i3N}, …, X_{inN}) represent the neutrosophic observations of the ith sample. To perform the test under uncertainty, pool all n_{N} observations of the k_{N} samples, arrange them in ascending order of magnitude, and assign ranks to them. In the case of ties, assign the average of the tied ranks. To distinguish the samples, let the letters A_{N}, B_{N}, C_{N}, … denote the observations of the first, second, third, … neutrosophic samples, respectively. Replace each observation by its rank, add the ranks within each sample, and denote the sums by R_{1N}, R_{2N}, …, R_{kN}. Now compute
$$S_{rN}^{2}=\sum_{i=1}^{k_N}\frac{R_{iN}^{2}}{n_{iN}} \tag{1}$$
and
$$S_{tN}^{2}=\sum_{i=1}^{k_N}\sum_{j=1}^{n_{iN}} r_{ijN}^{2} \tag{2}$$
where r_{ijN} is the rank assigned to neutrosophic observation X_{ijN}; X_{ijN} ∈ [X_{ijL}, X_{ijU}]. If there are no ties, then
$$S_{tN}^{2}=\frac{n_N(n_N+1)(2n_N+1)}{6} \tag{3}$$
The modified Kruskal-Wallis statistic H_{N}; H_{N} ∈ [H_{L}, H_{U}] is given by
$$H_N=\frac{(n_N-1)\left(S_{rN}^{2}-C_N\right)}{S_{tN}^{2}-C_N} \tag{4}$$
where C_{N}; C_{N} ∈ [C_{L}, C_{U}] denotes the correction term
$$C_N=\frac{n_N(n_N+1)^{2}}{4} \tag{5}$$
In the case of no ties, substituting (3) and (5) into (4) gives
$$H_N=\frac{12}{n_N(n_N+1)}\sum_{i=1}^{k_N}\frac{R_{iN}^{2}}{n_{iN}}-3(n_N+1) \tag{6}$$
The neutrosophic form of the proposed statistic H_{N} ∈ [H_{L}, H_{U}] can be expressed as
$$H_N=H_L+H_U I_{NH};\quad I_{NH}\in[I_{LH},\,I_{UH}] \tag{7}$$
Note that the proposed statistic H_{N} ∈ [H_{L}, H_{U}] is a generalization of the existing test under classical statistics. The first part, H_{L}, is the determinate part; H_{U}I_{NH} denotes the indeterminate part; and I_{NH} ∈ [I_{LH}, I_{UH}] is the measure of indeterminacy or uncertainty. The proposed test reduces to the existing test when I_{NH} = I_{LH} = 0.
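The ranking and summation steps can be sketched in Python. This is a minimal sketch, not the authors' implementation. It assumes that the lower and the upper observations are ranked separately, which is consistent with the two sets of rank sums reported in the application (each set totals 1830). It uses the tie-corrected form H = (n - 1)(S_r - C)/(S_t - C), where S_r = Σ R_i²/n_i, S_t is the sum of all squared ranks, and C = n(n + 1)²/4. The function names and the data are hypothetical.

```python
def average_ranks(values):
    """Ascending ranks 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Grow the block of tied values that starts at position `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """Tie-corrected H = (n - 1)(S_r - C) / (S_t - C) for one set of samples."""
    pooled = [x for sample in samples for x in sample]
    n = len(pooled)
    ranks = average_ranks(pooled)
    s_r, start = 0.0, 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])  # R_i
        s_r += rank_sum ** 2 / len(sample)                 # adds R_i^2 / n_i
        start += len(sample)
    s_t = sum(r ** 2 for r in ranks)                       # sum of squared ranks
    c = n * (n + 1) ** 2 / 4                               # correction term C
    if s_t == c:
        raise ValueError("all observations are tied; H is undefined")
    return (n - 1) * (s_r - c) / (s_t - c)


def neutrosophic_h(lower_samples, upper_samples):
    """Rank the lower and the upper observations separately: returns (H_L, H_U)."""
    return kruskal_wallis_h(lower_samples), kruskal_wallis_h(upper_samples)


# Hypothetical interval data for three groups (lower and upper values).
lower = [[12, 15, 11], [20, 22, 19], [5, 7, 6]]
upper = [[13, 15, 12], [21, 22, 20], [6, 7, 6]]
h_l, h_u = neutrosophic_h(lower, upper)
print(round(h_l, 4), round(h_u, 4))  # 7.2 7.2605
```

With no ties, the sum of squared ranks equals n(n + 1)(2n + 1)/6 and the function returns the same value as 12/(n(n + 1)) Σ R_i²/n_i - 3(n + 1). In the hypothetical example, both parts have identical rank sums (15, 24, 6); the single tie in the upper data (two values of 6) is what makes H_U slightly larger than H_L.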
The neutrosophic Kruskal-Wallis test H_{N} is used to test the null hypothesis that all k_{N} populations have identical distributions. The null hypothesis under uncertainty is rejected for large values of the test statistic. When there are only three samples and each has five or fewer neutrosophic observations, the significance of the test statistic is determined using the Kruskal and Wallis table [19], which gives critical values for all combinations of sample sizes up to (5, 5, 5). When at least one neutrosophic sample contains more than five observations and the null hypothesis is true, the neutrosophic test statistic H_{N} approximately follows a chi-square distribution with (k - 1) degrees of freedom.
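For the large-sample case, the p-value can be computed without external libraries when k - 1 is even, because the chi-square upper tail then has a closed form. The sketch below is hedged in two ways: the function names are hypothetical, and the rule for an interval [H_L, H_U] that straddles the critical value is our assumption, not part of the article. For odd degrees of freedom, a library routine such as scipy.stats.chi2.sf would be needed instead.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square with df degrees of freedom > x).

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!,
    which is exact only for even df (df = 2 when k = 3 groups are compared).
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def neutrosophic_decision(h_l: float, h_u: float, k: int, alpha: float = 0.01):
    """Chi-square decision for H_N in [H_L, H_U] with k - 1 degrees of freedom."""
    p_values = (chi2_sf_even_df(h_l, k - 1), chi2_sf_even_df(h_u, k - 1))
    if max(p_values) < alpha:   # both ends fall in the critical region
        return p_values, "reject H0"
    if min(p_values) >= alpha:  # neither end falls in the critical region
        return p_values, "do not reject H0"
    return p_values, "decision depends on the indeterminate part"


# Critical value for k = 3 groups at the 1% level: -2 ln(0.01) when df = 2.
print(round(-2 * math.log(0.01), 2))  # 9.21
```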
Application of the proposed modified Kruskal Wallis H test
To apply the proposed neutrosophic Kruskal-Wallis test, data on the daily ICU occupancy of coronavirus-positive patients recorded in Pakistan are considered. The research question is whether daily ICU occupancy of COVID-19 patients differs significantly by age group. Neutrosophy (uncertainty) is introduced into the data for a better illustration. The neutrosophic Kruskal-Wallis test is applied to test the null hypothesis that there is no difference in the daily ICU occupancy of COVID-19 patients among age groups in Pakistan during December 2020. Daily ICU occupancy of COVID-19 patients aged 55 and above is shown in Fig. 1, of patients aged 35–55 in Fig. 2, and of patients aged 35 and below in Fig. 3.
For the neutrosophic data given in Table 1, the neutrosophic null hypothesis is that the average daily ICU occupancy of COVID-19 patients is equal for all three age groups, and the alternative hypothesis is that it differs for at least two of the three age groups. Table 1 contains data on the daily ICU occupancy of coronavirus-positive patients in three age groups. Combining the data, arranging them in ascending order, and assigning ranks shows that ties exist in the neutrosophic data set for both the determinate and the indeterminate parts. Therefore, the tie-corrected neutrosophic statistic given in (4) applies to this data set. Here R_{1N} = [757.5, 766.5], R_{2N} = [775.5, 767], and R_{3N} = [297, 296.5], where R_{1N}, R_{2N}, and R_{3N} are the rank sums of age groups 1, 2, and 3, respectively; in each pair, the first value comes from ranking the determinate (lower) observations and the second from ranking the upper observations.
Substituting the rank sums into (1), the squared ranks into (2), and both results into (4) gives
$$H_N\in[24.12,\ 24.17]$$
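As a consistency check on the reported figures, the rank sums above can be substituted into the no-ties formula H = 12/(n(n + 1)) Σ R_i²/n_i - 3(n + 1). Both sets of rank sums total 1830 = 60 × 61/2, so n_N = 60. The split into three groups of 20 days is our assumption, because the group sizes are not stated in the text. Under that assumption, the sketch reproduces H_L = 24.12 and H_U = 24.17 to two decimals. The tie correction in (4) would need the individual ranks from Table 1, which this sketch does not have.

```python
# Rank sums reported for age groups 1, 2 and 3. First list: determinate (lower)
# observations; second list: upper observations.
R_LOWER = [757.5, 775.5, 297.0]
R_UPPER = [766.5, 767.0, 296.5]

# ASSUMPTION: 20 days per age group. The text does not state the group sizes;
# only the total n = 60 follows from the rank sums, since 60 * 61 / 2 = 1830.
GROUP_SIZES = [20, 20, 20]


def total_n(rank_sums):
    """Solve n(n + 1) / 2 = sum of all ranks for n."""
    s = sum(rank_sums)
    return round((-1 + (1 + 8 * s) ** 0.5) / 2)


def h_no_ties(rank_sums, sizes):
    """No-ties H = 12 / (n(n + 1)) * sum(R_i^2 / n_i) - 3(n + 1)."""
    n = sum(sizes)
    s_r = sum(r * r / m for r, m in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * s_r - 3 * (n + 1)


print(total_n(R_LOWER), total_n(R_UPPER))         # 60 60
print(round(h_no_ties(R_LOWER, GROUP_SIZES), 2),
      round(h_no_ties(R_UPPER, GROUP_SIZES), 2))  # 24.12 24.17
```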
Assuming a 1% level of significance, the critical region is H_{N} > χ^{2}_{0.01,2} = 9.21. Since the calculated value of the test statistic based on the neutrosophic observations lies in the critical region (p-value < α), we reject the neutrosophic null hypothesis and conclude that daily ICU occupancy differs among the age groups of COVID-19 patients.
Advantages of the proposed test
In this section, the proposed test H_{N} ∈ [H_{L}, H_{U}] is compared with the existing test under classical statistics in terms of the measure of uncertainty. As mentioned earlier, the neutrosophic statistic H_{N} = H_{L} + H_{U}I_{NH}; I_{NH} ∈ [I_{LH}, I_{UH}] consists of a determinate part (the existing test) and an indeterminate part. For the real data, the neutrosophic form of H_{N} ∈ [H_{L}, H_{U}] is H_{N} = 24.12 + 24.17I_{NH}; I_{NH} ∈ [0, 0.002], where the first value, 24.12, is the result of the existing test (I_{NH} = I_{LH} = 0) and 24.17I_{NH} is the indeterminate part. The measure of indeterminacy associated with H_{N} is therefore 0.002. The proposed test gives the test statistic as the range 24.12 to 24.17, whereas the existing test provides only a single determinate value. In addition, the proposed test H_{N} ∈ [H_{L}, H_{U}] gives information about the measure of uncertainty. On this basis, the proposed test can be interpreted as follows: at a significance level of α = 0.05, the probability of rejecting the null hypothesis when it is true is 0.05, and the probability of not rejecting it when it is true is 0.95, with a measure of indeterminacy of 0.002. These comparisons show that the proposed test H_{N} ∈ [H_{L}, H_{U}] gives more information than the existing test, and that it is flexible, adequate, and effective for use under uncertainty.
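The measure of indeterminacy quoted above follows from the two endpoints of H_{N}. Under the convention H_{N} = H_{L} + H_{U}I_{NH} with I_{LH} = 0, the upper limit I_{UH} is the relative width of the interval, chosen so that I_{NH} = I_{UH} recovers H_{U}:
$$I_{UH}=\frac{H_U-H_L}{H_U}=\frac{24.17-24.12}{24.17}\approx 0.002,\qquad H_L+H_U I_{UH}=H_L+(H_U-H_L)=H_U=24.17$$
Because both endpoints exceed the critical value χ^{2}_{0.01,2} = 9.21, the decision to reject the null hypothesis does not depend on where I_{NH} falls within [0, 0.002].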
Conclusion and discussion
This article proposed a modified form of the rank-based nonparametric Kruskal-Wallis H test for comparing k samples whose observations carry a measure of uncertainty or falseness. Table 1 shows that the uncertain data used for illustration reduce to the determinate part under classical statistics if no uncertainty is recorded. For example, the first observation of the first group is 443, which is the determinate part of the indeterminate interval; the second value, 450, represents the indeterminate part of the interval. The modified Kruskal-Wallis test produces an indeterminacy interval rather than a single determinate value, which implies that the proposed test provides a useful measure of uncertainty. Recent studies also show that methods for interval-valued data are more suitable in an indeterminate environment than classical statistical techniques [25, 26]. This work was originally motivated by extensive research on fuzzy logic and neutrosophic statistics for interval-valued data sets. The proposed nonparametric test can be readily applied to compare k samples, testing the hypothesis that they come from identical populations.
Applying the new neutrosophic Kruskal-Wallis test to the COVID-19 data set showed that the proposed test provides more relevant and adequate results. Daily ICU occupancy by COVID-19 patients was recorded for both determinate and indeterminate parts. The existing nonparametric Kruskal-Wallis H test under classical statistics would have given misleading results. The proposed test showed that, at the 1% level of significance, there is a statistically significant difference in average daily ICU occupancy by coronavirus-positive patients across age groups.
The modified Kruskal-Wallis test can be used to compare several samples or groups, and it is easy to apply and to understand. The neutrosophic Kruskal-Wallis test produces an interval-valued result, which is well suited to data measured from complex systems. The proposed test is recommended for various fields, including the biomedical sciences, engineering, and many other statistical areas. By contrast, applying classical nonparametric tests to data containing vagueness can produce misleading results. In conclusion, the proposed neutrosophic nonparametric test provides data analysts with an efficient tool for analyzing k samples in the presence of uncertainty and indeterminacy. Further properties of the modified Kruskal-Wallis test, and its evaluation under different measures, are left for future research.
Availability of data and materials
All data generated or analysed during this study are included in this published article.
References
Higgins JJ. An introduction to modern nonparametric statistics. Pacific Grove, CA: Brooks/Cole; 2004.
Krzywinski M, Altman N. Nonparametric tests. Nat Methods. 2014;11:467–8. https://doi.org/10.1038/nmeth.2937.
Chan Y. Biostatistics 102: quantitative data–parametric & nonparametric tests. Blood Press. 2003;140:79.
Buckley JJ. Fuzzy statistics: hypothesis testing. Soft Comput. 2005;9:512–8. https://doi.org/10.1007/s00500-004-0368-5.
Smarandache F. Neutrosophic Logic: A Generalization of the Intuitionistic Fuzzy Logic. Multispace & Multistructure Neutrosophic Transdisciplinarity (100 Collected Papers of Science). 2010;4:396.
Smarandache F, Khalid HE, Essa AK. Neutrosophic Logic: the Revolutionary Logic in Science and Philosophy. Infinite Study; 2018.
Nabeeh NA, Abdel-Basset M, El-Ghareeb HA, Aboelfetouh A. Neutrosophic multi-criteria decision making approach for IoT-based enterprises. IEEE Access. 2019;7:59559–74.
Abdel-Basset M, Nabeeh NA, El-Ghareeb HA, Aboelfetouh A. Utilising neutrosophic theory to solve transition difficulties of IoT-based enterprises. Enterprise Inform Syst. 2020;14:1304–24.
Abdel-Baset M, Chang V, Gamal A. Evaluation of the green supply chain management practices: a novel neutrosophic approach. Comput Ind. 2019;108:210–20.
Abdel-Basset M, Atef A, Smarandache F. A hybrid Neutrosophic multiple criteria group decision making approach for project selection. Cogn Syst Res. 2019;57:216–27.
Broumi S, Bakali A, Talea M, Smarandache F. Bipolar neutrosophic minimum spanning tree. Infinite Study; 2018.
Kruskal WH, Wallis WA. Use of ranks in one-criterion variance analysis. J Am Stat Assoc. 1952;47:583. https://doi.org/10.2307/2280779.
McKight PE, Najab J. Kruskal-Wallis test. In: The Corsini Encyclopedia of Psychology. 2010. p. 1–1.
Hecke TV. Power study of ANOVA versus Kruskal-Wallis test. J Stat Manage Syst. 2012;15:241–7.
MacFarland TW, Yates JM. In: Introduction to nonparametric statistics for the biological sciences using R. Springer; 2016. p. 177–211.
Soltani N, Safajou F, Amouzeshi Z, Zameni E. The relationship between body image and mental health of students in Birjand in 2016 academic year: a short report. J Rafsanjan Univ Med Sci. 2017;16:479–86.
Lou Y, Yuen SY, Chen G. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion. p. 1337–41.
Muremi L, Bokoro P. In: 2018 IEEE International Conference on Environment and Electrical Engineering and 2018 IEEE Industrial and Commercial Power Systems Europe (EEEIC/I&CPS Europe). IEEE. p. 1–4.
Aslam M. A new goodness of fit test in the presence of uncertain parameters. Complex Intell Syst. 2021;7:359–65. https://doi.org/10.1007/s40747-020-00214-8.
Aslam M. Introducing Kolmogorov–Smirnov tests under uncertainty: an application to radioactive data. ACS Omega. 2019;5(1):914–7.
Aslam M. Design of the Bartlett and Hartley tests for homogeneity of variances under indeterminacy environment. J Taibah Univ Sci. 2020;14(1):6–10.
Chen J, Ye J, Du S. Scale effect and anisotropy analyzed for neutrosophic numbers of rock joint roughness coefficient based on neutrosophic statistics. Symmetry. 2017;9:208.
Aslam M. Neutrosophic analysis of variance: application to university students. Complex Intell Syst. 2019;5(4):403–7.
Smarandache F. Introduction to neutrosophic statistics. Infinite Study. Columbus, OH, USA: Romania-Educational Publisher; 2014.
Meyer JP, Seaman MA. A comparison of the exact Kruskal-Wallis distribution to asymptotic approximations for all sample sizes up to 105. J Exp Educ. 2013;81:139–56.
Chen J, Ye J, Du S, Yong R. Expressions of rock joint roughness coefficient using neutrosophic interval statistical numbers. Symmetry. 2017;9:123.
Acknowledgments
We are deeply thankful to the editor and reviewers for their valuable suggestions to improve the quality of the paper.
Funding
None.
Author information
Contributions
RAKS, HS, WBA, MF, MA wrote the paper. All authors read and approved the final manuscript.
Authors' information
N/A
Ethics declarations
Ethics approval and consent to participate
N/A
Consent for publication
N/A
Competing interests
None.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Sherwani, R.A.K., Shakeel, H., Awan, W.B. et al. Analysis of COVID-19 data using neutrosophic Kruskal Wallis H test. BMC Med Res Methodol 21, 215 (2021). https://doi.org/10.1186/s12874-021-01410-x
DOI: https://doi.org/10.1186/s12874-021-01410-x