A logical analysis of null hypothesis significance testing using popular terminology
BMC Medical Research Methodology volume 22, Article number: 244 (2022)
Abstract
Background
Null Hypothesis Significance Testing (NHST) has been well criticised over the years yet remains a pillar of statistical inference. Although NHST is well described in terms of statistical models, most textbooks for non-statisticians present the null and alternative hypotheses (H_{0} and H_{A}, respectively) in terms of differences between groups such as (μ_{1} = μ_{2}) and (μ_{1} ≠ μ_{2}), and H_{A} is often stated to be the research hypothesis. Here we use propositional calculus to analyse the internal logic of NHST when couched in this popular terminology. The testable H_{0} is determined by analysing the scope and limits of the P-value and the test statistic’s probability distribution curve.
Results
We propose a minimum axiom set NHST in which it is taken as axiomatic that H_{0} is rejected if P-value < α. Using the common scenario of the comparison of the means of two sample groups as an example, the testable H_{0} is {(μ_{1} = μ_{2}) and [(x̄_{1} ≠ x̄_{2}) due to chance alone]}. The H_{0} and H_{A} pair should be exhaustive to avoid false dichotomies. This entails that H_{A} is ¬{(μ_{1} = μ_{2}) and [(x̄_{1} ≠ x̄_{2}) due to chance alone]}, rather than the research hypothesis (H_{T}). To see the relationship between H_{A} and H_{T}, H_{A} can be rewritten as the disjunction H_{A}: ({(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]} ∨ {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to (μ_{1} ≠ μ_{2}) alone]} ∨ {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to (μ_{1} ≠ μ_{2}) alone]}). This reveals that H_{T} (the last disjunct) is just one possibility within H_{A}. It is only by adding premises to NHST that H_{T} or other conclusions can be reached.
Conclusions
Using this popular terminology for NHST, analysis shows that the definitions of H_{0} and H_{A} differ from those found in textbooks. In this framework, achieving a statistically significant result only justifies the broad conclusion that the results are not due to chance alone, not that the research hypothesis is true. More transparency is needed concerning the premises added to NHST to rig particular conclusions such as H_{T}. There are also ramifications for the interpretation of type I and II errors, as well as power, which do not specifically refer to H_{T}, contrary to what many texts claim.
Background
Null Hypothesis Significance Testing (NHST [Footnote 1]) and the Confidence Interval (CI) or estimation method are the pillars of statistical inference [1,2,3,4,5]. NHST is perhaps the more common of the two for the analysis of research questions [6]. In NHST a null hypothesis (H_{0}) is rejected in favour of an alternative hypothesis (H_{A}) only if the P-value, P(observed data or more extreme │ H_{0}), falls below a pre-specified α-level. The latter is the maximum probability we are prepared to tolerate of erroneously rejecting H_{0}. If the P-value is less than α, the result is called statistically significant and H_{0} can be rejected. Some familiarity with NHST will be assumed in this paper. NHST is a combination of two different statistical theories: R. A. Fisher’s P-value significance test, and the Neyman-Pearson technique of hypothesis testing. The originators never intended the theories to be united, and well-known antagonisms existed between the two camps [7]. However, NHST gained traction, perhaps owing to its appeal as a mechanical decision tool. Parallel to its popularity is the detailed, sharp criticism it has received from several quarters. Problems raised include: the misinterpretation of the P-value as P(H_{0} │ observed data) rather than P(observed data or more extreme │ H_{0}); the artificial dichotomous nature of statistical significance; and the conflation of statistical significance with clinical importance [8]. In fact, P-values have even been temporarily banned from some journals [9]. More recently, the correct level of statistical significance (P-value or α cut-off) has again been debated [10]. However, rather than cover old ground, we will here present a new logical analysis of a popular version of NHST presented in textbooks. NHST is perhaps best explained in terms of statistical models [11].
However, in most popular textbooks for non-statisticians, NHST is frequently presented in terms of the difference between population or sample groups and framed in reference to the research hypothesis. The need for an in-depth focus on the logic of NHST when couched in these terms can be seen from the following summary.
Starting with H_{0}, there are various definitions offered. H_{0} is the hypothesis of no difference or association between groups [1, 5, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. Using population means (μ) as an example, this is H_{0}: μ_{1} = μ_{2}, meaning there is no difference in the population [2, 28,29,30,31,32,33]. In addition, there is the idea that H_{0} is the opposite/reverse/complement/negation of the test/experimental/study/research hypothesis [1, 3, 6, 25, 27, 28]. In clinical studies, this segues to the stronger claim that the absence of a difference is due to a lack of treatment effect [3, 5, 6, 13, 20, 21, 28, 31, 34,35,36]. In contrast to the idea of “no difference” is the anticipation that chance or random variation will produce a difference between the sample means [37]. Some texts unite the two ideas about the presence and absence of difference into one H_{0} which states there is no difference in the population and the difference in the sample groups is due to chance [2, 38,39,40,41]. Although a symbol exists for the mean of the sample group (x̄), there was no example of this more complex version of H_{0} translated into symbols in any text sampled. In fact, some texts mention this more complex H_{0} only to quickly drop the idea and revert to H_{0}: μ_{1} = μ_{2} anyway [27, 42].
Moving on to the definition of H_{A}, we find similar themes phrased in a contrary fashion. H_{A} is the hypothesis that there is a difference or association between the groups [12, 13, 22, 23, 32]. Some specify that the groups are the populations such that H_{A}: μ_{1} ≠ μ_{2} [2, 4, 24]. This type of difference is described as statistically significant [26] or real [2, 17, 18, 42, 43]. H_{A} is elsewhere proposed to be: the experimental/ research/study hypothesis [3, 5, 6, 28, 36, 43]; or the hypothesis that there is a treatment effect [1, 6, 20, 33, 34, 39]; or the contradictory or complementary hypothesis to H_{0} [14, 34, 35, 42]. There are attempts to unite claims about the population and sample groups, namely that the difference in the sample groups is due to the difference in the population [42]. Again, in the texts sampled, the latter hypothesis was never translated into symbols or further pursued.
Another area of disagreement, apart from the content of H_{A}, is the strength of the conclusion when rejecting H_{0}. Some claim we accept H_{A} as true [1, 5, 16, 20, 23] or real [18]. There are also softer versions that state H_{A} is just “supported” or is “probably true” [6, 19]. Alternatively, conclusions can be framed in terms of the test hypothesis being true [2, 15, 16, 20, 27, 29, 33,34,35, 43, 44], or more tentatively, that we gain confidence or support for the test hypothesis [6, 25, 28, 31, 41, 42]. More bewildering still are claims suggesting there are multiple other hypotheses or explanations! [1, 12, 16, 21, 34, 35, 40]
The interpretation of the phrase “statistically significant” [2, 5, 21, 34, 39, 40, 42], often abbreviated to just “significant” [21, 25, 27, 28, 30, 33,34,35], ranges from the claim that the data are not due to chance [24, 45] to the weaker claim that the data are unlikely to be due to chance [2, 18, 40].
In NHST, H_{0} and H_{A} are presented as a hypothesis pair. A commonly presented pair is H_{0}: μ_{1} = μ_{2} and H_{A}: μ_{1} ≠ μ_{2}. This hypothesis pair is mutually exclusive and exhaustive which some texts explicitly state are desirable characteristics [1, 19, 46]. Elsewhere, however, H_{0} and H_{A} are frequently presented as a nonexhaustive, false dichotomy between the test hypothesis and the hypothesis that the results are due to chance [3, 6, 16, 18, 19, 24, 25, 27, 34, 38, 40, 41, 44].
From the above we see that this family of interpretations of NHST provides no consensus on many aspects. This poses a challenge to interpreting NHST when expressed in this fashion. From within the framework of this popular terminology, the purpose of the present paper is to

1/ define H_{0}, H_{A}, power and type I and II errors,

2/ define the minimum axiom set for NHST and

3/ make transparent which assumptions are needed to conclude the research hypothesis is true.
Methods
Here we assume the common terminology of expressing NHST in terms of differences between populations or sample groups and in reference to the research hypothesis. The scope and limits of the P-value, the test statistic and its probability distribution curve (PDC) will be used to arbitrate on the correct form of H_{0} and H_{A} within this framework. Propositional calculus will be employed to analyse NHST. We also acknowledge multifactorial hypotheses. For example, we can hypothesise that the difference between two sample groups is due to bias, chance or an intervention. These hypotheses are independent, which entails that they can act in combination to produce the results. To disambiguate between single and multifactorial hypotheses, the term “alone” will be used to refer to the former. For example, “(x̄_{1} ≠ x̄_{2}) due to chance alone” means chance is the only factor involved in the sample group difference, as opposed to chance acting in concert with other factors to produce the results.
Results
For consistent vocabulary throughout this paper, we will use as our example the common scenario of comparing the means of two sample groups. The appropriate test statistic for this is the t-statistic, which has its relevant PDC. We will commence by stating the minimum axiom set needed for an NHST to function. To this end, we accept as axiomatic that if P(observed data or more extreme │ H_{0}) < α, then reject H_{0} and accept H_{A}.
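The axiom can be made concrete with a small simulation. The sketch below is ours, not the paper's: the data, α and seed are invented, and a permutation test stands in for the t-test (it estimates the same quantity, P(observed difference or more extreme │ chance alone), without needing distribution tables).

```python
import random

# Hypothetical sample groups (invented data, for illustration only).
group1 = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0]
group2 = [5.9, 5.5, 6.1, 5.7, 6.0, 5.6, 5.8, 5.4]

def mean(xs):
    return sum(xs) / len(xs)

observed = abs(mean(group1) - mean(group2))

# Under the testable H0 the group labels are arbitrary: any difference in
# sample means is due to chance alone. Reshuffling the pooled data
# estimates P(difference as or more extreme | H0).
random.seed(1)
pooled = group1 + group2
n1 = len(group1)
n_perm = 10_000
n_extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
    if diff >= observed:
        n_extreme += 1
p_value = n_extreme / n_perm

alpha = 0.05  # pre-specified ceiling on the type I error (our choice)
reject_h0 = p_value < alpha  # the single axiom: reject H0 iff P-value < alpha
print(f"P-value ≈ {p_value:.4f}, reject H0: {reject_h0}")
```

Note that rejection here licenses only the conclusion that the sample difference is not due to chance alone, in line with the analysis that follows.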
The testable H_{0}
In the introduction we saw that H_{0} had various definitions, including H_{0}: μ_{1} = μ_{2} or the “opposite” of the research hypothesis. Understandably, these are H_{0}’s that we would like to test, but that does not guarantee that these candidates are testable. Here we propose a new approach: the decision concerning which is the correct H_{0} should be determined by the scope and limits of the actual technique that will be used to reject H_{0}. In our example, the decision to reject H_{0} is based on the P-value of the t-statistic read off from its PDC. The PDC yields the probability of finding the observed t-statistic value (or more extreme) due to chance alone when there is no difference in the population means. In symbols (something which never appeared in the texts mentioned in the introduction), the PDC gives us

P(observed t-statistic or more extreme │ {(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]}).
Given that the definition of the P-value is

P-value = P(observed data or more extreme │ H_{0}),
we can now see that the H_{0} which the P-value and PDC can actually test must be

H_{0}: {(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]}.
In other words, it is the hypothesis that the finding in the sample groups is due to chance or random variation alone and does not reflect a difference in the underlying population.
Rejecting (μ_{1} = μ_{2})
Textbooks often claim that we can use NHST to reject (μ_{1} = μ_{2}). However, this is not logically possible with the minimum axiom set NHST. To demonstrate this, we need to transform (μ_{1} = μ_{2}) into a logically equivalent proposition and use propositional calculus. The proposition (μ_{1} = μ_{2}) concerns the equality of the population means but states nothing about the sample group means (x̄). Using a truth table (Table 1), we can rewrite (μ_{1} = μ_{2}) in a logically equivalent way such that the sample group means do appear in the proposition but without any claim being made about them. [Footnote 2] Note that P(x̄_{1} = x̄_{2}) = 0, so any proposition containing (x̄_{1} = x̄_{2}) can be eliminated from the analysis.
From Table 1, (μ_{1} = μ_{2}) ≡

{(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]} ∨ {(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]}. (1)
Logical equivalence is established because whenever (μ_{1} = μ_{2}) is true, (1) is true too, and whenever (μ_{1} = μ_{2}) is false, (1) is also false. This transformation now allows us to see why eliminating the testable H_{0} does not logically imply the elimination of (μ_{1} = μ_{2}). Let the first disjunct of (1) be called C, and the second disjunct E. Thus, (1) becomes the disjunction C ∨ E. We recognise C as the testable H_{0}. The PDC can assess C, and so it may be possible to reject C depending on the P-value. However, the PDC cannot assess E. So even if we do reject C, we cannot reject E, and therefore we cannot reject the whole proposition C ∨ E. Since (1) is logically equivalent to (μ_{1} = μ_{2}), we see that we cannot reject (μ_{1} = μ_{2}) using the minimum axiom set NHST. In other words, (μ_{1} = μ_{2}) is not rejected when we reject the testable H_{0}: {(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]}. To reject (μ_{1} = μ_{2}), a further premise needs to be added, namely ¬{(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]}.
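Both claims can be checked mechanically. The brief stand-alone sketch below (ours, not part of the paper's formal apparatus) encodes G as "μ_{1} = μ_{2}" and J as "the sample difference is due to chance alone", so that C = G ∧ J and E = G ∧ ¬J:

```python
from itertools import product

# G: "mu1 = mu2"; J: "(x1bar != x2bar) due to chance alone".
# C = G and J is the testable H0; E = G and not J is the disjunct
# that the PDC cannot assess.
for g, j in product([True, False], repeat=2):
    c = g and j
    e = g and not j
    # Row by row, (mu1 = mu2) has the same truth value as C v E.
    assert (c or e) == g

# Rejecting C alone does not reject C v E: with C false and E true,
# the disjunction, and hence (mu1 = mu2), remains true.
c, e = False, True
assert (not c) and (c or e)
print("equivalence holds; ¬C does not entail ¬(C ∨ E)")
```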
The real H_{A}
We take it as axiomatic that H_{0} and H_{A} are mutually exclusive: the hypotheses should not overlap in the sample space. An issue identified in the introduction was whether the hypothesis pair should also be exhaustive. There are serious consequences when the pair are made into a false dichotomy. An obvious criticism is that other possibilities are simply ignored. Furthermore, it opens a Pandora’s box of candidates for H_{A}. Frequently the research or test hypothesis (here H_{T}) is proposed as H_{A}. This is the proposition that there is a difference in the population due to the study intervention or treatment and the finding in the sample groups is due to this difference alone. In symbols,

H_{T}: {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to (μ_{1} ≠ μ_{2}) alone]}.
However, if false dichotomies are allowed, what is to prevent other hypotheses being proposed as H_{A}? Such as the hypothesis that bias or confounding produced the results, or some other hypothesis, or even combinations of hypotheses given that they are all independent propositions. In a false dichotomy the selection of H_{A} is subject to prejudice.
The above problems are avoided by forming an exhaustive hypothesis pair. To avoid logical errors of negation, it is critical to note that H_{A} must be the negation of the entire proposition represented by H_{0}, not just a negation of part of H_{0}. So H_{A} must be ¬H_{0}, and the real H_{A} is ¬{(μ_{1} = μ_{2}) and [(x̄_{1} ≠ x̄_{2}) due to chance alone]}. Therefore, the only justifiable exhaustive hypothesis pair is

H_{0}: {(μ_{1} = μ_{2}) and [(x̄_{1} ≠ x̄_{2}) due to chance alone]},

H_{A}: ¬{(μ_{1} = μ_{2}) and [(x̄_{1} ≠ x̄_{2}) due to chance alone]}.
The relationship between H_{A} and H_{T}
H_{A} is a more complex proposition than H_{T}. Once again, we can transform H_{A} into a logically equivalent proposition which has H_{T} as a component. Let H_{A} be represented by ¬(G ∧ J), where G is “μ_{1} = μ_{2}”, and J is “(x̄_{1} ≠ x̄_{2}) due to chance alone.” The truth table for ¬(G ∧ J) is shown in Table 2.
Table 2 shows that ¬(G ∧ J) is true (bold T in last column) when G and ¬J are true (the second row), or ¬G and J are true (the third row), or ¬G and ¬J are true (the last row). This allows us to formulate a disjunction logically equivalent to ¬(G ∧ J). Thus ¬(G ∧ J) ≡ (G ∧ ¬J) ∨ (¬G ∧ J) ∨ (¬G ∧ ¬J). Now ¬J ≡ {(x̄_{1} = x̄_{2}) ∨ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]}. However, as stated previously, we can eliminate (x̄_{1} = x̄_{2}), making ¬J ≡ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]. Substituting back, H_{A} ≡

{(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]} ∨ {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]} ∨ {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]}.
Furthermore, the second disjunct is a contradiction and can be eliminated, giving

{(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]} ∨ {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]}. (2)
Where does H_{T} lie in (2)? H_{T} is contained within the last disjunct of (2), {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]}. The latter disjunct expresses the proposition that there is a difference in the population and also that the sample group difference is not due to chance alone, but instead is due to some other alternative. The other alternatives include the test intervention or bias or some other unknown, or even a combination of these, given that the alternatives are independent hypotheses. Taking this into account, we can rewrite (2) such that H_{A} ≡

{(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to chance alone]} ∨ {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) not due to (μ_{1} ≠ μ_{2}) alone]} ∨ {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to (μ_{1} ≠ μ_{2}) alone]}. (3)
The last disjunct of (3) is H_{T}, indicating that H_{T} is just one subhypothesis of H_{A}.
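The expansion of ¬(G ∧ J) used above can be verified by brute force over all truth assignments. A minimal stand-alone check (ours, not the paper's):

```python
from itertools import product

# Brute-force check of the expansion used for the real H_A:
# not(G and J)  ==  (G and not J) or (not G and J) or (not G and not J)
# G: "mu1 = mu2"; J: "(x1bar != x2bar) due to chance alone".
for g, j in product([True, False], repeat=2):
    lhs = not (g and j)
    rhs = (g and not j) or (not g and j) or (not g and not j)
    assert lhs == rhs, (g, j)
print("expansion of ¬(G ∧ J) verified over all truth assignments")
```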
Finally, the answer to the question “What do we accept when we reject H_{0}?” is: we accept the real H_{A} or its logical equivalent (3). Therefore, a statistically significant finding, expressed in these common terms, should be interpreted as meaning that the data are not due to chance alone. Statistical significance is not a licence to accept H_{T}.
The effect of further premises on the minimum axiom set NHST
It is only by adding premises to NHST that we can conclude anything other than the real H_{A}. The danger with this strategy is that of partially assuming what is being proved. Table 3 presents examples of premises that if added to NHST would rig different conclusions.
Some texts claim that all that is needed to conclude H_{T} when H_{0} is rejected is the assumption that there is no bias [35, 47]. However, Table 3 illustrates exactly which premises are needed in order to conclude H_{T}. Apart from assuming no bias, it is also necessary to assume there are no combination hypotheses in which chance plays a role. A corollary is that if NHST could lead us to conclude H_{T} of its own accord, no further premises would be required. What would the conclusion be if indeed we only assumed that there was no bias? The middle column of Table 3 shows the conclusion. In a model which stipulates that the possible causes of the sample group difference are chance, bias or the intervention (or combinations thereof), the conclusion would be

{(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to (μ_{1} ≠ μ_{2}) alone]} ∨ {(μ_{1} ≠ μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to (μ_{1} ≠ μ_{2}) and chance]}.
The first disjunct is H_{T}, showing that the conclusion is more complex than H_{T} alone. The last column demonstrates that a different package of additional premises can be tailored to reach a different conclusion, such as the hypothesis that bias produced the results, here represented as H_{B}: {(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to bias alone]}. Similar to arithmetic, the process in Table 3 is commutative: the same results are achieved whether we make the assumptions first and then do the NHST, or vice versa ― the order does not matter.
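The winnowing effect of the added premises can be illustrated by enumerating a toy model of candidate causes. The three-cause model below follows the text, but the set encoding and names are ours:

```python
from itertools import combinations

# Toy model: possible single causes of the sample group difference,
# which may also act in combination (independent hypotheses).
causes = ["chance", "bias", "intervention"]

# Every non-empty combination is a candidate explanation: 7 in total.
candidates = [set(c) for r in range(1, 4) for c in combinations(causes, r)]

# Rejecting the testable H0 rules out exactly one candidate: chance alone.
after_nhst = [c for c in candidates if c != {"chance"}]

# Adding the premise "no bias" removes every candidate involving bias,
# leaving the intervention alone, or the intervention acting with chance.
after_no_bias = [c for c in after_nhst if "bias" not in c]
print(after_no_bias)
```

The two surviving candidates match the two disjuncts of the middle-column conclusion: H_{T} plus the intervention-and-chance combination.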
Application to other statistical problems
So far we have focused on the comparison of sample group means. However, with appropriate changes in vocabulary we can define the real H_{0} and H_{A} for other scenarios ― mutatis mutandis, as they say. As illustrations, H_{0} and H_{A} in general form, for the comparison of sample group proportions, and for correlation are presented in Table 4.
Failure to reject H_{0}
What are we to conclude if we fail to reject H_{0}? The axiom of NHST states that we reject H_{0} if P-value < α. This does not logically imply that if P-value ≥ α we must accept H_{0} ― the axiom and the claim about accepting H_{0} are logically distinct ideas. So if P-value ≥ α, we should merely state that we have failed to reject H_{0}, rather than that we accept H_{0}.
Power (1 − β), type I (α) and type II (β) errors
Textbooks which express NHST in terms of the research hypothesis also tend to carry this over to descriptions of type I and II errors, as well as power calculations. However, this is fraught with error, as can be seen when we apply the real definitions of H_{0} and H_{A}. Type I error is the probability of eliminating H_{0}, and accepting H_{A}, when in fact H_{0} is true. Using the real definitions of H_{0} and H_{A} gives us type I error:

α = P(rejecting {(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]} │ {(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]} is true).
Importantly, type I error is not the probability of accepting H_{T} when H_{0} is true. Since H_{A} is a disjunction, there are multiple propositions that can make it true, with H_{T} being just one of these. So P(H_{A}) > P(H_{T}) and P(mistakenly accepting H_{T}) > P(mistakenly accepting H_{A}). The conflation of H_{T} with H_{A} results in underestimating the probability of mistakenly accepting H_{T}.
Similarly for type II error, which is the probability of not rejecting H_{0}, and not accepting H_{A}, when H_{0} is false and should have been rejected. Namely,

β = P(not rejecting {(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]} │ ¬{(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]} is true).
Type II error is not the probability of not accepting H_{T} when H_{0} is false. A low probability of not accepting H_{A} does not logically imply a low probability of not accepting H_{T}. P(not accepting H_{T}) > P(not accepting H_{A}) because more propositions need to be rejected in order to accept H_{T}. The conflation of H_{T} with H_{A} results in underestimating the probability of not accepting H_{T} when H_{0} is false.
Power (1 − β) refers to the probability of rejecting H_{0} and accepting H_{A} given that H_{0} is false. Specifically, power is

1 − β = P(rejecting {(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]} │ ¬{(μ_{1} = μ_{2}) ∧ [(x̄_{1} ≠ x̄_{2}) due to chance alone]} is true).
However, it does not refer to P(accepting H_{T}│H_{T}). The power to conclude H_{T} < the power to conclude H_{A}. The conflation of H_{T} with H_{A} results in overestimating the power to conclude H_{T} because H_{T} is just one part of H_{A}.
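That α caps the long-run rate of rejecting the testable H_{0} when it is true can be seen in simulation. The sketch below is ours, under assumed conditions (a z-test with known σ so the normal CDF suffices, invented sample sizes and seed):

```python
import math
import random

# When the testable H0 is true (mu1 = mu2, sample differences due to
# chance alone), the long-run rejection rate should approach alpha.
# Note the error refers to accepting the whole H_A, not H_T.
random.seed(42)
alpha = 0.05
n, sigma, n_sims = 30, 1.0, 4000
se = sigma * math.sqrt(2 / n)  # standard error of (x1bar - x2bar)

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

rejections = 0
for _ in range(n_sims):
    x1 = [random.gauss(0, sigma) for _ in range(n)]
    x2 = [random.gauss(0, sigma) for _ in range(n)]  # same mu: H0 true
    z = (sum(x1) / n - sum(x2) / n) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    if p_value < alpha:
        rejections += 1
type1_rate = rejections / n_sims
print(f"empirical type I error ≈ {type1_rate:.3f} (alpha = {alpha})")
```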
Discussion
NHST has been well described in terms of statistical models. However, it is also commonly presented in terms of group comparisons and with reference to the research hypothesis. Despite this being a popular interpretation, there is currently no standardised approach. The variation in definitions of H_{0} and H_{A}, how they should be paired and conclusions that can be drawn by eliminating H_{0} motivated this new logical analysis. Looking at the conditions of the Pvalue we can see that there can be only one testable H_{0}. Presenting H_{0} and H_{A} as a false dichotomy is common but unjustifiable. Combining these two ideas entails that H_{A} is ¬H_{0}. Texts should acknowledge this and also make transparent any premises added in order to reach a conclusion other than ¬H_{0} when H_{0} is rejected.
It may be thought that using the estimation or CI method can avoid the problems of expressing NHST in these terms. However, this is not true if the estimation method is used as a de facto NHST. The estimation method can be used as an NHST because the CI is mathematically related to the α-level and the P-value, such that if the CI does not cross zero (or 1 for ratios), we can claim statistical significance. In the context of using the CI as an NHST, the conclusions of the present paper are relevant. Consequently, when using the CI method, the correct interpretation of statistical significance would be to accept the real H_{A} and not claim that H_{T} is true. Of course, there are other appealing features of the CI method, and the present discussion is limited only to its use as a significance test.
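The CI-NHST duality can be sketched numerically. The summary statistics below are invented, and a z-based interval is assumed for simplicity:

```python
import math

# Invented summary data: estimated difference in means and its
# standard error. A two-sided z-based 95% CI for (mu1 - mu2) excludes
# zero exactly when the two-sided P-value falls below alpha = 0.05.
diff, se, alpha = 0.8, 0.3, 0.05

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z_crit = 1.959964  # Phi^{-1}(0.975), the two-sided 95% critical value
ci = (diff - z_crit * se, diff + z_crit * se)
p_value = 2 * (1 - normal_cdf(abs(diff / se)))

excludes_zero = ci[0] > 0 or ci[1] < 0
significant = p_value < alpha
print(ci, round(p_value, 5), excludes_zero, significant)
```

The two decisions always agree, which is why a CI used this way inherits the interpretive limits of NHST: exclusion of zero licenses only the real H_{A}.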
A limitation of the present paper is that we have not questioned the axiom of NHST that we reject H_{0} if the P-value < α. An analysis of this axiom deserves a paper in its own right which discusses inductive logic and defines the conditions under which the axiom is reliable. The issue in the present paper has been solely that if we are to use NHST as it is commonly presented, it should at least be with justifiable definitions of H_{0} and H_{A}, transparent assumptions and valid deductions from the given premises.
Conclusions
NHST is commonly expressed in terms of differences between groups and with reference to the research hypothesis. Within this framework, logical analysis reveals that the minimum axiom set NHST (for comparing sample means) is as follows:

H_{0}: {(μ_{1} = μ_{2}) and [(x̄_{1} ≠ x̄_{2}) due to chance alone]},

H_{A}: ¬{(μ_{1} = μ_{2}) and [(x̄_{1} ≠ x̄_{2}) due to chance alone]}.

If P-value ≥ α, then fail to reject H_{0}.

If P-value < α, reject H_{0} and conclude H_{A}.
At best, it can be concluded that if H_{0} is rejected, the data were not due to chance alone. Texts should also be transparent about which assumptions have been added to rig a conclusion such as H_{T}. Care should also be exerted to avoid misinterpreting type I and II errors, as well as power, in terms of the research hypothesis.
Availability of data and materials
All data generated or analysed during this study are included in this published article.
Notes
1. “NHST” is probably the most widely used abbreviation for the various names applied to hypothesis and significance tests (Nickerson RS. Null hypothesis significance testing: a review of an old and continuing controversy. Psychol Methods. 2000;5:241–301. https://doi.org/10.1037/1082-989x.5.2.241).
2. Truth tables analyse the truth of complex propositions based on assigning truth values of true (T) or false (F) to their elemental components. When propositions are subject to logical analysis here, we shall use the symbols of propositional calculus: “∧” for “and”; “∨” for “or”; and “¬” for “not”, used to express negation. “¬X” means “It is not the case that X.” “≡” means “is equivalent to”, such that “X ≡ Y” means “proposition X is equivalent to proposition Y.”
References
Daniel WW. Biostatistics : a foundation for analysis in the health sciences. 9th ed. Hoboken: Wiley; 2009.
Munro BH, Page EB. Statistical methods for health care research, vol. xi. 2nd ed. Philadelphia: Lippincott; 1993. p. 403.
Gallin JI, Ognibene FP, Johnson LL. Principles and practice of clinical research, vol. xvii. 4th ed. London: Academic Press; 2018. p. 80.
Mann PS, Lacke CJ. Introductory statistics, vol. xx. 7th ed. Hoboken: Wiley; 2010. p. 116.
Sullivan LM. Essentials of biostatistics in public health, vol. xii. 3rd ed. Burlington: Jones & Bartlett Learning; 2018. p. 376.
Field AP. Discovering statistics using IBM SPSS statistics : and sex and drugs and rock 'n' roll, vol. xxxvi. 4th ed. Los Angeles: Sage; 2013. p. 915.
Salsburg D. The lady tasting tea : how statistics revolutionized science in the twentieth century, vol. xi. New York: W.H. Freeman; 2001. p. 340.
Nickerson RS. Null hypothesis significance testing: a review of an old and continuing controversy. Psychol Methods. 2000;5:241–301. https://doi.org/10.1037/1082-989x.5.2.241.
Trafimow D, Marks M. Editorial. Basic Appl Soc Psychol. 2015;37:1–2. https://doi.org/10.1080/01973533.2015.1012991.
Ioannidis JPA. The Proposal to Lower P Value Thresholds to .005. JAMA. 2018;319:1429–30. https://doi.org/10.1001/jama.2018.1536.
Lehmann EL, Romano JP. Testing statistical hypotheses, vol. xiv. 3rd ed. New York: Springer; 2005. p. 784.
Stewart A. Basic statistics and epidemiology : a practical guide, vol. iv. 3rd ed. Oxford: Radcliffe Pub; 2010. p. 200.
Everitt B. Medical statistics from A to Z : a guide for clinicians and medical students, vol. vi. 2nd ed. Cambridge: Cambridge University Press; 2006. p. 249.
Gerstman BB. Basic biostatistics : statistics for public health practice, vol. xv. 2nd ed. Burlington: Jones & Bartlett Learning; 2015. p. 644.
Hickson M. Research handbook for health care professionals, vol. xiv. Chichester, U.K: Wiley-Blackwell; 2008. p. 184.
Katz MH. Study design and statistical analysis : a practical guide for clinicians. Cambridge: Cambridge University Press; 2006. p. 188.
Katz DL, Jekel JF. Jekel's epidemiology, biostatistics, preventive medicine, and public health, vol. xiii. 4th ed. Philadelphia, London: Saunders; 2014. p. 405.
O'Brien PMS, Broughton-Pipkin F. Introduction to research methodology for specialists and trainees. 3rd ed. Cambridge, New York: Cambridge University Press; 2017.
Townend J. Practical statistics for environmental and biological scientists, vol. x. Chichester; New York: Wiley; 2002. p. 276.
Bland M. An introduction to medical statistics, vol. xviii. 4th ed. Oxford: Oxford University Press; 2015. p. 427.
Wang D, Bakhai A. Clinical trials : a practical guide to design, analysis, and reporting, vol. xiii. London: Remedica; 2006. p. 480.
Guluma K, Wilson MP, Hayden S. Doing research in emergency and acute care : making order out of chaos. Chichester, West Sussex; Hoboken: Wiley; 2015.
Hulley SB. Designing clinical research. 4th ed. Philadelphia: Wolters Kluwer/Lippincott Williams & Wilkins; 2013.
Peat JK, Barton B. Medical statistics : a guide to SPSS, data analysis, and critical appraisal. 2nd ed. Chichester, West Sussex ; Hoboken: John Wiley & Sons Inc.; 2014.
Harris M, Taylor G. Medical statistics made easy 3, vol. xii. 3rd ed. Banbury: Scion; 2014. p. 116.
Hofmann AH. Scientific writing and communication. Papers, proposals, and presentations. 3rd ed. New York: Oxford University Press; 2017.
Campbell MJ, Walters SJ, Machin D. Medical statistics : a textbook for the health sciences, vol. xii. 4th ed. Chichester, Hoboken: Wiley; 2007. p. 331.
Hill T, Lewicki P. Statistics : methods and applications : a comprehensive reference for science, industry, and data mining, vol. xvi. Tulsa: StatSoft; 2006. p. 832.
Riegelman RK. Studying a study and testing a test : how to read the medical evidence, vol. vii. 5th ed. Philadelphia: Lippincott Williams & Wilkins; 2005. p. 403.
Rees DG. Essential statistics, vol. xiii. 2nd ed. London, New York: Chapman and Hall; 1989. p. 258.
Kuzma JW, Bohnenblust SE. Basic statistics for the health sciences, vol. xvii. 4th ed. Mountain View: Mayfield Pub. Co; 2001. p. 364.
Peat JK, Barton B, Elliott EJ. Statistics workbook for evidencebased healthcare, vol. viii. Malden: Blackwell; 2008. p. 182.
Altman DG. Practical statistics for medical research, vol. xii. Boca Raton: Chapman & Hall/CRC; 1999. p. 611.
Myles PGT. Statistical methods for anaesthesia and intensive care. Edinburgh: Butterworth-Heinemann; 2000.
Rosner B. Fundamentals of biostatistics, vol. xix. 8th ed. Boston: Cengage Learning; 2016. p. 927.
Petrie A, Sabin C. Medical statistics at a glance. 3rd ed. Chichester, Hoboken: Wiley-Blackwell; 2009. p. 180.
Campbell MJ, Swinscow TDV. Statistics at square one, vol. iv. 11th ed. Chichester, Hoboken: Wiley-Blackwell/BMJ Books; 2009. p. 188.
Argyrous G. Statistics for social and Health Research. Great Britain: Sage Publications; 2000.
McCaig C, Dahlberg L. Practical research and evaluation : a start-to-finish guide for practitioners, vol. viii. London: SAGE; 2010. p. 263.
Daly LE, Bourke GJ, Bourke GJ. Interpretation and uses of medical statistics, vol. xiii. 5th ed. Oxford: Blackwell Science; 2000. p. 568.
Kirkwood BR, Sterne JAC, Kirkwood BR. Essential medical statistics, vol. x. 2nd ed. Malden: Blackwell Science; 2003. p. 501.
Le CT, Eberly LE. Introductory biostatistics, vol. xvii. 2nd ed. Hoboken, New Jersey: Wiley; 2016. p. 591.
McKenzie S. Vital statistics: an introduction to health science statistics. Chatswood: Churchill Livingstone.
Glantz SA. Primer of biostatistics. 7th ed. New York: McGraw-Hill Medical Pub; 2002.
Gosall N, Gosall G. The doctor's guide to critical appraisal. 4th ed. UK: Pastest.
Glover T, Mitchell K. An introduction to biostatistics, vol. x. 3rd ed. Long Grove: McGraw-Hill; 2016. p. 487.
Hill AB. Principles of medical statistics. 12th ed. New York: Oxford University Press; 1989.
Acknowledgements
The anonymous reviewers are thanked for many useful comments.
List of abbreviations and symbols
α: alpha-level. The pre-specified acceptable ceiling on the type I error. The threshold which defines the critical region of the PDC, or the threshold below which the P-value has to fall in order to reject H_{0}.
β: type II error. The probability of not rejecting H_{0} when H_{0} is false.
H_{A}: the alternative hypothesis to H_{0} which is accepted only when H_{0} is rejected.
H_{B}: the hypothesis that bias is solely responsible for the research finding.
H_{0}: the null hypothesis. In NHST, it is only rejected when P-value < α.
H_{T}: the test or research hypothesis. Sometimes cited as the candidate for H_{A}. For example, the hypothesis that a drug is the cause of a difference between two sample groups, or there is an association between two variables.
μ: mu. The mean of the population.
NHST: null hypothesis significance test/testing. It will be used here as an umbrella term referring to both “test” or “testing” which will be clear from the context.
P-value: P(observed data or more extreme │ H_{0}).
PDC: probability distribution curve of the test statistic.
p: the population proportion.
p̂: the sample proportion.
ρ (rho): population Pearson correlation coefficient.
r: sample group Pearson correlation coefficient.
x̄: the mean of the sample group.
∧: and, used to express conjunction.
∨: or, used to express disjunction.
¬: not, used to express negation. "It is not the case that..."
≡: logical equivalence. E.g., “X ≡ Y” means proposition X is logically equivalent to proposition Y.
Funding
N/a
Author information
Contributions
RM is sole author. The author(s) read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
N/a
Consent for publication
N/a
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
McNulty, R. A logical analysis of null hypothesis significance testing using popular terminology. BMC Med Res Methodol 22, 244 (2022). https://doi.org/10.1186/s12874-022-01696-5
Keywords
 Logic
 Null hypothesis significance test
 Hypothesis testing
 Statistical inference
 Statistical significance
 Type I error
 Type II error
 Power