Archived Comments for: Incongruence between test statistics and P values in medical papers

  1. Deficiencies of software

    Rudolf Gasko, DOVERA Health Insurance Comp., Bratislava, Slovakia

    15 May 2006

    In our experience, deficiencies in statistical software are common. We recently compared three programs: Statistix, Analyse-it and MedCalc. The respective null hypotheses were tested in 5 artificially created data sets using the parametric unpaired t-test, the non-parametric Mann-Whitney test and the two-tailed F-test. For each test, the p values produced by the three programs were compared with one another.
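
    The Mann-Whitney test named above can be computed in more than one accepted way, which is one plausible reason (an assumption, not something established in this comment) why programs may report different p values for the same data. The sketch below uses two small hypothetical samples, `group_a` and `group_b`, and compares an exact permutation p value with normal approximations with and without a continuity correction. The function names are illustrative and do not correspond to any of the three programs.

```python
from itertools import combinations
from math import erfc, sqrt


def u_statistic(x, y):
    """Mann-Whitney U for x: pairs with x_i > y_j count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_p(x, y):
    """Two-sided exact p value: enumerate every split of the pooled data."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    observed = abs(u_statistic(x, y) - mean_u)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        first = [pooled[i] for i in idx]
        second = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as extreme as the observed one.
        if abs(u_statistic(first, second) - mean_u) >= observed - 1e-12:
            hits += 1
    return hits / total


def normal_p(x, y, continuity=True):
    """Two-sided normal-approximation p value (no correction for ties)."""
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    deviation = abs(u_statistic(x, y) - mean_u)
    if continuity:
        deviation = max(deviation - 0.5, 0.0)
    # 2 * (1 - Phi(z)) written with the complementary error function.
    return erfc(deviation / sd_u / sqrt(2))


# Hypothetical data, chosen only for illustration.
group_a = [1.2, 2.5, 3.1, 4.6]
group_b = [4.4, 5.0, 5.8, 6.3]

p_exact = exact_p(group_a, group_b)
p_normal_cc = normal_p(group_a, group_b, continuity=True)
p_normal_plain = normal_p(group_a, group_b, continuity=False)

print(f"exact:                     {p_exact:.4f}")
print(f"normal, with correction:   {p_normal_cc:.4f}")
print(f"normal, no correction:     {p_normal_plain:.4f}")
```

    With these hypothetical samples the uncorrected normal approximation falls below 0.05 while the exact and continuity-corrected values lie above it, so the choice of method alone changes the conventional verdict.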

    All three programs calculated identical exact p values for the t-test. For the remaining two tests, different p values were calculated in 26 out of 44 calculations (59.1 %; 95 % confidence interval 43 to 73 %). The greatest difference was 18.35 %. In two cases the values fell on either side of 0.05, which led to fundamentally different interpretations of the results.
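
    The proportion and its interval can be checked with a few lines of code. The comment does not state which interval method was used, so the sketch below assumes the Wilson score interval, one common choice. It gives roughly 44 % to 72 %, close to but not identical with the quoted 43 to 73 %, and other methods (for example an exact binomial interval) would give slightly different limits again. The function name `wilson_interval` is illustrative.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95 %)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# 26 discordant results out of 44 calculations, as reported in the comment.
proportion = 26 / 44
low, high = wilson_interval(26, 44)
print(f"{proportion:.1%} (95% CI {low:.1%} to {high:.1%})")
```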

    Our findings should undermine the unfounded belief, held by physicians who use statistical tests, in the unassailable accuracy of mathematical procedures.

    Ref. 1. Gasko, R. Statistical hypothesis testing - how exact are exact p-values? Bratisl Lek Listy 2003; 104: 36-39. Free full text on

    Competing interests

    None declared.