Incongruence between test statistics and P values in medical papers

García-Berthou, Emili; Alcaraz, Carles

doi:10.1186/1471-2288-4-13

Deficiencies of software

Rudolf Gasko, DOVERA Health Insurance Comp., Bratislava, Slovakia

15 May 2006

Software deficiencies are, to our knowledge, common. We recently compared three programs: Statistix, Analyse-it and MedCalc. The respective null hypotheses were tested on 5 artificially created data sets using the parametric unpaired t-test, the non-parametric Mann-Whitney test and the two-tailed F-test. The p values obtained for the same test were then compared across the programs.
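As a rough illustration of how a p value for one and the same test can depend on the computation chosen, the sketch below calculates a two-sided Mann-Whitney p value in three ways: exactly by enumeration, by the normal approximation with a continuity correction, and by the normal approximation without it. The data and the function names are hypothetical, and nothing here is meant to describe how Statistix, Analyse-it or MedCalc actually compute their results.

```python
from itertools import combinations
from math import erfc, sqrt


def u_statistic(x, y):
    """Mann-Whitney U for sample x: pairs with xi > yj, ties counted as 1/2."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_p(x, y):
    """Two-sided exact p value: enumerate every split of the pooled data."""
    pooled = list(x) + list(y)
    n1, n = len(x), len(pooled)
    mean_u = n1 * (n - n1) / 2
    observed = abs(u_statistic(x, y) - mean_u)
    hits = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Count splits at least as far from the expected U as the observed one.
        if abs(u_statistic(gx, gy) - mean_u) >= observed - 1e-12:
            hits += 1
    return hits / total


def normal_p(x, y, continuity=True):
    """Two-sided normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    dist = abs(u_statistic(x, y) - mean_u)
    if continuity:
        dist = max(dist - 0.5, 0.0)
    z = dist / sd_u
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))


# Hypothetical data, invented for this illustration only.
x = [1.1, 2.3, 2.9, 3.4, 4.0]
y = [2.5, 4.6, 5.2, 5.8, 6.7]

p_exact = exact_p(x, y)
p_corrected = normal_p(x, y, continuity=True)
p_uncorrected = normal_p(x, y, continuity=False)
print(f"exact: {p_exact:.4f}")
print(f"normal, continuity correction: {p_corrected:.4f}")
print(f"normal, no correction: {p_uncorrected:.4f}")
```

For this invented example the exact p value is 14/252, about 0.056, the corrected approximation gives about 0.060, and the uncorrected approximation gives about 0.047, so the conclusion at the 0.05 level depends on the method alone.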

All three programs calculated identical exact p values for the t-test. For the remaining two tests, 26 of 44 calculations (59.1%; 95% confidence interval 43 to 73%) yielded different p values. The greatest difference was 18.35%. In two cases the values straddled 0.05, which led to essentially different interpretations of the results.
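The proportion and its interval can be checked with a few lines of arithmetic. The comment does not say which interval method was used, so the Wilson score interval below is an assumption; its bounds should land near the reported 43 to 73% but need not match them to the last percentage point.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# 26 discordant calculations out of 44, as reported in the text.
share = 26 / 44
low, high = wilson_interval(26, 44)
print(f"proportion: {share:.1%}")
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```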

Our findings should undermine the unfounded belief, held by physicians who use statistical tests, in the unassailable accuracy of mathematical procedures.

Ref. 1. Gasko, R. Statistical hypothesis testing - how exact are exact p-values? Bratisl Lek Listy 2003; 104: 36-39. Free full text on www.bmj.sk

Competing interests

None declared.

BMC Medical Research Methodology
