Statistical hypothesis testing--how exact are exact p-values?

Author information

Gasko R

Affiliation

Vzajomna zdravotna poistovna Dovera, Kosice, Slovakia.

Publication information

Bratisl Lek Listy. 2003;104(1):36-9.

PMID:12830995

Abstract

OBJECTIVES AND BACKGROUND

When a hypothesis is tested statistically, it is generally accepted that exact p values should be reported in the paper. Researchers can choose among many statistical computer programmes that implement hypothesis tests. Do different statistical programmes calculate identical exact p values for the same statistical tests?

METHODS

The respective null hypotheses were tested in 5 artificially created data sets using the parametric unpaired t-test, the non-parametric Mann-Whitney test, and the two-tailed F-test. The calculations were carried out with the following programmes: Statistix, version 7.1 (source www.statistix.com), Analyse-it, version 1.62 (source www.analyse-it.com), and MedCalc, version 6.14 (source www.medcalc.be). The p values obtained for the same tests were compared with one another.
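The abstract does not say why the programmes disagreed. One plausible source of disagreement, offered here only as an assumption, is that a Mann-Whitney p value can be computed either exactly from the permutation distribution or from a normal approximation, with or without a continuity correction. The following Python sketch uses only the standard library and made-up data (not the paper's data sets) to show that these three routes give different "exact" p values for the same samples. It assumes no tied values.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def u_statistic(x, y):
    """Mann-Whitney U for x: count of pairs (xi, yj) with xi > yj (no ties assumed)."""
    return sum(1 for xi in x for yj in y if xi > yj)


def exact_p(x, y):
    """Two-sided p value from the full permutation distribution of U."""
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    observed = abs(u_statistic(x, y) - mu)
    hits = total = 0
    # Every way of assigning n1 of the ranks 1..n1+n2 to the first sample
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        u = sum(ranks) - n1 * (n1 + 1) / 2  # U computed from the rank sum
        total += 1
        if abs(u - mu) >= observed:
            hits += 1
    return hits / total


def normal_p(x, y, continuity=True):
    """Two-sided p value from the normal approximation to U."""
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    distance = abs(u_statistic(x, y) - mu)
    if continuity:
        distance = max(distance - 0.5, 0.0)
    return 2 * (1 - NormalDist().cdf(distance / sigma))


# Hypothetical illustration data, not taken from the paper
group_a = [1.1, 2.3, 2.9, 3.4, 4.0]
group_b = [3.8, 4.6, 5.2, 6.1, 7.5]

p_exact = exact_p(group_a, group_b)
p_corrected = normal_p(group_a, group_b, continuity=True)
p_uncorrected = normal_p(group_a, group_b, continuity=False)

print(f"exact permutation:          {p_exact:.4f}")
print(f"normal, with correction:    {p_corrected:.4f}")
print(f"normal, without correction: {p_uncorrected:.4f}")
```

For these samples the exact value is 4/252 (about 0.016), while the corrected and uncorrected approximations come out at about 0.022 and 0.016 respectively. Whether any of the three programmes named above actually differ in this way is not stated in the abstract.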

RESULTS

All three programmes calculated identical exact p values for the t-test. In the remaining two tests, different p values were calculated in 26 out of 44 calculations (59.1 per cent; 95 per cent confidence interval 43-73 per cent). The greatest difference was 18.35 per cent. In two cases the values fell on either side of 0.05, which led to essentially different interpretations of the results.
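The reported proportion and its confidence interval can be rechecked from the counts alone. The abstract does not state which interval method was used, so the Python sketch below assumes two common choices, the Wald and the Wilson score intervals, purely for illustration. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(k, n, level=0.95):
    """Wald (simple normal approximation) interval for a proportion k/n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = k / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(k, n, level=0.95):
    """Wilson score interval for a proportion k/n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


k, n = 26, 44  # counts taken from the abstract
proportion = k / n
wald = wald_interval(k, n)
wilson = wilson_interval(k, n)

print(f"proportion: {100 * proportion:.1f} per cent")
print(f"Wald:   {100 * wald[0]:.1f} to {100 * wald[1]:.1f} per cent")
print(f"Wilson: {100 * wilson[0]:.1f} to {100 * wilson[1]:.1f} per cent")
```

Under these assumptions the proportion is 59.1 per cent, the Wilson interval is roughly 44 to 72 per cent and the Wald interval roughly 45 to 74 per cent. Both are close to, but not identical with, the reported 43-73 per cent, so the paper presumably used another method. The small spread between methods echoes the abstract's point that the same nominal statistic can come out differently depending on how it is computed.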

CONCLUSIONS

The use of significance tests in biomedical research has long been subject to criticism. Testing the null hypothesis at the arbitrary significance level of 0.05 should be replaced by other methods. Our findings should undermine the ungrounded belief of users of statistical tests (physicians) in the unassailable accuracy of mathematical procedures. The use of confidence intervals seems much more suitable, although there are objections to them as well. (Tab. 4, Fig. 1, Ref. 19.)

