P-values - a chronic conundrum.

Affiliation

Department of Veterans Affairs, Office of Productivity, Efficiency and Staffing (OPES, RAPID), Albany, USA.

Publication information

BMC Med Res Methodol. 2020 Jun 24;20(1):167. doi: 10.1186/s12874-020-01051-6.

Abstract

BACKGROUND

In medical research and practice, the p-value is arguably the most frequently used statistic, yet it is widely misconstrued as the probability of a type I error, a misreading with serious consequences. This misunderstanding can greatly affect the reproducibility of research, treatment selection in medical practice, and model specification in empirical analyses. Using plain language and concrete examples, this paper aims to trace the p-value confusion to its root, to explain the difference between significance testing and hypothesis testing, to show the consequences of the confusion, and to present a viable alternative to the conventional p-value.

MAIN TEXT

The confusion over p-values has plagued the research community and medical practitioners for decades. Efforts to clarify it have been largely futile, in part because intuitive yet mathematically rigorous educational materials are scarce. The lack of a practical alternative to the p-value for guarding against randomness also plays a role. The p-value confusion is rooted in a misconception about significance testing and hypothesis testing. Most people, including many statisticians, are unaware that the p-values and significance testing developed by Fisher are not comparable to the hypothesis testing paradigm created by Neyman and Pearson. Most otherwise excellent statistics textbooks cobble the two paradigms together and make no effort to explain the subtle but fundamental differences between them. The p-value is a practical tool for gauging the "strength of evidence" against the null hypothesis. It tells investigators that a p-value of 0.001, for example, is stronger evidence against the null than a p-value of 0.05. However, p-values produced in significance testing are not the probabilities of type I errors, as is commonly believed. For a p-value of 0.05, the chance that a treatment does not work is not 5%; rather, it is at least 28.9%.
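The abstract does not say how the 28.9% figure is obtained. One calibration that reproduces it is the lower bound of Sellke, Bayarri and Berger, in which the Bayes factor in favor of the null is bounded below by -e * p * ln(p) for p < 1/e. The Python sketch below assumes that this is the calibration intended and that the null and the alternative are equally likely a priori; the function name `calibrated_p_value` and the `prior_null` parameter are illustrative, not taken from the paper.

```python
import math


def calibrated_p_value(p: float, prior_null: float = 0.5) -> float:
    """Lower bound on P(null is true | data) for an observed p-value.

    Assumes the -e * p * ln(p) Bayes factor bound, valid for 0 < p < 1/e.
    prior_null is the assumed prior probability that the null is true
    (0.5 here, i.e. no prior preference; an assumption, not a given).
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    if not 0.0 < prior_null < 1.0:
        raise ValueError("prior_null must lie strictly between 0 and 1")

    # Lower bound on the Bayes factor of the null versus the alternative.
    bayes_factor_bound = -math.e * p * math.log(p)

    # Posterior odds = prior odds * Bayes factor; convert odds to probability.
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = prior_odds * bayes_factor_bound
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        bound = calibrated_p_value(p)
        print(f"p = {p}: probability the null is true is at least {bound:.3f}")
```

Under these assumptions, `calibrated_p_value(0.05)` evaluates to about 0.289, matching the "at least 28.9%" stated above. A different prior probability for the null shifts the bound, which is why `prior_null` is left as a parameter.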

CONCLUSIONS

An effort to understand p-values correctly is long overdue. However, in medical research and practice, simply banning significance testing and accepting uncertainty is not enough. Researchers, clinicians, and patients alike need to know the probability that a treatment will or will not work. Thus, calibrated p-values (the probability that a treatment does not work) should be reported in research papers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e666/7315482/bec765f2aa47/12874_2020_1051_Fig1_HTML.jpg
