P-values - a chronic conundrum.

Affiliation

Department of Veterans Affairs, Office of Productivity, Efficiency and Staffing (OPES, RAPID), Albany, USA.

Publication information

BMC Med Res Methodol. 2020 Jun 24;20(1):167. doi: 10.1186/s12874-020-01051-6.

DOI: 10.1186/s12874-020-01051-6

PMID: 32580765

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7315482/

Abstract

BACKGROUND

In medical research and practice, the p-value is arguably the most frequently used statistic, yet it is widely misconstrued as the probability of a type I error, and this misreading has serious consequences. It can greatly affect the reproducibility of research, treatment selection in medical practice, and model specification in empirical analyses. Using plain language and concrete examples, this paper aims to trace the p-value confusion to its root, explain the difference between significance testing and hypothesis testing, illustrate the consequences of the confusion, and present a viable alternative to the conventional p-value.
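To make the distinction between a type I error rate and "the probability that a treatment does not work" concrete, here is a minimal simulation sketch. It assumes each trial is summarized by a two-sided z-test, that exactly half of the tested treatments are ineffective, and that effective treatments have a standardized effect of 2.8 (roughly 80% power at the 0.05 level); these settings and the helper names `two_sided_p` and `simulate` are illustrative assumptions, not taken from the paper. It contrasts the type I error rate of the Neyman-Pearson rule "reject if p < 0.05" with the share of ineffective treatments among results whose p-value falls just below 0.05.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(n: int = 200_000, effect: float = 2.8, alpha: float = 0.05,
             window: float = 0.005, seed: int = 1) -> tuple[float, float]:
    """Return (type I error rate of 'reject if p < alpha',
    share of ineffective treatments among p in [alpha - window, alpha)).

    Each of the n rounds tests one ineffective and one effective treatment,
    i.e. a hypothetical 50:50 mix of true and false null hypotheses.
    """
    rng = random.Random(seed)
    rejected_null = 0   # ineffective treatments declared "significant"
    near_null = 0       # ineffective treatments with p just below alpha
    near_effective = 0  # effective treatments with p just below alpha
    for _ in range(n):
        p_null = two_sided_p(rng.gauss(0.0, 1.0))    # H0 true: z ~ N(0, 1)
        p_eff = two_sided_p(rng.gauss(effect, 1.0))  # H0 false: z ~ N(effect, 1)
        rejected_null += p_null < alpha
        near_null += alpha - window <= p_null < alpha
        near_effective += alpha - window <= p_eff < alpha
    return rejected_null / n, near_null / (near_null + near_effective)


if __name__ == "__main__":
    rate, share = simulate()
    print(f"type I error rate of the rule p < 0.05: {rate:.3f}")
    print(f"share of ineffective treatments with 0.045 <= p < 0.05: {share:.2f}")
```

Under these settings the first number comes out close to 0.05, as the decision rule guarantees, while the second is several times larger (around 0.3). Changing the prior mix or the effect size moves the second number, which is why a single p-value cannot be read as the probability that the treatment does not work.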

MAIN TEXT

The confusion over p-values has plagued the research community and medical practitioners for decades. Efforts to clarify it have been largely futile, in part because intuitive yet mathematically rigorous educational materials are scarce; the lack of a practical alternative to the p-value for guarding against randomness also plays a role. The confusion is rooted in a misconception of significance testing and hypothesis testing. Most people, including many statisticians, are unaware that the p-values and significance testing developed by Fisher are incomparable to the hypothesis testing paradigm created by Neyman and Pearson. Moreover, most otherwise excellent statistics textbooks cobble the two paradigms together without explaining the subtle but fundamental differences between them. The p-value is a practical tool for gauging the "strength of evidence" against the null hypothesis: it tells investigators, for example, that a p-value of 0.001 represents stronger evidence than one of 0.05. However, p-values produced in significance testing are not, as commonly believed, the probabilities of type I errors. For a p-value of 0.05, the chance that a treatment does not work is not 5%; rather, it is at least 28.9%.
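The 28.9% figure can be reproduced with the calibration of Sellke, Bayarri and Berger (2001), assuming (this is our assumption, not stated in the abstract) that this is the calibration behind the number and that the null and alternative hypotheses are equally likely a priori. Using the lower bound B(p) = -e·p·ln(p) on the Bayes factor in favor of the null, valid for p < 1/e, the arithmetic for p = 0.05 is:

```latex
\begin{aligned}
B(0.05) &= -e \times 0.05 \times \ln 0.05 \approx 2.71828 \times 0.05 \times 2.99573 \approx 0.4072,\\
P(H_0 \mid p = 0.05) &\ge \left(1 + \frac{1}{B(0.05)}\right)^{-1} \approx \left(1 + 2.456\right)^{-1} \approx 0.289.
\end{aligned}
```

That is, under equal prior odds, a result with p = 0.05 still leaves at least about a 28.9% chance that the null hypothesis (no treatment effect) is true.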

CONCLUSIONS

A long-overdue effort to understand p-values correctly is much needed. However, in medical research and practice, simply banning significance testing and accepting uncertainty is not enough. Researchers, clinicians, and patients alike need to know the probability that a treatment will or will not work. Calibrated p-values (the probability that a treatment does not work) should therefore be reported in research papers.
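A minimal sketch of how calibrated p-values might be computed for reporting, assuming the Sellke-Bayarri-Berger lower bound -e·p·ln(p) on the Bayes factor for the null (valid for p < 1/e; the bound is taken as 1 above that, a common convention). The function names and the `prior_null` parameter (the prior probability that the treatment does not work) are hypothetical additions; with the default of 0.5 the function returns about 0.289 for p = 0.05, matching the 28.9% cited in the main text.

```python
import math


def min_bayes_factor(p: float) -> float:
    """Lower bound -e * p * ln(p) on the Bayes factor for H0 (set to 1 for p >= 1/e)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p >= 1.0 / math.e:
        return 1.0
    return -math.e * p * math.log(p)


def calibrated_p(p: float, prior_null: float = 0.5) -> float:
    """Lower bound on P(treatment does not work | p), given a prior P(H0)."""
    if not 0.0 < prior_null < 1.0:
        raise ValueError("prior_null must lie strictly between 0 and 1")
    prior_odds_against_null = (1.0 - prior_null) / prior_null
    # Posterior P(H0) = 1 / (1 + prior odds against H0 / Bayes factor for H0).
    return 1.0 / (1.0 + prior_odds_against_null / min_bayes_factor(p))


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.005, 0.001):
        print(f"p = {p:<6} calibrated p = {calibrated_p(p):.3f}")
```

With the default 50% prior, p-values of 0.05, 0.01, 0.005 and 0.001 map to roughly 0.289, 0.111, 0.067 and 0.018; a more skeptical prior (a larger `prior_null`) raises every calibrated value.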

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e666/7315482/bec765f2aa47/12874_2020_1051_Fig1_HTML.jpg

Similar articles

P value and the theory of hypothesis testing: an explanation for new researchers.

Clin Orthop Relat Res. 2010 Mar;468(3):885-92. doi: 10.1007/s11999-009-1164-4.

[The uncertainties of statistical "significance"].

Rev Med Chil. 2018 Dec;146(10):1184-1189. doi: 10.4067/S0034-98872018001001184.

The Practical Alternative to the p Value Is the Correctly Used p Value.

Perspect Psychol Sci. 2021 May;16(3):639-648. doi: 10.1177/1745691620958012. Epub 2021 Feb 9.

The researcher and the consultant: from testing to probability statements.

Eur J Epidemiol. 2015 Sep;30(9):1003-8. doi: 10.1007/s10654-015-0054-1. Epub 2015 Jun 25.

The Harm Done to Reproducibility by the Culture of Null Hypothesis Significance Testing.

Am J Epidemiol. 2017 Sep 15;186(6):627-635. doi: 10.1093/aje/kwx261.

Statistics in ophthalmology revisited: the (effect) size matters.

Acta Ophthalmol. 2018 Nov;96(7):e885-e888. doi: 10.1111/aos.13756. Epub 2018 Sep 5.

Pervasive errors in hypothesis testing: Toward better statistical practice in nursing research.

Int J Nurs Stud. 2019 Oct;98:87-93. doi: 10.1016/j.ijnurstu.2019.06.012. Epub 2019 Jul 7.

Détente: A Practical Understanding of P values and Bayesian Posterior Probabilities.

Clin Pharmacol Ther. 2021 Jun;109(6):1489-1498. doi: 10.1002/cpt.2004. Epub 2020 Sep 26.

To P or Not to P: Backing Bayesian Statistics.

Otolaryngol Head Neck Surg. 2017 Dec;157(6):915-918. doi: 10.1177/0194599817739260.

Cited by

Causal clarity in statistical software.

Int J Epidemiol. 2025 Jun 11;54(4). doi: 10.1093/ije/dyaf136.

Irrationality in humans and creativity in AI.

Front Artif Intell. 2025 Jun 20;8:1579704. doi: 10.3389/frai.2025.1579704. eCollection 2025.

Transparency in Science Reporting: A Call to Researchers and Publishers.

Cureus. 2025 Feb 23;17(2):e79493. doi: 10.7759/cureus.79493. eCollection 2025 Feb.

Extracellular Vesicle Protein Expression in Doped Bioactive Glasses: Further Insights Applying Anomaly Detection.

Int J Mol Sci. 2024 Mar 21;25(6):3560. doi: 10.3390/ijms25063560.

The "P"-Value: The Primary Alphabet of Research Revisited.

Int J Prev Med. 2023 Apr 26;14:41. doi: 10.4103/ijpvm.ijpvm_200_22. eCollection 2023.

Relationship between patient experience and hospital readmission: system-level survey with deterministic data linkage method.

BMC Med Res Methodol. 2022 Jul 21;22(1):197. doi: 10.1186/s12874-022-01677-8.

Analysis of 567,758 randomized controlled trials published over 30 years reveals trends in phrases used to discuss results that do not reach statistical significance.

PLoS Biol. 2022 Feb 18;20(2):e3001562. doi: 10.1371/journal.pbio.3001562. eCollection 2022 Feb.

Multiple secondary outcome analyses: precise interpretation is important.

Trials. 2022 Jan 10;23(1):27. doi: 10.1186/s13063-021-05975-2.

COVID-19 diagnosis from routine blood tests using artificial intelligence techniques.

Biomed Signal Process Control. 2022 Feb;72:103263. doi: 10.1016/j.bspc.2021.103263. Epub 2021 Nov 1.

P-value and effect-size in clinical and experimental studies.

J Vasc Bras. 2021 Jul 5;20:e20210038. doi: 10.1590/1677-5449.210038. eCollection 2021.
