P-values - a chronic conundrum.

Affiliation

Department of Veterans Affairs, Office of Productivity, Efficiency and Staffing (OPES, RAPID), Albany, USA.

Publication information

BMC Med Res Methodol. 2020 Jun 24;20(1):167. doi: 10.1186/s12874-020-01051-6.

DOI:10.1186/s12874-020-01051-6
PMID:32580765
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7315482/
Abstract

BACKGROUND

In medical research and practice, the p-value is arguably the most often used statistic and yet it is widely misconstrued as the probability of the type I error, which comes with serious consequences. This misunderstanding can greatly affect the reproducibility in research, treatment selection in medical practice, and model specification in empirical analyses. By using plain language and concrete examples, this paper is intended to elucidate the p-value confusion from its root, to explicate the difference between significance and hypothesis testing, to illuminate the consequences of the confusion, and to present a viable alternative to the conventional p-value.

MAIN TEXT

The confusion with p-values has plagued the research community and medical practitioners for decades. However, efforts to clarify it have been largely futile, in part, because intuitive yet mathematically rigorous educational materials are scarce. Additionally, the lack of a practical alternative to the p-value for guarding against randomness also plays a role. The p-value confusion is rooted in the misconception of significance and hypothesis testing. Most, including many statisticians, are unaware that p-values and significance testing formed by Fisher are incomparable to the hypothesis testing paradigm created by Neyman and Pearson. And most otherwise great statistics textbooks tend to cobble the two paradigms together and make no effort to elucidate the subtle but fundamental differences between them. The p-value is a practical tool gauging the "strength of evidence" against the null hypothesis. It informs investigators that a p-value of 0.001, for example, is stronger than 0.05. However, p-values produced in significance testing are not the probabilities of type I errors as commonly misconceived. For a p-value of 0.05, the chance a treatment does not work is not 5%; rather, it is at least 28.9%.

CONCLUSIONS

A long-overdue effort to understand p-values correctly is much needed. However, in medical research and practice, just banning significance testing and accepting uncertainty are not enough. Researchers, clinicians, and patients alike need to know the probability a treatment will or will not work. Thus, the calibrated p-values (the probability that a treatment does not work) should be reported in research papers.
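
The abstract does not say how the 28.9% figure is derived. One calibration that reproduces it is the Sellke, Bayarri, and Berger lower bound on the Bayes factor, -e * p * ln(p), converted to a probability of the null hypothesis under equal prior odds. Treating that as the intended formula is an assumption, not something the abstract confirms. A minimal Python sketch follows; the function name `calibrated_p` is hypothetical.

```python
import math


def calibrated_p(p: float) -> float:
    """Lower bound on the probability that the null hypothesis is true.

    Assumptions (not stated in the abstract): the Bayes-factor bound
    -e * p * ln(p), valid for 0 < p < 1/e, and 50/50 prior odds
    between the null and the alternative.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    bayes_factor_bound = -math.e * p * math.log(p)
    # Posterior probability of the null under equal prior odds.
    return bayes_factor_bound / (1.0 + bayes_factor_bound)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(f"p = {p}: calibrated p >= {calibrated_p(p):.3f}")
    # p = 0.05 gives 0.289, matching the 28.9% quoted in the abstract.
```

Under these assumptions a p-value of 0.05 maps to roughly 0.289, whereas 0.001 maps to roughly 0.018, which is consistent with the abstract's point that smaller p-values carry stronger evidence against the null.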

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e666/7315482/bec765f2aa47/12874_2020_1051_Fig1_HTML.jpg
