Hail the impossible: p-values, evidence, and likelihood.

Affiliation

Kristianstad University, Kristianstad, Sweden.

Publication information

Scand J Psychol. 2011 Apr;52(2):113-25. doi: 10.1111/j.1467-9450.2010.00852.x. Epub 2010 Nov 16.

DOI: 10.1111/j.1467-9450.2010.00852.x
PMID: 21077903
Abstract

Significance testing based on p-values is standard in psychological research and teaching. Typically, research articles and textbooks present and use p as a measure of statistical evidence against the null hypothesis (the Fisherian interpretation), although using concepts and tools based on a completely different usage of p as a tool for controlling long-term decision errors (the Neyman-Pearson interpretation). There are four major problems with using p as a measure of evidence and these problems are often overlooked in the domain of psychology. First, p is uniformly distributed under the null hypothesis and can therefore never indicate evidence for the null. Second, p is conditioned solely on the null hypothesis and is therefore unsuited to quantify evidence, because evidence is always relative in the sense of being evidence for or against a hypothesis relative to another hypothesis. Third, p designates probability of obtaining evidence (given the null), rather than strength of evidence. Fourth, p depends on unobserved data and subjective intentions and therefore implies, given the evidential interpretation, that the evidential strength of observed data depends on things that did not happen and subjective intentions. In sum, using p in the Fisherian sense as a measure of statistical evidence is deeply problematic, both statistically and conceptually, while the Neyman-Pearson interpretation is not about evidence at all. In contrast, the likelihood ratio escapes the above problems and is recommended as a tool for psychologists to represent the statistical evidence conveyed by obtained data relative to two hypotheses.
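The first and last points of the abstract can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a two-sided z-test on normal data with known standard deviation, and the function names (`two_sided_p`, `likelihood_ratio`, `simulate_null`) and the alternative mean of 0.5 are hypothetical choices made only for illustration. It draws many samples for which the null hypothesis is true, then compares how p behaves with how the likelihood ratio for the null against a specific alternative behaves.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def two_sided_p(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == mu0 (sigma assumed known)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def likelihood_ratio(sample, mu_a, mu_b, sigma=1.0):
    """L(mu_a) / L(mu_b) for i.i.d. normal data with known sigma.

    The normalising constants of the two densities cancel, so only the
    squared deviations from each hypothesised mean are needed.
    """
    log_lr = sum(
        ((x - mu_b) ** 2 - (x - mu_a) ** 2) / (2.0 * sigma ** 2) for x in sample
    )
    return math.exp(log_lr)


def simulate_null(n_sims=5000, n=20, mu_alt=0.5, seed=0):
    """Draw samples with the null (mean 0) true; return p-values and LR(H0:H1)."""
    rng = random.Random(seed)
    p_values, ratios = [], []
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p_values.append(two_sided_p(sample))
        ratios.append(likelihood_ratio(sample, 0.0, mu_alt))
    return p_values, ratios


if __name__ == "__main__":
    p_values, ratios = simulate_null()
    # Under H0 each decile of [0, 1] should hold roughly the same share of p-values.
    deciles = [0] * 10
    for p in p_values:
        deciles[min(int(p * 10), 9)] += 1
    print("share of p-values per decile:", [round(c / len(p_values), 3) for c in deciles])
    # The likelihood ratio, by contrast, can favour the null over the alternative.
    print("share of samples with LR(H0:H1) > 1:", sum(r > 1 for r in ratios) / len(ratios))
```

Under these assumptions a large p is no more likely than a small one when the null is true, so p alone cannot signal support for the null, whereas the likelihood ratio compares two stated hypotheses directly and can point in either direction.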

