The ongoing tyranny of statistical significance testing in biomedical research.

Author affiliation

Institut für Klinische Epidemiologie, Medizinische Fakultät, Martin-Luther-Universität Halle-Wittenberg, Magdeburger Str. 8, 06097, Halle (Saale), Germany.

Publication information

Eur J Epidemiol. 2010 Apr;25(4):225-30. doi: 10.1007/s10654-010-9440-x. Epub 2010 Mar 26.

DOI: 10.1007/s10654-010-9440-x
PMID: 20339903
Abstract

Since its introduction into the biomedical literature, statistical significance testing (abbreviated as SST) has caused much debate. The aim of this perspective article is to review frequent fallacies and misuses of SST in the biomedical field and to review a potential way out of them. Two frequentist schools of statistical inference merged to form SST as it is practised nowadays: the Fisher and the Neyman-Pearson school. The P-value is both reported quantitatively and checked against the alpha-level to produce a qualitative dichotomous measure (significant/nonsignificant). However, a P-value mixes the estimated effect size with its estimated precision. Obviously, it is not possible to measure these two things with one single number. For the valid interpretation of SSTs, a variety of presumptions and requirements have to be met. We point here to four of them: study size, correct statistical model, correct causal model, and absence of bias and confounding. It has been stated that the P-value is perhaps the most misunderstood statistical concept in clinical research. As in the social sciences, the tyranny of SST is still highly prevalent in the biomedical literature even after decades of warnings against SST. The ubiquitous misuse and tyranny of SST threatens scientific discoveries and may even impede scientific progress. In the worst case, misuse of significance testing may even harm patients who are eventually treated incorrectly because of improper handling of P-values. For a proper interpretation of study results, both estimated effect size and estimated precision are necessary ingredients.
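
The abstract's point that a P-value mixes estimated effect size with estimated precision can be illustrated with a small Python sketch. This is not taken from the article: the two studies and their numbers are hypothetical, the function name `wald_summary` is made up for illustration, and a simple normal-approximation (Wald) calculation is assumed for convenience.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def wald_summary(estimate, std_error, level=0.95):
    """Return (two-sided P-value, CI lower, CI upper).

    Assumes the estimate is approximately normally distributed
    and tests the null hypothesis of a zero effect.
    """
    z = estimate / std_error
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return p_value, estimate - z_crit * std_error, estimate + z_crit * std_error


# Hypothetical studies: (label, estimated mean difference, standard error)
studies = [
    ("small study, large effect", 10.0, 5.0),
    ("large study, small effect", 1.0, 0.5),
]

for label, est, se in studies:
    p, lower, upper = wald_summary(est, se)
    print(f"{label}: estimate={est:.1f}, "
          f"95% CI=({lower:.2f}, {upper:.2f}), P={p:.4f}")
```

Under these assumptions both hypothetical studies give the same P-value (about 0.046), although one estimate is ten times larger than the other and its interval is ten times wider. Reporting the estimate together with its confidence interval keeps effect size and precision visible as two separate quantities.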

