The ongoing tyranny of statistical significance testing in biomedical research.

Affiliation

Institut für Klinische Epidemiologie, Medizinische Fakultät, Martin-Luther-Universität Halle-Wittenberg, Magdeburger Str. 8, 06097, Halle (Saale), Germany.

Publication information

Eur J Epidemiol. 2010 Apr;25(4):225-30. doi: 10.1007/s10654-010-9440-x. Epub 2010 Mar 26.

DOI: 10.1007/s10654-010-9440-x

PMID: 20339903

Abstract

Since its introduction into the biomedical literature, statistical significance testing (abbreviated as SST) has caused much debate. The aim of this perspective article is to review frequent fallacies and misuses of SST in the biomedical field and to outline a potential way out of them. Two frequentist schools of statistical inference merged to form SST as it is practised nowadays: the Fisher and the Neyman-Pearson school. The P-value is both reported quantitatively and checked against the alpha-level to produce a qualitative dichotomous measure (significant/nonsignificant). However, a P-value mixes the estimated effect size with its estimated precision. Obviously, it is not possible to measure these two things with one single number. For the valid interpretation of SSTs, a variety of assumptions and requirements have to be met. We point here to four of them: study size, correct statistical model, correct causal model, and absence of bias and confounding. It has been stated that the P-value is perhaps the most misunderstood statistical concept in clinical research. As in the social sciences, the tyranny of SST is still highly prevalent in the biomedical literature even after decades of warnings against SST. The ubiquitous misuse and tyranny of SST threatens scientific discoveries and may even impede scientific progress. In the worst case, misuse of significance testing may even harm patients who are eventually treated incorrectly because of improper handling of P-values. For a proper interpretation of study results, both estimated effect size and estimated precision are necessary ingredients.
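The point that a single P-value cannot separate effect size from precision can be illustrated with a small sketch. The numbers below are hypothetical and are not taken from the article; the function name `wald_summary` and the normal (Wald) approximation are assumptions chosen only for illustration. Two made-up studies yield the same P-value although their estimates and 95% confidence intervals differ by a factor of 20.

```python
import math


def wald_summary(estimate, std_error, z_crit=1.959964):
    """Return (two-sided P-value, 95% CI) under a normal approximation.

    Hypothetical helper for illustration; not a method from the article.
    """
    z = estimate / std_error
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    half_width = z_crit * std_error
    return p_value, (estimate - half_width, estimate + half_width)


# Hypothetical studies: (effect estimate, standard error), both with z = 2
studies = {
    "Study A (large effect, imprecise)": (10.0, 5.0),
    "Study B (small effect, precise)": (0.5, 0.25),
}

results = {name: wald_summary(est, se) for name, (est, se) in studies.items()}

for name, (p_value, (lower, upper)) in results.items():
    est = studies[name][0]
    print(f"{name}: estimate={est}, P={p_value:.4f}, "
          f"95% CI=({lower:.2f}, {upper:.2f})")
```

Both hypothetical studies would be labelled "significant" at an alpha-level of 0.05, yet only the estimate together with its interval shows how large each effect is and how precisely it was measured.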
