
Consequences of relying on statistical significance: Some illustrations.

Affiliations

Department of Development and Regeneration, KU Leuven, Leuven, Belgium.

Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands.

Publication information

Eur J Clin Invest. 2018 May;48(5):e12912. doi: 10.1111/eci.12912. Epub 2018 Feb 28.

DOI: 10.1111/eci.12912
PMID: 29438579

Abstract

BACKGROUND

Despite regular criticisms of null hypothesis significance testing (NHST), a focus on testing persists, sometimes in the belief that it is needed to get published and sometimes encouraged by journal reviewers. This paper aims to demonstrate known key limitations of NHST using simple nontechnical illustrations.

DESIGN

The first illustration is based on simulated data of 20 000 studies that compare two groups for an outcome event. The true effect size (difference in event rates) and sample size (20-100 per group) were varied. The second illustration used real data from a meta-analysis on alpha-blockers for the treatment of ureteric stones.
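A simulation of this kind can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions, not the authors' code: the event rates (0.30 vs 0.50), the pooled two-proportion z-test, the 2000 studies per condition, and the helper names `two_sided_p` and `simulate_p_values` are all choices made here. It shows how widely P-values can vary between studies that share the same true effect.

```python
import math
import random
import statistics

def two_sided_p(events_a, events_b, n):
    """Two-sided P-value of a pooled two-proportion z-test, n subjects per group."""
    pooled = (events_a + events_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:  # both groups have no events or only events: no evidence of a difference
        return 1.0
    z = (events_b - events_a) / n / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_p_values(rate_a, rate_b, n, n_studies, rng):
    """Simulate n_studies two-group studies and return their P-values."""
    p_values = []
    for _ in range(n_studies):
        events_a = sum(rng.random() < rate_a for _ in range(n))
        events_b = sum(rng.random() < rate_b for _ in range(n))
        p_values.append(two_sided_p(events_a, events_b, n))
    return p_values

rng = random.Random(2018)    # fixed seed so the sketch is repeatable
RATE_A, RATE_B = 0.30, 0.50  # assumed event rates, not taken from the paper
spread = {}
for n in (20, 50, 100):      # per-group sample sizes within the 20-100 range
    p_values = simulate_p_values(RATE_A, RATE_B, n, n_studies=2000, rng=rng)
    spread[n] = (min(p_values), statistics.median(p_values), max(p_values))
    print(f"n={n:3d}  min={spread[n][0]:.5f}  "
          f"median={spread[n][1]:.3f}  max={spread[n][2]:.3f}")
```

Each row reports the smallest, median, and largest P-value across the simulated studies for one sample size. Only the standard library is used, so the sketch runs without extra dependencies.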

RESULTS

The simulations demonstrated the large between-study variability in P-values (range between <.0001 and 1 for most simulation conditions). A focus on statistically significant effects (P < .05), notably in small to moderate samples, led to strongly overestimated effect sizes (up to 240%) and many false-positive conclusions, that is, statistically significant effects that were, in fact, true null effects. Effect sizes also exhibited strong between-study variability, but confidence intervals accounted for this: the interval width decreased with larger sample size, and the percentage of intervals that contained the true effect size was accurate across simulation conditions. Reducing alpha level, as recently suggested, reduced false-positive conclusions but strongly increased the overestimation of significant effects (up to 320%).
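The selection effect described here can be illustrated with a small Python sketch. It rests on assumptions that are not from the paper: event rates of 0.30 vs 0.40, a pooled two-proportion z-test, 95% Wald intervals, 5000 studies per condition, and the hypothetical helpers `run_condition` and `summarize`. Its numbers will therefore not reproduce the reported 240% or 320%; it only shows the direction of the effects.

```python
import math
import random

Z95 = 1.959964  # standard normal quantile for a 95% Wald interval

def run_condition(rate_a, rate_b, n, n_studies, rng):
    """Return (diff, p, ci_low, ci_high) for each simulated two-group study."""
    studies = []
    for _ in range(n_studies):
        pa = sum(rng.random() < rate_a for _ in range(n)) / n
        pb = sum(rng.random() < rate_b for _ in range(n)) / n
        diff = pb - pa
        pooled = (pa + pb) / 2
        se_null = math.sqrt(2 * pooled * (1 - pooled) / n)     # used by the z-test
        p = math.erfc(abs(diff) / se_null / math.sqrt(2)) if se_null > 0 else 1.0
        se = math.sqrt(pa * (1 - pa) / n + pb * (1 - pb) / n)  # used by the interval
        studies.append((diff, p, diff - Z95 * se, diff + Z95 * se))
    return studies

def summarize(studies, true_diff, alpha):
    """Share of significant studies, their mean estimate, and interval coverage."""
    significant = [d for d, p, _, _ in studies if p < alpha]
    covered = sum(lo <= true_diff <= hi for _, _, lo, hi in studies)
    return {
        "share_significant": len(significant) / len(studies),
        "mean_sig_diff": (sum(significant) / len(significant)
                          if significant else float("nan")),
        "coverage": covered / len(studies),
    }

rng = random.Random(2018)  # fixed seed so the sketch is repeatable
TRUE_DIFF = 0.10           # assumed true difference in event rates
small = run_condition(0.30, 0.40, n=20, n_studies=5000, rng=rng)
large = run_condition(0.30, 0.40, n=100, n_studies=5000, rng=rng)
null = run_condition(0.30, 0.30, n=20, n_studies=5000, rng=rng)

at_05 = summarize(small, TRUE_DIFF, alpha=0.05)
at_005 = summarize(small, TRUE_DIFF, alpha=0.005)
large_05 = summarize(large, TRUE_DIFF, alpha=0.05)
null_05 = summarize(null, 0.0, alpha=0.05)  # every significant result here is a false positive

for label, s in [("n=20, alpha=.05", at_05), ("n=20, alpha=.005", at_005),
                 ("n=100, alpha=.05", large_05), ("null, n=20, alpha=.05", null_05)]:
    print(f"{label:22s} significant={s['share_significant']:.3f}  "
          f"mean significant diff={s['mean_sig_diff']:.3f}  coverage={s['coverage']:.3f}")
```

In this setup the mean estimate among significant studies sits well above the assumed true difference of 0.10, and it moves further away when alpha is lowered to .005. The coverage column is computed over all studies, not only the significant ones, which is why it stays close to the nominal level at the larger sample size.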

CONCLUSIONS

Researchers and journals should abandon statistical significance as a pivotal element in most scientific publications. Confidence intervals around effect sizes are more informative, but should not merely be reported to comply with journal requirements.


