Consequences of relying on statistical significance: Some illustrations.

Affiliations

Department of Development and Regeneration, KU Leuven, Leuven, Belgium.

Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands.

Publication information

Eur J Clin Invest. 2018 May;48(5):e12912. doi: 10.1111/eci.12912. Epub 2018 Feb 28.

Abstract

BACKGROUND

Despite regular criticism of null hypothesis significance testing (NHST), a focus on testing persists, sometimes in the belief that it is needed to get published and sometimes because journal reviewers encourage it. This paper aims to demonstrate known key limitations of NHST using simple nontechnical illustrations.

DESIGN

The first illustration is based on simulated data of 20 000 studies that compare two groups for an outcome event. The true effect size (difference in event rates) and sample size (20-100 per group) were varied. The second illustration used real data from a meta-analysis on alpha-blockers for the treatment of ureteric stones.
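The first illustration can be approximated with a few lines of code. The sketch below is not the authors' simulation: it assumes a pooled two-proportion z-test (the abstract does not name the test), and the event rates, group size, number of studies, and function names are all hypothetical choices made for illustration.

```python
import math
import random


def two_proportion_p_value(events_a, events_b, n):
    """Two-sided P-value from a pooled two-proportion z-test (equal group sizes).

    Assumption: the abstract does not state which test was used.
    """
    pooled = (events_a + events_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:  # zero events, or only events, in both groups
        return 1.0
    z = (events_a - events_b) / n / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_values(rate_a, rate_b, n, n_studies, seed=0):
    """Simulate n_studies two-group studies and return their P-values."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_studies):
        events_a = sum(rng.random() < rate_a for _ in range(n))
        events_b = sum(rng.random() < rate_b for _ in range(n))
        p_values.append(two_proportion_p_value(events_a, events_b, n))
    return p_values


# Illustrative values only: true event rates 0.5 vs 0.3, 50 per group.
p_values = simulate_p_values(rate_a=0.5, rate_b=0.3, n=50, n_studies=2000)
print("smallest P:", min(p_values), "largest P:", max(p_values))
```

Even with a fixed true difference, repeated studies of this size yield P-values spread over almost the whole unit interval, which is the between-study variability the illustration is designed to expose.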

RESULTS

The simulations demonstrated the large between-study variability in P-values (range between <.0001 and 1 for most simulation conditions). A focus on statistically significant effects (P < .05), notably in small to moderate samples, led to strongly overestimated effect sizes (up to 240%) and many false-positive conclusions, that is, statistically significant effects that were, in fact, true null effects. Effect sizes also showed strong between-study variability, but confidence intervals accounted for this: the interval width decreased with larger sample size, and the percentage of intervals that contained the true effect size was accurate across simulation conditions. Reducing the alpha level, as recently suggested, reduced false-positive conclusions but strongly increased the overestimation of significant effects (up to 320%).
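The overestimation among significant results and the behaviour of confidence intervals can be reproduced qualitatively with a small simulation. This is a sketch under stated assumptions, not the paper's code: it uses a pooled z-test for the P-value and a Wald 95% interval for the risk difference (neither is specified in the abstract), and the event rates (0.4 vs 0.3), group size, and the `summarise` helper are hypothetical.

```python
import math
import random


def simulate_study(rate_a, rate_b, n, rng):
    """Return (estimated difference, P-value, CI lower, CI upper) for one study."""
    pa = sum(rng.random() < rate_a for _ in range(n)) / n
    pb = sum(rng.random() < rate_b for _ in range(n)) / n
    diff = pa - pb
    # Pooled z-test for the P-value (assumed test).
    pooled = (pa + pb) / 2
    se_test = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se_test == 0:
        p_value = 1.0
    else:
        p_value = math.erfc(abs(diff) / se_test / math.sqrt(2))
    # Wald 95% confidence interval for the difference (assumed interval).
    se_ci = math.sqrt(pa * (1 - pa) / n + pb * (1 - pb) / n)
    return diff, p_value, diff - 1.96 * se_ci, diff + 1.96 * se_ci


def summarise(rate_a, rate_b, n, n_studies=5000, alpha=0.05, seed=1):
    """Summarise all studies versus only the 'significant' ones."""
    rng = random.Random(seed)
    true_diff = rate_a - rate_b
    studies = [simulate_study(rate_a, rate_b, n, rng) for _ in range(n_studies)]
    all_diffs = [d for d, _, _, _ in studies]
    sig_diffs = [d for d, p, _, _ in studies if p < alpha]
    covered = sum(lo <= true_diff <= hi for _, _, lo, hi in studies)
    return {
        "true_diff": true_diff,
        "mean_estimate_all": sum(all_diffs) / len(all_diffs),
        "mean_estimate_significant": (
            sum(sig_diffs) / len(sig_diffs) if sig_diffs else float("nan")
        ),
        "share_significant": len(sig_diffs) / n_studies,
        "ci_coverage": covered / n_studies,
    }


# Same simulated studies (same seed), two significance thresholds.
at_05 = summarise(rate_a=0.4, rate_b=0.3, n=50, alpha=0.05)
at_005 = summarise(rate_a=0.4, rate_b=0.3, n=50, alpha=0.005)
for label, result in (("alpha=0.05", at_05), ("alpha=0.005", at_005)):
    print(label, result)
```

Averaged over all studies, the estimate sits close to the true difference, whereas the average over significant studies alone is clearly larger, and larger still under the stricter threshold. Interval coverage, by contrast, does not depend on which studies happen to cross the threshold.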

CONCLUSIONS

Researchers and journals should abandon statistical significance as a pivotal element in most scientific publications. Confidence intervals around effect sizes are more informative, but should not merely be reported to comply with journal requirements.
