
The continuing misuse of null hypothesis significance testing in biological anthropology.

Affiliation

Department of Anthropology, Washington University in St. Louis, St. Louis, MO, 63130.

Publication information

Am J Phys Anthropol. 2018 May;166(1):236-245. doi: 10.1002/ajpa.23399. Epub 2018 Jan 18.

DOI: 10.1002/ajpa.23399
PMID: 29345299

Abstract

There is over 60 years of discussion in the statistical literature concerning the misuse and limitations of null hypothesis significance tests (NHST). Based on the prevalence of NHST in biological anthropology research, it appears that the discipline generally is unaware of these concerns. The p values used in NHST usually are interpreted incorrectly. A p value indicates the probability of the data given the null hypothesis. It should not be interpreted as the probability that the null hypothesis is true or as evidence for or against any specific alternative to the null hypothesis. P values are a function of both the sample size and the effect size, and therefore do not indicate whether the effect observed in the study is important, large, or small. P values have poor replicability in repeated experiments. The distribution of p values is continuous and varies from 0 to 1.0. The use of a cut-off, generally p ≤ 0.05, to separate significant from nonsignificant results, is an arbitrary dichotomization of continuous variation. In 2016, the American Statistical Association issued a statement of principles regarding the misinterpretation of NHST, the first time it has done so regarding a specific statistical procedure in its 180-year history. Effect sizes and confidence intervals, which can be calculated for any data used to calculate p values, provide more and better information about tested hypotheses than p values and NHST.
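
The abstract's point that p values depend on both sample size and effect size can be illustrated with a minimal sketch. It is not taken from the paper. It assumes a simplified setting: two groups of equal size, known unit variance, and an observed standardized mean difference of exactly `d`. The function names (`z_for_effect`, `two_sided_p_from_z`, `ci_for_d`, `summarize`) are hypothetical and chosen only for this example.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_for_effect(d: float, n_per_group: int) -> float:
    """z statistic when the observed standardized difference is exactly d.

    Assumption (for illustration only): two groups of equal size with
    known unit variance, so the standard error of d is sqrt(2 / n).
    """
    return d * math.sqrt(n_per_group / 2)


def ci_for_d(d: float, n_per_group: int, z_crit: float = 1.959964):
    """Approximate 95% confidence interval for d under the same assumptions."""
    se = math.sqrt(2 / n_per_group)
    return d - z_crit * se, d + z_crit * se


def summarize(d: float, sizes):
    """Return (n, p, ci_low, ci_high) for the same effect size at each sample size."""
    rows = []
    for n in sizes:
        p = two_sided_p_from_z(z_for_effect(d, n))
        low, high = ci_for_d(d, n)
        rows.append((n, p, low, high))
    return rows


if __name__ == "__main__":
    # Same effect size (d = 0.2) at three hypothetical sample sizes.
    for n, p, low, high in summarize(0.2, [20, 100, 500]):
        print(f"n per group = {n:4d}  p = {p:.4f}  95% CI for d = ({low:.3f}, {high:.3f})")
```

Under these assumptions the effect size is held fixed and only the sample size changes, yet the p value moves from well above 0.05 to well below it. The confidence interval reports the same estimate each time and narrows as n grows, which is the kind of information the abstract argues a p value alone does not convey.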

Similar articles

1
The continuing misuse of null hypothesis significance testing in biological anthropology.
Am J Phys Anthropol. 2018 May;166(1):236-245. doi: 10.1002/ajpa.23399. Epub 2018 Jan 18.
2
P > .05: The incorrect interpretation of "not significant" results is a significant problem.
Am J Phys Anthropol. 2020 Aug;172(4):521-527. doi: 10.1002/ajpa.24092. Epub 2020 Jun 22.
3
Statistics in ophthalmology revisited: the (effect) size matters.
Acta Ophthalmol. 2018 Nov;96(7):e885-e888. doi: 10.1111/aos.13756. Epub 2018 Sep 5.
4
A review of issues about null hypothesis Bayesian testing.
Psychol Methods. 2019 Dec;24(6):774-795. doi: 10.1037/met0000221. Epub 2019 May 16.
5
In support of null hypothesis significance testing.
Proc Biol Sci. 2004 Feb 7;271 Suppl 3(Suppl 3):S82-4. doi: 10.1098/rsbl.2003.0105.
6
Moving Beyond p < 0.05 in Ecotoxicology: A Guide for Practitioners.
Environ Toxicol Chem. 2020 Sep;39(9):1657-1669. doi: 10.1002/etc.4800.
7
Misconceptions, Misuses, and Misinterpretations of P Values and Significance Testing.
J Bone Joint Surg Am. 2017 Sep 20;99(18):1598-1603. doi: 10.2106/JBJS.16.01314.
8
Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: an integrated alternative method of conducting null hypothesis statistical tests.
Psychol Methods. 2001 Dec;6(4):371-86.
9
The controversy of significance testing: misconceptions and alternatives.
Am J Crit Care. 1999 Sep;8(5):291-6.
10
Moving college health research forward: Reconsidering our reliance on statistical significance testing.
J Am Coll Health. 2019 Apr;67(3):181-188. doi: 10.1080/07448481.2018.1470091. Epub 2018 Sep 19.

Cited by

1
Exercise performance in well-trained male mice is promoted by intermittent hyperoxia via improving metabolic properties and capillary profiles.
Physiol Rep. 2025 Apr;13(8):e70341. doi: 10.14814/phy2.70341.
2
Ancient Egyptian scribes and specific skeletal occupational risk markers (Abusir, Old Kingdom).
Sci Rep. 2024 Jun 27;14(1):13317. doi: 10.1038/s41598-024-63549-z.
3
Interactions with alloparents are associated with the diversity of infant skin and fecal bacterial communities in Chicago, United States.
Am J Hum Biol. 2025 Jan;37(1):e23972. doi: 10.1002/ajhb.23972. Epub 2023 Aug 26.
4
Formal models for the study of the relationship between fluctuating asymmetry and fitness in humans.
Am J Biol Anthropol. 2022 Sep;179(1):73-84. doi: 10.1002/ajpa.24588. Epub 2022 Jul 21.
5
Womb to womb: Maternal litter size and birth weight but not adult characteristics predict early neonatal death of offspring in the common marmoset monkey.
PLoS One. 2021 Jun 9;16(6):e0252093. doi: 10.1371/journal.pone.0252093. eCollection 2021.
6
Integrating buccal and occlusal dental microwear with isotope analyses for a complete paleodietary reconstruction of Holocene populations from Hungary.
Sci Rep. 2021 Mar 29;11(1):7034. doi: 10.1038/s41598-021-86369-x.
7
Testing lipid markers as predictors of all-cause morbidity, cardiac disease, and mortality risk in captive western lowland gorillas ().
Primate Biol. 2020 Dec 17;7(2):41-59. doi: 10.5194/pb-7-41-2020. eCollection 2020.
8
Bolstering geometric morphometrics sample sizes with damaged and pathologic specimens: Is near enough good enough?
J Anat. 2021 Jun;238(6):1444-1455. doi: 10.1111/joa.13390. Epub 2021 Jan 9.
9
Beyond statistical significance.
J Indian Prosthodont Soc. 2019 Jul-Sep;19(3):201-202. doi: 10.4103/jips.jips_207_19.
10
The exceptional abandonment of metal tools by North American hunter-gatherers, 3000 B.P.
Sci Rep. 2019 Apr 8;9(1):5756. doi: 10.1038/s41598-019-42185-y.