Error Rates, Decisive Outcomes and Publication Bias with Several Inferential Methods.

Author information

Hopkins Will G, Batterham Alan M

Affiliations

Institute of Sport Exercise and Active Living, Victoria University, Melbourne, VIC, Australia.

Health and Social Care Institute, Teesside University, Middlesbrough, UK.

Publication information

Sports Med. 2016 Oct;46(10):1563-73. doi: 10.1007/s40279-016-0517-x.

DOI: 10.1007/s40279-016-0517-x

PMID: 26971328

Abstract

BACKGROUND

Statistical methods for inferring the true magnitude of an effect from a sample should have acceptable error rates when the true effect is trivial (type I rates) or substantial (type II rates).

OBJECTIVE

The objective of this study was to quantify the error rates, rates of decisive (publishable) outcomes and publication bias of five inferential methods commonly used in sports medicine and science. The methods were conventional null-hypothesis significance testing [NHST] (significant and non-significant imply substantial and trivial true effects, respectively); conservative NHST (the observed magnitude is interpreted as the true magnitude only for significant effects); non-clinical magnitude-based inference [MBI] (the true magnitude is interpreted as the magnitude range of the 90% confidence interval only for intervals not spanning substantial values of the opposite sign); clinical MBI (a possibly beneficial effect is recommended for implementation only if it is most unlikely to be harmful); and odds-ratio clinical MBI (implementation is also recommended when the odds of benefit outweigh the odds of harm, with an odds ratio >66).
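The MBI decision rules described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes a normal approximation to the sampling distribution, a smallest important standardized effect of 0.2, and the usual MBI probability scale in which "most unlikely" means below 0.5% and "possibly" means at least 25%; none of these numbers is stated in the abstract, and all function names are hypothetical. Only the odds-ratio cut-off of 66 comes from the text.

```python
from statistics import NormalDist

# Assumed smallest important standardized effect (not stated in the abstract).
SMALLEST_IMPORTANT = 0.2


def chances(d_obs, se, threshold=SMALLEST_IMPORTANT):
    """Return (p_harm, p_trivial, p_benefit) under a normal approximation."""
    dist = NormalDist(mu=d_obs, sigma=se)
    p_harm = dist.cdf(-threshold)
    p_benefit = 1.0 - dist.cdf(threshold)
    return p_harm, 1.0 - p_harm - p_benefit, p_benefit


def nonclinical_mbi_is_clear(d_obs, se, threshold=SMALLEST_IMPORTANT):
    """Clear unless the 90% interval spans substantial values of both signs."""
    z90 = NormalDist().inv_cdf(0.95)
    lower, upper = d_obs - z90 * se, d_obs + z90 * se
    return not (lower < -threshold and upper > threshold)


def clinical_mbi_implement(d_obs, se, odds_ratio_rule=False,
                           max_harm=0.005, min_benefit=0.25):
    """Recommend implementation of an at least possibly beneficial effect.

    max_harm and min_benefit are assumed values for "most unlikely harmful"
    and "possibly beneficial"; the cut-off of 66 is the one in the abstract.
    """
    p_harm, _, p_benefit = chances(d_obs, se)
    if p_benefit < min_benefit:
        return False
    if p_harm < max_harm:
        return True
    if odds_ratio_rule:
        odds_benefit = p_benefit / (1.0 - p_benefit)
        odds_harm = p_harm / (1.0 - p_harm)
        return odds_benefit / odds_harm > 66
    return False
```

For example, with an observed effect of 0.4 and a standard error of 0.25, the chance of harm is a little under 1%, so the strict clinical rule declines to recommend implementation, while the odds-ratio variant recommends it because the odds of benefit exceed the odds of harm by far more than 66.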

METHODS

Simulation was used to quantify standardized mean effects in 500,000 randomized, controlled trials each for true standardized magnitudes ranging from null through marginally moderate with three sample sizes: suboptimal (10 + 10), optimal for MBI (50 + 50) and optimal for NHST (144 + 144).
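A scaled-down version of this kind of simulation might look like the sketch below. It is an illustration under stated assumptions, not the study's procedure: outcomes are drawn from unit-variance normal distributions, the standard error of the standardized effect is approximated as sqrt(2/n), normal rather than t quantiles are used, the threshold of 0.2 for a substantial effect is assumed, and the trial count is far smaller than 500,000 so that it runs quickly. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

THRESHOLD = 0.2                      # assumed smallest important effect
Z90 = NormalDist().inv_cdf(0.95)     # multiplier for a 90% interval
Z95 = NormalDist().inv_cdf(0.975)    # two-sided 5% critical value


def simulate_trial(n_per_group, true_effect, rng):
    """Simulate one two-group trial; return (standardized effect, its SE)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    pooled_sd = math.sqrt((stdev(control) ** 2 + stdev(treated) ** 2) / 2)
    d_obs = (mean(treated) - mean(control)) / pooled_sd
    se = math.sqrt(2.0 / n_per_group)  # approximation ignoring SD uncertainty
    return d_obs, se


def outcome_rates(n_per_group, true_effect, n_trials=2000, seed=1):
    """Estimate rates of significant, clear and clearly substantial outcomes."""
    rng = random.Random(seed)
    significant = clear = substantial = 0
    for _ in range(n_trials):
        d_obs, se = simulate_trial(n_per_group, true_effect, rng)
        lower, upper = d_obs - Z90 * se, d_obs + Z90 * se
        significant += abs(d_obs / se) > Z95
        clear += not (lower < -THRESHOLD and upper > THRESHOLD)
        substantial += lower > THRESHOLD or upper < -THRESHOLD
    return {
        "significant": significant / n_trials,
        "mbi_clear": clear / n_trials,
        "mbi_substantial": substantial / n_trials,
    }


if __name__ == "__main__":
    for n in (10, 50, 144):
        print(n, outcome_rates(n, true_effect=0.0))
```

With a true effect of zero, the "significant" and "mbi_substantial" rates play the role of type I error rates for conventional NHST and non-clinical MBI respectively, under the assumed definitions; repeating the call over a grid of true effects would trace out type II rates in the same way.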

RESULTS

Type I rates for non-clinical MBI were always lower than for NHST. When type I rates for clinical MBI were higher, most errors were debatable, given the probabilistic qualification of those inferences (unlikely or possibly beneficial). NHST often had unacceptable rates for either type II errors or decisive outcomes, and it had substantial publication bias with the smallest sample size, whereas MBI had no such problems.

CONCLUSION

MBI is a trustworthy, nuanced alternative to NHST, which it outperforms in terms of sample size, error rates, decision rates and publication bias.

