Hopkins Will G, Batterham Alan M
Institute of Sport, Exercise and Active Living, Victoria University, Melbourne, VIC, Australia.
Health and Social Care Institute, Teesside University, Middlesbrough, UK.
Sports Med. 2016 Oct;46(10):1563-73. doi: 10.1007/s40279-016-0517-x.
Statistical methods for inferring the true magnitude of an effect from a sample should have acceptable error rates when the true effect is trivial (type I rates) or substantial (type II rates).
The objective of this study was to quantify the error rates, rates of decisive (publishable) outcomes and publication bias of five inferential methods commonly used in sports medicine and science. The methods were conventional null-hypothesis significance testing [NHST] (significant and non-significant imply substantial and trivial true effects, respectively); conservative NHST (the observed magnitude is interpreted as the true magnitude only for significant effects); non-clinical magnitude-based inference [MBI] (the true magnitude is interpreted as the magnitude range of the 90% confidence interval only for intervals that do not span substantial values of both signs); clinical MBI (a possibly beneficial effect is recommended for implementation only if it is most unlikely to be harmful); and odds-ratio clinical MBI (implementation is also recommended when the odds of benefit outweigh the odds of harm by an odds ratio >66).
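The five decision rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a smallest important standardized effect of 0.2, positive effects being beneficial, normal rather than t sampling distributions, a 95% interval (two-sided p < 0.05) for NHST, and MBI probability thresholds of >25% for "possibly beneficial" and <0.5% for "most unlikely harmful". Applying the odds-ratio rule only to possibly beneficial effects is also an assumption of this sketch, and all function names are hypothetical. Under the two assumed thresholds, (0.25/0.75)/(0.005/0.995) ≈ 66.3, which appears to be where the >66 criterion comes from.

```python
from statistics import NormalDist

# Hypothetical default: smallest important standardized effect (0.2 SD).
SMALLEST = 0.2


def magnitude(d, smallest=SMALLEST):
    """Label a standardized effect as trivial or substantial (with its sign)."""
    if d >= smallest:
        return "substantial +"
    if d <= -smallest:
        return "substantial -"
    return "trivial"


def interval(effect, se, level):
    """Normal-approximation confidence interval (a t distribution may be used in practice)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return effect - z * se, effect + z * se


def is_significant(effect, se):
    """Two-sided p < 0.05, i.e. the 95% interval excludes zero."""
    lower, upper = interval(effect, se, 0.95)
    return lower > 0 or upper < 0


def conventional_nhst(effect, se):
    # Significant -> substantial; non-significant -> trivial.
    return "substantial" if is_significant(effect, se) else "trivial"


def conservative_nhst(effect, se):
    # The observed magnitude is taken as the true magnitude only when significant.
    return magnitude(effect) if is_significant(effect, se) else "no conclusion"


def nonclinical_mbi(effect, se, smallest=SMALLEST):
    lower, upper = interval(effect, se, 0.90)
    if lower < -smallest and upper > smallest:  # spans substantial values of both signs
        return "unclear"
    # Otherwise the true magnitude is the range covered by the 90% interval.
    return magnitude(lower, smallest), magnitude(upper, smallest)


def odds(p):
    return p / (1 - p)


def clinical_mbi(effect, se, use_odds_ratio=False, smallest=SMALLEST):
    """Return True if implementation would be recommended (positive = beneficial)."""
    sampling = NormalDist(effect, se)
    benefit = 1 - sampling.cdf(smallest)  # chance the true effect is beneficial
    harm = sampling.cdf(-smallest)        # chance the true effect is harmful
    possibly_beneficial = benefit > 0.25  # assumed threshold
    if possibly_beneficial and harm < 0.005:  # assumed "most unlikely" harmful
        return True
    if use_odds_ratio and possibly_beneficial:
        # harm >= 0.005 here, so both odds are finite and nonzero.
        return odds(benefit) / odds(harm) > 66
    return False
```

For example, with an observed effect of 0.25 and a standard error of 0.2, the chance of harm is about 1.2%, so `clinical_mbi(0.25, 0.2)` does not recommend implementation, whereas the odds-ratio variant does (odds ratio ≈ 120).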
Simulation was used to quantify standardized mean effects in 500,000 randomized controlled trials for each true standardized magnitude, ranging from null through marginally moderate, at each of three sample sizes: suboptimal (10 + 10), optimal for MBI (50 + 50) and optimal for NHST (144 + 144).
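A scaled-down version of such a simulation is sketched below in Python. It is an assumption-laden illustration rather than the authors' simulation: outcomes are normal with SD 1 in both groups, the standardized effect uses the pooled sample SD, its standard error is approximated as sqrt(2/n), normal critical values replace t values (which slightly inflates the significance rate at 10 + 10), the smallest important effect is taken as 0.2, and far fewer than 500,000 trials are run. `simulate_trial` and `summarize` are hypothetical names. Besides the rates of significant (NHST) and clear (non-clinical MBI) outcomes, it reports the mean observed effect among significant trials, one simple way to see the publication bias that would arise if only significant results were published.

```python
import math
import random
from statistics import NormalDist, fmean, variance

SMALLEST = 0.2  # assumed smallest important standardized effect (hypothetical default)


def simulate_trial(true_d, n, rng):
    """One hypothetical parallel-group trial (n + n), outcome SD = 1 in both groups.

    Returns the observed standardized effect d and an approximate standard error
    that ignores the uncertainty in the estimated SD.
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_d, 1.0) for _ in range(n)]
    pooled_sd = math.sqrt((variance(control) + variance(treated)) / 2)
    d = (fmean(treated) - fmean(control)) / pooled_sd
    return d, math.sqrt(2 / n)


def summarize(true_d, n, trials=5_000, seed=1):
    """Rates of NHST significance and non-clinical MBI clear outcomes, plus the
    mean observed effect among significant trials."""
    rng = random.Random(seed)
    z95 = NormalDist().inv_cdf(0.975)  # two-sided p < 0.05
    z90 = NormalDist().inv_cdf(0.95)   # 90% confidence interval
    significant, clear = [], 0
    for _ in range(trials):
        d, se = simulate_trial(true_d, n, rng)
        if abs(d) > z95 * se:
            significant.append(d)
        lower, upper = d - z90 * se, d + z90 * se
        # Clear unless the interval spans substantial values of both signs.
        if not (lower < -SMALLEST and upper > SMALLEST):
            clear += 1
    return {
        "significant_rate": len(significant) / trials,
        "mbi_clear_rate": clear / trials,
        "mean_significant_d": fmean(significant) if significant else float("nan"),
    }
```

For example, `summarize(0.0, 144)` gives a significance rate near 5% (the conventional NHST type I rate for a null effect), and in this sketch the 90% interval at 144 + 144 is too narrow ever to span both -0.2 and +0.2, so every outcome is clear. With a true effect of 0.2 and 10 + 10, the mean standardized effect among significant trials is several times the true value.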
Type I rates for non-clinical MBI were always lower than those for NHST. When type I rates for clinical MBI were higher than those for NHST, most errors were debatable, given the probabilistic qualification of those inferences (unlikely or possibly beneficial). NHST often had unacceptable rates of either type II errors or decisive outcomes, and it had substantial publication bias with the smallest sample size, whereas MBI had no such problems.
MBI is a trustworthy, nuanced alternative to NHST, which it outperforms in terms of required sample size, error rates, decision rates and publication bias.