Department of Health, Kinesiology, and Recreation, University of Utah, Salt Lake City, Utah, United States of America.
Department of Physical Therapy and Athletic Training, University of Utah, Salt Lake City, Utah, United States of America.
PLoS One. 2020 Jun 26;15(6):e0235318. doi: 10.1371/journal.pone.0235318. eCollection 2020.
Magnitude-based inference (MBI) is a controversial statistical method that has been used in hundreds of papers in sports science despite criticism from statisticians. To better understand how this method has been applied in practice, we systematically reviewed 232 papers that used MBI. We extracted data on study design, sample size, and choice of MBI settings and parameters. Median sample size was 10 per group (interquartile range, IQR: 8-15) for multi-group studies and 14 (IQR: 10-24) for single-group studies; few studies reported a priori sample size calculations (15%). Authors predominantly applied MBI's default settings and chose "mechanistic/non-clinical" rather than "clinical" MBI even when testing clinical interventions (only 16 studies out of 232 used clinical MBI). Using these data, we can estimate the Type I error rates for the typical MBI study. Authors frequently made dichotomous claims about effects based on the MBI criterion of a "likely" effect and sometimes based on the MBI criterion of a "possible" effect. When the sample size is n = 8 to 15 per group, these inferences have Type I error rates of 12%-22% and 22%-45%, respectively. High Type I error rates were compounded by multiple testing: Authors reported results from a median of 30 tests related to outcomes; and few studies specified a primary outcome (14%). We conclude that MBI has promoted small studies, promulgated a "black box" approach to statistics, and led to numerous papers where the conclusions are not supported by the data. Amidst debates over the role of p-values and significance testing in science, MBI also provides an important natural experiment: we find no evidence that moving researchers away from p-values or null hypothesis significance testing makes them less prone to dichotomization or over-interpretation of findings.
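The Type I error rates quoted above can be explored with a small Monte Carlo sketch. What follows is a minimal illustration, not the review's own calculation: it assumes a two-group parallel design with a true effect of exactly zero, a smallest worthwhile change of 0.2 observed pooled standard deviations, the "mechanistic" rule that a result is unclear when the chances of a positive and a negative effect both exceed 5%, and thresholds of 75% for "likely" and 25% for "possible". All function and parameter names (`t_cdf`, `mbi_chances`, `claims_effect`, `simulate_type1`, `swc_sd`) are hypothetical. The output need not match the abstract's figures, which depend on design details not given here.

```python
import math
import random
import statistics


def t_cdf(x, df, steps=200):
    """Student-t CDF by Simpson's rule on the density (stdlib only; steps must be even)."""
    if x == 0:
        return 0.5
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = min(total * h / 3, 0.5)  # integral from 0 to |x|, clamped
    return 0.5 + math.copysign(area, x)


def mbi_chances(diff, se, df, swc):
    """Chances that the true effect is above +swc or below -swc (assumed MBI-style)."""
    p_pos = 1 - t_cdf((swc - diff) / se, df)
    p_neg = t_cdf((-swc - diff) / se, df)
    return p_pos, p_neg


def claims_effect(p_pos, p_neg, threshold):
    """Assumed mechanistic rule: unclear if both chances exceed 5%, else apply threshold."""
    if p_pos > 0.05 and p_neg > 0.05:
        return False
    return max(p_pos, p_neg) >= threshold


def simulate_type1(n_per_group, n_sims=2000, swc_sd=0.2, seed=1):
    """Share of null simulations yielding a 'likely' or a 'possible' claim."""
    rng = random.Random(seed)
    df = 2 * n_per_group - 2
    likely = possible = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
        se = pooled_sd * math.sqrt(2 / n_per_group)
        diff = statistics.fmean(a) - statistics.fmean(b)
        p_pos, p_neg = mbi_chances(diff, se, df, swc_sd * pooled_sd)
        likely += claims_effect(p_pos, p_neg, 0.75)
        possible += claims_effect(p_pos, p_neg, 0.25)
    return likely / n_sims, possible / n_sims


if __name__ == "__main__":
    for n in (8, 10, 15):
        likely_rate, possible_rate = simulate_type1(n)
        print(f"n={n}: 'likely' {likely_rate:.3f}, 'possible' {possible_rate:.3f}")
```

Because both thresholds are applied to the same simulated data sets, the "possible" rate can never fall below the "likely" rate in this sketch. Changing `swc_sd`, the thresholds, or the design (for example, to a pre-post design) would change the estimates.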