Department of Mathematics, University of Siegen, Walter-Flex-Str. 3, Siegen, Germany.
BMC Med Res Methodol. 2021 Aug 17;21(1):171. doi: 10.1186/s12874-021-01341-7.
Null hypothesis significance testing (NHST) is among the most frequently employed methods in the biomedical sciences. However, the problems of NHST and p-values have been discussed widely and various Bayesian alternatives have been proposed. Some proposals focus on equivalence testing, which aims at testing an interval hypothesis instead of a precise hypothesis. An interval hypothesis includes a small range of parameter values instead of a single null value and the idea goes back to Hodges and Lehmann. As researchers can always expect to observe some (although often negligibly small) effect size, interval hypotheses are more realistic for biomedical research. However, the selection of an equivalence region (the interval boundaries) often seems arbitrary and several Bayesian approaches to equivalence testing coexist.
A new method is proposed for determining the equivalence region for Bayesian equivalence tests based on objective criteria such as the type I error rate and power. Existing approaches to Bayesian equivalence testing in the two-sample setting are discussed, with a focus on the Bayes factor and the region of practical equivalence (ROPE). A simulation study derives the results needed to apply the new method in the two-sample setting, which is among the most frequently used designs in biomedical research.
Bayesian Hodges-Lehmann tests for statistical equivalence differ in their sensitivity to the prior modeling, in their power, and in the associated type I error rates. The relationship between type I error rates, power, and sample sizes for existing Bayesian equivalence tests is identified in the two-sample setting. The results allow the equivalence region to be determined with the new method by incorporating these objective criteria. Importantly, the results show not only that prior selection can influence the type I error rate and power, but also that the relationship is reversed between the Bayes factor and ROPE-based equivalence tests.
Based on the results, researchers can select between the existing Bayesian Hodges-Lehmann tests for statistical equivalence and determine the equivalence region based on objective criteria, thus improving the reproducibility of biomedical research.
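To make the idea of tying the equivalence region to operating characteristics concrete, the following is a minimal illustrative sketch and not the simulation design of the paper. It assumes a flat prior with a large-sample normal approximation to the posterior of the mean difference, unit-variance groups (so the raw difference equals the standardized effect size), and a hypothetical decision rule that declares equivalence when at least 95% of the posterior mass lies inside a symmetric ROPE. The function names `rope_posterior_mass` and `equivalence_rate` are made up for this example.

```python
import random
from statistics import NormalDist, fmean, variance


def rope_posterior_mass(x, y, rope_half_width):
    """Approximate posterior probability that the mean difference lies in the ROPE.

    Assumption (not from the paper): flat prior and large-sample normal
    approximation, so the posterior of the difference is
    N(mean(x) - mean(y), var(x)/len(x) + var(y)/len(y)).
    """
    diff = fmean(x) - fmean(y)
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    posterior = NormalDist(diff, se)
    return posterior.cdf(rope_half_width) - posterior.cdf(-rope_half_width)


def equivalence_rate(true_diff, n, rope_half_width,
                     threshold=0.95, reps=2000, seed=1):
    """Monte Carlo estimate of how often equivalence is declared.

    Two groups of size n are drawn from N(true_diff, 1) and N(0, 1).
    Equivalence is declared when the posterior mass inside
    [-rope_half_width, rope_half_width] reaches `threshold`
    (a hypothetical rule chosen for illustration).
    """
    rng = random.Random(seed)
    declared = 0
    for _ in range(reps):
        x = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if rope_posterior_mass(x, y, rope_half_width) >= threshold:
            declared += 1
    return declared / reps


if __name__ == "__main__":
    # Scan candidate ROPE half-widths and report decision rates when the
    # true difference is zero and when it is clearly non-zero.
    for half_width in (0.1, 0.2, 0.3, 0.5):
        at_zero = equivalence_rate(0.0, 100, half_width, reps=500)
        at_half = equivalence_rate(0.5, 100, half_width, reps=500)
        print(half_width, at_zero, at_half)
```

Scanning the half-width in this way shows how a chosen ROPE trades off the rate of correctly declaring equivalence against the rate of declaring it when a real effect exists; a researcher could then pick the narrowest region that meets prespecified error-rate and power targets for a given sample size.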