Unit of Psychometrics and Statistics, Department of Psychology, Faculty of Behavioural and Social Sciences, University of Groningen.
Unit of Psychological Methods, Department of Psychology, Faculty of Social and Behavioural Sciences, University of Amsterdam.
Psychol Methods. 2023 Jun;28(3):740-755. doi: 10.1037/met0000402. Epub 2021 Nov 4.
Some important research questions require the ability to find evidence for two conditions being practically equivalent. This is impossible to accomplish within the traditional frequentist null hypothesis significance testing framework; hence, other methodologies must be utilized. We explain and illustrate three approaches for finding evidence for equivalence: The frequentist two one-sided tests procedure, the Bayesian highest density interval region of practical equivalence procedure, and the Bayes factor interval null procedure. We compare the classification performances of these three approaches for various plausible scenarios. The results indicate that the Bayes factor interval null approach compares favorably to the other two approaches in terms of statistical power. Critically, compared with the Bayes factor interval null procedure, the two one-sided tests and the highest density interval region of practical equivalence procedures have limited discrimination capabilities when the sample size is relatively small: Specifically, in order to be practically useful, these two methods generally require over 250 cases within each condition when rather large equivalence margins of approximately .2 or .3 are used; for smaller equivalence margins even more cases are required. Because of these results, we recommend that researchers rely more on the Bayes factor interval null approach for quantifying evidence for equivalence, especially for studies that are constrained on sample size. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
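As an illustration of the first of the three approaches, the two one-sided tests (TOST) logic can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a large-sample normal approximation in place of the t distribution that TOST is usually based on, it expresses the equivalence margin in raw units of the mean difference, and the function name `tost_z` is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def tost_z(x, y, margin):
    """Two one-sided tests for equivalence of two independent means.

    Assumption of this sketch: a large-sample normal approximation is used
    instead of the t distribution. `margin` is the equivalence margin in raw
    units of the mean difference. Returns the TOST p-value, i.e. the larger
    of the two one-sided p-values; equivalence is claimed when it is below
    the chosen alpha.
    """
    diff = mean(x) - mean(y)
    # Standard error of the difference (unpooled variances)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    std_normal = NormalDist()
    # H0a: diff <= -margin; small p when diff lies well above -margin
    p_lower = 1.0 - std_normal.cdf((diff + margin) / se)
    # H0b: diff >= +margin; small p when diff lies well below +margin
    p_upper = std_normal.cdf((diff - margin) / se)
    return max(p_lower, p_upper)


if __name__ == "__main__":
    random.seed(1)
    # Simulated data with no true difference between conditions
    for n in (20, 250):
        x = [random.gauss(0.0, 1.0) for _ in range(n)]
        y = [random.gauss(0.0, 1.0) for _ in range(n)]
        print(n, round(tost_z(x, y, margin=0.3), 4))
```

Both one-sided null hypotheses must be rejected before equivalence is claimed, which is why the function returns the larger p-value. With few cases per condition the standard error is wide relative to the margin, so the procedure rarely reaches significance even when the true difference is zero, which is consistent with the sample-size limitation the abstract describes.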