UCAM Universidad Católica de Murcia.
Psicothema. 2018 Feb;30(1):110-115. doi: 10.7334/psicothema2017.308.
The p-value remains one of the key elements of statistical hypothesis testing despite the criticism it has received. Bayesian statistics and Bayes factors have been proposed as alternatives that could improve scientific decision-making when testing hypotheses. This study compares the performance of two Bayes factor estimates (the BIC-based Bayes factor and the Vovk-Sellke p-value calibration) with that of the p-value when the null hypothesis is true.
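For reference, the two estimates can be written in closed form. This is a sketch under assumptions the abstract does not spell out: a pooled-variance two-sample t-test with statistic t, total sample size N = n1 + n2 and N - 2 degrees of freedom; the usual BIC approximation BF01 ≈ exp((BIC1 - BIC0)/2) with BIC = N ln(SSE/N) + k ln N, where the alternative adds one mean parameter; and the standard form of the Vovk-Sellke calibration, which gives a lower bound on BF01 (equivalently, an upper bound on the evidence for the alternative).

```latex
\begin{aligned}
\mathrm{BIC}_1 - \mathrm{BIC}_0 &= N \ln\frac{\mathrm{SSE}_1}{\mathrm{SSE}_0} + \ln N,
\qquad \frac{\mathrm{SSE}_0}{\mathrm{SSE}_1} = 1 + \frac{t^2}{N-2},\\
\mathrm{BF}_{01}^{\mathrm{BIC}} &\approx \exp\!\left(\frac{\mathrm{BIC}_1 - \mathrm{BIC}_0}{2}\right)
  = \sqrt{N}\left(1 + \frac{t^2}{N-2}\right)^{-N/2},\\
\mathrm{BF}_{01}^{\mathrm{VS}} &\ge
  \begin{cases}
    -e\, p \ln p, & p < 1/e,\\
    1, & p \ge 1/e.
  \end{cases}
\end{aligned}
```

Read this way, when the null is true t stays moderate, so the BIC version behaves roughly like sqrt(N) exp(-t^2/2) and tends to favour the null more strongly as N grows.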
One million pairs of independent data sets were simulated. All data were drawn from a normal population, and several sample sizes were considered. For each pair, the exact p-value for the comparison of sample means was recorded, together with the two Bayesian alternatives.
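The simulation design can be sketched in plain Python. This is a minimal illustration, not the authors' code, and it rests on assumptions the abstract does not state: a pooled-variance Student t-test, equal group sizes, both groups drawn from N(0, 1), BF01 computed with the BIC approximation and the Vovk-Sellke lower bound, and a hypothetical evidence threshold of BF10 > 3. The function names (`simulate_null`, `bic_bf01`, `vs_min_bf01`, and so on) are illustrative. The exact p-value uses the identity p = I_x(df/2, 1/2) with x = df/(df + t^2), where I is the regularized incomplete beta function, so no third-party packages are needed. It runs far fewer replications than the study's one million.

```python
import math
import random


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def regularized_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Exact two-sided p-value of Student's t with df degrees of freedom."""
    return regularized_beta(df / 2.0, 0.5, df / (df + t * t))


def pooled_t(x, y):
    """Student two-sample t statistic (pooled variance) and its df."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    sse = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    df = n1 + n2 - 2
    se = math.sqrt(sse / df * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se, df


def bic_bf01(t, n1, n2):
    """BIC approximation: BF01 = sqrt(N) * (1 + t^2/df)^(-N/2)."""
    n, df = n1 + n2, n1 + n2 - 2
    return math.exp(0.5 * math.log(n) - 0.5 * n * math.log1p(t * t / df))


def vs_min_bf01(p):
    """Vovk-Sellke lower bound on BF01: -e p ln p for p < 1/e, else 1."""
    if p <= 0.0:
        return 0.0
    return 1.0 if p >= 1.0 / math.e else -math.e * p * math.log(p)


def simulate_null(n_per_group, reps, threshold=3.0, seed=1):
    """Rates at which each measure points to H1 although H0 is true.

    `threshold` is a hypothetical cut-off: BF10 > threshold, i.e.
    BF01 < 1 / threshold. For Vovk-Sellke this uses the upper bound on BF10.
    """
    rng = random.Random(seed)
    hits = {"p<0.05": 0, "BIC BF10>thr": 0, "VS bound BF10>thr": 0}
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        t, df = pooled_t(x, y)
        p = t_two_sided_p(t, df)
        hits["p<0.05"] += int(p < 0.05)
        hits["BIC BF10>thr"] += int(bic_bf01(t, n_per_group, n_per_group) < 1.0 / threshold)
        hits["VS bound BF10>thr"] += int(vs_min_bf01(p) < 1.0 / threshold)
    return {k: v / reps for k, v in hits.items()}


for n in (10, 50, 200):
    print(n, simulate_null(n, reps=2000))
```

Comparing the three rates across sample sizes gives a rough picture of how often each measure would suggest a false discovery when the null holds; the threshold and group sizes here are illustrative choices, not the study's settings.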
Bayes factors perform better than the p-value, favouring the null hypothesis over the alternative. Under the simulated conditions, the BIC-based Bayes factor is more accurate than the p-value calibration, and its performance improves as sample size grows.
Our results show that Bayes factors are good complements in hypothesis testing. The Bayesian alternatives tested here could help researchers avoid claiming false statistical discoveries. We suggest using classical and Bayesian statistics together rather than rejecting either of them.