The frequentist implications of optional stopping on Bayesian hypothesis tests.

Author information

Sanborn Adam N, Hills Thomas T

Affiliation

Department of Psychology, University of Warwick, Coventry, CV4 7AL, UK.

Publication information

Psychon Bull Rev. 2014 Apr;21(2):283-300. doi: 10.3758/s13423-013-0518-9.

DOI:10.3758/s13423-013-0518-9

PMID:24101570

Abstract

Null hypothesis significance testing (NHST) is the most commonly used statistical methodology in psychology. The probability of obtaining a value as extreme as, or more extreme than, the statistic obtained from the data is evaluated, and if it is low enough, the null hypothesis is rejected. However, because common experimental practice often clashes with the assumptions underlying NHST, these calculated probabilities are often incorrect. Most commonly, experimenters use tests that assume that sample sizes are fixed in advance of data collection but then use the data to determine when to stop; in the limit, experimenters can use data monitoring to guarantee that the null hypothesis will be rejected. Bayesian hypothesis testing (BHT) provides a solution to these ills because the stopping rule used is irrelevant to the calculation of a Bayes factor. In addition, there are strong mathematical guarantees on the frequentist properties of BHT that are comforting for researchers concerned that stopping rules could influence the Bayes factors produced. Here, we show that these guaranteed bounds have limited scope and often do not apply in psychological research. Specifically, we quantitatively demonstrate the impact of optional stopping on the resulting Bayes factors in two common situations: (1) when the truth is a combination of the hypotheses, such as in a heterogeneous population, and (2) when a hypothesis is composite (taking multiple parameter values), such as the alternative hypothesis in a t-test. We found that, for these situations, while the Bayesian interpretation remains correct regardless of the stopping rule used, the choice of stopping rule can, in some situations, greatly increase the chance of experimenters finding evidence in the direction they desire. We suggest ways to control these frequentist implications of stopping rules on BHT.
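
The abstract's contrast between NHST with data monitoring and Bayes factors under optional stopping can be illustrated with a small Python simulation. This is a toy sketch under stated assumptions, not the authors' simulation: it assumes normally distributed observations with known unit variance, a point null H0: mu = 0, and a normal prior mu ~ N(0, tau^2) under H1 (the article itself discusses settings such as the t-test). The function names `bf10`, `nhst_rejection_rate`, and `sequential_bf` and all parameter values are hypothetical. One well-known guarantee is that when the data really come from H0, BF10 has expected value 1 at every sample size, so Ville's inequality caps the probability of ever reaching BF10 >= k at 1/k whatever the stopping rule. The symmetric cap on reaching BF01 >= k holds only when mu is drawn from the H1 prior; the last run instead fixes mu at one small value inside the composite H1, where no such cap applies.

```python
import math
import random


def bf10(total, n, tau=1.0):
    """Bayes factor BF10 for x_i ~ N(mu, 1): H1: mu ~ N(0, tau^2) vs H0: mu = 0.

    Assumed toy model (known unit variance, normal prior); the data enter
    only through the sample mean xbar = total / n.
    """
    xbar = total / n
    v0 = 1.0 / n               # variance of xbar under H0
    v1 = tau ** 2 + 1.0 / n    # marginal variance of xbar under H1
    # log of N(xbar; 0, v1) / N(xbar; 0, v0)
    log_bf = 0.5 * math.log(v0 / v1) - 0.5 * xbar ** 2 * (1.0 / v1 - 1.0 / v0)
    return math.exp(log_bf)


def nhst_rejection_rate(sims, n_min, n_max, monitor, rng, crit=1.96):
    """Fraction of null-true experiments in which a two-sided z test rejects.

    monitor=False: test once at n_max (the fixed-sample-size assumption).
    monitor=True: test after every observation from n_min on and stop at
    the first rejection (data-dependent stopping).
    """
    rejections = 0
    for _ in range(sims):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # H0 is true: mu = 0
            peek = (monitor and n >= n_min) or n == n_max
            if peek and abs(total / math.sqrt(n)) > crit:
                rejections += 1
                break
    return rejections / sims


def sequential_bf(sims, mu, n_max, k, rng, tau=1.0):
    """Optional stopping on the Bayes factor when the true mean is mu.

    Adds one observation at a time and stops as soon as BF10 >= k or
    BF10 <= 1/k. Returns (fraction stopped favoring H1,
    fraction stopped favoring H0); the rest reach n_max undecided.
    """
    favor_h1 = favor_h0 = 0
    for _ in range(sims):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(mu, 1.0)
            bf = bf10(total, n, tau)
            if bf >= k:
                favor_h1 += 1
                break
            if bf <= 1.0 / k:
                favor_h0 += 1
                break
    return favor_h1 / sims, favor_h0 / sims


if __name__ == "__main__":
    rng = random.Random(2014)
    fixed = nhst_rejection_rate(2000, 10, 200, monitor=False, rng=rng)
    peeking = nhst_rejection_rate(2000, 10, 200, monitor=True, rng=rng)
    # H0 true: Ville's inequality caps P(ever BF10 >= 3) at 1/3.
    h1_under_null, _ = sequential_bf(2000, mu=0.0, n_max=200, k=3, rng=rng)
    # Truth is one small value inside the composite H1: no 1/3 cap applies.
    _, h0_under_small_effect = sequential_bf(2000, mu=0.1, n_max=200, k=3, rng=rng)
    print(f"NHST, fixed n:       {fixed:.3f}")
    print(f"NHST, peeking:       {peeking:.3f}")
    print(f"BF10 >= 3 under H0:  {h1_under_null:.3f}")
    print(f"BF01 >= 3, mu = 0.1: {h0_under_small_effect:.3f}")
```

Under these assumptions the fixed-n rejection rate should land near the nominal 0.05, while peeking after every observation inflates it substantially; the null-true Bayes factor run respects the 1/3 cap, whereas the small-effect run can stop in favor of H0 more often than 1/3. Exact values depend on the seed, `tau`, `k`, and `n_max`.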

Similar articles

Optional stopping: no problem for Bayesians.

Psychon Bull Rev. 2014 Apr;21(2):301-8. doi: 10.3758/s13423-014-0595-4.

Worked-out examples of the adequacy of Bayesian optional stopping.

Psychon Bull Rev. 2022 Feb;29(1):70-87. doi: 10.3758/s13423-021-01962-5. Epub 2021 Jul 12.

Waldian t tests: Sequential Bayesian t tests with controlled error probabilities.

Psychol Methods. 2024 Feb;29(1):99-116. doi: 10.1037/met0000492. Epub 2022 Apr 14.

Bayesian alternatives to null hypothesis significance testing in biomedical research: a non-technical introduction to Bayesian inference with JASP.

BMC Med Res Methodol. 2020 Jun 5;20(1):142. doi: 10.1186/s12874-020-00980-6.

To P or Not to P: Backing Bayesian Statistics.

Otolaryngol Head Neck Surg. 2017 Dec;157(6):915-918. doi: 10.1177/0194599817739260.

Reply to Rouder (2014): good frequentist properties raise confidence.

Psychon Bull Rev. 2014 Apr;21(2):309-11. doi: 10.3758/s13423-014-0607-4.

Improving Inferences About Null Effects With Bayes Factors and Equivalence Tests.

J Gerontol B Psychol Sci Soc Sci. 2020 Jan 1;75(1):45-57. doi: 10.1093/geronb/gby065.

Sequential hypothesis testing with Bayes factors: Efficiently testing mean differences.

Psychol Methods. 2017 Jun;22(2):322-339. doi: 10.1037/met0000061. Epub 2015 Dec 14.

A Bayesian Analysis of Evidence in Support of the Null Hypothesis in Gerontological Psychology (or Lack Thereof).

J Gerontol B Psychol Sci Soc Sci. 2020 Jan 1;75(1):58-66. doi: 10.1093/geronb/gbz033.

Cited by

Evidence-Based Approaches to Quality Improvement: A Narrative Review of Integrating Bayesian Adaptive Trials Into Health Services.

J Eval Clin Pract. 2025 Aug;31(5):e70197. doi: 10.1111/jep.70197.

Model-averaged Bayesian t tests.

Psychon Bull Rev. 2025 Jun;32(3):1007-1031. doi: 10.3758/s13423-024-02590-5. Epub 2024 Nov 7.

Visual statistical learning requires attention.

Psychon Bull Rev. 2025 Jun;32(3):1240-1253. doi: 10.3758/s13423-024-02605-1. Epub 2024 Nov 4.

A Good check on the Bayes factor.

Behav Res Methods. 2024 Dec;56(8):8552-8566. doi: 10.3758/s13428-024-02491-4. Epub 2024 Sep 4.

The Perils of Misspecified Priors and Optional Stopping in Multi-Armed Bandits.

Front Artif Intell. 2021 Jul 9;4:715690. doi: 10.3389/frai.2021.715690. eCollection 2021.

Early stopping in clinical PET studies: How to reduce expense and exposure.

J Cereb Blood Flow Metab. 2021 Nov;41(11):2805-2819. doi: 10.1177/0271678X211017796. Epub 2021 May 21.

Why optional stopping can be a problem for Bayesians.

Psychon Bull Rev. 2021 Jun;28(3):795-812. doi: 10.3758/s13423-020-01803-x.

Reproducibility in Cognitive Hearing Research: Theoretical Considerations and Their Practical Application in Multi-Lab Studies.

Front Psychol. 2020 Jul 16;11:1590. doi: 10.3389/fpsyg.2020.01590. eCollection 2020.

Moving Sport and Exercise Science Forward: A Call for the Adoption of More Transparent Research Practices.

Sports Med. 2020 Mar;50(3):449-459. doi: 10.1007/s40279-019-01227-1.

Thou Shalt Not Bear False Witness Against Null Hypothesis Significance Testing.

Educ Psychol Meas. 2017 Aug;77(4):631-662. doi: 10.1177/0013164416668232. Epub 2016 Oct 5.
