Beyond Neyman-Pearson: E-values enable hypothesis testing with a data-driven alpha.

Author information

Grünwald Peter D

Affiliations

Machine Learning Group, National Research Institute for Mathematics and Computer Science in the Netherlands (Centrum Wiskunde & Informatica), Amsterdam 1098 XG, The Netherlands.

Mathematical Institute, Leiden University, Leiden 2333 CC, The Netherlands.

Publication information

Proc Natl Acad Sci U S A. 2024 Sep 24;121(39):e2302098121. doi: 10.1073/pnas.2302098121. Epub 2024 Sep 20.

Abstract

A standard practice in statistical hypothesis testing is to mention the p-value alongside the accept/reject decision. We show the advantages of mentioning an e-value instead. With p-values, it is not clear how to use an extreme observation (e.g., p ≪ α) for getting better frequentist decisions. With e-values it is straightforward, since they provide Type-I risk control in a generalized Neyman-Pearson setting with the decision task (a general loss function) determined post hoc, after observation of the data, thereby providing a handle on "roving α's." When Type-II risks are taken into consideration, the only admissible decision rules in the post hoc setting turn out to be e-value-based. Similarly, if the loss incurred when specifying a faulty confidence interval is not fixed in advance, standard confidence intervals and distributions may fail, whereas e-confidence sets and e-posteriors still provide valid risk guarantees. Sufficiently powerful e-values have by now been developed for a range of classical testing problems. We discuss the main challenges for wider development and deployment.
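
As a rough illustration of the basic mechanism (a sketch that is not taken from the paper), the Python below computes a likelihood-ratio e-value for an assumed toy problem: a null of N(0, 1) against a fixed alternative N(mu1, 1). The function names and the Gaussian setup are hypothetical choices made for this example. Because the e-value has expectation at most 1 under the null, Markov's inequality gives that rejecting when E ≥ 1/α has Type-I error at most α, and 1/E can be read off after seeing the data as the smallest level at which this rule would reject. The paper's guarantee for post hoc loss functions is more general than this simple bound.

```python
import math
import random


def lr_e_value(xs, mu1):
    """E-value: likelihood ratio of N(mu1, 1) against N(0, 1) for i.i.d. xs.

    Its expectation under the null N(0, 1) equals 1.
    """
    n = len(xs)
    return math.exp(mu1 * sum(xs) - n * mu1 ** 2 / 2)


def rejects(e, alpha):
    """Reject the null at level alpha when the e-value reaches 1/alpha."""
    return e >= 1 / alpha


def smallest_alpha(e):
    """Smallest level at which `rejects` would fire for this e-value."""
    return min(1.0, 1 / e)


def null_rejection_rate(alpha, mu1=0.5, n=20, trials=5000, seed=0):
    """Simulate data under the null and estimate how often we reject."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        if rejects(lr_e_value(xs, mu1), alpha):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data drawn from the alternative N(0.5, 1).
    data = [rng.gauss(0.5, 1) for _ in range(20)]
    e = lr_e_value(data, mu1=0.5)
    print("e-value:", e)
    print("smallest alpha at which we would reject:", smallest_alpha(e))
    print("simulated Type-I error at alpha = 0.05:", null_rejection_rate(0.05))
```

In this toy setting the simulated Type-I error should come out well below 0.05, since Markov's inequality is conservative; that conservativeness is the price paid for being allowed to choose the level after seeing the data.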

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4014/11441548/7a745fddd109/pnas.2302098121fig01.jpg
