Beyond Neyman-Pearson: E-values enable hypothesis testing with a data-driven alpha.

Author

Peter D. Grünwald

Affiliations

Machine Learning Group, National research institute for mathematics and computer science in the Netherlands (Centrum Wiskunde & Informatica), Amsterdam 1098 XG, The Netherlands.

Mathematical Institute, Leiden University, Leiden 2333 CC, The Netherlands.

Publication information

Proc Natl Acad Sci U S A. 2024 Sep 24;121(39):e2302098121. doi: 10.1073/pnas.2302098121. Epub 2024 Sep 20.

DOI:10.1073/pnas.2302098121
PMID:39302968
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11441548/
Abstract

A standard practice in statistical hypothesis testing is to mention the p-value alongside the accept/reject decision. We show the advantages of mentioning an e-value instead. With p-values, it is not clear how to use an extreme observation (e.g. $p \ll \alpha$) for getting better frequentist decisions. With e-values it is straightforward, since they provide Type-I risk control in a generalized Neyman-Pearson setting with the decision task (a general loss function) determined post hoc, after observation of the data, thereby providing a handle on "roving $\alpha$'s." When Type-II risks are taken into consideration, the only admissible decision rules in the post hoc setting turn out to be e-value-based. Similarly, if the loss incurred when specifying a faulty confidence interval is not fixed in advance, standard confidence intervals and distributions may fail, whereas e-confidence sets and e-posteriors still provide valid risk guarantees. Sufficiently powerful e-values have by now been developed for a range of classical testing problems. We discuss the main challenges for wider development and deployment.
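
As a rough illustration of what an e-value is, the sketch below simulates the simplest textbook case: a likelihood ratio between two fully specified Gaussian models. It is not taken from the paper. The alternative mean `mu1`, the sample size, and the function names are assumptions chosen for the example. The sketch only checks the basic properties that make a likelihood ratio an e-value (expectation at most 1 under the null, so that rejecting when the e-value reaches 1/alpha keeps the Type-I error at or below alpha by Markov's inequality); it does not reproduce the paper's post hoc loss-function results.

```python
import math
import random


def likelihood_ratio_e_value(xs, mu1=0.5):
    """E-value for H0: X_i ~ N(0, 1) against the fixed alternative N(mu1, 1).

    The likelihood ratio of the alternative to the null has expectation 1
    under H0, which is the defining property of an e-value.
    mu1 is an illustrative assumption, not a value from the paper.
    """
    n = len(xs)
    # log of prod_i [ N(x_i; mu1, 1) / N(x_i; 0, 1) ]
    log_e = mu1 * sum(xs) - n * mu1 ** 2 / 2.0
    return math.exp(log_e)


def simulate_null(n_trials=20000, n=10, mu1=0.5, seed=0):
    """Draw data under H0 repeatedly and return the resulting e-values."""
    rng = random.Random(seed)
    return [
        likelihood_ratio_e_value([rng.gauss(0.0, 1.0) for _ in range(n)], mu1)
        for _ in range(n_trials)
    ]


e_values = simulate_null()

# Under H0 the average e-value should be close to 1.
mean_e = sum(e_values) / len(e_values)

# Rejecting when E >= 1/alpha gives a Type-I error rate of at most alpha
# (Markov's inequality), for each level considered separately.
rejection_rates = {
    alpha: sum(e >= 1.0 / alpha for e in e_values) / len(e_values)
    for alpha in (0.1, 0.05, 0.01)
}

print("mean e-value under H0:", mean_e)
for alpha, rate in rejection_rates.items():
    print("alpha =", alpha, "empirical rejection rate under H0 =", rate)
```

In this toy setup the empirical rejection rates come out well below each alpha, which reflects how conservative the Markov bound is for a single fixed level.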

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4014/11441548/7a745fddd109/pnas.2302098121fig01.jpg
