Optimism in the face of uncertainty supported by a statistically-designed multi-armed bandit algorithm.

Authors

Kamiura Moto, Sano Kohei

Affiliations

Graduate School of Science and Engineering, Tokyo Denki University, Japan; School of Science and Engineering, Tokyo Denki University, Japan.

Graduate School of Science and Engineering, Tokyo Denki University, Japan.

Publication information

Biosystems. 2017 Oct;160:25-32. doi: 10.1016/j.biosystems.2017.08.004. Epub 2017 Aug 22.

DOI:10.1016/j.biosystems.2017.08.004
PMID:28838871
Abstract

The principle of optimism in the face of uncertainty is known as a heuristic in sequential decision-making problems. The Overtaking method, which is based on this principle, is an effective algorithm for solving multi-armed bandit problems. In the previous study, it was defined by a set of heuristic patterns of formulation. The objective of the present paper is to redefine the value functions of the Overtaking method and to unify their formulation. The unified Overtaking method is associated with the upper bounds of confidence intervals of expected rewards. Unifying the formulation enhances the universality of the Overtaking method. Consequently, we newly obtain an Overtaking method for exponentially distributed rewards, analyze it numerically, and show that it outperforms the UCB algorithm on average. The present study suggests that, in the context of multi-armed bandit problems, the principle of optimism in the face of uncertainty should be regarded not as a heuristic but as a statistics-based consequence of the law of large numbers for the sample mean of rewards and of the estimation of upper bounds of expected rewards.
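
To make the idea of acting on upper confidence bounds concrete, the following is a minimal sketch of the standard UCB1 rule, the kind of baseline the abstract compares against. It is not the Overtaking method, whose value functions are not given in this record. The function name `ucb1`, the Bernoulli reward model, the arm means, and the horizon are all assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (an assumed reward model).

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each sample mean is defined.
            arm = t - 1
        else:
            # Optimistic index: sample mean plus a confidence half-width
            # that shrinks as the arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical three-armed problem; the last arm has the highest mean.
counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)
```

Because the bonus term decays roughly as the inverse square root of the pull count, rarely tried arms keep an inflated index until enough samples accumulate, after which the sample mean dominates. This is the sense in which optimism follows from the law of large numbers and an upper-bound estimate.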

Similar articles

1
Optimism in the face of uncertainty supported by a statistically-designed multi-armed bandit algorithm.
Biosystems. 2017 Oct;160:25-32. doi: 10.1016/j.biosystems.2017.08.004. Epub 2017 Aug 22.
2
Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
3
An empirical evaluation of active inference in multi-armed bandits.
Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.
4
Decision making for large-scale multi-armed bandit problems using bias control of chaotic temporal waveforms in semiconductor lasers.
Sci Rep. 2022 May 16;12(1):8073. doi: 10.1038/s41598-022-12155-y.
5
Uncertainty and exploration in a restless bandit problem.
Top Cogn Sci. 2015 Apr;7(2):351-67. doi: 10.1111/tops.12145. Epub 2015 Apr 20.
6
Finding structure in multi-armed bandits.
Cogn Psychol. 2020 Jun;119:101261. doi: 10.1016/j.cogpsych.2019.101261. Epub 2020 Feb 12.
7
Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm.
Entropy (Basel). 2021 Mar 23;23(3):380. doi: 10.3390/e23030380.
8
Some performance considerations when using multi-armed bandit algorithms in the presence of missing data.
PLoS One. 2022 Sep 12;17(9):e0274272. doi: 10.1371/journal.pone.0274272. eCollection 2022.
9
Risk-aware multi-armed bandit problem with application to portfolio selection.
R Soc Open Sci. 2017 Nov 15;4(11):171377. doi: 10.1098/rsos.171377. eCollection 2017 Nov.
10
Multi-Armed Bandit-Based User Network Node Selection.
Sensors (Basel). 2024 Jun 24;24(13):4104. doi: 10.3390/s24134104.