

Approximate information for efficient exploration-exploitation strategies.

Authors

Barbier-Chebbah Alex, Vestergaard Christian L, Masson Jean-Baptiste

Affiliations

Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France.

Épimethée, Inria, 75012 Paris, France.

Publication

Phys Rev E. 2024 May;109(5):L052105. doi: 10.1103/PhysRevE.109.L052105.

DOI: 10.1103/PhysRevE.109.L052105
PMID: 38907409
Abstract

This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multiarmed bandit problems. These involve an agent deciding whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. We here introduce a class of algorithms, approximate information maximization (AIM), which employs a carefully chosen analytical approximation to the gradient of the entropy to choose which arm to pull at each point in time. AIM matches the performance of Thompson sampling, which is known to be asymptotically optimal, as well as that of Infomax from which it derives. AIM thus retains the advantages of Infomax while also offering enhanced computational speed, tractability, and ease of implementation. In particular, we demonstrate how to apply it to a 50-armed bandit game. Its expression is tunable, which allows for specific optimization in various settings, making it possible to surpass the performance of Thompson sampling at short and intermediary times.
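The abstract benchmarks AIM against Thompson sampling on multi-armed bandit problems. The paper's entropy-gradient approximation is not reproduced in the abstract, so the sketch below instead illustrates the Thompson-sampling baseline it is compared with, on a Bernoulli K-armed bandit; the arm probabilities, horizon, and function name are illustrative assumptions, not taken from the paper.

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Play a Bernoulli K-armed bandit with Thompson sampling.

    Each arm keeps a Beta(wins + 1, losses + 1) posterior over its
    success probability. At every step we draw one sample per arm and
    pull the arm with the largest draw, so a single random draw handles
    both exploration (uncertain arms draw high sometimes) and
    exploitation (good arms draw high often).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    wins = [0] * k
    losses = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # One plausible mean per arm, sampled from its Beta posterior.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # Bernoulli reward from the chosen arm's true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        total_reward += reward
    return total_reward, wins, losses

# Illustrative run: three arms, the best with success probability 0.8.
total_reward, wins, losses = thompson_sampling([0.2, 0.5, 0.8],
                                               horizon=2000)
```

Over a long enough horizon the posterior of the best arm concentrates and it receives the bulk of the pulls, which is the asymptotically optimal behavior the abstract attributes to Thompson sampling and that AIM is reported to match.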


Similar articles

1. Approximate information for efficient exploration-exploitation strategies.
Phys Rev E. 2024 May;109(5):L052105. doi: 10.1103/PhysRevE.109.L052105.
2. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
3. An empirical evaluation of active inference in multi-armed bandits.
Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.
4. An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem.
IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5565-5580. doi: 10.1109/TNNLS.2018.2806006. Epub 2018 Mar 8.
5. Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control.
Technometrics. 2023;65(1):33-43. doi: 10.1080/00401706.2022.2054861. Epub 2022 Apr 22.
6. Amoeba-inspired Tug-of-War algorithms for exploration-exploitation dilemma in extended Bandit Problem.
Biosystems. 2014 Mar;117:1-9. doi: 10.1016/j.biosystems.2013.12.007. Epub 2013 Dec 31.
7. Some performance considerations when using multi-armed bandit algorithms in the presence of missing data.
PLoS One. 2022 Sep 12;17(9):e0274272. doi: 10.1371/journal.pone.0274272. eCollection 2022.
8. Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits.
Chaos. 2023 Jun 1;33(6). doi: 10.1063/5.0120076.
9. Maximum Entropy Exploration in Contextual Bandits with Neural Networks and Energy Based Models.
Entropy (Basel). 2023 Jan 18;25(2):188. doi: 10.3390/e25020188.
10. Sex differences in learning from exploration.
Elife. 2021 Nov 19;10:e69748. doi: 10.7554/eLife.69748.