Tamatsukuri Akihiro, Takahashi Tatsuji
Graduate School of Advanced Science and Engineering, Tokyo Denki University, Ishizaka, Hatoyama, Hiki, Saitama 350-0394, Japan.
School of Science and Engineering, Tokyo Denki University, Ishizaka, Hatoyama, Hiki, Saitama 350-0394, Japan; Dwango Artificial Intelligence Laboratory, 5-24-5 Hongo, Bunkyo, Tokyo 113-0033, Japan.
Biosystems. 2019 Jun;180:46-53. doi: 10.1016/j.biosystems.2019.02.009. Epub 2019 Feb 27.
As reinforcement learning algorithms are applied to increasingly complicated and realistic tasks, solving such problems within a practical time frame is becoming ever more difficult. Hence, we focus on a satisficing strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than for the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing (RS) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to K-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that RS is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of RS is bounded above by a finite value, provided that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm these results through numerical simulations and compare the performance of RS with that of other representative algorithms for the K-armed bandit problem.
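To make the satisficing idea concrete, the following is a minimal sketch of an aspiration-based greedy bandit policy in the spirit of RS. The specific value form n_i * (Q_i - aspiration), the arm means, and the aspiration setting are illustrative assumptions, not a verbatim reproduction of the paper's equations.

```python
# Hedged sketch of a satisficing bandit policy in the spirit of the
# risk-sensitive satisficing (RS) model described in the abstract.
# ASSUMPTION: the per-arm value RS_i = n_i * (Q_i - aspiration) is used
# here only for illustration; consult the paper for the exact model.
import random


def rs_bandit(arm_means, aspiration, steps=10000, seed=0):
    """Greedy play on the per-arm value n_i * (Q_i - aspiration)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # n_i: number of times arm i was pulled
    estimates = [0.0] * k     # Q_i: sample-mean reward of arm i
    total_reward = 0.0

    for _ in range(steps):
        # Above the aspiration level, frequently tried arms are favoured
        # (risk-averse exploitation); below it, rarely tried arms are
        # favoured (risk-prone exploration).
        rs_values = [n * (q - aspiration) for n, q in zip(counts, estimates)]
        arm = max(range(k), key=lambda i: rs_values[i])

        # Bernoulli reward drawn from a hypothetical true arm mean.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    # Aspiration set between the best (0.7) and second-best (0.5) arm means,
    # so that satisficing coincides with choosing the optimal arm.
    counts, estimates, total = rs_bandit([0.3, 0.5, 0.7], aspiration=0.6)
    print("pull counts:", counts)
    print("estimates:", [round(q, 3) for q in estimates])
```

With the aspiration level placed between the best and second-best expected rewards, a run of this sketch concentrates its pulls on the optimal arm, which mirrors the abstract's point that satisficing at an "optimal level" of aspiration implies optimizing.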