Model-based reinforcement learning under concurrent schedules of reinforcement in rodents.

Author Information

Huh Namjung, Jo Suhyun, Kim Hoseok, Sul Jung Hoon, Jung Min Whan

Affiliation

Neuroscience Laboratory, Institute for Medical Sciences and Division of Cell Transformation and Restoration, Ajou University School of Medicine, Suwon, Korea.

Publication Information

Learn Mem. 2009 Apr 29;16(5):315-23. doi: 10.1101/lm.1295509. Print 2009 May.

Abstract

Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial and error, whereas in model-based reinforcement learning algorithms they are updated according to the decision-maker's knowledge or model of the environment. To investigate how animals update value functions, we trained rats on two different free-choice tasks. In one task, the reward probability of the unchosen target remained unchanged; in the other, it increased with the time elapsed since that target was last chosen. The results show that goal choice probability increased as a function of the number of consecutive alternative choices in the latter, but not the former, task, indicating that the animals were aware of the time-dependent increase in arming probability and used this information in choosing goals. In addition, choice behavior in the latter task was better accounted for by a model-based reinforcement learning algorithm. Our results show that rats adopt a decision-making process that cannot be accounted for by simple reinforcement learning models even in a relatively simple binary choice task, suggesting that rats can readily improve their decision-making strategy through knowledge of their environment.
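To make the contrast concrete, the following is a minimal Python sketch, not the authors' fitted models, of a two-target free-choice task in which an unchosen target becomes more likely to be armed with reward the longer it goes unchosen. The simple (model-free) agent updates only the chosen target's value by trial and error, whereas the model-based agent computes values from an assumed model of the arming process. The constants ALPHA, BETA, and BASE_P, the arming rule in p_armed, and the simulate function are illustrative assumptions, not parameters reported in the paper.

```python
import math
import random

ALPHA = 0.2    # learning rate for the model-free update (assumed)
BETA = 3.0     # softmax inverse temperature (assumed)
BASE_P = 0.3   # per-trial arming probability of each target (assumed)

def softmax_left(q_left, q_right):
    """Probability of choosing the left target under a softmax rule."""
    return 1.0 / (1.0 + math.exp(-BETA * (q_left - q_right)))

def p_armed(trials_unchosen):
    """Probability a target is armed after going unchosen for n trials,
    assuming independent arming with probability BASE_P on each trial
    and that an armed target stays armed until it is chosen."""
    return 1.0 - (1.0 - BASE_P) ** (trials_unchosen + 1)

def simulate(model_based, n_trials=500, seed=0):
    rng = random.Random(seed)
    q = {"L": 0.5, "R": 0.5}          # model-free value estimates
    unchosen = {"L": 0, "R": 0}       # trials since each target was chosen
    armed = {"L": False, "R": False}  # latent arming state of each target
    rewards = 0.0

    for _ in range(n_trials):
        # Environment: each unarmed target may become armed on this trial.
        for a in ("L", "R"):
            if not armed[a] and rng.random() < BASE_P:
                armed[a] = True

        if model_based:
            # Values computed from knowledge of the arming process.
            values = {a: p_armed(unchosen[a]) for a in ("L", "R")}
        else:
            # Values are whatever trial and error has produced so far.
            values = q

        choice = "L" if rng.random() < softmax_left(values["L"], values["R"]) else "R"
        reward = 1.0 if armed[choice] else 0.0
        armed[choice] = False  # choosing a target collects (and removes) any stored reward
        rewards += reward

        # Model-free update: only the chosen target's value changes.
        q[choice] += ALPHA * (reward - q[choice])

        # Track how long each target has gone unchosen.
        for a in ("L", "R"):
            unchosen[a] = 0 if a == choice else unchosen[a] + 1

    return rewards

print("model-free  reward total:", simulate(model_based=False))
print("model-based reward total:", simulate(model_based=True))
```

The key difference lies in how `values` is computed on each trial: from the trial-and-error estimates `q` for the model-free agent, or from `p_armed` applied to the time since each target was last chosen for the model-based agent, which is the kind of knowledge-driven update the abstract contrasts with simple reinforcement learning.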
