Model-based reinforcement learning under concurrent schedules of reinforcement in rodents.

Author Information

Huh Namjung, Jo Suhyun, Kim Hoseok, Sul Jung Hoon, Jung Min Whan

Affiliation

Neuroscience Laboratory, Institute for Medical Sciences and Division of Cell Transformation and Restoration, Ajou University School of Medicine, Suwon, Korea.

Publication Information

Learn Mem. 2009 Apr 29;16(5):315-23. doi: 10.1101/lm.1295509. Print 2009 May.

Abstract

Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial and error, whereas in model-based reinforcement learning algorithms they are updated according to the decision-maker's knowledge or model of the environment. To investigate how animals update value functions, we trained rats on two different free-choice tasks. In one task, the reward probability of the unchosen target remained unchanged; in the other, it increased with the time elapsed since that target was last chosen. The results show that goal choice probability increased as a function of the number of consecutive alternative choices in the latter, but not the former, task, indicating that the animals were aware of the time-dependent increase in arming probability and used this information in choosing goals. In addition, choice behavior in the latter task was better accounted for by a model-based reinforcement learning algorithm. Our results show that rats adopt a decision-making process that cannot be accounted for by simple reinforcement learning models even in a relatively simple binary choice task, suggesting that rats can readily improve their decision-making strategy through knowledge of their environment.
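To make the contrast concrete, the following is a minimal Python sketch, not the authors' fitted models, of a two-target free-choice task in which an unchosen target becomes more likely to be armed with reward the longer it goes unchosen. The simple (model-free) agent updates only the chosen target's value by trial and error, whereas the model-based agent computes values from an assumed model of the arming process. The constants ALPHA, BETA, and BASE_P, the arming rule in p_armed, and the simulate function are illustrative assumptions, not parameters reported in the paper.

```python
import math
import random

ALPHA = 0.2    # learning rate for the model-free update (assumed)
BETA = 3.0     # softmax inverse temperature (assumed)
BASE_P = 0.3   # per-trial arming probability of each target (assumed)

def softmax_left(q_left, q_right):
    """Probability of choosing the left target under a softmax rule."""
    return 1.0 / (1.0 + math.exp(-BETA * (q_left - q_right)))

def p_armed(trials_unchosen):
    """Probability a target is armed after going unchosen for n trials,
    assuming independent arming with probability BASE_P on each trial
    and that an armed target stays armed until it is chosen."""
    return 1.0 - (1.0 - BASE_P) ** (trials_unchosen + 1)

def simulate(model_based, n_trials=500, seed=0):
    rng = random.Random(seed)
    q = {"L": 0.5, "R": 0.5}          # model-free value estimates
    unchosen = {"L": 0, "R": 0}       # trials since each target was chosen
    armed = {"L": False, "R": False}  # latent arming state of each target
    rewards = 0.0

    for _ in range(n_trials):
        # Environment: each unarmed target may become armed on this trial.
        for a in ("L", "R"):
            if not armed[a] and rng.random() < BASE_P:
                armed[a] = True

        if model_based:
            # Values computed from knowledge of the arming process.
            values = {a: p_armed(unchosen[a]) for a in ("L", "R")}
        else:
            # Values are whatever trial and error has produced so far.
            values = q

        choice = "L" if rng.random() < softmax_left(values["L"], values["R"]) else "R"
        reward = 1.0 if armed[choice] else 0.0
        armed[choice] = False  # choosing a target collects (and removes) any stored reward
        rewards += reward

        # Model-free update: only the chosen target's value changes.
        q[choice] += ALPHA * (reward - q[choice])

        # Track how long each target has gone unchosen.
        for a in ("L", "R"):
            unchosen[a] = 0 if a == choice else unchosen[a] + 1

    return rewards

print("model-free  reward total:", simulate(model_based=False))
print("model-based reward total:", simulate(model_based=True))
```

The key difference lies in how `values` is computed on each trial: from the trial-and-error estimates `q` for the model-free agent, or from `p_armed` applied to the time since each target was last chosen for the model-based agent, which is the kind of knowledge-driven update the abstract contrasts with simple reinforcement learning.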
