

Reinforcement Learning Model With Dynamic State Space Tested on Target Search Tasks for Monkeys: Extension to Learning Task Events.

Author Information

Sakamoto Kazuhiro, Yamada Hinata, Kawaguchi Norihiko, Furusawa Yoshito, Saito Naohiro, Mushiake Hajime

Affiliations

Department of Neuroscience, Faculty of Medicine, Tohoku Medical and Pharmaceutical University, Sendai, Japan.

Department of Physiology, Tohoku University School of Medicine, Sendai, Japan.

Publication Information

Front Comput Neurosci. 2022 Jun 2;16:784604. doi: 10.3389/fncom.2022.784604. eCollection 2022.
PMID: 35720772
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9201426/
Abstract

Learning is a crucial basis for biological systems to adapt to environments. Environments include various states or episodes, and episode-dependent learning is essential for adaptation to such complex situations. Here, we developed a model for learning a two-target search task used in primate physiological experiments. In the task, the agent is required to gaze at one of four presented light spots. Two neighboring spots serve as the correct target alternately, and the correct target pair is switched after a certain number of consecutive successes. In order for the agent to obtain rewards with a high probability, it is necessary to make decisions based on the actions and results of the previous two trials. Our previous work achieved this by using a dynamic state space. However, to learn a task that includes events such as fixation on the initial central spot, the model framework should be extended. For this purpose, here we propose a "history-in-episode architecture." Specifically, we divide states into episodes and histories, and actions are selected based on the histories within each episode. When we compared the proposed model, including the dynamic state space, with the conventional SARSA method in the two-target search task, the former performed close to the theoretical optimum, while the latter never achieved a target-pair switch because it had to re-learn each correct target every time. The reinforcement learning model including the proposed history-in-episode architecture and dynamic state space enables episode-dependent learning and provides a basis for learning systems that are highly adaptable to complex environments.
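The decision rule the abstract describes — choosing actions from the actions and results of the previous two trials — can be sketched with a plain tabular SARSA agent whose state is that two-trial history. This is only an illustrative sketch, not the paper's model: the task setup (four spots, alternating pair members, pair switch after a run of successes) follows the abstract, but the class and function names, the fixed switch count of seven, all learning hyperparameters, and the use of a fixed (rather than dynamic) state space are assumptions made here for the example.

```python
import random
from collections import defaultdict

class TwoTargetSearchTask:
    """Toy version of the two-target search task: four light spots,
    two neighboring spots serve as the correct target alternately,
    and the correct pair is switched after a run of successes."""
    PAIRS = [(0, 1), (1, 2), (2, 3), (3, 0)]  # neighboring spot pairs

    def __init__(self, switch_after=7, seed=0):
        self.rng = random.Random(seed)
        self.switch_after = switch_after
        self.pair_idx = self.rng.randrange(len(self.PAIRS))
        self.phase = 0    # which member of the current pair is correct now
        self.streak = 0   # consecutive successes on the current pair

    def step(self, action):
        correct = self.PAIRS[self.pair_idx][self.phase]
        reward = 1 if action == correct else 0
        if reward:
            self.phase = 1 - self.phase   # the two targets alternate
            self.streak += 1
            if self.streak >= self.switch_after:
                self.pair_idx = self.rng.randrange(len(self.PAIRS))
                self.phase = 0
                self.streak = 0
        else:
            self.streak = 0
        return reward

def run_sarsa(n_trials=20000, alpha=0.1, gamma=0.9, eps=0.1, seed=1):
    """Tabular SARSA whose state is the (action, reward) history of the
    previous two trials; returns the reward rate over the second half."""
    rng = random.Random(seed)
    task = TwoTargetSearchTask(seed=seed)
    Q = defaultdict(float)  # Q[(state, action)] -> value, default 0.0

    def pick(state):
        if rng.random() < eps:                 # epsilon-greedy exploration
            return rng.randrange(4)
        return max(range(4), key=lambda a: Q[(state, a)])

    state = ((None, None), (None, None))       # empty two-trial history
    action = pick(state)
    rewards = []
    for _ in range(n_trials):
        r = task.step(action)
        rewards.append(r)
        next_state = (state[1], (action, r))   # slide the history window
        next_action = pick(next_state)
        Q[(state, action)] += alpha * (
            r + gamma * Q[(next_state, next_action)] - Q[(state, action)]
        )
        state, action = next_state, next_action
    half = n_trials // 2
    return sum(rewards[half:]) / (n_trials - half)
```

Because the rewarded spots alternate within a pair, a two-trial history of (action, reward) tuples is enough to identify the next correct target while a pair is stable, so this agent comfortably exceeds the 25% chance rate; the paper's contribution is to reach such history-dependent states without fixing the history length in advance.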


Figures (fncom-16-784604, g001–g007):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d35/9201426/07fb137acaf2/fncom-16-784604-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d35/9201426/e8f267494ef0/fncom-16-784604-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d35/9201426/693ef24b2201/fncom-16-784604-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d35/9201426/35805155497b/fncom-16-784604-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d35/9201426/ec2b5cff051a/fncom-16-784604-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d35/9201426/3eb9520e3fd1/fncom-16-784604-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d35/9201426/57e05de44dde/fncom-16-784604-g007.jpg

