
Category learning in a recurrent neural network with reinforcement learning.

Authors

Zhang Ying, Pan Xiaochuan, Wang Yihong

Affiliation

Institute for Cognitive Neurodynamics, East China University of Science and Technology, Shanghai, China.

Publication details

Front Psychiatry. 2022 Oct 25;13:1008011. doi: 10.3389/fpsyt.2022.1008011. eCollection 2022.

DOI:10.3389/fpsyt.2022.1008011
PMID:36387007
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9640766/
Abstract

Humans and animals can learn and use category information quickly and efficiently to adapt to changing environments, and several brain areas are involved in learning and encoding category information. However, it remains unclear how the brain learns and forms categorical representations at the level of neural circuits. To investigate this issue at the network level, we combined a recurrent neural network with reinforcement learning to construct a deep reinforcement learning model that demonstrates how categories are learned and represented in the network. The model consists of a policy network and a value network. The policy network updates the policy used to choose actions, while the value network evaluates actions to predict rewards. The agent learns dynamically through the exchange of information between the policy network and the value network. The model was trained to learn six stimulus-stimulus associative chains in a sequential paired-association task that had been learned by the monkey. The simulation results showed that the model learned the stimulus-stimulus associative chains and reproduced behavior similar to that of the monkey performing the same task. Two types of neurons were found in the model: one type primarily encoded identity information about individual stimuli; the other mainly encoded category information about the associated stimuli in one chain. Both types of activity pattern were also observed in the primate prefrontal cortex after the monkey learned the same task. Furthermore, the ability of these two types of neurons to encode stimulus or category information increased while the model was learning the task. Our results suggest that neurons in the recurrent neural network can form categorical representations through deep reinforcement learning while learning stimulus-stimulus associations. This might provide a new approach for understanding the neuronal mechanisms by which the prefrontal cortex learns and encodes category information.
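The interaction between a policy network and a value network described in the abstract can be illustrated with a minimal actor-critic sketch. This is not the authors' model: the recurrent networks are replaced here by simple lookup tables, the task is reduced to a hypothetical one-step association (cue `i` is rewarded only for action `i`), and all names and hyperparameters (`train`, `W_policy`, `w_value`, the learning rates) are assumptions chosen for illustration. It shows only how a value estimate can serve as a baseline for the policy update.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def train(n_stimuli=6, n_trials=3000, lr_policy=0.5, lr_value=0.1, seed=0):
    """Toy actor-critic on a one-step association task (illustrative only).

    Assumption: cue i is rewarded only when action i is chosen. The
    "networks" are tables, not the recurrent networks used in the paper.
    """
    rng = np.random.default_rng(seed)
    W_policy = np.zeros((n_stimuli, n_stimuli))  # action logits for each cue
    w_value = np.zeros(n_stimuli)                # predicted reward for each cue

    for _ in range(n_trials):
        cue = rng.integers(n_stimuli)
        probs = softmax(W_policy[cue])
        action = rng.choice(n_stimuli, p=probs)
        reward = 1.0 if action == cue else 0.0

        # Critic: reward prediction error relative to the current estimate.
        advantage = reward - w_value[cue]

        # Actor: policy-gradient step with the critic as baseline.
        # Gradient of log pi(action) w.r.t. the logits is onehot(action) - probs.
        grad_logp = -probs
        grad_logp[action] += 1.0
        W_policy[cue] += lr_policy * advantage * grad_logp

        # Critic update: move the prediction toward the observed reward.
        w_value[cue] += lr_value * advantage

    return W_policy, w_value


if __name__ == "__main__":
    W_policy, w_value = train()
    print("preferred action per cue:", W_policy.argmax(axis=1))
    print("predicted reward per cue:", np.round(w_value, 2))
```

In this sketch the critic's prediction error scales every policy update, so rewarded choices are reinforced most strongly while the outcome is still surprising. A fuller model would replace the tables with recurrent networks unrolled over the trial sequence.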


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be53/9640766/d6cf115b3ead/fpsyt-13-1008011-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be53/9640766/6c9eb3128130/fpsyt-13-1008011-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be53/9640766/126b288f7937/fpsyt-13-1008011-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be53/9640766/26d94b415c1d/fpsyt-13-1008011-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be53/9640766/293ca0d93b32/fpsyt-13-1008011-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be53/9640766/5cc0d8457f1f/fpsyt-13-1008011-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be53/9640766/b6283f7163b2/fpsyt-13-1008011-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be53/9640766/6c64182a0ca0/fpsyt-13-1008011-g0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be53/9640766/e4b546cf757e/fpsyt-13-1008011-g0009.jpg


