Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks.

Affiliations

Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia, United States of America.

Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.

Publication information

PLoS Comput Biol. 2023 Aug 18;19(8):e1011385. doi: 10.1371/journal.pcbi.1011385. eCollection 2023 Aug.

Abstract

A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia with two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is to update the G and N matrices utilizing the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and then differences are resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks including extinction, renewal, and discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight as to the role of direct- and indirect-pathway striatal neurons.
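
The two-matrix idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' TD2Q code: the class name `TwoMatrixTDAgent`, the separate learning rates, the sign convention (G increases and N decreases with a positive prediction error), the linear schedule for the exploration parameter, the use of a softmax for the second selection step, and the two-step task are all assumptions made for illustration only.

```python
import numpy as np


class TwoMatrixTDAgent:
    """Toy agent with a direct-pathway matrix G and an indirect-pathway matrix N.

    Hypothetical sketch; every parameter value here is an assumption.
    """

    def __init__(self, n_states, n_actions, alpha_g=0.2, alpha_n=0.1,
                 gamma=0.9, beta_min=0.5, beta_max=5.0, seed=0):
        self.G = np.zeros((n_states, n_actions))
        self.N = np.zeros((n_states, n_actions))
        self.alpha_g, self.alpha_n, self.gamma = alpha_g, alpha_n, gamma
        self.beta_min, self.beta_max = beta_min, beta_max
        self.avg_reward = 0.0  # running average of recent reward
        self.rng = np.random.default_rng(seed)

    def beta(self):
        # Assumed schedule: more recent reward means less exploration.
        frac = min(max(self.avg_reward, 0.0), 1.0)
        return self.beta_min + (self.beta_max - self.beta_min) * frac

    @staticmethod
    def softmax(x, beta):
        z = beta * (x - x.max())  # subtract the max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    def act(self, s):
        b = self.beta()
        p_g = self.softmax(self.G[s], b)    # G prefers high values
        p_n = self.softmax(-self.N[s], b)   # N prefers low values (assumed sign)
        a_g = int(self.rng.choice(len(p_g), p=p_g))
        a_n = int(self.rng.choice(len(p_n), p=p_n))
        if a_g == a_n:
            return a_g
        # Second selection step over the two candidates' probabilities.
        pair = np.array([p_g[a_g], p_n[a_n]])
        pick = int(self.rng.choice(2, p=self.softmax(pair, b)))
        return (a_g, a_n)[pick]

    def update(self, s, a, r, s_next, done):
        # Temporal difference reward prediction error, computed from G.
        target = r if done else r + self.gamma * self.G[s_next].max()
        delta = target - self.G[s, a]
        self.G[s, a] += self.alpha_g * delta
        self.N[s, a] -= self.alpha_n * delta
        self.avg_reward += 0.1 * (r - self.avg_reward)
        return delta


def step(s, a):
    """Hypothetical two-step task: choosing action 0 twice yields reward 1."""
    if s == 0 and a == 0:
        return 1, 0.0, False
    if s == 1 and a == 0:
        return 1, 1.0, True
    return s, 0.0, True  # any other choice ends the trial unrewarded


def run_two_step_task(episodes=500, seed=0):
    agent = TwoMatrixTDAgent(n_states=2, n_actions=2, seed=seed)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = agent.act(s)
            s_next, r, done = step(s, a)
            agent.update(s, a, r, s_next, done)
            s = s_next
    return agent


if __name__ == "__main__":
    trained = run_two_step_task()
    print("G =", trained.G)
    print("N =", trained.N)
```

In this toy task the first action is never rewarded directly, so its value in G can only grow through the bootstrapped term `gamma * G[s_next].max()`. That is the sense in which a temporal difference error, rather than the immediate reward alone, is needed for multi-step tasks.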
