Neural control of dopamine neurotransmission: implications for reinforcement learning.

Affiliation

Neurobiology Research Unit, Okinawa Institute of Science and Technology, 1919-1, Tancha, Onna-Son, Kunigami, Okinawa 904-0412, Japan.

Publication information

Eur J Neurosci. 2012 Apr;35(7):1115-23. doi: 10.1111/j.1460-9568.2012.08055.x.

DOI:10.1111/j.1460-9568.2012.08055.x

PMID:22487041

Abstract

In the past few decades there has been remarkable convergence of machine learning with neurobiological understanding of reinforcement learning mechanisms, exemplified by temporal difference (TD) learning models. The anatomy of the basal ganglia provides a number of potential substrates for instantiation of the TD mechanism. In contrast to the traditional concept of direct and indirect pathway outputs from the striatum, we emphasize that projection neurons of the striatum are branched and individual striatofugal neurons innervate both globus pallidus externa and globus pallidus interna/substantia nigra (GPi/SNr). This suggests that the GPi/SNr has the necessary inputs to operate as the source of a TD signal. We also discuss the mechanism for the timing processes necessary for learning in the TD framework. The TD framework has been particularly successful in analysing electrophysiological recordings from dopamine (DA) neurons during learning, in terms of reward prediction error. However, present understanding of the neural control of DA release is limited, and hence the neural mechanisms involved are incompletely understood. Inhibition is very conspicuously present among the inputs to the DA neurons, with inhibitory synapses accounting for the majority of synapses on DA neurons. Furthermore, synchronous firing of the DA neuron population requires disinhibition and excitation to occur together in a coordinated manner. We conclude that the inhibitory circuits impinging directly or indirectly on the DA neurons play a central role in the control of DA neuron activity and further investigation of these circuits may provide important insight into the biological mechanisms of reinforcement learning.
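
To make the reward prediction error mentioned in the abstract concrete, here is a minimal sketch of tabular TD(0) applied to a single cue-reward trial structure. It is an illustration only, not the model discussed by the authors: the function name `run_td`, the one-state-per-time-step representation, the decision to hold pre-cue values at zero (on the assumption that cue onset is unpredictable), and all parameter values are assumptions made for this example.

```python
def run_td(n_trials=500, n_steps=20, cue_t=5, reward_t=15, alpha=0.1, gamma=1.0):
    """Tabular TD(0) over the time steps of a conditioning trial.

    Returns the per-step prediction errors of the first and the last trial.
    deltas[t] is the error for the transition from step t to step t + 1,
    so the response to cue onset appears at index cue_t - 1.
    """
    # One value estimate per time step; the extra entry is the terminal state.
    values = [0.0] * (n_steps + 1)
    first = last = None
    for trial in range(n_trials):
        deltas = []
        for t in range(n_steps):
            reward = 1.0 if t == reward_t else 0.0
            # TD error: reward received plus discounted next value, minus current value.
            delta = reward + gamma * values[t + 1] - values[t]
            # Assumption: pre-cue steps carry no prediction, so they are not updated.
            if t >= cue_t:
                values[t] += alpha * delta
            deltas.append(delta)
        if trial == 0:
            first = deltas
        last = deltas
    return first, last


if __name__ == "__main__":
    first, last = run_td()
    print("first trial: error at cue onset =", first[4], ", at reward =", first[15])
    print("last trial:  error at cue onset =", last[4], ", at reward =", last[15])
```

Under these assumptions the error is confined to the time of reward on the first trial and, after training, appears at cue onset while the error at the now-predicted reward shrinks toward zero, which is the qualitative pattern that TD analyses of DA neuron recordings describe.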

Similar articles

Neural control of dopamine neurotransmission: implications for reinforcement learning.

Eur J Neurosci. 2012 Apr;35(7):1115-23. doi: 10.1111/j.1460-9568.2012.08055.x.

GABAergic control of substantia nigra dopaminergic neurons.

Prog Brain Res. 2007;160:189-208. doi: 10.1016/S0079-6123(06)60011-3.

A Dual Role Hypothesis of the Cortico-Basal-Ganglia Pathways: Opponency and Temporal Difference Through Dopamine and Adenosine.

Front Neural Circuits. 2019 Jan 7;12:111. doi: 10.3389/fncir.2018.00111. eCollection 2018.

An implementation of reinforcement learning based on spike timing dependent plasticity.

Biol Cybern. 2008 Dec;99(6):517-23. doi: 10.1007/s00422-008-0265-6. Epub 2008 Oct 22.

Can the apparent adaptation of dopamine neurons' mismatch sensitivities be reconciled with their computation of reward prediction errors?

Neurosci Lett. 2008 Jun 13;438(1):14-6. doi: 10.1016/j.neulet.2008.04.059. Epub 2008 Apr 22.

The mechanism of ethanol action on midbrain dopaminergic neuron firing: a dynamic-clamp study of the role of I(h) and GABAergic synaptic integration.

J Neurophysiol. 2011 Oct;106(4):1901-22. doi: 10.1152/jn.00162.2011. Epub 2011 Jun 22.

[Reward processing of the basal ganglia--reward function of pedunculopontine tegmental nucleus].

Brain Nerve. 2009 Apr;61(4):397-404.

Dopaminergic neuromodulation of synaptic transmission between mitral and granule cells in the teleost olfactory bulb.

J Neurophysiol. 2012 Mar;107(5):1313-24. doi: 10.1152/jn.00536.2011. Epub 2011 Dec 7.

A cellular mechanism of reward-related learning.

Nature. 2001 Sep 6;413(6851):67-70. doi: 10.1038/35092560.

A dopamine-acetylcholine cascade: simulating learned and lesion-induced behavior of striatal cholinergic interneurons.

J Neurophysiol. 2008 Oct;100(4):2409-21. doi: 10.1152/jn.90486.2008. Epub 2008 Aug 20.

Cited by

Glutamate inputs send prediction error of reward, but not negative value of aversive stimuli, to dopamine neurons.

Neuron. 2024 Mar 20;112(6):1001-1019.e6. doi: 10.1016/j.neuron.2023.12.019. Epub 2024 Jan 25.

Glutamate inputs send prediction error of reward but not negative value of aversive stimuli to dopamine neurons.

bioRxiv. 2023 Nov 9:2023.11.09.566472. doi: 10.1101/2023.11.09.566472.

Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning.

Cell Rep. 2022 May 17;39(7):110756. doi: 10.1016/j.celrep.2022.110756.

What is resilience: an affiliative neuroscience approach.

World Psychiatry. 2020 Jun;19(2):132-150. doi: 10.1002/wps.20729.

Distributed and Mixed Information in Monosynaptic Inputs to Dopamine Neurons.

Neuron. 2016 Sep 21;91(6):1374-1389. doi: 10.1016/j.neuron.2016.08.018. Epub 2016 Sep 8.

Complex Multiplexing of Reward-Cue- and Licking-Movement-Related Activity in Single Midline Thalamus Neurons.

J Neurosci. 2016 Mar 23;36(12):3567-78. doi: 10.1523/JNEUROSCI.1107-15.2016.

Dopamine Prediction Errors in Reward Learning and Addiction: From Theory to Neural Circuitry.

Neuron. 2015 Oct 21;88(2):247-63. doi: 10.1016/j.neuron.2015.08.037.

Resting-State Functional Connectivity of the Locus Coeruleus in Humans: In Comparison with the Ventral Tegmental Area/Substantia Nigra Pars Compacta and the Effects of Age.

Cereb Cortex. 2016 Aug;26(8):3413-27. doi: 10.1093/cercor/bhv172. Epub 2015 Jul 28.

Computing reward-prediction error: an integrated account of cortical timing and basal-ganglia pathways for appetitive and aversive learning.

Eur J Neurosci. 2015 Aug;42(4):2003-21. doi: 10.1111/ejn.12994. Epub 2015 Jul 25.

Effects of fictive reward on rat's choice behavior.

Sci Rep. 2015 Jan 27;5:8040. doi: 10.1038/srep08040.
