Elliot A. Ludvig, Richard S. Sutton, E. James Kehoe
University of Alberta, Edmonton, Alberta, Canada.
Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Comput. 2008 Dec;20(12):3034-54. doi: 10.1162/neco.2008.11-07-654.
The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and improves the correspondence between model and data in several experiments, including those in which rewards are omitted or delivered early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
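The mechanism the abstract describes can be sketched in code. The following is an illustrative reading, not the authors' implementation: each stimulus onset starts an exponentially decaying memory trace, Gaussian basis functions of the trace height serve as the microstimuli, and a linear TD(λ) learner produces the prediction error δ that models the phasic dopamine signal. The parameter values, the Gaussian basis-function form, and all function names here are assumptions made for the sketch.

```python
import numpy as np

def microstimuli(trace, centers, sigma=0.08):
    """Gaussian basis functions of memory-trace height, scaled by the
    trace, so microstimuli grow weaker and more diffuse as time passes."""
    return trace * np.exp(-0.5 * ((trace - centers) / sigma) ** 2)

def run_trial(cs_time, reward_time, w=None, T=200, m=20,
              gamma=0.97, lam=0.95, alpha=0.01, decay=0.985):
    """One trial of linear TD(lambda) over the microstimulus features.
    Both the CS and the reward spawn their own set of m microstimuli.
    Returns per-step TD errors (the modeled phasic dopamine signal)."""
    centers = np.linspace(1.0 / m, 1.0, m)  # basis-function centers
    n = 2 * m                               # CS features + reward features
    w = np.zeros(n) if w is None else w
    e = np.zeros(n)                         # eligibility traces
    traces = np.zeros(2)                    # memory-trace heights [CS, reward]
    deltas = np.zeros(T)
    x_prev, v_prev = np.zeros(n), 0.0
    for t in range(T):
        if t == cs_time:
            traces[0] = 1.0                 # CS onset starts its trace
        r = 0.0
        if reward_time is not None and t == reward_time:
            traces[1] = 1.0                 # reward also spawns microstimuli
            r = 1.0
        x = np.concatenate([microstimuli(traces[0], centers),
                            microstimuli(traces[1], centers)])
        v = w @ x
        delta = r + gamma * v - v_prev      # TD error
        e = gamma * lam * e + x_prev        # accumulating eligibility
        w += alpha * delta * e
        deltas[t] = delta
        traces *= decay                     # memory traces fade each step
        x_prev, v_prev = x, v
    return deltas, w

# Train with a CS at t=20 and reward at t=100, then probe reward omission:
w = None
for _ in range(500):
    _, w = run_trial(cs_time=20, reward_time=100, w=w)
omission_deltas, _ = run_trial(cs_time=20, reward_time=None, w=w)
```

Because neighboring microstimuli overlap, any negative error at the time of an omitted reward is spread over several time steps and stays small, which is one way to read the temporal generalization the abstract refers to.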