Stimulus representation and the timing of reward-prediction errors in models of the dopamine system.

Author Information

Ludvig Elliot A, Sutton Richard S, Kehoe E James

Affiliation

University of Alberta, Edmonton, Alberta, Canada.

Publication Information

Neural Comput. 2008 Dec;20(12):3034-54. doi: 10.1162/neco.2008.11-07-654.

Abstract

The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances the correspondence between model and data in several experiments, including those in which rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
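The mechanism the abstract describes, where each stimulus leaves a decaying memory trace whose height drives a set of Gaussian "microstimuli" used as features for linear TD(λ), can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all parameter values (number of microstimuli, basis widths, trace decay, learning rate, trial timing) are assumptions chosen for demonstration.

```python
import numpy as np

def microstimuli(trace, n=10, sigma=0.08):
    """Convert the current height of a decaying memory trace into n Gaussian
    basis features ("microstimuli"). Because the trace shrinks over time,
    later microstimuli are weaker and more temporally diffuse."""
    centers = np.linspace(1.0, 0.1, n)   # trace heights at which each basis peaks
    return trace * np.exp(-((trace - centers) ** 2) / (2 * sigma ** 2))

def run_trial(w, cs_time=5, reward_time=25, n_steps=40, n=10,
              alpha=0.05, gamma=0.97, lam=0.9, trace_decay=0.985):
    """One trial of linear TD(lambda) over microstimulus features.
    Both the stimulus (CS) and the reward (US) spawn their own microstimuli,
    as in the abstract. Returns the per-step TD errors (the model's
    dopamine-like signal). Updates the weight vector w in place."""
    x_prev = np.zeros(2 * n)
    elig = np.zeros_like(w)              # eligibility trace
    cs_trace = us_trace = 0.0
    deltas = []
    for t in range(n_steps):
        if t == cs_time:
            cs_trace = 1.0               # stimulus onset starts a fresh trace
        if t == reward_time:
            us_trace = 1.0               # rewards also spawn microstimuli
        x = np.concatenate([microstimuli(cs_trace, n),
                            microstimuli(us_trace, n)])
        r = 1.0 if t == reward_time else 0.0
        delta = r + gamma * (w @ x) - (w @ x_prev)   # TD error
        elig = gamma * lam * elig + x_prev
        w += alpha * delta * elig
        deltas.append(delta)
        x_prev = x
        cs_trace *= trace_decay
        us_trace *= trace_decay
    return deltas
```

Run over repeated trials, the sketch reproduces the qualitative pattern the abstract describes: the TD error at reward time shrinks as the microstimulus features come to predict the reward, and a positive error emerges at stimulus onset.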
