

Long-term reward prediction in TD models of the dopamine system.

Author Information

Daw Nathaniel D, Touretzky David S

Affiliation

Computer Science Department and Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Publication Information

Neural Comput. 2002 Nov;14(11):2567-83. doi: 10.1162/089976602760407973.

DOI: 10.1162/089976602760407973
PMID: 12433290
Abstract

This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future rewards. Previous models have been mostly restricted to short-term predictions of rewards expected during a single, somewhat artificially defined trial. Also, the models focused exclusively on the phasic pause-and-burst activity of primate DA neurons; the neurons' slower, tonic background activity was assumed to be constant. This has led to difficulty in explaining the results of neurochemical experiments that measure indications of DA release on a slow timescale, results that seem at first glance inconsistent with a reward prediction model. In this article, we investigate a TD model of DA activity modified so as to enable it to make longer-term predictions about rewards expected far in the future. We show that these predictions manifest themselves as slow changes in the baseline error signal, which we associate with tonic DA activity. Using this model, we make new predictions about the behavior of the DA system in a number of experimental situations. Some of these predictions suggest new computational explanations for previously puzzling data, such as indications from microdialysis studies of elevated DA activity triggered by aversive events.
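The idea described above — a prediction error whose slowly learned baseline tracks the long-run reward rate, with the slow baseline playing the role the article associates with tonic dopamine activity — can be illustrated with a minimal average-reward TD(0) sketch. This is not the paper's exact model; the ring-shaped state space, reward schedule, and learning rates are invented for illustration.

```python
def average_reward_td(n_states=4, reward_state=0, reward=1.0,
                      alpha=0.1, beta=0.01, steps=20000):
    """TD(0) on a deterministic ring of states; one reward per lap.

    delta plays the role of the phasic prediction-error signal, while
    rho is the slowly updated estimate of average reward per step --
    the kind of slow baseline the article links to tonic DA activity.
    """
    V = [0.0] * n_states   # relative state values (fast predictions)
    rho = 0.0              # slow estimate of long-run reward rate
    s = 0
    for _ in range(steps):
        s_next = (s + 1) % n_states
        r = reward if s_next == reward_state else 0.0
        # prediction error: reward minus rate baseline plus value change
        delta = r - rho + V[s_next] - V[s]
        V[s] += alpha * delta    # fast update of state predictions
        rho += beta * delta      # slow update of the tonic baseline
        s = s_next
    return V, rho

V, rho = average_reward_td()
# With one unit of reward every 4 steps, rho should approach 0.25.
```

In this toy setting the slow variable `rho` settles near the true average reward rate, so transient shifts in it track long-term changes in expected reward rather than individual rewards, mirroring the slow baseline changes the abstract associates with tonic activity.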


Similar Articles

1. Long-term reward prediction in TD models of the dopamine system.
   Neural Comput. 2002 Nov;14(11):2567-83. doi: 10.1162/089976602760407973.
2. Stimulus representation and the timing of reward-prediction errors in models of the dopamine system.
   Neural Comput. 2008 Dec;20(12):3034-54. doi: 10.1162/neco.2008.11-07-654.
3. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.
   Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.
4. Anticipatory responses of dopamine neurons and cortical neurons reproduced by internal model.
   Exp Brain Res. 2001 Sep;140(2):234-40. doi: 10.1007/s002210100814.
5. Representation and timing in theories of the dopamine system.
   Neural Comput. 2006 Jul;18(7):1637-77. doi: 10.1162/neco.2006.18.7.1637.
6. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: a simulated robotic study.
   Neural Netw. 2013 Mar;39:40-51. doi: 10.1016/j.neunet.2012.12.012. Epub 2013 Jan 14.
7. Can the apparent adaptation of dopamine neurons' mismatch sensitivities be reconciled with their computation of reward prediction errors?
   Neurosci Lett. 2008 Jun 13;438(1):14-6. doi: 10.1016/j.neulet.2008.04.059. Epub 2008 Apr 22.
8. A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping.
   J Neurophysiol. 2004 Oct;92(4):2520-9. doi: 10.1152/jn.00238.2004. Epub 2004 May 26.
9. Reward prediction error computation in the pedunculopontine tegmental nucleus neurons.
   Ann N Y Acad Sci. 2007 May;1104:310-23. doi: 10.1196/annals.1390.003. Epub 2007 Mar 7.
10. Dopamine neurons report an error in the temporal prediction of reward during learning.
   Nat Neurosci. 1998 Aug;1(4):304-9. doi: 10.1038/1124.

Citing Articles

1. Wideband ratiometric measurement of tonic and phasic dopamine release in the striatum.
   bioRxiv. 2024 Oct 21:2024.10.17.618918. doi: 10.1101/2024.10.17.618918.
2. Explaining dopamine through prediction errors and beyond.
   Nat Neurosci. 2024 Sep;27(9):1645-1655. doi: 10.1038/s41593-024-01705-4. Epub 2024 Jul 25.
3. Dopamine transients follow a striatal gradient of reward time horizons.
   Nat Neurosci. 2024 Apr;27(4):737-746. doi: 10.1038/s41593-023-01566-3. Epub 2024 Feb 6.
4. Striatal dopamine integrates cost, benefit, and motivation.
   Neuron. 2024 Feb 7;112(3):500-514.e5. doi: 10.1016/j.neuron.2023.10.038. Epub 2023 Nov 27.
5. Local and global reward learning in the lateral frontal cortex show differential development during human adolescence.
   PLoS Biol. 2023 Mar 2;21(3):e3002010. doi: 10.1371/journal.pbio.3002010. eCollection 2023 Mar.
6. Performance-gated deliberation: A context-adapted strategy in which urgency is opportunity cost.
   PLoS Comput Biol. 2022 May 26;18(5):e1010080. doi: 10.1371/journal.pcbi.1010080. eCollection 2022 May.
7. Rats delay gratification during a time-based diminishing returns task.
   J Exp Psychol Anim Learn Cogn. 2021 Oct;47(4):420-428. doi: 10.1037/xan0000305. Epub 2021 Sep 2.
8. Global reward state affects learning and activity in raphe nucleus and anterior insula in monkeys.
   Nat Commun. 2020 Jul 28;11(1):3771. doi: 10.1038/s41467-020-17343-w.
9. Forget-me-some: General versus special purpose models in a hierarchical probabilistic task.
   PLoS One. 2018 Oct 22;13(10):e0205974. doi: 10.1371/journal.pone.0205974. eCollection 2018.
10. The Successor Representation: Its Computational Logic and Neural Substrates.
   J Neurosci. 2018 Aug 15;38(33):7193-7200. doi: 10.1523/JNEUROSCI.0151-18.2018. Epub 2018 Jul 13.