Hyperbolically discounted temporal difference learning.

Affiliation

Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA.

Publication information

Neural Comput. 2010 Jun;22(6):1511-27. doi: 10.1162/neco.2010.08-09-1080.

DOI: 10.1162/neco.2010.08-09-1080
PMID: 20100071
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3005720/
Abstract

Hyperbolic discounting of future outcomes is widely observed to underlie choice behavior in animals. Additionally, recent studies (Kobayashi & Schultz, 2008) have reported that hyperbolic discounting is observed even in neural systems underlying choice. However, the most prevalent models of temporal discounting, such as temporal difference learning, assume that future outcomes are discounted exponentially. Exponential discounting has been preferred largely because it can be expressed recursively, whereas hyperbolic discounting has heretofore been thought not to have a recursive definition. In this letter, we define a learning algorithm, hyperbolically discounted temporal difference (HDTD) learning, which constitutes a recursive formulation of the hyperbolic model.

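The abstract's key contrast can be illustrated numerically: exponential discounting admits the recursive Bellman form V(t) = r(t) + γ·V(t+1), while the hyperbolic factor 1/(1 + k·d) has no such one-step decomposition in d. The sketch below is not the authors' HDTD algorithm; it only compares the two discount curves and demonstrates the well-known identity that a hyperbolic curve equals an exponentially weighted average of exponential curves (since ∫₀^∞ e^{-θ} e^{-θkd} dθ = 1/(1 + kd)), one standard route toward a recursive treatment. All function names and parameter values are illustrative.

```python
import numpy as np

def exponential_discount(d, gamma=0.9):
    """Exponential discount of a reward delayed by d steps.
    Recursive: each extra step multiplies the value by gamma."""
    return gamma ** d

def hyperbolic_discount(d, k=0.1):
    """Hyperbolic discount 1/(1 + k*d), the form observed behaviorally.
    Note there is no single constant c with f(d+1) = c * f(d)."""
    return 1.0 / (1.0 + k * d)

def hyperbolic_via_exponentials(d, k=0.1, n=4000, theta_max=60.0):
    """Approximate 1/(1 + k*d) as an average of exponential discounters
    exp(-theta*k*d), weighted by the density exp(-theta). The weighted
    mean over a fine grid of theta converges to the hyperbolic curve."""
    thetas = np.linspace(0.0, theta_max, n)
    weights = np.exp(-thetas)                    # weight on each decay rate
    curves = np.exp(-thetas * k * d)             # one exponential per theta
    return float(np.sum(weights * curves) / np.sum(weights))
```

For example, at delay d = 10 with k = 0.1 the hyperbolic factor is 1/(1 + 1) = 0.5, and the mixture-of-exponentials approximation recovers it to within numerical error; this mixture view is what makes a recursive, TD-style update conceivable for hyperbolic discounting.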

Similar articles

1. Hyperbolically discounted temporal difference learning.
   Neural Comput. 2010 Jun;22(6):1511-27. doi: 10.1162/neco.2010.08-09-1080.
2. Discounting of reward sequences: a test of competing formal models of hyperbolic discounting.
   Front Psychol. 2014 Mar 6;5:178. doi: 10.3389/fpsyg.2014.00178. eCollection 2014.
3. Reward-modulated Hebbian learning of decision making.
   Neural Comput. 2010 Jun;22(6):1399-444. doi: 10.1162/neco.2010.03-09-980.
4. Supervised learning in spiking neural networks with ReSuMe: sequence learning, classification, and spike shifting.
   Neural Comput. 2010 Feb;22(2):467-510. doi: 10.1162/neco.2009.11-08-901.
5. Spatial and temporal pattern analysis via spiking neurons.
   Network. 1998 Aug;9(3):319-32.
6. Spiking neuron model for temporal sequence recognition.
   Neural Comput. 2010 Jan;22(1):61-93. doi: 10.1162/neco.2009.12-07-679.
7. Estimating Scale-Invariant Future in Continuous Time.
   Neural Comput. 2019 Apr;31(4):681-709. doi: 10.1162/neco_a_01171. Epub 2019 Feb 14.
8. Bayesian spiking neurons II: learning.
   Neural Comput. 2008 Jan;20(1):118-45. doi: 10.1162/neco.2008.20.1.118.
9. Efficient continuous-time asymmetric Hopfield networks for memory retrieval.
   Neural Comput. 2010 Jun;22(6):1597-614. doi: 10.1162/neco.2010.05-09-1014.
10. Systematic fluctuation expansion for neural network activity equations.
    Neural Comput. 2010 Feb;22(2):377-426. doi: 10.1162/neco.2009.02-09-960.

Cited by

1. Cognitive mechanisms of learning in sequential decision-making under uncertainty: an experimental and theoretical approach.
   Front Behav Neurosci. 2024 Aug 12;18:1399394. doi: 10.3389/fnbeh.2024.1399394. eCollection 2024.
2. Exponential history integration with diverse temporal scales in retrosplenial cortex supports hyperbolic behavior.
   Sci Adv. 2023 Dec;9(48):eadj4897. doi: 10.1126/sciadv.adj4897. Epub 2023 Nov 29.
3. Cognitive bias and how to improve sustainable decision making.
   Front Psychol. 2023 Feb 28;14:1129835. doi: 10.3389/fpsyg.2023.1129835. eCollection 2023.
4. Retention and Transfer of Cognitive Bias Mitigation Interventions: A Systematic Literature Study.
   Front Psychol. 2021 Aug 12;12:629354. doi: 10.3389/fpsyg.2021.629354. eCollection 2021.
5. A Neural Network Framework for Cognitive Bias.
   Front Psychol. 2018 Sep 3;9:1561. doi: 10.3389/fpsyg.2018.01561. eCollection 2018.
6. Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective.
   Front Behav Neurosci. 2014 Mar 12;8:76. doi: 10.3389/fnbeh.2014.00076. eCollection 2014.
7. Discounting of reward sequences: a test of competing formal models of hyperbolic discounting.
   Front Psychol. 2014 Mar 6;5:178. doi: 10.3389/fpsyg.2014.00178. eCollection 2014.
8. Don't let me do that! - models of precommitment.
   Front Neurosci. 2012 Oct 8;6:138. doi: 10.3389/fnins.2012.00138. eCollection 2012.
9. Dopamine neurons learn to encode the long-term value of multiple future rewards.
   Proc Natl Acad Sci U S A. 2011 Sep 13;108(37):15462-7. doi: 10.1073/pnas.1014457108. Epub 2011 Sep 6.
10. A reinforcement learning model of precommitment in decision making.
    Front Behav Neurosci. 2010 Dec 14;4:184. doi: 10.3389/fnbeh.2010.00184. eCollection 2010.

References

1. Preference for sequences of rewards: further tests of a parallel discounting model.
   Behav Processes. 1999 Apr;45(1-3):87-99. doi: 10.1016/s0376-6357(99)00011-x.
2. Influence of reward delays on responses of dopamine neurons.
   J Neurosci. 2008 Jul 30;28(31):7837-46. doi: 10.1523/JNEUROSCI.1600-08.2008.
3. Low-serotonin levels increase delayed reward discounting in humans.
   J Neurosci. 2008 Apr 23;28(17):4528-32. doi: 10.1523/JNEUROSCI.4982-07.2008.
4. Discounting of delayed rewards: Models of individual choice.
   J Exp Anal Behav. 1995 Nov;64(3):263-76. doi: 10.1901/jeab.1995.64-263.
5. Long-term reward prediction in TD models of the dopamine system.
   Neural Comput. 2002 Nov;14(11):2567-83. doi: 10.1162/089976602760407973.
6. Temporal difference model reproduces anticipatory neural activity.
   Neural Comput. 2001 Apr;13(4):841-62. doi: 10.1162/089976601300014376.
7. Reward-predicting and reward-detecting neuronal activity in the primate supplementary eye field.
   J Neurophysiol. 2000 Oct;84(4):2166-70. doi: 10.1152/jn.2000.84.4.2166.
8. Reward processing in primate orbitofrontal cortex and basal ganglia.
   Cereb Cortex. 2000 Mar;10(3):272-84. doi: 10.1093/cercor/10.3.272.
9. Reward-related activity in the monkey striatum and substantia nigra.
   Prog Brain Res. 1993;99:227-35. doi: 10.1016/s0079-6123(08)61349-7.
10. Neuronal activity in monkey ventral striatum related to the expectation of reward.
    J Neurosci. 1992 Dec;12(12):4595-610. doi: 10.1523/JNEUROSCI.12-12-04595.1992.