
Hyperbolically discounted temporal difference learning.

Affiliation

Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA.

Publication information

Neural Comput. 2010 Jun;22(6):1511-27. doi: 10.1162/neco.2010.08-09-1080.

Abstract

Hyperbolic discounting of future outcomes is widely observed to underlie choice behavior in animals. Additionally, recent studies (Kobayashi & Schultz, 2008) have reported that hyperbolic discounting is observed even in neural systems underlying choice. However, the most prevalent models of temporal discounting, such as temporal difference learning, assume that future outcomes are discounted exponentially. Exponential discounting has been preferred largely because it can be expressed recursively, whereas hyperbolic discounting has heretofore been thought not to have a recursive definition. In this letter, we define a learning algorithm, hyperbolically discounted temporal difference (HDTD) learning, which constitutes a recursive formulation of the hyperbolic model.
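
For context, the standard discount functions from the behavioral literature take the following forms (these are well-established equations, not formulas quoted from the letter itself): for a reward r received after delay t, with discount factor gamma in (0, 1) and hyperbolic rate k > 0,

    V_exp(t) = gamma^t * r         (exponential)
    V_hyp(t) = r / (1 + k*t)       (hyperbolic, Mazur's form)

Exponential discounting is recursive because gamma^(t+1) = gamma * gamma^t, which yields the familiar TD(0)/Bellman relation V(s_t) = r_t + gamma * V(s_{t+1}). By contrast, no single constant c satisfies 1/(1 + k(t+1)) = c/(1 + k*t) for all t, which is why hyperbolic discounting had been thought to lack a recursive formulation.

The sketch below shows standard TD(0) with exponential discounting, to make that contrast concrete. It is not the HDTD update, whose details are not given in this abstract; the function name and parameter values are illustrative.

    import numpy as np

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
        """One standard TD(0) step under exponential discounting.

        V      : array of estimated state values
        s      : index of the current state
        r      : reward observed on the transition
        s_next : index of the successor state
        """
        td_error = r + gamma * V[s_next] - V[s]  # TD error delta_t
        V[s] += alpha * td_error                 # move V[s] toward its target
        return V

    # Example: three-state chain, reward of 1 on the transition into state 2.
    V = np.zeros(3)
    V = td0_update(V, s=1, r=1.0, s_next=2)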

Similar articles

1. Hyperbolically discounted temporal difference learning. Neural Comput. 2010 Jun;22(6):1511-27. doi: 10.1162/neco.2010.08-09-1080.
2. Reward-modulated Hebbian learning of decision making. Neural Comput. 2010 Jun;22(6):1399-444. doi: 10.1162/neco.2010.03-09-980.
3. Estimating Scale-Invariant Future in Continuous Time. Neural Comput. 2019 Apr;31(4):681-709. doi: 10.1162/neco_a_01171. Epub 2019 Feb 14.
4. Bayesian spiking neurons II: learning. Neural Comput. 2008 Jan;20(1):118-45. doi: 10.1162/neco.2008.20.1.118.

Cited by

1. Cognitive bias and how to improve sustainable decision making. Front Psychol. 2023 Feb 28;14:1129835. doi: 10.3389/fpsyg.2023.1129835. eCollection 2023.
2. A Neural Network Framework for Cognitive Bias. Front Psychol. 2018 Sep 3;9:1561. doi: 10.3389/fpsyg.2018.01561. eCollection 2018.
3. Don't let me do that! - models of precommitment. Front Neurosci. 2012 Oct 8;6:138. doi: 10.3389/fnins.2012.00138. eCollection 2012.
4. Dopamine neurons learn to encode the long-term value of multiple future rewards. Proc Natl Acad Sci U S A. 2011 Sep 13;108(37):15462-7. doi: 10.1073/pnas.1014457108. Epub 2011 Sep 6.
5. A reinforcement learning model of precommitment in decision making. Front Behav Neurosci. 2010 Dec 14;4:184. doi: 10.3389/fnbeh.2010.00184. eCollection 2010.

References

1. Influence of reward delays on responses of dopamine neurons. J Neurosci. 2008 Jul 30;28(31):7837-46. doi: 10.1523/JNEUROSCI.1600-08.2008.
2. Discounting of delayed rewards: Models of individual choice. J Exp Anal Behav. 1995 Nov;64(3):263-76. doi: 10.1901/jeab.1995.64-263.
