Reinforcement learning for discounted values often loses the goal in the application to animal learning.

Affiliations

Graduate School of Brain Sciences, Tamagawa University, Tokyo, Japan.

Publication info

Neural Netw. 2012 Nov;35:88-91. doi: 10.1016/j.neunet.2012.08.004. Epub 2012 Aug 24.

DOI: 10.1016/j.neunet.2012.08.004
PMID: 22960494
Abstract

The impulsive preference of an animal for an immediate reward implies that it might subjectively discount the value of potential future outcomes. A theoretical framework for maximizing the discounted subjective value has been established in reinforcement learning theory, and this framework has been successfully applied in engineering. However, this study identified a limitation when the framework is applied to animal behavior: in some cases, there is no well-defined learning goal. Here a possible learning framework was proposed that is well-posed in all cases and that is consistent with the impulsive preference.
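As a concrete illustration of the discounted-value framework the abstract refers to, the sketch below shows how exponential discounting of delayed rewards can produce the impulsive preference described above. This is only a minimal illustration of standard exponential discounting; the reward magnitudes, delays, and discount factors are hypothetical and are not taken from the paper, and the code does not implement the paper's proposed alternative framework.

```python
# Minimal sketch of exponential temporal discounting, the valuation
# used in standard discounted-value reinforcement learning.
# All reward magnitudes, delays, and discount factors below are
# hypothetical illustrations, not values from the paper.

def discounted_value(reward: float, delay: int, gamma: float) -> float:
    """Subjective value of `reward` received after `delay` time steps,
    discounted exponentially by factor `gamma` per step."""
    return reward * gamma ** delay

# A small immediate reward versus a larger delayed reward.
small_now = discounted_value(1.0, delay=0, gamma=0.5)
large_later = discounted_value(3.0, delay=4, gamma=0.5)

# With steep discounting (gamma = 0.5), the immediate reward is
# subjectively worth more: the "impulsive" choice.
print(small_now > large_later)  # prints True

# With mild discounting (gamma = 0.95), the delayed reward wins.
print(discounted_value(1.0, 0, 0.95) < discounted_value(3.0, 4, 0.95))  # prints True
```

The crossover between these two regimes is what makes the discount factor a model of impulsivity: the steeper the discounting, the shorter the effective planning horizon of the agent.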

Similar articles

1. Reinforcement learning for discounted values often loses the goal in the application to animal learning.
   Neural Netw. 2012 Nov;35:88-91. doi: 10.1016/j.neunet.2012.08.004. Epub 2012 Aug 24.
2. [The model of the reward choice basing on the theory of reinforcement learning].
   Zh Vyssh Nerv Deiat Im I P Pavlova. 2007 Mar-Apr;57(2):133-43.
3. Beyond simple reinforcement learning: the computational neurobiology of reward-learning and valuation.
   Eur J Neurosci. 2012 Apr;35(7):987-90. doi: 10.1111/j.1460-9568.2012.08074.x.
4. Model-based reinforcement learning under concurrent schedules of reinforcement in rodents.
   Learn Mem. 2009 Apr 29;16(5):315-23. doi: 10.1101/lm.1295509. Print 2009 May.
5. Immediate return preference emerged from a synaptic learning rule for return maximization.
   Neural Netw. 2015 Feb;62:83-90. doi: 10.1016/j.neunet.2014.04.004. Epub 2014 May 14.
6. [Reinforcement learning by striatum].
   Brain Nerve. 2009 Apr;61(4):405-11.
7. Theory meets pigeons: the influence of reward-magnitude on discrimination-learning.
   Behav Brain Res. 2009 Mar 2;198(1):125-9. doi: 10.1016/j.bbr.2008.10.038. Epub 2008 Nov 8.
8. SOVEREIGN: An autonomous neural system for incrementally learning planned action sequences to navigate towards a rewarded goal.
   Neural Netw. 2008 Jun;21(5):699-758. doi: 10.1016/j.neunet.2007.09.016. Epub 2007 Oct 7.
9. Reward-weighted regression with sample reuse for direct policy search in reinforcement learning.
   Neural Comput. 2011 Nov;23(11):2798-832. doi: 10.1162/NECO_a_00199. Epub 2011 Aug 18.
10. Cost, benefit, tonic, phasic: what do response rates tell us about dopamine and motivation?
    Ann N Y Acad Sci. 2007 May;1104:357-76. doi: 10.1196/annals.1390.018. Epub 2007 Apr 7.