Suppr 超能文献
Short-term memory traces for action bias in human reinforcement learning.

Authors

Bogacz Rafal, McClure Samuel M, Li Jian, Cohen Jonathan D, Montague P Read

Affiliation

Center for the Study of Brain, Mind and Behavior, Princeton University, Princeton, NJ 08544, USA.

Publication

Brain Res. 2007 Jun 11;1153:111-21. doi: 10.1016/j.brainres.2007.03.057. Epub 2007 Mar 24.

DOI: 10.1016/j.brainres.2007.03.057
PMID: 17459346
Abstract

Recent experimental and theoretical work on reinforcement learning has shed light on the neural bases of learning from rewards and punishments. One fundamental problem in reinforcement learning is the credit assignment problem, or how to properly assign credit to actions that lead to reward or punishment following a delay. Temporal difference learning solves this problem, but its efficiency can be significantly improved by the addition of eligibility traces (ET). In essence, ETs function as decaying memories of previous choices that are used to scale synaptic weight changes. It has been shown in theoretical studies that ETs spanning a number of actions may improve the performance of reinforcement learning. However, it remains an open question whether including ETs that persist over sequences of actions allows reinforcement learning models to better fit empirical data regarding the behaviors of humans and other animals. Here, we report an experiment in which human subjects performed a sequential economic decision game in which the long-term optimal strategy differed from the strategy that leads to the greatest short-term return. We demonstrate that human subjects' performance in the task is significantly affected by the time between choices in a surprising and seemingly counterintuitive way. However, this behavior is naturally explained by a temporal difference learning model which includes ETs persisting across actions. Furthermore, we review recent findings that suggest that short-term synaptic plasticity in dopamine neurons may provide a realistic biophysical mechanism for producing ETs that persist on a timescale consistent with behavioral observations.
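The eligibility-trace mechanism the abstract describes, a decaying memory of recent choices that scales weight updates, is the core of the standard TD(λ) algorithm. The sketch below illustrates that mechanism on a simple random-walk value-estimation task; the task, function name, and parameter values are illustrative assumptions, not the experimental design or model fit reported in the paper.

```python
import random

def td_lambda_random_walk(n_states=5, episodes=500, alpha=0.1,
                          gamma=1.0, lam=0.8, seed=0):
    """Estimate state values with TD(lambda) on a random walk.

    States 1..n_states are non-terminal; 0 and n_states+1 are terminal.
    Reaching the right terminal yields reward 1, the left yields 0.
    The eligibility trace e decays by gamma*lam each step, so states
    visited several steps before a reward still receive some credit --
    the "decaying memories of previous choices" the abstract describes.
    """
    rng = random.Random(seed)
    V = [0.0] * (n_states + 2)           # value estimates; terminals stay 0
    for _ in range(episodes):
        e = [0.0] * (n_states + 2)       # eligibility traces, reset per episode
        s = (n_states + 1) // 2          # start in the middle
        while 0 < s < n_states + 1:
            s2 = s + rng.choice([-1, 1])
            r = 1.0 if s2 == n_states + 1 else 0.0
            delta = r + gamma * V[s2] - V[s]   # TD prediction error
            e[s] += 1.0                        # accumulating trace for current state
            for i in range(1, n_states + 1):
                V[i] += alpha * delta * e[i]   # update scaled by each state's trace
                e[i] *= gamma * lam            # decay the memory
            s = s2
    return V[1:n_states + 1]
```

With λ = 0 only the most recent state is updated; raising λ spreads credit backward over whole action sequences, which is the property the paper argues is needed to explain the inter-choice-timing effects in human behavior.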


Similar articles

1
Short-term memory traces for action bias in human reinforcement learning.
Brain Res. 2007 Jun 11;1153:111-21. doi: 10.1016/j.brainres.2007.03.057. Epub 2007 Mar 24.
2
Dynamical model of salience gated working memory, action selection and reinforcement based on basal ganglia and dopamine feedback.
Neural Netw. 2008 Mar-Apr;21(2-3):322-30. doi: 10.1016/j.neunet.2007.12.040. Epub 2007 Dec 31.
3
An implementation of reinforcement learning based on spike timing dependent plasticity.
Biol Cybern. 2008 Dec;99(6):517-23. doi: 10.1007/s00422-008-0265-6. Epub 2008 Oct 22.
4
A synaptic reinforcement-based model for transient amnesia following disruptions of memory consolidation and reconsolidation.
Hippocampus. 2008;18(6):584-601. doi: 10.1002/hipo.20420.
5
A spiking neural network model of an actor-critic learning agent.
Neural Comput. 2009 Feb;21(2):301-39. doi: 10.1162/neco.2008.08-07-593.
6
Attention-gated reinforcement learning of internal representations for classification.
Neural Comput. 2005 Oct;17(10):2176-214. doi: 10.1162/0899766054615699.
7
From recurrent choice to skill learning: a reinforcement-learning model.
J Exp Psychol Gen. 2006 May;135(2):184-206. doi: 10.1037/0096-3445.135.2.184.
8
Multiple model-based reinforcement learning explains dopamine neuronal activity.
Neural Netw. 2007 Aug;20(6):668-75. doi: 10.1016/j.neunet.2007.04.028. Epub 2007 Jun 6.
9
Reinforcement learning, spike-time-dependent plasticity, and the BCM rule.
Neural Comput. 2007 Aug;19(8):2245-79. doi: 10.1162/neco.2007.19.8.2245.
10
Dopamine, prediction error and associative learning: a model-based account.
Network. 2006 Mar;17(1):61-84. doi: 10.1080/09548980500361624.

Cited by

1
State-transition-free reinforcement learning in chimpanzees (Pan troglodytes).
Learn Behav. 2023 Dec;51(4):413-427. doi: 10.3758/s13420-023-00591-3. Epub 2023 Jun 27.
2
Ultrasound modulation of macaque prefrontal cortex selectively alters credit assignment-related activity and behavior.
Sci Adv. 2021 Dec 17;7(51):eabg7700. doi: 10.1126/sciadv.abg7700. Epub 2021 Dec 15.
3
A new model of decision processing in instrumental learning tasks.
Elife. 2021 Jan 27;10:e63055. doi: 10.7554/eLife.63055.
4
Global reward state affects learning and activity in raphe nucleus and anterior insula in monkeys.
Nat Commun. 2020 Jul 28;11(1):3771. doi: 10.1038/s41467-020-17343-w.
5
One-shot learning and behavioral eligibility traces in sequential decision making.
Elife. 2019 Nov 11;8:e47463. doi: 10.7554/eLife.47463.
6
Proactive Information Sampling in Value-Based Decision-Making: Deciding When and Where to Saccade.
Front Hum Neurosci. 2019 Feb 11;13:35. doi: 10.3389/fnhum.2019.00035. eCollection 2019.
7
Dopamine, time perception, and future time perspective.
Psychopharmacology (Berl). 2018 Oct;235(10):2783-2793. doi: 10.1007/s00213-018-4971-z. Epub 2018 Jul 19.
8
Solving the Credit Assignment Problem With the Prefrontal Cortex.
Front Neurosci. 2018 Mar 27;12:182. doi: 10.3389/fnins.2018.00182. eCollection 2018.
9
A simple computational algorithm of model-based choice preference.
Cogn Affect Behav Neurosci. 2017 Aug;17(4):764-783. doi: 10.3758/s13415-017-0511-2.
10
To not settle for small losses: evidence for an ecological aspiration level of zero in dynamic decision-making.
Psychon Bull Rev. 2017 Apr;24(2):536-546. doi: 10.3758/s13423-016-1080-z.