
Neurocomputational mechanisms of reinforcement-guided learning in humans: a review.

Author

Cohen Michael X

Affiliation

University of California, Davis, California, USA.

Publication information

Cogn Affect Behav Neurosci. 2008 Jun;8(2):113-25. doi: 10.3758/cabn.8.2.113.

DOI: 10.3758/cabn.8.2.113
PMID: 18589502

Abstract

Adapting decision making according to dynamic and probabilistic changes in action-reward contingencies is critical for survival in a competitive and resource-limited world. Much research has focused on elucidating the neural systems and computations that underlie how the brain identifies whether the consequences of actions are relatively good or bad. In contrast, less empirical research has focused on the mechanisms by which reinforcements might be used to guide decision making. Here, I review recent studies in which an attempt to bridge this gap has been made by characterizing how humans use reward information to guide and optimize decision making. Regions that have been implicated in reinforcement processing, including the striatum, orbitofrontal cortex, and anterior cingulate, also seem to mediate how reinforcements are used to adjust subsequent decision making. This research provides insights into why the brain devotes resources to evaluating reinforcements and suggests a direction for future research, from studying the mechanisms of reinforcement processing to studying the mechanisms of reinforcement learning.

Similar articles

1. Neurocomputational mechanisms of reinforcement-guided learning in humans: a review.
   Cogn Affect Behav Neurosci. 2008 Jun;8(2):113-25. doi: 10.3758/cabn.8.2.113.
2. Neural basis of reinforcement learning and decision making.
   Annu Rev Neurosci. 2012;35:287-308. doi: 10.1146/annurev-neuro-062111-150512. Epub 2012 Mar 29.
3. Opponent Identity Influences Value Learning in Simple Games.
   J Neurosci. 2015 Aug 5;35(31):11133-43. doi: 10.1523/JNEUROSCI.3530-14.2015.
4. With you or against you: social orientation dependent learning signals guide actions made for others.
   Neuroimage. 2015 Jan 1;104:326-35. doi: 10.1016/j.neuroimage.2014.09.011. Epub 2014 Sep 16.
5. Multiple memory systems as substrates for multiple decision systems.
   Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.
6. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.
   J Cogn Neurosci. 2014 Mar;26(3):635-44. doi: 10.1162/jocn_a_00509. Epub 2013 Oct 29.
7. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.
   J Neurosci. 2007 Nov 21;27(47):12860-7. doi: 10.1523/JNEUROSCI.2496-07.2007.
8. The ubiquity of model-based reinforcement learning.
   Curr Opin Neurobiol. 2012 Dec;22(6):1075-81. doi: 10.1016/j.conb.2012.08.003. Epub 2012 Sep 6.
9. Neural correlates of reinforcement learning and social preferences in competitive bidding.
   J Neurosci. 2013 Jan 30;33(5):2137-46. doi: 10.1523/JNEUROSCI.3095-12.2013.
10. Adversarial vulnerabilities of human decision-making.
    Proc Natl Acad Sci U S A. 2020 Nov 17;117(46):29221-29228. doi: 10.1073/pnas.2016921117. Epub 2020 Nov 4.

Cited by

1. The impact of emotional feedback in learning easy and difficult tasks - an ERP study.
   Cogn Affect Behav Neurosci. 2025 Mar 27. doi: 10.3758/s13415-025-01284-2.
2. Pharmacological Modulation of Dopamine Receptors Reveals Distinct Brain-Wide Networks Associated with Learning and Motivation in Nonhuman Primates.
   J Neurosci. 2025 Feb 5;45(6):e1301242024. doi: 10.1523/JNEUROSCI.1301-24.2024.
3. Pharmacological modulation of dopamine receptors reveals distinct brain-wide networks associated with learning and motivation in non-human primates.
   bioRxiv. 2024 Nov 8:2023.12.27.573487. doi: 10.1101/2023.12.27.573487.
4. Ageing is associated with disrupted reinforcement learning whilst learning to help others is preserved.
   Nat Commun. 2021 Jul 21;12(1):4440. doi: 10.1038/s41467-021-24576-w.
5. Separating Probability and Reversal Learning in a Novel Probabilistic Reversal Learning Task for Mice.
   Front Behav Neurosci. 2020 Jan 9;13:270. doi: 10.3389/fnbeh.2019.00270. eCollection 2019.
6. Examining Procedural Learning and Corticostriatal Pathways for Individual Differences in Language: Testing Endophenotypes of .
   Lang Cogn Neurosci. 2016;31(9):1098-1114. doi: 10.1080/23273798.2015.1089359. Epub 2015 Oct 7.
7. The neural encoding of information prediction errors during non-instrumental information seeking.
   Sci Rep. 2018 Apr 17;8(1):6134. doi: 10.1038/s41598-018-24566-x.
8. Impulsivity and predictive control are associated with suboptimal action-selection and action-value learning in regular gamblers.
   Int Gambl Stud. 2015;15(3):489-505. doi: 10.1080/14459795.2015.1078835. Epub 2015 Nov 15.
9. Diagnosis of ADHD and its Behavioral, Neurologic and Genetic Roots.
   Top Lang Disord. 2012 Jul;32(3):207-227. doi: 10.1097/TLD.0b013e318261ffdd.
10. Early effects of reward anticipation are modulated by dopaminergic stimulation.
    PLoS One. 2014 Oct 6;9(10):e108886. doi: 10.1371/journal.pone.0108886. eCollection 2014.

References

1. Dorsal anterior cingulate cortex integrates reinforcement history to guide voluntary behavior.
   Cortex. 2008 May;44(5):548-59. doi: 10.1016/j.cortex.2007.08.013. Epub 2007 Dec 23.
2. Risk prediction and aversion by anterior cingulate cortex.
   Cogn Affect Behav Neurosci. 2007 Dec;7(4):266-77. doi: 10.3758/cabn.7.4.266.
3. Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning.
   Proc Natl Acad Sci U S A. 2007 Oct 9;104(41):16311-6. doi: 10.1073/pnas.0706111104. Epub 2007 Oct 3.
4. Individual differences and the neural representations of reward expectation and reward prediction error.
   Soc Cogn Affect Neurosci. 2007 Mar;2(1):20-30. doi: 10.1093/scan/nsl021.
5. A computational model of risk, conflict, and individual difference effects in the anterior cingulate cortex.
   Brain Res. 2008 Apr 2;1202:99-108. doi: 10.1016/j.brainres.2007.06.080. Epub 2007 Jul 26.
6. Substantia nigra/ventral tegmental reward prediction error disruption in psychosis.
   Mol Psychiatry. 2008 Mar;13(3):239, 267-76. doi: 10.1038/sj.mp.4002058. Epub 2007 Aug 7.
7. Learning the value of information in an uncertain world.
   Nat Neurosci. 2007 Sep;10(9):1214-21. doi: 10.1038/nn1954. Epub 2007 Aug 5.
8. It's worse than you thought: the feedback negativity and violations of reward prediction in gambling tasks.
   Psychophysiology. 2007 Nov;44(6):905-12. doi: 10.1111/j.1469-8986.2007.00567.x. Epub 2007 Jul 30.
9. Statistics of midbrain dopamine neuron spike trains in the awake primate.
   J Neurophysiol. 2007 Sep;98(3):1428-39. doi: 10.1152/jn.01140.2006. Epub 2007 Jul 5.
10. Expected value, reward outcome, and temporal difference error representations in a probabilistic decision task.
    Cereb Cortex. 2008 Mar;18(3):652-63. doi: 10.1093/cercor/bhm097. Epub 2007 Jun 22.