Neural computations underlying inverse reinforcement learning in the human brain.

Affiliations

Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, United States.

Computation and Neural Systems Program, California Institute of Technology, Pasadena, United States.

Publication information

Elife. 2017 Oct 30;6:e29718. doi: 10.7554/eLife.29718.

DOI:10.7554/eLife.29718

PMID:29083301

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5662289/

Abstract

In inverse reinforcement learning an observer infers the reward distribution available for actions in the environment solely through observing the actions implemented by another agent. To address whether this computational process is implemented in the human brain, participants underwent fMRI while learning about slot machines yielding hidden preferred and non-preferred food outcomes with varying probabilities, through observing the repeated slot choices of agents with similar and dissimilar food preferences. Using formal model comparison, we found that participants implemented inverse RL as opposed to a simple imitation strategy, in which the actions of the other agent are copied instead of inferring the underlying reward structure of the decision problem. Our computational fMRI analysis revealed that anterior dorsomedial prefrontal cortex encoded inferences about action-values within the value space of the agent as opposed to that of the observer, demonstrating that inverse RL is an abstract cognitive process divorceable from the values and concerns of the observer him/herself.
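
The contrast between imitation and inverse RL described in the abstract can be illustrated with a toy sketch. This is not the authors' computational model: the two-machine setup, the softmax choice rule, the grid posterior, the `beta` parameter, and all function names are assumptions made for illustration only. The observer watches an agent choose between two machines, infers how likely each machine is to yield food A given the agent's known food preference, and then re-evaluates the machines under its own preference.

```python
import numpy as np

def infer_outcome_probs(choices, agent_utils, beta=5.0, grid_size=21):
    """Posterior mean of P(food A) for machines 0 and 1, inferred from choices.

    Assumes (hypothetically) that the agent picks machines with a softmax
    rule over expected utility, and that the observer holds a uniform prior.
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    p0, p1 = np.meshgrid(grid, grid, indexing="ij")
    u_a, u_b = agent_utils  # agent's utility for food A and food B
    v0 = p0 * u_a + (1.0 - p0) * u_b  # agent's expected value of machine 0
    v1 = p1 * u_a + (1.0 - p1) * u_b  # agent's expected value of machine 1
    prob_choose_1 = 1.0 / (1.0 + np.exp(-beta * (v1 - v0)))

    log_post = np.zeros_like(p0)  # log of a uniform prior (up to a constant)
    for c in choices:
        log_post += np.log(prob_choose_1 if c == 1 else 1.0 - prob_choose_1)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float((post * p0).sum()), float((post * p1).sum())

def inverse_rl_choice(choices, agent_utils, observer_utils):
    """Pick the machine that is best under the observer's own utilities."""
    p0_hat, p1_hat = infer_outcome_probs(choices, agent_utils)
    u_a, u_b = observer_utils
    v0 = p0_hat * u_a + (1.0 - p0_hat) * u_b
    v1 = p1_hat * u_a + (1.0 - p1_hat) * u_b
    return int(v1 > v0)

def imitation_choice(choices):
    """Copy whichever machine the agent chose most often."""
    return int(np.mean(choices) >= 0.5)

# The agent prefers food A and mostly chooses machine 1.
observed = [1, 1, 1, 0, 1, 1, 1, 1]
agent_prefers_a = (1.0, 0.0)
similar_observer = (1.0, 0.0)     # also prefers food A
dissimilar_observer = (0.0, 1.0)  # prefers food B

print(imitation_choice(observed))
print(inverse_rl_choice(observed, agent_prefers_a, similar_observer))
print(inverse_rl_choice(observed, agent_prefers_a, dissimilar_observer))
```

In this sketch the imitation strategy copies the agent regardless of whose preferences are at stake, while the inverse RL observer selects the other machine when its food preference differs from the agent's. That difference is what allows the two strategies to be told apart behaviourally.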

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6dbe/5662289/703b91987487/elife-29718-fig1.jpg
