Credit assignment to state-independent task representations and its relationship with model-based decision making.

Affiliations

Wellcome Centre for Human Neuroimaging, University College London, WC1N 3BG London, United Kingdom;

Department for Imaging Neurosciences, Max Planck University College London Centre for Computational Psychiatry and Ageing Research, WC1B 5EH London, United Kingdom.

Publication information

Proc Natl Acad Sci U S A. 2019 Aug 6;116(32):15871-15876. doi: 10.1073/pnas.1821647116. Epub 2019 Jul 18.

DOI:10.1073/pnas.1821647116
PMID:31320592
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6689934/
Abstract

Model-free learning enables an agent to make better decisions based on prior experience while representing only minimal knowledge about an environment's structure. It is generally assumed that model-free state representations are based on outcome-relevant features of the environment. Here, we challenge this assumption by providing evidence that a putative model-free system assigns credit to task representations that are irrelevant to an outcome. We examined data from 769 individuals performing a well-described 2-step reward decision task where stimulus identity but not spatial-motor aspects of the task predicted reward. We show that participants assigned value to spatial-motor representations despite it being outcome irrelevant. Strikingly, spatial-motor value associations affected behavior across all outcome-relevant features and stages of the task, consistent with credit assignment to low-level state-independent task representations. Individual difference analyses suggested that the impact of spatial-motor value formation was attenuated for individuals who showed greater deployment of goal-directed (model-based) strategies. Our findings highlight a need for a reconsideration of how model-free representations are formed and regulated according to the structure of the environment.
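
The central idea, that a model-free learner can credit reward to an outcome-irrelevant spatial-motor feature, can be illustrated with a toy simulation. The sketch below is not the authors' model or task code: the delta-rule update, the random choice policy, the learning rate, and the reward probabilities are all assumptions chosen for illustration, and the names `delta_update` and `simulate` are hypothetical.

```python
import random


def delta_update(q, reward, alpha):
    """One model-free (delta-rule) update of a cached value toward the reward."""
    return q + alpha * (reward - q)


def simulate(n_trials=2000, alpha=0.05, seed=0):
    """Toy learner in a one-stage choice between two stimuli.

    Reward depends only on stimulus identity, but the learner also
    credits the response side it used (an outcome-irrelevant feature).
    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    reward_prob = {"A": 0.8, "B": 0.2}          # hypothetical reward probabilities
    q_stim = {"A": 0.0, "B": 0.0}               # values of stimulus identities
    q_side = {"left": 0.0, "right": 0.0}        # values of response sides

    for _ in range(n_trials):
        # Stimuli swap sides at random, so side carries no reward information.
        left_stim = rng.choice(["A", "B"])
        right_stim = "B" if left_stim == "A" else "A"

        # Random choice policy, to keep the sketch minimal.
        side = rng.choice(["left", "right"])
        stim = left_stim if side == "left" else right_stim

        reward = 1.0 if rng.random() < reward_prob[stim] else 0.0

        # Credit goes to the relevant feature and to the irrelevant one.
        q_stim[stim] = delta_update(q_stim[stim], reward, alpha)
        q_side[side] = delta_update(q_side[side], reward, alpha)

    return q_stim, q_side


if __name__ == "__main__":
    q_stim, q_side = simulate()
    print("stimulus values:", q_stim)
    print("side values:", q_side)
```

In this sketch the side values track recent reward history even though side does not predict reward, so a policy that read them out would show a spatial-motor bias of the kind the abstract describes. How such values actually enter choice in the study is not specified in the abstract and is left out here.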

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba47/6689934/6f8b229d60f3/pnas.1821647116fig04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba47/6689934/9bf8445492f6/pnas.1821647116fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba47/6689934/75b640451673/pnas.1821647116fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba47/6689934/d9d1286f97f7/pnas.1821647116fig03.jpg
