Retrospective model-based inference guides model-free credit assignment.

Affiliations

Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, 10-12 Russell Square, London, WC1B 5EH, UK.

Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3BG, United Kingdom.

Publication information

Nat Commun. 2019 Feb 14;10(1):750. doi: 10.1038/s41467-019-08662-8.

DOI:10.1038/s41467-019-08662-8

PMID:30765718

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6375980/

Abstract

An extensive reinforcement learning literature shows that organisms assign credit efficiently, even under conditions of state uncertainty. However, little is known about credit assignment when state uncertainty is subsequently resolved. Here, we address this problem within the framework of an interaction between model-free (MF) and model-based (MB) control systems. We present and support experimentally a theory of MB retrospective inference. Within this framework, an MB system resolves uncertainty that prevailed when actions were taken, thus guiding MF credit assignment. Using a task in which there was initial uncertainty about the lotteries that were chosen, we found that when participants' momentary uncertainty about which lottery had generated an outcome was resolved by provision of subsequent information, participants preferentially assigned credit within an MF system to the lottery they retrospectively inferred was responsible for this outcome. These findings extend our knowledge about the range of MB functions and the scope of system interactions.
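The mechanism described in the abstract can be illustrated with a toy sketch. It is not the authors' computational model: the class `MFLearner`, the function `assign_credit`, the two learning rates, and the lottery labels `"A"` and `"B"` are all hypothetical, chosen only to show how a retrospectively inferred lottery might receive a larger model-free update than the other candidate.

```python
from dataclasses import dataclass, field


@dataclass
class MFLearner:
    """Toy model-free learner caching one value per lottery (hypothetical)."""
    lr_inferred: float = 0.5  # assumed: larger update for the inferred lottery
    lr_other: float = 0.1     # assumed: smaller update for the other candidate
    values: dict = field(default_factory=dict)

    def update(self, lottery, reward, learning_rate):
        # Standard delta rule: move the cached value toward the reward.
        old = self.values.get(lottery, 0.0)
        self.values[lottery] = old + learning_rate * (reward - old)


def assign_credit(learner, candidates, inferred, reward):
    """Credit every candidate lottery, favouring the one that later
    information (the model-based inference) identifies as responsible."""
    for lottery in candidates:
        if lottery == inferred:
            rate = learner.lr_inferred
        else:
            rate = learner.lr_other
        learner.update(lottery, reward, rate)


# An outcome arrives while it is unclear whether lottery "A" or "B"
# produced it; subsequent information points to "A".
learner = MFLearner()
assign_credit(learner, candidates=("A", "B"), inferred="A", reward=1.0)
print(learner.values)
```

Under these assumed rates, the inferred lottery ends up with the higher cached value after a rewarded trial, which is the qualitative pattern the abstract reports. The abstract does not specify the size of that difference.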
