Optimizing agent behavior over long time scales by transporting value.

Affiliation

DeepMind, 5 New Street Square, London, EC4A 3TW, UK.

Publication information

Nat Commun. 2019 Nov 19;10(1):5223. doi: 10.1038/s41467-019-13073-w.

DOI:10.1038/s41467-019-13073-w

PMID:31745075

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6864102/

Abstract

Humans prolifically engage in mental time travel. We dwell on past actions and experience satisfaction or regret. More than storytelling, these recollections change how we act in the future and endow us with a computationally important ability to link actions and consequences across spans of time, which helps address the problem of long-term credit assignment: the question of how to evaluate the utility of actions within a long-duration behavioral sequence. Existing approaches to credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a paradigm where agents use recall of specific memories to credit past actions, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire models in neuroscience, psychology, and behavioral economics.
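To make the long-term credit assignment problem concrete, the following minimal sketch shows how little signal a delayed reward contributes under a standard discounted return. It is not the method introduced in the paper. It only illustrates, under assumed values, why long delays between an action and its consequence are hard for conventional discounted credit assignment. The function names and the discount factor of 0.99 are hypothetical choices made for illustration.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def delayed_reward_credit(delay, gamma):
    """Discounted return seen at the time of an action whose single
    unit reward arrives `delay` steps later (all other rewards are zero)."""
    rewards = [0.0] * delay + [1.0]
    return discounted_return(rewards, gamma)


if __name__ == "__main__":
    # gamma = 0.99 is an assumed, commonly used discount factor.
    for delay in (10, 100, 1000):
        print(delay, delayed_reward_credit(delay, gamma=0.99))
```

With these assumed settings, the credit reaching the action shrinks geometrically with the delay, so a reward 1000 steps away is almost invisible to the action that caused it. Raising the discount factor toward 1 preserves more of the signal but also lets unrelated intervening rewards add variance, which is one way to see why the abstract treats long delays as a distinct problem.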

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb99/6864102/29929d7a3e25/41467_2019_13073_Fig1_HTML.jpg
