Optimizing agent behavior over long time scales by transporting value.

Affiliation

DeepMind, 5 New Street Square, London, EC4A 3TW, UK.

Publication information

Nat Commun. 2019 Nov 19;10(1):5223. doi: 10.1038/s41467-019-13073-w.

Abstract

Humans prolifically engage in mental time travel. We dwell on past actions and experience satisfaction or regret. More than storytelling, these recollections change how we act in the future and endow us with a computationally important ability to link actions and consequences across spans of time, which helps address the problem of long-term credit assignment: the question of how to evaluate the utility of actions within a long-duration behavioral sequence. Existing approaches to credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a paradigm where agents use recall of specific memories to credit past actions, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire models in neuroscience, psychology, and behavioral economics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb99/6864102/29929d7a3e25/41467_2019_13073_Fig1_HTML.jpg
