Boredom-Driven Curious Learning by Homeo-Heterostatic Value Gradients.

Author information

Yu Yen, Chang Acer Y C, Kanai Ryota

Affiliation

Araya, Inc., Tokyo, Japan.

Publication information

Front Neurorobot. 2019 Jan 22;12:88. doi: 10.3389/fnbot.2018.00088. eCollection 2018.

Abstract

This paper presents the Homeo-Heterostatic Value Gradients (HHVG) algorithm as a formal account of the constructive interplay between boredom and curiosity, which gives rise to effective exploration and superior forward model learning. We offer an instrumental view of action selection, in which an action serves to disclose outcomes that are intrinsically meaningful to the agent itself. This view motivates two central algorithmic ingredients, devaluation and devaluation progress, both of which underpin the agent's cognition concerning intrinsically generated rewards. The two serve as an instantiation of homeostatic and heterostatic intrinsic motivation. A key insight from our algorithm is that these two seemingly opposite motivations can be reconciled; without this reconciliation, exploration and information gathering cannot be carried out effectively. We support this claim with empirical evidence showing that boredom-enabled agents consistently outperformed other curious or explorative agent variants in model-building benchmarks based on self-assisted experience accumulation.
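The interplay between devaluation and devaluation progress can be made concrete with a deliberately simplified toy. This is not the HHVG algorithm from the paper: it assumes discrete outcomes, a fixed lookup table standing in for the learned forward model, and a hypothetical multiplicative `decay` rate. All names (`ToyBoredomAgent`, `devalue`, `expected_progress`, `select_action`, `decay`) are illustrative. Experiencing an outcome lowers its intrinsic value (a boredom-like devaluation), and the size of that drop is treated as devaluation progress, which the agent seeks when choosing actions.

```python
from collections import defaultdict


class ToyBoredomAgent:
    """Hypothetical toy illustration of devaluation and devaluation progress.

    This is NOT the HHVG algorithm; it only mimics the intuition that
    familiar outcomes lose intrinsic value (boredom) while the agent seeks
    outcomes whose value can still drop (progress).
    """

    def __init__(self, actions, decay=0.5):
        self.actions = list(actions)
        self.decay = decay  # hypothetical devaluation rate in (0, 1]
        # Every unseen outcome starts with the same intrinsic value of 1.0.
        self.value = defaultdict(lambda: 1.0)

    def devalue(self, outcome):
        """Experience an outcome: shrink its value and return the drop."""
        before = self.value[outcome]
        self.value[outcome] = before * (1.0 - self.decay)
        return before - self.value[outcome]  # devaluation progress

    def expected_progress(self, outcome):
        """Drop that devalue(outcome) would produce if called now."""
        return self.value[outcome] * self.decay

    def select_action(self, predict):
        """Pick the action whose predicted outcome promises the most progress.

        `predict` maps an action to an outcome; here it stands in for a
        forward model (an assumption). Ties go to the earliest action.
        """
        return max(self.actions, key=lambda a: self.expected_progress(predict(a)))


# Usage: a deterministic two-outcome world where "stay" duplicates "left".
agent = ToyBoredomAgent(["left", "right", "stay"], decay=0.5)
forward = {"left": "A", "right": "B", "stay": "A"}.get
for step in range(4):
    action = agent.select_action(forward)
    gain = agent.devalue(forward(action))
    print(step, action, gain)
```

Because a visited outcome loses value, the toy agent alternates between the two outcomes instead of repeating the first one it found. With a learned forward model the prediction step would itself be uncertain, which this sketch ignores.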

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2d3/6349823/8c05119913e2/fnbot-12-00088-g0001.jpg