Yu Yen, Chang Acer Y C, Kanai Ryota
Araya, Inc., Tokyo, Japan.
Front Neurorobot. 2019 Jan 22;12:88. doi: 10.3389/fnbot.2018.00088. eCollection 2018.
This paper presents the Homeo-Heterostatic Value Gradients (HHVG) algorithm as a formal account of the constructive interplay between boredom and curiosity, which gives rise to effective exploration and superior forward model learning. We offer an instrumental view of action selection, in which an action serves to disclose outcomes that are intrinsically meaningful to the agent itself. This view motivated two central algorithmic ingredients, devaluation and devaluation progress, both of which underpin the agent's cognition concerning intrinsically generated rewards. The two serve as an instantiation of homeostatic and heterostatic intrinsic motivation. A key insight from our algorithm is that these two seemingly opposite motivations can be reconciled, and that without this reconciliation exploration and information-gathering cannot be carried out effectively. We supported this claim with empirical evidence, showing that boredom-enabled agents consistently outperformed other curious or explorative agent variants in model-building benchmarks based on self-assisted experience accumulation.