Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

Author Information

Uchibe Eiji, Doya Kenji

Affiliation

Okinawa Institute of Science and Technology, Okinawa 904-2234, Japan.

Publication Information

Neural Netw. 2008 Dec;21(10):1447-55. doi: 10.1016/j.neunet.2008.09.013. Epub 2008 Oct 9.

Abstract

Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.
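
The abstract describes the approach only at a high level. The toy Python sketch below is purely illustrative and is not the authors' implementation: it assumes a made-up environment in which each agent of a small population learns a linear-softmax policy by a REINFORCE-style gradient on the sum of a sparse extrinsic reward and a feature-weighted intrinsic reward, while the intrinsic weight vectors are "evolved" by copying them, with mutation, from agents whose average extrinsic reward satisfies a simple constraint. Every name and parameter here (run_episode, reinforce_step, EXTRINSIC_THRESHOLD, the two features, the three actions) is hypothetical.

```python
# Illustrative sketch only: policy-gradient learning of a combined reward
#   r_total = r_extrinsic + w * phi(s)
# with intrinsic weights w evolved across a population of agents, subject to a
# constraint on average extrinsic reward. Environment and names are made up.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 2             # toy visual features: "battery pack in view", "other robot in view"
N_ACTIONS = 3              # toy actions: approach battery, approach robot, wander
EXTRINSIC_THRESHOLD = 0.05 # constraint on mean extrinsic reward per step


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def run_episode(theta, w, steps=50):
    """Roll out a linear-softmax policy; return trajectory and mean extrinsic reward."""
    traj, ext_sum = [], 0.0
    for _ in range(steps):
        phi = rng.random(N_FEATURES)      # random toy features in [0, 1]
        probs = softmax(theta @ phi)      # theta has shape (N_ACTIONS, N_FEATURES)
        a = rng.choice(N_ACTIONS, p=probs)
        # Sparse extrinsic reward: only when the approached object is very close.
        r_ext = 1.0 if (a < N_FEATURES and phi[a] > 0.9) else 0.0
        # Dense intrinsic reward: weighted feature value for approach actions.
        r_int = float(w[a] * phi[a]) if a < N_FEATURES else 0.0
        ext_sum += r_ext
        traj.append((phi, a, probs, r_ext + r_int))
    return traj, ext_sum / steps


def reinforce_step(theta, traj, lr=0.1):
    """One REINFORCE update on the combined (extrinsic + intrinsic) return."""
    G = sum(r for _, _, _, r in traj)
    grad = np.zeros_like(theta)
    for phi, a, probs, _ in traj:
        for b in range(N_ACTIONS):        # gradient of log softmax policy
            grad[b] += ((1.0 if b == a else 0.0) - probs[b]) * phi
    return theta + lr * G * grad / len(traj)


# "Embodied evolution" sketch: each agent carries its own intrinsic weights; agents
# that satisfy the extrinsic-reward constraint pass mutated copies of their weights
# to the weakest agents, loosely mimicking 'mating' by software reproduction.
population = [{"theta": np.zeros((N_ACTIONS, N_FEATURES)),
               "w": rng.normal(0.0, 0.5, N_FEATURES)} for _ in range(8)]

for generation in range(20):
    fitness = []
    for agent in population:
        mean_ext = 0.0
        for _ in range(10):               # lifetime reinforcement learning
            traj, mean_ext = run_episode(agent["theta"], agent["w"])
            agent["theta"] = reinforce_step(agent["theta"], traj)
        fitness.append(mean_ext)
    order = np.argsort(fitness)
    for weak, strong in zip(order[:2], order[-2:]):
        if fitness[strong] >= EXTRINSIC_THRESHOLD:
            population[weak]["w"] = population[strong]["w"] + rng.normal(0.0, 0.1, N_FEATURES)

print("best mean extrinsic reward in final generation:", max(fitness))
```

In the paper itself, the constraint is built into the policy-gradient update (constrained policy gradient reinforcement learning) and the intrinsic-reward parameters are exchanged between physical robots during "mating" by software reproduction; the sketch above only caricatures both mechanisms.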
