Uchibe Eiji, Doya Kenji
Okinawa Institute of Science and Technology, Okinawa 904-2234, Japan.
Neural Netw. 2008 Dec;21(10):1447-55. doi: 10.1016/j.neunet.2008.09.013. Epub 2008 Oct 9.
Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.
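To make the abstract's core idea concrete, below is a minimal, illustrative sketch of an agent whose learning signal combines an 'extrinsic' reward (delivered only at a goal) with a weighted 'intrinsic' reward for progress toward it, in the spirit of the reward-shaping question the paper studies. The toy corridor environment, the Bernoulli policy, the intrinsic-reward form, and the weight `w_intrinsic` are all assumptions for illustration, not the authors' constrained policy gradient or embodied-evolution implementation.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class CorridorEnv:
    """Toy 1-D corridor: agent starts at 0, a 'battery' sits at position n.
    Extrinsic reward only at the goal; intrinsic reward for each step of
    progress toward it (a stand-in for shaping approach behaviour)."""
    def __init__(self, n=5):
        self.n = n
        self.pos = 0

    def step(self, action):  # action is +1 or -1
        old = self.pos
        self.pos = max(0, min(self.n, self.pos + action))
        extrinsic = 1.0 if self.pos == self.n else 0.0
        intrinsic = 1.0 if self.pos > old else 0.0  # moved toward the goal
        done = self.pos == self.n
        return extrinsic, intrinsic, done

def run_episode(theta, w_intrinsic, max_steps=20):
    """Roll out one episode; return the action trajectory and shaped return."""
    env = CorridorEnv()
    traj, ret = [], 0.0
    for _ in range(max_steps):
        p_right = sigmoid(theta)  # state-independent policy, for brevity
        action = 1 if random.random() < p_right else -1
        r_ext, r_int, done = env.step(action)
        traj.append((action, p_right))
        ret += r_ext + w_intrinsic * r_int  # extrinsic + weighted intrinsic
        if done:
            break
    return traj, ret

def train(episodes=300, lr=0.1, w_intrinsic=0.2):
    """Vanilla REINFORCE on the single policy parameter theta."""
    theta = 0.0
    for _ in range(episodes):
        traj, ret = run_episode(theta, w_intrinsic)
        for action, p_right in traj:
            # Score-function gradient for a Bernoulli policy over {+1, -1}
            grad = (1.0 - p_right) if action == 1 else -p_right
            theta += lr * ret * grad
    return theta

theta = train()
print(sigmoid(theta))  # learned probability of stepping toward the battery
```

With a purely extrinsic reward the agent would receive no feedback until it happens to reach the goal; the intrinsic term rewards intermediate progress, so rightward-moving trajectories earn higher returns and the policy shifts toward approach behaviour, which is the exploration-promoting role the abstract attributes to intrinsic rewards.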