Uchibe Eiji, Doya Kenji
Okinawa Institute of Science and Technology, Okinawa 904-2234, Japan.
Neural Netw. 2008 Dec;21(10):1447-55. doi: 10.1016/j.neunet.2008.09.013. Epub 2008 Oct 9.
Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.
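To make the abstract's core idea concrete, below is a minimal, illustrative sketch of an agent whose learning signal combines an 'extrinsic' reward (delivered only at a goal) with a weighted 'intrinsic' reward for progress toward it, in the spirit of the reward-shaping question the paper studies. The toy corridor environment, the Bernoulli policy, the intrinsic-reward form, and the weight `w_intrinsic` are all assumptions for illustration, not the authors' constrained policy gradient or embodied-evolution implementation.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class CorridorEnv:
    """Toy 1-D corridor: agent starts at 0, a 'battery' sits at position n.
    Extrinsic reward only at the goal; intrinsic reward for each step of
    progress toward it (a stand-in for shaping approach behaviour)."""
    def __init__(self, n=5):
        self.n = n
        self.pos = 0

    def step(self, action):  # action is +1 or -1
        old = self.pos
        self.pos = max(0, min(self.n, self.pos + action))
        extrinsic = 1.0 if self.pos == self.n else 0.0
        intrinsic = 1.0 if self.pos > old else 0.0  # moved toward the goal
        done = self.pos == self.n
        return extrinsic, intrinsic, done

def run_episode(theta, w_intrinsic, max_steps=20):
    """Roll out one episode; return the action trajectory and shaped return."""
    env = CorridorEnv()
    traj, ret = [], 0.0
    for _ in range(max_steps):
        p_right = sigmoid(theta)  # state-independent policy, for brevity
        action = 1 if random.random() < p_right else -1
        r_ext, r_int, done = env.step(action)
        traj.append((action, p_right))
        ret += r_ext + w_intrinsic * r_int  # extrinsic + weighted intrinsic
        if done:
            break
    return traj, ret

def train(episodes=300, lr=0.1, w_intrinsic=0.2):
    """Vanilla REINFORCE on the single policy parameter theta."""
    theta = 0.0
    for _ in range(episodes):
        traj, ret = run_episode(theta, w_intrinsic)
        for action, p_right in traj:
            # Score-function gradient for a Bernoulli policy over {+1, -1}
            grad = (1.0 - p_right) if action == 1 else -p_right
            theta += lr * ret * grad
    return theta

theta = train()
print(sigmoid(theta))  # learned probability of stepping toward the battery
```

With a purely extrinsic reward the agent would receive no feedback until it happens to reach the goal; the intrinsic term rewards intermediate progress, so rightward-moving trajectories earn higher returns and the policy shifts toward approach behaviour, which is the exploration-promoting role the abstract attributes to intrinsic rewards.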