


Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control.

Author Information

Uragami Daisuke, Takahashi Tatsuji, Matsuo Yoshiki

Affiliations

School of Computer Science, Tokyo University of Technology, Katakuramachi, Hachioji City, Tokyo 192-0982, Japan.

School of Science and Technology, Tokyo Denki University, Hatoyama, Hiki, Saitama 350-0394, Japan.

Publication Information

Biosystems. 2014 Feb;116:1-9. doi: 10.1016/j.biosystems.2013.11.002. Epub 2013 Dec 1.

DOI: 10.1016/j.biosystems.2013.11.002
PMID: 24296286
Abstract

Many algorithms and methods in artificial intelligence and machine learning were inspired by human cognition. As a mechanism for handling the exploration-exploitation dilemma in reinforcement learning, the loosely symmetric (LS) value function, which models human causal intuition, was proposed (Shinohara et al., 2007). While LS shows the highest correlation with causal induction by humans, it has also been reported to work effectively in multi-armed bandit problems, the simplest class of tasks representing the dilemma. However, the scope of application of LS was limited to reinforcement learning problems with K actions and only one state (K-armed bandit problems). This study proposes the LS-Q learning architecture, which can deal with general reinforcement learning tasks with multiple states and delayed reward. We tested the learning performance of the new architecture on giant-swing robot motion learning, where the uncertainty and unknown-ness of the environment are huge. In the test, no help from ready-made internal models or function approximation of the state space was given. The simulations showed that while the ordinary Q-learning agent does not reach giant-swing motion because of stagnant loops (local optima with low rewards), LS-Q escapes such loops and acquires the giant swing. It is confirmed that the smaller the number of states, in other words, the more coarse-grained the division of states and the more incomplete the state observation, the better LS-Q performs in comparison with Q-learning. We also showed that the high performance of LS-Q depends comparatively little on parameter tuning and learning time. This suggests that the proposed method, inspired by human cognition, works adaptively in real environments.
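The architecture builds on ordinary tabular Q-learning, the baseline the abstract compares LS-Q against. A minimal sketch of that baseline follows; the `env_step` interface and all hyperparameter values are assumptions for illustration, and the LS value function itself, which LS-Q substitutes for the plain action-value estimate, is not reproduced here:

```python
import random

def q_learning(env_step, n_states, n_actions, episodes=200,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    env_step(s, a) -> (next_state, reward, done) is a hypothetical
    environment interface assumed for this sketch. Episodes start in
    state 0. LS-Q (per the abstract) would alter how action values are
    evaluated, not this overall update loop.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:        # explore: random action
                a = rng.randrange(n_actions)
            else:                             # exploit: greedy action
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = env_step(s, a)
            # One-step temporal-difference update toward the bootstrapped target.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

On a toy chain environment (move right for a terminal reward), the learned table prefers the rewarding action in every non-terminal state. The stagnant loops the abstract describes correspond to the greedy agent cycling among low-reward states when the epsilon-greedy exploration fails to escape a local optimum in a coarsely discretized state space.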


Similar Articles

1. Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control.
   Biosystems. 2014 Feb;116:1-9. doi: 10.1016/j.biosystems.2013.11.002. Epub 2013 Dec 1.
2. Robotic action acquisition with cognitive biases in coarse-grained state space.
   Biosystems. 2016 Jul;145:41-52. doi: 10.1016/j.biosystems.2016.05.007. Epub 2016 May 16.
3. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
   Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
4. Walking motion generation, synthesis, and control for biped robot by using PGRL, LPI, and fuzzy logic.
   IEEE Trans Syst Man Cybern B Cybern. 2011 Jun;41(3):736-48. doi: 10.1109/TSMCB.2010.2089978. Epub 2010 Nov 18.
5. Efficient exploration through active learning for value function approximation in reinforcement learning.
   Neural Netw. 2010 Jun;23(5):639-48. doi: 10.1016/j.neunet.2009.12.010. Epub 2010 Jan 11.
6. A parameter control method in reinforcement learning to rapidly follow unexpected environmental changes.
   Biosystems. 2004 Nov;77(1-3):109-17. doi: 10.1016/j.biosystems.2004.05.001.
7. Cellular Nonlinear Networks for the emergence of perceptual states: application to robot navigation control.
   Neural Netw. 2009 Jul-Aug;22(5-6):801-11. doi: 10.1016/j.neunet.2009.06.024. Epub 2009 Jul 1.
8. MOSAIC for multiple-reward environments.
   Neural Comput. 2012 Mar;24(3):577-606. doi: 10.1162/NECO_a_00246. Epub 2011 Dec 14.
9. Real-time reinforcement learning by sequential Actor-Critics and experience replay.
   Neural Netw. 2009 Dec;22(10):1484-97. doi: 10.1016/j.neunet.2009.05.011. Epub 2009 May 31.
10. Quantum reinforcement learning.
    IEEE Trans Syst Man Cybern B Cybern. 2008 Oct;38(5):1207-20. doi: 10.1109/TSMCB.2008.925743.