
Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control.

Author Information

Uragami Daisuke, Takahashi Tatsuji, Matsuo Yoshiki

Affiliations

School of Computer Science, Tokyo University of Technology, Katakuramachi, Hachioji City, Tokyo 192-0982, Japan.

School of Science and Technology, Tokyo Denki University, Hatoyama, Hiki, Saitama 350-0394, Japan.

Publication Information

Biosystems. 2014 Feb;116:1-9. doi: 10.1016/j.biosystems.2013.11.002. Epub 2013 Dec 1.

Abstract

Many algorithms and methods in artificial intelligence and machine learning have been inspired by human cognition. As a mechanism for handling the exploration-exploitation dilemma in reinforcement learning, the loosely symmetric (LS) value function, which models human causal intuition, was proposed (Shinohara et al., 2007). LS shows the highest correlation with causal induction by humans, and it has been reported to work effectively in multi-armed bandit problems, the simplest class of tasks embodying the dilemma. However, the scope of application of LS was limited to reinforcement learning problems with K actions and only one state (K-armed bandit problems). This study proposes the LS-Q learning architecture, which can deal with general reinforcement learning tasks with multiple states and delayed reward. We tested the learning performance of the new architecture on giant-swing robot motion learning, where the uncertainty and unknownness of the environment are large. In the test, no help from ready-made internal models or function approximation of the state space was given. The simulations showed that while the ordinary Q-learning agent fails to reach the giant-swing motion because of stagnant loops (local optima with low rewards), LS-Q escapes such loops and acquires the giant-swing motion. It is confirmed that the smaller the number of states (in other words, the more coarse-grained the division of states and the more incomplete the state observation), the better LS-Q performs in comparison with Q-learning. We also showed that the high performance of LS-Q depends comparatively little on parameter tuning and learning time. This suggests that the proposed method, inspired by human cognition, works adaptively in real environments.
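For context, the sketch below shows the baseline the abstract compares against: standard tabular epsilon-greedy Q-learning with bootstrapped (delayed) reward. It is a minimal illustration, not the paper's implementation; the environment interface (reset, step, actions) is an assumption for this example, and the LS value function that LS-Q substitutes for the action-preference computation is described in the paper and not reproduced here.

```python
# Minimal tabular Q-learning sketch (the baseline method named in the abstract).
# The LS-Q architecture would replace the greedy/epsilon-greedy action-preference
# computation with the loosely symmetric (LS) value function; that formula is
# intentionally omitted here rather than guessed.

import random
from collections import defaultdict

def q_learning_episode(env, q, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run one episode of epsilon-greedy tabular Q-learning.

    `env` is assumed (for illustration only) to expose reset() -> state,
    step(action) -> (next_state, reward, done), and a list `env.actions`.
    """
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy exploration; LS-Q derives action preferences differently.
        if random.random() < epsilon:
            action = random.choice(env.actions)
        else:
            action = max(env.actions, key=lambda a: q[(state, a)])
        next_state, reward, done = env.step(action)
        # Standard Q-learning update: bootstrap on the best next-state value,
        # which is how delayed reward propagates back through the table.
        best_next = max(q[(next_state, a)] for a in env.actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q

# Tabular values with no function approximation, matching the setting in the paper.
q_table = defaultdict(float)
```

Note the coarse-grained state discretization discussed in the abstract would correspond here to how the keys of `q_table` are constructed from raw observations; the finding is that LS-Q tolerates coarser discretizations than this baseline does.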

