Bart Baddeley
Centre for Computational Neuroscience and Robotics, Department of Informatics, University of Sussex, Brighton, UK.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):950-6. doi: 10.1109/TSMCB.2008.921000.
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and in such cases RL techniques require the use of function approximators for learning value functions and policies. Often, local linear models have been preferred over distributed nonlinear models for function approximation in RL. We suggest that one reason for the difficulties encountered when using distributed architectures in RL is the problem of negative interference, whereby learning of new data disrupts previously learned mappings. The continuous temporal difference (TD) learning algorithm TD(lambda) was used to learn a value function in a limited-torque pendulum swing-up task using a multilayer perceptron (MLP) network. Three different approaches were examined for learning in the MLP networks: 1) simple gradient descent; 2) vario-eta; and 3) a pseudopattern rehearsal strategy that attempts to reduce the effects of interference. Our results show that MLP networks can be used for value function approximation in this task but require long training times. We also found that vario-eta destabilized learning and resulted in a failure of the learning process to converge. Finally, we showed that the pseudopattern rehearsal strategy drastically improved the speed of learning. The results indicate that interference is a greater problem than ill-conditioning for this task.
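To make the described setup more concrete, the following is a minimal sketch of TD(lambda) value-function learning with a one-hidden-layer MLP on a limited-torque pendulum swing-up task, combined with a simple pseudopattern rehearsal step. It is not the paper's exact implementation: the pendulum constants, network size, learning rates, exploration policy, and rehearsal schedule are illustrative assumptions.

```python
# Minimal sketch (assumed parameters, not the paper's settings):
# TD(lambda) with an MLP value function on pendulum swing-up,
# plus pseudopattern rehearsal to counter negative interference.
import numpy as np

rng = np.random.default_rng(0)

# --- limited-torque pendulum dynamics (assumed constants) --------------------
M, L, G, MU, DT, UMAX = 1.0, 1.0, 9.8, 0.01, 0.02, 5.0

def step(theta, omega, u):
    """One Euler step; reward is the height of the pendulum tip."""
    u = np.clip(u, -UMAX, UMAX)
    domega = (-MU * omega + M * G * L * np.sin(theta) + u) / (M * L ** 2)
    omega = np.clip(omega + DT * domega, -8.0, 8.0)
    theta = theta + DT * omega
    return theta, omega, np.cos(theta)      # reward = +1 when upright

def features(theta, omega):
    return np.array([np.cos(theta), np.sin(theta), omega / 8.0])

# --- one-hidden-layer MLP value function -------------------------------------
H = 20
W1 = rng.normal(0, 0.3, (H, 3)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.3, H);      b2 = 0.0

def value_and_grads(x):
    h = np.tanh(W1 @ x + b1)
    v = W2 @ h + b2
    dh = (1.0 - h ** 2) * W2                 # backprop through tanh
    return v, (np.outer(dh, x), dh, h, 1.0)  # grads w.r.t. W1, b1, W2, b2

# --- TD(lambda) with eligibility traces + pseudopattern rehearsal ------------
GAMMA, LAM, ALPHA, REHEARSE_EVERY, N_PSEUDO = 0.98, 0.7, 0.01, 200, 20

theta, omega = np.pi, 0.0                    # start hanging straight down
traces = [np.zeros_like(W1), np.zeros_like(b1), np.zeros_like(W2), 0.0]
pseudo_x = rng.uniform(-1, 1, (N_PSEUDO, 3))
pseudo_y = np.array([value_and_grads(xp)[0] for xp in pseudo_x])

for t in range(20000):
    x = features(theta, omega)
    v, grads = value_and_grads(x)
    u = UMAX * np.sign(rng.normal())         # crude exploratory bang-bang policy
    theta, omega, r = step(theta, omega, u)
    v_next, _ = value_and_grads(features(theta, omega))

    delta = r + GAMMA * v_next - v           # TD error
    for i in range(4):                       # decay and accumulate traces
        traces[i] = GAMMA * LAM * traces[i] + grads[i]
    W1 += ALPHA * delta * traces[0]; b1 += ALPHA * delta * traces[1]
    W2 += ALPHA * delta * traces[2]; b2 += ALPHA * delta * traces[3]

    # Pseudopattern rehearsal: periodically snapshot the network's own outputs
    # at random inputs, then keep pulling the network back toward that snapshot
    # so TD updates in one region do not overwrite the mapping elsewhere.
    if t % REHEARSE_EVERY == 0:
        pseudo_x = rng.uniform(-1, 1, (N_PSEUDO, 3))
        pseudo_y = np.array([value_and_grads(xp)[0] for xp in pseudo_x])
    k = rng.integers(N_PSEUDO)
    vp, gp = value_and_grads(pseudo_x[k])
    err = pseudo_y[k] - vp
    W1 += ALPHA * err * gp[0]; b1 += ALPHA * err * gp[1]
    W2 += ALPHA * err * gp[2]; b2 += ALPHA * err * gp[3]
```

The rehearsal updates are ordinary supervised gradient steps on pseudopatterns generated by the network itself, interleaved with the TD updates; the refresh interval and number of pseudopatterns shown here are arbitrary choices made only to keep the sketch short.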