Andrew James Smith
The Division of Informatics, Institute for Adaptive and Neural Computation, University of Edinburgh, UK.
Neural Netw. 2002 Oct-Nov;15(8-9):1107-24. doi: 10.1016/s0893-6080(02)00083-7.
This article is concerned with the representation and generalisation of continuous action spaces in reinforcement learning (RL) problems. A model is proposed, based on the self-organising map (SOM) of Kohonen [Self-Organization and Associative Memory, 1987], which can capture the one-to-one, many-to-one, or one-to-many structure of the desired state-action mapping. Although presented here for tasks involving immediate reward, the approach extends readily to delayed reward. We conclude that the SOM is a useful tool for providing real-time, on-line generalisation in RL problems in which the latent dimensionalities of the state and action spaces are small. Scalability issues are also discussed.
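The abstract does not spell out the architecture, but the general idea it describes, quantising continuous state and action spaces with SOMs and using reward to shape the action representation, can be illustrated with a short sketch. The following Python code is a minimal, hypothetical rendering for the immediate-reward case, not the paper's exact algorithm: the map sizes, learning rates, noise scale, the reward table Q, and the helper names (winner, som_update, step) are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch: a state SOM paired with an action SOM, linked by a table
# of estimated immediate rewards. All hyperparameters are assumptions for
# illustration, not values from the paper.

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 2, 1
N_UNITS = 25  # units per map (1-D map topology for brevity)

state_map = rng.uniform(0, 1, (N_UNITS, STATE_DIM))    # quantises states
action_map = rng.uniform(-1, 1, (N_UNITS, ACTION_DIM)) # quantises actions
Q = np.zeros((N_UNITS, N_UNITS))  # estimated reward for each unit pairing

def winner(som, x):
    """Index of the best-matching unit (smallest Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(som - x, axis=1)))

def som_update(som, x, win, lr=0.1, sigma=1.0):
    """Standard Kohonen update: pull units toward x, weighted by a
    Gaussian neighbourhood over unit indices."""
    dist = np.abs(np.arange(len(som)) - win)
    h = np.exp(-dist**2 / (2 * sigma**2))
    som += lr * h[:, None] * (x - som)

def step(state, reward_fn, explore=0.2):
    """One immediate-reward interaction: pick an action unit for the
    winning state unit, explore around it, and learn from the reward."""
    s = winner(state_map, state)
    a = int(np.argmax(Q[s]))  # greedy choice of action unit
    action = action_map[a] + rng.normal(0, explore, ACTION_DIM)
    r = reward_fn(state, action)
    som_update(state_map, state, s)
    # Reward-modulated update: pull the action map toward the explored
    # action only when it outperformed the current estimate.
    if r > Q[s, a]:
        som_update(action_map, action, a)
    Q[s, a] += 0.1 * (r - Q[s, a])  # running estimate of immediate reward
    return r

# Toy usage: learn to output an action matching the first state coordinate.
for _ in range(1000):
    s = rng.uniform(0, 1, STATE_DIM)
    step(s, lambda st, ac: -float((ac[0] - st[0])**2))
```

Because both maps are low-dimensional lattices, a many-to-one mapping emerges when several state units share a preferred action unit, and a one-to-many mapping when the reward table admits several equally rewarded action units for one state, which is consistent with the structures the abstract says the model can capture.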