Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India.
Prog Brain Res. 2013;202:465-88. doi: 10.1016/B978-0-444-62604-2.00023-X.
Reinforcement Learning (RL) is a popular paradigm for sequential decision making under uncertainty. A typical RL algorithm operates with only limited knowledge of the environment and with limited feedback on the quality of the decisions. To operate effectively in complex environments, learning agents require the ability to form useful abstractions, that is, the ability to selectively ignore irrelevant details. It is difficult to derive a single representation that is useful across a large class of problems. In this chapter, we describe a hierarchical RL framework that incorporates an algebraic framework for modeling task-specific abstraction. The basic notion that we will explore is that of a homomorphism of a Markov Decision Process (MDP). We mention various extensions of the basic MDP homomorphism framework that accommodate different commonly understood notions of abstraction, namely, aspects of selective attention. Parts of the work described in this chapter have been reported earlier in several papers (Narayanamurthy and Ravindran, 2007, 2008; Ravindran and Barto, 2002, 2003a,b; Ravindran et al., 2007).
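For orientation, the core definition the chapter builds on can be sketched as follows; the notation is illustrative and broadly follows Ravindran and Barto (2002), with details varying slightly across the papers cited above. An MDP homomorphism from M = ⟨S, A, P, R⟩ to M' = ⟨S', A', P', R'⟩ is a surjection h(s, a) = (f(s), g_s(a)), where f : S → S' maps states and the state-dependent maps g_s : A_s → A'_{f(s)} map actions, such that for all states s, s' and admissible actions a,
\[
  P'\bigl(f(s),\, g_s(a),\, f(s')\bigr) \;=\; \sum_{s'' \in f^{-1}(f(s'))} P(s, a, s''),
  \qquad
  R'\bigl(f(s),\, g_s(a)\bigr) \;=\; R(s, a).
\]
Roughly, the image MDP M' preserves the transition and reward structure of M at the level of the aggregated states and actions, so optimal solutions of M' lift back to M; this is what makes the homomorphism a useful notion of abstraction.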