Finale Doshi, Joelle Pineau, Nicholas Roy
Massachusetts Institute of Technology, Cambridge, MA, USA.
Proc Int Conf Mach Learn. 2008:256-263.
Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent's knowledge against actions that increase an agent's reward. Unfortunately, most POMDPs are defined with a large number of parameters that are difficult to specify from domain knowledge alone. In this paper, we present an approximation approach that allows us to treat the POMDP model parameters as additional hidden state in a "model-uncertainty" POMDP. Coupled with model-directed queries, our planner actively learns good policies. We demonstrate our approach on several POMDP problems.
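To make the abstract's central idea concrete, the sketch below is a minimal, hypothetical illustration (not the authors' implementation): model uncertainty is folded into the belief by tracking a joint distribution over a finite set of candidate models and the hidden state, so ordinary Bayesian filtering estimates the model and the state simultaneously. The toy two-state domain, the two candidate models, and the state-revealing oracle query are all illustrative assumptions; the paper's model-directed queries are richer than the stand-in used here.

```python
import numpy as np

# Minimal sketch of "model parameters as additional hidden state": the
# belief ranges over (candidate model, world state) pairs rather than
# states alone. Everything below is hypothetical -- a 2-state toy domain,
# two candidate models that differ only in observation accuracy, and a
# shared, known transition matrix.

N_STATES = 2
OBS_ACC = [0.6, 0.9]          # assumed per-model observation accuracy
T = np.array([[0.8, 0.2],     # assumed transition matrix T[s, s']
              [0.2, 0.8]])

def obs_prob(model, obs, state):
    """P(obs | state) under candidate model `model`."""
    acc = OBS_ACC[model]
    return acc if obs == state else 1.0 - acc

def belief_update(b, obs):
    """Bayes update of the joint belief b[model, state] after one step."""
    new_b = np.zeros_like(b)
    for m in range(len(OBS_ACC)):
        pred = T.T @ b[m]                  # predicted state distribution under model m
        for s in range(N_STATES):
            new_b[m, s] = obs_prob(m, obs, s) * pred[s]
    return new_b / new_b.sum()             # renormalize over (model, state) jointly

def query_update(b, revealed_state):
    """Hypothetical oracle query that reveals the current state exactly.
    Only a stand-in for the paper's model-directed queries, but it shows
    the mechanism: extra evidence also sharpens the model posterior."""
    new_b = np.zeros_like(b)
    new_b[:, revealed_state] = b[:, revealed_state]
    return new_b / new_b.sum()

# Demo: a uniform joint belief plus two consistent observations starts to
# favor the high-accuracy candidate model.
b = np.full((2, N_STATES), 0.25)
for obs in (0, 0):
    b = belief_update(b, obs)
print("model posterior:", b.sum(axis=1))   # ~[0.43, 0.57]
print("state posterior:", b.sum(axis=0))
```

The design point worth noting is that the renormalization runs over (model, state) jointly, so observations that one candidate model explains better shift probability mass toward that model without any separate model-learning machinery.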