Shani Guy, Brafman Ronen I, Shimony Solomon Eyal
Microsoft Research, Redmond, WA 98052 USA.
IEEE Trans Syst Man Cybern B Cybern. 2008 Dec;38(6):1592-605. doi: 10.1109/TSMCB.2008.928222.
Recent scaling up of partially observable Markov decision process (POMDP) solvers toward realistic applications is largely due to point-based methods that quickly converge to an approximate solution for medium-sized domains. These algorithms compute a value function for a finite reachable set of belief points, using backup operations. Point-based algorithms differ on the selection of the set of belief points and on the order by which backup operations are executed on the selected belief points. We first show how current algorithms execute a large number of backups that can be removed without reducing the quality of the value function. We demonstrate that the ordering of backup operations on a predefined set of belief points is important. In the simpler domain of MDP solvers, prioritizing the order of equivalent backup operations on states is known to speed up convergence. We generalize the notion of prioritized backups to the POMDP framework, showing how existing algorithms can be improved by prioritizing backups. We also present a new algorithm, which is the prioritized value iteration, and show empirically that it outperforms current point-based algorithms. Finally, a new empirical evaluation measure (in addition to the standard runtime comparison), which is based on the number of atomic operations and the number of belief points, is proposed in order to provide more accurate benchmark comparisons.
近期,部分可观测马尔可夫决策过程(POMDP)求解器在实际应用中的扩展,很大程度上归功于基于点的方法,这些方法能快速收敛到中等规模领域的近似解。这些算法使用备份操作,为有限的可达信念点集计算价值函数。基于点的算法在信念点集的选择以及对所选信念点执行备份操作的顺序上存在差异。我们首先展示当前算法如何执行大量可在不降低价值函数质量的情况下移除的备份操作。我们证明在预定义的信念点集上备份操作的顺序很重要。在更简单的MDP求解器领域,已知对状态上等效备份操作的顺序进行优先级排序可加快收敛速度。我们将优先级备份的概念推广到POMDP框架,展示如何通过对备份操作进行优先级排序来改进现有算法。我们还提出了一种新算法,即优先级值迭代,并通过实验表明它优于当前基于点的算法。最后,为了提供更准确的基准比较,提出了一种新的实证评估指标(除了标准的运行时比较),该指标基于原子操作的数量和信念点的数量。