IEEE Trans Cybern. 2022 Oct;52(10):10570-10581. doi: 10.1109/TCYB.2021.3062856. Epub 2022 Sep 19.
This article presents a novel inverse reinforcement learning (RL) algorithm that learns an unknown performance objective function for tracking control. The algorithm combines three steps: 1) an optimal control update; 2) a gradient-descent correction step; and 3) an inverse optimal control (IOC) update. The new algorithm clarifies the relation between inverse RL and IOC. It is shown that the reward weights of an unknown performance objective that generate a given target control policy may not be unique, and we characterize the set of all weights that generate the same target control policy. We develop a model-based algorithm and, for systems with unknown model information, two model-free algorithms. Finally, simulation experiments are presented to show the effectiveness of the proposed algorithms.
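To make the first two steps concrete, below is a minimal, hypothetical Python sketch for a discrete-time LQR setting: the learner observes only a target feedback gain K_target, repeatedly re-solves the optimal control problem for its current weight estimate (step 1), and applies a finite-difference gradient-descent correction to the weights (step 2). The system matrices, the diagonal weight parameterization, the fixed input weight R, and the step size are all assumptions made for illustration; this is not the article's full algorithm (in particular, the IOC update of step 3 and the model-free variants are omitted).

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Toy discrete-time double integrator (hypothetical numbers, for illustration).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
R = np.eye(1)  # input weight, assumed known and fixed here

def lqr_gain(Q):
    """Step 1, optimal control update: LQR gain induced by the weight Q."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Target policy generated by a weight the learner never observes.
Q_true = np.diag([4.0, 1.0])
K_target = lqr_gain(Q_true)

# Step 2, gradient-descent correction: shrink the gap between the gain
# induced by the current estimate and the target gain (finite differences).
q = np.ones(2)            # diagonal entries of the weight estimate
lr, eps = 1.0, 1e-6       # step size tuned by hand for this toy problem
for _ in range(500):
    loss = np.sum((lqr_gain(np.diag(q)) - K_target) ** 2)
    grad = np.empty(2)
    for i in range(2):
        q_pert = q.copy()
        q_pert[i] += eps
        grad[i] = (np.sum((lqr_gain(np.diag(q_pert)) - K_target) ** 2)
                   - loss) / eps
    q = np.clip(q - lr * grad, 1e-3, None)  # keep Q positive definite

print("recovered diag(Q):", q)
```

Consistent with the non-uniqueness result stated above, the recovered weight need only be one member of the set of weights that induce the same target gain; which member the iteration reaches depends on the initialization and parameterization.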