Guo Shangqi, Yan Qi, Su Xin, Hu Xiaolin, Chen Feng
IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5572-5589. doi: 10.1109/TPAMI.2021.3069005. Epub 2022 Aug 4.
Complex tasks involving large state spaces and long-term decision processes are difficult to solve with reinforcement learning (RL) algorithms. A common and promising way to address this challenge is to compress a large RL problem into a small one. Toward this goal, the compression should be state-temporal and optimality-preserving (i.e., the optimal policy of the compressed problem should correspond to that of the uncompressed problem). In this paper, we propose a reward-restricted geodesic (RRG) metric, which can be learned by a neural network, to perform state-temporal compression in RL. We prove that compression based on the RRG metric is approximately optimality-preserving for the raw RL problem endowed with temporally abstract actions. Building on this compression, we design an RRG metric-based reinforcement learning (RRG-RL) algorithm to solve complex tasks. Experiments in both discrete (2D Minecraft) and continuous (Doom) environments demonstrate the superiority of our method over existing RL approaches.
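The abstract does not give implementation details, but the general idea it describes, learning a metric over states with a neural network and merging states that are close under that metric, can be sketched as follows. This is a minimal illustrative sketch only: the network architecture, the regression loss, and the names MetricNet and merge_threshold are assumptions made for illustration, not the paper's actual RRG construction or the RRG-RL algorithm.

```python
# Illustrative sketch (not the authors' implementation): learn a neural state
# metric and greedily merge states that are close under it. All design choices
# here (architecture, loss, threshold) are assumptions for illustration only.
import torch
import torch.nn as nn


class MetricNet(nn.Module):
    """Embeds raw states; L2 distance between embeddings acts as a learned metric."""

    def __init__(self, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def distance(self, s1: torch.Tensor, s2: torch.Tensor) -> torch.Tensor:
        return torch.norm(self.encoder(s1) - self.encoder(s2), dim=-1)


def metric_regression_loss(net: MetricNet,
                           s: torch.Tensor,
                           s_other: torch.Tensor,
                           target_dist: torch.Tensor) -> torch.Tensor:
    """Fit the learned distance to a target (e.g., an estimated number of steps
    between states); the choice of target is an assumption for this sketch."""
    return nn.functional.mse_loss(net.distance(s, s_other), target_dist)


def compress_states(net: MetricNet,
                    states: torch.Tensor,
                    merge_threshold: float = 1.0):
    """Greedy compression: assign each state to the nearest existing cluster
    center if it lies within merge_threshold, otherwise open a new cluster."""
    centers, assignment = [], []
    with torch.no_grad():
        for s in states:
            if centers:
                d = net.distance(s.expand(len(centers), -1), torch.stack(centers))
                j = int(torch.argmin(d))
                if d[j] < merge_threshold:
                    assignment.append(j)
                    continue
            centers.append(s)
            assignment.append(len(centers) - 1)
    return centers, assignment
```

In this toy setup, the compressed problem would be defined over the cluster centers, so a planner or RL agent operates on far fewer abstract states than in the raw problem; how the paper actually restricts the geodesic by reward and guarantees approximate optimality preservation is beyond this sketch.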