Department of Automation, Xiamen University, Xiamen 361005, China.
Neural Netw. 2024 Nov;179:106579. doi: 10.1016/j.neunet.2024.106579. Epub 2024 Jul 26.
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a realistic and challenging problem in visual reinforcement learning. Recently, unsupervised representation learning methods based on bisimulation metrics, contrastive learning, prediction, and reconstruction have shown the ability to extract task-relevant information. However, because the prediction-, contrast-, and reconstruction-based approaches lack suitable mechanisms for extracting task information, and bisimulation-based methods struggle in domains with sparse rewards, these methods remain difficult to extend effectively to environments with distractions. To alleviate these problems, this paper incorporates action sequences, which carry task-intensive signals, into representation learning. Specifically, we propose a Sequential Action-induced invariant Representation (SAR) method, which uses sequential actions to decouple the controlled part (i.e., task-relevant information) from the uncontrolled part (i.e., task-irrelevant information) of noisy observations, thereby extracting representations that are effective for the decision task. To achieve this, the characteristic function of the action sequence's probability distribution is modeled to specifically optimize the state encoder. We conduct extensive experiments on the distracting DeepMind Control suite, achieving the best performance over strong baselines. We also demonstrate the effectiveness of our method at disregarding task-irrelevant information by applying SAR to CARLA-based autonomous driving with natural real-world distractions. Finally, we analyze generalization through generalization-decay experiments and t-SNE visualization. Code and demo videos are available at https://github.com/DMU-XMU/SAR.git.
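The abstract describes the core objective only verbally: the characteristic function of the action-sequence distribution is modeled to optimize the state encoder. For a random vector A of flattened actions, the characteristic function is φ_A(t) = E[exp(i⟨t, A⟩)], whose real and imaginary parts are E[cos⟨t, A⟩] and E[sin⟨t, A⟩]. Below is a minimal sketch of one way such an objective could look in PyTorch; the module names, shapes, random-frequency sampling, and MSE-regression form are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch (NOT the authors' implementation): one way an encoder could be
# optimized to predict the characteristic function (CF) of an upcoming action
# sequence. Shapes, module names, and the MSE-regression form are assumptions.
import torch
import torch.nn as nn


def cf_targets(actions, t):
    """Per-sample real/imaginary CF targets for flattened action sequences.

    actions: (batch, seq_len, action_dim) sampled action sequences
    t:       (num_freqs, seq_len * action_dim) frequency vectors
    """
    a = actions.flatten(start_dim=1)          # (batch, D)
    proj = a @ t.T                            # (batch, num_freqs) = <t, a>
    return proj.cos(), proj.sin()


class CFHead(nn.Module):
    """Predicts the conditional CF of the action sequence from a latent state."""

    def __init__(self, latent_dim, num_freqs):
        super().__init__()
        self.net = nn.Linear(latent_dim, 2 * num_freqs)

    def forward(self, z):
        return self.net(z).chunk(2, dim=-1)   # predicted (real, imag) parts


def cf_loss(encoder, head, obs, actions, t):
    z = encoder(obs)
    pred_re, pred_im = head(z)
    tgt_re, tgt_im = cf_targets(actions, t)
    # Regressing on cos/sin(<t, a>) drives the head toward the conditional CF
    # E[exp(i<t, A>) | s]; gradients reach only state features that help
    # predict the action sequence, i.e., the controlled part of the observation.
    return ((pred_re - tgt_re) ** 2 + (pred_im - tgt_im) ** 2).mean()


# Illustrative usage with random data (dimensions are arbitrary):
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, 50))
head = CFHead(latent_dim=50, num_freqs=16)
obs = torch.randn(32, 3, 84, 84)              # batch of image observations
actions = torch.randn(32, 3, 6)               # 3-step sequences of 6-dim actions
t = torch.randn(16, 3 * 6)                    # random frequency vectors
loss = cf_loss(encoder, head, obs, actions, t)
```

The MSE form is statistically motivated: for a fixed frequency t, the minimizer of E[(f(s) − cos⟨t, A⟩)²] is the conditional expectation E[cos⟨t, A⟩ | s], i.e., the real part of the conditional characteristic function, so the encoder is rewarded only for retaining action-predictive (controlled) information.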