Young Jae Lee, Jaehoon Kim, Young Joon Park, Mingu Kwak, Seoung Bum Kim
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):8814-8827. doi: 10.1109/TNNLS.2024.3439261. Epub 2025 May 2.
In pixel-based deep reinforcement learning (DRL), learning representations of states that change because of an agent's action or interaction with the environment poses a critical challenge in improving data efficiency. Recent data-efficient DRL studies have integrated DRL with self-supervised learning (SSL) and data augmentation to learn state representations from given interactions. However, some methods struggle to explicitly capture evolving state representations or to select data augmentations that yield appropriate reward signals. Our goal is to explicitly learn the inherent dynamics that change with an agent's intervention and interaction with the environment. We propose masked and inverse dynamics modeling (MIND), which uses masking augmentation and fewer hyperparameters to learn agent-controllable representations in changing states. Our method comprises self-supervised multitask learning that leverages a transformer architecture to capture the spatiotemporal information underlying highly correlated consecutive frames. MIND performs self-supervised multitask learning with two tasks: masked modeling and inverse dynamics modeling. Masked modeling learns the static visual representations required for control, and inverse dynamics modeling learns the rapidly evolving state representations under agent intervention. By integrating inverse dynamics modeling as a complementary component to masked modeling, our method effectively learns evolving state representations. We evaluate our method in discrete and continuous control environments with limited interactions. MIND outperforms previous methods across benchmarks and significantly improves data efficiency. The code is available at https://github.com/dudwojae/MIND.
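To make the two objectives concrete, below is a minimal PyTorch sketch of the multitask setup the abstract describes: a shared transformer encodes patch embeddings of consecutive frames; a masked-modeling head reconstructs representations at masked positions, and an inverse-dynamics head predicts the action taken between consecutive states. All names (MINDSketch, recon_head, inv_head, mask_ratio, and the loss weighting) are illustrative assumptions, not the authors' implementation; see the linked repository for that.

```python
# Hypothetical sketch of MIND's two self-supervised objectives.
# Not the authors' code: architecture sizes, heads, and the simple
# sum of the two losses are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MINDSketch(nn.Module):
    def __init__(self, n_patches=49, d_model=128, n_actions=18, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos_emb = nn.Parameter(torch.randn(1, n_patches, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.recon_head = nn.Linear(d_model, d_model)      # masked modeling
        self.inv_head = nn.Linear(2 * d_model, n_actions)  # inverse dynamics

    def forward(self, patches_t, patches_t1, actions):
        # patches_t, patches_t1: (B, n_patches, d_model) patch embeddings
        # of frames at steps t and t+1; actions: (B,) long tensor of a_t.
        B, N, D = patches_t.shape

        # Masking augmentation: replace a random subset of patches at step t.
        mask = torch.rand(B, N, device=patches_t.device) < self.mask_ratio
        masked = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, N, D), patches_t)
        z_masked = self.encoder(masked + self.pos_emb)

        # Masked modeling: reconstruct the clean encoding at masked positions
        # (clean targets are detached so they act as fixed regression targets).
        z_clean = self.encoder(patches_t + self.pos_emb)
        recon_loss = F.mse_loss(self.recon_head(z_masked)[mask],
                                z_clean.detach()[mask])

        # Inverse dynamics: predict a_t from pooled summaries of s_t, s_{t+1}.
        z_t = z_clean.mean(dim=1)
        z_t1 = self.encoder(patches_t1 + self.pos_emb).mean(dim=1)
        logits = self.inv_head(torch.cat([z_t, z_t1], dim=-1))
        inv_loss = F.cross_entropy(logits, actions)

        # Joint multitask objective: the complementary pairing the paper
        # motivates; an equal weighting is assumed here for simplicity.
        return recon_loss + inv_loss
```

The pairing mirrors the abstract's argument: the reconstruction term pushes the shared encoder toward static visual structure, while the action-prediction term forces it to retain exactly the state changes an agent can control.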