


Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning.

Author Information

Young Jae Lee, Jaehoon Kim, Young Joon Park, Mingu Kwak, Seoung Bum Kim

Publication Information

IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):8814-8827. doi: 10.1109/TNNLS.2024.3439261. Epub 2025 May 2.

DOI: 10.1109/TNNLS.2024.3439261
PMID: 39141453
Abstract

In pixel-based deep reinforcement learning (DRL), learning representations of states that change because of an agent's action or interaction with the environment poses a critical challenge in improving data efficiency. Recent data-efficient DRL studies have integrated DRL with self-supervised learning (SSL) and data augmentation to learn state representations from given interactions. However, some methods have difficulties in explicitly capturing evolving state representations or in selecting data augmentations for appropriate reward signals. Our goal is to explicitly learn the inherent dynamics that change with an agent's intervention and interaction with the environment. We propose masked and inverse dynamics modeling (MIND), which uses masking augmentation and fewer hyperparameters to learn agent-controllable representations in changing states. Our method comprises self-supervised multitask learning that leverages a transformer architecture, which captures the spatiotemporal information underlying the highly correlated consecutive frames. MIND uses two tasks to perform self-supervised multitask learning: masked modeling and inverse dynamics modeling. Masked modeling learns the static visual representation required for control in the state, and inverse dynamics modeling learns the rapidly evolving state representation under agent intervention. By integrating inverse dynamics modeling as a complementary component to masked modeling, our method effectively learns evolving state representations. We evaluate our method in discrete and continuous control environments with limited interactions. MIND outperforms previous methods across benchmarks and significantly improves data efficiency. The code is available at https://github.com/dudwojae/MIND.
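The two objectives described in the abstract can be sketched in miniature. The following is a hedged, pure-Python illustration of the joint loss structure, not the paper's transformer-based implementation: `reconstruct` and `predict_action_logits` are hypothetical stand-ins for the learned networks, and the scalars here replace image feature tensors.

```python
import math
import random

def masked_modeling_loss(frame, mask_ratio, reconstruct):
    """Masked modeling: zero out a random fraction of features, then
    score the reconstruction with MSE on the masked positions only."""
    n = len(frame)
    masked = random.sample(range(n), max(1, int(mask_ratio * n)))
    hidden = set(masked)
    corrupted = [0.0 if i in hidden else v for i, v in enumerate(frame)]
    recon = reconstruct(corrupted)
    return sum((recon[i] - frame[i]) ** 2 for i in masked) / len(masked)

def inverse_dynamics_loss(s_t, s_next, action, predict_action_logits):
    """Inverse dynamics: cross-entropy of predicting the discrete action
    that moved the agent from s_t to s_next (the agent-controllable
    part of the transition)."""
    logits = predict_action_logits(s_t, s_next)
    z = max(logits)  # stabilize log-sum-exp
    log_norm = z + math.log(sum(math.exp(l - z) for l in logits))
    return log_norm - logits[action]

def mind_loss(frame, next_frame, action, reconstruct,
              predict_action_logits, mask_ratio=0.5, lam=1.0):
    """Joint multitask objective: masked modeling plus a weighted
    inverse dynamics term (lam is a hypothetical trade-off weight)."""
    l_mask = masked_modeling_loss(frame, mask_ratio, reconstruct)
    l_inv = inverse_dynamics_loss(frame, next_frame, action,
                                  predict_action_logits)
    return l_mask + lam * l_inv
```

In the paper the two heads share a transformer encoder over stacked consecutive frames, so the inverse dynamics task regularizes the same representation that masked modeling trains; the sketch keeps only the additive multitask structure.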


Similar Articles

1. Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):8814-8827. doi: 10.1109/TNNLS.2024.3439261. Epub 2025 May 2.
2. STACoRe: Spatio-temporal and action-based contrastive representations for reinforcement learning in Atari.
   Neural Netw. 2023 Mar;160:1-11. doi: 10.1016/j.neunet.2022.12.018. Epub 2022 Dec 29.
3. GMIM: Self-supervised pre-training for 3D medical image segmentation with adaptive and hierarchical masked image modeling.
   Comput Biol Med. 2024 Jun;176:108547. doi: 10.1016/j.compbiomed.2024.108547. Epub 2024 May 6.
4. Masked Contrastive Representation Learning for Reinforcement Learning.
   IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3421-3433. doi: 10.1109/TPAMI.2022.3176413. Epub 2023 Feb 3.
5. Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation.
   Med Image Anal. 2023 Jul;87:102792. doi: 10.1016/j.media.2023.102792. Epub 2023 Mar 11.
6. Multi-Task Collaborative Network: Bridge the Supervised and Self-Supervised Learning for EEG Classification in RSVP Tasks.
   IEEE Trans Neural Syst Rehabil Eng. 2024;32:638-651. doi: 10.1109/TNSRE.2024.3357863. Epub 2024 Feb 1.
7. Hierarchical discriminative learning improves visual representations of biomedical microscopy.
   Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2023 Jun;2023:19798-19808. doi: 10.1109/cvpr52729.2023.01896. Epub 2023 Aug 22.
8. Generalization Enhancement of Visual Reinforcement Learning through Internal States.
   Sensors (Basel). 2024 Jul 12;24(14):4513. doi: 10.3390/s24144513.
9. StARformer: Transformer With State-Action-Reward Representations for Robot Learning.
   IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12862-12877. doi: 10.1109/TPAMI.2022.3204708. Epub 2023 Oct 3.
10. TIMAR: Transition-informed representation for sample-efficient multi-agent reinforcement learning.
    Neural Netw. 2025 Apr;184:107081. doi: 10.1016/j.neunet.2024.107081. Epub 2024 Dec 31.