Suppr 超能文献

Learning explainable task-relevant state representation for model-free deep reinforcement learning.

Affiliations

College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin, China; RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, Japan.

College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin, China.

Publication info

Neural Netw. 2024 Dec;180:106741. doi: 10.1016/j.neunet.2024.106741. Epub 2024 Sep 20.

DOI: 10.1016/j.neunet.2024.106741
PMID: 39321563
Abstract

State representations considerably accelerate learning speed and improve data efficiency for deep reinforcement learning (DRL), especially for visual tasks. Task-relevant state representations could focus on features relevant to the task, filter out irrelevant elements, and thus further improve performance. However, task-relevant representations are typically obtained through model-based DRL methods, which involves the challenging task of learning a transition function. Moreover, inaccuracies in the learned transition function can potentially lead to performance degradation and negatively impact the learning of the policy. In this paper, to address the above issue, we propose a novel method of explainable task-relevant state representation (ETrSR) for model-free DRL that is direct, robust, and without any requirement of learning of a transition model. More specifically, the proposed ETrSR first disentangles the features from the states based on the beta variational autoencoder (β-VAE). Then, a reward prediction model is employed to bootstrap these features to be relevant to the task, and the explainable states can be obtained by decoding the task-related features. Finally, we validate our proposed method on the CarRacing environment and various tasks in the DeepMind control suite (DMC), which demonstrates the explainability for better understanding of the decision-making process and the outstanding performance of the proposed method even in environments with strong distractions.
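The abstract describes the ETrSR objective as three coupled terms: a β-VAE reconstruction loss, a β-weighted KL term that disentangles the latent features, and a reward-prediction loss that bootstraps those features to be task-relevant. The following is a minimal NumPy sketch of that combined objective only; the shapes, the linear stand-in decoder/reward head, and the weight `lam` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of the ETrSR training objective: a beta-VAE term
# disentangles latent features, and a reward-prediction term pushes a
# latent code to carry task-relevant information. The linear decoder and
# reward head below are stand-ins for the paper's networks.

rng = np.random.default_rng(0)

def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def etrsr_loss(x, x_recon, mu, log_var, r_true, r_pred, beta=4.0, lam=1.0):
    """Combined objective: reconstruction + beta * KL + lam * reward loss."""
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=-1))  # state reconstruction
    kl = np.mean(kl_diag_gaussian(mu, log_var))           # disentanglement term
    reward = np.mean((r_true - r_pred) ** 2)              # task-relevance term
    return recon + beta * kl + lam * reward

# Toy batch: 8 "states" of 16 dims encoded into 4 latent dims.
x = rng.normal(size=(8, 16))
mu = rng.normal(scale=0.1, size=(8, 4))
log_var = np.full((8, 4), -1.0)
z = mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)  # reparameterization
W_dec = rng.normal(scale=0.1, size=(4, 16))  # stand-in decoder weights
W_rew = rng.normal(scale=0.1, size=(4,))     # stand-in reward-prediction head
x_recon = z @ W_dec
r_pred = z @ W_rew
r_true = rng.normal(size=(8,))

loss = etrsr_loss(x, x_recon, mu, log_var, r_true, r_pred)
print(float(loss))
```

Decoding the task-relevant latent code back to image space (here, `z @ W_dec`) is what yields the "explainable state" the abstract refers to: a reconstruction that keeps only the features the reward model deemed relevant.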


Similar articles

1. Learning explainable task-relevant state representation for model-free deep reinforcement learning.
Neural Netw. 2024 Dec;180:106741. doi: 10.1016/j.neunet.2024.106741. Epub 2024 Sep 20.
2. Sequential action-induced invariant representation for reinforcement learning.
Neural Netw. 2024 Nov;179:106579. doi: 10.1016/j.neunet.2024.106579. Epub 2024 Jul 26.
3. Reward-predictive representations generalize across tasks in reinforcement learning.
PLoS Comput Biol. 2020 Oct 15;16(10):e1008317. doi: 10.1371/journal.pcbi.1008317. eCollection 2020 Oct.
4. Deep reinforcement learning and its applications in medical imaging and radiation therapy: a survey.
Phys Med Biol. 2022 Nov 11;67(22). doi: 10.1088/1361-6560/ac9cb3.
5. Multimodal information bottleneck for deep reinforcement learning with multiple sensors.
Neural Netw. 2024 Aug;176:106347. doi: 10.1016/j.neunet.2024.106347. Epub 2024 Apr 27.
6. Deep Reinforcement Learning on Autonomous Driving Policy With Auxiliary Critic Network.
IEEE Trans Neural Netw Learn Syst. 2023 Jul;34(7):3680-3690. doi: 10.1109/TNNLS.2021.3116063. Epub 2023 Jul 6.
7. Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning.
Neural Netw. 2022 Jun;150:408-421. doi: 10.1016/j.neunet.2022.03.015. Epub 2022 Mar 17.
8. Representation learning for continuous action spaces is beneficial for efficient policy learning.
Neural Netw. 2023 Feb;159:137-152. doi: 10.1016/j.neunet.2022.12.009. Epub 2022 Dec 16.
9. Combining STDP and binary networks for reinforcement learning from images and sparse rewards.
Neural Netw. 2021 Dec;144:496-506. doi: 10.1016/j.neunet.2021.09.010. Epub 2021 Sep 17.
10. Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning.
Sensors (Basel). 2022 Aug 29;22(17):6504. doi: 10.3390/s22176504.