Multimodal information bottleneck for deep reinforcement learning with multiple sensors.

Affiliations

Department of Computer Science and Technology, Beijing National Research Centre for Information Science and Technology, Tsinghua University, Beijing 100084, China.


Publication Information

Neural Netw. 2024 Aug;176:106347. doi: 10.1016/j.neunet.2024.106347. Epub 2024 Apr 27.

DOI: 10.1016/j.neunet.2024.106347
PMID: 38688069

Abstract

Reinforcement learning has achieved promising results on robotic control tasks but struggles to leverage information effectively from multiple sensory modalities that differ in many characteristics. Recent works construct auxiliary losses based on reconstruction or mutual information to extract joint representations from multiple sensory inputs to improve the sample efficiency and performance of reinforcement learning algorithms. However, the representations learned by these methods could capture information irrelevant to learning a policy and may degrade the performance. We argue that compressing information in the learned joint representations about raw multimodal observations is helpful, and propose a multimodal information bottleneck model to learn task-relevant joint representations from egocentric images and proprioception. Our model compresses and retains the predictive information in multimodal observations for learning a compressed joint representation, which fuses complementary information from visual and proprioceptive feedback and meanwhile filters out task-irrelevant information in raw multimodal observations. We propose to minimize the upper bound of our multimodal information bottleneck objective for computationally tractable optimization. Experimental evaluations on several challenging locomotion tasks with egocentric images and proprioception show that our method achieves better sample efficiency and zero-shot robustness to unseen white noise than leading baselines. We also empirically demonstrate that leveraging information from egocentric images and proprioception is more helpful for learning policies on locomotion tasks than solely using one single modality.
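The abstract describes minimizing an upper bound on an information bottleneck objective: retain the information in the fused representation that predicts future states, while compressing away the rest of the raw multimodal observation. As a rough illustration only, the sketch below computes a generic variational-bottleneck-style bound with NumPy; the diagonal-Gaussian encoder, the squared-error predictive term, and the trade-off weight `beta` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.
    # This is the usual closed-form compression penalty for a diagonal
    # Gaussian posterior against a standard normal prior.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def mib_upper_bound(mu_z, log_var_z, pred_next, target_next, beta=1e-3):
    # Illustrative upper bound: a predictive term (keep information that
    # forecasts the next latent state) plus beta times a compression term
    # (discard task-irrelevant information in the raw observations).
    prediction_loss = np.mean((pred_next - target_next) ** 2)
    compression = kl_to_standard_normal(mu_z, log_var_z)
    return prediction_loss + beta * compression
```

With a perfect prediction and a posterior equal to the prior, both terms vanish and the bound is zero; in training, gradients on `beta * compression` are what squeeze out irrelevant content from the fused visual-plus-proprioceptive representation.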


Similar Articles

1. Multimodal information bottleneck for deep reinforcement learning with multiple sensors.
   Neural Netw. 2024 Aug;176:106347. doi: 10.1016/j.neunet.2024.106347. Epub 2024 Apr 27.
2. Variational Information Bottleneck Regularized Deep Reinforcement Learning for Efficient Robotic Skill Adaptation.
   Sensors (Basel). 2023 Jan 9;23(2):762. doi: 10.3390/s23020762.
3. Sequential action-induced invariant representation for reinforcement learning.
   Neural Netw. 2024 Nov;179:106579. doi: 10.1016/j.neunet.2024.106579. Epub 2024 Jul 26.
4. Multimodal Deep Reinforcement Learning with Auxiliary Task for Obstacle Avoidance of Indoor Mobile Robot.
   Sensors (Basel). 2021 Feb 15;21(4):1363. doi: 10.3390/s21041363.
5. Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning.
   Sensors (Basel). 2022 Aug 29;22(17):6504. doi: 10.3390/s22176504.
6. Action-driven contrastive representation for reinforcement learning.
   PLoS One. 2022 Mar 18;17(3):e0265456. doi: 10.1371/journal.pone.0265456. eCollection 2022.
7. Discovering diverse solutions in deep reinforcement learning by maximizing state-action-based mutual information.
   Neural Netw. 2022 Aug;152:90-104. doi: 10.1016/j.neunet.2022.04.009. Epub 2022 Apr 16.
8. Efficient online bootstrapping of sensory representations.
   Neural Netw. 2013 May;41:39-50. doi: 10.1016/j.neunet.2012.11.002. Epub 2012 Nov 19.
9. Modular deep reinforcement learning from reward and punishment for robot navigation.
   Neural Netw. 2021 Mar;135:115-126. doi: 10.1016/j.neunet.2020.12.001. Epub 2020 Dec 8.
10. Reward-predictive representations generalize across tasks in reinforcement learning.
    PLoS Comput Biol. 2020 Oct 15;16(10):e1008317. doi: 10.1371/journal.pcbi.1008317. eCollection 2020 Oct.