

Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning.

Publication Info

IEEE Trans Neural Netw Learn Syst. 2019 Nov;30(11):3409-3418. doi: 10.1109/TNNLS.2019.2891792. Epub 2019 Jan 29.

DOI: 10.1109/TNNLS.2019.2891792
PMID: 30714933
Abstract

One of the main concerns of deep reinforcement learning (DRL) is the data inefficiency problem, which stems both from an inability to fully utilize data acquired and from naive exploration strategies. In order to alleviate these problems, we propose a DRL algorithm that aims to improve data efficiency via both the utilization of unrewarded experiences and the exploration strategy by combining ideas from unsupervised auxiliary tasks, intrinsic motivation, and hierarchical reinforcement learning (HRL). Our method is based on a simple HRL architecture with a metacontroller and a subcontroller. The subcontroller is intrinsically motivated by the metacontroller to learn to control aspects of the environment, with the intention of giving the agent: 1) a neural representation that is generically useful for tasks that involve manipulation of the environment and 2) the ability to explore the environment in a temporally extended manner through the control of the metacontroller. In this way, we reinterpret the notion of pixel- and feature-control auxiliary tasks as reusable skills that can be learned via an intrinsic reward. We evaluate our method on a number of Atari 2600 games. We found that it outperforms the baseline in several environments and significantly improves performance in one of the hardest games, Montezuma's Revenge, for which the ability to utilize sparse data is key. We found that the inclusion of intrinsic reward is crucial for the improvement in performance and that most of the benefit seems to be derived from the representations learned during training.
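The abstract's core loop (a metacontroller assigning a feature-control goal, and a subcontroller collecting intrinsic reward for changing that feature, even when extrinsic reward is sparse or absent) can be sketched in miniature. This is a hypothetical toy stand-in, not the paper's implementation: `ToyEnv`, the random goal selection, and the reward shape are all assumptions made for illustration; the paper uses learned neural controllers on Atari.

```python
import random

class ToyEnv:
    """Toy environment whose state is a vector of features the agent can nudge."""
    def __init__(self, n_features=4):
        self.n_features = n_features
        self.state = [0.0] * n_features

    def reset(self):
        self.state = [0.0] * self.n_features
        return list(self.state)

    def step(self, action):
        # Each action stochastically perturbs the corresponding feature.
        nxt = list(self.state)
        nxt[action] += random.uniform(0.0, 1.0)
        self.state = nxt
        extrinsic = 0.0  # sparse task reward: zero throughout this toy episode
        return list(self.state), extrinsic

def intrinsic_reward(prev_state, state, goal_feature):
    # Feature control: reward the subcontroller for changing the chosen feature.
    return abs(state[goal_feature] - prev_state[goal_feature])

def run_episode(env, n_goals=8, steps_per_goal=4):
    """Metacontroller picks a goal feature; subcontroller acts for a few steps
    under that goal, accruing intrinsic reward for changing the feature."""
    total_intrinsic = 0.0
    state = env.reset()
    for _ in range(n_goals):
        # Metacontroller: choose which feature the subcontroller should control
        # (random here; the paper learns this choice).
        goal = random.randrange(env.n_features)
        for _ in range(steps_per_goal):
            # Subcontroller: act toward the goal (a learned policy in the paper;
            # here the action that perturbs the goal feature is picked directly).
            action = goal
            prev = state
            state, _ = env.step(action)
            total_intrinsic += intrinsic_reward(prev, state, goal)
    return total_intrinsic

random.seed(0)
# Intrinsic reward accrues even though every extrinsic reward was zero,
# which is the mechanism the abstract credits for progress in sparse-reward games.
print(run_episode(ToyEnv()) > 0)  # prints True
```

The temporally extended exploration the abstract mentions corresponds to the inner loop: the agent commits to one goal for several steps rather than re-deciding every frame.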


Similar Articles

1
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2019 Nov;30(11):3409-3418. doi: 10.1109/TNNLS.2019.2891792. Epub 2019 Jan 29.
2
Boosting Reinforcement Learning via Hierarchical Game Playing With State Relay.
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):7077-7089. doi: 10.1109/TNNLS.2024.3386717. Epub 2025 Apr 4.
3
Incremental learning of skill collections based on intrinsic motivation.
Front Neurorobot. 2013 Jul 26;7:11. doi: 10.3389/fnbot.2013.00011. eCollection 2013.
4
First return, then explore.
Nature. 2021 Feb;590(7847):580-586. doi: 10.1038/s41586-020-03157-9. Epub 2021 Feb 24.
5
Hierarchical intrinsically motivated agent planning behavior with dreaming in grid environments.
Brain Inform. 2022 Apr 2;9(1):8. doi: 10.1186/s40708-022-00156-6.
6
Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
7
STACoRe: Spatio-temporal and action-based contrastive representations for reinforcement learning in Atari.
Neural Netw. 2023 Mar;160:1-11. doi: 10.1016/j.neunet.2022.12.018. Epub 2022 Dec 29.
8
Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning.
Sensors (Basel). 2019 Apr 1;19(7):1576. doi: 10.3390/s19071576.
9
Double Sparse Deep Reinforcement Learning via Multilayer Sparse Coding and Nonconvex Regularized Pruning.
IEEE Trans Cybern. 2023 Feb;53(2):765-778. doi: 10.1109/TCYB.2022.3157892. Epub 2023 Jan 13.
10
Deep Reinforcement Learning: A Survey.
IEEE Trans Neural Netw Learn Syst. 2024 Apr;35(4):5064-5078. doi: 10.1109/TNNLS.2022.3207346. Epub 2024 Apr 4.

Cited By

1
IoT-Based Reinforcement Learning Using Probabilistic Model for Determining Extensive Exploration through Computational Intelligence for Next-Generation Techniques.
Comput Intell Neurosci. 2023 Oct 10;2023:5113417. doi: 10.1155/2023/5113417. eCollection 2023.
2
Correlation Analysis of Japanese Literature and Psychotherapy Effects Based on an Equation Diagnosis Algorithm.
Occup Ther Int. 2022 Jun 11;2022:3032445. doi: 10.1155/2022/3032445. eCollection 2022.
3
Learning to Cooperate via an Attention-Based Communication Neural Network in Decentralized Multi-Robot Exploration.
Entropy (Basel). 2019 Mar 19;21(3):294. doi: 10.3390/e21030294.