

Exploratory State Representation Learning

Authors

Merckling Astrid, Perrin-Gilbert Nicolas, Coninx Alex, Doncieux Stéphane

Affiliations

Sorbonne Université, CNRS, Institut des Systèmes Intelligents et de Robotique, ISIR, Paris, France.

Publication

Front Robot AI. 2022 Feb 14;9:762051. doi: 10.3389/frobt.2022.762051. eCollection 2022.

DOI: 10.3389/frobt.2022.762051
PMID: 35237669
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8883277/
Abstract

Not having access to compact and meaningful representations is known to significantly increase the complexity of reinforcement learning (RL). For this reason, it can be useful to perform state representation learning (SRL) before tackling RL tasks. However, obtaining a good state representation can only be done if a large diversity of transitions is observed, which can require a difficult exploration, especially if the environment is initially reward-free. To solve the problems of exploration and SRL in parallel, we propose a new approach called XSRL (eXploratory State Representation Learning). On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations. On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a k-step learning progress bonus to form the maximization objective of a discovery policy. This results in a policy that seeks complex transitions from which the trained models can effectively learn. Our experimental results show that the approach leads to efficient exploration in challenging environments with image observations, and to state representations that significantly accelerate learning in RL tasks.
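The exploration objective sketched in the abstract — the inverse model's current prediction error plus a learning-progress bonus over recent updates — can be illustrated with a minimal sketch. Note this is a reconstruction for intuition only: the function name, the `beta` weight, and the clipping of negative progress to zero are assumptions here, not the paper's exact formulation.

```python
def intrinsic_reward(pred_errors, k=5, beta=1.0):
    """Illustrative intrinsic reward for a discovery policy.

    pred_errors: history of the inverse model's prediction errors,
                 oldest first; the last entry is the current error.
    k:           horizon over which learning progress is measured.
    beta:        weight on the learning-progress bonus (assumption).

    High current error marks a transition the model has not mastered;
    positive progress (error dropping over the last k updates) marks a
    region where the model is still learning effectively.
    """
    err_now = pred_errors[-1]
    if len(pred_errors) > k:
        # k-step learning progress: how much the error fell over k updates.
        progress = pred_errors[-k - 1] - pred_errors[-1]
    else:
        progress = 0.0  # not enough history yet for a progress estimate
    # Clip negative progress so forgetting is not rewarded (assumption).
    return err_now + beta * max(progress, 0.0)
```

With a steadily shrinking error history such as `[1.0, 0.9, 0.8, 0.7, 0.6, 0.5]` and `k=3`, the reward combines the current error (0.5) with the three-step drop (0.3); with no history beyond the current step, only the current error contributes.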


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/272e/8883277/5775d360bccc/frobt-09-762051-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/272e/8883277/676368e1e932/frobt-09-762051-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/272e/8883277/4ab8719dbbea/frobt-09-762051-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/272e/8883277/429b7ba61d26/frobt-09-762051-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/272e/8883277/699d302ba612/frobt-09-762051-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/272e/8883277/8c943b07d858/frobt-09-762051-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/272e/8883277/56ac5a833a40/frobt-09-762051-g007.jpg

Similar Articles

1. Exploratory State Representation Learning.
Front Robot AI. 2022 Feb 14;9:762051. doi: 10.3389/frobt.2022.762051. eCollection 2022.
2. Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning.
Sensors (Basel). 2022 Aug 29;22(17):6504. doi: 10.3390/s22176504.
3. Representation learning for continuous action spaces is beneficial for efficient policy learning.
Neural Netw. 2023 Feb;159:137-152. doi: 10.1016/j.neunet.2022.12.009. Epub 2022 Dec 16.
4. Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning.
Sensors (Basel). 2019 Apr 1;19(7):1576. doi: 10.3390/s19071576.
5. Active Inference and Reinforcement Learning: A Unified Inference on Continuous State and Action Spaces Under Partial Observability.
Neural Comput. 2024 Sep 17;36(10):2073-2135. doi: 10.1162/neco_a_01698.
6. Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning.
Neural Netw. 2022 Jun;150:408-421. doi: 10.1016/j.neunet.2022.03.015. Epub 2022 Mar 17.
7. STACoRe: Spatio-temporal and action-based contrastive representations for reinforcement learning in Atari.
Neural Netw. 2023 Mar;160:1-11. doi: 10.1016/j.neunet.2022.12.018. Epub 2022 Dec 29.
8. Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation.
Neural Netw. 2014 Sep;57:128-40. doi: 10.1016/j.neunet.2014.06.006. Epub 2014 Jun 21.
9. Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):8814-8827. doi: 10.1109/TNNLS.2024.3439261. Epub 2025 May 2.
10. Emergence of belief-like representations through reinforcement learning.
bioRxiv. 2023 Apr 4:2023.04.04.535512. doi: 10.1101/2023.04.04.535512.

Cited By

1. Action of the Euclidean versus projective group on an agent's internal space in curiosity driven exploration.
Biol Cybern. 2025 Jan 17;119(1):4. doi: 10.1007/s00422-024-01001-1.

References

1. State representation learning for control: An overview.
Neural Netw. 2018 Dec;108:379-392. doi: 10.1016/j.neunet.2018.07.006. Epub 2018 Aug 4.