
State-Temporal Compression in Reinforcement Learning With the Reward-Restricted Geodesic Metric.

Authors

Guo Shangqi, Yan Qi, Su Xin, Hu Xiaolin, Chen Feng

Publication

IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5572-5589. doi: 10.1109/TPAMI.2021.3069005. Epub 2022 Aug 4.

DOI: 10.1109/TPAMI.2021.3069005
PMID: 33764874
Abstract

It is difficult to solve complex tasks that involve large state spaces and long-term decision processes by reinforcement learning (RL) algorithms. A common and promising method to address this challenge is to compress a large RL problem into a small one. Towards this goal, the compression should be state-temporal and optimality-preserving (i.e., the optimal policy of the compressed problem should correspond to that of the uncompressed problem). In this paper, we propose a reward-restricted geodesic (RRG) metric, which can be learned by a neural network, to perform state-temporal compression in RL. We prove that compression based on the RRG metric is approximately optimality-preserving for the raw RL problem endowed with temporally abstract actions. With this compression, we design an RRG metric-based reinforcement learning (RRG-RL) algorithm to solve complex tasks. Experiments in both discrete (2D Minecraft) and continuous (Doom) environments demonstrated the superiority of our method over existing RL approaches.
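In the paper, the RRG metric is learned by a neural network and restricts geodesic distances using reward information; the details below are not from the paper. As a rough, hypothetical illustration of the underlying idea — compressing a large state space by merging states that are close under a geodesic (shortest-path) distance in the transition graph — here is a toy sketch on a deterministic gridworld. All function names and the distance threshold `eps` are illustrative assumptions, not the authors' method.

```python
from collections import deque
from itertools import product

def geodesic_distances(states, neighbors):
    """All-pairs geodesic distance: minimum number of transitions
    between states, computed by one BFS per source state."""
    dist = {}
    for s in states:
        d = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in neighbors(u):
                if v not in d:
                    d[v] = d[u] + 1
                    q.append(v)
        for t, dt in d.items():
            dist[(s, t)] = dt
    return dist

def compress(states, dist, eps):
    """Greedily merge states whose geodesic distance to a cluster
    representative is at most eps (a crude stand-in for metric-based
    state-temporal compression)."""
    clusters = []
    for s in states:
        for c in clusters:
            rep = c[0]
            if (dist.get((rep, s), float("inf")) <= eps
                    and dist.get((s, rep), float("inf")) <= eps):
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

# 4x4 gridworld: states are cells, transitions move to 4-neighbours.
states = list(product(range(4), range(4)))

def neighbors(s):
    x, y = s
    return [(nx, ny)
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if 0 <= nx < 4 and 0 <= ny < 4]

dist = geodesic_distances(states, neighbors)
abstract_states = compress(states, dist, eps=1)
print(len(states), "raw states ->", len(abstract_states), "abstract states")
```

The paper's contribution goes further in two ways this sketch omits: the metric is reward-restricted (so merging cannot collapse states with different optimal values), and the compressed problem is endowed with temporally abstract actions, which is what makes the compression approximately optimality-preserving.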


Similar Articles

1. State-Temporal Compression in Reinforcement Learning With the Reward-Restricted Geodesic Metric.
   IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5572-5589. doi: 10.1109/TPAMI.2021.3069005. Epub 2022 Aug 4.
2. Active Inference and Reinforcement Learning: A Unified Inference on Continuous State and Action Spaces Under Partial Observability.
   Neural Comput. 2024 Sep 17;36(10):2073-2135. doi: 10.1162/neco_a_01698.
3. Kernel Temporal Difference based Reinforcement Learning for Brain Machine Interfaces.
   Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:6721-6724. doi: 10.1109/EMBC46164.2021.9631086.
4. Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units.
   BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):57. doi: 10.1186/s12911-019-0763-6.
5. Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning.
   Neural Netw. 2022 Jun;150:408-421. doi: 10.1016/j.neunet.2022.03.015. Epub 2022 Mar 17.
6. Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction.
   Comput Intell Neurosci. 2021 Sep 24;2021:7588221. doi: 10.1155/2021/7588221. eCollection 2021.
7. HMM for discovering decision-making dynamics using reinforcement learning experiments.
   Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxae033.
8. Open-Ended Learning: A Conceptual Framework Based on Representational Redescription.
   Front Neurorobot. 2018 Sep 25;12:59. doi: 10.3389/fnbot.2018.00059. eCollection 2018.
9. A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers.
   Int J Neural Syst. 2023 Dec;33(12):2350065. doi: 10.1142/S012906572350065X. Epub 2023 Oct 20.
10. Reinforcement learning for intensive care medicine: actionable clinical insights from novel approaches to reward shaping and off-policy model evaluation.
    Intensive Care Med Exp. 2024 Mar 25;12(1):32. doi: 10.1186/s40635-024-00614-x.