

Model-based reinforcement learning with dimension reduction.

Affiliations

Department of Computer Science, The University of Tokyo, Japan.

Department of Brain Robot Interface, ATR Computational Neuroscience Laboratory, Japan.

Publication

Neural Netw. 2016 Dec;84:1-16. doi: 10.1016/j.neunet.2016.08.005. Epub 2016 Aug 24.

DOI: 10.1016/j.neunet.2016.08.005
PMID: 27639719
Abstract

The goal of reinforcement learning is to learn an optimal policy which controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data, and then derives the optimal policy using the transition model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data, which is difficult to obtain. To overcome this difficulty, in this paper, we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well for high-dimensional control tasks, including real humanoid robot control.
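To make the pipeline in the abstract concrete, here is a minimal, self-contained sketch of model-based learning with dimension reduction. This is an illustration only, not the paper's LSCE algorithm: the latent subspace is estimated with a simple SVD-based projection (a stand-in for LSCE), the transition model is assumed linear, and all names and dimensions are hypothetical. Actions are omitted to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic high-dimensional transitions whose dynamics really live in a
# low-dimensional latent subspace: s' = W A W^T s + noise.
d, k, n = 20, 3, 500            # observed state dim, latent dim, sample count
W = rng.normal(size=(d, k))     # ground-truth latent loading (data generation only)
S = rng.normal(size=(n, d))     # observed states
A_lat = 0.9 * np.eye(k)         # stable latent dynamics
S_next = (S @ W) @ A_lat @ W.T + 0.05 * rng.normal(size=(n, d))

# Step 1 (dimension reduction): estimate a k-dim subspace of the state that
# is predictive of the next state, via SVD of the cross-covariance.
# (LSCE instead finds the subspace by minimizing a conditional entropy.)
U, _, _ = np.linalg.svd(S.T @ S_next, full_matrices=False)
P = U[:, :k]                    # d x k projection matrix

# Step 2 (model learning): fit a linear transition model in the reduced space,
# which needs far fewer samples than fitting a full d x d model.
Z = S @ P                       # reduced states
Theta, *_ = np.linalg.lstsq(Z, S_next @ P, rcond=None)

# Step 3 (planning / policy derivation): roll the learned model forward
# from a start state to predict a latent trajectory.
z = rng.normal(size=d) @ P
for _ in range(10):
    z = z @ Theta               # predicted next latent state
```

The key point mirrored from the paper: the transition model is estimated jointly with (here, after) a dimension-reduction step, so model fitting happens in a k-dimensional space rather than the full d-dimensional one.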


Similar Articles

1. Model-based reinforcement learning with dimension reduction.
   Neural Netw. 2016 Dec;84:1-16. doi: 10.1016/j.neunet.2016.08.005. Epub 2016 Aug 24.
2. Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation.
   Neural Netw. 2014 Sep;57:128-40. doi: 10.1016/j.neunet.2014.06.006. Epub 2014 Jun 21.
3. Reward-weighted regression with sample reuse for direct policy search in reinforcement learning.
   Neural Comput. 2011 Nov;23(11):2798-832. doi: 10.1162/NECO_a_00199. Epub 2011 Aug 18.
4. Efficient exploration through active learning for value function approximation in reinforcement learning.
   Neural Netw. 2010 Jun;23(5):639-48. doi: 10.1016/j.neunet.2009.12.010. Epub 2010 Jan 11.
5. A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space.
   Neural Netw. 2023 Jul;164:419-427. doi: 10.1016/j.neunet.2023.04.042. Epub 2023 May 5.
6. Conditional density estimation with dimensionality reduction via squared-loss conditional entropy minimization.
   Neural Comput. 2015 Jan;27(1):228-54. doi: 10.1162/NECO_a_00683.
7. Kernel-based least squares policy iteration for reinforcement learning.
   IEEE Trans Neural Netw. 2007 Jul;18(4):973-92. doi: 10.1109/TNN.2007.899161.
8. MOSAIC for multiple-reward environments.
   Neural Comput. 2012 Mar;24(3):577-606. doi: 10.1162/NECO_a_00246. Epub 2011 Dec 14.
9. State representation learning for control: An overview.
   Neural Netw. 2018 Dec;108:379-392. doi: 10.1016/j.neunet.2018.07.006. Epub 2018 Aug 4.
10. Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning.
   Neural Comput. 2010 Feb;22(2):342-76. doi: 10.1162/neco.2009.12-08-922.