From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.

Affiliations

Department of Brain Robot Interface, ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seikacho, Soraku-gun, Kyoto 619-0288, Japan; Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan.

Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan.

Publication Information

Neural Netw. 2016 Dec;84:17-27. doi: 10.1016/j.neunet.2016.07.013. Epub 2016 Aug 26.

DOI: 10.1016/j.neunet.2016.07.013
PMID: 27639720
Abstract

Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method only works well with binary, or close-to-binary, state input, where the number of active states is smaller than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In an earlier study, we demonstrated that the performance and robustness of the FERL method can be improved by scaling the free energy by a constant related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL) instead of the negative free energy, while also handling continuous state input. We validate the proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, on three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation on a robot navigation task with raw, noisy RGB images as state input and a large number of actions.
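The abstract contrasts two RBM-based value estimates: the negative free energy (FERL) and the negative expected energy (EERL). A minimal sketch of the two quantities for an RBM with binary hidden units, assuming standard RBM energy conventions (the variable names and function signatures here are illustrative, not from the paper's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_free_energy(x, W, b, c):
    """FERL-style value estimate: V ~ -F(x), where
    F(x) = -b.x - sum_j log(1 + exp(c_j + x.W[:, j]))."""
    pre = c + x @ W                                 # hidden pre-activations
    return b @ x + np.sum(np.logaddexp(0.0, pre))   # stable softplus sum

def negative_expected_energy(x, W, b, c):
    """EERL-style value estimate: V ~ -<E(x, h)> under p(h | x), where
    <E> = -b.x - sum_j sigmoid(pre_j) * pre_j for binary hidden units."""
    pre = c + x @ W
    return b @ x + sigmoid(pre) @ pre
```

Since the free energy equals the expected energy minus the (non-negative) entropy of the hidden units, the FERL estimate is never below the EERL estimate for the same parameters; the two coincide only when the hidden posterior is deterministic.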


Similar Articles

1. From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.
   Neural Netw. 2016 Dec;84:17-27. doi: 10.1016/j.neunet.2016.07.013. Epub 2016 Aug 26.
2. Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces.
   Front Neurorobot. 2013 Feb 28;7:3. doi: 10.3389/fnbot.2013.00003. eCollection 2013.
3. Expected energy-based restricted Boltzmann machine for classification.
   Neural Netw. 2015 Apr;64:29-38. doi: 10.1016/j.neunet.2014.09.006. Epub 2014 Sep 28.
4. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning.
   Neural Netw. 2018 Nov;107:3-11. doi: 10.1016/j.neunet.2017.12.012. Epub 2018 Jan 11.
5. Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states.
   Neural Netw. 2017 Oct;94:13-23. doi: 10.1016/j.neunet.2017.06.007. Epub 2017 Jun 29.
6. A neural network model for the orbitofrontal cortex and task space acquisition during reinforcement learning.
   PLoS Comput Biol. 2018 Jan 4;14(1):e1005925. doi: 10.1371/journal.pcbi.1005925. eCollection 2018 Jan.
7. Forecast Modelling via Variations in Binary Image-Encoded Information Exploited by Deep Learning Neural Networks.
   PLoS One. 2016 Jun 9;11(6):e0157028. doi: 10.1371/journal.pone.0157028. eCollection 2016.
8. Reinforcement learning in continuous time and space: interference and not ill conditioning is the main problem when using distributed function approximators.
   IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):950-6. doi: 10.1109/TSMCB.2008.921000.
9. Reinforcement learning solution for HJB equation arising in constrained optimal control problem.
   Neural Netw. 2015 Nov;71:150-8. doi: 10.1016/j.neunet.2015.08.007. Epub 2015 Aug 24.
10. Integrating temporal difference methods and self-organizing neural networks for reinforcement learning with delayed evaluative feedback.
   IEEE Trans Neural Netw. 2008 Feb;19(2):230-44. doi: 10.1109/TNN.2007.905839.

Cited By

1. Bayesian mechanics of perceptual inference and motor control in the brain.
   Biol Cybern. 2021 Feb;115(1):87-102. doi: 10.1007/s00422-021-00859-9. Epub 2021 Jan 20.
2. Dark control: The default mode network as a reinforcement learning agent.
   Hum Brain Mapp. 2020 Aug 15;41(12):3318-3341. doi: 10.1002/hbm.25019. Epub 2020 Jun 5.
3. Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning.
   Front Neurorobot. 2019 Dec 10;13:103. doi: 10.3389/fnbot.2019.00103. eCollection 2019.
4. Regimes of Expectations: An Active Inference Model of Social Conformity and Human Decision Making.
   Front Psychol. 2019 Mar 29;10:679. doi: 10.3389/fpsyg.2019.00679. eCollection 2019.
5. Variational ecology and the physics of sentient systems.
   Phys Life Rev. 2019 Dec;31:188-205. doi: 10.1016/j.plrev.2018.12.002. Epub 2019 Jan 7.