Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces.

Affiliation

Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.

Publication

Front Neurorobot. 2013 Feb 28;7:3. doi: 10.3389/fnbot.2013.00003. eCollection 2013.

DOI: 10.3389/fnbot.2013.00003
PMID: 23450126
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3584292/
Abstract

Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces, which cannot be handled by standard function approximation methods. In this study, we propose a scaled version of free-energy based reinforcement learning to achieve more robust and more efficient learning performance. The action-value function is approximated by the negative free-energy of a restricted Boltzmann machine, divided by a constant scaling factor that is related to the size of the Boltzmann machine (the square root of the number of state nodes in this study). Our first task is a digit floor gridworld task, where the states are represented by images of handwritten digits from the MNIST data set. The purpose of the task is to investigate the proposed method's ability, through the extraction of task-relevant features in the hidden layer, to cluster images of the same digit and to cluster images of different digits that correspond to states with the same optimal action. We also test the method's robustness with respect to different exploration schedules, i.e., different settings of the initial temperature and the temperature discount rate in softmax action selection. Our second task is a robot visual navigation task, where the robot can learn its position by the different colors of the lower part of four landmarks and it can infer the correct corner goal area by the color of the upper part of the landmarks. The state space consists of binarized camera images with, at most, nine different colors, which is equal to 6642 binary states. For both tasks, the learning performance is compared with standard FERL and with function approximation where the action-value function is approximated by a two-layered feedforward neural network.
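In plain terms, FERL identifies the action value with the negative free energy of a restricted Boltzmann machine whose visible layer encodes the binary state and action, and the scaled variant divides that value by a constant tied to the network size (here, the square root of the number of state nodes): Q(s, a) = -F(s, a) / sqrt(N_s). The sketch below illustrates only this value computation plus softmax action selection; it is not the authors' implementation, and the names and sizes (rbm_free_energy, scaled_q_value, W_s, W_a, b_h, n_h) are illustrative assumptions, with visible biases omitted for brevity.

import numpy as np

def rbm_free_energy(state, action, W_s, W_a, b_h):
    # RBM free energy with the binary state and action as the visible layer
    # (visible biases omitted):
    # F(s, a) = -sum_j softplus(b_h[j] + state @ W_s[:, j] + action @ W_a[:, j])
    pre = b_h + W_s.T @ state + W_a.T @ action        # hidden-unit pre-activations
    return -np.sum(np.logaddexp(0.0, pre))            # numerically stable softplus sum

def scaled_q_value(state, action, W_s, W_a, b_h):
    # Scaled FERL action value: negative free energy divided by sqrt(number of state nodes).
    return -rbm_free_energy(state, action, W_s, W_a, b_h) / np.sqrt(state.size)

def softmax_policy(state, actions, W_s, W_a, b_h, temperature):
    # Boltzmann (softmax) action selection; in the paper's tasks the temperature
    # follows an exploration schedule (initial value and discount rate).
    q = np.array([scaled_q_value(state, a, W_s, W_a, b_h) for a in actions])
    p = np.exp((q - q.max()) / temperature)
    return p / p.sum()

# Toy usage with random weights (shapes chosen only for illustration).
rng = np.random.default_rng(0)
n_s, n_a, n_h = 784, 4, 50                            # e.g. a binarized 28x28 image, 4 actions
W_s = rng.normal(0.0, 0.01, (n_s, n_h))
W_a = rng.normal(0.0, 0.01, (n_a, n_h))
b_h = np.zeros(n_h)
state = rng.integers(0, 2, n_s).astype(float)
actions = np.eye(n_a)                                 # one-hot action codes
print(softmax_policy(state, actions, W_s, W_a, b_h, temperature=1.0))

Learning the weights themselves would follow the FERL update rules (a SARSA-style TD error applied through the free-energy gradient), which are not shown in this sketch.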

Figures (PMC image links)
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74ba/3584292/b0f8bdcbfe41/fnbot-07-00003-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74ba/3584292/1ccf1576ed8e/fnbot-07-00003-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74ba/3584292/39e4e1c85e3a/fnbot-07-00003-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74ba/3584292/cb0b014cf567/fnbot-07-00003-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74ba/3584292/e2f8efaf6fdd/fnbot-07-00003-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74ba/3584292/2b2cee92977e/fnbot-07-00003-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74ba/3584292/dc60a84c1d06/fnbot-07-00003-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74ba/3584292/df93424921c7/fnbot-07-00003-g0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74ba/3584292/ca1396ae4288/fnbot-07-00003-g0009.jpg

Similar Articles

1. Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces.
   Front Neurorobot. 2013 Feb 28;7:3. doi: 10.3389/fnbot.2013.00003. eCollection 2013.
2. From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.
   Neural Netw. 2016 Dec;84:17-27. doi: 10.1016/j.neunet.2016.07.013. Epub 2016 Aug 26.
3. Expected energy-based restricted Boltzmann machine for classification.
   Neural Netw. 2015 Apr;64:29-38. doi: 10.1016/j.neunet.2014.09.006. Epub 2014 Sep 28.
4. Modular deep reinforcement learning from reward and punishment for robot navigation.
   Neural Netw. 2021 Mar;135:115-126. doi: 10.1016/j.neunet.2020.12.001. Epub 2020 Dec 8.
5. Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states.
   Neural Netw. 2017 Oct;94:13-23. doi: 10.1016/j.neunet.2017.06.007. Epub 2017 Jun 29.
6. Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.
   Front Neurorobot. 2013 Apr 5;7:7. doi: 10.3389/fnbot.2013.00007. eCollection 2013.
7. A pseudo-softmax function for hardware-based high speed image classification.
   Sci Rep. 2021 Jul 28;11(1):15307. doi: 10.1038/s41598-021-94691-7.
8. A Critical Period for Robust Curriculum-Based Deep Reinforcement Learning of Sequential Action in a Robot Arm.
   Top Cogn Sci. 2022 Apr;14(2):311-326. doi: 10.1111/tops.12595. Epub 2022 Jan 10.
9. An efficient learning procedure for deep Boltzmann machines.
   Neural Comput. 2012 Aug;24(8):1967-2006. doi: 10.1162/NECO_a_00311. Epub 2012 Apr 17.
10. Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions.
   Biol Cybern. 2009 Mar;100(3):249-60. doi: 10.1007/s00422-009-0295-8. Epub 2009 Feb 20.

Cited By

1. Accounting for negative automaintenance in pigeons: a dual learning systems approach and factored representations.
   PLoS One. 2014 Oct 27;9(10):e111050. doi: 10.1371/journal.pone.0111050. eCollection 2014.
2. Modelling individual differences in the form of Pavlovian conditioned approach responses: a dual learning systems approach with factored representations.
   PLoS Comput Biol. 2014 Feb 13;10(2):e1003466. doi: 10.1371/journal.pcbi.1003466. eCollection 2014 Feb.
3. Biologically inspired intelligent decision making: a commentary on the use of artificial neural networks in bioinformatics.
   Bioengineered. 2014 Mar-Apr;5(2):80-95. doi: 10.4161/bioe.26997. Epub 2013 Dec 16.
4. Value and reward based learning in neurorobots.
   Front Neurorobot. 2013 Sep 13;7:13. doi: 10.3389/fnbot.2013.00013. eCollection 2013.

References

1. A biologically inspired meta-control navigation system for the Psikharpax rat robot.
   Bioinspir Biomim. 2012 Jun;7(2):025009. doi: 10.1088/1748-3182/7/2/025009. Epub 2012 May 22.
2. Retrospective and prospective responses arising in a modeled hippocampus during maze navigation by a brain-based device.
   Proc Natl Acad Sci U S A. 2007 Feb 27;104(9):3556-61. doi: 10.1073/pnas.0611571104. Epub 2007 Feb 21.
3. Spatial navigation and causal analysis in a brain-based device modeling cortical-hippocampal interactions.
   Neuroinformatics. 2005;3(3):197-221. doi: 10.1385/NI:3:3:197.
4. Training products of experts by minimizing contrastive divergence.
   Neural Comput. 2002 Aug;14(8):1771-800. doi: 10.1162/089976602760128018.
5. Multiple model-based reinforcement learning.
   Neural Comput. 2002 Jun;14(6):1347-69. doi: 10.1162/089976602753712972.
6. Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity.
   Biol Cybern. 2000 Sep;83(3):287-99. doi: 10.1007/s004220000171.