Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture.

Author Information

Ladosz Pawel, Ben-Iwhiwhu Eseoghene, Dick Jeffery, Ketz Nicholas, Kolouri Soheil, Krichmar Jeffrey L, Pilly Praveen K, Soltoggio Andrea

Publication Information

IEEE Trans Neural Netw Learn Syst. 2022 May;33(5):2045-2056. doi: 10.1109/TNNLS.2021.3110281. Epub 2022 May 2.

DOI: 10.1109/TNNLS.2021.3110281
PMID: 34559664
Abstract

In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems which we termed confounding POMDPs. In these types of POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, as TD error cannot be easily derived from observations. We solve these types of problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with deep Q-network (DQN), which we call modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved DQN's results and even outperformed control tests with advantage-actor critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on most difficult POMDPs with confounding stimuli and sparse rewards.
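The abstract describes the MOHQA only at a high level. The sketch below is a minimal, illustrative reading of that description in Python/NumPy: a linear stand-in for the DQN branch trained on a one-step TD target, a Hebbian branch whose decaying eligibility traces are modulated by the (possibly sparse and delayed) reward, and an action selector that combines the two value estimates. All class, method, and parameter names here are assumptions for illustration, not the authors' implementation; in particular, the paper's DQN branch is a deep network with the usual replay and target-network machinery, and its rarely correlating trace rule is more specific than the plain exponential decay used below.

```python
import numpy as np

class MOHN:
    """Modulated Hebbian network (illustrative): associates delayed rewards
    with past state-action pairs via decaying eligibility traces."""
    def __init__(self, n_features, n_actions, lr=0.01, trace_decay=0.99):
        self.W = np.zeros((n_actions, n_features))   # associative weights
        self.trace = np.zeros_like(self.W)           # eligibility traces
        self.lr = lr
        self.trace_decay = trace_decay

    def value(self, features):
        # High-level preference for each action given current features.
        return self.W @ features

    def update(self, features, action, reward):
        # Accumulate a Hebbian trace for the taken (state, action) pair,
        # then let the (possibly delayed, sparse) reward modulate learning.
        self.trace *= self.trace_decay
        self.trace[action] += features
        self.W += self.lr * reward * self.trace


class TinyDQN:
    """Stand-in for the DQN branch: a linear Q-function trained with a
    one-step TD target (the real branch is a deep network with replay
    and a target network)."""
    def __init__(self, n_features, n_actions, lr=0.01, gamma=0.99):
        self.W = np.zeros((n_actions, n_features))
        self.lr, self.gamma = lr, gamma

    def q_values(self, features):
        return self.W @ features

    def update(self, f, a, r, f_next, done):
        target = r + (0.0 if done else self.gamma * self.q_values(f_next).max())
        td_error = target - self.q_values(f)[a]
        self.W[a] += self.lr * td_error * f


class MOHQA:
    """Combine the two branches: the DQN handles low-level control while
    the Hebbian branch biases high-level decisions when confounding
    observations make the TD error unreliable."""
    def __init__(self, n_features, n_actions, epsilon=0.1):
        self.dqn = TinyDQN(n_features, n_actions)
        self.mohn = MOHN(n_features, n_actions)
        self.epsilon = epsilon
        self.n_actions = n_actions

    def act(self, features):
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        combined = self.dqn.q_values(features) + self.mohn.value(features)
        return int(np.argmax(combined))

    def learn(self, f, a, r, f_next, done):
        self.dqn.update(f, a, r, f_next, done)
        self.mohn.update(f, a, r)


# Toy usage with random features (purely illustrative).
agent = MOHQA(n_features=8, n_actions=4)
f = np.random.rand(8)
a = agent.act(f)
f_next = np.random.rand(8)
agent.learn(f, a, r=1.0, f_next=f_next, done=False)
```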

Similar Articles

1. Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture.
IEEE Trans Neural Netw Learn Syst. 2022 May;33(5):2045-2056. doi: 10.1109/TNNLS.2021.3110281. Epub 2022 May 2.

2. Combining STDP and binary networks for reinforcement learning from images and sparse rewards.
Neural Netw. 2021 Dec;144:496-506. doi: 10.1016/j.neunet.2021.09.010. Epub 2021 Sep 17.

3. Exploration in neo-Hebbian reinforcement learning: Computational approaches to the exploration-exploitation balance with bio-inspired neural networks.
Neural Netw. 2022 Jul;151:16-33. doi: 10.1016/j.neunet.2022.03.021. Epub 2022 Mar 23.

4. Deep reinforcement learning for automated radiation adaptation in lung cancer.
Med Phys. 2017 Dec;44(12):6690-6705. doi: 10.1002/mp.12625. Epub 2017 Nov 14.

5. Recognition of Hand Gestures Based on EMG Signals with Deep and Double-Deep Q-Networks.
Sensors (Basel). 2023 Apr 12;23(8):3905. doi: 10.3390/s23083905.

6. Target Tracking Control of a Biomimetic Underwater Vehicle Through Deep Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2022 Aug;33(8):3741-3752. doi: 10.1109/TNNLS.2021.3054402. Epub 2022 Aug 3.

7. Reinforcement learning using a continuous time actor-critic framework with spiking neurons.
PLoS Comput Biol. 2013 Apr;9(4):e1003024. doi: 10.1371/journal.pcbi.1003024. Epub 2013 Apr 11.

8. Multisource Transfer Double DQN Based on Actor Learning.
IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2227-2238. doi: 10.1109/TNNLS.2018.2806087.

9. Application of Deep Reinforcement Learning to NS-SHAFT Game Signal Control.
Sensors (Basel). 2022 Jul 14;22(14):5265. doi: 10.3390/s22145265.

10. Deep Reinforcement Learning on Autonomous Driving Policy With Auxiliary Critic Network.
IEEE Trans Neural Netw Learn Syst. 2023 Jul;34(7):3680-3690. doi: 10.1109/TNNLS.2021.3116063. Epub 2023 Jul 6.