Sigmoid-weighted linear units for neural network function approximation in reinforcement learning.

Affiliations

Department of Brain Robot Interface, ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seikacho, Soraku-gun, Kyoto 619-0288, Japan.

Department of Brain Robot Interface, ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seikacho, Soraku-gun, Kyoto 619-0288, Japan; Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan.

Publication information

Neural Netw. 2018 Nov;107:3-11. doi: 10.1016/j.neunet.2017.12.012. Epub 2018 Jan 11.

DOI: 10.1016/j.neunet.2017.12.012
PMID: 29395652
Abstract

In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by, first, achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10 × 10 board, using TD(λ) learning and shallow dSiLU network agents, and, then, by outperforming DQN in the Atari 2600 domain by using a deep Sarsa(λ) agent with SiLU and dSiLU hidden units.
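
The two activation functions described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the SiLU follows the abstract's definition (the input multiplied by the sigmoid of the input), and the dSiLU is obtained by differentiating that expression, which gives σ(x)(1 + x(1 − σ(x))). The function names, the `softmax_probs` helper, and its `temperature` parameter are assumptions introduced here for illustration; the abstract mentions softmax action selection but does not specify its parameterization.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x: float) -> float:
    """SiLU: the input multiplied by the sigmoid of the input."""
    return x * sigmoid(x)


def dsilu(x: float) -> float:
    """dSiLU: derivative of silu(x) with respect to x."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))


def softmax_probs(q_values, temperature=1.0):
    """Softmax (Boltzmann) action probabilities over action values.

    `temperature` is a hypothetical parameter for this sketch;
    lower values make the selection greedier.
    """
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

As a quick check, `silu(0.0)` is 0 and `dsilu(0.0)` is 0.5, and `softmax_probs` returns probabilities that sum to 1 and favor the action with the largest value.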


Similar articles

1. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning.
Neural Netw. 2018 Nov;107:3-11. doi: 10.1016/j.neunet.2017.12.012. Epub 2018 Jan 11.
2. Deep reinforcement learning for automated radiation adaptation in lung cancer.
Med Phys. 2017 Dec;44(12):6690-6705. doi: 10.1002/mp.12625. Epub 2017 Nov 14.
3. From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.
Neural Netw. 2016 Dec;84:17-27. doi: 10.1016/j.neunet.2016.07.013. Epub 2016 Aug 26.
4. Minibatch Recursive Least Squares Q-Learning.
Comput Intell Neurosci. 2021 Oct 8;2021:5370281. doi: 10.1155/2021/5370281. eCollection 2021.
5. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2216-2226. doi: 10.1109/TNNLS.2018.2790981.
6. Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network.
IEEE Trans Neural Netw Learn Syst. 2020 Oct;31(10):4374-4380. doi: 10.1109/TNNLS.2019.2948892. Epub 2019 Nov 22.
7. Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
8. Multisource Transfer Double DQN Based on Actor Learning.
IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2227-2238. doi: 10.1109/TNNLS.2018.2806087.
9. Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning.
Front Neurorobot. 2019 Dec 10;13:103. doi: 10.3389/fnbot.2019.00103. eCollection 2019.
10. Integrating temporal difference methods and self-organizing neural networks for reinforcement learning with delayed evaluative feedback.
IEEE Trans Neural Netw. 2008 Feb;19(2):230-44. doi: 10.1109/TNN.2007.905839.

Cited by

1. An anchor-based YOLO fruit detector developed on YOLOv5.
PLoS One. 2025 Sep 5;20(9):e0331012. doi: 10.1371/journal.pone.0331012. eCollection 2025.
2. Fine-grained image classification using the MogaNet network and a multi-level gating mechanism.
Front Neurorobot. 2025 Aug 6;19:1630281. doi: 10.3389/fnbot.2025.1630281. eCollection 2025.
3. LABind: identifying protein binding ligand-aware sites via learning interactions between ligand and protein.
Nat Commun. 2025 Aug 19;16(1):7712. doi: 10.1038/s41467-025-62899-0.
4. A rolling bearing fault diagnosis method based on an improved parallel one-dimensional convolutional neural network.
PLoS One. 2025 Aug 11;20(8):e0327206. doi: 10.1371/journal.pone.0327206. eCollection 2025.
5. Accurate segmentation of localized fuel cladding chemical interaction layers in SEM micrographs with deep learning method.
Sci Rep. 2025 Aug 7;15(1):28878. doi: 10.1038/s41598-025-14927-8.
6. Virtual staining of label-free tissue in imaging mass spectrometry.
Sci Adv. 2025 Aug;11(31):eadv0741. doi: 10.1126/sciadv.adv0741. Epub 2025 Aug 1.
7. Accurate recognition of UAVs on multi-scenario perception with YOLOv9-CAG.
Sci Rep. 2025 Jul 30;15(1):27755. doi: 10.1038/s41598-025-12670-8.
8. MCFA: Multi-Scale Cascade and Feature Adaptive Alignment Network for Cross-View Geo-Localization.
Sensors (Basel). 2025 Jul 21;25(14):4519. doi: 10.3390/s25144519.
9. An automated hybrid deep learning framework for paddy leaf disease identification and classification.
Sci Rep. 2025 Jul 24;15(1):26873. doi: 10.1038/s41598-025-08071-6.
10. BDSER-InceptionNet: A Novel Method for Near-Infrared Spectroscopy Model Transfer Based on Deep Learning and Balanced Distribution Adaptation.
Sensors (Basel). 2025 Jun 27;25(13):4008. doi: 10.3390/s25134008.