
Sigmoid-weighted linear units for neural network function approximation in reinforcement learning.

Affiliations

Department of Brain Robot Interface, ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seikacho, Soraku-gun, Kyoto 619-0288, Japan.

Department of Brain Robot Interface, ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seikacho, Soraku-gun, Kyoto 619-0288, Japan; Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan.

Publication

Neural Netw. 2018 Nov;107:3-11. doi: 10.1016/j.neunet.2017.12.012. Epub 2018 Jan 11.

Abstract

In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by, first, achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10 × 10 board, using TD(λ) learning and shallow dSiLU network agents, and, then, by outperforming DQN in the Atari 2600 domain by using a deep Sarsa(λ) agent with SiLU and dSiLU hidden units.
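The abstract describes the SiLU only in words (the input multiplied by its sigmoid). In the published paper the SiLU is defined as f(z) = z·σ(z), and the dSiLU is its derivative, f'(z) = σ(z)(1 + z(1 − σ(z))). Below is a minimal NumPy sketch of both activation functions; the function names and the demo values are illustrative, not taken from the paper.

    # Minimal sketch of the SiLU and dSiLU activation functions.
    # Assumes only NumPy; names and demo inputs are illustrative.
    import numpy as np

    def sigmoid(z):
        """Logistic sigmoid: sigma(z) = 1 / (1 + exp(-z))."""
        return 1.0 / (1.0 + np.exp(-z))

    def silu(z):
        """Sigmoid-weighted linear unit: f(z) = z * sigma(z)."""
        return z * sigmoid(z)

    def dsilu(z):
        """Derivative of the SiLU: f'(z) = sigma(z) * (1 + z * (1 - sigma(z)))."""
        s = sigmoid(z)
        return s * (1.0 + z * (1.0 - s))

    if __name__ == "__main__":
        z = np.linspace(-6.0, 6.0, 7)
        print("z:    ", z)
        print("SiLU: ", silu(z))
        print("dSiLU:", dsilu(z))

Note that the SiLU is unbounded above and bounded below (like a smoothed ReLU), while the dSiLU is a bounded, roughly sigmoid-shaped function; the paper uses the SiLU in deep network agents and the dSiLU in shallow network agents.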

