Implicit incremental natural actor critic algorithm.

Affiliation

Osaka University, 2-1, Yamadaoka, Suita city, Osaka, Japan.

Publication information

Neural Netw. 2019 Jan;109:103-112. doi: 10.1016/j.neunet.2018.10.007. Epub 2018 Oct 21.

Abstract

Natural policy gradient (NPG) methods are promising approaches to finding locally optimal policy parameters. The NPG approach works well in optimizing complex policies with high-dimensional parameters, and its effectiveness has been demonstrated in many fields. However, incremental estimation of the NPG is computationally unstable because it is highly sensitive to the step-size values, especially the step-size used to update the NPG estimate. In this study, we propose a new incremental and stable algorithm for NPG estimation. We call the proposed algorithm the implicit incremental natural actor critic (I2NAC); it is based on the idea of the implicit update. A convergence analysis for I2NAC is provided. The theoretical analysis indicates the stability of I2NAC and the instability of conventional incremental NPG methods. Numerical experiments show that I2NAC is less sensitive than the existing incremental NPG method to the values of the meta-parameters, including the step-size for the NPG update.
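
The idea of an implicit update can be illustrated on a toy problem. The sketch below is not the I2NAC algorithm from the paper. It assumes a plain least-squares regression of a scalar target `delta` on a feature vector `psi` (hypothetical names, loosely standing in for a TD error and a score function) and compares the usual explicit stochastic-gradient step with an implicit step, in which the gradient is evaluated at the new estimate rather than the current one. For this quadratic loss the implicit step has a closed form through the Sherman-Morrison identity. The step-size, features, and target weights are all assumptions chosen to make the contrast visible.

```python
import numpy as np


def explicit_step(w, psi, delta, alpha):
    """Explicit update: the gradient is evaluated at the current estimate w."""
    return w + alpha * (delta - psi @ w) * psi


def implicit_step(w, psi, delta, alpha):
    """Implicit update: solves w_new = w + alpha * (delta - psi @ w_new) * psi.

    Closed form via Sherman-Morrison; the effective step-size is
    alpha / (1 + alpha * ||psi||^2), which is always below 1 / ||psi||^2.
    """
    shrink = alpha / (1.0 + alpha * (psi @ psi))
    return w + shrink * (delta - psi @ w) * psi


def run(step_fn, alpha, n_steps=30):
    """Run one update rule on a noiseless toy regression; return the final error norm."""
    w_true = np.array([1.0, -2.0, 0.5])   # hypothetical target weights
    features = 3.0 * np.eye(3)            # hypothetical features, visited in a cycle
    w = np.zeros(3)
    for t in range(n_steps):
        psi = features[t % 3]
        delta = psi @ w_true              # noiseless scalar target
        w = step_fn(w, psi, delta, alpha)
    return float(np.linalg.norm(w - w_true))


alpha = 1.0  # deliberately too large for the explicit rule
explicit_err = run(explicit_step, alpha)
implicit_err = run(implicit_step, alpha)
print(f"explicit error: {explicit_err:.3e}")
print(f"implicit error: {implicit_err:.3e}")
```

With these assumed values, each explicit step multiplies the error in the visited coordinate by 1 - 9 * alpha = -8, so the error grows, whereas each implicit step multiplies it by 1 / (1 + 9 * alpha) = 0.1, so the error shrinks for any positive step-size. This is the kind of step-size insensitivity the abstract attributes to implicit updates, shown here only for a simplified setting.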
