Cooperative Deep Q-Learning Framework for Environments Providing Image Feedback.

Author Information

Raghavan Krishnan, Narayanan Vignesh, Jagannathan Sarangapani

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Jul;35(7):9267-9276. doi: 10.1109/TNNLS.2022.3232069. Epub 2024 Jul 8.

Abstract

In this article, we address two key challenges in the deep reinforcement learning (DRL) setting, sample inefficiency and slow learning, with a dual-neural network (NN)-driven learning approach. In the proposed approach, we use two deep NNs with independent initialization to robustly approximate the action-value function in the presence of image inputs. In particular, we develop a temporal difference (TD) error-driven learning (EDL) approach, where we introduce a set of linear transformations of the TD error to directly update the parameters of each layer in the deep NN. We demonstrate theoretically that the cost minimized by the EDL regime is an approximation of the empirical cost, and the approximation error reduces as learning progresses, irrespective of the size of the network. Using simulation analysis, we show that the proposed methods enable faster learning and convergence and require a reduced buffer size (thereby increasing sample efficiency).
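To make the mechanism concrete, the sketch below gives one plausible reading of the dual-network, TD-error-driven update in PyTorch: two independently initialized Q-networks produce a cooperative bootstrap target, and each layer of the learning network is moved by a fixed linear transformation of the resulting TD error rather than by a backpropagated gradient. The network architecture, the min-based cooperative target, the fixed random tensors `B`, and all hyperparameters are illustrative assumptions, not the authors' exact EDL algorithm.

```python
# Minimal sketch of a dual-network, TD-error-driven (EDL-style) update.
# All design choices here are assumptions for illustration only.
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small convolutional Q-network for stacked 84x84 image frames."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # 84x84 input -> 20x20 after conv1 -> 9x9 after conv2.
        self.head = nn.Linear(64 * 9 * 9, n_actions)

    def forward(self, x):
        return self.head(self.features(x))


def edl_step(q1, q2, batch, B, gamma=0.99, lr=1e-4):
    """One TD-error-driven update of q1's parameters, without backprop."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        # Cooperative bootstrap target from the two independently initialized
        # networks; the element-wise minimum is an assumption, not the paper's rule.
        q_next = torch.min(q1(s_next).max(1).values, q2(s_next).max(1).values)
        target = r + gamma * (1.0 - done) * q_next
        td_error = target - q1(s).gather(1, a.unsqueeze(1)).squeeze(1)
        # EDL-style rule: each layer l moves along a fixed tensor B[l], scaled
        # by a linear function of the TD error (here its batch mean), so every
        # layer is updated directly by a linear transformation of the TD error.
        for B_l, p in zip(B, q1.parameters()):
            p.add_(lr * td_error.mean() * B_l)
    return td_error


n_actions = 6
q1, q2 = QNet(n_actions), QNet(n_actions)           # independent initializations
B = [torch.randn_like(p) for p in q1.parameters()]  # fixed linear transformations

# Dummy transition batch standing in for a (small) replay buffer.
s = torch.randn(32, 4, 84, 84)
s_next = torch.randn(32, 4, 84, 84)
a = torch.randint(0, n_actions, (32,))
r = torch.randn(32)
done = torch.zeros(32)
edl_step(q1, q2, (s, a, r, s_next, done), B)
```

Because the per-layer update bypasses full backpropagation, its cost per step does not grow with network depth in the same way, which is consistent with the abstract's claim that the approximation error of the minimized cost shrinks during learning irrespective of network size.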
