
Working Memory Connections for LSTM.

Affiliations

Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Modena, Italy.

Publication Information

Neural Netw. 2021 Dec;144:334-341. doi: 10.1016/j.neunet.2021.08.030. Epub 2021 Sep 4.

Abstract

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gates by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification fits into the classical LSTM gates without any assumption on the underlying task, and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, dating back to the early 2000s, could not bring a consistent improvement over the vanilla LSTM. As part of this paper, we identify a key issue with those earlier connections that heavily limits their effectiveness, preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
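The mechanism the abstract describes can be sketched as an LSTM cell whose gates receive one extra term: a learnable projection of the squashed previous cell state. This is a minimal NumPy sketch under stated assumptions, not the paper's exact formulation: the weight names (`Wx`, `Wh`, `Wc`), shapes, and initialization are illustrative, and the single stacked projection of `tanh(c_prev)` into all four gates is a simplification of the connections evaluated in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class WorkingMemoryLSTMCell:
    """LSTM cell sketch whose gates also see a learnable nonlinear
    projection of the previous cell state (illustrative, not the
    paper's exact parameterization)."""

    def __init__(self, input_size, hidden_size, rng=None):
        rng = rng or np.random.default_rng(0)
        h, d = hidden_size, input_size
        s = 1.0 / np.sqrt(h)
        # Stacked weights for the four gates, fed by three sources:
        # the input x, the hidden state h, and (the new connection)
        # a nonlinear projection of the cell state.
        self.Wx = rng.uniform(-s, s, (4 * h, d))
        self.Wh = rng.uniform(-s, s, (4 * h, h))
        self.Wc = rng.uniform(-s, s, (4 * h, h))  # working memory connection
        self.b = np.zeros(4 * h)
        self.h = hidden_size

    def step(self, x, h_prev, c_prev):
        hs = self.h
        # Squash the cell state before it reaches the gates: an
        # unbounded cell state fed directly into the gates (as in
        # classical peephole-style connections) can saturate them.
        z = self.Wx @ x + self.Wh @ h_prev + self.Wc @ np.tanh(c_prev) + self.b
        i = sigmoid(z[0 * hs:1 * hs])   # input gate
        f = sigmoid(z[1 * hs:2 * hs])   # forget gate
        g = np.tanh(z[2 * hs:3 * hs])   # candidate cell update
        o = sigmoid(z[3 * hs:4 * hs])   # output gate
        c = f * c_prev + i * g          # new cell state
        h = o * np.tanh(c)              # new hidden state
        return h, c
```

The key difference from a vanilla LSTM is the single extra term `Wc @ tanh(c_prev)` in the pre-activation `z`; removing it recovers the standard cell.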

