
Working Memory Connections for LSTM.

Affiliations

Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Modena, Italy.

Publication Information

Neural Netw. 2021 Dec;144:334-341. doi: 10.1016/j.neunet.2021.08.030. Epub 2021 Sep 4.

Abstract

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gates by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification fits into the classical LSTM gates without any assumption on the underlying task, and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, dating back to the early 2000s, could not bring a consistent improvement over the vanilla LSTM. As part of this paper, we identify a key issue with those earlier connections that heavily limits their effectiveness, preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
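The mechanism the abstract describes can be sketched as an LSTM cell whose gates receive one extra term: a learnable projection of the squashed previous cell state. This is a minimal NumPy sketch under stated assumptions, not the paper's exact formulation: the weight names (`Wx`, `Wh`, `Wc`), shapes, and initialization are illustrative, and the single stacked projection of `tanh(c_prev)` into all four gates is a simplification of the connections evaluated in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class WorkingMemoryLSTMCell:
    """LSTM cell sketch whose gates also see a learnable nonlinear
    projection of the previous cell state (illustrative, not the
    paper's exact parameterization)."""

    def __init__(self, input_size, hidden_size, rng=None):
        rng = rng or np.random.default_rng(0)
        h, d = hidden_size, input_size
        s = 1.0 / np.sqrt(h)
        # Stacked weights for the four gates, fed by three sources:
        # the input x, the hidden state h, and (the new connection)
        # a nonlinear projection of the cell state.
        self.Wx = rng.uniform(-s, s, (4 * h, d))
        self.Wh = rng.uniform(-s, s, (4 * h, h))
        self.Wc = rng.uniform(-s, s, (4 * h, h))  # working memory connection
        self.b = np.zeros(4 * h)
        self.h = hidden_size

    def step(self, x, h_prev, c_prev):
        hs = self.h
        # Squash the cell state before it reaches the gates: an
        # unbounded cell state fed directly into the gates (as in
        # classical peephole-style connections) can saturate them.
        z = self.Wx @ x + self.Wh @ h_prev + self.Wc @ np.tanh(c_prev) + self.b
        i = sigmoid(z[0 * hs:1 * hs])   # input gate
        f = sigmoid(z[1 * hs:2 * hs])   # forget gate
        g = np.tanh(z[2 * hs:3 * hs])   # candidate cell update
        o = sigmoid(z[3 * hs:4 * hs])   # output gate
        c = f * c_prev + i * g          # new cell state
        h = o * np.tanh(c)              # new hidden state
        return h, c
```

The key difference from a vanilla LSTM is the single extra term `Wc @ tanh(c_prev)` in the pre-activation `z`; removing it recovers the standard cell.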

