Ororbia II Alexander G, Mikolov Tomas, Reitter David
College of Information Sciences and Technology, Pennsylvania State University, State College, PA 16802, U.S.A.
Facebook, New York, NY 10003, U.S.A.
Neural Comput. 2017 Dec;29(12):3327-3352. doi: 10.1162/neco_a_01017. Epub 2017 Sep 28.
Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling. Existing architectures that address the issue are often complex and costly to train. The differential state framework (DSF) is a simple and high-performing design that unifies previously introduced gated neural models. DSF models maintain longer-term memory by learning to interpolate between a fast-changing, data-driven representation and a slowly changing, implicitly stable state. Within the DSF, a new architecture, the delta-RNN, is presented. This model requires hardly any more parameters than a classical, simple recurrent network. In language modeling at the word and character levels, the delta-RNN outperforms popular complex architectures, such as the long short-term memory (LSTM) and the gated recurrent unit (GRU), and, when regularized, performs comparably to several state-of-the-art baselines. At the subword level, the delta-RNN's performance is comparable to that of complex gated architectures.
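The interpolation mechanism described in the abstract can be sketched in a few lines of NumPy. This is an illustrative reading of that description, not the paper's exact delta-RNN update: the weight names (W_x, W_h), the tanh/sigmoid choices, and the reuse of the data-driven projection for the gate are assumptions made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class InterpolationRNNCell:
    """Sketch of a DSF-style cell: the next state interpolates between a fast,
    data-driven candidate and the slowly changing previous state.
    Parameter names and initialization are illustrative, not from the paper."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        self.W_x = rng.uniform(-s, s, (hidden_dim, input_dim))   # input projection
        self.W_h = rng.uniform(-s, s, (hidden_dim, hidden_dim))  # recurrent projection
        self.b = np.zeros(hidden_dim)                            # candidate bias
        self.b_r = np.zeros(hidden_dim)                          # gate bias

    def step(self, x, h_prev):
        d_dat = self.W_x @ x                      # fast, data-driven term
        d_rec = self.W_h @ h_prev                 # slowly changing recurrent term
        h_cand = np.tanh(d_dat + d_rec + self.b)  # candidate state
        # Data-driven gate; reusing d_dat keeps the parameter count close to
        # that of a simple RNN (only b_r is added here).
        r = sigmoid(d_dat + self.b_r)
        # Interpolate between the previous (stable) state and the candidate.
        return r * h_prev + (1.0 - r) * h_cand

# Minimal usage: run the cell over a toy sequence of 5 random input vectors.
cell = InterpolationRNNCell(input_dim=8, hidden_dim=16)
h = np.zeros(16)
for x in np.random.default_rng(1).normal(size=(5, 8)):
    h = cell.step(x, h)
```

When the gate r saturates near 1, the state is carried forward nearly unchanged, which is how this kind of interpolation can preserve information across long time lags.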