
Training LSTM Networks With Resistive Cross-Point Devices.

Author Information

Gokmen Tayfun, Rasch Malte J, Haensch Wilfried

Affiliations

IBM Research AI, Yorktown Heights, NY, United States.

Publication Information

Front Neurosci. 2018 Oct 24;12:745. doi: 10.3389/fnins.2018.00745. eCollection 2018.

Abstract

In our previous work we showed that resistive cross-point devices, so-called resistive processing unit (RPU) devices, can provide significant power and speed benefits when training deep fully connected networks as well as convolutional neural networks. In this work, we further extend the RPU concept to training recurrent neural networks (RNNs), namely LSTMs. We show that the mapping of recurrent layers is very similar to the mapping of fully connected layers, and therefore the RPU concept can potentially provide large acceleration factors for RNNs as well. In addition, we study the effect of various device imperfections and system parameters on training performance. Symmetry of updates becomes even more crucial for RNNs: an asymmetry of only a few percent increases the test error relative to the ideal case trained with floating-point numbers. Furthermore, the input signal resolution to the device arrays needs to be at least 7 bits for successful training; however, we show that a stochastic rounding scheme can reduce the required resolution to 5 bits. Further, we find that RPU device variations and hardware noise are enough to mitigate overfitting, reducing the need for dropout. Here we attempt to study the validity of the RPU approach by simulating large-scale networks. For instance, in terms of the total number of multiplication and summation operations performed per epoch, the models studied here are roughly 1500 times larger than the multilayer perceptron models more commonly trained on the MNIST dataset.
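
To make the layer-mapping claim concrete, here is a minimal sketch (a plain NumPy illustration, not the authors' RPU simulator) of how one LSTM time step reduces to a single matrix-vector product over stacked gate weights, the same operation used to map a fully connected layer onto a cross-point array. All sizes and names below are hypothetical.

```python
import numpy as np

# A minimal sketch (our illustration, not the authors' implementation):
# the four LSTM gate weight matrices are stacked into one (4H x (D+H))
# matrix, so a single matrix-vector product -- the operation an RPU
# cross-point array performs in the analog domain -- computes all four
# gates for one time step, exactly as a fully connected layer would map.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(W, b, x_t, h_prev, c_prev):
    """One LSTM time step with D inputs and H hidden units.
    W has shape (4*H, D+H); its row blocks hold the input-gate,
    forget-gate, candidate, and output-gate weights."""
    z = W @ np.concatenate([x_t, h_prev]) + b  # one analog array read
    H = h_prev.size
    i = sigmoid(z[0 * H:1 * H])   # input gate
    f = sigmoid(z[1 * H:2 * H])   # forget gate
    g = np.tanh(z[2 * H:3 * H])   # candidate cell state
    o = sigmoid(z[3 * H:4 * H])   # output gate
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Hypothetical sizes, for illustration only.
D, H = 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D + H))
b = np.zeros(4 * H)
h, c = lstm_step(W, b, rng.standard_normal(D), np.zeros(H), np.zeros(H))
```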
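The 7-bit/5-bit result concerns how finely the input signals to the device arrays are quantized. The sketch below shows generic unbiased stochastic rounding, the standard technique the abstract refers to by name; the paper's exact scheme and parameter choices may differ, and `bits` and `x_max` here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, bits=5, x_max=1.0):
    """Quantize x to 2**bits uniform levels on [-x_max, x_max].
    Instead of rounding to the nearest level, round up with probability
    equal to the fractional distance to the upper level, so the
    quantizer is unbiased: E[stochastic_round(x)] == clip(x)."""
    step = 2.0 * x_max / (2 ** bits - 1)   # level spacing
    scaled = np.clip(x, -x_max, x_max) / step
    lower = np.floor(scaled)
    frac = scaled - lower                  # distance to the lower level
    return (lower + (rng.random(np.shape(x)) < frac)) * step

x = np.array([0.30, -0.70, 0.05])
print(stochastic_round(x))  # one random 5-bit sample
# Averaging many samples recovers x (unbiasedness):
print(np.mean([stochastic_round(x) for _ in range(10_000)], axis=0))
```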
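One simple way to picture the update-asymmetry sensitivity: if up and down weight steps differ in magnitude, nominally cancelling updates leave a systematic drift. The model below is our illustration, not the device model used in the paper, and the 5% value is only an example of "a few percent".

```python
import numpy as np

def asymmetric_update(w, dw, asymmetry=0.05):
    """Apply a weight update in which positive and negative steps differ
    in magnitude by +/- `asymmetry` (5% here), a simple model of up/down
    pulse asymmetry in a resistive device."""
    return w + np.where(dw >= 0, (1 + asymmetry) * dw, (1 - asymmetry) * dw)

# Equal and opposite updates no longer cancel: after 100 up/down cycles
# of size 0.01 the weight has drifted by ~0.1 instead of staying at 0.
w = np.zeros(4)
for _ in range(100):
    w = asymmetric_update(w, np.full(4, 0.01))
    w = asymmetric_update(w, np.full(4, -0.01))
print(w)
```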

