
Sparseness Analysis in the Pretraining of Deep Neural Networks.

Publication Information

IEEE Trans Neural Netw Learn Syst. 2017 Jun;28(6):1425-1438. doi: 10.1109/TNNLS.2016.2541681. Epub 2016 Mar 31.

Abstract

A major advance in deep multilayer neural networks (DNNs) is the invention of various unsupervised pretraining methods that initialize the network parameters and lead to good prediction accuracy. This paper presents a sparseness analysis of the hidden units during the pretraining process. In particular, we use the L-norm to measure sparseness and provide sufficient conditions under which pretraining leads to sparseness for popular pretraining models such as denoising autoencoders (DAEs) and restricted Boltzmann machines (RBMs). Our experimental results demonstrate that when these sufficient conditions are satisfied, the pretraining models do lead to sparseness. Our experiments also reveal that with sigmoid activation functions, pretraining plays an important sparseness-inducing role in DNNs with sigmoid units (Dsigm), whereas with rectified linear unit (ReLU) activations, pretraining becomes less effective for DNNs with ReLU units (Drelu). Fortunately, Drelu can reach higher recognition accuracy than pretrained DNNs (DAEs and RBMs), as it captures the main benefit of pretraining in Dsigm, namely the encouragement of sparseness. However, ReLU does not accommodate the different firing rates of biological neurons, because the firing rate actually changes with the varying membrane resistance. To address this problem, we further propose a family of rectifier piecewise linear units (RePLUs) to fit the different firing rates. The experimental results show that RePLU performs better than ReLU and is comparable with networks pretrained using techniques such as RBMs and DAEs.
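
The abstract turns on two concrete technical ideas: measuring the sparseness of a layer's hidden activations with an L-norm, and a rectified piecewise-linear activation (RePLU) whose slope changes over different input ranges to mimic varying firing rates. The NumPy sketch below illustrates both on random data; the breakpoints, slopes, and the L1-based sparseness helper are illustrative assumptions, since the paper's exact definitions are not reproduced in the abstract.

```python
# Minimal sketch (not the authors' code): an L1-style sparseness measure on
# hidden activations, and a hypothetical RePLU-like piecewise-linear rectifier.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def replu_like(x, breakpoints=(0.0, 1.0), slopes=(1.0, 0.5)):
    """Hypothetical rectified piecewise-linear unit: zero for x < 0, then a
    different slope on each interval [breakpoints[i], breakpoints[i+1])."""
    y = np.zeros_like(x)
    for i, b in enumerate(breakpoints):
        hi = breakpoints[i + 1] if i + 1 < len(breakpoints) else np.inf
        # Contribution of the part of x that falls inside this segment.
        y += slopes[i] * (np.clip(x, b, hi) - b)
    return y

def mean_l1(h):
    """Mean L1 norm of the hidden activation vector per example; a rough proxy
    for sparseness when activations are on comparable scales."""
    return np.abs(h).sum(axis=1).mean()

# One random hidden layer applied to random inputs, just to compare measures.
X = rng.standard_normal((1000, 784))
W = 0.05 * rng.standard_normal((784, 256))
pre = X @ W

for name, act in [("sigmoid", sigmoid), ("relu", relu), ("replu-like", replu_like)]:
    h = act(pre)
    frac_zero = np.mean(h == 0.0)
    print(f"{name:10s}  mean L1 = {mean_l1(h):8.2f}   fraction exactly zero = {frac_zero:.2f}")
```

On zero-mean random pre-activations, the ReLU-style units set roughly half of the hidden activations to exactly zero, which is the kind of sparseness the abstract says pretraining induces in sigmoid networks; the piecewise-linear variant keeps that property while allowing a different slope per input range.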

