Lee Ching-Hua, Fedorov Igor, Rao Bhaskar D, Garudadri Harinath
Department of ECE, University of California, San Diego.
ARM ML Research.
Proc IEEE Int Conf Acoust Speech Signal Process. 2020 May;2020:5410-5414. doi: 10.1109/icassp40776.2020.9054436. Epub 2020 May 14.
While deep neural networks (DNNs) have achieved state-of-the-art results in many fields, they are typically over-parameterized. Parameter redundancy, in turn, leads to inefficiency. Sparse signal recovery (SSR) techniques, on the other hand, find compact solutions to overcomplete linear problems. Therefore, a logical step is to draw the connection between SSR and DNNs. In this paper, we explore the application of iterative reweighting methods popular in SSR to learning efficient DNNs. By efficient, we mean sparse networks that require less computation and storage than the original, dense network. We propose a reweighting framework to learn sparse connections within a given architecture without biasing the optimization process, by utilizing the affine scaling transformation strategy. The resulting algorithm, referred to as Sparsity-promoting Stochastic Gradient Descent (SSGD), has simple gradient-based updates which can be easily implemented in existing deep learning libraries. We demonstrate the sparsification ability of SSGD on image classification tasks and show that it outperforms existing methods on the MNIST and CIFAR-10 datasets.
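The paper does not include code here, so the snippet below is only a rough sketch of how an iteratively reweighted sparsity penalty can be folded into ordinary gradient-based training, in the spirit the abstract describes. It uses a classic reweighted-L1 penalty inside a PyTorch SGD loop; it is not the authors' affine-scaling-based SSGD update, and the MLP architecture, `lambda_`, and `eps` values are hypothetical choices for illustration.

```python
# Sketch only: classic iteratively-reweighted L1 regularization added to a
# standard SGD loop, illustrating reweighting-based sparsification during
# training. NOT the paper's exact SSGD algorithm (which uses an affine
# scaling transformation); model, lambda_, and eps are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
lambda_, eps = 1e-4, 1e-3  # hypothetical penalty strength and reweighting smoothing term

def training_step(x, y):
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y)
    for name, p in model.named_parameters():
        if p.dim() < 2:     # skip biases; only weight matrices are sparsified
            continue
        w = p.detach()      # current weight values, treated as constants in the penalty
        # Reweighted L1: each weight is penalized inversely to its current
        # magnitude, so already-small weights are pushed harder toward zero.
        loss = loss + lambda_ * (p.abs() / (w.abs() + eps)).sum()
    loss.backward()
    opt.step()
    return loss.item()

# Dummy batch standing in for MNIST-sized inputs.
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))
print(training_step(x, y))

# After training, near-zero weights can be pruned with a hard threshold,
# e.g. mask = (p.abs() > 1e-3), yielding a sparse connectivity pattern.
```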