Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, PR China.
Neural Netw. 2019 Feb;110:104-115. doi: 10.1016/j.neunet.2018.10.016. Epub 2018 Nov 13.
Despite the recent success of deep learning models in numerous applications, their widespread deployment on mobile devices is seriously impeded by storage and computational requirements. In this paper, we propose a novel network compression method called Adaptive Dimension Adjustment Tucker decomposition (ADA-Tucker). With learnable core tensors and transformation matrices, ADA-Tucker performs Tucker decomposition of arbitrary-order tensors. Furthermore, we argue that network weight tensors with a proper order and balanced dimensions are easier to compress. Therefore, this high flexibility in the choice of decomposition distinguishes ADA-Tucker from all previous low-rank models. To compress further, we extend the model to Shared Core ADA-Tucker (SCADA-Tucker) by defining a single core tensor shared by all layers. Our methods require no overhead for recording the indices of non-zero elements. Without loss of accuracy, they reduce the storage of LeNet-5 and LeNet-300 by factors of 691× and 233×, respectively, significantly outperforming the state of the art. We also evaluate the effectiveness of our methods on three other benchmarks (CIFAR-10, SVHN, ILSVRC12) and on modern deep networks (ResNet, Wide-ResNet).
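As a rough illustration of the idea only (not the authors' implementation), the sketch below shows a convolutional layer whose 4-D weight tensor is reconstructed on the fly from a small learnable core tensor and one learnable transformation (factor) matrix per mode, the basic structure that Tucker-style compression methods such as ADA-Tucker build on. Only the core and the four factor matrices are stored, so no indices of non-zero elements are needed. All names here (TuckerConv2d, core_ranks, the initialization scale) are hypothetical assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TuckerConv2d(nn.Module):
    """Hypothetical sketch: a conv layer whose weight is parameterized by a
    learnable Tucker core tensor and per-mode transformation matrices."""

    def __init__(self, in_ch, out_ch, k, core_ranks):
        super().__init__()
        r_out, r_in, r_h, r_w = core_ranks
        # Learnable core tensor (much smaller than the full weight tensor).
        self.core = nn.Parameter(0.02 * torch.randn(r_out, r_in, r_h, r_w))
        # One learnable transformation matrix per mode of the weight tensor.
        self.U_out = nn.Parameter(0.02 * torch.randn(out_ch, r_out))
        self.U_in = nn.Parameter(0.02 * torch.randn(in_ch, r_in))
        self.U_h = nn.Parameter(0.02 * torch.randn(k, r_h))
        self.U_w = nn.Parameter(0.02 * torch.randn(k, r_w))
        self.pad = k // 2

    def forward(self, x):
        # Reconstruct the full weight by mode-n products:
        # W = core ×1 U_out ×2 U_in ×3 U_h ×4 U_w
        w = torch.einsum('abcd,oa,ib,hc,wd->oihw',
                         self.core, self.U_out, self.U_in, self.U_h, self.U_w)
        return F.conv2d(x, w, padding=self.pad)

In a shared-core variant analogous to SCADA-Tucker, the same core parameter would be reused by every layer, with only the per-layer transformation matrices differing; the compression ratio then depends mainly on the chosen core ranks relative to the original layer dimensions.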