School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
Department of Precision Instrumentation, Center for Brain Inspired Computing Research and Beijing Innovation Center for Future Chip, Tsinghua University, Beijing 100084, China.
Neural Netw. 2020 Nov;131:215-230. doi: 10.1016/j.neunet.2020.07.028. Epub 2020 Aug 7.
Three-dimensional convolutional neural networks (3DCNNs) have been applied to many tasks, e.g., video and 3D point cloud recognition. However, due to the higher dimension of their convolutional kernels, the space complexity of 3DCNNs is generally larger than that of traditional two-dimensional convolutional neural networks (2DCNNs). To miniaturize 3DCNNs for deployment in constrained environments such as embedded devices, neural network compression is a promising approach. In this work, we adopt tensor train (TT) decomposition, a simple and straightforward compression method that supports in situ training, to shrink 3DCNN models. By tensorizing 3D convolutional kernels in TT format, we investigate how to select appropriate TT ranks to achieve higher compression ratios. We also discuss the redundancy of 3D convolutional kernels with respect to compression, the core significance and future directions of this work, and the theoretical computational complexity versus the practical execution time of convolution in TT format. Based on multiple comparative experiments on the VIVA challenge, UCF11, UCF101, and ModelNet40 datasets, we conclude that TT decomposition can compress 3DCNNs by around one hundred times without significant accuracy loss, which will enable their application in a wide range of real-world scenarios.
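To make the idea concrete: a minimal sketch of how a 3D convolutional kernel can be put into TT format. The paper trains the TT cores in situ; for illustration, this sketch instead applies the classic TT-SVD factorization (Oseledets, 2011) to an already-formed kernel tensor. All shapes and rank values here are hypothetical and not taken from the paper.

```python
# Sketch only, not the authors' code: TT-SVD factorization of a 3D conv
# kernel of shape (C_out, C_in, D, H, W), with illustrative TT ranks.
import numpy as np

def tt_svd(tensor, max_ranks):
    """Decompose `tensor` into TT cores via successive truncated SVDs.

    max_ranks: upper bounds on the internal TT ranks r_1..r_{d-1};
    the boundary ranks are fixed to 1. Core k has shape (r_k, n_k, r_{k+1}).
    """
    dims = tensor.shape
    cores = []
    r_prev = 1
    residual = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(residual, full_matrices=False)
        r = min(max_ranks[k], len(S))          # truncate to the chosen TT rank
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        residual = (S[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(residual.reshape(r_prev, dims[-1], 1))
    return cores

# Hypothetical kernel: 64 output channels, 32 input channels, 3x3x3 window.
kernel = np.random.randn(64, 32, 3, 3, 3)
ranks = [16, 8, 8, 3]                          # illustrative ranks, not tuned
cores = tt_svd(kernel, ranks)

# Compression ratio: full parameter count vs. total size of the TT cores.
ratio = kernel.size / sum(c.size for c in cores)
print(f"compression ratio ~ {ratio:.1f}x")

# Rebuild the kernel from the cores to check the approximation error.
approx = cores[0]
for core in cores[1:]:
    approx = np.tensordot(approx, core, axes=(-1, 0))
approx = approx.squeeze()
print("relative error:", np.linalg.norm(approx - kernel) / np.linalg.norm(kernel))
```

As the paper's rank-selection study suggests, the trade-off is visible directly in the arithmetic: smaller TT ranks shrink the cores and raise the compression ratio, but truncate more singular values and thus lose more of the kernel's information.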