Lee Donghyun, Wang Dingheng, Yang Yukuan, Deng Lei, Zhao Guangshe, Li Guoqi
Department of Precision Instrumentation, Center for Brain Inspired Computing Research and Beijing Innovation Center for Future Chip, Tsinghua University, Beijing 100084, China.
School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China.
Neural Netw. 2021 Sep;141:420-432. doi: 10.1016/j.neunet.2021.05.034. Epub 2021 Jun 5.
Relying on the rapidly increasing capacity of computing clusters and hardware, convolutional neural networks (CNNs) have been successfully applied in various fields and achieved state-of-the-art results. Despite these exciting developments, training and inference for large-scale CNN models still incur a huge memory cost, which makes them hard to deploy widely on resource-limited portable devices. To address this problem, we establish a training framework for three-dimensional convolutional neural networks (3DCNNs), named QTTNet, that combines tensor train (TT) decomposition and data quantization to further shrink the model size and decrease the memory and time cost. Through this framework, we can fully exploit the strength of TT in reducing the number of trainable parameters and the advantage of quantization in decreasing the bit-width of data, compressing 3DCNN models greatly with little accuracy degradation. In addition, because all parameters involved in inference, including TT-cores, activations, and batch normalization parameters, are quantized to low bit-widths, the proposed method naturally saves memory and time. Experimental results on compressing 3DCNNs for 3D object and video recognition on the ModelNet40, UCF11, and UCF50 datasets verify the effectiveness of the proposed method. The best compression ratio we obtained is nearly 180× with performance competitive with other state-of-the-art studies. Moreover, the total size in bytes of our QTTNet models on the ModelNet40 and UCF11 datasets can be 1000× smaller than that of some typical approaches such as MVCNN.
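To make the two compression ingredients concrete, the sketch below illustrates them in isolation: a TT-SVD factorization that rewrites a dense weight tensor as a chain of small 3-way TT-cores (fewer trainable parameters), and a symmetric low-bit quantizer applied to a core (smaller bit-width per parameter). This is a minimal NumPy illustration of the general techniques, not the paper's QTTNet training pipeline; the function names and the choice of 8-bit symmetric quantization are assumptions for the example.

```python
import numpy as np

def tt_decompose(tensor, max_rank):
    """TT-SVD sketch: factor a d-way tensor into d three-way TT-cores
    of shape (r_{k-1}, n_k, r_k), truncating each SVD at max_rank."""
    shape = tensor.shape
    d = len(shape)
    cores = []
    rank = 1
    mat = tensor.reshape(rank * shape[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r_new = min(max_rank, len(S))
        cores.append(U[:, :r_new].reshape(rank, shape[k], r_new))
        # Carry the remainder forward and unfold for the next mode.
        mat = (np.diag(S[:r_new]) @ Vt[:r_new]).reshape(r_new * shape[k + 1], -1)
        rank = r_new
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT-cores back into a dense tensor (for checking error)."""
    out = cores[0]                                   # (1, n_0, r_1)
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=(-1, 0))  # contract shared rank
    return out.reshape(out.shape[1:-1])              # drop boundary ranks of 1

def quantize(x, bits=8):
    """Symmetric uniform quantizer: map floats to signed integers plus a scale."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    q = np.round(x / scale).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)

# With a generous rank budget the TT factorization is exact.
T = rng.standard_normal((4, 4, 4))
cores = tt_decompose(T, max_rank=16)
print(np.allclose(T, tt_reconstruct(cores)))  # True

# With a tight rank budget the cores hold far fewer parameters.
T2 = rng.standard_normal((8, 8, 8, 8))
cores2 = tt_decompose(T2, max_rank=4)
print(sum(c.size for c in cores2), "vs", T2.size)  # 320 vs 4096

# Quantizing a core bounds the per-element error by half the scale.
q, s = quantize(cores[0])
print(np.max(np.abs(q.astype(np.float64) * s - cores[0])) <= s / 2)  # True
```

Here the parameter count drops from n^d for the dense tensor to roughly d·n·r² for the cores, and storing each core at 8 bits instead of 32 gives a further 4× saving, which is the multiplicative effect the abstract describes.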