Hao Tianxiang, Ding Xiaohan, Han Jungong, Guo Yuchen, Ding Guiguang
IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16831-16844. doi: 10.1109/TNNLS.2023.3298263. Epub 2024 Oct 29.
The existence of redundancy in convolutional neural networks (CNNs) enables us to remove some filters/channels with acceptable performance drops. However, the training objective of CNNs usually minimizes an accuracy-related loss function without paying any attention to the redundancy, so the redundancy is distributed randomly across all the filters; removing any of them may therefore cause information loss and an accuracy drop, necessitating a fine-tuning step for recovery. In this article, we propose to manipulate the redundancy during training to facilitate network pruning. To this end, we propose a novel centripetal SGD (C-SGD) that makes some filters identical, producing ideal redundancy patterns: such filters become purely redundant because of their duplicates, hence removing them does not harm the network. As shown on CIFAR and ImageNet, C-SGD delivers better performance than existing methods because the redundancy is better organized. C-SGD is also efficient: it is as fast as regular SGD, requires no fine-tuning, and can be applied simultaneously to all the layers even in very deep CNNs. Besides, C-SGD can improve the accuracy of CNNs by first training a model with the same architecture but wider layers and then squeezing it into the original width.
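The core mechanism can be illustrated with a minimal sketch: filters assigned to the same cluster receive the averaged gradient of the cluster plus a "centripetal" pull toward the cluster centroid, so their values converge and all but one per cluster can be pruned without changing the network's function. This is a hedged illustration, not the paper's exact update rule; the function name `csgd_step` and the hyperparameters `lr` and `centripetal` are assumptions for the sketch.

```python
import numpy as np

def csgd_step(filters, grads, clusters, lr=0.1, centripetal=0.05):
    """One centripetal-SGD-style update (illustrative sketch only).

    filters:  (n, d) array, one row per flattened filter
    grads:    (n, d) gradients of the loss w.r.t. each filter
    clusters: list of index lists; filters sharing a cluster are drawn together
    """
    new = filters.copy()
    for idx in clusters:
        g_avg = grads[idx].mean(axis=0)     # cluster-averaged gradient: identical
                                            # update for every member filter
        center = filters[idx].mean(axis=0)  # cluster centroid
        for j in idx:
            # shared gradient step + pull toward the centroid
            new[j] = filters[j] - lr * g_avg - centripetal * (filters[j] - center)
    return new
```

Because every filter in a cluster gets the same gradient term, the within-cluster differences decay geometrically (by a factor of `1 - centripetal` per step), so the filters become numerically identical over training and the duplicates can be removed losslessly.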