College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310014, PR China; Key Laboratory of Visual Media Intelligent Processing Technology of Zhejiang Province, Hangzhou 310023, PR China.
School of Mathematical and Computer Science, Zhejiang A & F University, Hangzhou 311300, PR China.
Neural Netw. 2024 Nov;179:106568. doi: 10.1016/j.neunet.2024.106568. Epub 2024 Jul 23.
Dilated convolution has been widely used in computer vision tasks because it expands the receptive field while maintaining the resolution of feature maps. However, a critical challenge is the gridding problem caused by the isomorphic structure of dilated convolution: the holes inserted into the dilated kernel destroy the integrity of the extracted information and cut off the relevance of neighboring pixels. In this work, a novel heterogeneous dilated convolution, called HDConv, is proposed to address this issue by setting independent dilation rates on grouped channels while keeping the general convolution operation. The heterogeneous structure can effectively avoid the gridding problem while introducing multi-scale kernels in the filters. Based on the heterogeneous structure of the proposed HDConv, we also explore the benefit of large receptive fields for feature extraction by comparing different combinations of dilation rates. Finally, a series of experiments is conducted to verify the effectiveness of HDConv on computer vision tasks such as image segmentation and object detection. The results show that the proposed HDConv achieves competitive performance on ADE20K, Cityscapes, COCO-Stuff10k, COCO, and the medical image dataset UESTC-COVID-19. The proposed module can readily replace conventional convolutions in existing convolutional neural networks (i.e., it is plug-and-play), and it shows promise for extending dilated convolution to a wider range of image segmentation scenarios.