Xu Xinyu, Liu Huazhen, Zhang Tao, Xiong Huilin, Yu Wenxian
IEEE Trans Image Process. 2025;34:2781-2795. doi: 10.1109/TIP.2025.3558425. Epub 2025 May 12.
Semantic segmentation is an important branch of image processing and computer vision. With the rise of deep learning, various convolutional neural networks (CNNs) have been proposed for pixel-level classification and segmentation tasks. In practical scenarios, however, imaging angles are often arbitrary, as in remote sensing images of water bodies and medical images of capillaries and polyps, where prior orientation information is typically unavailable to guide these networks toward more effective features. Learning features from objects with diverse orientations therefore poses a significant challenge, because most CNN-based semantic segmentation networks lack rotation equivariance and are thus sensitive to changes in orientation. To address this challenge, this paper first constructs a universal convolution-group framework that makes fuller use of orientation information and equips the network with rotation equivariance. We then mathematically design a padding-based rotation equivariant convolution mode (PreCM), which is applicable to multi-scale images and convolutional kernels and can also serve as a replacement component for various types of convolutions, such as dilated, transposed, and asymmetric convolutions. To quantitatively assess the impact of image rotation in semantic segmentation tasks, we also propose a new evaluation metric, Rotation Difference (RD). Replacement experiments with six existing semantic segmentation networks on three datasets (Satellite Images of Water Bodies, DRIVE, and Floodnet) show that, under random-angle rotation, the average Intersection over Union (IoU) of the PreCM-based versions improves by 6.91%, 10.63%, 4.53%, 5.93%, 7.48%, and 8.33%, respectively, compared with the original versions, and the average RD values decrease by 3.58%, 4.56%, 3.47%, 3.66%, 3.47%, and 3.43%, respectively.
The code can be downloaded from https://github.com/XinyuXu414.
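To make the notion of rotation equivariance in the abstract concrete, the following is a minimal NumPy sketch of a generic "lifting" convolution over the four 90-degree rotations. It is not PreCM and is not taken from the authors' repository; the function names `correlate2d_valid` and `c4_lifting_conv` are hypothetical, and the sketch only covers rotations by multiples of 90 degrees on a single-channel image without padding. It illustrates the property itself: rotating the input rotates each output map and cyclically shifts the orientation channels, whereas a single ordinary convolution has no such guarantee.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation (the operation CNN layers call convolution)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            # Accumulate the contribution of kernel entry (i, j) over all output positions.
            out += kernel[i, j] * image[i:i + oh, j:j + ow]
    return out


def c4_lifting_conv(image, kernel):
    """Correlate with the kernel rotated by 0, 90, 180 and 270 degrees.

    Returns an array of shape (4, H', W'): one feature map per orientation.
    Hypothetical illustration of a convolution group, not the paper's PreCM.
    """
    return np.stack(
        [correlate2d_valid(image, np.rot90(kernel, k)) for k in range(4)]
    )


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

# Group convolution: output for the rotated input vs. transformed original output.
features = c4_lifting_conv(image, kernel)
features_rot = c4_lifting_conv(np.rot90(image), kernel)
# Expected transformation: rotate every map spatially, then shift orientation channels by one.
expected = np.roll(np.rot90(features, axes=(1, 2)), 1, axis=0)
group_is_equivariant = np.allclose(features_rot, expected)

# Ordinary convolution with the same (asymmetric) kernel, for comparison.
plain = correlate2d_valid(image, kernel)
plain_rot = correlate2d_valid(np.rot90(image), kernel)
plain_is_equivariant = np.allclose(plain_rot, np.rot90(plain))

print("group conv equivariant:", group_is_equivariant)
print("plain conv equivariant:", plain_is_equivariant)
```

The group check holds exactly (up to floating-point tolerance), while the plain check fails for a generic asymmetric kernel. Handling arbitrary rotation angles, padding, and the dilated or transposed variants mentioned in the abstract requires the construction described in the paper itself.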