Yang Guo-Ye, Li Xiang-Li, Xiao Zi-Kai, Mu Tai-Jiang, Martin Ralph R, Hu Shi-Min
IEEE Trans Image Process. 2023;32:6413-6425. doi: 10.1109/TIP.2023.3327586. Epub 2023 Nov 28.
Objects in aerial images vary more in scale and orientation than objects in other kinds of images, which makes them harder to detect with vanilla deep convolutional neural networks. Networks with sampling equivariance adapt how they sample from input feature maps to the object's transformation, so that a convolutional kernel can extract effective object features under different transformations. However, methods such as deformable convolutional networks provide sampling equivariance only under certain circumstances, because they sample by location. We propose sampling equivariant self-attention networks, which treat self-attention restricted to a local image patch as convolution sampling by masks instead of locations, together with a transformation embedding module that further improves equivariant sampling. We also propose a novel randomized normalization module to enhance network generalization and a quantitative evaluation metric to fairly compare the sampling equivariance of different models. Experiments show that our model provides significantly better sampling equivariance than existing methods without additional supervision and can thus extract more effective image features. Our model achieves state-of-the-art results on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets without additional computations or parameters.
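The abstract's central idea, self-attention restricted to a local patch acting as convolution sampling by masks rather than by locations, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `local_self_attention`, the projection matrices `wq`, `wk`, `wv`, the window size `k`, and the zero padding are all assumptions chosen for illustration, and the transformation embedding and randomized normalization modules are omitted. Each pixel's softmax attention over its k x k neighbourhood is stored as a soft sampling mask, and the output is the mask-weighted sum of the neighbouring values.

```python
import numpy as np


def local_self_attention(x, wq, wk, wv, k=3):
    """Toy local self-attention (hypothetical helper, not the paper's code).

    x          : (H, W, C) feature map
    wq, wk, wv : (C, D) query/key/value projections
    k          : odd window size
    Returns the (H, W, D) output and the (H, W, k, k) sampling masks.
    """
    H, W, _ = x.shape
    r = k // 2
    q, key, v = x @ wq, x @ wk, x @ wv

    # Zero-pad keys and values so border pixels also see a full k x k window.
    pad = ((r, r), (r, r), (0, 0))
    key_p, v_p = np.pad(key, pad), np.pad(v, pad)

    out = np.zeros_like(v)
    masks = np.zeros((H, W, k, k))
    for i in range(H):
        for j in range(W):
            key_win = key_p[i:i + k, j:j + k]   # (k, k, D) neighbouring keys
            val_win = v_p[i:i + k, j:j + k]     # (k, k, D) neighbouring values
            logits = key_win @ q[i, j] / np.sqrt(q.shape[-1])
            mask = np.exp(logits - logits.max())
            mask /= mask.sum()                  # softmax over the window
            masks[i, j] = mask
            # Mask-weighted sum: sampling by content-dependent mask,
            # not by a fixed grid of kernel locations.
            out[i, j] = np.tensordot(mask, val_win, axes=([0, 1], [0, 1]))
    return out, masks


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6, 4))
wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))

out, masks = local_self_attention(x, wq, wk, wv)
out_rot, _ = local_self_attention(np.rot90(x), wq, wk, wv)

# Because the mask depends only on content, rotating the input by
# 90 degrees rotates the output in the same way in this toy setting.
equivariant = np.allclose(out_rot, np.rot90(out))
```

In this toy setting the masks carry no positional term, so the 90-degree rotation check passes by construction. It says nothing about arbitrary rotations or scale changes, and it is not the quantitative sampling equivariance metric the abstract proposes.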