Duan Lijuan, Zhang Zichen, Liu Zhaoying, Xiao Fengjin
College of Computer Science, Beijing University of Technology, Beijing, 100124, China; Chongqing Research Institute, Beijing University of Technology, Beijing, 100124, China; Beijing Key Laboratory of Trusted Computing, Beijing University of Technology, Beijing, 100124, China.
College of Computer Science, Beijing University of Technology, Beijing, 100124, China; Beijing Key Laboratory of Trusted Computing, Beijing University of Technology, Beijing, 100124, China; National Engineering Laboratory for Critical Technologies of Information Security Classified Protection, Beijing University of Technology, Beijing, 100124, China.
Neural Netw. 2025 Nov;191:107830. doi: 10.1016/j.neunet.2025.107830. Epub 2025 Jul 6.
Object detection in remote sensing images (RSIs) is facilitated by oriented bounding boxes, yet rotated boxes (RBoxes) are typically more labor-intensive to annotate than horizontal boxes (HBoxes). Consequently, most research explores HBox-based weakly-supervised detectors with self-supervised constraints on spatial transformations. However, such weakly-supervised networks for HBoxes tend to focus on the most discriminative parts of objects, which can degrade the network's localization accuracy. Moreover, spatial transformations introduce an ambiguity between RBoxes and HBoxes in the regression loss, impairing the network's ability to accurately distinguish closely situated objects at the same angle. To overcome these challenges, we propose a weakly-supervised detector named the knowledge-based dropblock and unified regression network (KDUNet). This network aims to learn high-quality feature information and compensate for the disparity between HBoxes and RBoxes. First, we use long-distance background information with diverse channel input to intentionally conceal the most distinguishable parts, thus emphasizing the entire object. Second, we develop a clear bounding box distance measure that unifies RBoxes and HBoxes through a circumscribed rectangle with a transformation angle to assess their Gaussian distance. Extensive experiments demonstrate that KDUNet is capable of learning high-quality feature information and reducing the impact of ambiguity. Experimental results on the DIOR and HRSC datasets confirm that our network surpasses six fully-supervised networks, achieving 57.8% and 90.1% mean Average Precision (mAP), respectively.
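The abstract does not give the exact form of KDUNet's unified Gaussian distance, but the underlying idea of representing both box types in a single Gaussian parameterization can be illustrated. The sketch below, a common formulation in oriented detection rather than the paper's own implementation, models a box (center, size, angle) as a 2D Gaussian whose mean is the box center and whose covariance encodes size and orientation; an HBox is simply the angle-zero case, so the same distance applies to both. The 2-Wasserstein distance is one standard choice of Gaussian distance and is used here purely for illustration.

```python
import numpy as np

def box_to_gaussian(cx, cy, w, h, angle):
    """Map a box (center cx,cy; size w,h; angle in radians) to a 2D Gaussian.

    Mean = box center; covariance = rotated, size-scaled diagonal matrix.
    An HBox is the special case angle = 0, so RBoxes and HBoxes share
    one representation and one distance measure.
    """
    R = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])
    S = np.diag([w / 2.0, h / 2.0])          # half-extents as std devs
    sigma = R @ S @ S @ R.T                  # covariance = R S^2 R^T
    return np.array([cx, cy]), sigma

def _sqrtm_psd(a):
    """Matrix square root of a symmetric PSD 2x2 matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def gaussian_wasserstein2(mu1, s1, mu2, s2):
    """Squared 2-Wasserstein distance between two 2D Gaussians."""
    s2_sqrt = _sqrtm_psd(s2)
    cross = _sqrtm_psd(s2_sqrt @ s1 @ s2_sqrt)
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(s1 + s2 - 2.0 * cross))
```

With identical boxes the distance is zero; shifting a box by its offset distance d while keeping size and angle fixed yields exactly d squared, since the covariance term vanishes.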