Department of Computer Science, University of Southern California, 3650 McClintock Avenue, Los Angeles, 90089, CA, USA.
Neural Netw. 2024 Dec;180:106699. doi: 10.1016/j.neunet.2024.106699. Epub 2024 Sep 3.
Despite the significant success of deep learning in object detection tasks, the standard training of deep neural networks requires access to a substantial quantity of annotated images across all classes. Data annotation is an arduous and time-consuming endeavor, particularly when dealing with infrequent objects. Few-shot object detection (FSOD) methods have emerged as a solution to the limitations of classic object detection approaches based on deep learning. FSOD methods demonstrate remarkable performance by achieving robust object detection using a significantly smaller amount of training data. A challenge for FSOD is that instances of novel classes, which do not belong to the fixed set of training classes, appear in the background, and the base model may pick them up as potential objects. These objects behave similarly to label noise because they are classified as one of the training dataset classes, leading to FSOD performance degradation. We develop a semi-supervised algorithm to detect and then utilize these unlabeled novel objects as positive samples during the FSOD training stage to improve FSOD performance. Specifically, we develop a hierarchical ternary classification region proposal network (HTRPN) to localize the potential unlabeled novel objects and assign them new objectness labels to distinguish these objects from the base training dataset classes. Our improved hierarchical sampling strategy for the region proposal network (RPN) also boosts the perception ability of the object detection model for large objects. We test our approach on the COCO and PASCAL VOC benchmarks that are commonly used in the FSOD literature. Our experimental results indicate that our method is effective and outperforms the existing state-of-the-art (SOTA) FSOD methods. Our implementation is provided as a supplement to support reproducibility of the results: https://github.com/zshanggu/HTRPN.
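The ternary objectness idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only shows one plausible way to assign three objectness labels to anchors. The function name `ternary_labels`, the label constants, and the thresholds `iou_pos`, `iou_neg`, and `score_novel` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Hypothetical label ids; the abstract does not specify an encoding.
BACKGROUND, BASE_OBJECT, POTENTIAL_NOVEL = 0, 1, 2

Box = Sequence[float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ternary_labels(
    anchors: List[Box],
    gt_boxes: List[Box],
    objectness: List[float],
    iou_pos: float = 0.7,      # assumed threshold for matching a labeled box
    iou_neg: float = 0.3,      # assumed threshold for "far from any labeled box"
    score_novel: float = 0.9,  # assumed objectness score for a likely object
) -> List[int]:
    """Assign one of three objectness labels to each anchor.

    Anchors that overlap an annotated base-class box are base objects.
    Anchors far from every annotation but scored as highly object-like are
    treated as potential unlabeled novel objects instead of background.
    """
    labels = []
    for anchor, score in zip(anchors, objectness):
        best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
        if best >= iou_pos:
            labels.append(BASE_OBJECT)
        elif best < iou_neg and score >= score_novel:
            labels.append(POTENTIAL_NOVEL)
        else:
            labels.append(BACKGROUND)
    return labels
```

Under these assumptions, an anchor that coincides with an annotated box is labeled as a base object, a confident anchor far from any annotation is labeled as a potential novel object, and a low-scoring anchor stays background. The point of the third label is that such regions are no longer pushed toward the background class during training.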