He Yulin, Chen Wei, Wang Siqi, Liu Tianrui, Wang Meng
IEEE Trans Image Process. 2025;34:729-742. doi: 10.1109/TIP.2024.3459589. Epub 2025 Jan 28.
Open World Object Detection (OWOD) aims to adapt object detection to an open-world environment, so that it can detect unknown objects and incrementally learn new knowledge. Existing OWOD methods typically rely on training sets with a relatively small number of known objects. Lacking generic object knowledge, they fail to comprehensively perceive objects beyond the scope of the training set. Recent advancements in large vision models (LVMs), trained on large-scale data, offer a promising opportunity to harness rich generic knowledge for the fundamental advancement of OWOD. Motivated by the Segment Anything Model (SAM), a prominent LVM lauded for its exceptional ability to segment generic objects, we first demonstrate the feasibility of employing SAM for OWOD and establish the very first SAM-Guided OWOD baseline solution. Subsequently, we identify and address two fundamental challenges in SAM-Guided OWOD and propose a pioneering SAM-Guided Robust Open-world Detector (SGROD) method, which can significantly improve the recall of unknown objects without losing precision on known objects. Specifically, the two challenges in SAM-Guided OWOD are: 1) noisy labels caused by the class-agnostic nature of SAM; 2) precision degradation on known objects as more unknown objects are recalled. For the first problem, we propose a dynamic label assignment (DLA) method that adaptively selects confident labels from SAM during training, noticeably reducing the impact of label noise. For the second problem, we introduce cross-layer learning (CLL) and SAM-based negative sampling (SNS), which enable SGROD to avoid precision loss by learning robust decision boundaries for objectness and classification. Experiments on public datasets show that SGROD not only improves the recall of unknown objects by a large margin (~20%), but also preserves highly competitive precision on known objects. The program codes are available at https://github.com/harrylin-hyl/SGROD.
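To make the DLA idea concrete, below is a minimal, hypothetical sketch of how confident SAM proposals might be selected as unknown-object pseudo-labels during training. It is not the authors' implementation: the function name, the quantile-based threshold, and its linear annealing schedule are all illustrative assumptions, standing in for whatever adaptive selection rule SGROD actually uses.

```python
import numpy as np

def dynamic_label_assignment(proposal_boxes, proposal_scores, step, total_steps,
                             start_quantile=0.9, end_quantile=0.5):
    """Select confident SAM proposals as unknown-object pseudo-labels.

    Hypothetical illustration of adaptive label selection: the confidence
    threshold is taken as a quantile of the current batch's proposal scores
    and relaxed over the course of training, so that early on only the most
    confident (least noisy) proposals are used as labels.
    """
    # Linearly anneal the quantile from strict to relaxed as training proceeds.
    t = step / max(total_steps, 1)
    q = start_quantile + t * (end_quantile - start_quantile)
    threshold = np.quantile(proposal_scores, q)
    keep = proposal_scores >= threshold
    return proposal_boxes[keep], proposal_scores[keep]

# Toy usage: five SAM-style box proposals with confidence scores in [0, 1].
boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 4, 4],
                  [8, 2, 30, 25], [0, 0, 50, 50]], dtype=float)
scores = np.array([0.95, 0.60, 0.30, 0.88, 0.72])
kept_boxes, kept_scores = dynamic_label_assignment(boxes, scores,
                                                   step=0, total_steps=100)
print(kept_boxes, kept_scores)  # early in training, only high-score proposals survive
```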