


Recalling Unknowns Without Losing Precision: An Effective Solution to Large Model-Guided Open World Object Detection.

Authors

He Yulin, Chen Wei, Wang Siqi, Liu Tianrui, Wang Meng

Publication

IEEE Trans Image Process. 2025;34:729-742. doi: 10.1109/TIP.2024.3459589. Epub 2025 Jan 28.

DOI: 10.1109/TIP.2024.3459589
PMID: 39292592
Abstract

Open World Object Detection (OWOD) aims to adapt object detection to an open-world environment, so as to detect unknown objects and learn knowledge incrementally. Existing OWOD methods typically leverage training sets with a relatively small number of known objects. Due to the absence of generic object knowledge, they fail to comprehensively perceive objects beyond the scope of training sets. Recent advancements in large vision models (LVMs), trained on extensive large-scale data, offer a promising opportunity to harness rich generic knowledge for the fundamental advancement of OWOD. Motivated by Segment Anything Model (SAM), a prominent LVM lauded for its exceptional ability to segment generic objects, we first demonstrate the possibility to employ SAM for OWOD and establish the very first SAM-Guided OWOD baseline solution. Subsequently, we identify and address two fundamental challenges in SAM-Guided OWOD and propose a pioneering SAM-Guided Robust Open-world Detector (SGROD) method, which can significantly improve the recall of unknown objects without losing the precision on known objects. Specifically, the two challenges in SAM-Guided OWOD include: 1) Noisy labels caused by the class-agnostic nature of SAM; 2) Precision degradation on known objects when more unknown objects are recalled. For the first problem, we propose a dynamic label assignment (DLA) method that adaptively selects confident labels from SAM during training, evidently reducing the noise impact. For the second problem, we introduce cross-layer learning (CLL) and SAM-based negative sampling (SNS), which enable SGROD to avoid precision loss by learning robust decision boundaries of objectness and classification. Experiments on public datasets show that SGROD not only improves the recall of unknown objects by a large margin (~20%), but also preserves highly-competitive precision on known objects. The program codes are available at https://github.com/harrylin-hyl/SGROD.

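The dynamic label assignment (DLA) described in the abstract can be illustrated with a minimal sketch: pseudo-labels from a class-agnostic proposal source such as SAM are kept only when their confidence clears a cutoff that adapts during training. The function name, score representation, and EMA-style threshold update below are illustrative assumptions, not the authors' actual implementation (see the linked repository for that).

```python
# Hypothetical sketch of confidence-based dynamic label assignment.
# Pseudo-labels are (box, score) pairs from a class-agnostic proposal
# generator (e.g. SAM); only confident ones are used for training, and
# the cutoff drifts toward the mean confidence of accepted labels.
# The EMA update rule is an assumption, not the paper's exact method.

def dynamic_label_assignment(pseudo_labels, threshold, momentum=0.9):
    """Select confident pseudo-labels and adapt the threshold.

    pseudo_labels: list of (box, score) pairs.
    threshold: current confidence cutoff in [0, 1].
    Returns (selected_labels, updated_threshold).
    """
    selected = [(box, s) for box, s in pseudo_labels if s >= threshold]
    if selected:
        mean_score = sum(s for _, s in selected) / len(selected)
        # Exponential moving average: nudge the cutoff toward the
        # confidence level of the labels actually accepted this step.
        threshold = momentum * threshold + (1 - momentum) * mean_score
    return selected, threshold
```

Filtering this way reduces the impact of noisy class-agnostic proposals: low-confidence SAM masks (often background fragments) never enter the training signal, while the adaptive cutoff tightens as the accepted labels become more confident.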

Similar Articles

1. Recalling Unknowns Without Losing Precision: An Effective Solution to Large Model-Guided Open World Object Detection.
   IEEE Trans Image Process. 2025;34:729-742. doi: 10.1109/TIP.2024.3459589. Epub 2025 Jan 28.
2. Unsupervised Recognition of Unknown Objects for Open-World Object Detection.
   IEEE Trans Neural Netw Learn Syst. 2025 Jun;36(6):11340-11354. doi: 10.1109/TNNLS.2025.3559940.
3. OW-Adapter: Human-Assisted Open-World Object Detection with a Few Examples.
   IEEE Trans Vis Comput Graph. 2024 Jan;30(1):694-704. doi: 10.1109/TVCG.2023.3326577. Epub 2023 Dec 25.
4. A segment anything model-guided and match-based semi-supervised segmentation framework for medical imaging.
   Med Phys. 2025 Mar 29. doi: 10.1002/mp.17785.
5. End-to-End Open-Vocabulary Video Visual Relationship Detection Using Multi-Modal Prompting.
   IEEE Trans Pattern Anal Mach Intell. 2025 Aug;47(8):6599-6615. doi: 10.1109/TPAMI.2025.3561366.
6. Segment anything model for medical images?
   Med Image Anal. 2024 Feb;92:103061. doi: 10.1016/j.media.2023.103061. Epub 2023 Dec 7.
7. Detecting Every Object From Events.
   IEEE Trans Pattern Anal Mach Intell. 2025 Aug;47(8):7171-7178. doi: 10.1109/TPAMI.2025.3565102.
8. Improved region proposal network for enhanced few-shot object detection.
   Neural Netw. 2024 Dec;180:106699. doi: 10.1016/j.neunet.2024.106699. Epub 2024 Sep 3.
9. Segment Anything Model Is a Good Teacher for Local Feature Learning.
   IEEE Trans Image Process. 2025;34:2097-2111. doi: 10.1109/TIP.2025.3554033. Epub 2025 Apr 4.
10. A New Deep Learning-based Dynamic Paradigm Towards Open-World Plant Disease Detection.
    Front Plant Sci. 2023 Oct 2;14:1243822. doi: 10.3389/fpls.2023.1243822. eCollection 2023.