
Reconsidering learnable fine-grained text prompts for few-shot anomaly detection in visual-language models.

Authors

Han Delong, Xu Luo, Zhou Mingle, Wan Jin, Li Min, Li Gang

Affiliations

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, 250014, China.

Publication

Neural Netw. 2025 Feb;182:106906. doi: 10.1016/j.neunet.2024.106906. Epub 2024 Nov 18.

Abstract

Few-Shot Anomaly Detection (FSAD) in industrial images aims to identify abnormalities using only a few normal images, which is crucial for industrial scenarios where training samples are limited. Recent advances in large-scale pre-trained visual-language models have brought significant improvements to FSAD, which typically requires hundreds of text prompts to be manually crafted through prompt engineering. However, manually designed text prompts cannot accurately match the informative features of different categories across diverse images, and the domain gap between training and test datasets can severely impact the generalization capability of text prompts. To address these issues, we propose a visual-language model based on fine-grained learnable text prompts as a unified general framework for industrial FSAD. First, we design a Fine-grained Text Prompts Adapter (FTPA) and an associated registration loss to enhance the efficiency of text prompts. The manually designed text prompts are improved and optimized by capturing normal and abnormal semantic information in the image, so that the text prompts can describe image semantics at a finer granularity. In addition, we introduce a Dynamic Modulation Mechanism (DMM) to avoid potential errors in the trained text prompts caused by category-agnostic conditions during cross-dataset detection. This is achieved by explicitly modulating the branch guided by few-shot images and the branch guided by fine-grained text prompts. Extensive experiments demonstrate that our method achieves state-of-the-art few-shot industrial anomaly detection and segmentation performance. In the 4-shot setting, the AUROC for anomaly classification and anomaly segmentation reaches 98.3% and 96.3% on MVTec-AD, and 93.8% and 97.9% on VisA, respectively.
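The two-branch scoring the abstract describes — a branch guided by fine-grained text prompts and a branch guided by few-shot normal images, combined by explicit modulation — can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function names, the CLIP-style cosine-similarity scoring, the memory-bank nearest-neighbor image branch, and the fixed modulation weight `alpha` are all assumptions standing in for the paper's FTPA and DMM modules (in the paper, the prompts are learned and the modulation is dynamic).

```python
import numpy as np

def _normalize(v):
    """L2-normalize along the last axis (cosine-similarity prep)."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def _softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_guided_scores(patch_feats, normal_emb, abnormal_emb, tau=0.07):
    """Per-patch anomaly probability from cosine similarity to
    normal vs. abnormal text-prompt embeddings (CLIP-style)."""
    p = _normalize(patch_feats)                      # (N, D) patch features
    t = _normalize(np.stack([normal_emb, abnormal_emb]))  # (2, D)
    logits = p @ t.T / tau                           # (N, 2) similarities
    return _softmax(logits, axis=-1)[:, 1]           # P(abnormal) per patch

def image_guided_scores(patch_feats, memory_feats):
    """Per-patch anomaly score as dissimilarity to the nearest
    few-shot normal patch (memory-bank style)."""
    sim = _normalize(patch_feats) @ _normalize(memory_feats).T  # (N, M)
    return 1.0 - sim.max(axis=1)                     # low similarity -> anomalous

def modulated_score(text_scores, image_scores, alpha=0.5):
    """Explicitly modulate the two branches; `alpha` is fixed here,
    whereas the paper's DMM would set it dynamically."""
    return alpha * text_scores + (1.0 - alpha) * image_scores
```

Reshaping the per-patch scores back to the feature-map grid would give an anomaly segmentation map, and taking their maximum would give an image-level classification score.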

