Jiang Yifan, Chen Jinshui, Lu Jiangang
State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China.
Sensors (Basel). 2025 Apr 11;25(8):2417. doi: 10.3390/s25082417.
In industrial scenarios, image segmentation is essential for accurately identifying defect regions. Recently, the emergence of foundation models, driven by powerful computational resources and large-scale training data, has brought about a paradigm shift in deep learning-based image segmentation. The Segment Anything Model (SAM) has shown exceptional performance across various downstream tasks, owing to its vast semantic knowledge and strong generalization capabilities. However, SAM's feature-distribution gap relative to industrial imagery, its reliance on manually provided prompts, and its lack of category information limit its scalability in industrial settings. To address these issues, we propose PA-SAM, an industrial defect segmentation framework based on SAM. First, to bridge the gap between SAM's pre-training data and the distinct characteristics of industrial defects, we introduce a parameter-efficient fine-tuning (PEFT) technique, named MSPCA-LoRA, that incorporates lightweight Multi-Scale Partial Convolution Aggregation (MSPCA) into Low-Rank Adaptation (LoRA); it effectively enhances the image encoder's sensitivity to prior knowledge biases while maintaining PEFT efficiency. Furthermore, we present the Image-to-Prompt Embedding Generator (IPEG), which uses image embeddings to autonomously create high-quality prompt embeddings for directing mask segmentation, eliminating the limitations of manually provided prompts. Finally, we apply effective refinements to SAM's mask decoder, transforming SAM into an end-to-end semantic segmentation framework. On two real-world defect segmentation datasets, PA-SAM achieves mean Intersection over Union (mIoU) scores of 73.87% and 68.30%, and mean Dice coefficients of 84.90% and 80.22%, outperforming other state-of-the-art algorithms and further demonstrating its robust generalization and application potential.
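The abstract builds MSPCA-LoRA on the standard LoRA update, in which a frozen pretrained weight is augmented by a trainable low-rank product. The following minimal NumPy sketch shows that baseline mechanism only; the dimensions, rank, and scaling are illustrative, and the MSPCA module itself is not specified in the abstract, so it is not modeled here.

```python
import numpy as np

# Hedged sketch of the LoRA update that MSPCA-LoRA extends:
#   y = W x + (alpha / r) * B (A x)
# W is the frozen pretrained weight; only A and B are trained.
# Shapes, rank r, and scaling alpha below are illustrative assumptions.

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    """Frozen path plus scaled low-rank correction."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B initialised to zero, the adapted layer starts out identical
# to the frozen one -- a standard LoRA property.
assert np.allclose(lora_forward(x), W @ x)

# Parameter-efficiency: trainable LoRA params vs. full fine-tuning of W.
print(A.size + B.size, "of", W.size, "params trainable")
```

In this toy configuration only 512 of 4096 parameters (12.5%) are trainable, which is the efficiency property the abstract refers to as "maintaining PEFT efficiency".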
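The reported results are mean Intersection over Union and mean Dice coefficients. For readers unfamiliar with these metrics, a minimal per-class sketch on binary masks (toy data, not the paper's datasets):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred, gt):
    """Dice coefficient: 2|P ∩ G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2 * inter / total if total else 1.0

# Toy 2x3 masks: 2 overlapping pixels, union of 4, 3 pixels each.
pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(iou(pred, gt))   # 2/4 = 0.5
print(dice(pred, gt))  # 2*2/(3+3) ≈ 0.667
```

The paper's mIoU and mean Dice are presumably these scores averaged over defect classes; Dice weights the overlap more generously than IoU, which is why the reported Dice values exceed the mIoU values.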