Guan Li, Zhang Haitao, Zhou Yijun, Du Xinyu, Li Mingxuan
Department of Smart Manufacturing, Industrial Perception and Intelligent Manufacturing Equipment Engineering Research Center of Jiangsu Province, Nanjing Vocational University of Industry Technology, Nanjing, Jiangsu, China.
Department of Data Analysis, Nanjing Weiwo Software Technology Co., Ltd., Nanjing, Jiangsu, China.
PLoS One. 2025 Sep 10;20(9):e0331025. doi: 10.1371/journal.pone.0331025. eCollection 2025.
Metal surface defect detection is an important yet challenging task in quality control. Although YOLO models perform well in most object detection scenarios, metal surface images captured under operational conditions often contain both high-frequency noise and background textures affected by spectral aliasing, while the defects themselves are typically small, low in contrast, and of several classes at once. These properties pose challenges for automatic defect detection systems. To address this, we introduce wavelet decomposition, cross-attention, and U-shaped dilated convolution into the YOLO framework and propose the YOLOv11-WBD model, which aims to enhance feature representation and semantic mining. To improve robustness, we design a plug-and-play Wavelet-Attentive Multiband Fusion Module (WAMF) that decouples low-frequency and high-frequency features and fuses them adaptively across frequency bands. To aggregate multi-scale features effectively, we design a Bottleneck-Enhanced Dilated U-Conv Module (BEDU) that fuses global and local information at lower computational cost. To make feature fusion adaptive, we design a Bidirectional Depthwise Cross-Attention Module (BDCA) that replaces simple concatenation and convolution operations. YOLOv11-WBD is rigorously evaluated on the public NEU-DET and GC10-DET datasets; the improved model achieves gains on both, with mAP@0.5 increasing by 5.8% and 2.8%, respectively. The improved model also demonstrates stronger noise tolerance, maintaining high defect detection capability under moderate noise, and thus offers a valuable solution for industrial applications.
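The abstract does not specify how WAMF performs its wavelet decomposition, so the following is only a minimal sketch of the general idea of splitting an image into one low-frequency band and three high-frequency bands, assuming a single-level 2D Haar transform. The function names `haar_dwt2` and `haar_idwt2` and the band names are hypothetical and chosen for illustration; they are not taken from the paper.

```python
def haar_dwt2(image):
    """Single-level 2D Haar transform of a grayscale image (list of rows).

    Assumes even height and width. Returns four half-size bands:
    approx (low frequency) and row_detail, col_detail, diag_detail
    (high frequency).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block: p q on the top row, s t on the bottom row.
            p, q = image[r][c], image[r][c + 1]
            s, t = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2.0)  # local average (scaled)
            r_row.append((p + q - s - t) / 2.0)  # change between the two rows
            c_row.append((p - q + s - t) / 2.0)  # change between the two columns
            d_row.append((p - q - s + t) / 2.0)  # diagonal change
        approx.append(a_row)
        row_detail.append(r_row)
        col_detail.append(c_row)
        diag_detail.append(d_row)
    return approx, row_detail, col_detail, diag_detail


def haar_idwt2(approx, row_detail, col_detail, diag_detail):
    """Invert haar_dwt2, rebuilding the full-size image from the four bands."""
    image = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            a, r = approx[i][j], row_detail[i][j]
            c, d = col_detail[i][j], diag_detail[i][j]
            top.extend([(a + r + c + d) / 2.0, (a + r - c - d) / 2.0])
            bottom.extend([(a - r + c - d) / 2.0, (a - r - c + d) / 2.0])
        image.append(top)
        image.append(bottom)
    return image
```

In this sketch, a perfectly flat region produces zero in all three detail bands, while pixel-level noise and fine texture show up mainly in the detail bands. That separation is what makes it possible to treat low-frequency and high-frequency content differently before fusing them again; the attention-based fusion described in the abstract is not modeled here.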