Steel Surface Defect Detection Based on a Dual-Backbone Network: MBDNet-Attention-YOLO

Wang Xinyu, Ma Shuhui, Wu Shiting, Li Zhaoye, Cao Jinrong, Xu Peiquan

School of Materials Science and Engineering, Shanghai University of Engineering Science, Shanghai 201620, China.

School of Arts and Sciences, Northeast Agricultural University, Harbin 150030, China.

Sensors (Basel). 2025 Aug 5;25(15):4817. doi: 10.3390/s25154817.

Automated surface defect detection in steel manufacturing is pivotal for ensuring product quality, yet it remains an open challenge owing to the extreme heterogeneity of defect morphologies, which range from hairline cracks and microscopic pores to elongated scratches and shallow dents. Existing approaches, whether classical vision pipelines or recent deep-learning paradigms, struggle to simultaneously satisfy the stringent demands of industrial scenarios: high accuracy on sub-millimeter flaws, insensitivity to texture-rich backgrounds, and real-time throughput on resource-constrained hardware. Although contemporary detectors have narrowed the gap, they still exhibit pronounced sensitivity-robustness trade-offs, particularly in the presence of scale-varying defects and cluttered surfaces. To address these limitations, we introduce MBY (MBDNet-Attention-YOLO), a lightweight yet powerful framework that couples the MBDNet backbone with the YOLO detection head. Specifically, the backbone embeds three novel components: (1) HGStem, a hierarchical stem block that enriches low-level representations while suppressing redundant activations; (2) Dynamic Align Fusion (DAF), an adaptive cross-scale fusion mechanism that dynamically re-weights feature contributions according to defect saliency; and (3) C2f-DWR, a depth-wise residual variant that progressively expands receptive fields without incurring prohibitive computational costs. Building upon this enriched feature hierarchy, the neck employs our proposed MultiSEAM module, a cascaded squeeze-and-excitation attention mechanism operating at multiple granularities, to harmonize fine-grained and semantic cues, thereby amplifying weak defect signals against complex textures. Finally, we integrate the Inner-SIoU loss, which refines the geometric alignment between predicted and ground-truth boxes by jointly optimizing center distance, aspect ratio consistency, and IoU overlap, leading to faster convergence and tighter localization. Extensive experiments on two publicly available steel-defect benchmarks, NEU-DET and PVEL-AD, demonstrate the superiority of MBY. Without bells and whistles, our model achieves 85.8% mAP@0.5 on NEU-DET and 75.9% mAP@0.5 on PVEL-AD, surpassing the best-reported results by significant margins while maintaining real-time inference on an NVIDIA Jetson Xavier. Ablation studies corroborate the complementary roles of each component, underscoring MBY's robustness across defect scales and surface conditions. These results suggest that MBY strikes an appealing balance between accuracy, efficiency, and deployability, offering a pragmatic solution for next-generation industrial quality-control systems.
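
The abstract names the Inner-SIoU loss but gives no formulation. As a rough illustration of the "inner" part only, the sketch below computes a plain IoU and an IoU over auxiliary boxes that are shrunk about their own centers, which is how the Inner-IoU idea is commonly described. The function names, the `(x1, y1, x2, y2)` box layout, and the default `ratio` value are assumptions made for illustration and are not taken from the paper; the center-distance and shape terms of SIoU are deliberately omitted.

```python
def iou_xyxy(a, b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def scale_about_center(box, ratio):
    """Return a box with the same center whose width and height are scaled by ratio."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * ratio / 2, (y2 - y1) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_iou(pred, gt, ratio=0.75):
    """IoU of the auxiliary (scaled) boxes; ratio=0.75 is an assumed value."""
    return iou_xyxy(scale_about_center(pred, ratio), scale_about_center(gt, ratio))


if __name__ == "__main__":
    pred = (0.0, 0.0, 10.0, 10.0)
    gt = (5.0, 0.0, 15.0, 10.0)
    print("IoU:", iou_xyxy(pred, gt))
    print("inner IoU (ratio=0.5):", inner_iou(pred, gt, ratio=0.5))
```

With a ratio below 1, the auxiliary boxes overlap less than the originals for a partially aligned pair, so the inner IoU penalizes the same misalignment more strongly. A full loss would combine this with the remaining SIoU terms, which are not specified here.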
