Shi Chenbo, Yan Shaojia, Wang Lei, Zhu Changsheng, Yu Yue, Zang Xiangteng, Liu Aiping, Zhang Chun, Feng Xiaobing
College of Intelligent Equipment, Shandong University of Science and Technology, Taian 271019, China.
Beijing Botsing Technology Co., Ltd., Beijing 100176, China.
Sensors (Basel). 2025 Jul 27;25(15):4656. doi: 10.3390/s25154656.
Real-time quality monitoring based on molten pool images is a key research focus for high-quality, intelligent automated welding. To address interference in molten pool images under complex welding scenarios (e.g., laser spots reflected from spatter being misclassified as porosity defects) and the limited interpretability of deep learning models, this paper proposes a multi-granularity spatiotemporal representation learning algorithm that combines handcrafted and deep learning features. A MobileNetV2 backbone integrated with a Temporal Shift Module (TSM) progressively captures the short-term dynamic features of the molten pool and integrates temporal information across both low-level and high-level features. A multi-granularity attention-based feature aggregation module selects key interference-free frames using cross-frame attention, generates multi-granularity features via grouped pooling, and applies the Convolutional Block Attention Module (CBAM) at each granularity level. These multi-granularity spatiotemporal features are then adaptively fused. In parallel, an independent branch uses Histogram of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT) features to extract long-term spatial structural information from historical edge images, enhancing the model's interpretability. The proposed method achieves an accuracy of 99.187% on a self-constructed dataset. It also attains an inference time of 20.983 ms per sample on a hardware platform equipped with an Intel i9-12900H CPU and an RTX 3060 GPU, which is fast enough for real-time use, thus effectively balancing accuracy, speed, and interpretability.
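The temporal shift idea behind the TSM mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the bidirectional shift of the original TSM formulation, where one slice of channels takes its values from the next frame, another slice from the previous frame, and the remaining channels stay in place. The abstract does not state which shift variant, channel fraction, or clip length the authors use, so the function name `temporal_shift`, the `fold_div` parameter, and the tensor sizes below are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np


def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels along the time axis (hypothetical helper).

    x has shape (T, C, H, W): T frames, C channels, spatial size H x W.
    The first C // fold_div channels of frame t are taken from frame t + 1,
    the next C // fold_div channels from frame t - 1, and the rest are
    copied unchanged. Positions with no source frame are left at zero.
    """
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    # Channels [0, fold): each frame receives the values of the next frame.
    out[:-1, :fold] = x[1:, :fold]
    # Channels [fold, 2 * fold): each frame receives the previous frame.
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]
    # Remaining channels are not shifted.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out


# Assumed toy input: 4 frames, 8 channels, 2 x 2 feature maps.
frames = np.arange(4 * 8 * 2 * 2, dtype=np.float32).reshape(4, 8, 2, 2)
shifted = temporal_shift(frames, fold_div=4)
```

Because the shift only moves existing activations between neighboring frames, it adds no parameters or multiply-accumulate operations, which is consistent with pairing it with a lightweight backbone such as MobileNetV2 when inference time matters.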