One-dimensional time-frequency dual-channel visual transformer for bearing fault diagnosis under strong noise and limited data conditions.

Authors

Cai Shaobin, Wang Yuchen, Cai Wanchen, Mo Yuchang, Wei Liansuo

Affiliations

Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110, China.

College of Information Engineering, Huzhou University, Huzhou, 313000, China.

Publication information

Sci Rep. 2025 Jul 20;15(1):26361. doi: 10.1038/s41598-025-12533-2.

DOI:10.1038/s41598-025-12533-2

PMID:40685451

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12277460/

Abstract

In industrial settings, bearing health directly affects equipment stability, making accurate and efficient fault diagnosis critical for operational safety. Recently, Transformer models have been widely adopted in bearing fault diagnosis due to their strong global modeling capabilities. However, they still face significant challenges under strong noise and limited data. To address this, this paper proposes an end-to-end Vision Transformer with time-frequency fusion and dual attention across spatial and channel dimensions. The model adopts a dual-branch design: the time-domain branch incorporates spatial and channel attention to capture both local and global features, while the frequency-domain branch uses FFT to extract spectral information and fuses it with temporal features for efficient multi-scale modeling. To further enhance sensitivity to local patterns and periodic variations, a cross-scale convolution module and a periodic feedforward network are introduced. Experiments on the CWRU and PU datasets demonstrate that the proposed model achieves 99.42% and 98.14% accuracy, respectively, under noisy and data-scarce conditions. The results confirm superior noise robustness and diagnostic performance over recent state-of-the-art methods, highlighting its practical potential for real-world industrial applications.
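As a rough illustration of the dual-branch input described in the abstract, the sketch below corrupts a one-dimensional signal with additive white Gaussian noise at a chosen signal-to-noise ratio and then derives a time-domain channel and an FFT magnitude channel from the same segment. It is not the authors' implementation: the noise model, the -4 dB SNR, the 12 kHz sampling rate, the 2048-sample segment length, the 150 Hz toy tone, and the function names `add_noise_at_snr` and `time_frequency_channels` are all assumptions made for this example.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so the expected SNR equals snr_db (assumed noise model)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_frequency_channels(segment):
    """Return a standardized time-domain channel and a one-sided FFT magnitude channel."""
    centered = segment - segment.mean()
    time_channel = centered / (segment.std() + 1e-8)
    # One-sided magnitude spectrum of the mean-removed segment.
    freq_channel = np.abs(np.fft.rfft(centered)) / len(segment)
    return time_channel, freq_channel


rng = np.random.default_rng(0)
fs = 12_000                            # assumed sampling rate in Hz
t = np.arange(2048) / fs               # assumed segment length
clean = np.sin(2 * np.pi * 150 * t)    # toy 150 Hz tone standing in for a fault signature
noisy = add_noise_at_snr(clean, snr_db=-4, rng=rng)

time_ch, freq_ch = time_frequency_channels(noisy)
freqs = np.fft.rfftfreq(len(noisy), d=1 / fs)
peak_hz = freqs[np.argmax(freq_ch)]
print(time_ch.shape, freq_ch.shape, peak_hz)
```

Even with the noise power above the signal power, the spectral peak stays close to the tone frequency, which gives some intuition for why a frequency-domain branch may help under strong noise. How the two channels are embedded and fused inside the model is specific to the paper and is not reproduced here.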
