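Semi-Supervised Fault Diagnosis of Electromechanical Transmission Systems Under Imbalanced Unlabeled Sample Class Information Screening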

Wang Chaoge, Jia Pengpeng, Tian Xinyu, Tang Xiaojing, Hu Xiong, Li Hongkun

School of Logistics Engineering, Shanghai Maritime University, Shanghai 201306, China.

School of Mechanical Engineering, Dalian University of Technology, Dalian 116024, China.

Entropy (Basel). 2025 Feb 6;27(2):175. doi: 10.3390/e27020175.

In the health monitoring of electromechanical transmission systems, the collected state data typically contain only a small amount of labeled data, while the vast majority remain unlabeled. Deep learning-based diagnostic models therefore face scarce labeled data alongside abundant unlabeled data. Traditional semi-supervised deep learning methods based on pseudo-label self-training alleviate the scarcity of labeled data to some extent, but they neglect the reliability of pseudo-label information, the accuracy of feature extraction from unlabeled data, and the imbalance in sample selection. To address these issues, this paper proposes a novel semi-supervised fault diagnosis method under imbalanced unlabeled sample class information screening. First, an information screening mechanism for unlabeled data is established based on active learning. The mechanism discriminates according to the variability of intrinsic feature information in fault samples and screens out the unlabeled samples that lie near decision boundaries and are difficult to separate clearly. Labels are then assigned to the screened samples by combining their maximum membership degree in the classification space of the supervised model with interaction with the active learning expert system. Second, a cost-sensitive function driven by data imbalance is constructed to address class imbalance in unlabeled sample screening; it adaptively adjusts the weights of samples from different classes during training to guide the supervised model. Finally, by dynamically optimizing the supervised model and its ability to extract features from unlabeled samples, the diagnostic model's ability to recognize unlabeled samples is significantly enhanced. Validation on two datasets covering a total of 12 experimental scenarios shows that, when only a small amount of labeled data is available, the proposed method improves diagnostic accuracy by more than 10% over existing typical methods, demonstrating its effectiveness and its advantage in practical applications.
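
The two ideas in the abstract, screening unlabeled samples near the decision boundary and weighting classes to counter imbalance, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the screening criterion or the form of the cost-sensitive function, so margin sampling (the gap between the two largest predicted class probabilities) and inverse-frequency class weights are used here as common stand-ins. All function names are hypothetical, and the expert labeling step is not modeled.

```python
import numpy as np


def margin_scores(probs):
    """Gap between the two largest class probabilities of each sample.

    A small gap means the classifier is undecided between two classes,
    which is used here as a proxy for "near the decision boundary".
    """
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def screen_boundary_samples(probs, k):
    """Indices of the k unlabeled samples with the smallest margin."""
    return np.argsort(margin_scores(probs), kind="stable")[:k]


def inverse_frequency_weights(labels, n_classes):
    """Class weights proportional to 1 / class count (assumed weighting scheme).

    Classes that do not occur in `labels` receive weight 0.
    """
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    weights = np.zeros(n_classes)
    present = counts > 0
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights


def weighted_cross_entropy(probs, labels, class_weights, eps=1e-12):
    """Cross-entropy in which each sample is scaled by its class weight."""
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + eps)
    w = class_weights[labels]
    return float(np.sum(w * per_sample) / np.sum(w))


if __name__ == "__main__":
    # Hypothetical softmax outputs of a supervised model on four unlabeled samples.
    probs = np.array([
        [0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.50, 0.30, 0.20],
        [0.34, 0.33, 0.33],
    ])
    picked = screen_boundary_samples(probs, k=2)
    print("samples to send for labeling:", picked)

    # Hypothetical labels obtained for screened samples (class 0 dominates).
    labels = np.array([0, 0, 0, 1])
    print("class weights:", inverse_frequency_weights(labels, n_classes=3))
```

In a training loop, the screened samples would be labeled, added to the labeled pool, and the class weights recomputed before the supervised model is retrained with the weighted loss. That loop is an assumption about how the pieces fit together, not a description of the paper's procedure.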
