Ramírez-Sanz José Miguel, Maestro-Prieto Jose-Alberto, Arnaiz-González Álvar, Bustillo Andrés
Universidad de Burgos, Avda. Cantabria s/n, Burgos, 09006, Burgos, Spain.
ISA Trans. 2023 Dec;143:255-270. doi: 10.1016/j.isatra.2023.09.027. Epub 2023 Sep 25.
The automation of Fault Detection and Diagnosis (FDD) is a central task for many industries today. A myriad of methods are in use, although the leading recent contenders are data-driven approaches, especially Machine Learning (ML) methods. ML algorithms fall into two main categories, supervised and unsupervised, depending on whether or not the training instances are labeled with the expected outputs. Semi-Supervised Learning (SSL), an approach that has emerged more recently, instead trains on a few labeled instances together with unlabeled ones. In industrial environments where labeled data are scarce, it can significantly improve the accuracy of conventional ML models. Over the past few years, SSL has been tested as a promising solution for several FDD problems, although no systematic review of this kind of approach existed before the present one. In this study, the existing literature on SSL for FDD is organized using the taxonomy of van Engelen & Hoos. The most and least frequently used SSL algorithms are identified and discussed in relation to the different fault detection tasks and their most common dataset structures. Finally, the conclusions of this work propose a set of best practices for implementation under real industrial conditions, so as to avoid some of the most common mistakes.
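To make the idea of training on a few labeled plus some unlabeled instances concrete, the sketch below shows self-training, a common SSL wrapper method, in plain Python. It is a minimal illustration under stated assumptions, not a method from the review: the nearest-centroid base classifier, the softmax-style confidence, the 0.9 threshold, and the helper names `fit_centroids`, `predict_with_confidence`, and `self_train` are all hypothetical choices made for brevity, and the toy "normal"/"fault" data are invented.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Centroids = Dict[Hashable, Vector]


def fit_centroids(X: Sequence[Vector], y: Sequence[Hashable]) -> Centroids:
    """Hypothetical base learner: one mean vector (centroid) per class label."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        for i, value in enumerate(x):
            sums[label][i] += value
        counts[label] += 1
    return {lab: tuple(s / counts[lab] for s in sums[lab]) for lab in sums}


def predict_with_confidence(centroids: Centroids, x: Vector) -> Tuple[Hashable, float]:
    """Return the nearest class and a softmax-over-negative-distance confidence."""
    dists = {lab: math.dist(x, c) for lab, c in centroids.items()}
    d_min = min(dists.values())
    # Shift by the smallest distance so the best class has weight 1 (no underflow).
    weights = {lab: math.exp(-(d - d_min)) for lab, d in dists.items()}
    best = min(dists, key=dists.get)
    return best, weights[best] / sum(weights.values())


def self_train(
    X_lab: Sequence[Vector],
    y_lab: Sequence[Hashable],
    X_unlab: Sequence[Vector],
    threshold: float = 0.9,  # assumed confidence cut-off for accepting pseudo-labels
    max_iter: int = 10,
) -> Tuple[Centroids, int]:
    """Self-training: pseudo-label confident unlabeled points, add them, refit."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        model = fit_centroids(X, y)
        accepted, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            (accepted if conf >= threshold else remaining).append((x, label))
        if not accepted:
            break  # nothing is confident enough; stop rather than add noisy labels
        for x, label in accepted:
            X.append(x)
            y.append(label)
        pool = [x for x, _ in remaining]
    # Final model plus the number of pseudo-labeled points that were added.
    return fit_centroids(X, y), len(X) - len(X_lab)


if __name__ == "__main__":
    # Toy 2-D features: a single labeled "normal" and a single labeled "fault" sample.
    labeled_X = [(0.0, 0.0), (5.0, 5.0)]
    labeled_y = ["normal", "fault"]
    unlabeled_X = [(0.2, 0.1), (4.8, 5.1), (2.5, 2.5)]
    model, n_added = self_train(labeled_X, labeled_y, unlabeled_X)
    print(n_added)  # prints 2: the ambiguous (2.5, 2.5) point is never pseudo-labeled
```

The confidence threshold is the key design choice in this sketch: a high value adds fewer pseudo-labels but limits the risk of a wrong early label being reinforced in later iterations, which is a known weakness of self-training when labeled data are very scarce.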